Machine Learning Yearning, Chapter 12

Takeaways: Setting up development and test sets

  • Choose dev and test sets from a distribution that reflects the data you expect to get in the future and want to do well on. This may not be the same as your training data’s distribution.
  • Choose dev and test sets from the same distribution if possible.
  • Choose a single-number evaluation metric for your team to optimize. If there are multiple goals that you care about, consider combining them into a single formula (such as averaging multiple error metrics) or defining satisficing and optimizing metrics (see Chapter 9).
  • Machine learning is a highly iterative process: you may try many dozens of ideas before finding one that you’re satisfied with.
  • Having dev/test sets and a single-number evaluation metric helps you quickly evaluate algorithms, and therefore iterate faster.
  • When starting out on a brand new application, try to establish dev/test sets and a metric quickly, say in less than a week. It might be okay to take longer on mature applications.
  • The old heuristic of a 70%/30% train/test split does not apply to problems where you have a lot of data; the dev and test sets can be much less than 30% of the data.
  • Your dev set should be large enough to detect meaningful changes in the accuracy of your algorithm, but not necessarily much larger. Your test set should be big enough to give you a confident estimate of the final performance of your system.
  • If ever your dev set and metric are no longer pointing your team in the right direction, quickly change them:
    • If you have overfit the dev set, get more dev set data.
    • If the actual distribution you care about is different from the dev/test set distribution, get new dev/test set data.
    • If your metric is no longer measuring what is most important to you, change the metric.
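The takeaway about single-number metrics can be illustrated with a small sketch. It assumes a hypothetical setup that is not in the text above: each candidate model has a dev-set error per user segment and a measured latency. The per-segment errors are averaged into one number (the optimizing metric), and latency is treated as a satisficing metric with a threshold. The names `Candidate`, `average_error`, and `pick_best`, and all numbers, are made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Candidate:
    """A hypothetical model evaluated on the dev set."""
    name: str
    errors_by_segment: Dict[str, float]  # dev-set error rate per segment
    latency_ms: float                    # measured running time


def average_error(candidate: Candidate) -> float:
    """Combine several error metrics into a single number by averaging."""
    return mean(candidate.errors_by_segment.values())


def pick_best(candidates: List[Candidate],
              max_latency_ms: float) -> Optional[Candidate]:
    """Keep candidates that satisfy the latency threshold (satisficing
    metric), then return the one with the lowest average error
    (optimizing metric). Returns None if no candidate is fast enough."""
    feasible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not feasible:
        return None
    return min(feasible, key=average_error)


# Illustrative numbers only.
candidates = [
    Candidate("A", {"segment_1": 0.10, "segment_2": 0.12}, latency_ms=80.0),
    Candidate("B", {"segment_1": 0.08, "segment_2": 0.08}, latency_ms=95.0),
    Candidate("C", {"segment_1": 0.05, "segment_2": 0.05}, latency_ms=1500.0),
]

best = pick_best(candidates, max_latency_ms=100.0)
print(best.name if best else "no candidate meets the latency threshold")
```

Here candidate C has the lowest error but fails the latency threshold, so the choice falls to B. A plain average weights every segment equally; a weighted average may be a better fit if some segments matter more.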
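The points about dev set size can be made concrete with a rough calculation. The sketch below is an assumption-laden illustration, not a rule from the text: it treats dev-set accuracy as a binomial proportion over independent examples and reports (a) the smallest possible change in accuracy, which is one example out of the dev set, and (b) the approximate standard error of the accuracy estimate. The function names and the example sizes are hypothetical.

```python
import math


def accuracy_step(n_dev: int) -> float:
    """Smallest change in accuracy the dev set can register:
    one example flipping from wrong to right."""
    return 1.0 / n_dev


def accuracy_standard_error(accuracy: float, n_dev: int) -> float:
    """Approximate standard error of a measured accuracy, assuming
    independent examples (binomial model; an assumption of this sketch)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_dev)


def dev_fraction(n_dev: int, n_total: int) -> float:
    """Share of the full dataset taken up by the dev set."""
    return n_dev / n_total


# Illustrative sizes only.
for n_dev in (100, 1_000, 10_000):
    step = accuracy_step(n_dev)
    se = accuracy_standard_error(0.90, n_dev)
    print(f"n_dev={n_dev:>6}: step={step:.4%}, std. error at 90% accuracy={se:.4%}")

# With a large dataset, a dev set that resolves small changes can still
# be a small share of the data.
print(f"{dev_fraction(10_000, 1_000_000):.1%} of 1,000,000 examples")
```

Under these assumptions, noise in the measured accuracy shrinks with the square root of the dev set size, so a dev set of a fixed absolute size can be adequate even when it is far less than 30% of a large dataset.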