why we need validation set?

典型的數據分類是 訓練集-測試集。這種模式存在的問題是,在反覆的調參中,比如 learning rate 等,我們會反覆使用測試集,導致模型過於在意某些特徵,從而產生過擬合等問題,喪失一般性。

In the figure, “Tweak model” means adjusting anything about the model you can dream up—from changing the learning rate, to adding or removing features, to designing a completely new model from scratch. At the end of this workflow, you pick the model that does best on the test set.

所以我們需要驗證集,將上述模式改爲 訓練集-驗證集-測試集,訓練和驗證都在前兩個集合上完成,最後用測試集做性能測試。

Test sets and validation sets “wear out” with repeated use. That is, the more you use the same data to make decisions about hyperparameter settings or other model improvements, the less confidence you’ll have that these results actually generalize to new, unseen data.

If possible, it’s a good idea to collect more data to “refresh” the test set and validation set. Starting anew is a great reset.

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章