Machine Learning Yearning, Chapter 1

Why Machine Learning Strategy

Before the troops move, strategy comes first.

Machine learning is the foundation of countless important applications, including web search, email anti-spam, speech recognition, product recommendations, and more. I assume that you or your team is working on a machine learning application, and that you want to make rapid progress. This book will help you do so.

Example: Building a cat picture startup

Say you’re building a startup that will provide an endless stream of cat pictures to cat lovers. You use a neural network to build a computer vision system for detecting cats in pictures.

But tragically, your learning algorithm’s accuracy is not yet good enough. You are under tremendous pressure to improve your cat detector. What do you do?

Your team has a lot of ideas, such as:

  • Get more data: Collect more pictures of cats.
  • Collect a more diverse training set. For example, pictures of cats in unusual positions; cats with unusual coloration; pictures shot with a variety of camera settings; …
  • Train the algorithm longer, by running more gradient descent iterations.
  • Try a bigger neural network, with more layers/hidden units/parameters.
  • Try a smaller neural network.
  • Try adding regularization (such as L2 regularization).
  • Change the neural network architecture (activation function, number of hidden units, etc.)
  • …

If you choose well among these possible directions, you’ll build the leading cat picture platform, and lead your company to success. If you choose poorly, you might waste months. How do you proceed?

This book will tell you how. Most machine learning problems leave clues that tell you what’s useful to try, and what’s not useful to try. Learning to read those clues will save you months or years of development time.
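To make two of the ideas in the list above concrete, here is a minimal sketch of where "train longer" and "add L2 regularization" show up as knobs in a training loop. It is not from the book: it swaps the cat-detecting neural network for a tiny logistic regression so it fits in a few lines, and the names `train_logistic_regression`, `num_iterations`, `l2_lambda`, `TOY_FEATURES`, and `TOY_LABELS` are all made up for illustration.

```python
import math

# Hypothetical toy data: two made-up features per picture, label 1 = "cat".
TOY_FEATURES = [[0.2, 0.1], [0.4, 0.3], [0.8, 0.9], [0.9, 0.7]]
TOY_LABELS = [0, 0, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, num_iterations=100,
                              learning_rate=0.5, l2_lambda=0.0):
    """Batch gradient descent on the mean logistic loss.

    num_iterations is the "train longer" knob; l2_lambda is the strength
    of an L2 penalty (l2_lambda / 2) * sum(w_j ** 2) on the weights.
    """
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(num_iterations):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        for j in range(n_features):
            # Data gradient plus the L2 term l2_lambda * w_j.
            step = grad_w[j] / n_samples + l2_lambda * weights[j]
            weights[j] -= learning_rate * step
        # The bias is left unregularized in this sketch.
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def mean_log_loss(features, labels, weights, bias):
    """Average logistic loss of the model on the given examples."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for x, y in zip(features, labels):
        pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        pred = min(max(pred, eps), 1.0 - eps)
        total -= y * math.log(pred) + (1 - y) * math.log(1.0 - pred)
    return total / len(features)


def weight_norm(weights):
    return math.sqrt(sum(w * w for w in weights))
```

Calling `train_logistic_regression(TOY_FEATURES, TOY_LABELS, num_iterations=500)` and then again with `l2_lambda=0.1` lets you compare the training loss and the weight norm of each run. On real data, which of these knobs is worth turning is exactly the kind of question the clues mentioned above are meant to answer.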

Translator's note: my abilities are limited, so corrections and suggestions are welcome.

Translator: wexin_42141390
