What Is Machine Learning? Three Perspectives: Learning Tasks, Learning Paradigms, and Learning Models

Three Perspectives on Machine Learning

Perspective | Description | Typical examples
Learning Tasks | The general problems that machine learning can solve. | Classification, clustering, regression, ranking, density estimation, dimensionality reduction, optimization
Learning Paradigms | The typical scenarios in which machine learning takes place. | Supervised learning, unsupervised learning, reinforcement learning
Learning Models | The approaches that can be used to carry out a learning task. | Geometric, logical, network, probabilistic

1 What is Machine Learning

  • Machine learning is a branch of artificial intelligence and a key to achieving intelligence. Deep learning is a research area within machine learning.
    Its goal is to construct systems that can learn from data and make predictions on data.
  • Wikipedia:
    Machine learning is the study of algorithms and mathematical models that computer systems use to progressively improve their performance on a specific task.
    Machine learning algorithms build a mathematical model of sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.
  • Baidu Encyclopedia:
    Machine learning is a multi-domain interdisciplinary subject, involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines.
    It studies how computers can simulate or implement human learning behaviors in order to acquire new knowledge or skills, and how they can reorganize existing knowledge structures to keep improving their own performance.

Relations to Other Disciplines

[Figure: machine learning in relation to other disciplines]

Relationship between ML and Other Fields

Field | Relationship
Pattern Recognition (PR) | PR ≈ ML. PR originated in industry, ML in computer science.
Computer Vision (CV) | CV = ML + IP. Image processing (IP) provides the input to the ML model; ML provides the learning algorithms and outputs the vision result.
Data Mining (DM) | DM = ML + DB. Most DM algorithms are ML algorithms optimized for use in databases (DB).
Natural Language Processing (NLP) | NLP = Text Processing + ML.
Statistical Learning (SL) | SL is a part of ML. SL focuses on mathematical theory; ML focuses on practice.
Speech Recognition (SR) | SR = Speech Processing + ML. Speech processing provides the input to the ML model. SR and NLP are usually used together.

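The Relationship among Data Mining, Machine Learning, and Statistical Learning

Statistics influences data mining mainly through machine learning, while machine learning and databases are the two main supporting technologies of data mining.

  • Statistical learning: the foundation of the other two, with more emphasis on theoretical rigor.
  • Machine learning: an extension of statistical learning toward practical techniques, with more emphasis on providing algorithmic support for problems with small amounts of data.
  • Data mining: more concerned with practical problems on big data and with actually solving them, including cleaning real-world data, modeling, and prediction.
  • Machine learning can be divided into statistical learning, represented by support vector machines, and connectionist learning, represented by artificial neural networks.
  • The parameters of statistical learning models are often interpretable, whereas an artificial neural network is a black box.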

Artificial Intelligence vs. Machine Learning

  • Human Learning
    Human beings acquire skill with experience accumulated from observations.
  • Machine Learning
    Machines acquire skill with experience accumulated or computed from data.

What is Skill in Machine Learning

  • Skill
    A skill improves some performance measure (e.g., prediction accuracy).
  • Why Use Machine Learning to Acquire Skill
    Machine learning can improve a performance measure using experience computed from data.

Three Key Elements in the Formal Definition

To have a well-defined learning problem, we must identify these three features:

(1) Determine the task and collect training data.
(2) Obtain experience from the data.
(3) Produce a result according to the experience and evaluate its performance.

  • Example 1: A handwriting recognition problem
    [Figure: the handwriting recognition example]
  • Example 2: A robot driving problem
    Task (T):
    driving on public four-lane highways using vision sensors
    Experience (E):
    a sequence of images and steering commands recorded while observing a human driver
    Performance (P):
    average distance traveled before an error (as judged by a human overseer)
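
The task, experience, and performance measure of a learning problem can be written down as a small record. The sketch below is only an illustration: the class name `LearningProblem` and its field names are assumptions made here, not notation from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningProblem:
    """A learning problem described by its three key elements."""
    task: str         # T: what the system should do
    experience: str   # E: the data it learns from
    performance: str  # P: how success is measured


# The robot driving example, written as a record.
robot_driving = LearningProblem(
    task="driving on public four-lane highways using vision sensors",
    experience="images and steering commands recorded while observing a human driver",
    performance="average distance traveled before an error, as judged by a human overseer",
)

print(robot_driving.task)
```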

2 History of Machine Learning

Timeline of Machine Learning

[Figure: timeline of machine learning]

Three Schools of Machine Learning

[Figure: the three schools of machine learning]

3 Why Different Perspectives

Difficulty in Understanding Machine Learning

  • How many learning algorithms are there
    There are many machine learning algorithms: literally thousands are available, and hundreds more are published each year.
  • Which algorithm should we choose
    Suppose we have an application that machine learning might be good for, so we need an appropriate algorithm for learning from the data.
    The problem we face is how to choose among the machine learning algorithms.
  • What is the difficulty
    Without a categorization of machine learning algorithms, how can we determine which algorithm to use?
    The categorization shapes the perspective from which we choose a machine learning algorithm.
  • Is one perspective enough
    A single perspective is not enough to survey most machine learning algorithms.
    We should look from multiple perspectives to get a full view of machine learning.

4 Three Perspectives on Machine Learning

(1) Learning Tasks

  • What are Learning Tasks
    Learning tasks denote the general problems that can be solved by learning with a desired output.

  • Why do we need to Study Learning Tasks
    Various types of problems arise in applications:
    ◦ computer vision,
    ◦ pattern recognition,
    ◦ natural language processing,
    ◦ ...

  • Typical Tasks in Machine Learning
    [Figure: typical tasks in machine learning]

  • Case study: Credit scoring
    Two classes: low-risk and high-risk customers.
    A customer's information forms the input, which is assigned to one of the two classes.
    After training with past data, a learned classification rule may be:
    [Figure: the learned classification rule]
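
The learned rule is not spelled out in the text. As a rough illustration of what such a rule can look like, the sketch below assumes two customer attributes, `income` and `savings`, and two thresholds. Both the attributes and the threshold values are hypothetical; in practice the thresholds would be fitted from past data.

```python
def classify_customer(income: float, savings: float,
                      income_threshold: float = 50_000.0,
                      savings_threshold: float = 10_000.0) -> str:
    """Hypothetical threshold rule: low-risk only if both values exceed their thresholds."""
    if income > income_threshold and savings > savings_threshold:
        return "low-risk"
    return "high-risk"


print(classify_customer(income=72_000, savings=15_000))
print(classify_customer(income=72_000, savings=2_000))
```

A rule of this shape is easy to inspect, which is one reason threshold rules are a common first example for two-class problems.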

(2) Learning Paradigms

  • What are Learning Paradigms
    Learning paradigms denote the typical scenarios in which machine learning takes place.
  • How to Distinguish Learning Paradigms
    By the scenario or style of learning:
    ◦ how it learns from data,
    ◦ how it interacts with the environment.
  • Learning Paradigms in Machine Learning
    [Figure: learning paradigms in machine learning]
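
To make the difference in "how it learns from data" concrete, the sketch below runs a supervised and an unsupervised learner on the same one-dimensional points. The data, the nearest-class-mean classifier, and the two-cluster k-means routine are all illustrative choices made here, not methods prescribed by the text. Reinforcement learning, which depends on interaction with an environment, is not shown.

```python
from statistics import mean

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["a", "a", "a", "b", "b", "b"]  # used only by the supervised learner


def fit_class_means(xs, ys):
    """Supervised: the labels say which points belong together."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}


def predict(class_means, x):
    """Assign x to the class whose mean is closest."""
    return min(class_means, key=lambda c: abs(x - class_means[c]))


def two_means(xs, iterations=10):
    """Unsupervised: 1-D k-means with k=2; no labels are used.

    Assumes xs contains at least two distinct values.
    """
    lo, hi = min(xs), max(xs)  # initial cluster centers
    for _ in range(iterations):
        left = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        right = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = mean(left), mean(right)
    return lo, hi


class_means = fit_class_means(points, labels)
print(predict(class_means, 4.0))
print(two_means(points))
```

Both learners end up with roughly the same two group centers, but only the supervised one can attach the names "a" and "b" to them.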

(3) Learning Models

  • What are Learning Models
    Learning models denote the approaches that can be used to fulfil a learning task.
  • Why Study Learning Models
    The result of machine learning depends heavily on the approach chosen to solve the learning task.
  • Typical Models for Machine Learning
    [Figure: typical models for machine learning]
  • The Three Perspectives
    [Figure: the three perspectives]

5 Applications and Terminologies

(1) Application Fields of Machine Learning

[Figures: application fields of machine learning]

(2) Some Terminology in Machine Learning

  • Samples
    Items or instances of data used for learning or evaluation.

  • Features
    The set of attributes, often represented as a vector, associated with a sample:
    ◦ Handcrafted features: e.g., SIFT, HOG, SURF, LBP, GLOH, LESH, CENTRIST.
    ◦ Learned features: e.g., by a convolutional neural network.

  • Handcrafted Features
    HOG (Histogram of Oriented Gradients)
    ◦ Similar to SIFT (Scale-Invariant Feature Transform), but with improved accuracy.
    ◦ Describes an image by the distribution of intensity gradients or edge directions.
    ◦ Uses a 64×128 detection window.

  • Learned Features
    ◦ Humans can learn to see efficiently because brains are deep, with many layers of processing.
    ◦ Some algorithms for such deep architectures can produce features from raw data for visual recognition.
    ◦ Feature learning is also called representation learning.
    ◦ Understanding deep learning will enable us to build more intelligent machines for visual recognition.

  • Labels
    ◦ Values or categories assigned to samples.
    ◦ In classification problems, samples are assigned specific categories.
    ◦ In regression problems, samples are assigned real-valued labels.

  • Training sample
    ◦ Samples used to train a learning algorithm.
    ◦ In the spam problem, the training sample consists of a set of email samples along with their associated labels.

  • Validation sample
    ◦ Validation samples are labeled data used to tune the parameters of a learning algorithm.
    ◦ Learning algorithms typically have one or more free parameters, and the validation sample is used to select appropriate values for these parameters.

  • Test sample
    ◦ Samples used to evaluate the performance of a learning algorithm.
    ◦ The algorithm predicts labels for the test samples from their features. These predictions are then compared with the labels of the test sample to measure the performance of the algorithm.
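
The three kinds of samples are usually obtained by splitting one labeled data set. The sketch below shows one simple way to do this. The 60/20/20 proportions, the fixed seed, and the made-up email data are assumptions for illustration only.

```python
import random


def split_samples(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of `samples` and cut it into train, validation, and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


# Hypothetical labeled emails: (text, label) pairs.
emails = [(f"message {i}", "spam" if i % 3 == 0 else "non-spam") for i in range(10)]
train, validation, test = split_samples(emails)
print(len(train), len(validation), len(test))
```

Keeping the test part untouched until the end is what makes its score an honest estimate of performance on new data.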

  • Loss function
    ◦ Measures the difference, or loss, between a predicted label and a true label.
    ◦ Denoting the set of all labels by Y and the set of possible predictions by Y′, a loss function L is a mapping L: Y × Y′ → ℝ+.
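
Two widely used loss functions fit this definition: the zero-one loss for classification and the squared loss for regression. The sketch below is a minimal illustration, and the function names are chosen here for readability. Following the order Y × Y′, each loss takes the true label first and the prediction second.

```python
def zero_one_loss(true_label, predicted_label) -> float:
    """Classification loss: 0 for a correct prediction, 1 for a wrong one."""
    return 0.0 if predicted_label == true_label else 1.0


def squared_loss(true_value: float, predicted_value: float) -> float:
    """Regression loss: squared difference between true value and prediction."""
    return (true_value - predicted_value) ** 2


def average_loss(loss, true_labels, predictions) -> float:
    """Mean loss over paired true labels and predictions."""
    total = sum(loss(t, p) for t, p in zip(true_labels, predictions))
    return total / len(true_labels)


print(average_loss(zero_one_loss, ["spam", "spam"], ["spam", "non-spam"]))
print(average_loss(squared_loss, [1.0, 4.0], [2.0, 4.0]))
```

Both functions return non-negative real numbers, matching the codomain ℝ+.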

  • Hypothesis set
    ◦ A set of functions mapping features to the set of labels Y.
    ◦ For example, a set of functions mapping email features to Y = {spam, non-spam}.
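
As a small illustration of a hypothesis set, the sketch below uses a single hypothetical email feature, the number of suspicious words, and a family of threshold functions that map it to Y = {spam, non-spam}. The feature, the candidate thresholds, and the training sample are all invented for this example. Learning then amounts to picking the function in the set with the fewest training errors.

```python
def make_threshold_hypothesis(threshold):
    """Return h(x): 'spam' if the feature value reaches the threshold."""
    def hypothesis(suspicious_word_count):
        return "spam" if suspicious_word_count >= threshold else "non-spam"
    return hypothesis


# The hypothesis set: one function per candidate threshold.
hypothesis_set = {t: make_threshold_hypothesis(t) for t in range(0, 6)}

# Hypothetical training sample: (suspicious word count, label).
training_sample = [(0, "non-spam"), (1, "non-spam"), (2, "non-spam"),
                   (3, "spam"), (4, "spam"), (5, "spam")]


def training_errors(hypothesis):
    """Number of training samples the hypothesis labels incorrectly."""
    return sum(hypothesis(x) != y for x, y in training_sample)


best_threshold = min(hypothesis_set, key=lambda t: training_errors(hypothesis_set[t]))
print(best_threshold, training_errors(hypothesis_set[best_threshold]))
```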

  • Abstraction
    It involves the translation of data into broader representations.

  • Generalization
    It describes the process of turning abstracted knowledge into a form that can be used for action. It is also the ability of a learning algorithm to perform accurately on unseen samples after having experienced a training data set.
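
The difference between storing data and generalizing from it can be seen in a toy comparison. In the sketch below, one learner memorizes the training samples and the other applies an abstracted threshold rule. The data, the threshold of 4, and the function names are hypothetical and serve only to illustrate the definition.

```python
train = [(1, "non-spam"), (2, "non-spam"), (6, "spam"), (7, "spam")]
unseen = [(0, "non-spam"), (3, "non-spam"), (5, "spam"), (9, "spam")]

memory = dict(train)


def memorizer(x):
    """Recalls training labels exactly; guesses 'non-spam' for anything new."""
    return memory.get(x, "non-spam")


def threshold_rule(x):
    """A rule abstracted from the training data: values of 4 or more are spam."""
    return "spam" if x >= 4 else "non-spam"


def accuracy(predict, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in samples) / len(samples)


print(accuracy(memorizer, train), accuracy(memorizer, unseen))
print(accuracy(threshold_rule, train), accuracy(threshold_rule, unseen))
```

Both learners are perfect on the training sample, but only the abstracted rule keeps its accuracy on the unseen samples.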

(3) Some Notations

[Figure: notations]

Summary

  • Machine learning studies algorithms that can learn from data and make predictions on data.
  • The different perspectives aim to provide a taxonomy of machine learning algorithms, making machine learning easier to understand.
  • This chapter proposes three perspectives on machine learning: learning tasks, learning paradigms, and learning models.