2016.03.30 Supervised learning

Original

2020-06-08 02:14

1. As with full Bayesian inference, MAP Bayesian inference has the advantage of leveraging information that is brought by the prior and cannot be found in the training data. This additional information helps to reduce the variance in the MAP point estimate (in comparison to the ML estimate). However, it does so at the price of increased bias.

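Compared with the ML estimator, the MAP estimator has lower variance, so its distribution is more concentrated. However, because MAP estimation introduces a preference through the prior, the bias of the estimate increases.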
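The trade-off above can be checked numerically. The following is a minimal sketch, assuming a Gaussian likelihood with known noise variance and a Gaussian prior on the mean (a conjugate setup chosen only for illustration; it does not come from these notes). The function names and all parameter values are hypothetical.

```python
import random
import statistics


def ml_mean(xs):
    """Maximum likelihood estimate of a Gaussian mean: the sample average."""
    return sum(xs) / len(xs)


def map_mean(xs, prior_mean, prior_var, noise_var):
    """MAP estimate of a Gaussian mean under a Gaussian prior (assumed setup).

    The estimate shrinks the sample average toward the prior mean.
    """
    n = len(xs)
    w = n / (n + noise_var / prior_var)  # weight given to the data
    return w * ml_mean(xs) + (1 - w) * prior_mean


def compare_estimators(true_mean=2.0, noise_var=1.0, prior_mean=0.0,
                       prior_var=1.0, n=5, trials=2000, seed=0):
    """Repeat the experiment many times and measure bias and variance."""
    rng = random.Random(seed)
    ml_estimates, map_estimates = [], []
    for _ in range(trials):
        xs = [rng.gauss(true_mean, noise_var ** 0.5) for _ in range(n)]
        ml_estimates.append(ml_mean(xs))
        map_estimates.append(map_mean(xs, prior_mean, prior_var, noise_var))
    return {
        "ml_bias": statistics.mean(ml_estimates) - true_mean,
        "map_bias": statistics.mean(map_estimates) - true_mean,
        "ml_var": statistics.pvariance(ml_estimates),
        "map_var": statistics.pvariance(map_estimates),
    }


result = compare_estimators()
print(result)
```

In this sketch the prior is deliberately centered away from the true mean, so the MAP estimate is pulled toward the prior (nonzero bias) while its spread across trials is smaller than that of the ML estimate.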

2. The power of the kernel trick

The kernel trick is powerful for two reasons. First, it allows us to learn models that are nonlinear as a function of x using convex optimization techniques that are guaranteed to converge efficiently. This is possible because we consider φ fixed and optimize only α, i.e., the optimization algorithm can view the decision function as being linear in a different space. Second, the kernel function k often admits an implementation that is significantly more computationally efficient than naively constructing two φ(x) vectors and explicitly taking their dot product.
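The second point can be illustrated with a small example. This is a sketch under assumptions: it uses the degree-2 polynomial kernel k(x, z) = (x·z)^2, whose explicit feature map φ(x) lists all pairwise products x_i x_j. The kernel choice and the helper names `dot`, `phi`, and `poly2_kernel` are illustrative, not from the notes.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map for k(x, z) = (x . z)^2.

    Produces all d*d pairwise products x_i * x_j, so its size grows
    quadratically with the input dimension d.
    """
    return [xi * xj for xi in x for xj in x]


def poly2_kernel(x, z):
    """Same inner product computed in O(d) time, without building phi."""
    return dot(x, z) ** 2


x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]

explicit = dot(phi(x), phi(z))   # builds two 9-dimensional vectors first
implicit = poly2_kernel(x, z)    # one 3-dimensional dot product, then a square

print(explicit, implicit)
```

Both values agree, but the kernel form never materializes the feature vectors. For higher polynomial degrees or for the RBF kernel (whose feature space is infinite-dimensional) the explicit route becomes impractical or impossible.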

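The SVM is not the only algorithm that uses the kernel trick; many algorithms are generalized from linear to nonlinear through it. All algorithms that employ the kernel trick are collectively known as kernel methods.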

3. The main drawbacks of kernel methods

A major drawback to kernel machines is that the cost of evaluating the decision function is linear in the number of training examples, because the i-th example contributes a term α_i k(x, x^(i)) to the decision function. Support vector machines are able to mitigate this by learning an α vector that contains mostly zeros. Classifying a new example then requires evaluating the kernel function only for the training examples that have non-zero α_i. These training examples are known as support vectors.
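The saving from a sparse α can be sketched as follows. This is a minimal illustration, assuming an RBF kernel and a decision function of the form f(x) = b + Σ_i α_i k(x, x^(i)); the training points, the α values, and the `gamma` parameter are made up for the example and are not the result of actually training an SVM.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - z||^2); gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def dense_decision(x, train_x, alpha, b=0.0):
    """Evaluate the kernel against every training example."""
    return b + sum(a_i * rbf_kernel(x, x_i) for a_i, x_i in zip(alpha, train_x))


def sparse_decision(x, train_x, alpha, b=0.0):
    """Evaluate the kernel only where alpha_i is non-zero (support vectors).

    Returns the decision value and the number of kernel evaluations used.
    """
    total, evaluations = b, 0
    for a_i, x_i in zip(alpha, train_x):
        if a_i == 0.0:
            continue  # this example contributes nothing to the sum
        total += a_i * rbf_kernel(x, x_i)
        evaluations += 1
    return total, evaluations


# Hypothetical training set and coefficients: only two non-zero entries.
train_x = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0), (4.0, 4.0), (3.0, 3.0), (5.0, 4.0)]
alpha = [0.0, 0.8, 0.0, 0.0, -0.8, 0.0]

query = (1.5, 1.0)
value, evaluations = sparse_decision(query, train_x, alpha)
print(value, evaluations, dense_decision(query, train_x, alpha))
```

The sparse version gives the same decision value while touching only the examples with non-zero coefficients, so prediction cost scales with the number of support vectors rather than with the size of the training set.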

Kernel machines also suffer from a high computational cost of training when the dataset is large. We will revisit this idea in Sec. 5.9. Kernel machines with generic kernels struggle to generalize well. We will explain why in Sec. 5.11. The modern incarnation of deep learning was designed to overcome these limitations of kernel machines. The current deep learning renaissance began when Hinton et al. (2006) demonstrated that a neural network could outperform the RBF kernel SVM on the MNIST benchmark.

4. On the k-nearest neighbors algorithm

As a non-parametric learning algorithm, k-nearest neighbor can achieve very high capacity. For example, suppose we have a multiclass classification task and measure performance with 0-1 loss. In this setting, the error of 1-nearest neighbor converges to at most double the Bayes error as the number of training examples approaches infinity. The error in excess of the Bayes error results from choosing a single neighbor by breaking ties between equally distant neighbors randomly. When there is infinite training data, all test points x will have infinitely many training set neighbors at distance zero. If we allow the algorithm to use all of these neighbors to vote, rather than randomly choosing one of them, the procedure converges to the Bayes error rate. The high capacity of k-nearest neighbors allows it to obtain high accuracy given a large training set.
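The difference between picking one zero-distance neighbor at random and letting all of them vote can be simulated. This is a toy sketch, assuming a binary task in which every point sits at the same location x and the label is 1 with probability p = 0.7, so the Bayes error is 0.3. The function name and all parameter values are hypothetical.

```python
import random


def simulate_zero_distance_neighbors(p=0.7, n_train=1001, n_test=20000, seed=0):
    """Compare random single-neighbor prediction with a vote over all neighbors.

    Every training point is at distance zero from every test point, which
    mimics the infinite-data limit at a single location x.
    """
    rng = random.Random(seed)
    train_labels = [1 if rng.random() < p else 0 for _ in range(n_train)]
    majority = 1 if 2 * sum(train_labels) > n_train else 0

    single_errors = vote_errors = 0
    for _ in range(n_test):
        y = 1 if rng.random() < p else 0
        picked = rng.choice(train_labels)  # random tie-break among neighbors
        single_errors += picked != y
        vote_errors += majority != y
    return single_errors / n_test, vote_errors / n_test


single_error, vote_error = simulate_zero_distance_neighbors()
print(single_error, vote_error)
```

In this setup the single-neighbor error is close to 2p(1-p) = 0.42, which exceeds the Bayes error of 0.3 but stays below twice that value, while the voting rule lands near the Bayes error itself.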
