Machine Learning: SVM Linear Classification

Introduction:

This post walks through a simple example of linear SVM classification in MATLAB using the LIBSVM library.

Exercise:

This is a course exercise from Stanford University (whose site I recommend, by the way). The exercise is here:
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex7/ex7.html
The data can be downloaded here:
http://openclassroom.stanford.edu/MainFolder/courses/MachineLearning/exercises/ex7materials/ex7Data.zip

Installing LIBSVM:

See this blogger's post, which is really well written:
https://blog.csdn.net/qq_31781741/article/details/82666861#commentBox

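SVM:

The derivation and solution of the SVM are fairly involved, so this post only states the formulation and does not go through the theory in detail. There are many excellent explanations online.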
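For reference, here is the standard soft-margin formulation that a C-SVC with a linear kernel solves (the `-s 0 -t 0` options used later in this post). The notation is an assumption, since the formula from the original post is not available: m training examples x^(i) with labels y^(i) in {+1, -1}, slack variables ξ_i, and the cost parameter C.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\, w^{T} w + C \sum_{i=1}^{m} \xi_i
\qquad \text{subject to} \qquad
y^{(i)} \left( w^{T} x^{(i)} + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, m
```

A new point x is then classified by the sign of w^T x + b.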

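Two-dimensional classification problem:

(1) First, consider a classification problem with two features. Load the data file "twofeature.txt" into Matlab/Octave with the following command:
[trainlabels, trainfeatures] = libsvmread('twofeature.txt');
Note that this file is formatted for LIBSVM, so it cannot be loaded with the usual Matlab/Octave commands. (The usual command here is load. I think load could in fact be used, but the data would then need extra format processing.)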
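To give an idea of what that extra format processing would involve: each line of a LIBSVM file is a label followed by index:value pairs. The following is only a sketch of parsing such lines by hand. The two sample lines are made up (they are not taken from twofeature.txt), and it assumes exactly two features. In practice libsvmread is the simpler choice.

```matlab
% Two made-up lines in LIBSVM format: <label> <index>:<value> ...
lines = {'1 1:0.5 2:1.5', '-1 1:2.0 2:0.25'};

n = numel(lines);
y = zeros(n, 1);
X = zeros(n, 2);   % assumption: two features, as in this exercise

for i = 1:n
    tokens = strsplit(strtrim(lines{i}), ' ');
    y(i) = str2double(tokens{1});              % first token is the label
    for k = 2:numel(tokens)
        pair = sscanf(tokens{k}, '%d:%f');     % pair = [index; value]
        X(i, pair(1)) = pair(2);
    end
end

disp(y)
disp(X)
```

Features missing from a line stay at zero, which matches the sparse convention of the format.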
(2) Next, plot the points in twofeature.txt.
Code that generates the plot (note that this is MATLAB code!):

% Load training features and labels
[y, x] = libsvmread('twofeature.txt');
figure
pos = find(y == 1);
neg = find(y == -1);
plot(x(pos,1), x(pos,2), 'ko', 'MarkerFaceColor', 'b'); hold on;
plot(x(neg,1), x(neg,2), 'ko', 'MarkerFaceColor', 'g')

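The gap between the two classes is fairly clear. However, the blue class has an outlier on the far left. We will now look at how this outlier affects the SVM decision boundary.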

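Setting C = 1

The parameter C in the SVM optimization problem is a positive cost factor that penalizes misclassified training examples.
First, we run the classifier with C = 1:
model = svmtrain(trainlabels, trainfeatures, '-s 0 -t 0 -c 1');
After training, "model" is a structure containing the model parameters. We can now get w and b with the following code: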

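% y and x are the labels and features loaded with libsvmread above
C = 1;
model = svmtrain(y, x, sprintf('-s 0 -t 0 -c %g', C));
% Recover the primal weight vector w and bias b from the dual solution
w = model.SVs' * model.sv_coef
b = -model.rho
% Decision values are positive for model.Label(1), so flip the signs
% if that label is -1
if (model.Label(1) == -1)
    w = -w; b = -b;
end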

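Once you have w and b, you can use them to plot the decision boundary. With C = 1, the outlier is misclassified, but the decision boundary looks reasonable.
[Figure: decision boundary for C = 1, with the resulting w and b values]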
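The boundary is the line w(1)*x1 + w(2)*x2 + b = 0, solved for x2. A minimal sketch of that step follows. The values of w and b are made up for illustration and are not the trained ones, and the formula assumes w(2) is nonzero.

```matlab
% Hypothetical values for illustration only, not the output of svmtrain
w = [1.5; 2.0];
b = -6.0;

% Solve w(1)*x1 + w(2)*x2 + b = 0 for x2 at a few x1 positions
plot_x = linspace(0, 4, 5);
plot_y = -(w(1) * plot_x + b) / w(2);

% Every point on the line should have a decision value of (about) zero
decision_values = w(1) * plot_x + w(2) * plot_y + b;
disp(max(abs(decision_values)))
```

Points with a positive decision value fall on the +1 side of the line, and points with a negative value on the -1 side.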

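Setting C = 100

Now let's see what happens when the cost factor is much higher. Train the model and plot the decision boundary again, this time with C set to 100. The outlier is now classified correctly, but the decision boundary no longer looks like a natural fit for the rest of the data. This example shows that when the cost penalty is large, the SVM algorithm tries very hard to avoid misclassifications.
The tradeoff is that the algorithm gives less weight to producing a large separation margin.
[Output: w and b for C = 100]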

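Comparing different values of C:

Adjusting C adjusts the margin of the separating boundary: the larger C is, the smaller the margin and the higher the accuracy on the training set. In nonlinear classification problems, however, a large C may lead to overfitting, so choosing a suitable value of C matters.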
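The trade-off can be seen directly in the soft-margin objective, 0.5*||w||^2 + C * (sum of slacks). The sketch below does not train anything. It uses a small made-up data set with one outlier and compares two hand-picked candidate boundaries: A, with a wide margin that misclassifies the outlier, and B, with a narrow margin that classifies every point. All data and both candidates are assumptions chosen for illustration.

```matlab
% Made-up 2-D data: the last +1 point is an outlier close to the -1 class
X = [4 0; 5 0; 0 0; 1 0; 1.2 0];
y = [1; 1; -1; -1; 1];

% Soft-margin objective: 0.5*||w||^2 + C * (sum of hinge slacks)
objective = @(w, b, C) 0.5 * (w' * w) + C * sum(max(0, 1 - y .* (X * w + b)));

% Candidate A: wide margin (planes at x1 = 1 and x1 = 4), outlier misclassified
wA = [2/3; 0];  bA = -5/3;
% Candidate B: narrow margin (planes at x1 = 1 and x1 = 1.2), all points correct
wB = [10; 0];   bB = -11;

for C = [1 100]
    objA = objective(wA, bA, C);
    objB = objective(wB, bB, C);
    fprintf('C = %3d: A = %7.2f (margin %.2f), B = %7.2f (margin %.2f)\n', ...
            C, objA, 2 / norm(wA), objB, 2 / norm(wB));
end
```

With C = 1 the wide-margin candidate A has the lower objective. With C = 100 the slack of the outlier dominates and the narrow-margin candidate B wins, which mirrors the behaviour described above.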

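Spam classification example:

Now let's return to the spam classification example from the previous exercise. The data folder should contain the same 4 training sets seen in the Naive Bayes exercise, now formatted for LIBSVM. They are named:
a. email_train-50.txt (based on 50 email documents)
b. email_train-100.txt (100 documents)
c. email_train-400.txt (400 documents)
d. email_train-all.txt (the complete 700 training documents)

Comparing training sets of different sizes leads to one conclusion: the larger the training set, the smaller the error when predicting on the test set. Because the feature dimension of this classifier is too high, the decision boundary cannot be plotted for visual inspection.
Training on each file in turn and predicting on the test set gives the following output: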
50 documents: Accuracy = 75.3846% (196/260)
100 documents: Accuracy = 88.4615% (230/260)
400 documents: Accuracy = 98.0769% (255/260)
the complete 700 training documents: Accuracy = 98.4615% (256/260)
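A comparison like this can be produced with one loop over the four training files. This is a sketch, assuming LIBSVM's MATLAB interface is on the path and the data files sit in the current folder under the names used in 2.m. The variable names train_files and acc are introduced here for illustration.

```matlab
% Assumes LIBSVM (libsvmread, svmtrain, svmpredict) is on the path and the
% exercise data files are in the current folder
train_files = {'email_train-50.txt', 'email_train-100.txt', ...
               'email_train-400.txt', 'email_train-all.txt'};

[test_y, test_x] = libsvmread('email_test.txt');

acc = zeros(numel(train_files), 1);
for i = 1:numel(train_files)
    [train_y, train_x] = libsvmread(train_files{i});
    model = svmtrain(train_y, train_x, '-t 0');   % linear kernel, default cost
    % svmpredict prints the accuracy; accuracy(1) is the percentage correct
    [~, accuracy, ~] = svmpredict(test_y, test_x, model);
    acc(i) = accuracy(1);
end

% Summary: test accuracy for each training file
for i = 1:numel(train_files)
    fprintf('%-22s %.4f%%\n', train_files{i}, acc(i));
end
```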

Complete code:
1.m

% SVM Linear classification
% A 2-feature example

clear all; close all; 

% Load training features and labels
[y, x] = libsvmread('twofeature.txt');

% Cost values to compare, with one line style per value
Cs = [1 10 50 100];
styles = {'r-', 'b-', 'c-', 'k-'};

% Plot the data points
figure
pos = find(y == 1);
neg = find(y == -1);
plot(x(pos,1), x(pos,2), 'ko', 'MarkerFaceColor', 'b'); hold on;
plot(x(neg,1), x(neg,2), 'ko', 'MarkerFaceColor', 'g')

% Libsvm options
% -s 0 : classification
% -t 0 : linear kernel
% -c somenumber : set the cost
plot_x = linspace(min(x(:,1)), max(x(:,1)), 30);
for k = 1:numel(Cs)
    % Train the model and get the primal variables w, b from the model
    model = svmtrain(y, x, sprintf('-s 0 -t 0 -c %g', Cs(k)));
    w = model.SVs' * model.sv_coef
    b = -model.rho
    if (model.Label(1) == -1)
        w = -w; b = -b;
    end

    % Plot the decision boundary w(1)*x1 + w(2)*x2 + b = 0
    plot_y = (-1/w(2))*(w(1)*plot_x + b);
    plot(plot_x, plot_y, styles{k}, 'LineWidth', 2)
end
title('SVM Linear Classifier', 'FontSize', 14)

2.m

% SVM Email text classification

clear all; close all; clc

% Load training features and labels
[train_y, train_x] = libsvmread('email_train-all.txt');

% Train the model and get the primal variables w, b from the model

% Libsvm options
% -t 0 : linear kernel
% Leave other options as their defaults 
model = svmtrain(train_y, train_x, '-t 0');
w = model.SVs' * model.sv_coef;
b = -model.rho;
if (model.Label(1) == -1)
    w = -w; b = -b;
end

% Load testing features and labels
[test_y, test_x] = libsvmread('email_test.txt');

[predicted_label, accuracy, decision_values] = svmpredict(test_y, test_x, model);
% After running svmpredict, the accuracy should be printed to the matlab
% console

Other references:
https://blog.csdn.net/gyh_420/article/details/77943973 (key reference)
https://www.cnblogs.com/zx-zhang/p/9972173.html
