
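Support Vector Machines in Sklearn

SVM classes in sklearn.svm for classification (OneClassSVM is an unsupervised outlier detector rather than a classifier):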

svm.LinearSVC: Linear Support Vector Classification.
svm.NuSVC: Nu-Support Vector Classification.
svm.OneClassSVM: Unsupervised Outlier Detection.
svm.SVC: C-Support Vector Classification.

SVM classes in sklearn.svm for regression (l1_min_c is a helper function rather than an estimator):

svm.LinearSVR: Linear Support Vector Regression.
svm.NuSVR: Nu Support Vector Regression.
svm.SVR: Epsilon-Support Vector Regression.
svm.l1_min_c: Return the lowest bound for C such that for C in (l1_min_C, infinity) the model is guaranteed not to be empty.

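The support vectors of a fitted model can be inspected via model.support_vectors_ (available on the kernel-based estimators such as SVC and SVR; LinearSVC and LinearSVR do not expose it).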
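As a minimal sketch of inspecting support vectors, the snippet below fits an SVC on a tiny made-up dataset (the six points and their labels are assumptions chosen only for illustration) and prints the attributes that describe the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny toy dataset (made up for illustration): two well-separated groups.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel="linear", C=1.0).fit(X, y)

print(model.support_vectors_)  # coordinates of the support vectors
print(model.support_)          # row indices of the support vectors in X
print(model.n_support_)        # number of support vectors per class
```

The rows of support_vectors_ are simply the training rows selected by support_, so either attribute can be used to highlight them in a plot.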

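SVMs are very sensitive to feature scales. As the figure below shows, in the left plot the vertical scale is much larger than the horizontal scale, so the widest possible separating margin is close to horizontal. After feature scaling (for example with Sklearn's StandardScaler), the decision boundary in the right plot looks much better.

Figure 1: Separating margin before and after feature scaling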
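The scaling step can be sketched as follows. The four data points are made up for illustration; the only point is that StandardScaler brings features with very different ranges onto a comparable scale (zero mean, unit variance per feature).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: the second feature lives on a much larger scale than the first.
X = np.array([[1.0, 50.0], [5.0, 20.0], [3.0, 80.0], [5.0, 60.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for each feature
print(X_scaled.std(axis=0))   # approximately 1 for each feature
```

In practice the scaler is usually placed in a Pipeline in front of the SVM, so that the statistics learned on the training set are reused at prediction time.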

Common parameters explained:

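$C$: the penalty coefficient, used when the data are only approximately linearly separable (soft-margin SVM). The soft-margin objective has two parts: maximizing the margin, and $C\times$ the penalty on the data points that fall inside the margin. The larger $C$ is, the more heavily margin violations are penalized, so fewer points end up inside the margin and the margin becomes narrower. The right value of $C$ depends on the problem, for example a medical model versus a spam classifier.
$loss$: the loss function. The objective of a linear SVM has two parts: the loss term and the regularization term. The standard SVM loss is the hinge loss; note that LinearSVC defaults to 'squared_hinge', so 'hinge' has to be requested explicitly.
$kernel$: the kernel function of a nonlinear SVM. Common kernels are the linear kernel (which reduces to a linear SVM), the polynomial kernel, the Gaussian RBF kernel, and the sigmoid kernel.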
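$gamma$: the parameter of the Gaussian kernel. $\gamma = \frac{1}{2\sigma^2}$, where $\sigma$ is the horizontal width of the normal (bell-shaped) curve, so $gamma$ is inversely proportional to $\sigma^2$: the larger $gamma$ is, the taller and thinner the bell; the smaller $gamma$ is, the shorter and wider the bell. In an SVM, a large $gamma$ narrows each instance's range of influence, so the decision boundary bends around individual instances, while a small $gamma$ widens that range and smooths the boundary.

Therefore, the larger gamma is, the more likely the model is to overfit; the smaller gamma is, the more likely it is to underfit.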
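To make the relation $\gamma = \frac{1}{2\sigma^2}$ concrete, the sketch below compares a hand-written Gaussian similarity with sklearn.metrics.pairwise.rbf_kernel. The helper gaussian_similarity and the two sample points are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def gaussian_similarity(a, b, sigma):
    """Hypothetical helper: exp(-||a - b||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 1.0]])

results = {}
for sigma in (0.5, 1.0, 2.0):
    gamma = 1.0 / (2 * sigma ** 2)                   # gamma = 1 / (2 sigma^2)
    k_manual = gaussian_similarity(a[0], b[0], sigma)
    k_sklearn = rbf_kernel(a, b, gamma=gamma)[0, 0]  # exp(-gamma * ||a - b||^2)
    results[sigma] = (gamma, k_manual, k_sklearn)
    print(f"sigma={sigma}, gamma={gamma:.3f}, manual={k_manual:.4f}, sklearn={k_sklearn:.4f}")
```

A smaller sigma (larger gamma) gives a smaller similarity between the same two points, which is why each training instance influences a narrower region.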

1. Support Vector Machine Classification

1.1 Linear SVM Classification

sklearn.svm.LinearSVC

Parameters:

C: float, optional (default=1.0)

【Penalty parameter, 1 by default; the larger C is, the narrower the margin and the fewer instances fall inside it】

loss: string, ‘hinge’ or ‘squared_hinge’ (default=’squared_hinge’)

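【Set loss to 'hinge' explicitly to get the standard SVM hinge loss, since it is not the default】

dual: bool, (default=True)

【True by default; set it to False unless there are more features than training instances】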

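See the official documentation for the other parameters.

The LinearSVC class regularizes the bias term, so the training set should be centered first by subtracting its mean. StandardScaler does this automatically.

LinearSVC() plays the same role as SVC(kernel='linear'), but the SVC version is much slower, especially on large training sets.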

import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # Iris-Virginica

svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("linear_svc", LinearSVC(C=1, loss="hinge", random_state=42)),
    ])

svm_clf.fit(X, y)

Pipeline(memory=None,
         steps=[('scaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('linear_svc',
                 LinearSVC(C=1, class_weight=None, dual=True,
                           fit_intercept=True, intercept_scaling=1,
                           loss='hinge', max_iter=1000, multi_class='ovr',
                           penalty='l2', random_state=42, tol=0.0001,
                           verbose=0))],
         verbose=False)

svm_clf.predict([[5.5, 1.7]])

array([1.])

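Unlike logistic regression classifiers, SVM classifiers do not output a probability for each class by default; they output a decision score instead (SVC can estimate probabilities when it is created with probability=True, at extra training cost).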
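What the classifier does expose is a decision score. The sketch below rebuilds the same kind of pipeline on the iris petal features and reads the score through decision_function; the max_iter value and the variable names are assumptions made for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]            # petal length, petal width
y = (iris["target"] == 2).astype(int)  # 1 if Iris-Virginica, else 0

score_clf = Pipeline([
    ("scaler", StandardScaler()),
    # max_iter raised (an assumption) to give the solver room to converge
    ("linear_svc", LinearSVC(C=1, loss="hinge", max_iter=10000, random_state=42)),
])
score_clf.fit(X, y)

sample = [[5.5, 1.7]]
scores = score_clf.decision_function(sample)  # signed score, not a probability
label = score_clf.predict(sample)
print(scores, label)
```

A positive score corresponds to the positive class and a negative score to the negative class; the magnitude grows with the distance from the decision boundary.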

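1.2 Nonlinear SVM Classification

Although linear SVM classifiers are efficient and work surprisingly well in many cases, many datasets are not linearly separable. One way to handle such data is to add features, for example polynomial features, so that the transformed data become linearly separable, as the figure below shows:

Figure 2: Adding polynomial features to make nonlinear data linearly separable

There are two ways to implement this idea in Sklearn. The first is to apply a polynomial transformation, scale the features, and then fall back on the linear LinearSVC classifier. The second is to use the SVC classifier directly with a polynomial kernel.

Let's look at the first approach and test it on the moons dataset: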

from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

def plot_dataset(X, y, axes):
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs")
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "g^")
    plt.axis(axes)
    plt.grid(True, which='both')
    plt.xlabel(r"$x_1$", fontsize=20)
    plt.ylabel(r"$x_2$", fontsize=20, rotation=0)

plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.show()

from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

polynomial_svm_clf = Pipeline([
        ("poly_features", PolynomialFeatures(degree=3)),
        ("scaler", StandardScaler()),
        ("svm_clf", LinearSVC(C=10, loss="hinge", random_state=42))
    ])

polynomial_svm_clf.fit(X, y)

Pipeline(memory=None,
         steps=[('poly_features',
                 PolynomialFeatures(degree=3, include_bias=True,
                                    interaction_only=False, order='C')),
                ('scaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('svm_clf',
                 LinearSVC(C=10, class_weight=None, dual=True,
                           fit_intercept=True, intercept_scaling=1,
                           loss='hinge', max_iter=1000, multi_class='ovr',
                           penalty='l2', random_state=42, tol=0.0001,
                           verbose=0))],
         verbose=False)

def plot_predictions(clf, axes):
    x0s = np.linspace(axes[0], axes[1], 100)
    x1s = np.linspace(axes[2], axes[3], 100)
    x0, x1 = np.meshgrid(x0s, x1s)
    X = np.c_[x0.ravel(), x1.ravel()]
    y_pred = clf.predict(X).reshape(x0.shape)
    y_decision = clf.decision_function(X).reshape(x0.shape)
    plt.contourf(x0, x1, y_pred, cmap=plt.cm.brg, alpha=0.2)
    plt.contourf(x0, x1, y_decision, cmap=plt.cm.brg, alpha=0.1)

plot_predictions(polynomial_svm_clf, [-1.5, 2.5, -1, 1.5])
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])

plt.show()

Figure 3: Linear SVM classifier using polynomial features

The other approach is to use the SVC class.

sklearn.svm.SVC

Parameters:

C: float, optional (default=1.0)

Penalty parameter C of the error term.

kernel: string, optional (default=’rbf’)

Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape (n_samples, n_samples).

degree: int, optional (default=3)

Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.

gamma: {‘scale’, ‘auto’} or float, optional (default=’scale’)

Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.

if gamma='scale' (default) is passed then it uses 1 / (n_features * X.var()) as value of gamma,

if ‘auto’, uses 1 / n_features. Changed in version 0.22: The default value of gamma changed from ‘auto’ to ‘scale’.

coef0: float, optional (default=0.0)

Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.

【Controls how much the model is influenced by high-degree versus low-degree polynomial terms】

See the official documentation for the other parameters.
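The two string options for gamma can be reproduced by hand from the formulas quoted above. This is only a sketch of the arithmetic on the moons data, not a reading of the library internals; the variable names are made up.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)
n_features = X.shape[1]

gamma_scale = 1.0 / (n_features * X.var())  # formula quoted for gamma='scale'
gamma_auto = 1.0 / n_features               # formula quoted for gamma='auto'
print(gamma_scale, gamma_auto)

# After standardization every feature has zero mean and unit variance,
# so X.var() over the whole array is 1 and the two formulas coincide.
X_std = StandardScaler().fit_transform(X)
gamma_scale_std = 1.0 / (n_features * X_std.var())
print(gamma_scale_std)
```

This suggests that when a StandardScaler sits in front of the SVC, the choice between 'scale' and 'auto' makes little practical difference.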

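A common way to find good hyperparameter values is grid search. It is usually faster to run a coarse grid search first and then a finer grid search around the best values found.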
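The coarse-then-fine strategy might look like the sketch below, which uses GridSearchCV on the moons data. The grid values, the factor of 3 used for the finer grid, and cv=5 are arbitrary choices for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf")),
])

# Step 1: coarse grid spanning several orders of magnitude.
coarse_grid = {"svm_clf__C": [0.01, 1, 100], "svm_clf__gamma": [0.01, 1, 100]}
coarse = GridSearchCV(pipe, coarse_grid, cv=5)
coarse.fit(X, y)
best_C = coarse.best_params_["svm_clf__C"]
best_gamma = coarse.best_params_["svm_clf__gamma"]

# Step 2: finer grid centered on the best coarse values.
fine_grid = {
    "svm_clf__C": [best_C / 3, best_C, best_C * 3],
    "svm_clf__gamma": [best_gamma / 3, best_gamma, best_gamma * 3],
}
fine = GridSearchCV(pipe, fine_grid, cv=5)
fine.fit(X, y)

print(coarse.best_params_, coarse.best_score_)
print(fine.best_params_, fine.best_score_)
```

Because the fine grid contains the best coarse point and the cross-validation splits are identical, the second search cannot score worse than the first.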

1.2.1 Polynomial Kernel

Use SVC(kernel="poly", degree=3) for SVM classification with a nonlinear polynomial kernel:

from sklearn.svm import SVC
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

poly_kernel_svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
    ])
poly_kernel_svm_clf.fit(X, y)

Pipeline(memory=None,
         steps=[('scaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('svm_clf',
                 SVC(C=5, cache_size=200, class_weight=None, coef0=1,
                     decision_function_shape='ovr', degree=3,
                     gamma='auto_deprecated', kernel='poly', max_iter=-1,
                     probability=False, random_state=None, shrinking=True,
                     tol=0.001, verbose=False))],
         verbose=False)

poly100_kernel_svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="poly", degree=10, coef0=100, C=5))
    ])
poly100_kernel_svm_clf.fit(X, y)

Pipeline(memory=None,
         steps=[('scaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('svm_clf',
                 SVC(C=5, cache_size=200, class_weight=None, coef0=100,
                     decision_function_shape='ovr', degree=10,
                     gamma='auto_deprecated', kernel='poly', max_iter=-1,
                     probability=False, random_state=None, shrinking=True,
                     tol=0.001, verbose=False))],
         verbose=False)

plt.figure(figsize=(11, 4))

plt.subplot(121)
plot_predictions(poly_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.title(r"$d=3, r=1, C=5$", fontsize=18)

plt.subplot(122)
plot_predictions(poly100_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.title(r"$d=10, r=100, C=5$", fontsize=18)

plt.show()

Figure 4: SVM classifiers with a polynomial kernel

1.2.2 Gaussian RBF Kernel

Use SVC(kernel='rbf', gamma=5, C=0.001) to classify nonlinear data:

from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

rbf_kernel_svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="rbf", gamma=5, C=0.001))
    ])
rbf_kernel_svm_clf.fit(X, y)

Pipeline(memory=None,
         steps=[('scaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('svm_clf',
                 SVC(C=0.001, cache_size=200, class_weight=None, coef0=0.0,
                     decision_function_shape='ovr', degree=3, gamma=5,
                     kernel='rbf', max_iter=-1, probability=False,
                     random_state=None, shrinking=True, tol=0.001,
                     verbose=False))],
         verbose=False)

A simple manual grid search over gamma and C:

from sklearn.svm import SVC

gamma1, gamma2 = 0.1, 5
C1, C2 = 0.001, 1000
hyperparams = (gamma1, C1), (gamma1, C2), (gamma2, C1), (gamma2, C2)

svm_clfs = []
for gamma, C in hyperparams:
    rbf_kernel_svm_clf = Pipeline([
            ("scaler", StandardScaler()),
            ("svm_clf", SVC(kernel="rbf", gamma=gamma, C=C))
        ])
    rbf_kernel_svm_clf.fit(X, y)
    svm_clfs.append(rbf_kernel_svm_clf)

plt.figure(figsize=(11, 7))

for i, svm_clf in enumerate(svm_clfs):
    plt.subplot(221 + i)
    plot_predictions(svm_clf, [-1.5, 2.5, -1, 1.5])
    plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
    gamma, C = hyperparams[i]
    plt.title(r"$\gamma = {}, C = {}$".format(gamma, C), fontsize=16)

plt.show()

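Figure 5: SVM classifiers with a Gaussian RBF kernel

2. Support Vector Machine Regression

The SVM algorithm is quite versatile: it supports not only linear and nonlinear classification but also linear and nonlinear regression. The trick is to reverse the objective: instead of trying to fit the widest possible margin between two classes, SVM regression tries to fit as many instances as possible inside the margin while limiting margin violations (instances outside the margin). The width of the margin is controlled by the hyperparameter $\varepsilon$.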
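The role of $\varepsilon$ can be sketched with the epsilon-insensitive loss, which is zero for points inside the margin and grows linearly outside it. The function name and the sample values below are made up for illustration.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample loss max(0, |y_true - y_pred| - epsilon)."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.2, 2.0, 4.5, 2.0])

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5)
print(losses)  # the first two points lie inside the margin, so their loss is 0
```

Only the last two points, whose errors of 1.5 and 2.0 exceed epsilon, contribute to the loss (1.0 and 1.5 respectively).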

2.1 Linear SVM Regression

sklearn.svm.LinearSVR (the training data should be scaled and centered first)

Parameters:

epsilon: float, optional (default=0.0)

Epsilon parameter in the epsilon-insensitive loss function. Note that the value of this parameter depends on the scale of the target variable y. If unsure, set epsilon=0.
【Width of the margin】

tol: float, optional (default=1e-4)

Tolerance for stopping criteria.

C: float, optional (default=1.0)

Penalty parameter C of the error term. The penalty is a squared l2 penalty. The bigger this parameter, the less regularization is used.

loss: string, optional (default=’epsilon_insensitive’)

Specifies the loss function. The epsilon-insensitive loss (standard SVR) is the L1 loss, while the squared epsilon-insensitive loss (‘squared_epsilon_insensitive’) is the L2 loss.

dual: bool, (default=True)

Select the algorithm to either solve the dual or primal optimization problem. Prefer dual=False when n_samples > n_features.

from sklearn.svm import LinearSVR

linear_svm_reg = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_reg", LinearSVR(epsilon=1.5))
    ])
linear_svm_reg.fit(X, y)  # X, y: regression data, e.g. the random linear data generated in the plotting code below

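The figure below shows two linear SVM regression models trained on random linear data, one with a large margin ($\varepsilon=1.5$) and one with a small margin ($\varepsilon=0.5$). The training data should be scaled and centered first.

Figure 6: SVM regression

Plotting code: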

np.random.seed(42)
m = 50
X = 2 * np.random.rand(m, 1)
y = (4 + 3 * X + np.random.randn(m, 1)).ravel()

from sklearn.svm import LinearSVR

svm_reg = LinearSVR(epsilon=1.5, random_state=42)
svm_reg.fit(X, y)

LinearSVR(C=1.0, dual=True, epsilon=1.5, fit_intercept=True,
          intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=1000,
          random_state=42, tol=0.0001, verbose=0)

svm_reg1 = LinearSVR(epsilon=1.5, random_state=42)
svm_reg2 = LinearSVR(epsilon=0.5, random_state=42)
svm_reg1.fit(X, y)
svm_reg2.fit(X, y)

def find_support_vectors(svm_reg, X, y):
    y_pred = svm_reg.predict(X)
    off_margin = (np.abs(y - y_pred) >= svm_reg.epsilon)
    return np.argwhere(off_margin)

svm_reg1.support_ = find_support_vectors(svm_reg1, X, y)
svm_reg2.support_ = find_support_vectors(svm_reg2, X, y)

eps_x1 = 1
eps_y_pred = svm_reg1.predict([[eps_x1]])

def plot_svm_regression(svm_reg, X, y, axes):
    x1s = np.linspace(axes[0], axes[1], 100).reshape(100, 1)
    y_pred = svm_reg.predict(x1s)
    plt.plot(x1s, y_pred, "k-", linewidth=2, label=r"$\hat{y}$")
    plt.plot(x1s, y_pred + svm_reg.epsilon, "k--")
    plt.plot(x1s, y_pred - svm_reg.epsilon, "k--")
    plt.scatter(X[svm_reg.support_], y[svm_reg.support_], s=180, facecolors='#FFAAAA')
    plt.plot(X, y, "bo")
    plt.xlabel(r"$x_1$", fontsize=18)
    plt.legend(loc="upper left", fontsize=18)
    plt.axis(axes)

plt.figure(figsize=(9, 4))
plt.subplot(121)
plot_svm_regression(svm_reg1, X, y, [0, 2, 3, 11])
plt.title(r"$\epsilon = {}$".format(svm_reg1.epsilon), fontsize=18)
plt.ylabel(r"$y$", fontsize=18, rotation=0)
#plt.plot([eps_x1, eps_x1], [eps_y_pred, eps_y_pred - svm_reg1.epsilon], "k-", linewidth=2)
plt.annotate(
        '', xy=(eps_x1, eps_y_pred), xycoords='data',
        xytext=(eps_x1, eps_y_pred - svm_reg1.epsilon),
        textcoords='data', arrowprops={'arrowstyle': '<->', 'linewidth': 1.5}
    )
plt.text(0.91, 5.6, r"$\epsilon$", fontsize=20)
plt.subplot(122)
plot_svm_regression(svm_reg2, X, y, [0, 2, 3, 11])
plt.title(r"$\epsilon = {}$".format(svm_reg2.epsilon), fontsize=18)
plt.show()

2.2 Nonlinear SVM Regression

sklearn.svm.SVR

Parameters:

kernel: string, optional (default=’rbf’)

Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to precompute the kernel matrix.

degree: int, optional (default=3)

Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.

gamma: {‘scale’, ‘auto’} or float, optional (default=’scale’)

Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.

if gamma='scale' (default) is passed then it uses 1 / (n_features * X.var()) as value of gamma,

if ‘auto’, uses 1 / n_features.

Changed in version 0.22: The default value of gamma changed from ‘auto’ to ‘scale’.

coef0: float, optional (default=0.0)

Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.

tol: float, optional (default=1e-3)

Tolerance for stopping criterion.

C: float, optional (default=1.0)

Penalty parameter C of the error term.

epsilon: float, optional (default=0.1)

Epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value.
【Points whose prediction lies within a distance epsilon of the actual value incur no penalty in the training loss】
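One way to see the effect of the epsilon-tube is to count support vectors for two tube widths. The sketch below uses noisy quadratic data generated for this example; the two epsilon values and the dictionary name are arbitrary choices.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy quadratic data generated for this illustration.
rng = np.random.RandomState(42)
X = 2 * rng.rand(100, 1) - 1
y = (0.2 + 0.1 * X + 0.5 * X**2 + rng.randn(100, 1) / 10).ravel()

n_support = {}
for eps in (0.05, 0.3):
    reg = SVR(kernel="poly", degree=2, C=100, epsilon=eps, gamma="auto")
    reg.fit(X, y)
    residuals = np.abs(y - reg.predict(X))
    n_outside = int(np.sum(residuals > eps))  # points strictly outside the tube
    n_support[eps] = len(reg.support_)
    print(f"epsilon={eps}: {n_support[eps]} support vectors, "
          f"{n_outside} points outside the tube")
```

A wider tube leaves fewer points on or outside its boundary, so the model ends up with fewer support vectors.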

2.2.1 Polynomial Kernel

from sklearn.svm import SVR

svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1, gamma="auto")
svm_poly_reg.fit(X, y)

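The figure below shows SVM regression with different values of the penalty coefficient C:

Figure 7: SVM regression with different penalty coefficients

The code is as follows: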

np.random.seed(42)
m = 100
X = 2 * np.random.rand(m, 1) - 1
y = (0.2 + 0.1 * X + 0.5 * X**2 + np.random.randn(m, 1)/10).ravel()

Set different amounts of regularization (values of C):

from sklearn.svm import SVR

svm_poly_reg1 = SVR(kernel="poly", degree=2, C=100, epsilon=0.1, gamma="auto")
svm_poly_reg2 = SVR(kernel="poly", degree=2, C=0.01, epsilon=0.1, gamma="auto")
svm_poly_reg1.fit(X, y)
svm_poly_reg2.fit(X, y)

SVR(C=0.01, cache_size=200, coef0=0.0, degree=2, epsilon=0.1, gamma='auto',
    kernel='poly', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

import matplotlib.pyplot as plt
def plot_svm_regression(svm_reg, X, y, axes):
    x1s = np.linspace(axes[0], axes[1], 100).reshape(100, 1)
    y_pred = svm_reg.predict(x1s)
    plt.plot(x1s, y_pred, "k-", linewidth=2, label=r"$\hat{y}$")
    plt.plot(x1s, y_pred + svm_reg.epsilon, "k--")
    plt.plot(x1s, y_pred - svm_reg.epsilon, "k--")
    plt.scatter(X[svm_reg.support_], y[svm_reg.support_], s=180, facecolors='#FFAAAA')
    plt.plot(X, y, "bo")
    plt.xlabel(r"$x_1$", fontsize=18)
    plt.legend(loc="upper left", fontsize=18)
    plt.axis(axes)

plt.figure(figsize=(9, 4))
plt.subplot(121)
plot_svm_regression(svm_poly_reg1, X, y, [-1, 1, 0, 1])
plt.title(r"$degree={}, C={}, \epsilon = {}$".format(svm_poly_reg1.degree, svm_poly_reg1.C, svm_poly_reg1.epsilon), fontsize=18)
plt.ylabel(r"$y$", fontsize=18, rotation=0)
plt.subplot(122)
plot_svm_regression(svm_poly_reg2, X, y, [-1, 1, 0, 1])
plt.title(r"$degree={}, C={}, \epsilon = {}$".format(svm_poly_reg2.degree, svm_poly_reg2.C, svm_poly_reg2.epsilon), fontsize=18)
plt.show()

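References

[1] Aurelien Geron, 王靜源, 賈瑋, 邊蕤, 邱俊濤. 機器學習實戰：基於 Scikit-Learn 和 TensorFlow (Chinese edition of Hands-On Machine Learning with Scikit-Learn and TensorFlow) [M]. Beijing: 機械工業出版社 (China Machine Press), 2018: 136-144.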

Supervised Learning | SVM: Implementing Support Vector Machines with Sklearn
