What is linear regression
- Supervised learning => the training set is D = {(xi, yi)}, i = 1, ..., N
- The output / predicted value yi is a continuous variable
- We need to learn a mapping f: X → y
- We assume a linear relation between the input x and the output y, i.e. f(x) = wx + b
Test / prediction phase
Given a new x, predict its output f(x) = wx + b
(w and b can be estimated by least squares.)
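In the univariate case the least-squares estimates of w and b have a simple closed form: w is the sample covariance of x and y divided by the variance of x, and b = mean(y) - w * mean(x). A minimal sketch with numpy on made-up synthetic data (the values 2 and 1 are illustrative assumptions, not from the text):

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus noise (illustrative values)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 0.5, size=x.shape)

# Closed-form least squares:
#   w = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   b = mean(y) - w * mean(x)
w = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - w * x.mean()
print(w, b)  # close to the true values 2 and 1
```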
Classification
By the number of independent variables, linear regression falls into two main kinds: simple (one-variable) linear regression and multiple (multi-variable) linear regression.
Simple linear regression has a single independent variable, while multiple linear regression has several. When a straight line is not enough, polynomial regression or curve regression can be used; the polynomial terms are simply treated as extra input variables in a multiple linear regression.
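The two kinds differ only in the shape of X. A quick sketch with sklearn on synthetic, noiseless data (the coefficient values are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simple linear regression: one feature, y = 3x + 0.5
X1 = rng.uniform(0, 1, size=(100, 1))
y1 = 3 * X1[:, 0] + 0.5
uni = LinearRegression().fit(X1, y1)

# Multiple linear regression: three features
X3 = rng.uniform(0, 1, size=(100, 3))
y3 = X3 @ np.array([1.0, -2.0, 0.5]) + 4
multi = LinearRegression().fit(X3, y3)

# Noiseless data, so the fitted coefficients recover the true ones
print(uni.coef_, uni.intercept_)
print(multi.coef_, multi.intercept_)
```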
Example
Linear regression and polynomial regression on the housing-price dataset bundled with sklearn.
Linear regression
from sklearn import datasets
boston = datasets.load_boston()  # load the housing-price data (note: load_boston was removed in scikit-learn 1.2)
X = boston.data
y = boston.target
print(X.shape)
print(y.shape)
Output:
(506, 13)
(506,)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=8)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
# Linear regression (note: the normalize parameter was removed in scikit-learn 1.2)
lr = LinearRegression(normalize=True, n_jobs=2)
scores = cross_val_score(lr, X_train, y_train, cv=10, scoring='neg_mean_squared_error')  # 10-fold CV, negative mean squared error
print(scores.mean())
lr.fit(X_train, y_train)
lr.score(X_test, y_test)
Output:
Polynomial regression
from sklearn.preprocessing import PolynomialFeatures
for k in range(1, 4):
    lr_featurizer = PolynomialFeatures(degree=k)  # generates polynomial features; degree is the highest power
    print('-----%d-----' % k)
    X_pf_train = lr_featurizer.fit_transform(X_train)
    X_pf_test = lr_featurizer.transform(X_test)
    pf_scores = cross_val_score(lr, X_pf_train, y_train, cv=10, scoring='neg_mean_squared_error')
    print(pf_scores.mean())
    lr.fit(X_pf_train, y_train)
    print(lr.score(X_pf_test, y_test))
    print(lr.score(X_pf_train, y_train))
Output:
From the results above: with k=1 this is plain linear regression;
with k=2 it does somewhat better than linear regression;
with k=3 the model overfits (high training score, poor test score).
Dealing with overfitting
Lasso regression
# Regularization to fix the overfitting at k=3
lr_featurizer = PolynomialFeatures(degree=3)  # generates polynomial features; degree is the highest power
X_pf_train = lr_featurizer.fit_transform(X_train)
X_pf_test = lr_featurizer.transform(X_test)
# Lasso regression (linear regression with an L1 penalty)
from sklearn.linear_model import Lasso
for a in [i / 10000 for i in range(0, 6)]:  # alpha from 0 to 0.0005 in steps of 0.0001
    print('----%f-----' % a)
    lasso = Lasso(alpha=a, normalize=True)  # note: normalize was removed in scikit-learn 1.2
    pf_scores = cross_val_score(lasso, X_pf_train, y_train, cv=10, scoring='neg_mean_squared_error')
    print(pf_scores.mean())
    lasso.fit(X_pf_train, y_train)
    print(lasso.score(X_pf_test, y_test))
    print(lasso.score(X_pf_train, y_train))
Output:
From the results above, Lasso (L1) regularization improves the model's score substantially.
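The reason Lasso helps with an exploded degree-3 feature set is that the L1 penalty drives many coefficients exactly to zero, effectively discarding unneeded polynomial terms. A small sketch on synthetic data (feature count and coefficients are illustrative assumptions, not from the housing data):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features actually matter
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
n_zero = int(np.sum(lasso.coef_ == 0))
print(lasso.coef_)
print(n_zero)  # most of the 8 irrelevant coefficients are exactly zero
```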
Ridge regression
# Regularization to fix the overfitting at k=3
lr_featurizer = PolynomialFeatures(degree=3)  # generates polynomial features; degree is the highest power
X_pf_train = lr_featurizer.fit_transform(X_train)
X_pf_test = lr_featurizer.transform(X_test)
from sklearn.linear_model import Ridge
# Ridge regression (linear regression with an L2 penalty)
for a in [0, 0.005]:
    print('----%f-----' % a)
    ridge = Ridge(alpha=a, normalize=True)  # note: normalize was removed in scikit-learn 1.2
    pf_scores = cross_val_score(ridge, X_pf_train, y_train, cv=10, scoring='neg_mean_squared_error')
    print(pf_scores.mean())
    ridge.fit(X_pf_train, y_train)
    print(ridge.score(X_pf_test, y_test))
    print(ridge.score(X_pf_train, y_train))
Output:
From the results above, comparing alpha=0 with alpha=0.005, Ridge regularization improves the model's score substantially.
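Unlike Lasso, Ridge shrinks coefficients toward zero without zeroing them, and it has a closed-form solution: w = (XᵀX + αI)⁻¹Xᵀy. A sketch on synthetic data (values are illustrative; fit_intercept=False so the manual formula matches sklearn exactly):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.1, size=100)

alpha = 0.5
# Closed-form ridge solution: solve (X^T X + alpha*I) w = X^T y
w_manual = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)

# sklearn's Ridge minimizes ||y - Xw||^2 + alpha*||w||^2, the same objective
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
print(w_manual)
print(ridge.coef_)  # matches the manual solution
```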