


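| Feature | Description | Example |
| --- | --- | --- |
| Id | Unique identifier | “1” … “5000” |
| Age | Customer's age | |
| Job | Customer's job | “admin.”, “blue-collar”, etc. |
| Marital | Customer's marital status | “divorced”, “married”, “single” |
| Education | Customer's education level | “primary”, “secondary”, etc. |
| Default | Whether the customer has had a credit default | “yes” - 1, “no” - 0 |
| Balance | Average yearly balance (USD) | |
| HHInsurance | Whether the customer has household insurance | “yes” - 1, “no” - 0 |
| CarLoan | Whether the customer has a car loan | “yes” - 1, “no” - 0 |
| Communication | Contact communication type | “cellular”, “telephone”, “NA” |
| LastContactMonth | Month of the last contact | “jan”, “feb”, etc. |
| LastContactDay | Day of the month of the last contact | |
| CallStart | Start time of the last call (HH:MM:SS) | 12:43:15 |
| CallEnd | End time of the last call (HH:MM:SS) | 12:43:15 |
| NoOfContacts | Number of contacts made to this customer during this campaign | |
| DaysPassed | Days elapsed since the customer was last contacted; -1 means not contacted before | |
| PrevAttempts | Number of contacts made to this customer before this campaign | |
| Outcome | Result of the previous marketing campaign | “failure”, “other”, “success”, “NA” |
| CarInsurance | Whether the customer bought car insurance | “yes” - 1, “no” - 0 |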


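Data wrangling is the process of transforming data from one form into another so that it is easier to understand. In this case the data is supplied as a CSV file, so we load it into a DataFrame using Python's data science libraries. I never expected it to be this simple!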

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import itertools
%matplotlib inline
from sklearn.model_selection import train_test_split,cross_val_score,KFold,cross_val_predict
from sklearn.metrics import accuracy_score, classification_report, precision_score, recall_score,confusion_matrix,precision_recall_curve,roc_curve
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import ExtraTreesClassifier,RandomForestClassifier,AdaBoostClassifier,GradientBoostingClassifier
from sklearn.neighbors  import KNeighborsClassifier
from sklearn import svm,tree
df = pd.read_csv('../data/carInsurance_train.csv',index_col = 'Id')
df.head()
Age Job Marital Education Default Balance HHInsurance CarLoan Communication LastContactDay LastContactMonth NoOfContacts DaysPassed PrevAttempts Outcome CallStart CallEnd CarInsurance
1 32 management single tertiary 0 1218 1 0 telephone 28 jan 2 -1 0 NaN 13:45:20 13:46:30 0
2 32 blue-collar married primary 0 1156 1 0 NaN 26 may 5 -1 0 NaN 14:49:03 14:52:08 0
3 29 management single tertiary 0 637 1 0 cellular 3 jun 1 119 1 failure 16:30:24 16:36:04 1
4 25 student single primary 0 373 1 0 cellular 11 may 2 -1 0 NaN 12:06:43 12:20:22 1
5 30 management married tertiary 0 2694 0 0 cellular 3 jun 1 -1 0 NaN 14:35:44 14:38:56 0
df.shape
(4000, 18)
df.columns
Index(['Age', 'Job', 'Marital', 'Education', 'Default', 'Balance',
       'HHInsurance', 'CarLoan', 'Communication', 'LastContactDay',
       'LastContactMonth', 'NoOfContacts', 'DaysPassed', 'PrevAttempts',
       'Outcome', 'CallStart', 'CallEnd', 'CarInsurance'],
      dtype='object')
df.describe()
Age Default Balance HHInsurance CarLoan LastContactDay NoOfContacts DaysPassed PrevAttempts CarInsurance
count 4000.000000 4000.000000 4000.000000 4000.00000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000 4000.000000
mean 41.214750 0.014500 1532.937250 0.49275 0.133000 15.721250 2.607250 48.706500 0.717500 0.401000
std 11.550194 0.119555 3511.452489 0.50001 0.339617 8.425307 3.064204 106.685385 2.078647 0.490162
min 18.000000 0.000000 -3058.000000 0.00000 0.000000 1.000000 1.000000 -1.000000 0.000000 0.000000
25% 32.000000 0.000000 111.000000 0.00000 0.000000 8.000000 1.000000 -1.000000 0.000000 0.000000
50% 39.000000 0.000000 551.500000 0.00000 0.000000 16.000000 2.000000 -1.000000 0.000000 0.000000
75% 49.000000 0.000000 1619.000000 1.00000 0.000000 22.000000 3.000000 -1.000000 0.000000 1.000000
max 95.000000 1.000000 98417.000000 1.00000 1.000000 31.000000 43.000000 854.000000 58.000000 1.000000
df.dtypes
Age                  int64
Job                 object
Marital             object
Education           object
Default              int64
Balance              int64
HHInsurance          int64
CarLoan              int64
Communication       object
LastContactDay       int64
LastContactMonth    object
NoOfContacts         int64
DaysPassed           int64
PrevAttempts         int64
Outcome             object
CallStart           object
CallEnd             object
CarInsurance         int64
dtype: object


df.describe(include=['O'])
Job Marital Education Communication LastContactMonth Outcome CallStart CallEnd
count 3981 4000 3831 3098 4000 958 4000 4000
unique 11 3 3 2 12 3 3777 3764
top management married secondary cellular may failure 15:27:56 10:22:30
freq 893 2304 1988 2831 1049 437 3 3







df[df['Balance'] == 98417]
Age Job Marital Education Default Balance HHInsurance CarLoan Communication LastContactDay LastContactMonth NoOfContacts DaysPassed PrevAttempts Outcome CallStart CallEnd CarInsurance
1743 59 management married tertiary 0 98417 0 0 telephone 20 nov 5 -1 0 NaN 10:51:42 10:54:07 0
# Drop the Balance outlier by its Id label rather than by position
df_new = df.drop(index=1743)


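Missing values are a major problem in data analysis, and handling them is another hurdle. pandas represents missing data as NaN and leaves it out of calculations and plots. Likewise, a predictive model cannot be built until the missing values have been dealt with. In our case, missing values occur mainly in the Outcome and Communication fields. Job and Education also have some missing values.

Job and Education have very few missing values, so they can be imputed with pandas' backfill or forward-fill (pad) methods. Outcome and Communication have many missing values, so their NaN values are replaced with the string 'none', which treats "missing" as a category of its own.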
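The two strategies can be tried on a small frame first. The sketch below uses made-up values (the `toy` frame is hypothetical, not taken from the dataset). It also shows one caveat: forward fill cannot fill a gap in the very first row, so a backfill pass is chained as a fallback.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from carInsurance_train.csv
toy = pd.DataFrame({
    "Job": [np.nan, "admin.", np.nan, "student"],
    "Outcome": ["failure", np.nan, np.nan, "success"],
})

# Forward fill copies the previous non-missing value downward.
# The first row has no predecessor, so a backfill handles that case.
toy["Job"] = toy["Job"].ffill().bfill()

# A frequent gap is kept as its own category instead of being guessed.
toy["Outcome"] = toy["Outcome"].fillna("none")

print(toy)
```

Whether a leading gap actually occurs in the real columns is worth checking with `isnull().sum()` after the fill.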

fillna: https://blog.csdn.net/weixin_39549734/article/details/81221276

df_new.isnull().sum()
Age                    0
Job                   19
Marital                0
Education            169
Default                0
Balance                0
HHInsurance            0
CarLoan                0
Communication        902
LastContactDay         0
LastContactMonth       0
NoOfContacts           0
DaysPassed             0
PrevAttempts           0
Outcome             3041
CallStart              0
CallEnd                0
CarInsurance           0
dtype: int64
# ffill() (formerly fillna(method='pad')) fills each missing value with the previous non-missing value
df_new['Job'] = df_new['Job'].ffill()
df_new['Education'] = df_new['Education'].ffill()
# Outcome and Communication have many gaps, so "missing" becomes its own category
df_new['Communication'] = df_new['Communication'].fillna('none')
df_new['Outcome'] = df_new['Outcome'].fillna('none')
df_new['Outcome'].value_counts()
none       3041
failure     437
success     326
other       195
Name: Outcome, dtype: int64


df_new.isnull().sum()
Age                 0
Job                 0
Marital             0
Education           0
Default             0
Balance             0
HHInsurance         0
CarLoan             0
Communication       0
LastContactDay      0
LastContactMonth    0
NoOfContacts        0
DaysPassed          0
PrevAttempts        0
Outcome             0
CallStart           0
CallEnd             0
CarInsurance        0
dtype: int64


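Correlation measures the relationship between two variables. It ranges from -1 to 1: a value of 1 means a perfect positive linear relationship, 0 means no linear relationship, and -1 means a perfect negative linear relationship. Let's use a heatmap to see how the attributes relate to one another. There does not seem to be much correlation between the variables, although DaysPassed and PrevAttempts are positively correlated.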
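As a worked illustration of the three reference values, the sketch below computes the Pearson coefficient (the default used by `DataFrame.corr`) by hand on made-up arrays. The helper `pearson` and the data are hypothetical and only serve the example.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def pearson(a, b):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float((a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum()))

r_pos = pearson(x, 2 * x + 1)       # perfectly increasing line
r_neg = pearson(x, -3 * x + 10)     # perfectly decreasing line
r_zero = pearson(x, np.array([2.0, 1.0, 0.0, 1.0, 2.0]))  # symmetric "V" shape

print(r_pos, r_neg, r_zero)
```

The third case is a reminder that a coefficient near 0 only rules out a linear relationship: the "V" shape depends strongly on `x` yet has no linear correlation with it.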

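# numeric_only=True skips the string columns; np.bool was removed from NumPy, so use bool
corr = df_new.corr(numeric_only=True)
mask = np.zeros_like(corr, dtype=bool)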
mask[np.triu_indices_from(mask)] = True
f, ax = plt.subplots(figsize=(11, 9))
cmap = sns.diverging_palette(220, 10, as_cmap=True)
sns.heatmap(corr,annot=True, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5});



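Visualization is an important part of data science; without it, results are hard to grasp quickly. The numbers in a table are exact, but scanning the details and drawing conclusions from them is painful. Charts make these tasks much easier for non-technical readers, and executives and managers prefer visual reports because they help them make complex decisions. Below is a pairplot, which plots the fields of interest against each other in pairs. The pairplot variables were chosen from the heatmap as those likely to influence the outcome.

**Key takeaways from the pairplot**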

df_sub = ['Age','Balance','HHInsurance', 'CarLoan','NoOfContacts','DaysPassed','PrevAttempts','CarInsurance']  # all of these are numeric variables
sns.pairplot(df_new[df_sub],hue='CarInsurance',height=1.5);   # note that df_sub includes the target CarInsurance; size= was renamed to height=



g = sns.PairGrid(df_new,
                 x_vars=["Education","Marital", "Job"],
                 y_vars=["CarInsurance", "Balance"],
                 aspect=.75, height=6)
g.map(sns.barplot, palette="pastel");








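Feature engineering is a fundamental part of a machine learning problem. Our data contains continuous variables, such as Age and Balance, that need to be binned. The quantile-cut function pd.qcut is used to split the continuous Age and Balance variables into 5 equal-frequency bins (quintiles).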


df_new['AgeBinned'] = pd.qcut(df_new['Age'], 5 , labels = False)
df_new['BalanceBinned'] = pd.qcut(df_new['Balance'], 5,labels = False)
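To see what quantile binning does compared with fixed-width binning, here is a small sketch on ten made-up ages (the `ages` series is hypothetical). `pd.qcut` chooses edges so that each bin holds the same number of rows, while `pd.cut` splits the value range into equal widths.

```python
import pandas as pd

# Hypothetical ages with one large value at the end
ages = pd.Series([18, 22, 25, 31, 35, 39, 44, 52, 60, 95])

quantile_bins = pd.qcut(ages, 5, labels=False)  # equal-frequency bins
width_bins = pd.cut(ages, 5, labels=False)      # equal-width bins

print(quantile_bins.value_counts().sort_index().tolist())
print(width_bins.value_counts().sort_index().tolist())
```

With `qcut` every bin receives two of the ten values, whereas the equal-width bins bunch up at the low end because the single large age stretches the range. That robustness to skew is one reason quantile bins suit a column like Balance.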

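CallStart and CallEnd pose a problem of their own: they are stored as object (string) columns. Converting them to datetime and subtracting one from the other gives the actual call duration, CallTime, which can then be binned as above.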

df_new['CallStart'] = pd.to_datetime(df_new['CallStart'] )
df_new['CallEnd'] = pd.to_datetime(df_new['CallEnd'] )

df_new['CallTime'] = (df_new['CallEnd'] - df_new['CallStart']).dt.total_seconds()

df_new['CallTimeBinned'] = pd.qcut(df_new['CallTime'], 5,labels = False)
df_new.drop(['Age','Balance','CallStart','CallEnd','CallTime'],axis = 1,inplace = True)
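The duration calculation can be checked by hand on the first two rows of the data shown earlier. This is a minimal sketch: passing an explicit `format` is my addition (it avoids format inference), and it assumes no call runs past midnight.

```python
import pandas as pd

# CallStart / CallEnd values of Ids 1 and 2 from the df.head() output
calls = pd.DataFrame({
    "CallStart": ["13:45:20", "14:49:03"],
    "CallEnd":   ["13:46:30", "14:52:08"],
})

fmt = "%H:%M:%S"
start = pd.to_datetime(calls["CallStart"], format=fmt)
end = pd.to_datetime(calls["CallEnd"], format=fmt)

# Subtracting two datetime columns gives a timedelta; convert it to seconds
calls["CallTime"] = (end - start).dt.total_seconds()
print(calls)
```

The two calls last 70 and 185 seconds respectively, which matches subtracting the timestamps by hand.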



Job = pd.get_dummies(data = df_new['Job'],prefix = "Job")
Marital= pd.get_dummies(data = df_new['Marital'],prefix = "Marital")
Education= pd.get_dummies(data = df_new['Education'],prefix="Education")
Communication = pd.get_dummies(data = df_new['Communication'],prefix = "Communication")
LastContactMonth = pd.get_dummies(data = df_new['LastContactMonth'],prefix= "LastContactMonth")
Outcome = pd.get_dummies(data = df_new['Outcome'],prefix = "Outcome")
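df = pd.concat([df_new,Job,Marital,Education,Communication,LastContactMonth,Outcome],axis=1)
# Drop the original categorical columns now that they are one-hot encoded
df.drop(['Job','Marital','Education','Communication','LastContactMonth','Outcome'],axis=1,inplace=True)
df.columns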
Index(['Default', 'HHInsurance', 'CarLoan', 'LastContactDay', 'NoOfContacts',
       'DaysPassed', 'PrevAttempts', 'CarInsurance', 'AgeBinned',
       'BalanceBinned', 'CallTimeBinned', 'Job_admin.', 'Job_blue-collar',
       'Job_entrepreneur', 'Job_housemaid', 'Job_management', 'Job_retired',
       'Job_self-employed', 'Job_services', 'Job_student', 'Job_technician',
       'Job_unemployed', 'Marital_divorced', 'Marital_married',
       'Marital_single', 'Education_primary', 'Education_secondary',
       'Education_tertiary', 'Communication_cellular', 'Communication_none',
       'Communication_telephone', 'LastContactMonth_apr',
       'LastContactMonth_aug', 'LastContactMonth_dec', 'LastContactMonth_feb',
       'LastContactMonth_jan', 'LastContactMonth_jul', 'LastContactMonth_jun',
       'LastContactMonth_mar', 'LastContactMonth_may', 'LastContactMonth_nov',
       'LastContactMonth_oct', 'LastContactMonth_sep', 'Outcome_failure',
       'Outcome_none', 'Outcome_other', 'Outcome_success'],
      dtype='object')



X = df.drop(['CarInsurance'],axis=1).values
y = df['CarInsurance'].values
# stratify=y keeps the class proportions of y the same in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42, stratify=y)
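The effect of `stratify` is easiest to see on synthetic labels. The sketch below uses hypothetical arrays `X_toy` and `y_toy` with a 60/40 class split and checks that both parts keep that ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 60 negatives and 40 positives
y_toy = np.array([0] * 60 + [1] * 40)
X_toy = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.20, random_state=42, stratify=y_toy
)

# Share of positives in each part
print(y_tr.mean(), y_te.mean())
```

Both parts contain 40% positives. Without `stratify`, the shares would vary with the random seed, which matters more as the test set gets smaller.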


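**Predictive models**

scikit-learn bundles many classification algorithms, and in this case study we use most of those relevant to the problem. Our classifiers are:

  1. kNN
  2. Logistic Regression
  3. SVM
  4. Decision Tree
  5. Random Forest
  6. AdaBoost
  7. XGBoost (implemented in the code below with scikit-learn's GradientBoostingClassifier rather than the xgboost library)

**Cross validation**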

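Cross validation repeatedly splits the data into training and validation sets to evaluate a model's performance. In KFold, K is the number of folds the data is divided into; in each round one fold is held out for validation and the remaining K-1 folds (here 10-1 = 9) are used for training. Each model's cross-validation score is the mean accuracy over the 10 folds.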
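A quick way to see the shape of a 10-fold split is to count the indices that `KFold` hands out. This is a sketch on 50 hypothetical samples (`X_toy` is made up), not the project data.

```python
import numpy as np
from sklearn.model_selection import KFold

X_toy = np.arange(50).reshape(-1, 1)  # 50 hypothetical samples

kf = KFold(n_splits=10, shuffle=True, random_state=0)

# Each round: 9 folds for training, 1 fold for validation
fold_sizes = [(len(train_idx), len(val_idx)) for train_idx, val_idx in kf.split(X_toy)]
print(fold_sizes)
```

Every round trains on 45 samples and validates on 5, and each sample is used for validation exactly once. Note that `cross_val_score(model, X, y, cv=10)` with a classifier uses the stratified variant of this split by default.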

The best models are **Random Forest** and **XGBoost** (gradient boosting), with test accuracies of 0.86 and 0.85 respectively.

def plot_confusion_matrix(cm, classes,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """Draw the confusion matrix cm as an annotated heatmap."""
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # Use white text on dark cells and black text on light cells
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

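# confusion_matrix orders the classes as 0, 1: 0 = did not buy (Failure), 1 = bought (Success)
class_names = ['Failure','Success']
knn = KNeighborsClassifier(n_neighbors = 6)
knn.fit(X_train, y_train)
print ("kNN Accuracy is %2.2f" % accuracy_score(y_test, knn.predict(X_test)))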

score_knn = cross_val_score(knn, X, y, cv=10).mean()
print("Cross Validation Score = %2.2f" % score_knn)
y_pred= knn.predict(X_test)
print(classification_report(y_test, y_pred))

cm = confusion_matrix(y_test,y_pred)
plot_confusion_matrix(cm, classes=class_names, title='Confusion matrix')
kNN Accuracy is 0.76
Cross Validation Score = 0.75
              precision    recall  f1-score   support

           0       0.75      0.90      0.82       479
           1       0.78      0.55      0.65       321

    accuracy                           0.76       800
   macro avg       0.76      0.72      0.73       800
weighted avg       0.76      0.76      0.75       800


#Logistic Regression Classifier
LR = LogisticRegression()
LR.fit(X_train, y_train)
print ("Logistic Accuracy is %2.2f" % accuracy_score(y_test, LR.predict(X_test)))
score_LR = cross_val_score(LR, X, y, cv=10).mean()
print("Cross Validation Score = %2.2f" % score_LR)
y_pred = LR.predict(X_test)
print(classification_report(y_test, y_pred))
# Confusion matrix for LR
cm = confusion_matrix(y_test,y_pred)
plot_confusion_matrix(cm, classes=class_names, title='Confusion matrix')
Logistic Accuracy is 0.83
Cross Validation Score = 0.81
              precision    recall  f1-score   support

           0       0.85      0.87      0.86       479
           1       0.80      0.78      0.79       321

    accuracy                           0.83       800
   macro avg       0.82      0.82      0.82       800
weighted avg       0.83      0.83      0.83       800


SVM = svm.SVC()
SVM.fit(X_train, y_train)
print ("SVM Accuracy is %2.2f" % accuracy_score(y_test, SVM.predict(X_test)))
score_svm = cross_val_score(SVM, X, y, cv=10).mean()
print("Cross Validation Score = %2.2f" % score_svm)
y_pred = SVM.predict(X_test)
print(classification_report(y_test, y_pred))
cm = confusion_matrix(y_test,y_pred)
plot_confusion_matrix(cm, classes=class_names, title='Confusion matrix')
SVM Accuracy is 0.67
Cross Validation Score = 0.66
              precision    recall  f1-score   support

           0       0.66      0.91      0.77       479
           1       0.70      0.31      0.43       321

    accuracy                           0.67       800
   macro avg       0.68      0.61      0.60       800
weighted avg       0.68      0.67      0.63       800


# Decision Tree Classifier
# Any further arguments were cut off in the source
DT = tree.DecisionTreeClassifier(random_state = 0,class_weight="balanced")
DT = DT.fit(X_train,y_train)
print ("Decision Tree Accuracy is %2.2f" % accuracy_score(y_test, DT.predict(X_test)))
score_DT = cross_val_score(DT, X, y, cv=10).mean()
print("Cross Validation Score = %2.2f" % score_DT)
y_pred = DT.predict(X_test)
print(classification_report(y_test, y_pred))
# Confusion Matrix for Decision Tree
cm = confusion_matrix(y_test,y_pred)
plot_confusion_matrix(cm, classes=class_names, title='Confusion matrix')
Decision Tree Accuracy is 0.82
Cross Validation Score = 0.81
              precision    recall  f1-score   support

           0       0.88      0.81      0.84       479
           1       0.74      0.83      0.79       321

    accuracy                           0.82       800
   macro avg       0.81      0.82      0.81       800
weighted avg       0.82      0.82      0.82       800


#Random Forest Classifier
rfc = RandomForestClassifier(n_estimators=1000, max_depth=None, min_samples_split=10,class_weight="balanced")
rfc.fit(X_train, y_train)
print ("Random Forest Accuracy is %2.2f" % accuracy_score(y_test, rfc.predict(X_test)))
score_rfc = cross_val_score(rfc, X, y, cv=10).mean()
print("Cross Validation Score = %2.2f" % score_rfc)
y_pred = rfc.predict(X_test)
print(classification_report(y_test,y_pred ))
#Confusion Matrix for Random Forest
cm = confusion_matrix(y_test,y_pred)
plot_confusion_matrix(cm, classes=class_names, title='Confusion matrix')
Random Forest Accuracy is 0.86
Cross Validation Score = 0.84
              precision    recall  f1-score   support

           0       0.90      0.86      0.88       479
           1       0.80      0.86      0.83       321

    accuracy                           0.86       800
   macro avg       0.85      0.86      0.85       800
weighted avg       0.86      0.86      0.86       800


#AdaBoost Classifier
ada = AdaBoostClassifier(n_estimators=400, learning_rate=0.1)
ada.fit(X_train, y_train)
print ("AdaBoost Accuracy= %2.2f" % accuracy_score(y_test,ada.predict(X_test)))
score_ada = cross_val_score(ada, X, y, cv=10).mean()
print("Cross Validation Score = %2.2f" % score_ada)
y_pred = ada.predict(X_test)
print(classification_report(y_test,y_pred ))
#Confusion Marix for AdaBoost
cm = confusion_matrix(y_test,y_pred)
plot_confusion_matrix(cm, classes=class_names, title='Confusion matrix')
AdaBoost Accuracy= 0.83
Cross Validation Score = 0.82
              precision    recall  f1-score   support

           0       0.83      0.90      0.86       479
           1       0.82      0.73      0.77       321

    accuracy                           0.83       800
   macro avg       0.83      0.81      0.82       800
weighted avg       0.83      0.83      0.83       800


# Gradient Boosting Classifier (scikit-learn's GradientBoostingClassifier, called XGBoost in the text)
xgb = GradientBoostingClassifier(n_estimators=1000,learning_rate=0.01)
xgb.fit(X_train, y_train)
print ("GradientBoost Accuracy= %2.2f" % accuracy_score(y_test,xgb.predict(X_test)))
score_xgb = cross_val_score(xgb, X, y, cv=10).mean()
print("Cross Validation Score = %2.2f" % score_xgb)
y_pred = xgb.predict(X_test)
print(classification_report(y_test, y_pred))
# Confusion matrix for the Gradient Boosting classifier
cm_xg = confusion_matrix(y_test,y_pred)
plot_confusion_matrix(cm_xg, classes=class_names, title='Confusion matrix')
GradientBoost Accuracy= 0.85
Cross Validation Score = 0.82
              precision    recall  f1-score   support

           0       0.87      0.89      0.88       479
           1       0.82      0.79      0.81       321

    accuracy                           0.85       800
   macro avg       0.84      0.84      0.84       800
weighted avg       0.85      0.85      0.85       800



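ROC curves were plotted for all models. The curves for Gradient Boosting (XGBoost) and Random Forest lie closest to the top-left corner, which indicates that these are the best predictors.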

ROC: https://blog.csdn.net/kMD8d5R/article/details/98552574?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.nonecase&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.nonecase
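How close a curve comes to the top-left corner can be summarized in one number, the area under the curve (AUC). The sketch below uses four made-up labels and scores (`y_true` and `scores` are hypothetical, as are the `toy_` variable names) to show how `roc_curve` and `roc_auc_score` fit together.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical labels and predicted probabilities for the positive class
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# One (FPR, TPR) point per decision threshold
toy_fpr, toy_tpr, toy_thresholds = roc_curve(y_true, scores)

# AUC: 1.0 is a perfect ranking, 0.5 matches the diagonal base rate line
auc_value = roc_auc_score(y_true, scores)
print(toy_fpr, toy_tpr, auc_value)
```

Here the AUC is 0.75 because one of the four positive-negative pairs is ranked in the wrong order. Computing the same score for each classifier would give a numeric companion to the plot.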

#Obtaining False Positive Rate, True Positive Rate and Threshold for all classifiers
fpr, tpr, thresholds = roc_curve(y_test, knn.predict_proba(X_test)[:,1])
LR_fpr, LR_tpr, thresholds = roc_curve(y_test, LR.predict_proba(X_test)[:,1])
#SVM_fpr, SVM_tpr, thresholds = roc_curve(y_test, SVM.predict_proba(X_test)[:,1])
DT_fpr, DT_tpr, thresholds = roc_curve(y_test, DT.predict_proba(X_test)[:,1])
rfc_fpr, rfc_tpr, thresholds = roc_curve(y_test, rfc.predict_proba(X_test)[:,1])
ada_fpr, ada_tpr, thresholds = roc_curve(y_test, ada.predict_proba(X_test)[:,1])
xgb_fpr, xgb_tpr, thresholds = roc_curve(y_test, xgb.predict_proba(X_test)[:,1])
#Plotting ROC curves for all classifiers
plt.plot(fpr, tpr, label='KNN' )
plt.plot(LR_fpr, LR_tpr, label='Logistic Regression')
#plt.plot(SVM_fpr, SVM_tpr, label='SVM')
plt.plot(DT_fpr, DT_tpr, label='Decision Tree')
plt.plot(rfc_fpr, rfc_tpr, label='Random Forest')
plt.plot(ada_fpr, ada_tpr, label='AdaBoost')
plt.plot(xgb_fpr, xgb_tpr, label='GradientBoosting')
# Plot Base Rate ROC
plt.plot([0,1],[0,1],label='Base Rate')

plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Graph')
plt.legend(loc="lower right")



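Important features were identified using models such as logistic regression and tree-based models. Both give a clear picture of which features matter. The figure below shows the most important variables as determined by ExtraTreesClassifier; the top 10 are:

  1. CallTime
  2. LastContactDay
  3. Balance
  4. NoOfContacts
  5. Outcome_success
  6. Age
  7. HHInsurance
  8. Communication_none
  9. DaysPassed
  10. Outcome_none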
modell = LogisticRegression()
rfe = RFE(modell, n_features_to_select=5)
rfe = rfe.fit(X_train,y_train)
# Show the feature ranking (1 = selected)
rfe.ranking_
array([10,  9, 16, 41, 34, 42, 32, 36, 33,  3, 24, 15, 17, 27, 28, 20, 22,
       29,  2, 40, 26, 38, 21, 37, 31, 30, 25, 19,  1, 18, 39,  7, 11, 35,
        4,  5, 23,  1,  6,  8,  1,  1, 13, 12, 14,  1])
model = ExtraTreesClassifier()
model.fit(X_train, y_train)

importances = model.feature_importances_
print(importances)
feat_names = df.drop(['CarInsurance'],axis=1).columns

indices = np.argsort(importances)[::-1]
plt.title("Feature importances")
plt.bar(range(len(indices)), importances[indices], color='lightblue',  align="center")
plt.step(range(len(indices)), np.cumsum(importances[indices]), where='mid', label='Cumulative')
plt.xticks(range(len(indices)), feat_names[indices], rotation='vertical',fontsize=14)
plt.xlim([-1, len(indices)])
[0.00268756 0.0313893  0.01658237 0.06520325 0.04730283 0.01636741
 0.01347456 0.04461972 0.04937946 0.25765298 0.01216092 0.01304826
 0.00579582 0.00493788 0.01324779 0.0097079  0.00640912 0.0091661
 0.00601809 0.01457932 0.00655431 0.01135298 0.01605904 0.01331409
 0.00989652 0.01628276 0.01403667 0.01703035 0.02387476 0.00606056
 0.01792573 0.01598123 0.00329203 0.0090054  0.00783392 0.01389945
 0.01580459 0.01121934 0.01654914 0.00965812 0.01079382 0.00951623
 0.00932224 0.02123694 0.00591377 0.04785536]
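A raw importance array is hard to read without the matching names. The sketch below shows one way to turn such an array into a ranked top-k list. The names are borrowed from the column list above, but the importance values are invented for the example and are not the model's output.

```python
import numpy as np

# Made-up importances for four columns; the values are illustrative only
feat_names_toy = np.array(["CallTimeBinned", "LastContactDay", "Default", "Outcome_success"])
importances_toy = np.array([0.40, 0.25, 0.05, 0.30])

# argsort is ascending, so reverse it to rank from most to least important
order = np.argsort(importances_toy)[::-1]

top_k = 3
ranked = [(str(name), float(score))
          for name, score in zip(feat_names_toy[order][:top_k], importances_toy[order][:top_k])]

# Cumulative share of total importance covered by the ranked features
cumulative = np.cumsum(importances_toy[order])

for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Applied to the real arrays, the same pattern would use `importances` and `feat_names` from the ExtraTreesClassifier cell in place of the toy inputs.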


rfc = RandomForestClassifier(n_estimators=1000, max_depth=None, min_samples_split=10,class_weight="balanced")
y_proba = cross_val_predict(rfc, X, y, cv=10, n_jobs=-1, method='predict_proba')
results = pd.DataFrame({'y': y, 'y_proba': y_proba[:,1]})
results = results.sort_values(by='y_proba', ascending=False).reset_index(drop=True)
results.index = results.index + 1
results.index = results.index / len(results.index) * 100
pred = results
pred['Lift Curve'] = pred.y.cumsum() / pred.y.sum() * 100
pred['Baseline'] = pred.index
base_rate = y.sum() / len(y) * 100
pred[['Lift Curve', 'Baseline']].plot(style=['-', '--', '--'])
pd.Series(data=[0, 100, 100], index=[0, base_rate, 100]).plot(style='--')
plt.title('Cumulative Gains')
plt.xlabel('% of Customers Contacted')
plt.ylabel("% of Positive Results")
plt.legend(['Lift Curve', 'Baseline', 'Ideal']);



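**1. Train call-centre staff in interpersonal skills so that they are friendlier and more engaging on calls.**

**2. Maintain a tracker that reminds representatives of follow-ups, so that they can speak to the person again and try to persuade them.**

**3. Target people with good credit scores and account balances, so that the time spent on them is worthwhile.**

**4. Focus on older people, aged over 40, since past data suggests they are easier to win over to a new plan.**

**5. Contact people who responded in the previous campaign, since they are more likely to buy insurance.**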

