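[Messing around] What a few hours of crash-course Python + ML got me in a certain model-training ("alchemy") competition. Date: 2018/8/14 (my first time writing Python and it was a mess; at least the food and drinks were good)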

#from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
#rtrain, rtest = train_test_split(rentDf, test_size = 0.3)
#strain, stest = train_test_split(soldDf, test_size = 0.3)
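# Use .loc for conditional assignment; df[mask].col = ... only modifies a temporary copy.
strain.loc[strain.bedroom_cnt > 3, 'bedroom_cnt'] = 4
# Blend build_area with inside_area (assumes 'insize_area' was a typo for 'inside_area').
rtrain.loc[rtrain.build_area > 0, 'build_area'] = rtrain.build_area * 2/3 + rtrain.inside_area * 1/3
strain.loc[strain.build_area > 0, 'build_area'] = strain.build_area * 1/2 + strain.inside_area * 1/2
# Price per unit area plus a bedroom term. Assumption: 'rtrain.bedroom * 4/rtrain' and
# 'total_price / total_price' were mistakes; the expressions below are a best guess at the intent.
strain.loc[strain.total_price > 0, 'total_price'] = strain.total_price / strain.build_area + strain.bedroom_cnt * 4
rtrain.loc[rtrain.total_price > 0, 'total_price'] = rtrain.total_price / rtrain.build_area + rtrain.bedroom_cnt * 1/2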
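Assigning through `df[mask].col = value` changes a temporary copy, so the original frame stays as it was; `df.loc[mask, col] = value` writes in place. A minimal sketch on a made-up three-row frame (the name `demo` and its values are illustrative, not the competition data):

```python
import pandas as pd

# Toy data, made up for illustration (not the competition data).
demo = pd.DataFrame({"bedroom_cnt": [2, 5, 7], "build_area": [60.0, 120.0, 200.0]})

# .loc with a boolean mask writes into the original frame.
demo.loc[demo.bedroom_cnt > 3, "bedroom_cnt"] = 4

# A Series on the right-hand side is aligned by index, so only the masked rows change.
mask = demo.build_area > 100
demo.loc[mask, "build_area"] = demo.build_area * 0.5

print(demo)
```

Rows that fail the condition keep their old values, which is the behavior the conditional updates in this post are after.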



#rtest.to_csv('/home/qushanzu/桌面/jisuanzhidao/a.csv', sep=',', header=True, index=True)
#df = df.groupby(by=['column_A'])['column_B'].sum()
# Set the answer to 0.1 for every row of ref whose total_price is negative.
for i, (_, row) in enumerate(ref.iterrows()):
    if row['total_price'] < 0:
        ans[i] = 0.1
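The same row-by-row replacement can be written without a loop by using a boolean mask. This is only a sketch: the post never shows how `ref` and `ans` are built, so the frame and array below are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the post's `ref` and `ans` (their definitions are not shown).
ref = pd.DataFrame({"total_price": [120.0, -1.0, 80.0, -5.0]})
ans = np.array([1.2, 0.9, 0.8, 0.7])

# Overwrite the answer wherever the reference price is negative.
negative = (ref["total_price"] < 0).to_numpy()
ans[negative] = 0.1

print(ans)
```

This assumes `ans` has one entry per row of `ref`, in the same order.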

import numpy as np
from sklearn import tree

# Decision tree classifier that chooses splits by information gain (entropy).
clf = tree.DecisionTreeClassifier(criterion='entropy')
print(clf)
clf.fit(x_train, y_train)
 
# Write the decision tree structure to a file.
with open("tree.dot", 'w') as f:
    tree.export_graphviz(clf, out_file=f)
    
# The importances reflect each feature's influence: the larger the value, the more that feature contributes to the classification.
print(clf.feature_importances_)
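The raw importance array is easier to read when each value is paired with its column name. A small sketch on synthetic data where the label depends only on the first feature; the feature names are chosen for illustration and are not taken from the competition's column order.

```python
import numpy as np
from sklearn import tree

# Synthetic data: the label depends only on the first feature.
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = (X[:, 0] > 0.5).astype(int)
feature_names = ["build_area", "bedroom_cnt", "noise"]  # illustrative names only

clf = tree.DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Pair each importance with its feature name, largest first.
ranked = sorted(zip(feature_names, clf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

The importances are normalized, so they sum to 1 whenever the tree makes at least one split.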
 
# Print the results. Predictions are made on x_train, so the last line is training accuracy.
answer = clf.predict(x_train)
print(x_train)
print(answer)
print(y_train)
print(np.mean(answer == y_train))
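Predicting on `x_train` measures training accuracy, which an unpruned decision tree can push to 100% no matter how well it generalizes. A hedged sketch of scoring on held-out rows instead, using synthetic data as a hypothetical stand-in for the post's `x_train` / `y_train` and the same `test_size = 0.3` as the commented-out split:

```python
import numpy as np
from sklearn import tree
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical; stands in for the post's x_train / y_train).
rng = np.random.RandomState(0)
X = rng.rand(300, 4)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Hold out 30% of the rows for evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = tree.DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_tr, y_tr)

# Compare accuracy on rows the tree has seen with rows it has not.
train_acc = np.mean(clf.predict(X_tr) == y_tr)
test_acc = np.mean(clf.predict(X_te) == y_te)
print("train accuracy:", train_acc)
print("held-out accuracy:", test_acc)
```

The gap between the two numbers gives a rough sense of how much the tree is overfitting.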
    


#for i in range(1, rtest.)
    
   
#'''

 
