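Task 1: Housing Rent Prediction

1. Problem analysis

Participants are asked to build a model on the provided dataset and predict housing rent.
The dataset covers rental listings, residential communities, second-hand housing, amenities, new housing, land, population, customers, and the actual rent.
This is a typical regression task.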

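Evaluation metric

Regression results are scored with R-Square ($R^2$).

$R^2$ is defined as follows.
Residual sum of squares:
$$SS_{res}=\sum\left(y_{i}-\hat{y}_{i}\right)^{2}$$
Total sum of squares:
$$SS_{tot}=\sum\left(y_{i}-\overline{y}\right)^{2}$$

where $\overline{y}$ is the mean of $y$.
This gives the expression for $R^2$:
$$R^{2}=1-\frac{SS_{res}}{SS_{tot}}=1-\frac{\sum\left(y_{i}-\hat{y}_{i}\right)^{2}}{\sum\left(y_{i}-\overline{y}\right)^{2}}$$
$R^2$ measures the share of the variation in the dependent variable that the independent variables explain. For a least-squares fit with an intercept, evaluated on its own training data, it lies between 0 and 1. On held-out data it can be negative when the model predicts worse than the constant mean. The closer $R^2$ is to 1, the larger the regression sum of squares is relative to the total sum of squares, the closer the regression line is to the observations, the more of the change in y is explained by the change in x, and the better the fit. $R^2$ is therefore also called the goodness-of-fit statistic.

Here $y_{i}$ is the true value, $\hat{y}_{i}$ is the predicted value, and $\overline{y}$ is the sample mean. A higher score means a better fit.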
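
To make the formula concrete, here is a minimal sketch that computes $R^2$ directly from the two sums of squares defined above. The function name `r2_score_manual` and the rent values are made up for illustration. scikit-learn's `r2_score` should give the same numbers.

```python
def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed from the definitions above."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


# Made-up rents, for illustration only
y_true = [2000.0, 3000.0, 4000.0, 5000.0]
perfect = r2_score_manual(y_true, y_true)                 # predictions equal the truth
mean_only = r2_score_manual(y_true, [3500.0] * 4)         # always predict the mean
reversed_ = r2_score_manual(y_true, y_true[::-1])         # worse than predicting the mean
print(perfect, mean_only, reversed_)
```

The third case shows why the score is not bounded below by 0 on arbitrary predictions: a model that does worse than the constant mean gets a negative $R^2$.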

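Data overview

1. Basic rental information:

  • ID: listing ID
  • area: floor area of the home
  • rentType: rental mode: whole unit (整租) / shared (合租) / unknown (未知方式 in the data)
  • houseType: layout (rooms, living rooms, bathrooms)
  • houseFloor: floor band of the unit: high (高) / middle (中) / low (低)
  • totalFloor: total number of floors in the building
  • houseToward: orientation of the unit
  • houseDecoration: decoration level
  • tradeTime: transaction date
  • tradeMoney: transacted rent (the prediction target)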

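2. Community information:

  • communityName: community name
  • city: city
  • region: district
  • plate: plate (sub-district within the region)
  • buildYear: year the community was built
  • saleSecHouseNum: number of second-hand homes listed for sale in the plate in the month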

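3. Amenities:

  • subwayStationNum: number of subway stations in the plate
  • busStationNum: number of bus stations in the plate
  • interSchoolNum: number of international schools in the plate
  • schoolNum: number of public schools in the plate
  • privateSchoolNum: number of private schools in the plate
  • hospitalNum: number of general hospitals in the plate
  • drugStoreNum: number of pharmacies in the plate
  • gymNum: number of fitness centres in the plate
  • bankNum: number of banks in the plate
  • shopNum: number of shops in the plate
  • parkNum: number of parks in the plate
  • mallNum: number of shopping malls in the plate
  • superMarketNum: number of supermarkets in the plate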

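4. Other information:

  • totalTradeMoney: total value of second-hand home transactions in the plate in the month
  • totalTradeArea: total area of second-hand home transactions in the plate
  • tradeMeanPrice: average price of second-hand home transactions in the plate
  • tradeSecNum: number of second-hand homes sold in the plate in the month
  • totalNewTradeMoney: total value of new-home transactions in the plate in the month
  • totalNewTradeArea: total area of new-home transactions in the plate in the month
  • tradeNewMeanPrice: average price of new-home transactions in the plate in the month
  • tradeNewNum: number of new homes sold in the plate in the month
  • remainNewNum: number of unsold new homes in the plate in the month
  • supplyNewNum: number of new homes supplied in the plate in the month
  • supplyLandNum: number of land parcels supplied in the plate in the month
  • supplyLandArea: area of land supplied in the plate in the month
  • tradeLandNum: number of land parcels sold in the plate in the month
  • tradeLandArea: area of land sold in the plate in the month
  • landTotalPrice: total value of land sold in the plate in the month
  • landMeanPrice: land price per unit of buildable floor area in the plate in the month (yuan/m^2)
  • totalWorkers: number of office workers currently in the plate
  • newWorkers: population inflow to the plate in the month (people currently being recruited)
  • residentPopulation: resident population of the plate
  • pv: number of page views by renters for the plate in the month
  • uv: number of unique renters viewing pages for the plate in the month
  • lookNum: number of offline viewings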

import warnings  # used below to silence plotting warnings

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
from pylab import *
import seaborn as sns

# Font with CJK glyphs so Chinese labels render in plots (machine-specific path; adjust as needed)
fname = r"/home/ach/anaconda3/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/ttf/SimHei.TTF"
myfont = FontProperties(fname=fname)

# Rough split into categorical and numerical features, judged from the field descriptions and the data overview
categorical_feas = ['rentType', 'houseType', 'houseFloor', 'region', 'plate', 'houseToward', 'houseDecoration',
    'communityName','city','buildYear']
numerical_feas=['ID','area','totalFloor','saleSecHouseNum','subwayStationNum',
    'busStationNum','interSchoolNum','schoolNum','privateSchoolNum','hospitalNum',
    'drugStoreNum','gymNum','bankNum','shopNum','parkNum','mallNum','superMarketNum',
    'totalTradeMoney','totalTradeArea','tradeMeanPrice','tradeSecNum','totalNewTradeMoney',
    'totalNewTradeArea','tradeNewMeanPrice','tradeNewNum','remainNewNum','supplyNewNum',
    'supplyLandNum','supplyLandArea','tradeLandNum','tradeLandArea','landTotalPrice',
    'landMeanPrice','totalWorkers','newWorkers','residentPopulation','pv','uv','lookNum']
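
The hand-written lists above can be cross-checked against the column dtypes. The sketch below uses a tiny made-up frame with a few of the column names (its values, including the `"unknown"` entry, are placeholders rather than real data) and splits the features with `select_dtypes`. The variable names `numerical_guess` and `categorical_guess` are illustrative.

```python
import pandas as pd

# Tiny made-up frame that mimics a few columns of the training data
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "area": [68.06, 125.55, 132.0],
    "rentType": ["合租", "整租", "未知方式"],
    "buildYear": ["1953", "2007", "unknown"],   # stored as text, so dtype is object
    "tradeMoney": [2000.0, 2000.0, 16000.0],
})

target = "tradeMoney"
features = df.drop(columns=[target])

# Numeric dtypes (int64/float64) versus everything else (object)
numerical_guess = features.select_dtypes(include="number").columns.tolist()
categorical_guess = features.select_dtypes(exclude="number").columns.tolist()
print(numerical_guess)
print(categorical_guess)
```

A dtype-based split is only a starting point: an identifier such as `ID` is stored as an integer but carries no numeric meaning, so the final lists still need a manual pass.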

The target, tradeMoney, is a continuous amount, so this is a regression problem.

data = pd.read_csv('./數據集/train_data.csv')
data
ID area rentType houseType houseFloor totalFloor houseToward houseDecoration communityName city ... landTotalPrice landMeanPrice totalWorkers newWorkers residentPopulation pv uv lookNum tradeTime tradeMoney
0 100309852 68.06 未知方式 2室1廳1衛 16 暫無數據 其他 XQ00051 SH ... 0 0.0000 28248 614 111546 1124.0 284.0 0 2018/11/28 2000.0
1 100307942 125.55 未知方式 3室2廳2衛 14 暫無數據 簡裝 XQ00130 SH ... 0 0.0000 14823 148 157552 701.0 22.0 1 2018/12/16 2000.0
2 100307764 132.00 未知方式 3室2廳2衛 32 暫無數據 其他 XQ00179 SH ... 0 0.0000 77645 520 131744 57.0 20.0 1 2018/12/22 16000.0
3 100306518 57.00 未知方式 1室1廳1衛 17 暫無數據 精裝 XQ00313 SH ... 332760000 3080.0331 8750 1665 253337 888.0 279.0 9 2018/12/21 1600.0
4 100305262 129.00 未知方式 3室2廳3衛 2 暫無數據 毛坯 XQ01257 SH ... 0 0.0000 800 117 125309 2038.0 480.0 0 2018/11/18 2900.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
41435 100000438 10.00 合租 4室1廳1衛 11 精裝 XQ01209 SH ... 573070000 4313.0100 20904 0 245872 29635.0 2662.0 0 2018/2/5 2190.0
41436 100000201 7.10 合租 3室1廳1衛 6 精裝 XQ00853 SH ... 0 0.0000 4370 0 306857 28213.0 2446.0 0 2018/1/22 2090.0
41437 100000198 9.20 合租 4室1廳1衛 18 精裝 XQ00852 SH ... 0 0.0000 4370 0 306857 19231.0 2016.0 0 2018/2/8 3190.0
41438 100000182 14.10 合租 4室1廳1衛 8 精裝 XQ00791 SH ... 0 0.0000 4370 0 306857 17471.0 2554.0 0 2018/3/22 2460.0
41439 100000041 33.50 未知方式 1室1廳1衛 19 其他 XQ03246 SH ... 0 0.0000 13192 990 406803 2556.0 717.0 1 2018/10/21 3000.0

41440 rows × 51 columns

data.info()
data.describe()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41440 entries, 0 to 41439
Data columns (total 51 columns):
ID                    41440 non-null int64
area                  41440 non-null float64
rentType              41440 non-null object
houseType             41440 non-null object
houseFloor            41440 non-null object
totalFloor            41440 non-null int64
houseToward           41440 non-null object
houseDecoration       41440 non-null object
communityName         41440 non-null object
city                  41440 non-null object
region                41440 non-null object
plate                 41440 non-null object
buildYear             41440 non-null object
saleSecHouseNum       41440 non-null int64
subwayStationNum      41440 non-null int64
busStationNum         41440 non-null int64
interSchoolNum        41440 non-null int64
schoolNum             41440 non-null int64
privateSchoolNum      41440 non-null int64
hospitalNum           41440 non-null int64
drugStoreNum          41440 non-null int64
gymNum                41440 non-null int64
bankNum               41440 non-null int64
shopNum               41440 non-null int64
parkNum               41440 non-null int64
mallNum               41440 non-null int64
superMarketNum        41440 non-null int64
totalTradeMoney       41440 non-null int64
totalTradeArea        41440 non-null float64
tradeMeanPrice        41440 non-null float64
tradeSecNum           41440 non-null int64
totalNewTradeMoney    41440 non-null int64
totalNewTradeArea     41440 non-null int64
tradeNewMeanPrice     41440 non-null float64
tradeNewNum           41440 non-null int64
remainNewNum          41440 non-null int64
supplyNewNum          41440 non-null int64
supplyLandNum         41440 non-null int64
supplyLandArea        41440 non-null float64
tradeLandNum          41440 non-null int64
tradeLandArea         41440 non-null float64
landTotalPrice        41440 non-null int64
landMeanPrice         41440 non-null float64
totalWorkers          41440 non-null int64
newWorkers            41440 non-null int64
residentPopulation    41440 non-null int64
pv                    41422 non-null float64
uv                    41422 non-null float64
lookNum               41440 non-null int64
tradeTime             41440 non-null object
tradeMoney            41440 non-null float64
dtypes: float64(10), int64(30), object(11)
memory usage: 16.1+ MB
ID area totalFloor saleSecHouseNum subwayStationNum busStationNum interSchoolNum schoolNum privateSchoolNum hospitalNum ... tradeLandArea landTotalPrice landMeanPrice totalWorkers newWorkers residentPopulation pv uv lookNum tradeMoney
count 4.144000e+04 41440.000000 41440.000000 41440.000000 41440.000000 41440.000000 41440.000000 41440.000000 41440.000000 41440.000000 ... 41440.000000 4.144000e+04 41440.000000 41440.000000 41440.000000 41440.000000 41422.000000 41422.000000 41440.000000 4.144000e+04
mean 1.001221e+08 70.959409 11.413152 1.338538 5.741192 187.197153 1.506395 48.228813 6.271911 4.308736 ... 12621.406425 1.045363e+08 724.763918 77250.235497 1137.132095 294514.059459 26945.663512 3089.077085 0.396260 8.837074e+03
std 9.376566e+04 88.119569 7.375203 3.180349 4.604929 179.674625 1.687631 29.568448 4.946457 3.359714 ... 49853.120341 5.215216e+08 3224.303831 132052.508523 7667.381627 196745.147181 32174.637924 2954.706517 1.653932 5.514287e+05
min 1.000000e+08 1.000000 0.000000 0.000000 0.000000 24.000000 0.000000 9.000000 0.000000 0.000000 ... 0.000000 0.000000e+00 0.000000 600.000000 0.000000 49330.000000 17.000000 6.000000 0.000000 0.000000e+00
25% 1.000470e+08 42.607500 6.000000 0.000000 2.000000 74.000000 0.000000 24.000000 2.000000 1.000000 ... 0.000000 0.000000e+00 0.000000 13983.000000 0.000000 165293.000000 7928.000000 1053.000000 0.000000 2.800000e+03
50% 1.000960e+08 65.000000 7.000000 0.000000 5.000000 128.000000 1.000000 47.000000 5.000000 4.000000 ... 0.000000 0.000000e+00 0.000000 38947.000000 0.000000 245872.000000 20196.000000 2375.000000 0.000000 4.000000e+03
75% 1.001902e+08 90.000000 16.000000 1.000000 7.000000 258.000000 3.000000 61.000000 9.000000 6.000000 ... 0.000000 0.000000e+00 0.000000 76668.000000 0.000000 330610.000000 34485.000000 4233.000000 0.000000 5.500000e+03
max 1.003218e+08 15055.000000 88.000000 52.000000 22.000000 824.000000 8.000000 142.000000 24.000000 14.000000 ... 555508.010000 6.197570e+09 37513.062490 855400.000000 143700.000000 928198.000000 621864.000000 39876.000000 37.000000 1.000000e+08

8 rows × 40 columns

warnings.filterwarnings("ignore")  # suppress warnings raised while plotting
groupby_user = data.groupby('rentType').size()
print(groupby_user)
groupby_user.plot.bar(title='rentType', figsize=(15,4))
# Most listings have an unknown rental mode (未知方式)
rentType
--          5
合租       5204
整租       5472
未知方式    30759
dtype: int64

[Figure: bar chart of listing counts by rentType (output_6_1.png)]
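
To put a number on "mostly unknown", the sketch below turns the printed counts into shares. The counts are copied from the output above. On the full frame, `data['rentType'].value_counts(normalize=True)` should give the same shares in one call.

```python
import pandas as pd

# Counts copied from the printed groupby output above
counts = pd.Series({"--": 5, "合租": 5204, "整租": 5472, "未知方式": 30759}, name="rentType")

# Share of each category, largest first
share = (counts / counts.sum()).sort_values(ascending=False)
print(share.round(4))
```

Roughly three quarters of the rows fall under 未知方式 (unknown), and five rows carry the placeholder "--", so rentType on its own separates only about a quarter of the listings.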

warnings.filterwarnings("ignore")  # suppress warnings raised while plotting
groupby_user = data.groupby('houseType').size()
print(groupby_user)
groupby_user.plot.bar(title='houseType', figsize=(15,4))
# The most popular layout is 1室1廳1衛 (1 bedroom, 1 living room, 1 bathroom)
houseType
0室0廳1衛       1
1室0廳0衛      86
1室0廳1衛    1286
1室1廳0衛      12
1室1廳1衛    9805
          ... 
8室2廳4衛       1
8室3廳4衛       1
8室4廳4衛       1
9室2廳5衛       1
9室3廳8衛       1
Length: 104, dtype: int64

[Figure: bar chart of listing counts by houseType (output_7_1.png)]

warnings.filterwarnings("ignore")  # suppress warnings raised while plotting
groupby_user = data.groupby('houseFloor').size()
print(groupby_user)
groupby_user.plot.bar(title='houseFloor', figsize=(15,4))
houseFloor
中    15458
低    11916
高    14066
dtype: int64

[Figure: bar chart of listing counts by houseFloor (output_8_1.png)]

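warnings.filterwarnings("ignore")  # suppress warnings raised while plotting
groupby_user = data.groupby('totalFloor').size()
print(groupby_user)
groupby_user.plot.bar(title='totalFloor', figsize=(15,4))
# Buildings with 6 floors in total are by far the most common (15797 rows), which is worth keeping in mind.
# Note that totalFloor is the height of the building, not the floor the unit is on.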
totalFloor
0         5
1        98
2       193
3       446
4       486
5      2730
6     15797
7      1362
8       624
9       393
10      401
11     2884
12      738
13      882
14     2166
15      809
16     1147
17     1375
18     3553
19      467
20      457
21      466
22      309
23      161
24      732
25      390
26      300
27      399
28      258
29      289
30      144
31      211
32      234
33      117
34       54
35       57
36       57
37       96
38       33
39       10
40       11
41       17
43       12
45        3
47        4
49       25
51        1
53        7
56       17
58        1
59        1
60        3
61        1
62        5
88        2
dtype: int64

[Figure: bar chart of listing counts by totalFloor (output_9_1.png)]

warnings.filterwarnings("ignore")  # suppress warnings raised while plotting
groupby_user = data.groupby('houseDecoration').size()
print(groupby_user)
groupby_user.plot.bar(title='houseDecoration', figsize=(15,4))
# 其他 (other) is the most common value (29040 rows)
houseDecoration
其他    29040
毛坯      311
簡裝     1171
精裝    10918
dtype: int64

[Figure: bar chart of listing counts by houseDecoration (output_10_1.png)]

warnings.filterwarnings("ignore")  # suppress warnings raised while plotting
groupby_user = data.groupby('plate').size()
print(groupby_user)
print(sorted(groupby_user.items(), key=lambda item: item[1], reverse=True))
groupby_user.plot.bar(title='plate', figsize=(15,4))
# BK00031 has the most listings (1958); listing counts alone do not show which plate is more expensive
plate
BK00001      1
BK00002    357
BK00003    523
BK00004    189
BK00005    549
          ... 
BK00062    618
BK00063    281
BK00064    590
BK00065    348
BK00066    219
Length: 66, dtype: int64
[('BK00031', 1958), ('BK00033', 1837), ('BK00045', 1816), ('BK00055', 1566), ('BK00056', 1516), ('BK00052', 1375), ('BK00017', 1305), ('BK00041', 1266), ('BK00054', 1256), ('BK00051', 1253), ('BK00046', 1227), ('BK00035', 1156), ('BK00042', 1137), ('BK00009', 1016), ('BK00050', 979), ('BK00043', 930), ('BK00026', 906), ('BK00047', 880), ('BK00034', 849), ('BK00013', 834), ('BK00053', 819), ('BK00028', 745), ('BK00040', 679), ('BK00060', 671), ('BK00010', 651), ('BK00029', 646), ('BK00062', 618), ('BK00022', 614), ('BK00018', 613), ('BK00064', 590), ('BK00005', 549), ('BK00003', 523), ('BK00014', 500), ('BK00019', 498), ('BK00061', 477), ('BK00011', 455), ('BK00037', 444), ('BK00012', 412), ('BK00038', 398), ('BK00024', 397), ('BK00020', 384), ('BK00002', 357), ('BK00065', 348), ('BK00027', 344), ('BK00039', 343), ('BK00063', 281), ('BK00057', 278), ('BK00015', 253), ('BK00006', 231), ('BK00021', 226), ('BK00007', 225), ('BK00030', 219), ('BK00066', 219), ('BK00049', 211), ('BK00008', 210), ('BK00004', 189), ('BK00048', 165), ('BK00025', 157), ('BK00023', 127), ('BK00059', 122), ('BK00044', 98), ('BK00016', 40), ('BK00036', 33), ('BK00058', 15), ('BK00032', 3), ('BK00001', 1)]

[Figure: bar chart of listing counts by plate (output_11_1.png)]

print(len(data['plate']))
41440
# Relationship between plate and number of bus stations
# Keep the busStationNum value from the first row seen for each plate
res1 = {}
for plate, num in zip(data['plate'], data['busStationNum']):
    res1.setdefault(plate, num)
# Sort plates by value, largest first
res2 = dict(sorted(res1.items(), key=lambda item: item[1], reverse=True))
print(res2)

plt.figure(figsize=(15,6))
plt.title("busStationNum by plate")
plt.scatter(list(res2.keys()), list(res2.values()))
{'BK00045': 824, 'BK00031': 461, 'BK00042': 441, 'BK00016': 387, 'BK00051': 364, 'BK00001': 356, 'BK00057': 331, 'BK00054': 306, 'BK00032': 284, 'BK00052': 276, 'BK00058': 264, 'BK00056': 258, 'BK00062': 196, 'BK00053': 190, 'BK00049': 184, 'BK00035': 178, 'BK00047': 172, 'BK00038': 169, 'BK00046': 167, 'BK00022': 156, 'BK00055': 151, 'BK00061': 151, 'BK00041': 144, 'BK00044': 141, 'BK00040': 138, 'BK00036': 131, 'BK00059': 128, 'BK00020': 114, 'BK00021': 114, 'BK00005': 105, 'BK00026': 101, 'BK00043': 98, 'BK00015': 98, 'BK00033': 96, 'BK00014': 95, 'BK00066': 95, 'BK00017': 92, 'BK00060': 88, 'BK00018': 83, 'BK00012': 82, 'BK00048': 82, 'BK00002': 79, 'BK00034': 78, 'BK00027': 74, 'BK00013': 72, 'BK00025': 70, 'BK00037': 68, 'BK00028': 67, 'BK00010': 62, 'BK00050': 60, 'BK00009': 56, 'BK00007': 52, 'BK00008': 52, 'BK00030': 48, 'BK00023': 47, 'BK00003': 45, 'BK00019': 42, 'BK00039': 41, 'BK00064': 36, 'BK00011': 34, 'BK00004': 30, 'BK00006': 29, 'BK00065': 29, 'BK00029': 27, 'BK00063': 25, 'BK00024': 24}





[Figure: scatter plot of busStationNum by plate (output_13_2.png)]

# Relationship between plate and second-hand homes listed for sale
# Keep the saleSecHouseNum value from the first row seen for each plate.
# saleSecHouseNum is a monthly figure, so this is one month's value per plate, not a total.
res1 = {}
for plate, num in zip(data['plate'], data['saleSecHouseNum']):
    res1.setdefault(plate, num)
# Sort plates by value, largest first
res2 = dict(sorted(res1.items(), key=lambda item: item[1], reverse=True))
print(res2)
plt.figure(figsize=(15,6))
plt.title("saleSecHouseNum by plate")
plt.scatter(list(res2.keys()), list(res2.values()))
{'BK00015': 6, 'BK00050': 3, 'BK00032': 3, 'BK00044': 1, 'BK00052': 1, 'BK00064': 0, 'BK00049': 0, 'BK00051': 0, 'BK00031': 0, 'BK00028': 0, 'BK00017': 0, 'BK00045': 0, 'BK00027': 0, 'BK00041': 0, 'BK00047': 0, 'BK00009': 0, 'BK00025': 0, 'BK00024': 0, 'BK00014': 0, 'BK00026': 0, 'BK00042': 0, 'BK00046': 0, 'BK00043': 0, 'BK00013': 0, 'BK00012': 0, 'BK00005': 0, 'BK00011': 0, 'BK00010': 0, 'BK00003': 0, 'BK00033': 0, 'BK00053': 0, 'BK00006': 0, 'BK00004': 0, 'BK00002': 0, 'BK00007': 0, 'BK00016': 0, 'BK00019': 0, 'BK00030': 0, 'BK00048': 0, 'BK00018': 0, 'BK00008': 0, 'BK00029': 0, 'BK00065': 0, 'BK00035': 0, 'BK00036': 0, 'BK00022': 0, 'BK00023': 0, 'BK00054': 0, 'BK00038': 0, 'BK00037': 0, 'BK00034': 0, 'BK00058': 0, 'BK00066': 0, 'BK00039': 0, 'BK00057': 0, 'BK00020': 0, 'BK00059': 0, 'BK00060': 0, 'BK00063': 0, 'BK00055': 0, 'BK00061': 0, 'BK00040': 0, 'BK00056': 0, 'BK00062': 0, 'BK00021': 0, 'BK00001': 0}





[Figure: scatter plot of saleSecHouseNum by plate (output_14_2.png)]

# International schools per plate
# Keep the interSchoolNum value from the first row seen for each plate
res1 = {}
for plate, num in zip(data['plate'], data['interSchoolNum']):
    res1.setdefault(plate, num)
# Sort plates by value, largest first
res2 = dict(sorted(res1.items(), key=lambda item: item[1], reverse=True))
print(res2)
plt.figure(figsize=(15,6))
plt.title("interSchoolNum by plate")
plt.scatter(list(res2.keys()), list(res2.values()))
{'BK00007': 8, 'BK00008': 8, 'BK00053': 6, 'BK00038': 6, 'BK00031': 4, 'BK00005': 4, 'BK00016': 4, 'BK00034': 4, 'BK00063': 4, 'BK00045': 3, 'BK00014': 3, 'BK00029': 3, 'BK00035': 3, 'BK00036': 3, 'BK00066': 3, 'BK00057': 3, 'BK00060': 3, 'BK00051': 2, 'BK00052': 2, 'BK00013': 2, 'BK00010': 2, 'BK00039': 2, 'BK00020': 2, 'BK00040': 2, 'BK00062': 2, 'BK00021': 2, 'BK00050': 1, 'BK00028': 1, 'BK00027': 1, 'BK00024': 1, 'BK00026': 1, 'BK00042': 1, 'BK00046': 1, 'BK00004': 1, 'BK00018': 1, 'BK00054': 1, 'BK00037': 1, 'BK00058': 1, 'BK00064': 0, 'BK00049': 0, 'BK00044': 0, 'BK00017': 0, 'BK00041': 0, 'BK00047': 0, 'BK00009': 0, 'BK00025': 0, 'BK00043': 0, 'BK00012': 0, 'BK00011': 0, 'BK00003': 0, 'BK00033': 0, 'BK00006': 0, 'BK00002': 0, 'BK00015': 0, 'BK00019': 0, 'BK00030': 0, 'BK00048': 0, 'BK00065': 0, 'BK00022': 0, 'BK00023': 0, 'BK00059': 0, 'BK00055': 0, 'BK00061': 0, 'BK00056': 0, 'BK00032': 0, 'BK00001': 0}





[Figure: scatter plot of interSchoolNum by plate (output_15_2.png)]

def paint(column: str):
    """Scatter-plot one plate-level column, one point per plate, sorted from high to low."""
    # Keep the value from the first row seen for each plate
    res1 = {}
    for plate, value in zip(data['plate'], data[column]):
        res1.setdefault(plate, value)
    # Sort plates by value, largest first
    res2 = dict(sorted(res1.items(), key=lambda item: item[1], reverse=True))
    print(res2)
    plt.figure(figsize=(15,6))
    plt.title("{} by plate".format(column))
    plt.scatter(list(res2.keys()), list(res2.values()))
# Subway stations
paint('subwayStationNum')
{'BK00052': 22, 'BK00057': 14, 'BK00056': 14, 'BK00042': 13, 'BK00002': 11, 'BK00055': 11, 'BK00061': 11, 'BK00041': 9, 'BK00064': 7, 'BK00027': 7, 'BK00012': 7, 'BK00018': 7, 'BK00060': 7, 'BK00040': 7, 'BK00050': 6, 'BK00031': 6, 'BK00053': 6, 'BK00035': 6, 'BK00054': 6, 'BK00020': 6, 'BK00021': 6, 'BK00045': 5, 'BK00014': 5, 'BK00005': 5, 'BK00011': 5, 'BK00010': 5, 'BK00065': 5, 'BK00062': 5, 'BK00013': 4, 'BK00007': 4, 'BK00030': 4, 'BK00008': 4, 'BK00037': 4, 'BK00051': 3, 'BK00028': 3, 'BK00017': 3, 'BK00025': 3, 'BK00024': 3, 'BK00026': 3, 'BK00006': 3, 'BK00004': 3, 'BK00016': 3, 'BK00023': 3, 'BK00066': 3, 'BK00059': 3, 'BK00063': 3, 'BK00049': 2, 'BK00047': 2, 'BK00009': 2, 'BK00043': 2, 'BK00003': 2, 'BK00015': 2, 'BK00019': 2, 'BK00022': 2, 'BK00038': 2, 'BK00034': 2, 'BK00058': 2, 'BK00046': 1, 'BK00033': 1, 'BK00036': 1, 'BK00039': 1, 'BK00044': 0, 'BK00048': 0, 'BK00029': 0, 'BK00032': 0, 'BK00001': 0}

[Figure: scatter plot of subwayStationNum by plate (output_17_1.png)]
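
The per-plate lookup inside `paint` can also be written with a pandas `groupby`. The sketch below is one possible version, not the author's code: the helper names `first_value_per_plate` and `is_constant_per_plate` and the small frame are made up. The second helper checks whether a column really has a single value per plate, which is the assumption that makes "take the first row" safe.

```python
import pandas as pd


def first_value_per_plate(df, column):
    """Value of `column` from the first row of each plate, sorted from high to low.

    GroupBy.first() returns the first non-null entry in each group.
    """
    per_plate = df.groupby("plate", sort=False)[column].first()
    return per_plate.sort_values(ascending=False)


def is_constant_per_plate(df, column):
    """True if every plate has at most one distinct value in `column`."""
    return bool((df.groupby("plate")[column].nunique() <= 1).all())


# Made-up rows, for illustration only
df = pd.DataFrame({
    "plate": ["BK1", "BK2", "BK1", "BK3", "BK2"],
    "subwayStationNum": [3, 7, 3, 0, 7],   # static per plate
    "saleSecHouseNum": [0, 2, 5, 1, 2],    # varies within BK1 (monthly figure)
})

subway = first_value_per_plate(df, "subwayStationNum")
print(subway)
print(is_constant_per_plate(df, "subwayStationNum"))
print(is_constant_per_plate(df, "saleSecHouseNum"))
```

For columns that change from month to month, an aggregate such as the per-plate mean or maximum is probably a fairer summary than the first row.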

# Public schools
paint('schoolNum')
{'BK00052': 142, 'BK00045': 99, 'BK00056': 98, 'BK00066': 74, 'BK00027': 72, 'BK00031': 71, 'BK00028': 69, 'BK00057': 65, 'BK00013': 64, 'BK00042': 62, 'BK00054': 61, 'BK00060': 61, 'BK00051': 60, 'BK00012': 59, 'BK00023': 57, 'BK00009': 53, 'BK00033': 53, 'BK00016': 52, 'BK00026': 50, 'BK00011': 48, 'BK00055': 48, 'BK00061': 48, 'BK00025': 47, 'BK00005': 47, 'BK00007': 47, 'BK00008': 47, 'BK00024': 45, 'BK00020': 44, 'BK00021': 44, 'BK00050': 43, 'BK00040': 41, 'BK00001': 41, 'BK00018': 39, 'BK00003': 38, 'BK00010': 37, 'BK00030': 32, 'BK00034': 32, 'BK00035': 30, 'BK00037': 30, 'BK00006': 29, 'BK00064': 28, 'BK00002': 28, 'BK00049': 26, 'BK00053': 24, 'BK00029': 24, 'BK00032': 24, 'BK00038': 23, 'BK00017': 22, 'BK00041': 21, 'BK00065': 21, 'BK00022': 21, 'BK00039': 21, 'BK00062': 20, 'BK00019': 18, 'BK00058': 18, 'BK00059': 16, 'BK00044': 15, 'BK00004': 14, 'BK00046': 13, 'BK00063': 13, 'BK00048': 11, 'BK00047': 10, 'BK00014': 10, 'BK00043': 10, 'BK00015': 9, 'BK00036': 9}

[Figure: scatter plot of schoolNum by plate (output_18_1.png)]

# Private schools
paint('privateSchoolNum')
{'BK00040': 24, 'BK00034': 16, 'BK00033': 15, 'BK00027': 13, 'BK00056': 13, 'BK00052': 12, 'BK00011': 11, 'BK00037': 11, 'BK00020': 10, 'BK00021': 10, 'BK00045': 9, 'BK00013': 9, 'BK00010': 9, 'BK00029': 9, 'BK00039': 9, 'BK00060': 9, 'BK00009': 8, 'BK00042': 8, 'BK00012': 8, 'BK00018': 8, 'BK00038': 8, 'BK00063': 8, 'BK00026': 7, 'BK00003': 7, 'BK00028': 6, 'BK00066': 6, 'BK00057': 6, 'BK00031': 5, 'BK00005': 5, 'BK00053': 5, 'BK00002': 5, 'BK00007': 5, 'BK00008': 5, 'BK00025': 4, 'BK00035': 4, 'BK00043': 3, 'BK00019': 3, 'BK00050': 2, 'BK00017': 2, 'BK00041': 2, 'BK00046': 2, 'BK00016': 2, 'BK00065': 2, 'BK00023': 2, 'BK00054': 2, 'BK00055': 2, 'BK00061': 2, 'BK00001': 2, 'BK00064': 1, 'BK00051': 1, 'BK00047': 1, 'BK00024': 1, 'BK00014': 1, 'BK00006': 1, 'BK00015': 1, 'BK00048': 1, 'BK00036': 1, 'BK00022': 1, 'BK00062': 1, 'BK00049': 0, 'BK00044': 0, 'BK00004': 0, 'BK00030': 0, 'BK00058': 0, 'BK00059': 0, 'BK00032': 0}

[Figure: scatter plot of privateSchoolNum by plate (output_19_1.png)]

# Hospitals
paint('hospitalNum')
{'BK00052': 14, 'BK00045': 11, 'BK00042': 9, 'BK00051': 8, 'BK00013': 8, 'BK00005': 8, 'BK00031': 6, 'BK00041': 6, 'BK00025': 6, 'BK00024': 6, 'BK00026': 6, 'BK00007': 6, 'BK00008': 6, 'BK00055': 6, 'BK00061': 6, 'BK00028': 5, 'BK00012': 5, 'BK00030': 5, 'BK00029': 5, 'BK00054': 5, 'BK00020': 5, 'BK00056': 5, 'BK00021': 5, 'BK00027': 4, 'BK00009': 4, 'BK00038': 4, 'BK00058': 4, 'BK00057': 4, 'BK00050': 3, 'BK00046': 3, 'BK00002': 3, 'BK00023': 3, 'BK00037': 3, 'BK00001': 3, 'BK00014': 2, 'BK00011': 2, 'BK00010': 2, 'BK00003': 2, 'BK00006': 2, 'BK00018': 2, 'BK00065': 2, 'BK00035': 2, 'BK00034': 2, 'BK00039': 2, 'BK00060': 2, 'BK00032': 2, 'BK00064': 1, 'BK00049': 1, 'BK00017': 1, 'BK00047': 1, 'BK00043': 1, 'BK00033': 1, 'BK00004': 1, 'BK00016': 1, 'BK00048': 1, 'BK00022': 1, 'BK00066': 1, 'BK00059': 1, 'BK00062': 1, 'BK00044': 0, 'BK00053': 0, 'BK00015': 0, 'BK00019': 0, 'BK00036': 0, 'BK00063': 0, 'BK00040': 0}

[Figure: scatter plot of hospitalNum by plate (output_20_1.png)]

# drugStoreNum: number of pharmacies in the plate
paint('drugStoreNum')
{'BK00045': 174, 'BK00042': 145, 'BK00052': 118, 'BK00031': 106, 'BK00054': 94, 'BK00056': 88, 'BK00057': 85, 'BK00051': 83, 'BK00055': 69, 'BK00061': 69, 'BK00040': 67, 'BK00041': 65, 'BK00038': 55, 'BK00046': 54, 'BK00001': 52, 'BK00062': 49, 'BK00020': 48, 'BK00021': 48, 'BK00035': 47, 'BK00034': 41, 'BK00032': 41, 'BK00017': 40, 'BK00026': 40, 'BK00012': 40, 'BK00028': 39, 'BK00053': 39, 'BK00016': 39, 'BK00022': 39, 'BK00027': 37, 'BK00043': 37, 'BK00033': 36, 'BK00018': 35, 'BK00066': 35, 'BK00060': 35, 'BK00009': 34, 'BK00013': 34, 'BK00058': 34, 'BK00002': 33, 'BK00047': 31, 'BK00005': 31, 'BK00010': 31, 'BK00025': 29, 'BK00003': 28, 'BK00049': 27, 'BK00011': 27, 'BK00037': 27, 'BK00036': 25, 'BK00050': 24, 'BK00059': 23, 'BK00019': 22, 'BK00048': 22, 'BK00039': 22, 'BK00044': 21, 'BK00006': 20, 'BK00023': 20, 'BK00007': 19, 'BK00008': 19, 'BK00014': 17, 'BK00024': 15, 'BK00030': 15, 'BK00065': 15, 'BK00015': 13, 'BK00064': 12, 'BK00029': 11, 'BK00063': 11, 'BK00004': 8}

[Figure: scatter plot of drugStoreNum by plate (output_21_1.png)]

# gymNum: number of fitness centres in the plate
paint('gymNum')
{'BK00045': 88, 'BK00042': 84, 'BK00060': 82, 'BK00037': 78, 'BK00052': 64, 'BK00026': 56, 'BK00056': 52, 'BK00057': 48, 'BK00006': 43, 'BK00040': 43, 'BK00055': 41, 'BK00061': 41, 'BK00053': 40, 'BK00007': 40, 'BK00008': 40, 'BK00024': 39, 'BK00027': 38, 'BK00025': 38, 'BK00020': 38, 'BK00021': 38, 'BK00054': 37, 'BK00031': 36, 'BK00050': 35, 'BK00005': 35, 'BK00010': 34, 'BK00033': 34, 'BK00013': 32, 'BK00051': 30, 'BK00018': 28, 'BK00039': 28, 'BK00028': 27, 'BK00041': 27, 'BK00017': 26, 'BK00011': 26, 'BK00035': 26, 'BK00046': 25, 'BK00012': 25, 'BK00034': 25, 'BK00003': 23, 'BK00062': 23, 'BK00038': 22, 'BK00066': 21, 'BK00019': 20, 'BK00065': 20, 'BK00001': 19, 'BK00023': 18, 'BK00047': 16, 'BK00009': 16, 'BK00043': 16, 'BK00063': 16, 'BK00064': 15, 'BK00048': 14, 'BK00004': 13, 'BK00022': 13, 'BK00030': 12, 'BK00014': 10, 'BK00002': 8, 'BK00036': 8, 'BK00032': 8, 'BK00029': 6, 'BK00058': 6, 'BK00049': 5, 'BK00044': 5, 'BK00016': 5, 'BK00059': 5, 'BK00015': 1}

[Figure: scatter plot of gymNum by plate (output_22_1.png)]

# bankNum: number of banks in the plate
paint('bankNum')
{'BK00060': 207, 'BK00045': 119, 'BK00025': 98, 'BK00052': 95, 'BK00057': 92, 'BK00042': 91, 'BK00031': 86, 'BK00007': 86, 'BK00008': 86, 'BK00056': 75, 'BK00024': 69, 'BK00026': 62, 'BK00013': 53, 'BK00033': 52, 'BK00027': 50, 'BK00023': 50, 'BK00054': 50, 'BK00001': 49, 'BK00051': 47, 'BK00028': 46, 'BK00041': 43, 'BK00005': 43, 'BK00053': 43, 'BK00037': 43, 'BK00012': 42, 'BK00010': 41, 'BK00011': 38, 'BK00050': 37, 'BK00020': 35, 'BK00040': 35, 'BK00021': 35, 'BK00055': 34, 'BK00061': 34, 'BK00006': 33, 'BK00058': 32, 'BK00034': 31, 'BK00066': 31, 'BK00062': 31, 'BK00003': 29, 'BK00018': 29, 'BK00016': 28, 'BK00019': 28, 'BK00030': 28, 'BK00029': 27, 'BK00032': 25, 'BK00038': 24, 'BK00039': 24, 'BK00009': 23, 'BK00035': 22, 'BK00017': 21, 'BK00046': 21, 'BK00022': 21, 'BK00002': 20, 'BK00043': 18, 'BK00064': 16, 'BK00049': 16, 'BK00014': 16, 'BK00065': 15, 'BK00036': 14, 'BK00047': 13, 'BK00048': 12, 'BK00004': 11, 'BK00044': 10, 'BK00059': 9, 'BK00015': 7, 'BK00063': 7}

[Figure: scatter plot of bankNum by plate (output_23_1.png)]

# Shops
paint('shopNum')
{'BK00045': 824, 'BK00042': 671, 'BK00031': 598, 'BK00052': 483, 'BK00054': 419, 'BK00057': 404, 'BK00051': 358, 'BK00012': 354, 'BK00001': 353, 'BK00056': 341, 'BK00032': 340, 'BK00020': 318, 'BK00021': 318, 'BK00027': 306, 'BK00041': 301, 'BK00025': 245, 'BK00018': 243, 'BK00055': 236, 'BK00061': 236, 'BK00026': 231, 'BK00016': 224, 'BK00023': 224, 'BK00040': 224, 'BK00013': 223, 'BK00038': 215, 'BK00062': 215, 'BK00035': 214, 'BK00028': 211, 'BK00022': 206, 'BK00005': 200, 'BK00060': 199, 'BK00030': 189, 'BK00034': 189, 'BK00047': 175, 'BK00017': 171, 'BK00046': 167, 'BK00049': 163, 'BK00033': 162, 'BK00037': 160, 'BK00009': 154, 'BK00010': 154, 'BK00053': 154, 'BK00002': 151, 'BK00043': 150, 'BK00066': 143, 'BK00011': 142, 'BK00024': 140, 'BK00058': 134, 'BK00014': 118, 'BK00036': 112, 'BK00065': 109, 'BK00044': 100, 'BK00006': 100, 'BK00048': 99, 'BK00019': 97, 'BK00003': 96, 'BK00007': 90, 'BK00008': 90, 'BK00050': 85, 'BK00015': 84, 'BK00039': 80, 'BK00059': 77, 'BK00064': 76, 'BK00029': 65, 'BK00004': 42, 'BK00063': 10}

[Figure: scatter plot of shopNum by plate (output_24_1.png)]

# Parks
paint('parkNum')
{'BK00042': 30, 'BK00057': 26, 'BK00045': 24, 'BK00052': 23, 'BK00054': 14, 'BK00060': 13, 'BK00020': 12, 'BK00021': 12, 'BK00056': 11, 'BK00062': 11, 'BK00041': 10, 'BK00033': 8, 'BK00053': 8, 'BK00002': 8, 'BK00022': 8, 'BK00038': 8, 'BK00055': 8, 'BK00061': 8, 'BK00031': 7, 'BK00013': 7, 'BK00007': 7, 'BK00008': 7, 'BK00040': 7, 'BK00049': 6, 'BK00050': 6, 'BK00026': 6, 'BK00043': 6, 'BK00012': 6, 'BK00065': 6, 'BK00036': 6, 'BK00034': 6, 'BK00058': 6, 'BK00064': 5, 'BK00044': 5, 'BK00027': 5, 'BK00025': 5, 'BK00006': 5, 'BK00004': 5, 'BK00015': 5, 'BK00016': 5, 'BK00018': 5, 'BK00035': 5, 'BK00001': 5, 'BK00009': 4, 'BK00003': 4, 'BK00048': 4, 'BK00029': 4, 'BK00023': 4, 'BK00032': 4, 'BK00051': 3, 'BK00028': 3, 'BK00017': 3, 'BK00005': 3, 'BK00010': 3, 'BK00030': 3, 'BK00037': 3, 'BK00011': 2, 'BK00019': 2, 'BK00066': 2, 'BK00047': 1, 'BK00024': 1, 'BK00014': 1, 'BK00039': 1, 'BK00059': 1, 'BK00046': 0, 'BK00063': 0}

[Figure: scatter plot of parkNum by plate (output_25_1.png)]

# Shopping malls
paint('mallNum')
{'BK00045': 19, 'BK00042': 16, 'BK00060': 15, 'BK00025': 14, 'BK00031': 12, 'BK00027': 11, 'BK00054': 10, 'BK00001': 10, 'BK00038': 9, 'BK00034': 9, 'BK00043': 8, 'BK00037': 8, 'BK00057': 8, 'BK00041': 7, 'BK00053': 7, 'BK00006': 7, 'BK00019': 7, 'BK00020': 7, 'BK00056': 7, 'BK00062': 7, 'BK00021': 7, 'BK00005': 6, 'BK00010': 6, 'BK00033': 6, 'BK00007': 6, 'BK00015': 6, 'BK00008': 6, 'BK00040': 6, 'BK00052': 5, 'BK00026': 5, 'BK00003': 5, 'BK00018': 5, 'BK00023': 5, 'BK00066': 5, 'BK00055': 5, 'BK00061': 5, 'BK00049': 4, 'BK00050': 4, 'BK00024': 4, 'BK00012': 4, 'BK00035': 4, 'BK00022': 4, 'BK00058': 4, 'BK00064': 3, 'BK00017': 3, 'BK00011': 3, 'BK00059': 3, 'BK00044': 2, 'BK00028': 2, 'BK00047': 2, 'BK00014': 2, 'BK00046': 2, 'BK00004': 2, 'BK00002': 2, 'BK00048': 2, 'BK00029': 2, 'BK00039': 2, 'BK00051': 1, 'BK00009': 1, 'BK00013': 1, 'BK00030': 1, 'BK00016': 0, 'BK00065': 0, 'BK00036': 0, 'BK00063': 0, 'BK00032': 0}

[Figure: scatter plot of mallNum by plate (output_26_1.png)]

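The plots above show how the amenities I picked (stations, schools, hospitals, shops, and so on) are distributed across plates. BK00045 ranks first for bus stations, pharmacies, gyms, shops, and malls, and second for public schools, hospitals, and banks, so it is the best-served plate overall.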
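
One way to back up the "best overall" reading is to rank the plates on each amenity and average the ranks. The sketch below does this for four plates only, with values copied from the printed outputs above. Treating every amenity as equally important is an assumption made for illustration, and the names `amenities` and `mean_rank` are made up.

```python
import pandas as pd

# Values copied from the printed outputs above, for four plates only
amenities = pd.DataFrame(
    {
        "busStationNum": [824, 441, 276, 461],
        "drugStoreNum": [174, 145, 118, 106],
        "gymNum": [88, 84, 64, 36],
        "shopNum": [824, 671, 483, 598],
        "mallNum": [19, 16, 5, 12],
        "hospitalNum": [11, 9, 14, 6],
        "subwayStationNum": [5, 13, 22, 6],
    },
    index=["BK00045", "BK00042", "BK00052", "BK00031"],
)

# Rank 1 = most amenities in that column; equal weights are an assumption
ranks = amenities.rank(ascending=False, method="min")
mean_rank = ranks.mean(axis=1).sort_values()
print(mean_rank)
```

Among these four plates BK00045 has the lowest average rank, even though it trails on subway stations and hospitals. Repeating this over all 66 plates and all amenity columns would give a fuller picture.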

# Supermarkets
paint('superMarketNum')
{'BK00045': 299, 'BK00042': 159, 'BK00052': 154, 'BK00057': 145, 'BK00051': 131, 'BK00056': 130, 'BK00054': 126, 'BK00031': 119, 'BK00041': 109, 'BK00020': 103, 'BK00021': 103, 'BK00046': 100, 'BK00062': 98, 'BK00038': 88, 'BK00032': 83, 'BK00055': 78, 'BK00061': 78, 'BK00040': 75, 'BK00017': 74, 'BK00001': 71, 'BK00027': 63, 'BK00012': 63, 'BK00018': 63, 'BK00047': 61, 'BK00035': 60, 'BK00034': 58, 'BK00026': 56, 'BK00013': 56, 'BK00053': 56, 'BK00022': 56, 'BK00060': 55, 'BK00028': 53, 'BK00049': 51, 'BK00058': 51, 'BK00066': 51, 'BK00016': 49, 'BK00002': 48, 'BK00033': 47, 'BK00043': 46, 'BK00037': 43, 'BK00011': 42, 'BK00014': 41, 'BK00065': 38, 'BK00036': 38, 'BK00059': 38, 'BK00009': 37, 'BK00010': 36, 'BK00007': 35, 'BK00008': 35, 'BK00044': 34, 'BK00003': 32, 'BK00019': 32, 'BK00005': 31, 'BK00048': 31, 'BK00050': 30, 'BK00025': 29, 'BK00006': 23, 'BK00064': 22, 'BK00030': 22, 'BK00024': 21, 'BK00023': 21, 'BK00039': 21, 'BK00015': 16, 'BK00029': 15, 'BK00063': 11, 'BK00004': 5}

[Figure: scatter plot of superMarketNum by plate (output_28_1.png)]

data.isnull()
ID area rentType houseType houseFloor totalFloor houseToward houseDecoration communityName city ... landTotalPrice landMeanPrice totalWorkers newWorkers residentPopulation pv uv lookNum tradeTime tradeMoney
0 False False False False False False False False False False ... False False False False False False False False False False
1 False False False False False False False False False False ... False False False False False False False False False False
2 False False False False False False False False False False ... False False False False False False False False False False
3 False False False False False False False False False False ... False False False False False False False False False False
4 False False False False False False False False False False ... False False False False False False False False False False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
41435 False False False False False False False False False False ... False False False False False False False False False False
41436 False False False False False False False False False False ... False False False False False False False False False False
41437 False False False False False False False False False False ... False False False False False False False False False False
41438 False False False False False False False False False False ... False False False False False False False False False False
41439 False False False False False False False False False False ... False False False False False False False False False False

41440 rows × 51 columns

# Locate the missing values
data.isnull().sum()
ID                     0
area                   0
rentType               0
houseType              0
houseFloor             0
totalFloor             0
houseToward            0
houseDecoration        0
communityName          0
city                   0
region                 0
plate                  0
buildYear              0
saleSecHouseNum        0
subwayStationNum       0
busStationNum          0
interSchoolNum         0
schoolNum              0
privateSchoolNum       0
hospitalNum            0
drugStoreNum           0
gymNum                 0
bankNum                0
shopNum                0
parkNum                0
mallNum                0
superMarketNum         0
totalTradeMoney        0
totalTradeArea         0
tradeMeanPrice         0
tradeSecNum            0
totalNewTradeMoney     0
totalNewTradeArea      0
tradeNewMeanPrice      0
tradeNewNum            0
remainNewNum           0
supplyNewNum           0
supplyLandNum          0
supplyLandArea         0
tradeLandNum           0
tradeLandArea          0
landTotalPrice         0
landMeanPrice          0
totalWorkers           0
newWorkers             0
residentPopulation     0
pv                    18
uv                    18
lookNum                0
tradeTime              0
tradeMoney             0
dtype: int64
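# pv and uv each have some missing values
# Summarize the missing values, following the style of the reference solution
missing_values = pd.DataFrame(data.isnull().sum(), columns=['missingNum'])
missing_values['existNum'] = len(data) - missing_values['missingNum']
missing_values['sum'] = len(data)
# The ratio is tiny, so express it as a percentage
missing_values['missingRadio'] = missing_values['missingNum'] / len(data) * 100
missing_values['dtype'] = data.dtypes  # data type of each column
missing_values = missing_values[missing_values['missingNum'] > 0]
missing_values
# Comparing the table below with the CSV file shows that the missing pv and uv values sit in the same rows, so those rows are candidates for removal later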
missingNum existNum sum missingRadio dtype
pv 18 41422 41440 0.043436 float64
uv 18 41422 41440 0.043436 float64
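The claim that the missing pv and uv values sit in the same rows can be checked directly instead of by inspecting the CSV. The snippet below is a minimal sketch on a made-up toy frame (`toy` and its values are illustrative, not taken from the competition data); it compares the two null masks and shows that dropping on either column removes the same rows.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `data`; the values are invented for illustration.
toy = pd.DataFrame({
    "pv": [1124.0, np.nan, 57.0, np.nan],
    "uv": [284.0, np.nan, 20.0, np.nan],
    "tradeMoney": [2000.0, 2000.0, 16000.0, 1600.0],
})

# True only if pv and uv are missing in exactly the same rows.
same_rows = bool((toy["pv"].isnull() == toy["uv"].isnull()).all())

# Dropping rows with a missing pv or uv then removes the shared rows once.
cleaned = toy.dropna(subset=["pv", "uv"])
print(same_rows, len(toy), len(cleaned))
```

On the real data the same two lines would be run against `data` rather than `toy`.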
# Look at the mean of pv first
print(data['pv'].mean())
data.isnull().sum()
26945.663512143306





ID                     0
area                   0
rentType               0
houseType              0
houseFloor             0
totalFloor             0
houseToward            0
houseDecoration        0
communityName          0
city                   0
region                 0
plate                  0
buildYear              0
saleSecHouseNum        0
subwayStationNum       0
busStationNum          0
interSchoolNum         0
schoolNum              0
privateSchoolNum       0
hospitalNum            0
drugStoreNum           0
gymNum                 0
bankNum                0
shopNum                0
parkNum                0
mallNum                0
superMarketNum         0
totalTradeMoney        0
totalTradeArea         0
tradeMeanPrice         0
tradeSecNum            0
totalNewTradeMoney     0
totalNewTradeArea      0
tradeNewMeanPrice      0
tradeNewNum            0
remainNewNum           0
supplyNewNum           0
supplyLandNum          0
supplyLandArea         0
tradeLandNum           0
tradeLandArea          0
landTotalPrice         0
landMeanPrice          0
totalWorkers           0
newWorkers             0
residentPopulation     0
pv                    18
uv                    18
lookNum                0
tradeTime              0
tradeMoney             0
dtype: int64
# print(data['pv'].isnull())
# for i in range(len(data['pv'].isnull())):
#     if(data['pv'].isnull()[i]==True):
#         print(i)
# data
# Fill the missing values with the mean of each numeric column
data = data.fillna(data.mean(numeric_only=True))
data.isnull().sum()
ID                    0
area                  0
rentType              0
houseType             0
houseFloor            0
totalFloor            0
houseToward           0
houseDecoration       0
communityName         0
city                  0
region                0
plate                 0
buildYear             0
saleSecHouseNum       0
subwayStationNum      0
busStationNum         0
interSchoolNum        0
schoolNum             0
privateSchoolNum      0
hospitalNum           0
drugStoreNum          0
gymNum                0
bankNum               0
shopNum               0
parkNum               0
mallNum               0
superMarketNum        0
totalTradeMoney       0
totalTradeArea        0
tradeMeanPrice        0
tradeSecNum           0
totalNewTradeMoney    0
totalNewTradeArea     0
tradeNewMeanPrice     0
tradeNewNum           0
remainNewNum          0
supplyNewNum          0
supplyLandNum         0
supplyLandArea        0
tradeLandNum          0
tradeLandArea         0
landTotalPrice        0
landMeanPrice         0
totalWorkers          0
newWorkers            0
residentPopulation    0
pv                    0
uv                    0
lookNum               0
tradeTime             0
tradeMoney            0
dtype: int64
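Since only pv and uv have gaps, the fill can also be restricted to those two columns so that nothing else is touched. This is a small sketch on an invented toy frame (`toy` and `fill_values` are illustrative names), not the post's original code.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps only in pv and uv; values are invented for illustration.
toy = pd.DataFrame({
    "pv": [100.0, 200.0, np.nan, 900.0],
    "uv": [10.0, np.nan, 30.0, 50.0],
    "rentType": ["整租", "合租", "整租", "合租"],
})

# Column means computed only for the two columns that need filling.
fill_values = toy[["pv", "uv"]].mean()

# fillna with a Series fills each column with the value under its own label.
filled = toy.fillna(fill_values)
print(filled)
```

Swapping `.mean()` for `.median()` would give a fill value that is less sensitive to extreme page-view counts, if that turns out to matter.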
for i in ['rentType', 'houseType', 'houseFloor', 'region', 'plate', 'houseToward', 'houseDecoration',
          'communityName', 'city', 'region', 'plate', 'buildYear']:
    print(i + "的特徵分佈如下:")
    print(data[i].value_counts())
    # Tunable parameters: bins sets the number of bins on the x-axis, alpha sets the transparency
    # plt.hist(data, bins=30, density=True, alpha=0.5, histtype='stepfilled',
    #          color='steelblue', edgecolor='none')
    if i == "communityName":  # too many distinct values to plot
        continue
    plt.figure(figsize=(15, 6))
    plt.hist(data[i], bins=3)  # histogram
    plt.show()
rentType的特徵分佈如下:
未知方式    30759
整租       5472
合租       5204
--          5
Name: rentType, dtype: int64

[Figure: output_34_1.png]


houseType的特徵分佈如下:
1室1廳1衛    9805
2室1廳1衛    8512
2室2廳1衛    6783
3室1廳1衛    3992
3室2廳2衛    2737
          ... 
6室2廳5衛       1
8室2廳4衛       1
7室1廳7衛       1
8室3廳4衛       1
6室2廳6衛       1
Name: houseType, Length: 104, dtype: int64

[Figure: output_34_4.png]


houseFloor的特徵分佈如下:
中    15458
高    14066
低    11916
Name: houseFloor, dtype: int64

[Figure: output_34_7.png]


region的特徵分佈如下:
RG00002    11437
RG00005     5739
RG00003     4186
RG00010     3640
RG00012     3368
RG00004     3333
RG00006     1961
RG00007     1610
RG00008     1250
RG00013     1215
RG00001     1157
RG00014     1069
RG00011      793
RG00009      681
RG00015        1
Name: region, dtype: int64

[Figure: output_34_10.png]


plate的特徵分佈如下:
BK00031    1958
BK00033    1837
BK00045    1816
BK00055    1566
BK00056    1516
           ... 
BK00016      40
BK00036      33
BK00058      15
BK00032       3
BK00001       1
Name: plate, Length: 66, dtype: int64

[Figure: output_34_13.png]


houseToward的特徵分佈如下:
南       34377
南北       2254
北        2043
暫無數據      963
東南        655
東         552
西         264
西南        250
西北         58
東西         24
Name: houseToward, dtype: int64

[Figure: output_34_16.png]


houseDecoration的特徵分佈如下:
其他    29040
精裝    10918
簡裝     1171
毛坯      311
Name: houseDecoration, dtype: int64

[Figure: output_34_19.png]


communityName的特徵分佈如下:
XQ01834    358
XQ01274    192
XQ02273    188
XQ03110    185
XQ02337    173
          ... 
XQ02484      1
XQ02672      1
XQ00390      1
XQ00560      1
XQ02928      1
Name: communityName, Length: 4236, dtype: int64
city的特徵分佈如下:
SH    41440
Name: city, dtype: int64

[Figure: output_34_22.png]


region的特徵分佈如下:
RG00002    11437
RG00005     5739
RG00003     4186
RG00010     3640
RG00012     3368
RG00004     3333
RG00006     1961
RG00007     1610
RG00008     1250
RG00013     1215
RG00001     1157
RG00014     1069
RG00011      793
RG00009      681
RG00015        1
Name: region, dtype: int64

[Figure: output_34_25.png]


plate的特徵分佈如下:
BK00031    1958
BK00033    1837
BK00045    1816
BK00055    1566
BK00056    1516
           ... 
BK00016      40
BK00036      33
BK00058      15
BK00032       3
BK00001       1
Name: plate, Length: 66, dtype: int64

[Figure: output_34_28.png]


buildYear的特徵分佈如下:
1994    2851
暫無信息    2808
2006    2007
2007    1851
2008    1849
        ... 
1939       2
1961       2
1962       1
1951       1
1950       1
Name: buildYear, Length: 80, dtype: int64

[Figure: output_34_31.png]
# Frequency analysis of the non-numeric columns (only values with more than 100 occurrences are shown)
for i in['rentType', 'houseType', 'houseFloor', 'region', 'plate', 'houseToward', 'houseDecoration',
    'communityName','city','region','plate','buildYear']:
    da = pd.DataFrame(data[i].value_counts()).reset_index()
    da.columns = [i,'counts']
    print(da[da['counts']>100])
  rentType  counts
0     未知方式   30759
1       整租    5472
2       合租    5204
   houseType  counts
0     1室1廳1衛    9805
1     2室1廳1衛    8512
2     2室2廳1衛    6783
3     3室1廳1衛    3992
4     3室2廳2衛    2737
5     4室1廳1衛    1957
6     3室2廳1衛    1920
7     1室0廳1衛    1286
8     1室2廳1衛     933
9     2室2廳2衛     881
10    4室2廳2衛     435
11    2室0廳1衛     419
12    4室2廳3衛     273
13    5室1廳1衛     197
14    2室1廳2衛     155
15    3室2廳3衛     149
16    3室1廳2衛     135
  houseFloor  counts
0          中   15458
1          高   14066
2          低   11916
     region  counts
0   RG00002   11437
1   RG00005    5739
2   RG00003    4186
3   RG00010    3640
4   RG00012    3368
5   RG00004    3333
6   RG00006    1961
7   RG00007    1610
8   RG00008    1250
9   RG00013    1215
10  RG00001    1157
11  RG00014    1069
12  RG00011     793
13  RG00009     681
      plate  counts
0   BK00031    1958
1   BK00033    1837
2   BK00045    1816
3   BK00055    1566
4   BK00056    1516
5   BK00052    1375
6   BK00017    1305
7   BK00041    1266
8   BK00054    1256
9   BK00051    1253
10  BK00046    1227
11  BK00035    1156
12  BK00042    1137
13  BK00009    1016
14  BK00050     979
15  BK00043     930
16  BK00026     906
17  BK00047     880
18  BK00034     849
19  BK00013     834
20  BK00053     819
21  BK00028     745
22  BK00040     679
23  BK00060     671
24  BK00010     651
25  BK00029     646
26  BK00062     618
27  BK00022     614
28  BK00018     613
29  BK00064     590
30  BK00005     549
31  BK00003     523
32  BK00014     500
33  BK00019     498
34  BK00061     477
35  BK00011     455
36  BK00037     444
37  BK00012     412
38  BK00038     398
39  BK00024     397
40  BK00020     384
41  BK00002     357
42  BK00065     348
43  BK00027     344
44  BK00039     343
45  BK00063     281
46  BK00057     278
47  BK00015     253
48  BK00006     231
49  BK00021     226
50  BK00007     225
51  BK00066     219
52  BK00030     219
53  BK00049     211
54  BK00008     210
55  BK00004     189
56  BK00048     165
57  BK00025     157
58  BK00023     127
59  BK00059     122
  houseToward  counts
0           南   34377
1          南北    2254
2           北    2043
3        暫無數據     963
4          東南     655
5           東     552
6           西     264
7          西南     250
  houseDecoration  counts
0              其他   29040
1              精裝   10918
2              簡裝    1171
3              毛坯     311
   communityName  counts
0        XQ01834     358
1        XQ01274     192
2        XQ02273     188
3        XQ03110     185
4        XQ02337     173
5        XQ01389     166
6        XQ01658     163
7        XQ02789     152
8        XQ00530     151
9        XQ01561     151
10       XQ01339     132
11       XQ00826     122
12       XQ01873     122
13       XQ02296     121
14       XQ01232     119
15       XQ01401     118
16       XQ02441     117
17       XQ00196     115
18       XQ02365     109
19       XQ01207     109
20       XQ01410     108
21       XQ00852     105
22       XQ02072     103
23       XQ01672     103
  city  counts
0   SH   41440
     region  counts
0   RG00002   11437
1   RG00005    5739
2   RG00003    4186
3   RG00010    3640
4   RG00012    3368
5   RG00004    3333
6   RG00006    1961
7   RG00007    1610
8   RG00008    1250
9   RG00013    1215
10  RG00001    1157
11  RG00014    1069
12  RG00011     793
13  RG00009     681
      plate  counts
0   BK00031    1958
1   BK00033    1837
2   BK00045    1816
3   BK00055    1566
4   BK00056    1516
5   BK00052    1375
6   BK00017    1305
7   BK00041    1266
8   BK00054    1256
9   BK00051    1253
10  BK00046    1227
11  BK00035    1156
12  BK00042    1137
13  BK00009    1016
14  BK00050     979
15  BK00043     930
16  BK00026     906
17  BK00047     880
18  BK00034     849
19  BK00013     834
20  BK00053     819
21  BK00028     745
22  BK00040     679
23  BK00060     671
24  BK00010     651
25  BK00029     646
26  BK00062     618
27  BK00022     614
28  BK00018     613
29  BK00064     590
30  BK00005     549
31  BK00003     523
32  BK00014     500
33  BK00019     498
34  BK00061     477
35  BK00011     455
36  BK00037     444
37  BK00012     412
38  BK00038     398
39  BK00024     397
40  BK00020     384
41  BK00002     357
42  BK00065     348
43  BK00027     344
44  BK00039     343
45  BK00063     281
46  BK00057     278
47  BK00015     253
48  BK00006     231
49  BK00021     226
50  BK00007     225
51  BK00066     219
52  BK00030     219
53  BK00049     211
54  BK00008     210
55  BK00004     189
56  BK00048     165
57  BK00025     157
58  BK00023     127
59  BK00059     122
   buildYear  counts
0       1994    2851
1       暫無信息    2808
2       2006    2007
3       2007    1851
4       2008    1849
5       2005    1814
6       2010    1774
7       1995    1685
8       1993    1543
9       2011    1498
10      2004    1431
11      2009    1271
12      2014    1238
13      2003    1156
14      1997    1125
15      2002    1120
16      2012    1049
17      1996     991
18      2000     925
19      2001     898
20      2015     840
21      1999     822
22      1998     733
23      2013     714
24      1987     632
25      1983     612
26      1991     545
27      1984     493
28      1980     452
29      1990     431
30      1988     423
31      1989     419
32      1985     359
33      1982     344
34      1986     320
35      1992     308
36      1976     251
37      1957     227
38      1981     221
39      1956     153
40      1977     153
41      2016     140
42      1978     133
43      1958     122
44      1979     116
45      1954     101
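The frequency tables above only display the values that pass the count threshold. One possible use of such a threshold, which the post itself does not do, is to merge the rare values of a categorical column into a single bucket before encoding. The sketch below uses an invented series and a tiny threshold; `min_count` and the `"other"` label are hypothetical choices.

```python
import pandas as pd

# Invented sample of houseToward values for illustration.
toward = pd.Series(["南"] * 5 + ["北"] * 3 + ["西北"] + ["東西"])

counts = toward.value_counts()
min_count = 2  # the post's tables use 100 on the full data

# Keep values seen more than min_count times, relabel everything else.
frequent = counts[counts > min_count].index
grouped = toward.where(toward.isin(frequent), other="other")
print(grouped.value_counts())
```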
# Analyze the target label; seaborn (sns) is a handy package for plotting distributions
# Label distribution
fig,axes = plt.subplots(2,3)
fig.set_size_inches(20,12)
sns.distplot(data['tradeMoney'],ax=axes[0][0])
sns.distplot(data[(data['tradeMoney']<=20000)]['tradeMoney'],ax=axes[0][1])
sns.distplot(data[(data['tradeMoney']>20000)&(data['tradeMoney']<=50000)]['tradeMoney'],ax=axes[0][2])
sns.distplot(data[(data['tradeMoney']>50000)&(data['tradeMoney']<=100000)]['tradeMoney'],ax=axes[1][0])
sns.distplot(data[(data['tradeMoney']>100000)]['tradeMoney'],ax=axes[1][1])
[Figure: output_36_1.png]

print('money_all',len(data['tradeMoney']))
print('money<=10000',len(data[(data['tradeMoney']<=10000)]))
print("10000<money<=20000",len(data[(data['tradeMoney']>10000)&(data['tradeMoney']<=20000)]['tradeMoney']))
print("20000<money<=50000",len(data[(data['tradeMoney']>20000)&(data['tradeMoney']<=50000)]['tradeMoney']))
print("50000<money<=100000",len(data[(data['tradeMoney']>50000)&(data['tradeMoney']<=100000)]['tradeMoney']))
print("100000<money",len(data[(data['tradeMoney']>100000)]['tradeMoney']))
money_all 41440
money<=10000 38964
10000<money<=20000 1985
20000<money<=50000 433
50000<money<=100000 39
100000<money 19
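The same bucket counts can be produced in one pass with `pd.cut`, which avoids repeating the boundary conditions by hand. This is a minimal sketch on an invented series; `edges` and `labels` are illustrative names, and the boundaries mirror the ones used in the prints above.

```python
import pandas as pd

# Invented rents for illustration; on the real data this would be data['tradeMoney'].
rent = pd.Series([1500.0, 10000.0, 12000.0, 20000.0, 45000.0, 80000.0, 150000.0])

edges = [0, 10000, 20000, 50000, 100000, float("inf")]
labels = ["<=10000", "10000-20000", "20000-50000", "50000-100000", ">100000"]

# right=True makes each bucket include its upper edge, matching the <= tests above.
buckets = pd.cut(rent, bins=edges, labels=labels, right=True, include_lowest=True)
counts = buckets.value_counts(sort=False)
print(counts)
```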
# Split houseType (e.g. "3室2廳1衛") into room, living-room, and bathroom counts
room = []
living_room = []
bathroom = []
for i in data['houseType']:
    room.append(float(i.split('室')[0]))
    living_room.append(float(i.split('室')[-1].split('廳')[0]))
    # the bathroom count sits between '廳' and '衛'
    bathroom.append(float(i.split('廳')[-1].split('衛')[0]))
data['roomNum'] = room
data['living_room'] = living_room
data['bathroom'] = bathroom
data = data.drop(['houseType'],axis=1)
data
ID area rentType houseFloor totalFloor houseToward houseDecoration communityName city region ... newWorkers residentPopulation pv uv lookNum tradeTime tradeMoney roomNum living_room bathroom
0 100309852 68.06 未知方式 16 暫無數據 其他 XQ00051 SH RG00001 ... 614 111546 1124.0 284.0 0 2018/11/28 2000.0 2.0 1.0 1.0
1 100307942 125.55 未知方式 14 暫無數據 簡裝 XQ00130 SH RG00002 ... 148 157552 701.0 22.0 1 2018/12/16 2000.0 3.0 2.0 2.0
2 100307764 132.00 未知方式 32 暫無數據 其他 XQ00179 SH RG00002 ... 520 131744 57.0 20.0 1 2018/12/22 16000.0 3.0 2.0 2.0
3 100306518 57.00 未知方式 17 暫無數據 精裝 XQ00313 SH RG00002 ... 1665 253337 888.0 279.0 9 2018/12/21 1600.0 1.0 1.0 1.0
4 100305262 129.00 未知方式 2 暫無數據 毛坯 XQ01257 SH RG00003 ... 117 125309 2038.0 480.0 0 2018/11/18 2900.0 3.0 2.0 2.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
41435 100000438 10.00 合租 11 精裝 XQ01209 SH RG00002 ... 0 245872 29635.0 2662.0 0 2018/2/5 2190.0 4.0 1.0 1.0
41436 100000201 7.10 合租 6 精裝 XQ00853 SH RG00002 ... 0 306857 28213.0 2446.0 0 2018/1/22 2090.0 3.0 1.0 1.0
41437 100000198 9.20 合租 18 精裝 XQ00852 SH RG00002 ... 0 306857 19231.0 2016.0 0 2018/2/8 3190.0 4.0 1.0 1.0
41438 100000182 14.10 合租 8 精裝 XQ00791 SH RG00002 ... 0 306857 17471.0 2554.0 0 2018/3/22 2460.0 4.0 1.0 1.0
41439 100000041 33.50 未知方式 19 其他 XQ03246 SH RG00010 ... 990 406803 2556.0 717.0 1 2018/10/21 3000.0 1.0 1.0 1.0

41440 rows × 53 columns
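The three counts can also be pulled out of houseType with a single regular expression. The sketch below assumes every value follows the "N室N廳N衛" pattern seen in the frequency tables; the sample strings are illustrative.

```python
import pandas as pd

# A few houseType strings in the format shown in the tables above.
house_type = pd.Series(["1室1廳1衛", "3室2廳1衛", "4室2廳3衛"])

# One capture group per count: rooms, living rooms, bathrooms.
parts = house_type.str.extract(r"(\d+)室(\d+)廳(\d+)衛").astype(float)
parts.columns = ["roomNum", "living_room", "bathroom"]
print(parts)
```

A value that does not match the pattern would come back as NaN in all three columns, which makes malformed entries easy to spot.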

# houseFloor (低/中/高) is ordered and is related to totalFloor
# res=[]
# for i in range(len(data['houseFloor'])):
# #     print(i)
# #     print(type(i))
#     if(data['houseFloor'][i]=='低'):
#        res.append(1)
#     elif(i=='中'):
#         res.append(2)
#     else:
#         res.append(3)
# data['houseFloor'] = room
# data = data.infer_objects()
# data.info()
# After some experimenting, the floor level appears to be determined by the floor count, so the column can be dropped
data = data.drop(['houseFloor'],axis=1)
data
ID area rentType totalFloor houseToward houseDecoration communityName city region plate ... newWorkers residentPopulation pv uv lookNum tradeTime tradeMoney roomNum living_room bathroom
0 100309852 68.06 未知方式 16 暫無數據 其他 XQ00051 SH RG00001 BK00064 ... 614 111546 1124.0 284.0 0 2018/11/28 2000.0 2.0 1.0 1.0
1 100307942 125.55 未知方式 14 暫無數據 簡裝 XQ00130 SH RG00002 BK00049 ... 148 157552 701.0 22.0 1 2018/12/16 2000.0 3.0 2.0 2.0
2 100307764 132.00 未知方式 32 暫無數據 其他 XQ00179 SH RG00002 BK00050 ... 520 131744 57.0 20.0 1 2018/12/22 16000.0 3.0 2.0 2.0
3 100306518 57.00 未知方式 17 暫無數據 精裝 XQ00313 SH RG00002 BK00051 ... 1665 253337 888.0 279.0 9 2018/12/21 1600.0 1.0 1.0 1.0
4 100305262 129.00 未知方式 2 暫無數據 毛坯 XQ01257 SH RG00003 BK00044 ... 117 125309 2038.0 480.0 0 2018/11/18 2900.0 3.0 2.0 2.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
41435 100000438 10.00 合租 11 精裝 XQ01209 SH RG00002 BK00062 ... 0 245872 29635.0 2662.0 0 2018/2/5 2190.0 4.0 1.0 1.0
41436 100000201 7.10 合租 6 精裝 XQ00853 SH RG00002 BK00055 ... 0 306857 28213.0 2446.0 0 2018/1/22 2090.0 3.0 1.0 1.0
41437 100000198 9.20 合租 18 精裝 XQ00852 SH RG00002 BK00055 ... 0 306857 19231.0 2016.0 0 2018/2/8 3190.0 4.0 1.0 1.0
41438 100000182 14.10 合租 8 精裝 XQ00791 SH RG00002 BK00055 ... 0 306857 17471.0 2554.0 0 2018/3/22 2460.0 4.0 1.0 1.0
41439 100000041 33.50 未知方式 19 其他 XQ03246 SH RG00010 BK00020 ... 990 406803 2556.0 717.0 1 2018/10/21 3000.0 1.0 1.0 1.0

41440 rows × 52 columns
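The commented-out attempt above was heading toward an ordinal encoding of houseFloor. As an alternative to dropping the column, here is a minimal sketch of that encoding with `Series.map`, assuming the column only contains the three labels 低, 中, and 高; `floor_rank` and `houseFloorRank` are hypothetical names.

```python
import pandas as pd

# Invented rows for illustration.
toy = pd.DataFrame({
    "houseFloor": ["低", "中", "高", "中"],
    "totalFloor": [6, 18, 32, 11],
})

# Ordered mapping: low < middle < high.
floor_rank = {"低": 1, "中": 2, "高": 3}
toy["houseFloorRank"] = toy["houseFloor"].map(floor_rank)
print(toy)
```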

# ID is a unique identifier with no predictive value, so it can be dropped
data = data.drop(['ID'],axis = 1)
data
area rentType totalFloor houseToward houseDecoration communityName city region plate buildYear ... newWorkers residentPopulation pv uv lookNum tradeTime tradeMoney roomNum living_room bathroom
0 68.06 未知方式 16 暫無數據 其他 XQ00051 SH RG00001 BK00064 1953 ... 614 111546 1124.0 284.0 0 2018/11/28 2000.0 2.0 1.0 1.0
1 125.55 未知方式 14 暫無數據 簡裝 XQ00130 SH RG00002 BK00049 2007 ... 148 157552 701.0 22.0 1 2018/12/16 2000.0 3.0 2.0 2.0
2 132.00 未知方式 32 暫無數據 其他 XQ00179 SH RG00002 BK00050 暫無信息 ... 520 131744 57.0 20.0 1 2018/12/22 16000.0 3.0 2.0 2.0
3 57.00 未知方式 17 暫無數據 精裝 XQ00313 SH RG00002 BK00051 暫無信息 ... 1665 253337 888.0 279.0 9 2018/12/21 1600.0 1.0 1.0 1.0
4 129.00 未知方式 2 暫無數據 毛坯 XQ01257 SH RG00003 BK00044 暫無信息 ... 117 125309 2038.0 480.0 0 2018/11/18 2900.0 3.0 2.0 2.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
41435 10.00 合租 11 精裝 XQ01209 SH RG00002 BK00062 2009 ... 0 245872 29635.0 2662.0 0 2018/2/5 2190.0 4.0 1.0 1.0
41436 7.10 合租 6 精裝 XQ00853 SH RG00002 BK00055 2004 ... 0 306857 28213.0 2446.0 0 2018/1/22 2090.0 3.0 1.0 1.0
41437 9.20 合租 18 精裝 XQ00852 SH RG00002 BK00055 2000 ... 0 306857 19231.0 2016.0 0 2018/2/8 3190.0 4.0 1.0 1.0
41438 14.10 合租 8 精裝 XQ00791 SH RG00002 BK00055 1998 ... 0 306857 17471.0 2554.0 0 2018/3/22 2460.0 4.0 1.0 1.0
41439 33.50 未知方式 19 其他 XQ03246 SH RG00010 BK00020 2015 ... 990 406803 2556.0 717.0 1 2018/10/21 3000.0 1.0 1.0 1.0

41440 rows × 51 columns

pd.get_dummies(data.rentType)
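# One-hot encoding reveals another kind of missing value: the '--' column. The pv/uv gaps were handled above; with this many samples, the 5 '--' rows could simply be dropped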
-- 合租 整租 未知方式
0 0 0 0 1
1 0 0 0 1
2 0 0 0 1
3 0 0 0 1
4 0 0 0 1
... ... ... ... ...
41435 0 1 0 0
41436 0 1 0 0
41437 0 1 0 0
41438 0 1 0 0
41439 0 0 0 1

41440 rows × 4 columns

print(data['rentType'].value_counts())  # counts per rental type
# '未知方式' is by far the most frequent value (the mode), so use it to replace the '--' placeholder
data['rentType'] = data['rentType'].replace('--', '未知方式')
print(data['rentType'].value_counts())  # counts per rental type after the replacement
未知方式    30759
整租       5472
合租       5204
--          5
Name: rentType, dtype: int64
未知方式    30764
整租       5472
合租       5204
Name: rentType, dtype: int64
# Append the one-hot columns for rentType; the original column is dropped in the next step
data = data.join(pd.get_dummies(data.rentType))
data
area rentType totalFloor houseToward houseDecoration communityName city region plate buildYear ... uv lookNum tradeTime tradeMoney roomNum living_room bathroom 合租 整租 未知方式
0 68.06 未知方式 16 暫無數據 其他 XQ00051 SH RG00001 BK00064 1953 ... 284.0 0 2018/11/28 2000.0 2.0 1.0 1.0 0 0 1
1 125.55 未知方式 14 暫無數據 簡裝 XQ00130 SH RG00002 BK00049 2007 ... 22.0 1 2018/12/16 2000.0 3.0 2.0 2.0 0 0 1
2 132.00 未知方式 32 暫無數據 其他 XQ00179 SH RG00002 BK00050 暫無信息 ... 20.0 1 2018/12/22 16000.0 3.0 2.0 2.0 0 0 1
3 57.00 未知方式 17 暫無數據 精裝 XQ00313 SH RG00002 BK00051 暫無信息 ... 279.0 9 2018/12/21 1600.0 1.0 1.0 1.0 0 0 1
4 129.00 未知方式 2 暫無數據 毛坯 XQ01257 SH RG00003 BK00044 暫無信息 ... 480.0 0 2018/11/18 2900.0 3.0 2.0 2.0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
41435 10.00 合租 11 精裝 XQ01209 SH RG00002 BK00062 2009 ... 2662.0 0 2018/2/5 2190.0 4.0 1.0 1.0 1 0 0
41436 7.10 合租 6 精裝 XQ00853 SH RG00002 BK00055 2004 ... 2446.0 0 2018/1/22 2090.0 3.0 1.0 1.0 1 0 0
41437 9.20 合租 18 精裝 XQ00852 SH RG00002 BK00055 2000 ... 2016.0 0 2018/2/8 3190.0 4.0 1.0 1.0 1 0 0
41438 14.10 合租 8 精裝 XQ00791 SH RG00002 BK00055 1998 ... 2554.0 0 2018/3/22 2460.0 4.0 1.0 1.0 1 0 0
41439 33.50 未知方式 19 其他 XQ03246 SH RG00010 BK00020 2015 ... 717.0 1 2018/10/21 3000.0 1.0 1.0 1.0 0 0 1

41440 rows × 54 columns


data = data.drop(['rentType'],axis=1)
data
area totalFloor houseToward houseDecoration communityName city region plate buildYear saleSecHouseNum ... uv lookNum tradeTime tradeMoney roomNum living_room bathroom 合租 整租 未知方式
0 68.06 16 暫無數據 其他 XQ00051 SH RG00001 BK00064 1953 0 ... 284.0 0 2018/11/28 2000.0 2.0 1.0 1.0 0 0 1
1 125.55 14 暫無數據 簡裝 XQ00130 SH RG00002 BK00049 2007 0 ... 22.0 1 2018/12/16 2000.0 3.0 2.0 2.0 0 0 1
2 132.00 32 暫無數據 其他 XQ00179 SH RG00002 BK00050 暫無信息 3 ... 20.0 1 2018/12/22 16000.0 3.0 2.0 2.0 0 0 1
3 57.00 17 暫無數據 精裝 XQ00313 SH RG00002 BK00051 暫無信息 0 ... 279.0 9 2018/12/21 1600.0 1.0 1.0 1.0 0 0 1
4 129.00 2 暫無數據 毛坯 XQ01257 SH RG00003 BK00044 暫無信息 1 ... 480.0 0 2018/11/18 2900.0 3.0 2.0 2.0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
41435 10.00 11 精裝 XQ01209 SH RG00002 BK00062 2009 0 ... 2662.0 0 2018/2/5 2190.0 4.0 1.0 1.0 1 0 0
41436 7.10 6 精裝 XQ00853 SH RG00002 BK00055 2004 0 ... 2446.0 0 2018/1/22 2090.0 3.0 1.0 1.0 1 0 0
41437 9.20 18 精裝 XQ00852 SH RG00002 BK00055 2000 0 ... 2016.0 0 2018/2/8 3190.0 4.0 1.0 1.0 1 0 0
41438 14.10 8 精裝 XQ00791 SH RG00002 BK00055 1998 0 ... 2554.0 0 2018/3/22 2460.0 4.0 1.0 1.0 1 0 0
41439 33.50 19 其他 XQ03246 SH RG00010 BK00020 2015 3 ... 717.0 1 2018/10/21 3000.0 1.0 1.0 1.0 0 0 1

41440 rows × 53 columns
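The join-then-drop sequence can be collapsed into one call by passing `columns=` to `pd.get_dummies`, which replaces the original column and prefixes the new ones. This is a sketch on an invented frame; the `rentType_` prefix is a choice made here, not something the post uses.

```python
import pandas as pd

# Invented rows for illustration.
toy = pd.DataFrame({
    "area": [68.06, 10.0, 33.5],
    "rentType": ["未知方式", "合租", "整租"],
})

# Encodes rentType in place: the source column is removed, dummies are prefixed.
encoded = pd.get_dummies(toy, columns=["rentType"], prefix="rentType")
print(encoded.columns.tolist())
```

The prefix keeps the new columns recognizable once other categorical features are encoded the same way.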

print(data['houseDecoration'].value_counts())  # no clear idea yet for handling this column
其他    29040
精裝    10918
簡裝     1171
毛坯      311
Name: houseDecoration, dtype: int64
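# For buildYear, replace '暫無信息' with the mean of the known years
num_sum = 0
j = 0
for i in data['buildYear']:
    if i != "暫無信息":
        j += 1
        num_sum += float(i)
mean1 = num_sum / j
# Assign through .loc to avoid chained assignment; the value is stored as a string, so the column stays object-typed
data.loc[data['buildYear'] == '暫無信息', 'buildYear'] = str(mean1)
data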
area totalFloor houseToward houseDecoration communityName city region plate buildYear saleSecHouseNum ... uv lookNum tradeTime tradeMoney roomNum living_room bathroom 合租 整租 未知方式
0 68.06 16 暫無數據 其他 XQ00051 SH RG00001 BK00064 1953 0 ... 284.0 0 2018/11/28 2000.0 2.0 1.0 1.0 0 0 1
1 125.55 14 暫無數據 簡裝 XQ00130 SH RG00002 BK00049 2007 0 ... 22.0 1 2018/12/16 2000.0 3.0 2.0 2.0 0 0 1
2 132.00 32 暫無數據 其他 XQ00179 SH RG00002 BK00050 1999.3850952578173 3 ... 20.0 1 2018/12/22 16000.0 3.0 2.0 2.0 0 0 1
3 57.00 17 暫無數據 精裝 XQ00313 SH RG00002 BK00051 1999.3850952578173 0 ... 279.0 9 2018/12/21 1600.0 1.0 1.0 1.0 0 0 1
4 129.00 2 暫無數據 毛坯 XQ01257 SH RG00003 BK00044 1999.3850952578173 1 ... 480.0 0 2018/11/18 2900.0 3.0 2.0 2.0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
41435 10.00 11 精裝 XQ01209 SH RG00002 BK00062 2009 0 ... 2662.0 0 2018/2/5 2190.0 4.0 1.0 1.0 1 0 0
41436 7.10 6 精裝 XQ00853 SH RG00002 BK00055 2004 0 ... 2446.0 0 2018/1/22 2090.0 3.0 1.0 1.0 1 0 0
41437 9.20 18 精裝 XQ00852 SH RG00002 BK00055 2000 0 ... 2016.0 0 2018/2/8 3190.0 4.0 1.0 1.0 1 0 0
41438 14.10 8 精裝 XQ00791 SH RG00002 BK00055 1998 0 ... 2554.0 0 2018/3/22 2460.0 4.0 1.0 1.0 1 0 0
41439 33.50 19 其他 XQ03246 SH RG00010 BK00020 2015 3 ... 717.0 1 2018/10/21 3000.0 1.0 1.0 1.0 0 0 1

41440 rows × 53 columns
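Because the filled value is written back as a string, buildYear remains an object column. A possible alternative is to convert the column to numbers first and fill afterwards, so it ends up as float64. The sketch below uses an invented series and is not the post's original approach.

```python
import pandas as pd

# Invented sample: three known years and one placeholder.
build_year = pd.Series(["1953", "2007", "暫無信息", "2009"])

# errors="coerce" turns anything that is not a number into NaN.
numeric_year = pd.to_numeric(build_year, errors="coerce")

# Fill the gaps with the mean of the known years.
numeric_year = numeric_year.fillna(numeric_year.mean())
print(numeric_year)
```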

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41440 entries, 0 to 41439
Data columns (total 53 columns):
area                  41440 non-null float64
totalFloor            41440 non-null int64
houseToward           41440 non-null object
houseDecoration       41440 non-null object
communityName         41440 non-null object
city                  41440 non-null object
region                41440 non-null object
plate                 41440 non-null object
buildYear             41440 non-null object
saleSecHouseNum       41440 non-null int64
subwayStationNum      41440 non-null int64
busStationNum         41440 non-null int64
interSchoolNum        41440 non-null int64
schoolNum             41440 non-null int64
privateSchoolNum      41440 non-null int64
hospitalNum           41440 non-null int64
drugStoreNum          41440 non-null int64
gymNum                41440 non-null int64
bankNum               41440 non-null int64
shopNum               41440 non-null int64
parkNum               41440 non-null int64
mallNum               41440 non-null int64
superMarketNum        41440 non-null int64
totalTradeMoney       41440 non-null int64
totalTradeArea        41440 non-null float64
tradeMeanPrice        41440 non-null float64
tradeSecNum           41440 non-null int64
totalNewTradeMoney    41440 non-null int64
totalNewTradeArea     41440 non-null int64
tradeNewMeanPrice     41440 non-null float64
tradeNewNum           41440 non-null int64
remainNewNum          41440 non-null int64
supplyNewNum          41440 non-null int64
supplyLandNum         41440 non-null int64
supplyLandArea        41440 non-null float64
tradeLandNum          41440 non-null int64
tradeLandArea         41440 non-null float64
landTotalPrice        41440 non-null int64
landMeanPrice         41440 non-null float64
totalWorkers          41440 non-null int64
newWorkers            41440 non-null int64
residentPopulation    41440 non-null int64
pv                    41440 non-null float64
uv                    41440 non-null float64
lookNum               41440 non-null int64
tradeTime             41440 non-null object
tradeMoney            41440 non-null float64
roomNum               41440 non-null float64
living_room           41440 non-null float64
bathroom              41440 non-null float64
合租                    41440 non-null uint8
整租                    41440 non-null uint8
未知方式                  41440 non-null uint8
dtypes: float64(13), int64(29), object(8), uint8(3)
memory usage: 15.9+ MB
data = data.infer_objects()
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41440 entries, 0 to 41439
Data columns (total 53 columns):
area                  41440 non-null float64
totalFloor            41440 non-null int64
houseToward           41440 non-null object
houseDecoration       41440 non-null object
communityName         41440 non-null object
city                  41440 non-null object
region                41440 non-null object
plate                 41440 non-null object
buildYear             41440 non-null object
saleSecHouseNum       41440 non-null int64
subwayStationNum      41440 non-null int64
busStationNum         41440 non-null int64
interSchoolNum        41440 non-null int64
schoolNum             41440 non-null int64
privateSchoolNum      41440 non-null int64
hospitalNum           41440 non-null int64
drugStoreNum          41440 non-null int64
gymNum                41440 non-null int64
bankNum               41440 non-null int64
shopNum               41440 non-null int64
parkNum               41440 non-null int64
mallNum               41440 non-null int64
superMarketNum        41440 non-null int64
totalTradeMoney       41440 non-null int64
totalTradeArea        41440 non-null float64
tradeMeanPrice        41440 non-null float64
tradeSecNum           41440 non-null int64
totalNewTradeMoney    41440 non-null int64
totalNewTradeArea     41440 non-null int64
tradeNewMeanPrice     41440 non-null float64
tradeNewNum           41440 non-null int64
remainNewNum          41440 non-null int64
supplyNewNum          41440 non-null int64
supplyLandNum         41440 non-null int64
supplyLandArea        41440 non-null float64
tradeLandNum          41440 non-null int64
tradeLandArea         41440 non-null float64
landTotalPrice        41440 non-null int64
landMeanPrice         41440 non-null float64
totalWorkers          41440 non-null int64
newWorkers            41440 non-null int64
residentPopulation    41440 non-null int64
pv                    41440 non-null float64
uv                    41440 non-null float64
lookNum               41440 non-null int64
tradeTime             41440 non-null object
tradeMoney            41440 non-null float64
roomNum               41440 non-null float64
living_room           41440 non-null float64
bathroom              41440 non-null float64
合租                    41440 non-null uint8
整租                    41440 non-null uint8
未知方式                  41440 non-null uint8
dtypes: float64(13), int64(29), object(8), uint8(3)
memory usage: 15.9+ MB
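The output above shows that `infer_objects()` left all eight `object` columns unchanged, including `buildYear`, which looks like it should be numeric. A minimal sketch of one way to convert such a column, assuming it holds year strings mixed with non-numeric placeholders (the helper name `coerce_numeric` and the demo values are made up for illustration and are not from the dataset):

```python
import pandas as pd

def coerce_numeric(df, column):
    """Return a copy of df with `column` parsed as numbers.

    Entries that cannot be parsed become NaN instead of raising.
    """
    out = df.copy()
    out[column] = pd.to_numeric(out[column], errors="coerce")
    return out

# Hypothetical demo data: two year strings and one placeholder value.
demo = pd.DataFrame({"buildYear": ["1994", "2006", "unknown"]})
converted = coerce_numeric(demo, "buildYear")
print(converted.dtypes)
```

The resulting NaN values would still need to be filled or dropped before modelling; which strategy fits best depends on how many rows are affected.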
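# Correlation between numeric features
# Only numeric columns are used: pandas 2.x raises an error if corr() meets object columns
corr = data.select_dtypes(include='number').corr()
plt.figure(figsize=(15,6))
# print(corr)
sns.heatmap(corr)
# Some columns still need to be converted (several are still object dtype)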
<matplotlib.axes._subplots.AxesSubplot at 0x7f7828753210>

[Figure: correlation heatmap of the numeric features (output_51_1.png)]
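A heatmap over roughly forty numeric columns is hard to read at a glance. As a complement, the correlations with the target `tradeMoney` can be listed directly. The sketch below is one possible way to do that; the helper `rank_by_target_corr` and the demo values are illustrative assumptions, not part of the original notebook:

```python
import pandas as pd

def rank_by_target_corr(df, target):
    """Pearson correlation of each numeric column with `target`,
    ordered by absolute value (strongest first)."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

# Hypothetical demo data that reuses column names from the dataset.
demo = pd.DataFrame({
    "area": [30.0, 50.0, 80.0, 120.0],
    "totalFloor": [6, 18, 6, 30],
    "tradeMoney": [2000.0, 3500.0, 5200.0, 8000.0],
    "city": ["SH", "SH", "SH", "SH"],  # object column, ignored
})
ranking = rank_by_target_corr(demo, "tradeMoney")
print(ranking)
```

On the real data the call would be `rank_by_target_corr(data, "tradeMoney")`. Pearson correlation only captures linear relationships, so a low value does not by itself mean a feature is useless.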

# Box plot statistics
plt.figure(figsize=(15,6))
data.boxplot()
<matplotlib.axes._subplots.AxesSubplot at 0x7f7828d09190>

[Figure: box plots of the numeric features (output_52_1.png)]
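Because all columns share one axis, features with large values (for example the total transaction amounts) dominate the box plot and compress the rest. A numeric summary can make the outliers easier to compare. The sketch below counts, per column, the points outside the 1.5 × IQR whiskers, which is matplotlib's default whisker rule; the helper `iqr_outlier_counts` and the demo values are assumptions for illustration:

```python
import pandas as pd

def iqr_outlier_counts(df, k=1.5):
    """Count values per numeric column lying outside
    [Q1 - k*IQR, Q3 + k*IQR]."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column bounds with the DataFrame columns.
    mask = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return mask.sum()

# Hypothetical demo data: one extreme area value, no extreme rents.
demo = pd.DataFrame({
    "area": [30.0, 40.0, 50.0, 60.0, 1000.0],
    "tradeMoney": [2000.0, 2500.0, 3000.0, 3500.0, 4000.0],
})
counts = iqr_outlier_counts(demo)
print(counts)
```

Whether flagged points should be removed is a separate decision: a very large `area` may be a data-entry error or a genuine villa listing.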

