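1. Task analysis
The competition asks participants to build a model on the provided dataset and predict house rents.
The dataset covers rental listings, residential communities, second-hand housing, local facilities, new housing, land, population, customers, and actual rents.
This is a typical regression problem.
Evaluation metric
Regression results are scored with R-Square ($R^2$).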
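The formula for $R^2$ (R-Square) is built from two quantities.
Residual sum of squares:
$$SS_{res} = \sum_{i}(y_i - \hat{y}_i)^2$$
Total sum of squares:
$$SS_{tot} = \sum_{i}(y_i - \bar{y})^2$$
where $\bar{y}$ is the mean of $y$.
This gives the expression:
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}$$
$R^2$ measures the proportion of the variation in the dependent variable that is explained by the independent variables. The closer it is to 1, the larger the regression sum of squares relative to the total sum of squares, the closer the regression line lies to the observations, the more of the variation in y is explained by changes in x, and the better the fit. It is therefore also called the goodness-of-fit statistic. For a least-squares fit scored on its own training data the value lies between 0 and 1; on held-out data it can drop below 0 when the model predicts worse than the mean.
Here $y_i$ is the true value, $\hat{y}_i$ the predicted value, and $\bar{y}$ the sample mean. The higher the score, the better the fit.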
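To make the metric concrete, here is a minimal sketch that computes $R^2$ directly from the definition. The function name `r_squared` and the rent values are made up for illustration; they are not part of the competition code.

```python
def r_squared(y_true, y_pred):
    """Compute R^2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # residual sum of squares: squared errors of the predictions
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # total sum of squares: squared deviations from the sample mean
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up rents, for illustration only
rents = [2000.0, 3000.0, 4000.0, 5000.0]
perfect = r_squared(rents, rents)                             # exact predictions
baseline = r_squared(rents, [3500.0] * 4)                     # always predict the mean
worse = r_squared(rents, [5000.0, 4000.0, 3000.0, 2000.0])    # reversed order
```

Exact predictions score 1, always predicting the mean scores 0, and the reversed predictions score below 0, which is why a negative value on held-out data signals a model worse than the mean baseline.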
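Data overview
1. Basic rental information:
- ID: listing ID
- area: floor area of the unit
- rentType: rental type: 整租 (whole unit) / 合租 (shared) / 未知方式 (unknown)
- houseType: room layout
- houseFloor: floor level of the unit: 高 (high) / 中 (middle) / 低 (low)
- totalFloor: total number of floors in the building
- houseToward: orientation of the unit
- houseDecoration: decoration (fit-out) level
- tradeTime: transaction date
- tradeMoney: transaction rent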
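2. Community information:
- communityName: name of the residential community
- city: city
- region: district
- plate: plate (sub-district block)
- buildYear: year the community was built
- saleSecHouseNum: number of second-hand homes listed for sale in the plate that month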
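3. Local facilities:
- subwayStationNum: number of subway stations in the plate
- busStationNum: number of bus stations in the plate
- interSchoolNum: number of international schools in the plate
- schoolNum: number of public schools in the plate
- privateSchoolNum: number of private schools in the plate
- hospitalNum: number of general hospitals in the plate
- drugStoreNum: number of drugstores in the plate
- gymNum: number of fitness centres in the plate
- bankNum: number of banks in the plate
- shopNum: number of shops in the plate
- parkNum: number of parks in the plate
- mallNum: number of shopping malls in the plate
- superMarketNum: number of supermarkets in the plate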
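4. Other information:
- totalTradeMoney: total value of second-hand home transactions in the plate that month
- totalTradeArea: total area of second-hand home transactions in the plate
- tradeMeanPrice: average price of second-hand home transactions in the plate
- tradeSecNum: number of second-hand homes sold in the plate that month
- totalNewTradeMoney: total value of new-home transactions in the plate that month
- totalNewTradeArea: total area of new-home transactions in the plate that month
- tradeNewMeanPrice: average price of new-home transactions in the plate that month
- tradeNewNum: number of new homes sold in the plate that month
- remainNewNum: number of unsold new homes in the plate that month
- supplyNewNum: number of new homes supplied in the plate that month
- supplyLandNum: number of land parcels supplied in the plate that month
- supplyLandArea: area of land supplied in the plate that month
- tradeLandNum: number of land parcels sold in the plate that month
- tradeLandArea: area of land sold in the plate that month
- landTotalPrice: total price of land sold in the plate that month
- landMeanPrice: floor-area land price in the plate that month (yuan/m²)
- totalWorkers: number of office workers currently in the plate
- newWorkers: population inflow into the plate that month (newly recruited staff)
- residentPopulation: resident population of the plate
- pv: number of page views by renters for the plate that month
- uv: number of unique renters viewing pages for the plate that month
- lookNum: number of offline viewings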
import warnings  # used by the warnings.filterwarnings calls in later cells

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.font_manager import FontProperties
from pylab import *

# Path to a CJK-capable font (SimHei) on the author's machine; adjust it for your environment
fname = r"/home/ach/anaconda3/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/ttf/SimHei.TTF"
myfont = FontProperties(fname=fname)

# From the feature descriptions and the overview above, the numerical and categorical features are roughly as follows
categorical_feas = ['rentType', 'houseType', 'houseFloor', 'region', 'plate', 'houseToward', 'houseDecoration',
                    'communityName', 'city', 'buildYear']
numerical_feas = ['ID', 'area', 'totalFloor', 'saleSecHouseNum', 'subwayStationNum',
                  'busStationNum', 'interSchoolNum', 'schoolNum', 'privateSchoolNum', 'hospitalNum',
                  'drugStoreNum', 'gymNum', 'bankNum', 'shopNum', 'parkNum', 'mallNum', 'superMarketNum',
                  'totalTradeMoney', 'totalTradeArea', 'tradeMeanPrice', 'tradeSecNum', 'totalNewTradeMoney',
                  'totalNewTradeArea', 'tradeNewMeanPrice', 'tradeNewNum', 'remainNewNum', 'supplyNewNum',
                  'supplyLandNum', 'supplyLandArea', 'tradeLandNum', 'tradeLandArea', 'landTotalPrice',
                  'landMeanPrice', 'totalWorkers', 'newWorkers', 'residentPopulation', 'pv', 'uv', 'lookNum']
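The two lists above were written by hand from the feature descriptions. As a cross-check, a rough split can also be derived from the column dtypes. The sketch below runs on a small made-up frame (`toy`) that only mimics a few columns of the training data; the names `categorical_guess` and `numerical_guess` are hypothetical.

```python
import pandas as pd

# Toy frame mimicking a few columns of the training data (values are illustrative)
toy = pd.DataFrame({
    "ID": [1, 2, 3],
    "area": [68.06, 125.55, 132.0],
    "rentType": ["整租", "合租", "未知方式"],
    "plate": ["BK00031", "BK00033", "BK00045"],
    "tradeMoney": [2000.0, 2000.0, 16000.0],
})

target = "tradeMoney"
features = toy.drop(columns=[target])

# Non-numeric columns are treated as categorical, numeric ones as numerical
categorical_guess = features.select_dtypes(exclude="number").columns.tolist()
numerical_guess = features.select_dtypes(include="number").columns.tolist()
```

A dtype-based split is only a starting point: `ID` comes out as numerical although it is an identifier, so the result still needs a manual pass like the one above.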
Since the target is the rent, a continuous value, this is a regression problem.
data = pd.read_csv('./數據集/train_data.csv')
data
| | ID | area | rentType | houseType | houseFloor | totalFloor | houseToward | houseDecoration | communityName | city | ... | landTotalPrice | landMeanPrice | totalWorkers | newWorkers | residentPopulation | pv | uv | lookNum | tradeTime | tradeMoney |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100309852 | 68.06 | 未知方式 | 2室1廳1衛 | 低 | 16 | 暫無數據 | 其他 | XQ00051 | SH | ... | 0 | 0.0000 | 28248 | 614 | 111546 | 1124.0 | 284.0 | 0 | 2018/11/28 | 2000.0 |
1 | 100307942 | 125.55 | 未知方式 | 3室2廳2衛 | 中 | 14 | 暫無數據 | 簡裝 | XQ00130 | SH | ... | 0 | 0.0000 | 14823 | 148 | 157552 | 701.0 | 22.0 | 1 | 2018/12/16 | 2000.0 |
2 | 100307764 | 132.00 | 未知方式 | 3室2廳2衛 | 低 | 32 | 暫無數據 | 其他 | XQ00179 | SH | ... | 0 | 0.0000 | 77645 | 520 | 131744 | 57.0 | 20.0 | 1 | 2018/12/22 | 16000.0 |
3 | 100306518 | 57.00 | 未知方式 | 1室1廳1衛 | 中 | 17 | 暫無數據 | 精裝 | XQ00313 | SH | ... | 332760000 | 3080.0331 | 8750 | 1665 | 253337 | 888.0 | 279.0 | 9 | 2018/12/21 | 1600.0 |
4 | 100305262 | 129.00 | 未知方式 | 3室2廳3衛 | 低 | 2 | 暫無數據 | 毛坯 | XQ01257 | SH | ... | 0 | 0.0000 | 800 | 117 | 125309 | 2038.0 | 480.0 | 0 | 2018/11/18 | 2900.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
41435 | 100000438 | 10.00 | 合租 | 4室1廳1衛 | 高 | 11 | 北 | 精裝 | XQ01209 | SH | ... | 573070000 | 4313.0100 | 20904 | 0 | 245872 | 29635.0 | 2662.0 | 0 | 2018/2/5 | 2190.0 |
41436 | 100000201 | 7.10 | 合租 | 3室1廳1衛 | 中 | 6 | 北 | 精裝 | XQ00853 | SH | ... | 0 | 0.0000 | 4370 | 0 | 306857 | 28213.0 | 2446.0 | 0 | 2018/1/22 | 2090.0 |
41437 | 100000198 | 9.20 | 合租 | 4室1廳1衛 | 高 | 18 | 北 | 精裝 | XQ00852 | SH | ... | 0 | 0.0000 | 4370 | 0 | 306857 | 19231.0 | 2016.0 | 0 | 2018/2/8 | 3190.0 |
41438 | 100000182 | 14.10 | 合租 | 4室1廳1衛 | 低 | 8 | 北 | 精裝 | XQ00791 | SH | ... | 0 | 0.0000 | 4370 | 0 | 306857 | 17471.0 | 2554.0 | 0 | 2018/3/22 | 2460.0 |
41439 | 100000041 | 33.50 | 未知方式 | 1室1廳1衛 | 中 | 19 | 北 | 其他 | XQ03246 | SH | ... | 0 | 0.0000 | 13192 | 990 | 406803 | 2556.0 | 717.0 | 1 | 2018/10/21 | 3000.0 |
41440 rows × 51 columns
data.info()
data.describe()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41440 entries, 0 to 41439
Data columns (total 51 columns):
ID 41440 non-null int64
area 41440 non-null float64
rentType 41440 non-null object
houseType 41440 non-null object
houseFloor 41440 non-null object
totalFloor 41440 non-null int64
houseToward 41440 non-null object
houseDecoration 41440 non-null object
communityName 41440 non-null object
city 41440 non-null object
region 41440 non-null object
plate 41440 non-null object
buildYear 41440 non-null object
saleSecHouseNum 41440 non-null int64
subwayStationNum 41440 non-null int64
busStationNum 41440 non-null int64
interSchoolNum 41440 non-null int64
schoolNum 41440 non-null int64
privateSchoolNum 41440 non-null int64
hospitalNum 41440 non-null int64
drugStoreNum 41440 non-null int64
gymNum 41440 non-null int64
bankNum 41440 non-null int64
shopNum 41440 non-null int64
parkNum 41440 non-null int64
mallNum 41440 non-null int64
superMarketNum 41440 non-null int64
totalTradeMoney 41440 non-null int64
totalTradeArea 41440 non-null float64
tradeMeanPrice 41440 non-null float64
tradeSecNum 41440 non-null int64
totalNewTradeMoney 41440 non-null int64
totalNewTradeArea 41440 non-null int64
tradeNewMeanPrice 41440 non-null float64
tradeNewNum 41440 non-null int64
remainNewNum 41440 non-null int64
supplyNewNum 41440 non-null int64
supplyLandNum 41440 non-null int64
supplyLandArea 41440 non-null float64
tradeLandNum 41440 non-null int64
tradeLandArea 41440 non-null float64
landTotalPrice 41440 non-null int64
landMeanPrice 41440 non-null float64
totalWorkers 41440 non-null int64
newWorkers 41440 non-null int64
residentPopulation 41440 non-null int64
pv 41422 non-null float64
uv 41422 non-null float64
lookNum 41440 non-null int64
tradeTime 41440 non-null object
tradeMoney 41440 non-null float64
dtypes: float64(10), int64(30), object(11)
memory usage: 16.1+ MB
| | ID | area | totalFloor | saleSecHouseNum | subwayStationNum | busStationNum | interSchoolNum | schoolNum | privateSchoolNum | hospitalNum | ... | tradeLandArea | landTotalPrice | landMeanPrice | totalWorkers | newWorkers | residentPopulation | pv | uv | lookNum | tradeMoney |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 4.144000e+04 | 41440.000000 | 41440.000000 | 41440.000000 | 41440.000000 | 41440.000000 | 41440.000000 | 41440.000000 | 41440.000000 | 41440.000000 | ... | 41440.000000 | 4.144000e+04 | 41440.000000 | 41440.000000 | 41440.000000 | 41440.000000 | 41422.000000 | 41422.000000 | 41440.000000 | 4.144000e+04 |
mean | 1.001221e+08 | 70.959409 | 11.413152 | 1.338538 | 5.741192 | 187.197153 | 1.506395 | 48.228813 | 6.271911 | 4.308736 | ... | 12621.406425 | 1.045363e+08 | 724.763918 | 77250.235497 | 1137.132095 | 294514.059459 | 26945.663512 | 3089.077085 | 0.396260 | 8.837074e+03 |
std | 9.376566e+04 | 88.119569 | 7.375203 | 3.180349 | 4.604929 | 179.674625 | 1.687631 | 29.568448 | 4.946457 | 3.359714 | ... | 49853.120341 | 5.215216e+08 | 3224.303831 | 132052.508523 | 7667.381627 | 196745.147181 | 32174.637924 | 2954.706517 | 1.653932 | 5.514287e+05 |
min | 1.000000e+08 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 24.000000 | 0.000000 | 9.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000e+00 | 0.000000 | 600.000000 | 0.000000 | 49330.000000 | 17.000000 | 6.000000 | 0.000000 | 0.000000e+00 |
25% | 1.000470e+08 | 42.607500 | 6.000000 | 0.000000 | 2.000000 | 74.000000 | 0.000000 | 24.000000 | 2.000000 | 1.000000 | ... | 0.000000 | 0.000000e+00 | 0.000000 | 13983.000000 | 0.000000 | 165293.000000 | 7928.000000 | 1053.000000 | 0.000000 | 2.800000e+03 |
50% | 1.000960e+08 | 65.000000 | 7.000000 | 0.000000 | 5.000000 | 128.000000 | 1.000000 | 47.000000 | 5.000000 | 4.000000 | ... | 0.000000 | 0.000000e+00 | 0.000000 | 38947.000000 | 0.000000 | 245872.000000 | 20196.000000 | 2375.000000 | 0.000000 | 4.000000e+03 |
75% | 1.001902e+08 | 90.000000 | 16.000000 | 1.000000 | 7.000000 | 258.000000 | 3.000000 | 61.000000 | 9.000000 | 6.000000 | ... | 0.000000 | 0.000000e+00 | 0.000000 | 76668.000000 | 0.000000 | 330610.000000 | 34485.000000 | 4233.000000 | 0.000000 | 5.500000e+03 |
max | 1.003218e+08 | 15055.000000 | 88.000000 | 52.000000 | 22.000000 | 824.000000 | 8.000000 | 142.000000 | 24.000000 | 14.000000 | ... | 555508.010000 | 6.197570e+09 | 37513.062490 | 855400.000000 | 143700.000000 | 928198.000000 | 621864.000000 | 39876.000000 | 37.000000 | 1.000000e+08 |
8 rows × 40 columns
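The `info()` output above lists `tradeTime` as an `object` column, so the dates are stored as plain strings. Below is a minimal sketch of parsing them, assuming the `YYYY/M/D` format seen in the preview rows; the `tradeMonth` column is a hypothetical name, and the frame is a toy stand-in for `data`.

```python
import pandas as pd

# Date strings copied from the preview rows above
toy = pd.DataFrame({"tradeTime": ["2018/11/28", "2018/2/5", "2018/10/21"]})

# Parse the strings into datetimes, then pull out the month as an integer
toy["tradeTime"] = pd.to_datetime(toy["tradeTime"], format="%Y/%m/%d")
toy["tradeMonth"] = toy["tradeTime"].dt.month
```

Once the column is a datetime, month-level grouping becomes straightforward, which may be useful because many of the plate features are monthly figures.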
groupby_user = data.groupby('rentType').size()
print(groupby_user)
groupby_user.plot.bar(title='renttype',figsize = (15,4))
warnings.filterwarnings("ignore")  # suppress warnings raised while plotting
# Most listings have rentType 未知方式 (unknown): 30,759 of 41,440 rows
rentType
-- 5
合租 5204
整租 5472
未知方式 30759
dtype: int64
[Figure: bar chart of listing counts by rentType (output_6_1.png)]
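The same counts can be obtained with `value_counts`, which also sorts by frequency and can return shares instead of raw counts. This is a small sketch on a made-up column, not the author's code; in the real data 未知方式 accounts for 30,759 of 41,440 rows, about 74%.

```python
import pandas as pd

# Toy column with the four rentType values that appear in the data
toy = pd.DataFrame({"rentType": ["未知方式", "合租", "未知方式", "整租", "未知方式", "--"]})

counts = toy["rentType"].value_counts()                 # sorted by count, largest first
shares = toy["rentType"].value_counts(normalize=True)   # fraction of rows per value
```

`counts.plot.bar(...)` would then draw the same kind of chart as the `groupby(...).size()` version, with the bars already ordered by frequency.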
groupby_user = data.groupby('houseType').size()
print(groupby_user)
groupby_user.plot.bar(title='houseType',figsize=(15,4))
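warnings.filterwarnings("ignore")  # suppress warnings raised while plotting
# 1室1廳1衛 (1 bedroom, 1 living room, 1 bathroom) is one of the most common layouts: 9,805 listings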
houseType
0室0廳1衛 1
1室0廳0衛 86
1室0廳1衛 1286
1室1廳0衛 12
1室1廳1衛 9805
...
8室2廳4衛 1
8室3廳4衛 1
8室4廳4衛 1
9室2廳5衛 1
9室3廳8衛 1
Length: 104, dtype: int64
[Figure: bar chart of listing counts by houseType (output_7_1.png)]
groupby_user = data.groupby('houseFloor').size()
print(groupby_user)
groupby_user.plot.bar(title='houseFloor', figsize=(15,4))
warnings.filterwarnings("ignore")  # suppress warnings raised while plotting
houseFloor
中 15458
低 11916
高 14066
dtype: int64
[Figure: bar chart of listing counts by houseFloor (output_8_1.png)]
groupby_user = data.groupby('totalFloor').size()
print(groupby_user)
groupby_user.plot.bar(title='totalFloor', figsize=(15,4))
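warnings.filterwarnings("ignore")  # suppress warnings raised while plotting
# Buildings with 6 floors are by far the most common (15,797 listings); worth keeping in mind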
totalFloor
0 5
1 98
2 193
3 446
4 486
5 2730
6 15797
7 1362
8 624
9 393
10 401
11 2884
12 738
13 882
14 2166
15 809
16 1147
17 1375
18 3553
19 467
20 457
21 466
22 309
23 161
24 732
25 390
26 300
27 399
28 258
29 289
30 144
31 211
32 234
33 117
34 54
35 57
36 57
37 96
38 33
39 10
40 11
41 17
43 12
45 3
47 4
49 25
51 1
53 7
56 17
58 1
59 1
60 3
61 1
62 5
88 2
dtype: int64
[Figure: bar chart of listing counts by totalFloor (output_9_1.png)]
groupby_user = data.groupby('houseDecoration').size()
print(groupby_user)
groupby_user.plot.bar(title='houseDecoration', figsize=(15,4))
warnings.filterwarnings("ignore")  # suppress warnings raised while plotting
# Mostly 其他 (other): 29,040 listings
houseDecoration
其他 29040
毛坯 311
簡裝 1171
精裝 10918
dtype: int64
[Figure: bar chart of listing counts by houseDecoration (output_10_1.png)]
groupby_user = data.groupby('plate').size()
print(groupby_user)
print(sorted(groupby_user.items(),key=lambda item:item[1],reverse=True))
groupby_user.plot.bar(title='plate',figsize=(15,4))
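warnings.filterwarnings("ignore")  # suppress warnings raised while plotting
# BK00031 has the most listings (1,958); note that this counts listings, not rent levels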
plate
BK00001 1
BK00002 357
BK00003 523
BK00004 189
BK00005 549
...
BK00062 618
BK00063 281
BK00064 590
BK00065 348
BK00066 219
Length: 66, dtype: int64
[('BK00031', 1958), ('BK00033', 1837), ('BK00045', 1816), ('BK00055', 1566), ('BK00056', 1516), ('BK00052', 1375), ('BK00017', 1305), ('BK00041', 1266), ('BK00054', 1256), ('BK00051', 1253), ('BK00046', 1227), ('BK00035', 1156), ('BK00042', 1137), ('BK00009', 1016), ('BK00050', 979), ('BK00043', 930), ('BK00026', 906), ('BK00047', 880), ('BK00034', 849), ('BK00013', 834), ('BK00053', 819), ('BK00028', 745), ('BK00040', 679), ('BK00060', 671), ('BK00010', 651), ('BK00029', 646), ('BK00062', 618), ('BK00022', 614), ('BK00018', 613), ('BK00064', 590), ('BK00005', 549), ('BK00003', 523), ('BK00014', 500), ('BK00019', 498), ('BK00061', 477), ('BK00011', 455), ('BK00037', 444), ('BK00012', 412), ('BK00038', 398), ('BK00024', 397), ('BK00020', 384), ('BK00002', 357), ('BK00065', 348), ('BK00027', 344), ('BK00039', 343), ('BK00063', 281), ('BK00057', 278), ('BK00015', 253), ('BK00006', 231), ('BK00021', 226), ('BK00007', 225), ('BK00030', 219), ('BK00066', 219), ('BK00049', 211), ('BK00008', 210), ('BK00004', 189), ('BK00048', 165), ('BK00025', 157), ('BK00023', 127), ('BK00059', 122), ('BK00044', 98), ('BK00016', 40), ('BK00036', 33), ('BK00058', 15), ('BK00032', 3), ('BK00001', 1)]
[Figure: bar chart of listing counts by plate (output_11_1.png)]
print(len(data['plate']))
41440
# plt.scatter(data['plate'], data['busStationNum'])
# Bus stations per plate
x = []
y = []
for i in range(len(data['plate'])):
    # record busStationNum from the first row seen for each plate
    if data['plate'][i] not in x:
        x.append(data['plate'][i])
        y.append(data['busStationNum'][i])
# plt.scatter(x, y)
res1 = {}
for i in range(len(x)):
    res1[x[i]] = y[i]
# sort the plates by value, largest first
res2 = dict(sorted(res1.items(), key=lambda item: item[1], reverse=True))
print(res2)
plt.figure(figsize=(15, 6))
plt.title("busStationNum by plate")
plt.scatter(list(res2.keys()), list(res2.values()))
{'BK00045': 824, 'BK00031': 461, 'BK00042': 441, 'BK00016': 387, 'BK00051': 364, 'BK00001': 356, 'BK00057': 331, 'BK00054': 306, 'BK00032': 284, 'BK00052': 276, 'BK00058': 264, 'BK00056': 258, 'BK00062': 196, 'BK00053': 190, 'BK00049': 184, 'BK00035': 178, 'BK00047': 172, 'BK00038': 169, 'BK00046': 167, 'BK00022': 156, 'BK00055': 151, 'BK00061': 151, 'BK00041': 144, 'BK00044': 141, 'BK00040': 138, 'BK00036': 131, 'BK00059': 128, 'BK00020': 114, 'BK00021': 114, 'BK00005': 105, 'BK00026': 101, 'BK00043': 98, 'BK00015': 98, 'BK00033': 96, 'BK00014': 95, 'BK00066': 95, 'BK00017': 92, 'BK00060': 88, 'BK00018': 83, 'BK00012': 82, 'BK00048': 82, 'BK00002': 79, 'BK00034': 78, 'BK00027': 74, 'BK00013': 72, 'BK00025': 70, 'BK00037': 68, 'BK00028': 67, 'BK00010': 62, 'BK00050': 60, 'BK00009': 56, 'BK00007': 52, 'BK00008': 52, 'BK00030': 48, 'BK00023': 47, 'BK00003': 45, 'BK00019': 42, 'BK00039': 41, 'BK00064': 36, 'BK00011': 34, 'BK00004': 30, 'BK00006': 29, 'BK00065': 29, 'BK00029': 27, 'BK00063': 25, 'BK00024': 24}
<matplotlib.collections.PathCollection at 0x7f78282b0210>
[Figure: scatter plot of busStationNum by plate (output_13_2.png)]
# Second-hand homes listed for sale per plate
# (saleSecHouseNum is a monthly figure, so the first row per plate reflects only one month)
x = []
y = []
for i in range(len(data['plate'])):
    # record saleSecHouseNum from the first row seen for each plate
    if data['plate'][i] not in x:
        x.append(data['plate'][i])
        y.append(data['saleSecHouseNum'][i])
# plt.scatter(x, y)
res1 = {}
for i in range(len(x)):
    res1[x[i]] = y[i]
# sort the plates by value, largest first
res2 = dict(sorted(res1.items(), key=lambda item: item[1], reverse=True))
print(res2)
plt.figure(figsize=(15, 6))
plt.title("saleSecHouseNum by plate")
plt.scatter(list(res2.keys()), list(res2.values()))
{'BK00015': 6, 'BK00050': 3, 'BK00032': 3, 'BK00044': 1, 'BK00052': 1, 'BK00064': 0, 'BK00049': 0, 'BK00051': 0, 'BK00031': 0, 'BK00028': 0, 'BK00017': 0, 'BK00045': 0, 'BK00027': 0, 'BK00041': 0, 'BK00047': 0, 'BK00009': 0, 'BK00025': 0, 'BK00024': 0, 'BK00014': 0, 'BK00026': 0, 'BK00042': 0, 'BK00046': 0, 'BK00043': 0, 'BK00013': 0, 'BK00012': 0, 'BK00005': 0, 'BK00011': 0, 'BK00010': 0, 'BK00003': 0, 'BK00033': 0, 'BK00053': 0, 'BK00006': 0, 'BK00004': 0, 'BK00002': 0, 'BK00007': 0, 'BK00016': 0, 'BK00019': 0, 'BK00030': 0, 'BK00048': 0, 'BK00018': 0, 'BK00008': 0, 'BK00029': 0, 'BK00065': 0, 'BK00035': 0, 'BK00036': 0, 'BK00022': 0, 'BK00023': 0, 'BK00054': 0, 'BK00038': 0, 'BK00037': 0, 'BK00034': 0, 'BK00058': 0, 'BK00066': 0, 'BK00039': 0, 'BK00057': 0, 'BK00020': 0, 'BK00059': 0, 'BK00060': 0, 'BK00063': 0, 'BK00055': 0, 'BK00061': 0, 'BK00040': 0, 'BK00056': 0, 'BK00062': 0, 'BK00021': 0, 'BK00001': 0}
<matplotlib.collections.PathCollection at 0x7f78280fce50>
[Figure: scatter plot of saleSecHouseNum by plate (output_14_2.png)]
# International schools per plate
x = []
y = []
for i in range(len(data['plate'])):
    # record interSchoolNum from the first row seen for each plate
    if data['plate'][i] not in x:
        x.append(data['plate'][i])
        y.append(data['interSchoolNum'][i])
# plt.scatter(x, y)
res1 = {}
for i in range(len(x)):
    res1[x[i]] = y[i]
# sort the plates by value, largest first
res2 = dict(sorted(res1.items(), key=lambda item: item[1], reverse=True))
print(res2)
plt.figure(figsize=(15, 6))
plt.title("interSchoolNum by plate")
plt.scatter(list(res2.keys()), list(res2.values()))
{'BK00007': 8, 'BK00008': 8, 'BK00053': 6, 'BK00038': 6, 'BK00031': 4, 'BK00005': 4, 'BK00016': 4, 'BK00034': 4, 'BK00063': 4, 'BK00045': 3, 'BK00014': 3, 'BK00029': 3, 'BK00035': 3, 'BK00036': 3, 'BK00066': 3, 'BK00057': 3, 'BK00060': 3, 'BK00051': 2, 'BK00052': 2, 'BK00013': 2, 'BK00010': 2, 'BK00039': 2, 'BK00020': 2, 'BK00040': 2, 'BK00062': 2, 'BK00021': 2, 'BK00050': 1, 'BK00028': 1, 'BK00027': 1, 'BK00024': 1, 'BK00026': 1, 'BK00042': 1, 'BK00046': 1, 'BK00004': 1, 'BK00018': 1, 'BK00054': 1, 'BK00037': 1, 'BK00058': 1, 'BK00064': 0, 'BK00049': 0, 'BK00044': 0, 'BK00017': 0, 'BK00041': 0, 'BK00047': 0, 'BK00009': 0, 'BK00025': 0, 'BK00043': 0, 'BK00012': 0, 'BK00011': 0, 'BK00003': 0, 'BK00033': 0, 'BK00006': 0, 'BK00002': 0, 'BK00015': 0, 'BK00019': 0, 'BK00030': 0, 'BK00048': 0, 'BK00065': 0, 'BK00022': 0, 'BK00023': 0, 'BK00059': 0, 'BK00055': 0, 'BK00061': 0, 'BK00056': 0, 'BK00032': 0, 'BK00001': 0}
<matplotlib.collections.PathCollection at 0x7f78284de7d0>
[Figure: scatter plot of interSchoolNum by plate (output_15_2.png)]
def paint(column: str):
    """Print and plot the value of `column` for each plate, largest first."""
    x = []
    y = []
    for i in range(len(data['plate'])):
        # record the value from the first row seen for each plate
        if data['plate'][i] not in x:
            x.append(data['plate'][i])
            y.append(data[column][i])
    res1 = {}
    for i in range(len(x)):
        res1[x[i]] = y[i]
    # sort the plates by value, largest first
    res2 = dict(sorted(res1.items(), key=lambda item: item[1], reverse=True))
    print(res2)
    plt.figure(figsize=(15, 6))
    plt.title("{} by plate".format(column))
    plt.scatter(list(res2.keys()), list(res2.values()))
# Subway stations
paint('subwayStationNum')
{'BK00052': 22, 'BK00057': 14, 'BK00056': 14, 'BK00042': 13, 'BK00002': 11, 'BK00055': 11, 'BK00061': 11, 'BK00041': 9, 'BK00064': 7, 'BK00027': 7, 'BK00012': 7, 'BK00018': 7, 'BK00060': 7, 'BK00040': 7, 'BK00050': 6, 'BK00031': 6, 'BK00053': 6, 'BK00035': 6, 'BK00054': 6, 'BK00020': 6, 'BK00021': 6, 'BK00045': 5, 'BK00014': 5, 'BK00005': 5, 'BK00011': 5, 'BK00010': 5, 'BK00065': 5, 'BK00062': 5, 'BK00013': 4, 'BK00007': 4, 'BK00030': 4, 'BK00008': 4, 'BK00037': 4, 'BK00051': 3, 'BK00028': 3, 'BK00017': 3, 'BK00025': 3, 'BK00024': 3, 'BK00026': 3, 'BK00006': 3, 'BK00004': 3, 'BK00016': 3, 'BK00023': 3, 'BK00066': 3, 'BK00059': 3, 'BK00063': 3, 'BK00049': 2, 'BK00047': 2, 'BK00009': 2, 'BK00043': 2, 'BK00003': 2, 'BK00015': 2, 'BK00019': 2, 'BK00022': 2, 'BK00038': 2, 'BK00034': 2, 'BK00058': 2, 'BK00046': 1, 'BK00033': 1, 'BK00036': 1, 'BK00039': 1, 'BK00044': 0, 'BK00048': 0, 'BK00029': 0, 'BK00032': 0, 'BK00001': 0}
[Figure: scatter plot of subwayStationNum by plate (output_17_1.png)]
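The per-plate lookup in `paint` walks all 41,440 rows in Python and checks membership in a list each time. A vectorised pandas version is sketched below as one possible alternative; `first_value_per_plate` is a hypothetical helper, and the toy frame reuses three busStationNum values from the output above.

```python
import pandas as pd

# Toy frame: repeated plates, each with a constant busStationNum
toy = pd.DataFrame({
    "plate": ["BK00045", "BK00031", "BK00045", "BK00042", "BK00031"],
    "busStationNum": [824, 461, 824, 441, 461],
})


def first_value_per_plate(df: pd.DataFrame, column: str) -> pd.Series:
    """Return the first value of `column` for each plate, sorted descending."""
    return (
        df.drop_duplicates(subset="plate")   # keep the first row of each plate
          .set_index("plate")[column]
          .sort_values(ascending=False)
    )


per_plate = first_value_per_plate(toy, "busStationNum")
```

Like the loop, this keeps the first row per plate, so it gives the same numbers; the resulting Series can be passed to `plt.scatter(per_plate.index, per_plate.values)` for the plot.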
# Public schools
paint('schoolNum')
{'BK00052': 142, 'BK00045': 99, 'BK00056': 98, 'BK00066': 74, 'BK00027': 72, 'BK00031': 71, 'BK00028': 69, 'BK00057': 65, 'BK00013': 64, 'BK00042': 62, 'BK00054': 61, 'BK00060': 61, 'BK00051': 60, 'BK00012': 59, 'BK00023': 57, 'BK00009': 53, 'BK00033': 53, 'BK00016': 52, 'BK00026': 50, 'BK00011': 48, 'BK00055': 48, 'BK00061': 48, 'BK00025': 47, 'BK00005': 47, 'BK00007': 47, 'BK00008': 47, 'BK00024': 45, 'BK00020': 44, 'BK00021': 44, 'BK00050': 43, 'BK00040': 41, 'BK00001': 41, 'BK00018': 39, 'BK00003': 38, 'BK00010': 37, 'BK00030': 32, 'BK00034': 32, 'BK00035': 30, 'BK00037': 30, 'BK00006': 29, 'BK00064': 28, 'BK00002': 28, 'BK00049': 26, 'BK00053': 24, 'BK00029': 24, 'BK00032': 24, 'BK00038': 23, 'BK00017': 22, 'BK00041': 21, 'BK00065': 21, 'BK00022': 21, 'BK00039': 21, 'BK00062': 20, 'BK00019': 18, 'BK00058': 18, 'BK00059': 16, 'BK00044': 15, 'BK00004': 14, 'BK00046': 13, 'BK00063': 13, 'BK00048': 11, 'BK00047': 10, 'BK00014': 10, 'BK00043': 10, 'BK00015': 9, 'BK00036': 9}
[Figure: scatter plot of schoolNum by plate (output_18_1.png)]
# Private schools
paint('privateSchoolNum')
{'BK00040': 24, 'BK00034': 16, 'BK00033': 15, 'BK00027': 13, 'BK00056': 13, 'BK00052': 12, 'BK00011': 11, 'BK00037': 11, 'BK00020': 10, 'BK00021': 10, 'BK00045': 9, 'BK00013': 9, 'BK00010': 9, 'BK00029': 9, 'BK00039': 9, 'BK00060': 9, 'BK00009': 8, 'BK00042': 8, 'BK00012': 8, 'BK00018': 8, 'BK00038': 8, 'BK00063': 8, 'BK00026': 7, 'BK00003': 7, 'BK00028': 6, 'BK00066': 6, 'BK00057': 6, 'BK00031': 5, 'BK00005': 5, 'BK00053': 5, 'BK00002': 5, 'BK00007': 5, 'BK00008': 5, 'BK00025': 4, 'BK00035': 4, 'BK00043': 3, 'BK00019': 3, 'BK00050': 2, 'BK00017': 2, 'BK00041': 2, 'BK00046': 2, 'BK00016': 2, 'BK00065': 2, 'BK00023': 2, 'BK00054': 2, 'BK00055': 2, 'BK00061': 2, 'BK00001': 2, 'BK00064': 1, 'BK00051': 1, 'BK00047': 1, 'BK00024': 1, 'BK00014': 1, 'BK00006': 1, 'BK00015': 1, 'BK00048': 1, 'BK00036': 1, 'BK00022': 1, 'BK00062': 1, 'BK00049': 0, 'BK00044': 0, 'BK00004': 0, 'BK00030': 0, 'BK00058': 0, 'BK00059': 0, 'BK00032': 0}
[Figure: scatter plot of privateSchoolNum by plate (output_19_1.png)]
# Hospitals
paint('hospitalNum')
{'BK00052': 14, 'BK00045': 11, 'BK00042': 9, 'BK00051': 8, 'BK00013': 8, 'BK00005': 8, 'BK00031': 6, 'BK00041': 6, 'BK00025': 6, 'BK00024': 6, 'BK00026': 6, 'BK00007': 6, 'BK00008': 6, 'BK00055': 6, 'BK00061': 6, 'BK00028': 5, 'BK00012': 5, 'BK00030': 5, 'BK00029': 5, 'BK00054': 5, 'BK00020': 5, 'BK00056': 5, 'BK00021': 5, 'BK00027': 4, 'BK00009': 4, 'BK00038': 4, 'BK00058': 4, 'BK00057': 4, 'BK00050': 3, 'BK00046': 3, 'BK00002': 3, 'BK00023': 3, 'BK00037': 3, 'BK00001': 3, 'BK00014': 2, 'BK00011': 2, 'BK00010': 2, 'BK00003': 2, 'BK00006': 2, 'BK00018': 2, 'BK00065': 2, 'BK00035': 2, 'BK00034': 2, 'BK00039': 2, 'BK00060': 2, 'BK00032': 2, 'BK00064': 1, 'BK00049': 1, 'BK00017': 1, 'BK00047': 1, 'BK00043': 1, 'BK00033': 1, 'BK00004': 1, 'BK00016': 1, 'BK00048': 1, 'BK00022': 1, 'BK00066': 1, 'BK00059': 1, 'BK00062': 1, 'BK00044': 0, 'BK00053': 0, 'BK00015': 0, 'BK00019': 0, 'BK00036': 0, 'BK00063': 0, 'BK00040': 0}
[Figure: scatter plot of hospitalNum by plate (output_20_1.png)]
# drugStoreNum: number of drugstores in the plate
paint('drugStoreNum')
{'BK00045': 174, 'BK00042': 145, 'BK00052': 118, 'BK00031': 106, 'BK00054': 94, 'BK00056': 88, 'BK00057': 85, 'BK00051': 83, 'BK00055': 69, 'BK00061': 69, 'BK00040': 67, 'BK00041': 65, 'BK00038': 55, 'BK00046': 54, 'BK00001': 52, 'BK00062': 49, 'BK00020': 48, 'BK00021': 48, 'BK00035': 47, 'BK00034': 41, 'BK00032': 41, 'BK00017': 40, 'BK00026': 40, 'BK00012': 40, 'BK00028': 39, 'BK00053': 39, 'BK00016': 39, 'BK00022': 39, 'BK00027': 37, 'BK00043': 37, 'BK00033': 36, 'BK00018': 35, 'BK00066': 35, 'BK00060': 35, 'BK00009': 34, 'BK00013': 34, 'BK00058': 34, 'BK00002': 33, 'BK00047': 31, 'BK00005': 31, 'BK00010': 31, 'BK00025': 29, 'BK00003': 28, 'BK00049': 27, 'BK00011': 27, 'BK00037': 27, 'BK00036': 25, 'BK00050': 24, 'BK00059': 23, 'BK00019': 22, 'BK00048': 22, 'BK00039': 22, 'BK00044': 21, 'BK00006': 20, 'BK00023': 20, 'BK00007': 19, 'BK00008': 19, 'BK00014': 17, 'BK00024': 15, 'BK00030': 15, 'BK00065': 15, 'BK00015': 13, 'BK00064': 12, 'BK00029': 11, 'BK00063': 11, 'BK00004': 8}
[Figure: scatter plot of drugStoreNum by plate (output_21_1.png)]
# gymNum: number of fitness centres in the plate
paint('gymNum')
{'BK00045': 88, 'BK00042': 84, 'BK00060': 82, 'BK00037': 78, 'BK00052': 64, 'BK00026': 56, 'BK00056': 52, 'BK00057': 48, 'BK00006': 43, 'BK00040': 43, 'BK00055': 41, 'BK00061': 41, 'BK00053': 40, 'BK00007': 40, 'BK00008': 40, 'BK00024': 39, 'BK00027': 38, 'BK00025': 38, 'BK00020': 38, 'BK00021': 38, 'BK00054': 37, 'BK00031': 36, 'BK00050': 35, 'BK00005': 35, 'BK00010': 34, 'BK00033': 34, 'BK00013': 32, 'BK00051': 30, 'BK00018': 28, 'BK00039': 28, 'BK00028': 27, 'BK00041': 27, 'BK00017': 26, 'BK00011': 26, 'BK00035': 26, 'BK00046': 25, 'BK00012': 25, 'BK00034': 25, 'BK00003': 23, 'BK00062': 23, 'BK00038': 22, 'BK00066': 21, 'BK00019': 20, 'BK00065': 20, 'BK00001': 19, 'BK00023': 18, 'BK00047': 16, 'BK00009': 16, 'BK00043': 16, 'BK00063': 16, 'BK00064': 15, 'BK00048': 14, 'BK00004': 13, 'BK00022': 13, 'BK00030': 12, 'BK00014': 10, 'BK00002': 8, 'BK00036': 8, 'BK00032': 8, 'BK00029': 6, 'BK00058': 6, 'BK00049': 5, 'BK00044': 5, 'BK00016': 5, 'BK00059': 5, 'BK00015': 1}
[Figure: scatter plot of gymNum by plate (output_22_1.png)]
# bankNum: number of banks in the plate
paint('bankNum')
{'BK00060': 207, 'BK00045': 119, 'BK00025': 98, 'BK00052': 95, 'BK00057': 92, 'BK00042': 91, 'BK00031': 86, 'BK00007': 86, 'BK00008': 86, 'BK00056': 75, 'BK00024': 69, 'BK00026': 62, 'BK00013': 53, 'BK00033': 52, 'BK00027': 50, 'BK00023': 50, 'BK00054': 50, 'BK00001': 49, 'BK00051': 47, 'BK00028': 46, 'BK00041': 43, 'BK00005': 43, 'BK00053': 43, 'BK00037': 43, 'BK00012': 42, 'BK00010': 41, 'BK00011': 38, 'BK00050': 37, 'BK00020': 35, 'BK00040': 35, 'BK00021': 35, 'BK00055': 34, 'BK00061': 34, 'BK00006': 33, 'BK00058': 32, 'BK00034': 31, 'BK00066': 31, 'BK00062': 31, 'BK00003': 29, 'BK00018': 29, 'BK00016': 28, 'BK00019': 28, 'BK00030': 28, 'BK00029': 27, 'BK00032': 25, 'BK00038': 24, 'BK00039': 24, 'BK00009': 23, 'BK00035': 22, 'BK00017': 21, 'BK00046': 21, 'BK00022': 21, 'BK00002': 20, 'BK00043': 18, 'BK00064': 16, 'BK00049': 16, 'BK00014': 16, 'BK00065': 15, 'BK00036': 14, 'BK00047': 13, 'BK00048': 12, 'BK00004': 11, 'BK00044': 10, 'BK00059': 9, 'BK00015': 7, 'BK00063': 7}
[Figure: scatter plot of bankNum by plate (output_23_1.png)]
# Shops
paint('shopNum')
{'BK00045': 824, 'BK00042': 671, 'BK00031': 598, 'BK00052': 483, 'BK00054': 419, 'BK00057': 404, 'BK00051': 358, 'BK00012': 354, 'BK00001': 353, 'BK00056': 341, 'BK00032': 340, 'BK00020': 318, 'BK00021': 318, 'BK00027': 306, 'BK00041': 301, 'BK00025': 245, 'BK00018': 243, 'BK00055': 236, 'BK00061': 236, 'BK00026': 231, 'BK00016': 224, 'BK00023': 224, 'BK00040': 224, 'BK00013': 223, 'BK00038': 215, 'BK00062': 215, 'BK00035': 214, 'BK00028': 211, 'BK00022': 206, 'BK00005': 200, 'BK00060': 199, 'BK00030': 189, 'BK00034': 189, 'BK00047': 175, 'BK00017': 171, 'BK00046': 167, 'BK00049': 163, 'BK00033': 162, 'BK00037': 160, 'BK00009': 154, 'BK00010': 154, 'BK00053': 154, 'BK00002': 151, 'BK00043': 150, 'BK00066': 143, 'BK00011': 142, 'BK00024': 140, 'BK00058': 134, 'BK00014': 118, 'BK00036': 112, 'BK00065': 109, 'BK00044': 100, 'BK00006': 100, 'BK00048': 99, 'BK00019': 97, 'BK00003': 96, 'BK00007': 90, 'BK00008': 90, 'BK00050': 85, 'BK00015': 84, 'BK00039': 80, 'BK00059': 77, 'BK00064': 76, 'BK00029': 65, 'BK00004': 42, 'BK00063': 10}
[Figure: scatter plot of shopNum by plate (output_24_1.png)]
# Parks
paint('parkNum')
{'BK00042': 30, 'BK00057': 26, 'BK00045': 24, 'BK00052': 23, 'BK00054': 14, 'BK00060': 13, 'BK00020': 12, 'BK00021': 12, 'BK00056': 11, 'BK00062': 11, 'BK00041': 10, 'BK00033': 8, 'BK00053': 8, 'BK00002': 8, 'BK00022': 8, 'BK00038': 8, 'BK00055': 8, 'BK00061': 8, 'BK00031': 7, 'BK00013': 7, 'BK00007': 7, 'BK00008': 7, 'BK00040': 7, 'BK00049': 6, 'BK00050': 6, 'BK00026': 6, 'BK00043': 6, 'BK00012': 6, 'BK00065': 6, 'BK00036': 6, 'BK00034': 6, 'BK00058': 6, 'BK00064': 5, 'BK00044': 5, 'BK00027': 5, 'BK00025': 5, 'BK00006': 5, 'BK00004': 5, 'BK00015': 5, 'BK00016': 5, 'BK00018': 5, 'BK00035': 5, 'BK00001': 5, 'BK00009': 4, 'BK00003': 4, 'BK00048': 4, 'BK00029': 4, 'BK00023': 4, 'BK00032': 4, 'BK00051': 3, 'BK00028': 3, 'BK00017': 3, 'BK00005': 3, 'BK00010': 3, 'BK00030': 3, 'BK00037': 3, 'BK00011': 2, 'BK00019': 2, 'BK00066': 2, 'BK00047': 1, 'BK00024': 1, 'BK00014': 1, 'BK00039': 1, 'BK00059': 1, 'BK00046': 0, 'BK00063': 0}
[Figure: scatter plot of parkNum by plate (output_25_1.png)]
# Shopping malls
paint('mallNum')
{'BK00045': 19, 'BK00042': 16, 'BK00060': 15, 'BK00025': 14, 'BK00031': 12, 'BK00027': 11, 'BK00054': 10, 'BK00001': 10, 'BK00038': 9, 'BK00034': 9, 'BK00043': 8, 'BK00037': 8, 'BK00057': 8, 'BK00041': 7, 'BK00053': 7, 'BK00006': 7, 'BK00019': 7, 'BK00020': 7, 'BK00056': 7, 'BK00062': 7, 'BK00021': 7, 'BK00005': 6, 'BK00010': 6, 'BK00033': 6, 'BK00007': 6, 'BK00015': 6, 'BK00008': 6, 'BK00040': 6, 'BK00052': 5, 'BK00026': 5, 'BK00003': 5, 'BK00018': 5, 'BK00023': 5, 'BK00066': 5, 'BK00055': 5, 'BK00061': 5, 'BK00049': 4, 'BK00050': 4, 'BK00024': 4, 'BK00012': 4, 'BK00035': 4, 'BK00022': 4, 'BK00058': 4, 'BK00064': 3, 'BK00017': 3, 'BK00011': 3, 'BK00059': 3, 'BK00044': 2, 'BK00028': 2, 'BK00047': 2, 'BK00014': 2, 'BK00046': 2, 'BK00004': 2, 'BK00002': 2, 'BK00048': 2, 'BK00029': 2, 'BK00039': 2, 'BK00051': 1, 'BK00009': 1, 'BK00013': 1, 'BK00030': 1, 'BK00016': 0, 'BK00065': 0, 'BK00036': 0, 'BK00063': 0, 'BK00032': 0}
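[Figure: scatter plot of mallNum by plate (output_26_1.png)]
I picked the plots above on purpose to see how facilities such as shops and schools are distributed across plates. BK00045 comes out best overall: it ranks first for bus stations, drugstores, gyms, shops, and shopping malls.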
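One way to back up an "overall best" claim is to rank the plates within each facility column and average the ranks. The sketch below does this for three plates and four columns, with values copied from the printed outputs above; using the mean rank as a summary is my own assumption, not something the competition defines.

```python
import pandas as pd

# Values copied from the per-plate outputs above for three plates
amenities = pd.DataFrame(
    {
        "busStationNum": [824, 461, 441],
        "shopNum": [824, 598, 671],
        "drugStoreNum": [174, 106, 145],
        "mallNum": [19, 12, 16],
    },
    index=["BK00045", "BK00031", "BK00042"],
)

# Rank within each column: 1 = the plate with the most facilities of that kind
ranks = amenities.rank(ascending=False)

# Average the ranks across columns; a lower mean rank means better overall
mean_rank = ranks.mean(axis=1).sort_values()
```

On these four columns BK00045 ranks first everywhere. Extending the frame to all 66 plates and all facility columns would give a fuller picture, since other plates lead on some columns (BK00052 has the most subway stations, for example).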
# Supermarkets
paint('superMarketNum')
{'BK00045': 299, 'BK00042': 159, 'BK00052': 154, 'BK00057': 145, 'BK00051': 131, 'BK00056': 130, 'BK00054': 126, 'BK00031': 119, 'BK00041': 109, 'BK00020': 103, 'BK00021': 103, 'BK00046': 100, 'BK00062': 98, 'BK00038': 88, 'BK00032': 83, 'BK00055': 78, 'BK00061': 78, 'BK00040': 75, 'BK00017': 74, 'BK00001': 71, 'BK00027': 63, 'BK00012': 63, 'BK00018': 63, 'BK00047': 61, 'BK00035': 60, 'BK00034': 58, 'BK00026': 56, 'BK00013': 56, 'BK00053': 56, 'BK00022': 56, 'BK00060': 55, 'BK00028': 53, 'BK00049': 51, 'BK00058': 51, 'BK00066': 51, 'BK00016': 49, 'BK00002': 48, 'BK00033': 47, 'BK00043': 46, 'BK00037': 43, 'BK00011': 42, 'BK00014': 41, 'BK00065': 38, 'BK00036': 38, 'BK00059': 38, 'BK00009': 37, 'BK00010': 36, 'BK00007': 35, 'BK00008': 35, 'BK00044': 34, 'BK00003': 32, 'BK00019': 32, 'BK00005': 31, 'BK00048': 31, 'BK00050': 30, 'BK00025': 29, 'BK00006': 23, 'BK00064': 22, 'BK00030': 22, 'BK00024': 21, 'BK00023': 21, 'BK00039': 21, 'BK00015': 16, 'BK00029': 15, 'BK00063': 11, 'BK00004': 5}
[Figure: scatter plot of superMarketNum by plate (output_28_1.png)]
data.isnull()
| | ID | area | rentType | houseType | houseFloor | totalFloor | houseToward | houseDecoration | communityName | city | ... | landTotalPrice | landMeanPrice | totalWorkers | newWorkers | residentPopulation | pv | uv | lookNum | tradeTime | tradeMoney |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
1 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
3 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
4 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
41435 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
41436 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
41437 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
41438 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
41439 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
41440 rows × 51 columns
# Locate missing values: count them per column
data.isnull().sum()
ID 0
area 0
rentType 0
houseType 0
houseFloor 0
totalFloor 0
houseToward 0
houseDecoration 0
communityName 0
city 0
region 0
plate 0
buildYear 0
saleSecHouseNum 0
subwayStationNum 0
busStationNum 0
interSchoolNum 0
schoolNum 0
privateSchoolNum 0
hospitalNum 0
drugStoreNum 0
gymNum 0
bankNum 0
shopNum 0
parkNum 0
mallNum 0
superMarketNum 0
totalTradeMoney 0
totalTradeArea 0
tradeMeanPrice 0
tradeSecNum 0
totalNewTradeMoney 0
totalNewTradeArea 0
tradeNewMeanPrice 0
tradeNewNum 0
remainNewNum 0
supplyNewNum 0
supplyLandNum 0
supplyLandArea 0
tradeLandNum 0
tradeLandArea 0
landTotalPrice 0
landMeanPrice 0
totalWorkers 0
newWorkers 0
residentPopulation 0
pv 18
uv 18
lookNum 0
tradeTime 0
tradeMoney 0
dtype: int64
# The counts above show that pv and uv have a few missing values
# Build a summary table of the missing values, in the style of the reference solution
missing_values = pd.DataFrame(data.isnull().sum(), columns=['missingNum'])
missing_values['existNum'] = len(data) - missing_values['missingNum']
missing_values['sum'] = len(data)
# The ratio is tiny, so express it as a percentage
missing_values['missingRatio'] = missing_values['missingNum'] / len(data) * 100
missing_values['dtype'] = data.dtypes  # data type of each column
missing_values = missing_values[missing_values['missingNum'] > 0]
missing_values
# From the table below and a look at the CSV file, the missing pv and uv values belong to the same rows, so we plan to remove those rows later
feature | missingNum | existNum | sum | missingRatio | dtype |
---|---|---|---|---|---|
pv | 18 | 41422 | 41440 | 0.043436 | float64 |
uv | 18 | 41422 | 41440 | 0.043436 | float64 |
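The note above says that the missing pv and uv values sit in the same rows. A minimal sketch of how that could be checked and how such rows could be dropped, using a small made-up frame (`toy` and its values are illustrative assumptions, not taken from the competition data):

```python
import pandas as pd

# Toy stand-in for the real data; the values are invented for illustration.
nan = float("nan")
toy = pd.DataFrame({
    "pv": [1124.0, nan, 57.0, nan],
    "uv": [284.0, nan, 20.0, nan],
    "tradeMoney": [2000.0, 2000.0, 16000.0, 1600.0],
})

# Rows where pv is missing.
missing_rows = toy[toy["pv"].isnull()]

# True when pv and uv are missing in exactly the same rows.
same_rows = bool((toy["pv"].isnull() == toy["uv"].isnull()).all())

# If so, dropping those rows removes every missing value in both columns.
cleaned = toy.dropna(subset=["pv", "uv"])
print(list(missing_rows.index), same_rows, len(cleaned))
```

With only 18 of 41440 rows affected, dropping and mean-filling should give nearly the same result; the notebook itself goes on to fill with the mean.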
# First, look at the mean of pv
print(data['pv'].mean())
26945.663512143306
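# An explicit loop can list the row indices where pv is missing:
# print(data['pv'].isnull())
# for i in range(len(data['pv'].isnull())):
#     if data['pv'].isnull()[i]:
#         print(i)
# Fill the missing values with the mean of each numeric column
data = data.fillna(data.mean(numeric_only=True))
data.isnull().sum()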
ID 0
area 0
rentType 0
houseType 0
houseFloor 0
totalFloor 0
houseToward 0
houseDecoration 0
communityName 0
city 0
region 0
plate 0
buildYear 0
saleSecHouseNum 0
subwayStationNum 0
busStationNum 0
interSchoolNum 0
schoolNum 0
privateSchoolNum 0
hospitalNum 0
drugStoreNum 0
gymNum 0
bankNum 0
shopNum 0
parkNum 0
mallNum 0
superMarketNum 0
totalTradeMoney 0
totalTradeArea 0
tradeMeanPrice 0
tradeSecNum 0
totalNewTradeMoney 0
totalNewTradeArea 0
tradeNewMeanPrice 0
tradeNewNum 0
remainNewNum 0
supplyNewNum 0
supplyLandNum 0
supplyLandArea 0
tradeLandNum 0
tradeLandArea 0
landTotalPrice 0
landMeanPrice 0
totalWorkers 0
newWorkers 0
residentPopulation 0
pv 0
uv 0
lookNum 0
tradeTime 0
tradeMoney 0
dtype: int64
# Print the value counts of each categorical feature and plot its distribution
for i in ['rentType', 'houseType', 'houseFloor', 'region', 'plate', 'houseToward', 'houseDecoration',
          'communityName', 'city', 'buildYear']:
    print("Distribution of " + i + ":")
    print(data[i].value_counts())
    # Plot parameters: bins sets the number of intervals on the x-axis, alpha sets the transparency, e.g.
    # plt.hist(data[i], bins=30, density=True, alpha=0.5, histtype='stepfilled',
    #          color='steelblue', edgecolor='none')
    # Skip the plot for communityName (4236 distinct values)
    if i == "communityName":
        continue
    plt.figure(figsize=(15, 6))
    plt.hist(data[i], bins=3)  # histogram
    plt.show()
Distribution of rentType:
未知方式 30759
整租 5472
合租 5204
-- 5
Name: rentType, dtype: int64
[Figure: histogram of rentType (output_34_1.png)]
Distribution of houseType:
1室1廳1衛 9805
2室1廳1衛 8512
2室2廳1衛 6783
3室1廳1衛 3992
3室2廳2衛 2737
...
6室2廳5衛 1
8室2廳4衛 1
7室1廳7衛 1
8室3廳4衛 1
6室2廳6衛 1
Name: houseType, Length: 104, dtype: int64
[Figure: histogram of houseType (output_34_4.png)]
Distribution of houseFloor:
中 15458
高 14066
低 11916
Name: houseFloor, dtype: int64
[Figure: histogram of houseFloor (output_34_7.png)]
Distribution of region:
RG00002 11437
RG00005 5739
RG00003 4186
RG00010 3640
RG00012 3368
RG00004 3333
RG00006 1961
RG00007 1610
RG00008 1250
RG00013 1215
RG00001 1157
RG00014 1069
RG00011 793
RG00009 681
RG00015 1
Name: region, dtype: int64
[Figure: histogram of region (output_34_10.png)]
Distribution of plate:
BK00031 1958
BK00033 1837
BK00045 1816
BK00055 1566
BK00056 1516
...
BK00016 40
BK00036 33
BK00058 15
BK00032 3
BK00001 1
Name: plate, Length: 66, dtype: int64
[Figure: histogram of plate (output_34_13.png)]
Distribution of houseToward:
南 34377
南北 2254
北 2043
暫無數據 963
東南 655
東 552
西 264
西南 250
西北 58
東西 24
Name: houseToward, dtype: int64
[Figure: histogram of houseToward (output_34_16.png)]
Distribution of houseDecoration:
其他 29040
精裝 10918
簡裝 1171
毛坯 311
Name: houseDecoration, dtype: int64
[Figure: histogram of houseDecoration (output_34_19.png)]
Distribution of communityName:
XQ01834 358
XQ01274 192
XQ02273 188
XQ03110 185
XQ02337 173
...
XQ02484 1
XQ02672 1
XQ00390 1
XQ00560 1
XQ02928 1
Name: communityName, Length: 4236, dtype: int64
Distribution of city:
SH 41440
Name: city, dtype: int64
[Figure: histogram of city (output_34_22.png)]
Distribution of buildYear:
1994 2851
暫無信息 2808
2006 2007
2007 1851
2008 1849
...
1939 2
1961 2
1962 1
1951 1
1950 1
Name: buildYear, Length: 80, dtype: int64
[Figure: histogram of buildYear (output_34_31.png)]
# Frequency table for each non-numeric feature, keeping only categories with more than 100 rows
for i in ['rentType', 'houseType', 'houseFloor', 'region', 'plate', 'houseToward', 'houseDecoration',
          'communityName', 'city', 'buildYear']:
    da = pd.DataFrame(data[i].value_counts()).reset_index()
    da.columns = [i, 'counts']
    print(da[da['counts'] > 100])
rentType counts
0 未知方式 30759
1 整租 5472
2 合租 5204
houseType counts
0 1室1廳1衛 9805
1 2室1廳1衛 8512
2 2室2廳1衛 6783
3 3室1廳1衛 3992
4 3室2廳2衛 2737
5 4室1廳1衛 1957
6 3室2廳1衛 1920
7 1室0廳1衛 1286
8 1室2廳1衛 933
9 2室2廳2衛 881
10 4室2廳2衛 435
11 2室0廳1衛 419
12 4室2廳3衛 273
13 5室1廳1衛 197
14 2室1廳2衛 155
15 3室2廳3衛 149
16 3室1廳2衛 135
houseFloor counts
0 中 15458
1 高 14066
2 低 11916
region counts
0 RG00002 11437
1 RG00005 5739
2 RG00003 4186
3 RG00010 3640
4 RG00012 3368
5 RG00004 3333
6 RG00006 1961
7 RG00007 1610
8 RG00008 1250
9 RG00013 1215
10 RG00001 1157
11 RG00014 1069
12 RG00011 793
13 RG00009 681
plate counts
0 BK00031 1958
1 BK00033 1837
2 BK00045 1816
3 BK00055 1566
4 BK00056 1516
5 BK00052 1375
6 BK00017 1305
7 BK00041 1266
8 BK00054 1256
9 BK00051 1253
10 BK00046 1227
11 BK00035 1156
12 BK00042 1137
13 BK00009 1016
14 BK00050 979
15 BK00043 930
16 BK00026 906
17 BK00047 880
18 BK00034 849
19 BK00013 834
20 BK00053 819
21 BK00028 745
22 BK00040 679
23 BK00060 671
24 BK00010 651
25 BK00029 646
26 BK00062 618
27 BK00022 614
28 BK00018 613
29 BK00064 590
30 BK00005 549
31 BK00003 523
32 BK00014 500
33 BK00019 498
34 BK00061 477
35 BK00011 455
36 BK00037 444
37 BK00012 412
38 BK00038 398
39 BK00024 397
40 BK00020 384
41 BK00002 357
42 BK00065 348
43 BK00027 344
44 BK00039 343
45 BK00063 281
46 BK00057 278
47 BK00015 253
48 BK00006 231
49 BK00021 226
50 BK00007 225
51 BK00066 219
52 BK00030 219
53 BK00049 211
54 BK00008 210
55 BK00004 189
56 BK00048 165
57 BK00025 157
58 BK00023 127
59 BK00059 122
houseToward counts
0 南 34377
1 南北 2254
2 北 2043
3 暫無數據 963
4 東南 655
5 東 552
6 西 264
7 西南 250
houseDecoration counts
0 其他 29040
1 精裝 10918
2 簡裝 1171
3 毛坯 311
communityName counts
0 XQ01834 358
1 XQ01274 192
2 XQ02273 188
3 XQ03110 185
4 XQ02337 173
5 XQ01389 166
6 XQ01658 163
7 XQ02789 152
8 XQ00530 151
9 XQ01561 151
10 XQ01339 132
11 XQ00826 122
12 XQ01873 122
13 XQ02296 121
14 XQ01232 119
15 XQ01401 118
16 XQ02441 117
17 XQ00196 115
18 XQ02365 109
19 XQ01207 109
20 XQ01410 108
21 XQ00852 105
22 XQ02072 103
23 XQ01672 103
city counts
0 SH 41440
buildYear counts
0 1994 2851
1 暫無信息 2808
2 2006 2007
3 2007 1851
4 2008 1849
5 2005 1814
6 2010 1774
7 1995 1685
8 1993 1543
9 2011 1498
10 2004 1431
11 2009 1271
12 2014 1238
13 2003 1156
14 1997 1125
15 2002 1120
16 2012 1049
17 1996 991
18 2000 925
19 2001 898
20 2015 840
21 1999 822
22 1998 733
23 2013 714
24 1987 632
25 1983 612
26 1991 545
27 1984 493
28 1980 452
29 1990 431
30 1988 423
31 1989 419
32 1985 359
33 1982 344
34 1986 320
35 1992 308
36 1976 251
37 1957 227
38 1981 221
39 1956 153
40 1977 153
41 2016 140
42 1978 133
43 1958 122
44 1979 116
45 1954 101
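The frequency tables above keep only the categories with more than 100 rows. One possible follow-up, which is not part of the original notebook, is to fold the remaining rare categories into a single bucket before encoding. The sketch below assumes a hypothetical helper `group_rare` and a placeholder label `'other'`; the region codes in the toy series are made up.

```python
import pandas as pd

def group_rare(series, min_count, other_label="other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep the original value where it is not rare, otherwise use other_label.
    return series.where(~series.isin(rare), other_label)

# Toy example: one region code appears only once.
regions = pd.Series(["RG00002"] * 5 + ["RG00005"] * 3 + ["RG00015"])
grouped = group_rare(regions, min_count=2)
print(grouped.value_counts())
```

The threshold is a modelling choice; 100 would mirror the filter used in the loop above.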
# Analyze the target label tradeMoney; seaborn is convenient for distribution plots
# (distplot is deprecated in newer seaborn releases, where histplot with kde=True is the usual replacement)
# Label distribution, overall and by rent range
fig, axes = plt.subplots(2, 3)
fig.set_size_inches(20, 12)
sns.distplot(data['tradeMoney'], ax=axes[0][0])
sns.distplot(data[(data['tradeMoney'] <= 20000)]['tradeMoney'], ax=axes[0][1])
sns.distplot(data[(data['tradeMoney'] > 20000) & (data['tradeMoney'] <= 50000)]['tradeMoney'], ax=axes[0][2])
sns.distplot(data[(data['tradeMoney'] > 50000) & (data['tradeMoney'] <= 100000)]['tradeMoney'], ax=axes[1][0])
sns.distplot(data[(data['tradeMoney'] > 100000)]['tradeMoney'], ax=axes[1][1])
[Figure: distribution of tradeMoney, overall and by rent range (output_36_1.png)]
print('money_all', len(data['tradeMoney']))
print('money<=10000', len(data[(data['tradeMoney'] <= 10000)]))
print("10000<money<=20000", len(data[(data['tradeMoney'] > 10000) & (data['tradeMoney'] <= 20000)]['tradeMoney']))
print("20000<money<=50000", len(data[(data['tradeMoney'] > 20000) & (data['tradeMoney'] <= 50000)]['tradeMoney']))
print("50000<money<=100000", len(data[(data['tradeMoney'] > 50000) & (data['tradeMoney'] <= 100000)]['tradeMoney']))
print("100000<money", len(data[(data['tradeMoney'] > 100000)]['tradeMoney']))
money_all 41440
money<=10000 38964
10000<money<=20000 1985
20000<money<=50000 433
50000<money<=100000 39
100000<money 19
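The same bucket counts can be produced in one call instead of one filter per range. The following is a sketch under assumptions: the rents in `rents` are made up, and the label strings are only illustrative. `pd.cut` uses right-closed intervals by default, which matches the `<=` comparisons used above.

```python
import pandas as pd

# Made-up rents for illustration only.
rents = pd.Series([1600.0, 2000.0, 9500.0, 10000.0, 16000.0, 45000.0, 80000.0, 120000.0])

# Right-closed bins: (-inf, 10000], (10000, 20000], (20000, 50000], ...
edges = [float("-inf"), 10000, 20000, 50000, 100000, float("inf")]
labels = ["<=10000", "10000-20000", "20000-50000", "50000-100000", ">100000"]
binned = pd.cut(rents, bins=edges, labels=labels)

# sort=False keeps the buckets in bin order rather than by frequency.
bucket_counts = binned.value_counts(sort=False)
print(bucket_counts)
```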
# Split houseType (e.g. '2室1廳1衛') into the number of rooms, living rooms and bathrooms
room = []
living_room = []
bathroom = []
for i in data['houseType']:
    room.append(float(i.split('室')[0]))                         # digits before '室'
    living_room.append(float(i.split('室')[-1].split('廳')[0]))  # digits between '室' and '廳'
    bathroom.append(float(i.split('廳')[-1].split('衛')[0]))     # digits between '廳' and '衛'
data['roomNum'] = room
data['living_room'] = living_room
data['bathroom'] = bathroom
data = data.drop(['houseType'], axis=1)
data
index | ID | area | rentType | houseFloor | totalFloor | houseToward | houseDecoration | communityName | city | region | ... | newWorkers | residentPopulation | pv | uv | lookNum | tradeTime | tradeMoney | roomNum | living_room | bathroom |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100309852 | 68.06 | 未知方式 | 低 | 16 | 暫無數據 | 其他 | XQ00051 | SH | RG00001 | ... | 614 | 111546 | 1124.0 | 284.0 | 0 | 2018/11/28 | 2000.0 | 2.0 | 1.0 | 1.0 |
1 | 100307942 | 125.55 | 未知方式 | 中 | 14 | 暫無數據 | 簡裝 | XQ00130 | SH | RG00002 | ... | 148 | 157552 | 701.0 | 22.0 | 1 | 2018/12/16 | 2000.0 | 3.0 | 2.0 | 2.0 |
2 | 100307764 | 132.00 | 未知方式 | 低 | 32 | 暫無數據 | 其他 | XQ00179 | SH | RG00002 | ... | 520 | 131744 | 57.0 | 20.0 | 1 | 2018/12/22 | 16000.0 | 3.0 | 2.0 | 2.0 |
3 | 100306518 | 57.00 | 未知方式 | 中 | 17 | 暫無數據 | 精裝 | XQ00313 | SH | RG00002 | ... | 1665 | 253337 | 888.0 | 279.0 | 9 | 2018/12/21 | 1600.0 | 1.0 | 1.0 | 1.0 |
4 | 100305262 | 129.00 | 未知方式 | 低 | 2 | 暫無數據 | 毛坯 | XQ01257 | SH | RG00003 | ... | 117 | 125309 | 2038.0 | 480.0 | 0 | 2018/11/18 | 2900.0 | 3.0 | 2.0 | 2.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
41435 | 100000438 | 10.00 | 合租 | 高 | 11 | 北 | 精裝 | XQ01209 | SH | RG00002 | ... | 0 | 245872 | 29635.0 | 2662.0 | 0 | 2018/2/5 | 2190.0 | 4.0 | 1.0 | 1.0 |
41436 | 100000201 | 7.10 | 合租 | 中 | 6 | 北 | 精裝 | XQ00853 | SH | RG00002 | ... | 0 | 306857 | 28213.0 | 2446.0 | 0 | 2018/1/22 | 2090.0 | 3.0 | 1.0 | 1.0 |
41437 | 100000198 | 9.20 | 合租 | 高 | 18 | 北 | 精裝 | XQ00852 | SH | RG00002 | ... | 0 | 306857 | 19231.0 | 2016.0 | 0 | 2018/2/8 | 3190.0 | 4.0 | 1.0 | 1.0 |
41438 | 100000182 | 14.10 | 合租 | 低 | 8 | 北 | 精裝 | XQ00791 | SH | RG00002 | ... | 0 | 306857 | 17471.0 | 2554.0 | 0 | 2018/3/22 | 2460.0 | 4.0 | 1.0 | 1.0 |
41439 | 100000041 | 33.50 | 未知方式 | 中 | 19 | 北 | 其他 | XQ03246 | SH | RG00010 | ... | 990 | 406803 | 2556.0 | 717.0 | 1 | 2018/10/21 | 3000.0 | 1.0 | 1.0 | 1.0 |
41440 rows × 53 columns
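Splitting on each character works, but it is easy to pick the wrong fragment. A regular expression states the expected layout once and extracts all three numbers together. This is a minimal sketch; `HOUSE_TYPE_RE` and `parse_house_type` are hypothetical names, and it assumes every value follows the `N室N廳N衛` pattern seen in the value counts above.

```python
import re

# Pattern for strings such as "3室2廳1衛": rooms, living rooms, bathrooms.
HOUSE_TYPE_RE = re.compile(r"(\d+)室(\d+)廳(\d+)衛")

def parse_house_type(text):
    """Return (rooms, living_rooms, bathrooms) as ints, or None if text does not match."""
    match = HOUSE_TYPE_RE.fullmatch(text)
    if match is None:
        return None
    rooms, living_rooms, bathrooms = (int(g) for g in match.groups())
    return rooms, living_rooms, bathrooms

print(parse_house_type("3室2廳1衛"))
print(parse_house_type("1室0廳1衛"))
```

On a DataFrame column, the same pattern could be passed to pandas' `Series.str.extract` to get the three columns without an explicit loop.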
# houseFloor (低/中/高, i.e. low/middle/high) is ordinal and related to totalFloor
# An ordinal encoding was tried first:
# res = []
# for i in range(len(data['houseFloor'])):
#     if data['houseFloor'][i] == '低':
#         res.append(1)
#     elif data['houseFloor'][i] == '中':
#         res.append(2)
#     else:
#         res.append(3)
# data['houseFloor'] = res
# data = data.infer_objects()
# data.info()
# After trying this, the floor level appears to follow from the floor information (totalFloor), so the column is dropped
data = data.drop(['houseFloor'], axis=1)
data
index | ID | area | rentType | totalFloor | houseToward | houseDecoration | communityName | city | region | plate | ... | newWorkers | residentPopulation | pv | uv | lookNum | tradeTime | tradeMoney | roomNum | living_room | bathroom |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100309852 | 68.06 | 未知方式 | 16 | 暫無數據 | 其他 | XQ00051 | SH | RG00001 | BK00064 | ... | 614 | 111546 | 1124.0 | 284.0 | 0 | 2018/11/28 | 2000.0 | 2.0 | 1.0 | 1.0 |
1 | 100307942 | 125.55 | 未知方式 | 14 | 暫無數據 | 簡裝 | XQ00130 | SH | RG00002 | BK00049 | ... | 148 | 157552 | 701.0 | 22.0 | 1 | 2018/12/16 | 2000.0 | 3.0 | 2.0 | 2.0 |
2 | 100307764 | 132.00 | 未知方式 | 32 | 暫無數據 | 其他 | XQ00179 | SH | RG00002 | BK00050 | ... | 520 | 131744 | 57.0 | 20.0 | 1 | 2018/12/22 | 16000.0 | 3.0 | 2.0 | 2.0 |
3 | 100306518 | 57.00 | 未知方式 | 17 | 暫無數據 | 精裝 | XQ00313 | SH | RG00002 | BK00051 | ... | 1665 | 253337 | 888.0 | 279.0 | 9 | 2018/12/21 | 1600.0 | 1.0 | 1.0 | 1.0 |
4 | 100305262 | 129.00 | 未知方式 | 2 | 暫無數據 | 毛坯 | XQ01257 | SH | RG00003 | BK00044 | ... | 117 | 125309 | 2038.0 | 480.0 | 0 | 2018/11/18 | 2900.0 | 3.0 | 2.0 | 2.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
41435 | 100000438 | 10.00 | 合租 | 11 | 北 | 精裝 | XQ01209 | SH | RG00002 | BK00062 | ... | 0 | 245872 | 29635.0 | 2662.0 | 0 | 2018/2/5 | 2190.0 | 4.0 | 1.0 | 1.0 |
41436 | 100000201 | 7.10 | 合租 | 6 | 北 | 精裝 | XQ00853 | SH | RG00002 | BK00055 | ... | 0 | 306857 | 28213.0 | 2446.0 | 0 | 2018/1/22 | 2090.0 | 3.0 | 1.0 | 1.0 |
41437 | 100000198 | 9.20 | 合租 | 18 | 北 | 精裝 | XQ00852 | SH | RG00002 | BK00055 | ... | 0 | 306857 | 19231.0 | 2016.0 | 0 | 2018/2/8 | 3190.0 | 4.0 | 1.0 | 1.0 |
41438 | 100000182 | 14.10 | 合租 | 8 | 北 | 精裝 | XQ00791 | SH | RG00002 | BK00055 | ... | 0 | 306857 | 17471.0 | 2554.0 | 0 | 2018/3/22 | 2460.0 | 4.0 | 1.0 | 1.0 |
41439 | 100000041 | 33.50 | 未知方式 | 19 | 北 | 其他 | XQ03246 | SH | RG00010 | BK00020 | ... | 990 | 406803 | 2556.0 | 717.0 | 1 | 2018/10/21 | 3000.0 | 1.0 | 1.0 | 1.0 |
41440 rows × 52 columns
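The notebook drops houseFloor. If one preferred to keep the low/middle/high ordering as a numeric feature instead, a dictionary mapping would be a compact alternative to the loop. This is only a sketch of that alternative; `FLOOR_ORDER` is a hypothetical name and the 1/2/3 codes are an assumption.

```python
import pandas as pd

# Assumed ordering: low (低) < middle (中) < high (高).
FLOOR_ORDER = {"低": 1, "中": 2, "高": 3}

floors = pd.Series(["低", "中", "高", "中"])

# map() looks each value up in the dictionary; unknown values would become NaN.
floor_level = floors.map(FLOOR_ORDER)
print(floor_level.tolist())
```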
# ID is a unique identifier for each row, so it can be dropped
data = data.drop(['ID'], axis=1)
data
index | area | rentType | totalFloor | houseToward | houseDecoration | communityName | city | region | plate | buildYear | ... | newWorkers | residentPopulation | pv | uv | lookNum | tradeTime | tradeMoney | roomNum | living_room | bathroom |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 68.06 | 未知方式 | 16 | 暫無數據 | 其他 | XQ00051 | SH | RG00001 | BK00064 | 1953 | ... | 614 | 111546 | 1124.0 | 284.0 | 0 | 2018/11/28 | 2000.0 | 2.0 | 1.0 | 1.0 |
1 | 125.55 | 未知方式 | 14 | 暫無數據 | 簡裝 | XQ00130 | SH | RG00002 | BK00049 | 2007 | ... | 148 | 157552 | 701.0 | 22.0 | 1 | 2018/12/16 | 2000.0 | 3.0 | 2.0 | 2.0 |
2 | 132.00 | 未知方式 | 32 | 暫無數據 | 其他 | XQ00179 | SH | RG00002 | BK00050 | 暫無信息 | ... | 520 | 131744 | 57.0 | 20.0 | 1 | 2018/12/22 | 16000.0 | 3.0 | 2.0 | 2.0 |
3 | 57.00 | 未知方式 | 17 | 暫無數據 | 精裝 | XQ00313 | SH | RG00002 | BK00051 | 暫無信息 | ... | 1665 | 253337 | 888.0 | 279.0 | 9 | 2018/12/21 | 1600.0 | 1.0 | 1.0 | 1.0 |
4 | 129.00 | 未知方式 | 2 | 暫無數據 | 毛坯 | XQ01257 | SH | RG00003 | BK00044 | 暫無信息 | ... | 117 | 125309 | 2038.0 | 480.0 | 0 | 2018/11/18 | 2900.0 | 3.0 | 2.0 | 2.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
41435 | 10.00 | 合租 | 11 | 北 | 精裝 | XQ01209 | SH | RG00002 | BK00062 | 2009 | ... | 0 | 245872 | 29635.0 | 2662.0 | 0 | 2018/2/5 | 2190.0 | 4.0 | 1.0 | 1.0 |
41436 | 7.10 | 合租 | 6 | 北 | 精裝 | XQ00853 | SH | RG00002 | BK00055 | 2004 | ... | 0 | 306857 | 28213.0 | 2446.0 | 0 | 2018/1/22 | 2090.0 | 3.0 | 1.0 | 1.0 |
41437 | 9.20 | 合租 | 18 | 北 | 精裝 | XQ00852 | SH | RG00002 | BK00055 | 2000 | ... | 0 | 306857 | 19231.0 | 2016.0 | 0 | 2018/2/8 | 3190.0 | 4.0 | 1.0 | 1.0 |
41438 | 14.10 | 合租 | 8 | 北 | 精裝 | XQ00791 | SH | RG00002 | BK00055 | 1998 | ... | 0 | 306857 | 17471.0 | 2554.0 | 0 | 2018/3/22 | 2460.0 | 4.0 | 1.0 | 1.0 |
41439 | 33.50 | 未知方式 | 19 | 北 | 其他 | XQ03246 | SH | RG00010 | BK00020 | 2015 | ... | 990 | 406803 | 2556.0 | 717.0 | 1 | 2018/10/21 | 3000.0 | 1.0 | 1.0 | 1.0 |
41440 rows × 51 columns
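pd.get_dummies(data.rentType)
# The dummy columns show that rentType still contains missing values, marked '--'. Part of the data above was simply removed; here, because the sample is large, the 5 rows marked '--' could also be deleted
index | -- | 合租 | 整租 | 未知方式 |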
---|---|---|---|---|
0 | 0 | 0 | 0 | 1 |
1 | 0 | 0 | 0 | 1 |
2 | 0 | 0 | 0 | 1 |
3 | 0 | 0 | 0 | 1 |
4 | 0 | 0 | 0 | 1 |
... | ... | ... | ... | ... |
41435 | 0 | 1 | 0 | 0 |
41436 | 0 | 1 | 0 | 0 |
41437 | 0 | 1 | 0 | 0 |
41438 | 0 | 1 | 0 | 0 |
41439 | 0 | 0 | 0 | 1 |
41440 rows × 4 columns
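print(data['rentType'].value_counts())  # counts per rental type
# '未知方式' is by far the most frequent value (the mode), so use it to replace the '--' marker
# Assigning through .loc writes the change to data itself rather than to a temporary copy
data.loc[data['rentType'] == '--', 'rentType'] = '未知方式'
print(data['rentType'].value_counts())  # counts per rental type after the replacement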
未知方式 30759
整租 5472
合租 5204
-- 5
Name: rentType, dtype: int64
未知方式 30764
整租 5472
合租 5204
Name: rentType, dtype: int64
# One-hot encode rentType and append the indicator columns to data
data = data.join(pd.get_dummies(data.rentType))
data
index | area | rentType | totalFloor | houseToward | houseDecoration | communityName | city | region | plate | buildYear | ... | uv | lookNum | tradeTime | tradeMoney | roomNum | living_room | bathroom | 合租 | 整租 | 未知方式 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 68.06 | 未知方式 | 16 | 暫無數據 | 其他 | XQ00051 | SH | RG00001 | BK00064 | 1953 | ... | 284.0 | 0 | 2018/11/28 | 2000.0 | 2.0 | 1.0 | 1.0 | 0 | 0 | 1 |
1 | 125.55 | 未知方式 | 14 | 暫無數據 | 簡裝 | XQ00130 | SH | RG00002 | BK00049 | 2007 | ... | 22.0 | 1 | 2018/12/16 | 2000.0 | 3.0 | 2.0 | 2.0 | 0 | 0 | 1 |
2 | 132.00 | 未知方式 | 32 | 暫無數據 | 其他 | XQ00179 | SH | RG00002 | BK00050 | 暫無信息 | ... | 20.0 | 1 | 2018/12/22 | 16000.0 | 3.0 | 2.0 | 2.0 | 0 | 0 | 1 |
3 | 57.00 | 未知方式 | 17 | 暫無數據 | 精裝 | XQ00313 | SH | RG00002 | BK00051 | 暫無信息 | ... | 279.0 | 9 | 2018/12/21 | 1600.0 | 1.0 | 1.0 | 1.0 | 0 | 0 | 1 |
4 | 129.00 | 未知方式 | 2 | 暫無數據 | 毛坯 | XQ01257 | SH | RG00003 | BK00044 | 暫無信息 | ... | 480.0 | 0 | 2018/11/18 | 2900.0 | 3.0 | 2.0 | 2.0 | 0 | 0 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
41435 | 10.00 | 合租 | 11 | 北 | 精裝 | XQ01209 | SH | RG00002 | BK00062 | 2009 | ... | 2662.0 | 0 | 2018/2/5 | 2190.0 | 4.0 | 1.0 | 1.0 | 1 | 0 | 0 |
41436 | 7.10 | 合租 | 6 | 北 | 精裝 | XQ00853 | SH | RG00002 | BK00055 | 2004 | ... | 2446.0 | 0 | 2018/1/22 | 2090.0 | 3.0 | 1.0 | 1.0 | 1 | 0 | 0 |
41437 | 9.20 | 合租 | 18 | 北 | 精裝 | XQ00852 | SH | RG00002 | BK00055 | 2000 | ... | 2016.0 | 0 | 2018/2/8 | 3190.0 | 4.0 | 1.0 | 1.0 | 1 | 0 | 0 |
41438 | 14.10 | 合租 | 8 | 北 | 精裝 | XQ00791 | SH | RG00002 | BK00055 | 1998 | ... | 2554.0 | 0 | 2018/3/22 | 2460.0 | 4.0 | 1.0 | 1.0 | 1 | 0 | 0 |
41439 | 33.50 | 未知方式 | 19 | 北 | 其他 | XQ03246 | SH | RG00010 | BK00020 | 2015 | ... | 717.0 | 1 | 2018/10/21 | 3000.0 | 1.0 | 1.0 | 1.0 | 0 | 0 | 1 |
41440 rows × 54 columns
# Drop the original rentType column now that it has been encoded
data = data.drop(['rentType'], axis=1)
data
index | area | totalFloor | houseToward | houseDecoration | communityName | city | region | plate | buildYear | saleSecHouseNum | ... | uv | lookNum | tradeTime | tradeMoney | roomNum | living_room | bathroom | 合租 | 整租 | 未知方式 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 68.06 | 16 | 暫無數據 | 其他 | XQ00051 | SH | RG00001 | BK00064 | 1953 | 0 | ... | 284.0 | 0 | 2018/11/28 | 2000.0 | 2.0 | 1.0 | 1.0 | 0 | 0 | 1 |
1 | 125.55 | 14 | 暫無數據 | 簡裝 | XQ00130 | SH | RG00002 | BK00049 | 2007 | 0 | ... | 22.0 | 1 | 2018/12/16 | 2000.0 | 3.0 | 2.0 | 2.0 | 0 | 0 | 1 |
2 | 132.00 | 32 | 暫無數據 | 其他 | XQ00179 | SH | RG00002 | BK00050 | 暫無信息 | 3 | ... | 20.0 | 1 | 2018/12/22 | 16000.0 | 3.0 | 2.0 | 2.0 | 0 | 0 | 1 |
3 | 57.00 | 17 | 暫無數據 | 精裝 | XQ00313 | SH | RG00002 | BK00051 | 暫無信息 | 0 | ... | 279.0 | 9 | 2018/12/21 | 1600.0 | 1.0 | 1.0 | 1.0 | 0 | 0 | 1 |
4 | 129.00 | 2 | 暫無數據 | 毛坯 | XQ01257 | SH | RG00003 | BK00044 | 暫無信息 | 1 | ... | 480.0 | 0 | 2018/11/18 | 2900.0 | 3.0 | 2.0 | 2.0 | 0 | 0 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
41435 | 10.00 | 11 | 北 | 精裝 | XQ01209 | SH | RG00002 | BK00062 | 2009 | 0 | ... | 2662.0 | 0 | 2018/2/5 | 2190.0 | 4.0 | 1.0 | 1.0 | 1 | 0 | 0 |
41436 | 7.10 | 6 | 北 | 精裝 | XQ00853 | SH | RG00002 | BK00055 | 2004 | 0 | ... | 2446.0 | 0 | 2018/1/22 | 2090.0 | 3.0 | 1.0 | 1.0 | 1 | 0 | 0 |
41437 | 9.20 | 18 | 北 | 精裝 | XQ00852 | SH | RG00002 | BK00055 | 2000 | 0 | ... | 2016.0 | 0 | 2018/2/8 | 3190.0 | 4.0 | 1.0 | 1.0 | 1 | 0 | 0 |
41438 | 14.10 | 8 | 北 | 精裝 | XQ00791 | SH | RG00002 | BK00055 | 1998 | 0 | ... | 2554.0 | 0 | 2018/3/22 | 2460.0 | 4.0 | 1.0 | 1.0 | 1 | 0 | 0 |
41439 | 33.50 | 19 | 北 | 其他 | XQ03246 | SH | RG00010 | BK00020 | 2015 | 3 | ... | 717.0 | 1 | 2018/10/21 | 3000.0 | 1.0 | 1.0 | 1.0 | 0 | 0 | 1 |
41440 rows × 53 columns
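The encode, join and drop steps can also be written as a single call. The sketch below uses a made-up three-row frame; passing `columns=` to `pd.get_dummies` encodes the listed column and removes the original, and `prefix` (an optional choice here, not used in the notebook) keeps the new column names traceable to their source feature.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "area": [68.06, 10.0, 33.5],
    "rentType": ["未知方式", "合租", "整租"],
})

# Encode rentType and drop the original column in one step.
encoded = pd.get_dummies(toy, columns=["rentType"], prefix="rentType")
print(encoded.columns.tolist())
```

Note that recent pandas versions return boolean indicator columns rather than the `uint8` columns shown in this notebook's output.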
print(data['houseDecoration'].value_counts())  # no clear idea yet on how to handle this feature
其他 29040
精裝 10918
簡裝 1171
毛坯 311
Name: houseDecoration, dtype: int64
# For buildYear, replace the '暫無信息' (no information) placeholder with the mean of the known years
num_sum = 0
j = 0
for i in data['buildYear']:
    if i != "暫無信息":
        j += 1
        num_sum += float(i)
mean1 = num_sum / j
# Assign through .loc so the change is written to data itself; the value is stored as a string like the rest of the column
data.loc[data['buildYear'] == '暫無信息', 'buildYear'] = str(mean1)
data
index | area | totalFloor | houseToward | houseDecoration | communityName | city | region | plate | buildYear | saleSecHouseNum | ... | uv | lookNum | tradeTime | tradeMoney | roomNum | living_room | bathroom | 合租 | 整租 | 未知方式 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 68.06 | 16 | 暫無數據 | 其他 | XQ00051 | SH | RG00001 | BK00064 | 1953 | 0 | ... | 284.0 | 0 | 2018/11/28 | 2000.0 | 2.0 | 1.0 | 1.0 | 0 | 0 | 1 |
1 | 125.55 | 14 | 暫無數據 | 簡裝 | XQ00130 | SH | RG00002 | BK00049 | 2007 | 0 | ... | 22.0 | 1 | 2018/12/16 | 2000.0 | 3.0 | 2.0 | 2.0 | 0 | 0 | 1 |
2 | 132.00 | 32 | 暫無數據 | 其他 | XQ00179 | SH | RG00002 | BK00050 | 1999.3850952578173 | 3 | ... | 20.0 | 1 | 2018/12/22 | 16000.0 | 3.0 | 2.0 | 2.0 | 0 | 0 | 1 |
3 | 57.00 | 17 | 暫無數據 | 精裝 | XQ00313 | SH | RG00002 | BK00051 | 1999.3850952578173 | 0 | ... | 279.0 | 9 | 2018/12/21 | 1600.0 | 1.0 | 1.0 | 1.0 | 0 | 0 | 1 |
4 | 129.00 | 2 | 暫無數據 | 毛坯 | XQ01257 | SH | RG00003 | BK00044 | 1999.3850952578173 | 1 | ... | 480.0 | 0 | 2018/11/18 | 2900.0 | 3.0 | 2.0 | 2.0 | 0 | 0 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
41435 | 10.00 | 11 | 北 | 精裝 | XQ01209 | SH | RG00002 | BK00062 | 2009 | 0 | ... | 2662.0 | 0 | 2018/2/5 | 2190.0 | 4.0 | 1.0 | 1.0 | 1 | 0 | 0 |
41436 | 7.10 | 6 | 北 | 精裝 | XQ00853 | SH | RG00002 | BK00055 | 2004 | 0 | ... | 2446.0 | 0 | 2018/1/22 | 2090.0 | 3.0 | 1.0 | 1.0 | 1 | 0 | 0 |
41437 | 9.20 | 18 | 北 | 精裝 | XQ00852 | SH | RG00002 | BK00055 | 2000 | 0 | ... | 2016.0 | 0 | 2018/2/8 | 3190.0 | 4.0 | 1.0 | 1.0 | 1 | 0 | 0 |
41438 | 14.10 | 8 | 北 | 精裝 | XQ00791 | SH | RG00002 | BK00055 | 1998 | 0 | ... | 2554.0 | 0 | 2018/3/22 | 2460.0 | 4.0 | 1.0 | 1.0 | 1 | 0 | 0 |
41439 | 33.50 | 19 | 北 | 其他 | XQ03246 | SH | RG00010 | BK00020 | 2015 | 3 | ... | 717.0 | 1 | 2018/10/21 | 3000.0 | 1.0 | 1.0 | 1.0 | 0 | 0 | 1 |
41440 rows × 53 columns
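Because the filled value is written back as a string, buildYear stays a text column and is left out of numeric summaries. A possible variation, shown here only as a sketch on a made-up series, converts the column to numbers first and fills afterwards, so the result is a float column.

```python
import pandas as pd

# Made-up values; '暫無信息' is the placeholder used in the data.
build_year = pd.Series(["1953", "2007", "暫無信息", "2015"])

# Non-numeric placeholders become NaN instead of raising an error.
years = pd.to_numeric(build_year, errors="coerce")

# Fill the gaps with the mean of the known years; the result is float64.
years = years.fillna(years.mean())
print(years.dtype, years.tolist())
```

Whether the mean is a sensible fill value for a year is a separate question; the median or a per-plate mean would be equally easy to substitute.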
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41440 entries, 0 to 41439
Data columns (total 53 columns):
area 41440 non-null float64
totalFloor 41440 non-null int64
houseToward 41440 non-null object
houseDecoration 41440 non-null object
communityName 41440 non-null object
city 41440 non-null object
region 41440 non-null object
plate 41440 non-null object
buildYear 41440 non-null object
saleSecHouseNum 41440 non-null int64
subwayStationNum 41440 non-null int64
busStationNum 41440 non-null int64
interSchoolNum 41440 non-null int64
schoolNum 41440 non-null int64
privateSchoolNum 41440 non-null int64
hospitalNum 41440 non-null int64
drugStoreNum 41440 non-null int64
gymNum 41440 non-null int64
bankNum 41440 non-null int64
shopNum 41440 non-null int64
parkNum 41440 non-null int64
mallNum 41440 non-null int64
superMarketNum 41440 non-null int64
totalTradeMoney 41440 non-null int64
totalTradeArea 41440 non-null float64
tradeMeanPrice 41440 non-null float64
tradeSecNum 41440 non-null int64
totalNewTradeMoney 41440 non-null int64
totalNewTradeArea 41440 non-null int64
tradeNewMeanPrice 41440 non-null float64
tradeNewNum 41440 non-null int64
remainNewNum 41440 non-null int64
supplyNewNum 41440 non-null int64
supplyLandNum 41440 non-null int64
supplyLandArea 41440 non-null float64
tradeLandNum 41440 non-null int64
tradeLandArea 41440 non-null float64
landTotalPrice 41440 non-null int64
landMeanPrice 41440 non-null float64
totalWorkers 41440 non-null int64
newWorkers 41440 non-null int64
residentPopulation 41440 non-null int64
pv 41440 non-null float64
uv 41440 non-null float64
lookNum 41440 non-null int64
tradeTime 41440 non-null object
tradeMoney 41440 non-null float64
roomNum 41440 non-null float64
living_room 41440 non-null float64
bathroom 41440 non-null float64
合租 41440 non-null uint8
整租 41440 non-null uint8
未知方式 41440 non-null uint8
dtypes: float64(13), int64(29), object(8), uint8(3)
memory usage: 15.9+ MB
data = data.infer_objects()
data.info()
# The output is identical to the listing above: infer_objects does not parse strings, so buildYear and the other text columns remain object
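# Correlation between the numeric features (text columns are excluded)
corr = data.select_dtypes(include=['number', 'bool']).corr()
plt.figure(figsize=(15, 6))
# print(corr)
sns.heatmap(corr)
# Some of the data still needs to be transformed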
[Figure: correlation heatmap (output_51_1.png)]
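A heatmap of some fifty columns is hard to read. Since the goal is to predict tradeMoney, it may be more useful to rank the features by the absolute value of their correlation with the target. The sketch below uses a made-up four-row frame, so the numbers it prints say nothing about the real data.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "area": [30.0, 60.0, 90.0, 120.0],
    "totalFloor": [6, 18, 6, 18],
    "city": ["SH", "SH", "SH", "SH"],  # non-numeric, excluded below
    "tradeMoney": [2000.0, 4100.0, 5900.0, 8000.0],
})

numeric = toy.select_dtypes(include="number")

# Correlation of every numeric feature with the target, strongest first.
corr_with_target = (
    numeric.corr()["tradeMoney"]
    .drop("tradeMoney")
    .abs()
    .sort_values(ascending=False)
)
print(corr_with_target)
```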
# Box plots of the numeric columns
plt.figure(figsize=(15, 6))
data.boxplot()
[Figure: box plots of the numeric columns (output_52_1.png)]
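Box plots flag points that lie more than 1.5 times the interquartile range beyond the quartiles. The same rule can be applied in code to list the flagged values. This is a minimal sketch with made-up rents; `iqr_outliers` is a hypothetical helper, and the quartile interpolation in Python's `statistics` module may differ slightly from the one matplotlib uses.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Made-up rents with two extreme values.
rents = [1600, 2000, 2090, 2190, 2460, 2900, 3000, 3190, 16000, 120000]
print(iqr_outliers(rents))
```

For a target as skewed as tradeMoney, a rule like this would flag many legitimate high rents, so any cut-off deserves a look at the flagged rows before they are removed.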