Predicting Boston Housing Prices Step by Step with PaddlePaddle

Getting Started

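For newcomers to PaddlePaddle, I suggest reading the Fluid Programming Guide first and then the Quick Start. The previous post built a simple network; this one walks through the Quick Start chapter on linear regression. Personally I think the chapter title is a little off: "Linear Regression" should be renamed "Housing Price Prediction" so that it lines up with "Digit Recognition", haha.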

The Real-World Problem

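Housing prices are something everyone cares about. But which factors determine the price of a house, and roughly what is the price once some of those factors are known? We will use PaddlePaddle to treat this as a linear regression problem.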

Restating It as a PaddlePaddle Problem

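Load the housing price data, define a fully connected network (the linear regression model) with PaddlePaddle, and feed the data into the network for training. After the forward computation and backpropagation (which PaddlePaddle already wraps for us), we obtain the optimized parameters. Then, given some information about a house, we load the optimized parameters and get the predicted price.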
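Written out, the model and its training objective look roughly as follows. The notation is my own rather than anything taken from the PaddlePaddle documentation: $x_i$ are the 13 input features, $w_i$ and $b$ are the parameters being learned, $N$ is the number of training samples, and $\eta$ is the learning rate.

```latex
\hat{y} = \sum_{i=1}^{13} w_i x_i + b
\qquad
L(w, b) = \frac{1}{N} \sum_{n=1}^{N} \bigl( \hat{y}^{(n)} - y^{(n)} \bigr)^2
\qquad
w_i \leftarrow w_i - \eta \, \frac{\partial L}{\partial w_i},
\quad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
```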

Creating the Project in PyCharm and Displaying the Data

Create a project named InferBostonHousingPrice using the environment set up earlier, then add a new file train.py to test whether paddlepaddle is usable.

import paddle
import paddle.fluid as fluid

Display the Boston suburb housing price data

train_reader = paddle.dataset.uci_housing.train()
test_reader = paddle.dataset.uci_housing.test()
print(paddle.dataset.uci_housing.feature_names)

print("-----------train---------------")
for i, data in enumerate(train_reader()):
    print(i, data)

print("-----------test---------------")
for i, data in enumerate(test_reader()):
    print(i, data)

There are 506 samples in total: 404 for training and 102 for testing.
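The 404/102 figures are consistent with an 80/20 split of the 506 rows. Here is a quick sanity check, assuming the loader takes the leading 80% of the rows for training. Both the ratio and the helper name `split_sizes` are my assumptions, not something the dataset file states.

```python
# Assumption: the first 80% of the rows are used for training
# and the remainder for testing.
TOTAL_SAMPLES = 506
TRAIN_RATIO = 0.8


def split_sizes(total, ratio):
    """Return (train_count, test_count) for a leading-fraction split."""
    train_count = int(total * ratio)  # int() truncates the fractional part
    return train_count, total - train_count


train_count, test_count = split_sizes(TOTAL_SAMPLES, TRAIN_RATIO)
print(train_count, test_count)
```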

The original description file is reproduced here for reference:

Title: Boston Housing Data
Sources:
(a) Origin: This dataset was taken from the StatLib library which is
maintained at Carnegie Mellon University.
(b) Creator: Harrison, D. and Rubinfeld, D.L. ‘Hedonic prices and the
demand for clean air’, J. Environ. Economics & Management,
vol.5, 81-102, 1978.
(c) Date: July 7, 1993
Past Usage:
- Used in Belsley, Kuh & Welsch, ‘Regression diagnostics …’, Wiley,
  1980. N.B. Various transformations are used in the table on
  pages 244-261.
- Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning.
  In Proceedings on the Tenth International Conference of Machine
  Learning, 236-243, University of Massachusetts, Amherst. Morgan
  Kaufmann.
Relevant Information:

Concerns housing values in suburbs of Boston.
Number of Instances: 506
Number of Attributes: 13 continuous attributes (including “class”
attribute “MEDV”), 1 binary-valued attribute.
Attribute Information:
1. CRIM per capita crime rate by town
2. ZN proportion of residential land zoned for lots over
  25,000 sq.ft.
3. INDUS proportion of non-retail business acres per town
4. CHAS Charles River dummy variable (= 1 if tract bounds
  river; 0 otherwise)
5. NOX nitric oxides concentration (parts per 10 million)
6. RM average number of rooms per dwelling
7. AGE proportion of owner-occupied units built prior to 1940
8. DIS weighted distances to five Boston employment centres
9. RAD index of accessibility to radial highways
10. TAX full-value property-tax rate per $10,000
11. PTRATIO pupil-teacher ratio by town
12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks
  by town
13. LSTAT % lower status of the population
14. MEDV Median value of owner-occupied homes in $1000’s
Missing Attribute Values: None.

Defining the Network

# Collect the samples into a list, the form the feeder expects
list = []
for data in train_reader():
    list.append(data)

# Define the network: 13 input features, 1 output, no activation
x = fluid.layers.data(name="x", shape=[13], dtype='float32')
y = fluid.layers.data(name="y", shape=[1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)
# Define the loss function (mean squared error) and the optimizer
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_cost = fluid.layers.mean(cost)
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_cost)
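To make the network definition less abstract, here is a NumPy-only sketch of what a single-output fully connected layer without activation computes, followed by the averaged squared error. The function names and the sample values are made up for illustration; this is not PaddlePaddle's internal implementation.

```python
import numpy as np


def fc_forward(x, weight, bias):
    """Linear layer without activation: one output per input row."""
    return x @ weight + bias


def mean_squared_error(y_predict, y):
    """Average of the element-wise squared differences."""
    return float(np.mean((y_predict - y) ** 2))


# Two made-up samples with 13 features each (illustrative values only).
x = np.ones((2, 13), dtype="float32")
weight = np.full((13, 1), 0.5, dtype="float32")
bias = np.zeros((1,), dtype="float32")
y = np.array([[6.5], [4.5]], dtype="float32")

y_predict = fc_forward(x, weight, bias)  # each row sums 13 * 0.5
loss = mean_squared_error(y_predict, y)
print(y_predict.ravel(), loss)
```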

Loading the Data and Training

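# Create a CPU executor and initialize the parameters
cpu = fluid.core.CPUPlace()
exe = fluid.Executor(cpu)
exe.run(fluid.default_startup_program())

feeder = fluid.DataFeeder(place=cpu, feed_list=[x, y])

# Each iteration feeds the whole training set once
for i in range(1000):
    avg_loss_value = exe.run(
        None,  # None selects the default main program
        feed=feeder.feed(list),
        fetch_list=[avg_cost])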
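The loop above passes the whole training list to every exe.run call, so each iteration amounts to one full-batch gradient step. The sketch below reproduces that idea in plain NumPy on synthetic data. The function name, the synthetic target and the hyperparameters are my own choices for illustration; it is not what PaddlePaddle runs internally.

```python
import numpy as np


def train_linear_regression(x, y, learning_rate, steps):
    """Full-batch gradient descent on the mean squared error."""
    n_samples, n_features = x.shape
    weight = np.zeros((n_features, 1))
    bias = 0.0
    for _ in range(steps):
        error = x @ weight + bias - y             # shape (n_samples, 1)
        grad_w = 2.0 / n_samples * (x.T @ error)  # dL/dw
        grad_b = 2.0 * float(error.mean())        # dL/db
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    loss = float(np.mean((x @ weight + bias - y) ** 2))
    return weight, bias, loss


# Synthetic data standing in for the housing set: y = 3*x0 - 2*x1 + 1.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 2))
y = x @ np.array([[3.0], [-2.0]]) + 1.0

weight, bias, loss = train_linear_regression(x, y, learning_rate=0.1, steps=2000)
print(weight.ravel(), bias, loss)
```

On this noise-free toy problem the learned weight and bias should end up very close to the values used to generate the data.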

Saving the Parameters to a File

model_path = "./infer_bhp_model"
fluid.io.save_params(executor=exe, dirname=model_path, main_program=None)

Choosing the Data for Prediction

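Pick a few arbitrary samples from the original dataset and use them as the values of the 13 variables for the houses whose prices we want to predict.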


# Define the data (the comment after each array is that sample's price in the dataset)

new_x = numpy.array([[-0.0405441, 0.06636364, -0.32356227,
                      -0.06916996, -0.03435197, 0.05563625,
                      -0.03475696, 0.02682186, -0.37171335,
                      -0.21419304, -0.33569506, 0.10143217,
                      -0.21172912]]).astype('float32')
# price: 24

new_x1 = numpy.array([[-0.03894423, -0.11363636, -0.0944567,
                       -0.06916996, -0.07138901, 0.08476061,
                       0.11663336, -0.09250268, -0.19780031,
                       -0.04625411, 0.26004962, 0.09603603,
                       -0.08921256]]).astype('float32')
# price: 27.5

new_x2 = numpy.array([[-0.01460746, -0.11363636, 0.30950225,
                       -0.06916996, 0.10350811, -0.07753102,
                       0.29583006, -0.12788538, -0.19780031,
                       -0.00999457, -0.39952485, -0.02154428,
                       -0.01719269]]).astype('float32')
# price: 19.1
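Note that these 13 values are not the raw attributes from the description file: the samples printed by the reader are already scaled. As far as I can tell, each feature column is shifted by its mean and divided by its range (maximum minus minimum). Treat that formula as my assumption. A minimal sketch of this kind of scaling, with a made-up helper name and made-up raw values:

```python
import numpy as np


def normalize_features(data):
    """Scale each column to (value - column mean) / (column max - column min)."""
    data = np.asarray(data, dtype="float64")
    span = data.max(axis=0) - data.min(axis=0)
    return (data - data.mean(axis=0)) / span


# Made-up raw values for two features (not taken from the dataset).
raw = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 60.0]])
scaled = normalize_features(raw)
print(scaled)
```

If you want to predict for a brand-new house, its raw attributes would need the same scaling, using the statistics of the training data, before being fed to the network.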

Defining the Network for Prediction

# Define the network (same structure as in training)
x = fluid.layers.data(name="x", shape=[13], dtype='float32')
y = fluid.layers.data(name="y", shape=[1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)

# Create the executor; the parameters are loaded from file below
cpu = fluid.core.CPUPlace()
exe = fluid.Executor(cpu)

param_path = "./infer_bhp_model"
prog = fluid.default_main_program()
fluid.io.load_params(executor=exe, dirname=param_path,
                     main_program=prog)

outs = exe.run(
    feed={'x': new_x2},
    fetch_list=[y_predict.name])

print(outs)
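The snippet above predicts for a single sample (new_x2). Because the input layer only fixes the feature count at 13, several samples can in principle be stacked into one array and predicted in a single call. Below is a NumPy-only sketch of building such a batch from the three samples chosen earlier; the names `batch` and `actual_prices` are ones I introduce here.

```python
import numpy

new_x = numpy.array([[-0.0405441, 0.06636364, -0.32356227,
                      -0.06916996, -0.03435197, 0.05563625,
                      -0.03475696, 0.02682186, -0.37171335,
                      -0.21419304, -0.33569506, 0.10143217,
                      -0.21172912]]).astype('float32')
new_x1 = numpy.array([[-0.03894423, -0.11363636, -0.0944567,
                       -0.06916996, -0.07138901, 0.08476061,
                       0.11663336, -0.09250268, -0.19780031,
                       -0.04625411, 0.26004962, 0.09603603,
                       -0.08921256]]).astype('float32')
new_x2 = numpy.array([[-0.01460746, -0.11363636, 0.30950225,
                       -0.06916996, 0.10350811, -0.07753102,
                       0.29583006, -0.12788538, -0.19780031,
                       -0.00999457, -0.39952485, -0.02154428,
                       -0.01719269]]).astype('float32')

# Stack the three (1, 13) arrays into one (3, 13) batch.
batch = numpy.concatenate([new_x, new_x1, new_x2], axis=0)

# Prices noted next to each sample, in the same order as the rows.
actual_prices = numpy.array([24.0, 27.5, 19.1], dtype='float32')

print(batch.shape, actual_prices.shape)
```

Passing feed={'x': batch} to exe.run should then return one prediction per row, which can be compared against actual_prices.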

Complete Code

File train.py

import paddle
import paddle.fluid as fluid
import numpy

# Display the data

train_reader = paddle.dataset.uci_housing.train()
test_reader = paddle.dataset.uci_housing.test()

print(paddle.dataset.uci_housing.feature_names)

print("-----------train---------------")
for i, data in enumerate(train_reader()):
    print(i, data)

print("-----------test---------------")
for i, data in enumerate(test_reader()):
    print(i, data)


# Collect the samples into a list, the form the feeder expects
list = []
for data in train_reader():
    list.append(data)

# Define the network
x = fluid.layers.data(name="x", shape=[13], dtype='float32')
y = fluid.layers.data(name="y", shape=[1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)
# Define the loss function and the optimizer
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_cost = fluid.layers.mean(cost)
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_cost)
# Initialize the parameters
cpu = fluid.core.CPUPlace()
exe = fluid.Executor(cpu)
exe.run(fluid.default_startup_program())

feeder = fluid.DataFeeder(place=cpu, feed_list=[x, y])

for i in range(1000):
    avg_loss_value = exe.run(
        None,
        feed=feeder.feed(list),
        fetch_list=[avg_cost])
print(avg_loss_value)
model_path = "./infer_bhp_model"
fluid.io.save_params(executor=exe, dirname=model_path, main_program=None)

File infer.py


import paddle.fluid as fluid
import numpy

# Define the data (the comment after each array is that sample's price in the dataset)

new_x = numpy.array([[-0.0405441, 0.06636364, -0.32356227,
                      -0.06916996, -0.03435197, 0.05563625,
                      -0.03475696, 0.02682186, -0.37171335,
                      -0.21419304, -0.33569506, 0.10143217,
                      -0.21172912]]).astype('float32')
# price: 24

new_x1 = numpy.array([[-0.03894423, -0.11363636, -0.0944567,
                       -0.06916996, -0.07138901, 0.08476061,
                       0.11663336, -0.09250268, -0.19780031,
                       -0.04625411, 0.26004962, 0.09603603,
                       -0.08921256]]).astype('float32')
# price: 27.5

new_x2 = numpy.array([[-0.01460746, -0.11363636, 0.30950225,
                       -0.06916996, 0.10350811, -0.07753102,
                       0.29583006, -0.12788538, -0.19780031,
                       -0.00999457, -0.39952485, -0.02154428,
                       -0.01719269]]).astype('float32')
# price: 19.1

# Define the network (same structure as in training)
x = fluid.layers.data(name="x", shape=[13], dtype='float32')
y = fluid.layers.data(name="y", shape=[1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)

# Create the executor; the parameters are loaded from file below
cpu = fluid.core.CPUPlace()
exe = fluid.Executor(cpu)

param_path = "./infer_bhp_model"
prog = fluid.default_main_program()
fluid.io.load_params(executor=exe, dirname=param_path,
                     main_program=prog)

outs = exe.run(
    feed={'x': new_x2},
    fetch_list=[y_predict.name])

print(outs)

Result:
