Week 14 Homework: Jupyter

2018-09-04 07:05

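Jupyter Notebook (formerly known as IPython Notebook) is a web-based interactive notebook that supports more than 40 programming languages. It lets you create and share documents that contain live code, equations, visualizations, and narrative text.

Installing Jupyter Notebook

The official site describes the installation options in detail (see "Install Jupyter"). I chose to install it through Anaconda.

Anaconda is an open-source Python distribution for scientific computing. It runs on Linux, macOS, and Windows, bundles many commonly used scientific packages, and provides both package management and environment management, which avoids most of the problems caused by having several Python versions installed side by side. It suits beginners and anyone who wants a low-effort setup.

You can download it from the official Anaconda download page, but the download was very slow for me. I recommend the Tsinghua University mirror instead.

For more on using Jupyter, see this blog post, or go straight to the official documentation.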

Assignment

Anscombe's quartet comprises four datasets, and is rather famous. Why? You'll find out in this exercise.

Anscombe’s Quartet
I		II		III		IV
x	y	x	y	x	y	x	y
10.0	8.04	10.0	9.14	10.0	7.46	8.0	6.58
8.0	6.95	8.0	8.14	8.0	6.77	8.0	5.76
13.0	7.58	13.0	8.74	13.0	12.74	8.0	7.71
9.0	8.81	9.0	8.77	9.0	7.11	8.0	8.84
11.0	8.33	11.0	9.26	11.0	7.81	8.0	8.47
14.0	9.96	14.0	8.10	14.0	8.84	8.0	7.04
6.0	7.24	6.0	6.13	6.0	6.08	8.0	5.25
4.0	4.26	4.0	3.10	4.0	5.39	19.0	12.50
12.0	10.84	12.0	9.13	12.0	8.15	8.0	5.56
7.0	4.82	7.0	7.26	7.0	6.42	8.0	7.91
5.0	5.68	5.0	4.74	5.0	5.73	8.0	6.89

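Importing the CSV file

Concepts involved:

1. CSV (comma-separated values) is a plain-text file format for storing data, typically spreadsheet or other tabular data. CSV files can be opened with WordPad, Notepad, or Excel.

2. %matplotlib inline is a magic function. Magic functions are an IPython-specific, command-line-style syntax for invoking helper commands; this one makes matplotlib figures render inline in the notebook.

3. seaborn is a statistical visualization library built on matplotlib that gives charts more attractive defaults. A Chinese translation of its official documentation is available on Zhihu.

4. pandas.read_csv() reads a CSV file into a DataFrame.

5. .head(10) shows the first 10 rows; with no argument it shows the first 5 rows.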
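The original notebook cell is not preserved here, so the following is only a minimal sketch of the loading step. It assumes the table above has been transcribed into wide-format CSV text (the column names `x1`, `y1`, ... and the long-format columns `dataset`, `x`, `y` are my own choices, not the original file's), and it reads from an in-memory string rather than a file on disk. In a notebook you would also run `%matplotlib inline` in the first cell.

```python
import io

import pandas as pd

# Wide-format CSV text transcribed from the table above.
# Column names are hypothetical; the original notebook read a file from disk.
CSV_TEXT = """x1,y1,x2,y2,x3,y3,x4,y4
10.0,8.04,10.0,9.14,10.0,7.46,8.0,6.58
8.0,6.95,8.0,8.14,8.0,6.77,8.0,5.76
13.0,7.58,13.0,8.74,13.0,12.74,8.0,7.71
9.0,8.81,9.0,8.77,9.0,7.11,8.0,8.84
11.0,8.33,11.0,9.26,11.0,7.81,8.0,8.47
14.0,9.96,14.0,8.10,14.0,8.84,8.0,7.04
6.0,7.24,6.0,6.13,6.0,6.08,8.0,5.25
4.0,4.26,4.0,3.10,4.0,5.39,19.0,12.50
12.0,10.84,12.0,9.13,12.0,8.15,8.0,5.56
7.0,4.82,7.0,7.26,7.0,6.42,8.0,7.91
5.0,5.68,5.0,4.74,5.0,5.73,8.0,6.89
"""

# read_csv accepts any file-like object, so StringIO stands in for a file path.
wide = pd.read_csv(io.StringIO(CSV_TEXT))

# Reshape to long format: one row per point, tagged with its dataset label.
frames = []
for i, label in enumerate(["I", "II", "III", "IV"], start=1):
    part = wide[[f"x{i}", f"y{i}"]].copy()
    part.columns = ["x", "y"]
    part.insert(0, "dataset", label)
    frames.append(part)
df = pd.concat(frames, ignore_index=True)

print(df.head())    # first 5 rows by default
print(df.head(10))  # first 10 rows
```

The long format (one `dataset` column plus `x` and `y`) is chosen because it works directly with `groupby` and with seaborn's faceting later on.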

Part 1

For each of the four datasets...

Compute the mean and variance of both x and y
Compute the correlation coefficient between x and y
Compute the linear regression line: (hint: use statsmodels and look at the Statsmodels notebook)

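Concepts involved:

1. groupby is a pandas method that splits a dataset into groups so that each group can be sliced, diced, and summarized separately.

2. pandas.DataFrame.mean() computes the mean.

3. pandas.DataFrame.var() computes the variance (the sample variance, by default).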
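As a rough sketch of the mean and variance step, the data can be grouped by dataset label and summarized per group. The DataFrame below is rebuilt from the table so the snippet runs on its own; the variable names and the `dataset`/`x`/`y` column layout are assumptions of mine, not the original notebook's.

```python
import pandas as pd

# Values transcribed from the Anscombe's Quartet table above.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by datasets I-III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

data = {"I": (x123, y1), "II": (x123, y2), "III": (x123, y3), "IV": (x4, y4)}
df = pd.concat(
    [pd.DataFrame({"dataset": k, "x": x, "y": y}) for k, (x, y) in data.items()],
    ignore_index=True,
)

# Split by dataset label, then summarize the numeric columns of each group.
grouped = df.groupby("dataset")[["x", "y"]]
means = grouped.mean()
variances = grouped.var()  # sample variance (ddof=1) by default

print(means.round(2))
print(variances.round(2))
```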

Concepts involved:

.corr(): returns the correlation coefficients between columns.
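One way to get the per-dataset correlation, sketched under the same assumed `dataset`/`x`/`y` layout as before, is to iterate over the groups and call `Series.corr` on each pair of columns (Pearson correlation by default). The data is rebuilt here so the snippet is self-contained.

```python
import pandas as pd

# Values transcribed from the Anscombe's Quartet table above.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by datasets I-III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

data = {"I": (x123, y1), "II": (x123, y2), "III": (x123, y3), "IV": (x4, y4)}
df = pd.concat(
    [pd.DataFrame({"dataset": k, "x": x, "y": y}) for k, (x, y) in data.items()],
    ignore_index=True,
)

# Pearson correlation between x and y, computed separately for each dataset.
correlations = {
    name: group["x"].corr(group["y"]) for name, group in df.groupby("dataset")
}

for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
```

Calling `group[["x", "y"]].corr()` instead would return the full 2x2 correlation matrix for each group; the off-diagonal entry is the same value.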

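Concepts involved:

scipy.stats.linregress: a function specialized for the least-squares regression of two sets of measurements. It returns the slope, the intercept, the correlation coefficient r (not R²; square it to get R²), the p-value, and the standard error of the slope.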
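The assignment hints at statsmodels, but this write-up uses `scipy.stats.linregress`, so the sketch below does the same. It fits one line per dataset and is only an illustration: the data is rebuilt from the table, and the variable names are mine.

```python
from scipy import stats

# Values transcribed from the Anscombe's Quartet table above.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by datasets I-III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

data = {"I": (x123, y1), "II": (x123, y2), "III": (x123, y3), "IV": (x4, y4)}

# Ordinary least-squares fit of y on x for each dataset.
fits = {name: stats.linregress(x, y) for name, (x, y) in data.items()}

for name, fit in fits.items():
    # fit.rvalue is r, so R^2 is its square.
    print(
        f"{name}: y = {fit.slope:.3f}x + {fit.intercept:.3f}, "
        f"R^2 = {fit.rvalue ** 2:.3f}"
    )
```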

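As the results show, all four datasets have an x mean of 9.0 and an x variance of 11.0, a y mean of about 7.5 and a y variance of about 4.12, and a correlation coefficient between x and y of about 0.816. The fitted regression line for every dataset is approximately y = 0.5x + 3.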

Part 2

Using Seaborn, visualize all four datasets.

hint: use sns.FacetGrid combined with plt.scatter
