Jupyter Assignment

Original

2018-09-03 12:48

Anscombe's quartet

Anscombe's quartet comprises four datasets and is rather famous. Why? You'll find out in this exercise.

All modules used:

%matplotlib inline
import random
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf

sns.set_context("talk")

Part 1

For each of the four datasets...

Compute the mean and variance of both x and y
Compute the correlation coefficient between x and y
Compute the linear regression line: y=β0+β1x+ϵ (hint: use statsmodels and look at the Statsmodels notebook)

Computing the means:

anscombe=pd.read_csv('data/anscombe.csv')
print('mean of x:')
print(anscombe.groupby("dataset").x.mean(),'\n')
print('mean of y:')
print(anscombe.groupby("dataset").y.mean(),'\n')

Result:

Computing the variances:

print('variance of x:')
print(anscombe.groupby("dataset").x.var(),'\n')
print('variance of y:')
print(anscombe.groupby("dataset").y.var(),'\n')
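
The mean and variance computations above can also be collapsed into a single table. The following is a minimal sketch, not the post's own code: it embeds datasets I and IV of the quartet as they are commonly published, and assumes the CSV uses the same "dataset", "x" and "y" columns (the labels "I" and "IV" are an assumption).

```python
import pandas as pd

# Datasets I and IV of Anscombe's quartet as commonly published
# (assumed to match the rows in data/anscombe.csv).
data = pd.DataFrame({
    "dataset": ["I"] * 11 + ["IV"] * 11,
    "x": [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
       + [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
    "y": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
       + [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
})

# One row per dataset; columns are (x, mean), (x, var), (y, mean), (y, var).
# "var" is the sample variance (divides by n - 1), like Series.var().
summary = data.groupby("dataset")[["x", "y"]].agg(["mean", "var"])
print(summary)
```

Putting both statistics side by side makes it easier to see that the datasets agree on them even though their x values differ.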

結果：

Correlation coefficient:

print('correlation coefficient between x and y:')
print(anscombe.groupby("dataset").x.corr(anscombe.y))
# print(anscombe.groupby("dataset").y.corr(anscombe.x)) # gives the same result as above
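
The call above hands the whole `anscombe.y` column to every group and relies on pandas aligning it with each group's rows by index. A sketch that makes the pairing explicit is shown here; it is an alternative formulation rather than the post's method, and it embeds datasets I and IV as commonly published (the labels "I" and "IV" are an assumption about the CSV).

```python
import pandas as pd

# Datasets I and IV of Anscombe's quartet as commonly published
# (assumed to match the rows in data/anscombe.csv).
data = pd.DataFrame({
    "dataset": ["I"] * 11 + ["IV"] * 11,
    "x": [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
       + [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
    "y": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
       + [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
})

# For each dataset, correlate that group's own x column with its own y column.
per_group_corr = data.groupby("dataset")[["x", "y"]].apply(
    lambda g: g["x"].corr(g["y"])
)
print(per_group_corr)
```

Both datasets should come out at roughly 0.816, which is the point of the quartet: very different data, nearly identical correlation.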

Result:

Linear regression equation:

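def regression(X,Y,num):
    print("dataset "+str(num)+':')
    # add an intercept column named 'const' so that OLS fits y = b0 + b1*x
    X=sm.add_constant(X)
    est=sm.OLS(Y,X).fit()
    # look up coefficients by label; positional est.params[1] is deprecated in recent pandas
    slope,intercept=est.params['x'],est.params['const']
    print('y='+str(slope)+'x+'+str(intercept))
    # evaluate the fitted line over the observed range of x
    x=np.linspace(X.x.min(), X.x.max(),100)
    y=slope*x+intercept
    plt.figure()
    plt.scatter(X.x, Y, alpha=0.3)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.plot(x,y,color='r')
# the CSV is assumed to hold four consecutive blocks of 11 rows, one per dataset
for i in range(4):
    rows=anscombe.iloc[i*11:(i+1)*11]
    regression(rows.x,rows.y,i+1)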

Results and fitted lines:

dataset 1:
y=0.5000909090909089x+3.0000909090909085

dataset 2:
y=0.4999999999999999x+3.000909090909091

dataset 3:
y=0.4997272727272726x+3.002454545454545

dataset 4:
y=0.49990909090909114x+3.0017272727272735
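
The coefficients printed for dataset 1 can be cross-checked without statsmodels, using the closed-form least-squares formulas β1 = Sxy/Sxx and β0 = mean(y) − β1·mean(x). The sketch below is a sanity check in plain Python, not part of the original solution; the numbers are dataset I of the quartet as commonly published, assumed to match the first 11 rows of the CSV.

```python
# Anscombe's dataset I as commonly published
# (assumed to match the first 11 rows of data/anscombe.csv).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Centered sums of squares and cross-products.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = s_xy / s_xx                  # beta_1
intercept = mean_y - slope * mean_x  # beta_0
r = s_xy / (s_xx * s_yy) ** 0.5      # Pearson correlation coefficient

print(f"y={slope:.4f}x+{intercept:.4f}, r={r:.3f}")
```

The slope and intercept agree with the dataset 1 line above to the printed precision, and the same formulas applied to the other three datasets give nearly the same line.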

Part 2

Using Seaborn, visualize all four datasets.

hint: use sns.FacetGrid combined with plt.scatter

Code:

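def visualize(datasetx):
    # FacetGrid opens its own figure, so a separate plt.figure() would only add an empty one
    g=sns.FacetGrid(datasetx)
    # map() draws the scatter of columns x and y onto the grid's axes
    g.map(plt.scatter,'x','y')
for i in range(4):
    visualize(anscombe.iloc[i*11:(i+1)*11])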

Result: four scatter plots, one each for dataset 1, dataset 2, dataset 3 and dataset 4.
