Relearning Statistics, Ch. 14: Simple Linear Regression

14.1 Simple Linear Regression Model

Simple Linear Regression Model: y = β0 + β1 x + ε

  • β0 and β1 are referred to as the parameters of the model.
  • ε is a random variable referred to as the error term. It accounts for the variability in y that cannot be explained by the linear relationship between x and y.

Simple Linear Regression Equation: E(y) = β0 + β1 x
Estimated Simple Linear Regression Equation: ŷ = b0 + b1 x

14.2 Least Squares Method

It is a procedure for using sample data to find the estimated regression equation.
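The least squares criterion chooses b0 and b1 to minimize the sum of squared deviations between the observed values yi and the predicted values ŷi:

min Σ (yi − ŷi)²

Slope and intercept of the estimated regression equation:

b1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
b0 = ȳ − b1 x̄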
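
As a minimal sketch of the least squares method, the Python below computes the slope and intercept that minimize Σ (yi − ŷi)². The function name fit_line and the ten data points are assumptions made for illustration; these notes do not list their own data.

```python
def fit_line(x, y):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Illustrative data (an assumption, not taken from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

b0, b1 = fit_line(x, y)
print(f"y_hat = {b0:.1f} + {b1:.1f} x")
```

With these illustrative numbers the fitted line is ŷ = 60 + 5x.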

14.3 Coefficient of Determination

How do we show that the estimated regression equation fits the data well?

SSE (sum of squares due to error): SSE = Σ (yi − ŷi)²
SST (total sum of squares): SST = Σ (yi − ȳ)²
SSR (sum of squares due to regression): SSR = Σ (ŷi − ȳ)²

SST = SSR + SSE
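
The identity SST = SSR + SSE can be checked numerically. The sketch below is illustrative only: the helper name sums_of_squares and the data are assumptions, not part of the original notes.

```python
def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least squares line fitted to x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the line
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # left unexplained
    return sst, ssr, sse


# Illustrative data (an assumption, not taken from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

sst, ssr, sse = sums_of_squares(x, y)
print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
```

For this data set SST = 15730, SSR = 14200 and SSE = 1530, so the decomposition holds.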

Coefficient of determination: r² = SSR / SST
Correlation coefficient: r = (sign of b1) × √r²

The correlation coefficient applies only to a linear relationship between two variables.
The coefficient of determination can also be used for nonlinear relationships and for relationships that have two or more independent variables.
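
To illustrate how the two measures relate in simple linear regression, this sketch computes r² = SSR / SST, takes r as the square root of r² with the sign of b1, and compares it with the usual sample correlation formula. The function name and data are assumptions made for illustration.

```python
import math


def r_squared_and_r(x, y):
    """Return (r2, r, pearson) for a simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)   # this is SST
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    r2 = ssr / syy
    r = math.copysign(math.sqrt(r2), b1)       # r carries the sign of the slope
    pearson = sxy / math.sqrt(sxx * syy)       # direct sample correlation
    return r2, r, pearson


# Illustrative data (an assumption, not taken from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

r2, r, pearson = r_squared_and_r(x, y)
print(f"r^2 = {r2:.4f}, r = {r:.4f}, Pearson r = {pearson:.4f}")
```

Here r² is about 0.9027, meaning roughly 90% of the variability in y is explained by the line, and both routes to r agree.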

14.4 Model Assumptions

An important step in determining whether the assumed model is appropriate involves testing for the significance of the relationship.

14.5 Testing for Significance

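Estimate of σ²

Mean square error (the estimate of σ²): s² = MSE = SSE / (n − 2)
Standard error of the estimate: s = √MSE = √(SSE / (n − 2))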
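
A short sketch of the estimate of σ²: SSE is divided by n − 2 because two parameters, b0 and b1, are estimated from the data. The function name and data below are assumptions made for illustration.

```python
import math


def mean_square_error(x, y):
    """Return (MSE, s): the estimate of sigma^2 and the standard error of the estimate."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)   # n - 2 degrees of freedom
    return mse, math.sqrt(mse)


# Illustrative data (an assumption, not taken from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

mse, s = mean_square_error(x, y)
print(f"MSE = {mse:.2f}, s = {s:.3f}")
```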


t-Test

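H0: β1 = 0
H1: β1 ≠ 0

Estimated standard deviation of b1: sb1 = s / √Σ (xi − x̄)², where s is the standard error of the estimate
Test statistic: t = b1 / sb1, which follows a t distribution with n − 2 degrees of freedom when H0 is true
Rejection rule: reject H0 if p-value ≤ α, or equivalently if t ≤ −t(α/2) or t ≥ t(α/2)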
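
The t test for the slope can be sketched as follows. The critical value 2.306 is the t table value for α/2 = 0.025 with 8 degrees of freedom, hard-coded so that only the standard library is needed. The function name and data are assumptions made for illustration.

```python
import math


def slope_t_statistic(x, y):
    """Return (b1, s_b1, t) for testing H0: beta1 = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # standard error of the estimate
    s_b1 = s / math.sqrt(sxx)      # estimated standard deviation of b1
    return b1, s_b1, b1 / s_b1


# Illustrative data (an assumption, not taken from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

T_CRIT = 2.306  # t table value: alpha/2 = 0.025, n - 2 = 8 degrees of freedom
b1, s_b1, t_stat = slope_t_statistic(x, y)
reject_h0 = abs(t_stat) > T_CRIT   # two-tailed test
print(f"b1 = {b1:.2f}, s_b1 = {s_b1:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

With this data t is about 8.62, far beyond the critical value, so H0: β1 = 0 is rejected at α = 0.05.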

Confidence Interval for β1

b1 ± t(α/2) × sb1 = 5 ± 1.95
Because the whole interval, 3.05 to 6.95, lies above 0 and therefore excludes the hypothesized value β1 = 0, we can reject H0.
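
The interval 5 ± 1.95 can be reproduced under some assumptions that the notes do not state: the ten data points below (chosen because they give b1 = 5) and the t value 3.355, which is the table value for α/2 = 0.005 with 8 degrees of freedom, i.e. a 99% confidence level. Treat this as a sketch rather than the notes' own calculation.

```python
import math


def slope_confidence_interval(x, y, t_crit):
    """Return (b1, margin) so that the interval is b1 +/- margin."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b1, t_crit * s_b1


# Illustrative data (an assumption, not taken from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

T_CRIT = 3.355  # assumed: t table value for alpha/2 = 0.005, 8 degrees of freedom
b1, margin = slope_confidence_interval(x, y, T_CRIT)
lower, upper = b1 - margin, b1 + margin
print(f"{b1:.2f} +/- {margin:.2f} -> ({lower:.2f}, {upper:.2f})")
print("interval excludes 0:", lower > 0 or upper < 0)
```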

F Test

With only one independent variable, the F test leads to the same conclusion as the t test. With more than one independent variable, only the F test can be used to test for an overall significant relationship.

Question: why is MSR/MSE close to 1 when β1 = 0, and why does it then follow an F distribution?

Test statistic: F = MSR / MSE, where MSR = SSR / (number of independent variables) and MSE = SSE / (n − 2).
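
On the question of why MSR/MSE should be near 1 under H0: MSE estimates σ² whether or not β1 = 0, while MSR estimates σ² only when β1 = 0 and tends to overestimate it otherwise. With normally distributed errors, the ratio then follows an F distribution with 1 and n − 2 degrees of freedom. The sketch below computes F and confirms that it equals t² when there is one independent variable. The function name, the data, and the hard-coded critical value 5.32 (F table value for α = 0.05 with 1 and 8 degrees of freedom) are assumptions made for illustration.

```python
import math


def f_and_t(x, y):
    """Return (F, t) for a simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    msr = ssr / 1          # one independent variable
    mse = sse / (n - 2)
    t = b1 / (math.sqrt(mse) / math.sqrt(sxx))
    return msr / mse, t


# Illustrative data (an assumption, not taken from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

F_CRIT = 5.32  # F table value: alpha = 0.05, 1 and 8 degrees of freedom
f_stat, t_stat = f_and_t(x, y)
print(f"F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}, reject H0: {f_stat > F_CRIT}")
```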

Cautions:

Rejecting H0 does not enable us to conclude that the relationship between x and y is linear. We can state only that x and y are related and that a linear relationship explains a significant portion of the variability in y over the range of values for x observed in the sample.

14.6 Using the Estimated Regression Equation for Estimation and Prediction

Point estimation: substitute the given value of x directly into the estimated regression equation to compute ŷ.

Confidence Interval: an interval estimate of the mean value of y for a given value of x.
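
A sketch of the confidence interval for the mean value of y at a chosen x*. The standard error used is s √(1/n + (x* − x̄)² / Σ (xi − x̄)²). The function name, the data, the point x* = 10 and the hard-coded t value 2.306 (α/2 = 0.025, 8 degrees of freedom) are assumptions made for illustration.

```python
import math


def mean_response_interval(x, y, x_star, t_crit):
    """Return (y_star, margin): the point estimate and CI half-width for E(y) at x_star."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    # Standard error of the estimated mean response at x_star.
    s_mean = s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    return b0 + b1 * x_star, t_crit * s_mean


# Illustrative data (an assumption, not taken from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

y_star, ci_margin = mean_response_interval(x, y, x_star=10, t_crit=2.306)
print(f"E(y) at x = 10: {y_star:.1f} +/- {ci_margin:.2f}")
```

The interval is narrowest at x* = x̄ and widens as x* moves away from the mean of the observed x values.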

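Prediction Interval: an interval estimate of an individual value of y for a given value of x. It is wider than the confidence interval for the same x, because it must also account for the variability of a single observation around the mean.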
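
A sketch of the prediction interval for an individual y at a chosen x*. Compared with the interval for the mean, the standard error gains an extra 1 under the square root: s √(1 + 1/n + (x* − x̄)² / Σ (xi − x̄)²). The function name, the data, x* = 10 and the t value 2.306 (α/2 = 0.025, 8 degrees of freedom) are assumptions made for illustration.

```python
import math


def prediction_interval(x, y, x_star, t_crit):
    """Return (y_star, margin): the point prediction and PI half-width at x_star."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    # The leading 1 accounts for the scatter of one new observation around the line.
    s_pred = s * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)
    return b0 + b1 * x_star, t_crit * s_pred


# Illustrative data (an assumption, not taken from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

y_star, pi_margin = prediction_interval(x, y, x_star=10, t_crit=2.306)
print(f"individual y at x = 10: {y_star:.1f} +/- {pi_margin:.2f}")
```

For this data the half-width is about 33.9, roughly three times the half-width of the corresponding interval for the mean.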

14.8 Residual Analysis: Validating Model Assumptions

The four assumptions about the error term ε:
1. E(ε) = 0.
2. The variance of ε, denoted by σ², is the same for all values of x.
3. The values of ε are independent.
4. The error term ε has a normal distribution.

Residual Plot Against x

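Plot the residuals yi − ŷi against the independent variable x. If the assumptions about ε hold, the points should form a roughly horizontal band centered on 0, with no visible pattern.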

Standardized Residuals

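Standard deviation of the ith residual: s(yi − ŷi) = s √(1 − hi), where hi = 1/n + (xi − x̄)² / Σ (xj − x̄)²
Standardized residual for observation i: (yi − ŷi) / s(yi − ŷi)

In the standardized residual plot, approximately 95% of the standardized residuals fall between -2 and +2, which is what we expect when the error term is normally distributed.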
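
A sketch of how standardized residuals can be computed and checked against the ±2 band. Each residual is divided by s √(1 − hi), where hi is the leverage of observation i. The function name and data are assumptions made for illustration.

```python
import math


def standardized_residuals(x, y):
    """Return the standardized residual of every observation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    result = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx   # leverage of this observation
        result.append(e / (s * math.sqrt(1 - h)))
    return result


# Illustrative data (an assumption, not taken from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

z = standardized_residuals(x, y)
share_within_2 = sum(abs(v) <= 2 for v in z) / len(z)
print([round(v, 4) for v in z])
print(f"share within +/-2: {share_within_2:.0%}")
```

With only ten observations, all of the standardized residuals in this example fall inside the band.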

Normal Probability Plot

Plot the normal scores against the standardized residuals. If the normality assumption holds, the points should cluster closely around a 45-degree line through the origin.
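
The pairs for a normal probability plot can be sketched as below. Textbooks usually read normal scores from a table; here they are approximated with Blom's formula, Φ⁻¹((i − 0.375) / (n + 0.25)), which is an assumption of this sketch, as are the function names and the data.

```python
import math
from statistics import NormalDist


def normal_scores(n):
    """Approximate expected normal order statistics with Blom's formula."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def standardized_residuals(x, y):
    """Return the standardized residual of every observation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / (s * math.sqrt(1 - (1 / n + (xi - x_bar) ** 2 / sxx)))
            for xi, e in zip(x, residuals)]


# Illustrative data (an assumption, not taken from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

scores = normal_scores(len(x))
ordered = sorted(standardized_residuals(x, y))
# Pair the smallest residual with the smallest normal score, and so on.
pairs = list(zip(scores, ordered))
for score, resid in pairs:
    print(f"{score:7.3f}  {resid:7.3f}")
```

Plotting the second column against the first gives the normal probability plot; points near the line y = x support the normality assumption.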

14.9 Residual Analysis: Outliers and Influential Observations

Detecting Outliers

Identify any observation with a standardized residual of less than -2 or greater than +2 as an unusual observation.
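
The ±2 rule can be applied mechanically, as in this sketch. The data set is an assumption made for illustration: ten points in which one y value sits well above the downward trend of the others.

```python
import math


def standardized_residuals(x, y):
    """Return the standardized residual of every observation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / (s * math.sqrt(1 - (1 / n + (xi - x_bar) ** 2 / sxx)))
            for xi, e in zip(x, residuals)]


# Illustrative data with one unusual y value (an assumption, not from these notes).
x = [1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
y = [45, 55, 50, 75, 40, 45, 30, 35, 25, 15]

z = standardized_residuals(x, y)
# Observation numbers (1-based) whose standardized residual is beyond +/-2.
flagged = [i for i, v in enumerate(z, start=1) if abs(v) > 2]
print("possible outliers:", flagged)
```

Here only observation 4 (x = 3, y = 75) is flagged, with a standardized residual of about 2.67. A flagged point should be checked for recording errors before it is removed or kept.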

Detecting Influential Observations

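Observations with extreme values of x have high leverage and can be influential. The leverage of observation i is hi = 1/n + (xi − x̄)² / Σ (xj − x̄)².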
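
A sketch of screening for high-leverage observations. The cutoff hi > 6/n is a rule of thumb assumed here, since the notes' own criterion is not recoverable. The function name and the data, seven points with one extreme x value, are also assumptions.

```python
def leverages(x):
    """Return the leverage h_i of every observation; it depends on x only."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Illustrative x values with one extreme point (an assumption, not from these notes).
x = [10, 10, 15, 20, 20, 25, 70]

h = leverages(x)
cutoff = 6 / len(x)   # assumed rule of thumb for "high leverage"
# Observation numbers (1-based) whose leverage exceeds the cutoff.
high_leverage = [i for i, hi in enumerate(h, start=1) if hi > cutoff]
print([round(hi, 2) for hi in h])
print(f"cutoff = {cutoff:.2f}, high leverage: {high_leverage}")
```

Observation 7 (x = 70) has leverage of about 0.94, above the cutoff of about 0.86. High leverage alone does not prove the point is influential; refitting the line without it shows how much the estimates change.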
