14.1 Simple Linear Regression Model
Simple Linear Regression Model: y = β0 + β1 x + ε
- β0 and β1 are referred to as the parameters of the model.
- ε is a random variable referred to as the error term. It accounts for the variability in y that cannot be explained by the linear relationship between x and y.
Simple Linear Regression Equation: E(y) = β0 + β1 x
Estimated Simple Linear Regression Equation: ŷ = b0 + b1 x
14.2 Least Squares Method
It is a procedure for using sample data to find the estimated regression equation.
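A minimal sketch of the least squares computation, assuming the standard closed-form estimates b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and b0 = ȳ - b1 x̄. The ten-point sample and the function name least_squares are illustrative assumptions, not taken from these notes.

```python
# Illustrative sample (an assumption, not data from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]


def least_squares(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-deviation sum and squared-deviation sum of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")  # prints: y_hat = 60.00 + 5.00 x
```

For this sample the fitted line is ŷ = 60 + 5x, so each one-unit increase in x is associated with an estimated increase of 5 in y.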
14.3 Coefficient of Determination
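How do we show that the estimated regression equation fits the data?
SSE: sum of squares due to error
SST = SSR + SSE, where SST is the total sum of squares and SSR is the sum of squares due to regression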
Coefficient of determination: r² = SSR / SST
Correlation coefficient: r = (sign of b1) · √r²
The correlation coefficient applies only to a linear relationship between two variables.
The coefficient of determination can also be used for nonlinear relationships and for relationships that have two or more independent variables.
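The decomposition SST = SSR + SSE can be checked numerically. The sketch below assumes the usual definitions SSE = Σ(yi - ŷi)², SSR = Σ(ŷi - ȳ)², SST = Σ(yi - ȳ)², with r² = SSR / SST and r carrying the sign of b1. The sample data and variable names are illustrative assumptions.

```python
import math

# Illustrative sample (an assumption, not data from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation

r2 = ssr / sst                          # coefficient of determination
r = math.copysign(math.sqrt(r2), b1)    # correlation coefficient, sign of b1

print(f"SSE={sse:.1f}  SSR={ssr:.1f}  SST={sst:.1f}")
print(f"r^2={r2:.4f}  r={r:.4f}")
```

On this sample roughly 90% of the variability in y is explained by the linear relationship with x.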
14.4 Model Assumptions
An important step in determining whether the assumed model is appropriate involves testing for the significance of the relationship.
14.5 Testing for Significance
Estimate of σ²: s² = MSE = SSE / (n - 2)
t-Test
H0: β1 = 0
H1: β1 ≠ 0
Confidence Interval for β1
b1 ± t · s_b1 = 5 ± 1.95
Because the entire interval lies above 0, we can reject H0.
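A hedged sketch of the t test and the confidence interval for β1, assuming s² = SSE / (n - 2), s_b1 = s / √Σ(xi - x̄)², and t = b1 / s_b1. The sample is an illustrative assumption chosen so that b1 = 5, and t_crit = 3.355 is the t-table value for 99% confidence with 8 degrees of freedom. The confidence level is an assumption: the notes do not state which level produced the ± 1.95 margin.

```python
import math

# Illustrative sample (an assumption, not data from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)            # MSE, the estimate of sigma^2
s = math.sqrt(s2)             # standard error of the estimate
s_b1 = s / math.sqrt(sxx)     # estimated standard deviation of b1
t_stat = b1 / s_b1            # test statistic for H0: beta1 = 0

t_crit = 3.355                # assumed table value: t_{0.005}, 8 df
margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin
reject_h0 = not (lower <= 0 <= upper)   # 0 outside the interval

print(f"t = {t_stat:.2f}, interval = {b1:.2f} +/- {margin:.2f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Checking whether 0 falls inside the interval is equivalent to the two-tailed t test at the matching significance level.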
F Test
The F test gives the same result as the t test when there is only one independent variable. With more than one independent variable, only the F test can be used to test for an overall significant relationship.
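With one independent variable the equivalence is exact: F = MSR / MSE equals t². The sketch below assumes MSR = SSR / 1 and MSE = SSE / (n - 2). The sample data are illustrative, and f_crit = 11.26 is an assumed table value for α = 0.01 with (1, 8) degrees of freedom.

```python
import math

# Illustrative sample (an assumption, not data from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

msr = ssr / 1            # one independent variable, so 1 degree of freedom
mse = sse / (n - 2)
f_stat = msr / mse

# t statistic for the same hypothesis, for comparison.
t_stat = b1 / (math.sqrt(mse) / math.sqrt(sxx))

f_crit = 11.26           # assumed table value: F_{0.01} with (1, n - 2) df
print(f"F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
print("reject H0" if f_stat > f_crit else "do not reject H0")
```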
Question: why is MSR/MSE close to 1 when β1 = 0, and why does it follow an F distribution?
Cautions:
Rejecting H0 does not enable us to conclude that the relationship between x and y is linear. We can state only that x and y are related and that a linear relationship explains a significant portion of the variability in y over the range of values for x observed in the sample.
14.6 Using the Estimated Regression Equation for Estimation and Prediction
Point estimation: substitute the given value of x directly into the estimated regression equation to compute ŷ.
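Confidence interval for the mean value of y at a given x*: ŷ* ± t(α/2) · s_ŷ*
Prediction interval for an individual value of y at a given x*: ŷ* ± t(α/2) · s_pred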
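A sketch of both intervals at a hypothetical point x* = 10, assuming the standard formulas s_ŷ* = s · √(1/n + (x* - x̄)² / Σ(xi - x̄)²) for the mean response and s_pred = s · √(1 + 1/n + (x* - x̄)² / Σ(xi - x̄)²) for an individual response. The sample data, x_star, and t_crit = 2.306 (table value for 95% confidence with 8 degrees of freedom) are assumptions for illustration.

```python
import math

# Illustrative sample (an assumption, not data from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

x_star = 10                      # hypothetical value of x
y_star = b0 + b1 * x_star        # point estimate
t_crit = 2.306                   # assumed table value: t_{0.025}, 8 df

# Distance of x_star from the mean of x, scaled by the spread of x.
leverage = 1 / n + (x_star - x_bar) ** 2 / sxx
s_mean = s * math.sqrt(leverage)        # std. error for the mean of y
s_pred = s * math.sqrt(1 + leverage)    # std. error for an individual y

conf_int = (y_star - t_crit * s_mean, y_star + t_crit * s_mean)
pred_int = (y_star - t_crit * s_pred, y_star + t_crit * s_pred)

print(f"point estimate: {y_star:.2f}")
print(f"confidence interval: {conf_int[0]:.2f} to {conf_int[1]:.2f}")
print(f"prediction interval: {pred_int[0]:.2f} to {pred_int[1]:.2f}")
```

The prediction interval is always wider than the confidence interval because it also accounts for the variability of an individual y around its mean. Both intervals are narrowest at x* = x̄.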
14.8 Residual Analysis: Validating Model Assumptions
Four assumptions about the error term ε:
1. E(ε) = 0.
2. The variance of ε, denoted by σ², is the same for all values of x.
3. The values of ε are independent.
4. The error term ε has a normal distribution.
Residual Plot Against x
Standardized Residuals
In the standardized residual plot, approximately 95% of the standardized residuals fall between -2 and +2, which is what a normally distributed error term would produce.
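A minimal sketch of how standardized residuals can be computed, assuming the usual definition: each residual yi - ŷi is divided by its own standard deviation s · √(1 - hi), where hi = 1/n + (xi - x̄)² / Σ(xi - x̄)². The sample data and variable names are illustrative assumptions.

```python
import math

# Illustrative sample (an assumption, not data from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))

std_resid = []
for xi, e in zip(x, resid):
    h = 1 / n + (xi - x_bar) ** 2 / sxx    # leverage of observation i
    std_resid.append(e / (s * math.sqrt(1 - h)))

within = sum(1 for z in std_resid if -2 <= z <= 2)
print([round(z, 2) for z in std_resid])
print(f"{within} of {n} standardized residuals lie between -2 and +2")
```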
Normal Probability Plot
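Plot the normal scores against the standardized residuals. If the normality assumption holds, the points cluster closely around a 45-degree line through the origin.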
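The pairs for such a plot can be sketched as follows. Normal scores are approximated here with Blom's formula, Φ⁻¹((i - 0.375) / (n + 0.25)). That is an assumption: a textbook table of expected order statistics gives slightly different values. The sample data are illustrative as well.

```python
from math import sqrt
from statistics import NormalDist

# Illustrative sample (an assumption, not data from these notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = sqrt(sum(e ** 2 for e in resid) / (n - 2))
std_resid = [
    e / (s * sqrt(1 - (1 / n + (xi - x_bar) ** 2 / sxx)))
    for xi, e in zip(x, resid)
]

# Approximate normal scores for a sample of size n (Blom's formula).
normal_scores = [
    NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)
]

# Pair the i-th smallest standardized residual with the i-th normal score.
pairs = list(zip(normal_scores, sorted(std_resid)))
for score, z in pairs:
    print(f"normal score {score:6.2f}   standardized residual {z:6.2f}")
```

Points that stay close to the line "standardized residual = normal score" support the normality assumption. Systematic curvature suggests it may not hold.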
14.9 Residual Analysis: Outliers and Influential Observations
Detecting Outliers
Identify any observation with a standardized residual of less than -2 or greater than +2 as an unusual observation.
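The rule above can be sketched as a small helper. The function name flag_unusual, its cutoff parameter, and the data set (a perfect line with one planted outlier at x = 6) are all hypothetical and serve only to illustrate the ±2 rule.

```python
import math


def flag_unusual(x, y, cutoff=2.0):
    """Return indices of observations whose standardized residual
    is less than -cutoff or greater than +cutoff."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))

    flagged = []
    for i, (xi, e) in enumerate(zip(x, resid)):
        h = 1 / n + (xi - x_bar) ** 2 / sxx      # leverage of observation i
        z = e / (s * math.sqrt(1 - h))           # standardized residual
        if abs(z) > cutoff:
            flagged.append(i)
    return flagged


# Hypothetical data: y = x except for one planted outlier (y = 30 at x = 6).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 2, 3, 4, 5, 30, 7, 8, 9, 10]
flagged = flag_unusual(x, y)
print("unusual observations (0-based indices):", flagged)
```

A flagged observation is a candidate for closer inspection (data entry error, unusual circumstances), not an automatic deletion.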