Hypothesis Testing in Linear Regression Analysis
Introduction to Econometrics with R
5 Hypothesis Tests and Confidence Intervals in the Simple Linear Regression Model
This chapter continues our treatment of the simple linear regression model. The following subsections discuss how we may use our knowledge about the sampling distribution of the OLS estimator in order to make statements regarding its uncertainty.
These subsections cover the following topics:
- Testing hypotheses about regression coefficients.
- Confidence intervals for regression coefficients.
- Regression when \(X\) is a dummy variable.
- Heteroskedasticity and homoskedasticity.
The packages AER (Kleiber and Zeileis 2008) and scales (Wickham and Seidel 2022) are required to reproduce the code chunks presented throughout this chapter. The package scales provides additional generic plot scaling methods. Make sure both packages are installed before you proceed. The safest way to do so is to check whether the following code chunk executes without any errors.
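The exact chunk from the book is not reproduced above; a minimal sketch of such a check could look like this:

```r
# Load the packages needed for this chapter; run
# install.packages(c("AER", "scales")) first if either call fails.
library(AER)     # data sets, estimators and tests used throughout the chapter
library(scales)  # additional generic plot scaling methods
```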
Simple linear regression
Simple linear regression • What regression analysis does • The simple regression model • Hypothesis testing in regression • Residual analysis • Inverse prediction, replicated regression and weighted regression • Regression caveats • Power considerations in simple linear regression
What regression does • Fits a straight line through a cloud of data. • Tests and quantifies the effect of an independent variable X on a dependent variable Y. • The intensity of the effect is given by the slope of the regression (b = ΔY/ΔX). • The importance of the effect is given by the coefficient of determination (r²).
Regression and correlation coefficients • The slope b is estimated as \(b = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2\). • The correlation r is \(r = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sqrt{\sum_i (X_i - \bar{X})^2 \sum_i (Y_i - \bar{Y})^2}\). • So \(b = r\,(s_Y / s_X)\): b = r if X and Y have the same variance, and if b = 0 then r = 0 and vice versa.
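A quick numerical check of this relationship, as a sketch with simulated data (the variable names and values are illustrative, not from the lecture):

```r
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)
y <- 3 + 0.8 * x + rnorm(50, sd = 1)

b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope by the formula above
r <- cor(x, y)                                                  # Pearson correlation

all.equal(b, r * sd(y) / sd(x))            # TRUE: b = r * s_Y / s_X
all.equal(b, unname(coef(lm(y ~ x))[2]))   # matches the least-squares estimate
```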
How it does it • By the method of least squares, which minimizes the sum of squared deviations between the observations and the regression line, i.e. the residuals. • The residual of an observation is \(e_i = Y_i - \hat{Y}_i\), its squared deviation is \(e_i^2 = (Y_i - \hat{Y}_i)^2\), and least squares minimizes \(\sum_i e_i^2\).
Regression or correlation? • Correlation: degree of association between two variables X and Y; no causal relationship assumed! • Regression: to predict the value of the dependent variable if the independent variable were changed; causal relationship assumed!
When do we use regression? • Don’t use it to determine the strength of association between two variables (that is correlation). • Do use it if you want to predict the value of Y given X.
The simple regression model • The regression model is \(Y_i = a + bX_i + e_i\), where a is the intercept, b = ΔY/ΔX is the slope, and \(e_i\) is the residual error. • So, all simple regression models are described by 2 parameters: the intercept (a) and the slope (b).
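A minimal sketch of fitting this model in R (simulated data; the intercept and slope values are arbitrary, and the fitted object is reused in later sketches):

```r
set.seed(42)
x <- runif(40, min = 0, max = 20)
y <- 2 + 0.5 * x + rnorm(40, sd = 1.5)   # a = 2, b = 0.5, plus normal error

fit <- lm(y ~ x)   # least-squares estimates of the intercept and slope
coef(fit)
```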
Assumptions • Residuals are independent and normally distributed. • The variance of the residuals is equal for all X (homoscedasticity). • The relationship between Y and X is linear. • There is no measurement error on X (Model I regression).
Measurement error • The assumption of no error on X can be examined beforehand, and is almost invariably violated. • It is only of concern when the measurement error is large relative to the magnitude of X (say, > 10%). • If the assumption is invalid, then Model II regression is required.
Residual analysis I: independence • Plot residuals against estimates (fitted values) and look for patterns. • Do an ACF (autocorrelation) plot of the residuals.
Residual analysis II: normality • Plot residuals against estimates; look for patterns. • Do a normal probability plot of the residuals. • Check with the Lilliefors test.
Residual analysis III: homoscedasticity • Plot residuals against estimates; look for patterns. • Check with Levene’s test after grouping the Y’s into several classes.
Residual analysis IV: linearity • Plot residuals against estimates; look for curvature or other systematic patterns.
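The four residual checks above can be sketched in R as follows, continuing with the fitted model from the earlier simulation (the Lilliefors and Levene tests assume the nortest and car packages are installed):

```r
res <- resid(fit)
est <- fitted(fit)

plot(est, res); abline(h = 0, lty = 2)   # patterns suggest non-independence, non-linearity or heteroscedasticity
acf(res)                                 # independence: autocorrelation of residuals
qqnorm(res); qqline(res)                 # normality: normal probability plot

# formal checks (assumes the nortest and car packages are installed)
nortest::lillie.test(res)        # Lilliefors test of normality
grp <- cut(est, breaks = 3)      # group the estimates into several classes
car::leveneTest(res, grp)        # Levene's test of equal residual variance across classes
```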
Robustness of regression with respect to violation of assumptions
What to do when assumptions aren’t met • Try transforming the data, but remember: (1) for some data, no transformation will work; (2) finding an appropriate transformation may not be easy. • Use non-linear regression.
Transformations in regression • Example: weight versus length in the fish Scorpaenichthys marmoratus, plotted on arithmetic axes (weight in kg, length in mm) and on log-log axes.
Transformations in regression • Example: chirp rate (chirps/min) as a function of temperature (°C) in males of the cricket Oecanthus fultoni, plotted on arithmetic and log-scaled axes.
Transformations in regression • Example: electrical resistance (millivolts) as a function of illumination in cephalopod eyes, with relative brightness plotted on arithmetic and log scales.
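A sketch of a log transformation in R (hypothetical weight/length data; a log-log fit corresponds to a power relationship between the raw variables):

```r
# hypothetical allometric data: weight roughly proportional to length^3, multiplicative error
set.seed(7)
length_mm <- runif(30, min = 50, max = 600)
weight_kg <- 2e-8 * length_mm^3 * exp(rnorm(30, sd = 0.2))

fit_raw <- lm(weight_kg ~ length_mm)            # raw scale: residuals show curvature
fit_log <- lm(log(weight_kg) ~ log(length_mm))  # log-log scale: approximately linear
coef(fit_log)                                   # slope estimates the exponent (about 3 here)
```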
Hypothesis testing I: partitioning the total sums of squares • Total SS = model (explained) SS + unexplained (error) SS: \(\sum_i (Y_i - \bar{Y})^2 = \sum_i (\hat{Y}_i - \bar{Y})^2 + \sum_i (Y_i - \hat{Y}_i)^2\).
Hypothesis testing I: partitioning the total sums of squares (cont’d) • So \(MS_{\text{regression}} = s^2_Y\) and \(MS_{\text{error}} = 0\) if observed = expected. • Calculate \(F = MS_{\text{regression}} / MS_{\text{error}}\) and compare with the F distribution with 1 and N - 2 df. • This F ratio tests H0: b = 0.
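In R, this partition and the F test come directly from anova() on the fitted model (continuing the earlier sketch):

```r
anova(fit)                 # regression SS, error SS, mean squares, and F with 1 and N - 2 df
summary(fit)$fstatistic    # the same F statistic with its degrees of freedom
```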
Standard error of the slope • The standard error of the slope is \(s_b = \sqrt{MS_{\text{error}} / \sum_i (X_i - \bar{X})^2}\), and the 100(1 - α)% CI of the slope is \(b \pm t_{\alpha(2), N-2}\, s_b\). • So, for fixed N, we can decrease \(s_b\) by expanding the range of X values sampled.
Standard error of the intercept • The standard error of the intercept a is \(s_a = \sqrt{MS_{\text{error}} \left[ \frac{1}{N} + \frac{\bar{X}^2}{\sum_i (X_i - \bar{X})^2} \right]}\). • So, for fixed N, we can decrease \(s_a\) by expanding the range of X values sampled.
Hypothesis testing II: testing model parameters • H01: a = 0 and H02: b = 0. • Test each hypothesis by a t-test, e.g. \(t = b / s_b\) compared with the t distribution with N - 2 df (and likewise \(t = a / s_a\) for the intercept). • Note: these are 2-tailed hypotheses!
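summary() and confint() give these t-tests and the corresponding confidence intervals (continuing the sketch):

```r
summary(fit)$coefficients   # estimates, standard errors (s_a, s_b), t values, 2-tailed p values
confint(fit, level = 0.95)  # 100(1 - alpha)% confidence intervals for a and b
```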
Hypothesis testing III: one-tailed hypotheses • Biological theory predicts that Y should increase with X. • So, H0: b ≤ 0 versus H1: b > 0 (one-tailed). • Calculate \(t_b = b / s_b\). • Reject H0 if \(t_b > 0\) and the one-tailed p < α.
Confidence intervals in regression • 100(1 - α)% CI for the estimated (mean) value at X: \(\hat{Y} \pm t_{\alpha(2), N-2} \sqrt{MS_{\text{error}} \left[ \frac{1}{N} + \frac{(X - \bar{X})^2}{\sum_i (X_i - \bar{X})^2} \right]}\). • 100(1 - α)% CI for a new observation at X (prediction interval): \(\hat{Y} \pm t_{\alpha(2), N-2} \sqrt{MS_{\text{error}} \left[ 1 + \frac{1}{N} + \frac{(X - \bar{X})^2}{\sum_i (X_i - \bar{X})^2} \right]}\).
Confidence intervals in regression (cont’d) • The CI for observations is larger than the CI for estimated values. • CIs for both estimated values and observations widen as the X value moves away from the sample mean of X.
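Both kinds of interval can be obtained from predict() (a sketch using hypothetical new X values, continuing the fitted model above):

```r
new_x <- data.frame(x = c(5, 10, 15))          # hypothetical X values
predict(fit, new_x, interval = "confidence")   # CI for the estimated mean of Y at each X
predict(fit, new_x, interval = "prediction")   # wider interval for a new observation at each X
```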
Outliers • Points that appear to lie well off the fitted line. • Issue 1: are “apparent” outliers really outliers? • Issue 2: do they significantly affect the statistical conclusions?
Outlier analysis I: Studentized residuals • Plot Studentized residuals against estimated values. • “Large” residuals are those with absolute value > 3.0. • Such cases make large contributions to the residual mean square of the regression.
Outlier analysis II: leverage • Leverage measures the potential influence of a case on the regression line. • It is determined by the X value only, so points far from the mean of X have higher leverage. • “Large” = anything greater than 4/N.
Outlier analysis III: Cook’s distance • Cook’s distance measures both leverage and the contribution to the residual mean square, i.e. the actual influence of a point. • “Large” = anything greater than 1.
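These three diagnostics are available in base R (a sketch continuing the fitted model above; the 3.0, 4/N and 1 cut-offs are the rules of thumb from these slides):

```r
n    <- nobs(fit)
stud <- rstudent(fit)          # Studentized residuals
lev  <- hatvalues(fit)         # leverage
cook <- cooks.distance(fit)    # Cook's distance

which(abs(stud) > 3)    # "large" Studentized residuals
which(lev > 4 / n)      # "large" leverage
which(cook > 1)         # "large" Cook's distance
```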
Resolving outlier problems • Do outliers have a significant effect on the regression results? To find out, delete them, rerun the analysis, and compare the results. • Are the slope and intercept estimates significantly affected, i.e. do the new estimates still lie within the 95% CIs of the original estimates?
The effects of outlier deletion • Deletion reduces sample size (N), thereby reducing power. • It also decreases \(MS_{\text{error}}\), so \(s_b\) decreases and power increases. • If N is small, the former effect will probably outweigh the latter unless the outliers are very aberrant.
Inverse prediction • We regress Y on X, but want to predict X given Y. • Regression of X on Y is not appropriate because the error is in Y, not X. • E.g. calibration curves: we want to predict concentration from an instrument reading, based on a regression of reading on known solute concentrations.
Inverse prediction (cont’d) • Regress Y on X. • Generate the predicted value of X given Y: \(\hat{X} = (Y - a)/b\). • Calculate 95% confidence limits for the “X” estimate from the 95% confidence limits for the “Y” estimate in the standard regression.
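A minimal sketch of the point estimate and the interval idea for a calibration curve (hypothetical data; exact inverse-prediction limits, e.g. Fieller-type limits, are more involved than this back-transformation of the prediction band):

```r
# hypothetical calibration data: instrument reading vs known solute concentration
conc    <- c(0, 5, 10, 20, 40, 80)
reading <- c(0.02, 0.11, 0.19, 0.38, 0.80, 1.58)

cal <- lm(reading ~ conc)
a <- coef(cal)[1]; b <- coef(cal)[2]

y0 <- 0.50                  # new reading whose concentration is wanted
x0 <- unname((y0 - a) / b)  # inverse-predicted concentration
x0

# rough 95% limits for x0: concentrations whose 95% prediction band covers y0
grid <- data.frame(conc = seq(0, 80, by = 0.1))
pb   <- predict(cal, grid, interval = "prediction")
range(grid$conc[pb[, "lwr"] <= y0 & pb[, "upr"] >= y0])
```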
Regression with replication • Used when several Y’s are measured at each X. • The error SS can then be partitioned into within-group SS and SS due to deviations from linearity, so we can test the linearity assumption directly by comparing the MS due to deviations from linearity with the MS within groups.
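One way to carry out this lack-of-fit test in R is to compare the straight-line fit with a model that fits a separate mean at each X (a sketch with a hypothetical replicated design):

```r
# hypothetical replicated design: 4 replicate Y's at each of 5 X values
set.seed(3)
x_rep <- rep(c(1, 2, 3, 4, 5), each = 4)
y_rep <- 1 + 0.7 * x_rep + rnorm(length(x_rep), sd = 0.3)

linear <- lm(y_rep ~ x_rep)           # straight-line model
groups <- lm(y_rep ~ factor(x_rep))   # one mean per X: pure within-group error
anova(linear, groups)                 # F test: MS due to deviations from linearity over MS within groups
```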
Weighted regression • Used when our confidence in individual observations varies, e.g. they differ in measurement error or precision. • In replicated designs, the variance of Y for a given X may vary among X’s, as may the sample size (N). • So, weight each observation by N or by the inverse of the sample variance.
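In R this is the weights argument of lm() (a sketch; the per-observation standard deviations below are hypothetical):

```r
# hypothetical data with known, unequal measurement precision
set.seed(11)
sds <- c(0.2, 0.2, 0.5, 0.5, 1.0, 1.0)        # per-observation standard deviations
x_w <- 1:6
y_w <- 2 + 0.4 * x_w + rnorm(6, sd = sds)

wfit <- lm(y_w ~ x_w, weights = 1 / sds^2)    # weight by the inverse of the variance
summary(wfit)
```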
Regression caveats I: causation • A statistically significant regression of Y on X need not imply a causal relationship between the two (e.g. both may be driven by a third variable Z). • A non-significant linear regression need not imply the lack of a causal relationship if the true relationship is non-linear.
Regression caveats II: small samples • Significant regressions can be obtained by chance, i.e. even when no (linear) causal relationship exists. • This is especially true if sample sizes are small. • So when doing multiple simple regressions, control the experiment-wise error rate \(\alpha_E\).
Regression caveats III: large samples • When N is large, only very small regression coefficients are required to reject H0 (power is large). • So, be careful not to “overinterpret” the observed relationship if R² is small.
Regression caveats IV: extrapolation and interpolation • Be careful when (1) predictions lie outside the range of the sample (extrapolation), and (2) predictions are for values where data are sparse (interpolation).
The final word on extrapolation: In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-six miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oölitic Silurian period, just a million years ago next November, the Lower Mississippi River was upwards of one million three hundred thousand miles long, and stuck over the Gulf of Mexico like a fishing rod. And by the same token, any person can see that seven hundred and forty-two years from now, the lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. (Mark Twain, Life on the Mississippi)
Power and sample size in simple linear regression • Because the correlation coefficient r and the regression coefficient b are closely related, i.e. \(r = b\, s_X / s_Y\), we can transform b to r and evaluate power using r.
Power and sample size in regression • If we test H0: b = 0 with sample size n, we can determine 1 - β by calculating the Fisher z-transformed values of the critical value of the corresponding r at the specified α (giving \(z_\alpha\)) and of the sample regression coefficient converted to r (giving \(z_r\)), and then the one-tailed probability of the normal deviate \(Z_{\beta(1)} = (z_r - z_\alpha)\sqrt{n - 3}\).
Power and sample size in regression (cont’d) • Once \(Z_{\beta(1)}\) is determined, we can calculate the probability of obtaining a Z-value of this size or greater, i.e. β. • Power is then 1 - β.
Power and sample size in regression: an example • Changes in wing length with age in a sample of 13 birds. • So 1 - β = 1.00.
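A sketch of this Fisher-z power calculation in R, following the approach described on the preceding slides (the observed r below is a placeholder, not the wing-length value from the slide, which is not reproduced here):

```r
power_slr <- function(r, n, alpha = 0.05) {
  t_crit  <- qt(1 - alpha / 2, df = n - 2)          # two-tailed critical t for H0: b = 0
  r_crit  <- sqrt(t_crit^2 / (t_crit^2 + n - 2))    # corresponding critical correlation
  z_r     <- atanh(abs(r))                          # Fisher z of the assumed correlation
  z_alpha <- atanh(r_crit)                          # Fisher z of the critical correlation
  Z_beta  <- (z_r - z_alpha) * sqrt(n - 3)
  pnorm(Z_beta)                                     # approximate power = 1 - beta
}

power_slr(r = 0.9, n = 13)   # placeholder r; returns the approximate power
```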
Minimal sample size in regression • Given a desired power 1 - β, how large a sample is required to reject H0: b = 0 if it is false and the true regression coefficient is at least b0? • To find out, first calculate the correlation coefficient r0 corresponding to b0.
Minimal sample size in regression (cont’d) • …then calculate \(n = \left( \frac{Z_{\alpha} + Z_{\beta(1)}}{z_{r_0}} \right)^2 + 3\), where \(z_{r_0}\) is the Fisher z-transform of r0 and \(Z_{\alpha}\), \(Z_{\beta(1)}\) are standard normal deviates for the chosen α and β.
Minimal sample size: an example • We want to reject H0: b = 0 99% of the time when b0 > 0.2, with α(2) = .05. • So β(1) = .01, giving \(Z_{\beta(1)} = 2.33\), and for b0 = 0.20 we have…
Minimal sample size (cont’d) • So… …and… • So, a sample size of at least 8 should be used.
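A sketch of this sample-size formula in R (the r0 value below is a placeholder assumption; the r0 corresponding to b0 = 0.2 for these data is not reproduced on the slide):

```r
n_min_slr <- function(r0, alpha = 0.05, beta = 0.01) {
  Z_alpha <- qnorm(1 - alpha / 2)   # two-tailed alpha
  Z_beta  <- qnorm(1 - beta)        # one-tailed beta
  ceiling(((Z_alpha + Z_beta) / atanh(r0))^2 + 3)
}

n_min_slr(r0 = 0.95)   # placeholder r0 corresponding to the smallest slope of interest
```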
Presentation Transcript. DY Y b = DY/DX DX X What regression does • Fits a straight line through a cloud of data. • Tests and quantifies the effect of an independent variable X on a dependent variable Y. • Intensity of the effect is given by the slope (b) of the regression. • The importance of the effect is given by the coefficient of ...