Correlation and Regression
The general purpose of multiple regression (the term was first used by Pearson) is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. For example, a real estate agent might record for each listing the size of the house (in square feet), the number of bedrooms, the average income in the respective neighborhood according to census data, and a subjective rating of the appeal of the house. Once this information has been compiled for various houses, it would be interesting to see whether and how these measures relate to the price for which a house is sold. You may also detect "outliers," that is, houses that should really sell for more, given their location and characteristics. Personnel professionals customarily use multiple regression procedures to determine equitable compensation. The personnel analyst usually conducts a salary survey among comparable companies in the market, recording the salaries and their respective characteristics. This information can be used in a multiple regression analysis to build a regression equation.
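As a point of reference, a multiple regression equation with k predictors is conventionally written in the general form below. The symbols are generic textbook notation assumed here for illustration; this is not the specific equation from the salary or housing examples.

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k
```

Here \hat{y} is the predicted value of the dependent variable (for instance, salary or sale price), x_1, ..., x_k are the predictor variables, b_0 is the intercept, and each b_j is the estimated coefficient for predictor x_j.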
Lesson 12: Correlation & Simple Linear Regression
Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. Simple linear regression involves data that fit a line in two dimensions. You will also study correlation, which measures how strong the relationship is. The variable x is the independent variable, and y is the dependent variable. Typically, you choose a value to substitute for the independent variable and then solve for the dependent variable.
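The idea of fitting a line and then substituting a value of x to solve for y can be sketched in a few lines of Python. This is a minimal illustration using the ordinary least-squares formulas; the function names `fit_line` and `predict` and the data are hypothetical, chosen only for this example.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Sum of squared deviations of x, and sum of cross-products of deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def predict(a, b, x):
    """Substitute x into the fitted line and solve for y."""
    return a + b * x


# Hypothetical data that lie exactly on the line y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(a, b, predict(a, b, 5.0))
```

With real data the points will not lie exactly on a line, and the fitted a and b are then the values that minimize the sum of squared vertical distances between the points and the line.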
Correlation is a statistical technique used to measure the strength of linear association between two continuous variables. When calculated using the observed data, the correlation coefficient (r) is commonly known as Pearson's correlation coefficient, after Karl Pearson, who first defined it. When calculated using the ranks of the data instead of the observed data, it is known as Spearman's rank correlation. One can test whether r is statistically significantly different from zero (the value of no correlation). Note that the larger the sample, the smaller the value of r that becomes significant. Simple linear regression is used to describe the relationship between two variables where one variable (the dependent variable, denoted by y) is expected to change as the other one (the independent, explanatory or predictor variable, denoted by x) changes.
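The two coefficients and the significance test described above can be sketched as follows. This is a minimal, standard-library-only illustration; the function names and data are hypothetical. The test statistic used is the usual t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, which assumes the data are roughly bivariate normal; the sketch stops at the statistic and does not compute a p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient computed from the observed values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


def t_statistic(r, n):
    """t statistic for testing whether r differs from zero (n - 2 d.f.)."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical data: y increases with x, but not along a straight line
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(pearson_r(xs, ys), spearman_rho(xs, ys))

# The same r gives a larger t statistic in a larger sample
print(t_statistic(0.3, 10), t_statistic(0.3, 100))
```

In the first example the ranks agree perfectly even though the relationship is curved, so the rank-based coefficient is higher than the one computed from the raw values. The second comparison illustrates why a smaller r can reach significance as the sample grows: for a fixed r, the t statistic increases with n.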
For example, here are two graphs. For the first, I dusted off the elliptical machine in our basement and measured my pulse after one minute of ellipticizing at various speeds. For the second graph, I dusted off some data from McDonald: I collected the amphipod crustacean Platorchestia platensis on a beach near Stony Brook, Long Island, in April, removed and counted the number of eggs each female was carrying, then freeze-dried and weighed the mothers. There are three things you can do with this kind of data. For the exercise data, you'd want to know whether pulse rate was significantly higher with higher speeds.
Correlation and linear regression are the most commonly used techniques for investigating the relationship between two quantitative variables. The goal of a correlation analysis is to see whether two measurement variables covary, and to quantify the strength of the relationship between the variables, whereas regression expresses the relationship in the form of an equation. For example, for students taking a Maths and an English test, we could use correlation to determine whether students who are good at Maths tend to be good at English as well, and regression to determine whether the marks in English can be predicted for given marks in Maths. The starting point is to draw a scatter plot, with one variable on the X-axis and the other variable on the Y-axis, to get a feel for the relationship (if any) between the variables as suggested by the data. The closer the points are to a straight line, the stronger the linear relationship between the two variables.