Regression analysis


In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features'). The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line (or hyperplane) that minimizes the sum of squared differences between the observed data and that line (or hyperplane). For specific mathematical reasons (see linear regression), this allows the researcher to estimate the conditional expectation (or population average value) of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters (e.g., quantile regression or Necessary Condition Analysis) or estimate the conditional expectation across a broader collection of non-linear models (e.g., nonparametric regression).
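In the simplest single-predictor case, this least-squares criterion can be written explicitly: given observed pairs $(x_i, y_i)$, $i = 1, \ldots, n$, ordinary least squares chooses the intercept $\beta_0$ and slope $\beta_1$ that minimize the sum of squared residuals,

$$\min_{\beta_0,\, \beta_1} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 ,$$

and the fitted line $\hat{\beta}_0 + \hat{\beta}_1 x$ then serves as an estimate of the conditional expectation of the dependent variable at a given value $x$.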

Regression analysis is primarily used for two conceptually distinct purposes.

First, regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning.

Second, in some situations regression analysis can be used to infer causal relationships between the independent and dependent variables. Importantly, regressions by themselves only reveal relationships between a dependent variable and a collection of independent variables in a fixed dataset. To use regressions for prediction or to infer causal relationships, respectively, a researcher must carefully justify why existing relationships have predictive power for a new context or why a relationship between two variables has a causal interpretation. The latter is particularly important when researchers hope to estimate causal relationships using observational data.

History


The earliest form of regression was the method of least squares, which was published by Legendre in 1805, and by Gauss in 1809. Legendre and Gauss both applied the method to the problem of determining, from astronomical observations, the orbits of bodies about the Sun (mostly comets, but also later the then newly discovered minor planets). Gauss published a further development of the theory of least squares in 1821, including a version of the Gauss–Markov theorem.

The term "regression" was coined by Francis Galton in the 19th century to describe a biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors tend to regress down towards a normal average a phenomenon also call as regression toward the mean. For Galton, regression had only this biological meaning, but his work was later extended by Udny Yule and Karl Pearson to a more general statistical context. In the work of Yule and Pearson, the joint distribution of the response and explanatory variables is assumed to be Gaussian. This assumption was weakened by R.A. Fisher in his works of 1922 and 1925. Fisher assumed that the conditional distribution of the response variable is Gaussian, but the joint distribution need non be. In this respect, Fisher's assumption is closer to Gauss's formulation of 1821.

In the 1950s and 1960s, economists used electromechanical desk calculators to calculate regressions. Before 1970, it sometimes took up to 24 hours to receive the result from one regression.

Regression methods continue to be an area of active research. In recent decades, new methods have been developed for robust regression, regression involving correlated responses such as time series and growth curves, regression in which the predictor (independent variable) or response variables are curves, images, graphs, or other complex data objects, regression methods accommodating various types of missing data, nonparametric regression, Bayesian methods for regression, regression in which the predictor variables are measured with error, regression with more predictor variables than observations, and causal inference with regression.