Drawing on http://www.statsoft.com/textbook/general-linear-models/ , we discuss the general linear model (GLM) as it applies to analyzing economic data.
The roots of the general linear model surely go back to the origins of mathematical thought, but it is the emergence of the theory of algebraic invariants in the 1800's that made the general linear model, as we know it today, possible. The theory of algebraic invariants developed from the groundbreaking work of 19th century mathematicians such as Gauss, Boole, Cayley, and Sylvester. The theory seeks to identify those quantities in systems of equations which remain unchanged under linear transformations of the variables in the system. Stated more imaginatively (but in a way in which the originators of the theory would not consider an overstatement), the theory of algebraic invariants searches for the eternal and unchanging amongst the chaos of the transitory and the illusory. That is no small goal for any theory, mathematical or otherwise. As Wikipedia puts it: "Invariant theory is a branch of abstract algebra dealing with actions of groups on algebraic varieties from the point of view of their effect on functions. Classically, the theory dealt with the question of explicit description of polynomial functions that do not change, or are invariant, under the transformations from a given linear group."
The wonder of it all is that the theory of algebraic invariants was successful far beyond the hopes of its originators. Eigenvalues, eigenvectors, determinants, and matrix decomposition methods all derive from the theory of algebraic invariants. The contributions of the theory of algebraic invariants to the development of statistical theory and methods are numerous, but a simple example familiar to even the most casual student of statistics is illustrative: the correlation between two variables is unchanged by linear transformations of either or both variables.
We probably take this property of correlation coefficients for granted, but what would data analysis be like if we did not have statistics that are invariant to the scaling of the variables involved? Some thought on this question should convince you that without the theory of algebraic invariants, the development of useful statistical techniques would be nigh impossible.
The development of the linear regression model in the late 19th century, and the development of correlational methods shortly thereafter, are clearly direct outgrowths of the theory of algebraic invariants. Regression and correlational methods, in turn, serve as the basis for the general linear model. Indeed, the general linear model can be seen as an extension of linear multiple regression for a single dependent variable. Understanding the multiple regression model is fundamental to understanding the general linear model, so we will look at the purpose of multiple regression, the computational algorithms used to solve regression problems, and how the regression model is extended in the case of the general linear model. A basic introduction to multiple regression methods and the analytic problems to which they are applied is provided in the Multiple Regression section.
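For reference, the multiple regression model that this discussion builds on can be written in matrix form (standard notation, not drawn from the original source):

$$
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}, \qquad \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y},
$$

where $\mathbf{y}$ is the $n \times 1$ vector of observations on the dependent variable, $\mathbf{X}$ is the $n \times (k+1)$ matrix of predictor values (including a column of ones for the intercept), and $\mathbf{b}$ solves the normal equations $\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{y}$.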
One way in which the general linear model differs from the multiple regression model is in terms of the number of dependent variables that can be analyzed. The Y vector of n observations of a single Y variable can be replaced by a Y matrix of n observations of m different Y variables. Similarly, the b vector of regression coefficients for a single Y variable can be replaced by a b matrix of regression coefficients, with one vector of b coefficients for each of the m dependent variables. These substitutions yield what is sometimes called the multivariate regression model, but it should be emphasized that the matrix formulations of the multiple and multivariate regression models are identical, except for the number of columns in the Y and b matrices. The method for solving for the b coefficients is also identical, that is, m different sets of regression coefficients are separately found for the m different dependent variables in the multivariate regression model.
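In matrix form, the multivariate extension described above simply adds columns (again standard notation, not from the source):

$$
\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}, \qquad \mathbf{B} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y},
$$

where $\mathbf{Y}$ is $n \times m$ and $\mathbf{B}$ contains one column of regression coefficients for each of the $m$ dependent variables, so each column of $\mathbf{B}$ is just the single-equation solution applied to the corresponding column of $\mathbf{Y}$.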
The general linear model goes a step beyond the multivariate regression model by allowing for linear transformations or linear combinations of multiple dependent variables. This extension gives the general linear model important advantages over the multiple and the so-called multivariate regression models, both of which are inherently univariate (single dependent variable) methods. One advantage is that multivariate tests of significance can be employed when responses on multiple dependent variables are correlated. Separate univariate tests of significance for correlated dependent variables are not independent and may not be appropriate. Multivariate tests of significance of independent linear combinations of multiple dependent variables also can give insight into which dimensions of the response variables are, and are not, related to the predictor variables. Another advantage is the ability to analyze effects of repeated measure factors. Repeated measure designs, or within-subject designs, have traditionally been analyzed using ANOVA techniques. Linear combinations of responses reflecting a repeated measure effect (for example, the difference of responses on a measure under differing conditions) can be constructed and tested for significance using either the univariate or multivariate approach to analyzing repeated measures in the general linear model.
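A compact way to write the hypotheses this extension makes available (standard multivariate GLM notation, not taken from the original text) is

$$
H_0:\ \mathbf{L}\,\mathbf{B}\,\mathbf{M} = \mathbf{0},
$$

where $\mathbf{L}$ specifies contrasts among the predictors (between-subjects effects) and $\mathbf{M}$ forms linear combinations of the dependent variables, such as differences between repeated measures; taking $\mathbf{M}$ to be the identity matrix recovers the usual multivariate tests on the raw dependent variables.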
A second important way in which the general linear model differs from the multiple regression model is in its ability to provide a solution for the normal equations when the X variables are not linearly independent and the inverse of X'X does not exist. Redundancy of the X variables may be incidental (e.g., two predictor variables might happen to be perfectly correlated in a small data set), accidental (e.g., two copies of the same variable might unintentionally be used in an analysis), or designed (e.g., indicator variables with exactly opposite values might be used in the analysis, as when both Male and Female predictor variables are used in representing Gender). Finding the regular inverse of a non-full-rank matrix is reminiscent of the problem of finding the reciprocal of 0 in ordinary arithmetic: no such inverse or reciprocal exists because division by 0 is not permitted. This problem is solved in the general linear model by using a generalized inverse of the X'X matrix in solving the normal equations. A generalized inverse of a matrix A is any matrix G that satisfies A G A = A.
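As a small illustration, here is a sketch in Mata (Stata's matrix language) using a made-up rank-deficient matrix; the Moore-Penrose pseudoinverse returned by pinv() is one such generalized inverse:

mata:
A = (1, 0, 1 \ 0, 1, 1 \ 1, 1, 2)    // rank 2: the third column equals the sum of the first two
G = pinv(A)                          // Moore-Penrose generalized inverse of A
A * G * A                            // reproduces A (up to rounding), i.e. A G A = A
end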
In Stata, the glm command uses the following syntax: glm depvar [indepvars] [if] [in] [weight] [, options]
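To make the pieces concrete, here is how the generic syntax maps onto the model fitted in the example below (the same command that appears later, shown here only to label its parts):

glm meals yr_rnd parented api99, link(logit) family(binomial) robust nolog
/* depvar: meals; indepvars: yr_rnd parented api99; options: link(logit) family(binomial) robust nolog */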
When the dependent variable is a proportion
Proportion data has values that fall between zero and one. Naturally, it would be nice to have the predicted values also fall between zero and one. One way to accomplish this is to use a generalized linear model (glm) with a logit link and the binomial family. We will include the robust option in the glm model to obtain robust standard errors which will be particularly useful if we have misspecified the distribution family. We will demonstrate this using a dataset in which the dependent variable, meals, is the proportion of students receiving free or reduced priced meals at school.
use http://www.ats.ucla.edu/stat/stata/faq/proportion, clear
/* kernel density of meals */
kdensity meals
glm meals yr_rnd parented api99, link(logit) family(binomial) robust nolog
Generalized linear models No. of obs = 4257
Optimization : ML Residual df = 4253
Scale parameter = 1
Deviance = 395.8141242 (1/df) Deviance = .093067
Pearson = 374.7025759 (1/df) Pearson = .0881031
Variance function: V(u) = u*(1-u/1) [Binomial]
Link function : g(u) = ln(u/(1-u)) [Logit]
AIC = .7220973
Log pseudolikelihood = -1532.984106 BIC = -35143.61
------------------------------------------------------------------------------
| Robust
meals | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yr_rnd | .0482527 .0321714 1.50 0.134 -.0148021 .1113074
parented | -.7662598 .0390715 -19.61 0.000 -.8428386 -.6896811
api99 | -.0073046 .0002156 -33.89 0.000 -.0077271 -.0068821
_cons | 6.75343 .0896767 75.31 0.000 6.577667 6.929193
------------------------------------------------------------------------------
Next, we will compute predicted scores from the model and transform them back so that they are scaled the same way as the original proportions.
predict premeals1
(option mu assumed; predicted mean meals)
(164 missing values generated)
summarize meals premeals1 if e(sample)

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
meals | 4257 .5165962 .3100389 0 1
premeals1 | 4257 .5165962 .2849672 .0220988 .9770855

As a contrast, let's run the same analysis without the transformation. We will then graph the original dependent variable and the two predicted variables against api99.
regress meals yr_rnd parented api99
predict preols
Now let's compare the dependent variable with its two sets of predicted values.
summarize meals premeals1 preols if e(sample)

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
meals | 4257 .5165962 .3100389 0 1
premeals1 | 4257 .5165962 .2849672 .0220988 .9770855
preols | 4257 .5165962 .2818586 -.1930684 1.199395

/* figure 1: proportion dependent variable */
graph twoway scatter meals api99, yline(0 1) msym(oh)
/* figure 2: predicted values from model with logit transformation */
graph twoway scatter premeals1 api99, yline(0 1) msym(oh)
/* figure 3: predicted values from model without transformation */
graph twoway scatter preols api99, yline(0 1) msym(oh)
Note that the values in figures 1 and 2 fall within the range of zero to one, while in figure 3 the values go beyond those bounds.
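A quick way to confirm this is to count the OLS predictions that fall outside the unit interval (command only; the resulting count is not reproduced here):

count if preols < 0 | preols > 1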
Let's finish by looking at the correlations of the predicted values with the dependent variable, meals.

corr meals premeals1 preols
(obs=4257)

| meals premea~1 preols
-------------+---------------------------
meals | 1.0000
premeals1 | 0.9152 1.0000
preols | 0.9091 0.9891 1.0000

Note that the correlation between meals and premeals1 is slightly higher than that between meals and preols.
Predicting specific values
Now, let's say that you want predicted proportions for some specific combinations of your predictor variables: api99 values of 500, 600, and 700; yr_rnd values of 1 and 2; and parented equal to 2.5. You would append the following six observations to your dataset, which has an n of 4421.
count
4421
set obs 4427
replace api99 = 500 in 4422
replace api99 = 600 in 4423
replace api99 = 700 in 4424
replace api99 = 500 in 4425
replace api99 = 600 in 4426
replace api99 = 700 in 4427
replace yr_rnd = 1 in 4422/4424
replace yr_rnd = 2 in 4425/4427
replace parented = 2.5 in 4422/4427
list api99 yr_rnd parented in -6/l, separator(3)
Rerun your model for the 'real' observations (note the in 1/4421), predict for all observations, and display your results.
glm meals yr_rnd parented api99 in 1/4421, link(logit) family(binomial) robust nolog
Generalized linear models No. of obs = 4257
Optimization : ML Residual df = 4253
Scale parameter = 1
Deviance = 395.8141242 (1/df) Deviance = .093067
Pearson = 374.7025759 (1/df) Pearson = .0881031
Variance function: V(u) = u*(1-u/1) [Binomial]
Link function : g(u) = ln(u/(1-u)) [Logit]
AIC = .7220973
Log pseudolikelihood = -1532.984106 BIC = -35143.61
------------------------------------------------------------------------------
| Robust
meals | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yr_rnd | .0482527 .0321714 1.50 0.134 -.0148021 .1113074
parented | -.7662598 .0390715 -19.61 0.000 -.8428386 -.6896811
api99 | -.0073046 .0002156 -33.89 0.000 -.0077271 -.0068821
_cons | 6.75343 .0896767 75.31 0.000 6.577667 6.929193
------------------------------------------------------------------------------
predict premeals
list api99 yr_rnd parented premeals in -6/l, separator(3)
The last six observations now contain the predicted proportions for each combination of predictor values:
+--------------------------------------+
| api99 yr_rnd parented premeals |
|--------------------------------------|
4422. | 500 No 2.5 .774471 |
4423. | 600 No 2.5 .6232278 |
4424. | 700 No 2.5 .4434458 |
|--------------------------------------|
4425. | 500 Yes 2.5 .7827873 |
4426. | 600 Yes 2.5 .6344891 |
4427. | 700 Yes 2.5 .4553849 |
+--------------------------------------+
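As an aside, in Stata 11 or later the same predicted proportions can be obtained without appending observations by using margins after the glm fit (a sketch; output not shown):

margins, at(api99=(500 600 700) yr_rnd=(1 2) parented=2.5)

Because all three predictors are fixed in at(), each margin is simply the predicted proportion at that combination of values.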
The Akaike information criterion is a measure of the relative goodness of fit of a statistical model. The AIC is grounded in the concept of information entropy, in effect offering a relative measure of the information lost when a given model is used to describe reality. It can be said to describe the tradeoff between bias and variance in model construction, or loosely speaking between accuracy and complexity of the model.
AIC values provide a means for model selection. AIC does not provide a test of a model in the sense of testing a null hypothesis; i.e. AIC can tell nothing about how well a model fits the data in an absolute sense. If all the candidate models fit poorly, AIC will not give any warning of that.
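For concreteness, the usual definition, together with the way it connects to the header value reported above (Stata's glm scales AIC by the number of observations), is

$$
\mathrm{AIC} = -2\ln L + 2k, \qquad
\frac{\mathrm{AIC}}{N} = \frac{-2(-1532.984) + 2\cdot 4}{4257} \approx 0.7221,
$$

where $k = 4$ is the number of estimated parameters (three coefficients plus the constant), matching the AIC of .7220973 shown in the glm output.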
In statistics, the Bayesian information criterion (BIC) or Schwarz criterion (also SBC, SBIC) is a criterion for model selection among a finite set of models. It is based, in part, on the likelihood function, and it is closely related to the Akaike information criterion (AIC).
When fitting models, it is possible to increase the likelihood by adding parameters, but doing so may result in overfitting. The BIC resolves this problem by introducing a penalty term for the number of parameters in the model. The penalty term is larger in BIC than in AIC.
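The standard definition, and the deviance-based version that Stata's glm reports in the header above, are

$$
\mathrm{BIC} = k\ln N - 2\ln L, \qquad
\mathrm{BIC}_{glm} = D - (N-k)\ln N = 395.814 - 4253\cdot\ln(4257) \approx -35143.6,
$$

where $D$ is the deviance and $N-k = 4253$ is the residual degrees of freedom, matching the BIC of -35143.61 in the output. The $k\ln N$ penalty (compared with AIC's $2k$) is the larger penalty term referred to above.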
The BIC was developed by Gideon E. Schwarz, who gave a Bayesian argument for adopting it.[1] In fact, Akaike was so impressed with Schwarz's Bayesian formalism that he developed his own Bayesian formalism, now often referred to as the ABIC for "a Bayesian Information Criterion" or, more casually, "Akaike's Bayesian Information Criterion".[2]