What does the log likelihood mean, and what does it imply?
From Nick Cox:
Isabel's point can be pushed further by underlining that probability densities have units of measurement. Naturally, everyone knows about units of measurement. Somewhere in your past there was, with probability 1, some fierce science teacher who was scathing when you missed out the units of measurement in some report. The point re-emerges in the standard elementary statistics course when it is underlined that the units of variance are the square of the original units of measurement, which is grounds enough for introducing its square root, the standard deviation.
However, I've often noticed statistical people not using the same logic when talking about densities, and sometimes their colleagues or students end up confused. (Occasionally, they get confused too.)
The underlying general idea I often explain in this way. Density is amount of "stuff" in some "space". In physics, with density in the classic sense, "stuff" is clearly mass and "space" is clearly volume. Many social scientists might more commonly think of something like population density, in which "stuff" is number of people and "space" is area. In the present example, "stuff" is probability and "space" is the support of the variable(s) in question.
In statistics, introductory courses usually insist on a distinction between probabilities for discrete variables and probability densities for continuous variables, and ne'er the twain shall meet. (At higher levels, mathematically-oriented statisticians who have ingested large doses of measure theory sometimes insist that anything can be a density; it's just a case of the underlying measure, which could be counting measure.)
Focusing on the continuous case, the units of the density come from working backwards from the fact that the total probability, the integral over its support of the density function, must be 1 and must be unit-free and dimensionless. It follows that units of density = 1 / units of variable. In the univariate case, the probability can be considered as the area under the density curve, and the argument can be made visual by considering rectangles with sides the density and the variable.
So, the units of the density of "miles per gallon" are "gallons per mile", however odd that may seem. In the bivariate or multivariate case, the "space" units are the product of the units of the individual variables, so the density units are one over that product, which gets messy and unintuitive, but not intrinsically difficult or problematic. If we were imagining the joint density of mpg and weight in the auto dataset, the units would be 1 / (miles per gallon * pounds).
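To make the rectangle argument concrete, here is a small sketch (not from the original comment) using normalden: the density can sit well above 1, yet the rectangles of density times width still add up to a unit-free total of about 1.

clear
set obs 10001
gen x = -1 + (_n - 1)/5000            // grid from -1 to 1 in steps of .0002
gen d = normalden(x, 0, .1)           // density values, in units of 1/(units of x)
gen area = sum(d*.0002)               // running sum of rectangle areas
di "density at x = 0: " d[5001] "   total area: " area[_N]

The peak density is close to 4, but the accumulated area is essentially 1 and carries no units.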
Away from likelihood calculations, the issue often arises when people get a density estimate from say -kdensity- and are puzzled by densities above 1. In fact, people have been known to ask how to "fix" the results, which they put down to some bug in -kdensity-. I'll add a puff for a small classic of exposition:
D. J. Finney. 1977. Dimensions of statistics. Journal of the Royal Statistical Society, Series C (Applied Statistics) 26: 285-289.
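A quick way to see the -kdensity- point in action, as a sketch on the auto data (not from the original comment): rescaling a variable rescales its density estimate in the opposite direction, because the density's units are 1/(units of the variable).

sysuse auto, clear
kdensity weight, generate(x1 d1) nograph    // weight in pounds: density in 1/pounds
gen tons = weight/2000                      // the same data measured in tons
kdensity tons, generate(x2 d2) nograph      // density now in 1/tons
summarize d1 d2                             // d2 is roughly 2000 times d1

Shrink the units far enough and a perfectly good density estimate will sail past 1, with nothing there to fix.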
Naturally, even if you specify your units for densities, you can still make mistakes. I read a thesis in which a candidate reported a density for his soils of 2 mg/m^3. (Think of how much a cubic metre of water weighs, and double it.) It is salutary to realise that billion-fold errors are not reserved for cosmology or finance, but are possible in your own backyard.
Nice post; things like these can really baffle a person for a while. It reminds me of a discussion I had a while ago in which someone claimed that it was better to model proportions than percentages. He based that claim on the BIC values, and modeling proportions does indeed produce much smaller BIC values:
use http://fmwww.bc.edu/repec/boco..., clear
reg governing noleft minorityleft houseval popdens
estat ic
gen prop_gov = governing * 100
reg prop_gov noleft minorityleft houseval popdens
estat ic
It took me a while first to figure out for myself what was going on, which is basically the same issue as the one discussed in this post, and then to convince the other person.
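A hedged sketch of the same effect with simulated data (the dataset link above is truncated, so these numbers are invented): multiplying the dependent variable by 100 divides every residual density by 100, so the log likelihood drops by N*ln(100) and BIC rises by 2*N*ln(100), even though the fit is identical.

clear
set seed 1234
set obs 200
gen x = rnormal()
gen prop = .5 + .1*x + .05*rnormal()    // outcome on the proportion (0-1) scale
gen pct  = 100*prop                     // the same outcome as a percentage
quietly regress prop x
estat ic                                // smaller BIC
quietly regress pct x
estat ic                                // same model; BIC larger by 2*200*ln(100)
di 2*200*ln(100)                        // about 1842, the entire BIC gap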
Positive log-likelihood values happen
16 February 2011
From time to time, we get a question from a user puzzled about getting a positive log likelihood for a certain estimation. We get so used to seeing negative log-likelihood values all the time that we may wonder what caused them to be positive.
First, let me point out that there is nothing wrong with a positive log likelihood.
The likelihood is the product of the density evaluated at the observations. Usually, the density takes values that are smaller than one, so its logarithm will be negative. However, this is not true for every distribution.
For example, let’s think of the density of a normal distribution with a small standard deviation, let’s say 0.1.
. di normalden(0,0,.1)
3.9894228

This density concentrates a lot of probability in a narrow region around zero, and therefore takes large values near that point. Naturally, the logarithm of this value will be positive.
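That value is just the height of the normal density at its mean, 1/(sigma*sqrt(2*pi)), which you can verify directly:

. di 1/(.1*sqrt(2*_pi))
3.9894228

Any normal density with sigma smaller than 1/sqrt(2*pi), roughly 0.3989, exceeds 1 at its peak.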
. di log(3.9894228)
1.3836466

In model estimation, the situation is a bit more complex. When you fit a model to a dataset, the log likelihood is evaluated at every observation. Some of these evaluations may turn out to be positive, and some may turn out to be negative. The sum of all of them is reported. Let me show you an example.
I will start by simulating a dataset appropriate for a linear model.
clear
program drop _all
set seed 1357
set obs 100
gen x1 = rnormal()
gen x2 = rnormal()
gen y = 2*x1 + 3*x2 + 1 + .06*rnormal()

I will borrow the code for mynormal_lf from the book Maximum Likelihood Estimation with Stata (W. Gould, J. Pitblado, and B. Poi, 2010, Stata Press) in order to fit my model via maximum likelihood.
program mynormal_lf
        version 11.1
        args lnf mu lnsigma
        quietly replace `lnf' = ln(normalden($ML_y1,`mu',exp(`lnsigma')))
end

ml model lf mynormal_lf (y = x1 x2) (lnsigma:)
ml max, nolog

The following table will be displayed:
. ml max, nolog

                                                  Number of obs   =        100
                                                  Wald chi2(2)    =  456919.97
Log likelihood =  152.37127                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
          x1 |   1.995834    .005117   390.04   0.000     1.985805    2.005863
          x2 |   3.014579   .0059332   508.08   0.000      3.00295    3.026208
       _cons |   .9990202   .0052961   188.63   0.000       .98864      1.0094
-------------+----------------------------------------------------------------
lnsigma      |
       _cons |  -2.942651   .0707107   -41.62   0.000    -3.081242   -2.804061
------------------------------------------------------------------------------

We can see that the estimates are close enough to our original parameters, and also that the log likelihood is positive.
We can obtain the log likelihood for each observation by substituting the estimates in the log-likelihood formula:
. predict double xb
. gen double lnf = ln(normalden(y, xb, exp([lnsigma]_b[_cons])))
. summ lnf, detail

                             lnf
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -1.360689      -1.574499
 5%    -.0729971       -1.14688
10%     .4198644      -.3653152       Obs                 100
25%     1.327405      -.2917259       Sum of Wgt.         100

50%     1.868804                      Mean           1.523713
                        Largest       Std. Dev.      .7287953
75%     1.995713       2.023528
90%     2.016385       2.023544       Variance       .5311426
95%     2.021751       2.023676       Skewness      -2.035996
99%     2.023691       2.023706       Kurtosis       7.114586

. di r(sum)
152.37127

. gen f = exp(lnf)
. summ f, detail

                              f
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .2623688       .2071112
 5%     .9296673       .3176263
10%      1.52623       .6939778       Obs                 100
25%     3.771652       .7469733       Sum of Wgt.         100

50%     6.480548                      Mean           5.448205
                        Largest       Std. Dev.      2.266741
75%     7.357449       7.564968
90%      7.51112        7.56509       Variance       5.138117
95%     7.551539       7.566087       Skewness      -.8968159
99%     7.566199        7.56631       Kurtosis       2.431257

We can see that some values of the log likelihood are negative, but most are positive, and that their sum is the value we already know. In the same way, most of the values of the likelihood are greater than one.
As an exercise, try the commands above with a bigger variance, say, 1. Now the density will be flatter, and there will be no values greater than one.
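One way to run that exercise, as a sketch (it assumes mynormal_lf from above is still defined in the current session), is to regenerate the data with an error standard deviation of 1:

clear
set seed 1357
set obs 100
gen x1 = rnormal()
gen x2 = rnormal()
gen y = 2*x1 + 3*x2 + 1 + rnormal()    // error standard deviation 1 instead of .06
ml model lf mynormal_lf (y = x1 x2) (lnsigma:)
ml max, nolog                          // the reported log likelihood is now negative

With sigma near 1, the peak of the density is 1/(sigma*sqrt(2*pi)), well below 1, so every per-observation log likelihood is negative and so is their sum.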
In short, if you have a positive log likelihood, there is nothing wrong with that; and if you check your dispersion parameters, you will most likely find that they are small.