What does the log likelihood mean, and what does it imply?
From Nick Cox:
Isabel's point can be pushed further by underlining that probability densities have units of measurement. Naturally, everyone knows about units of measurement. Somewhere in your past there was, with probability 1, some fierce science teacher who was scathing when you missed out the units of measurement in some report. The point re-emerges in the standard elementary statistics course when it is underlined that the units of variance are the square of the original units of measurement, which is grounds enough for introducing its square root, the standard deviation.
However, I've often noticed statistical people not using the same logic when talking about densities, and sometimes their colleagues or students end up confused. (Occasionally, they get confused too.)
The underlying general idea I often explain in this way. Density is amount of "stuff" in some "space". In physics, with density in the classic sense, "stuff" is clearly mass and "space" is clearly volume. Many social scientists might more commonly think of something like population density, in which "stuff" is number of people and "space" is area. In the present example, "stuff" is probability and "space" is the support of the variable(s) in question.
In statistics, introductory courses usually insist on a distinction between probabilities for discrete variables and probability densities for continuous variables, and ne'er the twain shall meet. (At higher levels, mathematically-oriented statisticians who have ingested large doses of measure theory sometimes insist that anything can be a density; it's just a case of the underlying measure, which could be counting measure.)
Focusing on the continuous case, the units of the density come from working backwards from the fact that the total probability, the integral over its support of the density function, must be 1 and must be unit-free and dimensionless. It follows that units of density = 1 / units of variable. In the univariate case, the probability can be considered as the area under the density curve, and the argument can be made visual by considering rectangles with sides the density and the variable.
So, the units of the density of "miles per gallon" are "gallons per mile", however odd that may seem. In the bivariate or multivariate case, the "space" units are the product of the units of the individual variables, so the density units are one over that product, which gets messy and unintuitive, but not intrinsically difficult or problematic. If we were imagining the joint density of mpg and weight in the auto dataset, the units would be 1 / (miles per gallon * pounds).
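To make the rectangle argument concrete, here is a small sketch (not from the original comment) using normalden: the density can sit well above 1, yet the rectangles of density times width still add up to a unit-free total of about 1.

clear
set obs 10001
gen x = -1 + (_n - 1)/5000            // grid from -1 to 1 in steps of .0002
gen d = normalden(x, 0, .1)           // density values, in units of 1/(units of x)
gen area = sum(d*.0002)               // running sum of rectangle areas
di "density at x = 0: " d[5001] "   total area: " area[_N]

The peak density is close to 4, but the accumulated area is essentially 1 and carries no units.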
Away from likelihood calculations, the issue often arises when people get a density estimate from say -kdensity- and are puzzled by densities above 1. In fact, people have been known to ask how to "fix" the results, which they put down to some bug in -kdensity-. I'll add a puff for a small classic of exposition:
D. J. Finney. 1977. Dimensions of statistics. Journal of the Royal Statistical Society, Series C (Applied Statistics) 26: 285-289.
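A quick way to see the -kdensity- point in action, as a sketch on the auto data (not from the original comment): rescaling a variable rescales its density estimate in the opposite direction, because the density's units are 1/(units of the variable).

sysuse auto, clear
kdensity weight, generate(x1 d1) nograph    // weight in pounds: density in 1/pounds
gen tons = weight/2000                      // the same data measured in tons
kdensity tons, generate(x2 d2) nograph      // density now in 1/tons
summarize d1 d2                             // d2 is roughly 2000 times d1

Shrink the units far enough and a perfectly good density estimate will sail past 1, with nothing there to fix.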
Naturally, even if you specify your units for densities, you can still make mistakes. I read a thesis in which a candidate reported a density for his soils of 2 mg/m^3. (Think of how much a cubic metre of water weighs, and double it.) It is salutary to realise that billion-fold errors are not reserved for cosmology or finance, but are possible in your own backyard.
Nice post; things like these can really baffle a person for a while. It reminds me of a discussion I had a while ago in which someone claimed that it was better to model proportions than percentages. He based that claim on the BIC values, and modeling proportions does indeed produce much smaller BIC values:
use http://fmwww.bc.edu/repec/boco..., clear
reg governing noleft minorityleft houseval popdens
estat ic
gen prop_gov = governing * 100
reg prop_gov noleft minorityleft houseval popdens
estat ic
It took me a while first to figure out for myself what was going on, which is basically the same issue as the one discussed in this post, and then to convince the other person.
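A hedged sketch of the same effect with simulated data (the dataset link above is truncated, so these numbers are invented): multiplying the dependent variable by 100 divides every residual density by 100, so the log likelihood drops by N*ln(100) and BIC rises by 2*N*ln(100), even though the fit is identical.

clear
set seed 1234
set obs 200
gen x = rnormal()
gen prop = .5 + .1*x + .05*rnormal()    // outcome on the proportion (0-1) scale
gen pct  = 100*prop                     // the same outcome as a percentage
quietly regress prop x
estat ic                                // smaller BIC
quietly regress pct x
estat ic                                // same model; BIC larger by 2*200*ln(100)
di 2*200*ln(100)                        // about 1842, the entire BIC gap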
Positive log-likelihood values happen
16 February 2011
From time to time, we get a question from a user puzzled about getting a positive log likelihood for a certain estimation. We get so used to seeing negative log-likelihood values all the time that we may wonder what caused them to be positive.
First, let me point out that there is nothing wrong with a positive log likelihood.
The likelihood is the product of the density evaluated at the observations. Usually, the density takes values that are smaller than one, so its logarithm will be negative. However, this is not true for every distribution.
For example, let’s think of the density of a normal distribution with a small standard deviation, let’s say 0.1.
. di normalden(0,0,.1)
3.9894228

This density concentrates a lot of probability in a narrow region around zero, and therefore takes large values near that point. Naturally, the logarithm of this value will be positive.
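That value is just the height of the normal density at its mean, 1/(sigma*sqrt(2*pi)), which you can verify directly:

. di 1/(.1*sqrt(2*_pi))
3.9894228

Any normal density with sigma smaller than 1/sqrt(2*pi), roughly 0.3989, exceeds 1 at its peak.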
. di log(3.9894228)
1.3836466

In model estimation, the situation is a bit more complex. When you fit a model to a dataset, the log likelihood is evaluated at every observation. Some of these evaluations may turn out to be positive, and some may turn out to be negative. The sum of all of them is reported. Let me show you an example.
I will start by simulating a dataset appropriate for a linear model.
clear
program drop _all
set seed 1357
set obs 100
gen x1 = rnormal()
gen x2 = rnormal()
gen y = 2*x1 + 3*x2 + 1 + .06*rnormal()

I will borrow the code for mynormal_lf from the book Maximum Likelihood Estimation with Stata (W. Gould, J. Pitblado, and B. Poi, 2010, Stata Press) in order to fit my model via maximum likelihood.
program mynormal_lf
        version 11.1
        args lnf mu lnsigma
        quietly replace `lnf' = ln(normalden($ML_y1,`mu',exp(`lnsigma')))
end

ml model lf mynormal_lf (y = x1 x2) (lnsigma:)
ml max, nolog

The following table will be displayed:
. ml max, nolog

                                                  Number of obs   =        100
                                                  Wald chi2(2)    =  456919.97
Log likelihood =  152.37127                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
          x1 |   1.995834    .005117   390.04   0.000     1.985805    2.005863
          x2 |   3.014579   .0059332   508.08   0.000      3.00295    3.026208
       _cons |   .9990202   .0052961   188.63   0.000       .98864      1.0094
-------------+----------------------------------------------------------------
lnsigma      |
       _cons |  -2.942651   .0707107   -41.62   0.000    -3.081242   -2.804061
------------------------------------------------------------------------------

We can see that the estimates are close enough to our original parameters, and also that the log likelihood is positive.
We can obtain the log likelihood for each observation by substituting the estimates in the log-likelihood formula:
. predict double xb
. gen double lnf = ln(normalden(y, xb, exp([lnsigma]_b[_cons])))
. summ lnf, detail

                             lnf
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -1.360689      -1.574499
 5%    -.0729971       -1.14688
10%     .4198644      -.3653152       Obs                 100
25%     1.327405      -.2917259       Sum of Wgt.         100

50%     1.868804                      Mean           1.523713
                        Largest       Std. Dev.      .7287953
75%     1.995713       2.023528
90%     2.016385       2.023544       Variance       .5311426
95%     2.021751       2.023676       Skewness      -2.035996
99%     2.023691       2.023706       Kurtosis       7.114586

. di r(sum)
152.37127

. gen f = exp(lnf)
. summ f, detail

                              f
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .2623688       .2071112
 5%     .9296673       .3176263
10%      1.52623       .6939778       Obs                 100
25%     3.771652       .7469733       Sum of Wgt.         100

50%     6.480548                      Mean           5.448205
                        Largest       Std. Dev.      2.266741
75%     7.357449       7.564968
90%      7.51112        7.56509       Variance       5.138117
95%     7.551539       7.566087       Skewness      -.8968159
99%     7.566199        7.56631       Kurtosis       2.431257

We can see that some values of the log likelihood are negative, but most are positive, and that their sum is the value we already know. In the same way, most of the values of the likelihood are greater than one.
As an exercise, try the commands above with a bigger variance, say, 1. Now the density will be flatter, and there will be no values greater than one.
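One way to run that exercise, as a sketch (it assumes mynormal_lf from above is still defined in the current session), is to regenerate the data with an error standard deviation of 1:

clear
set seed 1357
set obs 100
gen x1 = rnormal()
gen x2 = rnormal()
gen y = 2*x1 + 3*x2 + 1 + rnormal()    // error standard deviation 1 instead of .06
ml model lf mynormal_lf (y = x1 x2) (lnsigma:)
ml max, nolog                          // the reported log likelihood is now negative

With sigma near 1, the peak of the density is 1/(sigma*sqrt(2*pi)), well below 1, so every per-observation log likelihood is negative and so is their sum.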
In short, if you have a positive log likelihood, there is nothing wrong with that; and if you check your dispersion parameters, you will most likely find that they are small.