Ask the experts, part n

Well we’re long overdue for another installment of “Ask the Self-Appointed Experts“, or at least for the question part. In today’s edition a follower from Two Forks Montana wrestles with the following conundrum, inviting others to help, or at least reassure him that he is not confused alone. He writes:

I know this issue is all over AM talk radio but the inmates pretty clearly run the asylum there and I’m more confused on the following issue than ever.

It is well known that, given a known rate process, the gamma distribution defines the expected values from random starting points to the nth “nearest” object, and so inversely, we can estimate unknown rates by such values. For example, in a forest of randomly distributed trees, the circular area, a, defined by the distance to the nth closest tree, will estimate tree density. But as Skellam (1952), Moore (1954) and Pollard (1971) showed analytically, these estimates are biased, in inverse magnitude to the value of n, specifically, as n/(n-1) for n > 1. Thus, the distance to, say, the 2nd closest tree will correspond to the area represented by one tree, not two. All very well and good.

Now, the mean of the integration of the gamma distribution from 0 to 1, for a known rate, should return the mean area a, but when I closely approximate the integral (in R, which can’t integrate), I seem to get bias-corrected values reflecting the rates, rather than the biased values reflecting the areas (a) to the nth object. I’m flummoxed and not a little aggravated. Do they know what they’re doing there at R headquarters, or is it me that’s got all turned about the wrong way? If I can’t even trust the values from the statistical distributions in R, then just what can I trust there? I tried taking my mind off the matter by following the Winnipeg Jets (WJPCA, 2015), but man, one can just take only so much of that and I sure as hell ain’t going to follow Edmonton. The ice fishing seems to help, at least until the alcohol wears off, but really there should be an 800 number I think. If you can put me onto some clues I would be most grateful.

My R code and results (for the n = 2 object):

probs = seq(from = 0.000001, to = 0.999999, by = 0.000001)		# evenly spaced probability steps
mean.area = mean(qgamma(p=probs, shape=2, rate = 1, lower.tail = T))	# approximate the pdf integral, as the mean of the sampled distribution
[1] 1.999993

1.999993, WTF !??!

References:
Skellam, J.G. 1952. Studies in statistical ecology: I. Spatial Pattern. Biometrika 39:346-362
Moore, P.G. 1954. Spacing in plant populations. Ecology 35:222-227.
Pollard, J.H. 1971. On distance estimators of density in randomly distributed forests. Biometrics 27:991-1002
WJPCA, 2015. Your 2015-2016 Winnipeg Jets: Fast, Furious and Fun. Winnipeg Jets Promotional Coordinating Association.

Sincerely,
Stewart Stansbury, Two Forks MT (“not quite Canada but roughly as empty”), USA

p.s. Jets picture enclosed

Jets player checking imaginary opponent hard into the glass, to the delight of all

Jets player checking imaginary opponent hard into the glass, to the delight of all

Advertisements

6 thoughts on “Ask the experts, part n

  1. Hi Jim. This is an interesting problem, and it got me digging through a bunch of things to see if there was an R issue (definitely not to distract me from working on other papers…).

    This isn’t a problem with R, as far as I can tell. The expectation of a gamma distribution (what you’re calculating here) is equal to shape/rate (https://en.wikipedia.org/wiki/Gamma_distribution), which is what the integral you were approximating here shows. And given what I understand from skimming Moore and Pollard, this is what you’d expect: the mean of the Gamma distribution for a given rate and scale (i.e. nth tree) is the expected distance to the nth closest tree.

    The bias part arises in using a maximum-likelihood estimator to estimate the rate. So while the mean area of the disc that intersects the second tree from a point is x, the maximum likelihood estimator of the rate given x for n=2 will be rate/2:
    max_lik =function(rate, n, obs_area) dgamma(obs_area, n, rate)
    print(optimize(max_lik,interval = c(0,10), n=1,obs_area= 2, maximum = T))

    Also, R does have integration capabilities, and uses quite a nice numerical approximation. The code you wrote can be re-written as:
    mean.area =integrate(qgamma,lower=0,upper = 1,shape=1, rate=2,lower.tail=F)
    print(mean.area)

    Hope this helps!

    • Thanks Eric. You are right and I’ve gotten to the bottom of why Stewart is confused on the matter.

      He had been considering the problem in relation to its relationship to the Poisson distribution and mistakenly assumed that the rate parameter of the gamma was analogous to the lambda parameter of the Poisson. That is, he had assumed that a rate = 1 implied a specific area. In the Poisson, lambda = 1 implies the unit area on which one expects a density of exactly one object, in a collection of plots of exactly that area in size. Not the case for the gamma: a rate = 1 doesn’t imply anything about the sampled area at all, and indeed, the gamma returns the distribution of expected areas defined by the distance to the n = 1 object, biased though they are. As you state, the bias comes in when the areas defined by the gamma are converted to density.

      In his defense however you will find many people making just this mistake on the AM radio talk shows.

  2. Ewww, not quite Canada but roughly as empty. That’s cold. And what did Canada ever do to Stewart?? At least they don’t have Donald Trump.

    R is open source. The folks at R ‘headquarters’ shouldn’t be the only ones looking to make better code. It would be ironic if a Canadian fixed the problem for him. Sometimes solitude is what is called for.

  3. Dear Stewart,
    I can’t help you with your Winnipeg Jets fixation; you need to seek professional help for that.

    But for the trees, try inverting them:
    mean.area = mean(1/qgamma(p=probs, shape=2, rate = 1, lower.tail = T) )
    That should yield an answer closer to 1.

    • Sorry, the left-hand-side of the above equation is the mean of the estimator for Pollard’s lambda, the inverse of the mean area per tree. Give or take a factor of pi. Should have renamed it, but copy-and-paste makes one sloppy.

  4. Hang on everybody, finally checking back in. Very sporadic internet access right now and I’ve been on the phone with Stewart and the media folks non-stop. But I shall return and we un’s shall get to the bottom of this 🙂

Have at it

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s