I often get obsessed with certain topics, especially statistical and mathematical ones. Lately I’ve been thinking about the Poisson distribution a lot, as it figures heavily in analyses of random populations, and thus also in assessing deviations therefrom, a topic that I’ve been heavily involved for a while, relative to historic and current tree populations. I also find that often a specific question will arise, related to whatever it is I’m doing, that is best (and often quickest) answered in a way that gives me the most confidence, by direct simulation, for which I use R.
The topic of varying scales of variation, and the information that can be obtained by analysis thereof, is a pretty interesting one IMO. When the generating processes for a phenomenon of interest are complex, or poorly understood for whatever reason, one can (and should!) obtain valuable information regarding likelihoods of various hypothetical process drivers, by multi-scale analysis–essentially, obtaining evidence for the scale(s) at which departures from randomness are greatest and using that information to suggest, or constrain, possible explanations. There are a number of ways to go about doing so.
The Poisson distribution is the appropriate descriptor of a homogeneous (constant) rate process whose individual event outcomes are random. An “under-dispersed” population at a particular scale of analysis will be more “regular” in its arrangement than expected by a random process, and it is clear that in such a situation there must necessarily also be under-dispersion at at least some other scales as well, both smaller and larger. To illustrate via an extreme example, suppose some location that gets 36 inches of precipitation (P) per year on average, distributed as exactly three inches per month, every month. The probability of such a result arising, when P varies randomly (Poisson) at any sub-monthly scale, is extremely low. It won’t occur over any extended period of time. The same principle holds, though muted, if there is some monthly variance around 3.0.
In an over-dispersed (“clustered”) population, again at some defined scale, the situation is different. Such a population will also be over-dispersed at smaller scales, but not necessarily at larger ones, at least not at the same intensity, and there should be some unknown scale at which the variation reduces to the Poisson. This means that a Poisson population is not necessarily Poisson at smaller scales, but it should be so at larger scales. That is, it should “scale up” according to Poisson expectation, i.e. with the same rate but a greater absolute number, and variance therein, per sample unit.
But does it? Or rather, what does R think about the matter?
Well, here’s what I get, using the example case of a mean annual P/yr of 36″ and 100,000 simulated monthly, or weekly, sums from randomly sampling the Poisson expectation at sub-interval (weekly or daily) time scales.
# 1. Observed annual mean and corresponding scaled sub-int. means. Year = 364, month = 28, days.
obs.ann.mn = 36 # observed annual mean, from record
(monthly.mn = obs.ann.mn/13) # 13 months/yr!
(weekly.mn = obs.ann.mn/52)
(daily.mn = obs.ann.mn/364)
# 2. Poisson CDF expectations, ignore slight variations in days:
# Equal interval CDF probs determined by no. time intervals in a year, eases interpr.
# Set CDF probs to correspond to the various time ints. to give temporal distribution
# NOTE that qpois, for cdf = 1.0, = Inf., so omit last interval
P.Pmonth = qpois(p=(1:12)/13, lambda = monthly.mn) # 13 mos
P.Pweek = qpois(p=(1:51)/52, lambda = weekly.mn) # 52 weeks
P.Pday = qpois(p=(1:363)/364, lambda = daily.mn) # 364 days
table (P.Pmonth); table(P.Pweek); table(P.Pday)
# 3. Simulations: do repeated samples taken from shorter time periods and summed, match Poisson/gamma expectations at longer periods?
n.trials = 1e5
P.month.week = rep(NA,n.trials)
for (i in 1:n.trials) P.month.week[i] = sum(sample(P.Pweek, 4, replace=T)) # Exactly 4 weeks to our months
q.P.month.week = as.vector(quantile(P.month.week, probs = (1:12)/13)); rm(P.month.week)
P.month.day = rep(NA,n.trials)
for (i in 1:n.trials) P.month.day[i] = sum(sample(P.Pday, 28, replace=T))
q.P.month.day = as.vector(quantile(P.month.day, probs = (1:12)/13)); rm(P.month.day)
P.week.day = rep(NA,n.trials)
for (i in 1:n.trials) P.week.day[i] = sum(sample(P.Pday, 7, replace=T))
q.P.week.day = as.vector(quantile(P.week.day, probs = (1:51)/52)); rm(P.week.day)
mw = data.frame(table (P.Pmonth), table(q.P.month.week))[,-3]; colnames(mw)=c("Precip, monthly", "Poisson Expect.", "Aggr., weekly")
md = data.frame(table (P.Pmonth), table(q.P.month.day))[,-3]; colnames(md)=c("Precip, monthly", "Poisson Expect.", "Aggr., daily")
wd = data.frame(table (P.Pweek), table(q.P.week.day))[,-3]; colnames(wd)=c("Precip, weekly", "Poisson Expect.", "Aggr., daily")
mw; md; wd
Answer: Yes, it does exactly.*
Precip, monthly Poisson Exp. Aggr., weekly
1 3 3
2 3 3
3 3 3
4 2 2
5 1 1
Precip, monthly Poisson Exp. Aggr., daily
1 3 3
2 3 3
3 3 3
4 2 2
5 1 1
Precip, weekly Poisson Exp. Aggr., daily
0 26 26
1 18 18
2 6 6
3 1 1
*However, I also evaluated gamma expectations, mainly as a check and/or curiosity (the gamma interpolates between the Poisson’s integer values). I didn’t always get the exact correspondence as expected, and I’m not really sure why. Close, but not close enough to be due to rounding errors, so that’s kind of interesting, but not enough to pursue further.
Funding for this post was provided in equal part by the French Fish Ectoderm and Statistics Foundation and the American Association for Advancement of Amalgamated and Aggregated Associations. These organizations are solely responsible for any errors herein, and associated management related consequences.