Outcome probabilities

Continuing from the previous post., where I discussed earth’s recent surface temperature increase hiatus/slowdown/backoff/vacation

Well not really–I discussed instead the very closely related topic of enumerating the outcomes of a given probabilistic process. And actually not so much a discussion as a monologue. But couldn’t somebody please address that other issue, it’s just been badly neglected… 🙂

Anyway, enumerating the possible allocations of n objects into q groups is rarely an end in itself; the probability, p of each is usually what we want. This is a multinomial probability (MP) problem when q > 2, and binomial (BP) when q = 2, in which we know apriori the per-trial p values and want to determine probabilities of the various possible outcomes over some number, n, of such trials. In the given example, the per-trial probabilities of group membership are all equal (1/6) and we want to know the probability of each possible result from n = 15 trials.

One has to be careful in defining exactly what “trials” and “sample sizes” constitute in these models though, because the number of trials can be nested. We could for example, conduct n2 = 100 higher level trials, in each of which the results from n1 = 2 lower level trials are combined. This is best exemplified by Hardy-Weinberg analysis in population genetics; a lower level trial consists of randomly choosing n1 = 2 alleles from the population and combining them into a genotype. This is repeated n2 times and the expected genotype frequencies, under certain assumption of population behavior, are then compared to the observed, to test whether the assumptions are likely met or not. If only two possible outcomes of a single trial (alleles in this case) exist in the population, the model is binomial, and if more than two, multinomial.

There are two types of MP/BP models, corresponding to whether or not group identity matters. When it does, the BP/MP coefficients determine the expected probabilities of each specific outcome. For n objects, q groups and group sizes a through f, these equal the number of permutations, as given by n! / (a!b!c!d!e!f!), where “!” is the factorial operator and 0! = 1 by definition. This formula is extremely useful; without it we’d have to enumerate all permutations of a given BP/MP process. And this would choke us quickly, as such values become astronomical in a hurry: with n = 15 and q = 6, we already have 6^15 = 470 billion possible permutations.

When group identity doesn’t matter, only the numerical distribution matters, and this will decrease the total number of coefficients but increase the value of each of them. For example, in locating the n = 50 closest trees to a random sampling point, one may want to know only the expected numerical distribution across the q angle sectors around the point. In that case, the allocation [2,1,1,0] into groups [a,b,c,d] would be identical to [0,1,1,2] and to 10 others, and these thus have to be aggregated. The number of aggregations is given by the number of permutations of the observed group sizes, which in turn depends on their variation. When all differ, e.g. [0,1,2,3,4,5] for n = 15, the number of equivalent outcomes is maximized, equaling q + (q-1) + (q-2)…+ 1 (in this case, 21) [Edit: oops, confused that with what follows; the number of permutations there is given by q!]. When some but not all of the group sizes are identical it’s more complex, as determined by products of factorials and permutations. When the number of identical group sizes is maximized, the equivalent outcomes are minimized, always at either q-1 or 1. In this case there are 6 variations of [n,0,0,0,0,0].

To get the desired probability density function, the raw MP/BP coefficients, obtained by either method, are standardized to sum to 1.0.

Next time I’m going to discuss a general solution to the problem of estimating the otherwise unknown relationships between pairs of objects involved in a rate process, such as density, the number of objects per unit area or time. These can be deduced analytically using multinomial and gamma probability models in conjunction.

It will make you call your neighbors over for a party and group discussion. If it doesn’t you get a full refund.


Have at it

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s