Originally Posted by denton
Originally Posted by Jordan Smith
Originally Posted by denton
Originally Posted by Jordan Smith
Originally Posted by denton
The distribution of group sizes is not normal. It looks like a Normal Distribution that has been pushed to the left, with a long tail to the right. So the 68% rule for plus or minus one standard deviation isn't quite right, but it's also not far off. Also, there is no Central Limit Theorem for any measure of dispersion (range, SD, group size), so collections of data do not tend toward normality.
denton,

Could you elaborate on your reasoning here?

It's just a mathematical truth.

If you are taking interval/ratio data such as FPS, peak pressure, millimeters, etc., then Central Limit kicks in and the Distribution of Means will have a strong tendency toward normality. That is very convenient for users of the T Test and ANOVA because you don't usually have to worry much about the normality of the data, and the Standard Error of the Mean converges pretty quickly.

Switch to any measure of dispersion, and it's a different world. There is no tendency toward normality. The Distribution of Means looks just as awful as the raw data, and separating normal random variation from real change takes a lot bigger sample. If you're terminally curious, I could scan a page or two out of a text and post it for you.

So for interval/ratio data, we use T and ANOVA. For SD we use F, Bartlett, or Levene's Test.
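
To make the "different test for dispersion" point concrete, here is a minimal sketch of Levene's statistic in plain Python. The function name `levene_w` and the two velocity strings are made up for illustration. This is the mean-centered form of the test, and it only computes the statistic W, which would then be compared against an F distribution with k-1 and N-k degrees of freedom.

```python
from statistics import mean

def levene_w(*groups):
    """Levene's W statistic (mean-centered variant) for k groups."""
    # Absolute deviation of each observation from its own group's mean.
    z = []
    for g in groups:
        m = mean(g)
        z.append([abs(y - m) for y in g])

    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z_bar_i = [mean(zi) for zi in z]              # mean deviation per group
    z_bar = sum(sum(zi) for zi in z) / n_total    # grand mean deviation

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical velocity strings in FPS (invented numbers, not real data).
load_a = [2801, 2795, 2810, 2799, 2805]
load_b = [2780, 2825, 2790, 2830, 2775]

w = levene_w(load_a, load_b)
print(f"Levene W = {w:.2f}")
```

A larger W suggests the spreads differ by more than chance would explain. A library routine such as `scipy.stats.levene` would also supply the p-value.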

I was also curious about the underlying distribution. Can you elaborate on that?

In terms of the CLT, I don't follow the reasoning. It seems to me that group size can be considered an independent random variable, in itself, with some underlying distribution. Random sampling of group size, regardless of its underlying distribution, should follow a Gaussian distribution as the number of samples tends to infinity. At least that's how it seems to me, but I'd be interested to understand this better if I'm wrong.
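
One way to probe this question empirically is to simulate group sizes, average them in batches, and watch what happens to the skewness of the batch means. The sketch below is only that: it assumes a circular Gaussian shot pattern with unit sigma, 5-shot groups, extreme spread as the group size, and an arbitrary seed. None of those choices come from the thread's data.

```python
import math
import random
from itertools import combinations

def skewness(xs):
    """Population skewness: third central moment over variance^1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def extreme_spread(rng, shots=5):
    """Largest distance between any two shots in one simulated group."""
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(shots)]
    return max(math.dist(p, q) for p, q in combinations(pts, 2))

rng = random.Random(1)  # arbitrary seed, for repeatability only
sizes = [extreme_spread(rng) for _ in range(50_000)]

# Skewness of the mean of k group sizes, for a few batch sizes k.
batch_skew = {}
for k in (1, 5, 25):
    means = [sum(sizes[i:i + k]) / k for i in range(0, len(sizes), k)]
    batch_skew[k] = skewness(means)
    print(f"mean of {k:2d} groups: skewness = {batch_skew[k]:+.3f}")
```

With k = 1 the numbers are the raw group sizes, so that row shows the skew of the underlying distribution. The other rows show how the distribution of means behaves as more groups are averaged.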

The fundamental issue is that all measures of dispersion are differences between data points. Data points behave as we have come to expect. Differences do not. They do not like to be cornered and made to tell the truth. Skewness and kurtosis are even worse. You need thousands of data points to get a good grip on those parameters.

For an article I was doing, I created a 20,000-shot simulation. With that, I got a very good estimate of the group size distribution. It looks a lot like the distribution of standard deviations: a normal distribution that has been pushed to the left.
Out of curiosity, I just coded up a similar simulation using 100,000 shots divided into 5-shot groups. The individual shot POI was modelled using a Gaussian distribution. The group size, when defined as the maximum distance between two shots in a group, looked as you described with a skewness of ~0.41. Interestingly, when group size is defined as the mean distance between pairs of shots in a group, the distribution is more normal with skewness of ~0.28.
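
For anyone who wants to try this, a simulation along these lines can be sketched in a few lines of plain Python. This is not the code used for the numbers above, just a reconstruction under assumptions: a circular Gaussian with sigma = 1 in each axis, an arbitrary seed, and the hypothetical names `simulate` and `skewness`.

```python
import math
import random
from itertools import combinations

def skewness(xs):
    """Population skewness: third central moment over variance^1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def simulate(n_shots=100_000, per_group=5, sigma=1.0, seed=0):
    """Return (max pairwise distance, mean pairwise distance) per group."""
    rng = random.Random(seed)  # seed is arbitrary, for repeatability only
    max_spread, mean_spread = [], []
    for _ in range(n_shots // per_group):
        # Each shot's point of impact: independent Gaussian in x and y.
        pts = [(rng.gauss(0, sigma), rng.gauss(0, sigma))
               for _ in range(per_group)]
        d = [math.dist(p, q) for p, q in combinations(pts, 2)]
        max_spread.append(max(d))            # group size as extreme spread
        mean_spread.append(sum(d) / len(d))  # group size as mean pair distance
    return max_spread, mean_spread

max_spread, mean_spread = simulate()
print(f"groups simulated:        {len(max_spread)}")
print(f"skewness, max distance:  {skewness(max_spread):.2f}")
print(f"skewness, mean distance: {skewness(mean_spread):.2f}")
```

Changing `per_group` or `sigma` shows how sensitive the shape is to the number of shots per group. The exact skewness values will wobble a little from seed to seed.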