MaskedBandit2 said:
ioi said:
MaskedBandit2 said:

I don't understand the post about error percentages, bell curve analysis, and probability. To me, it seems like a weak attempt at an excuse for publishing bad numbers. You say you're not "wrong" if you publish 600k and VGC says 485k. That's a 24% error, and a difference of 115k units! How is that even acceptable? As mentioned, it only gets worse as numbers scale higher. A 20% error on 2M is 400k units. That is definitely not meaningless. Your validation for this is that NPD has a margin of error as well, and does the same thing that VGC does? Naturally there's error, but it's going to be much smaller.

And when I look at actual charts, if you want to say the numbers can't be wrong and you're just using probabilities, why are you even publishing these ridiculously precise numbers? What's the difference between saying one game sold 238,854 and another sold 241,913? Heck, what's the point of even publishing the high figures this site does if you're saying they can fall in such a large range? It doesn't take much thought to know a game like GTA is going to sell in the multi-millions. If you're saying the numbers can't be wrong because they fall within a decent portion of a normal distribution, despite being off by a couple million, what's the point?

It's not an excuse, it's an explanation. Read my last post before this one: we take data from a sample population and scale it up to represent the whole population. Given variances in what the sample does compared to the whole population, there will be a bell-curve probability distribution of the real value around our estimate. The further you go from the estimate, the less likely that value is to be the real one.

Roughly speaking for the USA, we are using data from ~2 million people to represent what the entire population is doing. Now, a sample of 2 million people is enormous, but even so it is less than 1% of the entire population. If for some reason we have bias towards particular regions, ethnic groups, age ranges, household incomes, genders and so on, then our data will be an imperfect sample.
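To see why even a big sample produces a spread of estimates rather than one "true" number, here's a toy simulation (all numbers invented, not real VGC figures): a population where a known number of people bought a game, sampled at under 1%, with each sample scaled up the way a tracker would.

```python
import random

random.seed(0)

# Toy illustration (numbers invented): a "country" of 330,000 people
# where 9,900 of them bought a given game.
population = [1] * 9900 + [0] * 320100
sample_size = 2000                        # well under 1% of the population
scale = len(population) / sample_size     # scale-up factor

# Repeat the sampling many times to see the spread of estimates.
estimates = []
for _ in range(200):
    sample = random.sample(population, sample_size)
    estimates.append(sum(sample) * scale)  # buyers in sample, scaled up

estimates.sort()
print("true buyers: 9900")
print(f"middle 95% of estimates: {estimates[5]:.0f} to {estimates[194]:.0f}")
```

The estimates cluster around the true value in a bell shape; any single published figure is one draw from that curve, which is the point being made above.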

As for publishing data to the nearest unit - that is common practice. 238,854 doesn't mean that we have personally tracked exactly 238,854 sales of something - it means in reality that we may have tracked 1571 sales of something and via various scaling methods and adjustments have arrived at that figure as our best estimate of the sales of that product - which represents the centre of the bell curve.
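The scale-up described above can be sketched like this. The figures 1,571 and 238,854 come from the post itself; the single flat scaling factor and the Poisson-style error assumption are mine, purely for illustration (real methodology would be more involved):

```python
import math

# Figures from the post: 1,571 tracked sales scaled up to an
# estimate of 238,854 for the whole population.
tracked_sales = 1571
scaling_factor = 238854 / 1571   # implied overall scale-up (~152x)

point_estimate = tracked_sales * scaling_factor
print(round(point_estimate))     # -> 238854, the centre of the bell curve

# A rough 95% interval, assuming (illustratively) the tracked count
# behaves like a Poisson sample: its standard deviation is roughly
# sqrt(n), and the scaling factor magnifies that error too.
stderr = math.sqrt(tracked_sales) * scaling_factor
low, high = point_estimate - 1.96 * stderr, point_estimate + 1.96 * stderr
print(f"~95% interval: {low:,.0f} to {high:,.0f}")
```

Note how the same scaling that turns 1,571 into 238,854 also magnifies the sampling noise, which is why the precise-looking figure carries an implicit range around it.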

Then why even publish 238,854? Your original post says that if you publish a number (600k), that doesn't mean it sold 600k; rather, it's an estimate, thought of as a probability. If you have two close numbers like the ones I mentioned, why would you not report both as 240k? They basically have the same probability of being off, especially since the small difference is likely just statistical noise. It comes across as misleading. Why even rank the sales?

Cumulative numbers will be quite a bit off if you round the numbers at every turn.
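A quick sketch of that rounding drift (weekly figures invented for illustration): summing figures that were each pre-rounded to the nearest 10k lands away from the true cumulative total, and the gap grows as more weeks are added.

```python
# Made-up weekly sales figures, purely for illustration.
weekly = [238854, 241913, 187402, 163991, 149227]

true_total = sum(weekly)
# Round each weekly figure to the nearest 10,000 before summing.
rounded_total = sum(round(w, -4) for w in weekly)

print(true_total)                  # -> 981387 (exact cumulative)
print(rounded_total)               # -> 980000 (from pre-rounded figures)
print(rounded_total - true_total)  # -> -1387 drift after just five weeks
```

Publishing to the nearest unit keeps the internal totals consistent; rounding is something readers can always do themselves.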