Aielyn said:
That being said, ioi's comment about "within 5%" is a little concerning, if my memory of statistics is accurate (I'm a pure mathematician and have always had some trouble with statistics). If you have estimated data with a 5% margin of error and then repeat the "experiment" 100 times, the margin of error of the combined estimate should decrease to far less than 5%. In other words, the cumulative sales number after 2 years should have a significantly smaller margin of error than 5% if each week's sales number has a 5% margin of error. In a real-world context I'd expect the reduction to be less dramatic, maybe to 2-3%, given that magnitudes decrease with time (so only the first few data points significantly affect the sum).
|
Absolutely, that is a great point, and it all depends on the source of the "error". You can think of it as a combination of two things: long-term systematic errors and short-term fluctuations.
If you collected data from 2 million people and used it to estimate the trends for a population of 300 million, then on a simple level you'd just take the data you get and multiply it by 150. To use an example:
Week | Sample | Estimate
Week 1 | 2,547 | 382,050
Week 2 | 1,131 | 169,650
Week 3 | 944 | 141,600
Week 4 | 1,018 | 152,700
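The basic scale-up above can be sketched in a few lines of Python (the panel size, population, and factor of 150 are taken straight from the example; everything else is just arithmetic):

```python
# Scale weekly sample counts up to population-level estimates.
# A panel of 2 million buyers standing in for a population of
# 300 million gives a scale factor of 150, as in the table above.
PANEL_SIZE = 2_000_000
POPULATION = 300_000_000
SCALE = POPULATION // PANEL_SIZE  # 150

weekly_samples = {"Week 1": 2547, "Week 2": 1131,
                  "Week 3": 944, "Week 4": 1018}

estimates = {week: n * SCALE for week, n in weekly_samples.items()}
for week, est in estimates.items():
    print(f"{week}: {est:,}")  # e.g. Week 1: 382,050
```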
So this could be our raw data for a particular game, and if we just apply the basic scale factor we get the estimated figures. This is obviously where we started 6-7 years ago, but over time you start to learn trends - we tend to over-estimate RPGs by 20-30%, we tend to under-estimate FPS games by 10-20%, or whatever. Our sample isn't perfect and may over-represent RPG gamers and under-represent FPS gamers, so we try to counter this with sweeping genre-based adjustments. Then we find there are recurring patterns for platforms, for times of the year, etc.
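A minimal sketch of what such a genre-based adjustment layer could look like - the factor values, genre names, and function name here are hypothetical illustrations, not the tracker's real corrections:

```python
# Hypothetical correction factors learned from past comparisons:
# the raw scale-up is multiplied by a genre-specific factor.
GENRE_ADJUSTMENT = {
    "RPG": 0.80,  # raw scaling over-estimates RPGs, so scale down ~20%
    "FPS": 1.15,  # raw scaling under-estimates FPS, so scale up ~15%
}

def adjusted_estimate(sample_count, scale=150, genre=None):
    """Raw scale-up followed by a learned genre correction."""
    raw = sample_count * scale
    return round(raw * GENRE_ADJUSTMENT.get(genre, 1.0))
```

Under these made-up factors, an RPG showing 2,547 sample sales would be reported as 2,547 x 150 x 0.80 = 305,640 rather than the raw 382,050.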
The week-to-week fluctuations can't be improved upon; they are a pure function of our sample size vs the whole population, and without drastically increasing the sample we can't do much about them (it becomes a case of diminishing returns in terms of cost / effort vs improvements in accuracy). Generally, these fluctuations will cancel out over time - we may be too low one week, too high the next, and so on - but you are fluctuating around a centre point that is roughly correct. This is where the long-term systematic errors come in: as our population ages, people move in and out of the sample, their tastes and buying trends change, they move platforms, and so on - then our adjustments become outdated and incorrect. The problem is that you only find this out over a few months and then have to go back and make the changes - both to the old data and to the methods moving forwards.
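The distinction between the two error sources can be illustrated with a quick Monte Carlo sketch - the 10% bias and 5% noise level are invented for illustration. Week-to-week sampling noise on a yearly total shrinks roughly like 5% / sqrt(52), which is also the mechanism behind Aielyn's shrinking-margin point, but a persistent systematic bias (say, a stale genre adjustment) does not average out no matter how many weeks you sum:

```python
import random
import statistics

random.seed(42)

BIAS = 1.10            # hypothetical stale adjustment: +10% every week
NOISE_SD = 0.05        # hypothetical 5% week-to-week sampling noise
true_weekly = [10_000] * 52   # a year of equal true weekly sales

ratios = []            # estimated yearly total / true yearly total
for _ in range(2_000):
    est_total = sum(w * BIAS * random.gauss(1.0, NOISE_SD)
                    for w in true_weekly)
    ratios.append(est_total / sum(true_weekly))

# Noise on the yearly total shrinks to about 0.05 / sqrt(52) ~ 0.7%,
# but the mean stays near 1.10: summing more weeks never removes the bias.
print(statistics.mean(ratios), statistics.pstdev(ratios))
```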
So you can see that the whole thing is a complex and imperfect process, as with any data-tracking service that doesn't sample 100% of the market it is reporting on (including the likes of Media Create, NPD, Chart-Track, etc.). All of the data from all of these sources is an estimate with a range of possible real values that nobody will ever know for definite.