Aielyn said:
I'm guessing you haven't studied statistics. Sampling works exactly as ioi has described, and the resulting numbers are typically scaled up directly (plus adjustments for known biases) rather than rounded roughly. Why? Because every time you round, you lose further information, and when you then sum over a lot of numbers, the error compounds. Sampling error is bad enough as it is; you don't want to make it worse with rounding. Sampling error itself follows a fairly simple rule: to work out the margin of error for a 95% confidence interval over a large population, you can approximate it as 0.98/sqrt(n), where n is the number of samples. So for that BARB panel, a sample of 11,000 gives a margin of error of 0.93%. But they'll still report that Strictly Come Dancing got 11.07 million, despite the 0.07 being within the margin of error. And they'll still list Tuesday's The One Show as having higher ratings than Monday's, despite the difference being 0.01 million, well below the margin of error. The reason is that people like things ranked. The purpose of the site isn't to rank them but to provide the numbers; the ranking is for regular users, and makes debating the numbers a bit more enjoyable. |
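The 0.98/sqrt(n) rule quoted above is easy to check in a few lines (a sketch: 0.98 is 1.96 × 0.5, the worst-case 95% half-width for an estimated proportion):

```python
import math

def margin_of_error(n: int) -> float:
    """Approximate 95% margin of error for a simple random sample of
    size n from a large population (worst case p = 0.5): 0.98/sqrt(n)."""
    return 0.98 / math.sqrt(n)

# A BARB-style panel of 11,000 homes:
print(f"{margin_of_error(11_000):.2%}")  # prints 0.93%
```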
Oh, I have studied statistics somewhat. What I'm saying is that the starting data is so small that you cannot rationally expand it out to the nearest unit (or even the nearest 100, possibly even 1,000) and rank titles; the data just isn't there to support it. Unless I'm wrong, I have to imagine the differences underlying many of the reported figures start out minuscule. There is too much uncertainty to say one title sold X,XXX,XXX units. That's why the numbers are inherently misleading: you don't have the data to support that level of precision. If you want the numbers to read as estimates, as already stated, a better approach would be to publish rounded numbers. The figures themselves would then imply doubt and a range of possible values, which I think would help credibility. Whether or not you keep the precise numbers in use internally for weekly additions, I don't know. But to me, the data collected is too sparse to extrapolate to a global scale.
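As a sketch of the rounding being suggested (the 10,000-unit step is my assumption, not anything from the site), a figure published this way implicitly carries a range of roughly ±half a step:

```python
def round_units(units: int, step: int = 10_000) -> int:
    """Round a sales estimate to the nearest `step` units so the
    published figure implies a range instead of unit-level precision."""
    return round(units / step) * step

print(round_units(1_234_567))  # prints 1230000
```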
ioi said:
What would your suggestion be as to how we do the top 20 for this chart then, for example - http://www.vgchartz.com/weekly/41623/Global/? And yes, we do get data for more than 5,000 unique titles each week. Most of the process is automated, which is why errors can sometimes creep in, with games being incorrectly combined, showing data before they are actually released (mistakenly counting pre-order data as sell-through), and so on. But yes, we do collect raw data each week for more than 5,000 games and extrapolate it all up. |
I honestly don't know what to do with a chart like that. You could possibly still "rank" them, but by using more general numbers, there would be a lot more "ties." What you'd end up with is a chart with very clear estimated numbers that shows a likely, general overview of the market. I really don't know.
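To illustrate the ties that rounding would create (hypothetical titles and figures, not real chart data):

```python
# Hypothetical weekly sales; rounding to the nearest 10,000 before
# ranking makes near-equal titles tie rather than look separable.
raw = {"Title A": 143_210, "Title B": 141_980, "Title C": 96_400}
rounded = {t: round(n / 10_000) * 10_000 for t, n in raw.items()}

for title in sorted(rounded, key=rounded.get, reverse=True):
    print(title, rounded[title])
# Title A and Title B both show 140,000 and effectively tie.
```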
A data-gathering explanation page, though, would, I think, go a long way toward establishing credibility. Right now, the methodology page is far too vague. I'll say right now that reading it doesn't give me a very positive impression of the methods and the numbers you're producing, and I'm probably not the only one. How many retail partners do you have? How many stores are you sampling data from? Are these worldwide? How do you represent regional differences? Do you have a wide range of stores representing different market interests? How many end users do you poll? When do you poll them? What's the demographic of these end users? How are you controlling for bias?
My assumed answer to all of those is negative. Because you don't explain and answer these obvious questions, I automatically assume it's because the answers would damage your reputation. So I assume you don't have many retail partners; you don't collect data from very many stores, especially not a wide, regionally distributed range of them; and there aren't many end users surveyed. The ones that are polled are likely video game enthusiasts, likely frequenting this site. It all seems very disorderly and unsystematic. Whether any of that's true, I don't know, but that's the impression I get. So I think expanding that page would go a long way toward helping the site.
EDIT: This took a while, and I got behind. I'll answer your other post in a bit.