
Statistical Sampling: Why VGChartz is More Accurate Than You Think

The example is interesting, but it also shows one significant flaw. The sample is based only on people who have an internet connection and happened to visit GameFAQs on one particular day. Why does this matter? The results only show what that particular subset of the overall gamer population favored. So long as it is clear that Tetris is the most popular puzzle game among GameFAQs visitors, you can draw some conclusions. But when you try to extrapolate that to the gamer population in general, you may run into serious errors, because the demographics of the small subset can differ from those of the larger population.

So while the sample data in the example worked reasonably well at predicting the preference of GameFAQs visitors, it would not be a good idea to extrapolate from it and say that Tetris is the favorite puzzle game among all gamers, including those without internet connections, those who have internet but do not frequent gaming sites, those who frequent gaming sites other than GameFAQs, those who frequent non-English sites, those who browse gaming sites only on weekends or on other weekdays, etc.

That is what makes random sampling and larger samples vital. It is quite possible that sales of one console are higher at Best Buy because the "average" Best Buy shopper fits a certain demographic, while sales of a different console tend to be higher at EB/GameStop for the same reason. Even geography can play a significant role: Best Buy in the Northeast may sell more of a console that is actually the slowest seller at Best Buy in the Southwest. Or urban stores may tend to sell more of one console than rural stores in the same franchise.

So while small samples can be effective, they also lend themselves to bias from regional demographics. One sign of that bias is when similar deviations show up across multiple periods.
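To make the regional point concrete, here is a minimal sketch of how extrapolating from stores in a single region can miss the overall picture. Every number in it is made up purely for illustration (it is not VG, NPD, or retailer data), and the region names and the `true_share` / `naive_estimate` helpers are hypothetical.

```python
# Hypothetical numbers, invented for illustration only (not real sales data).
# Each region maps to (total console buyers, buyers who chose console A).
regions = {
    "north_east": (600_000, 360_000),
    "south_west": (400_000, 120_000),
}

def true_share(regions):
    """Share of console A across all regions combined."""
    total_buyers = sum(buyers for buyers, _ in regions.values())
    total_a = sum(chose_a for _, chose_a in regions.values())
    return total_a / total_buyers

def naive_estimate(regions, sampled_region):
    """Share of console A if we only look at one region and assume
    everyone else buys the same way."""
    buyers, chose_a = regions[sampled_region]
    return chose_a / buyers

print("whole population:", true_share(regions))
for name in regions:
    print("sampling only", name, "->", naive_estimate(regions, name))
```

With these made-up figures, sampling only the Northeast overstates console A and sampling only the Southwest understates it, even though each regional number is perfectly accurate for its own region. A random sample drawn across both regions would not carry that built-in tilt.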

For example, VG showed two trends in the console numbers for July, August, and September when compared to NPD data: in all three months, VG's estimates were higher for the PS3 and lower for the 360 than the NPD numbers. In terms of statistics, you would prefer to see each console's estimates land both over and under the NPD numbers over the course of a few periods.
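As a rough back-of-the-envelope check on how much a same-direction streak tells you, the sketch below computes the chance of such a streak for a single console if there were no systematic bias. It assumes each month's estimate is equally likely to land over or under the benchmark and that months are independent; both are simplifying assumptions of mine, and `prob_same_direction` is a hypothetical helper, not anything VG or NPD uses.

```python
def prob_same_direction(months, p_over=0.5):
    """Chance that every month's estimate falls on the same side of the
    benchmark (all over, or all under), assuming independent months and
    a fixed probability p_over of landing over in any given month."""
    return p_over ** months + (1 - p_over) ** months

for months in (3, 6, 12):
    print(months, "months:", prob_same_direction(months))
```

Under those assumptions, a three-month streak in one direction or the other shows up by chance about one time in four, so three months alone is a hint rather than proof. The same streak over six or twelve months would be much harder to put down to chance.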

Given that NPD data is also an estimate, there could be other factors putting a bias in the numbers.

Before everyone goes off: I am not saying that VG does a bad job estimating. I am only pointing out some of the issues with small or non-random samples.



ioi - "I have always endorsed NPD and have always conceded that their figures are obviously far more accurate than ours ..." - Posted on: 06/14/07, 22:22

http://www.vgchartz.com/news/news.php?id=355