Statistical Sampling: Why VGChartz is More Accurate Than You Think

Sullla on 07 November 2007

With all of the new faces that are likely to show up in light of the recent NPD news, I thought I'd post a thread giving an example of how statistical sampling works. (I also want to have a post to link back to whenever someone insists that the numbers on this site are "made up.") We'll keep this pretty simple, leaving out things like confidence intervals and normal distributions, while focusing on one example.

The premise of statistics is that we can look at a small portion of some large group and use it to make very accurate predictions about the group as a whole. While there will always be discrepancies with the "real" number in the larger group, we can get an extremely good idea of the overall picture so long as we have a true random sample. Let's look at an example.

Probably everyone here has heard of gamefaqs.com, which conducts a poll each and every day. I snapped a picture of one of their polls right at the start of one day:

This question on favorite puzzle series is a good one to use, because there shouldn't be any fanboy bias to deal with. (This helps us get a random sample.) Notice that there are only 139 responses so far. Think of these numbers as the estimates produced by VGChartz. We have a small random sample of a much, much larger total - basically the same relationship that VGChartz has with retailers and sales.

So how accurate of a picture did this particular small sample end up producing? Here's the same poll at the end of the day:

We now have over 67,000 votes on the same topic. Think of this as the "real" sales data VGChartz is trying to track. The initial sample of 139 votes tracked only about 0.2% of the total - that's roughly 1 out of 500! But surprise! The overall picture turns out to be extremely accurate. Our tiny sample correctly indicated that Tetris is by far the most popular game, with all the others trailing behind.
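A quick way to check this kind of claim is to simulate it. The sketch below assumes an invented population of 67,000 votes in which the leader holds 64% (the Tetris figure from the final poll); every other share and the "Option" names are made up for illustration and are not the actual gamefaqs numbers. It draws 139 votes at random many times and counts how often the sample still puts the true leader first.

```python
import random
from collections import Counter

# Hypothetical vote shares: only the 64% for the leader comes from the post;
# the remaining shares and option names are invented for illustration.
SHARES = {
    "Tetris": 0.64, "Option B": 0.10, "Option C": 0.09, "Option D": 0.08,
    "Option E": 0.04, "Option F": 0.03, "Option G": 0.02,
}

def build_population(shares, total):
    """Expand the shares into one list entry per vote."""
    population = []
    for name, share in shares.items():
        population.extend([name] * round(share * total))
    return population

def leader_hit_rate(population, sample_size, trials, seed=0):
    """Fraction of random samples whose top option matches the true top option."""
    rng = random.Random(seed)
    true_leader = Counter(population).most_common(1)[0][0]
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        if Counter(sample).most_common(1)[0][0] == true_leader:
            hits += 1
    return hits / trials

population = build_population(SHARES, 67_000)
rate = leader_hit_rate(population, sample_size=139, trials=1000)
print(f"Leader identified correctly in {rate:.1%} of samples")
```

With a lead this large, a sample of 139 should almost never get the winner wrong; the close races further down the list are where small samples tend to slip.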

Now if you look closely, you'll see that there are some errors in the sample. Tetris is overestimated (72% to 64%), Bust a Move and Lemmings are undertracked, and Pokemon Puzzle League is noticeably too high. In fact, our sample incorrectly had Pokemon Puzzle League ranked higher than Bust a Move and Lemmings. This is exactly the sort of detail that doubters cherry-pick to "disprove" VGChartz. But to do so is to miss the forest for the trees; individual elements in a statistical sample can definitely be off, especially when the numbers are close together. Clearly, however, the numbers from our sample were not "made up"; even with our tiny sample, it nailed three games almost exactly (Columns, Mr. Driller, Puyo Pop) and was reasonably close on two more (Tetris, Pokemon Puzzle League). Most importantly, the overall shape of the group is extremely clear from our sample. The three-tier structure (Tetris alone, followed by close numbers for Bust a Move/Lemmings/Pokemon Puzzle League, and then a trailing group of the other three) immediately jumps out from both graphs.
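For anyone who wants a number on "can definitely be off": the textbook normal-approximation formula for the margin of error of a sample proportion gives a rough idea. This is only a sketch; it assumes a simple random sample, and the 10% and 3% shares below are illustrative stand-ins for the mid-tier and trailing games, not figures from the poll.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

n = 139  # size of the early sample in the post
for share in (0.64, 0.10, 0.03):  # 0.64 is from the post; the others are illustrative
    moe = margin_of_error(share, n)
    print(f"true share {share:.0%}: sample usually within +/- {moe:.1%}")
```

For a 64% share and 139 votes the formula gives a margin of roughly 8 percentage points, which is about the size of the Tetris miss (72% against 64%). Options whose true shares sit only a point or two apart can easily swap places at this sample size.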

So while the sampling methods VGChartz uses will often be wrong on the micro level (due to margin of error/confidence interval reasons), they will very rarely be wrong on the macro level. For those who would continue to doubt, try looking at how other samples are put together. You'll find that it's very possible indeed to look at a couple hundred responses and draw inferences about tens of thousands, or even millions, of pieces of data.

End of 2008 totals: Wii 42m, 360 24m, PS3 18.5m (made Jan. 4, 2008)

HappySqurriel on 07 November 2007

VGChartz may have an easier task than that ...

If you have a statistic that you know is the 'real' value (or at least an additional statistic you can judge your values against), you can start to figure out where the biases in your sample come from and attempt to compensate for them.
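One simple way to picture that kind of compensation is a ratio adjustment: compare past estimates with a trusted benchmark for the same periods, then scale new estimates by the ratio. This is a hypothetical sketch only; nothing in the thread says VGChartz works this way, and every number below is invented.

```python
def correction_factor(past_estimates, benchmark_values):
    """Ratio of the benchmark total to the estimated total over the same periods."""
    return sum(benchmark_values) / sum(past_estimates)

# Hypothetical figures for illustration only (units sold, in thousands).
past_estimates = [90.0, 110.0, 100.0]   # what the sample-based method reported
benchmark = [100.0, 120.0, 110.0]       # the trusted 'real' values for those periods

factor = correction_factor(past_estimates, benchmark)
new_raw_estimate = 95.0
adjusted = new_raw_estimate * factor
print(f"factor = {factor:.3f}, adjusted estimate = {adjusted:.1f}")
```

A single overall ratio only corrects a bias that is roughly constant over time; a bias that varies by title, region, or season would need a finer breakdown.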

sinha on 07 November 2007

One problem with this: VGChartz gets its numbers from the same few places every time, so its sample is not randomly distributed (and different each time) like the first 139 people to vote in a gamefaqs poll. The first 139 people to vote in that poll are probably fairly representative of the 67 thousand who voted, as they are random.

For example, if gamefaqs.com polled 139 specific people every time they had a new poll, from the same 50 different towns throughout the US, that would not be as good for predicting how the entire 67 thousand would have voted. Of course this is true of not only vgchartz but also NPD.
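The difference between a random sample and a panel drawn from a restricted set of places can be sketched with a small simulation. Everything here is an assumption made for illustration: 500 towns of 100 voters each, each town with its own share of "yes" voters between 30% and 90%, and a panel restricted to just 5 towns (far fewer than the 50 in the example above, purely to make the effect easy to see).

```python
import random
import statistics

def make_towns(num_towns, town_size, rng):
    """Give each town its own share of 'yes' voters, so towns differ from one another."""
    towns = []
    for _ in range(num_towns):
        share = rng.uniform(0.3, 0.9)
        yes = round(share * town_size)
        towns.append([1] * yes + [0] * (town_size - yes))
    return towns

def random_sample_estimate(everyone, n, rng):
    """Estimate the 'yes' share from n voters drawn from the whole population."""
    return sum(rng.sample(everyone, n)) / n

def panel_estimate(towns, panel_towns, n, rng):
    """Estimate the 'yes' share from n voters drawn only from the panel's towns."""
    pool = [vote for t in panel_towns for vote in towns[t]]
    return sum(rng.sample(pool, n)) / n

rng = random.Random(1)
towns = make_towns(num_towns=500, town_size=100, rng=rng)
everyone = [vote for town in towns for vote in town]
truth = sum(everyone) / len(everyone)

random_errors, panel_errors = [], []
for _ in range(500):
    random_errors.append(abs(random_sample_estimate(everyone, 139, rng) - truth))
    panel_towns = rng.sample(range(len(towns)), 5)  # a different restricted panel each trial
    panel_errors.append(abs(panel_estimate(towns, panel_towns, 139, rng) - truth))

print(f"true share: {truth:.1%}")
print(f"typical error, random sample: {statistics.mean(random_errors):.1%}")
print(f"typical error, 5-town panel:  {statistics.mean(panel_errors):.1%}")
```

Under these assumptions the restricted panel misses by noticeably more on average, even though both approaches look at the same number of voters. How big the gap is in practice depends on how much the towns (or stores) really differ from one another.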

The way to get the numbers more accurate is to increase the sample size (I believe NPD was 60% or so), and hopefully vgchartz is always trying to do this so the numbers get more and more accurate over time.

We don't provide the 'easy to program for' console that they [developers] want, because 'easy to program for' means that anybody will be able to take advantage of pretty much what the hardware can do, so the question is what do you do for the rest of the nine and half years? It's a learning process. - SCEI president Kaz Hirai

It's a virus where you buy it and you play it with your friends and they're like, "Oh my God that's so cool, I'm gonna go buy it." So you stop playing it after two months, but they buy it and they stop playing it after two months but they've showed it to someone else who then go out and buy it and so on. Everyone I know bought one and nobody turns it on. - Epic Games president Mike Capps

We have a real culture of thrift. The goal that I had in bringing a lot of the packaged goods folks into Activision about 10 years ago was to take all the fun out of making video games. - Activision CEO Bobby Kotick

kitler53 on 07 November 2007

statistics 101

Kasz216 on 07 November 2007

Well, I'm pretty sure Ioi likely uses a mathematical formula based on past history and trends to estimate about what percentage of the market each store "speaks" for on average to get the numbers as well.

Which is more like... your 3rd or 4th class in Statistics... or maybe earlier depending on what kind of major your stats class is in and which statistical program you use.

Which do you use anyway Ioi? You don't painfully do it by hand I'm assuming. I prefer SAS, but I mean I've only used SAS, SPSS and like... Excel.
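The idea of each store "speaking" for a share of the market can be sketched as a weighted extrapolation. To be clear, this is a guess at the general shape of such a formula, not the method the site actually uses; the `StoreReport` type, the retailer names, and all the figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StoreReport:
    name: str
    units_sold: int   # units reported by the sampled stores
    coverage: float   # assumed fraction of that retail segment the sample covers

def extrapolate(reports):
    """Scale each report up by its assumed coverage and add the segments together."""
    return sum(r.units_sold / r.coverage for r in reports)

# Hypothetical retailers and figures, for illustration only.
reports = [
    StoreReport("Chain A", units_sold=1_200, coverage=0.05),
    StoreReport("Chain B", units_sold=800, coverage=0.02),
    StoreReport("Online C", units_sold=300, coverage=0.10),
]
print(f"estimated market total: {extrapolate(reports):,.0f} units")
```

The estimate is only as good as the coverage fractions, which is presumably where the "past history and trends" would come in.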

nordlead on 07 November 2007

Sullla, very informative, yet easy to read post.

@sinha, while what you say is mostly true, each store will not sell to the same people every time. It may do so some of the time, but not always. I'm not your typical EBGames kind of shopper, as I typically go to Walmart/Target, but occasionally I do shop at EB, Best Buy, Circuit City, wherever if it's more convenient.

If you drop a PS3 right on top of a Wii, it would definitely defeat it. Not so sure about the Xbox360. - mancandy
In the past we played games. In the future we watch games. - Forest-Spirit
11/03/09 Deposit: Mod Bribery (RolStoppable) vg$ 500.00
06/03/09 Purchase: Moderator Privilege vg$ -50,000.00

Nordlead Jr. Photo/Video Gallery!!! (Video Added 4/19/10)

Game_boy on 07 November 2007

The random sample in the first picture may not be as random as you think - time zones will exert a significant bias as different parts of the world become the most active on the internet. Allowing for this, your sample would be extremely accurate.

Ubuntu. Linux for human beings.

If you are interested in trying Ubuntu or Linux in general, PM me and I will answer your questions and help you install it if you wish.

DMeisterJ on 07 November 2007

Nice post that puts the whole "Where do VGChartz numbers come from/VGChartz numbers are wrong" to rest. ThX!

kn on 07 November 2007

I posted something similar yesterday and I feel that a sticky with the "where do VG Chartz numbers come from" that contains this information would be essential... I would add a little more detail on confidence intervals, as it is prudent to point out that smaller sample sizes generally give a larger interval (margin of error) around the mean than larger samples do. It bears noting that to be 100% certain of the exact figure -- i.e. our "number" has a +- of ZERO -- we would need a 1:1 sampling rate... Not possible in the real world and not cost effective, either... That helps illustrate why we use sampling/statistics in the first place... It also helps illustrate that NPD, Media Create, Famitsu, and others are also subject to error, as they don't sample 100%, either...
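The relationship between sample size and margin of error can be sketched with the standard formula for a proportion, including the finite population correction that makes the margin fall to zero once everyone is counted. This is a minimal illustration under assumptions: a simple random sample drawn without replacement, a worst-case share of 50%, and a population of 67,000 borrowed from the poll example in the first post.

```python
import math

def margin_of_error(p, n, population, z=1.96):
    """95% margin for a proportion when sampling without replacement from a finite population."""
    if n >= population:
        return 0.0  # a full count leaves no sampling error
    fpc = math.sqrt((population - n) / (population - 1))  # finite population correction
    return z * math.sqrt(p * (1 - p) / n) * fpc

N = 67_000  # population size, taken from the poll example
for n in (139, 1_000, 10_000, 40_000, N):
    print(f"n = {n:>6}: +/- {margin_of_error(0.5, n, N):.2%}")
```

The margin shrinks quickly at first and then more slowly, so most of the accuracy comes from the first few thousand observations; going from a large sample to a full count buys comparatively little.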

I hate trolls.

Systems I currently own: 360, PS3, Wii, DS Lite (2)
Systems I've owned: PS2, PS1, Dreamcast, Saturn, 3DO, Genesis, Gamecube, N64, SNES, NES, GBA, GB, C64, Amiga, Atari 2600 and 5200, Sega Game Gear, Vectrex, Intellivision, Pong. Yes, Pong.

famousringo on 07 November 2007

Excellent post. Though you'd think more people would have this basic understanding...

"The worst part about these reviews is they are [subjective]--and their scores often depend on how drunk you got the media at a Street Fighter event." — Mona Hamilton, Capcom Senior VP of Marketing
*Image indefinitely borrowed from BrainBoxLtd without his consent.
