## Forums - Gaming Discussion - ioi speaks out about ergh "VGC analysts"

I don't understand the post about error percentages, bell curve analysis, and probability. To me, it seems like a weak attempt at an excuse for publishing bad numbers. You say you're not "wrong" if you publish 600k and VGC says 485k. That's a 24% error, and a difference of 115k units! How is that even acceptable? As mentioned, it only gets worse as numbers scale higher. A 20% error on 2M is 400k units. That is definitely not meaningless. Your validation for this is that NPD has a margin of error as well, and does the same thing that VGC does? Naturally there's error, but it's going to be much smaller.

And when I look at actual charts, if you want to say the numbers can't be wrong and you're just using probabilities, why are you even publishing these ridiculously precise numbers? What's the difference between saying one game sold 238,854 and another sold 241,913? Heck, what's the point of even publishing the high figures this site does if you're saying they can fall in such a large range? It doesn't take much thought to know a game like GTA is going to sell in the multi-millions. If you're saying the numbers can't be wrong because they fall within a decent portion of a normal distribution, despite being off by a couple million, what's the point?

Aielyn said: That being said, ioi's comment about "within 5%" is a little concerning, if my memory of statistics is accurate (I'm a pure mathematician, and always had some trouble with statistics) - if you have estimated data with a 5% margin of error, and then repeat the "experiment" 100 times, the margin of error of the final estimate should decrease to far less than 5%. In other words, sales numbers after 2 years should have a significantly smaller than 5% margin of error if each week's sales number has a margin of error of 5%. In a real-world context, I'd expect it to reduce less significantly, maybe to 2-3%, given that magnitudes decrease with time (so only the first few data points significantly affect the sum).

Absolutely, that is a great point and it all depends on the source of the "error". I guess you can think of it as a combination of two things - long-term systematic errors and short-term fluctuations.
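Aielyn's point is easy to sanity-check with a quick simulation (a Python sketch; the 5% margin, 100 weeks, and flat weekly sales are purely illustrative, and the result assumes the weekly errors are independent):

```python
import random

random.seed(1)

true_weekly = [50_000] * 100  # true sales for 100 weeks (illustrative)
margin = 0.05                 # each weekly estimate can be off by up to ~5%

# Simulate many tracking runs and measure the error of the 100-week total.
errors = []
for _ in range(10_000):
    est = [w * random.uniform(1 - margin, 1 + margin) for w in true_weekly]
    errors.append(abs(sum(est) - sum(true_weekly)) / sum(true_weekly))

avg_err = sum(errors) / len(errors)
print(f"average error of the 100-week total: {avg_err:.2%}")  # well under 5%
```

The independent errors partially cancel, so the total comes out far tighter than any single week - which is exactly why the systematic component matters more in the long run.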

If you collected data from 2 million people and used it to estimate the trends for a population of 300 million, then on a simple level you'd just take the data you get and multiply it by 150. So to use an example:

| Week | Sample | Estimate |
|------|--------|----------|
| Week 1 | 2,547 | 382,050 |
| Week 2 | 1,131 | 169,650 |
| Week 3 | 944 | 141,600 |
| Week 4 | 1,018 | 152,700 |

So this could be our raw data for a particular game, and if we just do a basic scale we get the estimated figures. This is obviously where we started 6-7 years ago, but over time you start to learn trends - we tend to over-estimate RPGs by 20-30%, we tend to under-estimate FPS games by 10-20%, or whatever. Our sample isn't perfect and may over-represent RPG gamers and under-represent FPS gamers, so we try to counter this with sweeping genre-based adjustments. Then we find there are recurring patterns for platforms, for times of the year, etc.
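The scale-then-adjust process described above can be sketched in a few lines of Python. The genre correction factors here are hypothetical placeholders, not VGChartz's actual adjustments:

```python
# Scale raw sample data to the whole population, then apply a learned correction.
POPULATION = 300_000_000
SAMPLE = 2_000_000
SCALE = POPULATION // SAMPLE  # 150

# Hypothetical learned corrections: the sample over-represents RPG buyers
# and under-represents FPS buyers, so RPG estimates are adjusted down.
GENRE_ADJUSTMENT = {"RPG": 0.75, "FPS": 1.15, "other": 1.0}

def estimate(sample_sales: int, genre: str = "other") -> int:
    """Basic scale-up of tracked sales, followed by a genre-based adjustment."""
    return round(sample_sales * SCALE * GENRE_ADJUSTMENT.get(genre, 1.0))

print(estimate(2547))          # week 1 from the table: 2,547 x 150
print(estimate(2547, "RPG"))   # same raw data, adjusted down for sample bias
```

The same pattern extends to platform- and season-based adjustments - each is just another multiplier learned from past over- or under-estimates.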

The week-to-week fluctuations can't be improved upon - they are a pure function of our sample size vs the whole population, and without drastically increasing the sample we can't do much about this (it becomes a case of diminishing returns in terms of costs/effort vs improvements in accuracy). Generally, these fluctuations will cancel out over time - we may be too low one week, too high the next, and so on, but we are fluctuating around a centre point that is roughly correct. This is where the long-term systematic errors come in: as our population ages, people move in and out of the sample, their tastes and buying trends change, they move platforms, and so on - then our adjustments become outdated and incorrect. The problem is that you only find this out over a few months and then have to go back and make the changes - both to the old data and to the methods moving forwards.
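The two kinds of error behave very differently over a long tracking period. A quick simulation (Python, with illustrative numbers - a ±10% weekly sampling noise and a hypothetical outdated adjustment that reads 10% high) shows that random noise washes out of the running total while a systematic bias never does:

```python
import random

random.seed(7)

true_weekly = 100_000
weeks = 104  # two years

noisy_total = biased_total = true_total = 0.0
for _ in range(weeks):
    true_total += true_weekly
    # random sampling noise: +/-10% each week, cancels out over time
    noisy_total += true_weekly * random.uniform(0.9, 1.1)
    # systematic bias: an outdated adjustment that always reads 10% high
    biased_total += true_weekly * 1.10

noise_err = abs(noisy_total - true_total) / true_total
bias_err = abs(biased_total - true_total) / true_total
print(f"random-noise error after 2 years:    {noise_err:.2%}")  # well under 10%
print(f"systematic-bias error after 2 years: {bias_err:.2%}")   # still ~10%
```

This is why the genre/platform adjustments have to be revisited and applied retroactively: time shrinks the random component but leaves the systematic one untouched.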

So you can see that the whole thing is a complex and imperfect process, as with any data tracking service that doesn't sample 100% of the market it is reporting on (including the likes of Media Create, NPD, Chart-Track etc). All of the data from all of these sources is an estimate with a possible range of real values that nobody will ever know for definite.

I don't think the 5% margin is concerning. He said they update accordingly, and he is right that 5% is small. Yes, it gets a little trickier as the numbers get larger, but it is still pretty close to the actual number.

They offer a good estimate, and all the complaining is silly... the only problem you could argue is the LTD being too far off after several years on sale, or when official data contradicts this site - both are kinda rare... but bias will make some claim things are over- or under-tracked to fit their agenda.

duduspace11 "Well, since we are estimating costs, Pokemon Red/Blue did cost Nintendo about \$50m to make back in 1996"

http://gamrconnect.vgchartz.com/post.php?id=8808363

Mr Puggsly: "Hehe, I said good profit. You said big profit. Frankly, not losing money is what I meant by good. Don't get hung up on semantics"

http://gamrconnect.vgchartz.com/post.php?id=9008994

Azzanation: "PS5 wouldn't sold out at launch without scalpers."

 MaskedBandit2 said: I don't understand the post about error percentages, bell curve analysis, and probability. To me, it seems like a weak attempt at an excuse for publishing bad numbers. You say you're not "wrong" if you publish 600k and VGC says 485k. That's a 24% error, and a difference of 115k units! How is that even acceptable? As mentioned, it only gets worse as numbers scale higher. A 20% error on 2M is 400k units. That is definitely not meaningless. Your validation for this is that NPD has a margin of error as well, and does the same thing that VGC does? Naturally there's error, but it's going to be much smaller. And when I look at actual charts, if you want to say the numbers can't be wrong and you're just using probabilities, why are you even publishing these ridiculously precise numbers? What's the difference between saying one game sold 238,854 and another sold 241,913? Heck, what's the point of even publishing the high figures this site does if you're saying they can fall in such a large range? It doesn't take much thought to know a game like GTA is going to sell in the multi-millions. If you're saying the numbers can't be wrong because they fall within a decent portion of a normal distribution, despite being off by a couple million, what's the point?

It's not an excuse, it's an explanation. Read my last post before this one - we take data from a sample population and scale it up to represent the whole population. Given variances between what the sample does and what the whole population does, there will be a bell-curve probability distribution of real values around our estimate. The further you go from the estimate, the less likely you are to get that value.

Roughly speaking, for the USA we are using data from ~2m people to represent what the entire population is doing. Now, a sample of 2 million people is enormous, but even so it is less than 1% of the entire population, and if for some reason we have a bias towards particular regions, ethnic groups, age ranges, household incomes, genders and so on, then our data will be an imperfect sample.

As for publishing data to the nearest unit - that is common practice. 238,854 doesn't mean that we have personally tracked exactly 238,854 sales of something - it means in reality that we may have tracked 1571 sales of something and via various scaling methods and adjustments have arrived at that figure as our best estimate of the sales of that product - which represents the centre of the bell curve.
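As a rough back-of-the-envelope on the example figures above (1,571 tracked sales published as a 238,854 estimate), the implied overall scale factor shows why the trailing digits carry little information:

```python
tracked = 1_571
published = 238_854

# The overall scaling factor implied by the example (sample -> population,
# after all adjustments).
scale = published / tracked
print(f"implied scale factor: {scale:.1f}")  # roughly 152

# A +/-1 change in the tracked count moves the published figure by the whole
# scale factor, so the last two digits of 238,854 are effectively noise.
print(f"one tracked sale is worth ~{round(scale)} published units")
```

In other words, the estimate's precision is limited by the sample, not by the number of digits printed - the full figure just marks the centre of the bell curve.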


Hello ioi, I'm your number 1 fan   :D

ioi said:
 MaskedBandit2 said: I don't understand the post about error percentages, bell curve analysis, and probability. To me, it seems like a weak attempt at an excuse for publishing bad numbers. You say you're not "wrong" if you publish 600k and VGC says 485k. That's a 24% error, and a difference of 115k units! How is that even acceptable? As mentioned, it only gets worse as numbers scale higher. A 20% error on 2M is 400k units. That is definitely not meaningless. Your validation for this is that NPD has a margin of error as well, and does the same thing that VGC does? Naturally there's error, but it's going to be much smaller. And when I look at actual charts, if you want to say the numbers can't be wrong and you're just using probabilities, why are you even publishing these ridiculously precise numbers? What's the difference between saying one game sold 238,854 and another sold 241,913? Heck, what's the point of even publishing the high figures this site does if you're saying they can fall in such a large range? It doesn't take much thought to know a game like GTA is going to sell in the multi-millions. If you're saying the numbers can't be wrong because they fall within a decent portion of a normal distribution, despite being off by a couple million, what's the point?

It's not an excuse, it's an explanation. Read my last post before this one - we take data from a sample population and scale it up to represent the whole population. Given variances between what the sample does and what the whole population does, there will be a bell-curve probability distribution of real values around our estimate. The further you go from the estimate, the less likely you are to get that value.

Roughly speaking, for the USA we are using data from ~2m people to represent what the entire population is doing. Now, a sample of 2 million people is enormous, but even so it is less than 1% of the entire population, and if for some reason we have a bias towards particular regions, ethnic groups, age ranges, household incomes, genders and so on, then our data will be an imperfect sample.

As for publishing data to the nearest unit - that is common practice. 238,854 doesn't mean that we have personally tracked exactly 238,854 sales of something - it means in reality that we may have tracked 1571 sales of something and via various scaling methods and adjustments have arrived at that figure as our best estimate of the sales of that product - which represents the centre of the bell curve.

The guy was probably taking a shot at you...

But I would agree that since it's an estimate, it would be easier to read as round numbers (240k)... and maybe include a ±% margin of error next to the number.


 MaskedBandit2 said: I don't understand the post about error percentages, bell curve analysis, and probability. To me, it seems like a weak attempt at an excuse for publishing bad numbers. You say you're not "wrong" if you publish 600k and VGC says 485k. That's a 24% error, and a difference of 115k units! How is that even acceptable? As mentioned, it only gets worse as numbers scale higher. A 20% error on 2M is 400k units. That is definitely not meaningless. Your validation for this is that NPD has a margin of error as well, and does the same thing that VGC does? Naturally there's error, but it's going to be much smaller. And when I look at actual charts, if you want to say the numbers can't be wrong and you're just using probabilities, why are you even publishing these ridiculously precise numbers? What's the difference between saying one game sold 238,854 and another sold 241,913? Heck, what's the point of even publishing the high figures this site does if you're saying they can fall in such a large range? It doesn't take much thought to know a game like GTA is going to sell in the multi-millions. If you're saying the numbers can't be wrong because they fall within a decent portion of a normal distribution, despite being off by a couple million, what's the point?

Bye then.

ioi said:
 MaskedBandit2 said: I don't understand the post about error percentages, bell curve analysis, and probability. To me, it seems like a weak attempt at an excuse for publishing bad numbers. You say you're not "wrong" if you publish 600k and VGC says 485k. That's a 24% error, and a difference of 115k units! How is that even acceptable? As mentioned, it only gets worse as numbers scale higher. A 20% error on 2M is 400k units. That is definitely not meaningless. Your validation for this is that NPD has a margin of error as well, and does the same thing that VGC does? Naturally there's error, but it's going to be much smaller. And when I look at actual charts, if you want to say the numbers can't be wrong and you're just using probabilities, why are you even publishing these ridiculously precise numbers? What's the difference between saying one game sold 238,854 and another sold 241,913? Heck, what's the point of even publishing the high figures this site does if you're saying they can fall in such a large range? It doesn't take much thought to know a game like GTA is going to sell in the multi-millions. If you're saying the numbers can't be wrong because they fall within a decent portion of a normal distribution, despite being off by a couple million, what's the point?

It's not an excuse, it's an explanation. Read my last post before this one - we take data from a sample population and scale it up to represent the whole population. Given variances between what the sample does and what the whole population does, there will be a bell-curve probability distribution of real values around our estimate. The further you go from the estimate, the less likely you are to get that value.

Roughly speaking, for the USA we are using data from ~2m people to represent what the entire population is doing. Now, a sample of 2 million people is enormous, but even so it is less than 1% of the entire population, and if for some reason we have a bias towards particular regions, ethnic groups, age ranges, household incomes, genders and so on, then our data will be an imperfect sample.

As for publishing data to the nearest unit - that is common practice. 238,854 doesn't mean that we have personally tracked exactly 238,854 sales of something - it means in reality that we may have tracked 1571 sales of something and via various scaling methods and adjustments have arrived at that figure as our best estimate of the sales of that product - which represents the centre of the bell curve.

Then why even publish 238,854? Your original post says that if you put up a number (600k), it doesn't mean the game sold 600k - rather, it's an estimate, thought of as a probability. If you have two close numbers like the ones I mentioned, why would you not report both as 240k, since they basically have the same probability of being off, especially since the small differences are likely just statistical noise? It comes across as misleading. Why even rank the sales?
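The two example figures make this concrete: under even a modest margin of error, their implied ranges overlap almost entirely, so the ranking between them isn't statistically meaningful (a sketch, with an assumed symmetric 5% margin):

```python
def interval(estimate: int, margin: float = 0.05) -> tuple[float, float]:
    """Return the (low, high) range implied by a symmetric percentage margin."""
    return estimate * (1 - margin), estimate * (1 + margin)

a = interval(238_854)
b = interval(241_913)

# How much of the two ranges coincide (zero would mean clearly distinguishable).
overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
print(f"game A range: {a[0]:,.0f} - {a[1]:,.0f}")
print(f"game B range: {b[0]:,.0f} - {b[1]:,.0f}")
print(f"overlap: {overlap:,.0f} units")  # the ranges mostly coincide
```

Either game could plausibly be ahead of the other - which is the crux of the complaint about ranking by such close point estimates.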

If only we were arguing about 5%...

I monitor the numbers really, really closely, and the adjustments I see are more like:
Japan: 0%
America: 20-100%
Europe: 50-200%