FLOPS isn't a good comparison measure because a lot of different factors go into how well a GPU or CPU performs.
You could have two CPUs: one that counts a multiplication as a single FLOP, and another that does multiplication as a series of additions instead. If both CPUs can multiply two numbers in one second, one CPU would do the multiplication as 16 FLOPs, and the other would do it as 1 FLOP. They're doing the exact same work in the same time, but the operations are split up differently, so they get counted differently: on paper, one chip is rated at 16 FLOPS and the other at 1.
One GPU could be faster than another at certain operations and slower at others. Even if two GPUs were pretty similar, they could still differ in what they count as a FLOP.
That's a bit of a silly example, but I think it's a reasonable ELI5 one that isn't horribly misleading.
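
If it helps, here's the two-CPU example as a quick Python sketch. Both "chips" are made up (the names `multiply_native` and `multiply_by_adding` are mine, not anything real hardware exposes), and real CPUs don't literally work this way. It only shows the same answer being counted as a different number of operations.

```python
def multiply_native(x, n):
    """Hypothetical chip A: has a multiply instruction, counted as 1 FLOP."""
    return x * n, 1


def multiply_by_adding(x, n):
    """Hypothetical chip B: no multiplier, so it adds x to a total n times."""
    total = 0.0
    flops = 0
    for _ in range(n):
        total += x   # each addition is counted as one FLOP
        flops += 1
    return total, flops


result_a, flops_a = multiply_native(2.5, 16)
result_b, flops_b = multiply_by_adding(2.5, 16)

print(result_a, flops_a)  # 40.0 1
print(result_b, flops_b)  # 40.0 16
```

Same result both times, but if each chip finishes in one second, the second one gets to claim 16x the FLOPS without being any faster.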