3:16 is interesting and shows what I was talking about earlier. If there's this much deviation across identical bench tests, you can imagine how much there is across different ones. At this point, it's the similar results that become suspicious, not the divergences.
That said, if they're being extra careful about typos and double-checking for possible errors, all the better for the viewers.