Comments on The effect of measurement accuracy and rounding on hypothesis testing
Parent
The effect of measurement accuracy and rounding on hypothesis testing
I am measuring the temperature at home with an accuracy of one tenth of a degree (Celsius). The readily available public data gives the temperature in whole degrees, with no decimals.
I am doing very basic hypothesis testing: my null hypothesis is that the temperatures I measure do not have a systematic bias upwards or downwards from the public data, while my alternative hypothesis is that the temperatures I measure are systematically higher or systematically lower. I am doing a simple t-test and checking whether the average of the differences is far from zero.
Does it make a difference whether I round my own measurements to integers or (essentially equivalently) round the differences to integers? In particular, is there a bias in a particular direction, towards rejecting the null hypothesis or the opposite, if my data is more precise than what I am comparing it to?
Notes
This project breaks assumptions of hypothesis testing; at least independence of measurements and possibly the normal distribution of the differences.
The point of the question is not these, but rather the possible effect of rounding or different precision in the ground truth data and the measurements on the hypothesis test.
Post
Your measurements are more precise, which is actually a good thing. Rounding them to match the less-precise public data might feel like you’re making things consistent, but what you’re really doing is tossing out useful info. And when you’re running a hypothesis test, that extra decimal can matter, especially if the differences you’re trying to detect are subtle.
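To put a rough number on that: assume the rounding error behaves like a uniform random variable on $[-0.5, 0.5]$ °C, independent of the data. That is an approximation added here for illustration, and it only holds reasonably well when the differences spread over roughly a degree or more. With $\bar d$ the mean difference, $\sigma$ the standard deviation of the unrounded differences, and $n$ the number of measurements (notation introduced here, not from the question):

$$
\operatorname{Var}(d_{\text{rounded}}) \approx \sigma^2 + \frac{1}{12},
\qquad
t_{\text{rounded}} \approx \frac{\bar d}{\sqrt{\left(\sigma^2 + \tfrac{1}{12}\right)/n}}
$$

Under that assumption, rounding adds noise with a standard deviation of about 0.29 °C, which inflates the denominator of the t statistic. The likely effect is a loss of power rather than a push in one particular direction.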
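Here’s the deal: how much rounding hurts depends on how the spread of your differences compares to the 1°C rounding step. If the differences are pretty tight (say, they vary by only a few tenths of a degree), rounding can collapse most of them onto the same integer and distort both the mean and the variance your t-test relies on. But if there’s a lot of natural fluctuation (differences bouncing around by several degrees), the rounding error is small next to that noise and the test result barely changes. Still, why risk it?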
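If you want to see this for yourself, here is a minimal simulation sketch, not a definitive analysis. It draws made-up differences from a normal distribution, records them to 0.1°C, and computes the one-sample t statistic before and after rounding to whole degrees. The function names, the bias of 0.2°C, and the two spread values are all hypothetical choices for illustration, and the simulated differences are independent, which your real data probably is not.

```python
import math
import random
import statistics


def t_statistic(diffs):
    """One-sample t statistic for H0: the mean difference is zero."""
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        # All differences identical: the t statistic is undefined, so
        # report 0 for an all-zero sample and +/-inf otherwise.
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))


def compare_rounding(bias, spread, n=200, seed=1):
    """Return (t_raw, t_rounded) for simulated differences.

    bias and spread (both in degrees Celsius) are hypothetical values
    chosen for illustration, not estimates from real data.
    """
    rng = random.Random(seed)
    # Differences recorded to 0.1 degrees, like the home measurements.
    raw = [round(rng.gauss(bias, spread), 1) for _ in range(n)]
    # The same differences after rounding to whole degrees.
    rounded = [round(d) for d in raw]
    return t_statistic(raw), t_statistic(rounded)
```

Calling `compare_rounding(bias=0.2, spread=0.1)` and `compare_rounding(bias=0.2, spread=2.0)` lets you compare the tight case with the noisy one. In the tight case almost every rounded difference becomes 0, so the rounded t statistic should land near zero even though the unrounded one is large.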
Best move? Keep your 0.1°C data as-is for the actual stats and analysis. You can always round later when you’re showing results to someone who doesn’t need the fine-grained details.
If you’re really worried about fairness, you can even treat each public value as a range: if it says 22°C, maybe it actually means somewhere between 21.5 and 22.5°C (assuming it was rounded to the nearest degree). That way, you can still work with your full-precision data and just adjust how you interpret the comparison.
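One way that range idea could look in practice is sketched below. It assumes the public values were rounded to the nearest degree, so each one stands for an interval of plus or minus 0.5°C. The helper names and the example readings are made up for illustration.

```python
import statistics

# Half of the 1-degree rounding step assumed for the public data.
HALF_STEP = 0.5


def difference_bounds(mine, public):
    """Bounds on (my reading - true public temperature) for each pair."""
    return [(m - p - HALF_STEP, m - p + HALF_STEP) for m, p in zip(mine, public)]


def within_public_interval(mine, public):
    """True where my reading lies inside the interval implied by the public value."""
    return [abs(m - p) <= HALF_STEP for m, p in zip(mine, public)]


def mean_difference_bounds(mine, public):
    """Worst-case bounds on the mean difference over all pairs."""
    bounds = difference_bounds(mine, public)
    lows = [lo for lo, _ in bounds]
    highs = [hi for _, hi in bounds]
    return statistics.fmean(lows), statistics.fmean(highs)


# Made-up example readings: home values to 0.1 degrees, public values in whole degrees.
my_readings = [22.3, 21.8, 23.1]
public_readings = [22, 22, 23]
consistent = within_public_interval(my_readings, public_readings)
mean_low, mean_high = mean_difference_bounds(my_readings, public_readings)
```

The mean bounds here are worst-case: they assume every public value could be off by the full half degree in the same direction. If the rounding errors are closer to random, they tend to average out and the real uncertainty in the mean is much smaller.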