Why is the proportion of smokers among married people unlikely to be exactly the same as the proportion of smokers in the whole population?
Please see the bolded phrase below. When I first read this passage, the problem didn't present itself to me at all. After rereading it four times, I still don't understand what this "immediate" problem is!
If you multiply both sides of each inequality by the common denominator (all people) × (all smokers) you can see that the two statements are different ways of saying the same thing:
(married smokers) × (all people) < (all smokers) × (all married people)
In the same way, if smoking and marriage were positively correlated, it would mean that married people were more likely than average to smoke and smokers more likely than average to be married.
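The two inequalities that the passage refers to are not reproduced in the excerpt above. As a sketch, assuming they are the two "less likely than average" statements for the negatively correlated case, and writing $MS$ for married smokers, $M$ for all married people, $S$ for all smokers and $N$ for all people (symbols introduced here, not Ellenberg's), the equivalence would read:

```latex
\[
\frac{MS}{M} < \frac{S}{N}
\;\iff\;
MS \cdot N < S \cdot M
\;\iff\;
\frac{MS}{S} < \frac{M}{N}
\]
```

All four counts are positive, so multiplying through by the denominators does not flip the inequality. Reversing every $<$ to $>$ gives the positively correlated case, and replacing it with $=$ gives the zero-correlation case, $MS \cdot N = S \cdot M$.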
One problem presents itself immediately. Surely the chance is very small that the proportion of smokers among married people is exactly the same as the proportion of smokers in the whole population. So, absent a crazy coincidence, marriage and smoking will be correlated, either positively or negatively. And so will sexual orientation and smoking, U.S. citizenship and smoking, first-initial-in-the-last-half-of-the-alphabet and smoking, and so on. Everything will be correlated with smoking, in one direction or the other. It’s the same issue we encountered in chapter 7; the null hypothesis, strictly speaking, is just about always false.
Ellenberg, How Not to Be Wrong (2014), page 348.
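To make the "crazy coincidence" concrete, here is a minimal Python sketch. It assumes the cross-multiplied form of the comparison, (married smokers) × (all people) versus (all smokers) × (all married people), and uses made-up counts. The function names `correlation_sign` and `zero_correlation_counts` are hypothetical helpers, not anything from the book.

```python
def correlation_sign(married_smokers, married, smokers, people):
    """Return -1, 0, or +1 for negative, zero, or positive correlation.

    Compares (married smokers) x (all people) against
    (all smokers) x (all married people) using exact integer arithmetic.
    """
    diff = married_smokers * people - smokers * married
    return (diff > 0) - (diff < 0)


def zero_correlation_counts(married, smokers, people):
    """List every feasible count of married smokers that gives exactly
    zero correlation, holding the three totals fixed."""
    lowest = max(0, married + smokers - people)   # forced overlap, if any
    highest = min(married, smokers)               # overlap cannot exceed either group
    return [
        k for k in range(lowest, highest + 1)
        if correlation_sign(k, married, smokers, people) == 0
    ]


# Hypothetical population: 1000 people, 500 married, 200 smokers.
print(zero_correlation_counts(500, 200, 1000))
# One more smoker in the population, everything else unchanged.
print(zero_correlation_counts(500, 201, 1000))
```

With the totals fixed, the number of married smokers can take many values, but at most one of them makes the two proportions exactly equal, and sometimes none does (when (all smokers) × (all married people) is not divisible by (all people)). Every other value gives a positive or a negative correlation, which is the point of the quoted paragraph.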