
# Why mustn't the proportion of smokers among married people be the same as the proportion of smokers in the whole population?


Please see the bolded phrase below. On my first reading I didn't see this problem at all; it certainly didn't present itself immediately to me. After rereading the passage four times, I still don't understand what this immediate problem is!

> If you multiply both sides of each inequality by the common denominator (all people) × (all smokers) you can see that the two statements are different ways of saying the same thing:
>
> $$\text{(married smokers)} \times \text{(all people)} < \text{(all smokers)} \times \text{(all married people)} \tag{3}$$
>
> In the same way, if smoking and marriage were positively correlated, it would mean that married people were more likely than average to smoke and smokers more likely than average to be married.
>
> One problem presents itself immediately. **Surely the chance is very small that the proportion of smokers among married people is exactly the same as the proportion of smokers in the whole population.** So, absent a crazy coincidence, marriage and smoking will be correlated, either positively or negatively. And so will sexual orientation and smoking, U.S. citizenship and smoking, first-initial-in-the-last-half-of-the-alphabet and smoking, and so on. Everything will be correlated with smoking, in one direction or the other. It’s the same issue we encountered in chapter 7; the null hypothesis, strictly speaking, is just about always false.

Ellenberg, *How Not to Be Wrong* (2014), page 348.
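
Writing the quoted inequalities out with symbols may make the "immediate problem" easier to see. The letters are labels chosen for this sketch, not Ellenberg's notation: $N$ for all people, $M$ for all married people, $S$ for all smokers and $B$ for married smokers (all assumed positive).

$$\frac{B}{M} < \frac{S}{N} \iff BN < SM \iff \frac{B}{S} < \frac{M}{N}$$

Multiplying or dividing by the positive counts $M$, $N$ and $S$ never flips an inequality, so each step is reversible, and the same chain holds with $>$ or $=$ in place of $<$. Exactly one of $BN < SM$, $BN = SM$ and $BN > SM$ is true, so in this sense marriage and smoking are uncorrelated only when $B = SM/N$ exactly; every other count of married smokers makes the correlation either negative or positive.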

---

The key word is *exactly*. If I flip a fair coin, I expect about half of my flips to be heads. If I flip it twice, getting exactly one head is quite likely (the probability is 50%). If I flip it twenty times, getting exactly ten heads is not that unusual (about 18%), but it is still a little lucky. If I flip that coin one million times and get exactly 500,000 heads, well, that's quite unlikely indeed (the probability is about 0.08%, as it happens). Getting exactly the expected number of outcomes from a random binary event becomes less and less likely the larger the population gets.
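
The probabilities quoted above are $\binom{n}{n/2} / 2^n$ for $n$ fair flips. A minimal Python sketch to check them; the function name `prob_exactly_half` is made up for this example, and it works with log-gamma so that $n = 1{,}000{,}000$ doesn't require building a roughly 300,000-digit binomial coefficient.

```python
from math import exp, lgamma, log

def prob_exactly_half(n: int) -> float:
    """Probability that n fair coin flips give exactly n/2 heads.

    Computes C(n, n/2) / 2**n in log space: lgamma(m + 1) == log(m!).
    """
    if n % 2:
        return 0.0  # an odd number of flips can never be exactly half heads
    k = n // 2
    log_p = lgamma(n + 1) - 2 * lgamma(k + 1) - n * log(2)
    return exp(log_p)

for n in (2, 20, 1_000_000):
    print(f"{n:>9} flips: P(exactly {n // 2} heads) = {prob_exactly_half(n):.4%}")
```

It prints roughly 50%, 17.6% and 0.08% for the three cases; the probability of hitting the midpoint exactly falls off roughly like $1/\sqrt{n}$.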

If you think of whether each person smokes as a random binary event, like a coin flip, then the expected fraction of smokers among married people may well equal the (actual) fraction of smokers in the general population. But the chance that the number of married smokers lands on exactly the value that makes those two fractions equal is very small when the populations involved number in the millions.
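
To put a number on this, one can fix the total number of smokers and assume they are a uniformly random subset of the population (the coin-flip model conditioned on that total). The number of married smokers then follows a hypergeometric distribution, and only one count, $\text{married} \times \text{smokers} / \text{people}$, makes the two fractions equal. The sketch below is an illustration under that assumption; the function names and the population sizes are hypothetical, not real data.

```python
from math import exp, lgamma

def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def prob_exactly_proportional(people: int, married: int, smokers: int) -> float:
    """P(married smokers / married == smokers / people), assuming the smokers
    are a uniformly random subset of the population (hypergeometric model)."""
    # The fractions can only match if married * smokers / people is a whole number.
    if (married * smokers) % people:
        return 0.0
    x = married * smokers // people  # the single count that makes the fractions equal
    log_p = (log_comb(smokers, x)
             + log_comb(people - smokers, married - x)
             - log_comb(people, married))
    return exp(log_p)

# Hypothetical population sizes, chosen only for illustration:
p = prob_exactly_proportional(people=1_000_000, married=500_000, smokers=200_000)
print(f"P(exact match) = {p:.3%}")
```

With these made-up numbers the chance of an exact match comes out at about 0.2%, and it is zero whenever the required count is not a whole number. Scaling all three sizes up by the same factor makes it smaller still, roughly in proportion to one over the square root of the population size.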
