
How to quantify "married people are less likely than the average person to smoke", "smokers are less likely than the average person to be married"?


When I first saw inequalities (1) and (2) below, I quantified them as:

$\color{red}{\text{1.1. married people < all smokers}}$

$\color{red}{\text{2.1. married smokers < all married people}}$.

I can see that my guesses compare raw quantities, whereas the author below compares fractions. But why exactly does that make my guesses wrong?

      When you’re comparing two binary variables, correlation takes on a particularly simple form. To say that marital status and smoking status are negatively correlated, for example, is simply to say that married people are less likely than the average person to smoke. Or, to put it another way, smokers are less likely than the average person to be married. It’s worth taking a moment to persuade yourself that those two things are indeed the same! The first statement can be written as an inequality

$$\color{limegreen}{\text{married smokers / all married people < all smokers / all people}} \tag{1}$$

and the second as

$$\color{limegreen}{\text{married smokers / all smokers < all married people / all people}} \tag{2}$$

      If you multiply both sides of each inequality by the common denominator (all people) × (all smokers) you can see that the two statements are different ways of saying the same thing:

$$\text{(married smokers)} \times \text{(all people)} < \text{(all smokers)} \times \text{(all married people)} \tag{3}$$

      In the same way, if smoking and marriage were positively correlated, it would mean that married people were more likely than average to smoke and smokers more likely than average to be married.

Ellenberg, How Not to Be Wrong (2014), pages 347-8.


1 answer


Let's consider 2.1 first. You said that married smokers < married people. That is always true (unless every married person smokes, in which case the two would be equal), because anyone who is a "married smoker" is also a "married person". But since it is always true, it doesn't really give any useful information.

However, in the actual inequality (2), the two sides have different denominators: the left side divides by all smokers, and the right side divides by all people. Since not everyone smokes, those are different numbers. Your version 2.1 drops both denominators, which is not a valid step because they are not equal.
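As a short derivation of that point: the abbreviations $MS$, $M$, $S$, $N$ for the numbers of married smokers, married people, smokers, and all people are shorthand introduced here, not notation from the book. Clearing the denominators of (2) properly means multiplying both sides by $S \cdot N$, which is positive, so the direction of the inequality is preserved:

$$\frac{MS}{S} < \frac{M}{N} \iff MS \cdot N < S \cdot M$$

That is inequality (3). By contrast, 2.1 says $MS < M$, which is the same as

$$\frac{MS}{N} < \frac{M}{N}$$

with a single shared denominator. It would follow from (2) only if $S = N$, that is, only if everyone smoked.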

Inequality 1.1 has a different error: it compares the number of married people with the number of smokers, and nothing in the quote tells us how those two compare. It's possible, for instance, that 50% of people are married, 10% of married people smoke, and 30% of unmarried people smoke. Then there are more married people than smokers (50% of the population versus 20%), so 1.1 is false. Yet married smokers / all married people is 10% and all smokers / all people is 20%, so inequality (1) still holds (check this yourself).
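If you'd rather check the numbers with a few lines of code, here is a minimal sketch. The population size of 1,000 is an assumption chosen only to make the percentages into whole numbers; the variable names are made up for illustration.

```python
# Hypothetical population: 50% married, 10% of married people smoke,
# 30% of unmarried people smoke. The size 1000 is an arbitrary assumption.
total = 1000
married = total * 50 // 100                  # 500 married people
unmarried = total - married                  # 500 unmarried people
married_smokers = married * 10 // 100        # 50 married smokers
unmarried_smokers = unmarried * 30 // 100    # 150 unmarried smokers
smokers = married_smokers + unmarried_smokers  # 200 smokers in total

# Guess 1.1 compares raw head counts: married people < all smokers.
guess_1_1 = married < smokers

# Inequalities (1), (2) and (3) from the quote.
inequality_1 = married_smokers / married < smokers / total
inequality_2 = married_smokers / smokers < married / total
inequality_3 = married_smokers * total < smokers * married

print(guess_1_1, inequality_1, inequality_2, inequality_3)
```

With these counts, guess 1.1 comes out false while (1), (2) and (3) all come out true, which is the mismatch described above.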

I wonder if you meant to write married smokers < all smokers, which is the same kind of error as in 2.1 with the two categories switched.


For building intuition about the inequalities, I personally think percentages are easiest. But if you find comparing numbers of people easier, I might suggest using married smokers < all married people × (all smokers / all people). You can think of the right side as the "naive guess" of how many married people smoke, obtained by applying the overall smoking rate to the married group. The inequality, then, says that the actual number of married smokers is less than the number you would expect if the two categories were independent.
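Here is a small sketch of that "naive guess" comparison. The function names and the counts passed in are hypothetical, chosen to match the 50% / 10% / 30% example in a population of 1,000.

```python
def expected_married_smokers(married: int, smokers: int, total: int) -> float:
    """Naive guess: married people smoke at the same rate as everyone."""
    return married * (smokers / total)


def fewer_than_expected(married_smokers: int, married: int,
                        smokers: int, total: int) -> bool:
    """True when the actual count is below the independence-based guess."""
    return married_smokers < expected_married_smokers(married, smokers, total)


# Hypothetical counts: 1000 people, 500 married, 200 smokers, 50 married smokers.
naive = expected_married_smokers(married=500, smokers=200, total=1000)
below = fewer_than_expected(married_smokers=50, married=500,
                            smokers=200, total=1000)
print(naive, below)
```

Here the naive guess is about 100 married smokers, and the actual count of 50 falls below it, so marriage and smoking are negatively correlated in this made-up population.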
