Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Q&A

Comments on Strange behavior in elections and pie charts

Parent

Strange behavior in elections and pie charts

+2
−1

So, a friend asked me the probability for a candidate to get at least 50% of the total votes in an election consisting of 5 candidates (let's pretend everyone picks at random 🙂). I thought for a bit and then presented an answer:

Let's first generalize so that the election has $n$ candidates.

We can think of the votes cast between candidates as a pie chart with $n$ slices and $n$ borders. We draw one fixed border from the center to the top, and then draw $n-1$ more with randomized positions in the chart. Furthermore, let's say the chart shows the votes acquired by candidate 1, candidate 2, etc from the top going clockwise. Illustration Now, the probability of the first candidate getting $\geq50\%$ of all votes (shaded area) is equivalent to the probability of all $n-1$ lines being on the unshaded side of the disk, which is $\left(\frac12\right)^{n-1}$.

Since this probability should be the same for every candidate, the overall probability of someone getting at least half of all votes cast is $n\cdot\left(\frac12\right)^{n-1}=\frac n{2^{n-1}}$.

I found with some coding that for my pie chart analogy of the problem, this is indeed correct (yeay!).

However, for the original problem... I was absolutely wrong! With some coding (again), I found out that the actual probability is very close to, if not $\frac1{(n-1)!}=\frac n{n!}$.

Why's that? I'm certain it's because the two behave differently, but if that's so I don't get why they do.

What causes this behavior? Has this problem been studied before, and is this perhaps a branch of probability theory?

Thanks in advance!

History
Why does this post require moderator attention?
You might want to add some details to your flag.
Why should this post be closed?

0 comment threads

Post
+4
−0

with randomized positions in the chart

Implicit in this description of your model is the notion that every location for a separator line is equally likely. This doesn't describe the actual distribution of vote totals; with random voters, lines are much less likely to appear near other lines than farther away. This is because to have two lines very close to each other, some candidate would have had to have received only a handful of votes; if each voter votes randomly, this is much less likely than a distribution where all candidates have roughly equal votes.

The actual probability that one candidate obtains a majority of votes, for $n > 2$ candidates, depends on the size of your voting population and approaches zero as the population grows, so your other formula isn't correct either. (The number of votes a single candidate receives is a random variable with a binomial distribution, and the standard deviation of a binomial distribution is proportional to the square root of the number of voters, which grows more slowly than the number of votes needed for a majority.)

History
Why does this post require moderator attention?
You might want to add some details to your flag.

2 comment threads

Probability approaches zero (6 comments)
Extremes (1 comment)
Probability approaches zero

If I understand your reasoning correctly: the standard deviation of a binomial distribution is proportional to the square root of the population, which in turn makes it more likely for the votes cast between participants to be roughly equal as the population grows.

I'd say I agree with this... Then what exactly causes the roughly $\frac1{(n-1)!}$ pattern?

For my code, I used Python's random.random() for RNG (with accuracy of 15 decimal points, from what I've counted) and consistently got the probability to be around the aforementioned formula. If you'd like, I can share the full code.

Or, perhaps, we're both missing something here...

r~~‭ wrote 7 months ago

Does your code vary the population size at all? You should definitely see a change in your observed probability for the same $n$ if so, which means that a formula expressed only in terms of $n$ can't be right.

TheCodidacter, or rather ACodidacter‭ wrote 7 months ago · edited 7 months ago

Initially my code only relied on random.random() for population size. I then tweaked it to make the population size vary (as you said) by using random.randint(), and it does converge around $\frac1{(n-1)!}$. The code starts at a small population size then increases it by powers of 10. Interestingly, the probability starts at around 0 before converging!

Sample probability [average population 33]: 0.002813
Sample probability [average population 303]: 0.007548
Sample probability [average population 3003]: 0.00822
Sample probability [average population 29997]: 0.008382
Sample probability [average population 299993]: 0.008452
Sample probability [average population 3000497]: 0.008285
Sample probability [average population 29994492]: 0.008272
Sample probability [average population 300013648]: 0.008474
Sample probability [average population 3000266383]: 0.008215
Sample probability [average population 30007235623]: 0.008407
Sample probability [average population 300005342759]: 0.00847
TheCodidacter, or rather ACodidacter‭ wrote 7 months ago · edited 7 months ago

Continued:

Sample probability [average population 2999153595322]: 0.008285
Sample probability [average population 29998082716178]: 0.008359
Sample probability [average population 300075885268942]: 0.008453
Sample probability [average population 2999350982695126]: 0.008428
Sample probability [average population 29993189269362720]: 0.008313
Sample probability [average population 300018871848862144]: 0.008271
Sample probability [average population 3000692918613922816]: 0.008287
Sample probability [average population 30002176684710653952]: 0.008337
Sample probability [average population 299898495320259624960]: 0.008485
Expected probability: 0.008333333333333333

The above data is for $n=6$, and the population deviates slightly from $3\cdot10^x$ because what the code does is pick a random number with random.randint() for each candidate, and not have a fixed total population size. The number being increased in powers of 10 is the maximum value for random.randint().

r~~‭ wrote 7 months ago · edited 7 months ago

because what the code does is pick a random number with random.randint() for each candidate, and not have a fixed total population size.

So instead of simulating each voter making a random choice, you're choosing the number of voters for each candidate from a uniform distribution? This has a similar flaw to the pie chart model: the number of voters for each candidate has a binomial distribution, not a uniform one. Your simulation massively overrepresents the cases in which candidates have an unusually high or low number of voters.

Just run a simple simulation with 300 voters, each individually randomly choosing one of 6 candidates, and count the number of times you get a majority winner out of 100,000 simulations (Python should be able to do this in less than a minute on any modern CPU). If you get even a single one I will be surprised—0.008*100000=800 is a huge overestimate.

My code runs on an online IDE, which makes it much slower 🙂

I have tested just 3 candidates on 10,000 simulations, and admittedly to my surprise, you were absolutely right!

So that's what I've been confusing this whole time–randomly choosing the number of votes per candidates makes it more likely for an unbalanced result, hence why it behaves differently than each voter choosing randomly on their own.

I have been enlightened. Thank you very much!