I then apply the ratio of male to female users for each site and with some basic math determine a guestimate of your gender. The math is really quite simple, I just take:
$$\frac{1}{1 + r_1 \cdot r_2 \cdots r_n}$$
where r_i is the ratio of men to women for the specific site.
Now, I'm not against simple formulae, but the above formula is mathematically absurd for two main reasons:
1. The limit is wrong. The more fractions below 1 you multiply, the smaller the product gets. So if you visit a lot of popular websites (whose men-to-women ratios, due to demographics, are all slightly less than 1), the formula tends toward 1/(1 + 0) = 1.0, i.e., you will be classified as female.
2. Independence is assumed but not true. By multiplying the individual ratios, you are assuming that visits to different sites are independent. But if visiting a website is indicative of gender, then obviously the visits are not independent, so you can't simply multiply like this.
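To make the first objection concrete, here is a small Python sketch of the original formula. The ratio of 0.95 and the site counts are made-up values chosen purely for illustration, not real demographic data. Because 0.95^n shrinks toward 0 as n grows, the output climbs from about 0.51 for a single site toward 1.0, even though nothing about any individual site changed.

```python
from math import prod


def original_guess(ratios):
    """The formula being criticized: 1 / (1 + r_1 * r_2 * ... * r_n).

    Each r_i is the men-to-women ratio for one visited site.
    """
    return 1.0 / (1.0 + prod(ratios))


# Hypothetical demographics: every popular site has slightly more
# women than men, so each ratio is just under 1.
RATIO = 0.95

for n_sites in (1, 10, 50, 200):
    p = original_guess([RATIO] * n_sites)
    print(f"{n_sites:>3} sites -> {p:.4f}")
```

Only the number of sites varies in the loop, yet the result drifts steadily toward "female".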
Enough with the criticism. How would I fix the formula while keeping the math simple? Change the formula to:
$$\frac{1}{1 + \operatorname{Average}(r_1, r_2, \ldots, r_n)}$$
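A minimal Python sketch of the revised formula, assuming "Average" means the arithmetic mean (the post doesn't say which average is intended). With a hypothetical ratio of 0.95 for every site, the result stays at 1/1.95, about 0.51, no matter how many sites are visited, because the mean of identical ratios is just that ratio.

```python
from statistics import mean


def averaged_guess(ratios):
    """Revised formula: 1 / (1 + Average(r_1, r_2, ..., r_n)).

    Assumes "Average" is the arithmetic mean. Needs at least one site:
    statistics.mean raises StatisticsError on an empty list.
    """
    return 1.0 / (1.0 + mean(ratios))


# Hypothetical demographics: every site has a men-to-women ratio of 0.95.
RATIO = 0.95

for n_sites in (1, 10, 50, 200):
    p = averaged_guess([RATIO] * n_sites)
    print(f"{n_sites:>3} sites -> {p:.4f}")

# A male-leaning site (1.5) and a female-leaning site (0.5) cancel out.
print(averaged_guess([1.5, 0.5]))
```

Averaging keeps the output bounded by the most extreme individual site, so visiting more sites no longer pushes the guess toward certainty on its own.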