Saturday, November 21, 2009

Google Challenge 1: Weekends!

I have been short on time to update this blog. Until I re-prioritize my activities, I am starting a new filler section called Google Challenge. The idea is to draw the most plausible statistical inference I can using nothing but Google. Here is an interesting one.

I hate my weekends. My friends ridicule me when I share this sentiment. I believe there are many more people out there who think weekends are an absolute waste of time :). To prove this, I need to figure out, statistically, how many folks out of 10 think likewise. My gut feeling says it should be at least 3 out of 10. Now let's find out.

There are a couple of ways to do it. One, I could survey a sample of people. The key challenge is that it's hard to get a random sample. I would want a geographically and culturally diverse group: single, married, men, women, kids. And obviously I don't have the financial means to hire Gartner or BCG to help me here, so let's try to find a simple and cheap way to do this.

Attempt 1# is to use two search phrases, "I hate weekends" and "I love weekends", and look at the number of pages Google returns. Note: the double quotes ensure an exact string match.

"I love weekends": 498,000 hits
"I hate weekends": 89,100 hits
Inference: For every 100 people who really enjoy their weekends, there are 18 of us who absolutely hate them.
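
The arithmetic behind that inference can be sketched in a few lines of Python. This is only an illustration: the function name `hate_per_100_love` is made up for this example, and it treats each hit as one person, which is the same simplification the inference itself makes.

```python
def hate_per_100_love(love_hits: int, hate_hits: int) -> int:
    """Return the number of 'hate' hits per 100 'love' hits, rounded.

    Assumption: one search hit stands in for one person's opinion.
    """
    return round(100 * hate_hits / love_hits)


# Hit counts quoted above for the plain web search.
print(hate_per_100_love(498_000, 89_100))  # 18
```

The same function works for any other pair of hit counts.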

This definitely doesn't help my case. The percentage is too small. Conceptually, there are a lot of issues with this approach:
1. Results included song lyrics containing "I love weekends", YouTube videos, reviews, etc., which skewed the counts.
2. If you change the keywords to "I love weekend" (without the trailing 's'), "I love sundays", etc., the hit counts vary widely. So for simplicity's sake, I will stick to the earlier keywords.

Attempt 2# Rather than searching all results, let's focus on blogs. Blogs are people's personal accounts.

"I love weekends": 214,000 hits
"I hate weekends": 74,100 hits
Inference: For every 100 people who really enjoy their weekends, there are 35 of us who absolutely hate them. This number is a lot nicer :)

Some issues with this method are:
1. Not everyone blogs. And those who do have lots of time to kill, and are more prone to hate weekends :) like me
2. The demographic that blogs tends to be the younger generation

Attempt 3# Using Twitter to find a more personalized result
"I love weekends": 27,400 hits
"I hate weekends": 4,970 hits

Inference: For every 100 people who really enjoy their weekends, there are 18 of us who absolutely hate them. Interestingly, this result is quite similar to Attempt 1, which I think is sheer coincidence.
Some really obvious issues with this method are:
1. Not everyone uses Twitter
2. Google doesn't index all tweets
Now, in all three cases, we are forgetting the group of people who are indifferent to the concept of "weekends". Let's assume that's 10% of the population, and assign 20% significance to Attempt 1, 70% to Attempt 2 and 10% to Attempt 3. Below are the results.
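
Here is one way the weighting could be worked out, as a rough Python sketch. It assumes that each attempt's "hate" share is hate / (love + hate), that the three shares are combined using the significance weights, and that the result is scaled to the 90% of people who have an opinion at all. The name `weighted_split` and the labels in the dictionaries are made up for this illustration.

```python
# (love hits, hate hits) from the three attempts in this post.
hits = {
    "web": (498_000, 89_100),
    "blogs": (214_000, 74_100),
    "twitter": (27_400, 4_970),
}
# Significance assigned to each attempt, and the assumed indifferent share.
weights = {"web": 0.20, "blogs": 0.70, "twitter": 0.10}
INDIFFERENT = 0.10


def weighted_split(hits, weights, indifferent):
    """Return (love, hate, indifferent) as fractions of the whole population."""
    # Weighted average of the hate share among people who voiced an opinion.
    hate_share = sum(
        weights[name] * hate / (love + hate)
        for name, (love, hate) in hits.items()
    )
    opinionated = 1.0 - indifferent
    hate_fraction = opinionated * hate_share
    love_fraction = opinionated - hate_fraction
    return love_fraction, hate_fraction, indifferent


love, hate, indiff = weighted_split(hits, weights, INDIFFERENT)
print(f"{love * 10:.1f} love, {hate * 10:.1f} hate, {indiff * 10:.1f} indifferent (out of 10)")
# 7.0 love, 2.0 hate, 1.0 indifferent (out of 10)
```

Changing the weights or the indifferent share shifts the split, so the outcome depends heavily on these assumptions.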

i.e., 7 out of 10 love their weekends, 2 absolutely hate them and 1 is indifferent. Great! That was fun. I still stick to my 3 in 10 gut feeling. But I don't know if I can ever statistically prove it. Maybe some day we can analyse human behavior better.

Do comment if you have a better, easier and faster way to tackle the above challenge. Happy googling!