I must confess. I am a huge Idol fan. I have diligently followed the last 6 seasons. The show has consistently managed high TRPs, increasing user votes and some really creative product placements (remember the Coke glass on the judge's table and the Ford ad). The talent pool is sometimes underwhelming but the judges and the musical guests more than compensate.
Recently, another season of Idol ended with Scotty McCreery and Lauren Alaina reaching the finals. We obviously know who won, but the question is: could the result have been predicted by looking at opinions and trends on the interwebs? Let's find out.
Obviously, a disclaimer first. There is no statistical tool involved, no complex number crunching and no opinion polls. The idea is to build a simple intuition for what can be achieved with simpler tools. In our case, that tool is Google Insights for Search.
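According to the World Bank, 78% of the US population has access to the internet, and Google has about a 60% share of search in the US. That makes Google searches a reasonable, if imperfect, window into what Americans are curious about.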
Google Insights for Search (and Trends) lets you discover what people are searching for over a specified period. It doesn't give you absolute numbers, but it does show you the relative popularity of search terms, which is good enough when we are comparing two or more items.
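To get a feel for what "relative popularity" means, here is a minimal sketch in Python. It assumes the index is simply each raw count scaled so that the highest point across all compared terms becomes 100. That is a simplification of whatever Google actually does, and the counts and contestant labels below are invented purely for illustration.

```python
def to_relative_index(series_by_term):
    """Scale raw counts so the largest value across all terms becomes 100."""
    peak = max(max(counts) for counts in series_by_term.values())
    if peak == 0:
        # Nothing was searched at all, so every point stays at 0.
        return {term: [0 for _ in counts] for term, counts in series_by_term.items()}
    return {
        term: [round(100 * count / peak) for count in counts]
        for term, counts in series_by_term.items()
    }


# Hypothetical weekly search counts (made-up numbers, not real data).
raw_counts = {
    "contestant_a": [1200, 1500, 3000, 2400],
    "contestant_b": [900, 1100, 1500, 1800],
}

relative = to_relative_index(raw_counts)
for term, index in relative.items():
    print(term, index)
```

The point is that the absolute numbers disappear, but the ordering and the shape of each curve survive, which is all we need to compare contestants.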
I first looked at the relative popularity of the two finalists and Casey Abrams, my favorite on the show, over the last 90 days. As you would expect, when the first gala rounds start, viewers are still settling on their favorites, so you don't see clear distinctions in relative search volume. But as the show proceeds, the favorites start to separate from the rest.
As expected, Casey soon starts leading the pack, and peaks around one incident where he kisses judge Jennifer Lopez during a stunning vocal performance. You also see him peak when he gets voted out of the contest.
Note that I am searching for their full names, because it gets tricky if you just put "Scotty" or "Lauren": these are common names and could refer to, say, reality star Lauren Goodger and her recent bikini snaps.
Now, zooming in on the last 30 days, you can see that Scotty is consistently a step ahead of Lauren in search volume. The results for "Scotty idol" and "Scotty Mcreery" are quite correlated. The reason for choosing the former keyword is that not everyone looking for Scotty would type in his full name, so the "idol" suffix does a better job of capturing how people actually search.
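I only eyeballed the charts, but if you exported the two series you could put a number on "quite correlated". Here is a rough sketch using the Pearson correlation coefficient. The two lists are hypothetical index values I made up for illustration, not the real data behind the charts, and the variable names are my own.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical relative search index for two keywords (invented values).
full_name_series = [20, 35, 30, 60, 100]
idol_suffix_series = [15, 30, 28, 50, 90]

r = pearson(full_name_series, idol_suffix_series)
print(f"correlation: {r:.3f}")
```

A value close to 1 would mean the two keywords rise and fall together, which is what you want before treating one as a stand-in for the other.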
So, how did we perform? Well, no surprises, Scotty did win American Idol this year. Voila! Our method works. This is brilliant. Now we can predict anything. Right?
Well, not so fast. There are certain issues. For one, search volume may not always translate into votes. Think of a popular camera. Imagine millions are searching for a review of the new Canon 1100D. However, after reading the reviews, they discover the flimsy design, the below-par build quality and the horrible paint job, and reconsider their purchasing decision. What happens? Higher search volume leads to the opposite result.
Similar issues arise when you look at Twitter trends to predict outcomes. You have to run the tweets through a natural language processing (NLP) tool, typically a machine learning classifier, to identify whether the opinions are positive, negative or neutral before you can decipher the end result.
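To make that idea concrete, here is a toy sketch of the positive/negative/neutral step. It is not a real NLP or machine learning tool: it just counts words from two tiny hand-picked lists, which is my own stand-in for what a trained classifier would do. The word lists and sample posts are made up.

```python
import re

# Tiny hand-picked word lists (an assumption for illustration only).
POSITIVE_WORDS = {"love", "great", "amazing", "best", "win"}
NEGATIVE_WORDS = {"hate", "awful", "flimsy", "worst", "boring"}


def classify(text):
    """Label a post by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    score = positives - negatives
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tally(posts):
    """Count how many posts fall into each sentiment bucket."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for post in posts:
        counts[classify(post)] += 1
    return counts


# Invented sample posts, not real tweets.
sample_posts = [
    "I love this singer, best voice on the show",
    "That camera feels flimsy, worst build ever",
    "Watching the finale tonight",
]

summary = tally(sample_posts)
print(summary)
```

Even this crude version shows why raw volume is not enough: three posts count as three mentions, but only one of them is actually favorable. A real classifier would also have to cope with sarcasm, slang and misspellings, which a word list cannot.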
The internet is a great source of data. Its pervasiveness in our lives can help social scientists, economists, marketers and governments understand our needs, wants and preferences better. Imagine a public housing body increasing the supply of houses by correctly forecasting housing needs, and so keeping housing prices in check. The possibilities are endless.
But we need more comprehensive tools. The area looks quite promising, with new social analytics services being launched every day, none of which I have tested yet. Maybe I will revisit the topic one day. But for now, this is the best we've got, and it works pretty well in some cases. What do you think? Leave your comments below. Happy surfing!