Google announced today that it is using individuals’ search term patterns to track and predict the spread of the flu.
Notice that even though flu activity in the country at large has only barely started to climb, Michigan is showing more activity.
I am, on the one hand, excited to see Google applying appropriate data mining techniques to develop and test skills that could be used for disaster management and general health. On the other hand, I think this tool needs some work.
First, Google Flu Trends needs to be tested and validated by public health researchers. It is great that Google is putting it out, and I am very excited about this resource as an indicator or trend showing Google’s commitment to the community at large. I would be more excited if I saw articles comparing and contrasting it with other similar tracking tools, and linking it to other informational tools beyond saying the CDC says you should get a flu shot.
Second, IMHO, the methodology. Of course, this being Google, we don’t really have a clue how they arrived at this. They give us access to their data, but we don’t know what they are tracking or how it relates to the outcomes. The methodology is missing, and I’m not sure how relevant the data is when you don’t know the methodology that produced it. We are lacking the opportunity to validate the data. This is a problem for me. If it were something of more general interest, then fine, trust Google without knowing how they got there. With health information, I would feel safer if I knew more. Frankly, you have the same problem with the corporate and business information that Google Trends makes available. Fascinating, but would you put your money behind it in planning investments?
Which leads to my third thought. What little I’ve been able to tease out about this is that they are tracking the geographic use and incidence of phrases such as “flu diagnosis”. I hope that they are using a rich collection of words related to the flu. Perhaps something like this:
(diagnosis OR symptoms OR “what’s wrong” OR “do I have”) (flu OR influenza OR vomit OR vomiting OR cough OR coughing OR chills OR aches OR aching OR headache):
Or if you want to get more technical, maybe something like this:
(diagnosis OR symptoms) (flu OR influenza OR ~vomit OR ~cough OR influenza virus OR influenza viridae OR H3N2 OR H1N1 OR H5N1 OR H9N2 OR “upper respiratory tract infection” OR URTI OR “severe acute respiratory syndrome” OR SARS OR pandemic OR Orthomyxoviridae OR “respiratory syncytial virus” OR RSV OR “West Nile virus”):
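To make the idea concrete, here is a rough Python sketch of how a two-part query like the first one above could be applied to a log of searches: a search counts as flu-related only if it contains at least one "asking" term and at least one illness term, and the hits are then tallied by region. To be clear, this is only my guess at the general shape of such a filter, not Google's method, which has not been published. The function names and the (region, search text) record format are made up for illustration.

```python
import re
from collections import Counter

# Term lists copied from the first example query above.
# They are illustrative only, not what Google actually tracks.
INTENT_TERMS = ["diagnosis", "symptoms", "what's wrong", "do i have"]
ILLNESS_TERMS = ["flu", "influenza", "vomit", "vomiting", "cough", "coughing",
                 "chills", "aches", "aching", "headache"]


def _any_term(terms):
    """Compile a case-insensitive pattern matching any term as a whole word."""
    alternation = "|".join(re.escape(term) for term in terms)
    # \b keeps "flu" from matching inside words such as "fluent".
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


INTENT_RE = _any_term(INTENT_TERMS)
ILLNESS_RE = _any_term(ILLNESS_TERMS)


def is_flu_query(text):
    """True if the search has an intent term AND an illness term."""
    text = text.replace("\u2019", "'")  # treat curly apostrophes as straight
    return bool(INTENT_RE.search(text)) and bool(ILLNESS_RE.search(text))


def count_by_region(records):
    """Tally flu-related searches per region.

    `records` is an iterable of (region, search_text) pairs, a
    hypothetical format chosen for this sketch.
    """
    counts = Counter()
    for region, text in records:
        if is_flu_query(text):
            counts[region] += 1
    return counts
```

Calling `count_by_region` on a list of (region, search text) pairs gives a per-region tally that could then be charted over time. The richer second query would need synonym expansion for the `~` operator, which this sketch does not attempt.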
Now, what would make this all much more powerful would be to bring together a collection of data sources that contain things people say about their health. Google searches are one. I would not be surprised if Google included phrases from people’s email if they have Gmail accounts. If you also included microblogging tools such as Twitter, Plurk, Jaiku, Pownce, etc., social networks such as Facebook and Myspace, and other social media, then we’d have such a rich collection of sources that I would hope the predictive validity would be very high. Here is a screenshot from someone else who is thinking about this – Morbus on Twitter.
Twitter: Morbus:
Morbus (Flu Tracking)
Now, I just wish Morbus would share their findings. 🙂

