
What’s Wrong With Google Scholar for “Systematic” Reviews

Systematic!!!

On Monday I read the already-infamous article, published January 9th, which concludes that Google Scholar is, basically, good enough to be used for systematic reviews without searching any other databases.

Conclusion
The coverage of GS for the studies included in the systematic reviews is 100%. If the authors of the 29 systematic reviews had used only GS, no reference would have been missed. With some improvement in the research options, to increase its precision, GS could become the leading bibliographic database in medicine and could be used alone for systematic reviews.

Gehanno JF, Rollin L, Darmoni S. Is the coverage of google scholar enough to be used alone for systematic reviews. BMC Med Inform Decis Mak. 2013 Jan 9;13(1):7. http://www.biomedcentral.com/1472-6947/13/7/abstract

Screen Shot: "Is the coverage of google scholar enough ..."

Leading the argument from the library perspective is Dean Giustini, who has already commented on the problems of:
– precision
– generalizability
– reproducibility

Giustini D. Is Google scholar enough for SR searching? No. http://blogs.ubc.ca/dean/2013/01/is-google-scholar-enough-for-sr-searching-no/

Giustini D. More on using Google Scholar for the systematic review. http://blogs.ubc.ca/dean/2013/01/more-on-using-google-scholar-for-the-systematic-review/

While those issues have already been touched upon, what I want to do right now is bring up what distresses me most about this article, which is the same thing that worries me most about the systematic review literature overall.

Problem One: Google.

Google Search

First and foremost, “systematic review” means that the methods of the review are SYSTEMATIC and unbiased, validated and replicable, from the question, through the search, delivery of the dataset, to the review and analysis of the data, to reporting the findings.

Let’s take just a moment with this statement. Replicable means that if two different research teams do exactly the same thing, they get the same results. Please note that Google is famed for constantly tweaking its algorithms. SEOmoz tracks the history of changes and updates to the Google search algorithm. Back in the old days, Google would update the algorithm once a month, at the “dark of the moon”, and the changes would then propagate through the networks. Now there is no set schedule; updates happen whenever Google chooses, with at least 23 major updates during 2012 and somewhere between 500 and 600 minor ones. That averages out to more than one change a day. It means you can do exactly the same search later in the same day, and get different results.

Google Algorithm Change History: http://www.seomoz.org/google-algorithm-change

That is not the only thing that makes Google search results unable to be replicated. Google personalizes the search experience. That means that when you do a search for a topic, it shows you what it thinks you want to see, based on the sort of links you’ve clicked on in the past, and your browsing history. If you haven’t already seen the Eli Pariser video on filter bubbles and their dangers, now is a good time to take a look at it.


TED: Eli Pariser: Beware Online Filter Bubbles. http://www.ted.com/talks/eli_pariser_beware_online_filter_bubbles.html

If you are using standard Google, it will give you different results than it would give to your kid sitting on the couch across the room. This is usually a good thing. It is NOT a good thing if you are trying to use the search results to create a standardized dataset as part of a scientific study.

People often think this is not a big problem. All you have to do is log out of any Google products. Then it goes back to the generic search, and you get the same things anyone else would get. Right? Actually, no. Even if you switch to a new computer, in a different office or building, and don’t log in at all, Google is really pretty good at guessing who you are based on the topics you search and the links you choose. Whether or not it guesses correctly doesn’t matter for my concerns; the problem is that it is customizing results AT ALL. If there is any customization going on, then that tool is inappropriate for a systematic review.

Now, Google does provide a way to opt-out of the customization. You have to know it is possible, and you have to do something extra to turn it off, but it is possible and isn’t hard.

Has Google Popped the Filter Bubble?: http://www.wired.com/business/2012/01/google-filter-bubble/

Now, the most important question: does opting out actually turn off the filter bubble? Uh, um, well, … No. It doesn’t. Even if you turn off personalization, go to a new location, and use a different computer, Google still knows where that computer is sitting and makes guesses based on where you are. That Wired article about Google getting rid of the filter bubble was dated January 2012. I participated in a study run by DuckDuckGo on September 6th and reported in November on their blog. Each participant ran the same search strategies at the same time, twice, once logged in and once logged out, grabbed screenshots of the first screen of search results, and emailed them to the research team. The searchers were from many different places around the world. Did they get different results? Oh, you betcha.

Magic keywords on Google and the consequences of tailoring results: http://www.gabrielweinberg.com/blog/2012/11/magic-keywords-on-google-and-the-consequences-of-search-tailoring-results.html

Now try to imagine the sort of challenge we face in the world of systematic review searching. Someone has already published a systematic review. You want to do a followup study, using their search strategy. You need to test that you are using it correctly, so you limit the results to the same time period they searched, to see if you get the same numbers. I don’t know about you, but I am busting with laughter trying to imagine a search in Google, saying, “No, I just want the part of the Google results that was available at this particular moment in time, five years, three months, and ten days ago, if I was sitting in Oklahoma City.” Yeah, right.
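To make the replicability requirement concrete, here is a minimal sketch, with entirely made-up record identifiers, of the check a methodologist would want a search engine to pass: run the identical strategy twice and measure how much of the result set survives.

```python
def overlap(run_a, run_b):
    """Jaccard overlap between two result sets from the 'same' search.

    A replicable, systematic search should score 1.0: identical
    strategy plus identical corpus means identical results.
    """
    a, b = set(run_a), set(run_b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical record IDs from two runs of one query on a
# personalized, constantly retuned engine:
monday = ["pmid:101", "pmid:102", "pmid:103", "pmid:104"]
tuesday = ["pmid:101", "pmid:103", "pmid:105", "pmid:106"]

print(overlap(monday, monday))   # 1.0  -- what the methodology requires
print(overlap(monday, tuesday))  # 0.333... -- what a tuned algorithm can deliver
```

Anything below 1.0 on the second comparison means the “database” changed under you, and the search step can no longer be reported as replicable.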

Take home message? Google cannot be used for a systematic review. Period. And not just because you get 16,000 results instead of 3,000 (the precision and recall question), or because Google indexes far more low-quality content than the curated scholarly databases that libraries pay for (which also impacts sensitivity and specificity), but purely on methodological grounds.
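The precision side of that complaint is easy to quantify. A hypothetical sketch, treating the 16,000-versus-3,000 figures above as result counts and assuming 30 truly relevant studies (the numbers are illustrative, not taken from the Gehanno paper):

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a literature search.

    retrieved: set of record IDs the search returned
    relevant:  set of record IDs that actually belong in the review
    """
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical numbers: a broad engine returns 16,000 records, a
# curated database 3,000, and both happen to contain all 30 truly
# relevant studies -- i.e., both achieve the 100% coverage that the
# Gehanno article reports for Google Scholar.
relevant = set(range(30))
broad = set(range(16_000))     # 30 relevant + 15,970 noise
curated = set(range(3_000))    # 30 relevant + 2,970 noise

p_broad, r_broad = precision_recall(broad, relevant)
p_curated, r_curated = precision_recall(curated, relevant)
print(p_broad, r_broad)        # 0.001875 1.0
print(p_curated, r_curated)    # 0.01 1.0
```

Both searches achieve 100% recall, which is all a coverage claim measures; the curated database is still more than five times as precise, and the screening burden differs by 13,000 records.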

Problem Two: Process.

Systematic Reviews and Clinical Practice Guidelines to Improve Clinical Decision Making

First and foremost, “systematic review” means that the methods of the review are SYSTEMATIC and unbiased, validated and replicable, from the question, through the search, delivery of the dataset, to the review and analysis of the data, to reporting the findings.

Doing a systematic review is supposed to be SYSTEMATIC. Not just systematic for the data analysis (a subset of which is the focus of the Gehanno Google Scholar article), but systematic for defining the question, generating the data, collecting the data, managing the data, analyzing the data, establishing consensus on the analysis, and reporting the findings. It is systematic ALL THE WAY THROUGH THE WHOLE PROCESS of doing a real systematic review. The point of the methodology is to make sure the review is unbiased (to the best of our ability, despite being done by humans) and replicable. If both of those are true, someone else could do the same study, following your methodology, and get the same results. We all know that replicating results is one of the real challenges in science. That doesn’t mean it is OK to be sloppy.

The Gehanno article tests only a tiny fraction of the SR process: whether the included studies can be found. But they searched backwards from the way such a search would normally be done, looking up the already-selected studies rather than running the reviews’ search strategies and screening the results. That the final selected studies from specific systematic reviews are discoverable in Google Scholar is fairly predictable, given that Google Scholar scrapes content from publicly accessible databases such as PubMed, and thus duplicates that content.

It is unfortunate that their own methodology is not reported in sufficient detail to allow replicating their study. What they’ve done is a very tiny partial validation study showing that certain types of content are available in Google Scholar. That is important for showing the scope of Google Scholar, but it has absolutely nothing to do with doing a real systematic review, and the findings of their study should have no impact on the systematic review process for future researchers. This sentence, specifically, is the most misstated claim.

“In other words, if the authors of these 29 systematic reviews had used only GS, they would have obtained the very same results.”

All we really know is what happened for the researchers who ran these searches on the days they searched. The same results might have been obtainable, but to say that the authors would have obtained them is far too strong a claim. For the statement above to be true, it would have been necessary first to find a way to lock in Google search results for specific content at specific times; second, to replicate the search strategies from the original systematic reviews in Google Scholar and compare coverage; third, to have vastly more sophisticated advanced searching allowing greater precision, control, and focus; and so forth. Gehanno et al. are well aware of these issues, and mention them in their study.

“GS has been reported to be less precise than PubMed, since it retrieves hundreds or thousands of documents, most of them being irrelevant. Nevertheless, we should not overestimate the precision of PubMed in real life since precision and recall of a search in a database is highly dependent on the skills of the user. Many of them overestimate the quality of their searching performance, and experienced reference librarians typically retrieve about twice as many citations as do less experienced users. … . It just requires some improvement in the advanced search features to improve its precision …”

More importantly, in my mind, is that the Gehanno study conflates the search process and the data analysis in the systematic review methodology. These are two separate steps of the methodological process, with different purposes, functions, and processes. Each is to be systematic for what is happening at that step in the process. They are not interchangeable. The Gehanno study is solid and useful, but placed in an inappropriate context which results in the findings being misinterpreted.

Problem Three: Published

Retraction Watch & Plagiarism
Adam Marcus & Ivan Oransky. The paper is not sacred: Peer review continues long after a paper is published, and that analysis should become part of the scientific record. Nature Dec 22, 2011 480:449-450. http://www.nature.com/nature/journal/v480/n7378/full/480449a.html

The biggest problem with the Gehanno article, for me, is that it was published at all, at least in its current form. There would be much to like in the article if it made no claims about the systematic review methodological process. The research is well done and interesting when viewed in the context of the potential utility of Google Scholar for supporting bedside or chairside clinical decisionmaking. There are significant differences between the approaches and strategies for evidence-based clinical practice and those for doing a systematic review. While the three authors are all highly respected and expert informaticians, the content of the article illustrates beyond a shadow of a doubt that they have a grave and worrisome lack of understanding of the systematic review methodology. It is worse than that. It isn’t just that the authors of the study don’t understand how systematic review methodologies work, but that their peer reviewers ALSO did not understand, and that the journal editor did not understand. That is not simply worrisome, but flat out frightening.

The entire enterprise of evidence-based healthcare depends in large part on the systematic review methodology. Evidence-based healthcare informs clinical decisionmaking, treatment plans and practice, insurance coverage, healthcare policy development, and other matters equally central to the practice of medicine and the welfare of patients. The methodologies for doing a systematic review were developed to try to improve these areas. As with any research project, the quality of the end product depends to a great extent on selecting the appropriate methodology for the study, understanding that methodology, following it accurately, and appropriately documenting and reporting variances from the standard methodology where they might impact the results or findings.

My concern is that this might be just one indicator of a widespread problem with the ways in which systematic review methodologies are understood and applied by researchers. These concerns have been discussed for years among my peers, both in medical librarianship and among devoted evidence-based healthcare researchers, those with a deep and intimate understanding of the processes and methodologies. There are countless examples of published articles that claim to be systematic reviews which … aren’t. I have been part of project teams for systematic reviews where I became aware partway through the process that other members of the team were not following the correct process, and the review was no longer unbiased or systematic. While some of those were published, my name is not on them, and I don’t want my name associated with them. But the flaws in the process were never corrected or reported, which alarms me with respect to those particular projects and makes me look to them as indicators of broader problems with published systematic reviews in general.

I used to team-teach systematic review methodologies with representatives from the Cochrane Collaboration. At that time, I was still pretty new to the process and had a lot to learn, but I did know who the experts really were, and who to go to with questions. One of the people I follow closely is Anne-Marie Glenny, who was a co-author on a major study examining the quality of published systematic reviews. Here is what they found.

“Identified methodological problems were an unclear understanding of underlying assumptions, inappropriate search and selection of relevant trials, use of inappropriate or flawed methods, lack of objective and validated methods to assess or improve trial similarity, and inadequate comparison or inappropriate combination of direct and indirect evidence. Adequate understanding of basic assumptions underlying indirect and mixed treatment comparison is crucial to resolve these methodological problems.”
Song F, Loke YK, Walsh T, Glenny AM, Eastwood AJ, Altman DG. Methodological problems in the use of indirect comparisons for evaluating healthcare interventions: survey of published systematic reviews. BMJ. 2009 Apr 3;338:b1147. doi: 10.1136/bmj.b1147. PMID: 19346285 http://www.bmj.com/content/338/bmj.b1147?view=long&pmid=19346285

We have a problem with systematic reviews as published, and the Gehanno article is merely a warning sign. There are serious concerns with the quality of published systematic reviews in the current research base, and equally large concerns with the ability of the peer-review process to identify quality systematic reviews. This is due, in my opinion, to weaknesses in how systematic review methodologies are taught, and to the level of methodological expertise among the authors, editors, and reviewers of scholarly journals. Those concerns are significant enough to generate doubt about the appropriateness of depending on systematic reviews for developing healthcare policies.