Today marks the tenth birthday of Google Scholar. In anticipation, the Google Scholar team has been sharing more about the idea behind Google Scholar and how they see its future. Since I wrote a blog post titled “What if Google killed Scholar?” a little over a year ago, an update with answers from Anurag Acharya himself is worthy of a new post.
In that post, I gave three reasons why I thought it might make sense for Google to shut down Scholar:
- It’s a niche tool for a relatively small user base.
- Today’s academic output is much more heterogeneous than it used to be.
- Finally, I asked which search scenario Scholar solves that would warrant its continued existence.
Thankfully, Anurag Acharya answered all these points wonderfully.
Being a niche tool is a good thing
One concern that is often expressed is that Google Scholar does not earn Google any money. However, because Scholar serves a small user base, it does not cost Google all that much either. Moreover, Google is highly sympathetic to this particular niche, since most Googlers are ex-academics themselves. In short, there is no pressure on Scholar to make money.
Academic information defined the Google way
Since Google indexes a lot more than just academic output, it has the data to determine what should be part of Google Scholar. Instead of working with a list of rules for what should be considered academic, Google can consider the wider context: if a document cites other academic work, and other academic work cites it, it is probably academic work. Publications are included from the start, since these are contextualised by their publishers, while other output such as preprints and blog posts can be found using the rule above. Moreover, the team has a patent for deciding which version of a paper is the primary version.
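To make the citation rule concrete, here is a minimal sketch in Python. It is my own toy illustration, not Google's implementation: the function name `classify_academic`, the document names and the idea of repeating the rule until nothing changes are all assumptions on my part.

```python
def classify_academic(citations, seeds):
    """Grow a set of academic documents from a seed set using the citation rule.

    citations: dict mapping a document id to the set of document ids it cites.
    seeds: documents assumed to be academic from the start (hypothetical
           stand-in for publisher-supplied publications).
    """
    academic = set(seeds)
    changed = True
    while changed:  # repeat until no new document qualifies
        changed = False
        for doc, cited in citations.items():
            if doc in academic:
                continue
            # Rule part 1: the document cites known academic work.
            cites_academic = any(target in academic for target in cited)
            # Rule part 2: known academic work cites the document.
            cited_by_academic = any(
                doc in citations.get(source, set()) for source in academic
            )
            if cites_academic and cited_by_academic:
                academic.add(doc)
                changed = True
    return academic


# Hypothetical toy data: three publisher papers, a preprint and a news article.
citations = {
    "journal_paper_a": {"journal_paper_b"},
    "journal_paper_b": set(),
    "journal_paper_c": {"preprint"},
    "preprint": {"journal_paper_b"},
    "news_article": {"journal_paper_a"},
}
seeds = {"journal_paper_a", "journal_paper_b", "journal_paper_c"}
result = classify_academic(citations, seeds)
```

In this toy data the preprint is picked up because it both cites and is cited by seed papers, while the news article is left out: it cites a paper, but no academic work cites it.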
Of additional interest is the discussion of datasets. Datasets are not discoverable through Scholar, and this has to do with a lack of means for users to judge utility. While publications have abstracts to judge utility before downloading, datasets are (apparently) lacking in this regard. As far as I know, services like figshare usually offer a field to describe a dataset, but perhaps this is not yet sufficient.
The search scenario of Scholar
Finally, what search scenario is solved by Google Scholar? The answer to this question is, in hindsight, relatively simple and is related to Scholar being a niche tool. The difficult problem at Google is to figure out the search intent: why did this user enter this query? Since Scholar serves a single search problem, i.e. finding academic information, Google does not have to figure out the intent. Additionally, citations can be used to rank search results, as well as to find related search results. These two characteristics not only make it meaningful to have this separate niche tool, but actually make Scholar a relatively simple tool for Google.
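As a rough illustration of how citations could serve both purposes, the sketch below ranks matching documents by how often they are cited and finds related documents by counting shared references. This is a deliberately naive toy under my own assumptions, and how Scholar actually ranks results is not something I know. The function names and the data are hypothetical.

```python
def rank_by_citations(matches, citations):
    """Order matching documents by the number of incoming citations."""
    counts = {doc: 0 for doc in matches}
    for cited in citations.values():
        for target in cited:
            if target in counts:
                counts[target] += 1
    # Most cited first; ties are broken alphabetically for a stable order.
    return sorted(counts, key=lambda doc: (-counts[doc], doc))


def related(doc, citations):
    """Find documents that share references with `doc`, most overlap first."""
    refs = citations.get(doc, set())
    scores = {}
    for other, other_refs in citations.items():
        if other == doc:
            continue
        shared = len(refs & other_refs)
        if shared:
            scores[other] = shared
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical citation graph: document id -> documents it cites.
citations = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"d"},
    "d": set(),
}
ranking = rank_by_citations(["a", "c", "d"], citations)
neighbours = related("a", citations)
```

Here `ranking` puts "d" first because three documents cite it, and `neighbours` lists "b" before "c" because "b" shares both of its references with "a".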
To conclude
If one thing is clear on Google Scholar’s tenth anniversary, it’s that Google is not killing Scholar any time soon. Instead, the Google Scholar team is expanding, building new features like personal libraries, and working on getting papers to find the scholar, instead of the current time-intensive approach in which scholars have to search for papers. There are several tools that aim to “tame the flood of literature”, but none have the data Google Scholar has on literature and users. I will be eagerly awaiting what Google will bring to the table.