A while ago I wrote that linking collections is the way to go for digital libraries. Indeed, we see that digital libraries are moving in this direction. For example, Europeana’s plan for 2014 is to shift its priorities from portal to platform. Besides the ability to develop tools on top of linked data platforms, this shift introduces the possibility of semantic search, where the search engine has a certain level of understanding of the concepts in the search query. Thus far, search engines such as Google and Bing have worked by indexing the text of webpages without a real understanding of what it means. Now both are working on semantic search, but in very different ways.
The Semantic Web
First, let me explain a bit about what the Semantic Web and semantic search are. The idea of the Semantic Web was introduced by Tim Berners-Lee, James Hendler and Ora Lassila in their landmark paper “The Semantic Web” in Scientific American in 2001. They identified a problem with the Web (emphasis mine):
Most of the Web’s content today is designed for humans to read, not for computer programs to manipulate meaningfully. Computers can adeptly parse Web pages for layout and routine processing–here a header, there a link to another page–but in general, computers have no reliable way to process the semantics.
The solution they propose is that publishers should enhance the content of the Web with machine-readable, structured data. With this structured data, machines can read and comprehend the contents of webpages. Among other possibilities for Artificial Intelligence, this could help search engines understand questions and documents, and make inferences that are not explicitly present on webpages.
Without going into the technical details: at the most basic level, data is structured according to entities and relations. Instead of working with the text “Max Kemman”, the web could use an entity, or unique object, that represents the person Max Kemman. With relations, we can define that the person Max Kemman has a website maxkemman.nl, and a Twitter account @MaxJ_K. With these relations, a search engine can infer that this website and Twitter account belong to the same person and are thus related, even when I myself have not expressed that information. Thus, semantic search allows searching for information that is not always explicitly on the web, allowing the searcher to perform actions in a single step.
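To make the idea of entities and relations a bit more concrete, here is a minimal sketch in Python. It is not how any search engine actually stores its data; the identifier person:MaxKemman and the relation names hasWebsite and hasTwitterAccount are made up for illustration. Each fact is a (subject, relation, object) triple, and the small function infers that two objects are related because they share the same subject entity.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical identifiers and relation names, chosen only for this example.
TRIPLES = [
    ("person:MaxKemman", "hasWebsite", "maxkemman.nl"),
    ("person:MaxKemman", "hasTwitterAccount", "@MaxJ_K"),
]


def related_objects(triples):
    """Return pairs of objects that are attached to the same subject entity."""
    by_subject = defaultdict(list)
    for subject, _relation, obj in triples:
        by_subject[subject].append(obj)

    pairs = set()
    for objs in by_subject.values():
        # Sort so each pair has a stable order regardless of input order.
        for first, second in combinations(sorted(objs), 2):
            pairs.add((first, second))
    return pairs


inferred = related_objects(TRIPLES)
print(inferred)
```

Nowhere in the data is it stated that the website and the Twitter account are related to each other; that link follows only from both being connected to the same entity.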
Google Knowledge Graph
In 2010, Google acquired Metaweb, a company that developed technology to structure data from the web. This technology forms the basis of Google’s Knowledge Graph, “a distillation of what Google knows about the world” (John Giannandrea in Technology Review). Instead of publishers putting structured data on the web for Google to acquire, Google takes messy content from webpages and structures it. John Giannandrea explains (emphasis mine):
We basically take all this raw data and filter it to decide our confidence level and whether to change the Graph. (…) The original semantic Web idea was that people with data would emit it in standard formats and then some search engine like Google would come along and aggregate it and provide all kinds of wonderful services. That powerful idea of teaching computers about the world of knowledge wasn’t happening fast enough, and we wanted to get it started by gathering a critical mass of stuff.
Thus, publishers weren’t fast enough in structuring data, so Google will do without their help. This structured data is published by Google in Freebase. How this works for end users is described nicely by Peter Meyers on The Moz Blog. Sometimes Google can give a direct answer to a query, and sometimes it shows its interpretation of a web page; see for example figure 1 on the right.
What can be seen here is that Google scrapes the answer to the question from a webpage, even though this webpage was not the first search result. As such, Google is able to answer a question, even when the source data is unstructured.
Bing Entity Engine
In 2013, Bing unveiled the Entity Engine. At first sight, Bing works similarly, taking messy content from webpages and structuring it. Interestingly, they also seem to be using Google’s data from Freebase. However, while Google seems to work pretty much independently, Microsoft has a much stronger focus on partnerships, as Derrick Connell explains in TechCrunch:
In the long run, though, Connell envisions an open ecosystem where any site can make actions available using a standard markup language (he mentioned schema.org as an option in our conversation). Then, when a user looks for an entity, Bing could map this to an entity provider and shorten the path users take between searching for something and putting this knowledge into action. Ideally, this could even mean taking the action right on Bing.
In other words, Bing expects publishers to put their data on the web in structured form. This appears to lie closer to the original vision of the Semantic Web, which Google thought was too slow. The difference is that Bing provides a platform and partnerships to try to convince publishers to participate.
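As a rough illustration of what such publisher-provided structured data could look like, the sketch below builds a small description of a person using the schema.org vocabulary mentioned by Connell, serialised as JSON-LD. This is only an assumption about one possible markup, not what Bing or its partners actually use; the function name person_markup and the full URL forms are invented for the example.

```python
import json


def person_markup(name, website, profiles):
    """Build a schema.org Person description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "url": website,
        # sameAs lists other pages that identify the same person.
        "sameAs": list(profiles),
    }


# The URL forms below are assumed for illustration.
markup = person_markup(
    "Max Kemman",
    "http://maxkemman.nl",
    ["https://twitter.com/MaxJ_K"],
)
print(json.dumps(markup, indent=2))
```

A publisher would embed output like this in a webpage, so that a search engine can read the entity and its relations directly instead of having to derive them from the text.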
Both companies have very similar intentions with their semantic search: 1) to understand the information on the web, and 2) to improve usability for end users. The user task to be improved seems different though, as Sean Gallagher of Ars Technica writes:
On the surface, Google’s new engine appears to be more about getting answers to questions, while Microsoft’s new Bing front-end exposes entities in a way that is more suited to taking actions—or to making transactions.
In my own projects, we emphasise a third user task, namely discovery and exploration by connecting entities in different collections. In PoliMedia, this linking allows searching the coverage of parliamentary speeches from debates in newspapers and on the radio. These three collections (parliamentary proceedings, newspapers and radio bulletins) can be explored within one interface, allowing the searcher to focus on one theme. In Talk of Europe, the proceedings of the European Parliament are curated to semantic web standards so that other collections can be linked to them, allowing exploration of a European network of collections.
Will search change?
The question for the future is, will this fundamentally change the way users interact with search engines? Probably yes, as search will be less about finding web pages, and more about gaining direct access to the information the user needs, as we can already see with Google’s Answer Box (see figure 1). However, search engines face the same problem of relevance, as explained in TechCrunch:
Assuming enough third-party sites opt in to this kind of action entity system, the problem for Microsoft becomes one of relevance. “As you get more partners participating, which provider you surface becomes a relevance problem,” Connell noted.
So perhaps this semantic search will be more intelligent, but will still show multiple answers ranked by some kind of relevance to the query. Ultimately, the good old search for webpages will definitely not disappear, as explained in TechCrunch:
As for the good old 10 blue links that have dominated online search since its early days, Connell believes that people will always have a desire to find more and to dig deeper.
Allen, J. Being a Super Villain Just Got Easier with Bing Snapshots. Search Engine Watch, 10-12-2012
Berners-Lee, T., Hendler, J., & Lassila, O. (2001) The Semantic Web. Scientific American, May 2001, p. 29-37.
Gallagher, S. How Google and Microsoft taught search to “understand” the Web. Ars Technica, 27-06-2012
Kleppe, M., Hollink, L., Kemman, M., Juric, D., Beunders, H., Blom, J., Oomen, J., & Houben, G.-J. (2014). PoliMedia – Analysing Media Coverage of Political Debates By Automatically Generated Links to Radio & Newspaper Items. In D’Aquin, M. et al. (Eds.), Proceedings of the LinkedUp Veni Competition on Linked and Open Data for Education. Geneva, Switzerland: CEUR-WS.
Lardinois, F. Microsoft Has Big Plans For Bing’s Entity Engine. TechCrunch, 30-03-2014
Meyers, P. Knowledge Graph 2.0: Now Featuring Your Knowledge. The Moz Blog, 25-03-2014
Muthalaly, S. Portal to platform and other priorities in Europeana Business Plan 2014. Europeana Professional blog, 12-03-2014
Silver, C. The great shift in search. Gigaom, 22-09-2013
Simonite, T. How a Database of the World’s Knowledge Shapes Google’s Future. MIT Technology Review, 27-01-2014
Qian, R. Understand Your World with Bing. Bing Search Blog, 21-03-2013.