CLARIAH – Building tools for structured, textual and audio-visual data

On Tuesday, March 11th, the CLARIAH project organized a kick-off meeting at the Meertens Institute in Amsterdam to present the five chosen projects.

CLARIAH is a portmanteau of DARIAH and CLARIN, combining the goals of the two projects. Although CLARIAH did not receive the funding it requested, it did receive a million euros of ‘seed capital’ to keep the proposal going and build a showcase of why CLARIAH is important. To achieve this, five projects will build demonstrators that showcase the aims of CLARIAH and the technological possibilities it pursues. The presentations were short and filled with acronyms, and the projects are still in their infancy, but I’ll try to summarize what each project is about.

Structured data

Two projects will address structured data. The first project, HLZ (short for HSN LINKS Zeeland), aims to build a relational database of genealogy in Zeeland, in which birth, marriage and death certificates are combined. This data is readily available, and genealogies interest many people, so this project sounded really nice. For more info, see this page.

The second project, CLIO-DAP, aims to encourage and facilitate Data Availability Policies (DAPs), i.e. policies by journals regarding publication of research data along with articles. So far, the project is in talks with several journals that have a DAP, are thinking about creating one, or used to have one but discarded it. What the end result will be is not entirely clear to me yet.

Textual data

One project addresses textual data, which is already the main focus of CLARIN projects. This third project, Nederlab, was described more from a scholarly point of view than a technical one. In short, this project aims to create tools for the KB newspaper archive (also used in our PoliMedia project) and the DBNL archive that provide visualizations such as topographical maps (e.g., showing where people tweet about carnival in the Netherlands, see here) or an n-gram viewer (e.g., see Google’s n-gram viewer here), allowing for statistical and qualitative analyses.

Audio-visual data

The third type of data, audio-visual data, is addressed by two projects, and unsurprisingly both were presented by NISV. The fourth project is TROVe, which aims to create an overview of how a news event develops in the media by combining multiple newspapers, television reports and Twitter. This sounds similar to the Newsreader project, and incredibly ambitious.

The fifth project is Oral History Today. In this project, several datasets from oral history research (e.g., a dataset of recorded interviews with war veterans) will be combined so that relations between different interviews can be found and analyzed. See this Dutch post for more info.

My interest

I attended this meeting as a project member of Oral History Today, in which Erasmus University will research user requirements and perform user evaluations. As this project aims to build a search tool for academics for a specific type of audio-visual dataset, it is related to my current research into how academic researchers search in political datasets (PoliMedia) as well as how people, and academic researchers specifically, search in audio-visual archives (AXES). I hope that researching how oral history audio-visual archives could or should be searched will provide new insight into the search techniques of researchers, and will lead to a great demonstrator for this type of dataset, for which only a few search tools (which, as far as I know, are all dataset-specific) exist so far.
