In a previous blogpost, I introduced the project A Republic of Emails, for which we created a dataset of the 30k Hillary Clinton emails by scraping WikiLeaks. Now that we have the data, we can start exploring it with what I like to call the W-questions: What is the collection about? Where do the described events take place? When did these events occur? Who are the actors involved? In this second blogpost, we will look at what the emails in the Hillary Clinton corpus are about. I will describe how we prepared the data to analyse a) the raw text, b) the normalised text, and c) the entities in the text (named entity recognition). Finally, we will look at a small subset of the emails using Voyant Tools. For each step I will point to the corresponding script on our GitHub so that you can reproduce the project.
This year I will teach the Doing Digital History course for the second time, as part of the History master at the University of Luxembourg. Just like last year, students will ask several W-questions: What is the collection about? Where do the described events take place? When did these events occur? Who are the actors involved? Unlike last year, when we used a different collection each week, this year students will experiment with a single collection throughout the course. In a series of blogposts I will describe the collection that the students will explore and the methods and tools that will be used for close and distant reading. If you have feedback that could improve our ideas, please comment. If you wish to reproduce the project for your own courses, the blogposts should allow you to do just that. As a reference to the historical Republic of Letters, I like to call this project A Republic of Emails.