University of Luxembourg
November 15, 2015
While waiting: please login to Moodle and Google Drive
Download the files luxembourg and 1000mails in both formats
Doing Digital History: Introduction to Tools and Technology
Why would we want to analyse history by the numbers?
What if you want to analyse:
Cannot focus on all the stories, need for something else
[T]he study of History through the history of things that can be quantitatively measured – wealth, goods, and services that were taxed and recorded, and population.
Guldi & Armitage, p97
Big data enhance our ability to grapple with historical information. They may help us to decide the hierarchy of causality – which events mark watershed moments in their history, and which are merely part of a larger pattern.
Guldi & Armitage, p89
But in our discussion of Big Data, we focused on correlation rather than causality
It could be interesting to see:
Time on the Cross tried this approach: is the qualitative judgement of slavery also a quantitative one?
Questions: did slaves really live in such awful circumstances? And was slavery economically inefficient?
Their answer: not all slaves had it bad, and Southern states were 35% more efficient than Northern states
[T]he authors argued that each slave was only whipped something like 7.2 times per year and so slavery wasn’t as brutal as its conventional image. As if one severe whipping in an entire lifetime wouldn’t be bad enough.
Time on the Cross also received quantitative criticisms: statistical mistakes or wrong assumptions
Leezenberg & De Vries (2001, Wetenschapsfilosofie voor Geesteswetenschappen) ask:
Does quantitative history 'undress the historical argument' (Nawrotzki & Dougherty)?
For your next assignment you will download quantitative data to analyze
We will browse the data by country, let's look up Luxembourg
There are three data formats:
We will be using the CSV file luxembourg.csv
But also download the Excel/OpenOffice file just in case
CSV (Comma Separated Values) is an open standard
In HTML we learned how to represent data in a table
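As a minimal sketch of how such a table looks as CSV, the Python snippet below parses a few lines of comma-separated text into rows and columns. The sample text and its values are invented for illustration and are not taken from the course files.

```python
import csv
import io

# Made-up sample: one header line, then one line per table row.
SAMPLE = """Year,Population
2000,100
2001,110
"""

def parse_csv(text):
    """Return the CSV text as a list of rows, each row a list of cell strings."""
    return list(csv.reader(io.StringIO(text)))

rows = parse_csv(SAMPLE)
header, data = rows[0], rows[1:]
print(header)  # the column names
print(data)    # the remaining rows, still as text
```

Every cell comes back as text, which is why a spreadsheet sometimes has to be told that a column holds numbers.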
Because CSV is a standard:
Go to http://drive.google.com, log in, and click the big red "NEW" button and select "File upload"
Find the file on your hard drive, select, and click "Open" to upload it to Google Drive
If your Google Drive is set to English you can select the CSV; otherwise the Excel file will work better
Find the file in your Google Drive, right-click, select "Open with" and select Google Sheets
Google Sheets should now open a nicely ordered sheet as shown here. To clean it up, select the first 4 rows, right-click, and select delete
Select the first 2 columns, right-click, and select delete
Select the 2nd column with Indicator Codes, right-click, and select delete
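The clean-up steps above can also be sketched in Python. This assumes the layout described on the slides: four rows to discard, then a header row in which the first two columns and the Indicator Code column are not needed. The sample rows and every number in them are placeholders, not real World Bank values, and the function name clean is hypothetical.

```python
import csv
import io

# Placeholder sample in the assumed layout; the numbers are NOT real data.
SAMPLE = '''"metadata line 1"
"metadata line 2"
"metadata line 3"
"metadata line 4"
"Country Name","Country Code","Indicator Name","Indicator Code","1960","1961"
"Luxembourg","LUX","Indicator A","A.CODE","1","2"
"Luxembourg","LUX","Indicator B","B.CODE","3","4"
'''

def clean(text):
    """Drop the first 4 rows, the first 2 columns, and the Indicator Code column."""
    rows = list(csv.reader(io.StringIO(text)))
    rows = rows[4:]  # delete the first 4 rows
    # Columns 0 and 1 are the country name and code, column 3 the indicator code.
    return [[cell for i, cell in enumerate(row) if i not in (0, 1, 3)]
            for row in rows]

cleaned = clean(SAMPLE)
for row in cleaned:
    print(row)
```

What remains is one row of years followed by one row per indicator, the same shape the sheet has after the three delete steps.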
Select the 1st row with the years and copy using ctrl+c (Windows) or cmd+c (Apple)
Click the + in the lower-left corner (encircled) to create a new sheet, and paste the row here using ctrl+v (Windows) or cmd+v (Apple)
To search in the first sheet, select it at the bottom, and use ctrl+f (Windows) or cmd+f (Apple) to search for gdp (current US$). Select the row you find and copy it
After copying the row with gdp (current US$), go to your new sheet and paste
Select the two rows by dragging your mouse from the 1 in row 1 to the 2 in row 2
Select "Insert" in the menu bar and select "Chart..."
Google Sheets will suggest several charts. Choose the second line-chart and select "Insert"
If your chart looks completely different even though you used the same property, and you started from the CSV, go back and upload the Excel file instead
Go back to your first sheet and search for "electric power" to find the row electric power consumption (kWh per capita). Select this row and copy
Paste the row in the new sheet under the rows you have. The chart should be updated automatically
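A minimal Python sketch of the same search-and-copy step, assuming a cleaned table whose first column holds the indicator name and whose first row holds the years. The helper name find_indicator is hypothetical and the numbers are placeholders, not real values.

```python
def find_indicator(table, needle):
    """Return the first row whose indicator name contains needle (case-insensitive)."""
    needle = needle.lower()
    for row in table[1:]:  # skip the header row with the years
        if needle in row[0].lower():
            return row
    return None

# Placeholder table; the numbers are NOT real values.
table = [
    ["Indicator Name", "1960", "1961"],
    ["GDP (current US$)", "10", "20"],
    ["Electric power consumption (kWh per capita)", "1", "2"],
]

years = table[0][1:]
gdp = find_indicator(table, "gdp (current us$)")
power = find_indicator(table, "electric power")

# Pair each year with both series, like the three rows in the new sheet.
for year, g, p in zip(years, gdp[1:], power[1:]):
    print(year, g, p)
```

Like ctrl+f, the search ignores upper and lower case and matches on part of the name.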
The 2 properties have very different values. Hover the mouse over the second line until you see "Edit series", and select the _l symbol (encircled) to create a second y-axis
You should now see 2 lines that you can compare. Change the x-axis title by clicking it and enter "Year"
Press enter to apply
To edit the chart further, select the chart and click the triangle in the upper-right corner and select "Advanced edit..."
In this window you can further customize the chart
One interesting visual change is to select "Smooth". Click "Update" once you're done
To share the Google Sheet, click the big blue "Share" button in the top-right corner and click "Get shareable link"
The sharing window will now show a URL you can copy-paste into your report.
When you click the dropdown "Anyone with the link can view" you are offered other options
To share only the chart, you can click the triangle in the upper-right corner and select "Save image"
Download the 1000mails.csv file from Moodle and upload it to Google Sheets as before
Drag the gray line above row 1 to below row 1 (see red circle)
This way you can easily sort columns alphabetically or otherwise without losing the headers
6/30/2010 11:53:00 → M(M)/DD/YYYY HH:MM:SS
(In the ODS file the date may be written out slightly differently, but the same principle applies)
Rather than as a date, we can treat this as a string of 18/19 characters
To work from the other end, use =RIGHT(field,length) (depending on your language settings, the separator is , or ;)
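To get, for example, only the year, use =LEFT to take the first 10 characters, then in another column use =RIGHT to take the last 4 of those. For single-digit months the date is one character shorter, so take 9 characters instead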
Of course, 6/ isn't a real month
To remove the /, we will use =SUBSTITUTE(field,"char","")
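The three spreadsheet functions can be mimicked in a few lines of Python to see what each step returns. This is only a sketch: the helper names left, right and substitute are hypothetical stand-ins for the sheet functions, and the strip() call is an added assumption to drop a trailing space when the month has one digit.

```python
def left(text, length):
    """Like =LEFT: the first `length` characters."""
    return text[:length]

def right(text, length):
    """Like =RIGHT: the last `length` characters."""
    return text[-length:] if length > 0 else ""

def substitute(text, old, new):
    """Like =SUBSTITUTE: replace every occurrence of `old` with `new`."""
    return text.replace(old, new)

stamp = "6/30/2010 11:53:00"  # the example from the slide

month_raw = left(stamp, 2)               # "6/"
month = substitute(month_raw, "/", "")   # "6"

# The first 10 characters end in a space for single-digit months,
# so strip it before taking the last 4 characters.
year = right(left(stamp, 10).strip(), 4)  # "2010"

print(month, year)
```

In the sheet itself, nesting the functions in one cell works the same way as using several helper columns.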
To save only the result and not the formula, copy the entire Date3 column and create a new column Date4
Right-click the new column, and select paste special > paste values only
Select the Date4 column with all the values, click the 123 button in the topbar and click Number (even when this is already checked)
Now you can sort by the Date4 column
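To see why a plain number sorts correctly where the original text does not, here is a small Python sketch. It assumes the key is the year followed by a two-digit month (for example 201006); the exact content of Date4 depends on the formulas you chose. All timestamps except the one from the slide are made up.

```python
from datetime import datetime

def sort_key(stamp):
    """Turn 'M/DD/YYYY HH:MM:SS' into a number such as 201006 (assumed YYYYMM layout)."""
    parsed = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    return parsed.year * 100 + parsed.month

# The second entry is the slide example; the other two are invented.
stamps = ["10/05/2010 09:00:00", "6/30/2010 11:53:00", "1/15/2011 08:30:00"]

as_text = sorted(stamps)                  # alphabetical: not chronological
as_number = sorted(stamps, key=sort_key)  # chronological

print(as_text)
print(as_number)
```

Sorted as text, January 2011 comes first because "1" sorts before "6"; sorted by the numeric key, the emails appear in the order they were sent.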
The spreadsheet contains a number of other fields
To make a timeline of just emails written by Clinton rather than others, sort the From field and select only relevant emails
(Tip: maybe copy only the relevant rows to a new sheet to keep a view of what you want)
This way you could compare between different email authors
(Skip this if you can't get the formula to work)
Try to make a selection per question (e.g., all emails written by HC, or all mentioning a specific organisation)
For example, if we want to show a timeline for just the emails by Hillary Clinton, we:
To visualise, see the next slide
In your chart, the X-axis will be what you selected from Date (for example, the months), the Y-axis the number of emails
For example, if we want to show a timeline for the emails by Hillary Clinton, we:
Select the time column, and insert a chart
Aggregate by the time column, so that it shows the number of occurrences of each month
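The filter-then-aggregate idea can be sketched in Python as follows. The column names From and Date follow the field names used on the slides, but their exact spelling in the file is an assumption, and the sample rows and sender names are invented.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Invented sample rows; "Sender A" and "Sender B" are placeholders.
SAMPLE = """From,Date
Sender A,6/30/2010 11:53:00
Sender B,6/30/2010 12:10:00
Sender A,6/02/2010 08:00:00
Sender A,7/01/2010 09:15:00
"""

def emails_per_month(text, sender):
    """Count the emails written by `sender`, grouped by month (YYYY-MM)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        if row["From"] != sender:
            continue  # keep only the relevant emails
        parsed = datetime.strptime(row["Date"], "%m/%d/%Y %H:%M:%S")
        counts[parsed.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))

per_month = emails_per_month(SAMPLE, "Sender A")
print(per_month)
```

The months form the X-axis and the counts the Y-axis, as in the chart.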
Catherine Jones & Daniele Guido
Thursday 17 November 2016, 12:00
"Aquarium", 4th floor MSH
Take the 1000mails.csv file and work with it in Google Sheets
Try to create several timelines of interest using modifications of the Date field
For the brave: you can also use the files 10kmails.csv or allmails.csv
Work in groups of two or three
Link to the original data and include a link to your Google Sheet (via the Share button)
Hand in the assignment in HTML, include your name and a decent profile photo
800-1500 words, in English
Email to firstname.lastname@example.org before the start of the lecture of 29 November