Who? Networks of social entities

Max Kemman
University of Luxembourg
December 13, 2016

Doing Digital History: Introduction to Tools and Technology

Today

  • Final assignment
  • Preparing the data with Palladio
  • (Cleaning the date column with Google Spreadsheets)
  • Visualising with Palladio
  • Next time

Final assignment

Some additional info about the final assignment

The computers in the TIC-Lab are powerful enough to work with all mails in Google Spreadsheets

(You may also use Excel if you prefer, but it is more difficult for me to help when you get stuck)

Create a selection and argue why you made this selection

Deadline: 20 January 2017 23:59

You will receive your grades on Friday 27 January 2017

Final assignment data

All data is in Moodle in the Final Assignment folder:

  • allmails-metadata.csv & allmails-metadata.ods
  • allmails-ner.csv & allmails-metadata.ods (including mentioned people, organisations, locations)
  • allmails-geocoded.csv (about 108k locations)
  • Folder with text files per 1k

Preparing the data with Palladio

To visualise the coded data, we will use Palladio: http://hdlab.stanford.edu/palladio/

First we need to prepare the data for Palladio

Loading the data

Click Start

We will use the 1000mails-cleandate.csv file from Moodle (in the Who folder)

Drag the CSV file onto the text input field

Click Load

Preparing the data

You will get a list of the columns from the spreadsheet

You can already give your project and your data table a title

Do not close or refresh this tab, or you will have to start over!

Let's look at several columns

From

Sort the values by Frequency

Check the data type

Click Close
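
Sorting by Frequency is simply a count of how often each value occurs in the column. As a minimal sketch of that idea outside Palladio (the sender addresses below are made up for illustration; the real values come from the From column of the CSV):

```python
from collections import Counter

# Hypothetical sender values, standing in for the From column.
senders = [
    "alice@example.org", "bob@example.org", "alice@example.org",
    "carol@example.org", "alice@example.org", "bob@example.org",
]


def by_frequency(values):
    """Return (value, count) pairs, most frequent value first."""
    return Counter(values).most_common()


for value, count in by_frequency(senders):
    print(f"{count:>3}  {value}")
```

A list like this is a quick way to spot the most active senders before looking at the graph.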

Date

To set the data type to date we need the format: YYYY-MM-DD

In our original CSV the format included the time of day, but here the data is already in the right format, so it is recognised automatically

See the next section for how to clean the date

Click Close

People

This contains the named entities per email

To separate multiple people in an email, enter the delimiter | in the Multiple values box

Click Close
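
The delimiter tells Palladio that one cell holds several values. A minimal sketch of the same idea, assuming the People cells look like the made-up example below (the names are hypothetical):

```python
def split_multiple(cell, delimiter="|"):
    """Split one cell into its separate values, dropping empty entries."""
    return [part.strip() for part in cell.split(delimiter) if part.strip()]


# Hypothetical cell from the People column.
cell = "Jane Doe|John Smith| Alex Example"
print(split_multiple(cell))
```

Without the delimiter, the whole cell would be treated as a single (very long) name.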

Cleaning the date column with Google Spreadsheets

Here we use Google Spreadsheets, but the same is possible in Excel and LibreOffice

You can skip this for now, but it is important for the final assignment

Cleaning the Date field

Select the Date column, and go to Format > Number > More Formats > More date and time formats

Select the appropriate option YYYY-MM-DD and click Apply

The Date column will now have the appropriate format
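
The same cleaning step can also be scripted. This is a minimal sketch, not part of the assignment: the source format "%Y-%m-%d %H:%M:%S" is an assumption about what the original Date column looks like, so check your own CSV and adjust the format string if it differs.

```python
from datetime import datetime


def clean_date(value, source_format="%Y-%m-%d %H:%M:%S"):
    """Convert a timestamp string to YYYY-MM-DD, dropping the time of day.

    source_format is an assumed layout of the original column.
    """
    parsed = datetime.strptime(value.strip(), source_format)
    return parsed.strftime("%Y-%m-%d")


# Made-up example value.
print(clean_date("2016-12-13 09:30:00"))
```

If the original dates use another layout, pass it as the second argument, for example clean_date("13/12/2016 09:30", "%d/%m/%Y %H:%M").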

Exporting the CSV

Click File > Download as > Comma-separated values (.csv, current sheet)

Visualising with Palladio

Now let's look at the network by selecting Graph at the top bar

As a source, choose the From column and close the popup

As a target, choose the To column and close the popup

Wait and watch the result!
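
Conceptually, the graph links the From value and the To value that occur in the same row. The sketch below illustrates that idea with a tiny made-up CSV; it is not how Palladio is implemented, and the names are hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up sample data with the same column names as our CSV.
SAMPLE = """From,To
alice,bob
alice,bob
bob,carol
carol,alice
"""


def count_edges(csv_text, source="From", target="To"):
    """Count how often each (source, target) pair occurs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row[source], row[target]) for row in reader)


edges = count_edges(SAMPLE)
for (src, dst), weight in edges.most_common():
    print(f"{src} -> {dst} ({weight} emails)")
```

Each distinct pair is one link in the network; the count shows how many emails it represents.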

Palladio Graph Settings

Try the two Highlighting check-boxes

Try Size nodes

What can we learn from this graph?

Facet

To filter for certain attributes, select Facet in the lower-left corner

As a Dimension select From and close the popup

Now you can select to filter emails only from one person

You could alternatively filter emails mentioning a specific person, location, or organisation

To refine even further, add more facets by clicking Dimension and selecting more options

To remove a facet, click the red trashcan in the lower right corner
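
A facet is a filter on one column: only rows with a selected value remain. As a minimal sketch of that idea (the rows and names below are made up, and the | delimiter for People follows the setting used earlier):

```python
# Made-up rows, standing in for the emails in the CSV.
rows = [
    {"From": "alice", "People": "Jane Doe|John Smith"},
    {"From": "bob", "People": "Jane Doe"},
    {"From": "alice", "People": ""},
]


def facet(rows, dimension, selected, delimiter=None):
    """Keep rows whose value(s) in `dimension` include a selected value."""
    kept = []
    for row in rows:
        cell = row.get(dimension, "")
        values = cell.split(delimiter) if delimiter else [cell]
        if any(value in selected for value in values):
            kept.append(row)
    return kept


# Emails from one person.
from_alice = facet(rows, "From", {"alice"})
# Refine further: of those, only emails mentioning Jane Doe.
refined = facet(from_alice, "People", {"Jane Doe"}, delimiter="|")
print(len(from_alice), len(refined))
```

Applying a second facet to the result of the first is what refining with more facets amounts to.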

Facet selection from From column

Facet selection from People column

Timeline

We can also create a timeline of the emails by clicking Timeline

Drag the mouse in the timeline to create a bar that acts as a filter

Drag the bar to move it around and see how the network develops: you could compare months or years

To remove the timeline filter, click the red trashcan in the lower right corner
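
The timeline bar works as a date window: only emails dated inside the window stay in the network. A minimal sketch of such a window, assuming the Date column is already in YYYY-MM-DD format (the rows below are made up):

```python
from datetime import date

# Made-up rows with cleaned dates.
rows = [
    {"From": "alice", "Date": "2016-01-15"},
    {"From": "bob", "Date": "2016-02-03"},
    {"From": "carol", "Date": "2016-02-20"},
    {"From": "alice", "Date": "2016-03-01"},
]


def in_window(rows, start, end, column="Date"):
    """Keep rows dated from start to end, both days included."""
    return [
        row for row in rows
        if start <= date.fromisoformat(row[column]) <= end
    ]


# Moving the window from January to February changes which emails remain.
january = in_window(rows, date(2016, 1, 1), date(2016, 1, 31))
february = in_window(rows, date(2016, 2, 1), date(2016, 2, 29))
print(len(january), len(february))
```

Comparing the results for two windows is the scripted equivalent of dragging the bar along the timeline.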

Filtering one part of the timeline

Filtering another part of the timeline

Why filtering?

The network can become quite large when you have more emails, or when you select one of the people, locations, or organisations columns in the graph

Filtering helps to keep the graph readable instead of ending up with spaghetti

See the next slide for an example of a spaghetti ball (trying to reproduce this might make your computer quite slow)

Sharing

To export a graph, click the Download button in the settings (the lower one). This will export an SVG file that you can embed in your HTML report with

<img src="Palladio Graph.svg" alt="graph">

To export the entire workspace, click the upper Download button. This will export a JSON file that you can load next time (see next slide)

If you previously exported your workspace, you can load it by selecting "Load an existing project" and choosing the JSON file. This is also useful for sharing with project partners

For next time

20 December

Wrap-up