Currently I’m following a MOOC on Information Visualization offered by Indiana University, called IVMOOC. Each week new course material is made available, and thus far I’ve done the “when”, “where” and “what” modules. As an additional incentive, students gain access to the Scholarly Database (SDB), with which I have been having some fun.
I did a search in all NEH awards, for which the database contains 47,197 records from 1970-2012 (SDB NEH explanation: http://wiki.cns.iu.edu/display/SDBDOC/NEH+Awards; do note the chart at the bottom showing the distribution of records).
SDB offers full-text search in titles and abstracts. I tried the following:
- “Digital Humanities” in titles: 20 records from 2001-2012
- “Humanities Computing” in titles: 0 records (to see if there was DH-related work before the coining of the term “DH”)
- “Digital Humanities” in abstract: 82 results from 2001-2012
- “Humanities Computing” in abstract: 3 results from 2006-2008
In order to create an interesting visualisation, I wanted a nice bunch of results, so I broadened the search to digital OR computational in abstracts. This resulted in 654 records from 1985-2012, with a total “original amount” of $95,248,977.4. The data contains several figures, namely approved_outright, approved_matching, award_outright, award_matching and original_amount. I’m still figuring out which of these figures I should focus on.
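For readers who export the records and want to reproduce this kind of search locally, here is a minimal sketch in Python. It is not how SDB itself searches: the toy records, the field names `year` and `abstract`, and the whole-word, case-insensitive matching are all assumptions for illustration. Only `original_amount` is a field name taken from the data.

```python
import re

# Toy records, made up for illustration. "year" and "abstract" are assumed
# field names; "original_amount" is one of the figures named in the data.
records = [
    {"year": 1985, "abstract": "A history of the first digital computer companies.",
     "original_amount": 1000.0},
    {"year": 2006, "abstract": "Computational analysis of medieval manuscripts.",
     "original_amount": 2500.5},
    {"year": 2010, "abstract": "An edition of nineteenth-century letters.",
     "original_amount": 4000.0},
]

# Whole-word, case-insensitive match on either term (an assumption about
# how "digital OR computational" should behave).
PATTERN = re.compile(r"\b(digital|computational)\b", re.IGNORECASE)


def search_abstracts(rows):
    """Return the rows whose abstract mentions either search term."""
    return [row for row in rows if PATTERN.search(row["abstract"])]


hits = search_abstracts(records)
total = sum(row["original_amount"] for row in hits)
years = (min(row["year"] for row in hits), max(row["year"] for row in hits))
print(len(hits), years, total)
```

Swapping `original_amount` for one of the other figures (for example `award_outright`) only changes the key used in the sum.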
What is this research about? After performing text normalization of the abstracts in Sci2, I entered the words from the abstracts into Wordle, see the figure below.
Going a bit further, I created a co-occurrence network of words in the titles. I chose the titles here because a Wordle gives a better overview with more words, while a co-occurrence network works best with fewer, more important words. This network was again created with the Sci2 tool: I selected the top 500 edges connecting words, laid out the network with GEM and zoomed in on the most central graph (there were a couple of graphs with only two words, which I removed here). Finally, the words are sized according to their number of references, and the more important words got labels. See the figure below.
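The idea behind such a word co-occurrence network can be sketched in a few lines of Python. This is not Sci2's procedure, only an illustration under assumptions: the titles and the stopword list are made up, two words are linked when they appear in the same title, and edge weight is the number of titles they share.

```python
from collections import Counter
from itertools import combinations

# Made-up, already normalised titles (illustration only).
titles = [
    "digital archive of early maps",
    "digital archive preservation workshop",
    "preservation of early film",
]
STOPWORDS = {"of", "the", "and", "a", "in", "for"}  # assumed stopword list


def cooccurrence(title_list, top_n):
    """Count words and word pairs per title; return the top_n heaviest edges."""
    word_counts = Counter()
    edge_counts = Counter()
    for title in title_list:
        # A set, so each word counts once per title; sorted, so each pair
        # always appears in the same order.
        words = sorted({w for w in title.lower().split() if w not in STOPWORDS})
        word_counts.update(words)
        edge_counts.update(combinations(words, 2))
    return word_counts, edge_counts.most_common(top_n)


word_counts, top_edges = cooccurrence(titles, top_n=3)
for (left, right), weight in top_edges:
    print(left, right, weight)
```

Here `word_counts` plays the role of node size and `top_n` the role of the “top 500 edges” cut-off.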
To see the trend in NEH funding, I aggregated the funding per year to get the total amount awarded and the number of grants. See the figure below, created with Excel.
The first hit is from 1985, which according to the description is about a history of the first two digital computer companies. You might consider this a false hit, but I left it in because it shows that even with potential false hits, the trend is still clear: funding for digital and computational topics is growing.
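The same per-year aggregation can be done outside Excel. The sketch below assumes each record has a `year` field (a hypothetical name) and uses `original_amount` as the amount to sum, which is only one of the possible figures; the grant values are made up.

```python
from collections import defaultdict

# Made-up grants; "year" is an assumed field name.
grants = [
    {"year": 2008, "original_amount": 50000.0},
    {"year": 2008, "original_amount": 25000.0},
    {"year": 2009, "original_amount": 100000.0},
]


def totals_per_year(rows):
    """Sum the amounts and count the grants for each year, sorted by year."""
    summary = defaultdict(lambda: {"total": 0.0, "grants": 0})
    for row in rows:
        summary[row["year"]]["total"] += row["original_amount"]
        summary[row["year"]]["grants"] += 1
    return dict(sorted(summary.items()))


per_year = totals_per_year(grants)
for year, stats in per_year.items():
    print(year, stats["total"], stats["grants"])
```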
Somewhere between “what” and “when” are topic bursts: visualisations of terms that are used more frequently in a certain timeframe. I have visualised the normalised texts from titles and descriptions, which gives two rather different pictures; see the figures below. I find the one with titles especially interesting, as the topics after 2001 seem to become more related to creating and preserving in archives.
To see where DH work is taking place, I aggregated all the funding at the state level. I first tried to aggregate at the zip-code level, but this proved too elaborate and resulted in far too many points close to each other. Aggregating at the state level, I could map the amount of awards, the amount matched, and the number of grants awarded. See the figure below, which was created with Sci2.
For the sixth week of IVMOOC, I wanted to make a co-occurrence network of researchers. Unfortunately, the NEH data only shows one PI per project. Instead, I created a bipartite visualization where authors are linked to their institutions. First, I concatenated first and last names in order to distinguish people who share a last name. Because there are a lot of authors and institutions in the dataset, I kept only those whose received grants sum to more than 500k, and deleted isolated nodes. In other words: when searching for a group to work with, use the figure below to follow the money?
On closer inspection, I found that this visualization is mainly determined by the PIs. This is due to the deletion of isolated nodes: a university may have collected over 500k in funding, but when this is divided over multiple PIs who each received less than $500k, the university is no longer shown. To show how this works, see also the second figure with all DH millionaires; there are many more universities than PIs present, which provides a much more truthful picture.
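The effect is easy to reproduce on a toy example. The sketch below is an assumption-laden illustration, not the Sci2 workflow: the names, amounts and field names (`pi`, `institution`, `amount`) are invented, a node is kept when its grants sum to more than the threshold, and nodes left without any edge are then dropped.

```python
from collections import defaultdict

THRESHOLD = 500_000

# Invented grants: Univ B collects 600k in total, split over two PIs.
grants = [
    {"pi": "Ada Example", "institution": "Univ A", "amount": 600_000},
    {"pi": "Bob Sample", "institution": "Univ B", "amount": 300_000},
    {"pi": "Cy Sample", "institution": "Univ B", "amount": 300_000},
]


def filtered_bipartite(rows, threshold):
    """Keep nodes above the threshold, then split them into edges and isolates."""
    totals = defaultdict(int)
    for row in rows:
        # Tag each node with its type so a PI and an institution never collide.
        totals[("pi", row["pi"])] += row["amount"]
        totals[("inst", row["institution"])] += row["amount"]
    kept = {node for node, total in totals.items() if total > threshold}
    # An edge survives only if both its PI and its institution were kept.
    edges = {
        (("pi", row["pi"]), ("inst", row["institution"]))
        for row in rows
        if ("pi", row["pi"]) in kept and ("inst", row["institution"]) in kept
    }
    connected = {node for edge in edges for node in edge}
    isolates = kept - connected
    return edges, isolates


edges, isolates = filtered_bipartite(grants, THRESHOLD)
print("edges:", edges)
print("dropped as isolates:", isolates)
```

Univ B passes the 500k threshold but ends up among the isolates, because neither of its PIs does; only the Ada Example and Univ A pair survives.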