Further adventures in the digital humanities?

As anticipated on this very blog, I recently spent a week in Indianapolis attending a workshop on computational text analysis at HILT 2016. We spent our time surveying a number of different tools, techniques, and concepts related to text analysis, so I walked away with a greater appreciation for data cleaning, Weka, HathiTrust, metadata, Python, and much more. The most frustrating part of the workshop was that we visited each topic so briefly and had so few opportunities to apply these techniques to our own work. I can’t fault the workshop organizers for these decisions (helping participants take a dozen wildly different datasets through deep dives into a particular technique would have been difficult), but I was excited enough by many of the concepts we covered that I was itching to try them out myself.

This was especially true of topic modeling, a technique for identifying different “topics” (or themes, or discourses, or…) in the documents of a particular corpus. As we tried out this technique on a corpus of slave narratives, I was amazed at how an algorithm was able to tease out what seemed to be clearly distinct themes within and across these narratives. One of our instructors warned us against being too impressed, explaining that the underlying math is actually quite simple. He certainly had a point, and I know the importance of not being blindly wowed by what an algorithm seems to do, but refusing to find topic modeling amazing because it really comes down to conditional probabilities seemed to me akin to refusing to recognize the wonder of the French language because, at its roots, it’s an arbitrary collection of mouth sounds. That said, neither French nor topic modeling can be truly useful (or truly amazing) to me unless I spend some time figuring out how it works. I went to HILT hoping to learn a couple of neat tricks, but I came away convinced that topic modeling could have some real value for me.
Over the past few weeks, I’ve added to my notebook full of dissertation brainstorming scribbles a number of references to topic modeling, and over the next few months, I hope to learn more about the process, dive more into the details, and make this a part of the work that I do.
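
Since my instructor’s point was that topic modeling “comes down to conditional probabilities,” here is a minimal sketch of what he meant: a toy LDA-style topic model trained by collapsed Gibbs sampling, where each word token is repeatedly reassigned to a topic by sampling from a conditional probability built out of simple counts. The corpus, topic count, and hyperparameters below are all invented for illustration; real tools (MALLET, for example) do the same thing at scale with many refinements.

```python
import random
from collections import defaultdict

# Toy corpus with two obvious themes (sailing vs. baking), so even a
# tiny model has a chance of separating them.
docs = [
    "ship sea sail wind sea".split(),
    "sail wind ship ship sea".split(),
    "bread oven flour bread bake".split(),
    "oven bake flour bread flour".split(),
]

K = 2                    # number of topics (chosen by hand here)
alpha, beta = 0.1, 0.1   # Dirichlet smoothing hyperparameters
vocab = sorted({w for d in docs for w in d})
V = len(vocab)

random.seed(0)
# Start by assigning every word token a random topic.
z = [[random.randrange(K) for _ in d] for d in docs]
doc_topic = [[0] * K for _ in docs]                 # topic counts per document
topic_word = [defaultdict(int) for _ in range(K)]   # word counts per topic
topic_total = [0] * K
for i, d in enumerate(docs):
    for j, w in enumerate(d):
        t = z[i][j]
        doc_topic[i][t] += 1
        topic_word[t][w] += 1
        topic_total[t] += 1

# Collapsed Gibbs sampling: resample each token's topic from the
# conditional probability given all the other current assignments.
for _ in range(200):
    for i, d in enumerate(docs):
        for j, w in enumerate(d):
            t = z[i][j]
            # Remove this token's current assignment from the counts.
            doc_topic[i][t] -= 1
            topic_word[t][w] -= 1
            topic_total[t] -= 1
            # P(topic k | everything else) is proportional to
            # (how often k appears in this doc) * (how often k generates this word)
            weights = [
                (doc_topic[i][k] + alpha)
                * (topic_word[k][w] + beta) / (topic_total[k] + V * beta)
                for k in range(K)
            ]
            t = random.choices(range(K), weights=weights)[0]
            z[i][j] = t
            doc_topic[i][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1

# Show the most frequent words assigned to each inferred topic.
for k in range(K):
    top = sorted(topic_word[k], key=topic_word[k].get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

The whole “model” really is just two count tables and one conditional probability per token, which is both the instructor’s point and, to me, part of the wonder.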

Speaker: “Darwin’s Semantic Voyage: Exploration and Exploitation of Victorian Science in the Reading Notebooks”, Colin Allen

Join us for an invited talk, “Darwin’s Semantic Voyage: Exploration and Exploitation of Victorian Science in the Reading Notebooks”.

Speaker: Colin Allen, Provost Professor, Department of History and Philosophy of Science, Indiana University

Date: 10/23/2015
Time: 12:00-2:00
Location: Main Library, Room W444

Description: During the 23 years between his voyage on the Beagle and publication of The Origin of Species, Charles Darwin meticulously documented the books he read. His Reading Notebooks thus enable the study of inputs to his creative process between 1837 and 1860.  We located digitized full texts of 670 of his nonfiction readings (390 of which he classified as work-related reading) and applied topic modeling to them. We then used the semantic space of the topic models in a novel way to measure the distances that Darwin traveled between books.  These measurements permitted us to investigate the trade-off he made between reading within a given domain and switching to new domains. Our analysis shows that Darwin’s behavior shifts from exploitation to exploration on multiple timescales, and that at the longest timescale these shifts correlate with major intellectual epochs of his career. Furthermore, contrasting his reading order with the publication order of the same texts, we find Darwin’s consumption of the texts is more exploratory than the culture’s production of them.
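
To get a feel for what “measuring distances in the semantic space of the topic models” might look like in practice, here is a rough sketch. The topic distributions below are made up, and Jensen-Shannon distance is just one common choice of distance between distributions, not necessarily the measure the speakers actually used.

```python
import math

# Hypothetical topic distributions for three consecutively read books
# (invented numbers; a real model would have many more topics and books).
books = [
    [0.70, 0.20, 0.10],   # book 1: mostly topic 0
    [0.60, 0.25, 0.15],   # book 2: close to book 1 (exploitation)
    [0.05, 0.15, 0.80],   # book 3: a jump to topic 2 (exploration)
]

def kl(p, q):
    """Kullback-Leibler divergence between two distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    """Jensen-Shannon distance: a symmetric, bounded distance between
    probability distributions, suitable for comparing topic mixtures."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

# Distance "traveled" between consecutive readings: small steps suggest
# exploiting a familiar domain, large jumps suggest exploring a new one.
steps = [js_distance(books[i], books[i + 1]) for i in range(len(books) - 1)]
print(steps)
```

Tracking how these step sizes change over time is one way the exploration/exploitation trade-off described in the abstract could be made quantitative.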

Cosponsored by MSU Libraries, Philosophy Department, and the Digital Humanities Program in the College of Arts and Letters.

LOCUS: Text Analysis

LOCUS is a new series of presentations from people at MSU doing work in DH. The second will be held April 9 at 3pm; full details below.

LOCUS: Call for Participation (full information found at digitalhumanities.msu.edu/locus/next)

Partners: Writing, Rhetoric, and American Cultures, Political Science, and the Social Science Data Analytic Initiative

Submit a proposal (CFP closes 4/3/2015, extended from 3/31/2015): dts@mail.lib.msu.edu
Register (space is limited): http://classes.lib.msu.edu/view_class.php?class_id=125

Date: 4/9/2015
Time: 3:00-5:30
Location: Main Library, 3 West, REAL Classroom

Increasingly, scholars operating in a wide array of disciplines use computational methods to study digital texts. These digital texts include but are not limited to journal articles, professional proceedings, government documents, novels, websites, and social media (Twitter and Facebook, among others). How can the content of these sources be collected and analyzed to infer the underlying structure and dynamics of human intent or behavior? What computational hurdles and opportunities exist to fruitfully utilize this digitized information in the context of (inter)disciplinary questions? What leverage does digital text as a medium offer versus its analog antecedents? To what extent do computational methods align with, complement, or diverge from methods used to study analog text? This LOCUS will gather scholars together to explore these questions in the context of specific research projects and/or pedagogical applications.

LOCUS presentations are 7-10 minutes long and may:
– Present work in progress
– Present a completed project
– Demo a method, tool, or resource
– Share “the seed” of an idea

Submission Guidelines:
– 300-500 words describing your presentation
– Highlight the connection between technology/digital method(s) and research and/or pedagogy

Speaker: “Approaches to Verbal Data Analysis”, Cheryl Geisler

Dr. Geisler will give a talk on Monday, March 23, 4:00pm in Bessey Hall, Room 300 (Writing Center). She will also give a workshop on Tuesday, March 24, 10:00am – 3:00pm in B342 Wells Hall.

This talk and workshop are part of the WRAC Department’s Speaker Series as well as the Road to HASTAC Speaker Series.

As scholars and teachers, we often find ourselves with access to copious amounts of texts, talk, or other verbal data, but with little idea of how to approach its analysis. In this workshop, Cheryl Geisler will provide an introduction to the systematic coding of verbal data. Following the presentation of sample analyses, participants will be introduced to the assumptions underlying analysis, the criteria for a good analysis, and the analytic process. Participants will be given a chance to work on a sample analysis and compare their results with one another. The last segment of the workshop will be given over to small-group consultation in which participants will have a chance to share projects and receive advice about conducting an analysis.

Reading Group: Corpus Selection

Erin Beard will lead a discussion on corpus selection based on the following articles. Please read as many of the articles as possible in advance of the discussion, but feel welcome to attend even if you haven’t had a chance to read them all!

Refreshments will be served.