Blog post #5 - Data mining & quantifying literature

February 24, 2024

Our coursebook defines data mining as “…an automated analysis that looks for patterns and extracts meaningful information in digital files (Underwood 2017)… Data mining has long been incorporated into the natural and social sciences. It has become a part of research methods in text, music, sound recording, images, and multimodal communications studies with tools customized for these purposes. Text analysis is a specialized subset of data mining that focuses on analysis of language (Schmidt 2013).”

In simple terms, this process can be used for things like analyzing cultural texts within their native context or gathering material for qualitative research.

Our textbook also states that “At the heart of data mining and text analysis are several processes that should be understood critically. These are the same processes noted earlier: parameterization and tokenization. These identify what can be counted and how the counting is done. Additional considerations come into play with data mining, which are the statistical analyses of frequency, proximity, and value of individual data points within the larger sets. Principles like collocation of words are judged relative to other usage—and in contrast to the sum of all other words in a sample. In other words, multiple factors go into determining how any individual word is valued, not just the number of times it appears.”

This is exactly the kind of data we examine when using Voyant Tools. The program highlights key words, charts language trends, and maps where certain words are used. It is a great tool for distant reading because it groups words by frequency, which helps the reader understand the themes and topics that run across the whole novel.
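To make tokenization, frequency, and collocation a little more concrete, here is a minimal Python sketch. It is not how Voyant works internally; the function names (`tokenize`, `collocates`), the simple regular expression, and the made-up sample sentence are all my own assumptions for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Tokenization: lowercase the text and split it into word tokens.
    Assumes plain ASCII letters and straight apostrophes only."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, target, window=1):
    """Collocation (proximity): count words within `window` tokens of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            # Neighbors before and after the target, excluding the target itself
            counts.update(tokens[start:i] + tokens[i + 1:end])
    return counts

# A made-up sample sentence, not a quotation from any novel
sample = "The cat sat on the mat and the dog sat on the rug."
tokens = tokenize(sample)

# Frequency: how often each token appears
freq = Counter(tokens)
# Relative frequency: a word's count judged against all words in the sample
relative_the = freq["the"] / len(tokens)

print(freq.most_common(1))        # [('the', 4)]
print(collocates(tokens, "sat"))  # 'on' appears next to 'sat' twice
```

The point of the relative frequency line is the one the textbook makes: a word's count only means something in comparison to the rest of the sample.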

Here’s an example with my book: “Emma” by Jane Austen. As the Voyant screenshot shows, across the entire book many of the most frequent words are common function words (transition words, loosely speaking) such as “and,” “the,” or “to.”
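Because words like “and,” “the,” and “to” dominate almost any English text, text analysis tools usually let you filter them out with a stop-word list so that content words rise to the top. The sketch below shows the idea in Python. The tiny `STOP_WORDS` set, the `content_word_counts` helper, and the sample sentence are hypothetical and only for illustration; real stop lists are much longer.

```python
from collections import Counter

# A tiny hand-picked stop list for illustration only (an assumption, not Voyant's list)
STOP_WORDS = {"and", "the", "to", "a", "of", "in", "on"}

def content_word_counts(tokens, stop_words=STOP_WORDS):
    """Count tokens after dropping the function words in the stop list."""
    return Counter(t for t in tokens if t not in stop_words)

# A made-up sample, already lowercased and free of punctuation
tokens = "the cat sat on the mat and the dog sat on the rug".split()

counts = content_word_counts(tokens)
print(counts.most_common(1))  # [('sat', 2)]
```

Without the filter, “the” would be the top word; with it, the most frequent word says something about what the sentence is actually about.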

Comments

Julianna Pascuccio, February 25, 2024 at 10:35 AM
I enjoyed how you included a screenshot of the Voyant tool with your book at the bottom of the post. Seeing a different book as an example helps me compare and contrast my own pattern findings from the same tool. In my book, Great Expectations by Charles Dickens, the frequent words were also either characters’ names or transition words. The top five words for chapter 20 were “Mr,” “Jaggers,” “said,” “clerk,” and “like.” I found the Cirrus section of Voyant helpful for figuring out the theme of the chapter. Seeing the line graph of those frequencies in the Trends section was also useful when analyzing a specific line of the book.
Dr. M, February 26, 2024 at 3:19 PM
That is a lot of transition words. I wonder if the sentences in this text are long?
