Blog Post 5: Data Mining & Quantifying Literature

Although quantitative data may seem like the opposite of literature, since one is based on numbers and the other on creative language, I think the two can go hand in hand. Extracting quantitative data from a literary work can reveal patterns that would not be found by reading it word for word as one usually would. Data mining is defined as “an automated analysis that looks for patterns and extracts meaningful information in digital files (Underwood 2017)” (Drucker 110). Even though data mining is numerical, it can provide information that adds depth to the literature being read and can help group works based on shared patterns. Data mining can be used with images and other media as well; however, that can be a much longer process than mining texts.

As we know, Voyant Tools is a very useful data mining website that helps uncover patterns in the text being analyzed. Voyant displays the most frequently used words and where in the text they appear most often, as well as the average words per sentence, readability index, and vocabulary density. Collecting this kind of data about a piece of literature can reveal trends in an author’s writing that may help one better understand the author and their intentions. However, data mining doesn’t just count the frequency of words; it also calculates a word’s “relation to the frequency of other words—and to the likelihood or probability of its being used” (Drucker 112). This data can be collected from multiple works in the same genre or by the same author and then compared to draw connections. These connections might not be made if close reading were used instead of this newer form of reading, “distant reading.” Distant reading allows for broader analysis across literary works than traditional literary research does. While traditional literary research is thorough, it cannot work as efficiently as data mining with Voyant Tools. I think both methods of literary analysis benefit one another and can come together for a more comprehensive understanding.
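To make these measures more concrete, here is a minimal Python sketch of how word frequency, relative frequency, vocabulary density, and average words per sentence could be computed by hand. It is not how Voyant itself works internally. The function name `summarize`, the simple tokenizing rules, and the sample sentences are all my own assumptions for illustration.

```python
import re
from collections import Counter


def summarize(text, top_n=5):
    """Compute a few Voyant-style summary statistics (illustrative only)."""
    # Lowercase the text and pull out runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Very rough sentence split on ., ! or ? (abbreviations like "Mr." would confuse it).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word")

    counts = Counter(words)
    total = len(words)
    top = counts.most_common(top_n)
    return {
        "total_words": total,
        # Vocabulary density: unique words divided by total words.
        "vocabulary_density": len(counts) / total,
        "avg_words_per_sentence": total / len(sentences),
        # Raw counts of the most frequent words.
        "top_words": top,
        # Each top word's share of all words in the text.
        "relative_frequency": {word: count / total for word, count in top},
    }


# Invented sample sentences, not a quotation from the novel.
sample = "Mr Nickleby spoke. Miss Nickleby listened. Mr Squeers said nothing."
stats = summarize(sample)
print(stats)
```

Running the same function on several chapters or novels and comparing the results side by side is one simple way to approximate the kind of comparison described above.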

Voyant Tools has helped to demonstrate the prominence of formal language and character names in the novels of Charles Dickens. When I put one chapter of Dickens’s novel Nicholas Nickleby into Voyant, I found that character names and formal words such as “Mr” and “miss” were the most frequently used words. However, when the book in its entirety was put into Voyant, the most common words were prepositions, which did not add much meaning to the analysis. Others working on the same author had a similar problem, so we found the analysis of just one chapter to be more useful.
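One common way to deal with the problem of prepositions and other function words crowding out more meaningful ones is to filter out a list of "stopwords" before counting. The sketch below shows the idea in Python. The short `STOPWORDS` list, the `top_words` function, and the sample sentence are hypothetical and only meant as an illustration. A real analysis would use a much longer list.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption; real lists are far longer).
STOPWORDS = frozenset({"the", "of", "to", "in", "and", "a", "at", "on", "with", "for"})


def top_words(text, n=3, stopwords=frozenset()):
    """Return the n most frequent words, skipping any word in `stopwords`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in stopwords)
    return counts.most_common(n)


# Invented sample sentence, not a quotation from the novel.
sample = "Mr Nickleby went to the school, to the inn, and to the coach with Mr Squeers."

unfiltered = top_words(sample)
filtered = top_words(sample, stopwords=STOPWORDS)
print(unfiltered)  # function words such as "to" and "the" rank highest
print(filtered)    # titles and character names rise to the top
```

With the stopwords removed, the titles and names that stood out in the single-chapter analysis become visible again even in a longer text.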



Comments

  1. Yes, it's interesting that the Dickens novels seem to get bogged down in language formalities. It sounds very polite, but what is happening? You could assume many characters are addressing each other. And the word "said" doesn't seem too prominent, whereas it is the most frequent word in some of the Woolf novels, so perhaps a difference in written dialogue?
