Data Mining & Quantifying Literature

"Data mining is an automated analysis that looks for patterns and extracts meaningful information in digital files" (Underwood 2017). As technology continues to advance, tools that let scholars analyze large amounts of text at once, and much faster than reading alone allows, are more important than ever. The particular value of data mining is that it quickly extracts important patterns and features from a text, helping the reader see its larger picture.

The Voyant Tools we have been using analyze the language of a text specifically: they highlight the most frequently used words, point out unique word forms, and make trends apparent. As Johanna Drucker writes in our textbook, "Distant reading is the idea of processing content—subjects, themes, persons, or places—or information about publication date, place, author, or title in a large number of textual items without engaging in the reading of the actual text" (Drucker). Using Voyant on the novels we each chose was a form of distant reading. We were not focused on the individual details and plot of the text, but on what we could learn from it at a glance.
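Voyant does not show its internals in the interface, but the kind of word counting described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Voyant's actual implementation: the stopword list is a small hypothetical stand-in for the much longer list Voyant applies, the sample passage is invented, and `tokenize`, `top_terms`, and `vocabulary_density` are names chosen for this sketch. The last function divides the number of unique word forms by the total number of words, which is roughly the idea behind the vocabulary density figure Voyant reports.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was",
             "he", "she", "it", "his", "her", "at", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and pull out runs of letters (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent words after removing stopwords."""
    words = [w for w in tokenize(text) if w not in STOPWORDS]
    return Counter(words).most_common(n)


def vocabulary_density(text: str) -> float:
    """Unique word forms divided by total words (stopwords included)."""
    words = tokenize(text)
    return len(set(words)) / len(words) if words else 0.0


if __name__ == "__main__":
    # Invented sample passage, not a quotation from any novel.
    sample = (
        "Oliver asked for more. The master stared at Oliver, "
        "and the boys stared at the master. Oliver asked again."
    )
    print(top_terms(sample, 3))                # [('oliver', 3), ('asked', 2), ('master', 2)]
    print(round(vocabulary_density(sample), 2))  # 0.58
```

Removing stopwords before counting is what lets a name like "oliver" rise to the top of the list, much as character names dominated the Voyant results discussed in this post.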


This example comes from the chapter I read from "Oliver Twist" by Charles Dickens. Based on my results in Voyant, I can say that this chapter focuses on the story's characters and the large role they play in the storyline. The most frequently used words were the characters' names, which suggests the story concentrates heavily on the characters and the different situations they end up in. I can also infer that Dickens is very detail-oriented, because many of the less frequent words are descriptive and focus on particular aspects of the story.

"Using Voyant, and other text analysis tools, is most successful when applied to a text with which the researcher is familiar" (Drucker). Although Voyant gave me some insight before I had read the entire novel, it seems better suited to someone looking for underlying themes and a deeper analysis of a text they have already read and understood. Data mining and the tools that make it possible will be important for the future of understanding literature and of decision making.



Comments

  1. Yes, for scholars who are familiar with many of the author's works, it would be easier to recognize the patterns in a distant reading, or even to use it to confirm a working hypothesis. I think it just goes to show that we can't replace reading or have digital tools do that work for us.

  2. I can definitely see how Voyant Tools can help scholars with research, especially research focused on specific uses of language. By finding the most common words, we can assess which parts of speech writers prioritize most in their literature. In my book, Persuasion by Jane Austen, I found that names and titles were the most common words: the main characters' names and Mrs., Mr., and Captain were all in the top five. The usage and abundance of these words shows how important titles and social status were when the book was written.

  3. I like how you start your post with the definitions of both data mining and distant reading as a structure for the rest of the discussion. In addition, your example from your project in Voyant Tools really helped me understand what role distant reading can play in an overall understanding of the text. I think you make a fantastic point at the end of your post, as well, by acknowledging that Voyant might work better as an analytic tool when the digital humanist is very familiar with the text being analyzed.
    From my own group's project, I can safely say that I was confused at first about exactly what role the most common words would play in a bigger theme in the text. However, like you, I did some research and tried to put the words into context with the story's overall setting and characters, and my group worked on getting a better grasp of the author's writing style (I had Jane Austen). We eventually shaped our observations around the significance of the words when examined together with the author's writing style, treating the text as most relevant to the time it was written (in the late 1700s).
    Finally, it was super interesting to see a screenshot of your work in Voyant Tools, and your examination of your text helped me with mine. I drew some of the same conclusions (such as the names showing that the author wrote in depth about their characters), and I wish you the best of luck on your final project!
