Data Mining & Quantifying Literature

As defined in chapter 7 of our textbook, data mining is automated analysis that looks for patterns in digital files and extracts meaningful information from them. With tools customized for each purpose, data mining has become part of the research methods used in the study of text, music, sound recordings, images, and multimodal communications.


Data mining allows cultural materials to be analyzed in their native formats (texts, images, and media), although some processing has to occur before text or image analysis can proceed. A concept that has arisen to describe these approaches, and which we have discussed previously in class, is “distant reading”: the idea of processing content (such as subjects, themes, persons, or places) or metadata (such as publication date, place, author, or title) across a large number of textual items without reading the actual texts. The “reading” is itself a form of data mining that allows information in or about the texts to be processed and analyzed. It can help detect patterns of change in vocabulary, nomenclature, terminology, moods, themes, and a number of other topics, and it offers broader insight into patterns across many texts at once.
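To make distant reading a little more concrete, here is a minimal sketch in Python of tracking how often one word appears across texts grouped by publication decade. The function name, the sample records, and the choice to group by decade are all my own assumptions for illustration; the records are invented placeholders, not real literature, and this is not how any particular tool does it.

```python
import re
from collections import defaultdict

def relative_frequency_by_decade(records, term):
    """Return {decade: share of all words in that decade that equal `term`}.

    `records` is a list of (publication_year, text) pairs.
    """
    hits = defaultdict(int)    # occurrences of `term` per decade
    totals = defaultdict(int)  # total word count per decade
    for year, text in records:
        decade = (year // 10) * 10
        words = re.findall(r"[a-z]+", text.lower())
        totals[decade] += len(words)
        hits[decade] += words.count(term)
    return {d: hits[d] / totals[d] for d in sorted(totals) if totals[d]}

# Invented placeholder records (year, text), used only to show the mechanics.
records = [
    (1851, "the ship sailed and the ship returned"),
    (1858, "a ship at sea"),
    (1923, "the train left the station"),
]

trend = relative_frequency_by_decade(records, "ship")
for decade, share in trend.items():
    print(decade, round(share, 3))
```

Nothing here "reads" the texts in the usual sense. It only counts words and uses the publication year, which is the kind of pattern-finding that distant reading describes.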


Data mining is not limited to texts. Many applications for extracting meaningful statistical information operate on quantitative data, that is, information that originates in numerical form. At the heart of data mining and text analysis are two processes that should be understood, both noted in earlier chapters of the textbook: parameterization and tokenization. Together they identify what can be counted and how the counting will be done. Statistical analysis also comes into play, measuring the frequency, proximity, and value of individual data points within the larger sets.
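As a rough illustration of these ideas, the sketch below tokenizes a short sentence, counts word frequency, and counts which words fall near a chosen word (a simple stand-in for proximity). The function names, the sample sentence, and the window of two words are assumptions I made for the example. Choices like lowercasing everything and setting the window size are the kind of decisions parameterization covers.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def cooccurrences(tokens, target, window=2):
    """Count the words that appear within `window` tokens of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            start = max(0, i - window)
            # Words just before and just after this occurrence of the target.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Made-up sample sentence, used only to show the mechanics.
sample = "The whale surfaced. The crew watched the whale, and the whale watched the crew."

tokens = tokenize(sample)
freq = Counter(tokens)                  # frequency of each token
near = cooccurrences(tokens, "whale")   # proximity to the word "whale"

print(freq.most_common(3))
print(near.most_common(3))
```

Changing the tokenizer (for example, keeping capital letters) or the window size changes the counts, which is why these decisions need to be made before any analysis starts.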


An example of a data mining tool that we have begun to look at in class is Voyant, a dashboard-based platform that was developed to be useful without technical expertise and is freely available online. Its dashboard brings together various functions, and it can be accessed simply by going to the site and entering texts or URLs directly. The tools automatically process the input and output the results to a set of screens, offering an array of visualizations that you can use to analyze a text in different ways. Although I found Voyant confusing at first when I used it this past week, I think it is a great tool because it gives people easy access to a variety of ways to analyze specific texts.

