Data & Digitization

How do data and the inner workings of the web relate to the overarching concept of digital humanities? How do researchers use data to create digital humanities projects? The data used in digital humanities are made through a process called data modeling. Data models and data are never neutral; they are always the expression of some point of view or value system. (Drucker, 32)


There are different types of data, which can be used for different purposes. First, there are quantitative and qualitative data. Quantitative data comes from anything that can be counted, while qualitative data comes from interpretative judgments, such as the responses to an open-ended survey. Next, there are structured and unstructured data. Structured data is made up of explicit units, like numbers or true/false statements, while unstructured data is made up of units that are often ambiguous, like images or videos.
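To make these distinctions concrete, here is a minimal Python sketch. Every value in it is invented for illustration, and the field names (`age`, `subscribes`, `comment`) are my own assumptions, not something from Drucker or the Vogue project. The numbers and true/false values are structured and can be counted directly, while the free-text comments have to be read and interpreted before they can be compared.

```python
# Hypothetical survey responses; every value here is made up for illustration.
survey_responses = [
    {"age": 34, "subscribes": True, "comment": "I mostly read it for the photography."},
    {"age": 22, "subscribes": False, "comment": "It feels like it was written for someone else."},
    {"age": 46, "subscribes": True, "comment": "The older issues are more interesting to me."},
]

# Structured, quantitative fields: explicit units that can be counted or averaged.
ages = [response["age"] for response in survey_responses]
average_age = sum(ages) / len(ages)
subscriber_count = sum(1 for response in survey_responses if response["subscribes"])

# Unstructured, qualitative field: free text that needs interpretation
# before it can be grouped or compared.
comments = [response["comment"] for response in survey_responses]

print(average_age, subscriber_count)
for comment in comments:
    print("-", comment)
```

Deciding which fields to record, and how, is already a data modeling choice, which is part of why data are never neutral.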

Digital research in the humanities can make use of existing or new data. In my case, I am analyzing a project that uses existing data: the Vogue archives. Analyzing existing data can be complicated because we often don't know how complete the data is, who made it, or how it was made. Humanists often struggle with gathering data that is complete and consistent. Materials from the past, such as an issue of Vogue from 1923, are often incomplete or vague, and humanists have to work with these types of data all the time. 



Concepts that are not quantifiable can make data modeling even more complex. One example is the analysis of language. "When we are analyzing language, instability of meaning is built in." (Drucker, 23) The word embedding section of the Vogue project I am analyzing could be problematic because a computer does not necessarily distinguish between the meanings a word takes on in different contexts. For example, "makeup" could mean cosmetics or it could mean composition, depending on the context.
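The "makeup" problem can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not the method the Vogue project uses: it imitates a static word embedding, where each word form is stored with exactly one vector. The table, the numbers in it, and the function name `vector_for` are all invented.

```python
# Toy static "embedding" table: one made-up vector per word form.
embeddings = {
    "makeup": (0.2, 0.7),
    "lipstick": (0.1, 0.9),
    "committee": (0.8, 0.1),
}

def vector_for(word, sentence):
    """Look up the vector for `word` as it appears in `sentence`.

    The sentence is only used to check that the word is present; a static
    lookup like this one cannot take the surrounding words into account.
    """
    tokens = sentence.lower().split()
    if word not in tokens:
        raise ValueError(f"{word!r} does not appear in the sentence")
    return embeddings[word]

cosmetics_sense = vector_for("makeup", "She applied her makeup before the show")
composition_sense = vector_for("makeup", "The makeup of the committee changed")

# Both senses receive the same vector because the lookup ignores context.
same_vector = cosmetics_sense == composition_sense
print(same_vector)
```

Whatever the two sentences mean to a human reader, the lookup returns the same numbers for both, which is the instability of meaning that Drucker describes.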



Now we move on to the process of digitization. 


Web pages are formatted in HTML. HTML, along with HTTP, CSS, and JavaScript, is one of the main technologies behind web-based documents, and it is the one that gives them their structure. All HTML documents are structured with tags so that their contents are read according to a fixed set of rules. No one owns HTML. (Drucker, 35)
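As a small illustration of tags being read according to fixed rules, the sketch below uses `HTMLParser` from Python's standard library to walk through a tiny page. The sample page and the class name `TagCollector` are my own, made up for this example.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record each opening tag and each piece of visible text, in order."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called once for every opening tag, such as <h1> or <p>.
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for the text between tags; skip whitespace-only pieces.
        if data.strip():
            self.text.append(data.strip())

# A made-up page, just large enough to show the structure.
page = (
    "<html><body>"
    "<h1>Data and Digitization</h1>"
    "<p>No one owns HTML.</p>"
    "</body></html>"
)

collector = TagCollector()
collector.feed(page)
collector.close()

print(collector.tags)  # ['html', 'body', 'h1', 'p']
print(collector.text)  # ['Data and Digitization', 'No one owns HTML.']
```

The parser never has to guess what is a heading and what is a paragraph, because the tags say so explicitly.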

Other web technologies and file formats:

  • HTTP – an application layer protocol designed to transfer information (Cloudflare)
  • CSS – language used to style websites
  • JavaScript – language used to implement complex features, like animated images
  • Proprietary file formats: Microsoft Excel and Word, Adobe InDesign
  • Open file formats: GIF, PNG, HTML, JPEG

Key takeaways from Chapters 2 & 3:
  • To parameterize is to measure features such as the number of words in a song or the length of a video by some metric (quantity, scale, size, etc.)
  • To tokenize is to determine what units can be identified in an entity, such as words. This is a subset of parameterization.
  • The W3C (World Wide Web Consortium) is the organization that develops and maintains standards for the Web, such as tags and protocols.
  • Ethical concerns of digital content include sustainability, accessibility, and copying code. It is better for technology to be simpler so that it is more accessible and sustainable.
  • Making information digital does not preserve it.
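
The takeaways about tokenizing and parameterizing can be sketched in Python. This is only one possible approach under my own assumptions: the rule that a token is any run of letters or apostrophes is a choice I made for the example, and a different rule would produce different counts.

```python
import re

# A line from a well-known nursery rhyme, used as the entity to analyze.
lyric = "Row, row, row your boat, gently down the stream"

# Tokenize: decide what counts as a unit. Here a token is any run of
# lowercase letters or apostrophes, so punctuation is dropped.
tokens = re.findall(r"[a-z']+", lyric.lower())

# Parameterize: measure features of the text by some metric.
word_count = len(tokens)
distinct_words = len(set(tokens))
longest_word_length = max(len(token) for token in tokens)

print(tokens)
print(word_count, distinct_words, longest_word_length)
```

Even here the data are not neutral: choosing to lowercase the text and throw away punctuation already reflects a point of view about what matters in the line.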



Comments

  1. Excellent and helpful breakdown of terms and concepts here! And how they apply to Robots Reading Vogue-makeup is a great example.

  2. I loved the way you presented the information here, it makes it accessible for the reader in a pleasant way. I found this chapter to be heavy with definitions and explanation which were kind of confusing for me at first. Specifically unstructured data vs structured was very confusing for me because of the way the textbook explained it. But great job presenting the information directly!

  3. The Vogue project sounds so interesting! It's so cool how all issues are able to be accessed online, especially in a magazine that has been historically exclusive and luxury. I found Drucker's idea you displayed about the impact of language's fine nuances on AI to be intriguing/comforting. The fact that AI has problems distinguishing the words "makeup" and "composition," from each other gives me hope that there will still be use for an English degree after graduation (hopefully). I also liked how you broke down the different types of data starting with quantitative vs. qualitative. I agree with your point that simple technology feeds sustainability/accessibility. The more simple something is the more easily it can be accessed and consumed.

