I recently created a project using my Raspberry Pi mini computer that explores visualizations of documents run through Optical Character Recognition (OCR) software. Visit the project repository on GitHub and view the project in action.
A lot of OCR focuses on, surprise, visuality. This project, then, is an attempt to sonify dirty data. Digital Humanists are no strangers to cleaning up dirty data. We see it presented before us on screens. The OCR misses words or replaces letters with numbers. We then use cleaning methods like regular expressions and other tools to return the text to its original informational state. We clean away the ‘nonsense’ data that becomes entangled in our ‘pure’ data. But what happens if we sonify that dirty data? We challenge the idea that text is useless because it is ‘dirty’. Sonifying OCRd data complements and even enhances the purely visual aspect of OCR. Take it a step further: what happens if we then convert that sonification back into a text file to visualize it? We begin to mix the digital voice into our work. Sound literally changes our data as it becomes entangled in our association of OCR and visuality. We can begin to approach an understanding of ‘data’ as what Johanna Drucker calls ‘capta’, or something that is constructed and created in a specific context. Data ceases to be purely informational. Therefore, ‘bad’ OCR becomes valuable for our understanding of data as capta, the different ways we can ‘visualize’ data, and the affordances and limitations of OCR software.
This project was inspired by the BrickPi Bookreader. I wanted to create a book digitizer like the BrickPi that included an aspect of sonification and visualization. Below I will outline my process with this project (I go into more detail in my GitHub repository). Aside from hardware, this project uses two Python scripts.
- Download the current Raspberry Pi operating system
- Power up Raspberry Pi
- Update Raspberry Pi
- Enable the camera module and sound via HDMI
- Download Tesseract OCR engine, Alsa player, Espeak text to speech, PocketSphinx speech to text, Sox audio conversion tool
- Install the camera module
- Put Raspberry Pi into cardboard box with cutout for camera so that camera is stable.
- Take photo of book page/document (any text will do)
- Run Tesseract and save to text file
- Run Espeak to convert text file to wav audio file
- Convert wav file to correct frequency using Sox
- Run PocketSphinx to convert wav file to text file
- Connect breadboard to Raspberry Pi via breakout ribbon and cobbler board
- Insert Red and Green LED lights, resistors, and jumper cables
- Run Python script that compares the original OCRd text file with the OCR-to-audio-back-to-text file
- Get Red or Green light
- Run Python script to turn off lights
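The capture–OCR–speech–audio–text loop in the steps above can be sketched as a handful of shell commands. The filenames and exact flags here are my own assumptions for illustration (the project’s actual scripts may differ), but each tool’s basic invocation looks like this:

```shell
# 1. Photograph the page with the Pi camera module (filename is an example)
raspistill -o page.jpg

# 2. OCR the photo with Tesseract; writes page.txt
tesseract page.jpg page

# 3. Read the OCR text aloud with Espeak, saving it as a WAV file
espeak -f page.txt -w speech.wav

# 4. Resample to 16 kHz mono with Sox, the format PocketSphinx expects
sox speech.wav -r 16000 -c 1 speech-16k.wav

# 5. Convert the audio back into text with PocketSphinx
pocketsphinx_continuous -infile speech-16k.wav > roundtrip.txt 2>/dev/null
```

Comparing `page.txt` with `roundtrip.txt` is what the final Python script and the LEDs report on.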
When I began the process of translating my original text, I imagined the text-to-speech software would fail because of bad OCR. I was surprised. I took a photograph of an ethics clearance form for my Master’s project. Tesseract OCRd my paper beautifully, with few errors. The text-to-speech step produced an equally ‘sound’ audio file: a near-perfect reading of both the OCRd text file and the original paper. But when the audio was converted back into a text file, the result was a garbled mess of text.
I think this shows the limits of PocketSphinx more than anything. PocketSphinx takes input audio and produces a textual transcription. I wonder what a different speech-to-text engine would output. On that same note, what would happen if I recorded my own voice reading the text file rather than the computer voice from Espeak and Alsa? I imagine the output would be a bit different. The point is that the quality of every step depends on context: the document being photographed, the quality of the photograph, the quality of the audio files, and the software used at each stage. It appears that all data is capta.
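One way to make that roundtrip degradation concrete is to score how much of the original OCR text survives the trip through speech and back. Here is a minimal sketch in Python using the standard library’s `difflib`; the scoring function and the 0.8 green-light threshold are my own illustrative choices, not taken from the project scripts:

```python
import difflib

def similarity(original: str, roundtrip: str) -> float:
    """Word-level similarity between the OCR text and the speech-to-text
    roundtrip: 1.0 means identical word sequences, 0.0 means no overlap."""
    a = original.lower().split()
    b = roundtrip.lower().split()
    return difflib.SequenceMatcher(None, a, b).ratio()

def led_for(score: float, threshold: float = 0.8) -> str:
    """Decide which LED to light: green if the roundtrip survived well
    enough, red otherwise. The 0.8 threshold is an arbitrary example."""
    return "green" if score >= threshold else "red"
```

A near-perfect roundtrip would score close to 1.0 and light the green LED; my garbled ethics form would land near the red end.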
Try it out yourself! The tutorial I put together is geared towards a complete beginner. I have explained as much as I can about issues with a Linux-based operating system and the hurdles a beginner can face with their Raspberry Pi (all the way down to the optimal way to connect an HDMI… trust me, this will save you a lot of headaches).
That last point about HDMI may seem insignificant, but it is deeply important to this project. It shows how minor the issues we confront can be. I did not face any major issues throughout the process of creating this project. I had prior experience using my Raspberry Pi and knew the particular quirks of the Pi and the Linux operating system. The biggest problem I faced the first day I opened my Pi was connecting it to my TV via HDMI. I found resources online showing how to enable HDMI output. However, it was only through experience that I learned to plug in the HDMI cable first and turn on the TV before powering up the Pi; otherwise, there was no output. Similarly, updating Linux using `sudo apt-get update` and `sudo apt-get upgrade` can be ridiculously difficult. Installing software on Linux requires a lot of rebooting and cache-clearing.
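When upgrades stall, the sequence that usually got me unstuck looks roughly like this; this is a sketch of my own habit rather than an official fix:

```shell
# Clear the local package cache, refresh the package indexes, then retry
sudo apt-get clean
sudo apt-get update
sudo apt-get upgrade

# If packages still fail to install, reboot and repeat the steps above
sudo reboot
```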
And so I approached my project as a ‘good digital citizen’, making my work reproducible. Often tutorials will provide the correct steps and information to reproduce one’s work, but they are not designed around the user’s experience. These tutorials are not necessarily maliciously misleading, or guarding computing from the masses for a privileged small audience of ‘serious’ programmers. Many authors simply may not account for the range of users attempting to recreate each step of their tutorial. My tutorial, on the other hand, goes to great lengths to describe the minor intricacies of installing both software and hardware (down to a description of the amount of pressure needed to seat pins in the breadboard). There were many steps, like physically installing hardware with a little more force than I was comfortable with, where I found nothing to guide me. I made sure to include my own experiences in the tutorial as strategically placed CAUTIONS and NOTES.
In the end, there were several cases where I could only provide a simple word of encouragement for my users. Upgrading the Linux operating system, for instance, does not fail in any uniform or predictable way. All I could do to account for this was provide troubleshooting alternatives, such as rebooting and retrying until the upgrades install without error. A good digital citizen, then, will account for the range of user experiences and potential issues. Ultimately, though, people will run into errors and headaches – they will make mistakes. But that is the fun in digital work. The value comes from the process. I enjoy hitting a bunch of walls because it teaches me the intricacies of a system. It makes me wonder, then: how much should we help users in a tutorial? I think we should provide as much guidance as we can. We should not deliberately withhold information as some pedagogical lesson, especially when people need pertinent help. Because if I’ve learned one thing, it’s that no matter how much guidance you provide, Murphy’s Law holds true: we will always run into problems.