The EWDs in their current PDF-bitmap format are a priceless resource, but they would be even more valuable if they could be searched for significant words and phrases, to answer such questions as
what did Dijkstra write about <topic>?
where did Dijkstra write <pithy epigram>?
Furthermore, as bitmaps the EWDs are inaccessible to visitors who are visually impaired.
Therefore we've started a project to transcribe the EWDs to text files. If you feel like contributing to this effort, we invite you to transcribe as many EWDs as your inclination and available time permit.
OCR transcriptions need proofreading and correction
The EWDs that were originally typewritten or have been typeset for publication are proving amenable to optical character recognition (OCR), but the results of OCR fall short of perfection. Raw OCR files in need of proofreading and correction are noted in the transcription index. If there's an EWD you'd like to nominate for the OCR collection, please let me know and I'll scan it.
Some typewritten EWDs have been transcribed into HTML by Google. The results are not good enough for this archive, but they are sometimes a good starting point for a transcription. If you're thinking of transcribing a typewritten EWD, perform a Google search on its designator (e.g., "EWD 35") and follow the "View as HTML" link in the search result.
As you can see by inspecting the transcriptions completed so far, our aim is not to replace the EWDs but only to provide them with searchable companions. The transcriptions contain only enough HTML markup to provide convenient links to the original PDFs, indentation, lists, and a few other amenities. You can save yourself (and me) some work by starting with a copy of the transcription stationery. If adding the markup is not convenient for you, feel free to send transcriptions in plain text, and we'll add the markup. Some suggestions for handling special characters can be found here.
The one exception to this simplicity is the EWDs' formulas. For purposes of searching, the formulas matter little, since visitors aren't likely to search for them. For visitors who are visually impaired, however, the formulas must eventually be represented in a format that audio software can render. At the moment, the best long-term bet looks like MathML, but support for MathML, both for writing it and for rendering it (visually or aurally), is still somewhat spotty. For now, therefore, we'll tend to concentrate on the EWDs that are less formula-intensive. When we do encounter formulas, we'll transcribe them as ordinary HTML, planning to come back and do them properly once we know what "properly" means.
Since all of the EWDs to be transcribed are already available on the web, you can simply email your transcriptions to the
and I'll install them in the web site. To avoid two people transcribing the same EWD, let me know when you're about to start on a transcription, and I'll add an "in-progress" entry to the index.
Comments and suggestions on any aspect of this project are always welcome.