ESP Blog 002: Adding Typeset Sidenotes to a PDF

The Electronic Scholarly Publishing Project: Providing world-wide, free access to classic scientific papers and other scholarly materials, since 1993.

More About: ESP | OUR CONTENT | THIS WEBSITE | WHAT'S NEW | WHAT'S HOT

Adding Typeset Sidenotes to a PDF

posted: 22 APR 2015

It’s not every day that you get to mark something as done that has been on your TO DO list for more than a decade.

For many years I have wanted to produce annotated versions of papers, where the annotations would be present as typeset side notes in the original PDF file.

I have recently finished a working system for doing that and here’s a quick explanation:

First, the technical explanation:

My goal was to make it easy to be able to start with a PDF document that looked more or less like this:

and quickly convert it into something like

Now I can in fact easily do this and here is how it works: I start by producing a version of the original with ruler marks on it, so that it is easy to specify exactly where the side notes should be located, as

Here we see that we want to locate the note on absolute page number 4 (the numbers along the edge), next to lines 16-26. The simplest way to specify that would be something like

4:16-26::Mendel notes that no previous work has ever attempted to do a quantitative analysis of the types of progeny produced in controlled breeding experiments.

So, I have developed software that can (1) produce the ruled versions of the original, necessary for specifying note locations, and (2) take a simple text file containing statements like the one above and produce typeset side notes that are then merged with the original PDF to give the final, typeset version.

From an appropriately written text file containing the text for side notes, I can automatically produce a typeset, annotated version of Mendel’s paper in less than a minute.

The system fully supports proportional fonts (my first efforts using mono-spaced fonts produced workable, but unattractive results). I decided that as long as I was going to figure out how to automate the typesetting of proportional fonts, I might as well go ahead and include support for UTF-8 encoded text, which would give me the ability to produce side notes in most any European language. And, when commenting on genetics material, there may be a desire to use italics or to allow the creation of super- or sub-scripts, so I added support for that as well.

Thus, a text file with comments can be written in any language (so long as the file is UTF-8 encoded, which is now standard on most computer systems) and a few HTML-like tags can be used to indicate where italics should go, etc.

For example, the following comments in the source file

4:15-26::Embedding HTML-style tags in the source text makes it possible to produce typeset side notes in italics or in bold or in bold italics. It is also possible to include superscripts like N2 or subscripts like F1.

4:15-35::The typesetting systems supports generic UTF-8 encoding for nearly all European languages, so that comments may be written in the English language, ή στην ελληνική γλώσσα, nebo v českém jazyce, или на русском языке, ou dans la langue française, vagy a magyar nyelv, lub w języku polskim, eller i det norske språket, or …

will produce this typeset result:

Note: the finished file is fully typeset. The image above looks flaky in the foreign languages, but that’s because I have taken a screenshot of the page and converted it into an image file to include in this document.

Zooming in further in the Acrobat reader shows the actual quality of the typesetting:

To support proofreading of the final version, I have designed the system so that it is trivially easy to produce a document that contains the typeset side notes AND the ruler marks. This allows for a quick verification that the notes have in fact gone where they are supposed to go.

The system tries to set the side notes so that they are vertically centered immediately next to the specified text, but if there is a collision on the placement of the notes, the system will automatically adjust the placement of the notes (shifting them up or down) to make them fit (provided that they can be made to fit, given the length of the notes and the size of the specified font).

Bottom line: the system was designed to do what I consider computers should do, if they are to be maximally useful. That is, it allows the creation of a final product for just the marginal effort necessary to create that specific product. In this case, the actual writing of the side notes is the only inescapable human work to be done. The rest is automated.

The system is pretty flexible, so that the notes may be set in any color and in any size, using either serif or sanserif fonts, or even a font that looks like it is hand drawn. Using a parameter file I can adjust the tracking (spacing between letters) or leading (spacing between the lines), as well as the relative size and relative position of super- and sub-scripted characters. If super- or subscripts are going to be used, the leading needs to be adjusted to make sure that there is room for them. I have found through experimentation, that the readability of some fonts (especially when set small, like 8 point, or smaller) is improved by a bit of increased inter-letter tracking.

Finally, there is no reason this is restricted to Mendel’s paper (as in the examples shown). The system can be used to produce annotated versions of any PDF file. And, the PDF file does not have to be typeset — it can just be scanned pages of some paper original. Nor does the original have to have room in the margins. It is relatively easy with some free tools to modify any PDF to add more space around the edges.

I did not build this system as a user-friendly product that could easily be used by anyone. However, I would be happy to consider working as a collaborator with anyone who might find this a useful tool for some project.

If you are interested in a possible collaboration, please CONTACT ME

ESP Quick Facts

ESP Origins

In the early 1990's, Robert Robbins was a faculty member at Johns Hopkins, where he directed the informatics core of GDB — the human gene-mapping database of the international human genome project. To share papers with colleagues around the world, he set up a small paper-sharing section on his personal web page. This small project evolved into The Electronic Scholarly Publishing Project.

ESP Support

In 1995, Robbins became the VP/IT of the Fred Hutchinson Cancer Research Center in Seattle, WA. Soon after arriving in Seattle, Robbins secured funding, through the ELSI component of the US Human Genome Project, to create the original ESP.ORG web site, with the formal goal of providing free, world-wide access to the literature of classical genetics.

ESP Rationale

Although the methods of molecular biology can seem almost magical to the uninitiated, the original techniques of classical genetics are readily appreciated by one and all: cross individuals that differ in some inherited trait, collect all of the progeny, score their attributes, and propose mechanisms to explain the patterns of inheritance observed.

ESP Goal

In reading the early works of classical genetics, one is drawn, almost inexorably, into ever more complex models, until molecular explanations begin to seem both necessary and natural. At that point, the tools for understanding genome research are at hand. Assisting readers reach this point was the original goal of The Electronic Scholarly Publishing Project.

ESP Usage

Usage of the site grew rapidly and has remained high. Faculty began to use the site for their assigned readings. Other on-line publishers, ranging from The New York Times to Nature referenced ESP materials in their own publications. Nobel laureates (e.g., Joshua Lederberg) regularly used the site and even wrote to suggest changes and improvements.

ESP Content

When the site began, no journals were making their early content available in digital format. As a result, ESP was obliged to digitize classic literature before it could be made available. For many important papers — such as Mendel's original paper or the first genetic map — ESP had to produce entirely new typeset versions of the works, if they were to be available in a high-quality format.

ESP Help

Early support from the DOE component of the Human Genome Project was critically important for getting the ESP project on a firm foundation. Since that funding ended (nearly 20 years ago), the project has been operated as a purely volunteer effort. Anyone wishing to assist in these efforts should send an email to Robbins.

ESP Plans

With the development of methods for adding typeset side notes to PDF files, the ESP project now plans to add annotated versions of some classical papers to its holdings. We also plan to add new reference and pedagogical material. We have already started providing regularly updated, comprehensive bibliographies to the ESP.ORG site.

Papers in Classical Genetics

The ESP began as an effort to share a handful of key papers from the early days of classical genetics. Now the collection has grown to include hundreds of papers, in full-text format.

Digital Books

Along with papers on classical genetics, ESP offers a collection of full-text digital books, including many works by Darwin and even a collection of poetry — Chicago Poems by Carl Sandburg.

Timelines

ESP now offers a large collection of user-selected side-by-side timelines (e.g., all science vs. all other categories, or arts and culture vs. world history), designed to provide a comparative context for appreciating world events.

Biographies

Biographical information about many key scientists (e.g., Walter Sutton).

Selected Bibliographies

Bibliographies on several topics of potential interest to the ESP community are automatically maintained and generated on the ESP site.