Adding Typeset Sidenotes to a PDF
posted: 22 APR 2015
It’s not every day that you get to mark something as done that has been on your TO DO list for more than a decade.
For many years I have wanted to produce annotated versions of papers, where the annotations would be present as typeset side notes in the original PDF file.
I have recently finished a working system for doing that and here’s a quick explanation:
First, the technical explanation:
My goal was to make it easy to be able to start with a PDF document that looked more or less like this:
and quickly convert it into something like
Now I can in fact easily do this and here is how it works: I start by producing a version of the original with ruler marks on it, so that it is easy to specify exactly where the side notes should be located, as
Here we see that we want to locate the note on absolute page number 4 (the numbers along the edge), next to lines 16-26. The simplest way to specify that would be something like
So, I have developed software that can (1) produce the ruled versions of the original, necessary for specifying note locations, and (2) take a simple text file containing statements like the one above and produce typeset side notes that are then merged with the original PDF to give the final, typeset version.
From an appropriately written text file containing the text for side notes, I can automatically produce a typeset, annotated version of Mendel’s paper in less than a minute.
The system fully supports proportional fonts (my first efforts using mono-spaced fonts produced workable, but unattractive results). I decided that as long as I was going to figure out how to automate the typesetting of proportional fonts, I might as well go ahead and include support for UTF-8 encoded text, which would give me the ability to produce side notes in most any European language. And, when commenting on genetics material, there may be a desire to use italics or to allow the creation of super- or sub-scripts, so I added support for that as well.
Thus, a text file with comments can be written in any language (so long as the file is UTF-8 encoded, which is now standard on most computer systems) and a few HTML-like tags can be used to indicate where italics should go, etc.
For example, the following comments in the source file
will produce this typeset result:
Note: the finished file is fully typeset. The image above looks flaky in the foreign languages, but that’s because I have taken a screenshot of the page and converted it into an image file to include in this document.
Zooming in further in the Acrobat reader shows the actual quality of the typesetting:
To support proofreading of the final version, I have designed the system so that it is trivially easy to produce a document that contains the typeset side notes AND the ruler marks. This allows for a quick verification that the notes have in fact gone where they are supposed to go.
The system tries to set the side notes so that they are vertically centered immediately next to the specified text, but if there is a collision on the placement of the notes, the system will automatically adjust the placement of the notes (shifting them up or down) to make them fit (provided that they can be made to fit, given the length of the notes and the size of the specified font).
Bottom line: the system was designed to do what I consider computers should do, if they are to be maximally useful. That is, it allows the creation of a final product for just the marginal effort necessary to create that specific product. In this case, the actual writing of the side notes is the only inescapable human work to be done. The rest is automated.
The system is pretty flexible, so that the notes may be set in any color and in any size, using either serif or sanserif fonts, or even a font that looks like it is hand drawn. Using a parameter file I can adjust the tracking (spacing between letters) or leading (spacing between the lines), as well as the relative size and relative position of super- and sub-scripted characters. If super- or subscripts are going to be used, the leading needs to be adjusted to make sure that there is room for them. I have found through experimentation, that the readability of some fonts (especially when set small, like 8 point, or smaller) is improved by a bit of increased inter-letter tracking.
Finally, there is no reason this is restricted to Mendel’s paper (as in the examples shown). The system can be used to produce annotated versions of any PDF file. And, the PDF file does not have to be typeset — it can just be scanned pages of some paper original. Nor does the original have to have room in the margins. It is relatively easy with some free tools to modify any PDF to add more space around the edges.
I did not build this system as a user-friendly product that could easily be used by anyone. However, I would be happy to consider working as a collaborator with anyone who might find this a useful tool for some project.
If you are interested in a possible collaboration, please CONTACT ME
In the early 1990's, Robert Robbins was a faculty member at Johns Hopkins, where he directed the informatics core of GDB — the human gene-mapping database of the international human genome project. To share papers with colleagues around the world, he set up a small paper-sharing section on his personal web page. This small project evolved into The Electronic Scholarly Publishing Project.
In 1995, Robbins became the VP/IT of the Fred Hutchinson Cancer Research Center in Seattle, WA. Soon after arriving in Seattle, Robbins secured funding, through the ELSI component of the US Human Genome Project, to create the original ESP.ORG web site, with the formal goal of providing free, world-wide access to the literature of classical genetics.
Although the methods of molecular biology can seem almost magical to the uninitiated, the original techniques of classical genetics are readily appreciated by one and all: cross individuals that differ in some inherited trait, collect all of the progeny, score their attributes, and propose mechanisms to explain the patterns of inheritance observed.
In reading the early works of classical genetics, one is drawn, almost inexorably, into ever more complex models, until molecular explanations begin to seem both necessary and natural. At that point, the tools for understanding genome research are at hand. Assisting readers reach this point was the original goal of The Electronic Scholarly Publishing Project.
Usage of the site grew rapidly and has remained high. Faculty began to use the site for their assigned readings. Other on-line publishers, ranging from The New York Times to Nature referenced ESP materials in their own publications. Nobel laureates (e.g., Joshua Lederberg) regularly used the site and even wrote to suggest changes and improvements.
When the site began, no journals were making their early content available in digital format. As a result, ESP was obliged to digitize classic literature before it could be made available. For many important papers — such as Mendel's original paper or the first genetic map — ESP had to produce entirely new typeset versions of the works, if they were to be available in a high-quality format.
Early support from the DOE component of the Human Genome Project was critically important for getting the ESP project on a firm foundation. Since that funding ended (nearly 20 years ago), the project has been operated as a purely volunteer effort. Anyone wishing to assist in these efforts should send an email to Robbins.
With the development of methods for adding typeset side notes to PDF files, the ESP project now plans to add annotated versions of some classical papers to its holdings. We also plan to add new reference and pedagogical material. We have already started providing regularly updated, comprehensive bibliographies to the ESP.ORG site.
ESP Picks from Around the Web (updated 07 JUL 2018 )
Science Policy & Funding
Big Data & Informatics