Dept. DF
Kindle typography

Update 2011 June 1: There has been a huge positive response to this article, so I got access to the Kindle SDK and started a company to implement some of these ideas natively on the Kindle hardware. Drop us a line if you’re interested.

Amazon’s Kindle sets text in justified paragraphs using a slab-serif typeface, PMN Caecilia, which typically looks very nice. Unfortunately, the Kindle does not hyphenate, so it is not unusual to see very loosely set lines. Occasionally, lines fail to reach the margin entirely:

Loosely set text on the Kindle

Figure 1: Text natively set by the Kindle. Notice that both the first and second lines of the first full paragraph fail to reach the rightmost margin, even though the text is supposedly justified. There is no native hyphenation on the Kindle, so the typesetting algorithm doesn’t notice this paragraph could be easily fixed by moving parts of “in-dif-fer-ent” and “courtier-like” up a line. Text from Tolstoy’s War and Peace.

The Kindle MOBI format implements only a limited subset of HTML and provides no access to the underlying rendering algorithm, so currently we cannot save native ebooks on the device. However, the Kindle 3 ships with a WebKit-based browser, so we can use JavaScript and CSS to dynamically reflow and style web content on the device. This article outlines an approach to implementing hyphenation & justification on the Kindle with support for “advanced” features like hanging punctuation and non-rectangular paragraph shapes. We first discuss the Knuth & Plass line breaking algorithm, then how to render the resulting lines using either the word-spacing property or the flexible box model.

Line breaking

The problem of justification is to choose a series of line breaks in a string of text that yields the most attractive paragraph. “Attractive” means, of course, that the words are neither “too far apart” nor “too squashed together” but “just right”. The simplest justification algorithm is a greedy one:

1. Put a word on the current line.
2. If the next word will cause us to exceed the current line length, break the line.
3. If not out of words, goto 1.
4. Expand the spaces on every line except the last to justify the text.

This algorithm is simple and fast, but it gives loosely spaced, ugly paragraphs.
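A minimal sketch of the greedy approach, for illustration only. The `measure` callback is a hypothetical stand-in for real text measurement; here it is faked with a fixed-width font (one unit per character) so the example runs anywhere.

```javascript
// Greedy (first-fit) line breaking.
// `measure` is a hypothetical callback returning the width of a string.
function greedyBreak(words, lineWidth, measure) {
  const spaceWidth = measure(" ");
  const lines = [];
  let current = [];
  let currentWidth = 0;

  for (const word of words) {
    const wordWidth = measure(word);
    // Width of the line if this word were appended to it.
    const needed = current.length === 0
      ? wordWidth
      : currentWidth + spaceWidth + wordWidth;

    if (current.length > 0 && needed > lineWidth) {
      // The word does not fit: close the line and start a new one.
      lines.push(current);
      current = [word];
      currentWidth = wordWidth;
    } else {
      current.push(word);
      currentWidth = needed;
    }
  }
  if (current.length > 0) lines.push(current);
  return lines;
}

// Extra space to add to every gap so that a line reaches the full measure.
function extraSpacePerGap(line, lineWidth, measure) {
  const gaps = line.length - 1;
  if (gaps === 0) return 0;
  return (lineWidth - measure(line.join(" "))) / gaps;
}

// Assumption: a fixed-width font where every character is one unit wide.
const monospace = (s) => s.length;
const greedyLines = greedyBreak(
  "the quick brown fox jumps over the lazy dog".split(" "), 15, monospace);
// greedyLines[1] is ["fox", "jumps", "over"]; it is one unit short of the
// measure, so each of its two gaps must grow by 0.5 units.
```

Each line is decided without looking ahead, which is why a single long word can leave the preceding line very loose.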

The legendary Donald Knuth designed a more advanced line breaking algorithm with Michael Plass in the early 1980s. Their algorithm considers the paragraph as a whole when choosing the line breaks. It does this by considering the “badness” of individual lines and the consequences of breaks on the badness of subsequent lines. Furthermore, it allows but penalizes hyphenation (especially if a hyphenation results in consecutive hyphenated lines).

K&P models a paragraph as a sequence of three kinds of object: boxes, glue, and penalties. Roughly, boxes are objects to be typeset (individual characters, pictures, mathematics, &c.), glue is white space, and penalties are potential line breaks with an associated aesthetic cost. Boxes each have their own immutable width, determined by the content inside. Glue items have their own natural width, which can be adjusted via (potentially infinite) stretchability $y$ or (non-infinite) shrinkability $z$ parameters. Penalty items have width only if they are chosen as break points (a soft hyphen, for instance, has width only when the word is broken there and the hyphen must be drawn). Line breaks occur only at penalties or at glue following a box.
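One way the model might be represented in JavaScript is sketched below. The object shapes, the field names (`stretch` and `shrink` standing in for $y$ and $z$), and the choice of half a space of stretch and a third of a space of shrink are all assumptions made for illustration.

```javascript
// Constructors for the three kinds of item (field names are hypothetical).
const box = (width, value) => ({ type: "box", width, value });
const glue = (width, stretch, shrink) => ({ type: "glue", width, stretch, shrink });
const penalty = (width, cost, flagged) => ({ type: "penalty", width, cost, flagged });

// Convert a list of words into items. `measure` is a hypothetical callback
// returning the width of a string.
function paragraphToItems(words, measure) {
  const space = measure(" ");
  const items = [];
  words.forEach((word, i) => {
    items.push(box(measure(word), word));
    if (i < words.length - 1) {
      // Interword glue; the stretch and shrink amounts are illustrative.
      items.push(glue(space, space / 2, space / 3));
    }
  });
  // End of paragraph: forbid a break before the finishing glue, let that glue
  // stretch without limit so the last line may end short, then force a break.
  items.push(penalty(0, Infinity, false));
  items.push(glue(0, Infinity, 0));
  items.push(penalty(0, -Infinity, true));
  return items;
}

// A break is legal at a penalty with less than infinite cost,
// or at glue that immediately follows a box.
function isLegalBreak(items, i) {
  const item = items[i];
  if (item.type === "penalty") return item.cost < Infinity;
  if (item.type === "glue") return i > 0 && items[i - 1].type === "box";
  return false;
}
```

For example, `paragraphToItems(["Hello", "world"], (s) => s.length)` yields six items, of which only the interword glue and the final forced break are legal breakpoints.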

By the appropriate distribution of glue and penalties, this model handles in a unified way justified, centered, and ragged left/right paragraph alignments, as well as non-rectangular paragraph shapes and hanging punctuation. The Knuth and Plass algorithm itself uses dynamic programming to find the optimal series of breaks and line stretch/shrink ratios.
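To give a feel for the dynamic programming, here is a deliberately simplified sketch. It is not the full Knuth and Plass algorithm: it handles whole words only (no hyphenation or penalties) and tries every pair of breakpoints rather than maintaining a pruned list of active nodes. The demerits formula, the cap on badness, the stretch and shrink amounts, and the `measure` callback are assumptions for illustration.

```javascript
// Simplified "total fit" line breaking over whole words.
function optimalBreaks(words, lineWidth, measure) {
  const n = words.length;
  const space = measure(" ");
  const y = space / 2; // per-space stretchability (illustrative)
  const z = space / 3; // per-space shrinkability (illustrative)
  const widths = words.map(measure);

  // Demerits for setting words[i..j-1] on one line; Infinity if they cannot fit.
  function demerits(i, j) {
    const gaps = j - i - 1;
    let natural = gaps * space;
    for (let k = i; k < j; k++) natural += widths[k];
    const isLast = j === n;

    let r; // adjustment ratio
    if (natural === lineWidth) r = 0;
    else if (natural < lineWidth) r = isLast ? 0 : (lineWidth - natural) / (gaps * y);
    else r = (lineWidth - natural) / (gaps * z);

    if (!(r >= -1)) return Infinity; // would need more shrink than is available
    const badness = Math.min(10000, 100 * Math.abs(r) ** 3);
    return (1 + badness) ** 2;
  }

  // best[j] = least total demerits for setting the first j words.
  const best = new Array(n + 1).fill(Infinity);
  const prev = new Array(n + 1).fill(-1);
  best[0] = 0;
  for (let j = 1; j <= n; j++) {
    for (let i = 0; i < j; i++) {
      if (best[i] === Infinity) continue;
      const total = best[i] + demerits(i, j);
      if (total < best[j]) {
        best[j] = total;
        prev[j] = i;
      }
    }
  }
  if (best[n] === Infinity) return null; // some word is wider than the line

  // Walk the chain of best predecessors back to the start.
  const lines = [];
  for (let j = n; j > 0; j = prev[j]) lines.unshift(words.slice(prev[j], j));
  return lines;
}

const sample = "they were very sad to see".split(" ");
const fitted = optimalBreaks(sample, 20, (s) => s.length);
// fitted is [["they", "were", "very", "sad", "to"], ["see"]]
```

With a line width of 20 units, a greedy breaker would stop the first line after “sad” and stretch it; the sketch instead shrinks the spaces slightly to fit “to” on the first line, because that costs fewer demerits over the paragraph as a whole.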

Rendering

Once the K&P algorithm has yielded a set of breakpoints for a sequence of boxes, glue, and penalties, everything must be rendered onto the page. In the simplest case, where all of the glue on a line has uniform stretchability and shrinkability, one can simply use the line adjustment ratio r and set the CSS word-spacing property equal to r*y (if the line needs to be stretched) or r*z (shrunk).
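As a rough sketch of that simplest case, the function below turns an adjustment ratio into a word-spacing value and wraps the line in a span. The function names and the `escapeHtml` helper are hypothetical, and the sketch assumes y and z are expressed in pixels.

```javascript
// Extra space per gap: r * y when stretching (r >= 0), r * z when shrinking.
// When shrinking, r is negative, so the resulting word-spacing is negative too.
function wordSpacingFor(r, y, z) {
  return r >= 0 ? r * y : r * z;
}

// Minimal escaping so the line text can be placed inside markup.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrap one line of text in a span carrying its own word-spacing.
function renderLineWithWordSpacing(text, r, y, z) {
  const spacing = wordSpacingFor(r, y, z).toFixed(2);
  return `<span style="word-spacing: ${spacing}px;">${escapeHtml(text)}</span>`;
}

const stretched = renderLineWithWordSpacing("in olden times", 0.5, 4, 2);
// stretched is '<span style="word-spacing: 2.00px;">in olden times</span>'
```

Forcing each span onto its own line is left out of the sketch.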

Thanks to Bram Stein for pointing out to me that WebKit ignores subpixel word-spacing values. His solution is to allocate word spacing in integer increments; rather than setting word-spacing: 0.33px; on a line having three spaces, just set word-spacing: 1px; on the first space. See his implementation here.
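The integer allocation idea can be sketched as follows. This is our own illustration of the arithmetic, not Bram Stein's code, and the function name is hypothetical.

```javascript
// Spread a total amount of extra space (in px) over the gaps of a line using
// whole pixels only; any remainder goes to the leftmost gaps, one pixel each.
function allocateIntegerSpacing(totalExtraPx, gapCount) {
  if (gapCount <= 0) return [];
  const total = Math.round(totalExtraPx);
  const base = Math.floor(total / gapCount);
  const remainder = total - base * gapCount; // always in [0, gapCount)
  const result = [];
  for (let i = 0; i < gapCount; i++) {
    result.push(i < remainder ? base + 1 : base);
  }
  return result;
}

const perGap = allocateIntegerSpacing(1, 3);
// perGap is [1, 0, 0]: one pixel on the first space, none on the others
```

The per-gap values always sum to the rounded total, so the line still reaches the margin even though individual gaps differ by a pixel.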

If you’re wondering why we have to set word-spacing manually, it’s because the browser won’t justify a single line of text (a single line is, after all, the last line; we’ll have to wait for text-align-last). One could render to a canvas element or other image, but at that point you might as well actually use TeX to render PDF.

This is a nice solution, markup-wise, because all we need to do is wrap each line of text with a span having a custom word-spacing style. Unfortunately, WebKit seems to ignore subpixel word-spacing and lines sometimes come out ragged:

WebKit ignoring subpixel wordspacing on Kindle

Figure 2: Knuth & Plass justified text, rendered by setting the CSS word-spacing property for each line. Note that the right margin is very slightly ragged; this is because WebKit ignores subpixel word-spacing values. Text from The Frog Prince, by the Brothers Grimm.

The second solution is to use, in a totally inappropriate fashion, CSS3’s flexible box layout. In particular, the spec notes that

unused space can be […] distributed among the children by assignment of “flex” to the children that should expand.

That is, the quantity of flex is essentially a combined glue stretchability/shrinkability. Thus, if we render each line as a <div> having style display: box; box-pack: justify; and give spaces the appropriate flex, the browser will take care of justification for us:
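A sketch of how such a line might be generated is shown below. It assumes the 2009 draft property names with the -webkit- prefix that WebKit builds of that era expected, a hypothetical item shape (`{type, value}` for boxes and `{type, width, flex}` for glue), and box values that are already HTML-safe. It has not been tested on a Kindle.

```javascript
const PREFIX = "-webkit-"; // assumption: vendor prefix needed by the target browser

// Render one line as a flexible box whose glue children absorb the spare space.
function renderLineWithFlexbox(items) {
  const children = items.map((item) => {
    if (item.type === "glue") {
      // Glue: natural width plus a flex value proportional to its stretchability.
      return `<span style="display: block; width: ${item.width}px; ` +
             `${PREFIX}box-flex: ${item.flex};"></span>`;
    }
    // Box: fixed content, no flex.
    return `<span style="display: block;">${item.value}</span>`;
  });
  return `<div style="display: ${PREFIX}box; ${PREFIX}box-pack: justify;">` +
         children.join("") + `</div>`;
}

const flexLine = renderLineWithFlexbox([
  { type: "box", value: "Hello" },
  { type: "glue", width: 4, flex: 2 },
  { type: "box", value: "world" },
]);
```

Because the browser distributes the leftover space itself, no word-spacing values need to be computed and the subpixel rounding problem does not arise.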

K&P justified text on the Kindle (left); WebKit justified text on the Kindle (right)

Figure 3: Knuth & Plass justified text rendered using CSS flexbox (left), and standard WebKit CSS text-align: justify (right). Notice that the K&P text is set a bit tighter, has fewer words on the final line, and includes hanging punctuation in the right margin.

Extra credit

The box-glue-penalty model makes it relatively easy to implement hanging punctuation at the right edge of the text. Knuth says (Digital Typography, p. 87):

It is easy to get inserted hyphens into the margin: We simply let the width of the corresponding penalty item be zero. And it is almost as easy to do the same for periods and other symbols, by putting every such character in a box of width zero and adding the actual symbol width to the glue that follows. If no break occurs at this glue, the accumulated width is the same as before; and if a break does occur, the line will be justified as if the period or other symbol were not present.

and this strategy translates easily; all we need to do is give punctuation width: 0 and set the appropriate min-width on the immediately following glue.
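On the item list, the transformation Knuth describes might look like the sketch below. The item shape and the set of characters allowed to hang are assumptions for illustration.

```javascript
// Characters allowed to hang in the margin (an illustrative choice).
const HANGING = new Set([".", ",", ";", ":", "!", "?"]);

// Give each hanging character zero width and move its real width
// into the glue that follows it. Returns a new list; the input is untouched.
function hangPunctuation(items) {
  const out = items.map((item) => ({ ...item }));
  for (let i = 0; i + 1 < out.length; i++) {
    const current = out[i];
    const next = out[i + 1];
    if (current.type === "box" && HANGING.has(current.value) && next.type === "glue") {
      next.width += current.width;
      current.width = 0;
    }
  }
  return out;
}

const hung = hangPunctuation([
  { type: "box", width: 30, value: "end" },
  { type: "box", width: 4, value: "." },
  { type: "glue", width: 5, stretch: 2.5, shrink: 1.7 },
  { type: "box", width: 40, value: "Next" },
]);
// The period now has width 0 and the glue has width 9, so the total
// width is unchanged when no break occurs at that glue.
```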

Hyphenating words helps justification algorithms greatly, because it allows fine-grained adjustments to the text colour. Words should be hyphenated only at natural breaks in their pronunciation (“pro-nun-ci-a-tion”, not “pronu-nciat-ion”), and determining these breaks automatically is a difficult problem. See Hyphenator.js or Hypher for JavaScript implementations of the algorithm described in Frank M. Liang’s thesis (which is the algorithm used in TeX).
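To illustrate the idea behind Liang's patterns, here is a toy sketch. It does not use Hyphenator.js or Hypher. The nine patterns are a tiny hand-picked list that happens to cover the word “hyphenation”, whereas real pattern sets contain thousands of entries, and the minimum fragment lengths of two and three letters are an assumption.

```javascript
// In a pattern, a digit between letters scores that position:
// odd values permit a break there, even values forbid one.
const TOY_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"];

// "hen5at" -> { letters: "henat", scores: [0, 0, 0, 5, 0, 0] },
// where scores[k] is the score of the position just before letters[k].
function parsePattern(pattern) {
  let letters = "";
  const scores = [0];
  for (const ch of pattern) {
    if (ch >= "0" && ch <= "9") {
      scores[scores.length - 1] = Number(ch);
    } else {
      letters += ch;
      scores.push(0);
    }
  }
  return { letters, scores };
}

function hyphenateWord(word, patterns = TOY_PATTERNS, minLeft = 2, minRight = 3) {
  // Dots mark the word boundaries, so patterns can anchor to either end.
  const padded = "." + word.toLowerCase() + ".";
  const gaps = new Array(padded.length + 1).fill(0);

  for (const { letters, scores } of patterns.map(parsePattern)) {
    let start = padded.indexOf(letters);
    while (start !== -1) {
      // Where patterns overlap, the highest score at each position wins.
      scores.forEach((score, k) => {
        gaps[start + k] = Math.max(gaps[start + k], score);
      });
      start = padded.indexOf(letters, start + 1);
    }
  }

  const parts = [];
  let last = 0;
  for (let i = minLeft; i <= word.length - minRight; i++) {
    // The position before word[i] is the position before padded[i + 1].
    if (gaps[i + 1] % 2 === 1) {
      parts.push(word.slice(last, i));
      last = i;
    }
  }
  parts.push(word.slice(last));
  return parts;
}

const pieces = hyphenateWord("hyphenation");
// pieces is ["hy", "phen", "ation"]
```

Each position between fragments would become a penalty item in the box-glue-penalty model, with the hyphen's width as the penalty width.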

Demo

If you have a Kindle, you can point the experimental browser here to try out the K&P algorithm. If you don’t have a Kindle, a WebKit browser (Google Chrome or Safari) will work in a pinch.