HTML is well-positioned for delivering information with contextual formatting to a wide variety of rendering agents ("browsers"). In most cases HTML is more than sufficient for this task but indisputably there arise some occasions that demand more control over formatting and the requirement for a document feature set that exceeds the "one-page fits all" model of HTML. In such cases LaTeX and the Portable Document Format (PDF) offer a perfect solution!
Why not stay with TeX/LaTeX and the .dvi (device independent) format? Most people in the academic community are perfectly comfortable with .dvi files, knowing already about tools such as dvips(1), kdvi, xdvi, etc. that allow printing and viewing of .dvi files. In the Microsoft Windows world, however, technical expertise is often sorely lacking (let's face it: Microsoft sells to people who would mistake MS Word for a DTP program) so .dvi files are not an option when addressing the general public.
And this brings us to the purpose of this document: how to create high-quality (e.g. fast-displaying) PDF documents from your LaTeX sources. These documents will offer the following benefits to those who read your work; all of that capability is available completely free of charge. I've done all the work for you with this comprehensive how-to document:
The combination of tools available allows you to generate documents whose visual quality and feature-completeness can rival anything you can get out of commercial tools on other platforms. What's more, few of the documents created with Adobe Distiller are as feature-rich as what you get with these tools. And with Linux it's all free. No surprise at all.
Let's get started!
Linux is not required, but given the context of this document I make the assumption here that you have an operating system that is basically Unix-like. Solaris, BSD, Linux… they should all work the same for this purpose:
If so, then you have the excellent THUMBPDF program installed and have all the pieces that you really need! Of course, you don't need the THUMBPDF program: without it your PDF files will have only blank thumbnails for the pages. Few people seem to bother with those anyway, probably because they don't know how to create them…
Note: If the version of THUMBPDF is later than 1.11 (or at least 2.10) then you need to use the --v2 parameter on the script provided in Figure 8: in earlier versions THUMBPDF created (and relied upon) files named "thumb???.png" but this pattern has now changed to "thb?.png".
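The two filename patterns can be illustrated with a short shell sketch. The helper name `thumb_name` is hypothetical (it is not part of THUMBPDF); the two `printf` formats are the ones used by the script in Figure 8.

```shell
#!/bin/sh
# thumb_name PAGE V2: print the thumbnail filename expected for page PAGE.
# V2=1 selects the newer "thb?.png" pattern; any other value selects the
# older "thumb???.png" pattern. (Hypothetical helper, for illustration only.)
thumb_name() {
    if [ "$2" = "1" ]; then
        printf 'thb%d.png\n' "$1"
    else
        printf 'thumb%03d.png\n' "$1"
    fi
}

thumb_name 7 0   # older pattern: thumb007.png
thumb_name 7 1   # newer pattern: thb7.png
```

Page numbers are assumed to be plain decimal numbers without leading zeros, which is what Ghostscript's %d output filenames provide.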
To see if you have all the right pieces in place, create yourself a working directory (perhaps called "tex") and then cut-and-paste the following text into a file in that directory named "test.tex":
Figure 1: test.tex, A Small Test Document

    \documentclass{article}
    \usepackage{thumbpdf}
    \usepackage[pdftex,
            colorlinks=true,
            urlcolor=rltblue,       % \href{...}{...} external (URL)
            filecolor=rltgreen,     % \href{...} local file
            linkcolor=rltred,       % \ref{...} and \pageref{...}
            pdftitle={Untitled},
            pdfauthor={Your Name},
            pdfsubject={Just a test},
            pdfkeywords={test, testing, testable stuff},
            pdfproducer={pdfLaTeX},
            pdfadjustspacing=1,
            pagebackref,
            pdfpagemode=None,
            bookmarksopen=true]{hyperref}
    \usepackage{color}
    \definecolor{rltred}{rgb}{0.75,0,0}
    \definecolor{rltgreen}{rgb}{0,0.5,0}
    \definecolor{rltblue}{rgb}{0,0,0.75}
    \title{Untitled}
    \author{Your Name}
    \date{\today}
    \begin{document}\label{start}
    \maketitle
    \section{Section One}
    A test document from \href{http://www.Ringlord.com/}{Ringlord Technologies}!
    \subsection{Subsection One Dot One}
    Hello, this is section 1.1
    \subsection{Subsection One Dot Two}
    And here we have section 1.2; here's a link to
    \href{Other.pdf}{another Document (Other.pdf)}, which probably does not
    exist (but that's OK).
    \section{Section Two}
    You could also click on the following number to jump to the first page,
    namely page \pageref{start}\ldots
    \label{end}\end{document}
Figure 2: Acrobat Reader 4.0 showing our test document. At this resolution it may not be obvious, but the outline in the left column reflects the section and subsections of the text on the page.
Now run

    pdflatex test

twice in a row to ensure that the \pageref references are resolved. If you are not familiar with the way TeX and LaTeX work, you should know at least this:
TeX and LaTeX work as a compiler does: they parse the document specification and build output from that. As LaTeX encounters \label commands (markers whose section and/or page number you can reference) it writes these to a file for future reference. Any forward references (say a reference on page 1 to an object somewhere else, maybe on page 50) are not yet known by the time that the reference on page 1 is processed.
By running the program a second time it will be able to fill in the reference on page 1 with the label that it finally encountered on page 50 during the previous run.
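The "run it again" rule can be automated. The following is a minimal sketch, assuming that the log file contains LaTeX's usual "Rerun to get cross-references right" warning whenever another pass is needed; the function names `needs_rerun` and `build_pdf` and the limit of four passes are my own choices for illustration.

```shell
#!/bin/sh
# needs_rerun LOGFILE: succeed (exit status 0) if the LaTeX log asks for
# another pass, e.g. "Label(s) may have changed. Rerun to get
# cross-references right."
needs_rerun() {
    grep -q 'Rerun to get' "$1"
}

# build_pdf BASENAME: run pdflatex once, then repeat while the log requests
# it, but never more than four passes in total (an arbitrary safety limit).
build_pdf() {
    pass=1
    pdflatex -interaction=nonstopmode "$1" >/dev/null || return 1
    while needs_rerun "$1.log" && [ "$pass" -lt 4 ]; do
        pdflatex -interaction=nonstopmode "$1" >/dev/null || return 1
        pass=$((pass + 1))
    done
}

# Usage (requires pdflatex): build_pdf test
```

For a document as small as test.tex, simply running pdflatex twice by hand is just as good; a loop like this only pays off once documents grow.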
But back to our test.tex file: you should get several dozen lines of output from the program, most or all of which you can ignore, and then get the shell prompt back. If you get errors instead, or the prompt does not return, check again that you copied and pasted the entire test document from Figure 1 correctly.
If everything went as planned then you should now have a simple PDF file named "test.pdf" (yes, test.tex --> test.pdf, that is the expected pattern). You could now blindly trust that your PDF document is ready for distribution. Instead, it would be good to check it. Further below I'll also show you how to add thumbnail images to the PDF for some extra karma.
Try the following commands in succession. As soon as one of them works you should have a way of verifying your documents. I'll leave it up to you to research the operation of these programs:
If you have Adobe Acrobat Reader (acroread) installed then you have access to a set of features of the PDF document that the other programs may not make fully accessible to you. Figure 2 shows you what this document looks like on my 1600x1200 sized display screen. Note the bookmark outline in the left column. And if you look closely you'll notice that there is some blue (web link) and green (external document link) text, too. There is a red page number, too, that links to a page in this document, but it's too small to read at the resolution of that image. You should be able to see that on your own copy, though!
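Trying viewers in succession can also be left to the shell. This is only a sketch: the helper name `first_available` is hypothetical, and apart from acroread the viewer names in the usage line (xpdf, gv) are my assumption about what might be installed on your system.

```shell
#!/bin/sh
# first_available CMD...: print the first command name that is found on
# the PATH; fail (exit status 1) if none of them is installed.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done
    return 1
}

# Usage: viewer=$(first_available acroread xpdf gv) && "$viewer" test.pdf
```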
At this point you've managed to produce a fairly nice, if simple PDF document. In the next sections I'll show you how to add some cool features.
This is a good time to examine the various parts of that "test.tex" (Figure 1) file to get an idea of what it is that got us this far. This will also let you customize the output. Here is that relevant section:
Figure 3: Hyperlinking for PDF

    \usepackage[pdftex,
            colorlinks=true,
            urlcolor=rltblue,       % \href{...}{...} external (URL)
            filecolor=rltgreen,     % \href{...} local file
            linkcolor=rltred,       % \ref{...} and \pageref{...}
            pdftitle={Untitled},
            pdfauthor={Your Name},
            pdfsubject={Just a test},
            pdfkeywords={test testing testable},
            pdfadjustspacing=1,
            pagebackref,
            pdfpagemode=None,
            bookmarksopen=true]{hyperref}
The hyperref macro package provides you with the ability to specify hyperlinks in your document. The package is configured with a series of comma-separated options in the (optional) [ ]-list:
Now, you are probably wondering what's with these "rltred", "rltgreen", and "rltblue" colours. These are defined after (and could be defined before) the hyperref package, but must be defined after the color package is included. Of note here is the fact that I'm using a somewhat darker green than usual because green on a white computer monitor background gets washed-out too easily.
Figure 4: Defining Colours by Name

    \usepackage{color}
    \definecolor{rltred}{rgb}{0.75,0,0}
    \definecolor{rltgreen}{rgb}{0,0.5,0}
    \definecolor{rltblue}{rgb}{0,0,0.75}
If you tried to compile "test.tex" as a normal LaTeX document, you may have noticed (with some dismay) that it no longer compiled. That's because plain LaTeX does not understand the various PDF-specific constructs.
There's an easy way to fix that, however. It involves a bit of TeX code:
Figure 5: Flexible generation of both PDF and DVI files

    \newif\ifpdf
    \ifx\pdfoutput\undefined
      \pdffalse        % we are not running PDFLaTeX
    \else
      \pdfoutput=1     % we are running PDFLaTeX
      \pdftrue
    \fi
    %----------------------------------------------------------------------
    % The following is the construct that interests us in the end:
    \ifpdf
      % Put PDF-specific stuff here
    \else
      % Put LaTeX-specific stuff here
    \fi
If that doesn't make a lot of sense to you, don't worry about it. The last five lines are the interesting stuff. It's basically an if-then-else statement that allows us to do some stuff when compiling the document for PDF, and something else for LaTeX. I'll have another example for you later (see Figure 8) that includes all the tricks. For now, just keep reading:
The THUMBPDF program takes as its input a PDF file. From that it can generate a series of PNG images (what, you're still using GIF?! Shame on you!) which it then merges into the PDF file. This works well enough, but THUMBPDF doesn't create terribly nice looking snapshots of the pages: there is no antialiasing. Does that mean THUMBPDF is useless? Not at all! The author of THUMBPDF is a smart person and has allowed us to skip the generation of PNG images, thereby accepting existing images, or images that we've generated with another program! Haha, flexible software. Gotta love it!
So, what program should we use to generate high-quality scaled representations of our PDF document's pages?
The combination that has worked very well for me is Ghostscript to take snapshots of the pages as JPEG images, and The GIMP to scale and post-process these images into PNG files that THUMBPDF can read.
What, The GIMP is slow? It's true that The GIMP is a large, large program and takes about five seconds (on my system) to load. But there's a way to get it to process all of our pages without reloading the program so we pay this slow-loading penalty only once per document. That's not bad. Once loaded The GIMP isn't all that slow, so let's go with that solution. If you're adventurous you can use the PPM utilities, ImageMagick, or whatever tool you prefer.
Using The GIMP gives me a chance to introduce non-interactive use of that program, which took me a while to figure out and get right. Why not teach you that, too? :-)
The following invocation of Ghostscript (gs) will take an inputfile.pdf and generate a 36 DPI snapshot of each of the file's pages, writing these to files named 1.jpeg, 2.jpeg, etc.:
Figure 6: Getting Ghostscript to Make Snapshots

    gs -q -NOPAUSE -sDEVICE=jpeg -sOutputFile=%d.jpeg -r36x36 \
       "inputfile.pdf" &>/dev/null <<EOF
    EOF
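To get a feel for what a given resolution produces, the pixel size of a snapshot can be estimated from the page size. This is a rough sketch with a hypothetical helper name, `snapshot_pixels`; it assumes the page size is given in PostScript points (72 per inch) and truncates fractions, so Ghostscript's own rounding may differ by a pixel for page sizes that do not divide evenly.

```shell
#!/bin/sh
# snapshot_pixels DPI WIDTH_PT HEIGHT_PT: print the approximate pixel size
# of a page rendered at DPI. Page dimensions are in points (1/72 inch).
snapshot_pixels() {
    printf '%dx%d\n' $(( $2 * $1 / 72 )) $(( $3 * $1 / 72 ))
}

snapshot_pixels 36 612 792   # US Letter at 36 DPI: 306x396
snapshot_pixels 72 612 792   # US Letter at 72 DPI: 612x792
```

Either size leaves plenty of pixels to scale down to a thumbnail of roughly 82x106.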
But wait, I've already incorporated this in a nifty tool (see Figure 8). We'll now proceed to using The GIMP to perform automated processing of multiple image files from the command line. Yes, this little recipe is useful for just about any command-line usage of The GIMP:
Store the following file as "scale-file.scm" in your GIMP scripts directory. If you have GIMP 1.2 installed then there should be a directory .gimp-1.2 in your home directory. Within that is a directory named scripts, so you'd store the following file as "~/.gimp-1.2/scripts/scale-file.scm":
Figure 7: Scaling an Image with The GIMP

    ; Retrieves a file, scales and post-processes it, then writes
    ; it to another file. Store this in your ~/.gimp-1.2/scripts
    ; directory; and this is how you call it:
    ; (scale-file "page-1.jpeg" "thumb001.png" 82 106)
    (define (scale-file source-name dest-name width height)
      (set! img (car (gimp-file-load 1 source-name source-name)))
      (gimp-image-undo-disable img)
      ; (gimp-display-new img)
      (gimp-image-scale img width height)
      (plug-in-unsharp-mask 1 0 (car (gimp-image-active-drawable img)) 3 0.5 50)
      (set! drawable (car (gimp-image-flatten img)))
      (gimp-file-save 1 img drawable dest-name dest-name)
      (gimp-image-undo-enable img)
    )
As the comment in this file explains it will accept an input file and an output file, as well as the desired dimensions for the image to be written to the output file. The scaling will perform an "unsharp mask" operation to undo some of the blurring effect caused by the down-scale operation. This is especially valuable for small thumbnails: they will have a slightly heightened contrast, which improves their appearance.
Note, the "1" as the first argument in the commands refers to the fact that this script is to run in non-interactive mode. I've commented out the (gimp-display-new img) command; if you ran this in interactive mode it would not just prompt you for various parameters but would also produce an image window to show you the result. But we definitely do not want to bother with visible stuff and interactive operations in this project. Onwards we go:
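Calling that function from the command line could look like the sketch below. The helper names `batch_expr` and `gimp_batch` are hypothetical; the GIMP options are the ones used by the script in Figure 8, and I assume GIMP 1.2 with scale-file.scm already installed in its scripts directory.

```shell
#!/bin/sh
# batch_expr SRC DEST WIDTH HEIGHT: print one call to the scale-file
# function from Figure 7 as a Scheme expression.
batch_expr() {
    printf '(scale-file "%s" "%s" %d %d)\n' "$1" "$2" "$3" "$4"
}

# gimp_batch EXPR...: hand the expressions to The GIMP without any user
# interface, then make it quit.
gimp_batch() {
    gimp --no-interface --no-data --batch "$@" '(gimp-quit 0)'
}

batch_expr 1.jpeg thumb001.png 82 106
# Usage (requires The GIMP):
#   gimp_batch "$(batch_expr 1.jpeg thumb001.png 82 106)"
```

Passing several expressions to a single `gimp_batch` call is what keeps the slow start-up to once per document.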
The program in Figure 8 puts the previous two together into a unifying whole. The effect is that we give this program a PDF file and it spits out a bunch of scaled and nicely post-processed PNG images that are ready to use by the THUMBPDF program for insertion into our PDF document.
There will be a slight delay while Ghostscript does its work. Then there is a slightly longer delay (probably five seconds or more) while The GIMP is loaded. After that it processes between 10 and 20 page images per second (depends on your processor's speed, of course). So you see, even with a 100 page document you're not likely to feel yourself growing old while waiting for this to finish :-)
Ah, before I forget: There are some shell script variables at the very beginning of the program. You may want to look over them (and the comments) to verify that they are not blatantly wrong or in danger of conflicting with something of your own making.
And finally: as of July 24, 2001, the Totally Cool Unified PDF-Thumbnail Tool creates on its own all the Lispish code described above, so you don't need to store any of it in The GIMP's scripts directory. Just make sure that the script knows from where The GIMP loads its scripts…
N.B. This script relies on ThumbPDF whose filename pattern has undergone changes from release to release. Please search for '@@@' in this script and apply what changes may be necessary to correct things, if necessary:
Figure 8: The Totally Cool Unified PDF-Thumbnail Tool

    #!/bin/sh
    # GIMPSCRIPTS: Your GIMP scripts directory. the file named by
    # SCALEALL will be CREATED there and then DELETED again:
    GIMPSCRIPTS=${HOME}/.gimp-1.2/scripts
    SCALEALL=coolthumbs-scale-files$$.scm
    SCALEFILE=coolthumbs-scale-file$$.scm
    SIOD_FUN=coolthumbs-file-scale$$
    # W/H (WIDTH/HEIGHT) of the final thumbnail. The values
    # of 82/106 are pretty much dead-on for US Letter paper.
    # You will have to vary them for A4, US Legal, or an other
    # paper size.
    THUMBNAIL_W=82
    THUMBNAIL_H=106
    SNAPSHOT_DPI=72

    THUMBPDF_V2=0

    while [ 1 ]; do
      if [ "$1" = "--v2" ]; then
        THUMBPDF_V2=1
        shift
        continue
      fi
      if [ "$1" = "-L" -o "$1" = "--landscape" ]; then
        TMP=$THUMBNAIL_W
        THUMBNAIL_W=$THUMBNAIL_H
        THUMBNAIL_H=$TMP
        shift
        continue
      fi
      break
    done

    if [ ! -e "$1" ]; then
      echo "The Totally Cool Unified PDF-Thumbnail Tool"
      echo "Copyright © 2001,2002,2003 Ringlord Technologies"
      echo "All rights reserved"
      echo ""
      echo "Creates page images from an existing PDF/Postscript file, then creates high-"
      echo "quality scaled thumbnails from these for use with the thumbpdf(1) tool for"
      echo "inclusion in a .tex file and re-construction of the PDF file with pdflatex."
      echo ""
      echo "Requires: gs (Ghostscript) to snapshot document pages"
      echo "Requires: gimp 1.2 to create high-quality thumbnails from the images"
      echo ""
      echo "Usage: $0 [ options ] pdffile"
      echo " options: --v2 Recognize THUMBPDF v2 file pattern 'thb?.png'"
      echo " -L, --landscape Rotate the thumbnails into landscape mode"
      echo ""
      echo " Creates and destroys working directory ./.mkthumbpdf"
      echo " Creates and destroys ${GIMPSCRIPTS}/${SCALEALL}"
      echo ' Creates ./thumb???.png (one per page) ... OR'
      echo ' Creates ./thb?.png (one per page) with --v2 parameter'
      echo ""
      echo 'Makefile:'
      echo ' BASENAME=myFile'
      echo ' all:'
      echo ' pdflatex ${BASENAME}.tex'
      echo ''
      echo ' final: all'
      echo ' pdflatex ${BASENAME}.tex'
      echo ' coolthumbs ${BASENAME}.pdf'
      echo ' thumbpdf --nomakepng ${BASENAME}.pdf'
      echo ' pdflatex ${BASENAME}.tex'
      exit
    fi

    if [ ! -e .mkthumbpdf ]; then mkdir .mkthumbpdf ; fi
    echo "Creating pages at ${SNAPSHOT_DPI} DPI"
    # Portable redirection: "&>" is a bash extension that a strict /bin/sh
    # would read as "run in background", so spell it out instead.
    gs -q -NOPAUSE -sDEVICE=jpeg -sOutputFile=.mkthumbpdf/%d.jpeg \
       -r${SNAPSHOT_DPI}x${SNAPSHOT_DPI} \
       "$1" >/dev/null 2>&1 <<EOF
    EOF
    cd .mkthumbpdf

    ISFIRST=1
    for n in *.jpeg ; do
      f1=`basename $n .jpeg`
      # @@@ thumbpdf filename pattern:
      if [ $THUMBPDF_V2 = 1 ]; then
        f2=`printf 'thb%d.png' $f1`
      else
        f2=`printf 'thumb%03d.png' $f1`
      fi
      if [ $ISFIRST = 1 ]; then
        ISFIRST=0
        echo '(define (coolthumbs-scale-files$$)' >${SCALEALL}
      fi
      echo " (${SIOD_FUN} \"$n\" \"$f2\" ${THUMBNAIL_W} ${THUMBNAIL_H})" >>${SCALEALL}
    done
    if [ $ISFIRST = 0 ]; then
      echo ')' >>${SCALEALL}
      cat >${GIMPSCRIPTS}/${SCALEFILE} <<EOF
    (define (${SIOD_FUN} source-name dest-name width height)
      (set! img (car (gimp-file-load 1 source-name source-name)))
      (gimp-image-undo-disable img)
      (gimp-image-scale img width height)
      (plug-in-unsharp-mask 1 0 (car (gimp-image-active-drawable img)) 3 0.5 50)
      (set! drawable (car (gimp-image-flatten img)))
      (file-png-save 1 img drawable dest-name dest-name 0 3 1 1 0 0 0)
      (gimp-image-undo-enable img)
    )
    EOF
      echo "Calling The GIMP to process the images"
      mv ${SCALEALL} ${GIMPSCRIPTS}
      gimp --no-interface --no-data --batch '(coolthumbs-scale-files$$)' '(gimp-quit 0)'
      rm -f ${GIMPSCRIPTS}/${SCALEALL} ${GIMPSCRIPTS}/${SCALEFILE}
    fi

    for n in *.jpeg ; do
      f1=`basename $n .jpeg`
      if [ $THUMBPDF_V2 = 1 ]; then
        f2=`printf 'thb%d.png' $f1`
      else
        f2=`printf 'thumb%03d.png' $f1`
      fi
      if [ -f "$f2" ]; then mv "$f2" ".."; else mv "$n" "../$f2" ; fi
    done
    cd ..
    rm -rf .mkthumbpdf
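The comment near the top of that script says that 82/106 suits US Letter and that other paper sizes need other values. One way to pick them is to keep the width and derive the height from the paper's aspect ratio. This is a sketch only: `thumb_height` is a hypothetical helper, not part of the script, and it expects whole-number paper dimensions.

```shell
#!/bin/sh
# thumb_height WIDTH PAPER_W PAPER_H: print the thumbnail height, rounded
# to the nearest pixel, that keeps the paper's aspect ratio. The paper
# dimensions may be in any unit as long as both use the same one.
thumb_height() {
    echo $(( ($1 * $3 + $2 / 2) / $2 ))
}

thumb_height 82 612 792    # US Letter in points: 106
thumb_height 82 210 297    # A4 in millimetres: 116
```

For A4 you would then set THUMBNAIL_W=82 and THUMBNAIL_H=116 in the script.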
Figure 9: Acrobat Reader 4.0 showing our final test document. The real-sized inset is the page's thumbnail as presented by Acrobat Reader 4.0; the thumbnail was generated with the procedures detailed in this document.
Figure 10 shows two ways of generating and adding the thumbnails to the PDF. The "Low-quality Thumbnails" column does not use The GIMP; the thumbnails will have no antialiasing. The "High-quality Thumbnails" column uses The GIMP and the scripts in Figure 8 above (we named that script "coolthumbs") and then adds the thumbnails without having thumbpdf generate its own images.
Figure 10: Putting it All Together

Low-quality Thumbnails | High-quality Thumbnails |
---|---|
pdflatex test.tex | pdflatex test.tex |
pdflatex test.tex | pdflatex test.tex |
thumbpdf test.pdf | coolthumbs test.pdf |
 | thumbpdf --nomakepng test.pdf |
pdflatex test.tex | pdflatex test.tex |
That's basically it. You should have all the pieces to build your own high-quality PDF documents. If you do not want to spare the cycles to generate high-quality thumbnails then just use the sequence of commands from the left column of Figure 10, otherwise use the ones from the right column. The Makefile presented in Figure 12 also assumes that you want high-quality thumbnails. If not, change that file accordingly.
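If you'd rather not type those sequences by hand, they fit into one small shell function. This is a sketch under a few assumptions: the names `run` and `make_pdf` and the DRY_RUN variable are mine, and "coolthumbs" is assumed to be on your PATH under that name. With DRY_RUN=1 the commands are only printed, which is handy for checking the sequence before running it.

```shell
#!/bin/sh
# run CMD ARGS...: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# make_pdf QUALITY BASENAME: QUALITY is "high" (coolthumbs renders the
# thumbnails and thumbpdf only packages them) or anything else for the
# low-quality route (thumbpdf renders them itself).
make_pdf() {
    run pdflatex "$2.tex" &&
    run pdflatex "$2.tex" || return 1
    if [ "$1" = "high" ]; then
        run coolthumbs "$2.pdf" &&
        run thumbpdf --nomakepng "$2.pdf" || return 1
    else
        run thumbpdf "$2.pdf" || return 1
    fi
    run pdflatex "$2.tex"
}

# Print (but do not run) the high-quality sequence for test.tex:
( DRY_RUN=1; make_pdf high test )
```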
And now, as promised above, I present in Figure 11 a document with some TeX conditionals that allow you to continue using your document as a normal TeX or LaTeX document; there is a reference to an image file there. Just create yourself a little image and store it as "myLogo.jpg" before compiling this document:
Figure 11: The Big Test Document

    \documentclass{article}
    \newif\ifpdf
    \ifx\pdfoutput\undefined
      \pdffalse        % we are not running PDFLaTeX
    \else
      \pdfoutput=1     % we are running PDFLaTeX
      \pdftrue
    \fi
    %----------------------------------------------------------------------
    \ifpdf
      \usepackage[pdftex,
            colorlinks=true,
            urlcolor=rltblue,               % \href{...}{...}
            anchorcolor=rltbrightblue,
            filecolor=rltgreen,             % \href*{...}
            linkcolor=rltred,               % \ref{...} and \pageref{...}
            menucolor=webdarkblue,
            citecolor=webbrightgreen,
            pdftitle={A Document With An Image},
            pdfauthor={Udo K. Schuermann, Ringlord Technologies},
            pdfsubject={A Demonstration with Extra Cheese},
            pdfkeywords={latex pdf gimp batch thumbnails hyperlinks},
            pdfadjustspacing=1,
            pagebackref,
            pdfpagemode=None,
            bookmarksopen=true]{hyperref}
      \pdfcompresslevel=9
      \usepackage[pdftex]{graphicx}
      \usepackage{thumbpdf}
    \else
      \usepackage[
            colorlinks=true,
            urlcolor=rltblue,               % \href{...}{...}
            anchorcolor=rltbrightblue,
            filecolor=rltgreen,             % \href*{...}
            linkcolor=rltred,               % \ref{...} and \pageref{...}
            menucolor=webdarkblue,
            citecolor=webbrightgreen]{hyperref}
      \usepackage{graphicx}
    \fi
    \usepackage{color}
    \definecolor{rltbrightred}{rgb}{1,0,0}
    \definecolor{rltred}{rgb}{0.75,0,0}
    \definecolor{rltdarkred}{rgb}{0.5,0,0}
    %
    \definecolor{rltbrightgreen}{rgb}{0,0.75,0}
    \definecolor{rltgreen}{rgb}{0,0.5,0}
    \definecolor{rltdarkgreen}{rgb}{0,0.25,0}  % green channel (was 0,0,0.25, a dark blue)
    %
    \definecolor{rltbrightblue}{rgb}{0,0,1}
    \definecolor{rltblue}{rgb}{0,0,0.75}
    \definecolor{rltdarkblue}{rgb}{0,0,0.5}
    \definecolor{webred}{rgb}{0.5,.25,0}
    \definecolor{webblue}{rgb}{0,0,0.75}
    \definecolor{webgreen}{rgb}{0,0.5,0}
    \begin{document}\label{begin}
    % In PDF mode this will use "myLogo.jpg"
    % In LaTeX mode this will use "myLogo.eps"
    \includegraphics[width=3in]{myLogo}
    \label{end}\end{document}
And while I'm at it, here is a convenient little Makefile that might be useful for you as a template. Just run "make" to build the test file. Run "make final" to add nice thumbnails to the document. Run "make public" to create a final version and then copy it to a public folder. Naturally you'd have to change the first two variables that name the file's basename and the public folder, but this little script encapsulates all the pieces presented in this article. All you need is to ensure that you have the supporting software installed, including the scripts from Figures 7 and 8.
Figure 12: Simple Makefile

    BASENAME=test
    FINALDIR=/usr/share/doc

    # Note: each command line below must begin with a tab character.
    all:
    	pdflatex ${BASENAME}.tex

    final: all
    	pdflatex ${BASENAME}.tex
    	coolthumbs ${BASENAME}.pdf
    	thumbpdf --nomakepng ${BASENAME}.pdf
    	pdflatex ${BASENAME}.tex

    public: final
    	cp ${BASENAME}.pdf ${FINALDIR}
I am indebted to the numerous sources of fine (and free!) software, and the uncounted individuals on the internet who have shared their various knowledge with the rest of us. Without their sharing, I could not have gathered all this information here, built the meager tools you find here, and then shared it in turn with others.
Thanks go to Andreas Abraham for suggesting the -L / --landscape option.
Thanks go to Jürgen Wenzel, who made me aware of the new filename pattern that later versions of THUMBPDF use, a change which caused the script in Figure 8 to break.
Thanks go to Tom Heinrich as well as George Dowding, both of whom have pointed out independently to me the fact that the $THUMBPDF_V2 flag needed to be checked near the end of the script, too. Took me a while to get around to adding that to the script in Figure 8 above.
If this article has helped you, or you have suggestions for improvements or corrections, why not drop us a line? Go to the contacts page. Use the help or status email addresses. Thanks!