Etext Center Guidelines for the Creation of Archivable Illustrations
David Seaman, Electronic Text Center, University of Virginia
Etext staff: if in doubt see David Seaman for guidance on the settings at which to scan an image.
We have eight years of experience in creating digital copies of book illustrations, typescript, and manuscript, so don't try to "go it alone". The .tiff files may be sizeable -- don't be put off, and especially don't be tempted to scan at too low a resolution (or, God forbid, in 8-bit colour) just because a tiff is a big file. We don't want to have to re-scan at a later date. The tiffs go off onto a CD as soon as we have made jpeg and gif versions for current everyday use.
Rules and Regulations of Image Scanning and Encoding
The following list explains the items we typically scan, their specifications for scanning, and how to name them for our electronic text database at the Etext Center:
- What Typically Warrants Scanning
  - Images of the spine, front cover, end-papers (ONLY if visually interesting), frontispiece (if there is one), and title-page.
  - All other images in the text or anything that warrants visual interest--including ornamental capitalizations and small images embedded in the text itself.
- Scanning Specifications
  - At the Etext Center, when we say an "image" we mean the entire page upon which the image is placed even if it is something as small as an ornamental capitalization. When you draw your box around the image that you want to scan, leave a few millimeters on each side of the page so the viewer can better appreciate the three-dimensionality of the book as a physical object.
  - All images are scanned and saved as 400 dpi (dots per inch), 24-bit color tiffs. See Special Collections Image Scanning for more information.
- Image Naming Conventions
  - Again, all images will be saved in uncompressed tif format.
  - An image name can have no more than 8 characters as some of our work is done in the MS-DOS environment. These characters can only be numbers and letters--no punctuation.
  - At Etext, we typically name images so that they will correspond to the texts they are a part of.
Example: if you are tagging the frontispiece, the titlepage, and an illustration on page 122 in Booth Tarkington's The Flirt (a work whose UVa ID is TarFlir) you would name these images as follows:
Frontispiece: "TarFfpc"
Titlepage: "TarFttl"
Page 122: "TarFl122"
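The naming rules above (no more than 8 characters, letters and numbers only) are easy to check mechanically before files go into the database. The following is only a minimal sketch; the function name is_valid_image_name is hypothetical and not part of any Etext tool, and the last two names in the loop are made-up examples of names that break the rules.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the name is 1-8 letters or digits.
is_valid_image_name() {
    case "$1" in
        ''|*[!A-Za-z0-9]*) return 1 ;;   # empty, or contains punctuation/other characters
    esac
    [ "${#1}" -le 8 ]                    # no more than eight characters
}

# Names are given without a suffix, as in the examples above.
for name in TarFfpc TarFttl TarFl122 TarFlir122 Tar-F122; do
    if is_valid_image_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: rejected"
    fi
done
```

The check is applied to the base name only, since the 8-character limit comes from the MS-DOS environment and the suffix is handled separately.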
Illustration and Image Tags
The following tags are used to tag illustrations and information that goes with illustrations.
<figure> </figure>
The <figure> tag pair indicates the location of a graphic, illustration, or figure. The filename for the digital image is given as the value of an entity= attribute.
<figure entity="FILENAME">
"Entity" specifies the file in which the graphic image of the figure is stored. Do not include a suffix denoting the image type (e.g. FILENAME.gif). Usually, we will name the image file using as much of the work's unique ID as possible, and the page number on which the illustration occurs. As some of our work is done in the MS-DOS environment, the image filename should not be longer than eight characters.
So, for an illustration from Booth Tarkington's The Flirt (a work whose UVa ID is TarFlir) the entity value for an illustration on page 122 would read as follows:
<figure entity="TarFl122"> </figure>
<head> </head>
The <head> tag may be used to transcribe (or supply) a heading or title for the graphic itself:
Example:
<figure entity="TarFl122">
<head> "Kiss me some more darl----"</head>
</figure>
<figDesc> </figDesc>
The <figDesc> tag is important. It contains a brief prose description of the appearance or content of a graphic figure, and the information in this tag is what allows the user to search for content within a particular illustration.
Example:
<figure entity="TarFl122">
<head>"Kiss me some more darl----"</head>
<figDesc>Grayscale illustration of a young girl trying to kiss a boy, under moonlight.</figDesc>
</figure>
- Note: if it is possible to use terms from the following controlled vocabulary, that would be to our advantage: the Thesaurus for Graphic Materials (TGM I), consisting of 5,997 terms and numerous cross references indexing visual materials. TGM I is a companion document to Thesaurus for Graphic Materials II: Genre and Physical Characteristic Terms.
- You may also have one or more paragraphs following the <head> and preceding the <figDesc> to transcribe any additional text relating to the figure found in the print source.
The <head> and <figDesc> fields are valuable sets of information for PAT searches -- as the set of etext images grows, they will allow a user to search image captions and descriptions of those images. For a WWW user coming to the data through a VT100 client such as Lynx, the <figDesc> field should be able to be sent as an alternative to the graphical image.
Other Simple Examples
<figure entity="EliMid10">
<head>Dorothea</head>
<figDesc>
An engraved portrait of Dorothea
posed thoughtfully at a writing table. Three stacked books
stand in the right foreground. Dorothea's right hand holds a
quillpen.</figDesc>
</figure>
<figure entity="EliMid50">
<head>Mr. Casaubon and Dorothea</head>
<figDesc>An engraving by W.L. Taylor showing Mr.
Casaubon and Dorothea, presumably in their "hour's
<hi>tête-à-tête</hi>." Casaubon sits in an
upholstered wooden chair in the left background corner, facing
the viewer, with Dorothea's right hand in his own. Dorothea
sits on a footstool at center-right, turned towards Casaubon.
The left quarter of her face is visible to the viewer. The
setting is a sunny room with one curtained window and one
uncurtained, open window behind the figures.
</figDesc>
</figure>
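When many illustrations need the same skeleton, the tag block can be generated rather than typed. The sketch below is an illustration only: make_figure is a hypothetical helper name, and it assumes the heading and description contain no characters that need SGML escaping. It strips any file suffix from the first argument, since the entity value should not include one.

```shell
#!/bin/sh
# Hypothetical helper: print a <figure> block for one illustration.
# Usage: make_figure FILENAME HEADING DESCRIPTION
make_figure() {
    entity=${1%.*}    # drop a suffix such as .tif or .jpg, if present
    printf '<figure entity="%s">\n<head>%s</head>\n<figDesc>%s</figDesc>\n</figure>\n' \
        "$entity" "$2" "$3"
}

make_figure EliMid10.tif 'Dorothea' \
    'An engraved portrait of Dorothea posed thoughtfully at a writing table.'
```

The output still needs to be pasted into the tagged text at the point where the illustration occurs.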
SGML Text Embedded in Image Files
A growing number of our electronic texts have book illustrations and other book-related images along with the tagged ASCII text. To include an attribution record in these book illustrations, we bury a version of the TEI header into the binary code of the image. The user who saves an image from a text on our etext server now gets -- in Trojan Horse fashion -- a tagged full-text record of the creation of that image as part of the single image file they save. The image header and related <figDesc> information give us a searchable SGML text database for our images.
For a description of an early implementation of "text in images", see David Seaman: "Campus Publishing in Standardized Electronic Formats: HTML and TEI." in Scholarly Publishing on the Electronic Networks, 1994.
Specific Procedures for Adding Image Headers
Image Processing on Unix: ImageMagick
The mogrify part of this impressive Unix tool allows us to perform batch image conversions from one format to another (e.g., TIFF to JPEG) and to add tagged text headers into the images as we convert.
ImageMagick is available from ftp.x.org/contrib/applications/ImageMagick/ and is on the UVa etext machines. See the ImageMagick README file for more information.
For an interactive on-line implementation of ImageMagick, see the Image Machine at:
http://www.vrl.com/Imaging/
Overview
- To change formats from TIF to JPEG, type:
mogrify -format jpg -quality 50 *.tif
This will convert all the tif images in that directory to jpg files at a quality setting of 50. You can use the same command syntax for other formats.
To resize them as well, add the -geometry option:
mogrify -format jpg -quality 50 -geometry 30% *.tif
- To add text into the image comments field, type:
mogrify -comment @text.file image.file
NOTE: the text.file should have each line of text preceded by a hash mark and a space; this enables programs such as JPEGView for the Macintosh to read the comments as well. You can batch process this as well:
mogrify -comment @text.file *.jpg
- It is possible to convert formats and add a text header with a single command:
mogrify -format jpg -quality 50 -comment @text.file *.tif
All tif files will be converted to jpgs that contain the text comments in the text.file.
Step by Step Instructions for UVa Etext processors
1. Use the new TEI header template in etext/Done; it has several new fields:
- Just after the "Creation of machine-readable version:" field, there are two lines to indicate who created the digital images.
- The first note field should be used to indicate the existence of images; also note if the images come from a different source than the print text.
- In the <editorialDecl> section, there's now a standard indication about how we store the images.
- There's an extra <textClass> section which includes keywords and terms to indicate the artist, the type of visual work, and the type and dpi of the digital image; modify those fields as appropriate (i.e., if you have a 24-bit color image at 400 dpi, that's the only information that should appear in that field).
2. To add a header to an image:
- Make a copy of the completed TEI header for the text in question
- put a hash mark and a space at the beginning of every line:
# <titleStmt>
# <title>blah [a machine-readable transcription]</title>
# <author>blah</author>
# <respStmt>
The hash marks are necessary for some image viewers. This text is now ready to go into the image(s).
3. You can now simultaneously convert your tifs to jpgs and add the header information above to those jpgs.
If the header text file is called AutWork.header, and your various tiff files are image1.tif, image2.tif, image3.tif, and image4.tif, then this is what you do:
- Make sure the tiff files are in the same directory in which you are doing your mogrification.
- Type the following command:
mogrify -format jpg -quality 50 -geometry 30% -comment @AutWork.header image*.tif
You have now converted all the image*.tif files into image*.jpg files, and those .jpg files have the textual information from the header embedded within them; the .tif files have remained unchanged. (You can view the text in the images by viewing the .jpg files in xv, calling up the control window, and choosing the "comments" button.)
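The hash-mark step described above can be done with a one-line sed command rather than by hand. This is a rough sketch under stated assumptions: add_hash_marks is a hypothetical helper name, AutWork.tei is an invented name for the unmodified copy of the TEI header, and the mogrify line is left as a comment because it needs ImageMagick and real tiff files to run. It uses only the mogrify options already shown in this document.

```shell
#!/bin/sh
# Hypothetical helper: prefix every line of a file with a hash mark and a space.
add_hash_marks() {
    sed 's/^/# /' "$1"
}

# Demonstration with a throwaway two-line header; in practice the input
# would be the copy of the completed TEI header for the text in question.
workdir=$(mktemp -d) || exit 1
printf '%s\n' '<titleStmt>' \
    '<title>blah [a machine-readable transcription]</title>' > "$workdir/AutWork.tei"
add_hash_marks "$workdir/AutWork.tei" > "$workdir/AutWork.header"
cat "$workdir/AutWork.header"

# With the tiffs and AutWork.header in the current directory, the
# conversion step would then be:
#   mogrify -format jpg -quality 50 -geometry 30% -comment @AutWork.header image*.tif
rm -r "$workdir"
```

Writing the prefixed text to a separate file keeps the original header copy intact, so it can be reused for the tagged text itself.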
If you want textual information that's specific to one particular image, you need only do the following:
- Repeat step 1 above.
- Repeat step 2, but add the following into the text after <text id=XXXXXXX>:
<body>
<p>
<figure entity="XXX">
<head>XXX</head>
<figDesc>
XXXXXXXX
</figDesc>
</figure>
</p>
</div0>
</body>
</text>
</TEI.2>
- Fill in the fields with the information appropriate to the individual image. (These tags will also need the hash mark and space before them.)
- Repeat step 3 above.
Image Processing on the Mac: ADDJFIFcomment
1. Move a jpg to the Mac; save it again as a jpg using JPEGView -- this process will only work with a Mac-conformant jpg.
2. Once you have a Mac jpg, call up the ADDJFIFcomment application; type in your text, and select "add"; then select the jpg file to which you would like to add comments.
3. NOTE: if you want comments in a gif as well, follow steps 1 and 2, and then call that new jpg into JPEGView and save as a gif; ADDJFIFcomment won't take anything but a Mac-conformant jpg.
Alternative, much less preferable methods used before ImageMagick
1. Call up the image in xv, and save it in PBM (ascii) format; it will assign either a .ppm or .pgm suffix depending on whether the file is color or greyscale.
2. Issue the following command:
csplit -f pnum file.pgm 02
or
csplit -f pnum file.ppm 02
This will result in two output files: pnum00 and pnum01. These two files are your original file.pgm split into two: the first line and everything following the first line. We want to insert the header after the first line in the .ppm or .pgm file.
3. Concatenate the header and the two "pnum" files in the following order, to create a new file (here called "file-2.pgm"):
cat pnum00 text.header pnum01 >file-2.pgm
4. Call up file-2.pgm in xv and save back to JPEG, or convert to GIF; the text remains embedded.
NOTE: The text header must have a pound symbol and a space at the beginning of every line:
#
# text of header goes here
#
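The split-and-concatenate steps above can be wrapped in a small function and tried on a tiny hand-made ASCII greymap before touching real images. This is a sketch only: insert_header is a hypothetical name, the 2x2 test image is invented for the demonstration, and the line number is written as 2 here (the -s option just silences csplit's byte counts).

```shell
#!/bin/sh
# Hypothetical helper wrapping the csplit and cat steps.
# Usage: insert_header IMAGE TEXT_HEADER OUTPUT
insert_header() {
    tmp=$(mktemp -d) || return 1
    # Split before line 2: pnum00 holds the first line, pnum01 the rest.
    csplit -s -f "$tmp/pnum" "$1" 2 || return 1
    cat "$tmp/pnum00" "$2" "$tmp/pnum01" > "$3"
    rm -r "$tmp"
}

# Demonstration on a made-up 2x2 ASCII greymap.
demo=$(mktemp -d) || exit 1
printf '%s\n' 'P2' '2 2' '255' '0 255' '255 0' > "$demo/file.pgm"
printf '%s\n' '#' '# text of header goes here' '#' > "$demo/text.header"
insert_header "$demo/file.pgm" "$demo/text.header" "$demo/file-2.pgm"
cat "$demo/file-2.pgm"
rm -r "$demo"
```

The header lands directly after the first line of the file, which is where the ASCII formats allow comment lines beginning with a hash mark.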

