Overview
xmlroff produces PDF or PostScript output using the GNOME Print library. Other output formats can be added.
xmlroff is written in C and uses libxml2 and libxslt plus the GLib, GObject, and Pango libraries that underlie GTK+ and GNOME (although it does not require either GTK+ or GNOME). GLib is a general-purpose utility library, GObject is a flexible, extensible object-oriented framework for C, and Pango is a framework for the layout and rendering of internationalized text. This combination made it easier to develop the formatter, makes it easier for current GTK+ and GNOME developers to also work on the formatter, and allows the formatter to use the internationalization support of Pango.
xmlroff has no connection with troff, nroff, groff, or related programs. The names are similar, and so are the purposes. The roff(7) man page from the groff distribution describes a roff type-setting system as "an extensible text formatting language and a set of programs for printing and converting to other text formats." Since this program's input and its extensible text formatting language are both XML, it makes sense to call this program "xmlroff" in homage to the traditional Unix type-setting programs.
XSL describes formatting, for both paper and screen, in terms of "formatting objects." There are formatting objects, for example, for pages, blocks of text, list items, and tables. XSL also defines their allowed properties and the properties' meaning: for example, all formatting objects containing text may specify the font size or font weight (normal, bold, etc.) of the text, but only a table cell may use the "number-rows-spanned" property that indicates how many rows that cell spans.
XSL also describes the conceptual procedure for processing the
input XML document and XSL stylesheet to create the formatted
output. xmlroff implements formatting largely in accordance with the
stages described in the XSL Recommendation. The stages are shown in
the context diagram and summarized in the following sections.
The context diagram is in the form used in Software Requirements & Specifications by Michael Jackson.
Source XML to Result Tree
This stage transforms the source XML into a representation of
the formatting objects that are used to direct the formatting.
The inputs to xmlroff are an XML document that is to be
formatted -- typically described as the "source" document -- and the
XSL stylesheet that specifies the transformation of the source XML
document into the XML vocabulary used for specifying formatting
objects and their properties.
xmlroff incorporates the libxslt XSLT processor, which performs
the transformation. The result, termed the result tree in the XSLT
Recommendation, is an in-memory representation of the structure of an
XML document. The element names and attribute names appearing in the
result tree are the names of the XSL formatting objects and their
properties, respectively. Later processing stages use the structure of
the result tree and the property values specified to determine the
appearance of the formatted output.
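The relationship between result-tree names and formatting objects can be sketched in a few lines. This is only an illustration, not xmlroff's code: xmlroff is written in C and receives the tree in memory from libxslt, whereas this sketch uses Python's standard `xml.etree.ElementTree` and a hypothetical, serialized result tree so that it is self-contained. The function name `list_formatting_objects` is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the XSL formatting-object vocabulary.
FO_NS = "http://www.w3.org/1999/XSL/Format"

# Hypothetical result tree, serialized only to keep the example self-contained.
RESULT_TREE = f"""
<fo:root xmlns:fo="{FO_NS}">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page" page-width="210mm" page-height="297mm">
      <fo:region-body margin="25mm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="12pt" font-weight="bold">Hello</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
"""


def list_formatting_objects(xml_text):
    """Return (formatting-object name, properties) pairs in document order."""
    root = ET.fromstring(xml_text.strip())
    prefix = "{" + FO_NS + "}"
    found = []
    for element in root.iter():
        if element.tag.startswith(prefix):
            # Element name -> formatting object, attributes -> its properties.
            found.append((element.tag[len(prefix):], dict(element.attrib)))
    return found


for name, properties in list_formatting_objects(RESULT_TREE):
    print(name, properties)
```

Every property value in the output is still a plain string at this point, which is what the later stages have to interpret.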
The result tree could be identical to the source document's tree
or could be radically different, since the stylesheet can drop any
part of the source tree, duplicate any part of the source tree, create
elements and text in the result tree, and merge in any part of other
XML documents that could be specified in the source XML, in the
stylesheet, or in a parameter passed to the XSLT processor.
The mechanics of this transformation, from source XML to a
different XML document, are standardized not by the XSL
Recommendation itself but by the separate XSLT
Recommendation. XSLT is conceptually part of XSL, and in early
drafts of XSL, the XSLT specification was in one section and the
formatting objects' descriptions were in another. XSLT was broken out
as a separate W3C Recommendation because it now has widespread use for
general XML-XML, XML-HTML, and XML-text transformations in addition to
its initial purpose of transforming arbitrary XML into the specific
XML vocabulary used for expressing formatting objects and their
properties.
A beneficial side effect of XSLT's success is the availability
of a choice of free, stable, and high performance XSLT processors that
can be incorporated into xmlroff as the much preferred alternative to
writing an XSLT processor. Accordingly, xmlroff incorporates Daniel
Veillard's libxslt XSLT processor.
XSLT operates on documents as trees. That is, XSLT's processing
model views the source document not as a sequence of characters, and
not as start-tags and end-tags with text between them, but as a tree
of nodes, where, for example, each element, each attribute, and each
contiguous run of text is a separate node. The structure of the source
XML document — for example, the containment of one element by another
— is reflected in the structure of the tree of nodes comprising the
source tree.
The result tree is similarly a tree of nodes, where a node
represents an element, an attribute, some text, etc. In a general-purpose
XSLT processor, the result tree is usually written out to a file or
transmitted to another application as an XML document. The result tree
doesn't have to be written out in any form, however, and it can be
used as-is by the application. Accordingly, xmlroff uses this
in-memory representation as the input to the next processing
stage.
Result Tree to Formatting Object and Area Trees
This processing stage transforms the result tree into a tree of
real programmatic objects with properties that are expressed as
numeric, boolean, color, or other datatypes (instead of just
text).
The result tree is a representation of an XML document. XML
documents are just text, so in the result tree, formatting object and
property names are represented as text, as are property values. Some
property values, however, represent numeric quantities, and many may
contain expressions that need to be evaluated to determine the exact
value to use. Furthermore, there are complex interactions and
dependencies between formatting objects and between properties.
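As a small illustration of turning textual property values into typed values, the sketch below converts absolute length strings such as "12pt" or "1in" into a numeric `Length` measured in points. The `Length` class and `parse_length` function are hypothetical names chosen for this example, not xmlroff's datatypes, and the sketch handles only simple absolute lengths, not XSL expressions, relative units, or percentages.

```python
from dataclasses import dataclass

# Points per absolute unit (1in = 72pt = 2.54cm = 25.4mm, 1pc = 12pt).
POINTS_PER_UNIT = {
    "pt": 1.0,
    "pc": 12.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
}


@dataclass(frozen=True)
class Length:
    """A typed length, stored in points (hypothetical datatype)."""
    points: float


def parse_length(text):
    """Convert a textual length such as '12pt' into a Length."""
    text = text.strip()
    number, unit = text[:-2], text[-2:]
    if unit not in POINTS_PER_UNIT:
        raise ValueError(f"unsupported length: {text!r}")
    # float() raises ValueError if the numeric part is missing or malformed.
    return Length(float(number) * POINTS_PER_UNIT[unit])


print(parse_length("12pt"))
print(parse_length("1in"))
```

Once a value is a `Length` rather than a string, later code can compare and add lengths without re-parsing text.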
This stage also creates the area tree representation of the
formatted document laid out onto pages.
The formatting object tree expresses the specification for the
formatted document (with expressions, interactions, and dependencies
fully resolved) in terms of objects and datatypes that are useful for
manipulation by a program. If the output "page" of a formatter were
always infinitely wide and infinitely long (as is both possible with
an electronic display and supported by the XSL Recommendation), then
creating the output would be a comparatively simple matter of writing
out the formatting object tree.
When the output isn't infinitely wide, however, a formatter has
to support breaking lines, and when the output also isn't infinitely
long, a formatter has to support breaking the output into discrete
pages. In the real world, formatting content into pages also means
numbering pages, supporting running headers and footers, and possibly
handling different page sizes or margins on different pages.
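To make the line-breaking requirement concrete, here is a minimal greedy line breaker. It is a sketch under strong assumptions: widths are counted in characters instead of measured glyph widths, breaks occur only at spaces, and the function name `break_lines` is hypothetical. xmlroff relies on Pango for real text layout, and nothing here reflects how Pango chooses breaks.

```python
def break_lines(text, max_width):
    """Greedily pack words into lines of at most max_width characters."""
    lines = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_width or not current:
            # The word fits, or the line is empty and an over-long word
            # has to go somewhere.
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


for line in break_lines("the quick brown fox jumps", 10):
    print(line)
```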
The formatting objects each create zero, one, or more than one
areas in the area tree:
- Some formatting objects, e.g. wrapper, generate no areas because
  that's how they're specified, and some don't generate areas because of
  some conditionality in the formatting process; e.g., only the
  "preferred" of any number of applicable marker formatting objects is
  formatted in place of a retrieve-marker formatting object.
- Many formatting objects generate one area.
- Some formatting objects generate more than one area; for
  example, a block formatting object split across two pages, or a
  page-sequence that, by definition, generates as many pages as
  necessary to contain its content.
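The zero, one, or many cases can be sketched by counting the areas a block produces when its lines are distributed over pages. This is an illustration only: `split_block` is a hypothetical function, page capacity is measured in whole lines, and the real area model in the XSL Recommendation is far richer.

```python
def split_block(line_count, room_on_current_page, lines_per_page):
    """Return the number of lines in each area generated by one block.

    An empty list means the block generated no areas; more than one
    entry means the block was split across pages.
    """
    if lines_per_page <= 0:
        raise ValueError("lines_per_page must be positive")
    areas = []
    room = room_on_current_page
    remaining = line_count
    while remaining > 0:
        if room == 0:
            room = lines_per_page      # current page is full, start a new one
        placed = min(remaining, room)
        areas.append(placed)
        remaining -= placed
        room = lines_per_page          # any further area starts on a fresh page
    return areas


print(split_block(0, 10, 40))    # no content, no areas
print(split_block(5, 10, 40))    # fits on the current page
print(split_block(100, 10, 40))  # split across several pages
```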
Many formatting object properties may be expressed as
percentages of another value, often a percentage of a dimension of the
area generated by an ancestor formatting object. xmlroff builds the
formatting object tree and the area tree in parallel so expressions
containing percentages are resolved when a formatting object is added
to the formatting object tree.
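The benefit of building both trees in parallel can be sketched as follows: because the parent's area already exists when a child formatting object is added, a percentage can be resolved immediately. The `Block` class, its `area_width` attribute, and `resolve_length` are hypothetical names for this illustration, and only "pt" and "%" values are handled.

```python
def resolve_length(value, reference_points):
    """Resolve 'Npt' or 'N%' to points; percentages use reference_points."""
    value = value.strip()
    if value.endswith("%"):
        return float(value[:-1]) * reference_points / 100.0
    if value.endswith("pt"):
        return float(value[:-2])
    raise ValueError(f"unsupported value: {value!r}")


class Block:
    """A formatting object whose area width is resolved on creation."""

    def __init__(self, width, parent=None, page_width=0.0):
        self.parent = parent
        self.specified_width = width
        # The parent's area already exists, so its width is a known number.
        reference = parent.area_width if parent is not None else page_width
        self.area_width = resolve_length(width, reference)


outer = Block("80%", page_width=500.0)
inner = Block("50%", parent=outer)
fixed = Block("90pt", parent=inner)
print(outer.area_width, inner.area_width, fixed.area_width)
```

If the area tree were built only after the whole formatting object tree, `inner` would have no concrete width to take 50% of at the time it was created.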
This stage works on the area tree as a whole to optimize the
arrangement of the areas.
Creating each page in isolation produces a workable result, but
there can be dependencies between pages; for example, page number
citations to other pages. In addition, producing quality pages (i.e.,
pages that look good) means, for example:
- Balancing the amount of text on facing pages so the content of both pages extends the same distance down the page
- Aligning the lines on facing pages and on back-to-back pages
- Not splitting a block of text such that only one line appears before a page break or only one line appears after a break
- Not ending a page on a hyphen
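The rule about stranded single lines can be sketched as a small decision function. XSL expresses these minimums through the "orphans" and "widows" properties, whose initial value is 2. The function `lines_before_break` is a hypothetical name, it assumes the remainder of the block fits on the following page, and it is not a description of how xmlroff implements this.

```python
def lines_before_break(line_count, room, orphans=2, widows=2):
    """Return how many lines of a block to keep on the current page.

    orphans: minimum lines allowed before a page break.
    widows:  minimum lines allowed after a page break.
    Returns 0 when the whole block should move to the next page.
    """
    if line_count <= room:
        return line_count              # the block fits, no break needed
    if line_count < orphans + widows:
        return 0                       # too short to split acceptably
    # Leave at least `widows` lines for the next page.
    keep = min(room, line_count - widows)
    if keep < orphans:
        return 0                       # too few lines would stay behind
    return keep


print(lines_before_break(10, 9))  # keeps 8 so that 2 lines follow the break
print(lines_before_break(10, 1))  # moves the whole block
```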
This stage writes out the area tree in a format that can be used
by other programs or sent to a printer. The initial output format is
PDF.