Overview of Structured Documents, SGML and XML in FrameMaker
This article provides an overview of Structured Documents, SGML and XML in FrameMaker.
- Structured FrameMaker documents, SGML and XML
- The Element Definition Document (EDD)
- Basic Structured Authoring
- How FrameMaker handles SGML and XML
Structured FrameMaker documents, SGML and XML
What is a structured FrameMaker document?
A structured FrameMaker document contains additional information about how the content of the document is put together, not just how the document is laid out. This structure is hierarchical and describes what sorts of things are allowed in the document, where they are allowed and in what order. To do this, the structure defines 'elements', which are the building blocks of the document. For each element, you can define which other elements can reside inside it, how many are allowed and in what order.
For example, you can have a structure that defines a letter. At the top of the hierarchy is a letter element. That element must contain a From address, a To address, a greeting, a message body and a closing, in that order. A message body element is made of at least one paragraph element, each of which must contain at least one sentence element, and so on.
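As a rough illustration of the letter hierarchy described above, here is a minimal Python sketch that writes the letter as XML and prints its element tree. The element names (Letter, From, To, Greeting, Body, Para, Sentence, Closing) are assumptions chosen for this example; they do not come from any real EDD.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for the letter example; not taken from a real EDD.
LETTER = """
<Letter>
  <From>12 Example Street</From>
  <To>34 Sample Avenue</To>
  <Greeting>Dear reader,</Greeting>
  <Body>
    <Para><Sentence>Thank you for your note.</Sentence></Para>
  </Body>
  <Closing>Sincerely, the author</Closing>
</Letter>
"""

def outline(element, depth=0):
    """Return the element hierarchy as a list of indented element names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(LETTER)
print("\n".join(outline(root)))
```

The indentation in the printed outline mirrors the nesting: Sentence sits inside Para, which sits inside Body, which sits inside Letter.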
Frame elements can also contain data about how the text in the element should be formatted, as well as other attributes. This structure is saved in the Frame document itself, and can be imported into a Frame document from another document or from an EDD file.
What is SGML?
SGML stands for Standard Generalized Markup Language. It is an international standard (ISO 8879:1986) that defines a language for describing data in an organized and structured fashion. SGML allows you to create a definition of how a certain kind of data should be structured; that is, it is a language for marking up data in a way that conveys information about the data (this is a letter, which is made of addresses, a greeting, a body, and so on). It also allows you to verify that data conforms to the pre-defined structure. The 'Generalized' part of SGML means that you can create structures for whatever sort of data you want to define. SGML files are plain text, making them easy to edit both manually and programmatically.
What is XML?
XML stands for eXtensible Markup Language. XML is a subset of the SGML standard that removes some of the more complex features of SGML and is optimized for use on the World Wide Web.
What are the real differences between all of them?
SGML and XML are very similar. They are metalanguages that describe ways you can describe data. They are markup languages written in plain text that can be read in any text editor, such as Notepad. XML has less functionality than SGML, but is simpler to use and implement because of that, which is why XML is much more popular than SGML as a data markup language.
SGML and XML are format-independent. They mark up data, not layout. They don't say anything about how the data should be presented, unless that is the point of the structure and has been designed into it. (HTML is an application of SGML specifically written to display text, so layout data is part of the data being marked up.)
FrameMaker structured documents combine the structure of the data with a layout that is independent of that data. FrameMaker can tie formatting to the structure and markup of the data, or it can lay out the data without paying attention to the structure.
Why are they useful?
XML and SGML have a wide range of applications. Virtually anything that can be described in a hierarchical, explicitly defined way can be described with SGML or XML. This makes it easy for organizations or applications to define a structure and use that structure to archive and trade information. Because the markup is plain text, it is easy to store and process. And because it is structured, it can communicate details about the data in the context of a larger document, as opposed to just providing the data itself.
FrameMaker can convert its structured documents to and from XML and SGML, allowing it to take this data and lay it out for print, PDF or HTML publishing.
The Element Definition Document (EDD)
The EDD file is the file within FrameMaker that contains the actual definitions of the elements available in a structured document (because it is a FrameMaker document, its file name ends with .fm, not .edd). Each element is defined with a name and a 'general rule'. This general rule specifies which elements a particular element can contain, how many occurrences are allowed and in what order they can or must occur. Any attributes that the element might have are also defined here.
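To make the idea of a general rule more concrete, here is a minimal Python sketch that checks a list of child elements against a rule. It is only an imitation of the concept: the rules are written as regular expressions over child names, which is not the syntax an EDD uses, and the element names are hypothetical.

```python
import re

# Hypothetical general rules, written as regular expressions over the
# space-separated sequence of child element names. An EDD has its own rule
# syntax; this only imitates "which children, how many, in what order".
GENERAL_RULES = {
    "Letter": r"From To Greeting Body Closing",
    "Body": r"Para( Para)*",          # one or more Para elements
    "Para": r"Sentence( Sentence)*",  # one or more Sentence elements
}

def children_allowed(parent, child_names):
    """Check a list of child element names against the parent's rule."""
    rule = GENERAL_RULES.get(parent)
    if rule is None:
        return False  # unknown element: treated as invalid in this sketch
    return re.fullmatch(rule, " ".join(child_names)) is not None

print(children_allowed("Letter", ["From", "To", "Greeting", "Body", "Closing"]))
print(children_allowed("Letter", ["To", "From", "Greeting", "Body", "Closing"]))
```

The first call passes because the children appear in the required order; the second fails because From and To are swapped.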
Optionally, the definition of an element can contain formatting information, such as associated paragraph or character tags, automatically inserted text, specific font information, etc. This information can also be context-dependent; that is, it can differ depending on where the element is in the hierarchy, where it is in relation to other elements, or what attributes the element has.
FrameMaker-specific objects can also be specified in the EDD as specialized elements, such as markers, cross-references, equations, table elements, Asian Rubi characters, footnotes, etc.
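Context-dependent formatting can be pictured as a list of rules that is searched in order. The following Python sketch is only an illustration under assumed names: the elements (Para, ListItem, Note) and paragraph tags (Bulleted, NoteText, Body) are hypothetical, and an EDD expresses such rules in its own format rather than in code.

```python
# Hypothetical context rules: (element, required parent or None, paragraph tag).
# The first matching rule wins, so specific contexts come before the fallback.
FORMAT_RULES = [
    ("Para", "ListItem", "Bulleted"),
    ("Para", "Note", "NoteText"),
    ("Para", None, "Body"),  # None means "any parent"
]

def paragraph_tag(element, parent):
    """Return the paragraph tag for an element in the given parent context."""
    for name, required_parent, tag in FORMAT_RULES:
        if name == element and required_parent in (None, parent):
            return tag
    return None  # no formatting rule defined for this element

print(paragraph_tag("Para", "ListItem"))
print(paragraph_tag("Para", "Section"))
```

The same Para element gets a different paragraph tag depending on its parent, which is the essence of a context rule.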
Basic Structured Authoring
FrameMaker can deal with structure in two ways, as part of an SGML/XML workflow or as a self-contained FrameMaker structured document. This self-contained document is contained entirely within FrameMaker and requires no outside intervention for structured authoring. I'll refer to this as basic structured authoring.
To create structured documentation in a basic structured authoring environment, you need only two components: an EDD and a document template. The document template is simply a FrameMaker document where the actual authoring occurs. This document contains the definitions of the structure (that is, the definitions are saved with the document). These definitions are imported into the document from an EDD file using the File > Import > Element Definitions... option.
Once the elements are imported into a document template, you can author using that structure in that document. Or, more often, structure developers will provide a blank FrameMaker document to authors that they can copy to make individual documents (hence the term document template). Note that in this situation, the author does not need the EDD, just a document template with the appropriate elements imported into it. You can even import elements from one Frame doc to another.
How FrameMaker handles SGML and XML
When you include FrameMaker as part of an SGML or XML workflow, you are still using the basic structured authoring workflow as the core of the process. However, you are adding conversion components to FrameMaker's section of the workflow. This conversion is handled by the structured application.
Structured applications - what they are and what they are made of
A structured application is not an application per se; rather, it is a set of files and settings designed to import and export a certain set of structures into and out of FrameMaker. The central point of structured applications in FrameMaker is the structapps.fm file.
FrameMaker only recognizes a single structapps.fm file:
-- Windows: \Program Files\FrameMaker [version]\Structure\structapps.fm
-- Unix: $FMHOME/fminit/$LANG/structure/structapps.fm
This file contains the relevant information for all of the structured applications that FrameMaker has access to. Each application has a separate entry. This defines the name of the application, the locations of the related files, processing settings, entity locations and a list of doctypes. Doctypes are the elements in that structure that are valid as root level elements in a structure.
The rest of the particular structured application consists of the related files. Some of these files include:
- Template - This is a document template as described in basic structured authoring. It is a FrameMaker document with the appropriate elements imported and any static features (master pages, headers/footers, etc.) included.
- DTD - This is a Document Type Definition, a file that defines the structure of an SGML or XML file (similar to an EDD without the formatting data).
- Read/Write Rules - The read/write rules describe how tags from an SGML/XML file correspond to elements in an EDD.
Importing and Exporting SGML and XML
FrameMaker does not read native XML or SGML files. However, you can import these files using the information in the structured application as a filter.
When you import one of these files into FrameMaker, the data is checked against the definition in the specified DTD file. The data is checked both for well-formedness (does the file follow the basic syntax rules of XML/SGML, such as properly nested and closed tags) and validity (does the file match the definitions in the DTD).
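The difference between the two checks can be sketched in Python. This is not how FrameMaker performs the checks; it is a minimal illustration that uses the standard library parser for well-formedness and a small hand-written table as a stand-in for a DTD. The element names (Memo, To, Text) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for a DTD: each element maps to the exact child names it must
# contain, in order. A real DTD is far more expressive than this.
ALLOWED = {"Memo": ["To", "Text"], "To": [], "Text": []}

def check(xml_text):
    """Return 'not well-formed', 'invalid', or 'valid'."""
    try:
        root = ET.fromstring(xml_text)  # fails if the syntax is broken
    except ET.ParseError:
        return "not well-formed"
    for element in root.iter():
        expected = ALLOWED.get(element.tag)
        if expected is None or [c.tag for c in element] != expected:
            return "invalid"  # well-formed, but breaks the structure rules
    return "valid"

print(check("<Memo><To>A</To><Text>B</Text></Memo>"))
print(check("<Memo><To>A</Text></Memo>"))
print(check("<Memo><Text>B</Text><To>A</To></Memo>"))
```

The second document cannot even be parsed because its tags do not match, while the third parses cleanly but puts the children in an order the structure definition does not allow.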
If you are using FrameMaker 7.2, FrameMaker can apply a specified XSLT file to an XML file before it is checked against the DTD. This gives the XSL engine in FrameMaker a chance to preprocess the data and make any modifications. The result of that XSL transformation is what is checked against the specified DTD.
FrameMaker then takes the specified read/write rules and processes the file. In this process, FrameMaker can rename elements from one name to another, delete elements and/or attributes, map elements to FrameMaker objects (such as tables or markers), map attributes to FrameMaker object properties, or map XML entities to FrameMaker variables.
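The renaming and deleting described above can be imitated in a few lines of Python. This sketch does not use the read/write rules syntax, and it does not cover the mapping to FrameMaker objects; it only shows the kind of tree rewriting involved. All tag and attribute names here are hypothetical.

```python
import xml.etree.ElementTree as ET

RENAME = {"p": "Para", "title": "Head"}  # hypothetical tag -> element name
DROP_ELEMENTS = {"comment"}              # hypothetical elements to discard
DROP_ATTRIBUTES = {"revision"}           # hypothetical attributes to discard

def apply_rules(element):
    """Rewrite an element tree in place and return it."""
    for child in list(element):          # copy the list: we remove while looping
        if child.tag in DROP_ELEMENTS:
            element.remove(child)
        else:
            apply_rules(child)
    element.tag = RENAME.get(element.tag, element.tag)
    for name in DROP_ATTRIBUTES & set(element.attrib):
        del element.attrib[name]
    return element

source = ET.fromstring(
    '<doc><title revision="3">Intro</title><comment>draft</comment><p>Text</p></doc>'
)
result = ET.tostring(apply_rules(source), encoding="unicode")
print(result)
```

Elements without an entry in RENAME, such as doc, keep their original names.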
The structure and data are then inserted into a copy of the specified document template, as defined by the elements saved with that template. This adds any formatting specified in the element definitions.
Exporting to SGML/XML is the reverse process.
In the above process, you can see that the EDD is not referenced at all. You must create the EDD in order to create the element definitions that are saved into the document template. However, the EDD is never directly referenced in the import/export process. If you change the EDD, it will have no effect on the above process until you import the updated definitions into the document template.
- You'll note that applying the XSLT and running the data through the read/write rules do very similar things. They have some important differences. XSLT can only be used for transforming XML files, not SGML files. Additionally, it cannot map XML tags to FrameMaker objects. If you need to do that, you'll need to use read/write rules instead of or in addition to an XSLT file. Conversely, read/write rules do not provide the enormous flexibility of XSLT for transforming XML data from one form to another. It is entirely possible to use both in a workflow.
- There is no mention of XML Schema in the above process. FrameMaker 7.2 does support the use of XML Schema for creating structure. However, what it actually does is convert the Schema to a DTD file. So it is the DTD that needs to be referenced in the structapps.fm file, not the XML Schema.
At the end of the structapps.fm file is a Defaults section. When you choose "No Application" when importing SGML/XML, these are the settings that are used. Additionally, if a structured application does not specify a setting, the corresponding setting from the Defaults section is used.
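The fallback behavior of the Defaults section can be modeled as a simple merge of two sets of settings. The Python sketch below is only a model of that logic; the setting names, file names and application name are hypothetical and do not reflect the actual layout of structapps.fm.

```python
# Hypothetical setting names; the real structapps.fm entries differ.
DEFAULTS = {"template": "default_template.fm", "rules": None, "dtd": None}

APPLICATIONS = {
    "Manual": {"template": "manual_template.fm", "dtd": "manual.dtd"},
}

def effective_settings(application_name):
    """Merge an application's settings over the defaults.

    Passing None models the "No Application" choice.
    """
    overrides = APPLICATIONS.get(application_name, {})
    return {**DEFAULTS, **overrides}

print(effective_settings(None))
print(effective_settings("Manual"))
```

In this model, the Manual application supplies its own template and DTD but inherits the rules setting from the defaults.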
Round tripping is a term used for pulling the same data from XML/SGML into FrameMaker, out of FrameMaker and back in again. The goal is that no data is lost along the way: XML imported into Frame will export from Frame intact, and Frame docs exported to XML will be the same when re-imported. This can be a tricky prospect that takes careful development of your structured application. If a customer is interested in SGML or XML round tripping, they need to be prepared to become a structured application developer or hire one - there is no good solution 'out of the box'.
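One way to test a round trip outside FrameMaker is to compare the XML before import with the XML after export, ignoring differences that do not matter (attribute order, quoting, indentation). The Python sketch below assumes Python 3.8 or later and uses the standard library's canonicalize function; the sample documents are made up for illustration.

```python
from xml.etree.ElementTree import canonicalize

def same_after_round_trip(before, after):
    """Compare two XML strings after canonical (C14N) normalization."""
    return (canonicalize(before, strip_text=True)
            == canonicalize(after, strip_text=True))

original = '<doc b="2" a="1"><p>Text</p></doc>'
exported = "<doc a='1' b='2'>\n  <p>Text</p>\n</doc>"   # same data, reformatted
lossy = '<doc a="1"><p>Text</p></doc>'                  # attribute b was lost

print(same_after_round_trip(original, exported))
print(same_after_round_trip(original, lossy))
```

Note that strip_text=True also discards leading and trailing whitespace in text content, which is an assumption that may not suit documents where such whitespace is significant.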
Some things to consider:
- When FrameMaker exports to XML or SGML, it does not export conditional text that is currently hidden. However, there are ways to get around this with FrameMaker 7.1 and later. See Chapter 23 of the Structure Application Developer's Guide for one way of doing this with XML.
- FrameMaker 7.0 and later imports and exports XML, and on import it accepts all Unicode characters present in the data stream. However, FrameMaker still cannot display Unicode characters outside of the ISO Latin 1/Mac Roman mapping. For this reason, during XML import, FrameMaker temporarily parks characters that it cannot display in special markers so that it can add them back into the data stream during XML export.
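The parking of non-displayable characters can be pictured with a short Python sketch. This is only a model of the idea, not FrameMaker's marker mechanism: it assumes Latin-1 as the displayable set, uses a "?" as the visible stand-in, and remembers each parked character by its position.

```python
def park(text, encoding="latin-1"):
    """Return (displayable_text, parked), where parked maps index -> character."""
    shown, parked = [], {}
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)      # succeeds if the character is displayable
            shown.append(ch)
        except UnicodeEncodeError:
            parked[index] = ch       # set the character aside for export
            shown.append("?")        # visible stand-in for the parked character
    return "".join(shown), parked

def unpark(shown, parked):
    """Put the parked characters back in their original positions."""
    chars = list(shown)
    for index, ch in parked.items():
        chars[index] = ch
    return "".join(chars)

text = "caf\u00e9 \u65e5\u672c"
shown, parked = park(text)
print(shown)
print(unpark(shown, parked) == text)
```

This positional approach only works while the text is not edited between parking and restoring, which is one reason a real implementation would attach the parked data to the text itself.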
FrameMaker XML Cookbook - \Program Files\Adobe\FrameMaker7.x\XMLCookbook
FrameMaker Structure Application Developer's Guide - \Program Files\Adobe\FrameMaker7.x\OnlineManuals
SGML: General Introductions and Overviews - http://www.oasis-open.org/cover/general.html
XML.org - http://www.xml.org
XML for Absolute Beginners - http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xml.html