Add One Egg, a Cup of Milk, and Stir: Single Source Documentation for Today

Carl Stieren

Simware, Inc.

2 Gurdwara Rd.

Ottawa, Ontario K2E 1A2 Canada

Email: stieren@simware.com

Abstract

What happens when the software firm you work for decides it will not deliver large printed manuals any more? Then the request comes to put everything online. Six months later, user profiles shift to the World Wide Web and you're asked to deliver HTML. In the future, a database of SGML information chunks may let us deliver anything, any which way. Today, we must devise a system that allows us to "author once, publish many". Such a system is crucial for software and hardware documentation. The method I chose was to go from FrameMaker to Acrobat .pdf files and to HTML: I wrote in Adobe FrameMaker, converted to .pdf files with Adobe Acrobat, and converted the FrameMaker files to HTML with Quadralay WebWorks Publisher. But while we're waiting for the future, just learning SGML and diving deep into DTDs alone could be a mistake. SGML is a language which sets out structure, and most of us are concerned with content. Enter Information Mapping, or information types of your own devising. Identifying chunks of information, such as a procedure for changing the default printer, is extremely important. If we then mark each chunk for an index and record its type and title, we've also got the keywords for a future database.

How do you change?

Everyone thinks there are only two ways to change:

1. Use a software-specific conversion

2. Convert to SGML

The problem with number 1 is that you may be able to convert to one specific format, but the software may not handle another. You might convert to Acrobat .pdf format just fine, but what about HTML? The problem with number 2 is that it's a complex technology. There are not only complex tags and rules to learn, but a Document Type Definition (DTD) to design. And a DTD handles only logical or structural formatting: the SGML software companies haven't really agreed on how to handle the physical formatting. Do they use DSSSL or FOSIs or, as Frame+SGML does, an EDD?

There is a third way: convert and prepare

To convert and prepare, you do the following:

Use a good software-specific conversion now

Learn SGML and specify the structure you want in a DTD

Write or contract out your DTD

Keep abreast of formatting specifications

Define the information type and other metadata for each chunk of your text

Past Conversions: One-Way Only

When you convert between a number of formats, the conversion is largely one-way: either you can't convert back, or you can do so only imperfectly. The converted files also require manual work and structural decisions, so reconverting from the source each time isn't practical.

What I call imperfect conversions are the following:

Microsoft Word to FrameMaker

Microsoft Word to Windows Help 3.1

Microsoft Word to Windows Help 4

Because many, if not most, of us use these tools and these help formats, it's crucial that we eventually get a single source documentation method. That implies a common format to carry both the structural characteristics of a document's elements and their physical characteristics. So far, the only solution which meets these needs is SGML, with only one runner-up: XML.

Today's Methods and Tomorrow's

Tomorrow's single-source documentation will likely take one of two forms: SGML or XML. The potential for XML (so-called DTD-less SGML) comes from its endorsement by both Microsoft and Netscape. However, at the time this article was printed there was doubt whether its specs, let alone the software which would interpret those specs, could be ready in time to catch the market. As of print time, the choice had not been made as to whether Cascading Style Sheets or DSSSL would be used to define XML styles.

SGML has been around for a while, and its specs are stable. But the software to interpret entire SGML documents and DTDs, and to verify them, is complex. There are not many SGML browsers on the market, although Panorama from SoftQuad is one that can be downloaded from the Web. And the cost of converting everything to SGML is enormous. Not only are there the processing costs of doing conversions, but new designs for DTDs and for format specifications (DSSSL, FOSIs, EDD characteristics) have to be created. Most importantly, the documentation has to be re-engineered to fit the logic that a DTD lays out. The training costs for bringing staff up to speed on the new SGML system are also not small. Is it any wonder that only the largest companies have converted to SGML?

And for both SGML and XML, there has to be a method for storing the single source of documentation. A database is an obvious solution, but not a simple one (see the section "Tomorrow: Anything to an SGML Database").

But we need a method today.

Prepare for Tomorrow: Describe your Information Chunks

To prepare for a future SGML or XML documentation database, you need to get the information into chunks that are meaningful and that can be indexed on terms useful to anyone doing the authoring. Such meta-information includes:

name (= title)

information type

product

user level

user role

context or conceptual location (for each different manual)

usefulness rating

completeness rating

There may be other characteristics, but filling out these eight will be quite enough for writers to do (the Product specification can probably be eliminated if Context always includes the product name).

Here's how I see each item of metadata:

Name should be the same as title of each information chunk

Information type is the hardest to define (see Information Type section)

Product can be more than one name (REXXWARE, RMT, Salvo)

User level should provide few categories (Beginner, Intermediate, Advanced)

User role will be product-specific (end user, database programmer, information rule programmer)

Context can be location in an outline (REXXWARE.Userguide.Kernel.Kernelextension.COMPILE)

Usefulness should use a minimum number of categories (1, 2, 3, with 1 being most useful)

Completeness should be a percentage of the ideal book (50 out of 100)
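The eight items above can be sketched as a single record per chunk. The following is a minimal illustration in Python, not part of any tool discussed here; the class and field names are assumptions, and the sample values (other than the context string and the category names taken from the list above) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class UserLevel(Enum):
    """The three user-level categories suggested in the text."""
    BEGINNER = "Beginner"
    INTERMEDIATE = "Intermediate"
    ADVANCED = "Advanced"


@dataclass
class ChunkMetadata:
    """Hypothetical record holding the eight metadata items for one chunk."""
    name: str               # same as the chunk's title
    information_type: str   # e.g. "Procedure" or "Concept"
    products: List[str]     # a chunk may belong to more than one product
    user_level: UserLevel
    user_role: str          # product-specific, e.g. "end user"
    context: str            # dotted location in an outline
    usefulness: int         # 1, 2 or 3, with 1 being most useful
    completeness: int       # per cent of the ideal book, 0 to 100

    def __post_init__(self):
        # Reject ratings outside the ranges described in the text.
        if self.usefulness not in (1, 2, 3):
            raise ValueError("usefulness must be 1, 2 or 3")
        if not 0 <= self.completeness <= 100:
            raise ValueError("completeness must be between 0 and 100")

    def outline_path(self) -> List[str]:
        """Split the dotted context into its outline levels."""
        return self.context.split(".")


# Sample values are illustrative only.
compile_chunk = ChunkMetadata(
    name="COMPILE",
    information_type="Procedure",
    products=["REXXWARE"],
    user_level=UserLevel.ADVANCED,
    user_role="database programmer",
    context="REXXWARE.Userguide.Kernel.Kernelextension.COMPILE",
    usefulness=1,
    completeness=50,
)
```

A record like this is small enough for a writer to fill in per chunk, and each field could later become an indexed column in a documentation database.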

Information Type is where experts. The most thorough definition of Information Type that is useful to writers has been given by Information Mapping.

Information Mapping    Microsoft HTML Help 2.0
Procedure              Procedural
Process
Structure
Concept                Conceptual
Principle
Fact
Classification
                       Reference
                       Tips
                       Troubleshooting

Responses from Colleagues on TECHWR-L

To choose a method for my test of single sourcing today, I posted a query on the Technical Communicators List, TECHWR-L, asking for examples of single source documentation.

Of eight replies, five were using FrameMaker, all but one with conditional text, and producing output to a number of media. The replies I received showed the following use of conditional text:

Writer     Books  Pages       % Conditional  Output
Alexia 1   1      350         30             HTML
Alexia 2   2      300         30             .pdf
Melissa    6      236 to 398  40-50          print, .pdf
Kelly             200 to 400  15-20          .pdf, HTML
Jenni 1    1      500         45             .hlp*, print
Jenni 2    1      500         30             " "
Jenni 3    1      300         30             " "
Jenni 4    1      75          25             " "
Hope       11     2200        0              .html, .ps
Thomas     1      500         10

* The .hlp file was not a Windows help file; it was produced with Bristol's HyperHelp compiler and viewer

My main conclusion from this chart was to discard completely the rumor I'd heard that FrameMaker had trouble with large quantities of conditional text.

I also asked each respondent for tips and for bugs they had encountered in their single source projects.

Alexia Prendergast of Seagate Software said planning conditional tags was crucial:

"I found it easier to plan ahead what my conditional tags were going to be, then add them to my template and import them to my docs. Also, I had to pay attention when not converting entire paragraphs to conditional text to make sure I selected all appropriate spaces, punctuation, etc. Copy edit each version of the doc (i.e. with Project A text on, then with Project B text on) to make sure you catch anomalies."

Melissa Fisher of Automated Logic warned of the pitfalls of the "one-way" conversion to Windows help files:

"The biggest problem we have is when the software we document changes at the last minute - after we've started creating help files, at which point we end up either updating 8 help files and the Frame doc or redoing all the work we'd started on the help files."

Planning her condition tags was crucial, Melissa said.

"We use different tags for each of the 6 manuals, plus tags for info that does not appear in help files (such as graphics and page-number cross references), or does not appear in some other kind of group (such as info that will not appear in any doc produced for a specific oem. Having "negative" tags (like "Not in Help" or "Not in OEM1") is sometimes very useful when feature sets change (less re-tagging is necessary)."

Her suggestions on how to handle the conditional beast were as follows:

"Make a chart of how the conditions are used. For example, we have a chart that lists all the tags, their attributes (color, underline or strikethrough, etc.) and the manuals in which we "show" the tag. We create the chart in a spreadsheet app and keep a printed copy handy for reference.

Try to group conditions and colors/attributes so that it's easier to tell at a glance what you are looking at. When you apply conditions that use different colors to the same text, Frame displays it as magenta (a good reason not to use magenta in your condition tags, by the way). If you have a lot of magenta text on your screen, it makes it difficult to figure out what you are really looking at.

Use a different template for each manual/document you produce. This template should contain only the appropriate conditional text show/hide settings for each manual or document. We also create a different book file for each manual. This way we can apply the appropriate template to the book and have the correct TOC and index created without interfering with TOCs and indices for other books."
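Melissa's chart of conditions can be pictured as a small table from which each manual's show/hide settings are derived. The sketch below is only an illustration in Python; the tag names, colors, and output names are hypothetical, and the real settings live in FrameMaker templates, not in code.

```python
# Hypothetical condition chart: one row per tag, giving its display
# attribute and the outputs in which that tag is set to "show".
CHART = [
    ("Manual A", "blue", {"Manual A print", "Manual A help"}),
    ("Manual B", "green", {"Manual B print"}),
    # A "negative" tag: shown everywhere except in help files.
    ("Not in Help", "red underline", {"Manual A print", "Manual B print"}),
]


def show_hide_settings(output):
    """Return {tag: shown?} for one output, as a per-manual template would hold."""
    return {tag: output in shown_in for tag, _attr, shown_in in CHART}


def tags_using_magenta():
    """List tags whose attribute uses magenta, which is best left unused."""
    return [tag for tag, attr, _shown_in in CHART if "magenta" in attr]


help_settings = show_hide_settings("Manual A help")
```

Keeping the chart in one place makes it easy to check, before a build, which tags a given manual shows and whether any tag has been given a color that is hard to read on screen.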

Kelly of Milpitas, California, told how to implement sub-classes of conditional tags (something FrameMaker doesn't do):

"The biggest snag is using the condition tags in frame. For example, you can't have one NT tag and one online tag. Because of Frame's weirdness, you need an NT tag, NT print, and NT online. This way you can use the NT tag to have text appear in NT for both print and online, the NT print tag for print only, and NT online for online only."
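A minimal sketch of why Kelly's combined tags are needed, written in Python for illustration. It assumes that text carrying several condition tags is shown whenever any one of those tags is shown; that rule is an assumption made for this sketch, and the function and build names are hypothetical.

```python
def is_shown(applied_tags, shown_tags):
    """Assumed rule: untagged text always appears; tagged text appears
    if at least one of its tags is set to show."""
    return not applied_tags or bool(set(applied_tags) & set(shown_tags))


# With separate "NT" and "online" tags, text meant for NT online only
# carries both tags, so it leaks into the NT print build (which shows "NT").
leaks_into_print = is_shown({"NT", "online"}, shown_tags={"NT", "print"})

# Kelly's workaround: one tag per combination, plus a shared "NT" tag.
BUILDS = {
    "NT print": {"NT", "NT print"},
    "NT online": {"NT", "NT online"},
}

online_only_in_print = is_shown({"NT online"}, BUILDS["NT print"])
online_only_in_online = is_shown({"NT online"}, BUILDS["NT online"])
shared_nt_in_print = is_shown({"NT"}, BUILDS["NT print"])
```

Under that assumption, the number of tags grows with every platform and medium combination, which is one more reason to plan and chart the tags in advance.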

Jenni Miller of Pittsburgh, PA, used to work for a firm called Tartan Software, and now works for Texas Instruments, which has since acquired Tartan.

"We used a single set of (Unix-based) FrameMaker sources for each manual, which of course contained generic information as well as information specific to each of five different processors. Processor-specific information was tagged according to the processor. Using variables and conditional text, we could easily generate a processor-specific manual, such as the 'C3x Compilation System Manual."

The online help was created with the Bristol HyperHelp compiler:

" We used our FrameMaker sources to create on-line versions of the manuals. We used Bristol's HyperHelp compiler and viewer for the on-line versions. We used the following process to create on-line versions:

1. We added the appropriate HyperHelp marker tags within the Frame source files. This was a lot of work, but only a one-time job.

2. We would then set the sources accordingly to create a processor specific manual, and generate .mif files. We created a Gema script to do this.

3. The .mif files had to be edited so that the information would look good in the help viewer. (The format of our hard-copy contained a lot of white space, and this was not appropriate for on-line. Also, the font type and size needed to be changed and increased.) The Gema script mentioned above also carried out this step.

4. We could then use the Help compiler to create an .hlp file that contained an on-line representation of the manual. The user could navigate through the manual via a table of contents, the index, or the browse buttons.

Our debugger was the only tool that supported a GUI. We provided a type of context-sensitive help for the GUI. For example, if the user hit F1 while in a particular dialog box, the help window would display the page in the Debugging Tools Manual that described that dialog box. So it wasn't true on-line help, but it was a quick and dirty solution using single-sourcing."

Hope Creskoff, of Network Products & Services of Melbourne, FL, converted Frame files to HTML with Quadralay's WebWorks. She gave the following advice:

"Just make sure that the template in the HTML-conversion product is tweaked for your particular docs. Run a couple of tests, re-tweak, run more conversions, etc. till it's looking good."

One of the more creative responses was from Geoff Lane of GJC Technical Ltd. in the UK, who used WexTech's Documentation Studio (WexTech makes Doc-to-Help) to convert from WordPerfect 6.1 for Windows to produce output in HTML and in Windows Help.

"Attempting to target WinHelp and HTML from a single source is, however, a little trickier. Although both media are 'on-line' the facilities offered are significantly different. It is comparatively easy to include video, sound, plug-ins, etc. with HTML. WinHelp offers popup and multiple windows that are not currently supported by HTML (although I understand they will be soon). This inevitably restricts 'creative expression'. Using Doc Studio, it is possible to produce paper, WinHelp (3.x), WinHelp (95), 'standard' HTML, and HTML Help -- however, you really need to keep the limitations of each medium in mind."

Geoff's creative approach to the problem was to build his own database using Microsoft Access 2.0.

"The documentation was arranged into 'sub-documents' each describing a particular aspect for a particular option. The documentation was partitioned to achieve a minimal word-count consistent with readability. The partitioning structure was recorded in an Access database, which WordPerfect queried, via ODBC, at 'assembly-time'. Parts lists were also recorded in Access, together with their translated parts descriptions.

Total un-translated volume was about 1,200 pages of documentation and about two hundred images, underpinned by about 2,000 lines of WordPerfect macro and a significant amount of work in Access (to be fair, all the Access parts list data entry was done by technical clerks -- I just developed the database). This took about six person-months."

Jennifer Westerberg, of Access Health, Inc., used HelpBreeze to convert her source Word documents to online help.

" I'm using HelpBreeze to manage the single sourcing, and it's a huge improvement over working on two separate docs. It required quite a bit of initial set up to make it work the way I wanted, and I had to write my own custom macros to generate the format I wanted.

But now, I make the changes in the Word source files, generate the help file, then generate the printed doc. It takes HelpBreeze about 20 minutes to generate the printed doc (it's about 100 pages for the primary manual). Then I spend another 15 minutes or so reviewing the printed doc online for formatting consistencies, etc."

Today: FrameMaker to .pdf and HTML

To demonstrate a single source path, I wrote a small demonstration book entitled The Webmaster's Guide. I used conditional text in the book for sections specific to two Web sites: Peaceweb and Polonianet. I could turn the conditions for one on and generate one book in FrameMaker, and that carried through to Acrobat .pdf and HTML using Quadralay's WebWorks.

Here are the steps I took:

Authored in FrameMaker

Printed to a PostScript file

Converted PostScript to .pdf with the Adobe Acrobat Distiller

Added a few extra links with Adobe Acrobat Exchange

Created HTML from the FrameMaker (.fm) files with WebWorks

Touched up the HTML a bit

Everything worked fine, except for the very first .pdf file I ever created. Using an old version of Acrobat Distiller, a few "illegal colors" and "unrecognized objects" caused Distiller to quit before it finished the file. However, the error messages in the log were so clear that I found each page on which the conversion quit, changed the color or the object, and got it to work. An Adobe representative expressed horror that I should ever have encountered such a thing, and said the bug was fixed in a later version (it was).

Using Quadralay's WebWorks was not difficult. Ninety per cent of the work is in the mapping of FrameMaker styles to HTML styles. You can also define your own HTML styles with short WebWorks macros, which are shown in the dialog box where you map styles. I found them intuitive enough to define my own macro for "ChapterTitle" to match the ChapterTitle style in Frame which had no HTML equivalent.
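The idea of a style mapping can be illustrated outside any particular tool. The sketch below is plain Python, not WebWorks macro syntax; apart from ChapterTitle, the style names and the chosen HTML tags are assumptions made for the example.

```python
from html import escape

# Hypothetical mapping from FrameMaker paragraph styles to HTML tags.
STYLE_MAP = {
    "ChapterTitle": "h1",
    "Heading1": "h2",
    "Body": "p",
}


def to_html(paragraphs, style_map=STYLE_MAP, default_tag="p"):
    """Wrap each (style, text) pair in the tag mapped to its style.

    Styles with no mapping fall back to default_tag.
    """
    lines = []
    for style, text in paragraphs:
        tag = style_map.get(style, default_tag)
        # Escape &, < and > so the text is safe inside the element.
        lines.append(f"<{tag}>{escape(text, quote=False)}</{tag}>")
    return "\n".join(lines)


page = to_html([
    ("ChapterTitle", "Links & Images"),
    ("Body", "Every page needs a link back to the home page."),
])
```

Once the mapping is right, the same table serves every chapter, which is why most of the conversion effort goes into setting it up.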

The advantage of WebWorks is that it generates a table of contents and an index from those in FrameMaker. Of course, you have to map the table of contents styles to HTML Table of Contents styles (a few are provided by WebWorks).

The only thing problematic about the HTML conversion was a theoretical one: what do you do with the cover page? I eventually scrapped the cover and made a large heading right above the Table of Contents, which became the home page.

What's left out of my single source method for today is one crucial output: Windows help files. The conversion I tried (from Frame to .rtf to RoboHelp to .hlp) was badly flawed, and not just in efficiency: the actual conversions didn't work (something borne out by others I surveyed). Since Windows help files are likely to be replaced by some version of Microsoft's HTML Help and/or Netscape's NetHelp, I don't see any single source that will encompass online help today. Tomorrow, both SGML and XML will likely convert to a future version of HTML Help or NetHelp.

Tomorrow: Anything to an SGML Database

The best description of tomorrow's solution, using an SGML database, was an article entitled "Defining the Next Generation of Technical Authoring" by Jan Johnston-Tyler of Cisco Systems, Inc., in the February 1997 issue of Intercom, the STC magazine. In brief, here is what she and her group did:

Chose Information Mapping for their content

Decided on an SGML database and hired a consultant to set it up

Wrote their spec for a DTD, defining optional and mandatory elements and sequences

Hired a consultant to write the actual DTD

Wrote in Frame+SGML at a Map level but stored at an element (paragraph) level

Hired a consultant to write middleware to parse out their Frame+SGML file into database records

Their metadata consisted of Object ID and revision numbers.

When all of this work was complete, they could write in Frame+SGML, use their custom middleware app to break up their SGML file into database records and store it. The metadata was then written into the Frame+SGML file and also stored in a table in their database.
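To make the middleware step concrete, here is a rough sketch in Python of breaking a marked-up file into element-level database records keyed by object ID and revision. It is not the Cisco middleware, whose details are not given here: the element and attribute names are hypothetical, a regular expression stands in for a real SGML parser, and SQLite stands in for their database.

```python
import re
import sqlite3

# Hypothetical, simplified markup; a real Frame+SGML export would need
# a proper SGML parser and a DTD.
SAMPLE = """<map id="M1" rev="2">
<para id="P1" rev="1">Open the Printers folder.</para>
<para id="P2" rev="3">Choose Set As Default.</para>
</map>"""

PARA = re.compile(r'<para id="([^"]+)" rev="(\d+)">(.*?)</para>', re.S)


def store_elements(markup, conn):
    """Store each paragraph element as one record; return how many were stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS element ("
        "object_id TEXT, revision INTEGER, content TEXT, "
        "PRIMARY KEY (object_id, revision))"
    )
    rows = [(oid, int(rev), text.strip()) for oid, rev, text in PARA.findall(markup)]
    conn.executemany("INSERT OR REPLACE INTO element VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
stored = store_elements(SAMPLE, conn)
```

Even this toy version shows where the cost lies: every element needs a stable ID and a revision number before it can be stored and retrieved independently.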

A word about scale: Cisco Systems, Inc., Jan's company, has 200 people in its documentation and training group alone. At Simware, we have fewer than 100 employees in total. Our decision is likely to be to do today's single sourcing with its limitations, and wait for an affordable product to manage an SGML database and to parse a Frame+SGML file into database records.

Conclusions

If you're creating an SGML database today, you should consider following Jan Johnston-Tyler's example. The only thing I would change from this technique is the actual record that is stored. I would store each information chunk (each instance of an information type) rather than each paragraph. Of course, I haven't authored with one of these systems, and I can imagine a situation in which there's a procedure which is the same for three different versions of a product. Unless I could define a variable and place it in the procedure, I'd have to store the procedure by paragraph so as not to duplicate information.
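The variable idea can be sketched in a few lines of Python. This is only an illustration of storing one chunk and filling in a variable per version; the procedure text, the version numbers, and the function name are hypothetical.

```python
from string import Template

# One stored procedure chunk with variables in place of the
# product name and version (wording is illustrative only).
PROCEDURE = Template(
    "To change the default printer in $product $version, "
    "open the Printers folder and choose Set As Default."
)

VERSIONS = ["1.0", "2.0", "3.0"]


def render_all(template, product, versions):
    """Produce one rendering of the same stored chunk per product version."""
    return {v: template.substitute(product=product, version=v) for v in versions}


rendered = render_all(PROCEDURE, "Salvo", VERSIONS)
```

With a mechanism like this, the whole procedure could be stored once as a chunk, and only the variable values would differ between versions.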

For those of us authoring in Frame, using proven conversion tools and keeping a single source in FrameMaker is the least costly, most accurate method of producing multiple output media and multiple versions using Frame's conditional text. But with SGML and XML on the horizon, creating an SGML-compatible structure for documents now will make writing a DTD and making existing documents comply with that DTD much easier in the future. And information typing will do the same thing for content that a good DTD does for structure, and make moving to an SGML database much easier.

Acknowledgments

I wish to thank Jan Johnston-Tyler of Cisco Systems, Inc. for information on her path to a single source, Paul Therriault of Adobe Systems for an evaluation copy of Frame+SGML, and Kris of Quadralay for a Beta evaluation copy of WebWorks Publisher 3.5.

References

1. Johnston-Tyler, Jan: (February, 1997), "Defining the Next Generation of Technical Authoring", Intercom, Arlington, VA, USA, Society for Technical Communication

2. Ronquillo, Omy and Moorman, Jan: (1997), "A Process for Implementing Online Documentation and Online Help Using SGML", 1997 Proceedings of the Society for Technical Communication, Arlington, VA, USA, Society for Technical Communication

3. Flanders, Melanie G. and Smart-Wycisio, Nicole Y.: (1997), "Information Deliver: Single Source Documentation for Multiple Delivery Mechanisms", 1997 Proceedings of the Society for Technical Communication, Arlington, VA, USA, Society for Technical Communication

4. North, Simon, "SGML->XML", posting on TECHWR-L email mailing list, 20 May, 1997

5. Dhénin, Cyril, "HTML/XML, le schisme", in le Monde Informatique, 21 mars 1997
