Perplexed by complex syntax: understanding syntax diagrams in DITA
What DITA elements are available for syntax diagrams? And how does one go about using them?
The other day, a client asked for help with formatting syntax diagrams in DITA sources, using the standard metacharacters: brackets, braces, and bars (often called “Unix” syntax diagrams). Although I’ve known about the DITA reference topic specialization and the refsyn and syntaxdiagram elements, I’ve never had much call to use them.
Digging in the DITA specification and asking Mr. Google didn’t turn up much…other than the fact that the refsyn and syntaxdiagram elements exist and that Deborah Pickett has a plugin that will produce beautiful railway diagrams (once you get your syntax coded…but how do you encode the syntax?).
What I was looking for, but couldn’t find, was something that shows source and output of a few simple syntax diagrams: “To get this, do this.” The DITA specification comes close; it shows an example of the syntaxdiagram element, but fails to provide a corresponding example of what the output might look like (nor does the example descend to the attribute level, which is important).
After digging into a number of different sources, I was able to put together a reasonable example (which I’ll share here). But along the way, I discovered something somewhat disconcerting: I wouldn’t want to use any of it.
A test case
Let’s look at a simple command from a fictitious command line interpreter. Call it “doit.”
The doit command has two required parameters (represented by the variables in and out) and two optional command line switches (/a and /b). The /b switch requires one of three keywords: low, medium, or high. In a “Unix” syntax diagram, the doit command might be represented like this:
doit in out [/a] [/b {low | medium | high }]
There are essentially three ways to encode this syntax in DITA, which range from inappropriate use of tags (codeblock, currently used by the client) up through properly tagged content (syntaxdiagram, which turns out to be impractical).
Using codeblock
It is possible to contain syntax diagrams in the codeblock element, but this borders on tag abuse (using elements for the desired appearance, rather than for their semantic meaning). However, this is the most readable of the three approaches. To use syntax metacharacters (braces, brackets, and bars), you’ll need to hard-code them in the text. (I added line breaks for readability.)
<codeblock>
doit <varname>in</varname> <varname>out</varname>
[/a] [/b {low | medium | high }]
</codeblock>
For further precision, you can use some inline elements to identify the contents of your syntax diagram:
- cmdname – The name of the command
- keyword – A keyword
- varname – A variable
<codeblock>
<cmdname>doit</cmdname> <varname>in</varname> <varname>out</varname>
[/<keyword>a</keyword>]
[/<keyword>b</keyword>
{<keyword>low</keyword> | <keyword>medium</keyword> | <keyword>high</keyword> }]
</codeblock>
Using synph
The synph element is a step up from the codeblock. First, it’s much more semantically appropriate. Second, because it is more correct semantically, you can modify the transforms in the DITA OT to give your syntax diagrams a different appearance from code examples. However, the synph element is (supposedly) an inline element, whereas syntax examples usually begin on a new line. Using an inline element and expecting block-like behavior is a questionable practice. But again, this form is fairly readable. As with codeblock, you’ll have to hard-code syntax metacharacters.
The synph element can contain inline elements that identify the keywords, variables, and other aspects of the syntax. These are different from the elements used in a codeblock element. The DITA elements for semantic markup of synph contents include:
- kwd – Keywords (by default, formatted in bold in OT HTML transforms, and no style applied in OT PDF transform)
- var – Variables (formatted with italics)
- delim – Delimiters, such as quotation marks, slashes, or hyphens
- sep – Separators, such as required commas separating list items
There are other elements; the ones in the preceding list are important to our discussion. See the DITA specification for full details. The delim and sep elements do not provide any special formatting behavior, although the plugins can be extended to handle them differently.
<synph>doit <var>in</var> <var>out</var>
[/<kwd>a</kwd>]
[/<kwd>b</kwd> {<kwd>low</kwd> | <kwd>medium</kwd> | <kwd>high</kwd> }]
</synph>
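Because the metacharacters stay as literal text in synph, marking up an existing plain-text syntax line is mostly a matter of wrapping words in tags. Here is a minimal Python sketch of that idea. It is not a tool we used on the project: the function name to_synph, and the rule that the first word is the command name and stays untagged (as in the example above), are assumptions made for illustration.

```python
import re
from xml.sax.saxutils import escape

# A "word" is a run of letters, digits, underscores, or hyphens starting with a letter.
WORD = re.compile(r"([A-Za-z][A-Za-z0-9_-]*)")


def to_synph(syntax, variables):
    """Wrap a Unix-style syntax line in synph, tagging words as var or kwd.

    Assumption: the first word is the command name and is left untagged;
    every later word is a var if listed in `variables`, otherwise a kwd.
    Brackets, braces, bars, and slashes are kept as hard-coded text.
    """
    # Splitting on a capturing group keeps the words: odd indexes are words,
    # even indexes are the literal text (spaces, metacharacters) between them.
    parts = WORD.split(syntax)
    out = []
    seen_command = False
    for index, part in enumerate(parts):
        if index % 2 == 0:
            out.append(escape(part))  # escape &, <, > in the literal text
        elif not seen_command:
            out.append(part)
            seen_command = True
        elif part in variables:
            out.append(f"<var>{part}</var>")
        else:
            out.append(f"<kwd>{part}</kwd>")
    return "<synph>" + "".join(out) + "</synph>"


print(to_synph("doit in out [/a] [/b {low | medium | high }]", {"in", "out"}))
```

For the doit line, this produces the same markup as the hand-written synph example, only on a single line.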
Using syntaxdiagram
The syntaxdiagram element is the proper block-mode element to use for syntax diagrams. The elements used within syntaxdiagram allow the DITA Open Toolkit to format the syntax diagram using a variety of representations. The default is to use the syntax metacharacters, but there is a plugin that creates railway diagrams (http://tech.dir.groups.yahoo.com/group/dita-users/message/14504).
The three most useful elements within the syntaxdiagram element are groupchoice, groupseq, and groupcomp. The other elements allowed in syntaxdiagram are for depicting the diagram in sub-pieces or for attaching notes. Here’s when to use the three group elements:
- groupseq – Contains a sequence of elements that must occur in the order shown.
- groupchoice – Contains a number of elements from which you can make a choice. The importance attribute indicates whether the contents are optional or required.
- groupcomp – Contains a sequence of elements that must be formatted close together.
You can nest any of these three group elements inside other group elements.
The same inline elements used in synph can be used in the syntaxdiagram group elements, so each piece of the syntax can be identified precisely.
<syntaxdiagram>
<groupseq>
<kwd>doit</kwd>
<var>in</var>
<var>out</var>
<groupchoice importance="optional">
<groupseq>
<sep>/</sep>
<kwd>a</kwd>
</groupseq>
</groupchoice>
<groupchoice importance="optional">
<groupseq>
<sep>/</sep>
<kwd>b</kwd>
<groupchoice importance="required">
<kwd>low</kwd>
<kwd>medium</kwd>
<kwd>high</kwd>
</groupchoice>
</groupseq>
</groupchoice>
</groupseq>
</syntaxdiagram>
However, by this point, most of you are saying, “Ugh!”
And that’s my reaction, too. In fact, there are several shortcomings to the DITA implementation of syntax diagrams.
- The contents of the syntaxdiagram element are opaque. Unless you’re well versed in the syntaxdiagram elements, the meaning of the content is inscrutable. It would be great if there were a visual editor for this type of content (as there is for MathML-encoded mathematical equations).
- The contents of synph and syntaxdiagram are entirely different. I like to think of the codeblock and codeph elements as two halves of the same whole; I use codeblock for a full example and codeph as inline text when quoting snippets from the code example. The elements used in codeph are a reasonable subset of the elements used in codeblock. However, when I use syntaxdiagram for the full syntax, the Open Toolkit handles the syntax metacharacters; when I use synph, I have to provide them myself.
- I’m surprised that synph and syntaxdiagram don’t provide a separate element for the command name. I don’t like the idea of using the kwd element (because I see the command name as being separate from other parts of the command). What’s more, it might be useful to do a faceted search on individual commands, which a separate element would facilitate.
- The DITA Open Toolkit does not correctly implement the output for the group elements in a syntaxdiagram. When groupchoice is used with importance="optional", the Open Toolkit surrounds the content in both braces and brackets. I have never encountered this in practice, so I’m presuming it’s an error. I was able to override the behavior in my plugins.
Similarly, there is no special formatting applied for the kwd or keyword element. Second-guessing the Open Toolkit group, this might be a deliberate choice, as these may be applied simply to identify content, rather than indicate formatting. Again, I overrode this behavior in my plugins. (Note that I only tested what I needed to implement; I can’t vouch for the completeness of the DITA OT implementation of the other elements allowed in syntaxdiagram.)
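To make the mapping I expected concrete, here is a minimal Python sketch that renders the group elements as brackets, braces, and bars. It is not the DITA Open Toolkit’s implementation and not the override from my plugins; it only illustrates the rules. Several assumptions apply: the function name render is made up, a groupchoice without importance="optional" is treated as required, and the markup is a variation of the doit example above that uses groupcomp for each switch so that the slash and the letter stay together.

```python
import xml.etree.ElementTree as ET

# Variation of the doit example: groupcomp keeps "/" and the switch letter together.
DOIT = """
<syntaxdiagram>
  <groupseq>
    <kwd>doit</kwd>
    <var>in</var>
    <var>out</var>
    <groupchoice importance="optional">
      <groupcomp><sep>/</sep><kwd>a</kwd></groupcomp>
    </groupchoice>
    <groupchoice importance="optional">
      <groupseq>
        <groupcomp><sep>/</sep><kwd>b</kwd></groupcomp>
        <groupchoice importance="required">
          <kwd>low</kwd>
          <kwd>medium</kwd>
          <kwd>high</kwd>
        </groupchoice>
      </groupseq>
    </groupchoice>
  </groupseq>
</syntaxdiagram>
"""


def render(elem):
    """Render a syntaxdiagram subtree as a Unix-style syntax string."""
    if elem.tag in ("syntaxdiagram", "groupseq"):
        # A sequence: items in order, separated by spaces.
        return " ".join(render(child) for child in elem)
    if elem.tag == "groupcomp":
        # A composite: items formatted close together, with no spaces.
        return "".join(render(child) for child in elem)
    if elem.tag == "groupchoice":
        body = " | ".join(render(child) for child in elem)
        if elem.get("importance") == "optional":
            return "[" + body + "]"  # brackets only, never braces as well
        return "{" + body + "}"      # assumption: anything else is required
    # Leaf elements (kwd, var, sep, delim): just their text.
    return (elem.text or "").strip()


print(render(ET.fromstring(DOIT)))
```

With these rules, an optional group gets brackets only, and braces are reserved for a required choice, which is the behavior I wanted from the Open Toolkit.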
So what did we do?
Of the approaches, using synph seems to make the most sense (and that’s what our client chose to do). It keeps syntax diagrams readable and easily editable, but it also distinguishes them semantically from code examples.
I would use syntaxdiagram only if required by a client. A visual editor for syntaxdiagram would help a great deal. If developers’ sources for the command language were available, I would also search for (or build) a tool to convert the sources directly into syntaxdiagram elements. But again, only if required.
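As a rough idea of what such a converter might look like, here is a minimal Python sketch that builds the syntaxdiagram markup for doit from a small command description. Everything about the input is hypothetical: the DOIT_SPEC dictionary and its keys (command, params, switches, values) are invented for illustration and stand in for whatever the developers’ sources actually provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical command description; a real one would come from the developers' sources.
DOIT_SPEC = {
    "command": "doit",
    "params": ["in", "out"],
    "switches": [
        {"name": "a"},
        {"name": "b", "values": ["low", "medium", "high"]},
    ],
}


def build_syntaxdiagram(spec):
    """Build a syntaxdiagram element tree from a command description."""
    diagram = ET.Element("syntaxdiagram")
    seq = ET.SubElement(diagram, "groupseq")
    ET.SubElement(seq, "kwd").text = spec["command"]
    for name in spec["params"]:
        ET.SubElement(seq, "var").text = name
    for switch in spec["switches"]:
        # Every switch is optional in this sketch.
        optional = ET.SubElement(seq, "groupchoice", importance="optional")
        inner = ET.SubElement(optional, "groupseq")
        ET.SubElement(inner, "sep").text = "/"
        ET.SubElement(inner, "kwd").text = switch["name"]
        if switch.get("values"):
            # A switch that takes a value requires exactly one of the keywords.
            values = ET.SubElement(inner, "groupchoice", importance="required")
            for value in switch["values"]:
                ET.SubElement(values, "kwd").text = value
    return diagram


print(ET.tostring(build_syntaxdiagram(DOIT_SPEC), encoding="unicode"))
```

The generated tree has the same structure as the hand-coded syntaxdiagram example, which is the point: nobody should have to type that markup by hand.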
Do you have to include syntax diagrams in your DITA topics? What has been your experience?
Leigh White
Hi Simon,
What an enjoyable post. No, really! You point out some of the very things that have driven me crazy trying to develop team standards for documenting code. The DITA purist side of me and the practical side of me did battle for quite a while. Fortunately, not in public. Much. Ultimately, we decided on plain codeblock for most code samples for the reason that they are exported from the application code into separate topics, which are then conref’d into our content. A single code sample might be exported multiple times as the code evolves. We decided it was too onerous to set up an export into full-fledged syntax diagrams. This code is copied and pasted by clients into their applications for customization, so it was critical that the code appear in our documentation EXACTLY as it appears in the original application code. Trying to use DITA syntax diagrams raised the risk of something being lost (or added) “in translation.” Moreover, we don’t deconstruct the code, so there’s no value added in identifying the parts of the sample. So…although it was fun to play with syntaxdiagram and we do use it for a few things, it’s definitely the exception for us.
Barbara Douma
Thanks Simon,
Your post is very timely, as I spent several days last week looking for information on and playing with syntaxdiagram markup. Not because I want diagrams but because I want to find a way of semantically binding dynamic user assistance to our code editor. But without an adequate representation when editing, it is just too error prone for authors to use effectively. I’ll look into synph now.
Mark Baker
Simon,
This illustrates one of my (many) beefs with DITA. One of my cardinal rules of markup design is: don’t add markup to what is already structured text. Markup should be used to add structure where it does not exist, not to re-mark-up text that is already structured.
A syntax diagram is already structured text. You don’t need to add anything to it to make it parsable. All you do by wrapping XML tags around it is force the author to do parsing that ought to be done by the machine.
At most, in a syntax diagram, you might need to distinguish variables from literals, which I would generally prefer to do with a meta-character rather than XML tags, preferably a meta-character already in common use for that purpose, such as <> or {}.
The whole point should be to make the data entry easy for the author, not to have the author jump through hoops to serve some publishing script.
In SPFE projects, for other types of syntax diagrams (not specifically function syntax) I have used EBNF notation for expressing syntax. It is well known, well documented, and much more compact than XML (it is used by the XML spec itself). But I only use it for authors who are familiar with EBNF. An authoring format should be all about the author, and there is no need to use XML tagging for everything. It is usually better to use XML to delineate more task-appropriate and audience-appropriate structured formats so that you can identify and parse them on the back end.
Susanne Muris
Hi Simon,
actually, I have a single manual with numerous syntax diagrams. And I wasn’t forced to include them, at least not literally.
The manual is a command line reference. Initially, the command line help (UNIX style) was generated automatically. For some reason, maybe because the commands were fairly complex, in the generated output, almost every argument was marked as optional – this was clearly wrong.
Some parts of these commands were highly repetitive (for instance, pretty much every command would accept the same arguments to log on to a server).
So first of all I used the syntax diagrams (groupseq, groupchoice, importance attribute) in discussions with developers about the actual syntax. In addition, I used the fragment element for reuse.
It was cumbersome, but the lesser evil in this case: I could ask very precise questions and finish the project, QA had an easier job, the software became much better, and the misleading generated help was replaced.
About a year later, the syntax for logging on was revised, and I only had to change one fragment and a few parameter descriptions …
BTW, I also customized the output UNIX style – railroad diagrams look just too scary. And I also tested only the part that I actually used …
Leslie Turriff
I’ve been familiar with railroad track diagrams for a long time, and I’ve been watching the progress(?) of these diagram generators as well. None of them seems to have a mechanism for diagramming default options or arguments*, and that seems to me to be an important requirement which is not being met. Because of this I write my own diagrams by hand, using Unicode box characters with a monospace font. It’s time-consuming, but at least my diagrams properly reflect the syntax’s realities.
*See page xvii of http://publibz.boulder.ibm.com/epubs/pdf/hcse2c30.pdf for an example of a default path in a railroad track diagram.