Man Pages as HTML is hard

And one Dollar per request could be saved.


I really like to link to valuable sources in my articles. Man pages are a hard problem here. There are sites out there that deliver them as HTML, but they are not very handy.

Various tools exist, and some existed even before I had internet access at all. One of the earliest still in use is man2html. A more recent one is roffit. I tried both and even attempted to improve them, but I gave up.

My average desktop Linux offers nearly 3000 section 1 man pages. They are full of sections and program arguments. But the man format (man 7 man) is not a very good starting point for structured documents. Roffit's sample rendering of the curl man page is very good, but the two were also made for each other. Running roffit on xinput's man page yields no anchors for the program arguments. Translating zstdcat's man page with roffit even produces partly garbled section headings and arguments.

Some of the issues available:

The last point is the most serious. There is no dedicated structure for program arguments. Instead, a man page author could use a new paragraph with a bold marker for the argument itself, followed by the text:


.PP
.TP 8
.B --enable \fIdevice\fP
Enable the \fIdevice\fP. This call is equivalent to
.B xinput --set-prop device \fI"Device Enabled"\fP 1
	

A more experienced author would use an indented paragraph (.IP), whose tag allows a sort of heading or title. The next sample is taken from curl's man page:


.IP "-v, --verbose"
Makes curl verbose during the operation. Useful for debugging and seeing
what's going on "under the hood". A line starting with '>' means "header data"
sent by curl, '<' means "header data" received by curl that is hidden in
normal cases, and a line starting with '*' means additional info provided by
curl.
	

Holding the previous two against the man page of zstdcat illustrates that there is no standardization for arguments at all. This third man page not only uses the character '#' but also escapes minus signs with backslashes:


.
.TP
\fB\-\-stream\-size=#\fR
Sets the pledged source size of input coming from a stream\. This value must be exact, as it will be included in the produced frame header\. Incorrect stream sizes will cause an error\. This information will be used to better optimize compression parameters, resulting in better and potentially faster compression, especially for smaller source sizes\.
	

Rendering man pages with anchors for program arguments requires sophisticated pattern matching. It is hard not only to find the arguments but also to transform them into valid HTML anchors.

And all this is already summarized in the section 7 man page of man itself (man 7 man):

So instead, use the already available info pages created from Texinfo. Write a sane primer as the man page and the full version only as an info page. In turn, there is less post-production effort when rendering for other media. There is not only @option but also @opindex, @defun and many more to add meaning to arguments.

  1. man2html – man2html Savannah Project
  2. roffit at GitHub – GitHub repository
  3. Texinfo – GNU Texinfo
  4. Unix documentation (Nov. 2020) – For the love of troff