View HTML and XML

By: Jaap van Lelieveld

Release date: August 17, 1999

  1. Installation and system requirements
  2. The program
  3. The command line
  4. The INI-file
  5. SETUP paragraph attributes
  6. Internal procedures
  7. Direct cursor navigation
  8. Remarks on HTML interpretation
  9. Remarks and comment

Abstract: Some time ago (on April 16, 1998) I needed an HTML viewer. The main reason was the output of the Adobe PDF2HTML converter: its HTML output allows much more navigation than its text output. I started looking for an easy-to-use HTML viewer that still works in DOS. Most viewers I found only removed the HTML tags; others, like LYNX for DOS, were too complex to use.

So I tried it myself.

L_H is an HTML viewer (not a browser!) for DOS. Since the first release, XML facilities have been added.

What is special about this viewer is that you can easily configure it to meet your personal needs. In some cases, for example, it is very useful to be informed about certain tags; in other cases you might prefer plain text.

Besides several basic functions, you can change the INI-file or create alternate INI-files, which you can load through a command-line parameter.

This document is only available in English.

1. Installation and system requirements

To install the program, copy at least the active files (L_H.EXE, L_H.HTM and L_H.INI) to a directory in your DOS path. The distribution ZIP also contains the DOS batch files L_H_I.BAT and L_H_INI.BAT, which can be used to rebuild the L_H.INI file, for example with alternate user-interface languages. If you prepare or change components for the INI-file, you can update the L_H_INI.BAT batch file (and the basic components it refers to) so that the new components can be selected.

The program has no specific system requirements, but the following notes may be useful:

  * Although the program runs on any IBM-compatible PC, slower processors (below a 486) may perform relatively slowly, especially during the conversion phase. A disk cache may speed this up. A limited amount of temporary disk space is needed during execution.
  * If the /Convert option is not used, temporary disk space (equivalent to the size of the documents read at one time) is used for files named $$$nnnnn.THM, $$$LFL$$.THM and $$$TOC$$.THM. These files are created in the directory referred to by the TEMP DOS environment variable; if this variable is not set, the program directory is used instead. Normally these files are deleted when the program ends.

If the program stops with an error, these files are not removed.
Make sure enough free disk space is available for the temporary files. Note that a TEMP environment variable is required if the program is run from a CD, since the program directory cannot be used in that case.

  * If available, the program uses XMS memory. The status panel (Shift-F1) shows whether and how XMS memory is used.
If no XMS memory is available, the program may run into trouble when large documents are loaded, so running it in an XMS environment is recommended if you plan to use larger documents.
  * The program supports long file names and the {close} request when running in a DOS box under MS Windows. The {close} request only works if the program is started directly, e.g. using the Run option in the Start menu.
  * If you want the program to handle XML as well, add L_H_XML.INI to the default INI-file or add it on the command line.
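
As a rough illustration of the temporary-file rules above, here is a Python sketch (not L_H's own code). The function names are hypothetical, and the numbering of $$$nnnnn.THM (a zero-padded sequence number per document, starting at 1) is an assumption; the manual does not say what nnnnn stands for.

```python
import ntpath  # DOS/Windows path rules, usable on any host OS


def temp_directory(environ, program_dir):
    """Directory for the intermediate files: TEMP if set, else the program directory."""
    return environ.get("TEMP") or program_dir


def temp_files(environ, program_dir, n_documents):
    """Full paths of the temporary files for n_documents loaded documents.

    Assumption: nnnnn in $$$nnnnn.THM is a zero-padded document number
    starting at 1; the real numbering scheme is not documented.
    """
    directory = temp_directory(environ, program_dir)
    names = ["$$$%05d.THM" % n for n in range(1, n_documents + 1)]
    names += ["$$$LFL$$.THM", "$$$TOC$$.THM"]  # the two fixed file names
    return [ntpath.join(directory, name) for name in names]
```

Running from a CD corresponds to a read-only program_dir, which is why TEMP must be set in that case.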

2. The program

2.1. Basic functionality

The program offers the following basic functionality:

  * Support for basic HTML tags like new line (BR), paragraph (P), etc.
  * Support for presentation aspects in text for HTML tags like font changes (FONT), bold/italic/underline (B/I/U), etc., and for color and background color attributes. Quite a number of these tags exist; if you need some that are not covered yet, add them to the INI-file. You can also change the way tags are presented by changing the INI-file (see the string attributes).
  * Support for links and anchors: HTML tags like A and LINK, tag attributes like LONGDESC, and XML tags like LOC.
  * Support for tables in HTML and XML, both single-level and nested. Note that the regular table presentation may produce (very) narrow columns because of the limited width of a DOS screen.
As an alternative, a table presentation with cell numbering is available (see the /T option). This mode also supports nested tables, and it displays cell names (the TITLE attribute).
  * Support for (nested) ordered and unordered lists (OL, UL), definition terms (DT), definition descriptions (DD) and list items (LI) in HTML. Support for list types (BLIST/ITEM, GLIST/ITEM, OLIST/itm, ULIST/ITM) in XML.
  * Simple support for frame sets and maps. Frames in the frameset and areas in a map are displayed in a list of selectable anchors.
  * Support for non-standard symbols. This does NOT yet include the SYMBOL character set or ISO standards other than ISO 8859-1.
Symbols can occur in several formats: "&name;", "&nnn;" or "&Xnn;", where
name = a literal name (like "nbsp");
nnn = a decimal numeric constant (like "&131;");
Xnn = a numeric constant in hexadecimal format (like "&XA0;").

If you add or modify symbol paragraphs, check which format you need. You can add both forms of each symbol, one using the name and one using the numeric value.

  * Support for suppressing specific text, such as scripting procedures, comments, etc.
  * Support for special screening on HTML files produced by the Adobe PDF2HTML converter (see www.adobe.com/access).
  * A file selection list (F5) that lets you select an (HTML) file. You can type letters to navigate in the file list.
  * Support for long filenames as used in DOS 7 (the DOS-box in MS Windows).
  * Support for the MS Windows {close} request (when the program is started directly, e.g. using the Run option in the Start menu).
  * Support for logging the URL information of an anchor or link.
  * Support for bookmarks, which let you restart at a certain page of a document. Note that only one bookmark can be saved per document, either with F10 or by simply quitting the program (if the INI-file option is set).
  * Support for line-by-line direct cursor navigation with the cursor keys or number keys, e.g. on the numeric keypad. This includes marking, grabbing and saving text blocks from the screen to the log.
  * The program offers a multilingual user-interface. To use another user-interface language, translate the messages in the INI-file, following the hints and remarks there. The languages currently available are English and Dutch. The program can handle up to five languages at one time, and you can switch between them with Shift-F8. The user-interface language loaded first is the start-up and default language. If an HTML document has a LANG attribute (in the HTML or BODY tag), the program tries to honor this request by switching the user-interface language.

If you translate the user-interface messages into a new language, please send the translation to me so that I can distribute it as an additional user-interface language.

  * The program offers a text-search facility. The cursor is positioned on the first match found on a page; if there are more matches, all matching lines are reported. The closest link preceding the found text on the displayed page, if any, becomes the active link.

If the search is repeated with the same search string, the current page is not checked again; searching starts on the next or previous page.

  * If no INDEX.HTM is present, the program offers a file search and file selection facility. This file selection list can be saved as an INDEX.HTM file using the /I option.
  * The program can generate a table of contents automatically (see the /G option). This contents overview is added after the document text. You determine which HTML tags are used in this generation process; by default the HTML heading tags are used.
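
The search behaviour described in the text-search item can be sketched in Python as follows. This is an illustration under assumptions, not the program's implementation: pages are modelled as lists of screen lines, matching is taken to be case-insensitive, and the helper names are hypothetical.

```python
def search(pages, text, page, forward=True, repeat=False):
    """Find the nearest page containing `text`.

    pages  -- list of pages, each a list of screen lines
    page   -- index of the currently displayed page
    repeat -- True when the same search string is searched again
    Returns (page_index, matching_line_indices) or None.
    Assumption: matching is a case-insensitive substring test.
    """
    needle = text.lower()
    step = 1 if forward else -1
    if repeat:
        page += step  # a repeated search does not check the current page again
    while 0 <= page < len(pages):
        hits = [i for i, line in enumerate(pages[page]) if needle in line.lower()]
        if hits:
            return page, hits  # cursor goes to hits[0]; all hits are reported
        page += step
    return None


def active_link(link_lines, first_hit):
    """Closest link at or before the first match on the page, or None.

    link_lines -- line indices of the links on the displayed page
    """
    preceding = [line for line in link_lines if line <= first_hit]
    return max(preceding) if preceding else None
```

A forward repeat from page 0 that finds nothing on page 1 continues to page 2, mirroring the rule that the current page is not checked again.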

If you need more (or less) specific output, add or remove the corresponding paragraphs in the INI-file. Be aware that once you start changing HTML tag handling, some background knowledge of HTML will be very useful.
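
Because a symbol paragraph can be added for the name as well as for the numeric value, all three symbol formats can be resolved through one lookup table. The Python sketch below is hypothetical: SYMBOLS stands in for the ENTITY paragraphs of the INI-file, and the sketch additionally accepts the '#' used by standard HTML numeric references (&#160;, &#xA0;). Leaving unknown symbols unchanged is an assumption about what "ignored" means.

```python
import re

# Hypothetical stand-in for the ENTITY paragraphs: each symbol is listed
# once by name and once by numeric value, with ASCII output strings.
SYMBOLS = {"nbsp": " ", 160: " ", "copy": "(c)", 169: "(c)", "amp": "&", 38: "&"}

REFERENCE = re.compile(r"&#?(\w+);")


def symbol_key(body):
    """Map the text between '&' and ';' to a table key."""
    if body.isdigit():
        return int(body)                 # &nnn;  decimal value
    if body[:1] in ("X", "x") and len(body) > 1:
        try:
            return int(body[1:], 16)     # &Xnn;  hexadecimal value
        except ValueError:
            pass                         # e.g. "Xi" is a name, not a number
    return body                          # &name; literal name


def decode_symbols(text, table=SYMBOLS):
    """Replace every known symbol; unknown ones are left as they are."""
    def replace(match):
        return table.get(symbol_key(match.group(1)), match.group(0))
    return REFERENCE.sub(replace, text)
```

For example, decode_symbols("A&nbsp;B&160;C&XA0;D") gives "A B C D", since all three forms map to the same table entry.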

2.2. Key usage

The table below lists all keys the program responds to in regular display mode. In direct cursor navigation mode other keys apply; see chapter 7 for details on key usage in that mode.

Key             Description
F1              Show a help screen
F2              Search forward
F3              Go to a (known) page number; the page numbers in the upper right corner apply
F4              Add a named file
F5              Show the file list for file selection
F6              Clear the chain of followed links and go back to the top level, i.e. the first HTML page requested
F7              Jump to the previous HTML document displayed
F8              Jump to the next HTML document displayed
F9              If the cursor is on a link, display the link target (the location the link refers to)
F10             Save the currently displayed page as a bookmark, used when this HTML page is opened again later
Shift-F1        Show the program status, including XMS usage
Shift-F2        Search backward
Shift-F8        Switch between the available (loaded) user-interface languages
Shift-F9        If the cursor is on a link, write the link target (the location the link refers to) to the log file
Cursor Down     Jump to the next link; if the current link is the last link on the current page, the next page is displayed; if this new page has no links, the cursor is positioned in the bottom line
Cursor Up       Jump to the previous link; if the current link is the first link on the current page, the previous page is displayed; if the new page has no links, the cursor is positioned in the bottom line
Cursor Right    Select and execute the link (same as Enter)
Cursor Left     Go back (up) one link level
End             Jump to the end of the displayed document
Home            Jump to the top of the displayed document
Page Up         Jump to the previous page
Page Down       Jump to the next page
Escape          Leave the program
Enter           Select and execute the link (same as Cursor Right)
Tab             Jump to the next link, skipping any pages that have no links
Shift-Tab       Jump to the previous link, skipping any pages that have no links
=               Save the current page to the log (in text format)
+               Activate direct cursor navigation (see chapter 7 for key usage details)
-               Quit / reset direct cursor navigation

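Note the difference between Tab and Cursor Down (and between Shift-Tab and Cursor Up): Cursor Down and Cursor Up stop on every page, placing the cursor in the bottom line of a page without links, whereas Tab and Shift-Tab skip pages that have no links.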
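
To make the difference concrete, here is a small Python model (hypothetical, not taken from L_H) in which each page is reduced to its number of links. A link value of None means the cursor sits in the bottom line of a page without links; what happens at the end of the document is an assumption.

```python
def cursor_down(links, page, link):
    """links[p] is the number of links on page p; link is None when the
    cursor sits in the bottom line of a page without links."""
    if link is not None and link + 1 < links[page]:
        return page, link + 1                      # next link on the same page
    if page + 1 < len(links):
        page += 1                                  # always stop on the next page
        return (page, 0) if links[page] else (page, None)
    return page, link                              # assumption: stay on the last page


def tab(links, page, link):
    """Like cursor_down, but pages without links are skipped."""
    if link is not None and link + 1 < links[page]:
        return page, link + 1
    for p in range(page + 1, len(links)):
        if links[p]:                               # skip pages without links
            return p, 0
    return page, link                              # assumption: no further link, stay put
```

With links = [2, 0, 1], Cursor Down from the last link on page 0 stops on the empty page 1, while Tab goes straight to the link on page 2.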

3. The command line

You can enter different types of parameters on the command line. For online help, use the /? option. The general syntax is:
L_H <file specification>|<filename>... [OPTIONS] [/INI=<filename> ...] [/LOG=<filename>]

3.1. A file specification, or HTML or XML files to convert or show

You can specify one or more HTML-files. The first one will be displayed. Links can be used to jump to other files.

If you use the /Convert option, all files you specify are converted to ASCII text, with a form feed (Ctrl-L) between them. The output is written to L_H.OUT.

If a file specification is given, if a find request is made (see the /Find option), or if no HTML files are supplied on the command line, the program builds a selection page listing all relevant files in the current directory, ordered by document title. You can select files from this page and return to it.

If no .HTM files are found, this help text is displayed.

Only one file specification is used. A file specification can contain wildcard characters, but it cannot be combined with an HTML file name on the command line.

When the program tries to find a file, it resolves the file name using the following steps, in the order listed:

  * If a file is found at any step, the search stops. The file path of the file found is then remembered for future use.
  * If a file name contains the character '/' (slash), which is common on Unix, each '/' is replaced by a '\' (backslash).
  * If a referenced file name is empty or refers to the root directory (a reference like '/' or '\'), the reference is replaced by "<Start directory>\index.htm".
  * As long as a file name starts with '..\', the last part of the last used file path is dropped. Finally the remaining file path is added to the file name.
  * If a file name ends with '.HTML', the final 'L' is removed to match PC-style file names (.HTM).
  * If a file name has no extension, '.HTM' is added.
  * If a file name contains a '\', the first part of the name is removed.
  * If a file path is available, this path is added to the name.
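
The resolution steps can be written as a generator that yields the file name after each step, with the search stopping at the first name that exists. This Python sketch follows the list literally; two points the text leaves open are marked as assumptions in the code, and the function names are hypothetical.

```python
import ntpath  # DOS/Windows path rules, usable on any host OS


def candidates(name, last_path, start_dir):
    """Yield the file name after each resolution step, in the listed order.

    Assumptions: each '..\\' prefix is stripped as its level is dropped, and
    "the first part of the name is removed" means keeping only the base name.
    """
    yield name
    name = name.replace("/", "\\")                  # Unix-style separators
    yield name
    if name in ("", "\\"):                          # empty or root reference
        name = ntpath.join(start_dir, "index.htm")
        yield name
    path, climbed = last_path, False
    while name.startswith("..\\"):                  # each '..\' drops one level
        name, path, climbed = name[3:], ntpath.dirname(path), True
    if climbed:
        name = ntpath.join(path, name)              # add the remaining file path
        yield name
    if name.upper().endswith(".HTML"):              # .HTML -> .HTM
        name = name[:-1]
        yield name
    if not ntpath.splitext(name)[1]:                # no extension: add .HTM
        name += ".HTM"
        yield name
    if "\\" in name:                                # drop the directory part
        name = ntpath.basename(name)
        yield name
    if path:                                        # add the known file path
        name = ntpath.join(path, name)
        yield name


def resolve_file(name, last_path, start_dir, exists):
    """Return (found_path, new_last_path), or (None, last_path) if not found.

    exists is a callable(path) -> bool, passed in so the sketch is testable.
    """
    for candidate in candidates(name, last_path, start_dir):
        if exists(candidate):                       # the search stops at the first hit
            return candidate, ntpath.dirname(candidate)
    return None, last_path
```

For instance, a link "../chap1.html" seen while C:\DOCS\SUB is the last used path is tried as C:\DOCS\chap1.html and then as C:\DOCS\chap1.htm.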

If the program is started without any parameters the following start methods are tried:

  * First the program checks whether an INDEX(.HTM) file is present. If so, the program is run with this index.
  * Otherwise an index is built from all HTML files (or the specified files). The overview is ordered by the HTML title of the documents found.
An additional feature is the text search facility, which shows only the documents that meet the search criteria.
  * If no files are found, this help file (L_H.HTM) is displayed.
  * If the help file cannot be found, the command-line help is displayed.
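
The start-up order above amounts to a short chain of fallbacks. A minimal Python sketch, with hypothetical names and a case-insensitive title sort as an assumption:

```python
def choose_start(exists, titles):
    """Pick what to show when L_H is started without parameters.

    exists -- callable(file name) -> bool
    titles -- dict mapping each HTML file found to its document title
    Returns a (mode, payload) pair; the mode names are hypothetical.
    """
    if exists("INDEX.HTM"):
        return "index", "INDEX.HTM"
    if titles:  # generated overview, ordered by title (case-insensitive here)
        return "generated", sorted(titles, key=lambda name: titles[name].lower())
    if exists("L_H.HTM"):
        return "help", "L_H.HTM"
    return "command-line-help", None
```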

3.2. Options

Several options are available. They can be preset in the INI-file and overridden on the command line. Note that the names of these options depend on the user-interface language. The overview below shows the available command-line options and INI-file options:

Command line option      INI-file option
    Description (based on the English user-interface)

/Adobe{+|-}              ADOBE_CONVERSION = {ON|OFF}
    Activates or deactivates special support for HTML files generated by the Adobe PDF2HTML converter (see www.adobe.com/access for more details). This option performs certain additional character conversions.

/Bookmark{+|-}           SAVE_BOOKMARK = {ON|OFF}
    The /B- option suppresses automatic bookmarks; the /B+ option enables them. The default can be set in the INI-file.

(none)                   BOOKMARK_ENTRIES = <Number>
    This option can only be set in the INI-file. It sets the maximum number of bookmarks that are saved.

/Convert                 (none)
    Converts the supplied HTML files to ASCII text in a single output file.

/Find                    (none)
    Prompts you for a search text. At most 5 words, consisting only of letters or digits, are used to search all specified files. If the text is found in one or more files, a selection page is generated; if no file matches, the program stops. A file specification can be combined with this option.

/Generate{+|-}           GENERATE_TOC = {ON|OFF}
    Generates a table of contents for a document, added after the text of the document. A link to the generated table of contents will most likely be one of the first links in the document. Up to 9 levels of TOC information can be generated.

(none)                   INDENT = <Number>
    Sets the indent size (in spaces) used e.g. for list items.

/Index                   (none)
    Saves the file selection list as INDEX.HTM. This only works if a selection list is generated, e.g. when no (other) options are given or when the /F option is used.

/INI=<filename>          (none)
    Adds one or more INI-files, e.g. for special support of certain HTML documents.

(none)                   LINE_BREAK = <Number>
    Sets the line length (in characters) after which white space results in an automatic line wrap.

(none)                   LINE_WIDTH = <Number>
    Sets the maximum line length (in characters). Because the program is designed to run in a DOS environment, a value larger than 80 characters is not recommended.

/LOG=<filename>          /LOG=<filename>
    Sets an alternate log file, used for the text-grabbing results and URL logging.

(none)                   MATCH_STRINGi = <String>
    Changes the internal match string, which indicates the position of parameters in INI-file strings. This can be useful if the default ("%S") does not meet your needs; if you change it, change all occurrences of the match string accordingly.

(none)                   NEW_PAGE_STRINGi = <String>
    Changes the internal string used to mark a new page in the intermediate files. This can be useful if the default ("{:PAGE:%S:}") does not meet your needs.

/Procedures              (none)
    Lists all available internal procedures with a (very short) description. Procedures with an "_" in their name handle attribute/parameter pairs; the others process certain begin or end tags or offer special support.

(none)                   SCREEN_LINES = <Number>
    Sets the number of lines used on the screen. Do not set a value greater than 20; I have not tested much in this area.

(none)                   SCREEN_WIDTH_IN_PIXELS = <Number>
    Some HTML tags specify a width or height in pixels, so the assumed maximum screen size matters. Select the default to use; standard values are 480:640, 600:800, 768:1024, etc. Only the screen width is used; the height of an object is not taken into account.

(none)                   TAB_SIZE = <Number>
    Sets the size of a TAB in pre-formatted text.

/Table{+|-}              TABLE_MODEi = <String>
    Uses the alternate table presentation with numbered rows and cells, each cell starting on a new line. Possible values: STANDARD or ALTERNATE.

/TFrames{+|-}            TABLE_FRAMEi = <String>
                         TABLE_RULE_COLUMNi = <String>
                         TABLE_RULE_ROWi = <String>
    The /TF- option suppresses all table frames and rules; the /TF+ option activates all table frames and rules; the /TF* option inserts blank lines between table rows.
    The default can be set in the INI-file, where the following values are available for all three options:
    NEVER: suppress rulers
    ALWAYS: generate rulers
    HTML: use the HTML settings of the table
    EMPTY_LINE: insert an empty line after each row

(none)                   TOC_ENTRY_ON_NEW_SCREEN = {ON|OFF}
    Forces a new screen for each text part that gets a table of contents (TOC) entry. This option is especially useful for speech and braille users, and it also works if the GENERATE_TOC option is not set.
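
The /Find rules (at most 5 words, letters or digits only) can be sketched in Python as below. How L_H splits the search text and whether all words must occur in a file are not documented; both are assumptions here, as are the function names.

```python
import re

MAX_WORDS = 5  # documented /Find limit


def find_words(search_text):
    """Up to 5 words; assumption: a word is a run of letters or digits."""
    return re.findall(r"[^\W_]+", search_text)[:MAX_WORDS]


def matching_files(documents, search_text):
    """documents: dict of file name -> plain text.

    Assumption: a file matches when it contains every word, ignoring case.
    An empty result corresponds to the program stopping. The result is
    sorted by file name here; the real selection page is ordered by title.
    """
    words = [w.lower() for w in find_words(search_text)]
    if not words:
        return []
    return sorted(name for name, text in documents.items()
                  if all(w in text.lower() for w in words))
```

Punctuation in the search text simply separates words, so "DOS-box" counts as two of the five words in this sketch.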

3.3. Additional options for experts

Some additional command-line switches are available for more specific tasks. They are probably only useful for specialists:

/XB If this switch is used the output text is sent through the BIOS.
/XD If this switch is used the temporary files holding intermediate code will not be deleted at program termination.
/XE If this switch is used a list of HTML elements and attributes is logged in the log-file. All elements and attributes handled by the program are marked with an asterisk.
/XT If this switch is used table column sizes will be reported at the end of each table.

3.4. Alternate INI-files

You can add or override paragraphs of the default INI-file by adding one or more /INI= references on the command line. The contents of each referenced file are added to the default settings; entries that already exist are replaced by the new ones.
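
The layering of configuration sources can be pictured as successive dictionary updates: L_H.INI first, then each /INI= file in order, then the command-line options, which section 3.2 says override the INI presets. A hypothetical Python sketch; the (paragraph, entry) keys are an illustration, and INI parsing is not shown.

```python
def effective_settings(default_ini, extra_inis, command_line):
    """Layer the configuration sources in the documented order.

    Each source is a dict mapping an entry key to its value. The key shape
    (paragraph, entry) is hypothetical; L_H's own INI parser is not shown.
    """
    settings = dict(default_ini)      # L_H.INI; the original is not modified
    for extra in extra_inis:          # each /INI=<filename>, in order
        settings.update(extra)        # existing entries replaced, new ones added
    settings.update(command_line)     # command-line options override INI presets
    return settings
```
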

4. The INI-file

An INI-file is used to set the program's defaults, user-interface language etc. The default INI-file (L_H.INI) must always be available in the current directory (to set defaults for special groups of files) or in the program directory.

An alternate INI-file (L_H_IDPN) is available without all kinds of layout-change texts. If you prefer it, run the distributed DOS batch file L_H_I.BAT (or L_H_INI.BAT). You can also change the contents of the INI-files manually.

The INI-file can contain the following types of paragraphs:

INTERFACE This paragraph defines the characteristics of a user-interface. This defines e.g. the LANGUAGE of the user-interface. Use this paragraph to start the series of messages used to make a complete user-interface.
MESSAGE These paragraphs hold all texts the program can produce on the screen. If you want the program to run in another language simply translate these messages. Of course an additional INTERFACE paragraph should be added before the first message.

Do NOT change the format of the status screens (messages named R01/R20 and S01/S20). The command-line options can also be translated! If you translate the command-line texts, remember to update the help lines as well.

Note that all message paragraphs should come first in the INI-file. Otherwise the program may not be able to find the text of error messages.

L_H-SETUP This paragraph holds default parameters for the program. This paragraph should occur only once in the INI-file.
ENTITY This paragraph defines special symbols. It should occur for each symbol you want the program to handle. The default list provided could be incorrect or incomplete. If a symbol is not in the list it will be ignored.
ELEMENT This paragraph holds the description of a specific HTML tag. It should occur for each HTML tag you want the program to handle. For a limited set of HTML tags a built-in handling procedure is available.
ELEMENT-ATTRIBUTE This paragraph holds the definition of the activities for a single element/attribute pair. It should occur for each HTML element/attribute pair you want the program to handle. For a limited set of HTML element/attribute pairs a built-in handling procedure is available.

The string attributes in the INI-file can contain color settings. If you SET a color, be sure to RESET it as well.
Please check the default INI-file for a description of the attributes of all the different paragraphs.

5. SETUP paragraph attributes

This paragraph holds a list of attributes whose values configure the program. Please refer back to the options table in section 3.2, which describes both the command-line options and the SETUP attributes.
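
The MATCH_STRING and NEW_PAGE_STRING attributes from the options table work together: the match string marks where a parameter goes in an INI-file string. The Python sketch below is hypothetical and assumes that the parameter of NEW_PAGE_STRING is a decimal page number.

```python
import re


def expand(template, value, match_string="%S"):
    """Substitute a parameter into an INI-file string at the match string."""
    return template.replace(match_string, str(value))


def page_marker_pattern(new_page_string="{:PAGE:%S:}", match_string="%S"):
    """Regex that recognizes page markers in an intermediate file.

    Assumption: the parameter of NEW_PAGE_STRING is a decimal page number.
    """
    before, _, after = new_page_string.partition(match_string)
    return re.compile(re.escape(before) + r"(\d+)" + re.escape(after))
```

With the defaults, expand("{:PAGE:%S:}", 12) produces "{:PAGE:12:}". This also shows why a changed match string must be changed everywhere: a template that still contains "%S" would no longer be expanded.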

6. Internal procedures

The internal procedures come in groups that belong together; for example, list handling is one group and table handling is another. If you do not want to use a group, remove all of its procedures from the INI-file. If you remove only some of them, strange effects can of course occur.

If you specify the START procedure for a tag, also specify the END procedure for that tag, to avoid options being left hanging. The /P option shows all available internal procedures.

7. Direct cursor navigation

The program allows you to move the cursor anywhere in the text on the screen. To do this, either:
- press the + key to switch to direct cursor navigation mode permanently,
- use the number keys, e.g. on the numeric keypad (this only works if Num Lock is on), or
- use the number keys of a laptop's embedded keypad (e.g. J=1, K=2, etc.).
As soon as the + key or one of the keys below is used, the program switches to "page navigate" mode automatically. If you use Ctrl-Home or the '5' key, the cursor is reset to the active link, i.e. the link before the current cursor position. If you did not switch permanently with the + key and you then use the regular navigation keys (the cursor keys, etc.), page navigate mode is switched off automatically. In that case navigation starts from the original cursor position, or from the expected cursor position if you switched to another page.

You can also mark part of the text. To set a mark, press the '/' key; after moving the cursor, press the '*' key to grab the text. The grabbed text is written to the log. Note that the marked text is only shown as coordinates at the bottom of the screen.
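
Grabbing the marked text can be sketched in Python as follows. The manual only says that the program picks out the right block whichever way you moved; treating the block as the rectangle spanned by the mark and the cursor is an assumption, and the helper names are hypothetical.

```python
def grab_block(screen, mark, cursor):
    """Return the text between the '/' mark and the cursor.

    screen -- list of screen lines; mark, cursor -- (row, col), 0-based, inclusive.
    Assumption: the block is the rectangle spanned by the two corners.
    """
    (r1, c1), (r2, c2) = mark, cursor
    top, bottom = min(r1, r2), max(r1, r2)   # any movement direction works
    left, right = min(c1, c2), max(c1, c2)
    return [line[left:right + 1].rstrip() for line in screen[top:bottom + 1]]


def log_entry(identification, block):
    """The prompted identification text followed by the grabbed lines."""
    return "\n".join([identification, *block])
```

Normalizing the two corners with min and max is what makes the result independent of the direction in which the cursor was moved.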

7.1. Key usage

Available keys:

Key             Description
+               Switch to direct cursor navigation mode permanently. In permanent direct cursor navigation mode the regular cursor keys can be used as well as the number keys below.
-               Reset the permanent direct cursor navigation mode.
4 U (Left)      Move the cursor one character left. The cursor wraps to the end of the previous line.
6 O (Right)     Move the cursor one character right. The cursor wraps to the beginning of the next line.
8 (Up)          Move the cursor up one line.
2 K (Down)      Move the cursor down one line.
7 (Home)        Move the cursor to the first position of the line. If it is already on the first position, move it to the top line of the screen.
1 J (End)       Move the cursor to the right margin of the page. If it is already at the right margin, move it to the bottom line of the screen.
9 (PgUp)        Move the text one page up. The cursor is positioned in the bottom line of the new page.
3 L (PgDn)      Move the text one page down. The cursor is positioned in the top line of the new page.
5 I (Ctrl-End)  Reset the cursor to the active link, i.e. the link before the current cursor position. If you switched to another page, the cursor is placed where it would be if you had jumped to that page with PgUp or PgDn. Any mark that is set is cleared.
/               Set a text marker so you can grab a block of text on the current page. Pressing this key again clears the mark. While a mark is pending you cannot move to another page.
*               Grab a block of text. Use the '/' key to set a start mark first. You can move in any direction before grabbing the text; the program always picks out the right block. You are prompted for a log identification text, which is written to the log together with the grabbed text.
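
The movement rules in the table can be modelled as a pure function of the cursor position. This Python sketch is hypothetical; it assumes the right margin is the last screen column and that the cursor stays put at the screen edges, which the table does not specify.

```python
# Number keys and their laptop equivalents, as listed in the table above.
KEYS = {"4": "left", "U": "left", "6": "right", "O": "right", "8": "up",
        "2": "down", "K": "down", "7": "home", "1": "end", "J": "end"}


def move(action, row, col, width, height):
    """New (row, col) after a direct cursor navigation key.

    width/height: screen size in characters. Assumptions: the right margin
    is column width - 1, and the cursor stays put at the screen edges.
    """
    if action == "left":
        if col > 0:
            return row, col - 1
        return (row - 1, width - 1) if row > 0 else (row, col)   # wrap to end of previous line
    if action == "right":
        if col < width - 1:
            return row, col + 1
        return (row + 1, 0) if row < height - 1 else (row, col)  # wrap to start of next line
    if action == "up":
        return max(row - 1, 0), col
    if action == "down":
        return min(row + 1, height - 1), col
    if action == "home":  # first position; pressed again: top line
        return (row, 0) if col > 0 else (0, 0)
    if action == "end":   # right margin; pressed again: bottom line
        return (row, width - 1) if col < width - 1 else (height - 1, width - 1)
    raise ValueError("unknown action: " + action)
```

Pressing 7 twice therefore goes first to the start of the line and then to the top line, as the table describes.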

8. Remarks on HTML interpretation

This chapter lists some considerations on the way the program handles HTML tags.

  * Nested tables are supported in the standard mode. If you use alternate table presentation (/T option) nested tables will be displayed with table, row and cell numbers each beginning on a new line.
  * The ROWSPAN and COLSPAN attributes in tables are supported. COLSPAN, however, is implemented so that the text is displayed in the rightmost column of the spanned set; the other cells are filled with blanks to indicate the column spanning.
  * The NOBR tag is not implemented. Because of the limited space on a text screen and the way the program displays attributes, images, etc., screen lines might overflow anyway.
  * HTML color attributes are not supported, simply because the HTML color notation can hardly be mapped onto the 16 colors available in DOS text mode.

Colors can be added to the text strings used to display HTML tags. These colors are stored on a color stack, which is handled as safely as possible: duplicate entries are not allowed when pushing onto the stack, and entries are popped automatically when an HTML end tag is missing.

This allows you to mark the HTML tags you want with the colors you like best, without any influence from the colors used in the HTML documents.
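
The color-stack rules can be illustrated with a small Python class. This is a sketch under assumptions, not L_H's code: entries are (tag, color) pairs, a push is skipped when the same pair is already anywhere on the stack, and an end tag removes its own entry together with any entries above it whose end tags are missing.

```python
class ColorStack:
    """Sketch of the color stack: no duplicate pushes, and an end tag pops
    any entries above its own start entry (those whose end tags are missing).

    Assumption: entries are (tag, color) pairs; the real INI strings and
    DOS color codes are not modelled.
    """

    def __init__(self, default_color):
        self.entries = [(None, default_color)]  # base entry, never popped

    def push(self, tag, color):
        if (tag, color) not in self.entries:    # duplicates are not pushed
            self.entries.append((tag, color))

    def pop(self, tag):
        for i in range(len(self.entries) - 1, 0, -1):
            if self.entries[i][0] == tag:
                del self.entries[i:]            # also drops entries left open above it
                return
        # an end tag without a matching entry is ignored

    @property
    def current(self):
        return self.entries[-1][1]
```

For example, after pushing B and then I, an end tag for B restores the default color even though the end tag for I never arrived.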

9. Remarks and comment

If you have any error reports, comments, requests for enhancements, etc., please let me know. Up to now I have only implemented what I needed myself.

If the program hangs your computer, it may have problems with certain HTML constructions. If this happens, please mail me (part of) the HTML page that causes the problem, so I can try to find out what goes wrong.

Please send me an E-mail at: jvleliev@inter.nl.net

Download this program from: www.inter.nl.net/users/jvleliev/ebutc.htm
