BP7 Help file format

Contributor: DEREK POWLES             

{
Derek Powles 

Subject: Re Help File format

Expanded information on Help File sources.
   In an earlier message I referred to four authors For the book in (1)
below - my mistake - there are three.
    Derek

(1) 'A Programmer's Guide to Turbo Vision' by Ertl, Machholz and Golgath,
authored by the Pascal product management team at Borland in Germany.
Pub. Addison-Wesley, ISBN 0-201-62401-X.
The book contains chapter 12 ( 18 pages ) describing how to add a
conText sensitive help Unit to your own TV Program.
The use of TVHC is described ( 2 pages ). TVHC generates helptxt.pas and
helptxt.hlp Files from a special helptxt.txt File. The .pas File is
included in your main routine as a Unit. The HelpFile Unit is also needed,
this Unit, TVHC and a working example are in the TVDEMO subdirectory.
This .hlp File is not compatible With the Borland supplied Dos help Files,
these have a name of the form *.t*h. I believe I am correct in stating that,
other than demohelp.hlp, all supplied *.hlp Files are For Windows use.

(2) The 'Borland Open Architecture Handbook For Pascal' describes and
supplies the Program HL.EXE. Chapter 8 tells you '...how to create, build,
and maintain Borland Dos Help Files'. I have attached information gleaned
from this book.
There are two sections, the first is information taken from the book and
the second is a Unit holding data structures.

look For the *****cut here*****.

******************cut here*********************
{  See `Borland Open Architecture Handbook For Pascal' pages 170-177 }
(***************************************************************
Binary help File format -

Records are grouped in four sections

 1 - File stamp
 2 - File signature
 3 - File version
 4 - Record Headers
   a - File header Record
   b - conText
   c - Text
   d - keyWord
   e - index
   f - compression Record
   g - indextags {= subheadings, = qualifiers}

   ***************************************************************

 1 - A File stamp is a null terminated String, note case,
      `TURBO PASCAL HelpFile.\0' or
      `TURBO C Help File.\0'  identifying the File in readable form.
      The null Character is followed by a Dos end-of-File Charcater $1A.
      This is defined as ;STAMP in the help source File.

 2 - The File signature For Borland products is a null terminated
      String `$*$* &&&&$*$' (no quotes in help Files).
      This is defined as ;SIGNATURE in the help source File.

 3 - The File version is a Record of two Bytes that define the version
      of help format and the help File Text respectively,

      Type
        TPVersionRec    = Record
          Formatversion : Byte;
          TextVersion   : Byte   {defined With ;VERSION}
          { this is For info only }
        end;
      FormatVersion For BP7     = $34
                        TP6     = $33
                        TP5     = $04
                        TP4     = $02
                        TC++3.0 = $04

 4 - Record Headers -
  All the remaining Records have a common format that includes a header
  identifying the Record's Type and length.

    Type
      TPrecHdr  = Record;
        RecType   : Byte;
        RecLength : Word;
      end;

  Field RecType is a code that identifies the Record Type.
    Type
      RecType =
        (RT_FileHeader,    {0}
         RT_ConText,       {1}
         RT_Text,          {2}
         RT_KeyWord,       {3}
         RT_Index,         {4}
         RT_Compression);  {5}
         RT_IndexTags,     {6 - only in FormatVersion $34}

  Field RecLength gives the length of the contents of the Record in
  Bytes, not including the Record header. The contents begin With the
  first Byte following the header.

  The field `RecType' Types are explained in the following Text.

  Although this structure allows an arbitrary order of Records, existing
  Borland products assume a fixed Record ordering as follows:-
        File header Record
        compression Record
        conText table
        index table
        indextags Record {introduced in BP7}
        Text Record
        keyWord Record

 4.a The File header Record defines Various parameters and options
  common to the help File.

  Type
    TPFileHdrRec     = Record
      Options        : Word;
      MainIndexScreen: Word;
      Maxscreen size : Word;
      Height         : Byte;
      Width          : Byte;
      LeftMargin     : Byte;
    end;

      Options - is a bitmapped field. Only one option currently
                supported.
                OF_CaseSense ($0004) (C only, not Pascal)
                  if set, index tokens are listed in mixed case
                  in the Index Record, and index searches will
                  be Case sensitive.
                  if cleared, index tokens are all uppercase in
                  the Index Record, and index searches will ignore
                  case.
                  Defined With ;CASESENSE.

      MainIndexScreen - this is the number assigned to the conText
                  designated by the ;MAININDEX command in the help
                  source File.
                  if ;MAININDEX is not used, MainIndexScreen is set
                  to zero.

      MaxScreenSize - this is the number of Bytes in the longest Text
                  Record in the File (not including the header). This
                  field is not currently in use.

      Height, Width - This is the default size in rows and columns,
                  respectively, of the display area of a help Window.
                  Defined With ;HEIGHT and ;WIDTH.

      LeftMargin - Specifies the number of columns to leave blank on
                  the left edge of all rows of help Text displayed.
                  Defined With ;LEFTMARGIN.

 4.b ConText table -
  This is a table of Absolute File offsets that relates Help conTexts to their
  associated Text. The first Word of the Record gives the number of conTexts
  in the table.
  The remainder of the Record is a table of n ( n given by first Word) 3-Byte
  Integers (LSByte first). The table is indexed by conText number (0 to n-1).
  The 3-Byte Integer at a given index is an Absolute Byte that is offset in
  the Help File where the Text of the associated conText begins.
  The 3-Byte Integer is signed (2's complement).
  Two special values are defined -
        -1 use Index Screen Text (defined in File Header Record).
        -2 no help available For this conText.
  ConText table entry 0 is not used.

 4.c Text descriptions -
  The Text Record defines the compressed Text of conText.
  Text Records and keyWords appear in pairs, With one pair For each
  conText in the File. The Text Record always precedes its associated
  keyWord. Text Records are addressed through the File offset values
  found in the conText table described above.

  The RecLength field of the Text Record header defines the number
  of Bytes of compressed Text in the Record.
  The Compression Record defines how the Text is compressed.
  if the Text is nibble encoded, and the last nibble of the last Byte
  is not used, it is set to 0.
  Lines of Text comprising the the Text Record are stored as null
  terminated Strings.

 4.d the keyWord Record defines keyWords embedded in the preceeding Text
  Record, and identifies related Text Records.
  The Record begins With the following fixed fields:
    UpConText   : Word;
    DownConText : Word;
    KeyWordCnt  : Word;

  UpConText and DownConText give the conText numbers of the previous
  and next sections of Text in a sequence, either may be zero,
  indicating the end of the conText chain.

  KeyWordCnt gives the number of keyWords encoded in the associated
  Text Record.
  Immediately following this field is an Array of KeyWord Descriptor
  Records of the following form:
    Type
      TPKwDesc = Record;
        KwConText: Word;
      end;

  The keyWords in a Text Record are numbered from 1 to KeyWordCnt in the
  order they appear in the Text (reading left to right, top to bottom).

  KwConText is a conText number (index into the conText table) indicating
  which conText to switch to if this keyWord is selected by the user.

 4.e Index table -
  This is a list of index descriptors.
  An index is a token (normally a Word or name) that has been explicitly
  associated With a conText using the ;INDEX command in the source Text File.
  More than one index may be associated With a conText, but any given index
  can not be associated With more than one conText.
  The list of index descriptors in the Index Record allows the Text of an
  index token to be mapped into its associated conText number.

  The first Word of the Record gives the number of indexes defined in the
  Record.
  The remaining Bytes of the Record are grouped into index descriptors.
  The descriptors are listed in ascending order based on the Text of the index
  token (normal ascii collating sequence). if the OF_CaseSense flag is not set
  in the option field of the File header Record, all indexes are in uppercase
  only.

  Each index descriptor Uses the following format:
    LengthCode    : Byte;
    UniqueChars   : Array of Byte;
    ConTextNumber : Word;

  The bits of LengthCode are divided into two bit fields.
  Bits(7..5) specify the number of Characters to carry over from the start of
  the previous index token String.
  Bits(4..0) specify the number of unique Characters to add to the end of the
  inherited Characters.

  Field UniqueChars gives the number of unique Characters to add.
  e.g. if the previous index token is `addition', and the next index token is
  `advanced', we inherit two Characters from the previous token (ad), and add
  six Characters (vanced); thus LengthCode would be $46.

  ConTextNumber gives the number of the conText associated With the index.
  This number is an index into the conText table described above.

 4.f A compression Record defines how the contents of Text Records are
  encoded.
     Type
       TPCompress = Record
         CompType : Byte;
         CharTable: Array[0..13] of Byte;
       end;

  CompType - nibble encoding is the only compression method currently
             in use.
                     Const CT_Nibble = 2;
  The Text is encoded as a stream of nibbles. The nibbles are stored
  sequentially; the low nibble preceeds the high nibble of a Byte.
  Nibble values ($0..$D) are direct indexes into the CharTable field of
  the compression Record. The indexed Record is the literal Character
  represented by the nibble. The Help Linker chooses the 14 (13?) most
  frequent Characters For inclusion in this table. One exception is that
  element 0 always maps to a Byte value of 0.

  The remaining two nibble values have special meanings:
          Const
            NC_RawChar = $F;
            NC_RepChar = $E;

  Nibble code NC_Char introduces two additional nibbles that define a
  literal Character; the least significant nibble first.

  Nibble code NC_RepChar defines a Repeated sequence of a single Character.
  The next nibble gives the Repeat count less two, (counts 2 to 17 are
  possible)
  The next nibble(s) define the Character to Repeat; it can be either a
  single nibble in the range ($0..$D) representing an index into
  CharTable, or it can be represented by a three nibble NC_RawChar
  sequence.


 4.g RT_IndexTags is in FormatVersion $34 only. This provides a means
  For including index sub headings in the help File.
  The Record header is followed by a list of Variable length tag Records
  Type
    IndRecType = Record;
      windexNumber: Word; {index into the RT_Index Record }
      StrLen: Byte;  {length of tag String(not including terminating 0)}
      szTag: Array[0..0] of Char; {disable range checking}{zero term tag
                                   String}
    end;
   The first structure in the Array is a special entry which has the
   windexnumber set to $FFFF, and contains the default ;DESCRIPTION
   in Case there are duplicate index entries and no tags were specified.
*)

Unit HelpGlobals;
{ This File contains only the structures which are found in the help Files }

Interface

Const
  Signature      = '$*$* &&&&*$';   {+ null terminator }
  NC_RawChar     = $F;
  NC_RepChar     = $E;

Type
  FileStamp      = Array [0..32] Of Char; {+ null terminator + $1A }
  FileSignature  = Array [0..12] Of Char; {+ null terminator }

  TPVersion      = Record
    FormatVersion : Byte;
    TextVersion   : Byte;
  end;

  TPRecHdr       = Record
    RecType   : Byte; {TPRecType}
    RecLength : Word;
  end;

Const   {RecType}
  RT_FileHeader  = Byte ($0);
  RT_ConText     = Byte ($1);
  RT_Text        = Byte ($2);
  RT_KeyWord     = Byte ($3);
  RT_Index       = Byte ($4);
  RT_Compression = Byte ($5);
  RT_IndexTags   = Byte ($6);


Type
  TPIndRecType = Record
    windexNumber : Word;
    StrLen       : Byte;
    szTag        : Array [0..0] Of Char;
    { disable range checking}
  end;

  TPFileHdrRec = Record
    Options         : Word;
    MainIndexScreen : Word;
    Maxscreensize   : Word;
    Height          : Byte;
    Width           : Byte;
    LeftMargin      : Byte;
  end;


  TP4FileHdrRec = Record                  {derived from sample File}
    Options    : Word;
    Height     : Byte;
    Width      : Byte;
    LeftMargin : Byte;
  end;

  TPCompress = Record
    CompType  : Byte;
    CharTable : Array [0..13] Of Byte;
  end;

  TPIndexDescriptor = Record
    LengthCode    : Byte;
    UniqueChars   : Array [0..0] Of Byte;
    ConTextNumber : Word;
  end;

  TPKeyWord = Record
    UpConText   : Word;
    DownConText : Word;
    KeyWordCnt  : Word;
  end;

  TPKwDesc = Record
    KwConText : Word;
  end;


  TmyStream = Object(TBufStream) end;

  PListRecHdr = ^RListRecHdr;

  RListRecHdr = Record
    PNextHdr : PListRecHdr; {Pointer to next list element}
    RRecType : TPRecHdr;    {RecType, RecLength}
    PRecPtr  : Pointer;     {Pointer to copy of Record}
  end;

  ConTextRecHd = Record
    NoConTexts  : Word;
    PConTextRec : Pointer;
  end;


Implementation
end.