Extended column convention for FITS BINTABLE
--------------------------------------------

The BINTABLE extension type as described in the FITS Standard
(FITS Standard v3.0, sec 7.3) requires table column metadata
to be described using 8-character keywords of the form XXXXXnnn,
where XXXXX represents one of an open set of mandatory, reserved
or user-defined root keywords up to five characters in length,
for instance TFORM (mandatory), TUNIT (reserved), TUCD (user-defined).
The nnn part is an integer between 1 and 999 indicating the
index of the column to which the keyword in question refers.
Since the header syntax confines this indexed part of the keyword
to three digits, there is an upper limit of 999 columns in
BINTABLE extensions.

Note that the FITS/BINTABLE format does not entail any restriction on
the storage of column *data* beyond the 999 column limit in the data
part of the HDU, the problem is just that client software
cannot be informed about the layout of this data using the
header cards in the usual way.

In some cases it is desirable to store FITS tables with a column
count greater than 999.  Whether that's a good idea is not within
the scope of this discussion.

To achieve this, I propose the following convention.

Definitions:

 - 'BINTABLE columns' are those columns defined using the
      FITS BINTABLE standard

 - 'Data columns' are the columns to be encoded

 - N_TOT is the total number of data columns to be stored

 - Data columns with (1-based) indexes from 999 to N_TOT inclusive
      are known as 'extended' columns.  Their data is stored
      within the 'container' column.

 - BINTABLE column 999 is known as the 'container' column
      It contains the byte data for all the 'extended' columns.
 
Convention:

 - All column data (for columns 1 to N_TOT) is laid out in the data part
      of the HDU in exactly the same way as if there were no 999-column
      limit.

 - The TFIELDS header is declared with the value 999.

 - The container column is declared in the header with some
      TFORM999 value corresponding to the total field length required
      by all the extended columns ('B' is the obvious data type, but
      any legal TFORM value that gives the right width MAY be used).
      The byte count implied by TFORM999 MUST be equal to the
      total byte count implied by all extended columns.

 - Other XXXXX999 headers MAY optionally be declared to describe
      the container column in accordance with the usual rules,
      e.g. TTYPE999 to give it a name.

 - The NAXIS1 header is declared in the usual way to give the width
      of a table row in bytes.  This is equal to the sum of
      all the BINTABLE columns as usual.  It is also equal to
      the sum of all the data columns, which has the same value.

 - Headers for Data columns 1-998 are declared as usual,
      corresponding to BINTABLE columns 1-998.

 - Keyword XT_ICOL indicates the index of the container column.
      It MUST be present with the integer value 999 to indicate
      that this convention is in use.

 - Keyword XT_NCOL indicates the total number of data columns encoded.
      It MUST be present with an integer value equal to N_TOT.

 - Metadata for each extended column is encoded with keywords
      of the form HIERARCH XT XXXXXnnnnn, where XXXXX
      are the same keyword roots as used for normal BINTABLE extensions,
      and nnnnn is a decimal number written as usual (no leading zeros,
      as many digits are required).  Thus the formats for data
      columns 999, 1000, 1001 etc are declared with the keywords
      HIERARCH XT TFORM999, HIERARCH XT TFORM1000, HIERARCH XT TFORM1001
      etc.  Note this uses the ESO HIERARCH convention described at
      https://fits.gsfc.nasa.gov/registry/hierarch_keyword.html.
      The "name space" token has been chosen as "XT" (extended table).

 - This convention MUST NOT be used for N_TOT<=999.

The resulting HDU is a completely legal FITS BINTABLE extension.
Readers aware of this convention may use it to extract column
data and metadata beyond the 999-column limit.
Readers unaware of this convention will see 998 columns in their
intended form, and an additional (possibly large) column 999
which contains byte data but which cannot be easily interpreted.

An example header might look like this:

   XTENSION= 'BINTABLE'           /  binary table extension
   BITPIX  =                    8 /  8-bit bytes
   NAXIS   =                    2 /  2-dimensional table
   NAXIS1  =                 9229 /  width of table in bytes
   NAXIS2  =                   26 /  number of rows in table
   PCOUNT  =                    0 /  size of special data area
   GCOUNT  =                    1 /  one data group
   TFIELDS =                  999 /  number of columns
   XT_ICOL =                  999 /  index of container column
   XT_NCOL =                 1204 /  total columns including extended
   TTYPE1  = 'posid_1 '           /  label for column 1
   TFORM1  = 'J       '           /  format for column 1
   TTYPE2  = 'instrument_1'       /  label for column 2
   TFORM2  = '4A      '           /  format for column 2
   TTYPE3  = 'edge_code_1'        /  label for column 3
   TFORM3  = 'I       '           /  format for column 3
   TUCD3   = 'meta.code.qual'
    ...
   TTYPE998= 'var_min_s_2'        /  label for column 998
   TFORM998= 'D       '           /  format for column 998
   TUNIT998= 'counts/s'           /  units for column 998
   TTYPE999= 'XT_MORECOLS'        /  label for column 999
   TFORM999= '813I    '           /  format for column 999
   HIERARCH XT TTYPE999         = 'var_min_u_2' / label for column 999
   HIERARCH XT TFORM999         = 'D' / format for column 999
   HIERARCH XT TUNIT999         = 'counts/s' / units for column 999
   HIERARCH XT TTYPE1000        = 'var_prob_h_2' / label for column 1000
   HIERARCH XT TFORM1000        = 'D' / format for column 1000
    ...
   HIERARCH XT TTYPE1203        = 'var_prob_w_2' / label for column 1203
   HIERARCH XT TFORM1203        = 'D' / format for column 1203
   HIERARCH XT TTYPE1204        = 'var_sigma_w_2' / label for column 1204
   HIERARCH XT TFORM1204        = 'D' / format for column 1204
   HIERARCH XT TUNIT1204        = 'counts/s' / units for column 1204
   END

This general approach was suggested by William Pence on the FITSBITS
list in June 2012
(https://listmgr.nrao.edu/pipermail/fitsbits/2012-June/002367.html),
and by Francois-Xavier Pineau (CDS) in private conversation in 2016.
The details have been filled in by Mark Taylor (Bristol).

It was discussed in some detail on the FITSBITS list in July 2017
(https://listmgr.nrao.edu/pipermail/fitsbits/2017-July/002967.html)


--------------------------------------------------------------------

Note: a previous variant of this convention was proposed in which
the metadata for the extended columns was declared by extending
the numbering scheme using three-digit base-26 representations.
It was identical to the above, except that:

 - Metadata for each extended column is encoded with keywords
      of the form XXXXXaaa, where XXXXX are the same keyword roots
      as used for normal BINTABLE extensions, and aaa is a 3-digit
      value in base 26 using the characters 'A' (0 in base 26) to
      'Z' (25 in base 26), and giving the 1-based data column index
      minus 999.  The sequence aaa MUST be exactly three characters
      long (leading 'A's are required).  Thus the formats for data
      columns 999, 1000, 1001, etc are declared with the keywords
      TFORMAAA, TFORMAAB, TFORMAAC etc.

This convention can therefore allow encoding of tables with data
column counts N_TOT up to 998+26^3 = 18574.

In that case the header looks identical to the previous example up
to TFORM999, but the remaining entries differ:

   TTYPE998= 'var_min_s_2'        /  label for column 998
   TFORM998= 'D       '           /  format for column 998
   TUNIT998= 'counts/s'           /  units for column 998
   TTYPE999= 'XT_MORECOLS'        /  label for column 999
   TFORM999= '813I    '           /  format for column 999
   TTYPEAAA= 'var_min_u_2'        /  label for column 999
   TFORMAAA= 'D       '           /  format for column 999
   TUNITAAA= 'counts/s'           /  units for column 999
   TTYPEAAB= 'var_prob_h_2'       /  label for column 1000
   TFORMAAB= 'D       '           /  format for column 1000
    ...
   TTYPEAHW= 'var_prob_w_2'       /  label for column 1203
   TFORMAHW= 'D       '           /  format for column 1203
   TTYPEAHX= 'var_sigma_w_2'      /  label for column 1204
   TFORMAHX= 'D       '           /  format for column 1204
   TUNITAHX= 'counts/s'           /  units for column 1204
   END

This variant was generally less favoured than the HIERARCH_based
one by participants in the FITSBITS discussion.

