The Opsys Archiving Session

Concepts

Document archiving is an optional component of Opsys. The component manages the transfer of an irregular stream of data from your main computer into secure long-term storage and provides retrieval facilities at one or more personal computers.

The fundamental unit of data is the printed document, and the archive media may be any appropriate fixed or removable media.

Documents

Examples of single documents include an invoice, a credit note, a statement and a purchase order. This application archives only the tag and the print image of a document, both of which must be available in electronic form.

The Datascape

The key to understanding the archiving process is knowing where your data is. As you cannot easily sense computer data on the move, it may help to imagine the document images produced by your main computer(s) as data flowing into the WORM Sea (the permanent archive media). Before it reaches the Sea, the data may wait first in a WORM Pool on your main computer and then in a WORM Lake on your Primary Archive PC (in GO.EXE's Current Directory).

Once the data has reached the Sea, it may be distributed to any number of Secondary Archive PCs by any of three methods: export, extraction or equalisation. All these methods support data transfer via a network connection or by carrying the data on removable media between stations.

The Primary Archive PC imports the data from the Pool to the Lake, either almost continuously (so the Pool is usually empty) via the Terminal Session's DDE Protocol WORM, or during overnight consolidation of your main computer system by file transfer across a network.

Either way, the Primary Archive PC eventually exports the data to the Sea twice: firstly to the Archive's current Master Volume, then to its Backup Volume. Documents cannot be retrieved until they reach a Master Volume.

Fail Safe

When data is moved it nevertheless remains available at the source (for a while). This property lets the application strengthen its resilience against loss of archive data by:

This strategy implies that all archive data is drained from Lake to Sea twice: firstly (once or twice a day) onto your active Master Volume and secondly (perhaps once a week) onto the corresponding Backup Volume.

Indexing

When a document is added to the archive media, its master volume location and its tag are added to the retrieval index, and all the indexed tags are sorted into document type within date of issue. Over time, the index will grow very large (one megabyte for every 32,768 document tags), so it too is archived. Once you know that all documents dated before (say) last Sunday have been archived, you can instruct the application to archive the index up to that date (as the archived tags no longer require sorting, this process also speeds up subsequent index sorts).
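The one-megabyte figure follows from the fixed 32-byte index record (1,048,576 / 32 = 32,768). The short Python sketch below reproduces the arithmetic, assuming a megabyte of 2^20 bytes; `index_size` is a hypothetical helper, not part of GO.EXE.

```python
RECORD_BYTES = 32          # fixed length of an index record
MEGABYTE = 1_048_576       # 2**20 bytes
TAGS_PER_CLUSTER = 1_024   # tags per archived INDEX0 cluster

def index_size(tags: int) -> tuple[float, int]:
    """Size in megabytes, and number of 1,024-tag clusters, for `tags` index records."""
    size_mb = tags * RECORD_BYTES / MEGABYTE
    clusters = -(-tags // TAGS_PER_CLUSTER)   # ceiling division; the last cluster may be partial
    return size_mb, clusters

print(MEGABYTE // RECORD_BYTES)   # 32768 tags per megabyte
print(index_size(1_000_000))      # (30.517578125, 977)
```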

The index locates documents within master volumes only, so it is archived onto master volumes only. But for speed when searching or sorting, any part of the index currently in use is cached (held) on the PC's hard disc. There is no index for backup volumes because you never view documents from a backup volume (this would compromise your security).

Secondary Archive Stations

Although originally a (last resort) recovery strategy, The Equaliser is now used extensively to enable other PCs (known as Secondary Archive Stations) to update their own local archive, either by carrying the current Backup Volume between stations or (more conveniently) by file transfer across a network.

Checkpoints

A checkpoint is any instant at which all the application's data is mutually consistent. If a checkpoint is recorded, it should be possible to fall back to it if a failure occurs later during a critical update sequence. This is the basis of the archiving session's fail safe strategy. The current checkpoint is recorded automatically:

The control file SYNC holds the checkpoint taken at the time recorded in its directory entry (according to the PC's clock), providing a snapshot of the datascape at that time. If GO.EXE cannot find this file, it creates it with default settings; as the Archive application is disabled by default, it will then require activation.

Recovery strategy 1

Simply shutting down and restarting the program GO.EXE effects fall-back to the most recently recorded checkpoint.

This strategy overcomes straightforward problems (after you have attended to their cause) including:

Any data imported from your main computer since the checkpoint will be reimported.

Recovery strategy 2

To fall back to an earlier checkpoint, remove the file SYNC from the current directory (i.e. either delete it or rename it SYNC.OLD). This forces re-initialisation of the application to an archived checkpoint of your choosing.

This strategy overcomes corruption or loss of files within the current directory (including total PC hardware failure necessitating migration to another PC).

Corrupt files should be removed manually. The initialisation procedure will automatically rebuild any files missing from the current directory, except files BATCHnx and INDEX (which should be reconstructed if necessary).

Archive data may be reimported from your main computer provided it is still available there. To guard against possible irrevocable loss of data within the WORM Lake (the BATCHnx file group in the PC Current Directory), you must drain the Lake into your Master Volume more frequently than the main computer fills its WORM Pool.

If you lose any BATCHnx files, start GO.EXE with the /U option and then re-initialise the archiving application. This instructs the initialiser to disregard the extents of the WORM Lake recorded at the checkpoint and to use the PC's directory information instead. To bring your Backup Volume up to date, copy the latest BATCHnx data sets onto it from the Master Volume: use main menu option M (Map files on optical volume) to determine what needs to be done, then use options F and B to do it.

Recovery strategy 3

Documents may be added to an archive by using equalisation mode to restore BATCHnx data sets from any archive volume (usually your Backup Volume).

This strategy allows you to replace missing or defective Master Volumes (after first employing strategy 2 to rebuild the older part of the archive if appropriate).

Archive Files in Opsys's Current Directory

BATCHnx: This group of files (sometimes called the WORM Lake) holds all the document tags and print images imported from up to two main computer systems but not yet exported to both the active Archive Master Volume and its Backup Volume. n is the main computer system number (0 or 1) and x is the logical printer identifier (A:O) within that system.
CLUSTERS: Holds fixed length (8-byte) records, one for each cluster within the archived data set INDEX0. The nth record points to cluster n-1 both within the archive media and (when that cluster is rolled in to the outer cache) within the file INDEX0 in the current directory.
LINEUP: Holds fixed length (32-byte) records, each giving page layout information for one document type. Maintained by main menu option D (Document Lineup).

If GO.EXE does not find this file in the current directory it will create it with your default entries.

INDEX: Holds fixed length (32-byte) records, one for each document tag not yet moved into INDEX0 (record layout as for INDEX0).

If this file is lost it should be rebuilt by running main menu option I (Index reconstruction).

INDEX0: Holds the hard disc cache for the archived section of the document tag index. This file comprises two logical sections: the Inner Cache and the Outer Cache. The maximum size of each section is assigned separately during initialisation.
The inner cache holds all your most recently archived document tags. Once this reaches its maximum size, each new cluster (of 1,024 tags) ousts the oldest cluster from the inner cache.
The outer cache is used during document retrieval to buffer (roll in from the archive media) any clusters needed but not available in the inner cache, on a last-in last-out basis.
Rolling a cluster into the outer cache slows down the search process, but only occurs when retrieving older documents. With an 8Mbyte inner cache, your most recent ¼-million document tags will be cached on the much faster hard disc.
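The two cache sections can be modelled as follows. This is a minimal Python sketch, not the GO.EXE implementation: the class `TagIndexCache` and its `roll_in` callback (standing in for a read of one INDEX0 cluster from the archive media) are hypothetical, and "last-in last-out" is taken to mean that the earliest rolled-in cluster is discarded first. The constant shows where the ¼-million figure comes from: an 8Mbyte inner cache holds 256 clusters of 1,024 32-byte tags.

```python
from collections import OrderedDict
from typing import Callable

INNER_CLUSTERS = 8 * 1_048_576 // (1_024 * 32)   # 8Mbyte inner cache -> 256 clusters (262,144 tags)

class TagIndexCache:
    """Hypothetical model of the inner and outer sections of INDEX0."""

    def __init__(self, inner_max: int, outer_max: int,
                 roll_in: Callable[[int], bytes]):
        self.inner_max = inner_max         # maximum clusters in the inner cache
        self.outer_max = outer_max         # maximum clusters in the outer cache
        self.roll_in = roll_in             # reads one cluster from the archive media
        self.inner = OrderedDict()         # cluster number -> data, oldest first
        self.outer = OrderedDict()         # cluster number -> data, earliest roll-in first
        self.roll_ins = 0                  # count of slow reads from the archive media

    def archive_cluster(self, n: int, data: bytes) -> None:
        """Add a newly archived cluster; when full, the oldest cluster is ousted."""
        self.inner[n] = data
        if len(self.inner) > self.inner_max:
            self.inner.popitem(last=False)

    def get(self, n: int) -> bytes:
        """Return cluster n, rolling it into the outer cache if it is not cached."""
        if n in self.inner:
            return self.inner[n]
        if n in self.outer:
            return self.outer[n]
        self.roll_ins += 1                 # slow path: read from the archive media
        data = self.roll_in(n)
        self.outer[n] = data
        if len(self.outer) > self.outer_max:
            self.outer.popitem(last=False) # discard the earliest roll-in
        return data
```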
SYNC: Holds the configuration settings for GO.EXE and the latest checkpoint data for the Archiving Session.
VWAPATH.INI: The Archiving Session reads this text file (if it exists) at startup. List here (in search order, one path on each line) any paths where GO.EXE is to look for Archive Volumes in addition to the Archive Path specified in the Change Settings dialog.
XREF: Holds fixed length (20-byte) records which index the document tag index. There is one record for each document type within date of issue. The records are held in chronological sequence.
*.BAK and *.TMP: Main menu option A (Archive & Index new documents) produces new versions of the files INDEX and XREF. During this process the new versions are named INDEX.TMP and XREF.TMP. On completion, the old versions are renamed INDEX.BAK and XREF.BAK.

Main menu option R (Retrieve documents) uses temporary work files WORMnn.TMP.

If you need more space on your hard disc, you can delete these files while GO.EXE is shut down.

Archive Volumes

When you introduce a new Archive Volume, you give it an electronic label specifying:

Provided you take care to label all your volumes uniquely, you may leave it to the application to tell you which one is needed (if it cannot be found on-line). A Backup Volume is always paired with a Master Volume with the same Volume Number.

The data type UINT(n) below represents an unsigned binary number of length n bytes in little-endian format (i.e. the least significant byte is presented first).
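For example, the two bytes 34h 12h represent the UINT(2) value 1234h (4,660). A minimal Python decoder, with a hypothetical helper name:

```python
def uint(data: bytes, offset: int, n: int) -> int:
    """Decode UINT(n): an unsigned n-byte little-endian integer starting at data[offset]."""
    field = data[offset:offset + n]
    if len(field) != n:
        raise ValueError(f"need {n} bytes at offset {offset}")
    return int.from_bytes(field, "little")

print(uint(bytes([0x34, 0x12]), 0, 2))  # 4660, i.e. 0x1234: low byte first
```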

The original (1989) implementation of the Archive application used Maxtor OC-800 5¼-inch Optical WORM media, accessing the sectors (each holding 2,048 bytes of user data) directly and treating each track as a single Archive Volume laid out as follows:

Sectors 0:511       Reserved (not used).
Sector 512          Volume Label.
Sectors 513:191951  Data Sets. Each Data Set occupies one or more whole sectors and contains a binary copy of a file (or a section of a file, called a cluster) originally held within a PC file system.
Data Sets are written sequentially along the track from the sector following the current Partition Base, without leaving any sector blank. When a Volume is first labeled, the Partition Base is sector 512 (the label sector) and is changed only if it becomes necessary to "skip" over defective media.
The first two bytes of every sector of a Data Set are reserved and contain the UINT(2) sequence number of this sector within the Data Set. The first sector of a Data Set has the sequence number 0 and begins with the Data Set Header of 24 or 36 bytes.

Thus the maximum file size within a data set is 134,086,632 bytes.
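The figure can be reproduced from the layout above: a UINT(2) sequence number allows at most 65,536 sectors per Data Set, each sector carries 2,046 bytes after its sequence number, and the header comes out of that space. The sketch assumes the quoted limit is for the 24-byte header (it matches exactly); with the 36-byte header used for clusters the limit is 12 bytes smaller. `max_file_size` is a hypothetical helper.

```python
SECTOR_BYTES = 2_048
SEQ_BYTES = 2                             # UINT(2) sequence number opening every sector
PAYLOAD = SECTOR_BYTES - SEQ_BYTES        # 2,046 bytes left for header and file data
MAX_SECTORS = 1 << (8 * SEQ_BYTES)        # sequence numbers 0..65,535

def max_file_size(header_len: int = 24) -> int:
    """Largest file that fits in one Data Set whose header is header_len bytes."""
    return MAX_SECTORS * PAYLOAD - header_len

print(max_file_size())     # 134086632
print(max_file_size(36))   # 134086620
```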

This device-specific structure has been re-implemented under the MS-DOS file system (thereby making it device independent) using files known as Virtual WORM Archives and named VOLyynnx.VWA, where yynn is the Volume Number and x is M for a Master Volume or B for a Backup Volume.
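A sketch of the naming rule, assuming the Volume Number is held as the four decimal digits yynn (shown as yy.nn in the program's messages); `vwa_filename` is a hypothetical helper:

```python
def vwa_filename(volume: int, backup: bool = False) -> str:
    """Build VOLyynnx.VWA for a Volume Number yy.nn held as the integer yynn."""
    if not 0 <= volume <= 9999:
        raise ValueError("Volume Number must fit in four digits (99V99)")
    return f"VOL{volume:04d}{'B' if backup else 'M'}.VWA"

print(vwa_filename(102))               # VOL0102M.VWA  (volume 01.02, Master)
print(vwa_filename(102, backup=True))  # VOL0102B.VWA  (its Backup)
```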

The first byte of a .VWA file is the first byte of the Volume Label, designated as byte 1,048,576 of the Virtual WORM Archive (the 512 unused reserved sectors are not physically present in the .VWA file).
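Because the 512 reserved sectors (512 × 2,048 = 1,048,576 bytes) are omitted, a sector number or virtual byte address maps to a .VWA file offset by a fixed subtraction. A minimal sketch with hypothetical helper names:

```python
SECTOR_BYTES = 2_048
RESERVED_SECTORS = 512                          # sectors 0:511, absent from a .VWA file
VWA_ORIGIN = RESERVED_SECTORS * SECTOR_BYTES    # 1,048,576: virtual byte of the Volume Label

def vwa_file_offset(virtual_byte: int) -> int:
    """Map a virtual byte address in the archive to an offset within the .VWA file."""
    if virtual_byte < VWA_ORIGIN:
        raise ValueError("the reserved area is not stored in a .VWA file")
    return virtual_byte - VWA_ORIGIN

def sector_file_offset(sector: int) -> int:
    """File offset of the first byte of a sector (sector 512 is the Volume Label)."""
    return vwa_file_offset(sector * SECTOR_BYTES)

print(sector_file_offset(512))  # 0     (Volume Label)
print(sector_file_offset(513))  # 2048  (first Data Set sector)
```

To read sector s, one would seek to `sector_file_offset(s)` in the open .VWA file and read 2,048 bytes.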

Archive Paths

Several .VWA files may exist in a single MS-DOS directory. Indeed, even if the media has a high capacity, such as a 650Mb CD-ROM, it is generally more convenient to organise it as half a dozen files of around 110Mb than as one single file (which can be difficult to move around). When you set up GO.EXE's Archiving Session, the location of the directory holding its archive files must be given in the Archive Path field of the Change Settings dialog.

If the Archive Path is the root directory of a device, be sure to add the \ when specifying the path: enter E:\ rather than E:, because the latter format is reserved to indicate a WORM drive requiring the original device driver.

A .VWA file may reside on any appropriate media, fixed or removable, and may be copied and shared over a network. To allow you to distribute your archive with total flexibility, when GO.EXE is started it looks for the text file VWAPATH.INI in its Current Directory. Make a list in this file of other paths (in search order with one path on each line) where the application is to look for .VWA files.

For example, it may be convenient to hold recent archive data on a fast hard disc, relegate older data to CD-ROM, and use removable Jaz media for the Backup Volume.
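The lookup can be sketched as a search over an ordered list of directories. This Python sketch assumes the Archive Path is searched before the VWAPATH.INI entries and that a bare drive letter such as E: (reserved for the original WORM driver) is skipped when looking for .VWA files; all function names are hypothetical.

```python
import os
from typing import Optional

def is_worm_drive(path: str) -> bool:
    """A bare drive letter such as 'E:' denotes the original WORM driver, not a directory."""
    return len(path) == 2 and path[0].isalpha() and path[1] == ":"

def read_vwapath(current_directory: str) -> str:
    """Text of VWAPATH.INI in the current directory, or '' if the file does not exist."""
    try:
        with open(os.path.join(current_directory, "VWAPATH.INI")) as f:
            return f.read()
    except FileNotFoundError:
        return ""

def archive_search_paths(archive_path: str, vwapath_text: str) -> list[str]:
    """Archive Path first (assumed), then each non-blank VWAPATH.INI line in file order."""
    paths = [archive_path] + [line.strip() for line in vwapath_text.splitlines()]
    return [p for p in paths if p and not is_worm_drive(p)]

def find_volume(filename: str, paths: list[str]) -> Optional[str]:
    """Return the first existing copy of filename, or None if the volume is not on-line."""
    for directory in paths:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

A None result corresponds to the case where the application must ask you to load the volume.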

Volume Label

UINT(2)      Schema Number: VALUE 1 in this version.
UINT(2)      User Number.
             (Backup volumes have a User Number one greater than master volumes.)
UINT(2)      This Volume Number (99V99).
UINT(2)      Previous Volume Number (99V99).
PIC X(4)     Time & Date labeled (in MS-DOS directory format) as read from the PC clock.
PIC X(20)    (empty)
PIC X(64)    User Name & site Post Code (ASCIIZ).
PIC X(1952)  (empty)
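The label fields add up to exactly one 2,048-byte sector. A Python sketch of decoding it with the standard `struct` module, assuming the MS-DOS Time word precedes the Date word (their order in an MS-DOS directory entry); the function names are hypothetical.

```python
import struct
from datetime import datetime

LABEL = struct.Struct("<6H20x64s1952x")     # 2,048 bytes: one whole sector

def dos_datetime(date: int, time: int) -> datetime:
    """Decode MS-DOS directory-format date and time words."""
    return datetime(1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,
                    time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2)

def parse_label(sector: bytes) -> dict:
    """Decode a Volume Label sector (Time word assumed to precede Date word)."""
    schema, user, volume, previous, time, date, name = LABEL.unpack(sector)
    return {
        "schema": schema,
        "user": user,              # a Backup Volume's is its Master's plus one
        "volume": volume,
        "previous": previous,
        "labeled": dos_datetime(date, time),
        "user_name": name.split(b"\0", 1)[0].decode("ascii"),
    }
```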

Data Set Header

UINT(1)      Length of this Header in bytes including this byte (i.e. 24 or 36 bytes).
UINT(1)      Schema Number: VALUE 1 in this version.
PIC X(22)    Attribute, Time, Date, Size and Filename in MS-DOS directory format.
UINT(4)      This Cluster Number.
UINT(2)      Volume Number, previous cluster.
UINT(4)      Sector Number, previous cluster.
UINT(2)      Number of Sectors, previous cluster.

A data set is valid only if it occupies sufficient sectors to contain a file of the size stated in the header.

The last four fields are present only when the data set contains a cluster (a file section). The first cluster of a file has Cluster Number 0.
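A sketch of reading the header from the first sector of a Data Set and applying the validity rule. It assumes the 22-byte directory field is laid out as in the MS-DOS Find First data area (Attribute 1 byte, Time 2, Date 2, Size 4, Filename 13), which matches the stated total, and that Size counts the bytes held in this Data Set; the helper names are hypothetical.

```python
import struct

SECTOR_PAYLOAD = 2_046                     # 2,048 bytes less the UINT(2) sequence number
BASE = struct.Struct("<BBBHHI13s")         # 24 bytes: length, schema, 22-byte directory field
CLUSTER = struct.Struct("<IHIH")           # 12 more bytes when the Data Set holds a cluster

def parse_header(first_sector: bytes) -> dict:
    """Decode the Data Set Header that follows the sequence number in sector 0."""
    if int.from_bytes(first_sector[:2], "little") != 0:
        raise ValueError("not the first sector of a Data Set")
    length, schema, attr, time, date, size, name = BASE.unpack_from(first_sector, 2)
    if length not in (24, 36):
        raise ValueError(f"unexpected header length {length}")
    header = {"length": length, "schema": schema, "attribute": attr,
              "time": time, "date": date, "size": size,
              "filename": name.split(b"\0", 1)[0].decode("ascii")}
    if length == 36:                       # cluster fields present
        cluster, prev_volume, prev_sector, prev_count = CLUSTER.unpack_from(
            first_sector, 2 + BASE.size)
        header.update(cluster=cluster, prev_volume=prev_volume,
                      prev_sector=prev_sector, prev_sectors=prev_count)
    return header

def sectors_needed(header: dict) -> int:
    """Whole sectors needed for the header plus a file of the stated size."""
    return -(-(header["length"] + header["size"]) // SECTOR_PAYLOAD)

def is_valid(header: dict, sectors_occupied: int) -> bool:
    """A Data Set is valid only if it occupies at least sectors_needed sectors."""
    return sectors_occupied >= sectors_needed(header)
```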

Data Sets written by the Archiving Application

The BATCHnx Data Sets

Each BATCHnx data set contains a file of variable length records (see layout following) containing the document tags and print images from logical printer x (range A:O) on main computer n (0=Normal, 1=Backup). The data set header records the time of receipt at the PC (according to its clock) of the last data frame from the printer. Each document comprises one Tag record followed by any number of Print Image records, each containing one print line. A form feed must separate each page of multi-page documents.

UINT(1)      Length i of this record in bytes including this byte;
             VALUE 0 if end of data (always present).
UINT(1)      VALUE 128+n if Document Tag (Schema n) follows;
             Line Feed count if Line Feed with Print Image follows;
             VALUE 0 if Form Feed with Print Image follows;
             VALUE 128 if Vertical Tab with Print Image follows.
PIC X(i-2)   Document Tag or Print Image
             (ASCII with VALUE 128+j representing j consecutive spaces).
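A sketch of walking a BATCHnx file record by record. It assumes Document Tag schemas are numbered from 1 (so VALUE 128 is unambiguously a Vertical Tab) and that the space expansion applies only to Print Image text, so tag payloads are returned raw; the function names are hypothetical.

```python
from typing import Iterator

def expand_spaces(data: bytes) -> str:
    """Expand each byte of VALUE 128+j into j spaces; other bytes are plain ASCII."""
    return "".join(" " * (b - 128) if b >= 128 else chr(b) for b in data)

def batch_records(data: bytes) -> Iterator[tuple[str, int, bytes]]:
    """Yield (kind, value, payload) for each record of a BATCHnx file.

    kind is 'tag' (value = schema n), 'line_feeds' (value = count),
    'form_feed' or 'vertical_tab' (value 0). Payloads are returned raw.
    """
    pos = 0
    while True:
        if pos >= len(data):
            raise ValueError("missing end-of-data record")
        length = data[pos]
        if length == 0:                      # VALUE 0: end of data
            return
        if length < 2 or pos + length > len(data):
            raise ValueError(f"bad record length {length} at offset {pos}")
        control = data[pos + 1]
        payload = data[pos + 2:pos + length]
        if control > 128:
            yield "tag", control - 128, payload
        elif control == 128:
            yield "vertical_tab", 0, payload
        elif control == 0:
            yield "form_feed", 0, payload
        else:
            yield "line_feeds", control, payload
        pos += length
```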

The INDEX0 Data Sets

Each INDEX0 data set is a cluster of up to 1,024 records of fixed length 32 bytes (see layout following). Each record holds the tag and the location of one archived document. The location points to the start of the Document Tag record for the document within the relevant BATCHnx data set. The records are sorted within the data set into Document Type within Date of Issue, and the data set header records the cut-off date for the Date of Issue.

UINT(2)      Master Volume Number of document's location.
UINT(4)      Byte Offset in volume of document's location.
UINT(3)      VALUE 0 (not used).
PIC X        Serial Number Group.
UINT(4)      Serial Number.
PIC X(2)     Date of Issue (in MS-DOS directory format).
PIC X        Document Type (user defined).
PIC X        Schema Number for this Document Type (user defined).
PIC X        Flag 1 (user defined).
PIC X        Flag 2 (user defined).
PIC X(12)    Reference (user defined).
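The field widths total 32 bytes when the Serial Number Group is a single byte, which is the assumption this Python sketch makes. It unpacks a cluster with `struct.iter_unpack` and sorts records into Document Type within Date of Issue; because the year occupies the high bits of an MS-DOS date word, comparing the words as integers gives chronological order. The names are hypothetical.

```python
import struct
from typing import NamedTuple

RECORD = struct.Struct("<HI3xcIHcccc12s")   # 32 bytes; the 3 unused bytes are skipped

class IndexRecord(NamedTuple):
    master_volume: int       # Master Volume Number of the document's location
    byte_offset: int         # offset of its Document Tag record in that volume
    serial_group: bytes      # assumed to be a single byte
    serial_number: int
    date_of_issue: int       # MS-DOS directory-format date word
    document_type: bytes
    schema: bytes
    flag1: bytes
    flag2: bytes
    reference: bytes

def read_cluster(data: bytes) -> list[IndexRecord]:
    """Split an INDEX0 cluster (up to 1,024 records) into IndexRecord tuples."""
    if len(data) % RECORD.size or len(data) > 1_024 * RECORD.size:
        raise ValueError("not a cluster of up to 1,024 32-byte records")
    return [IndexRecord(*fields) for fields in RECORD.iter_unpack(data)]

def sort_key(record: IndexRecord) -> tuple[int, bytes]:
    """Date of Issue is the major key and Document Type the minor key."""
    return (record.date_of_issue, record.document_type)
```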

The SYNC Data Sets

Each SYNC data set is a copy of the SYNC file taken from the PC's current directory at the time (according to the PC's clock) recorded in the data set header.

Main menu option A (Archive and Index new documents) adds a SYNC data set to a volume upon successful completion of its schedule for that volume.

WORM technology

The acronym WORM stands for Write Once Read Many. A WORM drive incorporates a low-power infrared laser to burn patterns into a sensitive layer on the media. This pattern cannot subsequently be changed, but may be read back as often as desired. Data seek and transfer rates are comparable to those of a floppy diskette, but the capacity is much greater. The sensitive layer on the cartridges is methine dye spin-coated in two spiral tracks (A and B), one on each side of the disc. There is only one head in the drive unit, so the cartridge has to be turned over by hand to access the opposite side; the application therefore classifies each side as an independent volume. Each track is pre-formatted into 191,952 sectors, each of which may be written once with 2,048 bytes of data. Thus one cartridge can hold 750 Mb (approaching one million typical document image pages).
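The capacity figure can be checked from the track geometry; the 750 Mb quoted is in binary megabytes (2^20 bytes):

```python
SECTORS_PER_TRACK = 191_952
SECTOR_BYTES = 2_048
SIDES = 2                                        # one spiral track, and one volume, per side

side_bytes = SECTORS_PER_TRACK * SECTOR_BYTES    # 393,117,696 bytes per side
cartridge_bytes = SIDES * side_bytes             # 786,235,392 bytes per cartridge
print(cartridge_bytes / 2**20)                   # 749.8125, i.e. about 750 Mb
data_sectors = SECTORS_PER_TRACK - 513           # sectors 513:191951 are available for Data Sets
```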

In 1989 the typical PC hard disc could hold 20Mb, so WORM technology was well suited to archiving, the removable cartridges being unmodifiable and as convenient as floppy discs. Nowadays, hard discs provide much faster on-line access together with massive capacity whilst CD-R media provide cheap, reliable and convenient long-term off-line backup.

Installation

A WORM PC requires the extra equipment listed following.

Maxtor OC-800 (5¼" Optical Disk Cartridges)

The initial requirement is for two cartridges comprising one master/backup pair.

Each of the two sides of a cartridge has its own small green slider to enable or disable write protection. Please ensure that both the green sliders are in the W/R position.

Maxtor RXT-800HS (5¼" Optical Disk Drive Unit)

This WORM drive unit is marketed under the Storage Dimensions or Ricoh badge. It may be ordered either for internal installation into your PC's empty half-height 5¼" drive bay or as an external unit with its own power supply. This guide assumes you will be using the external version. This has a power switch and fuse socket at the rear, near the power inlet.

WORM drives are relatively expensive and should be handled firmly but never roughly. Ideally, set up the entire station in a cul-de-sac away from dusty areas.

The drive unit features the industry standard Small Computer System Interface (SCSI) for connection to the host system (in this case the PC) through either of the pair of Centronics-type sockets at the rear.

Please ensure that the drive's SCSI ID number is set to 0 by moving the small red wheel at the rear of the unit so that it points to 0.

Future Domain HCA-120 (External Cable)

This 36-inch cable connects the RXT-800HS drive to the TMC-850M controller.

Future Domain TMC-850MCRL (SCSI Controller & driver)

This is a cling-wrapped kit comprising:

  1. With the power off, install this interface card in one of your PC's spare expansion slots. See the Future Domain Installation Instructions booklet, page 17, for full details.
  2. Connect the controller card to the disc drive with the HCA-120 cable. Turn on the power at the disc drive and insert your active master volume (or a blank cartridge if this is a new installation).
  3. Turn on the PC and watch the screen. You should see a banner similar to:
    Future Domain 950 SCSI ROM BIOS v8.2
    followed by the outcome of a SCSI bus scan (which should detect the Maxtor drive at Id 0 if it is switched on). There is plenty of time to read these messages as they extend the boot process by many seconds.
  4. If there is no banner at step 3, you have a memory conflict between this card and some other device in the PC. Switch off the PC and reconfigure jumpers W1, W2 and W3 on the card to another memory address, then repeat step 3. Try each configuration listed on the card until you find one that works. See Future Domain's Installation Instructions booklet, Appendix B, for full information.

Corel OptiStar (alternative driver)

Corel's earlier OptiStar driver has the advantage over CorelSCSI in that the tedious SCSI bus scan (performed by the SCSI ROM BIOS at every PC boot) can be skipped by simply pulling the ROM chip out of the TMC-850M controller:

  1. Ensure the TMC-850M controller is configured to a usable memory address by installing it as described above.
  2. Switch off the PC and carefully remove the 28-pin ROM chip from the TMC-850M controller, using a small flat-head screwdriver.
  3. Switch on the PC. At the MS-DOS command prompt, copy the two OptiStar files 800_FDC.SYS (or WORM.SYS in the 1988 edition) and WDIAGS.EXE into GO.EXE's current directory.
  4. Use a text editor to add the following line to the end of your C:\CONFIG.SYS file:
    DEVICE=C:\GO\800_FDC.SYS /o
  5. Reboot the PC and watch the screen. Check that the WORM device driver signs on. Note the MS-DOS drive letter given on the last line issued by the driver, for example:
    Installed as drive D:
  6. Test the installation by running OptiStar's diagnostic program C:\GO\WDIAGS x: from the MS-DOS command prompt, where x is the drive letter you noted at step 5. The diagnostic should complete successfully in two minutes or so.

Optical data transfers

To effect any optical data transfer, the program requests the OptiStar WORM driver to execute the appropriate SCSI dialog with the WORM drive. During this dialog, a busy message is shown at the bottom of the archiving session's screen:
Busy reading|writing n sector(s) @ i in yy.nn
where i is the base Sector Number and yy.nn is the Volume Number. In due course the message changes to done or fail, according to the outcome reported by the driver.

As the WORM driver complies with MS-DOS's single-tasking restrictions, the program cannot respond to any keystrokes (including hotkeys) whilst the driver is busy. This is particularly evident when the drive is not responding (e.g. powered off), as the driver allows up to 11 seconds for a response.

If the transfer fails, one of the following messages is displayed:

MS-DOS error k
Check the driver's DEVICE statement in CONFIG.SYS and check the drive letter allocated by the driver when MS-DOS started.
Status k
Check that the WORM drive is ready and correctly connected. k=770 may indicate that some sectors in a multiple sector read are blank.
SCSI Key k Code j
The hexadecimal code j is sufficient to identify the cause. Amongst the possible values for j are:
04h Drive not ready (no cartridge loaded)
29h Drive power change
28h Cartridge changed
27h Attempt to write to a write-protected cartridge
21h Invalid sector number
98h Attempt to read a single blank sector
9Bh Attempt to write to sector already written

After any failure (other than reading a single blank sector), the program tries to read the volume label and, if successful, then retries the transfer. This process continues until either the transfer succeeds or you press Esc. In the latter case, the program pauses (allow up to 11 seconds for a response) with the cursor at the prompt:
Retry(any key) or Esc
This enables you to note the details of the failure. To abandon the current process, press Esc again.

If any volume label read reveals that the wrong volume is in the drive, the program will issue the prompt:
Please load Optical Volume v & press a key (or Esc)
where v is the number of the required volume. To abandon the current process, press Esc.
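The recovery behaviour described above can be sketched as a retry loop. Only the SCSI codes come from this guide; `TransferError` and the callbacks standing in for the driver call, the label read and the keyboard prompts are hypothetical, and the two-stage Retry/Esc prompt is collapsed into a single `abandon` check.

```python
from typing import Callable, Optional

SENSE_CODES = {
    0x04: "Drive not ready (no cartridge loaded)",
    0x29: "Drive power change",
    0x28: "Cartridge changed",
    0x27: "Attempt to write to a write-protected cartridge",
    0x21: "Invalid sector number",
    0x98: "Attempt to read a single blank sector",
    0x9B: "Attempt to write to sector already written",
}
BLANK_SECTOR = 0x98

class TransferError(Exception):
    """Hypothetical exception carrying the SCSI code j of a failed transfer."""
    def __init__(self, code: int):
        super().__init__(SENSE_CODES.get(code, f"SCSI code {code:02X}h"))
        self.code = code

def transfer_with_retry(transfer: Callable[[], bytes],
                        label_volume: Callable[[], Optional[int]],
                        wanted: int,
                        abandon: Callable[[], bool],
                        ask_to_load: Callable[[int], bool]) -> Optional[bytes]:
    """Retry a failed transfer after re-reading the volume label; None if abandoned.

    label_volume: Volume Number read from the label, or None if the read fails.
    abandon: True once the operator has pressed Esc at the Retry prompt.
    ask_to_load: shows 'Please load Optical Volume v', returns False on Esc.
    """
    while True:
        try:
            return transfer()
        except TransferError as err:
            if err.code == BLANK_SECTOR:
                raise                         # not retried: the caller decides
        while True:                           # re-read the label before retrying
            if abandon():
                return None
            volume = label_volume()
            if volume == wanted:
                break                         # right volume loaded: retry the transfer
            if volume is not None and not ask_to_load(wanted):
                return None                   # Esc at the load prompt
```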

Write Partitions

To dramatise the irreversible effect of writing to WORM media, the process is often called burning. The program burns each sector in turn along the track, leaving no gaps within the current partition. Whenever you label (or reactivate) a Master Volume, the current partition is reset to extend from the volume label (sector 512) to the end of the track (sector 191,951).

If a volume label has been read since the last burn, the program must search the track to find the lowest blank sector within the partition. This is done by a binary chop iteration (if the central sector is blank, discard the track above; otherwise discard the track below; repeat until the boundary is found).
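A sketch of the binary chop, assuming a hypothetical `is_blank(sector)` probe (for instance, a single-sector read that fails with code 98h) and that the sectors after the Partition Base are written contiguously from the bottom. Searching a whole track takes at most 18 probes.

```python
from typing import Callable

def lowest_blank_sector(is_blank: Callable[[int], bool], base: int,
                        top: int = 191_951) -> int:
    """Binary chop for the first blank sector after the Partition Base.

    Sectors base+1 .. top split into a written run followed by a blank run.
    Returns top + 1 when the partition is full.
    """
    low, high = base + 1, top + 1          # the answer lies in [low, high]
    while low < high:
        middle = (low + high) // 2
        if is_blank(middle):
            high = middle                  # blank: discard the track above
        else:
            low = middle + 1               # written: discard the track below
    return low
```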

If the program attempts to write to a section of the track so defective as to defeat the flaw manager (described following), your intervention will be required as follows:

  1. Abandon the burn (i.e. press Esc) and note the sector number. Then restart the burn.
  2. If the problem recurs but at a higher sector, repeat (1) until successful.
  3. If the problem recurs at the same sector further repetition is evidently futile. Repeat (1) but before restarting the burn change the partition's base (low) sector to a number slightly higher than the problem sector.

The Archive option under the archiving session menu shows and lets you change the partition base sector. The program never writes outside the partition, so changing the base sector allows a defective section of the track to be skipped. The partition limits apply to any burn to any volume (Master or Backup).

Flaw management

The WORM drive's firmware supports a flaw management technique designed to yield less than one bit error in every 120 gigabytes (300 full volumes) retrieved ten years after recording. The technique overcomes media imperfections by:

Use of the alternate sectors requires the cooperation of the WORM driver. The available documentation for OptiStar does not suggest that this driver provides such support.

The MS-DOS Archive