Document archiving is an optional component of Opsys. The component manages the transfer of an irregular stream of data from your main computer into secure long-term storage and provides retrieval facilities at one or more personal computers.
The fundamental unit of data is a printed document and the archive media comprises any appropriate fixed or removable media.
Examples of single documents include an invoice, a credit note, a statement or a purchase order. This application archives only the tag and the print image of a document, both of which must be available in electronic form.
A tag at least identifies the document by its type and date of issue, and may include a serial number and other custom references such as an account number or department code.
Tagging documents enables the application to automatically index the archive without the need to interpret print images.
It is possible to send a print image to a word processing application, where (with some set-up effort) it may be underlain with a template of its pre-printed form.
A document longer than about 20 pages cannot currently be retrieved in full. Such a document is considered a report rather than a document; consider breaking it up into smaller sections archived individually (which may well make subsequent retrieval of information easier anyway).
Knowing where your data resides is the key to understanding the archiving process. As you cannot easily observe computer data on the move, it may help to imagine the document images produced by your main computer(s) as data flowing into the WORM Sea (the permanent archive media). Before it reaches the Sea, the data may wait first in a WORM Pool on your main computer, then in a WORM Lake on your Primary Archive PC (in GO.EXE's Current Directory).
Once the data has reached the Sea, it may be distributed to any number of Secondary Archive PCs by any of three methods: export, extraction or equalisation. All these methods support data transfer via a network connection or by carrying the data on removable media between stations.
The Primary Archive PC imports the data from the Pool to the Lake, either almost continuously (so the Pool is usually empty) via the Terminal Session's DDE Protocol WORM, or during overnight consolidation of your main computer system by file transfer across a network.
Either way, the Primary Archive PC eventually exports the data to the Sea twice, firstly to the Archive's current Master Volume then to its Backup Volume. Documents cannot be retrieved until they reach a Master Volume.
The structure provides both flexibility for your routine operations and extra storage capacity, giving you time to overcome even serious problems without losing any archive data. Typically, the WORM Pool might hold at least a busy day's documents and the WORM Lake at least a week's documents.
When data is moved it nevertheless remains available at the source (for a while). This property lets the application strengthen its resilience against loss of archive data by:
This strategy implies that all archive data is drained from the Lake to the Sea twice: firstly (once or twice a day) onto your active Master Volume and secondly (perhaps once a week) onto the corresponding Backup Volume.
When a document is added to the archive media, its master volume location and its tag are added to the retrieval index and all the indexed tags are sorted into document type within date of issue. Over time, the index will grow very large (one megabyte for every 32,768 document tags) so it too is archived. Once you know that all documents dated before (say) last Sunday have been archived, you can instruct the application to archive the index up to that date (as the archived tags no longer require sorting, this process also speeds up subsequent index sorts).
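The one-megabyte figure follows from the 32-byte index record: 1,048,576 / 32 = 32,768 tags per megabyte. The sketch below, a minimal illustration assuming a megabyte of 1,048,576 bytes, estimates index growth; the function names and the daily document count are hypothetical.

```python
RECORD_BYTES = 32          # each index record (INDEX / INDEX0) is 32 bytes
MEGABYTE = 1024 * 1024     # 1,048,576 bytes

def tags_per_megabyte() -> int:
    """Number of 32-byte index records that fit in one megabyte."""
    return MEGABYTE // RECORD_BYTES

def index_megabytes(document_count: int) -> float:
    """Approximate index size in megabytes for a given number of document tags."""
    return document_count * RECORD_BYTES / MEGABYTE

# Hypothetical site: 2,000 documents a day for 250 working days.
print(tags_per_megabyte())                     # 32768
print(round(index_megabytes(2000 * 250), 2))   # 15.26
```

At that rate the unarchived index would pass 15 megabytes in a year, which is why archiving the older part of the index (and so excluding it from routine sorts) pays off.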
The index locates documents within master volumes only, so it is archived onto master volumes only. But for speed when searching or sorting, any part of the index currently in use is cached (held) on the PC's hard disc. There is no index for backup volumes because you never view documents from a backup volume (this would compromise your security).
Although originally a (last resort) recovery strategy, The Equaliser is now used extensively to enable other PCs (known as Secondary Archive Stations) to update their own local archive, either by carrying the current Backup Volume between stations or (more conveniently) by file transfer across a network.
A checkpoint is any instant at which all the application's data is mutually consistent. If a checkpoint is recorded, it should be possible to fall back to it if a failure occurs later during a critical update sequence. This is the basis of the archiving session's fail-safe strategy. The current checkpoint is recorded automatically:
The control file SYNC ties together the checkpoint taken at the time recorded in its directory entry (according to the PC's clock), providing a snapshot of the datascape at that time. If GO.EXE cannot find this file, it creates it with default settings (as the Archive application is disabled by default, it will then require activation).
Simply shutting down and restarting the program GO.EXE effects fall-back to the most recently recorded checkpoint.
This strategy overcomes straightforward problems (after you have attended to their cause) including:
Any data imported from your main computer since the checkpoint will be reimported.
To fall back to an earlier checkpoint, remove the file SYNC from the current directory (i.e. either delete it or rename it to SYNC.OLD). This forces re-initialisation of the application to an archived checkpoint of your choosing.
This strategy overcomes corruption or loss of files within the current directory (including total PC hardware failure necessitating migration to another PC).
Corrupt files should be removed manually. The initialisation procedure automatically rebuilds any files missing from the current directory, except the BATCHnx files and INDEX, which must be reconstructed separately if necessary (INDEX by main menu option I, Index reconstruction).
Archive data may be reimported from your main computer provided it is still available there. To guard against possible irrevocable loss of data within the WORM Lake (the BATCHnx file group in the PC Current Directory), you must drain the Lake into your Master Volume more frequently than the main computer fills its WORM Pool.
If you lose any BATCHnx files, re-initialise the archiving application after starting GO.EXE with the /U option. This instructs the initialiser to disregard the extents of the WORM Lake recorded at the checkpoint and to use the PC's directory information instead. To bring your Backup Volume up to date, copy the latest BATCHnx data sets onto it from the Master Volume. Use main menu option M (Map files on optical volume) to determine what needs to be done, then use options F and B to do it.
Documents may be added to an archive by using equalisation mode to restore BATCHnx data sets from any archive volume (usually your Backup Volume).
This strategy allows you to replace missing or defective Master Volumes (after first employing strategy 2 to rebuild the older part of the archive if appropriate).
BATCHnx | This group of files (sometimes called the WORM Lake) holds all the document tags and print images imported from up to two main computer systems but not yet exported to both the active Archive Master Volume and its Backup Volume. n is the main computer system number (0 or 1) and x is the logical printer identifier (A:O) within that system. |
---|---|
CLUSTERS | Holds fixed length (8-byte) records, one for each cluster within the archived data set INDEX0. The nth record points to cluster n-1 both within the archive media and (when that cluster is rolled into the outer cache) within the file INDEX0 in the current directory. |
LINEUP | Holds fixed length (32-byte) records each giving page layout information for one document type. Maintained by main menu option D (Document Lineup). If GO.EXE does not find this file in the current directory it will create it with your default entries. |
INDEX | Holds fixed length (32-byte) records, one for each document tag not yet moved into INDEX0 (record layout as for INDEX0). If this file is lost it should be rebuilt by running main menu option I (Index reconstruction). |
INDEX0 | Holds the hard disc cache for the archived section of the document tag index. This file comprises two logical sections: the Inner Cache and the Outer Cache. The maximum size of each section is assigned separately during initialisation. The inner cache holds all your most recently archived document tags. Once this reaches its maximum size, each new cluster (of 1,024 tags) ousts the oldest cluster from the inner cache. The outer cache is used during document retrieval to buffer (roll-in from the archive media) any clusters needed but not available in the inner cache, on a last-in last-out basis. Rolling a cluster into the outer cache slows down the search process, but only occurs when retrieving older documents. With an 8Mbyte inner cache your most recent ¼-million document tags will be cached on the much faster hard disc. |
SYNC | Holds the configuration settings for GO.EXE and the latest checkpoint data for the Archiving Session. |
VWAPATH.INI | The Archiving session reads this text file (if it exists) at startup. List here (in search order, one path on each line) any paths where GO.EXE is to look for Archive Volumes additional to the Archive Path specified in the Change Settings dialog. |
XREF | Holds fixed length (20-byte) records which index the document tag index. There is one record for each document type within date of issue. The records are held in chronological sequence. |
*.BAK and *.TMP | Main menu option A (Archive & Index new documents) produces new versions of the files INDEX and XREF. During this process the new versions are named INDEX.TMP and XREF.TMP. On completion, the old versions are renamed INDEX.BAK and XREF.BAK. Main menu option R (Retrieve documents) uses temporary work files WORMnn.TMP. If you need more space on your hard disc, you can delete these files when GO.EXE is shut down. |
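The ¼-million figure quoted for INDEX0 can be checked from the record and cluster sizes: an 8 Mbyte inner cache holds 8,388,608 / 32 = 262,144 tags, i.e. 256 clusters of 1,024 tags. The sketch below assumes a megabyte of 1,048,576 bytes and that the inner cache holds only whole clusters; it does not decode the 8-byte CLUSTERS record, whose internal layout this section does not give. The function names are hypothetical.

```python
TAG_BYTES = 32                                 # INDEX0 record size
TAGS_PER_CLUSTER = 1024                        # tags per INDEX0 cluster
CLUSTER_BYTES = TAG_BYTES * TAGS_PER_CLUSTER   # 32,768 bytes per cluster
CLUSTERS_RECORD_BYTES = 8                      # one CLUSTERS record per cluster

def inner_cache_capacity(cache_bytes: int) -> tuple:
    """Return (whole clusters, document tags) held by an inner cache of this size."""
    clusters = cache_bytes // CLUSTER_BYTES
    return clusters, clusters * TAGS_PER_CLUSTER

def clusters_record_offset(cluster_number: int) -> int:
    """Byte offset in CLUSTERS of the record describing a given cluster.

    The nth record (counting from 1) describes cluster n-1, so cluster c
    is described by the record starting at byte c * 8.
    """
    return cluster_number * CLUSTERS_RECORD_BYTES

print(inner_cache_capacity(8 * 1024 * 1024))   # (256, 262144)
```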
When you introduce a new Archive Volume, you give it an electronic label specifying:
Provided you take care to label all your volumes uniquely, you may leave it to the application to tell you which one is needed (if it cannot be found on-line). A Backup Volume is always paired with a Master Volume with the same Volume Number.
The data type UINT(n) below represents an unsigned binary number of length n bytes in little-endian format (i.e. the least significant byte is presented first).
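A minimal sketch of decoding a UINT(n) field in Python; `read_uint` is a hypothetical helper name, and the decoding relies only on the standard `int.from_bytes` with little-endian byte order.

```python
def read_uint(data: bytes, offset: int, n: int) -> int:
    """Decode a UINT(n): an unsigned n-byte little-endian integer at data[offset:offset+n]."""
    field = data[offset:offset + n]
    if len(field) != n:
        raise ValueError("field extends beyond the end of the data")
    return int.from_bytes(field, "little")

# The bytes 0x34 0x12 encode the UINT(2) value 0x1234: least significant byte first.
print(hex(read_uint(b"\x34\x12", 0, 2)))   # 0x1234
```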
The original (1989) implementation of the Archive application used Maxtor OC-800 5¼ inch Optical WORM media, directly accessing the sectors (each holding 2,048 bytes of user data) along each track, which formed a single Archive Volume laid out as follows:
Sectors 0:511 | Reserved (not used). |
---|---|
Sector 512 | Volume Label. |
Sectors 513:191951 | Data Sets. Each Data Set occupies one or more whole sectors and contains a binary copy of a file (or a section of a file, called a cluster) originally held within a PC file system. Data Sets are written sequentially along the track from the sector following the current Partition Base, without leaving any sector blank. When a Volume is first labeled, the Partition Base is sector 512 (the label sector) and is changed only if it becomes necessary to "skip" over defective media. The first two bytes of every sector of a Data Set are reserved and contain the UINT(2) sequence number of this sector within the Data Set. The first sector of a Data Set has the sequence number 0 and begins with the Data Set Header of 24 or 36 bytes. Thus the maximum file size within a data set is 134,086,632 bytes. |
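The 134,086,632-byte limit follows from the layout: the UINT(2) sequence number allows at most 65,536 sectors per data set, each sector leaves 2,046 bytes after its sequence number, and the 24-byte header takes part of the first sector. The sketch assumes, as that figure implies, that the header follows the sequence number inside sector 0; with the 36-byte cluster header the same arithmetic gives 12 bytes less.

```python
SECTOR_BYTES = 2048                          # user data per WORM sector
SEQ_BYTES = 2                                # UINT(2) sequence number at the start of every sector
PAYLOAD_BYTES = SECTOR_BYTES - SEQ_BYTES     # 2,046 bytes left for header and file data
MAX_SECTORS = 1 << 16                        # sequence numbers 0..65,535

def max_file_size(header_bytes: int) -> int:
    """Largest file a single data set can hold, given the length of its header."""
    return MAX_SECTORS * PAYLOAD_BYTES - header_bytes

print(max_file_size(24))   # 134086632
print(max_file_size(36))   # 134086620
```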
This device-specific structure has been re-implemented under the MS-DOS file system (thereby making it device independent) using files known as Virtual WORM Archives and named VOLyynnx.VWA, where yynn is the Volume Number and x is M for a Master Volume or B for a Backup Volume.
The first byte of a .VWA file is the first byte of the Volume Label, designated as byte 1,048,576 of the Virtual WORM Archive (the 512 unused reserved sectors are not physically present in the .VWA file).
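Because sectors 0 to 511 are omitted, a virtual sector number maps to a .VWA file offset by subtracting 512 × 2,048 = 1,048,576 bytes. The sketch below shows that mapping together with the VOLyynnx.VWA naming rule; it takes yy and nn as separate two-digit numbers, since how they are derived from the stored UINT(2) Volume Number is not spelled out here. Both function names are hypothetical.

```python
SECTOR_BYTES = 2048
RESERVED_SECTORS = 512                          # sectors 0-511 are not stored in the file
LABEL_BYTE = RESERVED_SECTORS * SECTOR_BYTES    # 1,048,576: virtual address of the label

def vwa_filename(yy: int, nn: int, master: bool) -> str:
    """Build VOLyynnx.VWA; x is M for a Master Volume, B for a Backup Volume."""
    if not (0 <= yy <= 99 and 0 <= nn <= 99):
        raise ValueError("yy and nn must each be two decimal digits")
    return f"VOL{yy:02d}{nn:02d}{'M' if master else 'B'}.VWA"

def file_offset(sector: int) -> int:
    """Offset in a .VWA file of the first byte of a given virtual WORM sector."""
    if sector < RESERVED_SECTORS:
        raise ValueError("sectors 0-511 are reserved and absent from the .VWA file")
    return sector * SECTOR_BYTES - LABEL_BYTE

print(vwa_filename(1, 2, master=True))   # VOL0102M.VWA
print(file_offset(512))                  # 0: the volume label
print(file_offset(513))                  # 2048: the first data set sector
```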
Several .VWA files may exist in a single MS-DOS directory. Indeed, even if the media has high capacity, such as a 650Mb CD-ROM, it is generally more convenient to organise it as half a dozen files of around 110Mb than as one single file (which can be difficult to move around). When you set up GO.EXE's Archiving Session, the location of the directory holding its archive files must be given in the Archive Path field of the Change Settings dialog.
If the Archive Path is the root directory of a device, be sure to add the \ when specifying the path: enter E:\ rather than E:, because the latter format is reserved to indicate a WORM drive requiring the original device driver.
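The rule above can be expressed as a simple test on the setting's text. The sketch is illustrative only: `archive_path_kind` is a hypothetical helper, not part of GO.EXE, and it assumes that a bare drive letter followed by a colon is the only form reserved for the WORM driver.

```python
import re

def archive_path_kind(path: str) -> str:
    """Classify an Archive Path entry.

    A bare drive letter such as "E:" is reserved for a WORM drive using the
    original device driver; anything else, including the root form "E:\\",
    names a directory holding .VWA files.
    """
    if re.fullmatch(r"[A-Za-z]:", path.strip()):
        return "worm-drive"
    return "vwa-directory"

print(archive_path_kind("E:"))     # worm-drive
print(archive_path_kind("E:\\"))   # vwa-directory
```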
A .VWA file may reside on any appropriate media, fixed or removable, and may be copied and shared over a network. To allow you to distribute your archive with total flexibility, when GO.EXE is started it looks for the text file VWAPATH.INI in its Current Directory. Make a list in this file of other paths (in search order with one path on each line) where the application is to look for .VWA files.
For example, it may be convenient to hold recent archive data on a fast hard disc, relegate older data to CD-ROM, and use removable Jaz media for the Backup Volume.
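A minimal sketch of the lookup, assuming the Archive Path is searched before the VWAPATH.INI entries (this section says only that the listed paths are additional to it) and that blank lines in the file are ignored. The helper names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

def search_paths(archive_path: str, ini_file: Path) -> List[Path]:
    """Archive Path first (assumed), then each non-blank line of VWAPATH.INI in order."""
    paths = [Path(archive_path)]
    if ini_file.is_file():
        for line in ini_file.read_text().splitlines():
            entry = line.strip()
            if entry:
                paths.append(Path(entry))
    return paths

def find_volume(filename: str, paths: List[Path]) -> Optional[Path]:
    """Return the first location, in search order, that holds the named .VWA file."""
    for directory in paths:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

With the hard disc directory as the Archive Path and the CD-ROM and Jaz directories listed in VWAPATH.INI, recent volumes are found first and older ones only when the earlier locations do not hold them.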
UINT(2) | Schema Number: VALUE 1 for this version. |
---|---|
UINT(2) | User Number. (Backup volumes have a User Number one greater than master volumes.) |
UINT(2) | This Volume Number (99V99). |
UINT(2) | Previous Volume Number (99V99). |
PIC X(4) | Time & Date labeled (in MS-DOS directory format) as read from PC clock. |
PIC X(20) | (empty) |
PIC X(64) | User Name & site Post Code (ASCIIZ). |
PIC X(1952) | (empty) |
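The label fields add up to exactly one 2,048-byte sector (8 + 4 + 20 + 64 + 1,952). The sketch below unpacks them with Python's `struct` module; it keeps the Time & Date field as raw bytes, leaves the 99V99 display form undecoded, and checks the Master/Backup pairing rule stated above. The function names are hypothetical.

```python
import struct

SECTOR_BYTES = 2048
# Little-endian label layout: four UINT(2), 4-byte time & date, 20 empty,
# 64-byte ASCIIZ user name & post code, 1,952 empty.
LABEL = struct.Struct("<HHHH4s20x64s1952x")

def parse_label(sector: bytes) -> dict:
    """Unpack a volume label sector into its named fields."""
    if len(sector) != SECTOR_BYTES:
        raise ValueError("a volume label occupies exactly one 2,048-byte sector")
    schema, user, volume, previous, stamp, name = LABEL.unpack(sector)
    return {
        "schema": schema,
        "user_number": user,
        "volume_number": volume,
        "previous_volume_number": previous,
        "labeled_raw": stamp,                       # MS-DOS time & date, not decoded here
        "user_name_postcode": name.split(b"\0", 1)[0].decode("ascii", "replace"),
    }

def is_backup_of(backup: dict, master: dict) -> bool:
    """A Backup Volume has the Master's Volume Number and a User Number one greater."""
    return (backup["volume_number"] == master["volume_number"]
            and backup["user_number"] == master["user_number"] + 1)
```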
UINT(1) | Length of this Header in bytes including this byte (i.e. 24 or 36 bytes). |
---|---|
UINT(1) | Schema Number: VALUE 1 for this version. |
PIC X(22) | Attribute, Time, Date, Size and Filename in MS-DOS directory format. |
UINT(4) | This Cluster Number. |
UINT(2) | Volume Number, previous cluster. |
UINT(4) | Sector Number, previous cluster. |
UINT(2) | Number of Sectors, previous cluster. |
A data set is valid only if it occupies sufficient sectors to contain a file of the size stated in the header.
The last four fields are present only when the data set contains a cluster (a file section). The first cluster of a file has Cluster Number 0.
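A sketch of reading a Data Set Header and applying the validity rule. It rests on several assumptions: the bytes passed in start immediately after sector 0's UINT(2) sequence number; the 22-byte MS-DOS field splits as attribute (1), time (2), date (2), size UINT(4) and a 13-byte ASCIIZ filename, as in the MS-DOS Find First data area; and Size is taken as the number of bytes carried by this data set. All names are hypothetical.

```python
import struct

PAYLOAD_BYTES = 2046   # 2,048-byte sector less its UINT(2) sequence number

# Assumed split of the 22-byte MS-DOS field: attribute, time, date, size, filename.
BASE = struct.Struct("<BBB2s2sI13s")     # 24 bytes: length, schema, MS-DOS field
CLUSTER = struct.Struct("<IHIH")         # 12 more bytes, present only for clusters

def parse_header(data: bytes) -> dict:
    """Unpack a 24- or 36-byte Data Set Header."""
    length, schema, attr, _time, _date, size, name = BASE.unpack_from(data, 0)
    if length not in (24, 36):
        raise ValueError(f"unexpected header length {length}")
    header = {"length": length, "schema": schema, "attribute": attr, "size": size,
              "filename": name.split(b"\0", 1)[0].decode("ascii", "replace")}
    if length == 36:                      # the last four fields exist only for a cluster
        cluster, prev_vol, prev_sector, prev_count = CLUSTER.unpack_from(data, BASE.size)
        header.update(cluster=cluster, prev_volume=prev_vol,
                      prev_sector=prev_sector, prev_sectors=prev_count)
    return header

def sectors_required(header: dict) -> int:
    """Sectors needed for the header plus the stated number of bytes (rounded up)."""
    return (header["length"] + header["size"] + PAYLOAD_BYTES - 1) // PAYLOAD_BYTES

def is_valid(header: dict, sectors_present: int) -> bool:
    """A data set is valid only if it occupies enough sectors for its stated size."""
    return sectors_present >= sectors_required(header)
```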
Each BATCHnx data set contains a file of variable length records (see layout following) holding the document tags and print images from logical printer x (range A:O) on main computer n (0=Normal, 1=Backup). The data set header records the time of receipt at the PC (according to its clock) of the last data frame from the printer. Each document comprises one Tag record followed by any number of Print Image records, each containing one print line. A form feed must separate each page of multi-page documents.
UINT(1) | Length i of this record in bytes including this byte; VALUE 0 if end of data (always present). |
---|---|
UINT(1) | VALUE 128+n if Document Tag (Schema n) follows; Line Feed count if Line Feed with Print Image follows; VALUE 0 if Form Feed with Print Image follows; VALUE 128 if Vertical Tab with Print Image follows. |
PIC X(i-2) | Document Tag or Print Image (ASCII with VALUE 128+j representing j consecutive spaces). |
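The record format can be walked with a short loop. This sketch assumes that Document Tag schema numbers start at 1 (so the control value 128 always means Vertical Tab), that the zero-length terminator is present, and that blank compression is expanded only in print lines, with tags returned as raw bytes. The function names are hypothetical.

```python
from typing import Iterator, Tuple, Union

def expand_spaces(raw: bytes) -> str:
    """Expand compressed blanks: a byte of value 128+j stands for j consecutive spaces."""
    return "".join(" " * (b - 128) if b >= 128 else chr(b) for b in raw)

def parse_batch(data: bytes) -> Iterator[Tuple[str, int, Union[bytes, str]]]:
    """Yield (kind, value, content) for each record up to the zero-length terminator."""
    pos = 0
    while True:
        length = data[pos]                        # UINT(1) record length, including itself
        if length == 0:                           # end of data
            return
        if length < 2 or pos + length > len(data):
            raise ValueError(f"malformed record at offset {pos}")
        control = data[pos + 1]
        body = data[pos + 2:pos + length]         # the PIC X(i-2) field
        if control > 128:
            yield ("tag", control - 128, body)    # Document Tag: schema number, raw bytes
        elif control == 128:
            yield ("vt", 0, expand_spaces(body))          # Vertical Tab, then print line
        elif control == 0:
            yield ("ff", 0, expand_spaces(body))          # Form Feed, then print line
        else:
            yield ("lf", control, expand_spaces(body))    # Line Feed count, then print line
        pos += length
```

A document then starts at each "tag" record and runs until the next one; the "ff" records mark its page breaks.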
Each INDEX0 data set is a cluster of up to 1,024 records of fixed length 32 bytes (see layout following). Each record holds the tag and the location of one archived document. The location points to the start of the Document Tag record for the document within the relevant BATCHnx data set. The records are sorted within the data set into Document Type within Date of Issue, and the data set header records the cut-off date for the Date of Issue.
UINT(2) | Master Volume Number of document's location. |
---|---|
UINT(4) | Byte Offset in volume of document's location. |
UINT(3) | VALUE 0 (not used). |
PIC X | Serial Number Group |
UINT(4) | Serial Number. |
PIC X(2) | Date of Issue (in MS-DOS directory format). |
PIC X | Document Type (user defined). |
PIC X | Schema Number for this Document Type (user defined). |
PIC X | Flag 1 (user defined). |
PIC X | Flag 2 (user defined). |
PIC X(12) | Reference (user defined). |
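The eleven fields total exactly 32 bytes, so a cluster can be read with `struct.iter_unpack`. In the sketch below, the sort key places Date of Issue first and Document Type second ("Document Type within Date of Issue"); it assumes the date is the usual little-endian MS-DOS directory date (bits 15-9 year since 1980, bits 8-5 month, bits 4-0 day), so the raw 16-bit value already sorts chronologically. The function names are hypothetical.

```python
import struct
from datetime import date
from typing import List

RECORD = struct.Struct("<HI3scI2scccc12s")   # 32 bytes, little-endian, no padding

def dos_date(raw: bytes) -> date:
    """Decode an MS-DOS directory date: bits 15-9 year-1980, 8-5 month, 4-0 day."""
    v = int.from_bytes(raw, "little")
    return date(1980 + (v >> 9), (v >> 5) & 0x0F, v & 0x1F)

def parse_index(cluster: bytes) -> List[dict]:
    """Unpack every 32-byte record in an INDEX0 cluster (or the INDEX file)."""
    records = []
    for fields in RECORD.iter_unpack(cluster):
        volume, offset, _unused, group, serial, issued, doc_type, schema, f1, f2, ref = fields
        records.append({
            "volume": volume, "offset": offset, "serial_group": group,
            "serial": serial, "issued_raw": int.from_bytes(issued, "little"),
            "doc_type": doc_type, "schema": schema, "flags": (f1, f2),
            "reference": ref,
        })
    return records

def sort_key(record: dict) -> tuple:
    """Document Type within Date of Issue: date is the major key, type the minor key."""
    return (record["issued_raw"], record["doc_type"])
```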
Each SYNC data set is a copy of the SYNC file taken from the PC's current directory at the time (according to the PC's clock) recorded in the data set header.
Main menu option A (Archive and Index new documents) adds a SYNC data set to a volume upon successful completion of its schedule for that volume.
The acronym WORM expands to Write Once Read Many. A WORM drive incorporates a low-power infrared laser to burn patterns into a sensitive layer on the media. This pattern cannot subsequently be changed, but may be read back as often as desired. Data seek and transfer rates are comparable to floppy diskette but the capacity is much greater. The sensitive layer on the cartridges is methine dye spin-coated in two spiral tracks (A and B), one on each side of the disc. There is only one head in the drive unit, so the cartridge has to be turned over by hand to access the opposite side; the application therefore classifies each side as an independent volume. Each track is pre-formatted into 191,952 sectors, each of which may be written once with 2,048 bytes of data. Thus one cartridge can hold 750 Mb (approaching one million typical document image pages).
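The 750 Mb figure can be reproduced from the track geometry, taking a megabyte as 1,048,576 bytes. The same arithmetic also underlies the flaw management figure quoted later, where 300 full volumes (single sides) come to roughly 118 decimal gigabytes, rounded there to 120.

```python
SECTORS_PER_TRACK = 191_952
SECTOR_BYTES = 2_048
SIDES = 2                                        # tracks A and B, one on each side

side_bytes = SECTORS_PER_TRACK * SECTOR_BYTES    # one side is one Archive Volume
cartridge_bytes = SIDES * side_bytes

print(side_bytes)                  # 393117696
print(cartridge_bytes / 2**20)     # 749.8125 (megabytes of 1,048,576 bytes)
print(300 * side_bytes / 10**9)    # about 117.9 (decimal gigabytes)
```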
In 1989 the typical PC hard disc could hold 20Mb, so WORM technology was well suited to archiving, the removable cartridges being unmodifiable and as convenient as floppy discs. Nowadays, hard discs provide much faster on-line access together with massive capacity whilst CD-R media provide cheap, reliable and convenient long-term off-line backup.
A WORM PC requires the extra equipment listed following.
The initial requirement is for two cartridges comprising one master/backup pair.
Each of the two sides of a cartridge has its own small green slider to enable or disable write protection. Please ensure that both the green sliders are in the W/R position.
This WORM drive unit is marketed under the Storage Dimensions or Ricoh badge. It may be ordered either for internal installation into your PC's empty half-height 5¼" drive bay or as an external unit with its own power supply. This guide assumes you will be using the external version. This has a power switch and fuse socket at the rear, near the power inlet.
WORM drives are relatively expensive and should be handled firmly but never roughly. Ideally, set up the entire station in a cul-de-sac away from dusty areas.
The drive unit features the industry standard Small Computer System Interface (SCSI) for connection to the host system (in this case the PC) through either of the pair of Centronics-type sockets at the rear.
Please ensure that the drive's SCSI ID number is set to 0 by moving the small red wheel at the rear of the unit so that it points to 0.
This 36-inch cable connects the RXT-800HS drive to the TMC-850M controller.
This is a cling-wrapped kit comprising:
Corel's earlier OptiStar driver has the advantage over CorelSCSI in that the tedious SCSI bus scan (performed by the SCSI ROM BIOS at every PC boot) can be skipped by simply pulling the ROM chip out of the TMC-850M controller:
To effect any optical data transfer, the program requests the OptiStar WORM driver to execute the appropriate SCSI dialog with the WORM drive. During this dialog, a busy message is shown at the bottom of the archiving session's screen:
Busy reading|writing n sector(s) @ i in yy.nn
where i is the base Sector Number and yy.nn is the Volume Number. In due course the message changes to done or fail, according to the outcome reported by the driver.
As the WORM driver complies with MS-DOS's single-tasking restrictions, the program cannot respond to any keystrokes (including hotkeys) whilst the driver is busy. This is particularly evident when the drive is not responding (e.g. powered off), as the driver allows up to 11 seconds for a response.
If the transfer failed one of the following will be displayed:
After any failure (other than reading a single blank sector), the program tries to read the volume label and, if successful, then retries the transfer. This process continues until either the transfer succeeds or you press Esc. In the latter case the program pauses (allow up to 11 seconds for it to respond) with the cursor at the prompt:
Retry(any key) or Esc
This enables you to note the details of the failure. To abandon the current process, press Esc again.
If any volume label read reveals that the wrong volume is in the drive, the program will issue the prompt:
Please load Optical Volume v & press a key (or Esc)
where v is the number of the required volume. To abandon the current process, press Esc.
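The recovery behaviour described above can be summarised as a loop. This is a sketch of the described behaviour only, not the program's actual code: `transfer`, `read_label`, `esc_pressed` and `prompt` are hypothetical callables standing in for the OptiStar driver requests and the keyboard handling, and the volume number is shown as a plain integer rather than in yy.nn form.

```python
from typing import Callable, Optional

def transfer_with_retry(
    transfer: Callable[[], str],               # hypothetical: "done", "blank" or "fail"
    read_label: Callable[[], Optional[int]],   # hypothetical: volume number, None if unreadable
    wanted_volume: int,
    esc_pressed: Callable[[], bool],           # hypothetical: True if Esc was pressed meanwhile
    prompt: Callable[[str], bool],             # hypothetical: show prompt, True if reply was Esc
) -> str:
    """Return "done", "blank" (a single blank sector, not retried) or "abandoned"."""
    while True:
        outcome = transfer()
        if outcome in ("done", "blank"):
            return outcome
        # Any other failure: read the volume label until it succeeds, then retry.
        while True:
            if esc_pressed() and prompt("Retry(any key) or Esc"):
                return "abandoned"             # Esc pressed a second time
            volume = read_label()
            if volume is None:
                continue                       # label unreadable: keep trying
            if volume != wanted_volume:
                message = f"Please load Optical Volume {wanted_volume} & press a key (or Esc)"
                if prompt(message):
                    return "abandoned"
                continue                       # re-read the label after the swap
            break                              # right volume loaded: retry the transfer
```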
To dramatise the irreversible effect of writing to WORM media, the process is often called burning. The program burns each sector in turn along the track, leaving no gaps within the current partition. Whenever you label (or reactivate) a Master Volume, the current partition is reset to extend from the volume label (sector 512) to the end of the track (sector 191,951).
If there has been a volume label read since the last burn, the program must search the track to find the lowest blank sector within the partition. This is done by a binary chop iteration (if the central sector is blank, discard the part of the track above it; otherwise discard the part below it; repeat until the boundary is found).
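The binary chop works because sectors are burnt contiguously from the partition base, so every written sector lies below every blank one. A minimal sketch, assuming a hypothetical `is_blank(sector)` probe that stands in for a single-sector read reporting a blank sector; a full partition from sector 512 to 191,951 needs at most 18 probes.

```python
from typing import Callable, Optional

def first_blank_sector(base: int, limit: int,
                       is_blank: Callable[[int], bool]) -> Optional[int]:
    """Lowest blank sector in base+1..limit, or None if the partition is full."""
    lo, hi = base + 1, limit + 1       # answer lies in [lo, hi]; hi == limit+1 means full
    while lo < hi:
        mid = (lo + hi) // 2
        if is_blank(mid):
            hi = mid                   # boundary is at or below mid: discard the part above
        else:
            lo = mid + 1               # mid is written: discard it and the part below
    return lo if lo <= limit else None
```

For a freshly labelled volume (base 512) holding 1,000 burnt sectors, the search returns 1513, the sector where the next data set will be written.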
If the program attempts to write to a section of the track so defective as to defeat the flaw manager (described following), your intervention will be required as follows:
The Archive option under the archiving session menu shows and lets you change the partition base sector. The program never writes outside the partition, so changing the base sector allows a defective section of the track to be skipped. The partition limits apply to any burn to any volume (Master or Backup).
The WORM drive's firmware supports a flaw management technique designed to yield less than one bit error in every 120 gigabytes (300 full volumes) retrieved ten years after recording. The technique overcomes media imperfections by:
Use of the alternate sectors requires the cooperation of the WORM driver. The available documentation for OptiStar does not suggest that this driver provides such support.