The SDIF Frame Directory

Diemo Schwarz

06.01.2000






From several sides, the need has emerged for rapid random access to data at a given time in an SDIF file, without loading the whole file into memory. Using the basic SDIF framework, a new "internal" SDIF type (as opposed to a new "external" sound description) can be defined. It records the file position of, and information about, all frames occurring in an SDIF file. The definition allows data to be added later to an existing SDIF file without rewriting its frame directory. As usual, programs unaware of this extension can ignore it completely. Conversely, any program that uses a suitable SDIF library will have the frame directory generated and used automatically.

Here is a proposal for representing this as a normal SDIF description type:

An SDIF file with one directory looks like this:

Header -- File header, NVTs, type definitions, etc.
Directory Pointer -- a frame containing the file position of the Directory
Data -- usual SDIF data frames
Directory -- the SDIF directory itself, containing the streams, signatures, times, and file positions of all SDIF data frames (do we also need to record the number of matrices and the matrix types?)

The SDIF directory itself begins with a Directory Pointer containing the file position of the next directory, which covers data appended later, so that the SDIF directories form a linked list. The entries in the directories point to the data frames as shown in figure 1.



Figure 1: Schema of an SDIF file with n appended data blocks with their respective directories.
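To make the linked-list structure concrete, here is a minimal Python sketch of how a reader might follow the chain of directory pointers, starting from the position stored in the Directory Pointer after the header. The `Directory` class, `walk_directories()`, and the dictionary standing in for the file are hypothetical illustrations, not part of any SDIF library; `None` plays the role of the null signature.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Tuple

# (time, frame_type, stream_id, filepos) of one data frame
Entry = Tuple[float, str, int, int]


@dataclass
class Directory:
    """Hypothetical in-memory view of one SDIF directory."""
    next_pos: Optional[int]  # file position of the next directory; None = null pointer
    entries: List[Entry] = field(default_factory=list)


def walk_directories(first_pos: Optional[int],
                     read_directory: Callable[[int], Directory]) -> Iterator[Directory]:
    """Follow the linked list of directories, yielding each one in file order."""
    seen = set()
    pos = first_pos
    while pos is not None:
        if pos in seen:  # a corrupt file could otherwise make the chain loop forever
            raise ValueError(f"directory chain loops back to position {pos}")
        seen.add(pos)
        directory = read_directory(pos)
        yield directory
        pos = directory.next_pos


# Two blocks: the original data and one appended block.
fake_file = {
    100: Directory(next_pos=5000, entries=[(0.0, "1ABC", 1, 40)]),
    5000: Directory(next_pos=None, entries=[(2.5, "1ABC", 1, 4200)]),
}
all_entries = [e for d in walk_directories(100, fake_file.__getitem__)
               for e in d.entries]
```

Collecting the entries of every directory in the chain yields the complete frame index, including the appended data.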

How to represent an SDIF directory?

The SDIF directory consists of directory entries. The directory pointer is itself an SDIF directory entry, recording the file position of the corresponding SDIF directory frame(s).

An SDIF directory is represented as one or more (see below) 1DIR (or similar) SDIF frames. Its time is the time of the last data frame in the block.

How to represent directory entries?

To optimize the representation, we would have to know the most common type of query to the directory. It will probably be something like: where is the frame of stream n with type 1ABC at time xyz? If we make the directory fully hierarchical, this sort of query will be very fast, but getting a list of all times at which frames are present will be slow. The reverse holds for a flat directory, which additionally suffers from high redundancy.
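The trade-off can be illustrated with a small Python sketch contrasting the two query types on a flat list of entries and on a directory grouped by stream and frame type. All names are hypothetical, and exact time matching is a simplification; a real query would more likely ask for the nearest frame at or before a given time.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Entry = Tuple[float, str, int, int]  # (time, frame_type, stream_id, filepos)
Groups = Dict[Tuple[int, str], Tuple[List[float], List[int]]]


# Flat directory: one list of entries.
def flat_lookup(entries: List[Entry], stream: int, ftype: str, t: float) -> Optional[int]:
    """Exact-match query: must scan the entries of every stream."""
    for time, frame_type, stream_id, pos in entries:
        if (stream_id, frame_type, time) == (stream, ftype, t):
            return pos
    return None


def flat_times(entries: List[Entry]) -> List[float]:
    """All times at which a frame is present: a single pass over the list."""
    return sorted({time for time, _, _, _ in entries})


# Hierarchical directory: (stream, frame_type) -> sorted times and positions.
def build_hierarchy(entries: List[Entry]) -> Groups:
    groups = defaultdict(lambda: ([], []))
    for time, frame_type, stream_id, pos in sorted(entries):  # sorted by time first
        times, positions = groups[(stream_id, frame_type)]
        times.append(time)
        positions.append(pos)
    return dict(groups)


def hier_lookup(groups: Groups, stream: int, ftype: str, t: float) -> Optional[int]:
    """Exact-match query: one dictionary lookup plus a binary search."""
    times, positions = groups.get((stream, ftype), ([], []))
    i = bisect.bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return positions[i]
    return None


def hier_times(groups: Groups) -> List[float]:
    """All times at which a frame is present: every group must be visited and merged."""
    return sorted({t for times, _ in groups.values() for t in times})


entries = [(0.0, "1ABC", 1, 40), (0.0, "1XYZ", 2, 96), (0.5, "1ABC", 1, 160)]
groups = build_hierarchy(entries)
```

Both layouts answer both queries; they differ in which query is cheap.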

We will have to think about this some more. Here are two proposals for the representation:

Flat Representation

A directory entry is made up of corresponding rows in several matrices of the form:

1DIT double {time}
1DIF int32  {frame-type, stream-id}
1DIP intXX  {filepos}
If it weren't for the awkward restriction of one frame type per stream, we could make one 1DIR frame per stream, recording only the frames of that stream.
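As a sketch of the flat layout, the following Python code splits a list of entries into the three parallel matrices, so that row i of each matrix describes the same data frame. Storing the frame type in an int32 column by reinterpreting its 4-character signature as a big-endian integer is an assumption, as are the function names.

```python
from typing import Dict, List, Tuple

Entry = Tuple[float, str, int, int]  # (time, frame_type, stream_id, filepos)


def signature_to_int32(sig: str) -> int:
    """Reinterpret a 4-character signature such as '1ABC' as a big-endian int32."""
    raw = sig.encode("ascii")
    if len(raw) != 4:
        raise ValueError(f"signature must be 4 ASCII characters: {sig!r}")
    return int.from_bytes(raw, "big", signed=True)


def flat_matrices(entries: List[Entry]) -> Dict[str, List[list]]:
    """Split entries into the parallel 1DIT/1DIF/1DIP matrices (row i = frame i)."""
    ordered = sorted(entries, key=lambda e: e[3])  # keep file order
    return {
        "1DIT": [[time] for time, _, _, _ in ordered],
        "1DIF": [[signature_to_int32(ftype), sid] for _, ftype, sid, _ in ordered],
        "1DIP": [[pos] for _, _, _, pos in ordered],
    }


matrices = flat_matrices([(0.5, "1ABC", 1, 160), (0.0, "1ABC", 1, 40)])
```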

If 32-bit file positions are enough, we could compact this to:

IDIR double {time}
1DIR int32  {frame-type, [stream-id,] filepos}
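A minimal sketch of packing the compact 1DIR rows (here without the optional stream-id column) into big-endian bytes, rejecting positions that do not fit. Whether the file position is read as signed or unsigned is an open point; this hypothetical `pack_compact_1dir()` assumes unsigned, since a signed int32 would stop at 2 GiB rather than 4 GiB.

```python
import struct
from typing import List, Tuple

UINT32_MAX = 2**32 - 1


def pack_compact_1dir(rows: List[Tuple[str, int]]) -> bytes:
    """Pack (frame_type, filepos) rows of a compact 1DIR matrix, big-endian.

    Assumption: filepos is stored unsigned; a signed int32 column would
    only reach 2**31 - 1 bytes (2 GiB).
    """
    out = bytearray()
    for frame_type, filepos in rows:
        sig = frame_type.encode("ascii")
        if len(sig) != 4:
            raise ValueError(f"signature must be 4 ASCII characters: {frame_type!r}")
        if not 0 <= filepos <= UINT32_MAX:
            raise OverflowError(f"file position {filepos} does not fit in 32 bits")
        out += sig + struct.pack(">I", filepos)
    return bytes(out)


packed = pack_compact_1dir([("1ABC", 40), ("1XYZ", 96)])
```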

Hierarchical Representation

There is one 1DIR directory frame per group of data frames with the same frame type (and stream), containing matrices of the form:


IDIR int32 {frame-type, [stream-id]} -- one row only
1DIT double {time} -- one row per data frame
1DIR intXX {filepos} -- corresponding rows
Again, the stream-id info could be taken from the stream of the directory frame.
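The grouping step could look like this in Python: entries are grouped by frame type and stream, and each group becomes one directory frame with a single IDIR row and corresponding 1DIT and 1DIR rows. The dictionary layout and the function name are hypothetical; the frame's time follows the rule that a directory carries the time of its last data frame.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Entry = Tuple[float, str, int, int]  # (time, frame_type, stream_id, filepos)


def hierarchical_frames(entries: List[Entry]) -> List[dict]:
    """Build one directory frame per (frame type, stream) group of data frames."""
    groups: Dict[Tuple[str, int], List[Tuple[float, int]]] = defaultdict(list)
    for time, frame_type, stream_id, pos in entries:
        groups[(frame_type, stream_id)].append((time, pos))

    frames = []
    for (frame_type, stream_id), rows in sorted(groups.items()):
        rows.sort()  # chronological order within the group
        frames.append({
            "signature": "1DIR",
            "stream_id": stream_id,          # so the IDIR row may omit the stream-id
            "time": rows[-1][0],             # time of the last data frame in the group
            "IDIR": [[frame_type]],          # one row only
            "1DIT": [[t] for t, _ in rows],  # one row per data frame
            "1DIR": [[p] for _, p in rows],  # corresponding rows
        })
    return frames


frames = hierarchical_frames([(0.5, "1ABC", 1, 160), (0.0, "1ABC", 1, 40),
                              (0.0, "1XYZ", 2, 96)])
```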

More Details of the Representation

The directory pointer:
It has the form of an SDIF directory whose only entry holds the file position and time of the corresponding SDIF directory. Thus, the frame type in the entry is 1DIR, or the null signature if no SDIF directory is pointed to. The file position in a null directory pointer is one past the last byte of the file (that is, the file size in bytes), or 0 if unknown. The time recorded in its entry is the time of the last frame, or NaN if unknown. This way, we have a standard place to store the length and duration of a file, even if we don't write the SDIF directory.
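A sketch of the directory pointer rules in Python, using a hypothetical `DirectoryPointer` tuple and `make_pointer()` helper. Representing the null signature as four NUL characters is an assumption.

```python
import math
from typing import NamedTuple, Optional

NULL_SIGNATURE = "\0\0\0\0"  # assumption: the null signature is four NUL characters


class DirectoryPointer(NamedTuple):
    frame_type: str  # "1DIR", or the null signature
    filepos: int
    time: float


def make_pointer(dir_pos: Optional[int], dir_time: Optional[float] = None,
                 file_size: Optional[int] = None,
                 last_time: Optional[float] = None) -> DirectoryPointer:
    """Point at an existing directory, or build a null pointer that still
    records the file length and duration when they are known."""
    if dir_pos is not None:
        return DirectoryPointer("1DIR", dir_pos,
                                dir_time if dir_time is not None else math.nan)
    return DirectoryPointer(NULL_SIGNATURE,
                            file_size if file_size is not None else 0,
                            last_time if last_time is not None else math.nan)


def is_null(p: DirectoryPointer) -> bool:
    return p.frame_type == NULL_SIGNATURE
```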

Extensibility:
A program can, of course, add more information about the frames as additional matrices or columns, which other programs will ignore.

Representation of file positions:
These would be byte offsets counted from the start of the file, usable with fseek(). With 32 bits, we could then have files up to 4 GB in size. Is this enough? Using the reasonable but error-prone technique of defining a file position as the number of the (4-byte) longword, we could address files up to 16 GB. Is this enough?
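For reference, a 32-bit position counted in units of u bytes addresses at most 2^32 · u bytes, which gives the following sizes for the unit choices under discussion:

```latex
\text{max. file size} = 2^{32} \cdot u \ \text{bytes}, \qquad
\begin{array}{ll}
u = 1 \ \text{(bytes)}: & 2^{32} \ \text{B} = 4 \ \text{GiB} \\
u = 4 \ \text{(32-bit longwords)}: & 2^{34} \ \text{B} = 16 \ \text{GiB} \\
u = 8 \ \text{(64-bit words)}: & 2^{35} \ \text{B} = 32 \ \text{GiB} \\
u = 16: & 2^{36} \ \text{B} = 64 \ \text{GiB}
\end{array}
```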

I don't think this question is silly. Who, back in the days of 40 MB hard disks for PCs, would have thought that we would be approaching the 100 GB mark? Given a lifetime for the SDIF standard of perhaps 20 years or more (file-format historians will certainly give a better estimate), we should never underestimate requirements.

Let's estimate the absolute maximum we could put into an SDIF file: say, 192 kHz, 32-bit sound data, plus all conceivable analyses (how many?) in very small frames. How many stream-hours do we get from 4 GB? Then extrapolate to at least 10 years from now.
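As a first data point for this estimate, here is a small Python calculation of how much raw sound data alone fits, assuming one mono stream of 192 kHz, 32-bit samples and ignoring frame and matrix headers as well as any analysis data. The function name is illustrative only.

```python
def stream_hours(capacity_bytes: int, sample_rate_hz: int,
                 bytes_per_sample: int, channels: int = 1) -> float:
    """Hours of raw sample data that fit into capacity_bytes (headers ignored)."""
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return capacity_bytes / bytes_per_second / 3600


hours_4gib = stream_hours(2**32, 192_000, 4)  # about 1.55 hours
```

So 4 GiB holds only about one and a half hours of a single such stream, before any analyses are added.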

One solution that would let us stay with convenient 32-bit positions is to always define file positions relative to the start of the block. The overhead doesn't seem worth it, though.
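A sketch of what block-relative positions would involve, in Python with hypothetical function names: the writer stores the offset from the block start, and every reader has to add the block start back, so it must know it. How the block start would be recorded is not specified here.

```python
UINT32_MAX = 2**32 - 1


def to_block_relative(abs_pos: int, block_start: int) -> int:
    """Writer side: store a position as an offset from the start of its data block."""
    rel = abs_pos - block_start
    if not 0 <= rel <= UINT32_MAX:
        raise OverflowError(f"offset {rel} does not fit in 32 bits")
    return rel


def to_absolute(rel_pos: int, block_start: int) -> int:
    """Reader side: only possible if block_start is known for every entry."""
    return block_start + rel_pos


block_start = 5 * 2**32  # a block starting at 20 GiB
rel = to_block_relative(block_start + 4096, block_start)
```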

Another solution is to opt for 32-bit positions for now and reserve 2DIR frames for the huge files of the future.
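A sketch of that policy in Python: use 32-bit positions while they suffice and switch to 2DIR only when some position exceeds 32 bits. The function name and the assumption that 2DIR would carry unsigned 64-bit big-endian positions are hypothetical.

```python
import struct
from typing import List, Tuple

UINT32_MAX = 2**32 - 1


def pack_positions(positions: List[int]) -> Tuple[str, bytes]:
    """Pick 1DIR (32-bit) or 2DIR (64-bit) positions, whichever is needed.

    Assumption: 2DIR would store unsigned 64-bit big-endian positions.
    """
    if max(positions, default=0) <= UINT32_MAX:
        signature, fmt = "1DIR", ">I"
    else:
        signature, fmt = "2DIR", ">Q"
    return signature, b"".join(struct.pack(fmt, p) for p in positions)


small = pack_positions([40, 4096])
huge = pack_positions([40, 5 * 2**32])
```

Files that stay below 4 GiB never pay for the wider format.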

Updating:
What can you sensibly modify in an SDIF file without rewriting it? Not much. If you overwrite data within a frame, the directory stays valid. If you overwrite the time, frame type, or stream ID of a frame, you do have to update or invalidate the directory. The latter is easily done by setting the Directory Pointer to null.
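A minimal sketch of such an in-place invalidation in Python, which overwrites only the frame-type field of the directory pointer entry so that nothing else in the file moves. The `signature_offset` parameter is hypothetical (a real reader would derive it from the file structure), and the null signature is assumed to be four zero bytes.

```python
import io
from typing import BinaryIO

NULL_SIGNATURE = b"\0\0\0\0"  # assumption: the null signature is four zero bytes


def invalidate_directory(f: BinaryIO, signature_offset: int) -> None:
    """Null the frame-type field of the directory pointer entry, in place.

    signature_offset is the (hypothetical, already known) byte position of
    that 4-byte field; no other byte of the file is changed.
    """
    f.seek(signature_offset)
    if f.read(4) != b"1DIR":
        raise ValueError("no 1DIR entry at the given offset")
    f.seek(signature_offset)
    f.write(NULL_SIGNATURE)


buf = io.BytesIO(b"HEAD" + b"1DIR" + b"\x00\x00\x10\x00")
invalidate_directory(buf, 4)
```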

When a file is rewritten, the library would regenerate the directory. For the bad case of programs that add some data and pass the rest through without being aware of the special meaning of the directory, we could add some sanity checks: the directory could include its own file position and a checksum of the file. Maybe this is not necessary, since every access through the directory can be validated by cross-checking that the frame found at a given position really has the recorded type, time, and stream.
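Such a cross-check could look like the following Python sketch, which reads the frame header at the recorded position and compares signature, time, and stream ID. It assumes the usual SDIF frame header layout (signature, frame size, time, stream ID, matrix count) stored big-endian; the function name is hypothetical.

```python
import io
import struct
from typing import BinaryIO

# Assumed SDIF frame header, big-endian: signature (4 bytes), frame size (int32),
# time (float64), stream ID (int32), matrix count (int32).
FRAME_HEADER = struct.Struct(">4sidii")


def entry_is_valid(f: BinaryIO, filepos: int, frame_type: str,
                   time: float, stream_id: int) -> bool:
    """Cross-check a directory entry against the frame header it points to."""
    f.seek(filepos)
    raw = f.read(FRAME_HEADER.size)
    if len(raw) < FRAME_HEADER.size:
        return False  # position points past the end of the file
    sig, _size, t, sid, _nmatrices = FRAME_HEADER.unpack(raw)
    return sig == frame_type.encode("ascii") and t == time and sid == stream_id


header = FRAME_HEADER.pack(b"1ABC", 16, 0.5, 1, 0)
buf = io.BytesIO(b"\0" * 8 + header)
```

A failed check tells the library that the directory is stale and should be ignored or regenerated.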

