Several tagged samples will be bundled into one "physics" stream: the monitor, high-multiplicity, upgoing-track and cascade/tau data. The samples can be distinguished from one another by, for example, a set of tag bits in each data record's header. A single data event may then have both the high-multiplicity and the cascade/tau bits set, while monitor records have the monitor bit set, and so on. Data from any one of these samples must be time-correlatable with data from any other. One way to meet this requirement is to keep all the data time-ordered in a single file, but other implementations are certainly possible. Time slicing may also prove valuable. In this scenario, the data are cut into chunks that each cover a short interval (for example 10 s or 10 min), and a database catalogs the chunks, speeding up random access to the events.
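A minimal Python sketch of the bundled-stream idea, assuming each record carries an event time and a bit field of tags. The `Tag` bit assignments, the `Record` layout, the 10-minute slice length and the in-memory dictionary standing in for the chunk database are all hypothetical illustrations, not the actual data format. It shows tag selection with bitwise tests, a time-ordered merge of the samples, and a slice catalog used for random access by time.

```python
import heapq
from dataclasses import dataclass, field
from enum import IntFlag


class Tag(IntFlag):
    # Hypothetical header bit assignments; the real layout is not specified.
    MONITOR = 1 << 0
    HIGH_MULTIPLICITY = 1 << 1
    UPGOING_TRACK = 1 << 2
    CASCADE_TAU = 1 << 3


@dataclass(order=True)
class Record:
    time: float                           # event time in seconds (assumed unit)
    tags: Tag = field(compare=False)      # ordering uses time only
    payload: bytes = field(default=b"", compare=False)


def bundle_samples(*samples):
    """Merge per-sample lists (each already time-ordered) into one time-ordered stream."""
    return list(heapq.merge(*samples))


SLICE_SECONDS = 600.0  # hypothetical 10-minute slices


def build_catalog(records, slice_seconds=SLICE_SECONDS):
    """Map slice index -> (start, end) offsets; slices are contiguous in a time-ordered list."""
    catalog = {}
    for i, r in enumerate(records):
        k = int(r.time // slice_seconds)
        start, _ = catalog.get(k, (i, i))
        catalog[k] = (start, i + 1)
    return catalog


def records_between(records, catalog, t0, t1, slice_seconds=SLICE_SECONDS):
    """Return records with t0 <= time < t1, scanning only the slices that overlap the window."""
    out = []
    for k in range(int(t0 // slice_seconds), int(t1 // slice_seconds) + 1):
        if k in catalog:
            start, end = catalog[k]
            out.extend(r for r in records[start:end] if t0 <= r.time < t1)
    return out


# Usage: two samples bundled into one time-ordered stream.
monitor = [Record(0.0, Tag.MONITOR), Record(600.0, Tag.MONITOR)]
physics = [Record(12.5, Tag.HIGH_MULTIPLICITY | Tag.CASCADE_TAU),
           Record(615.0, Tag.UPGOING_TRACK)]
bundle = bundle_samples(monitor, physics)
catalog = build_catalog(bundle)
window = records_between(bundle, catalog, 10.0, 620.0)
cascade = [r for r in bundle if r.tags & Tag.CASCADE_TAU]  # select one tagged sample
```

Keeping everything in one time-ordered list makes time correlation across samples a simple range query; the catalog only narrows which chunks are scanned.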
Several obvious advantages accrue from this "bundled data" scenario:
File size is dominated by tagged physics events, so adding the monitor data, prescaled raw data, etc. does not substantially increase it.
An important point here is that we are saving the full raw data stream. In the first year, with six strings deployed, the raw data are transferred by satellite for analysis and filter checkout. After that, the raw data volume grows too large for the satellite bandwidth, and we plan to write it to tape at the Pole. The estimated size of this sample is 50 TB/yr. The tapes are envisioned mainly as an insurance policy: we are allocating resources only for archiving them, not for using them. If a collaborator comes up with a compelling new analysis idea that requires data not present in the various filtered data sets, the first step will be to develop a new filter and install it at the Pole. In this way, current data are used to validate the new idea. Once the idea passes this test, the collaboration may decide to obtain the significant additional resources needed to refilter all the raw tapes.
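For scale, spreading the estimated 50 TB/yr evenly over a year (assuming 1 TB = 10^12 bytes and 1 yr ≈ 3.156 × 10^7 s) gives the average sustained rate the tape system at the Pole must absorb:

```latex
\frac{50\times10^{12}\ \mathrm{B/yr}}{3.156\times10^{7}\ \mathrm{s/yr}}
  \approx 1.6\times10^{6}\ \mathrm{B/s}
  \approx 13\ \mathrm{Mbit/s}
```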