Compression and Progressive Transmission of Digital Images


Technical Statement of Work

Overview

Progressive transmission of images is a prerequisite for general public access to digital image archives. Such a transmission scheme must meet these requirements:

We have designed and built a prototype progressive image transmission system for images from the WIYN 3.5 meter Telescope that demonstrates these qualities. We have sent 16-bit digital images of size 800x800 over a network connection as slow as 2400 baud, and have shown that useful images can be obtained in fewer than 60 seconds. We feel that this approach is a useful technology for a publicly accessible digital library, as it can simultaneously meet the different imaging requirements of each user. Browsers can see many images in a short span of time without placing excessive loads on a server or sending an enormous volume of data over the Internet. The lossiness need not be determined in advance, so a browser can decide to focus on an interesting image, and ask for more data to improve it, without sacrificing the time already spent.

To illustrate the power of the technique and quantify the potential, we will imagine a ``worst case'' public access scenario: a third grader in a rural school, connected to the Internet using SLIP over a 2400 baud dialup, with no significant local image storage capability, wanting to browse galaxy images from the Hubble Space Telescope. While we appreciate and support the goal of putting every classroom in America on the Information Superhighway, we also understand that this goal will not be achieved immediately, or at the same time for all classrooms. We suggest that a worst-case scenario is the needle's eye through which any proposed compression scheme must pass if public access is to be truly democratic in the near term.

Suppose each image that our third grader wants to see has 800 rows and columns, and is digitized to 16 bits. A 256x256 pixel subset of our test image is shown in Figure 1. This is a digitized photograph of a cluster of galaxies, displayed as a photographic negative (the bright objects are dark). Uncompressed, the full image amounts to 640,000 pixels, or about 10 million bits. Over the dialup, this would take about an hour (71 minutes) to receive. If the image is sent sequentially, by rows, from top to bottom, then the full hour must pass before full spatial coverage is achieved. Even with compression, many images yield only a factor of 2 without loss of information, so the transmission time falls only to 30 minutes. Lossy compression can yield another factor of 10 for astronomical images, dropping the time to 3 minutes, but two things have happened: information has been discarded by the server and is no longer available to the student, and several minutes must still pass before our young browser has anything to look at.
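For reference, the hour-long estimate follows directly from the image size and the line rate, treating the 2400 baud dialup as 2400 bits per second and neglecting SLIP and serial framing overhead:
\begin{equation}
800 \times 800 \times 16 \;\mathrm{bits} \approx 1.02 \times 10^{7} \;\mathrm{bits},
\qquad
\frac{1.02 \times 10^{7}\;\mathrm{bits}}{2400\;\mathrm{bits/s}} \approx 4270\;\mathrm{s} \approx 71\;\mathrm{minutes}.
\end{equation}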

With our prototype system, we investigated two better methods of sending the data, comparing each of them to the worst-case 71-minute sequential transmission (which we will call raster mode) described above. The first method involves decomposing the raw data by bitplane, rather than by rows or columns, and compressing the bitplanes using quadtree coding (Huang and Bijaoui 1991) followed by a simple Huffman or run-length coding. That is, we extract and send the most significant bitplane first, then the next, and so on. This method meets some of the requirements we laid out for progressive transmission. First, the higher bitplanes are sparse, and they compress well, so that only a small amount of data must be sent. Second, full spatial coverage is provided by each bitplane. The image is sent in order of intensity, not in a spatial or geometrical order. The drawback is that the received image is badly quantized in intensity until many bitplanes have been received, and the transmission by intensity prevents the visualization of lower-brightness extended features (galaxies in our case) that may occupy the mid- to lower-level bitplanes. Figure 2 illustrates the performance of the raw bitplane transmission, each panel showing a 256x256 subset of our test image after the receipt of another image bitplane. This is a big improvement over the raster mode, but shows intensity quantization and an absence of faint, diffuse features in the early stages of the transmission. Our user's 71-minute wait would decrease to about 9 minutes for the image quality shown at the lower right.
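To make the decomposition concrete, the following is a minimal sketch, in the ANSI C style of the deliverables, of how a sender might peel off and transmit the bitplanes of a 16-bit image in order of significance. The function names and the callback are ours, purely for illustration; the quadtree and Huffman/run-length coding stages of the prototype are not reproduced here.

\begin{verbatim}
#include <stddef.h>

/*
 * Minimal sketch (ours, not the prototype's code) of bitplane
 * decomposition for a 16-bit image.  Plane 15, the most significant
 * bitplane, is extracted and handed off first, then plane 14, and so
 * on; each plane covers the full spatial extent of the image.
 */
static void extract_bitplane(const unsigned short *image, size_t npix,
                             int plane, unsigned char *bits)
{
    size_t i;
    for (i = 0; i < npix; i++)
        bits[i] = (unsigned char)((image[i] >> plane) & 1u);
}

void send_by_bitplane(const unsigned short *image, size_t npix,
                      unsigned char *workspace,
                      void (*encode_and_send)(const unsigned char *, size_t))
{
    int plane;
    for (plane = 15; plane >= 0; plane--) {
        extract_bitplane(image, npix, plane, workspace);
        /* Here the real system would quadtree-code the plane and then
         * apply Huffman or run-length coding before transmission.    */
        encode_and_send(workspace, npix);
    }
}
\end{verbatim}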

The second method involves performing a 2-D wavelet transformation (e.g. Numerical Recipes 2nd Edition, Chapter 13) called the H-transform (see White 1992 for a general discussion; also Fritze et al. 1977, Richter 1978, and Capaccioli et al. 1988) on the server end and decomposing that by bitplanes. The H-transform performs a multi-resolution analysis of an image, organizing the information such that the ``useful'' information can be sent first, while the noise, which makes up the bulk of the transmission, arrives last. The receiving program can invert a partial set of H-coefficients, creating an image for the user that improves with time as more information arrives. The highest bitplanes of the H-transform encode information at many spatial and intensity scales, so image-wide information with a non-trivial dynamic range is received in the very first bitplanes. Figure 3 shows the same 256x256 subset seen in Figure 2, but here each panel shows the image after the receipt of another H-transform bitplane. Note how much earlier in the transmission the user sees faint, diffuse details compared to Figure 2. Indeed, the image at the upper left may well be good enough for an archive browser, in which case the wait time drops to 43 seconds. With a 2x2 rebinning (see below), the wait time drops again to about 10 seconds.

The H-transform is particularly well-suited to this style of incremental reconstruction for several reasons. First, the transformation and its inverse can be performed in place in the computer's random access memory using only integer arithmetic, which is much faster than floating point transformations. Approximately $5 M^{2}$ integer additions are required for an image of $M$ rows and columns. Transforming or inverting a $512^{2}$ image takes about 4 seconds on a Sun SPARCstation 1. (This time will of course vary among platforms, but it is small compared with traditional image transmission times and will certainly get smaller as low-end platforms improve in speed.) Second, the spatially localized nature of the basis functions of the H-transform prevents the appearance of objectionable artifacts such as ringing around point sources and edges. Third, a natural feature of wavelet-based transformations is that a 2x2 rebinned version of the image (representing 1/4 of the data volume) is available without additional computational cost, and can be sent first. Sending the 2x2 rebinned subset of the image first is attractive because it keeps the computational cost of inversion low until the end of the transmission, and the 2x2 rebinned version will in many cases still be perfectly acceptable considering the number of pixels on many users' display monitors. Sending the highest resolution information at the end is in keeping with the other powerful aspect of this scheme, that of letting the user control the degree of lossiness in real time. Any user can get all the data at the full resolution simply by waiting longer, but others with simpler needs are not forced to spend the time receiving information that is useless to them.
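The following sketch of a single level of an unnormalized H-transform is offered only to illustrate why the transform requires nothing more than in-place integer additions and why a 2x2-rebinned image is available essentially for free; the scaling conventions and coefficient layout of the actual hcompress code (White 1992) differ in detail.

\begin{verbatim}
/*
 * One level of an (unnormalized) H-transform on an n x n integer
 * image.  Each 2x2 block is replaced, in place, by its sum and three
 * differences; only integer additions are needed.
 */
void htrans_level(long *img, int n, int stride)
{
    int i, j;
    for (i = 0; i < n; i += 2) {
        for (j = 0; j < n; j += 2) {
            long a = img[i * stride + j];            /* top left     */
            long b = img[i * stride + j + 1];        /* top right    */
            long c = img[(i + 1) * stride + j];      /* bottom left  */
            long d = img[(i + 1) * stride + j + 1];  /* bottom right */

            img[i * stride + j]           = a + b + c + d; /* sum: 2x2-rebinned pixel */
            img[i * stride + j + 1]       = a - b + c - d; /* horizontal difference   */
            img[(i + 1) * stride + j]     = a + b - c - d; /* vertical difference     */
            img[(i + 1) * stride + j + 1] = a - b - c + d; /* diagonal difference     */
        }
    }
    /*
     * The full transform recurses on the sums, gathered at half the
     * resolution, so the total work remains a small multiple of n^2
     * integer additions and the whole operation is done in place.
     */
}
\end{verbatim}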

In order to compare the various schemes, we define a quality measure $Q$ as
\begin{equation}
Q = \frac{1}{\sigma_{0}^{2} N} \sum (I_{1} - I_{0})^{2}
\end{equation}
where $I_{1}$ represents the partially received image, $I_{0}$ represents the original image, $\sigma_{0}$ is the standard deviation of the noise in the original image, and the sum is over all $N$ pixels in the image. $Q \leq 1$ occurs when the images are globally consistent to within the noise, and $Q = 0$ occurs for a completely received image with no loss.
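The definition translates directly into code; a minimal version (variable names ours) is:

\begin{verbatim}
#include <stddef.h>

/*
 * The quality measure Q defined above: the mean squared difference
 * between the partially received image I1 and the original I0, in
 * units of the noise variance sigma0^2 of the original.  Q <= 1 means
 * the two images agree to within the noise.
 */
double quality_Q(const double *i1, const double *i0, size_t n,
                 double sigma0)
{
    double sum = 0.0;
    size_t k;
    for (k = 0; k < n; k++) {
        double d = i1[k] - i0[k];
        sum += d * d;
    }
    return sum / (sigma0 * sigma0 * (double)n);
}
\end{verbatim}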

Figure 4 shows $Q$ as a function of the amount of data received (expressed as a percentage of the total volume of data to be sent, including protocol overhead) for the image used to generate Figures 2 and 3. The raster mode converges most slowly, as expected. $Q < 1$ occurs only after 98.7% of the data (1,304,004 bytes) has been received. Sending the image in bitplane order gives $Q < 1$ after 20.1% of the data (187,560 bytes) has been received. Sending the H-transform in bitplane order gives $Q < 1$ after only 1.6% of the data (18,456 bytes) has been received.

While $Q \approx 1$ may be a relevant scientific criterion, our third grader can do with quite a bit less. The Q-curves in Figure 4 each make a sharp drop to the asymptote at some point. This plunge marks the point at which the image will be visually very similar to the original, and will suffice for our example student. The raster method makes the plunge at about a million bytes, not too far from the original data volume. The raw bitplane method plunges at about 30 thousand bytes, or about a minute and a half over our dialup. The H-transform method plunges at about a hundred bytes. Even allowing 1000 bytes, a respectable image is received in just a few seconds.

Finally, we note that the dramatic plunge to the asymptote seen with progressive transmission of the H-transform means that each image can usefully be broken into two distinct pieces (files) in the digital library. The first file, containing the highest bitplanes of the H-transformed image, is small (typically about 1% of the full image) and can be stored on a fast, primary medium such as a hard disk. The second file, containing the bulk of the data representing the slow crawl along the asymptote, can be stored on a slower, secondary medium such as CD-ROM. In this manner, the primary files for many thousands of images can be stored together, with quick access for browsers. For users whose needs cross the fidelity threshold and require more data, the server can access the slower medium automatically, providing an effective and powerful data management strategy for the archive administrator.
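A hypothetical sketch of how a server might exploit this two-file split follows; the paths and file naming scheme are ours, purely for illustration.

\begin{verbatim}
#include <stdio.h>

/*
 * Hypothetical two-tier storage strategy: the small "head" file
 * (highest H-transform bitplanes, roughly 1% of the data) lives on
 * fast disk; the bulky "tail" sits on a slower medium such as CD-ROM
 * and is opened only when the client asks for more data.
 */
FILE *open_image_part(const char *name, int want_more_data)
{
    char path[1024];
    if (want_more_data)
        sprintf(path, "/cdrom/archive/%s.tail", name);  /* slow medium */
    else
        sprintf(path, "/data/archive/%s.head", name);   /* fast medium */
    return fopen(path, "rb");
}
\end{verbatim}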

In the system we propose, the user interface is as important as the transmission protocol. The display must be seamlessly integrated with the protocol; for example, having to copy the whole image between a receiving program and a display program every time a bitplane is received is not acceptable. The display program must also be built with industry-standard languages, subroutine libraries, and display protocols for maximum portability between different types of computers. To ensure that our third grader can actually use the system, the control features must be intuitive and simple, allowing casual use with little or no learning curve, yet they must also accommodate user growth and experience through more advanced features such as zooming, panning, image scaling (e.g. logarithmic and square root), and false color. We will design our display program to be simple for beginners, but useful at all educational levels, including college classrooms and laboratories.

Detailed Description

We have a prototype system in hand with which we have made measurements, and which stands as a proof-of-concept for the technique of progressive transmission. Our proposal covers the enhancements we feel are necessary to upgrade the prototype from a research tool used by professional astronomers to a tool that is widely useful to archive administrators and the general public. The proposed enhancements include:

Rather than building finished products for a large variety of computers, we propose to provide portable, ANSI C source code for all aspects of machine-independent progressive transmission of images. In addition, we will build a working application targeted for a specific widely used environment: a standard Unix workstation supporting TCP sockets and an X Window System display. Our intention is that the portable, industry-standard compression and protocol software we deliver will be used by applications developers to create progressive transmission applications for whatever platforms are popular 3, 5, or 10 years from now.

Figure 5 shows a block diagram of the proposed system. The input is in the form of plain FITS (NASA-standard file format) files. hcompress and dhcompress are file preprocessors used by the archive administrator to prepare those FITS files for the archive. Although the file server could ingest plain FITS files and compress them ``on the fly,'' the computational load would degrade its performance, so preparing the files in advance is recommended. The archived images can be stored either as single H-transformed files or in a distributed mode, with the image split between a small file available for quick access and a much larger file stored on a slower medium such as CD-ROM. The client machine accepts compressed files into the GUI, and can also store either compressed or uncompressed versions locally.

Client development

We propose that the general user will use Mosaic as the Internet navigation tool, which will connect the user to some digital library. When the connection is made, Mosaic will invoke the proposed visualization program, which will conduct all further interactions with the archive server. Alternatively, if the user already knows the location of a specific archive, then Mosaic is not required. The visualization program will have an easy, intuitive Graphical User Interface (GUI) control panel for managing the progressive transmission. The control panel functions will include help screens, file and directory choosing, and transmission controls including ``cancel image,'' ``go to next image,'' ``get more data,'' ``how much longer,'' and so on. The visualization program will be a Motif application conforming to the Motif style conventions. It will also be able to store images locally, with or without compression.

The GUI program will be designed to present a clean, simple interface to first-time or casual users, but will also support more advanced features such as image scaling, false color, color map control, and some simple quantitative features including image minimum and maximum, convergence criteria, zooming, and panning.

Server development

The server enhancements include all the programs required by the archive administrator to receive files in NASA-standard format (FITS) and prepare them for use in the digital library. The required programs will H-compress images into either single-file compressed format or into two or more separate files, allowing distribution of data files between primary (fast) and secondary (slow) storage media. The compression format will include all the data necessary for supporting progressive transmission, including end-of-plane demarcations, sign-plane management (H-transform coefficients can be negative, and must be handled carefully), and data volume and convergence odometers.
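As an illustration only, the header of such a compressed file might carry bookkeeping along these lines; none of the field names below are taken from the actual format, and the final layout will be defined during development.

\begin{verbatim}
/*
 * Hypothetical header for a compressed archive file, sketched only to
 * show the kind of bookkeeping the format must carry.
 */
struct hcomp_header {
    long nx, ny;            /* image dimensions                          */
    int  nbitplanes;        /* number of H-coefficient bitplanes stored  */
    long plane_offset[32];  /* end-of-plane demarcations (byte offsets)  */
    long plane_bytes[32];   /* per-plane data-volume "odometer"          */
    long sign_offset;       /* start of the sign plane (H-transform      */
                            /* coefficients can be negative)             */
    double sigma0;          /* noise estimate used for the convergence   */
                            /* odometer (the Q measure above)            */
};
\end{verbatim}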

Protocol development

The existing prototype protocol already allows two-way communication between the client and server, request-by-filename, and cancel-transmission. The protocol will be enhanced to include support for file and directory navigation, data volume and convergence odometers, progress reports, server status, timeout handling, and client/server negotiations concerning image resolution, user's screen size, and other advanced features that users find helpful.
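To make the protocol surface concrete, a hypothetical enumeration of message types (names ours, not the prototype's) might look like this:

\begin{verbatim}
/*
 * Hypothetical client/server message types; the prototype's actual
 * wire format is not reproduced here.
 */
enum pit_message {
    PIT_REQUEST_BY_NAME,      /* existing: request an image by filename     */
    PIT_CANCEL_TRANSMISSION,  /* existing: stop sending the current image   */
    PIT_LIST_DIRECTORY,       /* proposed: file and directory navigation    */
    PIT_PROGRESS_REPORT,      /* proposed: data-volume/convergence odometer */
    PIT_SERVER_STATUS,        /* proposed: server status, timeout handling  */
    PIT_NEGOTIATE_DISPLAY     /* proposed: resolution/screen-size negotiation */
};
\end{verbatim}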

Educational use, evaluation, and feedback

To properly test and evaluate the software, we need feedback from users. The UW Space Place, its staff, and an education specialist will support the testing, evaluation, and improvement of the project software. Operated by the UW Space Astronomy Laboratory, the UW Space Place is a science and technology education center open to the general public. We plan to host several one-day teacher workshops at the UW Space Place to give teachers hands-on opportunities to access various images, to discuss possible classroom activities using these images, and to evaluate the project software. After the evaluations are reviewed and any needed changes are made, we plan to host several school group visits to the Space Place. In this way the students will also have a hands-on opportunity to access various images, discuss their possible uses, and evaluate the software.

When the software is finalized, the Space Place will host a computer open house highlighting various public domain databases and their uses. We will mail a descriptive flyer discussing the use of available databases to the teacher associations in Wisconsin, the Wisconsin Space Grant Consortium, and teacher resource centers around the country. The teacher workshops, school group visits, and open house will also be supported by other UW Space Place staff as part of their ongoing activities.
