I/O Efficient GZip Compression of Packet Captures

One of the major challenges with logging network traffic is that it is very disk I/O intensive. It can also require a lot of storage, which often leads to the use of file compression tools such as gzip to reduce the disk space needed. For DNS traffic, this typically yields an 80% reduction in file size.

There are two commonly implemented ways of compressing pcap files. The simplest is to have the capture application write its output to “stdout”, and then pipe that output into the “stdin” of a compression program, e.g.:

tcpdump -w - | ( gzip -c > output.pcap.gz & )

The other is to write the files out in their normal, uncompressed format and then use a post-processing script to find completed pcap files and compress them.

The pipe method, however, is incompatible with the automatic file rotation feature of tcpdump (and dnscap): there is no way to close the current file and restart compression when a new file is opened. Since file rotation is essential for continuous packet capture, this method is impractical there.

The disadvantage of using a post-processor, though, is that it dramatically increases the disk I/O load. Every packet is first written to disk uncompressed; the post-processor must then read that file back, compress the data, and write out a new file, most likely while yet more uncompressed data is being saved to the next file. If packet capture is performed directly on a network server, this additional I/O load can adversely affect the server's operation.
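To put a rough number on this, take the 80% reduction quoted above as an assumption and let N be the number of bytes captured. Counting only payload bytes moved to or from disk (ignoring filesystem metadata, and assuming the post-processor's read really hits the disk rather than the page cache), a back-of-envelope comparison is:

```latex
\begin{aligned}
\text{post-processing:}\quad & \underbrace{N}_{\text{write raw}} + \underbrace{N}_{\text{read raw}} + \underbrace{0.2N}_{\text{write compressed}} = 2.2N \\
\text{on-the-fly:}\quad & 0.2N \\
\text{ratio:}\quad & 2.2N / 0.2N = 11
\end{aligned}
```

This is an illustrative estimate under those assumptions, not a measurement.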

A solution is required, therefore, that supports file rotation and on-the-fly compression so that uncompressed data is never written to disk. As we were planning to deploy DNS-OARC's dnscap to support monitoring of the forthcoming root zone DNSSEC key roll, I decided to see whether I could build this functionality directly into dnscap.

dnscap uses libpcap to read and write data, and specifically the pcap_dump_open(pcap_t *pcap, const char *path) function to save files. However, libpcap also has a function pcap_dump_fopen(pcap_t *pcap, FILE *fp), which lets the caller pass libpcap a handle to an already-open file instead of having libpcap open the file itself. Because all libpcap I/O uses stdio internally, any writable FILE * handle can be used.

A little-known (albeit non-standard) feature of modern stdio implementations is the ability to create a file handle that uses user-supplied read, write, seek and close functions instead of the standard POSIX functions that operate on a UNIX file descriptor. On Linux, glibc provides fopencookie(); BSD-derived systems have funopen(). An excerpt from the manual page for the latter is shown below:

SYNOPSIS

     #include <stdio.h>

     FILE *funopen(const void *cookie,
                   int (*readfn)(void *, char *, int),
                   int (*writefn)(void *, const char *, int),
                   fpos_t (*seekfn)(void *, fpos_t, int),
                   int (*closefn)(void *));

DESCRIPTION

     The funopen() function associates a stream with up to four ``I/O
     functions''.  Either readfn or writefn must be specified; the others can
     be given as an appropriately-typed NULL pointer.  These I/O functions
     will be used to read, write, seek and close the new stream.

    ...

     The calling conventions of readfn, writefn, seekfn and closefn must match
     those, respectively, of read(2), write(2), lseek(2), and close(2) with
     the single exception that they are passed the cookie argument specified
     to funopen() in place of the traditional file descriptor argument.
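glibc's fopencookie() works the same way, but takes the callbacks in a cookie_io_functions_t structure, uses size_t/ssize_t lengths, and (per its manual page) expects the write callback to return 0 rather than -1 on error. The sketch below is a toy, unrelated to dnscap: the struct mem_sink type and the mem_write(), mem_close() and mem_sink_open() names are hypothetical. It shows a cookie pointing at a per-stream structure and a stream whose output lands in memory instead of a file.

```c
#define _GNU_SOURCE             /* fopencookie() is a GNU extension */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Per-stream context passed to every callback as the cookie. */
struct mem_sink {
    char   buf[256];
    size_t len;
};

/* Write callback: returns the number of bytes consumed, or 0 on error
 * (glibc's convention for fopencookie; it must never be negative). */
static ssize_t mem_write(void *cookie, const char *data, size_t size) {
    struct mem_sink *s = cookie;
    if (size > sizeof s->buf - s->len)
        return 0;                          /* buffer full: report an error */
    memcpy(s->buf + s->len, data, size);
    s->len += size;
    return (ssize_t)size;
}

/* Close callback: nothing to release in this sketch; 0 means success. */
static int mem_close(void *cookie) {
    (void)cookie;
    return 0;
}

/* Open a write-only stream whose output lands in *s instead of a file. */
static FILE *mem_sink_open(struct mem_sink *s) {
    cookie_io_functions_t io = {
        .read  = NULL,                     /* write-only: no read callback */
        .write = mem_write,
        .seek  = NULL,                     /* not seekable */
        .close = mem_close,
    };
    s->len = 0;
    return fopencookie(s, "w", io);
}
```

Ordinary stdio calls such as fputs() and fprintf() work on the returned stream; stdio buffers the data and calls mem_write() when it flushes, at the latest on fclose().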

By happy coincidence (or, more likely, by design), the gzip functions in the standard zlib compression library almost perfectly match the callback conventions described in the manual page above. With just a small amount of additional code you can create a FILE * handle that behaves exactly like a normal file handle to the caller but transparently compresses the data before it is saved to disk.

Within dnscap, the call to pcap_dump_open() can be replaced with code like this (NB: error handling is omitted for brevity, and only the BSD funopen() method is shown):

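     #include <stdio.h>
     #include <pcap.h>
     #include <zlib.h>

     /* funopen() write callback. gzwrite() takes (gzFile, voidpc, unsigned)
      * rather than (void *, const char *, int), so a thin wrapper adapts the
      * types. gzwrite() returns 0 on error, whereas write(2) returns -1. */
     static int
     gzip_cookie_write(void *cookie, const char *buf, int size) {
         int n = gzwrite((gzFile)cookie, (voidpc)buf, (unsigned)size);
         return n > 0 ? n : -1;
     }

     /* funopen() close callback: gzclose() flushes the compressor and
      * writes the gzip trailer; it returns Z_OK (0) on success. */
     static int
     gzip_cookie_close(void *cookie) {
         return gzclose((gzFile)cookie);
     }

     pcap_dumper_t *dump_open(pcap_t *pcap, const char *path, int want_gzip) {
         if (want_gzip) {
             gzFile z = gzopen(path, "w");
             /* write-only stream: no read or seek function is needed */
             FILE *fp = funopen(z, NULL, gzip_cookie_write, NULL, gzip_cookie_close);
             return pcap_dump_fopen(pcap, fp);
         } else {
             return pcap_dump_open(pcap, path);
         }
     }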

The cookie argument is an opaque pointer, typically to a structure containing per-context data about the file being handled. In this case zlib's gzFile handle is used directly as the cookie.

Patches to dnscap that automatically generate compressed output if the save suffix ends with .gz have been accepted by DNS-OARC and are in the 1.5.0 release.
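A hypothetical helper for the selection rule described above might look like the sketch below. It is not dnscap's actual code. It only illustrates deriving the want_gzip argument of dump_open() from the save file name, e.g. dump_open(pcap, path, wants_gzip(path)). Matching is assumed to be case-sensitive.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not dnscap's code): returns true if the save file
 * name ends in ".gz", meaning output should be gzip-compressed.
 * The match is case-sensitive, so "capture.pcap.GZ" is not treated as gzip. */
static bool wants_gzip(const char *path) {
    static const char suffix[] = ".gz";
    size_t len = strlen(path);
    size_t slen = sizeof suffix - 1;      /* exclude the terminating NUL */
    return len >= slen && strcmp(path + len - slen, suffix) == 0;
}
```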

In our tests this proved far more efficient than using a post-processor, with a greatly reduced I/O load and a welcome drop in CPU usage as well.
