Sometimes you’ve got data in a compressed form, but you need to read and process the (uncompressed) data in a program. If, as in my case, the uncompressed data (a program execution trace) is around 8 GB and the compressed trace only 400 MB, then uncompressing the data to disk, reading it, and deleting it again after processing is a big hassle.
It would be nice to uncompress and read the data directly from within the program…and luckily this is actually very simple:
Let’s say we have a compressed file, trace.dat.gz, that we need to read from.
If the file were already uncompressed, the code to read it would go something like this:
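FILE *trace = fopen(filename, "r");
if (trace == NULL) return false;
// each record: a status string followed by two hexadecimal values
while ((res = fscanf(trace, "%s %lx %lx", &status, &adr, &data)) == 3) {
    /* process data */
}
fclose(trace);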
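The trick to reading directly from the compressed file is to start a second process that decompresses the data and feeds it straight into our program. That’s what pipe(), fork() and exec are for: the child process runs the decompressor with its standard output connected to a pipe, and the parent reads from the other end of that pipe: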
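int pfd[2];
FILE *trace = NULL;
if (pipe(pfd) != 0) return false;
pid_t cpid = fork();
if (cpid < 0) return false;   // fork failed
if (cpid == 0) {
    // child process: redirect stdout to the pipe & start the decompressor
    close(pfd[0]);            // the child does not read from the pipe
    dup2(pfd[1], 1);          // fd 1 (stdout) now refers to the pipe's write end
    close(pfd[1]);
    execlp("gunzip", "gunzip", "-c", filename, (char *)NULL);
    _exit(1);                 // only reached if execlp() failed
} else {
    // parent: read from the pipe
    close(pfd[1]);            // must be closed, or the reader never sees end-of-file
    trace = fdopen(pfd[0], "r");
}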
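while ((res = fscanf(trace, "%s %lx %lx", &status, &adr, &data)) == 3) {
    /* process data */
}
fclose(trace);            // also closes pfd[0], so no separate close() is needed
waitpid(cpid, NULL, 0);   // reap the gunzip child (declared in <sys/wait.h>)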
That’s it! Our program now automatically runs “gunzip” on the compressed data before piping it into our program.