My test equipment generates large text files which tend to grow in size over

Asked: May 23, 2026 (2026-05-23T03:30:39+00:00)

My test equipment generates large text files which tend to grow in size over a period of several days as data is added.

But the text files are transferred to a PC for backup purposes daily, where they’re compressed with gzip, even before they’ve finished growing.

This means I frequently have both file.txt and a compressed form file.txt.gz where the uncompressed file may be more up to date than the compressed version.

I decide which to keep with the following bash script gzandrm:

#!/usr/bin/bash

# Given an uncompressed file, look in the same directory for 
# a gzipped version of the file and delete the uncompressed 
# file if zdiff reveals they're identical. Otherwise, the 
# file can be compressed.

# eg:  find . -name '*.txt' -exec gzandrm {} \;

if [[ -e $1 && -e $1.gz ]] 
then

    # simple check: use zdiff and count the characters
    DIFFS=$(zdiff "$1" "$1.gz" | wc -c)

    if [[ $DIFFS -eq 0 ]] 
    then

        # difference is '0', delete the uncompressed file
        echo "'$1' already gzipped, so removed"
        rm "$1"

    else

        # difference is non-zero, check manually
        echo "'$1' and '$1.gz' are different"

    fi

else
    # go ahead and compress the file
    echo "'$1' not yet gzipped, doing it now"
    gzip "$1"
fi

This has worked well, but it would make more sense to compare the files' modification times. gzip does not change the modification time when it compresses, so two files with the same timestamp should really be the same file, even if one of them is compressed.

How can I modify my script to compare files by modification time, rather than by diffing their contents?

1 Answer

Editorial Team · Answer 1 · 2026-05-23T03:30:40+00:00

It’s not entirely clear what the goal is, but it seems to be simple efficiency, so I think you should make two changes: 1) check modification times, as you suggest, and don’t bother comparing content if the uncompressed file is no newer than the compressed file, and 2) use zcmp instead of zdiff.

Taking #2 first, your script does this:

DIFFS=$(zdiff "$1" "$1.gz" | wc -c)
if [[ $DIFFS -eq 0 ]]

which will perform a full diff of potentially large files, count the characters in diff’s output, and examine the count. But all you really want to know is whether the content differs. cmp is better for that, since it will scan byte by byte and stop if it encounters a difference. It doesn’t take the time to format a nice textual comparison (which you will mostly ignore); its exit status tells you the result. zcmp isn’t quite as efficient as raw cmp, since it’ll need to do an uncompress first, but zdiff has the same issue.
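To make the exit-status point concrete, here is a minimal sketch that wraps zcmp in a small function and branches on its status alone. The name compare_with_gz is hypothetical, and the mapping assumes zcmp passes through cmp's usual statuses (0 for identical, 1 for different, anything higher for trouble), which is worth confirming against the gzip tools you have installed.

```bash
#!/usr/bin/env bash
# compare_with_gz is a hypothetical helper, not part of the original script.
# It prints one word describing how FILE relates to FILE.gz, based only on
# zcmp's exit status; no diff output is captured or counted.
compare_with_gz() {
    local file="$1"
    local status=0

    # -s keeps zcmp silent; stderr is discarded so that problems show up
    # only in the exit status.
    zcmp -s "$file" "$file.gz" 2>/dev/null || status=$?

    case $status in
        0) echo "same" ;;
        1) echo "different" ;;
        *) echo "error" ;;   # assumed: unreadable or missing file
    esac
}
```

Called as compare_with_gz file.txt, it prints a single word, so nothing large ever lands in a variable.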

So you could switch to zcmp (and remove the use of a subshell, eliminate wc, not invoke [[, and avoid generating potentially large diff output just to count it) by replacing the DIFFS assignment and the [[ test from your script with this single line:

if zcmp -s "$1" "$1.gz"    # if $1 and $1.gz are the same

To go a step further and check modification times first, you can use the -nt (newer than) operator of the test command (also known as square bracket), rewriting the above line as this:

if [ ! "$1" -nt "$1.gz" ] || zcmp -s "$1" "$1.gz"

which says that if the uncompressed version is no newer than the compressed version OR if they have the same content, then $1 is already gzipped and you can remove it. Note that if the uncompressed file is no newer, zcmp won’t run at all, saving some cycles.

The rest of your script should work as is.
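For reference, this is roughly how the whole thing might look with both changes applied. It is a sketch, not a tested drop-in: the logic is wrapped in a function with the hypothetical name gzandrm_file so it can be tried in isolation, and zcmp is given both file names explicitly, as in your original zdiff call.

```bash
#!/usr/bin/env bash
# Sketch of the gzandrm logic with the modification-time check and zcmp.
# gzandrm_file is a hypothetical name; the messages are from the question.
gzandrm_file() {
    local f="$1"

    if [ -e "$f" ] && [ -e "$f.gz" ]; then
        # Cheap test first: -nt is true only when $f is strictly newer.
        # If $f is no newer than $f.gz, the || short-circuits and zcmp
        # never runs. Otherwise zcmp -s compares the contents silently.
        if [ ! "$f" -nt "$f.gz" ] || zcmp -s "$f" "$f.gz"; then
            echo "'$f' already gzipped, so removed"
            rm "$f"
        else
            # newer and different: leave both for a manual check
            echo "'$f' and '$f.gz' are different"
        fi
    else
        # go ahead and compress the file
        echo "'$f' not yet gzipped, doing it now"
        gzip "$f"
    fi
}
```

Ending the script with gzandrm_file "$1" keeps your find . -name '*.txt' -exec gzandrm {} \; usage unchanged.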

One caveat: modification times are very easy to change. Just moving the compressed file from one machine to another could change its modtime, so you’ll have to consider your own case to know whether the modtime check is a valid optimization or more trouble than it’s worth.
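If you want to find out whether timestamps survive your transfer, a small check like the following may help. It is only a sketch: same_mtime is a hypothetical helper, and it treats two files as having the same modification time when neither is newer than the other, to whatever resolution your shell's -nt and -ot tests use.

```bash
#!/usr/bin/env bash
# same_mtime is a hypothetical helper: it succeeds only when both files
# exist and neither is newer nor older than the other, i.e. their
# modification times are equal as far as -nt/-ot can tell.
same_mtime() {
    [ -e "$1" ] && [ -e "$2" ] &&
        [ ! "$1" -nt "$2" ] && [ ! "$1" -ot "$2" ]
}
```

Running same_mtime file.txt file.txt.gz && echo "timestamps match" on the backup PC shows whether the assumption holds for a given pair. If it does not, copying with options that preserve times (cp -p, scp -p, or rsync -t) is one way to keep them intact.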
