%% Problem solved and the code below acts as expected %%
I’m trying to write an SVN pre-commit hook in Bash that checks that incoming files are UTF-8 encoded. After a lot of string juggling to get the paths of the incoming files and to skip directories, pictures, deleted files and so on, I use ‘svnlook cat’ to read each incoming file and pipe it to ‘iconv -f UTF-8’. I then read the exit status of iconv from ${PIPESTATUS[1]}.
My code looks like this:
REPOS="$1"
TXN="$2"
SVNLOOK=/usr/bin/svnlook
ICONV=/usr/bin/iconv
# The file endings to ignore when checking for UTF-8:
IGNORED_ENDINGS=( png jar )
# Preparing to set the IFS (Internal Field Separator) so "for CHANGE in ..." will iterate
# over lines instead of words
OIFS="${IFS}"
NIFS=$'\n'
# Make sure that all files to be committed are encoded in UTF-8
IFS="${NIFS}"
for CHANGE in $($SVNLOOK changed -t "$TXN" "$REPOS"); do
    IFS="${OIFS}"
    # Skip change if first character is "D" (we don't care about checking deleted files)
    if [ "${CHANGE:0:1}" == "D" ]; then
        continue
    fi
    # Skip change if it is a directory (directories don't have an encoding)
    if [ "${CHANGE:(-1)}" == "/" ]; then
        continue
    fi
    # Extract file repository path (remove first 4 characters)
    FILEPATH=${CHANGE:4}
    # Ignore files that start with "." like ".classpath"
    IFS="/" # Change separator to "/" so we can find the file name in the file path
    for SPLIT in $FILEPATH
    do
        FILE=$SPLIT
    done
    IFS="${OIFS}" # Reset Internal Field Separator
    if [ "${FILE:0:1}" == "." ]; then
        continue
    fi
    # Ignore files that are not supposed to be checked, like images (list defined in IGNORED_ENDINGS above)
    IFS="." # Change separator to "." so we can find the file ending
    for SPLIT in $FILE
    do
        ENDING=$SPLIT
    done
    IFS="${OIFS}" # Reset Internal Field Separator
    IGNORE="0"
    for IGNORED_ENDING in "${IGNORED_ENDINGS[@]}"
    do
        # Case-insensitive comparison of the two endings
        if [ "$(echo "$IGNORED_ENDING" | tr '[:upper:]' '[:lower:]')" == "$(echo "$ENDING" | tr '[:upper:]' '[:lower:]')" ]
        then
            IGNORE="1"
        fi
    done
    if [ "$IGNORE" == "1" ]; then
        continue
    fi
    # Read changed file and pipe it to iconv to parse it as UTF-8
    $SVNLOOK cat -t "$TXN" "$REPOS" "$FILEPATH" | $ICONV -f UTF-8 -t UTF-16 -o /dev/null
    # If iconv exited with a non-zero value (error) then return error text and reject commit
    if [ "${PIPESTATUS[1]}" != "0" ]; then
        echo "Only UTF-8 files can be committed (violated in $FILEPATH)" 1>&2
        exit 1
    fi
    IFS="${NIFS}"
done
IFS="${OIFS}"
# All checks passed, so allow the commit.
exit 0
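As a side note, the path handling in the script could probably be done without touching IFS at all. The following is only a sketch of that idea using parameter expansion and `case`. The function name `path_to_check` is made up for illustration, and the ignored endings are hard-coded here instead of being read from IGNORED_ENDINGS.

```shell
# Hypothetical helper: takes one line of `svnlook changed` output and prints
# the repository path if the file should be checked, or nothing otherwise.
path_to_check() {
    change=$1
    case $change in
        D*) return 0 ;;       # deleted entry: nothing to check
        */) return 0 ;;       # directory: no encoding to check
    esac
    path=${change#????}       # drop the 4-character status prefix
    file=${path##*/}          # keep only the last path component
    case $file in
        .*) return 0 ;;       # dotfiles such as .classpath
    esac
    # Lower-case the part after the last "." for a case-insensitive match
    ending=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
    case $ending in
        png|jar) return 0 ;;  # endings that are never checked
    esac
    printf '%s\n' "$path"
}

path_to_check 'U   trunk/src/Main.java'
path_to_check 'A   trunk/img/logo.PNG'
```

The first call prints the path and the second prints nothing, so the caller only has to test whether the result is empty.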
The problem is that every time I try to commit a file with Scandinavian characters like “æøå”, iconv returns an error (exit 1).
If I disable the script, commit the file with “æøå”, change the -t (transaction) in “svnlook changed -t” and “svnlook cat -t” to -r (revision), and run the script manually with the revision number of the “æøå” file, then iconv (and therefore the script) returns exit 0, and everything is dandy.
Why does svnlook cat -r work as expected (returning a UTF-8 encoded “æøå” string), but not svnlook cat -t?
The problem was that iconv apparently behaves unexpectedly if no output encoding is selected.
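Changing the iconv call from the plain ‘iconv -f UTF-8’ described at the top to the one now shown in the script, ‘iconv -f UTF-8 -t UTF-16 -o /dev/null’,
solved the problem and made the script behave as expected 🙂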
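For anyone who wants to try the check outside of a hook, here is a minimal sketch. It assumes an iconv that knows both UTF-8 and UTF-16. The function name `is_valid_utf8` is made up, and the output is discarded with a redirect instead of `-o /dev/null`.

```shell
# Succeeds (exit status 0) only if standard input is valid UTF-8.
# An output encoding is named explicitly with -t, as in the fixed script,
# and the converted output is thrown away.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-16 > /dev/null 2>&1
}

# "æøå" encoded as UTF-8 (octal escapes keep the bytes explicit)
if printf '\303\246\303\270\303\245' | is_valid_utf8; then
    echo "UTF-8 bytes: accepted"
fi

# The same letters encoded as ISO-8859-1, which is not valid UTF-8
if ! printf '\346\370\345' | is_valid_utf8; then
    echo "ISO-8859-1 bytes: rejected"
fi
```

Because the function is the last command in each pipeline, its exit status is the pipeline's exit status, so ${PIPESTATUS[1]} is not needed in this form.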