What does the sed expression: G; s/\n/&&/; /^\([ ~-]*\n\).*\n\1/d; s/\n//; h; P do? Exactly what does it match and how does it match it?
It’s from todo.sh. In context:
archive()
{
    #defragment blank lines
    sed -i.bak -e '/./!d' "$TODO_FILE" ## delete all empty lines
    [ $TODOTXT_VERBOSE -gt 0 ] && grep "^x " "$TODO_FILE" ## if verbose mode print completed tasks..
    grep "^x " "$TODO_FILE" >> "$DONE_FILE" ## append completed tasks to $DONE_FILE
    sed -i.bak '/^x /d' "$TODO_FILE" ## delete completed tasks
    cp "$TODO_FILE" "$TMP_FILE"
    sed -n 'G; s/\n/&&/; /^\([ ~-]*\n\).*\n\1/d; s/\n//; h; P' "$TMP_FILE" > "$TODO_FILE"
    ## G; Add a newline
    ## s/\n/&&/; Substitute newline with && (two newlines?)
    ## /^\([ ~-]*\n\).*\n\1/d; Delete duplicate lines???
    ## s/\n// Remove newlines
    ## h Hold: copy pattern space to buffer
    ## P Print first line of pattern space
    if [ $TODOTXT_VERBOSE -gt 0 ]; then
        echo "TODO: $TODO_FILE archived."
    fi
}
OK, you’ve got some of the story already. Recall that the sed expression is executed once for each input line.

So the G at the beginning appends the contents of the hold space to the current line (with a newline in between). The hold space is empty initially, but the h command at the end of each cycle adds to it.

Then s/\n/&&/ duplicates the first newline only, the one between the current line and what was grabbed from the hold space. This is in preparation for the next command.

/^\([ -~]*\n\).*\n\1/ indeed matches if the current line is identical to a line in the hold space:

- ^\([ -~]*\n\) matches a line at the beginning of the buffer¹. Note that this matches only if the line contains only printable ASCII characters. If your system supports locales, ^\([[:print:]]*\n\) would be better.
- .*\n matches at least one subsequent line (possibly just the empty line created by the doubled newline).
- \1 matches a line identical to the first line.

The extra newline added by the previous s command takes care of the case when the duplicate is the very first line from the hold space. The point of the \n\1 is to “anchor” the duplicate at the beginning of a line; otherwise bar would be considered a duplicate of foobar. If the current line is a duplicate, the d command discards it and sed starts the next cycle with the next input line.

If the current line is not a duplicate, s/\n// discards that extra newline (again, no g modifier, so only the first newline is removed). Then the h command results in the hold space containing what it contained before, with the current line prepended. Finally P prints the current input line.

OK, now what does the hold space contain? It starts empty, then gets each successive line prepended unless it’s a duplicate. So the hold space contains the input lines, in reverse order, minus the duplicates.
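To see the behaviour described above, the corrected expression can be run on a small sample. This is only a sketch: the function name dedupe_sed is made up for illustration, GNU sed is assumed (other implementations may treat \n or the one-line script differently), and LC_ALL=C is an added assumption so that the [ -~] range reliably means printable ASCII.

```shell
# dedupe_sed is a hypothetical helper name, not part of todo.sh.
dedupe_sed() {
    # G        append the hold space (lines seen so far) to the current line
    # s/\n/&&/ double the first newline
    # /.../d   drop the line if an identical line follows in the buffer
    # s/\n//   undo the doubling
    # h        remember the buffer; P prints only the current line
    LC_ALL=C sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
}

# "bar" survives even though "foobar" ends in "bar";
# the second "foobar" and the second "bar" are dropped.
printf '%s\n' foobar bar foobar baz bar | dedupe_sed
```

The sample prints foobar, bar and baz, one per line, in their original order.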
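¹ Uh, I don’t know how you did that, but that should be [ -~], not [ ~-], which wouldn’t make any sense: [ -~] is the range from space to tilde, that is, every printable ASCII character, whereas [ ~-] matches only a space, a tilde or a hyphen.

Here’s another way of doing this, if you have a POSIX-conforming set of tools (Single Unix v2 is good enough).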
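A minimal sketch of such a pipeline, assuming only nl, sort and cut: number the lines, sort on the text to drop duplicates, sort back by line number, then strip the numbers. The function name dedupe_sort is hypothetical, and the -b a and LC_ALL=C settings are assumptions added here (to number every line and to compare bytes rather than locale collation). POSIX does not say which of several equal lines sort -u keeps; GNU sort keeps the first, which is what preserves first-occurrence order.

```shell
# dedupe_sort is a hypothetical helper name used for illustration.
dedupe_sort() {
    nl -b a -s: |                   # prefix every line with "<number>:"
    LC_ALL=C sort -t: -k2 -u |      # compare on the text only, drop duplicates
    LC_ALL=C sort -t: -k1,1n |      # restore the original order by line number
    cut -d: -f2-                    # remove the "<number>:" prefix again
}

printf '%s\n' pear pear apple apple fig | dedupe_sort
```

Lines that themselves contain a colon are unaffected, because the key and the cut both run from the second field to the end of the line.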
Oh, you wanted to do this legibly and concisely? Just use awk.
If the current line hasn’t been seen yet, mark it as seen, and print it.
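The rule just described can be sketched in awk as follows. The function name dedupe_awk and the array name seen are illustrative choices, not taken from todo.sh.

```shell
# dedupe_awk is a hypothetical helper name; "seen" is an arbitrary array name.
dedupe_awk() {
    # If the whole line ($0) is not yet a key of seen[], record it and print it.
    awk '!($0 in seen) { seen[$0] = 1; print }'
}

printf '%s\n' foobar bar foobar baz bar | dedupe_awk
```

This prints foobar, bar and baz in their original order. Unlike the sed version, it compares whole lines as array keys, so it is not limited to printable ASCII.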
Note that like the sed method, the awk method essentially stores the whole file in memory. The method above using sort has the advantage that only sort needs to keep more than one line of input at a time, and it’s designed for this.

Of course, if you don’t care about the order of the lines, it’s as simple as sort -u.
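For comparison, a small sketch of the order-agnostic variant. The function name dedupe_unordered is hypothetical, and LC_ALL=C is an assumption added so the sort order is plain byte order.

```shell
# dedupe_unordered is a hypothetical helper name.
dedupe_unordered() {
    LC_ALL=C sort -u    # sort the lines and keep one copy of each
}

printf '%s\n' foobar bar foobar baz bar | dedupe_unordered
```

Here the output is bar, baz, foobar: the duplicates are gone, but so is the original order.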