I need to get the unique URLs from a web log and then sort them. I was thinking of using the grep, uniq, and sort commands and writing the result to another file.
I executed this command:
cat access.log | awk '{print $7}' > url.txt
Then I tried to keep only the unique ones and sort them:
cat url.txt | uniq | sort > urls.txt
The problem is that I can still see duplicates, even though the file is sorted, which means the sort step worked. Why?
uniq | sort does not work: uniq removes only contiguous duplicates. The correct way is
sort | uniq or, better, sort -u, because only one process is spawned.
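To see the difference, here is a small sketch that runs both orderings on a made-up three-line log. The log lines, the temporary file, and the variable names are assumptions for illustration only; the one thing taken from the question is that the URL is field 7.

```shell
# Hypothetical sample log: /b.html appears twice, but not on adjacent lines.
log=$(mktemp)
printf '%s\n' \
  '1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /b.html HTTP/1.1" 200 512' \
  '1.2.3.4 - - [10/Oct/2023:13:55:37 +0000] "GET /a.html HTTP/1.1" 200 128' \
  '5.6.7.8 - - [10/Oct/2023:13:55:38 +0000] "GET /b.html HTTP/1.1" 200 512' \
  > "$log"

# uniq before sort: the two /b.html lines are not contiguous, so both survive.
wrong=$(awk '{print $7}' "$log" | uniq | sort)

# sort -u: sorting first makes duplicates adjacent, and -u drops them in one step.
right=$(awk '{print $7}' "$log" | sort -u)

printf 'uniq | sort:\n%s\n' "$wrong"
printf 'sort -u:\n%s\n' "$right"
rm -f "$log"
```

With this input, the first pipeline still lists /b.html twice, while the second lists each URL once. Applied to the real log, the whole job would be awk '{print $7}' access.log | sort -u > urls.txt, which also skips the intermediate url.txt file.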