I have file listings from an original and a duplicate drive, consisting of 985257 lines and 984997 lines respectively.
As the line counts do not match, I am certain that some of the files have not been duplicated.
To establish which files are missing, I wish to use sed to filter the original listing by deleting every line that also appears in the duplicate listing.
I had thought about using a match formula in Excel, but with this many lines the program crashes, so I thought sed would be a viable option.
I have had no success with my approach so far however.
echo "Start"
# Cat the passed argument which is the duplicate file listing
for line in $(cat $1)
do
#sed the $line variable over the larger file and remove
#sed "${line}/d" LiveList.csv
#sed -i "${line}/d" LiveList.csv
#sed -i '${line}' 'd' LiveList.csv
sed -i "s/'${line}'//" /home/listings/LiveList.csv
done
A temporary file is created and grows to the 103.4 MB size of the listing file; however, the listing file itself is not altered at all.
My other concern is that, as the listing was created in Windows, the ‘\’ character may be escaping the string, leading to no matches and therefore no alteration.
Example path:
Path,Length,Extension
Jimmy\tail\images\Jimmy\0001\0014\Text\A0\20\A056TH01-01.html,71982,.html
Please help.
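For comparison, the filtering described above can also be done in a single pass with fixed-string matching, which avoids both the per-line sed loop and the backslash concern. This is only a minimal sketch, not a tested fix for the script above: the function name missing_from_duplicate and the file name DuplicateList.csv are hypothetical, and it assumes both listings use the same relative paths and the same line endings, since lines are compared in full (path, length and extension).

```shell
#!/bin/sh
# Hypothetical helper: print every line of the original listing ($1) that
# has no exact, whole-line match in the duplicate listing ($2).
#   -F  treat each pattern as a fixed string, so '\' and '.' stay literal
#   -x  only accept matches that cover the whole line
#   -v  invert the match: keep lines that are NOT in the duplicate listing
#   -f  read the patterns, one per line, from the duplicate listing
missing_from_duplicate() {
    grep -vxFf "$2" "$1"
}

# Example call (file names are assumptions, adjust to the real listings):
# missing_from_duplicate LiveList.csv DuplicateList.csv > Missing.csv
```

Because whole lines are compared, a file whose length differs between the two drives is reported as well, and the shared header line is dropped from the output. The original listing is left untouched; the result goes to standard output.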
This might work for you: