I’m looking to compare two big sets of CSV files, and/or a CSV file and a .txt file. I think the .txt file may need to be converted to CSV for simplicity’s sake, but that may not be necessary. I want to use either Excel, C++, or Python. I need to compare a list of “accepted” values to a list of measured values and find the difference between them, if there is one. Excel may be the easiest way to do this, but Python or C++ may work just as well. This is not homework. Code advice, templates, or links to websites would be greatly appreciated.
EDIT 1
I’ve read about Python’s difflib module and its Differ class, but I’m unfamiliar with how to use it, and it may be more than I need.
EDIT 2
Both files will have a series of named columns (plain text, with no lines drawn between them), and below the column names there will be numbers. I need to compare the number in column 1, spot 1 of file 1 to the number in column 1, spot 1 of file 2, and so on. If there is a difference, I want to write that difference to another CSV file.
You can use ADO (ODBC/JET/OLEDB Text Driver) to treat ‘decent’ .txt/.csv/.tab/.flr files as tables in a SQL database from any COM-enabled language. The comparisons can then be done using the power of SQL (DISTINCT, GROUP BY, (LEFT) JOINs, …).
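Since the question mentions Python, here is a minimal sketch of the same "let SQL do the comparison" idea. It does not use ADO or the Text Driver. It swaps in Python's built-in sqlite3 module and loads the CSV text into in-memory tables instead. The table names, the `spot` column (the 1-based row position) and the function names are my own assumptions for illustration. It also assumes that both files share the same header line and contain only numbers.

```python
import csv
import io
import sqlite3


def load_table(conn, table, text):
    """Load CSV text into a table, adding a 1-based 'spot' (row position) column."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # skip blank lines
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{name}" REAL' for name in header)
    conn.execute(f"CREATE TABLE {table} (spot INTEGER, {cols})")
    marks = ", ".join("?" for _ in range(len(header) + 1))
    conn.executemany(
        f"INSERT INTO {table} VALUES ({marks})",
        [(spot, *map(float, row)) for spot, row in enumerate(data, start=1)],
    )
    return header


def find_differences(expected_text, measured_text):
    """Return (spot, column, expected, measured, difference) for each mismatch."""
    conn = sqlite3.connect(":memory:")
    header = load_table(conn, "expected", expected_text)
    load_table(conn, "measured", measured_text)
    result = []
    for name in header:
        # Join the two tables on row position and keep only differing cells.
        query = (
            f'SELECT e.spot, e."{name}", m."{name}", m."{name}" - e."{name}" '
            "FROM expected AS e JOIN measured AS m ON e.spot = m.spot "
            f'WHERE e."{name}" <> m."{name}" ORDER BY e.spot'
        )
        for spot, exp, mea, diff in conn.execute(query):
            result.append((spot, name, exp, mea, diff))
    conn.close()
    return result
```

You would call it with the contents of the two files, e.g. `find_differences(open("expected.csv").read(), open("measured.csv").read())`, and write the returned tuples to a third file with `csv.writer`. The comparison is exact, so for real measurements you may want to replace the `<>` test with a tolerance.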
Added with regard to your comment:
It’s your problem and I don’t want to push you where you don’t want to go. But SQL is a good (the best?) tool if you need to compare tabular data. As evidence, here is the output of a script that spots the differences between two .txt files:
Further additions:
This article deals with ADO and text files; look for a file adoNNN.chm
(NNN=Version number, e.g. 210) on your computer; this is a good book about
ADO.
You can use Access or OpenOffice Base to experiment with SQL statements
applied to a linked/referenced (not imported!) text database.
A script/program will be easy once you have mastered the initial hurdle: connecting
to the database, i.e. to a folder containing the files and a schema.ini
file that defines the structure of the files (= tables).
The output above was generated by:
If you delete/ignore the fat (create SQL statements, diagnostics output), it boils
down to 6 lines
which can be ‘ported’ easily to any COM-enabled language, because the ADO
objects do all the heavy lifting. The .GetString method comes in handy when you
want to save a resultset: just twiddle the separator/delimiter/Null arguments
and dump it to file
(don’t forget to add a definition for that table to your schema.ini). Of
course you can also use a “SELECT/INSERT INTO”, but such statements may not
be easy to get right and past the ADO Text Driver’s parser.
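If you end up doing this from Python instead of a COM script, the "dump a resultset with your own separators" step is easy to imitate. The sketch below is not ADO. `get_string` and `dump_table` are hypothetical helpers of mine that only mimic the separator/delimiter/Null arguments described above, and the default delimiters are my assumption about sensible values.

```python
def get_string(rows, column_delim="\t", row_delim="\r", null_expr=""):
    """Join rows into one string, roughly in the spirit of Recordset.GetString."""
    lines = []
    for row in rows:
        # None stands in for a database NULL and is replaced by null_expr.
        fields = [null_expr if value is None else str(value) for value in row]
        lines.append(column_delim.join(fields) + row_delim)
    return "".join(lines)


def dump_table(path, field_names, rows, column_delim=",", row_delim="\n"):
    """Write a header line followed by the rows to a delimited text file."""
    # newline="" stops Python from translating the row delimiter on write.
    with open(path, "w", newline="") as handle:
        handle.write(column_delim.join(field_names) + row_delim)
        handle.write(get_string(rows, column_delim, row_delim))
```

Something like `dump_table("differences.txt", ["Num0", "Spot"], rows)` would then produce a file you can describe in schema.ini and query again. The two column names here are just the ones visible in this answer.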
Addition wrt Computations:
Start with a 5 x 2 master/approved file containing:
transform it to expected.txt
by appending the Spot column so it conforms to
in your schema.ini file. Similarly, transform a measure file like:
to measured.txt
Apply
Write the resultset to differences.txt
aFNames = Array( "Num0", … "Spot" )
oFS.CreateTextFile( sFSpec ).Write _
  Join( aFNames, sFSep ) & sRSep & oRS.GetString( adClipString, , sFSep, sRSep, "" )
and you get: