I picked the csv-enumerator package for the task of parsing a large number of CSV files.
The input files contain headers, so I chose a map type to represent rows: to be concrete, MapRow, which is an alias for Data.Map ByteString ByteString.
The issue is that the map is sorted by key (column name) deep inside the csv-enumerator code: the locally scoped toMapCSV function builds each row with Data.Map.fromList, which orders the elements by key, so a simple monkey-patch is not possible.
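The ordering does not really come from fromList as such: Data.Map is implemented as a size-balanced binary tree, so any Map, however it is built, is traversed in ascending key order. A small standalone check, using String keys as a stand-in for the ByteString column names (the header and row values are made up for illustration):

```haskell
import qualified Data.Map as M

-- Hypothetical header in its original file order.
header :: [String]
header = ["name", "age", "city"]

-- One row zipped with the header and turned into a Map, roughly what a
-- MapRow is (String stands in for ByteString here).
row :: M.Map String String
row = M.fromList (zip header ["Alice", "30", "Oslo"])

-- Traversing the Map yields keys in ascending order, not file order:
-- [("age","30"),("city","Oslo"),("name","Alice")]
rowInMapOrder :: [(String, String)]
rowInMapOrder = M.toList row
```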
I need to keep the original column order in the output CSVs.
Unfortunately, my current solution is rather convoluted: I read the header separately and reorder each MapRow once it has been read:
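import Data.CSV.Enumerator
import qualified Data.Map as M
import Data.Either
import Control.Monad
import Control.Exception (SomeException)
...
csvMode :: CSVSettings
csvMode = CSVS { csvSep             = ';'
               , csvQuoteChar       = Just '"'
               , csvOutputQuoteChar = Just '"'
               , csvOutputColSep    = ';'
               }
...
updateCSV :: FilePath -> IO (Either SomeException Int)
updateCSV fn = do
  ...
  -- parse all rows as MapRows (keys end up sorted, column order is lost)
  csv <- fmap (either (const []) (id :: [MapRow] -> [MapRow])) (readCSVFile csvMode fn)
  -- parse the file again as plain Rows just to get the header in file order
  hdr <- fmap (either (const []) (head :: [Row] -> Row)) (readCSVFile csvMode fn)
  ...
  -- reorder MapRow: look each header name up in the row; the result has to
  -- stay a plain list, because any Data.Map is traversed in key order
  -- (and M.fromAscList requires keys that are already ascending)
  let outRow = map (head csv M.!) hdr
  ...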
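Why the reordered row has to be a plain list rather than a Map: Data.Map.fromAscList expects its keys to be in ascending order already (it does not check this), and any Map is traversed in key order anyway. A minimal sketch, assuming the reordered row only needs to be written out as a list of fields; the names reorderRow and reorderAssoc, and the default value for a column missing from the row, are illustrative choices and not part of csv-enumerator:

```haskell
import qualified Data.Map as M

-- Reorder a key-sorted map row into the order given by the header.
-- Columns missing from the row (hypothetical case) get the supplied default.
reorderRow :: Ord k => v -> [k] -> M.Map k v -> [v]
reorderRow def hdr row = map (\k -> M.findWithDefault def k row) hdr

-- Same, but keeping the (column, value) pairs as an association list,
-- which, unlike a Map, preserves the order it was built in.
reorderAssoc :: Ord k => v -> [k] -> M.Map k v -> [(k, v)]
reorderAssoc def hdr row = zip hdr (reorderRow def hdr row)
```

For example, `reorderRow "" ["name", "age", "city"] (M.fromList [("age", "30"), ("city", "Oslo"), ("name", "Alice")])` gives `["Alice", "30", "Oslo"]`. Using M.findWithDefault instead of M.! trades a crash on a missing column for a silent default; which is preferable depends on how strict the input format is.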
- Is there a simpler way to keep the column order of the original CSV file?
- Altering the csv-enumerator sources is not an option.
- Note that in the real code the input CSV file is processed with foldCSVFile, i.e. explicitly row by row rather than relying on Haskell's laziness alone.
Since there is no more complete CSV parsing module than csv-enumerator, and the Data.Map used for row representation loses the original column order, I had to use an association list instead and implement custom functions for updating elements, converting between an association list and a plain list, etc. Fortunately, the standard Data.List (GHC.List, respectively) does at least provide lookup and zip/unzip.
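The custom helpers are not shown here, so the following is only a sketch of what they might look like, assuming the updating function follows the argument order of Data.Map.adjust. The names adjust, toAssoc and fromAssoc are hypothetical; Prelude's lookup, zip and map snd do the rest:

```haskell
-- Hypothetical helpers for a row stored as an association list,
-- e.g. [(Field, Field)] where csv-enumerator's Field is a ByteString.

-- Pair each header name with the corresponding field of a plain row.
toAssoc :: [k] -> [v] -> [(k, v)]
toAssoc = zip

-- Drop the column names again, keeping the original column order.
fromAssoc :: [(k, v)] -> [v]
fromAssoc = map snd

-- Apply f to the value(s) stored under key k, leaving the order intact
-- (mirrors the argument order of Data.Map.adjust; every matching key is
-- updated, so duplicate column names are all affected).
adjust :: Eq k => (v -> v) -> k -> [(k, v)] -> [(k, v)]
adjust f k = map step
  where
    step (k', v)
      | k' == k   = (k', f v)
      | otherwise = (k', v)
```

Lookups on an association list are linear in the number of columns, which is usually negligible for CSV rows of ordinary width.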
The first line of the CSV file is extracted from an IO action once:
header <- readCSVFile … >>= return . head …
The contents of the CSV file are parsed with foldCSVFile:
linesCount <- foldCSVFile … (funToIterIO …) 0
The iteration function works on an association list [(Field, Field)] built from each Row, and converts it back to a plain list before writing to the output file:
let assocRow = zip header …
let updatedARow = adjust trans "Column name" assocRow
-- write the updated values
outputRow … $ map snd updatedARow
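Stripped of the enumerator plumbing, the per-row step above amounts to a pure function. In the sketch below a foldl' over a list of rows stands in for foldCSVFile and funToIterIO, so it only illustrates the data flow, not csv-enumerator's API. Field and Row are local String-based stand-ins, and updateRow and processRows are hypothetical names:

```haskell
import Data.List (foldl')

-- Local stand-ins for csv-enumerator's Field and Row (which use ByteString).
type Field = String
type Row   = [Field]

-- Update one column of a plain row by going through an association list,
-- so the original column order is preserved.
updateRow :: Row -> (Field -> Field) -> Field -> Row -> Row
updateRow header trans col row = map snd updated
  where
    assocRow = zip header row
    updated  = [ (k, if k == col then trans v else v) | (k, v) <- assocRow ]

-- Fold over the data rows, counting them and collecting the updated rows.
-- The accumulator list is built in reverse and flipped at the end; the
-- count is forced at each step so no thunks pile up.
processRows :: Row -> (Field -> Field) -> Field -> [Row] -> (Int, [Row])
processRows header trans col rows = (n, reverse out)
  where
    (n, out) = foldl' step (0, []) rows
    step (count, acc) row =
      let count' = count + 1
      in count' `seq` (count', updateRow header trans col row : acc)
```

Note that zip truncates to the shorter list, so a row with more fields than the header would silently lose its extra fields in this sketch.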
Everything works as expected now, although more code had to be written and it took more time 🙁