I’m currently working with CSV files, which I parse into a [[String]].
The first [String] in that list is the header row, e.g.:
["Code","Address","Town"]
and the rest are rows of data, e.g.:
["ABA","12,east road", "London"]
I would like to create a query system where the input and the result look something like this:
>count "Town"="*London*" @1="A*"
14 rows
The column name can be given either as a string or as an @ followed by the index of the column.
I have a case expression that recognises the first word of the input, since I’m going to expand my CSV reader with different functions.
When it sees the word count, it goes to a function that returns a count of rows. I’m not sure how to start parsing the query.
At first I thought I might split the string that follows the word count into a list of strings, one per condition, apply the first condition, and then check the rows that satisfied it against the next one. That would leave a list of rows for which all conditions hold, which I could then count and return. There would also be a case expression to recognise whether a condition starts with a string or with an @ symbol.
The * represents zero or more arbitrary characters.
I am not sure how to start implementing this, or whether I’m missing a problem I might run into with my solution. I would be grateful for any help getting started. I’m not very advanced with Haskell (I’m just starting), so I would also appreciate keeping it simple. Thank you.
Here’s one possible approach.
First, let us move away from your list-of-lists-of-strings representation a bit and represent records as key/value pairs, such that a database is just a list of records:
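A minimal sketch of such a representation. The names `Field`, `Value`, `Record` and `Db` are illustrative assumptions, not fixed by anything above:

```haskell
type Field  = String            -- a column name, e.g. "Town"
type Value  = String            -- the contents of one cell
type Record = [(Field, Value)]  -- one row, each value tagged with its column name
type Db     = [Record]          -- the whole table, without the header row
```

Keeping the pairs in column order means a field can later be found either by name (with `lookup`) or by position.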
Reading in CSV data in your representation then becomes:
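One way this conversion might look, assuming the first row is the header. The name `fromCsv` is an assumption, and the type synonyms are repeated so the block compiles on its own:

```haskell
type Field  = String
type Value  = String
type Record = [(Field, Value)]
type Db     = [Record]

-- Pair every data row with the header row.
-- An empty input (no header at all) gives an empty database.
fromCsv :: [[String]] -> Db
fromCsv []              = []
fromCsv (header : rows) = map (zip header) rows
```

Note that `zip` silently drops trailing cells when a row is longer than the header (and trailing header names when it is shorter), so malformed rows are truncated rather than rejected in this sketch.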
Now, let us talk about queries. In your setting, a query is essentially a list of filters, where each filter identifies a field and matches a set of values:
Fields are selected either by name or by a one-based (!) index:
Values are matched by applying a sequence of simple parsers, where each parser recognises either a single character or a sequence of zero or more arbitrary characters:
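The query, selector and matcher types described in the last three paragraphs could be sketched as follows. All constructor names, and the helper `fromPattern`, are hypothetical:

```haskell
type Field = String

type Query  = [Filter]                 -- all filters must hold
type Filter = (Selector, ValueFilter)  -- which field, and what it must look like

data Selector
  = ByName Field   -- "Town"
  | ByIndex Int    -- @1 (one-based)
  deriving (Show, Eq)

type ValueFilter = [Parser]

data Parser
  = Lit Char   -- exactly this character
  | Wildcard   -- zero or more arbitrary characters, written * in a query
  deriving (Show, Eq)

-- Hypothetical helper: turn a pattern such as "A*" into [Lit 'A', Wildcard].
fromPattern :: String -> ValueFilter
fromPattern = map (\c -> if c == '*' then Wildcard else Lit c)

-- The query from the question: "Town"="*London*" @1="A*"
exampleQuery :: Query
exampleQuery =
  [ (ByName "Town", fromPattern "*London*")
  , (ByIndex 1,     fromPattern "A*")
  ]
```

With this encoding there is no way to match a literal `*`; that would need an escape syntax, which is left out here.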
Parsing can be implemented using the list-of-successes method, where each success denotes the remaining input, i.e., the part of the input that was not consumed by the parser. An empty list of remaining inputs denotes failure. (So, note the difference between [] and [[]] in the produced results in the cases below.)
Filtering values then comes down to backtracking:
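A sketch of both steps, assuming the `Lit`/`Wildcard` encoding of parsers (hypothetical names, redefined here so the block stands alone). `parse` returns every possible remaining input, and `filterValue` threads all of those alternatives through the parsers in turn:

```haskell
import Data.List (tails)

type Value       = String
type ValueFilter = [Parser]
data Parser      = Lit Char | Wildcard

-- Each element of the result is one possible remaining input.
parse :: Parser -> String -> [String]
parse (Lit c) (x : xs) | c == x = [xs]      -- consumed one matching character
parse (Lit _) _                 = []        -- mismatch or end of input: failure
parse Wildcard xs               = tails xs  -- consumed 0, 1, 2, ... characters

-- A value matches when at least one alternative consumes the whole input,
-- i.e. leaves the empty string as remaining input.
filterValue :: ValueFilter -> Value -> Bool
filterValue ps v = any null (foldl step [v] ps)
  where
    step remaining p = concatMap (parse p) remaining
```

For example, `parse (Lit 'a') "xyz"` gives `[]` (failure), whereas `parse Wildcard ""` gives `[""]`, that is `[[]]` (one success with nothing left over). Keeping all alternatives in a list is what provides the backtracking: if one way of splitting the input around a wildcard fails later on, the other ways are still in the list.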
Value selection is straightforward:
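A possible sketch, assuming the `ByName`/`ByIndex` selectors (hypothetical names). Returning `Maybe Value` is a design choice made here so that an unknown column name or an out-of-range index does not crash the program:

```haskell
import Data.Maybe (listToMaybe)

type Field    = String
type Value    = String
type Record   = [(Field, Value)]
data Selector = ByName Field | ByIndex Int

select :: Selector -> Record -> Maybe Value
select (ByName f)  r = lookup f r
select (ByIndex n) r
  | n >= 1    = fmap snd (listToMaybe (drop (n - 1) r))  -- one-based index
  | otherwise = Nothing
```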
Applying a record filter now amounts to constructing a predicate over records:
Finally, for executing a complete query, we have
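The last two steps could be sketched like this. `apply` and `exec` are assumed names; the definitions from the earlier steps are repeated in compact form only so that the block compiles on its own:

```haskell
import Data.List  (tails)
import Data.Maybe (listToMaybe)

-- Definitions from the earlier steps (all names are assumptions).
type Field       = String
type Value       = String
type Record      = [(Field, Value)]
type Db          = [Record]
data Selector    = ByName Field | ByIndex Int
data Parser      = Lit Char | Wildcard
type ValueFilter = [Parser]
type Filter      = (Selector, ValueFilter)
type Query       = [Filter]

parse :: Parser -> String -> [String]
parse (Lit c) (x : xs) | c == x = [xs]
parse (Lit _) _                 = []
parse Wildcard xs               = tails xs

filterValue :: ValueFilter -> Value -> Bool
filterValue ps v = any null (foldl (\rest p -> concatMap (parse p) rest) [v] ps)

select :: Selector -> Record -> Maybe Value
select (ByName f)  r = lookup f r
select (ByIndex n) r
  | n >= 1    = fmap snd (listToMaybe (drop (n - 1) r))
  | otherwise = Nothing

-- A filter holds for a record when the selected field exists
-- and its value matches the pattern.
apply :: Filter -> Record -> Bool
apply (sel, vf) r = maybe False (filterValue vf) (select sel r)

-- A record is kept when every filter of the query holds for it.
exec :: Query -> Db -> [Record]
exec q = filter (\r -> all (\f -> apply f r) q)
```

Checking all filters per record gives the same result as filtering the list once per condition, as proposed in the question, and an empty query keeps every record.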
(I leave the parsing of queries themselves as an exercise, but I recommend using a parser-combinator library such as parsec or uulib.)
Now, let’s test. First, we introduce a small database in CSV format:
Then, we construct a simple query:
And, indeed, running our query against the database yields:
Or, if you are only after counting the results of your query:
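The whole test can be sketched as one self-contained module. The first data row is the one from the question; the other rows, and all function and constructor names, are made up for illustration:

```haskell
import Data.List  (tails)
import Data.Maybe (listToMaybe)

type Field       = String
type Value       = String
type Record      = [(Field, Value)]
type Db          = [Record]
data Selector    = ByName Field | ByIndex Int
data Parser      = Lit Char | Wildcard
type ValueFilter = [Parser]
type Query       = [(Selector, ValueFilter)]

fromCsv :: [[String]] -> Db
fromCsv []              = []
fromCsv (header : rows) = map (zip header) rows

fromPattern :: String -> ValueFilter
fromPattern = map (\c -> if c == '*' then Wildcard else Lit c)

parse :: Parser -> String -> [String]
parse (Lit c) (x : xs) | c == x = [xs]
parse (Lit _) _                 = []
parse Wildcard xs               = tails xs

filterValue :: ValueFilter -> Value -> Bool
filterValue ps v = any null (foldl (\rest p -> concatMap (parse p) rest) [v] ps)

select :: Selector -> Record -> Maybe Value
select (ByName f)  r = lookup f r
select (ByIndex n) r
  | n >= 1    = fmap snd (listToMaybe (drop (n - 1) r))
  | otherwise = Nothing

exec :: Query -> Db -> [Record]
exec q = filter (\r -> all (\(sel, vf) -> maybe False (filterValue vf) (select sel r)) q)

-- A small database in the [[String]] form; rows 2-4 are invented sample data.
csv :: [[String]]
csv =
  [ ["Code", "Address",       "Town"]
  , ["ABA",  "12,east road",  "London"]
  , ["ABB",  "3 High Street", "East London"]
  , ["BCA",  "7 Mill Lane",   "London"]
  , ["ACD",  "1 Park Row",    "Leeds"]
  ]

-- The query from the question: "Town"="*London*" @1="A*"
query :: Query
query = [(ByName "Town", fromPattern "*London*"), (ByIndex 1, fromPattern "A*")]

-- The matching records: the rows with codes ABA and ABB.
matches :: [Record]
matches = exec query (fromCsv csv)

-- What a count command could report.
summary :: String
summary = show (length matches) ++ " rows"   -- "2 rows" for the sample data
```

Loading this in GHCi and evaluating `mapM_ print matches` lists the two matching records, and `putStrLn summary` prints `2 rows`.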
Of course, this is just one approach, and one can certainly think of many alternatives. A nice aspect of breaking the problem down into small functions, as we have done above, is that you can easily test and experiment with small chunks of the solution in isolation.