I have data in LISP form and I need to process it in RapidMiner. I am new to LISP and to RapidMiner as well. RapidMiner doesn’t accept LISP input (I guess because it is a programming language), so I probably need to convert the LISP form to CSV or something similar. A small example of the data:
(def-instance Adelphi
(state newyork)
(control private)
(no-of-students thous:5-10)
...)
(def-instance Arizona-State
(state arizona)
(control state)
(no-of-students thous:20+)
...)
(def-instance Boston-College
(state massachusetts)
(location suburban)
(control private:roman-catholic)
(no-of-students thous:5-10)
...)
I would be really grateful for any advice.
You can make use of the fact that Lisp’s parser is available to the Lisp user. One problem with this data is that some values contain colons, and the colon is the package name separator in Common Lisp. I wrote some working Common Lisp code to solve your question, but I had to work around that problem by defining appropriate packages.
Here’s the code, which of course has to be extended (following the same patterns already used in it) to cover everything you left out of the example in your question:
The main function is data-file-to-csv, which can be called with
(data-file-to-csv "path-to-input-file" "path-to-output-file")
in a Common Lisp REPL after loading this code.
EDIT: some additional thoughts
It would actually be easier, instead of adding package definitions for all values with colons, to do a regular expression search and replace over the data that puts double quotes (") around all the values. That makes Lisp read them as strings right away. In that case, the line
for string = (remove #\| (write-to-string value :case :downcase))
could be removed, and string replaced with value in all the lines of the case statement.
Because of the high regularity of the data, it shouldn’t even be necessary to parse the Lisp definitions properly at all. Instead, you could just extract the data with regular expressions. A language particularly suited to regex-based transformation of text files, like AWK or Perl, should be just fine for that job.
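To illustrate the regex-only route, here is a minimal sketch in Python (chosen just for illustration; AWK or Perl would do equally well). It assumes every record starts with (def-instance NAME, every attribute is a flat (key value) pair, and no key repeats within a record. The names parse_instances and to_csv and the "instance" column are made up for this example and do not come from the code above.

```python
import csv
import io
import re

# Start of a record, e.g. "(def-instance Adelphi"
INSTANCE_RE = re.compile(r"\(def-instance\s+([^\s()]+)")
# One flat attribute, e.g. "(state newyork)" or "(no-of-students thous:5-10)"
ATTRIBUTE_RE = re.compile(r"\(([^\s()]+)\s+([^()]+?)\s*\)")


def parse_instances(text):
    """Return one dict per def-instance form (hypothetical helper)."""
    starts = [m.start() for m in INSTANCE_RE.finditer(text)]
    records = []
    # Each chunk runs from one "(def-instance" to the next (or to the end)
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        chunk = text[begin:end]
        header = INSTANCE_RE.match(chunk)
        record = {"instance": header.group(1)}
        # Assumption: keys are unique per record; a repeated key keeps the last value
        for key, value in ATTRIBUTE_RE.findall(chunk[header.end():]):
            record[key] = value
        records.append(record)
    return records


def to_csv(records):
    """Render the records as CSV text; missing attributes become empty cells."""
    columns = ["instance"]
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)  # keep first-seen column order
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


SAMPLE = """\
(def-instance Adelphi
(state newyork)
(control private)
(no-of-students thous:5-10)
...)
(def-instance Boston-College
(state massachusetts)
(location suburban)
(control private:roman-catholic)
(no-of-students thous:5-10)
...)
"""

if __name__ == "__main__":
    print(to_csv(parse_instances(SAMPLE)))
```

Because the colon is just an ordinary character to the regex, values like thous:5-10 or private:roman-catholic need no special handling here. Attributes that are missing from a record (location for Adelphi in the sample) simply end up as empty cells, which RapidMiner's CSV import should treat as missing values.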