I am currently doing a small data structures project, and I am trying to get data on universities across the country; and then do some data manipulation with them. I have found this data here: http://archive.ics.uci.edu/ml/machine-learning-databases/university/university.data
BUT, the problem with this data is (and I quote from the website): “It is a LISP readable file with a few relevant functions at the end of the data file.” I plan on taking this data and saving it as a .txt file.
The file looks a bit like:
(def-instance Adelphi
(state newyork)
(control private)
(no-of-students thous:5-10)
(male:female ratio:30:70)
(student:faculty ratio:15:1)
(sat verbal 500)
(sat math 475)
(expenses thous$:7-10)
(percent-financial-aid 60)
(no-applicants thous:4-7)
(percent-admittance 70)
(percent-enrolled 40)
(academics scale:1-5 2)
(social scale:1-5 2)
(quality-of-life scale:1-5 2)
(academic-emphasis business-administration)
(academic-emphasis biology))
(def-instance Arizona-State
(state arizona)
(control state)
(no-of-students thous:20+)
(male:female ratio:50:50)
(student:faculty ratio:20:1)
(sat verbal 450)
(sat math 500)
(expenses thous$:4-7)
(percent-financial-aid 50)
(no-applicants thous:17+)
(percent-admittance 80)
(percent-enrolled 60)
(academics scale:1-5 3)
(social scale:1-5 4)
(quality-of-life scale:1-5 5)
(academic-emphasis business-education)
(academic-emphasis engineering)
(academic-emphasis accounting)
(academic-emphasis fine-arts))
......
The End Of the File:
(dfx def-instance (l)
(tlet (instance (car l) f-list (cdr l))
(cond ((or (null instance) (consp instance))
(msg t instance " is not a valid instance name (must be an atom)"))
(t (make:event instance)
(push instance !instances)
(:= (get instance 'features)
(tfor (f in f-list)
(when (cond ((or (atom f) (null (cdr f)))
(msg t f " is not a valid feature "
"(must be a 2 or 3 item list)") nil)
((consp (car f))
(msg t (car f) " is not a valid feature "
"name (must be an atom)") nil)
((and (cddr f) (consp (cadr f)))
(msg t (cadr f) " is not a valid feature "
"role (must be an atom)") nil)
(t t)))
(save (cond ((equal (length f) 3)
(make:feature (car f) (cadr f) (caddr f)))
(t (make:feature (car f) 'value (cadr f)))))))
instance))))
(set-if !instances nil)
(dex run-uniq-colleges (l n)
(tfor (sc in l)
(when (cond ((ge (length *events-added*) n))
((not (get sc 'duplicate))
(run-instance sc)
~ (remprop sc 'features)
nil)
(t (remprop sc 'features) nil)))
(stop)))
The data I am mostly interested in is Number of students, Academic emphases, and School name.
Any help is greatly appreciated.
You can work on/use a Lisp file parser, or you can ignore the language it’s written on and focus on the data. You mentioned you need:
You can
grepthe relevant keywords (def-instance, no-of-students, academic-emphasis), which would leave you with (based on your example):Which simplifies writing a specific parser (def-instance is followed by the name, then all academic-emphasis and no-of-students before the next def-instance refer to the previously defined name)