What is the best way to organize scraped data into a CSV? More specifically, each item is in this form:
url
"firstName middleInitial, lastName - level - word1 word2 word3, & wordN practice officeCity."
JD, schoolName, date
Example:
http://www.examplefirm.com/jang
"Joe E. Ang - partner - privatization mergers, media & technology practice New York."
JD, University of Chicago Law School, 1985
I want to put this item in this form:
(http://www.examplefirm.com/jang, Joe, E., Ang, partner, privatization mergers, media & technology, New York, University of Chicago Law School, 1985)
so that I can write it into a CSV file to import into a Django db.
What would be the best way of doing this?
Thank you.
There’s really no shortcut here. Line 1 is easy: just assign it to url. Line 3 can probably be split on commas without any ill effects, but line 2 will have to be parsed by hand. What do you know about word1 through wordN? Are you sure “practice” will never be one of the words? Are you sure each word is only one word long? Can they be quoted? Can they contain dashes?
Then I would parse out the beginning and end bits so you’re left with just the list of words, and split that on commas and/or & (is there a consistent comma before the &? Your format says yes, but your example says no). If the number of words varies, you don’t want to inline them in your tuple like that, because you won’t know how to get them back out. Create a list from your words and add that as one element of the tuple.
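For what it’s worth, here is a rough Python sketch of those steps, not a definitive parser. The function name parse_item is made up for illustration, and the sketch assumes " - " separates name, level and the rest, that the last standalone “practice” comes right before the city, that the name is always first/middle/last, and that the words never contain commas or &. If your real data breaks any of those assumptions, it will need adjusting.

```python
import re

def parse_item(url, description, education):
    """Parse one scraped three-line item into a tuple (hypothetical helper)."""
    # Drop surrounding whitespace, the quotes, and the trailing period.
    text = description.strip().strip('"').rstrip(".")
    # Assumption: " - " separates name, level, and everything else.
    name, level, rest = [part.strip() for part in text.split(" - ", 2)]
    # Assumption: the last standalone "practice" sits right before the city.
    areas_text, marker, city = rest.rpartition(" practice ")
    if not marker:
        raise ValueError("no 'practice' marker in %r" % description)
    # Split on commas and/or ampersands; multi-word entries stay together.
    areas = [a.strip() for a in re.split(r"[,&]", areas_text) if a.strip()]
    # Assumption: exactly first name, middle initial, last name
    # (an optional comma after the initial is tolerated).
    first, middle, last = name.replace(",", " ").split()
    # Line 3: "degree, school, year". Splitting from both ends lets the
    # school name contain commas. The degree is dropped, as in the
    # desired output.
    degree, school_and_year = education.split(",", 1)
    school, year = school_and_year.rsplit(",", 1)
    # The variable-length areas go in as ONE list element.
    return (url, first, middle, last, level, areas,
            city.strip(), school.strip(), year.strip())

record = parse_item(
    "http://www.examplefirm.com/jang",
    '"Joe E. Ang - partner - privatization mergers, media & technology practice New York."',
    "JD, University of Chicago Law School, 1985",
)
```

Any item that doesn’t fit the pattern raises ValueError, either from the explicit check or from the tuple unpacking, so odd records show up immediately instead of being silently mangled.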
More specifically? You’re on your own there.
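As for the CSV step itself, Python’s standard csv module does the quoting for you. Here is a minimal sketch, assuming a record shaped like the tuple described above with the words held in one list element. The to_row helper and the "; " separator are my own choices for illustration; pick any separator that cannot occur inside a word.

```python
import csv
import io

# Hypothetical record: the variable-length words are one list element,
# not inlined as separate fields.
record = (
    "http://www.examplefirm.com/jang", "Joe", "E.", "Ang", "partner",
    ["privatization mergers", "media", "technology"],
    "New York", "University of Chicago Law School", "1985",
)

def to_row(rec, sep="; "):
    """Flatten a record into a CSV row, joining any list field into one cell."""
    return [sep.join(field) if isinstance(field, list) else field
            for field in rec]

# An in-memory buffer keeps the sketch self-contained; for a real file use
# open("some_name.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(to_row(record))
csv_text = buffer.getvalue()
```

Joining the list into a single cell keeps the column count fixed, so every row has the same nine fields no matter how many words an item has; on import you split that one cell on the same separator.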