I’m relatively new to Python and extremely new to MongoDB (so here I’ll only be concerned with taking the text files and converting them). I’m currently trying to take a bunch of .txt files that contain JSON and move them into MongoDB. My approach is to open each file in the directory, read each line, convert it from JSON to a dictionary, and then overwrite the original JSON line with that dictionary. Then it’ll be in a format I can send to MongoDB.
(If there’s any flaw in my reasoning, please point it out.)
At the moment, I’ve written this:
"""
Kalil's step by step iteration / write.
JSON dumps takes a python object and serializes it to JSON.
Loads takes a JSON string and turns it into a python dictionary.
So we return json.loads so that we can take that JSON string from the tweet and save it as a dictionary for Pymongo
"""
import os
import json
import pymongo

rootdir='~/Tweets'

def convert(line):
    line = file.readline()
    d = json.loads(lines)
    return d

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        f=open(file, 'r')
        lines = f.readlines()
        f.close()
        f=open(file, 'w')
        for line in lines:
            newline = convert(line)
            f.write(newline)
        f.close()
But it isn’t writing anything.
As a rule of thumb, if you’re not getting the effect you want, you’re making a mistake somewhere.
Does anyone have any suggestions?
When you decode a JSON file you don’t need to convert it line by line, because the parser will read through the whole file for you (unless you have one JSON document per line, in which case each line has to be parsed on its own).
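To make the two cases concrete, here is a minimal sketch using only the standard json module. The function names load_single_document and load_one_document_per_line are made up for illustration, and I’m assuming the files are UTF-8 text; which of the two fits depends on how your tweet files are actually laid out.

```python
import json


def load_single_document(path):
    """Parse a file that holds exactly one JSON document."""
    with open(path, encoding="utf-8") as f:
        # json.load reads and parses the whole file in one call.
        return json.load(f)


def load_one_document_per_line(path):
    """Parse a file that holds one JSON document per line."""
    with open(path, encoding="utf-8") as f:
        # json.loads parses a single string; blank lines are skipped.
        return [json.loads(line) for line in f if line.strip()]
```

The first function returns whatever the document decodes to (usually a dictionary); the second returns a list with one decoded object per non-empty line.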
Once you’ve loaded the JSON document you’ll have a dictionary, which is an in-memory data structure and cannot be written back to a file directly without first serializing it into some format such as JSON or YAML (the format MongoDB uses is called BSON, but your driver will handle the encoding for you).
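This is also why the f.write(newline) call in the question cannot work as written: in Python 3, write() on a text file accepts only a string and raises a TypeError if you hand it a dictionary. If you ever do need to put a dictionary back into a file, a rough sketch looks like the following; write_as_json_line is a hypothetical helper name, not something from the json module.

```python
import io
import json


def write_as_json_line(f, document):
    """Serialize a dictionary to JSON text and write it as one line."""
    # json.dumps turns the dictionary back into a string that write() accepts.
    f.write(json.dumps(document) + "\n")


# Usage with an in-memory text buffer standing in for a real file.
buffer = io.StringIO()
write_as_json_line(buffer, {"id": 1, "text": "hello"})
```

For the MongoDB case you normally don’t need this step at all, since the dictionary itself is what you pass to the driver.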
The overall process to load a JSON file and dump it into MongoDB is actually pretty simple and looks something like this: