I am saving my data into a dictionary. After saving it, I printed the data to see what it looks like, and I see the `u''` prefixes and escape codes such as `\xe5`:
```
(u'520775', [[u'Kategori:2. divisjon fotball for herrer 2008']])
(u'754686', [[u'Kategori:Debutalbum', u'Kategori:Musikkalbum fra 1990', u'Kategori:Tre Sm\xe5 Kinesere-album']])
(u'381191', [[u'Kategori:Serierundene i Adeccoligaen 2007']])
(u'972597', [[u'Kategori:Tippeligaen 2011']])
(u'263001', [[u'Kategori:Musikkalbum fra 2003']])
(u'23037', [[u'Kategori:Luftforsvaret']])
(u'640060', [[u'Kategori:Deltagermedaljen', u'Kategori:F\xf8dsler i 1923', u'Kategori:Norske folkemusikere', u'Kategori:Norske trekkspillere', u'Kategori:Paul Harris Fellow', u'Kategori:Personer fra Vefsn kommune']])
```
I have the following code. I tried using `format()`, but it didn't really work. What also confuses me is that when I print the id before saving it in the dictionary, I see it without the `u` prefix.
Here is the relevant segment of the code:
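```python
idTitleDictionary = {}
for (pageId, pageData) in data['query']['pages'].iteritems():
    categoryTitles = []
    idTitleDictionary[pageId] = []
    print pageId
    try:
        for category in pageData['categories']:
            categoryTitles.append(category['title'])
        idTitleDictionary[format(pageId)].append(categoryTitles)
    except KeyError:
        # The except clause was not shown in the original segment;
        # KeyError (a page with no 'categories') is assumed here.
        pass
```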
I am trying to figure out how to encode it before saving it into the dictionary.
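For reference, here is a minimal sketch of the same loop in Python 3 (the question's code is Python 2). The `data` dict below is hypothetical sample input shaped like the `data['query']['pages']` structure in the snippet; its page ids and titles are borrowed from the printed output only for illustration. The sketch stores a flat list of titles per page id instead of the nested list in the question, uses `.get()` so pages without categories get an empty list, and applies no encoding step to the titles.

```python
# Hypothetical sample input shaped like data['query']['pages'] in the question.
data = {
    "query": {
        "pages": {
            "23037": {"categories": [{"title": "Kategori:Luftforsvaret"}]},
            "640060": {"categories": [
                {"title": "Kategori:F\u00f8dsler i 1923"},
                {"title": "Kategori:Norske trekkspillere"},
            ]},
            "999999": {},  # hypothetical page with no 'categories' key
        }
    }
}

idTitleDictionary = {}
for pageId, pageData in data["query"]["pages"].items():
    # .get() returns [] for pages without categories, so no try/except is needed;
    # the titles are stored as-is, with no encoding step.
    idTitleDictionary[pageId] = [c["title"] for c in pageData.get("categories", [])]
```

Storing a flat list is a design choice for easier lookup; it is not what changes how the strings are displayed.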
When you print a `dict`, `list`, or `tuple`, `repr` is called on the items in the container, rather than `str` as when you print them directly, so you see the Unicode escape codes. If you were to print the strings themselves, you'd see them encoded properly for your terminal. You don't need to do anything to those strings to interpret the escape codes: everything is stored properly, it's just how it's being displayed.
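A small sketch of the `repr`-versus-`str` difference, assuming Python 3. Python 2's `repr()` escaped non-ASCII characters as `\xNN`, while Python 3's `repr()` keeps printable characters, so the built-in `ascii()` is used here to reproduce the escaped form seen in the question. The variable names are illustrative only.

```python
# Same text as u'Kategori:F\xf8dsler i 1923' in the question.
title = "Kategori:F\u00f8dsler i 1923"
titles = [title]

# Formatting a container calls repr() on each item, not str().
container_text = str(titles)
assert container_text == "[" + repr(title) + "]"

# Printing the string on its own uses str(), which is just the text itself.
shown = str(title)

# ascii() gives the escaped, Python 2 style display form...
escaped = ascii(title)

# ...but the stored value is a single character U+00F8 at that position.
stored_char = title[10]
```

The escape sequence exists only in the displayed representation; the string itself holds the real character.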