I’m writing a program which fetches and edits articles on Wikipedia, and I’m having

Question

Asked: June 17, 2026

I’m writing a program that fetches and edits articles on Wikipedia, and I’m having a bit of trouble handling Unicode characters prefixed with \u. I’ve tried .encode("utf8"), but it doesn’t seem to do the trick. How can I properly encode these \u-prefixed values so I can POST them to Wikipedia? See this edit for my problem.
Here is some code:
To get the page:

url = "http://en.wikipedia.org/w/api.php?action=query&format=json&titles="+urllib.quote(name)+"&prop=revisions&rvprop=content"
articleContent = ClientCookie.urlopen(url).read().split('"*":"')[1].split('"}')[0].replace("\\n", "\n").decode("utf-8")

Before I POST the page:

data = dict([(key, value.encode('utf8')) for key, value in data.iteritems()])
data["text"] = data["text"].replace("\\", "")
editInfo = urllib2.Request("http://en.wikipedia.org/w/api.php", urllib.urlencode(data))

1 Answer

Editorial Team · Answer 1 · 2026-06-17T05:33:23+00:00

You are downloading JSON data without decoding it. Use the json library for that:

import json

# urlopen() returns a file-like response; json.load() reads and decodes it
response = ClientCookie.urlopen(url)
data = json.load(response)

JSON-encoded data looks a lot like Python and uses \u escapes as well, but it is in fact a subset of JavaScript, so it needs to be decoded with a JSON parser rather than by splitting and replacing strings.
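
To illustrate the difference, here is a minimal sketch using a made-up JSON fragment (the `raw` string is an assumption for demonstration, not real API output). It compares string splitting, which leaves the `\u2013` escape sequence as literal text, with `json.loads`, which turns it into the actual character:

```python
import json

# Hypothetical JSON fragment, shaped like part of an API response.
# The raw text contains the six characters \u2013, not an en dash.
raw = r'{"*": "1999 \u2013 2009\nsecond line"}'

# String manipulation keeps the escape sequences as literal text.
naive = raw.split('"*": "')[1].split('"}')[0]

# A JSON parser decodes \u2013 and \n into real characters.
decoded = json.loads(raw)["*"]

print(repr(naive))
print(repr(decoded))
```

Stripping backslashes from `naive` would leave `u2013` behind in the text, which is the kind of damage the question describes.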

The data variable now holds a deeply nested data structure. Judging by the string splitting, you wanted this piece:

articleContent = data['query']['pages'].values()[0]['revisions'][0]['*']
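
Note that `.values()[0]` relies on Python 2, where `dict.values()` returns a list. A version-independent sketch of the same lookup follows, using a hand-written stand-in for the decoded response (the page ID `12345` and the revision text are invented for illustration):

```python
# Hand-written stand-in for the decoded API response; the page ID and
# revision text are made up for illustration.
data = {
    "query": {
        "pages": {
            "12345": {
                "title": "100 Bullets",
                "revisions": [{"*": u"| date = August 1999 \u2013 April 2009"}],
            }
        }
    }
}

# The page ID key is not known in advance, so take the first (only) page.
page = next(iter(data["query"]["pages"].values()))
articleContent = page["revisions"][0]["*"]
```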

Now articleContent is an actual unicode() instance; it is the revision text of the page you were looking for:

>>> print u'\n'.join(data['query']['pages'].values()[0]['revisions'][0]['*'].splitlines()[:20])
{{For|the game|100 Bullets (video game)}}
{{GOCEeffort}}
{{italic title}}
{{Supercbbox  <!--Wikipedia:WikiProject Comics-->
| title =100 Bullets
| image =100Bullets vol1.jpg
| caption = Cover to ''100 Bullets'' vol. 1 "First Shot, Last Call". Cover art by Dave Johnson.
| schedule = Monthly
| format =
|complete=y
|Crime       = y
| publisher = [[Vertigo (DC Comics)|Vertigo]]
| date = August [[1999 in comics|1999]] – April [[2009 in comics|2009]]
| issues = 100
| main_char_team = [[Agent Graves]] <br/> [[Mr. Shepherd]] <br/> The Minutemen <br/> [[List of characters in 100 Bullets#Dizzy Cordova (also known as "The Girl")|Dizzy Cordova]] <br/> [[List of characters in 100 Bullets#Loop Hughes (also known as "The Boy")|Loop Hughes]]
| writers = [[Brian Azzarello]]
| artists = [[Eduardo Risso]]<br>Dave Johnson
| pencillers =
| inkers =
| colorists = Grant Goleash<br>[[Patricia Mulvihill]]
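
For the POST side of the question, here is a minimal sketch, assuming the text was decoded with `json` as described: encode each value as UTF-8 and pass the result to `urlencode`, with no backslash stripping. The field names and values in `data` are placeholders rather than a complete set of parameters for an edit request, and the import fallback is only there so the snippet runs on either Python 2 or Python 3.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# Placeholder form fields; a real edit request needs more parameters.
data = {u"title": u"100 Bullets", u"text": u"August 1999 \u2013 April 2009"}

# Encode every value to UTF-8 bytes. No .replace("\\", "") is needed,
# because the text no longer contains literal \u escape sequences.
encoded = dict((key, value.encode("utf8")) for key, value in data.items())
body = urlencode(encoded)

print(body)
```

The en dash ends up percent-encoded as its three UTF-8 bytes (`%E2%80%93`), so the request body is plain ASCII.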
