The Archive Base

Editorial Team
Asked: May 19, 2026

I’ve been bashing my head at this for ages, I must be doing something stupid.

I am trying to retrieve all of the possible Wikipedia supported languages and output them to a text file by traversing the tables on List_of_Wikipedias.

Here is my python code so far, which is simply trying to retrieve one of the tables:

import httplib
from lxml import etree

def main():
    conn = httplib.HTTPConnection("meta.wikimedia.org")
    conn.request("GET","/wiki/List_of_Wikipedias")
    res = conn.getresponse()
    root = etree.fromstring(res.read())
    table = root.xpath('//table')
    print table

main()

On my machine this only prints an empty list. To increase speed I cached the page locally and used:

wikipage = open("wikipage.html")
root = etree.parse(wikipage)

but this has no effect whatsoever (other than the obvious speedup). I have also tried:

root.find('table')

and:

for element in root.iter():
    print("%s - %s" % (element.tag, element.text))

which successfully prints out all of the elements, so I know the tree is being created.

What am I doing wrong?

Any help would be appreciated.
Thanks.

1 Answer

    Editorial Team
    Added an answer on May 19, 2026 at 11:06 am
    I am trying to retrieve all of the possible Wikipedia supported languages and output them to a text file by traversing the tables on List_of_Wikipedias
    

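    Your problem is that the element names in the document are in a default namespace (the XHTML namespace). An unprefixed name in an XPath expression, such as table, matches only elements in no namespace, so //table selects nothing. How to write XPath expressions that involve such element names is the most frequently asked question about XPath, and it has numerous good answers under the xpath tag on SO. Just search for them.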
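A quick way to see the namespace for yourself is to look at the tags the parser reports. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml, and a tiny made-up document instead of the real page; both are assumptions for illustration. lxml reports tags in the same {namespace}name form, so the loop in the question would show the same thing.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-table XHTML document standing in for the real page.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><table><tr><td>English</td></tr></table></body></html>"
)

# Every tag comes back qualified with the namespace URI in braces.
tags = [element.tag for element in doc.iter()]
for tag in tags:
    print(tag)

# An unprefixed name is looked up in no namespace, so this finds nothing.
plain_matches = doc.findall(".//table")
print(plain_matches)
```

If the printed tags start with {http://www.w3.org/1999/xhtml}, an unprefixed search for table cannot match them.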

    Here is a complete solution:

    Use:

    (//x:table)[1]/x:tr[not(x:th)]/x:td[2]//text()
    

    where you have registered the XHTML namespace ("http://www.w3.org/1999/xhtml") bound to the prefix "x".
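As a rough illustration of what binding the prefix looks like in Python, here is a sketch that again uses the standard library's xml.etree.ElementTree and a made-up miniature table (both assumptions, not the real page). ElementTree supports only a subset of XPath, so the not(x:th) and text() parts of the expression are written as plain Python; the helper name second_column_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Bind the prefix "x" to the XHTML namespace, as the answer describes.
NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical miniature of the statistics page, not the real document.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table>
      <tr><th>No</th><th>Language</th></tr>
      <tr><td>1</td><td><a href="#en">English</a></td></tr>
      <tr><td>2</td><td><a href="#de">German</a></td></tr>
    </table>
  </body>
</html>"""


def second_column_text(root):
    """Collect the text nodes under the second cell of each non-header row."""
    first_table = root.find(".//x:table", NS)       # (//x:table)[1]
    texts = []
    for row in first_table.findall("x:tr", NS):     # /x:tr
        if row.find("x:th", NS) is not None:        # [not(x:th)]
            continue
        cells = row.findall("x:td", NS)
        if len(cells) >= 2:                         # /x:td[2]
            texts.extend(cells[1].itertext())       # //text()
    return texts


root = ET.fromstring(SAMPLE)
languages = second_column_text(root)
print(languages)
```

With lxml the full expression can be evaluated directly by passing the same mapping through the namespaces keyword, for example root.xpath(expr, namespaces=NS).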

    When I evaluated this XPath expression against the document obtained from: http://s23.org/wikistats/wikipedias_html

    I needed to add the following at the start of the document, because I was working locally and didn’t have the DTD for XHTML — maybe you will not need these:

    <!DOCTYPE html [
    <!ENTITY uarr "&#8593;">
    <!ENTITY darr "&#8595;">
    <!ENTITY ccedil "&#231;">
    <!ENTITY oslash "&#248;">
    <!ENTITY aacute "&#225;">
    <!ENTITY aring "&#229;">
    <!ENTITY agrave "&#224;">
    <!ENTITY egrave "&#232;">
    <!ENTITY ograve "&#242;">
    <!ENTITY ocirc "&#244;">
    ]>
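If you hit the same undefined-entity errors, one way to apply such declarations without editing the saved file is to prepend them in code before parsing. This is only a sketch: it uses the standard library parser, declares just two entities, and assumes the page text does not already begin with its own XML declaration or DOCTYPE (prepending would fail in that case). The fragment and the helper name parse_with_entities are made up for the example.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the entities the page uses.
# Only two are listed here for brevity; extend the list as needed.
ENTITY_DECLS = """<!DOCTYPE html [
<!ENTITY uarr "&#8593;">
<!ENTITY aring "&#229;">
]>
"""

# Hypothetical fragment standing in for the downloaded page.
page = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Bokm&aring;l &uarr;</p></body></html>"
)


def parse_with_entities(text):
    """Prepend the entity declarations, then parse the result as XML."""
    return ET.fromstring(ENTITY_DECLS + text)


# Without the declarations the parser rejects the named entities.
try:
    ET.fromstring(page)
    parsed_without_decls = True
except ET.ParseError:
    parsed_without_decls = False

root = parse_with_entities(page)
paragraph = root.find(".//{http://www.w3.org/1999/xhtml}p")
print(parsed_without_decls, paragraph.text)
```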
    

    The result of applying the above XPath expression to this document is:

                        English
    
                        German
    
                        French
    
                        Polish
    
                        Italian
    
                        Japanese
    
                        Spanish
    
                        Portuguese
    
                        Dutch
    
                        Russian
    
                        Swedish
    
                        Chinese
    
                        Catalan
    
                        Norwegian (Bokmål)
    
                        Finnish
    
                        Ukrainian
    
                        Czech
    
                        Hungarian
    
                        Romanian
    
                        Korean
    
                        Turkish
    
                        Vietnamese
    
                        Indonesian
    
                        Danish
    
                        Arabic
    
                        Esperanto
    
                        Serbian
    
                        Lithuanian
    
                        Slovak
    
                        Volapük
    
                        Persian
    
                        Hebrew
    
                        Bulgarian
    
                        Slovenian
    
                        Malay
    
                        Waray-Waray
    
                        Croatian
    
                        Estonian
    
                        Newar / Nepal Bhasa
    
                        Simple English
    
                        Hindi
    
                        Galician
    
                        Thai
    
                        Basque
    
                        Norwegian (Nynorsk)
    
                        Aromanian
    
                        Greek
    
                        Haitian
    
                        Azerbaijani
    
                        Tagalog
    
                        Latin
    
                        Telugu
    
                        Georgian
    
                        Macedonian
    
                        Cebuano
    
                        Serbo-Croatian
    
                        Breton
    
                        Piedmontese
    
                        Marathi
    
                        Latvian
    
                        Luxembourgish
    
                        Javanese
    
                        Belarusian (Taraškievica)
    
                        Welsh
    
                        Icelandic
    
                        Bosnian
    
                        Albanian
    
                        Tamil
    
                        Belarusian
    
                        Bishnupriya Manipuri
    
                        Aragonese
    
                        Occitan
    
                        Bengali
    
                        Swahili
    
                        Ido
    
                        Lombard
    
                        West Frisian
    
                        Gujarati
    
                        Afrikaans
    
                        Low Saxon
    
                        Malayalam
    
                        Quechua
    
                        Sicilian
    
                        Urdu
    
                        Kurdish
    
                        Cantonese
    
                        Sundanese
    
                        Asturian
    
                        Neapolitan
    
                        Samogitian
    
                        Armenian
    
                        Yoruba
    
                        Irish
    
                        Chuvash
    
                        Walloon
    
                        Nepali
    
                        Ripuarian
    
                        Western Panjabi
    
                        Kannada
    
                        Tajik
    
                        Tarantino
    
                        Venetian
    
                        Yiddish
    
                        Scottish Gaelic
    
                        Tatar
    
                        Min Nan
    
                        Ossetian
    
                        Uzbek
    
                        Alemannic
    
                        Kapampangan
    
                        Sakha
    
                        Egyptian Arabic
    
                        Kazakh
    
                        Maori
    
                        Limburgian
    
                        Amharic
    
                        Nahuatl
    
                        Upper Sorbian
    
                        Gilaki
    
                        Corsican
    
                        Gan
    
                        Mongolian
    
                        Scots
    
                        Interlingua
    
                        Central_Bicolano
    
                        Burmese
    
                        Faroese
    
                        Võro
    
                        Dutch Low Saxon
    
                        Sinhalese
    
                        Turkmen
    
                        West Flemish
    
                        Sanskrit
    
                        Bavarian
    
                        Malagasy
    
                        Manx
    
                        Ilokano
    
                        Divehi
    
                        Norman
    
                        Pangasinan
    
                        Banyumasan
    
                        Sorani
    
                        Romansh
    
                        Northern Sami
    
                        Zazaki
    
                        Mazandarani
    
                        Wu
    
                        Friulian
    
                        Uyghur
    
                        Ligurian
    
                        Maltese
    
                        Bihari
    
                        Novial
    
                        Tibetan
    
                        Anglo-Saxon
    
                        Kashubian
    
                        Sardinian
    
                        Classical Chinese
    
                        Fiji Hindi
    
                        Khmer
    
                        Ladino
    
                        Zamboanga Chavacano
    
                        Pali
    
                        Franco-Provençal/Arpitan
    
                        Pashto
    
                        Hakka
    
                        Cornish
    
                        Punjabi
    
                        Navajo
    
                        Silesian
    
                        Kalmyk
    
                        Pennsylvania German
    
                        Hawaiian
    
                        Saterland Frisian
    
                        Interlingue
    
                        Somali
    
                        Komi
    
                        Karachay-Balkar
    
                        Crimean Tatar
    
                        Tongan
    
                        Acehnese
    
                        Meadow Mari
    
                        Picard
    
                        Erzya
    
                        Lingala
    
                        Kinyarwanda
    
                        Extremaduran
    
                        Guarani
    
                        Kirghiz
    
                        Emilian-Romagnol
    
                        Assyrian Neo-Aramaic
    
                        Papiamentu
    
                        Aymara
    
                        Chechen
    
                        Lojban
    
                        Wolof
    
                        Banjar
    
                        Bashkir
    
                        North Frisian
    
                        Greenlandic
    
                        Tok Pisin
    
                        Udmurt
    
                        Kabyle
    
                        Tahitian
    
                        Sranan
    
                        Zealandic
    
                        Hill Mari
    
                        Komi-Permyak
    
                        Lower Sorbian
    
                        Abkhazian
    
                        Gagauz
    
                        Igbo
    
                        Oriya
    
                        Lao
    
                        Kongo
    
                        Avar
    
                        Moksha
    
                        Mirandese
    
                        Romani
    
                        Old Church Slavonic
    
                        Karakalpak
    
                        Samoan
    
                        Moldovan
    
                        Tetum
    
                        Gothic
    
                        Kashmiri
    
                        Bambara
    
                        Inupiak
    
                        Sindhi
    
                        Bislama
    
                        Lak
    
                        Nauruan
    
                        Norfolk
    
                        Inuktitut
    
                        Pontic
    
                        Assamese
    
                        Cherokee
    
                        Min Dong
    
                        Swati
    
                        Palatinate German
    
                        Hausa
    
                        Ewe
    
                        Tigrinya
    
                        Oromo
    
                        Zulu
    
                        Zhuang
    
                        Venda
    
                        Tsonga
    
                        Kirundi
    
                        Dzongkha
    
                        Sango
    
                        Cree
    
                        Chamorro
    
                        Luganda
    
                        Buginese
    
                        Buryat (Russia)
    
                        Fijian
    
                        Chichewa
    
                        Akan
    
                        Sesotho
    
                        Xhosa
    
                        Fula
    
                        Tswana
    
                        Kikuyu
    
                        Tumbuka
    
                        Shona
    
                        Twi
    
                        Cheyenne
    
                        Ndonga
    
                        Sichuan Yi
    
                        Choctaw
    
                        Marshallese
    
                        Afar
    
                        Kuanyama
    
                        Hiri Motu
    
                        Muscogee
    
                        Kanuri
    
                        Herero
    

    Do note: Every second selected node is a white-space-only text node. If you don’t want these selected, use:

    (//x:table)[1]/x:tr[not(x:th)]/x:td[2]//text()[normalize-space()]
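If you would rather filter on the Python side, the same effect can be sketched as below. The input list is a made-up stand-in for the text nodes returned by the first expression, and the function names are hypothetical. Note that the strip() call goes one step further than the XPath predicate: the predicate only drops whitespace-only nodes, while this also trims the padding around each name before writing the text file the question asks for.

```python
def drop_blank_nodes(text_nodes):
    """Keep only nodes with visible text, trimmed of surrounding whitespace."""
    return [text.strip() for text in text_nodes if text.strip()]


def write_languages(names, path):
    """Write one language name per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(names) + "\n")


# Hypothetical raw result: every second node is whitespace only.
raw_nodes = ["\n        English\n", "\n    ", "\n        German\n", "\n    "]
names = drop_blank_nodes(raw_nodes)
print(names)
```

Calling write_languages(names, "languages.txt") would then produce the file; the file name is just an example.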
    
© 2021 The Archive Base. All Rights Reserved