The Archive Base

Editorial Team
Asked: June 16, 2026

I would like to use Python to extract content formatted in MediaWiki markup following


I would like to use Python to extract content formatted in MediaWiki markup that follows a particular string. For example, the 2012 U.S. presidential election article contains fields called “nominee1” and “nominee2”. Toy example:

In [1]: markup = get_wikipedia_markup('United States presidential election, 2012')
In [2]: markup
Out[2]:
u"{{
| nominee1 = '''[[Barack Obama]]'''\n
| party1 = Democratic Party (United States)\n
| home_state1 = [[Illinois]]\n
| running_mate1 = '''[[Joe Biden]]'''\n
| nominee2 = [[Mitt Romney]]\n
| party2 = Republican Party (United States)\n
| home_state2 = [[Massachusetts]]\n
| running_mate2 = [[Paul Ryan]]\n
}}"

Using the election article above as an example, I would like to extract the information that immediately follows the “nomineeN” field but comes before the next field (demarcated by a pipe, “|”). Thus, given the example above, I would ideally like to extract “Barack Obama” and “Mitt Romney”, or at least the syntax in which they're embedded ('''[[Barack Obama]]''' and [[Mitt Romney]]). Other regexes have extracted links from the wiki markup, but my (failed) attempts at using a positive lookbehind assertion have looked something like this:

nominees = re.findall(r'(?<=\|nominee\d\=)\S+',markup)

My thinking is that it should find strings like “|nominee1=” and “|nominee2=”, with some whitespace possible between “|”, “nominee”, and “=”, and then return the content that follows, like “Barack Obama” and “Mitt Romney”.


1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 2:08 am

    Lookbehinds aren't necessary here: it's much easier to use capturing groups to specify exactly what should be extracted from the string. (In fact, a lookbehind can't work here with Python's built-in re module, which only accepts fixed-width lookbehinds; the optional spaces make the expression variable-width.)
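To make both failure modes concrete, here is a minimal sketch. It assumes a shortened stand-in for the markup in the question (the `markup` literal below is illustrative, not the full article text). The pattern from the question compiles but matches nothing, because it allows no whitespace around the field name; adding `\s*` inside the lookbehind makes the standard-library `re` module reject the pattern outright.

```python
import re

# Shortened stand-in for the markup in the question (assumption: the real
# string comes from the asker's own get_wikipedia_markup helper).
markup = (
    "{{\n"
    "| nominee1 = '''[[Barack Obama]]'''\n"
    "| party1 = Democratic Party (United States)\n"
    "| nominee2 = [[Mitt Romney]]\n"
    "| party2 = Republican Party (United States)\n"
    "}}"
)

# The attempt from the question: fixed-width, so it compiles, but it requires
# "|nominee1=" with no spaces, which never occurs in this markup.
original_attempt = re.findall(r'(?<=\|nominee\d\=)\S+', markup)

# Allowing optional whitespace makes the lookbehind variable-width,
# which the re module refuses to compile.
try:
    re.compile(r'(?<=\|\s*nominee\d\s*=)\S+')
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)

print(original_attempt)   # empty list: nothing matched
print(lookbehind_error)   # the compile error message from re
```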

    Try this regex:

    \|\s*nominee\d+\s*=\s*(?:''')?\[\[([^]]+)\]\](?:''')?
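The pattern above can be read piece by piece. The sketch below spells out the same expression with `re.VERBOSE` so that each part carries a comment; the name `NOMINEE_RE` and the sample `line` are only illustrative.

```python
import re

# Same pattern, written in verbose mode. Unescaped whitespace and "#"
# comments inside the pattern are ignored by the regex engine.
NOMINEE_RE = re.compile(
    r"""
    \|            # the pipe that starts a template field
    \s*nominee    # optional whitespace, then the field name
    \d+           # the field number
    \s*=\s*       # the equals sign, with optional whitespace around it
    (?:''')?      # optional bold markup, matched but not captured
    \[\[          # opening brackets of a wiki link
    ([^]]+)       # group 1: everything up to the first closing bracket
    \]\]          # closing brackets of the wiki link
    (?:''')?      # optional closing bold markup, not captured
    """,
    re.VERBOSE,
)

# Illustrative single field line taken from the question's example.
line = "| nominee1 = '''[[Barack Obama]]'''\n"
match = NOMINEE_RE.search(line)
print(match.group(1))
```

Because the pattern has exactly one capturing group, `re.findall` returns just the captured names rather than the whole matched field.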
    

    Results:

    re.findall(r"\|\s*nominee\d+\s*=\s*(?:''')?\[\[([^]]+)\]\](?:''')?", markup)
    # => ['Barack Obama', 'Mitt Romney']
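The question also mentions settling for the raw markup the names are embedded in. A possible two-step variant is sketched below, under a few assumptions: the `markup` literal reproduces the toy example, each field sits on its own line, and the helper name `link_text` is hypothetical. The second step additionally handles piped links of the form `[[Target|Label]]`, which the one-step pattern would return as `Target|Label`.

```python
import re

# Toy markup from the question, written as a plain string literal.
markup = (
    "{{\n"
    "| nominee1 = '''[[Barack Obama]]'''\n"
    "| party1 = Democratic Party (United States)\n"
    "| home_state1 = [[Illinois]]\n"
    "| running_mate1 = '''[[Joe Biden]]'''\n"
    "| nominee2 = [[Mitt Romney]]\n"
    "| party2 = Republican Party (United States)\n"
    "| home_state2 = [[Massachusetts]]\n"
    "| running_mate2 = [[Paul Ryan]]\n"
    "}}"
)

# Step 1: raw value of each nomineeN field, from "=" to the end of the line.
# [ \t]* is used instead of \s* so an empty field cannot spill onto the next
# line. Assumes one field per line, as in the example.
raw_values = [
    value.strip()
    for value in re.findall(r"\|[ \t]*nominee\d+[ \t]*=[ \t]*(.*)", markup)
]


def link_text(value):
    """Return the display text of the first wiki link in value, or None."""
    found = re.search(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", value)
    if found is None:
        return None
    # For a piped link [[Target|Label]] prefer the label, else the target.
    return found.group(2) or found.group(1)


# Step 2: reduce each raw value to the link's display text.
names = [link_text(value) for value in raw_values]

print(raw_values)
print(names)
```

Splitting the work this way keeps the field-matching pattern short and leaves the link handling in one place if other link forms turn up in real articles.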
    