The Archive Base Latest Questions

Editorial Team
Asked: May 12, 2026


I have some data that looks like this:

PMID- 19587274
OWN - NLM
DP  - 2009 Jul 8
TI  - Domain general mechanisms of perceptual decision making in human cortex.
PG  - 8675-87
AB  - To successfully interact with objects in the environment, sensory evidence must
      be continuously acquired, interpreted, and used to guide appropriate motor
      responses. For example, when driving, a red 
AD  - Perception and Cognition Laboratory, Department of Psychology, University of
      California, San Diego, La Jolla, California 92093, USA.

PMID- 19583148
OWN - NLM
DP  - 2009 Jun
TI  - Ursodeoxycholic acid for treatment of cholestasis in patients with hepatic
      amyloidosis.
PG  - 482-6
AB  - BACKGROUND: Amyloidosis represents a group of different diseases characterized by
      extracellular accumulation of pathologic fibrillar proteins in various tissues
AD  - Asklepios Hospital, Department of Medicine, Langen, Germany.
      innere2.longen@asklepios.com

I want to write a regex that matches the text following PMID, TI and AB.

Is it possible to get all of these with a single regex?

I have spent nearly the whole day trying to figure out a regex, and the closest I could get is this:

reg4 = r'PMID- (?P<pmid>[0-9]*).*TI.*- (?P<title>.*)PG.*AB.*- (?P<abstract>.*)AD'
for i in re.finditer(reg4, data, re.S | re.M): print(i.groupdict())

This returns matches only from the second “set” of data, not from all of them.

Any ideas? Thank you!

1 Answer

  1. Editorial Team
    Added an answer on May 12, 2026 at 11:18 am

    How about:

    import re
    # `data` is the text shown in the question
    reg4 = re.compile(r'^(?:PMID- (?P<pmid>[0-9]+)|TI  - (?P<title>.*?)^PG|AB  - (?P<abstract>.*?)^AD)', re.MULTILINE | re.DOTALL)
    for i in reg4.finditer(data):
        print(i.groupdict())
    

    Output:

    {'pmid': '19587274', 'title': None, 'abstract': None}
    {'pmid': None, 'title': 'Domain general mechanisms of perceptual decision making in human cortex.\n', 'abstract': None}
    {'pmid': None, 'title': None, 'abstract': 'To successfully interact with objects in the environment, sensory evidence must\n      be continuously acquired, interpreted, and used to guide appropriate motor\n      responses. For example, when driving, a red \n'}
    {'pmid': '19583148', 'title': None, 'abstract': None}
    {'pmid': None, 'title': 'Ursodeoxycholic acid for treatment of cholestasis in patients with hepatic\n      amyloidosis.\n', 'abstract': None}
    {'pmid': None, 'title': None, 'abstract': 'BACKGROUND: Amyloidosis represents a group of different diseases characterized by\n      extracellular accumulation of pathologic fibrillar proteins in various tissues\n'}
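Each match above carries only one field, so the fields of a record arrive as separate dicts. If one dict per record is wanted while keeping this alternation pattern, the matches can be folded together. The following is a minimal sketch under assumptions: the two short records in `data` are made up for illustration (they are not the PubMed entries from the question), and the names `records` and `found` are hypothetical.

```python
import re

# Made-up sample in the same tagged layout as the question's data.
data = (
    "PMID- 1\n"
    "TI  - First title\n"
    "PG  - 1-2\n"
    "AB  - First abstract\n"
    "AD  - Somewhere\n"
    "\n"
    "PMID- 2\n"
    "TI  - Second title\n"
    "PG  - 3-4\n"
    "AB  - Second abstract\n"
    "AD  - Elsewhere\n"
)

reg4 = re.compile(
    r'^(?:PMID- (?P<pmid>[0-9]+)|TI  - (?P<title>.*?)^PG|AB  - (?P<abstract>.*?)^AD)',
    re.MULTILINE | re.DOTALL)

records = []
for m in reg4.finditer(data):
    # Keep only the group that actually matched in this alternative.
    found = {k: v for k, v in m.groupdict().items() if v is not None}
    if 'pmid' in found:
        records.append({})       # a PMID line starts a new record
    if records:
        records[-1].update(found)

for record in records:
    print(record)
```

One reason to prefer this over a single all-in-one pattern is that a record with a missing TI or AB field still produces a (partial) dict instead of being skipped or merged into its neighbour.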
    

    Edit

    Here it is as a verbose RE, which makes it easier to follow (I think verbose REs should be used for anything but the simplest of expressions, but that’s just my opinion!):

    #!/usr/bin/env python3
    import re
    reg4 = re.compile(r'''
            ^                     # Start of a line (due to re.MULTILINE, this may match at the start of any line)
            (?:                   # Non-capturing group with multiple options, first option:
                PMID-\s           # Literal "PMID-" followed by a space
                (?P<pmid>[0-9]+)  # Then a string of one or more digits, group as 'pmid'
            |                     # Next option:
                TI\s{2}-\s        # "TI", two spaces, a hyphen and a space
                (?P<title>.*?)    # The title, a non-greedy match that will capture everything up to...
                ^PG               # The characters PG at the start of a line
            |                     # Next option:
                AB\s{2}-\s        # "AB", two spaces, a hyphen and a space
                (?P<abstract>.*?) # The abstract, a non-greedy match that will capture everything up to...
                ^AD               # "AD" at the start of a line
            )
            ''', re.MULTILINE | re.DOTALL | re.VERBOSE)
    for i in reg4.finditer(data):
        print(i.groupdict())
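The non-greedy `.*?` is the part that matters most here. With `re.DOTALL`, a greedy `.*` runs to the end of the whole text and then backtracks, so it latches onto the last `TI`, `PG`, `AB` and `AD` it can find, which is why the pattern in the question only picks up the second record's title and abstract. A small sketch of the difference, assuming made-up sample records (not the question's PubMed entries) and hypothetical names `greedy`, `lazy`, `greedy_hits` and `lazy_hits`:

```python
import re

# Made-up sample in the same tagged layout as the question's data.
data = (
    "PMID- 1\n"
    "TI  - First title\n"
    "PG  - 1-2\n"
    "AB  - First abstract\n"
    "AD  - Somewhere\n"
    "\n"
    "PMID- 2\n"
    "TI  - Second title\n"
    "PG  - 3-4\n"
    "AB  - Second abstract\n"
    "AD  - Elsewhere\n"
)

# The pattern from the question: every .* is greedy.
greedy = re.compile(
    r'PMID- (?P<pmid>[0-9]*).*TI.*- (?P<title>.*)PG.*AB.*- (?P<abstract>.*)AD',
    re.S | re.M)

# Same idea with non-greedy .*? and line anchors on the terminating tags.
lazy = re.compile(
    r'^PMID- (?P<pmid>[0-9]+).*?TI  - (?P<title>.*?)^PG.*?AB  - (?P<abstract>.*?)^AD',
    re.S | re.M)

greedy_hits = [m.groupdict() for m in greedy.finditer(data)]
lazy_hits = [m.groupdict() for m in lazy.finditer(data)]

print(len(greedy_hits), greedy_hits)  # one match spanning both records
print(len(lazy_hits), lazy_hits)      # one match per record
```

On this sample the greedy pattern yields a single match that pairs the first PMID with the second record's title and abstract, while the non-greedy one yields one match per record.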
    

    Note that you could replace the ^PG and ^AD with ^\S to make it more general (you want to match everything up until the first non-space at the start of a line).
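A sketch of that generalisation follows. It is a variation rather than a literal substitution: the `^\S` sits inside a lookahead `(?=...)` so the first character of the next tag is not consumed, and `\Z` is added so a field at the very end of the text still matches. Both of those choices, the made-up sample in `data`, and the names `field`, `fields`, `tag` and `value` are assumptions for illustration.

```python
import re

# Made-up sample: wrapped values, and no PG line between TI and AB.
data = (
    "PMID- 1\n"
    "TI  - A title that wraps\n"
    "      onto a second line.\n"
    "AB  - An abstract that is\n"
    "      also wrapped.\n"
    "AD  - Some address\n"
)

field = re.compile(r'''
    ^(?P<tag>PMID|TI|AB)   # one of the wanted tags at the start of a line
    \s*-\s                 # optional padding, a hyphen and a space
    (?P<value>.*?)         # the value, matched non-greedily...
    (?=^\S|\Z)             # ...up to a line starting with a non-space, or the end
    ''', re.MULTILINE | re.DOTALL | re.VERBOSE)

fields = {m.group('tag'): m.group('value') for m in field.finditer(data)}
print(fields)
```

Because the terminator no longer names a specific tag, the pattern does not depend on PG following TI or AD following AB.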

    Edit 2

    If you want to catch the whole thing in one regexp, get rid of the starting (?:, the ending ) and change the | characters to .*?:

    #!/usr/bin/env python3
    import re
    reg4 = re.compile(r'''
            ^                 # Start of a line (due to re.MULTILINE, this may match at the start of any line)
            PMID-\s           # Literal "PMID-" followed by a space
            (?P<pmid>[0-9]+)  # Then a string of one or more digits, group as 'pmid'
            .*?               # Skip ahead (non-greedily) to the title field
            TI\s{2}-\s        # "TI", two spaces, a hyphen and a space
            (?P<title>.*?)    # The title, a non-greedy match that will capture everything up to...
            ^PG               # The characters PG at the start of a line
            .*?               # Skip ahead (non-greedily) to the abstract field
            AB\s{2}-\s        # "AB", two spaces, a hyphen and a space
            (?P<abstract>.*?) # The abstract, a non-greedy match that will capture everything up to...
            ^AD               # "AD" at the start of a line
            ''', re.MULTILINE | re.DOTALL | re.VERBOSE)
    for i in reg4.finditer(data):
        print(i.groupdict())
    

    This gives:

    {'pmid': '19587274', 'title': 'Domain general mechanisms of perceptual decision making in human cortex.\n', 'abstract': 'To successfully interact with objects in the environment, sensory evidence must\n      be continuously acquired, interpreted, and used to guide appropriate motor\n      responses. For example, when driving, a red \n'}
    {'pmid': '19583148', 'title': 'Ursodeoxycholic acid for treatment of cholestasis in patients with hepatic\n      amyloidosis.\n', 'abstract': 'BACKGROUND: Amyloidosis represents a group of different diseases characterized by\n      extracellular accumulation of pathologic fibrillar proteins in various tissues\n'}
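The captured values still contain the line breaks and indentation of wrapped lines, plus a trailing newline. If plain single-line strings are preferred, a small post-processing step can collapse the whitespace. This is a minimal sketch; the helper name `tidy` is hypothetical, and the sample `record` reuses the second title and PMID from the output above.

```python
def tidy(value):
    """Collapse any run of whitespace (including wrapped lines) into one space."""
    return ' '.join(value.split())

# A record shaped like the groupdict() results above (abstract omitted for brevity).
record = {
    'pmid': '19583148',
    'title': 'Ursodeoxycholic acid for treatment of cholestasis in patients with hepatic\n      amyloidosis.\n',
}

clean = {key: tidy(value) for key, value in record.items() if value is not None}
print(clean['title'])
```

The `if value is not None` guard keeps the same helper usable with the alternation version of the pattern, where unmatched groups come back as `None`.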
    
© 2021 The Archive Base. All Rights Reserved