Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 3274204
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 17, 20262026-05-17T19:00:46+00:00 2026-05-17T19:00:46+00:00

I am a nurse and I know python but I am not an expert,

  • 0

I am a nurse and I know python but I am not an expert, just used it to process DNA sequences
We got hospital records written in human languages and I am supposed to insert these data into a database or csv file but they are more than 5000 lines and this can be so hard. All the data are written in a consistent format let me show you an example

11/11/2010 - 09:00am : He got nausea, vomiting and died 4 hours later

I should get the following data

Sex: Male
Symptoms: Nausea
    Vomiting
Death: True
Death Time: 11/11/2010 - 01:00pm

Another example

11/11/2010 - 09:00am : She got heart burn, vomiting of blood and died 1 hours later in the operation room

And I get

Sex: Female
Symptoms: Heart burn
    Vomiting of blood
Death: True
Death Time: 11/11/2010 - 10:00am

the order is not consistent by when I say in ……. so in is a keyword and all the text after is a place until i find another keyword
At the beginnning He or She determine sex, got …….. whatever follows is a group of symptoms that i should split according to the separator which can be a comma, hypen or whatever but it’s consistent for the same line
died ….. hours later also should get how many hours, sometimes the patient is stil alive and discharged ….etc
That’s to say we have a lot of conventions and I think if i can tokenize the text with keywords and patterns i can get the job done. So please if you know a useful function/modules/tutorial/tool for doing that preferably in python (if not python so a gui tool would be nice)

Some few information:

there are a lot of rules to express various medical data but here are few examples
- Start with the same date/time format followed by a space followd by a colon followed by a space followed by He/She followed space followed by rules separated by and
- Rules:
    * got <symptoms>,<symptoms>,....
    * investigations were done <investigation>,<investigation>,<investigation>,......
    * received <drug or procedure>,<drug or procedure>,.....
    * discharged <digit> (hour|hours) later
    * kept under observation
    * died <digit> (hour|hours) later
    * died <digit> (hour|hours) later in <place>
other rules do exist but they follow the same idea
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-17T19:00:47+00:00Added an answer on May 17, 2026 at 7:00 pm

    This uses dateutil to parse the date (e.g. ’11/11/2010 – 09:00am’), and parsedatetime to parse the relative time (e.g. ‘4 hours later’):

    import dateutil.parser as dparser
    import parsedatetime.parsedatetime as pdt
    import parsedatetime.parsedatetime_consts as pdc
    import time
    import datetime
    import re
    import pprint
    pdt_parser = pdt.Calendar(pdc.Constants())   
    record_time_pat=re.compile(r'^(.+)\s+:')
    sex_pat=re.compile(r'\b(he|she)\b',re.IGNORECASE)
    death_time_pat=re.compile(r'died\s+(.+hours later).*$',re.IGNORECASE)
    symptom_pat=re.compile(r'[,-]')
    
    def parse_record(astr):    
        match=record_time_pat.match(astr)
        if match:
            record_time=dparser.parse(match.group(1))
            astr,_=record_time_pat.subn('',astr,1)
        else: sys.exit('Can not find record time')
        match=sex_pat.search(astr)    
        if match:
            sex=match.group(1)
            sex='Female' if sex.lower().startswith('s') else 'Male'
            astr,_=sex_pat.subn('',astr,1)
        else: sys.exit('Can not find sex')
        match=death_time_pat.search(astr)
        if match:
            death_time,date_type=pdt_parser.parse(match.group(1),record_time)
            if date_type==2:
                death_time=datetime.datetime.fromtimestamp(
                    time.mktime(death_time))
            astr,_=death_time_pat.subn('',astr,1)
            is_dead=True
        else:
            death_time=None
            is_dead=False
        astr=astr.replace('and','')    
        symptoms=[s.strip() for s in symptom_pat.split(astr)]
        return {'Record Time': record_time,
                'Sex': sex,
                'Death Time':death_time,
                'Symptoms': symptoms,
                'Death':is_dead}
    
    
    if __name__=='__main__':
        tests=[('11/11/2010 - 09:00am : He got nausea, vomiting and died 4 hours later',
                {'Sex':'Male',
                 'Symptoms':['got nausea', 'vomiting'],
                 'Death':True,
                 'Death Time':datetime.datetime(2010, 11, 11, 13, 0),
                 'Record Time':datetime.datetime(2010, 11, 11, 9, 0)}),
               ('11/11/2010 - 09:00am : She got heart burn, vomiting of blood and died 1 hours later in the operation room',
               {'Sex':'Female',
                 'Symptoms':['got heart burn', 'vomiting of blood'],
                 'Death':True,
                 'Death Time':datetime.datetime(2010, 11, 11, 10, 0),
                 'Record Time':datetime.datetime(2010, 11, 11, 9, 0)})
               ]
    
        for record,answer in tests:
            result=parse_record(record)
            pprint.pprint(result)
            assert result==answer
            print
    

    yields:

    {'Death': True,
     'Death Time': datetime.datetime(2010, 11, 11, 13, 0),
     'Record Time': datetime.datetime(2010, 11, 11, 9, 0),
     'Sex': 'Male',
     'Symptoms': ['got nausea', 'vomiting']}
    
    {'Death': True,
     'Death Time': datetime.datetime(2010, 11, 11, 10, 0),
     'Record Time': datetime.datetime(2010, 11, 11, 9, 0),
     'Sex': 'Female',
     'Symptoms': ['got heart burn', 'vomiting of blood']}
    

    Note: Be careful parsing dates. Does ‘8/9/2010’ mean August 9th, or September 8th? Do all the record keepers use the same convention? If you choose to use dateutil (and I really think that’s the best option if the date string is not rigidly structured) be sure to read the section on “Format precedence” in the dateutil documentation so you can (hopefully) resolve ‘8/9/2010’ properly.
    If you can’t guarantee that all the record keepers use the same convention for specifying dates, then the results of this script would have be checked manually. That might be wise in any case.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Any help here would be greatly appreciated. I have this table hospital Nurse |
Any help is greatly appreciated. I have a table hospital: Nurse + Year +
Here's my code, I don't know why the checkbox isnt updated when I try
I'm having difficulty with pages that does not show the updated data immediately Here's
I am struggling with avoiding cyclic dependencies. I know I need to hide the
What I am trying to do is list if an agency nurse works in
I'm developing an application that optimally assigns shifts to nurses in a hospital. I
I have a small ncurse program I'm running, but the output doesn't seem to
I'm getting this exception Exception in thread main alice.tuprolog.InvalidTheoryException: The term part is not
i' have downloaded boost 1.45.0, installed Python 3.1.3 on my mac. I have the

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.