list1 = ['Contact: Hamdan Z Hamdan, MBBS, Msc',
'\r\n ',
'+249912468264',
'\r\n ',
'hamdanology@hotmail.com',
'\r\n ',
'Contact: Maha I Mohammed, MBBS, PhD',
'\r\n ',
'+249912230895',
'\r\n ',
'\r\n ',
'Sudan',
'Jaber abo aliz',
'\r\n ',
'Recruiting',
'\r\n ',
'Khartoum, Sudan, 1111 ',
u'Contact: Khaled H Bakheet, MD,PhD \xa0 \xa0 +249912957764 \xa0 \xa0 ',
'khalid2_3456@yahoo.com',
u' \xa0 \xa0 ',
u'Principal Investigator: Hamdan Z Hamdan, MBBS,MSc \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 ',
'Principal Investigator:',
'\r\n ',
'Hamdan Z Hamdan, MBBS, MSc',
'\r\n ',
'Al-Neelain University',
'\r\n '
]
From this list of strings, I need to extract only the 4-digit integers that are not attached to any other characters.
Example: only ‘1111’ is the desired output.
How should we write the regex in Python? Obviously, this won’t work: *([\d]{4})*.
You can use \b in a regular expression to indicate a word boundary, so the following will work for you: … which just outputs 1111. The documentation for \b explains further:
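As a rough illustration of the word-boundary approach described above, here is a minimal sketch. It is not necessarily the exact code from the answer; the helper name `find_four_digit_numbers` and the shortened `sample` list are assumptions made for this example, with the entries taken from the list in the question.

```python
import re

# \b marks a word boundary, so \b\d{4}\b matches exactly four digits
# that are not directly attached to other word characters
# (letters, digits, or underscore).
FOUR_DIGITS = re.compile(r'\b\d{4}\b')


def find_four_digit_numbers(strings):
    """Collect every standalone 4-digit number from a list of strings.

    Hypothetical helper, named for this example only.
    """
    found = []
    for s in strings:
        found.extend(FOUR_DIGITS.findall(s))
    return found


# A few entries copied from the list in the question.
sample = [
    '+249912468264',           # 12 digits in a row: no boundary after 4
    'Khartoum, Sudan, 1111 ',  # standalone 4-digit number
    'khalid2_3456@yahoo.com',  # '_' is a word character, so no boundary before 3456
    u'Contact: Khaled H Bakheet, MD,PhD \xa0 \xa0 +249912957764 \xa0 \xa0 ',
]

print(find_four_digit_numbers(sample))  # ['1111']
```

Note that the underscore counts as a word character, which is why 3456 in the e-mail address is skipped. If a number glued to an underscore should also count as standalone, the pattern would need explicit lookarounds instead of \b.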