I have text that lists course numbers, names, grades, and other information for courses taken by students. Specifically, the lines look like these:
0301 453 20071 LINEAR SYSTEMS I A 4 4 16.0
0301 481 20071 ELECTRONICS I WITH LAB A 4 4 16.0
0301 481 20084 ELECTRONICS II WITH LAB RE B 4 4 12.0
0301 713 20091 SOLID STATE PHYSICS NG 0 0 0.0
0511 454 20074 INT'L TRADE & FINANCE B 4 4 12.0
I want to write a regular expression that extracts:
LINEAR SYSTEMS I
ELECTRONICS I WITH LAB
ELECTRONICS II WITH LAB
SOLID STATE PHYSICS
INT'L TRADE & FINANCE
I wrote the following:
pattCourseName = re.compile(r'([-/&A-Z\':\s]{2,})(\s+[A-Z])')
However, this gives me:
LINEAR SYSTEMS I
ELECTRONICS I WITH LAB
ELECTRONICS II WITH LAB RE
SOLID STATE PHYSICS
INT'L TRADE & FINANCE
That is, I cannot get rid of the RE part.
Can someone please help with this? Thanks!
If the layout is fixed as you show, then forget the regular expression, and just grab the columns you want:
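A minimal sketch of that idea follows. The sample lines as pasted above have their spacing collapsed, so the real column positions cannot be read off them. The offsets `NAME_START` and `NAME_END`, the helper `course_name`, and the padded `sample` record are all hypothetical and only illustrate the slicing; you would need to measure the actual positions in your file.

```python
# Hypothetical offsets: the course name is assumed to occupy columns 15-44.
# Measure the real positions in your file before relying on these numbers.
NAME_START = 15
NAME_END = 45


def course_name(line):
    """Return the course-name column with its padding removed."""
    return line[NAME_START:NAME_END].strip()


# Build one fixed-width record for illustration; the 30-character name field
# is part of the assumption above, not something taken from the real file.
sample = "0301 481 20084 " + "ELECTRONICS II WITH LAB".ljust(30) + "RE B 4 4 12.0"

print(course_name(sample))  # ELECTRONICS II WITH LAB
```

If the RE flag really does sit in its own column after the name, slicing never sees it, so there is nothing for a pattern to accidentally swallow.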