I need to separate the keys and values in text that looks like the following:
Student ID: 0
Department ID = 18432
Name XYZ
Subjects:
Computer Architecture
Advanced Network Security 2
In the above example, Student ID, Department ID and Name are the keys, and 0, 18432 and XYZ are the values. The keys are separated from the values by :, = or multiple spaces. I tried a regex such as
$line =~ /(([\w\(\)]*\s)*)([=:\s?]?)\s*(\S.*)?$/;
$key = $2;
$colon=$3;
$value = $4;
The problem I am facing is telling apart words separated by a single space (part of the same key or value) from words separated by more than one space (a key/value boundary).
The output I get is
line is Student ID: 0
key is Student , value is ID: 0
while I want the key to be Student ID and the value to be 0. For lines like Subjects: and Computer Architecture, the whole line should be the key (Subjects and Computer Architecture respectively). Later logic handles lines that have no value or colon by appending them to the previous key, so the result looks like Subjects=Computer Architecture;Advanced Network Security 2
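The append step described above might be sketched as follows. This is only an illustration: the helper name fold_continuations and the input shape (a list of [key, value] pairs, with value undef for a line that has no separator and '' for a line like Subjects:) are assumptions, not code from my actual script.

```perl
use strict;
use warnings;

# Hypothetical helper: fold value-less lines into the previous key.
# Input: [key, value] pairs; value is undef when the line had no separator,
# and '' when the line had a separator but nothing after it ("Subjects:").
sub fold_continuations {
    my @pairs = @_;
    my (@order, %record);
    my $previous;    # most recent real key seen so far

    for my $pair (@pairs) {
        my ($key, $value) = @$pair;
        if (defined $value) {
            # A real key line: remember it and store its value.
            push @order, $key unless exists $record{$key};
            $record{$key} = $value;
            $previous = $key;
        }
        elsif (defined $previous) {
            # A bare line: append its text to the previous key, ';'-separated.
            $record{$previous} .= ';' if length $record{$previous};
            $record{$previous} .= $key;
        }
    }
    return map { "$_=$record{$_}" } @order;
}

my @out = fold_continuations(
    [ 'Student ID', '0' ],
    [ 'Subjects',   '' ],
    [ 'Computer Architecture',       undef ],
    [ 'Advanced Network Security 2', undef ],
);
print "$_\n" for @out;
```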
Update: Thanks to Ikegami for suggesting that I use a lookbehind. But I still can't get it to work.
$line=~/^(?: ( [^:=]+ ) (?<!\s\s)\s* [:=]\s*|\s*)(.*)$/x;
By (?<!\s\s)\s* [:=]\s*|\s* I mean: when there are two or more spaces, consume all the spaces; when there are no two consecutive spaces, look for : or = and consume the surrounding spaces. So if I pass the line below to the expression, shouldn't I get $1=Name and $2=ABC XYZ?
Name ABC XYZ
What I actually get is an empty key and the value Name ABC XYZ.
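One likely reason for the empty key: inside the (?: ... ) group, | has the lowest precedence, so the two alternatives are "key, optional spaces, then : or =" and a bare \s*. A line with no : or = can only match through the second alternative, which leaves $1 undefined and puts the whole line in $2. Below is a minimal sketch of one alternative, assuming the separator is either : or = with optional surrounding whitespace, or a run of two or more whitespace characters. The helper name split_key_value is hypothetical, and the sample line with several spaces after Name is an assumed input.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (key, value).
# value is undef when the line contains no separator at all.
sub split_key_value {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace

    # Key: shortest text up to the first separator.
    # Separator: ':' or '=' with optional whitespace around it,
    #            or two or more whitespace characters.
    if ($line =~ /^(.+?)(?:\s*[:=]\s*|\s{2,})(.*)$/) {
        return ($1, $2);
    }
    return ($line, undef);      # no separator: the whole line is the key
}

for my $line ('Student ID: 0', 'Department ID = 18432',
              'Name    ABC XYZ', 'Subjects:', 'Computer Architecture') {
    my ($key, $value) = split_key_value($line);
    printf "key is %s, value is %s\n",
        $key, defined $value ? $value : '(none)';
}
```

Because the key is matched non-greedily, a single space inside a key such as Student ID is kept, while the first : or =, or the first run of two or more spaces, ends the key. A value that itself contains : or = is left intact, since only the first separator is used.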
If
means
then you want
or
If
means
then you want
or
Note that you can remove all the space and line breaks. For example, the last snippet can be written as: