I’m setting up a PHP script that will have emails piped to it from a maintenance help desk. These emails are sent from a web form used by our client company, which I have no control over. The emails are standardised in format but contain a list of labelled values that is fed from the web form. I want to use regular expressions to split out this list and put the labels and values into an array, which I can feed into my own database. I have a working solution, but I’m very new to regex and I’m sure there is a better / more efficient way to do it.
An example of an email that I may receive:
Dear *MY COMPANY*,
A new job has been raised, please see details below.
If you are unable to action this job request, please notify the Maintenance Help Desk on xxx-xxxx as soon as possible.
Job Type: Man In Van
Job Code: 1462399
Due Date: 27/09/2012 07:21:10
Response Time: Man In Van
Pub Number: 234
Pub Name: pub name, location
Pub Address: 123 somewhere, some place XX1 7XX
Pub Post Code: XX1 7XX
Pub Telephone Number: xxx xxxx
Placed By: Ben
Date/time placed: 20/09/2012 07:21:10
Trade Type: Man In Van
Description: List of jobs emailed by Chris, carried out by Martin Baker. No callout on system currently, although jobs already completed, just need signing off.
For any queries, please either contact the pub directly, telephone the Maintenance Help Desk on xxx-xxxx or reply to this e-mail.
Many Thanks
*CLIENT COMPANY*
There is more boilerplate around it, and obviously the email headers and such, but you get the idea. Each email will only contain one list, and the labels will remain the same, although I would like to future-proof it so that, should they add new fields, I will not need to change my code. I want to end up with an array such as:
$job['Job Type'] = Man in van
$job['Job Code'] = 1462399
...
$job['Description'] = List of all jobs emailed ... just need signing off.
Although I can be confident that the format will not change, every form is user input and as such may be unpredictable, particularly the description, which may contain line breaks.
This is the code I am using at the moment:
// Rip out the job details from the email
preg_match_all('/job type\:.*description\:.*\s{3}F/is', $the_email, $jobs);
// For each job returned (should always be one but hey)
foreach($jobs[0] as $job_details) {
// Get the variables from the job description
preg_match_all('/(\w[^\:]*)\: ([\w\d][^\*]+)/i', $job_details, $the_vars);
}
// For each row returned, put into an array with the first group as the key and the second as the value
for ($i=0; $i<count($the_vars[0]); $i++) {
$arr[$the_vars[1][$i]] = $the_vars[2][$i];
}
It works, but it is ugly and I’m sure there is a better way. The main problem I am having is the description section, as I cannot simply search for the text following the ‘:’ sign up until a line break, as the description itself may contain line breaks.
Any advice would be much appreciated!
Still not the prettiest thing in the world but it should work just fine!
The first preg_match_all does what you said doesn’t really work: it just grabs all of the fields by whitespace, colon and newline. The second one replaces the potentially erroneous Description key that the first one filled in.
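A minimal sketch of the two-pass idea described above, not a definitive implementation. It assumes the headers have already been stripped and only the message body is passed in, that each label starts with a letter and sits on its own line, and that the description ends either at the line starting with "For any queries" or at the end of the text. The function name parse_job_email and that footer marker are assumptions made for illustration.

```php
<?php
// Hypothetical helper: the name and the footer marker are assumptions.
function parse_job_email(string $email): array
{
    // Normalise line endings so that $ in multiline mode behaves predictably.
    $email = str_replace(["\r\n", "\r"], "\n", $email);
    $job = [];

    // Pass 1: one "Label: value" pair per line. The label starts with a
    // letter and cannot contain a colon or a line break; the value runs to
    // the end of that line only.
    preg_match_all(
        '/^[ \t]*([A-Za-z][^:\n]*?):[ \t]*(.*?)[ \t]*$/m',
        $email,
        $matches,
        PREG_SET_ORDER
    );
    foreach ($matches as $match) {
        $job[trim($match[1])] = $match[2];
    }

    // Pass 2: the description may span several lines, so capture it again
    // up to the assumed footer line (or the end of the text) and overwrite
    // the single-line value stored by pass 1.
    $pattern = '/^[ \t]*Description:[ \t]*(.*?)\s*(?=^For any queries|\z)/ms';
    if (preg_match($pattern, $email, $description)) {
        $job['Description'] = $description[1];
    }

    return $job;
}

// Example with a shortened, made-up email body.
$sample = "Job Type: Man In Van\n"
    . "Job Code: 1462399\n"
    . "Description: List of jobs emailed by Chris.\n"
    . "Just need signing off.\n"
    . "\n"
    . "For any queries, please reply to this e-mail.\n";

print_r(parse_job_email($sample));
```

Because pass 1 accepts any label, new fields added to the form later are picked up without code changes. The trade-off is that a continuation line of the description that itself looks like "Something: text" would also be stored as its own key, so it may be worth checking the resulting keys against the labels you expect.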