Okay… We have a contact book in Exchange that gets exported into an XML file, which our intranet uses for our Associate Directory. “Something” happened that caused a chain of events that led to the XML getting updated.
Apparently, our SquirrelMail server uses a Perl script to transform this XML into a global.abook.
I’m not versed in Perl, but the general ideas seem easy to follow: traverse the XML and, for each person, pull “Nickname”, Full Name, Email & Title and put them into global.abook.
I’m certain the OLD XML file didn’t have the Root\XSD:Schema and Root\DataRoot layout. Uncertain as to what the best format for an update on this would be.
Perl Script:
#!/usr/bin/perl
use strict;
use XML::Parser;
use Data::Dumper;
my $url = 'http://intranet.mycompany.org/directory/directory.xml';
my $output = '/var/lib/squirrelmail/prefs/global.gabook';
my $file = "curl -sS '$url' |";
my $parser = new XML::Parser(Style => 'Tree');
my $tree = $parser->parsefile($file)->[1];
sub extract {
    my ($string, $record) = @_;
    for (my $i = 0; $i < @{$record}.''; $i++) {
        if ($record->[$i] eq $string) {
            return $record->[$i + 1][2];
        }
    }
    return undef;
}
open FILE, "> $output"
    or die "Couldn't open: $!";
for (my $i = 4; $i < @{$tree}.''; $i += 4) {
    my $record = $tree->[$i];
    my $full = &extract('DisplayName', $record);
    my $title = &extract('JobTitle', $record);
    my $email = &extract('EMailDisplayName', $record);
    next unless($email);
    my $nickname;
    # Nickname is the first part of the email address
    if ($email =~ /^(\w+)\@/) {
        $nickname = $1;
    }
    print FILE "$nickname|$full||$email|$title" . "\n";
}
close FILE
XML File:
<?xml version="1.0" standalone="yes"?>
<root xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:od="urn:schemas-microsoft-com:officedata">
  <xsd:schema>
    ...
  </xsd:schema>
  <dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" generated="2011-07-12T14:14:13">
    <ROW>
      <DisplayName>John Doe</DisplayName>
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
      <JobTitle>I.D. 10 Technologist</JobTitle>
      <Company>My Company</Company>
      <Department>Administration</Department>
      <FileAs>Doe, John</FileAs>
      <BusinessPhone>(800) 867-5309</BusinessPhone>
      <EMailAddress>jdoe@mycompany.org</EMailAddress>
      <EMailAddressType>SMTP</EMailAddressType>
      <EMailDisplayName>jdoe@mycompany.org</EMailDisplayName>
      <Initials>J.D.</Initials>
      <Private>0</Private>
    </ROW>
    <ROW>
      ...
    </ROW>
  </dataroot>
</root>
Desired Text file:
jdoe|John Doe||jdoe@mycompany.org|I.D. 10 Technologist
...
...
Is this what you were looking for?
If your XML is not terribly complex, XML::Simple is an easy solution. Also, I don’t see a big need for using curl from the shell when you could just use LWP::Simple from within Perl. You could easily modify the above to become closer in its dependencies to your original script if you like, though. My use of LWP::Simple could be replaced by your curl.

I added on-screen warnings and default behavior for the case where a particular field is empty or not present. For example, if EMailAddress is missing for a given row, you will get a couple of warnings about that, but a default empty string will be inserted into that column position for graceful recovery. If you consider such an issue serious enough, you could change the warns to die.

I’m also skipping any ROW that doesn’t have a defined FileAs tag, under the assumption that at least one tag in particular has to exist for the record to be valid. You could alter that to taste, but I would keep some form of graceful ‘move on if it’s not a valid record’ code in there just in case.
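
For reference, a minimal sketch of the approach described here might look like the following. It is a reconstruction under assumptions, not necessarily the exact code: the sub names `abook_lines` and `main` are made up for illustration, the field names (DisplayName, EMailDisplayName, JobTitle) are taken from the question's script, and the URL and output path are passed on the command line rather than hard-coded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);
use XML::Simple qw(XMLin);

# Turn the directory XML (as a string) into "nick|name||email|title" lines.
# abook_lines is a hypothetical helper name, not part of any module.
sub abook_lines {
    my ($xml) = @_;

    # ForceArray keeps ROW an array even when there is only one row;
    # KeyAttr => [] disables XML::Simple's default key folding;
    # SuppressEmpty => '' turns empty elements into empty strings.
    my $data = XMLin($xml,
        ForceArray    => ['ROW'],
        KeyAttr       => [],
        SuppressEmpty => '',
    );

    my @lines;
    for my $row (@{ $data->{dataroot}{ROW} || [] }) {
        # Skip anything that is not a usable record.
        next unless ref $row eq 'HASH';
        next unless defined $row->{FileAs} && length $row->{FileAs};

        # Warn about missing fields, then fall back to an empty string.
        my %field;
        for my $tag (qw(DisplayName EMailDisplayName JobTitle)) {
            if (defined $row->{$tag}) {
                $field{$tag} = $row->{$tag};
            }
            else {
                warn "Missing $tag for '$row->{FileAs}'\n";
                $field{$tag} = '';
            }
        }

        # Nickname is the first part of the email address, as in the question.
        my ($nick) = $field{EMailDisplayName} =~ /^(\w+)\@/;
        $nick = '' unless defined $nick;

        push @lines, join '|', $nick, $field{DisplayName}, '',
            $field{EMailDisplayName}, $field{JobTitle};
    }
    return @lines;
}

# Fetch the XML with LWP::Simple and write the address book file.
sub main {
    my ($url, $output) = @_;
    my $xml = get($url);
    die "Couldn't fetch $url\n" unless defined $xml;

    open my $fh, '>', $output or die "Couldn't open $output: $!";
    print {$fh} "$_\n" for abook_lines($xml);
    close $fh or die "Couldn't close $output: $!";
}

# Only runs when both arguments are given: script.pl URL OUTPUT_FILE
main(@ARGV) if @ARGV == 2;
```

You would run it with the URL and output path from your original script as the two arguments. Note that, unlike your script, this keeps rows that have no email address (with empty columns) instead of skipping them; add a `next unless length $field{EMailDisplayName};` after the inner loop if you prefer the old behavior.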