I’m a rookie with LINQ to XML and I’ve got this code that works (most of the time):
private long processFile(StreamWriter oWriter, string inFileName)
{
    XDocument xmlDoc = XDocument.Load(inFileName);
    List<DocMetaData> docList =
        (from d in xmlDoc.Descendants("DOCUMENT")
         select new DocMetaData
         {
             Folder = d.Element("FOLDER").Attribute("name").Value,
             File = d.Element("FILE").Attribute("filename").Value,
             Comment = d.Elements("INDEX")
                 .Where(i => i.Attribute("name").Value == "Comment(idmComment)")
                 .First()
                 .Attribute("value").Value,
             Title = d.Elements("INDEX")
                 .Where(i => i.Attribute("name").Value == "Title(idmName)")
                 .First()
                 .Attribute("value").Value,
             DocClass = d.Elements("INDEX")
                 .Where(i => i.Attribute("name").Value == "Document Class(idmDocType)")
                 .First()
                 .Attribute("value").Value
         }
        ).ToList<DocMetaData>();
    OutputListToFile(oWriter, docList);
    return docList.LongCount();
}
This fails on line 117 (the select expression) with:
System.NullReferenceException: Object reference not set to an instance of an object.
at CBMI.WinFormsUI.GridForm.<processFile>b__3(XElement d) in C:\ProjectsVS2010\CBMI.LatitudePostConverter\CBMI.LatitudePostConverter\CBMI.WinFormsUI\GridForm.cs:line 117
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
at CBMI.WinFormsUI.GridForm.processFile(StreamWriter oWriter, String inFileName) in C:\ProjectsVS2010\CBMI.LatitudePostConverter\CBMI.LatitudePostConverter\CBMI.WinFormsUI\GridForm.cs:line 115
at CBMI.WinFormsUI.GridForm.btnProcess_Click(Object sender, EventArgs e) in C:\ProjectsVS2010\CBMI.LatitudePostConverter\CBMI.LatitudePostConverter\CBMI.WinFormsUI\GridForm.cs:line 85
The data is well-formed XML. There are many <DOCUMENT> nodes in this XML file, and most, but not all, of them contain a <FOLDER> node. I discovered this by brute force: I opened the XML file in Visual Studio 2010 and used the Find command, which reports counts of matching lines.
Is there a way to improve the LINQ so that it does not fail when the data is not perfect? And is there a way to see which part of the LINQ expression is actually failing? (I’m guessing it is due to the missing <FOLDER> nodes, but that could be wrong, and searching the file by hand is an ugly, brute-force way to troubleshoot.)
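On the second question: Element("FOLDER") returns null when a <DOCUMENT> has no <FOLDER> child, and calling .Attribute("name") on that null is what throws the NullReferenceException; a missing <INDEX> entry would instead make First() throw an InvalidOperationException. Rather than counting lines with Find, a separate diagnostic pass can list every missing piece together with its line number. This is a minimal sketch, not part of the original code: DocumentDiagnostics and FindProblems are hypothetical names, and it assumes the file is loaded with LoadOptions.SetLineInfo so that line numbers are recorded.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class DocumentDiagnostics
{
    // The INDEX names the query in the question looks up with First().
    static readonly string[] RequiredIndexNames =
    {
        "Comment(idmComment)",
        "Title(idmName)",
        "Document Class(idmDocType)"
    };

    // Returns one message per missing <FILE>, <FOLDER> or required INDEX entry.
    public static List<string> FindProblems(XDocument xmlDoc)
    {
        var problems = new List<string>();
        foreach (XElement d in xmlDoc.Descendants("DOCUMENT"))
        {
            // XElement implements IXmlLineInfo; LineNumber is 0 if line info was not loaded.
            int line = ((IXmlLineInfo)d).LineNumber;

            if (d.Element("FILE") == null)
                problems.Add(string.Format("DOCUMENT at line {0}: no <FILE>", line));
            if (d.Element("FOLDER") == null)
                problems.Add(string.Format("DOCUMENT at line {0}: no <FOLDER>", line));

            foreach (string name in RequiredIndexNames)
            {
                // The (string) cast yields null for a missing attribute instead of throwing.
                bool found = d.Elements("INDEX").Any(i => (string)i.Attribute("name") == name);
                if (!found)
                    problems.Add(string.Format("DOCUMENT at line {0}: no INDEX named \"{1}\"", line, name));
            }
        }
        return problems;
    }
}
```

For example, DocumentDiagnostics.FindProblems(XDocument.Load(inFileName, LoadOptions.SetLineInfo)) returns messages such as "DOCUMENT at line 9: no <FOLDER>", which point straight at the offending nodes.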
Here is one <DOCUMENT> that DOES contain the proper <FOLDER> node (at the very bottom):
<?xml version="1.0" ?>
<DOCUMENTCOLLECTION>
<DOCUMENT>
<FILE filename="P:\LatitudeConsulting\LatConConverter-1.8.2\ConverterOutput\B0000002\3rd Party CON\D003694452.0001.tif"
outputpath="P:\LatitudeConsulting\LatConConverter-1.8.2\ConverterOutput\B0000002\3rd Party CON"/>
<ANNOTATION filename=""/>
<INDEX name="Access Level(idmAccessLevel)" value="Admin"/>
<INDEX name="Added By Group(idmAddedByGroup)" value="General Users"/>
<INDEX name="Added By User(idmDocOwner)" value="Import"/>
<INDEX name="Allow Secondary Version Lines?(idmDocVariants)" value="Yes"/>
<INDEX name="Application(idmVerApplication)" value=""/>
<INDEX name="Archive Category(idmDocDispCategory)" value="Archive"/>
<INDEX name="Archive Date(idmVerDispDate)" value=""/>
<INDEX name="Archive Repository(idmVerDispId)" value=""/>
<INDEX name="ArchivedDocument" value="NO"/>
<INDEX name="Availability Status(idmVerAvailStat)" value="Online"/>
<INDEX name="CAN(idmDocCustom4)" value=""/>
<INDEX name="Checked In By Group(idmVerCheckinGroup)" value="General Users"/>
<INDEX name="Checked In By User(idmVerCheckinUser)" value="Import"/>
<INDEX name="Checked Out?(idmVerCheckoutPending)" value="No"/>
<INDEX name="Checkin Date(idmVerCreateDate)" value="3/9/2001 9:20:38 AM"/>
<INDEX name="Child Count(idmVerCD)" value="0"/>
<INDEX name="Comment(idmComment)" value="1983\06_June_Meeting"/>
<INDEX name="Comment(idmVerComment)" value=""/>
<INDEX name="Content Search Repository(idmVerCsiId)" value=""/>
<INDEX name="Current Content Srch Repository(idmDocCurVerCsiId)" value=""/>
<INDEX name="Current Version Author(idmAddedByUser)" value="Import"/>
<INDEX name="Current Version Checked Out?(idmDocCurVerCheckedOut)" value="No"/>
<INDEX name="Current Version Date(idmDocCurVerDate)" value="3/9/2001 9:20:38 AM"/>
<INDEX name="Current Version ID(idmDocCurVerNum)" value="1"/>
<INDEX name="Current Version Index ID(idmDocCurVerCsiCid)" value=""/>
<INDEX name="Date Added(idmDateAdded)" value="3/9/2001 9:20:37 AM"/>
<INDEX name="Default Index Versions?(idmDocCsiDefault)" value="No"/>
<INDEX name="DiagnosticID(idmDocCustom5)" value="2-16.MDB-00015"/>
<INDEX name="Document Class(idmDocType)" value="3rd Party CON"/>
<INDEX name="Encrypted File Name(idmVerShelfFileId)" value="_276no__.__1"/>
<INDEX name="ExternalDocument" value="NO"/>
<INDEX name="File Name" value="51099.TIF"/>
<INDEX name="File Name(idmVerFileName)" value="51099.TIF"/>
<INDEX name="File Size(idmVerFileSize)" value="1166770"/>
<INDEX name="Has Annotations?(idmAnnotation)" value=""/>
<INDEX name="Index ID(idmVerCsiCid)" value=""/>
<INDEX name="Indexed Version Limit(idmDocCsiLimit)" value="1"/>
<INDEX name="Indexing Status(idmVerCsiStatus)" value="Not Indexed"/>
<INDEX name="Item ID(idmId)" value="003694452"/>
<INDEX name="Item ID(idmVerDocId)" value="003694452"/>
<INDEX name="Keyword(idmDocKeywords)" value=""/>
<INDEX name="Last Access Date(idmDateAccessed)" value="11/28/2003 3:05:30 PM"/>
<INDEX name="Last Access Date(idmDateModified)" value="8/24/2011 5:52:34 PM"/>
<INDEX name="Last Access Group(idmVerLastGroup)" value="Administrators"/>
<INDEX name="Last Access User(idmModifiedByUser)" value="Admin"/>
<INDEX name="Last Accessed Version(idmDocLastVerId)" value="1"/>
<INDEX name="Latest Version?(idmVerBranchCurVer)" value="Yes"/>
<INDEX name="Merge-Destination Version ID(idmVerMergeDst)" value="0"/>
<INDEX name="Merge-Source Version ID(idmVerMergeSrc)" value="0"/>
<INDEX name="MimeType" value="image/tiff"/>
<INDEX name="Min Item Delete Access Level(idmDocDeleteAccess)" value=""/>
<INDEX name="Modification Date(idmVerFileDate)" value="12/19/2000 11:12:30 AM"/>
<INDEX name="Number of Indexed Versions(idmDocCsiCount)" value="0"/>
<INDEX name="Offline Location(idmVerOfflineLocation)" value=""/>
<INDEX name="Online Disk Space(idmDocOnlineSize)" value="1166770"/>
<INDEX name="Online Limit(idmDocOnlineLimit)" value="5"/>
<INDEX name="Online Version Count(idmDocOnlineCount)" value="1"/>
<INDEX name="Origin ID(idmDocOriginID)" value=""/>
<INDEX name="Origin Library(idmDocOriginLibrary)" value=""/>
<INDEX name="Original File Name(idmDocOriginalFile)" value="51099.TIF"/>
<INDEX name="Permanent Index?(idmVerCsiPermanent)" value="No"/>
<INDEX name="Permanent Version?(idmVerPermanent)" value="No"/>
<INDEX name="Property ID(idmDocDynPropertyId)" value=""/>
<INDEX name="Protected?(idmDocProtected)" value="Yes"/>
<INDEX name="Publishing Status(idmPublish)" value=""/>
<INDEX name="Reclaim Pending?(idmVerReclaimPending)" value=""/>
<INDEX name="Reclaim Submitted Date(idmVerReclaimDate)" value=""/>
<INDEX name="Replica?(idmDocIsReplica)" value="No"/>
<INDEX name="ReplicatedDocument" value="NO"/>
<INDEX name="Secondary Version Line Count(idmVerBranchCount)" value="0"/>
<INDEX name="Source Version Checkout Date(idmVerPrevCheckoutDate)" value=""/>
<INDEX name="Storage Category(idmDocFileCategory)" value="Documents"/>
<INDEX name="Storage Repository(idmVerShelfId)" value="2"/>
<INDEX name="Title(idmName)" value="3rd Party CON Comments"/>
<INDEX name="Version ID(idmVerId)" value="1"/>
<FOLDER name="/NACAIE/1983/06_June_Meeting/NAPNSC"/>
</DOCUMENT>
EDIT: Solution follows. This LINQ fixed the problem for documents where the <FOLDER> node is missing. As others have noted, using First() can be a dangerous practice, but in this case only the missing <FOLDER> nodes had to be handled:
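using System.Xml.Linq;

namespace CBMI.Common
{
    public static class Extensions
    {
        // Returns the value of the named attribute, or null when either
        // the element itself or the attribute is missing.
        public static string SafeGetAttributeValue(this XElement element, string attribute)
        {
            if (element == null)
                return null;
            XAttribute attr = element.Attribute(attribute);
            return (attr != null) ? attr.Value : null;
        }
    }
}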
private long processFile(StreamWriter oWriter, string inFileName)
{
    XDocument xmlDoc = XDocument.Load(inFileName);
    List<DocMetaData> docList =
        (from d in xmlDoc.Descendants("DOCUMENT")
         select new DocMetaData
         {
             File = d.Element("FILE").Attribute("filename").Value,
             ItemID = d.Elements("INDEX")
                 .Where(i => i.Attribute("name").Value == "Item ID(idmId)")
                 .First()
                 .Attribute("value").Value,
             Comment = d.Elements("INDEX")
                 .Where(i => i.Attribute("name").Value == "Comment(idmComment)")
                 .First()
                 .Attribute("value").Value,
             Title = d.Elements("INDEX")
                 .Where(i => i.Attribute("name").Value == "Title(idmName)")
                 .First()
                 .Attribute("value").Value,
             DocClass = d.Elements("INDEX")
                 .Where(i => i.Attribute("name").Value == "Document Class(idmDocType)")
                 .First()
                 .Attribute("value").Value,
             // Missing <FOLDER> nodes now produce null instead of throwing.
             Folder = d.Element("FOLDER").SafeGetAttributeValue("name")
         }
        ).ToList<DocMetaData>();
    OutputListToFile(oWriter, docList);
    return docList.LongCount();
}
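The remaining First() calls will still throw if a <DOCUMENT> lacks one of the INDEX entries, and the FILE lookup has the same exposure that FOLDER had. Below is a minimal sketch of a fully null-tolerant projection that uses FirstOrDefault() plus the explicit (string) conversion LINQ to XML defines for XAttribute, which returns null for a null attribute. The DocMetaData class here is an assumed stand-in because the real one is not shown, and NullSafeParsing and IndexValue are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Assumed shape: the real DocMetaData class is not shown in the question.
public class DocMetaData
{
    public string File { get; set; }
    public string ItemID { get; set; }
    public string Comment { get; set; }
    public string Title { get; set; }
    public string DocClass { get; set; }
    public string Folder { get; set; }
}

public static class NullSafeParsing
{
    // Value of the first INDEX whose name matches, or null if there is none.
    static string IndexValue(XElement d, string name)
    {
        XElement index = d.Elements("INDEX")
                          .FirstOrDefault(i => (string)i.Attribute("name") == name);
        return index == null ? null : (string)index.Attribute("value");
    }

    public static List<DocMetaData> ParseDocuments(XDocument xmlDoc)
    {
        return (from d in xmlDoc.Descendants("DOCUMENT")
                select new DocMetaData
                {
                    // Elements(...).Attributes(...) is empty when the child element or
                    // the attribute is missing, so FirstOrDefault() returns null.
                    File = (string)d.Elements("FILE").Attributes("filename").FirstOrDefault(),
                    ItemID = IndexValue(d, "Item ID(idmId)"),
                    Comment = IndexValue(d, "Comment(idmComment)"),
                    Title = IndexValue(d, "Title(idmName)"),
                    DocClass = IndexValue(d, "Document Class(idmDocType)"),
                    Folder = (string)d.Elements("FOLDER").Attributes("name").FirstOrDefault()
                }).ToList();
    }
}
```

With this shape every field can be null, so OutputListToFile (not shown in the question) would need to tolerate null values.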
You can always check the given node before trying to select it:
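A minimal sketch of that inline check, assuming only the FOLDER name is needed (InlineCheckSketch and GetFolderNames are illustrative names, not taken from the original answer). The let clause looks the element up once, and the conditional falls back to null when either the element or its name attribute is missing:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class InlineCheckSketch
{
    // FOLDER name of each DOCUMENT, or null where there is no FOLDER or no name attribute.
    public static List<string> GetFolderNames(XDocument xmlDoc)
    {
        return (from d in xmlDoc.Descendants("DOCUMENT")
                let folder = d.Element("FOLDER")   // null when the child is missing
                select folder != null && folder.Attribute("name") != null
                    ? folder.Attribute("name").Value
                    : null).ToList();
    }
}
```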
But I admit that can get kind of ugly. In that case you can create an XElement extension method that does that:
Which you could use like:
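A sketch of such an extension method together with a query that uses it, assuming the caller may want a placeholder string instead of null. AttributeValueOrDefault and ExtensionUsageSketch are hypothetical names. Calling the method on a null XElement is safe because an extension method is compiled as a static call, which is the same property SafeGetAttributeValue in the EDIT relies on.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XElementExtensions
{
    // Value of the named attribute, or defaultValue when the element
    // itself (the receiver) or the attribute is missing.
    public static string AttributeValueOrDefault(this XElement element, string attributeName, string defaultValue = null)
    {
        if (element == null)
            return defaultValue;
        XAttribute attribute = element.Attribute(attributeName);
        return attribute != null ? attribute.Value : defaultValue;
    }
}

public static class ExtensionUsageSketch
{
    public static List<string> GetFolderNames(XDocument xmlDoc)
    {
        // Element("FOLDER") may be null; the extension handles that case.
        return xmlDoc.Descendants("DOCUMENT")
                     .Select(d => d.Element("FOLDER").AttributeValueOrDefault("name", "(no folder)"))
                     .ToList();
    }
}
```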