In our new project we have to provide search functionality to retrieve data from hundreds of XML files. A brief outline of our current plan is below; I would like your suggestions/improvements on it.
These XML files contain personal information, and the search is based on about 10 elements in them, for example last name, first name, email, etc. Our current plan is to create a master XmlDocument containing all the searchable data plus a key back to the actual file. When the user searches, we first look in the master file and get the results from there. We will also cache the actual XML files from recent searches, so similar searches later can be handled quickly.
Our application is a .NET 2.0 web application.
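To make the plan concrete, here is a minimal sketch of the master-index lookup described above. The file name `master-index.xml`, the element names, and the `file` attribute keying back to the source document are all assumptions for illustration, not part of the actual schema:

```csharp
using System;
using System.Xml;

class MasterIndexSearch
{
    static void Main()
    {
        // Load the master index (all searchable fields in one document).
        XmlDocument master = new XmlDocument();
        master.Load("master-index.xml");

        // XPath match on one of the searchable elements.
        XmlNodeList hits = master.SelectNodes(
            "/index/person[lastName = 'Smith']");

        foreach (XmlNode hit in hits)
        {
            // Hypothetical "file" attribute points at the full source file,
            // which would be loaded (and cached) only for actual hits.
            string sourceFile = hit.Attributes["file"].Value;
            Console.WriteLine("Match in: " + sourceFile);
        }
    }
}
```

Note this whole index still lives in memory as one DOM, which is what the answer below is cautioning about.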
First: how big are the XML files? XmlDocument doesn't scale to "huge", but can handle "large" OK.

Second: can you perhaps put the data into a regular database structure (perhaps SQL Server Express Edition), index it, and access it via regular T-SQL? That will usually out-perform an XPath search. Equally, if the data is structured, SQL Server 2005 and above supports the xml data type, which shreds the data; this allows you to index and query XML data in the database without having the entire DOM in memory (it translates XPath into relational queries).
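A rough sketch of what that could look like in SQL Server 2005+. The table and column names here are hypothetical; the idea is to keep a key back to the source file alongside an indexed xml column:

```sql
-- Hypothetical schema: one row per source XML file.
CREATE TABLE People
(
    PersonId   INT IDENTITY PRIMARY KEY,
    SourceFile NVARCHAR(260) NOT NULL,  -- key back to the original file
    Doc        XML NOT NULL             -- full document, shredded by SQL Server
);

-- XML index so queries against Doc avoid full-document parsing per row.
CREATE PRIMARY XML INDEX IX_People_Doc ON People (Doc);

-- XPath-style predicate, translated by SQL Server into relational operations:
SELECT SourceFile
FROM People
WHERE Doc.exist('/person[lastName = "Smith"]') = 1;
```

For the 10 commonly-searched elements, you could also promote them into regular columns with standard indexes and fall back to the xml column only for less common queries.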