I have the following file and am using an iterator block to parse certain recurring nodes within it. I initially used a regex to parse the entire file, but when certain fields were missing from a node, it would not match, so I am trying the yield pattern instead. The file format is shown below, followed by the code I am using. All I want from the file are the REPLICATE nodes, each as an individual part, so that I can fetch fields within it by key string and store them in a collection of objects. I can start parsing where the first REPLICATE occurs, but I am unable to stop where the REPLICATE node ends.
File Format:
X_HEADER
{
DATA_MANAGEMENT_FIELD_2 NA
DATA_MANAGEMENT_FIELD_3 NA
DATA_MANAGEMENT_FIELD_4 NA
SYSTEM_SOFTWARE_VERSION NA
}
Y_HEADER
{
DATA_MANAGEMENT_FIELD_2 NA
DATA_MANAGEMENT_FIELD_3 NA
DATA_MANAGEMENT_FIELD_4 NA
SYSTEM_SOFTWARE_VERSION NA
}
COMPLETION
{
NUMBER 877
VERSION 4
CALIBRATION_VERSION 1
CONFIGURATION_ID 877
}
REPLICATE
{
REPLICATE_ID 1985
ASSAY_NUMBER 656
ASSAY_VERSION 4
ASSAY_STATUS Research
DILUTION_ID 1
}
REPLICATE
{
REPLICATE_ID 1985
ASSAY_NUMBER 656
ASSAY_VERSION 4
ASSAY_STATUS Research
}
Code:
static IEnumerable<IDictionary<string, string>> ReadParts(string path)
{
    using (var reader = File.OpenText(path))
    {
        var current = new Dictionary<string, string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            if (line.StartsWith("REPLICATE"))
            {
                yield return current;
                current = new Dictionary<string, string>();
            }
            else
            {
                var parts = line.Split('\t');
            }
            if (current.Count > 0) yield return current;
        }
    }
}
public static void parseFile(string fileName)
{
    foreach (var part in ReadParts(fileName))
    {
        // part["fIELD1"] will retrieve certain values from the REPLICATE part here
    }
}
If you add a yield return current; statement after your while loop is over, you will get the final dictionary. I believe it would be better to check for '}' as the end of the current block, and then put the yield return there. Although you can't use regex to parse the entire file, you can use regex to search for the key-value pairs within the lines. The following iterator code should work. It will only return dictionaries for REPLICATE blocks.
Update: I made sure that the regex string includes cases where there is no value. In addition, the group indexes were all changed to use the group name to avoid any issues if the regex string is modified.
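A minimal sketch of the approach described above: start a dictionary on a REPLICATE line, end the block at the closing '}', and match key-value pairs with named regex groups so that a missing value still matches. The class name ReplicateParser, the method name ReadReplicates, and the exact pattern are illustrative assumptions, not the original code, and the sketch assumes keys and values are separated by whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class ReplicateParser
{
    // Assumed line shape: KEY, then an optional whitespace-separated value.
    // Named groups keep the lookups stable if the pattern is edited later.
    private static readonly Regex KeyValue =
        new Regex(@"^(?<key>\S+)(?:\s+(?<value>.*))?$");

    // Yields one dictionary per REPLICATE block; all other blocks are skipped.
    public static IEnumerable<IDictionary<string, string>> ReadReplicates(TextReader reader)
    {
        Dictionary<string, string> current = null; // non-null only inside a REPLICATE block
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string trimmed = line.Trim();
            if (trimmed.Length == 0 || trimmed == "{") continue;

            if (trimmed == "REPLICATE")
            {
                // Exact match, so a field such as REPLICATE_ID does not start a new block.
                current = new Dictionary<string, string>();
            }
            else if (trimmed == "}")
            {
                // The closing brace ends the block; emit it only if it was a REPLICATE.
                if (current != null) yield return current;
                current = null;
            }
            else if (current != null)
            {
                Match m = KeyValue.Match(trimmed);
                if (m.Success)
                {
                    // An unmatched "value" group yields an empty string.
                    current[m.Groups["key"].Value] = m.Groups["value"].Value;
                }
            }
        }
    }

    // Convenience overload that reads from a file path.
    public static IEnumerable<IDictionary<string, string>> ReadReplicates(string path)
    {
        using (var reader = File.OpenText(path))
        {
            foreach (var part in ReadReplicates(reader))
                yield return part;
        }
    }
}
```

With this shape, the foreach loop in parseFile could iterate over ReplicateParser.ReadReplicates(fileName) and read fields such as part["REPLICATE_ID"]; a field that is absent from a given block is simply not in that dictionary, so TryGetValue is the safer lookup.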