I have the following node(s) which I retrieve in a streamreader. There could be numerous of these. I am only interested to retrieve a few groups within this node for instance REPLICATE_ID, ASSAY_NUMBER,FEW DATES FIELDS.
The ordering of the fields within the node could be different and sometimes new fields could be present as well but the fields I want to extract they will not change.
So far the regex I have matches the entire node so in case the node has new fields or the order is different, it breaks. Is it possible to match groups I am only interested in?
TEST_REPLICATE
{
REPLICATE_ID 453w
ASSAY_NUMBER 334
ASSAY_VERSION 4
ASSAY_STATUS test
DILUTION_ID 1
SAMPLE_ID "NC_dede"
SAMPLE_TYPE Specimen
TEST_ORDER_DATE 05.23.2012
TEST_ORDER_TIME 04:25:07
TEST_INITIATION_DATE 05.23.2012
TEST_INITIATION_TIME 05:19:43
TEST_COMPLETION_DATE 05.23.2012
TEST_COMPLETION_TIME 05:48:01
ASSAY_CALIBRATION_DATE NA
ASSAY_CALIBRATION_TIME NA
TRACK 1
PROCESSING_LANE 1
MODULE_SN "EP004"
LOAD_LIST_NAME C:\BwedwQwedw_SCC\edwLoadlist2RACKSB.json
OPERATOR_ID "Q_dwe"
DARK_SUBREADS 16 23 19 20 16 18 21 16 17 18 19 19 20 22 19 20 19 20 18 20 17 20 21 16 19 23 20 22 19 20
SIGNAL_SUBREADS 18 17 20 21 42 61 41 31 30 30 26 26 25 22 24 DARK_COUNT 577
SIGNAL_COUNT 781
CORRECTED_COUNT 204
STD_BAK 1.95965044971226
AVG_BAK 19.2333333333333
STD_FOR 8.67212471810898
AVG_FOR 26.0333333333333
SHAPE NA
EXCEPTION_STRING TestException - Parameters:Unable to process test, background read failure.
RESULT NA
REPORTED_RESULT NA
REPORTED_RESULT_UNITS NA
REAGENT_MASTER_LOT 13600LI02
REAGENT_SERIAL_NUMBER 25022
RESULT_FLAGS RUO
RESULT_INTERPRETATION NA
DILUTION_PROTOCOL UNDILUTED
RESULT_COMMENT frer 1 LANE A
DATA_MANAGEMENT_FIELD_1 NA
DATA_MANAGEMENT_FIELD_2 NA
DATA_MANAGEMENT_FIELD_3 NA
DATA_MANAGEMENT_FIELD_4 NA
}
string pat = @"TEST_REPLICATE\s*{\s*REPLICATE_ID\s*([^}]*?)\s+ASSAY_NUMBER\s*([^}]*?)\s+ASSAY_VERSION\s*([^}]*?)\s+DILUTION_ID\s*([^}]*?)\s+SAMPLE_ID\s*([^}]*?)\s+SAMPLE_TYPE\s*([^}]*?)\s+TEST_ORDER_DATE\s*([^}]*?)\s+TEST_ORDER_TIME\s*([^}]*?)\s+TEST_INITIATION_DATE\s*([^}]*?)\s+TEST_INITIATION_TIME\s*([^}]*?)\s+TEST_COMPLETION_DATE\s*([^}]*?)\s+TEST_COMPLETION_TIME\s*([^}]*?)\s+ASSAY_CALIBRATION_DATE\s*([^}]*?)\s+ASSAY_CALIBRATION_TIME\s*([^}]*?)\s+TRACK\s*([^}]*?)\s+PROCESSING_LANE\s*([^}]*?)\s+MODULE_SN\s*([^}]*?)\s+LOAD_LIST_NAME\s*([^}]*?)\s+OPERATOR_ID\s*([^}]*?)\s+DARK_SUBREADS\s*([^}]*?)\s+SIGNAL_SUBREADS\s*([^}]*?)\s+DARK_COUNT\s*([^}]*?)\s+SIGNAL_COUNT\s*([^}]*?)\s+CORRECTED_COUNT\s*([^}]*?)\s+STD_BAK\s*([^}]*?)\s+AVG_BAK\s*([^}]*?)\s+STD_FOR\s*([^}]*?)\s+AVG_FOR\s*([^}]*?)\s+SHAPE\s*([^}]*?)\s+EXCEPTION_STRING\s*([^}]*?)\s+RESULT\s*([^}]*?)\s+REPORTED_RESULT\s*([^}]*?)\s+REPORTED_RESULT_UNITS\s*([^}]*?)\s+REAGENT_MASTER_LOT\s*([^}]*?)\s+REAGENT_SERIAL_NUMBER\s*([^}]*?)\s+RESULT_FLAGS\s*([^}]*?)\s+RESULT_INTERPRETATION\s*([^}]*?)\s+DILUTION_PROTOCOL\s*([^}]*?)\s+RESULT_COMMENT\s*([^}]*?)\s+DATA_MANAGEMENT_FIELD_1\s*([^}]*?)\s+DATA_MANAGEMENT_FIELD_2\s*([^}]*?)\s+DATA_MANAGEMENT_FIELD_3\s*([^}]*?)\s+DATA_MANAGEMENT_FIELD_4\s*([^}]*?)\s*}";
Yeah, you probably should just parse the record for key-value pairs.
Here is a code sample if you want to extract key-value pairs from a record.
When a match is found, the key’s your looking for can be tested against those in the capture collection.
You can also alter the regex as to how the begin/end of record are allowed.
But don’t alter the core, it protects from catastrophic backtracking.
Regex alternatives:
C# test code: