I have the following method, which reads a text file and returns a dictionary. It takes about 7 minutes to read a ~5 MB file (67,000 lines, 70 characters per line).
public static Dictionary<string, string> FASTAFileReadIn(string file)
{
    Dictionary<string, string> seq = new Dictionary<string, string>();
    Regex re;
    Match m;
    GroupCollection group;
    string currentName = string.Empty;
    try
    {
        using (StreamReader sr = new StreamReader(file))
        {
            string line = string.Empty;
            while ((line = sr.ReadLine()) != null)
            {
                if (line.StartsWith(">"))
                {// Match Sequence
                    re = new Regex(@"^>(\S+)");
                    m = re.Match(line);
                    if (m.Success)
                    {
                        group = m.Groups;
                        if (!seq.ContainsKey(group[1].Value))
                        {
                            seq.Add(group[1].Value, string.Empty);
                            currentName = group[1].Value;
                        }
                    }
                }
                else if (Regex.Match(line.Trim(), @"\S+").Success &&
                         currentName != string.Empty)
                {
                    seq[currentName] += line.Trim();
                }
            }
        }
    }
    catch (IOException e)
    {
        Console.WriteLine("An IO exception has been thrown!");
        Console.WriteLine(e.ToString());
    }
    finally { }
    return seq;
}
Which part of the code is the most time-consuming, and how can I speed it up?
Thanks
Cache and compile the regular expressions, reorder the conditionals, reduce the number of Trim() calls, and so on.
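As a rough sketch of what those micro-optimizations could look like (the class and method names here are hypothetical, not from the question), the header regex can be built once as a static compiled instance, the cheap first-character check can run before the regex, and each line can be trimmed only once:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class FastaLineHelpers
{
    // Constructed once and reused for every line instead of once per header line.
    private static readonly Regex HeaderPattern =
        new Regex(@"^>(\S+)", RegexOptions.Compiled);

    // Returns the sequence name if the line is a header, otherwise null.
    public static string TryGetHeaderName(string line)
    {
        // Cheap character test first; the regex only runs on header lines.
        if (line.Length == 0 || line[0] != '>')
            return null;

        Match m = HeaderPattern.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Trims the line a single time; returns null when nothing is left.
    public static string TryGetSequenceChunk(string line)
    {
        string trimmed = line.Trim();
        return trimmed.Length > 0 ? trimmed : null;
    }
}
```

The second regex in the question (`\S+` on the trimmed line) only checks that the line is not blank, so a length test on the trimmed string does the same job.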
However, that’s just a naïve optimization. Reading up on the FASTA format, I wrote this:
Tell me if it works; it should be much faster.
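For reference, here is a minimal sketch of one way to write such a reader. It is an illustration under my own assumptions, not necessarily the code referred to above, and the names `FastaReader`, `Read`, and `ReadFile` are made up. The likely bottleneck in the question is `seq[currentName] += line.Trim()`: strings are immutable, so every `+=` copies the whole sequence accumulated so far, and the cost grows quadratically with sequence length. Appending to a `StringBuilder` avoids that, and the header can be parsed without any regex.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical class, for illustration only.
public static class FastaReader
{
    public static Dictionary<string, string> Read(TextReader reader)
    {
        var builders = new Dictionary<string, StringBuilder>();
        StringBuilder current = null; // builder for the sequence being read
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length > 0 && line[0] == '>')
            {
                // The name runs from just after '>' to the first whitespace.
                int end = 1;
                while (end < line.Length && !char.IsWhiteSpace(line[end]))
                    end++;
                string name = line.Substring(1, end - 1);

                // Like the original, ignore empty and duplicate names and
                // keep appending to the sequence that was already active.
                if (name.Length > 0 && !builders.ContainsKey(name))
                {
                    current = new StringBuilder();
                    builders.Add(name, current);
                }
            }
            else if (current != null)
            {
                string trimmed = line.Trim();
                if (trimmed.Length > 0)
                    current.Append(trimmed); // amortized O(length of chunk)
            }
        }

        // Materialize each sequence as a string exactly once.
        var result = new Dictionary<string, string>(builders.Count);
        foreach (KeyValuePair<string, StringBuilder> pair in builders)
            result.Add(pair.Key, pair.Value.ToString());
        return result;
    }

    // Convenience wrapper taking a file path, as in the question.
    public static Dictionary<string, string> ReadFile(string path)
    {
        using (var sr = new StreamReader(path))
        {
            return Read(sr);
        }
    }
}
```

Taking a `TextReader` makes it easy to try on a small in-memory sample with `new StringReader(...)` before pointing it at the real file. Unlike the original, this sketch lets an `IOException` propagate to the caller instead of printing it and returning a partial dictionary.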