I wrote a small algorithm using LINQ to read in a bunch of files (about 30 MB) and store them in memory. Currently it takes about a minute for the program to finish reading in all the files, but I need this process to take only a few seconds.
Code:
List<ClimateDailyData> dailyData = new List<ClimateDailyData>();
if (File.Exists(FileName))
{
    StreamReader reader = new StreamReader(FileName);
    try
    {
        List<string[]> lines =
            Regex.Split(reader.ReadToEnd(), Environment.NewLine)
                .Where(l => !String.IsNullOrWhiteSpace(l) && !String.IsNullOrEmpty(l))
                .Select(l => l.Trim().Split(new char[] { ' ', '\t' })
                    .Where(f => !String.IsNullOrWhiteSpace(f) && !String.IsNullOrEmpty(f))
                    .Select(f => f.Trim())
                    .ToArray())
                .ToList();
        Latitude = double.Parse(lines[0][0]);
        Longitude = double.Parse(lines[0][1]);
        lines.RemoveRange(0, 2);
        foreach (string[] fields in lines)
        {
            ClimateDailyData dayData = new ClimateDailyData();
            dayData.DayDate = DateTime.ParseExact(fields[0], "yyyyMMdd",
                CultureInfo.InvariantCulture, DateTimeStyles.None);
            dayData.MaxTemp = double.Parse(fields[2]);
            dayData.MinTemp = double.Parse(fields[3]);
            dayData.Rain = double.Parse(fields[4]);
            dayData.Pan = double.Parse(fields[5]);
            dailyData.Add(dayData);
        }
    }
    finally { reader.Close(); }
}
SetValue(() => DailyData, dailyData);
Can anyone suggest how I could speed this code up? Most of the time seems to be spent parsing the individual file fields (especially the date field).
However, if it cannot be sped up, I will simply load each individual file as required.
Thanks,
Alex.
EDIT:
I also decided to store just a few fields from each file rather than all of the file data, then load the rest of the data in a separate thread and make it available to the user as it finishes loading.
So now it only takes 2.7 seconds.
As noted in comments, it’s an odd way of reading lines, but I wouldn’t use File.ReadAllLines. I’d use File.ReadLines if you’re using .NET 4, since that only reads one line at a time.

Beyond that, you definitely don’t need to call ToArray and ToList… I’d also use Select and ToList with Skip to create dailyData. Also, String.IsNullOrWhiteSpace already returns true if the string is empty, so you can remove the String.IsNullOrEmpty calls.

After splitting, you’re currently trimming and removing any empty/whitespace entries. You can remove empty entries with StringSplitOptions.RemoveEmptyEntries, and if you’re confident that the only whitespace in a line would be space or tab, you then don’t need to worry about trimming or anything else. If you have other whitespace which needs trimming, it could still be a problem, but I doubt that’s the case. One big benefit of that is that you can use the array returned by Split directly, rather than copying it to another collection.
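Putting those suggestions together, a minimal sketch might look like the following. The ClimateDailyData shape is inferred from the properties the question assigns, and ClimateFileReader, SplitFields, ParseDay and Load are hypothetical names used only for illustration. Two things differ from the question's code and are assumptions: numbers are parsed with CultureInfo.InvariantCulture, and the header and data rows are read through one explicit enumerator rather than Skip, so the file is only enumerated once.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical shape, inferred from the properties used in the question.
public class ClimateDailyData
{
    public DateTime DayDate { get; set; }
    public double MaxTemp { get; set; }
    public double MinTemp { get; set; }
    public double Rain { get; set; }
    public double Pan { get; set; }
}

// Hypothetical helper class; not part of the original code.
public static class ClimateFileReader
{
    private static readonly char[] Separators = { ' ', '\t' };

    // Split on space/tab and drop empty fields, so no Trim or Where is needed
    // and the array returned by Split is used directly.
    public static string[] SplitFields(string line)
    {
        return line.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // Field positions follow the question's code: 0 = date, 2..5 = values.
    public static ClimateDailyData ParseDay(string[] fields)
    {
        CultureInfo inv = CultureInfo.InvariantCulture; // assumption: '.' decimal separator
        return new ClimateDailyData
        {
            DayDate = DateTime.ParseExact(fields[0], "yyyyMMdd", inv, DateTimeStyles.None),
            MaxTemp = double.Parse(fields[2], inv),
            MinTemp = double.Parse(fields[3], inv),
            Rain = double.Parse(fields[4], inv),
            Pan = double.Parse(fields[5], inv)
        };
    }

    public static List<ClimateDailyData> Load(string fileName,
                                              out double latitude,
                                              out double longitude)
    {
        // File.ReadLines (.NET 4 and later) yields lines lazily instead of
        // loading the whole file into one string first.
        IEnumerable<string[]> rows = File.ReadLines(fileName)
            .Where(l => !string.IsNullOrWhiteSpace(l))
            .Select(SplitFields);

        using (IEnumerator<string[]> e = rows.GetEnumerator())
        {
            if (!e.MoveNext())
                throw new InvalidDataException("File contains no data: " + fileName);

            // First non-blank line holds latitude and longitude.
            latitude = double.Parse(e.Current[0], CultureInfo.InvariantCulture);
            longitude = double.Parse(e.Current[1], CultureInfo.InvariantCulture);

            // The question removes two leading lines, so skip one more.
            e.MoveNext();

            var result = new List<ClimateDailyData>();
            while (e.MoveNext())
                result.Add(ParseDay(e.Current));
            return result;
        }
    }
}
```

Whether this alone gets the load time down to a few seconds would need measuring; it mainly removes the regex split of the whole file and the intermediate copies.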