Here is the situation:
I am making a small program to parse server log files.
I tested it with a log file containing several thousand requests (between 10,000 and 20,000 – I don't know exactly).
What I have to do is load the log text files into memory so that I can query them.
This is what takes the most resources.
The methods that take the most CPU time are these (worst culprits first):
- String.Split – splits the line values into an array of values
- String.Contains – checks whether the user agent contains a specific agent string (to determine the browser ID)
- String.ToLower – various purposes
- StreamReader.ReadLine – reads the log file line by line
- String.StartsWith – determines whether a line is a column-definition line or a line with values
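To make the hot path concrete, here is a minimal sketch of that kind of per-line work. The original code is .NET; this illustration uses Java, and the class, method names, and sample log line are all hypothetical – only the pattern (read line, skip `#` headers, split, lower-case, substring match) comes from the list above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class LogParseSketch {
    // Split one space-separated log line into fields (analogous to String.Split).
    public static String[] parseLine(String line) {
        return line.split(" ");
    }

    // Crude browser detection via substring search on the user agent
    // (analogous to String.ToLower + String.Contains).
    public static String browserId(String userAgent) {
        String ua = userAgent.toLowerCase();
        if (ua.contains("firefox")) return "Firefox";
        if (ua.contains("chrome"))  return "Chrome";
        return "Other";
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical two-line log; real user agents contain spaces,
        // which a real parser would have to handle.
        String log = "#Fields: date time cs(User-Agent)\n"
                   + "2009-08-01 12:00:00 Mozilla/5.0(Firefox)\n";
        BufferedReader reader = new BufferedReader(new StringReader(log));
        String line;
        while ((line = reader.readLine()) != null) {   // analogous to StreamReader.ReadLine
            if (line.startsWith("#")) continue;        // skip column-definition lines
            String[] fields = parseLine(line);
            System.out.println(browserId(fields[2]));  // prints "Firefox"
        }
    }
}
```

Each of those calls is cheap on its own; the cost described above comes from running all of them tens of thousands of times per file.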
There were some others that I was able to replace. For example, the dictionary getter was taking a lot of resources too, which I had not expected, since it's a dictionary and should have its keys indexed. I replaced it with a multidimensional array and saved some CPU time.
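For illustration, the difference being described is between hashing a string key on every log entry and resolving the index once, then using plain integer indexing. This sketch is Java rather than .NET, and all the names and sample fields are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class FieldLookup {
    // Look a field up through the map: hashes the key string on every call.
    public static String fieldByName(String[] fields,
                                     Map<String, Integer> indexByName,
                                     String name) {
        return fields[indexByName.get(name)];
    }

    public static void main(String[] args) {
        Map<String, Integer> indexByName = new HashMap<>();
        indexByName.put("date", 0);
        indexByName.put("cs(User-Agent)", 2);

        String[] fields = {"2009-08-01", "12:00:00", "Mozilla/5.0"};

        // Slow path: hash "cs(User-Agent)" once per log entry.
        String uaViaMap = fieldByName(fields, indexByName, "cs(User-Agent)");

        // Fast path: resolve the index once per file, then index directly.
        int uaIndex = indexByName.get("cs(User-Agent)"); // done once
        String uaViaArray = fields[uaIndex];             // done per entry

        System.out.println(uaViaMap.equals(uaViaArray)); // prints "true"
    }
}
```

A dictionary lookup is O(1), but the constant factor (hashing the string, probing) is much larger than a bare array index, which is why hoisting the lookup out of the per-entry loop helps.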
Now, I am running on a fast dual core, and the total time it takes to load the file I mentioned is about one second.
This is really bad.
Imagine a site that gets tens of thousands of visits a day: it's going to take minutes to load the log file.
So what are my alternatives – if any? I suspect this is just a .NET limitation and I can't do much about it.
EDIT:
If some of you gurus want to look at the code and find the problem, here are my code files:
- http://freehosting1.net/temp/data.txt
- http://freehosting1.net/temp/logentry.txt
- http://freehosting1.net/temp/lists.txt
The function that takes the most resources is by far LogEntry.New.
The function that loads all the data is called Data.Load.
Total number of LogEntry objects created: 50,000. Time taken: 0.9–1.0 seconds.
CPU: AMD Phenom II X2 545, 3 GHz.
Not multithreaded.
Without seeing your code, it’s hard to know whether you’ve got any mistakes there which are costing you performance. Without seeing some sample data, we can’t reasonably try experiments to see how we’d fare ourselves.
What was your dictionary key before? Moving to a multi-dimensional array sounds like an odd move – but we’d need more information to know what you were doing with the data before.
Note that unless you’re explicitly parallelizing the work, having a dual core machine won’t make any difference. If you’re really CPU bound then you could parallelize – although you’d need to do so carefully; you would quite probably want to read a “chunk” of text (several lines) and ask one thread to parse it rather than handing off one line at a time. The resulting code would probably be significantly more complex though.
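The chunking idea above can be sketched roughly as follows. This is Java rather than .NET, the chunk size and all names are made up for illustration, and the "parsing" is reduced to counting non-comment lines; the point is only the shape: the reader thread hands whole chunks to a pool so the hand-off overhead is amortised over many lines.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedParse {
    static final int CHUNK_SIZE = 1000; // arbitrary; worth tuning

    // Stand-in for real per-line parsing: count non-comment lines in a chunk.
    public static int parseChunk(List<String> chunk) {
        int entries = 0;
        for (String line : chunk) {
            if (!line.startsWith("#")) entries++;
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        // Build a fake 5,000-entry log in memory for the sketch.
        StringBuilder sb = new StringBuilder("#Fields: date time cs(User-Agent)\n");
        for (int i = 0; i < 5000; i++) sb.append("field1 field2 field3\n");
        BufferedReader reader = new BufferedReader(new StringReader(sb.toString()));

        ExecutorService pool = Executors.newFixedThreadPool(2); // dual core
        List<Future<Integer>> results = new ArrayList<>();

        // Read a chunk of lines, then hand the whole chunk to a worker thread.
        List<String> chunk = new ArrayList<>(CHUNK_SIZE);
        String line;
        while ((line = reader.readLine()) != null) {
            chunk.add(line);
            if (chunk.size() == CHUNK_SIZE) {
                final List<String> c = chunk;
                results.add(pool.submit(() -> parseChunk(c)));
                chunk = new ArrayList<>(CHUNK_SIZE);
            }
        }
        if (!chunk.isEmpty()) {
            final List<String> c = chunk;
            results.add(pool.submit(() -> parseChunk(c)));
        }

        int total = 0;
        for (Future<Integer> f : results) total += f.get();
        pool.shutdown();
        System.out.println(total); // prints 5000
    }
}
```

Whether this wins in practice depends on how expensive per-chunk parsing is relative to the coordination cost, which is exactly why it needs doing carefully.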
I don’t know whether one second for 10,000 lines is reasonable or not, to be honest – if you could post some sample data and what you need to do with it, we could give more useful feedback.
EDIT: Okay, I’ve had a quick look at the code. A few thoughts…
Most importantly, this probably isn’t something you should do “on demand”. Instead, parse periodically as a background process (e.g. when logs roll over) and put the interesting information in a database – then query that database when you need to.
However, to optimise the parsing process:
- Don't check whether the StreamReader is at the end – just call ReadLine until the result is Nothing.
- line.StartsWith("#") might be replaceable with something cheaper, such as checking the first character directly – I'd have to test.
- Create a LineFormat class which can cope with any field names, but specifically remembers the index of fields that you know you're going to want. This also avoids copying the complete list of fields for each log entry, which is pretty wasteful.

There are probably other things, but I'm afraid I don't have the time to go into them now 🙁
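A rough sketch of that LineFormat idea, in Java for illustration (the real code would be VB.NET, and the field name and sample lines here are hypothetical): parse the `#Fields:` header once, remember the index of each field you care about, and extract only those fields per entry instead of keeping the whole array.

```java
import java.util.Arrays;

public class LineFormat {
    private final int uaIndex; // index of the user-agent column, resolved once per file

    public LineFormat(String headerLine) {
        // e.g. "#Fields: date time cs(User-Agent)"
        String[] names = headerLine.substring("#Fields: ".length()).split(" ");
        uaIndex = Arrays.asList(names).indexOf("cs(User-Agent)");
    }

    // Extract only the field we want, rather than storing every field per entry.
    public String userAgent(String line) {
        return line.split(" ")[uaIndex];
    }

    public static void main(String[] args) {
        LineFormat fmt = new LineFormat("#Fields: date time cs(User-Agent)");
        System.out.println(fmt.userAgent("2009-08-01 12:00:00 Mozilla/5.0"));
        // prints "Mozilla/5.0"
    }
}
```

The per-entry saving is twofold: no repeated name-to-index lookup, and no copy of the full field list kept alive for every one of the 50,000 LogEntry objects.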