
The Archive Base



Editorial Team
Asked: May 14, 2026 at 1:11 am

Here is the situation: I am making a small program to parse server log files


Here is the situation:

I am making a small program to parse server log files.

I tested it with a log file containing several thousand requests (between 10,000 and 20,000; I don't know exactly).

What I have to do is load the log text files into memory so that I can query them.

This is what takes the most resources.

The methods that take the most CPU time are these (worst culprits first):

  • String.Split – splits a line's values into an array of values
  • String.Contains – checks whether the user agent contains a specific agent string (to determine the browser ID)
  • String.ToLower – various purposes
  • StreamReader.ReadLine – reads the log file line by line
  • String.StartsWith – determines whether a line is a column-definition line or a line with values

There were some others that I was able to replace. For example, the dictionary getter was taking a lot of resources too, which I had not expected, since it's a dictionary and should have its keys indexed. I replaced it with a multidimensional array and saved some CPU time.

Now, I am running on a fast dual core, and the total time it takes to load the file I mentioned is about 1 second.

This is really bad.

Imagine a site that has tens of thousands of visits a day. It's going to take minutes to load the log file.

So what are my alternatives, if any? Because I think this is just a .NET limitation and I can't do much about it.

EDIT:

If some of you gurus want to look at the code and find the problem, here are my code files:

  • http://freehosting1.net/temp/data.txt
  • http://freehosting1.net/temp/logentry.txt
  • http://freehosting1.net/temp/lists.txt

The function that takes the most resources is by far LogEntry.New.
The function that loads all the data is Data.Load.

Total number of LogEntry objects created: 50,000. Time taken: 0.9–1.0 seconds.

CPU: AMD Phenom II X2 545, 3 GHz.

Not multithreaded.



1 Answer

  1. Editorial Team
     Added an answer on May 14, 2026 at 1:11 am

    Without seeing your code, it’s hard to know whether you’ve got any mistakes there which are costing you performance. Without seeing some sample data, we can’t reasonably try experiments to see how we’d fare ourselves.

    What was your dictionary key before? Moving to a multi-dimensional array sounds like an odd move – but we’d need more information to know what you were doing with the data before.

    Note that unless you’re explicitly parallelizing the work, having a dual core machine won’t make any difference. If you’re really CPU bound then you could parallelize – although you’d need to do so carefully; you would quite probably want to read a “chunk” of text (several lines) and ask one thread to parse it rather than handing off one line at a time. The resulting code would probably be significantly more complex though.
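    To illustrate the chunked hand-off, here is a minimal Java sketch (standing in for the VB.NET original; the tab delimiter, chunk size, and thread count are assumptions, and the lines are hard-coded stand-ins for what StreamReader would return):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedParser {
    // Parse one chunk of lines on a worker thread; tab is an assumed delimiter.
    static List<String[]> parseChunk(List<String> lines) {
        List<String[]> entries = new ArrayList<>(lines.size());
        for (String line : lines) {
            // Skip empty lines and "#" directive lines, keep only value lines.
            if (!line.isEmpty() && line.charAt(0) != '#') {
                entries.add(line.split("\t"));
            }
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for lines read from the log file with ReadLine.
        List<String> all = List.of(
            "#fields\tdate\ttime\turi-stem",
            "2026-05-14\t01:11:09\t/index.html",
            "2026-05-14\t01:11:10\t/about.html");

        ExecutorService pool = Executors.newFixedThreadPool(2);
        int chunkSize = 2; // several lines per task, not one line per task
        List<Future<List<String[]>>> futures = new ArrayList<>();
        for (int i = 0; i < all.size(); i += chunkSize) {
            List<String> chunk = all.subList(i, Math.min(i + chunkSize, all.size()));
            futures.add(pool.submit(() -> parseChunk(chunk)));
        }
        // Collecting results in submission order keeps the entries in file order.
        List<String[]> entries = new ArrayList<>();
        for (Future<List<String[]>> f : futures) entries.addAll(f.get());
        pool.shutdown();
        System.out.println(entries.size()); // two value lines parsed
    }
}
```

    The point of the chunk is to amortise the per-task hand-off cost; submitting one line per task would likely cost more in coordination than the parsing itself.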

    I don’t know whether one second for 10,000 lines is reasonable or not, to be honest – if you could post some sample data and what you need to do with it, we could give more useful feedback.

    EDIT: Okay, I’ve had a quick look at the code. A few thoughts…

    Most importantly, this probably isn’t something you should do “on demand”. Instead, parse periodically as a background process (e.g. when logs roll over) and put the interesting information in a database – then query that database when you need to.
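    A minimal sketch of that background-aggregation idea, again in Java rather than VB.NET. Here the "interesting information" is just a hit count per URI held in memory; a real version would write rows to a database and trigger the re-parse from a scheduler when logs roll over. The column layout is made up:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LogIndex {
    // Aggregates kept in memory; a real setup would store these in a database table.
    private static final ConcurrentHashMap<String, Integer> hitsPerUri = new ConcurrentHashMap<>();

    // Re-parse the whole log and swap in fresh aggregates.
    static void reparse(Iterable<String> lines) {
        Map<String, Integer> fresh = new HashMap<>();
        for (String line : lines) {
            if (line.isEmpty() || line.charAt(0) == '#') continue;
            String uri = line.split("\t")[2]; // assumed columns: date, time, uri-stem
            fresh.merge(uri, 1, Integer::sum);
        }
        hitsPerUri.clear();
        hitsPerUri.putAll(fresh);
    }

    public static void main(String[] args) {
        // In production this would run periodically in the background,
        // e.g. from a ScheduledExecutorService, not on demand per query.
        reparse(List.of(
            "#fields\tdate\ttime\turi-stem",
            "2026-05-14\t01:11:09\t/index.html",
            "2026-05-14\t01:11:10\t/index.html"));
        System.out.println(hitsPerUri.get("/index.html")); // prints 2
    }
}
```

    Queries then hit the pre-computed aggregates instead of re-reading the raw log, so the one-second (or one-minute) parse cost is paid off the request path.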

    However, to optimise the parsing process:

    • I would personally not keep checking whether the StreamReader is at the end – just call ReadLine until the result is Nothing.
    • If you’re expecting the “#fields” line to come first, then read that outside the loop. Then you don’t need to see whether you’ve already got the fields on every iteration.
    • If you know a line is non-empty, it’s possible that testing for the first character being ‘#’ could be faster than calling line.StartsWith("#") – I’d have to test.
    • You’re scanning through the fields every time you ask for the date, time, URI stem or user agent; instead, when you parse the “#fields” line you could create an instance of a new LineFormat class which can cope with any field names, but specifically remembers the index of fields that you know you’re going to want. This also avoids copying the complete list of fields for each log entry, which is pretty wasteful.
    • When you split the string, you have more information than normal: you know how many fields to expect, and you know you’re only splitting on a single character. You could probably write an optimised version of this.
    • It may be faster to parse the date and time fields separately and then combine the result, rather than concatenating them and then parsing. I’d have to test it.
    • Multi-dimensional arrays are significantly slower than single-dimensional arrays. If you do want to keep to the “copy all the field names per entry” idea, it would be worth separating into two arrays: one for the fields, one for the values.
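    Several of these suggestions can be combined into one sketch: parse the "#fields" line once, remember only the indexes of the fields you want, and split with a hand-rolled routine that exploits knowing the delimiter is a single character and how many fields to expect. The LineFormat name comes from the suggestion above; the tab delimiter and field names are assumptions (the real log format may well be space-delimited):

```java
public class LineFormat {
    final int dateIdx, uriIdx;
    final int fieldCount;

    // Parse the "#fields" header once, remembering only the indexes we care about,
    // so per-entry lookups become plain array indexing with no scanning or copying.
    LineFormat(String fieldsLine) {
        String[] names = fieldsLine.substring("#fields ".length()).split("\t");
        fieldCount = names.length;
        int d = -1, u = -1;
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals("date")) d = i;
            else if (names[i].equals("uri-stem")) u = i;
        }
        dateIdx = d;
        uriIdx = u;
    }

    // Split specialised for a known single-char delimiter and known field count:
    // the output array is allocated once at the right size, no regex machinery.
    String[] split(String line) {
        String[] out = new String[fieldCount];
        int start = 0, field = 0;
        for (int i = 0; i < line.length() && field < fieldCount - 1; i++) {
            if (line.charAt(i) == '\t') {
                out[field++] = line.substring(start, i);
                start = i + 1;
            }
        }
        out[field] = line.substring(start); // last field runs to end of line
        return out;
    }

    public static void main(String[] args) {
        LineFormat fmt = new LineFormat("#fields date\ttime\turi-stem");
        String[] vals = fmt.split("2026-05-14\t01:11:09\t/index.html");
        System.out.println(vals[fmt.dateIdx] + " " + vals[fmt.uriIdx]);
    }
}
```

    Note this also sidesteps the multi-dimensional-array issue: each entry is a single-dimensional String[] of values, and the field names live once in LineFormat rather than being copied per entry.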

    There are probably other things, but I’m afraid I don’t have the time to go into them now 🙁


© 2021 The Archive Base. All Rights Reserved