Editorial Team
Asked: June 15, 2026

I’m looking for the fastest (generic approach) to converting strings into various data types

I’m looking for the fastest (generic approach) to converting strings into various data types on the go.

I am parsing large generated text data files (several megabytes each). This particular function reads lines from the text file, parses each line into columns based on delimiters, and places the parsed values into a .NET DataTable. This is later inserted into a database. My bottleneck by FAR is the string conversions (Convert and TypeConverter).

I have to go with a dynamic way (i.e. staying away from “Convert.ToInt32” etc…) because I never know what types are going to be in the files. The type is determined by earlier configuration during runtime.

So far I have tried the following, and both take several minutes to parse a file. Note that if I comment out this one line, it runs in only a few hundred milliseconds.

row[i] = Convert.ChangeType(columnString, dataType);

AND

TypeConverter typeConverter = TypeDescriptor.GetConverter(type);
row[i] = typeConverter.ConvertFromString(null, cultureInfo, columnString);

If anyone knows of a faster way that is generic like this I would like to know about it. Or if my whole approach just sucks for some reason I’m open to suggestions. But please don’t point me to non-generic approaches using hard coded types; that is simply not an option here.

UPDATE – Multi-threading to Improve Performance Test

In order to improve performance I have looked into splitting up parsing tasks to multiple threads. I found that the speed increased somewhat but still not as much as I had hoped. However, here are my results for those who are interested.

System:

Intel Xeon 3.3GHz Quad Core E3-1245

Memory: 12.0 GB

Windows 7 Enterprise x64

Test:

The test function is this:

(1) Receive an array of strings. (2) Split the string by delimiters. (3) Parse strings into data types and store them in a row. (4) Add row to data table. (5) Repeat (2)-(4) until finished.

The test included 1000 strings, each string being parsed into 16 columns, so that is 16000 string conversions total. I tested single thread, 4 threads (because of quad core), and 8 threads (because of hyper-threading). Since I’m only crunching data here I doubt adding more threads than this would do any good. So for the single thread it parses 1000 strings, 4 threads parse 250 strings each, and 8 threads parse 125 strings each. Also I tested a few different ways of using threads: thread creation, thread pool, tasks, and function objects.
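
The chunked test setup described above can be sketched roughly as follows. This is an illustration under assumptions, not the code that produced the timings: ChunkedParser, ParseInChunks and the parseLine delegate are hypothetical names, and the workers are started with lambdas rather than ParameterizedThreadStart. Each worker writes only to its own slice of the result array, and the rows are meant to be added to the DataTable afterwards on a single thread, because DataTable is not safe for concurrent writes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ChunkedParser
{
    // Hypothetical helper: splits the lines into contiguous chunks, parses each
    // chunk on its own thread, and returns the parsed rows in the original order.
    public static object[][] ParseInChunks(string[] lines, int threadCount,
                                           Func<string, object[]> parseLine)
    {
        if (threadCount < 1) throw new ArgumentOutOfRangeException("threadCount");

        var rows = new object[lines.Length][];
        var threads = new List<Thread>();
        int chunk = (lines.Length + threadCount - 1) / threadCount; // ceiling division

        for (int t = 0; t < threadCount; t++)
        {
            int start = t * chunk;
            int end = Math.Min(start + chunk, lines.Length);
            if (start >= end) break; // fewer lines than threads

            // start and end are fresh locals per iteration, so each lambda
            // captures its own range and writes only to rows[start..end).
            var thread = new Thread(() =>
            {
                for (int i = start; i < end; i++)
                    rows[i] = parseLine(lines[i]);
            });
            threads.Add(thread);
            thread.Start();
        }

        // Wait for every worker before handing the rows back to the caller.
        foreach (var thread in threads) thread.Join();
        return rows;
    }
}
```

With 1000 lines and 8 threads this gives 125 lines per worker, matching the split described above.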

Results:
Result times are in Milliseconds.

Single Thread:

  • Method Call: 17720

4 Threads

  • Parameterized Thread Start: 13836
  • ThreadPool.QueueUserWorkItem: 14075
  • Task.Factory.StartNew: 16798
  • Func BeginInvoke EndInvoke: 16733

8 Threads

  • Parameterized Thread Start: 12591
  • ThreadPool.QueueUserWorkItem: 13832
  • Task.Factory.StartNew: 15877
  • Func BeginInvoke EndInvoke: 16395

As you can see, the fastest is Parameterized Thread Start with 8 threads (the number of my logical cores). However, it does not beat 4 threads by much and is only about 29% faster than using a single core. Of course results will vary by machine. Also, I stuck with a Dictionary<Type, TypeConverter> cache for string parsing: arrays of type converters did not offer a noticeable performance increase, and one shared cached type converter is more maintainable than creating arrays all over the place when I need them.
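
A minimal sketch of such a shared converter cache, for anyone who wants to try it. The class name ConverterCache is made up for illustration, and it swaps the plain Dictionary for a ConcurrentDictionary (an assumption on my part) so that several parsing threads can look up converters at the same time without extra locking.

```csharp
using System;
using System.Collections.Concurrent;
using System.ComponentModel;

public static class ConverterCache
{
    // One shared cache for the whole application. ConcurrentDictionary is used
    // here instead of Dictionary so concurrent lookups from worker threads are safe.
    private static readonly ConcurrentDictionary<Type, TypeConverter> Cache =
        new ConcurrentDictionary<Type, TypeConverter>();

    // Returns the cached converter for the type, creating it on first use.
    public static TypeConverter GetTypeConverter(Type type)
    {
        return Cache.GetOrAdd(type, t => TypeDescriptor.GetConverter(t));
    }
}
```

Usage is a single call per column type, e.g. ConverterCache.GetTypeConverter(typeof(int)), and repeated calls for the same type return the same converter instance.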

ANOTHER UPDATE:

Ok, so I ran some more tests to see if I could squeeze out more performance, and I found some interesting things. I decided to stick with 8 threads, all started with the Parameterized Thread Start method (the fastest in my previous tests). The same test as above was run, just with different parsing algorithms.

I noticed that Convert.ChangeType and TypeConverter take about the same amount of time. Type-specific converters like int.TryParse are slightly faster, but not an option for me since my types are dynamic. ricovox had some good advice about exception handling. My data does indeed contain invalid values: some integer columns use a dash '-' for empty numbers, so the type converters blow up on that. That means every row I parse throws at least one exception, that's 1000 exceptions! Very time consuming.

Btw, this is how I do my conversions with TypeConverter. Extensions is just a static class and GetTypeConverter just returns a cached TypeConverter. If an exception is thrown during the conversion, a default value is used.

public static Object ConvertTo(this String arg, CultureInfo cultureInfo, Type type, Object defaultValue)
{
  Object value;
  TypeConverter typeConverter = Extensions.GetTypeConverter(type);

  try
  {
    // Try converting the string.
    value = typeConverter.ConvertFromString(null, cultureInfo, arg);
  }
  catch
  {
    // If the conversion fails then use the default value.
    value = defaultValue;
  }

  return value;
}

Results:

Same test on 8 threads – parse 1000 lines, 16 columns each, 125 lines per thread.

So I did 3 new things.

1 – Run the test: check for known invalid types before parsing to minimize exceptions,
e.g. if (!Char.IsDigit(c)) value = 0; or columnString.Contains('-'), etc.

Runtime: 29ms

2 – Run the test: use custom parsing algorithms that have try catch blocks.

Runtime: 12424ms

3 – Run the test: use custom parsing algorithms checking for invalid types before parsing to minimize exceptions.

Runtime 15ms

Wow! As you can see, eliminating the exceptions made a world of difference. I never realized how expensive exceptions really were! So if I limit my exceptions to TRULY unknown cases, the parsing algorithm runs about three orders of magnitude faster (12424ms down to 15ms). I’m considering this absolutely solved. I believe I will keep the dynamic type conversion with TypeConverter, since it is only a few milliseconds slower. Checking for known invalid types before converting avoids exceptions, and that speeds things up incredibly! Thanks to ricovox for pointing that out, which made me test this further.
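
For reference, the "check before converting" idea can be sketched like this. It is an illustration only: SafeConversion, IsKnownPlaceholder and ConvertOrDefault are hypothetical names, and treating an empty string or a lone dash as the known placeholder is an assumption based on the data described above. The try/catch stays in place, but only truly unexpected input should ever reach it.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;

public static class SafeConversion
{
    // Hypothetical pre-check: empty text or a lone dash is a known placeholder,
    // so it can be mapped to the default without attempting a conversion.
    public static bool IsKnownPlaceholder(string s)
    {
        return string.IsNullOrWhiteSpace(s) || s.Trim() == "-";
    }

    public static object ConvertOrDefault(string arg, CultureInfo culture,
                                          TypeConverter converter, object defaultValue)
    {
        // Cheap path: no conversion attempt, so no exception is thrown.
        if (IsKnownPlaceholder(arg))
            return defaultValue;

        try
        {
            return converter.ConvertFromString(null, culture, arg);
        }
        catch (Exception)
        {
            // Only input that the pre-check did not anticipate ends up here.
            return defaultValue;
        }
    }
}
```

The pre-check has to be tuned to the file format: every placeholder it recognizes is one exception that is never thrown.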

1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 7:54 pm

    If you are primarily going to be converting the strings to the native data types (string, int, bool, DateTime etc.), you could use something like the code below, which caches the TypeCodes and TypeConverters and uses a fast switch statement to jump to the appropriate parsing routine. This should save some time over Convert.ChangeType because the source type (string) is already known, and you can directly call the right parse method.

    /* Get an array of Types for each of your columns.
     * Open the data file for reading.
     * Create your DataTable and add the columns.
     * (You have already done all of these in your earlier processing.)
     * 
     * Note:    For the sake of generality, I've used an IEnumerable<string> 
     * to represent the lines in the file, although for large files,
     * you would use a FileStream or TextReader etc.
    */      
    IList<Type> columnTypes;        //array or list of the Type to use for each column
    IEnumerable<string> fileLines;  //the lines to parse from the file.
    DataTable table;                //the table you'll add the rows to
    
    int colCount = columnTypes.Count;
    var typeCodes = new TypeCode[colCount];
    var converters = new TypeConverter[colCount];
    //Fill up the typeCodes array with the Type.GetTypeCode() of each column type.
    //Cache a converter for every column, so that the default branch of the switch below
    //never hits a null converter for a native type that has no case of its own (e.g. Double).
    for(int i = 0; i < colCount; i++) {
        typeCodes[i] = Type.GetTypeCode(columnTypes[i]);
        converters[i] = TypeDescriptor.GetConverter(columnTypes[i]);
    }
    
    //Probably faster to build up an array of objects and insert them into the row all at once.
    object[] vals = new object[colCount];
    object val;
    foreach(string line in fileLines) {
        //delineate the line into columns, however you see fit. I'll assume a tab character.
        var columns = line.Split('\t');
        for(int i = 0; i < colCount; i++) {
            switch(typeCodes[i]) {
                case TypeCode.String:
                    val = columns[i]; break;
                case TypeCode.Int32:
                    val = int.Parse(columns[i]); break;
                case TypeCode.DateTime:
                    val = DateTime.Parse(columns[i]); break;
                //...list types that you expect to encounter often.
    
                //finally, deal with other objects
                case TypeCode.Object:
                default:
                    val = converters[i].ConvertFromString(columns[i]);
                    break;
            }
            vals[i] = val;
        }
        //Add all values to the row at one time. 
        //This might be faster than adding each column one at a time.
        //There are two ways to do this (use one or the other, not both):
        var row = table.Rows.Add(vals); //create new row on the fly.
        // OR 
        // row.ItemArray = vals; //(e.g. allows setting an existing row, created previously)
    }
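
    The switch above calls int.Parse and DateTime.Parse, which throw on bad input such as a lone dash. A hedged variant of the same idea uses the TryParse methods, so a bad value costs a false return instead of an exception. FastParser and TryParseValue are hypothetical names, and the NumberStyles and culture choices are assumptions to adapt to the actual file format.

    ```csharp
    using System;
    using System.Globalization;

    public static class FastParser
    {
        // Returns true and the boxed value when the text parses;
        // returns false (value = null) otherwise, without throwing.
        public static bool TryParseValue(string text, TypeCode code,
                                         IFormatProvider provider, out object value)
        {
            switch (code)
            {
                case TypeCode.String:
                    value = text;
                    return true;
                case TypeCode.Int32:
                    int i;
                    if (int.TryParse(text, NumberStyles.Integer, provider, out i)) { value = i; return true; }
                    break;
                case TypeCode.Double:
                    double d;
                    if (double.TryParse(text, NumberStyles.Float | NumberStyles.AllowThousands, provider, out d)) { value = d; return true; }
                    break;
                case TypeCode.DateTime:
                    DateTime dt;
                    if (DateTime.TryParse(text, provider, DateTimeStyles.None, out dt)) { value = dt; return true; }
                    break;
                case TypeCode.Boolean:
                    bool b;
                    if (bool.TryParse(text, out b)) { value = b; return true; }
                    break;
            }
            // Unparseable text, or a TypeCode with no case here: the caller decides
            // whether to use a default value or fall back to a TypeConverter.
            value = null;
            return false;
        }
    }
    ```

    A caller would typically write something like: if (!FastParser.TryParseValue(columns[i], typeCodes[i], CultureInfo.InvariantCulture, out val)) val = DBNull.Value; with the fallback value chosen per column.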
    

    There really ISN’T any other way that would be faster, because we’re basically just using the raw string parsing methods defined by the types themselves. You could re-write your own parsing code for each output type yourself, making optimizations for the exact formats you’ll encounter. But I assume that is overkill for your project. It would probably be better and faster to simply tailor the FormatProvider or NumberStyles in each case.

    For example let’s say that whenever you parse Double values, you know, based on your proprietary file format, that you won’t encounter any strings that contain exponents etc, and you know that there won’t be any leading or trailing space, etc. So you can clue the parser in to these things with the NumberStyles argument as follows:

    //NOTE:   using System.Globalization;
    var styles = NumberStyles.AllowDecimalPoint | NumberStyles.AllowLeadingSign;
    var d = double.Parse(text, styles);
    

    I don’t know for a fact how the parsing is implemented, but I would think that the NumberStyles argument allows the parsing routine to work faster by excluding various formatting possibilities. Of course, if you can’t make any assumptions about the format of the data, then you won’t be able to make these types of optimizations.
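
    One thing that is certain about double.Parse(text, styles) is that it uses the current thread culture, so the decimal separator it expects can change from machine to machine. A small sketch that pins the culture and uses TryParse, so unexpected formats are rejected without an exception (StrictDouble is a hypothetical name, and the invariant culture is an assumption about the file format):

    ```csharp
    using System;
    using System.Globalization;

    public static class StrictDouble
    {
        // Only an optional leading sign and a decimal point are accepted:
        // no exponents, no thousands separators, no surrounding whitespace.
        private const NumberStyles Styles =
            NumberStyles.AllowDecimalPoint | NumberStyles.AllowLeadingSign;

        public static bool TryParse(string text, out double value)
        {
            // InvariantCulture fixes '.' as the decimal separator on every machine.
            return double.TryParse(text, Styles, CultureInfo.InvariantCulture, out value);
        }
    }
    ```

    Whether the narrower NumberStyles actually makes parsing faster would need to be measured; what this does guarantee is that the accepted format is the same everywhere.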

    Of course, there’s always the possibility that your code is slow simply because it takes time to parse a string into a certain data type. Use a performance analyzer (like the one in VS2010) to see where your actual bottleneck is. Then you’ll be able to optimize better, or simply give up, e.g. if there is nothing else to do short of writing the parsing routines in assembly 🙂
