I wrote a little C# application that indexes a book and executes a boolean

Asked: June 5, 2026 (2026-06-05T04:39:02+00:00)

I wrote a little C# application that indexes a book and executes a boolean text-retrieval algorithm on the index. The class at the end of the post shows the implementation of both steps: building the index and executing the algorithm on it.

The code is called from a GUI button handler as follows:

    private void Execute_Click(object sender, EventArgs e)
    {
        Stopwatch s;
        String output = "-----------------------\r\n";

        String sr = algoChoice.SelectedItem != null ? algoChoice.SelectedItem.ToString() : "";

        switch(sr){
            case "Naive search":
                output += "Naive search\r\n";
                algo = NaiveSearch.GetInstance();
                break;
            case "Boolean retrieval":
                output += "boolean retrieval\r\n";
                algo = BooleanRetrieval.GetInstance();
                break;
            default:
                outputTextbox.Text = outputTextbox.Text + "Choose retrieval-algorithm!\r\n";
                return;
        }
        output += algo.BuildIndex("../../DocumentCollection/PilzFuehrer.txt") + "\r\n";

        postIndexMemory = GC.GetTotalMemory(true);

        s = Stopwatch.StartNew();
        output += algo.Start("../../DocumentCollection/PilzFuehrer.txt", new String[] { "Pilz", "blau", "giftig", "Pilze" });
        s.Stop();

        postQueryMemory = GC.GetTotalMemory(true);
        output +=  "\r\nTime elapsed:" + s.ElapsedTicks/(double)Stopwatch.Frequency + "\r\n";

        outputTextbox.Text = output + outputTextbox.Text;
    }

The first execution of Start(…) takes about 700 µs; every rerun takes less than 10 µs.
The application is compiled with Visual Studio 2010 using the default ‘Debug’ build configuration.

I experimented a lot to find the reason for this, including profiling and trying different implementations, but the effect always stays the same.

I’d be happy if anyone could give me some new ideas about what to try, or even an explanation.

    class BooleanRetrieval:RetrievalAlgorithm
    {
        protected static RetrievalAlgorithm theInstance;

        List<String> documentCollection;
        Dictionary<String, BitArray> index;

        private BooleanRetrieval()
            : base("BooleanRetrieval")
        {

        }

        public override String BuildIndex(string filepath)
        {
            documentCollection = new List<string>();
            index = new Dictionary<string, BitArray>();

            documentCollection.Add(filepath);

            for(int i=0; i<documentCollection.Count; ++i)
            {
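                // Read the whole document and split it into distinct words;
                // the using block closes the file handle afterwards.
                string[] text;
                using (StreamReader input = new StreamReader(documentCollection[i]))
                {
                    text = Regex.Split(input.ReadToEnd(), @"\W+").Distinct().ToArray();
                }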

                foreach (String wordToIndex in text)
                {
                    if (!index.ContainsKey(wordToIndex))
                    {
                        index.Add(wordToIndex, new BitArray(documentCollection.Count, false));
                    }

                    index[wordToIndex][i] = true;
                }
            }

            return "Index " + index.Keys.Count + " words.";
        }

        public override String Start(String filepath, String[] search)
        {
            BitArray tempDecision = new BitArray(documentCollection.Count, true);

            List<String> res = new List<string>();

            foreach(String searchWord in search)
            {

                if (!index.ContainsKey(searchWord))
                    return "No documents found!";

                tempDecision.And(index[searchWord]);
            }


            for (int i = 0; i < tempDecision.Count; ++i )
            {
                if (tempDecision[i] == true)
                {
                    res.Add(documentCollection[i]);
                }
            }

            return res.Count>0 ? res[0]: "Empty!";
        }

        public static RetrievalAlgorithm GetInstance()
        {
            Contract.Ensures(Contract.Result<RetrievalAlgorithm>() != null, "result is null.");
            if (theInstance == null)
                theInstance = new BooleanRetrieval();

            theInstance.Executions++;
            return theInstance;
        }
    }

1 Answer

Editorial Team · Answer 1 · 2026-06-05T04:39:04+00:00

The difference between a cold and a warm start of a .NET application usually comes from JIT compilation time and from the disk access needed to load assemblies.

For an application that does a lot of disk I/O, the very first access to data on disk is much slower than a rerun over the same data, because the OS caches file contents (this also applies to assembly loading), provided the data is small enough to fit in the disk cache.

First run of the task: affected by disk I/O for assemblies and data, plus JIT time.
Second run of the same task without restarting the application: the data is just read from the OS memory cache, and the code is already compiled.
Second run of the application: assemblies are read from the OS memory cache, but the code is JIT-compiled again.
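
To see the first-call cost in isolation, one option is to time the same method twice in a row within one process. The following is only a minimal sketch: `WarmupTiming`, `Work` and `TimeMicroseconds` are hypothetical names that are not part of the question's code, and `Work` is just a stand-in for `Start(…)`. The actual numbers depend on the machine and build configuration.

```csharp
using System;
using System.Diagnostics;

public static class WarmupTiming
{
    // Hypothetical stand-in for algo.Start(...); any method under test works here.
    public static int Work(string[] words)
    {
        int total = 0;
        foreach (string w in words)
            total += w.Length;
        return total;
    }

    // Times a single invocation and returns the elapsed time in microseconds.
    public static double TimeMicroseconds(Func<int> action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedTicks * 1e6 / Stopwatch.Frequency;
    }

    public static void Main()
    {
        string[] query = { "Pilz", "blau", "giftig", "Pilze" };

        // First call: the measurement includes JIT compilation of Work
        // (and of the delegate that wraps it).
        double cold = TimeMicroseconds(() => Work(query));

        // Second call: the code is already compiled, so mostly the work itself is measured.
        double warm = TimeMicroseconds(() => Work(query));

        Console.WriteLine("cold: {0:F1} microseconds, warm: {1:F1} microseconds", cold, warm);
    }
}
```

If the first call is much slower than the second, a simple way to get representative timings is to run the method once untimed as a warm-up and report only the later runs.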
