I am developing a C# application in which I am converting a PDF document

Asked: June 11, 2026

I am developing a C# application in which I am converting a PDF document to an image and then rendering that image in a custom viewer.

I’ve hit a bit of a brick wall trying to search for specific words in the generated image, and I was wondering what the best way to go about this would be. Should I find the x,y location of the searched word?
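If the x,y location of a word is found in the rendered image, it can also be mapped back to PDF page coordinates, because the image is just the page rendered at some resolution. PDF user space uses units of 1/72 inch with the y axis pointing up from the bottom-left corner, while image pixels run down from the top-left. The sketch below assumes the page was rendered at a known `dpi`, is not rotated, and has a media box starting at (0, 0); `PdfPixelMapping` and `PixelsToPoints` are hypothetical names.

```csharp
using System.Drawing;

// Hypothetical helper: converts image-space boxes to PDF user-space points.
public static class PdfPixelMapping
{
    // pixelBox: box in the rendered image (pixels, origin top-left, y down).
    // dpi: resolution the page was rendered at (assumed known).
    // pageHeightPoints: page height in points (1/72 inch).
    // Assumes no page rotation and a media box that starts at (0, 0).
    public static RectangleF PixelsToPoints(Rectangle pixelBox, float dpi, float pageHeightPoints)
    {
        float scale = 72f / dpi; // points per pixel
        float left = pixelBox.Left * scale;
        float width = pixelBox.Width * scale;
        float height = pixelBox.Height * scale;

        // Flip the y axis: the box's bottom edge in the image becomes its
        // lowest y value in PDF space. The returned RectangleF's Y therefore
        // holds the lower edge, not the top edge.
        float lowerY = pageHeightPoints - pixelBox.Bottom * scale;

        return new RectangleF(left, lowerY, width, height);
    }
}
```

For example, a page rendered at 144 dpi has 2 pixels per point, so a box at pixel (144, 288) of size 144×72 on a 792-point-high page maps to x = 72, lower y = 612, width 72, height 36.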

1 Answer

Editorial Team · Answer 1 · 2026-06-11T19:01:03+00:00

You can use the Tesseract OCR engine, run from the command line, for text recognition on the image.
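Running Tesseract from C# comes down to starting the `tesseract` command-line program with an input image, an output base name and a config such as `hocr`. The sketch below assumes `tesseract.exe` is on the PATH and that Tesseract reports failure through a non-zero exit code; `TesseractRunner`, `CreateStartInfo` and `Run` are hypothetical names. It uses `CreateNoWindow` with `UseShellExecute = false` to keep the console window hidden and captures stderr so a failure can be reported.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper class; assumes tesseract.exe is on the PATH.
public static class TesseractRunner
{
    // Builds the command line "tesseract <image> <outputBase> hocr".
    // Tesseract adds the output file extension itself, so outputBase has none.
    public static ProcessStartInfo CreateStartInfo(string imagePath, string outputBase)
    {
        return new ProcessStartInfo("tesseract.exe", $"\"{imagePath}\" \"{outputBase}\" hocr")
        {
            UseShellExecute = false,      // needed for CreateNoWindow and stream redirection
            CreateNoWindow = true,        // don't flash a console window
            RedirectStandardError = true  // keep Tesseract's messages for error reporting
        };
    }

    // Runs Tesseract and throws if it exits with a non-zero code.
    public static void Run(string imagePath, string outputBase)
    {
        using (var process = Process.Start(CreateStartInfo(imagePath, outputBase)))
        {
            // Read stderr before waiting so a full pipe buffer can't block the child.
            string messages = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    $"tesseract exited with code {process.ExitCode}: {messages}");
            }
        }
    }
}
```

Usage would be something like `TesseractRunner.Run("temp.png", "temp");`, after which the hOCR file written next to `temp.png` can be read.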

I don’t know of such an SDK for PDF.

But if you want to get the coordinates and values of all words, you can use the following simple code (thanks to nguyenq for the hOCR hint):

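using System.Collections.Generic;
using System.Diagnostics;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public void Recognize(Bitmap bitmap)
{
    bitmap.Save("temp.png", ImageFormat.Png);

    // "temp" is the output base name; the "hocr" config makes Tesseract emit hOCR.
    var startInfo = new ProcessStartInfo("tesseract.exe", "temp.png temp hocr");
    startInfo.WindowStyle = ProcessWindowStyle.Hidden;
    using (var process = Process.Start(startInfo))
    {
        process.WaitForExit();
    }

    // Depending on the Tesseract version, the hOCR output is temp.html (older) or temp.hocr (newer).
    var hocrFile = File.Exists("temp.hocr") ? "temp.hocr" : "temp.html";
    var words = GetWords(File.ReadAllText(hocrFile));

    // Further actions with words
}

public Dictionary<Rectangle, string> GetWords(string tesseractHtml)
{
    var xml = XDocument.Parse(tesseractHtml);

    var rectsWords = new Dictionary<Rectangle, string>();

    // Compare local names so the XHTML namespace in the hOCR file doesn't prevent a match.
    // Word spans use class "ocr_word" in older Tesseract output and "ocrx_word" in newer output.
    var ocrWords = xml.Descendants()
        .Where(e => e.Name.LocalName == "span")
        .Where(e => (string)e.Attribute("class") == "ocr_word" || (string)e.Attribute("class") == "ocrx_word")
        .ToList();
    foreach (var ocrWord in ocrWords)
    {
        // title looks like "bbox 36 92 96 116" or "bbox 36 92 96 116; x_wconf 90";
        // keep only the bbox part before splitting.
        var strs = ocrWord.Attribute("title").Value.Split(';')[0].Split(' ');
        int left = int.Parse(strs[1]);
        int top = int.Parse(strs[2]);
        int width = int.Parse(strs[3]) - left + 1;
        int height = int.Parse(strs[4]) - top + 1;
        rectsWords.Add(new Rectangle(left, top, width, height), ocrWord.Value);
    }

    return rectsWords;
}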
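Once every word is stored with its rectangle, searching for a word is a filter over that dictionary, and the matching rectangles are its x,y locations in the image. The sketch below shows one way to do that and to map a hit into a viewer that draws the page image scaled by a zoom factor and scrolled by an offset; that viewer model, and the names `WordSearch`, `Find` and `ToViewer`, are assumptions for illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

// Hypothetical helpers for working with the rectangle-to-word dictionary.
public static class WordSearch
{
    private static readonly char[] Punctuation = { '.', ',', ';', ':', '!', '?', '"', '\'', '(', ')' };

    // Returns the image-space boxes of every OCR word equal to `term`,
    // ignoring case and leading/trailing punctuation (so "PDF," matches "pdf").
    public static List<Rectangle> Find(IDictionary<Rectangle, string> words, string term)
    {
        return words
            .Where(pair => string.Equals(pair.Value.Trim().Trim(Punctuation), term,
                                         StringComparison.OrdinalIgnoreCase))
            .Select(pair => pair.Key)
            .OrderBy(box => box.Top)   // top-to-bottom,
            .ThenBy(box => box.Left)   // then left-to-right
            .ToList();
    }

    // Maps an image-space box into a viewer that draws the page image scaled by
    // `zoom` and scrolled by `scroll` (both assumptions about the custom viewer).
    public static Rectangle ToViewer(Rectangle imageBox, float zoom, Point scroll)
    {
        return Rectangle.Round(new RectangleF(
            imageBox.X * zoom - scroll.X,
            imageBox.Y * zoom - scroll.Y,
            imageBox.Width * zoom,
            imageBox.Height * zoom));
    }
}
```

Matching whole OCR words keeps the search simple; a phrase search would need to look at neighbouring rectangles on the same line, which this sketch does not attempt.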
