The Archive Base

Editorial Team
Asked: May 26, 2026

I am working on an application for processing document images (mainly invoices). Basically, I'd like to convert certain regions of interest into an XML structure and then classify the document based on that data. Currently I am using ImageJ for analyzing the document image and Asprise/Tesseract for OCR.

Now I am looking for something to make development easier. Specifically, I am looking for something to automatically deskew a document image and analyze the document structure (e.g. converting an image into a quadtree structure for easier processing). Although I prefer Java and ImageJ, I am interested in any libraries, code, or papers, regardless of the programming language they are written in.

While the system I am working on should process data automatically as far as possible, the user should oversee the results and, if necessary, correct the classification suggested by the system. Therefore I am interested in using machine learning techniques to achieve more reliable results. When similar documents are processed, e.g. invoices from a specific company, their structure is usually the same. When the user has previously corrected data in documents from a company, these corrections should be taken into account in the future. I have only limited knowledge of machine learning techniques and would like to know how I could realize my idea.

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 7:48 pm

    The following prototype in Mathematica finds the coordinates of blocks of text and performs OCR within each block. You may need to adapt the parameter values to fit the dimensions of your actual images. I do not address the machine learning part of the question; perhaps you would not even need it for this application.

    Import the picture, create a binary mask for the printed parts, and enlarge these parts using a horizontal closing (dilation followed by erosion).
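
To illustrate what a horizontal closing does, here is a minimal pure-Python sketch on a tiny binary grid. It is not the Mathematica implementation: the function names (`dilate_h`, `erode_h`, `close_h`) and the `radius` parameter are assumptions made for this example, and the structuring element is a single row, whereas `BoxMatrix[{2, 20}]` also extends a little vertically.

```python
def dilate_h(rows, radius):
    """Horizontal dilation: a pixel becomes 1 if any pixel within
    `radius` columns of it (same row) is 1."""
    width = len(rows[0])
    return [
        [int(any(row[max(0, x - radius):x + radius + 1])) for x in range(width)]
        for row in rows
    ]


def erode_h(rows, radius):
    """Horizontal erosion: a pixel stays 1 only if every pixel within
    `radius` columns of it (same row) is 1. Windows are truncated at the
    image border, so pixels outside the image never erode anything."""
    width = len(rows[0])
    return [
        [int(all(row[max(0, x - radius):x + radius + 1])) for x in range(width)]
        for row in rows
    ]


def close_h(rows, radius):
    """Closing = dilation followed by erosion with the same element.
    Gaps of up to 2 * radius pixels between foreground runs are filled."""
    return erode_h(dilate_h(rows, radius), radius)


row = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1]
closed = close_h([row], radius=1)
print(closed[0])  # the 2-pixel gap is filled, the 5-pixel gap is kept
```

With a larger radius, neighbouring characters and words on the same text line merge into one blob, which is what the later steps rely on.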

    Query for each blob’s orientation, cluster the orientations, and determine the overall rotation by averaging the orientations of the largest cluster.
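
The angle estimate can be sketched in plain Python as follows. This is only an illustration of the idea, not what `FindClusters` does internally: the gap-based grouping, the `max_gap` threshold, and the function names are assumptions, and angles that wrap around at ±π/2 are not handled.

```python
from statistics import mean


def cluster_angles(angles, max_gap=0.05):
    """Group angles (in radians) into clusters: after sorting, a new
    cluster starts whenever two neighbours differ by more than max_gap."""
    clusters = []
    for angle in sorted(angles):
        if clusters and angle - clusters[-1][-1] <= max_gap:
            clusters[-1].append(angle)
        else:
            clusters.append([angle])
    return clusters


def dominant_angle(angles, max_gap=0.05):
    """Average the angles of the largest cluster, ignoring outliers
    such as logos, stamps, or vertical rules."""
    largest = max(cluster_angles(angles, max_gap), key=len)
    return mean(largest)


# Hypothetical blob orientations: most text lines are tilted by about
# 0.1 rad, two blobs are outliers.
orientations = [0.10, 0.11, 0.12, 0.09, 1.50, -0.80, 0.105]
print(round(dominant_angle(orientations), 3))
```

Averaging only the largest cluster keeps a few oddly oriented blobs from biasing the estimated rotation.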

    Use this angle to straighten the image. OCR is already possible at this point, but you would lose the spatial information for the blocks of text, which would make post-processing much more difficult than it needs to be. Instead, find blobs of text by horizontal closing.

    For each connected component, query for the bounding box position and the centroid position. Use the bounding box positions to extract the corresponding image patch and perform OCR on the patch.
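
For readers without Mathematica, the component measurements can be approximated with a small breadth-first labelling in pure Python. This is a sketch under stated assumptions: it uses 4-connectivity and a top-left origin with y growing downward, which may differ from the conventions of `MorphologicalComponents` and `ComponentMeasurements`, and the names `components` and `crop` are made up for this example.

```python
from collections import deque


def components(grid):
    """Return one (bounding_box, centroid) pair per 4-connected
    foreground component. bounding_box is (x_min, y_min, x_max, y_max),
    centroid is (x, y), both in pixel coordinates."""
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    results = []
    for y0 in range(height):
        for x0 in range(width):
            if not grid[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first search collects every pixel of this component.
            seen[y0][x0] = True
            queue, pixels = deque([(x0, y0)]), []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            bbox = (min(xs), min(ys), max(xs), max(ys))
            centroid = (sum(xs) / len(xs), sum(ys) / len(ys))
            results.append((bbox, centroid))
    return results


def crop(grid, bbox):
    """Cut the patch covered by a bounding box out of the grid."""
    x_min, y_min, x_max, y_max = bbox
    return [row[x_min:x_max + 1] for row in grid[y_min:y_max + 1]]


grid = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
for bbox, centroid in components(grid):
    print(bbox, centroid, crop(grid, bbox))
```

In a real pipeline, each cropped patch would be handed to the OCR engine, and the centroid would be kept alongside the recognized string.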

    At this point, you have a list of strings and their spatial positions. That's not XML yet, but it is a good starting point that can be tailored to your needs.
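
As one possible way to go from that list to XML, the following Python sketch serializes (centroid, text) pairs with the standard library. The element and attribute names (`document`, `block`, `x`, `y`), the sample strings, and the assumption that y grows downward are all hypothetical; a real schema would depend on how the documents are classified later.

```python
import xml.etree.ElementTree as ET


def blocks_to_xml(blocks):
    """Serialize ((x, y), text) pairs into a flat XML document.
    Blocks are ordered top to bottom, then left to right, assuming
    that y grows downward."""
    root = ET.Element("document")
    for (x, y), text in sorted(blocks, key=lambda b: (b[0][1], b[0][0])):
        node = ET.SubElement(root, "block", x=f"{x:.1f}", y=f"{y:.1f}")
        node.text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical OCR results: centroid of each text block and its string.
blocks = [
    ((120.0, 40.5), "Total: 42.00"),
    ((15.0, 10.0), "Invoice No. 1001"),
]
print(blocks_to_xml(blocks))
```

Keeping the coordinates as attributes preserves the spatial information, so later steps can still look up fields by their position on the page.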

    This is the code. Again, the parameters (structuring elements) of the morphological functions may need to change, based on the scale of your actual images. Also, if the invoice is too tilted, you may need to roughly “rotate” the structuring elements in order to still achieve good “un-skewing.”

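    (* Load the scan and convert it to grayscale *)
    img = ColorConvert[Import@"http://www.team-bhp.com/forum/attachments/test-drives-initial-ownership-reports/490952d1296308008-laura-tsi-initial-ownership-experience-img023.jpg", "Grayscale"];
    (* Binarize so that printed parts are white, then merge characters into line-shaped blobs *)
    b = ColorNegate@Binarize[img];
    mask = Closing[b, BoxMatrix[{2, 20}]]
    (* Estimate the skew: cluster the blob orientations and average the largest cluster *)
    orientations = ComponentMeasurements[mask, "Orientation"];
    angles = FindClusters@orientations[[All, 2]]
    \[Theta] = Mean[Last[SortBy[angles, Length]]]
    (* Straighten the image; OCR on the whole page works here but discards block positions *)
    straight = ColorNegate@Binarize[ImageRotate[img, \[Pi] - \[Theta], Background -> 1]]
    TextRecognize[straight]
    (* Find blocks of text on the straightened image *)
    boxes = Closing[straight, BoxMatrix[{1, 20}]]
    comp = MorphologicalComponents[boxes];
    (* OCR each block's bounding box and pair every non-empty result with the block centroid *)
    measurements = ComponentMeasurements[{comp, straight}, {"BoundingBox", "Centroid"}];
    texts = TextRecognize@ImageTrim[straight, #] & /@ measurements[[All, 2, 1]];
    Cases[Thread[measurements[[All, 2, 2]] -> texts], (_ -> t_) /; StringLength[t] > 0] // TableForm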
    