Editorial Team
Asked: May 15, 2026

I’m working on a document management system. An example workflow would be something like this:

  1. A document is emailed to the system
  2. The system performs a number of preparatory actions on the document
  3. The document is presented to a user for further processing
  4. Afterwards, the document is sent to Quality Assurance
  5. Afterwards, the system performs a number of post-processing actions on the document
  6. The document is considered completely processed and is disseminated (e.g. emailed back to whoever sent it to the system)
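The six steps form a fixed, linear sequence, so one way to model them is an ordered enumeration plus a helper that returns the next step. This is a minimal Python sketch; `Stage`, `next_stage` and the member names are hypothetical labels for the steps above, not part of any existing system.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    # Hypothetical names for the six workflow steps, in order.
    RECEIVED = 1        # document emailed to the system
    PREPROCESSING = 2   # automated preparatory actions
    USER_REVIEW = 3     # presented to a user for further processing
    QA = 4              # sent to Quality Assurance
    POSTPROCESSING = 5  # automated post-processing actions
    DISSEMINATED = 6    # completely processed and sent back out


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage after `stage`, or None once the workflow is finished."""
    ordered = list(Stage)  # Enum iteration follows definition order
    position = ordered.index(stage)
    return ordered[position + 1] if position + 1 < len(ordered) else None
```

Starting from `Stage.RECEIVED` and calling `next_stage` repeatedly walks a document through all six steps and ends with `None`. Keeping the order in one place means adding a step later is a one-line change.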

Since the volume of my input will vary (but will usually be high), I am very concerned about scalability.

For example, say the system has already downloaded the email attachments. If the attachments are PDF documents, the system needs to split each PDF into individual pages, then convert each page into thumbnails of several sizes, etc. I plan to have a cron job check (say, every minute) whether there are any PDF documents that need to be processed. Using a flagging system (e.g. “PDF Document Ready to be Processed”), I can query the database for all PDF documents that are flagged for processing. Once the PDF processing is done, the flag can be updated to “PDF Processing Done.”
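The polling check described above can be sketched with Python's built-in `sqlite3` module standing in for the real database. The table `pdf_document`, its `status` column and the `'ready'`/`'done'` values are assumptions for illustration. This naive version only reads and updates the flag; on its own it does nothing to stop two overlapping cron runs from picking up the same row.

```python
import sqlite3


def make_demo_db():
    """In-memory stand-in for the real database (hypothetical table and columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pdf_document ("
        " id INTEGER PRIMARY KEY,"
        " filename TEXT NOT NULL,"
        " status TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO pdf_document (filename, status) VALUES (?, ?)",
        [("a.pdf", "ready"), ("b.pdf", "done"), ("c.pdf", "ready")],
    )
    conn.commit()
    return conn


def find_ready_pdfs(conn):
    """The check each cron run would start with: ids of PDFs flagged as ready."""
    rows = conn.execute(
        "SELECT id FROM pdf_document WHERE status = 'ready' ORDER BY id"
    )
    return [row[0] for row in rows]


def mark_done(conn, doc_id):
    """Update the flag once the split/thumbnail work for one PDF has finished."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE pdf_document SET status = 'done' WHERE id = ?", (doc_id,)
        )
```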

However, since processing each PDF document is very time-consuming, I am concerned that when the next cron job runs, it will also try to process the PDFs that the previous cron job is still working on.

A possible solution is to immediately flag the PDF documents with “PDF Document Currently Being Processed.” That way, when the next cron job is executed, it will exclude the ones already being processed.
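The “currently being processed” flag only helps if setting it is atomic. One common way to get that, shown here as a hedged Python/`sqlite3` sketch with hypothetical table and status names, is to put the old status in the `WHERE` clause of the `UPDATE` and check how many rows changed, instead of running a separate SELECT followed by an UPDATE.

```python
import sqlite3


def make_demo_db():
    """In-memory stand-in with hypothetical table/column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pdf_document (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO pdf_document (status) VALUES (?)", [("ready",), ("ready",)]
    )
    conn.commit()
    return conn


def try_claim(conn, doc_id):
    """Flip one PDF from 'ready' to 'processing' in a single statement.

    The old status is re-checked inside the UPDATE itself, so if two cron runs
    race for the same row, only one of them changes it (rowcount == 1).
    """
    with conn:
        cur = conn.execute(
            "UPDATE pdf_document SET status = 'processing' "
            "WHERE id = ? AND status = 'ready'",
            (doc_id,),
        )
    return cur.rowcount == 1


def claim_all_ready(conn):
    """One cron run: claim every ready PDF it can and return the claimed ids."""
    ready = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM pdf_document WHERE status = 'ready' ORDER BY id"
        )
    ]  # fully read before issuing any UPDATEs
    return [doc_id for doc_id in ready if try_claim(conn, doc_id)]
```

A cron run would only process the ids returned by `claim_all_ready`; an overlapping run sees those rows as `'processing'` and skips them.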

Thus, each step in the workflow will probably have 3 flags:

  1. PDF Document Ready to be Processed
  2. PDF Document Currently Being Processed
  3. PDF Processing Done

Same for QA:

  1. Document Ready for QA
  2. Document Currently Being QA'd
  3. Document QA Done
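Both lists have the same shape (ready, in progress, done), so rather than defining a separate set of flags for every step, each document could carry a (stage, status) pair. A minimal Python sketch of that idea; `Status`, `STAGES` and `advance` are hypothetical names, and only the two stages from the question are listed.

```python
from enum import Enum


class Status(Enum):
    READY = "ready"
    IN_PROGRESS = "in_progress"
    DONE = "done"


# Hypothetical stage names in workflow order; every stage reuses the same three statuses.
STAGES = ["pdf_processing", "qa"]


def advance(stage, status):
    """Return the (stage, status) pair that follows the given one."""
    if status is Status.READY:
        return stage, Status.IN_PROGRESS
    if status is Status.IN_PROGRESS:
        return stage, Status.DONE
    # status is DONE: hand off to the next stage, or stay finished at the last one
    position = STAGES.index(stage)
    if position + 1 < len(STAGES):
        return STAGES[position + 1], Status.READY
    return stage, Status.DONE
```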

Is this a good approach? Is there a better one? Should these flags be a single column of the “PDF Document” table in the database, or should they live in their own table (especially if a document can have multiple flags set)?
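One possible answer to the column-versus-table question, sketched as a SQLite schema driven from Python: a separate `document_state` table with one row per document per stage, so a document can have several stage statuses at the same time. All table, column and function names here are assumptions, not an established design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE document (
    id       INTEGER PRIMARY KEY,
    filename TEXT NOT NULL
);
-- One row per document per workflow stage, so several flags can coexist.
CREATE TABLE document_state (
    document_id INTEGER NOT NULL REFERENCES document(id),
    stage       TEXT    NOT NULL,
    status      TEXT    NOT NULL
                CHECK (status IN ('ready', 'in_progress', 'done')),
    updated_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (document_id, stage)
);
"""


def connect():
    """Open an in-memory database with the hypothetical schema above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def set_status(conn, document_id, stage, status):
    """Insert or overwrite the status of one document at one stage."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO document_state (document_id, stage, status) "
            "VALUES (?, ?, ?)",
            (document_id, stage, status),
        )


def documents_in(conn, stage, status):
    """Ids of documents whose `stage` currently has `status`, e.g. ready for QA."""
    rows = conn.execute(
        "SELECT document_id FROM document_state "
        "WHERE stage = ? AND status = ? ORDER BY document_id",
        (stage, status),
    )
    return [row[0] for row in rows]
```

If a document is only ever in one stage at a time, a single status column on the document table is simpler; the separate table earns its keep when several flags must coexist, as the question anticipates.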

I’d like to solicit suggestions on how to implement such a system.

1 Answer

Editorial Team
Answered: May 15, 2026 at 12:07 pm

To address your concern about concurrent processing of the same document, there are several scheduler packages that can help you manage this. Quartz (http://www.quartz-scheduler.org/) is one I’ve used with great success.

For the workflow itself, I’d have three states: received, queued, and processed (similar to what you suggest).

I’d have a scheduled recurring job that polls the database for received PDFs and, for each one, queues a processing job and marks the PDF as queued. If both happen in the same transaction and you use optimistic locking, there is no risk that another job will come along and re-read the PDF as received.
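The recurring job described above can be sketched as follows, using Python's `sqlite3` rather than Quartz, with hypothetical `pdf` and `job_queue` tables. The `version` column implements optimistic locking: the job remembers the version it read, and the `UPDATE` only succeeds if the row still has that version. Changing the state and inserting the job happen in one transaction, so they commit or roll back together.

```python
import sqlite3


def make_demo_db():
    """In-memory stand-in with hypothetical `pdf` and `job_queue` tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pdf ("
        " id INTEGER PRIMARY KEY,"
        " state TEXT NOT NULL,"  # 'received' -> 'queued' -> 'processed'
        " version INTEGER NOT NULL DEFAULT 0)"  # bumped on every change
    )
    conn.execute(
        "CREATE TABLE job_queue (id INTEGER PRIMARY KEY, pdf_id INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO pdf (state) VALUES (?)",
        [("received",), ("received",), ("processed",)],
    )
    conn.commit()
    return conn


def try_queue(conn, pdf_id, expected_version):
    """Mark one PDF as queued and enqueue its job in a single transaction.

    The UPDATE only matches if the row is still 'received' and its version is
    unchanged since we read it; otherwise another job got there first.
    """
    with conn:  # both statements commit together or roll back together
        cur = conn.execute(
            "UPDATE pdf SET state = 'queued', version = version + 1 "
            "WHERE id = ? AND version = ? AND state = 'received'",
            (pdf_id, expected_version),
        )
        if cur.rowcount != 1:
            return False  # stale read: skip this PDF
        conn.execute("INSERT INTO job_queue (pdf_id) VALUES (?)", (pdf_id,))
    return True


def poll_once(conn):
    """One run of the recurring job: queue every PDF currently 'received'."""
    candidates = conn.execute(
        "SELECT id, version FROM pdf WHERE state = 'received' ORDER BY id"
    ).fetchall()
    return [pdf_id for pdf_id, version in candidates if try_queue(conn, pdf_id, version)]
```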

Quartz uses a thread pool with many configuration options, and it is great for deferred, resource-intensive processing (I use it for image thumbnailing in a server setting).
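Quartz's thread pool is a Java feature. As a rough analogue only (this is not Quartz), Python's `concurrent.futures.ThreadPoolExecutor` shows the same idea of draining queued jobs through a fixed number of worker threads. `make_thumbnails` and `THUMBNAIL_SIZES` are hypothetical placeholders for the real per-page rendering.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical thumbnail widths; a real job would render page images here.
THUMBNAIL_SIZES = (64, 256, 1024)


def make_thumbnails(page_id):
    """Placeholder for the expensive per-page work; returns fake output names."""
    return [f"{page_id}@{size}px" for size in THUMBNAIL_SIZES]


def process_queued(page_ids, max_workers=4):
    """Run queued jobs on a bounded pool and collect results keyed by page id."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(make_thumbnails, pid): pid for pid in page_ids}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Capping `max_workers` keeps a burst of incoming documents from overwhelming the machine. For CPU-bound rendering in CPython, `ProcessPoolExecutor` offers the same interface and avoids contention on the global interpreter lock.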

Taking a step back, there are some great workflow packages in the Java world that can handle most of what you want to do, including the deferred PDF processing. Take a look at jBPM or Drools Flow; both are great, if complex, packages.

UPDATE: Drools Flow has been merged into jBPM. For this particular problem it may be a case of “killing a mosquito with a bazooka”, but it’s a great workflow package.
