The Archive Base

Editorial Team
Asked: May 15, 2026 (2026-05-15T12:07:11+00:00)

I’m working on a document management system. An example workflow would be something like this:

  1. A document is emailed to the system
  2. The system performs a number of preparatory actions on the document
  3. The document is presented to a user for further processing
  4. Afterwards, the document is sent to Quality Assurance
  5. Afterwards, the system performs a number of post-processing actions on the document
  6. The document is considered completely processed and is disseminated (e.g. emailed back to whoever emailed the document to the system, etc.)
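
One way to picture the workflow above is as an ordered list of stages that a document moves through one at a time. The sketch below is only an illustration: the stage names and the `next_stage` helper are hypothetical, chosen to mirror the six steps.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stage names mirroring the six workflow steps."""
    RECEIVED = 1
    PREPARED = 2
    USER_PROCESSED = 3
    QA_DONE = 4
    POST_PROCESSED = 5
    DISSEMINATED = 6


def next_stage(stage):
    """Return the stage that follows `stage`, or None when the workflow is complete."""
    members = list(Stage)  # Enum members iterate in definition order
    index = members.index(stage)
    return members[index + 1] if index + 1 < len(members) else None


print(next_stage(Stage.RECEIVED))
print(next_stage(Stage.DISSEMINATED))
```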

Since the volume of my input will vary (but will usually be high), I am very concerned about scalability.

For example, say the system has already downloaded the email attachments. If the attachments are PDF documents, the system needs to split each PDF into individual pages, then convert each page into thumbnails of multiple sizes, etc. I plan to have a cron job check (say, every minute) whether there are any PDF documents that need to be processed. Using a flagging system (e.g. “PDF Document Ready to be Processed”), I can query the database for all PDF documents that are flagged to be processed. Once the PDF processing is done, the flag can be updated to say “PDF Processing Done.”

However, since processing each PDF document is very time-consuming, I am concerned that when the next cron job runs, it will also try to process the PDFs that the previous cron job is still working on.

A possible solution is to immediately flag the PDF documents with “PDF Document Currently Being Processed.” That way, when the next cron job is executed, it will exclude the ones already being processed.

Thus, each step in the workflow will probably have 3 flags:

  1. PDF Document Ready to be Processed
  2. PDF Document Currently Being Processed
  3. PDF Processing Done

Same for QA:

  1. Document Ready for QA
  2. Document Currently Being QAd
  3. Document QA Done
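
The "Currently Being Processed" idea can be sketched as a conditional update: a row moves from "ready" to "processing" only if it is still "ready", so two overlapping cron runs cannot both claim it. This is a minimal sketch using SQLite from Python; the table name `pdf_document`, the `status` column, the short status strings, and the `claim_document` helper are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical short names for the three flags listed above.
READY, PROCESSING, DONE = "ready", "processing", "done"


def claim_document(conn, doc_id):
    """Move one document from READY to PROCESSING; return True if this caller claimed it."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE pdf_document SET status = ? WHERE id = ? AND status = ?",
            (PROCESSING, doc_id, READY),
        )
    # rowcount is 1 only when the row was still READY at update time.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pdf_document (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO pdf_document (id, status) VALUES (1, ?)", (READY,))
conn.commit()

first = claim_document(conn, 1)   # the first cron run claims the document
second = claim_document(conn, 1)  # a later run finds it already claimed
print(first, second)  # prints "True False"
```

Because the check and the change happen in a single statement, there is no window between "read the flag" and "set the flag" for a second job to slip into.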

Is this a good approach? Is there a better one? Should these flags be a single column of the “PDF Document” table in the database, or should they live in their own table (especially if a document can have multiple flags set)?
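
To make the two layouts in this question concrete, here is a rough sketch of both side by side: a single `status` column for the mutually exclusive processing state, and a separate table for flags that can coexist on one document. The table names, column names, and the example flags `urgent` and `needs_ocr` are hypothetical, and SQLite is used only because it runs without a server.

```python
import sqlite3

SCHEMA = """
-- Layout 1: one status column; a document is in exactly one state at a time.
CREATE TABLE pdf_document (
    id     INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'ready'
           CHECK (status IN ('ready', 'processing', 'done'))
);
-- Layout 2: a separate table; a document can carry several flags at once.
CREATE TABLE document_flag (
    document_id INTEGER NOT NULL REFERENCES pdf_document(id),
    flag        TEXT NOT NULL,
    PRIMARY KEY (document_id, flag)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO pdf_document (id) VALUES (1)")
conn.executemany(
    "INSERT INTO document_flag (document_id, flag) VALUES (?, ?)",
    [(1, "urgent"), (1, "needs_ocr")],  # hypothetical flags
)
conn.commit()

flags = sorted(
    row[0]
    for row in conn.execute("SELECT flag FROM document_flag WHERE document_id = 1")
)
print(flags)
```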

I’d like to solicit suggestions on how to implement such a system.

1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 12:07 pm

    To address your concern about concurrent processing of the same document, you can use one of the many scheduler packages available. Quartz (http://www.quartz-scheduler.org/) is one I’ve used with great success.

    To address your problem, I’d use three states: received, queued, and processed (similar to what you suggest).

    I’d have a scheduled recurring job that polls the database for received PDFs and, for each one, queues a processing job and marks the PDF as queued. If you ensure both steps happen in the same transaction and use optimistic locking, there is no risk that another job could come along and re-read the PDF as received.
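
The polling job with optimistic locking might look roughly like the following. This is a sketch under stated assumptions, not Quartz code: it uses SQLite from Python, and the `pdf` table, its `version` column, and the `mark_queued` and `poll_once` helpers are hypothetical names. Each update succeeds only if the row's version is still the one the poller read, so a stale poller changes nothing.

```python
import sqlite3


def mark_queued(conn, pdf_id, seen_version):
    """Flip one PDF to 'queued' only if nobody has changed it since it was read."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE pdf SET state = 'queued', version = version + 1 "
            "WHERE id = ? AND state = 'received' AND version = ?",
            (pdf_id, seen_version),
        )
        # A real system would hand the processing job to the scheduler here,
        # inside the same transaction.
    return cur.rowcount == 1


def poll_once(conn):
    """Return the ids of the received PDFs that this poll managed to queue."""
    rows = conn.execute(
        "SELECT id, version FROM pdf WHERE state = 'received' ORDER BY id"
    ).fetchall()
    return [pdf_id for pdf_id, version in rows if mark_queued(conn, pdf_id, version)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pdf (id INTEGER PRIMARY KEY, state TEXT NOT NULL, "
    "version INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO pdf (id, state) VALUES (?, 'received')", [(1,), (2,)])
conn.commit()

first_poll = poll_once(conn)     # queues both PDFs
second_poll = poll_once(conn)    # finds nothing left in 'received'
stale = mark_queued(conn, 1, 0)  # a poller holding the old version loses
print(first_poll, second_poll, stale)  # prints "[1, 2] [] False"
```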

    Quartz uses a thread pool with many configuration options and is great for deferred, resource-intensive processing (I use it for image thumbnailing in a server setting).
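
To illustrate the thread-pool idea without Quartz itself, here is a small stand-in using Python's standard `concurrent.futures` module. It is not Quartz's API; `make_thumbnails`, the page ids, and the thumbnail widths are hypothetical placeholders for the real, expensive work.

```python
from concurrent.futures import ThreadPoolExecutor

THUMBNAIL_WIDTHS = (64, 128, 256)  # hypothetical sizes


def make_thumbnails(page_id):
    """Placeholder for the expensive work; returns the file names it would produce."""
    return [f"{page_id}_{width}.png" for width in THUMBNAIL_WIDTHS]


def process_pages(page_ids, max_workers=4):
    """Run make_thumbnails for every page on a bounded pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as page_ids.
        return dict(zip(page_ids, pool.map(make_thumbnails, page_ids)))


results = process_pages(["doc1_p1", "doc1_p2"])
print(results["doc1_p1"])
```

Capping `max_workers` is what keeps a burst of incoming documents from starting an unbounded number of conversions at once.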

    To take a step back, there are some great workflow packages in the Java world that can handle most of what you want to do, including the deferred PDF processing. Take a look at jBPM or Drools Flow; these are two great, if complex, packages.

    UPDATE: Drools Flow has been merged into jBPM. For this particular problem it may be a bit of a “killing a mosquito with a bazooka” situation, but it’s a great workflow package.

© 2021 The Archive Base. All Rights Reserved