The Archive Base

Editorial Team
Asked: June 13, 2026 (2026-06-13T08:49:59+00:00)

I have a flow set up to recognize when a file is dropped into

I have a flow set up to recognize when a file is dropped into a directory. Next I need to run a Bash script that processes the file (fairly intensive processing). The script grabs a PDF, creates a temporary directory, breaks the PDF into separate PNG files, runs an OCR processor against each image, converts the result to single-page PDFs, then merges all of the PDFs into a single multi-page PDF with the text layer from the OCR.

The problem is that the Bash script chokes once 10 concurrent transformations have been triggered. Right now Mule ESB listens for new files, then triggers the script for each file, passing the appropriate parameters. Mule therefore performs only two tasks: listen, then trigger. We are going to have over 200 files in that directory that need to be queued for processing, preferably 5 at a time. How do I get Mule to limit the number of concurrent processes it triggers?

Below is my initial draft Flow:

<?xml version="1.0" encoding="UTF-8"?>

<mule xmlns:cxf="http://www.mulesoft.org/schema/mule/cxf" xmlns:scripting="http://www.mulesoft.org/schema/mule/scripting" xmlns:http="http://www.mulesoft.org/schema/mule/http" xmlns:file="http://www.mulesoft.org/schema/mule/file" xmlns="http://www.mulesoft.org/schema/mule/core" xmlns:doc="http://www.mulesoft.org/schema/mule/documentation" xmlns:spring="http://www.springframework.org/schema/beans" version="CE-3.3.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="
http://www.mulesoft.org/schema/mule/file http://www.mulesoft.org/schema/mule/file/current/mule-file.xsd 
http://www.mulesoft.org/schema/mule/scripting http://www.mulesoft.org/schema/mule/scripting/current/mule-scripting.xsd 
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-current.xsd 
http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd 
http://www.mulesoft.org/schema/mule/cxf http://www.mulesoft.org/schema/mule/cxf/current/mule-cxf.xsd 
http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd ">
    <configuration>
    <default-threading-profile doThreading="false"/>
  </configuration>

    <queued-asynchronous-processing-strategy name="limitThreads" maxThreads="2"/>

    <flow name="Poll_DirectoryFlow1" doc:name="Poll_DirectoryFlow1" processingStrategy="limitThreads">
        <file:inbound-endpoint path="/home/administrator/Downloads/Input" responseTimeout="10000" doc:name="File" pollingFrequency="5000" fileAge="5000">

        </file:inbound-endpoint>
        <scripting:component doc:name="Script">
            <scripting:script engine="Groovy">
                <property key="originalFilename" value="#[header:originalFilename]"/>
                <scripting:text><![CDATA[def filename = message.getInboundProperty('originalFilename')
                                                        println "$filename"
                                                        def directory = message.getInboundProperty('directory')
                                                        println "$directory"
                                                        "mkdir processed".execute()
                                                        def command = ["/home/administrator/ocr.sh", "$directory/$filename", "/home/administrator/Downloads/Output/$filename"]
                                                        println "$command"
                                                        def proc = "pwd".execute()
                                                        command.execute()
                                                        proc.waitFor()
                                                        println "${proc.in.text}"]]></scripting:text>
            </scripting:script>
        </scripting:component>
        <echo-component doc:name="Echo"/>        
    </flow>
</mule>

Here is the actual Bash script (gives some hints on what tools we are using):

#!/bin/bash

#Setting variables
PARAM=$#
TMPDIR=./split
INFILENAME=${1##*/}
OUTFILENAME=${2##*/}
echo "1 is $1"
echo "2 is $2"
echo "infilename is $INFILENAME"
echo "outfilename is $OUTFILENAME"

#Logging I/O filenames
echo "infile: $1" >> error.log
echo "outfile: $2" >> error.log

#If the temporary directory doesn't exist, make it
if [ ! -d "$TMPDIR" ];
then
    mkdir $TMPDIR
fi

#Check to see that the correct number of params have been passed.
if [[ $PARAM -lt 2 ]];
then
    echo "Usage: $0 source.pdf output.pdf"
    echo "output.pdf is the desired output file"
    echo "source.pdf is a file to be OCR'd"
    exit 1
fi

#Make sure the input file is a PDF
if [ "${1##*.}" == "pdf" ];
then
    multilayer=false

    #Check to see if the input file is a multi-layered pdf with searchable text
        if grep -Fl "Font" "$1"; then multilayer=true; fi

    #If it's not multi-layered, then perform the OCR
    if [ "$multilayer" == "false" ];
    then
        mkdir $TMPDIR/"$INFILENAME/"
        echo "making temporary directory $TMPDIR/$INFILENAME"
        #Split the PDF into single-page PDFs in a temporary directory
        pdftk "$1" burst output "$TMPDIR/$INFILENAME/pg_%04d.pdf"
        echo "burst output to $TMPDIR/$INFILENAME/pg_%04d.pdf"
        mv "$1" processed/
        for files in "$TMPDIR/$INFILENAME/"*
            do
            echo "$files"
                    filename=$(basename "$files")
                    filename="${filename%.*}"

            #Convert the pdf page into an image
                    gs -r300 -o "$TMPDIR/$INFILENAME/$filename.jpeg" -sDEVICE=jpeg "$TMPDIR/$INFILENAME/$filename.pdf"

            #Perform the OCR against the image
                    tesseract "$TMPDIR/$INFILENAME/$filename.jpeg" "$TMPDIR/$INFILENAME/$filename" hocr

            #Combine the OCR'd image and OCR'd text into a multi-layer PDF file of that page
                    hocr2pdf -i "$TMPDIR/$INFILENAME/$filename.jpeg" -o "$TMPDIR/$INFILENAME/$filename.pdf" < "$TMPDIR/$INFILENAME/$filename.html"
                    compressed="$filename-compressed.pdf"

            #Compress the multi-layered PDF of the page
                    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$TMPDIR/$INFILENAME/$compressed" "$TMPDIR/$INFILENAME/$filename.pdf"
                    mv "$TMPDIR/$INFILENAME/$compressed" "$TMPDIR/$INFILENAME/$filename.pdf"
            done

        #Concatenate all of the multi-layer PDF pages into a single PDF file
        pdftk "$TMPDIR/$INFILENAME/"*.pdf cat output "$OUTFILENAME"
        compressed="$OUTFILENAME-compressed.pdf"

        #Compress the multi-layered PDF
        gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$compressed" "$OUTFILENAME"
        mv "$compressed" "$2"
        rm -rf "$TMPDIR/$INFILENAME"
    else
        echo "The input file is multi-layered"
        mv "$1" "$2"
    fi
else
    echo "Please enter a valid input pdf file"
    exit 2
fi
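
Because the flow may start several copies of this script at once, the shared relative `./split` directory and the check-then-`mkdir` sequence could collide between runs (for instance, two input files with the same name, or two runs creating `./split` at the same moment). Whether that contributes to the failures described above is not established here. A minimal sketch of one way to isolate runs, assuming `mktemp` is available and `/tmp` is writable; `make_workdir` and the `ocr.XXXXXX` template are illustrative names, not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper: create a private scratch directory for one run.
make_workdir() {
    # mktemp -d creates a uniquely named directory and prints its path
    mktemp -d /tmp/ocr.XXXXXX
}

workdir=$(make_workdir) || exit 1
# Remove the scratch directory when the script exits, even on error
trap 'rm -rf "$workdir"' EXIT
echo "scratch directory: $workdir"
```

Each run would then burst and OCR its pages under its own `$workdir` instead of `$TMPDIR/$INFILENAME`.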

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 8:50 am

    @genjosanzo…you put me on the right track thinking about the processing strategy. Here is the solution that ended up working:

    <?xml version="1.0" encoding="UTF-8"?>
    
    <mule xmlns:cxf="http://www.mulesoft.org/schema/mule/cxf"
        xmlns:scripting="http://www.mulesoft.org/schema/mule/scripting"
        xmlns:http="http://www.mulesoft.org/schema/mule/http" xmlns:file="http://www.mulesoft.org/schema/mule/file"
        xmlns="http://www.mulesoft.org/schema/mule/core" xmlns:doc="http://www.mulesoft.org/schema/mule/documentation"
        xmlns:spring="http://www.springframework.org/schema/beans" version="CE-3.3.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="
    http://www.mulesoft.org/schema/mule/file http://www.mulesoft.org/schema/mule/file/current/mule-file.xsd 
    http://www.mulesoft.org/schema/mule/scripting http://www.mulesoft.org/schema/mule/scripting/current/mule-scripting.xsd 
    http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-current.xsd 
    http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd 
    http://www.mulesoft.org/schema/mule/cxf http://www.mulesoft.org/schema/mule/cxf/current/mule-cxf.xsd 
    http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd ">
    
        <queued-asynchronous-processing-strategy
            name="limitThreads" maxThreads="7"
            doc:name="Queued Asynchronous Processing Strategy" />
        <flow name="Poll_DirectoryFlow1" doc:name="Poll_DirectoryFlow1"
            processingStrategy="limitThreads">
            <file:inbound-endpoint path="/home/administrator/Downloads/Input"
                responseTimeout="10000" doc:name="File" pollingFrequency="60000"
                fileAge="5000">
                <file:filename-regex-filter pattern="^.*\.(pdf)$"
                    caseSensitive="false" />
            </file:inbound-endpoint>
            <scripting:component doc:name="Script">
                <scripting:script engine="Groovy">
                    <scripting:text><![CDATA[def filename = message.getInboundProperty('originalFilename')
                    println "$filename"
                    def directory = message.getInboundProperty('directory')
                    println "$directory"
                    "mkdir processed".execute()
                    def command = ["/home/administrator/ocr.sh", "$directory/$filename", "/home/administrator/Downloads/Output/$filename"]
                    println "$command"
                    def cmd = command.execute()
                    cmd.waitFor()
                    println "$filename has completed processing"]]></scripting:text>
                </scripting:script>
            </scripting:component>
            <echo-component doc:name="Echo"/>
        </flow>
    </mule>
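
A thread cap such as `maxThreads="7"` only bounds the number of running scripts if each thread blocks until its script exits, which is what `cmd.waitFor()` does in the flow above. The draft in the question waited on the `pwd` process instead, so the OCR command was started without being waited on. For comparison, the same bounded-concurrency idea can be sketched in plain Bash with `xargs -P`. This is an alternative outside Mule, not what the answer used, and `run_limited` is a hypothetical helper name:

```shell
#!/bin/bash
# run_limited MAX CMD [ARGS...]
# Hypothetical helper: reads NUL-delimited items on stdin and runs
# CMD once per item, keeping at most MAX invocations alive at a time.
run_limited() {
    local max="$1"
    shift
    # -0: items are NUL-delimited; -n 1: one item per invocation;
    # -P: upper bound on simultaneous invocations
    xargs -0 -n 1 -P "$max" "$@"
}

# Demo with placeholder names; echo stands in for the real OCR script.
# Output order may vary because jobs run in parallel.
printf '%s\0' a.pdf b.pdf c.pdf | run_limited 2 echo processing
```

With the paths from the question, an invocation might look like `find /home/administrator/Downloads/Input -name '*.pdf' -print0 | run_limited 5 sh -c '/home/administrator/ocr.sh "$1" "/home/administrator/Downloads/Output/${1##*/}"' _`, which would match the goal of processing 5 files at a time.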
    
© 2021 The Archive Base. All Rights Reserved