In Adobe Acrobat 9, how do I apply a regex to search the text of a pdf and/or index of a series of pdfs?
There are 200 or so keywords that I need to search for. I could do it manually through each index, but I’ll have to do this several times for a lot of indexes/PDFs and want to automate as much as possible.
It’s easy enough to search the text of a pdf from the JavaScript console, say for the word ‘the’:
search.query("the","ActiveDoc");
And having a regex interact with a string you’ve written in the console is no problem either:
var string = "I hope this works9867";
var regex = /\d/;
if (regex.test(string)) {
    app.alert("win", 2);
}
But I can’t get a regex to apply to the OCR-ed text of a pdf and have found no guides on how to do so thus far. It seemed logical that either
var regex=/\d/
search.query(regex,"ActiveDoc");
or some close variant on
search.query(/\d/,"ActiveDoc");
would work, but no dice. Is there a way to do this? Ideally the method would work for indexes and pdfs alike.
You can’t use regular expressions with search.query. There are two ways you can make searching easier:
Method #1: Put everything you want to search for in an array and pass that to search.query. You could also change the way you want to search by doing something like this:
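A minimal sketch of Method #1, not the original snippet. It assumes search.query takes a query string, so the keyword array is joined with spaces, and it sets search.wordMatching to "MatchAnyWord" so that any one of the words counts as a hit. The helper names buildAnyWordQuery and queryKeywords are hypothetical, and the search object is passed in as a parameter so the helper can be tried outside Acrobat:

```javascript
// Hypothetical helper: turn a keyword array into one space-separated query.
// Multi-word keywords will be split into separate words by this approach.
function buildAnyWordQuery(keywords) {
    return keywords.join(" ");
}

// Hypothetical helper: `searchApi` is expected to be Acrobat's global
// `search` object; `where` is a scope such as "ActiveDoc".
function queryKeywords(searchApi, keywords, where) {
    // Ask Acrobat to match any of the words rather than the whole phrase.
    searchApi.wordMatching = "MatchAnyWord";
    searchApi.matchCase = false;
    searchApi.query(buildAnyWordQuery(keywords), where);
}

// In the Acrobat JavaScript console (keywords here are placeholders):
// queryKeywords(search, ["alpha", "beta", "gamma"], "ActiveDoc");
```

The same call should work with a different scope string in place of "ActiveDoc" if you want to search indexes instead of the open document; check the API reference for the exact values your version accepts.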
For more examples of how to configure search.query, refer to the Adobe JavaScript API Reference.
Method #2: Extract the text out of the PDF document and perform a regex search on the string.
The following code loops through the entire document and makes a string of the words on each page and then searches for “Hello” inside the string.
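A minimal sketch of that loop, not the original snippet. It relies on the Doc methods getPageNumWords and getPageNthWord and the numPages property; the helper names extractText and findPagesMatching are hypothetical, and the document is passed in as a parameter (use `this` in the Acrobat console) so the helpers can be tried outside Acrobat:

```javascript
// Hypothetical helper: build one string per page from the page's words.
// `doc` is expected to be an Acrobat Doc object.
function extractText(doc) {
    var pages = [];
    for (var p = 0; p < doc.numPages; p++) {
        var words = [];
        var count = doc.getPageNumWords(p);
        for (var w = 0; w < count; w++) {
            // Third argument true strips punctuation and whitespace from the word.
            words.push(doc.getPageNthWord(p, w, true));
        }
        pages.push(words.join(" "));
    }
    return pages;
}

// Hypothetical helper: return the zero-based page numbers whose text matches.
function findPagesMatching(doc, regex) {
    var pages = extractText(doc);
    var hits = [];
    for (var p = 0; p < pages.length; p++) {
        regex.lastIndex = 0; // keep a /g regex from carrying state between pages
        if (regex.test(pages[p])) {
            hits.push(p);
        }
    }
    return hits;
}

// In the Acrobat JavaScript console:
// var hits = findPagesMatching(this, /Hello/);
// console.println("Matching pages (zero-based): " + hits.join(", "));
```

Because the words are joined with single spaces, a regex that depends on the original punctuation or line breaks will not match; pass false as the third argument to getPageNthWord if you need the punctuation kept.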