The Archive Base

Editorial Team
Asked: June 4, 2026

What I need to do when the exception is caught is move the failing PDFs to the folder specified in my code (the 'fail' folder).

package extractInfoFromPDF;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.itextpdf.text.exceptions.InvalidPdfException;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class Test {
    static FileWriter output = null;
    public static void main(String[] args) throws IOException {



        File file = new File("c:/write.txt");
        output = new FileWriter(file);

        PdfReader pdfArticle = null;

        Pattern pattern = Pattern.compile("\\b(10[.][0-9]{4,}(?:[.][0-9]+)*/(?:(?![\"&\\'<>])\\S)+)\\b", Pattern.CASE_INSENSITIVE);

        File ArticleFolder = new File("D:\\AI\\failed1");
        File[] listOfArticles = ArticleFolder.listFiles();
        int count = 0;

        StringBuffer s = null;

        for (File article : listOfArticles) {

            if(!article.getName().contains("article.fulltext.000001")){
                continue;
            }

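            try {
                // Open and parse inside the try block so a corrupt PDF is also caught here.
                pdfArticle = new PdfReader(article.getAbsolutePath());
                s = new StringBuffer(PdfTextExtractor.getTextFromPage(pdfArticle, 1));
            } catch (InvalidPdfException | StringIndexOutOfBoundsException | ArrayIndexOutOfBoundsException e) {
                // Copy into the fail folder under the original file name, then remove the source.
                copyFile(article, new File("D:\\AI\\fail", article.getName()));
                delete(article);
                // Skip DOI matching: no text was extracted for this file.
                continue;
            }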

            // System.out.println(s);
            Matcher m = pattern.matcher(s);
            String DOI = null;
            if (m.find()) {
                DOI = m.group();

            }
            if (DOI == null) {
                Pattern pattern2 = Pattern.compile("(DOI:).*", Pattern.CASE_INSENSITIVE);
                Matcher m2 = pattern2.matcher(s);

                if (m2.find()) {
                    DOI = m2.group();
                    DOI=DOI.replaceAll("\\s+", "");
                    m = pattern.matcher(DOI);
                    if (m.find()) {
                        DOI = m.group();

                    }else{
                        DOI = "DOI-NOT-AVALIABLE";
                    }

                }else{
                    DOI = "DOI-NOT-AVALIABLE";
                }

            }
            count = count + 1;
            String d[]=DOI.split(" ");

            for(String d2 : d){
                if(d2.contains("10.")){
                    DOI=d2;
                }
            }

            DOI = DOI.replaceAll("(DOI:)(doi:)(\\s+)([\\.,;)]])", "").trim();
            System.out.println(count + "    TAN: " + article.getName() + "      "
            + DOI);
//if(DOI.matches(""[A-Z-a-z-0-7]"))

            output.write(count + "  TAN: " + article.getName() + "      " + DOI+"\n");

            // FileUtils.writeStringToFile(new File("write.txt"), count++
            // +"   TAN: "+article.getName()+"      "+DOI, "UTF-8");

        }

        output.close();


    }


    // Copies source to dest, creating dest if it does not exist yet.
    public static void copyFile(File source, File dest) throws IOException {
        if (!dest.exists()) {
            dest.createNewFile();
        }
        // try-with-resources closes both streams even if the copy fails.
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(dest)) {
            byte[] buf = new byte[1024];
            int len;
            while ((len = in.read(buf)) > 0) {
                out.write(buf, 0, len);
            }
        }
    }

    // Deletes a file, or a directory together with everything inside it.
    public static boolean delete(File resource) throws IOException {
        if (resource.isDirectory()) {
            File[] childFiles = resource.listFiles();
            if (childFiles != null) {
                for (File child : childFiles) {
                    delete(child);
                }
            }
        }
        return resource.delete();
    }


}

This is my full code. The line below is where I get the exception:

s = new StringBuffer(PdfTextExtractor.getTextFromPage(pdfArticle, 1));

Here I get "String index out of range" (-1, or sometimes 0) for a few hundred PDFs out of many thousands. I searched Google and found no solution. The exception below is thrown from inside iText, not from my code. For some PDFs I also get an ArrayIndexOutOfBoundsException (sometimes 397, sometimes 286, or some other three-digit number) at the same line in my code (PdfTextExtractor.getTextFromPage).

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.charAt(String.java:695)
    at com.itextpdf.text.pdf.parser.LocationTextExtractionStrategy.getResultantText(LocationTextExtractionStrategy.java:121)
    at com.itextpdf.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:73)
    at com.itextpdf.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:88)
    at extractInfoFromPDF.Test.main(Test.java:41)

For the ArrayIndexOutOfBoundsException, with another PDF, I get this exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 397
    at com.itextpdf.text.pdf.CMapAwareDocumentFont.getWidth(CMapAwareDocumentFont.java:182)
    at com.itextpdf.text.pdf.parser.TextRenderInfo.getStringWidth(TextRenderInfo.java:210)
    at com.itextpdf.text.pdf.parser.TextRenderInfo.getUnscaledWidth(TextRenderInfo.java:113)
    at com.itextpdf.text.pdf.parser.TextRenderInfo.getUnscaledBaselineWithOffset(TextRenderInfo.java:147)
    at com.itextpdf.text.pdf.parser.TextRenderInfo.getBaseline(TextRenderInfo.java:122)
    at com.itextpdf.text.pdf.parser.LocationTextExtractionStrategy.renderText(LocationTextExtractionStrategy.java:154)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.displayPdfString(PdfContentStreamProcessor.java:303)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.access$2500(PdfContentStreamProcessor.java:74)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor$ShowText.invoke(PdfContentStreamProcessor.java:496)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.invokeOperator(PdfContentStreamProcessor.java:246)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.processContent(PdfContentStreamProcessor.java:366)
    at com.itextpdf.text.pdf.parser.PdfReaderContentParser.processContent(PdfReaderContentParser.java:79)
    at com.itextpdf.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:73)
    at com.itextpdf.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:88)
    at extractInfoFromPDF.Test.main(Test.java:41)
1 Answer

Editorial Team
Added an answer on June 4, 2026 at 7:52 pm

After many attempts I found the cause of the exception: it is a bug in the iText API in version 5.1. When I rebuilt my application with the latest version, 5.2, I no longer got any exception and everything works fine 🙂
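For PDFs that still fail after upgrading, the "move to the fail folder" step from the question could be sketched roughly as below. This is only a sketch under assumptions: the class FailedPdfMover, the PageTextExtractor interface, and both method names are hypothetical and not part of iText. The interface stands in for the PdfTextExtractor.getTextFromPage call so the example runs without the library. It uses java.nio.file.Files.move instead of the copy-then-delete pair in the question, and it catches IOException and RuntimeException broadly, which is wider than the three exception types in the original catch clause.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class FailedPdfMover {

    /** Hypothetical stand-in for the iText text extraction call. */
    interface PageTextExtractor {
        String firstPageText(Path pdf) throws IOException;
    }

    /** Moves source into failDir under its original file name and returns the new path. */
    static Path moveToFailFolder(Path source, Path failDir) throws IOException {
        // Create the fail folder (and any missing parents) if it is not there yet.
        Files.createDirectories(failDir);
        Path target = failDir.resolve(source.getFileName());
        // Overwrite an older copy with the same name, if one exists.
        return Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /**
     * Returns the extracted text, or null when extraction failed
     * and the PDF was moved to failDir instead.
     */
    static String extractOrQuarantine(Path pdf, Path failDir, PageTextExtractor extractor)
            throws IOException {
        try {
            return extractor.firstPageText(pdf);
        } catch (IOException | RuntimeException e) {
            // Extraction failed: set the file aside so the caller can skip it.
            moveToFailFolder(pdf, failDir);
            return null;
        }
    }
}
```

In the loop from the question, a null return would be the signal to `continue` with the next article instead of running the DOI matching. If a reader still holds the file open, it may need to be closed before the move, particularly on Windows.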
