How do I move a PDF to a 'fail' folder when text extraction throws an exception?

Question

Asked: June 4, 2026 (2026-06-04T19:52:00+00:00)

When the exception is caught, I need to move the offending PDF to the folder I specified in my code (the 'fail' folder).

package extractInfoFromPDF;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.itextpdf.text.exceptions.InvalidPdfException;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class Test {
    static FileWriter output = null;
    public static void main(String[] args) throws IOException {



        File file = new File("c:/write.txt");
        output = new FileWriter(file);

        PdfReader pdfArticle = null;

        Pattern pattern = Pattern.compile("\\b(10[.][0-9]{4,}(?:[.][0-9]+)*/(?:(?![\"&\\'<>])\\S)+)\\b", Pattern.CASE_INSENSITIVE);

        File ArticleFolder = new File("D:\\AI\\failed1");
        File[] listOfArticles = ArticleFolder.listFiles();
        int count = 0;

        StringBuffer s = null;

        for (File article : listOfArticles) {

            if(!article.getName().contains("article.fulltext.000001")){
                continue;
            }

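            pdfArticle = null;
            try {
                // Open the reader inside the try block: InvalidPdfException is thrown by the constructor.
                pdfArticle = new PdfReader(article.getAbsolutePath());
                s = new StringBuffer(PdfTextExtractor.getTextFromPage(pdfArticle, 1));
            } catch (InvalidPdfException | StringIndexOutOfBoundsException | ArrayIndexOutOfBoundsException e) {
                // Close the reader before touching the file, in case it still holds a handle.
                if (pdfArticle != null) {
                    pdfArticle.close();
                }
                // Copy into the fail folder under the original file name, then remove the source.
                copyFile(article, new File("D:\\AI\\fail", article.getName()));
                delete(article);
                // Skip DOI extraction for this file: no text was extracted for it.
                continue;
            }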

            // System.out.println(s);
            Matcher m = pattern.matcher(s);
            String DOI = null;
            if (m.find()) {
                DOI = m.group();

            }
            if (DOI == null) {
                Pattern pattern2 = Pattern.compile("(DOI:).*", Pattern.CASE_INSENSITIVE);
                Matcher m2 = pattern2.matcher(s);

                if (m2.find()) {
                    DOI = m2.group();
                    DOI=DOI.replaceAll("\\s+", "");
                    m = pattern.matcher(DOI);
                    if (m.find()) {
                        DOI = m.group();

                    }else{
                        DOI = "DOI-NOT-AVALIABLE";
                    }

                }else{
                    DOI = "DOI-NOT-AVALIABLE";
                }

            }
            count = count + 1;
            String d[]=DOI.split(" ");

            for(String d2 : d){
                if(d2.contains("10.")){
                    DOI=d2;
                }
            }

            // Strip a leading "DOI:" label and any trailing punctuation (case-insensitive).
            DOI = DOI.replaceAll("(?i)^doi:\\s*|[.,;)\\]]+$", "").trim();
            System.out.println(count + "    TAN: " + article.getName() + "      "
            + DOI);
//if(DOI.matches(""[A-Z-a-z-0-7]"))

            output.write(count + "  TAN: " + article.getName() + "      " + DOI+"\n");

            // FileUtils.writeStringToFile(new File("write.txt"), count++
            // +"   TAN: "+article.getName()+"      "+DOI, "UTF-8");

        }

        output.close();


    }


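    public static void copyFile(File source, File dest) throws IOException {
        // Make sure the target folder exists before writing into it.
        File parent = dest.getParentFile();
        if (parent != null && !parent.exists() && !parent.mkdirs()) {
            throw new IOException("Could not create folder " + parent);
        }

        // try-with-resources closes both streams even if the copy fails part-way.
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(dest)) {
            byte[] buf = new byte[1024];
            int len;
            while ((len = in.read(buf)) > 0) {
                out.write(buf, 0, len);
            }
        }
    }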

    public static boolean delete(File resource) throws IOException {
        // Delete the children first so that a directory is empty before it is removed.
        if (resource.isDirectory()) {
            File[] childFiles = resource.listFiles();
            for (File child : childFiles) {
                delete(child);
            }
        }
        return resource.delete();
    }


}

This is my full code. The line below is the one that throws the exception.

s = new StringBuffer(PdfTextExtractor.getTextFromPage(pdfArticle, 1));

For a few hundred PDFs out of many thousands, this line throws a StringIndexOutOfBoundsException ("String index out of range", with index -1 or sometimes 0). I searched online and found no solution. As the stack trace below shows, the exception originates inside iText, not in my code. For some other PDFs, the same call (PdfTextExtractor.getTextFromPage) throws an ArrayIndexOutOfBoundsException with a three-digit index such as 397 or 286.

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.charAt(String.java:695)
    at com.itextpdf.text.pdf.parser.LocationTextExtractionStrategy.getResultantText(LocationTextExtractionStrategy.java:121)
    at com.itextpdf.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:73)
    at com.itextpdf.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:88)
    at extractInfoFromPDF.Test.main(Test.java:41)

For another PDF I get this ArrayIndexOutOfBoundsException:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 397
    at com.itextpdf.text.pdf.CMapAwareDocumentFont.getWidth(CMapAwareDocumentFont.java:182)
    at com.itextpdf.text.pdf.parser.TextRenderInfo.getStringWidth(TextRenderInfo.java:210)
    at com.itextpdf.text.pdf.parser.TextRenderInfo.getUnscaledWidth(TextRenderInfo.java:113)
    at com.itextpdf.text.pdf.parser.TextRenderInfo.getUnscaledBaselineWithOffset(TextRenderInfo.java:147)
    at com.itextpdf.text.pdf.parser.TextRenderInfo.getBaseline(TextRenderInfo.java:122)
    at com.itextpdf.text.pdf.parser.LocationTextExtractionStrategy.renderText(LocationTextExtractionStrategy.java:154)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.displayPdfString(PdfContentStreamProcessor.java:303)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.access$2500(PdfContentStreamProcessor.java:74)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor$ShowText.invoke(PdfContentStreamProcessor.java:496)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.invokeOperator(PdfContentStreamProcessor.java:246)
    at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.processContent(PdfContentStreamProcessor.java:366)
    at com.itextpdf.text.pdf.parser.PdfReaderContentParser.processContent(PdfReaderContentParser.java:79)
    at com.itextpdf.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:73)
    at com.itextpdf.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:88)
    at extractInfoFromPDF.Test.main(Test.java:41)
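
Both traces end in unchecked exceptions thrown from inside the library, so one way to keep a batch run alive is to wrap each extraction call and treat any failure as "no text". The following is only a sketch under assumptions: the class SafeExtraction and the method tryExtract are hypothetical names, the Callable stands in for the real PdfTextExtractor.getTextFromPage call, and it needs Java 8 or later for lambdas and Optional.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

public class SafeExtraction {

    /**
     * Runs the given extraction step and returns its text, or an empty
     * Optional if the step throws. Catching Exception covers checked
     * IOExceptions as well as the unchecked index exceptions seen in
     * the stack traces above.
     */
    static Optional<String> tryExtract(Callable<String> extractor) {
        try {
            return Optional.ofNullable(extractor.call());
        } catch (Exception e) {
            System.err.println("Extraction failed: " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a PDF that extracts cleanly.
        Optional<String> ok = tryExtract(() -> "DOI: 10.1000/182");

        // Stand-in for a PDF that triggers a StringIndexOutOfBoundsException.
        Optional<String> bad = tryExtract(() -> String.valueOf("abc".charAt(-1)));

        System.out.println("ok present:  " + ok.isPresent());
        System.out.println("bad present: " + bad.isPresent());
    }
}
```

In the real loop the caller would check isPresent() and, when it is false, move the file aside and continue with the next one.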

1 Answer

Editorial Team · Answer 1 · 2026-06-04T19:52:02+00:00

After a lot of trial and error I found the cause of the exceptions: a bug in iText 5.1. When I rebuilt my application against the latest version, 5.2, the exceptions no longer occurred and everything works fine 🙂
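
For the original goal of moving a failing PDF into the 'fail' folder, the standard library can do the move in one call instead of a manual copy followed by a delete. This is a minimal sketch, not the code from the question: FailFolderMover and moveToFailFolder are hypothetical names, and the demo uses a temporary directory rather than the D:\AI paths. It assumes Java 7 or later for java.nio.file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class FailFolderMover {

    /**
     * Moves a file into failDir under its original name and returns the
     * new location. The folder is created if it does not exist yet, and
     * an existing file with the same name is overwritten.
     */
    static Path moveToFailFolder(Path pdf, Path failDir) throws IOException {
        Files.createDirectories(failDir);
        Path target = failDir.resolve(pdf.getFileName());
        return Files.move(pdf, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Demo data in a temporary directory; real code would pass the article path.
        Path workDir = Files.createTempDirectory("pdf-demo");
        Path pdf = Files.write(workDir.resolve("broken.pdf"), new byte[] {1, 2, 3});

        Path moved = moveToFailFolder(pdf, workDir.resolve("fail"));
        System.out.println("Moved to " + moved);
    }
}
```

Inside the catch block this would be called as moveToFailFolder(article.toPath(), Paths.get("D:\\AI\\fail")), after closing the PdfReader so that no handle on the file remains open.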
