The Archive Base

Editorial Team
Asked: June 4, 2026

I have been searching for this in forums and on Stack Overflow; it must be here somewhere, but I couldn’t find it.
I’m on a Mac, using the terminal to run a shell script that renames some PDF files based on their content.

I have a directory full of PDFs that I’m exporting to text files using the open-source PDFBox. Each resulting file has the same name as the PDF file but ends in .txt. I created the text files so that I could find a string inside each file with the format Page xx Question xx, for example Page 43 Question 2. Given this example, I would like to rename the PDF file to pg43_q2.pdf.

I think the regular expression I want is this:
/Page\s+(\d+)\s+Question\s+(\d+)/
but I’m not sure how to read the two captured numbers and save them into a string that I can use as a filename.

The script I have so far is:

#!/bin/sh
PDF_FILE_PATH=$1
echo "Converting pdfs at $PDF_FILE_PATH"

find "$PDF_FILE_PATH" -name '*.pdf' -print0 | while IFS= read -r -d '' filename; do
   echo $filename
   java -jar pdfbox-app-1.6.0.jar ExtractText "$filename" "$filename.txt"
   NEWNAME=$(sed -n -e '/Page/s/Page\s+\(\d+\)\s+Question\s+\(\d+\).*$/pg\1_q\2/p' "$filename.txt")
   echo "Renaming pdf $filename to $NEWNAME"
   # I would do this next but the $NEWNAME is empty
   # mv "$filename" "$PDF_FILE_PATH/$NEWNAME"
done

… but the sed command is not putting anything into the NEWNAME variable.

I’m not particularly attached to sed; any suggestions would be appreciated.

Latest edit to script uses the following sed command:

newname=$(sed -nE -e '/Page/s/^.*Page[[:blank:]]+([0-9]+)[[:blank:]]+Question[[:blank:]]+([0-9]+).*$/pg\1_q\2.pdf/p' "$filename.txt")

This works about 50% of the time, but the rest of the time the newname variable is empty when I go to rename the file.

The third line of a converted file that does work:

Unit 2 Review Page 257 Question 9  a)  12 (2)(2)(3)

The third line of a converted file that doesn’t work:

Unit 2 Review Page 258 Question 16  a)  (a – 4)(a + 7) = a(a + 7) – 4(a + 7)                             = a2 + 7a – 4a – 28                              = a2 + 3a – 28   b)  (2x + 3)(5x + 2) = 2x(5x + 2) + 3(5x + 2)                                 = 10x2 + 4x + 15x + 6                                 = 10x2 + 19x + 6  c)  (–x + 5)(x + 5) = –x(x + 5) + 5(x + 5)                              = –x2 – 5x + 5x + 25                              = –x2 + 25  d)  (3y + 4)2 = (3y + 4)(3y + 4)                     = 3y(3y + 4) + 4(3y + 4)                     = 9y2 + 12y + 12y + 16                     = 9y2 + 24y + 16  e)  (a – 3b)(4a – b) = a(4a – b) – 3b(4a – b)                                = 4a2 – ab – 12ab + 3b2                                = 4a2 – 13ab + 3b2  f)  (v – 1)(2v2 – 4v – 9) = v(2v2 – 4v – 9) – 1(2v2 – 4v – 9)                                      = 2v3 – 4v2 – 9v – 2v2 + 4v + 9                                      = 2v3 – 6v2 – 5v + 9
1 Answer

  1. Editorial Team
    Added an answer on June 4, 2026 at 4:48 am

    Removed unhelpful original answer

    echo 'Unit 2 Review Page 257 Question 9  a)  12 (2)(2)(3)'\
    | sed -n '/Page/{s/.*Page[ ][ ]*\([0-9][0-9]*\)[ ][ ]*Question[ ][ ]*\([0-9][0-9]*\).*$/pg\1_q\2/;p;q;}'
    

    output

    pg257_q9
    
    echo 'Unit 2 Review Page 258 Question 16  a)  (a  4)(a + 7) = a(a + 7)  4(a + 7)'\
    | sed -n '/Page/{s/.*Page[ ][ ]*\([0-9][0-9]*\)[ ][ ]*Question[ ][ ]*\([0-9][0-9]*\).*$/pg\1_q\2/;p;q;}'
    

    output

    pg258_q16
    

    Otherwise, you had it right!

    (Note that the sed processing is the same for both cases).

    I’ve wrapped the commands in an initial { and a trailing ;p;q;} so that sed processes only the first line containing ‘Page’ and then quits.
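
As a rough sketch of how that first-match-then-quit command could be wired into the rename step, here are two small functions. The names newname_from_text and rename_pdf are hypothetical, not from the question. One deliberate variation from the command above: the address is tightened from /Page/ to the full "Page N Question N" shape, so a line that merely mentions "Page" is never printed unmodified. The empty-result guard is also my assumption about what you would want when no line matches.

```sh
#!/bin/sh
# Hypothetical helper: print "pgNN_qMM" for the first line of text file $1
# that contains "Page NN Question MM"; print nothing if no line matches.
newname_from_text() {
    sed -n '/Page[ ][ ]*[0-9][0-9]*[ ][ ]*Question[ ][ ]*[0-9]/{
        s/.*Page[ ][ ]*\([0-9][0-9]*\)[ ][ ]*Question[ ][ ]*\([0-9][0-9]*\).*$/pg\1_q\2/
        p
        q
    }' "$1"
}

# Hypothetical helper: rename PDF $1 using the text extracted into $2.
# Refuses to rename when no name could be derived, so an empty result
# never turns into a bad mv.
rename_pdf() {
    pdf=$1
    txt=$2
    new=$(newname_from_text "$txt")
    if [ -z "$new" ]; then
        echo "No 'Page N Question N' line in $txt; leaving $pdf alone" >&2
        return 1
    fi
    # Keep the renamed file in the same directory as the original.
    mv "$pdf" "$(dirname "$pdf")/$new.pdf"
}
```

Inside the loop from the question, the call would then be something like rename_pdf "$filename" "$filename.txt".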

    I’ve expanded the POSIX character classes to their basic equivalents, i.e. [[:digit:]] becomes [0-9], and replaced each + with a repetition of the initial character class followed by the ‘zero-or-more’ character ‘*’, making [0-9][0-9]*. My personal experience, having learned sed on a Sun 3 from O’Reilly’s 2nd edition of sed & awk (with the comb binding!), is that all the POSIX stuff is a distraction and a further source of errors. I’m clearly in the minority on this here on S.O. ;-), but I’m willing to admit that newer seds have some great features and in any case …..
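
To make the equivalence concrete, here is a small side-by-side sketch; the function names extract_bre and extract_ere are hypothetical. The first uses only basic regular expressions, where an unescaped + is an ordinary character and \d is not a digit class, which is why the first attempt in the question printed nothing. The second uses -E, which switches both BSD and GNU sed to extended regular expressions so that + and unescaped ( ) work as in the question's later edit.

```sh
#!/bin/sh
# Basic regular expressions only: "one or more" is spelled X X*,
# and capture groups need backslashes.
extract_bre() {
    printf '%s\n' "$1" |
        sed -n 's/.*Page[ ][ ]*\([0-9][0-9]*\)[ ][ ]*Question[ ][ ]*\([0-9][0-9]*\).*/pg\1_q\2/p'
}

# Extended regular expressions via -E: + and bare parentheses are operators.
extract_ere() {
    printf '%s\n' "$1" |
        sed -nE 's/.*Page[[:blank:]]+([0-9]+)[[:blank:]]+Question[[:blank:]]+([0-9]+).*/pg\1_q\2/p'
}
```

Both functions should print the same name for the same input line, and nothing for a line without the pattern. One difference to keep in mind: [ ] matches only a space, while [[:blank:]] also matches a tab.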

    I hope this helps.
