I need to extract filenames from a text file, but only for the entries whose output lists no fonts.
As you can see from the output file below, the first two results list no fonts; only the last result has fonts. I need to print the results where there are no fonts.
Does this make sense? Would grep, sed or awk be the answer?
So I need output from the text file below that shows which PDFs have no fonts listed between the START and END markers.
******************START***********************
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
/home/user1/Documents/temp1.pdf
******************END***********************
******************START***********************
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
/home/user1/Documents/temp2.pdf
******************END***********************
******************START***********************
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
BAAAAA+TimesNewRomanPS-BoldMT TrueType yes yes yes 14 0
CAAAAA+TimesNewRomanPSMT TrueType yes yes yes 9 0
/home/user3/Documents/temp file.pdf
******************END***********************
This prints any line containing “.pdf” if the previous line starts with “-”. It is not a generic solution, but it will work with the input data you’ve given. I can imagine several edge cases where this might fail, but it all comes down to the specification of your input file.
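A minimal awk sketch of that approach, assuming the dashed separator is always the line immediately before the path when no fonts are listed. The function name no_font_pdfs is only illustrative:

```shell
# Print a line containing ".pdf" only when the line before it starts with "-"
# (the dashed separator), i.e. no font rows sit between the header and the path.
# Reads the named files, or stdin when no file is given.
no_font_pdfs() {
    awk '/\.pdf/ && prev ~ /^-/ { print } { prev = $0 }' "$@"
}
```

Run against the sample above (for example no_font_pdfs fonts.txt, where fonts.txt is a placeholder name), this prints the temp1.pdf and temp2.pdf paths and skips temp file.pdf.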
Update
(Based on the script you’ve posted in the comments below) If what you’re trying to do is simply to identify PDF files that have no embedded fonts, this might work:
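A minimal sketch of such a script, assuming pdffonts (from poppler-utils) is on the PATH, prints exactly two header lines before any font rows, and writes nothing to stdout for a file it cannot open. The names count_fonts and list_fontless_pdfs and the default search directory are illustrative, not taken from your script:

```shell
#!/usr/bin/env bash

# Number of fonts pdffonts reports for one file: total output lines minus the
# two header lines. A file pdffonts cannot open is assumed to give no stdout,
# so the result is -2 rather than 0.
count_fonts() {
    local lines
    lines=$(pdffonts "$1" 2>/dev/null | wc -l)
    echo $(( lines - 2 ))
}
# Export the function so the bash subshell started by find can call it.
export -f count_fonts

# Print every *.pdf under the given directory whose font count is exactly 0.
list_fontless_pdfs() {
    find "${1:-.}" -type f -name '*.pdf' \
        -exec bash -c '[ "$(count_fonts "$1")" -eq 0 ] && echo "$1"' _ {} \;
}

# Search the directory given as the first argument, or the current one.
list_fontless_pdfs "${1:-.}"
```

Passing the filename as a positional argument ("$1") rather than splicing {} into the command string keeps names with spaces, such as "temp file.pdf", intact.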
Here’s a breakdown of the script:
Detecting embedded font count. This would have been simple if pdffonts returned a specific value when no fonts were embedded, but it does not. We therefore count the number of output lines and deduct 2 (the header lines) to determine the number of embedded fonts.
The bash function is exported so it can be used in a subshell.
Locate PDF files and print a file's name only if the PDF is valid and has no fonts.
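To make the line-count arithmetic from the first step concrete, here is a small sketch that applies the same count to captured pdffonts output instead of a live PDF. The helper name fonts_in_output is hypothetical, and the sample text is the third block from the question:

```shell
# Count font rows in pdffonts output read from stdin:
# every line after the two header lines is one font.
fonts_in_output() {
    local lines
    lines=$(wc -l)
    echo $(( lines - 2 ))
}

fonts_in_output <<'EOF'
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
BAAAAA+TimesNewRomanPS-BoldMT TrueType yes yes yes 14 0
CAAAAA+TimesNewRomanPSMT TrueType yes yes yes 9 0
EOF
# prints 2
```

With only the two header lines as input the result is 0, which is the "valid PDF, no fonts" case. Empty input gives -2, which is how a file that pdffonts could not read is told apart from a PDF without fonts.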
If you prefer a one-liner, the whole script can be written as:
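A sketch of one possible one-liner, under the same assumptions as before: pdffonts is installed, prints exactly two header lines, and prints nothing on stdout for a file it cannot open. It searches the current directory:

```shell
# A PDF with no fonts makes pdffonts print only its two header lines.
find . -type f -name '*.pdf' -exec bash -c '[ "$(pdffonts "$1" 2>/dev/null | wc -l)" -eq 2 ] && echo "$1"' _ {} \;
```

Here the function is dropped and the test is done directly on the line count, so nothing needs to be exported into the subshell.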