The Archive Base


Editorial Team
Asked: June 7, 2026


According to this site http://www.searchable-pdf.com/content.php?lang=en&c=61, a PDF can be searchable when a text layer is added.

I have been looking through the technical specification of PDF. I think text can be stored in a PDF in two ways:
a) as a text layer above the image layer (as described on the webpage above)
b) when you create a PDF from a Word document (with text), I don’t think Word stores all the text in the text layer. I think it stores it in the image layer. Is that right?

XMP was added in PDF 1.4 (http://en.wikipedia.org/wiki/Extensible_Metadata_Platform). But what is XMP? Is it the “text layer” I described above?

If a scanner performs OCR on an image, does it store the text in the “text layer” or in the “XMP” field? Is that only possible in a PDF of version 1.4 or later?

And how can I detect if a PDF already has text data? For example: PDF A has been scanned with OCR and PDF B has not. How can I know that PDF B should be sent to a separate OCR engine?


1 Answer

  1. Editorial Team
    Added an answer on June 7, 2026 at 10:22 am

    The PDF specification makes no mention of a ‘text layer’. Normally there is just one way to ‘store’ text: with text-showing operators. These operators draw text at a specific location, using a specific color, font, font size and text rendering mode. There are several text rendering modes; for the purpose of your question, what matters is that text can be visible or invisible.

    A scanner that performs OCR renders both the raster image and the text to the PDF document. The text is rendered using the invisible text rendering mode. As a result, you can select the text with the mouse (the highlight appears at the expected location on top of the image) and you can search for text; the search result is likewise shown at the correct location.
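To make the image-plus-invisible-text idea concrete, here is a minimal sketch in Python of what the page content of an OCR'd scan could look like. The function names, the resource names `Im0` and `F1`, and the word list are hypothetical; the operators (`cm`, `Do`, `BT`/`ET`, `Tr`, `Tf`, `Tm`, `Tj`) are standard PDF content stream operators, and text rendering mode 3 is the invisible mode. The sketch only builds the content stream text, not a complete PDF file, and it assumes the text can be written as a simple literal string for the chosen font.

```python
def pdf_string(text: str) -> str:
    """Escape text for use as a PDF literal string, e.g. (Hello)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def ocr_page_content(width, height, words, image_name="Im0", font_name="F1") -> str:
    """Build a page content stream: the scanned image first, then invisible text.

    `words` is an iterable of (text, x, y, font_size) tuples in user space units.
    `image_name` and `font_name` are assumed to exist in the page resources.
    """
    lines = [
        "q",                                  # save graphics state
        f"{width:g} 0 0 {height:g} 0 0 cm",   # scale the unit square to the page size
        f"/{image_name} Do",                  # paint the raster image
        "Q",                                  # restore graphics state
        "BT",                                 # begin text object
        "3 Tr",                               # text rendering mode 3: invisible
    ]
    for text, x, y, size in words:
        lines.append(f"/{font_name} {size:g} Tf")   # select font and size
        lines.append(f"1 0 0 1 {x:g} {y:g} Tm")     # position the text
        lines.append(f"{pdf_string(text)} Tj")      # show (here: invisibly) the text
    lines.append("ET")                              # end text object
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ocr_page_content(612, 792, [("Hello (world)", 72, 700, 12)]))
```

Changing `3 Tr` to `0 Tr` (fill) would make the same text visible, which is roughly what a Word-to-PDF converter emits, without the image.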

    What happens when you generate a PDF from a Word document depends on the software you use for the conversion. To my knowledge, these converters do not generate an image; they generate visible text.

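    XMP is metadata (information about the document, such as its title or author), as opposed to visual data. It is not the ‘text layer’ you describe.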
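As a rough illustration of XMP being metadata rather than page content, the following Python sketch looks for XMP packets in the raw bytes of a file. The function name is hypothetical. It relies on the assumption that the metadata stream is stored unfiltered (uncompressed and unencrypted), which is common but not guaranteed, so finding no packet this way does not prove the file has no XMP.

```python
import re
from typing import List

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>
_XMP_PACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def find_xmp_packets(pdf_bytes: bytes) -> List[bytes]:
    """Return the XML payload of every XMP packet found in the raw bytes."""
    return [match.group(1).strip() for match in _XMP_PACKET.finditer(pdf_bytes)]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for packet in find_xmp_packets(handle.read()):
            print(packet.decode("utf-8", errors="replace"))
```

What comes back is XML describing the document, not the words that appear on its pages.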

    Finally, as for detecting whether a PDF already contains text data, here is a similar question (10k only).
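Since text only enters a page through text-showing operators, one possible heuristic for the detection question is to count those operators in the decoded page content. The Python sketch below is an illustration under several assumptions, not a robust detector: the function names are hypothetical, the caller is assumed to supply already-decoded content streams (for example via a PDF library), and the scanner ignores graphics state save/restore, inline image data and text inside form XObjects.

```python
def strip_literal_strings(content: bytes) -> bytes:
    """Replace every (...) literal string with a space so its bytes are not read as operators."""
    out = bytearray()
    depth = 0
    i = 0
    while i < len(content):
        byte = content[i]
        if depth:
            if byte == 0x5C:        # backslash escapes the next byte
                i += 2
                continue
            if byte == 0x28:        # nested "("
                depth += 1
            elif byte == 0x29:      # ")"
                depth -= 1
        elif byte == 0x28:          # "(" opens a literal string
            depth = 1
            out.append(0x20)
        else:
            out.append(byte)
        i += 1
    return bytes(out)


def count_text_ops(content: bytes) -> dict:
    """Count text-showing operators, split by visible and invisible rendering mode."""
    counts = {"visible": 0, "invisible": 0}
    mode = 0                        # default text rendering mode is 0 (fill)
    previous = b""
    cleaned = strip_literal_strings(content).replace(b"[", b" ").replace(b"]", b" ")
    for token in cleaned.split():
        if token == b"Tr":
            try:
                mode = int(previous)
            except ValueError:
                pass                # malformed operand: keep the current mode
        elif token in (b"Tj", b"TJ", b"'", b'"'):
            counts["invisible" if mode == 3 else "visible"] += 1
        previous = token
    return counts


def needs_ocr(decoded_page_streams) -> bool:
    """True when no text-showing operator is found in any of the given streams."""
    return sum(sum(count_text_ops(s).values()) for s in decoded_page_streams) == 0
```

Under this heuristic, a scan that has already been through OCR reports invisible text and is skipped, while an image-only page reports nothing and can be sent to the OCR engine.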


© 2021 The Archive Base. All Rights Reserved