The Archive Base


Editorial Team
Asked: May 25, 2026

I’m trying to extract some textual data from a PDF file. To do this, I need a sense of where some text is printed on the page, so I can correlate locations of different pieces of data. However, I’m getting stuck because I don’t fully understand the behavior of the text matrix set by the Tm operator.

Tm (0.0, -5.28, 5.28, 0.0, 429.7006, 803.9603)
rg (0.617, 0.098, 0.043)
Tj '\x01'
Tm (0.0, -9.0, 9.0, 0.0, 428.1406, 784.8203)
rg (0.0, 0.219, 0.512)
Tc (2.4756,)
Tj '4567'

This is some of the stream content. As you can see, it has two Tm calls close together. All the normal text is printed in the Tm (0.0, -9.0, 9.0, 0.0) space; the -5.28/5.28 space seems to be used only to print some special characters. I know that the last two parameters to Tm set the current position, but these numbers seem to depend on more context (probably the 5.28 and 9.0 scales, somehow). I can’t figure out how all this fits together, and the spec (page 250 has the Tm “explanation”) seems spectacularly unhelpful to me.

EDIT: extended example, why this has me flummoxed:

Tm 0 -27 27 0 545.5606 817.2203
(rg, Tc, Tw, Tj, Tf omitted)
TD 0.0156 -1.2556
Tm 0 -9 9 0 441.9406 677.4803
TD 10.6733 0 # more omitted, including other TD ops with second param 0
TD -82.7267 -1.5333 # start of a new line
Tc 0
Tj (3)
Tf /F2 1
Tm 0 -5.28 5.28 0 429.7006 803.9603
Tj ()
Tf /TT2 1
Tm 0 -9 9 0 428.1406 784.8203
Tc 2.4756
Tj (4567) # these appear on the same line as before the double Tm

In my initial code I assumed that the e and f parameters to Tm and the parameters to TD were in the same space, which would give consistent coordinates. However, that fails here: the 4567 in the last Tj shows up on the same line as the earlier 3. By my calculation the y coordinate of the 3 is 677.4803 + -1.5333 = 675.947, but the final Tm sets the y coordinate to 784.8203, suggesting that “4567” should be drawn above the 3.

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 4:06 pm

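    The text matrix is combined with the current transformation matrix (CTM) to determine the text position. The e and f operands of your Tm calls place the text origin at (429.7006, 803.9603) and at (428.1406, 784.8203), in the coordinate system defined by the CTM. The effective text sizes are 5.28 and 9 points: it is a common technique to set the font size to 1 with the Tf operator and set the actual size by scaling the text matrix. Your text is also rotated by 90 degrees, because the matrix maps the text-space x axis onto the negative y axis. TD operands, by contrast, are in text space and are transformed by the text line matrix, so with the 0 -9 9 0 matrix a TD of (tx, ty) moves the line origin by (9 * ty, -9 * tx) rather than by (tx, ty).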
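As a rough illustration of how the size and rotation can be read off the matrix, here is a minimal Python sketch. The function name `describe_text_matrix` and the dictionary keys are made up for this example. It assumes an identity CTM, no horizontal scaling (Tz), and a Tf size of 1 unless another size is passed in.

```python
import math


def describe_text_matrix(a, b, c, d, e, f, font_size=1.0):
    """Summarize a Tm matrix [a b c d e f] (hypothetical helper).

    Assumes an identity CTM and no Tz scaling; font_size is the Tf size.
    """
    # The text-space x axis (1, 0) maps to (a, b) and the y axis (0, 1)
    # maps to (c, d); their lengths are the scale factors.
    x_scale = math.hypot(a, b)
    y_scale = math.hypot(c, d)
    # Direction of the baseline, measured counterclockwise from the x axis.
    rotation_deg = math.degrees(math.atan2(b, a))
    return {
        "size": font_size * y_scale,  # effective glyph height in points
        "x_scale": x_scale,
        "rotation_deg": rotation_deg,
        "origin": (e, f),
    }


# The two matrices from the question.
small = describe_text_matrix(0, -5.28, 5.28, 0, 429.7006, 803.9603)
normal = describe_text_matrix(0, -9, 9, 0, 428.1406, 784.8203)
print(small["size"], small["rotation_deg"])
print(normal["size"], normal["rotation_deg"])
```

Both matrices give a rotation of about -90 degrees, meaning the baseline runs down the page in a y-up coordinate system, with sizes of about 5.28 and 9.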
    Calculating text positions correctly requires parsing the entire content stream and executing q, Q, cm, Tf, Tm and all the other text-related operators.
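To illustrate what executing the operators means for the example in the question, here is a minimal sketch of a tracker for the text matrix and text line matrix. The class `TextState` and its method names are hypothetical, and the CTM is assumed to be the identity. Glyph advances from Tj, along with Tc, Tw, Tz and text rise, are deliberately not modeled, so only line origins are meaningful. The matrix rules for Tm, Td and TD follow the PDF specification.

```python
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def matmul(m, n):
    """Multiply PDF matrices (a, b, c, d, e, f): apply m first, then n."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


class TextState:
    """Tracks Tm and Tlm only (hypothetical helper, identity CTM by default)."""

    def __init__(self, ctm=IDENTITY):
        self.ctm = ctm
        self.tm = IDENTITY   # text matrix
        self.tlm = IDENTITY  # text line matrix
        self.leading = 0.0

    def op_Tm(self, a, b, c, d, e, f):
        # Tm replaces both matrices; it does not accumulate.
        self.tm = self.tlm = (a, b, c, d, e, f)

    def op_Td(self, tx, ty):
        # The offset is in text space, so it is transformed by Tlm.
        self.tlm = matmul((1.0, 0.0, 0.0, 1.0, tx, ty), self.tlm)
        self.tm = self.tlm

    def op_TD(self, tx, ty):
        # TD behaves like Td and also sets the leading to -ty.
        self.leading = -ty
        self.op_Td(tx, ty)

    def origin(self):
        # Current text origin after applying the CTM.
        m = matmul(self.tm, self.ctm)
        return (m[4], m[5])


# Replay the operators shown in the question (omitted ones are skipped).
state = TextState()
state.op_Tm(0, -9, 9, 0, 441.9406, 677.4803)
state.op_TD(10.6733, 0)
state.op_TD(-82.7267, -1.5333)  # start of the line holding "3"
x_three, _ = state.origin()
state.op_Tm(0, -9, 9, 0, 428.1406, 784.8203)
x_4567, _ = state.origin()
print(x_three, x_4567)  # both are about 428.14
```

Because the text is rotated, a new line changes x rather than y: the TD with ty = -1.5333 moves x by 9 * -1.5333, from 441.9406 to about 428.14. That matches the e operand of the final Tm, which is why the 3 and the 4567 end up on the same line. The y values from this replay are not comparable, since the omitted TD operators shift the position along the baseline.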
