I use the following to read a PDF file and get text strings of a page:
    my $pdf = CAM::PDF->new($pdf_file);
    my $pagetree = $pdf->getPageContentTree($page_no);

    # Get all text strings of the page
    # MyRenderer is a separate package which implements getTextBlocks and
    # renderText methods
    my @text = $pagetree->traverse('MyRenderer')->getTextBlocks;
Now @text holds all the text strings, along with the starting x,y position of each one.
How can I get the width (and possibly the height) of each string?
The MyRenderer package is as follows:
    package MyRenderer;
    use base 'CAM::PDF::GS';

    sub new {
        my ($pkg, @args) = @_;
        my $self = $pkg->SUPER::new(@args);
        $self->{refs}->{text} = [];
        return $self;
    }

    sub getTextBlocks {
        my ($self) = @_;
        return @{$self->{refs}->{text}};
    }

    sub renderText {
        my ($self, $string, $width) = @_;
        my ($x, $y) = $self->textToDevice(0,0);
        push @{$self->{refs}->{text}}, {
            str    => $string,
            left   => $x,
            bottom => $y,
            right  => $x + $width,
        };
        return;
    }
Update 1: There’s a function getStringWidth($fontmetrics, $string) in CAM::PDF. Although that function takes a $fontmetrics parameter, it returns the same value for a given string irrespective of what I pass for it.
Also, I am not sure of the unit of measure the returned value uses.
Update 2: I changed the renderText function to the following:
    sub renderText {
        my ($self, $string, $width) = @_;
        my ($x, $y) = $self->textToDevice(0,0);
        push @{$self->{refs}->{text}}, {
            str       => $string,
            left      => $x,
            bottom    => $y,
            right     => $x + ($width * $self->{Tfs}),
            font      => $self->{Tf},
            font_size => $self->{Tfs},
        };
        return;
    }
Note that in addition to recording font and font_size, I multiplied $width by the font size to get the real width of the string.
Now the only thing missing is the height.
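For the height, one rough approach is to scale the font's vertical metrics by the font size, much as the width is scaled above. The sketch below only illustrates that arithmetic and is not part of the CAM::PDF API: `estimate_text_height` is a hypothetical helper, and it assumes Ascent and Descent values are available for the current font in 1/1000 glyph units (as in a PDF FontDescriptor). When they are not, it simply falls back to the font size. The 718 and -207 figures are example values only.

```perl
use strict;
use warnings;

# Hypothetical helper, not provided by CAM::PDF.
# $font_size is the text font size (Tfs); $ascent and $descent are
# assumed to be in 1/1000 glyph units, with $descent normally negative.
sub estimate_text_height {
    my ($font_size, $ascent, $descent) = @_;

    # Without metrics, treat the font size itself as the height.
    return $font_size if !defined $ascent || !defined $descent;

    # Total vertical extent of the font, scaled to the font size.
    return ($ascent - $descent) / 1000 * $font_size;
}

printf "%.2f\n", estimate_text_height(12, 718, -207);  # prints 11.10
printf "%.2f\n", estimate_text_height(12);             # prints 12.00
```

This gives the nominal line height of the font rather than the exact bounding box of a particular string, and it ignores any scaling from the text matrix or the page transformation.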
getStringWidth() depends heavily on the font metrics you provide. If it can’t find the character widths in that data structure, then it falls back to the following code:
which may be what you’re seeing. When I wrote that, I thought it was better than returning 0. If your font metrics seem good and you think there’s a bug in CAM::PDF, feel free to post more details and I’ll take a look.
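To illustrate why the result can look independent of the metrics, here is a minimal, self-contained sketch of a width lookup with a per-character fallback. It is not CAM::PDF's actual implementation: `string_width` is a hypothetical function, the shape of the metrics hash (`FirstChar` and `Widths`, named after the standard PDF font dictionary keys) is an assumption, and the fallback value of 0.5 is an arbitrary placeholder. Widths are assumed to be in 1/1000 glyph units, so the result is per unit of font size and still has to be multiplied by the font size.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a width lookup; NOT the CAM::PDF code.
sub string_width {
    my ($metrics, $string, $fallback_per_char) = @_;
    my $width = 0;

    if ($metrics && ref $metrics->{Widths} eq 'ARRAY') {
        my $first = $metrics->{FirstChar} || 0;
        for my $code (unpack 'C*', $string) {
            my $idx = $code - $first;
            next if $idx < 0 || $idx > $#{ $metrics->{Widths} };
            $width += $metrics->{Widths}[$idx];
        }
        $width /= 1000;    # glyph units -> units of font size
    }

    # No usable widths found: guess a fixed width per character.
    $width = $fallback_per_char * length $string if $width == 0;
    return $width;
}

my $good = { FirstChar => 65, Widths => [667, 667, 722] };   # example A, B, C widths
printf "%.3f\n", string_width($good, 'ABC', 0.5) * 12;       # prints 24.672
printf "%.3f\n", string_width({},    'ABC', 0.5);            # prints 1.500
printf "%.3f\n", string_width(undef, 'ABC', 0.5);            # prints 1.500
```

The last two calls return the same number because neither argument contains character widths, so only the string length matters. If every metrics structure you pass produces an identical result, it is worth checking whether the structure actually holds the widths in the form the function expects.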