I receive encoded PDF files regularly. The encoding works like this: the PDFs can


Editorial Team

Asked: May 14, 2026 (2026-05-14T09:35:58+00:00)


I receive encoded PDF files regularly. The encoding works like this:

the PDFs display correctly in Acrobat Reader,
but selecting all and copying the text in Acrobat Reader
and pasting it into a text editor
shows that the content is encoded

For example:

13579 -> 3579;
hello -> jgnnq

It’s basically an offset (maybe a swap) of ASCII characters.

The question is: how can I find the offset automatically when I have access to only a few samples? I cannot be sure whether the encoding offset changes from file to file. All I know is that some text will usually (if not always) show up inside the PDF, e.g. “Name:”, “Summary:”, “Total:”.

Thank you!

Edit: thanks for the feedback. I’ll try to break the question into smaller questions:

Part 1: How to detect identical part(s) inside a string?


1 Answer

Editorial Team · Answer 1 · 2026-05-14T09:35:59+00:00

You need to brute-force it.

If the pattern is a simple shift, like the +2 character-code offset in your examples, each character maps as shown below (each row lists the original character, then +1, then +2):

h i j
e f g
l m n
l m n
o p q

1 2 3
3 4 5
5 6 7
7 8 9
9 : ;
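
When both the plain text and its encoded form are known, as with the two pairs in the question, the shift can be read off directly instead of searched for. The following is a minimal sketch under the assumption that every character is shifted by the same amount; `offset_from_pair` is a hypothetical helper name, not something from the question.

```python
def offset_from_pair(plain, encoded):
    """Return the single shift that maps plain to encoded, or None if there is none."""
    if not plain or len(plain) != len(encoded):
        return None
    # Collect the code-point difference at every position.
    diffs = {ord(e) - ord(p) for p, e in zip(plain, encoded)}
    # A pure offset encoding produces exactly one distinct difference.
    return diffs.pop() if len(diffs) == 1 else None


print(offset_from_pair('hello', 'jgnnq'))  # 2
print(offset_from_pair('13579', '3579;'))  # 2
```

If the function returns None for a real sample, the encoding is probably not a plain offset (for example, a character swap), and a different approach is needed.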

You could implement a brute-force check against known words like this:

>>> text = 'jgnnq'
>>> knowns = ['hello', '13579']
>>>
>>> for i in range(-5, 6):  # check offsets from -5 to +5
...     rot = ''.join(chr(ord(j) + i) for j in text)
...     for x in knowns:
...         if x in rot:
...             print(rot)
...
hello
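
The same brute-force idea can be pointed at the labels the question says usually appear in the PDFs ("Name:", "Summary:", "Total:"). The sketch below tries every offset in a range and keeps the one whose decoded text contains the most labels. It assumes every copied character, including spaces and punctuation, is shifted by the same amount, which the question does not confirm; `find_offset` and `max_shift` are hypothetical names chosen for illustration.

```python
def find_offset(encoded, cribs, max_shift=25):
    """Return (offset, decoded) for the shift revealing the most cribs, or None."""
    best = None
    for offset in range(-max_shift, max_shift + 1):
        try:
            # Undo the candidate shift on every character.
            decoded = ''.join(chr(ord(c) - offset) for c in encoded)
        except ValueError:
            # The shift produced an invalid code point; skip this offset.
            continue
        hits = sum(crib in decoded for crib in cribs)
        if hits and (best is None or hits > best[0]):
            best = (hits, offset, decoded)
    return None if best is None else best[1:]


# Build a test sample by applying the +2 shift from the question.
sample = ''.join(chr(ord(c) + 2) for c in 'Name: Bob Total: 13579')
print(find_offset(sample, ['Name:', 'Summary:', 'Total:']))
```

Because each file is searched independently, this still works if the offset changes between files, as long as at least one label is present. A result of None means no label was found at any offset in the range.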
