Interview Question I have been asked this question in an interview, and the answer

Asked: May 28, 2026

Interview Question

I was asked this question in an interview, and the answer doesn’t have to be specific to a programming language, platform, or tool.

The question was phrased as follows:

How would you get the instance count of a given word in a PDF? The answer doesn’t have to be programming-language-, platform-, or tool-specific. Just tell me how you would do it in a memory- and speed-efficient way.

I am posting this question for the following reasons:

To better understand the context: I still don’t understand the context of this question. What might the interviewer be looking for by asking it?
To get diverse opinions: I tend to answer such questions based on my skills in one programming language (C#), but there may be other valid ways to get this done.

Thanks for your interest.

1 Answer

Editorial Team · Answer 1 · 2026-05-28T11:09:09+00:00

If I had to write a program to do it, I’d find a PDF rendering library capable of extracting text from PDF files, such as Xpdf, and then count the words.
If this was a one-off task, or something that needed to be automated for a non-production-quality task, I’d just feed the file into the pdftotext program and then parse the output with Python: split it into words, put them in a dictionary, and count the number of occurrences.
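A minimal sketch of that one-off approach in Python, assuming the Xpdf/Poppler `pdftotext` binary is installed and on the PATH. Passing `-` as the output file makes `pdftotext` write the text to standard output, so the script reads it line by line from a pipe instead of first writing and then loading a whole text file. The function names and the word regex (runs of word characters and apostrophes, compared case-insensitively) are illustrative assumptions, not part of any tool.

```python
import re
import subprocess
from collections import Counter
from typing import Iterable

# Assumed definition of a "word": a run of word characters or apostrophes.
WORD_RE = re.compile(r"[\w']+")


def count_words(lines: Iterable[str]) -> Counter:
    """Count case-insensitive occurrences of every word in a stream of lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(w.lower() for w in WORD_RE.findall(line))
    return counts


def count_words_in_pdf(pdf_path: str) -> Counter:
    """Run pdftotext (assumed installed) and count the words it prints."""
    # "-" as the output file name tells pdftotext to write to stdout.
    with subprocess.Popen(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        stdout=subprocess.PIPE,
        text=True,
        encoding="utf-8",
    ) as proc:
        counts = count_words(proc.stdout)
    # Leaving the with-block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise RuntimeError(f"pdftotext exited with code {proc.returncode}")
    return counts
```

Usage would look like `count_words_in_pdf("report.pdf")["invoice"]` (the file name and word are hypothetical). The dictionary grows with the document’s vocabulary; if only one word ever matters, comparing each token to it and incrementing a single integer keeps memory constant.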

If I was asking this interviewing question, I’d be looking for a couple of things:

understanding the setting for this task: a one-off script versus production code
not attempting to implement a PDF renderer yourself, and looking for a library instead.

Now, I wouldn’t expect this from a random candidate with no PDF experience, but you can have a very meaningful discussion about what a PDF is and what a “word” is. You see, PDF stores text as a series of strings with coordinates, and each string is not necessarily a word. Often a single word is split into several completely separate strings, each absolutely positioned in the document so that together they form one word. This is why searching for words in a PDF document sometimes gives strange-looking results. So to implement word search in a document, you’d have to glue these strings back together (pdftotext takes care of that for you).
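To make the gluing step concrete, here is a toy sketch in Python. It assumes the strings have already been pulled out of the page as hypothetical `Fragment` records holding a baseline `y`, a starting `x`, a `width`, and the text; real extractors such as pdftotext use far more elaborate layout heuristics. Fragments whose baselines are within an assumed tolerance are treated as one line, and neighbours on a line are joined directly when the horizontal gap between them is below an assumed threshold, or separated by a space otherwise.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    """Hypothetical positioned text run, as a PDF page might draw it."""
    x: float      # left edge of the run
    y: float      # baseline; larger y is higher on the page (PDF user space)
    width: float  # horizontal advance of the whole run
    text: str


def glue_fragments(frags: List[Fragment], space_gap: float = 1.5,
                   line_tol: float = 2.0) -> List[str]:
    """Rebuild lines of text from fragments; both thresholds are guesses."""
    # Group fragments into lines, top of the page first.
    lines: List[List[Fragment]] = []
    for f in sorted(frags, key=lambda f: -f.y):
        if lines and abs(lines[-1][0].y - f.y) <= line_tol:
            lines[-1].append(f)
        else:
            lines.append([f])

    result: List[str] = []
    for line in lines:
        line.sort(key=lambda f: f.x)  # left to right within the line
        text = line[0].text
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            # A small gap means both runs are pieces of the same word.
            text += cur.text if gap < space_gap else " " + cur.text
        result.append(text)
    return result
```

For example, fragments "exam" and "ple" drawn edge to edge come back as "example", even if the page draws them in the opposite order. The rebuilt lines could then be fed to an ordinary word counter.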

It’s not a bad question at all.
