The Archive Base

Editorial Team
Asked: May 10, 2026

Is there any Python module to convert PDF files into text?

I tried a piece of code found on ActiveState that uses pypdf, but the generated text had no spaces between words and was of no use.

1 Answer

  1. Added an answer on May 10, 2026 at 1:37 pm

    Try PDFMiner. It can extract text from PDF files as HTML, SGML or ‘Tagged PDF’ format.

    The Tagged PDF format seems to be the cleanest, and stripping out the XML tags leaves just the bare text.
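The tag-stripping step described above can be sketched with the standard library alone. This is only an illustration: `strip_tags` is a hypothetical helper name, and the sample markup is made up, so real PDFMiner output will look different.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup: str) -> str:
    """Return the text content of `markup` with all tags removed."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    # Collapse the runs of whitespace left behind once the tags are gone.
    return " ".join("".join(collector.parts).split())


# Hypothetical sample input, not actual PDFMiner output.
sample = "<Document><P>Hello <Span>PDF</Span> world</P></Document>"
print(strip_tags(sample))
```

Using a parser rather than a regular expression keeps entity references such as `&amp;` decoded correctly, at the cost of a few more lines.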

    A Python 3 version is available at:

    • https://github.com/pdfminer/pdfminer.six
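As a rough sketch of how the Python 3 version might be used for plain-text extraction, assuming pdfminer.six is installed (for example with `pip install pdfminer.six`): the helper name `pdf_to_text` and its default value are illustrative choices, not part of the library. The `word_margin` layout parameter controls how far apart two characters must be before a space is inserted between them, which may help with the missing-spaces problem in the question.

```python
from pathlib import Path


def pdf_to_text(pdf_path, word_margin=0.1):
    """Extract plain text from a PDF file using pdfminer.six.

    `word_margin` is forwarded to LAParams; smaller values make the
    layout analysis insert spaces between characters more readily.
    """
    path = Path(pdf_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such PDF file: {path}")

    # Imported lazily so this module still loads where pdfminer.six
    # is not installed (an assumption of this sketch, not a requirement).
    from pdfminer.high_level import extract_text
    from pdfminer.layout import LAParams

    return extract_text(str(path), laparams=LAParams(word_margin=word_margin))


# Example usage (the file name is a placeholder):
# text = pdf_to_text("document.pdf")
# print(text)
```

If words still run together, lowering `word_margin` is one thing to try; the best value depends on how the PDF was produced.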

© 2021 The Archive Base. All Rights Reserved