The Archive Base


The Archive Base Latest Questions

Editorial Team
Asked: May 27, 2026

I have few thousands HTML sources to read. It is from forum which started

I have a few thousand HTML pages to read. They come from a forum that started in 2004. My basic idea is to step through the pages by changing the page number in a Python script. All I need looks like this:

 lot of other tag from beginning
 <div id="posts">
  lot of stuff between
 </div>
 lot of other tag till ending

I use BeautifulSoup's findAll to read the content in between, and I think it works perfectly about 99% of the time. Then one page gave me trouble. Its structure is like this:

 lot of other tag from beginning
 <div id="posts">
  first part
  </div>
  second part
 </div>
 lot of other tag till ending

As you can see, there is an unmatched </div> that has no opening <div> before it. BeautifulSoup treats that second-to-last </div> as the end of <div id="posts"> and stops there, ignoring the useful second part between the unmatched </div> and the real closing </div>.

I believe this is a rare condition, since I finished another thread of 1960 pages without hitting it. The problem occurred in an old thread. Does anyone have any idea? Is there a tool that can fix this? It is quite frustrating.

Thanks in advance


1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 10:09 pm

    Oh dear.
    The easiest way would be to fix the page so that every end tag has a start tag.

    Basically the markup is not correct. Browsers have all sorts of ifs and buts to cope with this, and with other fun ones like

    <Tag1><Tag2></Tag1></Tag2>
    

    left over from the bad old days when HTML wasn't valid XML.
    It's doable in code, though it is a lot of work, and basically you have to "guess" where the missing start tag should be.

    In this specific case, where would you logically insert a start div? Or could you afford to rip out the orphaned end tag? You have to guess the intent. Painful, very painful.
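If ripping out the orphaned end tag is acceptable, one way to do it is to clean the raw HTML before handing it to BeautifulSoup. The sketch below is only a guess at the intent, not a general repair tool: it assumes the surplus `</div>` tags all sit inside `<div id="posts">`, and that the orphan is whichever `</div>` would otherwise close that container early. The function name `drop_orphan_div_closers` is made up for illustration, and the regular expressions ignore comments and scripts that happen to contain div-like text.

```python
import re

# Matches any opening or closing div tag.
DIV_TAG = re.compile(r"<div\b[^>]*>|</div\s*>", re.IGNORECASE)
# Matches the opening tag of the posts container.
POSTS_OPEN = re.compile(r"""<div\b[^>]*\bid\s*=\s*["']posts["'][^>]*>""", re.IGNORECASE)


def drop_orphan_div_closers(html):
    """Remove surplus </div> tags that would close <div id="posts"> early.

    Heuristic (an assumption): the page has more </div> than <div>, and the
    extra ones are the closers met while only the posts div itself is open.
    """
    tags = [m.group(0) for m in DIV_TAG.finditer(html)]
    closers = sum(1 for t in tags if t.startswith("</"))
    surplus = closers - (len(tags) - closers)
    start = POSTS_OPEN.search(html)
    if surplus <= 0 or start is None:
        return html  # balanced, or a different kind of damage: leave untouched

    out = []
    pos = 0
    depth = 1  # we start just inside <div id="posts">
    for m in DIV_TAG.finditer(html, start.end()):
        if surplus == 0:
            break
        if m.group(0).startswith("</"):
            if depth == 1:
                # This closer would end the posts div: guess it is the orphan.
                out.append(html[pos:m.start()])
                pos = m.end()
                surplus -= 1
                continue
            depth -= 1
        else:
            depth += 1
    out.append(html[pos:])
    return "".join(out)
```

The cleaned string can then be parsed as usual. Because the choice of which `</div>` to drop is a guess, it is worth comparing the extracted posts against the original page for any page this function changes.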

    Quite likely to make a mess of your logic. Me, I'd throw an error on this page and move on to the next, then get it fixed.
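A minimal sketch of the "throw an error and move on" approach, using only the standard library's `html.parser`. It counts `<div>` and `</div>` tags and flags any page where they do not balance, so the page can be set aside for manual fixing. The names `DivBalanceChecker`, `UnbalancedMarkup`, `check_divs` and `split_pages` are hypothetical, and `pages` is assumed to be a dict mapping page number to raw HTML. Note that it also flags unclosed `<div>` tags, which may be too strict for very old markup.

```python
from html.parser import HTMLParser


class UnbalancedMarkup(Exception):
    """Raised when a page's <div> and </div> tags do not pair up."""


class DivBalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0    # currently open <div> tags
        self.orphans = 0  # </div> tags seen with no <div> open

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div":
            if self.depth == 0:
                self.orphans += 1
            else:
                self.depth -= 1


def check_divs(html):
    """Raise UnbalancedMarkup if the page has orphaned or unclosed divs."""
    checker = DivBalanceChecker()
    checker.feed(html)
    checker.close()
    if checker.orphans or checker.depth:
        raise UnbalancedMarkup(
            f"{checker.orphans} orphaned </div>, {checker.depth} unclosed <div>"
        )


def split_pages(pages):
    """Return (good, skipped): pages safe to parse, and pages to fix by hand."""
    good, skipped = {}, []
    for number, html in pages.items():
        try:
            check_divs(html)
            good[number] = html
        except UnbalancedMarkup as exc:
            skipped.append((number, str(exc)))
    return good, skipped
```

Only the pages in `good` go on to the normal findAll extraction. The `skipped` list records which page numbers need attention and why.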


© 2021 The Archive Base. All Rights Reserved