The Archive Base


Editorial Team
Asked: June 14, 2026

I have the following snippet of code, which takes a URL, opens it, parses out just the text, and then searches for widgets. It detects a widget by looking for the word widget1, which marks the start of the widget, and endwidget, which marks the end.

Basically, the code starts writing lines of text to a file as soon as it finds the word widget1 and stops when it reads endwidget. However, every line after the first widget1 line comes out indented.

This is my output

widget1 this is a really cool widget
       it does x, y and z 
       and also a, b and c
       endwidget

What I want is:

widget1 this is a really cool widget
it does x, y and z 
and also a, b and c
endwidget

Why am I getting this indentation? This is my code…

    for url in urls:
        page = mech.open(url)
        html = page.read()
        soup = BeautifulSoup(html)
        text = soup.prettify()
        texts = soup.findAll(text=True)

        def visible(element):
            if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
                # Text inside any of these elements is not visible, so ignore it
                return False
            elif re.match('<!--.*-->', str(element)):
                # The element is an HTML comment, so ignore it
                return False
            else:
                # Otherwise, return True as these are the elements we need
                return True

        visible_texts = filter(visible, texts)

        inwidget = 0
        for line in visible_texts:
            if ".widget1" in line and inwidget == 0:
                # Start of a widget: the word after .widget1 names the output file
                match = re.search(r'\.widget1 (\w+)', line)
                line = line.split(".widget1")[1]
                filename = "%s" % match.group(1) + ".txt"
                textfile = open(filename, 'w+b')
                textfile.write("source:" + url + "\n\n")
                textfile.write(".widget1" + line)
                inwidget = 1
            elif inwidget == 1 and ".endwidget" not in line:
                # Inside a widget: the line is written exactly as parsed
                print(line)
                textfile.write(line)
            elif ".endwidget" in line and inwidget == 1:
                # End of the widget: write the closing line and stop collecting
                textfile.write(line)
                inwidget = 0
            else:
                # Lines outside any widget are ignored
                pass
1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 4:52 pm

    Only the first line is unindented because you rebuild it yourself with textfile.write(".widget1" + line), which discards whatever came before .widget1. Every other line is written exactly as it came out of the HTML, including its leading whitespace. You can remove the unwanted whitespace with str.strip(): change textfile.write(line) to textfile.write(line.strip()). Note that strip() also removes a trailing newline, so write line.strip() + "\n" if each line should stay on its own line.
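As a rough illustration of the stripping approach, here is a minimal, self-contained sketch in Python 3. It is not the asker's scraper: the function name extract_widget_lines, its start and end parameters, and the sample list are all hypothetical, and the input is assumed to be a plain list of text lines rather than BeautifulSoup text nodes. Because str.strip() drops the trailing newline along with the indentation, the cleaned lines are rejoined with "\n" when printed.

```python
def extract_widget_lines(lines, start=".widget1", end=".endwidget"):
    """Collect the lines of each widget block with surrounding whitespace removed.

    Hypothetical helper for illustration; `start` and `end` are the marker
    strings assumed to open and close a widget.
    """
    widgets = []     # one list of cleaned lines per widget found
    current = None   # lines of the widget being read, or None outside a widget
    for raw in lines:
        line = raw.strip()  # removes indentation and the trailing newline
        if current is None:
            if start in line:
                current = [line]        # first line of a new widget
        else:
            current.append(line)
            if end in line:
                widgets.append(current)  # widget complete
                current = None
    return widgets


# Assumed sample input: indented lines similar to the output in the question
sample = [
    ".widget1 this is a really cool widget\n",
    "       it does x, y and z \n",
    "       and also a, b and c\n",
    "       .endwidget\n",
]

for widget in extract_widget_lines(sample):
    # Rejoin with explicit newlines, since strip() removed the originals
    print("\n".join(widget))
```

If only the leading indentation should go and the original line endings should be kept, str.lstrip() is an alternative to strip() plus a manual "\n".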
