The Archive Base

Editorial Team
Asked: June 15, 2026

I want to use the following code to replace strings like “/xxxxx/” with “/xxxxx.html” in page_data, but it doesn’t work. page_data is a bytes object downloaded by a crawler.

page_data.replace(each, neweach)

Only when I change it to:

page_data = page_data.replace(each, neweach)

are the strings (each) in page_data actually replaced.

The whole code is below:

import os
import sys
import re
import urllib
import urllib2

class WebGet(object):
    base_url = ""
    urls_list = []
    history_list = []
    replace_ch={}

    def __init__(self, base_url):
        self.base_url = base_url[:-1]
        self.urls_list.append('/')
        self.replace_ch[">>"] = "%3E%3E"
        self.replace_ch["<<"] = "%3C%3C"
        self.replace_ch["::"] = "%3A%3A"

    def recurseGet(self):
        '''Get page data recursively'''
        while(len(self.urls_list) != 0):
            url_suffix = self.urls_list[0]
            self.urls_list.remove(url_suffix)
            self.history_list.append(url_suffix)
            url_to_get = self.base_url + url_suffix

            "Get page data with url"
            print "To get",url_to_get
            page_data = urllib2.urlopen(url_to_get).read()
            page_data_done = self.pageHandle(page_data)

            "Write the page data into file"
            if url_suffix[-1] == '/':
                url_suffix = url_suffix[:-1]
            if url_suffix == '':
                url_suffix = "index"
            elif url_suffix[0] == '/':
                url_suffix = url_suffix[1:]
            url_suffix.replace('/','\\')
            url_suffix.replace('>>','%3E%3E')
            url_suffix.replace('<<','%3C%3C')
            url_suffix.replace('::','%3A%3A')
            file_str = "e:\\reference\\"+url_suffix
            if file_str.rfind("\\") != 12:
                new_dir = file_str[:file_str.rfind("\\")]
                if os.path.isdir(file_str) == False:
                    os.mkdir(file_str)
            file_str = file_str.strip()+".html"
            print "write file",file_str
            f_page = open(file_str, "wb")
            f_page.write(page_data_done)
            f_page.close


    def pageHandle(self, page_data):
        page_data.replace("http://www.cplusplus.com/","/") #here the replace works

        re_rule = '<a href="/reference(/\S{2,40}/)\">'
        list_page_urls = re.findall(re_rule, page_data)
        for each in list_page_urls:
            neweach = each
            neweach = neweach[:-1]+".html"
            #page_data = page_data.replace(each, neweach)
            page_data.replace(each, neweach)
            if each in page_data:
                print "fail replace"
            if each in self.history_list:
                continue
            elif each in self.urls_list:
                continue
            elif each == '/':
                continue                
            self.urls_list.append(each)

        return page_data

def main():
    url = "http://www.cplusplus.com/reference/"
    fc = WebGet(url)
    fc.recurseGet()

if __name__ == "__main__":
    main()

Why could this be?

1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 10:56 am

    Because that’s what the replace method does: it returns a copy of the string with every occurrence of the substring replaced, and leaves the original object untouched.
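
A minimal sketch of that behavior, written for Python 3 (the question's code is Python 2, where the same rule applies). The sample `page_data` value is made up for illustration and is not taken from the real site.

```python
# Hypothetical sample standing in for a downloaded page (an assumption).
page_data = b'<a href="/reference/cstdio/">cstdio</a>'
each = b"/cstdio/"
neweach = each[:-1] + b".html"  # b"/cstdio.html"

# Calling replace() and ignoring its return value changes nothing.
page_data.replace(each, neweach)
unchanged = page_data

# Rebinding the name to the returned copy is what makes the change visible.
page_data = page_data.replace(each, neweach)

print(unchanged)
print(page_data)
```

The first call builds a new bytes object and immediately discards it, which is why the "fail replace" check in the question keeps firing.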

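    Apart from anything else, strings (and bytes objects) are immutable in Python, so it couldn’t work any other way. You have to assign the result back, as in page_data = page_data.replace(each, neweach). The same applies to the first replace call in pageHandle and to the four url_suffix.replace calls in recurseGet, whose results are also discarded.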
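
Applied to the question's pageHandle, the fix could look roughly like the sketch below. It is written for Python 3 and is only one possible shape. The name `page_handle`, the module-level `LINK_RE`, and the tuple return value are assumptions of this sketch, not part of the original code. It also swaps the per-match replace loop for `re.sub` with a callback, so that only the matched links are rewritten.

```python
import re

# Same pattern as in the question, written as a bytes pattern because the
# downloaded page is a bytes object.
LINK_RE = re.compile(rb'<a href="/reference(/\S{2,40}/)">')


def page_handle(page_data):
    """Return (rewritten page, list of link suffixes that were found)."""
    # The result is assigned back; bytes objects never change in place.
    page_data = page_data.replace(b"http://www.cplusplus.com/", b"/")

    found = []

    def to_html(match):
        suffix = match.group(1)          # e.g. b"/cstdio/"
        found.append(suffix)
        # Drop the trailing slash and append ".html" inside the link only.
        return b'<a href="/reference' + suffix[:-1] + b'.html">'

    # re.sub also returns a new object, so it is assigned back as well.
    page_data = LINK_RE.sub(to_html, page_data)
    return page_data, found
```

The caller can then queue any suffix in `found` that is not already in its history or pending list, as the original loop does.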

© 2021 The Archive Base. All Rights Reserved