The Archive Base

Editorial Team
Asked: May 25, 2026

How do Python raw strings and string literals work? I’m trying to make a web scraper to download PDFs from a site. When I search the string it works, but when I try to implement it in Python I always get None as my answer:

import urllib
import re
url = ""  # insert url here
sock = urllib.urlopen(url)
htmlSource = sock.read()
sock.close()

m = re.match(r"<a href.*?pdf[^>]*?", raw(htmlSource))
print m



$ python temp.py
None

The raw function is from here: http://code.activestate.com/recipes/65211-convert-a-string-into-a-raw-string/

That said, how can I complete this program so that I can print out all of the matches and then download the pdfs?

Thanks!


1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 2:05 am

    You seem to be very confused.

    A ‘string literal’ is a string that you type into the program. Because there needs to be a clear beginning and end to your string, certain characters become inconvenient to have within the middle of the string, and escape sequences must be used to represent them.

    Python offers ‘raw’ string literals, which have different rules for how escape sequences are interpreted: the same rules are used to figure out where the string ends (so a single backslash followed by the opening quote character doesn’t terminate the string), but the backslash sequences themselves are not transformed. So, while '\'' is a string that consists of a single quote character (the \' in the middle is an escape sequence that produces the quote), r'\'' is a string that consists of a backslash and a quote character.
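To make the difference concrete, here is a minimal sketch (written for Python 3, which is an assumption; the question's code is Python 2) that builds both literals from the paragraph above and compares them. The variable names are only illustrative.

```python
# Compare an ordinary literal and a raw literal that look alike in source code.
ordinary = '\''   # the escape sequence \' produces one character: a quote
raw = r'\''       # the raw literal keeps the backslash: backslash + quote

print(len(ordinary))                # 1
print(len(raw))                     # 2
print(type(ordinary) is type(raw))  # True: both are plain str objects
```

Both values are ordinary strings; only the way the source text was translated into characters differs.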

    The raw string literal produces an object of type str. It is the same type as produced by an ordinary string literal. These are often used for the pattern for a regex operation, because the strings used for regexes often need to contain a lot of backslashes. If you wanted to write a regex that matched a backslash in the source text, and you didn’t have raw string literals, then you would need to put, perhaps surprisingly, four backslashes between the quotes in your source code: the Python compiler would interpret this as a string containing two real backslashes, which in turn represents “match a backslash” in the regex syntax.
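The four-backslash case can be checked directly. The following is a small sketch in Python 3 (an assumption); the sample text `C:\temp` is made up for illustration.

```python
import re

# Both literals build the same two-character pattern: backslash, backslash.
pattern_plain = '\\\\'
pattern_raw = r'\\'
print(pattern_plain == pattern_raw)   # True

# Sample text containing one real backslash (spelled here with a raw literal).
text = r'C:\temp'

# The regex engine reads the two-backslash pattern as "match one backslash".
found = re.search(pattern_raw, text)
print(found.start())                  # 2: the backslash is the third character
```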

    The function you found is an imperfect attempt to re-introduce escape sequences into input text. This is not what you want to do, doesn’t even really make sense, and doesn’t meet the author’s own spec anyway. It seems to be based on a misconception similar to your own. The concept of a “raw equivalent of” a string is nonsensical. There is, really, no such thing as “a raw string”; raw string literals are a convenience for creating ordinary strings.

    You want to search for the pattern within htmlSource. It is already in the form you need it to be in. Your problem has nothing to do with string escapes. When a string comes from user input, file input, or basically anything other than the program source, it is not processed the way string literals are, unless you explicitly arrange for that to happen. If the web page contains a backslash followed by an n, the string that gets read by urllib contains, in the corresponding spot, exactly that – a backslash followed by an n, not a newline.
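A quick way to convince yourself of this is to read text from something other than program source. The sketch below (Python 3 assumed) uses `io.StringIO` as a stand-in for a file or web response; the payload is invented for the example.

```python
import io

# Simulate a data source whose payload holds a backslash followed by "n".
# (The literal is raw only so that this example can spell the payload.)
source = io.StringIO(r'line one\nline two')
data = source.read()

print(len(data.splitlines()))   # 1: reading the data created no newline
print('\\n' in data)            # True: the backslash and the n are both there

# Escape processing happens only if you explicitly ask for it.
# This round trip assumes the data is ASCII-only.
decoded = data.encode('ascii').decode('unicode_escape')
print(len(decoded.splitlines()))   # 2: now there is a real newline
```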

    The problem is as follows: you want to search the string, as you said: “when I search the string it works”. You are currently matching the string. See the documentation:

    Help on function match in module re:
    
    match(pattern, string, flags=0)
        Try to apply the pattern at the start of the string, returning
        a match object, or None if no match was found.
    

    Your pattern does not appear at the beginning of the string, since the HTML for the webpage does not start with the <a> tag you are looking for.
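The difference is easy to reproduce with a made-up page. In this Python 3 sketch, `html_source` is a hypothetical snippet, not the page from the question.

```python
import re

# Hypothetical page source: the link is not at the start of the string.
html_source = '<html><body><a href="files/report.pdf">Report</a></body></html>'
pattern = r"<a href.*?pdf[^>]*?"

# match() only tries position 0, where the text is "<html>", so it fails.
print(re.match(pattern, html_source))   # None

# search() scans forward until the pattern fits somewhere.
found = re.search(pattern, html_source)
print(found.group(0))                   # <a href="files/report.pdf
```

Note that the trailing `[^>]*?` is lazy and has nothing after it, so it matches zero characters and the match stops right after `pdf`.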

    You want m=re.search(r"<a href.*?pdf[^>]*?", htmlSource).
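For the rest of the question (printing every match and downloading the files), one possible approach is sketched below. It assumes Python 3 (`urllib.request` rather than the Python 2 `urllib.urlopen` in the question), assumes the links use quoted `href` attributes, and uses a slightly different pattern that captures the URL itself. The names `PDF_LINK`, `find_pdf_urls`, and `download_all` are hypothetical helpers, not part of any library. A regex is only a rough tool for HTML; a real HTML parser would be more robust.

```python
import os
import re
import urllib.request
from urllib.parse import urljoin, urlsplit

# Capture the href value of <a> tags whose target ends in .pdf.
PDF_LINK = re.compile(r'<a\s[^>]*?href=["\']([^"\']+?\.pdf)["\']', re.IGNORECASE)

def find_pdf_urls(html_source, base_url):
    """Return absolute URLs for every PDF link found in html_source."""
    return [urljoin(base_url, href) for href in PDF_LINK.findall(html_source)]

def download_all(page_url, dest_dir="."):
    """Fetch page_url, print each PDF link, and save the files in dest_dir."""
    with urllib.request.urlopen(page_url) as response:
        # Assumes the page is UTF-8; undecodable bytes are replaced.
        html_source = response.read().decode("utf-8", errors="replace")
    for pdf_url in find_pdf_urls(html_source, page_url):
        print(pdf_url)
        filename = os.path.basename(urlsplit(pdf_url).path)
        urllib.request.urlretrieve(pdf_url, os.path.join(dest_dir, filename))
```

Calling `download_all(url)` with the page address would then print and save each linked PDF. `re.findall` returns only the captured group here, which is why the pattern wraps the URL in parentheses.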
