The Archive Base

Editorial Team
Asked: May 31, 2026 (2026-05-31T02:04:02+00:00)

I want to remove all strange characters from a string to make it url

I want to remove all strange characters from a string to make it “url safe”. Therefore, I have a function that goes like this:

def urlize(url, safe=u''):
   intab =  u"àáâãäåòóôõöøèéêëçìíîïùúûüÿñ" + safe
   outtab = u"aaaaaaooooooeeeeciiiiuuuuyn" + safe
   trantab = dict((ord(a), b) for a, b in zip(intab, outtab))
   return url.lower().translate(trantab).strip()

This works just great, but now I want to reuse that function to allow special characters, for example the quotation mark.

urlize(u'This is sóme randóm "text" that í wánt to process',u'"')

…and that throws the following error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: expected a character buffer object

I have tried the following, but it did not work:

urlize(u'text',u'\"')
intab =  u"àáâãäåòóôõöøèéêëçìíîïùúûüÿñ%s" , safe

–EDIT–
The full function looks like this

def urlize(url, safe=u''):

    intab =  u"àáâãäåòóôõöøèéêëçìíîïùúûüÿñ" + safe
    outtab = u"aaaaaaooooooeeeeciiiiuuuuyn" + safe
    trantab = dict((ord(a), b) for a, b in zip(intab, outtab))
    translated_url = url.lower().translate(trantab).strip()

    pos = 0
    stop = len(translated_url)
    new_url= ''
    last_division_char = False

    while pos < stop:
        if not translated_url[pos].isalnum() and translated_url[pos] not in safe:
            if (not last_division_char) and (pos != stop -1):
                new_url+='-'
                last_division_char = True
        else:
            new_url+=translated_url[pos]
            last_division_char = False
        pos+=1

    return new_url

–EDIT– Goal

What I want is to normalize text so that I can put it in the URL myself and use it like an ID. For example, if I want to show the products of a category, I’d rather put “ninos-y-bebes” instead of “niños-y-bebés” (Spanish for kids and babies). I really don’t want all the áéíóúñ (which are the special characters in Spanish) in my URL, but I don’t want to get rid of them either. That’s why I would like to replace all characters that look the same (not 100% of them, I don’t care) and then delete all the non-alphanumeric characters that are left.

1 Answer

  1. Editorial Team
    Added an answer on May 31, 2026 at 2:04 am

    The unidecode module is a safer option (it also handles other special symbols, such as the degree sign):

    >>> from unidecode import unidecode
    >>> s = u'This is sóme randóm "text" that í wánt to process'
    >>> unidecode(s)
    'This is some random "text" that i want to process'
    >>> import urllib
    >>> urllib.urlencode(dict(x=unidecode(s)))[2:]
    'This+is+some+random+%22text%22+that+i+want+to+process'
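
The session above is Python 2 and relies on the third-party unidecode package. As a rough sketch for Python 3 using only the standard library (an assumption, not what the answer itself uses), accents can be stripped with `unicodedata` and the result encoded with `urllib.parse.quote_plus`. The helper name `strip_accents` is made up for illustration.

```python
import unicodedata
from urllib.parse import quote_plus


def strip_accents(text):
    """Hypothetical helper: decompose characters (NFKD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


s = 'This is sóme randóm "text" that í wánt to process'
plain = strip_accents(s)
print(plain)              # This is some random "text" that i want to process
print(quote_plus(plain))  # This+is+some+random+%22text%22+that+i+want+to+process
```

This is narrower than unidecode: characters with no Unicode decomposition, such as "ø", pass through unchanged, so it only approximates the transliteration shown above.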
    

    [ update ]

    i think i’m already doing that -> u"aaaaaaooooooeeeeciiiiuuuuyn" – Marco Bruggmann

    Fair enough, if you are willing to keep track of every Unicode character out there in your translation table (accented characters are not the only issue; there is a whole lot of symbols to rain on your parade).

    Worse, many Unicode symbols are visually identical to their ASCII counterparts, leading to hard-to-diagnose errors.
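
To make the look-alike problem concrete, here is a small Python 3 sketch. The helper `find_non_ascii` and the sample strings are illustrative assumptions, not part of the original answer. It shows two strings that usually render the same but compare unequal, and reports which characters fall outside ASCII.

```python
import unicodedata


def find_non_ascii(text):
    """Hypothetical helper: (index, Unicode name) for each non-ASCII character."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


latin = "paypal"
lookalike = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A (U+0430)

print(latin == lookalike)         # False, although both usually render alike
print(find_non_ascii(lookalike))  # [(1, 'CYRILLIC SMALL LETTER A')]
```

A hand-written translation table would have to anticipate each such character, which is the maintenance burden described above.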

    [ update ]

    What about something like:

    >>> safe_chars = 'abcdefghijklmnopqrstuvwxyz01234567890-_'
    >>> filter(lambda x: x in safe_chars, "i think i'm already doing that")
    'ithinkimalreadydoingthat'
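
Note that in Python 3 `filter` returns an iterator rather than a string, so the session above would not print the same result there. A minimal Python 3 sketch of the same whitelist idea follows; `SAFE_CHARS` and `keep_safe` are names chosen here for illustration.

```python
import string

# Whitelist of characters allowed to survive; everything else is dropped.
SAFE_CHARS = set(string.ascii_lowercase + string.digits + "-_")


def keep_safe(text):
    """Hypothetical helper: keep only whitelisted characters."""
    return "".join(ch for ch in text if ch in SAFE_CHARS)


print(keep_safe("i think i'm already doing that"))  # ithinkimalreadydoingthat
```

Using a set makes each membership test constant-time, though for short strings the difference from a plain string whitelist is negligible.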
    

    [ update ]

    @Daenyth I tried it, but I only get errors: from urllib import urlencode => urlencode('google.com/') => TypeError: not a valid non-string sequence or mapping object – Marco Bruggmann

    The urlencode function is intended to produce query-string formatted output (a=1&b=2&c=3). It expects key/value pairs:

    >>> urllib.urlencode(dict(url='google.com/'))
    'url=google.com%2F'
    
    >>> help(urllib.urlencode)
    Help on function urlencode in module urllib:
    
    urlencode(query, doseq=0)
        Encode a sequence of two-element tuples or dictionary into a URL query string.
    
        If any values in the query arg are sequences and doseq is true, each
        sequence element is converted to a separate parameter.
    
        If the query arg is a sequence of two-element tuples, the order of the
        parameters in the output will match the order of parameters in the
        input.
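
For readers on Python 3, where these functions live in `urllib.parse`, the distinction can be sketched as follows: `urlencode` builds a query string from key/value pairs, while `quote` escapes a single string. This is an illustration under that assumption, not part of the original Python 2 session.

```python
from urllib.parse import quote, urlencode

# urlencode takes a mapping or a sequence of two-element tuples.
print(urlencode({"url": "google.com/"}))          # url=google.com%2F
print(urlencode([("a", 1), ("b", 2), ("c", 3)]))  # a=1&b=2&c=3

# quote escapes one string; safe="" means "/" is escaped as well.
print(quote("google.com/", safe=""))              # google.com%2F

# Passing a bare string to urlencode reproduces the error from the comment.
try:
    urlencode("google.com/")
except TypeError as exc:
    print("TypeError:", exc)
```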
    

    [ update ]

    That will work without a doubt, but what I want is to normalize text so that I can put it in the URL myself and use it like an ID. For example, if I want to show the products of a category, I’d rather put “ninos-y-bebes” instead of “niños-y-bebés” (Spanish for kids and babies). I really don’t want all the áéíóúñ (which are the special characters in Spanish) in my URL, but I don’t want to get rid of them either. That’s why I would like to replace all characters that look the same (not 100% of them, I don’t care) and then delete all the non-alphanumeric characters that are left.

    OK, Marco, what you want is a routine to create so-called slugs, isn’t it?

    You can do it in one line:

    >>> s = u'This is sóme randóm "text" that í wánt to process'
    >>> allowed_chars = 'abcdefghijklmnopqrstuwvxyz01234567890'
    >>> ''.join([ x if x in allowed_chars else '-' for x in unidecode(s.lower()) ])
    u'this-is-some-random--text--that-i-want-to-process'
    >>> s = u"Niños y Bebés"
    >>> ''.join([ x if x in allowed_chars else '-' for x in unidecode(s.lower()) ])
    u'ninos-y-bebes'
    >>> s = u"1ª Categoria, ½ docena"
    >>> ''.join([ x if x in allowed_chars else '-' for x in unidecode(s.lower()) ])
    u'1a-categoria--1-2-docena'
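
The one-liner leaves doubled hyphens wherever two disallowed characters are adjacent. A possible Python 3 variant that collapses such runs and trims hyphens at the ends is sketched below. It swaps unidecode for the standard library's `unicodedata` (an assumption made so the example is self-contained), so characters without an ASCII decomposition are simply dropped; `slugify` is a hypothetical name.

```python
import re
import unicodedata


def slugify(text):
    """Hypothetical helper: ASCII-fold, lowercase, and hyphenate a string."""
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop whatever cannot be represented in ASCII (marks, symbols).
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace each run of non-alphanumeric characters with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")


print(slugify('This is sóme randóm "text" that í wánt to process'))
# this-is-some-random-text-that-i-want-to-process
print(slugify("Niños y Bebés"))
# ninos-y-bebes
```

Results can differ from unidecode for symbols such as "½", since the two approaches transliterate them differently.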
    
