The Archive Base

Asked by Editorial Team on May 14, 2026 (2026-05-14T15:07:45+00:00)

I am trying to come up with a function that does a good job

I am trying to come up with a function that does a good job of sanitizing certain strings so that they are safe to use in URLs (like a post slug) and also safe to use as file names. For example, when someone uploads a file, I want to make sure that I remove all dangerous characters from the name.

So far I have come up with the following function, which I hope solves this problem and also allows foreign UTF-8 data.

/**
 * Convert a string to the file/URL safe "slug" form
 *
 * @param string $string the string to clean
 * @param bool $is_filename TRUE will allow additional filename characters
 * @return string
 */
function sanitize($string = '', $is_filename = FALSE)
{
 // Replace all weird characters with dashes
 $string = preg_replace('/[^\w\-'. ($is_filename ? '~_\.' : ''). ']+/u', '-', $string);

 // Only allow one dash separator at a time (and make string lowercase)
 return mb_strtolower(preg_replace('/--+/u', '-', $string), 'UTF-8');
}

Does anyone have any tricky sample data I can run against this, or know of a better way to safeguard our apps from bad names?

$is_filename allows some additional characters, such as the ~ and . used in Vim's backup and swap file names.

Update: I removed the star (*) character, since I could not think of a valid use for it.
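One way to see what the $is_filename flag changes is to run sanitize() on a few names. The sketch below copies the function from the question unchanged (it needs the mbstring extension for mb_strtolower()); the sample file names are made up for illustration.

```php
<?php
// The sanitize() function from the question, copied unchanged.
function sanitize($string = '', $is_filename = FALSE)
{
    // Replace all weird characters with dashes
    $string = preg_replace('/[^\w\-'. ($is_filename ? '~_\.' : ''). ']+/u', '-', $string);

    // Only allow one dash separator at a time (and make string lowercase)
    return mb_strtolower(preg_replace('/--+/u', '-', $string), 'UTF-8');
}

// Hypothetical inputs: a Vim swap file, a Vim backup file, and a typical upload.
echo sanitize('.notes.txt.swp', TRUE), "\n";        // → .notes.txt.swp
echo sanitize('.notes.txt.swp'), "\n";              // → -notes-txt-swp
echo sanitize('report.txt~', TRUE), "\n";           // → report.txt~
echo sanitize('My Report (Final).PDF', TRUE), "\n"; // → my-report-final-.pdf
```

The last case shows one quirk worth testing for: punctuation right before the extension leaves a dangling dash in front of the dot.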

1 Answer

  1. Editorial Team added an answer on May 14, 2026 at 3:07 pm (2026-05-14T15:07:46+00:00)

    Some observations on your solution:

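    1. The ‘u’ modifier at the end of your pattern turns on UTF-8 mode, in which PHP treats both the pattern and the subject text as UTF-8. A practical consequence: if the subject is not valid UTF-8, preg_replace() fails and returns null instead of a string.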
    2. \w already matches the underscore character. You explicitly add _ to the filename-only set, which suggests you don’t want underscores in URLs, but with your current code URLs can contain underscores too.
    3. Whether “foreign” characters count as \w characters can depend on the locale, and since the regex runs in PHP, that would be the server’s locale rather than the client’s. From the PHP docs:

    A “word” character is any letter or digit or the underscore character, that is, any character which can be part of a Perl “word”. The definition of letters and digits is controlled by PCRE’s character tables, and may vary if locale-specific matching is taking place. For example, in the “fr” (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.

    That locale caveat concerns PCRE’s default character tables. When the ‘u’ modifier is used, PHP also enables PCRE’s Unicode property support, so \w matches letters and digits from any script (such as é or ж) regardless of locale.
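The observations above can be checked directly against the character class that sanitize() uses when $is_filename is FALSE. This is a small sketch with made-up inputs; it assumes a PHP build whose PCRE supports Unicode properties, which is the case for the bundled PCRE in current PHP versions.

```php
<?php
// Same character class as sanitize() uses when $is_filename is FALSE.
$pattern = '/[^\w\-]+/u';

// Observation 2: \w already includes '_', so underscores survive in URL slugs.
echo preg_replace($pattern, '-', 'my_post title'), "\n";   // → my_post-title

// With 'u', PHP enables Unicode properties, so \w also matches 'é'.
echo preg_replace($pattern, '-', 'Café Olé'), "\n";        // → Café-Olé

// Observation 1: with 'u', an invalid UTF-8 subject (a lone 0xC3 byte) fails.
var_dump(preg_replace($pattern, '-', "bad \xC3 byte"));     // → NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);        // → bool(true)
```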

    Creating the slug

    You probably shouldn’t include accented or other non-ASCII characters in your post slug: technically they should be percent-encoded (per URL encoding rules), so you’ll end up with ugly-looking URLs.
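To see what that looks like, PHP’s built-in rawurlencode() percent-encodes each byte of a character’s UTF-8 form, so a single “é” turns into two escapes. The slug below is a made-up example and assumes the source file is saved as UTF-8.

```php
<?php
// 'é' is the two UTF-8 bytes 0xC3 0xA9, so it becomes %C3%A9;
// ASCII letters and '-' are left as they are.
echo rawurlencode('café-au-lait'), "\n"; // → caf%C3%A9-au-lait
```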

    So, if I were you, after lowercasing I’d convert any ‘special’ characters to their ASCII equivalents (e.g. é -> e), then replace every run of characters outside [a-z0-9] with a single ‘-’, as you’ve done. There’s an implementation of converting special characters here: https://web.archive.org/web/20130208144021/http://neo22s.com/slug
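A minimal sketch of that approach, assuming the mbstring extension is available (the question’s function already relies on it). make_slug() and SLUG_TRANSLIT are hypothetical names, and the transliteration map is deliberately tiny; for real use, the intl extension’s transliterator_transliterate() with a rule such as 'Any-Latin; Latin-ASCII' is a more complete option.

```php
<?php
// Hypothetical, intentionally incomplete map of lowercase accented
// characters to ASCII; a real application needs a much larger table.
const SLUG_TRANSLIT = [
    'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a', 'ã' => 'a', 'å' => 'a',
    'ç' => 'c',
    'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
    'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
    'ñ' => 'n',
    'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'ö' => 'o', 'õ' => 'o',
    'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
    'ß' => 'ss',
];

function make_slug(string $text): string
{
    // 1. Lowercase first, so the map only needs lowercase entries.
    $text = mb_strtolower($text, 'UTF-8');
    // 2. Replace known accented characters with their ASCII equivalents.
    $text = strtr($text, SLUG_TRANSLIT);
    // 3. Collapse every run of characters outside [a-z0-9] into one '-'.
    $text = preg_replace('/[^a-z0-9]+/', '-', $text);
    // 4. Drop leading/trailing dashes left by spaces or punctuation.
    return trim($text, '-');
}

echo make_slug('Crème Brûlée: 10 Tips!'), "\n"; // → creme-brulee-10-tips
echo make_slug('  Hello, World  '), "\n";       // → hello-world
```

Because the final class is an explicit [a-z0-9] rather than \w, the result does not depend on locale or on the ‘u’ modifier, and underscores are replaced as well. Any non-ASCII character missing from the map simply becomes a dash.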

    Sanitization in general

    OWASP has a PHP implementation of its Enterprise Security API (ESAPI), which, among other things, provides methods for safely encoding and decoding input and output in your application.

    The Encoder interface provides:

    canonicalize (string $input, [bool $strict = true])
    decodeFromBase64 (string $input)
    decodeFromURL (string $input)
    encodeForBase64 (string $input, [bool $wrap = false])
    encodeForCSS (string $input)
    encodeForHTML (string $input)
    encodeForHTMLAttribute (string $input)
    encodeForJavaScript (string $input)
    encodeForOS (Codec $codec, string $input)
    encodeForSQL (Codec $codec, string $input)
    encodeForURL (string $input)
    encodeForVBScript (string $input)
    encodeForXML (string $input)
    encodeForXMLAttribute (string $input)
    encodeForXPath (string $input)
    

    https://github.com/OWASP/PHP-ESAPI
    https://www.owasp.org/index.php/Category:OWASP_Enterprise_Security_API
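ESAPI’s Encoder is organized around output contexts: the same value is encoded differently for HTML, URLs, shells and so on. As a point of comparison (this is not ESAPI code), PHP’s own htmlspecialchars(), rawurlencode() and escapeshellarg() cover three such contexts, similar in spirit to encodeForHTML, encodeForURL and encodeForOS. The sample input is made up, and the escapeshellarg() output shown assumes a Unix-like system.

```php
<?php
// Made-up input containing characters that are special in several contexts.
$input = 'Tom & "Jerry" <b>';

// HTML context: escape &, <, >, " and ' as entities.
echo htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n";
// → Tom &amp; &quot;Jerry&quot; &lt;b&gt;

// URL path segment or query value: RFC 3986 percent-encoding.
echo rawurlencode($input), "\n";
// → Tom%20%26%20%22Jerry%22%20%3Cb%3E

// Shell argument: wrapped in single quotes on Unix-like systems.
echo escapeshellarg($input), "\n";
// → 'Tom & "Jerry" <b>'
```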

© 2021 The Archive Base. All Rights Reserved