The Archive Base

The Archive Base Latest Questions

Editorial Team
Asked: June 10, 2026

I need to check whether a string conforms to these rules: http://www.w3.org/TR/widgets/#zip-rel-path


Zip-rel-path   = [locale-folder] *folder-name file-name /
                 [locale-folder] 1*folder-name
locale-folder  = %x6C %x6F %x63 %x61 %x6C %x65 %x73
                 "/" lang-tag "/"
folder-name    = file-name "/"
file-name      = 1*allowed-char
allowed-char   = safe-char / zip-UTF8-char
zip-UTF8-char  = UTF8-2 / UTF8-3 / UTF8-4
safe-char      = ALPHA  / DIGIT / SP  / "$" / "%" / 
                 "'"    / "-"   / "_" / "@" / "~" /
                 "("    / ")"   / "&" / "+" / "," /
                 "="    / "["   / "]" / "."
UTF8-2         = %xC2-DF UTF8-tail
UTF8-3         = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                 %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
UTF8-4         = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
                 %xF4 %x80-8F 2( UTF8-tail )
UTF8-tail      = %x80-BF
lang-tag       = primary-subtag *( "-" subtag )
primary-subtag = 1*8low-alpha
subtag         = 1*8(alphanum)
alphanum       = low-alpha / DIGIT
low-alpha      = %x61-7a

A code example that follows the rules above exactly would help, as I am not familiar with ABNF.
I don’t need a way to parse the ABNF itself. I only need the rules above translated by hand, by someone who understands ABNF, into Python code, using regular expressions or any other approach. In practice: a function that takes a string and returns true or false depending on whether it matches the rules. So, to put it in the form of a question: how would this look implemented in Python?

I see from the UTF-8 specification that a large part of the rules above is just checking whether the string is valid UTF-8:
https://www.rfc-editor.org/rfc/rfc3629

   UTF8-char   = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
   UTF8-1      = %x00-7F
   UTF8-2      = %xC2-DF UTF8-tail
   UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                 %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
   UTF8-4      = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
                 %xF4 %x80-8F 2( UTF8-tail )
   UTF8-tail   = %x80-BF  
1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 9:27 pm

    I’ve attempted to write a parser for you.

    I agree that the bulk of the grammar is a test for well-formed UTF-8, which is redundant if you already have the value in a unicode string (UTF-8 is the byte encoding used on the file system; a unicode string is the decoded internal representation, so it has already passed that check). That simplifies things tremendously.
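To illustrate the point about UTF-8, here is a minimal Python 3 sketch (the answer's own code targets Python 2). It assumes the path arrives as raw bytes, for example straight from a zip archive entry; the function name `is_well_formed_utf8` is hypothetical. Strict decoding rejects the byte sequences that the UTF8-2/UTF8-3/UTF8-4 rules exclude, so once decoding succeeds only the character-set and structure checks remain.

```python
def is_well_formed_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8 (hypothetical helper)."""
    try:
        # errors="strict" is the default; it is spelled out here for clarity.
        # Strict decoding refuses overlong forms, stray continuation bytes
        # and encoded surrogates.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


print(is_well_formed_utf8("my/p\u00e4th.txt".encode("utf-8")))  # True
print(is_well_formed_utf8(b"bad\xc0\xafname"))  # False: overlong encoding
print(is_well_formed_utf8(b"\xed\xa0\x80"))     # False: encoded surrogate
```

If the value is already a `str`, this step has effectively been done by whatever decoded it.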

    As I understand it, the ABNF says:

    • locale-folder (which is optional) is the string ‘locales/’ followed by a lang-tag and a closing ‘/’
    • lang-tag is of the form ‘en’, ‘en-us’, ‘en-123’, ‘en-us-1’ and so on:
      • One or more tokens, separated by the ‘-’ character
      • Each token is from 1 to 8 characters
      • The first token may contain only lower case letters
      • Following tokens are a mix of lower case letters and digits
    • After the optional locale, you can have:
      • A single file-name or
      • A path (a series of folder-names, each followed by ‘/’) or
      • A path followed by a file-name
    • folder-name and file-name are sort-of unicode. Each character is one of:
      • A-Z, a-z, 0-9
      • Any of " $%'-_@~()&+,=[]." (the leading space is one of the allowed characters)
      • Any character above U+007F (the UTF-8 two, three and four-byte characters)
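Since the question asks for a function that simply returns true or false, the rules listed above can be condensed into a short sketch. This is written for Python 3 and assumes the input is an already-decoded `str`. The names `is_zip_rel_path`, `_LANG_TAG_RE` and `_NAMES_RE` are hypothetical. It also assumes, following the reading above, that a path starting with `locales/` must continue with a valid lang-tag and at least one further name; the grammar taken purely on its own could also read `locales` as an ordinary folder name.

```python
import re

# One file-name or folder-name: safe-char, or any code point above U+007F
# (standing in for zip-UTF8-char). Surrogates are left out because
# well-formed UTF-8 cannot encode them.
_NAME = r"[A-Za-z0-9 $%'\-_@~()&+,=\[\].\u0080-\ud7ff\ue000-\U0010ffff]+"

# lang-tag: 1-8 lower case letters, then "-" subtags of 1-8 letters or digits.
_LANG_TAG_RE = re.compile(r"[a-z]{1,8}(?:-[a-z0-9]{1,8})*")

# Zero or more folders followed by a name, with an optional trailing "/"
# (the trailing "/" covers the folders-only alternative of the grammar).
_NAMES_RE = re.compile(r"(?:{name}/)*{name}/?".format(name=_NAME))

_LOCALES = "locales/"


def is_zip_rel_path(path: str) -> bool:
    """Return True if path follows the Zip-rel-path rules as read above."""
    rest = path
    if path.startswith(_LOCALES):
        # Assumption: a leading "locales/" always introduces a lang-tag.
        lang_tag, separator, rest = path[len(_LOCALES):].partition("/")
        if not separator or not _LANG_TAG_RE.fullmatch(lang_tag):
            return False
    # fullmatch avoids the trailing-newline leniency of "$".
    return _NAMES_RE.fullmatch(rest) is not None


print(is_zip_rel_path("locales/en-us/my/path/to/file.txt"))  # True
print(is_zip_rel_path("locales/EN/file.txt"))                # False
print(is_zip_rel_path("bad^file"))                           # False
```

Returning a plain boolean loses the reason for a rejection, which the class-based version keeps in its exception message.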

    Here is a simple implementation of those rules. It stores the parsed parts (lang-tag, folders and file name) on the object, which I found useful for debugging; feel free to remove that if you don’t need it. Errors in the path cause the ZipRelPath constructor to raise a ValueError:

    import re
    
    class ZipRelPath:
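        # safe-char plus anything above U+007F; \Z (unlike $) does not match before a trailing newline
        FILE_NAME_RE = re.compile(u"^[a-zA-Z0-9 \$\%\'\-_@~\(\)&+,=\[\]\.\u0080-\uFFFF]+\Z")
        # primary-subtag (1-8 lower case letters), then any number of "-" subtags (1-8 lower case letters or digits)
        LANG_TAG_RE  = re.compile("^[a-z]{1,8}(\-[a-z0-9]{1,8})*\Z")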
        LOCALES      = "locales/"
    
        def __init__(self, path):
            self.path = path
            self.lang_tag = None
            self.folders = []
            self.file_name = None
    
            self._parse_locales()
            self._parse_folders()
    
        def _parse_locales(self):
            """Consumes any leading 'locales' and lang-tag"""
            if self.path.startswith(ZipRelPath.LOCALES):
                self.path = self.path[len(ZipRelPath.LOCALES):]
                self._parse_lang_tag()
    
        def _parse_lang_tag(self):
            """Parses, consumes and validates the lang-tag"""
            self.lang_tag, separator, self.path = self.path.partition("/")
            if not separator:
                raise ValueError("lang-tag missing closing /")
            if not ZipRelPath.LANG_TAG_RE.match(self.lang_tag):
                raise ValueError(u"'%s' is not a valid language tag" % self.lang_tag)
    
        def _parse_folders(self):
            """Handles the folders and file-name after the locale"""
            while (self.path):
                self._parse_folder_or_file()
    
            if not self.folders and not self.file_name:
                raise ValueError("Missing folder or file name")
    
        def _parse_folder_or_file(self):
            """Each call consumes a single path entry, validating it"""
            folder_or_file, _, self.path = self.path.partition("/")
            if not ZipRelPath.FILE_NAME_RE.match(folder_or_file):
                raise ValueError(u"'%s' is not a valid file or folder name" % folder_or_file)
            if self.path:
                self.folders.append(folder_or_file)
            else:
                self.file_name = folder_or_file
    
        def __unicode__(self):
            return u"ZipRelPath [lang-tag: %s, folders: %s, file_name: %s" % (self.lang_tag, self.folders, self.file_name)
    

    And a short set of tests:

    GOOD = [
        "$%'-_@~()&+,=[].txt9",
        "my/path/to/file.txt",
        "locales/en/file.txt",
        "locales/en-us/file.txt",
        "locales/en-us-abc123-xyz/file.txt",
        "locales/abcdefgh-12345678/file.txt",
        "locales/en/my/path/to/file.txt",
        u"my\u00A5\u0160\u039E\u04FE\u069E\u0BCC\uFFFD/path/to/file.txt"
    ]
    BAD = [
        "",
        "/starts/with/slash",
        "bad^file",
        "locales//bad/locale",
        "locales/en123/bad/locale",
        "locales/EN/bad/locale",
        "locales/en-US/bad/locale",
    ]
    
    for path in GOOD:
        print unicode(ZipRelPath(path))
    
    for path in BAD:
        try:
            parsed = ZipRelPath(path)  # renamed from "zip" to avoid shadowing the builtin
            raise Exception("Illegal path {0} was accepted by {1}".format(path, parsed))
        except ValueError as exception:
            print "Incorrect path '{0}' fails with: {1}".format(path, exception)
    

    Which produces:

    ZipRelPath [lang-tag: None, folders: [], file_name: $%'-_@~()&+,=[].txt9
    ZipRelPath [lang-tag: None, folders: ['my', 'path', 'to'], file_name: file.txt
    ZipRelPath [lang-tag: en, folders: [], file_name: file.txt
    ZipRelPath [lang-tag: en-us, folders: [], file_name: file.txt
    ZipRelPath [lang-tag: en-us-abc123-xyz, folders: [], file_name: file.txt
    ZipRelPath [lang-tag: abcdefgh-12345678, folders: [], file_name: file.txt
    ZipRelPath [lang-tag: en, folders: ['my', 'path', 'to'], file_name: file.txt
    ZipRelPath [lang-tag: None, folders: [u'my\xa5\u0160\u039e\u04fe\u069e\u0bcc\ufffd', u'path', u'to'], file_name: file.txt
    Incorrect path '' fails with: Missing folder or file name
    Incorrect path '/starts/with/slash' fails with: '' is not a valid file or folder name
    Incorrect path 'bad^file' fails with: 'bad^file' is not a valid file or folder name
    Incorrect path 'locales//bad/locale' fails with: '' is not a valid language tag
    Incorrect path 'locales/en123/bad/locale' fails with: 'en123' is not a valid language tag
    Incorrect path 'locales/EN/bad/locale' fails with: 'EN' is not a valid language tag
    Incorrect path 'locales/en-US/bad/locale' fails with: 'en-US' is not a valid language tag
    

    Please let me know if any of your test cases fail, and I’ll see if I can fix it.

