I need to check whether a string conforms to these rules: http://www.w3.org/TR/widgets/#zip-rel-path
Zip-rel-path = [locale-folder] *folder-name file-name /
    [locale-folder] 1*folder-name
locale-folder = %x6C %x6F %x63 %x61 %x6C %x65 %x73
    "/" lang-tag "/"
folder-name = file-name "/"
file-name = 1*allowed-char
allowed-char = safe-char / zip-UTF8-char
zip-UTF8-char = UTF8-2 / UTF8-3 / UTF8-4
safe-char = ALPHA / DIGIT / SP / "$" / "%" /
    "'" / "-" / "_" / "@" / "~" /
    "(" / ")" / "&" / "+" / "," /
    "=" / "[" / "]" / "."
UTF8-2 = %xC2-DF UTF8-tail
UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
    %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
    %xF4 %x80-8F 2( UTF8-tail )
UTF8-tail = %x80-BF
lang-tag = primary-subtag *( "-" subtag )
primary-subtag = 1*8low-alpha
subtag = 1*8(alphanum)
alphanum = low-alpha / DIGIT
low-alpha = %x61-7a
A code example that follows the rules above exactly would help, as I am not familiar with ABNF.
I don’t need a way to parse the ABNF itself. I only need the rules above translated by hand, by someone who understands ABNF, into Python code, using regular expressions or any other approach. In practice, that means a function that takes a string and returns True or False depending on whether it matches the rules. To put it as a question: how would this look implemented in Python?
I see from the UTF-8 specification that much of the grammar above is just a check that the string is valid UTF-8:
https://www.rfc-editor.org/rfc/rfc3629
UTF8-char = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
UTF8-1 = %x00-7F
UTF8-2 = %xC2-DF UTF8-tail
UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
    %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
    %xF4 %x80-8F 2( UTF8-tail )
UTF8-tail = %x80-BF
I’ve attempted to write a parser for you.
I agree that the bulk of the grammar is a test for valid UTF-8, which is redundant if you already have the value as a decoded string (UTF-8 is the encoding used on disk; a Unicode string is the internal representation of text that has already been decoded from valid UTF-8). That simplifies things tremendously.
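To illustrate that simplification, here is a minimal sketch of the allowed-char rule applied to an already decoded Python 3 string. The names `SAFE_CHARS`, `is_allowed_char` and `decode_zip_name` are made up for this example. It assumes Python's strict UTF-8 decoder is used to turn raw bytes into text, which rejects byte sequences that are not well-formed UTF-8.

```python
import string

# safe-char = ALPHA / DIGIT / SP / the punctuation listed in the grammar
SAFE_CHARS = frozenset(string.ascii_letters + string.digits + " $%'-_@~()&+,=[].")


def is_allowed_char(ch: str) -> bool:
    """allowed-char = safe-char / zip-UTF8-char, for one decoded character."""
    if ch in SAFE_CHARS:
        return True
    cp = ord(ch)
    # UTF8-2 / UTF8-3 / UTF8-4 together cover U+0080..U+10FFFF,
    # excluding the surrogate range U+D800..U+DFFF.
    return cp >= 0x80 and not 0xD800 <= cp <= 0xDFFF


def decode_zip_name(raw: bytes) -> str:
    """Decode raw bytes strictly; raises UnicodeDecodeError on malformed UTF-8."""
    return raw.decode("utf-8")
```

Once the name is a `str`, every non-ASCII character other than a lone surrogate counts as a zip-UTF8-char, so only the ASCII characters need an explicit allow-list.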
As I understand it, the BNF says:
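One possible reading of the grammar, written rule by rule as a regular expression, is sketched below. This is an illustration under the assumption that the input is a decoded Python 3 `str`; the names `ZIP_REL_PATH` and `is_zip_rel_path` are invented for the example.

```python
import re

# safe-char: ALPHA / DIGIT / SP and the listed punctuation (inside a character class).
SAFE = r"A-Za-z0-9\x20$%'\-_@~()&+,=\[\]."
# zip-UTF8-char on a decoded str: any code point >= U+0080 except surrogates.
NON_ASCII = r"\u0080-\ud7ff\ue000-\U0010ffff"

FILE_NAME = f"[{SAFE}{NON_ASCII}]+"          # file-name = 1*allowed-char
LANG_TAG = r"[a-z]{1,8}(?:-[a-z0-9]{1,8})*"  # primary-subtag *( "-" subtag )
LOCALE_FOLDER = f"locales/{LANG_TAG}/"       # "locales" "/" lang-tag "/"

ZIP_REL_PATH = re.compile(
    f"(?:{LOCALE_FOLDER})?"
    f"(?:(?:{FILE_NAME}/)*{FILE_NAME}"  # *folder-name file-name
    f"|(?:{FILE_NAME}/)+)"              # 1*folder-name
)


def is_zip_rel_path(path: str) -> bool:
    """Return True if the whole string matches Zip-rel-path."""
    return ZIP_REL_PATH.fullmatch(path) is not None
```

Because every character of a locale-folder is also a safe-char, the optional prefix does not change which strings match; it is kept here only so the pattern mirrors the grammar. In plain words, a valid path is one or more non-empty segments of allowed characters separated by "/", optionally ending in "/".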
That said, here is a simple implementation. It captures the output from the parsing, which I added for debugging; please feel free to remove that if you don’t need it. Errors in the path cause the ZipRelPath constructor to raise a ValueError:
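A minimal sketch of a class along those lines is shown here. It is not the original listing: the attribute names (`locale`, `folders`, `file_name`, `is_folder`) and the helper functions are assumptions chosen for illustration, and the input is assumed to be a decoded Python 3 `str`.

```python
import string

_SAFE = frozenset(string.ascii_letters + string.digits + " $%'-_@~()&+,=[].")
_LOW_ALPHA = frozenset(string.ascii_lowercase)
_ALPHANUM = _LOW_ALPHA | frozenset(string.digits)


def _is_allowed(ch):
    # allowed-char = safe-char / zip-UTF8-char (non-ASCII, not a surrogate)
    cp = ord(ch)
    return ch in _SAFE or (cp >= 0x80 and not 0xD800 <= cp <= 0xDFFF)


def _is_lang_tag(tag):
    # lang-tag = primary-subtag *( "-" subtag ), each part 1 to 8 characters
    primary, *subtags = tag.split("-")
    if not (1 <= len(primary) <= 8 and set(primary) <= _LOW_ALPHA):
        return False
    return all(1 <= len(s) <= 8 and set(s) <= _ALPHANUM for s in subtags)


class ZipRelPath:
    """Parse a Zip-rel-path; raise ValueError if it does not match the grammar."""

    def __init__(self, path):
        if not path:
            raise ValueError("empty path")
        parts = path.split("/")
        # A trailing "/" means the path names a folder (1*folder-name).
        self.is_folder = parts[-1] == ""
        if self.is_folder:
            parts.pop()
        for part in parts:
            if not part:
                raise ValueError(f"empty path segment in {path!r}")
            bad = [ch for ch in part if not _is_allowed(ch)]
            if bad:
                raise ValueError(f"disallowed character {bad[0]!r} in {path!r}")
        # Parsed pieces, kept only to make debugging easier.
        self.locale = None
        # locale-folder must be followed by at least one more segment.
        if len(parts) >= 3 and parts[0] == "locales" and _is_lang_tag(parts[1]):
            self.locale = parts[1]
            parts = parts[2:]
        if self.is_folder:
            self.folders, self.file_name = parts, None
        else:
            self.folders, self.file_name = parts[:-1], parts[-1]
```

For example, `ZipRelPath("locales/en-us/images/logo.png")` would record `locale == "en-us"`, `folders == ["images"]` and `file_name == "logo.png"`, while `ZipRelPath("a//b")` raises a ValueError because of the empty segment.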
And a short set of tests:
Which produces:
Please let me know if any of your test cases fail, and I’ll see if I can fix it.