I am using this code to recursively find files in a folder with a size greater than 500000 bytes.
    import os

    def listall(parent):
        lis = []  # full paths of the files that pass the size check
        for root, dirs, files in os.walk(parent):
            for name in files:
                if os.path.getsize(os.path.join(root, name)) > 500000:
                    lis.append(os.path.join(root, name))
        return lis
This works fine.
But when I ran it on the ‘Temporary Internet Files’ folder in Windows, I got this error:
    Traceback (most recent call last):
      File "<pyshell#4>", line 1, in <module>
        listall(a)
      File "<pyshell#2>", line 5, in listall
        if os.path.getsize(os.path.join(root,name))>500000:
      File "C:\Python26\lib\genericpath.py", line 49, in getsize
        return os.stat(filename).st_size
    WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Documents and Settings\\khedarnatha\\Local Settings\\Temporary Internet Files\\Content.IE5\\EDS8C2V7\\??????+1[1].jpg'
I think this is because Windows gives files names with special characters in this specific folder…
Please help me sort out this issue.
It’s because the saved file ‘(something)+1[1].jpg’ has non-ASCII characters in its name, characters that don’t fit into the ‘system default code page’ (also misleadingly known as ‘ANSI’).
Programs like Python that use the byte-based C standard library (stdio) file access functions have big problems with Unicode filenames. On other platforms they can just use UTF-8 and everyone’s happy, but on Windows the system default code page is never UTF-8, so there will always be characters that can’t be represented in the given encoding. They’ll get replaced with ? or sometimes other similar-looking characters, and then when you try to read the files with mangled names you’ll get errors like the above.
Which code page you get depends on your locale: on Western Windows installs it’ll be cp1252 (similar to ISO-8859-1, ‘Latin-1’), so you’ll only be able to use the characters in that code page.
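You can imitate the mangling from Python itself. This is only a rough sketch: the Cyrillic filename is a hypothetical stand-in (the real name behind the ?????? in the traceback is unknown), and Python's 'replace' error handler only approximates what Windows does when it converts a name to the code page.

```python
# Hypothetical filename: six Cyrillic letters followed by "+1[1].jpg".
# The real name from the question is not known; this is only an illustration.
name = u'\u043f\u0440\u0438\u0432\u0435\u0442+1[1].jpg'

# cp1252 has no Cyrillic letters, so each one is replaced with '?'.
# The ASCII part of the name survives unchanged.
mangled = name.encode('cp1252', 'replace')
print(repr(mangled))

# A name whose characters all exist in cp1252 is encoded without loss.
fits = u'caf\u00e9.jpg'.encode('cp1252')
print(repr(fits))
```

Once the name has been turned into ??????+1[1].jpg there is no way back to the original, and ? is not a legal filename character on Windows, which is why os.stat() fails with Error 123.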
Luckily, reasonably recent versions of Python (2.3+, according to PEP 277) can also directly support Unicode filenames by using the native Win32 APIs instead of stdio. If you pass a Unicode string into os.listdir(), Python will use these native-Unicode APIs and you’ll get Unicode strings back, which will include the original characters in the filename instead of mangled ones. The same holds for os.walk(), which is built on os.listdir(). So if you call listall with a Unicode pathname (a u'...' string rather than a plain byte string), it should Just Work.
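Here is a minimal sketch of what that call could look like. The folder path is taken from the traceback in the question. The min_size parameter is my own addition so the threshold isn't hard-coded, and the isdir() guard is only there so the snippet does nothing on machines where that folder doesn't exist. On Python 3 every str is already Unicode, so the u prefix only matters on Python 2.

```python
import os

def listall(parent, min_size=500000):
    """Return paths under `parent` whose size exceeds `min_size` bytes.

    `min_size` is an illustrative parameter, not part of the original code.
    """
    big = []
    for root, dirs, files in os.walk(parent):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) > min_size:
                big.append(path)
    return big

# The u prefix makes this a Unicode string, so on Python 2 os.walk()
# returns Unicode names instead of names squeezed into the code page.
start = u'C:\\Documents and Settings\\khedarnatha\\Local Settings\\Temporary Internet Files'

if os.path.isdir(start):
    for path in listall(start):
        # repr() avoids a second encoding problem when printing to a console
        # that cannot display the characters.
        print(repr(path))
```

Note that the paths you get back are Unicode strings too, so printing them directly to a Windows console can still fail for the same code page reason; that is why the sketch prints repr(path).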