I am building a routine that processes disk buffers for forensic purposes. Am I better off using python strings or the array() type? My first thought was to use strings, but I’m trying to void unicode problems, so perhaps array(‘c’) is better?
Share
Write the code using what is most natural (strings), find out if it’s too slow and then improve it.
Arrays can be used as drop-in replacements for
strin most cases, as long as you restrict yourself to index and slice access. Both are fixed-length. Both should have about the same memory requirements. Arrays are mutable, in case you need to change the buffers. Arrays can read directly from files, so there’s no speed penalty involved when reading.I don’t understand how you avoid Unicode problems by using arrays, though.
stris just an array of bytes and doesn’t know anything about the encoding of the string.I assume that the “disk buffers” you mention can be rather large, so you might think about using
mmap: