Does anybody have a code sample for a Unicode-aware strings program? Programming language doesn’t matter. I want something that essentially does the same thing as the Unix command ‘strings’, but that also works on Unicode text (UTF-16 or UTF-8), pulling out runs of English-language characters and punctuation. (I only care about English characters, not any other alphabet.)
Thanks!
Do you just want to use it, or do you for some reason insist on the code?
On my Debian system, it seems the strings command can do this out of the box. See the excerpt from the manpage:
Edit: OK. I don’t know C# so this may be a bit hairy, but basically, you need to search for sequences of alternating zeros and English characters.
This should work for little-endian UTF-16.
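For illustration, here is a minimal Python sketch of the "alternating zeros and English characters" idea. It is not the code from the answer above, just one way to do it under some assumptions: "English characters" is taken to mean printable ASCII (0x20 to 0x7E), the minimum run length defaults to 4, and the name extract_strings is made up for this example. It picks up plain ASCII runs (which covers English text in UTF-8) and little-endian UTF-16 runs; big-endian UTF-16 is not handled.

```python
import re


def extract_strings(data, min_len=4):
    """Return printable-ASCII runs found in data, in order of appearance.

    Two kinds of runs are recognised (both are assumptions of this sketch):
      * plain ASCII / UTF-8: min_len or more bytes in 0x20..0x7E
      * UTF-16 little-endian: min_len or more pairs of (0x20..0x7E, 0x00)
    """
    # Runs of single-byte printable characters.
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Runs of printable characters, each followed by a zero byte.
    utf16le_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = []
    for match in ascii_run.finditer(data):
        found.append((match.start(), match.group().decode("ascii")))
    for match in utf16le_run.finditer(data):
        found.append((match.start(), match.group().decode("utf-16-le")))

    # Sort by byte offset so the output follows the order in the file.
    found.sort()
    return [text for _, text in found]


if __name__ == "__main__":
    import sys

    # Usage: python extract_strings.py some_binary_file
    with open(sys.argv[1], "rb") as handle:
        for text in extract_strings(handle.read()):
            print(text)
```

Tabs and newlines are treated as run terminators here; widen the character class if you want them kept.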