Is there any documentation on how Python’s string functions are implemented in Python?
I understand that str is a built-in module, and so its functions are implemented in C.
But isn’t there code for it anyways? How about in Pypy? From what I’ve read so far, they’ve re-implemented a lot of built-in modules in Python itself.
Example Question: How is the split method of strings implemented? (Without writing my own implementation of it)
EDIT: I am not looking for an implementation written in C (which is the default implementation in the source code of Python/CPython).
This doesn’t really answer the question posed, it’s just a bit too long to be a comment.
Some quick digging through the source shows that PyPy has two implementations of
split(), this high-level, readable version and this lower-level version, which appears to be the implementation ofsplit()in rpython itself.Neither of these implementations are equivalent to CPython’s
split()method (most obviously they do not handle the special case CPython does where thesepis not supplied). However, if you are merely interested in the basic algorithm used rather than the details, PyPy’s implementations could be a guide (at a quick glance, it looks to be doing basically the same thing as both CPython and Jython).As a general resource, though, there’s no reason to think that PyPy’s implementation of all string functions would mirror the algorithms used in CPython — PyPy is, after all, intended as an optimized version of Python running in a JIT, and this may have significant impact on what the most reasonable implementation of a method is (especially string functions, which frequently can be performance bottlenecks and which the implementors of an “optimized” runtime therefore have incentive to optimize).
Thinking about the more general question, there’s very little incentive for the CPython developers to maintain a separate set of pure Python implementations of the low-level library already maintained in C. It seems that there is to much risk that the mirror implementations would grow stale or inaccurate to what is actually being done, which could ultimately be harmful for people who were trying to understand the inner workings of Python without reading the C code.