Python’s tokenize reports each token’s position as two tuples, (startRow, startCol) and (endRow, endCol).
Is there a way to return the positions as the offsets from the beginning of the string? That is, I would like to get rid of (row, col) in favor of just “offset”.
There isn’t one built in to tokenize. If you have access to the same set of lines the tokenizer is reading, you can run through them and store the accumulated “total length of lines before line X” in a list, then use that list to convert each (row, col) pair into an absolute offset: the stored total for that row plus the column.
For instance:
(Note: haven’t tested this code, it’s more as an example of the general concept.)
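One way the idea above could be sketched is shown here. It is a reconstruction of the general concept, not the original snippet. The name `tokens_with_offsets` is made up for illustration, and the sketch assumes the source is a `str` with `\n` line endings. It wraps the `readline` callable handed to `tokenize.generate_tokens`, so the running totals are built from exactly the lines the tokenizer sees.

```python
import io
import tokenize


def tokens_with_offsets(source):
    """Yield (token, start_offset, end_offset) for each token in source.

    Hypothetical helper: offsets are character offsets from the
    beginning of the source string.
    """
    # line_starts[i] is the total length of all lines before line i + 1
    # (tokenize rows are 1-based, columns are 0-based).
    line_starts = [0]
    reader = io.StringIO(source).readline

    def recording_readline():
        # Record the cumulative length of every line the tokenizer reads.
        line = reader()
        line_starts.append(line_starts[-1] + len(line))
        return line

    for tok in tokenize.generate_tokens(recording_readline):
        start_row, start_col = tok.start
        end_row, end_col = tok.end
        start = line_starts[start_row - 1] + start_col
        end = line_starts[end_row - 1] + end_col
        yield tok, start, end


if __name__ == "__main__":
    code = "x = 1\nfoo(x)\n"
    for tok, start, end in tokens_with_offsets(code):
        print(tokenize.tok_name[tok.type], repr(tok.string), start, end)
```

For ordinary tokens, `source[start:end]` should equal `tok.string`. Zero-width tokens such as DEDENT and ENDMARKER get equal start and end offsets, and if the source does not end with a newline, the NEWLINE token that tokenize synthesizes can report an end offset one past the end of the string.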