Suppose I have a text file with numbers separated by colons and spaces, such as the following:
0:-83 1: -51 2: -69 3: -82 4: -85 8: -90 9: -69 QUAD
0:-88 1: -88 2: -98 3: -75 4: -42 5: -71 6: -89 7: -28 8: -83 9: -78 STADIUM
A pair is defined as two numbers separated by a colon. Spaces appear arbitrarily, including inside a pair after the colon (compare "0:-83" with "1: -51").
Currently, I have the following.
import re

with open('data.txt') as file:
    lines = file.read().splitlines()
    for line in lines:
        line = line[:-1]
        # What is the regex I should be using?
        # data = re.split(r'[:\s]', line) leaves an empty string wherever a space follows a colon
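To see why the single-character split misbehaves, here is a small sketch on a fragment of the first data line (the trailing label such as QUAD is left out, since it would need to be removed before pairing). Splitting on runs of separators with `+` and then zipping alternate items is a swapped-in alternative to the regex approach, shown only for comparison.

```python
import re

line = "0:-83 1: -51 2: -69"

# Splitting on a single colon OR a single whitespace character yields an
# empty string wherever a colon is immediately followed by a space.
print(re.split(r'[:\s]', line))
# ['0', '-83', '1', '', '-51', '2', '', '-69']

# Splitting on runs of colons/whitespace (note the +) avoids the empty strings.
parts = re.split(r'[:\s]+', line)
print(parts)
# ['0', '-83', '1', '-51', '2', '-69']

# Pair up consecutive items: (key, value), (key, value), ...
pairs = [(int(k), int(v)) for k, v in zip(parts[::2], parts[1::2])]
print(pairs)
# [(0, -83), (1, -51), (2, -69)]
```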
What is the best way to parse the text file so that each line is stored as a list of tuples where each tuple is a pair?
A regex with two capture groups will give you your pairs of numbers (including minus signs), one group per number. It matches a word boundary (\b), then a run of digits with an optional minus sign before it, then a colon surrounded by optional whitespace, then another run of digits with an optional minus sign, and finally a word boundary.
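The exact pattern from the original answer is not preserved here, so the one below is an assumption reconstructed from that description: `\b(-?\d+)\s*:\s*(-?\d+)\b`. With two groups, `re.findall` returns one tuple of strings per match, and the trailing label (QUAD) is simply never matched.

```python
import re

# Reconstructed from the description (assumption): word boundary, digits with
# an optional leading minus, colon with optional whitespace on either side,
# digits with an optional leading minus, word boundary.
PAIR_RE = re.compile(r'\b(-?\d+)\s*:\s*(-?\d+)\b')

line = "0:-83 1: -51 2: -69 3: -82 4: -85 8: -90 9: -69 QUAD"

# findall returns a list of (group1, group2) string tuples
raw = PAIR_RE.findall(line)
print(raw[:3])  # [('0', '-83'), ('1', '-51'), ('2', '-69')]

# Convert each string pair to an integer tuple
pairs = [(int(a), int(b)) for a, b in raw]
print(pairs)
# [(0, -83), (1, -51), (2, -69), (3, -82), (4, -85), (8, -90), (9, -69)]
```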
You probably don't want to read all lines into memory at once; just loop over the file object, which yields one line at a time.
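A minimal sketch of the line-by-line version, assuming the same reconstructed pattern as above. `parse_pairs` is a hypothetical helper name, and `io.StringIO` stands in for `open('data.txt')` so the example runs on its own; both support iteration line by line.

```python
import io
import re

# Assumed pattern (reconstructed from the description in the answer)
PAIR_RE = re.compile(r'\b(-?\d+)\s*:\s*(-?\d+)\b')

def parse_pairs(lines):
    """Hypothetical helper: return one list of (int, int) tuples per line."""
    result = []
    for line in lines:  # iterating a file object reads one line at a time
        result.append([(int(a), int(b)) for a, b in PAIR_RE.findall(line)])
    return result

# io.StringIO stands in for open('data.txt') so this runs without a file
sample = io.StringIO(
    "0:-83 1: -51 2: -69 3: -82 4: -85 8: -90 9: -69 QUAD\n"
    "0:-88 1: -88 2: -98 3: -75 4: -42 5: -71 6: -89 7: -28 8: -83 9: -78 STADIUM\n"
)
data = parse_pairs(sample)
print(data[0][:2])   # [(0, -83), (1, -51)]
print(len(data[1]))  # 10

# With a real file:
# with open('data.txt') as file:
#     data = parse_pairs(file)
```

The newline left on each line does no harm, because the pattern only ever matches digits, signs, colons and the whitespace between them.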
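You don't need to remove the newline for the regex approach to work, but if you do want to, use line.strip() instead of line[:-1]. Slicing blindly drops the last character even when it isn't a newline, for example on a final line with no trailing newline, or on lines produced by str.splitlines(), which never keep one.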
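A quick illustration of the difference, using a shortened line in the format of the question's data:

```python
# A line read from a file usually keeps its trailing newline...
line = "0:-88 1: -88 STADIUM\n"
print(repr(line[:-1]))     # '0:-88 1: -88 STADIUM'
print(repr(line.strip()))  # '0:-88 1: -88 STADIUM'

# ...but the last line may lack one, and str.splitlines() removes it anyway.
# Slicing then chops off a real character; strip() is unaffected.
last = "0:-88 1: -88 STADIUM"
print(repr(last[:-1]))     # '0:-88 1: -88 STADIU'
print(repr(last.strip()))  # '0:-88 1: -88 STADIUM'
```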