I’ve got a string which has the following format some_string = ,,,xxx,,,xxx,,,xxx,,,xxx,,,xxx,,,xxx,,, and this


Editorial Team

Asked: June 17, 2026


I’ve got a string in the following format:

some_string = ",,,xxx,,,xxx,,,xxx,,,xxx,,,xxx,,,xxx,,,"

This is the content of a text file, which I have open as f.

I want to search for a specific term within the xxx fields (let’s say that term is 'silicon').

Note that the xxx fields can all be different and can contain any special characters (including regex metacharacters) except a newline.

import re
match = re.findall(r",{3}(.*?silicon.*?),{3}", f.read())
print(match)

But this doesn’t work, because it returns results in the format:
["xxx,,,xxx,,,xxx,,,xxx,,,silicon", "xxx,,,xxx,,,xxx,,,xxsiliconxx"]
when I only want it to return ["silicon", "xxsiliconxx"].

What am I doing wrong?


1 Answer

Editorial Team · Answer 1 · 2026-06-17T01:38:13+00:00

Try the following regex:

(?<=,{3})(?:(?!,{3}).)*?silicon.*?(?=,{3})

Example:

>>> import re
>>> s = ',,,xxx,,,silicon,,,xxx,,,xxsiliconxx,,,xxx'
>>> re.findall(r'(?<=,{3})(?:(?!,{3}).)*?silicon.*?(?=,{3})', s)
['silicon', 'xxsiliconxx']
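Since the search term itself might contain regex metacharacters, one way to reuse the pattern is to wrap it in a small function that escapes the term with re.escape. This is only a sketch: the function name find_fields_containing is hypothetical, and it assumes the whole file fits in memory as a single string.

```python
import re


def find_fields_containing(text, term):
    """Return every ,,,-delimited field in `text` that contains `term`.

    `term` goes through re.escape, so characters such as '.', '+' or '('
    are matched literally. No re.DOTALL flag is used, so '.' does not
    match a newline and a field cannot span lines.
    """
    pattern = r"(?<=,{3})(?:(?!,{3}).)*?" + re.escape(term) + r".*?(?=,{3})"
    return re.findall(pattern, text)


if __name__ == "__main__":
    sample = ",,,xxx,,,silicon,,,xxx,,,xxsiliconxx,,,xxx"
    print(find_fields_containing(sample, "silicon"))  # ['silicon', 'xxsiliconxx']
    # To search the file instead (the file name here is hypothetical):
    # with open("f.txt") as fh:
    #     print(find_fields_containing(fh.read(), "silicon"))
```

Without re.escape, a term like 'a.b' would also match a field containing 'axb'.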

I am assuming that an xxx field can contain commas, just not three in a row, since that would end the field. If the xxx fields cannot contain any commas at all, you can use the following instead:

(?<=,{3})[^,\r\n]*?silicon.*?(?=,{3})
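To see where the two patterns differ, here is a small comparison on a made-up sample string whose first field contains a single comma. The first pattern treats that comma as part of the field; the second assumes fields contain no commas, so it finds nothing in that field.

```python
import re

# Hypothetical sample: the first field, "a,silicon", contains one comma.
s = ",,,a,silicon,,,b,,,"

# Fields may contain commas, just not three in a row.
commas_allowed = r"(?<=,{3})(?:(?!,{3}).)*?silicon.*?(?=,{3})"
# Fields are assumed to contain no commas (or line breaks) before the term.
no_commas = r"(?<=,{3})[^,\r\n]*?silicon.*?(?=,{3})"

with_commas = re.findall(commas_allowed, s)
without_commas = re.findall(no_commas, s)

print(with_commas)     # ['a,silicon']
print(without_commas)  # []
```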

Your current approach doesn’t work because, even though .*? matches as few characters as possible, the match still starts as early as possible. For example, the regex a*?b matches the entire string "aaaab", not just "ab". The engine only advances the starting position when no match is possible from the current one, and since .*? can also consume ,,, delimiters, your match always starts at the first ,,, the engine reaches: the beginning of the string, or the first ,,, after the previous match.
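A quick way to check this explanation is to run both the a*?b example and your original pattern on a sample string. The sample below is hypothetical; the point is that the first result swallows an earlier field because the match starts at the first ,,, in the string.

```python
import re

# Laziness shortens a match, but never moves its start: the engine tries
# start positions left to right and keeps the first one that succeeds.
lazy_demo = re.search(r"a*?b", "aaaab").group()
print(lazy_demo)  # aaaab

# The original pattern: the match begins at the first ",,," and the lazy group
# keeps growing across later ",,," delimiters until "silicon" can match.
s = ",,,xxx,,,silicon,,,xxx,,,xxsiliconxx,,,xxx"
original = re.findall(r",{3}(.*?silicon.*?),{3}", s)
print(original)  # ['xxx,,,silicon', 'xxsiliconxx']
```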

The lookbehind and lookahead address the issue JaredC raised in the comments: re.findall() doesn’t return overlapping matches, so the leading and trailing ,,, must not be part of the match. Otherwise, the ,,, that ends one match is consumed and cannot also start the next one.
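The overlap problem shows up when two matching fields are adjacent. The sketch below uses a hypothetical sample and compares a version that consumes the delimiters (same field logic, but with a capturing group) against the lookaround version from this answer.

```python
import re

s = ",,,silicon,,,xxsiliconxx,,,"

# Consuming the delimiters: the ",,," between the two fields is used up as the
# end of the first match, so it cannot also serve as the start of the second.
consuming = re.findall(r",{3}((?:(?!,{3}).)*?silicon.*?),{3}", s)

# Lookarounds only assert that ",,," is present without consuming it, so the
# shared delimiter can bound both matches.
lookaround = re.findall(r"(?<=,{3})(?:(?!,{3}).)*?silicon.*?(?=,{3})", s)

print(consuming)   # ['silicon']
print(lookaround)  # ['silicon', 'xxsiliconxx']
```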
