I've got some problems with my regex for removing the strings bounded by brackets

Editorial Team

Asked: May 23, 2026

I've got some problems with my regex for removing the strings bounded by brackets.

Here's my code:

import sys, re
import codecs

reload(sys)
sys.setdefaultencoding('utf-8')

reader = codecs.open("input",'r','utf-8')
p = re.compile('s/[\[\(].+?[\]\)]//g', re.DOTALL)
# I've also tried several other regexes, but they didn't work either:
# p = re.compile('\{\{*?.*?\}\}', re.DOTALL)
# p = re.compile('\{\{*.*?\}\}', re.DOTALL)

for row in reader:
    if ("(" in row) and (")" not in row):
        continue
    if row.count("(") != row.count(")"):
        continue
    else:
        row2 = p.sub('', row)
        print row2

The input text file looks something like this:

가시 돋친(신랄한)평 spinosity
가장 완전한 (같은 종류의 것 중에서)   unabridged
(알코올이)표준강도(50%) 이하의 underproof
(암초 awash
치명적인(fatal) capital
열을) 전도하다    transmit

The required output should look like this:

가시 돋친평  spinosity
가장 완전한  unabridged
표준강도 이하의    underproof
치명적인    capital

1 Answer

Editorial Team · Answer 1 · May 23, 2026

Would this work for you?

# -*- coding: utf-8 -*-
import re
import codecs

# reload(sys) / sys.setdefaultencoding('utf-8') are not needed here:
# codecs.open() already decodes the file as UTF-8

# preparing the examples to work on
writer = codecs.open("input.txt", 'w', 'utf-8')
examples = [u'가시 돋친(신랄한)평 spinosity',
            u'가장 완전한 (같은 종류의 것 중에서)   unabridged',
            u'(알코올이)표준강도(50%) 이하의 underproof',
            u'(암초 awash',
            u'치명적인(fatal) capital']
for exampl in examples:
    writer.write(exampl + u"\n")
writer.close()

reader = codecs.open("input.txt", 'r', 'utf-8')

# order of patterns is important:
# if you remove the single brackets first, the other pattern won't find anything
# .+? is non-greedy, so "(a)b(c)" loses both groups but keeps the "b"
patterns_to_remove = [r"\(.+?\)", r"[\(\)]"]

# one pattern would work just fine; the loop is just a bit clearer
# pat = r"(\(.+?\))|([\(\)])"
# for row in reader:
#     row = re.sub(pat, '', row)
#     print(row)

for row in reader:
    for pat in patterns_to_remove:
        row = re.sub(pat, '', row)
    print(row.rstrip(u"\n"))
reader.close()
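The question's own pattern fails for a different reason: re.compile() expects only the regular expression, so the sed-style wrapper s/PATTERN/REPLACEMENT/g is treated as literal text and never matches these rows. Also, the loop above strips stray brackets instead of skipping unbalanced rows. Below is a minimal sketch that keeps the question's rule of skipping rows with unbalanced brackets. It assumes Python 3 (the question uses Python 2), and the names sed_style, brackets and clean_rows are hypothetical, chosen for illustration:

```python
import re

# Assumption: Python 3. re.compile() takes only the pattern, so sed's
# s/PATTERN/REPLACEMENT/g wrapper from the question is matched literally
# (it needs the text "s/" in the row) and leaves these rows unchanged.
sed_style = re.compile(r's/[\[\(].+?[\]\)]//g')

# Hypothetical pattern: one (...) or [...] group with no brackets inside it.
# [^()]* stops at the first ")", so "(a)b(c)" keeps the "b" in between.
brackets = re.compile(r'\([^()]*\)|\[[^\[\]]*\]')


def clean_rows(rows):
    """Hypothetical helper: yield rows with bracketed groups removed,
    skipping rows whose brackets don't balance (as the question's code does)."""
    for row in rows:
        row = row.rstrip('\n')
        if row.count('(') != row.count(')') or row.count('[') != row.count(']'):
            continue
        # Repeat until nothing changes, so nested groups like "(a (b) c)"
        # are removed from the inside out. sub()/subn() replace every match,
        # so no "g" flag is needed.
        while True:
            row, replaced = brackets.subn('', row)
            if replaced == 0:
                break
        yield row


rows = [
    '가시 돋친(신랄한)평 spinosity',
    '가장 완전한 (같은 종류의 것 중에서)   unabridged',
    '(알코올이)표준강도(50%) 이하의 underproof',
    '(암초 awash',
    '치명적인(fatal) capital',
    '열을) 전도하다    transmit',
]

for row in clean_rows(rows):
    print(row)
```

This prints the four rows from the required output; the "(암초" and "열을)" rows are skipped. Spacing follows the input, so it may differ slightly from the required output. To read the question's file instead of the sample list, open it with open("input", encoding="utf-8") and pass the file object to clean_rows().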
