I have a pattern that works for English, but it does not work for my native language, and it is giving me headaches. I have already asked several questions about encoding, and I know I underestimated how big a problem it is. I have spent some time reading about it, but the problem is still there. Now I am facing a regular expression problem with Unicode text. The pattern is:
exactMatch = re.compile(r"([^\.]*\bтурција\b[^\.]*)\.", re.UNICODE)
print exactMatch.pattern
result= exactMatch.findall("турција е на врвот од индустријата. турција е на врвот од индустријата.")
It works for English. Its purpose is to give me all the sentences in a paragraph that contain the given word. Any suggestions?
I have also tried encode and decode, but nothing happens except an encoding error.
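This will work:
If you use Unicode, then use Unicode everywhere: both the pattern and the text being searched should be unicode strings, not byte strings.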
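One way to read that advice, as a minimal sketch rather than a definitive fix: give both the pattern and the searched text the `u` prefix, so that `\b` and the Cyrillic letters are treated as characters instead of raw bytes. The names `exact_match` and `text` are only illustrative, and the source file is assumed to be saved as UTF-8 (hence the coding line). In Python 2 the `re.UNICODE` flag is what makes `\b` recognise non-ASCII letters as word characters.

```python
# -*- coding: utf-8 -*-
import re

# Unicode pattern: the u prefix makes this a unicode string in Python 2
# (and a plain str in Python 3). Backslashes are doubled because the
# raw-unicode prefix ur"" is not accepted by Python 3.
exact_match = re.compile(u"([^.]*\\bтурција\\b[^.]*)\\.", re.UNICODE)

# The text being searched must be unicode as well, not a byte string.
text = u"турција е на врвот од индустријата. турција е на врвот од индустријата."

result = exact_match.findall(text)

# Each match is the sentence without its trailing period.
for sentence in result:
    print(sentence.strip())
```

If the text comes from a file or another byte source, it would need to be decoded first, for example with `raw_bytes.decode("utf-8")`, assuming the data really is UTF-8.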