How do you delete the text inside <ref> *some text*</ref> together with the <ref> tags themselves?
For example, in '...and so on<ref>Oxford University Press</ref>.',
re.sub(r'<ref>.+</ref>', '', string) only removes the <ref> element if
<ref> is followed by whitespace.
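A quick check of the pattern from the question on the sample string above. This is a minimal sketch; the variable names are illustrative. On this input the pattern removes the whole <ref>...</ref> element even though no whitespace follows <ref>.

```python
import re

# Sample string taken from the question.
s = '...and so on<ref>Oxford University Press</ref>.'

# Pattern exactly as written in the question.
cleaned = re.sub(r'<ref>.+</ref>', '', s)

print(cleaned)  # ...and so on.
```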
EDIT: I guess it has something to do with word boundaries... or not?
EDIT2: What I need is for it to match the last (closing) </ref> even if it is on a new line.
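A minimal sketch of the newline case, assuming a hypothetical sample string with two references, the first of which wraps onto a new line. By default '.' does not match '\n'; passing flags=re.DOTALL (alias re.S) changes that. With DOTALL, a greedy .* runs to the last </ref> in the whole string and also removes the text between the references, while the non-greedy .*? stops at the nearest closing tag.

```python
import re

# Hypothetical sample: two references, the first wrapping onto a new line.
text = "A<ref>Oxford\nUniversity Press</ref> B<ref>Ibid.</ref> C"

# Without DOTALL, '.' cannot cross '\n', so only the second ref is removed.
no_dotall = re.sub(r'<ref>.*</ref>', '', text)

# DOTALL + greedy '.*': matches from the first <ref> to the LAST </ref>,
# taking the text " B" between the two references with it.
greedy = re.sub(r'<ref>.*</ref>', '', text, flags=re.DOTALL)

# DOTALL + non-greedy '.*?': each <ref> is matched to its nearest </ref>.
lazy = re.sub(r'<ref>.*?</ref>', '', text, flags=re.DOTALL)

print(repr(no_dotall))  # 'A<ref>Oxford\nUniversity Press</ref> B C'
print(repr(greedy))     # 'A C'
print(repr(lazy))       # 'A B C'
```

Which variant is wanted depends on whether the text between separate references should survive; for ordinary prose with several <ref> elements, the non-greedy form is usually the safer choice.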
I don’t really see your problem, because the code you pasted will remove the
<ref>...</ref> part of the string. But if what you mean is that an empty ref tag is not removed, then what you need to do is replace the .+ with .*
A + means one or more, while * means zero or more.
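A minimal sketch of the empty-tag case, using a hypothetical string that contains <ref></ref> with nothing inside. With .+ the pattern needs at least one character between the tags, so nothing matches; with .* zero characters are allowed and the empty element is removed.

```python
import re

# Hypothetical sample with an empty ref element.
s = 'and so on<ref></ref>.'

# '.+' requires one or more characters between the tags: no match here.
plus = re.sub(r'<ref>.+</ref>', '', s)

# '.*' allows zero characters, so the empty element is removed.
star = re.sub(r'<ref>.*</ref>', '', s)

print(plus)  # and so on<ref></ref>.
print(star)  # and so on.
```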
From http://docs.python.org/library/re.html: