I am trying to manage a Calibre library. Calibre uses Python regex to manipulate database fields. In particular, you can specify the “pattern” and “repl” arguments to the sub() method. But that’s all you can do, no other coding. My current problem is that if there is no match for my group expression, Calibre reports an “unmatched group” error, and refuses to proceed.
Can I create a group that “always matches” but contains an empty string if it’s really not there?
I want to replace a field with a sub-string from the title, if the sub-string is found, or an empty string if it is not. I currently have mixed titles like:
Anne McCaffrey - Pern 10 - The Renegades of Pern
Generation Warriors
The Mystery of Ireta: Dinosaur Planet & Dinosaur Planet Survivors
Anne McCaffrey - Tsw 7 - Ship That Returned
I want to pick out “Pern 10” from the first example, and “Tsw 7” from the fourth example, and write them to the series field. How can I do this?
My current, erroneous expression is
(((?P<author>[^-]*?)- )?((?P<series>\w+)\W*(?P<series_index>\d*)\s-))?(?P<title>.*)
The only field I want at the moment is
\g<series>
Thanks for any ideas!
If I’m understanding the requirements correctly, it sounds like you should be able to write:
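A pattern of roughly this shape can be assembled from the three parts explained below. This is only a sketch, run in plain Python rather than inside Calibre: the trailing `.*` is an assumption added so that sub() consumes the rest of the title and leaves nothing but the series, and the helper name `series_of` is hypothetical.

```python
import re

# Assembled from the three parts described in the answer.
# The trailing ".*" is an added assumption: it lets sub() replace the
# whole title, so only the captured series text remains.
PATTERN = re.compile(r"^(?:(?! - ).)*(?: - )?(?P<series>(?:(?! - ).)*).*")


def series_of(title: str) -> str:
    """Hypothetical helper: return the series part of a title, or ''."""
    # The series group always takes part in the match (possibly as an
    # empty string), so \g<series> is never an unmatched group.
    return PATTERN.sub(r"\g<series>", title)


if __name__ == "__main__":
    titles = [
        "Anne McCaffrey - Pern 10 - The Renegades of Pern",
        "Generation Warriors",
        "The Mystery of Ireta: Dinosaur Planet & Dinosaur Planet Survivors",
        "Anne McCaffrey - Tsw 7 - Ship That Returned",
    ]
    for title in titles:
        print(repr(series_of(title)))
```

In Calibre itself, only the pattern would go in the search expression and \g<series> in the replacement. Note that a title with a single space-hyphen-space (say, "Author - Title") would yield the text after it as the series.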
The initial ^(?:(?! - ).)* part will swallow everything before the first space-hyphen-space, or simply swallow everything if there is no space-hyphen-space.
The (?: - )? part will swallow the first space-hyphen-space if it’s there, or otherwise nothing.
The (?:(?! - ).)* part inside the (?P<series>...) will swallow everything that hasn’t already been swallowed, up until the second space-hyphen-space (or end-of-string, if no second space-hyphen-space is found). If everything has already been swallowed, then this will simply be the empty string.
In other words, the above is roughly equivalent to:
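One way to read that equivalence is as the procedural logic below. It is a rough Python sketch for illustration only (Calibre will not let you run it), and the function name `extract_series` is hypothetical.

```python
def extract_series(title: str) -> str:
    """Hypothetical helper mirroring the three regex parts step by step."""
    separator = " - "

    # Parts 1 and 2: skip everything up to and including the first
    # separator. If there is no separator, the whole title is skipped.
    _, found, rest = title.partition(separator)
    if not found:
        return ""

    # Part 3: the series is whatever follows, up to the second
    # separator or the end of the string.
    series, _, _ = rest.partition(separator)
    return series


if __name__ == "__main__":
    print(repr(extract_series("Anne McCaffrey - Tsw 7 - Ship That Returned")))
    print(repr(extract_series("Generation Warriors")))
```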
Will that work for you?