I’m stumped by a “stack overflow” error, “out of stack space (application error code: 12246)”, that I get in BBEdit when I do a “Replace All”, searching for
(@article(((?!eprint|@article|@book).)*\r)*)pmid = {(.+)}((((?!eprint|@article|@book).)*\r)*(@|\r*\z))
and replacing with
\1eprinttype = {pubmed}, eprint = {\4}\5
I can use these same patterns manually, doing one-at-a-time find & replace, without any errors, right up to the point where the pattern no longer matches. I can also avoid the error by working on smaller files.
I suspect that it’s my inefficient and sloppy regex coding that’s to blame, and would appreciate an expert’s help in doing this more efficiently. I’m trying to locate all entries in a BibLaTeX bibliography that don’t already have an eprint field, but which have a pmid field, and replace the pmid field with a corresponding e-print specification (using eprint and eprinttype).
Update: After some experimentation, I’ve found that a different approach is the only thing I can get to work. Searching for
(?(?=@article(.+\r)+eprint = {(.+\r)+}\r*)(?!)|(@article(.+\r)+)pmid = {(.+)}((.+\r)+}\r*))
and replacing with
\3eprinttype = {pubmed}, eprint = {\5}\6
does the trick. The only problem is that the numbered backreferences are fragile, and I can’t get named backreferences to work in BBEdit.
It’s probably catastrophic backtracking caused by this last part:
If you break that down and simplify it, you essentially have a .*, a \r*, and another \r* right next to each other. Now picture a string of \r characters at the end of your input: How should each \r be distributed? Which of those little clauses will soak up each \r character? If you have \r\r\r\r\r, you could eat all five \rs with the .* part and none at all with the \r* parts… or you can make up any number of permutations that will still match. Since the * is greedy, it will try to fill the .* up first, but if that fails, it has to keep trying permutations until one of them works. So it’s probably hogging a bunch of your resources with unnecessary backtracking, until finally it crashes.

I’m not an expert on optimization techniques for regex, but I’d start there if I were you.
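To put a rough number on the ambiguity described above, here is a small Python sketch that counts how many ways a run of identical characters can be shared out among three adjacent star clauses. It models only the simplified three-clause picture, not the real pattern or the regex engine itself, and `count_splits` is a name made up for this illustration.

```python
from itertools import product


def count_splits(run_length: int, clauses: int = 3) -> int:
    """Count the ways `run_length` identical characters can be divided
    among `clauses` adjacent star-quantified clauses (hypothetical helper)."""
    return sum(
        1
        for split in product(range(run_length + 1), repeat=clauses)
        if sum(split) == run_length
    )


# Five \r characters shared among .*, \r*, and \r*
print(count_splits(5))  # 21
```

Each of those splits is a candidate the engine may have to try before it can give up on a position. With quantifiers nested inside other quantifiers, as in the original pattern, the number of candidates can grow much faster than in this flat model.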
Update:
Check out the Wikipedia article on PCRE:
So I think catastrophic backtracking is a good bet here. I’d try to solve it by tweaking your regex before changing the build options on PCRE.
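If tweaking the regex inside BBEdit doesn’t get you anywhere, another way to sidestep the problem is to stop matching whole entries with one pattern and handle the file entry by entry instead. Below is a minimal Python sketch of that idea, not a tested replacement for your workflow. It assumes the file uses \n line endings, that every entry starts with @ at the beginning of a line, and that the field is written exactly as pmid = {…} on one line; the function name is made up for the example.

```python
import re

# Matches the same field text as the original search: pmid = {...}
PMID_FIELD = re.compile(r"pmid = \{(.+)\}")


def convert_pmid_to_eprint(bib_text: str) -> str:
    """Replace pmid with eprinttype/eprint in @article entries that
    have no eprint field yet (hypothetical helper)."""
    # Split just before every line that begins with "@", so each chunk
    # holds one entry. Assumes "\n" line endings.
    entries = re.split(r"^(?=@)", bib_text, flags=re.MULTILINE)
    converted = []
    for entry in entries:
        if entry.startswith("@article") and "eprint" not in entry:
            entry = PMID_FIELD.sub(
                r"eprinttype = {pubmed}, eprint = {\1}", entry, count=1
            )
        converted.append(entry)
    return "".join(converted)


sample = (
    "@article{a1,\n"
    "  title = {One},\n"
    "  pmid = {12345},\n"
    "}\n"
    "\n"
    "@article{a2,\n"
    "  eprint = {999},\n"
    "  pmid = {67890},\n"
    "}\n"
)
print(convert_pmid_to_eprint(sample))
```

Because each regex only ever sees one entry and contains a single quantifier, there is nothing for the engine to backtrack over across the whole file. The same check-then-replace logic would also carry over to a BBEdit text filter or script.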