While trying to paste images, I noticed that Cases[] is very slow.
To reproduce, first copy a large image to the clipboard (just press Print Screen), then evaluate the following:
In[33]:= SetSystemOptions["PackedArrayOptions" -> "UnpackMessage" -> True];
In[34]:= AbsoluteTiming[nb = NotebookGet@ClipboardNotebook[];]
Out[34]= {0.4687500, Null}
In[35]:= AbsoluteTiming[d1 = nb[[1, 1, 1, 1, 1, 1, 1]];]
Out[35]= {0., Null}
In[36]:= AbsoluteTiming[d2 = First@Cases[nb, r_RasterBox :> First[r], Infinity, 1];]
During evaluation of In[36]:= Developer`FromPackedArray::unpack: Unpacking array in call to Notebook.
Out[36]= {0.9375000, Null}
(I did this on Windows, not sure if the paste code is the same on other systems.)
Note that extracting the data using Cases is extremely slow (about 0.94 s) compared to using Part directly (too fast for AbsoluteTiming to register), even though I explicitly tell Cases that I need only one match.
I did find out (as shown above) that Cases triggers unpacking for some reason, even though the search should stop before it reaches the packed array inside. Using a shallower level specification than Infinity might avoid unpacking.
Question: Using Cases here is both easier and more reliable than Part (what if the subexpression can appear in different positions?). Is there a way to make Cases fast here, perhaps by using a different pattern or different options?
Possibly related question: Mathematica's pattern matching poorly optimized?
(This is why I changed the Cases rule from RasterBox[data_, ___] -> data to r_RasterBox :> First[r].)
I don’t have access to Mathematica right now, so what follows is untested. My guess is that Cases unpacks here because it searches depth-first and visits sub-expressions before the expressions that contain them, so it sees the packed array inside the RasterBox before the RasterBox itself. If this is correct, then you could use rules instead (ReplaceAll, not Replace), and throw an exception upon the first match:
As I said, this is just an untested guess.
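The traversal order behind this guess can be checked on a toy expression, and the rules-plus-exception idea can be sketched as below. This is a minimal sketch under assumptions, not the code from this answer: firstMatch is a hypothetical name, and it only accepts a delayed rule lhs :> rhs. Throw is placed inside a Condition because the right-hand side of a plain RuleDelayed is only evaluated after ReplaceAll has built its whole result, whereas a condition is evaluated during matching. Whether this avoids the unpack message on the clipboard notebook is an assumption to check.

```mathematica
(* Cases visits sub-expressions before the expressions containing them,
   so with n = 1 the innermost match is the one returned *)
Cases[{g[g[1]]}, g[x_] :> x, Infinity, 1]
(* {1} *)

(* Hypothetical helper (a sketch, not the answer's code): ReplaceAll tries
   an expression as a whole before its parts, and the Throw inside the
   Condition aborts the traversal at the first match *)
ClearAll[firstMatch];
firstMatch[expr_, lhs_ :> rhs_] :=
  Module[{tag},
    Catch[
      expr /. (lhs :> (Null /; Throw[rhs, tag]));
      Missing["NotFound"], (* reached only if nothing matched *)
      tag
    ]
  ];

firstMatch[{g[g[1]]}, g[x_] :> x]
(* g[1], the outermost match *)
```

For the question's data, the call would be firstMatch[nb, r_RasterBox :> First[r]].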
Edit 2: an approach based on shielding parts of the expression from the pattern-matcher
Preamble
The first edit (below) presents a rather heavy approach. In many cases, one can take an alternative route. In this particular problem (and many others like it), the main task is to shield certain sub-expressions from the pattern-matcher. This can also be achieved with rules, by temporarily replacing the parts of interest with dummy symbols.
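The shielding idea can be illustrated in a few lines. This is not the answer's own code, only a minimal sketch under assumptions: casesShieldedSketch and placeholder are hypothetical names, and shielding packed arrays via _?Developer`PackedArrayQ is one possible choice of shield pattern. The shielding rule uses the Trott-Strzebonski trick (With[..., e /; True]) so that its side effects happen during matching, and since ReplaceAll does not descend into parts it has replaced, the packed data is never traversed.

```mathematica
(* Hypothetical sketch of shielding (not the answer's own code): every part
   matching shieldPattern is swapped for an inert placeholder before Cases
   runs, and swapped back into the results afterwards *)
ClearAll[casesShieldedSketch];
casesShieldedSketch[expr_, rule_, shieldPattern_, levspec_, n_] :=
  Module[{placeholder, i = 0, shielded, undo},
    {shielded, undo} = Reap[
      expr /. ((p : shieldPattern) :>
        With[{ph = (Sow[placeholder[++i] -> p]; placeholder[i])},
          ph /; True (* Trott-Strzebonski trick: evaluated during matching *)
        ])
    ];
    (* ordinary Cases on the shielded expression, then undo the shielding;
       only correct if rule never needs to look inside a placeholder *)
    Cases[shielded, rule, levspec, n] /. Flatten[undo, 1]
  ];

data = Developer`ToPackedArray[Range[10]];
casesShieldedSketch[f[g[data], g[2]], g[x_] :> x,
  _?Developer`PackedArrayQ, Infinity, 2] === {data, 2}
(* True *)
```

For the question's notebook, a call could look like casesShieldedSketch[nb, r_RasterBox :> First[r], _?Developer`PackedArrayQ, Infinity, 1]; as with the approach described here, it is only valid when the rule does not depend on the contents of the shielded parts.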
Code
Here is a modification of Cases which does just that:
This version of Cases has one additional parameter, shieldPattern (the third one), which indicates which sub-expressions must be shielded from the pattern-matcher.
Advantages and applicability
The code above is fairly lightweight (compared to the suggestion of the first edit below), and it allows one to fully reuse and leverage the existing Cases functionality. It works whenever the main pattern (or rule) is insensitive to the shielding of the relevant parts, which is a rather common situation; in particular, it covers patterns of the form _h, including the case at hand. It may also be faster than applying myCases (described below).
The case at hand
Here, we need this call:
and the result is of course the same as before:
Edit: an alternative Cases-like function
Motivation and code
It took me a while to produce this function, and I am not 100 percent sure it always works correctly, but here is a version of Cases which, while still working depth-first, analyzes an expression as a whole before its sub-expressions:
How it works
This is not the most trivial piece of code, so here are some remarks. This version of Cases is based on the same idea I suggested first, namely to use rule-substitution semantics to first attempt the pattern-match on an entire expression, and to go to its sub-expressions only if that fails. I stress that this is still a depth-first traversal, but a different one from the standard traversal used in most expression-traversing functions such as Map, Scan, Cases, etc. I use Reap and Sow to collect the intermediate results (matches).

The trickiest part here is to prevent sub-expressions from evaluating, and I had to wrap sub-expressions in HoldComplete. Consequently, I had to use (a nested version of) the Trott-Strzebonski technique (perhaps there are simpler ways, but I wasn’t able to see them) to enable evaluation of the rules’ right-hand sides inside held (sub)expressions, and I used Replace with the proper level spec, accounting for the extra HoldComplete wrappers. I return Null in the rules, since the main action is to Sow the parts, so it does not matter what is injected into the original expression at the end.

Some extra complexity was added by the code supporting the level specification (I only support a single integer level indicating the maximal level up to which to search, not the full range of possible level specs), the maximal number of found results, and the Heads option. The code for frule serves to avoid the overhead of counting found elements when we want to find all of them. I use the same Module-generated tag both as a tag for Sow and as a tag for exceptions, which I use to stop the process when enough matches have been found, just like in my original suggestion.
Tests and benchmarks
To have a non-trivial test of this functionality, we can, for example, find all symbols in the DownValues of myCases and compare with Cases:
The myCases function is about 20-30 times slower than Cases, though:
The case at hand
It is easy to check that myCases solves the original problem of unpacking:
It is hoped that myCases can be generally useful for situations like this, although the performance penalty of using it in place of Cases is substantial and has to be taken into account.
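For experimentation, the traversal order used by myCases can be approximated in a few lines. This is a much simplified sketch under stated assumptions, not the myCases code discussed above: outerFirstCases is a hypothetical name, it accepts only a delayed rule, ignores level specifications and the Heads option, and does not use HoldComplete. As a consequence, when fewer than n matches exist, ReplaceAll builds (and evaluates) the expression with Null in place of the matches, which myCases is designed to avoid. Matches are collected with Sow inside a Condition, a matched part is not searched further, and a Throw stops the search once n matches have been found.

```mathematica
(* Hypothetical simplified sketch (not the myCases code): collect rhs values
   of a delayed rule, testing each expression as a whole before its parts
   and stopping once n matches have been found. No level spec, no Heads
   option, and no HoldComplete protection against evaluation. *)
ClearAll[outerFirstCases];
outerFirstCases[expr_, lhs_ :> rhs_, n_ : Infinity] :=
  Module[{tag, count = 0},
    Flatten[
      Last @ Reap[
        Catch[
          (* the Condition runs during matching: Sow the match, Throw once
             n matches are collected, otherwise return True so that the part
             counts as replaced and ReplaceAll does not descend into it *)
          expr /. (lhs :> (Null /;
            (Sow[rhs, tag]; If[++count >= n, Throw[Null, tag]]; True))),
          tag],
        tag],
      1]
  ];

outerFirstCases[{g[g[1]], g[2]}, g[x_] :> x]      (* {g[1], 2} *)
outerFirstCases[{g[g[1]], g[2]}, g[x_] :> x, 1]   (* {g[1]} *)
Cases[{g[g[1]], g[2]}, g[x_] :> x, Infinity]      (* {1, g[1], 2} *)
```

Note that, like myCases, this does not return nested matches inside an already matched part, so its results can differ from those of Cases, not just in order.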