I have just started using Scala and wish to better understand the functional approach to problem solving.
I have pairs of strings: the first has placeholders for parameters, and its pair has the values to substitute, e.g.
"select col1 from tab1 where id > $1 and name like $2"
"parameters: $1 = '250', $2 = 'some%'"
There may be many more than 2 parameters.
I can build the correct string by stepping through and using regex.findAllIn(line) on each line, and then going through the iterators to construct the substitution, but this seems fairly inelegant and procedurally driven.
Could anyone point me towards a functional approach that will be neater and less error prone?
Speaking strictly to the replacement problem, my preferred solution is one enabled by a feature that should probably be available in the upcoming Scala 2.8, which is the ability to replace regex patterns using a function. Using it, the problem can be reduced to this:
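A minimal sketch of that idea, assuming Scala 2.8 or later, where Regex.replaceAllIn accepts a function from Match to String. The name replaceRegex, the use of IndexedSeq for the values, and the 1-based numbering ($1 is the first value) are assumptions made for illustration.

```scala
import java.util.regex.Matcher
import scala.util.matching.Regex

// Matches $1, $2, ... and captures the number.
val placeholder: Regex = """\$(\d+)""".r

// replaceRegex is a hypothetical name; placeholders are assumed to be 1-based.
def replaceRegex(input: String, values: IndexedSeq[String]): String =
  placeholder.replaceAllIn(input, m =>
    // quoteReplacement keeps any '$' or '\' inside a value literal
    Matcher.quoteReplacement(values(m.group(1).toInt - 1)))
```

The call to quoteReplacement matters because the string returned by the function is itself interpreted as a replacement pattern, so a value containing a dollar sign would otherwise be misread.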
Which reduces the problem to what you actually intend to do: replace all $N patterns by the corresponding Nth value of a list.
Or, if you can actually set the standards for your input string, you could do it like this:
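For instance, if the placeholders may follow the positional syntax of java.util.Formatter (%1$s, %2$s, and so on), the standard format method already does the work. A small sketch with made-up values:

```scala
// %1$s is the first argument, %2$s the second, and so on.
val template = "select col1 from tab1 where id > %1$s and name like %2$s"
val query = template.format("'250'", "'some%'")
```

With this convention, a literal percent sign in the template itself has to be written as %%; percent signs inside the values are left alone.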
If that’s all you want, you can stop here. If, however, you are interested in how to go about solving such problems in a functional way, absent clever library functions, please do continue reading.
Thinking functionally about it means thinking of the function. You have a string, some values, and you want a string back. In a statically typed functional language, that means you want something like this:
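Written down as a type, that might look like the alias below. The name Replacer is only an illustration.

```scala
// Takes the template and its values, returns the finished string.
type Replacer = (String, List[String]) => String

// A trivial value of that type, which ignores the values and returns the template.
val identityReplacer: Replacer = (template, _) => template
```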
If one considers that those values may be used in any order, we may ask for a type better suited for that:
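A sketch of that refinement, assuming IndexedSeq is the type meant here, since it promises efficient access by position. The alias name and the sample values are made up.

```scala
// IndexedSeq promises efficient access by position, which $N lookups need.
type IndexedReplacer = (String, IndexedSeq[String]) => String

val values: IndexedSeq[String] = Vector("'250'", "'some%'")
val second = values(1) // the value for $2, reached directly rather than by walking a list
```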
That should be good enough for our function. Now, how do we break down the work? There are a few standard ways of doing it: recursion, comprehension, folding.
RECURSION
Let’s start with recursion. Recursion means to divide the problem into a first step, and then repeating it over the remaining data. To me, the most obvious division here would be the following: replace the first placeholder, then repeat with the remaining placeholders.
That is actually pretty straight-forward to do, so let’s get into further details. How do I replace the first placeholder? One thing that can’t be avoided is that I need to know what that placeholder is, because I need to get the index into my values from it. So I need to find it:
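One way to find it, sketched with the standard Regex.findFirstMatchIn, which returns an Option. The name firstPlaceholder is hypothetical.

```scala
import scala.util.matching.Regex

val placeholder: Regex = """\$(\d+)""".r

// Some(match) for the first placeholder, None when there is none left.
def firstPlaceholder(input: String): Option[Regex.Match] =
  placeholder.findFirstMatchIn(input)
```

The match carries both the position of the placeholder (start and end) and, in group 1, the number to look up.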
Once found, I can replace it on the string and repeat:
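A minimal sketch of that recursion, assuming 1-based placeholders and an IndexedSeq of values. The function name follows the one used later in the text; the body is only one possible way to write it.

```scala
import scala.annotation.tailrec
import scala.util.matching.Regex

val placeholder: Regex = """\$(\d+)""".r

@tailrec
def replaceRecursive(input: String, values: IndexedSeq[String]): String =
  placeholder.findFirstMatchIn(input) match {
    case None => input // nothing left to replace
    case Some(m) =>
      val value = values(m.group(1).toInt - 1)
      // Build a new string with the first placeholder replaced, then repeat on it.
      replaceRecursive(input.substring(0, m.start) + value + input.substring(m.end), values)
  }
```

Note that this version rescans the text it has just inserted, so a value that itself looks like a placeholder would be substituted again.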
That is inefficient, because it repeatedly produces new strings, instead of just concatenating each part. Let’s try to be more clever about it.
To efficiently build a string through concatenation, we need to use StringBuilder. We also want to avoid creating new strings. StringBuilder can accept a CharSequence, which we can get from String. I’m not sure if a new string is actually created or not; if it is, we could roll our own CharSequence in a way that acts as a view into String, instead of creating a new String. Assured that we can easily change this if required, I’ll proceed on the assumption it is not.
So, let’s consider what functions we need. Naturally, we’ll want a function that returns the index of the first placeholder:
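As a type, that is simply String => Int. A naive sketch with a hypothetical name, which only looks for the dollar sign:

```scala
// Position of the first '$', or -1 if there is none (Java's convention).
val firstDollar: String => Int = input => input.indexOf('$')
```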
But we also want to skip any part of the string we have already looked at. That means we also want a starting index:
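With the starting index added, the type becomes (String, Int) => Int. Again a naive sketch with a hypothetical name:

```scala
// Position of the first '$' at or after `start`, or -1 if there is none.
val dollarFrom: (String, Int) => Int = (input, start) => input.indexOf('$', start)
```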
There’s one small detail, though. What if there’s no further placeholder? Then there wouldn’t be any index to return. Java reuses the returned index to signal that case: indexOf returns -1. When doing functional programming, however, it is always best to return what you mean. And what we mean is that we may return an index, or we may not. The signature for that is this:
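That is (String, Int) => Option[Int]. A small sketch, with a hypothetical name, of turning Java's -1 convention into an Option:

```scala
// None instead of -1 when no '$' is found at or after `start`.
def dollarOption(input: String, start: Int): Option[Int] =
  input.indexOf('$', start) match {
    case -1    => None
    case index => Some(index)
  }
```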
Let’s build this function:
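A sketch of how it might be written, assuming a placeholder is a dollar sign immediately followed by at least one digit. The name indexOfPlaceholder is an assumption.

```scala
import scala.annotation.tailrec

@tailrec
def indexOfPlaceholder(input: String, start: Int): Option[Int] =
  if (start < 0 || start >= input.length) None // index out of range
  else input.indexOf('$', start) match {
    case -1 => None
    // A real placeholder: '$' followed by a digit.
    case index if index + 1 < input.length && input(index + 1).isDigit => Some(index)
    // A false positive: a lone '$', so keep looking after it.
    case index => indexOfPlaceholder(input, index + 1)
  }
```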
That’s rather complex, mostly to deal with boundary conditions, such as index being out of range, or false positives when looking for placeholders.
To skip the placeholder, we’ll also need to know its length, signature (String, Int) => Int.
Next, we also want to know the index of the value the placeholder is standing for. The signature for this is a bit ambiguous: (String, Int) => Int.
The first Int is an index into the input, while the second is an index into the values. We could do something about that, but not that easily or efficiently, so let’s ignore it. An implementation can read the digits directly; we could also have used the length, and achieved a simpler implementation.
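Sketches of these helpers follow. All three names are hypothetical, start is assumed to point at the dollar sign, and the value index is returned as written in the placeholder (so $12 gives 12, and the caller subtracts one if the values are 0-based).

```scala
import scala.annotation.tailrec

// Length of the placeholder at `start`: the '$' plus the digits after it.
def placeholderLength(input: String, start: Int): Int = {
  @tailrec
  def endOfDigits(pos: Int): Int =
    if (pos < input.length && input(pos).isDigit) endOfDigits(pos + 1) else pos
  endOfDigits(start + 1) - start
}

// Reads the digits after the '$' and accumulates them into a number.
def indexOfValue(input: String, start: Int): Int = {
  @tailrec
  def digits(pos: Int, acc: Int): Int =
    if (pos < input.length && input(pos).isDigit) digits(pos + 1, acc * 10 + input(pos).asDigit)
    else acc
  digits(start + 1, 0)
}

// Simpler variant for when the placeholder length is already known.
def indexOfValue2(input: String, start: Int, length: Int): Int =
  input.substring(start + 1, start + length).toInt
```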
As a note, using curly brackets around simple expressions, such as above, is frowned upon by conventional Scala style, but I use it here so it can be easily pasted into the REPL.
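Combining a placeholder index, a placeholder length, and a value index, a more efficient recursive replacement might be sketched as follows. It repeats compact versions of the helpers so it can be pasted on its own. All names are hypothetical, java.lang.StringBuilder is used for its append(CharSequence, start, end) method, and placeholders are assumed to be 1-based.

```scala
import scala.annotation.tailrec

@tailrec
def indexOfPlaceholder(input: String, start: Int): Option[Int] =
  if (start < 0 || start >= input.length) None
  else input.indexOf('$', start) match {
    case -1 => None
    case i if i + 1 < input.length && input(i + 1).isDigit => Some(i)
    case i => indexOfPlaceholder(input, i + 1)
  }

def placeholderLength(input: String, start: Int): Int = {
  @tailrec
  def endOfDigits(pos: Int): Int =
    if (pos < input.length && input(pos).isDigit) endOfDigits(pos + 1) else pos
  endOfDigits(start + 1) - start
}

def replaceRecursive2(input: String, values: IndexedSeq[String]): String = {
  val sb = new java.lang.StringBuilder(input.length)
  @tailrec
  def loop(start: Int): String = indexOfPlaceholder(input, start) match {
    case None =>
      // No more placeholders: copy the rest of the input and finish.
      sb.append(input, start, input.length).toString
    case Some(index) =>
      val length = placeholderLength(input, index)
      val n = input.substring(index + 1, index + length).toInt
      // Copy the text before the placeholder, then the value it stands for.
      sb.append(input, start, index).append(values(n - 1))
      loop(index + length)
  }
  loop(0)
}
```

Because the loop only ever moves forward through the original input, inserted values are never rescanned.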
So, we can get the index of the next placeholder, its length, and the index of the value. That’s pretty much everything needed for a more efficient version of replaceRecursive: much more efficient, and as functional as one can be using StringBuilder.
COMPREHENSION
Scala comprehensions, at their most basic level, mean transforming T[A] into T[B] given a function A => B, something known as a functor. It can be easily understood when it comes to collections. For instance, I may transform a List[String] of names into a List[Int] of name lengths through a function String => Int which returns the length of a string. That’s a list comprehension.
There are other operations that can be done through comprehensions, given functions with signatures A => T[B], which is related to monads, or A => Boolean.
That means we need to see the input string as a T[A]. We can’t use Array[Char] as input because we want to replace the whole placeholder, which is larger than a single char. Let’s propose, therefore, this type signature: (List[String], String => String) => String.
Since the input we receive is a String, we first need a function String => List[String], which will divide our input into placeholders and non-placeholders.
Another problem we have is that we got an IndexedSeq[String], but we need a String => String. There are many ways around that, but let’s settle with a function that takes the values and returns the lookup.
We also need a function List[String] => String, but List’s mkString does that already. So there’s little left to do aside from composing all this stuff.
In the composition I use @unchecked because there shouldn’t be any pattern other than the two being matched, if my regex pattern was built correctly. The compiler doesn’t know that, however, so I use that annotation to silence the warning it would produce. If an exception is thrown, there’s a bug in the regex pattern.
The final function, then, unifies all that:
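One way the pieces could fit together, as a sketch. Every name here (tokenize, valuesMatcher, comprehension, replaceComprehension) and both regex patterns are assumptions made for illustration, and $N is taken to be 1-based.

```scala
import scala.util.matching.Regex

val Placeholder: Regex = """\$(\d+)""".r
val Literal: Regex = """(?s)(.*)""".r

// String => List[String]: placeholders and the text between them, in order.
def tokenize(input: String): List[String] =
  """\$\d+|[^$]+|\$""".r.findAllIn(input).toList

// IndexedSeq[String] => (String => String): looks a value up by its number.
def valuesMatcher(values: IndexedSeq[String]): String => String =
  number => values(number.toInt - 1)

// The comprehension itself: each token becomes its replacement.
def comprehension(tokens: List[String], matcher: String => String): List[String] =
  for (token <- tokens) yield (token: @unchecked) match {
    case Placeholder(number) => matcher(number)
    case Literal(text)       => text
  }

// (String, IndexedSeq[String]) => String, by composing the pieces above.
def replaceComprehension(input: String, values: IndexedSeq[String]): String =
  comprehension(tokenize(input), valuesMatcher(values)).mkString
```

For example, tokenize("a $1 b") yields List("a ", "$1", " b"), and only the middle token matches Placeholder.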
One problem with this solution is that I apply the regex pattern twice: once to break up the string, and again to identify the placeholders. Another problem is that the List of tokens is an unnecessary intermediate result. We can solve both with a second tokenizer, tokenize2, that applies the pattern only once and returns an Iterator instead of a List.
FOLDING
Folding is a bit similar to both recursion and comprehension. With folding, we take a T[A] input that can be comprehended, a B “seed”, and a function (B, A) => B. We comprehend the list using the function, always taking the B that resulted from the last element processed (the first element takes the seed). Finally, we return the result of the last comprehended element.
I’ll admit I could hardly explain it in a less obscure way. That’s what happens when you try to keep things abstract. I explained it that way so that the type signatures involved become clear. But let’s just see a trivial example of folding to understand its usage:
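Here is one such example, summing the lengths of a list of names, with each ingredient of the fold spelled out. The example and its names are chosen purely for illustration.

```scala
def totalLength(names: List[String]): Int = {
  val seed = 0 // the B "seed"
  val function = (acc: Int, name: String) => acc + name.length // (B, A) => B
  names.foldLeft(seed)(function) // the T[A] input is the list of names
}
```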
Or, as a one-liner:
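A fold that sums the lengths of a list of names (an illustrative example, with a hypothetical name) fits on a single line:

```scala
def totalLength2(names: List[String]): Int = names.foldLeft(0)(_ + _.length)
```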
Ok, so how would we go about solving the problem with folding? The result, of course, should be the string we want to produce. Therefore, the seed should be an empty string. Let’s use the result from tokenize2 as the comprehensible input, and append each token, or the value it stands for, to that seed.
And, with that, I finish showing the most usual ways one would go about this in a functional manner. I have resorted to StringBuilder because concatenation of String is slow. If that wasn’t the case, I could easily replace StringBuilder in the functions above by String. I could also convert the Iterator into a Stream, and completely do away with mutability.
This is Scala, though, and Scala is about balancing needs and means, not about purist solutions. Though, of course, you are free to go purist. 🙂
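For the folding approach, a sketch of what that might look like. The Either-based shape of tokenize2, the regex, and the name replaceFolding are assumptions made for illustration; the seed is an empty StringBuilder, and $N is taken to be 1-based.

```scala
// One pattern, applied once: a placeholder (group 1 holds the number),
// a run of non-dollar text, or a lone dollar sign.
val token = """\$(\d+)|[^$]+|\$""".r

// Right(n) for placeholder $n, Left(text) for anything else, produced lazily.
def tokenize2(input: String): Iterator[Either[String, Int]] =
  token.findAllMatchIn(input).map { m =>
    if (m.group(1) != null) Right(m.group(1).toInt) else Left(m.matched)
  }

def replaceFolding(input: String, values: IndexedSeq[String]): String =
  tokenize2(input).foldLeft(new StringBuilder) {
    case (sb, Left(text)) => sb.append(text)          // literal text is copied as is
    case (sb, Right(n))   => sb.append(values(n - 1)) // a placeholder becomes its value
  }.toString
```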