I’m trying to learn Scala and tried to write a sequence comprehension that extracts unigrams, bigrams and trigrams from a sequence. E.g., [1,2,3,4] should be transformed to (not Scala syntax)
[1; _,1; _,_,1; 2; 1,2; _,1,2; 3; 2,3; 1,2,3; 4; 3,4; 2,3,4]
In Scala 2.8, I tried the following:
    def trigrams(tokens : Seq[T]) = {
      var t1 : Option[T] = None
      var t2 : Option[T] = None
      for (t3 <- tokens) {
        yield t3
        yield (t2,t3)
        yield (t1,t2,Some(t3))
        t1 = t2
        t2 = t3
      }
    }
But this doesn’t compile as, apparently, only one yield is allowed in a for-comprehension (no block statements either). Is there any other elegant way to get the same behavior, with only one pass over the data?
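You can’t have multiple yields in a for loop because for loops are syntactic sugar for the map (or flatMap) operations. A for loop with a yield translates into a call to map on the collection, with the loop body as the function passed to map. Without a yield at all, it translates into a call to foreach instead.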
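To make the translation concrete, here is a minimal sketch of each for form next to its hand-written equivalent. The names `collection` and `func` are placeholders chosen for illustration, and the compiler's actual output may differ in detail from these hand-written versions.

```scala
object ForDesugaring {
  val collection = List(1, 2, 3)
  // Hypothetical stand-in for whatever the loop body computes.
  def func(i: Int): Int = i * 10

  // With yield: the loop body becomes the function passed to map.
  val withYield = for (i <- collection) yield func(i)
  val asMap     = collection.map(i => func(i))

  // Without yield: the loop body becomes the function passed to foreach.
  val viaFor     = scala.collection.mutable.ListBuffer[Int]()
  val viaForeach = scala.collection.mutable.ListBuffer[Int]()
  for (i <- collection) viaFor += func(i)
  collection.foreach(i => viaForeach += func(i))

  // Two generators: every generator but the last becomes flatMap.
  val nested    = for (i <- collection; j <- List("a", "b")) yield (i, j)
  val asFlatMap = collection.flatMap(i => List("a", "b").map(j => (i, j)))
}
```

In each pair the two values should be equal, which is why there is only one place for a yield: it marks the result of the single function handed to map.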
So the entire body of the for loop is turned into a single closure, and the presence of the yield keyword determines whether the function called on the collection is map or foreach (or flatMap). Because of this translation, the following are forbidden: using imperative statements alongside a yield to determine what will be yielded, and using more than one yield.
(Not to mention that your proposed version will return a List[Any] because the tuples and the 1-gram are all of different types. You probably want to get a List[List[Int]] instead.)
Try the following instead (which puts the n-grams in the order they appear):
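One possible way to do this, as a sketch rather than a definitive implementation: keep a rolling window of the last few tokens with `scanLeft`, then let a for-comprehension with two generators emit the 1-gram, 2-gram and 3-gram ending at each token. The names `NGramsInOrder`, `ngramsInOrder` and `maxN` are made up for this example, and the result is unpadded (no `_` placeholders).

```scala
object NGramsInOrder {
  // Returns all n-grams (n = 1..maxN) grouped by the token they end on.
  def ngramsInOrder[T](tokens: Seq[T], maxN: Int = 3): List[List[T]] = {
    // windows(i) holds the last (up to) maxN tokens ending at position i.
    // scanLeft also emits the initial empty list, which tail drops.
    val windows = tokens.scanLeft(List.empty[T]) { (w, t) =>
      (w :+ t).takeRight(maxN)
    }.tail

    for {
      w <- windows.toList
      n <- 1 to math.min(maxN, w.length) // shorter n-grams first
    } yield w.takeRight(n)
  }
}

// NGramsInOrder.ngramsInOrder(List(1, 2, 3, 4))
// → List(List(1), List(2), List(1, 2), List(3), List(2, 3), List(1, 2, 3),
//        List(4), List(3, 4), List(2, 3, 4))
```

The input is traversed once by `scanLeft`; the for-comprehension then only looks at the small windows, not at the original sequence.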
or
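If the padded output from the question is what you are after (with `_` for the missing positions), a sketch along these lines may work: pad the front with two `None`s, wrap each token in `Some`, and slide a window of three over the result. `PaddedNGrams` and `paddedTrigrams` are illustrative names, not part of any library.

```scala
object PaddedNGrams {
  // For each token, yields its unigram, bigram and trigram, using None
  // where the question's notation shows "_".
  def paddedTrigrams[T](tokens: Seq[T]): List[List[Option[T]]] = {
    val padded: List[Option[T]] =
      List[Option[T]](None, None) ++ tokens.map(t => Some(t): Option[T])

    for {
      window <- padded.sliding(3).toList
      if window.length == 3      // skips the short window produced for empty input
      n <- List(1, 2, 3)
    } yield window.takeRight(n)
  }
}

// PaddedNGrams.paddedTrigrams(List(1, 2))
// → List(List(Some(1)), List(None, Some(1)), List(None, None, Some(1)),
//        List(Some(2)), List(Some(1), Some(2)), List(None, Some(1), Some(2)))
```

Every element has the same type, `List[Option[T]]`, so the result does not degrade to `List[Any]`.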
If you prefer the n-grams to be in length order, try:
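A sketch for the length-ordered variant, assuming the standard `sliding` method and the made-up names `NGramsByLength`, `ngramsByLength` and `maxN`: the outer generator picks the n-gram length and the inner one slides a window of that length over the tokens.

```scala
object NGramsByLength {
  // Returns all 1-grams, then all 2-grams, and so on up to maxN.
  def ngramsByLength[T](tokens: Seq[T], maxN: Int = 3): List[List[T]] =
    for {
      n <- (1 to maxN).toList
      window <- tokens.sliding(n).toList
      if window.length == n      // drop the short window when tokens.length < n
    } yield window.toList
}

// NGramsByLength.ngramsByLength(List(1, 2, 3, 4))
// → List(List(1), List(2), List(3), List(4),
//        List(1, 2), List(2, 3), List(3, 4),
//        List(1, 2, 3), List(2, 3, 4))
```

Note that this version walks the input once per n-gram length, so it does not meet the single-pass requirement from the question.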