From input 1:
fruit, apple, cider
animal, beef, burger
and input 2:
animal, beef, 5kg
fruit, apple, 2liter
fish, tuna, 1kg
I need to produce:
fruit, apple, cider, 2liter
animal, beef, burger, 5kg
The closest example I could get is:
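```scala
object FileMerger {
  def main(args: Array[String]): Unit = {
    import scala.io._
    // Second column of each file1 line (e.g. "apple")
    val f1 = Source.fromFile("file1.csv").getLines().map(_.split(", *")(1))
    // Whole lines of file2
    val f2 = Source.fromFile("file2.csv").getLines()
    val out = new java.io.FileWriter("output.csv")
    // Pairs lines purely by position, which is the problem described below
    f1.zip(f2).foreach { case (a, b) => out.write(a + ", " + b + "\n") }
    out.close()
  }
}
```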
The problem is that the example assumes the two CSV files contain the same number of lines, in the same order. My merged result must only contain rows whose key appears in both the first and the second file. I am new to Scala, and any help will be greatly appreciated.
You need an intersection of the two files: the lines from file1 and file2 that share some criterion. Consider this from a set-theory perspective: you have two sets with some elements in common, and you need a new set with those elements. Well, there’s more to it than that, because the lines aren’t really equal…
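To make the set-theory view concrete, here is a small sketch on the sample data from the question using the standard `Set.intersect`. Intersecting the first-column keys finds the rows you want, while intersecting whole lines finds nothing, which is exactly the complication. The object and value names are made up for illustration.

```scala
object SetIntersectionDemo {
  // First-column keys from the sample rows in the question
  val keys1: Set[String] = Set("fruit", "animal")
  val keys2: Set[String] = Set("animal", "fruit", "fish")

  // Keys present in both sets; "fish" is dropped because file 1 lacks it
  val common: Set[String] = keys1 intersect keys2

  // Whole lines differ between the files, so intersecting them directly is empty
  val lines1: Set[String] = Set("fruit, apple, cider", "animal, beef, burger")
  val lines2: Set[String] = Set("animal, beef, 5kg", "fruit, apple, 2liter", "fish, tuna, 1kg")
  val commonLines: Set[String] = lines1 intersect lines2

  def main(args: Array[String]): Unit = {
    println(common)
    println(commonLines)
  }
}
```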
So, let’s say you read file1 into a `List[Input1]`, where `Input1` is a class wrapping one line of text; we don’t need any more detail than that for now. We do the same thing for file2, producing a `List[Input2]` with an identical definition.
You might be wondering why I use two different classes if they have the exact same definition. Well, if you were reading structured data, you would have two different types, so let’s see how to handle that more complex case.
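A minimal sketch of that reading step, reconstructed from the description rather than copied from the original answer. `Input1` and `Input2` follow the prose; the `ReadInputs` object and the in-memory `Source.fromString` samples are assumptions made so the snippet runs without the CSV files (use `Source.fromFile("file1.csv")` for real input).

```scala
import scala.io.Source

// One wrapper type per file: identical definitions, but distinct types to the compiler
case class Input1(line: String)
case class Input2(line: String)

object ReadInputs {
  // Sample data from the question, standing in for file1.csv and file2.csv
  val source1: Source = Source.fromString("fruit, apple, cider\nanimal, beef, burger")
  val source2: Source = Source.fromString("animal, beef, 5kg\nfruit, apple, 2liter\nfish, tuna, 1kg")

  // getLines() is lazy, so toList forces the read before the source is closed
  val f1: List[Input1] = try source1.getLines().map(Input1(_)).toList finally source1.close()
  val f2: List[Input2] = try source2.getLines().map(Input2(_)).toList finally source2.close()

  def main(args: Array[String]): Unit = {
    println(f1)
    println(f2)
  }
}
```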
Ok, so how do we match them, since `Input1` and `Input2` are different types? The lines are matched by keys, which, judging from your sample data, are the first column in each. So let’s create a class `Key`, plus conversions `Input1 => Key` and `Input2 => Key`.
Now that we can produce a common `Key` from both `Input1` and `Input2`, we can take the intersection of the two sets of keys.
That gives us the keys of the lines we want, but not the lines themselves! For each key, we need to know which line it came from. We have a set of keys, and for each key we want to keep track of a value: that’s exactly what a `Map` is. So we build one `Map` per file, from each line’s `Key` to the line itself. The output is then, for each key in the intersection, the line from the first map joined by a comma to the line from the second map.
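The matching steps described above (a `Key` type, conversions into it, a key intersection, and one `Map` per file) can be sketched like this. It is a reconstruction from the prose, not the original answer’s code; `firstColumn`, `input1ToKey`, `input2ToKey`, the `MatchByKey` object and the hard-coded sample lists are hypothetical names chosen here.

```scala
case class Input1(line: String)
case class Input2(line: String)
case class Key(key: String)

object MatchByKey {
  // Hypothetical helpers: the key is the first comma-separated column
  def firstColumn(line: String): String = line.split(", *").head
  def input1ToKey(input: Input1): Key = Key(firstColumn(input.line))
  def input2ToKey(input: Input2): Key = Key(firstColumn(input.line))

  // Sample rows from the question
  val f1 = List(Input1("fruit, apple, cider"), Input1("animal, beef, burger"))
  val f2 = List(Input2("animal, beef, 5kg"), Input2("fruit, apple, 2liter"), Input2("fish, tuna, 1kg"))

  // Keys present in both files
  val keys1: Set[Key] = f1.map(input1ToKey).toSet
  val keys2: Set[Key] = f2.map(input2ToKey).toSet
  val intersection: Set[Key] = keys1 intersect keys2

  // Remember which line each key came from
  val m1: Map[Key, Input1] = f1.map(input => input1ToKey(input) -> input).toMap
  val m2: Map[Key, Input2] = f2.map(input => input2ToKey(input) -> input).toMap

  // Join the matching lines; note the key column appears twice in each result
  val output: Set[String] = intersection.map(k => m1(k).line + ", " + m2(k).line)

  def main(args: Array[String]): Unit = output.foreach(println)
}
```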
All you have to do now is output that.
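Writing the result out can reuse the `FileWriter` approach from the question. A minimal sketch, assuming a hypothetical `writeLines` helper that also guarantees the file is closed even if a write fails:

```scala
import java.io.{File, FileWriter}

object WriteOutput {
  // Hypothetical helper: writes each merged line followed by "\n", then closes the file
  def writeLines(file: File, lines: Iterable[String]): Unit = {
    val out = new FileWriter(file)
    try lines.foreach { line => out.write(line); out.write("\n") }
    finally out.close()
  }

  def main(args: Array[String]): Unit =
    writeLines(new File("output.csv"), List("fruit, apple, cider, 2liter", "animal, beef, burger, 5kg"))
}
```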
Let’s consider some improvements to this code. First, note that the output produced above repeats the key: that’s exactly what your code does, but not what you want in the example. So let’s change `Input1` and `Input2` to store the key separately from the rest of the line.
That makes `f1` and `f2` a bit harder to initialize. Instead of using `split`, which breaks the whole line apart unnecessarily (allocating an array with every field of every line), we’ll divide the line right at the first comma: everything before the comma is the key, and everything from the comma onward is the rest. The `span` method does exactly that, so a helper `breakLine` can simply be `line span (',' !=)`. Play a bit with `span` in the REPL to get a better understanding of it. As for `(',' !=)`, that’s just an abbreviated form of `(x => ',' != x)`; recent Scala versions complain about this postfix form unless `scala.language.postfixOps` is imported, so `(',' != _)` is the safer spelling.
Next, we need a way to create `Input1` and `Input2` from a tuple (the result of `breakLine`). We can then read each file by mapping `breakLine` and the matching constructor over its lines.
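A sketch of the improved reading step, under the same assumptions as before: `breakLine`, `makeInput1` and `makeInput2` are illustrative names, and `Source.fromString` stands in for `Source.fromFile` so the snippet is self-contained. Note how `span` leaves the comma at the start of `rest`.

```scala
import scala.io.Source

// Each input keeps the key apart from the rest of the line
case class Input1(key: String, rest: String)
case class Input2(key: String, rest: String)

object SplitKey {
  // Splits at the first comma: "fruit, apple, cider" -> ("fruit", ", apple, cider")
  def breakLine(line: String): (String, String) = line.span(c => c != ',')

  def makeInput1(t: (String, String)): Input1 = Input1(t._1, t._2)
  def makeInput2(t: (String, String)): Input2 = Input2(t._1, t._2)

  // Sample data from the question, standing in for the two CSV files
  val source1: Source = Source.fromString("fruit, apple, cider\nanimal, beef, burger")
  val source2: Source = Source.fromString("animal, beef, 5kg\nfruit, apple, 2liter\nfish, tuna, 1kg")

  val f1: List[Input1] =
    try source1.getLines().map(breakLine).map(makeInput1).toList finally source1.close()
  val f2: List[Input2] =
    try source2.getLines().map(breakLine).map(makeInput2).toList finally source2.close()

  def main(args: Array[String]): Unit = {
    println(f1)
    println(f2)
  }
}
```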
Another thing we can simplify is the intersection. The keys of a `Map` already form a set (its `keySet`), so we can create the maps first and then intersect their key sets. The output for each common key is then the key followed by the rest of the matching line from each file.
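A minimal sketch of that simplification, assuming the key/rest version of `Input1` and `Input2`; the `MergeByKeySet` object and its hard-coded rows (what `breakLine` produces for the sample data) are illustrative only.

```scala
case class Input1(key: String, rest: String)
case class Input2(key: String, rest: String)

object MergeByKeySet {
  val f1 = List(Input1("fruit", ", apple, cider"), Input1("animal", ", beef, burger"))
  val f2 = List(Input2("animal", ", beef, 5kg"), Input2("fruit", ", apple, 2liter"), Input2("fish", ", tuna, 1kg"))

  // Build the maps first; their key sets give the intersection directly
  val m1: Map[String, Input1] = f1.map(input => input.key -> input).toMap
  val m2: Map[String, Input2] = f2.map(input => input.key -> input).toMap
  val intersection: Set[String] = m1.keySet intersect m2.keySet

  // No extra comma: each rest already starts with one
  val output: Set[String] = intersection.map(k => k + m1(k).rest + m2(k).rest)

  def main(args: Array[String]): Unit = output.foreach(println)
}
```

Because `Set` makes no ordering promise, the merged lines may not follow file1’s order; iterating `f1` and keeping entries whose key is in `m2` would preserve it. The second column ("apple", "beef") still appears twice here, since both `rest` values contain it, so matching the question’s sample output exactly would mean dropping that field from one side.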
Note that I no longer append a comma: the rest of every line in both `f1` and `f2` already starts with one.