I currently have a script that invokes bin/sstable2json on every file matching the pattern /var/lib/cassandra/data/fake-keyspace/*-Data.db and saves the standard output to disk. However, the exported files are starting to take 10x the space of all the files in /var/lib/cassandra.
I took this approach after reading the following section: http://wiki.apache.org/cassandra/Operations#Import_.2BAC8_export
What are some of the best practices for getting data out of one cluster and into another? Just to be clear, I am not trying to add additional nodes to a ring, but rather to move data from one ring to another in a process that is repeatable.
Any help or nudge in the right direction would be much appreciated.
Just copy the sstable files. The only reasons to use JSON are (1) debugging or (2) wanting to do some kind of processing on the data in JSON form before re-loading it.
So, just rename all the sstable files (from a snapshot, if you’re running live in the first cluster) to unique numbers (order doesn’t matter, as long as they’re unique), and copy them all to the data directory on the target machine.
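For what it's worth, here is a rough sketch of how that rename-and-copy step could be scripted so it is repeatable. It assumes the sstable files are named <ColumnFamily>-<number>-<Component>.db (which matches the *-Data.db pattern above, but check your own data directory), and it keeps all components of one sstable on the same new number. The function names and the `start` parameter are made up for illustration; pick a `start` that cannot collide with numbers already present in the target data directory.

```python
import re
import shutil
from pathlib import Path

# Assumed naming scheme: <ColumnFamily>-<number>-<Component>.db
# (e.g. Standard1-12-Data.db). Adjust if your files are named differently.
SSTABLE_NAME = re.compile(r"^(?P<cf>.+)-(?P<gen>\d+)-(?P<component>[A-Za-z]+)\.db$")


def plan_renames(filenames, start):
    """Map each sstable file name to a new name with a unique number.

    All components of the same sstable (same column family and same
    original number) receive the same new number. Names that do not
    match the assumed pattern are skipped.
    """
    new_gen = {}  # (column family, old number) -> new number
    plan = {}     # old file name -> new file name
    next_gen = start
    for name in sorted(filenames):
        m = SSTABLE_NAME.match(name)
        if m is None:
            continue
        key = (m["cf"], m["gen"])
        if key not in new_gen:
            new_gen[key] = next_gen
            next_gen += 1
        plan[name] = f'{m["cf"]}-{new_gen[key]}-{m["component"]}.db'
    return plan


def copy_sstables(snapshot_dir, target_dir, start):
    """Copy sstable files from a snapshot directory into target_dir
    under their new names, refusing to overwrite existing files."""
    snapshot_dir = Path(snapshot_dir)
    target_dir = Path(target_dir)
    names = [p.name for p in snapshot_dir.iterdir() if p.is_file()]
    plan = plan_renames(names, start)
    for old, new in plan.items():
        dest = target_dir / new
        if dest.exists():
            raise FileExistsError(f"would overwrite {dest}")
        shutil.copy2(snapshot_dir / old, dest)
    return plan
```

You would call something like copy_sstables(snapshot_dir, staging_dir, start=1000) with your own paths, then move the staged files into the data directory on the target machine. Running plan_renames on its own first is a cheap way to see which names would be produced before anything is copied.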