I'm going to manipulate large files in Ruby.
I want to add a header line to a gigabyte-sized file that contains a sequence of characters with no newlines, and write the result to a new file (see the example below). Each position in the sequence holds one of four characters (a, c, g, t).
My questions are:
- Should I open the gigabyte-sized file and the output file as binary or as text?
- Could you show some sample code? (If possible, I don't want to load the whole gigabyte-sized file into memory at once.)
Thanks.
Example
Suppose the program is named add-header-giga. The first argument is the header line and the second argument is the input file name. The output file is named output-file.
>cat giga-byte-size-file.txt
cctgcaggagcagagcaaagaggtggccatccgcatctttcgggctgccagtttcgctcctggaggctgtgcag....
>add-header-giga DNA-sequence-from-Homo-Sapiens giga-byte-size-file.txt
>cat output-file
DNA-sequence-from-Homo-Sapiens
cctgcaggagcagagcaaagaggtggccatccgcatctttcgggctgccagtttcgctcctggaggctgtgcag....
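If there are no newlines in the data, it doesn't matter. Binary and text mode differ mainly in how they treat newlines. (In Ruby, binary mode also tags the data as ASCII-8BIT, which is harmless for a, c, g, t.)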
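
As for sample code, here is a minimal sketch rather than a definitive implementation. It writes the header first and then copies the input in fixed-size chunks, so only one chunk is held in memory at a time. The method name add_header, the 1 MiB chunk size, and the choice of binary mode for both files are my own assumptions; the default output name output-file comes from the question.

```ruby
# Sketch: prepend a header line to a large file without loading it all.
# CHUNK_SIZE and add_header are illustrative names, not from the question.
CHUNK_SIZE = 1024 * 1024 # bytes per read; an arbitrary choice

def add_header(header, input_path, output_path = "output-file")
  File.open(output_path, "wb") do |out|
    # Write the header followed by a single newline.
    out.write("#{header}\n")
    File.open(input_path, "rb") do |inp|
      # IO#read(n) returns nil at end of file, which ends the loop.
      while (chunk = inp.read(CHUNK_SIZE))
        out.write(chunk)
      end
    end
  end
end

# Command-line use: add-header-giga HEADER INPUT_FILE
add_header(ARGV[0], ARGV[1]) if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
```

Opening both files in binary mode means the bytes are copied exactly as they are, and the header's newline stays a single "\n" on every platform. The inner loop could also be replaced with IO.copy_stream(inp, out), which likewise streams the data instead of reading it all at once.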