I have a semicolon-delimited input file where the first column is a 3-character fixed-width code and the remaining columns are string data.
001;first_data_str;second_data_str;third_data_str;fourth_data_str
001;first_data_str;second_data_str;third_data_str;fourth_data_str
002;first_data_str;second_data_str;third_data_str;fourth_data_str
003;first_data_str;second_data_str;third_data_str;fourth_data_str
001;first_data_str;second_data_str;third_data_str;fourth_data_str
003;first_data_str;second_data_str;third_data_str;fourth_data_str
001;first_data_str;second_data_str;third_data_str;fourth_data_str
002;first_data_str;second_data_str;third_data_str;fourth_data_str
002;first_data_str;second_data_str;third_data_str;fourth_data_str
003;first_data_str;second_data_str;third_data_str;fourth_data_str
003;first_data_str;second_data_str;third_data_str;fourth_data_str
003;first_data_str;second_data_str;third_data_str;fourth_data_str
002;first_data_str;second_data_str;third_data_str;fourth_data_str
001;first_data_str;second_data_str;third_data_str;fourth_data_str
I want to split the above file into several files based on the distinct values of the first column.
For example, the file above has three distinct values in the first column, so I will split it into three files: 001.txt, 002.txt and 003.txt.
Each output file should contain the item count on the first line and the data on the remaining lines.
There are five 001 rows, so 001.txt will be:
5
first_data_str;second_data_str;third_data_str;fourth_data_str
first_data_str;second_data_str;third_data_str;fourth_data_str
first_data_str;second_data_str;third_data_str;fourth_data_str
first_data_str;second_data_str;third_data_str;fourth_data_str
first_data_str;second_data_str;third_data_str;fourth_data_str
Similarly, 002.txt will have 4 on the first line followed by four lines of data, and 003.txt will have 5 on the first line followed by five lines of data.
What would be the most efficient way to achieve this for a very large input file (more than 100,000 rows)?
I have written the code below to read lines from the file:
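// Needs: import java.io.BufferedReader; import java.io.FileReader; import java.io.IOException;
// try-with-resources closes the reader (and the underlying file) even if an exception is thrown.
// The extra DataInputStream wrapper was not needed: BufferedReader already reads line by line.
try (BufferedReader br = new BufferedReader(new FileReader(this.inputFilePath))) {
    String strLine;
    while ((strLine = br.readLine()) != null) {
        String[] tokens = strLine.split(";");
        // ... process tokens here ...
    }
} catch (IOException e) {
    e.printStackTrace();
}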
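First, create a HashMap<String, ArrayList<String>> map to collect all the data from the file. Second, use strLine.split(";", 2) instead of strLine.split(";"). With a limit of 2 the line is split only at the first semicolon, so for a well-formed line the result is an array of length 2: the first element is the code and the second is the data. Then add the data string to the list stored in the map under that code.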
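A minimal sketch of that grouping step, assuming the lines are already available as an Iterable<String>. The class name GroupByCode and the method groupByCode are hypothetical, List.of needs Java 9 or later, and skipping lines that contain no ';' is an assumption the original does not state:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class GroupByCode {
    // Groups the data part of each line under its leading code, keeping input order per code.
    public static HashMap<String, ArrayList<String>> groupByCode(Iterable<String> lines) {
        HashMap<String, ArrayList<String>> map = new HashMap<>();
        for (String strLine : lines) {
            // Limit 2: split only at the first ';' so the data part keeps its own semicolons.
            String[] tokens = strLine.split(";", 2);
            if (tokens.length < 2) {
                continue; // assumption: lines without a ';' are ignored
            }
            // Create the list the first time a code is seen, then append the data part.
            map.computeIfAbsent(tokens[0], k -> new ArrayList<>()).add(tokens[1]);
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("001;a;b", "002;c;d", "001;e;f");
        HashMap<String, ArrayList<String>> map = groupByCode(lines);
        System.out.println(map.get("001")); // prints [a;b, e;f]
        System.out.println(map.get("002")); // prints [c;d]
    }
}
```

To combine this with the reading loop, declare the map before the while loop and put the split and computeIfAbsent calls inside it.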
At the end, iterate over map.keySet() and, for each key, create a file named after that key (for example 001.txt) and write the list's size on the first line followed by the list's contents.
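A sketch of that final step, assuming the output files go into a directory passed in as a Path and are written as UTF-8 (WriteGroups and writeGroups are hypothetical names; Files.newBufferedWriter, Files.readAllLines and Files.createTempDirectory are standard java.nio.file calls):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WriteGroups {
    // Writes one file per code, named "<code>.txt" inside outDir:
    // line 1 is the number of rows, the remaining lines are the data rows.
    public static void writeGroups(Map<String, ? extends List<String>> map, Path outDir) throws IOException {
        for (String key : map.keySet()) {
            List<String> rows = map.get(key);
            Path file = outDir.resolve(key + ".txt");
            // try-with-resources flushes and closes each file before moving on.
            try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                out.write(Integer.toString(rows.size()));
                out.newLine();
                for (String row : rows) {
                    out.write(row);
                    out.newLine();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        HashMap<String, ArrayList<String>> map = new HashMap<>();
        map.put("001", new ArrayList<>(List.of("first_data_str;second_data_str", "x;y")));
        Path dir = Files.createTempDirectory("split");
        writeGroups(map, dir);
        // prints [2, first_data_str;second_data_str, x;y]
        System.out.println(Files.readAllLines(dir.resolve("001.txt"), StandardCharsets.UTF_8));
    }
}
```

This approach keeps every data row in memory until the end. If that ever becomes a problem, one alternative is to read the input twice: a first pass only counts the rows per code, and a second pass opens one writer per code, writes the count first and then appends each row as it is read.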