
Editorial Team
Asked: May 23, 2026

My data is stored in large matrices stored in text files with millions of


My data is stored in large text files, each holding a matrix with millions of rows and 4 columns of comma-separated values. (Each column stores a different variable, and each row stores a different millisecond's data for all four variables.) There is also some irrelevant header data in the first dozen or so lines. I need to write Java code to load this data into four arrays, with one array for each column in the text matrix.

The Java code also needs to be able to tell when the header is done, so that the first data row can be split into entries for the 4 arrays. Finally, the Java code needs to iterate through the millions of data rows, repeating the process of decomposing each row into four numbers which are each entered into the appropriate array for the column in which the number was located.
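One possible test for the end of the header, sketched here as an assumption rather than a rule the data guarantees, is to treat a line as data only when its first four comma-separated fields all parse as numbers. `HeaderDetector.isDataRow` is a hypothetical helper name.

```java
class HeaderDetector {
    // Hypothetical helper: a line counts as a data row if it has at least four
    // comma-separated fields and the first four all parse as doubles.
    static boolean isDataRow(String line) {
        String[] fields = line.split(",");
        if (fields.length < 4) {
            return false; // e.g. "ECG" or "3 channels"
        }
        for (int k = 0; k < 4; k++) {
            try {
                Double.parseDouble(fields[k]);
            } catch (NumberFormatException e) {
                return false; // e.g. "min,CH2,..." or an empty first field
            }
        }
        return true;
    }
}
```

On the sample file shown further down, this rejects both `min,CH2,CH4,CH41,` and `,3087747,3087747,3087747,` (an empty field does not parse). A header line that happened to contain four numbers would be misclassified, so the rule is worth checking against real files.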

How can I alter the code below in order to accomplish this? I want to find the fastest way to accomplish this processing of millions of rows.

Here is my code:

MainClass2.java

package packages;

public class MainClass2{
    public static void main(String[] args){
        readfile2 r = new readfile2();
        r.openFile();
        int x1Count = r.readFile();
        r.populateArray(x1Count);
        r.closeFile();
    }
}

readfile2.java

package packages;

import java.io.*;
import java.util.*;

public class readfile2 {
    private Scanner scan1;
    private Scanner scan2;

    public void openFile(){
        try{
            // Two independent scanners over the same file:
            // scan1 for counting, scan2 for reading the data.
            scan1 = new Scanner(new File("C:\\test\\samedatafile.txt"));
            scan2 = new Scanner(new File("C:\\test\\samedatafile.txt"));
        }
        catch(FileNotFoundException e){
            System.out.println("could not find file");
        }
    }
    public int readFile(){
        int scan1Count = 0;
        while(scan1.hasNext()){
            scan1.next();
            scan1Count += 1;
        }
        return scan1Count;
    }
    public double[][] populateArray(int scan1Count){
        double[] outputArray1 = new double[scan1Count];
        double[] outputArray2 = new double[scan1Count];
        double[] outputArray3 = new double[scan1Count];
        double[] outputArray4 = new double[scan1Count];
        int i = 0;
        while(scan2.hasNextLine()){
            String row = scan2.nextLine(); // consume one line so the loop advances
            //what code do I write here to:
            //  1.) identify the start of my time series rows after the end of the header rows (e.g. row starts with a number AT LEAST 4 digits in length.)
            //  2.) split each time series row's data into a separate new entry for each of the 4 output arrays
            i++;
        }
        // A Java method returns a single value, so the four arrays are bundled together.
        return new double[][]{outputArray1, outputArray2, outputArray3, outputArray4};
    }
    public void closeFile(){
        scan1.close();
        scan2.close();
    }
}
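One detail in `readFile` above: `scan1.next()` counts whitespace-separated tokens, not lines. A header line such as `text and numbers on first line` counts as six tokens, while a data row (which contains no spaces) counts as one, so `scan1Count` differs from the line count; for the sample file below it is larger, which leaves unused zeros at the end of each array. If a line count is what is wanted, a minimal sketch with a hypothetical `countLines` helper is:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class LineCounter {
    // Counts lines, unlike Scanner.next(), which counts whitespace-separated tokens.
    static int countLines(Reader source) {
        int count = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            while (br.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

For the real file it would be called as `LineCounter.countLines(new FileReader("C:\\test\\samedatafile.txt"))`, inside a block that handles `FileNotFoundException`.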

Here are the first 19 lines of a typical data file:

text and numbers on first line
1 msec/sample
3 channels
ECG
Volts
Z_Hamming_0_05_LPF
Ohms
dz/dt
Volts
min,CH2,CH4,CH41,
,3087747,3087747,3087747,
0,-0.0518799,17.0624,0,
1.66667E-05,-0.0509644,17.0624,-0.00288295,
3.33333E-05,-0.0497437,17.0624,-0.00983428,
5E-05,-0.0482178,17.0624,-0.0161573,
6.66667E-05,-0.0466919,17.0624,-0.0204402,
8.33333E-05,-0.0448608,17.0624,-0.0213986,
0.0001,-0.0427246,17.0624,-0.0207532,
0.000116667,-0.0405884,17.0624,-0.0229672,
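For a file laid out like this, one single-pass approach is to read line by line with `BufferedReader`, keep only lines whose first four fields parse as numbers, and copy the collected rows into four arrays at the end. The sketch below splits fields with `indexOf` instead of `String.split` or a regular expression; the class and method names are hypothetical, and it reads from any `Reader` (a `StringReader` for a quick test, a `FileReader` for a real file).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class FourColumnLoader {
    // Reads every line once; lines whose first four fields are numbers become rows.
    // Returns columns[0..3], each sized to the number of rows kept.
    static double[][] load(Reader source) {
        List<double[]> rows = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                double[] row = new double[4];
                if (parseRow(line, row)) {
                    rows.add(row); // header and malformed lines are skipped
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        double[][] columns = new double[4][rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            double[] row = rows.get(i);
            for (int c = 0; c < 4; c++) {
                columns[c][i] = row[c];
            }
        }
        return columns;
    }

    // Parses the first four comma-separated fields of line into out[0..3].
    // Returns false if a field is missing or is not a number (e.g. header lines).
    static boolean parseRow(String line, double[] out) {
        int start = 0;
        for (int c = 0; c < 4; c++) {
            int comma = line.indexOf(',', start);
            if (comma < 0) {
                if (c < 3) {
                    return false; // fewer than four fields
                }
                comma = line.length(); // last field without a trailing comma
            }
            try {
                out[c] = Double.parseDouble(line.substring(start, comma));
            } catch (NumberFormatException e) {
                return false;
            }
            start = comma + 1;
        }
        return true;
    }
}
```

This version drops every line that fails to parse, including any inside the time series, so the returned arrays hold only successfully parsed rows. It avoids allocating a `String[]` per line; whether that is measurably faster than `split` on these files would need timing.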

Edit

I tested Shilaghae’s code suggestion. It seems to work. However, the length of all the resulting arrays is the same as x1Count, so that zeros remain in the places where Shilaghae’s pattern matching code is not able to place a number. (This is a result of how I wrote the code originally.)

I was having trouble finding the indices where zeros remain, but there seem to be many more zeros than just the ones expected where the header was. When I graphed the derivative of the temp[1] output, I saw a number of sharp spikes that may mark false zeros in temp[1]. If I can tell where the zeros in temp[1], temp[2], and temp[3] are, I might be able to modify the pattern matching to retain all the data.
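To see where zeros remain, one option is to scan a column for entries exactly equal to 0.0 and collect their indices; `zeroIndices` below is a hypothetical helper. Exact comparison is intended, since an unfilled slot in a new `double[]` is exactly 0.0, but a genuine reading of 0 (such as the first value in the sample's first data row) is reported too.

```java
import java.util.Arrays;

class ZeroFinder {
    // Returns the indices i where column[i] == 0.0 exactly.
    static int[] zeroIndices(double[] column) {
        int[] hits = new int[column.length];
        int n = 0;
        for (int i = 0; i < column.length; i++) {
            if (column[i] == 0.0) {
                hits[n++] = i;
            }
        }
        return Arrays.copyOf(hits, n); // keep only the filled part
    }
}
```

For example, `System.out.println(Arrays.toString(ZeroFinder.zeroIndices(temp[1])));` prints the indices of all zeros in `temp[1]`.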

Also, it would be nice to simply shorten the output array to no longer include the rows where the header was in the input file. However, the tutorials I have found regarding variable length arrays only show oversimplified examples like:

int[] anArray = {100, 200, 300, 400};

The code might run faster if it no longer uses scan1 to produce scan1Count. I do not want to slow the code down by using an inefficient method to produce a variable-length array. And I also do not want to skip data in my time series in the cases where the pattern matching is not able to split the input row into 4 numbers. I would rather keep the in-time-series zeros so that I can find them and use them to debug the pattern matching.
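A single pass can avoid the `scan1` counting step: start with modest arrays, double their capacity with `Arrays.copyOf` when they fill up, and trim to the exact row count at the end. To keep in-series failures visible, the sketch below (a suggested alternative, not the answer's method) skips lines only until the first parseable row, and after that stores `Double.NaN` for any line that cannot be split into four numbers, so those rows can be told apart from real zeros with `Double.isNaN`. The class name and the initial capacity of 1024 are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;

class GrowingLoader {
    static double[][] load(Reader source) {
        double[][] cols = new double[4][1024]; // hypothetical initial capacity
        int n = 0;                             // rows stored so far
        boolean inData = false;                // false while still in the header
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // ignore blank lines, e.g. at the end of the file
                }
                double[] row = parseRow(line);
                if (row == null && !inData) {
                    continue; // still in the header: skip the line
                }
                inData = true;
                if (n == cols[0].length) {
                    for (int c = 0; c < 4; c++) {
                        cols[c] = Arrays.copyOf(cols[c], n * 2); // grow by doubling
                    }
                }
                for (int c = 0; c < 4; c++) {
                    // NaN marks a line in the data section that failed to parse.
                    cols[c][n] = (row == null) ? Double.NaN : row[c];
                }
                n++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (int c = 0; c < 4; c++) {
            cols[c] = Arrays.copyOf(cols[c], n); // trim to the rows actually stored
        }
        return cols;
    }

    // Returns the first four fields as doubles, or null if the line is not a data row.
    static double[] parseRow(String line) {
        String[] f = line.split(",");
        if (f.length < 4) {
            return null;
        }
        double[] row = new double[4];
        try {
            for (int c = 0; c < 4; c++) {
                row[c] = Double.parseDouble(f[c]);
            }
        } catch (NumberFormatException e) {
            return null;
        }
        return row;
    }
}
```

Doubling keeps the total copying proportional to the number of rows, so the growth cost stays small compared with reading and parsing the file.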

Can these things be done in fast-running code?


Second edit

So

"-{0,1}\\d+.\\d+,"  

repeats four times in the expression:

"-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,"  

Does

"-{0,1}\\d+.\\d+,"  

decompose into the following three statements:

"-{0,1}" means that a minus sign occurs zero or one times, while

"\\d+." means that the minus sign (or lack of minus sign) is followed by several digits of any value followed by a decimal point, so that finally

"\\d+," means that the decimal point is followed by several digits of any value?

If so, what about numbers in my data like "1.66667E-05," or "-8.06131E-05," ? I just scanned one of the input files, and (out of 3+ million 4-column rows) it contains 638 numbers that contain E, of which 5 were in the first column, and 633 were in the last column.
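Running the answer's single-field pattern against a few inputs shows how it actually behaves. In a Java regular expression `\\d+` means one or more digits, and an unescaped `.` matches any single character, not only a decimal point (a literal point would be written `"\\."` in a Java string literal). `FieldPatternDemo` is a hypothetical name.

```java
import java.util.regex.Pattern;

class FieldPatternDemo {
    // The single-field pattern from the answer, unchanged (note the unescaped '.').
    static final Pattern FIELD = Pattern.compile("-{0,1}\\d+.\\d+,");

    // True if the whole string s matches one field of the answer's pattern.
    static boolean fieldMatches(String s) {
        return FIELD.matcher(s).matches();
    }
}
```

With this pattern, `-0.0518799,` matches; `0,` does not (there must be digits on both sides of the `.`); `1.66667E-05,` does not (after `1.66667` the pattern needs a comma but finds `E`); and both `3087747,` and `12x34,` match, because `.` consumes a digit or the letter `x`. Any row containing a lone `0` or an E-notation value therefore fails the whole-line match.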


1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 8:16 am

    You could read the file line by line and, for every line, check with a regular expression (http://www.vogella.de/articles/JavaRegularExpressions/article.html) whether the line consists of exactly 4 comma-terminated numbers.
    If it does, you can split the line with String.split and fill the 4 arrays; otherwise you move on to the next line.
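Counting commas alone would not be enough on this file: besides the data rows, the header lines `min,CH2,CH4,CH41,` and `,3087747,3087747,3087747,` also contain exactly four commas, so the check also has to confirm that each field is a number, as the regular expression does. A small helper with a hypothetical name makes the count concrete:

```java
class CommaCounter {
    // Counts ',' characters without allocating (no split and no regex).
    static int countCommas(String line) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == ',') {
                count++;
            }
        }
        return count;
    }
}
```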

            // Needs: import java.io.*; import java.util.regex.*;
            public double[][] populateArray(int scan1Count){
                double[] outputArray1 = new double[scan1Count];
                double[] outputArray2 = new double[scan1Count];
                double[] outputArray3 = new double[scan1Count];
                double[] outputArray4 = new double[scan1Count];

                // Compile the pattern once rather than once per line.
                // Note: the unescaped '.' matches any character, and a lone "0" or an
                // E-notation value such as 1.66667E-05 makes the whole line fail to match.
                Pattern pattern = Pattern.compile("-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,-{0,1}\\d+.\\d+,");

                //Read File Line By Line (try-with-resources closes the reader)
                try (BufferedReader br = new BufferedReader(new FileReader("samedatafile.txt"))) {
                    String strLine;
                    int i = 0;
                    while ((strLine = br.readLine()) != null)   {
                          Matcher matcher = pattern.matcher(strLine);
                          if (matcher.matches()){
                              String[] split = strLine.split(",");
                              outputArray1[i] = Double.parseDouble(split[0]);
                              outputArray2[i] = Double.parseDouble(split[1]);
                              outputArray3[i] = Double.parseDouble(split[2]);
                              outputArray4[i] = Double.parseDouble(split[3]);
                          }
                          // i advances for every line, so a line that does not match
                          // leaves 0.0 at index i in all four arrays.
                          i++;
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
                double[][] temp = new double[4][];
                temp[0]= outputArray1;
                temp[1]= outputArray2;
                temp[2]= outputArray3;
                temp[3]= outputArray4;
                return temp;
            }
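A variant of the answer's expression that also accepts plain integers such as `0` and E-notation values such as `1.66667E-05` could escape the decimal point and make both the fraction and an exponent optional. This is a sketch assuming every data row has exactly four numbers, each followed by a comma, as in the sample; `ScientificRowPattern` is a hypothetical name. Because `Double.parseDouble` already parses strings like `-8.06131E-05`, only the matching step changes.

```java
import java.util.regex.Pattern;

class ScientificRowPattern {
    // One number: optional minus, digits, optional ".digits", optional exponent such as E-05.
    static final String NUMBER = "-?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?";
    // A data row: exactly four numbers, each followed by a comma.
    static final Pattern ROW = Pattern.compile("(?:" + NUMBER + ",){4}");

    static boolean isDataRow(String line) {
        return ROW.matcher(line).matches();
    }
}
```

Inside the answer's loop, `pattern` could be replaced by `ScientificRowPattern.ROW` with no other changes; the `{4}` quantifier expresses the four repetitions of the field pattern.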
    

Related Questions

I have large data files stored in S3 that I need to analyze. Each
I have data stored in three columns of Excel Column A: Serial Number Column
I have a large amount of data stored in an XML file, 173 MB
I would like to rename a large number of columns (column headers) to have
I am looking for a way to put large arrays of data (stored inside
Stored procedures are typically used for data validation or to encapsulate large, complex processing
For our application, we keep large amounts of data indexed by three integer columns
I have a large amount of data stored in a Collection . I would
I'm developing an application which needs to store large amounts of data. I cannot
I want to store a large amount of data onto my Arduino with a


© 2021 The Archive Base. All Rights Reserved