The Archive Base

Editorial Team
Asked: June 2, 2026

I have a large collection of tab separated text data in the form of

I have a large collection of tab-separated text data in the form DATE NAME MESSAGE. By large I mean 1.76 GB split across 1075 files. I need to extract the NAME field from every file. So far I have this:

    File f = new File(directory);
    File[] files = f.listFiles();
    // HashSet<String> all = new HashSet<String>();
    ArrayList<String> userCount = new ArrayList<String>();
    for (File file : files) {
        if (file.getName().endsWith(".txt")) {
            System.out.println(file.getName());
            BufferedReader in;
            try {
                in = new BufferedReader(new FileReader(file));
                String str;
                while ((str = in.readLine()) != null) {
                    // if (all.add(str)) {
                    userCount.add(str.split("\t")[1]);
                    // }

                    // if (all.size() > 500)
                    // all.clear();
                }
                in.close();
            } catch (IOException e) {
                System.err.println("Something went wrong: "
                        + e.getMessage());
            }
        }
    }

My program always fails with an out-of-memory error, even with -Xmx1700. I cannot go beyond that. Is there any way to optimize the code so that it can handle the ArrayList<String> of NAMEs?


1 Answer

  1. Editorial Team
    Added an answer on June 2, 2026 at 8:03 pm

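    Since you seem open to solutions other than Java, here’s an awk one-liner that should handle it. It keeps one counter per distinct name rather than one list entry per line, so memory use grows with the number of names, not with the size of the data.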

    awk -F'\t' '{sum[$2] += 1} END {for (name in sum) print name "," sum[name]}' *.txt
    

    Explanation:

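    -F'\t' - split each line into fields on tabs
    sum[$2] += 1 - increment the count stored under the second field (the name)
    END {...} - once all input has been read, print each name with its count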
    

    Associative arrays make this extremely succinct. Running it on a test file I created as follows:

    import random

    def main():
        # Emit 10,000 tab-separated DATE, NAME, MESSAGE lines with a random name each.
        names = ['Nick', 'Frances', 'Carl']
        for _ in range(10000):
            date = '2012-03-24'
            name = random.choice(names)
            message = 'asdf'
            print('%s\t%s\t%s' % (date, name, message))

    if __name__ == '__main__':
        main()
    

    I get the results:

    Carl,3388
    Frances,3277
    Nick,3335
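
If you would rather stay in Java, the same associative-array idea can be sketched with a Map in place of the ArrayList. This is a minimal sketch, not a drop-in replacement: the class and method names (NameCounter, nameOf, countInto, countDirectory) are made up for illustration, it assumes Java 8 or later and UTF-8 input, and unlike the awk version it skips lines that contain no tab.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; none of these names come from the question.
class NameCounter {

    // Returns the second tab-separated field, or null if the line has no tab.
    static String nameOf(String line) {
        int first = line.indexOf('\t');
        if (first < 0) {
            return null; // malformed line: no NAME field
        }
        int second = line.indexOf('\t', first + 1);
        return second < 0 ? line.substring(first + 1) : line.substring(first + 1, second);
    }

    // Streams lines from the reader and increments one counter per name,
    // so only the distinct names are kept in memory.
    static void countInto(BufferedReader reader, Map<String, Long> counts) {
        reader.lines().forEach(line -> {
            String name = nameOf(line);
            if (name != null) {
                counts.merge(name, 1L, Long::sum);
            }
        });
    }

    // Counts names across every .txt file in the given directory.
    // Assumption: the files are UTF-8; pass another charset if they are not.
    static Map<String, Long> countDirectory(Path directory) throws IOException {
        Map<String, Long> counts = new TreeMap<>(); // TreeMap keeps names sorted
        try (DirectoryStream<Path> files = Files.newDirectoryStream(directory, "*.txt")) {
            for (Path file : files) {
                try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    countInto(reader, counts);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path directory = Paths.get(args.length > 0 ? args[0] : ".");
        countDirectory(directory).forEach((name, count) -> System.out.println(name + "," + count));
    }
}
```

Run it with the data directory as the only argument, for example `java NameCounter /path/to/data`. If you only need the distinct names and not the counts, a HashSet<String> filled the same way should work too.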
    