The Archive Base

Editorial Team
Asked: May 28, 2026 at 2:16 am

I have a file containing vectors of data, where each row is a comma-separated list of values. How can I run k-means clustering on this data using Mahout? The example in the wiki mentions creating SequenceFiles, but I am not sure whether I need to convert my data in some way to obtain them.

1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 2:16 am

    I would recommend reading the entries from the CSV file manually, creating NamedVectors from them, and then using a sequence file writer to write the vectors to a sequence file. From there on, the KMeansDriver run method should know how to handle these files.

    Sequence files encode key-value pairs, so the key would be the ID of the sample (it should be a string, stored as a Text), and the value is a VectorWritable wrapper around the vector.

    Here is a simple code sample showing how to do this:

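        import java.util.LinkedList;
        import java.util.List;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.SequenceFile;
        import org.apache.hadoop.io.Text;
        import org.apache.mahout.math.DenseVector;
        import org.apache.mahout.math.NamedVector;
        import org.apache.mahout.math.VectorWritable;
    
        // The statements below belong inside a method that declares "throws IOException".
        // Build the vectors in memory; each NamedVector carries its sample ID as its name.
        List<NamedVector> vectors = new LinkedList<NamedVector>();
        vectors.add(new NamedVector(new DenseVector(new double[] {0.1, 0.2, 0.5}), "Item number one"));
    
        Configuration config = new Configuration();
        FileSystem fs = FileSystem.get(config);
        Path path = new Path("datasamples/data");
    
        // Write a SequenceFile from the vectors: key = name (Text), value = VectorWritable.
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, config, path, Text.class, VectorWritable.class);
        try {
            VectorWritable vec = new VectorWritable();
            for (NamedVector v : vectors) {
                vec.set(v);
                // Append the VectorWritable wrapper, not the raw NamedVector.
                writer.append(new Text(v.getName()), vec);
            }
        } finally {
            writer.close();
        }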
    

    Also, I would recommend reading chapter 8 of Mahout in Action. It gives more details on data representation in Mahout.
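
The sample above hard-codes a single vector. For the CSV file described in the question, each row first has to be turned into a double[] that can then be passed to the DenseVector constructor. Below is a minimal sketch of that parsing step using only the Java standard library. The class name CsvRows and the methods parseRow and parseRows are hypothetical names chosen for illustration, and the sketch assumes plain numeric fields with no header row, no quoting, and no missing values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Mahout): turns CSV text rows into double arrays.
class CsvRows {

    // Parses one comma-separated row, e.g. "0.1, 0.2,0.5", into a double[].
    // Assumes every field is a plain number; throws NumberFormatException otherwise.
    static double[] parseRow(String line) {
        String[] fields = line.split(",");
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    // Parses every non-blank line; blank lines are skipped rather than treated as errors.
    static List<double[]> parseRows(List<String> lines) {
        List<double[]> rows = new ArrayList<double[]>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            rows.add(parseRow(line));
        }
        return rows;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the lines of the CSV file.
        List<String> lines = Arrays.asList("0.1,0.2,0.5", "", "1.0, 2.0, 3.0");
        for (double[] row : parseRows(lines)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each resulting array could then be wrapped as in the sample above, for example by using the row index as the NamedVector name when the file has no ID column of its own. That naming scheme is an assumption, not something the original answer specifies.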


