Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6330531
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 24, 20262026-05-24T17:53:15+00:00 2026-05-24T17:53:15+00:00

Given data in the following format (tag_uri image_uri image_uri image_uri …), I need to

  • 0

Given data in the following format (tag_uri image_uri image_uri image_uri …), I need to turn them into Hadoop SequenceFile format for further processing by Mahout (e.g. clustering)

http://flickr.com/photos/tags/100commentgroup http://flickr.com/photos/34254318@N06/4019040356 http://flickr.com/photos/46857830@N03/5651576112
http://flickr.com/photos/tags/100faves http://flickr.com/photos/21207178@N07/5441742937
...

Before this I would turn the input into csv (or arff) as follows

http://flickr.com/photos/tags/100commentgroup,http://flickr.com/photos/tags/100faves,...
0,1,...
1,1,...
...

with each row describes one tag. Then the arff file is converted into a vector file used by mahout for further processing. I am trying to skip the arff generation part, and generate a sequenceFile instead. If I am not mistaken, to represent my data as a sequenceFile, I would need to store each row of the data with $tag_uri as key, then $image_vector as value. What is the proper way of doing this (if possible, can I have the tag_url for each row to be included in the sequencefile somewhere)?

Some references that I found, but not sure if they are relevant:

  1. Writing a SequenceFile
  2. Formatting input matrix for svd matrix factorization (can I store my matrix in this form?)
  3. RandomAccessSparseVector (considering I only list images that are assigned with a given tag instead of all the images in a line, is it possible to represent it using this vector?)
  4. SequenceFile write
  5. SequenceFile explanation
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-24T17:53:16+00:00Added an answer on May 24, 2026 at 5:53 pm

    You just need a SequenceFile.Writer, which is explained in your link #4. This lets you write key-value pairs to the file. What the key and value are depends on your use case, of course. It’s not at all the same for clustering versus matrix decomposition versus collaborative filtering. There’s not one SequenceFile format.

    Chances are that the key or value will be a Mahout Vector. The thing that knows how to write a Vector is VectorWritable. This is the class you would use to wrap a Vector and write it with SequenceFile.Writer.

    You would need to look at the job that will consume it to make sure you’re passing what it expects. For clustering, for example, I think the key is ignored and the value is a Vector.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Given the following schema / data / output how would I format a SQL
Given delimited data in the following format, how can I insert and delete columns?
Given a set of delimited data in the following format: 1|Star Wars: Episode IV
I have an odd linq subquery issue. Given the following data structure: Parents Children
Hey guys, given a data set in plain text such as the following: ==Events==
Given the following example data: Users +--------------------------------------------------+ | ID | First Name | Last
Given the following simple BST definition: data Tree x = Empty | Leaf x
How can you sort the following word by the given rule? Example data is
Given I have data like the following, how can I select and group by
Given the following table (how to format those correctly here?) primary secondary A a

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.