The Archive Base

Editorial Team
Asked: June 15, 2026

I have always read that Cassandra is good if your application changes frequently and features are added frequently.

That makes sense: since there is no fixed schema, you can add columns to rows as your needs change, instead of running an ALTER TABLE query that may freeze your database for hours on very large tables.

However, I have a hypothetical problem that I am not able to solve.
Let’s say I have:

CREATE COLUMN FAMILY Students
    with comparator='CompositeType(UTF8Type,UTF8Type)'
    and key_validation_class=UUIDType;

Each student has some generic columns (you know, meta:username, meta:password, meta:surname, etc.), plus each student may follow N courses. This N-N relationship is resolved using denormalization, adding N columns to each Student row (course:ID1, course:ID2).

On the other side, I may have a Courses CF, where each row contains the UUIDs of all the students following that course.

So I can ask “which courses are followed by XXX” and “which students follow course YYY”.
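
A minimal sketch of that layout, assuming plain Python dictionaries stand in for the two column families. The sample usernames, course IDs, and function names are made up for illustration and are not part of any Cassandra API:

```python
import uuid

# Hypothetical in-memory stand-ins for the two column families.
# Students: row key = student UUID, column names = (prefix, name) composites.
alice, bob = uuid.uuid4(), uuid.uuid4()
students = {
    alice: {("meta", "username"): "alice", ("course", "ID1"): "", ("course", "ID2"): ""},
    bob: {("meta", "username"): "bob", ("course", "ID2"): ""},
}
# Courses: row key = course ID, column names = UUIDs of the students following it.
courses = {
    "ID1": {alice: ""},
    "ID2": {alice: "", bob: ""},
}

def courses_followed_by(student_id):
    """Read one Students row and keep only its course:* columns."""
    return sorted(name for (prefix, name) in students[student_id] if prefix == "course")

def students_following(course_id):
    """Read one Courses row; every column name is a student UUID."""
    return set(courses[course_id])

print(courses_followed_by(alice))
print(students_following("ID2") == {alice, bob})
```

Each query touches exactly one row, which is why the second column family has to exist before the second question can be answered cheaply.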

The problem is: what if I didn’t create the second column family? Maybe at the time when the application was built, getting the students following a specific course wasn’t a requirement.

This is a simple example, but I believe it’s quite common. “With Cassandra you plan CFs in terms of queries instead of relationships”. I need that query now, while at first it wasn’t needed.

Given a table of students with thousands of entries, how would you fill the Courses CF? Is this a job for Hadoop, Pig or Hive? (I have never touched any of those, I am just guessing.)

1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 4:42 pm

    Pig (which uses the Hadoop integration) is well suited to this type of work, because you can not only read data but also write it back into Cassandra using CassandraStorage. It gives you parallel processing, so the job runs with minimal time and overhead. The alternative is to write something yourself that does the extraction and then writes the new CF.

    Here is a Pig example that computes averages from a set of data in one CF and outputs them to another:

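    -- Load every row of the input CF as (key, bag of (name, value) columns).
    rows = LOAD 'cassandra://HadoopTest/TestInput' USING CassandraStorage() AS (key:bytearray,cols:bag{col:tuple(name:chararray,value)});
    -- Flatten the column bags so that each record is a single (name, value) pair.
    columns = FOREACH rows GENERATE flatten(cols) AS (name,value);
    -- Collect the values that share a column name.
    grouped = GROUP columns BY name;
    vals = FOREACH grouped GENERATE group, columns.value AS values;
    -- Average the values for each column name and label the result 'Pig_Average'.
    avgs = FOREACH vals GENERATE group, 'Pig_Average' AS name, (long)SUM(values.value)/COUNT(values.value) AS average;
    -- Reshape into (key, bag of (name, value)) and write to the output CF.
    cass_group = GROUP avgs BY group;
    cass_out = FOREACH cass_group GENERATE group, avgs.(name, average);
    STORE cass_out INTO 'cassandra://HadoopTest/TestOutput' USING CassandraStorage();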
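
For the Students and Courses case from the question, the transformation is an inversion rather than an average. The sketch below shows only that logic in plain Python, assuming the Students rows have already been read into a dictionary. The function name `build_courses_cf`, the sample keys, and the empty-string column values are assumptions for illustration, not part of Pig or Cassandra:

```python
from collections import defaultdict

def build_courses_cf(students):
    """Invert Students rows into Courses rows.

    `students` maps a student key to {(prefix, name): value}; the result maps
    each course ID to {student key: ""}, one column per following student.
    """
    courses = defaultdict(dict)
    for student_id, columns in students.items():
        for (prefix, name) in columns:
            # Only the course:* columns describe the relationship.
            if prefix == "course":
                courses[name][student_id] = ""
    return dict(courses)

# Hypothetical sample data; real rows would come from a full scan of Students.
students = {
    "s1": {("meta", "username"): "alice", ("course", "ID1"): "", ("course", "ID2"): ""},
    "s2": {("meta", "username"): "bob", ("course", "ID2"): ""},
}
courses = build_courses_cf(students)
print(sorted(courses["ID2"]))  # ['s1', 's2']
```

The inner loop emits (course, student) pairs and the dictionary collects them per course. That is the same flatten-then-group shape as the Pig script, which is what lets a Pig or Hadoop job spread the work across many rows in parallel.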
    
© 2021 The Archive Base. All Rights Reserved