The Archive Base

Editorial Team
Asked: June 18, 2026


I am using Cassandra to store my data and Hive to process it.
I have 5 machines on which I have set up Cassandra, and 2 machines that I use as analytics nodes (where Hive runs).
So my question is: does Hive run MapReduce on just the two analytics nodes and bring the data there, or does it also move the processing/computation to the 5 Cassandra nodes and process the data on those machines? (What I know is that in Hadoop, the processing moves to the data, not the data to the processing.)
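The distinction in the last sentence can be made concrete with a toy model. The sketch below is plain Python, not Hive or Hadoop code; the `Node` class, both functions, and the cluster sizes are hypothetical. It counts how many records cross the network when raw rows are shipped to a separate analytics node, versus when each storage node pre-aggregates its own rows and ships only partial sums.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical storage node holding (key, value) rows locally."""
    name: str
    rows: list[tuple[str, int]]


def ship_data_to_compute(nodes):
    """Data moves to the processing: every raw row is sent over the network
    to a separate analytics node, which then does all of the aggregation.
    Returns (totals per key, number of records sent over the network)."""
    sent = 0
    totals = {}
    for node in nodes:
        for key, value in node.rows:
            sent += 1  # each raw row crosses the network
            totals[key] = totals.get(key, 0) + value
    return totals, sent


def ship_compute_to_data(nodes):
    """Processing moves to the data: each storage node aggregates its own
    rows locally (like a map task with a combiner), and only the small
    per-key partial sums are sent on for the final merge."""
    sent = 0
    totals = {}
    for node in nodes:
        partial = {}
        for key, value in node.rows:  # this loop runs "on" the storage node
            partial[key] = partial.get(key, 0) + value
        sent += len(partial)  # only one record per key leaves the node
        for key, value in partial.items():
            totals[key] = totals.get(key, 0) + value
    return totals, sent


if __name__ == "__main__":
    # 5 hypothetical Cassandra nodes, 1000 rows each, spread over 3 keys.
    cluster = [
        Node(f"cassandra-{i}", [(f"k{j % 3}", 1) for j in range(1000)])
        for i in range(5)
    ]
    remote_totals, remote_sent = ship_data_to_compute(cluster)
    local_totals, local_sent = ship_compute_to_data(cluster)
    print(remote_totals == local_totals, remote_sent, local_sent)  # True 5000 15
```

Both strategies produce the same totals; only the amount of network traffic differs, which is the practical meaning of "moving computation to data".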


1 Answer

  1. Editorial Team
    Answered June 18, 2026 at 5:49 pm

    If you are interested in combining Hadoop and Cassandra, the first place to look should be DataStax, a company built around this concept: http://www.datastax.com/
    They built and support a Hadoop distribution in which HDFS is replaced with Cassandra.
    To the best of my understanding, they do have data locality: http://blog.octo.com/en/introduction-to-datastax-brisk-an-hadoop-and-cassandra-distribution/
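To illustrate what data locality means for the 5 + 2 layout in the question, here is a simplified locality-first scheduler. It is a hypothetical Python sketch, not Hadoop's or DataStax's actual scheduling code; the split layout (10 token-range splits, replication factor 3) is an assumption chosen for the example. A task counts as local only when it runs on a node that stores a replica of its split, so if only the analytics nodes run tasks, none of them are local.

```python
def assign_splits(splits, workers):
    """Greedy, locality-first assignment of input splits to workers.

    A simplified illustration, not Hadoop's or DataStax's real scheduler.
    splits:  dict mapping split id -> set of node names holding a replica
    workers: list of node names that are allowed to run map tasks
    Returns a dict mapping split id -> (chosen worker, ran_locally).
    """
    load = {w: 0 for w in workers}
    assignment = {}
    for split_id, replicas in splits.items():
        # Prefer workers that already store this split's data.
        local = [w for w in workers if w in replicas]
        candidates = local if local else workers
        # Among the candidates, pick the least-loaded one (first wins ties).
        chosen = min(candidates, key=lambda w: load[w])
        load[chosen] += 1
        assignment[split_id] = (chosen, bool(local))
    return assignment


def local_fraction(assignment):
    """Share of splits that were read from a local replica."""
    return sum(is_local for _, is_local in assignment.values()) / len(assignment)


if __name__ == "__main__":
    cassandra = [f"c{i}" for i in range(5)]
    analytics = ["a0", "a1"]
    # Hypothetical layout: 10 token-range splits, replication factor 3.
    splits = {s: {cassandra[(s + r) % 5] for r in range(3)} for s in range(10)}
    print(local_fraction(assign_splits(splits, analytics)))  # 0.0
    print(local_fraction(assign_splits(splits, cassandra)))  # 1.0
```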

    There is a good answer about Hadoop and Cassandra data locality when you run MapReduce against Cassandra:
    "Cassandra and MapReduce - minimal setup requirements"

    Regarding your question, there is a trade-off:
    a) If you run Hadoop/Hive on separate nodes, you lose data locality, and therefore your data throughput is limited by your network bandwidth.
    b) If you run Hadoop/Hive on the same nodes as Cassandra, you can get data locality, but the MapReduce processing behind Hive queries might clog your network (and other resources) and therefore degrade the quality of service Cassandra provides.
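Point (a) can be put into rough numbers. The function below is a back-of-envelope Python sketch; the disk and network figures in the example (200 MB/s per disk, 1 Gbit/s NICs of about 125 MB/s, a 100 GB table) are assumptions, not measurements. It deliberately ignores CPU, the shuffle phase, and contention with Cassandra's own traffic, so it captures the bandwidth ceiling from (a) but not the interference cost from (b).

```python
def scan_seconds(data_mb, storage_nodes, disk_mb_s, nic_mb_s, analytics_nodes=None):
    """Rough lower bound on the time to scan a table once (illustrative only).

    analytics_nodes=None models Hive/Hadoop co-located with Cassandra:
    each node reads its own data from local disk.
    Otherwise every byte must leave a storage node and enter one of the
    analytics nodes, so the slowest of disks, sender NICs and receiver NICs wins.
    Ignores CPU, shuffle, replication overhead and contention with live traffic.
    """
    disk_rate = storage_nodes * disk_mb_s
    if analytics_nodes is None:
        rate = disk_rate
    else:
        rate = min(disk_rate, storage_nodes * nic_mb_s, analytics_nodes * nic_mb_s)
    return data_mb / rate


if __name__ == "__main__":
    # Hypothetical figures: 100 GB table, 200 MB/s per disk, ~125 MB/s NICs.
    print(scan_seconds(100_000, 5, 200, 125, analytics_nodes=2))  # 400.0 (separate)
    print(scan_seconds(100_000, 5, 200, 125))                     # 100.0 (co-located)
```

With these assumed figures, the two analytics nodes' network links (2 × 125 MB/s) become the bottleneck, making a full scan roughly four times slower than reading locally on all five nodes.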

    My suggestion would be to use separate Hive nodes if the performance of your Cassandra cluster is critical.
    If your Cassandra cluster is mostly used as a data store and does not handle real-time requests, then running Hive on each node will improve performance and hardware utilization.


Related Questions

I am using Cassandra to store my parsed site logs. I have two column
I am using Node.js with Helenus to connect to a Cassandra DB. I have
I have an exception occurred, using cassandra with thrift. I want to insert some
If you are using the Cassandra distributed key-value store, you will have several Cassandra
I have a SessionDaoCassandraImpl class that reads data from Cassandra using Astyanax that I
I'm using Ruby on Rails and I want to store images into the Cassandra
I am using cassandra-jdbc to perform operations on data in Cassandra but when
I'm getting NullPointerException :s when using sstable2json in Cassandra 0.6.0-beta3: $ bin/sstable2json .../cassandra/data/system/LocationInfo-1-Data.db Exception
Can Cassandra use multiple native secondary indexes when querying on multiple indexed columns? using
I started using Cassandra and I want to index my DB with Sphinx.


© 2021 The Archive Base. All Rights Reserved