The Archive Base

Editorial Team
Asked: June 6, 2026

I am running a simple join query select count(*) from t1 join t2 on

I am running a simple join query:

 select count(*) from t1 join t2 on t1.sno=t2.sno 

Tables t1 and t2 each have 20 million records, and column sno is of string data type.

The table data was imported into HDFS from Amazon S3 in RCFile format.
The query took 109 seconds on 15 Amazon large instances, yet it takes 42 seconds on SQL Server with 16 GB RAM and 16 CPU cores.

Am I missing anything? I can't understand why I am getting such slow performance on Amazon.

1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 4:57 pm

    Some questions to help you tune Hadoop performance:

    • What does your I/O utilization look like on those instances? Large instances may not offer the right balance of CPU, disk, and memory for the job.
    • How are your files stored? Is it a single file, or many small files? Hadoop handles many small files poorly, even if they are combinable.
    • How many reducers did you run? Ideally you want about 0.9 * totalReduceCapacity, where totalReduceCapacity is the total number of reduce slots across the cluster.
    • How skewed is your data? All records with the same key go to the same reducer, so if many records share one key, that reducer has an O(n*n) upper bound on its work if you're not careful.
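
The reducer and skew checks above can be roughed out with a little arithmetic. The following is only a sketch in Python, not anything Hive or Hadoop provides: the function names, the 2 reduce slots per node, and the sample keys are all assumptions made up for illustration. It computes the 0.9 rule of thumb for a cluster and shows how many rows an inner join emits per key, which is where the O(n*n) behaviour on a hot key comes from.

```python
from collections import Counter


def suggested_reducers(nodes, reduce_slots_per_node, factor=0.9):
    """Rule of thumb from the answer: about 0.9 * total reduce capacity."""
    total_reduce_capacity = nodes * reduce_slots_per_node
    return max(1, round(total_reduce_capacity * factor))


def join_rows_per_key(left_keys, right_keys):
    """Rows an inner join emits for each key: count_left(k) * count_right(k)."""
    left, right = Counter(left_keys), Counter(right_keys)
    return {k: left[k] * right[k] for k in left.keys() & right.keys()}


# Hypothetical cluster: 15 nodes with 2 reduce slots each (assumed values).
print(suggested_reducers(15, 2))  # 27

# Made-up sample of sno values: key "a" is heavily repeated on both sides.
t1_sample = ["a"] * 1000 + ["b", "c", "d"]
t2_sample = ["a"] * 1000 + ["b", "c", "e"]
rows = join_rows_per_key(t1_sample, t2_sample)
print(rows["a"], rows["b"])  # 1000000 1
```

If sno is unique in both tables, every key produces one row and the work spreads evenly across reducers. If a single value such as an empty string repeats many times on both sides, one reducer ends up doing most of the join.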

    SQL Server might be fine with 40 million records, but wait until you have 2 billion records and see how it does. It will probably just break. I'd see Hive more as a clever wrapper for MapReduce than as an alternative to a real database.

    Also, from experience, I think 15 c1.medium instances might perform just as well as the large machines, if not better. Honestly, the large machines don't have the right balance of CPU and memory.
