Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8616747
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 12, 20262026-06-12T05:41:47+00:00 2026-06-12T05:41:47+00:00

From what I understood, Hadoop is a distributed storage system thingy. However what I

  • 0

From what I understood, Hadoop is a distributed storage system thingy. However what I don’t really get is, can we replace normal RDBMS(MySQL, Postgresql, Oracle) with Hadoop? Or is Hadoop is just another type of filesystem and we CAN run RDBMS on it?

Also, can Django integrated with Hadoop? Usually, how web frameworks (ASP.NET, PHP, Java(JSP,JSF, etc) ) integrate themselves with Hadoop?

I am a bit confused with the Hadoop vs RDBMS and I would appreciate any explanation.
(Sorry, I read the documentation many times, but maybe due to my lack of knowledge in English, I find the documentation is a bit confusing most of the time)

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-12T05:41:48+00:00Added an answer on June 12, 2026 at 5:41 am

    What is Hadoop?

    Imagine the following challange: you have a lot of data, and with a lot I mean at least Terabytes. You want to transform this data or extract some informations and process it into a format which is indexed, compressed or “digested” in a way so you can work with it.

    Hadoop is able to parallelize such a processing job and, here comes the best part, takes care of things like redundant storage of the files, distribution of the task over different machines on the cluster etc (Yes, you need a cluster, otherwise Hadoop is not able to compensate the performance loss of the framework).

    If you take a first look at the Hadoop ecosystem you will find 3 big terms: HDFS(Hadoop Filesystem), Hadoop itself(with MapReduce) and HBase(the “database” sometimes column store, which does not fits exactly)

    HDFS is the Filesystem used by both Hadoop and HBase. It is a extra layer on top of the regular filesystem on your hosts. HDFS slices the uploaded Files in chunks (usually 64MB) and keeps them available in the cluster and takes care of their replication.

    When Hadoop gets a task to execute, it gets the path of the input files on the HDFS, the desired output path, a Mapper and a Reducer Class. The Mapper and Reducer is usually a Java class passed in a JAR file.(But with Hadoop Streaming you can use any comandline tool you want). The mapper is called to process every entry (usually by line, e.g.: “return 1 if the line contains a bad F* word”) of the input files, the output gets passed to the reducer, which merges the single outputs into a desired other format (e.g: addition of numbers). This is a easy way to get a “bad word” counter.

    The cool thing: the computation of the mapping is done on the node: you process the chunks linearly and you move just the semi-digested (usually smaller) data over the network to the reducers.

    And if one of the nodes dies: there is another one with the same data.

    HBase takes advantage of the distributed storage of the files and stores its tables, splitted up in chunks on the cluster. HBase gives, contrary to Hadoop, random access to the data.

    As you see HBase and Hadoop are quite different to RDMBS. Also HBase is lacking of a lot of concepts of RDBMS. Modeling data with triggers, preparedstatements, foreign keys etc. is not the thing HBase was thought to do (I’m not 100% sure about this, so correct me 😉 )

    Can Django integrated with Hadoop?

    For Java it’s easy: Hadoop is written in Java and all the API’s are there, ready to use.

    For Python/Django I don’t know (yet), but I’m sure you can do something with Hadoop streaming/Jython as a last resort.
    I’ve found the following: Hadoopy and Python in Mappers and Reducers.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

It is understood that if you wish to get a container from an item
Maybe you understood from the title however here I go: I have a file,
From the little I have read, I understood that Hadoop is great for the
as I understood from ASP.net MVC 4 Release notes, is that it has Content
I just learnt about generators in Python a week back. From what I understood,
From what I have understood, it is not the best way to open a
Being a newbie I thought I understood what to do from a security standpoint
As you can understand from title, i modified Settings.cs file.Added some properties, some code
I found that Maven implies specific directory layout. But I don't understand from here:
I'm trying to implement this protocol: http://en.wikipedia.org/wiki/Chord_(peer-to-peer ) What i understood from it is

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.