The Archive Base


Editorial Team
Asked: June 4, 2026


In my company we are developing MapReduce applications on Hadoop. There is a debate going on over dependency management for these projects and I would like to hear your opinion.

We are using Cloudera’s Hadoop distribution (CDH).

Our development workflow:

  • each MapReduce project is hosted in an SVN repo
  • each of them has a POM file with dependencies defined (and some other stuff too)
  • we also create Oozie workflow projects, which have these MapReduce projects defined as dependencies in their POM and which are responsible for defining the execution flow of the MapReduce projects
  • the build artifact of an Oozie project is a jar file containing all the MapReduce jars it uses and their dependencies (we use Maven’s assembly plugin to package it); this is the artifact we later deploy to HDFS (after decompressing)
  • we build the projects with Maven, managed by Jenkins
  • successful builds get deployed to an Archiva server
  • deployment to HDFS is on demand from Archiva: we get the artifact of the Oozie project build, extract it and put it into HDFS
  • some dependencies (namely the ones used by Oozie: Hive, Sqoop, MySQL connector, JLine, commons-…, etc.) are not needed for building the projects, but they are needed for them to work

Still with me?

Now the debate is about defining these dependencies of MapReduce and Oozie projects. There are two standpoints.

One says there is no need to define these dependencies (i.e. the ones not needed to build the projects) in the POM files; instead, keep them in a shared directory in HDFS and always assume they are there.

Pros:

  • devs don’t need to take care of these (however, they still take care of some others)
  • most likely, when updating the CDH distribution, it’s easier to update these in the shared directory than in each project individually (not sure if this is necessary though)

Cons:

  • some dependencies are defined for the projects, some are assumed which doesn’t feel right
  • the shared directory can become a sink of unused JARs, and no one will know which are still used and which are not
  • code becomes less portable because it assumes these JARs are always there in HDFS with the right version

So what do you guys think?

EDIT: I forgot to write it, though it’s quite obvious: the 2nd option is to define all dependencies, even if they repeat across most projects and need some maintenance.


1 Answer

  1. Editorial Team
    Added an answer on June 4, 2026 at 2:04 am

    I vote for the second option: handle and maintain the dependencies in each project instead of in a shared directory. The problem is that the shared directory will change over time, and eventually some project will stop working because someone removed a dependency it needed. So it’s better to keep the dependencies in the POM of the project they are intended for. Furthermore, every project will then run out of the box, independent of the current state of the shared directory.
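
    As a rough sketch of what this could look like for the libraries the question describes as needed only at execution time, such a library can be declared in the project's own POM with Maven's `runtime` scope. The MySQL connector coordinates are the commonly published ones; the `mysql.connector.version` property is a hypothetical name introduced for illustration, not something from the question.

    ```xml
    <!-- Sketch only: a runtime-only library declared in the project's own pom.xml -->
    <dependencies>
      <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <!-- hypothetical property; define it to match the version your cluster needs -->
        <version>${mysql.connector.version}</version>
        <!-- runtime: not on the compile classpath, but part of the runtime classpath -->
        <scope>runtime</scope>
      </dependency>
    </dependencies>
    ```

    Declared this way, the library is resolved by Maven like any other dependency and, depending on how the assembly descriptor is configured, can be packaged into the artifact instead of being assumed to exist in HDFS.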

    You might think about a parent POM that contains the default dependencies to be used. This can be done by defining them in a dependencyManagement section; each project then declares the dependencies it actually uses, without versions.
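
    A minimal sketch of the parent POM idea, under stated assumptions: the parent coordinates `com.example:mapreduce-parent` and the `hadoop.version` property are hypothetical, and `hadoop-client` merely stands in for whatever libraries the projects share.

    ```xml
    <!-- parent pom.xml (hypothetical coordinates) -->
    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.example</groupId>
      <artifactId>mapreduce-parent</artifactId>
      <version>1.0</version>
      <packaging>pom</packaging>

      <properties>
        <!-- placeholder value: set to the version matching your CDH release -->
        <hadoop.version>SET-ME</hadoop.version>
      </properties>

      <!-- versions are fixed here, once, for all child projects -->
      <dependencyManagement>
        <dependencies>
          <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
          </dependency>
        </dependencies>
      </dependencyManagement>
    </project>
    ```

    A child project would then reference the parent and list the dependency without a version:

    ```xml
    <!-- fragment of a child pom.xml -->
    <parent>
      <groupId>com.example</groupId>
      <artifactId>mapreduce-parent</artifactId>
      <version>1.0</version>
    </parent>

    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <!-- no <version>: inherited from the parent's dependencyManagement -->
      </dependency>
    </dependencies>
    ```
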
    Another solution might be to use the import scope in the dependencyManagement section:

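    <dependencyManagement>
      <dependencies>
        <!-- import scope is only valid here, on a dependency of type pom -->
        <dependency>
          <groupId>yourGroupId</groupId>
          <artifactId>YourArtifactId</artifactId>
          <version>1.0</version>
          <type>pom</type>
          <scope>import</scope>
        </dependency>
      </dependencies>
    </dependencyManagement>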
    

    This makes it possible to maintain a defined set of dependencies in one place, the single POM project responsible for them, instead of in every project.
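
    For illustration, the single POM project that gets imported might look roughly like the sketch below. Its coordinates (`com.example:hadoop-deps`) and the `hive.version` property are assumptions made for this example; the import snippet's groupId, artifactId and version would need to point at whatever coordinates you actually choose.

    ```xml
    <!-- Sketch of the shared "dependencies" POM; its own coordinates are hypothetical -->
    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.example</groupId>
      <artifactId>hadoop-deps</artifactId>
      <version>1.0</version>
      <packaging>pom</packaging>

      <properties>
        <!-- placeholder value: set to the version matching your CDH release -->
        <hive.version>SET-ME</hive.version>
      </properties>

      <dependencyManagement>
        <dependencies>
          <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>${hive.version}</version>
            <scope>runtime</scope>
          </dependency>
        </dependencies>
      </dependencyManagement>
    </project>
    ```

    Note that importing only brings in the managed versions and scopes. A project that needs `hive-exec` still lists its groupId and artifactId under its own `<dependencies>`, just without a version.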

© 2021 The Archive Base. All Rights Reserved