The Archive Base

Editorial Team
Asked: May 23, 2026

I am new to the Hadoop MapReduce framework, and I am thinking of using

I am new to the Hadoop MapReduce framework, and I am thinking of using it to parse my data. I have thousands of large delimited files, and I would like to write a MapReduce job that parses them and loads them into a Hive data warehouse. I have already written a parser in Perl that can handle these files, but I am stuck on doing the same with Hadoop MapReduce.

For example: I have a file like
x=a y=b z=c…..
x=p y=q z=s…..
x=1 z=2 ….
and so on

Now I have to load this file as columns (x, y, z) in a Hive table, but I cannot figure out how to proceed. Any guidance would be really helpful.

Another problem is that in some files the field y is missing, and the MapReduce job has to handle that case. So far, I have tried using streaming.jar with my parser.pl as the mapper. I do not think that is the right way to do it :), but I wanted to see whether it would work. I also thought of using Hive's LOAD function, but the missing column will cause problems if I specify RegexSerDe on the Hive table.

I am lost at this point. If anyone could guide me, I would be thankful 🙂

Regards,
Atul

1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 4:59 pm

    I posted something about this on my blog a while ago. (Googling “hive parse_url” should turn it up in the top few results.)

    I was parsing URLs there, but in this case you will want to use str_to_map.

    str_to_map(arg1, arg2, arg3)

    • arg1 => String to process
    • arg2 => Key-value pair separator
    • arg3 => Key-value separator

    str = "a=1 b=42 x=abc"
    str_to_map(str, " ", "=")
    

    The result of str_to_map is a map<string, string> with 3 key-value pairs.
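To make the splitting behaviour concrete, here is a rough Python stand-in for str_to_map. It is only a sketch of the semantics described above, not Hive's implementation: it uses plain string splitting, skips empty tokens, and stores None for a token that has no key-value separator. Those choices are assumptions of this sketch.

```python
def str_to_map(text, pair_sep=",", kv_sep=":"):
    """Approximate Hive's str_to_map with plain string splitting (illustration only)."""
    result = {}
    for pair in text.split(pair_sep):
        if not pair:
            continue  # skip empty tokens left by repeated separators (sketch's choice)
        key, sep, value = pair.partition(kv_sep)
        # A token without the key-value separator gets no value in this sketch.
        result[key] = value if sep else None
    return result


params = str_to_map("a=1 b=42 x=abc", " ", "=")
print(params)
```

The printed dictionary has the three keys a, b and x, each mapped to a string value.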

    str_to_map(str, " ", "=")["a"] --will return "1"
    
    str_to_map(str, " ", "=")["b"] --will return "42"
    

    We can use this in a Hive query to fill the target table:

    -- data is assumed to hold one unparsed line per row in the raw_line column
    INSERT OVERWRITE TABLE new_table_with_cols_x_y_z
    SELECT params["x"], params["y"], params["z"]
    FROM (
      SELECT str_to_map(raw_line, " ", "=") AS params FROM data
    ) raw_line_from_data;
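The missing-field case from the question needs no special handling with this approach, because looking up a key that is absent from a Hive map returns NULL. The Python sketch below mimics that projection on the sample lines from the question. The names COLUMNS and line_to_row are hypothetical and exist only for this illustration.

```python
COLUMNS = ("x", "y", "z")  # hypothetical column order of the target table


def line_to_row(raw_line, columns=COLUMNS):
    """Turn one 'k=v k=v ...' line into a fixed-width row, with None for absent keys."""
    params = dict(
        token.split("=", 1)          # split each token into key and value once
        for token in raw_line.split()
        if "=" in token              # ignore tokens that are not key=value pairs
    )
    # dict.get returns None for a missing key, standing in for Hive's NULL here.
    return tuple(params.get(col) for col in columns)


sample_lines = ["x=a y=b z=c", "x=p y=q z=s", "x=1 z=2"]
rows = [line_to_row(line) for line in sample_lines]
for row in rows:
    print(row)
```

The third sample line has no y field, so its row carries None in the second position, which corresponds to a NULL in the y column of the Hive table.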
    