I am new to the Hadoop MapReduce framework, and I am thinking of using


Asked: May 23, 2026 (2026-05-23T16:59:48+00:00)


I am new to the Hadoop MapReduce framework, and I am thinking of using it to parse my data. I have thousands of large delimited files, and I want to write a MapReduce job that parses them and loads them into a Hive data warehouse. I have already written a parser in Perl that can handle these files, but I am stuck on doing the same with Hadoop MapReduce.

For example, I have a file like this:
x=a y=b z=c…..
x=p y=q z=s…..
x=1 z=2 ….
and so on

Now I have to load this file as columns (x, y, z) in a Hive table, but I cannot figure out how to proceed. Any guidance would be really helpful.

Another problem is that in some files the field y is missing, and the MapReduce job has to handle that case. So far, I have tried using streaming.jar with my parser.pl as the mapper. I don't think that is the right way to do it :), but I wanted to see whether it would work. I also thought of using Hive's load function, but the missing column will cause problems if I specify RegexSerDe on the Hive table.

I am lost at this point. If anyone could guide me, I would be thankful 🙂

Regards,
Atul


1 Answer

Editorial Team · Answer 1 · 2026-05-23T16:59:49+00:00

I posted something about this on my blog a while ago. (Googling “hive parse_url” should bring it up in the top few results.)

I was parsing URLs, but in this case you will want to use str_to_map.

str_to_map(arg1, arg2, arg3)

arg1 => String to process
arg2 => Key Value Pair separator
arg3 => Key Value separator

str = "a=1 b=42 x=abc"
str_to_map(str, " ", "=")

The result of str_to_map is a map<string,string> with 3 key-value pairs.

str_to_map(str, " ", "=")["a"] --will return "1"

str_to_map(str, " ", "=")["b"] --will return "42"
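To see why this also covers the lines where y is missing, here is a rough Python sketch of the same idea. It is only an illustration, not Hive's implementation: the functions str_to_map and to_columns below are hypothetical stand-ins written for this example, and skipping empty pieces is a simplification of the sketch. A key that is absent from the line comes back as None, which corresponds to the NULL that Hive returns when you look up a key that is not in the map.

```python
def str_to_map(text, pair_sep=" ", kv_sep="="):
    """Hypothetical stand-in for Hive's str_to_map, for illustration only."""
    result = {}
    for pair in text.split(pair_sep):
        if not pair:
            continue  # skip empty pieces (a simplification in this sketch)
        key, sep, value = pair.partition(kv_sep)
        # A piece without the key-value separator gets no value.
        result[key] = value if sep else None
    return result


def to_columns(line, columns=("x", "y", "z")):
    """Pick the wanted keys in a fixed order; absent keys become None."""
    params = str_to_map(line)
    return tuple(params.get(col) for col in columns)


print(to_columns("x=a y=b z=c"))  # all three fields present
print(to_columns("x=1 z=2"))      # y is missing on this line
```

Because the lookup is by key rather than by position, a line without y still puts x and z in the right columns.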

We can pass this to Hive via:

-- params["y"] is NULL for lines that have no y field
INSERT OVERWRITE TABLE new_table_with_cols_x_y_z
SELECT params["x"], params["y"], params["z"]
FROM (
  SELECT str_to_map(raw_line, " ", "=") AS params FROM data
) raw_line_from_data
