I am a Hadoop/PIG beginner. Could anyone please tell me the difference between grunt>

Editorial Team

Asked: June 5, 2026 (2026-06-05T15:28:21+00:00)

I am a Hadoop/PIG beginner.

Could anyone please tell me the difference between

grunt> A = join A by $1, B by $1 using 'merge';

and

grunt> A = join A by $1, B by $1;

I have two files, 1.txt and 2.txt, which contain the following data:

1.txt
A 1
B 3
C 5
D 7

2.txt
AA 1
BB 2
CC 4
DD 6

I want the output merged together like this:
A 1
AA 1
BB 2
B 3
CC 4
C 5
DD 6
D 7

Will using 'merge' give me the desired output?

I tried it, but it does not.

Can you let me know what I am missing here?

1 Answer

Editorial Team · Answer 1 · 2026-06-05T15:28:22+00:00

Sounds like you are getting an inner join (datasets joined by a common key) rather than an outer join (which is what it looks like you are after from your desired output).

Use the keyword FULL to signify that you want a full outer join:

grunt> A = join A by $1 FULL, B by $1 using 'merge';

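However, this may yield unexpected results if a record in each dataset has the same join key ($1), as with key 1 in your sample data (see the example for inner join): the join combines the two records into a single row instead of keeping them as separate rows. You may also need to post-process the output to drop the null fields that the outer join produces for records without a match.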
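As a rough illustration of that caveat, the sketch below loads the two sample files and runs a full outer join with the default join algorithm. The aliases J, name and id, the space delimiter, and the relative file paths are assumptions made for this example and do not come from the question. USING 'merge' is left out on purpose: it only selects a join algorithm that expects both inputs to be sorted on the join key already, and it does not change which rows get joined.

```pig
-- Assumption: both files are space-delimited and sit in the current working directory.
A = LOAD '1.txt' USING PigStorage(' ') AS (name:chararray, id:int);
B = LOAD '2.txt' USING PigStorage(' ') AS (name:chararray, id:int);

-- Full outer join on the second field; J is a hypothetical alias.
J = JOIN A BY id FULL OUTER, B BY id;

-- Each output tuple has four fields: A's name and id, then B's name and id.
DUMP J;
```

With the sample data, key 1 appears in both files, so it should come back as one four-field tuple rather than two separate rows. Every other key should appear once, with nulls on the side that has no match.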

Alternatively, if you just want to append one dataset to the other and then sort the result, use the UNION and ORDER BY operators:

grunt> U = UNION A, B;
grunt> OrderedU = ORDER U BY $1;

See

for more information about each
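For completeness, here is a minimal end-to-end sketch of the UNION and ORDER BY approach on the sample files. The field names name and id, the space delimiter, and the file paths are assumptions for illustration. The secondary sort on name is also an addition: UNION does not preserve input order, and sorting on the key alone leaves the relative order of rows with equal keys undefined.

```pig
-- Assumption: both files are space-delimited and sit in the current working directory.
A = LOAD '1.txt' USING PigStorage(' ') AS (name:chararray, id:int);
B = LOAD '2.txt' USING PigStorage(' ') AS (name:chararray, id:int);

-- Append the two relations; both share the same two-field schema.
U = UNION A, B;

-- Sort by the numeric key, then by name so that rows with equal keys have a stable order.
OrderedU = ORDER U BY id, name;

DUMP OrderedU;
```

On the sample data this should list the rows in the interleaved order asked for in the question, with A before AA for the shared key 1.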
