The Archive Base

Editorial Team
Asked: June 15, 2026

I receive data in the form

id1|attribute1a,attribute1b|attribute2a|attribute3a,attribute3b,attribute3c....
id2||attribute2b,attribute2c|..

I’m trying to merge it all into a form where I just have a bag of tuples of an id field followed by a tuple containing a list of all my other fields merged together.

(id1,(attribute1a,attribute1b,attribute2a,attribute3a,attribute3b,attribute3c…))
(id2,(attribute2b,attribute2c…))

Currently I fetch it like

my_data = load '$input' USING PigStorage('|') as
(id:chararray, attribute1:chararray, attribute2:chararray)...

Then I’ve tried all combinations of FLATTEN, TOKENIZE, GENERATE, TOTUPLE, BagConcat, etc. to massage it into the form I want, but I’m new to Pig and just can’t figure it out. Can anyone help? Any open source UDF libraries are fair game.


1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 2:56 pm

    Load each line as a single string, then use the built-in STRSPLIT UDF to get the desired result. This relies on there being no tabs in your list of attributes (the default loader splits fields on tabs), and assumes that | and , do not need to be treated differently when separating the attributes. I also modified your input slightly to show more edge cases.

    input.txt:

    id1|attribute1a,attribute1b|attribute2a|,|attribute3a,attribute3b,attribute3c
    id2||attribute2b,attribute2c,|attribute4a|,attribute5a
    

    test.pig:

    my_data = LOAD '$input' AS (str:chararray);
    split1 = FOREACH my_data GENERATE FLATTEN(STRSPLIT(str, '\\|', 2)) AS (id:chararray, attr:chararray);
    split2 = FOREACH split1 GENERATE id, STRSPLIT(attr, '[,|]') AS attributes;
    DUMP split2;
    

    Output of pig -x local -p input=input.txt test.pig:

    (id1,(attribute1a,attribute1b,attribute2a,,,attribute3a,attribute3b,attribute3c))
    (id2,(,attribute2b,attribute2c,,attribute4a,,attribute5a))
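
To sanity-check the expected tuples without a Pig installation, the two splits can be mimicked in a few lines of Python. This is only a rough analogue, not Pig code: the function name `split_record` and the `drop_empty` flag are made up for illustration, and edge cases such as trailing delimiters may behave differently under STRSPLIT. The `drop_empty` option shows one way to get the form without empty fields that the question asks for.

```python
import re


def split_record(line, drop_empty=False):
    """Mimic the two STRSPLIT steps: split off the id, then the attributes."""
    # Step 1: split only on the first '|', like STRSPLIT(str, '\\|', 2).
    record_id, _, rest = line.partition("|")
    # Step 2: treat ',' and '|' as interchangeable separators, like '[,|]'.
    attributes = re.split(r"[,|]", rest)
    if drop_empty:
        # Assumption: empty fields carry no meaning and can be discarded.
        attributes = [a for a in attributes if a]
    return record_id, tuple(attributes)


lines = [
    "id1|attribute1a,attribute1b|attribute2a|,|attribute3a,attribute3b,attribute3c",
    "id2||attribute2b,attribute2c,|attribute4a|,attribute5a",
]

for line in lines:
    print(split_record(line))                   # keeps empty fields, as in the DUMP above
    print(split_record(line, drop_empty=True))  # empty fields removed
```

With `drop_empty=False` the tuples have the same fields as the DUMP output above, including the empty strings. Doing the same filtering inside Pig would need an extra step or a UDF, which this sketch does not cover.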
    
© 2021 The Archive Base. All Rights Reserved