
The Archive Base


Editorial Team
Asked: June 5, 2026


I need to convert hierarchical data (Avro data, which boils down to JSON) into tabular data (CSV). Since Avro has a strict schema, I know essentially what form the JSON will take, but I have to do this for many different schemas, so I’m looking for a consistent, declarative way to express the transformations I need to make. For example, if my incoming data looks like this…

{
    "customers": [
        {
            "addresses": [
                {
                    "city": "Los Angeles", 
                    "country": "USA", 
                    "county": null, 
                    "postalCode": "90064", 
                    "stateOrProvince": "California", 
                    "street1": "11832 W. Pico Blvd.", 
                    "street2": "", 
                    "street3": "", 
                    "street4": "", 
                    "tags": [
                        "BILLING"
                    ]
                }
            ], 
            "company": "", 
            "dateCreated": "2009-04-24T11:42:31+00:00", 
            "dateOfBirth": null, 
            "doNotCall": null, 
            "email": {
                "emailAddress": "general@magentocommerce.com"
            }, 
            "emailOptOut": null, 
            "fullName": {
                "firstName": "Test", 
                "lastName": "General", 
                "middleName": "", 
                "prefix": "", 
                "suffix": ""
            }, 
            "gender": null, 
            "id": {
                "Id": "2", 
                "namespace": "1000020016"
            }, 
            "lastModified": "2009-05-08T23:33:06+00:00", 
            "primaryPhone": {
                "number": "866.4.VARIEN", 
                "type": "UNKNOWN"
            }, 
            "sourceIds": null
        }
    ], 
    "totalItemsFound": 3
}

…I might need to output one row for each customer, like this:

MERCHANT ID|NUM CUSTOMERS|ID|FIRST NAME|LAST NAME|EMAIL|PHONE|STREET|CITY|STATE|ZIP|COUNTRY|EMAIL PREFERENCE
some.merch|3|1000020016-2|Test|General|general@magentocommerce.com|866.4.VARIEN|11832 W. Pico Blvd.|Los Angeles|California|90064|USA|N

I need to be able to express the following things:

  1. Get all the values from a given key as an array: All the dates-of-birth
  2. Repeat one value over every row: totalItemsFound, repeated in every row
  3. Repeat a static value in every row, taken from static data I already know: the merchant channel never changes
  4. And the tricky one: Arbitrarily manipulate the incoming data to produce the desired output:
    • Convert the customer’s id into namespace-id
    • Invert and change a null/boolean value into y/n, as in emailOptOut to EMAIL PREFERENCE
    • (re-)format a date or currency
    • etc

I started out with jsonpath, but that only solves #1 above. I’ve been slowly adding a language around jsonpath to serve #2 and #3, but I really don’t have a good answer for #4 (besides eval, and I’d really hate to do that). I looked at JSON/T, but couldn’t find a Python library for it. I even seriously considered writing a middleware to convert the JSON into XML so that I could use XSLT, but I’m hoping someone here at S/O has a better solution before I get that desperate.

1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 9:46 pm

    Why not attempt a functional decomposition not unlike what follows:

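    import csv

    # `...` stands for the output file object, as in the original call.
    w = csv.DictWriter(..., fieldnames=[rename(f) for f in fields])
    w.writeheader()
    for r in records:
        row = {}
        for field in fields:
            # Fall back to the field's default when the record lacks the key.
            value = r.get(field, default(field))
            row[rename(field)] = transform(field, value)
        w.writerow(row)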
    

    where rename maps the old field names to the new ones, transform converts the field’s value according to the transform registered for that field, and default returns the value to assign when the record lacks that field.

    So you would only need to define the list of fields, and the functions: rename, transform, and default.

    For the example you’ve given:

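    def rename(field):
        # Old field name -> output column name; unlisted fields keep their name.
        t = {'emailOptOut':'EMAIL PREFERENCE'}
        return t.get(field, field)
    
    def transform(field, data):
        # Per-field converter; unlisted fields pass through unchanged.
        t = {'emailOptOut': bool}
        return t.get(field, lambda a: a)(data)
    
    def default(field):
        # Value used when the record has no such key.
        t = {'MERCHANT ID':11039215}
        return t.get(field, None)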
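
The functions above work on flat field names, while the data in the question is nested. As a rough sketch of how the same idea might stretch to cover nested paths, a value repeated from the top-level document, a static value, and a computed column, here is one possible layout. Everything in it (`get_path`, `path`, `from_doc`, `const`, `customer_id`, `COLUMNS`, `SAMPLE`) is a hypothetical name made up for illustration, not part of any library, and only a few of the columns from the question are shown; the rest would follow the same pattern.

```python
import csv
import io


def get_path(obj, dotted, default=None):
    """Follow a dotted path such as 'fullName.firstName' through nested dicts."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or obj.get(key) is None:
            return default
        obj = obj[key]
    return obj


def path(dotted):
    """Column read from the current customer record."""
    return lambda customer, doc: get_path(customer, dotted, "")


def from_doc(dotted):
    """Column read from the top-level document and repeated on every row."""
    return lambda customer, doc: get_path(doc, dotted, "")


def const(value):
    """Column holding a static value repeated on every row."""
    return lambda customer, doc: value


def customer_id(customer, doc):
    """Computed column: join the id parts as 'namespace-Id'."""
    return "{}-{}".format(get_path(customer, "id.namespace", ""),
                          get_path(customer, "id.Id", ""))


# Hypothetical declarative spec: (output header, extractor) pairs, in column order.
COLUMNS = [
    ("MERCHANT ID", const("some.merch")),
    ("NUM CUSTOMERS", from_doc("totalItemsFound")),
    ("ID", customer_id),
    ("FIRST NAME", path("fullName.firstName")),
    ("LAST NAME", path("fullName.lastName")),
    ("EMAIL", path("email.emailAddress")),
]


def to_rows(doc):
    """Build one output row per customer by applying every extractor."""
    return [[extract(c, doc) for _, extract in COLUMNS] for c in doc["customers"]]


def write_csv(doc, out):
    """Write the header and rows as pipe-delimited text to a file-like object."""
    w = csv.writer(out, delimiter="|", lineterminator="\n")
    w.writerow([header for header, _ in COLUMNS])
    w.writerows(to_rows(doc))


# Trimmed copy of the record from the question, used only to exercise the sketch.
SAMPLE = {
    "customers": [{
        "email": {"emailAddress": "general@magentocommerce.com"},
        "fullName": {"firstName": "Test", "lastName": "General"},
        "id": {"Id": "2", "namespace": "1000020016"},
    }],
    "totalItemsFound": 3,
}

buf = io.StringIO()
write_csv(SAMPLE, buf)
print(buf.getvalue())
```

Each schema would then need only its own `COLUMNS` list, and anything awkward (dates, currency, the opt-out flag) becomes one more small named function rather than an `eval`.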
    
© 2021 The Archive Base. All Rights Reserved