How to go from flat relational data to nested object oriented data
I am trying to combine multiple rows into a single nested JSON object. I know how to do the reverse (i.e. flatten), but cannot find the right tool to go in this direction.
As an example, I start with this data:
Class, Student, Grade
1, Sally, A
1, Matt, A
1, Phil, C
What I want as an output is a single record:
Class, Grades
1, [{"Student": "Sally", "Grade": "A"}, {"Student": "Matt", "Grade": "A"}, {"Student": "Phil", "Grade": "C"}]
Is there a way to do this with Dataiku?
Best Answer
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,974 Neuron
Is there a way to do that in Dataiku using the standard recipes? ⇒ No
Or do I need to write a custom Python function to do it? ⇒ Yes
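As a minimal sketch of what such a Python step could look like, the snippet below groups the sample rows from the question by Class and stores the remaining columns as a JSON array per group. The helper name nest_rows and the output column name Grades are illustrative assumptions, not part of any Dataiku API; in a real Python recipe the DataFrame would be read from the input dataset rather than built by hand.

```python
import json

import pandas as pd

# Sample rows copied from the question; in a Dataiku Python recipe the
# DataFrame would come from the input dataset instead (assumption).
df = pd.DataFrame(
    {
        "Class": [1, 1, 1],
        "Student": ["Sally", "Matt", "Phil"],
        "Grade": ["A", "A", "C"],
    }
)


def nest_rows(frame, key, nested_name):
    """Collapse all rows sharing `key` into one row holding a JSON array.

    `nest_rows` is a hypothetical helper written for this example.
    """
    child_cols = [c for c in frame.columns if c != key]
    records = []
    # sort=False keeps the groups in their order of first appearance.
    for key_value, group in frame.groupby(key, sort=False):
        # One dict per original row, e.g. {"Student": "Sally", "Grade": "A"}.
        children = group[child_cols].to_dict(orient="records")
        records.append({key: key_value, nested_name: json.dumps(children)})
    return pd.DataFrame(records)


nested = nest_rows(df, "Class", "Grades")
print(nested.to_string(index=False))
```

The nested column is kept as a JSON string so that it fits in a single dataset cell. If the goal is a Python structure rather than a dataset column, the json.dumps call can be dropped.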
Answers
-
Turribeach
In most cases people want to unfold arrays:
https://doc.dataiku.com/dss/latest/preparation/reshaping.html
But you seem to want the opposite. What exactly are you trying to achieve? There might be a better way of getting there.
-
I have multiple relational tables, one BASE and 1-5 subtables, which link to it in a one-to-many relationship via a primary key.
I need to convert the data in these tables into a specific XML-based format for electronic data exchange. So the plan is to link the individual tables together and build one JSON object per record in the original BASE table. Then I should be able to map the fields of the JSON object directly to the XML model.
-
Turribeach
I wouldn't bother with JSON if what you need is XML. You will need a Python code recipe for this. Here is an example of how to convert a Pandas dataframe (which Dataiku will give you) to XML:
https://www.askpython.com/python-modules/pandas/dataframe-to-xml
You will then need to write this XML file to a Dataiku managed folder. After that you can send it to the system that needs it.
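To illustrate going straight from flat rows to nested XML, here is a rough sketch using only the Python standard library. The rows mirror the sample data from the question (a DataFrame can be turned into such a list with df.to_dict("records")). The function rows_to_xml and the element names Classes, Class and Record are assumptions made up for this example; a real exchange format would dictate its own schema. Writing the result to a managed folder is not shown.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Flat rows as in the question; each dict is one row of the table.
rows = [
    {"Class": 1, "Student": "Sally", "Grade": "A"},
    {"Class": 1, "Student": "Matt", "Grade": "A"},
    {"Class": 1, "Student": "Phil", "Grade": "C"},
]


def rows_to_xml(rows, key, root_tag="Classes", parent_tag="Class", child_tag="Record"):
    """Group rows by `key` and emit one parent element per group.

    Hypothetical helper: all tag names are placeholders for illustration.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)

    root = ET.Element(root_tag)
    for key_value, members in grouped.items():
        # The grouping key becomes an attribute on the parent element.
        parent = ET.SubElement(root, parent_tag, {"id": str(key_value)})
        for row in members:
            child = ET.SubElement(parent, child_tag)
            # Every non-key column becomes a sub-element of the record.
            for field, value in row.items():
                if field != key:
                    ET.SubElement(child, field).text = str(value)
    return root


root = rows_to_xml(rows, "Class")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

With several subtables, the same grouping step could be repeated per subtable and the resulting child elements attached to the matching parent element.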
-
Thanks for that, but we are actually getting off topic.
The original question is how to build a nested JSON object from multiple rows of a dataframe.
Is there a way to do that in Dataiku using the standard recipes? Or do I need to write a custom Python function to do it?
-
Thanks