Parsing xml file

SUSHIL
Level 3

Hi, 

I want to read an XML file in Dataiku and produce a CSV structure as output.

When I upload the XML file and select XML as the format type, the preview shows 6 columns in the schema. Three of those columns contain nested arrays, and each element in them is tagged with xml_text.

In a Prepare recipe I used the unfold-array option on the nested array columns.

But it does not show any result, and the unfold does not work.

Can you suggest the steps to prepare data where the array is nested?

3 Replies
Jurre
Level 5

Hi @SUSHIL ,

DSS has some nice features and options for reading XML files, for example explicit XPaths, with which you can specify the path to each column. You can find that option in the Data extraction dropdown in the preview section. If you haven't already, please have a look at the XML-related docs. If that is not helpful enough, a small sample of your data posted here will help to find the proper settings to unnest things right from the start.

Cheers!

SUSHIL
Level 3
Author

Hi, 

I have followed your steps and the documentation, but I am still not able to read the results properly.

Some columns contain values in array format.

I am not able to unfold or unnest them in a Prepare recipe.

The array elements carry the tag xml_text.

Can you please share the steps for a sample XML file that has multiple levels of hierarchy, and show how to parse the results?

Jurre
Level 5

Hi @SUSHIL ,

With the option "explicit XPaths" you specify a path to a certain value. This path is basically the sequence of all tags enclosing the value, from the root down.

I can't share a sample at the moment, but here is a small example from a current project:

The XPath /sub/fields/field[@id="DOC_NAME"]/text() translates to a column "DOC_NAME". The structure of the source file:

<?xml version="1.0"?><sub><fields><field id="DOC_NAME">letter_to_myself.pdf</field></fields></sub>
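
To make the path-to-column idea concrete outside DSS, here is a minimal Python sketch using only the standard library. It is not how DSS implements explicit XPaths; it only illustrates the same mapping. The second field (DOC_TYPE), the COLUMN_PATHS mapping and the helper names extract_row and to_csv are made up for illustration. Note that ElementTree supports only a subset of XPath and has no text() step, so the value is read from the matched element's .text instead.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample modelled on the structure above; the DOC_TYPE field is hypothetical.
XML_SOURCE = """<?xml version="1.0"?>
<sub>
  <fields>
    <field id="DOC_NAME">letter_to_myself.pdf</field>
    <field id="DOC_TYPE">letter</field>
  </fields>
</sub>"""

# Column name -> path relative to the <sub> root element (illustrative mapping).
COLUMN_PATHS = {
    "DOC_NAME": "./fields/field[@id='DOC_NAME']",
    "DOC_TYPE": "./fields/field[@id='DOC_TYPE']",
}


def extract_row(xml_text, column_paths):
    """Return one flat row: each column holds the text of the element at its path."""
    root = ET.fromstring(xml_text)
    row = {}
    for column, path in column_paths.items():
        element = root.find(path)
        # A missing element becomes an empty cell instead of raising an error.
        row[column] = element.text if element is not None else ""
    return row


def to_csv(rows, columns):
    """Serialize a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = extract_row(XML_SOURCE, COLUMN_PATHS)
csv_text = to_csv([row], list(COLUMN_PATHS))
print(csv_text)
```

Each entry in COLUMN_PATHS plays the role of one explicit XPath: one path in, one flat column out, so no unfold step is needed afterwards.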

Hope this helps!