Parsing an XML file

SUSHIL Registered Posts: 22 ✭✭✭

Hi,

I want to read an XML file in Dataiku and output it as a CSV structure.

I uploaded the XML file, selected XML as the type under Format, and previewed it.

The schema shows 6 columns. Three of them contain nested arrays, and every element in those arrays is tagged with xml_text.

In a Prepare recipe I applied the unfold-an-array option to the nested array columns.

But it does not show any result; the unfold is not working.

Can you suggest the steps for preparing data from a nested array?

Answers

  • Jurre
    Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 114 ✭✭✭✭✭✭✭

    Hi @SUSHIL,

    DSS has some nice features and options for reading XML files. One example is explicit XPaths, which let you specify the path to each column. You can find that option in the Data extraction dropdown in the preview section. If you haven't already, please have a look at the XML-related docs. If that is not helpful enough, posting a small sample of your data here will help to find the proper settings to unnest things right from the start.

    Cheers!

  • SUSHIL
    SUSHIL Registered Posts: 22 ✭✭✭

    Hi,

    I have followed your steps and the documentation, but I am not able to read the results properly.

    There are some columns that hold array-format values.

    I am not able to unfold or unnest them in the Prepare recipe.

    The array elements carry the tag xml_text.

    Can you please share the steps, with a sample that has a multi-level hierarchy in XML format, for parsing the results?

  • Jurre
    Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 114 ✭✭✭✭✭✭✭

    Hi @SUSHIL,

    With the option "explicit XPaths" you specify a path to a certain value. This path is basically the chain of all tags enclosing the value, from the root element down.

    I can't share a sample at the moment, but here is a small example from a current project:

    The XPath /sub/fields/field[@id="DOC_NAME"]/text() translates to a column "DOC_NAME". The structure of the source file:

    <?xml version="1.0"?><sub><fields><field id="DOC_NAME">letter_to_myself.pdf</field></fields></sub>
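If you want to check a path like this outside DSS before typing it into the format settings, a minimal sketch with Python's standard library could look like the following. It assumes the sample above with a complete XML declaration. The helper name `extract_field` is made up for illustration. Note that ElementTree only supports a subset of XPath: paths are relative to the root element and there is no `text()` step, so the expression is adapted accordingly.

```python
import xml.etree.ElementTree as ET

# Sample document from the post, written with a complete XML declaration.
SAMPLE = (
    '<?xml version="1.0"?>'
    '<sub><fields>'
    '<field id="DOC_NAME">letter_to_myself.pdf</field>'
    '</fields></sub>'
)


def extract_field(xml_text, field_id):
    """Return the text of <field id=field_id> under /sub/fields, or None."""
    root = ET.fromstring(xml_text)  # root is the <sub> element
    # Equivalent of /sub/fields/field[@id="..."]/text() in ElementTree's
    # XPath subset: a path relative to the root, then read .text.
    node = root.find(f"./fields/field[@id='{field_id}']")
    return None if node is None else node.text


doc_name = extract_field(SAMPLE, "DOC_NAME")
print(doc_name)
```

A path that matches nothing returns `None` here, which is a quick way to spot a typo in a tag name or attribute value.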

    Hope this helps!
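For the multi-level case asked about earlier in the thread, one possible approach outside the XML format settings is to flatten the hierarchy yourself, producing one CSV row per repeated child element. The sketch below uses only the Python standard library. The `<orders>` document, its tag names, and the functions `flatten_orders` and `to_csv` are all hypothetical stand-ins, not the original poster's data or any DSS API, so the paths would need adapting to the real file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: each <order> has scalar fields plus repeated <item>s.
SAMPLE = """
<orders>
  <order id="1">
    <customer>Ann</customer>
    <items><item>pen</item><item>ink</item></items>
  </order>
  <order id="2">
    <customer>Bob</customer>
    <items><item>paper</item></items>
  </order>
</orders>
""".strip()


def flatten_orders(xml_text):
    """Yield one flat dict per <item>, repeating the parent order's fields."""
    root = ET.fromstring(xml_text)
    for order in root.findall("./order"):
        parent = {
            "order_id": order.get("id"),
            "customer": order.findtext("customer"),
        }
        # The nested array: one output row for every repeated element.
        for item in order.findall("./items/item"):
            yield {**parent, "item": item.text}


def to_csv(rows, columns):
    """Serialize the flat rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = list(flatten_orders(SAMPLE))
csv_text = to_csv(rows, ["order_id", "customer", "item"])
print(csv_text)
```

Repeating the parent fields on every row keeps the output rectangular, which is the same shape an unfold or unnest step would aim for.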
