Parsing XML file

Turribeach
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,107 Neuron
edited July 16 in Using Dataiku

I have an XML file as shown below which I am loading as a dataset in Dataiku. The XML file shows the membership of each user in in each room. In the Format/Preview dataset screen I can either select /FullReport/rooms/room to get all the rooms or /FullReport/rooms/users to get all the users. But what I really want is to have the combined output as follows:

roomIDroomTitlefullName
1Room 1JOHN DOE
1Room 1TOM DOE
2Room 2JOHN DOE
2Room 2JOHN SMITH
3Room 3SARAH SMITH

Any ideas how to do that in DSS?

<FullReport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <rooms>
        <room>
            <roomID>1</roomID>
            <roomTitle>Room 1</roomTitle>
        </room>
        <users>
            <fullName>JOHN DOE</fullName>
        </users>
        <users>
            <fullName>TOM DOE</fullName>
        </users>
        </users>
    </rooms>
    <rooms>
        <room>
            <roomID>2</roomID>
            <roomTitle>Room 2</roomTitle>
        </room>
        <users>
            <fullName>JOHN DOE</fullName>
        </users>
        <users>
            <fullName>JOHN SMITH</fullName>
        </users>
        </users>
    </rooms>
    <rooms>
        <room>
            <roomID>3</roomID>
            <roomTitle>Room 3</roomTitle>
        </room>
        <users>
            <fullName>SARAH SMITH</fullName>
        </users>
        </users>
    </rooms>
</FullReport>

Best Answer

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker
    Answer ✓

    You can do this in 2 steps:

    - import the XML with format params:

    - Root element XPath -> /FullReport/rooms

    - Data extraction -> Explicit XPath

    - extracted elements :

    - ./room/roomTitle/text() -> roomTitle. (unique)

    - ./room/roomID/text() -> roomID (unique)

    - ./users/fullName/text() -> users

    - use a Prepare recipe to fold the users array with the Fold processor

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,107 Neuron

    Hi, I played with it last night a bit more and I ended up doing the same thing you suggested so thanks for validating the road I took. It's a pity you can't extract the values into rows as part of the load as this can speed up the file load. But for my use case the file wasn't that big so the extra recipe didn't take that long. Thanks!

Setup Info
    Tags
      Help me…