XML Parsing => Python recipe

Registered Posts: 1

Hello,

I would like a Python recipe to easily upload my XML file into Dataiku.

If anyone has this magic recipe, I would appreciate it.

Many thanks

Answers

  • Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 297 Dataiker
    edited July 2024

    Hi @Devian31_M,

    The simplest way to load an XML file into DSS is to create a dataset from the UI. In this case, DSS autodetects the format (XML) and the schema (XPath).

    For example, you could upload the XML files to a managed folder (or connect your folder to an external database that holds the XML files), then select the menu icon on the XML file and choose "create a dataset". In the next window, select "test & get schema", then "load preview".

    Screenshot 2024-06-17 at 4.44.18 PM.png

    And, the output should look something like this:

    Screenshot 2024-06-17 at 4.42.47 PM.png

    This is the simplest way to load an XML file. To do this in a Python recipe instead, use pandas read_xml, in which case you need to specify the XPath yourself. You also need a code env with pandas 1.3 or later (read_xml was added in pandas 1.3.0) and the "lxml" library installed.

    import dataiku
    import pandas as pd
    
    # Read recipe inputs: the managed folder that holds the XML file
    XML_folder = dataiku.Folder("3trhpQ3P")
    
    # Stream the file from the folder and parse it;
    # each element matched by the XPath (/catalog/book) becomes one row
    with XML_folder.get_download_stream("books.xml") as f:
        books_df = pd.read_xml(f, xpath='/catalog/book')
    
    # Write recipe outputs
    books = dataiku.Dataset("books")
    books.write_with_schema(books_df)
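
    If it is unclear which XPath to pass, it can help to check what a path selects before running the recipe. The sketch below is only an illustration and runs outside DSS with the Python standard library: the sample XML and the helper name rows_from_xml are hypothetical, not taken from your file. It mimics the general idea of read_xml, where each matched element becomes a row and its attributes and child elements become columns, but it is not how pandas is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, standing in for the real books.xml.
SAMPLE_XML = """<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <price>44.95</price>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <price>5.95</price>
  </book>
</catalog>"""


def rows_from_xml(xml_text, xpath):
    """Return one dict per element matched by xpath (attributes plus child text)."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.findall(xpath):
        row = dict(elem.attrib)            # attributes, e.g. id
        for child in elem:
            row[child.tag] = child.text    # child elements, e.g. title
        rows.append(row)
    return rows


# ElementTree paths are relative to the root element (<catalog>),
# so "./book" selects the same nodes as "/catalog/book" in the recipe.
rows = rows_from_xml(SAMPLE_XML, "./book")
for row in rows:
    print(row)
```

    If the printed rows have the columns you expect, the equivalent absolute path is a reasonable candidate for the xpath argument of read_xml. Note that values come back as strings here, whereas pandas will try to infer numeric types.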

    Let us know if you have questions!

    Thanks
