XML Parsing => Python recipe

Devian31_M Registered Posts: 1

Hello,

I would like a Python recipe to easily load my XML file into Dataiku.

If anyone has this magic recipe, I would appreciate it.

Many thanks

Answers

  • JordanB
    JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 296 Dataiker
    edited July 17

    Hi @Devian31_M,

    The simplest way to load an XML file into DSS is to create a dataset from the UI. In that case, DSS autodetects the format (XML) and the schema (XPath).

    For example, you could upload XML files to a managed folder (or connect your folder to an external database with XML files), then select the menu icon on the XML file and choose "create a dataset". In the next window, select "test & get schema", then "load preview".

    Screenshot 2024-06-17 at 4.44.18 PM.png

    The output should look something like this:

    Screenshot 2024-06-17 at 4.42.47 PM.png

    This is the simplest way to load an XML file. To do this in a Python recipe instead, you will need to use pandas read_xml and specify the XPath of the elements that should become rows. You will also need a code env with pandas 1.3 or later (the version that introduced read_xml) and the "lxml" library installed.

    import dataiku
    import pandas as pd
    
    # Read recipe inputs: the managed folder that holds the XML file
    # (replace the folder ID with your own)
    XML_folder = dataiku.Folder("3trhpQ3P")
    
    # Stream the file and parse each <book> element under <catalog> into one row
    with XML_folder.get_download_stream("books.xml") as f:
        books_df = pd.read_xml(f, xpath='/catalog/book')
    
    # Write recipe outputs
    books = dataiku.Dataset("books")
    books.write_with_schema(books_df)
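
If it helps to check what the XPath selects before running the recipe, here is a minimal sketch that calls read_xml on an in-memory document outside DSS. The sample XML (two made-up book entries) and the variable names are assumptions for illustration only, not your actual file. The fallback to the built-in "etree" parser with a relative XPath is also an assumption, added so the sketch still runs where lxml is not installed.

```python
import io

import pandas as pd

# Made-up sample document, standing in for books.xml
SAMPLE_XML = """<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Author One</author>
    <title>First Title</title>
    <price>44.95</price>
  </book>
  <book id="bk102">
    <author>Author Two</author>
    <title>Second Title</title>
    <price>5.95</price>
  </book>
</catalog>
"""

# lxml supports full XPath; the standard-library etree parser only accepts
# a limited, relative syntax, so the expression differs between the two.
try:
    import lxml  # noqa: F401
    parser, xpath = "lxml", "/catalog/book"
except ImportError:
    parser, xpath = "etree", "./book"

# Each element matched by the XPath becomes one row; its attributes and
# child elements become columns.
books_df = pd.read_xml(io.StringIO(SAMPLE_XML), xpath=xpath, parser=parser)

print(books_df)
```

Each `<book>` element becomes a row, with the `id` attribute and the child elements as columns. If your file has a different structure, the XPath is the part to adjust.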

    Let us know if you have questions!

    Thanks
