XML Parsing => Python recipe
Hello,
I would like a Python recipe to easily upload my XML file into Dataiku.
If anyone has this magic recipe, I would appreciate it.
Many thanks
Answers
JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 297 Dataiker
Hi @Devian31_M,
The simplest way to upload an XML file to DSS is to create a dataset from the UI. In this case, DSS autodetects the format (XML) and the schema (XPath).
For example, you could upload XML files to a managed folder (or connect your folder to an external database with XML files) > select the menu icon on the XML file > "create a dataset". In the next window, select "test & get schema" then "load preview"
And, the output should look something like this:
This is the simplest way to load an XML file. To do this in a Python recipe instead, you will need to use pandas read_xml, in which case you have to specify the XPath yourself. You will also need a code env with pandas 1.3 or later (read_xml was added in pandas 1.3.0) and the "lxml" library installed.
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
XML_folder = dataiku.Folder("3trhpQ3P")
with XML_folder.get_download_stream("books.xml") as f:
    data = pd.read_xml(f, xpath='/catalog/book')
books_df = data

# Write recipe outputs
books = dataiku.Dataset("books")
books.write_with_schema(books_df)
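If it is unclear which XPath to pass, it can help to look at what an expression like '/catalog/book' selects before running the recipe. The sketch below is not the recipe itself: it uses only the standard library's xml.etree.ElementTree rather than pandas, and the sample XML, its values, and the helper name rows_from_xml are made up for illustration. It shows the general idea that each matched node becomes one row, and that the node's attributes and child elements become the columns.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like a catalog/book file (values are invented).
SAMPLE_XML = """<catalog>
  <book id="bk101">
    <author>First Author</author>
    <title>First Title</title>
    <price>10.50</price>
  </book>
  <book id="bk102">
    <author>Second Author</author>
    <title>Second Title</title>
    <price>7.25</price>
  </book>
</catalog>"""


def rows_from_xml(xml_text, row_path="book"):
    """Return one dict per element matched by row_path.

    row_path is relative to the root element, so "book" under a
    <catalog> root plays the role of the absolute XPath '/catalog/book'.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall(row_path):
        row = dict(node.attrib)          # attributes become columns
        for child in node:
            row[child.tag] = child.text  # child elements become columns
        rows.append(row)
    return rows


rows = rows_from_xml(SAMPLE_XML)
for row in rows:
    print(row)
```

All values come out as strings here, whereas read_xml also infers column types, so treat this only as a quick way to check that the path matches the repeated element you want as rows.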
Let us know if you have questions!
Thanks