How to Parse XML Values in a Column Using Dataiku Visual Recipe (Preparation Recipe)?

Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 8 ✭✭✭


Hello everyone,

I am trying to parse XML values stored in a column of a Dataiku dataset using a visual recipe (Prepare recipe).

For JSON values, I can use processors like "Unnest Object (Flatten JSON)" to extract and structure the data. However, I couldn't find a similar built-in processor for handling XML values.

I am aware that XPath can be used when uploading data or creating datasets from folders, but I am specifically looking for a way to process XML values that exist within a column of a dataset using a visual recipe.

Does anyone know of a method or workaround for achieving this?
Any help would be greatly appreciated.

I've uploaded an image for your reference.

Thank you!

Operating system used: Linux

Answers

  • Registered Posts: 9 ✭✭

    Here is the example I used.
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- You can define which elements may be used. -->
    <!-- Document Type Definition : DTD -->
    <!DOCTYPE department SYSTEM "ex02.dtd" >
    <!-- department>dept>deptno{10}+dname{ACCOUNTING}+loc{NEW YORK}
    Public: PUBLIC
    User-defined: SYSTEM
    -->
    <department>
    <dept>
    <deptno>10</deptno>
    <dname>ACCOUNTING</dname>
    <loc>NEW YORK</loc>
    </dept>
    <dept>
    <deptno>20</deptno>
    <dname>RESEARCH</dname>
    <loc>DALLAS</loc>
    </dept>
    <dept>
    <deptno>30</deptno>
    <dname>SALES</dname>
    <loc>CHICAGO</loc>
    </dept>
    <dept>
    <deptno>40</deptno>
    <dname>OPERATIONS</dname>
    <loc>BOSTON</loc>
    </dept>
    </department>

    If you double-click, a menu like the one below appears. Select Extract.

    <dept>\s*<deptno>(\d+)</deptno>\s*<dname>(.*?)</dname>\s*<loc>(.*?)</loc>\s*</dept>
    You can extract the values with a regular expression like this one.
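
To sanity-check a pattern like this before pasting it into the Extract step, it can help to run it against a sample cell value outside Dataiku. The following is a minimal sketch using Python's standard `re` module, not Dataiku's own regex engine, so small syntax differences are possible. The names `xml_value`, `DEPT_PATTERN`, and `extract_departments` are made up for illustration, and the sample is shortened to two departments.

```python
import re

# Sample cell value, shortened from the XML posted in this thread.
xml_value = """<department>
<dept>
<deptno>10</deptno>
<dname>ACCOUNTING</dname>
<loc>NEW YORK</loc>
</dept>
<dept>
<deptno>20</deptno>
<dname>RESEARCH</dname>
<loc>DALLAS</loc>
</dept>
</department>"""

# One capture group per field. \s* tolerates the line breaks between tags,
# and .*? is non-greedy so each group stops at its own closing tag.
DEPT_PATTERN = re.compile(
    r"<dept>\s*<deptno>(\d+)</deptno>\s*"
    r"<dname>(.*?)</dname>\s*<loc>(.*?)</loc>\s*</dept>"
)


def extract_departments(text):
    """Return one (deptno, dname, loc) tuple per <dept> element found."""
    return DEPT_PATTERN.findall(text)


rows = extract_departments(xml_value)
for deptno, dname, loc in rows:
    print(deptno, dname, loc)
```

Note that a regex like this only works while the elements appear in a fixed order with no attributes or nesting; if the XML in the column varies, a real XML parser is likely to be more reliable.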

  • Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 8 ✭✭✭

    Hello younhyun,

    I tried the process again using the XML file you provided.
    I'm sharing my progress this time, along with an additional screenshot since the feature shown in your example doesn't seem to be enabled on my side.

    Could you please let me know which Dataiku version you're currently using?

    Thank you!
