Added on March 18, 2025 8:54AM
Hello everyone,
I am trying to parse XML values stored in a column of a Dataiku dataset using a visual recipe (the Prepare recipe).
For JSON values, I can use processors like "Unnest Object (Flatten JSON)" to extract and structure the data. However, I couldn't find a similar built-in processor for handling XML values.
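Because the JSON processors already exist, one possible workaround is to convert each XML cell into a JSON string with a few lines of Python (for example in a Python recipe), then flatten the result with "Unnest Object (Flatten JSON)". This is a sketch under assumptions, not a built-in Dataiku feature: the helper names `element_to_obj` and `xml_cell_to_json` are hypothetical, and XML attributes are ignored.

```python
import json
import xml.etree.ElementTree as ET


def element_to_obj(elem):
    """Recursively convert an Element into dicts, lists, and strings.

    A leaf element becomes its stripped text. A tag that repeats under the
    same parent becomes a list so that no sibling is overwritten.
    Attributes and tail text are ignored in this sketch.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    obj = {}
    for child in children:
        value = element_to_obj(child)
        if child.tag not in obj:
            obj[child.tag] = value
        else:
            # Second or later occurrence of this tag: switch to a list.
            if not isinstance(obj[child.tag], list):
                obj[child.tag] = [obj[child.tag]]
            obj[child.tag].append(value)
    return obj


def xml_cell_to_json(xml_text):
    """Convert one XML cell value into a JSON string (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_obj(root)})
```

Repeated tags (such as several `<dept>` elements) become JSON arrays, while a tag that appears only once stays a single object, so the JSON shape can differ from row to row.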
I am aware that XPath can be used when uploading data or creating datasets from folders, but I am specifically looking for a way to process XML values that exist within a column of a dataset using a visual recipe.
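If Python code can be run on the column, the standard library's `xml.etree.ElementTree` can evaluate a limited subset of XPath directly on each cell value. A hedged sketch with a hypothetical helper `xpath_values`; full XPath 1.0 would need a library such as lxml:

```python
import xml.etree.ElementTree as ET


def xpath_values(xml_text, path):
    """Evaluate an ElementTree path expression against one cell value.

    ElementTree supports only a subset of XPath (child and descendant
    steps, simple predicates such as [tag='text']).
    """
    root = ET.fromstring(xml_text)
    # findall() returns the matching elements; keep their stripped text.
    return [(el.text or "").strip() for el in root.findall(path)]
```

For example, on the department XML shown later in this thread, `xpath_values(cell, "./dept[deptno='20']/loc")` returns `['DALLAS']`.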
Does anyone know of a method or workaround for achieving this?
Any help would be greatly appreciated.
I've uploaded an image for your reference.
Thank you!
Operating system used: Linux
Here is the example I used:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Defines which elements may be used. -->
<!-- Document Type Definition : DTD -->
<!DOCTYPE department SYSTEM "ex02.dtd">
<!-- department>dept>deptno{10}+dname{ACCOUNTING}+loc{NEW YORK}
Public: PUBLIC
User-defined: SYSTEM
-->
<department>
<dept>
<deptno>10</deptno>
<dname>ACCOUNTING</dname>
<loc>NEW YORK</loc>
</dept>
<dept>
<deptno>20</deptno>
<dname>RESEARCH</dname>
<loc>DALLAS</loc>
</dept>
<dept>
<deptno>30</deptno>
<dname>SALES</dname>
<loc>CHICAGO</loc>
</dept>
<dept>
<deptno>40</deptno>
<dname>OPERATIONS</dname>
<loc>BOSTON</loc>
</dept>
</department>
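For this sample, a sketch of turning each `<dept>` into one output row with Python's standard-library parser could look like the following. `dept_rows` is a hypothetical name, and the code assumes every `<dept>` contains all three child elements:

```python
import xml.etree.ElementTree as ET


def dept_rows(xml_text):
    """Flatten each <dept> element into one dict (one output row).

    Comments and the DOCTYPE line are skipped by the parser; the
    external ex02.dtd is not fetched. Assumes deptno, dname and loc
    are present in every <dept>.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for dept in root.iter("dept"):
        rows.append({
            "deptno": int(dept.findtext("deptno")),
            "dname": dept.findtext("dname"),
            "loc": dept.findtext("loc"),
        })
    return rows
```

A real XML parser tolerates formatting changes (line breaks, indentation, extra attributes) that would break a hand-written regular expression.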
Double-clicking the value opens a menu; select Extract.
<dept>\s*<deptno>(\d+)</deptno>\s*<dname>(.*?)</dname>\s*<loc>(.*?)</loc>\s*</dept>
You can extract the values using a regular expression like this one.
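A quick way to check such a pattern outside Dataiku is Python's `re` module. The sketch below uses lazy `(.*?)` groups and `\s*` between tags, so each group captures the full element text even when several `<dept>` blocks sit on one line; `extract_depts` is a hypothetical helper name, and the pattern uses only syntax common to most regex engines:

```python
import re

# \s* allows any whitespace (including newlines) between tags;
# the lazy (.*?) groups stop at the first closing tag.
DEPT_RE = re.compile(
    r"<dept>\s*<deptno>(\d+)</deptno>\s*"
    r"<dname>(.*?)</dname>\s*<loc>(.*?)</loc>\s*</dept>"
)


def extract_depts(xml_text):
    """Return (deptno, dname, loc) tuples for every <dept> block."""
    return DEPT_RE.findall(xml_text)
```

The regex depends on the exact tag order and on the absence of attributes, so it is best suited to XML with a fixed, known layout such as this example.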
Hello younhyun,
I tried the process again using the XML file you provided.
I'm sharing my progress, along with an additional screenshot, because the feature shown in your example doesn't seem to be enabled on my side.
Could you please let me know which Dataiku version you're currently using?
Thank you!