Using Dataiku
- Hello, while following this link, https://knowledge.dataiku.com/latest/plugins/development/tutorial-first-plugin-recipe-processor.html#test-the-preparation-processor I would like to completely delete …
Solution by SunghoPark
Hello,
I already deleted the plugin in DATA_DIR/plugins/installed, but Hide-color still appears in the processors library of the Prepare recipe.
I think I need to find the folder containing this processors library and delete it, but I can't find the path. This processor appears the same way in other projects and needs to be deleted.
- Hi team, I currently encountered an urgent issue with our project automation pipeline. In a Python code recipe, I used the builder to create a new Python recipe, and have added a dataset as input, 3 new d…
Last answer by sanshekh
Hi All,
I'm trying to export the final DataFrame of a Python recipe to a managed folder and I get the error below:
Job failed: Error in Python process: At line 35: <class 'Exception'>: Managed folder EyABfYFjV cannot be used : declare it as input or output of your recipe
Is there a code snippet via which I can push my df data into a CSV file, so that when I go back to the recipes the managed folder icon shows up with my file present in it?
- I have a folder of data which can be converted to a desired format using a .exe application. The input data and .exe are given by a 3rd party. How can I do this in DSS? Operating system used: Cloud
Last answer by
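For the "declare it as input or output" error above, the folder must first be added as an output of the recipe (in the recipe's Inputs/Outputs tab); only then can code write to it. A minimal sketch of pushing a DataFrame into the folder as a CSV follows — the folder id "EyABfYFjV" is taken from the error message, and the file name "/output.csv" is illustrative:

```python
import pandas as pd


def df_to_csv_bytes(df):
    # Serialize the DataFrame to CSV text (no index column),
    # then encode to bytes for upload.
    return df.to_csv(index=False).encode("utf-8")


def write_df_to_folder(df, folder_id="EyABfYFjV", path="/output.csv"):
    # Runs only inside DSS, and only once the folder is declared
    # as an output of the recipe in its Inputs/Outputs tab.
    import dataiku  # available inside DSS only

    folder = dataiku.Folder(folder_id)
    folder.upload_data(path, df_to_csv_bytes(df))
```

The serialization is kept separate from the upload so it can be tested outside DSS; inside a recipe you would call `write_df_to_folder(my_df)` as the last step.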
- Hello Dataiku Community. I am connecting to an FTP folder to build a dataset. The data I got is the union of all the data in all the files in that directory. What I want is to process only the new fil…
Last answer by Turribeach
In terms of only loading new files, there is no built-in way of doing this, so you will have to develop custom Python code. Ideally you would want to have subfolders within your folder and move files as you process them. For instance, files arrive in ./landing, then they get moved to ./processing for loading, and after a successful load you move them to ./loaded. That way you always know which files you processed, and if loading a file fails you have it right there in the ./processing folder to fix the code or the file and retry the same file. This post will give you some guidance, but you will need Python skills to achieve this requirement.
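The landing/processing/loaded scheme above can be sketched with plain Python. The subfolder names come from the post; `load_file` is a stand-in for your actual loading code. Inside DSS you would resolve `root` against the managed folder, e.g. via `dataiku.Folder(...).get_path()` on a filesystem connection:

```python
import os
import shutil


def process_new_files(root, load_file):
    """Move each file landing -> processing, load it, then -> loaded.

    A file that fails to load stays in ./processing for inspection
    and retry, so ./landing always holds only unprocessed files.
    """
    landing = os.path.join(root, "landing")
    processing = os.path.join(root, "processing")
    loaded = os.path.join(root, "loaded")
    for d in (landing, processing, loaded):
        os.makedirs(d, exist_ok=True)

    done = []
    for name in sorted(os.listdir(landing)):
        work = os.path.join(processing, name)
        shutil.move(os.path.join(landing, name), work)
        load_file(work)  # raises on failure -> file stays in ./processing
        shutil.move(work, os.path.join(loaded, name))
        done.append(name)
    return done
```

Because the move into ./loaded happens only after `load_file` returns, a crash mid-batch leaves a clear trail of what succeeded and what did not.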
- Hi there - new to Dataiku. Let's say I have an Excel sheet of 2 columns where one has app reviews and the other has dates they were posted. Is there a video tutorial anywhere or example where I can cre…
Last answer by AdrienL
The simplest would be to use the Classify Text recipe showcased in this tutorial.
It does require an LLM connection, which can be either:
- an API-based LLM (most are paid, some provide a free tier or trial)
- a Huggingface connection for local inference, which requires access to a GPU-enabled Kubernetes cluster
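As a rough sketch of what the LLM route looks like in code — assuming DSS 12+ with the LLM Mesh Python API; the connection id "openai:my-connection:gpt-4o-mini" and the prompt wording are purely illustrative:

```python
def build_prompt(review):
    # Pure prompt construction, kept separate so it can be
    # tested outside DSS.
    return ("Classify the sentiment of this app review as "
            "positive, negative or neutral. Review: " + review)


def classify_review(review, llm_id="openai:my-connection:gpt-4o-mini"):
    # Sends one review through the LLM Mesh; runs inside DSS only.
    import dataiku  # available inside DSS only

    llm = dataiku.api_client().get_default_project().get_llm(llm_id)
    completion = llm.new_completion()
    completion.with_message(build_prompt(review))
    return completion.execute().text
```

In practice the Classify Text recipe does this per-row for you; the sketch is only to show what the connection is used for under the hood.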
- Hi, I am using a python recipe to get embeddings from my text features in Dataiku and everything worked out fine, but the embeddings did not come out with comma delimiters when I write the output to a…
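The embeddings thread is cut off, but a common cause of missing delimiters is writing raw Python lists into a string column, so the stored value is whatever the writer makes of a list object. Serializing each vector explicitly before writing the output dataset avoids the ambiguity; a sketch with illustrative column names:

```python
import pandas as pd

# Toy stand-in for a dataset of texts and their embedding vectors.
df = pd.DataFrame({
    "text": ["love it", "crashes a lot"],
    "embedding": [[0.1, 0.2], [0.3, 0.4]],
})

# Join each vector's values with commas so the stored column is an
# unambiguous comma-delimited string rather than a repr of a list.
df["embedding"] = df["embedding"].apply(
    lambda v: ",".join(str(x) for x in v))
```

The string can be split back into floats with `s.split(",")` on the consuming side.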
- Hello Community, I am helping one of my team work on data preparation with: - a Prepare recipe (running with the "Partially in database" engine, due to multiple data connections) - with some simpl…
Solution by
- We are considering a migration to Dataiku Online. To all Dataiku Online users: What database do you use with your Dataiku Online instance? How much data are you loading into that database? How do you find …
- Hi, I have a table that resides in a data warehouse. Once I create a connection to that table in Dataiku, I would like to directly explore the data without doing any additional processing, so I was hopi…
Last answer by JordanB
Hi @DogaS,
This error usually occurs if there are null values in the input data (in this case, the date column). Can you please check? If there are nulls, I'd recommend filling or removing them.
Note, you can fill empty values with a Prepare recipe using processor "fill empty cells...".
Thanks!
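Outside the Prepare recipe, the same null check and the two fixes can be sketched in pandas — the column name "date" and the placeholder value are illustrative:

```python
import pandas as pd

# Toy input with one null in the date column.
df = pd.DataFrame({"date": ["2024-01-01", None, "2024-01-03"]})

n_nulls = df["date"].isna().sum()               # count the nulls first
df_dropped = df.dropna(subset=["date"])         # option 1: remove null rows
df_filled = df.fillna({"date": "1970-01-01"})   # option 2: fill a placeholder
```

The "Fill empty cells..." processor in a Prepare recipe is the no-code equivalent of the fillna branch.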
- Hello, I would like a Python recipe to easily upload my XML file into Dataiku. If anyone has this magic recipe, I would appreciate it. Many thanks
Last answer by JordanB
Hi @Devian31_M,
The simplest way to upload an XML file to DSS is by creating a dataset from the UI. In this case, DSS auto-detects the format (XML) and the schema (XPath).
For example, you could upload XML files to a managed folder (or connect your folder to an external source with XML files) > select the menu icon on the XML file > "create a dataset". In the next window, select "test & get schema", then "load preview".
This is the simplest way to load an XML file. To do this in a Python recipe instead, you will need to use pandas read_xml, in which case you would need to specify the XPath yourself. You would also need a code env with pandas 1.3 or later and the library "lxml" installed in the env.
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
XML_folder = dataiku.Folder("3trhpQ3P")
with XML_folder.get_download_stream("books.xml") as f:
    data = pd.read_xml(f, xpath='/catalog/book')
books_df = data

# Write recipe outputs
books = dataiku.Dataset("books")
books.write_with_schema(books_df)
Let us know if you have questions!
Thanks