-
How can I change the default location where a .conf file is created to any custom location?
Hello Community, I am using the Great Expectations package in a DSS project for my data quality checks. I have already installed it in my code env and am using it in a Python script for the time being. Although the package runs properly in a Python notebook, when I save it back to a recipe I get the following error: " Job…
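If the .conf in question is the Great Expectations context configuration, a minimal sketch of redirecting it to a custom writable location (the path below is a placeholder):

```python
import great_expectations as gx

# Assumption: the config being created belongs to the Great Expectations
# data context; context_root_dir redirects where GX creates and looks for
# it (the path below is a placeholder writable location)
context = gx.get_context(context_root_dir="/tmp/gx_root")
```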
-
RAG Webapp
Hello everyone, In my Dataiku Flow, I have a RAG setup that includes embeddings and prompts. I’d like to replicate this process, achieving the same results as in Prompt Studio, in a Dash web app. The goal is to reuse the knowledge base built in the Flow and leverage the augmented LLM created by the embedding recipe. Does…
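A sketch of one way to do this, assuming the augmented LLM created by the embedding recipe is queried through the LLM Mesh Python API; the llm id is a placeholder, and in a DSS Dash webapp the app object is pre-created for you:

```python
import dataiku
from dash import Dash, html, dcc, Input, Output, State

# Placeholder id: the augmented LLM created by the embedding recipe shows
# up in project.list_llms() like any other LLM Mesh model
AUGMENTED_LLM_ID = "your-augmented-llm-id"

llm = dataiku.api_client().get_default_project().get_llm(AUGMENTED_LLM_ID)

# Dash(__name__) is for standalone runs; inside a DSS Dash webapp, reuse
# the provided `app` object instead
app = Dash(__name__)
app.layout = html.Div([
    dcc.Input(id="question", type="text"),
    html.Button("Ask", id="ask"),
    html.Div(id="answer"),
])

@app.callback(Output("answer", "children"),
              Input("ask", "n_clicks"),
              State("question", "value"),
              prevent_initial_call=True)
def answer(n_clicks, question):
    # Querying the augmented LLM runs retrieval and generation together
    completion = llm.new_completion()
    completion.with_message(question)
    return completion.execute().text
```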
-
How to dynamically output the most recent 3 days' data with a partitioned dataset
Hi all, I am new to the Dataiku world, and I'd like to ask about the right way to output data for a specific time range using partitioning. What I want to do is dynamically build the most recent 3 days of data from the input datasets (using Time Range day partitions). I've tested this, but the output seems to grow even more than I expect, and the…
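One way to keep the build pinned to exactly three day partitions is to compute them in a scenario Python step; a sketch with a placeholder dataset name:

```python
from datetime import date, timedelta
from dataiku.scenario import Scenario

# Placeholder dataset name; builds exactly the 3 most recent day partitions
partitions = ",".join(
    (date.today() - timedelta(days=i)).strftime("%Y-%m-%d") for i in range(3)
)
Scenario().build_dataset("output_ds", partitions=partitions)
```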
-
Prevent a Python recipe from changing data types
Hello everyone, I would like to prevent Python from inferring the data types of my dataframe during a Python recipe. For example, I would like an id column to remain a string rather than have Dataiku convert it to float. I could convert each of my columns manually, but this is tedious for datasets with…
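The get_dataframe call accepts an infer_with_pandas flag that keeps the schema declared in DSS; a minimal sketch with a placeholder dataset name:

```python
import dataiku

# Placeholder dataset name; infer_with_pandas=False keeps the schema
# declared in DSS, so an id stored as string stays a string
df = dataiku.Dataset("my_input").get_dataframe(infer_with_pandas=False)
```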
-
How to leverage a locally hosted LLM using Python scripts
Dataiku Version: 13.3.1. I have several LLMs in my DSS cache from the HuggingFace connection. I can leverage these models using the prompt recipe; however, I am struggling to use them in custom Python scripts. For example, if I get the LLM ids of all the models I have a connection to using the following script, and…
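A sketch using the LLM Mesh Python API, which is the same route the prompt recipe takes; the model id below is a placeholder to replace with one printed by list_llms():

```python
import dataiku

project = dataiku.api_client().get_default_project()

# List every model exposed through LLM connections, HuggingFace included
for item in project.list_llms():
    print(f"- {item.description} (id: {item.id})")

# Query one by id (placeholder below)
llm = project.get_llm("paste-an-id-printed-above")
completion = llm.new_completion()
completion.with_message("Hello!")
print(completion.execute().text)
```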
-
Read files in managed folders with shell
Hi, can someone help me please? Given an input folder and an output folder, I want to link them with a shell script, so that the script reads a test.txt file from the input folder and writes an output.txt file to the output folder via a .sh script. But when I use the Dataiku variables, it doesn't work. Here is an example…
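For comparison, the Python-recipe equivalent of the same read/write is short; a sketch with placeholder folder names:

```python
import dataiku

# Placeholder folder names: use the input/output folder names from the Flow
input_folder = dataiku.Folder("input")
output_folder = dataiku.Folder("output")

# Stream test.txt out of the input folder...
with input_folder.get_download_stream("test.txt") as stream:
    data = stream.read()

# ...and write output.txt into the output folder
with output_folder.get_writer("output.txt") as writer:
    writer.write(data)
```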
-
Store geometry / geopoint objects on Hive
Hello everyone, I would like to store a column of type geometry or geopoint in my HDFS dataset, with the aim of later performing a geojoin recipe between a geometry column containing polygons and a geopoint column containing geopoints. When I try to store my column in geopoint format, I get the following error: And when I…
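Since Hive/Parquet storage has no native geometry type, one common workaround (a sketch, not a confirmed fix for this particular error; dataset and column names are placeholders) is to store the values as WKT strings and let DSS attach the geopoint meaning:

```python
import dataiku

# Placeholder dataset/column names; lon and lat are assumed numeric columns
df = dataiku.Dataset("geo_input").get_dataframe()

# Store the point as WKT text ("POINT(lon lat)"): the stored type stays
# string on HDFS, and DSS can apply the geopoint meaning for the geojoin
df["point_wkt"] = "POINT(" + df["lon"].astype(str) + " " + df["lat"].astype(str) + ")"

dataiku.Dataset("geo_output").write_with_schema(df)
```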
-
How to use "Execute Python unit test" scenario step
A new scenario step was added in a recent DSS version to execute a Python unit test, and I'd like to start using it. However, the documentation is pretty brief: "this step executes one or more Python pytest tests from a project’s Libraries folder using a Pytest selector". Anyone have more details on or an example of…
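Assuming the selector follows standard pytest node-id syntax, a minimal sketch of a test file under the project's Libraries folder (the layout is hypothetical):

```python
# tests/test_example.py under the project's Libraries python/ folder
# (hypothetical layout)

def test_addition():
    # trivial assertion so the scenario step has something to run
    assert 1 + 1 == 2
```

With standard pytest selectors, tests/test_example.py would run the whole file and tests/test_example.py::test_addition a single test; treat that as an assumption, since the docs don't spell it out.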
-
Why are SQL queries in Dataiku slower than in my AWS Docker container (RDS Oracle)?
Hello, I'm currently using Dataiku and SQLExecutor2 to run queries on my Oracle database hosted on AWS RDS, port 2484. When I execute the same query from a Docker container on AWS, the query takes about 15 ms. However, when I run it in Dataiku, it takes approximately 1 second, and the whole process, which takes about 8…
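To separate per-query latency from job or notebook startup overhead, a timing sketch (the connection name is a placeholder):

```python
import time
from dataiku import SQLExecutor2

# Placeholder connection name; timing the query alone separates per-query
# latency from DSS startup overhead
executor = SQLExecutor2(connection="oracle_rds")

t0 = time.perf_counter()
df = executor.query_to_df("SELECT 1 FROM DUAL")
print(f"query round-trip: {time.perf_counter() - t0:.3f}s")
```

If the round-trip stays near 1 s while the Docker container sees 15 ms, the gap likely sits in connection setup or the network path rather than the query itself.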
-
Custom trigger to run itself?
Hi everyone, I want to execute a scenario again if it fails, so it can try up to 3 times; sometimes Kubernetes or Spark failures can be fixed just by running again, so I don't lose the time between the failure and the manual fix. This is the code, created with an LLM; it uses Python for that. I changed the project name to the variable "project name",…
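Rather than a trigger that re-fires itself, one option is a small wrapper (run from a second scenario's Python step, or any Python process) that retries through the public API; a sketch with placeholder project and scenario ids:

```python
import dataiku

# Placeholder ids; run this from a "wrapper" scenario's Python step
client = dataiku.api_client()
project = client.get_project("MY_PROJECT")
scenario = project.get_scenario("MY_SCENARIO")

for attempt in range(1, 4):
    try:
        run = scenario.run_and_wait()
        if run.outcome == "SUCCESS":
            break
        print(f"attempt {attempt} ended with outcome {run.outcome}")
    except Exception as err:
        # run_and_wait can raise if the run itself errors out
        print(f"attempt {attempt} raised: {err}")
```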