-
Integration with Microsoft Fabric and its OneLake
Hi, I couldn't find anything about an integration with Microsoft Fabric through OneLake in the Dataiku docs or release notes yet. Is this coming soon? From what I read in the Microsoft docs, we can't connect directly via ADLS, only via APIs or SDKs. Thanks in advance, Jonathyan Operating system used: RHEL 8
-
My custom Plugin changes are not applied.
My environment is Dataiku built on-premise AWS. I created my custom Plugin from a Python program recipe. I edited and saved the recipe.json of this Plugin, but when I invoke the Plugin in the flow screen, the changes are not applied. I then restarted Dataiku and all the changes were applied. How can I apply the changes to…
-
How to debug provisioning errors using Fleet manager
I get an error when I provision Automation servers using Fleet manager. How can I debug this? Operating system used: Amazon Linux
-
Publishing Feature Groups in the Dataiku Feature Store from a Prod Node to a Dev Node
We've used a Dataiku project on the Design Node to produce a number of shared data assets. These are published as Feature Groups in the Feature Store where other users can bring them into their own projects. When the project producing the shared datasets is promoted through Test and Prod environments, is there a way the…
-
Run a Time Series Forecasting Model
I get the following error message Error message: Failed to train : <class 'ImportError'> : libcuda.so.1: cannot open shared object file: No such file or directory Operating system used: 13.1.4
-
Code Snippet as Edit option in Support Chats and Community Posts
Hey there, I don't know why this feature is not available. It would be really great to be able to send code as a code snippet, as many other tech communities and support platforms allow, for better readability and communication in these posts. I think this is essential to give better context.
-
Docs for "pandasutils"?
Hello, My apologies if this is a remedial question, but at the start of every Python recipe the boilerplate code includes an import of: from dataiku import pandasutils as pdu Is there documentation for pandasutils? Is it a package that can be used in Python recipes? I've tried looking through the Dataiku Developer Guide,…
-
Identifying the Node Type in a DSS Notebook using Python
In Python, in a DSS notebook, I want to know if the code is running in the design node or the automation node. How can I do that?
-
Issue with text in learning path
Yesterday I could see text and video in the Core course. Today I can only see the video but not the text. Could you help me?
-
Kubernetes job failed, exitCode=137, reason=OOMKilled
Log error: Out of memory for code recipe on Kubernetes. Kubernetes job failed, exitCode=137, reason=OOMKilled Hi there, please give me step-by-step instructions on which configuration path or location to change to increase the memory limit on GKE for the Dataiku application. Thanks Operating system used: Kubernetes cluster
-
RAG LLM for multiple datasets
Greetings, while working with the embedding recipe, we faced a limitation: we have two datasets that we want to apply RAG to. How can we apply the knowledge bank to both of them? Regards
-
Can we use multiple data sources/tables to create knowledge bank for a RAG model?
-
Life cycle of local Hugging Face model pod
Hello, I want to use a local Hugging Face model in Dataiku by deploying it as a pod on K8s. However, when a model is deployed as a pod, the pod automatically exits after a certain time (graceful exit). I know that I can change this time by adjusting the idle time in Settings -> LLM Mesh -> Hugging Face, but I…
-
automation of insights export to pdf from notebook
I have a Dataiku notebook in a flow that creates various visualizations. After the visualizations are generated, I manually publish and export them as a PDF. I would like to automate this entire process, from generating the visualizations to publishing and exporting the PDF. Could someone guide me on how to automate this…
-
Connect Databricks Catalog from Dataiku API Designer
I have created a connector to query Databricks catalogs from Dataiku. This works fine when I test it inside a Python notebook, but from the API Designer it does not work and asks for a project key. from dataiku import SQLExecutor2 executor = SQLExecutor2(connection="NAME") sql_query = f"""select * from…
-
RestException: INTERNAL_ERROR: Ticket not given or unrecognized while using MLFlow commands
Hi, I am working on logging a fine-tuned Hugging Face model to MLflow from a Dataiku notebook. I am not sure why I am getting an internal error: it works fine at times, but then starts giving this error again, even at the simplest step of creating an experiment. I am not sure if the error is in the code, the notebook kernel, or authentication. RestException…
-
"NumberFormatException: For input string" in scenario with integer partitioned dataset
I have a non-partitioned D1 dataset. The first column "dt_partition" will be used to partition the next dataset. dt_partition is of type integer, representing the month (for example 202409). It only contains one value at a time, so the data will go into a single partition. My database is Snowflake. At the output of D1,…
-
How to apply TF-IDF vectorization to a text column?
Hello, I am looking for a way to apply TF-IDF vectorization to a text column so that I can then compute cosine similarity on it. Thanks.
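For reference, the two steps can be sketched without any library. This is a minimal sketch, assuming whitespace tokenization, raw term counts as TF, and the smoothed IDF variant idf = ln((1 + N) / (1 + df)) + 1 (the default in scikit-learn's TfidfVectorizer); `tfidf_vectors` and `cosine_similarity` are hypothetical helper names, not Dataiku functions. In a Python recipe, `texts` would hold the values of the text column.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, as in scikit-learn's default (smooth_idf=True).
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    # Raw term counts weighted by IDF.
    return [{term: count * idf[term] for term, count in Counter(tokens).items()}
            for tokens in tokenized]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


texts = ["red apple pie", "red apple tart", "blue sports car"]
vectors = tfidf_vectors(texts)
print(round(cosine_similarity(vectors[0], vectors[1]), 3))
print(cosine_similarity(vectors[0], vectors[2]))  # 0.0: no shared terms
```

Cosine similarity ignores vector length, so normalizing the TF-IDF vectors first would not change the scores.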
-
Using the dataiku API to reverse the design of a ML Task
Hello, for a specific case we would like to use the Dataiku API to revert the design of a visual analysis to the design that was used for a specific model. E.g. we have three sessions, and the last trained session (session 3) had a different design than session 1. I would like to revert the design back to session 1 and this is…
-
How can I set the font family of Notebook in DSS
I am a Korean user at a company. When I read Korean text in a notebook (recipe), readability decreases. Korean developers often use the “D2Coding” font distributed by Naver, a well-known Korean search site. (https://github.com/naver/d2codingfont) I want to apply this font; is there any way to do so? When using Jupyter Notebook in…
-
Not able to store the pre-trained model
I am trying to save a Hugging Face model in the code env. It shows no error when I update the code, but I am not able to store it in the Resources directory. Operating system used: Windows
-
How do I install packages in Dataiku R Markdown files? (R programming language)
I'm working with an R Markdown file. I'm able to directly import a few libraries like ggplot2 and Dataiku, but when I try to install packages using install.packages("plotly"), it shows an error in install.packages(). How do I install them? Operating system used: Dataiku online
-
How to debug Dataiku libraries + Llama3 DKUChatLLM bind_tools NotImplementedError
I am trying to reproduce the Text-to-SQL WebApp of the Dataiku Gallery on Dataiku 13+ with a Llama 3 connection and I am facing a NotImplementedError when executing DKUChatLLM().bind_tools(). Here is the code I am executing (after having set a proper LLM_ID), which you can find in the file web_apps/x9jctZd/backend.py here:…
-
After renewal of Core Designer Certificate, Valid date still shows old one
My Core Designer certificate was valid from 24 June 2021 to 24 June 2023. I took the Core Designer certification again on 17 Oct 2024 and passed, but no email with a link to the new certificate came to my registered address. Also, the portal still shows the certificate with the old dates (24 June 2021). Please let me know how to get…
-
Inconsistent Schema in SFTP folder
I am trying to pull in an SFTP folder for my project in Dataiku; however, there's an error with the schema, which is inconsistent across all files in the folder. My problem is that our SFTP folder pulls from a 3rd party, meaning I can't access it to amend the schema, and I am cautious that when more files are added (this is a…
-
Projects, workspaces and other not showing up after upgrading DSS
Hello, I followed the instructions in "Upgrading a DSS instance" to the letter; however, my projects, workspaces, code environments, etc. are not showing up. The only thing I can see is the "Recent and favorite items" tab. I am running my DSS instance in a Docker container. EDIT: It appears that the config/projects folder is…
-
Code studio exposition
Hello, we have a custom Dataiku version 12.5.2 running inside an Azure VM and we want to use the Code Studio feature! Unfortunately, the communication between the DSS UI and the HTTP server running inside the pod uses port forwarding, which is not permitted in the environment we're working in! Is it possible to change the…
-
Convert String value to Date
In Excel, we convert the string " 29692" to the date "16-04-1981" using the =Value() formula. How can we convert the same string to a date in Dataiku?
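Excel stores dates as day counts ("serial numbers"); for dates after 28 February 1900, serial n is 30 December 1899 plus n days (the offset absorbs Excel's fictitious 29 February 1900). A minimal Python sketch of that arithmetic, assuming the string always holds an integer serial in Excel's default 1900 date system; `excel_serial_to_date` is a hypothetical helper name:

```python
from datetime import date, timedelta

# Day zero of Excel's 1900 date system, valid for serials >= 61 (1 March 1900 onwards).
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(value: str) -> date:
    """Convert an Excel serial number stored as text (e.g. " 29692") to a date."""
    return EXCEL_EPOCH + timedelta(days=int(value.strip()))


print(excel_serial_to_date(" 29692").strftime("%d-%m-%Y"))  # 16-04-1981
```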
-
Changing sample size in data preview while the sample computation is loading
I've changed the sampling method on a big dataset and the "Waiting for other sample computation" prompt is loading indefinitely. Is there any way to change those settings before the current computation finishes? (The abort button doesn't help.) Operating system used: Windows 10
-
Modify "Answers" WebApp Plugin
Hello all, I have DSS v13.1 and I want to modify the "Answers" plugin to add a document upload feature (PDFs, images, Excel, Word, etc.), like the one in ChatGPT 4, for example. What is the best way to do so? Operating system used: Linux - Debian 11
-
Import local Hugging Face model
Hello, I want to import a local Hugging Face model, so I tried DSS → Administration → Settings → Misc → Import model. Models of around 14GB can be uploaded, but a larger model was not uploaded, e.g. StarCoder 15B = 60GB. So I checked the log of the DSS node and found a timeout error due to an nginx error. So I changed the…
-
Sharepoint Online Plugin
Looking for assistance with Sharepoint List Views. Anytime I try to bring in a view that I have created and saved in Sharepoint, the plugin doesn't respect the columns that are in my view. As a quick example, I created a view off of a default list that only has three columns, and the plugin pulls in some default set of…
-
Code env libraries installation issue
I have some libraries like nltk installed in some code environment, and when I try to run any code/notebook importing these libraries, I get the module not found error. Why does this happen and how can I fix it?
-
List all failed rows based on quality rule
Hi all, I would like to create a dataset of the rows that failed data quality rules. I would like to list (create a table of) all of the failed rows so I can send it to the team to show what needs to be changed. Knowing whether a rule failed or not is not enough; I need to collect every line where the rules failed. I did not find an option.…
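One possible workaround is to re-express the rules as row-level predicates in a Python recipe and keep every failing row together with the names of the rules it broke. A minimal sketch, assuming rows are dicts; `failed_rows` and the two example rules are hypothetical and do not mirror Dataiku's built-in rule types:

```python
def failed_rows(rows, rules):
    """Return (row, [failed rule names]) for every row that breaks at least one rule."""
    report = []
    for row in rows:
        failed = [name for name, check in rules.items() if not check(row)]
        if failed:
            report.append((row, failed))
    return report


# Hypothetical rules: each maps a name to a predicate that is True when the row passes.
rules = {
    "amount_positive": lambda r: r["amount"] is not None and r["amount"] > 0,
    "country_set": lambda r: bool(r["country"]),
}
rows = [{"id": 1, "amount": 10, "country": "FR"},
        {"id": 2, "amount": -5, "country": ""},
        {"id": 3, "amount": 7, "country": "DE"}]
for row, failed in failed_rows(rows, rules):
    print(row["id"], failed)  # 2 ['amount_positive', 'country_set']
```

The resulting list can then be written out as the dataset to share with the team.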
-
googlesheets plugin feature: Ignore top n rows on import
Reading a google sheet with the plugin currently requires that header columns are in row 1. In the wild, a lot of users don't build sheets like that and the data begins some rows down the sheet. I suggest to add a feature of ignoring a number of top rows to correctly set the header row and table data.
-
Export Dashboard For Different Variable Values
Hi - Please see below my goal, methodology, and the challenge I am facing. Thanks in advance for your help. GOAL: I have a flow that takes a project variable while running. The output is then visualized on the dashboard. I would like to try different variable values and export the resulting dashboards. I have 10 different…
-
Testing SQL Connections
We want to test the database connections in all our instances, since our internal security policy is to change database passwords yearly, and this invariably leads to some user database connections being missed, the password expiring, flows failing, data not being…
-
Using sample.py after exporting a model to Python
Hello, I'm trying to use sample.py after unzipping the archive of a model I exported. The model is a LightGBM with a feature selection step. The DSS version is 12.6.5. However, the Python script crashes after the dummifier step with the error: indexed_matrix.py, line 35, in _remap_key: remapped_key = (key[0],…
-
Confirmation pop-up when deleting a code sample
I accidentally deleted a code sample I had created because I clicked the "trash basket icon" when I just wanted to close it. The "trash basket icon" is in the upper-right corner, which is generally the position of a "cross icon". So I think a deletion confirmation pop-up would be good. Thanks
-
Add Seldon to deployment options
One of the deployment options in our company is Seldon (Seldon, MLOps for the Enterprise.). It would be great if Dataiku had the option to deploy directly to Seldon, the way deployment to K8, AWS, Databricks or Azure is now possible. Seldon in general deploys MLflow artefacts.
-
Put stuff in the API logging without sending in the response
We often run into situations where we'd like to log stuff from our internal API workings - like intermediary results for checking - without having to send these out in the response. It would be wonderful if there was an option to send things to the API log without it having to be part of either the request or the response.
-
fekport': nan, what does this mean?
hello, everyone I'd like to ask about the cause of the result from executing the following code: client = dataiku.api_client() prj_key = dataiku.Project().project_key project = client.get_project(prj_key) scena = project.get_scenario(scenario_id) The result is: { … 'progress ': { … , 'fekport' : nan, …} … } In other…
-
How to uninstall a package within a code env (or install a package without its dependencies)?
Hello, we would like to install a package A (ultralytics in our case) without its dependency B (opencv-python), or be able to install package A and then remove its dependency B. The reason is that ultralytics doesn't work properly with opencv-python; we need to remove it and install…
-
Configurable Timezone Display for Date Columns (Beyond UTC-only)
Current Situation Dataiku DSS has specific behaviors when handling time columns: When it recognizes time-related columns (e.g., date, timestamp_tz, or timestamp_ntz), it displays them as Date columns, rendering them in timestamp format (with both date and time components). A significant limitation is that Date columns…
-
Data Quality Check: Valid Time Series
When working with time series, it would be nice to have a quality check that ensures that time steps meet a minimum definition (e.g. weekly on Monday), have no duplicates, and have no missing steps.
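The three conditions in this request are simple to state in code, which may help when scoping the check (or as a stopgap in a Python recipe). A minimal sketch, assuming the time column is already parsed into `datetime.date` values and the expected step is weekly on Monday; `check_weekly_monday_series` is a hypothetical name:

```python
from datetime import date, timedelta


def check_weekly_monday_series(dates):
    """Return a list of problems found in a supposedly weekly, Monday-anchored series."""
    problems = []
    not_monday = [d for d in dates if d.weekday() != 0]  # Monday == 0
    if not_monday:
        problems.append(f"not on Monday: {not_monday}")
    unique = sorted(set(dates))
    if len(unique) != len(dates):
        problems.append("duplicate time steps")
    # Consecutive distinct dates must be exactly one week apart (no missing steps).
    for prev, curr in zip(unique, unique[1:]):
        if curr - prev != timedelta(days=7):
            problems.append(f"gap between {prev} and {curr}")
    return problems


print(check_weekly_monday_series([date(2024, 1, 1), date(2024, 1, 15)]))
```

An empty list means the series passes all three checks.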
-
Dates localization coerced to UTC
Hello, I'm trying to localize a column of dates from UTC to CET/CEST using a Python recipe. The results are correct when I open them in the notebook, but it seems that Dataiku coerces the dates back to UTC when writing the dataframe. Below is code that reproduces the issue: import dataiku import pandas as pd, numpy as…
-
Extracting data for a specific time range from Datetime
Hi. I want to extract data from 7 AM to 10 AM, regardless of the date, in Datetime, but I don't know how to do it. Thank you.
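Comparing only the time-of-day part of each timestamp makes the filter independent of the date. A minimal Python sketch, assuming the column is parsed into `datetime` values and that 10 AM itself is excluded (change `<` to `<=` to include it); `in_time_window` is a hypothetical helper:

```python
from datetime import datetime, time


def in_time_window(ts, start=time(7, 0), end=time(10, 0)):
    """True if the time-of-day part of ts is in [start, end), whatever the date."""
    return start <= ts.time() < end


rows = [datetime(2024, 3, 1, 6, 59), datetime(2024, 3, 1, 7, 0),
        datetime(2024, 5, 9, 9, 45), datetime(2024, 5, 9, 10, 0)]
print([ts for ts in rows if in_time_window(ts)])
```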
-
Interacting with Labeling Task through Python API
Hello, in my project I am trying to do two different things with labeling tasks. 1- Automatically create a labeling class according to a registry dataset listing the classes that should exist. 2- Get the URL of the labeling class through a Python API to send to a Slack webhook. This is an issue since I need to send the…
-
Database schema
I have a connection to a PostgreSQL database. In this database there are two schemas. Is there a way to specify which schema the datasets should be written to? Operating system used: Ubuntu
-
Loading a Dataiku model in Python notebooks or recipes
Hello everyone, I would like to discuss an issue I'm facing. I trained a gradient boosting classifier using the Dataiku Lab. I would like to use SHAP explainability on it, and first try it in a Python notebook. To do so, I am loading my model this way: import dataiku from dataiku import pandasutils as pdu model…
-
How can we achieve data lineage in Dataiku?
How can we track table and column dependencies in recipes?
-
Batch Processing for Custom API end point
I’ve developed a custom Python API endpoint for regression and successfully predicted outcomes for individual records. However, when I attempt to process a batch of records, I encounter the following error: "Failed: Could not parse a SinglePredictionQuery from request body, caused by: JsonSyntaxException: Expected a…
-
Error in python process: <class 'ValueError'>: Numeric feature score_2 is empty
While trying to run a recommendation system, I received this error, which prevents the job from running. I checked the database and all fields contain data. [01:16:18] [INFO] [dku.utils] - *************** Recipe code failed ************** [01:16:18] [INFO] [dku.utils] - Begin Python stack [01:16:18]…
-
Timeseries forecasting with GPU / cuda 11
Hello, I am now trying to train a model with timeseries forecast by using GPU. OS: Ubuntu 22.04 Installed with apt-get on OS: libcudnn9-cuda-11 cuda-toolkit-11-8 libnccl2 I then created a new python env : when i use that environment in the model, I can see at first that it's fine since it shows me my GPU card : but when I…
-
Why is the recipe saying Dataset doesn't exist after the dataset has been created?
I am facing a strange issue where, even after a dataset is created, the code says that the dataset doesn't exist. Following is the function that I am using, which has 3 steps: 1. Creating a table in the database 2. Creating a dataset in DSS and connecting it to the table created in 1 3. Writing a pandas dataframe…
-
Format Now() Date
In my recipe, I created a new column with Now(). I would like to format that date as 'mm-dd-yyyy'. However, the result is:
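If the formatting is done in Python instead (for example in a Python recipe), the mm-dd-yyyy layout corresponds to the `strftime` pattern `%m-%d-%Y` (`%M` would be minutes, a common mix-up). A minimal sketch with a fixed timestamp standing in for Now(); `format_mm_dd_yyyy` is a hypothetical helper:

```python
from datetime import datetime


def format_mm_dd_yyyy(moment: datetime) -> str:
    """Render a datetime as mm-dd-yyyy, e.g. 10-17-2024."""
    return moment.strftime("%m-%d-%Y")


# A fixed timestamp stands in for Now() so the output is reproducible.
print(format_mm_dd_yyyy(datetime(2024, 10, 17, 14, 30)))  # 10-17-2024

# For the current time: format_mm_dd_yyyy(datetime.now())
```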
-
Extract flow into python / jupyter notebook
Hi! I built a working machine learning flow, from data preparation and processing to modelling and prediction. Can I extract this flow as Python code or as an .ipynb file? Can you please explain this part in detail? Thanks! Operating system used: Windows
-
Governance API: Validation of Governance Approval for a specific bundle of a project
I am trying to create logic to validate the governance approval for a particular bundle of a project. Input params: PROJECT_KEY, BUNDLE_ID. The Python code should be able to check the governance approval status and return the result as approved, pending, rejected, etc. I am trying to use the…
-
Meaning-associated palettes not applying color
I made a user-defined meaning with colors for each value, but my charts don't automatically pick up my colors. Are there any settings that I'm missing here? Some things I tried: a list of just Sharp and Flat; a list of [no value], which led to a null pointer exception; a list with "null"; I tried deleting and remaking…
-
Possible to add flow image and wiki as inline email vs attachments?
Hi, I would like to add the Project wiki at the top of my scenario email template and add the image of the flow below that. Is this possible? thx Operating system used: Windows10
-
How to unzip files from managed folder in dataiku
I am getting the error NotImplementedError: That compression method is not supported at zip_file.open(), even though the code is able to list the filenames in the zipped folder, as you can see (VBOX0001.vbo). Operating system used: Windows
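Python's `zipfile` raises this `NotImplementedError` when a member uses a compression method it cannot decode (Deflate64, method id 9, produced by some Windows tools, is a common example). Listing names still works because the archive's directory is not compressed. A minimal sketch to find the affected members, assuming the archive bytes have already been read from the managed folder; `unsupported_members` is a hypothetical helper:

```python
import io
import zipfile

# Methods the standard library can decode (BZIP2/LZMA also need the bz2/lzma modules).
SUPPORTED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED, zipfile.ZIP_BZIP2, zipfile.ZIP_LZMA}


def unsupported_members(zip_bytes):
    """Return (filename, method id) for each member zipfile cannot decompress."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [(info.filename, info.compress_type)
                for info in archive.infolist()
                if info.compress_type not in SUPPORTED_METHODS]


# Build a small in-memory archive to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("VBOX0001.vbo", "example content")
print(unsupported_members(buffer.getvalue()))  # []
```

If a member reports method 9, one option is to re-create the archive with standard Deflate, or to extract it with a tool that supports Deflate64, before processing it in Python.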
-
Add Granger Causality tests to the stats worksheet
I'd really like to be able to test granger causality between two or more time series. Would it be possible to add it to the stats page, such that I can pick 2 or more input columns, and the GC can be calculated between each pairing and each ordering, over a specified range of lags?
-
Dataiku K8S Pod unable to connect DSS
[urllib3.connectionpool] http://XXXXX:PORT "POST /dip/api/XXXXX/containers/get-execution HTTP/1.1" 500 None [ERROR] [root] Could not reach DSS: None: b'Unknown execution context: test_execution_id Pod config have the necessary variables under the container along with the base image. This execution id is returning as…
-
After installing - the web page is not loading
OS : Ubuntu 22.04 DSS : 13.1.1 After starting DSS free edition, here is what I get by connecting to the server : looking at the backend.log I don't see any error either (see file) any idea what the reason could be ? Operating system used: Ubuntu 22.04
-
Problem loading my Profile from another Laptop, yes amazing
Hi, I've recently started using another laptop to work with Dataiku, and when I connect to my account, all my information is gone: my certifications, and so on. Can you please help me get them back? Operating system used: macOS Sonoma 14.6
-
Dataiku migration - macOS - Error with Launcher
Hi, After several years of faithful service, my old laptop for Dataiku was due for a well-deserved retirement and I got a new one from. So, I've decided to migrate my whole Dataiku environment to the new laptop. I've created a dss-home folder inside DataScienceStudio and copied the content of my backup dss_home into this new one…
-
"Fold" processors in visual recipe - Implement In-Database engine
Today, fold processors require the DSS engine because they are not supported as in-database processing, which forces dataiku designers to implement SQL recipes to perform fold operations. Most modern databases support "unpivot" syntax, which enable fold processors to be converted to SQL.…
-
How to get a trial version of Dataiku to complete the project for the Advanced Designer certification course
-
Preformatting and preprocessing with a registered Dataiku model through Python by using
I would like to use a registered model in a Python recipe and apply its preformatting and preprocessing to a huge dataframe connected to Databricks (which is good for memory issues). But it seems this is not possible without going through a pandas dataframe. Does anybody know how to resolve this? The error…
-
How do I get the average of a column to use in a DSS output file (not a metric)
I have a small table that gives me the average # of workdays, average daily value, and average forecast for the previous 3 months. I have to be able to calculate the average of each column so I can use the results in another prepare recipe (for example, the average of Workdays is 22). Is there a formula that will create…
-
Keeping Conditional Formatting While Exporting Doesn't Work
I have an issue with applying conditional formatting on export: Dataiku does not apply the formatting when exporting. What is the issue, and how can I keep the conditional formatting?
-
Label Encoding Dataiku Recipe
Hello, I cannot find any recipe that is the equivalent of the scikit learn LabelEncoder(). The One Hot encoding recipe can be found in prepare recipe as "Unfold" step but regarding LabelEncoder (IE label_1 = 1, label_2 = 2 … label_N = N) is very well hidden. I could make a python recipe but i would prefer to every task in…
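For reference, the transformation itself is small. A minimal pure-Python sketch of LabelEncoder-style behaviour (sorted distinct labels mapped to 0..N-1; add 1 to each code to start at 1 as in the question); `label_encode` is a hypothetical helper, not a Dataiku processor:

```python
def label_encode(values):
    """Map each distinct label to an integer code, like scikit-learn's LabelEncoder."""
    classes = sorted(set(values))  # LabelEncoder also sorts its classes
    code_of = {label: code for code, label in enumerate(classes)}
    return [code_of[v] for v in values], classes


codes, classes = label_encode(["label_2", "label_1", "label_3", "label_1"])
print(codes)    # [1, 0, 2, 0]
print(classes)  # ['label_1', 'label_2', 'label_3']
```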
-
If the dataset doesn't exist, I want to either catch the exception thrown or check whether the df exists
Exception: Reading dataset failed: b"Failed to read data from table, caused by: SnowflakeSQLException: SQL compilation error:\nObject 'PROD_.GLOBAL_PILOT' does not exist or not authorized."
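Because the failure surfaces as a generic `Exception` whose message contains Snowflake's "does not exist or not authorized" text, one option is to catch it and inspect the message. A minimal sketch, assuming that message text stays stable; `read_or_none` is a hypothetical helper, and matching on message text is a heuristic rather than a documented error code:

```python
def read_or_none(read_fn, missing_marker="does not exist or not authorized"):
    """Call read_fn(); return None if it fails because the table is missing.

    Any other error is re-raised so that genuine failures are not hidden.
    """
    try:
        return read_fn()
    except Exception as exc:  # the error in the question is a plain Exception
        if missing_marker in str(exc):
            return None
        raise


# In a recipe this might be (dataset name is hypothetical):
# df = read_or_none(lambda: dataiku.Dataset("MY_DATASET").get_dataframe())
# if df is None: ...handle the missing table...
```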
-
Dataiku visual Recipe Parallel
I am using Dataiku 12.5.2. How can I enable parallel processing when using a Sync recipe for the following cases: Filesystem Dataset to Filesystem Dataset JDBC Dataset to JDBC Dataset Or between Filesystem Dataset and JDBC Dataset? Are the only available options duplicating the flow, partitioning the data, or using code…
-
Expanding reach
Hello, I am an experienced AI/ML consultant with significant hands-on experience using Dataiku and three certifications. I am interested in learning effective strategies to reach clients who are looking for Dataiku consultants. Best Regards Operating system used: mac os
-
Leveraging Dataiku Instance from outside the environment.
I've tried using dataiku managed instance from local workstation using dataiku API client v13.1.4 and had trouble accessing projects in the machine. import dataikuapi import random import requests requests.packages.urllib3.disable_warnings() client = dataikuapi.DSSClient(DATAIKU_URL, API_KEY) project_keys =…
-
Is there a way to copy a dashboard to another project?
I have a dashboard in our QA environment to develop that I then want to publish to our prod instance. so Ideally I wanted a way to export the dashboard and then import in the other instance. Operating system used: Windows
-
Dataiku Automation For Excel File Refresh
Hello, I have an excel file which is connected to SQL DWH. Normally everyday, I go in this file to refresh data manually by clicking ctrl + alt + f5. I want to make it automatically with Dataiku. pip install pywin32 import win32com.client as win32 def refresh_excel_file(file_path): # Open Excel excel =…
-
Why does Dataiku allow two web-apps with the same name?
I was surprised to find Dataiku allows two web-apps with the same name to exist. Why? The expected behavior would be to ask the user if they want to overwrite a published web-app when name collision occurs. thx Operating system used: WIndows 10
-
Running Python code in Library from Dataiku Application
Hi! I am interested in understanding if it is possible to call a Python script in a Library that I linked with my GitHub, and call a module within that script to run it on a Dataiku Application. In other words, how do you call a Python script defined in a Project Library to the interface below which is defined in the…
-
Dataiku Python notebook kernel error.
Hi Dataiku Expert, We got a kernel error when starting a python notebook in Dataiku. The error message shows below. It is strange that it only happens to certain users. For the same notebook, one user can start and connect to the kernel without issue, but the affected user will get this kernel error. Both tested users have…
-
How to share objects using the Dataiku API
Hi, It appears that the settings.add_exposed_object() method is undocumented. So documenting here few examples for the benefit of others: import dataiku client = dataiku.api_client() project = client.get_project(source_project_key) settings = project.get_settings() # Share Managed Folder…
-
Is it possible to edit RMarkdown Reports in RStudio Code Studio?
Hello there Is it somehow possible to edit RMarkdown reports in an RStudio Code Studio? This would be great, to leverage the interactivity in RStudio, and save the resulting report back to DSS to serve as a dashboard insight for example. Developing a full RMarkdown report in the existing editor is quite difficult.…
-
Dashboard - Timeline and chart show different lines (timeline looks correct)
Hi team, can you explain why the timeline and chart on this dashboard look different? The timeline looks correct (shows two lines), the chart itself shows only one line. Operating system used: AWS
-
ModuleNotFoundError: No module named 'dataiku.langchain'
Hi, I was recently trying to install the dataiku api locally for some testing purposes but was met with some errors. Installation seemed to be working fine but the moment I tried to run this code:`from dataiku.langchain.llm import DKUChatModel` , it seemed to crash and be unable to find the dataiku.langchain module. What…
-
API : load a request from postman/bruno collection
Hello all, configuring a REST API by hand can be painful. On the other hand, the API world uses a lot of tools such as Postman or Bruno (an open source clone), which allow easy testing, debugging... I use them every time I have to work on a REST API, and then I try to translate it to the final tool. Both tools offer "collections", a set…
-
ADBC connectivity : faster columnar storage query
Hello all, ADBC is a database connection standard (like ODBC or JDBC) but specifically designed for columnar storage (so databases like DuckDB, ClickHouse, MonetDB, Vertica...). This is typically the kind of thing that can make Dataiku way faster. more info in Here is a benchmark made by the folks at DuckDB: 38x improvement…
-
How to run/build a flow zone using dataiku python api
Let's say that I have created a flow zone with id 'xytg' and I want to trigger/run/build this flow zone using the Python API. How can I do that? I looked through the Python API but could not find it.
-
Properly implement support for Building Flow Zones in Scenarios and the Dataiku API
In Dataiku v12.0.0 a new feature was added that allows users to build flow zones from the flow UI: https://knowledge.dataiku.com/latest/data-preparation/pipelines/tutorial-build-modes.html#build-a-flow-zone This works well however this capability was never added properly to Scenarios and to the Dataiku API. In 12.1.0…
-
Export predicted data through code
I need to export predicted data from visual analysis using python. I am using performance on train set as a reference for my monitoring therefore I need to automate downloading the predicted dataset to keep it up-to-date in case I retrain my model Operating system used: Windows
-
The Python process failed (exit code: 2)
Hi, I'm new to this tool and was following the tutorials from the Quick Start courses. When I tried to build the flow, I got the following error: [18:46:22] [INFO] [dku.flow.activity] - Run thread failed for activity…
-
Delete a partition
Hi, can anyone please help me delete a partition? I have created a dataset with many partitions, and unfortunately one of the partitions was loaded incorrectly. I would like to delete only that partition and keep the remaining ones as they are. Please help, as I can't find any option to delete it.
-
Export data to password-protected xls
Hi! Does DSS have an option for setting a password on a newly generated xls-file when exporting data ? A general password that is, needed for opening that file. If a pythonscript is the way to handle this, what would be the preferred library for that ? Thanks in advance for any thoughts on this! Jurre
-
Export Partitioned Dataset
I'm trying to use a python recipe to export a partitioned dataset, when I partition by a specific column(date column(LDD for example)), that column is removed from the dataset. How would I export the partitioned dataset into monthly files based on LDD which was partitioned?
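One way to keep the LDD column is to do the monthly grouping yourself instead of relying on the partitioning column. A minimal sketch, assuming rows are dicts with LDD already parsed as `datetime.date`; `group_rows_by_month` is a hypothetical helper, and the file name in the comment is only illustrative:

```python
from collections import defaultdict
from datetime import date


def group_rows_by_month(rows, date_column="LDD"):
    """Group rows by 'YYYY-MM' of date_column, keeping the column in every row."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[date_column].strftime("%Y-%m")].append(row)
    return dict(groups)


rows = [{"id": 1, "LDD": date(2024, 1, 5)},
        {"id": 2, "LDD": date(2024, 1, 20)},
        {"id": 3, "LDD": date(2024, 2, 2)}]
for month, month_rows in group_rows_by_month(rows).items():
    # Each group could be written to its own file, e.g. f"export_{month}.csv".
    print(month, [r["id"] for r in month_rows])
```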
-
Update Excel sheet via Python script
Hi folks, I want to update specific cells in an Excel sheet using openpyxl.load_workbook in Python. When I run the code, I don’t encounter any errors, but nothing gets updated in the file. How can I solve this problem? Thanks in advance.
-
O365 connectors within Teams will be deprecated and notifications from this service
We setup a script to send out scenarios checking using get_message_sender() & send() APIs to team channel. Starting today, this message popup: "Action Required: O365 connectors within Teams will be deprecated and notifications from this service will stop. Learn more about the timing and how the Workflows app provides a…
-
Repeated Random Splitting and Bootstrapping with XGBoost
I have a dataset that I want to random split into train and test sets with an 80/20 ratio. I aim to repeat this random splitting and bootstrapping of the training data 1,000 times. For each iteration, I'll train an XGBoost model and then export the SHAP values, Gini index for each feature, F1 score, and ROC AUC for the…
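The resampling part of this loop can be sketched independently of the model. A minimal sketch, assuming rows are addressed by integer index and that "bootstrapping" means drawing the training set with replacement at its original size; the XGBoost training and the SHAP/metric export would go inside the loop and are not shown. `split_and_bootstrap` is a hypothetical name:

```python
import random


def split_and_bootstrap(n_rows, n_iterations, test_ratio=0.2, seed=0):
    """Yield (bootstrap_train_indices, test_indices) for each iteration."""
    rng = random.Random(seed)  # seeded so the 1,000 splits are reproducible
    indices = list(range(n_rows))
    n_test = round(n_rows * test_ratio)
    for _ in range(n_iterations):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]
        # Bootstrap: resample the training rows with replacement, same size.
        boot_idx = rng.choices(train_idx, k=len(train_idx))
        yield boot_idx, test_idx


for iteration, (boot_idx, test_idx) in enumerate(split_and_bootstrap(1000, 3)):
    # Train the model on boot_idx and evaluate it on test_idx here.
    print(iteration, len(boot_idx), len(test_idx))
```

Bootstrapping only inside the training part keeps every test row unseen by the model of that iteration.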
-
Smart indexing: recommend index based on downstream recipes
It would be helpful if on the index selection menu for a dataset, some smart values could be displayed based on downstream recipes, and if in the recipe creation views, upstream datasets could be reindexed to optimize them as well. For example, after I've created two join recipes downstream of a dataset, on the index…
-
Using a pre-deployment hook to create a PV and PVC?
"I am trying to add a Persistent Volume Claim (PVC) to a Kubernetes deployment using a pre-deployment hook in Dataiku. Could you provide any documentation or steps outlining how to add a volume to a deployment through a pre-deployment hook?" Operating system used: windows
-
Looking to replicate a SUM(COUNTIF) formula in Dataiku
I am working on a scorecard in Dataiku and I would like to calculate the percentage of completion in a set number of columns. Basically, I would like to replicate this formula in excel: =SUM(COUNTIF(ColumnX:ColumnXX,"*")/Total Number of Columns) and am having issues. The columns are a mix of strings, integers, and text,…
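Per row, the formula amounts to the number of filled cells divided by the number of columns. A minimal sketch, assuming "filled" means not None and not an empty or whitespace-only string; note that Excel's "*" wildcard in COUNTIF only matches text, so numbers would not be counted there, whereas this sketch counts them. `completion_ratio` is a hypothetical helper:

```python
def completion_ratio(row, columns):
    """Share of the given columns that hold a non-empty value in this row."""
    def is_filled(value):
        if value is None:
            return False
        if isinstance(value, str):
            return value.strip() != ""
        return True  # numbers and other non-string values count as filled

    filled = sum(1 for col in columns if is_filled(row.get(col)))
    return filled / len(columns)


row = {"q1": "yes", "q2": "", "q3": 4, "q4": None}
print(completion_ratio(row, ["q1", "q2", "q3", "q4"]))  # 0.5
```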