-
Creating new features through the Python API in visual analysis
I am trying to build models through Dataiku's Python API and deploy the model as an API endpoint. I want to add some feature creation steps in the visual analysis so that I can pass raw data to the endpoint, as described in the Dataiku documentation below. I want to know if it's possible to create preprocessing steps…
-
How can I write a pandas DataFrame to a database through a SQL Server connection?
Suppose I have a pandas DataFrame and I want to create a new table in a SQL Server connection containing all of its data. For Snowflake, I use the DkuSnowpark module and write_with_schema, but I couldn't find anything similar for SQL Server. I tried using SQLAlchemy but got a driver error, and I couldn't find another way.
-
Problem with loading large files
I am trying to upload datasets from my local machine to Dataiku, but it only lets me upload files smaller than 1 GB. I have tried several file types, but each attempt fails with an error while loading the data. I don't know whether this is due to the instance or to the license I have.
-
Saving a 'styler' type object to a managed folder?
Hi, I am trying to save a pandas Styler object as an HTML file (ref: pandas.io.formats.style.Styler.to_html, pandas 2.2.2 documentation, pydata.org) in a Dataiku managed folder (HDFS, not local). Can you help me with that? My DataFrame Styler name: df. Folder handler: folder. Code to save to a managed folder: region_monitor_path…
-
Count number of rows depending on a condition
Hello, I would like to increment a row counter, grouped by some variables, only when a condition between a date and its lag is true. The idea is the following: if it is the first time we encounter an id, then var = 1; else if id = id_lag and dat - dat_lag > 30, then var = var + 1. I tried to do this with a window recipe but…
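The rule described above can be sketched in plain Python as a single pass over rows sorted by id and date. This is only an illustration of the logic, not a window recipe configuration: the function name `gap_counter`, the tuple layout, and the sample dates are assumptions made for the example, and the gap is assumed to be measured in days.

```python
from datetime import date

def gap_counter(rows, max_gap_days=30):
    """Assign a counter per id that increases whenever the gap to the
    previous date of the same id exceeds max_gap_days.

    rows is an iterable of (id, date) tuples (hypothetical layout).
    """
    result = []
    prev_id = None
    prev_date = None
    var = 0
    for rec_id, dat in sorted(rows):
        if rec_id != prev_id:
            # First time this id is encountered: start at 1.
            var = 1
        elif (dat - prev_date).days > max_gap_days:
            # Same id and more than max_gap_days since the previous row.
            var += 1
        result.append((rec_id, dat, var))
        prev_id, prev_date = rec_id, dat
    return result

sample = [
    ("a", date(2024, 1, 1)),
    ("a", date(2024, 1, 15)),
    ("a", date(2024, 3, 1)),
    ("b", date(2024, 1, 1)),
]
for rec_id, dat, var in gap_counter(sample):
    print(rec_id, dat.isoformat(), var)
```

The counter resets to 1 for each new id and otherwise carries over unchanged when the gap is 30 days or less.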
-
Add an automatic timestamp to a dataset name
Hello! I would like to automatically add a timestamp to my output dataset names (and then export them to folders). Does someone know how to do that? For example, on September 10th, my dataset would be named "Dataset_100924".
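One possible way to build such a name, sketched in plain Python. The helper name `timestamped_name` is hypothetical, and the DDMMYY pattern is inferred from the "Dataset_100924" example, assuming the year is 2024. This only shows how to construct the string; it does not cover renaming or exporting a dataset in Dataiku.

```python
from datetime import date

def timestamped_name(prefix, day=None):
    """Return prefix + '_' + the date formatted as DDMMYY.

    If no date is given, today's date is used.
    """
    if day is None:
        day = date.today()
    return f"{prefix}_{day.strftime('%d%m%y')}"

print(timestamped_name("Dataset", date(2024, 9, 10)))  # prints Dataset_100924
```

Passing the date explicitly keeps the function easy to test; calling it without a date uses the current day.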
-
Export datasets to folders
Hello! I would like to export several datasets from my project to the same folder (one folder per required date). Does someone know how to do that? Thanks!
-
How to go from flat relational data to nested object-oriented data
I am trying to combine multiple rows into a single nested JSON object. I know how to do the opposite (i.e. flatten), but cannot find the right tool to go in the other direction. As an example, I start with this data:
Class, Student, Grade
1, Sally, A
1, Matt, A
1, Phil, C
What I want as an output is a single record: Class,…
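A minimal sketch of the grouping step in plain Python, using the sample rows from the question. The desired output is cut off in the post, so the nested shape used here (a "Students" list of objects under each class) is an assumption, as is the function name `nest_by_class`.

```python
import json
from collections import defaultdict

rows = [
    {"Class": 1, "Student": "Sally", "Grade": "A"},
    {"Class": 1, "Student": "Matt", "Grade": "A"},
    {"Class": 1, "Student": "Phil", "Grade": "C"},
]

def nest_by_class(rows):
    """Group flat rows into one record per class.

    The "Students" key and the per-student object layout are assumed.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Class"]].append(
            {"Student": row["Student"], "Grade": row["Grade"]}
        )
    return [{"Class": cls, "Students": students} for cls, students in grouped.items()]

print(json.dumps(nest_by_class(rows), indent=2))
```

With the three sample rows, this produces a single record for class 1 containing three student objects.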
-
Dataiku equivalent of the Fuzzy Match tool in Alteryx
Hi Team, I am migrating a workflow from Alteryx to Dataiku and encountered an Alteryx tool called Fuzzy Match, which compares 3 columns and generates a new column for the rows that satisfy the partial matching criteria. Below is the input data. Below is the configuration in the Alteryx tool. Below is the sample output in…
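To illustrate what partial matching can look like in code, here is a small sketch using Python's standard `difflib` module. It is not a reproduction of the Alteryx configuration, which is not shown in the post: the function names, the 0.8 threshold, and the sample strings are all assumptions, and the sketch compares one pair of values rather than three columns.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0,
    ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def is_fuzzy_match(a, b, threshold=0.8):
    """Treat two strings as a match when their ratio reaches the threshold."""
    return similarity(a, b) >= threshold

print(is_fuzzy_match("John Smith", "Jon Smith"))
print(is_fuzzy_match("John Smith", "Maria Lopez"))
```

The threshold controls how strict the match is; a real migration would need to tune it against the original Alteryx output.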
-
Question about the "if" formula using a generic character string
Hello, I would like to use an "if" formula with a condition that applies only if the character string begins with "Adults". Thanks for your help.
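The condition itself can be sketched in Python with `str.startswith`. This shows the logic only, not Dataiku formula syntax; the function name `classify` and the labels "adult" and "other" are hypothetical.

```python
def classify(value):
    """Return one label when the string begins with "Adults", another otherwise."""
    if value.startswith("Adults"):
        return "adult"
    return "other"

print(classify("Adults 25-40"))
print(classify("Young Adults"))
```

Note that the check is case-sensitive and only looks at the beginning of the string, so "Young Adults" does not satisfy it.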