-
Partitioning
Hi there, I have a problem aggregating big data files stored on S3. The data is laid out like this: /2022-12-21T00/B00/part-00004-d89b41ad-3d2b-4350-9880-c5f1dfbdbea6.c000.csv.gz. The T00 stands for the hour, and B00 is a group that always contains the same subjects. What I am trying to achieve is to aggregate the B00…
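A first step for this kind of aggregation is usually to recover the partition keys (date, hour, B-group) from each object key so that files can be grouped by B-group across hours. A minimal sketch of that parsing step, assuming the path layout shown in the question:

```python
import re

def parse_partition(path):
    """Extract (date, hour, group) from paths like
    '/2022-12-21T00/B00/part-<id>.c000.csv.gz'. Returns None if the
    path does not follow that layout."""
    m = re.search(r"/(\d{4}-\d{2}-\d{2})T(\d{2})/(B\d{2})/", path)
    if not m:
        return None
    date, hour, group = m.groups()
    return date, hour, group

path = "/2022-12-21T00/B00/part-00004-d89b41ad-3d2b-4350-9880-c5f1dfbdbea6.c000.csv.gz"
print(parse_partition(path))  # ('2022-12-21', '00', 'B00')
```

With the keys in hand, files sharing the same group value can be read and aggregated together, whatever engine (Spark, pandas, a Dataiku partitioned dataset) is used downstream.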
-
calculate the difference between 2 datetime values in hh:mm
I have two datetime values, date1: "2024-05-60 12:00:00" and date2: "2024-05-07 09:58:00". Is it possible to calculate the difference in hh:mm between those two datetimes in Dataiku? Operating system used: Windows
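One way to get an hh:mm difference is a Python step (e.g. in a Python recipe). A minimal sketch, noting that the posted date1 ("2024-05-60") is not a valid calendar date, so the example below assumes "2024-05-06" purely for illustration:

```python
from datetime import datetime

def diff_hhmm(date1, date2):
    """Return the absolute difference between two 'YYYY-MM-DD HH:MM:SS'
    datetimes, formatted as 'HH:MM' (hours may exceed 24)."""
    fmt = "%Y-%m-%d %H:%M:%S"
    d1 = datetime.strptime(date1, fmt)
    d2 = datetime.strptime(date2, fmt)
    total_minutes = abs(int((d1 - d2).total_seconds())) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}"

print(diff_hhmm("2024-05-06 12:00:00", "2024-05-07 09:58:00"))  # 21:58
```

In a Prepare recipe, a similar result is possible by computing the difference in minutes between two parsed dates and then formatting hours and minutes from that number.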
-
Recipe to create multiple datasets from a single dataset
I have a dataset called "MasterData" in the flow. I want to subset this data based on the "Country" column, save each subset under a name that includes the country (e.g. "MasterDataAustralia"), and place it in a zone built exclusively for that country. I have around 60+ countries in the master data, and new countries may be added…
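The subsetting logic itself is a simple group-by over the country column. A minimal pandas sketch of that step (creating the actual per-country Dataiku datasets and flow zones would additionally use the Dataiku Python API, which is not shown here; the column and naming convention below follow the question):

```python
import pandas as pd

def split_by_country(df, country_col="Country", prefix="MasterData"):
    """Return a dict mapping '<prefix><Country>' to the corresponding
    subset of rows, one entry per distinct country value."""
    return {
        f"{prefix}{str(country).replace(' ', '')}": group.reset_index(drop=True)
        for country, group in df.groupby(country_col)
    }

df = pd.DataFrame({"Country": ["Australia", "France", "Australia"], "value": [1, 2, 3]})
subsets = split_by_country(df)
print(sorted(subsets))  # ['MasterDataAustralia', 'MasterDataFrance']
```

Because the dict is built from whatever countries appear in the data, newly added countries are picked up automatically on the next run.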
-
How to create a dynamic contains function
I have 2 files/sheets: one that shows file names and one that shows colors (a filename sheet and a colors sheet). What I'm trying to do is somehow link these two datasets: if a filename contains one of the colors in my colors sheet, then list that color in a new column (that is my desired output). Is that possible to do in Dataiku? I thought…
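The core of this "dynamic contains" is a substring scan of each filename against the list of colors. A minimal Python sketch (the column names are illustrative; in Dataiku this could run in a Python recipe that reads both datasets):

```python
from typing import Optional

def match_color(filename, colors):
    """Return the first color whose name appears (case-insensitively)
    inside the filename, or None if no color matches."""
    lower = filename.lower()
    for color in colors:
        if color.lower() in lower:
            return color
    return None

colors = ["red", "blue", "green"]
print(match_color("IMG_blue_0042.png", colors))  # blue
print(match_color("notes.txt", colors))          # None
```

Applied per row (e.g. with pandas `apply`), this fills the new color column; filenames matching no color get an empty value.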
-
Github configuration
Hi everyone, I'm trying to connect to my private GitHub repo via SSH key, but I got this response: An error happened while adding the following remote: origin (git@github.com:user-repo.git), and then fetching it, caused by: IOException: Process failure, caused by: IOException: Process execution failed (return code 1)…
-
Not able to read text files using Pyspark in Dataiku
Hi, I'm trying to read text files from my managed folder using PySpark in Dataiku. I created an RDD, but when I call collect() on it, it throws an error saying the path doesn't exist. Below is the code:

```python
# -*- coding: utf-8 -*-
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from…
```
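A common cause of "path does not exist" here is that Spark executors resolve paths through their own filesystem, not through the managed folder's local path. One hedged workaround, assuming the folder lives on the local filesystem (its path obtained via `dataiku.Folder("my_folder_id").get_path()`, where `my_folder_id` is a placeholder), is to enumerate and read the files with plain Python and then hand the contents to Spark with `sc.parallelize`. The enumeration step in isolation:

```python
import os
import tempfile

def list_text_files(folder_path):
    """Return the absolute paths of all .txt files directly inside
    folder_path, sorted by name."""
    return sorted(
        os.path.join(folder_path, name)
        for name in os.listdir(folder_path)
        if name.endswith(".txt")
    )

# Small self-contained demo against a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "skip.csv"):
        open(os.path.join(tmp, name), "w").close()
    print([os.path.basename(p) for p in list_text_files(tmp)])  # ['a.txt', 'b.txt']
```

Reading each listed file into a Python list of lines and parallelizing that list sidesteps the executors' view of the filesystem entirely, at the cost of pulling the data through the driver.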
-
Hide API Keys in Project Library Editor
Hi, so I have the following code in my project's Library Editor; however, it contains manually defined keys, which I do not want to be shared. Let me give more context. I have a Python script that calls the following function and extracts my files from Confluence using from langchain.document_loaders import ConfluenceLoader…
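One standard way to keep keys out of shared library code is to read them from the environment at runtime instead of hard-coding them. A minimal sketch (the variable name `CONFLUENCE_API_KEY` is illustrative; Dataiku also provides user secrets as a managed alternative, retrievable through its Python API client):

```python
import os

def get_confluence_api_key():
    """Fetch the Confluence API key from an environment variable,
    failing loudly if it has not been configured."""
    api_key = os.environ.get("CONFLUENCE_API_KEY")
    if not api_key:
        raise RuntimeError("CONFLUENCE_API_KEY is not set")
    return api_key

os.environ["CONFLUENCE_API_KEY"] = "dummy-value"  # for demonstration only
print(get_confluence_api_key())  # dummy-value
```

The library function then takes the key as a parameter (or calls the getter itself), so the shared code never contains a literal secret.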
-
How to read the text file using pyspark in Dataiku
I'm new to Dataiku and trying to read a text file using PySpark. I tried creating a dataframe with spark.read.text() and used a Spark context to create an RDD, but both methods throw errors. Now, when I create a Spark context, it throws an error like "RuntimeError: Java gateway process exited before sending its port…
-
Assistance Needed with Custom Python Triggers in Dataiku
Hello folks, I recently created a project in Dataiku aimed at collecting metric data at the beginning and end of each month. Here is a quick summary of my project: I used a scenario to execute an SQL query and set up triggers for the beginning and end of the month, with specific parameters so that it launches only on working days…
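The "only on working days" condition boils down to two date predicates: is today the first Mon-Fri day of the month, or the last one. A minimal sketch of those predicates (in a Dataiku custom Python trigger, the scenario would then be fired only when one of them holds; the helper names are illustrative):

```python
import calendar
from datetime import date, timedelta

def is_first_working_day(d):
    """True if d is the first Monday-to-Friday day of its month."""
    first = date(d.year, d.month, 1)
    while first.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        first += timedelta(days=1)
    return d == first

def is_last_working_day(d):
    """True if d is the last Monday-to-Friday day of its month."""
    last = date(d.year, d.month, calendar.monthrange(d.year, d.month)[1])
    while last.weekday() >= 5:
        last -= timedelta(days=1)
    return d == last

print(is_first_working_day(date(2024, 7, 1)))   # True  (Mon, 1 Jul 2024)
print(is_last_working_day(date(2024, 6, 28)))   # True  (Fri; 29-30 Jun fall on a weekend)
```

Note that this treats only weekends as non-working days; public holidays would need an extra calendar lookup.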
-
Weighting method documented in ML Model Results?
Hi all, I have been unable to find documentation of the weighting method setting in the model results summary. Is it not there, or am I just somehow missing it? I typically compare the performance of weighting (class weights) against no weighting when tuning my models, and I'd like to be able to look at the results summary…