-
Combine split and get in a formula
Hello everybody, this is my first post in the community. I have a question: I am trying to alter a column that stores dates in the format dd/MM/yyyy HH:mm:ss, and I want to split it to get the hour. I tried asDate, get and split, but without success. Can somebody help me? Thanks in advance. HP
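For readers with the same question, here is a minimal Python sketch of the idea, assuming the column holds strings such as "25/12/2023 14:05:09". It is not the formula syntax the post asks about; the function names `hour_by_parsing` and `hour_by_split` are hypothetical and only illustrate the two approaches (parse the date, or split the string).

```python
from datetime import datetime

# Assumed input format: dd/MM/yyyy HH:mm:ss, e.g. "25/12/2023 14:05:09".
DATE_FORMAT = "%d/%m/%Y %H:%M:%S"


def hour_by_parsing(value: str) -> int:
    """Parse the full timestamp, then read the hour field."""
    return datetime.strptime(value, DATE_FORMAT).hour


def hour_by_split(value: str) -> int:
    """Split on the space to isolate the time, then on ':' to isolate the hour."""
    time_part = value.split(" ")[1]
    return int(time_part.split(":")[0])
```

Parsing rejects malformed values with a `ValueError`, while the split version is simpler but accepts anything that has a space followed by a colon-separated number.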
-
DataFrame developed with PySpark remains running without yielding any results.
Hi everyone, I have a challenge with a Jupyter notebook using PySpark. The trouble comes when I try to instantiate a dataframe with the write_with_schema instruction. The complete code is: import dataiku from dataiku import spark as dkuspark from pyspark import SparkContext from pyspark.sql import SQLContext sc =…
-
Shapley feature importance
Hello Dataiku team, Thanks for this tool, it has been useful for my school projects. I am currently trying to map the features that are influential in my model using Shapley feature importance, but they do not change across the different target classes. How do I resolve this? Again, this is a very useful tool. Best, Jyothikamalesh.S
-
How to learn Dataiku
Hi guys, I want to use Dataiku as an administrator, but I don't know where to start. Do you know where I can start? The documentation does not suit me.
-
Running multiple SQL statements within a Python recipe to return a dataframe
Hi, Here is some SQL that our internal Python recipe creates dynamically as it boots, to check whether the database, schema and table have proper permissions before performing a function. show grants on database MYDB;create or replace temporary table db_check as select EXISTS( select * from table(result_scan(-1)) where "privilege" =…
-
Wrong date format when reading Excel file
Hi, I have an issue when reading Excel files stored in a folder. I import one file per month into my folder, and I create a dataset that reads all the files to obtain one stacked table as a result. For some of them (but not all), the format is not read the way it is stored: * In my Excel file, the column is stored with dd/MM/yyyy format…
-
Very big dataset
I have a very large dataset: 16.8 billion records and about 8 TB. It takes days to do any operation on the data, and the project owner wants to use all the data, not a subset. Dataiku and S3 run into memory errors after several hours of running. I am looking for some general guidelines on how to handle this situation. Thank you.
-
Spark integration problem on k8s
Hi all, I have a problem deploying Spark on k8s, and it shows me the error below. dataiku@dataiku---design:~$ /data/design/bin/dssadmin install-spark-integration -standaloneArchive spark-3.3.4-bin-hadoop3.tgz [+] Saving installation log to /data/design/run/install.log [+] Standalone mode selected + Using…
-
Dependency between Projects for Building DataSets
Is there an option to set dependencies when triggering jobs, as we do with DataStage ETL jobs? For example, setting a predecessor for a job that is waiting to be executed in the queue. Ex: Imagine Project B is being built or refreshed, and within Project B there is a dataset which gets refreshed only when Project A is built/refreshed. Is…
-
Project Export Null Pointer Exception
I'm trying to export a project and this error comes up. I suspected it could be due to limited disk space, so I tried exporting just the shell project without any data, but the error persists. Can anyone help? An internal error occurred. Please report this issue to Dataiku DSS Support. Technical details follow: * Internal error,…