Using Dataiku

Dataset with a large number of columns
I have a dataset with 396 columns, and it is parsed incorrectly in the dataset sample view, whereas the preview (before the dataset is created) parses the columns correctly. I also get a warning in the sample view but not in the schema preview. I have the issue with a file with 60k rows and I tried with the same file…
How can I create a materialized view in DSS and use it in the flow?
I have a few tables in PostgreSQL and I want to make a materialized view for a complex join query.
How can I set the mapping of an Elasticsearch dataset?
"Fail to run" in dataset
I have a problem with my dataset. Every time I try to update or deploy my dataset, it shows "fail to run" or "the target server failed to respond". Do you know any possible cause, and how to fix it? Thanks
Sync a dataset to S3 with headers, no compression and custom name
Hi, I created a workflow to normalize a dataset imported from an AWS S3 bucket. I added a "prepare recipe" and it works fine. I want to export this dataset to S3 with a custom name, a header, and no compression (this file will be processed by an external tool), but I don't see any options to configure the export while using "sync…
Visualisation of res
Hi, I have made a script that shows me 32 different values (they are names) from my dataset. But I can't find those values in the dataset, although I don't think I am looking at a sample of it. Does anyone know why I can't see anything (because obviously the script won't create new names, so those names have to exist…
Issue connecting to Hive : "Failed to synchronize Hive metastore"
Hi, I installed Dataiku successfully on my VM and created the two hdfs connections required by the tutorial : "hdfs_root" & "hdfs_managed". However, I can't seem to connect to Hive Metastore when I synchronize the haiku_shirt_sales.csv that I imported under the hdfs_managed directory to create a corresponding metastore…
How to sample an unbalanced dataset?
I have a dataset too big to fit in memory, so I want to downsample it. But the two classes to predict are unbalanced: there are many more 0's than 1's in the target column.
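One way to approach this outside of any DSS-specific feature is to stream the rows and thin out only the majority class, so the whole dataset never has to sit in memory. The following is a minimal sketch in plain Python, not a Dataiku API: the function name `downsample_majority`, the column name `target`, and the default `keep_fraction` are all assumptions made for illustration.

```python
import random
from typing import Dict, Iterable, Iterator


def downsample_majority(
    rows: Iterable[Dict[str, str]],
    target_col: str = "target",      # assumed column name
    majority_value: str = "0",       # assumed label of the over-represented class
    keep_fraction: float = 0.1,      # assumed share of majority rows to keep
    seed: int = 42,
) -> Iterator[Dict[str, str]]:
    """Yield every minority row and a random fraction of the majority rows.

    Rows are consumed one at a time, so memory use does not grow with
    the size of the input.
    """
    rng = random.Random(seed)
    for row in rows:
        is_majority = row[target_col] == majority_value
        # Minority rows always pass; majority rows pass with
        # probability keep_fraction.
        if not is_majority or rng.random() < keep_fraction:
            yield row
```

The input can be any iterator of dict-like rows, for example a `csv.DictReader` over the source file, with the output written row by row through a `csv.DictWriter`. Keep in mind that a model trained on the downsampled data sees a different class ratio than the original, so predicted probabilities may need recalibrating.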
scrap the web
I was trying to use the "scrap the web" snippet and I am getting: java.io.IOException: Process return code is 1 at com.dataiku.dip.dataflow.exec.AbstractCodeBasedRecipeRunner.execute(AbstractCodeBasedRecipeRunner.java:213) at…
“Unterminated quoted field at the end of the file”

Trending Discussions

How to automate flow build?
I have multiple Python scripts that I use to ingest data from a DB and perform ETL steps. At the moment all the ETL logic runs on an on-premise server (mainly using Python / cron). Since we now have Dataiku available, I'm thinking about migrating all our ETL to Dataiku (flows using Python recipes). Yet, it is…
compatibility of the Foreach and transpose transformations
Hi, I have a project which uses a Foreach statement followed immediately by a Transpose. The result of the output should be 26, as you can see in the picture, but we get 19 for some reason. When we split the prepare statement in two, moving the Transpose into a separate prepare, the output was correct. I believe it has something to do with…
Performance issue in Dataiku.
Hi, I am new to Dataiku and am creating a pipeline like this: Databricks read-only dataset -> prepare recipe (Databricks dataset) -> (sync) Databricks dataset -> (sync) Azure dataset, and then further processing. In the prepare recipe I keep only the required columns and rename them so that they contain no spaces. The pipeline is like as…
