-
Dataset with a large number of columns
I have a dataset with 396 columns. The columns are parsed incorrectly in the dataset sample view, whereas the preview (before the dataset was created) parses them correctly. I also get a warning in the sample view but not in the schema preview. I have the issue with a file of 60k rows, and I tried with the same file…
-
How can I create a materialized view in DSS and use it in the flow?
I have a few tables in PostgreSQL and I want to create a materialized view for a complex join query.
-
How can I set the mapping of an ElasticSearch dataset?
-
"Fail to run" on dataset
I have a problem with my dataset. Every time I try to update/deploy my dataset, it shows "fail to run" or "the target server failed to respond". Do you know what could cause this and how to fix it? Thanks
-
Sync a dataset to S3 with headers, no compression and custom name
Hi, I created a workflow to normalize a dataset imported from an AWS S3 bucket. I added a "prepare recipe" and it works fine. I want to export this dataset to S3 with a custom name, a header, and no compression (this file will be processed by an external tool), but I don't see any options to configure the export while using "sync…
-
Visualisation of res
Hi, I have made a script that shows me 32 different values (they are names) from my dataset. But I can't find those values in the dataset, although I don't think I'm looking at a sample of it. Does anyone know why I can't see anything (the script obviously won't create new names, so those names have to exist…
-
Issue connecting to Hive : "Failed to synchronize Hive metastore"
Hi, I installed Dataiku successfully on my VM and created the two HDFS connections required by the tutorial: "hdfs_root" and "hdfs_managed". However, I can't seem to connect to the Hive metastore when I synchronize the haiku_shirt_sales.csv that I imported under the hdfs_managed directory to create a corresponding metastore…
-
How to sample an unbalanced dataset?
I have a dataset too big to fit in memory, so I want to down sample it. But the two classes to predict are unbalanced: there are many more 0's than 1's in the target column.
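One common approach is to keep every row of the rare class and only a random fraction of the frequent class, in a single streaming pass so the full dataset never has to sit in memory. The sketch below assumes rows arrive as dicts (for example from a CSV reader or a chunked dataset iterator); the function name `downsample_majority` and its parameters are hypothetical, and `target` is simply the column name from the question.

```python
import random
from typing import Iterable, Iterator, Mapping


def downsample_majority(
    rows: Iterable[Mapping],
    target: str = "target",
    majority_value=0,
    keep_ratio: float = 0.1,
    seed: int = 42,
) -> Iterator[Mapping]:
    """Yield all minority-class rows and roughly keep_ratio of majority-class rows.

    Hypothetical helper: processes rows one at a time, so memory use stays
    constant regardless of dataset size.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    for row in rows:
        if row[target] == majority_value:
            # Keep each majority row independently with probability keep_ratio.
            if rng.random() < keep_ratio:
                yield row
        else:
            # Always keep the rare class.
            yield row
```

Because each majority row is kept independently, the output size is only approximately `keep_ratio` times the majority count; if an exact count is required, reservoir sampling on the majority class is an alternative.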
-
scrap the web
I was trying to use the "scrap the web" snippet and I am getting java.io.IOException: Process return code is 1 at com.dataiku.dip.dataflow.exec.AbstractCodeBasedRecipeRunner.execute(AbstractCodeBasedRecipeRunner.java:213) at…
-
“Unterminated quoted field at the end of the file”
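This error usually means a quoted field was opened and never closed, so the parser reaches the end of the file while still inside quotes. A quick way to locate the culprit is to scan the file while tracking quote state. The sketch below assumes standard CSV quoting where a literal quote inside a field is written as a doubled quote (`""`); it will give misleading results for files that use backslash escaping instead. The function name `find_unterminated_quote` is hypothetical.

```python
from typing import Optional


def find_unterminated_quote(text: str, quote: str = '"') -> Optional[int]:
    """Return the 1-based line where an unclosed quoted field starts, or None.

    Assumes escaped quotes are doubled (""), as in RFC 4180-style CSV.
    """
    in_quotes = False
    line_no = 1
    opened_at = None
    prev = ""
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes
            # A quote directly after a closing quote is an escaped "" pair,
            # so it does not start a new field; keep the original start line.
            if in_quotes and prev != quote:
                opened_at = line_no
        elif ch == "\n":
            line_no += 1
        prev = ch
    return opened_at if in_quotes else None
```

For example, `find_unterminated_quote(open("data.csv", encoding="utf-8").read())` returns the line number to inspect, or `None` if all quotes are balanced.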