Create an empty HDFS dataset using a schema
I want to create an empty HDFS dataset with my preferred schema before connecting it to any recipe.
How can I do that?
With an SQL query, or with some other recipe?
Answers
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Hi @sasidharp. What is your use case? I'm just starting on my journey with Hadoop and HDFS, but I'm not sure it makes sense to talk about an empty HDFS dataset, as the dataset is actually a connection to a file system, from which DSS will try to guess a schema after detecting the file types (JSON, CSV, Parquet, etc.).
-
Hi @Ignacio_Toledo, my use case is like this: I want to create tables in Dataiku rather than creating them in my HDFS core. Usually we import datasets from HDFS, but I want to create one in Dataiku and push it to HDFS as a Parquet file. Later, that dataset will be connected to a plugin recipe that adds rows every time the model is trained.
-
Ignacio_Toledo
Aaah, so what you meant is that you want to create a dataset in HDFS then?
If that is so, do you already have a dataset in Dataiku that you want to send to HDFS? The Sync recipe should work: create the output dataset with the option "Storage Type" set to "hdfs_managed" and "Format" set to "Parquet".
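If there is no existing dataset to sync from, another route might be the DSS public Python API: create a managed dataset with no rows and then set its schema by hand. The following is only a minimal sketch, not something confirmed in this thread. The dataset name and columns are hypothetical, and the calls used (`new_managed_dataset`, `with_store_into`, `create`, `set_schema`) as well as the `"PARQUET_HIVE"` format id are my understanding of the `dataikuapi` package, so please check them against the documentation for your DSS version.

```python
# Sketch only: dataset name, column names and connection id are hypothetical.

def build_schema(columns):
    """Turn (name, type) pairs into a schema dict with a "columns" list."""
    return {"columns": [{"name": name, "type": col_type}
                        for name, col_type in columns]}


def create_empty_dataset(project, name, columns,
                         connection="hdfs_managed",
                         format_option_id="PARQUET_HIVE"):
    """Create a managed dataset with no rows, then set its schema.

    `project` is assumed to be a dataikuapi project handle, for example
    the one returned by dataiku.api_client().get_default_project()
    when running inside DSS.
    """
    builder = project.new_managed_dataset(name)
    # Assumption: "PARQUET_HIVE" is the format id for Parquet on HDFS.
    builder.with_store_into(connection, format_option_id=format_option_id)
    dataset = builder.create()
    dataset.set_schema(build_schema(columns))
    return dataset


# Hypothetical usage inside a DSS notebook or Python recipe:
#
#   import dataiku
#   project = dataiku.api_client().get_default_project()
#   create_empty_dataset(project, "model_runs", [
#       ("run_id", "string"),
#       ("trained_at", "date"),
#       ("accuracy", "double"),
#   ])
```

The function takes the project handle as an argument so that it can be tried with a stand-in object outside DSS. The plugin recipe that appends rows on each training run would then write into this dataset.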