create empty hdfs dataset using schema

sasidharp
Level 3
I want to create an empty HDFS dataset with my preferred schema before connecting it to any recipe.

How can I do that?

Using a SQL query, or some other recipe?

 

3 Replies
Ignacio_Toledo

Hi @sasidharp. What is your use case? I'm just starting on my journey with Hadoop and HDFS, but I'm not sure it makes sense to talk about an empty HDFS dataset, as the dataset is actually a connection to a file system, from which DSS will try to guess a schema after detecting the file type (JSON, CSV, Parquet, etc.).

 

sasidharp
Level 3
Author

Hi @Ignacio_Toledo, my use case is this: I want to create tables in Dataiku rather than creating them in my HDFS core.

Usually we import datasets from HDFS, but I want to create one in Dataiku and push it to HDFS as a Parquet file. Later, that dataset will be connected to a plugin recipe that adds rows every time the model is trained.

Ignacio_Toledo

Aaah, so what you meant is that you want to create a dataset in HDFS, then?

If that is so, do you already have a dataset in Dataiku that you want to send to HDFS? The Sync recipe should work: create its output dataset with the "Storage Type" option set to "hdfs_managed" and "Format" set to "Parquet".
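
If there is no source dataset to sync from, the public Python API may be another route to an empty dataset with a fixed schema. Below is a minimal sketch, not a confirmed recipe. It assumes a recent DSS version where `new_managed_dataset`, `with_store_into`, and `set_schema` are available on the `dataikuapi` project and dataset objects, that the connection is named "hdfs_managed", and that "PARQUET_HIVE" is the format option id for Parquet on that connection. The column names and the dataset name "training_log" are hypothetical placeholders.

```python
# Column types assumed to be valid in DSS schemas (a subset, not exhaustive).
KNOWN_TYPES = {"string", "int", "bigint", "double", "float", "boolean", "date"}

# Hypothetical columns for a table that receives one row per training run.
EXAMPLE_COLUMNS = [
    ("model_name", "string"),
    ("trained_at", "date"),
    ("accuracy", "double"),
]


def build_schema(columns):
    """Turn (name, type) pairs into a {"columns": [...]} schema dict."""
    names = [name for name, _ in columns]
    if len(names) != len(set(names)):
        raise ValueError("duplicate column names in schema")
    for name, col_type in columns:
        if col_type not in KNOWN_TYPES:
            raise ValueError(f"unknown type {col_type!r} for column {name!r}")
    return {"columns": [{"name": n, "type": t} for n, t in columns]}


def create_empty_hdfs_dataset(project, dataset_name, columns,
                              connection="hdfs_managed"):
    """Create a managed Parquet dataset with a schema and no rows.

    `project` is expected to be a dataikuapi project handle, for example
    dataiku.api_client().get_default_project() when run inside DSS.
    """
    builder = project.new_managed_dataset(dataset_name)
    # Assumption: "PARQUET_HIVE" selects Parquet on an HDFS connection.
    builder.with_store_into(connection, format_option_id="PARQUET_HIVE")
    dataset = builder.create()
    # Write the schema without writing any data.
    dataset.set_schema(build_schema(columns))
    return dataset
```

Inside a DSS notebook this would be called as `create_empty_hdfs_dataset(dataiku.api_client().get_default_project(), "training_log", EXAMPLE_COLUMNS)`. Whether the plugin recipe can then append rows to a Parquet dataset depends on how that recipe writes its output, so it is worth checking on a test dataset first.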
