create empty hdfs dataset using schema

sasidharp · October 2020

I want to create an empty HDFS dataset with my preferred schema before connecting it to any recipe.

How can I do that?

Using a SQL query or some other recipe?

Ignacio_Toledo · October 2020

Hi @sasidharp. What is your use case? I'm just starting on my journey with Hadoop and HDFS, but I'm not sure it makes sense to talk about an empty HDFS dataset, as the dataset is actually a connection to a file system, from which DSS will try to guess a schema after detecting the file types (JSON, CSV, Parquet, etc.).

sasidharp · October 2020

Hi @Ignacio_Toledo, my use case is that I want to create tables in Dataiku rather than creating them in my HDFS core.

Usually we import datasets from HDFS, but I want to create one in Dataiku and push it to HDFS as a Parquet file. Later, that dataset will be connected to a plugin recipe that adds rows every time the model is trained.

Ignacio_Toledo · October 2020

Aaah, so what you meant is that you want to create a dataset in HDFS, then?

If so, do you already have a dataset in Dataiku that you want to send to HDFS? The Sync recipe should work: create the output dataset with the options "Storage Type" set to "hdfs_managed" and "Format" set to "Parquet".
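
If there is no existing dataset to sync from, one possible alternative is to create the managed dataset and set its schema from a Python notebook or recipe inside DSS. The following is only a sketch under several assumptions: the dataset name and columns are made up for illustration, the connection is assumed to be called "hdfs_managed" as in the options above, and it assumes a DSS version whose Python API exposes `new_managed_dataset`, `with_store_into` (with the "PARQUET_HIVE" format option) and `set_schema`. Please check these against the documentation for your DSS version before relying on them.

```python
# Sketch only: dataset name, column names and column types are hypothetical.

def build_schema(columns):
    """Turn (name, type) pairs into the list-of-dicts layout used by DSS schemas."""
    return [{"name": name, "type": col_type} for name, col_type in columns]


def create_empty_hdfs_dataset(dataset_name, columns, connection="hdfs_managed"):
    """Create a managed Parquet dataset on HDFS and set its schema (no rows written).

    Assumes it runs inside DSS (notebook or Python recipe) and that the
    connection name matches an existing HDFS connection on the instance.
    """
    # Imported here so build_schema can still be tried outside DSS.
    import dataiku

    client = dataiku.api_client()
    project = client.get_default_project()

    # Ask DSS for a managed dataset stored on the HDFS connection as Parquet.
    builder = project.new_managed_dataset(dataset_name)
    builder.with_store_into(connection, format_option_id="PARQUET_HIVE")
    dataset = builder.create()

    # Apply the preferred schema; the dataset stays empty until a recipe writes to it.
    dataset.set_schema({"columns": build_schema(columns)})
    return dataset


# Hypothetical schema for a table that receives one row per model training run.
schema = build_schema([
    ("model_name", "string"),
    ("trained_at", "date"),
    ("accuracy", "double"),
])
print(schema)
```

Inside DSS, something like `create_empty_hdfs_dataset("training_log", [("model_name", "string"), ("trained_at", "date"), ("accuracy", "double")])` would then give an empty dataset that the plugin recipe can append rows to, assuming the recipe and the connection allow appending.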
