Use of dataiku client commands in Scala

guitouneo · November 2022

Hi everybody, I'm a beginner in Scala and I can't find the code to do the same thing I did in PySpark:

import dataiku

client = dataiku.api_client()
dataset = client.get_project("MY PROJECT").get_dataset("MY DATASET")
schema = dataset.get_schema()
schema["columns"][0]["meaning"] = "Text"
schema["columns"][1]["meaning"] = "Text"
schema["columns"][2]["meaning"] = "FreeText"
dataset.set_schema(schema)

Could you help me and give me the code to do the same thing in Scala?

Sarina · December 2022

Hi @guitouneo,

Your example code is actually using the DSS Python API to get and set the dataset schema, as get_schema() and set_schema() are Python API functions that DSS supports:

import dataiku

client = dataiku.api_client()
dataset = client.get_project("MY PROJECT").get_dataset("MY DATASET")
schema = dataset.get_schema()
schema["columns"][0]["meaning"] = "Text"
schema["columns"][1]["meaning"] = "Text"
schema["columns"][2]["meaning"] = "FreeText"
dataset.set_schema(schema)

DSS does not have an equivalent Scala API, which is why you were having trouble figuring out how to do this. Since the Python API is fully supported, I would suggest sticking with Python/PySpark for any actions that use the DSS API. Other kinds of transformations (e.g. transforming columns, filtering the dataset, or adding columns) don't use the DSS API, so you can do those in either a PySpark recipe or a Scala recipe, whichever is easiest for you.
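
As a rough sketch of the Python route (not an official DSS helper), the index-based assignments from the question could be replaced by a small function that sets meanings by column name. The function name set_meanings and the column names in the usage comment are hypothetical. It assumes the schema dict returned by get_schema() has a "columns" list whose entries each carry a "name" key.

```python
import copy


def set_meanings(schema, meanings):
    """Return a copy of a schema dict with "meaning" set per column name.

    schema:   dict shaped like the result of dataset.get_schema(),
              assumed to contain a "columns" list of dicts with a "name" key.
    meanings: dict mapping column name -> meaning (e.g. "Text", "FreeText").
    """
    known = {col["name"] for col in schema["columns"]}
    missing = set(meanings) - known
    if missing:
        # Fail loudly instead of silently skipping a mistyped column name.
        raise KeyError(f"Columns not in schema: {sorted(missing)}")

    updated = copy.deepcopy(schema)  # leave the caller's dict untouched
    for col in updated["columns"]:
        if col["name"] in meanings:
            col["meaning"] = meanings[col["name"]]
    return updated


# Usage inside DSS (column names below are placeholders):
#
#   import dataiku
#   client = dataiku.api_client()
#   dataset = client.get_project("MY PROJECT").get_dataset("MY DATASET")
#   new_schema = set_meanings(
#       dataset.get_schema(),
#       {"first_col": "Text", "second_col": "Text", "third_col": "FreeText"},
#   )
#   dataset.set_schema(new_schema)
```

Addressing columns by name rather than by position means the code keeps working if the column order of the dataset changes.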

I hope that information makes sense. Please let us know if you have any questions about this.

Thank you,
Sarina
