Use of dataiku client commands in Scala

guitouneo Registered Posts: 1

Hi everybody, I'm a beginner in Scala and I can't find the code to do the same thing I did in PySpark:

import dataiku

client = dataiku.api_client()
dataset = client.get_project("MY PROJECT").get_dataset("MY DATASET")
schema = dataset.get_schema()
schema["columns"][0]["meaning"] = "Text"
schema["columns"][1]["meaning"] = "Text"
schema["columns"][2]["meaning"] = "FreeText"
dataset.set_schema(schema)

Could you help me and give me the code to do the same thing in Scala?

Answers

  • Sarina (Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer), Registered Posts: 320
    edited July 2024

    Hi @guitouneo,

    Your example code is actually using the DSS Python API to get and set the dataset schema, as get_schema() and set_schema() are Python API functions that DSS supports:

    client = dataiku.api_client()
    dataset = client.get_project("MY PROJECT").get_dataset("MY DATASET")
    schema = dataset.get_schema()
    schema["columns"][0]["meaning"] = "Text"
    schema["columns"][1]["meaning"] = "Text"
    schema["columns"][2]["meaning"] = "FreeText"
    dataset.set_schema(schema)

    DSS does not have an equivalent Scala API, which is why you were having trouble figuring out how to do this. Since we have a fully supported Python API, I would suggest sticking with Python/PySpark for any actions that use the DSS API. Other types of transformations (e.g. transforming columns, filtering the dataset, or adding columns) don't use the DSS API, so you can do them in either a PySpark recipe or a Scala recipe, whichever is easiest for you.
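
    If you keep the schema update in Python, a slightly more robust variant is to set meanings by column name rather than by position. The following is only a sketch: the helper names with_meanings and apply_meanings, and the column names in the usage note, are hypothetical, and it assumes each entry of schema["columns"] carries a "name" key. Only get_schema() and set_schema() come from the code above.

```python
from copy import deepcopy


def with_meanings(schema, meanings_by_name):
    """Return a copy of a schema dict with 'meaning' set on the named columns.

    Assumes the schema looks like {"columns": [{"name": ..., ...}, ...]}.
    """
    updated = deepcopy(schema)  # leave the caller's schema untouched
    known = {col["name"] for col in updated["columns"]}
    missing = set(meanings_by_name) - known
    if missing:
        # Fail loudly instead of silently skipping a misspelled column name
        raise KeyError(f"Columns not found in schema: {sorted(missing)}")
    for col in updated["columns"]:
        if col["name"] in meanings_by_name:
            col["meaning"] = meanings_by_name[col["name"]]
    return updated


def apply_meanings(project_key, dataset_name, meanings_by_name):
    """Fetch, update and save a dataset schema through the DSS Python API."""
    # Imported lazily: the dataiku package is only importable where the
    # DSS client is available (for example inside a DSS recipe or notebook).
    import dataiku

    dataset = dataiku.api_client().get_project(project_key).get_dataset(dataset_name)
    dataset.set_schema(with_meanings(dataset.get_schema(), meanings_by_name))
```

    For example, apply_meanings("MY PROJECT", "MY DATASET", {"comment": "FreeText"}) would set the meaning of a hypothetical "comment" column, and would raise a KeyError if no such column exists.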

    I hope that information makes sense. Please let us know if you have any questions about this.

    Thank you,
    Sarina
