API to create a managed dataset

Chiktika

In a Python recipe, is it possible to create a new managed dataset in GCS using the DSS API?

The create_dataset() method only creates non-managed datasets.
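For reference, recent `dataikuapi` versions include a dedicated helper for this, `DSSProject.new_managed_dataset()` (some older versions expose it as `new_managed_dataset_creation_helper()`). The sketch below assumes such a version and that `project` is a `DSSProject`. The function name and its arguments are hypothetical.

```python
def create_managed_dataset(project, name, connection):
    """Hypothetical helper: create an empty managed dataset stored in `connection`.

    `project` is assumed to be a dataikuapi DSSProject, e.g.
    dataiku.api_client().get_default_project() inside a DSS recipe.
    """
    builder = project.new_managed_dataset(name)
    # Store the data in the given connection (e.g. a GCS connection);
    # type_option_id and format_option_id are left at their defaults.
    builder.with_store_into(connection)
    return builder.create()
```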

Many thanks.


Answers

  • Liev (Dataiker Alumni)

    Hi @Chiktika

    You can think of managed datasets as those produced by a transformation (a recipe, in DSS). They are "managed" because DSS itself writes (or appends to) them when it runs the recipe.

    So you need to decide whether you want to define both the transformation and the resulting dataset, or whether the dataset can in fact be unmanaged. That depends on how you intend to use it afterwards.

    If you want to define recipes as part of your project, then you can refer to this part of the docs.
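As an illustration of that route, here is a minimal sketch. It assumes the `dataikuapi` recipe-creator builder (`project.new_recipe(...)`, `with_input`, `with_new_output`, `with_script`, `create`) found in recent DSS versions, so check the names against your version. The function name, dataset names and connection name are hypothetical. Because DSS creates the output through the recipe, that output dataset is managed.

```python
def create_python_recipe(project, input_name, output_name, connection, script):
    """Hypothetical helper: create a Python recipe whose output is a new managed dataset.

    `project` is assumed to be a dataikuapi DSSProject.
    """
    builder = project.new_recipe("python")
    builder.with_input(input_name)
    # DSS creates the output dataset in `connection` and manages it
    builder.with_new_output(output_name, connection)
    builder.with_script(script)
    return builder.create()
```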

    I hope this helps!

  • Chiktika

    Hi Liev,

    Many thanks for your quick answer.

    That's pretty clear.
    I need to automate the creation of a large number of empty datasets and define their checks programmatically.

    Other recipes will write data into them.
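One way to script many of these datasets is to copy the storage settings from an existing GCS dataset created in the UI and pass them to `create_dataset()`. This is a minimal sketch. It assumes the dict returned by `get_definition()` holds `type`, `params`, `formatType` and `formatParams` keys, and that GCS datasets keep their folder under `params["path"]`. The `path` key is an assumption, so inspect a real definition first. The helper name is hypothetical.

```python
import copy


def params_from_template(template_def, new_path):
    """Hypothetical helper: derive create_dataset() keyword arguments
    from an existing dataset definition, changing only the storage path."""
    params = copy.deepcopy(template_def["params"])
    # Assumption: GCS datasets store their folder under "path"
    params["path"] = new_path
    return {
        "type": template_def["type"],
        "params": params,
        "formatType": template_def.get("formatType"),
        "formatParams": copy.deepcopy(template_def.get("formatParams", {})),
    }
```

Usage would look like `kwargs = params_from_template(project.get_dataset("gcs_template").get_definition(), "/exports/ds_001")` followed by `project.create_dataset("ds_001", **kwargs)`, where `gcs_template` and the path are placeholders. Deep copies keep the template definition untouched between iterations.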

    After reading other topics, I finally found a way to do it:

    First:
        import dataiku
        project = dataiku.api_client().get_default_project()
        output_dataset = project.create_dataset(output_dataset_name, type='GCS', params=params,
                                                formatType='csv', formatParams=format_params)

    And then:
        ds_def = output_dataset.get_definition()
        ds_def['managed'] = True  # let DSS manage (build) this dataset
        ds_def['metricsChecks']['runOnBuild'] = True  # run the checks on each build
        ...
        output_dataset.set_definition(ds_def)
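To create many datasets, the two steps above can be wrapped in a helper and called in a loop. This is a minimal sketch. It assumes `project` is a `dataikuapi` `DSSProject` and reuses the same `create_dataset()`, `get_definition()` and `set_definition()` calls as the snippet. Both helper names are hypothetical.

```python
import copy


def make_managed(ds_def, run_checks_on_build=True):
    """Hypothetical helper: return a copy of a dataset definition flagged as managed."""
    new_def = copy.deepcopy(ds_def)
    new_def["managed"] = True
    # setdefault only avoids a KeyError; DSS normally includes metricsChecks
    new_def.setdefault("metricsChecks", {})["runOnBuild"] = run_checks_on_build
    return new_def


def create_managed_gcs_dataset(project, name, params, format_params):
    """Hypothetical helper: create a GCS/CSV dataset, then mark it as managed."""
    dataset = project.create_dataset(name, type="GCS", params=params,
                                     formatType="csv", formatParams=format_params)
    dataset.set_definition(make_managed(dataset.get_definition()))
    return dataset
```

A loop such as `for name in names: create_managed_gcs_dataset(project, name, params, format_params)` would then create all the empty datasets. Here `names`, `params` and `format_params` stand for your own values, and each dataset normally needs its own path in `params`.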

    Thanks a lot.

  • Pavan

    Thanks @Chiktika. This helped me create managed datasets through the Python API.
