Create sync recipe with python code

Tomas
Hi,

How can I create an Impala sync recipe with the Public API? My source is a managed dataset stored as Parquet, and I would like to create a code recipe with SQL like "select count(*) from mytable" that writes into a new Parquet managed dataset.

So far I have been using the following, but this method assumes that the output dataset already exists:

CodeRecipeCreator(recipeName, 'impala',prj).with_input(inputDatasetName).with_output(outputDatasetName).with_script(code).build()

However, the with_new_output method is not available here.

Thanks

Answers

  • fchataigner2
    Hi,

    Because Impala is a code recipe, the creator expects the datasets to have been created by other means, possibly other calls to the public API. It is still quite simple to write your own Impala recipe creator that can create the output dataset:

    from dataikuapi.dss.recipe import SingleOutputRecipeCreator
    class ImpalaRecipeCreator(SingleOutputRecipeCreator):
        def __init__(self, name, project):
            SingleOutputRecipeCreator.__init__(self, 'impala', name, project)



    And use it like:

    r = ImpalaRecipeCreator('test', prj).with_input(inputDatasetName).with_new_output(outputDatasetName, 'hdfs_managed', format_option_id='PARQUET_HIVE').build()



    The newly created recipe will come with the default code snippet, which is a "select * from ...". To change the SQL query, you can then get and set the recipe's definition.
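As a rough sketch of that last step: the helper below (the name set_recipe_sql is made up for illustration) assumes the recipe handle exposes get_definition_and_payload() and set_definition_and_payload(), and that the returned definition object has set_payload(), as in the dataikuapi versions where build() returns the recipe handle. More recent versions of the API may prefer a settings object instead, so check this against your DSS version.

```python
def set_recipe_sql(recipe, sql):
    """Replace the script of an existing code recipe (hypothetical helper).

    Assumes `recipe` offers get_definition_and_payload() and
    set_definition_and_payload(); verify against your dataikuapi version.
    """
    # For a code recipe, the payload is the script itself (here, the SQL query).
    definition = recipe.get_definition_and_payload()
    definition.set_payload(sql)
    # Send the modified definition back so the change is saved on the recipe.
    recipe.set_definition_and_payload(definition)
    return definition
```

With the recipe handle r returned by build() above, this would be called as set_recipe_sql(r, 'select count(*) from mytable').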

    Regards,
    Frederic