Same input and output external database table in Python recipe

georgeannie

Hi There,

I am using an external Snowflake table, say table ABC, to capture metrics. During training/retraining, I would like the flow to read table ABC before updating the model used for predictions, and then write the new metrics and the path to the model back to the same table ABC. I have written a Python recipe that does this: it uses the dataiku package to read table ABC and write_from_dataframe to write to it. The recipe works when I run it on its own. However, when I run the flow zone that contains this recipe, I get the following error:

Nothing to build

This zone has no buildable dataset.

I would like to understand how to fix this while keeping the same functionality, and also why the zone will not build when the same table ABC is used as both input and output.

Any insights would be helpful.

I am not using a traditional model, so tracking the metrics manually is mandatory. I am not looking for suggestions about the model or alternative ways to capture metrics, only for a fix that lets me use the same table as input and output and run the zone without error.
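The question mentions reading table ABC before updating the model. A minimal pandas sketch of that lookup follows, assuming table ABC has hypothetical columns `run_ts` (training timestamp) and `model_path`; inside a recipe the DataFrame would come from the dataset's `get_dataframe()`. It returns the model path logged by the most recent run, or `None` on the first run when the table is still empty.

```python
from typing import Optional

import pandas as pd


def latest_model_path(
    history: pd.DataFrame,
    ts_col: str = "run_ts",        # hypothetical timestamp column in table ABC
    path_col: str = "model_path",  # hypothetical model-path column in table ABC
) -> Optional[str]:
    """Return the model path logged by the most recent run, or None if no runs exist."""
    if history.empty:
        return None
    # Select the row with the latest timestamp instead of relying on row order,
    # since rows read back from a SQL table are not guaranteed to be ordered.
    latest = history.loc[history[ts_col].idxmax()]
    return str(latest[path_col])


if __name__ == "__main__":
    history = pd.DataFrame(
        {
            "run_ts": pd.to_datetime(["2024-01-01", "2024-02-01"]),
            "rmse": [0.42, 0.37],
            "model_path": ["models/v1.pkl", "models/v2.pkl"],
        }
    )
    print(latest_model_path(history))  # models/v2.pkl
```

Keying on a timestamp column rather than on the last row keeps the lookup correct even if the table is read back in a different order than it was written.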

Best Answer

  • Turribeach

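    This sort of design is not allowed because it is circular: a recipe whose input is also its output creates a cycle in the Flow. There are ways around it, though. In a Python recipe you can remove the dataset from the recipe's inputs and keep it only as the output. However, if you then try to read a dataset that is not declared as an input, you will get this error: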

    Job failed: Error in Python process: At line 10: <class 'Exception'>: You cannot read dataset [DS]], it is not declared as an input 
    

    One way around this error is to pass ignore_flow=True to the constructor of the dataiku.Dataset class. Below is a sample Python recipe that reads from and writes to the same dataset:

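    import dataiku

    # Read the dataset even though it is not declared as a recipe input
    input_dataset = dataiku.Dataset("some DS", ignore_flow=True)
    df = input_dataset.get_dataframe()

    # Write back to the same dataset, which is still declared as the recipe output
    output_dataset = dataiku.Dataset("some DS")
    output_dataset.write_with_schema(df)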
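Applied to the metrics table from the question, the same pattern might look like the sketch below. It is a hedged sketch rather than a tested recipe: it assumes table ABC (the placeholder name `"ABC"`) is declared only as the recipe's output, and that the new run's metrics arrive as a one-row DataFrame. Assuming the recipe output uses the default overwrite mode rather than append, `write_with_schema` replaces the table's contents, so the sketch writes back the existing history plus the new row; with append mode you would write only the new row. The helpers `append_run` and `run_recipe` are hypothetical, and the Dataiku calls are kept inside `run_recipe` so the pandas logic can be checked outside DSS.

```python
import pandas as pd


def append_run(history: pd.DataFrame, new_row: pd.DataFrame) -> pd.DataFrame:
    """Return the existing metrics history with the new run appended (hypothetical helper)."""
    # ignore_index avoids duplicate index labels coming from the two frames
    return pd.concat([history, new_row], ignore_index=True)


def run_recipe(new_row: pd.DataFrame, dataset_name: str = "ABC") -> None:
    """Read table ABC, append the new run and write everything back (hypothetical).

    Intended to run inside a DSS Python recipe where `dataset_name` is declared
    as the recipe's output and not as an input.
    """
    import dataiku  # only available inside DSS

    # Read the output dataset even though it is not a declared input
    history = dataiku.Dataset(dataset_name, ignore_flow=True).get_dataframe()

    # Assuming overwrite mode: write back the old rows together with the new one
    dataiku.Dataset(dataset_name).write_with_schema(append_run(history, new_row))
```

In the recipe you would build `new_row` from the training run and call `run_recipe(new_row)` at the end of the script.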
    
