Accessing datasets during custom trigger

indy2005 Registered Posts: 21 ✭✭✭✭

Hi,

I am writing a custom trigger, which does a few things - all of which work in a notebook.

I am reading in an Excel file using xlrd and checking the data in a particular cell. The file comes from a managed folder linked to FTP, and this works in the notebook.

I am then reading in two datasets, one from SharePoint and one from Oracle, to check some status records and ETL dates from the source system.

If all these things are OK, I fire the trigger.

The base code works in a notebook: it reads in the Excel file, parses the datasets and then returns a boolean.

When I put this code into a custom trigger, it never fires. I can't see the trigger logs as I am not an admin.

My concern is that perhaps custom code in a trigger cannot do the things you can do in a notebook or in a Python flow recipe, like accessing project datasets or managed folders?

Any advice appreciated.

i

Answers

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
    edited July 2024

    Hi @indy2005,

    If your code produces the result that should fire the trigger when run in a notebook, I would expect it to work as a custom trigger as well. Here is a high-level example of a trigger that uses the Dataiku APIs to read in a dataset:

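    from dataiku.scenario import Trigger
    import dataiku
    
    # API client and project handle (not needed for the dataset read below)
    client = dataiku.api_client()
    project = client.get_default_project()
    
    # Load the dataset to check into a pandas DataFrame
    my_dataset = dataiku.Dataset('MY_DATASET')
    df = my_dataset.get_dataframe()
    
    # Fire the trigger only if the dataset contains at least one row
    if len(df) > 0:
        t = Trigger()
        t.fire()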


    I know that custom triggers are quite difficult to troubleshoot, though, since there isn't a separate log to use for catching potential errors. My suggestion would be to create a custom Python step in your scenario with your trigger logic. Add in plenty of "print" statements to help identify where the trigger may not be firing. For instance, print out each element of your trigger logic and your trigger condition with clear indicators, like so:

    import dataiku
    
    client = dataiku.api_client()
    project = client.get_default_project()
    my_dataset = dataiku.Dataset('MY_DATASET')
    df = my_dataset.get_dataframe()
    print('====== DATAFRAME LENGTH ======')
    print(len(df))
    print('====== DATAFRAME DATA =======')
    print(df.head())
    
    print('======= TRIGGER FIRE CONDITION ======')
    print(len(df) > 0)
    if len(df) > 0:
        # this is where the trigger would happen 
        print('TRIGGERED!!')


    This should allow you to debug the trigger logic more easily within the scenario. If you then run the scenario manually, you can click into the Custom Python "step log" on the last runs page for the scenario. You should see your print lines logged, which will hopefully show where the trigger logic is failing and allow you to troubleshoot accordingly.
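
    With several conditions (an Excel cell, status records, ETL dates), it may help to run each one separately and print its outcome, including any exception, so a single failing check is easy to spot in the step log. Below is a minimal sketch of that idea in plain Python. The helper name `run_checks` and the example checks are hypothetical placeholders, not part of the Dataiku API; the real checks would call your own Excel and dataset logic.

```python
import traceback


def run_checks(checks):
    """Run each (name, check) pair, print the outcome, and return True only if all pass.

    `checks` is a list of (name, zero-argument callable) pairs; each callable
    should return a truthy value when its condition is satisfied.
    """
    all_ok = True
    for name, check in checks:
        print('====== CHECK: %s ======' % name)
        try:
            ok = bool(check())
        except Exception:
            # An exception here is easy to miss, so print the full traceback.
            print(traceback.format_exc())
            ok = False
        print('result: %s' % ok)
        # Keep going after a failure so every check is reported in one run.
        all_ok = all_ok and ok
    print('======= TRIGGER FIRE CONDITION ======')
    print(all_ok)
    return all_ok


# Hypothetical stand-ins for the real checks (Excel cell, status records, ETL date).
example_checks = [
    ('excel cell', lambda: True),
    ('status records', lambda: len([1, 2, 3]) > 0),
    ('etl date', lambda: 1 / 0),  # deliberately raises to show the traceback
]

should_fire = run_checks(example_checks)
# In the real trigger, this is where the trigger would be fired if should_fire is True.
```

    Because every check runs even after one fails, a single manual run of the scenario shows the state of all conditions at once.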

    Can you try out this method of troubleshooting? If you run into any issues, we'd be happy to take a look. Please open a support ticket and attach a scenario diagnostic from a run where your trigger logic is in a custom Python step, as that should help us troubleshoot.

    Thank you,
    Sarina
