
Concurrent triggers of scenarios in WebApp

brs
Level 1

Hi everyone,

I created a Dash webapp with a button that triggers a scenario using the Dataiku API.

The scenario runs a Python recipe in my flow, where I'm creating a custom ML model and making inferences based on the variables filled in by the user in the Dash webapp (I set them as project variables, then retrieve them in the Python recipe). After the scenario completes, I retrieve the output dataset of the recipe.
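For readers following along, the setup described above (push the user's inputs into project variables, then run the scenario and wait for it) might look roughly like the sketch below. It assumes `project` and `scenario` are handles from the public API client (`dataiku.api_client().get_project(...)` and `project.get_scenario(...)`), whose `get_variables()`, `set_variables()` and `run_and_wait()` methods it uses; the function name `push_inputs_and_run` is hypothetical. Note that project variables are shared by everyone using the project, which is relevant to the concurrency question below.

```python
def push_inputs_and_run(project, scenario, user_inputs):
    """Store webapp inputs as project variables, then run the scenario and wait.

    Hypothetical helper. `project` and `scenario` are assumed to be dataikuapi
    DSSProject / DSSScenario handles; `user_inputs` maps variable names to values.
    """
    # get_variables() returns a dict with "standard" and "local" sections
    variables = project.get_variables()
    variables["standard"].update(user_inputs)
    # set_variables() replaces the whole variables object, so write it back in full
    project.set_variables(variables)
    # block until the scenario (and the recipe it builds) has finished
    return scenario.run_and_wait()
```

Because the variables are project-wide, two users calling this at the same time can overwrite each other's inputs before the recipe reads them.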

I have a few questions:

- What will happen if concurrent users of the app trigger scenarios at the same time, or if one user triggers a scenario while another run is already in progress? Is there a queue for scenarios?

- For maintainability and other reasons, I chose to keep all the "autoML" code in a Python recipe and run it through the scenario. Is there any other way to do that? The code is pretty heavy, and I didn't want to put it in the backend of my app.

As I'm new to the ML world, I may have other questions later on.

Thanks for your help!

SarinaS
Dataiker

Hi @brs,

There is no queue for scenarios, so if a scenario is triggered while it is already running, the second run will not execute.

In your webapp, you could use scenario polling to:

  • check whether the scenario is already running when a user performs the action that triggers it
  • if it's not running, trigger the scenario
  • if it is running, poll the scenario status until the current run has completed, then trigger the run that was blocked by the prior one.
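The polling steps above can be sketched as a small helper. This is a minimal sketch, assuming `scenario` is a dataikuapi `DSSScenario` handle whose `get_current_run()` returns `None` when no run is in progress and whose `run_and_wait()` starts a run and blocks until it ends; the function name `run_when_idle` and its `poll_interval`/`timeout` parameters are hypothetical. `sleep` and `clock` are injectable only to make the logic easy to test.

```python
import time


def run_when_idle(scenario, poll_interval=5.0, timeout=3600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Wait for any in-progress run of `scenario` to finish, then run it.

    Hypothetical helper; assumes get_current_run() returns None when idle.
    """
    deadline = clock() + timeout
    # Poll until the scenario is idle, giving up after `timeout` seconds
    while scenario.get_current_run() is not None:
        if clock() >= deadline:
            raise TimeoutError("scenario is still running; giving up")
        sleep(poll_interval)
    # The scenario is idle: start our own run and block until it completes
    return scenario.run_and_wait()
```

Checking and then triggering is not atomic: two webapp requests can both see the scenario as idle and both trigger it, so you may also want to serialize calls inside the webapp backend (for example with a lock or a single-worker executor).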

Another option you could consider, though it is a bit messier, would be to programmatically create and delete a new scenario in your webapp code each time you receive the trigger action. That would allow you to run any number of scenarios concurrently. To do this, you could do something like:

  • create a permanent base scenario (i.e. your current scenario) 
  • in your webapp code, read the existing scenario's configuration with get_settings(), create a new scenario from it with project.create_scenario(), run the temporary scenario, and finally delete it.

Here's a brief example of that approach:

import uuid

import dataiku

client = dataiku.api_client()
project = client.get_project('PROJECT_ID')
# get_scenario() takes the scenario ID (shown in the scenario's URL), not its display name
base_scenario = project.get_scenario('SCENARIO_ID')

# get the settings of the persistent (base) scenario
settings = base_scenario.get_settings()

# create a temporary step-based scenario that reuses the base scenario's steps;
# a unique name makes each temporary scenario easy to tell apart
temp_scenario = project.create_scenario('temp_scenario_' + uuid.uuid4().hex[:8],
                                        'step_based',
                                        {'params': settings.data['params']})

try:
    # run the temporary scenario and block until it finishes
    temp_scenario.run_and_wait()
finally:
    # remove the temporary scenario, even if the run failed
    temp_scenario.delete()

 
Regarding your second question, keeping your "autoML" code in a separate Python recipe does indeed seem like a good idea for maintainability and easier troubleshooting, so your current approach is probably the cleanest one. You might also want to look at project libraries as another option for organizing your code.
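As a sketch of the project-library idea: the heavier logic lives in a module of the project library, and the recipe becomes a thin wrapper that reads the variables and passes them in explicitly. The module path `python/automl_lib.py`, the function `parse_user_inputs` and the example variable names are hypothetical. The recipe-side lines are shown as comments because they need a DSS environment; they assume `dataiku.get_custom_variables()`, which by default returns variable values as strings.

```python
# Hypothetical project-library module, e.g. python/automl_lib.py.
# Logic kept here can be imported from a recipe, a scenario Python step,
# or a notebook, instead of living inside one recipe.

def parse_user_inputs(raw_vars, schema):
    """Convert raw variable strings into typed model inputs.

    `schema` maps each expected input name to a (type, default) pair;
    inputs missing from `raw_vars` fall back to their default, and keys
    not listed in the schema are ignored.
    """
    parsed = {}
    for name, (cast, default) in schema.items():
        value = raw_vars.get(name)
        parsed[name] = default if value is None else cast(value)
    return parsed


# In the recipe (requires DSS, so shown as comments):
#   import dataiku
#   from automl_lib import parse_user_inputs
#   inputs = parse_user_inputs(dataiku.get_custom_variables(),
#                              {"n_estimators": (int, 100), "target": (str, "label")})
```

Passing the parsed inputs into library functions as arguments, rather than reading variables deep inside them, also makes the same code reusable if you later move it into a scenario step.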

I hope this information is helpful; let us know if you have any other questions.

Thank you,
Sarina 



brs
Level 1
Author

Thanks for your answer, it's helping me a lot!

As for your suggestion of creating temporary scenarios, it seems pretty interesting, but will there be any problem if multiple scenarios run the same Python recipe at the same time? And since my Python recipe reads global project variables to create the model, I feel like running concurrent scenarios with this setup could get a bit messy.

I was not aware of project libraries, it also seems to be a pretty interesting alternative.

Thanks again!

 

SarinaS
Dataiker

Hi @brs,

Glad to hear it!

Apologies, you are correct: triggering the same recipe concurrently will not work well either, so the temporary scenario approach will not work if the scenario builds your flow. Depending on what your recipe does, you could potentially move your Python recipe code (or code that calls project libraries) into a Python step of the scenario instead of a recipe, which would make this approach workable.

However, for your use case and for the cleanest setup, the polling approach certainly seems like the most streamlined option. Hopefully that will work for you!
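If you go with polling, one way to keep button clicks from being dropped while a run is in progress is a small in-process queue in the webapp backend: each click submits a request, and a single worker thread handles requests one at a time, in order. This sketch uses the standard library's `ThreadPoolExecutor` with one worker as that queue. The names `submit_scenario_request` and `run_request` are hypothetical placeholders (`run_request` would be your own per-request function, e.g. set the variables, run the scenario, read the output), and it only serializes requests within a single backend process, not runs started elsewhere in DSS.

```python
from concurrent.futures import ThreadPoolExecutor

# One worker thread: submitted requests run strictly one at a time, in the
# order they were submitted (the executor keeps a FIFO work queue).
_scenario_executor = ThreadPoolExecutor(max_workers=1)


def submit_scenario_request(run_request, payload):
    """Queue one scenario request and return a Future for its result.

    Hypothetical helper: `run_request(payload)` is your own function that
    performs one full request (set inputs, run the scenario, fetch output).
    """
    return _scenario_executor.submit(run_request, payload)
```

In a Dash callback you might call `submit_scenario_request(my_run, inputs).result()`, which waits for earlier requests before running this one. For long trainings, that wait may exceed HTTP request timeouts, so you may prefer to store the Future and check `future.done()` from a periodic `dcc.Interval` callback instead.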

Thanks,
Sarina

