Hi everyone,
I created a Dash Webapp with a button that triggers a scenario using the dataiku API.
The scenario runs a Python recipe in my flow, where I'm creating a custom ML model and making inferences based on the variables filled in by the user in the Dash webapp (I set them as global variables of the project, then retrieve them in the Python recipe). After the scenario completes, I retrieve the output dataset of the recipe.
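For context, the variable hand-off described above can be sketched like this. The key name `user_input` and the form values are made up for illustration; only the general pattern (merge the form values into the project variables, write them back, read them in the recipe) comes from the setup described here:

```python
# Sketch of the webapp-to-recipe variable hand-off. The key name
# "user_input" and the form values are illustrative, not from the thread.

def merge_user_input(variables, user_input):
    """Return a copy of the project variables with the form values merged in."""
    standard = dict(variables.get("standard", {}))
    standard["user_input"] = user_input
    return {**variables, "standard": standard}

# In the webapp backend this would be applied roughly as (hedged):
#   variables = project.get_variables()
#   project.set_variables(merge_user_input(variables, form_values))
# and the Python recipe then reads the values back through the
# project variables (e.g. via dataiku.get_custom_variables()).

print(merge_user_input({"standard": {}}, {"target": "churn"}))
# → {'standard': {'user_input': {'target': 'churn'}}}
```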
I have a few questions here:
- What will happen if concurrent users on the app trigger scenarios at the same time, or if one user triggers a scenario while a run is already in progress? Is there a queue for scenarios?
- For maintainability and other reasons, I chose to keep all the "autoML" code in a Python recipe and run it through the scenario. But is there any other way to do that? It's pretty heavy and I didn't want to put it in the backend of my app.
As I'm beginning in the ML world, I may have other questions later on.
Thanks for your help!
Hi @brs,
There is no queue for scenarios, so if a scenario is triggered while it is already running, the second run will not execute.
In your webapp you could use scenario polling to check whether the scenario is already running before triggering a new run.
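For example, the polling decision itself can be kept as a small pure function that the webapp calls before triggering. This is a sketch, not verified against the client API: the assumption that a run-info payload carries a "result" key only once the run has finished, and the `get_last_runs` usage in the comments, should be checked against the dataiku Python client:

```python
# Sketch: decide whether a new run may be triggered, given the info of
# the most recent scenario run. The "result" key appearing only once a
# run has finished is an assumption about the run-info payload.

def may_trigger(last_run_info):
    """Return True if it is safe to trigger a new scenario run."""
    if last_run_info is None:           # the scenario has never run
        return True
    return "result" in last_run_info    # finished runs carry a result

# In the webapp this would be fed roughly like (hedged, adapt to the
# actual dataiku client API):
#   runs = scenario.get_last_runs(limit=1)
#   info = runs[0].get_info() if runs else None
#   if may_trigger(info):
#       scenario.run()

print(may_trigger(None))                                # → True
print(may_trigger({"result": {"outcome": "SUCCESS"}}))  # → True
print(may_trigger({"scenarioId": "S1"}))                # → False
```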
Another option you could consider, though it is a bit messier, would be to programmatically create and destroy a new scenario in your webapp code each time you receive the trigger action. That would allow you to spin up any number of scenarios concurrently. Here's a brief example of that approach:
import dataiku
client = dataiku.api_client()
project = client.get_project('PROJECT_ID')
base_scenario = project.get_scenario('SCENARIO_NAME')
# get the settings field for the persistent scenario
settings = base_scenario.get_settings()
# create your temporary scenario
temp_scenario = project.create_scenario('my_temp_scenario', 'step_based', {'params' : settings.data['params']})
# run your scenario
temp_scenario.run_and_wait()
# remove your temporary scenario
temp_scenario.delete()
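One refinement worth considering: if `run_and_wait()` raises, the snippet above never reaches `delete()`, so the clone leaks. Wrapping the lifecycle in try/finally, with a unique name per trigger, avoids both problems. This is a sketch following the snippet above; the function and method names are assumptions, and `project` is whatever `client.get_project(...)` returns:

```python
import uuid

def run_temporary_scenario(project, base_scenario_id):
    """Clone a scenario, run it, and always delete the clone afterwards.

    `project` is expected to expose get_scenario / create_scenario as in
    the snippet above (an assumption, adapt to the actual client).
    """
    settings = project.get_scenario(base_scenario_id).get_settings()
    # unique name so concurrent triggers never collide
    name = "tmp_%s" % uuid.uuid4().hex[:8]
    tmp = project.create_scenario(name, "step_based",
                                  {"params": settings.data["params"]})
    try:
        return tmp.run_and_wait()
    finally:
        tmp.delete()  # runs even if run_and_wait raises
```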
Regarding your second question, keeping your "autoML" code in a separate Python recipe does indeed seem like a good idea for maintainability and easier troubleshooting, so your approach is probably the cleanest. You might also want to look at project libraries as another way to organize your code.
I hope that information is helpful, let us know if you have any other questions.
Thank you,
Sarina
I tried to use this code to "clone" an existing scenario so that I might be able to run concurrent instances. The scenario I'm using as the base scenario is custom_python. This code snippet just recreated a default custom_python scenario and did not "clone" my base scenario as I was expecting. Am I missing something?
Thanks for your answer, it's helping me a lot!
As for your suggestion of creating temporary scenarios, it seems pretty interesting, but will there be any problem if multiple scenarios run the same Python recipe at the same time? And since I'm retrieving global project variables in my Python recipe to create the model, I feel like running concurrent scenarios with this setup could become a bit messy.
I was not aware of project libraries, it also seems to be a pretty interesting alternative.
Thanks again!
Hi @brs,
I'm glad to hear!
Apologies, you are correct: triggering the same recipe concurrently will also not work well, so the temporary scenario approach will not work if the scenario builds your flow. Depending on what the recipe does, you could potentially pull your Python recipe code (or code that calls project libraries) into a Python step in the scenario instead of a recipe.
However, for your use case and the cleanest setup, using the "polling" approach certainly seems the most streamlined option. Hopefully that will work for you!
Thanks,
Sarina