Defining a variable before scenario starts

SalehBM
Level 2
Hello everyone!

This past week, I've been working on a project with a scenario that relies on database modifications. The problem is that the same variable has to be redefined every time the scenario runs. Defining it involves loading a huge dictionary and AI models, which takes 2-3 minutes per run and is not acceptable in this case. My question is: is there a way to predefine the variable in the background, so that when the scenario runs the variable is already defined and only needs to be used, not defined again?

Thank you

7 Replies
AshleyW
Dataiker

Hi @SalehBM

Would it work to store that variable as a project variable and define a process for updating it that's separate from your scenario?

 

Cheers, 

Ashley

 

SalehBM
Level 2
Author

Hey Ashley

 

Thank you for your reply. The question is, can I store an AI model, weights, or a large dictionary as a project variable? If that's possible, I wouldn't need to update the variable since it remains constant in every run.

Turribeach

Create another scenario that pre-calculates the variable and passes it to the other scenario once the value is ready. It's not clear to me what triggers your scenario, what the purpose of this variable is, and why you can't calculate it in advance.
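A minimal sketch of the "pre-calculate in one job, reuse in another" idea, using only the Python standard library. The function names, the file name, and the stand-in object are hypothetical; in Dataiku the file would presumably live somewhere both scenarios can read, which the thread does not specify. Note that this avoids rebuilding the object, but reading a large file back into memory still takes time.

```python
import pickle
import tempfile
from pathlib import Path


def build_expensive_object():
    """Hypothetical stand-in for the slow step (big dictionary, model weights, ...)."""
    return {"vocab": {"hello": 0, "world": 1}, "weights": [0.1, 0.2, 0.3]}


def precompute(path: Path) -> None:
    """Run by the preparation job: build the object once and persist it."""
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("wb") as fh:
        pickle.dump(build_expensive_object(), fh, protocol=pickle.HIGHEST_PROTOCOL)
    # Rename after writing so a reader does not pick up a half-written file.
    tmp.replace(path)


def load_precomputed(path: Path):
    """Run by the processing job: read the persisted object instead of rebuilding it."""
    with path.open("rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    artifact = Path(tempfile.gettempdir()) / "precomputed_variable.pkl"
    precompute(artifact)
    print(load_precomputed(artifact)["weights"])
```

Only unpickle files you produced yourself, since `pickle.load` can execute arbitrary code from an untrusted file.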

SalehBM
Level 2
Author

Hey there!

 

Thank you for your reply. My trigger counts the rows in the database. Whenever there's a change, the scenario runs and retrieves all rows from the database that comply with the rules. It then processes them using these variables (which take a long time to define).

So how could I create a scenario that pre-calculates the variable and passes it on? Would that really work in my case? 🤔

0 Kudos

It is still unclear to me what the dependency is between the variable you need to calculate as part of your scenario run and the database changes that trigger the scenario. You seem to imply that this variable can be calculated beforehand, but it's unclear how far in advance that can be done and why a separate scenario would be a problem for you. I don't see what stops you from using a second scenario. For instance, couldn't you calculate the variable at midnight every day so it's ready for your scenario run? You are not giving us all the requirements needed to understand your problem.

In any case, I wouldn't use project variables for this. The easiest way would be to store the value of your variable in a Dataiku dataset. You can easily do that using a Python recipe. If your concern is the additional 2-3 minutes needed to calculate the variable at the start of the scenario run, then you could move the calculation to the end of the flow, assuming there is no timing dependency between the variable and the execution of the scenario. You would only need to run the recipe that populates the variable dataset once, to "seed" the flow. After that, the scenario will always use the value from the previous execution, saving you the 2-3 minute calculation time. You can prevent circular flow references by using a SQL Script recipe to read the previous value of the variable at the start of the flow, assuming you are using a SQL database as your results store.
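As a rough illustration of storing a dictionary-like variable in a tabular dataset, the sketch below flattens a dictionary into two-column rows and rebuilds it afterwards. The helper names and column names are hypothetical. In a Dataiku Python recipe the rows would typically be put into a pandas DataFrame and written with `dataiku.Dataset(...).write_with_schema(df)`, then read back with `get_dataframe()`; that part is left out here so the example runs anywhere.

```python
import json


def dict_to_rows(big_dict):
    """Flatten a dictionary into (key, value_json) rows that fit a two-column table."""
    return [{"key": str(k), "value_json": json.dumps(v)} for k, v in big_dict.items()]


def rows_to_dict(rows):
    """Rebuild the dictionary from rows read back from the table."""
    return {row["key"]: json.loads(row["value_json"]) for row in rows}


if __name__ == "__main__":
    lookup = {"cat": [1, 2], "dog": {"id": 7}}
    rows = dict_to_rows(lookup)
    # The round trip returns the original dictionary for JSON-compatible values.
    assert rows_to_dict(rows) == lookup
```

Keys are converted to strings and values must be JSON-serializable, so this suits lookup dictionaries better than binary objects such as model weights.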

SalehBM
Level 2
Author

The variable is an AI model with weights across multiple hidden layers, so storing it in a Dataiku dataset isn't feasible. To be clear, the weights never change, so nothing needs to be recalculated. The variable is used to process newly inserted rows in the database, which are picked up through a SQL trigger within Dataiku. My point is: how can I keep this constant variable available without redefining it on each run, and so avoid the 2-3 minute delay caused by reading the files into memory during initialization?

0 Kudos

Why is storing the variable in a Dataiku dataset unfeasible? Explain the problem.
