Thread safe exclusive set variable in projects

tomas Registered, Neuron 2022 Posts: 120 ✭✭✭✭✭
edited July 16 in Using Dataiku

Hi,

I just discovered, while running multiple Python recipes, that the DSS Project API method `update_variables` is not thread safe: when two updates run at the same time, one process can set a variable and the other can unset it.

Example:

project.get_variables().get('local')
{}
---
# These two recipes run at the same time:
# RecipeA - updates only A
project.update_variables({'A': 1}, 'local')
# RecipeB - updates only B
project.update_variables({'B': 2}, 'local')
---
project.get_variables().get('local')
{'B': 2}
# The result varies: sometimes it is {'A': 1}, sometimes {'B': 2}, sometimes {'A': 1, 'B': 2}
# The correct result should always be {'A': 1, 'B': 2}

Before I bumped into this issue, I was using get_variables to fetch the dict, modifying a key, and calling set_variables.

So I assumed that update_variables (which takes only a set of keys) would be implemented atomically. But it looks like the implementation is "get all vars" followed by "set all vars", without any kind of locking.
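To make the suspected failure mode concrete, here is a minimal sketch of the lost update. It does not call DSS at all: `FakeProject` is a hypothetical in-memory stand-in, and the "get all vars" / "set all vars" behaviour is my assumption about what `update_variables` does internally, not something confirmed from the source code. The interleaving is written out by hand so the result is deterministic.

```python
import copy


class FakeProject:
    """Hypothetical in-memory stand-in for a DSS project (not the real API class)."""

    def __init__(self):
        self._vars = {"standard": {}, "local": {}}

    def get_variables(self):
        # Return a copy, like a client that fetches the variables over HTTP
        return copy.deepcopy(self._vars)

    def set_variables(self, variables):
        # Overwrite everything with whatever the caller sends
        self._vars = copy.deepcopy(variables)


def interleaved_updates(project):
    # Both recipes read the full dict before either one writes it back
    snapshot_a = project.get_variables()
    snapshot_b = project.get_variables()

    # RecipeA adds A to its snapshot and writes the whole dict
    snapshot_a["local"].update({"A": 1})
    project.set_variables(snapshot_a)

    # RecipeB adds B to its stale snapshot and overwrites RecipeA's write
    snapshot_b["local"].update({"B": 2})
    project.set_variables(snapshot_b)

    return project.get_variables()["local"]


result = interleaved_updates(FakeProject())
print(result)  # {'B': 2}
```

Whichever recipe writes last wins, which would explain why the outcome changes from run to run.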

Best Answer

Answers

  • tomas Registered, Neuron 2022 Posts: 120 ✭✭✭✭✭
    edited July 17

    Thanks for confirming. I came up with a workaround that is not 100% reliable but is still usable:

    from random import random
    from time import sleep

    def set_project_var(project, key, value, where='local'):
        # Set key to value in the project's variables.
        # where: either 'local' or 'standard'
        # A concurrent update can overwrite ours, so we write, wait,
        # re-read, and retry until the stored value matches.
        project.update_variables({key: value}, type=where)
        while True:
            # Wait a random interval of up to 500 ms
            sleep(0.5 * random())
            set_value = project.get_variables().get(where, {}).get(key)
            if set_value == value:
                return
            # Comparing with != (rather than testing truthiness) keeps
            # falsy values such as 0 or '' from looping forever.
            print('>> Race condition: the previous set was not successful')
            sleep(0.5 * random())
            project.update_variables({key: value}, type=where)
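    Another option, if all the recipes run on the same host and can see the same filesystem (an assumption that will not hold for containerized execution), is to serialize the read-modify-write with a file lock. The sketch below is untested against a real DSS instance: `file_lock`, `locked_update` and the lock file location are hypothetical names of mine, `fcntl` is POSIX only, and `FakeProject` is just an in-memory stub so the example runs anywhere. With a real project it relies only on `get_variables` and `set_variables`.

```python
import copy
import fcntl  # POSIX only
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    with open(lock_path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def locked_update(project, updates, where="local", lock_path=None):
    """Read-modify-write the project variables while holding the file lock."""
    if lock_path is None:
        # Hypothetical default location; any path shared by all recipes works
        lock_path = os.path.join(tempfile.gettempdir(), "dss_project_vars.lock")
    with file_lock(lock_path):
        variables = project.get_variables()
        variables.setdefault(where, {}).update(updates)
        project.set_variables(variables)


class FakeProject:
    """In-memory stub standing in for a DSS project, for demonstration only."""

    def __init__(self):
        self._vars = {"standard": {}, "local": {}}

    def get_variables(self):
        return copy.deepcopy(self._vars)

    def set_variables(self, variables):
        self._vars = copy.deepcopy(variables)


demo = FakeProject()
locked_update(demo, {"A": 1})
locked_update(demo, {"B": 2})
print(demo.get_variables()["local"])  # {'A': 1, 'B': 2}
```

    The lock only protects callers that go through `locked_update`; anything else that writes the variables directly (another recipe, a scenario step, the UI) can still overwrite them.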


