Handling Concurrent Access (Read & Write) to a Standalone Dataset for Parallel Scenarios

jyosmitha Registered Posts: 9

Hello everyone,

I have a situation where I've created a standalone dataset to track the status of multiple scenarios. Each scenario reads the dataset, updates its corresponding row with the status 'Completed' and the 'Run date.' The challenge arises when these scenarios run in parallel and attempt to access the dataset simultaneously, resulting in an error stating that the root path of the dataset doesn't exist. This error is triggered because one scenario's custom Python recipe is trying to read the dataset while another scenario's recipe is attempting to write to it (building the dataset).

Running the scenarios in sequence is not an option since each one takes a considerable amount of time to complete. To address this issue, I'm considering partitioning the dataset to allow multiple recipes to read and write in parallel. However, I'm not sure if partitioning would be a viable solution, or if there are any settings I can modify within the dataset to enable concurrent access.
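To make the partitioning idea concrete, here is a minimal sketch in plain Python. It does not use the Dataiku API; the function names `write_status` and `read_all_statuses` and the one-file-per-scenario layout are hypothetical, chosen only to illustrate the principle. Each scenario writes only its own file (its own "partition"), so no two writers ever touch the same data, and a reader merges the files when it needs the full picture.

```python
import json
import os
import tempfile
from pathlib import Path


def write_status(status_dir, scenario_id, status, run_date):
    """Write one scenario's status to its own file (hypothetical layout)."""
    status_dir = Path(status_dir)
    status_dir.mkdir(parents=True, exist_ok=True)
    record = {"scenario": scenario_id, "status": status, "run_date": run_date}
    # Write to a temporary file in the same directory, then rename it into
    # place, so a reader never sees a half-written status file.
    fd, tmp_path = tempfile.mkstemp(dir=status_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(record, f)
    os.replace(tmp_path, status_dir / f"{scenario_id}.json")


def read_all_statuses(status_dir):
    """Merge every per-scenario file into one list, sorted by file name."""
    records = []
    for path in sorted(Path(status_dir).glob("*.json")):
        with open(path) as f:
            records.append(json.load(f))
    return records
```

In this sketch a scenario would call `write_status(some_dir, "scenario_a", "Completed", "2024-01-01")` as its last step. Whether a partitioned Dataiku dataset gives the same isolation depends on each scenario building only its own partition, which I have not verified here.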

I'd greatly appreciate your insights and suggestions on how to achieve my requirement of maintaining a dataset that tracks the completion status of multiple scenarios while allowing them to run in parallel without conflicting access.

Thank you in advance for your help!

Answers

  • jyosmitha
    jyosmitha Registered Posts: 9

    Thanks @AlexT. It appears that we have restricted access to create the 'StatsDb' dataset in our environment. I have contacted our Dataiku admin to address this issue. I'll keep you posted here if I have any further questions or once I manage to resolve it.

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker

    Indeed, you will need to have an admin create a project with the internal stats datasets.

    We recommend that they sync these on a regular basis to another dataset, either database-backed or file-based, and share the copies of the dataset with end users.

    This avoids reading directly from the runtime database too frequently.
