Storing a JSON configuration in an S3 bucket and loading it into Dataiku global variables

sonal_18
sonal_18 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered, Frontrunner 2022 Participant Posts: 12 ✭✭✭✭

Hello Team,

I have stored my JSON configuration in an S3 bucket and I want to load it into Dataiku global variables, but I'm facing challenges with the code to do it.

Could you please help me with it?

Answers

  • HarizoR
    HarizoR Dataiker, Alpha Tester, Registered Posts: 138 Dataiker

    Hi,

    First, you will need to load the JSON data from your file stored in S3 into a Python dictionary. You can do this with the get_object() method from the boto3 package (assuming the code is executed with the relevant AWS IAM permissions).

    From there, you can update the global variables using the Python client of Dataiku's API.
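    The two steps above can be sketched as a small helper. The function name is hypothetical, and it assumes boto3's get_object() on the S3 client and get_global_variables()/set_global_variables() on the DSS API client (client.get_global_variables() on a dataikuapi DSSClient); both clients are passed in rather than constructed, so this stays a testable sketch, not a definitive implementation:

    ```python
    import json

    def load_config_into_global_variables(s3_client, dss_client, bucket, key):
        """Read a JSON config file from S3 and merge it into DSS global variables.

        s3_client is expected to behave like boto3's S3 client (get_object),
        and dss_client like dataikuapi's DSSClient (get_global_variables /
        set_global_variables). Both names here are illustrative.
        """
        # 1. Fetch the S3 object and parse its body into a Python dict
        obj = s3_client.get_object(Bucket=bucket, Key=key)
        config = json.loads(obj["Body"].read().decode("utf-8"))

        # 2. Merge the parsed dict into the instance-level global variables
        variables = dss_client.get_global_variables()
        variables.update(config)
        dss_client.set_global_variables(variables)
        return config
    ```

    Merging with update() preserves any global variables that already exist; assign a single key instead if you want the config isolated under its own name.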

    Hope this helps!

    Best,

    Harizo

  • sonal_18
    edited July 17

    Hello @HarizoR,

    I'm trying to do this with the code below, and I'm getting an error on the last line while setting the project variable.

    Code:

    import dataiku
    import json
    import pandas as pd

    configuration = dataiku.Folder("U44wokeq")
    configuration.get_info()
    with configuration.get_download_stream("NGE_ODH_R3_DEV_CONFIG.json") as f:
        data = f.read()
    my_json = data.decode('utf8').replace("'", '"')
    # Load the JSON to a Python list & dump it back out as formatted JSON
    data = json.loads(my_json)
    s = json.dumps(data, indent=4, sort_keys=True)
    client = dataiku.api_client()
    project = client.get_project(dataiku.get_custom_variables()['projectKey'])
    project_variables = project.get_variables()
    project_variables['standard'] = s
    project.set_variables(project_variables)

    Error:

    def _perform_empty(self, method, path, para
    DataikuException: com.dataiku.common.server.DKUControllerBase$MalformedRequestException: Could not parse a ProjectVariables from request body

  • HarizoR
    edited July 17

    Hi,

    What you describe here differs quite a bit from your original question:

    - You are connecting to your S3 data through a managed folder instead of direct access.

    - You are setting up a project variable and not a global variable.

    That being said, the fix for your problem should be simple: pass the data dict object directly to project_variables["standard"] rather than the s string:

    project_variables["standard"] = data
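    The distinction is that the variables payload must be a JSON-serializable structure (a dict), whereas json.dumps() produces a plain string, which is what triggers the "Could not parse a ProjectVariables from request body" error. A quick illustration of the two types (the sample config is made up):

    ```python
    import json

    raw = '{"env": "dev", "threshold": 5}'

    data = json.loads(raw)           # a dict: a usable variables payload
    s = json.dumps(data, indent=4)   # a str: serialized text, not a structure

    print(type(data).__name__)  # dict
    print(type(s).__name__)     # str
    ```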

    Best,

    Harizo

  • sonal_18

    @HarizoR: It worked with "data". Thanks!

  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    Hey @sonal_18, we're happy you were able to find a solution! When you have a chance, feel free to accept a solution so that others in the community can make use of it in the future!

  • sonal_18

    In the managed folder, can I read a JSON file already stored in S3?

    My use case is that I want to read a JSON configuration stored in S3 and load it into Dataiku variables.
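    For reference, a managed folder whose connection points at S3 exposes the same read API used in the earlier snippet (get_download_stream()). A minimal sketch of the read-and-parse step, with the folder handle passed in (as dataiku.Folder would be in a recipe) so the helper itself is testable; the function name is illustrative:

    ```python
    import json

    def read_json_from_folder(folder, file_name):
        """Read and parse a JSON file from a Dataiku managed folder handle.

        `folder` is expected to expose get_download_stream(), as
        dataiku.Folder does when the folder is backed by S3.
        """
        # get_download_stream() yields a file-like object over the stored file
        with folder.get_download_stream(file_name) as f:
            return json.loads(f.read().decode("utf-8"))
    ```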
