How do I read a yml file from the dataiku library?
I have a yml config file and want to read it in my notebooks. How do I do that?
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron
-
I am not able to find a path to the file. It is showing an error.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron
You are using a relative path without even knowing what your current directory is. Also, Dataiku does not allow you to read files from arbitrary directories; you should use a Dataiku managed folder.
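A minimal sketch of the managed folder approach, in case it helps. The folder name "config_folder" and the file name are hypothetical placeholders, and it assumes PyYAML is installed in the code environment. The helper takes the folder object as a parameter, so it can be tried with any object that offers get_download_stream:

```python
def load_yaml_from_folder(folder, path, parser=None):
    """Read `path` from a managed-folder-like object and parse its content.

    `folder` only needs a get_download_stream(path) method, as offered by
    dataiku.Folder. `parser` defaults to yaml.safe_load.
    """
    if parser is None:
        import yaml  # PyYAML, assumed to be available in the code environment
        parser = yaml.safe_load
    # The stream is closed automatically when the with-block exits.
    with folder.get_download_stream(path) as stream:
        return parser(stream.read())


def load_config():
    """Load the config from a managed folder (only works inside DSS)."""
    import dataiku  # only importable inside Dataiku DSS
    folder = dataiku.Folder("config_folder")  # hypothetical folder name
    return load_yaml_from_folder(folder, "insights_config.yml")  # hypothetical file
```

Inside a notebook you would then call load_config() and get back whatever yaml.safe_load produces for your file, typically a dict.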
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
We had the same kind of use case: we maintained some YAML configuration files in a Bitbucket repository, so it seemed a good idea to connect it to the Project's "Libraries".
To find the path of your file I would use:
    import os
    import re

    # Look for the project library folder among the PYTHONPATH entries.
    pythonpaths = os.environ['PYTHONPATH'].split(':')
    for pp in pythonpaths:
        # Keep the entry that ends with lib/python.
        local_lib_path = re.findall('.*lib/python$', pp)
        if local_lib_path:
            yaml_file = os.path.join(local_lib_path[0], 'insights_config.yml')
This should work, but it is not bulletproof, and there are special cases that might need to be taken care of.
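One way to handle a couple of those special cases (PYTHONPATH not being set, or the file not actually being in the matched folder) is sketched below. The function name find_library_file is made up for illustration, and the lib/python suffix is simply the same convention matched by the regex in this post, not something guaranteed by Dataiku:

```python
import os


def find_library_file(filename, pythonpath=None):
    """Return the full path of `filename` inside a PYTHONPATH entry that
    ends with lib/python, or None if no such file exists."""
    if pythonpath is None:
        # Fall back to an empty string when PYTHONPATH is not set.
        pythonpath = os.environ.get("PYTHONPATH", "")
    suffix = os.path.join("lib", "python")
    for entry in pythonpath.split(os.pathsep):
        if not entry:
            continue
        # normpath drops a trailing separator so the suffix check is reliable.
        if not os.path.normpath(entry).endswith(suffix):
            continue
        candidate = os.path.join(entry, filename)
        # Only report the path when the file is really there.
        if os.path.isfile(candidate):
            return candidate
    return None
```

Calling find_library_file('insights_config.yml') in a notebook returns None instead of raising when nothing matches, so the caller can decide what to do next.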
I think @Turribeach's solution might be more robust. But I also appreciate the convenience of being able to edit the YAML file from the Dataiku editor.
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Hi @pnkj!
I found a better, more robust solution. I'm surprised I didn't find it before:
    import dataiku
    import yaml

    client = dataiku.api_client()
    project = client.get_default_project()
    library = project.get_library()
    with library.get_file('insights_config.yml') as fin:
        data = yaml.safe_load(fin)
More information here:
Project libraries - Dataiku Developer Guide
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,088 Neuron
This is a very good option IF you want to have the YAML file stored in a project library. There is even an API to add a file to the project library, so you could do this programmatically. Furthermore, there are now APIs to add a project library from Git programmatically and even to push changes to the remote Git repository.
Having said that, this may not be suitable for all use cases. For instance, if the YAML file holds data, then the project library is not a good place for it and a managed folder is a better choice. Likewise, if the YAML file needs to live on a network share or in a cloud bucket, a managed folder is again the better choice.