How to get a Pandas dataframe from a dataikuapi dataset
Hi,
I am in the API designer, writing a Python function.
My goal is to get a pandas dataframe from a Dataiku dataset to be able to perform pandas transformations on it.
I am connecting via the dataikuapi to my project.
I then get my dataset via dataikuapi, convert it to a Dataiku core dataset, and finally to a pandas dataframe:
client = dataikuapi.DSSClient(dss_instance_api_url, API_KEY)
project = client.get_project("my_project")
df = project.get_dataset("my_dataset").get_as_core_dataset().get_dataframe()
I get the error message:
Failed: Failed to run function : <class 'Exception'> : No DSS URL or API key found from any location
If I don't call "get_dataframe()" in my code, everything runs smoothly:
client = dataikuapi.DSSClient(dss_instance_api_url, API_KEY)
project = client.get_project("my_project")
df = project.get_dataset("my_dataset").get_as_core_dataset()
I was wondering what was wrong here and how to transform a dataikuapi dataset to a pandas dataframe?
Many thanks
Answers
Turribeach
What exactly do you have in dss_instance_api_url and API_KEY? The API node does not expose a DSS API of its own, so there may be some confusion here. The URL should point to either a Designer or an Automation node, and the API key must be valid on that instance. While you can fetch a dataset from a Designer or Automation node from an API service deployed on the API node, you should consider whether this is the pattern you want to adopt. API services can be deployed in high-availability mode on Kubernetes, but if your API function has to fetch a dataset from a Designer or Automation node, that node becomes a single point of failure, since neither the Designer nor the Automation node runs in HA.
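As for the error itself, one likely explanation is that get_as_core_dataset() hands back an object from the separate "dataiku" package, which does not reuse the URL and API key given to dataikuapi.DSSClient and so has no connection details when get_dataframe() actually reads the data. The sketch below assumes the "dataiku" package is installed in the code environment of the API service and that the URL points to a reachable Designer or Automation node. The helper name load_dataframe and its parameters are hypothetical and used only for illustration.

```python
def load_dataframe(dss_url, api_key, project_key, dataset_name):
    """Read a DSS dataset into a pandas DataFrame from outside DSS.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Imported lazily so this module can still be loaded where the
    # `dataiku` package is not available.
    import dataiku

    # Give the `dataiku` package its own connection details. It does not
    # pick up the credentials passed to dataikuapi.DSSClient.
    dataiku.set_remote_dss(dss_url, api_key)

    # Outside a DSS recipe or notebook there is no default project,
    # so the project key is passed explicitly.
    dataset = dataiku.Dataset(dataset_name, project_key=project_key)
    return dataset.get_dataframe()
```

With the values from the question, this would be called as load_dataframe(dss_instance_api_url, API_KEY, "my_project", "my_dataset"). The point above about the Designer or Automation node being a single point of failure still applies to this approach.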