Operationalizing connection to REST APIs

tgb417

Wondering if anyone out there has had some success in operationalizing connections to REST APIs as a source of data for DSS Projects. I would love to have a conversation with folks who are working on this type of challenge.

In my case:

Dataiku DSS has allowed us to automate the gathering of data from a CRM system not directly supported by DSS, and then to build a decent model based on past decisions we have made.

However, we are having challenges operationalizing these results. In particular, reliably pulling data from our CRM system is currently a major challenge. We are using the wonderful DSS API Connect plugin, which has been very helpful. But after weeks of trying, we have not gotten a DSS project to gather this data reliably week in and week out. Our best run has been about 8-10 days without problems.

The data comes through two typical API endpoints:

  • a summary endpoint with a list of all records, the last updated date, and the ability to filter on update dates,
  • a detailed endpoint providing the current details

Our challenge is to build a reliable DSS flow that consistently recognizes when records change, using the first endpoint, and then gathers the new details for just the changed records from the second endpoint. (We need to take this incremental approach for several reasons related to monitoring state changes in constituents and, more practically, API rate limiting.)
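To make the incremental pattern above concrete, here is a minimal sketch in plain Python. It is not how the API Connect plugin works; the names `fetch_summary`, `fetch_detail`, `state`, and `max_calls`, and the `id` / `last_updated` fields, are all hypothetical stand-ins for whatever the real CRM API exposes. It assumes the summary endpoint can return only records updated after a given timestamp, and that timestamps are ISO 8601 strings.

```python
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple


def _ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2023-01-02T00:00:00+00:00'."""
    return datetime.fromisoformat(value)


def incremental_sync(
    fetch_summary: Callable[[Optional[str]], List[dict]],  # hypothetical: summary endpoint
    fetch_detail: Callable[[str], dict],                   # hypothetical: detail endpoint
    state: Dict,                                           # {"watermark": str|None, "pending": [ids]}
    max_calls: int,                                        # detail-call budget for this run
) -> Tuple[List[dict], Dict]:
    """Fetch details only for changed records, within a per-run call budget."""
    watermark = state.get("watermark")
    pending = list(state.get("pending", []))

    # Step 1: ask the summary endpoint what changed since the last run.
    for row in fetch_summary(watermark):
        if row["id"] not in pending:
            pending.append(row["id"])
        if watermark is None or _ts(row["last_updated"]) > _ts(watermark):
            watermark = row["last_updated"]

    # Step 2: drain the pending queue, stopping at the budget or the first error.
    details = []
    for _ in range(max_calls):
        if not pending:
            break
        try:
            details.append(fetch_detail(pending[0]))
        except Exception:
            break  # leave the failed id in the queue for the next run
        pending.pop(0)

    # The watermark can advance safely because unfetched ids stay in "pending".
    return details, {"watermark": watermark, "pending": pending}
```

The point of the design is that a failed or rate-limited run never forces a reload from scratch: the returned state (which could be persisted in a project variable or a small dataset between scenario runs) records both how far the summary scan got and which detail calls are still owed.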

Dataiku partitioned datasets, Scenarios, metrics, and variables are all very helpful. With these we are able to pull summary data and attach detailed data, recovering from some errors. But I have not been able to come up with a truly stable project. Intermittent schema changes from the API Connect plugin when errors occur are causing some of our problems. And when these problems occur, having to reload data from scratch, with days of delay due to rate limiting, is a real challenge.

I was wondering if anyone would be willing to spend a bit of time discussing operationalizing REST API connections as primary data sources for data science projects.
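One possible guard against the schema drift described above, sketched in plain Python: check each fetched row against a fixed expected schema before it reaches the accumulating dataset, and set aside anything that does not fit (for example, an error payload returned in place of a record). The column names in `EXPECTED_SCHEMA` and the helper `split_by_schema` are hypothetical, and this assumes the rows are available as Python dictionaries, for instance in a Python recipe placed after the API call.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical columns; replace with the fields the detail endpoint really returns.
EXPECTED_SCHEMA: Dict[str, type] = {"id": str, "status": str, "last_updated": str}


def split_by_schema(
    rows: Iterable[dict], schema: Dict[str, type] = EXPECTED_SCHEMA
) -> Tuple[List[dict], List[dict]]:
    """Separate rows that match the expected schema from rows that do not."""
    good, bad = [], []
    for row in rows:
        # A row is accepted only if every expected column is present with the expected type.
        ok = isinstance(row, dict) and all(
            isinstance(row.get(col), typ) for col, typ in schema.items()
        )
        if ok:
            # Keep only the expected columns so the output schema never changes.
            good.append({col: row[col] for col in schema})
        else:
            # Quarantine for inspection or retry instead of failing the whole load.
            bad.append(row)
    return good, bad
```

Writing only the accepted rows downstream keeps the output columns fixed, while the rejected rows can be logged or retried on the next run rather than triggering a full reload.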

Thanks for considering this request. If you are interested in a conversation please respond here or DM me here in the Dataiku Community.

Operating system used: Production Linux, Dev Mac OS
