Using Dataiku to analyze content update patterns from streaming platforms

EdwardWhitemore

Hi everyone,

I’m experimenting with Dataiku to track and analyze data from a few drama and entertainment streaming platforms.
The idea is to study how frequently new content (episodes or shows) gets updated and whether there are noticeable trends in release timing or viewer engagement.

I’ve connected some APIs and CSV data sources, but I’m noticing that sometimes the sync jobs either delay or skip new entries, especially when the data updates irregularly.

Has anyone worked with a similar setup, where external content data (for example, from streaming or entertainment apps) needs to be analyzed in near real time?
I would appreciate any best practices or suggestions for making data refreshes and project automation more consistent.

Thanks in advance!

Operating system used: Windows 11

Answers

  • Alexandru (Dataiker)

    For real-time use cases like this, you can use Kafka as a streaming data source together with a continuous Python recipe in Dataiku.

    The main challenge is ensuring that your Kafka producer consistently receives updated records. This typically means periodically calling the APIs, fetching new data, and pushing those updates into Kafka. Once the data is streaming in, you can process it continuously within Dataiku using a continuous Python recipe.
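    The polling side of that producer can be sketched as below. This is a minimal, illustrative sketch under several assumptions: `fetch_updates(since)` is a hypothetical wrapper around your platform API that returns records changed since a timestamp, each record is assumed to carry an `id` and a timezone-aware `updated_at`, and `publish` is a placeholder for whatever sends a record onward (in practice it could wrap a Kafka producer client, but no specific Kafka library is assumed here). The re-read `overlap` window and the `seen` set are one common way to avoid skipping entries that arrive late or out of order without publishing duplicates.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, Optional, Set, Tuple

# Hypothetical record shape: {"id": str, "updated_at": datetime (tz-aware), ...}
Record = dict


def poll_once(
    fetch_updates: Callable[[datetime], Iterable[Record]],  # hypothetical API wrapper
    publish: Callable[[Record], None],                       # placeholder, e.g. a Kafka send
    watermark: datetime,
    seen: Set[Tuple[str, datetime]],
    overlap: timedelta = timedelta(minutes=5),
) -> Tuple[datetime, int]:
    """Publish records changed since (watermark - overlap) that were not published yet."""
    # Re-read a window *before* the watermark so records that show up late
    # (or with slightly older timestamps) are not silently skipped.
    since = watermark - overlap
    new_watermark = watermark
    published = 0
    for record in fetch_updates(since):
        key = (record["id"], record["updated_at"])
        if key in seen:  # already pushed during an earlier poll
            continue
        publish(record)
        seen.add(key)
        published += 1
        new_watermark = max(new_watermark, record["updated_at"])
    # Drop keys older than the next re-read window; they can no longer be fetched again.
    cutoff = new_watermark - overlap
    seen.difference_update({k for k in seen if k[1] < cutoff})
    return new_watermark, published


def run_poller(
    fetch_updates: Callable[[datetime], Iterable[Record]],
    publish: Callable[[Record], None],
    interval_s: float = 60.0,
    max_iterations: Optional[int] = None,
) -> datetime:
    """Poll periodically; logging the count per poll helps spot delays or gaps."""
    watermark = datetime(1970, 1, 1, tzinfo=timezone.utc)
    seen: Set[Tuple[str, datetime]] = set()
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        watermark, count = poll_once(fetch_updates, publish, watermark, seen)
        print(f"{datetime.now(timezone.utc).isoformat()} published {count} record(s)")
        iteration += 1
        time.sleep(interval_s)
    return watermark
```

    Records whose `updated_at` lands further behind the watermark than `overlap` would still be missed, so the window should be sized to how late your sources can actually report changes.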

    You can find more details in the Dataiku documentation here:
    https://doc.dataiku.com/dss/latest/streaming/index.html

    https://doc.dataiku.com/dss/latest/streaming/cpython.html

    That said, a time-based scenario should also work: retrieve the new records from the API on a schedule, then use the Upsert recipe to update them in the SQL database.
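    To illustrate what "upsert" means at the SQL level, here is a minimal sketch using Python's built-in `sqlite3` module and SQLite's `INSERT ... ON CONFLICT DO UPDATE` syntax. It is not what the Upsert recipe generates internally; the `episodes` table and its columns are hypothetical, and `updated_at` is assumed to be an ISO-8601 string in a single consistent format so that text comparison matches time order. The `WHERE` clause keeps a stale re-delivery from overwriting a newer row, which matters when the same record is fetched more than once.

```python
import sqlite3
from typing import Iterable, Mapping

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS episodes (
    id         TEXT PRIMARY KEY,
    title      TEXT,
    updated_at TEXT  -- ISO-8601, same format and offset everywhere
)
"""

# Insert new ids; for existing ids, update only if the incoming row is newer.
UPSERT = """
INSERT INTO episodes (id, title, updated_at)
VALUES (:id, :title, :updated_at)
ON CONFLICT(id) DO UPDATE SET
    title = excluded.title,
    updated_at = excluded.updated_at
WHERE excluded.updated_at > episodes.updated_at
"""


def init_schema(conn: sqlite3.Connection) -> None:
    conn.execute(SCHEMA)


def upsert_episodes(conn: sqlite3.Connection, records: Iterable[Mapping[str, str]]) -> None:
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, records)
```

    Because re-running the same batch leaves the table unchanged, a scheduled job built this way can safely overlap with the previous run's time window.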

    Can you elaborate on how you implemented this, and what exactly you mean by "noticing that sometimes the sync jobs either delay or skip new entries, especially when the data updates irregularly"?
