Processing a continuous stream of data

Level 2


I have a service that continuously emits data. You start receiving data once you open a TCP connection, and it never stops until you terminate the connection.

I'd like to develop a custom plugin to process that data in Dataiku. How can I do that when the data never ends?

Will the "build" log overload the server?




We are loading data from a flight metasearch service. They expose a data stream that we consume by polling a TCP connection. We plan to use Dataiku to parse, sanitize, etc. the data and then drop it into Hadoop, in addition to applying the corresponding analysis and lab work 😉
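One way to turn a never-ending TCP stream into finite units of work is to read it in bounded micro-batches: stop after a maximum number of records or a maximum wait, hand the batch off, and keep any partial line for the next call. The sketch below is only an illustration in plain Python, not a Dataiku API; it assumes the service sends newline-delimited UTF-8 records, and the function name `read_micro_batch` and its limits are hypothetical.

```python
import socket
import time


def read_micro_batch(sock, buffer=b"", max_records=1000, max_seconds=5.0):
    """Read newline-delimited records from a connected socket until
    max_records lines are collected or max_seconds elapse.

    Assumes newline-delimited UTF-8 records (hypothetical framing).
    Returns (records, leftover); pass leftover back in as `buffer`
    on the next call so a line split across reads is not lost.
    """
    deadline = time.monotonic() + max_seconds
    records = []
    while len(records) < max_records:
        # First drain any complete line already sitting in the buffer.
        newline = buffer.find(b"\n")
        if newline != -1:
            records.append(buffer[:newline].decode("utf-8"))
            buffer = buffer[newline + 1:]
            continue
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # time budget for this batch is used up
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            break  # no more data arrived within the time budget
        if not chunk:
            break  # the peer closed the connection
        buffer += chunk
    return records, buffer
```

Because every call returns, each batch can be written out and processed by a job that finishes, instead of one job that runs (and logs) forever.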

@alexander Hope this helps

Dataiker Alumni
Hi Gustavo,

For this type of use case, we would advise performing the data ingestion outside of Dataiku DSS, with a streaming engine such as Flume or Kafka.
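To illustrate what "ingestion outside DSS" can look like, here is a minimal sketch of a small standalone process that forwards each record from the TCP stream to a streaming engine. The `send` callback is left abstract so the sketch does not depend on any particular client library; the function name `pump_lines` and the choice to skip blank lines are hypothetical.

```python
from typing import BinaryIO, Callable


def pump_lines(stream: BinaryIO, send: Callable[[bytes], None]) -> int:
    """Forward every newline-delimited record from `stream` to `send`.

    Runs until the stream reaches EOF (e.g. the TCP peer closes the
    connection) and returns the number of records forwarded.
    """
    count = 0
    for line in stream:  # binary file objects iterate line by line
        record = line.rstrip(b"\r\n")
        if record:  # assumption: blank lines carry no data and are skipped
            send(record)
            count += 1
    return count
```

With the kafka-python client, for example, this could be wired up as `producer = KafkaProducer(bootstrap_servers="localhost:9092")` followed by `pump_lines(sock.makefile("rb"), lambda r: producer.send("flight-offers", value=r))` and a final `producer.flush()`; the broker address and the topic name `flight-offers` are placeholders.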

Once the data is ingested, you can perform data transformation and machine learning modelling in DSS in micro-batches, using partitions to avoid recomputing the whole dataset.
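As an illustration of the micro-batch idea, the sketch below groups timestamped records into hourly buckets and lists only the buckets that have not been processed yet, which is the logic that time-based partitioning relies on. It is plain Python, not the DSS partitioning API; the `YYYY-MM-DD-HH` identifier format and all function names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_id(ts: datetime) -> str:
    """Hourly partition identifier such as '2021-03-04-17' (format assumed).

    Expects a timezone-aware datetime; it is normalised to UTC.
    """
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d-%H")


def group_by_partition(records):
    """Group an iterable of (timestamp, payload) pairs by hourly partition."""
    groups = defaultdict(list)
    for ts, payload in records:
        groups[partition_id(ts)].append(payload)
    return dict(groups)


def partitions_to_build(groups, already_built):
    """Return, in order, the partitions that have not been built yet.

    In practice the current, still-filling hour would usually be
    skipped as well; that check is not included here.
    """
    return sorted(p for p in groups if p not in already_built)
```

Only the partitions returned by `partitions_to_build` need to be recomputed, so each run touches a small slice of the data rather than the whole history.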


Level 2

Hi @Alex_Combessie,

Can you tell me how to connect to Kafka in Dataiku?





Dataiker Alumni

Hi Swarna,

I am not a specialist on the topic, but here is the link to our documentation on Kafka.



