Filter large dataset using a plugin

UserBird Dataiker
Dataiker
I have a large dataset (~200 GB) that I would like other users to be able to filter by entering configuration parameters (e.g. column contains "term").

Plugin recipes are written in Python (or R), but the dataset is too large to load into a pandas DataFrame.

How can I write the plugin recipe so that the data can be easily filtered by a user?
2 Replies
UserBird Dataiker
Dataiker
Re: Filter large dataset using a plugin
Hi Jonathan,

In Python, you can use our advanced API to read a large dataset in chunks, so that only one chunk is held in memory at a time:

https://doc.dataiku.com/dss/latest/api/python/advanced.html#chunked-reading-and-writing-with-pandas
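
As a rough illustration of the chunked approach, here is a minimal sketch of what the plugin recipe could look like. The role names "input" and "output", the parameter keys "column" and "term", and the chunk size are assumptions made for this example, not something defined in the thread. The filtering logic is kept in a plain pandas function so it can be tried outside DSS; the `dataiku` imports are done inside `run_recipe()` because they only work within a DSS environment.

```python
import pandas as pd


def filter_chunk(df, column, term):
    """Return the rows of one chunk whose `column` contains `term`.

    The match is a literal substring match (no regex). Missing values
    are treated as empty strings, so they only match an empty term.
    """
    text = df[column].fillna("").astype(str)
    return df[text.str.contains(term, regex=False)]


def run_recipe():
    """Stream the input dataset through filter_chunk, chunk by chunk.

    Assumptions (hypothetical names): the plugin recipe declares an
    "input" role, an "output" role, and two parameters named
    "column" and "term".
    """
    import dataiku
    from dataiku.customrecipe import (
        get_input_names_for_role,
        get_output_names_for_role,
        get_recipe_config,
    )

    config = get_recipe_config()
    source = dataiku.Dataset(get_input_names_for_role("input")[0])
    target = dataiku.Dataset(get_output_names_for_role("output")[0])

    # Filtering rows does not change the columns, so reuse the input schema.
    target.write_schema(source.read_schema())

    with target.get_writer() as writer:
        # chunksize is an arbitrary value chosen for this sketch.
        for chunk in source.iter_dataframes(chunksize=100000):
            writer.write_dataframe(
                filter_chunk(chunk, config["column"], config["term"])
            )
```

In the recipe file itself you would simply call `run_recipe()`. Memory use then depends on the chunk size rather than on the size of the dataset, although the whole 200 GB still has to be streamed through Python.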

Otherwise, you can design the plugin to run a SQL query, so that the filtering is done in SQL: https://doc.dataiku.com/dss/latest/api/python/sql.html
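
For the SQL route, a sketch along the following lines might work, assuming the input dataset is stored in a SQL database. The role names, the parameter keys "column" and "term", the double-quote identifier quoting, and the `!` escape character are all assumptions of this example and may need adapting to your database. Because the term is typed in by users, the query builder escapes quotes and LIKE wildcards rather than pasting the raw value into the query.

```python
def quote_identifier(name):
    """Quote an identifier with double quotes (assumed SQL dialect)."""
    return '"' + name.replace('"', '""') + '"'


def like_pattern(term):
    """Build a quoted LIKE pattern matching `term` as a literal substring.

    '!' is used as the escape character, so '!', '%' and '_' in the
    user's term are escaped, and single quotes are doubled.
    """
    for ch in ("!", "%", "_"):  # '!' must be escaped first
        term = term.replace(ch, "!" + ch)
    return "'%" + term.replace("'", "''") + "%'"


def build_filter_query(table, column, term):
    """Return a SELECT that keeps rows where `column` contains `term`."""
    return "SELECT * FROM {} WHERE {} LIKE {} ESCAPE '!'".format(
        quote_identifier(table), quote_identifier(column), like_pattern(term)
    )


def run_recipe():
    """Run the filter in the database and write the result to the output.

    Assumptions (hypothetical names): roles "input" and "output",
    parameters "column" and "term". The shape of the dictionary
    returned by get_location_info() is also assumed here.
    """
    import dataiku
    from dataiku import SQLExecutor2
    from dataiku.customrecipe import (
        get_input_names_for_role,
        get_output_names_for_role,
        get_recipe_config,
    )

    config = get_recipe_config()
    source = dataiku.Dataset(get_input_names_for_role("input")[0])
    target = dataiku.Dataset(get_output_names_for_role("output")[0])

    # Assumed: for SQL datasets the table name is available under "info".
    table = source.get_location_info()["info"]["table"]
    query = build_filter_query(table, config["column"], config["term"])
    SQLExecutor2.exec_recipe_fragment(target, query)
```

For example, `build_filter_query("events", "title", "term")` produces `SELECT * FROM "events" WHERE "title" LIKE '%term%' ESCAPE '!'`. With this design the rows never pass through Python, which is usually the better fit for a dataset of this size.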

Best regards,

Henri
Jonathan
Re: Filter large dataset using a plugin
Thanks Henri