Export a filtered dataset (How to download a dataset which has a filter condition)
Suppose my dataset has 1000 rows and I want to download only the first 100 records, or only the records that match a filter condition I applied. How can I download this filtered dataset?
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,141 Neuron
Use the Sample recipe to filter and sample a large dataset. Then export the resulting dataset if you wish.
https://knowledge.dataiku.com/latest/data-preparation/visual-recipes/concept-sample-recipe.html
-
jp1 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 8 ✭
@Turribeach
Thanks for the reply! But I don't want to add another recipe to the flow just to export filtered records. I only want to export the records that remain after applying the filter condition, which sits right below the sampling method option in the left pane. I don't want to write the filtered records to a new dataset in order to download them; I want to export them directly from the original dataset. Is this possible?
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,141 Neuron
No, that's not possible. The left pane only controls which sample of the data you see. Any filters and row limits set there apply to the data sample only, not to the export. In fact, these sample settings don't apply to the following recipe in the flow either, so it's best to leave the data unfiltered when exploring datasets. Otherwise less advanced Dataiku users may confuse filters applied while exploring a dataset (which have no effect on the data outputs) with filters in recipes (which do affect the data outputs).
You can, however, limit the number of rows you export by clicking Advanced Properties in the export dialog. You cannot filter data in exports, though.
There are better ways to explore datasets: use either a SQL Notebook or a Python Notebook. Or use recipes as advised above and keep them in a different flow zone so you know they are for analysis only.
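As a rough illustration of the Python Notebook route, here is a minimal sketch that filters rows, caps the row count, and writes the result as CSV without adding anything to the flow. The helper `export_filtered`, the dataset name `my_dataset`, and the `country` column are all made up for the example. The commented-out Dataiku lines are an assumption about how you would load the rows in DSS; the sample below uses stand-in data so it runs anywhere.

```python
import csv
import io
from itertools import islice


def export_filtered(rows, keep, out, limit=None):
    """Write the rows for which keep(row) is true to `out` as CSV.

    Stops after `limit` matching rows (no cap when limit is None) and
    returns the number of rows written.
    """
    matching = (row for row in rows if keep(row))
    selected = list(islice(matching, limit))
    if not selected:
        return 0
    writer = csv.DictWriter(out, fieldnames=list(selected[0].keys()))
    writer.writeheader()
    writer.writerows(selected)
    return len(selected)


# Stand-in data. In a DSS Python notebook the rows would come from the
# dataset itself, for example (assumption, not run here):
#   import dataiku
#   rows = dataiku.Dataset("my_dataset").get_dataframe().to_dict("records")
sample_rows = [{"id": i, "country": "FR" if i % 2 else "US"} for i in range(1000)]

# Keep only the "FR" rows and export the first 100 of them.
buffer = io.StringIO()
written = export_filtered(
    sample_rows, lambda row: row["country"] == "FR", buffer, limit=100
)
```

To get an actual file to download, pass an open file handle (for example `open("filtered.csv", "w", newline="")`) instead of the in-memory buffer.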