
Is there a way to split a dataset based on a column value?

Dataiker
Hello all,

I have a dataset that contains many rows per event, where each event is identified by the "event_id" column. There are, of course, many events in the dataset.

Is there an easier way to split this dataset than manually defining each output dataset in the Split visual recipe? There are hundreds of events, so doing it by hand would be painful, or at least time-consuming.

I am using DSS 2.2.2



Thanks in advance!
Dataiker

Hi Alex,



There is not really a better way to do this than with the Split recipe. If you want one dataset per event, you need to create those datasets anyway. You could perhaps create them through the DSS API, but that is still not ideal.



The best option would be to change your strategy: keep a single dataset and partition it on the event_id column. To learn more, you can read Working with partitions and Repartitioning a non-partitioned dataset.
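To make the idea concrete, here is a plain-Python sketch of what partitioning on event_id means conceptually. It is not the DSS partitioning API: all rows stay in one collection grouped by key, and a single event's rows are fetched by its key instead of from a separate dataset. The `partition_by` helper and the sample rows (including the `value` field) are hypothetical.

```python
from collections import defaultdict

def partition_by(rows, key):
    """Group rows (dicts) by the value of `key`, preserving row order.

    Hypothetical helper: illustrates the concept only, not DSS itself.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)

# Hypothetical sample data: several rows per event, as in the question.
rows = [
    {"event_id": "e1", "value": 10},
    {"event_id": "e2", "value": 7},
    {"event_id": "e1", "value": 3},
]

parts = partition_by(rows, "event_id")
print(sorted(parts))     # ['e1', 'e2']
print(len(parts["e1"]))  # 2
```

Adding a new event only adds a new key, with no new dataset to define. That is the advantage partitioning gives over one Split output per event.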



I hope that helps,

Jeremy

Jeremy, Product Manager at Dataiku