
Accessing Partition of Dataset in Python Recipe


My goal is to iterate through the partitions of a dataset, but I'm having trouble accessing them. For context, I have a set of functions that manipulate a dataframe passed to them. I would like to loop through each partition and load it into a dataframe that can be passed to those functions.

I tried calling iter_rows with a partition spec, but I get an error saying the function has no "partitions" argument. Could you help me understand why the partitions argument does not work, or suggest an alternative way to access a single partition of a dataset? Is there a way to select only one partition when calling get_dataframe?


Thank you!

Dataiker

Hi,

Selecting partitions is done on the Dataset object, not at the time of iterating or getting dataframes:

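import dataiku

grp = dataiku.Dataset("mydataset")

# Restrict all subsequent reads on this object to partition "1"
grp.add_read_partitions(["1"])

for x in grp.iter_rows():
    # Only rows of partition "1" are retrieved
    do_stuff(x)  # do_stuff is a placeholder for your own processing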
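
To loop over every partition and get one dataframe per partition, as the question asks, the same idea can be wrapped in a small helper. This is only a minimal sketch under a few assumptions: iter_partition_frames and make_dataset are hypothetical names introduced here, the partition identifiers come from the Dataset's list_partitions() method, and a fresh Dataset handle is created for each partition because add_read_partitions adds to the current selection rather than replacing it. It has not been checked against every DSS version, and inside a recipe the Flow may decide which partitions are readable.

```python
def iter_partition_frames(make_dataset, partitions=None):
    """Yield (partition_id, dataframe) pairs, one per partition.

    make_dataset is a hypothetical zero-argument callable that returns a
    new dataset handle each time it is called, for example
    lambda: dataiku.Dataset("mydataset") when running inside DSS.
    """
    if partitions is None:
        # Assumption: the handle can list its own partition identifiers.
        partitions = make_dataset().list_partitions()

    for partition_id in partitions:
        # Use a fresh handle so earlier selections do not accumulate.
        dataset = make_dataset()
        dataset.add_read_partitions([partition_id])
        yield partition_id, dataset.get_dataframe()
```

Inside DSS this might be used as: for pid, df in iter_partition_frames(lambda: dataiku.Dataset("mydataset")), and then each df passed to your existing functions. Taking a factory instead of a dataset name keeps the helper independent of the dataiku package, so it can be tried with a stand-in object outside DSS.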

 
