Writing to partitions | Partitions

Solved!
GuimoAAGG
Level 1

Hello everyone,

I'm working with partitioned datasets. I found a post showing how to read a partition from a dataset, but I have not found a way to write that partition to another dataset under the same partition name.

 

 

for p in dataset.list_partitions():
    dataset.read_partitions = [p]
    df = dataset.get_dataframe()
    print(p, df.shape)  # transformations on df, works down to here
    """How can I write down this to the corresponding partition of another dataset"""
    with dataset_2.get_writer() as writer:
        dataset_2.writePartition = [p]
        writer.write_dataframe(df)

 

 

 

RoyE
Dataiker

Hello!

Just to be clear, are you trying to save a particular single partition to a new dataset? 

If so, in your for loop, while you are iterating through the different partitions, you could insert an if statement that would save the specific partition that you are looking for.

for p in mydataset.list_partitions():
    if p == '2021':  # in this case, my dataset is partitioned by year
        mydataset.read_partitions = [p]  # read only the matching partition
        df = mydataset.get_dataframe()

myoutputdataset = dataiku.Dataset("New_partitioned")
myoutputdataset.write_with_schema(df)  # a new dataset also needs its schema written
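
If no partition matches, the loop above leaves df undefined. A minimal sketch of a guard is below. The helper name read_single_partition is hypothetical, and it assumes the dataset object behaves like the dataiku.Dataset used in this thread (list_partitions, read_partitions, get_dataframe).

```python
def read_single_partition(dataset, partition_id):
    """Return the dataframe of one partition, or raise if it does not exist.

    `read_single_partition` is a hypothetical helper for illustration; `dataset`
    is assumed to expose list_partitions(), read_partitions and get_dataframe()
    as in the snippets of this thread.
    """
    available = dataset.list_partitions()
    if partition_id not in available:
        raise ValueError(
            "Partition %r not found; available partitions: %r" % (partition_id, available)
        )
    # Restrict the next read to the requested partition only.
    dataset.read_partitions = [partition_id]
    return dataset.get_dataframe()


# Possible usage, with mydataset defined as above:
# df = read_single_partition(mydataset, '2021')
```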

 

Are you executing this code through a Python Recipe? 

If so, another alternative is to use the partitioning filters in the Inputs/Outputs tab of the Recipe.


 

Please let me know if I understood or misunderstood the intent and I can provide further assistance! 

Roy

GuimoAAGG
Level 1
Author

Thanks RoyE

Basically I'm trying to move partitions from one partitioned dataset to another partitioned dataset.

For example, I read the dataset:

dataset = dataiku.Dataset('table_name')
dataframe = dataset.get_dataframe()

I am trying to save it to a different place. I've tried:

  • write_dataframe
  • write_with_schema

but I get this exception.

Exception: An error occurred during dataset write (FpKjfFG9nB): RuntimeException: A partition ID must be provided, because the dataset DEEP_SUPPLY_CHAIN_PRODUCCION.Teorico_Real_M_ABX is partitioned

 

RoyE
Dataiker

Hello

Thank you for the clarification! 

To write to a partition, you must first set the target partition on the output dataset:

<dataset>.set_write_partition("<partition name>")

 

https://doc.dataiku.com/dss/latest/python-api/datasets-reference.html#dataiku.Dataset.set_write_part...

Please note that set_write_partition requires the use of DatasetWriter.

https://doc.dataiku.com/dss/latest/python-api/datasets-reference.html#dataiku.core.dataset_write.Dat...

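for p in mydataset.list_partitions():
    # Read only the current partition of the input dataset.
    mydataset.read_partitions = [p]
    df = mydataset.get_dataframe()

    # Select the output partition before opening the writer; one writer per partition.
    myoutputdataset.set_write_partition(str(p))
    with myoutputdataset.get_writer() as writer:
        writer.write_dataframe(df)  # the with block closes the writer on exit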

 

This code should copy each partition of your old dataset into the matching partition of your new one.
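
The same idea can be wrapped in a reusable function. This is only a sketch under assumptions: the names copy_partitions and transform are hypothetical, and source and target are assumed to behave like dataiku.Dataset objects. It opens a fresh writer for each partition after calling set_write_partition, on the assumption that a writer is tied to the partition selected when it is created. As far as I know, setting partitions explicitly is rejected when the code runs as a recipe in the Flow, so this is meant for a notebook or similar.

```python
def copy_partitions(source, target, transform=None):
    """Copy every partition of `source` into the same-named partition of `target`.

    `copy_partitions` and `transform` are hypothetical names for this sketch.
    `source` and `target` are assumed to behave like dataiku.Dataset objects.
    Returns the list of partition identifiers that were written.
    """
    copied = []
    for p in source.list_partitions():
        # Restrict the next read to the current partition only.
        source.read_partitions = [p]
        df = source.get_dataframe()

        # Optional per-partition transformation supplied by the caller.
        if transform is not None:
            df = transform(df)

        # Select the output partition first, then open one writer for it.
        target.set_write_partition(str(p))
        with target.get_writer() as writer:
            writer.write_dataframe(df)
        copied.append(str(p))
    return copied


# Possible usage inside DSS (dataset names are placeholders):
# import dataiku
# copy_partitions(dataiku.Dataset("table_name"), dataiku.Dataset("New_partitioned"))
```

Note that write_dataframe does not set the schema of the output dataset, so the target is assumed to already have a matching schema.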

 

Roy

 


GuimoAAGG
Level 1
Author

Hi Roy E,

Thank you very much for your explanation. I have been working on something different for the last few days, but your explanation will help me a lot in future projects, as this is something I have come across a few times and now I know how to fix it.
