Writing a DataFrame in chunks with built-in Dataiku functionality

PapaA Registered Posts: 20 ✭✭✭✭
edited July 16 in Using Dataiku

Hi team,

I am trying to write a data frame with 1000 columns in chunks, because it does not fit in memory. I am writing it to a SQL database table. However, I am receiving a schema error. The target table is empty since I just created it.

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
inp = dataiku.Dataset("dto_1")
out = dataiku.Dataset("dto_features_unswifted_1")


with out.get_writer() as writer:
    for df in inp.iter_dataframes(chunksize=10500):
        # Write the processed dataframe
        writer.write_dataframe(df)

Best Answer

  • HenriC
    HenriC Dataiker Posts: 22 Dataiker
    edited July 17 Answer ✓

    Hi @PapaA!

    Welcome to the community!

    I think you did not send the right file, but I guess the error said that the output schema had 0 columns while the input had 1000.

    To fix this error, you need to proceed in two steps: first replicate the schema on the output dataset, then load your data.

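    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu

    # Read recipe inputs
    inp = dataiku.Dataset("dto_1")
    out = dataiku.Dataset("dto_features_unswifted_1")

    # Step 1: replicate the input schema on the (empty) output dataset.
    # Note: get_dataframe() without arguments loads the whole input into memory.
    out.write_schema_from_dataframe(inp.get_dataframe())

    # Step 2: stream the data to the output chunk by chunk.
    with out.get_writer() as writer:
        for df in inp.iter_dataframes(chunksize=10500):
            # Write the processed dataframe
            writer.write_dataframe(df)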
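
If loading the whole input just to set the schema is itself too heavy for memory, one possible variation is to derive the schema from the first chunk instead. This is only a sketch: `copy_in_chunks` is a hypothetical helper name, not part of the Dataiku API, and it assumes the first chunk is representative of the column types in the rest of the data (a column that is empty in the first chunk might be typed differently). It relies only on the calls already used above (`iter_dataframes`, `write_schema_from_dataframe`, `get_writer`, `write_dataframe`).

```python
def copy_in_chunks(inp, out, chunksize=10500):
    """Copy `inp` to `out` chunk by chunk and return the number of rows written.

    `inp` and `out` are expected to behave like dataiku.Dataset objects.
    Assumption: the schema inferred from the first chunk fits all later chunks.
    """
    chunks = iter(inp.iter_dataframes(chunksize=chunksize))

    # Pull the first chunk so the output schema can be set before writing.
    first = next(chunks, None)
    if first is None:
        return 0  # empty input: nothing to write, schema left untouched

    # Step 1: replicate the schema using only the first chunk.
    out.write_schema_from_dataframe(first)

    # Step 2: write the first chunk, then stream the remaining ones.
    rows = 0
    with out.get_writer() as writer:
        writer.write_dataframe(first)
        rows += len(first)
        for df in chunks:
            writer.write_dataframe(df)
            rows += len(df)
    return rows
```

In the recipe, this would be called as `copy_in_chunks(dataiku.Dataset("dto_1"), dataiku.Dataset("dto_features_unswifted_1"))` after `import dataiku`.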

    If this is not the error you were receiving, could you please check that you sent the right file?

    Have a great day,

    Henri
