Error while running a script step: When processing 'ArrayFold', the first 1 rows already used 1024 MB

Solved!
chi_wong
Level 2

I'm getting this error when trying to parse the output from an API call, which comes back as one long JSON string.

Setting the sample size doesn't help in this case because I'm only working with that one JSON string.

Is there any way to get around this error/warning?

The output is expected to be around 10k rows and 5 columns.


Operating system used: Windows


3 Replies
chi_wong
Level 2
Author

Just a quick follow-up on how I was able to move forward. The 1024 MB error seems to be a limitation of the way the ArrayFold and Split and fold processors are implemented.

I discovered that changing my input dataset from a single row with 10k embedded rows to 10 rows with 1k embedded rows each resulted in the same error, because it still hit the memory limit.

To move forward, I sent the single JSON column to a Python recipe and copied a function called "tidy_split" from Stack Overflow to break out the rows.

Not having any Python experience, it was important to me to minimize coding if at all possible, so this snippet is mostly whatever Dataiku gives you in the initial code window:


import dataiku
import pandas as pd

# Read recipe inputs
EQL_Results_testParse = dataiku.Dataset("EQL_Results_testParse")
EQL_Results_testParse_df = EQL_Results_testParse.get_dataframe()


# Compute recipe outputs from inputs
# TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
# NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.

ParsedValues_df = tidy_split(EQL_Results_testParse_df, "testOut", sep='[')


# Write recipe outputs
ParsedValues = dataiku.Dataset("ParsedValues")
ParsedValues.write_with_schema(ParsedValues_df)
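For reference, the tidy_split helper is roughly the version commonly shared on Stack Overflow; the exact code I copied may differ slightly, but something along these lines splits one column on a separator and repeats the other columns for each piece:

import pandas as pd

def tidy_split(df, column, sep='|', keep=False):
    """Split the values of `column` on `sep` and expand to one row per piece.
    Rows with a missing value in that column are dropped."""
    indexes = []
    new_values = []
    df = df.dropna(subset=[column])
    for i, presplit in enumerate(df[column].astype(str)):
        values = presplit.split(sep)
        if keep:
            # optionally keep the original, unsplit value as its own row
            indexes.append(i)
            new_values.append(presplit)
        for value in values:
            indexes.append(i)
            new_values.append(value)
    # repeat the other columns once per split value
    new_df = df.iloc[indexes, :].copy()
    new_df[column] = new_values
    return new_df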


The resulting output is passed to a Prepare step, where the rows are split into columns using standard processors. I could have split out the columns in Python, but I wanted to minimize coding for readability for the next guy.
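If you did want to stay in Python, the column split could also be done in pandas. A minimal sketch, assuming the split column is "testOut", a comma delimiter, and made-up output column names (all of those are assumptions for illustration, not what my Prepare step actually uses):

# Hypothetical alternative to the Prepare step: split the delimited text
# column into separate columns in pandas.
split_cols = ParsedValues_df["testOut"].str.split(",", expand=True)
split_cols.columns = ["col_{}".format(i) for i in range(split_cols.shape[1])]
ParsedValues_df = ParsedValues_df.drop(columns=["testOut"]).join(split_cols)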

CoreyS
Dataiker Alumni

Thank you for sharing your solution with the rest of the community @chi_wong!

Looking for more resources to help you use Dataiku effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as 'Accepted Solution' to help others like you!
Aaron_Rogers
Level 1

Thanks @chi_wong 😊
