Error while running a script step: When processing 'ArrayFold', the first 1 rows already used 1024 MB

Solved!
chi_wong
Level 2

I'm getting this error when trying to parse the output from an API call, which comes back as one long JSON string.

Setting the sample size doesn't help in this case because I'm only working with that one JSON string.

Is there any way to get around this error/warning?

The output is expected to be around 10k rows and 5 columns.


Operating system used: Windows


3 Replies
chi_wong
Level 2
Author

Just a quick follow-up on how I was able to move forward. The 1024 MB error seems to be a limitation of the way the ArrayFold and Split and fold processors are implemented.

I discovered that changing my input dataset from a single row with 10k embedded rows to 10 rows with 1k embedded rows each resulted in the same error, because it still hit the memory limit.

To move forward, I sent the single JSON column to a Python recipe and copied a function called "tidy_split" from Stack Overflow to break out the rows.

Not having any Python experience, it was important to me to minimize coding if at all possible, so this snippet is mostly whatever Dataiku gives you in the initial code window:


import dataiku
import pandas as pd

# Read recipe inputs
EQL_Results_testParse = dataiku.Dataset("EQL_Results_testParse")
EQL_Results_testParse_df = EQL_Results_testParse.get_dataframe()


# Compute recipe outputs from inputs
# TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
# NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.

ParsedValues_df = tidy_split(EQL_Results_testParse_df, "testOut", sep='[')


# Write recipe outputs
ParsedValues = dataiku.Dataset("ParsedValues")
ParsedValues.write_with_schema(ParsedValues_df)
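For reference, the tidy_split helper is roughly the version commonly shared on Stack Overflow; the exact code I copied may differ slightly, but something along these lines splits one column on a separator and repeats the other columns for each piece:

import pandas as pd

def tidy_split(df, column, sep='|', keep=False):
    """Split the values of `column` on `sep` and expand to one row per piece.
    Rows with a missing value in that column are dropped."""
    indexes = []
    new_values = []
    df = df.dropna(subset=[column])
    for i, presplit in enumerate(df[column].astype(str)):
        values = presplit.split(sep)
        if keep:
            # optionally keep the original, unsplit value as its own row
            indexes.append(i)
            new_values.append(presplit)
        for value in values:
            indexes.append(i)
            new_values.append(value)
    # repeat the other columns once per split value
    new_df = df.iloc[indexes, :].copy()
    new_df[column] = new_values
    return new_df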


The resulting output is passed to a Prepare step, where the rows are split into columns using standard processors. I could have split out the columns in Python, but I wanted to minimize coding for readability for the next guy.
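If you did want to stay in Python, the column split could also be done in pandas. A minimal sketch, assuming the split column is "testOut", a comma delimiter, and made-up output column names (all of those are assumptions for illustration, not what my Prepare step actually uses):

# Hypothetical alternative to the Prepare step: split the delimited text
# column into separate columns in pandas.
split_cols = ParsedValues_df["testOut"].str.split(",", expand=True)
split_cols.columns = ["col_{}".format(i) for i in range(split_cols.shape[1])]
ParsedValues_df = ParsedValues_df.drop(columns=["testOut"]).join(split_cols)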

CoreyS
Dataiker Alumni

Thank you for sharing your solution with the rest of the community @chi_wong!

Looking for more resources to help you use Dataiku effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as 'Accepted Solution' to help others like you!
Aaron_Rogers
Level 1

Thanks @chi_wong 😊
