I have been getting out-of-memory errors when a Python recipe tries to load a JSON dataset from the file system. The files are about 2.5 GB on disk, and the host has 64 GB of memory, plenty of which was free when the error occurred. Does DSS limit the memory available to Python recipes? If so, can I change it?
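Since the host reportedly had plenty of free memory, one thing worth ruling out is an OS-level per-process limit that the recipe process inherits. Here is a minimal sketch, assuming a Unix host (the `resource` module is Unix-only), that prints the address-space and data-segment limits the current Python process runs under. The helper names `describe_limit` and `memory_rlimits` are hypothetical. The sketch does not inspect cgroup or container limits, and it cannot tell you *who* set a limit, only whether one is in effect.

```python
import resource


def describe_limit(value):
    # RLIM_INFINITY means the kernel enforces no cap for this resource
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return "%.1f GiB" % (value / float(1024 ** 3))


def memory_rlimits():
    """Return human-readable (soft, hard) limits for address space and data segment."""
    limits = {}
    for name in ("RLIMIT_AS", "RLIMIT_DATA"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        limits[name] = (describe_limit(soft), describe_limit(hard))
    return limits


if __name__ == "__main__":
    # Run this inside the recipe to see the limits that apply to that process
    for name, (soft, hard) in sorted(memory_rlimits().items()):
        print("%s: soft=%s hard=%s" % (name, soft, hard))
```

If both limits report `unlimited` when run from inside the recipe, a per-process rlimit is probably not what is stopping the parser.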
Here is the stack trace:
[15:02:21] [INFO] [dku.utils] - *************** Recipe code failed **************
[15:02:21] [INFO] [dku.utils] - Begin Python stack
[15:02:21] [INFO] [dku.utils] - Traceback (most recent call last):
[15:02:21] [INFO] [dku.utils] - File "/home/dataiku/dss/jobs/WORKFUSIONDP/Build_Generated_Invoices_Flattened_2017-06-29T15-01-44.657/compute_Generated-InvoiceFlattenedFeature_NP/pyrecipenHw7EQsGvQh7/python-exec-wrapper.py", line 3, in <module>
[15:02:21] [INFO] [dku.utils] - execfile(sys.argv[1])
[15:02:21] [INFO] [dku.utils] - File "/home/dataiku/dss/jobs/WORKFUSIONDP/Build_Generated_Invoices_Flattened_2017-06-29T15-01-44.657/compute_Generated-InvoiceFlattenedFeature_NP/pyrecipenHw7EQsGvQh7/script.py", line 49, in <module>
[15:02:21] [INFO] [dku.utils] - vendorNameRawFeatures_df = vendorNameRawFeatures.get_dataframe()
[15:02:21] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/core/dataset.py", line 412, in get_dataframe
[15:02:21] [INFO] [dku.utils] - parse_dates=parse_date_columns)
[15:02:21] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python.packages/pandas/io/parsers.py", line 562, in parser_f
[15:02:21] [INFO] [dku.utils] - return _read(filepath_or_buffer, kwds)
[15:02:21] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python.packages/pandas/io/parsers.py", line 325, in _read
[15:02:21] [INFO] [dku.utils] - return parser.read()
[15:02:21] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python.packages/pandas/io/parsers.py", line 815, in read
[15:02:21] [INFO] [dku.utils] - ret = self._engine.read(nrows)
[15:02:21] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python.packages/pandas/io/parsers.py", line 1314, in read
[15:02:21] [INFO] [dku.utils] - data = self._reader.read(nrows)
[15:02:21] [INFO] [dku.utils] - File "pandas/parser.pyx", line 805, in pandas.parser.TextReader.read (pandas/parser.c:8748)
[15:02:21] [INFO] [dku.utils] - File "pandas/parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:9003)
[15:02:21] [INFO] [dku.utils] - File "pandas/parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)
[15:02:21] [INFO] [dku.utils] - File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)
[15:02:21] [INFO] [dku.utils] - File "pandas/parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas/parser.c:23325)
[15:02:21] [INFO] [dku.utils] - CParserError: Error tokenizing data. C error: out of memory
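The trace shows `get_dataframe()` passing the data to `pandas.read_csv` in a single call, so the whole dataset is parsed into one DataFrame at once. A 2.5 GB input can need several times that much memory once parsed. If the recipe can process rows incrementally, a chunked read keeps peak memory bounded. The sketch below shows the pattern with plain pandas on an in-memory buffer. `summarize_in_chunks` is a hypothetical helper, and the tab separator is an assumption made for the example. Inside DSS, the Dataiku Python API documents `Dataset.iter_dataframes(chunksize=...)` for the same purpose, but check whether your DSS version (4.0.3) offers it. The code avoids f-strings so it also runs under Python 2, which the `execfile` call in the trace suggests this install uses.

```python
import io

import pandas as pd


def summarize_in_chunks(source, chunksize=100000, sep="\t"):
    """Stream a delimited source through pandas in fixed-size chunks.

    Only one chunk is held in memory at a time, so peak usage is bounded by
    `chunksize` rows rather than by the size of the whole input.
    """
    n_rows = 0
    n_cols = None
    # With chunksize set, read_csv returns an iterator of DataFrames
    for chunk in pd.read_csv(source, sep=sep, chunksize=chunksize):
        n_rows += len(chunk)
        n_cols = chunk.shape[1]
        # Per-chunk work (filtering, aggregating, writing out) would go here
    return n_rows, n_cols


if __name__ == "__main__":
    data = "vendor\tamount\n" + "".join("v%d\t%d\n" % (i, i) for i in range(10))
    print(summarize_in_chunks(io.StringIO(data), chunksize=3))  # prints (10, 2)
```

A larger `chunksize` means fewer parser calls but more memory per chunk. Any aggregate result has to be combined across chunks rather than computed on a single DataFrame.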