"Argument list too long" error, independent of the recipe type
I have a Filesystem datasource that contains thousands of folders, and each folder contains a set of comma-separated files. Each file in a directory has a different schema, and a file name pattern is used to create partitioned datasources using the following format:
/%{DIR_NAME}/KEY_%{DIR_NAME}.csv
This creates a datasource from all the files whose names start with KEY. That part is working as expected. My problem is that I can't run any recipe against that datasource. I tried Python, shell, and sync recipes, and all of them failed with the same error:
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at com.dataiku.dip.security.process.RegularProcess.start(RegularProcess.java:47)
at com.dataiku.dip.security.process.InsecureProcessesLaunchService.launch(InsecureProcessesLaunchService.java:34)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:263)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:231)
at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeScript(AbstractPythonRecipeRunner.java:37)
at com.dataiku.dip.recipes.code.python.PythonRecipeRunner.run(PythonRecipeRunner.java:49)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:353)
Caused by: java.io.IOException: error=7, Argument list too long
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
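For context, error=7 is the operating system's E2BIG code: the command-line arguments and environment variables handed to the new process were together larger than the kernel allows. The sketch below estimates that size in Python and compares it with the system limit. The helper names `exec_payload_bytes` and `fits_in_arg_max` and the `PARTITIONS` variable are hypothetical, and the trace does not show what DSS actually passes to the recipe process, so a long partition list is only one possible cause.

```python
import errno
import os


def exec_payload_bytes(argv, env):
    """Rough size of what exec must copy: each string plus its NUL terminator."""
    arg_bytes = sum(len(os.fsencode(a)) + 1 for a in argv)
    # Each environment entry is stored as "KEY=VALUE\0".
    env_bytes = sum(
        len(os.fsencode(k)) + len(os.fsencode(v)) + 2 for k, v in env.items()
    )
    return arg_bytes + env_bytes


def fits_in_arg_max(argv, env, limit=None):
    """Return True if the estimated payload is below the limit (SC_ARG_MAX by default)."""
    if limit is None:
        limit = os.sysconf("SC_ARG_MAX")  # POSIX only
    return exec_payload_bytes(argv, env) < limit


if __name__ == "__main__":
    print("errno 7 is", errno.errorcode[7])
    # Hypothetical: thousands of partition names packed into one variable.
    partitions = ",".join("DIR_%05d" % i for i in range(5000))
    env = dict(os.environ, PARTITIONS=partitions)
    print("estimated bytes:", exec_payload_bytes(["python", "script.py"], env))
    print("fits:", fits_in_arg_max(["python", "script.py"], env))
```

On Linux a single argument or environment string is also capped separately (128 KiB with 4 KiB pages), so a launch can fail even when the total is below the overall limit.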
My current recipe is in Python, and the code is:
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Recipe inputs
print("Here")
events_CSV = dataiku.Dataset("KEY_CSV")
events_CSV_df = events_CSV.get_dataframe()
# Recipe outputs
events_ORC = dataiku.Dataset("KEY_ORC")
events_ORC.write_with_schema(events_CSV_df)
The job fails before printing "Here".
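That "Here" never appears is consistent with the stack trace: the failure happens while the process is being started, so none of the recipe code runs. The following sketch reproduces the same behaviour outside DSS on a POSIX system by launching a child Python with an oversized environment variable. The function name `launch_child` and the `PADDING` variable are made up for illustration and are not what DSS does internally.

```python
import errno
import os
import subprocess
import sys


def launch_child(extra_env_bytes):
    """Start a child Python that prints 'Here', with one padded env variable."""
    # PADDING is a hypothetical variable used only to inflate the environment.
    env = dict(os.environ, PADDING="x" * extra_env_bytes)
    try:
        done = subprocess.run(
            [sys.executable, "-c", "print('Here')"],
            env=env,
            capture_output=True,
            text=True,
        )
    except OSError as exc:
        # The child was never created, so its code could not print anything.
        return "launch failed: " + errno.errorcode.get(exc.errno, str(exc.errno))
    return done.stdout.strip()


if __name__ == "__main__":
    print(launch_child(10))               # small environment: child runs
    print(launch_child(4 * 1024 * 1024))  # 4 MiB variable: launch is rejected
```

Because the error is raised by the launcher rather than by the script, changing the recipe body would not be expected to help. What needs to shrink is whatever is passed to the process at start-up.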
These are the DSS instance settings:
{u'dipInstanceId': u'8bu1n1os-203c299d56c99ef078a53a1a81b6ea23-c60f6bab8e57ecd615a8ec240207f819', u'features': {u'TWITTER': {}, u'HADOOP': {}, u'HIVE': {}, u'PIG': {}, u'R': {}, u'SPARK': {}}, u'devInstance': False, u'distribVersion': u'7.3', u'debug': False, u'version': {u'product_commitid': u'', u'conf_version': u'16', u'product_version': u'4.0.5'}, u'distrib': u'redhat'}
Answers
Long Path Tool is software that lets you easily delete, copy, or rename files with long paths.