
"Argument list too long" error that occurs regardless of the recipe type

Dataiker

I have a Filesystem data source that contains thousands of folders, each holding a set of comma-separated (CSV) files. The files within a directory have different schemas, so I use the file name to define partitioned data sources with the following path pattern:



/%{DIR_NAME}/KEY_%{DIR_NAME}.csv



This creates a data source from all the files whose names start with KEY. That part works as expected. My problem is that I can't run any recipe against that data source. I tried Python, shell, and sync recipes, and all of them failed with the same error:




at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at com.dataiku.dip.security.process.RegularProcess.start(RegularProcess.java:47)
at com.dataiku.dip.security.process.InsecureProcessesLaunchService.launch(InsecureProcessesLaunchService.java:34)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:263)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:231)
at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeScript(AbstractPythonRecipeRunner.java:37)
at com.dataiku.dip.recipes.code.python.PythonRecipeRunner.run(PythonRecipeRunner.java:49)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:353)
Caused by: java.io.IOException: error=7, Argument list too long
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
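For readers decoding the trace: `error=7` is the operating system's errno, raised while the recipe process is being launched rather than by the recipe code. The sketch below is not DSS code. It only shows how to look up that errno and how to estimate the size of an argument list plus environment against the system limit. Whether DSS passes per-partition information through arguments or environment variables is not established by this post, and the helper names `explain_errno` and `exec_payload_bytes` are made up for illustration.

```python
import errno
import os


def explain_errno(code):
    """Return (symbolic name, OS message) for a numeric errno."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def exec_payload_bytes(argv, env):
    """Approximate the bytes that argv and env contribute to an exec call.

    Each argument costs its encoded length plus a NUL terminator; each
    environment entry is stored as "KEY=VALUE" plus a NUL terminator.
    The kernel also counts pointer tables, so this is a lower bound.
    """
    arg_bytes = sum(len(os.fsencode(a)) + 1 for a in argv)
    env_bytes = sum(
        len(os.fsencode(k)) + len(os.fsencode(v)) + 2 for k, v in env.items()
    )
    return arg_bytes + env_bytes


if __name__ == "__main__":
    print(explain_errno(7))
    # Combined limit for arguments and environment on this machine.
    print("ARG_MAX:", os.sysconf("SC_ARG_MAX"))
    print("current environment bytes:", exec_payload_bytes([], dict(os.environ)))
```

Running this on the DSS host as the DSS user shows how much headroom the limit leaves before any recipe-specific arguments or variables are added.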


My current recipe is in Python and the code is:



# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Recipe inputs
print("Here")

events_CSV = dataiku.Dataset("KEY_CSV")
events_CSV_df = events_CSV.get_dataframe()

# Recipe outputs
events_ORC = dataiku.Dataset("KEY_ORC")
events_ORC.write_with_schema(events_CSV_df)



The job fails before printing "Here".
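That "Here" is never printed is consistent with the stack trace: the failure is in `forkAndExec`, so the Python interpreter never starts. The following standalone sketch reproduces the same behaviour outside DSS by launching a child Python process with an oversized argument. The helper name `try_launch` and the 4 MiB cap are arbitrary choices for this illustration, and it assumes a POSIX system.

```python
import errno
import os
import subprocess
import sys


def try_launch(arg_bytes):
    """Launch a child Python that prints a marker, with one padding argument.

    Returns (started, errno_name). If the OS rejects the exec call, the
    child never runs, so its print statement is never reached.
    """
    argv = [sys.executable, "-c", "print('Here')", "x" * arg_bytes]
    try:
        done = subprocess.run(argv, capture_output=True, text=True)
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    return "Here" in done.stdout, None


if __name__ == "__main__":
    print(try_launch(10))
    # One byte past the reported limit (capped to keep the test string small).
    oversized = min(os.sysconf("SC_ARG_MAX"), 4 * 1024 * 1024) + 1
    print(try_launch(oversized))
```

The second call fails in the parent process, before any child code executes, which mirrors a recipe that dies before its first line runs.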



These are the DSS instance settings:




{u'dipInstanceId': u'8bu1n1os-203c299d56c99ef078a53a1a81b6ea23-c60f6bab8e57ecd615a8ec240207f819', u'features': {u'TWITTER': {}, u'HADOOP': {}, u'HIVE': {}, u'PIG': {}, u'R': {}, u'SPARK': {}}, u'devInstance': False, u'distribVersion': u'7.3', u'debug': False, u'version': {u'product_commitid': u'', u'conf_version': u'16', u'product_version': u'4.0.5'}, u'distrib': u'redhat'}


 

1 Reply

Long Path Tool is software that lets you easily delete, copy, or rename files with long paths.
