
"Argument list too long" error that occurs regardless of the recipe type

UserBird
Dataiker

I have a Filesystem data source that contains thousands of folders, and each folder contains a set of comma-separated files. Each file within a directory has a different schema, so I use the file name to create partitioned datasets with the following pattern:



/%{DIR_NAME}/KEY_%{DIR_NAME}.csv
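To illustrate what I expect this pattern to select, here is a minimal sketch in plain Python. It is not how DSS implements partition patterns: the regular expression and the `partition_of` helper are hypothetical, and the sketch assumes that both occurrences of `%{DIR_NAME}` must resolve to the same value.

```python
import re

# Hypothetical regex equivalent of /%{DIR_NAME}/KEY_%{DIR_NAME}.csv.
# Assumption: both %{DIR_NAME} occurrences must be identical, which the
# named backreference (?P=dir) enforces.
PATTERN = re.compile(r"^/(?P<dir>[^/]+)/KEY_(?P=dir)\.csv$")


def partition_of(path):
    """Return the partition identifier for a matching path, else None."""
    match = PATTERN.match(path)
    return match.group("dir") if match else None


# Made-up example paths, for illustration only.
for path in ["/A1/KEY_A1.csv", "/A1/OTHER_A1.csv", "/A1/KEY_B2.csv"]:
    print(path, "->", partition_of(path))
```

With thousands of directories, a pattern like this yields one partition per directory, so the dataset ends up with thousands of partitions.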



This pattern creates a dataset from all the files whose names start with KEY. That part works as expected. My problem is that I can't run any recipe against that dataset. I tried Python, shell, and sync recipes, and all of them failed with the same error:




at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at com.dataiku.dip.security.process.RegularProcess.start(RegularProcess.java:47)
at com.dataiku.dip.security.process.InsecureProcessesLaunchService.launch(InsecureProcessesLaunchService.java:34)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:263)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:231)
at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeScript(AbstractPythonRecipeRunner.java:37)
at com.dataiku.dip.recipes.code.python.PythonRecipeRunner.run(PythonRecipeRunner.java:49)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:353)
Caused by: java.io.IOException: error=7, Argument list too long
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)


My current recipe is in Python, and the code is:



# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Recipe inputs
print("Here")
events_CSV = dataiku.Dataset("KEY_CSV")
events_CSV_df = events_CSV.get_dataframe()

# Recipe outputs
events_ORC = dataiku.Dataset("KEY_ORC")
events_ORC.write_with_schema(events_CSV_df)



The job fails before printing "Here".
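Failing before the first print is consistent with the trace: error=7 is the errno E2BIG, which the operating system returns when the arguments and environment passed to a new process exceed its limits, so the Python interpreter never starts. The sketch below is one way to inspect those limits on the DSS host. It assumes a Linux host (which is where the per-string limit of 32 pages applies), and the partition names are made up; whether DSS actually passes the partition list through the arguments or the environment is a guess on my part, not something the trace confirms.

```python
import errno
import os

# Linux-specific assumption: a single argv/env string may not exceed
# 32 pages (128 KiB with 4 KiB pages), independently of the total limit.
PER_STRING_LIMIT = 32 * os.sysconf("SC_PAGE_SIZE")


def exec_payload_size(argv, env):
    """Approximate bytes copied for argv and env when a process is launched.

    Counts each string plus its NUL terminator; each env entry is stored
    as "KEY=VALUE". Pointer-array overhead is ignored.
    """
    arg_bytes = sum(len(a.encode("utf-8")) + 1 for a in argv)
    env_bytes = sum(
        len(k.encode("utf-8")) + len(v.encode("utf-8")) + 2
        for k, v in env.items()
    )
    return arg_bytes + env_bytes


# error=7 in the Java trace corresponds to this errno.
print("errno", errno.E2BIG, "=", os.strerror(errno.E2BIG))

# Total limit for arguments plus environment on this host.
print("ARG_MAX:", os.sysconf("SC_ARG_MAX"))
print("current environment size:", exec_payload_size([], dict(os.environ)))

# Hypothetical: 20000 made-up partition names packed into one string.
partitions = ",".join("DIR_%05d" % i for i in range(20000))
print("partition string bytes:", len(partitions))
print("exceeds per-string limit:", len(partitions) > PER_STRING_LIMIT)
```

If the numbers on the real host look similar, the limit would be hit at process launch no matter which recipe type is used, which matches what I see.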



These are the DSS instance settings:




{u'dipInstanceId': u'8bu1n1os-203c299d56c99ef078a53a1a81b6ea23-c60f6bab8e57ecd615a8ec240207f819', u'features': {u'TWITTER': {}, u'HADOOP': {}, u'HIVE': {}, u'PIG': {}, u'R': {}, u'SPARK': {}}, u'devInstance': False, u'distribVersion': u'7.3', u'debug': False, u'version': {u'product_commitid': u'', u'conf_version': u'16', u'product_version': u'4.0.5'}, u'distrib': u'redhat'}


 

andrewjobel
Level 1

Long Path Tool is software that lets you easily delete, copy, or rename files with long paths.
