
Shell script teachable churn prediction

Level 3

Hello, I am trying to replicate the churn prediction case from the teachable.dataiku website, and I receive the following error:

[2017/04/26-19:02:26.285] [Exec-38] [INFO] [dku.utils] - /home/dataiku/dss/pyenv/lib/python2.7/site-packages/unidecode/__init__.py:46: RuntimeWarning: Argument <type 'str'> is not an unicode object. Passing an encoded string will likely have unexpected results.
[2017/04/26-19:02:26.285] [Exec-38] [INFO] [dku.utils] - _warn_if_not_unicode(string)
[2017/04/26-19:02:26.340] [Exec-38] [INFO] [dku.utils] - Traceback (most recent call last):
[2017/04/26-19:02:26.365] [Exec-38] [INFO] [dku.utils] - File "/home/dataiku/dss/lib/python/vw_transformer.py", line 99, in <module>
[2017/04/26-19:02:26.365] [Exec-38] [INFO] [dku.utils] - sys.stdout.write(vw_record + "\n")
[2017/04/26-19:02:26.365] [Exec-38] [INFO] [dku.utils] - IOError: [Errno 32] Broken pipe
[2017/04/26-19:02:26.457] [Thread-23] [ERROR] [dku.flow.shell] - Error while sending input to script
java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
at java.io.BufferedWriter.write(BufferedWriter.java:230)
at java.io.Writer.write(Writer.java:157)
at java.io.Writer.append(Writer.java:227)
at com.dataiku.dip.output.CSVOutputFormatter.appendExcelStyle(CSVOutputFormatter.java:109)
at com.dataiku.dip.output.CSVOutputFormatter.appendFieldToLine(CSVOutputFormatter.java:198)
at com.dataiku.dip.output.CSVOutputFormatter.format(CSVOutputFormatter.java:183)
at com.dataiku.dip.output.StringOutputFormatter.format(StringOutputFormatter.java:33)
at com.dataiku.dip.output.OutputStreamOutputWriter.emitRow(OutputStreamOutputWriter.java:32)
at com.dataiku.dip.input.formats.csv.CSVFormatExtractor.doExtractStream(CSVFormatExtractor.java:366)
at com.dataiku.dip.input.formats.csv.CSVFormatExtractor.doExtractStream(CSVFormatExtractor.java:161)
at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.run(ArchiveCapableFormatExtractor.java:135)
at com.dataiku.dip.datasets.AbstractSingleThreadPusher.pushSplits(AbstractSingleThreadPusher.java:176)
at com.dataiku.dip.datasets.UniversalSingleThreadPusher.push(UniversalSingleThreadPusher.java:226)
at com.dataiku.dip.datasets.UniversalSingleThreadPusher.push(UniversalSingleThreadPusher.java:64)
at com.dataiku.dip.recipes.code.shell.ShellScriptRecipeRunner$PipeInThread.run(ShellScriptRecipeRunner.java:220)
[2017/04/26-19:02:26.459] [Thread-23] [INFO] [dku.flow.shell] - Closing the script input
[2017/04/26-19:02:26.463] [FRT-35-FlowRunnable] [INFO] [dku.flow.activity] - Run thread failed for activity compute_PG83dheF_NP
com.dataiku.dip.exceptions.ProcessDiedException: The shell process failed (exit code: 127). More info might be available in the logs.

It seems to enter my Python code, but the code does not send the data back. I might be wrong.

Any help will be greatly appreciated.

4 Replies
Dataiker Alumni
Hi - did you set something in the "Pipe in" or "Pipe out" dropdown menus? It needs to be set to "--nothing--" in both cases, as the Python script takes care of reading the input data directly.
Level 3
Yes, I have tried every combination, with and without something set in "Pipe in" and "Pipe out", and I always get the same error 127. I put a database in "Pipe in" and changed the value so it does not go through the Python script, and I received the same error. Here is the log:
[2017/04/26-21:13:03.509] [Exec-37] [INFO] [dku.utils] - State Account_Length Area_Code Phone Intl_Plan VMail_Plan VMail_Message Day_Mins Day_Calls Day_Charge Eve_Mins Eve_Calls Eve_Charge Night_Mins Night_Calls Night_Charge Intl_Mins Intl_Calls Intl_Charge CustServ_Calls Churn splitter
[2017/04/26-21:13:03.510] [Exec-37] [INFO] [dku.utils] - ^
[2017/04/26-21:13:03.510] [Exec-37] [INFO] [dku.utils] - SyntaxError: invalid syntax
[2017/04/26-21:13:03.544] [Exec-37] [INFO] [dku.utils] - /home/dataiku/dss/jobs/CHURNPREDICTION/Build_model_vw_2017-04-26T21-13-01.235/compute_PG83dheF_NP/shelljUDVJ4ANXQzQ/script.sh: line 36: --dataset=train: command not found
[2017/04/26-21:13:03.546] [Thread-23] [ERROR] [dku.flow.shell] - Error while sending input to script
java.io.IOException: Broken pipe
Dataiker Alumni
Also, could you please run "vw --version" in a terminal on the server hosting DSS and see what the output is?
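A minimal sketch of that check, in case it helps. It assumes a POSIX shell and that you run it as the same Unix user that runs DSS. `check_cmd` is a hypothetical helper name used only for illustration; it is not part of DSS or of `vw`. The second part only shows that a shell reports exit code 127 when it cannot find a command, which matches the exit code in the job log above.

```shell
#!/bin/sh
# Hypothetical helper (not part of DSS): report whether a command is on the PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $(command -v "$1")"
    else
        echo "missing: $1"
    fi
}

# Check the executable that the recipe is expected to call.
check_cmd vw

# For comparison: ask a shell to run a command that does not exist
# and capture the exit code it reports.
status=0
sh -c 'no_such_command_for_demo' 2>/dev/null || status=$?
echo "exit code for an unknown command: $status"
```

If the first line printed is "missing: vw", the executable is not visible to that user, and "vw --version" will fail in the same way.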
Level 3
Okay, I am not sure I understood. I searched for the command and did not find it, so I checked my versions, and they are the latest of version 8. I also tried to run my Python code, but I do not have the dataiku package in my Python folder, so I can only use it via Dataiku, not in the terminal.

