Python recipe random connection aborted errors to managed folder

BHoppe
BHoppe Registered Posts: 1

We have an on-prem DSS installation. I have a managed folder of SFTP type that reads from a remote server, and my Python recipe uses it as input. I'm using get_download_stream() to read the files like so:

import dataiku

handle = dataiku.Folder('my_folder')
stream = handle.get_download_stream('path/to/file')

This runs in a loop over several hundred files, and every so often get_download_stream() throws an exception. It's always one of these two:

('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

I've verified that the files exist and can be read from the folder in the Dataiku UI. When I run the recipe again, it's always a different set of files that throws the exception. The exception shows it's making an HTTP call to a URL like this:

'http://mydsshost.com:10001/dip/api/tintercom/managed-folders/download-path
?projectKey=MYPROJECT
&lookup=my_folder
&path=%2Fsnou198%2FSNOU198_HTML%2FGUID-4971CC8F-497C-4553-9817-F4189AD5959B.html'

Any ideas on how to debug these connection errors?

Best Answer

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,138 Neuron
    Answer ✓

    Well, clearly your remote server connection is being dropped; this is not a Dataiku issue. Speak with your networking team and try to find out why the connection between DSS and the SFTP server is being dropped. In any case, this sort of remote pull job is always going to be subject to network errors, so you should implement automatic retries and error trapping in your Python code.
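
    For what it's worth, here is a minimal sketch of what retries with error trapping could look like. The helper names (with_retries, read_file) and the retry settings are made up for illustration; they are not part of the Dataiku API. It assumes the errors you are seeing are subclasses of OSError (the built-in ConnectionResetError is, and so is the ConnectionError raised by the requests library), so narrow retry_on if that turns out to be too broad for your case. The whole file is read inside the retried function, so a connection dropped mid-download is retried as well.

```python
import random
import time


def with_retries(func, *args, attempts=5, base_delay=1.0, max_delay=30.0,
                 retry_on=(OSError,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying on retry_on with exponential backoff.

    Hypothetical helper: the defaults are illustrative, not tuned values.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller trap the error
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay, plus jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay += random.uniform(0, delay / 2)
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.1f}s")
            sleep(delay)


def read_file(folder, path):
    """Download one file completely so that a mid-stream drop is also retried."""
    with folder.get_download_stream(path) as stream:
        return stream.read()


# Usage inside the recipe (needs the dataiku package, so it is commented out here):
# import dataiku
# handle = dataiku.Folder('my_folder')
# failed = []
# for path in handle.list_paths_in_partition():
#     try:
#         data = with_retries(read_file, handle, path)
#     except OSError as exc:
#         failed.append((path, exc))  # error trapping: record it and carry on
```

    Collecting the failures in a list instead of letting the first one kill the loop means a single bad file doesn't abort a run over several hundred, and you end up with a list of paths to hand to your networking team.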
