Export Redshift Data in CSV with Header on SFTP server

raviagrawal
raviagrawal Partner, Registered Posts: 18 Partner
Hi,



I have requirement where I have to export the Dataiku process output from redshift to CSV file with header on a SFTP site for a complete automation of the process. Could you please provide a possible way how to achieve this ?



Regards,

Ravi Agrawal
Tagged:

Answers

  • Mattsco
    Mattsco Dataiker, Registered Posts: 125 Dataiker

    Hi Ravi,

    You can use the "Export to folder" recipe and specify an already setup connection to your SFTP.

    Matt

  • raviagrawal
    raviagrawal Partner, Registered Posts: 18 Partner
    Hi,

    Is it possible to append the datetimestamp at the end of the file name using this recipe ?

    Also I used the export folder functionality and it failed with following log

    [11:03:33] [INFO] [dku] running compute_6xb7d0uO_NP - ----------------------------------------
    [11:03:33] [INFO] [dku] running compute_6xb7d0uO_NP - DSS startup: jek version:5.1.5
    [11:03:33] [INFO] [dku] running compute_6xb7d0uO_NP - DSS home: /home/dataiku/DATA_DIR
    [11:03:33] [INFO] [dku] running compute_6xb7d0uO_NP - OS: Linux 4.14.72-68.55.amzn1.x86_64 amd64 - Java: Oracle Corporation 1.8.0_191
    [11:03:33] [INFO] [dku.flow.jobrunner] running compute_6xb7d0uO_NP - Allocated a slot for this activity!
    [11:03:33] [INFO] [dku.flow.jobrunner] running compute_6xb7d0uO_NP - Run activity
    [11:03:33] [INFO] [dku.flow.activity] running compute_6xb7d0uO_NP - Executing default pre-activity lifecycle hook
    [11:03:33] [INFO] [dip.provider.connection.prepare] running compute_6xb7d0uO_NP - Setup connection with encrypted fields CDP_mbox
    [11:03:33] [INFO] [dku.managedfolders.handler] running compute_6xb7d0uO_NP - Create provider for IQVIATRIGGER.RatkEqgX with path IQVIATRIGGER/RatkEqgX
    [11:03:33] [INFO] [dku.fsproviders.ssh] running compute_6xb7d0uO_NP - Effective root : 'IQVIATRIGGER/RatkEqgX' from 'null' / 'IQVIATRIGGER/RatkEqgX' useRelativePaths=true
    [11:03:33] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Connecting to mboxnaprd.jnj.com port 22
    [11:03:33] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Connection established
    [11:03:33] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Remote version string: SSH-2.0-CrushFTPSSHD
    [11:03:33] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Local version string: SSH-2.0-JSCH-0.1.54
    [11:03:33] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
    [11:03:33] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - CheckKexes: diffie-hellman-group14-sha1,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - CheckSignatures: ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_KEXINIT sent
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_KEXINIT received
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256,diffie-hellman-group18-sha512,diffie-hellman-group17-sha512,diffie-hellman-group16-sha512,diffie-hellman-group15-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha1
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: ssh-dss,ssh-rsa
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: aes256-ctr,aes128-ctr,aes256-cbc,3des-ctr,3des-cbc,aes192-ctr,aes192-cbc,aes128-cbc,blowfish-cbc,arcfour128,arcfour
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: aes256-ctr,aes128-ctr,aes256-cbc,3des-ctr,3des-cbc,aes192-ctr,aes192-cbc,aes128-cbc,blowfish-cbc,arcfour128,arcfour
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: hmac-md5,hmac-sha256,hmac-sha2-256,hmac-sha2-256-96,hmac-sha512,hmac-sha2-512,hmac-sha2-512-96,hmac-sha1,hmac-sha1-96,hmac-ripemd160,hmac-md5-96
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: hmac-md5,hmac-sha256,hmac-sha2-256,hmac-sha2-256-96,hmac-sha512,hmac-sha2-512,hmac-sha2-512-96,hmac-sha1,hmac-sha1-96,hmac-ripemd160,hmac-md5-96
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: none,zlib
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: none,zlib
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server:
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server:
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: none
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: none
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client:
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client:
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server->client aes128-ctr hmac-md5 none
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client->server aes128-ctr hmac-md5 none
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_KEX_ECDH_INIT sent
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - expecting SSH_MSG_KEX_ECDH_REPLY
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - ssh_rsa_verify: signature true
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Host 'mboxnaprd.jnj.com' is known and matches the RSA host key
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_NEWKEYS sent
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_NEWKEYS received
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_SERVICE_REQUEST sent
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_SERVICE_ACCEPT received
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Authentications that can continue: password
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Next authentication method: password
    [11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Authentication succeeded (password).
    [11:03:34] [DEBUG] [dku.remotefiles.sftp] running compute_6xb7d0uO_NP - Stats for remote file 'IQVIATRIGGER/RatkEqgX'
    [11:03:34] [ERROR] [dku.fsproviders.ssh] running compute_6xb7d0uO_NP - Dataset root path does not exist : IQVIATRIGGER/RatkEqgX
    java.io.FileNotFoundException
    at com.dataiku.dip.datasets.fs.SSHFSProvider.stat(SSHFSProvider.java:320)
    at com.dataiku.dip.datasets.fs.SSHFSProvider.ensureDirectory(SSHFSProvider.java:191)
    at com.dataiku.dip.managedfolder.ManagedFolderHandler.ensurePartitionFolder(ManagedFolderHandler.java:587)
    at com.dataiku.dip.dataflow.DefaultActivityLifecycleHook.handleTargetFolderPre(DefaultActivityLifecycleHook.java:77)
    at com.dataiku.dip.dataflow.DefaultActivityLifecycleHook.onPreActivity(DefaultActivityLifecycleHook.java:138)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner.runActivity(ActivityRunner.java:558)
    at com.dataiku.dip.dataflow.jobrunner.JobRunner.runActivity(JobRunner.java:125)
    at com.dataiku.dip.dataflow.jobrunner.JobRunner.access$900(JobRunner.java:36)
    at com.dataiku.dip.dataflow.jobrunner.JobRunner$ActivityExecutorThread.run(JobRunner.java:312)
    [11:03:34] [DEBUG] [dku.remotefiles.sftp] running compute_6xb7d0uO_NP - Stats for remote file IQVIATRIGGER/RatkEqgX
    [11:03:35] [DEBUG] [dku.remotefiles.sftp] running compute_6xb7d0uO_NP - Stats for remote file 'IQVIATRIGGER'
    [11:03:45] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Disconnecting from mboxnaprd.jnj.com port 22
    [11:03:45] [ERROR] [dku.flow.jobrunner] running compute_6xb7d0uO_NP - Activity unexpectedly failed
    java.io.IOException: SFTP error
    at com.dataiku.dip.input.remote.SFTPRemote.mkdirs(SFTPRemote.java:197)
    at com.dataiku.dip.datasets.fs.SSHFSProvider.ensureDirectory(SSHFSProvider.java:197)
    at com.dataiku.dip.managedfolder.ManagedFolderHandler.ensurePartitionFolder(ManagedFolderHandler.java:587)
    at com.dataiku.dip.dataflow.DefaultActivityLifecycleHook.handleTargetFolderPre(DefaultActivityLifecycleHook.java:77)
    at com.dataiku.dip.dataflow.DefaultActivityLifecycleHook.onPreActivity(DefaultActivityLifecycleHook.java:138)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner.runActivity(ActivityRunner.java:558)
    at com.dataiku.dip.dataflow.jobrunner.JobRunner.runActivity(JobRunner.java:125)
    at com.dataiku.dip.dataflow.jobrunner.JobRunner.access$900(JobRunner.java:36)
    at com.dataiku.dip.dataflow.jobrunner.JobRunner$ActivityExecutorThread.run(JobRunner.java:312)
    Caused by: 4: %MKD-bad%
    at com.jcraft.jsch.ChannelSftp.throwStatusError(ChannelSftp.java:2873)
    at com.jcraft.jsch.ChannelSftp.mkdir(ChannelSftp.java:2182)
    at com.dataiku.dip.input.remote.SFTPRemote.mkdirs(SFTPRemote.java:191)
    ... 8 more
    [11:03:45] [INFO] [dku.remotefiles.ssh.jsch] - Caught an exception, leaving main loop due to Socket closed

    Regards,
    Ravi Agrawal
  • Mattsco
    Mattsco Dataiker, Registered Posts: 125 Dataiker
    Is it possible to append the datetimestamp?
    No, we don't have the feature implemented.

    For the error, what version of DSS do you use?

    A first guess is that the file is too big for the throughput of the connection/SFTP server that closes the connection after a while if the upload takes too much time.

    You can try with smaller files first and(or) upload directly from an SFTP client outside of DSS to check the stability of the upstream connection?
  • raviagrawal
    raviagrawal Partner, Registered Posts: 18 Partner
    Hi,

    We are using Dataiku 5.1.5 version and file is only 900 records.

    My guess is that issue is because of following command but I am not sure how to correct this one.

    [11:03:33] [INFO] [dku.fsproviders.ssh] running compute_6xb7d0uO_NP - Effective root : 'IQVIATRIGGER/RatkEqgX' from 'null' / 'IQVIATRIGGER/RatkEqgX' useRelativePaths=true

    [11:03:34] [DEBUG] [dku.remotefiles.sftp] running compute_6xb7d0uO_NP - Stats for remote file 'IQVIATRIGGER/RatkEqgX'
    [11:03:34] [ERROR] [dku.fsproviders.ssh] running compute_6xb7d0uO_NP - Dataset root path does not exist : IQVIATRIGGER/RatkEqgX
    java.io.FileNotFoundException

    Regards,
    Ravi Agrawal
  • raviagrawal
    raviagrawal Partner, Registered Posts: 18 Partner
    Hi,

    I am able to put the file on FTP server but I have two issues
    a. File Name is same as table name and I want to change it to the new name.
    b. Adding the Datatimestamp at the end of the file.

    I referred to following link to replicate the process, but somehow file is not being created

    https://answers.dataiku.com/1038/export-to-folder-different-name-each-day

    Could you please check and suggest something ?

    Regards,
    Ravi Agrawal
Setup Info
    Tags
      Help me…