I am trying to download multiple CSV files into a single dataset. The files share a consistent format and can be stacked (I am already doing that in another pipeline, but with an individual dataset for each file).
Any advice on how I can pull multiple similar files into one stacked dataset? My Flow is very busy, and I was hoping I could use the "add another source" button, but I cannot find out what the constraints on using it are.
Once it has processed the first file, I get this error:
[2017/04/15-07:57:00.679] [Thread-709] [INFO] [dku.remotefiles] - Writing in /home/dataiku/dss/managed_datasets/E55V2.EFAST_SCH_C_P1_I3
[2017/04/15-07:57:00.679] [Thread-709] [INFO] [dku.remotefiles] - outputPartition = NP substituted URL https://www.askebsa.dol.gov/FOIA%20Files/2015/Latest/F_SCH_C_PART1_ITEM3_2015_Latest.zip
[2017/04/15-07:57:00.679] [Thread-709] [WARN] [com.dataiku.dip.ApplicationConfigurator] - GeneralSettings: create a temporary read transaction
[2017/04/15-07:57:00.738] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 2] Start call: /api/datasets/remote-files/get-fetch-status user=admin
[2017/04/15-07:57:00.742] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 6] Done call: /api/datasets/remote-files/get-fetch-status time=6ms user=admin
[2017/04/15-07:57:01.754] [Thread-709] [INFO] [dku.remotefiles] - Copied = 7890
[2017/04/15-07:57:02.811] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 1] Start call: /api/datasets/remote-files/get-fetch-status user=admin
[2017/04/15-07:57:02.815] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 5] Done call: /api/datasets/remote-files/get-fetch-status time=5ms user=admin
[2017/04/15-07:57:04.341] [Thread-709] [INFO] [dku.remotefiles] - outputPartition = NP substituted URL https://www.askebsa.dol.gov/FOIA%20Files/2014/Latest/F_SCH_C_PART1_ITEM3_2014_Latest.zip
[2017/04/15-07:57:04.341] [Thread-709] [WARN] [com.dataiku.dip.ApplicationConfigurator] - GeneralSettings: create a temporary read transaction
[2017/04/15-07:57:04.342] [Thread-709] [ERROR] [dku.remotefiles] - Download failed
java.lang.IllegalStateException: Connection pool shut down
    at org.apache.http.util.Asserts.check(Asserts.java:34)
    at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
    at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
    at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
    at com.dataiku.dip.input.remote.RemoteFilesSynchronizer.fetchHTTP(RemoteFilesSynchronizer.java:354)
    at com.dataiku.dip.input.remote.RemoteFilesSynchronizer.runFetch(RemoteFilesSynchronizer.java:267)
    at com.dataiku.dip.server.datasets.RemoteFilesDatasetTestService$FetchThread.run(RemoteFilesDatasetTestService.java:279)
[2017/04/15-07:57:04.342] [Thread-709] [ERROR] [dku.datasets] - Fetch failed
java.io.IOException: Download failed for https://www.askebsa.dol.gov/FOIA%20Files/2014/Latest/F_SCH_C_PART1_ITEM3_2014_Latest.zip
    at com.dataiku.dip.input.remote.RemoteFilesSynchronizer.fetchHTTP(RemoteFilesSynchronizer.java:408)
    at com.dataiku.dip.input.remote.RemoteFilesSynchronizer.runFetch(RemoteFilesSynchronizer.java:267)
    at com.dataiku.dip.server.datasets.RemoteFilesDatasetTestService$FetchThread.run(RemoteFilesDatasetTestService.java:279)
Caused by: java.lang.IllegalStateException: Connection pool shut down
    at org.apache.http.util.Asserts.check(Asserts.java:34)
    at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
    at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
    at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
    at com.dataiku.dip.input.remote.RemoteFilesSynchronizer.fetchHTTP(RemoteFilesSynchronizer.java:354)
    ... 2 more
[2017/04/15-07:57:04.342] [Thread-709] [INFO] [dku.datasets] - Fetch finished, final status:
{
  "running": false,
  "error": true,
  "filesTotal": 1,
  "sizeTotal": 7655819,
  "filesToFetch": 1,
  "sizeToFetch": 7655819,
  "filesFailed": 1,
  "filesFetched": 1,
  "sizeFetched": 7655819,
  "filesDeleted": 0,
  "perSource": [
    {
      "error": false,
      "filesTotal": 1,
      "sizeTotal": 7655819,
      "filesToFetch": 1,
      "sizeToFetch": 7655819,
      "filesFailed": 0,
      "filesFetched": 1,
      "sizeFetched": 7655819
    },
    {
      "error": true,
      "filesTotal": 0,
      "sizeTotal": 0,
      "filesToFetch": 0,
      "sizeToFetch": 0,
      "filesFailed": 1,
      "filesFetched": 0,
      "sizeFetched": 0
    }
  ],
  "errorMessages": [
    "Download failed for https://www.askebsa.dol.gov/FOIA%20Files/2014/Latest/F_SCH_C_PART1_ITEM3_2014_Latest.zip: Connection pool shut down"
  ],
  "master": {
    "running": true,
    "error": true,
    "filesTotal": 1,
    "sizeTotal": 7655819,
    "filesToFetch": 1,
    "sizeToFetch": 7655819,
    "filesFailed": 1,
    "filesFetched": 1,
    "sizeFetched": 7655819,
    "filesDeleted": 0,
    "perSource": [
      {
        "error": false,
        "filesTotal": 1,
        "sizeTotal": 7655819,
        "filesToFetch": 1,
        "sizeToFetch": 7655819,
        "filesFailed": 0,
        "filesFetched": 1,
        "sizeFetched": 7655819
      },
      {
        "error": true,
        "filesTotal": 0,
        "sizeTotal": 0,
        "filesToFetch": 0,
        "sizeToFetch": 0,
        "filesFailed": 1,
        "filesFetched": 0,
        "sizeFetched": 0
      }
    ],
    "errorMessages": [
      "Download failed for https://www.askebsa.dol.gov/FOIA%20Files/2014/Latest/F_SCH_C_PART1_ITEM3_2014_Latest.zip: Connection pool shut down"
    ]
  }
}
[2017/04/15-07:57:04.885] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 1] Start call: /api/datasets/remote-files/get-fetch-status user=admin
[2017/04/15-07:57:04.889] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 5] Done call: /api/datasets/remote-files/get-fetch-status time=5ms user=admin
[2017/04/15-07:57:04.972] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 1] Start call: /api/datasets/test-and-detect-format user=admin [projectKey=E55V2]
[2017/04/15-07:57:04.977] [qtp1545237089-1453] [DEBUG] [com.dataiku.dip.connections.FilesBasedConnectionsDAO] test-RemoteFiles - ConnectionsDAO: create a temporary read transaction
[2017/04/15-07:57:04.978] [qtp1545237089-1453] [WARN] [dku.dataset.inspector] test-RemoteFiles - DatasetInspector: create a temporary read transaction
[2017/04/15-07:57:04.983] [qtp1545237089-1453] [INFO] [dku.datasets] test-RemoteFiles - Got it, closing
[2017/04/15-07:57:04.983] [qtp1545237089-1453] [INFO] [dku.datasets] test-RemoteFiles - Close done
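The log appears to show the second source failing because its download tried to reuse an HTTP connection pool that had already been shut down after the first file finished. One possible workaround, sketched here as an assumption rather than a documented DSS feature, is to do the fetching and stacking in a single Python step. `download`, `stack_zipped_csvs`, `URL_TEMPLATE`, and the `source_file` column are hypothetical names; the URL template is extrapolated from the two URLs in the log, and the header check relies on the files really sharing one format.

```python
import csv
import io
import zipfile
from typing import Iterable, List, Optional
from urllib.request import urlopen

# Assumption: other yearly files follow the same pattern as the two URLs in the log.
URL_TEMPLATE = (
    "https://www.askebsa.dol.gov/FOIA%20Files/{year}/Latest/"
    "F_SCH_C_PART1_ITEM3_{year}_Latest.zip"
)


def download(url: str, timeout: float = 60.0) -> bytes:
    """Hypothetical helper: fetch one file with its own independent request."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def stack_zipped_csvs(zip_payloads: Iterable[bytes], encoding: str = "utf-8") -> List[dict]:
    """Stack the rows of every CSV inside each zip archive into one list of dicts.

    Every CSV must have the same header as the first one; a mismatch raises
    ValueError. Each row gets a hypothetical "source_file" column recording
    which CSV member it came from.
    """
    header: Optional[List[str]] = None
    rows: List[dict] = []
    for payload in zip_payloads:
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            for name in archive.namelist():
                if not name.lower().endswith(".csv"):
                    continue  # skip non-CSV members (e.g. layout or readme files)
                with archive.open(name) as raw:
                    text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                    reader = csv.DictReader(text)
                    if header is None:
                        header = reader.fieldnames
                    elif reader.fieldnames != header:
                        raise ValueError(f"{name}: header differs from the first file")
                    for row in reader:
                        row["source_file"] = name
                        rows.append(row)
    return rows
```

For example, `stack_zipped_csvs(download(URL_TEMPLATE.format(year=y)) for y in (2014, 2015))` would fetch the two files from the log one after the other and return a single stacked list of rows, which could then be written to one output dataset, for instance from a Python recipe. If the real files are not UTF-8, passing a different `encoding` (such as `"latin-1"`) may be needed.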