Download Failed: Connection Pool Shutdown

UserBird
UserBird Dataiker, Alpha Tester Posts: 535 Dataiker

I am trying to download multiple CSV files into a single data source. The data is in a consistent format and can be stacked; I am already doing that in another pipeline, but with an individual data source for each file.

Any advice on how I can pull multiple similar files into one stacked dataset? My flow view is very busy, and I was hoping I could use the "add another source" button, but I cannot find out the constraints on how to use it.

Once it processes the first file I get the error:


[2017/04/15-07:57:00.679] [Thread-709] [INFO] [dku.remotefiles] - Writing in /home/dataiku/dss/managed_datasets/E55V2.EFAST_SCH_C_P1_I3
[2017/04/15-07:57:00.679] [Thread-709] [INFO] [dku.remotefiles] - outputPartition = NP substituted URL https://www.askebsa.dol.gov/FOIA Files/2015/Latest/F_SCH_C_PART1_ITEM3_2015_Latest.zip
[2017/04/15-07:57:00.679] [Thread-709] [WARN] [com.dataiku.dip.ApplicationConfigurator] - GeneralSettings: create a temporary read transaction
[2017/04/15-07:57:00.738] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 2] Start call: /api/datasets/remote-files/get-fetch-status user=admin
[2017/04/15-07:57:00.742] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 6] Done call: /api/datasets/remote-files/get-fetch-status time=6ms user=admin
[2017/04/15-07:57:01.754] [Thread-709] [INFO] [dku.remotefiles] - Copied = 7890
[2017/04/15-07:57:02.811] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 1] Start call: /api/datasets/remote-files/get-fetch-status user=admin
[2017/04/15-07:57:02.815] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 5] Done call: /api/datasets/remote-files/get-fetch-status time=5ms user=admin
[2017/04/15-07:57:04.341] [Thread-709] [INFO] [dku.remotefiles] - outputPartition = NP substituted URL https://www.askebsa.dol.gov/FOIA Files/2014/Latest/F_SCH_C_PART1_ITEM3_2014_Latest.zip
[2017/04/15-07:57:04.341] [Thread-709] [WARN] [com.dataiku.dip.ApplicationConfigurator] - GeneralSettings: create a temporary read transaction
[2017/04/15-07:57:04.342] [Thread-709] [ERROR] [dku.remotefiles] - Download failed
java.lang.IllegalStateException: Connection pool shut down
at org.apache.http.util.Asserts.check(Asserts.java:34)
at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at com.dataiku.dip.input.remote.RemoteFilesSynchronizer.fetchHTTP(RemoteFilesSynchronizer.java:354)
at com.dataiku.dip.input.remote.RemoteFilesSynchronizer.runFetch(RemoteFilesSynchronizer.java:267)
at com.dataiku.dip.server.datasets.RemoteFilesDatasetTestService$FetchThread.run(RemoteFilesDatasetTestService.java:279)
[2017/04/15-07:57:04.342] [Thread-709] [ERROR] [dku.datasets] - Fetch failed
java.io.IOException: Download failed for https://www.askebsa.dol.gov/FOIA Files/2014/Latest/F_SCH_C_PART1_ITEM3_2014_Latest.zip
at com.dataiku.dip.input.remote.RemoteFilesSynchronizer.fetchHTTP(RemoteFilesSynchronizer.java:408)
at com.dataiku.dip.input.remote.RemoteFilesSynchronizer.runFetch(RemoteFilesSynchronizer.java:267)
at com.dataiku.dip.server.datasets.RemoteFilesDatasetTestService$FetchThread.run(RemoteFilesDatasetTestService.java:279)
Caused by: java.lang.IllegalStateException: Connection pool shut down
at org.apache.http.util.Asserts.check(Asserts.java:34)
at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at com.dataiku.dip.input.remote.RemoteFilesSynchronizer.fetchHTTP(RemoteFilesSynchronizer.java:354)
... 2 more
[2017/04/15-07:57:04.342] [Thread-709] [INFO] [dku.datasets] - Fetch finished, final status:
{
"running": false,
"error": true,
"filesTotal": 1,
"sizeTotal": 7655819,
"filesToFetch": 1,
"sizeToFetch": 7655819,
"filesFailed": 1,
"filesFetched": 1,
"sizeFetched": 7655819,
"filesDeleted": 0,
"perSource": [
{
"error": false,
"filesTotal": 1,
"sizeTotal": 7655819,
"filesToFetch": 1,
"sizeToFetch": 7655819,
"filesFailed": 0,
"filesFetched": 1,
"sizeFetched": 7655819
},
{
"error": true,
"filesTotal": 0,
"sizeTotal": 0,
"filesToFetch": 0,
"sizeToFetch": 0,
"filesFailed": 1,
"filesFetched": 0,
"sizeFetched": 0
}
],
"errorMessages": [
"Download failed for https://www.askebsa.dol.gov/FOIA Files/2014/Latest/F_SCH_C_PART1_ITEM3_2014_Latest.zip: Connection pool shut down"
],
"master": {
"running": true,
"error": true,
"filesTotal": 1,
"sizeTotal": 7655819,
"filesToFetch": 1,
"sizeToFetch": 7655819,
"filesFailed": 1,
"filesFetched": 1,
"sizeFetched": 7655819,
"filesDeleted": 0,
"perSource": [
{
"error": false,
"filesTotal": 1,
"sizeTotal": 7655819,
"filesToFetch": 1,
"sizeToFetch": 7655819,
"filesFailed": 0,
"filesFetched": 1,
"sizeFetched": 7655819
},
{
"error": true,
"filesTotal": 0,
"sizeTotal": 0,
"filesToFetch": 0,
"sizeToFetch": 0,
"filesFailed": 1,
"filesFetched": 0,
"sizeFetched": 0
}
],
"errorMessages": [
"Download failed for https://www.askebsa.dol.gov/FOIA Files/2014/Latest/F_SCH_C_PART1_ITEM3_2014_Latest.zip: Connection pool shut down"
]
}
}
[2017/04/15-07:57:04.885] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 1] Start call: /api/datasets/remote-files/get-fetch-status user=admin
[2017/04/15-07:57:04.889] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 5] Done call: /api/datasets/remote-files/get-fetch-status time=5ms user=admin
[2017/04/15-07:57:04.972] [qtp1545237089-1453] [DEBUG] [dku.tracing] - [ct: 1] Start call: /api/datasets/test-and-detect-format user=admin [projectKey=E55V2]
[2017/04/15-07:57:04.977] [qtp1545237089-1453] [DEBUG] [com.dataiku.dip.connections.FilesBasedConnectionsDAO] test-RemoteFiles - ConnectionsDAO: create a temporary read transaction
[2017/04/15-07:57:04.978] [qtp1545237089-1453] [WARN] [dku.dataset.inspector] test-RemoteFiles - DatasetInspector: create a temporary read transaction
[2017/04/15-07:57:04.983] [qtp1545237089-1453] [INFO] [dku.datasets] test-RemoteFiles - Got it, closing
[2017/04/15-07:57:04.983] [qtp1545237089-1453] [INFO] [dku.datasets] test-RemoteFiles - Close done

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    Hi,

    This is a known issue in the current HTTP dataset with multiple sources. At the moment, the only workaround is to make several HTTP datasets and use a stack recipe.

    This will be fixed in version 4.1 of DSS (end of summer).
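
If it helps while waiting for the fix, the stacking itself can also be sketched outside DSS in plain Python. This is only an illustration under assumptions, not the DSS stack recipe or any Dataiku API: the function names `fetch` and `stack_zipped_csvs` are hypothetical, and the sketch assumes each zip contains UTF-8 CSV files that share one header row. Note that the URLs in the log contain a space ("FOIA Files"), so the sketch percent-encodes the URL before requesting it.

```python
import csv
import io
import zipfile
from urllib.parse import quote
from urllib.request import urlopen


def fetch(url):
    """Download one file and return its raw bytes (hypothetical helper).

    quote() turns the space in "FOIA Files" into %20. With safe=":/" it
    would also encode "?" and "&", so this is only suitable for URLs
    without a query string.
    """
    with urlopen(quote(url, safe=":/")) as resp:
        return resp.read()


def stack_zipped_csvs(archives):
    """Stack the CSV members of several zip archives into one table.

    `archives` is an iterable of bytes objects, each the content of one
    downloaded .zip file. Returns (header, rows). Raises ValueError if a
    CSV header differs from the first one seen (assumed requirement for
    stacking).
    """
    header = None
    rows = []
    for blob in archives:
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            for name in zf.namelist():
                if not name.lower().endswith(".csv"):
                    continue  # ignore non-CSV members
                with zf.open(name) as raw:
                    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                    reader = csv.reader(text)
                    file_header = next(reader, None)
                    if file_header is None:
                        continue  # empty file, nothing to stack
                    if header is None:
                        header = file_header
                    elif file_header != header:
                        raise ValueError(
                            f"{name}: header does not match the first file"
                        )
                    rows.extend(reader)
    return header, rows
```

To use it, one would call `fetch` once per yearly URL and pass the resulting list of bytes objects to `stack_zipped_csvs`. Each download opens its own connection, so nothing is shared between sources.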