Content-MD5 header error after upgrading to Dataiku 14.5.0
Hello,
I use a Dell ECS solution as object storage. After upgrading Dataiku to version 14.5.0, I get the following error in a Python notebook:
[2026/04/24-13:34:55.483] [qtp304148291-53] [DEBUG] [dku.jobs] - Command /tintercom/datasets/wait-write-session processed in 310ms
[2026/04/24-13:34:55.480] [qtp304148291-54] [ERROR] [com.dataiku.dip.dataflow.streaming.DatasetWritingService] - Push data error during streaming:Failed to delete S3 path, bucket=b-a-isit path=dataiku/SCANCATALOGER/LAYER0_projects_snapshot/
java.io.IOException: Failed to delete S3 path, bucket=b-a-isit path=dataiku/SCANCATALOGER/LAYER0_projects_snapshot/
at com.dataiku.dip.datasets.fs.S3FSProvider.tryCode(S3FSProvider.java:253)
at com.dataiku.dip.datasets.fs.S3FSProvider.deleteDirectory(S3FSProvider.java:734)
at com.dataiku.dip.datasets.fs.S3FSProvider.deleteRecursive(S3FSProvider.java:742)
at com.dataiku.dip.datasets.fs.BlobLikeDatasetHandler.clearAllData(BlobLikeDatasetHandler.java:153)
at com.dataiku.dip.dataflow.streaming.DatasetWriter.getWriteModeAfterDatasetCleanup(DatasetWriter.java:78)
at com.dataiku.dip.dataflow.streaming.DatasetWriter.<init>(DatasetWriter.java:53)
at com.dataiku.dip.dataflow.streaming.DatasetWriter.build(DatasetWriter.java:118)
at com.dataiku.dip.dataflow.streaming.DatasetWriter.build(DatasetWriter.java:110)
at com.dataiku.dip.dataflow.streaming.DatasetWriteSession.writeAtOnceFromCSVStream(DatasetWriteSession.java:179)
at com.dataiku.dip.dataflow.streaming.DatasetWritingService.pushDataAtOnce(DatasetWritingService.java:372)
at com.dataiku.dip.dataflow.AbstractJobKernelSession.pushData(AbstractJobKernelSession.java:210)
at com.dataiku.dip.dataflow.AbstractJobKernelServlet.baseCommandService(AbstractJobKernelServlet.java:219)
at com.dataiku.dip.dataflow.kernel.slave.KernelServlet.service(KernelServlet.java:561)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:723)
at com.dataiku.dss.shadelib.org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:751)
at com.dataiku.dss.shadelib.org.eclipse.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1622)
at com.dataiku.dss.shadelib.org.eclipse.jetty.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1555)
at com.dataiku.dss.shadelib.org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:822)
at com.dataiku.dss.shadelib.org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:438)
at com.dataiku.dss.shadelib.org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:470)
at com.dataiku.dss.shadelib.org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1102)
at com.dataiku.dss.shadelib.org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
at com.dataiku.dip.server.JettyUtils$EmptySegmentRewriteHandler.handle(JettyUtils.java:62)
at com.dataiku.dss.shadelib.org.eclipse.jetty.server.Handler$Sequence.handle(Handler.java:805)
at com.dataiku.dss.shadelib.org.eclipse.jetty.server.Server.handle(Server.java:182)
at com.dataiku.dss.shadelib.org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:721)
at com.dataiku.dss.shadelib.org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:416)
at com.dataiku.dss.shadelib.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
at com.dataiku.dss.shadelib.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at com.dataiku.dss.shadelib.org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:492)
at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.epcRunTask(AdaptiveExecutionStrategy.java:428)
at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:401)
at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:255)
at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:196)
at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:981)
at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1211)
at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1166)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.model.InvalidRequestException: Missing required header for this request: Content-MD5 (Service: S3, Status Code: 400, Request ID: 0a98b10d:19c573d8aaf:d80e:670, Extended Request ID: c129988c20060dceed3e1fcde8bca4cb777c54cd302d56a853dd7921c689438b) (SDK Attempt Count: 1)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.model.InvalidRequestException$BuilderImpl.build(InvalidRequestException.java:150)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.model.InvalidRequestException$BuilderImpl.build(InvalidRequestException.java:98)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.retryPolicyDisallowedRetryException(RetryableStageHelper.java:168)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:73)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.DefaultS3Client.deleteObjects(DefaultS3Client.java:4233)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.DelegatingS3Client.lambda$deleteObjects$25(DelegatingS3Client.java:3519)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.internal.crossregion.S3CrossRegionSyncClient.invokeOperation(S3CrossRegionSyncClient.java:67)
at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.DelegatingS3Client.deleteObjects(DelegatingS3Client.java:3519)
at com.dataiku.dip.datasets.fs.S3FSProvider.deleteDirectory(S3FSProvider.java:720)
I think that the error is this one:
com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.model.InvalidRequestException: Missing required header for this request: Content-MD5 (Service: S3, Status Code: 400, Request ID: 0a98b10d:19c573d8aaf:d80e:670, Extended Request ID: c129988c20060dceed3e1fcde8bca4cb777c54cd302d56a853dd7921c689438b)
Dataiku version used: 14.5.0
Best Answer
-
Thank you, but the solution is to add Advanced connection properties to the S3 connection, as described in https://doc.dataiku.com/dss/latest/connecting/s3.html#custom-object-storage
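For context on what the ECS endpoint is rejecting: the stack trace shows the failing call is `deleteObjects` (a multi-object delete), and the server answers that the `Content-MD5` header is missing. Per RFC 1864, that header carries the base64-encoded MD5 digest of the request body. Below is a minimal sketch of how such a value is computed, using only the Python standard library. The `content_md5` helper, the XML payload and the object key are made up for illustration; this is not what DSS or the AWS SDK does internally, and it is not a substitute for the connection properties in the linked documentation.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return a Content-MD5 header value: base64 of the raw 16-byte MD5 digest."""
    digest = hashlib.md5(body).digest()  # raw bytes, not the hex string
    return base64.b64encode(digest).decode("ascii")


# Hypothetical multi-object delete payload; the key below is illustrative only.
payload = (
    b'<Delete xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    b"<Object><Key>example/part-0.csv</Key></Object>"
    b"</Delete>"
)

header_value = content_md5(payload)
print("Content-MD5:", header_value)
```

Note that the digest is base64-encoded from the raw bytes, not from the hexadecimal string, which is a common mistake when building this header by hand.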
Answers
-
Turribeach
Dataiku does not natively support Dell ECS. Dell ECS exposes an Amazon S3-compatible API, but that API does not implement 100% of the S3 API. You appear to be using a Dataiku S3 connection type on your Dell ECS storage layer, so errors like this are to be expected. I would recommend that you move away from an unsupported configuration: either develop a custom connector plugin using the native Dell ECS APIs, or use the native Dell ECS Python APIs directly in your notebook.
-
Turribeach
You are welcome to use that solution, but the fact that it works doesn't mean it's supported. While Dataiku may give configuration advice on how to make S3-compatible storage providers work, they certainly don't support them. So my advice still stands: move to a supported configuration.