Content-MD5 header error after upgrading to Dataiku 14.5.0

alessandrofaavale

Hello,

I use a Dell ECS solution as object storage. After upgrading Dataiku to version 14.5.0, I get the following error in a Python notebook:
[2026/04/24-13:34:55.483] [qtp304148291-53] [DEBUG] [dku.jobs]  - Command /tintercom/datasets/wait-write-session processed in 310ms
[2026/04/24-13:34:55.480] [qtp304148291-54] [ERROR] [com.dataiku.dip.dataflow.streaming.DatasetWritingService]  - Push data error during streaming:Failed to delete S3 path, bucket=b-a-isit path=dataiku/SCANCATALOGER/LAYER0_projects_snapshot/
java.io.IOException: Failed to delete S3 path, bucket=b-a-isit path=dataiku/SCANCATALOGER/LAYER0_projects_snapshot/
  at com.dataiku.dip.datasets.fs.S3FSProvider.tryCode(S3FSProvider.java:253)
  at com.dataiku.dip.datasets.fs.S3FSProvider.deleteDirectory(S3FSProvider.java:734)
  at com.dataiku.dip.datasets.fs.S3FSProvider.deleteRecursive(S3FSProvider.java:742)
  at com.dataiku.dip.datasets.fs.BlobLikeDatasetHandler.clearAllData(BlobLikeDatasetHandler.java:153)
  at com.dataiku.dip.dataflow.streaming.DatasetWriter.getWriteModeAfterDatasetCleanup(DatasetWriter.java:78)
  at com.dataiku.dip.dataflow.streaming.DatasetWriter.<init>(DatasetWriter.java:53)
  at com.dataiku.dip.dataflow.streaming.DatasetWriter.build(DatasetWriter.java:118)
  at com.dataiku.dip.dataflow.streaming.DatasetWriter.build(DatasetWriter.java:110)
  at com.dataiku.dip.dataflow.streaming.DatasetWriteSession.writeAtOnceFromCSVStream(DatasetWriteSession.java:179)
  at com.dataiku.dip.dataflow.streaming.DatasetWritingService.pushDataAtOnce(DatasetWritingService.java:372)
  at com.dataiku.dip.dataflow.AbstractJobKernelSession.pushData(AbstractJobKernelSession.java:210)
  at com.dataiku.dip.dataflow.AbstractJobKernelServlet.baseCommandService(AbstractJobKernelServlet.java:219)
  at com.dataiku.dip.dataflow.kernel.slave.KernelServlet.service(KernelServlet.java:561)
  at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:723)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:751)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1622)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1555)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:822)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:438)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:470)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1102)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
  at com.dataiku.dip.server.JettyUtils$EmptySegmentRewriteHandler.handle(JettyUtils.java:62)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.server.Handler$Sequence.handle(Handler.java:805)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.server.Server.handle(Server.java:182)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:721)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:416)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:492)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.epcRunTask(AdaptiveExecutionStrategy.java:428)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:401)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:255)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:196)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:981)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1211)
  at com.dataiku.dss.shadelib.org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1166)
  at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.model.InvalidRequestException: Missing required header for this request: Content-MD5 (Service: S3, Status Code: 400, Request ID: 0a98b10d:19c573d8aaf:d80e:670, Extended Request ID: c129988c20060dceed3e1fcde8bca4cb777c54cd302d56a853dd7921c689438b) (SDK Attempt Count: 1)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.model.InvalidRequestException$BuilderImpl.build(InvalidRequestException.java:150)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.model.InvalidRequestException$BuilderImpl.build(InvalidRequestException.java:98)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.retryPolicyDisallowedRetryException(RetryableStageHelper.java:168)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:73)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.DefaultS3Client.deleteObjects(DefaultS3Client.java:4233)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.DelegatingS3Client.lambda$deleteObjects$25(DelegatingS3Client.java:3519)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.internal.crossregion.S3CrossRegionSyncClient.invokeOperation(S3CrossRegionSyncClient.java:67)
  at com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.DelegatingS3Client.deleteObjects(DelegatingS3Client.java:3519)
  at com.dataiku.dip.datasets.fs.S3FSProvider.deleteDirectory(S3FSProvider.java:720)

I think this is the relevant error:

com.dataiku.dss.shadelibawssk2.software.amazon.awssdk.services.s3.model.InvalidRequestException: Missing required header for this request: Content-MD5 (Service: S3, Status Code: 400, Request ID: 0a98b10d:19c573d8aaf:d80e:670, Extended Request ID: c129988c20060dceed3e1fcde8bca4cb777c54cd302d56a853dd7921c689438b)
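For context, the trace shows the failure inside `DefaultS3Client.deleteObjects`, which is the S3 multi-object delete call, and the storage side is rejecting the request because it carries no `Content-MD5` header. That header is the base64-encoded MD5 digest of the request body. Below is a minimal Python sketch of how such a value is computed for a delete payload. The helper names and the sample key are hypothetical, for illustration only; this is not how DSS builds its requests and it is not a fix.

```python
import base64
import hashlib
from xml.sax.saxutils import escape


def build_delete_body(keys):
    """Build a multi-object delete XML payload (illustrative helper, not DSS code)."""
    objects = "".join(
        f"<Object><Key>{escape(key)}</Key></Object>" for key in keys
    )
    xml = f'<Delete xmlns="http://s3.amazonaws.com/doc/2006-03-01/">{objects}</Delete>'
    return xml.encode("utf-8")


def content_md5(body: bytes) -> str:
    """Return the Content-MD5 header value: base64 of the raw 16-byte MD5 digest."""
    digest = hashlib.md5(body).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Hypothetical object key, used only to show the shape of the payload.
    body = build_delete_body(["some/prefix/part-0.csv"])
    print("Content-MD5:", content_md5(body))
```

Note that the header uses the raw digest bytes, not the hexadecimal string, before base64 encoding.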

Dataiku version used: 14.5.0


Answers

  • Turribeach

    Dataiku does not natively support Dell ECS. Dell ECS exposes an Amazon S3-compatible API, but it does not implement 100% of the S3 API. You appear to be using a Dataiku S3 connection type to reach your Dell ECS storage layer, so errors like this are to be expected. I would recommend moving away from an unsupported configuration: either develop a custom connector plugin using the native Dell ECS APIs, or use the native Dell ECS Python APIs directly in your notebook.

  • Turribeach

    You are welcome to use that solution, but the fact that it works doesn't mean it's supported. While Dataiku may give configuration advice on how to make S3-compatible storage providers work, they certainly don't support them. So my advice still stands: move to a supported configuration.
