Dataiku + Spark on Blob Datasets

yashpuranik (Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Neuron 2023; Posts: 69)

Hi Folks,

I have a question about this page: https://doc.dataiku.com/dss/latest/spark/datasets.html#other. It lists HDFS and S3 as better suited for Spark computation, but does not mention Azure Blob Storage. Is this a case of incomplete documentation? Or is Dataiku still working on implementing support for Spark + Azure Blob Storage?

Yash

Best Answer

  • Alexandru (Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered; Posts: 1,212)
    Answer ✓

    Hi,
    The documentation is out of date. Spark in Dataiku does work with Azure Blob Storage.

    You will need to set the HDFS interface in the connection settings of the Azure Blob connection:

    [Screenshot: HDFS interface setting in the Azure Blob connection settings]

    Thanks,
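
For context, and as an assumption not stated in the thread: the HDFS interface setting presumably selects which Hadoop filesystem driver Spark uses to reach the storage container. Hadoop's Azure drivers address storage with URIs of the form `wasb[s]://<container>@<account>.blob.core.windows.net/<path>` or `abfs[s]://<container>@<account>.dfs.core.windows.net/<path>`. The minimal sketch below only builds such URIs to show their shape. The helper name `azure_hadoop_uri` and the container and account names are hypothetical and are not part of any Dataiku API.

```python
# Hypothetical helper (not a Dataiku API): builds the Hadoop-style URI
# that an Azure filesystem driver would use to address a path.

# Host suffix for each URI scheme understood by Hadoop's Azure drivers.
_AZURE_HOSTS = {
    "wasb": "blob.core.windows.net",
    "wasbs": "blob.core.windows.net",
    "abfs": "dfs.core.windows.net",
    "abfss": "dfs.core.windows.net",
}


def azure_hadoop_uri(container, account, path="", scheme="abfss"):
    """Return '<scheme>://<container>@<account>.<host>/<path>'."""
    if scheme not in _AZURE_HOSTS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    host = _AZURE_HOSTS[scheme]
    # Strip leading slashes so the result has exactly one separator.
    return f"{scheme}://{container}@{account}.{host}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Placeholder container and account names, for illustration only.
    print(azure_hadoop_uri("data", "myaccount", "/sales/2023"))
    print(azure_hadoop_uri("data", "myaccount", "sales/2023", scheme="wasbs"))
```

In practice the connection settings handle this addressing for you; the sketch is only meant to illustrate what choosing an interface changes underneath.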
