How to manage .xlsb files

GeorgeAlex

I need to read 7 Excel files as part of the source data.

I am selecting a folder and need to pick up all the Excel files in it.

Six of the files are .xlsx files, but one is an .xlsb file.

I am getting formatting errors. How should I proceed?


Operating system used: Windows


Answers

  • tgb417

    @GeorgeAlex

    Do you need to read content from the .xlsb file or can you ignore it?
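    If you do need the .xlsb content, one option is to read it with pandas, for example in a Python recipe. `pandas.read_excel` accepts `engine="pyxlsb"` for binary workbooks, provided the separate pyxlsb package is installed. The sketch below is an assumption about how you might wire that up, not something confirmed in this thread; `engine_for` and `read_workbook` are hypothetical helper names.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a pandas.read_excel engine.
# "pyxlsb" is a separate package that must be installed for .xlsb files.
ENGINES = {".xlsx": "openpyxl", ".xlsb": "pyxlsb"}


def engine_for(path):
    """Pick a pandas.read_excel engine based on the file extension."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINES[suffix]
    except KeyError:
        raise ValueError(f"unsupported workbook type: {suffix!r}") from None


def read_workbook(path):
    """Read every worksheet of a workbook into a dict of DataFrames."""
    import pandas as pd  # imported lazily so engine_for works without pandas

    # sheet_name=None asks pandas for all sheets, keyed by sheet name.
    return pd.read_excel(path, sheet_name=None, engine=engine_for(path))
```

    Choosing the engine per file lets the .xlsx and .xlsb workbooks from the same folder go through one code path.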

    If you can ignore the .xlsb file, you might look at creating a dataset from a folder with a glob-based inclusion rule for *.xlsx. This should treat all six .xlsx files as one dataset and leave out the .xlsb file.
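    DSS evaluates the inclusion rule itself; purely as an illustration of glob semantics, Python's standard `fnmatch` module shows which names a `*.xlsx` pattern keeps. This assumes the pattern is matched against bare file names, and the folder contents and the `select_files` helper are hypothetical.

```python
from fnmatch import fnmatch


def select_files(filenames, include_pattern="*.xlsx"):
    """Return the file names that match a glob-style inclusion rule."""
    return [name for name in filenames if fnmatch(name, include_pattern)]


# Made-up folder contents: six .xlsx workbooks plus one .xlsb workbook.
folder_contents = [
    "jan.xlsx", "feb.xlsx", "mar.xlsx",
    "apr.xlsx", "may.xlsx", "jun.xlsx",
    "summary.xlsb",
]

print(select_files(folder_contents))  # prints the six .xlsx names; summary.xlsb is excluded
```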

    The documentation seems a bit thin on this point, but I use this approach often. One caution: with this approach, the worksheet names must be exactly the same in all of the .xlsx workbooks. If the worksheet names differ from workbook to workbook, you will run into problems.
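    Before building the folder dataset, it may help to check that the worksheet names really do match. A minimal sketch, assuming you already have each workbook's sheet names in a dict (with pandas, `pandas.ExcelFile(path).sheet_names` returns them); `find_sheet_mismatches` and the sample data are hypothetical.

```python
def find_sheet_mismatches(sheets_by_file):
    """Report workbooks whose worksheet names differ from the first workbook's.

    sheets_by_file maps a workbook name to its list of worksheet names.
    Returns {workbook: sorted names that are missing or extra}.
    """
    if not sheets_by_file:
        return {}
    # Use the first workbook (dicts keep insertion order) as the reference.
    expected = set(next(iter(sheets_by_file.values())))
    return {
        name: sorted(set(sheets) ^ expected)  # symmetric difference
        for name, sheets in sheets_by_file.items()
        if set(sheets) != expected
    }


sheets = {"jan.xlsx": ["Data"], "feb.xlsx": ["Data"], "mar.xlsx": ["Sheet1"]}
print(find_sheet_mismatches(sheets))  # {'mar.xlsx': ['Data', 'Sheet1']}
```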

    You might find this thread of some interest.
    https://community.dataiku.com/t5/Using-Dataiku/Data-Refresh-out-of-a-Managed-Folder/m-p/25259

    If that does not work for your use case, you might find this Dataiku plugin helpful. I have not used it myself.

    https://www.dataiku.com/product/plugins/excel-sheet-importer/
