File format conversion
Hey Dataiku users,
I just wanted to know how I can convert a very large binary data file to a human-readable format such as XML or CSV, or anything else that lets me see the decoded data.
Thank you!
Operating system used: Windows
Best Answer
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,125 Neuron
Well, you need to ask whoever is producing these files to tell you what binary format they use. Then look for Python libraries that support reading that format.
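If nobody can tell you the format, looking at the first bytes of the file sometimes helps, because many formats start with a recognizable signature. Here is a minimal sketch using only the Python standard library. The signature table is a small illustrative selection, and the helper names (`guess_format`, `hex_preview`, `read_header`) are made up for this example. A proprietary recorder format may well match none of these.

```python
# Signatures of a few common formats. Illustrative only, not exhaustive.
KNOWN_SIGNATURES = {
    b"\x89HDF\r\n\x1a\n": "HDF5",
    b"PK\x03\x04": "ZIP container",
    b"\x1f\x8b": "gzip",
    b"SQLite format 3\x00": "SQLite database",
    b"MATLAB 5.0 MAT-file": "MATLAB v5 MAT-file",
}


def guess_format(header: bytes) -> str:
    """Return a format name if the header starts with a known signature."""
    for signature, name in KNOWN_SIGNATURES.items():
        if header.startswith(signature):
            return name
    return "unknown"


def hex_preview(header: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex, and printable-ASCII columns."""
    pad = width * 3
    lines = []
    for offset in range(0, len(header), width):
        chunk = header[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)} {text_part}")
    return "\n".join(lines)


def read_header(path: str, size: int = 64) -> bytes:
    """Read only the first `size` bytes, so a very large file is not loaded."""
    with open(path, "rb") as handle:
        return handle.read(size)


# Demonstration on in-memory bytes; for a real file use read_header(path).
sample = b"PK\x03\x04" + bytes(range(20))
print(guess_format(sample))
print(hex_preview(sample))
```

Readable text in the ASCII column (a vendor name, a version string, column labels) is often the best clue about which tool wrote the file.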
Answers
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
@TelmanM, welcome to the Dataiku community. We are so pleased to have you join us.
Regarding ingesting binary files into Dataiku: it really depends on what type of binary file you are trying to ingest and what kind of data it represents (images, text, tabular, audio…). There are a number of built-in formats and connections that Dataiku can import.
https://doc.dataiku.com/dss/latest/connecting/index.html
This list is extended by plugins
https://www.dataiku.com/product/plugins/
And finally, if you can program just a little bit and can find a Python or R library that can read the type of file you are working with, you can import your data through a code recipe.
https://knowledge.dataiku.com/latest/code/getting-started/concept-code-recipes.html
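To give an idea of what the decoding part of such a recipe might look like, here is a rough sketch in plain Python. It assumes the file is a flat sequence of fixed-size records, and the layout used here (a little-endian uint32 counter followed by two float32 channels) and the column names are invented for illustration. How the file is opened inside DSS is left out; the function only needs a binary file object to read from and a text file object to write to.

```python
import csv
import io
import struct

# ASSUMPTION: each record is a little-endian uint32 sample counter followed
# by two float32 channels. Replace this with the real layout of your file.
RECORD = struct.Struct("<Iff")
COLUMNS = ["sample", "channel_a", "channel_b"]  # hypothetical column names


def binary_to_csv(source, target, records_per_chunk=65536):
    """Stream fixed-size records from a binary file object into CSV text.

    Reads the input in chunks so a very large file never has to fit in
    memory. Returns the number of records written. Trailing bytes that do
    not fill a whole record are ignored.
    """
    writer = csv.writer(target)
    writer.writerow(COLUMNS)
    chunk_bytes = RECORD.size * records_per_chunk
    leftover = b""
    count = 0
    while True:
        chunk = source.read(chunk_bytes)
        if not chunk:
            break
        data = leftover + chunk
        usable = len(data) - len(data) % RECORD.size
        if usable:
            for row in RECORD.iter_unpack(data[:usable]):
                writer.writerow(row)
                count += 1
        # Keep any partial record and complete it with the next chunk.
        leftover = data[usable:]
    return count


# Demonstration with two in-memory records instead of a real file.
demo = RECORD.pack(1, 0.5, -2.0) + RECORD.pack(2, 1.5, 4.0)
out = io.StringIO()
binary_to_csv(io.BytesIO(demo), out)
print(out.getvalue())
```

When writing to a real file, open the target with `open(path, "w", newline="")` so the `csv` module controls the line endings.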
My guess is that with one of those three methods you can import almost any kind of binary file.
That said, if you are comfortable sharing a bit more about your use case, such as the type or types of files you are trying to import and the nature of the stored data, there may be someone here in the community with experience in that use case.
Have a great day and welcome to the community.
-
Thank you for your prompt response.
Basically, I do not know the type of binary file, but I know that it comes with a *.dmb extension and, based on my very limited knowledge, it contains very high-frequency data recorded by an electrical board. There is a way to read the file in MATLAB using a *.m file, but I am trying to do it from the Dataiku platform by uploading the file. When I upload it into Dataiku, I get this error: "Missing format type". I assume it is tabular data with multiple features, binary coded.
Many thanks,
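Since a MATLAB *.m file can already read the data, the record layout is probably spelled out in its `fread` calls, assuming it uses `fread` at all. The sketch below maps MATLAB precision names to Python `struct` format characters so the same layout could be rebuilt in a Python recipe. The helper name `record_struct` and the example layout are hypothetical, and the byte order is an assumption that has to be checked against how the script opens the file.

```python
import struct

# MATLAB fread precision names mapped to Python struct format characters.
MATLAB_TO_STRUCT = {
    "int8": "b", "uint8": "B",
    "int16": "h", "uint16": "H",
    "int32": "i", "uint32": "I",
    "int64": "q", "uint64": "Q",
    "single": "f", "float32": "f",
    "double": "d", "float64": "d",
}


def record_struct(precisions, byte_order="<"):
    """Build a struct.Struct for one record from MATLAB precision names.

    byte_order is "<" for little-endian or ">" for big-endian. Which one
    applies depends on the machine format used by the .m script.
    """
    codes = "".join(MATLAB_TO_STRUCT[name] for name in precisions)
    return struct.Struct(byte_order + codes)


# HYPOTHETICAL layout: one uint32 counter and two single-precision channels.
layout = record_struct(["uint32", "single", "single"])
print(layout.size)  # bytes per record under this assumed layout
```

If the file size is not a multiple of the record size (after any header), the assumed layout is probably wrong.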
-
Turribeach
Are you sure it's *.dmb and not *.mdb or *.dmp?
-
The spelling is right.
-
tgb417
After a quick look on the internet, I see that these files may be some kind of game file.
https://file.org/extension/dmb