I'm very new to Dataiku, so maybe this has been answered already (I searched and could not find anything).
I am working with a wide data set (3,500 columns of integers that come from an FTIR spectrum analyzer) and I wanted to use MSSQL to store the intermediate data. However, SQL Server screams at me for trying to create a table wider than 1,024 columns. What is the best practice for dealing with super-wide data like this?
I am using a custom Python script to extract the data from the proprietary file format, so I can imagine a few options (rough sketches of a–c follow below):

a) As part of the extraction script, break the data into sets of 1,000 columns and write them out to separate temporary data stores.
b) Find a pre-processing scheme and store only the valuable data (we only care about the peaks and their magnitudes).
c) Store it as a blob of some kind (like a JSON string) so it fits in a single varchar field. This limits the ability to detect trends in the data directly and requires a decode step later, when the data is passed to a downstream ML process.
d) Something else I'm not seeing as an option right now.
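To make option (a) concrete, something like this is what I have in mind. This is just a sketch, not my actual script: the `wn_*` column names, the chunk size, and the `spectra_part_*` table names are all made up.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: 10 samples x 3500 integer intensity columns.
df = pd.DataFrame(
    np.random.randint(0, 1000, size=(10, 3500)),
    columns=[f"wn_{i}" for i in range(3500)],
)

CHUNK = 1000  # stay safely under SQL Server's 1024-column limit

# Split into frames of at most CHUNK columns each; each chunk would be
# written to its own table (spectra_part_0, spectra_part_1, ...).
parts = [
    df.iloc[:, start:start + CHUNK]
    for start in range(0, df.shape[1], CHUNK)
]

for i, part in enumerate(parts):
    print(f"spectra_part_{i}: {part.shape[1]} columns")
```

(In practice each part would also carry a shared sample-ID column so the chunks can be joined back together later.)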
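For option (b), since we only care about peaks and magnitudes, I'm imagining something like the sketch below using `scipy.signal.find_peaks`. The prominence threshold is a made-up number that would have to be tuned to the instrument's noise floor; the output is a long/narrow set of (position, magnitude) rows instead of 3,500 columns, which would fit any SQL table easily.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical single spectrum: intensity values over 3500 points.
spectrum = np.random.randint(0, 1000, size=3500).astype(float)

# Keep only peaks above a prominence threshold (50 is a placeholder).
peak_idx, props = find_peaks(spectrum, prominence=50)

# One row per peak instead of one column per wavenumber.
peaks = [
    {"position": int(i), "magnitude": float(spectrum[i])}
    for i in peak_idx
]
print(f"{len(peaks)} peaks kept out of {spectrum.size} points")
```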
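And option (c) would be roughly this: serialize the whole row to a JSON string for a single `VARCHAR(MAX)`/`NVARCHAR(MAX)` column, with the decode step happening just before the ML stage. Again just a sketch with dummy data:

```python
import json
import numpy as np

# Hypothetical spectrum row.
spectrum = np.random.randint(0, 1000, size=3500)

# Encode the spectrum as one JSON string for a single varchar column
# (tolist() converts numpy ints to plain Python ints for json).
blob = json.dumps(spectrum.tolist())

# ...later, the decode step as the data is handed to the ML process:
decoded = np.array(json.loads(blob))
assert (decoded == spectrum).all()
```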