Something very strange is happening with CSV export.
A CSV file of about 2 million rows and 140 columns was imported into DSS (file size ca. 1.7 GB).
This dataset was processed and slimmed down to 2M rows and 11 columns (only 2 columns were added, with float values truncated to 2 decimal places). The new exported file size is ca. 5.6 GB.
How can this be?
I need some Dataiku guru to help with this. Then again, perhaps this doesn't need a guru. Lol.
There could be several reasons for what you are observing. First, are you positive that the original CSV file is uncompressed? Second, have you inspected the output CSV with an external tool (Excel or a text editor)? Are the contents what you expect? Finally, what is the schema of the output dataset, and how does it differ from that of the original dataset?
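To illustrate the first point: if the original file was gzip-compressed, a 1.7 GB input can easily expand to several times that size when re-exported uncompressed, even with fewer columns. A minimal sketch with synthetic data (not your dataset) showing how much gzip typically shrinks numeric CSV rows:

```python
import csv
import gzip
import io
import random

# Generate synthetic rows resembling a numeric CSV: 11 columns of
# floats truncated to 2 decimal places (assumption for illustration).
random.seed(0)
rows = [[round(random.random() * 100, 2) for _ in range(11)]
        for _ in range(5000)]

# Serialize once, then compare the plain size to the gzip-compressed size.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
plain = buf.getvalue().encode("utf-8")
compressed = gzip.compress(plain)

print(f"plain: {len(plain)} bytes, gzipped: {len(compressed)} bytes")
```

Repetitive numeric text like this usually compresses well, so an uncompressed re-export being several times larger than a compressed input would not be surprising.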
If you can, it would be helpful to share a job diagnosis with us, so that we can see in more detail exactly what DSS is doing when writing this data.
Juan Eiros Zamora
Technical Support Engineer, Dataiku