I've been running into an issue where, after creating a dataset stored as Parquet with a PySpark recipe, the dataset is redetected as CSV, with a very different schema.
Here's the dataset before pressing "Redetect format":
[Image: original dataset]
After pressing "Redetect format", it goes from 18 to 75 columns:
[Image: dataset after using "Redetect format"]
And the new columns make no sense:
[Image: new columns that shouldn't exist]
And to confirm, here are the generated Parquet files:
[Image: generated Parquet files]
I've deleted and recreated the dataset multiple times, but I always get the same result.
I've also checked the PySpark recipe, and it generates the 18 expected columns, not 75.
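One way to double-check outside the UI that every file in the dataset's folder really is Parquet is to look at the magic bytes: per the Parquet format, a valid file starts and ends with the 4-byte marker `PAR1`. The sketch below only does that byte check with the standard library. The folder path and the helper names (`is_parquet_file`, `check_folder`) are hypothetical, and this is not how Dataiku's format detection works internally.

```python
import os
from pathlib import Path

# Every Parquet file begins and ends with these 4 bytes.
PARQUET_MAGIC = b"PAR1"


def is_parquet_file(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    size = os.path.getsize(path)
    # Lower bound: header magic + 4-byte footer length + footer magic.
    if size < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last 4 bytes
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def check_folder(folder):
    """Map every file under `folder` (recursively) to whether it looks like Parquet."""
    results = {}
    for p in sorted(Path(folder).rglob("*")):
        if p.is_file():
            results[str(p)] = is_parquet_file(p)
    return results


if __name__ == "__main__":
    # Hypothetical path: replace with the dataset's actual storage folder.
    for name, ok in check_folder("/path/to/dataset/folder").items():
        print(("parquet " if ok else "OTHER   ") + name)
```

Spark usually writes marker files such as `_SUCCESS` next to the `part-*` files; those will be reported as not Parquet, which is expected. Any `part-*` file reported as not Parquet would be worth a closer look.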
Any help would be appreciated, as I'm at a loss as to what could be causing this issue.
Operating system used: Windows