
Parquet format table redetected as CSV

MarcioCoelho

Hey,

I've been running into an issue where, after creating a dataset stored in Parquet with a PySpark recipe, the dataset is redetected as CSV with a very different schema.

Here's the dataset before pressing "redetect format":

[Screenshot: Original dataset]

And after pressing "redetect format", it goes from 18 to 75 columns:

[Screenshot: After using redetect format]

And the new columns make no sense:

[Screenshot: New columns that shouldn't exist]

And to confirm, here are the generated Parquet files:

[Screenshot: Parquet files]
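
For anyone wanting to double-check the files outside of DSS, here is a minimal sketch, assuming the output folder can be read from a plain Python session. It relies only on the fact that a Parquet file starts and ends with the 4-byte marker `PAR1`. The helper names `looks_like_parquet` and `non_parquet_files` are hypothetical, not part of Dataiku or Spark, and I can't say whether extra files in the folder are what drives the CSV detection; this only helps rule it in or out.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # standard (unencrypted) Parquet files begin and end with these 4 bytes
MIN_PARQUET_SIZE = 12    # leading magic + 4-byte footer length + trailing magic


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    path = Path(path)
    if path.stat().st_size < MIN_PARQUET_SIZE:
        return False
    with path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # jump to 4 bytes before the end of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def non_parquet_files(folder):
    """List names of files in `folder` that do not look like Parquet.

    Spark output folders often also contain a `_SUCCESS` marker or `.crc`
    checksum files; those would show up here.
    """
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and not looks_like_parquet(p)
    )
```

Calling `non_parquet_files(...)` on the dataset's output folder (the path is whatever the dataset settings point to) should return an empty list if every file there is really Parquet.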

 

I've deleted and recreated the dataset multiple times, but I always get the same result.

I've also checked the PySpark recipe, and it generates the expected 18 columns, not 75.

Any help would be appreciated, as I'm at a loss as to what could be causing this issue.

 

Best regards,

Márcio Coelho


Operating system used: Windows
