Parquet format table redetected as CSV

MarcioCoelho Dataiku DSS Core Designer, Registered Posts: 12 ✭✭✭✭

Hey,

I've been running into an issue where, after creating a dataset stored in Parquet with a PySpark recipe, the dataset is redetected as CSV, with a very different schema.

Here's the dataset before pressing redetect format:

Original dataset

And after pressing redetect format, it goes from 18 to 75 columns:

After using redetect format.

And the new columns make no sense:

New columns that shouldn't exist.

And to confirm the generated parquet files:

Parquet files.
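
For anyone wanting to double-check the files outside DSS: Parquet files start and end with the 4-byte marker `PAR1`, so a quick script can confirm that every file in the output folder really is Parquet. This is only a minimal sketch, assuming the files can be read from a local path; `is_parquet_file` and `non_parquet_files` are hypothetical helper names, not part of DSS or PySpark.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # unencrypted Parquet files start and end with these 4 bytes


def is_parquet_file(path):
    """Return True if the file begins and ends with the Parquet magic bytes."""
    path = Path(path)
    # Header magic (4) + footer length (4) + footer magic (4) gives a 12-byte lower bound.
    if not path.is_file() or path.stat().st_size < 12:
        return False
    with path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # whence=2 seeks relative to the end of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def non_parquet_files(folder):
    """List the names of files directly inside `folder` that fail the check."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and not is_parquet_file(p)
    )
```

Calling `non_parquet_files` on the dataset's folder returns the names of any files that do not look like Parquet. An empty list would suggest the files themselves are fine and the problem is in the format detection.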

I've deleted and recreated the dataset multiple times, but I always get the same result.

I've also checked the PySpark recipe, and it generates the expected 18 columns, not 75.

Any help would be appreciated, as I'm at a loss as to what could be causing this issue.

Best regards,

Márcio Coelho


Operating system used: Windows
