Schema Inconsistency

Rushil09 Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered, Frontrunner 2022 Participant Posts: 17 Partner

We have multiple CSV files being read from Amazon S3, and we found that their schemas are inconsistent. How can we handle this in Dataiku?

As far as I know, it will only read the files whose schema matches that of the first file in the list.


Operating system used: Ubuntu


Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker

    Hi @Rushil09,

    In this case, it seems it would be better to upload the files with different schemas to a Folder.

    Then use the "Files in folder" dataset (+Dataset > Internal > Files in folder) to create separate datasets from the files, and stack them with a Stack recipe as needed.

    With a "Files in folder" dataset you can also specify which file the schema is read from.
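To make the idea concrete outside of DSS, here is a minimal plain-Python sketch of the two steps: grouping files by their header row (to see which files share a schema) and stacking them on the union of their columns. This is only an illustration under stated assumptions, not how the Stack recipe is implemented; the function names (`read_header`, `group_by_schema`, `stack_union`) and the sample files are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_header(text):
    """Return the header row of a CSV document as a tuple of column names."""
    reader = csv.reader(io.StringIO(text))
    return tuple(next(reader, []))


def group_by_schema(files):
    """Map each distinct header to the list of file names that share it."""
    groups = defaultdict(list)
    for name, text in files.items():
        groups[read_header(text)].append(name)
    return dict(groups)


def stack_union(files):
    """Stack all files into one list of rows using the union of their columns.

    Columns that a file does not have are filled with an empty string.
    """
    columns = []
    for text in files.values():
        for col in read_header(text):
            if col not in columns:
                columns.append(col)
    rows = []
    for text in files.values():
        for record in csv.DictReader(io.StringIO(text)):
            rows.append({col: record.get(col, "") for col in columns})
    return columns, rows


# Hypothetical sample data standing in for the CSV files in the folder.
sample_files = {
    "a.csv": "id,name\n1,alice\n",
    "b.csv": "id,name\n2,bob\n",
    "c.csv": "id,name,city\n3,carol,Paris\n",
}

groups = group_by_schema(sample_files)
columns, rows = stack_union(sample_files)
```

Here `groups` shows which files could share one dataset, and `rows` is the stacked result. In a real project the file contents would come from the folder rather than from an in-memory dictionary.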

    Hope that helps!
