Avoid data trim

z6epo
z6epo Registered Posts: 2 ✭✭✭

Hello,

When importing a CSV file, the data are trimmed. Raw data mustn't be truncated or trimmed during the import process.

I've attached a simple example of a CSV file used for testing and illustrating the problem.

Does anyone have a solution to avoid this?

Operating system used: Windows

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,467 Neuron
    edited April 19

    First of all, that is not a CSV file, it's a TSV file (tab-separated values):

    image.png

    The file extension does not determine the file format, the file contents do. It's best practice to remove leading and trailing spaces, but this can be changed if needed. In the uploaded files dataset, set the Quoting style to "No escaping nor quoting":

    image.png

    This will force Dataiku to quote around your TSV fields and that will preserve the leading and trailing spaces. Then use a Prepare recipe to remove the quoting:

    image.png

    I added a length column so you can see it's working properly.

  • z6epo
    z6epo Registered Posts: 2 ✭✭✭
    image.png

    Thanks for the reply, but it doesn't work as you described: I don't see any quotes added when changing to "No escaping nor quoting" :(

    I found a solution by adding a Python recipe and a dataset:

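    import os

    import dataiku
    import pandas as pd

    # Managed folder holding the raw file.
    # Note: get_path() only works for folders stored on the local filesystem.
    input_folder = dataiku.Folder("JRcybThx")
    input_folder_path = input_folder.get_path()

    file_name = "test_file.csv"
    file_path = os.path.join(input_folder_path, file_name)

    # dtype=str reads every column as text, so no value is converted or stripped.
    # keep_default_na=False keeps empty fields as "" instead of NaN.
    # skip_blank_lines=False keeps blank lines as rows.
    test_trim = pd.read_csv(file_path, sep='|', dtype=str, engine='python', skip_blank_lines=False, keep_default_na=False)

    # Write recipe outputs
    test_trim_dataset = dataiku.Dataset("test_trim")
    test_trim_dataset.write_with_schema(test_trim)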
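
    To check outside DSS that these read_csv options really keep the padding, here is a minimal sketch. The sample data and the column names (id, label, label_length) are made up for illustration and are not taken from the attached file. The length column plays the same role as the one in the Prepare recipe screenshot above.

```python
import io

import pandas as pd

# Hypothetical sample: pipe-separated, with leading and trailing spaces
# in one field and an empty field on the last row.
raw = "id|label\n1|  padded  \n2|plain\n3|\n"

# Same idea as the recipe: read everything as text and keep empty strings.
df = pd.read_csv(io.StringIO(raw), sep="|", dtype=str, keep_default_na=False)

# A length column makes preserved whitespace visible.
df["label_length"] = df["label"].str.len()

print(df)
```

    If the padded value still reports its full length, the spaces survived the read; any trimming seen later would then come from a downstream step rather than from pandas.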