Avoid data trim

Hello,
When importing a CSV file, the data are trimmed. Raw data must not be truncated or trimmed during the import process.
I've attached a simple example CSV file for testing and illustration.
Does anyone have a solution to avoid this?
Operating system used: Windows
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,467 Neuron
First of all, that is not a CSV file; it's a TSV (tab-separated values) file.
The file extension does not determine the file format; the file contents do. Removing leading and trailing spaces is generally best practice, but this behaviour can be changed if needed. In the uploaded-files dataset settings, set the Quoting style to "No escaping nor quoting".
This will force Dataiku to quote around your TSV fields, which preserves the leading and trailing spaces. Then use a Prepare recipe to remove the quoting.
I added a length column so you can see it's working properly.
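To illustrate the point about file contents versus file extension, here is a minimal sketch using Python's standard `csv.Sniffer` to guess the separator from a sample of the text. The helper name `detect_delimiter`, the sample data, and the candidate separator list are made up for the illustration; they are not taken from the attached file.

```python
import csv

# Hypothetical sample: tab-separated content, as it might appear in a file named ".csv"
SAMPLE = "id\tlabel\tcomment\n1\t  padded  \tfirst\n2\tplain\tsecond\n3\ttrailing   \tthird\n"


def detect_delimiter(text, candidates="\t,;|"):
    """Guess the field separator from the text itself, ignoring the file name."""
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    return dialect.delimiter


# repr() makes a tab visible as '\t'
print(repr(detect_delimiter(SAMPLE)))
```

The sniffer is only a heuristic, so it is worth confirming the result by opening the file in a text editor that shows whitespace characters.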
-
Thanks for the reply, but it doesn't work as you described: I don't see quotes added when changing to "No escaping nor quoting" :(
I found a solution by adding a Python recipe and a dataset:
import os

import dataiku
import pandas as pd

# Locate the uploaded file inside the managed folder
# (get_path() only works for folders stored on the local filesystem)
input_folder = dataiku.Folder("JRcybThx")
input_folder_path = input_folder.get_path()
file_name = "test_file.csv"
file_path = os.path.join(input_folder_path, file_name)

# Read every column as a string so leading and trailing whitespace is kept;
# keep_default_na=False stops empty fields from being turned into NaN
test_trim = pd.read_csv(file_path, sep='|', dtype=str, engine='python', skip_blank_lines=False, keep_default_na=False)

# Write recipe outputs
test_trim_dataset = dataiku.Dataset("test_trim")
test_trim_dataset.write_with_schema(test_trim)
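
For anyone who wants to check the pandas part outside of Dataiku, below is a small standalone sketch that uses the same `read_csv` options on an in-memory sample and adds a length column to show that the whitespace survives. The sample data, the `read_untrimmed` helper, and the `label_length` column name are assumptions for the illustration, not part of the recipe above.

```python
import io

import pandas as pd

# Hypothetical pipe-separated sample with leading/trailing spaces and an empty field
SAMPLE = "id|label\n1|  padded  \n2|trailing   \n3|"


def read_untrimmed(source, sep="|"):
    """Read delimited text with every column kept as a raw string."""
    return pd.read_csv(
        source,
        sep=sep,
        dtype=str,              # no numeric or date inference
        engine="python",
        skip_blank_lines=False,
        keep_default_na=False,  # empty fields stay "" instead of NaN
    )


df = read_untrimmed(io.StringIO(SAMPLE))

# A length column makes the preserved whitespace easy to verify
df["label_length"] = df["label"].str.len()
print(df)
```

If the lengths include the padding characters, the values were read without trimming.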