Loading a txt file - loss of data from the file.
Hello,
I have a problem loading data from txt files.
I checked the number of lines with data in the txt file. Normally it is 82262. When I create a new dataset using the "DSS - Files in folder" option, the loaded dataset contains only 81048 lines.
Do you know the reason why this happens?
Thank you for your help! (I am a beginner Dataiku user.)
UPDATE:
I see that with a txt file of 41000 lines there is no problem.
I also loaded a file with 500000 lines and everything is fine. There must be something in the settings used when creating the dataset.
Best Answer
Turribeach
Without access to the txt file, it will be hard to find out exactly why the loaded line count differs from the line count of the file. Having said that, my guess is that your data lines are not consistent and that many rows contain additional carriage returns. If the file doesn't contain any confidential data, you can share it here and we can have a look.
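To check this kind of guess outside DSS, a small script can list the records that span more than one physical line. This is only a minimal sketch using Python's standard `csv` module, not the DSS parser: the function name `multiline_records`, the `;` delimiter and the sample text are all assumptions made for illustration, so adjust them to the real file.

```python
import csv
import io


def multiline_records(text, delimiter=";"):
    """Return (first_line, last_line) for every parsed record that
    covers more than one physical line of the input text.

    Hypothetical helper: the delimiter is an assumption, the thread
    does not say which separator the file uses.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    found = []
    previous_line = 0
    for _record in reader:
        # reader.line_num is the number of physical lines read so far.
        if reader.line_num - previous_line > 1:
            found.append((previous_line + 1, reader.line_num))
        previous_line = reader.line_num
    return found


# Made-up sample: the second record has a line break inside a quoted field.
sample = 'id;comment\n1;"first part\nsecond part"\n2;plain\n'
print(multiline_records(sample))  # [(2, 3)]
```

Each reported pair marks one record that a line counter would count as several lines, which is one way a file can have more lines than the loaded dataset has rows.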
Another thing you can do is enable a little-known feature of the Files in Folder dataset: "Enrich record with context information". This will let you see the row ID of each record and potentially find out which ones are missing.
Answers
Hello,
thanks for your info.
I found a solution: I had to change a setting in the dataset:
Format/Preview -> Quoting style.
Have a nice day!
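For readers wondering why the quoting style changes the row count, here is a rough illustration with Python's standard `csv` module. It is not how DSS parses files internally, and the helper name `count_records`, the `;` delimiter and the sample data are assumptions. When quotes are interpreted, a field that opens with a quote swallows every line break up to the closing quote, so several lines collapse into one record. When quotes are treated as ordinary characters, each line stays its own record.

```python
import csv
import io


def count_records(text, quoting, delimiter=";"):
    """Count parsed records under a given csv quoting mode.

    Hypothetical helper for illustration; the delimiter is an assumption.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""), delimiter=delimiter, quoting=quoting
    )
    return sum(1 for _ in reader)


# Made-up data: a quote opens on line 2 and only closes on line 4.
sample = 'id;note\n1;"5 inch\n2;plain\n3;7 inch"\n4;plain\n'

print(count_records(sample, csv.QUOTE_MINIMAL))  # quotes interpreted: 3
print(count_records(sample, csv.QUOTE_NONE))     # quotes are literal: 5
```

Which setting is correct depends on the file: if the quotes are real field delimiters, interpreting them is right, and if they are just characters in the data, a style without quoting keeps one row per line.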