Loading a txt file - loss of data from the file.

Andrew94

Hello,

I have a problem loading data from txt files.
I checked the number of lines with data in the txt file: it should be 82262. But when I create a new dataset using the "DSS - Files in folder" option, the resulting dataset has only 81048 lines.
Do you know why this happens?
Thank you for your help! (I am a beginner Dataiku user.)

UPDATE:
I see that with a txt file of 41000 lines, there is no problem.

I also loaded a file with 500000 lines and everything was fine. There must be something in the settings used when creating the dataset.

Turribeach

Without access to the txt file itself, it will be hard to find out exactly why the loaded line count differs. Having said that, my guess is that your data lines are not consistent and that quite a few rows contain extra carriage returns (line breaks). If the file doesn't contain any confidential data, you can share it here and we can have a look.
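To illustrate the extra line-break theory, here is a minimal sketch that uses Python's standard `csv` module as a stand-in for the DSS parser. This is an assumption: DSS does not use this module, but CSV-style parsers generally treat a line break inside a quoted field the same way. The sample data is hypothetical: a text editor counts 5 lines, while the parser returns 4 records because record 2 spans two physical lines.

```python
import csv
import io

# Hypothetical sample: the quoted comment of record 2 contains a line break,
# so that single record occupies two physical lines of the file.
SAMPLE = (
    "id,comment\n"
    "1,first\n"
    '2,"second part A\n'
    'second part B"\n'
    "3,third\n"
)


def count_physical_lines(text: str) -> int:
    """Count lines the way a text editor would."""
    return len(text.splitlines())


def count_parsed_records(text: str) -> int:
    """Count records the way a CSV parser sees them (header included)."""
    return sum(1 for _ in csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    print(count_physical_lines(SAMPLE))  # 5
    print(count_parsed_records(SAMPLE))  # 4
```

The gap between the two counts is exactly the number of extra physical lines swallowed by multi-line records.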

Another thing you can do is enable a little-known feature of the Files in Folder dataset, "Enrich record with conte.... This lets you see the row ID of each record and potentially find out which ones are missing.
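Outside DSS, a similar check can be sketched with Python's `csv` module: `csv.reader.line_num` reports how many physical lines have been consumed so far, so a jump of more than 1 between two records means a record spanned several lines. The helper name `multi_line_records` and the sample data are hypothetical, and this is only an approximation of how DSS itself parses the file.

```python
import csv
import io

# Hypothetical sample: record 2 spans physical lines 3 and 4.
SAMPLE = (
    "id,comment\n"
    "1,first\n"
    '2,"second part A\n'
    'second part B"\n'
    "3,third\n"
)


def multi_line_records(text: str) -> list[tuple[int, int]]:
    """Return (first_line, last_line) for each record spanning several lines.

    csv.reader.line_num counts the physical lines consumed so far; it grows
    by more than 1 when a single record covers multiple lines.
    """
    reader = csv.reader(io.StringIO(text))
    spans = []
    previous = 0
    for _record in reader:
        if reader.line_num - previous > 1:
            spans.append((previous + 1, reader.line_num))
        previous = reader.line_num
    return spans


if __name__ == "__main__":
    print(multi_line_records(SAMPLE))  # [(3, 4)]
```

For a real file, the text can be read with `open(path, newline="").read()`, where `path` is a placeholder for the file's location; the reported line ranges point directly at the rows worth inspecting.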

Andrew94

Hello,

Thanks for your info.

I found a solution: I had to change a setting of the dataset, under Format/Preview -> Quoting style.
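A plausible explanation of why the quoting style mattered (an assumption, since the actual file was never shared): a field that starts with a double quote that is never properly closed makes the parser treat the following lines as part of one quoted value until it meets another quote. The sketch below reproduces this with Python's `csv` module, where `csv.QUOTE_NONE` plays roughly the role of a "no quoting" style in DSS; the exact DSS option names and behavior may differ, and the sample data is hypothetical.

```python
import csv
import io

# Hypothetical sample: line 3 opens a quote that is only closed on line 5,
# so the default parser merges lines 3-5 into a single record.
SAMPLE = (
    "id,comment\n"
    "1,ok\n"
    '2,"unclosed comment\n'
    "3,ok\n"
    '4,closes" here\n'
    "5,ok\n"
)


def count_records(text: str, quoting: int) -> int:
    """Count parsed records (header included) for a given quoting mode."""
    return sum(1 for _ in csv.reader(io.StringIO(text), quoting=quoting))


if __name__ == "__main__":
    # Default behaviour: quotes are honoured, so 6 lines become 4 records.
    print(count_records(SAMPLE, csv.QUOTE_MINIMAL))  # 4
    # Quote characters treated as ordinary text: one record per line.
    print(count_records(SAMPLE, csv.QUOTE_NONE))  # 6
```

If some fields legitimately contain quoted text with embedded delimiters, turning quote handling off can instead split those values into extra columns, so it is worth checking a few of the affected rows after changing the setting.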

Have a nice day!
