Way to limit 10000 rows while uploading data using datasets

mettuharish

Is there a way to limit uploads to 10,000 rows when uploading data as datasets (CSV files, Excel files, etc.)? Also, is there any way to monitor this?


Operating system used: Redhat

3 Replies
AlexT
Dataiker

Hi @mettuharish ,
There is no way to limit uploaded files by number of rows.

You can monitor this using metrics on datasets or connections.
Under Administration > Monitoring > Per-connection data, you can check the row counts, provided they have been computed.
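
For files that have not been uploaded yet, the row count can also be checked outside DSS. The following is only a minimal sketch in plain Python, not a DSS feature: the function names `count_data_rows` and `within_limit` are hypothetical, and it assumes a UTF-8 CSV file with a single header row.

```python
import csv

def count_data_rows(path, has_header=True):
    """Count the records in a CSV file, excluding the header if present."""
    with open(path, newline="", encoding="utf-8") as f:
        n = sum(1 for _ in csv.reader(f))
    return max(n - 1, 0) if has_header else n

def within_limit(path, limit=10_000):
    """Return True if the file has at most `limit` data rows."""
    return count_data_rows(path) <= limit
```

A call such as `within_limit("my_file.csv")` could be run before uploading; the file name here is just a placeholder.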

If the concern is uploading to the local filesystem/filling up disk space, you can change the default connection from the local filesystem to cloud storage for uploaded datasets.

If you have a proxy, you can lower the maximum file upload size, e.g. to 3g, but this may have unwanted consequences. What if users need to upload larger zip files containing images rather than datasets, or project imports, etc.?


tgb417

@mettuharish 

I'm not clear exactly what you are trying to do, and maybe @AlexT has a better sense than I. As I was reading your post, I was thinking of a project where I had to divide a dataset into multiple files. It was uploading to Google Drive, but thanks to Dataiku connections almost any file-system-like store should be workable.

Where I was splitting the data into multiple files, in your situation you would split the data and upload only the first 10,000 records, throwing the rest of the records into the bit bucket, as it were.
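
To illustrate the "keep only the first 10,000 records" idea, here is a minimal sketch in plain Python. It is not the code from the linked thread and does not use any Dataiku API; the function name `truncate_csv` and the file paths are hypothetical, and it assumes a UTF-8 CSV file whose first line is a header.

```python
import csv
from itertools import islice

def truncate_csv(src_path, dst_path, max_rows=10_000):
    """Copy the header plus at most max_rows data rows; return rows written."""
    written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: nothing to copy
        writer.writerow(header)
        # islice stops after max_rows records; the rest are discarded
        for row in islice(reader, max_rows):
            writer.writerow(row)
            written += 1
    return written
```

The truncated file written to `dst_path` would then be the one to upload.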

Here is a thread from the community about this subject. It is a bit more advanced, as it uses a Python recipe, but the example should be fairly easy to adapt even with limited knowledge of Dataiku.

https://community.dataiku.com/t5/Using-Dataiku/Exporting-a-Dataset-into-Multiple-CSV-files/m-p/32295 

Good luck.

--Tom
tgb417

@mettuharish 

My approach was actually for moving records from Dataiku to a remote file system (Google Drive). You may actually be talking about loading from an external data source into Dataiku DSS. If that is the way you are going, then my comments may not be particularly useful.

--Tom