Is there a way to limit uploads to 10,000 rows when uploading data as datasets (CSV files, Excel files, etc.)? And is there any way we can monitor this as well?
Operating system used: Red Hat
Hi @mettuharish ,
There is no way to limit uploaded files by number of rows.
You can, however, monitor this using metrics on datasets or connections. Under Administration > Monitoring > Per-connection data, if row counts are computed, you can check the row counts in that view.
If the concern is uploading to the local filesystem/filling up disk space, you can change the default connection from the local filesystem to cloud storage for uploaded datasets.
If you have a proxy, you can cap the maximum upload size at a lower limit, e.g. 3 GB, but that may have unwanted consequences. What if users legitimately upload large zip files containing images rather than datasets, or large project imports?
Although I'm not clear on exactly what you are trying to do, and maybe @AlexT has a better sense than I do. As I was reading your post, I was reminded of a project where I had to split a dataset into multiple files. In my case it was uploading to Google Drive, but thanks to Dataiku Connections almost any file-system-like store should be workable.
If I were in your situation, I would split the data and upload only the first 10,000 records, throwing the rest of the records in the bit bucket, as it were.
Here is a thread from the community on this subject. It is a bit more advanced, using a Python recipe, but the example should be fairly easy to adapt with limited knowledge of Dataiku.
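The keep-only-the-first-10,000-rows step of such a recipe can be sketched in plain pandas. This is a self-contained illustration, not the code from the linked thread; the function name and paths are assumptions, and in an actual DSS Python recipe you would likely read and write through the `dataiku` package instead.

```python
import pandas as pd

# Hypothetical threshold matching the 10,000-row limit from the question
ROW_LIMIT = 10_000

def truncate_csv(src_path, dst_path, row_limit=ROW_LIMIT):
    """Copy at most the first `row_limit` data rows of a CSV, dropping the rest.

    nrows makes pandas stop reading after row_limit rows, so a huge
    source file is never fully loaded into memory.
    """
    df = pd.read_csv(src_path, nrows=row_limit)
    df.to_csv(dst_path, index=False)
    return len(df)
```

The remaining rows are simply never read, which matches the "throw the rest in the bit bucket" approach above.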
My approach was actually for moving records from Dataiku out to a remote filesystem (Google Drive). You may actually be talking about loading from an external data source into Dataiku DSS; if that is the direction you are going, then my comments may not be particularly useful.