Way to limit 10000 rows while uploading data using datasets
 
            Is there a way to limit uploads to 10,000 rows when uploading data as datasets (CSV files, Excel files, etc.)? Also, is there a way to monitor this?
Operating system used: Redhat
Answers
- 
             Alexandru (Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered; Posts: 1,352)
 Hi @mettuharish,
 There is no way to limit uploaded files by number of rows.
 You can monitor this using metrics on datasets or connections.
 Under Administration > Monitoring > Per-connection data, you can check the row counts, provided they have been computed.
 If the concern is uploads to the local filesystem filling up disk space, you can change the default connection for uploaded datasets from the local filesystem to cloud storage.
 If you have a proxy, you can lower the maximum file upload size, e.g. to 3g, but this may have unwanted consequences. The limit would also apply to other uploads, such as large zip files containing images rather than datasets, or project imports.
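
 As a rough illustration of the monitoring idea, the sketch below counts the data rows in a CSV file and flags it when the count goes over the 10,000 rows from the question. It is plain Python and does not use any Dataiku API; the names `count_data_rows`, `exceeds_limit`, and `ROW_LIMIT` are made up for this example, and it assumes the file has a single header line.

```python
import csv
import io

ROW_LIMIT = 10_000  # threshold taken from the question; adjust as needed


def count_data_rows(stream, has_header=True):
    """Count CSV records in a text stream, skipping the header if present."""
    reader = csv.reader(stream)
    if has_header:
        next(reader, None)  # discard the header row; no error on empty input
    return sum(1 for _ in reader)


def exceeds_limit(stream, limit=ROW_LIMIT, has_header=True):
    """Return True when the stream holds more than `limit` data rows."""
    return count_data_rows(stream, has_header) > limit


if __name__ == "__main__":
    # Small in-memory sample standing in for an uploaded file.
    sample = io.StringIO("id,value\n1,a\n2,b\n3,c\n")
    print(exceeds_limit(sample, limit=2))
```

 For a real file, open it with `open(path, newline="")` and pass the file object in place of the in-memory sample.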
- 
             tgb417 (Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023; Posts: 1,630)
 I'm not exactly clear on what you are trying to do, and maybe @AlexT has a better sense than I do. As I was reading your post, I was thinking of a project where I had to divide a dataset into multiple files. It was uploading to Google Drive, but thanks to Dataiku connections, almost any file-system-like store should be workable.
 Where I was splitting the data into multiple files, in your situation you would split the data and upload only the first 10,000 records, throwing the rest of the records in the bit bucket, as it were. Here is a thread from the community on this subject. It is a bit more advanced, using a Python recipe, but the example should be fairly easy to adapt with limited knowledge of Dataiku. https://community.dataiku.com/t5/Using-Dataiku/Exporting-a-Dataset-into-Multiple-CSV-files/m-p/32295 Good luck.
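
 To make the "keep only the first 10,000 records" idea concrete, here is a minimal sketch in plain Python. It is not the code from the linked thread and it does not call any Dataiku API; the function name `keep_first_rows` is hypothetical, and it assumes a CSV source with one header line. Inside a Python recipe, the same logic would sit between reading the input dataset and writing the output dataset.

```python
import csv
import io
from itertools import islice


def keep_first_rows(src, dst, limit=10_000):
    """Copy the header and at most `limit` data rows from src to dst.

    Returns the number of data rows written. Remaining rows are never read.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")

    header = next(reader, None)
    if header is None:  # empty input: nothing to copy
        return 0
    writer.writerow(header)

    written = 0
    for row in islice(reader, limit):  # stop after `limit` rows
        writer.writerow(row)
        written += 1
    return written


if __name__ == "__main__":
    # In-memory streams stand in for the uploaded file and the output file.
    source = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
    target = io.StringIO()
    kept = keep_first_rows(source, target, limit=2)
    print(kept)
    print(target.getvalue())
```

 Because `islice` stops pulling rows once the limit is reached, the remainder of a large file is left unread instead of being loaded into memory.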
- 
             tgb417 (Neuron; Posts: 1,630)
 My approach was actually for moving records from Dataiku to a remote file system (Google Drive). You may instead be talking about loading data from an external source into Dataiku DSS. If that is the direction you are going, then my comments may not be particularly useful.