Is there a way to limit uploads to 10,000 rows when uploading data as datasets (CSV files, Excel files, etc.)? And is there any way we can monitor this as well?
Operating system used: Red Hat
Hi @mettuharish ,
There is no way to limit uploaded files by number of rows.
You can, however, monitor this using metrics on datasets or connections. Under Administration > Monitoring > Per-connection data, if row counts are computed, you can check the row counts in that view.
If the concern is uploading to the local filesystem/filling up disk space, you can change the default connection from the local filesystem to cloud storage for uploaded datasets.
If you have a proxy, you can cap the maximum upload size at a lower limit, e.g. 3 GB, but that may have unwanted consequences. What if users legitimately upload large zip files containing images rather than datasets, or large project imports?
Although I'm not clear on exactly what you are trying to do, and maybe @AlexT has a better sense than I do. As I was reading your post, I was reminded of a project where I had to split a dataset into multiple files. In my case it was uploading to Google Drive, but thanks to Dataiku Connections almost any file-system-like store should be workable.
If I were in your situation, I would split the data and upload only the first 10,000 records, throwing the rest of the records in the bit bucket, as it were.
Here is a thread from the community on this subject. It is a bit more advanced, using a Python recipe, but the example should be fairly easy to adapt with limited knowledge of Dataiku.
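The keep-only-the-first-10,000-rows step of such a recipe can be sketched in plain pandas. This is a self-contained illustration, not the code from the linked thread; the function name and paths are assumptions, and in an actual DSS Python recipe you would likely read and write through the `dataiku` package instead.

```python
import pandas as pd

# Hypothetical threshold matching the 10,000-row limit from the question
ROW_LIMIT = 10_000

def truncate_csv(src_path, dst_path, row_limit=ROW_LIMIT):
    """Copy at most the first `row_limit` data rows of a CSV, dropping the rest.

    nrows makes pandas stop reading after row_limit rows, so a huge
    source file is never fully loaded into memory.
    """
    df = pd.read_csv(src_path, nrows=row_limit)
    df.to_csv(dst_path, index=False)
    return len(df)
```

The remaining rows are simply never read, which matches the "throw the rest in the bit bucket" approach above.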
My approach was actually for moving records from Dataiku out to a remote filesystem (Google Drive). You may actually be talking about loading from an external data source into Dataiku DSS; if that is the direction you are going, then my comments may not be particularly useful.