Hi,

I have a dataset representing transactions for all the products.

I would like to loop over each product in a Python notebook: load the transactions for that product, perform an analysis, and write the results to a dataset.

How can I load only a partition of my dataset from a Python notebook?

Thanks in advance

1 Solution

Accepted Solutions

Hi,

I assume you use the `get_dataframe()` method and then work with a pandas dataframe. (Let me know if you do something different.)

Here is what you can do:

1) Get only a sample of the dataset with `my_dataset.get_dataframe(sampling='head', limit=10000)`.

2) Load the dataset in chunks with `my_dataset.iter_dataframes(chunksize=10000)`:

    import dataiku

    my_dataset = dataiku.Dataset("name_dataset")

    for partial_dataframe in my_dataset.iter_dataframes(chunksize=10000):
        # Insert here the application logic for each partial dataframe.
        pass

You can read more in the documentation.
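
One thing to watch with chunked loading: the transactions for a single product can be spread over several chunks, so per-product results have to be combined across chunks. Here is a minimal sketch of that, not a definitive implementation. The helper names `accumulate_over_chunks` and `amount_by_product`, and the columns `product_id` and `amount`, are hypothetical; only `iter_dataframes(chunksize=...)` comes from the answer above.

```python
from collections import Counter


def accumulate_over_chunks(dataset, summarize_chunk, chunksize=10000):
    """Apply summarize_chunk to each partial dataframe and add up the results.

    `dataset` is any object exposing iter_dataframes(chunksize=...), such as
    the dataiku.Dataset used above. `summarize_chunk` must return a mapping
    of product -> partial numeric result for one chunk.
    """
    totals = Counter()
    for partial_dataframe in dataset.iter_dataframes(chunksize=chunksize):
        # Counter.update adds values key by key, so a product that appears
        # in several chunks ends up with the sum of its partial results.
        totals.update(summarize_chunk(partial_dataframe))
    return dict(totals)


def amount_by_product(partial_dataframe):
    """Example summary for a pandas chunk: total amount per product.

    The column names "product_id" and "amount" are assumptions; replace
    them with the columns of your own dataset.
    """
    return partial_dataframe.groupby("product_id")["amount"].sum().to_dict()


# Usage in a Dataiku notebook (not executed here):
#   import dataiku
#   my_dataset = dataiku.Dataset("name_dataset")
#   totals = accumulate_over_chunks(my_dataset, amount_by_product)
```

This only works for analyses whose partial results can be added together (sums, counts). For anything that needs all rows of a product at once, you would have to collect the matching rows per product before analysing them.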

Jeremy, Product Manager at Dataiku
