Added on October 19, 2023 2:04PM
Greetings! I have a dataset with various columns, one of which is 'US State'. Every state appears multiple times. I would like to draw a random sample containing 2 records for each state. I've read up on the different sampling methods and don't see how any of them fits my use case. I welcome the Community's thoughts and direction. Thank you!
Operating system used: Windows
Hi,
In the Sample, setting the sampling method to Class rebalance (approx. nb. records) should roughly achieve this. However, there is no guarantee it will select exactly 2 records from each state; it only ensures that every state is represented in the sample.
If you need exactly this type of sampling, you could use a Python recipe with pandas: group by the "US State" column and call sample on the groups.
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.sample.html
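A minimal sketch of that pandas approach, assuming the column is named "US State" as in the question and that you want exactly 2 rows per state. The helper name sample_per_state and the toy DataFrame are hypothetical; in a Python recipe you would pass the DataFrame read from your input dataset instead.

```python
import pandas as pd


def sample_per_state(df: pd.DataFrame, column: str = "US State",
                     n: int = 2, seed: int = 42) -> pd.DataFrame:
    """Draw n random rows from each group of `column` (hypothetical helper)."""
    # DataFrameGroupBy.sample (pandas >= 1.1) samples each group independently.
    # With the default replace=False it raises ValueError if any group has
    # fewer than n rows. Fixing random_state makes the sample reproducible.
    return df.groupby(column).sample(n=n, random_state=seed)


# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "US State": ["CA", "CA", "CA", "NY", "NY", "NY", "TX", "TX"],
    "value": [1, 2, 3, 4, 5, 6, 7, 8],
})

sample = sample_per_state(df)
print(sample["US State"].value_counts().sort_index())
```

Because the question says every state appears multiple times, each group should have at least 2 rows; if some state could have only one row, that group would need special handling (for example, keeping all of its rows).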
Thank you! I will give both a try.