Greetings! I have a data set consisting of various columns, one being 'US State'. All states are represented multiple times. I would like to compile a random sample consisting of 2 samples for each state. I've read up on the different sampling methods and don't see how they will fit my use case. I welcome the Community's thoughts and direction. Thank you!
Operating system used: Windows
Hi,
The sampling method in the Sample -> Class rebalance (approx. nb. records) should roughly achieve this, but there is no guarantee it will select exactly 2 rows for each state; it only ensures that the sample contains rows from all states.
If you need exactly 2 rows per state, you could use a Python recipe instead, using the pandas group by / sample on the "state" column.
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.sample.html
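As a rough illustration of the group by / sample approach, here is a minimal sketch in plain pandas. The column name "US State" is taken from the question; the function name `sample_per_state`, the `value` column, the seed, and the small example frame are made up for illustration. Reading from and writing to your datasets inside the recipe is left out.

```python
import pandas as pd


def sample_per_state(df, state_col="US State", n=2, seed=42):
    """Return n randomly chosen rows for each distinct value of state_col.

    Hypothetical helper for illustration. Requires pandas >= 1.1, where
    DataFrameGroupBy.sample was introduced.
    """
    # random_state makes the draw reproducible; drop it for a fresh sample each run.
    return df.groupby(state_col).sample(n=n, random_state=seed)


# Made-up example data: each state appears at least twice.
df = pd.DataFrame({
    "US State": ["CA", "CA", "CA", "NY", "NY", "TX", "TX", "TX", "TX"],
    "value": range(9),
})

sampled = sample_per_state(df)
print(sampled.sort_values("US State"))
```

Note that if any state has fewer than 2 rows, this call raises a ValueError unless you pass `replace=True` to `sample`, which allows the same row to be drawn more than once.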
Thank you! I will give both a try.