Targeted Random Sampling

Solved!
SuzanneD
Level 2

Greetings! I have a dataset with several columns, one of which is 'US State'. Every state appears multiple times. I would like to draw a random sample containing exactly 2 rows for each state. I've read up on the different sampling methods and don't see how they fit my use case. I welcome the Community's thoughts and direction. Thank you!


Operating system used: Windows

1 Solution
AlexT
Dataiker

Hi,
The sampling method under Sample -> Class rebalance (approx. nb. records) should come close to this. There is no guarantee that it will select exactly 2 rows per state, but it should ensure that the sample contains rows from every state.

If you need exactly 2 rows per state, you could use a Python recipe and apply pandas groupby / sample on the "state" column:
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.sample.html
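
As a rough illustration of the groupby / sample approach, here is a minimal sketch. The helper name `sample_per_group`, the toy data, and the `seed` value are made up for this example; in a Python recipe the DataFrame would come from your input dataset rather than being built by hand. Note that `DataFrameGroupBy.sample` needs pandas 1.1 or later.

```python
import pandas as pd


def sample_per_group(df: pd.DataFrame, group_col: str, n: int = 2, seed=None) -> pd.DataFrame:
    """Return n randomly chosen rows from each group in group_col.

    Hypothetical helper for illustration only.
    """
    # DataFrameGroupBy.sample draws n rows without replacement inside each group.
    # It raises ValueError if any group has fewer than n rows.
    return df.groupby(group_col).sample(n=n, random_state=seed)


# Toy data standing in for the real dataset (column names are assumptions).
df = pd.DataFrame({
    "state": ["CA", "CA", "CA", "NY", "NY", "NY", "TX", "TX"],
    "value": [1, 2, 3, 4, 5, 6, 7, 8],
})

sampled = sample_per_group(df, "state", n=2, seed=42)

# Each state should now appear exactly twice.
print(sampled["state"].value_counts())
```

Passing a fixed `seed` makes the draw repeatable; leave it as `None` for a different sample on each run. If some state could have fewer than 2 rows, you would need to handle that case separately, since the call above fails rather than returning a smaller group.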

 


2 Replies
SuzanneD
Level 2
Author

Thank you! I will give both a try.
