Sampling from Millions of Rows into Dataframe
gblack686
Partner, Registered Posts: 62
I have a plugin that outputs a SQL table via SQL Executor, and I want to run a webapp in a dashboard on the output. The table contains 10+ million records, and I can't sample it without a full pass of the dataset through DSS. What's the best way to get a random sample without explicitly creating another "sample table" with an SQL recipe?
I expect this isn't currently possible; I'm just wondering if you could think of a workaround.
Answers
Hi,
If your database supports native random sampling without a full scan, you can use SQLExecutor2 to run that sampling query and load the result directly into a dataframe. However, if you want a "stable" sample (the same rows on every run), creating a sampled table in the Flow would probably be a better idea.
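As a rough sketch of that approach: the snippet below builds a sampling query using PostgreSQL's `TABLESAMPLE SYSTEM`, which picks random disk pages and so avoids scanning all 10M+ rows (an assumption — check whether your database offers an equivalent, e.g. `SAMPLE` in Teradata or `TABLESAMPLE` in SQL Server). The table name and connection name are placeholders, not values from the original post.

```python
def sample_query(table: str, percent: float) -> str:
    """Build a PostgreSQL page-level sampling query.

    TABLESAMPLE SYSTEM reads a random ~percent of pages without a full
    scan; adding REPEATABLE(seed) would make the sample stable across
    runs, at which point a sampled table in the Flow is comparable.
    """
    return f"SELECT * FROM {table} TABLESAMPLE SYSTEM ({percent})"

# Inside a DSS webapp backend or Python recipe (needs the dataiku package):
# from dataiku import SQLExecutor2
# executor = SQLExecutor2(connection="my_postgres_conn")  # hypothetical connection name
# df = executor.query_to_df(sample_query("my_output_table", 1))  # ~1% sample as a dataframe
```

Note that `SYSTEM` sampling is fast but clustered by page; if your rows are physically ordered (e.g. by insert date), `TABLESAMPLE BERNOULLI` gives a more uniform sample at the cost of a full scan.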