Simplest way to get the aggregate value from one dataset and bring it into another
I have dataset A and dataset B. I need the aggregate total from one column called "Total Commission" from B. I want to bring it into A and populate a single column with that value.
I know I can do this in Python with two dataframes and I know I can do this with a join if I create a join key in the datasets. Is there a simpler way to do this than either of those two options?
Thanks!
Answers
-
Here is another way
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
"Simple" is a very subjective adjective. A "coder" user will find a Python recipe much simpler, whereas a "clicker" user will find a Join recipe much simpler. The issue is that your question doesn't give enough information or clear requirements about what you are after. Are the Python recipe and the Join recipe not good enough for you? Why do you need another solution?
-
To go the join route I have to perform the following steps:
1) create a group recipe
2) add a join key on both datasets
3) Join them
4) do all the math I need to do
I think this will take at least 3 separate recipes, and will require a hefty amount of downstream schema refreshing. I am looking for the path of least resistance.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Again the same issue since "the path of least resistance" depends on who is taking the path. Is your actual question "how can I do X with the least amount of visual recipes"?
"will require a hefty amount of downstream schema refreshing"
In v12 it's trivial to propagate schema changes, since schema propagation now works properly and there is a new feature called "Build Downstream" which lets you enable schema propagation.
-
I do not think we are running v12 yet, and I do not have those features. I ended up going the Python route, which seemed to make the most sense.
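For anyone landing here later, a minimal sketch of the Python route, using plain pandas so it runs anywhere. Only the "Total Commission" column name comes from this thread; the function name, the target column name `total_commission_b`, and the toy data are hypothetical placeholders. In a DSS Python recipe the two dataframes would typically be read with `dataiku.Dataset(...).get_dataframe()` rather than built by hand.

```python
import pandas as pd


def add_total_commission(df_a, df_b,
                         source_col="Total Commission",
                         target_col="total_commission_b"):
    """Return a copy of df_a with one extra column that holds the
    sum of df_b[source_col], repeated on every row."""
    total = df_b[source_col].sum()   # single scalar aggregate from B
    out = df_a.copy()                # leave the original A untouched
    out[target_col] = total          # pandas broadcasts the scalar to all rows
    return out


# Toy stand-ins for dataset A and dataset B (hypothetical data)
df_a = pd.DataFrame({"rep": ["x", "y", "z"], "sales": [100.0, 250.0, 50.0]})
df_b = pd.DataFrame({"Total Commission": [10.0, 20.5, 4.5]})

result = add_total_commission(df_a, df_b)
print(result)
```

Because the aggregate is a single scalar, no join key or Group recipe is needed: assigning it to a new column fills every row of A with the same value, and any further math can be done on that column in the same recipe.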