Search option for column selection in join recipe
The Join recipe is one of the visual recipes I use most in Dataiku. This is a request for a new feature in the Join recipe. I usually work with datasets that have more than 100 columns. To select columns in the Join recipe, I have to scroll down the page to find the ones I need in each dataset, and finding a single column or a list of columns by scrolling can be cumbersome.
A search option in the column selection UI would therefore be very helpful. I believe it would improve the user experience of working with visual recipes such as the Join recipe. For reference, I have attached a screenshot.
Comments
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,090 Neuron
Not exactly what you are asking for, but this is how I would approach your problem. In a Prepare recipe, switch to the Columns view. Then, from the Actions button, select all columns. Now use the Filter box to find the columns you want to keep and uncheck them (yes, uncheck). Keep changing the filter as required, unchecking each column you want to keep. Even as you change the filter, Dataiku still remembers all the checked columns. Once you are done, click the Actions button and click Delete. You will now have a dataset containing only the columns you want. You can use it in the Join recipe and auto-select all the columns in the output, with no need to search for individual columns, since you now want all of them.
-
sudipta002 Partner, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Neuron 2023 Posts: 12 Neuron
So here is my argument. I don't think it is a good approach to introduce a Prepare recipe just to work around the lack of a search capability in the Join recipe. The Prepare recipe creates a new dataset, and if that dataset holds millions of records stored in, for example, Redshift/S3, we would consume storage space for no good reason. Please correct me if I am wrong.