Pattern to keep only certain columns
Hello, I'm new to Dataiku, and for the first time I have to use a pattern to keep columns.
I have this dataset with 6 columns:
CA 2020 | CA 2021 | FOOD  | SERVICE | EFFECTIF 2020 | EFFECTIF 2021
2000    | 300000  | MC DO | HOTEL   | 220           | 240
Then I go to a Prepare recipe and use the step that keeps only some columns.
My aim is to have 3 datasets:
1- only the columns with a number
2- only the columns that start with CA
3- only the columns that start with EFFECTIF.
I have tried a lot of things, but I never get any result.
So I would like to know how I can do it.
Thanks in advance.
Answers
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
Welcome to the Dataiku community. We are glad that you have joined us.
If I were in the situation I think you are describing, I'd likely use the delete/keep columns step with a regular expression (regex) as the filter pattern. Here is a thread from the community about this general topic.
https://community.dataiku.com/t5/Using-Dataiku/Remove-columns-by-pattern/m-p/15817
Regular expressions can be a bit of a beast.
The thing to know is that the pipe symbol "|" is the alternation operator. With a single regular expression you can match each of your kinds of columns in one step, separating each alternative with a "|".
Here is a bit about the alternation operator: https://www.regular-expressions.info/alternation.html
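To make the alternation idea concrete, here is a small Python sketch (not DSS itself) that applies one pattern with two alternatives to the column names from the question. The `^EFFECTIF.*` alternative and the use of `fullmatch` are my own assumptions for illustration; I have not checked whether DSS matches the whole column name or only part of it, although for these two alternatives the result is the same either way.

```python
import re

# Column names taken from the question above.
columns = ["CA 2020", "CA 2021", "FOOD", "SERVICE", "EFFECTIF 2020", "EFFECTIF 2021"]

# One pattern, two alternatives separated by "|":
# names starting with CA, or names starting with EFFECTIF.
pattern = re.compile(r"^CA.*|^EFFECTIF.*")

# Keep a column when the pattern matches its whole name (assumption, see above).
kept = [name for name in columns if pattern.fullmatch(name)]
print(kept)
```

Dropping one alternative (for example keeping only `^CA.*`) gives the single-family selection instead.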
There are some tools out there that you can play with regular expressions. They have been of some help to me over the years.
Others, please jump in and lend a hand.
@Fred_ae
Let us know how you are getting along.
-
Thanks, I found a solution to keep only the columns that start with 'CA'.
I used:
^CA.*
But now I'm trying to keep only the columns whose names contain the number 20,
so that I get all the columns with 2021, 2022, 2023, ...
Thanks again.
-
Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 115 ✭✭✭✭✭✭✭
Hi @Fred_ae,
Could you provide an example of what you want the result to be? I don't quite understand your description of the issue, sorry for that!
Regards,
Jurre
-
tgb417
For the numerical column names, you might try something like
^\d*
\d is the symbol for a numerical digit.
You then might combine the two as
^CA.*|^\d*
I'm not at a computer with DSS access, so I have not been able to test the above.
In the past there was a bug where one sometimes had to use a double backslash. If you are having problems making this work, try \\ rather than a single \.
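Since the patterns above are untested, here is a quick Python sketch for checking them against the column names from the question outside of DSS. The helper `keep` and the patterns `.*20.*` and `.*\d.*` are my own illustrations, not something confirmed in DSS, and whether DSS matches the whole column name or only a part of it is an assumption to verify. One thing the sketch shows: `\d*` also accepts zero digits, so `^\d*` either matches every name (partial match) or none of these names (whole-name match).

```python
import re

# Column names taken from the question above.
columns = ["CA 2020", "CA 2021", "FOOD", "SERVICE", "EFFECTIF 2020", "EFFECTIF 2021"]

def keep(pattern, names, whole_name=True):
    """Return the names selected by pattern.

    whole_name=True requires the pattern to match the entire name;
    whole_name=False accepts a match anywhere in the name.
    """
    regex = re.compile(pattern)
    test = regex.fullmatch if whole_name else regex.search
    return [name for name in names if test(name)]

# ^\d* allows zero digits: as a partial match it succeeds on every name...
partial = keep(r"^\d*", columns, whole_name=False)
# ...and as a whole-name match it selects none of these names.
whole = keep(r"^\d*", columns, whole_name=True)

# Names that contain "20" anywhere (2020, 2021, 2022, ...).
with_20 = keep(r".*20.*", columns)
# Names that contain at least one digit.
with_digit = keep(r".*\d.*", columns)

print(partial, whole, with_20, with_digit, sep="\n")
```

If the goal is the columns whose names contain 20, a pattern of the `.*20.*` kind looks like the closer fit, and it can be combined with `^CA.*` using "|" in the same way as before.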
Hope this helps a bit. Let us know how you are getting on with this.
-
tgb417