
Pattern to keep only certain column

Fred_ae
Level 1

Hello, I'm new to Dataiku, and this is the first time I have to use a pattern to keep columns.

 

I have this dataset with 6 columns:

 

CA 2020   CA 2021   FOOD    SERVICE   EFFECTIF 2020   EFFECTIF 2021
2000      300000    MC DO   HOTEL     220             240

 

I then use a Prepare recipe with a "keep only columns" step.

My aim is to get 3 datasets:

 

1- only the columns with a number in their name

2- only the columns whose names start with CA

3- only the columns whose names start with EFFECTIF

 

I have tried a lot of things, but I never get any result.

 

So I would like to know how I can do it.

Thanks in advance

tgb417
Neuron

@Fred_ae 

Welcome to the Dataiku community.  We are glad that you have joined us.

If I were in the situation I think you are describing, I'd likely use the remove/delete columns step, with a regular expression (regex) as the filter pattern. Here is a thread from the community about this general topic:

https://community.dataiku.com/t5/Using-Dataiku/Remove-columns-by-pattern/m-p/15817
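To see what "keep" versus "delete" by regex does to the column list, here is a minimal Python sketch using the standard `re` module on the six column names from the question. The function name `select_columns` is hypothetical, and it assumes search-style matching (`re.search`); the actual matching rules of the DSS step are not described in this thread. The pattern `^CA.*` is written so it should behave the same under full-match rules too.

```python
import re

# The six column names from the original question.
COLUMNS = ["CA 2020", "CA 2021", "FOOD", "SERVICE", "EFFECTIF 2020", "EFFECTIF 2021"]


def select_columns(columns, pattern, keep=True):
    """Return the column names left after a keep/delete-by-regex step.

    keep=True  -> keep only the columns whose name matches `pattern`
    keep=False -> delete the columns whose name matches `pattern`
    Uses re.search; this is an assumption, not the DSS implementation.
    """
    regex = re.compile(pattern)
    return [c for c in columns if bool(regex.search(c)) == keep]


print(select_columns(COLUMNS, r"^CA.*"))              # ['CA 2020', 'CA 2021']
print(select_columns(COLUMNS, r"^CA.*", keep=False))  # ['FOOD', 'SERVICE', 'EFFECTIF 2020', 'EFFECTIF 2021']
```

The same pattern gives opposite results depending on the mode, so it is worth double-checking which mode the step is set to.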

Regular expressions can be a bit of a beast.

 

The thing to know is that the pipe symbol "|" is the alternation operator. With a single regular expression you can match each kind of column in one step, by separating the alternatives with a "|".

Here is a bit more about the alternation operator:

https://www.regular-expressions.info/alternation.html
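As a quick illustration of alternation, the Python sketch below (standard `re` module, search-style matching assumed) uses one pattern with two alternatives to pick up both the CA and the EFFECTIF columns. It also shows that factoring the shared parts out into a group gives the same result. The helper name `matching` is hypothetical.

```python
import re

# The six column names from the original question.
COLUMNS = ["CA 2020", "CA 2021", "FOOD", "SERVICE", "EFFECTIF 2020", "EFFECTIF 2021"]

# Two alternatives separated by "|": a name starting with "CA" OR a name starting
# with "EFFECTIF". "|" has the lowest precedence, so each ^ belongs to its own side.
ALTERNATION = r"^CA.*|^EFFECTIF.*"

# Equivalent form with the shared ^ and .* factored out around a group.
GROUPED = r"^(CA|EFFECTIF).*"


def matching(columns, pattern):
    """Column names for which the pattern finds a match (re.search semantics)."""
    regex = re.compile(pattern)
    return [c for c in columns if regex.search(c)]


print(matching(COLUMNS, ALTERNATION))  # ['CA 2020', 'CA 2021', 'EFFECTIF 2020', 'EFFECTIF 2021']
print(matching(COLUMNS, GROUPED) == matching(COLUMNS, ALTERNATION))  # True
```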

There are some tools out there that let you experiment with regular expressions. They have been of some help to me over the years.

https://regexr.com/

Others, please jump in and lend a hand.

@Fred_ae  let us know how you are getting along.

 

--Tom
Fred_ae
Level 1
Author

Thanks, I found the solution to keep only the columns that start with 'CA'.

 

I did:

^CA.*
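Following the same idea, each of the three target datasets can get its own pattern. The Python sketch below (standard `re` module, search-style matching assumed) maps hypothetical dataset labels to one pattern each and lists which columns each would keep. The labels, the helper `split_columns`, and the choice of `.*\d.*` for "has a number in its name" are illustrative assumptions. Note that matching is case-sensitive, so `^EFFECTIF.*` would not match a column named "effectif 2020".

```python
import re

# The six column names from the original question.
COLUMNS = ["CA 2020", "CA 2021", "FOOD", "SERVICE", "EFFECTIF 2020", "EFFECTIF 2021"]

# Hypothetical labels for the three target datasets, one pattern each.
PATTERNS = {
    "with_number": r".*\d.*",              # any name containing at least one digit
    "starts_with_ca": r"^CA.*",            # names beginning with "CA"
    "starts_with_effectif": r"^EFFECTIF.*",  # names beginning with "EFFECTIF" (case-sensitive)
}


def split_columns(columns, patterns):
    """Map each target dataset label to the columns its pattern keeps."""
    return {
        label: [c for c in columns if re.search(pattern, c)]
        for label, pattern in patterns.items()
    }


for label, kept in split_columns(COLUMNS, PATTERNS).items():
    print(label, kept)
```

In DSS this would correspond to three separate keep-columns steps (for example, one per recipe), each with one of these patterns.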

 

But now I'm trying to keep only the columns that contain the number 20,

so that I get all the columns with 2021, 2022, 2023, ...

 

Thanks again

tgb417
Neuron

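For the numerical column names, you might try something like

.*\d.*

\d is the symbol for a numerical digit, and .* allows any characters before and after it, so this should match any column name that contains a digit.

You might then combine the two as

^CA.*|.*\d.*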

I’m not at a computer with DSS access so I have not been able to test the above.
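One way to check patterns like these without DSS is a few lines of Python with the standard `re` module. The minimal sketch below targets the follow-up question about columns containing 20 (2021, 2022, 2023, ...). It assumes search-style matching; the patterns start and end with `.*`, so they should behave the same under full-match rules. `CA 2023` is a hypothetical future column added only to show that later years are picked up, and `keep` is a hypothetical helper.

```python
import re

# Column names from the question, plus "CA 2023": a hypothetical future column.
COLUMNS = ["CA 2020", "CA 2021", "FOOD", "SERVICE",
           "EFFECTIF 2020", "EFFECTIF 2021", "CA 2023"]

# "20" followed by exactly two more digits, anywhere in the name (2020, 2021, 2023, ...).
YEAR_PATTERN = r".*20\d\d.*"

# The same idea combined with the "starts with CA" pattern through alternation.
COMBINED = r"^CA.*|.*20\d\d.*"


def keep(columns, pattern):
    """Column names the pattern matches, in their original order."""
    regex = re.compile(pattern)
    return [c for c in columns if regex.search(c)]


print(keep(COLUMNS, YEAR_PATTERN))
# ['CA 2020', 'CA 2021', 'EFFECTIF 2020', 'EFFECTIF 2021', 'CA 2023']
print(keep(COLUMNS, COMBINED) == keep(COLUMNS, YEAR_PATTERN))  # True
```

`20\d\d` can also be written `20\d{2}`. Requiring the two extra digits avoids matching a name that merely contains "20" somewhere.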

In the past there was a bug where one sometimes had to use a double backslash. If you have trouble making this work, try \\ rather than a single \.
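The doubled backslash matters whenever a pattern passes through a layer that treats \ as an escape character before the regex engine sees it. The short Python sketch below illustrates that general escaping behavior. It does not reproduce the DSS bug mentioned above, whose exact cause is not described in this thread.

```python
import re

# In an ordinary (non-raw) Python string literal, "\\d" is two characters:
# a backslash followed by "d". It is the same text as the raw string r"\d".
escaped = "\\d"
raw = r"\d"
print(escaped == raw, len(escaped))  # True 2

# Both spellings therefore hand the regex engine the "digit" class.
print(bool(re.search(escaped, "CA 2020")))  # True

# If one backslash is lost along the way, the engine only sees "d",
# which matches the letter d instead of a digit.
print(bool(re.search("d", "CA 2020")))  # False
```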

Hope this helps a bit. Let us know how you are getting on with this.

--Tom
tgb417
Neuron

@Fred_ae 

You might find this post from @Jurre helpful in your journey to learn regular expressions.

https://community.dataiku.com/t5/Using-Dataiku/REMOVAL-OF-CHAR-USING-FIND-AND-REPLACE/m-p/23378/high...

Thanks @Jurre 

--Tom
Jurre
Neuron

Hi @Fred_ae ,

Could you provide an example of what you want the result to be? I don't quite understand your description of the issue, sorry about that!

Regards, 

Jurre

 

Jurre
Neuron

@Fred_ae , 

It seems you have opened two similar topics. Is this one a duplicate or a different question?

cheers, 

Jurre