Normalization of Columns by using Group By

fbakis
fbakis Partner, Registered Posts: 4 Partner

Hello,

I am trying to do normalization of columns in Dataiku. In my dataset I have a categoric value, I want to do normalization of columns according to each group. Can anyone help me?

Thank you in advance.

Fatma

Answers

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

    @fbakis
    ,

    Welcome to the Dataiku community.

    I’m not clear that I understand your question. Given that you are working with a categorical variable. Let’s say for a particular group the variable has the following values A, B, C, A, A, B, C, C, C, D in each of ten rows in this example group.

    What value would you expect for the rows within this group? The minimum of the values as A, or the most common C. Or the Maximun value D. Or some other kind of transform.

    If I’m completely of target related to the scenario you are considering. (Which I think could be very likely.) Could you provide the community an example or what source data might look like and your expected results after the transformation?

    Looking forward to hearing more. And others please jump in if I don’t see the response.

  • fbakis
    fbakis Partner, Registered Posts: 4 Partner

    Let me clarify the situation better.

    Let's say I have two columns one being categorical like A,A,B,B,B,B,C,C,C and other is a numerical column like 10,22,9,3,5,12,10,2,14. My goal is to normalize the values in the second column but I want to normalize the values between 0-1 range considering their categories. A's normalize within themselves, B's within themselves and so on. Desirable output would be something like: 0,1,0.66,0,0.22,1,0.66.0.1 I know Dataiku can normalize a column as a whole but I'm not sure how to do the thing I want. Hope this example made the situation clear.

    Thanks in advance.

    Fatma

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron
    edited July 17

    @fbakis
    ,

    Good to hear from you again.

    I'm wondering if your data is stored in a SQL server. I know that I run a PostgreSQL server on my local laptop and servers for similar use cases. The idea is to push of to the SQL server some of the more database-like functions.

    One of the possible ways to use a window visual recipe, on a table stored in SQL and create a custom aggregation written in SQL.

    I've not specifically tried the following. However, I found this chunk of SQL that seems to do what you want on StackOverflow.

    SELECT a,b,
        1.0 * c / CASE WHEN SUM(c) OVER(PARTITION BY a,b) = 0 THEN 1
                       ELSE SUM(c) OVER(PARTITION BY a,b) END AS c
    FROM tab

    I'm wondering if it could be adapted into a custom aggregation of a windowed visual recipe. Here is another conversation in the community that is related.

    If you don't have access to a SQL server, you could likely do this is python or R in a notebook, or in Python in a visual recipe step.

    Good luck, let us know how you get along.

  • ruben80mdq
    ruben80mdq Dataiku DSS Core Designer, Registered Posts: 1 ✭✭✭

    Hi @fbakis
    ,

    I was wondering if you managed to find a solution as I am having the same need ATM and have not find a solution yet.

    I have built a Colour Table in Dataiku and need to develop a normalised column (for each data category so that I can then use it within the colour variable of the heatmap as noted in the DSS documentation https://doc.dataiku.com/dss/latest/visualization/charts-tables.html)

    Any assistance will be uch appreciated.

    Thanks in advance!

    Ruben

Setup Info
    Tags
      Help me…