How to fill cells of column with previous non null value

abalo006
abalo006 Registered Posts: 29

I have a table that is presorted that shows teams and players.

Capture.PNG

What I'm trying to do is create some kind of calculation to fill my blank cells for team, I want the cells to be filled with the last non null value that precedes that row.

For example, row 3 is blank, the last non null preceding value is row 1 = green, so I want that cell to have the value green also

Another example is row 17 is also blank, the last preceding non null value is row 14 = yellow so I want rows 15-16 to also have the value of yellow

Does that make sense? and is that possible using Dataiku?

I've attached a picture of my ideal output below and a sample dataset.

cap2.PNG


Operating system used: windows

Tagged:

Answers

  • Zach
    Zach Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 153 Dataiker

    Hi @abalo006
    ,

    You can do this by using a Window recipe.

    Note that the type of the Player column must be "bigint" (not string) so that the rows are sorted correctly.

    Window definitions:

    swappy-20240501_143358.png

    Aggregations (don't forget to select "First/Last not null"):

    swappy-20240501_143451.png

    Output dataset:

    swappy-20240501_143528.png

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,043 Neuron

    Not easy to do in a Visual recipe but relatively easy in either a SQL recipe or Python recipe. Is your Dataset in a SQL database?

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,043 Neuron
    edited July 17

    Nice one @ZachM
    , didn't work for me as postgreSQL doesn't support First/Last not null:Screenshot 2024-05-01 at 23.02.44.png

    But this can be done in a SQL Recipe relatively easily:

    SELECT t1."Player", t1."Team"
    , (
        SELECT t2."Team"
        FROM "CT_TEST_book3_copy" t2
        WHERE t2."Player" = (
            SELECT MAX(t3."Player")
            FROM "CT_TEST_book3_copy" t3
            WHERE t3."Player" <= t1."Player"
            AND t3."Team" IS NOT NULL
           )
    ) AS "Calc_Team"
    FROM "CT_TEST_book3_copy" t1;

    Screenshot 2024-05-01 at 23.04.31.png

Setup Info
    Tags
      Help me…