Stuck on the Use Case: Churn Prediction

Gerry Leo
Gerry Leo Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 45 ✭✭✭✭✭
edited July 16 in Academy Discussions

Hi, so I'm following along the Use Churn Prediction found here. While during couple of the initial steps the SQL went great, but I'm hitting a roadblock while trying to figure out the SQL for the modeling part found here. The problem with mine, it wouldn't get executed properly. Have tried many ways to make it work, but just couldn't understand what went wrong with them.

Here's the snippet from the Use Case:

SELECT train.*
    , nb_products_seen
    , nb_distinct_product
    , nb_dist_0 , nb_dist_1, nb_dist_2
    , amount_bought , nb_product_bought
    , active_time
FROM train
LEFT JOIN (
    -- generate features based on past data
    SELECT user_id
        , COUNT(product_id) AS nb_products_seen
        , COUNT(distinct product_id) AS nb_distinct_product
        , COUNT(distinct category_id_0) AS nb_dist_0
        , COUNT(distinct category_id_1) AS nb_dist_1
        , COUNT(distinct category_id_2) AS nb_dist_2
        , SUM(price::numeric * (event_type = 'buy_order')::int ) AS amount_bought
        , SUM((event_type = 'buy_order')::int ) AS nb_product_bought
        , EXTRACT(
           EPOCH FROM (
            TIMESTAMP '2014-08-01 00:00:00' - MIN(event_timestamp)
           )
          )/(3600*24)
              AS active_time
    FROM events_complete
    WHERE event_timestamp < TIMESTAMP '2014-08-01 00:00:00'
    GROUP BY user_id
    ) features ON train.user_id = features.user_id

And here's my updated version to comply with my local PgSQL connection data credentials:

SELECT train.*
, nb_products_seen
, nb_distinct_product
, nb_dist_0 , nb_dist_1, nb_dist_2
, amount_bought , nb_product_bought
, active_time
FROM "BUILDANENDTOENDCHURNPREDICTIONMODEL_train"
LEFT JOIN (
-- generate features based on past data
SELECT user_id
, COUNT(product_id) AS nb_products_seen
, COUNT(distinct product_id) AS nb_distinct_product
, COUNT(distinct category_id_0) AS nb_dist_0
, COUNT(distinct category_id_1) AS nb_dist_1
, COUNT(distinct category_id_2) AS nb_dist_2
, SUM(price::numeric * (event_type = 'buy_order')::int ) AS amount_bought
, SUM((event_type = 'buy_order')::int ) AS nb_product_bought
, EXTRACT(
EPOCH FROM (
TIMESTAMP '2014-08-01 00:00:00' - MIN(event_timestamp)
)
)/(3600*24)
AS active_time
FROM "BUILDANENDTOENDCHURNPREDICTIONMODEL_events_complete"
WHERE event_timestamp < TIMESTAMP '2014-08-01 00:00:00'
GROUP BY user_id
) features ON train.user_id = features.user_id

And here's what my error look like :

Screen Shot 2020-05-29 at 20.15.17.png

Has anyone encountered the similar problem? Any input, greatly appreciate, thanks in advance.

Cheers,

GLN

Best Answer

  • Alex_Reutter
    Alex_Reutter Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer Posts: 105 ✭✭✭✭✭✭✭
    edited July 17 Answer ✓

    Hi @gerryleonugroho
    !

    The code is looking for the table "train" and can't find a FROM clause with "train" in it, and that's what's causing the error.

    The best way to handle this is to add an alias to your FROM clause

    FROM "BUILDANENDTOENDCHURNPREDICTIONMODEL_train" AS train

Answers

Setup Info
    Tags
      Help me…