Stuck on the Use Case: Churn Prediction

Solved!
gerryleonugroho
Level 3
Stuck on the Use Case: Churn Prediction

Hi, so I'm following along the Use Churn Prediction found here. While during couple of the initial steps the SQL went great, but I'm hitting a roadblock while trying to figure out the SQL for the modeling part found here. The problem with mine, it wouldn't get executed properly. Have tried many ways to make it work, but just couldn't understand what went wrong with them. 

Here's the snippet from the Use Case:

SELECT train.*
    , nb_products_seen
    , nb_distinct_product
    , nb_dist_0 , nb_dist_1, nb_dist_2
    , amount_bought , nb_product_bought
    , active_time
FROM train
LEFT JOIN (
    -- generate features based on past data
    SELECT user_id
        , COUNT(product_id) AS nb_products_seen
        , COUNT(distinct product_id) AS nb_distinct_product
        , COUNT(distinct category_id_0) AS nb_dist_0
        , COUNT(distinct category_id_1) AS nb_dist_1
        , COUNT(distinct category_id_2) AS nb_dist_2
        , SUM(price::numeric * (event_type = 'buy_order')::int ) AS amount_bought
        , SUM((event_type = 'buy_order')::int ) AS nb_product_bought
        , EXTRACT(
           EPOCH FROM (
            TIMESTAMP '2014-08-01 00:00:00' - MIN(event_timestamp)
           )
          )/(3600*24)
              AS active_time
    FROM events_complete
    WHERE event_timestamp < TIMESTAMP '2014-08-01 00:00:00'
    GROUP BY user_id
    ) features ON train.user_id = features.user_id

And here's my updated version to comply with my local PgSQL connection data credentials:

SELECT train.*
, nb_products_seen
, nb_distinct_product
, nb_dist_0 , nb_dist_1, nb_dist_2
, amount_bought , nb_product_bought
, active_time
FROM "BUILDANENDTOENDCHURNPREDICTIONMODEL_train"
LEFT JOIN (
-- generate features based on past data
SELECT user_id
, COUNT(product_id) AS nb_products_seen
, COUNT(distinct product_id) AS nb_distinct_product
, COUNT(distinct category_id_0) AS nb_dist_0
, COUNT(distinct category_id_1) AS nb_dist_1
, COUNT(distinct category_id_2) AS nb_dist_2
, SUM(price::numeric * (event_type = 'buy_order')::int ) AS amount_bought
, SUM((event_type = 'buy_order')::int ) AS nb_product_bought
, EXTRACT(
EPOCH FROM (
TIMESTAMP '2014-08-01 00:00:00' - MIN(event_timestamp)
)
)/(3600*24)
AS active_time
FROM "BUILDANENDTOENDCHURNPREDICTIONMODEL_events_complete"
WHERE event_timestamp < TIMESTAMP '2014-08-01 00:00:00'
GROUP BY user_id
) features ON train.user_id = features.user_id

 

And here's what my error look like :

Screen Shot 2020-05-29 at 20.15.17.png

 

Has anyone encountered the similar problem? Any input, greatly appreciate, thanks in advance.

 

Cheers,

GLN

1 Solution
Alex_Reutter
Dataiker Alumni

Hi @gerryleonugroho !

The code is looking for the table "train" and can't find a FROM clause with "train" in it, and that's what's causing the error.   

The best way to handle this is to add an alias to your FROM clause

FROM "BUILDANENDTOENDCHURNPREDICTIONMODEL_train" AS train

 

View solution in original post

2 Replies
Alex_Reutter
Dataiker Alumni

Hi @gerryleonugroho !

The code is looking for the table "train" and can't find a FROM clause with "train" in it, and that's what's causing the error.   

The best way to handle this is to add an alias to your FROM clause

FROM "BUILDANENDTOENDCHURNPREDICTIONMODEL_train" AS train

 

gerryleonugroho
Level 3
Author

Hi @Alex_Reutter 

Ah, you're right Alex, how could I miss them. Thanks for pointing that out.

 

Cheers,

GLN