Stuck on the Use Case: Churn Prediction
Hi, so I'm following along the Use Churn Prediction found here. While during couple of the initial steps the SQL went great, but I'm hitting a roadblock while trying to figure out the SQL for the modeling part found here. The problem with mine, it wouldn't get executed properly. Have tried many ways to make it work, but just couldn't understand what went wrong with them.
Here's the snippet from the Use Case:
SELECT train.* , nb_products_seen , nb_distinct_product , nb_dist_0 , nb_dist_1, nb_dist_2 , amount_bought , nb_product_bought , active_time FROM train LEFT JOIN ( -- generate features based on past data SELECT user_id , COUNT(product_id) AS nb_products_seen , COUNT(distinct product_id) AS nb_distinct_product , COUNT(distinct category_id_0) AS nb_dist_0 , COUNT(distinct category_id_1) AS nb_dist_1 , COUNT(distinct category_id_2) AS nb_dist_2 , SUM(price::numeric * (event_type = 'buy_order')::int ) AS amount_bought , SUM((event_type = 'buy_order')::int ) AS nb_product_bought , EXTRACT( EPOCH FROM ( TIMESTAMP '2014-08-01 00:00:00' - MIN(event_timestamp) ) )/(3600*24) AS active_time FROM events_complete WHERE event_timestamp < TIMESTAMP '2014-08-01 00:00:00' GROUP BY user_id ) features ON train.user_id = features.user_id
And here's my updated version to comply with my local PgSQL connection data credentials:
SELECT train.* , nb_products_seen , nb_distinct_product , nb_dist_0 , nb_dist_1, nb_dist_2 , amount_bought , nb_product_bought , active_time FROM "BUILDANENDTOENDCHURNPREDICTIONMODEL_train" LEFT JOIN ( -- generate features based on past data SELECT user_id , COUNT(product_id) AS nb_products_seen , COUNT(distinct product_id) AS nb_distinct_product , COUNT(distinct category_id_0) AS nb_dist_0 , COUNT(distinct category_id_1) AS nb_dist_1 , COUNT(distinct category_id_2) AS nb_dist_2 , SUM(price::numeric * (event_type = 'buy_order')::int ) AS amount_bought , SUM((event_type = 'buy_order')::int ) AS nb_product_bought , EXTRACT( EPOCH FROM ( TIMESTAMP '2014-08-01 00:00:00' - MIN(event_timestamp) ) )/(3600*24) AS active_time FROM "BUILDANENDTOENDCHURNPREDICTIONMODEL_events_complete" WHERE event_timestamp < TIMESTAMP '2014-08-01 00:00:00' GROUP BY user_id ) features ON train.user_id = features.user_id
And here's what my error look like :
Has anyone encountered the similar problem? Any input, greatly appreciate, thanks in advance.
Cheers,
GLN
Best Answer
-
Hi @gerryleonugroho
!The code is looking for the table "train" and can't find a FROM clause with "train" in it, and that's what's causing the error.
The best way to handle this is to add an alias to your FROM clause
FROM "BUILDANENDTOENDCHURNPREDICTIONMODEL_train" AS train
Answers
-
Gerry Leo Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 45 ✭✭✭✭✭
Ah, you're right Alex, how could I miss them. Thanks for pointing that out.
Cheers,
GLN