Code fails to generate the correct table column
Hello,
I am following the Python tutorial and have reached the step that creates a table. In the notebook, the code works and generates a three-column table that includes a customer_id column. When I switched to the code version and ran it, the table did not have the customer_id column, which then causes an error in the next step of the tutorial. Below is the code that generates the table correctly in the notebook but does not generate the customer_id column when run as code.
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
orders = dataiku.Dataset("orders")
orders_df = orders.get_dataframe()

# Compute recipe outputs from inputs
# TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
# NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.

#orders_by_customer_df = orders_df # For this sample code, simply copy input to output
orders_by_customer_df = orders_df.assign(total=orders_df.tshirt_price*orders_df.tshirt_quantity
    ).groupby(by="customer_id"
    ).agg({"pages_visited":"mean",
           "total":"sum"})

# Write recipe outputs
orders_by_customer = dataiku.Dataset("orders_by_customer")
orders_by_customer.write_with_schema(orders_by_customer_df)

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
orders_by_customer = orders_df.assign(total=orders_df.tshirt_price*orders_df.tshirt_quantity
    ).groupby(by="customer_id"
    ).agg({"pages_visited":"mean",
           "total":"sum"}).reset_index()
Best Answer
Sergey (Dataiker)
Hi,
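Your code contains two group-by statements. The first one has no .reset_index(), and its result is what gets written to the output dataset. The second one, which correctly calls .reset_index(), only assigns the grouped result to a variable and never writes it anywhere. Without .reset_index(), customer_id becomes the index of the grouped DataFrame rather than a regular column, which is why it is missing from the written table.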
You just need to add .reset_index() to the first group-by and delete the second group-by.
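To see the difference outside of DSS, here is a minimal sketch in plain pandas. The sample rows and the helper name totals_by_customer are made up for illustration and are not part of the tutorial; only the column names and the group-by chain come from the code in the question.

```python
import pandas as pd

# Hypothetical sample rows standing in for the tutorial's "orders" dataset
orders_df = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "pages_visited": [2, 4, 6],
    "tshirt_price": [10.0, 20.0, 5.0],
    "tshirt_quantity": [1, 2, 3],
})


def totals_by_customer(df, keep_key_as_column):
    """Same group-by chain as in the question, with an optional reset_index()."""
    grouped = (
        df.assign(total=df.tshirt_price * df.tshirt_quantity)
          .groupby(by="customer_id")
          .agg({"pages_visited": "mean", "total": "sum"})
    )
    # After groupby, customer_id is the index; reset_index() turns it back into a column
    return grouped.reset_index() if keep_key_as_column else grouped


without_reset = totals_by_customer(orders_df, keep_key_as_column=False)
with_reset = totals_by_customer(orders_df, keep_key_as_column=True)

print(list(without_reset.columns))  # ['pages_visited', 'total']
print(list(with_reset.columns))     # ['customer_id', 'pages_visited', 'total']
```

In the recipe, the DataFrame passed to write_with_schema should be the one built with .reset_index(), so that customer_id is written as a column.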
Answers
Thank you for clearing this up...