Hello,
I am following the Python tutorial and have reached the step that creates a table. In the notebook, the code works and generates a three-column table that includes a customer_id column. When I switched to the code version and ran it, the resulting table does not have the customer_id column, which then causes an error in the next step of the tutorial. Below is the code that generates the table correctly in the notebook but does not generate the customer_id column when run as code.
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Read recipe inputs
orders = dataiku.Dataset("orders")
orders_df = orders.get_dataframe()
# Compute recipe outputs from inputs
# TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
# NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.
#orders_by_customer_df = orders_df # For this sample code, simply copy input to output
orders_by_customer_df = orders_df.assign(total=orders_df.tshirt_price*orders_df.tshirt_quantity
        ).groupby(by="customer_id"
        ).agg({"pages_visited":"mean",
               "total":"sum"})
# Write recipe outputs
orders_by_customer = dataiku.Dataset("orders_by_customer")
orders_by_customer.write_with_schema(orders_by_customer_df)
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
orders_by_customer = orders_df.assign(total=orders_df.tshirt_price*orders_df.tshirt_quantity
        ).groupby(by="customer_id"
        ).agg({"pages_visited":"mean",
               "total":"sum"}).reset_index()
Hi,
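Your code has two group-by statements. The first one, without .reset_index(), is the one followed by the write to the output dataset. The second one, which correctly includes .reset_index(), only does the grouping and never writes anything. Without .reset_index(), customer_id stays in the DataFrame index instead of being a regular column, which is why it is missing from the written table.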
You just need to add .reset_index() to the first group-by and delete the second group-by.
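To see the difference outside of DSS, here is a minimal pandas-only sketch. The sample rows and the aggregate() helper are made up for illustration and are not part of the tutorial; only the column names come from the code above.

```python
import pandas as pd

# Hypothetical sample rows standing in for the tutorial's "orders" dataset
orders_df = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "pages_visited": [2, 4, 6],
    "tshirt_price": [10.0, 20.0, 5.0],
    "tshirt_quantity": [1, 2, 3],
})

def aggregate(df):
    """Same group-by as in the question, without reset_index()."""
    return (
        df.assign(total=df.tshirt_price * df.tshirt_quantity)
        .groupby(by="customer_id")
        .agg({"pages_visited": "mean", "total": "sum"})
    )

# After groupby, customer_id is the index, not a column
without_reset = aggregate(orders_df)

# reset_index() moves customer_id back into a regular column
with_reset = aggregate(orders_df).reset_index()

print(list(without_reset.columns))  # ['pages_visited', 'total']
print(list(with_reset.columns))     # ['customer_id', 'pages_visited', 'total']
```

In the recipe, the DataFrame passed to write_with_schema() should be the one built with .reset_index(), so that customer_id is among its columns.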
Thank you for clearing this up...