Hello,
I am following the Python tutorial and have reached the step that creates a table. In the notebook, the code works and generates a three-column table that includes a customer_id column. When I switch to the code version and run it, the resulting table does not have the customer_id column, which causes an error in the next step of the tutorial. Below is the code that works in the notebook but does not produce the customer_id column when run as code.
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Read recipe inputs
orders = dataiku.Dataset("orders")
orders_df = orders.get_dataframe()
# Compute recipe outputs from inputs
# TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
# NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.
#orders_by_customer_df = orders_df # For this sample code, simply copy input to output
orders_by_customer_df = orders_df.assign(total=orders_df.tshirt_price*orders_df.tshirt_quantity
).groupby(by="customer_id"
).agg({"pages_visited":"mean",
"total":"sum"})
# Write recipe outputs
orders_by_customer = dataiku.Dataset("orders_by_customer")
orders_by_customer.write_with_schema(orders_by_customer_df)
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
orders_by_customer = orders_df.assign(total=orders_df.tshirt_price*orders_df.tshirt_quantity
).groupby(by="customer_id"
).agg({"pages_visited":"mean",
"total":"sum"}).reset_index()
Hi,
Your code contains two group-by statements. The first one has no .reset_index(), and it is the one whose result is written to the output dataset. The second one, which correctly ends with .reset_index(), only computes the grouping and never writes anything.
You just need to add .reset_index() to the first group-by and delete the second group-by.
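To see why the column goes missing, here is a minimal pandas-only sketch. The sample rows are made up for illustration and only reuse the column names from the tutorial code. After the group-by, customer_id becomes the index of the dataframe rather than a regular column, and as far as I can tell the index is not written out as a column of the output dataset. Calling .reset_index() turns it back into a column.

```python
import pandas as pd

# Hypothetical sample data standing in for the tutorial's "orders" dataset
orders_df = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "pages_visited": [2, 4, 6],
    "tshirt_price": [10.0, 20.0, 5.0],
    "tshirt_quantity": [1, 2, 3],
})

# Same group-by as in the question, without .reset_index()
grouped = (
    orders_df.assign(total=orders_df.tshirt_price * orders_df.tshirt_quantity)
    .groupby(by="customer_id")
    .agg({"pages_visited": "mean", "total": "sum"})
)

# customer_id is now the index, so it is not listed among the columns
print(list(grouped.columns))
print(grouped.index.name)

# .reset_index() moves customer_id back into a regular column
fixed = grouped.reset_index()
print(list(fixed.columns))
```

In the recipe, that means the dataframe passed to write_with_schema should be the one built with .reset_index().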
Thank you for clearing this up...