
Code fails to generate the correct table column


Hello,

I am following the Python tutorial and have reached the step that creates a table. In the notebook, the code works and generates a three-column table that includes a customer_id column. However, when I switch to the code recipe and run it, the resulting table does not have the customer_id column, which then causes an error in the next step of the tutorial. Below is the code, which produces the correct table in the notebook but not when run as a recipe.

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
orders = dataiku.Dataset("orders")
orders_df = orders.get_dataframe()


# Compute recipe outputs from inputs
# TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
# NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.

#orders_by_customer_df = orders_df # For this sample code, simply copy input to output

orders_by_customer_df = orders_df.assign(total=orders_df.tshirt_price*orders_df.tshirt_quantity
       ).groupby(by="customer_id"
                ).agg({"pages_visited":"mean",
                       "total":"sum"})


# Write recipe outputs
orders_by_customer = dataiku.Dataset("orders_by_customer")
orders_by_customer.write_with_schema(orders_by_customer_df)

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
orders_by_customer = orders_df.assign(total=orders_df.tshirt_price*orders_df.tshirt_quantity
       ).groupby(by="customer_id"
                ).agg({"pages_visited":"mean",
                       "total":"sum"}).reset_index()

 

Dataiker

Hi,

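In your code, the first group-by has no .reset_index(), and it is the one followed by the write to the output dataset. After a group-by without .reset_index(), customer_id is the DataFrame index rather than a regular column, which would explain why it is missing from the written table. The second group-by, which correctly ends with .reset_index(), only does the grouping and its result is never written anywhere.

To fix this, add .reset_index() to the first group-by and delete the second one.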
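
To see the difference outside DSS, here is a minimal pandas-only sketch. The sample rows are hypothetical stand-ins for the tutorial's orders dataset; only the column names come from the code in the question.

```python
import pandas as pd

# Hypothetical sample rows; only the column names match the question's code
orders_df = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "pages_visited": [2, 4, 6],
    "tshirt_price": [10.0, 20.0, 5.0],
    "tshirt_quantity": [1, 2, 3],
})

# Same aggregation as in the question, without .reset_index()
grouped = (
    orders_df.assign(total=orders_df.tshirt_price * orders_df.tshirt_quantity)
    .groupby(by="customer_id")
    .agg({"pages_visited": "mean", "total": "sum"})
)

# customer_id is the index here, so it is not listed among the columns
without_reset = list(grouped.columns)

# .reset_index() moves customer_id back into a regular column
with_reset = list(grouped.reset_index().columns)

print(without_reset)  # ['pages_visited', 'total']
print(with_reset)     # ['customer_id', 'pages_visited', 'total']
```

In the recipe, the same idea applies: the DataFrame passed to write_with_schema should be the one that already has .reset_index() applied.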

Level 2
Author

Thank you for clearing this up...
