Code fails to generate the correct table column

webzest Registered Posts: 12 ✭✭✭✭
edited July 16 in Using Dataiku

Hello,

I am following the Python tutorial and have reached the step that creates a table. In the notebook, the code works and generates a three-column table that includes a customer_id column. When I switch to the code version and run the code, the resulting table does not have the customer_id column, which then causes an error in the next step of the tutorial. Below is the code that generates the table correctly in the notebook but does not generate the customer_id column when run as code.

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
orders = dataiku.Dataset("orders")
orders_df = orders.get_dataframe()


# Compute recipe outputs from inputs
# TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
# NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.

#orders_by_customer_df = orders_df # For this sample code, simply copy input to output

orders_by_customer_df = orders_df.assign(total=orders_df.tshirt_price*orders_df.tshirt_quantity
       ).groupby(by="customer_id"
                ).agg({"pages_visited":"mean",
                       "total":"sum"})


# Write recipe outputs
orders_by_customer = dataiku.Dataset("orders_by_customer")
orders_by_customer.write_with_schema(orders_by_customer_df)

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
orders_by_customer = orders_df.assign(total=orders_df.tshirt_price*orders_df.tshirt_quantity
       ).groupby(by="customer_id"
                ).agg({"pages_visited":"mean",
                       "total":"sum"}).reset_index()

Best Answer

  • Sergey
    Sergey Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 365 Dataiker
    Answer ✓

    Hi,

    Your first group-by has no .reset_index(), and it is the one followed by the write to the output dataset. The second group-by, which correctly uses .reset_index(), only does the grouping and never writes anything.

    You just need to add .reset_index() to the first group-by and delete the second group-by.
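
To see why the column goes missing, here is a minimal pandas-only sketch. The sample rows are made up for illustration and stand in for the tutorial's orders dataset; the dataiku read and write calls are left out so it runs anywhere. After the group-by, customer_id is the index of the result rather than a regular column, which is presumably why it is absent from the written dataset; .reset_index() turns it back into a column.

```python
import pandas as pd

# Hypothetical sample rows standing in for the tutorial's "orders" dataset
orders_df = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "pages_visited": [2, 4, 6],
    "tshirt_price": [10.0, 20.0, 5.0],
    "tshirt_quantity": [1, 2, 3],
})

# Same aggregation as in the question, without .reset_index()
grouped = (
    orders_df
    .assign(total=orders_df.tshirt_price * orders_df.tshirt_quantity)
    .groupby(by="customer_id")
    .agg({"pages_visited": "mean", "total": "sum"})
)

# customer_id is now the index, so it is not among the columns
print(list(grouped.columns))

# reset_index() moves customer_id back into a regular column
orders_by_customer_df = grouped.reset_index()
print(list(orders_by_customer_df.columns))
```

In the recipe, orders_by_customer_df built this way is what would be passed to orders_by_customer.write_with_schema(...).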
