Discover this year's submissions to the Dataiku Frontrunner Awards and give kudos to your favorite use cases and success stories!READ MORE

Trouble with pivot table

Solved!
wjkelly
Level 3
Trouble with pivot table

Hi,

I'm working with a set of data that has the following schema:

License Number (string) - (we work in a regulated industry and the license number is what we use for a unique customer ID)

Customer Name (string) - (this is the actual name of the store we sell to)

Brand (string) - (we carry multiple brands that stores purchase from us -- we're trying to monitor brand status in each store)

Date Run (date) - this is the field I'm using to create a week by week comparison on brand status within each store

New Brand Status (string) - this is one of four values (Activated, In Activation, Needs Reactivation, Sample) that is calculated in a previous recipe based on recent order history for the brand's products at the store level

New Brand Status Score (int) - this is a score of 1-4 that corresponds to the Brand Status designations, and will be used in a future recipe to quantify change in status. A store that goes from Activated to Needs Reactivation will have a -2 score by the end of this (2 = Needs Reactivation, 4 = Activated; This week's score minus last week's score = 2 - 4 = -2)

I'm trying to get my week to week comparison by pivoting against the Date Run (I've constructed it so that each date is truncated to the day, and run only on a Saturday so the we get a clean week to week comparison, and I'm filtering the data so that by time we get to this pivot step, we're only working with the last two week's of data.)

So, ideally, the resulting table would be in this format:

Store.    |.     Brand.   |.   Two weeks ago brand status and score (two columns).  |   This week's brand status and score (two columns)

One record might look like this (using commas to delimit the fields):

Maritime Cafe, Honu, Activated, 4, Needs Reactivation, 2

 

I have the pivot set up, but every value under the pivot columns is blank. The only aggregate I can get to generate a value is "Count of records".

 

IMPORTANT NOTE: Elsewhere in my flow I have another recipe that similarly uses pivots to organize both text and integer information without any problems. This one, however, is vexing me.

Any ideas? Is it possible that the pivot doesn't like that the pivot value is a date?

Any thoughts would be most welcome.


Operating system used: MacOS Monterey v12.4

0 Kudos
1 Solution
fchataigner2
Dataiker
Dataiker

can you disable the "truncate modalities name" in the Output section of the Pivot recipe? It the output dataset needs short column names, then you should first make the pivot recipe to a filesystem dataset, then sync back to that initial column-length-constrained dataset with a Prepare recipe.

View solution in original post

5 Replies
wjkelly
Level 3
Author

I just tried adding a step to convert the dates into a new column called "Week", and used a formula to make the most recent calculations "This Week" and anything else "Last Week". Now I have a column that contains only two string values (This Week or Last Week).

When I run the pivot on this new column, I'm still getting blanks in all of the value cells in the pivot.

So, it doesn't appear to be choking on the fact that the pivot field contains date values.

Still scratching my head.

0 Kudos
fchataigner2
Dataiker
Dataiker

Hi,

based on what you describe, you have a pivot recipe, with a grouping on customer id, a pivoting on date run, and you selected "count of records" and "max of score" as aggregates. That setup will make as many columns as there are values for the date run column, so I suppose you' filtered the input dataset to keep only the rows for which that date is within the last month (or last 2 weeks). Can you attach screenshots of the recipe's settings, and of the input and output dataset?

In such a setup you should indeed see values, not blanks, so maybe check if the blanks are on the whole dataset, as opposed to "just on the first 10k rows".

0 Kudos
wjkelly
Level 3
Author

Thanks for jumping in here.

The whole data set at this point is right around 4,000 records, so it's pretty easy to verify that none of the records have data where I expect it. Also, there are only two values in the pivot field -- "This Week" and "Last Week". Based on how the data table was set up, there should be data in nearly every cell in the pivot.

Let me show you a couple of screen grabs.

This is the data table that will be pivoted (pre-pivot):

Screen1.jpg

This shows all of the fields, and two records. The "Date_Run" field shows the date the data was created, and truncated to "day" to ensure that every record run on that date is identical. Next to that is the helper field, "Week", where I calculated "This Week" or "Last Week" as the pivot. I tried the pivot using both the "Date_Run" field and the "Week" field, both with the same result.

Here's a screen of the setup for the pivot:

Screen2.jpg

This one has the text string field "new_brand_status" presented with "concat" (even though there will only be one value per record). I've tried "First" as well. Both yield no data. 

Also, with the int field "new_brand_status_score" I've tried Sum (again, only one record, so sum will be equal to that one record's value), Max, Avg -- all yield no data.

Finally, here's the resulting pivot table:

Screen3.jpg

This shows the row that corresponds to the data I focused on in the first screen -- in this store, there were two records for the Bonsai brand status. This table did what I wanted it to -- created a record with one store and one brand, but two brand status details -- Last Week and This Week. As I mentioned, there are about 4,000+ records that should show up on the pivot table and 99% of them should have data in every cell. 

Here's another speculation about what might be going on -- the dataset that is feeding the pivot recipe is set to "Append instead of overwrite" -- that's my mechanism for gathering updated information while preserving the previous week's data -- I append the updated calculations to the dataset, filter so I'm only looking at two weeks' worth, then run the pivot. The resulting table is supposed to give me the mechanism to do two things:

  1. Create a new text string that shows the movement of that brand in that store. Something like "In Activation > Activated", or "Activated > Needs Reactivation". If the two data points are the same, I'll instead push "No change" into the field.
  2. Calculate a score for movement on that brand in that store -- new_brand_status_score from this week minus last weeks will yield the number. This number will eventually feed a store health score that we'll use to prioritize store outreach among the sales force.

There are other similar pivot recipes in this project that work fine utilizing exactly the same processes I've described here. This, however, is the only dataset that uses that "append, don't overwrite" setting. Not sure if that is a reasonable thing to look into, but I'm really perplexed about what could be different, and that's certainly something that's different.

Thanks again for the interest in the problem and for taking a look.

0 Kudos
fchataigner2
Dataiker
Dataiker

can you disable the "truncate modalities name" in the Output section of the Pivot recipe? It the output dataset needs short column names, then you should first make the pivot recipe to a filesystem dataset, then sync back to that initial column-length-constrained dataset with a Prepare recipe.

wjkelly
Level 3
Author

OK -- that worked! Amazing! Thank you.

I'm not sure why it worked, but I'm glad that it did.