Test drive anyone?

Level 2

I am finalizing a teaching module for a finance class that uses the Google eCommerce data available in BigQuery. I would very much welcome anyone's ideas on how that data could be analyzed within the Dataiku platform, so that I can incorporate anything novel you see into what I've already prepared. A subset of the data is available here. Thanks in advance for any thoughts you might contribute! Once the module is completed, I'll be sure to share it!

Level 5

Hi @phb. I'm not sure I can contribute anything novel. What have you prepared already?

Cheers!

Level 2
Author

I couldn't stop myself from tinkering with it further, and I'm also getting some awesome support from @AdelaD and @DamienJ. I'll be sure to loop back once these latest iterations are completed!

Dataiker

Aw! Great to hear, Perry!

Level 7

@phb 

This is an interesting dataset, and I can see why you have been tinkering with it. It can get really wide: one import I did produced 17,712 columns. So I think you'll want to help students subset the columns.

A lot of columns contain the value "not available in demo dataset". Will you be able to get a non-demo dataset? If not
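One quick way to subset the columns is to drop any column where every value is that placeholder. A minimal pandas sketch, assuming the data has already been flattened into a DataFrame; the helper name `drop_placeholder_columns` and the sample rows are hypothetical:

```python
import pandas as pd

# Placeholder string the demo export uses in place of real values.
PLACEHOLDER = "not available in demo dataset"

def drop_placeholder_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only columns that hold at least one real (non-placeholder) value."""
    # A column is uninformative if every single cell is the placeholder.
    uninformative = [col for col in df.columns if (df[col] == PLACEHOLDER).all()]
    return df.drop(columns=uninformative)

# Tiny hypothetical sample using flattened, GA-style column names.
sample = pd.DataFrame({
    "visitId": [1, 2, 3],
    "geoNetwork.city": ["Mountain View", PLACEHOLDER, "New York"],
    "geoNetwork.latitude": [PLACEHOLDER] * 3,
    "device.browser": ["Chrome", "Safari", "Chrome"],
})

slim = drop_placeholder_columns(sample)
print(list(slim.columns))  # ['visitId', 'geoNetwork.city', 'device.browser']
```

Columns that are only partly filled (like the city column above) are kept, so students still have to decide how to treat the remaining placeholder cells.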

There are some time-series exercises you might have students try. I see the visitStartTime field; you might even have students build a time-series forecast of these online sales. However, this set seems to cover only a single day (2017-08-01).
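Before any forecasting, students need a regular series. A sketch that buckets sessions per hour, assuming visitStartTime holds POSIX seconds in UTC (the rows below are made up, all falling on 2017-08-01):

```python
import pandas as pd

# Hypothetical sessions; visitStartTime assumed to be POSIX seconds (UTC).
sessions = pd.DataFrame({
    "visitStartTime": [1501545600, 1501547400, 1501549200, 1501552800],
})

# Convert epoch seconds to timezone-aware timestamps.
sessions["start"] = pd.to_datetime(sessions["visitStartTime"], unit="s", utc=True)

# Count sessions per 60-minute bucket to get a regular time series.
hourly = sessions.set_index("start").resample("60min").size()
print(hourly.tolist())  # [2, 1, 1]
```

With only one day of data this mostly shows intraday patterns; a real forecast would need a longer date range.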

You can do some basic exploratory data analysis, looking for anomalies in columns whose values seem out of place.
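As one concrete anomaly check, the interquartile-range (IQR) rule flags values far outside the middle 50% of a column. A sketch; the `totals.pageviews` values below are invented, and k = 1.5 is just the conventional default:

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

# Hypothetical page-view counts per session.
pageviews = pd.Series([3, 4, 2, 5, 3, 4, 120, 3], name="totals.pageviews")
flags = iqr_outliers(pageviews)
print(pageviews[flags].tolist())  # [120]
```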

I see some place-name data here, so you might be able to do some mapping. However, in a quick review I don't see any geopoints, which may add some complexity for your students when it comes to geolocation.
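Without geopoints, one workaround is to join place names to a coordinate lookup table. A sketch; `CITY_COORDS` is a hypothetical hand-made table with approximate coordinates, and the visit rows are invented:

```python
import pandas as pd

# Hypothetical lookup table with approximate city coordinates.
CITY_COORDS = pd.DataFrame({
    "city": ["Mountain View", "New York", "London"],
    "latitude": [37.39, 40.71, 51.51],
    "longitude": [-122.08, -74.01, -0.13],
})

visits = pd.DataFrame({
    "visitId": [1, 2, 3, 4],
    "city": ["New York", "Mountain View", "not available in demo dataset", "London"],
})

# A left join keeps every visit; unmatched cities get NaN coordinates.
geo = visits.merge(CITY_COORDS, on="city", how="left")
unmapped = geo[geo["latitude"].isna()]
print(unmapped["city"].tolist())  # ['not available in demo dataset']
```

Counting the unmapped rows gives students a quick measure of how much of the data can actually be mapped.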

You could do some basic NLP on page titles and page descriptions.
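A very basic NLP starting point is term frequency over page titles. A sketch; the stopword list and the example titles are assumptions, not taken from the dataset:

```python
import re
from collections import Counter

# Small hand-picked stopword list (an assumption; use a proper list in practice).
STOPWORDS = {"the", "and", "for", "google", "store"}

def top_terms(titles, n=3):
    """Count lowercase alphabetic tokens (length > 2) across titles."""
    counts = Counter()
    for title in titles:
        tokens = re.findall(r"[a-z]+", title.lower())
        counts.update(t for t in tokens if len(t) > 2 and t not in STOPWORDS)
    return counts.most_common(n)

# Hypothetical page titles.
titles = [
    "Men's T-Shirts | Apparel | Google Store",
    "Women's T-Shirts | Apparel | Google Store",
    "Drinkware | Google Store",
    "Bags | Google Store",
]
print(top_terms(titles, 2))  # [('shirts', 2), ('apparel', 2)]
```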

In my first quick data import, the product prices are coming through in an odd way: they look right-padded with four 0s, and the decimal point seems to belong six places from the right. (There may be import options that handle this better.) If not, DSS can do the needed data cleanup.
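Moving the decimal six places left is the same as dividing by 10^6, which matches how (as far as I know) the Google Analytics BigQuery export stores prices, in "micros". A sketch, for example in a Python recipe, using `Decimal` to avoid floating-point rounding; the helper name is hypothetical:

```python
from decimal import Decimal

MICROS_PER_UNIT = 10 ** 6  # assumed: raw price = price x 10^6

def micros_to_price(raw) -> Decimal:
    """Convert a raw micros value (int or numeric string) to currency units."""
    return Decimal(int(raw)) / MICROS_PER_UNIT

print(micros_to_price("24990000"))  # 24.99
```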

It looks like these might be shopping carts, with several items possible in each cart. You might try some product-mix (market-basket) analysis.
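A first step in market-basket analysis is pair support: the share of carts that contain both items. A sketch with invented carts; extracting real carts from the nested product data is left out:

```python
from collections import Counter
from itertools import combinations

def pair_support(carts):
    """Fraction of carts containing each unordered pair of products."""
    pair_counts = Counter()
    for cart in carts:
        # Deduplicate and sort so (A, B) and (B, A) count as the same pair.
        for pair in combinations(sorted(set(cart)), 2):
            pair_counts[pair] += 1
    n = len(carts)
    return {pair: count / n for pair, count in pair_counts.items()}

# Hypothetical carts.
carts = [
    ["Mug", "Tee", "Sticker"],
    ["Mug", "Tee"],
    ["Bag", "Sticker"],
    ["Mug", "Sticker"],
]
support = pair_support(carts)
print(support[("Mug", "Tee")])  # 0.5
```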

I'm not sure this fits a finance course, but one could also try recommender-system kinds of exercises.
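The simplest recommender is item-to-item co-occurrence ("carts with X also had Y"). A sketch with hypothetical carts and helper names; real systems would normalize for item popularity:

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_cooccurrence(carts):
    """Map each item to a Counter of items seen in the same cart."""
    co = defaultdict(Counter)
    for cart in carts:
        # Sorting makes tie order deterministic across runs.
        for a, b in permutations(sorted(set(cart)), 2):
            co[a][b] += 1
    return co

def recommend(co, item, n=2):
    """Return up to n items most often bought with `item`."""
    return [other for other, _ in co.get(item, Counter()).most_common(n)]

carts = [["Mug", "Tee", "Sticker"], ["Mug", "Tee"], ["Bag", "Sticker"], ["Mug", "Sticker"]]
co = build_cooccurrence(carts)
print(recommend(co, "Tee", 1))  # ['Mug']
```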

This dataset might be a bit big for underpowered laptops; cloud or data-center hosting might solve that.

Interested in what you end up doing with this.

--Tom
Level 1

What format is the dataset?

Level 2
Author

Text file


@cinderUARK 

From what I can see, the dataset appears to be a JSON-style text file.

The JSON interpreter seems to do an OK job.

However, be aware that this file seems to have "nested arrays". You will have to decide how you are going to handle the nested elements. One way is to flatten them into a single record.

[Screenshot: JSON File Import]
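Flattening into a single record can be sketched in plain Python, assuming the file is newline-delimited JSON (one session per line). The `flatten` and `read_flat_records` helpers are hypothetical; list elements get their index in the key, so width grows with the longest array:

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into one flat dict with dotted keys."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)  # list elements keyed by index, e.g. "hits.0"
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat

def read_flat_records(path):
    """Read newline-delimited JSON and flatten each line (assumed format)."""
    with open(path, encoding="utf-8") as fh:
        return [flatten(json.loads(line)) for line in fh if line.strip()]

line = '{"visitId": 1, "totals": {"pageviews": 2}, "hits": [{"page": {"pageTitle": "Home"}}, {"page": {"pageTitle": "Bags"}}]}'
print(flatten(json.loads(line)))
```

If one row per array element suits the analysis better, pandas' `json_normalize` with `record_path` is the usual alternative.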

--Tom