I am currently looking at some data (CSV source) with some explanatory columns and columns named 2001_01 to 2015_08. Each row can be identified by a unique identifier (e.g. FOO01). The data has seasonal dependencies, so I am trying to analyze it year over year.
What would be the proper Dataiku way to do this?
For instance, I would like to select one row and plot its values on the Y axis against the months on the X axis, with one line per year.
Then I'll compare data sets: say, divide row FOO1 by row BAR2 and plot the result in the same manner.
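I would first reshape the data so that it has a year column and a month column. The Fold Multiple Columns processor might be helpful: http://doc.dataiku.com/dss/latest/preparation/reshaping.html#fold-multiple-columns
Once the columns are folded, the former column names (e.g. 2001_01) can be split on the underscore to get the year and the month.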
The data might even have been shaped like this before being transformed into columns 2001_01 to 2015_08.
Then it's straightforward to plot the values against one column (month) and break them down by another (year).
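If you prefer to do the reshaping in code (for example in a Python recipe), the same idea can be sketched with pandas. This is only an illustration under assumptions: the identifier column is assumed to be called `id`, the sample values are made up, and the function names `fold_wide_to_long` and `year_over_year` are hypothetical helpers, not part of Dataiku.

```python
import re

import pandas as pd

# Made-up sample in the wide layout described in the question.
# The column name "id" is an assumption; use your real identifier column.
wide = pd.DataFrame({
    "id": ["FOO01", "BAR02"],
    "2001_01": [10.0, 5.0],
    "2001_02": [12.0, 6.0],
    "2002_01": [11.0, 4.0],
    "2002_02": [15.0, 5.0],
})


def fold_wide_to_long(df, id_col="id"):
    """Fold the YYYY_MM columns into rows with year, month and value columns."""
    # Only columns that look like 2001_01 are folded; other columns are ignored.
    period_cols = [c for c in df.columns if re.fullmatch(r"\d{4}_\d{2}", c)]
    long_df = df.melt(id_vars=[id_col], value_vars=period_cols,
                      var_name="period", value_name="value")
    parts = long_df["period"].str.split("_", expand=True)
    long_df["year"] = parts[0].astype(int)
    long_df["month"] = parts[1].astype(int)
    return long_df.drop(columns="period")


def year_over_year(long_df, row_id, id_col="id"):
    """Return a month x year table for a single identifier."""
    one = long_df[long_df[id_col] == row_id]
    return one.pivot(index="month", columns="year", values="value")


long_df = fold_wide_to_long(wide)
foo = year_over_year(long_df, "FOO01")
bar = year_over_year(long_df, "BAR02")

# Element-wise ratio of the two rows, aligned on month and year.
ratio = foo / bar
```

Each resulting table has months as rows and years as columns, so calling `foo.plot()` or `ratio.plot()` (which requires matplotlib) should draw one line per year with the months on the X axis.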