What am I doing wrong with a simple date sort in the prepare recipe?

gvas
gvas Registered Posts: 2 ✭✭

I have dates that are not parsed. The data starts with a group of dates from 2015-01-01 to 2024-12-01, and that group repeats for each categorical variable. I try to sort the date column as is (as a string) and nothing happens. I parse the date and get a bunch of minute/second junk at the end (which I'll never want, by the way). I then try to sort the parsed column, expecting to see all the dates line up (all the 2015-01s together, followed by all the 2015-02s, etc.), but nothing happens.

I'm only having to go through all this because of the strict requirements of doing a pivot where the data has to be in this format, which is also very confusing.

What am I missing on the sorting piece of the Prepare recipe? I go to the documentation for the sort function in the recipe and it gives me this: "This processor sorts an array (written in JSON)."

Thanks.

Operating system used: Windows 10

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023, Circle Member Posts: 2,590 Neuron

    I think you are confusing the output preview of the Prepare recipe with how the data is stored in your data store layer. These are two distinct things. Then you have the dataset Explore tab, which can also be customised to sort data. Sorting in both the output preview of the Prepare recipe and the dataset Explore tab is only for display purposes and does not affect how the data is stored. You should also keep in mind that the sort options in this case are based on data samples, not on the whole dataset.

    Some data store technologies do not allow data to be sorted. For instance a CSV file doesn't support sorting although it does have a predefined order. SQL databases do not guarantee the order of the data unless you use the ORDER BY clause. In general you shouldn't need to sort data to produce an output. Even a pivot table can be calculated without sorting.
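
    To illustrate the point about pivoting without sorting, here is a minimal sketch in plain Python (not a Dataiku API; the rows and the `pivot` helper are made up for illustration). It builds the wide table by looking up keys, so the order of the input rows does not matter:

```python
from collections import defaultdict

# Hypothetical long-format rows: (date, category, value), deliberately unsorted.
rows = [
    ("2015-02-01", "B", 4),
    ("2015-01-01", "A", 1),
    ("2015-02-01", "A", 3),
    ("2015-01-01", "B", 2),
]


def pivot(long_rows):
    """Turn (date, category, value) rows into {date: {category: value}}."""
    wide = defaultdict(dict)
    for date, category, value in long_rows:
        # Each value is placed by its key, not by its position in the input.
        wide[date][category] = value
    return {date: dict(columns) for date, columns in wide.items()}


wide = pivot(rows)
wide_reversed = pivot(list(reversed(rows)))
print(wide == wide_reversed)
```

    Both calls produce the same table because each value is placed by its (date, category) key rather than by where the row sits in the input.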

    So my advice is to produce the output data you want first, and then use the Sort recipe to see it ordered the way you want.
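
    As a rough illustration of sorting as a separate final step over the full data, here is a plain Python sketch (it is not the Sort recipe itself; the rows, the `parse` helper and the yyyy-MM-dd format are assumptions based on the dates in the question):

```python
from datetime import datetime

# Hypothetical rows: the same dates repeat for each category, as in the question.
rows = [
    ("2015-01-01", "A", 10),
    ("2015-02-01", "A", 11),
    ("2015-01-01", "B", 20),
    ("2015-02-01", "B", 21),
]


def parse(date_text):
    # Assumes ISO-style yyyy-MM-dd strings.
    return datetime.strptime(date_text, "%Y-%m-%d").date()


# Sort every row by parsed date first, then by category.
ordered = sorted(rows, key=lambda row: (parse(row[0]), row[1]))

# Keep only year and month for display, dropping any time component.
months = [parse(row[0]).strftime("%Y-%m") for row in ordered]
print(months)
```

    After the sort, all the 2015-01 rows sit together, followed by the 2015-02 rows, regardless of category.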

  • gvas
    gvas Registered Posts: 2 ✭✭

    Thanks for your reply. To be honest this is super confusing:

    "I think you are confusing the output preview of the Prepare recipe with how the data is stored in your data store layer. These are two distinct things. Then you have the dataset Explore tab that can also be customised to sort data. Both the output preview of the Prepare recipe and the dataset Explore tab sorting is only for display purposes and does not affect how the data is stored. You should also keep in mind that the sort options on this case are based on data samples, not on the whole dataset."

    If I use something like Power Query (Get and Transform) in Excel, or write a stored procedure in SQL and show an output or stage it, it shows me the order correctly with a preview. Why doesn't Dataiku do this?

    "Even a pivot table can be calculated without sorting."

    It can? But then there is this from the pivot documentation, which expects a date sort in this manner:

    "Example of OK input:

    idx1 label1 v1
    idx1 label2 v2
    idx2 label1 v3
    Example of not OK input:

    idx1 label1 v1
    idx2 label1 v3
    idx1 label2 v2"

    When I try to do it the other way (the not OK way) I get this weird result where the data never lines up in an array, and I have no idea why anyone would ever want it any other way than a normalized array.
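
    One way to picture why the row order matters for this pivot: if a processor works through rows one at a time and only merges neighbouring rows that share an index, rows for the same index that are split apart end up as separate output rows. The sketch below is an assumption about that behaviour in plain Python, not Dataiku's actual implementation; `streaming_pivot` is a hypothetical helper.

```python
from itertools import groupby


def streaming_pivot(rows):
    """Pivot (index, label, value) rows, merging only consecutive rows per index."""
    output = []
    # groupby starts a new group every time the index changes from one row to the next.
    for index, group in groupby(rows, key=lambda row: row[0]):
        record = {"index": index}
        for _, label, value in group:
            record[label] = value
        output.append(record)
    return output


ok_input = [
    ("idx1", "label1", "v1"),
    ("idx1", "label2", "v2"),
    ("idx2", "label1", "v3"),
]
not_ok_input = [
    ("idx1", "label1", "v1"),
    ("idx2", "label1", "v3"),
    ("idx1", "label2", "v2"),
]

ok_result = streaming_pivot(ok_input)
not_ok_result = streaming_pivot(not_ok_input)
print(len(ok_result), len(not_ok_result))
```

    With the OK input, idx1 comes out as a single row holding both labels. With the not OK input, idx1 appears twice, each time with only one label filled in, which matches the kind of result where the data never lines up.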

    "So my advice will be that you produce the data in the output you want and then use the Sort recipe to see it ordered in the way you want."

    Then what is the point of the sort in the Prepare recipe? A lot of people are going to try to use that and waste time getting useless results, like I did for an hour.

    None of my frustration is directed at you; I know you are trying to help! But I just have a lot of questions on how to do some rather basic ETL here and what a reasonable person would expect to see.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023, Circle Member Posts: 2,590 Neuron

    I understand your frustrations, but in general I would say that you shouldn't expect one tool to behave the way other tools behave. Dataiku is not an ETL tool, so while it can do ETL you shouldn't expect it to behave like one. Its main purpose is to be a machine learning platform. The sort in the Prepare recipe, along with the sort in the dataset Explore tab, is simply there to assist in designing the recipe and testing the outputs on small data samples. You should not rely on them for data analysis. There are many ways to do data analysis in Dataiku:
