Partition - Discrete dimension example (along a column)

n0thing233
n0thing233 Registered Posts: 13 ✭✭✭✭

I have a file-based dataset (CSV format). I want to partition this dataset based on the value of a column (the column takes 5 values: 0, 1, 2, 3, 4).

I followed the tutorial but cannot partition it.

My column name is 'partition'. I clicked on "add discrete dimension", then entered "partition" in the box, and it generated the pattern "%{partition}/.*"

But when I clicked the "list partitions" button, it showed me "

Detected 0 partitions

    Found 1 unmatched file:

    • /out-s0.csv

    "

    Can anyone help me?


    Best Answer

    • Mattsco
      Mattsco Dataiker, Registered Posts: 125 Dataiker
      Answer ✓

      Hi,

      Your dataset is not yet partitioned. You need to rebuild it to see the generated partitions.
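
To illustrate why the listing reports 0 partitions, here is a minimal Python sketch of how a pattern like "%{partition}/.*" can be matched against file paths. It is not Dataiku's actual code: the helper names `pattern_to_regex` and `list_partitions` are hypothetical, and it assumes that each `%{name}` placeholder stands for exactly one folder level and that the rest of the pattern is a regular expression.

```python
import re


def pattern_to_regex(pattern):
    """Compile a pattern such as '%{partition}/.*' into a regex.

    Assumption (not taken from Dataiku's code): every %{name} placeholder
    matches exactly one path segment, and the rest is already regex syntax.
    """
    body = re.sub(r"%\{(\w+)\}", r"(?P<\1>[^/]+)", pattern)
    return re.compile("^/" + body + "$")


def list_partitions(pattern, paths, dimension="partition"):
    """Return (sorted partition values, list of unmatched paths)."""
    regex = pattern_to_regex(pattern)
    partitions = set()
    unmatched = []
    for path in paths:
        match = regex.match(path)
        if match:
            partitions.add(match.group(dimension))
        else:
            unmatched.append(path)
    return sorted(partitions), unmatched


# A single file at the root has no folder level to read the value from.
print(list_partitions("%{partition}/.*", ["/out-s0.csv"]))
# -> ([], ['/out-s0.csv'])

# Once the files sit in one folder per value, the values are detected.
print(list_partitions("%{partition}/.*", ["/0/out-s0.csv", "/1/out-s0.csv"]))
# -> (['0', '1'], [])
```

Under these assumptions, "/out-s0.csv" cannot match because the pattern expects the partition value as a folder name in front of the file, which is why the dataset has to be rebuilt into that layout first.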

      To generate this partitioned dataset, the parent recipe should be a Sync recipe (Configuration tab) or a Prepare recipe (Advanced tab) with the redispatch partitioning option activated.
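
Conceptually, redispatching reads the unpartitioned input and writes each row into the partition named by its column value. The sketch below mimics that idea outside of Dataiku with the Python standard library. The function name `redispatch` and the output layout (one folder per value, matching "%{partition}/.*") are assumptions for illustration only; the real recipe may differ, for example in whether the partition column is kept in the output files.

```python
import csv
from collections import defaultdict
from pathlib import Path


def redispatch(src, out_root, column="partition"):
    """Split the CSV file `src` into out_root/<value>/<file name>.

    Hypothetical helper: one output file per distinct value of `column`.
    The column is kept in the output here, which is an arbitrary choice.
    Returns the sorted list of partition values that were written.
    """
    src = Path(src)
    out_root = Path(out_root)

    # Group the rows by the value of the partitioning column.
    groups = defaultdict(list)
    with src.open(newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames
        for row in reader:
            groups[row[column]].append(row)

    # Write one folder per value, each holding a CSV with the same header.
    for value, rows in groups.items():
        target = out_root / value / src.name
        target.parent.mkdir(parents=True, exist_ok=True)
        with target.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=header)
            writer.writeheader()
            writer.writerows(rows)

    return sorted(groups)
```

For example, `redispatch("out-s0.csv", "partitioned")` on a file whose 'partition' column holds the values 0 to 4 would create the folders `partitioned/0` through `partitioned/4`, each containing its own `out-s0.csv`.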



      [Image: Redispatch setting]

      Matt

    Answers

    • n0thing233
      n0thing233 Registered Posts: 13 ✭✭✭✭
      Sorry, I don't see "redispatch partitioning according to input columns" in my Advanced tab. What could be the reason? My settings tell me "No settings required".
    • Mattsco
      Mattsco Dataiker, Registered Posts: 125 Dataiker
      To see it, you need to have the output dataset of the recipe partitioned.
      I added a picture on the main answer.
    • n0thing233
      n0thing233 Registered Posts: 13 ✭✭✭✭
      Sorry if these are stupid questions, but the picture you showed me is different from my Dataiku.
      In my sync recipe -> advanced -> settings there is no checkbox for "redispatch partitioning according to input columns". I would really like to attach my screenshot here. How should I do that?
    • Mattsco
      Mattsco Dataiker, Registered Posts: 125 Dataiker
      Sorry, in the sync recipe it's in the Configuration tab.
    • n0thing233
      n0thing233 Registered Posts: 13 ✭✭✭✭
      In the sync -> configuration tab -> settings, I have only two options: "Free output schema (name-based matching)" and "Maintain strict schema equality". There is still no "redispatch partitioning according to input columns".
    • Mattsco
      Mattsco Dataiker, Registered Posts: 125 Dataiker
      In the sync -> configuration tab -> output, can you confirm that the dataset is partitioned by something?
      You should see that mentioned below the name of the dataset.
    • n0thing233
      n0thing233 Registered Posts: 13 ✭✭✭✭
      Thanks. I finally made some progress. I was able to partition the dataset.
    • Valengo
      Valengo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 2 ✭✭✭✭

      Hello @n0thing233, I'm in the same situation as you, and I'm really interested in how you did the partitioning in the sync recipe. Could you please explain the steps to follow?

      Thank you very much in advance.
