How to know which data are used in a flow

Jean-Sébastien
Jean-Sébastien Registered Posts: 2

I would like to know which data is used in a flow.
For example, my input dataset contains 10 data items but only 4 are needed for the calculations; I want to be able to identify the 6 "useless" ones.
Do you have any idea how to do this?
Thanks

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023, Circle Member Posts: 2,590 Neuron

    I think by data you really mean rows. Do your rows of data have a unique identifier? If so, you can use it to identify which rows are being used and which are not. If not, you could use the Window recipe to give every row a random ID and use that to track where the rows are used. Where is your dataset stored?
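To illustrate the ID-based tracking idea outside of DSS, here is a minimal Python sketch. It is not the Window recipe itself: the function names (`tag_rows`, `split_by_usage`), the `row_id` column, and the sample rows are all hypothetical, and the "flow" is just a stand-in filter. The point is only that once every input row carries a unique ID, the rows that survive to the output can be separated from those that do not.

```python
import uuid


def tag_rows(rows):
    """Return copies of the rows with a random unique 'row_id' added (hypothetical column name)."""
    return [{**row, "row_id": str(uuid.uuid4())} for row in rows]


def split_by_usage(tagged_input, flow_output):
    """Split the tagged input rows into those whose ID appears in the output and those whose ID does not."""
    surviving_ids = {row["row_id"] for row in flow_output}
    used = [row for row in tagged_input if row["row_id"] in surviving_ids]
    unused = [row for row in tagged_input if row["row_id"] not in surviving_ids]
    return used, unused


# Sample data, invented for illustration only.
tagged = tag_rows([{"value": 5}, {"value": -1}, {"value": 8}])

# Stand-in for the flow: keep only rows with a positive value.
output = [row for row in tagged if row["value"] > 0]

used, unused = split_by_usage(tagged, output)
print(len(used), len(unused))
```

In a real flow the ID column would have to be carried through every recipe for this comparison to work.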

  • Jean-Sébastien
    Jean-Sébastien Registered Posts: 2

    Thanks for your answer.

    I'm really talking about columns.
    Example:
    I have a dataset with 5 columns, of which 3 are used in the recipes. I would like to be able to easily identify them and exclude the 2 unused columns.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023, Circle Member Posts: 2,590 Neuron

    In v12 there isn't much you can do, but v13 has a new Column-level Data Lineage feature, which is another good reason to upgrade:

    https://doc.dataiku.com/dss/latest/data-catalog/data-lineage/index.html
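
Until an upgrade is possible, one manual workaround is to write down which columns each recipe reads and compare that against the input schema. The sketch below is plain Python, not a DSS API: the function `unused_columns`, the column names, and the recipe names are all hypothetical, and the per-recipe column lists are assumed to have been collected by hand.

```python
def unused_columns(input_columns, used_by_recipe):
    """Return the input columns that no recipe references, in their original order."""
    used = set()
    for columns in used_by_recipe.values():
        used.update(columns)
    return [column for column in input_columns if column not in used]


# Hypothetical input schema with 5 columns.
input_columns = ["id", "name", "price", "qty", "comment"]

# Hypothetical, manually collected list of the columns each recipe reads.
used_by_recipe = {
    "compute_total": ["price", "qty"],
    "join_customers": ["id"],
}

unused = unused_columns(input_columns, used_by_recipe)
print(unused)  # ['name', 'comment']
```

The result is only as reliable as the hand-collected lists, which is why built-in column-level lineage is the better option when available.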
