Features design in visual analysis

chocolatecake Registered Posts: 1 ✭✭✭

Please let me know if you have answers to the following two questions.


1. How can I refer to the numerical value of the dummy data
generated on the [Feature generation] page?

[Screenshot: キャプチャ.PNG]

2. Is the dummy data calculated on the [Feature generation] page treated as a single feature? The calculated dummy data does not appear on the Features handling page.
Is it possible to display the dummy data on that page?

Answers

  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    Hi @chocolatecake! Can you provide any further details in this thread to help users find a solution (for example, your DSS version)? Also, can you let us know whether you've already tried any fixes? This should lead to a quicker response from the community.

  • MehdiH
    MehdiH Dataiker, Dataiku DSS Core Designer, Dataiku DSS Core Concepts Posts: 21 Dataiker

    Hi @chocolatecake

    Thank you for reaching out.

    > 1. How can I refer to the numerical value of the dummy data
    generated on the [Feature generation] page?

    The features generated in the [Feature generation] tab are only accessible internally, by the DSS preprocessing pipeline, during training or scoring. Hence, you cannot access the dummy-encoded categorical features in the "Explicit pairwise interaction" case.

    > 2. Is the dummy data calculated on the [Feature generation] page treated as a single feature? The calculated dummy data does not appear on the Features handling page.
    Is it possible to display the dummy data on that page?

    The generated features are handled separately from the [Features handling] tab, so, as mentioned above, the dummy data cannot be displayed there.
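Editor's note: although the DSS-internal values cannot be displayed, the general idea of dummy encoding followed by a pairwise interaction can be reproduced outside DSS to see what such features look like. The following is a minimal sketch using pandas, not the DSS implementation; the column names (`color`, `size`), the sample rows, and the `a*b` naming scheme are all hypothetical.

```python
import pandas as pd

# Hypothetical toy data with two categorical columns.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "M", "S"],
})

# Dummy (one-hot) encode each categorical column separately.
color_d = pd.get_dummies(df["color"], prefix="color", dtype=int)
size_d = pd.get_dummies(df["size"], prefix="size", dtype=int)

# Pairwise interaction: one column per (color, size) pair,
# equal to the product of the two dummy columns.
interactions = pd.DataFrame({
    f"{c}*{s}": color_d[c] * size_d[s]
    for c in color_d.columns
    for s in size_d.columns
})

# All generated features side by side, ready to inspect.
features = pd.concat([color_d, size_d, interactions], axis=1)
print(features)
```

Each row has exactly one non-zero interaction column, namely the one matching that row's (color, size) combination. DSS may name, order, or filter the generated columns differently.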

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    edited July 17

    @MehdiH,

    I have a curiosity question, for an advanced user: if one generates a notebook from a model result and then runs the individual cells in the notebook manually, can't one see the features as they will be submitted to scikit-learn?

    [Screenshot: Exporting Model Notebook.jpg]

    The notebook appears to contain all or most of the encoding steps.

    However, I do notice the following warning in the notebook.

    Warning
    The goal of this notebook is to provide an easily readable and explainable
    code that reproduces the main steps of training the model. It is not complete:
    some of the preprocessing done by the DSS visual machine learning is not
    replicated in this notebook. This notebook will not give the same results
    and model performance as the DSS visual machine learning model.

    How different are these notebooks from the actual processing that DSS does to our data?

    And what about the exporting of the model documentation? Is that incomplete and in what ways?

  • MehdiH
    MehdiH Dataiker, Dataiku DSS Core Designer, Dataiku DSS Core Concepts Posts: 21 Dataiker

    Hi @tgb417

    You're right in the sense that the notebook aims to be a standalone version of the model's preprocessing/training pipeline. It can be run outside of DSS, as long as the required packages, such as scikit-learn, are installed.

    However, as you noticed in the disclaimer, it is not an exact copy of the Visual ML model: first, it is only available for the Python (in-memory) backend, and second, not all preprocessing steps are available for export (e.g. feature hashing for categorical features, or quantization and binarization for numerical features). If your goal is to have an exact replica of the DSS model, then the notebook export is not the best solution (see the documentation on how to export a model for other options).

    As for the model documentation, the purpose is quite different: it provides complete documentation of how the DSS Visual ML model was trained, in order to show that you followed industry best practices when building your model. Consequently, it is directly linked to the Visual ML model design and tries to be as complete as possible. More precisely, the Model Document Generator documentation states that it provides information on:

    • What the model does

    • How the model was built (algorithms, features, processing, …)

    • How the model was tuned

    • What are the model’s performances

    Cheers

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron

    @MehdiH

    Thanks for the information and transparency.

    --Tom
