Features design in visual analysis
Please let me know if you have answers to the following two questions.
1. How can I refer to the numerical value of the dummy data generated on the [Feature generation] page?
2. Is the dummy data calculated on the [Feature generation] page treated as a feature? The calculated dummy data does not appear on the [Features handling] page.
Is it possible to display the dummy data on this page?
Answers
-
CoreyS
Hi @chocolatecake! Can you provide any further details on the thread to help users find a solution (for example, your DSS version)? Also, can you let us know whether you've already tried any fixes? This should lead to a quicker response from the community.
-
Thank you for reaching out.
> 1. How can I refer to the numerical value of the dummy data generated on the [Feature generation] page?
The features generated in the [Feature generation] tab are only accessible internally, by the DSS preprocessing pipeline during training or scoring. Hence you cannot access the dummy-encoded categorical features in the "Explicit pairwise interaction" case.
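Although DSS keeps these values internal, you can recompute an approximation yourself (for example in a Python notebook) if you just want to look at the numbers. The sketch below assumes that an explicit interaction between a numerical and a categorical feature is the numerical value multiplied by each dummy (0/1) indicator of the categorical feature. That assumption, the function name `numeric_categorical_interaction` and the column names are illustrative only; DSS's internal encoding (e.g. how it treats rare or unseen categories) may differ.

```python
# Hypothetical sketch: approximates a numerical x categorical interaction by
# multiplying the numerical value by each dummy (0/1) indicator of the
# categorical feature. Names are illustrative, not part of any DSS API.


def numeric_categorical_interaction(rows, num_col, cat_col):
    """Return one dict of interaction columns per input row."""
    # Sorted list of the categories observed in the data
    categories = sorted({row[cat_col] for row in rows})
    result = []
    for row in rows:
        features = {}
        for cat in categories:
            dummy = 1 if row[cat_col] == cat else 0
            features[f"{num_col}:{cat_col}={cat}"] = row[num_col] * dummy
        result.append(features)
    return result


rows = [
    {"price": 10.0, "color": "red"},
    {"price": 4.5, "color": "blue"},
    {"price": 7.0, "color": "red"},
]

interactions = numeric_categorical_interaction(rows, "price", "color")
for features in interactions:
    print(features)
```

Each row gets one column per category; only the column matching the row's own category carries the numerical value.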
> 2. Is the dummy data calculated on the [Feature generation] page treated as a feature? The calculated dummy data does not appear on the [Features handling] page.
> Is it possible to display the dummy data on this page?
The generated features are kept separate from the [Features handling] tab, so they are not displayed there and, as mentioned above, the dummy data cannot be accessed from that tab.
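One possible workaround, if you need such columns to be visible in [Features handling], is to create them as regular columns upstream of the model, for example in a Python recipe, so that they are part of the input dataset. A minimal pandas sketch, assuming an interaction between two categorical features is represented as one dummy column per observed combination of values; `add_pairwise_dummies` and the column-naming scheme are hypothetical, and reading and writing the DSS datasets is left out.

```python
import pandas as pd


def add_pairwise_dummies(df, col_a, col_b):
    """Return a copy of df with one 0/1 column per observed (col_a, col_b) pair.

    Hypothetical helper for illustration; not a DSS function.
    """
    # Combine the two categorical values into a single label, e.g. "red_S"
    combined = df[col_a].astype(str) + "_" + df[col_b].astype(str)
    # One integer 0/1 column per distinct combined label
    dummies = pd.get_dummies(combined, prefix=f"{col_a}x{col_b}", dtype=int)
    return pd.concat([df, dummies], axis=1)


df = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "S", "L"]})
out = add_pairwise_dummies(df, "color", "size")
print(out)
```

Columns built this way are ordinary inputs, so they would be listed and handled like any other numerical column rather than produced by [Feature generation].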
-
tgb417
@MehdiH, I have a curiosity question, for an advanced user. If one generates a notebook from a model result and then runs the individual cells in the notebook manually, can't one see the features as they will be submitted to scikit-learn?
The notebook appears to contain all or most of the encoding steps.
However, I do notice the following warning in the notebook:
> Warning
> The goal of this notebook is to provide an easily readable and explainable code that reproduces the main steps of training the model. It is not complete: some of the preprocessing done by the DSS visual machine learning is not replicated in this notebook. This notebook will not give the same results and model performance as the DSS visual machine learning model.
How different are these notebooks from the actual processing that DSS does to our data?
And what about exporting the model documentation? Is that incomplete, and if so, in what ways?
-
Hi @tgb417
You're right in the sense that the notebook aims to be a standalone version of the model's preprocessing/training pipeline. It can be run outside of DSS, as long as the required packages, such as scikit-learn, are installed, of course.
However, as you noticed in the disclaimer, it is not an exact copy of the Visual ML model. It is only available for the Python (in-memory) backend, and not all preprocessing steps can be exported (e.g. feature hashing for categorical features, quantization and binarization for numerical features ...). If your goal is to have an exact replica of the DSS model, then the notebook export is not the best solution (see the documentation on how to export a model for other options).
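To make the non-exported preprocessing steps more concrete, scikit-learn offers rough counterparts: `FeatureHasher` for hashing categories, `KBinsDiscretizer` for quantization and `Binarizer` for binarization. These are analogues chosen for illustration, not DSS's implementation; the number of hash buckets, the bin count and the threshold below are assumptions. Differences in exactly such details are one reason a hand-rebuilt pipeline may not reproduce the DSS model.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher
from sklearn.preprocessing import Binarizer, KBinsDiscretizer

# Feature hashing: map each category string into a fixed number of columns
# (8 buckets is an arbitrary choice for this sketch)
hasher = FeatureHasher(n_features=8, input_type="string")
hashed = hasher.transform([["red"], ["blue"], ["red"]]).toarray()

# Quantization: replace each value by the index of its quantile bin
values = np.array([[1.0], [2.0], [3.0], [4.0]])
quantizer = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="quantile")
bins = quantizer.fit_transform(values)

# Binarization: 1 if the value is strictly above the threshold, else 0
binary = Binarizer(threshold=2.5).fit_transform(values)

print(hashed)
print(bins.ravel())
print(binary.ravel())
```

Identical categories always hash to the same column, so rows 0 and 2 of `hashed` are equal.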
As for the model documentation, the purpose is quite different: it provides complete documentation of how the DSS Visual ML model was trained, in order to demonstrate that you followed industry best practices when building your model. Consequently, it is directly linked to the Visual ML model design and aims to be as complete as possible. More precisely, the Model Document Generator documentation states that it provides information regarding:
What the model does
How the model was built (algorithms, features, processing, …)
How the model was tuned
How the model performs
Cheers
-
tgb417