Features design in visual analysis

chocolatecake · February 2021

Please let me know if you have answers to the following two questions:

1. How can I refer to the numerical value of the dummy data generated on the [Feature generation] page?

[Screenshot attached]

2. Is the dummy data calculated on the [Feature generation] page treated as a feature? The calculated dummy data does not appear on the [Features handling] page. Is it possible to display the dummy data on this page?

CoreyS · February 2021

Hi @chocolatecake! Could you provide any further details on the thread (for example, your DSS version) to help users find a solution? Also, can you let us know whether you've already tried any fixes? This should lead to a quicker response from the community.

MehdiH · April 2021

Hi @chocolatecake

Thank you for reaching out.

> 1. How can I refer to the numerical value of the dummy data generated on the [Feature generation] page?

The features generated in the [Feature generation] tab are only accessible internally, by the DSS preprocessing pipeline during training or scoring. Hence, you cannot access the dummy-encoded categorical features in the "Explicit pairwise interaction" case.
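Since DSS does not expose these generated columns, one workaround is to rebuild comparable columns yourself on a copy of the data. The following is a minimal pandas sketch, assuming a hypothetical dataset with a categorical `color` column and a numerical `price` column, and assuming that a categorical × numerical interaction is represented as each dummy column multiplied by the numerical value. This is only an illustration; it is not necessarily how DSS computes the interaction internally, so the values may not match what the model actually sees.

```python
import pandas as pd

# Hypothetical toy dataset: one categorical and one numerical column.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# Dummy-encode the categorical column: one 0/1 column per category.
dummies = pd.get_dummies(df["color"], prefix="color", dtype=int)

# Assumed interaction scheme: multiply each dummy column by the numerical value,
# so the result equals the price when the row has that category, and 0 otherwise.
interactions = dummies.mul(df["price"], axis=0)
interactions.columns = [f"{c}:price" for c in dummies.columns]

# Put the original columns, the dummies and the interactions side by side.
features = pd.concat([df, dummies, interactions], axis=1)
print(features)
```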

> 2. Is the dummy data calculated on the [Feature generation] page treated as a feature? The calculated dummy data does not appear on the [Features handling] page. Is it possible to display the dummy data on this page?

The generated features are not listed in the [Features handling] tab, so, as mentioned above, you cannot access the dummy data there.

tgb417 · April 2021

@MehdiH,

I have a curiosity question, for an advanced user. If one generates a notebook from a model result and then runs the individual cells in the notebook manually, can't one see the features as they will be submitted to scikit-learn?

[Screenshot: Exporting Model Notebook]

The notebook appears to contain all, or most, of the encoding steps.

However, I do notice the following warning in the notebook.

Warning
The goal of this notebook is to provide an easily readable and explainable 
code that reproduces the main steps of training the model. It is not complete: 
some of the preprocessing done by the DSS visual machine learning is not 
replicated in this notebook. This notebook will not give the same results 
and model performance as the DSS visual machine learning model.

How different are these notebooks from the actual processing that DSS does to our data?

And what about the exported model documentation? Is it also incomplete, and if so, in what ways?

MehdiH · April 2021

Hi @tgb417

You're right in the sense that the notebook aims to be a standalone version of the model's preprocessing/training pipeline. It can be run outside of DSS, as long as the required packages, such as scikit-learn, are installed.

However, as you noticed in the disclaimer, it is not an exact copy of the Visual ML model: first, it is only available for the Python (in-memory) backend, and second, not all preprocessing steps can be exported (e.g. feature hashing for categorical features, or quantization and binarization for numerical features). If your goal is to have an exact replica of the DSS model, the notebook export is not the best solution (see the documentation on how to export a model for other options).
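For readers unfamiliar with the non-exportable steps listed above, here is a rough illustration of what they do, using scikit-learn's `FeatureHasher`, `KBinsDiscretizer` and `Binarizer` as stand-ins. These are scikit-learn analogues chosen for illustration only; they are not DSS's implementation, and DSS's hash size, bin boundaries and thresholds may differ. The data, `n_features=8`, `n_bins=2` and `threshold=10.5` are hypothetical values.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher
from sklearn.preprocessing import Binarizer, KBinsDiscretizer

# Feature hashing (categorical): each category string is hashed into one of
# n_features columns, so the column count is fixed regardless of cardinality.
colors = [["red"], ["blue"], ["red"], ["green"]]
hasher = FeatureHasher(n_features=8, input_type="string")
hashed = hasher.transform(colors).toarray()

# Quantization (numerical): replace each value by the index of its quantile bin.
prices = np.array([[10.0], [12.5], [9.0], [11.0]])
quantizer = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="quantile")
bins = quantizer.fit_transform(prices)

# Binarization (numerical): 1.0 if the value is above the threshold, else 0.0.
binarizer = Binarizer(threshold=10.5)
flags = binarizer.fit_transform(prices)

print(hashed)
print(bins.ravel())
print(flags.ravel())
```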

As for the model documentation, its purpose is quite different: it provides complete documentation of how the DSS Visual ML model was trained, in order to demonstrate that you followed industry best practices to build your model. Consequently, it is directly linked to the Visual ML model design and tries to be as complete as possible. More precisely, the Model Document Generator documentation states that it provides information regarding:

What the model does
How the model was built (algorithms, features, processing, …)
How the model was tuned
How the model performs

Cheers

tgb417 · April 2021

@MehdiH

Thanks for the information and transparency.

--Tom
