
Features design in visual analysis

chocolatecake
Level 1

Please let me know if you have answers to the following two questions.


1. How can I refer to the numerical value of the dummy data generated on the [Feature generation] page?

[Attached screenshot: キャプチャ.PNG]

2. Is the dummy data calculated on the [Feature generation] page treated as a single feature? The calculated dummy data does not appear on the [Features handling] page. Is it possible to display the dummy data on that page?

CoreyS
Community Manager

Hi, @chocolatecake! Can you add some further details to the thread (for example, your DSS version) to help other users find a solution? Also, can you let us know if you've already tried any fixes? This should lead to a quicker response from the community.

MehdiH
Dataiker


Hi @chocolatecake 

Thank you for reaching out.

> 1. How can I refer to the numerical value of the dummy data generated on the [Feature generation] page?

The features generated in the [Feature generation] tab are only accessible internally, by the DSS preprocessing pipeline, during training or scoring. Hence, you cannot access the dummy-encoded categorical features in the "Explicit pairwise interaction" case.

> 2. Is the dummy data calculated on the [Feature generation] page treated as one feature amount? The calculated dummy data does not appear on the Features handling page. Is it possible to display dummy data on this page?

The generated features are kept separate from the [Features handling] tab, so, as mentioned above, the dummy data cannot be displayed there.
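If the goal is simply to inspect what dummy-encoded pairwise interactions look like, one option is to recompute them outside the Visual ML pipeline. Below is a minimal sketch with pandas. The toy data, the column names and the product-of-dummies construction are assumptions for illustration only; this is not the DSS internal implementation, and DSS may compute or name these features differently.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "M"],
})

# Dummy-encode each categorical column: one 0/1 column per category.
color_d = pd.get_dummies(df["color"], prefix="color", dtype=int)
size_d = pd.get_dummies(df["size"], prefix="size", dtype=int)

# Pairwise interaction (assumed here to be the product of the dummies):
# a column is 1 only when both categories occur together in the row.
interactions = pd.DataFrame({
    f"{c}*{s}": color_d[c] * size_d[s]
    for c in color_d.columns
    for s in size_d.columns
})

print(interactions)
```

Each row has exactly one interaction column set to 1, the one matching its (color, size) combination.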

tgb417
Neuron

@MehdiH ,

I have a question out of curiosity, from an advanced user's point of view. If one generates a notebook from a model result and then runs the individual cells in the notebook manually, can't one see the features as they will be submitted to scikit-learn?

Exporting a Notebook from a DSS Model

In the notebook, there appear to be all/most of the encoding steps.

However, I do notice the following warning in the notebook.

Warning
The goal of this notebook is to provide an easily readable and explainable
code that reproduces the main steps of training the model. It is not complete:
some of the preprocessing done by the DSS visual machine learning is not
replicated in this notebook. This notebook will not give the same results
and model performance as the DSS visual machine learning model.

How different are these notebooks from the actual processing that DSS does to our data?

And what about the exported model documentation? Is it incomplete, and if so, in what ways?

--Tom
MehdiH
Dataiker

Hi @tgb417 

You're right in the sense that the notebook aims to be a standalone version of the model's preprocessing/training pipeline. It can be run outside of DSS, as long as the required packages, such as scikit-learn, are installed of course.

However, as you noticed in the disclaimer, it is not an exact copy of the Visual ML model. First, it is only available for the Python (in-memory) backend. Second, not all preprocessing steps are available for export (e.g. feature hashing for categorical features, quantization and binarization for numerical features, ...). If your goal is to have an exact replica of the DSS model, then the notebook export is not the best solution (see the documentation on how to export a model for other options).
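To make the preprocessing steps named above more concrete, here is a rough sketch of quantization, binarization and feature hashing using scikit-learn's own transformers. The toy data, the bin count, the threshold and the hash size are all assumptions chosen for illustration. These are not the DSS implementations, so the results will not necessarily match a Visual ML model.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher
from sklearn.preprocessing import Binarizer, KBinsDiscretizer

# Hypothetical toy inputs: one numerical and one categorical feature.
ages = np.array([[18.0], [35.0], [52.0], [70.0]])
cities = [["paris"], ["tokyo"], ["paris"], ["lima"]]

# Quantization: replace each value by the index of its equal-width bin.
quantizer = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="uniform")
age_bins = quantizer.fit_transform(ages)

# Binarization: 1 if the value is strictly above the threshold, else 0.
is_over_40 = Binarizer(threshold=40.0).fit_transform(ages)

# Feature hashing: map each category to one of 8 columns via a hash,
# so no dictionary of categories has to be stored.
hasher = FeatureHasher(n_features=8, input_type="string")
hashed = hasher.transform(cities)

print(age_bins.ravel())
print(is_over_40.ravel())
print(hashed.toarray())
```

Identical categories always hash to the same column, which is why the two "paris" rows come out identical.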

As for the model documentation, the purpose is quite different: it provides complete documentation of how the DSS Visual ML model was trained, in order to show that you followed industry best practices when building your model. Consequently, it is directly linked to the Visual ML model design and tries to be as complete as possible. More precisely, the Model Document Generator documentation states that it provides information regarding:

  • What the model does

  • How the model was built (algorithms, features, processing, …)

  • How the model was tuned

  • What are the model’s performances

Cheers

tgb417
Neuron

@MehdiH 

Thanks for the information and transparency.

--Tom
