-
Custom preprocessing steps in the Features Handling section in model design
I'm trying to create a custom transformation but haven't been successful. The sample code provided works fine, but when I define my own function with the same transformation it fails. See below for the exact snippets.

Works:
from sklearn import preprocessing
import numpy as np
# Applies log transformation to the feature…
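Outside DSS, the kind of custom log transformation described above can be sketched with scikit-learn's FunctionTransformer (the feature values below are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Custom log transformation: log1p computes log(1 + x), which safely handles zeros.
log_transformer = FunctionTransformer(np.log1p, validate=True)

X = np.array([[0.0, 1.0],
              [9.0, 99.0]])
X_log = log_transformer.fit_transform(X)
```

The same callable can later be dropped into a Pipeline so the transformation travels with the model.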
-
How to apply the same transformations to test and train without duplicating the flow?
Are the filters applied in the visual analysis also deployed together with the model, so that the same filters apply at prediction time? If not, how do I accomplish this typical pipeline behaviour?
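For reference, the usual scikit-learn pattern behind this pipeline behaviour is to fit the transformer on the training data only and reuse the fitted object on the test data; a minimal sketch with made-up data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[2.0]])

scaler = StandardScaler()
# Fit on the training set only, so test data is scaled
# with the training statistics (no leakage, no duplicated flow).
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
```

The fitted `scaler` object is what gets reused at prediction time, rather than re-fitting on new data.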
-
How to access "input dataset" to ML model using python API?
DSS processes the datasets under the hood to create an "input dataset" that is used to train the model and to make predictions with the previously trained model. Is it possible to access that "input dataset" using the Python API, and how do you do that?
-
Difference Between Flow and Analysis Steps
What is the difference between the Flow and Analysis steps? Are the Analysis steps applied to the 'Web Service' before the transformations? Like having a preprocessing function before a sklearn.pipeline()?

def preprocessing(data):
    # do some preprocessing... but not transforms...
    return data
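As a point of comparison, a preprocessing function placed in front of other steps can itself be a step inside a scikit-learn Pipeline; a sketch with illustrative names and data (the pass-through function stands in for real preprocessing):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.linear_model import LogisticRegression

def preprocessing(X):
    # do some preprocessing... here just a pass-through for illustration
    return X

# The custom function runs first, then the usual transform/estimator steps.
pipe = Pipeline([
    ("prep", FunctionTransformer(preprocessing)),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
pipe.fit(X, y)
```

Wrapping the function this way means it is applied identically at training and prediction time.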
-
Splitting dataset
Hi, I have a dataset with too few churn instances (90% are non-churners), so I want to split the dataset into a train and a test set, but I would like to have a higher percentage of churners in the train set. I tried to use the split recipe, but I can't manage to get what I want (either I get the same representation of churners in…
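Outside the split recipe, one way to get a higher churner share in the training set only is a stratified split followed by upsampling the churners in the training part; a sketch on toy data (column names and the 4x upsampling factor are made up):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy churn data: 90% non-churners (label 0), 10% churners (label 1).
df = pd.DataFrame({"x": range(100),
                   "churn": [1 if i < 10 else 0 for i in range(100)]})

# Stratified split keeps the original class ratio in both parts.
train, test = train_test_split(df, test_size=0.3, stratify=df["churn"],
                               random_state=0)

# Upsample churners in the training set only; the test set keeps
# the true class distribution for honest evaluation.
churners = train[train["churn"] == 1]
train_balanced = pd.concat(
    [train, churners.sample(n=len(churners) * 4, replace=True, random_state=0)]
)
```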
-
Training session loads all columns, not only the selected ones, causing an OutOfMemory exception
Hi, I'm having an OutOfMemoryException while running a training session. I can see in the logs that all the columns are normalized even though they do not participate in the session. [2018-06-09 20:00:45,946] [24072/MainThread] [INFO] [root] Reading with FIXED dtypes: {u'Flygbolag': 'str', u'Distance': <type 'numpy.float64'>,…
-
Train multiple neural networks in one Analysis?
The title basically says it all. I want to try different hyperparameters for my Neural Network (or algorithms in general). For some, like random forest, I can specify a list - e.g., max_depth. What I need is a queue of Neural Networks with different hyperparameters, so that I can start them in the evening and come back to…
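Outside DSS, such a queue of network configurations can be sketched as a plain loop over hyperparameter dictionaries with scikit-learn's MLPClassifier (the settings and data below are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Each entry in the "queue" is one hyperparameter configuration to try.
queue = [
    {"hidden_layer_sizes": (10,), "alpha": 1e-4},
    {"hidden_layer_sizes": (20, 10), "alpha": 1e-3},
]

scores = {}
for params in queue:
    clf = MLPClassifier(max_iter=500, random_state=0, **params)
    clf.fit(X, y)
    scores[str(params)] = clf.score(X, y)
```

Left running overnight, the loop trains each configuration in turn and collects the scores for comparison afterwards.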
-
"Generate Big Data" Processor?
I can't find the processor "Generate Big Data" (pictured below) in the documentation. I presume it duplicates rows of my current dataset to create more samples?
-
Oversampling Dataset
Is it somehow possible to oversample my dataset? For example, I have the following records and target variables:

1 2 3 | 5
2 2 3 | 6
1 1 1 | 1
3 2 2 | 5

I want to duplicate row #3 (or generate more than one duplicate) so that my dataset looks as follows:

1 2 3 | 5
2 2 3 | 6
1 1 1 | 1
1 1 1 | 1
3 2 2 | 5

How can I do this? Thank you…
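In plain pandas, duplicating the minority rows while keeping the original row order can be sketched like this (column names are made up; the data matches the small example above):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1, 3],
                   "b": [2, 2, 1, 2],
                   "c": [3, 3, 1, 2],
                   "target": [5, 6, 1, 5]})

# Repeat each row twice if it belongs to the minority class (target == 1),
# once otherwise, keeping the original row order.
repeats = df["target"].eq(1).astype(int) + 1
oversampled = df.loc[df.index.repeat(repeats)].reset_index(drop=True)
```

Raising the repeat count generates more than one duplicate per minority row.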
-
Spark schema: Cannot handle an ARRAY without specified content type
Hi, I get this error message when training a model in DSS with Spark MLlib. However, when I go to the "script" tab, I have properly set the meaning to "Text". Why does DSS still think it's an array?