Using Dataiku
- I have a custom multiclass classification algorithm. I cannot figure out the conditions under which the Dataiku scoring system will call predict_proba. Thanks for any help. Gordon Operating system used: Mar… Last answer by Erlebacher
I have made some progress within the "lab" attached to one of my training sets. I have defined two scoring functions, and they run properly (I return "synthetic" data consistent with the required formats). But then I get the following error:
ValueError: Classification metrics can't handle a mix of multiclass and continuous targets
It is true that my features are a mixture of multiclass and continuous variables. Still, I have three questions:
1) If I am writing custom functions, why should Dataiku care about this mixture?
2) Why can't this error be raised before computational resources are wasted processing my custom functions? Dataiku must know about this mix immediately, at the very first stage, when my custom algorithm is trained (which Dataiku had no issues with).
3) What is happening after the function scoring?
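For context, this particular ValueError is raised by scikit-learn's metric type checking, and it concerns the target and the predictions, not the features: a classification metric is being handed multiclass ground-truth labels together with continuous predicted values. A minimal sketch that reproduces the same message outside Dataiku (the metric and the values below are illustrative only, not taken from the thread):

from sklearn.metrics import accuracy_score

y_true = [0, 2, 1, 2]            # multiclass ground-truth labels
y_pred = [0.1, 1.8, 0.9, 2.2]    # continuous scores instead of class labels

# Raises: ValueError: Classification metrics can't handle a mix of
# multiclass and continuous targets
accuracy_score(y_true, y_pred)

If the custom scoring or prediction code returns continuous scores (for example probabilities) where class labels are expected, any classification metric computed after scoring will fail in exactly this way, which would explain why training succeeds but the evaluation step does not.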
- Is it possible to set up notifications for admins in Dataiku for certain events? Below is a (non-exhaustive) list of events I am interested in: * A new deployment is created by a user * A new …
- Hello, I'm working with partitioned managed folders and I would like to get the pattern of the partitioning (available in the web interface; see screenshot). For datasets, I can get such information wi…Last answer by
- Hello everyone, I began using Dataiku a few days ago. I have a lot of "address" data, and I tried to use the Geocoder plugin in order to convert them into usable coordinates and geopoints. As this pl…Last answer by Alexandru
Hi @PeteGore,
The Python processor within a Prepare recipe can only apply to one column.
You can compute both values, write them to the targeted column in the Python processor, and later split it into separate columns with another processor.
You can perform this in a Python recipe instead.
You can leverage project libraries if you need to reuse Python code: https://doc.dataiku.com/dss/latest/python/reusing-code.html
Or package a Python recipe as a plugin:
https://doc.dataiku.com/dss/latest/python/reusing-code.html#packaging-code-as-plugins
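As a minimal sketch of the workaround described above (column names and the computed values are hypothetical, purely for illustration): a cell-mode Python function step can pack two computed values into the single target column, and a later Split column step on the separator then turns them into two columns.

# Cell-mode Python function step in a Prepare recipe (hypothetical columns)
def process(row):
    lat = float(row["latitude"]) + 0.5    # first computed value (illustrative)
    lon = float(row["longitude"]) - 0.5   # second computed value (illustrative)
    # Both values go into the step's single output column, separated by "|",
    # so a later "Split column" step on "|" can produce two columns
    return "{}|{}".format(lat, lon)

In a Python recipe, by contrast, you can simply add both columns to the dataframe and write it back with Dataset.write_with_schema(), so no packing and splitting is needed.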
- Using the following code: from sklearn.metrics import rand_score (which is what is required per scikit-learn.org), I get an error …Last answer by
- I have a dataset which I would like to update several times to give me monthly records, for 12 months, for example. Can this be done in an SQL recipe using loops? Or how else can it be done? Please let me know ho…Last answer by
- Hello community, I want to write a .TIF file into a folder; after I run the code recipe, the run is successful but the file is empty. Here is an example of my code and a photo of the output: input_folder…Solution by razan
I have solved it using the os library: open the output folder using os.chdir(), and after you flush the file cache, delete the output dataset (out_ds) or set it to None.
import os
import dataiku
from osgeo import gdal

input_folder = dataiku.Folder('input_folder_name')
output_folder = dataiku.Folder('output_folder_name')

# Work inside the output managed folder so the GeoTIFF is created there
os.chdir(output_folder.get_info()['path'])

# Open the first file of the input folder as a GDAL dataset
band_path = input_folder.get_info()['path'] + input_folder.list_paths_in_partition()[0]
in_ds = gdal.Open(band_path)
in_band = in_ds.GetRasterBand(1)

# Create a single-band GeoTIFF with the same size, type and georeferencing
gtiff_driver = gdal.GetDriverByName('GTiff')
out_ds = gtiff_driver.Create('nat_color.tif', in_band.XSize, in_band.YSize, 1, in_band.DataType)
out_ds.SetProjection(in_ds.GetProjection())
out_ds.SetGeoTransform(in_ds.GetGeoTransform())

out_band = out_ds.GetRasterBand(1)
out_band.WriteArray(in_band.ReadAsArray())
out_ds.FlushCache()
# Only one band was created, so compute statistics on band 1 only
out_ds.GetRasterBand(1).ComputeStatistics(False)

# Setting the GDAL objects to None closes them and flushes the file to disk
out_ds = None
in_band = None
in_ds = None
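One caveat with the os.chdir() approach: it assumes the output managed folder lives on the local filesystem (get_info()['path'] is only meaningful there). A hedged alternative sketch, reusing the same hypothetical folder names, writes the GeoTIFF to a local temporary file and then pushes it into the managed folder with Folder.upload_stream(), which also works for non-local (for example cloud-backed) output folders:

import os
import tempfile
import dataiku
from osgeo import gdal

input_folder = dataiku.Folder('input_folder_name')
output_folder = dataiku.Folder('output_folder_name')

# Read the source raster from the (local-filesystem) input folder
src_path = os.path.join(input_folder.get_info()['path'],
                        input_folder.list_paths_in_partition()[0].lstrip('/'))
in_ds = gdal.Open(src_path)
in_band = in_ds.GetRasterBand(1)

# Write the GeoTIFF to a local temporary file first
tmp_path = os.path.join(tempfile.mkdtemp(), 'nat_color.tif')
out_ds = gdal.GetDriverByName('GTiff').Create(
    tmp_path, in_band.XSize, in_band.YSize, 1, in_band.DataType)
out_ds.SetProjection(in_ds.GetProjection())
out_ds.SetGeoTransform(in_ds.GetGeoTransform())
out_ds.GetRasterBand(1).WriteArray(in_band.ReadAsArray())
out_ds = None  # close and flush to disk
in_ds = None

# Upload the finished file into the output managed folder
with open(tmp_path, 'rb') as f:
    output_folder.upload_stream('/nat_color.tif', f)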
- How can one remove a message? It no longer applies. I am referring to this message. Thanks. Last answer by
- Hi all, I am deploying an API pod in Kubernetes, and our process seems to be slow. Where can we find the Python printouts in the logs within the pods? Is there a place where we need to set the ht…Last answer by JordanB
Hi @james147,
Where are you experiencing the slowness: is it during the deployment (when you select deploy or update), or while the API is deployed? Note that API deployment requires rebuilding the code env. If you are using R, code envs with very large Python packages, or Python packages for which precompiled binaries are not available, this can take some time. If you are not using R, you can rebuild the deployment without it.
In DSS 10.0.6+, to obtain the pod logs (apimain.log), you can use Administration - Cluster - Actions and run the following command:
kubectl exec <podname> -- cat /home/dataiku/data/run/apimain.log
To identify the pod name, go to Administration - Cluster - Monitoring and find the name(s). You may need to check each pod's logs. You can also manually redirect the apimain.log output to stdout:
cat /home/dataiku/data/run/apimain.log 2>&1
> Is there a place where we need to set the http timeouts?
Can you expand on this?
> Where can I find the Dockerfile that is used to build this Docker image when we do a deployment with a code env?
The image can be found within the datadir under /tmp/api_deployer; however, I'm not sure this will help troubleshoot the slowness unless you have customized it.
If you could provide details regarding where/when this slowness is occurring, that would be great!
Thanks!
Jordan
- Are there any examples of how to build a many-to-many relationship within Dataiku? I find it strange that no one appears to have even asked about this before. Any direction would be greatly app…Last answer by