Using Dataiku
- I want to check Dataiku status by running the ./dss status command and it returns: "dss: DSS supervisor is not running". Dataiku is working fine without any problem, but the command is not working. I validated m…Last answer by Turribeach
"DSS supervisor is not running" doesn't mean DSS is not running. It means the supervisord process which looks after the DSS processes is not running. Have a look at the ./run/supervisord.log to see what the problem with the supervisord process was or do "dss restart" to restart all processes.
Last answer by Turribeach"DSS supervisor is not running" doesn't mean DSS is not running. It means the supervisord process which looks after the DSS processes is not running. Have a look at the ./run/supervisord.log to see what the problem with the supervisord process was or do "dss restart" to restart all processes.
- Hi, In my input dataset, I have a string column named vars like this [" 20547","21513 "], with an array meaning. I would like to check if each element of this array is in another array defined in glo…Last answer by Turribeach
Because you are not returning the correct data structure: you probably changed the mode of the Python function and forgot to update the code snippet by clicking Edit Python Source Code. To return a new cell for each row, you should use a function such as this one:
# Modify the process function to fit your needs
import pandas as pd

def process(rows):
    # In 'cell' mode, the process function must return
    # a single Pandas Series for each block of rows,
    # which will be affected to a new column.
    # The 'rows' argument is a dictionary of columns in the
    # block of rows, with values in the dictionary being
    # Pandas Series, which additionally holds an 'index'
    # field.
    return pd.Series(len(rows), index=rows.index)
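Applied to the original question above, a minimal sketch of such a function could parse the vars column and test membership against a project variable (the variable name allowed_ids is an assumption; vars is the column from the question):

import json
import dataiku

# Hypothetical project variable holding the reference array,
# e.g. the string '["20547", "99999"]'
allowed_ids = set(json.loads(dataiku.get_custom_variables()["allowed_ids"]))

def process(rows):
    # Parse each value of the 'vars' column (a JSON-style string such as
    # '[" 20547","21513 "]') and check whether every element, once trimmed,
    # is present in allowed_ids. Returns one boolean per row.
    def all_in_allowed(value):
        elements = [e.strip() for e in json.loads(value)]
        return all(e in allowed_ids for e in elements)
    return rows["vars"].map(all_in_allowed)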
- I am trying to run a Python recipe and have a model saved in a managed folder. I understand that I have to use get_download_stream() to read the data, but the Python module that I need to use (FAISS) …Solution by Zach
Hi @Astrogurl,
The following code will download the file to a temporary directory first so that you can pass the path to FAISS:
import os.path
import shutil
import tempfile

import dataiku

folder = dataiku.Folder("FOLDER")

with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "my-file.txt")

    # Download the remote file to `path`
    with folder.get_download_stream("/my-file.txt") as download_stream:
        with open(path, "wb") as local_file:
            shutil.copyfileobj(download_stream, local_file)

    # Do stuff with the temp file here
    # It will be automatically deleted when the `temp_dir` block finishes
    print(path)
Thanks,
Zach
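As a follow-up, if the downloaded file is a FAISS index, loading it from the temporary path could look like the sketch below. It would sit inside the with tempfile.TemporaryDirectory() block above, before the directory is cleaned up (the faiss package is assumed to be installed in the code environment):

import faiss

# `path` is the local file produced by the snippet above; faiss.read_index
# needs a real filesystem path, which is why the managed-folder stream is
# first copied to a temporary file.
index = faiss.read_index(path)
print(index.ntotal)  # number of vectors stored in the index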
- Hi there, I encounter the sudden issue of not being able to load datasets into a Jupyter Notebook. Changing environment/Kernel doesn't help. System reboot doesn't help. Force reloading doesn't help ne…
- Hi, while working on a Jupyter notebook to build a dataset, I am getting the attached error. I have tried reloading the notebook as well. Can it be due to Dataiku configuration? Kindly suggest. Thanks, Parul…
- Hi, I'm trying to train a model in the Lab using the API from a notebook. I'm using the below code to set up the ML task. I'm currently using "MasterData" as my data. I want to use a different dataset "…Last answer by Mohammed
@AlexT, I didn't follow the solution you provided.
I see a method set_split_explicit to set the train/test split to an explicit extract from one or two dataset(s).
I tried it as follows:
settings.get_split_params().set_split_explicit(
    train_selection,
    test_selection,
    test_dataset_name="UpcomingData_Toronto",
)
Not sure what is expected as the train_selection and test_selection arguments.
In the documentation it is given as below:
train_selection (Union[DSSDatasetSelectionBuilder, dict]) – Builder or dict defining the settings of the extract for the train dataset. May be None (won't be changed). A dict with the appropriate schema can be generated via dataikuapi.dss.utils.DSSDatasetSelectionBuilder.build()
test_selection (Union[DSSDatasetSelectionBuilder, dict]) – Builder or dict defining the settings of the extract for the test dataset. May be None (won't be changed). A dict with the appropriate schema can be generated via dataikuapi.dss.utils.DSSDatasetSelectionBuilder.build()
What should I provide as arguments for train_selection and test_selection in set_split_explicit?
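For illustration, based on the documentation quoted above, a hedged sketch of those arguments could look like this (the default, all-rows selection is an assumption, not a confirmed answer):

from dataikuapi.dss.utils import DSSDatasetSelectionBuilder

# Builders describing which rows to extract from each dataset.
# An empty builder keeps the default selection (all rows); per the docs,
# the dict produced by .build() can be passed instead of the builder.
train_selection = DSSDatasetSelectionBuilder()
test_selection = DSSDatasetSelectionBuilder()

settings.get_split_params().set_split_explicit(
    train_selection,
    test_selection,
    test_dataset_name="UpcomingData_Toronto",
)
settings.save()  # persist the ML task settings before training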
- I am trying to train models with "Time ordering" enabled on the attached dataset. I get the error message below, but training succeeds when "Time ordering" is not enabled. The file is a merge…Last answer by Azucena
Hi @sdfungayi
Pretty old post, but... were you able to solve this issue?
I was having a similar issue and came across your post. The error was:
<class 'dataiku.doctor.preprocessing.dataframe_preprocessing.DkuDroppedMultiframeException'>: ['target'] values all empty, infinity or with unknown classes (you may need to recompute the training set)
I sorted my issue by making sure that storage type and meaning were both discrete in my target variable.
It was originally storage type = string, meaning = Decimal.
I was trying to use random forest and logistic regression.
Once I changed my target variable to storage type = string, meaning = Integer, the ML models were able to run.
I kept storage type as string, as I needed to identify the undefined records vs values (0 and 1). In your post you mentioned you also tried changing meanings; any success with that?
It worked for mine, hope yours does as well!
- Hi, I want to update a SQL Server table (dataset). I understand that there is a scenario option that triggers an update SQL query. As this feature is not included in my licence, what are the other opt…Last answer by Alexandru
Hi @Lo96,
SQL triggers are indeed license-based and typically require Business or Enterprise.
https://doc.dataiku.com/dss/latest/scenarios/triggers.html#sql-triggers
However, SQL triggers are not for updating SQL datasets; instead, they trigger a scenario when the result of a SQL query changes.
If you want to update a SQL Server table, you can simply build a dataset / run a recipe that does the update.
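For instance, a minimal Python-recipe sketch that rebuilds a dataset stored on a SQL Server connection (the dataset names here are hypothetical):

import dataiku

# Read the source dataset (hypothetical name)
src = dataiku.Dataset("input_data")
df = src.get_dataframe()

# Apply whatever update logic is needed (example transformation)
df["processed"] = True

# Writing the output dataset rewrites the underlying SQL Server table
out = dataiku.Dataset("sqlserver_table")
out.write_with_schema(df)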
You can use time-based scenario triggers with a Discover license as well.
- Hello community, I was working on a repetitive rework task and it occurred to me that I could automate it with a new scenario custom Python script. First I wanted to confirm that it works and then start…Last answer by Alexandru
Hi @Lucasjulian,
The method should work if there is something to build.
Can you try with:
scenario.build_dataset(dataset_name, build_mode='RECURSIVE_FORCED_BUILD')
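For context, a fuller sketch of such a scenario Python script (the dataset name is a placeholder):

from dataiku.scenario import Scenario

scenario = Scenario()

# RECURSIVE_FORCED_BUILD forces a rebuild of the dataset and all of its
# upstream dependencies, even if DSS considers them up to date.
scenario.build_dataset("my_dataset", build_mode='RECURSIVE_FORCED_BUILD')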
If that still builds nothing, we may need scenario diagnostics. Can you please open a support ticket with the scenario diagnostics?
Thanks
- Hi all! I have a fairly straightforward problem (or at least I think it is :) ). I have files arriving into an Azure Blob Storage container. I created a flow to process them without a prob…