Using Dataiku

751 - 760 of 5.2k
  • 2 different laptops, both with the same error when installing plugins for Text Preparation, Visualization and Sentiment Analysis. 'Environment Creation Failed'. Can anyone assist. I have uninstalled t…
    Question
    Started by carlgomersall
    Most recent by Turribeach
    Last answer by Turribeach

    We would need to see the actual error in the code environment log to be able to help.

  • Hi, I want to automate an XML process where every time an XML is dropped, a bunch of recipes process the XML and generate a CSV at the end. However, my XML may vary, some tags and attributes may or may …
    Question
    Started by jeanclaude_ho
    Most recent by Turribeach
    Last answer by Turribeach

    You will need to use a Python recipe with custom code to handle this. But there is a limit on how much you can change the source file before your code won't be able to handle it.
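A minimal sketch of that Python-recipe approach (the sample XML, tag names, and record layout here are invented for illustration): flatten whatever tags and attributes each record happens to carry, then take the union of keys across all records as the CSV header, so a moderately varying schema is absorbed automatically.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample: two records with slightly different tags/attributes
XML = """<orders>
  <order id="1"><amount>10</amount></order>
  <order id="2"><amount>20</amount><currency>EUR</currency></order>
</orders>"""

def records(xml_text):
    # Assumes each child of the root element is one record
    for elem in ET.fromstring(xml_text):
        row = dict(elem.attrib)          # attributes become columns
        for child in elem:
            row[child.tag] = child.text  # child tags become columns
        yield row

rows = list(records(XML))
# Union of keys across rows handles the varying schema
fields = sorted({k for r in rows for k in r})

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)   # missing keys are written as empty cells
print(out.getvalue())
```

This only stretches so far, per the answer above: records that change shape beyond new tags or attributes (nesting, repeated elements) would need dedicated handling.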

  • I want to check Dataiku status running ./dss status command and it returns: dss: DSS supervisor is not running Dataiku is working fine without any problem but the command is not working. I validated m…
    Question
    Started by rafael_rosado97
    Most recent by Turribeach
    Last answer by Turribeach

    "DSS supervisor is not running" doesn't mean DSS is not running; it means the supervisord process, which looks after the DSS processes, is not running. Have a look at ./run/supervisord.log to see what the problem with the supervisord process was, or do "dss restart" to restart all processes.

  • Hi, In my input dataset, I have a string column named vars like this [" 20547","21513 "], with an array meaning. I would like to check if each element of this array is in another array defined in glo…
    Question
    Started by EdBerth
    Most recent by Turribeach
    Last answer by Turribeach

    That's because you are not returning the correct data structure: you probably changed the mode of the Python function and forgot to update the code snippet by clicking Edit Python Source Code. To return a new cell for each row you should use a function such as this one:

    # Modify the process function to fit your needs
    import pandas as pd
    def process(rows):
        # In 'cell' mode, the process function must return
        # a single Pandas Series for each block of rows,
        # which will be affected to a new column.
        # The 'rows' argument is a dictionary of columns in the
        # block of rows, with values in the dictionary being
        # Pandas Series, which additionally holds an 'index'
        # field.
        return pd.Series(len(rows), index=rows.index)
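The asker's actual goal — checking each element of the parsed array against a list defined in global variables — could then look roughly like this in row mode. This is a sketch: the column name vars matches the question, but ALLOWED is a made-up stand-in for values that in DSS would come from the project/global variables (e.g. via dataiku.get_custom_variables()).

```python
import json

# Stand-in for an array held in global/project variables
ALLOWED = {"20547", "99999"}

def process(row):
    # 'row' mode: one dict per row in, one cell value out.
    # The vars column holds a JSON-encoded array of strings with
    # stray whitespace, e.g. '[" 20547","21513 "]'.
    values = [v.strip() for v in json.loads(row["vars"])]
    # True only if every element of the array is in ALLOWED
    return all(v in ALLOWED for v in values)

print(process({"vars": '[" 20547","99999 "]'}))  # True
print(process({"vars": '[" 20547","21513 "]'}))  # False
```

Note the .strip(): the sample values in the question carry leading/trailing spaces that would otherwise defeat the membership test.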

  • I am trying to run a python recipe and have a model saved in a managed folder. I understand that I have to use get_download_stream() to read the data, but the python module that I need to use (FAISS) …
    Answered ✓
    Started by Astrogurl
    Most recent by info-rchitect
    Solution by Zach

    Hi @Astrogurl,

    The following code will download the file to a temporary directory first so that you can pass the path to FAISS:

    import os.path
    import shutil
    import tempfile
    
    import dataiku
    
    folder = dataiku.Folder("FOLDER")
    
    with tempfile.TemporaryDirectory() as temp_dir:
        path = os.path.join(temp_dir, "my-file.txt")
        
        # Download the remote file to `path`
        with folder.get_download_stream("/my-file.txt") as download_stream:
            with open(path, "wb") as local_file:
                shutil.copyfileobj(download_stream, local_file)
                
        # Do stuff with the temp file here
        # It will be automatically deleted when the `temp_dir` block finishes
        print(path)

    Thanks,

    Zach
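The copy-to-a-temp-path pattern in Zach's answer generalizes beyond DSS. Here is the same shape, runnable anywhere, with an io.BytesIO standing in for the download stream (file name and contents are invented):

```python
import io
import os.path
import shutil
import tempfile

# io.BytesIO stands in for folder.get_download_stream(...)
fake_stream = io.BytesIO(b"model bytes")

with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "my-file.bin")

    # Spool the stream to a real filesystem path
    with open(path, "wb") as local_file:
        shutil.copyfileobj(fake_stream, local_file)

    # Anything that needs a path (e.g. FAISS) can use `path` here
    with open(path, "rb") as f:
        data = f.read()

# The temp dir and the file are deleted once the block exits
print(data)  # b'model bytes'
```

shutil.copyfileobj streams in chunks, so this also works for files too large to hold in memory at once.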

  • Hi there, I encounter the sudden issue of not being able to load datasets into a Jupyter Notebook. Changing environment/Kernel doesn't help. System reboot doesn't help. Force reloading doesn't help ne…
    Question
    Started by ebbingcasa
    Most recent by ebbingcasa
    Last answer by ebbingcasa

    Added the following error code to the first post:

    "java.lang.SecurityException: Ticket not given or unrecognized"

  • Hi, while working on a Jupyter notebook to build a dataset, getting the attached error. Have tried reloading the notebook as well. Can it be due to Dataiku configuration. Kindly suggest. Thanks, Parul…
    Question
    Started by Parul_ch
    Most recent by SaschaS
    Last answer by SaschaS

    Hi,

    did you solve this problem? Since yesterday I have been experiencing the same issue.

    Best
    Sascha

  • Hi, I'm trying to train a model in lab using the API from a notebook. I'm using the below code to setup the ML task.I'm currently using the "MasterData" as my data. I want to use a different dataset "…
    Question
    Started by Mohammed
    Most recent by Mohammed
    Last answer by Mohammed

    @AlexT, I didn't follow the solution you provided.
    I see a method set_split_explicit to set the train/test split to an explicit extract from one or two dataset(s).

    I tried it as follows:
    settings.get_split_params().set_split_explicit(train_selection, test_selection, test_dataset_name="UpcomingData_Toronto")
    I'm not sure what is expected of the train_selection and test_selection arguments.
    In the documentation it is given as below.
  • I am trying to train models with "Time ordering" enabled on the attached dataset I get the error message below, but training sails successfully when "Time ordering" is not enabled. The file is a merge…
    Question
    Started by sdfungayi
    Most recent by Azucena
    Last answer by Azucena

    Hi @sdfungayi

    Pretty old post, but were you able to solve this issue?
    I was having a similar issue and came across your post. The error I was getting was:
    <class 'dataiku.doctor.preprocessing.dataframe_preprocessing.DkuDroppedMultiframeException'>: ['target'] values all empty, infinity or with unknown classes (you may need to recompute the training set)

    I sorted my issue by making sure that the storage type and meaning of my target variable both reflected its discrete values.
    It was originally storage type = string, meaning = Decimal, and I was trying to use random forest and logistic regression.

    Once I changed my target variable to storage type = string, meaning = Integer, the ML models were able to run.
    I kept the storage type as string, as I needed to distinguish the undefined records from the values (0 and 1).

    In your post you mentioned you also tried changing meanings; any success with that?
    It worked for mine, hope yours does as well!

  • Hi, I want to update a sql server table (dataset) . I understand that there are scenarios option that triggers update sql querry . As this feature is not included in my licence, what are the other opt…
    Question
    Started by Lo96
    Most recent by Alexandru
    Last answer by Alexandru

    Hi @Lo96,
    SQL triggers are indeed license-based and typically require a Business or Enterprise license:
    https://doc.dataiku.com/dss/latest/scenarios/triggers.html#sql-triggers
    However, SQL triggers are not for updating SQL datasets; instead, they trigger a scenario when the result of an SQL query changes.
    If you want to update a SQL Server table, you can simply build a dataset / run a recipe that does the update.
    You can trigger time-based scenarios with a Discover license as well.
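As a sketch of "a recipe that does the update": the statement such a recipe would run can be shown outside DSS. SQLite is used here purely as a runnable stand-in for SQL Server, and the table and values are invented.

```python
import sqlite3

# In-memory SQLite table standing in for the SQL Server dataset
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "new"), (2, "new"), (3, "shipped")],
)

# In DSS, this UPDATE would live in a recipe run by a
# time-based scenario instead of a SQL trigger
conn.execute("UPDATE orders SET status = 'processed' WHERE status = 'new'")
conn.commit()

result = conn.execute(
    "SELECT status, COUNT(*) FROM orders GROUP BY status ORDER BY status"
).fetchall()
print(result)  # [('processed', 2), ('shipped', 1)]
```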

