Using Dataiku

751 - 760 of 5.2k
  • 2 different laptops, both with the same error when installing plugins for Text Preparation, Visualization and Sentiment Analysis. 'Environment Creation Failed'. Can anyone assist. I have uninstalled t…
    Question
    Started by carlgomersall
    Most recent by Turribeach
    Last answer by Turribeach

    We would need to see the actual error in the code environment log to be able to help.

  • Hi, I want to automate an XML process where every time an XML is dropped, a bunch of recipes process the XML and generate a CSV at the end. However, my XML may vary, some tags and attributes may or may …
    Question
    Started by jeanclaude_ho
    Most recent by Turribeach
    Last answer by Turribeach

    You will need to use a Python recipe with custom code to handle this. But there is a limit on how much you can change the source file before your code won't be able to handle it.
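A minimal sketch of that Python-recipe approach (the sample XML, tag names, and record layout here are invented for illustration): flatten whatever tags and attributes each record happens to carry, then take the union of keys across all records as the CSV header, so a moderately varying schema is absorbed automatically.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample: two records with slightly different tags/attributes
XML = """<orders>
  <order id="1"><amount>10</amount></order>
  <order id="2"><amount>20</amount><currency>EUR</currency></order>
</orders>"""

def records(xml_text):
    # Assumes each child of the root element is one record
    for elem in ET.fromstring(xml_text):
        row = dict(elem.attrib)          # attributes become columns
        for child in elem:
            row[child.tag] = child.text  # child tags become columns
        yield row

rows = list(records(XML))
# Union of keys across rows handles the varying schema
fields = sorted({k for r in rows for k in r})

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)   # missing keys are written as empty cells
print(out.getvalue())
```

This only stretches so far, per the answer above: records that change shape beyond new tags or attributes (nesting, repeated elements) would need dedicated handling.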

  • I want to check Dataiku status running ./dss status command and it returns: dss: DSS supervisor is not running Dataiku is working fine without any problem but the command is not working. I validated m…
    Question
    Started by rafael_rosado97
    Most recent by Turribeach
    Last answer by Turribeach

    "DSS supervisor is not running" doesn't mean DSS is not running; it means the supervisord process, which looks after the DSS processes, is not running. Have a look at ./run/supervisord.log to see what the problem with the supervisord process was, or do "dss restart" to restart all processes.

  • Hi, In my input dataset, I have a string column named vars like this [" 20547","21513 "], with an array meaning. I would like to check if each element of this array is in another array defined in glo…
    Question
    Started by EdBerth
    Most recent by Turribeach
    Last answer by Turribeach

    That's because you are not returning the correct data structure: you probably changed the mode of the Python function and forgot to update the code snippet by clicking Edit Python Source Code. To return a new cell for each row you should use a function such as this one:

    # Modify the process function to fit your needs
    import pandas as pd
    def process(rows):
        # In 'cell' mode, the process function must return
        # a single Pandas Series for each block of rows,
        # which will be affected to a new column.
        # The 'rows' argument is a dictionary of columns in the
        # block of rows, with values in the dictionary being
        # Pandas Series, which additionally holds an 'index'
        # field.
        return pd.Series(len(rows), index=rows.index)
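The asker's actual goal — checking each element of the parsed array against a list defined in global variables — could then look roughly like this in row mode. This is a sketch: the column name vars matches the question, but ALLOWED is a made-up stand-in for values that in DSS would come from the project/global variables (e.g. via dataiku.get_custom_variables()).

```python
import json

# Stand-in for an array held in global/project variables
ALLOWED = {"20547", "99999"}

def process(row):
    # 'row' mode: one dict per row in, one cell value out.
    # The vars column holds a JSON-encoded array of strings with
    # stray whitespace, e.g. '[" 20547","21513 "]'.
    values = [v.strip() for v in json.loads(row["vars"])]
    # True only if every element of the array is in ALLOWED
    return all(v in ALLOWED for v in values)

print(process({"vars": '[" 20547","99999 "]'}))  # True
print(process({"vars": '[" 20547","21513 "]'}))  # False
```

Note the .strip(): the sample values in the question carry leading/trailing spaces that would otherwise defeat the membership test.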

  • I am trying to run a python recipe and have a model saved in a managed folder. I understand that I have to use get_download_stream() to read the data, but the python module that I need to use (FAISS) …
    Answered ✓
    Started by Astrogurl
    Most recent by info-rchitect
    Solution by Zach

    Hi @Astrogurl,

    The following code will download the file to a temporary directory first so that you can pass the path to FAISS:

    import os.path
    import shutil
    import tempfile
    
    import dataiku
    
    folder = dataiku.Folder("FOLDER")
    
    with tempfile.TemporaryDirectory() as temp_dir:
        path = os.path.join(temp_dir, "my-file.txt")
        
        # Download the remote file to `path`
        with folder.get_download_stream("/my-file.txt") as download_stream:
            with open(path, "wb") as local_file:
                shutil.copyfileobj(download_stream, local_file)
                
        # Do stuff with the temp file here
        # It will be automatically deleted when the `temp_dir` block finishes
        print(path)

    Thanks,

    Zach
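The copy-to-a-temp-path pattern in Zach's answer generalizes beyond DSS. Here is the same shape, runnable anywhere, with an io.BytesIO standing in for the download stream (file name and contents are invented):

```python
import io
import os.path
import shutil
import tempfile

# io.BytesIO stands in for folder.get_download_stream(...)
fake_stream = io.BytesIO(b"model bytes")

with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "my-file.bin")

    # Spool the stream to a real filesystem path
    with open(path, "wb") as local_file:
        shutil.copyfileobj(fake_stream, local_file)

    # Anything that needs a path (e.g. FAISS) can use `path` here
    with open(path, "rb") as f:
        data = f.read()

# The temp dir and the file are deleted once the block exits
print(data)  # b'model bytes'
```

shutil.copyfileobj streams in chunks, so this also works for files too large to hold in memory at once.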

  • Hi there, I encounter the sudden issue of not being able to load datasets into a Jupyter Notebook. Changing environment/Kernel doesn't help. System reboot doesn't help. Force reloading doesn't help ne…
    Question
    Started by ebbingcasa
    Most recent by ebbingcasa
    Last answer by ebbingcasa

    Added the following error code to the first post:

    "java.lang.SecurityException: Ticket not given or unrecognized"

  • Hi, while working on a Jupyter notebook to build a dataset, getting the attached error. Have tried reloading the notebook as well. Can it be due to Dataiku configuration. Kindly suggest. Thanks, Parul…
    Question
    Started by Parul_ch
    Most recent by SaschaS
    Last answer by SaschaS

    Hi,

    did you solve this problem? Since yesterday I have been experiencing the same issue.

    Best
    Sascha

  • Hi, I'm trying to train a model in lab using the API from a notebook. I'm using the below code to setup the ML task.I'm currently using the "MasterData" as my data. I want to use a different dataset "…
    Question
    Started by Mohammed
    Most recent by Mohammed
    Last answer by Mohammed

    @AlexT, I didn't follow the solution you provided.
    I see a method set_split_explicit to set the train/test split to an explicit extract from one or two dataset(s).

    I tried it as follows:
    settings.get_split_params().set_split_explicit(train_selection, test_selection, test_dataset_name="UpcomingData_Toronto")
    I'm not sure what is expected of the train_selection and test_selection arguments.
    In the documentation it is given as below.
  • I am trying to train models with "Time ordering" enabled on the attached dataset I get the error message below, but training sails successfully when "Time ordering" is not enabled. The file is a merge…
    Question
    Started by sdfungayi
    Most recent by Azucena
    Last answer by Azucena

    Hi @sdfungayi

    Pretty old post, but were you able to solve this issue?
    I was having a similar issue and came across your post. The error I was getting was:
    <class 'dataiku.doctor.preprocessing.dataframe_preprocessing.DkuDroppedMultiframeException'>: ['target'] values all empty, infinity or with unknown classes (you may need to recompute the training set)

    I sorted my issue by making sure that the storage type and meaning of my target variable both reflected its discrete values.
    It was originally storage type = string, meaning = Decimal, and I was trying to use random forest and logistic regression.

    Once I changed my target variable to storage type = string, meaning = Integer, the ML models were able to run.
    I kept the storage type as string, as I needed to distinguish the undefined records from the values (0 and 1).

    In your post you mentioned you also tried changing meanings; any success with that?
    It worked for mine, hope yours does as well!

  • Hi, I want to update a sql server table (dataset) . I understand that there are scenarios option that triggers update sql querry . As this feature is not included in my licence, what are the other opt…
    Question
    Started by Lo96
    Most recent by Alexandru
    Last answer by Alexandru

    Hi @Lo96,
    SQL triggers are indeed license-based and typically require a Business or Enterprise license:
    https://doc.dataiku.com/dss/latest/scenarios/triggers.html#sql-triggers
    However, SQL triggers are not for updating SQL datasets; instead, they trigger a scenario when the result of an SQL query changes.
    If you want to update a SQL Server table, you can simply build a dataset / run a recipe that does the update.
    You can trigger time-based scenarios with a Discover license as well.
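As a sketch of "a recipe that does the update": the statement such a recipe would run can be shown outside DSS. SQLite is used here purely as a runnable stand-in for SQL Server, and the table and values are invented.

```python
import sqlite3

# In-memory SQLite table standing in for the SQL Server dataset
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "new"), (2, "new"), (3, "shipped")],
)

# In DSS, this UPDATE would live in a recipe run by a
# time-based scenario instead of a SQL trigger
conn.execute("UPDATE orders SET status = 'processed' WHERE status = 'new'")
conn.commit()

result = conn.execute(
    "SELECT status, COUNT(*) FROM orders GROUP BY status ORDER BY status"
).fetchall()
print(result)  # [('processed', 2), ('shipped', 1)]
```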

