Using Dataiku

751 - 760 of 5.2k
  • I want to check the Dataiku status by running the ./dss status command and it returns: "dss: DSS supervisor is not running". Dataiku is working fine without any problem, but the command is not working. I validated m…
    Question
    Started by rafael_rosado97
    Most recent by Turribeach
    0
    1
    Last answer by Turribeach

    "DSS supervisor is not running" doesn't mean DSS is not running. It means the supervisord process which looks after the DSS processes is not running. Have a look at the ./run/supervisord.log to see what the problem with the supervisord process was or do "dss restart" to restart all processes.

  • Hi, In my input dataset, I have a string column named vars like this [" 20547","21513 "], with an array meaning. I would like to check if each element of this array is in another array defined in glo…
    Question
    Started by EdBerth
    Most recent by Turribeach
    0
    1
    Last answer by Turribeach

    You are not returning the correct data structure: you probably changed the mode of the Python function and forgot to update the code snippet by clicking on Edit Python Source Code. To return a new cell for each row you should use a function such as this one:

    # Modify the process function to fit your needs
    import pandas as pd
    def process(rows):
        # In 'cell' mode, the process function must return
        # a single Pandas Series for each block of rows,
        # which will be assigned to a new column.
        # The 'rows' argument is a dictionary of the columns in the
        # block of rows; the values are Pandas Series, and the
        # object also exposes an 'index' field.
        return pd.Series(len(rows), index=rows.index)

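    Building on that, a hypothetical sketch of the membership check asked about in the question could look like the following; the column name "vars" and the reference values are assumptions, and in DSS the reference list could instead be read from the project variables (e.g. via dataiku.get_custom_variables()).

    # Hypothetical sketch: flag rows whose 'vars' elements are all contained
    # in a reference set (column name and reference values are assumptions)
    import json

    import pandas as pd

    ALLOWED = {"20547", "21513"}  # could be loaded from the project variables instead

    def process(rows):
        def all_in_allowed(value):
            try:
                # e.g. '[" 20547","21513 "]' -> [' 20547', '21513 ']
                elements = json.loads(value)
            except (TypeError, ValueError):
                return False
            return all(str(element).strip() in ALLOWED for element in elements)

        # One boolean per row in the block, aligned on the block's index
        return pd.Series([all_in_allowed(value) for value in rows["vars"]], index=rows.index)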

  • I am trying to run a Python recipe and have a model saved in a managed folder. I understand that I have to use get_download_stream() to read the data, but the Python module that I need to use (FAISS) …
    Answered ✓
    Started by Astrogurl
    Most recent by info-rchitect
    0
    3
    Solution by Zach

    Hi @Astrogurl,

    The following code will download the file to a temporary directory first so that you can pass the path to FAISS:

    import os.path
    import shutil
    import tempfile
    
    import dataiku
    
    folder = dataiku.Folder("FOLDER")
    
    with tempfile.TemporaryDirectory() as temp_dir:
        path = os.path.join(temp_dir, "my-file.txt")
        
        # Download the remote file to `path`
        with folder.get_download_stream("/my-file.txt") as download_stream:
            with open(path, "wb") as local_file:
                shutil.copyfileobj(download_stream, local_file)
                
        # Do stuff with the temp file here
        # It will be automatically deleted when the `temp_dir` block finishes
        print(path)

    Thanks,

    Zach
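
    If the file in the folder is actually a serialized FAISS index (an assumption here), the local path can then be handed to FAISS inside the temporary-directory block, at the "Do stuff with the temp file here" comment. A minimal sketch:

    import faiss  # assumes the faiss package is installed in the recipe's code environment

    # Hypothetical: load the index from the local temporary path created above
    index = faiss.read_index(path)
    print(index.ntotal)  # number of vectors stored in the index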

  • Hi there, I encounter the sudden issue of not being able to load datasets into a Jupyter Notebook. Changing environment/Kernel doesn't help. System reboot doesn't help. Force reloading doesn't help ne…
    Question
    Started by ebbingcasa
    Most recent by ebbingcasa
    0
    3
    Last answer by ebbingcasa

    I added the following error message to the first post:

    "java.lang.SecurityException: Ticket not given or unrecognized"

  • Hi, while working on a Jupyter notebook to build a dataset, I am getting the attached error. I have tried reloading the notebook as well. Could it be due to the Dataiku configuration? Kindly suggest. Thanks, Parul…
    Question
    Started by Parul_ch
    Most recent by SaschaS
    0
    4
    Last answer by SaschaS

    Hi,

    did you solve this problem?
    Since yesterday I have been experiencing the same issue.

    Best
    Sascha

  • Hi, I'm trying to train a model in the Lab using the API from a notebook. I'm using the code below to set up the ML task. I'm currently using "MasterData" as my data. I want to use a different dataset "…
    Question
    Started by Mohammed
    Most recent by Mohammed
    0
    2
    Last answer by Mohammed

    @AlexT, I didn't follow the solution you provided.
    I see a method set_split_explicit to set the train/test split to an explicit extract from one or two dataset(s).

    I tried it as follows:
    settings.get_split_params().set_split_explicit(train_selection, test_selection,
    test_dataset_name="UpcomingData_Toronto")
    I'm not sure what is expected for the train_selection and test_selection arguments.
    In the documentation it is given as below.
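
    For what it's worth, a hypothetical sketch of such a call over two datasets could look like the one below; the selection dicts and the sampling keys are assumptions, so the exact types accepted by train_selection and test_selection should be checked against the dataikuapi reference.

    # Hypothetical sketch: explicit train/test split over two datasets.
    # The selection dicts are assumptions, not confirmed API usage.
    split_params = settings.get_split_params()

    train_selection = {"samplingMethod": "FULL"}  # use all rows of the train dataset
    test_selection = {"samplingMethod": "FULL"}   # use all rows of the test dataset

    split_params.set_split_explicit(
        train_selection,
        test_selection,
        dataset_name="MasterData",                 # train dataset named in the question
        test_dataset_name="UpcomingData_Toronto",  # test dataset named in the thread
    )
    settings.save()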

  • I am trying to train models with "Time ordering" enabled on the attached dataset. I get the error message below, but training succeeds when "Time ordering" is not enabled. The file is a merge…
    Question
    Started by sdfungayi
    Most recent by Azucena
    0
    4
    Last answer by Azucena

    Hi @sdfungayi

    Pretty old post, but were you able to solve this issue?
    I was having a similar issue and came across your post. I was getting the error:
    dataiku.doctor.preprocessing.dataframe_preprocessing.DkuDroppedMultiFrameException: ['target'] values all empty, infinity or with unknown classes (you may need to recompute the training set)

    I sorted my issue by making sure that the storage type and meaning of my target variable matched a discrete target.
    It was originally storage type = string, meaning = decimal.
    I was trying to use random forest and logistic regression.

    Once I changed my target variable to storage type = string, meaning = Integer, the ML models were able to run.
    I kept the storage type as string, as I needed to distinguish the undefined records from the actual values (0 and 1).

    In your post you mentioned you also tried changing meanings; any success with that?
    It worked for mine, hope it works for yours as well!

  • Hi, I want to update a SQL Server table (dataset). I understand that there is a scenario option that triggers an update SQL query. As this feature is not included in my licence, what are the other opt…
    Question
    Started by Lo96
    Most recent by Alexandru
    0
    1
    Last answer by Alexandru

    Hi @Lo96,
    SQL triggers are indeed license-based and typically require Business or Enterprise.
    https://doc.dataiku.com/dss/latest/scenarios/triggers.html#sql-triggers
    However, SQL triggers are not for updating SQL datasets; instead, they trigger a scenario build when the result of an SQL query changes.
    If you want to update a SQL Server table, you can just build a dataset / run a recipe that does the update.
    You can trigger time-based scenarios with a Discover license as well.
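
    As one concrete illustration of that last option, here is a sketch of a Python recipe that rewrites a dataset stored on the SQL Server connection and that could be run from a time-based scenario; the dataset names are placeholders.

    import dataiku

    # Placeholder names: read a source dataset, transform it, and overwrite
    # a dataset whose storage is the SQL Server connection
    source = dataiku.Dataset("source_data")
    target = dataiku.Dataset("sql_server_table")

    df = source.get_dataframe()
    # ... apply whatever update logic is needed on df here ...
    target.write_with_schema(df)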


  • Hello community, I was working on a repetitive rework task and it occurred to me that I could automate it with a new scenario custom Python script. First I wanted to confirm that it works and then start…
    Question
    Started by Lucasjulian
    Most recent by Alexandru
    0
    1
    Last answer by Alexandru

    Hi @Lucasjulian,
    The method should work if there is something to build.
    Can you try with:
    scenario.build_dataset(dataset_name, build_mode='RECURSIVE_FORCED_BUILD')?

    If that still builds nothing, we may need the scenario diagnostics. Can you please open a support ticket with the scenario diagnostics?
    Thanks
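
    For context, a minimal custom Python scenario step along those lines might look like this; "my_dataset" is a placeholder name.

    # Minimal custom-scenario sketch: force a recursive rebuild of one dataset
    from dataiku.scenario import Scenario

    scenario = Scenario()
    scenario.build_dataset("my_dataset", build_mode='RECURSIVE_FORCED_BUILD')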

  • Hi all! I have a fairly straightforward problem (or at least I think that it is like that :) ). I have files arriving in an Azure Blob Storage container. I created a flow to process them without a prob…
    Question
    Started by Tomasz
    Most recent by Turribeach
    0
    7
    Last answer by Turribeach

    New files should go into ./landing via the source writing process, so anything in ./landing needs to be processed.
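
    If it helps, here is a small sketch of checking what is waiting to be processed, assuming a managed folder named "landing" points at the ./landing prefix (both the folder name and the mapping are assumptions).

    import dataiku

    # Hypothetical managed folder mapped to the ./landing prefix in the container
    landing = dataiku.Folder("landing")

    # List every path that still needs to be processed
    for path in landing.list_paths_in_partition():
        print("To process:", path)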
