Using Dataiku

61 - 70 of 517
  • I have this error, I don't know what it means and I can't find anything about it.
    Question
    Started by B2oriel
    Most recent by Sarina
    0
    1
    Last answer by Sarina

    Hi @B2oriel,

    If you can open a support ticket with a job diagnostic attached, we can take a look.

    Thanks,
    Sarina


  • Hi everyone, I'm trying to create an amount field based on some transaction values, I have used prepare recipe with this formula: if(transaction_type.match('sale', 'credit') , amount ,0) Can anyone he…
    Answered ✓
    Started by obidakacem
    Most recent by Sarina
    0
    1
    Solution by Sarina

    Hi @obidakacem,

    Perhaps you can share an example row of your data and exactly what check you want to perform? It sounds like if the `transaction_type` column is either `sale` or `credit`, you want the column to contain `amount` and otherwise set the column value to 0.

    The way match works is described here:

    match(string s, string or regexp p): array of strings

    Returns an array of the matching groups found in s. Groups are designated by () within the specified string or regular expression.

    match('hello world', 'he(.*)wo(rl)d') returns ["llo ", "rl"]

    From your description, I think you simply want to check if transaction_type is equal to sale or credit instead of returning a match. In that case, I think you could simply do:

    if((transaction_type == 'sale' || transaction_type == 'credit'), amount, 0)

    Let me know if you have any further questions about the formula.

    Thank you,
    Sarina

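    For readers doing the same check in a Python recipe rather than a Prepare recipe, the formula above maps to a one-liner with pandas/numpy. A minimal sketch; the sample rows are hypothetical:

```python
import numpy as np
import pandas as pd

# hypothetical rows mirroring the poster's columns
df = pd.DataFrame({
    "transaction_type": ["sale", "credit", "refund"],
    "amount": [100.0, 50.0, 25.0],
})

# same logic as the formula: keep amount for sale/credit rows, else 0
df["amount_out"] = np.where(
    df["transaction_type"].isin(["sale", "credit"]), df["amount"], 0
)
print(df["amount_out"].tolist())  # [100.0, 50.0, 0.0]
```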

  • I have a task to perform web scraping in a Dataiku notebook, and for that purpose, I need to utilize ChromeDriver. However, I'm unsure about the process of installing ChromeDriver and integrating it i…
    Question
    Started by Ramya
    Most recent by Alexandru
    0
    1
    Last answer by Alexandru

    Hi @Ramya,

    So you would need your systems admin to install:

    1) ChromeDriver
    wget https://chromedriver.storage.googleapis.com/$(curl -sS https://chromedriver.storage.googleapis.com/LATEST_RELEASE)/chromedriver_linux64.zip
    unzip chromedriver_linux64.zip
    sudo mv chromedriver /usr/local/bin/

    2) Chrome itself
    sudo wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
    sudo yum localinstall google-chrome-stable_current_x86_64.rpm

    Then you should add selenium to a code env and use:

    import dataiku
    import time
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    # Headless Chrome that downloads files to /tmp without prompting
    prefs = {"download.default_directory": "/tmp",
             "download.prompt_for_download": False}
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument("--headless")
    chrome_options.add_experimental_option("prefs", prefs)
    driver = webdriver.Chrome(service=Service("/usr/local/bin/chromedriver"),
                              options=chrome_options)

    output_dataset = dataiku.Dataset("fitness2")

    try:
        driver.get('https://www.browserstack.com/test-on-the-right-mobile-devices')
        downloadcsv = driver.find_element(By.CSS_SELECTOR, '.icon-csv')
        gotit = driver.find_element(By.ID, 'accept-cookie-notification')
        gotit.click()
        downloadcsv.click()
        time.sleep(5)  # give the download time to complete
    except Exception as err:
        print("Scraping failed:", err)
    finally:
        driver.quit()

    # read the downloaded file and write it to the output dataset
    cereal_df = pd.read_csv("/tmp/BrowserStack - List of devices to test on.csv")
    output_dataset.write_with_schema(cereal_df)
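    Before running the recipe, it may help to confirm that the binaries from steps 1) and 2) are actually visible on the PATH. A small sketch (the binary names are assumed from the install steps above):

```python
import shutil

# check that the admin-installed binaries are reachable on PATH
for binary in ("chromedriver", "google-chrome"):
    path = shutil.which(binary)
    print(binary, "->", path or "NOT FOUND")
```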

  • Hello, Dataiku Team. I'm trying to send a POST request to an endpoint I created in an API node. However, I get this error: Access to XMLHttpRequest at 'http://ip_server/public/api/v1/service_time_seri…
    Question
    Started by rafael_rosado97
    Most recent by rafael_rosado97
    0
    10
    Last answer by rafael_rosado97

    It works now.

    The configuration was OK. The problem was the location of the CORS headers in nginx.conf.

    The path used on nginx.conf was /public-cors/, so I modified the endpoint url as http://<ip_machine>:11200/public-cors/api/v1/service_time_series_variables/autocorrelation_analysis/run

    Thank you very much!!
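    For anyone hitting the same error, the call shape after the fix looks roughly like this. A sketch only: `<ip_machine>` is kept as a placeholder from the post, and the payload schema is an assumption (the endpoint defines its own):

```python
import json
import urllib.request

BASE = "http://<ip_machine>:11200"  # API node host (placeholder)
PATH = ("/public-cors/api/v1/service_time_series_variables/"
        "autocorrelation_analysis/run")

def run_endpoint(payload: dict) -> bytes:
    """POST a JSON payload to the endpoint behind the /public-cors/ nginx location."""
    req = urllib.request.Request(
        BASE + PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

print(BASE + PATH)
```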

  • Hello, Dataiku Team. I am facing a problem when I want to read a dataset using dataikuapi. This is in order to read a dataset from an endpoint that is the API node. The way I read the dataset is as fo…
    Question
    Started by rafael_rosado97
    Most recent by Turribeach
    0
    1
    Last answer by Turribeach

    Did you try using the Internal API?

  • Hi, I am looking for ways to extract the exact train/test/validation sets used in visual ML. This not only implies the data splits, but also datasets that include all new features created as a result …
    Question
    Started by yashpuranik
    Most recent by Samruda
    2
    4
    Last answer by Samruda

    I see the option to export models under auto ml prediction but not under auto ml clustering sessions. Is that part of the design? Is there a way to keep track of datasets used in building the clustering models?

  • Hello there, I'm a beginner in using Dataiku (container version with Docker) so sorry if I didn't find the answer that could already be in documentation despite having searched already. I'm doing basi…
    Answered ✓
    Started by bricejoosten
    Most recent by bricejoosten
    0
    7
    Solution by bricejoosten

    Self-answer as an update:

    I don't know why I didn't think of this earlier, but it may be the cleanest thing to do: since my 22 raw input datasets have to be prepared individually and then merged together, I can divide everything into 4 zones: 3 zones of 6 dataset preparations each, and a last one of 4.

    These zones will cover the cleaning part of the data and should result in 4 output datasets, which will be visually much better than 22, especially once I add some enrichment recipes afterwards.

    And this is where I realize how handy subflows would be: in the end, I would like the project to have four big zones (cleaning, normalization, enrichment, and exploitation with AI), so that, for instance, the cleaning zone contains all of these visually inconvenient sub-zones.

    I think I've fixed my problem in the most optimal way possible with the existing features. Don't hesitate to tell me if there are problems with my reasoning, or to suggest other approaches. Thanks for the help.

  • Hello, I have a list whose items are dataframes, and I want to write it out as a text file or JSON file. Can anyone help me please?
    Question
    Started by tounsi
    Most recent by Turribeach
    0
    3
    Last answer by Turribeach

    The following Python code in a Python recipe will output the contents of the df dataframe to a CSV file:

    import dataiku

    # df is the existing pandas DataFrame you want to export
    path_upload_file = "output.csv"
    handle = dataiku.Folder("Some Managed Folder")

    with handle.get_writer(path_upload_file) as w:
        w.write(df.to_csv().encode('utf-8'))
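    The original question also mentions JSON output; the same managed-folder pattern works, only the serialization step changes. A minimal sketch of that step with a hypothetical dataframe:

```python
import pandas as pd

# hypothetical dataframe standing in for the poster's df
df = pd.DataFrame({"item": ["a", "b"], "value": [1, 2]})

# one JSON object per row; pass these bytes to w.write() inside the
# managed-folder writer instead of the CSV payload
payload = df.to_json(orient="records").encode("utf-8")
print(payload.decode("utf-8"))  # [{"item":"a","value":1},{"item":"b","value":2}]
```
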
  • Hi, I imported a dataset from a SharePoint. I have not added any cleaning recipe yet. I added an export recipe to spit out the data to Tableau server. Everything works when I run / build the entire fl…
    Question
    Started by Jeancarlos
    Most recent by Turribeach
    0
    3
    Last answer by Turribeach

    Correct.

  • In a list of accounts, I have three probabilities that give the likelihood of each account being in one of three groups: Low, Medium, and High. For a given account, they might have a 30% probability o…
    Question
    Started by COREY
    0