Using Dataiku
- I have this error, I don't know what it means and I can't find anything about it. Last answer by Sarina
Hi @B2oriel,
If you can open a support ticket with a job diagnostic attached, we can take a look.
Thanks,
Sarina
- Hi everyone, I'm trying to create an amount field based on some transaction values. I have used a prepare recipe with this formula: if(transaction_type.match('sale', 'credit'), amount, 0) Can anyone he… Solution by Sarina
Hi @obidakacem,
Perhaps you can share an example row of your data and exactly what check you want to perform? It sounds like if the `transaction_type` column is either `sale` or `credit`, you want the column to contain `amount` and otherwise set the column value to 0.
The way match works is described in the documentation:
match(string s, string or regexp) → array of strings
Returns an array of the matching groups found in s. Groups are designated by () within the specified string or regular expression. For example, match('hello world', 'he(.*)wo(rl)d') returns ["llo ", "rl"].
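For comparison, the same group extraction can be sketched in plain Python with the re module (this illustrates how match() behaves; it is not Dataiku formula code):

```python
import re

# The formula-language match() returns the regex capture groups;
# Python's re.search exposes the same groups via .groups()
m = re.search(r"he(.*)wo(rl)d", "hello world")
print(list(m.groups()))  # ['llo ', 'rl']
```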
From your description, I think you simply want to check if transaction_type is equal to sale or credit instead of returning a match. In that case, I think you could simply do:
if((transaction_type == 'sale' || transaction_type == 'credit'), amount, 0)
Let me know if you have any further questions about the formula.
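If you end up doing the same transformation in a Python recipe instead, equivalent logic can be sketched with pandas (the sample rows and column names below are made up for illustration):

```python
import pandas as pd
import numpy as np

# Hypothetical sample rows mirroring the question
df = pd.DataFrame({
    "transaction_type": ["sale", "credit", "refund"],
    "amount": [100.0, 50.0, 25.0],
})

# Keep amount when the type is 'sale' or 'credit', otherwise 0
df["amount_out"] = np.where(
    df["transaction_type"].isin(["sale", "credit"]), df["amount"], 0
)
print(df["amount_out"].tolist())  # [100.0, 50.0, 0.0]
```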
Thank you,
Sarina
- I have a task to perform web scraping in a Dataiku notebook, and for that purpose, I need to utilize ChromeDriver. However, I'm unsure about the process of installing ChromeDriver and integrating it i… Last answer by Alexandru
Hi @Ramya,
So you would need your systems admin to install:
1) ChromeDriver:
wget https://chromedriver.storage.googleapis.com/$(curl -sS https://chromedriver.storage.googleapis.com/LATEST_RELEASE)/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/local/bin/
2) Download and install Chrome:
sudo wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo yum localinstall google-chrome-stable_current_x86_64.rpm
Then you should add selenium to a code env and use:

import dataiku
from selenium import webdriver
import time
import pandas as pd

# Selenium setup
prefs = {"download.default_directory": "/tmp", "prompt_for_download": "false"}
output_dataset = dataiku.Dataset("fitness2")
chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument("--headless")
chromeOptions.add_argument("--download.prompt_for_download=false")
chromeOptions.add_argument("--download.default_directory=/tmp")
chromeOptions.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome('/usr/local/bin/chromedriver', chrome_options=chromeOptions)
try:
    driver.get('https://www.browserstack.com/test-on-the-right-mobile-devices')
    downloadcsv = driver.find_element_by_css_selector('.icon-csv')
    gotit = driver.find_element_by_id('accept-cookie-notification')
    gotit.click()
    downloadcsv.click()
    time.sleep(5)
    driver.close()
except:
    print("Invalid URL")
    driver.close()

# read downloaded file and create dataset
cereal_df = pd.read_csv("/tmp/BrowserStack - List of devices to test on.csv")
output_dataset.write_with_schema(cereal_df)
- Hello, Dataiku Team. I'm trying to send a POST request to an endpoint I created in an API node. However, I get this error: Access to XMLHttpRequest at 'http://ip_server/public/api/v1/service_time_seri… Last answer by rafael_rosado97
It works now.
The configuration was OK. The problem was the location of the CORS headers in nginx.conf.
The path used in nginx.conf was /public-cors/, so I modified the endpoint URL to http://<ip_machine>:11200/public-cors/api/v1/service_time_series_variables/autocorrelation_analysis/run
Thank you very much!!
- Hello, Dataiku Team. I am facing a problem when I want to read a dataset using dataikuapi. This is in order to read a dataset from an endpoint that is the API node. The way I read the dataset is as fo…Last answer by
- Hi, I am looking for ways to extract the exact train/test/validation sets used in visual ML. This not only implies the data splits, but also datasets that include all new features created as a result …Last answer by
- Hello there, I'm a beginner in using Dataiku (container version with Docker) so sorry if I didn't find the answer that could already be in documentation despite having searched already. I'm doing basi… Solution by bricejoosten
Self-answer as an update:
I don't know why I didn't think about this earlier, but this may be the cleanest thing to do: since my 22 raw input datasets have to be prepared individually and then merged together, I can divide everything into 4 zones: 3 zones of 6 dataset preparations each and a last one of 4.
These zones will cover the cleaning part of the data and should result in 4 output datasets, which will be visually much better than 22, especially once I add enrichment recipes afterwards.
And this is where I realize how handy subflows would come in: I would like, in the end, to have four big zones in the project (cleaning, normalization, enrichment, and AI exploitation), so that, for instance, the cleaning zone contains all these visually inconvenient sub-zones.
I think I've fixed my problem in the most optimal way given the solutions that exist at the moment. Don't hesitate to tell me if there are problems with my reasoning, or if it will cause issues down the line; suggestions are welcome. Thanks for the help.
- Hello, I have a list containing items of data frame type. I want to write this output to a text file or JSON file. Can anyone help me please? Last answer by Turribeach
The following Python code in a Python recipe will output the contents of the df dataframe to a CSV file:
import dataiku

path_upload_file = "output.csv"
handle = dataiku.Folder("Some Managed Folder")
with handle.get_writer(path_upload_file) as w:
    w.write(df.to_csv().encode('utf-8'))
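Since the question mentioned text or JSON output, a variant that serializes a list of DataFrames to newline-separated JSON can be sketched as follows (the dfs list and folder name are illustrative assumptions, not from the original thread):

```python
import pandas as pd

# Hypothetical stand-in for the list of DataFrames from the question
dfs = [pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3]})]

# One JSON array per DataFrame, newline-separated; with a real managed
# folder you would write payload.encode('utf-8') via handle.get_writer("output.json")
payload = "\n".join(df.to_json(orient="records") for df in dfs)
print(payload)
```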
- Hi, I imported a dataset from a SharePoint. I have not added any cleaning recipe yet. I added an export recipe to spit out the data to Tableau server. Everything works when I run / build the entire fl…Last answer by