Join us on July 16th as we explore real-world Reinforcement Learning Learn more

Your favorite Dataiku DSS tips & tricks?

Community Manager
Community Manager
Your favorite Dataiku DSS tips & tricks?

Hi there,

What are some of your favorite tips & tricks in Dataiku DSS that make your life easier?

Sharing them here would surely be valuable to all new members who are joining us, and we might also have a special interest in handpicking a few of them for a dedicated newsletter - stay tuned 😉

Thanks in advance for your input!

Lisa, Community Programs Manager at Dataiku

Don't forget to mark as "Accepted Solution" when someone provides the correct answer to your question 😉
8 Replies
Dataiker
Dataiker

Use selection of some characters in a cell during a prepare recipe to add a step to split a column or to replace some strings! 

Mattsco

Mattsco
Level 3

Create a dataset metric that counts rows with duplicate keys in a SQL table by creating a SQL Probe (at the bottom of the metric editing page) like the following:


SELECT COUNT(*) AS "Duplicate Key Value Count" FROM (SELECT REC_KEY FROM ${DKU_DATASET_TABLE_NAME} GROUP BY 1 HAVING COUNT(*) > 1) T;

Of course one can then create a check on the metric that fails (or warns) a dataset rebuild if value > 0.

Level 3

Execute SQL data modification statements from a Python recipe or notebook. You might want to clear a table prior to inserting rows, delete specified rows, create or drop a table, etc.

Here is example code:

from dataiku.core.sql import SQLExecutor2

sqlexecutor = SQLExecutor2(connection='sql_database_connection_name')
main_query = "SELECT COUNT(*) AS REC_CNT FROM YOURTABLE" # or "SELECT 1" if don't want to check results
pre_queries = ["TRUNCATE TABLE YOURTABLE; COMMIT"] # A list is expected but ok if all of SQL is in one list item; a COMMIT seems to be required in the couple of databases I use

result_df = sqlexecutor.query_to_df(main_query, pre_queries=pre_queries)

assert (result_df['REC_CNT'].iloc[0] == 0), "Truncate of table YOURTABLE failed (record count > 0)"

Hi Marlan

 

Can we run instert statements using SQLExecutor2

0 Kudos

Hi @rsingamsetty, you are interested in running SQL insert statements? I haven't tried SQL insert statements specifically but the technique I described should work for any SQL statement so I'd expect that this would work.

Marlan

Dataiker
Dataiker

Hard to choose one so I'll list a few excellent DSS tricks: 

- drag and drop multiple files in upload box to stack at import

- after training models in the Lab, export the code to Jupyter notebook

- use the View button on the bottom left corner of the Flow to cycle between flow views (tags, connections, recipe engines...)

-  use the select tool (shift + click) to select multiple recipes or copy a part of your flow

- hit Space in the Flow to cycle between right panel tabs

DSS Fanboy
Level 1

If you are dealing with multiple source data and later if you have to merge or stack ADS coming from the all source datasets then for the sake of convenience, make sure the storage type of the respective  columns of all the datasets  are same.

Level 2

I really like making R Shiny applications. I also really like developing these apps in RStudio (one of my favorite IDEs of all time). There are certain areas of my code base that must be different if it's executing on DSS or if it's executing in my local RStudio environment (reading in data is the perfect example). I've found a quick workaround for this by testing if I'm able to load the `dataiku` library or not within R. I haven't tried it yet, but I assume the same type of logic would work in python (might require the importlib standard library?).

This has definitely accelerated the R Shiny development process in DSS. I'm not saying copy/paste is a good deployment strategy, but it's much better than copy, paste, then edit all of the local-environment-specific code to fit DSS. 

 

# R:
# Determine if we're in DSS or not - this dictates how/where we read data from
libload_attempt_dataiku <- tryCatch(expr={library(dataiku)}, error = function(e) e)
in_DSS <- ifelse(!"packageNotFoundError" %in% class(libload_attempt_dataiku), yes=TRUE, no=FALSE)

if (in_DSS) {

# DSS Processing ===========================
df <- dkuReadDataset("<name of my dataset>")

} else {

# Local Processing =======================
df <- read.csv("<name of dataset>.csv", stringsAsFactors = F)
}

# ...application code...