-
How to remove scientific notation in a column
Formatting numbers can often be a tedious data cleaning task. It can be made easier with the format() function of the DSS Formula language. This function takes a "printf format string" and applies it to any value. Format strings are immensely powerful, as they allow you to truncate strings, change precision, switch between…
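The excerpt above refers to "printf format strings". As a rough illustration of what such a format string does, here is a minimal Python sketch; it is a Python analogue and not DSS Formula code. The helper name `remove_scientific_notation` and the default of two decimal places are assumptions made for this example.

```python
# Python analogue of printf-style formatting; NOT DSS Formula syntax.
# The function name and the default precision are hypothetical.
def remove_scientific_notation(value, decimals=2):
    """Render a number in fixed-point notation with `decimals` decimal places."""
    # "%.*f" reads the precision from the argument tuple, like printf's "%.2f".
    return "%.*f" % (decimals, float(value))


if __name__ == "__main__":
    print(remove_scientific_notation("1.5e3"))    # 1500.00
    print(remove_scientific_notation(2.5e-3, 4))  # 0.0025
```

The same `%f`-style specifier is what a printf format string uses to switch from exponent notation to plain decimal output.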
-
Crawl budget prediction for enhanced SEO with OnCrawl plugin
We’re pleased to share that Dataiku has published an OnCrawl plugin. At OnCrawl, we are convinced that data science, like technical SEO, is essential to strategic decision-making in forward-looking companies today. The complexity of today's markets, the sheer volume of data available affecting SEO, the growing opacity of…
-
How to copy a recipe in your Flow
Do you have recipes that you want to re-use elsewhere in a project? You can copy recipes from the Flow for use in the same project. From the Flow, click the recipe you want to copy, and a Copy action will appear in the Actions sidebar on the right. You will be asked to choose on which dataset the recipe should be applied…
-
How to segment your data using statistical quantiles
You can create statistical quantiles without code in Dataiku DSS in two ways: * The Split recipe allows you to break down each quantile into separate datasets, so it can be useful if you’re planning to separately handle a small number of quantiles like quartiles or deciles. * The Window recipe allows you to create a new…
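For readers who want to see the underlying idea in code, the following is a minimal sketch of assigning each value to a quantile bucket using only the Python standard library. It does not reproduce what the Split or Window recipes do internally; the function name `assign_quantile` and the default of four buckets (quartiles) are assumptions for illustration.

```python
import bisect
import statistics


def assign_quantile(values, n=4):
    """Return a bucket index (1..n) for each value, based on n-1 cut points."""
    # statistics.quantiles returns the n-1 cut points that split the data
    # into n groups of roughly equal size.
    cuts = statistics.quantiles(values, n=n)
    # bisect_left counts how many cut points lie strictly below the value.
    return [bisect.bisect_left(cuts, v) + 1 for v in values]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    print(list(zip(data, assign_quantile(data))))
```

Rows sharing the same bucket index could then be handled together, or written to separate outputs.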
-
Utilizing MS Access in Dataiku DSS
Many of our users have shown interest in utilizing MS Access in Dataiku DSS. In the interest of knowledge sharing, we wanted to demonstrate how to do just that. How to open an MS Access file * Download ucanaccess * Copy ucanaccess-4.0.4.jar and jackcess-2.1.11.jar (including the ones from lib/) into DATA_DIR/lib/jdbc/ (see…
-
Which activities in DSS require that a user be added to the allowed_user_groups local Unix group?
When configuring the setup of the local code isolation capability of the User Isolation Framework* (formerly known as Multi-User Security), you must fill in the allowed_user_groups settings with the list of UNIX groups to…
-
How to enable auto-completion in Jupyter Notebook
Many of you have shown interest in enabling auto-completion in Jupyter Notebooks so, in the interest of knowledge sharing, we wanted to demonstrate just how simple it is. Access the Jupyter Menu You have auto-complete in Jupyter notebooks like you have in any other Jupyter environment. Simply hit the "Tab" key while…
-
How to duplicate a DSS project
In many situations, an existing project can serve as a useful template for a new project. Fortunately, it is very easy to duplicate a Dataiku DSS project so you never need to manually replicate a Flow. Whether you want to copy the project to the same or another instance, detailed instructions for duplicating a project can…
-
Dataiku community website / newest Firefox
I'm using an up-to-date Mozilla Firefox, but the website is unusable with it. I can't post questions or do anything else, since links are blocking my view. Operating system used: Windows 11
-
Dataiku DSS integrates with VSCode
Although Jupyter notebooks are integrated into the Dataiku DSS interface, some developers favor writing code in an external IDE. DSS has integrations with a number of popular IDEs, including VSCode, that make it easy to manage code. The short video below demonstrates how the DSS extension allows developers to pull and push…
-
Building a Jenkins pipeline for Dataiku DSS
In this post, we will show how to set up a sample CI/CD (continuous integration / continuous deployment) pipeline built on Jenkins for our Dataiku DSS project. It follows our blog post Continuous integration and continuous deployment (CI/CD) in Dataiku DSS that presents the concepts and some important questions in order to…
-
How to use NLTK in DSS
Greetings fellow Linguists, You can start by installing NLTK (Natural Language Toolkit) as any other Python package in DSS, by creating a code environment and adding "nltk" to your package requirements. To do so, follow this documentation. However, some functionalities of NLTK such as text corpora and language-specific…
-
How to use spaCy models in DSS
Greetings fellow Linguists, To use spaCy models in DSS, you can start by installing it like any other Python package in DSS: by creating a code environment and adding "spacy" to your package requirements. To do so, follow this documentation. However, some functionalities of spaCy, such as language-specific tokenizers, rely…
-
How to pad a number with leading zeros
A common requirement when you have a column of numbers is to format all numbers so that they have the same length, adding leading zeros if needed. This can be done in the DSS preparation recipe using a Formula. The formula function to use is format. For example, to ensure that all values of the column mycolumn are padded…
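The padding idea described above can be sketched in Python with a printf-style format string. This is an analogue for illustration and not DSS Formula code; the helper name `pad_with_zeros` and the width of 5 are hypothetical choices, since the excerpt does not state a width.

```python
# Python analogue of printf-style zero padding; NOT DSS Formula syntax.
# The function name and the default width are hypothetical.
def pad_with_zeros(value, width=5):
    """Left-pad an integer with zeros so the result has at least `width` digits."""
    # "%0*d": the 0 flag pads with zeros, * reads the width from the arguments.
    return "%0*d" % (width, int(value))


if __name__ == "__main__":
    for raw in ["7", "42", "12345", "1234567"]:
        print(raw, "->", pad_with_zeros(raw))
```

Values already longer than the chosen width are left unchanged rather than truncated.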
-
Use a React Frontend to Create a Web App
React webapps are not natively supported by DSS, but it’s still possible to integrate a React application into Dataiku DSS with the help of a DSS dev plugin and a visual webapp. In this article, I'll discuss a few ways you can do this. Quick start All of the steps below are implemented in a demo plugin found in this…
-
How to display an image with Bokeh?
This article applies both to: * Bokeh webapps * Usage of the Bokeh library in a Jupyter notebook Add your image to the "Static Web Resources" * In the global menu of DSS, select "Global Shared Code". If you don't see this menu, your administrator needs to grant you additional permissions. * Click on "Static Web Resources"…
-
Cannot display a web content insight in a dashboard
While adding a "web content" insight to a Dataiku DSS dashboard, you may see either a blank insight or an "unhappy face" icon. One possible cause is that you are trying to embed non-secure (http://) content within a secure (https://) DSS instance. This kind of embedding is forbidden by browsers as a security measure. Dataiku DSS cannot…
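The situation described above (http:// content embedded in an https:// page) can be checked ahead of time. The following is a minimal sketch, assuming only the URL schemes matter; real browser rules are more nuanced. The function name and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def is_mixed_content(dss_url, embedded_url):
    """Return True if a secure (https) page would embed non-secure (http) content."""
    page_scheme = urlparse(dss_url).scheme.lower()
    embedded_scheme = urlparse(embedded_url).scheme.lower()
    return page_scheme == "https" and embedded_scheme == "http"


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(is_mixed_content("https://dss.example.com", "http://reports.example.com"))
    print(is_mixed_content("https://dss.example.com", "https://reports.example.com"))
```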
-
How to standardize text fields using fuzzy values clustering
When working with large amounts of disparate, user-entered text data, we often need to standardize or collapse entries into a resolved form. For example, how can we get a computer to recognize that strings like "Abraham Lincoln", "Abe Lincoln", and "Abrahm Lincoln" are actually the same category? We want to map these close…
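To make the idea of mapping close spellings to one resolved form concrete, here is a minimal Python sketch using `difflib` from the standard library. It is not the algorithm DSS uses for fuzzy values clustering; the greedy strategy, the function name `cluster_values`, and the similarity threshold of 0.7 are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def cluster_values(values, threshold=0.7):
    """Greedily map each string to the first earlier string it closely resembles.

    Returns a dict {original value: representative value}.
    """
    representatives = []
    mapping = {}
    for value in values:
        for rep in representatives:
            # Case-insensitive similarity ratio between 0.0 and 1.0.
            score = SequenceMatcher(None, rep.lower(), value.lower()).ratio()
            if score >= threshold:
                mapping[value] = rep
                break
        else:
            # No close match found: this value starts a new cluster.
            representatives.append(value)
            mapping[value] = value
    return mapping


if __name__ == "__main__":
    names = ["Abraham Lincoln", "Abe Lincoln", "Abrahm Lincoln", "George Washington"]
    for original, resolved in cluster_values(names).items():
        print(original, "->", resolved)
```

With this threshold, the three Lincoln spellings from the example resolve to the first one seen, while an unrelated name stays in its own cluster. A greedy pass like this depends on input order, so results should be reviewed before being applied to real data.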