-
ETL Datastage to Dataiku Migrations
Hello, we have a legacy ETL system that uses IBM DataStage jobs to perform the ETL. Is there an automated way to migrate these DataStage jobs to Dataiku flows? We can export the DataStage jobs in JSON format, but we are not sure whether that export can be leveraged for the migration.
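I can't point to an official converter with certainty, but a low-risk first step toward any semi-automated migration is to inventory the exported jobs and their stage types, to estimate which Dataiku recipes each job maps onto. A minimal sketch, assuming a hypothetical export structure (real DataStage exports differ by version, so treat the JSON shape below as an illustration only):

```python
import json

# Hypothetical DataStage JSON export: the real schema varies by
# DataStage version, so this structure is an assumption.
export = json.loads("""
{
  "jobs": [
    {"name": "load_customers",
     "stages": [{"type": "SequentialFile"}, {"type": "Transformer"},
                {"type": "DB2Connector"}]},
    {"name": "daily_sales",
     "stages": [{"type": "OracleConnector"}, {"type": "Aggregator"},
                {"type": "SequentialFile"}]}
  ]
}
""")

def inventory(export):
    """Count stage types per job, to estimate the Dataiku recipes needed."""
    summary = {}
    for job in export["jobs"]:
        counts = {}
        for stage in job["stages"]:
            counts[stage["type"]] = counts.get(stage["type"], 0) + 1
        summary[job["name"]] = counts
    return summary

print(inventory(export))
```

From a summary like this, stage types can be mapped by hand to Dataiku equivalents (e.g. connector stages to datasets, Transformer/Aggregator stages to prepare or group recipes); the mapping itself would still be manual.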
-
How to show Spark progress within Jupyter Notebook?
I'm used to working in Jupyter on standard AWS EC2 instances and via WSL. In both of these, PySpark displays progress while performing queries / transformations. Is there a way to get the same behaviour in Dataiku's Jupyter implementation? As always, I have set "spark.ui.showConsoleProgress" to "true"; however, it does not…
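For reference, outside Dataiku this is governed by a plain Spark property that can be set in `spark-defaults.conf` (or per-session); whether Dataiku's managed notebook kernels surface the console output is exactly the open question here:

```
# spark-defaults.conf
spark.ui.showConsoleProgress   true
```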
-
How do I get the model to automatically re-train after the dataset is updated
When the dataset is updated (including when the input schema changes), how do I get the model to automatically adapt to this change and re-train? For now, what's happening is that, since I also use tools like VIF to pre-filter the columns, when the data is updated, the columns that are filtered out will…
-
Basics 103 Tutorial - Data Preparation... Module - Error message
Hi, I have been working through the tutorials from Basics 101 to Basics 103, but I get stuck on the Data Preparation and Visualization in the Lab module (Basics 103). I get the following error message when I try to build the deployed script (see attachment). Thoughts on this one? Operating system used: Windows 10
-
Dynamic Snowflake schema not working with recipe variables that override project variables
Hi, we have a Snowflake connection that lets users set up project variables to supply the connection details:
{
  "snowflake_db": "MYDB",
  "snowflake_role": "MYUSER",
  "snowflake_wh": "MYWH",
  "snowflake_schema": "MYSCHEMA"
}
When we create a dataset in Dataiku, it looks like this: This works great, unless I override the Snowflake…
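To pin down the setup being described: Dataiku substitutes `${...}` variable references in connection/dataset settings, and a more local variable scope is expected to shadow the project-level one. A sketch of the intended layering (the override value below is a hypothetical illustration, not from the post):

```
Project variables:
{ "snowflake_schema": "MYSCHEMA" }

Recipe-level variables (override, hypothetical value):
{ "snowflake_schema": "SANDBOX_SCHEMA" }

Dataset settings then reference:
schema: ${snowflake_schema}
```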
-
Output in the same plugin recipe
I send data to a Qlik server through a plugin recipe. Now I want to add a new Python recipe before my data is sent to the Qlik server, not as a separate recipe as shown in the figure. Is that possible? Operating system used: Windows
-
Writing to dataset iteratively
Got a job that is IO-bound and memory-intensive, and I need to write the result(s) iteratively. The job is essentially parsing data from Excel files, filtering, aggregating, feature engineering, etc. Source: 1 billion records; result: 1.2 million records. I'm using a Python recipe with a multi-threaded asyncio function on the source and…
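The usual way to keep memory bounded is to flush results in chunks rather than materialising everything first; in a Dataiku Python recipe this maps onto the writer returned by `dataiku.Dataset.get_writer()`, but the core pattern is API-independent. A minimal, self-contained sketch (the `StubWriter` class and the chunk size are stand-ins, not Dataiku API):

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

class StubWriter:
    """Stand-in for a dataset writer; real code would write each chunk
    through a dataiku.Dataset writer instead."""
    def __init__(self):
        self.rows_written = 0
    def write_rows(self, rows):
        self.rows_written += len(rows)

# Simulate a large, lazily-produced result stream being flushed in
# chunks, so only one chunk is ever held in memory at a time.
results = ({"id": i, "value": i * 2} for i in range(10_000))
writer = StubWriter()
for chunk in chunked(results, 1_000):
    writer.write_rows(chunk)

print(writer.rows_written)  # → 10000
```

With the real API, each chunk would go to the dataset writer inside a `with dataset.get_writer() as writer:` block, if I recall the Dataiku API correctly, so partial results land on disk as they are produced.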
-
Git-Dataiku Integration Best practices
Hi, I'm looking for best practices for working with Git integrated with Dataiku. I have read about a few limitations around merging and about keeping a copy of the whole project to create branches in Dataiku. Hence, I'm reaching out in case somebody has already done this for their project or knows about it.
-
Error exploring missing partition in partitioned data set
Hi Dataiku, while fixing some bugs in my flow, I ran into the following problem:
- Explore a specific partition of my data set, which has some error
- Realise I don't want to include this partition anyway, so I go upstream in the flow and remove it from the list of partitions to generate
- Clear and rebuild the problem…
-
Cannot create these recipes (geojoin, update, CustomCode_forward_geocoding)
Hi there, when I tried to create the following recipe types using the Python API, I got a NoneType builder and could not create them. How can I create these types of recipes? Thanks.
builder1 = my_project.new_recipe('geojoin')  # builder1 is NoneType
builder2 = my_project.new_recipe('update')  # builder2 is NoneType
builder3 =…