-
Add an incremental column to a dataset
The requirement is to add an incremental column to a dataset. It should not be an identity column, but the data in it must be unique.
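A minimal plain-Python sketch of the idea, assuming the rows are available as a list of dicts (for example inside a Python recipe). The values are assigned at processing time, so the database does not generate them as it would for an identity column. The helper `add_incremental_column` and the default column name `row_id` are hypothetical, not a Dataiku API.

```python
from typing import Any, Dict, Iterable, List


def add_incremental_column(
    rows: Iterable[Dict[str, Any]],
    column: str = "row_id",
    start: int = 1,
) -> List[Dict[str, Any]]:
    """Return copies of `rows` with a new column holding consecutive integers.

    Hypothetical helper: each row receives the next integer, so the values
    are unique within a single run. They are computed here, not generated
    by the database as an identity column would be.
    """
    result = []
    for offset, row in enumerate(rows):
        if column in row:
            raise ValueError(f"column {column!r} already exists")
        new_row = dict(row)  # copy so the caller's rows are left unchanged
        new_row[column] = start + offset
        result.append(new_row)
    return result


if __name__ == "__main__":
    data = [{"name": "a"}, {"name": "b"}, {"name": "c"}]
    for r in add_incremental_column(data):
        print(r)
```

Uniqueness here only holds within one run. If new rows are appended in later runs, one option is to pass `start` as the previous maximum plus one.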
-
Is there a simple way to reapply flow/recipes to a second dataset?
Hi, I created a flow that chains recipes for a first dataset. The goal is to build a prediction model at the end of the flow. Next, I need to apply my model to an input dataset that has the same schema as my first dataset. I could not figure out how to apply the whole flow to the second dataset, so I had to copy the flow steps…
-
Flow in Dataiku
Hi, when I create a flow, for example taking records from table A -> doing some blending -> grouping -> building a model, I see that each step in the process creates a recipe. Does this mean that each step saves its data (to CSV if the data was read from CSV, or to an SQL table if it was read from an SQL database), or is the process in-memory…
-
Version of Dataiku
Hi all, what is the difference between the free trial and the Enterprise license? Thanks, Kim13
-
Data quality: monitoring dataset processing
Hi, I'm asking how DSS monitors issues during dataset processing. I see two kinds of potential issues: * Volume: an inconsistent number of records in a dataset (e.g., I expected at least 1k records per day for my "webtraffic" dataset) * Schema / values: one or more rows have fields that don't respect the defined schema…
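The two kinds of issues described above can be expressed as simple checks. This is a plain-Python illustration only, not DSS's own monitoring mechanism; the `SCHEMA` mapping, the field names, and the helpers `check_volume` and `check_schema` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema for the "webtraffic" dataset: field name -> expected type.
SCHEMA: Dict[str, type] = {"visit_date": str, "page": str, "hits": int}


def check_volume(records: List[Dict[str, Any]], minimum: int) -> List[str]:
    """Flag a batch whose record count is below the expected minimum."""
    if len(records) < minimum:
        return [f"expected at least {minimum} records, got {len(records)}"]
    return []


def check_schema(
    records: List[Dict[str, Any]], schema: Dict[str, type]
) -> List[Tuple[int, str, str]]:
    """Return (row index, field, problem) for every schema violation."""
    issues = []
    for i, record in enumerate(records):
        for field, expected in schema.items():
            if field not in record:
                issues.append((i, field, "missing"))
            elif not isinstance(record[field], expected):
                actual = type(record[field]).__name__
                issues.append((i, field, f"expected {expected.__name__}, got {actual}"))
        for field in record:
            if field not in schema:
                issues.append((i, field, "unexpected field"))
    return issues


if __name__ == "__main__":
    batch = [
        {"visit_date": "2024-01-01", "page": "/home", "hits": 3},
        {"visit_date": "2024-01-01", "page": "/about", "hits": "n/a"},
    ]
    print(check_volume(batch, 1000))
    print(check_schema(batch, SCHEMA))
```

Keeping the volume threshold and the schema as parameters lets the same checks run on each daily batch without changing code.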
-
Running jobs in parallel
Hi everyone, I have a recipe that triggers many jobs, which take a long time to complete because they run one after the other. How can I get these jobs to run in parallel in DSS? Is this simple to do? Ben
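The general idea of running independent jobs concurrently instead of sequentially can be sketched with Python's standard `concurrent.futures`. This does not configure DSS's own job scheduler; `run_job` is a hypothetical stand-in that only sleeps, and it assumes the jobs do not depend on each other's outputs.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_job(name: str, seconds: float = 0.1) -> str:
    """Hypothetical stand-in for one long-running, independent job."""
    time.sleep(seconds)
    return f"{name} done"


def run_all(names: List[str], max_workers: int = 4) -> List[str]:
    """Run the jobs on a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, names))


if __name__ == "__main__":
    start = time.perf_counter()
    print(run_all(["job1", "job2", "job3", "job4"]))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

With four workers, four 0.1 s jobs finish in roughly 0.1 s instead of 0.4 s. `max_workers` caps how many jobs run at once, which matters if they share a database or other limited resource.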
-
How do I deploy a model to the flow?
I have created a model, but I cannot see it in the flow diagram. The guide on building a first model says that I need to deploy the model to the flow after training, but I cannot see any button to deploy the model. https://www.dataiku.com/learn/guide/visual/machine-learning/deep-learning-first.html
-
Lots of files owned by Dataiku under /tmp
We've got filesystem-full issues due to large files created by Dataiku under the /tmp directory (the machine runs Linux). How is this possible? Is there any process that writes under that directory? If so, how can we prevent this? Thanks. Rgds. Giuseppe
-
Can a Recipe in a Flow be used to help 'backup' the data to assist with reruns?
Hi, I'm wondering if there is a way to use something like a Sync or Export recipe to provide a kind of rollback/undo facility for an output dataset. Although this function applies to data at the end of a flow, it would need to run first rather than last in the flow, i.e. 1. Step 1 - take a copy of the output data from the…
-
How can I copy a recipe to a different project?
I cannot find the option to copy a recipe to other projects. The projects have different datasets, but the process and model are nearly the same; changing a few parameters is enough. Moreover, is it possible to group recipes into a sub-process? My flow has many long processes, and it would be better if I could group them together. Also,…