-
RAG LLM for multiple datasets
Greetings, While working with the embedding recipe, we ran into a limitation: we have two datasets that we want to apply RAG to. How can we apply the knowledge bank to them specifically? Regards
-
Looking to replicate a SUM(COUNTIF) formula in Dataiku
I am working on a scorecard in Dataiku and I would like to calculate the percentage of completion across a set number of columns. Basically, I would like to replicate this Excel formula: =SUM(COUNTIF(ColumnX:ColumnXX,"*")/Total Number of Columns) and am having issues. The columns are a mix of strings, integers, and text,…
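A minimal sketch of the per-row calculation in plain Python, assuming "complete" means any non-empty value (numbers included) and that a row is available as a dictionary of column name to value. The helper name `completion_ratio` and the sample column names are hypothetical, not part of Dataiku.

```python
def completion_ratio(row, columns):
    """Return the fraction of `columns` that hold a non-empty value in `row`."""
    if not columns:
        raise ValueError("columns must not be empty")
    filled = 0
    for name in columns:
        value = row.get(name)
        if value is None:
            continue  # missing value
        if isinstance(value, float) and value != value:
            continue  # NaN is the only float that is not equal to itself
        if str(value).strip() == "":
            continue  # empty or whitespace-only string
        filled += 1
    return filled / len(columns)


# Hypothetical scorecard row: two of the four columns are filled in.
example_row = {"task_a": "done", "task_b": 3, "task_c": "", "task_d": None}
ratio = completion_ratio(example_row, ["task_a", "task_b", "task_c", "task_d"])
print(f"{ratio:.0%}")
```

Converting each value with `str()` before checking for blanks lets the same test cover the mix of strings and integers described in the question.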
-
How to append dataframe in existing output dataset
Hello experts, In Dataiku v12.3.0, I was trying to append a dataframe to an existing dataset (with the same schema) using write_dataframe(). But it always overwrites with the last dataframe, even though the dataset spec is configured like: dataset.spec_item["appendMode"] = True The dataset is classified as output so it doesn't let me…
-
How can I replace a dataset created from a csv?
I have uploaded a CSV and stored it in the filesystem_folders. I have built several recipes from this dataset. I have now received an updated version of the CSV, but cannot figure out how to upload it and overwrite the original dataset. It seems to require I create a new dataset. If I do create a new dataset, there doesn't…
-
How to implement a feedback loop on a dataset?
Hello, Each month, I have to compute a dataset that takes the previous month's dataset (M-1) and adds some stuff to it. I wonder how I could do it in Dataiku, since the recipe would have to take its last output dataset (M-1) as its input. I don't think it is currently possible to produce a feedback loop in Dataiku: do you…
-
Idea: Include Associated Objects When Duplicating Dataset / Flows
Hello, When duplicating parts of a flow in Dataiku, the associated datasets are duplicated, but the charts built on those datasets are not included. This means that users have to manually recreate or copy these charts, which can be time-consuming and error-prone. Benefits: Including the duplicate feature for…
-
How do I create a categorisation model for a reviews dataset
Hi there - new to Dataiku. Let's say I have an Excel sheet with 2 columns, where one has app reviews and the other has the dates they were posted. Is there a video tutorial or example anywhere showing how I can create a model to categorise the app reviews into categories, e.g. UX/UI problem or customer service problem, as well as include…
-
Window recipe not producing expected results when using DSS engine
Hi there, The issue I am having is that the DSS engine is producing a completely different result than when I use the SQL engine. Has anyone faced a similar issue? I would appreciate some insight on this. Basically, all I want to do is produce a column with the MAX() value inferred from another column. No partitions, no…
-
How to output to / update my Snowflake table using Dataiku
I have a Snowflake table, I've set up the connection, and everything looks good. Dataiku requires me to create a dataset from that Snowflake table to use as my input / output. The issue is that I have that dataset as my output and when I run my flow, I can see my results, but it isn't actually outputting to my…
-
How to correctly do time conversions
I have a column that has been parsed and is in UTC. When I try to format the date in Eastern / New York time, I get a new column that is -5 hours, but isn't the current difference -4 hours? I'm sure this has something to do with daylight saving time vs. standard time, but I just want to ensure that my…
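As an illustration of the daylight saving point, here is a small sketch using Python's standard `zoneinfo` module rather than any Dataiku processor. It shows that the UTC offset for New York depends on the date of each timestamp, not on today's date. The helper name `utc_to_new_york` and the sample dates are assumptions chosen for the example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")


def utc_to_new_york(utc_dt):
    """Convert a UTC datetime to New York local time, honouring DST."""
    if utc_dt.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        utc_dt = utc_dt.replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(NEW_YORK)


# Same UTC clock time, one date in winter and one in summer.
winter = utc_to_new_york(datetime(2024, 1, 15, 12, 0))
summer = utc_to_new_york(datetime(2024, 7, 15, 12, 0))

print(winter.isoformat())  # offset is -05:00 (standard time)
print(summer.isoformat())  # offset is -04:00 (daylight saving time)
```

If a converted column shows -5 hours while the current difference is -4, it may simply be that the timestamps in that column fall in the standard-time part of the year.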