<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Visual recipes — Dataiku Community</title>
        <link>https://community.dataiku.com/</link>
        <pubDate>Wed, 17 Jun 2026 03:10:37 +0000</pubDate>
        <language>en</language>
        <description>Visual recipes — Dataiku Community</description>
    <atom:link href="https://community.dataiku.com/discussions/tagged/p8/feed.rss" rel="self" type="application/rss+xml"/>
    <item>
        <title>Full outer join</title>
        <link>https://community.dataiku.com/discussion/732/full-outer-join</link>
        <pubDate>Thu, 10 Mar 2016 23:27:25 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator></dc:creator>
        <guid isPermaLink="false">732@/discussions</guid>
        <description><![CDATA[Is there a way to do a full outer join between two datasets stored in the DSS memory (so basically made by recipes or analyses)?]]>
        </description>
    </item>
    <item>
        <title>Censored Regression</title>
        <link>https://community.dataiku.com/discussion/45511/censored-regression</link>
        <pubDate>Wed, 04 Mar 2026 21:12:58 +0000</pubDate>
        <category>Product Ideas</category>
        <dc:creator>Nick_Geitner</dc:creator>
        <guid isPermaLink="false">45511@/discussions</guid>
        <description><![CDATA[<p>Modelers often encounter censored data, that is, data that falls &gt;x or &lt;y. There are some typical approaches to building a regression model in these cases, but they are currently not available in Visual ML. As such, interval-censored regression, Tobit regression, or censored regression with lower/upper bounds would be hugely helpful.</p>]]>
        </description>
    </item>
    <item>
        <title>Select multiple column</title>
        <link>https://community.dataiku.com/discussion/45495/select-multiple-column</link>
        <pubDate>Fri, 20 Feb 2026 07:32:46 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>Axel_ULLERN</dc:creator>
        <guid isPermaLink="false">45495@/discussions</guid>
        <description><![CDATA[<p>Hello,</p><p>Is it possible to select multiple columns at once in order to remove them? I tried the remove/keep recipe, which lets me select columns one by one, but if I have 100 columns to select and they are contiguous, selecting the first and last would be easy, like in Excel (or providing the first and last column names to select everything in between). Does someone have a way to do this in Dataiku?</p><p>(As there is no obvious pattern in the column names, I can't use the pattern option in the remove columns recipe.)</p><p>Many thanks,</p><p>Axel</p>]]>
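<![CDATA[<p>To make the "first column name, last column name" idea concrete, here is a minimal sketch in plain Python. It is not a Dataiku API: the helper <code>columns_between</code> and the example schema are hypothetical and only illustrate the selection logic I am after.</p>

```python
from typing import List


def columns_between(columns: List[str], first: str, last: str) -> List[str]:
    """Return the contiguous block of column names from `first` to `last`, inclusive."""
    start = columns.index(first)
    end = columns.index(last)
    if start > end:
        raise ValueError(f"{first!r} comes after {last!r} in the schema")
    return columns[start:end + 1]


# Hypothetical schema, for illustration only.
schema = ["id", "alpha", "beta", "gamma", "delta", "total"]

# Everything from "alpha" through "delta" is marked for removal.
to_remove = set(columns_between(schema, "alpha", "delta"))
kept = [name for name in schema if name not in to_remove]
print(kept)  # ['id', 'total']
```

<p>The point of the sketch is that only the two boundary names are needed, so no naming pattern is required.</p>]]>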
        </description>
    </item>
    <item>
        <title>How to stack columns from one dataset</title>
        <link>https://community.dataiku.com/discussion/45468/how-to-stack-columns-from-one-dataset</link>
        <pubDate>Mon, 02 Feb 2026 23:11:36 +0000</pubDate>
        <category>General</category>
        <dc:creator>Theo_from_EPSI</dc:creator>
        <guid isPermaLink="false">45468@/discussions</guid>
        <description><![CDATA[<p>Hi,<br />
Here is a simplified schema of a basic dataset structure I need to reshape:</p><div><table><colgroup><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /></colgroup><tr><th><p>firstname</p></th><th><p>name</p></th><th><p>vote</p></th><th><p>col4</p></th><th><p>col5</p></th><th><p>col6</p></th><th><p>col7</p></th><th><p>col8</p></th><th><p>col9</p></th><th><p>etc..</p></th></tr><tr><td><p>ARTHAUD</p></td><td><p>Nathalie</p></td><td><p>5</p></td><td><p>ARMAND</p></td><td><p>Thierry</p></td><td><p>9</p></td><td><p>ARNAUD</p></td><td><p>Bernard</p></td><td><p>6</p></td><td><p>etc..</p></td></tr><tr><td><p>ARTHAUD</p></td><td><p>Nathalie</p></td><td><p>7</p></td><td><p>ARMAND</p></td><td><p>Thierry</p></td><td><p>3</p></td><td><p>ARNAUD</p></td><td><p>Bernard</p></td><td><p>8</p></td><td><p>etc..</p></td></tr></table></div><p>The number of columns in this is variable but it will always be a multiple of 3.</p><p>The expected output:</p><div><table><colgroup><col /><col /><col /><col /></colgroup><tr><th><p>firstname</p></th><th><p>name</p></th><th><p>vote</p></th></tr><tr><td><p>ARTHAUD</p></td><td><p>Nathalie</p></td><td><p>5</p></td></tr><tr><td><p>ARTHAUD</p></td><td><p>Nathalie</p></td><td><p>7</p></td></tr><tr><td><p>ARMAND</p></td><td><p>Thierry</p></td><td><p>9</p></td></tr><tr><td><p>ARMAND</p></td><td><p>Thierry</p></td><td><p>3</p></td></tr><tr><td><p>ARNAUD</p></td><td><p>Bernard</p></td><td><p>6</p></td></tr><tr><td><p>ARNAUD</p></td><td><p>Bernard</p></td><td><p>8</p></td></tr></table></div><p>Feel free to suggest a solution in Python, <strong>but I would prefer visual recipes</strong>.</p><p>Dataiku version used: <strong>14.3.3</strong></p>]]>
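<![CDATA[<p>For reference, the reshaping I am after can be sketched in plain Python as below. This is only an illustration of the expected logic, not a visual-recipe solution: the function name <code>stack_triplets</code> is hypothetical, and rows are represented as simple lists rather than a Dataiku dataset.</p>

```python
from typing import List, Sequence


def stack_triplets(rows: Sequence[Sequence[str]], width: int = 3) -> List[List[str]]:
    """Stack consecutive groups of `width` columns on top of each other.

    Output is ordered group by group: all rows of the first group,
    then all rows of the second group, and so on.
    """
    if not rows:
        return []
    n_cols = len(rows[0])
    if n_cols % width != 0:
        raise ValueError(f"expected a multiple of {width} columns, got {n_cols}")
    stacked = []
    for start in range(0, n_cols, width):
        for row in rows:
            stacked.append(list(row[start:start + width]))
    return stacked


# The two sample rows from the table above (without the "etc.." column).
wide = [
    ["ARTHAUD", "Nathalie", "5", "ARMAND", "Thierry", "9", "ARNAUD", "Bernard", "6"],
    ["ARTHAUD", "Nathalie", "7", "ARMAND", "Thierry", "3", "ARNAUD", "Bernard", "8"],
]
for firstname, name, vote in stack_triplets(wide):
    print(firstname, name, vote)
```

<p>Because the loop steps through the columns three at a time, it works for any number of columns that is a multiple of 3.</p>]]>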
        </description>
    </item>
    <item>
        <title>Python script to visual recipes conversion</title>
        <link>https://community.dataiku.com/discussion/45455/python-script-to-visual-recipes-conversion</link>
        <pubDate>Fri, 23 Jan 2026 15:54:18 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>Amruta</dc:creator>
        <guid isPermaLink="false">45455@/discussions</guid>
        <description><![CDATA[<p>I have an existing Python script that needs to be converted into Dataiku visual recipes. Is there any supported or automated way in Dataiku to generate visual recipes from Python code, or does this need to be done manually? </p>]]>
        </description>
    </item>
    <item>
        <title>Does DSS have a recipe for imbalanced sample? Like SMOTE?</title>
        <link>https://community.dataiku.com/discussion/3713/does-dss-have-a-recipe-for-imbalanced-sample-like-smote</link>
        <pubDate>Mon, 12 Aug 2019 09:24:35 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>Frank</dc:creator>
        <guid isPermaLink="false">3713@/discussions</guid>
        <description><![CDATA[]]>
        </description>
    </item>
    <item>
        <title>Sync Recipe from Redshift to Oracle RDS</title>
        <link>https://community.dataiku.com/discussion/45103/sync-recipe-from-redshift-to-oracle-rds</link>
        <pubDate>Thu, 03 Jul 2025 18:05:08 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>SHughes_BAE</dc:creator>
        <guid isPermaLink="false">45103@/discussions</guid>
        <description><![CDATA[<p>I am trying to replicate a table in Redshift to a table in Oracle RDS using a sync recipe. I am getting the correct number of records created in the target Oracle RDS table, but all of the fields are empty (null).</p><p>Operating system used: <strong>Linux</strong></p>]]>
        </description>
    </item>
    <item>
        <title>Select Columns Outside of Join Recipe</title>
        <link>https://community.dataiku.com/discussion/44714/select-columns-outside-of-join-recipe</link>
        <pubDate>Mon, 10 Feb 2025 22:52:28 +0000</pubDate>
        <category>Product Ideas</category>
        <dc:creator>Laurie</dc:creator>
        <guid isPermaLink="false">44714@/discussions</guid>
        <description><![CDATA[<p>I would like to be able to select the columns of data outside of a join recipe. A couple of examples:</p><p>1 - Usage of "unmatched rows". The column selection that occurs after the join does not apply to data that isn't joined. In this case I am using both sets of data, so I need the option to select columns from both sets.</p><p>2 - Removal of unneeded/unwanted columns after filtering. This is especially important when using sensitive HR data.</p><p>This enhances the automation of processing the data, rather than adding to the work by requiring cleanup after the project has finished running. It also allows me to confirm that sensitive data has been removed before sharing the data with others, rather than relying on a manual process to remove it.</p>]]>
        </description>
    </item>
    <item>
        <title>Push to editable recipe</title>
        <link>https://community.dataiku.com/discussion/1367/push-to-editable-recipe</link>
        <pubDate>Sat, 22 Apr 2017 15:30:54 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>UserBird</dc:creator>
        <guid isPermaLink="false">1367@/discussions</guid>
        <description><![CDATA[Hello, could you give an example of how to use the "Push to editable" recipe? It seems similar to the Group or Window recipes. What exactly is it used for?]]>
        </description>
    </item>
    <item>
        <title>Oops: an unexpected error occurred java.lang.IllegalStateException: Expected a double but was BEGIN</title>
        <link>https://community.dataiku.com/discussion/44049/oops-an-unexpected-error-occurred-java-lang-illegalstateexception-expected-a-double-but-was-begin</link>
        <pubDate>Wed, 31 Jul 2024 14:48:26 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>WeiDU_Geodis3306</dc:creator>
        <guid isPermaLink="false">44049@/discussions</guid>
        <description><![CDATA[<p>Hi,</p><p>I am working on the "Advanced Designer Assessment" project.</p><p>After modifying the Prepare recipe to add the column "qualifies", I get this error message when I open the dataset "Online_Retail_Prepared":</p><h4 data-id="oops-an-unexpected-error-occurred"> Oops: an unexpected error occurred</h4><h5 data-id="java-lang-illegalstateexception-expected-a-double-but-was-begin-array-at-line-377-column-21-path-charts-0-def-scatterzoomoptions-scale-caused-by-illegalstateexception-expected-a-double-but-was-begin-array-at-line-377-column-21-path-charts-0-def-scatterzoomoptions-scale">java.lang.IllegalStateException: Expected a double but was BEGIN_ARRAY at line 377 column 21 path $.charts[0].def.scatterZoomOptions.scale, caused by: IllegalStateException: Expected a double but was BEGIN_ARRAY at line 377 column 21 path $.charts[0].def.scatterZoomOptions.scale</h5><p>Could you please help me figure it out?</p><p>Thanks in advance.</p>]]>
        </description>
    </item>
    <item>
        <title>Option to rearrange output columns in join recipe</title>
        <link>https://community.dataiku.com/discussion/44413/option-to-rearrange-output-columns-in-join-recipe</link>
        <pubDate>Wed, 30 Oct 2024 08:05:15 +0000</pubDate>
        <category>Product Ideas</category>
        <dc:creator>Antal</dc:creator>
        <guid isPermaLink="false">44413@/discussions</guid>
        <description><![CDATA[<p>I would like to have the option to rearrange output columns in the join recipe.</p><p>Perhaps by making the 'hamburger' icons on the Output panel draggable.</p><p><img src="https://us.v-cdn.net/6038231/uploads/I8NRQLVOW15X/capture-png.png" alt="Capture.PNG" height="1119" width="523" /></p>]]>
        </description>
    </item>
    <item>
        <title>RAG LLM for multiple datasets</title>
        <link>https://community.dataiku.com/discussion/44076/rag-llm-for-multiple-datasets</link>
        <pubDate>Tue, 06 Aug 2024 09:50:54 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>Zidan</dc:creator>
        <guid isPermaLink="false">44076@/discussions</guid>
        <description><![CDATA[<p>Greetings,</p><p>While working with the embedding recipe, we ran into a limitation: we have two datasets that we want to apply RAG on. How can we apply the knowledge bank to both of them specifically?</p><p>Regards</p>]]>
        </description>
    </item>
    <item>
        <title>How can I replace a dataset created from a csv?</title>
        <link>https://community.dataiku.com/discussion/2317/how-can-i-replace-a-dataset-created-from-a-csv</link>
        <pubDate>Fri, 09 Mar 2018 03:20:23 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>Amber_Beasock_Z</dc:creator>
        <guid isPermaLink="false">2317@/discussions</guid>
        <description><![CDATA[I have uploaded a CSV and stored it in the filesystem_folders. I have built several recipes from this dataset. I have now received an updated version of the CSV, but cannot figure out how to upload it and overwrite the original dataset. It seems to require I create a new dataset. If I do create a new dataset, there doesn't seem to be a way to disconnect the current recipe flow from the old dataset and connect it to the new dataset.]]>
        </description>
    </item>
    <item>
        <title>Window recipe not producing expected results when using DSS engine</title>
        <link>https://community.dataiku.com/discussion/43776/window-recipe-not-producing-expected-results-when-using-dss-engine</link>
        <pubDate>Fri, 14 Jun 2024 17:49:37 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>KKhatib</dc:creator>
        <guid isPermaLink="false">43776@/discussions</guid>
        <description><![CDATA[<p>Hi there,</p><p>The issue I am having is that the DSS engine is producing a completely different result than the SQL engine. Has anyone faced a similar issue? I would appreciate some insight on this.</p><p>Basically, all I want to do is produce a column with the MAX() value computed from another column. No partitions, no order bys, simple enough? At least that's what I thought.</p><p>It looks like DSS is ordering by a hidden index on its own and then creating a window frame that takes the current row and all preceding rows. Here is an example table to show what is supposed to happen and what is in fact happening:</p><p>Supposed to happen (this is what happens with the SQL (In-database) engine):</p><table border="1" style="width: 100%;"><tbody><tr><td style="width: 50%; height: 25px;">Salary</td><td style="width: 50%; height: 25px;">Max(Salary)</td></tr><tr><td style="width: 50%; height: 25px;">22000</td><td style="width: 50%; height: 25px;">25000</td></tr><tr><td style="width: 50%; height: 25px;">23000</td><td style="width: 50%; height: 25px;">25000</td></tr><tr><td style="width: 50%; height: 25px;">24000</td><td style="width: 50%; height: 25px;">25000</td></tr><tr><td style="width: 50%; height: 25px;">25000</td><td style="width: 50%; height: 25px;">25000</td></tr></tbody></table><p> </p><p>What is in fact happening (this is what happens with the DSS engine):</p><table border="1" style="width: 100%;"><tbody><tr><td style="width: 50%; height: 25px;">Salary</td><td style="width: 50%; height: 25px;">Max(Salary)</td></tr><tr><td style="width: 50%; height: 25px;">22000</td><td style="width: 50%; height: 25px;">22000</td></tr><tr><td style="width: 50%; height: 25px;">23000</td><td style="width: 50%; height: 25px;">23000</td></tr><tr><td style="width: 50%; height: 25px;">24000</td><td style="width: 50%; height: 25px;">24000</td></tr><tr><td style="width: 50%; height: 25px;">25000</td><td style="width: 50%; height: 25px;">25000</td></tr></tbody></table><hr /><p>Operating system used: <strong>Windows</strong></p>]]>
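<![CDATA[<p>The difference between the two tables can be reproduced in a few lines of plain Python. This sketch only illustrates the two window frames I believe are involved (whole column versus "current row and all preceding rows"); it is an assumption about the behaviour, not a description of how either engine is actually implemented.</p>

```python
from itertools import accumulate

salaries = [22000, 23000, 24000, 25000]

# Unbounded frame: every row sees the whole column, like MAX(salary) OVER ().
whole_column_max = [max(salaries)] * len(salaries)

# Frame limited to the current row and all preceding rows: a running maximum.
running_max = list(accumulate(salaries, max))

print(whole_column_max)  # [25000, 25000, 25000, 25000]
print(running_max)       # [22000, 23000, 24000, 25000]
```

<p>The first list matches the SQL engine table and the second matches the DSS engine table, which is why I suspect an implicit ordering plus a limited frame.</p>]]>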
        </description>
    </item>
    <item>
        <title>How to output to / update my snowflake table using Dataiku</title>
        <link>https://community.dataiku.com/discussion/43666/how-to-output-to-update-my-snowflake-table-using-dataiku</link>
        <pubDate>Tue, 11 Jun 2024 18:14:13 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>abalo006</dc:creator>
        <guid isPermaLink="false">43666@/discussions</guid>
        <description><![CDATA[<p>I have a Snowflake table, and I've set up the connection and everything looks good. Dataiku requires me to create a dataset from that Snowflake table, which I can use as my input / output. The issue is that I have that dataset as my output, and when I run my flow I can see my results, but they aren't actually written to my Snowflake table.</p><p>After running my flow, my Snowflake table is still empty. I thought the whole point of creating a connection using a data table was to be able to read from / write to that table?</p><p>Am I understanding this wrong? Is there any way I can set up my flow so that my results are actually written to my Snowflake table through the connection?</p><hr /><p>Operating system used: <strong>Windows</strong></p>]]>
        </description>
    </item>
    <item>
        <title>How to correctly do time conversions</title>
        <link>https://community.dataiku.com/discussion/43625/how-to-correctly-do-time-conversions</link>
        <pubDate>Mon, 10 Jun 2024 16:23:13 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>abalo006</dc:creator>
        <guid isPermaLink="false">43625@/discussions</guid>
        <description><![CDATA[<p><img width="313" height="246" alt="TIME.PNG" src="https://us.v-cdn.net/6038231/uploads/lithium_attachments/10053iC96191B406CAE10C.png" style="text-align: inline;" /></p><p>I have a column that has been parsed and is in UTC. When I try to format the date in Eastern / New York time, I get a new column that is -5 hours, but isn't the current difference -4 hours? I'm sure this has something to do with daylight saving time vs. standard time, but I want to ensure that my formula keeps working even when the clocks change by an hour for daylight saving.</p><p>Does anybody know how I can get the correct time difference from UTC to ET?</p><p> </p><p>I've attached the steps I'm currently using below.</p><p><img width="289" height="440" alt="step1.PNG" src="https://us.v-cdn.net/6038231/uploads/lithium_attachments/10055i66FB3B9A19AD2A68.png" style="text-align: inline;" /><img width="287" height="411" alt="step2.PNG" src="https://us.v-cdn.net/6038231/uploads/lithium_attachments/10054iB60835C917BD4847.png" style="text-align: inline;" /></p><hr /><p>Operating system used: <strong>Windows</strong></p>]]>
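<![CDATA[<p>To show the behaviour I expect, here is a minimal sketch in plain Python using the standard <code>zoneinfo</code> module. It is not a Dataiku formula; the helper name <code>utc_to_eastern</code> and the two sample timestamps are made up for illustration. The idea is that converting with a named time zone (<code>America/New_York</code>) rather than a fixed offset picks -5 or -4 hours automatically.</p>

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")


def utc_to_eastern(ts_utc: datetime) -> datetime:
    """Convert a timezone-aware UTC timestamp to New York local time."""
    return ts_utc.astimezone(NEW_YORK)


# Hypothetical sample timestamps: one in winter, one in summer.
winter = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
summer = datetime(2024, 6, 10, 12, 0, tzinfo=timezone.utc)

print(utc_to_eastern(winter).isoformat())  # offset -05:00 (standard time)
print(utc_to_eastern(summer).isoformat())  # offset -04:00 (daylight time)
```

<p>A fixed -5 offset would only be right for the winter timestamp, which is the discrepancy described above.</p>]]>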
        </description>
    </item>
    <item>
        <title>Trigger on Dataset Modified for Partitioned Dataset</title>
        <link>https://community.dataiku.com/discussion/43590/trigger-on-dataset-modified-for-partitioned-dataset</link>
        <pubDate>Mon, 10 Jun 2024 03:28:09 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>Satish</dc:creator>
        <guid isPermaLink="false">43590@/discussions</guid>
        <description><![CDATA[<p>Hi Team,</p><p>I'm reading data from SharePoint, and the file name format is Cost Center_06092024.xlsx</p><p>As the file name contains the date, I partitioned the dataset as /Cost Center_%M%D%Y.xlsx and, in my Prepare recipe, set the option to Last available so that the flow ONLY gets the latest file.</p><p>I'm trying to create a scenario with a Trigger on dataset change. Can you please help me with the option to use here?</p><p>Attached is the screenshot for reference.</p><p> </p><p>Thanks</p><p>Satish</p><hr /><p>Operating system used: <strong>Browser</strong></p>]]>
        </description>
    </item>
    <item>
        <title>Bug in Stack Recipe</title>
        <link>https://community.dataiku.com/discussion/41808/bug-in-stack-recipe</link>
        <pubDate>Tue, 26 Mar 2024 14:26:19 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>yashpuranik</dc:creator>
        <guid isPermaLink="false">41808@/discussions</guid>
        <description><![CDATA[<p>Hi All,</p><p> </p><p>I am sharing below a minimal reproducible project that triggered an error in one of our larger workflows involving Stack recipes. We have been seeing these errors for Snowflake tables (they may exist for others) around string length and truncation.</p><div> </div><div><img width="999" height="207" alt="1.png" src="https://us.v-cdn.net/6038231/uploads/lithium_attachments/9722i2AAFBDA49054FEB1.png" style="text-align: inline;" /></div><p>The culprit seems to be that Dataiku automatically recognizes string length when Snowflake tables are created with specific queries, but uses the smallest column length to create the output schema for the Stack recipe. <img width="999" height="177" alt="2.png" src="https://us.v-cdn.net/6038231/uploads/lithium_attachments/9720i74E28241DFAF4F6C.png" style="text-align: inline;" /><img width="999" height="260" alt="3.png" src="https://us.v-cdn.net/6038231/uploads/lithium_attachments/9718iBD5AAC84EE1BEFDF.png" style="text-align: inline;" /><img width="999" height="296" alt="4.png" src="https://us.v-cdn.net/6038231/uploads/lithium_attachments/9721iD8CDB9A0F56E2843.png" style="text-align: inline;" /><img width="999" height="291" alt="5.png" src="https://us.v-cdn.net/6038231/uploads/lithium_attachments/9719i5AF40F58CFB4A33E.png" style="text-align: inline;" /></p><p>Of course I can get around this by manually defining the "Table Creation SQL", but I would prefer this to be addressed at the product level if possible.</p><p> </p><p>Thanks,</p><p>Yash</p>]]>
        </description>
    </item>
    <item>
        <title>Scenario Reporters</title>
        <link>https://community.dataiku.com/discussion/43082/scenario-reporters</link>
        <pubDate>Wed, 22 May 2024 20:59:56 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>Satish</dc:creator>
        <guid isPermaLink="false">43082@/discussions</guid>
        <description><![CDATA[<p>I am currently using Scenario reporters to send data to a dataset with the configuration below.</p><p> </p><p>{<br />"flowname": "${scenarioName}",<br />"status": "${outcome}",<br />"summary": "${failedEventsSummary}"<br />}</p><p> </p><p>The issue is that failedEventsSummary provides too much text. How can we get just the ERROR explaining why the scenario failed?</p><p> </p><hr /><p>Operating system used: <strong>Browser</strong></p>]]>
        </description>
    </item>
    <item>
        <title>Generate Tile Num and Tile Sequence</title>
        <link>https://community.dataiku.com/discussion/42851/generate-tile-num-and-tile-sequence</link>
        <pubDate>Tue, 14 May 2024 21:42:47 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>satishkurra</dc:creator>
        <guid isPermaLink="false">42851@/discussions</guid>
        <description><![CDATA[<p>Hi team,</p><p> </p><p>I'm trying to populate the Tile Num and Tile Sequence Number in the format shown in the attached picture. I tried the Window recipe, with no luck.</p><p> </p><p>Can someone please help with this?</p><p> </p><p>Attached is the data. The ask is to generate a tile num for the INS column. I have highlighted the color combinations in the picture.</p><hr /><p>Operating system used: <strong>Browser</strong></p>]]>
        </description>
    </item>
    <item>
        <title>Fuzzy Join: When to use Relative to the Left vs Right Tables.</title>
        <link>https://community.dataiku.com/discussion/42777/fuzzy-join-when-to-use-relative-to-the-left-vs-right-tables</link>
        <pubDate>Fri, 10 May 2024 23:02:59 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>tgb417</dc:creator>
        <guid isPermaLink="false">42777@/discussions</guid>
        <description><![CDATA[<div><div><span>I'm starting to work with Fuzzy Joins and having good luck.</span></div><br /><div><span>However, I'm trying to figure out when I might want to use a Relative Threshold related to the Right or Left Table when doing an overall Left Join to find duplicate records.</span></div><br /><div><span>I understand that the proportion of items that need to match will differ based on the difference in length between the left and right table data elements.</span></div><br /><div><span>But my question is why one might be better than the other when I don't necessarily know the length of the strings in my left and right tables.</span></div><br /><div><span>My use case is a self join (the table joined to itself as both the left and right table). I've got text strings that can vary from just a few characters to a few thousand characters.   So these strings will appear in both the left and right tables at some point.</span></div><br /><div><span>I think I understand that relative joins are good for me, because if I have two short values in the left and right tables, then only a few substitutions are checked, and for longer data elements more characters are checked before the items are considered to be joined.</span></div><br /><div><span>But for example, if I have a short string and a long string, say:</span></div><br /><div><span>This is a short string.                                              And this is a short string made longer.</span></div><br /><div><span>Let's say that the relative value is 50%.</span></div><br /><div><span><span>Why would I use relative to left vs. relative to right in a deduplication use case?</span></span><hr /><span>Operating system used: <strong>macOS Sonoma 14.4.1</strong></span></div></div>]]>
        </description>
    </item>
    <item>
        <title>Error using Embed recipe in RAG tutorial in Dataiku</title>
        <link>https://community.dataiku.com/discussion/42268/error-using-embed-recipe-in-rag-tutorial-in-dataiku</link>
        <pubDate>Mon, 22 Apr 2024 06:46:06 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>VaishnaviRam</dc:creator>
        <guid isPermaLink="false">42268@/discussions</guid>
        <description><![CDATA[<p>Hi,</p><p>I am following the RAG tutorial link -&gt; <a href="https://knowledge.dataiku.com/latest/ml-analytics/gen-ai/tutorial-question-answering-using-rag-approach.html#" target="_blank" rel="noopener nofollow">https://knowledge.dataiku.com/latest/ml-analytics/gen-ai/tutorial-question-answering-using-rag-approach.html#</a></p><p>While trying to run the Embed recipe, I get the following error.</p><h4> Oops: an unexpected error occurred</h4><h5><span>Error in Python process: &lt;class 'dataikuapi.utils.DataikuException'&gt;: com.dataiku.dip.io.SocketBlockLink$SecretKernelTimeoutException: Subprocess failed to connect, it probably crashed at startup. Check the logs., caused by: SocketException: Socket operation on nonsocket: configureBlocking</span></h5><p><span>HTTP code: , type: &lt;class 'dataikuapi.utils.DataikuException'&gt;</span></p><p> </p><p><span>Kindly help me fix this issue. I have attached the logs.</span></p><hr /><p>Operating system used: <strong>Windows 10</strong></p>]]>
        </description>
    </item>
    <item>
        <title>when training a model with a visual recipe, does dataiku fit the model on the entire dataset?</title>
        <link>https://community.dataiku.com/discussion/31365/when-training-a-model-with-a-visual-recipe-does-dataiku-fit-the-model-on-the-entire-dataset</link>
        <pubDate>Mon, 12 Dec 2022 16:51:14 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>Tanguy</dc:creator>
        <guid isPermaLink="false">31365@/discussions</guid>
        <description><![CDATA[<p>Context:</p><ol><li>I have deployed a model to the flow</li><li>I want to retrain that model with its associated "train" recipe</li><li>I understand that the model's performance is evaluated using a test set or K-folds under a cross-validation strategy</li></ol><p>My question: after retraining the model using the "train" recipe, is the resulting new active model fit on the entire dataset (as best practice sometimes suggests)?</p><p>I can't find any information on this final fitting strategy in the recipe (see screenshot below) and failed to find such information in Dataiku's documentation.</p><p><img width="911" height="893" alt="model_train_settings.jpg" src="https://us.v-cdn.net/6038231/uploads/lithium_attachments/7476i2CE662BFDAE0F849.jpg" style="text-align: center;" /></p><p> </p><hr /><p>Operating system used: <strong>Windows 10</strong></p>]]>
        </description>
    </item>
    <item>
        <title>Coalesce function doesn&#39;t work properly in prepare recipe</title>
        <link>https://community.dataiku.com/discussion/36874/coalesce-function-doesnt-work-properly-in-prepare-recipe</link>
        <pubDate>Tue, 22 Aug 2023 10:00:47 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>kentnardGaleria</dc:creator>
        <guid isPermaLink="false">36874@/discussions</guid>
        <description><![CDATA[<p>Hi everyone!</p><p>I have a question regarding the coalesce function in Dataiku. I wanted to use the coalesce function in a Dataiku Formula, and the preview in the Prepare recipe shows that the function works and returns the value that I want. But after executing the recipe, the resulting column shows a different output from the preview.</p><p>I have made sure that the order of the values in the coalesce function is correct and that the empty cells are NULL instead of an empty string. I cannot work out where the mistake is. The pictures are attached below. Picture 1 shows the preview in the Prepare recipe and Picture 2 shows the resulting dataset.</p><p>Thanks in advance!</p>]]>
        </description>
    </item>
    <item>
        <title>DSS visual recipes defaulting to max column length with Redshift tables</title>
        <link>https://community.dataiku.com/discussion/4102/dss-visual-recipes-defaulting-to-max-column-length-with-redshift-tables</link>
        <pubDate>Wed, 08 Jan 2020 18:08:45 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>veenacalambur</dc:creator>
        <guid isPermaLink="false">4102@/discussions</guid>
        <description><![CDATA[<p>Hi everyone,</p><p>When working with Redshift tables in DSS visual recipes, we noticed that the table creation settings sometimes default to setting certain column lengths to the Redshift max (65,000). In many cases this is excessive. For example, in the screenshot below the "brand" column has a length of 65k, but most of the values in the column are text spanning fewer than 10 characters.</p><p>We wanted to better understand the logic behind the default column lengths for Redshift and whether there is a safe / proper way to modify them.</p><p><img width="400" height="400" alt="veenacalambur_0-1578506355191.png" src="https://us.v-cdn.net/6038231/uploads/lithium_attachments/239iE00D5A52754DA09B.png" style="text-align: inline;" /></p><p> </p>]]>
        </description>
    </item>
    <item>
        <title>Feature handling Dummy encoding</title>
        <link>https://community.dataiku.com/discussion/40740/feature-handling-dummy-encoding</link>
        <pubDate>Fri, 09 Feb 2024 00:02:45 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>stoch</dc:creator>
        <guid isPermaLink="false">40740@/discussions</guid>
        <description><![CDATA[<p>Dataiku's category handling = Dummy encoding with the drop-one-dummy option seems to use the level with the least exposure/volume as the dropped dummy.</p><p>Q1. Is there a way to set this dummy manually instead of using Dataiku's default method? I want to avoid using the category handling = custom preprocessing option.</p><p>Q2. Using Variable type = Categorical with the Drop one dummy option on an input variable of double type seems to drop 2 levels. For example, there are only 3 regression coefficients from a variable with 5 levels. I would have expected 4 regression coefficients, since 1 level is used as the dummy. Does anyone know the reason for this?</p><p>Many thanks in advance.</p>]]>
        </description>
    </item>
    <item>
        <title>set the random state in visual ML models</title>
        <link>https://community.dataiku.com/discussion/40366/set-the-random-state-in-visual-ml-models</link>
        <pubDate>Fri, 26 Jan 2024 18:40:24 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>Tanguy</dc:creator>
        <guid isPermaLink="false">40366@/discussions</guid>
        <description><![CDATA[<p>I have an ongoing project in production that I intend to replace with another project currently in development. As part of this transition, I find myself comparing a dataset that has undergone scoring from a model in each project. Initially, I anticipated the model scores to be identical or, at the very least, very similar. However, I have observed significant differences despite the fact that the underlying data provided to both models is the same.</p><p>Consequently, I am seeking a method to standardize the model training between the two projects by setting the random state. I am utilizing a random forest classifier within a visual recipe, and random forests in scikit-learn have a `random_state` parameter.</p><p>Is there a recommended approach to achieve this?</p><hr /><p>Operating system used: <strong>Redhat 8</strong></p>]]>
        </description>
    </item>
    <item>
        <title>Force substring to integer</title>
        <link>https://community.dataiku.com/discussion/39897/force-substring-to-integer</link>
        <pubDate>Thu, 04 Jan 2024 19:23:26 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>yesitsmeoffical</dc:creator>
        <guid isPermaLink="false">39897@/discussions</guid>
        <description><![CDATA[<p>Here is the sample table:</p>

<table border="1"><tbody><tr><td>ID</td><td>Column A</td></tr><tr><td>1</td><td>AA2001</td></tr><tr><td>2</td><td>BB2002</td></tr></tbody></table>

<p>I want to add a Column B, in which the values are <u>forced</u> to be integer.</p>

<p>I know I can do Column B = substring (Column A, -4), and Dataiku will automatically convert the values to integer, but the conversion process is a black box to me: I don't know what the conversion criteria/logic are or when it might fail.</p>

<p>I thought I could add "numval" in front of the substring to force the conversion but it didn't work and returned blank.</p>

<p>Is there any logic I could apply to achieve this? Basically something like:</p>

<pre spellcheck="false" tabindex="0">pd.to_numeric(df['Column A'].str[-4:], errors='coerce')</pre>

<hr />

<p>Operating system used: <strong>win 11</strong></p>

]]>
        </description>
    </item>
    <item>
        <title>Confused on how to use RAG (Retrieval Augmented Generation)</title>
        <link>https://community.dataiku.com/discussion/38673/confused-on-how-to-use-rag-retrieval-augmented-generation</link>
        <pubDate>Thu, 02 Nov 2023 13:14:47 +0000</pubDate>
        <category>Using Dataiku</category>
        <dc:creator>Antal</dc:creator>
        <guid isPermaLink="false">38673@/discussions</guid>
        <description><![CDATA[<p>I'm playing with the new LLM recipes and getting a bit confused by the RAG functionality.</p><p>I can use an Embed recipe to create an Embedding dataset / Vector Store.</p><p>Then I can set up an LLM to query the resulting object in its settings.</p><p>But how do I go from there? How can I ask a question of / send a query to the Embedding object? Clicking on it only gives the option of a Python recipe, and there's also nothing like a Visual webapp.</p><hr /><p>Operating system used: <strong>AWS Linux</strong></p>]]>
        </description>
    </item>
   </channel>
</rss>
