Using Dataiku

671 - 680 of 5.2k
  • Hi all! I'm currently working on a very simple project that I need to deploy as an application. I'm very, very new to the platform, so I'm not sure how to even begin. I already have a Flow that takes …
    Answered ✓
    Started by DRamos
    Most recent by Turribeach
    0
    3
    Turribeach
    Solution by Turribeach

    Before progressing further with your solution you need to understand certain limitations of the Dataiku world. In Dataiku there is no support for running a flow, a scenario or a recipe concurrently. This means that "there can only be one" running at a single point in time. In general, flows/scenarios/recipes are meant to run in "batch" mode on a scheduled basis, not on demand where an ad hoc user can select an input and trigger an execution. It's perfectly possible to have a Dataiku WebApp that allows a user to trigger a flow/scenario/recipe, but this would only work for the first user that does it. If a subsequent user attempted to run the same flow/scenario/recipe while it is already running, it will most likely fail (the actual outcome depends on which object you are trying to re-run).

    So before you continue further with your design you need to consider whether your solution needs to support concurrent runs or not. Sometimes you can get away without concurrent runs and can therefore use a flow/scenario/recipe design, for instance if you need to support a single user or if the user will only trigger a flow/scenario/recipe once a day.

    Where you can't use this design approach, you need to move away from the flow/scenario/recipe "batch" design, look at the API node and deploy an API service. Dataiku API Services allow developers to create APIs for "realtime" use cases, where you provide a set of inputs and want to produce a unique output for that request. Typically these are used in scoring/prediction use cases, like predicting the probability of default (e.g. when a customer applies for a credit card). The API node supports high availability and scalability, which is something you would expect for a service that is meant to run concurrently and serve multiple concurrent requests/users.

    Now let's talk about Dataiku Applications. Dataiku Applications allow you to design and package your project as a reusable application with customizable inputs. While Dataiku Applications can run concurrently, the use case they are aimed at is not concurrent realtime use of a flow/scenario/recipe: they let you reuse a flow design in other projects, so the use case is design reusability. As such, to achieve concurrency with Dataiku Applications you have to use the Application in different projects, which doesn't fit your use case of a single project serving user requests. I guess you could have a kludge solution where you convert your flow into a Dataiku Application and then create multiple projects all using that same Application (ie Project_Slot_1, Project_Slot_2, Project_Slot_3, Project_Slot_n), which would allow you to execute your flow concurrently and have your WebApp keep track of which applications are in use and choose a "free slot" to execute the user inputs. But this seems like a bad pattern to me.

    So now that you understand how these solutions work you should clarify what your requirements are in terms of concurrency.
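    For what it's worth, the "free slot" bookkeeping such a WebApp would need for the Project_Slot_n kludge can be sketched in a few lines. Everything here is hypothetical: the project names, the slot count, and the assumption that the WebApp would trigger the flow in whichever slot it acquires (e.g. via the Dataiku public API) are illustrations, not a recommended design.

```python
# Minimal sketch of a slot pool for the Project_Slot_n kludge: each slot
# names a project holding a copy of the flow; a request takes a free slot,
# runs the flow there, and releases the slot when done.
import threading

class SlotPool:
    """Tracks which Project_Slot_N copies of the flow are free."""

    def __init__(self, n_slots: int):
        self._free = [f"Project_Slot_{i}" for i in range(1, n_slots + 1)]
        self._lock = threading.Lock()

    def acquire(self):
        """Return a free slot's project key, or None if every copy is busy."""
        with self._lock:
            return self._free.pop() if self._free else None

    def release(self, slot: str):
        """Mark a slot as free again once its run has finished."""
        with self._lock:
            self._free.append(slot)
```

    The pool only illustrates why the pattern is awkward: you must pick a fixed number of project copies up front, and any request arriving while all copies are busy gets `None` and has to wait or fail.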


  • Hi Everyone, We are building a Dataiku webapp. We want to get dynamic input from the user through the UI, and when the user submits the prompt it gets processed via an LLM through an AI recipe. We …
    Question
    Started by karan_25
    Most recent by tgb417
    0
    5
    tgb417
    Last answer by tgb417

    Cool, so you are using the Dataiku LLM Mesh, but using the Dataiku API to access it from your Flask-based app. I can imagine that others in the community might want to learn even more about how this is going for you. Is it performant enough, reliable enough, etc.?


  • I'm starting to work with the Fuzzy Joins and having good luck. However, I'm trying to figure out when I might want to use a Relative Threshold related to the Right or Left Table when doing an overall …
    Question
    Started by tgb417
    0
  • I am connecting to QuickBase using a URL path and looking for a way to dynamically add or remove columns from the initial schema as well as rename the columns as "Simplify (and lower case)". I do not …
    Question
    Started by CurtisC
    Most recent by Turribeach
    0
    10
    Turribeach
    Last answer by Turribeach

    @CurtisC wrote:

    but unsure how this will interpret a new column with special characters or spaces.


    It won't work, as BigQuery won't accept those as valid column names. You need to rename those columns, which is why I suggested the Python recipe: the Prepare recipe can't rename columns dynamically, whereas you can do that in Python. There is no other alternative.
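    As a rough illustration of that dynamic rename, here is the kind of helper one might drop into a Python recipe. The rules below are an approximation of BigQuery's column-name constraints (lowercase letters, digits and underscores, no leading digit), not an official list, and the recipe wiring mentioned in the comment is the usual Dataiku dataset pattern.

```python
# Sketch: rewrite arbitrary column names into BigQuery-safe identifiers,
# roughly mimicking the Prepare recipe's "Simplify (and lower case)" option
# but applied dynamically to whatever columns arrive.
import re

def simplify(name: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics to '_', trim edges,
    and prefix a '_' when the result would start with a digit."""
    out = re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")
    if not out:
        return "col"          # fallback for names with no usable characters
    return "_" + out if out[0].isdigit() else out

# In the recipe you would then apply it to the dataframe, e.g.
# df.columns = [simplify(c) for c in df.columns]
# before writing the output with write_with_schema(df).
```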

  • Hi There, I have a single file with multiple suppliers and want to split the file into individual files for individual suppliers. However the suppliers can be dynamic week to week (when the file is re…
    Question
    Started by Jtbonner86
    Most recent by Jtbonner86
    0
    14
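    The split-by-supplier question above has a common shape: one input table, one output per distinct value of a column, where the set of values changes week to week. A hedged sketch of the core step, assuming a pandas dataframe and a column literally named "supplier" (both assumptions; the real column name will differ):

```python
# Sketch: partition one dataframe into per-supplier dataframes, discovering
# the supplier list dynamically from the data each week.
import pandas as pd

def split_by_supplier(df: pd.DataFrame, col: str = "supplier") -> dict:
    """Return {supplier_value: sub-dataframe} for each distinct value in col."""
    return {
        str(name): group.reset_index(drop=True)
        for name, group in df.groupby(col, sort=True)
    }
```

    In a Dataiku Python recipe, each part would then typically be written out as a separate file into a managed folder (e.g. one `to_csv` per supplier).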
  • I have master data in the flow. I want to subset the data for each market and, in that market, select submarkets and develop models. There are around 60 models that need to be built. I am wondering wh…
    Answered ✓
    Started by Mohammed
    Most recent by Alexandru
    0
    3
  • Hi, I have a scenario where we are reading input from different files, and one file among all the input files is an output file too. When we do this we are losing some data, maybe due to parallelism…
    Question
    Started by Pasumarthiavi
    0
  • Hi Dataiku Community, I wonder how to select and build multiple flow zones in a job run. Say, I have 4 flow zones, A, B, C, and D. They are connected in such a way as follows: A -> C -> D B -> C -> D.…
    Answered ✓
    Started by Frankenstein
    Most recent by Frankenstein
    0
    2
  • I've completed this tutorial on how to implement an LSTM in dataiku. https://knowledge.dataiku.com/latest/kb/analytics-ml/time-series/ts-forecast/time-series-code/deep-learning-ts.html Basically I wan…
    Answered ✓
    Started by Darius679
    Most recent by NIDHS
    0
    3
  • Hi everyone, in my Dataiku flow I have a table (T-DATA-QUALITY-40) that gives me an output consisting of a row with 3 columns. This is the output of the table. How can I create a new variable with the …
    Answered ✓
    Started by MassimoRighi96
    Most recent by Turribeach
    0
    9