General Discussion

131 - 140 of 965
  • Is there a way to create a new application-as-recipe with a builder from the Python API, the same as the new_recipe() function?
    Question
    Started by SJB
    Most recent by Turribeach
    0
    1
  • I think I've found a bug in how the editable dataset sorts integer columns: I'm trying to sort descending and it's sorting alphabetically, which is not desired. Operating system used: macOS and Windows
    Question
    Started by crunis
    Most recent by Turribeach
    0
    4
    Last answer by Turribeach

    I don't think this is a bug since it's behaving as designed. You may argue that it's inconvenient and slightly confusing, but given that the solution is to use the Explore tab, it might be one of those cases where it's best to move on as there is an easy workaround.

  • Hello All, do you have any experience connecting to IBM MQ from Dataiku? I understand we don't have a native plugin/connector for IBM Message Queue yet. I need to build a MQ pub/sub broker connection…
    Question
    Started by INdran
    0
  • I realized that Sampling > Random can only be performed with Engine: DSS; it can't be performed with Engine: In-Database (SQL) currently in 12.4.1. And I wonder why, because it seems to me that they …
    Question
    Started by ecerulm
    Most recent by ecerulm
    0
    2
    Last answer by ecerulm

    Aha, so the SQL support is generic across Snowflake, Redshift, etc.?

    In any case I did a quick check and SAMPLE / TABLESAMPLE is supported by many databases (PostgreSQL, Apache Impala, MySQL, Teradata, BigQuery). Hopefully somebody at Dataiku can take a look at how to push down sampling to the SQL engine for the specific databases that support it.
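
    In the meantime, if anyone needs this today, one workaround is to push the sampling down yourself from a code recipe or notebook with SQLExecutor2. A minimal sketch, where "my_sql_connection" and "my_table" are placeholders and the TABLESAMPLE clause shown is PostgreSQL syntax (adjust it for your database, e.g. SAMPLE (10) on Snowflake):

    from dataiku import SQLExecutor2

    # Run the sampling inside the database and only pull the sampled rows back
    executor = SQLExecutor2(connection="my_sql_connection")
    df = executor.query_to_df("SELECT * FROM my_table TABLESAMPLE BERNOULLI (10)")
    print(len(df))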

  • Hi folks, I am trying to check the existence of a file in a particular path using Python code in Dataiku. I am able to access the file manually, but when I am trying to check the existence it is not gi…
    Question
    Started by M S
    Most recent by Turribeach
    0
    3
    Last answer by Turribeach

    Please post your code snippet using a code block (the </> icon in the toolbar). If you don't, the indentation is lost and the code can't be executed when you copy/paste it, as Python is strict about indentation.

    With regards to your issue: you can't access the file system directly, you need to use a Dataiku Managed Folder:

    https://knowledge.dataiku.com/latest/code/managed-folders/concept-managed-folders.html
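
    Once the file lives in a managed folder, the existence check can look like the minimal sketch below, where the folder name "my_folder" and the path "/data/input.csv" are placeholders for your own setup:

    import dataiku

    folder = dataiku.Folder("my_folder")          # name or id of the managed folder
    paths = folder.list_paths_in_partition()      # all paths inside the folder, e.g. ['/data/input.csv', ...]

    if "/data/input.csv" in paths:
        print("File exists in the managed folder")
    else:
        print("File not found")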

  • I want to create a voice cloning AI with accent integration, just like the voice cloning software I've been using for the past 7 months. I want to create a similar one with a unique accent cloning fe…
    Question
    Started by annianni
    Most recent by kaveesak
    0
    3
    Last answer by kaveesak

    Consider experimenting with accent embedding techniques to ensure your AI can accurately mimic various accents. Additionally, refining your model's training process to prioritize accent adaptation could enhance its ability to clone voices with precision.
    Since you're aiming for a personalized AI tailored to your specific needs, it's worth investing time in fine-tuning parameters and optimizing the model architecture to achieve the desired outcome.
    Don't hesitate to explore resources and guidance from AI for Business—they might offer valuable insights and strategies to support your project.

  • I have been using the dataiku API to make an algorithm and it involves executing multiple recipes. I am using the job_builder so that the recipes can be executed in "one job" (meaning that there will …
    Question
    Started by kentnardGaleria
    Most recent by Turribeach
    0
    7
    Last answer by Turribeach

    I think using jobs you are never going to achieve what you want, and indeed it is not even the right tool for the job (pun intended!).

    With regards to using scenarios, you already have a solution that executes things the way you want, but you don't like how the information is presented in the scenario Last Runs tab. Personally I don't think this is an issue that's worth pursuing, at least as far as how things look. Having said that, I do think that having a scenario that only has a single dataset build per step is a bit of an anti-pattern and goes against how scenarios should be used in Dataiku. More importantly, this scenario design will not be able to execute any jobs in parallel, which depending on your flow design may lead to very poor performance of your custom build jobs, as each job has to wait for the previous one to execute.

    And therein lies the problem of this whole thread: your requirement to execute "jobs sequentially" is not a real requirement but how you thought you could solve your real requirement, which is to build some flow datasets in the correct order. Once you understand what the real requirement is, this usually results in a much better solution, since you are not trying to go against the flow (second pun intended!).

    So here comes the last solution I will propose, and then I will declare myself out of this thread. The key to your problem is to understand that you will have the best outcome using smart flow execution, which allows you to specify your end-of-flow output datasets and run all downstream recipes in the right order, and in parallel where possible. You will of course say you don't want to run your whole flow and you want to exclude some datasets. That is what your question should have always been about (see this).

    The solution to your requirement is to create a scenario with 3 steps:

    1. In the first step you set all the datasets you want to exclude from the flow build using API code in a Python code step like this:

    import dataiku

    # project_key and dataset_id are placeholders; run this for each dataset to exclude
    dataiku_client = dataiku.api_client()
    project = dataiku_client.get_project(project_key)
    dataset = project.get_dataset(dataset_id)
    dataset_settings = dataset.get_settings()
    # Write-protected datasets are skipped when the flow build runs
    dataset_settings.get_raw()['flowOptions']['rebuildBehavior'] = 'WRITE_PROTECT'
    dataset_settings.save()

    2. You then add a Build step with all your end-of-flow output datasets and set it to run all downstream recipes.

    3. Finally, you add a Python code step to revert the datasets you changed in step 1 back to normal build mode:

    import dataiku

    # Again, project_key and dataset_id are placeholders; run this for each dataset changed in step 1
    dataiku_client = dataiku.api_client()
    project = dataiku_client.get_project(project_key)
    dataset = project.get_dataset(dataset_id)
    dataset_settings = dataset.get_settings()
    dataset_settings.get_raw()['flowOptions']['rebuildBehavior'] = 'NORMAL'
    dataset_settings.save()

    You execute this scenario and you will have a flow build scenario which builds all the datasets you want, in the correct order, in parallel where possible, and in a single job/scenario step.
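
    And if you prefer to keep triggering things from your own code rather than from a scenario trigger, you can still start this scenario through the API. A minimal sketch, where "MY_PROJECT" and "BUILD_SCENARIO" are placeholders for your project key and scenario id:

    import dataiku

    client = dataiku.api_client()
    scenario = client.get_project("MY_PROJECT").get_scenario("BUILD_SCENARIO")
    scenario.run_and_wait()   # blocks until the scenario run finishes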

  • Oops: an unexpected error occurred The value of parameter linuxConfiguration.ssh.publicKeys.keyData is invalid., caused by: OnNextValue: OnError while emitting onNext value: retrofit2.Response.class
    Question
    Started by DIJITH N
    Most recent by Alexandru
    0
    3
    Last answer by Alexandru

    Hi @dijithdinesh007,

    The key needs to be in id_rsa or id_ed25519 format, e.g. the contents of id_rsa.pub or id_ed25519.pub should look like:

    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDXlgyRfX2YqHnxHmrVXJOkDfJNtPRyQnTh/XC8oO2Z<SNIP>X6x1pTh= user@example.com

    or

    ssh-ed25519 AAAAC3<SNIP> user@example.com

    and not the PuTTY format. Please check and let me know if this is the case. If you are still having an issue it may be best to open a support ticket with screenshots of what your SSH public key looks like exactly.
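
    If it helps, a quick sanity check from Python (the file name my_key.pub is a placeholder for wherever your public key is):

    # OpenSSH-format public keys start with the key type; PuTTY .ppk files do not
    with open("my_key.pub") as f:
        key = f.read().strip()

    print(key.startswith(("ssh-rsa ", "ssh-ed25519 ")))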

    Thanks

  • I've created one flow which takes an input file from S3 based on scenario trigger parameters, runs the flow, and finally saves the processed data into S3 in different locations based on the parameters i…
    Question
    Started by hareesh
    Most recent by Turribeach
    0
    4
    Last answer by Turribeach

    The design I proposed was a sample. You can break down the flow in many different ways. For instance you could break it down by season, so you will end up with two branches. Also, you seem to have ignored my reference to Dataiku Applications. This is the only supported way to run multiple flow executions concurrently. Even partitions, as suggested by Ignacio, will not allow you to run multiple flow executions concurrently, although you can recalculate multiple partitions at the same time.

    Finally, I would say that the method I proposed using an event-based trigger (a dataset-changed trigger on top of a folder) is the more modern way of approaching this problem. I would also like to know why you want to run multiple files concurrently: is this an actual requirement, or is it just a side effect of your decision to execute the scenario via an API call? If the files don't arrive all at the same time then there is no point in building a flow that supports concurrent execution. And even if they arrive at the same time, you may not have any time constraints, so processing them serially is fine. And as per my example, you can get some parallel execution if you break down your flow into different branches.

  • Hey Dataiku users, I just wanted to know how I can convert a very big binary data file to a human-readable file like XML/CSV or anything where I can see the decoded data? Thank you! Operating system u…
    Answered ✓
    Started by Telman
    Most recent by tgb417
    0
    6
    Solution by Turribeach

    Well you need to ask whoever is producing these files to tell you what binary format they have. Then look for Python libraries that support reading these files.
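
    For illustration only, once the layout is known the conversion can be as simple as reading fixed-size records with the struct module and writing them out as CSV. The record layout below (one int32 id plus one float64 value) and the file names are made-up assumptions, so replace them with whatever the producer documents:

    import csv
    import struct

    RECORD_FORMAT = "<id"                         # little-endian: int32 id + float64 value (assumed layout)
    RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

    with open("data.bin", "rb") as src, open("data.csv", "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["id", "value"])
        while chunk := src.read(RECORD_SIZE):
            if len(chunk) < RECORD_SIZE:
                break                             # ignore a trailing partial record
            record_id, value = struct.unpack(RECORD_FORMAT, chunk)
            writer.writerow([record_id, value])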
