General Discussion
- Is there a way to create a new application-as-recipe with a builder from the Python API, same as the new_recipe() function? Last answer by Turribeach
- I think I've found a bug in how the editable dataset sorts integer columns: I'm trying to sort descending and it sorts alphabetically, which is not desired. Operating system used: macOS and Windows. Last answer by Turribeach
I don't think this is a bug since it's behaving as designed. You may argue that it's inconvenient and slightly confusing but given the solution is to use the Explore tab it might be one of those where it's best to move on as there is an easy workaround.
- Hello all, have you already had experience connecting to IBM MQ from Dataiku? I understand we don't have a native plugin/connector for IBM Message Queue yet. I need to build an MQ pub/sub broker connection…
- I realized that Sampling > Random can only be performed with Engine: DSS; it can't be performed with Engine: In-Database (SQL) currently in 12.4.1. And I wonder why, because it seems to me that they … Last answer by ecerulm
Aha, so the SQL support is generic across Snowflake, Redshift, etc.?
In any case I did a quick check and SAMPLE / TABLESAMPLE is supported by many databases (PostgreSQL, Apache Impala, MySQL, Teradata, BigQuery). Hopefully somebody at Dataiku can take a look at how to push down sampling to the SQL engine for the specific databases that support it.
- Hi folks, I am trying to check the existence of a file in a particular path using Python code in Dataiku. I am able to access the file manually but when I am trying to check the existence it is not gi… Last answer by Turribeach
Please post your code snippet using a code block (the </> icon in the toolbar). If you don't, the indentation is lost and the code can't be executed when copied and pasted, as Python is strict about indentation.
With regards to your issue, you can't access the file system directly; you need to use a Dataiku Managed Folder:
https://knowledge.dataiku.com/latest/code/managed-folders/concept-managed-folders.html
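For illustration, a minimal sketch of that pattern; the managed folder name ("input_files") and the file path below are hypothetical placeholders for your own setup:

import dataiku

# Check whether a file exists inside a Dataiku managed folder instead of
# reading the server's file system directly.
folder = dataiku.Folder("input_files")
paths = folder.list_paths_in_partition()  # all paths stored in the folder, e.g. ["/reports/data.csv"]

if "/reports/data.csv" in paths:
    print("File found in the managed folder")
else:
    print("File not found")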
- I want to create a voice cloning AI with accent integration, just like the voice cloning software I've been using for the past 7 months. I want to create a similar one with a unique accent cloning fe… Last answer by kaveesak
Consider experimenting with accent embedding techniques to ensure your AI can accurately mimic various accents. Additionally, refining your model's training process to prioritize accent adaptation could enhance its ability to clone voices with precision.
Since you're aiming for a personalized AI tailored to your specific needs, it's worth investing time in fine-tuning parameters and optimizing the model architecture to achieve the desired outcome.
Don't hesitate to explore resources and guidance from AI for Business; they might offer valuable insights and strategies to support your project.
- I have been using the Dataiku API to make an algorithm and it involves executing multiple recipes. I am using the job_builder so that the recipes can be executed in "one job" (meaning that there will … Last answer by Turribeach
I think using jobs you are never going to achieve what you want, and indeed it is not even the right tool for the job (pun intended!).
With regards to using scenarios, you already have a solution that executes things the way you want but you don't like how the information is presented in the scenario's Last Runs tab. Personally I don't think this is an issue worth pursuing, at least as far as how things look. Having said that, I do think that having a scenario with only a single dataset build per step is a bit of an anti-pattern and goes against how scenarios should be used in Dataiku. More importantly, this scenario design will not be able to execute any jobs in parallel, which depending on your flow design may lead to very poor performance of your custom build jobs, as each job has to wait for the previous one to execute.
And therein lies the problem of this whole thread: your requirement to execute "jobs sequentially" is not a real requirement but how you thought you could solve your real requirement: to build some flow datasets in the correct order. Once you understand what the real requirement is this usually results in a much better solution since you are not trying to go against the flow (second pun intended!).
So here comes the last solution I will propose, and then I will declare myself out of this thread. The key to your problem is to understand that you will have the best outcome using the smart flow execution, which allows you to specify your end-of-flow output datasets and run all downstream recipes in the right order, and in parallel where possible. You will of course say you don't want to run your whole flow and you want to exclude some datasets. That is what your question should have always been about (see this).
The solution to your requirement is to create a scenario with 3 steps:
1) In the first step you set all the datasets you want to exclude from the flow build to write-protected, using API code in a Python code step like this:
import dataiku

# Write-protect the dataset so the flow build skips it
dataiku_client = dataiku.api_client()
project = dataiku_client.get_project(project_key)   # project_key defined earlier in the step
dataset = project.get_dataset(dataset_id)           # dataset_id: name of the dataset to exclude
dataset_settings = dataset.get_settings()
dataset_settings.get_raw()['flowOptions']['rebuildBehavior'] = 'WRITE_PROTECT'
dataset_settings.save()
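Since this step typically has to protect several datasets rather than just one, the same calls can be wrapped in a loop; a small sketch, assuming a placeholder project key and dataset names:

import dataiku

# Write-protect every dataset that should be excluded from the flow build.
# The project key and dataset names below are placeholders.
dataiku_client = dataiku.api_client()
project = dataiku_client.get_project("MY_PROJECT_KEY")

for dataset_id in ["raw_input", "reference_data"]:
    dataset_settings = project.get_dataset(dataset_id).get_settings()
    dataset_settings.get_raw()['flowOptions']['rebuildBehavior'] = 'WRITE_PROTECT'
    dataset_settings.save()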
2) You then add a build step with all your end-of-flow output datasets and set it to run all downstream recipes.
3) Finally, you add a Python code step to revert the datasets you changed in step 1 back to normal build mode:
import dataiku

# Restore the dataset's normal rebuild behavior
dataiku_client = dataiku.api_client()
project = dataiku_client.get_project(project_key)
dataset = project.get_dataset(dataset_id)
dataset_settings = dataset.get_settings()
dataset_settings.get_raw()['flowOptions']['rebuildBehavior'] = 'NORMAL'
dataset_settings.save()
You execute this scenario and you will have a flow build scenario which builds all the datasets you want, in the correct order, in parallel where possible, and in a single job/scenario step.
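If you still want to launch everything from your own code, as you did with the job_builder, you can simply trigger this scenario through the API; a minimal sketch, assuming a hypothetical host, API key, project key and scenario id:

import dataikuapi

# Run the 3-step scenario from outside DSS and wait for it to finish.
# Host, API key, project key and scenario id are placeholders.
client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
scenario = client.get_project("MY_PROJECT_KEY").get_scenario("build_flow_outputs")

run = scenario.run_and_wait()  # launches the scenario and blocks until it completes
print(run.get_info())          # raw run information, including the final outcome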
- Oops: an unexpected error occurred: The value of parameter linuxConfiguration.ssh.publicKeys.keyData is invalid., caused by: OnNextValue: OnError while emitting onNext value: retrofit2.Response.class Last answer by Alexandru
Hi @dijithdinesh007,
The key needs to be in id_rsa or id_ed25519 format, e.g. the contents of id_rsa.pub or id_ed25519.pub should look like:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDXlgyRfX2YqHnxHmrVXJOkDfJNtPRyQnTh
/XC8oO2Z<SNIP>X6x1pTh= user@example.com
or: ssh-ed25519 AAAAC3<SNIP> user@example.com
and not the PuTTY format. Please check and let me know if this is the case. If you are still having issues it may be best to open a support ticket with screenshots of what your SSH public key looks like exactly.
Thanks
- I've created one flow which takes an input file from S3 based on scenario trigger parameters, runs the flow, and finally saves the processed data into S3 in different locations based on the parameters i… Last answer by Turribeach
The design I proposed was a sample. You can break down the flow in many different ways. For instance you could break it by season, so you end up with two branches. Also, you seem to have ignored my reference to Dataiku Applications. This is the only supported way to run multiple flow executions concurrently. Even partitions, as suggested by Ignacio, will not allow you to run multiple flow executions concurrently, although you can recalculate multiple partitions at the same time.
Finally, I would say that the method I proposed using an event-based trigger (dataset changed, on top of a folder) is the more modern way of approaching this problem. I would also like to know why you want to run multiple files concurrently: is this an actual requirement? Or is it just a side effect of your decision to execute the scenario via an API call? If the files don't arrive all at the same time then there is no point in building a flow that supports concurrent execution. And even if they arrive at the same time, you may not have any time constraints that prevent processing them serially, so why bother? And as per my example, you can get some parallel execution if you break your flow down into different branches.
- Hey Dataiku users, I just wanted to know how I can convert a very big binary data file to a human-readable file like XML/CSV, or anything in which I can see the decoded data? Thank you! Operating system u… Solution by