-
Using the "Files in folder" dataset
If you load lots of files of the same type into Dataiku, you should look at using the Files in folder dataset. It's a great built-in feature for automating the ingestion of files that share a format. You can create a new Files in folder dataset by going to: Dataset => New Dataset => All dataset types => DSS => Files in…
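The same kind of dataset can also be created programmatically. Below is a minimal sketch using the public dataikuapi client; the host, API key, project key, dataset name, and especially the type string and params keys ("FilesInFolder", "folderSmartId") are assumptions and may differ by DSS version.

```python
import dataikuapi

# Hypothetical host and API key
client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
project = client.get_project("MY_PROJECT")  # hypothetical project key

# Create a "Files in folder" dataset pointing at an existing managed folder
dataset = project.create_dataset(
    "all_csv_files",                      # hypothetical dataset name
    type="FilesInFolder",                 # assumed type identifier for this dataset kind
    params={"folderSmartId": "abc123"},   # assumed parameter: the managed folder's id
    formatType="csv",
    formatParams={"separator": ",", "parseHeaderRow": True},  # assumed CSV format params
)
```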
-
Import multiple files in a "managed folder" and create an "original dataset" column containing the file names
Hello, It would be cool to have the possibility to import multiple files into a "managed folder" and create a vertically-stacked dataset, with the option of adding an "original dataset" column containing the imported file names. Best regards
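In the meantime, something close to this can be done with a Python recipe that reads every file in the managed folder and tags each row with its source file. A rough sketch, assuming the files are CSVs with identical schemas; the folder and dataset names are hypothetical:

```python
import dataiku
import pandas as pd

folder = dataiku.Folder("input_files")            # hypothetical managed folder name
output = dataiku.Dataset("stacked_with_source")   # hypothetical output dataset name

chunks = []
for path in folder.list_paths_in_partition():
    # Read each file from the folder into a DataFrame
    with folder.get_download_stream(path) as stream:
        df = pd.read_csv(stream)
    # Keep track of which file each row came from
    df["original_dataset"] = path
    chunks.append(df)

# Write the vertically-stacked result to the output dataset
output.write_with_schema(pd.concat(chunks, ignore_index=True))
```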
-
How to Clear metrics and checks history?
I am wondering how we can clear the existing metrics and checks history. I tried clearing the dataset, but that does not affect the metrics and checks history. Thanks in advance for your guidance.
-
Tips for working with SQL Temporal tables?
I've searched around a bit and have found no information about Dataiku handling MSSQL temporal tables. Am I using the wrong search terms? Is this an area where I can only do this manually? (Temporal tables store a history with two datetime columns that bracket when the row was effective.) MSSQL has some special query…
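If a manual approach turns out to be the only option, one way is to push the T-SQL temporal clause down yourself, either in an SQL query recipe or from Python with SQLExecutor2. A minimal sketch, assuming an MSSQL connection named "mssql_conn" and a system-versioned table dbo.Customers (both names hypothetical):

```python
from dataiku import SQLExecutor2

executor = SQLExecutor2(connection="mssql_conn")  # hypothetical MSSQL connection name

# T-SQL temporal syntax: ask for the rows as they were at a given point in time
query = """
    SELECT *
    FROM dbo.Customers
    FOR SYSTEM_TIME AS OF '2023-01-01T00:00:00'
"""
df = executor.query_to_df(query)
print(df.head())
```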
-
Share a dataset using python api
Team, I want to know if there is any Python API to share a dataset from one project to another. For example, Project X has dataset D. Run a Python API call to share dataset D with Projects Y and Z. Thanks, Skanda
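One possible approach is to edit the exposed objects of the source project through the dataikuapi client. A sketch, assuming a DSS version where DSSProjectSettings exposes add_exposed_object; the host, key, and project keys are hypothetical, and on older versions the exposed-objects list may need to be edited via the raw settings instead:

```python
import dataikuapi

client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
source = client.get_project("PROJECT_X")   # project that owns dataset D

settings = source.get_settings()
# Expose dataset D to projects Y and Z (assumed helper on the settings object)
settings.add_exposed_object("DATASET", "D", "PROJECT_Y")
settings.add_exposed_object("DATASET", "D", "PROJECT_Z")
settings.save()
```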
-
Bug? Axis labels and prefixes not working in line/bar chart
This seems to be a bug. Editing the axis labels and adding a prefix doesn't seem to do anything in the line and bar charts in the Charts tab of a dataset (see screenshot). Operating system used: Windows
-
How to fully automate model retraining on the most up-to-date training data?
We are trying to build an automated pipeline (via a Scenario) that, among other things, involves retraining our main classification model each time the Scenario is run. Ideally, this retraining should happen on freshly-updated training data (the training dataset is refreshed/recalculated earlier in the same Scenario).…
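One common pattern is to chain a "Build" step for the training dataset with a retrain of the saved model, either as visual scenario steps or in a custom Python step. A minimal sketch of the Python-step version, assuming the training dataset name and saved model id below (both hypothetical):

```python
from dataiku.scenario import Scenario

scenario = Scenario()

# Rebuild the training dataset first so the model sees fresh data
scenario.build_dataset("training_data")          # hypothetical dataset name

# Then retrain the saved (deployed) model on the rebuilt dataset
scenario.train_model("sm_classification_model")  # hypothetical saved model id
```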
-
Dataset virtualization
Hi All, I am trying to understand how virtualization in DSS works. In the following example, SQL pipelines are enabled and virtualization is allowed for 'split_1' and 'split_2'. When building 'stacked' with smart reconstruction, 'split_1' and 'split_2' remain unbuilt (virtualized) as expected. However, in the next example,…
-
Invalid argument An invalid argument has been encountered : Invalid loc: empty name
This problem exists in a single project on my instance. Every time I attempt to delete ANY dataset (regardless of connection or type), I get the error message shown in the title. This also happens when I try to create new datasets using recipes. Thanks in advance!
-
Operationalizing connection to REST APIs
Wondering if anyone out there has had some success in operationalizing connections to REST APIs as a source of data for DSS Projects. I would love to have a conversation with folks who are working on this type of challenge. In my case: Dataiku DSS has allowed us to automate the gathering of data from a CRM system not…
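For anyone exploring the same challenge, a rough sketch of the basic building block is a Python recipe that pulls records from a REST endpoint and writes them to a DSS dataset; the URL, token variable, and dataset name below are hypothetical:

```python
import dataiku
import pandas as pd
import requests

API_URL = "https://crm.example.com/api/v1/contacts"  # hypothetical CRM endpoint
# Token kept out of the code, e.g. stored as a project variable (hypothetical name)
API_TOKEN = dataiku.get_custom_variables().get("crm_api_token", "")

response = requests.get(
    API_URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
response.raise_for_status()

# Assumes the endpoint returns a JSON array of flat records
df = pd.DataFrame(response.json())
dataiku.Dataset("crm_contacts").write_with_schema(df)  # hypothetical output dataset
```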