-
Managed Folder contents location unicity
Good day! Are managed folder contents entered into any type of logging system, backup system, or version control? I.e., can those contents be found in places other than the managed folders themselves? I'm assuming that's not the case, and that the managed folder itself is the only place where the contents/data are actually stored…
-
Fuzzy Join: When to use Relative to the Left vs Right Tables.
I'm starting to work with Fuzzy Joins and having good luck. However, I'm trying to figure out when I might want to use a Relative Threshold related to the Right or Left Table when doing an overall Left Join to find duplicate records. I understand that the proportions of items that need to match will be different based…
-
Dynamic rename and addition/removal of columns
I am connecting to QuickBase using a URL path and looking for a way to dynamically add or remove columns from the initial schema as well as rename the columns as "Simplify (and lower case)". I do not own the source data, and cannot influence the naming convention or columns at the source. I am pushing this data to GCP/Big…
-
Uploading image files to managed folder
Hello, I need to pull images from a folder, transform them, and upload them to another folder using a Python script. I pulled and transformed the images, but I have trouble saving them to the destination folder. upload_data takes binary data, not image data. The other upload methods require a "local file path". Is there any other method I can use?…
-
How to combine several rows to one rows?
Hello, my data looks like this:

Records            values
records_0_Name     Jimmy
records_0_Number   1
records_0_Status   Student
records_1_Names    Marie
records_1_Number   2
records_1_Status   Worker

And I want it to look like this:

Name    Number   Status
Jimmy   1        Student
Marie   2        Worker

Any ideas?
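One possible way to approach the reshaping asked about above, as a minimal sketch in plain Python rather than a Dataiku recipe. It assumes the keys always follow a `records_<index>_<field>` pattern and that `Names` in the second record is just a variant spelling of `Name`; the names `pivot_records`, `KEY_PATTERN`, and `FIELD_ALIASES` are hypothetical helpers introduced only for illustration.

```python
import re
from collections import defaultdict

# Sample long-format rows copied from the post: (key, value) pairs.
rows = [
    ("records_0_Name", "Jimmy"),
    ("records_0_Number", "1"),
    ("records_0_Status", "Student"),
    ("records_1_Names", "Marie"),
    ("records_1_Number", "2"),
    ("records_1_Status", "Worker"),
]

# Assumed key layout: records_<index>_<field>.
KEY_PATTERN = re.compile(r"^records_(\d+)_(.+)$")

# Assumption: "Names" is a variant spelling of "Name" in the source data.
FIELD_ALIASES = {"Names": "Name"}


def pivot_records(pairs):
    """Group (key, value) pairs by record index, one dict per output row."""
    grouped = defaultdict(dict)
    for key, value in pairs:
        match = KEY_PATTERN.match(key)
        if match is None:
            continue  # skip keys that do not follow the assumed pattern
        index = int(match.group(1))
        field = match.group(2)
        grouped[index][FIELD_ALIASES.get(field, field)] = value
    # Return rows ordered by their record index.
    return [grouped[i] for i in sorted(grouped)]


wide = pivot_records(rows)
for record in wide:
    print(record)
```

Each dict in `wide` corresponds to one output row with `Name`, `Number`, and `Status` keys. The same split-the-key-then-pivot idea should carry over to a visual recipe or a pandas pivot, though the exact steps there are not shown in the post.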
-
Need help regarding reading data from google cloud storage (unsupported data format)
Hello, I have video data stored in Google Cloud Storage. That format is not supported, so I can't just click "New dataset > GCS" and import my data. A Dataiku-GCP connection already exists. Is there a way I can import my data using that existing connection, for example from a notebook, shell script, plugin, etc.? Thank you.
-
Removing duplicate columns
Dear community, I have such a case: - I have a large database that needs cleaning. - while performing the typical cleaning activities (parsing etc.) I discovered that I have numerous columns that are just duplicates of one another (judging by basic analysis it's hundreds) but with different names. Example: 1 column name is…
-
Flow Zone Reuse? Can one flow zone be reused from multiple datasets
I've been working on a number of file systems to process data about the files on various disks. The flow zone I've created looks like this. There are two simple shell scripts that gather data about the same file volume, two preparation recipes that clean up that data, and 1 Join recipe that brings both sets of data…
-
Read Images on Azure Blob Storage
Hi, we are trying to create an image dataset from images stored on Azure Blob Storage. We have successfully established a connection with the blob storage and are able to view the list of images from a container while creating a dataset. Now when we select an image (png/jpg) from the list, it shows that the format of the image is…
-
Using Dataiku DSS for EDA purposes
Hi, I'm a big fan of DSS and use it often, and I would like to know whether everyone finds the visualisations in DSS fascinating. The latest version (which I saw in the demo) has even more, and really good, visualisations. I was wondering if anyone uses DSS for EDA purposes as well and, if so, whether you can share some tips with me on how you go about -…