-
Python Pipeline
How do I skip the first two rows directly in the pipeline Dataiku created?
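In a Dataiku Python recipe the usual answer is to read the input with pandas and pass `skiprows=2` to `pd.read_csv`. A minimal stdlib sketch of the same idea, using a hypothetical helper name and toy data:

```python
import csv
import io
from itertools import islice

def read_skipping_first_rows(fileobj, n_skip=2):
    """Read CSV rows from a file-like object, dropping the first n_skip rows."""
    reader = csv.reader(fileobj)
    return list(islice(reader, n_skip, None))

# Toy input: the first two rows are junk we want to drop.
raw = io.StringIO("junk1\njunk2\na,1\nb,2\n")
rows = read_skipping_first_rows(raw, n_skip=2)
# rows == [["a", "1"], ["b", "2"]]
```

The same skip can also be done visually in a Prepare recipe by deleting the first rows, but doing it at read time avoids materializing them at all.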
-
Dataset running
Hello. I have a question about building data. I am following a tutorial: I ran about half of the datasets and recipes in the overall flow, shut down the server, and planned to continue the next day. The flow itself is still intact, but sometimes the data in the datasets has disappeared. When I shut down the server and reconnect, do I have to rebuild everything?
-
OData plugin brings back nulls as None
Hi – we are using the OData plugin (protocol version 3.0) to connect to a dataset that our vendor maintains; version 4.0 does not work. When the dataset is viewed in Power BI, fields show nulls, but when I bring the dataset into Dataiku all null fields are changed to the word None. Is this by design, or is there a way to change…
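If the connector is stringifying nulls as the literal text "None", one post-hoc workaround is to map that sentinel back to a real null in a Python recipe before writing the output. A minimal sketch, assuming records arrive as dicts of strings (the helper name is hypothetical):

```python
def restore_nulls(rows, sentinel="None"):
    """Replace the string sentinel (e.g. "None") with a real null in each record."""
    return [
        {k: (None if v == sentinel else v) for k, v in row.items()}
        for row in rows
    ]

# Toy records imitating what the connector returns.
records = [{"id": "1", "city": "None"}, {"id": "2", "city": "Paris"}]
cleaned = restore_nulls(records)
# cleaned[0]["city"] is None; cleaned[1]["city"] == "Paris"
```

With pandas, `df.replace("None", None)` achieves the same cleanup in one call; either way, downstream recipes then see genuine empty cells instead of the text "None".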
-
"Finger Printing" files in a Managed Folder
I have a managed folder on an SFTP Dataiku connection with lots of files (hundreds of thousands to millions of files). I'm able to open the connection and get basic file details. #... input_folder = dataiku.Folder("AAAAAAAA") paths = input_folder.list_paths_in_partition() #... path_details = [] for path in paths:…
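For fingerprinting, each path in a managed folder can be opened as a stream with `Folder.get_download_stream(path)` and hashed incrementally, so even huge files never sit fully in memory. A minimal sketch of the hashing part, shown here on an in-memory stream since it is independent of Dataiku:

```python
import hashlib
import io

def fingerprint_stream(stream, algo="md5", chunk_size=1 << 20):
    """Hash a binary file-like stream in 1 MiB chunks; returns the hex digest."""
    h = hashlib.new(algo)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

# Stand-in for folder.get_download_stream(path) on a real file.
digest = fingerprint_stream(io.BytesIO(b"hello world"))
```

At the scale of millions of files, hashing every byte is expensive; a cheaper first-pass fingerprint is the (size, modification time) pair already available from the folder's path details, falling back to a content hash only on collisions.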
-
Python recipe with Partitioned input and output
Hi, greetings... I am trying to connect a Python recipe with multiple inputs, a couple of which have to be partitioned so that I can transform part of the data and then write it to the output dataset. Things I need to do: - Partition the input dataset on a dimension dynamically. (The list of partition…
-
SFTP Site with .Zip files (with more than just data in the .zip file)
I'm receiving data from an external partner. They have set up an SFTP file server for me to get the data. They .zip the .tsv files that I'm expecting. However, they also add other documents to the .zip file that are not the data I need for my process. Basically, a data dictionary for the data they are providing. From this…
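One common pattern for this is a Python recipe that reads the archive from the input folder and keeps only the `.tsv` members, ignoring the data dictionary and any other extras. A minimal stdlib sketch (the helper name and member names are illustrative):

```python
import io
import zipfile

def extract_tsv_members(zip_bytes):
    """Return {member_name: bytes} for only the .tsv members of a zip archive."""
    out = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".tsv"):
                out[name] = zf.read(name)
    return out

# Build a toy archive containing the data plus an unwanted data dictionary.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.tsv", "a\tb\n1\t2\n")
    zf.writestr("dictionary.docx", "not data")
members = extract_tsv_members(buf.getvalue())
# members keeps only "data.tsv"
```

In Dataiku, the zip bytes would come from a managed folder on the SFTP connection, and the filtered `.tsv` payloads would be written to an output folder or parsed into a dataset.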
-
Copy data from a MySQL database to Vertica
Hi, let me explain my problem a little. The data I use comes from a MySQL database where I have read-only access. For my work, I use a Vertica database. The first operation is to copy the data from MySQL to Vertica. I simply use the DSS synchronization recipe. But the problem is that I have a database of several hundred…
-
Querying multiple databases in the same query with SQLExecutor2
Hi, I am trying to use SQLExecutor2 in Python to pull in a dataset from a query. The query uses multiple tables from different databases, as in the example below. Even though I am specifying the DB in the query, I am still getting the error 'invalid object name'. Is it possible to query multiple databases and output a dataframe?…
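On SQL Server, 'invalid object name' usually means the object is not visible from the connection's default database, so every cross-database table needs to be fully qualified as database.schema.table, and the connection's credentials must have access to both databases. A sketch of building such a query (the database and table names here are hypothetical):

```python
def qualified(db, schema, table):
    """Fully qualify a SQL Server object name as [db].[schema].[table]."""
    return f"[{db}].[{schema}].[{table}]"

# Hypothetical databases and tables, purely for illustration.
sql = (
    f"SELECT a.id, b.label "
    f"FROM {qualified('SalesDB', 'dbo', 'orders')} a "
    f"JOIN {qualified('RefDB', 'dbo', 'labels')} b ON a.id = b.id"
)
```

The resulting string would then be passed to `SQLExecutor2(connection=...).query_to_df(sql)`; the executor runs the statement on a single connection, so cross-database joins work only if that one server hosts (or links to) both databases.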
-
Reading a file with 40M+ records using PySpark
Hello, I'm trying to read a file with 40M+ records and around 70 columns. After reading the file, when I try to display the record count using the df.count() method, it takes a very long time to execute. Last time I checked, the statement had been running for 30+ minutes with no output. I'm very new to the Dataiku…
-
Run recipe
Hi Team, I have downloaded a file from AWS S3 and created a recipe (rename and add a new column), but I'm getting an error while executing the script: com.dataiku.dip.datasets.fs.HTTPDatasetHandler cannot be cast to com.dataiku.dip.datasets.fs.AbstractFSDatasetHandler. Logs may contain additional information. Additional technical…