
Append checkbox not showing on sync recipe

Solved!
aw30
Level 3

Hi - I am having trouble figuring out how to get the append checkbox to show up on a sync recipe. I created a flow in the past and was able to see it, but with a new flow I cannot. Here is what I am trying to accomplish.

I have a python script that calls an API and returns data. The data set is fairly large, so I wanted to make an initial call and then append each week's data to the existing data set.

The flow would have a python recipe -> resulting dataset -> sync -> history table
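Outside of DSS, the initial-load-then-weekly-append pattern described above can be sketched with plain Python (the file name and columns below are hypothetical stand-ins for the S3 history dataset):

import csv
import os

HISTORY_FILE = "planned_estimates_history.csv"  # hypothetical local stand-in for the S3 dataset
FIELDS = ["date", "estimate"]                   # hypothetical columns

def append_rows(rows, path=HISTORY_FILE):
    """Append rows to the history file, writing a header only on first creation."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()  # the initial load creates the file with a header
        writer.writerows(rows)

# Initial large load, then a weekly incremental append
append_rows([{"date": "2021-01-01", "estimate": 100}])
append_rows([{"date": "2021-01-08", "estimate": 120}])

In DSS, the sync recipe's append option plays the role of the "a" (append) file mode here: each run adds the new rows instead of overwriting the history.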

From what I remember, I initially performed the call for the large data set and named the history table planned_estimates_history. I then replaced the output of the sync recipe with a temporary dataset, annetemp, which left planned_estimates_history sitting unattached in the flow. Finally, I updated the output of the sync recipe to use the existing dataset planned_estimates_history. This is where I thought it would allow appending, since the dataset already existed.

The datasets sit on AWS S3.

The append checkbox does not show up though. What am I missing and thank you for your time and help with this!

1 Solution
MarcH
Dataiker

Hi,

It's possible for that append checkbox to not show up, but it's usually in cases with HDFS datasets. Could you please share a bit more info about your flow, perhaps attaching some screenshots of the flow and/or the sync recipe if possible? Also, which version of DSS are you using?

2 Replies
aw30
Level 3
Author

Thank you for mentioning HDFS, as that helped me figure out how to get this to work. I initially need to create a history data set and then download and re-upload it into the flow. I can then adjust the python recipe to pull data forward from what is in the history table and sync the results to the uploaded data set.

The data set is still on HDFS, but Dataiku allows you to sync/append if the initial data set is uploaded, rather than created within a flow and then reused.

Just in case anyone needs to do this:

1) Create a python recipe with an output dataset.

2) Download the data set - I added _history to the name of the .csv file.

3) Upload the history data set back into the flow.

4) Adjust the python code to pull the date range I want - the dates are needed as strings:

import datetime

# Set the window back 7 days to pull the next batch of data
current_date = datetime.date.today()
start_date = current_date - datetime.timedelta(days=8)
end_date = current_date - datetime.timedelta(days=2)

# Convert to strings for the API call
start_date = start_date.strftime("%Y-%m-%d")
end_date = end_date.strftime("%Y-%m-%d")

5) Add a sync step that syncs the output data set to the history data set I just uploaded, and check the box to append to the data.
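For reference, the window in step 4 covers exactly seven calendar days, which can be checked directly (the fixed "today" below is illustrative, standing in for datetime.date.today()):

import datetime

# Use a fixed "today" so the check is reproducible
current_date = datetime.date(2021, 3, 15)
start_date = current_date - datetime.timedelta(days=8)
end_date = current_date - datetime.timedelta(days=2)

# Inclusive span: end - start = 6 days, i.e. 7 calendar days in total
span_days = (end_date - start_date).days + 1

Running each week with these offsets therefore picks up one full week of data, ending two days before the run date.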

 

 

 
