We're excited to announce that we're launching the second installment of Dataiku Product Days Register Now

Append Data without drop table with python

Solved!
tamvap
Level 2
Append Data without drop table with python

Hello,

I am building a simple web app with some input fields in DSS in order to insert a single row of data to a SQL table of which i have created the Dataset in DSS. I have tried successfully to append the data using the write data frame function and pandas data frame. But this function is dropping the SQL table before loading the data and then it creates the table again.

dataset = Dataset("DATASET_NAME")
mydataset = Dataset("DATASET_NAME").get_dataframe()
mydataset = mydataset.append(new_line)
with dataset.get_writer() as writer:
    writer.write_dataframe(mydataset,infer_schema=False,dropAndCreate=False)  

I have tried also the  parameters of infer schema =False and dropAndCreate False but the i got the below errors:

TypeError: write dataframe() got an unexpected keyword argument 'dropAndCreate' 
TypeError: write dataframe() got an unexpected keyword argument 'infer_schema'

Is there a way to just append new row of data using python without dropping the SQL Table ?

Thanks

 

 

 

0 Kudos
1 Solution
SarinaS
Dataiker
Dataiker

Hi @tamvap ,

You can use the following to use write_dataframe() to write in "append mode" from a webapp: 

    mydataset = dataiku.Dataset("MY_DATASET")
    # allows for appending
    mydataset.spec_item["appendMode"] = True
    with mydataset.get_writer() as writer:
        append_df = pd.DataFrame(data=[{'name': 'somename',  'age': 11}])
        writer.write_dataframe(append_df)

This should prevent the TRUNCATE table commands that happen otherwise.  Let me know if you have any questions about this. 

Thanks,
Sarina

View solution in original post

4 Replies
SarinaS
Dataiker
Dataiker

Hi @tamvap ,

You can use the following to use write_dataframe() to write in "append mode" from a webapp: 

    mydataset = dataiku.Dataset("MY_DATASET")
    # allows for appending
    mydataset.spec_item["appendMode"] = True
    with mydataset.get_writer() as writer:
        append_df = pd.DataFrame(data=[{'name': 'somename',  'age': 11}])
        writer.write_dataframe(append_df)

This should prevent the TRUNCATE table commands that happen otherwise.  Let me know if you have any questions about this. 

Thanks,
Sarina

View solution in original post

tamvap
Level 2
Author

Hello @SarinaS ,

Thanks, for your answer . Seems to work fine !

Tamvap

 

0 Kudos
vpozath
Level 1

We have similar issue when using DATAIKU with R API dkuWriteDataset by default does not append data. It always overwrite the data. How can I force append the data when calling the API outside dataiku flow. In reviewing the source code for the API looks like the code is doing the following:

  1. default the append mode to false. Code reference (appendMode <- FALSE)
  2. Get flow spec using function(getDkuFlowSpec())
  3. Set appendMode based on flowspec appendmode configuration.

Would appreciate if you could let me know how we can configure dataset with appendMode so that when we use the API it always write with append option.

Following is source code for the  R API dkuWriteDataset that I am referencing above

 

appendMode <- FALSE
parsedRSpec <- getDkuFlowSpec()
print(parsedRSpec[["out"]])
if (is.null(parsedRSpec)) {
name <- dku__resolve_smart_name(name)
tmpFilePath <- tempfile(pattern = "r-data-")
currentActivityId <- ""
streamFile <- TRUE
}
else {
sharedTmpDir <- parsedRSpec$tmpDir
if (is.na(file.info(sharedTmpDir)$isdir) || !file.info(sharedTmpDir)$isdir) {
name <- dku__resolve_smart_name(name)
tmpFilePath <- tempfile(pattern = "r-data-")
streamFile <- TRUE
}
else {
tmpFilePath <- tempfile(tmpdir = sharedTmpDir)
streamFile <- FALSE
}
if (partition != "") {
stop("You cannot explicitly set partitions when running within Dataiku Flow")
}
found = FALSE
for (out in parsedRSpec[["out"]]) {
if (out[["fullName"]] == name || out[["smartName"]] ==
name) {
if (!("partition" %in% names(out))) {
partition = ""
}
else if (is.null(out[["partition"]])) {
partition = ""
}
else {
partition = out[["partition"]]
}
name = out[["fullName"]]
if (!(is.null(out[["appendMode"]]))) {
appendMode <- out[["appendMode"]]
}
found = TRUE
}
}

0 Kudos
SarinaS
Dataiker
Dataiker

Hi @vpozath ,

Thanks for your comment and description of your setup! 

There isn't an equivalent for this option in R.  If you want to stick with R, I think your best option is to read in your dataframe, append your row to the dataframe in your R code, and then write back the full updated dataframe with the new data appended.  For a brief example:

 

library(dataiku)

# dataset with some data
my_dataset <- dkuReadDataset("my_dataset")

# add your row 
row <- data.frame("John", 40)
names(row)<-c("name","amount")
my_dataset <- rbind(my_dataset, row)

# write your updated dataframe 
dkuWriteDataset(my_dataset, "my_dataset")

 


Thanks,
Sarina

0 Kudos
A banner prompting to get Dataiku DSS