Append data without dropping the table with Python
Hello,
I am building a simple web app in DSS with some input fields, in order to insert a single row of data into a SQL table for which I have created a Dataset in DSS. I have successfully appended the data using the write_dataframe() function and a pandas DataFrame, but this function drops the SQL table before loading the data and then creates the table again.
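from dataiku import Dataset

dataset = Dataset("DATASET_NAME")
# read the current table, add the new row, then write everything back
mydataset = Dataset("DATASET_NAME").get_dataframe()
mydataset = mydataset.append(new_line)
with dataset.get_writer() as writer:
    # these two keyword arguments are what raise the TypeErrors below
    writer.write_dataframe(mydataset, infer_schema=False, dropAndCreate=False)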
I have also tried passing the parameters infer_schema=False and dropAndCreate=False, but then I got the errors below:
TypeError: write_dataframe() got an unexpected keyword argument 'dropAndCreate'
TypeError: write_dataframe() got an unexpected keyword argument 'infer_schema'
Is there a way to append just a new row of data using Python without dropping the SQL table?
Thanks
Best Answer
Sarina (Dataiker)
Hi @tamvap,
You can use the following code to call write_dataframe() in "append mode" from a webapp:
import dataiku
import pandas as pd

mydataset = dataiku.Dataset("MY_DATASET")
# allows for appending
mydataset.spec_item["appendMode"] = True
with mydataset.get_writer() as writer:
    append_df = pd.DataFrame(data=[{'name': 'somename', 'age': 11}])
    writer.write_dataframe(append_df)
This should prevent the TRUNCATE table commands that happen otherwise. Let me know if you have any questions about this.
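To connect this back to the web app in the question, here is a minimal sketch that splits the work into two steps: turning the input-field values into a one-row DataFrame, and appending it with the appendMode flag shown above. It assumes the web app provides a name and an age (the columns from the example above); build_row and append_row are hypothetical helper names, and the dataiku package is only importable inside DSS, so it is imported lazily in append_row.

```python
import pandas as pd


def build_row(name, age):
    """Turn two (hypothetical) web-app input values into a one-row DataFrame."""
    # Form inputs often arrive as strings, so the age is cast to int here.
    return pd.DataFrame(data=[{"name": name, "age": int(age)}])


def append_row(dataset_name, row_df):
    """Append row_df to a DSS dataset using the appendMode flag from the answer above."""
    import dataiku  # only available inside DSS

    ds = dataiku.Dataset(dataset_name)
    ds.spec_item["appendMode"] = True  # append instead of truncating the table
    with ds.get_writer() as writer:
        writer.write_dataframe(row_df)
```

Inside a web-app callback, the call would then be something like append_row("MY_DATASET", build_row("somename", "11")). Keeping build_row separate makes the row construction testable outside DSS.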
Thanks,
Sarina
Answers
vpozath
We have a similar issue when using Dataiku's R API: dkuWriteDataset does not append data by default; it always overwrites the data. How can I force it to append when calling the API outside the Dataiku Flow? From reviewing the API's source code, it looks like the code does the following:
- Defaults the append mode to FALSE (code reference: appendMode <- FALSE).
- Gets the flow spec using getDkuFlowSpec().
- Sets appendMode based on the flow spec's appendMode configuration.
I would appreciate it if you could let me know how to configure the dataset with appendMode so that the API always writes in append mode.
The following is the source code of the R API dkuWriteDataset that I am referencing above:
appendMode <- FALSE
parsedRSpec <- getDkuFlowSpec()
print(parsedRSpec[["out"]])
if (is.null(parsedRSpec)) {
    name <- dku__resolve_smart_name(name)
    tmpFilePath <- tempfile(pattern = "r-data-")
    currentActivityId <- ""
    streamFile <- TRUE
}
else {
    sharedTmpDir <- parsedRSpec$tmpDir
    if (is.na(file.info(sharedTmpDir)$isdir) || !file.info(sharedTmpDir)$isdir) {
        name <- dku__resolve_smart_name(name)
        tmpFilePath <- tempfile(pattern = "r-data-")
        streamFile <- TRUE
    }
    else {
        tmpFilePath <- tempfile(tmpdir = sharedTmpDir)
        streamFile <- FALSE
    }
    if (partition != "") {
        stop("You cannot explicitly set partitions when running within Dataiku Flow")
    }
    found = FALSE
    for (out in parsedRSpec[["out"]]) {
        if (out[["fullName"]] == name || out[["smartName"]] ==
            name) {
            if (!("partition" %in% names(out))) {
                partition = ""
            }
            else if (is.null(out[["partition"]])) {
                partition = ""
            }
            else {
                partition = out[["partition"]]
            }
            name = out[["fullName"]]
            if (!(is.null(out[["appendMode"]]))) {
                appendMode <- out[["appendMode"]]
            }
            found = TRUE
        }
    }
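Transcribed into Python for readability, the appendMode part of the excerpt above reduces to the lookup below. The dictionary layout of flow_spec is an assumption modeled on the fields the R code reads (out, fullName, smartName, appendMode), and resolve_append_mode is a hypothetical name. In the excerpt shown, appendMode can only become TRUE inside the branch where a flow spec exists, which is consistent with calls made outside the Flow always overwriting.

```python
def resolve_append_mode(flow_spec, name):
    """Python sketch of the appendMode lookup in the dkuWriteDataset excerpt.

    flow_spec: a dict shaped like the parsed flow spec (assumed layout),
    or None when running outside a Flow recipe.
    """
    append_mode = False  # appendMode <- FALSE
    if flow_spec is None:
        # Outside the Flow there is no output entry to read appendMode from.
        return append_mode
    for out in flow_spec.get("out", []):
        if out.get("fullName") == name or out.get("smartName") == name:
            # Only override the default when the output declares appendMode.
            if out.get("appendMode") is not None:
                append_mode = out["appendMode"]
    return append_mode
```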
Sarina (Dataiker)
Hi @vpozath,
Thanks for your comment and the description of your setup!
There isn't an equivalent of this option in R. If you want to stick with R, I think your best option is to read in your dataframe, append your row to it in your R code, and then write back the full updated dataframe with the new data appended. Here is a brief example:
library(dataiku)

# dataset with some data
my_dataset <- dkuReadDataset("my_dataset")

# add your row
row <- data.frame("John", 40)
names(row) <- c("name", "amount")
my_dataset <- rbind(my_dataset, row)

# write your updated dataframe
dkuWriteDataset(my_dataset, "my_dataset")
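For the Python side of this thread, the same read, append and write-back pattern looks roughly like this in pandas. It uses pd.concat because DataFrame.append, which the question's code relies on, was deprecated in pandas 1.4 and removed in pandas 2.0. The sample data is made up and append_row_in_memory is a hypothetical helper name; reading from and writing to DSS is left out so the sketch runs anywhere.

```python
import pandas as pd


def append_row_in_memory(existing, new_row):
    """Return existing with new_row appended (the pandas analogue of rbind above)."""
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating index 0
    return pd.concat([existing, new_row], ignore_index=True)


existing = pd.DataFrame({"name": ["Ann"], "amount": [30]})
row = pd.DataFrame({"name": ["John"], "amount": [40]})
updated = append_row_in_memory(existing, row)
print(updated)
```

Note that writing updated back rewrites the whole table, which is the behavior the original question wanted to avoid, so this approach mainly suits small tables.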
Thanks,
Sarina
Hello,
Opening up this old thread... I am totally new to Python in Dataiku. I tried the above solution, but I am getting an error:
Exception: An error occurred during dataset write (14I5hwThmP): IllegalArgumentException: Unsupported Append mode
Does the line below no longer work, and if so, what is the new method?
mydataset.spec_item["appendMode"] = True
Thanks!
Jordan