Append Data without drop table with python

tamvap Dataiku DSS Core Designer, Registered Posts: 10 ✭✭✭✭
edited July 16 in General Discussion

Hello,

I am building a simple web app in DSS with some input fields, in order to insert a single row of data into a SQL table for which I have created a Dataset in DSS. I have successfully appended the data using the write_dataframe function and a pandas DataFrame, but this function drops the SQL table before loading the data and then creates the table again.

dataset = Dataset("DATASET_NAME")
mydataset = Dataset("DATASET_NAME").get_dataframe()
mydataset = mydataset.append(new_line)
with dataset.get_writer() as writer:
    writer.write_dataframe(mydataset,infer_schema=False,dropAndCreate=False)  

I have also tried the parameters infer_schema=False and dropAndCreate=False, but then I got the errors below:

TypeError: write_dataframe() got an unexpected keyword argument 'dropAndCreate'
TypeError: write_dataframe() got an unexpected keyword argument 'infer_schema'

Is there a way to just append a new row of data using Python without dropping the SQL table?

Thanks
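
For illustration, at the SQL level appending a row is a plain INSERT, which leaves the table and its existing rows in place. The sketch below shows this with Python's built-in sqlite3 module and an in-memory database standing in for the real SQL connection. The table name my_table and the name/amount columns are hypothetical, and this is not the DSS API.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real SQL connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, amount INTEGER)")
conn.execute("INSERT INTO my_table VALUES (?, ?)", ("Alice", 10))


def append_row(connection, name, amount):
    """Append one row with a parameterized INSERT; the table is never dropped."""
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "INSERT INTO my_table (name, amount) VALUES (?, ?)",
            (name, amount),
        )


append_row(conn, "John", 40)

# The existing row is still there, followed by the appended one.
rows = conn.execute(
    "SELECT name, amount FROM my_table ORDER BY rowid"
).fetchall()
print(rows)
```

The existing row survives because nothing in this path issues a DROP TABLE or CREATE TABLE statement.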

Best Answer

Answers

  • tamvap
    tamvap Dataiku DSS Core Designer, Registered Posts: 10 ✭✭✭✭

    Hello @SarinaS,

    Thanks for your answer. It seems to work fine!

    Tamvap

  • vpozath
    vpozath Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 1 ✭✭✭

    We have a similar issue when using Dataiku with the R API: dkuWriteDataset does not append data by default. It always overwrites the data. How can I force it to append the data when calling the API outside the Dataiku flow? Reviewing the source code of the API, it looks like the code does the following:

    1. Default the append mode to false (code reference: appendMode <- FALSE).
    2. Get the flow spec using the function getDkuFlowSpec().
    3. Set appendMode based on the flow spec's appendMode configuration.

    I would appreciate it if you could let me know how we can configure the dataset with appendMode so that the API always writes with the append option.

    The following is the source code of the R API dkuWriteDataset that I am referencing above:

    appendMode <- FALSE
    parsedRSpec <- getDkuFlowSpec()
    print(parsedRSpec[["out"]])
    if (is.null(parsedRSpec)) {
        name <- dku__resolve_smart_name(name)
        tmpFilePath <- tempfile(pattern = "r-data-")
        currentActivityId <- ""
        streamFile <- TRUE
    }
    else {
        sharedTmpDir <- parsedRSpec$tmpDir
        if (is.na(file.info(sharedTmpDir)$isdir) || !file.info(sharedTmpDir)$isdir) {
            name <- dku__resolve_smart_name(name)
            tmpFilePath <- tempfile(pattern = "r-data-")
            streamFile <- TRUE
        }
        else {
            tmpFilePath <- tempfile(tmpdir = sharedTmpDir)
            streamFile <- FALSE
        }
        if (partition != "") {
            stop("You cannot explicitly set partitions when running within Dataiku Flow")
        }
        found = FALSE
        for (out in parsedRSpec[["out"]]) {
            if (out[["fullName"]] == name || out[["smartName"]] == name) {
                if (!("partition" %in% names(out))) {
                    partition = ""
                }
                else if (is.null(out[["partition"]])) {
                    partition = ""
                }
                else {
                    partition = out[["partition"]]
                }
                name = out[["fullName"]]
                if (!(is.null(out[["appendMode"]]))) {
                    appendMode <- out[["appendMode"]]
                }
                found = TRUE
            }
        }

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
    edited July 17

    Hi @vpozath,

    Thanks for your comment and description of your setup!

    There isn't an equivalent for this option in R. If you want to stick with R, I think your best option is to read in your dataframe, append your row to the dataframe in your R code, and then write back the full updated dataframe with the new data appended. For a brief example:

    library(dataiku)
    
    # dataset with some data
    my_dataset <- dkuReadDataset("my_dataset")
    
    # add your row 
    row <- data.frame("John", 40)
    names(row)<-c("name","amount")
    my_dataset <- rbind(my_dataset, row)
    
    # write your updated dataframe 
    dkuWriteDataset(my_dataset, "my_dataset")


    Thanks,
    Sarina

  • jordan_sklansky
    jordan_sklansky Registered Posts: 6
    edited July 17

    Hello,

    Reopening this old thread. I am totally new to Python in Dataiku, but I tried the above solution and I am getting an error:

    Exception: An error occurred during dataset write (14I5hwThmP): IllegalArgumentException: Unsupported Append mode

    Does the line below no longer work, and if so, what is the new method?

    mydataset.spec_item["appendMode"] = True

    Thanks!

    Jordan
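
For context, the line quoted above would typically be used as in the following minimal sketch. It assumes that the dataset object exposes a spec_item dictionary and a get_writer() context manager, as the snippets in this thread suggest for dataiku.Dataset, and that the flag is set before the writer is opened. The helper name append_dataframe is hypothetical. Whether append mode is accepted appears to depend on the dataset type and connection, as the error above suggests.

```python
def append_dataframe(dataset, new_rows):
    """Write only new_rows to dataset, asking the writer to append.

    Assumption: `dataset` behaves like dataiku.Dataset, i.e. it has a
    `spec_item` dict and a `get_writer()` context manager.
    """
    # Flag quoted in this thread; set it before opening the writer.
    dataset.spec_item["appendMode"] = True
    with dataset.get_writer() as writer:
        # Pass only the rows to add, not the full existing table.
        writer.write_dataframe(new_rows)


# Hypothetical usage inside DSS (requires the `dataiku` package and pandas):
#   import dataiku
#   import pandas as pd
#   new_rows = pd.DataFrame([{"name": "John", "amount": 40}])
#   append_dataframe(dataiku.Dataset("DATASET_NAME"), new_rows)
```

Passing only the new rows avoids reading the whole table back into memory first, which the read-append-rewrite approach requires.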
