Survival Curve Kaplan Meier

DavidALI
DavidALI Registered Posts: 18 ✭✭✭✭✭

Hi,

I am a doctor using Dataiku to analyse my patient database.

I have one column containing the diagnosis date and another containing the date of death. If the death column is empty, the patient is still alive at the current time.

I am looking for a way to generate the survival curve, for example with a Kaplan-Meier estimator, without leaving DSS.

I searched the discussions but have not found anything so far.

I have no Python programming skills.

Thank you for your help.

Best Answer

  • aabraham
    aabraham Dataiker, Registered Posts: 4 Dataiker
    Answer ✓

    Hello,

    I have attached a plugin I created. It is a very basic plugin that runs the Kaplan-Meier estimator on data. I built it on my own initiative, in my free time, so it is in no way related to or supported by Dataiku. There is no documentation because the usage is very straightforward: you must provide an integer duration column (duration of observation) and a boolean column indicating whether the event (usually death) happened. A condition column also allows running the estimator on several conditions in a single dataset.

    Let me know if you use it so that I can warn you when the official survival analysis plugin is out!
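
The input format described in this answer (an integer duration plus a boolean event flag) can be derived from the two date columns in the original question. Below is a minimal sketch using only the Python standard library. The function name `to_survival_row`, the sample dates, and the `cutoff` analysis date are illustrative assumptions, not part of the plugin; it also assumes every recorded death happened on or before the cutoff.

```python
from datetime import date


def to_survival_row(diagnosis, death, cutoff):
    """Return (duration_in_days, event_observed) for one patient.

    diagnosis: date of diagnosis
    death:     date of death, or None if the patient is still alive
    cutoff:    analysis date used to censor patients who are still alive
    """
    # Patients without a death date are censored at the cutoff date.
    end = death if death is not None else cutoff
    duration = (end - diagnosis).days
    if duration < 0:
        raise ValueError("end date is earlier than diagnosis date")
    return duration, death is not None


# Hypothetical patients: (diagnosis date, death date or None)
patients = [
    (date(2020, 1, 10), date(2020, 7, 8)),
    (date(2020, 3, 1), None),
]
cutoff = date(2021, 1, 1)  # assumed analysis date

rows = [to_survival_row(dx, dd, cutoff) for dx, dd in patients]
print(rows)  # [(180, True), (306, False)]
```

In DSS itself, the same two columns could presumably be produced without code, for example with date-difference and "is empty" steps in a Prepare recipe.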

Answers

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron

    @DavidALI

    I do not know of a built-in way to do Survival Analysis.

    If I were doing this, I'd likely create an R notebook and use an R library that does survival analysis.

    There are a number of posts online about doing survival analysis with R:

    https://www.datacamp.com/community/tutorials/survival-analysis-R

    http://www.sthda.com/english/wiki/survival-analysis-basics

    --Tom

  • tgb417

    @DavidALI

    How’s it going with your survival analysis project?

    I would love to hear more about your progress.

  • DavidALI

    Hello,

    Someone in the Dataiku community, @aabraham, offered to turn this function into a plugin.

    We are currently defining the requirements.

    As soon as the solution is available, I will post it here and mark this topic as resolved.

    See you.

  • tgb417

    Excellent, I would like to hear more about your plugin as the details become available. This may be helpful to me in a non-profit membership churn scenario as well.

  • aabraham

    Tom, I sent you a private message regarding this plugin ;).

    I had some custom recipes to fit a Kaplan-Meier model in DSS, and I turned them into a plugin adapted to David's use case. It relies on the Python lifelines package and only does KM, but it could use any other method provided by lifelines. This plugin is very rough: it does not handle errors and it is not supported by Dataiku in any way. If you are interested, let me know and I can send it to you.

  • DavidALI

    The plugin works like a charm!

    Thank you.

  • Srkanoje
    Srkanoje Registered Posts: 32 ✭✭✭✭

    @aabraham
    Do we have any documentation or a link for this plugin, as we have for other plugins?

  • aabraham

    @Srkanoje
    I have written some documentation and a sample project. I can send them to you by email if needed, or if you give me some time I can put them on GitHub.

  • gnanaraja
    gnanaraja Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 5 ✭✭✭

    Hey, can you let me know if the official plugin for survival analysis is out? I couldn't find it in the plugin store.

  • tgb417

    @gnanaraja,

    As far as I know, there was never an “official” plugin created.

    Above in this thread you will find a .zip file. This is the plugin.

    These .zip files can be imported directly as a plugin in Dataiku DSS.

    I don’t know if this plugin will work with DSS v11. It may have been built for DSS v8.

    I suspect it is still worth a try. I’d import the plugin file above into a non-production DSS instance and test it there.

  • tgb417

    @aabraham

    I would like to get a copy of any documentation if it is available.

  • gnanaraja

    @tgb417
    Thanks for your response. I will test this and keep you posted here. Allow me some time.

  • DavidALI

    Hi,

    In case it is of any help, here is some documentation @aabraham sent me at the time. Note that in my use case, I was analyzing the survival of a cohort of patients during a clinical study.

    1 - General workflow (in attachment)

    2 - Survival function parameters (in attachment)

    Usually, KM is used to estimate the survival time in a study. So, t0 being the start of the study, there are usually three cases:
    - the patient dies at time t1: duration is t1 - t0, Event occurred = True.
    - the patient has not died by the end of the study at t2: duration is t2 - t0, Event occurred = False (this is a censored case).
    - the patient dropped out of the study at t3 < t2: it can still be taken into account, with duration t3 - t0 and Event occurred = False.
    Duration, Death observed, and Condition observed are 3 columns present in the input dataset. It is required that:
    - Duration is the number of units of time since the last treatment. It has no fixed unit, so you can use days, months, or whatever you like. Duration must be an int.
    - Death observed is a boolean: True if the patient is dead, False otherwise.
    - Condition observed is the condition. The recipe uses it to split its computation and generate one survival function per condition. For example, if the Condition column contains distinct diagnoses, the plugin will generate one survival curve for every diagnosis.
    Then you can decide to use the automated timeline generation of lifelines, or specify your own.

    3 - The recipe generates a dataset and a folder. The folder contains plots generated using the default plotting function of lifelines. It looks like this (survival_curves in attachment).

    Hope this helps,
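
To illustrate what "one survival function per condition" means, here is a rough pure-Python sketch of the Kaplan-Meier product-limit computation, grouped by condition. It is not the plugin's code (the plugin relies on lifelines); the function names and the sample rows are hypothetical.

```python
from collections import defaultdict


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time where an event occurred."""
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored rows both leave the risk set after time t.
        at_risk -= leaving
    return curve


def kaplan_meier_by_condition(rows):
    """rows: iterable of (duration, death_observed, condition)."""
    groups = defaultdict(lambda: ([], []))
    for duration, observed, condition in rows:
        groups[condition][0].append(duration)
        groups[condition][1].append(observed)
    return {c: kaplan_meier(d, e) for c, (d, e) in groups.items()}


# Hypothetical input: duration in months, death observed, diagnosis
rows = [
    (2, True, "A"), (4, False, "A"), (4, True, "A"), (6, True, "A"),
    (3, True, "B"), (5, False, "B"),
]
curves = kaplan_meier_by_condition(rows)
for condition, curve in curves.items():
    print(condition, [(t, round(s, 3)) for t, s in curve])
```

Censored rows (death observed = False) never lower the curve themselves; they only shrink the number of patients at risk for later time points.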

  • gnanaraja

    Sorry, I was not able to test the plugin. I am not allowed to test unofficial plugins on my office instance, and with my trial license I am unable to upload the plugin zip file. @tgb417

  • gnanaraja

    Thank you so much. I will check the possibility of installing the plugin. @DavidALI

  • tgb417

    It is not clear to me why you would have problems with a trial license and the plugin. I've used trial licenses with various plugins when the software is installed on my local workstation. If you are using an online trial license, then yes, as far as I know you will not be able to upload this plugin.

  • tgb417

    @DavidALI,

    Thanks for a bit of clarity about the input data format of a typical input dataset.

    My guess is that each row in the dataset represents a participant enrolled in a trial. Not multiple rows for the same participant.

    One of the variables says whether the event occurred. (In medical cases this is often death, ergo the name "Survival" Analysis.)

    We need another column that is a calculated duration of survival at the time of analysis or end of study. What if the study lasted 48 months, but a participant was registered in the 12th month? Do we put in 36 months for that patient?

    What if the treatment has multiple variables? Can we use multiple variables/features for the treatment? Or do we have to reduce all of the variables to a single categorical column with all of the combinations?

    Am I understanding correctly that each graph shows, for a given treatment (the one whose graph we are looking at), the likelihood of the event occurring over time, given the right-censored nature of the data, along with an uncertainty interval?

    Thanks for any further insights you can share.

  • gnanaraja

    Hi Tom @tgb417,

    Yes, I was trying the plugin on an online trial license. I will try on my local machine.

    Thank you

  • tgb417

    @gnanaraja,

    The installation and use options are a bit hidden.

    In Dataiku version 11.x:

    Go to the waffle menu in the upper right-hand corner of the Dataiku page.

    Choose Plugins.

    In Plugins, choose the Add Plugin button in the upper right-hand corner of the screen.

    You will get a menu. In the menu, choose Upload. You will then be asked to choose the plugin's .zip file from your local computer.

    If you can restart DSS, do so.

    Go back to a flow with the data you wish to conduct survival analysis on.

    In the flow, click on a dataset, then choose recipe at the top of your flow. You should see the survival analysis plugin in that pull-down menu.

  • DavidALI

    Hello,

    Indeed, you have it all correct.

    One row is one individual.

    The variable "Death observed" says whether the event occurred. In my use case it is death, but it could be disconnection from a website, for example.

    The variable "Duration" is the duration between 2 events; in my case it is the duration between diagnosis and death. Every patient has a different date of diagnosis and death, but the variable of interest is the duration between these 2 events. In another use case it could be the duration between connection to and disconnection from a website.

    With the plugin provided, only one variable may be taken into account in the analysis.

    The graph shows the likelihood that the event does NOT occur over time, i.e. the probability of being alive at each time after diagnosis (or the probability of still being connected to a website, for example). The blue area is the uncertainty interval.

    More information about survival function

    And Kaplan-Meier estimator

    I believe this plugin was made with the lifelines module for Python.

    Hope this helps
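
For reference, the curve described in this post is the standard Kaplan-Meier (product-limit) estimate of the survival function. The notation below is the usual textbook one and is not taken from the plugin: t_i are the distinct times at which events were observed, d_i is the number of events at t_i, and n_i is the number of individuals still at risk just before t_i.

```latex
\hat{S}(t) = \prod_{i \,:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)

% Small worked example (hypothetical data): 4 patients, one death at t = 2,
% then one death and one censored patient at t = 4.
\hat{S}(2) = 1 - \frac{1}{4} = \frac{3}{4},
\qquad
\hat{S}(4) = \frac{3}{4} \left( 1 - \frac{1}{3} \right) = \frac{1}{2}
```

Censored individuals contribute to n_i up to the time they leave the study but never to d_i, which is how right-censored rows are taken into account.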

  • tgb417

    @DavidALI

    Thank you for this.

    I noticed that your first link "survival function" is not working.

    It looks like we are barely scratching the surface of what the lifelines package can do with the current plugin.

    I'm not clear how easy it is to give lifelines a dataset that does not align well with the model's assumptions, in such a way that we commit mathematical "malpractice" (pun intended).

    --Tom
