Calculating Median

dhyadav79 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Registered Posts: 21 ✭✭✭✭


Could you please explain how to calculate the median in Dataiku, with an example?



  • fchataigner2 Dataiker Posts: 355

    The simplest approach is to use a SQL database (or Spark): load the data into a table and use the database's built-in median function (most databases have one). If you don't have access to a SQL database or to Spark, you can compute it in Python with median() in Pandas, provided the data is small enough to fit into memory. If the data is too large for memory and SQL is not an option, use a Window recipe to compute a rank and a count over a window ordered by the column whose median you want, then filter to keep the first row at or beyond the halfway point (P50). That row's value should be the median.
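
To make the Pandas option and the rank-and-count idea concrete, here is a minimal sketch. The sample data and the column name `amount` are made up for illustration, and the second part only emulates in Pandas what the Window recipe plus filter would do; it is not the recipe itself.

```python
import pandas as pd

# Hypothetical sample data; "amount" is an illustrative column name.
df = pd.DataFrame({"amount": [12, 7, 3, 25, 18, 9]})

# Option 1: Pandas built-in median.
# For an even number of rows it averages the two middle values.
median_builtin = df["amount"].median()

# Option 2: emulate the rank-and-count approach.
ranked = df.sort_values("amount").reset_index(drop=True)
ranked["rank"] = ranked.index + 1   # 1-based rank in sort order
ranked["count"] = len(ranked)       # total number of rows

# Keep the first row whose rank reaches half the count (P50).
at_or_past_p50 = ranked[ranked["rank"] >= ranked["count"] / 2]
median_by_rank = at_or_past_p50["amount"].iloc[0]

print(median_builtin, median_by_rank)
```

With an odd number of rows both options agree. With an even number of rows, as here, the rank-based version returns one of the two middle values instead of their average, which may or may not matter for your use case. Inside a DSS Python recipe the DataFrame would typically come from `dataiku.Dataset(...).get_dataframe()` rather than being built by hand.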
