Add support for Pandas 2.0

Turribeach
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,876 Neuron

Comments

  • crunis
    crunis Registered Posts: 8 ✭✭✭

    It also allows a nullable integer data type. It is really annoying when my integer field becomes float only because there's a single NA.
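To illustrate the behaviour described above, here is a minimal sketch in plain pandas (nothing Dataiku-specific). The series names are made up for the example. Note that the opt-in `Int64` extension dtype already exists in later pandas 1.x releases; pandas 2.x mainly makes nullable dtypes easier to request across readers.

```python
import pandas as pd

# Default inference: a single missing value forces the integers to float64.
with_na_default = pd.Series([1, 2, None])

# Opting in to the nullable extension dtype keeps the values as integers
# and stores the gap as pd.NA instead of NaN.
with_na_nullable = pd.Series([1, 2, None], dtype="Int64")

print(with_na_default.dtype)
print(with_na_nullable.dtype)
```

The capital "I" in `"Int64"` matters: lowercase `"int64"` is the plain NumPy dtype, which cannot hold missing values.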

  • AshleyW
    AshleyW Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 161 Dataiker

    Thanks for your idea, @Turribeach. Your idea meets the criteria for submission; we'll reach out should we require more information.

    If you’re reading this post and think [add more optional details] would be a great capability to add to DSS, be sure to kudos the original post! Feel free to leave a comment in the discussion about how this capability would help you or your team.

    Take care,
    Ashley

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,876 Neuron

    It is also worth pointing out that we are already seeing the impact of not having a recent pandas version supported. In our Dataiku v12 environments it takes more than 6 minutes to build any code environment with pandas. For Python 3.9 envs pip downloads pandas-1.1.5-cp39-cp39-manylinux1_x86_64.whl, but for Python 3.11 pip gets pandas-1.3.5.tar.gz. This is because there are no pre-compiled pandas 1.3.5 packages for Python 3.11! So on Python 3.11 we are basically downloading the pandas source and building it from scratch, including the C extensions. This is risky, since building from source is a much more complex and error-prone process than installing a pre-compiled package. Sooner or later this build will break, either due to OS or package dependencies.

    To add more complexity, our Python developers who work on our internal Python data libraries also struggle to create code envs using Python 3.11 and pandas 1.3.5, since there are no pre-compiled binaries for Windows. Building from source is even harder on Windows, as Windows doesn't come with the necessary software to do so.
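The difference between the two downloads mentioned above is visible in the file names alone. The helper below is a hypothetical, standard-library-only sketch (`describe_artifact` is not part of pip or Dataiku) that assumes the usual wheel naming scheme of name, version, optional build tag, Python tag, ABI tag and platform tag.

```python
def describe_artifact(filename):
    """Classify a pip download as a pre-built wheel or a source distribution."""
    if filename.endswith(".whl"):
        parts = filename[: -len(".whl")].split("-")
        # The last three fields are always python tag, ABI tag, platform tag.
        python_tag, abi_tag, platform_tag = parts[-3:]
        return {
            "kind": "wheel",
            "name": parts[0],
            "version": parts[1],
            "python_tag": python_tag,
            "abi_tag": abi_tag,
            "platform_tag": platform_tag,
            "needs_build": False,
        }
    for suffix in (".tar.gz", ".zip"):
        if filename.endswith(suffix):
            name, _, version = filename[: -len(suffix)].rpartition("-")
            # A source archive has to be compiled locally before installation.
            return {"kind": "sdist", "name": name, "version": version, "needs_build": True}
    raise ValueError(f"Unrecognised artifact name: {filename}")


for artifact in ("pandas-1.1.5-cp39-cp39-manylinux1_x86_64.whl", "pandas-1.3.5.tar.gz"):
    print(describe_artifact(artifact))
```

A `cp39` Python tag means the wheel was compiled for CPython 3.9. When no wheel matches the interpreter and platform, pip falls back to the `.tar.gz` and compiles it, which is the slow and fragile path described above.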

  • AsishM
    AsishM Registered Posts: 4

    There is a decent amount of SQLAlchemy functionality introduced in 2.0+ that we require, but pandas < 2 is not compatible with it at all, so any project that needs it has to skip using Dataiku. Pandas 1.3.5 is almost 2 years old by now.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,876 Neuron

    Great to see this product idea gathering some momentum. If you haven't done so, please raise a support ticket with Dataiku and express your desire to get pandas 2.0 or above supported. Then also speak to your Customer Success Manager / Account Manager and reiterate your request. Last time I checked with Dataiku, they said there was no interest from other customers in pandas 2.x support, so you need to make yourself heard if you want Dataiku to look at this request. Thanks!

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,876 Neuron

    No pandas 2.0 yet, but Dataiku v12.5 added support for pandas 1.4/1.5:

    https://doc.dataiku.com/dss/latest/release_notes/12.html#apis-and-coding-experience

    This basically completes support for all pandas versions from 0.23 up to, but not including, 2.x.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,876 Neuron

    Dataiku v13.0.0 added support for Pandas 2.x:

    https://doc.dataiku.com/dss/latest/release_notes/13.html#coding

    However, the PyArrow backend is still not supported, nor is Polars, so the benefits to the product are limited.
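For readers unfamiliar with the PyArrow backend, the sketch below shows what it looks like in plain pandas 2.x, outside Dataiku. It does not use any Dataiku API and says nothing about how DSS dataset readers or writers behave. The sample CSV and variable names are invented for illustration, and the guard assumes `pyarrow` may not be installed.

```python
import io

import pandas as pd

CSV_TEXT = "id,score\n1,10\n2,\n"

# Default NumPy-backed dtypes: the empty score turns the column into float64.
numpy_df = pd.read_csv(io.StringIO(CSV_TEXT))

try:
    import pyarrow  # noqa: F401  (imported only to check availability)

    # pandas 2.x: ask the reader to return Arrow-backed columns instead.
    arrow_df = pd.read_csv(io.StringIO(CSV_TEXT), dtype_backend="pyarrow")
except (ImportError, TypeError):
    # ImportError: pyarrow is missing or too old.
    # TypeError: pandas older than 2.0 does not accept dtype_backend.
    arrow_df = None

print(numpy_df.dtypes)
if arrow_df is not None:
    print(arrow_df.dtypes)
```

With the Arrow backend the `score` column stays an integer type with a null entry rather than being converted to float.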

  • WH
    WH Registered Posts: 17 ✭✭✭✭

    Would love to see Polars and PyArrow backend supported
