Add option to support non-pandas dataframes (e.g. polars) in Python recipes

Hi,

 

There are many alternatives to pandas. One newer and very fast option is Polars. It is written in Rust, so it is memory safe, and it runs in parallel by design. I use Polars in one of my recipes, but I have to convert the result to a pandas DataFrame before I can write the dataset.
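
A minimal sketch of that workaround, for illustration only. It assumes the `dataiku` package's `Dataset.write_with_schema` method for the write and Polars' `DataFrame.to_pandas` for the conversion (which typically needs `pyarrow` in the code environment). The helper name `write_polars_frame` and the dataset name `my_output` are hypothetical.

```python
def write_polars_frame(frame, output_dataset):
    """Write a Polars DataFrame to a DSS dataset via a pandas copy.

    `frame` is expected to be a polars.DataFrame and `output_dataset`
    a dataiku.Dataset handle. Both are only duck-typed here.
    """
    # This conversion is the extra step the request would like to avoid.
    pandas_frame = frame.to_pandas()
    # The DSS write call takes a pandas DataFrame.
    output_dataset.write_with_schema(pandas_frame)
    return pandas_frame


# Inside a Python recipe (hypothetical dataset name):
#   import dataiku
#   import polars as pl
#   result = pl.DataFrame({"id": [1, 2, 3]})
#   write_polars_frame(result, dataiku.Dataset("my_output"))
```

The recipe logic can stay in Polars, but every write still goes through a pandas DataFrame.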

 

thx

5 Comments
ClemenceB
Dataiker

Thanks @info-rchitect, this has been added to the backlog!

ClemenceB
Dataiker
 
Status changed to: In Backlog
 
MichaelG
Community Manager
 
Status changed to: In the Backlog
 
arbolja
Level 1

Hello! I am very interested in this as well. Is there any easy way of checking the life cycle of the updates in the backlog (for example, if they have a tracking number, if they have been implemented in some version...)? Thanks!

zloe
Level 3

Completely agree!

Currently, there is no benefit in using Polars, DuckDB, Ray or Dask, because retrieving a dataset always returns a pandas DataFrame object. Converting it to anything else defeats the purpose.

You can write the results directly to the database, but that is a workaround: you would still have to initialize the dataset object first, and your code environment would also need the database drivers and so on.

It would be nice to have different options for data retrieval and data writing. Polars seems like a good candidate.
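
The read side of the same round trip can be sketched as follows. This is an illustration under assumptions, not a supported API: it relies on `Dataset.get_dataframe` returning a pandas DataFrame and on `polars.from_pandas` for the conversion (which typically needs `pyarrow`). The names `read_dataset_as_polars` and `my_input` are hypothetical.

```python
def read_dataset_as_polars(input_dataset, from_pandas=None):
    """Load a DSS dataset through pandas, then convert it to Polars.

    `input_dataset` is expected to be a dataiku.Dataset handle.
    `from_pandas` defaults to polars.from_pandas and can be swapped out.
    """
    if from_pandas is None:
        # Imported lazily so the helper can be defined without Polars installed.
        import polars as pl
        from_pandas = pl.from_pandas
    # The whole dataset is materialized as a pandas DataFrame first.
    pandas_frame = input_dataset.get_dataframe()
    # Only then is it handed to Polars, which may add a further copy.
    return from_pandas(pandas_frame)


# Inside a Python recipe (hypothetical dataset name):
#   import dataiku
#   frame = read_dataset_as_polars(dataiku.Dataset("my_input"))
```

The data is fully loaded by pandas before Polars sees it, which is the overhead described above.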
