I recognize that this is a hyper-specific use case, and I'm unable to test broader suitability because I only use SQL, but perhaps this would be useful in other databasing systems as well:
I recently added date partitioning to a datasource that exists outside of Dataiku. Notably, it DOES have indexes specified on the date field used for partitioning. Before partitioning the datasource, it performed well, and I was using a project variable to control how far back a query could go, but I wanted to remove this, and use partitioning instead. My query performance took a serious hit, and here is why:
My table contains datetime data that does not have time zone information with it. It is understood to be in local time. The partitioning functions decorate the query with a where clause that looks (in part) like this:
TODATETIMEOFFSET( "<my_timestamp_field>" , -420) >= '2022-10-11 09:00:00'....
The call to TODATETIMEOFFSET requires a calculation on each row, and therefore causes SQL to perform a tablescan rather than using the appropriate indexes. In testing this query as written, it took about 20 seconds to pull the required data. I then rewrote the query like this:
"<my_timestamp_field>" >= TODATETIMEOFFSET( '2022-10-11 09:00:00' , 420)....
And the same data was returned in under 1 second.
Is it possible to re-write the time offsets such that the fixed value (in this case, the partitioning dimension) is the one that gets modified by the offset? This allows the calculation to be performed just once, and allows the DBMS to utilize the indexes.