Get filtered dataframe via Python API

emher
Level 3
I am using the Python API to read data from Dataiku, with something like this:

import dataiku

ds = dataiku.Dataset("name", project_key="project_key")
df = ds.get_dataframe(limit=5000)

As the code above shows, it is easy to load the first N rows of the data (in this case 5000 rows). However, what I need is to select a particular subset of the data based on a condition. In SQL, I would use a WHERE clause, i.e. something along the lines of

SELECT * FROM name WHERE alarm = 1

My question is: how can I do this with the Dataiku Python API? I am aware of the filtering options in the GUI (which are essentially what I need), but I couldn't find out how to apply the same filtering from the Python API.

1 Reply
emher
Level 3
Author

So far, the best solution I have found is to use the SQL bindings directly. It looks like this:

from dataiku import SQLExecutor2
executor = SQLExecutor2(connection="name_of_connection_in_dataiku")
df = executor.query_to_df("SELECT * FROM db_name.name WHERE alarm = 1;")

It works more-or-less as intended. Please let me know if there is a better/preferred approach 🙂
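
For a dataset that is not backed by a SQL connection, one possible alternative is to stream the dataset in chunks and filter each chunk with pandas, so that only the matching rows are kept in memory. This is a rough sketch, not an official recipe: `filter_chunks` is a hypothetical helper (not part of the Dataiku API), the column name `alarm` comes from the question above, and the commented usage assumes `Dataset.iter_dataframes(chunksize=...)` is available in your DSS version.

```python
import pandas as pd


def filter_chunks(chunks, column, value):
    """Keep only rows where `column` equals `value`, one chunk at a time.

    `chunks` is any iterable of pandas DataFrames. Hypothetical helper,
    not part of the Dataiku API.
    """
    kept = [chunk[chunk[column] == value] for chunk in chunks]
    if not kept:
        # No chunks at all: return an empty frame instead of failing in concat.
        return pd.DataFrame()
    return pd.concat(kept, ignore_index=True)


# Usage inside DSS (assumption: only runnable in a Dataiku environment):
# import dataiku
# ds = dataiku.Dataset("name", project_key="project_key")
# df = filter_chunks(ds.iter_dataframes(chunksize=10000), "alarm", 1)
```

Note that the filtering here happens on the Python side, so every row is still read from the dataset. When the data lives in a SQL database, the SQLExecutor2 approach above is likely faster because the WHERE clause runs in the database.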
