I am using the Python API to read data from dataiku, i.e. something like
import dataiku
ds = dataiku.Dataset("name", project_key="project_key")
df = ds.get_dataframe(limit=5000)
As the code above shows, it is easy to read the first N rows of the data (in this case 5000 rows). However, what I need is to select a particular subset of the data based on a condition. In SQL, I would use a WHERE clause, i.e. something along the lines of
SELECT * FROM name WHERE alarm = 1
My question is: how can I do this with the Dataiku Python API? I am aware of the filtering options in the GUI (which is essentially what I need), but I couldn't find out how to do the filtering from the Python API.
So far, the best solution I have found is to use the SQL bindings directly. It looks like this:
from dataiku import SQLExecutor2
executor = SQLExecutor2(connection="name_of_connection_in_dataiku")
df = executor.query_to_df("SELECT * FROM db_name.name WHERE alarm = 1;")
It works more or less as intended. Please let me know if there is a better/preferred approach.
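For comparison, here is a minimal sketch of a pandas-only alternative: read the data in chunks and keep only the matching rows of each chunk, so the full dataset never has to sit in memory at once. The helper name `filter_chunks` and the demo data are hypothetical; the column name `alarm` comes from the query above. The chunks here are ordinary in-memory DataFrames so that the snippet runs anywhere.

```python
import pandas as pd

def filter_chunks(chunks, column, value):
    """Keep only rows where `column == value`, one chunk at a time.

    `chunks` is any iterable of pandas DataFrames (hypothetical helper,
    not part of the Dataiku API).
    """
    kept = [chunk[chunk[column] == value] for chunk in chunks]
    if not kept:
        # No chunks at all: return an empty frame instead of failing in concat
        return pd.DataFrame()
    return pd.concat(kept, ignore_index=True)

# Stand-in chunks (made-up data) in place of a real dataset read
demo_chunks = [
    pd.DataFrame({"id": [1, 2, 3], "alarm": [0, 1, 0]}),
    pd.DataFrame({"id": [4, 5], "alarm": [1, 1]}),
]

# Rough equivalent of: SELECT * FROM name WHERE alarm = 1
df = filter_chunks(demo_chunks, "alarm", 1)
```

Inside Dataiku, the chunks could presumably come from something like `ds.iter_dataframes(chunksize=10000)` (worth checking against the API reference for your version). Note that this still streams every row out of the dataset and filters on the Python side, so for a SQL-backed dataset the `SQLExecutor2` approach above is likely to be faster, since the database does the filtering.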