Get filtered dataframe via Python API

emher Registered Posts: 32 ✭✭✭✭✭
edited July 16 in Using Dataiku

I am using the Python API to read data from Dataiku, i.e. something like

import dataiku

ds = dataiku.Dataset("name", project_key="project_key")
df = ds.get_dataframe(limit=5000)

As indicated in the code above, it is easy to select a subset of the data (in this case 5000 rows). However, what I need is to select a particular subset of the data. In SQL, I would use a WHERE clause, i.e. something along the lines of

SELECT * FROM name WHERE alarm = 1

My question is: how can I do this with the Dataiku Python API? I am aware of the filtering options in the GUI (which are essentially what I need), but I couldn't find out how to do the filtering from the Python API.

Answers

  • emher
    emher Registered Posts: 32 ✭✭✭✭✭

    So far, the best solution I have found is to use the SQL bindings directly. It looks like this:

    from dataiku import SQLExecutor2
    executor = SQLExecutor2(connection="name_of_connection_in_dataiku")
    df = executor.query_to_df("SELECT * FROM db_name.name WHERE alarm = 1;")

    It works more or less as intended. Please let me know if there is a better or preferred approach.
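
For datasets that are not backed by a SQL connection, one possible alternative is to filter on the pandas side, one chunk at a time, so the full dataset never has to sit in memory. The sketch below is only an illustration: `filter_chunks` is a hypothetical helper, not part of the Dataiku API, and the sample data stands in for a real dataset so that the code runs outside DSS. Inside DSS the chunks could presumably come from `Dataset.iter_dataframes(chunksize=...)`, assuming your version provides it. Note that every row is still read from storage this way, whereas the SQL approach lets the database do the filtering.

```python
import pandas as pd


def filter_chunks(chunks, column, value):
    """Keep only rows where `column == value`, processing one chunk at a time."""
    kept = [chunk[chunk[column] == value] for chunk in chunks]
    if not kept:
        # No chunks at all: return an empty frame instead of failing in concat.
        return pd.DataFrame()
    return pd.concat(kept, ignore_index=True)


# Stand-in data so the sketch runs outside Dataiku. Inside DSS, the chunks
# could instead come from something like (assumption, not tested here):
#   ds = dataiku.Dataset("name", project_key="project_key")
#   chunks = ds.iter_dataframes(chunksize=10000)
full = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "alarm": [0, 1, 0, 1, 1, 0]})
chunks = (full.iloc[i:i + 2] for i in range(0, len(full), 2))

# Equivalent in spirit to: SELECT * FROM name WHERE alarm = 1
alarms = filter_chunks(chunks, column="alarm", value=1)
print(alarms)
```

The helper accepts any iterable of DataFrames, so the same function works with a generator of chunks or a plain list holding a single DataFrame.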
