API Returns Only Enabled Metrics After First Dataset Upload

Jemmy

When I upload a dataset for the first time, dataset.get_settings().get_raw().get('metrics') returns only the enabled metrics by default. After toggling any metric in the UI (enable or disable), the API then returns ALL metrics, including the disabled ones.
Is there a way to programmatically initialize all metrics without UI interaction, so the API returns all available metrics immediately after upload?
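To see the difference described above, a quick diagnostic is to list each probe's name, type and enabled flag before and after touching the metrics in the UI. The sketch below works on the plain list returned by `dataset.get_settings().get_raw()['metrics']['probes']`. The helper name `summarize_probes` is hypothetical, and the sample data only reuses probe definitions quoted later in this thread.

```python
from typing import Any


def summarize_probes(probes: list[dict[str, Any]]) -> list[tuple[str, str, bool]]:
    """Return (name, type, enabled) for each metric probe in a settings 'probes' list."""
    summary = []
    for probe in probes:
        # Fall back to placeholders if a probe lacks a name or type
        name = probe.get("meta", {}).get("name", "<unnamed>")
        summary.append((name, probe.get("type", "?"), bool(probe.get("enabled", False))))
    return summary


# Hypothetical sample shaped like the probe dicts shown in this thread
sample_probes = [
    {"type": "col_stats", "enabled": True, "computeOnBuildMode": "NO",
     "meta": {"name": "Columns statistics", "level": 2},
     "configuration": {"aggregates": []}},
    {"type": "adv_col_stats", "enabled": False, "computeOnBuildMode": "NO",
     "meta": {"name": "Most frequent values", "level": 3},
     "configuration": {"aggregates": [], "numberTopValues": 10}},
]

for name, probe_type, enabled in summarize_probes(sample_probes):
    print(f"{name:<25} {probe_type:<15} enabled={enabled}")
```

On DSS you would pass the real probes list instead of `sample_probes` and compare the output for a freshly uploaded dataset with one whose metrics were already saved from the UI.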

Answers

  • Turribeach

    I've seen this with a few DSS API objects: Dataiku fakes the state of the object when you call the API, based on some hardcoded defaults, but only actually persists the object when you make a change (DSS User Profiles behave that way). One way around it is to make a change to the object and save it via the API. Try adding a dummy metric and saving the object:

    https://developer.dataiku.com/latest/concepts-and-examples/metrics.html#add-metric-on-a-column

    Make sure you get a new object handle after the save.

  • Jemmy

    Thanks for the insight! I tried adding a dummy metric and refreshing the object handle, but it still only returns enabled metrics.

    The only thing that works is manually toggling metrics in the UI. Is there a specific API call that replicates what the UI does when you click the metrics toggle?

    I also tried modifying the existing default metrics (changing their configuration or enabled state), but that didn't force the full metrics list to appear.

  • Turribeach
    edited December 14

    Can you share your API code, please? Use a Code Block (the </> icon).

  • Jemmy
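    import dataiku

    client = dataiku.api_client()
    project = client.get_default_project()
    ds = project.get_dataset('test')

    # Probe definition for the "Columns statistics" metric
    col_stats_probe = {
        'type': 'col_stats',
        'enabled': True,
        'computeOnBuildMode': 'NO',
        'meta': {
            'name': 'Columns statistics',
            'level': 2
        },
        'configuration': {
            'aggregates': []
        }
    }

    dataset_definition = ds.get_definition()
    ds_metrics_probe = dataset_definition['metrics']['probes']

    # Add the col_stats probe only if it is not already defined
    if not any(p['type'] == 'col_stats' for p in ds_metrics_probe):
        ds_metrics_probe.append(col_stats_probe)

    # Disable the default "basic" probe so the definition contains a real change
    for probe in ds_metrics_probe:
        if probe['type'] == 'basic':
            probe['enabled'] = False

    ds.set_definition(dataset_definition)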
    


    I also tried the same thing with get_settings() and then saving the settings.

  • Turribeach
    edited December 14

    So I tested this on my side as well: even if you modify an existing metric via the API, the disabled ones don't get added. Looking at the browser console on the dataset's metrics screen, I can see they are pushed by the GUI itself when you save: the screen calls an internal, non-public API and passes the 4 missing (disabled) metrics. So if you want them added, you will need to add them via the API yourself. Here is a code snippet that adds the 4 missing metrics to a newly uploaded dataset:

    import dataiku

    client = dataiku.api_client()
    project = client.get_default_project()
    dataset = project.get_dataset('test')
    dataset_settings = dataset.get_settings()

    # Live reference to the probes list in the raw settings; appending to it modifies the settings
    probes = dataset_settings.get_raw().get('metrics')['probes']
    metric_names_list = [item['meta']['name'] for item in probes]

    # The 4 metrics the GUI pushes on save but which are missing on a newly uploaded dataset
    missing_probes = [
        {'type': 'col_stats', 'enabled': True, 'computeOnBuildMode': 'NO',
         'meta': {'name': 'Columns statistics', 'level': 2},
         'configuration': {'aggregates': []}},
        {'type': 'adv_col_stats', 'enabled': True, 'computeOnBuildMode': 'NO',
         'meta': {'name': 'Most frequent values', 'level': 3},
         'configuration': {'aggregates': [], 'numberTopValues': 10}},
        {'type': 'percentile_stats', 'enabled': True, 'computeOnBuildMode': 'NO',
         'meta': {'name': 'Columns percentiles', 'level': 4},
         'configuration': {'aggregates': []}},
        {'type': 'verify_col', 'enabled': True, 'computeOnBuildMode': 'NO',
         'meta': {'name': 'Data validity', 'level': 4},
         'configuration': {'aggregates': []}},
    ]

    metric_added = False
    for probe in missing_probes:
        if probe['meta']['name'] not in metric_names_list:
            probes.append(probe)
            metric_added = True

    # Only save when something actually changed
    if metric_added:
        dataset_settings.save()
    

    Note that I added them all as enabled, but only to check that they showed up in the GUI after a refresh. I suspect the set of metrics may vary depending on the dataset type. If so, you will need to inspect the metric probes for each dataset type with:

    dataset_settings.get_raw().get('metrics')['probes']
    

    Then check how they are defined and add them manually for that specific dataset type.
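One way to avoid hardcoding the probes per dataset type is to copy them from a reference dataset of the same type whose metrics were already saved once through the GUI. The sketch below assumes such a reference dataset exists; the helper name `add_missing_probes` and its `enabled` parameter are hypothetical. It matches probes by `meta.name`, as the snippet above does, and adds them disabled by default to mirror what the GUI pushes.

```python
import copy
from typing import Any


def add_missing_probes(target_probes: list[dict[str, Any]],
                       reference_probes: list[dict[str, Any]],
                       enabled: bool = False) -> list[str]:
    """Append to target_probes every reference probe whose meta.name is absent.

    Mutates target_probes in place and returns the names that were added,
    so the caller can decide whether the settings need saving.
    """
    existing = {p["meta"]["name"] for p in target_probes}
    added = []
    for ref in reference_probes:
        name = ref["meta"]["name"]
        if name not in existing:
            new_probe = copy.deepcopy(ref)  # avoid sharing dicts between datasets
            new_probe["enabled"] = enabled
            target_probes.append(new_probe)
            existing.add(name)
            added.append(name)
    return added


# Hypothetical sample data shaped like the probes shown in this thread
fresh = [
    {"type": "col_stats", "enabled": True, "computeOnBuildMode": "NO",
     "meta": {"name": "Columns statistics", "level": 2},
     "configuration": {"aggregates": []}},
]
reference = [
    {"type": "col_stats", "enabled": True, "computeOnBuildMode": "NO",
     "meta": {"name": "Columns statistics", "level": 2},
     "configuration": {"aggregates": []}},
    {"type": "percentile_stats", "enabled": False, "computeOnBuildMode": "NO",
     "meta": {"name": "Columns percentiles", "level": 4},
     "configuration": {"aggregates": []}},
]

added = add_missing_probes(fresh, reference)
print(added)  # ['Columns percentiles']
```

On DSS, `target_probes` would be `dataset_settings.get_raw()['metrics']['probes']` for the new dataset and `reference_probes` the same list read from the reference dataset; call `dataset_settings.save()` only when the returned list is non-empty.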
