Delete entirely empty columns at import

Options
user1278910
user1278910 Registered Posts: 1 ✭✭✭

Hello - I am importing multiple sources of Excel data over several years, same reporting format (field names, field types) and seeing quite a few entirely empty columns ~ being picked up for combining into a single flat table. This results in random columns being placed in the initial combined data set and triggering the prepare recipe (at post combining) to fail, e.g., trimming. Does anyone have pointers to prevent this? Thanks.

Answers

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 315 Dataiker
    Options

    Hi @user1278910
    ,

    The best way to handle this will probably depend quite a bit on what your input Excel files look like and what the issue in your current output dataset looks like. From your description, it sounds like even though the input files are similar, they may not have exactly the same schema, or some of the files may have an additional column or two that leads to shifting of some of the data unexpectedly. Is that understanding correct? In order to determine the simplest way to reconcile / remove the empty columns, it would be most useful to see a couple of sample rows of two of the mismatched input files, along with a screenshot highlighting the issue in the output dataset in order to determine the best way to resolve.

    Thanks,
    Sarina

  • Jurre
    Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 114 ✭✭✭✭✭✭✭
    Options

    Hi @user1278910
    ,

    In addition to @SarinaS
    : these empty columns, do they hold data in the original files ? Is that data anything special (like non-standard textcharacters for example greek or cyrillic) ?

    cheers

    Jurre

  • shosho88
    shosho88 Partner, Dataiku DSS Core Designer, Registered Posts: 6 Partner
    Options

    I'm starting to learn Dataiku

    I have about a same issue. I imported a dataset (excel file) with many empty column and want to delete it all in one time. Do you have a method ?

    Thanks for your help

  • Jurre
    Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 114 ✭✭✭✭✭✭✭
    Options

    Hi and Welcome @shosho88
    ,

    To delete many cols at once the column view comes in handy, the 'hamburger' button on the right of your screen is for that (see attached screenshot #1). Now check the checkboxes of cols you want to delete, or select all and de-select everything you want to keep. In 'actions' (screenshot #2) you'll find "Delete" which deletes the selected columns.

    Cheers!

  • shosho88
    shosho88 Partner, Dataiku DSS Core Designer, Registered Posts: 6 Partner
    Options

    Thanks @Jurre

    I did this but i Have a problem I want to filter only the column empty to select all once

    I took a screenshot

    Thanks again for your help

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker
    edited July 17
    Options

    Hi @shosho88

    If you have a high number of empty columns.

    You could also use a Python recipe and dropna in pandas to remove all columns that are completely empty.

    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu
    
    # Read recipe inputs
    input_dataset = dataiku.Dataset("dataset_name")
    
    
    df = input_dataset.get_dataframe().dropna(axis=1, how='all')
    
    
    # Write recipe outputs
    testing = dataiku.Dataset("testing")
    testing.write_with_schema(df)
    

  • Jurre
    Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 114 ✭✭✭✭✭✭✭
    Options

    Thanx for the help @AlexT
    !

    @shosho88
    a bit of backgroundinfo to Alex' solution (code examples and what it does)

  • shosho88
    shosho88 Partner, Dataiku DSS Core Designer, Registered Posts: 6 Partner
    Options

    Perfect! thanks guys @Jurre
    and @AlexT

Setup Info
    Tags
      Help me…