Delete entirely empty columns at import
Hello - I am importing multiple sources of Excel data over several years, same reporting format (field names, field types) and seeing quite a few entirely empty columns ~ being picked up for combining into a single flat table. This results in random columns being placed in the initial combined data set and triggering the prepare recipe (at post combining) to fail, e.g., trimming. Does anyone have pointers to prevent this? Thanks.
Answers
-
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hi @user1278910
,
The best way to handle this will probably depend quite a bit on what your input Excel files look like and what the issue in your current output dataset looks like. From your description, it sounds like even though the input files are similar, they may not have exactly the same schema, or some of the files may have an additional column or two that leads to shifting of some of the data unexpectedly. Is that understanding correct? In order to determine the simplest way to reconcile / remove the empty columns, it would be most useful to see a couple of sample rows of two of the mismatched input files, along with a screenshot highlighting the issue in the output dataset in order to determine the best way to resolve.
Thanks,
Sarina -
Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 115 ✭✭✭✭✭✭✭
Hi @user1278910
,In addition to @SarinaS
: these empty columns, do they hold data in the original files ? Is that data anything special (like non-standard textcharacters for example greek or cyrillic) ?cheers
Jurre
-
I'm starting to learn Dataiku
I have about a same issue. I imported a dataset (excel file) with many empty column and want to delete it all in one time. Do you have a method ?
Thanks for your help
-
Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 115 ✭✭✭✭✭✭✭
Hi and Welcome @shosho88
,To delete many cols at once the column view comes in handy, the 'hamburger' button on the right of your screen is for that (see attached screenshot #1). Now check the checkboxes of cols you want to delete, or select all and de-select everything you want to keep. In 'actions' (screenshot #2) you'll find "Delete" which deletes the selected columns.
Cheers!
-
Thanks @Jurre
I did this but i Have a problem I want to filter only the column empty to select all once
I took a screenshot
Thanks again for your help
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,218 Dataiker
Hi @shosho88
If you have a high number of empty columns.
You could also use a Python recipe and dropna in pandas to remove all columns that are completely empty.# -*- coding: utf-8 -*- import dataiku import pandas as pd, numpy as np from dataiku import pandasutils as pdu # Read recipe inputs input_dataset = dataiku.Dataset("dataset_name") df = input_dataset.get_dataframe().dropna(axis=1, how='all') # Write recipe outputs testing = dataiku.Dataset("testing") testing.write_with_schema(df)
-
Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 115 ✭✭✭✭✭✭✭
Thanx for the help @AlexT
!@shosho88
a bit of backgroundinfo to Alex' solution (code examples and what it does) -