Dataset -> Analyze column fails predictably on big datasets

Thomas_K

I have a fairly large dataset (9 GB, ~25 million entries). I tried to get a first look at it via Dataset -> click on any column -> Analyze -> "whole data".

This starts a process which fails every time. I tried storing the dataset on Hive and as a single local file. It always fails; on Hadoop it fails considerably faster (~4h).

Is this a known problem? Any suggestions on what to do about it? If the cause is a connection error, shouldn't the results still be computed on the server? Do I have to keep a connection open from my PC to the Dataiku server the whole time?

Also, a side note: the progress bar doesn't seem to do anything. It stays white until the process fails.


Accepted Solutions
Clément_Stenac Dataiker
Re: Dataset -> Analyze column fails predictably on big datasets
Hi,

You can open a support ticket (https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html#editor-support-for-dataiku-customers).

Please attach an instance diagnosis (https://doc.dataiku.com/dss/latest/troubleshooting/diagnosing.html#getting-an-instance-diagnosis) to it.
