
Issues While Visualizing Data with almost 700 million records

Priyansh
Level 1

Hi All,

I am new to Dataiku and am learning it while working on a project.

In the project I have to visualize data that contains about 700 million records. There are 24 columns; two of them are a timestamp (when the user ran the query) and the user name. I have to make a line chart with the timestamp on the x-axis for each user. On the whole dataset the chart gives me an error (it cannot process all the data at once). Is there another way (for example, aggregating tables or some other method) to visualize this data and get the required result/analysis?

Thank you, everyone, for the help!

fchataigner2
Dataiker

Hi,

If you're plotting the data as a time series (that is, a line chart in "automatic" X axis mode), then 700M rows is beyond the capabilities of the basic engine. You have to load the data into a SQL database and build the chart on a SQL dataset.
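To illustrate why charting on a SQL dataset helps: the database can aggregate before anything reaches the chart, so 700M raw rows collapse to one row per user per time bucket. Below is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for whatever SQL database is connected to DSS. The table name `queries`, the columns `user_name` and `ts`, and the hourly bucket are assumptions for illustration, not the query DSS itself generates.

```python
import sqlite3

# In-memory SQLite as a stand-in for a real SQL database.
# Table and column names (queries, user_name, ts) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queries (user_name TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO queries VALUES (?, ?)",
    [
        ("alice", "2021-03-01 10:05:00"),
        ("alice", "2021-03-01 10:40:00"),
        ("alice", "2021-03-01 11:15:00"),
        ("bob", "2021-03-01 10:20:00"),
    ],
)

# Aggregate inside the database: one row per (user, hour) instead of one per query.
# strftime truncates each timestamp to the start of its hour (SQLite dialect).
HOURLY_COUNTS = """
SELECT user_name,
       strftime('%Y-%m-%d %H:00:00', ts) AS hour,
       COUNT(*) AS n_queries
FROM queries
GROUP BY user_name, hour
ORDER BY user_name, hour
"""
rows = conn.execute(HOURLY_COUNTS).fetchall()
for row in rows:
    print(row)
```

Other databases spell the truncation differently (for example `date_trunc('hour', ts)` in PostgreSQL). Whatever the dialect, the aggregated result has far fewer rows than the raw data and is much easier to chart.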

Also note that you'll probably need to filter on the user names. If the number of names is large, the simple filters available in DSS's charts won't cut it (if you need to filter on more than a few dozen values, value lists are quite painful, UI-wise), and you'll have to code the visualization yourself.
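When coding the visualization yourself, a long list of user names is easy to handle in code: put the names in a set and filter the aggregated rows against it. Here is a minimal sketch; the function `series_per_user` is hypothetical, and it assumes rows shaped like `(user_name, time_bucket, count)` with ISO-formatted timestamp strings, which sort chronologically as plain strings.

```python
from collections import defaultdict


def series_per_user(rows, wanted_users):
    """Group (user_name, time_bucket, count) rows into per-user (x, y) series.

    Only users listed in wanted_users are kept. A set gives constant-time
    membership checks, so thousands of names cost no more than a few.
    """
    wanted = set(wanted_users)
    points = defaultdict(list)
    for user, bucket, count in rows:
        if user in wanted:
            points[user].append((bucket, count))

    series = {}
    for user, pts in points.items():
        pts.sort()  # ISO timestamp strings sort in time order
        series[user] = ([b for b, _ in pts], [c for _, c in pts])
    return series


# Hypothetical aggregated rows, e.g. the output of a GROUP BY query.
rows = [
    ("alice", "2021-03-01 11:00:00", 1),
    ("alice", "2021-03-01 10:00:00", 2),
    ("bob", "2021-03-01 10:00:00", 1),
    ("carol", "2021-03-01 10:00:00", 5),
]
series = series_per_user(rows, wanted_users=["alice", "bob"])
print(series)
```

Each `(xs, ys)` pair can then be handed to a plotting library, for example matplotlib's `plt.plot(xs, ys)` in a Python recipe or notebook, drawing one line per user.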

Priyansh
Level 1
Author

Hi,

So there is no other way to do this task in Dataiku, and I have to visualize the data using a Python recipe or similar, right?

Thanks for the information; it was really helpful!
