Issues While Visualizing Data with almost 700 million records

Priyansh Registered Posts: 2 ✭✭✭

Hi All,

I am new to Dataiku and am learning it while working on a project.

In this project I have to visualize a dataset of about 700 million records. It has 24 columns, two of which are a timestamp (the time at which a user ran a query) and a user name. I need to make a line chart for each user, with the timestamp on the x-axis. When I try this on the whole dataset I get an error (it cannot process all the data at once). Is there another way to visualize this data, such as using aggregated tables or some other method, so that I can get the required analysis?

Thank you, everyone, for the help!

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    If you're plotting the data as a time series (that is, a line chart with the X axis in "automatic" mode), then 700M rows is beyond the capabilities of the basic engine. You'll have to load the data into a SQL database and build the chart on a SQL dataset.
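To illustrate why a SQL database helps here: the database can reduce the raw rows to one row per user per time bucket before anything is drawn. The sketch below is only an illustration under assumptions. It uses an in-memory SQLite database as a stand-in for the real SQL database, and the table name `queries` and the columns `user_name` and `ts` are hypothetical, not taken from the original dataset.

```python
import sqlite3

def hourly_counts(conn, table="queries"):
    """Return (user_name, hour_bucket, n_queries) rows, one per user per hour.

    `table`, `user_name` and `ts` are hypothetical names for this sketch.
    The table name is interpolated into the SQL, so it must be a trusted value.
    """
    sql = f"""
        SELECT user_name,
               strftime('%Y-%m-%d %H:00', ts) AS hour_bucket,
               COUNT(*) AS n_queries
        FROM {table}
        GROUP BY user_name, hour_bucket
        ORDER BY user_name, hour_bucket
    """
    return conn.execute(sql).fetchall()

# Tiny stand-in for the real table of query events.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queries (user_name TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO queries VALUES (?, ?)",
    [
        ("alice", "2021-03-01 09:05:00"),
        ("alice", "2021-03-01 09:40:00"),
        ("alice", "2021-03-01 10:15:00"),
        ("bob", "2021-03-01 09:30:00"),
    ],
)

rows = hourly_counts(conn)
for row in rows:
    print(row)
```

Note that `strftime` is SQLite's date function; other databases have their own way to truncate a timestamp to the hour, so the query would need adapting to whichever database actually holds the data.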

    Also note that you'll probably need to filter on the user names. If the number of names is large, the simple filters available in DSS charts won't cut it (value lists become quite painful, UI-wise, once you need to filter on more than a few dozen values), and you'll have to code the visualization yourself.
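As a rough idea of what coding it yourself could involve, the following sketch streams over (user, timestamp) pairs, keeps only a chosen set of users, and counts queries per hour for each of them. It is a minimal sketch under assumptions: the function `hourly_series`, the timestamp format, and the sample rows are all hypothetical, and in practice the rows would be read in chunks from the dataset rather than from a list.

```python
from collections import Counter, defaultdict
from datetime import datetime

def hourly_series(rows, wanted_users, ts_format="%Y-%m-%d %H:%M:%S"):
    """Count queries per hour for each user in `wanted_users`.

    `rows` is any iterable of (user_name, timestamp_string) pairs, so it can
    be a generator that streams the data instead of loading it all at once.
    Returns {user_name: [(hour_start, count), ...]} sorted by time.
    """
    counts = defaultdict(Counter)
    for user, ts in rows:
        if user not in wanted_users:
            continue  # filter in code instead of through a chart value list
        # Truncate the timestamp to the start of its hour.
        bucket = datetime.strptime(ts, ts_format).replace(minute=0, second=0)
        counts[user][bucket] += 1
    return {user: sorted(c.items()) for user, c in counts.items()}

# Hypothetical sample rows standing in for the real data.
sample = [
    ("alice", "2021-03-01 09:05:00"),
    ("bob", "2021-03-01 09:30:00"),
    ("alice", "2021-03-01 09:40:00"),
    ("alice", "2021-03-01 10:15:00"),
]

series = hourly_series(sample, wanted_users={"alice"})
for user, points in series.items():
    print(user, points)
```

Each per-user list of (hour, count) points is small enough to hand to a plotting library as one line per user, whatever the size of the raw data.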

  • Priyansh
    Priyansh Registered Posts: 2 ✭✭✭

    Hi,

    So there is no other way to do this task in Dataiku? I have to build the visualization myself, for example with a Python recipe, right?

    Thanks for the information, it was really helpful!
