Slow performance questions

Fahraynk Registered Posts: 3 ✭✭✭


I am having some performance issues. I have about 5,000 rows of data, where one of the columns contains a large amount of text. Importing this data from JSON was quick, but everything else I try to do is painfully slow. Since data-science tasks can involve millions of rows, something must be wrong with my setup if the software runs this slowly with only 5,000 rows. The software is also using 22 GB of RAM for this dataset, which is a 23 MB JSON file.

For example, I applied a recipe that only renames my 8 columns. When I try to open the dataset after applying the recipe, it takes several minutes.

Also, when I try to load the dataset with Python in a notebook, it takes about 3 minutes just to get the data into a pandas DataFrame. These are the two lines that import the data and take the 3 minutes:


dataset = dataiku.Dataset("dataset")
df = dataset.get_dataframe()
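
To put numbers on a slow load like this, here is a minimal sketch using only the Python standard library. `profile_call` is a hypothetical helper, not part of any Dataiku API, and the JSON payload is a synthetic stand-in for the real data. Note that `tracemalloc` only traces allocations made by the notebook's own Python process, so it would not show memory used by other DSS processes on the server.

```python
import json
import time
import tracemalloc


def profile_call(fn):
    """Run fn() and return (result, elapsed_seconds, peak_traced_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn()
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


# Synthetic stand-in (assumption): 5000 rows with one large text column
payload = json.dumps([{"id": i, "text": "x" * 4000} for i in range(5000)])

rows, seconds, peak = profile_call(lambda: json.loads(payload))
print(f"{len(rows)} rows, {seconds:.3f} s, peak {peak / 1e6:.1f} MB traced")
```

In the notebook, the same helper could presumably be called as `profile_call(dataset.get_dataframe)` to see how much of the 3 minutes and how much memory are spent in that one call.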


Why is DSS using 22 GB of RAM on my server to work on a 23 MB file? Also, what can I do differently to get things running faster? Are there best practices I am missing?

Operating system used: pop-os


