Big Oracle SQL table

pkansal · ‎02-24-2022

I am trying to run a complex query on a SQL dataset. However, it just takes around 40 min to run. What can I do to run it faster?

Manuel · ‎02-24-2022

Hi,

It is hard to help you without detailed information:

What is the size of the dataset?
What type of recipe are you running?
Which connections are used for input and output dataset?
Which engine are you using to run the recipe? (check cogs near the Run button)

It would help if you screenshot that part of your flow.

Best regards

pkansal · ‎02-25-2022

1. Around 440 million rows

2. It is a SQL recipe that first does a SUM group by, then takes a MAX, and then applies a FILTER on the value of MAX

3. For Input, it is Oracle RDBMS. I am creating a SQL dataset based on a SQL query in settings.

4. I think it is running the query in database itself, right?

Manuel · ‎02-25-2022

Hi,

So, it seems that your input dataset is an SQL query. Did you see the attached warning? Instead, it is recommended that you setup a view that fits your needs in your database.

What is the data connection of the output? Recipes with different input and output connections will likely run on the DSS engine itself.

On the current engine, I suspect because of your setup, the recipe is not running on the database itself, but on the DSS engine. Check the options on the cog near the run button. What does it say its running? Local?

If possible,please take a screenshot of that part of your flow.

I hope this helps.

pkansal · ‎02-25-2022

CDR_Filtered- SQL dataset from sql query

The python recipe is running in a containerized execution mode.

Manuel · ‎02-25-2022

Hi,

One suggestion is that you use first Sync your source dataset (the SQL Query) into an S3 dataset, which will then be the input to your Python recipe. This way you won't be running the Oracle query every time you test your Python recipe.

On the performance of the SQL query iteself, I am afraid there is not much you can do. It is a heavy query, so it will take time to run.

I hope this helps.

pkansal · ‎02-25-2022

I tried to do it but the number of rows is more than 100 million and it takes a lot of time to do it. It uses DSS engine(RUN on stream).

Is there a faster way to do it?

The database is Oracle.

Manuel · ‎02-25-2022

Hi,

I am afraid there are no fast paths between Oracle and S3.

How large is the whole table you define the SQL query dataset on? Instead of the SQL Query dataset, have you considered using the whole table as the input dataset, followed by a recipe that selects the relevant rows, followed by the remaining logic

This would also allow you to use partitioning, processing the data in chunks.

I don't know what your use case is, but it seems worth to engage your Customer Success Manager and other services to have a closer look and help you optimise.

Best regards

pkansal · ‎02-25-2022

Bu the partitioning column is a derived column, so I cant use it to create partitions in SQL dataset.

Manuel · ‎02-25-2022

You should be able to use existing columns for SQL partitioning.

Sign up to take part

Big Oracle SQL table

Big Oracle SQL table

Setup info