Oracle SQL Join Methods: Nested Loops, Hash & Sort-Merge Joins Explained

Oracle SQL | SQL Join Methods

Overview and Setup
Nested Loops Joins
Hash Joins
Sort-Merge Joins

Overview and Setup

Oracle Database uses three primary join algorithms to combine data from multiple tables. The optimizer automatically selects the most efficient method based on table statistics, available indexes, and data volume:

Nested Loops Joins: Best for small result sets and when one table is much smaller.
Hash Joins: Optimal for large result sets with sufficient memory.
Sort-Merge Joins: Effective when both tables are large and pre-sorted.

Each join method can be used with different join types (INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER, CARTESIAN) depending on the SQL requirements.

This table and sample data is used in the examples below:

CREATE TABLE table_join_methods
(
    id1 NUMBER(5,0) NOT NULL
,   id2 NUMBER(5,0) NOT NULL
,   type1 NUMBER(1,0) NOT NULL
,   CONSTRAINT cp_table_join_methods_id1 PRIMARY KEY (id1)
);

DECLARE
indexLoop NUMBER(5,0);

BEGIN
FOR indexLoop IN 1..50000 LOOP
    IF MOD(indexLoop,2) = 0 THEN
        INSERT INTO table_join_methods (id1, id2, type1) VALUES (indexLoop, DBMS_RANDOM.VALUE(1,50000), 1);
    ELSE
        INSERT INTO table_join_methods (id1, id2, type1) VALUES (indexLoop, DBMS_RANDOM.VALUE(1,50000), 2);
    END IF;
END LOOP;
END;
/

COMMIT;

CREATE INDEX ix_table_join_methods_id2 ON table_join_methods (id2);
ANALYZE INDEX ix_table_join_methods_id2 COMPUTE STATISTICS;

CREATE INDEX ix_table_join_methods_type1_id2 ON table_join_methods (type1,id2);
ANALYZE INDEX ix_table_join_methods_type1_id2 COMPUTE STATISTICS;

Nested Loops Joins

Nested loops joins use a simple algorithm with two nested iterations. The outer loop processes rows from the driving table, and for each row, the inner loop searches the second table for matching records. This method is highly efficient when the driving table returns few rows and the inner table has an appropriate index on the join column.

The optimizer typically chooses this method when:

The driving table returns a small number of rows.
There is an efficient index on the inner table's join column.
The join condition is selective.
Memory constraints limit the use of hash joins.

SELECT tjm1.id1
FROM table_join_methods tjm1
JOIN table_join_methods tjm2 ON tjm2.id2 = tjm1.id1

Execution plan:

-------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                           |      1 |        |     50 |00:00:00.01 |      16 |
|   1 |  NESTED LOOPS         |                           |      1 |  41763 |     50 |00:00:00.01 |      16 |
|   2 |   INDEX FAST FULL SCAN| IX_TABLE_JOIN_METHODS_ID2 |      1 |  41763 |     50 |00:00:00.01 |       7 |
|*  3 |   INDEX UNIQUE SCAN   | CP_TABLE_JOIN_METHODS_ID1 |     50 |      1 |     50 |00:00:00.01 |       9 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("TJM2"."ID2"="TJM1"."ID1")

In this execution plan, the optimizer uses the index on id2 as the driving source, then performs index unique scans on the primary key for each matching row. The low buffer count indicates efficient index usage.

Hash Joins

Hash joins build an in-memory hash table from the smaller dataset (typically the one with fewer rows after applying WHERE clause filters), then probe this hash table for each row in the larger dataset. This method excels when sufficient memory is available and both tables contribute significant numbers of rows to the result set. If memory is insufficient, Oracle may spill to disk, which can degrade performance significantly.

The process works as follows:

The smaller table (build table) is read and a hash table is created in memory.
Each row from the larger table (probe table) is processed.
Hash function is applied to join column values to find potential matches.
Matching rows are returned based on the join condition.

SELECT tjm1.id1
FROM table_join_methods tjm1
JOIN table_join_methods tjm2 ON tjm2.id2 = tjm1.id1
WHERE tjm1.type1 = 1

Execution plan:

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                           |      1 |        |     50 |00:00:00.01 |     166 |       |       |          |
|*  1 |  HASH JOIN            |                           |      1 |  36890 |     50 |00:00:00.01 |     166 |  1295K|  1295K| 1738K (0)|
|*  2 |   TABLE ACCESS FULL   | TABLE_JOIN_METHODS        |      1 |  20887 |  25000 |00:00:00.01 |     159 |       |       |          |
|   3 |   INDEX FAST FULL SCAN| IX_TABLE_JOIN_METHODS_ID2 |      1 |  41763 |     96 |00:00:00.01 |       7 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("TJM2"."ID2"="TJM1"."ID1")
   2 - filter("TJM1"."TYPE1"=1)

The execution plan shows that Oracle built the hash table from the filtered table data (25,000 rows with type1 = 1) and probed it with rows from the index scan. The Used-Mem column indicates the hash table fit entirely in memory.

Sort-Merge Joins

Sort-merge joins sort both input datasets by their join columns, then merge the sorted results to identify matching rows. This method is particularly effective when both tables are large, when the data is already sorted, or when indexes exist that can provide pre-sorted access paths.

The algorithm consists of three phases:

Sort phase: Both datasets are sorted by the join columns.
Merge phase: Sorted datasets are merged in a single pass.
Output phase: Matching rows are returned to the calling operation.

Sort-merge joins are advantageous when memory is limited for hash operations, when both tables are approximately the same size, or when the query can benefit from existing indexes that provide sorted access.

SELECT tjm1.id2, tjm2.id2
FROM table_join_methods tjm1
JOIN table_join_methods tjm2 ON tjm1.id2 = tjm2.id2
WHERE tjm1.type1 = 1
AND tjm2.type1 = 2

Execution plan:

---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                                 |      1 |        |  1250K |00:00:00.03 |     134 |       |       |          |
|   1 |  MERGE JOIN          |                                 |      1 |    626M|  1250K |00:00:00.03 |     134 |       |       |          |
|*  2 |   INDEX RANGE SCAN   | IX_TABLE_JOIN_METHODS_TYPE1_ID2 |      1 |  25000 |  25000 |00:00:00.01 |      67 |       |       |          |
|*  3 |   INDEX RANGE SCAN   | IX_TABLE_JOIN_METHODS_TYPE1_ID2 |      1 |  25000 |  25000 |00:00:00.01 |      67 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("TJM1"."TYPE1"=1)
   3 - access("TJM2"."TYPE1"=2)

In this example, Oracle benefits from the composite index on (type1, id2) which provides pre-sorted data, eliminating the need for explicit sort operations. The merge join efficiently combines the two sorted streams based on the id2 values.