Oracle NLSSORT Function Indexes

Oracle SQL | NLSSORT Function Indexes

Overview and Best Practices
Binary Sorting with Regular Indexes
Linguistic Sorting with Function-Based Indexes
Common Issues and Limitations

Overview and Best Practices
The NLSSORT function in Oracle enables linguistic sorting and comparison operations that respect different language-specific collation rules.

The key parameters that control sorting and comparison behavior are:
- NLS_SORT: Determines the collation sequence used for sorting operations. Common values include BINARY, BINARY_AI, BINARY_CI, and linguistic values like FRENCH, GERMAN, etc.
- NLS_COMP: Controls how string comparisons are performed. Set to BINARY for byte-by-byte comparison or LINGUISTIC to use the NLS_SORT parameter rules.
Notes:
- For binary sorting (accent-sensitive and case-sensitive):
  - Set NLS_SORT to BINARY
  - Set NLS_COMP to BINARY
  - Use regular indexes without functions for optimal performance
- For linguistic sorting that remains accent-sensitive and case-sensitive:
  - Set NLS_SORT to a linguistic value (e.g., FRENCH, GERMAN)
  - Set NLS_COMP to LINGUISTIC
  - Create function-based indexes using NLSSORT with the same NLS_SORT value
- For accent-insensitive and/or case-insensitive sorting:
  - Set NLS_SORT to BINARY_AI (accent-insensitive), BINARY_CI (case-insensitive), or linguistic values with _AI/_CI extensions
  - Set NLS_COMP to LINGUISTIC
  - Create function-based indexes using NLSSORT with matching NLS_SORT parameters
For detailed explanation of NLS_SORT parameter values, see NLS_SORT Parameter
For detailed explanation of NLS_COMP parameter values, see NLS_COMP Parameter

This table and sample data is used in the examples below:
```
CREATE TABLE table_nlssort
(
    id NUMBER(2,0) NOT NULL
,   desc1 NVARCHAR2(2) NOT NULL
);
```
```
INSERT INTO table_nlssort (id, desc1) VALUES (1, 'aé');
INSERT INTO table_nlssort (id, desc1) VALUES (2, 'ae');
INSERT INTO table_nlssort (id, desc1) VALUES (3, 'af');
INSERT INTO table_nlssort (id, desc1) VALUES (4, 'ad');
INSERT INTO table_nlssort (id, desc1) VALUES (5, 'aÉ');
INSERT INTO table_nlssort (id, desc1) VALUES (6, 'aE');
INSERT INTO table_nlssort (id, desc1) VALUES (7, 'aF');
INSERT INTO table_nlssort (id, desc1) VALUES (8, 'aD');

COMMIT;
```

Binary Sorting with Regular Indexes

When NLS_SORT is set to BINARY, Oracle performs exact byte-by-byte comparisons. This means that 'a', 'A', 'á', and 'Á' are all treated as distinct characters. In this scenario, regular B-tree indexes provide the most efficient access path.

Session Configuration:

ALTER SESSION SET NLS_SORT = BINARY;
ALTER SESSION SET NLS_COMP = LINGUISTIC;

Index Creation and Statistics:

CREATE INDEX idx_table_nlssort_desc1 ON table_nlssort ( desc1 );
ANALYZE INDEX idx_table_nlssort_desc1 COMPUTE STATISTICS;

Query Execution with Regular Index:

SELECT * FROM table_nlssort WHERE desc1 = 'aÉ';

Execution Plan:

| OBJECT_NAME             | OPTIONS        | Access Predicates |
+-------------------------+----------------+-------------------+
| TABLE_NLSSORT           | BY INDEX ROWID |                   |
| IDX_TABLE_NLSSORT_DESC1 | RANGE SCAN     | DESC1 = U'a\00C9' |

The execution plan shows that Oracle efficiently uses the index with a RANGE SCAN operation, which is optimal for this type of query.

Incompatible Function-Based Index Example:

DROP INDEX idx_table_nlssort_desc1;
CREATE INDEX idx_table_nlssort_desc1 ON table_nlssort ( NLSSORT(desc1,'nls_sort=''BINARY_AI''') );
ANALYZE INDEX idx_table_nlssort_desc1 COMPUTE STATISTICS;

SELECT * FROM table_nlssort WHERE desc1 = 'aÉ';

Execution Plan:

| OBJECT_NAME   | OPTIONS | Filter Predicates |
+---------------+---------+-------------------+
| TABLE_NLSSORT | FULL    | DESC1 = U'a\00C9' |

When the session NLS_SORT setting doesn't match the function-based index definition, Oracle cannot use the index and resorts to a full table scan. This demonstrates the importance of aligning session parameters with index definitions.

Forcing Index Usage with Hints:

SELECT /*+ index(table_nlssort idx_table_nlssort_desc1) */ * FROM table_nlssort WHERE desc1 = 'aÉ';

Execution Plan:

| OBJECT_NAME             | OPTIONS        | Filter Predicates |
| ------------------------+----------------+-------------------+
| TABLE_NLSSORT           | BY INDEX ROWID | DESC1 = U'a\00C9' |
| IDX_TABLE_NLSSORT_DESC1 | FULL SCAN      |                   |

While it's possible to force index usage with hints, this results in a FULL SCAN of the index rather than an efficient RANGE SCAN, which is typically less performant for selective queries.

Linguistic Sorting with Function-Based Indexes

When using linguistic sorting parameters like BINARY_AI (accent-insensitive), Oracle transforms string comparisons using the NLSSORT function. This transformation means that 'aé', 'ae', 'aÉ', and 'aE' would be considered equivalent for comparison purposes. Regular indexes cannot handle these transformations, making function-based indexes essential.

Session Configuration for Accent-Insensitive Sorting:

ALTER SESSION SET NLS_SORT = BINARY_AI;
ALTER SESSION SET NLS_COMP = LINGUISTIC;

Regular Index Limitation:

DROP INDEX idx_table_nlssort_desc1;
CREATE INDEX idx_table_nlssort_desc1 ON table_nlssort ( desc1 );
ANALYZE INDEX idx_table_nlssort_desc1 COMPUTE STATISTICS;

SELECT * FROM table_nlssort WHERE desc1 = 'aÉ';

Execution Plan:

| OBJECT_NAME   | OPTIONS | Filter Predicates                                              |
+---------------+---------+----------------------------------------------------------------+
| TABLE_NLSSORT | FULL    | NLSSORT(DESC1,'nls_sort=''BINARY_AI''')=HEXTORAW('0061006500') |

Oracle automatically applies the NLSSORT function to the column in the WHERE clause, but since the index is on the raw column value, it cannot be used effectively. The HEXTORAW value represents the normalized sort key for the comparison.

Forced Index Usage (Inefficient):

SELECT /*+ index(table_nlssort idx_table_nlssort_desc1) */ * FROM table_nlssort WHERE desc1 = 'aÉ';

Execution Plan:

| OBJECT_NAME             | OPTIONS        | Filter Predicates                                              |
+-------------------------+----------------+----------------------------------------------------------------+
| TABLE_NLSSORT           | BY INDEX ROWID |                                                                |
| IDX_TABLE_NLSSORT_DESC1 | FULL SCAN      | NLSSORT(DESC1,'nls_sort=''BINARY_AI''')=HEXTORAW('0061006500') |

Even when forced, the index requires a full scan because the transformation must be applied to every indexed value during the scan.

Proper Function-Based Index Solution:

DROP INDEX idx_table_nlssort_desc1;
CREATE INDEX idx_table_nlssort_desc1 ON table_nlssort ( NLSSORT(desc1,'nls_sort=''BINARY_AI''') );
ANALYZE INDEX idx_table_nlssort_desc1 COMPUTE STATISTICS;

SELECT * FROM table_nlssort WHERE desc1 = 'aÉ';

Execution Plan:

| OBJECT_NAME             | OPTIONS        | Access Predicates                                              |
+-------------------------+----------------+----------------------------------------------------------------+
| TABLE_NLSSORT           | BY INDEX ROWID |                                                                |
| IDX_TABLE_NLSSORT_DESC1 | RANGE SCAN     | NLSSORT(DESC1,'nls_sort=''BINARY_AI''')=HEXTORAW('0061006500') |

With the properly configured function-based index, Oracle can efficiently perform a RANGE SCAN operation, providing optimal query performance. The index stores the pre-computed NLSSORT values, eliminating the need for runtime transformations.

Common Issues and Limitations
When working with NLSSORT function-based indexes, several important considerations and limitations must be understood to avoid common pitfalls.

Unique Index Constraints:
Creating unique indexes on NLSSORT functions can lead to constraint violations when different character representations normalize to the same sort key. This is particularly common with accent-insensitive or case-insensitive collations.

Example of Problematic Unique Index:
```
CREATE UNIQUE INDEX idx_table_nlssort_desc1 ON table_nlssort (NLSSORT(desc1,'NLS_SORT=BINARY_AI'));
```
Query result:
```
Error starting at line 1 in command:
CREATE UNIQUE INDEX IDX_TABLE_NLSSORT_DESC1 ON TABLE_NLSSORT (NLSSORT(DESC1,'NLS_SORT=BINARY_AI'))

Error at Command Line:1 Column:69

Error report:
SQL Error: ORA-00904: "DESC1": invalid identifier.
*Cause:
*Action:
```
This error occurs because the NLSSORT function generates identical sort keys for values that are considered equivalent under the specified collation (such as 'aé' and 'ae' with BINARY_AI). Since multiple rows would produce the same index key, the unique constraint cannot be enforced.