«

»

May 16

Secondary Index in Teradata

Before you start with SI, well I must say that as a prerequisite you must first read about the Primary index in Teradata.


So after knowing about Primary index the point here is that when we already have UPI and NUPI then what’s the use of this Secondary Index?

Well the best possible answer for this question is that – Secondary Indexes provide an alternate path to the data, and should be used on queries that run many times.

Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on “KNOWN QUERIES” or queries that are run over and over again. Once you know the data warehouse, environment you can create secondary indexes to enhance its performance.

Syntax of creating Secondary Index

Syntax of UNIQUE SI:

CREATE UNIQUE INDEX (Column/Columns) ON <dbname>.<tablename >;

Syntax of NON-UNIQUE SI:

CREATE INDEX (Column/Columns) ON <dbname>.<tablename >;

Note – SI can be created even after table is populated with data. Unlike PI which is created only at the time of creation of table. You can create and drop SI at any time.

Whenever you create SI on the table, Teradata will create a subtable on all AMP. This subtable contains three columns given below –

  1. Secondary Index Value
  2. Secondary Index Row ID (this is the hashed value of SI value)
  3. Base table Row ID (this is the actual base row id  )

Will see the use of all these values later in this post.

USI Subtable Example

When we defined a UNIQUE SI on the table, then Teradata will immediately create a USI subtable in each AMP for that particular table.


Remember creation of subtable requires PERM space, so always be wise to choose your SI. Normally the best SI is that column or columns which is mostly used in the WHERE clause.

Now I’ll explain in-depth architecture of creation of subtable and retrieval of SI for better understanding of the concept.

Please look into the image below –

Suppose we have an Employee table (base table) having attributes Emp, Dept Fname, Lname and Soc_security. We defined USI on the column Soc_Security.

You can see the SI subtable created on each AMP which holds information about the SI column and corresponding Base row id (Base Table Row-ID), which is the ROW ID of the actual Employee table. The steps involve to load this subtable is as follows –

1)      Teradata will first create the subtable on all AMP.

2)      After that it hashes the value of this USI column (Soc_Security) and based on that hashed value it check the hash map for the AMP number which will hold this USI value in its subtable.

3)      After getting the respective AMP number, the SI value along with the two more attributes (secondary index row id and base table row id) will be stored in the subtable of that AMP.

In this way we populate our USI subtable on each AMP. As the SI columns is UNIQUE there is no duplication of SI values in any subtable, means each row in the subtable is unique and will fetch only one row when we make a query on that SI column.

Note – As it is clear now that defining SI will require the creation of subtable, so we should be aware that SI requires space cost factor on our Teradata system.

Teradata retrieval of USI query.

Suppose on the above example we make a query –

Select * from Employee_table where Soc_Security = ‘123-99-8888’;

When a TD optimizer finds USI in where clause it knows that it’s a 2 AMP operation and also only one row will be returned. So the step its perform for retrieval is as follows –

1)      It will hash the value of SI (‘123-99-8888’), by hashing algorithm and found the hash value for it.

2)      Now it checks this hash value in the hash map and gets the AMP number from it. We know that this AMP stores this SI value.

3)      Now it will go to the Employee subtable of that AMP and retrieve the Base row id which is stored for that hash value.

4)      This Base row id will be sent back to optimizer by BYNET.

5)      Now optimizer sent back this ROW ID again and fetch the resultant row from the Base table for which Soc_Security = ‘123-99-8888’ .

As we have seen that Teradata system requires 2 AMP to reach the answer row that’s why we called USI operation as the 2 AMP operations. Even if SI row resides in the same AMP  in which Base row reside , still after getting Base row id from the subtable it will sent back to optimizer so that it start search again based on that Base row id. So it’s always called as the 2 amp operation.

NUSI Subtable Example 

When we defined a NUSI on the table then Teradata will build the subtable on each AMP in the same fashion as that in USI. The only difference in this subtable creation is that, instead of building subtable on each AMP it will be build on AMP local which means that each AMP will build the subtable in it to points it own base rows. In other words each NUSI subtable will reflect and points to the those base rows only which it owns.

Please look into the image below –

Suppose we have an Employee table (base table) on which we defined NUSI on the column Fname.

1)      Now Teradata will first create the subtable on all AMP.

2)      Each AMP will hold the secondary index values for their rows in the base table only. In our example, each AMP holds the Fname column for all employee rows in the base table on their AMP (AMP local).

3)      Each AMP Local Fname will have the Base Table Row-ID (pointer) so the AMP can retrieve it quickly if needed. If an AMP contains duplicate first names, only one subtable row for that name is built with multiple Base Row-IDs. See the example above for the Fname = ‘John’ , the subtable holds multiple base row id for this value.

Teradata retrieval of NUSI query.

Suppose on the above example we make a query –

Select * from Employee_table where Fname = ‘John’;

When an NUSI (Fname) is used in the WHERE clause of an SQL statement, the PE Optimizer recognizes the Non-Unique Secondary Index. It will perform an all AMP operation to look into the subtable for the requested value. So the step its perform for retrieval is as follows –

1)      It will hash the value of NUSI (‘John’), by hashing algorithm and found the hash value for it.

2)      Now it will instruct all AMP to look for this hash value in its Employee subtable. Note unlike USI there is no looking into hash map  because each subtable in the AMP contains rows from its own base rows only. So this look up on hash value will be performed on all AMP subtable.

3)       Any AMP which doesn’t have this hash value will not participate anymore in the operation.

4)      When the hash value found the corresponding Base row id will be fetched from the subtable and send to optimizer for actual retrieval of rows.

The point to note here is that NUSI operation is not similar to FTS (full table scan).

Suppose we don’t have Fname as the NUSI and we make the query on Fname in WHERE clause. In this case first of all Fname from the Employee table is redistributed in SPOOL space and then we match our value given in the where clause from the rows in SPOOL.

While in our case where Fname is defined as NUSI, TD optimizer already knows that this column is NUSI and its already distributed by its value in subtable in each AMP. So it will not go for redistribution step instead of that it will directly match the value for it in each subtables.

The PE will decide if a NUSI is strongly selective and worth using over a Full Table Scan. So it’s advisable to always do COLLECT STATS on NUSI index. You can check the Explain function to see if a NUSI is being utilized or if bitmapping (FTS) is taking place.

 Secondary Index Summary


1)      You can have up to 32 secondary indexes for a table.

2)      Secondary Indexes provide an alternate path to the data.

3)      The two types of secondary indexes are USI and NUSI.

4)      Every secondary index defined causes each AMP to create a subtable.

5)      USI subtables are hash distributed.

6)      NUSI subtables are AMP local.

7)      USI queries are Two-AMP operations.

8)      NUSI queries are All-AMP operations, but not Full Table Scans.

9)      Always Collect Statistics on all NUSI indexes.

Any suggestions, corrections or enhancements to the post are most welcome 🙂

97 pings

Skip to comment form

  1. Secondary Index in Teradata » TeraData Tech » BlinkMoth Software Industries | BlinkMoth Software Industries

    […] Teradata Blog Post From Teradata – Google Blog Search: Teradata runs extremely well without secondary indexes, but since secondary indexes use up space […]

Leave a Reply to Panini Cancel reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

This site uses Akismet to reduce spam. Learn how your comment data is processed.