«

»

May 16

Secondary Index in Teradata

Before you start with SI, well I must say that as a prerequisite you must first read about the Primary index in Teradata.


So after knowing about Primary index the point here is that when we already have UPI and NUPI then what’s the use of this Secondary Index?

Well the best possible answer for this question is that – Secondary Indexes provide an alternate path to the data, and should be used on queries that run many times.

Teradata runs extremely well without secondary indexes, but since secondary indexes use up space and overhead, they should only be used on “KNOWN QUERIES” or queries that are run over and over again. Once you know the data warehouse, environment you can create secondary indexes to enhance its performance.

Syntax of creating Secondary Index

Syntax of UNIQUE SI:

CREATE UNIQUE INDEX (Column/Columns) ON <dbname>.<tablename >;

Syntax of NON-UNIQUE SI:

CREATE INDEX (Column/Columns) ON <dbname>.<tablename >;

Note – SI can be created even after table is populated with data. Unlike PI which is created only at the time of creation of table. You can create and drop SI at any time.

Whenever you create SI on the table, Teradata will create a subtable on all AMP. This subtable contains three columns given below –

  1. Secondary Index Value
  2. Secondary Index Row ID (this is the hashed value of SI value)
  3. Base table Row ID (this is the actual base row id  )

Will see the use of all these values later in this post.

USI Subtable Example

When we defined a UNIQUE SI on the table, then Teradata will immediately create a USI subtable in each AMP for that particular table.


Remember creation of subtable requires PERM space, so always be wise to choose your SI. Normally the best SI is that column or columns which is mostly used in the WHERE clause.

Now I’ll explain in-depth architecture of creation of subtable and retrieval of SI for better understanding of the concept.

Please look into the image below –

Suppose we have an Employee table (base table) having attributes Emp, Dept Fname, Lname and Soc_security. We defined USI on the column Soc_Security.

You can see the SI subtable created on each AMP which holds information about the SI column and corresponding Base row id (Base Table Row-ID), which is the ROW ID of the actual Employee table. The steps involve to load this subtable is as follows –

1)      Teradata will first create the subtable on all AMP.

2)      After that it hashes the value of this USI column (Soc_Security) and based on that hashed value it check the hash map for the AMP number which will hold this USI value in its subtable.

3)      After getting the respective AMP number, the SI value along with the two more attributes (secondary index row id and base table row id) will be stored in the subtable of that AMP.

In this way we populate our USI subtable on each AMP. As the SI columns is UNIQUE there is no duplication of SI values in any subtable, means each row in the subtable is unique and will fetch only one row when we make a query on that SI column.

Note – As it is clear now that defining SI will require the creation of subtable, so we should be aware that SI requires space cost factor on our Teradata system.

Teradata retrieval of USI query.

Suppose on the above example we make a query –

Select * from Employee_table where Soc_Security = ‘123-99-8888’;

When a TD optimizer finds USI in where clause it knows that it’s a 2 AMP operation and also only one row will be returned. So the step its perform for retrieval is as follows –

1)      It will hash the value of SI (‘123-99-8888’), by hashing algorithm and found the hash value for it.

2)      Now it checks this hash value in the hash map and gets the AMP number from it. We know that this AMP stores this SI value.

3)      Now it will go to the Employee subtable of that AMP and retrieve the Base row id which is stored for that hash value.

4)      This Base row id will be sent back to optimizer by BYNET.

5)      Now optimizer sent back this ROW ID again and fetch the resultant row from the Base table for which Soc_Security = ‘123-99-8888’ .

As we have seen that Teradata system requires 2 AMP to reach the answer row that’s why we called USI operation as the 2 AMP operations. Even if SI row resides in the same AMP  in which Base row reside , still after getting Base row id from the subtable it will sent back to optimizer so that it start search again based on that Base row id. So it’s always called as the 2 amp operation.

NUSI Subtable Example 

When we defined a NUSI on the table then Teradata will build the subtable on each AMP in the same fashion as that in USI. The only difference in this subtable creation is that, instead of building subtable on each AMP it will be build on AMP local which means that each AMP will build the subtable in it to points it own base rows. In other words each NUSI subtable will reflect and points to the those base rows only which it owns.

Please look into the image below –

Suppose we have an Employee table (base table) on which we defined NUSI on the column Fname.

1)      Now Teradata will first create the subtable on all AMP.

2)      Each AMP will hold the secondary index values for their rows in the base table only. In our example, each AMP holds the Fname column for all employee rows in the base table on their AMP (AMP local).

3)      Each AMP Local Fname will have the Base Table Row-ID (pointer) so the AMP can retrieve it quickly if needed. If an AMP contains duplicate first names, only one subtable row for that name is built with multiple Base Row-IDs. See the example above for the Fname = ‘John’ , the subtable holds multiple base row id for this value.

Teradata retrieval of NUSI query.

Suppose on the above example we make a query –

Select * from Employee_table where Fname = ‘John’;

When an NUSI (Fname) is used in the WHERE clause of an SQL statement, the PE Optimizer recognizes the Non-Unique Secondary Index. It will perform an all AMP operation to look into the subtable for the requested value. So the step its perform for retrieval is as follows –

1)      It will hash the value of NUSI (‘John’), by hashing algorithm and found the hash value for it.

2)      Now it will instruct all AMP to look for this hash value in its Employee subtable. Note unlike USI there is no looking into hash map  because each subtable in the AMP contains rows from its own base rows only. So this look up on hash value will be performed on all AMP subtable.

3)       Any AMP which doesn’t have this hash value will not participate anymore in the operation.

4)      When the hash value found the corresponding Base row id will be fetched from the subtable and send to optimizer for actual retrieval of rows.

The point to note here is that NUSI operation is not similar to FTS (full table scan).

Suppose we don’t have Fname as the NUSI and we make the query on Fname in WHERE clause. In this case first of all Fname from the Employee table is redistributed in SPOOL space and then we match our value given in the where clause from the rows in SPOOL.

While in our case where Fname is defined as NUSI, TD optimizer already knows that this column is NUSI and its already distributed by its value in subtable in each AMP. So it will not go for redistribution step instead of that it will directly match the value for it in each subtables.

The PE will decide if a NUSI is strongly selective and worth using over a Full Table Scan. So it’s advisable to always do COLLECT STATS on NUSI index. You can check the Explain function to see if a NUSI is being utilized or if bitmapping (FTS) is taking place.

 Secondary Index Summary


1)      You can have up to 32 secondary indexes for a table.

2)      Secondary Indexes provide an alternate path to the data.

3)      The two types of secondary indexes are USI and NUSI.

4)      Every secondary index defined causes each AMP to create a subtable.

5)      USI subtables are hash distributed.

6)      NUSI subtables are AMP local.

7)      USI queries are Two-AMP operations.

8)      NUSI queries are All-AMP operations, but not Full Table Scans.

9)      Always Collect Statistics on all NUSI indexes.

Any suggestions, corrections or enhancements to the post are most welcome 🙂

85 comments

1 ping

Skip to comment form

  1. vandan

    Hi Admin ,

    Can you let me know why does PI retrieval use row hash and SI retrieval use the entire row id.

    Regards
    vandan

    1. siddarth Sai

      Secondary Index (NUSI) stores values based on entire row id (which is a combination of row hash and uniqueness) because it allows duplicates and in TD each record is stored in the Amp locally based on the row-hash number but when there is duplication a unique id is added to the row-hash to make it unique so that the PE doesn’t get confused. whereas in case of PI since there is no duplication allowed, each record has unique row hash anyway so it uses the row hash

  2. Shiwali

    Awesome way of explanation..i understood it by reading the full article just once. Thanks Nitin, for helping us 🙂

  3. Jayant Vishwakarma

    Hi Admin,

    Could you please tell me. No. of AMP operations for
    UPI, NUPI, USI, NUSI.??

    Thanks in advance.

    1. Abhishek Upadhyay

      Hi Jayant,

      Till the time admin replies below is the number of AMP operations as requested :

      UPI : 1 AMP
      NUPI : 1 AMP
      USI : 2 AMP
      NUSI : All AMP.

  4. raghu

    excellent work

  5. Akshaykumar

    nice work !!!!!

  6. krishna

    hi admin,
    i dont understand what is the use of Secondary index row id in subtable can you explain it with example.
    Thanks,

  7. sanket deshpande

    ABsolutely helpful and superbly explained.great job !!!

  8. Sankaran

    You wiped out all my confusion on SI ….

  9. Ramya

    Perfect explanation!! Well Simplified!!

  10. Anoorag

    Splendid article.Explained things in a very Lucid manner..

  11. Geoffrey

    Very good understanding of Teradata SI ,,,Thank

  12. jyotsna

    Thank you good explanation it is very easy for the freshers to understand the concept in a better way

  13. Pragnya

    It’s really very helpful.The fundamental things in teradata will be cleared by reading this articles.Thanks a lot and keep posting such nice articles..:)

  14. admin

    Hi All,

    We have started a new special online teradata training batch for developers and DBA profile. For those who are interested to learn teradata can register in this batch. Fees is quite less when you compare it with other batches and biggest advantage is that its a instructor led training batch so you can ask any doubt in the training itself. Also we’ll be covering latest version of Teradata i.e. Teradata 14.10

    Please check following link for more details about this batch –
    http://www.onlineteradatatraining.com/?page_id=96

    Limited Seats. So try to register ASAP.

  15. Srini

    Hello Admin..

    I have always confused of how SI worked.. Nice and simple pictorial explaination has helped in the basic understanding of the working of SI.
    Thank you for the post

  16. venu

    Nice Article with detailed explanation

  17. venu

    Nice Article

  18. Anurag

    This website is absolutely marvelous. It presents the concept so beautifully that all the wrong notions in the mind are dispelled and the supporting examples ensure that the concept is not forgotten for a long time.

    1. admin

      Thanks Anurag 🙂

  19. Marco

    Hi.
    the advantage of having a NUSI instead of proper statistics is only that the hasing of the values is allready done instead of on the fly in a query. Right?
    i am doubting when a NUSI is better than statistics.
    regards, Marco

  20. vikram

    wow..gr8 stuff

  21. Kumar

    Excellent explanation!
    I have a question about USI, You said its always 2 AMP operations. But if we have 4 amps in an organization, do we have to say it as 4 amps operation. My understanding is the table will be divided based on PPI on 4 amps and we will have 4 subtables. Correct my understanding.

    thanks
    Kumar

    1. admin

      USI will always be 2 AMP operation, no matter how many AMPS you have in teradata.
      1 AMP operation is required to fetch the base row id from subtable and 2 AMP operation is required to fetch the actual base row.

  22. Rajasekar

    Nice work Nitin … You have beautifully explained the core concepts, not confusing with advanced details… great

    1. admin

      Thanks…

  23. Srini

    Hi Niteen,

    Your posts are awesome and I have a small question.

    In the example you had given for NUSI, its a one AMP operation only but at the end you were saying NUSI is all AMP operation. I’m little confused, can you please explain it once.

    1. admin

      Hi Srini,

      In which example you are seeing NUSI is one AMP operation ? Please clarify on it.

      NUSI is always ALL AMP operation.

      1. Srini

        Hi Niteen,

        SELECT * FROM Employee Table where Fname = ‘John’.

        In the above query only one AMP is involved to get the result.

        Initially I thought its an one AMP operation. But I have realised later on that after searching ALL AMP’s only it comes to know that ‘John’ as first name records are there in only one AMP. So are we calling it as ALL AMP operation, am I correct?

        Please advise if I’m into wrong understanding.

        Thanks,
        Nitin

        1. admin

          Yes,

          We are calling it as ALL AMP operation.

  24. kishor

    wow…nice explanation with proper examples..:)

  25. Ashwini

    Thanks for a clear explanation by using such simple terms.

  26. deepak

    nicely explained 🙂

  27. swetha

    good n awesome article

  28. Sid

    Excellent Article. Basic concepts are been explained in very easy to understand language.

    Can you let me know teh names of some beginner books on TD .
    Thanks,

  29. Naresh reddy

    Thank u very much for your details explanation
    awesome article thank u lot

  30. Bhagya Rajan

    Nice Article !!!!!!!!

    1. admin

      thanks 🙂

  31. Chinmay Athavale

    I understood the concept of PI and SI by going thorugh your post just ones.

    Where as I was struggling to understand this for past few days even after going through lot’s of material and blogs.

    Thank you Nitin for this beautiful explanation and hats off to your hardwork

    Regards,
    Chinmay Athavale

    1. admin

      thanks Chinmay 🙂

  32. Arnab

    Hi Admin,
    In your NUSI explaination,what will be hashed value of “John” in AMP local subtable,if
    in the 2nd AMP we have record like
    70,40,John,Porter,243-123-127(atrribute sequence is kept unaltered)?
    Will that be 145(like it has appeared in 1st AMP subtable)?

  33. pankaj

    Hi,

    Could you please let me know the use of SI row_id.

    Thanks.

  34. Harish

    Whenever we create a secondary index teradata will create subtable on all AMPs.This subtable contains-
    1.SI value
    2.SI row id
    3.Base table row id

    In case of USI after getting the AMP no. from the hash value it goes to employee subtable of that AMP and retrieves base row id why so when subtable stores SI value as stated above what is the need of retrieving base row id from subtable and sending back this base row id to optimizer which fetches SI value from base table?

    1. admin

      i didnt understand your questions 🙁

      sel * from emp where last_name = ‘kumar’;

      in this SI is last_name and result set will be all columns from table emp whose last_name =’kumar’, so it has to fetch the base row id to get all column values.
      whats the confusion here ?

  35. Veera

    Each and every point is useful ….Thanks for sharing …

  36. Niladri

    You really have the boon of explaining complex things in easy words..
    appriciate ur effort..

    1. admin

      Thanks for appreciation 🙂

  37. prabha

    Very nice article.. Thanks..

    1. admin

      Thanks 🙂

  38. Rupali

    Really very nice post.Can you please share some concepts on procedure,macros and functions.

    1. admin

      i guess those are the topics which is not in scope of this blog 🙂

  39. mike

    It’s awesome,
    I dont know that how much time you required to learn this ! But surely you have in depth experience in TERADATA .

    Please tell us about yourself,your experiences.

    You are superb!

  40. Aravinda

    Good work.. It is very easy to understand.. Can you please post the topics on Locks and all 5 types of Utilities.

  41. suhail

    I have read about secondary Index. that is really good to understand. However, I have one question on secondary index. I am posting this, hope I will get correct ans as I am not able to find the correct ans even after reading his topic on the same website.

    What is a reason to use a USI over a NUSI when creating a table?

    a.They ensure data uniqueness.
    b.They are a oneAMP operation.
    c.They are needed for journaling.
    d.They are supported by the batch load utilities.

    Answer :C (I found this ans) but sure about it.

  42. sumit

    Awesome article…. good job admin really finding easy understanding the TD basics….

  43. Aarsh

    Excellent article. Very well explained and the example is super.

  44. madhuri swapna

    Hi Nitin

    Its an awesome article.Please provide more information about Locks used in Teradata with an example.

    Thanks
    madhuri

    1. admin

      Thanks

      Will try to post on locks as well

  45. Manisha

    Hi Admin,
    I am clear about the USI subtables which are hash distributed. But litle confused with the NUSI subtables which are AMP local. It is mentioned that, in the case of NUSI, the subtable in each AMP will be pointing to those base rows, only which they own.

    So is the USI subtable created on each AMP, will be pointing to the entire rows in the base table? If the answer is: no, it will be decided using the hash algorithm, then how the NUSI and USI subtable differ?

    1. admin

      in case of USI, optimizer hashes the USI column and then look into HASH MAP to get the actual AMP where the row reside.
      while in NUSI, optimizer hashes the NUSI column but won’t refer the HASH MAP instead of that it instructs all the AMP to check in there respective NUSI sub table to look for that value.

      data retrieval is the major difference between USI and NUSI subtables.

  46. priyabrat

    nice explanatio

  47. jitendra kumar

    awesome post..

  48. Manisha

    Hi admin,

    The article is really good. Please answer for my questions below:
    1. Can secondary index be defined on a table which dont have primary index defined? Is there any condition like SI can be defined only if PI has defined on the table?

    2. Suppose NUSI has been defined on a table which has few data. 10 rows are inserting into the table later on. How the the AMP subtables include the new rows?

    1. admin

      your answers –

      1) You cannot create a table without defining PI (if you are below TD14 version). Even if you won’t give PI in DDL, then Teradata itself makes the PI for the table.

      2)As soon as you define NUSI, all the rows of the base table will be sorted and placed in same AMP subtable. When you are inserting 10 new rows, then first they will be processed based on PI in respective AMP and after that processed in sub tables based on SI.

      1. Vardhan

        For 1st Question i have one doubt.
        I created one table with “NO PRIMARY INDEX”.
        (CREATE TABLE A(ID INT,NAME VARCHAR(30),SAL INT) NO PRIMARY INDEX)
        I inserted data into table A. and I created UNIQUE SEC.INDEX(id) on A. it was created.

        Note: we can create UNIQUE SI on a table which doesn’t have PI

        CORRECT ME IF IAM WRONG……….

        1. Raju

          Anyone please answer for this above post…i too got same doubt.

          1. admin

            Hi Vardhan

            yes you can create USI or NUSI on the NPI (NO PRIMARY INDEX) table.

            Besides SI you can also create following on NPI –

            • Fallback
            • Join indexes
            • UNIQUE column constraints
            • CHECK constraints
            • PRIMARY KEY and FOREIGN KEY constraints
            • Triggers

          2. Niraj

            Hi Admin,

            Besides SI you can also create following on NPI –

            • PRIMARY KEY and FOREIGN KEY constraints

            I think you mean primary key like in sql can be created on the table with NO PRIMARY INDEX.

            Can I create primary index after creating USI or NUSI on table?

          3. shashi

            Hi Admin,

            How the data will be stored in AMPs for tables with NO Primary Index(NPI) for prior to TD14 versions?

            As i understood data is stored/inserted in tables based on the hashing algorithm on PI. In absence of PI how the data will be stored…

            Thanks
            Shashi

  49. Arun

    Awesome article…Easy to understand… Great work

  50. leela prasad

    hai sir i have a doubt in case of nusi after generation of hash-row index parser has to to send to appropiate base tables .how it can know that it has to send to that base table only with out refering to hashmap for that base table

    1. admin

      when you use NUSI in your SQL query, PE immediately identifies that this is the NUSI and it must have the subtable defined in each AMP. Now it simply hashes the value of that NUSI and it checks in ALL AMPS which contains the subtable of that base tables. As its a ALL AMP operation no need to check it in HASHMAP.

  51. harsha

    Hi,
    here is my doubt ,when we use non index column in where clause all the values are redistributed in the spool and compare the value in the where clause.How it compare, using hashing or with just with column value?

    1. admin

      teradata compare the columns based on their hash value not actual value .

  52. Ravi Kataria

    This is very nice article. It does explain the complex things in simpler fashion.
    It does makes the concept of indexes clear. which is very helpful

    Thanks a lot

  53. Raj

    Awesome article….complicated things explained in amazingly simple way….:hats off:

  54. Sam

    Great work admin . Thanks for ur efforts on doing this

  55. shreya

    article has made complex things easily understandable. 🙂

  56. Vanni

    Nice information about the teradata.i appreciate your detailed explanation.
    Thanks a lot.

    1. admin

      Thanks Vanni 🙂

  57. Panini

    great article.. simple to read and understand the complex architecture.. appreciate your efforts put into this.. hope to see more of these on Teradata..

    1. admin

      thanks 🙂

  58. admin

    Hi Readers,

    I’ll highly appreciate if you leave your feedback as comments after reading this post 🙂

    Thanks
    Nitin

    1. Neeraj

      Nice work Nitin, information gave lot of insight and understanding.

  1. Secondary Index in Teradata » TeraData Tech » BlinkMoth Software Industries | BlinkMoth Software Industries

    […] Teradata Blog Post From Teradata – Google Blog Search: Teradata runs extremely well without secondary indexes, but since secondary indexes use up space […]

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>