«

»

Nov
20

Primary Index in Teradata

Each table in Teradata is required to have a primary index.Even if you did not define any primary index in CREATE table statement, the Teradata system will automatically create the primary index based on the setting of DBScontrol setting field . The primary index defines where data will reside and which AMP receives the row.

The three most important roles the primary index does is the following –

  • Data Distribution
  • Fastest way to retrieve Data
  • Incredibly important for Joins

In short primary index provides the fastest physical path to retrieving data.

Two types of Primary Indexes are given below  –

Unique Primary Index (UPI)

A unique primary index means that the value for the selected column must be unique.  In the example below the EMP_ID is the Unique Primary Index.

A unique primary index (UPI) will always spread the rows of the table evenly amongst the AMPs.

Non Unique Primary Index (NUPI)

A Non unique primary index means that the value for the selected column can be non unique.  In the example below the LAST_NAME is the Non Unique Primary Index.

A Non Unique Primary Index (NUPI) will almost never spread the rows of the table evenly amongst the AMPs.

Key Point – Don’t assume that UPI is always the BEST CHOICE; sometimes the use of NUPI is very critical. If you need to perform lot of query with LAST_NAME in the WHERE clause then NUPI on last name is essential.

Primary Index and the ROW HASH


The primary index is the key to determine where the ROW of the table will reside on which AMP. When a new row arrive for insert in Teradata the following steps occur –

  1. Teradata Parsing Engine (PE) examines the primary index of the row.
  2. Teradata takes the primary index of the rows and run it through HASHING ALGORITHM.
  3. The output of the Hashing Algorithm is the 32 bit Row – Hash value.

e.g –

New row coming with following attributes –


Hash the PI value -> 55 / HASHING ALGORITHM = 11110000111100001111000011110000 (Row – Hash)

This value in Binary format is called as the Row – Hash Value. And it will always be unique for the given PI. Means it’s not possible to get two Row – Hash Value for the same EMP_ID (55).

This 32 bit Row – Hash value determines in which AMP the row will reside and it always attached along with the ROW to make it a UNIQUE identification for that ROW.

Now in Teradata we have HASH MAP which contains the different bucket called as Hash Map Buckets. These buckets contain only the different AMP number which is attached with the Teradata system. Suppose that Teradata system contains 4 AMP then this Hash Map contains numbers from 1 to 4 in different buckets. see image below –

Four Amp Hash Map

 

The Row – hash value which we determine above,  is used to point to certain bucket in the Hash Map. This value points to only one bucket in the Hash Map and that bucket number will determine the AMP number where this new row will reside.

e.g. –

11110000111100001111000011110000 tells that it correspond to first row and fourth column of Hash Map i.e. AMP no = 4

So finally we have the AMP number (4) where the new row will reside.

After doing all this calculation in PE , the PE instruct BYNET to send the row along with its unique Row – Hash Value in the AMP number 4.

The new row along with its Row hash Value is send to AMP number 4 for storing. In this way we achieve the data distribution for each row of the given table.

Uniqueness value


When the AMP receives the row it will place the row into the proper table, and the AMP itself check that is there any other row with the same Row- Hash Value. If this is the first Row with the particular Row- Hash value then it will assign a uniqueness value of 1 along with the row. But if this is the second row with the same Row-Hash value for the same table then it will assign a uniqueness value of 2. Similarly it assign a value of 3 if it finds third occurrence of the same Row – Hash, in this fashion it keeps on adding uniqueness value for the duplicates rows.

If you don’t get this concept at the first go don’t worry, it is explained again with the suitable examples for more clarity.

An example of UPI table, where EMP_ID is the UPI.


Note – as the table has UPI defined on EMP_ID it will always has the unique row in the given table. So each time AMP will assign the uniqueness value of 1 against each row.

An example of NUPI table, where LAST_NAME is the NUPI.


Note – as the table has NUPI defined on LAST_NAME, we have three rows for the same LAST_NAME = ‘Kumar’. Hence the AMP will assign different uniqueness value from 1 t 3 for each row.

Together with Row – Hash and the uniqueness value the Teradata make as 64 bit ROW – ID to uniquely identify each row in the given AMP.

This is the complete reference for Primary Index in Teradata. I tried to make it as simple as possible for the newbies, in case of any correction or clarification feel free to reach  me :)

To read about Secondary index in teradata please go to – Secondary Index in Teradata

 

 

85 comments

No ping yet

  1. Ali says:

    Hi Admin,

    Very well explained, even better than TD’s documentation. Thanks for sharing the information.

    But, I would like to make a point here, sharing information is good, stopping someone from copying your content for later use is not good. You know there are thousands of ways that the content can still be copied. You get your credit, since sharing is caring so I would recommend not to use the age-old method of not letting someone copying your content.

    Thanks again

  2. Ritesh kumar says:

    sorry to say , but I didn’t.

  3. abhishek says:

    Hello Admin, Nice and quick Explanation of primary indexes….though I have a query here…:
    Let’s assume we have a employee table with Dept_Id as a Primary Index and Emp_ID as a Secondary index…How many AMP operation it wil be when we execute these below queries:

    1) sel * from employee where Dept_id’101′ and Emp_id=’10’
    2) sel * from employee where Dept_id’101′

    Please reply..Thanks in advance

    1. abhishek says:

      Here Dept_ID !=101

  4. Manju says:

    Could you please answer the below question:

    If we have a primary index as column1 for table A and composite primary index on table B as (column1, column2). I am loading both the tables using MLOAD utility (seperately).

    Both table A and table B have equal number of data count and their Primary Indexs have same number of unique values.
    The tables has huge volume of data.

    In this case which table will be loaded faster and why?

  5. admin says:

    Hi All,

    We have started a new special online teradata training batch for developers and DBA profile. For those who are interested to learn teradata can register in this batch. Fees is quite less when you compare it with other batches and biggest advantage is that its a instructor led training batch so you can ask any doubt in the training itself. Also we’ll be covering latest version of Teradata i.e. Teradata 14.10

    Please check following link for more details about this batch –
    http://www.onlineteradatatraining.com/?page_id=96

    Limited Seats. So try to register ASAP.

  6. Ritesh says:

    Hi Admin
    I’m a newbie to Teradata. I have a got a few doubts :
    1. How do you calculate the Row Hash value(11110000111100001111000011110000 ) for PI value=55 ?

    2. From the hash table given , how do you come to the conclusion that we need to choose 1st row and 4th column to find the AMP no ?

    3. Is uniqueness value a 32-bit binary number ??

    1. Meena says:

      Hi Rithesh, i also new to Teradata. Did u get the answer for ur query?

      1. Pratik says:

        Hi Meena & Ritesh,
        If u got the answer, pls do share. I’m also having the same doubt.
        Thanks in Advance.

  7. Pawanism says:

    AMP never stores the rows of 2 different tables together.

    Bit confused…

    (1) Am having three tables A,B and C so does it mean that rows of these tables will be stored in three AMPS.
    Table A — AMP1
    Table B– AMP2
    Table C– AMP3
    (2) The above statement is applicable for only identical tables ??

    1. admin says:

      Am not sure this statement is published in this blog anywhere.

      Actually each AMP stores some portion of each table exist in Teradata.

      1. pawan says:

        The statement was replied by admin to question of D_Customer and D_customertemp.
        Nice explanation of primary index admin really helps a lot .

  8. uma says:

    when do we go for full table scan

    1. Balakishore says:

      Full Table scans typically occur in the following situations:
      (1) If The index columns are not used in the query.
      (2) An index is used in non-equality test.
      (3) A range of values specified for the Primary Index

  9. Ivy says:

    Hi Nitin,

    Nice explanation! This site is great!!
    Got few doubts. Kindly help me.
    1. the output of hash algorithm is 32bit row hash.The 20 or 16bit is called bucket. the entry of hash map. your image above is 1,2,3,4,1,2,3,4……Are all the hash map distributed like this?
    2.Will this method lead distribution skew?

    1. admin says:

      Thanks, here goes your answers –

      1) No. The representation in the diagram is just to make it easy to understand.
      2) In case of bad PI your data will land into one single AMP, that finally results in skewness.

      1. Ivy says:

        Dear Admin,
        Thank you for your reply.
        But I just came up with another question.
        How are the AMP numbers distributed in a hash map ?
        In what order ?
        Looking forward to …..
        Ivy

        1. Kanav Narula says:

          If we have 5 AMP system, Hash Map will have number from 1 to 5 and after 5 it again starts from 1 till 5 comes and it and it is repeated 1 million times because one Hash Map have 1 million buckets.

  10. shrikant says:

    Very well explained.

    Can you pls explain..how teradata will process any ddl statment or any alter statment.

  11. ratika malik says:

    Like the explanations

  12. krishna says:

    hii admin

    while creating table if PI was not specified but PRIMARY KEY constraint imposed on COLUMN_1 and UNIQUE imposedon COLUMN_2 wich column will it consider as PI by default….(COLUMN_1 or COLUMN_2)

    1. Prasad K says:

      Primary key itself has unique values. It will take primary key as primary index

  13. shivaganga says:

    Hi Admin,
    Very useful information :)
    If in case i am having the duplicated records of PI values then Uniqueness value is generated as 1, 2,3 and so on
    Wanted to know If i am having the same PI values in the table
    D_CUSTOMER and D_CUSTOMER_Temp
    How does the AMP identify record of which Table and How.

    1. admin says:

      AMP has a specific section for each of the table.
      So one value of PI will goes in the section of D_CUSTOMER table, while other duplicate values goes in D_CUSTOMER_Temp table.

      AMP never store the rows of 2 different tables together.

  14. shrikant says:

    Hello

    How can i access the unique value along with Hash row?
    My table is having NUPI and i want to select unique record on the basis of Hash row & unique value?

    Is there any substitute of RowID in Teradata 13

    Thanks

    1. admin says:

      To find hash value try this-
      sel hashrow(columnname) from table;

      to find unique hash value try this –
      sel distinct hashrow(columnname) from table;

      1. shashi says:

        Hi Admin,

        1st thanks for such a wonderful post…this is very helpful.

        As i understand hashrow is generated based on the PI column, then in the above sql what should be the column name(sel hashrow() from table;), by default we should pass PI as the column name or can we pass any column from the table?

        2nd question is this query valid in Teradata 13?

        Thanks
        Shashi

        1. admin says:

          1) you can pass any column in hashrow function. It will simply give you the hashed value (32 bit) of that column. If you pass PI column then you’ll get the actual hash value which distribute that complete row.
          2) yes

  15. mike says:

    Hi Admin,
    Can you tell me that ,how to get Uniqueness value for each row.?

  16. mike says:

    Thanks Admin, very good article!!!!!!!!!!!!!!!!

  17. Siva says:

    I am new for Teradata and I moved here from Oracle. This site having very good information and easy to learn those concepts.

    But I unable to copy these information for my learning purpose.
    I think disable the copy option in this site.

    Thanks,
    Siva, India.

    1. admin says:

      Thanks for your appreciation.
      But please understand that contents are banned from copying to protect the originality. However you can use the link for your reference.

      Hope you understand :)

  18. Ram says:

    hi admin

    can you plz explain in which scenarios fulltable scan would be good

    1. admin says:

      its not about good and bad, optimizer will go for FTS (full table scan) only when there is no other options like (SI,PPI etc).

  19. Mahesh says:

    Hi,
    If we have 2 tables with the same set of PI columns defined, will it be using the same Hash value and only the Uniqueness ID will change or Hash value cannot be shared across tables?

    1. admin says:

      in this case hash value will be the same for both the tables, but at the time of retrieval of data, optimizer knows from which table to select the rows.

  20. Rama says:

    Hi,
    As per the above explanation how to know the Hash value and uniqueness value from the table

    1. Praveen Kumar says:

      Use the below query to get the has value

      select hasrow(<>) from tablename;

  21. Priya says:

    Hi could u post on PPI

    1. admin says:

      very soon :)

  22. Surya says:

    Really useful info…

  23. manohari says:

    is thr a way i can see how many rows r assigned to which amp??

    1. admin says:

      sel hashamp(hashbucket(hashrow(PI_COLUMN))), count(*) from tablename group by 1

      where tablename is the table whose PI is PI_COLUMN

      1. ZEROMAN says:

        Hi admin,
        I executed your above query and got no. of amp’s also i got near about equal partition accross all amp’s.

        Thank You.

  24. abhishek says:

    Hello Admin

    As you told PI can be nullable. How it then stores that reocrd. I mean what will be the row hash value?

    1. admin says:

      sel hashrow(NULL)

      by this you can find the ROW HASH value of NULL, which is 00-00-00-00

      1. Madhav says:

        can u give teradata training…

  25. Manisha says:

    Hi Nitin,

    Got few doubts. Kindly help me.
    1. Can the PI be nullable?
    2. What is the maximum number of columns in a table which can be defines as PI (like in case of SI, it is 32)

    1. admin says:

      Hi Manisha,

      here you go –

      1) Yes PI (UPI or NUPI) can be nullable
      2) From TD 13 onward we can give 64 columns as composite PI, in earlier version i guess only 16 columns.

      1. Kumar says:

        Hi Admin,

        PI can be nullable. It means is it applicable for more than one column as PI or if it is applicable for only one column then how the data distribution will take place with PI column having NULL values?

        1. admin says:

          if you are defining more than 1 column as PI, then all those columns should be NULL to generate PI as NULL.
          NULL is treated as any other value in TD, so all the NULL values will be collected in a single AMP.

  26. priya says:

    Hi Admin.. nice explanation :)

    I have a doubt about this data selection once distributed among the AMPS.

    Suppose I have a table defined with UPI, using this UPI data has been distributed among the AMPS.Now while querying the table if i use a column other than UPI(in where clause) now i want to know how the PE gets the plan for AMP to retrieve the data from Pdisk. As we know how the PE gets the AMP number sending PI value thru hashing Algorithm whether non PI/SI column is also sent in the same and gets the details..?
    NOTE: (the other column used in where clause is not an UPI or any SI too)

    Admin as your explanations are very spoon feeding and crystal clear pls explain me this scenario it would be gr8 help to me :)

    1. admin says:

      Lets explain this with the help of example –

      suppose i ran the query –

      Sel * from EMP where First_name = ‘admin’

      now in the table EMP First_name is not a PI or SI. So when PE makes the plan for this query it will go for FTS (full table scan).
      In this FTS all the AMPs work in parallel and retrieves the rows of EMP table from its VDISK into SPOOL space. After retrieval of all rows redistribution will take place in this SPOOL space on the hashed value of First_name, so that rows will be sent to its proper AMP based on this new hashed value.
      After doing this redistribution , now PE will hash the value of ‘admin’ and then check it with HASHMAP to get the AMP no where the row will reside.

      Hope it helps :)

      1. sush says:

        If there millions of records then also Amp will retrive the data to spool space first.
        Is this option is applicable to huge data.Please explain

        1. admin says:

          Hi Sush,

          Teradata works irrespective of data volume. So the processing for 10 rows will be same as that of 10 billion rows. :)
          Read Primary index post for understanding of data distribution concept.

          1. shashi says:

            HI Admin,

            I have few doubts,
            1. In case multiple columns are part of PI in a table and only one out of them are in where clause. in this case full table scan will be done or PE will simply hash based on this 1 column which is part of PI?

            2. please could u clarify again on “rows will be sent to its proper AMP based on the new hashed value”. will this be a permanent row redistribution

            Thanks
            Shashi

          2. admin says:

            1) Full table scan is done, as PI hash value is combination of columns.
            2) Redistribution on this new hash value will be in SPOOL space.

    2. admin says:

      Lets explain this with the help of example –

      suppose i ran the query –

      Sel * from EMP where First_name = ‘admin’

      now in the table EMP First_name is not a PI or SI. So when PE makes the plan for this query it will go for FTS (full table scan).
      In this FTS all the AMPs work in parallel and retrieves the rows of EMP table from its VDISK into SPOOL space. After retrieval of all rows redistribution will take place in this SPOOL space on the hashed value of First_name, so that rows will be sent to its proper AMP based on this new hashed value.
      After doing this redistribution , now PE will hash the value of ‘admin’ and then check it with HASHMAP to get the AMP no where the row will reside.

      Hope it helps :)

      1. Learner says:

        Hi Admin,

        Great explainations,
        Got one doubt
        1.While the redistribution happens and the rows are redistributed in the AMPs, are the exixting rows for the same tables overwriten?
        2. If yes, is the redistribution maintained till the next transaction?

  27. narsing says:

    how u got the row hash nimber 55 is 11110000 111100000 111110000 11110000
    first row fourth column

    1. admin says:

      we get this number through HASHING ALGORITHM, it’s a C program.
      Now the point to note here is that the binary number which i have written is only for example purpose, its not the exact number which we get by hashing 55 ;)

  28. Raju says:

    Hi ,

    I have one dbout please clarifiey.

    you are saying in above post that –>11110000111100001111000011110000 tells that its corresponig to the first row and fourt column of HASH MAP i.e., AMPno=4 . I am not getting how it decides based on the binary number it should go to AMP no=4..please explain me ?

    Thanks ni advance

    1. admin says:

      the row hash which we are getting i.e. 32 bit binary number is the indicator only that can point to a certain bucket in HASH MAP. Treat this as a pointer to that bucket.

  29. Rohan says:

    Loved it..!!
    explained beautifully..!!
    Mast !!

    1. admin says:

      thanks :)

  30. TDLearner says:

    Fantastic Explanation for new in teradata.

  31. suhasini says:

    Hi really good to see your site. More useful info and it helps a lot.

  32. Leo Oberto says:

    Hi Admin,

    Love your posts!

    I was reading your post and I come across about USI (Unique Secondary Index). Is there similar to UPI but adding another hash -row to another AMP?

    Thanks and Regards.

    1. admin says:

      Yes Leo

      USI will store in some another AMP in the same fashion as UPI, it also create it owns SI subtable in each AMP for storage

  33. Jan says:

    I’m curious. In which part of the PE where the computation takes place? Dispatcher?

  34. ghouse says:

    Nice Explanation .easy to understand and can grasp in short span of time.Could you please explain the usage of uniqueness value in NUPI .
    Suppose select * from where last_name=’kumar’ (NUPI is defined on the table).Based on the rowhash value only it fetches the data.If yes please explain the significance of uniqueness value
    Thank you ..

  35. Debraj Mukherjee says:

    Hi Nitin, A good post especially for the newbies but there is one small thing regarding the primary index that you have overlooked. If you don’t specify the primary index in the DDL of a table then Teradata doesn’t assume the 1st column as primary index. There are 3 checks involved namely:

    1. Teradata looks for the 1st Unique Column of the table and if there is a Unique column present for that table then Teradata Chooses it as a Unique primary index
    2. If there is a primary key defined (for example in a conceptual data model) and you are using the conceptual data model to create the table then Teradata will take the Primary key column as a Primary Index
    3. If any of the above two cases are not satisfied then Teradata takes the first column on the table and creates a NUPI on that column

    You can correct me if I’m wrong.

    1. admin says:

      Hi Debraj,

      Thanks for providing your inputs. :)

      All the three points are perfectly valid and also very well explained, thanks for enriching my post.

      1. Anuj M says:

        Hi Nitin,

        How are you?
        I created one table below in TD13.11 :

        create table ttmp.abcde_anujma
        (
        ename varchar(30),
        –eid integer unique not null, — UNIQUE(eid)
        –eid integer, –PI is ename
        eid integer not null, –PI is ename
        esal decimal(15,2)
        );

        You can check my comment on PI with 3 different types of EID column.
        I was wondering if first condition mentioned by Debraj holds true or not?

    2. vishal says:

      Debraj..how it checks for the unique coulmn when there is no data in table??

      1. admin says:

        there is a UNIQUE constraint in SQL which we can define in the DDL of the table. Based on that constratint it check that columns will be UNIQUE or NON UNIQUE.

    3. Kumar says:

      Hi Debraj,
      About point 1 & 2, i am not getting it correctly and bit confused..(. Can you give more details regarding the same. Also 1st unique column of the table
      — Is it unique value of that column?
      — Is it unique column itself, then how to find the unique column?

      1. admin says:

        Don’t confuse between UNIQUE value and column.
        At the time of creation of table when we are defining the DDL, then obviously there is no data in the table.
        If you have given any UNIQUE constraint on the column then TD will pick that column as UPI or else if you have defined Primary key constraint then agin that column will be defined as PI.
        if neither of the above is true then it will make first column as PI.

        UNIQUE column here signifies the UNIQUE constraint on the DDL.

  36. Andrew J. Gephart says:

    Do you people have a facebook fan page? I searched for one on facebook or twitter but could not discover one, I’d really like to become a fan!

  37. SQL_Server_DBA says:

    Slick way of explanation.
    I will be benefited if you could explain the concept and usage of PPI & MPPI too.

    Thanks :-)

    1. admin says:

      Hi,

      Actually i purposely tried to make all my post as simple as possible for newbies, so am afraid that it has nothing to offer you :(

      But i will try to make another post on PPI and MPPI as mentioned by you :)

      1. sugesh says:

        awsum !!! i understood….. looking for the same on other teradata topics

  38. admin says:

    Done :)

  39. gaurav sharma says:

    In NUPI table example.. i think there is some error…row hash with uniqueness ness value 3 is different.

    1. admin says:

      Yes Gaurav, you are correct.
      i was just checking that anybody is reading my blog properly or not …. and its good to find that i have some active users also :P

      my mistake .. will update it asap

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>