Nov 20

Primary Index in Teradata

Each table in Teradata is required to have a primary index.Even if you did not define any primary index in CREATE table statement, the Teradata system will automatically create the primary index based on the setting of DBScontrol setting field . The primary index defines where data will reside and which AMP receives the row.

The three most important roles the primary index does is the following –

  • Data Distribution
  • Fastest way to retrieve Data
  • Incredibly important for Joins

In short primary index provides the fastest physical path to retrieving data.

Two types of Primary Indexes are given below  –

Unique Primary Index (UPI)

A unique primary index means that the value for the selected column must be unique.  In the example below the EMP_ID is the Unique Primary Index.

A unique primary index (UPI) will always spread the rows of the table evenly amongst the AMPs.

Non Unique Primary Index (NUPI)

A Non unique primary index means that the value for the selected column can be non unique.  In the example below the LAST_NAME is the Non Unique Primary Index.

A Non Unique Primary Index (NUPI) will almost never spread the rows of the table evenly amongst the AMPs.

Key Point – Don’t assume that UPI is always the BEST CHOICE; sometimes the use of NUPI is very critical. If you need to perform lot of query with LAST_NAME in the WHERE clause then NUPI on last name is essential.

Primary Index and the ROW HASH

The primary index is the key to determine where the ROW of the table will reside on which AMP. When a new row arrive for insert in Teradata the following steps occur –

  1. Teradata Parsing Engine (PE) examines the primary index of the row.
  2. Teradata takes the primary index of the rows and run it through HASHING ALGORITHM.
  3. The output of the Hashing Algorithm is the 32 bit Row – Hash value.

e.g –

New row coming with following attributes –

Hash the PI value -> 55 / HASHING ALGORITHM = 11110000111100001111000011110000 (Row – Hash)

This value in Binary format is called as the Row – Hash Value. And it will always be unique for the given PI. Means it’s not possible to get two Row – Hash Value for the same EMP_ID (55).

This 32 bit Row – Hash value determines in which AMP the row will reside and it always attached along with the ROW to make it a UNIQUE identification for that ROW.

Now in Teradata we have HASH MAP which contains the different bucket called as Hash Map Buckets. These buckets contain only the different AMP number which is attached with the Teradata system. Suppose that Teradata system contains 4 AMP then this Hash Map contains numbers from 1 to 4 in different buckets. see image below –

Four Amp Hash Map


The Row – hash value which we determine above,  is used to point to certain bucket in the Hash Map. This value points to only one bucket in the Hash Map and that bucket number will determine the AMP number where this new row will reside.

e.g. –

11110000111100001111000011110000 tells that it correspond to first row and fourth column of Hash Map i.e. AMP no = 4

So finally we have the AMP number (4) where the new row will reside.

After doing all this calculation in PE , the PE instruct BYNET to send the row along with its unique Row – Hash Value in the AMP number 4.

The new row along with its Row hash Value is send to AMP number 4 for storing. In this way we achieve the data distribution for each row of the given table.

Uniqueness value

When the AMP receives the row it will place the row into the proper table, and the AMP itself check that is there any other row with the same Row- Hash Value. If this is the first Row with the particular Row- Hash value then it will assign a uniqueness value of 1 along with the row. But if this is the second row with the same Row-Hash value for the same table then it will assign a uniqueness value of 2. Similarly it assign a value of 3 if it finds third occurrence of the same Row – Hash, in this fashion it keeps on adding uniqueness value for the duplicates rows.

If you don’t get this concept at the first go don’t worry, it is explained again with the suitable examples for more clarity.

An example of UPI table, where EMP_ID is the UPI.

Note – as the table has UPI defined on EMP_ID it will always has the unique row in the given table. So each time AMP will assign the uniqueness value of 1 against each row.

An example of NUPI table, where LAST_NAME is the NUPI.

Note – as the table has NUPI defined on LAST_NAME, we have three rows for the same LAST_NAME = ‘Kumar’. Hence the AMP will assign different uniqueness value from 1 t 3 for each row.

Together with Row – Hash and the uniqueness value the Teradata make as 64 bit ROW – ID to uniquely identify each row in the given AMP.

This is the complete reference for Primary Index in Teradata. I tried to make it as simple as possible for the newbies, in case of any correction or clarification feel free to reach  me 🙂

To read about Secondary index in teradata please go to – Secondary Index in Teradata




Skip to comment form

  1. Manish Gairola

    Where Hash map stored in teradata??

  2. hari arun

    what is amp?

    1. admin

      Access Module processor also consider as workers in Teradata

  3. Siddhu

    Is Nonunique primary index is more than 1 amp operation?

    1. Rohit

      Hi Sidhu,

      It will still be a single amp operation.
      Same primary index , unique or non unique, will always hashes to same hash value and allocated same Amp.

      The difference is when teradata searches on the basis of Unique primary index, it stops searching as soon as it gets the row hash for the first time .however in case of non unique primary index it searches again for the possible row hash in the amp.
      but i either case it searches only in one AMP.

      Hope it helps .

      Cheers !!

      1. admin

        Perfect explanation Rohit !!!

  4. learner


    can anyone tell me how the NULL values will be distributed in the AMP’s.


    1. admin

      NULL is treated like just any other value while doing distribution.

  5. Ali

    Hi Admin,

    Very well explained, even better than TD’s documentation. Thanks for sharing the information.

    But, I would like to make a point here, sharing information is good, stopping someone from copying your content for later use is not good. You know there are thousands of ways that the content can still be copied. You get your credit, since sharing is caring so I would recommend not to use the age-old method of not letting someone copying your content.

    Thanks again

    1. admin

      Thanks for your feedback, we understand your point and will see what best we can do for this.

  6. Ritesh kumar

    sorry to say , but I didn’t.

  7. abhishek

    Hello Admin, Nice and quick Explanation of primary indexes….though I have a query here…:
    Let’s assume we have a employee table with Dept_Id as a Primary Index and Emp_ID as a Secondary index…How many AMP operation it wil be when we execute these below queries:

    1) sel * from employee where Dept_id’101′ and Emp_id=’10’
    2) sel * from employee where Dept_id’101′

    Please reply..Thanks in advance

    1. abhishek

      Here Dept_ID !=101

  8. Manju

    Could you please answer the below question:

    If we have a primary index as column1 for table A and composite primary index on table B as (column1, column2). I am loading both the tables using MLOAD utility (seperately).

    Both table A and table B have equal number of data count and their Primary Indexs have same number of unique values.
    The tables has huge volume of data.

    In this case which table will be loaded faster and why?

  9. admin

    Hi All,

    We have started a new special online teradata training batch for developers and DBA profile. For those who are interested to learn teradata can register in this batch. Fees is quite less when you compare it with other batches and biggest advantage is that its a instructor led training batch so you can ask any doubt in the training itself. Also we’ll be covering latest version of Teradata i.e. Teradata 14.10

    Please check following link for more details about this batch –

    Limited Seats. So try to register ASAP.

  10. Ritesh

    Hi Admin
    I’m a newbie to Teradata. I have a got a few doubts :
    1. How do you calculate the Row Hash value(11110000111100001111000011110000 ) for PI value=55 ?

    2. From the hash table given , how do you come to the conclusion that we need to choose 1st row and 4th column to find the AMP no ?

    3. Is uniqueness value a 32-bit binary number ??

    1. Meena

      Hi Rithesh, i also new to Teradata. Did u get the answer for ur query?

      1. Pratik

        Hi Meena & Ritesh,
        If u got the answer, pls do share. I’m also having the same doubt.
        Thanks in Advance.

    2. admin

      here goes your answer –
      1. How do you calculate the Row Hash value(11110000111100001111000011110000 ) for PI value=55?

      Row hash value is calculated using hashing algorithm which is C program residing in the optimizer. This HA will always generate 32-bit binary digit for your PI value. Please don’t confuse by the number (11110000111100001111000011110000 )given here, this is just for understanding purpose, actual binary number for PI = 55 may be different from the one given here

      2. From the hash table given , how do you come to the conclusion that we need to choose 1st row and 4th column to find the AMP no ?

      HASH map is another algorithm residing in optimizer this is also C code which points binary number to the available AMP in Teradata. Again all these examples are for our understanding and we cannot really see the output of code directly in Teradata.

      3. Is uniqueness value a 32-bit binary number ??

  11. Pawanism

    AMP never stores the rows of 2 different tables together.

    Bit confused…

    (1) Am having three tables A,B and C so does it mean that rows of these tables will be stored in three AMPS.
    Table A — AMP1
    Table B– AMP2
    Table C– AMP3
    (2) The above statement is applicable for only identical tables ??

    1. admin

      Am not sure this statement is published in this blog anywhere.

      Actually each AMP stores some portion of each table exist in Teradata.

      1. pawan

        The statement was replied by admin to question of D_Customer and D_customertemp.
        Nice explanation of primary index admin really helps a lot .

    2. Mukul

      well A small part of the table will be stored in each amp of the system irrespective the size and type of the table

  12. uma

    when do we go for full table scan

    1. Balakishore

      Full Table scans typically occur in the following situations:
      (1) If The index columns are not used in the query.
      (2) An index is used in non-equality test.
      (3) A range of values specified for the Primary Index

    2. mukul jain

      Well we go for full table scan when we don’t have any primary or secondary index for the table

    3. admin

      Well, there may be n number of situations for full table scan and some of them are legitimate as well. Scenarios given by Balakishore and Mukul Jain is also 100% correct.

  13. Ivy

    Hi Nitin,

    Nice explanation! This site is great!!
    Got few doubts. Kindly help me.
    1. the output of hash algorithm is 32bit row hash.The 20 or 16bit is called bucket. the entry of hash map. your image above is 1,2,3,4,1,2,3,4……Are all the hash map distributed like this?
    2.Will this method lead distribution skew?

    1. admin

      Thanks, here goes your answers –

      1) No. The representation in the diagram is just to make it easy to understand.
      2) In case of bad PI your data will land into one single AMP, that finally results in skewness.

      1. Ivy

        Dear Admin,
        Thank you for your reply.
        But I just came up with another question.
        How are the AMP numbers distributed in a hash map ?
        In what order ?
        Looking forward to …..

        1. Kanav Narula

          If we have 5 AMP system, Hash Map will have number from 1 to 5 and after 5 it again starts from 1 till 5 comes and it and it is repeated 1 million times because one Hash Map have 1 million buckets.

          1. Mukul

            @kanav :
            Hash mapping for bucket starts with 0 and goes to the number less than the number of AMPS

  14. shrikant

    Very well explained.

    Can you pls explain..how teradata will process any ddl statment or any alter statment.

  15. ratika malik

    Like the explanations

  16. krishna

    hii admin

    while creating table if PI was not specified but PRIMARY KEY constraint imposed on COLUMN_1 and UNIQUE imposedon COLUMN_2 wich column will it consider as PI by default….(COLUMN_1 or COLUMN_2)

    1. Prasad K

      Primary key itself has unique values. It will take primary key as primary index

  17. shivaganga

    Hi Admin,
    Very useful information 🙂
    If in case i am having the duplicated records of PI values then Uniqueness value is generated as 1, 2,3 and so on
    Wanted to know If i am having the same PI values in the table
    How does the AMP identify record of which Table and How.

    1. admin

      AMP has a specific section for each of the table.
      So one value of PI will goes in the section of D_CUSTOMER table, while other duplicate values goes in D_CUSTOMER_Temp table.

      AMP never store the rows of 2 different tables together.

  18. shrikant


    How can i access the unique value along with Hash row?
    My table is having NUPI and i want to select unique record on the basis of Hash row & unique value?

    Is there any substitute of RowID in Teradata 13


    1. admin

      To find hash value try this-
      sel hashrow(columnname) from table;

      to find unique hash value try this –
      sel distinct hashrow(columnname) from table;

      1. shashi

        Hi Admin,

        1st thanks for such a wonderful post…this is very helpful.

        As i understand hashrow is generated based on the PI column, then in the above sql what should be the column name(sel hashrow() from table;), by default we should pass PI as the column name or can we pass any column from the table?

        2nd question is this query valid in Teradata 13?


        1. admin

          1) you can pass any column in hashrow function. It will simply give you the hashed value (32 bit) of that column. If you pass PI column then you’ll get the actual hash value which distribute that complete row.
          2) yes

  19. mike

    Hi Admin,
    Can you tell me that ,how to get Uniqueness value for each row.?

  20. mike

    Thanks Admin, very good article!!!!!!!!!!!!!!!!

  21. Siva

    I am new for Teradata and I moved here from Oracle. This site having very good information and easy to learn those concepts.

    But I unable to copy these information for my learning purpose.
    I think disable the copy option in this site.

    Siva, India.

    1. admin

      Thanks for your appreciation.
      But please understand that contents are banned from copying to protect the originality. However you can use the link for your reference.

      Hope you understand 🙂

  22. Ram

    hi admin

    can you plz explain in which scenarios fulltable scan would be good

    1. admin

      its not about good and bad, optimizer will go for FTS (full table scan) only when there is no other options like (SI,PPI etc).

  23. Mahesh

    If we have 2 tables with the same set of PI columns defined, will it be using the same Hash value and only the Uniqueness ID will change or Hash value cannot be shared across tables?

    1. admin

      in this case hash value will be the same for both the tables, but at the time of retrieval of data, optimizer knows from which table to select the rows.

  24. Rama

    As per the above explanation how to know the Hash value and uniqueness value from the table

    1. Praveen Kumar

      Use the below query to get the has value

      select hasrow(<>) from tablename;

  25. Priya

    Hi could u post on PPI

    1. admin

      very soon 🙂

  26. Surya

    Really useful info…

  27. manohari

    is thr a way i can see how many rows r assigned to which amp??

    1. admin

      sel hashamp(hashbucket(hashrow(PI_COLUMN))), count(*) from tablename group by 1

      where tablename is the table whose PI is PI_COLUMN

      1. ZEROMAN

        Hi admin,
        I executed your above query and got no. of amp’s also i got near about equal partition accross all amp’s.

        Thank You.

  28. abhishek

    Hello Admin

    As you told PI can be nullable. How it then stores that reocrd. I mean what will be the row hash value?

    1. admin

      sel hashrow(NULL)

      by this you can find the ROW HASH value of NULL, which is 00-00-00-00

      1. Madhav

        can u give teradata training…

  29. Manisha

    Hi Nitin,

    Got few doubts. Kindly help me.
    1. Can the PI be nullable?
    2. What is the maximum number of columns in a table which can be defines as PI (like in case of SI, it is 32)

    1. admin

      Hi Manisha,

      here you go –

      1) Yes PI (UPI or NUPI) can be nullable
      2) From TD 13 onward we can give 64 columns as composite PI, in earlier version i guess only 16 columns.

      1. Kumar

        Hi Admin,

        PI can be nullable. It means is it applicable for more than one column as PI or if it is applicable for only one column then how the data distribution will take place with PI column having NULL values?

        1. admin

          if you are defining more than 1 column as PI, then all those columns should be NULL to generate PI as NULL.
          NULL is treated as any other value in TD, so all the NULL values will be collected in a single AMP.

  30. priya

    Hi Admin.. nice explanation 🙂

    I have a doubt about this data selection once distributed among the AMPS.

    Suppose I have a table defined with UPI, using this UPI data has been distributed among the AMPS.Now while querying the table if i use a column other than UPI(in where clause) now i want to know how the PE gets the plan for AMP to retrieve the data from Pdisk. As we know how the PE gets the AMP number sending PI value thru hashing Algorithm whether non PI/SI column is also sent in the same and gets the details..?
    NOTE: (the other column used in where clause is not an UPI or any SI too)

    Admin as your explanations are very spoon feeding and crystal clear pls explain me this scenario it would be gr8 help to me 🙂

    1. admin

      Lets explain this with the help of example –

      suppose i ran the query –

      Sel * from EMP where First_name = ‘admin’

      now in the table EMP First_name is not a PI or SI. So when PE makes the plan for this query it will go for FTS (full table scan).
      In this FTS all the AMPs work in parallel and retrieves the rows of EMP table from its VDISK into SPOOL space. After retrieval of all rows redistribution will take place in this SPOOL space on the hashed value of First_name, so that rows will be sent to its proper AMP based on this new hashed value.
      After doing this redistribution , now PE will hash the value of ‘admin’ and then check it with HASHMAP to get the AMP no where the row will reside.

      Hope it helps 🙂

      1. sush

        If there millions of records then also Amp will retrive the data to spool space first.
        Is this option is applicable to huge data.Please explain

        1. admin

          Hi Sush,

          Teradata works irrespective of data volume. So the processing for 10 rows will be same as that of 10 billion rows. 🙂
          Read Primary index post for understanding of data distribution concept.

          1. shashi

            HI Admin,

            I have few doubts,
            1. In case multiple columns are part of PI in a table and only one out of them are in where clause. in this case full table scan will be done or PE will simply hash based on this 1 column which is part of PI?

            2. please could u clarify again on “rows will be sent to its proper AMP based on the new hashed value”. will this be a permanent row redistribution


          2. admin

            1) Full table scan is done, as PI hash value is combination of columns.
            2) Redistribution on this new hash value will be in SPOOL space.

    2. admin

      Lets explain this with the help of example –

      suppose i ran the query –

      Sel * from EMP where First_name = ‘admin’

      now in the table EMP First_name is not a PI or SI. So when PE makes the plan for this query it will go for FTS (full table scan).
      In this FTS all the AMPs work in parallel and retrieves the rows of EMP table from its VDISK into SPOOL space. After retrieval of all rows redistribution will take place in this SPOOL space on the hashed value of First_name, so that rows will be sent to its proper AMP based on this new hashed value.
      After doing this redistribution , now PE will hash the value of ‘admin’ and then check it with HASHMAP to get the AMP no where the row will reside.

      Hope it helps 🙂

      1. Learner

        Hi Admin,

        Great explainations,
        Got one doubt
        1.While the redistribution happens and the rows are redistributed in the AMPs, are the exixting rows for the same tables overwriten?
        2. If yes, is the redistribution maintained till the next transaction?

  31. narsing

    how u got the row hash nimber 55 is 11110000 111100000 111110000 11110000
    first row fourth column

    1. admin

      we get this number through HASHING ALGORITHM, it’s a C program.
      Now the point to note here is that the binary number which i have written is only for example purpose, its not the exact number which we get by hashing 55 😉

  32. Raju

    Hi ,

    I have one dbout please clarifiey.

    you are saying in above post that –>11110000111100001111000011110000 tells that its corresponig to the first row and fourt column of HASH MAP i.e., AMPno=4 . I am not getting how it decides based on the binary number it should go to AMP no=4..please explain me ?

    Thanks ni advance

    1. admin

      the row hash which we are getting i.e. 32 bit binary number is the indicator only that can point to a certain bucket in HASH MAP. Treat this as a pointer to that bucket.

  33. Rohan

    Loved it..!!
    explained beautifully..!!
    Mast !!

    1. admin

      thanks 🙂

  34. TDLearner

    Fantastic Explanation for new in teradata.

  35. suhasini

    Hi really good to see your site. More useful info and it helps a lot.

  36. Leo Oberto

    Hi Admin,

    Love your posts!

    I was reading your post and I come across about USI (Unique Secondary Index). Is there similar to UPI but adding another hash -row to another AMP?

    Thanks and Regards.

    1. admin

      Yes Leo

      USI will store in some another AMP in the same fashion as UPI, it also create it owns SI subtable in each AMP for storage

  37. Jan

    I’m curious. In which part of the PE where the computation takes place? Dispatcher?

  38. ghouse

    Nice Explanation .easy to understand and can grasp in short span of time.Could you please explain the usage of uniqueness value in NUPI .
    Suppose select * from where last_name=’kumar’ (NUPI is defined on the table).Based on the rowhash value only it fetches the data.If yes please explain the significance of uniqueness value
    Thank you ..

  39. Debraj Mukherjee

    Hi Nitin, A good post especially for the newbies but there is one small thing regarding the primary index that you have overlooked. If you don’t specify the primary index in the DDL of a table then Teradata doesn’t assume the 1st column as primary index. There are 3 checks involved namely:

    1. Teradata looks for the 1st Unique Column of the table and if there is a Unique column present for that table then Teradata Chooses it as a Unique primary index
    2. If there is a primary key defined (for example in a conceptual data model) and you are using the conceptual data model to create the table then Teradata will take the Primary key column as a Primary Index
    3. If any of the above two cases are not satisfied then Teradata takes the first column on the table and creates a NUPI on that column

    You can correct me if I’m wrong.

    1. admin

      Hi Debraj,

      Thanks for providing your inputs. 🙂

      All the three points are perfectly valid and also very well explained, thanks for enriching my post.

      1. Anuj M

        Hi Nitin,

        How are you?
        I created one table below in TD13.11 :

        create table ttmp.abcde_anujma
        ename varchar(30),
        –eid integer unique not null, — UNIQUE(eid)
        –eid integer, –PI is ename
        eid integer not null, –PI is ename
        esal decimal(15,2)

        You can check my comment on PI with 3 different types of EID column.
        I was wondering if first condition mentioned by Debraj holds true or not?

    2. vishal

      Debraj..how it checks for the unique coulmn when there is no data in table??

      1. admin

        there is a UNIQUE constraint in SQL which we can define in the DDL of the table. Based on that constratint it check that columns will be UNIQUE or NON UNIQUE.

    3. Kumar

      Hi Debraj,
      About point 1 & 2, i am not getting it correctly and bit confused..(. Can you give more details regarding the same. Also 1st unique column of the table
      — Is it unique value of that column?
      — Is it unique column itself, then how to find the unique column?

      1. admin

        Don’t confuse between UNIQUE value and column.
        At the time of creation of table when we are defining the DDL, then obviously there is no data in the table.
        If you have given any UNIQUE constraint on the column then TD will pick that column as UPI or else if you have defined Primary key constraint then agin that column will be defined as PI.
        if neither of the above is true then it will make first column as PI.

        UNIQUE column here signifies the UNIQUE constraint on the DDL.

  40. Andrew J. Gephart

    Do you people have a facebook fan page? I searched for one on facebook or twitter but could not discover one, I’d really like to become a fan!

  41. SQL_Server_DBA

    Slick way of explanation.
    I will be benefited if you could explain the concept and usage of PPI & MPPI too.

    Thanks 🙂

    1. admin


      Actually i purposely tried to make all my post as simple as possible for newbies, so am afraid that it has nothing to offer you 🙁

      But i will try to make another post on PPI and MPPI as mentioned by you 🙂

      1. sugesh

        awsum !!! i understood….. looking for the same on other teradata topics

  42. admin

    Done 🙂

  43. gaurav sharma

    In NUPI table example.. i think there is some error…row hash with uniqueness ness value 3 is different.

    1. admin

      Yes Gaurav, you are correct.
      i was just checking that anybody is reading my blog properly or not …. and its good to find that i have some active users also 😛

      my mistake .. will update it asap

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>