«

»

Feb 25

Partition Primary Index – Basics

Partitioned primary index or PPI is used for physically splitting the table into a series of subtables. With the proper use of Partition primary Index we can save queries from time consuming full table scan. Instead of scanning full table, only one particular partition is accessed.


Follow the example below to get the insight of PPI –

We have an order table (ORDER_TABLE) having two columns – Order_Date and Order_Number, in which PI is defined on Order_Date. The primary Index (Order_Date) was hashed and rows were distributed to the proper AMP based on Row Hash Value then sorted by the Row ID. The distribution of rows will take place as explained in the image below –

PPI_1

Now when we execute Query –

Select * from Order_Table where Order_Date between 1-1-2003 and 1-31-2003;

This query will result in a full table scan despite of Order_Date being PI.

PPI_2

Now we have defined PPI on the column Order_date. The primary Index (Order_Date) was hashed and rows were distributed to the proper AMP based on Row Hash Value then sorted by the Order_Date and not by Row ID. The distribution of rows will take place as explained in the image below –

PPI_3

Now when we execute Query –

Select * from Order_Table where Order_Date between 1-1-2003 and 1-31-2003;

This query will not result in a full table scan because all the January orders are kept together in their partition. 

PPI_4



Partitions are usually defined based on Range or Case as follows.

Partition by CASE

CREATE      TABLE ORDER_Table (

ORDER_ID  integer NOT NULL,

CUST_ID integer NOT NULL,

ORDER_DATE  date ,

ORDER_AMOUNT integer

)

PRIMARY INDEX (CUST_ID)

PARTITION BY case_n ( 

ORDER_AMOUNT  < 10000 ,

ORDER_AMOUNT  < 20000 ,

ORDER_AMOUNT  < 30000,

NO           CASE     OR        UNKNOWN ) ;


Partition by RANGE

CREATE      TABLE ORDER_Table

(

ORDER_ID  integer NOT NULL,

CUST_ID integer NOT NULL,

ORDER_DATE  date ,

ORDER_AMOUNT integer

)

PRIMARY INDEX (CUST_ID)

PARTITION BY range_n (

ORDER_DATE BETWEEN date ‘2012-01-01’ and ‘2012-12-31’ Each interval ‘1’ Month,

NO        range OR  UNKNOWN ) ;

If we use NO RANGE or NO CASE – then all values not in this range will be in a single partition.

If we specify UNKNOWN, then all null values will be placed in this partition


To know about the advantage and disadvantage of PPI please go to link below –

Partition Primary Index – Advantage and Disadvantage

35 comments

Skip to comment form

  1. Sagar

    Hi,

    The explanatin is good but need to understand how the data is been retrieved while quering on PPI?
    Will it exactly query only that partition in AMP instead of having so many number of partitins in AMP?

    1. admin

      Data retrieval process remains same with the exception of an additional check on partition number. If you are using PPI column in where clause then with the help of binary search Teradata will query exact partition, but in case no PPI column is being used in where clause then it needs to scan all the rows (in respective AMP) to find the matching row. This is trade off of using PPI because all queries without PPI column face all rows scan (in single AMP).

      Hope it helps !!!

      1. Sagar

        Thanks..one more question what happens if i define PPI columns other than PI column in that table.How the data is being retrieved.will the performamce increase or decrease by the above method.??

  2. Vijay Jadoun

    Dear admin ,

    thanks for share Wonderfull blog

    Thanks regards
    Vijay Jadoun

  3. Veeresh alkod

    Thank you Sir.

  4. Rajesh Korlepara

    What is the process to change the existing table to PPI table and will there be any change in row distribution after conversion into Partioned primary index?

  5. krishnendu hazra

    Sir can you please kindly explain multiload , the way u have explained fastload?

    fastload explanation was really helpful..

    Regards

  6. viraj raina

    How many amp operation is PPI?

    1. admin

      can be any number of AMP depending on your query. but not full table scan if you are using PPI columns in where clause

  7. Akanksha Gupta

    Range bases query on NUSI column ALWAYS do Full Table Scan OR it may utilize index?

  8. Manoj

    simply superb explanation.

  9. jeffry

    Information is very clear and simple…good work admin/..all the best for your future works

  10. raj

    will there be an impact while loading huge data into tables like performance increase/decrease

    1. admin

      yes definitely.
      if data is huge and your table is defined as SET table, then teradata needs to check uniqueness of each new row inserted. Similar check is required for Unique columns, UPI and USI as well.
      This is just one example where your performance can be degraded because of huge number of rows.

  11. anji

    its useful to me

  12. anji

    gud

  13. Naveen

    Why it is called partition primary index ? if we can do PPI on any column no matter its PI or not .

  14. dilip

    I did not get this. So taking your last example. suppose cust_id is the primary index (NUPI) and you did the partitioning based on case (order_amount). Now i fire query like this:
    sel * from order table where order_amount between 5000 and 15000;

    what does it mean ? we are not quering on the PI (cust_id) at all ! this will result into FTS still..right.
    Also, why is it called PPI if the PI is cust_id and partitioning is done on ‘order_amount’ ? Thanks 🙂

    1. admin

      if order amount is defined as PPI, then optimizer will go for dynamic partition elimination step and pick only those partition which falls into the range specified in WHERE clause. Thus it will not be a FTS operation.

  15. Sindhu

    Hi,
    Very nice explanation. Finally understood PPI.

    Thanks a lot!

  16. CinderElla

    How do I actually view infromation about the partitions? Info like amount of data in each partition, # of rows, date loaded, those kind of items?

    1. admin

      you can write SQL queries to achieve this result. I don’t think there is any teradata tool available for this.

  17. Agilan

    Hi Admin,

    if we didnt mention NO RANGE OR NO CASE or UNKNOWN, what will happen to the records not falling in the range/case?

    Thanks.

  18. vignesh

    Hi,

    The article is really good.. 🙂
    One concern,
    In the example,since jan data is avilable in both the amp’s it would still require a full table scan right?

    1. dblob

      No, because each AMP processes only his own part of the table, especially only his part of the partition of the table..

  19. Muniraj

    Good explanation

  20. Prasanna

    Really superb. Very Simple explanation

    Finally i undeerstood PPI.

    Thanks Admin.

  21. Nagesh

    Good One.

  22. manju

    good explaination.. could have explained about Each interval ‘1’ Month,

  23. Sarika

    I never understood PPI but after reading ur article my doubt is clear.

  24. veera

    I missed ur website all these days 🙁 .. Really helpful …keep posting .

    1. admin

      Hi Veera,

      What do you mean by missed it ? … are you facing some server downtime for my website ?
      Next time please let me know in case you are facing this issue on my site 🙂

  25. Nitin

    Very Well explained 🙂
    Thanks for sharing this info.

    1. admin

      Thanks 🙂

      1. Naveen

        well explained..

        why it is called parition primary index ?
        if we can do PPI on any column no matters its PI or not .

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>