«

»

Dec
25

Join Indexes



The join index JOIN the two tables together and keeps the result set in the permanent space of Teradata. This JOIN index will hold the result set of the two table, and at the time of JOIN parsing engine will decide whether it is fast to build the result set from the actual BASE tables or the JOIN index. User never directly query the JOIN index. In the sense JOIN index is the result of joining two tables together so that parsing engine always decide to take the result set from this JOIN index instead of going and doing manual join on the base table.

Types of JOIN index -
Multi table JOIN index

Suppose we have two BASE tables EMPLOYEE_TABLE and DEP_TABLE, which holds the data of EMPLOYEE and DEPARTMENT respectively. Now a JOIN index on these two tables will be somewhat-

CREATE JOIN INDEX EMP_DEPT
AS
SELECT EMP_NO,EMP_NAME, EMP_DEPT, EMP_SAL, EMP_MGR
FROM EMPLOYEE_TABLE EMP
INNER JOIN DEP_TABLE DEP
ON EMP.EMP_DEPT = DEP.DEPT_NO
UNIQUE PRIMARY INDEX (EMP_NO);
 

This way the JOIN index EMP_DEPT holds the result set of two BASE tables, and at the time of JOIN PE will decide weather it is faster to join actual tables or to take result set from this JOIN index. So always choose wise list of columns and tables to create JOIN index.

Single Table JOIN index

A single table JOIN index duplicate a single table, but changes the primary index. Users will only query the base table and its PE who decide which result set is faster, from JOIN index or from actual BASE tables. The reason to create the single table JOIN index is so joins can be performed faster because no redistribution or duplication needs to occur.

CREATE JOIN INDEX EMP_SNAP
AS
SELECT EMP_NO, EMP_NAME, EMO_DEPT
FROM EMPLOYEE_TABLE
PRIMARY INDEX(EMP_DEPT);

Aggregate JOIN index

An aggregate JOIN index will allow the tracking of Averages SUM and COUNT on any table. This JOIN index is basically used if we need to perform any aggregate function in the data of the table.

CREATE JOIN INDEX AGG_TABLE
SEL
DEPT,
AVG(EMP_SAL)
FROM EMP_SALARY
GROUP BY 1;

The main fundamentals of JOIN indexes are -

    • JOIN index is not a pointer to data it actually store data in PERM space
    • Users never query them directly, its PE who decide which result set to take
    • Updated when base tables are changed
    • Can’t be loaded with Fastload or Multiload.

 


39 comments

No ping yet

  1. Mahesh Zac says:

    Hi Admin,

    I’ve a little doubt in my mind, Is it really required to have a UNIQUE PRIMARY INDEX(EMP_NO) for JOIN INDEX? If it YES, How about distribution for same?

    Can we use another column as UPI for JOIN INDEX, which is not a UPI of same table but column having unique values.

    Regards,
    Mahesh

  2. ravi says:

    Hi Admin,

    When we are creating joining index on 2 tables with 10 columns in the select clause.In my query if we are using 11 th column which is not in the join index will it use join index for join.

    Thanks,
    Ravi

    1. admin says:

      No, if the number of selected columns is more than what given in JI then optimizer will not make use of JI.

  3. admin says:

    Hi All,

    We have started a new special online teradata training batch for developers and DBA profile. For those who are interested to learn teradata can register in this batch. Fees is quite less when you compare it with other batches and biggest advantage is that its a instructor led training batch so you can ask any doubt in the training itself. Also we’ll be covering latest version of Teradata i.e. Teradata 14.10

    Please check following link for more details about this batch –
    http://www.onlineteradatatraining.com/?page_id=96

    Limited Seats. So try to register ASAP.

  4. sam says:

    Hi Amin,

    Thanks for such an simple and clear explanation on topics. I’m late for the show but I’m quite enjoying reading through your blog.

    For the question regarding Single Table Join Index and Secondary Index, I think both are different. In STJI the table is stored in Amps by hash value generated on the column(not on the PI column), in your example its “EMP_DEPT”. But in SI we use subtable concept.

    Please correct me if I’m wrong.

    1. admin says:

      yes,
      in addition to that, Join Index stores actual ROWS (based on your select query of JI) while Secondary Index is like a pointer to the actual base table row.

      1. sam says:

        Thanks !!

  5. Amit says:

    Good explanation about indexes………….Keep posting

    Thanks,
    Amit

  6. Amit says:

    Good explanation about indexes………….Kepp posting

    Thanks,
    Amit

  7. Shashi says:

    Hi Admin,

    Thanks once again for your wonderful blogs…I have one doubt.

    Suppose we have a report requires 10 tables to join , say A1,A2…A10.

    and we have created join index as “Join_index_temp” for joining table A1 to A5.

    Now in the report which joins table A1 to A10 how the syntex should be…joining all the 10 tables or joining “Join_index_temp” and remaining A6 to A10. I want to know how the PE will know whether an Join Index has been created which has join among 1st 5 tables.

    Hope my question is clear..

    Thanks
    Shashi

    1. admin says:

      As soon as you create JI, there is an entry in the DBC.INDICES table (indextype =’j’). If in your reporting query you are selecting those columns defined in JI, then PE will select the rows from JI only. If there is any other column which is not mentioned in your JI then PE will consider the base tables.

  8. Bhawani says:

    appriciate ur explanation, i always reffer ur blogs

  9. Chetan K. says:

    I am confused about the Aggregate Join Index query above. I’ve pasted it here again..

    CREATE JOIN INDEX AGG_TABLE
    SEL
    EMP_NO,
    SUM(EMP_SAL)
    FROM EMP_SALARY
    GROUP BY 1;

    Now, why would I sum the salary of an employee who is GROUP’ed BY EMP_NO only? I am assuming that the EMP_NO is unique in the EMP_SAL table. If it is not, we have a problem. I am trying to understand what performance gain will be achieved by building an AGGREGATE JOIN INDEX on the above query. The above query is the same as..

    SELECT EMP_NO, EMP_SAL FROM EMP_SAL;

    -Chetan

    1. admin says:

      Hi Chetan,

      Thanks for pointing this.
      Your are right, this query is same what you have mentioned below and no significant performance gain.

      We have changed the example now.

      Thanks

  10. sravani says:

    Hi,
    I am new to Teradata. From the above expalination i understood like, single table JOIN INDEX can be treated as Secondery index. Please correct if my understanding is not correct…

    1. admin says:

      Single table JI cannot be treated as SI, because there are lot of differences between these two. The most important one is – SI is like a pointer to the base table row while JI stores the actual row, based on your select query.

  11. Vaishu says:

    Hey admin.. Your explanations are pretty much clear. Appreciate the effort that you are putting. Gr8 job. Thanks for spending Time:):)

  12. Rajagopal says:

    Thank You very very much, it’s really really useful…………

  13. Manju says:

    How does PE gets to know whether a join Index is created for a particular join??

    Suppose I query using the below SQL, then how will PE know that there is a join index EMP_DEPT existing for this join?

    SELECT EMP_NO,EMP_NAME, EMP_DEPT, EMP_SAL, EMP_MGR
    FROM EMPLOYEE_TABLE EMP
    INNER JOIN DEP_TABLE DEP
    ON EMP.EMP_DEPT = DEP.DEPT_NO

    1. admin says:

      As soon as you create any JI, there is an entry in the system table for it . DBC.INDICES where indextype = ‘J’.

  14. priyabrat says:

    Hi Admin,
    I have the following doubts:
    1. Suppose i create a join index on a column then how can i use that join index in my stored procedure ,just like i create stats on some columns and later i just write Collect stats on table_name in my procedure.
    2.How the join indexes are getting updated if new records are inserted into the table.
    3.You have mentioned that :
    The reason to create the single table JOIN index is so joins can be performed faster because no redistribution or duplication needs to occur. Can u please explain how ??
    Thanks

  15. Venkadeshwaran says:

    Hi admin,

    First i would like to say thanks for your great job.

    i have some doubts regarding join index.

    1. Both Join Index and the normal join operations from base table are created by the user only. Then what is the use of creating index and wasting extra PERM space.
    2. How will the PE identify which is faster? is there any criteria for that?

    Thanks in advance.

    1. admin says:

      your answers goes below -

      1)Suppose for your reporting queries you requires join of 10 dimensional table to fetch data. When this query runs it will make data by joining tables at run time and slows the process. If you made join index for 2-3 tables before the reporting query then optimizer knows that it don’t need to make the data at run time instead of that it picks data from join index itself. So the processing time for join of this 2 tables is eliminated. This is just one example i have given in which join index is used to save execution time. Similarly you can have lot of scenarios in real life projects.

      2) Deciding which approach to choose is the optimizer call. This decision is totally based on getting effective explain plan. If optimizer believes that it can fetch the data by normal join processing, then it’s also possible that it wont go for join index.

      1. Venkadeshwaran says:

        Thank you.

  16. Aarsh says:

    Can you please explain partition primary index as well?

  17. Himanshu says:

    hey admin, thanks a lot for all your blogs. They are very easy to understand…

    Could you please explain that when should we create JOIN INDEX ? It would be great if you could provide a real life scenerio to elaborate…

    1. admin says:

      JOIN index will make a join between 2 or more tables and store the result in PERM space. Any changes in the parent tables data will be reflected automatically in JOIN index. So if you are confident enough that in your analysis you always required the same join on 2 or more tables, then its advisable to create JOIN index.

      1. iliyas khan says:

        Hi Admin,

        Please clarify my doubt. As you mentioned -
        “So if you are confident enough that in your analysis you always required the same join on 2 or more tables, then its advisable to create JOIN index.”

        In above we can choose to create a permanent table instead of join index. Why join index is advisable in this scenario ?

        Thanks
        Iliyas Khan

        1. admin says:

          in permanent table you need to add ETL for inserting data into that permanent table.
          while in JOIN INDEX, data loading takes places internally as soon as base tables data changes.

          1. learner says:

            hi admin,

            Thanks for such a great work.

            Can you please tell whether secondary index can replace single table join index. If yes what are the advantages of one over other?

  18. leela prasad says:

    how the system can know that to create any index at the initial stages of table creation

    2. is the join tables should be created by user or system based on queries?

    1. admin says:

      JOIN Index is created by the USER manually, but it is up to optimizer that he wants to make plan considering JOIN index or actual base tables. It picks the least expensive plan which ever is possible.

  19. Anu says:

    Thanks a lot…its easy to understand.I am new to Tera data and its very interesting…
    keep posting good articles..
    Thank you…

    1. admin says:

      welcome :)

  20. abhishek says:

    This explanation is pretty much clear……………………Thanks a LOT…..:)

    1. admin says:

      thanks …

  21. Anith Babu says:

    Thats a good and simple explanation!!!
    Thanks a lot :) :)

  22. Ram says:

    I came across so many discussions forums and websites to understand join indexes. But I didn’t find them useful to understand. Here Very good explanation…Really helpful to understand… Thanks a lot…

    1. admin says:

      Thanks Ram …

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>