Apr 21

SET or MULTISET tables



In Teradata, every table is either a SET table or a MULTISET table.

The difference between the two is:


SET tables – SET tables do not allow duplicate rows. Two rows are duplicates only when they match in every column.

e.g.

COLUMN 1    COLUMN 2    COLUMN 3    COLUMN 4
A           b           c           d
H           g           y           k
A           b           c           d           (Not allowed)

Syntax

CREATE SET TABLE <table_name> ………

………
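
As a minimal sketch of the example above, assuming a hypothetical table named set_demo with four single-character columns and a non-unique primary index on the first column (none of these names come from the original post):

```sql
-- Hypothetical table for illustration only; names and types are assumptions.
CREATE SET TABLE set_demo
(
    col1 CHAR(1),
    col2 CHAR(1),
    col3 CHAR(1),
    col4 CHAR(1)
)
PRIMARY INDEX (col1);

INSERT INTO set_demo VALUES ('A', 'b', 'c', 'd');  -- accepted
INSERT INTO set_demo VALUES ('H', 'g', 'y', 'k');  -- accepted
INSERT INTO set_demo VALUES ('A', 'b', 'c', 'd');  -- rejected with a duplicate row error
```

Only a row that matches an existing row in every column is rejected; a row such as ('A', 'b', 'c', 'x') would still be accepted.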

 

MULTISET tables – MULTISET tables allow duplicate rows.

e.g.

COLUMN 1    COLUMN 2    COLUMN 3    COLUMN 4
A           b           c           d
H           g           y           k
A           b           c           d           (allowed)

Syntax

CREATE MULTISET TABLE <table_name>………

………
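
The same rows loaded into a MULTISET table can be sketched as follows. The table name multiset_demo and its column definitions are hypothetical and chosen only to mirror the example:

```sql
-- Hypothetical table for illustration only; names and types are assumptions.
CREATE MULTISET TABLE multiset_demo
(
    col1 CHAR(1),
    col2 CHAR(1),
    col3 CHAR(1),
    col4 CHAR(1)
)
PRIMARY INDEX (col1);

INSERT INTO multiset_demo VALUES ('A', 'b', 'c', 'd');  -- accepted
INSERT INTO multiset_demo VALUES ('H', 'g', 'y', 'k');  -- accepted
INSERT INTO multiset_demo VALUES ('A', 'b', 'c', 'd');  -- accepted: duplicate rows are allowed

-- Counts the copies of the duplicated row.
SELECT COUNT(*) FROM multiset_demo WHERE col1 = 'A';
```
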

If neither keyword is specified in the DDL, the default depends on the session's transaction semantics: a session in Teradata mode creates a SET table, while a session in ANSI mode creates a MULTISET table. A SET table forces Teradata to check for duplicate rows every time a row is inserted or updated. This check is an overhead on system resources when a massive number of rows has to be inserted.
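
Because the default can differ between sessions (a comment below notes that it depends on the transaction semantics), one way to avoid surprises is to check what was actually created. A small sketch, assuming a hypothetical table named default_demo:

```sql
-- Shows session settings, including the transaction semantics (Teradata or ANSI).
HELP SESSION;

-- Hypothetical table created without the SET or MULTISET keyword.
CREATE TABLE default_demo
(
    id  INTEGER,
    val VARCHAR(20)
)
PRIMARY INDEX (id);

-- The returned DDL states explicitly whether the table is SET or MULTISET.
SHOW TABLE default_demo;
```

Writing CREATE SET TABLE or CREATE MULTISET TABLE explicitly removes the dependency on the session mode altogether.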

Which table to choose?

Before creating a table, it is very important to know what kind of data it will hold, and to define it as SET or MULTISET accordingly.


Remember that a SET table causes the additional overhead of checking for duplicate rows. The following points help spare Teradata this overhead.

  • If the SELECT that loads the target table already removes duplicates from the source with a GROUP BY or QUALIFY clause, it is highly recommended to define the target table as MULTISET, because the rows reaching the target are already distinct.
  • If the source table has a UPI (Unique Primary Index), there is also no need for a SET target table, because a UPI never allows duplicate primary index values in the same table and so the source cannot contain fully duplicate rows.
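
The first point can be sketched as follows. The tables orders_stg and orders_tgt and their columns are hypothetical; the idea is only that the SELECT itself guarantees distinct rows, so the MULTISET target does not need a duplicate row check:

```sql
-- Hypothetical target table; MULTISET avoids the duplicate row check on load.
CREATE MULTISET TABLE orders_tgt
(
    order_id    INTEGER,
    customer_id INTEGER,
    order_dt    DATE
)
PRIMARY INDEX (order_id);

-- Keep exactly one copy of each distinct row from the (hypothetical) staging table.
-- Partitioning by every selected column makes each partition one set of identical rows.
INSERT INTO orders_tgt (order_id, customer_id, order_dt)
SELECT order_id, customer_id, order_dt
FROM orders_stg
QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id, customer_id, order_dt
                           ORDER BY order_id) = 1;
```

A GROUP BY over all three columns would give the same deduplicated result; which form performs better depends on the data and is not something this sketch tries to settle.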

So a little awareness of SET and MULTISET can save a lot of time when loading a table.


Key Points to remember.

  • If we insert data using INSERT INTO … SELECT … FROM, the SET table's duplicate row check removes the duplicate rows automatically and there is no DUPLICATE ROW ERROR.
  • If we insert data using INSERT INTO … VALUES, duplicate rows are not removed automatically and the insert fails with a DUPLICATE ROW ERROR.
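
The two key points can be tried out with a sketch like the one below. The tables src_demo and tgt_demo are hypothetical, and the comments describe the behaviour of a session in Teradata mode; ANSI-mode sessions are reported to raise an error for the INSERT … SELECT case as well, so treat that as something to verify on your own system:

```sql
-- Hypothetical source that is allowed to hold duplicate rows.
CREATE MULTISET TABLE src_demo
(
    col1 CHAR(1),
    col2 CHAR(1)
)
PRIMARY INDEX (col1);

-- Hypothetical SET target.
CREATE SET TABLE tgt_demo
(
    col1 CHAR(1),
    col2 CHAR(1)
)
PRIMARY INDEX (col1);

INSERT INTO src_demo VALUES ('A', 'b');
INSERT INTO src_demo VALUES ('A', 'b');   -- duplicate, allowed in the MULTISET source

-- INSERT ... SELECT: the duplicate is dropped and no error is raised.
INSERT INTO tgt_demo
SELECT col1, col2 FROM src_demo;

-- INSERT ... VALUES: the row already exists, so this fails with a duplicate row error.
INSERT INTO tgt_demo VALUES ('A', 'b');
```
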


25 comments


  1. mark t

    the whole reason for specifying PK’s is to prevent duplicates (else how do you know which record is valid?).

    I must be missing something.

    Can someone explain why you would want duplicate records (same primary key / MULTISET)? if ‘duplicates’ are wanted or are present, then maybe the PK is not defined well.

    thx

  2. Ana Jain

    A good website to learn teradata basics.
    Thanks!

  3. Indu

    Hi Admin,

    The above post says “SET” is default. But I tried creating a table and found that “MULTISET” is default. Could you please clarify?

    Thanks

    1. admin

      In the new version the default has been changed from SET to MULTISET. So if you are using TD 12 onward, the default is MULTISET.

      1. Manjul Pant

        I am using TD 14, but the default table is still the SET table.

    2. Sheikh

      CREATE TABLE by default will result in a SET table being created if the session is using Teradata semantics.
      CREATE TABLE by default will result in a MULTISET table being created if the session is using ANSI semantics.

      You can tell which semantic mode you’re in with SQL Assistant or BTEQ using the following information:

      SQL Assistant (AKA Queryman) – run the HELP SESSION command and look for the column named Transaction Semantics to tell whether you’re in Teradata or ANSI mode

      BTEQ – log into BTEQ and look for the message that says ‘Transaction Semantics are’ followed by BTET (Teradata) or ANSI

  4. teradatapro

    I would like to add some details. The reason why duplicate row checks can have such a heavy impact is the way they are done. Basically, Teradata has to compare the record to be inserted sequentially against all other rows having the same row hash. This may not be a problem when only a few rows share a row hash, but it becomes worse with many rows per Primary Index value or in case of heavy skewing.

    1. admin

      Thanks Roland

  5. Chhatresh Joshi

    Row size / Sort Key size Over Flow error:

    Could you please explain about this?
    Table has over 100 columns and selecting * from table throws this error.

    Might it be due to the length of the column names?

    Thanks.
    CJ

  6. Surya

    Very Useful and clean and clear explanation of each and every topic..

    Thanks a lot…..
    u earned it…

  7. Dinesh Saraswat

    Hi Dhamu,

    To have duplicates for your primary key columns in a table, you have to define that table as a SET table with a PRIMARY INDEX on the key columns. Hence it will capture the duplicates on the primary keys.
    If you define a UPI then it will not accept any duplicates at the UPI level, which are your PKs in this case.

    Also to mention if you are expecting full duplicates like
    Col A, COL B , COL C, COl D
    A,B,C,D
    A,B,C,D

    Then your SET table with PI will also not capture the full duplicate record.

  8. Dhamu

    Hi Admin,

    Thanks for the brief explanation. I have a query. Suppose I created a SET table with these columns and data:

    COLUMN 1 COLUMN 2 COLUMN 3 COLUMN 4
    A B C D
    A C D B
    A B F G

    There are no duplicates. If I created a NUPI on (COLUMN1, COLUMN2) there will be duplicates for this two-column combination, right? Creating a SET table with a UPI on the two columns makes sense, right?

    Please let me know what do you think.

    Thanks,
    Dhamu

  9. aditya

    What does “Merge block ratio” mean in every Teradata table definition?

  10. Kumar

    HI Admin,

    Multiset table is clear but wanted to know in which scenario we can use the SET table? For ex: i am creating a SET table with UPI. In this case, SET table itself will take care of removing duplicates or it won’t allow duplicates, then what is the need of UPI?
    Please clarify.

    1. admin

      Defining a table as SET when it has a UPI makes no sense.

      You should use a SET table only with a NUPI. If your requirement is to keep completely duplicate rows out of the table, you can use SET with a NUPI.

      1. Kumar

        Thanks Admin…:)

      2. Stef

        Hi, what does NUPI stand for? Thanks

        1. admin

          Non Unique Primary Index

  11. MIKE

    Please explain Fallback option in create statement

    1. admin

      Fallback is a data protection feature in Teradata at the table level. A fallback-protected table stores a second copy of its data. In case of a failure, this fallback copy is used in place of the original data.

  12. Dinesh Saraswat

    Very clear and crisp explanation of every concept that is covered.

    🙂

  13. deepak shrivastava

    The content of the complete website is fantastic. It’s brief and clear. User gets a clear understanding of what it is. It’s very useful.

    1. admin

      Thanks 🙂

  14. Anurag

    This is an awesome post. Brief yet powerful. Small yet complete. I thought I knew it (SET- MULTISET) completely, but it still added to my knowledge
