SET or MULTISET tables



We know that in Teradata all the tables are either SET or MULTISET.

The difference between SET and MULTISET tables is –


 SET tables – SET tables did not allow duplicate values in the table.

e.g.

COLUMN 1          COLUMN 2          COLUMN 3          COLUMN 4

A                            b                            c                            d

H                           g                            y                            k

A                            b                            c                            d             (Not allowed)

Syntax

CREATE SET TABLE <table_name> ………

………

 

MULTISET tables – MULTISET tables allow duplicate values in table.

e.g.

COLUMN 1          COLUMN 2          COLUMN 3          COLUMN 4

A                            b                            c                            d

H                           g                            y                            k

A                            b                            c                            d             (allowed)

Syntax

CREATE MULTISET TABLE <table_name>………

………

If not specified in the DDL of the table then Teradata will create table as default SET. A SET table force Teradata to check for the duplicate rows every time a new row is inserted or updated in the table. This is an overhead on the resource if we need to insert massive amount of rows.

Which table to choose?

Before creating the table it’s very important to know what kind of data is required in the table and based on that we must define SET or MULTISET.


Remember that SET table causes an additional overhead of checking for the duplicate records. So we need to follow few points to save Teradata from this additional overhead.

  • If you are using any GROUP BY or QUALIFY statement on the source table then it’s highly recommended to define target table as MULTISET. As GROUP BY and QUALIFY will remove the duplicate records from the source.
  • If the source table has UPI (Unique Primary Index) then also there is no need of SET target table. As UPI will never allows duplicate PI in the same table.

So with the help of little bit of awareness about SET and MULTISET we can save a lot of time while loading the table.


Key Points to remember.

  • If we are inserting data using INSERT into SEL from clause then SET table check for duplicate rows will removed automatically and there will be no DUPLICATE ROW ERROR.
  • If we are inserting data using INSERT into VALUES clause then SET table check for duplicate rows will not be removed automatically and there will be DUPLICATE ROW ERROR.


31 comments

1 ping

Skip to comment form

  1. Win789vn8, not gonna lie, it caught my eye. I am trying my luck. See for yourself: win789vn8

  2. Interesting read! Understanding variance is key in any game of chance, and seeing platforms like nustargame slot focus on RTP (96.2-97.8%!) shows a commitment to fair play. Solid analysis here!

  3. Looking for some fresh games? Heard pakgamesdownload is the place to be. Gonna scope it out for some hidden gems. All games on pakgamesdownload

  4. Alright, been poking around sodopro.com. Not bad, not bad at all. Looks like they know their stuff. Give it a shot if you’re looking for something solid. Check it out here: sodopro

    • Ganesh Kumar on October 17, 2018 at 3:30 am
    • Reply

    Hi,

    I have a question that why duplicate error coming when we use insert into values but not when we use insert into see from other table. Please clarify.

    Thanks,
    Ganesh

    • mark t on March 25, 2015 at 8:39 pm
    • Reply

    the whole reason for specifying PK’s is to prevent duplicates (else how do you know which record is valid?).

    I must be missing something.

    Can someone explain why you would want duplicate records (same primary key / MULTISET)? if ‘duplicates’ are wanted or are present, then maybe the PK is not defined well.

    thx

      • Ike on June 16, 2018 at 11:50 am
      • Reply

      Because there is a difference between Primary key and Primay Index terms. Primary key is a logical term being physically implemented as Unique Primary /Secondary index in TD, while TD Primary index enables MPP. I.e it defines how rows are distributed among AMPs based on Primary index column(s). Hence Primary index doesn’t serve as Primary key. That’s why. – SET was only useful for Stage data, after NOPI was introduced, it’s obsolete, in fact. – I didn’t get why TD hasn’t set MULTISET as default.

    • Ana Jain on December 7, 2014 at 8:53 am
    • Reply

    A good website to learn teradata basics.
    Thanks!

    • Indu on July 2, 2014 at 7:09 pm
    • Reply

    Hi Admin,

    The above post says “SET” is default. But I tried creating a table and found that “MULTISET” is default. COuld you please clarify?

    Thanks

    1. In the new version default format has been changed from SET to MULTISET. So if you are using TD 12 onward then default is MULTISET.

        • Manjul Pant on July 11, 2015 at 2:29 am
        • Reply

        I am using TD 14 , but still defaul table is table .

        • Manjul Pant on July 11, 2015 at 2:32 am
        • Reply

        I am using TD 14 , but still default table is the set table

      • Sheikh on April 16, 2015 at 3:50 pm
      • Reply

      ## CREATE TABLE by default will result in a SET table being created if the session is using Teradata semantics;
      ## CREATE TABLE by default will result in a MULTISET table being created if the session is using ANSI semantics.

      ##You can tell which semantic mode you’re with SQL Assistant or BTEQ using the following information:

      SQL Assistant (AKA Queryman) – run the HELP SESSION command and look for the column named Transaction Semantics to tell whether you’re in Teradata or ANSI mode

      BTEQ – log into BTEQ and look for the message that says ‘Tranaction semantics are’ followed by BTET (Teradata) or ANSI

    • teradatapro on April 4, 2014 at 11:15 pm
    • Reply

    I would like to add some details. The reason why duplicate row checks can have such an heavy impact is the way it is done. Basically, Teradata has to compare the reocrd which should be inserted sequentially against all others having the same row hash. This may not even be a problem in case of a few duplicates but becomes worse with many rows per Primary Index or in case of heavy skewing.

    1. Thanks Roland

    • Chhatresh Joshi on December 24, 2013 at 10:52 am
    • Reply

    Row size / Sort Key size Over Flow error:

    Could you please explain about this?
    Table has over 100 columns and selecting * from table throws this error.

    Might be it is due to the lenght of the coulmn name>

    Thnaks.
    CJ

  5. Very Useful and clean and clear explanation of each and every topic..

    Thanks a lot…..
    u earned it…

    • Dinesh Saraswat on October 9, 2013 at 8:48 am
    • Reply

    Hi Dhamu,

    To have duplicates for your primary key columns in a table you have to define that table as SET table and PRIMARY INDEX on the KEY columns. Hence it will capture the duplicates on the primary keys.
    If you define UPI than it will not accept any duplicates at the UPI level, which are your PK’s in this case.

    Also to mention if you are expecting full duplicates like
    Col A, COL B , COL C, COl D
    A,B,C,D
    A,B,C,D

    Then your SET table with PI will also not capture the full duplicate record.

    • Dhamu on October 9, 2013 at 8:20 am
    • Reply

    Hi Admin,

    Thanks for brief expalnation. I have a query, If I created SET table with columns an data

    COLUMN 1 COLUMN 2 COLUMN 3 COLUMN 4
    A B C D
    A C D B
    A B F G

    There is no duplcates. If I created NUPI on (COLUMN1, COLUMN2) there will duplicates for this two column combination right? Create SET table and UPI on two columns make sence right?

    Please let me know what do you think.

    Thanks,
    Dhamu

    • aditya on April 8, 2013 at 1:39 pm
    • Reply

    What does “Merge block ratio” mean in every Teradata table definition?

    • Kumar on April 3, 2013 at 11:15 am
    • Reply

    HI Admin,

    Multiset table is clear but wanted to know in which scenario we can use the SET table? For ex: i am creating a SET table with UPI. In this case, SET table itself will take care of removing duplicates or it won’t allow duplicates, then what is the need of UPI?
    Please clarify.

    1. defining table as SET if it has UPI, make no sense.

      you should use SET table only with NUPI, in case your requirement is not to include complete duplicate rows then you can use SET with NUPI

        • Kumar on April 4, 2013 at 4:24 am
        • Reply

        Thanks Admin…:)

        • Stef on October 12, 2015 at 11:07 am
        • Reply

        Hi, what does NUPI stand for? Thanks

        1. Non Unique Primary Index

    • MIKE on March 8, 2013 at 8:10 am
    • Reply

    Please explain Fallback option in create statement

    1. Fallback is a data protection feature in teradata at table level. Fallback protected tables stores the mirror image of the data of the original table. In case of failure, this fallback copy is used to replace the original table.

    • Dinesh Saraswat on February 20, 2013 at 1:23 pm
    • Reply

    Very Clear and Crisp explanation of every concept whichever explained.

    🙂

    • deepak shrivastava on January 22, 2013 at 1:28 pm
    • Reply

    The content of the complete website is fantastic. It’s brief and clear. User gets a clear understanding of what it is. It’s very useful.

    1. Thanks 🙂

    • Anurag on November 19, 2012 at 1:04 pm
    • Reply

    This is an awesome post. Brief yet powerful. Small yet complete. I thought I knew it (SET- MULTISET) completely, but it still added to my knowledge

  1. […] Post From Teradata – Google Blog Search: set tables, multiset tables, set and multiset tables in teradata , set and multiset, multi set in […]

Leave a Reply to Stef Cancel reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.