«

»

Feb 21

Collect Statistics in Teradata

Statistics are data demographics (or hints) used by the Teradata optimizer.


There are many ways to generate a query plan for a given SQL, and collecting statistics ensures that the optimizer will have the most accurate information to create the best access and join plans.

The optimizing phase of Teradata, makes decisions on how to access table data. These decisions can be very important when table joins (especially those involving multiple joins) are required by a query. By default, the Optimizer uses approximations of the number of rows in each table (known as the cardinality of the table) and of the number of unique values in indexes in making its decisions. To build such estimates, the Optimizer picks a random AMP and builds the information and it is possible for the estimates to be significantly off. This can lead to poor choices of join plans, and associated increases in the response times of the queries involved.

One way to help the Optimizer make better decisions is to give it more accurate information as to the content of the table. This can be done using the COLLECT STATISTICS statement. When the Optimizer finds that there are statistics available for a referenced table, it will use those statistics instead of using estimated table cardinality or estimated unique index value counts.

Stats should be collected mainly under the below circumstances:


1. A thumb rule is to collect statistics when they’ve changed by 10%. (That would be 10% more rows inserted, or 10% of the rows deleted, or 10% of the rows changed, or some combination.)

2. The range of values for an index or column of a table for which statistics have been collected has changed significantly. Sometimes one can infer this from the date and time the statistics were last collected, or by the very nature of the column (for instance, if the column in question holds a transaction date, and statistics on that column were last gathered a year ago, it is almost certain that the statistics for that column are stale).

How stats are built over the table?


TD builds the uniqueness count for each identified column / set of columns for the completed table/partition data and stores the information in the DBC tables.

Whenever the stats are collected later, the previously collected information is lost and fresh stats are updated in the DBC tables.

The time taken to collect stats doesn’t depend on how frequently the stats have been collected or how recently the stats have been collected.

Stats should be collected on all dimensions, history, transactional, reference and aggregate tables based on the below approach:


1. If the table is loaded under DELETE INSERT mode, then STATS should be collected during each load.

2. If the table is built under INSERT UPDATE mode, then STATS should be collected if the data demographics change by more than 10%.

3. If the target is a transactional table loaded in APPEND mode, then STATS should be collected if the data demographics change by more than 10%.

4. If the table is built under INSERT mode; (aggregate tables where data is built for a particular duration and queried upon this duration) tables where partitions are built over each aggregation period, STATS should be collected on the new partition, even if the data demographics for the entire table changes less than 10%, because user queries or extractions might be built over data for current period of aggregation.


↓↓↓↓↓↓↓↓ SHARE IT WITH YOUR FRIENDS  ↓↓↓↓↓↓↓↓

12 comments

Skip to comment form

  1. Manoj

    Hi Admin,
    is it possible to install TD software in personal laptop.
    Regards
    Manoj

    1. admin

      yes check for TD 14 express for vmware

  2. Ramana

    How we can copy the contents of this website as it is read only. Please advise

    1. nisha Reddy

      print screen! simple 🙂

  3. Sekhar

    Hello Admin,

    I am pretty new to teradata. I am creating a simple data cube based on an existing transac table in TD12 and i have been told to collect the stats on a few tables i have created.
    So i am using the below query to look for the stats

    HELP STATISTICS D_TEST ;, but it is coing up as NO STATSDEFINED FOR THIS TABLE.

    Not sure if have to define the statics once i create any table,if so how do i go about doing it, please shed some light.
    Thanks

    1. admin

      first you need to create some stats on the table D_TEST with the help collect stats query, than you can see the result from HELP STATISTICS D_TEST. It will return the columns names where you have collected stats.

    2. Sekhar

      Hello Admin,

      It has worked like a treat, many thanks for your help on this.

      I am actually fairly new to teradata and for that matter any other DB, but i was asked to create a new data cube based on an existing transac table. But i am having difficulties setting up the Primary Key, Generating the Surrogate key, joining the tablesand dealing with SCD type 2 objects and also having difficulties getting the TD SQLSyntax to work.
      Could you be able to help me on this? And what’s the best method to contact you or may be a verbal chat in the first place would help and we can take it from there.
      Please leave your details if thats okay with you or
      you can reach me on 0044-7809490363 or raja.chs@gmail.com

      Many Thanks for your help in advance.

      Kind Regards

  4. shrikant

    Hi admin

    there is some issue with the blog..Menu bar is not showing properly.i think you have recently installed some wp plugin which is creating this issue.

    thanks

    1. admin

      Hi Shrikant,

      Thanks a lot for reporting this issues.

      We have fixed the menu bar as of now 🙂

      Thanks

      1. uma

        hi
        i am new for teradata plz let me know .how to remove duplicate record

        1. admin

          Hi,
          Best way to remove duplicate records in Teradata is to make use of SET table. Make a temporary SET table and dump all your records into it, SET table ensure that there is no duplicate in the table.
          You can also use SQL for this task, but that totally depends on unique id of record.

          Hope it helps.

  5. Anu

    hi admin,the term cardinality refers to the uniqueness of data values contained in a particular column (attribute) of a database table. The lower the cardinality, the more duplicated elements in a column and you said is the number of rows…can you give more clarity on this..
    Thank you

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>