Feb 21

Collect Statistics in Teradata

Categories:

Statistics are data demographics (or hints) used by the Teradata optimizer.

There are many ways to generate a query plan for a given SQL, and collecting statistics ensures that the optimizer will have the most accurate information to create the best access and join plans.

The optimizing phase of Teradata, makes decisions on how to access table data. These decisions can be very important when table joins (especially those involving multiple joins) are required by a query. By default, the Optimizer uses approximations of the number of rows in each table (known as the cardinality of the table) and of the number of unique values in indexes in making its decisions. To build such estimates, the Optimizer picks a random AMP and builds the information and it is possible for the estimates to be significantly off. This can lead to poor choices of join plans, and associated increases in the response times of the queries involved.

One way to help the Optimizer make better decisions is to give it more accurate information as to the content of the table. This can be done using the COLLECT STATISTICS statement. When the Optimizer finds that there are statistics available for a referenced table, it will use those statistics instead of using estimated table cardinality or estimated unique index value counts.

Stats should be collected mainly under the below circumstances:

1. A thumb rule is to collect statistics when they’ve changed by 10%. (That would be 10% more rows inserted, or 10% of the rows deleted, or 10% of the rows changed, or some combination.)

2. The range of values for an index or column of a table for which statistics have been collected has changed significantly. Sometimes one can infer this from the date and time the statistics were last collected, or by the very nature of the column (for instance, if the column in question holds a transaction date, and statistics on that column were last gathered a year ago, it is almost certain that the statistics for that column are stale).

How stats are built over the table?

TD builds the uniqueness count for each identified column / set of columns for the completed table/partition data and stores the information in the DBC tables.

Whenever the stats are collected later, the previously collected information is lost and fresh stats are updated in the DBC tables.

The time taken to collect stats doesn’t depend on how frequently the stats have been collected or how recently the stats have been collected.

Stats should be collected on all dimensions, history, transactional, reference and aggregate tables based on the below approach:

1. If the table is loaded under DELETE INSERT mode, then STATS should be collected during each load.

2. If the table is built under INSERT UPDATE mode, then STATS should be collected if the data demographics change by more than 10%.

3. If the target is a transactional table loaded in APPEND mode, then STATS should be collected if the data demographics change by more than 10%.

4. If the table is built under INSERT mode; (aggregate tables where data is built for a particular duration and queried upon this duration) tables where partitions are built over each aggregation period, STATS should be collected on the new partition, even if the data demographics for the entire table changes less than 10%, because user queries or extractions might be built over data for current period of aggregation.

↓↓↓↓↓↓↓↓ SHARE IT WITH YOUR FRIENDS ↓↓↓↓↓↓↓↓

Tags: collect statistics, collect statistics in teradata, collect statistics on teradata, collect stats, collect stats on table level, collect stats teradata

12 pings

Skip to comment form ↓

Collect Statistics in Teradata

↓↓↓↓↓↓↓↓ SHARE IT WITH YOUR FRIENDS ↓↓↓↓↓↓↓↓

12 pings

Leave a Reply Cancel reply

Categories

Teradata Certification – Sample Papers

Teradata Utilities

Popular Posts of the Week

Must Read Posts

Get Latest Post in your Inbox

Recent Posts

Categories

Teradata Certification – Sample Papers

Teradata Utilities

Sponsors of Teradata Tech

Like us

Who's Online

Copyright