«

»

Aug 03

Collecting Stats on Table Level or Column Level ?


COLLECT STATS is one on the most useful utility in Teradata. It helps Parsing Engine (PE) to make an effective plan to execute query, so that less resource are utilized and performance is improvised.

But there is a serious confusion between doing COLLECT STATS on column level or table level.

Here i like to explain the difference between both the scenarios with appropriate example.


Suppose we have a table and we want to collect statistics on 4 columns. We can do this by the below mentioned query –

collect stats on TABLE_NAME column(COL1);
collect stats on TABLE_NAME column(COL2);
collect stats on TABLE_NAME column(COL3);
collect stats on TABLE_NAME
column(COL4);

The other way of defining COLLECT STATS on the same table is –

collect stats on TABLE_NAME;

The second query is collecting stats on table level. Both the approach will do the same thing, but we cannot directly collect stats on  table level.

If you are collecting STATS on table level, then the STATS must already be defined on the above mentioned 4 columns of the table. This can be done at the time of creation of table. If we are not defining the STATS for the columns earlier, then our COLLECT STATS on table level will give an error message.

Collect stats on table can only be used on a table which has stats defined on it, on any no. of columns for that be. Once stats are defined on the columns you can you use collect stats on table for refresh the stats for all the defined columns. If you use collect stats on TABLE_NAME column(COL1) it will refresh the stats on the mentioned column (COL1) only.


We can say that COLLECT STATS on table level is just the shortcut of collecting stats on all the columns on whom we have already defined stats. It saves the overhead of writing COLLECT STATS query on each column, each time we want to gather statistics. In our case there are only 4 columns whose statistics we want, but suppose in a huge table if there are more than 10-20 columns which are required for COLLECT STATS, then our COLLECT STATS on table level saves us a lot of typing time. Its good practice to perform a collect stats on the columns of even an empty table, when data is loaded into the table a collect stats on table can be used to collect all the statistics without having to collect statistics on the individual columns.

But from a DBA point of view its always wise to collect stats on column level. Because if we are doing collect stats on TABLE level then stats are not committed on any column until all stats are collected. Since collecting stats required resources, so if you have huge data table then it might take a long time. If you need to free the resource then you have to abort the whole process, and no stats will be collected on any column.On the other hand if your are doing it on COLUMN level, and killed the process in between then atleast we have stats collected on columns which are already done before killing the script. And we can again start from that column where we left.

So these are the pros and cons of both the approach of collecting stats. Its according to the requirement of the project, which approach to choose

 

 

40 pings

Skip to comment form

Leave a Reply to Mike Cancel reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

This site uses Akismet to reduce spam. Learn how your comment data is processed.