«

»

Aug 03

Collecting Stats on Table Level or Column Level ?


COLLECT STATS is one on the most useful utility in Teradata. It helps Parsing Engine (PE) to make an effective plan to execute query, so that less resource are utilized and performance is improvised.

But there is a serious confusion between doing COLLECT STATS on column level or table level.

Here i like to explain the difference between both the scenarios with appropriate example.


Suppose we have a table and we want to collect statistics on 4 columns. We can do this by the below mentioned query –

collect stats on TABLE_NAME column(COL1);
collect stats on TABLE_NAME column(COL2);
collect stats on TABLE_NAME column(COL3);
collect stats on TABLE_NAME
column(COL4);

The other way of defining COLLECT STATS on the same table is –

collect stats on TABLE_NAME;

The second query is collecting stats on table level. Both the approach will do the same thing, but we cannot directly collect stats onΒ  table level.

If you are collecting STATS on table level, then the STATS must already be defined on the above mentioned 4 columns of the table. This can be done at the time of creation of table. If we are not defining the STATS for the columns earlier, then our COLLECT STATS on table level will give an error message.

Collect stats on table can only be used on a table which has stats defined on it, on any no. of columns for that be. Once stats are defined on the columns you can you use collect stats on table for refresh the stats for all the defined columns. If you use collect stats on TABLE_NAME column(COL1) it will refresh the stats on the mentioned column (COL1) only.


We can say that COLLECT STATS on table level is just the shortcut of collecting stats on all the columns on whom we have already defined stats. It saves the overhead of writing COLLECT STATS query on each column, each time we want to gather statistics. In our case there are only 4 columns whose statistics we want, but suppose in a huge table if there are more than 10-20 columns which are required for COLLECT STATS, then our COLLECT STATS on table level saves us a lot of typing time. Its good practice to perform a collect stats on the columns of even an empty table, when data is loaded into the table a collect stats on table can be used to collect all the statistics without having to collect statistics on the individual columns.

But from a DBA point of view its always wise to collect stats on column level. Because if we are doing collect stats on TABLE level then stats are not committed on any column until all stats are collected. Since collecting stats required resources, so if you have huge data table then it might take a long time. If you need to free the resource then you have to abort the whole process, and no stats will be collected on any column.On the other hand if your are doing it on COLUMN level, and killed the process in between then atleast we have stats collected on columns which are already done before killing the script. And we can again start from that column where we left.

So these are the pros and cons of both the approach of collecting stats. Its according to the requirement of the project, which approach to choose

 

 

40 comments

Skip to comment form

  1. Arpan

    Thanks for the Very Useful Information. This Website is my first point of contact

    1. admin

      thanks

  2. Anjali3

    begin
    dbms_stats.gather_table_stats(owner=>’GSharma3′,
    tabname=>’Script_DB_030116_16′
    block_sample=>TRUE,
    estimate_percent=>1,
    degree=>9);
    end;

    I am getting error stating missing keyword. Could you please tell me where am I going wrong?

    Thanks.

  3. Nithiya

    Hi,

    I have an update query to update a table its was working fine but 2 days ago there was an issue. Job to update the table completed successfully in production but didn’t resulted as expected. Have tested the same update query in testing environment it works fine and I could see the columns populated with updated data. But same is not happening in production. Any Idea?

  4. Nirmal

    Hi,

    I have a update query which fetch data from various data and update the latest update in columns, it was working well in production so for but suddenly I could see that the update is no properly being done.

    I have tested the same in testing environment. same query works fine.
    Does stats got anything to here?

    Hepl Plz!

  5. Bharath

    This is really a good website.

  6. ramana

    I would like too use stats on table or column. is it any multiple columns at a time we can use stats! It’s possible or not? Please help on this

    Thanks
    ramana

  7. Pankaj Kumar

    Can you please explain why does the output for collect stats for single query on column level says that 2 rows executed successfully. Why 2 rows are executed and how?

    1. admin

      this 2 rows signifies entry in the system tables for that particular column whose stats you collected.

  8. Ramana

    How we would know what are all the fields to be defined for collecting the stats at the time of table creation?

    1. admin

      Its not necessary and recommended to define stats at the time of table creation. Stats are useful when table is loaded with data.
      You can also make use of ‘ Diagnostic HELPSTATS’ feature in Teradata. Just write DIAGNOSTIC HELPSTATS ON FOR SESSION in query window, after that do explain of your query. It will display recommended STATS in explain plan itself. However some times recommendation may not be appropriate, in that case you need to do some experiment to derive the correct stats.

      Collect Stats are recommended on –
      NUSI,
      Joining Columns,
      Columns used in where clause,
      PARTITION column,
      UPI (only when table size is significantly less)

      Please read below link for mode details
      http://www.teradatatech.com/?p=611

  9. Prashant

    Good Job Mr Admin. Really explained the Stats Concept in a easy to Understand Manner.

  10. TCS - RaviKiran

    Great Going…
    Explained in a very simple and user friendly manner
    Appreciated.

    Thanks.

  11. Satyarth Singh

    Hi Rohit,

    We have a stats proc which collect stats on a table level. some times it runs twice daily.
    We have a requirement, that when a stats proc runs, if it sees an entry into the database , it does not collect again,

    We made a Union all to the databases used by specific users and see some how works at the moment. But this solution below may not work because, within the DEV / TEST environments STATS are always collected so it’s possible that STATS would not be collected because some testing had been completed on the same day.

    We have one Box and on eproc for Prod/dev/test.

    Is it possible to build a dynamic SQL statement (below) similar to the way the log/error table updates are initiated below to and then check the returned SQL code?

    1. admin

      As per my understanding of your question –
      you want to make a dynamic code which detects the previous status of SQL command (Pass or Fail) ?

      I guess you can achieve this with the help of UNIX shell scripting (if your environment is in UNIX box). With the help of shell programing we can trace the status of previous SQL command. You need to call BTEQ in between your shell script.

  12. Amit Sethiya

    Useful information about Collecting stats.

    thanks admin

  13. m k suresh

    Very helpful blog.
    thanks a lot for helping others learning teradata more easier.

    your work is very much appreciated.

  14. Mike

    I am new to Teradata… you mentioned that collect stats just refresh the stats which are already defined… Can you please tell me how to define stats on a table/column to begin with?

    1. admin

      the syntax for doing collect stats in TERADATA is as follows –

      COLLECT STATS ON table_name COLUMN (column_name);

      For the difference between collect stats on COLUMN level and TABLE level please refer the following post
      http://www.teradatatech.com/?p=430

      1. Mike

        So the same syntax works for both defining the stats on a table and refreshing them.
        Correct me if I am wrong..

        1. admin

          yes the same syntax work fine for defining as well as refreshing STATS.

  15. fighter

    can you please explain the COLLECT STATISTICS at index level? How and when to use it?

    1. Kumar

      Mostly this will be same as column level stats because in index whatever columns you have given will be already there as column. Collecting stats on index will help in faster data retrieval.
      How?
      — collect stats on index ;
      When?
      — During initial stats collection and then you can collect stats on table level.

      Hi Admin,
      Please correct me if i am wrong.

      1. Kumar

        Small correction:
        How?
        collect stats on index ;

        1. Kumar

          Sorry again there is some issue while typing the syntax. Please use the below syntax to collect stats on index:
          Collect stats on index ;

          1. admin

            correct syntax –

            collect stats on table_name index (index_name);

  16. Victory99

    Very well planned and compiled. You also made this topic very intresting to read.
    Please carry on this fabulous work.
    BTW… I very rarley comment…for the most part I am a silent reader.

    1. admin

      thanks for your appreciation …. πŸ™‚

  17. srinivas

    thanks very good info

  18. sowmya

    good one…

    1. admin

      Thanks Sowmya πŸ™‚

  19. Manoj

    Very useful content

  20. Windows

    This is very interesting, You are a very skilled blogger. I have joined your feed and look forward to seeking more of your excellent post. Also, I have shared your web site in my social networks!

    1. admin

      Thanks πŸ™‚

  21. Anonymous

    I like the layout of your blog. The font, themes and colors all match ver well.

    1. admin

      Thanks i wish you also like the content πŸ˜‰

      1. Misty

        I suppose that sunods and smells just about right.

  22. admin

    Thanks car repo for you time πŸ™‚

    1. Morrie

      Okay I’m convinced. Let’s put it to atcion.

      1. Swap

        Thnx car repo for your time.pls keep posting new articles about teradata

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>