How can I find out which field (column) of a table holds the heaviest data, and which the lightest?
Presumably you are asking about the size in bytes that a particular column occupies on disk (apparently so, since you mention sp_spaceused)?
I am not sure it can be determined exactly. What can be seen exactly is how many pages (the 8 KB blocks in which SQL Server stores data) are occupied by the data of the whole table (or index).
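For a whole table (or index) that is easy to obtain, e.g. (the table name here is illustrative):

-- reserved size, data size and index size of the whole table, in KB
exec sp_spaceused 'SomeTable';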
The disk space allocated for storing a specific column, it seems, can only be estimated, and the estimate will not always be adequate. Even with datalength things are not entirely simple. More on all this below.
Take the query
/*1*/ select sum(datalength([Column])) from [Table];
as the basis for the estimate.
Firstly. In addition to the column's actual data there is always extra service information; some of it may relate to the column, but its volume cannot logically be apportioned among the columns (for example, the row header data). That is, estimate /*1*/ should be read as "no less than".
The fewer columns a table has and the shorter its rows, the larger the share of that service data, and the further estimate /*1*/ is from reality. For a table with a single short column, the total size of the data (service information included) can considerably exceed the "logical" size of the column data itself. Compare, for example, for the table
create table tiny_nums (value tinyint);

insert into tiny_nums (value)
select top (1048576) 1
from master.sys.all_columns a cross join master.sys.all_columns b;
the result returned by query /*1*/ with what sp_spaceused shows.
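That is, something like this (a sketch; the numbers in the comments are approximate):

-- "logical" size of the value column: 1,048,576 rows * 1 byte = 1 MB
select sum(datalength(value)) from tiny_nums;

-- actual space used by the table: noticeably more, since every row
-- also carries a row header, null bitmap, etc.
exec sp_spaceused 'tiny_nums';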
Secondly. The value returned by datalength does not always correspond to reality. In particular, if datalength([Column]) returns NULL, the physical size may well be non-zero.
This is because column types divide into fixed-length (e.g. int, char(20), datetime2(0), uniqueidentifier, etc.) and variable-length (e.g. varbinary(64), nvarchar(30), etc.). While for variable-length columns estimate /*1*/ is approximately valid, for fixed-length columns space is reserved for the value even when the value itself is NULL.
That is, for fixed-length columns estimate /*1*/ should be corrected by substituting for NULL (where possible) some non-empty value of the column's type (for example, 0 for int):
/*2*/ select sum(datalength(isnull([IntColumn], 0))) from [Table];
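The effect is easy to see even on a single value:

select datalength(cast(null as int));            -- NULL, although 4 bytes are still reserved in the row
select datalength(isnull(cast(null as int), 0)); -- 4, as estimate /*2*/ would count it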
You should also take into account that for columns of type bit datalength returns 1. However, if a table (or index) has several bit columns, SQL Server packs up to 8 of them into a single byte.
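For example:

select datalength(cast(1 as bit)); -- 1
-- so summing datalength over several bit columns can overstate
-- their real footprint by up to a factor of 8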
Columns can also be sparse, which means 0 bytes to store NULL (even for fixed-length types), but 4 extra bytes on top of the value when it is not NULL:
/*3*/ select sum(datalength([SparseColumn]) + 4) from [TableName];
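A minimal sketch (table and column names are illustrative):

create table sparse_demo (id int not null, extra int sparse null);

-- NULLs in the sparse column cost nothing; each non-NULL value costs datalength + 4
select sum(datalength(extra) + 4) from sparse_demo;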
Thirdly. If a column is not merely present in the table but also participates in indexes, its "weight" is multiplied by the number of indexes it participates in. And if the column is a key of the clustered index, you need to add the estimate multiplied by the number of all nonclustered indexes (since the leaf-level pages of nonclustered indexes contain the clustered index key values). So in the table
create table SomeTable (
    PK_ID bigint primary key,
    FK_1_ID int,
    FK_2_ID int,
    ...
    FK_10_ID int,
    UID uniqueidentifier
);

create index IX_1 on SomeTable (FK_1_ID);
...
create index IX_10 on SomeTable (FK_10_ID);
the most "heavy" is likely to be not a UID column at all, but a PK_ID , since (besides participation in the cluster primary key) PK_ID values will be present in 10 more nonclustered indexes.
It should also be noted that if a nonclustered index is a filtered index, the corresponding estimate (/*1*/, /*2*/ or /*3*/) should be taken not over the whole table but over the rows matching that index's filter.
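For example, if IX_1 were defined with a filter (the predicate is purely illustrative)

create index IX_1 on SomeTable (FK_1_ID) where FK_1_ID is not null;

then the contribution of FK_1_ID to IX_1 should be estimated over the filtered rows only:

select sum(datalength(FK_1_ID)) from SomeTable where FK_1_ID is not null;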
Fourthly (applies to the Enterprise edition). If row or page compression is enabled for a table
alter table [TableName] rebuild partition = all with (data_compression = row);
or for an index
alter index [IX_Name] on [TableName] rebuild with (data_compression = page);
then estimates based on datalength cease to be adequate, and even the "no less than" rule stops working.
Compare, for the tables
create table strings (value char(2000));

insert into strings (value)
select top (10000) replicate('a', 2000)
from master.sys.all_columns a cross join master.sys.all_columns b;
and
create table strings_page (value char(2000)) with (data_compression = page);

insert into strings_page with (tablock) (value)
select top (10000) replicate('a', 2000)
from master.sys.all_columns a cross join master.sys.all_columns b;
the datalength-based estimates of the column size with what sp_spaceused shows. For the first table the "readings" of datalength and sp_spaceused will be close (the table row is "wide", so the share of service information is small), while for the second they diverge dramatically.
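That is, something like this (a sketch; the numbers in the comments are approximate):

-- both queries return the same "logical" size: 10,000 rows * 2,000 bytes = 20,000,000 bytes
select sum(datalength(value)) from strings;
select sum(datalength(value)) from strings_page;

-- but on disk the first table occupies roughly that much,
-- while the page-compressed one occupies only a small fraction of it
exec sp_spaceused 'strings';
exec sp_spaceused 'strings_page';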
Fifthly. Everything said so far holds for SQL Server 2008. Later versions introduced COLUMNSTORE indexes, which, owing to how they are organized, can store data in a substantially compressed form. For them, too, estimating a column's size with datalength can give inadequate results. If for the table
create table strings_col (value char(2000));

insert into strings_col (value)
select top (10000) replicate('a', 2000)
from master.sys.all_columns a cross join master.sys.all_columns b;

create clustered columnstore index ix_clcs on strings_col;
you compare sp_spaceused with datalength, you will again see a large discrepancy.
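For example:

select sum(datalength(value)) from strings_col; -- still ~20,000,000 bytes "logically"

exec sp_spaceused 'strings_col'; -- on disk: dramatically less, thanks to columnstore compression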
And I suspect that this list of factors to consider when estimating the space occupied by a particular column is not exhaustive.