Is it possible to somehow find out how much certain rows in a table weigh in a MySQL database, or to find out the weight of the sample when processed on the server. You need to do this either using PHP or using MySQL.
1 answer
It should be borne in mind that the size occupied by the data is summed up in fact from two values ββ- the volume of data itself and meta-information. For example, for the size of the TINYBLOB
field is determined by the number of bytes allocated for data, plus 1 byte for their addressing, in the case of a BLOB
, another 2 bytes are added, in the case of MEDIUMBLOB
, 3, and in the case of LONGBLOB
, 4 bytes.
A number of fields have a fixed size, a number - variable.
The size of the text information is highly dependent on the encoding, so under UTF8 3 bytes are allocated for each character, under UTF8MB4 4 bytes are allocated for each character. This information is stored in the system information_schema
database. For example, you can extract the number of bytes that are allocated for the character of the current encoding as follows
<?php // ΠΡΡΠΈΡΠ»ΡΠ΅ΠΌ ΡΠ°Π·ΠΌΠ΅Ρ ΡΠΈΠΌΠ²ΠΎΠ»Π° Π² Π±Π°ΠΉΡΠ°Ρ
function bytesInSymbols($pdo, $charset = 'utf8_general_ci') { $query = "SELECT MAXLEN AS bytes FROM information_schema.CHARACTER_SETS WHERE DEFAULT_COLLATE_NAME = :charset"; $sym = $pdo->prepare($query); $sym->execute(['charset' => $charset]); return $sym->fetchColumn(); }
Calculate first how many bytes are allocated for the meta information of the string
<?php // Π Π°Π·ΠΌΠ΅Ρ ΠΌΠ΅ΡΠ°-ΠΈΠ½ΡΠΎΡΠΌΠ°ΡΠΈΠΈ Π΄Π»Ρ ΡΠΈΠΏΠ° $type function sizeOfColumn($type, $precision) { $types = ['tynint' => 1, 'smallint' => 2, 'mediumint' => 3, 'int' => 4, 'integer' => 4, 'bigint' => 8, 'float' => function($precision){ return $precision <= 24 ? 4 : 8; }, 'double' => 8, 'decimal' => function($precision){ return $precision + 2; }, 'numeric' => function($precision){ return $precision + 2; }, 'date' => 3, 'time' => 3, 'datetime' => 8, 'timestamp' => 4, 'year' => 1, 'char' => 0, 'binary' => 0, 'varbinary' => function($precision){ return $precision > 255 ? 2 : 1; }, 'varchar' => function($precision){ return $precision > 255 ? 2 : 1; }, 'enum' => function($length){ return $length + 2; } , 'set' => function($length){ return $length + 8; }, 'tinyblob' => 1, 'tinytext' => 1, 'blob' => 2, 'text' => 2, 'mediumblob' => 3, 'mediumtext' => 3, 'longblob' => 4, 'longtext' => 4, ]; return is_numeric($types[$type]) ? $types[$type] : $types[$type]($precision); } // Π Π°Π·ΠΌΠ΅Ρ ΠΌΠ΅ΡΠ°-ΠΈΠ½ΡΠΎΡΠΌΠ°ΡΠΈΠΈ, Π·Π°Π½ΠΈΠΌΠ°Π΅ΠΌΡΠΉ ΡΠ°Π±Π»ΠΈΡΠ΅ΠΉ function sizeOfMeta($pdo, $database, $table) { $query = "SELECT * FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = :database AND TABLE_NAME = :table"; $sym = $pdo->prepare($query); $sym->execute(['database' => $database, 'table' => $table]); $result = 0; while($fetch = $sym->fetch()) { if(in_array($fetch['DATA_TYPE'], ['enum','set'])) $precision = $fetch['CHARACTER_OCTET_LENGTH']; else $precision = $fetch['NUMERIC_PRECISION']; $result += sizeOfColumn( $fetch['DATA_TYPE'], $precision); } return $result; }
The size in bytes occupied by binary and textual data can be calculated as follows.
// Π Π°Π·ΠΌΠ΅Ρ, Π·Π°Π½ΠΈΠΌΠ°Π΅ΠΌΡΠΉ ΡΡΡΠΎΠΊΠΎΠΉ function sizeOfRow($pdo, $database, $table, $id) { // Π Π°Π·ΠΌΠ΅Ρ Π·Π°Π½ΠΈΠΌΠ°Π΅ΠΌΡΠΉ ΠΌΠ΅ΡΠ°-ΠΈΠ½ΡΠΎΡΠΌΠ°ΡΠΈΠ΅ΠΉ $result = sizeOfMeta($pdo, $database, $table); // ΠΠ·Π²Π»Π΅ΠΊΠ°Π΅ΠΌ ΡΡΠΎΠ»Π±ΡΡ, Π² ΠΊΠΎΡΠΎΡΡΡ
ΡΠΈΠΌΠ²ΠΎΠ» Π·Π°Π½ΠΈΠΌΠ°Π΅Ρ 1 Π±Π°ΠΉΡ $query = "SELECT COLUMN_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = :database AND TABLE_NAME = :table AND DATA_TYPE IN ('binary', 'varbinary', 'tinyblob', 'blob', 'mediumblob', 'longblob')"; $sym = $pdo->prepare($query); $sym->execute(['database' => $database, 'table' => $table]); $arr = []; while($field = $sym->fetchColumn()){ $arr[] = "LENGTH(`$field`) * $bytes"; } if(count($arr)) { $query = "SELECT ".implode(' + ', $arr)." AS total FROM $table WHERE id = :id"; $sym = $pdo->prepare($query); $sym->execute(['id' => $id]); $result += $sym->fetchColumn(); } // ΠΠ·Π²Π»Π΅ΠΊΠ°Π΅ΠΌ ΡΡΠΎΠ»Π±ΡΡ, Π² ΠΊΠΎΡΠΎΡΡΡ
ΡΠΈΠΌΠ²ΠΎΠ» Π·Π°Π½ΠΈΠΌΠ°Π΅Ρ Π½Π΅ΡΠΊΠΎΠ»ΡΠΊΠΎ Π±Π°ΠΉΡ $query = "SELECT COLUMN_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = :database AND TABLE_NAME = :table AND DATA_TYPE IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext')"; $sym = $pdo->prepare($query); $sym->execute(['database' => $database, 'table' => $table]); $arr = []; $bytes = bytesInSymbols($pdo); while($field = $sym->fetchColumn()){ $arr[] = "LENGTH(`$field`) * $bytes"; } if(count($arr)) { $query = "SELECT ".implode(' + ', $arr)." AS total FROM $table WHERE id = :id"; $sym = $pdo->prepare($query); $sym->execute(['id' => $id]); $result += $sym->fetchColumn(); } return $result; }
Let all the functions listed above are in the connect.php file along with the code that establishes the connection to the database.
<?php try { $pdo = new PDO( 'mysql:host=localhost;dbname=test', 'root', '', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, PDO::ATTR_EMULATE_PREPARES => true]); } catch (PDOException $e) { echo "ΠΠ΅Π²ΠΎΠ·ΠΌΠΎΠΆΠ½ΠΎ ΡΡΡΠ°Π½ΠΎΠ²ΠΈΡΡ ΡΠΎΠ΅Π΄ΠΈΠ½Π΅Π½ΠΈΠ΅ Ρ Π±Π°Π·ΠΎΠΉ Π΄Π°Π½Π½ΡΡ
"; }
Thus, you can get the size of the row with id = 1
of tbl
table in tbl
database as follows:
<?php require_once('connect.php'); echo sizeOfRow($pdo, 'test', 'tbl', 1);
- under UTF8, 3 bytes are allocated for each character, under UTF8MB4, 4 bytes are allocated for each character - the documentation states a little differently: one to three bytes per character (utf8) and one to four bytes per character (utf8mb4) - aleksandr barakin
- @alexanderbarakin What's wrong? dev.mysql.com/doc/refman/5.7/en/charset-unicode-conversion.html In fact, if you conduct an experiment and calculate the size in practice, you can be sure that you did not bother with UTF-8. Under the symbol is always assigned a maximum. - cheops
- Yes, it is simply hard to believe that the developers have complicated the procedure of saving / retrieving data so much. for the rest of the βsoftware worldβ considers utf8 as a set of characters that can take from one to 4 (rfc3629) or 6 (iso10646) bytes . - aleksandr barakin pm
- @alexanderbarakin They have simplified the task of counting the number of characters per line so much, but, yes, they hacked the idea of ββutf8 at the root, if you try to create a VARCHAR (65536) in UTF8 encoding, you will not succeed, the maximum is to create a VARCHAR in the area 21800. That is each character is 3 bytes. And then when they began to use Emoji, for which you need a 4-byte developer sequence, you had to crush the new UTF8MB4 encoding. What can I say: MySQL running Oracle - lost base. - cheops
strlen()
, for MySQL,LENGTH
orBIT_LENGTH
(in bits). Otherwise, a variable or a cell of the table will take up as much space as is allocated for its type. - Indifferent