Chapter 15 The InnoDB Storage Engine


Table of Contents

15.1 Introduction to InnoDB
15.1.1 Benefits of Using InnoDB Tables
15.1.2 Best Practices for InnoDB Tables
15.1.3 Verifying that InnoDB is the Default Storage Engine
15.1.4 Testing and Benchmarking with InnoDB
15.2 InnoDB and the ACID Model
15.3 InnoDB Multi-Versioning
15.4 InnoDB Architecture
15.5 InnoDB In-Memory Structures
15.5.1 Buffer Pool
15.5.2 Change Buffer
15.5.3 Adaptive Hash Index
15.5.4 Log Buffer
15.6 InnoDB On-Disk Structures
15.6.1 Tables
15.6.2 Indexes
15.6.3 Tablespaces
15.6.4 Doublewrite Buffer
15.6.5 Redo Log
15.6.6 Undo Logs
15.7 InnoDB Locking and Transaction Model
15.7.1 InnoDB Locking
15.7.2 InnoDB Transaction Model
15.7.3 Locks Set by Different SQL Statements in InnoDB
15.7.4 Phantom Rows
15.7.5 Deadlocks in InnoDB
15.8 InnoDB Configuration
15.8.1 InnoDB Startup Configuration
15.8.2 Configuring InnoDB for Read-Only Operation
15.8.3 InnoDB Buffer Pool Configuration
15.8.4 Configuring Thread Concurrency for InnoDB
15.8.5 Configuring the Number of Background InnoDB I/O Threads
15.8.6 Using Asynchronous I/O on Linux
15.8.7 Configuring the InnoDB Master Thread I/O Rate
15.8.8 Configuring Spin Lock Polling
15.8.9 Configuring InnoDB Purge Scheduling
15.8.10 Configuring Optimizer Statistics for InnoDB
15.8.11 Configuring the Merge Threshold for Index Pages
15.8.12 Enabling Automatic Configuration for a Dedicated MySQL Server
15.9 InnoDB Table and Page Compression
15.9.1 InnoDB Table Compression
15.9.2 InnoDB Page Compression
15.10 InnoDB Row Formats
15.11 InnoDB Disk I/O and File Space Management
15.11.1 InnoDB Disk I/O
15.11.2 File Space Management
15.11.3 InnoDB Checkpoints
15.11.4 Defragmenting a Table
15.11.5 Reclaiming Disk Space with TRUNCATE TABLE
15.12 InnoDB and Online DDL
15.12.1 Online DDL Operations
15.12.2 Online DDL Performance and Concurrency
15.12.3 Online DDL Space Requirements
15.12.4 Simplifying DDL Statements with Online DDL
15.12.5 Online DDL Failure Conditions
15.12.6 Online DDL Limitations
15.13 InnoDB Startup Options and System Variables
15.14 InnoDB INFORMATION_SCHEMA Tables
15.14.1 InnoDB INFORMATION_SCHEMA Tables about Compression
15.14.2 InnoDB INFORMATION_SCHEMA Transaction and Locking Information
15.14.3 InnoDB INFORMATION_SCHEMA Schema Object Tables
15.14.4 InnoDB INFORMATION_SCHEMA FULLTEXT Index Tables
15.14.5 InnoDB INFORMATION_SCHEMA Buffer Pool Tables
15.14.6 InnoDB INFORMATION_SCHEMA Metrics Table
15.14.7 InnoDB INFORMATION_SCHEMA Temporary Table Info Table
15.14.8 Retrieving InnoDB Tablespace Metadata from INFORMATION_SCHEMA.FILES
15.15 InnoDB Integration with MySQL Performance Schema
15.15.1 Monitoring ALTER TABLE Progress for InnoDB Tables Using Performance Schema
15.15.2 Monitoring InnoDB Mutex Waits Using Performance Schema
15.16 InnoDB Monitors
15.16.1 InnoDB Monitor Types
15.16.2 Enabling InnoDB Monitors
15.16.3 InnoDB Standard Monitor and Lock Monitor Output
15.17 InnoDB Backup and Recovery
15.17.1 InnoDB Backup
15.17.2 InnoDB Recovery
15.18 InnoDB and MySQL Replication
15.19 InnoDB memcached Plugin
15.19.1 Benefits of the InnoDB memcached Plugin
15.19.2 InnoDB memcached Architecture
15.19.3 Setting Up the InnoDB memcached Plugin
15.19.4 InnoDB memcached Multiple get and Range Query Support
15.19.5 Security Considerations for the InnoDB memcached Plugin
15.19.6 Writing Applications for the InnoDB memcached Plugin
15.19.7 The InnoDB memcached Plugin and Replication
15.19.8 InnoDB memcached Plugin Internals
15.19.9 Troubleshooting the InnoDB memcached Plugin
15.20 InnoDB Troubleshooting
15.20.1 Troubleshooting InnoDB I/O Problems
15.20.2 Forcing InnoDB Recovery
15.20.3 Troubleshooting InnoDB Data Dictionary Operations
15.20.4 InnoDB Error Handling

15.1 Introduction to InnoDB

InnoDB is a general-purpose storage engine that balances high reliability and high performance. In MySQL 8.0, InnoDB is the default MySQL storage engine. Unless you have configured a different default storage engine, issuing a CREATE TABLE statement without an ENGINE= clause creates an InnoDB table.

Key Advantages of InnoDB

Table 15.1 InnoDB Storage Engine Features

Feature Support
B-tree indexes Yes
Backup/point-in-time recovery (Implemented in the server, rather than in the storage engine.) Yes
Cluster database support No
Clustered indexes Yes
Compressed data Yes
Data caches Yes
Encrypted data Yes (Implemented in the server via encryption functions; In MySQL 5.7 and later, data-at-rest tablespace encryption is supported.)
Foreign key support Yes
Full-text search indexes Yes (InnoDB support for FULLTEXT indexes is available in MySQL 5.6 and later.)
Geospatial data type support Yes
Geospatial indexing support Yes (InnoDB support for geospatial indexing is available in MySQL 5.7 and later.)
Hash indexes No (InnoDB utilizes hash indexes internally for its Adaptive Hash Index feature.)
Index caches Yes
Locking granularity Row
MVCC Yes
Replication support (Implemented in the server, rather than in the storage engine.) Yes
Storage limits 64TB
T-tree indexes No
Transactions Yes
Update statistics for data dictionary Yes

To compare the features of InnoDB with other storage engines provided with MySQL, see the Storage Engine Features table in Chapter 16, Alternative Storage Engines .

InnoDB Enhancements and New Features

For information about InnoDB enhancements and new features, refer to:

Additional InnoDB Information and Resources

15.1.1 Benefits of Using InnoDB Tables

You may find InnoDB tables beneficial for the following reasons:

  • If your server crashes because of a hardware or software issue, regardless of what was happening in the database at the time, you don't need to do anything special after restarting the database. InnoDB crash recovery automatically finalizes any changes that were committed before the time of the crash, and undoes any changes that were in process but not committed. Just restart and continue where you left off.

  • The InnoDB storage engine maintains its own buffer pool that caches table and index data in main memory as data is accessed. Frequently used data is processed directly from memory. This cache applies to many types of information and speeds up processing. On dedicated database servers, up to 80% of physical memory is often assigned to the buffer pool.

  • If you split up related data into different tables, you can set up foreign keys that enforce referential integrity . Update or delete data, and the related data in other tables is updated or deleted automatically. Try to insert data into a secondary table without corresponding data in the primary table, and the bad data gets kicked out automatically.

  • If data becomes corrupted on disk or in memory, a checksum mechanism alerts you to the bogus data before you use it.

  • When you design your database with appropriate primary key columns for each table, operations involving those columns are automatically optimized. It is very fast to reference the primary key columns in WHERE clauses, ORDER BY clauses, GROUP BY clauses, and join operations.

  • Inserts, updates, and deletes are optimized by an automatic mechanism called change buffering . InnoDB not only allows concurrent read and write access to the same table, it caches changed data to streamline disk I/O.

  • Performance benefits are not limited to giant tables with long-running queries. When the same rows are accessed over and over from a table, a feature called the Adaptive Hash Index takes over to make these lookups even faster, as if they came out of a hash table.

  • You can compress tables and associated indexes.

  • You can create and drop indexes with much less impact on performance and availability.

  • Truncating a file-per-table tablespace is very fast, and can free up disk space for the operating system to reuse, rather than freeing up space within the system tablespace that only InnoDB can reuse.

  • The storage layout for table data is more efficient for BLOB and long text fields, with the DYNAMIC row format.

  • You can monitor the internal workings of the storage engine by querying INFORMATION_SCHEMA tables.

  • You can monitor the performance details of the storage engine by querying Performance Schema tables.

  • You can freely mix InnoDB tables with tables from other MySQL storage engines, even within the same statement. For example, you can use a join operation to combine data from InnoDB and MEMORY tables in a single query.

  • InnoDB has been designed for CPU efficiency and maximum performance when processing large data volumes.

  • InnoDB tables can handle large quantities of data, even on operating systems where file size is limited to 2GB.

For InnoDB -specific tuning techniques you can apply in your application code, see Section 8.5, “Optimizing for InnoDB Tables” .

15.1.2 Best Practices for InnoDB Tables

This section describes best practices when using InnoDB tables.

  • Specifying a primary key for every table using the most frequently queried column or columns, or an auto-increment value if there is no obvious primary key.

  • Using joins wherever data is pulled from multiple tables based on identical ID values from those tables. For fast join performance, define foreign keys on the join columns, and declare those columns with the same data type in each table. Adding foreign keys ensures that referenced columns are indexed, which can improve performance. Foreign keys also propagate deletes or updates to all affected tables, and prevent insertion of data in a child table if the corresponding IDs are not present in the parent table.

  • Turning off autocommit . Committing hundreds of times a second puts a cap on performance (limited by the write speed of your storage device).

  • Grouping sets of related DML operations into transactions, by bracketing them with START TRANSACTION and COMMIT statements. While you don't want to commit too often, you also don't want to issue huge batches of INSERT, UPDATE, or DELETE statements that run for hours without committing. See the example following this list.

  • Not using LOCK TABLES statements. InnoDB can handle multiple sessions all reading and writing to the same table at once, without sacrificing reliability or high performance. To get exclusive write access to a set of rows, use the SELECT ... FOR UPDATE syntax to lock just the rows you intend to update.

  • Enabling the innodb_file_per_table option or using general tablespaces to put the data and indexes for tables into separate files, instead of the system tablespace .

    The innodb_file_per_table option is enabled by default.

  • Evaluating whether your data and access patterns benefit from the InnoDB table or page compression features. You can compress InnoDB tables without sacrificing read/write capability.

  • Running your server with the option --sql_mode=NO_ENGINE_SUBSTITUTION to prevent tables being created with a different storage engine if there is an issue with the engine specified in the ENGINE= clause of CREATE TABLE .
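
For example, a minimal sketch that combines several of these practices: autocommit is switched off, related DML statements are grouped into one transaction, and SELECT ... FOR UPDATE locks only the rows being changed (the table and column names are hypothetical):

SET autocommit = 0;
START TRANSACTION;
SELECT balance FROM accounts WHERE id IN (10, 20) FOR UPDATE;  -- lock only the rows to be changed
UPDATE accounts SET balance = balance - 100 WHERE id = 10;
UPDATE accounts SET balance = balance + 100 WHERE id = 20;
COMMIT;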

15.1.3 Verifying that InnoDB is the Default Storage Engine

Issue the SHOW ENGINES statement to view the available MySQL storage engines. Look for DEFAULT in the InnoDB line.

mysql> SHOW ENGINES;
                        

Alternatively, query the INFORMATION_SCHEMA.ENGINES table.

mysql> SELECT * FROM INFORMATION_SCHEMA.ENGINES;
                        

15.1.4 Testing and Benchmarking with InnoDB

If InnoDB is not your default storage engine, you can determine if your database server or applications work correctly with InnoDB by restarting the server with --default-storage-engine=InnoDB defined on the command line or with default-storage-engine=innodb defined in the [mysqld] section of your MySQL server option file.
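
For example, a minimal option-file sketch:

[mysqld]
default-storage-engine=innodb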

Since changing the default storage engine only affects new tables as they are created, run all your application installation and setup steps to confirm that everything installs properly. Then exercise all the application features to make sure all the data loading, editing, and querying features work. If a table relies on a feature that is specific to another storage engine, you will receive an error; add the ENGINE= other_engine_name clause to the CREATE TABLE statement to avoid the error.

If you did not make a deliberate decision about the storage engine, and you want to preview how certain tables work when created using InnoDB , issue the command ALTER TABLE table_name ENGINE=InnoDB; for each table. Or, to run test queries and other statements without disturbing the original table, make a copy:

CREATE TABLE InnoDB_Table (...) ENGINE=InnoDB AS SELECT * FROM other_engine_table;
                        

To assess performance with a full application under a realistic workload, install the latest MySQL server and run benchmarks.

Test the full application lifecycle, from installation, through heavy usage, and server restart. Kill the server process while the database is busy to simulate a power failure, and verify that the data is recovered successfully when you restart the server.

Test any replication configurations, especially if you use different MySQL versions and options on the master and slaves.

15.2 InnoDB and the ACID Model

The ACID model is a set of database design principles that emphasize aspects of reliability that are important for business data and mission-critical applications. MySQL includes components such as the InnoDB storage engine that adhere closely to the ACID model, so that data is not corrupted and results are not distorted by exceptional conditions such as software crashes and hardware malfunctions. When you rely on ACID-compliant features, you do not need to reinvent the wheel of consistency checking and crash recovery mechanisms. In cases where you have additional software safeguards, ultra-reliable hardware, or an application that can tolerate a small amount of data loss or inconsistency, you can adjust MySQL settings to trade some of the ACID reliability for greater performance or throughput.

The following sections discuss how MySQL features, in particular the InnoDB storage engine, interact with the categories of the ACID model:

  • A : atomicity.

  • C : consistency.

  • I : isolation.

  • D : durability.

Atomicity

The atomicity aspect of the ACID model mainly involves InnoDB transactions . Related MySQL features include:

  • Autocommit setting.

  • COMMIT statement.

  • ROLLBACK statement.

  • Operational data from the INFORMATION_SCHEMA tables.

Consistency

The consistency aspect of the ACID model mainly involves internal InnoDB processing to protect data from crashes. Related MySQL features include:

  • The InnoDB doublewrite buffer.

  • InnoDB crash recovery.

Isolation

The isolation aspect of the ACID model mainly involves InnoDB transactions , in particular the isolation level that applies to each transaction. Related MySQL features include:

  • Autocommit setting.

  • The SET TRANSACTION ISOLATION LEVEL statement (see the example following this list).

  • The low-level details of InnoDB locking . During performance tuning, you see these details through INFORMATION_SCHEMA tables.
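
For example, a minimal sketch of setting the isolation level for the next transaction in a session (REPEATABLE READ is the InnoDB default; READ COMMITTED is shown only for illustration):

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
-- statements in this transaction run at READ COMMITTED
COMMIT;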

Durability

The durability aspect of the ACID model involves MySQL software features interacting with your particular hardware configuration. Because of the many possibilities depending on the capabilities of your CPU, network, and storage devices, this aspect is the most complicated to provide concrete guidelines for. (And those guidelines might take the form of buy new hardware .) Related MySQL features include:

  • InnoDB doublewrite buffer , turned on and off by the innodb_doublewrite configuration option.

  • Configuration option innodb_flush_log_at_trx_commit (see the option-file sketch following this list).

  • Configuration option sync_binlog .

  • Configuration option innodb_file_per_table .

  • Write buffer in a storage device, such as a disk drive, SSD, or RAID array.

  • Battery-backed cache in a storage device.

  • The operating system used to run MySQL, in particular its support for the fsync() system call.

  • Uninterruptible power supply (UPS) protecting the electrical power to all computer servers and storage devices that run MySQL servers and store MySQL data.

  • Your backup strategy, such as frequency and types of backups, and backup retention periods.

  • For distributed or hosted data applications, the particular characteristics of the data centers where the hardware for the MySQL servers is located, and network connections between the data centers.
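
A hedged option-file sketch combining several of the durability-related settings listed above (the values shown are the MySQL 8.0 defaults for a fully durable configuration):

[mysqld]
innodb_flush_log_at_trx_commit=1   # flush and sync the redo log at each transaction commit
sync_binlog=1                      # sync the binary log at each transaction commit
innodb_doublewrite=ON              # keep the doublewrite buffer enabled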

15.3 InnoDB Multi-Versioning

InnoDB is a multi-versioned storage engine : it keeps information about old versions of changed rows, to support transactional features such as concurrency and rollback . This information is stored in the tablespace in a data structure called a rollback segment (after an analogous data structure in Oracle). InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read .

Internally, InnoDB adds three fields to each row stored in the database. A 6-byte DB_TRX_ID field indicates the transaction identifier for the last transaction that inserted or updated the row. Also, a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted. Each row also contains a 7-byte DB_ROLL_PTR field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated. A 6-byte DB_ROW_ID field contains a row ID that increases monotonically as new rows are inserted. If InnoDB generates a clustered index automatically, the index contains row ID values. Otherwise, the DB_ROW_ID column does not appear in any index.

Undo logs in the rollback segment are divided into insert and update undo logs. Insert undo logs are needed only in transaction rollback and can be discarded as soon as the transaction commits. Update undo logs are used also in consistent reads, but they can be discarded only after there is no transaction present for which InnoDB has assigned a snapshot that in a consistent read could need the information in the update undo log to build an earlier version of a database row.

Commit your transactions regularly, including those transactions that issue only consistent reads. Otherwise, InnoDB cannot discard data from the update undo logs, and the rollback segment may grow too big, filling up your tablespace.

The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space needed for your rollback segment.

In the InnoDB multi-versioning scheme, a row is not physically removed from the database immediately when you delete it with an SQL statement. InnoDB only physically removes the corresponding row and its index records when it discards the update undo log record written for the deletion. This removal operation is called a purge , and it is quite fast, usually taking the same order of time as the SQL statement that did the deletion.

If you insert and delete rows in smallish batches at about the same rate in the table, the purge thread can start to lag behind and the table can grow bigger and bigger because of all the dead rows, making everything disk-bound and very slow. In such a case, throttle new row operations, and allocate more resources to the purge thread by tuning the innodb_max_purge_lag system variable. See Section 15.13, “InnoDB Startup Options and System Variables” for more information.
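
For example, a hedged sketch of adjusting the purge lag threshold at runtime (the value is illustrative only; the default of 0 imposes no lag-based delay on DML):

SET GLOBAL innodb_max_purge_lag = 1000000;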

Multi-Versioning and Secondary Indexes

InnoDB multiversion concurrency control (MVCC) treats secondary indexes differently than clustered indexes. Records in a clustered index are updated in-place, and their hidden system columns point to undo log entries from which earlier versions of records can be reconstructed. Unlike clustered index records, secondary index records do not contain hidden system columns, nor are they updated in-place.

When a secondary index column is updated, old secondary index records are delete-marked, new records are inserted, and delete-marked records are eventually purged. When a secondary index record is delete-marked or the secondary index page is updated by a newer transaction, InnoDB looks up the database record in the clustered index. In the clustered index, the record's DB_TRX_ID is checked, and the correct version of the record is retrieved from the undo log if the record was modified after the reading transaction was initiated.

If a secondary index record is marked for deletion or the secondary index page is updated by a newer transaction, the covering index technique is not used. Instead of returning values from the index structure, InnoDB looks up the record in the clustered index.

However, if the index condition pushdown (ICP) optimization is enabled, and parts of the WHERE condition can be evaluated using only fields from the index, the MySQL server still pushes this part of the WHERE condition down to the storage engine where it is evaluated using the index. If no matching records are found, the clustered index lookup is avoided. If matching records are found, even among delete-marked records, InnoDB looks up the record in the clustered index.

15.4 InnoDB Architecture

The following diagram shows in-memory and on-disk structures that comprise the InnoDB storage engine architecture. For information about each structure, see Section 15.5, “InnoDB In-Memory Structures” , and Section 15.6, “InnoDB On-Disk Structures” .

Figure 15.1 InnoDB Architecture

InnoDB architecture diagram showing in-memory and on-disk structures.

15.5 InnoDB In-Memory Structures

This section describes InnoDB in-memory structures and related topics.

15.5.1 Buffer Pool

The buffer pool is an area in main memory where InnoDB caches table and index data as it is accessed. The buffer pool permits frequently used data to be processed directly from memory, which speeds up processing. On dedicated servers, up to 80% of physical memory is often assigned to the buffer pool.

For efficiency of high-volume read operations, the buffer pool is divided into pages that can potentially hold multiple rows. For efficiency of cache management, the buffer pool is implemented as a linked list of pages; data that is rarely used is aged out of the cache using a variation of the LRU algorithm.

Knowing how to take advantage of the buffer pool to keep frequently accessed data in memory is an important aspect of MySQL tuning.

Buffer Pool LRU Algorithm

The buffer pool is managed as a list using a variation of the least recently used (LRU) algorithm. When room is needed to add a new page to the buffer pool, the least recently used page is evicted and a new page is added to the middle of the list. This midpoint insertion strategy treats the list as two sublists:

  • At the head, a sublist of new ( young ) pages that were accessed recently

  • At the tail, a sublist of old pages that were accessed less recently

Figure 15.2 Buffer Pool List

Content is described in the surrounding text.

The algorithm keeps pages that are heavily used by queries in the new sublist. The old sublist contains less-used pages; these pages are candidates for eviction .

By default, the algorithm operates as follows:

  • 3/8 of the buffer pool is devoted to the old sublist.

  • The midpoint of the list is the boundary where the tail of the new sublist meets the head of the old sublist.

  • When InnoDB reads a page into the buffer pool, it initially inserts it at the midpoint (the head of the old sublist). A page can be read because it is required for a user-specified operation such as an SQL query, or as part of a read-ahead operation performed automatically by InnoDB .

  • Accessing a page in the old sublist makes it young , moving it to the head of the buffer pool (the head of the new sublist). If the page was read because it was required, the first access occurs immediately and the page is made young. If the page was read due to read-ahead, the first access does not occur immediately (and might not occur at all before the page is evicted).

  • As the database operates, pages in the buffer pool that are not accessed age by moving toward the tail of the list. Pages in both the new and old sublists age as other pages are made new. Pages in the old sublist also age as pages are inserted at the midpoint. Eventually, a page that remains unused reaches the tail of the old sublist and is evicted.

By default, pages read by queries immediately move into the new sublist, meaning they stay in the buffer pool longer. A table scan (such as performed for a mysqldump operation, or a SELECT statement with no WHERE clause) can bring a large amount of data into the buffer pool and evict an equivalent amount of older data, even if the new data is never used again. Similarly, pages that are loaded by the read-ahead background thread and then accessed only once move to the head of the new list. These situations can push frequently used pages to the old sublist where they become subject to eviction. For information about optimizing this behavior, see Section 15.8.3.3, “Making the Buffer Pool Scan Resistant” , and Section 15.8.3.4, “Configuring InnoDB Buffer Pool Prefetching (Read-Ahead)” .

InnoDB Standard Monitor output contains several fields in the BUFFER POOL AND MEMORY section regarding operation of the buffer pool LRU algorithm. For details, see Monitoring the Buffer Pool Using the InnoDB Standard Monitor .

Buffer Pool Configuration

You can configure the various aspects of the buffer pool to improve performance.
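
For example, a minimal sketch of checking and resizing the buffer pool at runtime; innodb_buffer_pool_size is dynamic in MySQL 8.0, and the 2GB value shown is illustrative:

mysql> SELECT @@innodb_buffer_pool_size;
mysql> SET GLOBAL innodb_buffer_pool_size = 2 * 1024 * 1024 * 1024;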

Monitoring the Buffer Pool Using the InnoDB Standard Monitor

InnoDB Standard Monitor output, which can be accessed using SHOW ENGINE INNODB STATUS , provides metrics regarding operation of the buffer pool. Buffer pool metrics are located in the BUFFER POOL AND MEMORY section of InnoDB Standard Monitor output and appear similar to the following:

----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 2198863872
Dictionary memory allocated 776332
Buffer pool size   131072
Free buffers       124908
Database pages     5720
Old database pages 2071
Modified db pages  910
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 4, not young 0
0.10 youngs/s, 0.00 non-youngs/s
Pages read 197, created 5523, written 5060
0.00 reads/s, 190.89 creates/s, 244.94 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 5720, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
                            

The following table describes buffer pool metrics reported by the InnoDB Standard Monitor.

Note

Per second averages provided in InnoDB Standard Monitor output are based on the elapsed time since InnoDB Standard Monitor output was last printed.

Table 15.2 InnoDB Buffer Pool Metrics

Name Description
Total memory allocated The total memory allocated for the buffer pool in bytes.
Dictionary memory allocated The total memory allocated for the InnoDB data dictionary in bytes.
Buffer pool size The total size in pages allocated to the buffer pool.
Free buffers The total size in pages of the buffer pool free list.
Database pages The total size in pages of the buffer pool LRU list.
Old database pages The total size in pages of the buffer pool old LRU sublist.
Modified db pages The current number of pages modified in the buffer pool.
Pending reads The number of buffer pool pages waiting to be read into the buffer pool.
Pending writes LRU The number of old dirty pages within the buffer pool to be written from the bottom of the LRU list.
Pending writes flush list The number of buffer pool pages to be flushed during checkpointing.
Pending writes single page The number of pending independent page writes within the buffer pool.
Pages made young The total number of pages made young in the buffer pool LRU list (moved to the head of sublist of new pages).
Pages made not young The total number of pages not made young in the buffer pool LRU list (pages that have remained in the old sublist without being made young).
youngs/s The per second average of accesses to old pages in the buffer pool LRU list that have resulted in making pages young. See the notes that follow this table for more information.
non-youngs/s The per second average of accesses to old pages in the buffer pool LRU list that have resulted in not making pages young. See the notes that follow this table for more information.
Pages read The total number of pages read from the buffer pool.
Pages created The total number of pages created within the buffer pool.
Pages written The total number of pages written from the buffer pool.
reads/s The average number of buffer pool page reads per second.
creates/s The average number of buffer pool pages created per second.
writes/s The average number of buffer pool page writes per second.
Buffer pool hit rate The buffer pool page hit rate for pages read from the buffer pool memory vs from disk storage.
young-making rate The average hit rate at which page accesses have resulted in making pages young. See the notes that follow this table for more information.
not (young-making rate) The average hit rate at which page accesses have not resulted in making pages young. See the notes that follow this table for more information.
Pages read ahead The per second average of read ahead operations.
Pages evicted without access The per second average of the pages evicted without being accessed from the buffer pool.
Random read ahead The per second average of random read ahead operations.
LRU len The total size in pages of the buffer pool LRU list.
unzip_LRU len The total size in pages of the buffer pool unzip_LRU list.
I/O sum The total number of buffer pool LRU list pages accessed, for the last 50 seconds.
I/O cur The total number of buffer pool LRU list pages accessed.
I/O unzip sum The total number of buffer pool unzip_LRU list pages accessed.
I/O unzip cur The total number of buffer pool unzip_LRU list pages accessed.

Notes :

  • The youngs/s metric is applicable only to old pages. It is based on the number of accesses to pages and not the number of pages. There can be multiple accesses to a given page, all of which are counted. If you see very low youngs/s values when there are no large scans occurring, you might need to reduce the delay time or increase the percentage of the buffer pool used for the old sublist. Increasing the percentage makes the old sublist larger, so pages in that sublist take longer to move to the tail, which increases the likelihood that those pages will be accessed again and made young. (See the sketch following these notes.)

  • The non-youngs/s metric is applicable only to old pages. It is based on the number of accesses to pages and not the number of pages. There can be multiple accesses to a given page, all of which are counted. If you do not see a higher non-youngs/s value when performing large table scans (and a higher youngs/s value), increase the delay value.

  • The young-making rate accounts for accesses to all buffer pool pages, not just accesses to pages in the old sublist. The young-making rate and not rate do not normally add up to the overall buffer pool hit rate. Page hits in the old sublist cause pages to move to the new sublist, but page hits in the new sublist cause pages to move to the head of the list only if they are a certain distance from the head.

  • not (young-making rate) is the average hit rate at which page accesses have not resulted in making pages young due to the delay defined by innodb_old_blocks_time not being met, or due to page hits in the new sublist that did not result in pages being moved to the head. This rate accounts for accesses to all buffer pool pages, not just accesses to pages in the old sublist.
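
Both adjustments mentioned in these notes can be made at runtime through the innodb_old_blocks_time and innodb_old_blocks_pct variables. A hedged sketch (the values are illustrative; the defaults are 1000 milliseconds and 37 percent):

SET GLOBAL innodb_old_blocks_time = 500;  -- time a page must stay in the old sublist before an access can make it young, in milliseconds
SET GLOBAL innodb_old_blocks_pct = 40;    -- percentage of the buffer pool reserved for the old sublist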

Buffer pool server status variables and the INNODB_BUFFER_POOL_STATS table provide many of the same buffer pool metrics found in InnoDB Standard Monitor output. For more information, see Example 15.10, “Querying the INNODB_BUFFER_POOL_STATS Table” .

15.5.2 Change Buffer

The change buffer is a special data structure that caches changes to secondary index pages when those pages are not in the buffer pool . The buffered changes, which may result from INSERT , UPDATE , or DELETE operations (DML), are merged later when the pages are loaded into the buffer pool by other read operations.

Figure 15.3 Change Buffer

Content is described in the surrounding text.

Unlike clustered indexes , secondary indexes are usually nonunique, and inserts into secondary indexes happen in a relatively random order. Similarly, deletes and updates may affect secondary index pages that are not adjacently located in an index tree. Merging cached changes at a later time, when affected pages are read into the buffer pool by other operations, avoids substantial random access I/O that would be required to read secondary index pages into the buffer pool from disk.

Periodically, the purge operation that runs when the system is mostly idle, or during a slow shutdown, writes the updated index pages to disk. The purge operation can write disk blocks for a series of index values more efficiently than if each value were written to disk immediately.

Change buffer merging may take several hours when there are many affected rows and numerous secondary indexes to update. During this time, disk I/O is increased, which can cause a significant slowdown for disk-bound queries. Change buffer merging may also continue to occur after a transaction is committed, and even after a server shutdown and restart (see Section 15.20.2, “Forcing InnoDB Recovery” for more information).

In memory, the change buffer occupies part of the buffer pool. On disk, the change buffer is part of the system tablespace, where index changes are buffered when the database server is shut down.

The type of data cached in the change buffer is governed by the innodb_change_buffering variable. For more information, see Configuring Change Buffering . You can also configure the maximum change buffer size. For more information, see Configuring the Change Buffer Maximum Size .

Change buffering is not supported for a secondary index if the index contains a descending index column or if the primary key includes a descending index column.

For answers to frequently asked questions about the change buffer, see Section A.15, “MySQL 8.0 FAQ: InnoDB Change Buffer” .

Configuring Change Buffering

When INSERT , UPDATE , and DELETE operations are performed on a table, the values of indexed columns (particularly the values of secondary keys) are often in an unsorted order, requiring substantial I/O to bring secondary indexes up to date. The change buffer caches changes to secondary index entries when the relevant page is not in the buffer pool , thus avoiding expensive I/O operations by not immediately reading in the page from disk. The buffered changes are merged when the page is loaded into the buffer pool, and the updated page is later flushed to disk. The InnoDB main thread merges buffered changes when the server is nearly idle, and during a slow shutdown .

Because it can result in fewer disk reads and writes, the change buffer feature is most valuable for workloads that are I/O-bound, for example applications with a high volume of DML operations such as bulk inserts.

However, the change buffer occupies a part of the buffer pool, reducing the memory available to cache data pages. If the working set almost fits in the buffer pool, or if your tables have relatively few secondary indexes, it may be useful to disable change buffering. If the working data set fits entirely within the buffer pool, change buffering does not impose extra overhead, because it only applies to pages that are not in the buffer pool.

You can control the extent to which InnoDB performs change buffering using the innodb_change_buffering configuration parameter. You can enable or disable buffering for inserts, delete operations (when index records are initially marked for deletion) and purge operations (when index records are physically deleted). An update operation is a combination of an insert and a delete. The default innodb_change_buffering value is all .

Permitted innodb_change_buffering values include:

  • all

    The default value: buffer inserts, delete-marking operations, and purges.

  • none

    Do not buffer any operations.

  • inserts

    Buffer insert operations.

  • deletes

    Buffer delete-marking operations.

  • changes

    Buffer both inserts and delete-marking operations.

  • purges

    Buffer the physical deletion operations that happen in the background.

You can set the innodb_change_buffering parameter in the MySQL option file ( my.cnf or my.ini ) or change it dynamically with the SET GLOBAL statement, which requires privileges sufficient to set global system variables. See Section 5.1.9.1, “System Variable Privileges” . Changing the setting affects the buffering of new operations; the merging of existing buffered entries is not affected.
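
For example, a minimal sketch of disabling change buffering at runtime:

mysql> SET GLOBAL innodb_change_buffering = 'none';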

Configuring the Change Buffer Maximum Size

The innodb_change_buffer_max_size variable permits configuring the maximum size of the change buffer as a percentage of the total size of the buffer pool. By default, innodb_change_buffer_max_size is set to 25. The maximum setting is 50.

Consider increasing innodb_change_buffer_max_size on a MySQL server with heavy insert, update, and delete activity, where change buffer merging does not keep pace with new change buffer entries, causing the change buffer to reach its maximum size limit.

Consider decreasing innodb_change_buffer_max_size on a MySQL server with static data used for reporting, or if the change buffer consumes too much of the memory space shared with the buffer pool, causing pages to age out of the buffer pool sooner than desired.

Test different settings with a representative workload to determine an optimal configuration. The innodb_change_buffer_max_size setting is dynamic, which permits modifying the setting without restarting the server.
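
For example, a hedged sketch of adjusting the limit at runtime (the value 40 is illustrative, not a recommendation):

mysql> SET GLOBAL innodb_change_buffer_max_size = 40;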

Monitoring the Change Buffer

The following options are available for change buffer monitoring:

  • InnoDB Standard Monitor output includes change buffer status information. To view monitor data, issue the SHOW ENGINE INNODB STATUS statement.

    mysql> SHOW ENGINE INNODB STATUS\G
                                            

    Change buffer status information is located under the INSERT BUFFER AND ADAPTIVE HASH INDEX heading and appears similar to the following:

    -------------------------------------
    INSERT BUFFER AND ADAPTIVE HASH INDEX
    -------------------------------------
    Ibuf: size 1, free list len 0, seg size 2, 0 merges
    merged operations:
     insert 0, delete mark 0, delete 0
    discarded operations:
     insert 0, delete mark 0, delete 0
    Hash table size 4425293, used cells 32, node heap has 1 buffer(s)
    13577.57 hash searches/s, 202.47 non-hash searches/s
                                            

    For more information, see Section 15.16.3, “InnoDB Standard Monitor and Lock Monitor Output” .

  • The INFORMATION_SCHEMA.INNODB_METRICS table provides most of the data points found in InnoDB Standard Monitor output, plus other data points. To view change buffer metrics and a description of each, issue the following query:

    mysql> SELECT NAME, COMMENT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME LIKE '%ibuf%'\G
                                            

    For INNODB_METRICS table usage information, see Section 15.14.6, “InnoDB INFORMATION_SCHEMA Metrics Table” .

  • The INFORMATION_SCHEMA.INNODB_BUFFER_PAGE table provides metadata about each page in the buffer pool, including change buffer index and change buffer bitmap pages. Change buffer pages are identified by PAGE_TYPE . IBUF_INDEX is the page type for change buffer index pages, and IBUF_BITMAP is the page type for change buffer bitmap pages.

    Warning

    Querying the INNODB_BUFFER_PAGE table can introduce significant performance overhead. To avoid impacting performance, reproduce the issue you want to investigate on a test instance and run your queries on the test instance.

    For example, you can query the INNODB_BUFFER_PAGE table to determine the approximate number of IBUF_INDEX and IBUF_BITMAP pages as a percentage of total buffer pool pages.

    mysql> SELECT (SELECT COUNT(*) FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE
           WHERE PAGE_TYPE LIKE 'IBUF%') AS change_buffer_pages,
           (SELECT COUNT(*) FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE) AS total_pages,
           (SELECT ((change_buffer_pages/total_pages)*100))
           AS change_buffer_page_percentage;
    +---------------------+-------------+-------------------------------+
    | change_buffer_pages | total_pages | change_buffer_page_percentage |
    +---------------------+-------------+-------------------------------+
    |                  25 |        8192 |                        0.3052 |
    +---------------------+-------------+-------------------------------+
                                            

    For information about other data provided by the INNODB_BUFFER_PAGE table, see Section 25.39.1, “The INFORMATION_SCHEMA INNODB_BUFFER_PAGE Table” . For related usage information, see Section 15.14.5, “InnoDB INFORMATION_SCHEMA Buffer Pool Tables” .

  • Performance Schema provides change buffer mutex wait instrumentation for advanced performance monitoring. To view change buffer instrumentation, issue the following query:

    mysql> SELECT * FROM performance_schema.setup_instruments
           WHERE NAME LIKE '%wait/synch/mutex/innodb/ibuf%';
    +-------------------------------------------------------+---------+-------+
    | NAME                                                  | ENABLED | TIMED |
    +-------------------------------------------------------+---------+-------+
    | wait/synch/mutex/innodb/ibuf_bitmap_mutex             | YES     | YES   |
    | wait/synch/mutex/innodb/ibuf_mutex                    | YES     | YES   |
    | wait/synch/mutex/innodb/ibuf_pessimistic_insert_mutex | YES     | YES   |
    +-------------------------------------------------------+---------+-------+
                                            

    For information about monitoring InnoDB mutex waits, see Section 15.15.2, “Monitoring InnoDB Mutex Waits Using Performance Schema” .

15.5.3 Adaptive Hash Index

The adaptive hash index feature enables InnoDB to perform more like an in-memory database on systems with appropriate combinations of workload and sufficient memory for the buffer pool without sacrificing transactional features or reliability. The adaptive hash index feature is enabled by the innodb_adaptive_hash_index variable, or turned off at server startup by --skip-innodb-adaptive-hash-index .

Based on the observed pattern of searches, a hash index is built using a prefix of the index key. The prefix can be any length, and it may be that only some values in the B-tree appear in the hash index. Hash indexes are built on demand for the pages of the index that are accessed often.

If a table fits almost entirely in main memory, a hash index can speed up queries by enabling direct lookup of any element, turning the index value into a sort of pointer. InnoDB has a mechanism that monitors index searches. If InnoDB notices that queries could benefit from building a hash index, it does so automatically.

With some workloads, the speedup from hash index lookups greatly outweighs the extra work to monitor index lookups and maintain the hash index structure. Access to the adaptive hash index can sometimes become a source of contention under heavy workloads, such as multiple concurrent joins. Queries with LIKE operators and % wildcards also tend not to benefit. For workloads that do not benefit from the adaptive hash index feature, turning it off reduces unnecessary performance overhead. Because it is difficult to predict in advance whether the adaptive hash index feature is appropriate for a particular system and workload, consider running benchmarks with it enabled and disabled. Architectural changes in MySQL 5.6 make it more suitable to disable the adaptive hash index feature than in earlier releases.

The adaptive hash index feature is partitioned. Each index is bound to a specific partition, and each partition is protected by a separate latch. Partitioning is controlled by the innodb_adaptive_hash_index_parts variable. The innodb_adaptive_hash_index_parts variable is set to 8 by default. The maximum setting is 512.

You can monitor adaptive hash index use and contention in the SEMAPHORES section of SHOW ENGINE INNODB STATUS output. If there are numerous threads waiting on RW-latches created in btr0sea.c , consider increasing the number of adaptive hash index partitions or disabling the adaptive hash index feature.
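
For example, a minimal sketch: the feature itself can be toggled at runtime:

mysql> SET GLOBAL innodb_adaptive_hash_index = OFF;

The number of partitions, by contrast, can only be set at startup, for example in the option file (the value 16 is illustrative):

[mysqld]
innodb_adaptive_hash_index_parts=16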

For information about the performance characteristics of hash indexes, see Section 8.3.9, “Comparison of B-Tree and Hash Indexes” .

15.5.4 Log Buffer

The log buffer is the memory area that holds data to be written to the log files on disk. Log buffer size is defined by the innodb_log_buffer_size variable. The default size is 16MB. The contents of the log buffer are periodically flushed to disk. A large log buffer enables large transactions to run without the need to write redo log data to disk before the transactions commit. Thus, if you have transactions that update, insert, or delete many rows, increasing the size of the log buffer saves disk I/O.

The innodb_flush_log_at_trx_commit variable controls how the contents of the log buffer are written and flushed to disk. The innodb_flush_log_at_timeout variable controls log flushing frequency.
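
For example, a hedged option-file sketch for a workload with large transactions (the 64MB log buffer value is illustrative, not a recommendation):

[mysqld]
innodb_log_buffer_size=64M
innodb_flush_log_at_trx_commit=1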

For related information, see Memory Configuration , and Section 8.5.4, “Optimizing InnoDB Redo Logging” .

15.6 InnoDB On-Disk Structures

This section describes InnoDB on-disk structures and related topics.

15.6.1 Tables

This section covers topics related to InnoDB tables.

15.6.1.1 Creating InnoDB Tables

To create an InnoDB table, use the CREATE TABLE statement.

CREATE TABLE t1 (a INT, b CHAR (20), PRIMARY KEY (a)) ENGINE=InnoDB;
                            

You do not need to specify the ENGINE=InnoDB clause if InnoDB is defined as the default storage engine, which it is by default. To check the default storage engine, issue the following statement:

mysql> SELECT @@default_storage_engine;
+--------------------------+
| @@default_storage_engine |
+--------------------------+
| InnoDB                   |
+--------------------------+
                            

You might still use the ENGINE=InnoDB clause if you plan to use mysqldump or replication to replay the CREATE TABLE statement on a server where the default storage engine is not InnoDB .

An InnoDB table and its indexes can be created in the system tablespace , in a file-per-table tablespace, or in a general tablespace . When innodb_file_per_table is enabled, which is the default, an InnoDB table is implicitly created in an individual file-per-table tablespace. Conversely, when innodb_file_per_table is disabled, an InnoDB table is implicitly created in the InnoDB system tablespace. To create a table in a general tablespace, use CREATE TABLE ... TABLESPACE syntax. For more information, see Section 15.6.3.3, “General Tablespaces” .
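
For example, a minimal sketch of creating a general tablespace and then a table within it (the tablespace name ts1 and table name t2 are hypothetical):

CREATE TABLESPACE ts1 ADD DATAFILE 'ts1.ibd' ENGINE=InnoDB;
CREATE TABLE t2 (c1 INT PRIMARY KEY) TABLESPACE ts1;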

When you create a table in a file-per-table tablespace, MySQL creates an .ibd tablespace file in a database directory under the MySQL data directory, by default. A table created in the InnoDB system tablespace is created in an existing ibdata file , which resides in the MySQL data directory. A table created in a general tablespace is created in an existing general tablespace .ibd file . General tablespace files can be created inside or outside of the MySQL data directory. For more information, see Section 15.6.3.3, “General Tablespaces” .

Internally, InnoDB adds an entry for each table to the data dictionary. The entry includes the database name. For example, if table t1 is created in the test database, the data dictionary entry for that table is 'test/t1' . This means you can create a table of the same name ( t1 ) in a different database, and the table names do not collide inside InnoDB .

InnoDB Tables and Row Formats

The default row format for InnoDB tables is defined by the innodb_default_row_format configuration option, which has a default value of DYNAMIC . The DYNAMIC and COMPRESSED row formats allow you to take advantage of InnoDB features such as table compression and efficient off-page storage of long column values. To use these row formats, innodb_file_per_table must be enabled (the default).

SET GLOBAL innodb_file_per_table=1;
CREATE TABLE t3 (a INT, b CHAR (20), PRIMARY KEY (a)) ROW_FORMAT=DYNAMIC;
CREATE TABLE t4 (a INT, b CHAR (20), PRIMARY KEY (a)) ROW_FORMAT=COMPRESSED;
                                

Alternatively, you can use CREATE TABLE ... TABLESPACE syntax to create an InnoDB table in a general tablespace. General tablespaces support all row formats. For more information, see Section 15.6.3.3, “General Tablespaces” .

CREATE TABLE t1 (c1 INT PRIMARY KEY) TABLESPACE ts1 ROW_FORMAT=DYNAMIC;
                                

CREATE TABLE ... TABLESPACE syntax can also be used to create InnoDB tables with a Dynamic row format in the system tablespace, alongside tables with a Compact or Redundant row format.

CREATE TABLE t1 (c1 INT PRIMARY KEY) TABLESPACE = innodb_system ROW_FORMAT=DYNAMIC;
                                

For more information about InnoDB row formats, see Section 15.10, “InnoDB Row Formats” . For how to determine the row format of an InnoDB table and the physical characteristics of InnoDB row formats, see Section 15.10, “InnoDB Row Formats” .

InnoDB Tables and Primary Keys

Always define a primary key for an InnoDB table, specifying the column or columns that:

  • Are referenced by the most important queries.

  • Are never left blank.

  • Never have duplicate values.

  • Rarely if ever change value once inserted.

For example, in a table containing information about people, you would not create a primary key on (firstname, lastname) because more than one person can have the same name, some people have blank last names, and sometimes people change their names. With so many constraints, often there is not an obvious set of columns to use as a primary key, so you create a new column with a numeric ID to serve as all or part of the primary key. You can declare an auto-increment column so that ascending values are filled in automatically as rows are inserted:

# The value of ID can act like a pointer between related items in different tables.
CREATE TABLE t5 (id INT AUTO_INCREMENT, b CHAR (20), PRIMARY KEY (id));
# The primary key can consist of more than one column. Any autoinc column must come first.
CREATE TABLE t6 (id INT AUTO_INCREMENT, a INT, b CHAR (20), PRIMARY KEY (id,a));
                                

Although the table works correctly without defining a primary key, the primary key is involved with many aspects of performance and is a crucial design aspect for any large or frequently used table. It is recommended that you always specify a primary key in the CREATE TABLE statement. If you create the table, load data, and then run ALTER TABLE to add a primary key later, that operation is much slower than defining the primary key when creating the table.
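
For example, a minimal sketch of adding a primary key after the fact (the table and column names are hypothetical):

ALTER TABLE t7 ADD PRIMARY KEY (id);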

Viewing InnoDB Table Properties

To view the properties of an InnoDB table, issue a SHOW TABLE STATUS statement:

mysql> SHOW TABLE STATUS FROM test LIKE 't%' \G;
*************************** 1. row ***************************
           Name: t1
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 0
 Avg_row_length: 0
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2015-03-16 15:13:31
    Update_time: NULL
     Check_time: NULL
      Collation: utf8mb4_0900_ai_ci
       Checksum: NULL
 Create_options:
        Comment:
                                

For information about SHOW TABLE STATUS output, see Section 13.7.6.36, “SHOW TABLE STATUS Syntax” .

InnoDB table properties may also be queried using the InnoDB Information Schema system tables:

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_TABLES WHERE NAME='test/t1' \G
*************************** 1. row ***************************
     TABLE_ID: 45
         NAME: test/t1
         FLAG: 1
       N_COLS: 5
        SPACE: 35
   ROW_FORMAT: Compact
ZIP_PAGE_SIZE: 0
   SPACE_TYPE: Single
                                

For more information, see Section 15.14.3, “InnoDB INFORMATION_SCHEMA Schema Object Tables” .

15.6.1.2 Moving or Copying InnoDB Tables

This section describes techniques for moving or copying some or all InnoDB tables to a different server or instance. For example, you might move an entire MySQL instance to a larger, faster server; you might clone an entire MySQL instance to a new replication slave server; you might copy individual tables to another instance to develop and test an application, or to a data warehouse server to produce reports.

On Windows, InnoDB always stores database and table names internally in lowercase. To move databases in a binary format from Unix to Windows or from Windows to Unix, create all databases and tables using lowercase names. A convenient way to accomplish this is to add the following line to the [mysqld] section of your my.cnf or my.ini file before creating any databases or tables:

[mysqld]
lower_case_table_names=1
                            
Note

It is prohibited to start the server with a lower_case_table_names setting that is different from the setting used when the server was initialized.

Techniques for moving or copying InnoDB tables include:

Transportable Tablespaces

The transportable tablespaces feature uses FLUSH TABLES ... FOR EXPORT to ready InnoDB tables for copying from one server instance to another. To use this feature, InnoDB tables must be created with innodb_file_per_table set to ON so that each InnoDB table has its own tablespace. For usage information, see Section 15.6.3.7, “Copying Tablespaces to Another Instance” .
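
A hedged sketch of the overall flow, assuming a table named t that exists with the same definition on both the source and destination instances; see Section 15.6.3.7, “Copying Tablespaces to Another Instance”, for the full procedure and its prerequisites:

-- On the destination instance (run in the database that contains the table):
ALTER TABLE t DISCARD TABLESPACE;

-- On the source instance:
FLUSH TABLES t FOR EXPORT;
-- copy t.ibd and t.cfg from the source database directory to the destination
UNLOCK TABLES;

-- On the destination instance, after copying the files:
ALTER TABLE t IMPORT TABLESPACE;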

MySQL Enterprise Backup

The MySQL Enterprise Backup product lets you back up a running MySQL database with minimal disruption to operations while producing a consistent snapshot of the database. When MySQL Enterprise Backup is copying tables, reads and writes can continue. In addition, MySQL Enterprise Backup can create compressed backup files, and back up subsets of tables. In conjunction with the MySQL binary log, you can perform point-in-time recovery. MySQL Enterprise Backup is included as part of the MySQL Enterprise subscription.

For more details about MySQL Enterprise Backup, see Section 30.2, “MySQL Enterprise Backup Overview” .

Copying Data Files (Cold Backup Method)

You can move an InnoDB database simply by copying all the relevant files listed under "Cold Backups" in Section 15.17.1, “InnoDB Backup” .

InnoDB data and log files are binary-compatible on all platforms having the same floating-point number format. If the floating-point formats differ but you have not used FLOAT or DOUBLE data types in your tables, then the procedure is the same: simply copy the relevant files.

When you move or copy file-per-table .ibd files, the database directory name must be the same on the source and destination systems. The table definition stored in the InnoDB shared tablespace includes the database name. The transaction IDs and log sequence numbers stored in the tablespace files also differ between databases.

To move an .ibd file and the associated table from one database to another, use a RENAME TABLE statement:

RENAME TABLE db1.tbl_name TO db2.tbl_name;
                            

If you have a clean backup of an .ibd file, you can restore it to the MySQL installation from which it originated as follows:

  1. The table must not have been dropped or truncated since you copied the .ibd file, because doing so changes the table ID stored inside the tablespace.

  2. Issue this ALTER TABLE statement to delete the current .ibd file:

    ALTER TABLE tbl_name DISCARD TABLESPACE;
                                            
  3. Copy the backup .ibd file to the proper database directory.

  4. Issue this ALTER TABLE statement to tell InnoDB to use the new .ibd file for the table:

    ALTER TABLE tbl_name IMPORT TABLESPACE;
                                            
    Note

    The ALTER TABLE ... IMPORT TABLESPACE feature does not enforce foreign key constraints on imported data.

In this context, a clean .ibd file backup is one for which the following requirements are satisfied:

  • There are no uncommitted modifications by transactions in the .ibd file.

  • There are no unmerged insert buffer entries in the .ibd file.

  • Purge has removed all delete-marked index records from the .ibd file.

  • mysqld has flushed all modified pages of the .ibd file from the buffer pool to the file.

You can make a clean backup .ibd file using the following method:

  1. Stop all activity from the mysqld server and commit all transactions.

  2. Wait until SHOW ENGINE INNODB STATUS shows that there are no active transactions in the database, and the main thread status of InnoDB is Waiting for server activity . Then you can make a copy of the .ibd file.

Another method for making a clean copy of an .ibd file is to use the MySQL Enterprise Backup product:

  1. Use MySQL Enterprise Backup to back up the InnoDB installation.

  2. Start a second mysqld server on the backup and let it clean up the .ibd files in the backup.

Export and Import (mysqldump)

You can use mysqldump to dump your tables on one machine and then import the dump files on the other machine. Using this method, it does not matter whether the formats differ or if your tables contain floating-point data.

One way to increase the performance of this method is to switch off autocommit mode when importing data, assuming that the tablespace has enough space for the big rollback segment that the import transactions generate. Do the commit only after importing a whole table or a segment of a table.
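
For example, an import that follows this advice might look like the following sketch, where the data for one table (or one segment of a table) is loaded between the autocommit statements:

SET autocommit=0;
... import the data for one table or one segment of a table ...
COMMIT;
SET autocommit=1;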

15.6.1.3 Converting Tables from MyISAM to InnoDB

If you have MyISAM tables that you want to convert to InnoDB for better reliability and scalability, review the following guidelines and tips before converting.

Note

Partitioned MyISAM tables created in previous versions of MySQL are not compatible with MySQL 8.0. Such tables must be prepared prior to upgrade, either by removing the partitioning, or by converting them to InnoDB . See Section 23.6.2, “Partitioning Limitations Relating to Storage Engines” , for more information.

Adjusting Memory Usage for MyISAM and InnoDB

As you transition away from MyISAM tables, lower the value of the key_buffer_size configuration option to free memory no longer needed for caching results. Increase the value of the innodb_buffer_pool_size configuration option, which performs a similar role of allocating cache memory for InnoDB tables. The InnoDB buffer pool caches both table data and index data, speeding up lookups for queries and keeping query results in memory for reuse. For guidance regarding buffer pool size configuration, see Section 8.12.3.1, “How MySQL Uses Memory” .
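
For example, on a server that is being converted to mostly InnoDB tables, the my.cnf adjustments might look like this sketch (the values are illustrative only, not recommendations):

[mysqld]
# Shrink the MyISAM key cache as MyISAM tables are phased out
key_buffer_size=16M
# Give the reclaimed memory (and more) to the InnoDB buffer pool
innodb_buffer_pool_size=6G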

Handling Too-Long Or Too-Short Transactions

Because MyISAM tables do not support transactions , you might not have paid much attention to the autocommit configuration option and the COMMIT and ROLLBACK statements. These keywords are important to allow multiple sessions to read and write InnoDB tables concurrently, providing substantial scalability benefits in write-heavy workloads.

While a transaction is open, the system keeps a snapshot of the data as seen at the beginning of the transaction, which can cause substantial overhead if the system inserts, updates, and deletes millions of rows while a stray transaction keeps running. Thus, take care to avoid transactions that run for too long:

  • If you are using a mysql session for interactive experiments, always COMMIT (to finalize the changes) or ROLLBACK (to undo the changes) when finished. Close down interactive sessions rather than leave them open for long periods, to avoid keeping transactions open for long periods by accident.

  • Make sure that any error handlers in your application also ROLLBACK incomplete changes or COMMIT completed changes.

  • ROLLBACK is a relatively expensive operation, because INSERT , UPDATE , and DELETE operations are written to InnoDB tables prior to the COMMIT , with the expectation that most changes are committed successfully and rollbacks are rare. When experimenting with large volumes of data, avoid making changes to large numbers of rows and then rolling back those changes.

  • When loading large volumes of data with a sequence of INSERT statements, periodically COMMIT the results to avoid having transactions that last for hours. In typical load operations for data warehousing, if something goes wrong, you truncate the table (using TRUNCATE TABLE ) and start over from the beginning rather than doing a ROLLBACK .

The preceding tips save memory and disk space that can be wasted during too-long transactions. When transactions are shorter than they should be, the problem is excessive I/O. With each COMMIT , MySQL makes sure each change is safely recorded to disk, which involves some I/O.

  • For most operations on InnoDB tables, you should use the setting autocommit=0 . From an efficiency perspective, this avoids unnecessary I/O when you issue large numbers of consecutive INSERT , UPDATE , or DELETE statements. From a safety perspective, this allows you to issue a ROLLBACK statement to recover lost or garbled data if you make a mistake on the mysql command line, or in an exception handler in your application.

  • The time when autocommit=1 is suitable for InnoDB tables is when running a sequence of queries for generating reports or analyzing statistics. In this situation, there is no I/O penalty related to COMMIT or ROLLBACK , and InnoDB can automatically optimize the read-only workload .

  • If you make a series of related changes, finalize all the changes at once with a single COMMIT at the end. For example, if you insert related pieces of information into several tables, do a single COMMIT after making all the changes. Or if you run many consecutive INSERT statements, do a single COMMIT after all the data is loaded; if you are doing millions of INSERT statements, perhaps split up the huge transaction by issuing a COMMIT every ten thousand or hundred thousand records, so the transaction does not grow too large (a sketch of this batching pattern follows this list).

  • Remember that even a SELECT statement opens a transaction, so after running some report or debugging queries in an interactive mysql session, either issue a COMMIT or close the mysql session.
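
The following sketch shows the batching pattern referred to above; big_table, source_table, and the id ranges are placeholders:

SET autocommit=0;
INSERT INTO big_table SELECT * FROM source_table WHERE id BETWEEN 1 AND 100000;
COMMIT;
INSERT INTO big_table SELECT * FROM source_table WHERE id BETWEEN 100001 AND 200000;
COMMIT;
-- ... continue in batches until all rows are loaded ...
SET autocommit=1;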

Handling Deadlocks

You might see warning messages referring to deadlocks in the MySQL error log, or the output of SHOW ENGINE INNODB STATUS . Despite the scary-sounding name, a deadlock is not a serious issue for InnoDB tables, and often does not require any corrective action. When two transactions start modifying multiple tables, accessing the tables in a different order, they can reach a state where each transaction is waiting for the other and neither can proceed. When deadlock detection is enabled (the default), MySQL immediately detects this condition and cancels ( rolls back ) the smaller transaction, allowing the other to proceed. If deadlock detection is disabled using the innodb_deadlock_detect configuration option, InnoDB relies on the innodb_lock_wait_timeout setting to roll back transactions in case of a deadlock.

Either way, your applications need error-handling logic to restart a transaction that is forcibly cancelled due to a deadlock. When you re-issue the same SQL statements as before, the original timing issue no longer applies. Either the other transaction has already finished and yours can proceed, or the other transaction is still in progress and your transaction waits until it finishes.

If deadlock warnings occur constantly, you might review the application code to reorder the SQL operations in a consistent way, or to shorten the transactions. You can test with the innodb_print_all_deadlocks option enabled to see all deadlock warnings in the MySQL error log, rather than only the last warning in the SHOW ENGINE INNODB STATUS output.
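
For example, because innodb_print_all_deadlocks is a dynamic variable, it can be enabled temporarily while reproducing the problem:

SET GLOBAL innodb_print_all_deadlocks=ON;
-- ... run the workload that produces the deadlocks, then inspect the error log ...
SET GLOBAL innodb_print_all_deadlocks=OFF;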

For more information, see Section 15.7.5, “Deadlocks in InnoDB” .

Planning the Storage Layout

To get the best performance from InnoDB tables, you can adjust a number of parameters related to storage layout.

When you convert MyISAM tables that are large, frequently accessed, and hold vital data, investigate and consider the innodb_file_per_table and innodb_page_size configuration options, and the ROW_FORMAT and KEY_BLOCK_SIZE clauses of the CREATE TABLE statement.

During your initial experiments, the most important setting is innodb_file_per_table . When this setting is enabled, which is the default, new InnoDB tables are implicitly created in file-per-table tablespaces. In contrast with the InnoDB system tablespace, file-per-table tablespaces allow disk space to be reclaimed by the operating system when a table is truncated or dropped. File-per-table tablespaces also support DYNAMIC and COMPRESSED row formats and associated features such as table compression, efficient off-page storage for long variable-length columns, and large index prefixes. For more information, see Section 15.6.3.2, “File-Per-Table Tablespaces” .

You can also store InnoDB tables in a shared general tablespace. General tablespaces support multiple tables and all row formats. For more information, see Section 15.6.3.3, “General Tablespaces” .

Converting an Existing Table

To convert a non-InnoDB table to use InnoDB, use ALTER TABLE:

ALTER TABLE table_name ENGINE=InnoDB;
                                
Cloning the Structure of a Table

You might make an InnoDB table that is a clone of a MyISAM table, rather than using ALTER TABLE to perform conversion, to test the old and new table side-by-side before switching.

Create an empty InnoDB table with identical column and index definitions. Use SHOW CREATE TABLE table_name \G to see the full CREATE TABLE statement to use. Change the ENGINE clause to ENGINE=INNODB .
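
For example, assuming a hypothetical MyISAM table t1 with an integer key and one data column, the cloned definition might look like this:

SHOW CREATE TABLE t1\G
-- Copy the CREATE TABLE statement from the output, give the clone a new name,
-- and change the ENGINE clause:
CREATE TABLE t1_innodb (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  payload VARCHAR(100)
) ENGINE=InnoDB;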

Transferring Existing Data

To transfer a large volume of data into an empty InnoDB table created as shown in the previous section, insert the rows with INSERT INTO innodb_table SELECT * FROM myisam_table ORDER BY primary_key_columns .

You can also create the indexes for the InnoDB table after inserting the data. Historically, creating new secondary indexes was a slow operation for InnoDB, but now you can create the indexes after the data is loaded with relatively little overhead from the index creation step.

If you have UNIQUE constraints on secondary keys, you can speed up a table import by turning off the uniqueness checks temporarily during the import operation:

SET unique_checks=0;
... import operation ...
SET unique_checks=1;
                                

For big tables, this saves disk I/O because InnoDB can use its change buffer to write secondary index records as a batch. Be certain that the data contains no duplicate keys. unique_checks permits but does not require storage engines to ignore duplicate keys.

For better control over the insertion process, you can insert big tables in pieces:

INSERT INTO newtable SELECT * FROM oldtable
   WHERE yourkey > something AND yourkey <= somethingelse;
                                

After all records are inserted, you can rename the tables.
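
For example, to swap the converted table into place while keeping the original MyISAM table as a backup (the table names are placeholders):

RENAME TABLE oldtable TO oldtable_myisam_backup, newtable TO oldtable;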

During the conversion of big tables, increase the size of the InnoDB buffer pool to reduce disk I/O, to a maximum of 80% of physical memory. You can also increase the size of InnoDB log files.

Storage Requirements

If you intend to make several temporary copies of your data in InnoDB tables during the conversion process, it is recommended that you create the tables in file-per-table tablespaces so that you can reclaim the disk space when you drop the tables. When the innodb_file_per_table configuration option is enabled (the default), newly created InnoDB tables are implicitly created in file-per-table tablespaces.

Whether you convert the MyISAM table directly or create a cloned InnoDB table, make sure that you have sufficient disk space to hold both the old and new tables during the process. InnoDB tables require more disk space than MyISAM tables. If an ALTER TABLE operation runs out of space, it starts a rollback, and that can take hours if it is disk-bound. For inserts, InnoDB uses the insert buffer to merge secondary index records to indexes in batches. That saves a lot of disk I/O. For rollback, no such mechanism is used, and the rollback can take 30 times longer than the insertion.

In the case of a runaway rollback, if you do not have valuable data in your database, it may be advisable to kill the database process rather than wait for millions of disk I/O operations to complete. For the complete procedure, see Section 15.20.2, “Forcing InnoDB Recovery” .

Defining a PRIMARY KEY for Each Table

The PRIMARY KEY clause is a critical factor affecting the performance of MySQL queries and the space usage for tables and indexes. The primary key uniquely identifies a row in a table. Every row in the table must have a primary key value, and no two rows can have the same primary key value.

These are guidelines for the primary key, followed by more detailed explanations.

  • Declare a PRIMARY KEY for each table. Typically, it is the most important column that you refer to in WHERE clauses when looking up a single row.

  • Declare the PRIMARY KEY clause in the original CREATE TABLE statement, rather than adding it later through an ALTER TABLE statement.

  • Choose the column and its data type carefully. Prefer numeric columns over character or string ones.

  • Consider using an auto-increment column if there is not another stable, unique, non-null, numeric column to use.

  • An auto-increment column is also a good choice if there is any doubt whether the value of the primary key column could ever change. Changing the value of a primary key column is an expensive operation, possibly involving rearranging data within the table and within each secondary index.

Consider adding a primary key to any table that does not already have one. Use the smallest practical numeric type based on the maximum projected size of the table. This can make each row slightly more compact, which can yield substantial space savings for large tables. The space savings are multiplied if the table has any secondary indexes , because the primary key value is repeated in each secondary index entry. In addition to reducing data size on disk, a small primary key also lets more data fit into the buffer pool , speeding up all kinds of operations and improving concurrency.

If the table already has a primary key on some longer column, such as a VARCHAR , consider adding a new unsigned AUTO_INCREMENT column and switching the primary key to that, even if that column is not referenced in queries. This design change can produce substantial space savings in the secondary indexes. You can designate the former primary key columns as UNIQUE NOT NULL to enforce the same constraints as the PRIMARY KEY clause, that is, to prevent duplicate or null values across all those columns.
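
A sketch of such a change, assuming a hypothetical customers table whose primary key is currently a long customer_code VARCHAR column:

ALTER TABLE customers
  DROP PRIMARY KEY,
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT FIRST,
  ADD PRIMARY KEY (id),
  ADD UNIQUE KEY (customer_code);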

If you spread related information across multiple tables, typically each table uses the same column for its primary key. For example, a personnel database might have several tables, each with a primary key of employee number. A sales database might have some tables with a primary key of customer number, and other tables with a primary key of order number. Because lookups using the primary key are very fast, you can construct efficient join queries for such tables.

If you leave the PRIMARY KEY clause out entirely, MySQL creates an invisible one for you. It is a 6-byte value that might be longer than you need, thus wasting space. Because it is hidden, you cannot refer to it in queries.

Application Performance Considerations

The reliability and scalability features of InnoDB require more disk storage than equivalent MyISAM tables. You might change the column and index definitions slightly, for better space utilization, reduced I/O and memory consumption when processing result sets, and better query optimization plans making efficient use of index lookups.

If you do set up a numeric ID column for the primary key, use that value to cross-reference with related values in any other tables, particularly for join queries. For example, rather than accepting a country name as input and doing queries searching for the same name, do one lookup to determine the country ID, then do other queries (or a single join query) to look up relevant information across several tables. Rather than storing a customer or catalog item number as a string of digits, potentially using up several bytes, convert it to a numeric ID for storing and querying. A 4-byte unsigned INT column can index over 4 billion items (with the US meaning of billion: 1000 million). For the ranges of the different integer types, see Section 11.2.1, “Integer Types (Exact Value) - INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT, BIGINT” .
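
A sketch of this pattern, using hypothetical country and city tables keyed by a numeric country_id:

-- Look up the numeric ID once ...
SELECT country_id FROM country WHERE name = 'Finland';

-- ... or use a join on the ID for subsequent queries
SELECT ci.name, ci.population
  FROM city ci
  JOIN country co ON co.country_id = ci.country_id
 WHERE co.name = 'Finland';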

Understanding Files Associated with InnoDB Tables

InnoDB files require more care and planning than MyISAM files do.

15.6.1.4 AUTO_INCREMENT Handling in InnoDB

InnoDB provides a configurable locking mechanism that can significantly improve scalability and performance of SQL statements that add rows to tables with AUTO_INCREMENT columns. To use the AUTO_INCREMENT mechanism with an InnoDB table, an AUTO_INCREMENT column must be defined as part of an index such that it is possible to perform the equivalent of an indexed SELECT MAX( ai_col ) lookup on the table to obtain the maximum column value. Typically, this is achieved by making the column the first column of some table index.

This section describes the behavior of AUTO_INCREMENT lock modes, usage implications for different AUTO_INCREMENT lock mode settings, and how InnoDB initializes the AUTO_INCREMENT counter.

InnoDB AUTO_INCREMENT Lock Modes

This section describes the behavior of AUTO_INCREMENT lock modes used to generate auto-increment values, and how each lock mode affects replication. Auto-increment lock modes are configured at startup using the innodb_autoinc_lock_mode configuration parameter.

The following terms are used in describing innodb_autoinc_lock_mode settings:

  • INSERT -like statements

    All statements that generate new rows in a table, including INSERT , INSERT ... SELECT , REPLACE , REPLACE ... SELECT , and LOAD DATA . Includes simple-inserts , bulk-inserts , and mixed-mode inserts.

  • Simple inserts

    Statements for which the number of rows to be inserted can be determined in advance (when the statement is initially processed). This includes single-row and multiple-row INSERT and REPLACE statements that do not have a nested subquery, but not INSERT ... ON DUPLICATE KEY UPDATE .

  • Bulk inserts

    Statements for which the number of rows to be inserted (and the number of required auto-increment values) is not known in advance. This includes INSERT ... SELECT , REPLACE ... SELECT , and LOAD DATA statements, but not plain INSERT . InnoDB assigns new values for the AUTO_INCREMENT column one at a time as each row is processed.

  • Mixed-mode inserts

    These are simple insert statements that specify the auto-increment value for some (but not all) of the new rows. An example follows, where c1 is an AUTO_INCREMENT column of table t1 :

    INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d');
                                                

    Another type of mixed-mode insert is INSERT ... ON DUPLICATE KEY UPDATE , which in the worst case is in effect an INSERT followed by an UPDATE , where the allocated value for the AUTO_INCREMENT column may or may not be used during the update phase.

There are three possible settings for the innodb_autoinc_lock_mode configuration parameter. The settings are 0, 1, or 2, for traditional , consecutive , or interleaved lock mode, respectively. As of MySQL 8.0, interleaved lock mode ( innodb_autoinc_lock_mode=2 ) is the default setting. Prior to MySQL 8.0, consecutive lock mode is the default ( innodb_autoinc_lock_mode=1 ).

The default setting of interleaved lock mode in MySQL 8.0 reflects the change from statement-based replication to row-based replication as the default replication type. Statement-based replication requires the consecutive auto-increment lock mode to ensure that auto-increment values are assigned in a predictable and repeatable order for a given sequence of SQL statements, whereas row-based replication is not sensitive to the execution order of SQL statements.

  • innodb_autoinc_lock_mode = 0 ( traditional lock mode)

    The traditional lock mode provides the same behavior that existed before the innodb_autoinc_lock_mode configuration parameter was introduced in MySQL 5.1. The traditional lock mode option is provided for backward compatibility, performance testing, and working around issues with “mixed-mode inserts”, due to possible differences in semantics.

    In this lock mode, all INSERT-like statements obtain a special table-level AUTO-INC lock for inserts into tables with AUTO_INCREMENT columns. This lock is normally held to the end of the statement (not to the end of the transaction) to ensure that auto-increment values are assigned in a predictable and repeatable order for a given sequence of INSERT statements, and to ensure that auto-increment values assigned by any given statement are consecutive.

    In the case of statement-based replication, this means that when an SQL statement is replicated on a slave server, the same values are used for the auto-increment column as on the master server. The result of execution of multiple INSERT statements is deterministic, and the slave reproduces the same data as on the master. If auto-increment values generated by multiple INSERT statements were interleaved, the result of two concurrent INSERT statements would be nondeterministic, and could not reliably be propagated to a slave server using statement-based replication.

    To make this clear, consider an example that uses this table:

    CREATE TABLE t1 (
      c1 INT(11) NOT NULL AUTO_INCREMENT,
      c2 VARCHAR(10) DEFAULT NULL,
      PRIMARY KEY (c1)
    ) ENGINE=InnoDB;
                                                

    Suppose that there are two transactions running, each inserting rows into a table with an AUTO_INCREMENT column. One transaction is using an INSERT ... SELECT statement that inserts 1000 rows, and another is using a simple INSERT statement that inserts one row:

    Tx1: INSERT INTO t1 (c2) SELECT 1000 rows from another table ...
    Tx2: INSERT INTO t1 (c2) VALUES ('xxx');
                                                

    InnoDB cannot tell in advance how many rows are retrieved from the SELECT in the INSERT statement in Tx1, and it assigns the auto-increment values one at a time as the statement proceeds. With a table-level lock, held to the end of the statement, only one INSERT statement referring to table t1 can execute at a time, and the generation of auto-increment numbers by different statements is not interleaved. The auto-increment values generated by the Tx1 INSERT ... SELECT statement are consecutive, and the (single) auto-increment value used by the INSERT statement in Tx2 is either smaller or larger than all those used for Tx1, depending on which statement executes first.

    As long as the SQL statements execute in the same order when replayed from the binary log (when using statement-based replication, or in recovery scenarios), the results are the same as they were when Tx1 and Tx2 first ran. Thus, table-level locks held until the end of a statement make INSERT statements using auto-increment safe for use with statement-based replication. However, those table-level locks limit concurrency and scalability when multiple transactions are executing insert statements at the same time.

    In the preceding example, if there were no table-level lock, the value of the auto-increment column used for the INSERT in Tx2 depends on precisely when the statement executes. If the INSERT of Tx2 executes while the INSERT of Tx1 is running (rather than before it starts or after it completes), the specific auto-increment values assigned by the two INSERT statements are nondeterministic, and may vary from run to run.

    Under the consecutive lock mode, InnoDB can avoid using table-level AUTO-INC locks for simple insert statements where the number of rows is known in advance, and still preserve deterministic execution and safety for statement-based replication.

    If you are not using the binary log to replay SQL statements as part of recovery or replication, the interleaved lock mode can be used to eliminate all use of table-level AUTO-INC locks for even greater concurrency and performance, at the cost of permitting gaps in auto-increment numbers assigned by a statement and potentially having the numbers assigned by concurrently executing statements interleaved.

  • innodb_autoinc_lock_mode = 1 ( consecutive lock mode)

    In this mode, bulk inserts use the special AUTO-INC table-level lock and hold it until the end of the statement. This applies to all INSERT ... SELECT , REPLACE ... SELECT , and LOAD DATA statements. Only one statement holding the AUTO-INC lock can execute at a time. If the source table of the bulk insert operation is different from the target table, the AUTO-INC lock on the target table is taken after a shared lock is taken on the first row selected from the source table. If the source and target of the bulk insert operation are the same table, the AUTO-INC lock is taken after shared locks are taken on all selected rows.

    Simple inserts (for which the number of rows to be inserted is known in advance) avoid table-level AUTO-INC locks by obtaining the required number of auto-increment values under the control of a mutex (a light-weight lock) that is only held for the duration of the allocation process, not until the statement completes. No table-level AUTO-INC lock is used unless an AUTO-INC lock is held by another transaction. If another transaction holds an AUTO-INC lock, a simple insert waits for the AUTO-INC lock, as if it were a bulk insert .

    This lock mode ensures that, in the presence of INSERT statements where the number of rows is not known in advance (and where auto-increment numbers are assigned as the statement progresses), all auto-increment values assigned by any INSERT -like statement are consecutive, and operations are safe for statement-based replication.

    Simply put, this lock mode significantly improves scalability while being safe for use with statement-based replication. Further, as with traditional lock mode, auto-increment numbers assigned by any given statement are consecutive . There is no change in semantics compared to traditional mode for any statement that uses auto-increment, with one important exception.

    The exception is for mixed-mode inserts , where the user provides explicit values for an AUTO_INCREMENT column for some, but not all, rows in a multiple-row simple insert . For such inserts, InnoDB allocates more auto-increment values than the number of rows to be inserted. However, all values that are automatically assigned are consecutive, and all are higher than the auto-increment value generated by the most recently executed previous statement. Excess numbers are lost.

  • innodb_autoinc_lock_mode = 2 ( interleaved lock mode)

    In this lock mode, no INSERT -like statements use the table-level AUTO-INC lock, and multiple statements can execute at the same time. This is the fastest and most scalable lock mode, but it is not safe when using statement-based replication or recovery scenarios when SQL statements are replayed from the binary log.

    In this lock mode, auto-increment values are guaranteed to be unique and monotonically increasing across all concurrently executing INSERT -like statements. However, because multiple statements can be generating numbers at the same time (that is, allocation of numbers is interleaved across statements), the values generated for the rows inserted by any given statement may not be consecutive.

    If the only statements executing are simple inserts where the number of rows to be inserted is known ahead of time, there are no gaps in the numbers generated for a single statement, except for mixed-mode inserts . However, when bulk inserts are executed, there may be gaps in the auto-increment values assigned by any given statement.

InnoDB AUTO_INCREMENT Lock Mode Usage Implications
  • Using auto-increment with replication

    If you are using statement-based replication, set innodb_autoinc_lock_mode to 0 or 1 and use the same value on the master and its slaves. Auto-increment values are not ensured to be the same on the slaves as on the master if you use innodb_autoinc_lock_mode = 2 ( interleaved ) or configurations where the master and slaves do not use the same lock mode.

    If you are using row-based or mixed-format replication, all of the auto-increment lock modes are safe, since row-based replication is not sensitive to the order of execution of the SQL statements (and the mixed format uses row-based replication for any statements that are unsafe for statement-based replication).

  • Lost auto-increment values and sequence gaps

    In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are lost . Once a value is generated for an auto-increment column, it cannot be rolled back, whether or not the INSERT -like statement is completed, and whether or not the containing transaction is rolled back. Such lost values are not reused. Thus, there may be gaps in the values stored in an AUTO_INCREMENT column of a table.

  • Specifying NULL or 0 for the AUTO_INCREMENT column

    In all lock modes (0, 1, and 2), if a user specifies NULL or 0 for the AUTO_INCREMENT column in an INSERT , InnoDB treats the row as if the value was not specified and generates a new value for it.

  • Assigning a negative value to the AUTO_INCREMENT column

    In all lock modes (0, 1, and 2), the behavior of the auto-increment mechanism is not defined if you assign a negative value to the AUTO_INCREMENT column.

  • If the AUTO_INCREMENT value becomes larger than the maximum integer for the specified integer type

    In all lock modes (0, 1, and 2), the behavior of the auto-increment mechanism is not defined if the value becomes larger than the maximum integer that can be stored in the specified integer type.

  • Gaps in auto-increment values for bulk inserts

    With innodb_autoinc_lock_mode set to 0 ( traditional ) or 1 ( consecutive ), the auto-increment values generated by any given statement are consecutive, without gaps, because the table-level AUTO-INC lock is held until the end of the statement, and only one such statement can execute at a time.

    With innodb_autoinc_lock_mode set to 2 ( interleaved ), there may be gaps in the auto-increment values generated by bulk inserts, but only if there are concurrently executing INSERT -like statements.

    For lock modes 1 or 2, gaps may occur between successive statements because for bulk inserts the exact number of auto-increment values required by each statement may not be known and overestimation is possible.

  • Auto-increment values assigned by mixed-mode inserts

    Consider a mixed-mode insert, where a simple insert specifies the auto-increment value for some (but not all) resulting rows. Such a statement behaves differently in lock modes 0, 1, and 2. For example, assume c1 is an AUTO_INCREMENT column of table t1 , and that the most recent automatically generated sequence number is 100.

    mysql> CREATE TABLE t1 (
        -> c1 INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, 
        -> c2 CHAR(1)
        -> ) ENGINE = INNODB;
                                                

    Now, consider the following mixed-mode insert statement:

    mysql> INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d');
                                                

    With innodb_autoinc_lock_mode set to 0 ( traditional ), the four new rows are:

    mysql> SELECT c1, c2 FROM t1 ORDER BY c2;
    +-----+------+
    | c1  | c2   |
    +-----+------+
    |   1 | a    |
    | 101 | b    |
    |   5 | c    |
    | 102 | d    |
    +-----+------+
                                                

    The next available auto-increment value is 103 because the auto-increment values are allocated one at a time, not all at once at the beginning of statement execution. This result is true whether or not there are concurrently executing INSERT -like statements (of any type).

    With innodb_autoinc_lock_mode set to 1 ( consecutive ), the four new rows are also:

    mysql> SELECT c1, c2 FROM t1 ORDER BY c2;
    +-----+------+
    | c1  | c2   |
    +-----+------+
    |   1 | a    |
    | 101 | b    |
    |   5 | c    |
    | 102 | d    |
    +-----+------+
                                                

    However, in this case, the next available auto-increment value is 105, not 103, because four auto-increment values are allocated at the time the statement is processed, but only two are used. This result is true whether or not there are concurrently executing INSERT -like statements (of any type).

    With innodb_autoinc_lock_mode set to mode 2 ( interleaved ), the four new rows are:

    mysql> SELECT c1, c2 FROM t1 ORDER BY c2;
    +-----+------+
    | c1  | c2   |
    +-----+------+
    |   1 | a    |
    |   x | b    |
    |   5 | c    |
    |   y | d    |
    +-----+------+
                                                

    The values of x and y are unique and larger than any previously generated rows. However, the specific values of x and y depend on the number of auto-increment values generated by concurrently executing statements.

    Finally, consider the following statement, issued when the most-recently generated sequence number is 100:

    mysql> INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (101,'c'), (NULL,'d');
                                                

    With any innodb_autoinc_lock_mode setting, this statement generates a duplicate-key error 23000 ( Can't write; duplicate key in table ) because 101 is allocated for the row (NULL, 'b') and insertion of the row (101, 'c') fails.

  • Modifying AUTO_INCREMENT column values in the middle of a sequence of INSERT statements

    In MySQL 5.7 and earlier, modifying an AUTO_INCREMENT column value in the middle of a sequence of INSERT statements could lead to Duplicate entry errors. For example, if you performed an UPDATE operation that changed an AUTO_INCREMENT column value to a value larger than the current maximum auto-increment value, subsequent INSERT operations that did not specify an unused auto-increment value could encounter Duplicate entry errors. In MySQL 8.0 and later, if you modify an AUTO_INCREMENT column value to a value larger than the current maximum auto-increment value, the new value is persisted, and subsequent INSERT operations allocate auto-increment values starting from the new, larger value. This behavior is demonstrated in the following example.

    mysql> CREATE TABLE t1 (
        -> c1 INT NOT NULL AUTO_INCREMENT,
        -> PRIMARY KEY (c1)
        ->  ) ENGINE = InnoDB;
    mysql> INSERT INTO t1 VALUES(0), (0), (3);
    mysql> SELECT c1 FROM t1;
    +----+
    | c1 |
    +----+
    |  1 |
    |  2 |
    |  3 |
    +----+
    mysql> UPDATE t1 SET c1 = 4 WHERE c1 = 1;
    mysql> SELECT c1 FROM t1;
    +----+
    | c1 |
    +----+
    |  2 |
    |  3 |
    |  4 |
    +----+
    mysql> INSERT INTO t1 VALUES(0);
    mysql> SELECT c1 FROM t1;
    +----+
    | c1 |
    +----+
    |  2 |
    |  3 |
    |  4 |
    |  5 |
    +----+
                                                
InnoDB AUTO_INCREMENT Counter Initialization

This section describes how InnoDB initializes AUTO_INCREMENT counters.

If you specify an AUTO_INCREMENT column for an InnoDB table, the in-memory table object contains a special counter called the auto-increment counter that is used when assigning new values for the column.

In MySQL 5.7 and earlier, the auto-increment counter is stored only in main memory, not on disk. To initialize an auto-increment counter after a server restart, InnoDB would execute the equivalent of the following statement on the first insert into a table containing an AUTO_INCREMENT column.

SELECT MAX(ai_col) FROM table_name FOR UPDATE;
                                

In MySQL 8.0, this behavior is changed. The current maximum auto-increment counter value is written to the redo log each time it changes and is saved to an engine-private system table on each checkpoint. These changes make the current maximum auto-increment counter value persistent across server restarts.

On a server restart following a normal shutdown, InnoDB initializes the in-memory auto-increment counter using the current maximum auto-increment value stored in the data dictionary system table.

On a server restart during crash recovery, InnoDB initializes the in-memory auto-increment counter using the current maximum auto-increment value stored in the data dictionary system table and scans the redo log for auto-increment counter values written since the last checkpoint. If a redo-logged value is greater than the in-memory counter value, the redo-logged value is applied. However, in the case of a server crash, reuse of a previously allocated auto-increment value cannot be guaranteed. Each time the current maximum auto-increment value is changed due to an INSERT or UPDATE operation, the new value is written to the redo log, but if the crash occurs before the redo log is flushed to disk, the previously allocated value could be reused when the auto-increment counter is initialized after the server is restarted.

The only circumstance in which InnoDB uses the equivalent of a SELECT MAX(ai_col) FROM table_name FOR UPDATE statement in MySQL 8.0 and later to initialize an auto-increment counter is when importing a tablespace without a .cfg metadata file. Otherwise, the current maximum auto-increment counter value is read from the .cfg metadata file.

In MySQL 5.7 and earlier, a server restart cancels the effect of the AUTO_INCREMENT = N table option, which may be used in a CREATE TABLE or ALTER TABLE statement to set an initial counter value or alter the existing counter value, respectively. In MySQL 8.0, a server restart does not cancel the effect of the AUTO_INCREMENT = N table option. If you initialize the auto-increment counter to a specific value, or if you alter the auto-increment counter value to a larger value, the new value is persisted across server restarts.

Note

ALTER TABLE ... AUTO_INCREMENT = N can only change the auto-increment counter value to a value larger than the current maximum.

In MySQL 5.7 and earlier, a server restart immediately following a ROLLBACK operation could result in the reuse of auto-increment values that were previously allocated to the rolled-back transaction, effectively rolling back the current maximum auto-increment value. In MySQL 8.0, the current maximum auto-increment value is persisted, preventing the reuse of previously allocated values.

If a SHOW TABLE STATUS statement examines a table before the auto-increment counter is initialized, InnoDB opens the table and initializes the counter value using the current maximum auto-increment value that is stored in the data dictionary system table. The value is stored in memory for use by later inserts or updates. Initialization of the counter value uses a normal exclusive-locking read on the table which lasts to the end of the transaction. InnoDB follows the same procedure when initializing the auto-increment counter for a newly created table that has a user-specified auto-increment value that is greater than 0.

After the auto-increment counter is initialized, if you do not explicitly specify an auto-increment value when inserting a row, InnoDB implicitly increments the counter and assigns the new value to the column. If you insert a row that explicitly specifies an auto-increment column value, and the value is greater than the current maximum counter value, the counter is set to the specified value.

InnoDB uses the in-memory auto-increment counter as long as the server runs. When the server is stopped and restarted, InnoDB reinitializes the auto-increment counter, as described earlier.

The auto_increment_offset configuration option determines the starting point for the AUTO_INCREMENT column value. The default setting is 1.

The auto_increment_increment configuration option controls the interval between successive column values. The default setting is 1.
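
For example, the following settings cause an AUTO_INCREMENT column to receive the values 1, 3, 5, and so on (assuming no other inserts); this pattern is sometimes used to keep two servers from generating colliding values:

SET GLOBAL auto_increment_increment=2;  -- interval between successive values
SET GLOBAL auto_increment_offset=1;     -- starting point: 1, 3, 5, ...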

15.6.1.5 InnoDB and FOREIGN KEY Constraints

How the InnoDB storage engine handles foreign key constraints is described under the following topics in this section:

For foreign key usage information and examples, see Section 13.1.20.6, “Using FOREIGN KEY Constraints” .

Foreign Key Definitions

Foreign key definitions for InnoDB tables are subject to the following conditions (a schematic example follows the list):

  • InnoDB permits a foreign key to reference any index column or group of columns. However, in the referenced table, there must be an index where the referenced columns are listed as the first columns in the same order.

  • InnoDB does not currently support foreign keys for tables with user-defined partitioning. This means that no user-partitioned InnoDB table may contain foreign key references or columns referenced by foreign keys.

  • InnoDB allows a foreign key constraint to reference a nonunique key. This is an InnoDB extension to standard SQL.
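
A minimal parent/child sketch that satisfies these conditions (the table and column names are illustrative):

CREATE TABLE parent (
    id INT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=INNODB;

CREATE TABLE child (
    id INT,
    parent_id INT,
    INDEX par_ind (parent_id),
    FOREIGN KEY (parent_id) REFERENCES parent(id)
        ON DELETE CASCADE
) ENGINE=INNODB;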

Referential Actions

Referential actions for foreign keys of InnoDB tables are subject to the following conditions:

  • While SET DEFAULT is allowed by the MySQL Server, it is rejected as invalid by InnoDB . CREATE TABLE and ALTER TABLE statements using this clause are not allowed for InnoDB tables.

  • If there are several rows in the parent table that have the same referenced key value, InnoDB acts in foreign key checks as if the other parent rows with the same key value do not exist. For example, if you have defined a RESTRICT type constraint, and there is a child row with several parent rows, InnoDB does not permit the deletion of any of those parent rows.

  • InnoDB performs cascading operations through a depth-first algorithm, based on records in the indexes corresponding to the foreign key constraints.

  • If ON UPDATE CASCADE or ON UPDATE SET NULL recurses to update the same table it has previously updated during the cascade, it acts like RESTRICT . This means that you cannot use self-referential ON UPDATE CASCADE or ON UPDATE SET NULL operations. This is to prevent infinite loops resulting from cascaded updates. A self-referential ON DELETE SET NULL , on the other hand, is possible, as is a self-referential ON DELETE CASCADE . Cascading operations may not be nested more than 15 levels deep.

  • Like MySQL in general, in an SQL statement that inserts, deletes, or updates many rows, InnoDB checks UNIQUE and FOREIGN KEY constraints row-by-row. When performing foreign key checks, InnoDB sets shared row-level locks on child or parent records it has to look at. InnoDB checks foreign key constraints immediately; the check is not deferred to transaction commit. According to the SQL standard, the default behavior should be deferred checking. That is, constraints are only checked after the entire SQL statement has been processed. Until InnoDB implements deferred constraint checking, some things are impossible, such as deleting a record that refers to itself using a foreign key.

Foreign Key Restrictions for Generated Columns and Virtual Indexes
  • A foreign key constraint on a stored generated column cannot use ON UPDATE CASCADE , ON DELETE SET NULL , ON UPDATE SET NULL , ON DELETE SET DEFAULT , or ON UPDATE SET DEFAULT .

  • A foreign key constraint cannot reference a virtual generated column .

  • Prior to MySQL 8.0, a foreign key constraint cannot reference a secondary index defined on a virtual generated column.

15.6.1.6 Limits on InnoDB Tables

Limits on InnoDB tables are described under the following topics in this section:

Warning

Before using NFS with InnoDB , review potential issues outlined in Using NFS with MySQL .

Maximums and Minimums
  • A table can contain a maximum of 1017 columns. Virtual generated columns are included in this limit.

  • A table can contain a maximum of 64 secondary indexes .

  • The index key prefix length limit is 3072 bytes for InnoDB tables that use DYNAMIC or COMPRESSED row format.

    The index key prefix length limit is 767 bytes for InnoDB tables that use REDUNDANT or COMPACT row format. For example, you might hit this limit with a column prefix index of more than 191 characters on a TEXT or VARCHAR column, assuming a utf8mb4 character set and the maximum of 4 bytes for each character.

    Attempting to use an index key prefix length that exceeds the limit returns an error.

    The limits that apply to index key prefixes also apply to full-column index keys.

  • If you reduce the InnoDB page size to 8KB or 4KB by specifying the innodb_page_size option when creating the MySQL instance, the maximum length of the index key is lowered proportionally, based on the limit of 3072 bytes for a 16KB page size. That is, the maximum index key length is 1536 bytes when the page size is 8KB, and 768 bytes when the page size is 4KB.

  • A maximum of 16 columns is permitted for multicolumn indexes. Exceeding the limit returns an error.

    ERROR 1070 (42000): Too many key parts specified; max 16 parts allowed
                                                
  • The maximum row length, except for variable-length columns ( VARBINARY , VARCHAR , BLOB and TEXT ), is slightly less than half of a page for 4KB, 8KB, 16KB, and 32KB page sizes. For example, the maximum row length for the default innodb_page_size of 16KB is about 8000 bytes. For an InnoDB page size of 64KB, the maximum row length is about 16000 bytes. LONGBLOB and LONGTEXT columns must be less than 4GB, and the total row length, including BLOB and TEXT columns, must be less than 4GB.

    If a row is less than half a page long, all of it is stored locally within the page. If it exceeds half a page, variable-length columns are chosen for external off-page storage until the row fits within half a page, as described in Section 15.11.2, “File Space Management” .

  • Although InnoDB supports row sizes larger than 65,535 bytes internally, MySQL itself imposes a row-size limit of 65,535 for the combined size of all columns:

    mysql> CREATE TABLE t (a VARCHAR(8000), b VARCHAR(10000),
        -> c VARCHAR(10000), d VARCHAR(10000), e VARCHAR(10000),
        -> f VARCHAR(10000), g VARCHAR(10000)) ENGINE=InnoDB;
    ERROR 1118 (42000): Row size too large. The maximum row size for the
    used table type, not counting BLOBs, is 65535. You have to change some
    columns to TEXT or BLOBs
                                                

    See Section C.10.4, “Limits on Table Column Count and Row Size” .

  • On some older operating systems, files must be less than 2GB. This is not a limitation of InnoDB itself, but if you require a large tablespace, configure it using several smaller data files rather than one large data file.

  • The combined size of the InnoDB log files can be up to 512GB.

  • The minimum tablespace size is slightly larger than 10MB. The maximum tablespace size depends on the InnoDB page size.

    Table 15.3 InnoDB Maximum Tablespace Size

    InnoDB Page Size      Maximum Tablespace Size
    4KB                   16TB
    8KB                   32TB
    16KB                  64TB
    32KB                  128TB
    64KB                  256TB

    The maximum tablespace size is also the maximum size for a table.

  • The default page size in InnoDB is 16KB. You can increase or decrease the page size by configuring the innodb_page_size option when creating the MySQL instance.

    32KB and 64KB page sizes are supported, but ROW_FORMAT=COMPRESSED is unsupported for page sizes greater than 16KB. For both 32KB and 64KB page sizes, the maximum record size is 16KB. For innodb_page_size=32KB , extent size is 2MB. For innodb_page_size=64KB , extent size is 4MB.

    A MySQL instance using a particular InnoDB page size cannot use data files or log files from an instance that uses a different page size.

Restrictions on InnoDB Tables
  • ANALYZE TABLE determines index cardinality (as displayed in the Cardinality column of SHOW INDEX output) by performing random dives on each of the index trees and updating index cardinality estimates accordingly. Because these are only estimates, repeated runs of ANALYZE TABLE could produce different numbers. This makes ANALYZE TABLE fast on InnoDB tables but not 100% accurate because it does not take all rows into account.

    You can make the statistics collected by ANALYZE TABLE more precise and more stable by turning on the innodb_stats_persistent configuration option, as explained in Section 15.8.10.1, “Configuring Persistent Optimizer Statistics Parameters” . When that setting is enabled, it is important to run ANALYZE TABLE after major changes to indexed column data, because the statistics are not recalculated periodically (such as after a server restart).

    If the persistent statistics setting is enabled, you can change the number of random dives by modifying the innodb_stats_persistent_sample_pages system variable. If the persistent statistics setting is disabled, modify the innodb_stats_transient_sample_pages system variable instead.

    MySQL uses index cardinality estimates in join optimization. If a join is not optimized in the right way, try using ANALYZE TABLE . In the few cases that ANALYZE TABLE does not produce values good enough for your particular tables, you can use FORCE INDEX with your queries to force the use of a particular index, or set the max_seeks_for_key system variable to ensure that MySQL prefers index lookups over table scans. See Section B.6.5, “Optimizer-Related Issues” .

  • If statements or transactions are running on a table, and ANALYZE TABLE is run on the same table followed by a second ANALYZE TABLE operation, the second ANALYZE TABLE operation is blocked until the statements or transactions are completed. This behavior occurs because ANALYZE TABLE marks the currently loaded table definition as obsolete when ANALYZE TABLE is finished running. New statements or transactions (including a second ANALYZE TABLE statement) must load the new table definition into the table cache, which cannot occur until currently running statements or transactions are completed and the old table definition is purged. Loading multiple concurrent table definitions is not supported.

  • SHOW TABLE STATUS does not give accurate statistics on InnoDB tables except for the physical size reserved by the table. The row count is only a rough estimate used in SQL optimization.

  • InnoDB does not keep an internal count of rows in a table because concurrent transactions might see different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.

    For information about how InnoDB processes SELECT COUNT(*) statements, refer to the COUNT() description in Section 12.20.1, “Aggregate (GROUP BY) Function Descriptions” .

  • On Windows, InnoDB always stores database and table names internally in lowercase. To move databases in a binary format from Unix to Windows or from Windows to Unix, create all databases and tables using lowercase names.

  • An AUTO_INCREMENT column ai_col must be defined as part of an index such that it is possible to perform the equivalent of an indexed SELECT MAX( ai_col ) lookup on the table to obtain the maximum column value. Typically, this is achieved by making the column the first column of some table index.

  • InnoDB sets an exclusive lock on the end of the index associated with the AUTO_INCREMENT column while initializing a previously specified AUTO_INCREMENT column on a table.

    With innodb_autoinc_lock_mode=0 , InnoDB uses a special AUTO-INC table lock mode where the lock is obtained and held to the end of the current SQL statement while accessing the auto-increment counter. Other clients cannot insert into the table while the AUTO-INC table lock is held. The same behavior occurs for bulk inserts with innodb_autoinc_lock_mode=1 . Table-level AUTO-INC locks are not used with innodb_autoinc_lock_mode=2 . For more information, see Section 15.6.1.4, “AUTO_INCREMENT Handling in InnoDB” .

  • When an AUTO_INCREMENT integer column runs out of values, a subsequent INSERT operation returns a duplicate-key error. This is general MySQL behavior.

  • DELETE FROM tbl_name does not regenerate the table but instead deletes all rows, one by one.

  • Cascaded foreign key actions do not activate triggers.

  • You cannot create a table with a column name that matches the name of an internal InnoDB column (including DB_ROW_ID , DB_TRX_ID , DB_ROLL_PTR , and DB_MIX_ID ). This restriction applies to use of the names in any letter case.

    mysql> CREATE TABLE t1 (c1 INT, db_row_id INT) ENGINE=INNODB;
    ERROR 1166 (42000): Incorrect column name 'db_row_id'
                                                
Locking and Transactions
  • LOCK TABLES acquires two locks on each table if innodb_table_locks=1 (the default). In addition to a table lock on the MySQL layer, it also acquires an InnoDB table lock. Versions of MySQL before 4.1.2 did not acquire InnoDB table locks; the old behavior can be selected by setting innodb_table_locks=0 . If no InnoDB table lock is acquired, LOCK TABLES completes even if some records of the tables are being locked by other transactions.

    In MySQL 8.0, innodb_table_locks=0 has no effect for tables locked explicitly with LOCK TABLES ... WRITE . It does have an effect for tables locked for read or write by LOCK TABLES ... WRITE implicitly (for example, through triggers) or by LOCK TABLES ... READ .

  • All InnoDB locks held by a transaction are released when the transaction is committed or aborted. Thus, it does not make much sense to invoke LOCK TABLES on InnoDB tables in autocommit=1 mode because the acquired InnoDB table locks would be released immediately.

  • You cannot lock additional tables in the middle of a transaction because LOCK TABLES performs an implicit COMMIT and UNLOCK TABLES .

  • For limits associated with concurrent read-write transactions, see Section 15.6.6, “Undo Logs” .

15.6.2 Indexes

This section covers topics related to InnoDB indexes.

15.6.2.1 Clustered and Secondary Indexes

Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key . To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.

  • When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. Define a primary key for each table that you create. If there is no logical unique and non-null column or set of columns, add a new auto-increment column, whose values are filled in automatically.

  • If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.

  • If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.

How the Clustered Index Speeds Up Queries

Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.

How Secondary Indexes Relate to the Clustered Index

All indexes other than the clustered index are known as secondary indexes . In InnoDB , each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.

If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.

For guidelines to take advantage of InnoDB clustered and secondary indexes, see Section 8.3, “Optimization and Indexes” .

15.6.2.2 The Physical Structure of an InnoDB Index

With the exception of spatial indexes, InnoDB indexes are B-tree data structures. Spatial indexes use R-trees , which are specialized data structures for indexing multi-dimensional data. Index records are stored in the leaf pages of their B-tree or R-tree data structure. The default size of an index page is 16KB.

When new records are inserted into an InnoDB clustered index , InnoDB tries to leave 1/16 of the page free for future insertions and updates of the index records. If index records are inserted in a sequential order (ascending or descending), the resulting index pages are about 15/16 full. If records are inserted in a random order, the pages are from 1/2 to 15/16 full.

InnoDB performs a bulk load when creating or rebuilding B-tree indexes. This method of index creation is known as a sorted index build. The innodb_fill_factor configuration option defines the percentage of space on each B-tree page that is filled during a sorted index build, with the remaining space reserved for future index growth. Sorted index builds are not supported for spatial indexes. For more information, see Section 15.6.2.3, “Sorted Index Builds” . An innodb_fill_factor setting of 100 leaves 1/16 of the space in clustered index pages free for future index growth.

If the fill factor of an InnoDB index page drops below the MERGE_THRESHOLD , which is 50% by default if not specified, InnoDB tries to contract the index tree to free the page. The MERGE_THRESHOLD setting applies to both B-tree and R-tree indexes. For more information, see Section 15.8.11, “Configuring the Merge Threshold for Index Pages” .

You can define the page size for all InnoDB tablespaces in a MySQL instance by setting the innodb_page_size configuration option prior to initializing the MySQL instance. Once the page size for an instance is defined, you cannot change it without reinitializing the instance. Supported sizes are 64KB, 32KB, 16KB (default), 8KB, and 4KB.
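For example, a minimal sketch of selecting an 8KB page size in the option file before initializing the instance with mysqld --initialize (the value shown is illustrative; the setting cannot be changed afterward without reinitializing):

[mysqld]
innodb_page_size=8192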

A MySQL instance using a particular InnoDB page size cannot use data files or log files from an instance that uses a different page size.

15.6.2.3 Sorted Index Builds

InnoDB performs a bulk load instead of inserting one index record at a time when creating or rebuilding indexes. This method of index creation is also known as a sorted index build. Sorted index builds are not supported for spatial indexes.

There are three phases to an index build. In the first phase, the clustered index is scanned, and index entries are generated and added to the sort buffer. When the sort buffer becomes full, entries are sorted and written out to a temporary intermediate file. This process is also known as a run . In the second phase, with one or more runs written to the temporary intermediate file, a merge sort is performed on all entries in the file. In the third and final phase, the sorted entries are inserted into the B-tree .

Prior to the introduction of sorted index builds, index entries were inserted into the B-tree one record at a time using insert APIs. This method involved opening a B-tree cursor to find the insert position and then inserting entries into a B-tree page using an optimistic insert. If an insert failed due to a page being full, a pessimistic insert would be performed, which involves opening a B-tree cursor and splitting and merging B-tree nodes as necessary to find space for the entry. The drawbacks of this top-down method of building an index are the cost of searching for an insert position and the constant splitting and merging of B-tree nodes.

Sorted index builds use a bottom-up approach to building an index. With this approach, a reference to the right-most leaf page is held at all levels of the B-tree. The right-most leaf page at the necessary B-tree depth is allocated and entries are inserted according to their sorted order. Once a leaf page is full, a node pointer is appended to the parent page and a sibling leaf page is allocated for the next insert. This process continues until all entries are inserted, which may result in inserts up to the root level. When a sibling page is allocated, the reference to the previously pinned leaf page is released, and the newly allocated leaf page becomes the right-most leaf page and new default insert location.

Reserving B-tree Page Space for Future Index Growth

To set aside space for future index growth, you can use the innodb_fill_factor configuration option to reserve a percentage of B-tree page space. For example, setting innodb_fill_factor to 80 reserves 20 percent of the space in B-tree pages during a sorted index build. This setting applies to both B-tree leaf and non-leaf pages. It does not apply to external pages used for TEXT or BLOB entries. The amount of space that is reserved may not be exactly as configured, as the innodb_fill_factor value is interpreted as a hint rather than a hard limit.
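For example, a lower fill factor might be set before rebuilding an index (a minimal sketch; the table t1 and index idx_c2 are hypothetical):

mysql> SET GLOBAL innodb_fill_factor=80;
mysql> ALTER TABLE t1 ADD INDEX idx_c2 (c2);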

Sorted Index Builds and Full-Text Index Support

Sorted index builds are supported for fulltext indexes . Previously, SQL was used to insert entries into a fulltext index.

Sorted Index Builds and Compressed Tables

For compressed tables , the previous index creation method appended entries to both compressed and uncompressed pages. When the modification log (representing free space on the compressed page) became full, the compressed page would be recompressed. If compression failed due to a lack of space, the page would be split. With sorted index builds, entries are only appended to uncompressed pages. When an uncompressed page becomes full, it is compressed. Adaptive padding is used to ensure that compression succeeds in most cases, but if compression fails, the page is split and compression is attempted again. This process continues until compression is successful. For more information about compression of B-Tree pages, see Section 15.9.1.5, “How Compression Works for InnoDB Tables” .

Sorted Index Builds and Redo Logging

Redo logging is disabled during a sorted index build. Instead, there is a checkpoint to ensure that the index build can withstand a crash or failure. The checkpoint forces a write of all dirty pages to disk. During a sorted index build, the page cleaner thread is signaled periodically to flush dirty pages to ensure that the checkpoint operation can be processed quickly. Normally, the page cleaner thread flushes dirty pages when the number of clean pages falls below a set threshold. For sorted index builds, dirty pages are flushed promptly to reduce checkpoint overhead and to parallelize I/O and CPU activity.

Sorted Index Builds and Optimizer Statistics

Sorted index builds may result in optimizer statistics that differ from those generated by the previous method of index creation. The difference in statistics, which is not expected to affect workload performance, is due to the different algorithm used to populate the index.

15.6.2.4 InnoDB FULLTEXT Indexes

FULLTEXT indexes are created on text-based columns ( CHAR , VARCHAR , or TEXT columns) to help speed up queries and DML operations on data contained within those columns, omitting any words that are defined as stopwords.

A FULLTEXT index is defined as part of a CREATE TABLE statement or added to an existing table using ALTER TABLE or CREATE INDEX .

Full-text search is performed using MATCH() ... AGAINST syntax. For usage information, see Section 12.9, “Full-Text Search Functions” .
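For example, a simple natural-language search against the opening_lines table created later in this section might look like this (shown only as an illustration of the syntax):

mysql> SELECT * FROM opening_lines
       WHERE MATCH(opening_line) AGAINST('Ishmael' IN NATURAL LANGUAGE MODE);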

InnoDB FULLTEXT indexes are described under the following topics in this section:

InnoDB Full-Text Index Design

InnoDB FULLTEXT indexes have an inverted index design. Inverted indexes store a list of words, and for each word, a list of documents that the word appears in. To support proximity search, position information for each word is also stored, as a byte offset.

InnoDB Full-Text Index Tables

When creating an InnoDB FULLTEXT index, a set of index tables is created, as shown in the following example:

mysql> CREATE TABLE opening_lines (
       id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
       opening_line TEXT(500),
       author VARCHAR(200),
       title VARCHAR(200),
       FULLTEXT idx (opening_line)
       ) ENGINE=InnoDB;
mysql> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_TABLES
       WHERE name LIKE 'test/%';
+----------+----------------------------------------------------+-------+
| table_id | name                                               | space |
+----------+----------------------------------------------------+-------+
|      333 | test/fts_0000000000000147_00000000000001c9_index_1 |   289 |
|      334 | test/fts_0000000000000147_00000000000001c9_index_2 |   290 |
|      335 | test/fts_0000000000000147_00000000000001c9_index_3 |   291 |
|      336 | test/fts_0000000000000147_00000000000001c9_index_4 |   292 |
|      337 | test/fts_0000000000000147_00000000000001c9_index_5 |   293 |
|      338 | test/fts_0000000000000147_00000000000001c9_index_6 |   294 |
|      330 | test/fts_0000000000000147_being_deleted            |   286 |
|      331 | test/fts_0000000000000147_being_deleted_cache      |   287 |
|      332 | test/fts_0000000000000147_config                   |   288 |
|      328 | test/fts_0000000000000147_deleted                  |   284 |
|      329 | test/fts_0000000000000147_deleted_cache            |   285 |
|      327 | test/opening_lines                                 |   283 |
+----------+----------------------------------------------------+-------+
                                

The first six tables represent the inverted index and are referred to as auxiliary index tables. When incoming documents are tokenized, the individual words (also referred to as tokens ) are inserted into the index tables along with position information and the associated Document ID ( DOC_ID ). The words are fully sorted and partitioned among the six index tables based on the character set sort weight of the word's first character.

The inverted index is partitioned into six auxiliary index tables to support parallel index creation. By default, two threads tokenize, sort, and insert words and associated data into the index tables. The number of threads is configurable using the innodb_ft_sort_pll_degree option. Consider increasing the number of threads when creating FULLTEXT indexes on large tables.
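For example, a sketch of raising the thread count in the option file before creating FULLTEXT indexes on large tables (the value is illustrative; the option is not dynamic and takes effect at server startup):

[mysqld]
innodb_ft_sort_pll_degree=8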

Auxiliary index table names are prefixed with fts_ and postfixed with index_* . Each index table is associated with the indexed table by a hex value in the index table name that matches the table_id of the indexed table. For example, the table_id of the test/opening_lines table is 327 , for which the hex value is 0x147. As shown in the preceding example, the 147 hex value appears in the names of index tables that are associated with the test/opening_lines table.

A hex value representing the index_id of the FULLTEXT index also appears in auxiliary index table names. For example, in the auxiliary table name test/fts_0000000000000147_00000000000001c9_index_1 , the hex value 1c9 has a decimal value of 457. The index defined on the opening_lines table ( idx ) can be identified by querying the INFORMATION_SCHEMA.INNODB_INDEXES table for this value (457).

mysql> SELECT index_id, name, table_id, space from INFORMATION_SCHEMA.INNODB_INDEXES
       WHERE index_id=457;
+----------+------+----------+-------+
| index_id | name | table_id | space |
+----------+------+----------+-------+
|      457 | idx  |      327 |   283 |
+----------+------+----------+-------+
                                

Index tables are stored in their own tablespace if the primary table is created in a file-per-table tablespace.

The other index tables shown in the preceding example are referred to as common index tables and are used for deletion handling and storing the internal state of FULLTEXT indexes. Unlike the inverted index tables, which are created for each full-text index, this set of tables is common to all full-text indexes created on a particular table.

Common auxiliary tables are retained even if full-text indexes are dropped. When a full-text index is dropped, the FTS_DOC_ID column that was created for the index is retained, as removing the FTS_DOC_ID column would require rebuilding the table. Common auxiliary tables are required to manage the FTS_DOC_ID column.

  • fts_*_deleted and fts_*_deleted_cache

    Contain the document IDs (DOC_ID) for documents that are deleted but whose data is not yet removed from the full-text index. The fts_*_deleted_cache is the in-memory version of the fts_*_deleted table.

  • fts_*_being_deleted and fts_*_being_deleted_cache

    Contain the document IDs (DOC_ID) for documents that are deleted and whose data is currently in the process of being removed from the full-text index. The fts_*_being_deleted_cache table is the in-memory version of the fts_*_being_deleted table.

  • fts_*_config

    Stores information about the internal state of the FULLTEXT index. Most importantly, it stores the FTS_SYNCED_DOC_ID , which identifies documents that have been parsed and flushed to disk. In case of crash recovery, FTS_SYNCED_DOC_ID values are used to identify documents that have not been flushed to disk so that the documents can be re-parsed and added back to the FULLTEXT index cache. To view the data in this table, query the INFORMATION_SCHEMA.INNODB_FT_CONFIG table.
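    For example, a minimal sketch of inspecting this table for the opening_lines table created earlier in this section (innodb_ft_aux_table must point at the indexed table before the INNODB_FT_* tables return data for it):

    mysql> SET GLOBAL innodb_ft_aux_table = 'test/opening_lines';
    mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_CONFIG;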

InnoDB Full-Text Index Cache

When a document is inserted, it is tokenized, and the individual words and associated data are inserted into the FULLTEXT index. This process, even for small documents, could result in numerous small insertions into the auxiliary index tables, making concurrent access to these tables a point of contention. To avoid this problem, InnoDB uses a FULLTEXT index cache to temporarily cache index table insertions for recently inserted rows. This in-memory cache structure holds insertions until the cache is full and then batch flushes them to disk (to the auxiliary index tables). You can query the INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE table to view tokenized data for recently inserted rows.

The caching and batch flushing behavior avoids frequent updates to auxiliary index tables, which could result in concurrent access issues during busy insert and update times. The batching technique also avoids multiple insertions for the same word, and minimizes duplicate entries. Instead of flushing each word individually, insertions for the same word are merged and flushed to disk as a single entry, improving insertion efficiency while keeping auxiliary index tables as small as possible.

The innodb_ft_cache_size variable is used to configure the full-text index cache size (on a per-table basis), which affects how often the full-text index cache is flushed. You can also define a global full-text index cache size limit for all tables in a given instance using the innodb_ft_total_cache_size option.
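For example, a sketch of raising both limits in the option file (the values are illustrative and specified in bytes; both options take effect at server startup):

[mysqld]
innodb_ft_cache_size=16000000
innodb_ft_total_cache_size=1280000000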

The full-text index cache stores the same information as auxiliary index tables. However, the full-text index cache only caches tokenized data for recently inserted rows. The data that is already flushed to disk (to the full-text auxiliary tables) is not brought back into the full-text index cache when queried. The data in auxiliary index tables is queried directly, and results from the auxiliary index tables are merged with results from the full-text index cache before being returned.

InnoDB Full-Text Index Document ID and FTS_DOC_ID Column

InnoDB uses a unique document identifier referred to as a Document ID ( DOC_ID ) to map words in the full-text index to document records where the word appears. The mapping requires an FTS_DOC_ID column on the indexed table. If an FTS_DOC_ID column is not defined, InnoDB automatically adds a hidden FTS_DOC_ID column when the full-text index is created. The following example demonstrates this behavior.

The following table definition does not include an FTS_DOC_ID column:

mysql> CREATE TABLE opening_lines (
       id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
       opening_line TEXT(500),
       author VARCHAR(200),
       title VARCHAR(200)
       ) ENGINE=InnoDB;
                                

When you create a full-text index on the table using CREATE FULLTEXT INDEX syntax, a warning is returned which reports that InnoDB is rebuilding the table to add the FTS_DOC_ID column.

mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (0.19 sec)
Records: 0  Duplicates: 0  Warnings: 1
mysql> SHOW WARNINGS;
+---------+------+--------------------------------------------------+
| Level   | Code | Message                                          |
+---------+------+--------------------------------------------------+
| Warning |  124 | InnoDB rebuilding table to add column FTS_DOC_ID |
+---------+------+--------------------------------------------------+
                                

The same warning is returned when using ALTER TABLE to add a full-text index to a table that does not have an FTS_DOC_ID column. If you create a full-text index at CREATE TABLE time and do not specify an FTS_DOC_ID column, InnoDB adds a hidden FTS_DOC_ID column, without warning.

Defining an FTS_DOC_ID column at CREATE TABLE time is less expensive than creating a full-text index on a table that is already loaded with data. If an FTS_DOC_ID column is defined on a table prior to loading data, the table and its indexes do not have to be rebuilt to add the new column. If you are not concerned with CREATE FULLTEXT INDEX performance, leave out the FTS_DOC_ID column to have InnoDB create it for you. InnoDB creates a hidden FTS_DOC_ID column along with a unique index ( FTS_DOC_ID_INDEX ) on the FTS_DOC_ID column. If you want to create your own FTS_DOC_ID column, the column must be defined as BIGINT UNSIGNED NOT NULL and named FTS_DOC_ID (all upper case), as in the following example:

Note

The FTS_DOC_ID column does not need to be defined as an AUTO_INCREMENT column, but AUTO_INCREMENT could make loading data easier.

mysql> CREATE TABLE opening_lines (
       FTS_DOC_ID BIGINT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
       opening_line TEXT(500),
       author VARCHAR(200),
       title VARCHAR(200)
       ) ENGINE=InnoDB;
                                

If you choose to define the FTS_DOC_ID column yourself, you are responsible for managing the column to avoid empty or duplicate values. FTS_DOC_ID values cannot be reused, which means FTS_DOC_ID values must be ever increasing.

Optionally, you can create the required unique FTS_DOC_ID_INDEX (all upper case) on the FTS_DOC_ID column.

mysql> CREATE UNIQUE INDEX FTS_DOC_ID_INDEX on opening_lines(FTS_DOC_ID);
                                

If you do not create the FTS_DOC_ID_INDEX , InnoDB creates it automatically.

Note

FTS_DOC_ID_INDEX cannot be defined as a descending index because the InnoDB SQL parser does not use descending indexes.

The permitted gap between the largest used FTS_DOC_ID value and new FTS_DOC_ID value is 65535.

To avoid rebuilding the table, the FTS_DOC_ID column is retained when dropping a full-text index.

InnoDB Full-Text Index Deletion Handling

Deleting a record that has a full-text index column could result in numerous small deletions in the auxiliary index tables, making concurrent access to these tables a point of contention. To avoid this problem, the Document ID ( DOC_ID ) of a deleted document is logged in a special FTS_*_DELETED table whenever a record is deleted from an indexed table, and the indexed record remains in the full-text index. Before returning query results, information in the FTS_*_DELETED table is used to filter out deleted Document IDs. The benefit of this design is that deletions are fast and inexpensive. The drawback is that the size of the index is not immediately reduced after deleting records. To remove full-text index entries for deleted records, run OPTIMIZE TABLE on the indexed table with innodb_optimize_fulltext_only=ON to rebuild the full-text index. For more information, see Optimizing InnoDB Full-Text Indexes .
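For example, using the opening_lines table from this section (a minimal sketch; resetting the variable afterward restores normal OPTIMIZE TABLE behavior):

mysql> SET GLOBAL innodb_optimize_fulltext_only=ON;
mysql> OPTIMIZE TABLE opening_lines;
mysql> SET GLOBAL innodb_optimize_fulltext_only=OFF;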

InnoDB Full-Text Index Transaction Handling

InnoDB FULLTEXT indexes have special transaction handling characteristics due to their caching and batch processing behavior. Specifically, updates and insertions on a FULLTEXT index are processed at transaction commit time, which means that a FULLTEXT search can only see committed data. The following example demonstrates this behavior. The FULLTEXT search only returns a result after the inserted lines are committed.

mysql> CREATE TABLE opening_lines (
       id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
       opening_line TEXT(500),
       author VARCHAR(200),
       title VARCHAR(200),
       FULLTEXT idx (opening_line)
       ) ENGINE=InnoDB;
mysql> BEGIN;
mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
       ('Call me Ishmael.','Herman Melville','Moby-Dick'),
       ('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
       ('I am an invisible man.','Ralph Ellison','Invisible Man'),
       ('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
       ('It was love at first sight.','Joseph Heller','Catch-22'),
       ('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
       ('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
       ('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael');
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
mysql> COMMIT;
mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael');
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+
                                
Monitoring InnoDB Full-Text Indexes

You can monitor and examine the special text-processing aspects of InnoDB FULLTEXT indexes by querying the InnoDB FULLTEXT INFORMATION_SCHEMA tables: INNODB_FT_CONFIG , INNODB_FT_INDEX_TABLE , INNODB_FT_INDEX_CACHE , INNODB_FT_DEFAULT_STOPWORD , INNODB_FT_DELETED , and INNODB_FT_BEING_DELETED .

You can also view basic information for FULLTEXT indexes and tables by querying INNODB_INDEXES and INNODB_TABLES .

For more information, see Section 15.14.4, “InnoDB INFORMATION_SCHEMA FULLTEXT Index Tables” .

15.6.3 Tablespaces

This section covers topics related to InnoDB tablespaces.

15.6.3.1 The System Tablespace

The InnoDB system tablespace is the storage area for the doublewrite buffer and the change buffer. The system tablespace also contains table and index data for user-created tables created in the system tablespace. In previous releases, the system tablespace contained the InnoDB data dictionary. In MySQL 8.0, InnoDB stores metadata in the MySQL data dictionary. See Chapter 14, MySQL Data Dictionary .

The system tablespace can have one or more data files. By default, one system tablespace data file, named ibdata1 , is created in the data directory. The size and number of system tablespace data files is controlled by the innodb_data_file_path startup option. For related information, see System Tablespace Data File Configuration .

Resizing the System Tablespace

This section describes how to increase or decrease the size of the InnoDB system tablespace.

Increasing the Size of the InnoDB System Tablespace

The easiest way to increase the size of the InnoDB system tablespace is to configure it from the beginning to be auto-extending. Specify the autoextend attribute for the last data file in the tablespace definition. Then InnoDB increases the size of that file automatically in 64MB increments when it runs out of space. The increment size can be changed by setting the value of the innodb_autoextend_increment system variable, which is measured in megabytes.
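For example, a minimal sketch of raising the increment at runtime (the value is illustrative):

mysql> SET GLOBAL innodb_autoextend_increment=128;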

You can expand the system tablespace by a defined amount by adding another data file:

  1. Shut down the MySQL server.

  2. If the previous last data file is defined with the keyword autoextend , change its definition to use a fixed size, based on how large it has actually grown. Check the size of the data file, round it down to the closest multiple of 1024 × 1024 bytes (= 1MB), and specify this rounded size explicitly in innodb_data_file_path .

  3. Add a new data file to the end of innodb_data_file_path , optionally making that file auto-extending. Only the last data file in the innodb_data_file_path can be specified as auto-extending.

  4. Start the MySQL server again.

For example, this tablespace has just one auto-extending data file ibdata1 :

innodb_data_home_dir =
innodb_data_file_path = /ibdata/ibdata1:10M:autoextend
                                

Suppose that this data file, over time, has grown to 988MB. Here is the configuration line after modifying the original data file to use a fixed size and adding a new auto-extending data file:

innodb_data_home_dir =
innodb_data_file_path = /ibdata/ibdata1:988M;/disk2/ibdata2:50M:autoextend
                                

When you add a new data file to the system tablespace configuration, make sure that the filename does not refer to an existing file. InnoDB creates and initializes the file when you restart the server.

Decreasing the Size of the InnoDB System Tablespace

You cannot remove a data file from the system tablespace. To decrease the system tablespace size, use this procedure:

  1. Use mysqldump to dump all your InnoDB tables, including InnoDB tables located in the mysql database.

    mysql> SELECT TABLE_NAME from INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='mysql' and ENGINE='InnoDB';
    +---------------------------+
    | TABLE_NAME                |
    +---------------------------+
    | columns_priv              |
    | component                 |
    | db                        |
    | default_roles             |
    | engine_cost               |
    | func                      |
    | global_grants             |
    | gtid_executed             |
    | help_category             |
    | help_keyword              |
    | help_relation             |
    | help_topic                |
    | innodb_dynamic_metadata   |
    | innodb_index_stats        |
    | innodb_table_stats        |
    | plugin                    |
    | procs_priv                |
    | proxies_priv              |
    | role_edges                |
    | server_cost               |
    | servers                   |
    | slave_master_info         |
    | slave_relay_log_info      |
    | slave_worker_info         |
    | tables_priv               |
    | time_zone                 |
    | time_zone_leap_second     |
    | time_zone_name            |
    | time_zone_transition      |
    | time_zone_transition_type |
    | user                      |
    +---------------------------+
                                                
  2. Stop the server.

  3. Remove all the existing tablespace files ( *.ibd ), including the ibdata and ib_log files. Do not forget to remove *.ibd files for tables located in the mysql database.

  4. Configure a new tablespace.

  5. Restart the server.

  6. Import the dump files.

Note

If your databases only use the InnoDB engine, it may be simpler to dump all databases, stop the server, remove all databases and InnoDB log files, restart the server, and import the dump files.
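For example, one possible dump invocation for that simpler approach (a sketch; the option list and file name are illustrative):

shell> mysqldump --all-databases --routines --events > all_databases_dump.sql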

Using Raw Disk Partitions for the System Tablespace

You can use raw disk partitions as data files in the InnoDB system tablespace . This technique enables nonbuffered I/O on Windows and on some Linux and Unix systems without file system overhead. Perform tests with and without raw partitions to verify whether this change actually improves performance on your system.

When you use a raw disk partition, ensure that the user ID that runs the MySQL server has read and write privileges for that partition. For example, if you run the server as the mysql user, the partition must be readable and writeable by mysql . If you run the server with the --memlock option, the server must be run as root , so the partition must be readable and writeable by root .

The procedures described below involve option file modification. For additional information, see Section 4.2.7, “Using Option Files” .

Allocating a Raw Disk Partition on Linux and Unix Systems
  1. When you create a new data file, specify the keyword newraw immediately after the data file size for the innodb_data_file_path option. The partition must be at least as large as the size that you specify. Note that 1MB in InnoDB is 1024 × 1024 bytes, whereas 1MB in disk specifications usually means 1,000,000 bytes.

    [mysqld]
    innodb_data_home_dir=
    innodb_data_file_path=/dev/hdd1:3Gnewraw;/dev/hdd2:2Gnewraw
                                                
  2. Restart the server. InnoDB notices the newraw keyword and initializes the new partition. However, do not create or change any InnoDB tables yet. Otherwise, when you next restart the server, InnoDB reinitializes the partition and your changes are lost. (As a safety measure InnoDB prevents users from modifying data when any partition with newraw is specified.)

  3. After InnoDB has initialized the new partition, stop the server, change newraw in the data file specification to raw :

    [mysqld]
    innodb_data_home_dir=
    innodb_data_file_path=/dev/hdd1:3Graw;/dev/hdd2:2Graw
                                                
  4. Restart the server. InnoDB now permits changes to be made.

Allocating a Raw Disk Partition on Windows

On Windows systems, the same steps and accompanying guidelines described for Linux and Unix systems apply except that the innodb_data_file_path setting differs slightly on Windows.

  1. When you create a new data file, specify the keyword newraw immediately after the data file size for the innodb_data_file_path option:

    [mysqld]
    innodb_data_home_dir=
    innodb_data_file_path=//./D::10Gnewraw
                                                

    The //./ corresponds to the Windows syntax of \\.\ for accessing physical drives. In the example above, D: is the drive letter of the partition.

  2. Restart the server. InnoDB notices the newraw keyword and initializes the new partition.

  3. After InnoDB has initialized the new partition, stop the server, change newraw in the data file specification to raw :

    [mysqld]
    innodb_data_home_dir=
    innodb_data_file_path=//./D::10Graw
                                                
  4. Restart the server. InnoDB now permits changes to be made.

15.6.3.2 File-Per-Table Tablespaces

Historically, InnoDB tables were stored in the system tablespace . This monolithic approach was targeted at machines dedicated to database processing, with carefully planned data growth, where any disk storage allocated to MySQL would never be needed for other purposes. The file-per-table tablespace feature provides a more flexible alternative, where each InnoDB table is stored in its own tablespace data file ( .ibd file). This feature is controlled by the innodb_file_per_table configuration option, which is enabled by default.

Advantages
  • You can reclaim disk space when truncating or dropping a table stored in a file-per-table tablespace. Truncating or dropping tables stored in the shared system tablespace creates free space internally in the system tablespace data files ( ibdata files ) which can only be used for new InnoDB data.

    Similarly, a table-copying ALTER TABLE operation on a table that resides in a shared tablespace can increase the amount of space used by the tablespace. Such operations may require as much additional space as the data in the table plus indexes. The additional space required for the table-copying ALTER TABLE operation is not released back to the operating system as it is for file-per-table tablespaces.

  • The TRUNCATE TABLE operation is faster when run on tables stored in file-per-table tablespaces.

  • You can store specific tables on separate storage devices, for I/O optimization, space management, or backup purposes by specifying the location of each table using the syntax CREATE TABLE ... DATA DIRECTORY = absolute_path_to_directory , as explained in Section 15.6.3.6, “Creating a Tablespace Outside of the Data Directory” .

  • You can run OPTIMIZE TABLE to compact or recreate a file-per-table tablespace. When you run an OPTIMIZE TABLE , InnoDB creates a new .ibd file with a temporary name, using only the space required to store actual data. When the optimization is complete, InnoDB removes the old .ibd file and replaces it with the new one. If the previous .ibd file grew significantly but the actual data only accounted for a portion of its size, running OPTIMIZE TABLE can reclaim the unused space.

  • You can move individual InnoDB tables rather than entire databases.

  • You can copy individual InnoDB tables from one MySQL instance to another (known as the transportable tablespace feature).

  • Tables created in file-per-table tablespaces support features associated with compressed and dynamic row formats.

  • You can enable more efficient storage for tables with large BLOB or TEXT columns using the dynamic row format .

  • File-per-table tablespaces may improve chances for a successful recovery and save time when a corruption occurs, when a server cannot be restarted, or when backup and binary logs are unavailable.

  • You can back up or restore individual tables quickly using the MySQL Enterprise Backup product, without interrupting the use of other InnoDB tables. This is beneficial if you have tables that require backup less frequently or on a different backup schedule. See Making a Partial Backup for details.

  • File-per-table tablespaces are convenient for per-table status reporting when copying or backing up tables.

  • You can monitor table size at a file system level without accessing MySQL.

  • Common Linux file systems do not permit concurrent writes to a single file when innodb_flush_method is set to O_DIRECT . As a result, there are possible performance improvements when using file-per-table tablespaces in conjunction with innodb_flush_method .

  • The system tablespace stores the data dictionary and undo logs, and is limited in size by InnoDB tablespace size limits. See Section 15.6.1.6, “Limits on InnoDB Tables” . With file-per-table tablespaces, each table has its own tablespace, which provides room for growth.

Potential Disadvantages
  • With file-per-table tablespaces, each table may have unused space, which can only be utilized by rows of the same table. This could lead to wasted space if not properly managed.

  • fsync operations must run on each open table rather than on a single file. Because there is a separate fsync operation for each file, write operations on multiple tables cannot be combined into a single I/O operation. This may require InnoDB to perform a higher total number of fsync operations.

  • mysqld must keep one open file handle per table, which may impact performance if you have numerous tables in file-per-table tablespaces.

  • More file descriptors are used.

  • innodb_file_per_table is enabled by default in MySQL 5.6 and higher. You may consider disabling it if backward compatibility with earlier versions of MySQL is a concern.

  • If many tables are growing there is potential for more fragmentation which can impede DROP TABLE and table scan performance. However, when fragmentation is managed, having files in their own tablespace can improve performance.

  • The buffer pool is scanned when dropping a file-per-table tablespace, which can take several seconds for buffer pools that are tens of gigabytes in size. The scan is performed with a broad internal lock, which may delay other operations. Tables in the system tablespace are not affected.

  • The innodb_autoextend_increment variable, which defines increment size (in MB) for extending the size of an auto-extending shared tablespace file when it becomes full, does not apply to file-per-table tablespace files, which are auto-extending regardless of the innodb_autoextend_increment setting. The initial extensions are by small amounts, after which extensions occur in increments of 4MB.

Enabling File-Per-Table Tablespaces

The innodb_file_per_table option is enabled by default.

To set the innodb_file_per_table option at startup, start the server with the --innodb_file_per_table command-line option, or add this line to the [mysqld] section of my.cnf :

[mysqld]
innodb_file_per_table=1
                                

You can also set innodb_file_per_table dynamically, while the server is running:

mysql> SET GLOBAL innodb_file_per_table=1;
                                

With innodb_file_per_table enabled, you can store InnoDB tables in a tbl_name .ibd file. Unlike the MyISAM storage engine, with its separate tbl_name .MYD and tbl_name .MYI files for data and indexes, InnoDB stores the data and the indexes together in a single .ibd file.

If you disable innodb_file_per_table in your startup options and restart the server, or disable it with the SET GLOBAL command, InnoDB creates new tables inside the system tablespace unless you have explicitly placed the table in a file-per-table tablespace or a general tablespace using the CREATE TABLE ... TABLESPACE option.

You can always read and write any InnoDB tables, regardless of the file-per-table setting.

To move a table from the system tablespace to its own tablespace, change the innodb_file_per_table setting and rebuild the table:

mysql> SET GLOBAL innodb_file_per_table=1;
mysql> ALTER TABLE table_name ENGINE=InnoDB;
                                

Tables added to the system tablespace using CREATE TABLE ... TABLESPACE or ALTER TABLE ... TABLESPACE syntax are not affected by the innodb_file_per_table setting. To move these tables from the system tablespace to a file-per-table tablespace, they must be moved explicitly using ALTER TABLE ... TABLESPACE syntax.

Note

InnoDB always needs the system tablespace because it puts its internal data dictionary and undo logs there. The .ibd files are not sufficient for InnoDB to operate.

When a table is moved out of the system tablespace into its own .ibd file, the data files that make up the system tablespace remain the same size. The space formerly occupied by the table can be reused for new InnoDB data, but is not reclaimed for use by the operating system. When moving large InnoDB tables out of the system tablespace, where disk space is limited, you may prefer to enable innodb_file_per_table and recreate the entire instance using the mysqldump command. As mentioned above, tables added to the system tablespace using CREATE TABLE ... TABLESPACE or ALTER TABLE ... TABLESPACE syntax are not affected by the innodb_file_per_table setting. These tables must be moved individually.

15.6.3.3 General Tablespaces

A general tablespace is a shared InnoDB tablespace that is created using CREATE TABLESPACE syntax. General tablespace capabilities and features are described under the following topics in this section:

General Tablespace Capabilities

The general tablespace feature provides the following capabilities:

  • Similar to the system tablespace, general tablespaces are shared tablespaces that can store data for multiple tables.

  • General tablespaces have a potential memory advantage over file-per-table tablespaces . The server keeps tablespace metadata in memory for the lifetime of a tablespace. Multiple tables in fewer general tablespaces consume less memory for tablespace metadata than the same number of tables in separate file-per-table tablespaces.

  • General tablespace data files may be placed in a directory relative to or independent of the MySQL data directory, which provides you with many of the data file and storage management capabilities of file-per-table tablespaces . As with file-per-table tablespaces, the ability to place data files outside of the MySQL data directory allows you to manage performance of critical tables separately, setup RAID or DRBD for specific tables, or bind tables to particular disks, for example.

  • General tablespaces support all table row formats and associated features, and have no dependence on the innodb_file_per_table setting.

  • The TABLESPACE option can be used with CREATE TABLE to create tables in a general tablespace, in a file-per-table tablespace, or in the system tablespace.

  • The TABLESPACE option can be used with ALTER TABLE to move tables between general tablespaces, file-per-table tablespaces, and the system tablespace. Previously, it was not possible to move a table from a file-per-table tablespace to the system tablespace. With the general tablespace feature, you can now do so.

Creating a General Tablespace

General tablespaces are created using CREATE TABLESPACE syntax.

CREATE TABLESPACE tablespace_name
    [ADD DATAFILE 'file_name']
    [FILE_BLOCK_SIZE = value]
        [ENGINE [=] engine_name]
                                

A general tablespace can be created in the data directory or outside of it. To avoid conflicts with implicitly created file-per-table tablespaces, creating a general tablespace in a subdirectory under the data directory is not supported. When creating a general tablespace outside of the data directory, the directory must exist and must be known to InnoDB prior to creating the tablespace. To make an unknown directory known to InnoDB , add the directory to the innodb_directories argument value. innodb_directories is a read-only startup option. Configuring it requires restarting the server.
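For example, a sketch of making a directory outside of the data directory known to InnoDB in the option file (the path is illustrative):

[mysqld]
innodb_directories="/my/tablespace/directory"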

Examples:

Creating a general tablespace in the data directory:

mysql> CREATE TABLESPACE `ts1` ADD DATAFILE 'ts1.ibd' Engine=InnoDB;
                                

or

mysql> CREATE TABLESPACE `ts1` Engine=InnoDB;
                                

The ADD DATAFILE clause is optional as of MySQL 8.0.14 and required before that. If the ADD DATAFILE clause is not specified when creating a tablespace, a tablespace data file with a unique file name is created implicitly. The unique file name is a 128-bit UUID formatted into five groups of hexadecimal numbers separated by dashes ( aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee ). General tablespace data files include an .ibd file extension. In a replication environment, the data file name created on the master is not the same as the data file name created on the slave.

Creating a general tablespace in a directory outside of the data directory:

mysql> CREATE TABLESPACE `ts1` ADD DATAFILE '/my/tablespace/directory/ts1.ibd' Engine=InnoDB;
                                

You can specify a path that is relative to the data directory as long as the tablespace directory is not under the data directory. In this example, the my_tablespace directory is at the same level as the data directory:

mysql> CREATE TABLESPACE `ts1` ADD DATAFILE '../my_tablespace/ts1.ibd' Engine=InnoDB;
                                
Note

The ENGINE = InnoDB clause must be defined as part of the CREATE TABLESPACE statement, or InnoDB must be defined as the default storage engine ( default_storage_engine=InnoDB ).

Adding Tables to a General Tablespace

After creating an InnoDB general tablespace, you can use CREATE TABLE tbl_name ... TABLESPACE [=] tablespace_name or ALTER TABLE tbl_name TABLESPACE [=] tablespace_name to add tables to the tablespace, as shown in the following examples:

CREATE TABLE :

mysql> CREATE TABLE t1 (c1 INT PRIMARY KEY) TABLESPACE ts1;
                                

ALTER TABLE :

mysql> ALTER TABLE t2 TABLESPACE ts1;
                                
Note

Support for adding table partitions to shared tablespaces was deprecated in MySQL 5.7.24 and removed in MySQL 8.0.13. Shared tablespaces include the InnoDB system tablespace and general tablespaces.

For detailed syntax information, see CREATE TABLE and ALTER TABLE .

General Tablespace Row Format Support

General tablespaces support all table row formats ( REDUNDANT , COMPACT , DYNAMIC , COMPRESSED ) with the caveat that compressed and uncompressed tables cannot coexist in the same general tablespace due to different physical page sizes.

For a general tablespace to contain compressed tables ( ROW_FORMAT=COMPRESSED ), FILE_BLOCK_SIZE must be specified, and the FILE_BLOCK_SIZE value must be a valid compressed page size in relation to the innodb_page_size value. Also, the physical page size of the compressed table ( KEY_BLOCK_SIZE ) must be equal to FILE_BLOCK_SIZE/1024 . For example, if innodb_page_size=16KB and FILE_BLOCK_SIZE=8K , the KEY_BLOCK_SIZE of the table must be 8.

The following table shows permitted innodb_page_size , FILE_BLOCK_SIZE , and KEY_BLOCK_SIZE combinations. FILE_BLOCK_SIZE values may also be specified in bytes. To determine a valid KEY_BLOCK_SIZE value for a given FILE_BLOCK_SIZE , divide the FILE_BLOCK_SIZE value by 1024. Table compression is not supported for 32K and 64K InnoDB page sizes. For more information about KEY_BLOCK_SIZE , see CREATE TABLE , and Section 15.9.1.2, “Creating Compressed Tables” .

Table 15.4 Permitted Page Size, FILE_BLOCK_SIZE, and KEY_BLOCK_SIZE Combinations for Compressed Tables

InnoDB Page Size     Permitted FILE_BLOCK_SIZE    Permitted KEY_BLOCK_SIZE Value
(innodb_page_size)   Value
------------------   --------------------------   ------------------------------------------------
64KB                 64K (65536)                  Compression is not supported
32KB                 32K (32768)                  Compression is not supported
16KB                 16K (16384)                  N/A: If innodb_page_size is equal to
                                                  FILE_BLOCK_SIZE, the tablespace cannot contain a
                                                  compressed table.
16KB                 8K (8192)                    8
16KB                 4K (4096)                    4
16KB                 2K (2048)                    2
16KB                 1K (1024)                    1
8KB                  8K (8192)                    N/A: If innodb_page_size is equal to
                                                  FILE_BLOCK_SIZE, the tablespace cannot contain a
                                                  compressed table.
8KB                  4K (4096)                    4
8KB                  2K (2048)                    2
8KB                  1K (1024)                    1
4KB                  4K (4096)                    N/A: If innodb_page_size is equal to
                                                  FILE_BLOCK_SIZE, the tablespace cannot contain a
                                                  compressed table.
4KB                  2K (2048)                    2
4KB                  1K (1024)                    1

This example demonstrates creating a general tablespace and adding a compressed table. The example assumes a default innodb_page_size of 16KB. The FILE_BLOCK_SIZE of 8192 requires that the compressed table have a KEY_BLOCK_SIZE of 8.

mysql> CREATE TABLESPACE `ts2` ADD DATAFILE 'ts2.ibd' FILE_BLOCK_SIZE = 8192 Engine=InnoDB;
mysql> CREATE TABLE t4 (c1 INT PRIMARY KEY) TABLESPACE ts2 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
                                

If you do not specify FILE_BLOCK_SIZE when creating a general tablespace, FILE_BLOCK_SIZE defaults to innodb_page_size . When FILE_BLOCK_SIZE is equal to innodb_page_size , the tablespace may only contain tables with an uncompressed row format ( COMPACT , REDUNDANT , and DYNAMIC row formats).

Moving Tables Between Tablespaces Using ALTER TABLE

You can use ALTER TABLE with the TABLESPACE option to move a table to an existing general tablespace, to a new file-per-table tablespace, or to the system tablespace.

Note

Support for placing table partitions in shared tablespaces was deprecated in MySQL 5.7.24 and removed MySQL 8.0.13. Shared tablespaces include the InnoDB system tablespace and general tablespaces.

To move a table from a file-per-table tablespace or from the system tablespace to a general tablespace, specify the name of the general tablespace. The general tablespace must exist. See CREATE TABLESPACE for more information.

ALTER TABLE tbl_name TABLESPACE [=] tablespace_name;
                                

To move a table from a general tablespace or file-per-table tablespace to the system tablespace, specify innodb_system as the tablespace name.

ALTER TABLE tbl_name TABLESPACE [=] innodb_system;
                                

To move a table from the system tablespace or a general tablespace to a file-per-table tablespace, specify innodb_file_per_table as the tablespace name.

ALTER TABLE tbl_name TABLESPACE [=] innodb_file_per_table;
                                

ALTER TABLE ... TABLESPACE operations always cause a full table rebuild, even if the TABLESPACE attribute has not changed from its previous value.

ALTER TABLE ... TABLESPACE syntax does not support moving a table from a temporary tablespace to a persistent tablespace.

The DATA DIRECTORY clause is permitted with CREATE TABLE ... TABLESPACE=innodb_file_per_table but is otherwise not supported for use in combination with the TABLESPACE option.

Restrictions apply when moving tables from encrypted tablespaces. See InnoDB Tablespace Encryption Limitations .

Renaming a General Tablespace

Renaming a general tablespace is supported using ALTER TABLESPACE ... RENAME TO syntax.

ALTER TABLESPACE s1 RENAME TO s2;
                                

The CREATE TABLESPACE privilege is required to rename a general tablespace.

RENAME TO operations are implicitly performed in autocommit mode, regardless of the autocommit setting.

A RENAME TO operation cannot be performed while LOCK TABLES or FLUSH TABLES WITH READ LOCK is in effect for tables that reside in the tablespace.

Exclusive metadata locks are taken on tables within a general tablespace while the tablespace is renamed, which prevents concurrent DDL. Concurrent DML is supported.

Dropping a General Tablespace

The DROP TABLESPACE statement is used to drop an InnoDB general tablespace.

All tables must be dropped from the tablespace prior to a DROP TABLESPACE operation. If the tablespace is not empty, DROP TABLESPACE returns an error.

Use a query similar to the following to identify tables in a general tablespace.

mysql> SELECT a.NAME AS space_name, b.NAME AS table_name FROM INFORMATION_SCHEMA.INNODB_TABLESPACES a,
       INFORMATION_SCHEMA.INNODB_TABLES b WHERE a.SPACE=b.SPACE AND a.NAME LIKE 'ts1';
+------------+------------+
| space_name | table_name |
+------------+------------+
| ts1        | test/t1    |
| ts1        | test/t2    |
| ts1        | test/t3    |
+------------+------------+
                                

A general InnoDB tablespace is not deleted automatically when the last table in the tablespace is dropped. The tablespace must be dropped explicitly using DROP TABLESPACE tablespace_name .

A general tablespace does not belong to any particular database. A DROP DATABASE operation can drop tables that belong to a general tablespace but it cannot drop the tablespace, even if the DROP DATABASE operation drops all tables that belong to the tablespace. A general tablespace must be dropped explicitly using DROP TABLESPACE tablespace_name .

Similar to the system tablespace, truncating or dropping tables stored in a general tablespace creates free space internally in the general tablespace .ibd data file which can only be used for new InnoDB data. Space is not released back to the operating system as it is when a file-per-table tablespace is deleted during a DROP TABLE operation.

This example demonstrates how to drop an InnoDB general tablespace. The general tablespace ts1 is created with a single table. The table must be dropped before dropping the tablespace.

mysql> CREATE TABLESPACE `ts1` ADD DATAFILE 'ts1.ibd' Engine=InnoDB;
mysql> CREATE TABLE t1 (c1 INT PRIMARY KEY) TABLESPACE ts1 Engine=InnoDB;
mysql> DROP TABLE t1;
mysql> DROP TABLESPACE ts1;
                                
Note

tablespace_name is a case-sensitive identifier in MySQL.

General Tablespace Limitations
  • A generated or existing tablespace cannot be changed to a general tablespace.

  • Creation of temporary general tablespaces is not supported.

  • General tablespaces do not support temporary tables.

  • Similar to the system tablespace, truncating or dropping tables stored in a general tablespace creates free space internally in the general tablespace .ibd data file which can only be used for new InnoDB data. Space is not released back to the operating system as it is for file-per-table tablespaces.

    Additionally, a table-copying ALTER TABLE operation on a table that resides in a shared tablespace (a general tablespace or the system tablespace) can increase the amount of space used by the tablespace. Such operations require as much additional space as the data in the table plus indexes. The additional space required for the table-copying ALTER TABLE operation is not released back to the operating system as it is for file-per-table tablespaces.

  • ALTER TABLE ... DISCARD TABLESPACE and ALTER TABLE ... IMPORT TABLESPACE are not supported for tables that belong to a general tablespace.

  • Support for placing table partitions in general tablespaces was deprecated in MySQL 5.7.24 and removed in MySQL 8.0.13.

15.6.3.4 Undo Tablespaces

Undo tablespaces contain undo logs, which are collections of undo log records that contain information about how to undo the latest change by a transaction to a clustered index record. Undo logs exist within undo log segments, which are contained within rollback segments. The innodb_rollback_segments variable defines the number of rollback segments allocated to each undo tablespace.

Two default undo tablespaces are created when the MySQL instance is initialized. Default undo tablespaces are created at initialization time to provide a location for rollback segments that must exist before SQL statements can be accepted. A minimum of two undo tablespaces is required to support automated truncation of undo tablespaces. See Truncating Undo Tablespaces .

Default undo tablespaces are created in the location defined by the innodb_undo_directory variable. If the innodb_undo_directory variable is undefined, default undo tablespaces are created in the data directory. Default undo tablespace data files are named undo_001 and undo_002 . The corresponding undo tablespace names defined in the data dictionary are innodb_undo_001 and innodb_undo_002 .

As of MySQL 8.0.14, additional undo tablespaces can be created at runtime using SQL. See Adding Undo Tablespaces .

The initial size of an undo tablespace data file depends on the innodb_page_size value. For the default 16KB page size, the initial undo tablespace file size is 10MiB. For 4KB, 8KB, 32KB, and 64KB page sizes, the initial undo tablespace files sizes are 7MiB, 8MiB, 20MiB, and 40MiB, respectively.

Adding Undo Tablespaces

Because undo logs can become large during long-running transactions, creating additional undo tablespaces can help prevent individual undo tablespaces from becoming too large. As of MySQL 8.0.14, additional undo tablespaces can be created at runtime using CREATE UNDO TABLESPACE syntax.

CREATE UNDO TABLESPACE tablespace_name ADD DATAFILE 'file_name.ibu'
                                

The undo tablespace file name must have an .ibu extension. It is not permitted to specify a relative path when defining the undo tablespace file name. A fully qualified path is permitted, but the path must be known to InnoDB . Known paths are those defined by the innodb_directories variable.

At startup, directories defined by the innodb_directories variable are scanned for undo tablespace files. (The scan also traverses subdirectories.) Directories defined by the innodb_data_home_dir , innodb_undo_directory , and datadir variables are automatically appended to the innodb_directories value, regardless of whether the innodb_directories variable is defined explicitly. An undo tablespace can therefore reside in paths defined by any of those variables.

If the undo tablespace file name does not include a path, the undo tablespace is created in the directory defined by the innodb_undo_directory variable. If that variable is undefined, the undo tablespace is created in the data directory.

Note

The InnoDB recovery process requires that undo tablespace files reside in known directories. Undo tablespace files must be discovered and opened before redo recovery and before other data files are opened to permit uncommitted transactions and data dictionary changes to be rolled back. An undo tablespace not found before recovery cannot be used, which can cause database inconsistencies. An error message is reported at startup if an undo tablespace known to the data dictionary is not found. The known directory requirement also supports undo tablespace portability. See Moving Undo Tablespaces .

To create undo tablespaces in a path relative to the data directory, set the innodb_undo_directory variable to the relative path, and specify the file name only when creating an undo tablespace.
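For example, a sketch under these assumptions (the directory and tablespace names are illustrative; innodb_undo_directory is not dynamic and must be set before the server is started):

[mysqld]
innodb_undo_directory=undo

After a restart, the undo tablespace can then be created by specifying the file name only:

mysql> CREATE UNDO TABLESPACE undo_003 ADD DATAFILE 'undo_003.ibu';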

To view undo tablespace names and paths, query INFORMATION_SCHEMA.FILES :

SELECT TABLESPACE_NAME, FILE_NAME FROM INFORMATION_SCHEMA.FILES
  WHERE FILE_TYPE LIKE 'UNDO LOG';
                                

A MySQL instance supports up to 127 undo tablespaces including the two default undo tablespaces created when the MySQL instance is initialized.

Note

Prior to MySQL 8.0.14, additional undo tablespaces are created by configuring the innodb_undo_tablespaces startup variable. This variable is deprecated and no longer configurable as of MySQL 8.0.14.

Prior to MySQL 8.0.14, increasing the innodb_undo_tablespaces setting creates the specified number of undo tablespaces and adds them to the list of active undo tablespaces. Decreasing the innodb_undo_tablespaces setting removes undo tablespaces from the list of active undo tablespaces. Undo tablespaces that are removed from the active list remain active until they are no longer used by existing transactions. The innodb_undo_tablespaces variable can be configured at runtime using a SET statement or defined in a configuration file.

Prior to MySQL 8.0.14, deactivated undo tablespaces cannot be removed. Manual removal of undo tablespace files is possible after a slow shutdown but is not recommended, because deactivated undo tablespaces may still contain active undo logs for some time after a restart if open transactions were present when the server was shut down. As of MySQL 8.0.14, undo tablespaces can be dropped using DROP UNDO TABLESPACE syntax. See Dropping Undo Tablespaces .

Dropping Undo Tablespaces

As of MySQL 8.0.14, undo tablespaces created using CREATE UNDO TABLESPACE syntax can be dropped at runtime using DROP UNDO TABLESPACE syntax.

An undo tablespace must be empty before it can be dropped. To empty an undo tablespace, the undo tablespace must first be marked as inactive using ALTER UNDO TABLESPACE syntax so that the tablespace is no longer used for assigning rollback segments to new transactions.

ALTER UNDO TABLESPACE tablespace_name SET INACTIVE;
                                

After an undo tablespace is marked as inactive, transactions currently using rollback segments in the undo tablespace are permitted to finish, as are any transactions started before those transactions are completed. After transactions are completed, the purge system frees the rollback segments in the undo tablespace, and the undo tablespace is truncated to its initial size. (The same process is used when truncating undo tablespaces. See Truncating Undo Tablespaces .) When the undo tablespace is empty, it can be dropped.

DROP UNDO TABLESPACE tablespace_name;
                                
Note

Alternatively, the undo tablespace can be left in an empty state and reactivated later, when needed, by issuing an ALTER UNDO TABLESPACE tablespace_name SET ACTIVE statement.

The state of an undo tablespace can be monitored by querying the INFORMATION_SCHEMA.INNODB_TABLESPACES table.

SELECT NAME, STATE FROM INFORMATION_SCHEMA.INNODB_TABLESPACES
  WHERE NAME LIKE 'tablespace_name';
                                

An inactive state indicates that rollback segments in an undo tablespace are no longer used by new transactions. An empty state indicates that an undo tablespace is empty and ready to be dropped, or made active again using an ALTER UNDO TABLESPACE tablespace_name SET ACTIVE statement. Attempting to drop an undo tablespace that is not empty returns an error.

The default undo tablespaces ( innodb_undo_001 and innodb_undo_002 ) created when the MySQL instance is initialized cannot be dropped. They can, however, be made inactive using an ALTER UNDO TABLESPACE tablespace_name SET INACTIVE statement. Before a default undo tablespace can be made inactive, there must be an undo tablespace to take its place. A minimum of two active undo tablespaces is required at all times to support automated truncation of undo tablespaces.
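For example, a user-defined undo tablespace can be created to take the place of a default one before the default tablespace is deactivated; the names used here are illustrative:

CREATE UNDO TABLESPACE undo_003 ADD DATAFILE 'undo_003.ibu';
ALTER UNDO TABLESPACE innodb_undo_001 SET INACTIVE;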

Moving Undo Tablespaces

Undo tablespaces created with CREATE UNDO TABLESPACE syntax can be moved while the server is offline to any known directory. Known directories are those defined by the innodb_directories variable. Directories defined by innodb_data_home_dir , innodb_undo_directory , and datadir are automatically appended to the innodb_directories value regardless of whether the innodb_directories variable is defined explicitly. Those directories and their subdirectories are scanned at startup for undo tablespace files. An undo tablespace file moved to any of those directories is discovered at startup and assumed to be the undo tablespace that was moved.

The default undo tablespaces ( innodb_undo_001 and innodb_undo_002 ) created when the MySQL instance is initialized must always reside in the directory defined by the innodb_undo_directory variable. If the innodb_undo_directory variable is undefined, default undo tablespaces reside in the data directory. If default undo tablespaces are moved while the server is offline, the server must be started with the innodb_undo_directory variable configured to the new directory.

The I/O patterns for undo logs make undo tablespaces good candidates for SSD storage.

Configuring the Number of Rollback Segments

The innodb_rollback_segments variable defines the number of rollback segments allocated to each undo tablespace and to the global temporary tablespace. The innodb_rollback_segments variable can be configured at startup or while the server is running.

The default setting for innodb_rollback_segments is 128, which is also the maximum value. For information about the number of transactions that a rollback segment supports, see Section 15.6.6, “Undo Logs” .
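Because the variable is dynamic, the number of rollback segments can be adjusted at runtime. For example (64 is an illustrative value):

mysql> SET GLOBAL innodb_rollback_segments=64;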

Truncating Undo Tablespaces

There are two methods of truncating undo tablespaces, which can be used individually or in combination to manage undo tablespace size. One method is automated, enabled using configuration variables. The other method is manual, performed using SQL statements.

The automated method does not require monitoring undo tablespace size and, once enabled, it performs deactivation, truncation, and reactivation of undo tablespaces without manual intervention. The manual truncation method may be preferable if you want to control when undo tablespaces are taken offline for truncation. For example, you may want to avoid truncating undo tablespaces during peak workload times.

Automated Truncation

Automated truncation of undo tablespaces requires a minimum of two active undo tablespaces, which ensures that one undo tablespace remains active while the other is taken offline to be truncated. By default, two undo tablespaces are created when the MySQL instance is initialized.

To have undo tablespaces automatically truncated, enable the innodb_undo_log_truncate variable. For example:

mysql> SET GLOBAL innodb_undo_log_truncate=ON;
                                

When the innodb_undo_log_truncate variable is enabled, undo tablespaces that exceed the size limit defined by the innodb_max_undo_log_size variable are subject to truncation. The innodb_max_undo_log_size variable is dynamic and has a default value of 1073741824 bytes (1024 MiB).

mysql> SELECT @@innodb_max_undo_log_size;
+----------------------------+
| @@innodb_max_undo_log_size |
+----------------------------+
|                 1073741824 |
+----------------------------+
                                

When the innodb_undo_log_truncate variable is enabled:

  1. Default and user-defined undo tablespaces that exceed the innodb_max_undo_log_size setting are marked for truncation. Selection of an undo tablespace for truncation is performed in a circular fashion to avoid truncating the same undo tablespace each time.

  2. Rollback segments residing in the selected undo tablespace are made inactive so that they are not assigned to new transactions. Existing transactions that are currently using rollback segments are permitted to finish.

  3. The purge system frees rollback segments that are no longer in use.

  4. After all rollback segments in the undo tablespace are freed, the truncate operation runs and truncates the undo tablespace to its initial size. The initial size of an undo tablespace depends on the innodb_page_size value. For the default 16KB page size, the initial undo tablespace file size is 10MiB. For 4KB, 8KB, 32KB, and 64KB page sizes, the initial undo tablespace file sizes are 7MiB, 8MiB, 20MiB, and 40MiB, respectively.

    The size of an undo tablespace after a truncate operation may be larger than the initial size due to immediate use following the completion of the operation.

    The innodb_undo_directory variable defines the location of default undo tablespace files. If the innodb_undo_directory variable is undefined, default undo tablespaces reside in the data directory. The location of all undo tablespace files including user-defined undo tablespaces created using CREATE UNDO TABLESPACE syntax can be determined by querying the INFORMATION_SCHEMA.FILES table:

    SELECT TABLESPACE_NAME, FILE_NAME FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE LIKE 'UNDO LOG';
                                                
  5. Rollback segments are reactivated so that they can be assigned to new transactions.
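The truncation threshold itself can be adjusted at runtime because innodb_max_undo_log_size is dynamic. For example, the following statement raises the limit to 2GiB; the value is illustrative:

mysql> SET GLOBAL innodb_max_undo_log_size=2147483648;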

Manual Truncation

Manual truncation of undo tablespaces requires a minimum of three active undo tablespaces. Two active undo tablespaces are required at all times to support the possibility that automated truncation is enabled. A minimum of three undo tablespaces satisfies this requirement while permitting an undo tablespace to be taken offline manually.

To manually initiate truncation of an undo tablespace, deactivate the undo tablespace by issuing the following statement:

ALTER UNDO TABLESPACE tablespace_name SET INACTIVE;
                                

After the undo tablespace is marked as inactive, transactions currently using rollback segments in the undo tablespace are permitted to finish, as are any transactions started before those transactions are completed. After transactions are completed, the purge system frees the rollback segments in the undo tablespace, the undo tablespace is truncated to its initial size, and the undo tablespace state changes from inactive to empty .

Note

When an ALTER UNDO TABLESPACE tablespace_name SET INACTIVE statement deactivates an undo tablespace, the purge thread looks for that undo tablespace at the next opportunity. Once the undo tablespace is found and marked for truncation, the purge thread returns with increased frequency to quickly empty and truncate the undo tablespace.

To check the state of an undo tablespace, query the INFORMATION_SCHEMA.INNODB_TABLESPACES table.

SELECT NAME, STATE FROM INFORMATION_SCHEMA.INNODB_TABLESPACES
  WHERE NAME LIKE 'tablespace_name';
                                

Once the undo tablespace is in an empty state, it can be reactivated by issuing the following statement:

ALTER UNDO TABLESPACE tablespace_name SET ACTIVE;
                                

An undo tablespace in an empty state can also be dropped. See Dropping Undo Tablespaces .

Expediting Automated Truncation of Undo Tablespaces

The purge thread is responsible for emptying and truncating undo tablespaces. By default, the purge thread looks for undo tablespaces to truncate once every 128 times that purge is invoked. The frequency with which the purge thread looks for undo tablespaces to truncate is controlled by the innodb_purge_rseg_truncate_frequency variable, which has a default setting of 128.

mysql> SELECT @@innodb_purge_rseg_truncate_frequency;
+----------------------------------------+
| @@innodb_purge_rseg_truncate_frequency |
+----------------------------------------+
|                                    128 |
+----------------------------------------+
                                

To increase that frequency, decrease the innodb_purge_rseg_truncate_frequency setting. For example, to have the purge thread look for undo tablespaces once every 32 times that purge is invoked, set innodb_purge_rseg_truncate_frequency to 32.

mysql> SET GLOBAL innodb_purge_rseg_truncate_frequency=32;
                                

When the purge thread finds an undo tablespace that requires truncation, the purge thread returns with increased frequency to quickly empty and truncate the undo tablespace.

Performance Impact of Truncating Undo Tablespace Files

When an undo tablespace is truncated, the rollback segments in the undo tablespace are deactivated. The active rollback segments in other undo tablespaces assume responsibility for the entire system load, which may result in a slight performance degradation. The amount of performance degradation depends on a number of factors:

  • Number of undo tablespaces

  • Number of undo logs

  • Undo tablespace size

  • Speed of the I/O subsystem

  • Existing long running transactions

  • System load

The easiest way to avoid impacting performance when truncating undo tablespaces is to increase the number of undo tablespaces.

Monitoring Undo Tablespace Truncation

As of MySQL 8.0.16, undo and purge subsystem counters are provided for monitoring background activities associated with undo log truncation. For counter names and descriptions, query the INFORMATION_SCHEMA.INNODB_METRICS table.

SELECT NAME, SUBSYSTEM, COMMENT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME LIKE '%truncate%';
                                

For information about enabling counters and querying counter data, see Section 15.14.6, “InnoDB INFORMATION_SCHEMA Metrics Table” .
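The counters are disabled by default. One way to enable them at runtime is to use the innodb_monitor_enable variable with a wildcard pattern; the pattern shown is illustrative:

mysql> SET GLOBAL innodb_monitor_enable='%truncate%';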

15.6.3.5 Temporary Tablespaces

InnoDB uses session temporary tablespaces and a global temporary tablespace.

Session Temporary Tablespaces

Session temporary tablespaces store user-created temporary tables and internal temporary tables created by the optimizer when InnoDB is configured as the storage engine for on-disk internal temporary tables. The internal_tmp_disk_storage_engine variable defines the storage engine for on-disk internal temporary tables and is set to InnoDB by default.

Session temporary tablespaces are allocated to a session from a pool of temporary tablespaces on the first request to create an on-disk temporary table. A maximum of two tablespaces is allocated to a session, one for user-created temporary tables and the other for internal temporary tables created by the optimizer. The temporary tablespaces allocated to a session are used for all on-disk temporary tables created by the session. When a session disconnects, its temporary tablespaces are truncated and released back to the pool. A pool of 10 temporary tablespaces is created when the server is started. The size of the pool never shrinks and tablespaces are added to the pool automatically as necessary. The pool of temporary tablespaces is removed on normal shutdown or on an aborted initialization. Session temporary tablespace files are five pages in size when created and have an .ibt file name extension.

A range of 400 thousand space IDs is reserved for session temporary tablespaces. Because the pool of session temporary tablespaces is recreated each time the server is started, space IDs for session temporary tablespaces are not persisted when the server is shut down and may be reused.

The innodb_temp_tablespaces_dir variable defines the location where session temporary tablespaces are created. The default location is the #innodb_temp directory in the data directory. Startup is refused if the pool of temporary tablespaces cannot be created.

shell> cd BASEDIR/data/#innodb_temp
shell> ls
temp_10.ibt  temp_2.ibt  temp_4.ibt  temp_6.ibt  temp_8.ibt
temp_1.ibt   temp_3.ibt  temp_5.ibt  temp_7.ibt  temp_9.ibt
                                

In statement-based replication (SBR) mode, temporary tables created on a slave reside in a single session temporary tablespace that is truncated only when the MySQL server is shut down.

The INNODB_SESSION_TEMP_TABLESPACES table provides metadata about session temporary tablespaces.
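For example, a query similar to the following lists the session temporary tablespaces in the pool:

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_SESSION_TEMP_TABLESPACES;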

The INFORMATION_SCHEMA.INNODB_TEMP_TABLE_INFO table provides metadata about user-created temporary tables that are active in an InnoDB instance.

Global Temporary Tablespace

The global temporary tablespace ( ibtmp1 ) stores rollback segments for changes made to user-created temporary tables.

The innodb_temp_data_file_path variable defines the relative path, name, size, and attributes for global temporary tablespace data files. If no value is specified for innodb_temp_data_file_path , the default behavior is to create a single auto-extending data file named ibtmp1 in the innodb_data_home_dir directory. The initial file size is slightly larger than 12MB.

The global temporary tablespace is removed on normal shutdown or on an aborted initialization, and recreated each time the server is started. The global temporary tablespace receives a dynamically generated space ID when it is created. Startup is refused if the global temporary tablespace cannot be created. The global temporary tablespace is not removed if the server halts unexpectedly. In this case, a database administrator can remove the global temporary tablespace manually or restart the MySQL server. Restarting the MySQL server removes and recreates the global temporary tablespace automatically.

The global temporary tablespace cannot reside on a raw device.

INFORMATION_SCHEMA.FILES provides metadata about the global temporary tablespace. Issue a query similar to this one to view global temporary tablespace metadata:

mysql> SELECT * FROM INFORMATION_SCHEMA.FILES WHERE TABLESPACE_NAME='innodb_temporary'\G
                                

By default, the global temporary tablespace data file is autoextending and increases in size as necessary.

To determine if a global temporary tablespace data file is autoextending, check the innodb_temp_data_file_path setting:

mysql> SELECT @@innodb_temp_data_file_path;
+------------------------------+
| @@innodb_temp_data_file_path |
+------------------------------+
| ibtmp1:12M:autoextend        |
+------------------------------+
                                

To check the size of global temporary tablespace data files, query the INFORMATION_SCHEMA.FILES table using a query similar to this one:

mysql> SELECT FILE_NAME, TABLESPACE_NAME, ENGINE, INITIAL_SIZE, TOTAL_EXTENTS*EXTENT_SIZE
       AS TotalSizeBytes, DATA_FREE, MAXIMUM_SIZE FROM INFORMATION_SCHEMA.FILES
       WHERE TABLESPACE_NAME = 'innodb_temporary'\G
*************************** 1. row ***************************
      FILE_NAME: ./ibtmp1
TABLESPACE_NAME: innodb_temporary
         ENGINE: InnoDB
   INITIAL_SIZE: 12582912
 TotalSizeBytes: 12582912
      DATA_FREE: 6291456
   MAXIMUM_SIZE: NULL
                                

TotalSizeBytes shows the current size of the global temporary tablespace data file. For information about other field values, see Section 25.11, “The INFORMATION_SCHEMA FILES Table” .

Alternatively, check the global temporary tablespace data file size on your operating system. The global temporary tablespace data file is located in the directory defined by the innodb_temp_data_file_path variable.

To reclaim disk space occupied by a global temporary tablespace data file, restart the MySQL server. Restarting the server removes and recreates the global temporary tablespace data file according to the attributes defined by innodb_temp_data_file_path .

To limit the size of the global temporary tablespace data file, configure innodb_temp_data_file_path to specify a maximum file size. For example:

[mysqld]
innodb_temp_data_file_path=ibtmp1:12M:autoextend:max:500M
                                

Configuring innodb_temp_data_file_path requires restarting the server.

15.6.3.6 Creating a Tablespace Outside of the Data Directory

The CREATE TABLE ... DATA DIRECTORY clause permits creating a file-per-table tablespace outside of the data directory. For example, you can use the DATA DIRECTORY clause to create a tablespace on a separate storage device with particular performance or capacity characteristics, such as a fast SSD or a high-capacity HDD.
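A minimal statement of this kind might look like the following; the table definition and path are illustrative, and the target directory must exist and be accessible to the server:

CREATE TABLE t1 (c1 INT PRIMARY KEY) DATA DIRECTORY = '/external/directory';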