GLOSSARY

Database Terms, Defined

Clear definitions of distributed database, cloud-native, and TiDB-specific terminology — written for engineers, not marketers.

CORE DATABASE TERMS

HTAP

Hybrid Transactional and Analytical Processing

A database architecture that handles both transactional (OLTP) and analytical (OLAP) workloads on the same system at the same time. HTAP reduces or removes the need for separate databases and the ETL pipelines between them, enabling real-time analytics on live transactional data. TiDB is purpose-built for HTAP workloads; learn how TiDB's HTAP architecture works in practice.

OLTP

Online Transaction Processing

A class of database workloads characterized by high-frequency, short-duration transactions — inserts, updates, and deletes. OLTP systems prioritize low latency and high concurrency. Examples include e-commerce order processing and banking transactions.

OLAP

Online Analytical Processing

A class of database workloads involving complex queries over large datasets for business intelligence and reporting. OLAP queries typically scan many rows and perform aggregations. Traditionally handled by separate data warehouses, but TiDB's columnar storage engine (TiFlash) brings OLAP capabilities to the same cluster — read how an HTAP database handles both at the same time.

NewSQL

Modern relational databases that deliver SQL and ACID transactions with better horizontal scalability and availability than traditional RDBMSs. “NewSQL” describes what these systems deliver rather than how they are built, so a NewSQL database is not necessarily cloud-native, even though many run in cloud environments.

Distributed SQL

Distributed SQL Database

A cloud-native SQL database that operates as one logical system while automatically partitioning, replicating, and routing data and queries across nodes for scale and fault tolerance. TiDB is a distributed SQL database, providing MySQL compatibility with transparent distribution and resilience. See why distributed SQL databases elevate modern application development — covering architecture, partitioning, and ACID at scale.

Quorum

Quorum (Distributed Consensus)

The minimum number of replica nodes that must agree before a write is considered committed in a distributed database. TiDB's storage engine, TiKV, uses the Raft consensus protocol: a write must be acknowledged by a majority of replicas in a Raft group before it is durable. This ensures strong consistency even when individual nodes fail.
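
The majority rule is simple arithmetic. The minimal Python sketch below computes the quorum size of an n-replica Raft group and how many replica failures it can tolerate; the function names are illustrative only and are not part of any TiDB API.

```python
def quorum_size(replicas: int) -> int:
    """Smallest strict majority of a Raft group with `replicas` voters."""
    if replicas < 1:
        raise ValueError("a Raft group needs at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can fail while a majority is still reachable."""
    return replicas - quorum_size(replicas)


for n in (1, 3, 4, 5):
    print(f"{n} replicas: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Four replicas tolerate no more failures than three, which is why odd replica counts such as three (TiDB's default) or five are the usual choice.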

Replication

Data Replication

The process of maintaining multiple synchronized copies of data across nodes. In TiDB, TiKV automatically replicates each data Region to a configurable number of replicas (default: three) using the Raft consensus algorithm. Replication is the mechanism that delivers both fault tolerance and high availability.

Sharding

Data Sharding

The practice of splitting a dataset into smaller chunks (shards or partitions) distributed across multiple nodes. TiDB performs sharding transparently: data is divided into Regions (contiguous key ranges that split automatically once they exceed a size threshold) and automatically distributed across TiKV nodes. Applications do not need to implement sharding logic or choose a shard key, although primary key design still affects how evenly writes are spread (see Hot Spot).
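
Because Regions partition the key space into contiguous ranges, routing a key amounts to a sorted lookup over Region start keys. The sketch below is a simplified model that assumes byte-string keys and an in-memory list of Regions; in a real cluster, TiDB obtains Region locations from PD and caches them. The `Region` and `RegionRouter` names and the `store` field are hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Region:
    region_id: int
    start_key: bytes  # inclusive
    end_key: bytes    # exclusive; b"" means unbounded
    store: str        # hypothetical name of the node holding the leader


class RegionRouter:
    """Maps a key to the Region whose [start_key, end_key) range contains it."""

    def __init__(self, regions: list[Region]):
        self.regions = sorted(regions, key=lambda r: r.start_key)
        self.starts = [r.start_key for r in self.regions]

    def locate(self, key: bytes) -> Region:
        # Rightmost Region whose start_key <= key.
        idx = bisect.bisect_right(self.starts, key) - 1
        if idx < 0:
            raise KeyError(key)
        region = self.regions[idx]
        if region.end_key and key >= region.end_key:
            raise KeyError(key)  # gap in the Region map
        return region


router = RegionRouter([
    Region(1, b"", b"m", "tikv-1"),
    Region(2, b"m", b"t", "tikv-2"),
    Region(3, b"t", b"", "tikv-3"),
])
print(router.locate(b"apple").store)  # tikv-1
print(router.locate(b"melon").store)  # tikv-2
print(router.locate(b"zebra").store)  # tikv-3
```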

Strong Consistency

Strong Consistency (Linearizability)

A consistency model guaranteeing that once a write is committed, all subsequent reads reflect that write, regardless of which node serves the request. TiDB is strongly consistent by default: unless a stale read is explicitly requested, a read reflects every write committed before the reading transaction's start timestamp, eliminating the stale-read anomalies common in eventually consistent systems.

Eventual Consistency

A weaker consistency model where replicas are allowed to diverge temporarily and converge to the same state only after no new writes have arrived for long enough. Many NoSQL databases default to eventual consistency to maximize write throughput and availability. TiDB does not rely on eventual consistency for its default reads; it provides strong consistency guarantees across the distributed cluster.

CAP Theorem

CAP Theorem (Consistency, Availability, Partition Tolerance)

A theoretical result about distributed systems that must tolerate network partitions: during a partition, a system cannot provide both Consistency (every read returns the latest write) and Availability (every request to a non-failed node receives a response). Because Partition Tolerance (continuing to operate despite network splits) is not optional in practice, the theorem is often summarized as a system guaranteeing at most two of the three properties. TiDB prioritizes Consistency and Partition Tolerance (CP): replicas cut off on the minority side of a Raft group stop serving writes rather than diverge, and service continues on the majority side through automatic leader election.

Serializable Isolation

The strictest standard transaction isolation level, guaranteeing that the outcome of concurrent transactions is identical to some serial (one-at-a-time) execution. TiDB supports Read Committed and Repeatable Read isolation levels, with Repeatable Read implemented as Snapshot Isolation via MVCC. TiDB does not expose Serializable as an isolation level. Pessimistic locking and `SELECT ... FOR UPDATE` can be used to prevent specific anomalies such as lost updates where stronger guarantees are required.
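
The classic anomaly that Snapshot Isolation permits but Serializable forbids is write skew. The Python sketch below simulates two transactions that read the same snapshot and then update different rows; it is a toy model of snapshot semantics rather than TiDB's implementation, and the on-call scenario is hypothetical.

```python
import copy


def run_write_skew() -> dict[str, bool]:
    """Two doctors each go off call based on the same snapshot.

    Application invariant: at least one doctor stays on call.
    """
    committed = {"alice": True, "bob": True}  # True = on call

    # Both transactions take their snapshot before either one writes.
    snapshot_t1 = copy.deepcopy(committed)
    snapshot_t2 = copy.deepcopy(committed)
    writes_t1: dict[str, bool] = {}
    writes_t2: dict[str, bool] = {}

    # T1: "someone else is on call, so Alice can leave."
    if sum(snapshot_t1.values()) >= 2:
        writes_t1["alice"] = False
    # T2 runs the same check against its own identical snapshot.
    if sum(snapshot_t2.values()) >= 2:
        writes_t2["bob"] = False

    # Snapshot isolation only rejects write-write conflicts on the same key.
    # These write sets are disjoint, so both transactions commit.
    assert not (writes_t1.keys() & writes_t2.keys())
    committed.update(writes_t1)
    committed.update(writes_t2)
    return committed


final = run_write_skew()
print(final)  # {'alice': False, 'bob': False}: nobody is on call
```

Under Serializable, one of the two transactions would be forced to abort. In TiDB, a common way to prevent this is to read the rows the decision depends on with `SELECT ... FOR UPDATE` in a pessimistic transaction, so the second transaction waits for the first to commit and then sees the current state.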

Two-Phase Commit (2PC)

Two-Phase Commit

A distributed protocol that ensures all participants in a distributed transaction either commit or abort together. TiDB uses a modified 2PC protocol based on Google's Percolator model, coordinated by the TiDB SQL node that initiates the transaction, with globally consistent timestamps provided by PD (Placement Driver). In the first phase (prewrite), the rows being modified are written and locked across TiKV nodes; in the second phase, committing the transaction's primary key decides the outcome atomically, and the remaining keys are then committed.
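
The all-or-nothing rule of two-phase commit can be sketched in a few lines of Python. This is the textbook protocol, not the modified variant TiDB uses, and the `Participant` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    can_prepare: bool = True  # set False to simulate a node that votes "no"
    state: str = "init"
    log: list[str] = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1: a real participant would durably record its intent and
        # lock the affected rows here; this toy only records a state.
        if self.can_prepare:
            self.state = "prepared"
            self.log.append("prepared")
            return True
        self.state = "aborted"
        self.log.append("vote-no")
        return False

    def commit(self) -> None:
        self.state = "committed"
        self.log.append("committed")

    def abort(self) -> None:
        self.state = "aborted"
        self.log.append("aborted")


def two_phase_commit(participants: list[Participant]) -> bool:
    """Commit on every participant, or on none of them."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    if all(votes):                               # phase 2: unanimous yes
        for p in participants:
            p.commit()
        return True
    for p in participants:                       # any "no" aborts everyone
        p.abort()
    return False


ok = two_phase_commit([Participant("tikv-1"), Participant("tikv-2")])
nodes = [Participant("tikv-1"), Participant("tikv-2", can_prepare=False)]
failed = two_phase_commit(nodes)
print(ok, failed, [p.state for p in nodes])  # True False ['aborted', 'aborted']
```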

Consensus Algorithm

Distributed Consensus Algorithm

A protocol that enables nodes in a distributed system to agree on a single value or state, even when failures occur. TiDB's storage layer (TiKV) uses the Raft consensus algorithm, which elects a leader per Region and requires a quorum of replicas to acknowledge each write before committing.

Write-Ahead Log (WAL)

Write-Ahead Log

A durability mechanism in which changes are written to a sequential log before being applied to the main data store. If the node crashes mid-write, the WAL allows the database to replay or roll back incomplete operations during recovery. TiKV uses RocksDB as its storage engine, which maintains a WAL for crash recovery and durability.
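
The core WAL rule (log first, apply second, replay on restart) fits in a small Python sketch. It uses a JSON-lines file as the log and a dict as the main data store; this is a hypothetical toy and says nothing about how RocksDB formats its own log.

```python
import json
import os
import tempfile


class WalStore:
    """Toy key-value store that appends each change to a WAL before applying it."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.data: dict[str, str] = {}
        self._recover()

    def _recover(self) -> None:
        # Rebuild in-memory state by replaying every complete log record.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    # Torn final record from a crash: skip it. A real WAL
                    # would also truncate it before appending new records.
                    break
                self.data[record["key"]] = record["value"]

    def put(self, key: str, value: str) -> None:
        # 1. Append the change to the log and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Only then apply it to the main store.
        self.data[key] = value


path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = WalStore(path)
store.put("a", "1")
store.put("b", "2")
with open(path, "a", encoding="utf-8") as f:
    f.write('{"key": "c", "val')  # simulate a crash in the middle of a write

recovered = WalStore(path)  # "restart": state is rebuilt from the log
print(recovered.data)  # {'a': '1', 'b': '2'}
```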

TIDB-SPECIFIC TERMS

TiKV

Ti Key-Value Store

The distributed transactional key-value storage engine that powers TiDB. TiKV stores data across multiple nodes and uses the Raft consensus algorithm for strong consistency. TiKV is also available as a standalone CNCF graduated project for teams that need a distributed key-value store without the SQL layer. For a deep dive into TiKV's internals, see the TiKV overview in TiDB Docs.

TiFlash

TiFlash Columnar Storage

TiDB's columnar storage extension that enables real-time OLAP queries. TiFlash asynchronously maintains columnar replicas of TiKV data (as Raft learners) while still serving consistent reads, so analytical queries can run on column-oriented storage, which is typically much faster for large scans and aggregations, without competing for resources with transactional workloads on TiKV. See the TiFlash overview for deployment and configuration details.

PD

Placement Driver

The metadata management and scheduling component of a TiDB cluster. PD stores the cluster topology and Region metadata, allocates globally ordered transaction timestamps through its timestamp oracle (TSO), and orchestrates data placement and load balancing across TiKV nodes. It uses etcd internally for distributed consensus.
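
A TSO timestamp combines a physical clock reading in milliseconds with a logical counter stored in the low 18 bits. The Python sketch below models only that encoding and the strictly increasing guarantee on a single node; real PD also handles leader failover and request batching, which are omitted, and the `TimestampOracle` class is hypothetical.

```python
import threading
import time

LOGICAL_BITS = 18
MAX_LOGICAL = 1 << LOGICAL_BITS


def compose_ts(physical_ms: int, logical: int) -> int:
    return (physical_ms << LOGICAL_BITS) | logical


def extract_physical(ts: int) -> int:
    return ts >> LOGICAL_BITS


class TimestampOracle:
    """Single-node, strictly increasing timestamp allocator (toy model)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._physical = 0
        self._logical = 0

    def get_ts(self) -> int:
        with self._lock:
            now_ms = int(time.time() * 1000)
            if now_ms > self._physical:
                # Clock moved forward: start a fresh millisecond.
                self._physical, self._logical = now_ms, 0
            else:
                # Same (or earlier) millisecond: bump the logical counter.
                self._logical += 1
                if self._logical >= MAX_LOGICAL:
                    # Logical space exhausted: move to the next millisecond.
                    self._physical, self._logical = self._physical + 1, 0
            return compose_ts(self._physical, self._logical)


tso = TimestampOracle()
stamps = [tso.get_ts() for _ in range(1000)]
assert stamps == sorted(set(stamps))  # strictly increasing, no duplicates
print(extract_physical(stamps[0]))    # a Unix time in milliseconds
```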

Raft

Raft Consensus Algorithm

A distributed consensus algorithm used by TiKV to ensure data consistency across replicas. Raft elects a leader node for each data Region, and all writes must be acknowledged by a majority of replicas before committing, providing strong consistency guarantees even in the event of node failures. For an in-depth look at how Raft is implemented inside TiKV, see Building a Large-scale Distributed Storage System Based on Raft.
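
A leader decides which log entries are committed by looking at how far each voter's log matches its own. The Python sketch below is a simplified version of the leader-side rule described in the Raft paper (an entry is committed once a majority stores it, and replicas are only counted for entries from the leader's current term); it is not TiKV's code, and the function and parameter names are hypothetical.

```python
def majority_commit_index(match_index: list[int], log_terms: list[int],
                          current_term: int, commit_index: int) -> int:
    """Highest log index the leader may mark as committed.

    match_index: highest index known to be stored on each voter,
                 including the leader itself.
    log_terms:   log_terms[i] is the term of the entry at index i + 1.
    """
    # After a descending sort, position n // 2 holds the largest index
    # that at least a strict majority of voters has reached.
    ranked = sorted(match_index, reverse=True)
    candidate = ranked[len(ranked) // 2]
    # Only entries from the current term are committed by counting replicas;
    # older entries become committed indirectly along with them.
    while candidate > commit_index:
        if log_terms[candidate - 1] == current_term:
            return candidate
        candidate -= 1
    return commit_index


# Five voters; the leader (term 3) holds entries 1..6.
terms = [1, 1, 2, 3, 3, 3]
print(majority_commit_index([6, 6, 5, 2, 1], terms, current_term=3, commit_index=2))  # 5
```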

Region

TiDB Region

The fundamental unit of data distribution in TiDB. A Region is a contiguous key range holding up to a size threshold of data (approximately 96 MB by default in many TiDB versions; the threshold is configurable), stored and replicated as a Raft group across TiKV nodes. PD tracks all Regions and orchestrates their placement and load balancing across the cluster. When a Region grows beyond the size threshold, it splits automatically.
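
Size-based splitting can be sketched as follows: when a Region's data exceeds a threshold, cut it near its midpoint by size into two adjacent Regions that together cover the original key range. This toy Python model assumes an in-memory sorted list of (key, size) pairs and a threshold in arbitrary units; TiKV's actual split checking works on approximate sizes and can also split on other criteria such as load. `ToyRegion` and `maybe_split` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegion:
    start_key: str  # inclusive
    end_key: str    # exclusive; "" means unbounded
    rows: list[tuple[str, int]] = field(default_factory=list)  # sorted (key, size)

    def size(self) -> int:
        return sum(size for _, size in self.rows)


def maybe_split(region: ToyRegion, threshold: int) -> list[ToyRegion]:
    """Split at the key where cumulative size passes half, if over threshold."""
    if region.size() <= threshold or len(region.rows) < 2:
        return [region]
    half = region.size() / 2
    running = 0
    cut = len(region.rows) - 1
    for i, (_, size) in enumerate(region.rows):
        running += size
        if running >= half:
            cut = min(i + 1, len(region.rows) - 1)  # keep >= 1 row per side
            break
    split_key = region.rows[cut][0]
    left = ToyRegion(region.start_key, split_key, region.rows[:cut])
    right = ToyRegion(split_key, region.end_key, region.rows[cut:])
    return [left, right]


r = ToyRegion("", "", [("a", 40), ("b", 40), ("c", 40)])
parts = maybe_split(r, threshold=96)
print([(p.start_key, p.end_key, p.size()) for p in parts])
# [('', 'c', 80), ('c', '', 40)]
```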

TiCDC

TiDB Change Data Capture

TiDB's real-time change data capture and replication tool. TiCDC reads the change log from TiKV and streams row-level changes to downstream systems (including Kafka, MySQL, and object storage) with low latency. It is used for event-driven architectures, data synchronization, and feeding analytics pipelines without ETL batch jobs.

TiDB Operator

TiDB Operator (Kubernetes)

The official Kubernetes operator for deploying and managing TiDB clusters on Kubernetes. TiDB Operator automates provisioning, scaling, upgrades, backup, and failover for TiDB components (TiDB, TiKV, PD, TiFlash) on any Kubernetes-compatible platform. Maintained by PingCAP, it is the recommended way to run TiDB on Kubernetes in production.

Hot Spot

Hot Spot (Write/Read Hot Spot)

A condition where a disproportionate volume of read or write traffic concentrates on a small number of TiKV nodes or Regions, degrading performance. Hot spots commonly occur when applications use monotonically increasing keys (e.g., auto-increment IDs) as primary keys, causing all new writes to target the same Region. TiDB provides SHARD_ROW_ID_BITS and AUTO_RANDOM to distribute write load evenly.
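
The effect of monotonically increasing keys, and why randomizing the high-order bits helps, shows up in a short simulation. The Python sketch below divides a 63-bit key space into eight equal ranges standing in for Regions and prefixes each ID with 5 random shard bits, loosely in the spirit of AUTO_RANDOM, which places random shard bits in the high-order bits of the ID. The bit widths and range boundaries are assumptions, not TiDB's exact layout.

```python
import random
from collections import Counter

KEY_BITS = 63    # keep keys positive, like a signed BIGINT
SHARD_BITS = 5   # assumed number of randomized high-order bits
NUM_RANGES = 8   # stand-ins for Regions: equal slices of the key space


def range_of(key: int) -> int:
    """Index of the equal-width key range that holds `key`."""
    return key * NUM_RANGES >> KEY_BITS


def sequential_id(n: int) -> int:
    return n  # auto-increment style: every new ID is the previous one plus 1


def shard_bits_id(n: int, rng: random.Random) -> int:
    # Random shard bits on top, increasing counter below.
    shard = rng.randrange(1 << SHARD_BITS)
    return (shard << (KEY_BITS - SHARD_BITS)) | n


rng = random.Random(42)
seq = Counter(range_of(sequential_id(n)) for n in range(10_000))
rnd = Counter(range_of(shard_bits_id(n, rng)) for n in range(10_000))
print("sequential:", dict(seq))    # {0: 10000}: every insert hits one range
print("shard bits:", sorted(rnd))  # inserts spread over all 8 ranges
```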

Placement Rules

TiDB Placement Rules

A TiDB feature that controls where data is stored across nodes, availability zones, and regions by specifying constraints on Raft replica placement. Placement rules enable data locality. For example, certain tables can be restricted to specific geographic regions to satisfy data residency regulations. This is TiDB's mechanism for multi-region data control, distinct from geo-partitioning approaches.

Resource Control (RU)

Resource Control / Request Units (RU)

TiDB's workload isolation feature, allowing administrators to define Resource Groups with guaranteed and burstable quotas expressed in Request Units (RUs). RUs abstract CPU, IOPS, and I/O bandwidth consumption into a single metric, enabling fair sharing of cluster resources across different applications or tenants. Learn more in the TiDB docs.
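
A guaranteed rate with a burst allowance is commonly modeled as a token bucket: tokens refill at a steady rate up to a capacity, and a request proceeds only if enough tokens remain. The Python sketch below is a generic token bucket under that assumption, with hypothetical `rate` and `capacity` parameters; it illustrates the idea and does not describe TiDB's actual resource-control algorithm or RU cost model.

```python
class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_consume(self, cost: float, now: float) -> bool:
        """Admit a request costing `cost` tokens (e.g. estimated RUs) if possible."""
        self._refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical tenant quota: 100 RU/s steady rate, bursts up to 200 RU.
bucket = TokenBucket(rate=100, capacity=200)
print(bucket.try_consume(150, now=0.0))  # True: burst capacity available
print(bucket.try_consume(100, now=0.0))  # False: only 50 RU left
print(bucket.try_consume(100, now=0.5))  # True: 50 RU refilled in 0.5 s
```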

CLOUD & INFRASTRUCTURE TERMS

Horizontal Scaling

Horizontal Scaling (Scale-Out)

Adding more nodes to a distributed system to increase capacity, as opposed to vertical scaling (adding more resources to a single node). TiDB scales horizontally — you add TiKV nodes to increase storage and throughput, and TiDB nodes to increase query concurrency, without downtime.

Multi-Tenancy

An architecture where a single database cluster serves multiple tenants (customers or teams) with isolated resources and data. TiDB Cloud supports multi-tenancy through resource groups and access controls, enabling shared infrastructure with tenant-level isolation.

Elastic Scaling

The ability to automatically scale database resources up or down in response to workload changes. TiDB Cloud provides elastic scaling — clusters expand to handle traffic spikes and contract during off-peak hours, reducing cost without manual intervention.

ACID

Atomicity, Consistency, Isolation, Durability

The four properties that guarantee reliable database transactions. TiDB provides full ACID compliance at the distributed level — a transaction either commits across all nodes or rolls back entirely, even in multi-region deployments. See how ACID scales in a distributed SQL database — and why it matters beyond single-node MySQL.

High Availability (HA)

High Availability

A system property ensuring that a database remains accessible even when individual components fail. TiDB achieves high availability through Raft-based multi-replica storage: when a TiKV node fails, the Raft group automatically elects a new leader from surviving replicas, and PD rebalances data, typically within seconds and without data loss.

RPO / RTO

Recovery Point Objective / Recovery Time Objective

Two metrics used to characterize database disaster recovery posture. RPO is the maximum acceptable data loss (measured in time) after a failure. RTO is the maximum acceptable time to restore service. TiDB is designed for near-zero RPO and low RTO through synchronous Raft replication. Committed data is never lost provided a replica majority survives.

Multi-Region Deployment

Running a database cluster across multiple geographic regions to reduce latency for global users and survive regional failures. TiDB supports multi-region deployments through placement rules and Raft-based replication across availability zones. Note: TiDB's active-active multi-region support is an upcoming capability. Current multi-region deployments use placement rules for data locality within a cluster.

Availability Zone (AZ)

Availability Zone

A physically isolated data center (or subset of a data center) within a cloud region, with independent power, networking, and cooling. Distributing TiKV replicas across availability zones ensures that a single-AZ failure does not cause data loss or downtime. TiDB recommends placing Raft replicas across at least three AZs for production clusters.
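
Whether a replica layout survives the loss of any single AZ can be checked mechanically: for each zone, count the replicas left in the other zones and compare that count with the Raft majority. A minimal Python sketch, with hypothetical zone names:

```python
from collections import Counter


def survives_any_single_az_loss(replica_zones: list[str]) -> bool:
    """True if a Raft group keeps a majority after losing any one AZ."""
    majority = len(replica_zones) // 2 + 1
    per_zone = Counter(replica_zones)
    return all(len(replica_zones) - lost >= majority for lost in per_zone.values())


print(survives_any_single_az_loss(["az-a", "az-b", "az-c"]))  # True
print(survives_any_single_az_loss(["az-a", "az-a", "az-b"]))  # False: losing az-a leaves 1 of 3
print(survives_any_single_az_loss(["az-a", "az-a", "az-b", "az-b", "az-c"]))  # True
```

With only two AZs, one of them must hold a majority of the replicas, so losing that AZ stalls writes; this is the arithmetic behind the three-AZ recommendation.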

Cloud-Native Database

A database designed from the ground up to run on cloud infrastructure, taking advantage of elastic compute, managed storage, container orchestration, and pay-as-you-go pricing. Cloud-native databases separate storage from compute, support auto-scaling, and are typically deployed via containers or managed services. TiDB Cloud is PingCAP's fully managed cloud-native offering, available on AWS, Google Cloud, and Azure.

Compute-Storage Separation

Compute-Storage Separation (Disaggregated Architecture)

An architecture in which the query-processing layer (compute) and the data-persistence layer (storage) scale independently. This contrasts with shared-nothing architectures where compute and storage are bundled per node. TiDB Cloud Starter uses a disaggregated storage model, allowing compute and storage to scale independently and enabling cost-efficient elastic scaling for variable workloads.

Change Data Capture (CDC)

Change Data Capture

A technique for tracking and streaming row-level changes (inserts, updates, deletes) from a database to downstream consumers in real time. CDC enables event-driven architectures, real-time data synchronization, and analytics on live operational data without full table scans or batch ETL jobs. TiDB's CDC tool is TiCDC.

ETL

Extract, Transform, Load

A data integration pattern in which data is extracted from a source system, transformed into a target format, and loaded into a destination (typically a data warehouse). ETL pipelines introduce latency and operational complexity. HTAP databases like TiDB reduce the need for ETL by allowing analytical queries to run directly on live transactional data via TiFlash, the columnar storage extension.

Vector Database

A database optimized for storing, indexing, and querying high-dimensional vector embeddings, which are numerical representations of unstructured data such as text, images, or audio generated by machine learning models. Vector databases power similarity search, semantic search, and retrieval-augmented generation (RAG) in AI applications. TiDB supports vector search natively, enabling teams to combine structured SQL queries with vector similarity search in a single database without managing a separate vector store.
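
At its core, vector similarity search ranks stored embeddings by their distance to a query vector. The brute-force Python sketch below uses cosine distance over tiny, made-up three-dimensional vectors. Production systems typically add approximate nearest-neighbor indexes so they do not scan every row, and nothing here reflects TiDB's SQL syntax.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query: list[float], rows: dict[str, list[float]], k: int) -> list[str]:
    """Exact (brute-force) k-nearest-neighbor search by cosine distance."""
    ranked = sorted(rows, key=lambda doc_id: cosine_distance(query, rows[doc_id]))
    return ranked[:k]


# Made-up embeddings; real ones come from an ML model and have many more dimensions.
docs = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-dogs": [0.8, 0.3, 0.1],
    "doc-tax-law": [0.0, 0.2, 0.95],
}
print(top_k([1.0, 0.2, 0.0], docs, k=2))  # ['doc-cats', 'doc-dogs']
```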

RAG

Retrieval-Augmented Generation

An AI architecture pattern that improves large language model (LLM) output by retrieving relevant context from an external knowledge base before generating a response. The retrieved context, often sourced via vector similarity search, grounds the LLM in factual, up-to-date information. TiDB's vector search capability makes it a natural fit for RAG pipelines that need both structured data queries and semantic retrieval in one system.

Serverless Database

A database deployment model in which the cloud provider automatically provisions, scales, and manages infrastructure in response to workload. Users pay only for the resources consumed rather than provisioning fixed capacity. TiDB Cloud Starter is PingCAP's serverless-style tier, offering automatic scaling and consumption-based pricing for development and variable workloads.

CONSISTENCY & TRANSACTION TERMS

Snapshot Isolation

Snapshot Isolation (SI)

A transaction isolation level where each transaction reads from a consistent snapshot of the database taken at the start of the transaction, unaffected by concurrent writes. Snapshot isolation prevents dirty reads, non-repeatable reads, and phantom reads while allowing high concurrency, but it still permits write skew, which is why it is weaker than Serializable. TiDB uses Multi-Version Concurrency Control (MVCC) to implement snapshot isolation.

MVCC

Multi-Version Concurrency Control

A concurrency control mechanism that maintains multiple versions of each row to allow readers and writers to operate simultaneously without blocking each other. Readers access a consistent snapshot without locking rows; writers create new versions rather than overwriting existing ones. TiDB and TiKV use MVCC as the foundation for both snapshot isolation and non-blocking reads.
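
The read rule behind MVCC snapshots is simple: keep every committed version tagged with its commit timestamp, and let a reader at timestamp `ts` see the newest version committed at or before `ts`. The Python sketch below models just that rule with integer timestamps; TiKV's actual on-disk encoding, locks, and garbage collection are omitted, and the `MVCCStore` class is hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Optional


class MVCCStore:
    """Keeps every committed version of each key, ordered by commit timestamp."""

    def __init__(self) -> None:
        self._ts: dict[str, list[int]] = defaultdict(list)
        self._vals: dict[str, list[str]] = defaultdict(list)

    def write(self, key: str, value: str, commit_ts: int) -> None:
        # Writers append a new version instead of overwriting the old one.
        # Assumes commit timestamps arrive in increasing order per key.
        self._ts[key].append(commit_ts)
        self._vals[key].append(value)

    def read(self, key: str, read_ts: int) -> Optional[str]:
        """Newest value committed at or before read_ts; no locks taken."""
        idx = bisect.bisect_right(self._ts.get(key, []), read_ts) - 1
        return self._vals[key][idx] if idx >= 0 else None


store = MVCCStore()
store.write("balance", "100", commit_ts=10)
snapshot_ts = 15                             # a reader's transaction starts here
store.write("balance", "80", commit_ts=20)   # a later writer commits

print(store.read("balance", snapshot_ts))  # 100: the snapshot is unaffected
print(store.read("balance", 25))           # 80: a newer snapshot sees the write
print(store.read("balance", 5))            # None: the key did not exist yet
```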

Pessimistic Locking

A concurrency control strategy that acquires row locks as each statement executes (for example, when an UPDATE or `SELECT ... FOR UPDATE` touches a row), well before the transaction commits, so that conflicting transactions wait instead of failing at commit time. Pessimistic locking is suitable for high-contention workloads where conflicts are frequent. TiDB uses pessimistic transactions as the default mode, matching the behavior of MySQL and simplifying application migration.

Optimistic Locking

A concurrency control strategy that defers conflict detection to commit time, assuming conflicts are rare. If another transaction has committed a change to any row the current transaction is writing since the current transaction started, the commit fails and the application must retry. TiDB also supports optimistic transactions, which can outperform pessimistic locking in low-contention, read-heavy workloads.
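
Commit-time conflict detection can be sketched with per-row version numbers: a transaction remembers the version of each row it will write, and at commit it aborts if any of those versions has changed. This is a generic optimistic-concurrency sketch in Python with hypothetical class names; TiDB detects write-write conflicts using its transaction timestamps rather than per-row counters.

```python
import threading


class VersionedTable:
    def __init__(self) -> None:
        self.rows: dict[str, tuple[int, int]] = {}  # key -> (version, value)
        self.commit_lock = threading.Lock()         # makes validate+apply atomic

    def read(self, key: str) -> tuple[int, int]:
        return self.rows.get(key, (0, 0))


class OptimisticTxn:
    def __init__(self, table: VersionedTable) -> None:
        self.table = table
        self.seen: dict[str, int] = {}    # version observed when first touched
        self.writes: dict[str, int] = {}  # buffered until commit

    def get(self, key: str) -> int:
        if key in self.writes:
            return self.writes[key]
        version, value = self.table.read(key)
        self.seen.setdefault(key, version)
        return value

    def put(self, key: str, value: int) -> None:
        self.seen.setdefault(key, self.table.read(key)[0])
        self.writes[key] = value

    def commit(self) -> bool:
        """Validate at commit time; the caller must retry on False."""
        with self.table.commit_lock:
            for key in self.writes:
                if self.table.read(key)[0] != self.seen[key]:
                    return False  # another transaction committed this row first
            for key, value in self.writes.items():
                version, _ = self.table.read(key)
                self.table.rows[key] = (version + 1, value)
            return True


table = VersionedTable()
table.rows["stock"] = (1, 10)
t1, t2 = OptimisticTxn(table), OptimisticTxn(table)
t1.put("stock", t1.get("stock") - 1)
t2.put("stock", t2.get("stock") - 3)
print(t1.commit())          # True: first committer wins
print(t2.commit())          # False: conflict detected, t2 must retry
print(table.rows["stock"])  # (2, 9)
```

On `False`, the application re-runs the whole transaction, which is cheap when conflicts are rare and wasteful when they are frequent; that trade-off is why pessimistic mode suits high-contention workloads.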
