MatrixCube is a fundamental library for building distributed systems, offering guarantees about reliability, consistency, and scalability. It is designed to facilitate building distributed, stateful applications, so that developers only need to focus on business logic as if working on a single node. MatrixCube is currently built upon Multi-Raft to provide a replicated state machine, and will migrate to the Paxos family of protocols to better suit scenarios spanning multiple data centers.
Unlike many other distributed systems, MatrixCube is designed as part of the storage nodes. A MatrixOne distributed deployment does not have dedicated scheduling nodes, and MatrixCube cannot work as a standalone module.
There are several key concepts for understanding how MatrixCube works.
A MatrixCube distributed system consists of several physical computers, and data are stored across these computers. We call each computer inside this cluster a Store.
Data in a database are logically organized in tables. For physical storage, the data are split into different partitions to achieve better scalability. Each partition is called a Shard. In our design, a newly created table is initially a single Shard. When the size of the table exceeds the Shard size limit, the Shard will split.
To provide reliable service, each Shard is stored more than once: several copies are stored in different Stores. We call each copy a Replica. A Shard can have multiple Replicas, and the data in each Replica are the same.
Raft-group and Leader
Since Replicas are located in different Stores, once a Replica is updated, the other Replicas must be updated to keep the data consistent. No matter which Replica a client queries, it always gets the same result. We use the Raft protocol to implement this consensus process. The Replicas of a particular Shard form a Raft-group, in which a Leader is elected as the representative of the group. All consistent read and write requests are handled only by the Leader.
Learn more about: How does a Leader get elected in Raft?
DataStorage is an interface for implementing a distributed storage service. It must be defined before using MatrixCube. DataStorage needs to be implemented according to the characteristics of the storage engine. Some common distributed storage services can easily be constructed based on DataStorage, such as a Distributed File System. A default Key-Value based DataStorage is provided to meet the requirements of most scenarios.
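To make the idea concrete, here is a minimal sketch of what a Key-Value style storage implementation behind such an interface could look like. The interface and type names (`KVStorage`, `memStorage`) are illustrative assumptions for this sketch, not MatrixCube's actual `DataStorage` API, which has more methods (e.g. for snapshots and splits).

```go
package main

import (
	"fmt"
	"sync"
)

// KVStorage is a hypothetical, simplified stand-in for the kind of
// interface a DataStorage implementation must satisfy.
type KVStorage interface {
	Set(key, value []byte) error
	Get(key []byte) ([]byte, error)
}

// memStorage is a toy in-memory implementation of KVStorage.
type memStorage struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func newMemStorage() *memStorage {
	return &memStorage{data: map[string][]byte{}}
}

func (s *memStorage) Set(key, value []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[string(key)] = value
	return nil
}

func (s *memStorage) Get(key []byte) ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[string(key)]
	if !ok {
		return nil, fmt.Errorf("key not found: %s", key)
	}
	return v, nil
}

func main() {
	var st KVStorage = newMemStorage()
	_ = st.Set([]byte("k"), []byte("v"))
	v, _ := st.Get([]byte("k"))
	fmt.Println(string(v)) // prints "v"
}
```

Because the framework only depends on the interface, any engine that satisfies it (in-memory, LSM-tree, file-based) can be plugged in without changing the replication layer.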
Prophet is a scheduling module. It takes charge of Auto-Rebalance, which keeps the storage level and read/write throughput balanced across Stores. The initial 3 Stores of a MatrixCube cluster are all Prophet nodes.
Learn more about: How does Prophet handle the scheduling?
Raftstore is the core component of MatrixCube; it implements the most important features of MatrixCube:
Metadata storage: including the metadata of Stores and Shards.
Multi-Raft management: the relationship between Raft-Groups, the communication between multiple Raft-Groups, and Leader election and re-election.
Global Routing: a global routing table is constructed with the Event Notify mechanism of Prophet; read/write requests are routed based on this routing table.
Shard Proxy: a proxy for read/write requests to a Shard. With the proxy, the detailed implementation of Multi-Raft is invisible, and all Stores are equal from the user's point of view. A user can send a request to any Store, and the request will be routed to the right Store.
Learn more about: How do the Shard Proxy and Global Routing work?
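The combination of a global routing table and a proxy can be sketched as a lookup from a key to the Shard owning its range, and from that Shard to the Store holding its Leader. The types and field names below (`shardRoute`, `leaderStore`) are illustrative assumptions, not MatrixCube's real data structures.

```go
package main

import "fmt"

// shardRoute is a toy routing-table entry: each Shard owns a key
// range [start, end) and records which Store currently holds its
// Leader Replica. An empty end means "to the end of the keyspace".
type shardRoute struct {
	shardID     uint64
	start, end  string
	leaderStore uint64
}

// route finds the Shard whose range contains key and returns the
// Store the request should be forwarded to.
func route(table []shardRoute, key string) (shardID, storeID uint64, ok bool) {
	for _, r := range table {
		if key >= r.start && (r.end == "" || key < r.end) {
			return r.shardID, r.leaderStore, true
		}
	}
	return 0, 0, false
}

func main() {
	table := []shardRoute{
		{shardID: 1, start: "", end: "m", leaderStore: 101},
		{shardID: 2, start: "m", end: "", leaderStore: 103},
	}
	shardID, storeID, _ := route(table, "apple")
	fmt.Println(shardID, storeID) // prints "1 101"
}
```

Any Store receiving a request can perform this lookup, which is why all Stores look equal to the client.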
MatrixCube provides strong consistency. It is guaranteed that after any successful write, subsequent reads will return the latest value, no matter which Store serves them.
The distributed storage service implemented by MatrixCube is fault tolerant and highly available. When a Store fails, only some of the Replicas are lost; the system can still work as long as a majority of the Replicas of each Shard survive. For example, a cluster with 3 Stores can survive 1 Store failure; a cluster with 5 Stores can survive 2 Store failures.
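The examples above follow directly from Raft's majority rule: a group of n Replicas tolerates floor((n-1)/2) failures. A minimal sketch of that arithmetic:

```go
package main

import "fmt"

// maxFailures returns how many Replica (or Store) failures a group
// of n Replicas can tolerate while a majority remains alive:
// floor((n-1)/2), per Raft's majority-quorum rule.
func maxFailures(n int) int {
	return (n - 1) / 2
}

func main() {
	fmt.Println(maxFailures(3)) // prints "1"
	fmt.Println(maxFailures(5)) // prints "2"
}
```

This is also why clusters are usually sized with an odd number of Stores: 4 Stores tolerate no more failures than 3.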
There is a certain limit to a Shard's size. Whenever a Shard exceeds its storage limit, MatrixCube splits it into two Shards, keeping each Shard at the same storage level.
You can check out a more detailed description of this process in How does the Shard Splitting work?.
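The splitting rule can be sketched as: once a Shard's size crosses the limit, cut its key range in half so each new Shard holds roughly half the data. The model below (a `shard` with sorted keys and a byte size, and the specific limit value) is a simplified assumption for illustration, not MatrixCube's actual split implementation.

```go
package main

import "fmt"

// shard is a toy model: a sorted key set plus its total size.
type shard struct {
	keys []string
	size int
}

// shardLimit is a hypothetical size limit for this sketch.
const shardLimit = 96

// maybeSplit returns the shard unchanged while it is under the
// limit, and otherwise splits it into two halves so that each
// resulting shard holds roughly half the data.
func maybeSplit(s shard) []shard {
	if s.size <= shardLimit {
		return []shard{s}
	}
	mid := len(s.keys) / 2
	return []shard{
		{keys: s.keys[:mid], size: s.size / 2},
		{keys: s.keys[mid:], size: s.size - s.size/2},
	}
}

func main() {
	s := shard{keys: []string{"a", "b", "c", "d"}, size: 120}
	parts := maybeSplit(s)
	fmt.Println(len(parts)) // prints "2"
}
```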
A distributed system should leverage the computation power and storage of all its nodes. For a MatrixCube cluster, when the number of Stores increases or decreases, Auto-Rebalance occurs, moving data across Stores so that each Store stays balanced.
Learn more about: How does the Auto-Rebalance work?
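One common way to picture rebalancing is a greedy step that repeatedly moves load from the most-loaded Store to the least-loaded one until the gap is small. This is only a generic heuristic sketched for intuition; Prophet's actual scheduling policy is more sophisticated, and the names below are invented for this example.

```go
package main

import "fmt"

// rebalanceStep performs one greedy move: shift one unit of load
// (e.g. one Shard) from the most-loaded Store to the least-loaded
// one, but only if the gap between them exceeds the threshold.
func rebalanceStep(load map[string]int, threshold int) (from, to string, moved bool) {
	for s := range load {
		if from == "" || load[s] > load[from] {
			from = s
		}
		if to == "" || load[s] < load[to] {
			to = s
		}
	}
	if load[from]-load[to] <= threshold {
		return "", "", false
	}
	load[from]--
	load[to]++
	return from, to, true
}

func main() {
	load := map[string]int{"store1": 10, "store2": 4, "store3": 7}
	from, to, _ := rebalanceStep(load, 1)
	fmt.Println(from, "->", to) // prints "store1 -> store2"
}
```

Repeating such steps until no move qualifies leaves every Store within the threshold of the others, which is the balanced state Auto-Rebalance aims for.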
With Shard splitting and Auto-Rebalance, a MatrixCube distributed system is capable of scaling out. The cluster's storage and throughput capacity are proportional to the number of Stores.
User-defined storage engine
MatrixCube places no restrictions on the standalone data storage engine. Any storage engine implementing the DataStorage interface defined by MatrixCube can be used to build a MatrixCube-based distributed system. By default, MatrixCube provides a Pebble-based Key-Value storage engine. (Pebble is a Go implementation of RocksDB.)
As a general distributed framework, MatrixCube allows different distributed storage systems to be built on top of it, and users can also customize their read/write commands. As long as your storage engine works in a standalone version, MatrixCube can help you upgrade it to a distributed version.