Introduction¶
Any distributed data store can only provide two of the following three guarantees:
Consistency
Availability
Partition tolerance
And in this page I will elaborate this further and look into my Ramses project and tie the knowledge to the implementation.
Theory¶
Why does the theory state that it can provide only two out of three?
Consistency: Every read receives the most recent write, or it receives an error Availability: Every request receives a non error response, without the guarantee that it contains the most recent write Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network nodes.
Source: https://www.julianbrowne.com/article/brewers-cap-theorem
The reason for this is that, for example, if a network connectivity issue occurs a choice has to be made between consistency & availability. Network partitioning (network connectivity issues) has to be tolerated to some degree because no distributed system is safe from these.
When choosing consistency over availability, the system will return an error or a time out if particular information cannot be guaranteed to be up to date due to network partitioning. When choosing availability, the system will always query and try to return the most recent available version of the information.
Without network failures, availability & consistency can be guaranteed.
Ramses & theory¶
My own application uses a mySQL database, this is also known as a RDBMS(Relational databse). Which choses consistency over availability. Systems designed BASE(basically-available, soft-state, eventual consistency) such as noSQL choses availability over consistency.
Basically available: reading and writing operations are available as much as possible (using all nodes of a database cluster), but might not be consistent (the write might not persist after conflicts are reconciled, and the read might not get the latest write) Soft-state: without consistency guarantees, after some amount of time, we only have some probability of knowing the state, since it might not yet have converged Eventually consistent: If we execute some writes and then the system functions long enough, we can know the state of the data; any further reads of that data item will return the same value
This means that the system will return an error or a time out if particular information cannot be guaranteed to be up to date due to network partitioning
For Ramses this could be relevant as one of the annexes within the learning goals is scalability. In which case, it could be beneficial to use an AP database like Cassandra & rely on eventual consistency.