- Restrictions on Access:
- Open Access.
- Distributed key-value stores are an essential component of the cloud computing infrastructure and are used by several applications including reservation systems and financial transactions. These systems commonly replicate the data over multiple nodes to provide fast data access and to ensure the data availability despite the possible failures. In such systems, the data stored is updated frequently by the write clients, and the time scales of data updates and access are often comparable to the time scales of propagating the data to the storage nodes. Ensuring that a read client gets the most recent version of the data is challenging in these settings as the data updates may not reach all nodes. This notion of ensuring that the latest version of the data must be accessible despite the frequent data updates is known as consistency in distributed systems. In replication-based distributed key-value stores, each node stores the entire data which results in a significant communication cost and storage cost as key-value stores use expensive high-speed memory to provide fast access to the data. This has motivated the use of erasure coding in key-value stores, where each node just stores only a part of the data to reduce the storage and communication costs. However, erasure-coded distributed key-value stores face unique challenges that do not appear in replication-based systems as a result of each node storing only a part of the data, in the form of coded elements, and a read client has to obtain enough coded elements corresponding to the same version of the data. Specifically, when an erasure code is used in a strongly consistent key-value store, a storage node that receives a new version of the data cannot delete the older versions of the data and it has to effectively wait for a sufficient number of nodes to receive the new version before deleting the older versions of the data. The multi-version coding framework was formulated by Wang et al. to study the storage cost of strongly consistent distributed key-value stores from an information-theoretic point of view. Specifically, this work showed that the storage nodes have to store several versions of the data in strongly consistent decentralized erasure-coded storage systems, thereby offsetting some of the storage cost benefits of erasure coding. Given that storing multiple versions of the data is inevitable in strongly consistent decentralized erasure-coded systems, exploiting the possible correlations between the different data versions is an important opportunity to reduce the storage cost in such systems. While the multi-version coding framework treats the different data versions as being independent, our work obtains storage cost savings by developing code construction that combine the benefits of erasure coding and delta coding to compress the differences between subsequent versions of the data and shows that the correlation can be exploited even in asynchronous decentralized consistent storage systems.Next, motivated by real-world considerations of geo-distributed key-value stores where the system is not completely decentralized as the nodes can exchange data depending on the network topology, we study a system where a node acquires side information of which versions propagated to some other nodes based on the network topology. Our code constructions show that the side information can result in a better storage cost as compared with the case where the nodes do not exchange side information considered in the multi-version coding framework for some regimes at the expense of the additional latency and the negligible communication overhead of exchanging the side information. Our impossibility results identify surprising scenarios where exchanging tremendous amount of side information does not reduce the storage cost beyond the storage cost of the case where the nodes do not exchange side information. Hence, our work indicates that a careful study of the network topology is necessary to exploit the side information and reduce the storage cost in strongly consistent erasure-coded key-value stores.Finally, we focus on improving the latency of distributed key-value stores. In replication-based strict quorum systems, the quorum sizes are chosen to ensure that the write and read quorums intersect and this guarantees strong consistency. In order to have fast access to the data, many key-values stores allow non-strict (partial, sloppy or probabilistic) quorums that may not intersect. Hence, such systems do not guarantee strong consistency and only provide eventual consistency. We study the latency-consistency trade-off of such systems and derive a closed-form expression for the inconsistency probability for $3$-way replication assuming exponential delays. We also extend our study to eventually consistent erasure-coded key-value stores, show how to characterize the inconsistency probability of such systems and provide examples showing that erasure coding can provide more flexibility to the latency-consistency trade-off as compared with replication-based systems. Our approach allows tuning the latency and consistency guarantees based on the expected network delays.
- Dissertation Note:
- Ph.D. Pennsylvania State University 2020.
- Technical Details:
- The full text of the dissertation is available as an Adobe Acrobat .pdf file ; Adobe Acrobat Reader required to view the file.
View MARC record | catkey: 30586199