Overview
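A Cassandra cluster is a ring-like, decentralized network of commodity servers. A client connects to the cluster by connecting to any node in the ring, and then uses the CQL interface. The node the client connects to is called the coordinator.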
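Cassandra is a distributed system that relies on data partitioning to spread data across the nodes of the cluster. To avoid a single point of failure, it also uses data replication to store copies of the data (replicas) on multiple nodes, which is what gives it high availability.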
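To make partitioning and replication concrete, here is a minimal Python sketch of a token ring. It is a toy model, not Cassandra's implementation: the `Ring` class, the node names, and the use of MD5 as the hash are assumptions made for illustration (Cassandra's default partitioner uses Murmur3, and real nodes usually own many tokens). Each key is hashed to a position on the ring, and its replicas are placed on the next nodes clockwise, which is roughly how SimpleStrategy behaves.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 here, purely for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: one token per node, replicas on the next nodes clockwise."""

    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication_factor cannot exceed the number of nodes")
        self.replication_factor = replication_factor
        # Sort nodes by their token so the list can be walked as a ring.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Return the nodes that would store the given partition key."""
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.entries)
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.replication_factor)
        ]


# Hypothetical four-node cluster with three copies of every partition.
ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("country:1")
print(owners)
```

With a replication factor of 3 on four nodes, any single node can fail and every key still has two live replicas.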
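We set a parameter (Cassandra calls it the consistency level) that specifies how many replicas must acknowledge a read or write for it to succeed. In other words, the coordinator reports success to the client only after that number of replicas has executed the operation.
An operation can therefore be marked as successful before the data has propagated to every node, which is very important for performance.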
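The acknowledgement rule can be sketched in a few lines of Python. This is a simplified simulation under stated assumptions: `coordinate_write` and the `replica_is_up` mapping are hypothetical names, and the loop checks replicas one by one, whereas a real coordinator sends the request to the replicas in parallel and waits for responses. The `quorum` formula (a majority of the replication factor) is the one Cassandra documents for its QUORUM consistency level.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def coordinate_write(replica_is_up, required_acks):
    """Simulate a coordinator: succeed as soon as enough replicas acknowledge.

    replica_is_up maps a replica name to True (it acknowledges) or
    False (it is down); this is a hypothetical stand-in for real responses.
    """
    acks = 0
    for _name, is_up in replica_is_up.items():
        if is_up:
            acks += 1
        if acks >= required_acks:
            # Success is reported without waiting for the remaining replicas.
            return True
    return False


# Replication factor 3: a quorum write needs 2 acknowledgements.
needed = quorum(3)
print(needed)
print(coordinate_write({"r1": True, "r2": False, "r3": True}, needed))
print(coordinate_write({"r1": True, "r2": False, "r3": False}, needed))
```

With one replica down the write still succeeds, and with two replicas down it fails, which is the availability trade-off the parameter controls.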
Nodes in the Cassandra cluster rely on the Gossip Protocol to exchange information with each other.
This protocol allows a node to obtain state information about other nodes by exchanging what it knows about itself and about other nodes. Because state spreads from peer to peer in this way, a particular node does not need to exchange information directly with every other node in the cluster.
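A small Python sketch shows why indirect exchange is enough. It is a deliberately simplified model, not Cassandra's wire protocol: the function names, the `(version, status)` tuples, and the three-node cluster are assumptions for illustration, and real gossip also involves digests and generation numbers. Each node keeps a view of the cluster, and when two nodes gossip, both keep the entry with the higher version for every node they know about.

```python
def merge_state(local, remote):
    """Merge a peer's view into ours, keeping the higher version per node."""
    merged = dict(local)
    for node, (version, status) in remote.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged


def gossip_round(views, a, b):
    """Nodes a and b exchange everything they know and both adopt the merge."""
    merged = merge_state(views[a], views[b])
    views[a] = dict(merged)
    views[b] = dict(merged)


# Initially every node only knows its own state: (version, status).
views = {
    "n1": {"n1": (3, "UP")},
    "n2": {"n2": (5, "UP")},
    "n3": {"n3": (1, "UP")},
}

gossip_round(views, "n1", "n2")  # n1 and n2 learn about each other
gossip_round(views, "n2", "n3")  # n3 learns about n1 through n2

print(views["n3"])
```

After these two rounds, n3 holds n1's state even though the two never communicated directly, while n1 has not yet heard about n3 and would learn of it in a later round.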
Cassandra Query Language
With the Cassandra server running, open a new terminal window and access the Cassandra Query Language shell by typing:
$ cqlsh
A keyspace is a container for our application data. You can think of it as analogous to a schema in an RDBMS. Creating a keyspace requires specifying the replication strategy and the replication factor, that is, the number of nodes on which each piece of data is stored as a replica.
CREATE KEYSPACE test01
WITH REPLICATION = {
'class': 'SimpleStrategy',
'replication_factor': 1
};
cqlsh> DESCRIBE KEYSPACES;
system_schema system_auth system system_distributed system_traces test01
Switch to the keyspace:
USE test01;
CREATE TABLE countries (
id INT PRIMARY KEY,
official_name TEXT,
capital_city TEXT
);
INSERT INTO countries (id, official_name, capital_city) VALUES (1, 'Islamic Republic of Afghanistan', 'Kabul');
SELECT * FROM countries WHERE id = 1;