A Thorough Introduction to Distributed Systems

What is a Distributed System and why is it so complicated?

(Article published Apr. 28, 2018)

Table of contents:
A Thorough Introduction to Distributed Systems
Introduction

What is a distributed system?
Why distribute a system?
Database scaling example
3.1 Scaling our database
3.1.1 Pitfall
3.2 Continuing to Scale
3.2.1 Pitfall
Decentralized vs Distributed
Distributed System Categories
Distributed Data Stores
1.1 CAP Theorem
1.2 Cassandra
1.3 Consensus
Distributed Computing
2.1 MapReduce
2.2 Better Techniques
Distributed File Systems
3.1 HDFS
3.2 IPFS
Distributed Messaging
Distributed Applications
5.1 Erlang Virtual Machine
5.2 BitTorrent
Distributed Ledgers
6.1 Blockchain
6.2 Bitcoin
6.2.3 Ethereum
6.2.4 Further usages of distributed ledgers
Summary
Caution
Further Distributed Systems Reading
Reference

Introduction

With the ever-growing technological expansion of the world, distributed systems are becoming more and more widespread. They are a vast and complex field of study in computer science.

This article aims to introduce you to distributed systems in a basic manner, showing you a glimpse of the different categories of such systems while not diving deep into the details.

1. What is a distributed system?

A distributed system in its simplest definition is a group of computers working together so as to appear as a single computer to the end-user.

These machines have a shared state, operate concurrently and can fail independently without affecting the whole system’s uptime.

I propose we incrementally work through an example of distributing a system so that you can get a better sense of it all:

Let’s go with a database! Traditional databases are stored on the filesystem of one single machine. Whenever you want to fetch or insert information, you talk to that machine directly.

A traditional Stack.png

For us to distribute this database system, we’d need to have this database run on multiple machines at the same time. The user must be able to talk to whichever machine he chooses and should not be able to tell that he is not talking to a single machine: if he inserts a record into node #1, node #3 must be able to return that record.

An architecture that can be considered distributed.png

2. Why distribute a system?

Systems are always distributed by necessity. The truth of the matter is — managing distributed systems is a complex topic chock-full of pitfalls and landmines. It is a headache to deploy, maintain and debug distributed systems, so why go there at all?

What a distributed system enables you to do is scale horizontally. Going back to our previous example of the single database server, the only way to handle more traffic would be to upgrade the hardware the database is running on. This is called scaling vertically.

Scaling horizontally simply means adding more computers rather than upgrading the hardware of a single one.

It is significantly cheaper than vertical scaling after a certain threshold but that is not its main case for preference.

Vertical scaling can only bump your performance up to the latest hardware’s capabilities. These capabilities prove to be insufficient for technology companies with moderate to big workloads.

3-Horizontal scaling becomes much cheaper after a certain threshold.png

The best thing about horizontal scaling is that you have no cap on how much you can scale — whenever performance degrades you simply add another machine, up to infinity potentially.

Easy scaling is not the only benefit you get from distributed systems. Fault tolerance and low latency are equally important.

Fault Tolerance — a cluster of ten machines across two data centers is inherently more fault-tolerant than a single machine. Even if one data center catches on fire, your application would still work.

Low Latency — The time for a network packet to travel the world is physically bounded by the speed of light. For example, the shortest possible round-trip time for a request (that is, going there and back) in a fiber-optic cable between New York and Sydney is 160ms. Distributed systems allow you to have a node in both cities, so that traffic hits the node closest to it.
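To see where a figure like 160ms comes from, here is a rough back-of-the-envelope calculation. The distance (about 16,000 km) and the speed of light in fiber (about two thirds of its speed in a vacuum) are my own assumptions, not measured values.

```python
# Rough lower bound on round-trip time through a fiber-optic cable.
# Assumptions: ~16,000 km between New York and Sydney, and light
# travelling at roughly 2/3 of its vacuum speed inside glass.
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_SPEED_FACTOR = 2 / 3
DISTANCE_KM = 16_000

def min_round_trip_ms(distance_km: float) -> float:
    speed_in_fiber = SPEED_OF_LIGHT_KM_S * FIBER_SPEED_FACTOR  # km per second
    one_way_seconds = distance_km / speed_in_fiber
    return 2 * one_way_seconds * 1000

rtt = min_round_trip_ms(DISTANCE_KM)
print(f"{rtt:.0f} ms")
```

No amount of hardware can push a single-location system below this bound, which is why placing a node near the user matters.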

For a distributed system to work, though, you need the software running on those machines to be specifically designed for running on multiple computers at the same time and handling the problems that come along with it. This turns out to be no easy feat.

3. Database scaling example

3.1 Scaling our database

Imagine that our web application got insanely popular. Imagine also that our database started getting twice as many queries per second as it can handle. Our application would immediately start to decline in performance and our users would notice.

Let’s work together and make our database scale to meet our high demands.

In a typical web application you normally read information much more frequently than you insert new information or modify old information.

There is a way to increase read performance and that is the so-called Master-Slave Replication strategy. Here, you create two new database servers which sync up with the main one. The catch is that you can only read from these new instances.

4-Master-Slave Replication strategy .png

Whenever you insert or modify information — you talk to the master database. It, in turn, asynchronously informs the slaves of the change and they save it as well.

Congratulations, you can now execute 3x as many read queries! Isn’t this great?

3.1.1 Pitfall

Gotcha! We immediately lost the C in our relational database’s ACID guarantees, which stands for Consistency.

You see, there now exists a possibility in which we insert a new record into the database, immediately afterwards issue a read query for it and get nothing back, as if it didn’t exist!

Propagating the new information from the master to the slave does not happen instantaneously. There actually exists a time window in which you can fetch stale information. If this were not the case, your write performance would suffer, as it would have to synchronously wait for the data to be propagated.
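The stale-read window can be reproduced with a toy in-memory model. The class names Primary and Replica are hypothetical stand-ins for the master and a slave, and the explicit replicate() call stands in for the background propagation a real database would do.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}

class Primary:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = deque()  # changes not yet shipped to the replicas

    def write(self, key, value):
        self.data[key] = value            # the write is acknowledged right away
        self.pending.append((key, value))

    def replicate(self):
        # In a real system this happens in the background, some time after the write.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value

replica = Replica()
primary = Primary([replica])
primary.write("user:1", "Alice")
stale_read = replica.data.get("user:1")   # None: the change has not arrived yet
primary.replicate()
fresh_read = replica.data.get("user:1")   # "Alice"
```

Making write() call replicate() before returning would close the window, at the cost of slower writes, which is exactly the trade-off described above.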

Distributed systems come with a handful of trade-offs. This particular issue is one you will have to live with if you want to adequately scale.

3.2 Continuing to Scale

Using the slave database approach, we can horizontally scale our read traffic up to some extent. That’s great but we’ve hit a wall in regards to our write traffic — it’s still all in one server!

We’re not left with many options here. We simply need to split our write traffic across multiple servers, as one is not able to handle it.

One way is to go with a multi-master replication strategy. There, instead of slaves that you can only read from, you have multiple master nodes which support reads and writes. Unfortunately, this gets complicated real quick as you now have the ability to create conflicts (e.g. inserting two records with the same ID).

Let’s go with another technique called sharding (also called partitioning).

With sharding you split your server into multiple smaller servers, called shards. These shards all hold different records — you create a rule as to what kind of records go into which shard. It is very important to create the rule such that the data gets spread in a uniform way.

A possible approach to this is to define ranges according to some information about a record (e.g. users with names starting with A-D).

5-shards.png

This sharding key should be chosen very carefully, as the load is not always equal based on arbitrary columns (e.g. more people have a name starting with C than with Z). A single shard that receives more requests than the others is called a hot spot and must be avoided. Once split up, re-sharding data becomes incredibly expensive and can cause significant downtime, as was the case with FourSquare’s infamous 11 hour outage.
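Here is a minimal sketch of such a range rule and of how quickly it can produce a hot spot. The shard names, the letter ranges and the sample user names are all made up for illustration.

```python
from collections import Counter

# Hypothetical range rule: the first letter of the user's name picks the shard.
SHARD_RANGES = [
    ("A", "D", "shard-1"),
    ("E", "K", "shard-2"),
    ("L", "R", "shard-3"),
    ("S", "Z", "shard-4"),
]

def shard_for(name: str) -> str:
    first = name[0].upper()
    for low, high, shard in SHARD_RANGES:
        if low <= first <= high:
            return shard
    raise ValueError(f"no shard covers {name!r}")

names = ["Alice", "Bob", "Carol", "Chris", "Dan", "Eve", "Zack"]
load = Counter(shard_for(n) for n in names)
print(load)
```

With this sample, shard-1 holds five of the seven users while shard-3 holds none, which is the uneven load the paragraph above warns about.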

To keep our example simple, assume our client (the Rails app) knows which database to use for each record. It is also worth noting that there are many strategies for sharding and this is a simple example to illustrate the concept.

We have won quite a lot right now — we can increase our write traffic N times where N is the number of shards. This practically gives us almost no limit — imagine how finely-grained we can get with this partitioning.

3.2.1 Pitfall

Everything in Software Engineering is more or less a trade-off and this is no exception. Sharding is no simple feat and is best avoided until really needed.

We have now made queries by keys other than the partitioned key incredibly inefficient (they need to go through all of the shards). SQL JOIN queries are even worse and complex ones become practically unusable.
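The difference in cost can be made concrete with a small sketch. The data, the three-shard layout and the function names are hypothetical; the second value each function returns is simply the number of shards it had to touch.

```python
# Hypothetical layout: users are sharded by the first letter of their name.
shards = {
    "shard-1": [{"name": "Alice", "city": "Sofia"}, {"name": "Bob", "city": "Berlin"}],
    "shard-2": [{"name": "Eve", "city": "Sofia"}],
    "shard-3": [{"name": "Zack", "city": "Oslo"}],
}

def shard_for(name):
    first = name[0].upper()
    if first <= "D":
        return "shard-1"
    if first <= "R":
        return "shard-2"
    return "shard-3"

def find_by_name(name):
    """Query on the shard key: exactly one shard is touched."""
    rows = shards[shard_for(name)]
    return [row for row in rows if row["name"] == name], 1

def find_by_city(city):
    """Query on any other column: every shard has to be scanned."""
    hits, touched = [], 0
    for rows in shards.values():
        touched += 1
        hits.extend(row for row in rows if row["city"] == city)
    return hits, touched

by_name, name_cost = find_by_name("Eve")
by_city, city_cost = find_by_city("Sofia")
```

Looking up Eve by name touches one shard, while finding everyone in Sofia touches all three, and that cost grows with every shard you add.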

Decentralized vs Distributed

Before we go any further I’d like to make a distinction between the two terms.

Even though the words sound similar and could logically be taken to mean the same thing, the difference between them has a significant technological and political impact.

Decentralized is still distributed in the technical sense, but the whole decentralized system is not owned by one actor. No one company can own a decentralized system, otherwise it wouldn’t be decentralized anymore.

This means that most systems we will go over today can be thought of as distributed centralized systems — and that is what they’re made to be.

If you think about it — it is harder to create a decentralized system because then you need to handle the case where some of the participants are malicious. This is not the case with normal distributed systems, as you know you own all the nodes.

Note: This definition has been debated a lot and can be confused with others (peer-to-peer, federated). In early literature, it’s been defined differently as well. Regardless, what I gave you as a definition is what I feel is the most widely used now that blockchain and cryptocurrencies popularized the term.

Distributed System Categories

We are now going to go through a couple of distributed system categories and list their largest publicly-known production usage. Bear in mind that most such numbers shown are outdated and are most probably significantly bigger as of the time you are reading this.

1. Distributed Data Stores

Distributed Data Stores are most widely used and recognized as Distributed Databases. Most distributed databases are NoSQL non-relational databases, limited to key-value semantics. They provide incredible performance and scalability at the cost of consistency or availability.

Known Scale — Apple is known to use 75,000 Apache Cassandra nodes storing over 10 petabytes of data, back in 2015

We cannot go into discussions of distributed data stores without first introducing the CAP Theorem.

1.1 CAP Theorem

Proven way back in 2002, the CAP theorem states that a distributed data store cannot simultaneously be consistent, available and partition tolerant.

6-Choose 2 out of 3 (But not Consistency and Availability).png

Choose 2 out of 3 (But not Consistency and Availability)

Some quick definitions:

Consistency — What you read and write sequentially is what is expected (remember the gotcha with the database replication a few paragraphs ago?)
Availability — the whole system does not die — every non-failing node always returns a response.
Partition Tolerant — The system continues to function and uphold its consistency/availability guarantees in spite of network partitions

In reality, partition tolerance must be a given for any distributed data store. As mentioned in many places, one of which is this great article, you cannot have consistency and availability without partition tolerance.

Think about it: if you have two nodes which accept information and their connection dies — how are they both going to be available and simultaneously provide you with consistency? They have no way of knowing what the other node is doing and as such can either go offline (unavailable) or work with stale information (inconsistent).
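The two choices a node has during a partition can be sketched as follows. The Node class and its "CP"/"AP" modes are hypothetical labels of mine, and the write to the other side of the partition is only imagined, not simulated.

```python
class Node:
    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP" (illustrative labels)
        self.value = "v1"
        self.partitioned = False    # True when the peer is unreachable

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm the value is current, so refuse to answer.
            raise RuntimeError("unavailable during partition")
        return self.value           # AP: answer, even if possibly stale

cp_node, ap_node = Node("CP"), Node("AP")
cp_node.partitioned = ap_node.partitioned = True
# Imagine the other side of the partition has meanwhile accepted a write of "v2".

ap_answer = ap_node.read()          # "v1": available but possibly stale
try:
    cp_node.read()
    cp_answer = "answered"
except RuntimeError:
    cp_answer = "refused"           # consistent but unavailable
```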

7_What_do_we_do.png

What do we do?

In the end you’re left to choose if you want your system to be strongly consistent or highly available under a network partition.

Practice shows that most applications value availability more. You do not necessarily always need strong consistency. Even then, that trade-off is not necessarily made because you need the 100% availability guarantee, but rather because network latency can be an issue when having to synchronize machines to achieve strong consistency. These and more factors make applications typically opt for solutions which offer high availability.

Such databases settle with the weakest consistency model — eventual consistency (strong vs eventual consistency explanation). This model guarantees that if no new updates are made to a given item, eventually all accesses to that item will return the latest updated value.

Those systems provide BASE properties (as opposed to traditional databases’ ACID)

Basically Available — The system always returns a response
Soft state — The system could change over time, even during times of no input (due to eventual consistency)
Eventual consistency — In the absence of input, the data will spread to every node sooner or later — thus becoming consistent

Examples of such available distributed databases — Cassandra, Riak (http://basho.com/products/riak-kv/), Voldemort

Of course, there are other data stores which prefer stronger consistency — HBase, Couchbase, Redis, Zookeeper

The CAP theorem is worthy of multiple articles on its own — some regarding how you can tweak a system’s CAP properties depending on how the client behaves and others on how it is not understood properly.

1.2 Cassandra

Cassandra, as mentioned above, is a distributed NoSQL database which prefers the AP properties out of the CAP, settling with eventual consistency. I must admit this may be a bit misleading, as Cassandra is highly configurable — you can make it provide strong consistency at the expense of availability as well, but that is not its common use case.

Cassandra uses consistent hashing to determine which nodes out of your cluster must manage the data you are passing in. You set a replication factor, which basically states to how many nodes you want to replicate your data.

8-Sample write-Cassandra replication Factor.png

Sample write

When reading, you will read from those nodes only.
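The idea of consistent hashing with a replication factor can be sketched in a few lines. This is a generic hash ring of my own, not Cassandra's actual partitioner: the Ring class, the use of MD5 and the node names are all assumptions made for illustration.

```python
import bisect
import hashlib

def ring_position(key: str) -> int:
    # MD5 is used here only as a stable, well-spread hash (an assumption,
    # not a statement about what Cassandra does internally).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replication_factor):
        self.replication_factor = replication_factor
        self.positions = sorted((ring_position(n), n) for n in nodes)

    def replicas_for(self, key):
        start = bisect.bisect_right(self.positions, (ring_position(key), ""))
        count = min(self.replication_factor, len(self.positions))
        # Walk clockwise from the key's position, wrapping around the ring.
        return [self.positions[(start + i) % len(self.positions)][1]
                for i in range(count)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas_for("user:42")
print(owners)
```

Writes for "user:42" go to the three nodes in owners, and reads for the same key consult those same nodes, because the hash always lands on the same spot of the ring.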

Cassandra is massively scalable, providing absurdly high write throughput.

9--.png

Possibly biased diagram, showing writes per second benchmarks. Taken from here.

Even though this diagram might be biased and it looks like it compares Cassandra to databases set to provide strong consistency (otherwise I can’t see why MongoDB would drop performance when upgraded from 4 to 8 nodes), this should still show what a properly set up Cassandra cluster is capable of.

Regardless, in the distributed systems trade-off which enables horizontal scaling and incredibly high throughput, Cassandra does not provide some fundamental features of ACID databases — namely, transactions.

1.3 Consensus

Database transactions are tricky to implement in distributed systems as they require each node to agree on the right action to take (abort or commit). This is known as consensus and it is a fundamental problem in distributed systems.

Reaching the type of agreement needed for the “transaction commit” problem is straightforward if the participating processes and the network are completely reliable. However, real systems are subject to a number of possible faults, such as process crashes, network partitioning, and lost, distorted, or duplicated messages.

This poses an issue — it has been proven impossible to guarantee that a correct consensus is reached within a bounded time frame on a non-reliable network.

In practice, though, there are algorithms that reach consensus on a non-reliable network pretty quickly. Cassandra actually provides lightweight transactions through the use of the Paxos algorithm for distributed consensus.
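To give a feel for the "abort or commit" agreement, here is a sketch of two-phase commit. Note that this is a deliberately simpler protocol than Paxos and is not what Cassandra uses; it also assumes the coordinator and the network never fail, which is exactly the assumption real consensus algorithms cannot make. All names are hypothetical.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "pending"

    def prepare(self):
        return self.can_commit       # vote yes or no

    def finish(self, decision):
        self.state = decision

def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: broadcast one decision that everybody applies.
    decision = "commit" if all(votes) else "abort"
    for p in participants:
        p.finish(decision)
    return decision

ok = two_phase_commit([Participant("a"), Participant("b")])
failed = two_phase_commit([Participant("a"), Participant("b", can_commit=False)])
```

A single "no" vote aborts the transaction on every node, so all participants always end up in the same state.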

2. Distributed Computing

Distributed computing is the key to the influx of Big Data processing we’ve seen in recent years. It is the technique of splitting an enormous task (e.g. aggregating 100 billion records), which no single computer can practically execute on its own, into many smaller tasks, each of which can fit into a single commodity machine. You split your huge task into many smaller ones, have them execute on many machines in parallel, aggregate the data appropriately and you have solved your initial problem. This approach again enables you to scale horizontally — when you have a bigger task, simply include more nodes in the calculation.

Known Scale — Folding@Home had 160k active machines in 2012

An early innovator in this space was Google, which by necessity of their large amounts of data had to invent a new paradigm for distributed computation — MapReduce. They published a paper on it in 2004 and the open source community later created Apache Hadoop based on it.

2.1 MapReduce

MapReduce can be simply defined as two steps — mapping the data and reducing it to something meaningful.

Let’s get at it with an example again:

Say we are Medium and we stored our enormous information in a secondary distributed database for warehousing purposes. We want to fetch data representing the number of claps issued each day throughout April 2017 (a year ago).

This example is kept as short, clear and simple as possible, but imagine we are working with loads of data (e.g. analyzing billions of claps). We obviously won’t be storing all of this information on one machine and we won’t be analyzing all of it with one machine only. We also won’t be querying the production database but rather some “warehouse” database built specifically for low-priority offline jobs.

9- Mapping_And_Reducing.png

Each Map job is a separate node transforming as much data as it can. Each job traverses all of the data in the given storage node and maps it to a simple tuple of the date and the number one. Then, three intermediary steps (which nobody talks about) are done — Shuffle, Sort and Partition. They basically further arrange the data and delegate it to the appropriate reduce job. As we’re dealing with big data, we have each Reduce job separated to work on a single date only.
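The clap-counting job can be sketched on a single machine in plain Python. The records and function names are made up, and the whole thing runs in one process, whereas a real framework would run each map and reduce job on a different node.

```python
from collections import defaultdict

# Hypothetical clap records, one list per storage node.
node_1 = [{"date": "2017-04-01"}, {"date": "2017-04-02"}]
node_2 = [{"date": "2017-04-01"}, {"date": "2017-04-01"}]

def map_claps(records):
    # Map: turn every clap into a (date, 1) tuple.
    return [(record["date"], 1) for record in records]

def shuffle(mapped_outputs):
    # Shuffle: group all the ones that belong to the same date.
    grouped = defaultdict(list)
    for output in mapped_outputs:
        for date, one in output:
            grouped[date].append(one)
    return grouped

def reduce_claps(date, ones):
    # Reduce: each job handles a single date and sums its ones.
    return date, sum(ones)

mapped = [map_claps(node_1), map_claps(node_2)]   # one map job per node
grouped = shuffle(mapped)
claps_per_day = dict(reduce_claps(d, ones) for d, ones in sorted(grouped.items()))
print(claps_per_day)
```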

This is a good paradigm and surprisingly enables you to do a lot with it — you can chain multiple MapReduce jobs for example.

2.2 Better Techniques

MapReduce is somewhat legacy nowadays and brings some problems with it. Because it works in batches (jobs) a problem arises where if your job fails — you need to restart the whole thing. A 2-hour job failing can really slow down your whole data processing pipeline and you do not want that in the very least, especially in peak hours.

Another issue is the time you wait until you receive results. In real-time analytic systems (which all have big data and thus use distributed computing) it is important to have your latest crunched data be as fresh as possible and certainly not from a few hours ago.

As such, other architectures have emerged that address these issues. Namely Lambda Architecture (mix of batch processing and stream processing) and Kappa Architecture (only stream processing). These advances in the field have brought new tools enabling them — Kafka Streams, Apache Spark, Apache Storm, Apache Samza.

3. Distributed File Systems

Distributed file systems can be thought of as distributed data stores. They’re the same thing as a concept — storing and accessing a large amount of data across a cluster of machines all appearing as one. They typically go hand in hand with Distributed Computing.

Known Scale — Yahoo is known for running HDFS on over 42,000 nodes for storage of 600 Petabytes of data, way back in 2011

Wikipedia defines the difference being that distributed file systems allow files to be accessed using the same interfaces and semantics as local files, not through a custom API like the Cassandra Query Language (CQL).

3.1 HDFS

Hadoop Distributed File System (HDFS) is the distributed file system used for distributed computing via the Hadoop framework. Boasting widespread adoption, it is used to store and replicate large files (GB or TB in size) across many machines.

Its architecture consists mainly of NameNodes and DataNodes. NameNodes are responsible for keeping metadata about the cluster, like which node contains which file blocks. They act as coordinators for the network by figuring out where best to store and replicate files, tracking the system’s health. DataNodes simply store files and execute commands like replicating a file, writing a new one and others.
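The kind of metadata a NameNode keeps can be illustrated with a toy placement function. The 128 MB block size, the round-robin placement and every name here are assumptions of mine; real HDFS placement is rack-aware and considerably smarter.

```python
def place_blocks(file_size_mb, block_size_mb, datanodes, replication):
    """Return NameNode-style metadata: block index -> DataNodes holding it."""
    num_blocks = -(-file_size_mb // block_size_mb)   # ceiling division
    metadata = {}
    for block in range(num_blocks):
        # Naive round-robin placement, just to show the bookkeeping.
        metadata[block] = [datanodes[(block + r) % len(datanodes)]
                           for r in range(replication)]
    return metadata

metadata = place_blocks(300, 128, ["dn1", "dn2", "dn3", "dn4"], replication=3)
for block, holders in metadata.items():
    print(block, holders)
```

A 300 MB file becomes three blocks, each stored on three DataNodes, so losing any single DataNode still leaves two copies of every block.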

10-HDFS-NameNode-DataNode.png

Unsurprisingly, HDFS is best used with Hadoop for computation as it provides data awareness to the computation jobs. Said jobs then get run on the nodes storing the data. This leverages data locality, which optimizes computations and reduces the amount of traffic over the network.

3.2 IPFS

Interplanetary File System (IPFS) is an exciting new peer-to-peer protocol/network for a distributed file system. Built on content addressing and Merkle DAGs, ideas it shares with Git and blockchains, it boasts a completely decentralized architecture with no single owner nor point of failure.

IPFS offers a naming system (similar to DNS) called IPNS and lets users easily access information. It stores files via historic versioning, similar to how Git does. This allows for accessing all of a file’s previous states.

It is still undergoing heavy development (v0.4 as of time of writing) but has already seen projects interested in building over it (FileCoin).

4. Distributed Messaging

Messaging systems provide a central place for storage and propagation of messages/events inside your overall system. They allow you to decouple your application logic from directly talking with your other systems.

Known Scale — LinkedIn’s Kafka cluster processed 1 trillion messages a day with peaks of 4.5 million messages a second.

Simply put, a messaging platform works in the following way:

A MESSAGE.png

A message is broadcast from the application which creates it (called a producer), goes into the platform and is read by potentially multiple applications which are interested in it (called consumers).

If you need to save a certain event to a few places (e.g. user creation to a database, a warehouse, an email sending service and whatever else you can come up with), a messaging platform is the cleanest way to spread that message.

Consumers can either pull information out of the brokers (pull model) or have the brokers push information directly into the consumers (push model).
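A toy in-memory broker shows the pull model in action. The Broker class and the consumer names are hypothetical, and this is not how any of the platforms below are implemented; it only shows one producer feeding several independent consumers.

```python
class Broker:
    """Toy in-memory broker using a pull model."""

    def __init__(self):
        self.log = []        # append-only list of messages
        self.offsets = {}    # consumer name -> index of the next message to read

    def publish(self, message):
        self.log.append(message)

    def pull(self, consumer):
        start = self.offsets.get(consumer, 0)
        batch = self.log[start:]
        self.offsets[consumer] = len(self.log)
        return batch

broker = Broker()
broker.publish({"event": "user_created", "id": 1})
db_batch = broker.pull("database-writer")      # sees message 1
broker.publish({"event": "user_created", "id": 2})
email_batch = broker.pull("email-service")     # sees messages 1 and 2
db_batch_2 = broker.pull("database-writer")    # sees only message 2
```

The producer never learns who consumes its messages, which is the decoupling described at the start of this section.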

There are a couple of popular top-notch messaging platforms:

RabbitMQ — Message broker which allows you finer-grained control of message trajectories via routing rules and other easily configurable settings. Can be called a smart broker, as it has a lot of logic in it and tightly keeps track of messages that pass through it. Provides settings for both AP and CP from CAP. Uses a push model for notifying the consumers.

Kafka — Message broker (and all out platform) which is a bit lower level, as in it does not keep track of which messages have been read and does not allow for complex routing logic. This helps it achieve amazing performance. In my opinion, this is the biggest prospect in this space with active development from the open-source community and support from the Confluent team. Kafka arguably has the most widespread use from top tech companies. I wrote a thorough introduction to this, where I go into detail about all of its goodness.

Apache ActiveMQ — The oldest of the bunch, dating from 2004. Uses the JMS API, meaning it is geared towards Java EE applications. It got rewritten as ActiveMQ Artemis, which provides outstanding performance on par with Kafka.

Amazon SQS — A messaging service provided by AWS. Lets you quickly integrate it with existing applications and eliminates the need to handle your own infrastructure, which might be a big benefit, as systems like Kafka are notoriously tricky to set up. Amazon also offers two similar services — SNS and MQ, the latter of which is basically ActiveMQ but managed by Amazon.

5. Distributed Applications

If you roll up 5 Rails servers behind a single load balancer all connected to one database, could you call that a distributed application? Recall my definition from up above:

A distributed system is a group of computers working together as to appear as a single computer to the end-user. These machines have a shared state, operate concurrently and can fail independently without affecting the whole system’s uptime.

If you count the database as a shared state, you could argue that this can be classified as a distributed system — but you’d be wrong, as you’ve missed the “working together” part of the definition.

A system is distributed only if the nodes communicate with each other to coordinate their actions.

Therefore something like an application running its back-end code on a peer-to-peer network can better be classified as a distributed application. Regardless, this is all needless classification that serves no purpose but to illustrate how fussy we are about grouping things together.

Known Scale — BitTorrent swarm of 193,000 nodes for an episode of Game of Thrones, April, 2014

5.1 Erlang Virtual Machine

Erlang is a functional language that has great semantics for concurrency, distribution and fault-tolerance. The Erlang Virtual Machine itself handles the distribution of an Erlang application.

Its model works by having many isolated lightweight processes all with the ability to talk to each other via a built-in system of message passing. This is called the Actor Model and the Erlang OTP libraries can be thought of as a distributed actor framework (along the lines of Akka for the JVM).

The model is what helps it achieve great concurrency rather simply — the processes are spread across the available cores of the system running them. Since this is indistinguishable from a network setting (apart from the ability to drop messages), Erlang’s VM can connect to other Erlang VMs running in the same data center or even in another continent. This swarm of virtual machines runs one single application and handles machine failures via takeover (another node gets scheduled to run).

In fact, the distributed layer of the language was added in order to provide fault tolerance. Software running on a single machine is always at risk of having that single machine dying and taking your application offline. Software running on many nodes allows easier hardware failure handling, provided the application was built with that in mind.

5.2 BitTorrent

BitTorrent is one of the most widely used protocols for transferring large files across the web via torrents. The main idea is to facilitate file transfer between different peers in the network without having to go through a main server.

Using a BitTorrent client, you connect to multiple computers across the world to download a file. When you open a .torrent file, you connect to a so-called tracker, which is a machine that acts as a coordinator. It helps with peer discovery, showing you the nodes in the network which have the file you want.

11 - BitTorrent-A_Sample_Network.png

a sample network

You have the notions of two types of user, a leecher and a seeder. A leecher is the user who is downloading a file and a seeder is the user who is uploading said file.

The funny thing about peer-to-peer networks is that you, as an ordinary user, have the ability to join and contribute to the network.

BitTorrent and its precursors (Gnutella, Napster) allow you to voluntarily host files and upload to other users who want them. The reason BitTorrent is so popular is that it was the first of its kind to provide incentives for contributing to the network. Freeriding, where a user would only download files, was an issue with the previous file sharing protocols.

BitTorrent solved freeriding to an extent by making seeders upload more to those who provide the best download rates. It works by incentivizing you to upload while downloading a file. Unfortunately, after you’re done, nothing is making you stay active in the network. This causes a lack of seeders in the network who have the full file and, as the protocol relies heavily on such users, solutions like private trackers came to fruition. Private trackers require you to be a member of a community (often invite-only) in order to participate in the distributed network.

After advancements in the field, trackerless torrents were invented. This was an upgrade to the BitTorrent protocol that did not rely on centralized trackers for gathering metadata and finding peers but instead uses new algorithms. One such instance is Kademlia (Mainline DHT), a distributed hash table (DHT) which allows you to find peers through other peers. In effect, each user performs a tracker’s duties.
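Kademlia decides which peers are "close" to an ID by XOR-ing the two IDs and treating the result as a distance. The sketch below uses tiny 4-bit IDs of my own choosing so the arithmetic stays readable; real node IDs are far longer, and the actual lookup is an iterative process over the network rather than a local sort.

```python
def xor_distance(a: int, b: int) -> int:
    return a ^ b

def closest_peers(target_id, known_peers, k=2):
    # Keep the k peers whose IDs are nearest to the target under XOR distance.
    return sorted(known_peers, key=lambda peer: xor_distance(peer, target_id))[:k]

peers = [0b0001, 0b0100, 0b0111, 0b1100]
nearest = closest_peers(0b0110, peers)
print([format(peer, "04b") for peer in nearest])
```

Each peer only needs to know a handful of others: asking the nearest known peers for even nearer ones repeatedly narrows in on whoever holds the data.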

6. Distributed Ledgers

A distributed ledger can be thought of as an immutable, append-only database that is replicated, synchronized and shared across all nodes in the distributed network.

Known Scale — Ethereum Network had a peak of 1.3 million transactions a day on January 4th, 2018.

They leverage the Event Sourcing pattern, allowing you to rebuild the ledger’s state at any time in its history.
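Event Sourcing boils down to never storing the current state directly and instead replaying the full list of events. A minimal sketch, with made-up event shapes and account names:

```python
events = [
    {"type": "deposit", "account": "alice", "amount": 50},
    {"type": "transfer", "from": "alice", "to": "bob", "amount": 20},
    {"type": "transfer", "from": "bob", "to": "alice", "amount": 5},
]

def replay(events, upto=None):
    """Rebuild balances from the first `upto` events (all of them if None)."""
    balances = {}
    for event in events[:upto]:
        if event["type"] == "deposit":
            account = event["account"]
            balances[account] = balances.get(account, 0) + event["amount"]
        elif event["type"] == "transfer":
            sender, receiver = event["from"], event["to"]
            balances[sender] = balances.get(sender, 0) - event["amount"]
            balances[receiver] = balances.get(receiver, 0) + event["amount"]
    return balances

current = replay(events)
after_two = replay(events, upto=2)
```

Because the log is append-only, replaying a prefix of it gives you the ledger's state at that point in history.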

6.1 Blockchain

Blockchain is the current underlying technology used for distributed ledgers and in fact marked their start. This latest and greatest innovation in the distributed space enabled the creation of the first ever truly distributed payment protocol — Bitcoin.

Blockchain is a distributed ledger carrying an ordered list of all transactions that ever occurred in its network. Transactions are grouped and stored in blocks. The whole blockchain is essentially a linked-list of blocks (hence the name). Said blocks are computationally expensive to create and are tightly linked to each other through cryptography.

Simply said, each block contains a special hash (that starts with X amount of zeroes) of the current block’s contents (in the form of a Merkle Tree) plus the previous block’s hash. This hash requires a lot of CPU power to be produced because the only way to come up with it is through brute-force.

12-Simplified blockchain.png

Simplified blockchain

Miners are the nodes that try to compute the hash (via brute-force). The miners all compete with each other to come up with a random string (called a nonce) which, when combined with the contents, produces the aforementioned hash. Once somebody finds the correct nonce, he broadcasts it to the whole network. Said string is then verified by each node on its own and accepted into their chain.

This translates into a system where it is absurdly costly to modify the blockchain and absurdly easy to verify that it is not tampered with.

It is costly to change a block’s contents because that would produce a different hash. Remember that each subsequent block’s hash is dependent on it. If you were to change a transaction in the first block of the picture above — you would change the Merkle Root. This would in turn change the block’s hash (most likely without the needed leading zeroes) — that would change block #2’s hash and so on and so on. This means you’d need to brute-force a new nonce for every block after the one you just modified.
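Both halves of that claim, expensive to produce and cheap to verify, show up in a small sketch. It is heavily simplified: block contents are plain strings instead of a Merkle tree, the difficulty is only three leading hex zeroes so it runs in a moment, and the block layout is my own invention.

```python
import hashlib

DIFFICULTY = 3  # required leading hex zeroes; tiny so the example runs quickly

def block_hash(contents, prev_hash, nonce):
    payload = f"{contents}|{prev_hash}|{nonce}".encode()
    return hashlib.sha256(payload).hexdigest()

def mine(contents, prev_hash):
    nonce = 0
    while not block_hash(contents, prev_hash, nonce).startswith("0" * DIFFICULTY):
        nonce += 1   # brute force: there is no shortcut
    return {"contents": contents, "prev_hash": prev_hash, "nonce": nonce,
            "hash": block_hash(contents, prev_hash, nonce)}

def is_valid(chain):
    prev_hash = "0" * 64
    for block in chain:
        expected = block_hash(block["contents"], block["prev_hash"], block["nonce"])
        if (block["prev_hash"] != prev_hash or block["hash"] != expected
                or not expected.startswith("0" * DIFFICULTY)):
            return False
        prev_hash = block["hash"]
    return True

chain, prev = [], "0" * 64
for contents in ["alice->bob:1", "bob->carol:1", "carol->dave:1"]:
    block = mine(contents, prev)
    chain.append(block)
    prev = block["hash"]

valid_before = is_valid(chain)
chain[0]["contents"] = "alice->mallory:1"   # tamper with the first block
valid_after = is_valid(chain)
```

Verification is one hash per block, while repairing the tampered chain would mean re-mining the first block and every block after it.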

The network always trusts and replicates the longest valid chain. In order to cheat the system and eventually produce a longer chain you’d need more than 50% of the total CPU power used by all the nodes.

Blockchain can be thought of as a distributed mechanism for emergent consensus. Consensus is not achieved explicitly — there is no election or fixed moment when consensus occurs. Instead, consensus is an emergent product of the asynchronous interaction of thousands of independent nodes, all following protocol rules.

This unprecedented innovation has recently become a boom in the tech space with people predicting it will mark the creation of the Web 3.0. It is definitely the most exciting space in the software engineering world right now, filled with extremely challenging and interesting problems waiting to be solved.

6.2 Bitcoin

What previous distributed payment protocols lacked was a way to practically prevent the double-spending problem in real time, in a distributed manner. Research has produced interesting propositions^[1] but Bitcoin was the first to implement a practical solution with clear advantages over others.

The double-spending problem is that an actor (e.g. Bob) must not be able to spend a single resource in two places. If Bob has $1, he should not be able to give it to both Alice and Zack: it is only one asset and cannot be duplicated. It turns out it is really hard to truly achieve this guarantee in a distributed system. There are some interesting mitigation approaches predating blockchain, but they do not completely solve the problem in a practical way.

Bitcoin solves double-spending easily, because only one block is added to the chain at a time. Double-spending is impossible within a single block, so even if two blocks are created at the same time, only one of them will end up on the eventual longest chain.
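A rough sketch of why a single ordered chain rules out double-spending follows. It is a deliberate simplification: coins are tracked by an ID with one owner each and may move at most once per block, whereas Bitcoin tracks unspent transaction outputs. The `Ledger` class and the coin names are hypothetical.

```python
class Ledger:
    """Toy ledger: each coin ID has exactly one owner at any time."""

    def __init__(self, initial_coins: dict[str, str]):
        self.owner_of = dict(initial_coins)

    def apply_block(self, transactions: list[tuple[str, str, str]]) -> bool:
        """Apply (coin_id, sender, recipient) transfers; reject the whole block if any is invalid."""
        spent_in_block = set()
        for coin_id, sender, _recipient in transactions:
            if self.owner_of.get(coin_id) != sender:
                return False  # the sender does not own this coin
            if coin_id in spent_in_block:
                return False  # the same coin is spent twice in one block
            spent_in_block.add(coin_id)
        for coin_id, _sender, recipient in transactions:
            self.owner_of[coin_id] = recipient
        return True


ledger = Ledger({"coin-1": "bob"})

# Bob tries to pay Alice and Zack with the same coin inside one block.
print(ledger.apply_block([("coin-1", "bob", "alice"), ("coin-1", "bob", "zack")]))  # False

# Two competing blocks: whichever is applied first wins, the other is rejected.
print(ledger.apply_block([("coin-1", "bob", "alice")]))  # True
print(ledger.apply_block([("coin-1", "bob", "zack")]))   # False
```

Since every node applies the same blocks in the same order, all of them reach the same conclusion about who owns the coin.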

13-Bitcoin.jpeg

Bitcoin relies on the difficulty of accumulating CPU power.

While in a voting system an attacker need only add nodes to the network (which is easy, as free access to the network is a design target), in a CPU power based scheme an attacker faces a physical limitation: getting access to more and more powerful hardware.

This is also the reason a malicious group of nodes needs to control over 50% of the computational power of the network to carry out a successful attack. With less than that, the rest of the network will create a longer blockchain faster.
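The 50% threshold can be illustrated with the gambler's-ruin estimate given in the Bitcoin whitepaper: an attacker holding a fraction q of the hashing power, with honest nodes holding p = 1 - q, eventually catches up from z blocks behind with probability (q/p)^z when q < p, and with probability 1 otherwise. The function name below is made up, and the model ignores network delays and other practical details.

```python
def catch_up_probability(attacker_share: float, blocks_behind: int) -> float:
    """Probability that an attacker ever catches up from `blocks_behind` blocks behind."""
    if not 0.0 <= attacker_share <= 1.0:
        raise ValueError("attacker_share must be between 0 and 1")
    if blocks_behind < 0:
        raise ValueError("blocks_behind must not be negative")
    honest_share = 1.0 - attacker_share
    if attacker_share >= honest_share:
        return 1.0  # with half or more of the power, catching up is certain
    return (attacker_share / honest_share) ** blocks_behind


for share in (0.10, 0.30, 0.45, 0.51):
    print(share, catch_up_probability(share, blocks_behind=6))
```

Below half of the total power the probability shrinks exponentially with every additional block, which is why recipients wait for several confirmations.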

6.2.3 Ethereum

Ethereum can be thought of as a programmable blockchain-based software platform. It has its own cryptocurrency (Ether) which fuels the deployment of smart contracts on its blockchain.

A smart contract is a piece of code stored as a single transaction in the Ethereum blockchain. To run the code, all you have to do is issue a transaction with the smart contract as its destination. This in turn makes the miner nodes execute the code and apply whatever changes it makes. The code is executed inside the Ethereum Virtual Machine.

Solidity, Ethereum’s native programming language, is what’s used to write smart contracts. It is a Turing-complete programming language which directly interfaces with the Ethereum blockchain, allowing you to query state like balances or other smart contract results. To prevent infinite loops, running the code requires some amount of Ether.
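The idea of paying for execution to stop runaway code can be sketched with a toy interpreter. This is not the Ethereum Virtual Machine: the two-instruction language, the flat cost of one unit per instruction and the names `run` and `OutOfGas` are all assumptions made for illustration.

```python
class OutOfGas(Exception):
    pass


def run(program: list[tuple[str, int]], gas_limit: int) -> tuple[int, int]:
    """Execute a toy instruction list, charging one unit of gas per instruction.

    Returns the final counter value and the gas left over.
    """
    counter = 0   # the only piece of state in this toy machine
    pc = 0        # program counter
    gas = gas_limit
    while pc < len(program):
        if gas == 0:
            raise OutOfGas(f"ran out of gas at instruction {pc}")
        gas -= 1
        op, arg = program[pc]
        if op == "ADD":
            counter += arg
            pc += 1
        elif op == "JUMP":
            pc = arg
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return counter, gas


finite = [("ADD", 1), ("ADD", 2)]
infinite_loop = [("ADD", 1), ("JUMP", 0)]

print(run(finite, gas_limit=10))  # (3, 8)

try:
    run(infinite_loop, gas_limit=10)
except OutOfGas as error:
    print("halted:", error)
```

However the program is written, it can never run for more steps than the sender has paid for, so a Turing-complete language does not put the network at risk of executing forever.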

As the blockchain can be interpreted as a series of state changes, a lot of Distributed Applications (DApps) have been built on top of Ethereum and similar platforms.

6.2.4 Further usages of distributed ledgers

Proof of Existence: a service to anonymously and securely store proof that a certain digital document existed at some point in time. Useful for ensuring document integrity, ownership and timestamping.

Decentralized Autonomous Organizations (DAO): organizations which use blockchain as a means of reaching consensus on the organization’s improvement propositions. Examples are Dash’s governance system and the SmartCash project.

Decentralized Authentication: store your identity on the blockchain, enabling you to use single sign-on (SSO) everywhere. Examples are Sovrin and Civic.

And many, many more. The distributed ledger technology really did open up endless possibilities. Some are most probably being invented as we speak!

Summary

In the short span of this article, we managed to define what a distributed system is, explain why you’d use one and go over each category a little. Some important things to remember are:

Distributed Systems are complex
They are chosen by necessity of scale and price
They are harder to work with
CAP Theorem — Consistency/Availability trade-off
They have 6 categories — data stores, computing, file systems, messaging systems, ledgers, applications

To be frank, we have barely touched the surface on distributed systems. I did not have the chance to thoroughly tackle and explain core problems like consensus, replication strategies, event ordering & time, failure tolerance, broadcasting a message across the network and others.

Caution

Let me leave you with a parting forewarning:

You must stay away from distributed systems as much as you can. The complexity overhead they bring with them is not worth the effort if you can avoid the problem, either by solving it in a different way or by using some out-of-the-box solution.

Don't get addicted to the buzz that comes with solving hard problems.

If you're solving the wrong problems, your effort will be wasted.

If you miss a chance to turn a hard problem into an easy one, your effort will be wasted.

Find inspiration in progress, not problem solving.

Further Distributed Systems Reading

Designing Data-Intensive Applications, Martin Kleppmann: a great book that goes over everything in distributed systems and more.

Cloud Computing Specialization, University of Illinois, Coursera: a long series of courses (6) going over distributed system concepts and applications.

Jepsen: a blog explaining a lot of distributed technologies (ElasticSearch, Redis, MongoDB, etc.).

Thanks for taking the time to read through this long (~5600 words) article!

If, by any chance, you found this informative or thought it provided you with value, please make sure to give it as many claps as you believe it deserves and consider sharing it with a friend who could use an introduction to this wonderful field of study.

~Stanislav Kozlovski

Reference

A Thorough Introduction to Distributed Systems: https://hackernoon.com/a-thorough-introduction-to-distributed-systems-3b91562c9b3c
A primer on latency and bandwidth: https://www.oreilly.com/learning/primer-on-latency-and-bandwidth
Multi-master data conflicts - Part 1: understanding the problem (What is a conflict?): http://datacharmer.blogspot.com/2013/03/multi-master-data-conflicts-part-1.html
How Sharding Works: https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6
Why you don’t want to shard: https://www.percona.com/blog/2009/08/06/why-you-dont-want-to-shard/
Foursquare's 11-Hour Downtime: What Went Wrong: https://mashable.com/2010/10/05/foursquare-downtime-post-mortem/#WaLULGQKDkqX
You Can’t Sacrifice Partition Tolerance: https://codahale.com/you-cant-sacrifice-partition-tolerance/
An Illustrated Proof of the CAP Theorem: https://mwhittaker.github.io/blog/an_illustrated_proof_of_the_cap_theorem/
Eventual vs Strong Consistency in Distributed Databases: https://hackernoon.com/eventual-vs-strong-consistency-in-distributed-databases-282fdad37cf7
Apache Cassandra NoSQL Performance Benchmarks: https://academy.datastax.com/planet-cassandra/nosql-performance-benchmarks
Cassandra lightweight transactions: https://www.beyondthelines.net/databases/cassandra-lightweight-transactions/
MapReduce: Simplified Data Processing on Large Clusters: http://static.googleusercontent.com/media/research.google.com/en/us/archive/mapreduce-osdi04.pdf
From Lambda to Kappa: A Guide on Real-time Big Data Architectures: https://www.talend.com/blog/2017/08/28/lambda-kappa-real-time-big-data-architectures/
How We’re Improving and Advancing Kafka at LinkedIn: https://engineering.linkedin.com/apache-kafka/how-we_re-improving-and-advancing-kafka-linkedin
Thorough Introduction to Apache Kafka™--A deep dive into a system that serves as the heart of many companies’ architecture: https://hackernoon.com/thorough-introduction-to-apache-kafka-6fbf2989bbc1
Linked Lists: https://www.cs.cmu.edu/~adamchik/15-121/lectures/Linked%20Lists/linked%20lists.html
Why the Web 3.0 Matters and you should know about it: https://medium.com/@matteozago/why-the-web-3-0-matters-and-you-should-know-about-it-a5851d63c949
2018 — The Year of DApps: https://medium.com/the-mission/2018-the-year-of-dapps-dbe108860bcb
What Are DApps?: https://blog.bitnation.co/what-are-dapps/
Distributed systems theory for the distributed systems engineer: http://www.the-paper-trail.org/post/2014-08-09-distributed-systems-theory-for-the-distributed-systems-engineer/

Original text	Notes
Distributed systems come with a handful of trade-offs.	a handful of: a small number; a few. e.g. Only a handful of firms offer share option schemes to all their employees.
we can horizontally scale our read traffic up to some extent.	to some extent: partly; to a certain degree. e.g. To some extent this was the truth.
This sharding key should be chosen very carefully, as the load is not always equal based on arbitrary columns.
It is also worth noting that there are many strategies for sharding and this is a simple example to illustrate the concept.	noting vs. nothing
We have now made queries by keys other than the partitioned key incredibly inefficient	affect vs. effect vs. impact, e.g. political affect (political feeling), political effect, political impact
This means that most systems we will go over today can be thought of as distributed centralized systems, and that is what they’re made to be.
go over	(phrasal verb) If you go over a document, incident, or problem, you examine, discuss, or think about it very carefully. e.g. I won't know how successful it is until an accountant has gone over the books.
They provide incredible performance and scalability at the cost of consistency or availability.	at the cost of
you can make it provide strong consistency at the expense of availability as well, but that is not its common use case.	at the expense of
petabyte	10^15 bytes
Such databases settle with the weakest consistency model	settle with: to reach an agreement with; to settle accounts with. e.g. I have a debt to settle with him.
consistent hashing
Cassandra uses consistent hashing to determine which nodes out of your cluster must manage the data you are passing in.
Cassandra does not provide some fundamental features of ACID databases, namely transactions.	namely (adv.): that is to say; used to introduce more exact and detailed information about something that you have just mentioned.
However, real systems are subject to a number of possible faults	be subject to: to be governed by; to be liable to; to be exposed to. e.g. Employee appointment to the Council will be subject to a term of probation of 6 months.
This poses an issue	pose: to raise; to cause (a threat, a problem, etc.); to give rise to.
Blockchain is the current underlying technology used for distributed ledgers and in fact marked their start.	underlying technology: the foundational technology something is built on
This latest and greatest innovation in the distributed space enabled the creation of the first ever truly distributed payment protocol: Bitcoin.	enabled the creation of ...
This translates into a system where it is absurdly costly to modify the blockchain and absurdly easy to verify that it is not tampered with.
ledger	a book in which a bank, a business, etc. records the money it has paid and received. A ledger is a book in which a company or organization writes down the amounts of money it spends and receives.

Combating Double-Spending Using Cooperative P2P Systems, 25–27 June 2007: a proposed solution in which each ‘coin’ can expire and is assigned a witness (validator) to it being spent.
Bitgold, December 2005: a high-level overview of a protocol extremely similar to Bitcoin’s. It is said this is the precursor to Bitcoin.

Last edited: 2018.09.20 15:30:41

