关于分布式系统:分布式系统设计简卷0MapReduce

本篇是对于 2022-MIT 6.828 的对于 MapReduce 的课程笔记及其试验记录；
课堂笔记局部根本摘自 Lecture，齐全能够跳过；
如果发现内容上的纰漏，请不要悭吝您的键盘。

You should try everything else, before you try to build a distributed system, because they’re not simpler.

Lab’s Goal?

Lab 1: MapReduce
- implement your own version of the paper
Lab 2: replication for fault-tolerance using Raft
Lab 3: fault-tolerant key/value store
- use your Raft implementation to build a replicated K/V server
Lab 4: sharded key/value store
- clone the K/V server into a number of independent groups and split the data in your K/V storage system to get parallel speed-up and moving chunks of data between servers as they come and go without dropping any balls

Why do people build distributed systems?

to increase capacity via parallelism
- to get high-performance, lots of CPUs, memories, disk arms moving in parallel…
to tolerate faults via replication
- have two computers do the exact same things and if one of them fails you can cut over to the other one
to place computing physically close to external entities
- 美国西海岸和东海岸的两个服务器之间的通信问题
to achieve security via isolation

Challenges:

many concurrent parts, complex interactions
must cope with partial failure
tricky to realize performance potential

Big Goal?

This is a course about infrastructure for applications.

Storage system.
Communication sytem.
Computation system.

咱们要尽可能地做 abstractions，使这些局部像是一个 non-distributed sytem 的局部。比方将 Storage system 形象成相熟的 file sytem，但背地实际上还是很简单的 parrallel, fault-tolerated distributed system。因为这很难做到，所以这是咱们的 big goal。

Topic:

implementation.
- here are three tools you’ll need to build such a distributed system.
- RPC(remote procedure call), whose goal is to mask the fact that we’re communicating over an unreliable network
- threads
- concurrenty control
performance.
- the high-level goal of building a distributed sytem is to get what people call scalable speed-up.
- scalability: 2x computers -> 2x throughput
fault tolerance
- 1000s of servers, big network -> always something broken
- We’d like to hide these failures from the application.(abstraction)
- Availability — app can make progress despite failures
  - 有挂掉的也能持续失常工作
- Recoverability — app will come back to life when failures are repaired
  - 修护后也能持续失常工作
  - non-volatile storage（但重大影响性能，通常不会抉择）
  - Replication
consistency
- 假如以后 distributed system 提供的是 K/V Service
- 操作只有两种：Put(k, v) and Get(k) return v
- 若在 non-distributed sytem 中，这两种操作都只有一种语义；
- 但在 distributed system，因为通常会有 replication 或 caching 的存在，数据(kv pair) 会有多个拷贝。当你在更新这些拷贝的途中呈现 failure 了，如果保证数据的 consistency 将会是件很重要的事。
- Strong consistency 能够使你拜访到 the most recently updated data，但通信的开销太大；
- Weak consistency 尽力而为，但通常能 acceptable，取决于你的 trade-off。

Lab 1 的指标是把 MapReduce 这篇 2004 年 Google 的经典论文的内容简略复现一遍（玩具），并通过测试。

MapReduce 设计了一个框架，这个框架可能屏蔽分布式系统外部的简单细节，用户只须要编写 map function 和 reduce function 就能够享受分布式并行计算带来的高性能。实现着重浏览论文第 2 节 Programming Model 和第 3 节 Implementation 的局部。这两个局部间接关系到试验的实现。

asynchronous programming == event-driven programming（异步编程 == 事件驱动编程）？

single thread of control, sits an event-driven loop, waits for inputs, whenever it gets an input like a packet, it figures out which client did this packet come from, and it’ll had a table of sort of what the state is of whatever activity it’s managing for that client. it’ll find I was in the middle of reading such-and-such a file, and now it’s asked me to read the next block I’ll go and be the next block I’ll return it.

When to use sharing and locks, versus channels?

Most problems can be solved in either style
- state — sharing and locks
- communication — channels
For the 6.824 labs, I recommend sharing+locks for state, and sync.Cond or channels or time.Sleep() for waiting/notification.

配置好虚拟机，装置最新版本的 Golang，装置一个 Goland，用 Git 把试验代码仓库克隆下来，试验环境就没了。这相比于之前 6.S081 去装置 qemu 简略太多。试过用 VScode 写 Golang，但发现装置了语法检查和主动补全的组件后间接代码编辑区间接爆红，临时找不到解决方案就间接上手 Goland 了。

先前没有接触过 Go，所以遵循试验指导书的提醒把 Go 官网的 Tutorial 过了一遍，成果真的很不错。刷完后至多能把试验的源码看得懂了。

对于试验的实现，其实仅依据论文的形容，尤其是论文里的那张彩图，对照着做就好了。亲测仅依赖论文和试验指导书的内容就能够独立实现试验 1，本人的实现也就 500 行左右的代码，况且难度还只是 moderate，好日子还在后头呢。

据说这门课间接贴具体代码是违规的，所以这里就只贴三个文件的 abstract。

src/mr/coordinator.go：

src/mr/worker.c：

src/mr/rpc.c：

判断哪些是共享变量
- 有些线程平安问题 -race option 都有可能检测不进去，因而可能会呈现数据不统一的问题。比方我原来在 createWorkerId() 时，“workerId 的调配”和“将 worker 构造体增加到 workers 切片内”的操作不是绑定的的，因而前期可能会呈现数据的不统一问题。
慎用 append
- 将一个元素 append 一个切片后，当长度超过容量时，Go 会将切片中的所有元素拷贝到另一个更大的闲暇区域。在这之前若有线程持有在切片某个元素的指针时，这个指针会指向原来的切片的地位，造成数据的不一致性。我的解决办法比拟粗犷，初始化切片时间接给它调配 1000 的长度，再也不必放心切片扩容了。

集体了解的一个分布式系统开发周期，要分三个阶段，各阶段不是严格串行执行的，每个阶段内可做数次迭代：

架构：做正确的事，系统分析与设计（预计破费 60% 的工夫）

浏览相干资料、剖析，了解；
画状态时序图，设计数据结构和算法。

稳固：正确地做事，测试驱动开发（预计破费 40% 的工夫）

先保障一个用例运行起来程序不报错；
再保障一个用例可能能够输入正确的后果；
最初保障所有用例都能输入正确的后果，而且无论何时都能输入正确的后果；
迭代式地增加性能，按外围重要水平顺次增加，每增加一个残缺的性能就要做一次回归测试；
当资源短缺并可行时，为每个功能模块做一次单元测试。
进行必要的重构，进步可维护性，实现新性能。

性能：精益求精，晋升客户体验（规格外，无奈预估）

升高资源应用门槛，压迫硬件性能，直到达到架构设限的天花板

最初说一下 Coding 的一些习惯：

先不思考代码格调和品质等这些高级话题，就说最常见的 Debug。写好程序后跑测试挂掉了，而后翻日志一直进行比对，而后通过不晓得多长时间后终于发现了数据不统一或不正确的起因。且不说这个过程有如许地浪费时间且低效，它还会对你的眼睛和颈椎造成挫伤。

所以于其“浪费时间”Debug，要保障一开始写的时候就要 思考各种各样的 corner case，多思考思考这么写会有什么副作用，就算短时间实现不了也要留个正文备注一下。就算真呈现 Bug 了也要尽快缩短排错工夫，Error 信息要标准，要齐备。我晓得这听起来像是废话，但我的意思 Coding 的时候要集中注意力，带耳机听歌什么的还是不要了，不然你的那一点“马虎”所付出的代价可能是极其苦楚的，特地是在这种分布式系统场景下。对本人的代码负责的同时，更要对本人的工夫负责。

一句话概括：无意识地缩短 Debug 的工夫，不论是从源头上还是在过程上。

关于分布式系统:分布式系统设计简卷0MapReduce

前言

注释

课堂笔记

试验局部

试验指标

试验技巧

试验过程

试验踩坑

后记