On Algorithms: CSE 505 – Two-Phase Commit


Lab 4 Part 3: Two-phase commit
● Keys partitioned over different hosts; one coordinator per transaction
● Acquire locks on all data read/written; release after commit
● To commit, coordinator first sends prepare messages to all involved shard-holders; they respond prepare_ok or abort, acquiring locks.
○ If prepare_ok, they must be able to commit the transaction
○ Paxos is used to replicate the prepare log entry (including locks) as well as commit and abort messages
● If all prepare_ok, coordinator sends commit to all; they write a commit record and release locks
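As a concrete starting point, the protocol messages above might be declared like this in Java. This is only a minimal sketch; every class and field name here is hypothetical, not the lab's actual message types:

```java
import java.io.Serializable;
import java.util.Map;
import java.util.Set;

// Hypothetical 2PC wire messages (illustrative names, not the lab's real classes).
record Prepare(long txnId, int attempt, int configNum,
               Set<String> keys) implements Serializable {}

record PrepareOk(long txnId, int attempt, int configNum,
                 Map<String, String> readValues) implements Serializable {}

record Abort(long txnId, int attempt, int configNum) implements Serializable {}

record Commit(long txnId, int attempt, int configNum) implements Serializable {}

record CommitOk(long txnId, int attempt, int configNum,
                Map<String, String> results) implements Serializable {}
```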
Lab 4 Part 3: Hints
All ShardStoreServer nodes tag their transaction-handling messages with their
configuration number.
Servers reject any prepare requests coming from different configurations,
causing the transaction to abort.
Servers delay reconfigurations when there are outstanding locks for keys (i.e.,
there are transactions pending in the previous configuration).
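A minimal sketch of those two rules, assuming each server tracks its configuration number and a lock table (field and method names are illustrative, not the lab's API):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the reconfiguration guards described above (hypothetical names).
class ConfigGuards {
    private int configNum;                                        // current configuration
    private final Map<String, Long> lockTable = new HashMap<>();  // key -> owning txnId

    // Reject prepares tagged with any other configuration; the transaction aborts.
    boolean acceptPrepare(int senderConfigNum) {
        return senderConfigNum == configNum;
    }

    // Delay installing a new configuration while any keys are still locked.
    boolean canReconfigure() {
        return lockTable.isEmpty();
    }
}
```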
Goal of Lab 4 Part 3
● Support transactions across multiple keys potentially located across different replica groups
○ Transactions can be:
■ Series of reads
■ Series of writes
■ Swaps – new one :)
● Use Two-Phase Commit to acquire locks during the transaction.
● Why can't we just use Paxos? Because each group is responsible for a different set of shards, and we want to support transactions across shards/replica groups.
Two-Phase Commit

  1. Allows all participants to arrive at the same conclusion.
  2. Pick a coordinator (i.e., the leader of the transaction – the group with the highest group ID).
  3. Coordinator sends out prepares to all replica groups responsible for a given transaction.
  4. Participants check if any key of the transaction is locked:
     a. If so, reply with an abort.
     b. Otherwise, reply PrepareOK and lock the keys to prevent reads and writes on them.
  5. If the coordinator received PrepareOKs from all groups, send out commits.
  6. Commits actually perform the reads/writes, and participants reply back to the coordinator.
  7. Coordinator replies back to the client once all CommitOKs are received and results are combined.
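On the coordinator side, steps 5–7 boil down to tallying votes from every participant group. A rough Java sketch, with all names being my own rather than the lab's:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Coordinator-side vote tallying for one transaction attempt (illustrative names).
class CoordinatorTally {
    private final Set<Integer> participantGroups;   // replica groups touched by the txn
    private final Set<Integer> prepareOks = new HashSet<>();
    private final Map<String, String> results = new HashMap<>();
    private boolean aborted = false;

    CoordinatorTally(Set<Integer> participantGroups) {
        this.participantGroups = participantGroups;
    }

    // Step 5: commit only once *every* participant group has voted PrepareOK.
    boolean onPrepareOk(int groupId, Map<String, String> partialResults) {
        prepareOks.add(groupId);
        results.putAll(partialResults);             // combine results (step 7)
        return !aborted && prepareOks.containsAll(participantGroups);
    }

    // Step 4a: a single abort vote dooms the whole transaction.
    void onAbort() {
        aborted = true;
    }
}
```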
Transactions
● Transaction interface (see the Java sketch after this list)
○ Set readSet();
○ Set writeSet();
○ Set keySet(); // union of readSet and writeSet
● Implementations of Transaction
○ MultiGet
■ Set keys
■ Read one or more keys across replica groups
○ MultiPut
■ Map values
■ Write one or more keys across replica groups
○ Swap
■ String key1, key2
■ Given two keys, swap their values
● Your goal: implement support for these 3 types of Transactions
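Filling in the element types, the interface and its three implementations might look like this in Java (the generic parameters and record layouts are my guesses, not necessarily the lab's exact signatures):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

interface Transaction {
    Set<String> readSet();
    Set<String> writeSet();

    // Union of readSet and writeSet.
    default Set<String> keySet() {
        Set<String> all = new HashSet<>(readSet());
        all.addAll(writeSet());
        return all;
    }
}

// Read one or more keys across replica groups.
record MultiGet(Set<String> keys) implements Transaction {
    public Set<String> readSet()  { return keys; }
    public Set<String> writeSet() { return Set.of(); }
}

// Write one or more keys across replica groups.
record MultiPut(Map<String, String> values) implements Transaction {
    public Set<String> readSet()  { return Set.of(); }
    public Set<String> writeSet() { return values.keySet(); }
}

// Given two keys, swap their values: both keys are read and written.
record Swap(String key1, String key2) implements Transaction {
    public Set<String> readSet()  { return Set.of(key1, key2); }
    public Set<String> writeSet() { return Set.of(key1, key2); }
}
```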
Transactions: The Normal / Happy Path from the Coordinator's perspective
● Suggestion: the transaction coordinator is the replica group with the largest group ID in the transaction
● Client sends the Transaction to the coordinator
● Coordinator replicates the Transaction through its Paxos subnodes
○ In general, incoming messages to replica groups should be replicated (just like in Part 2).
○ Use the same design of process(Command c, boolean replicated) (sketched after this list)
● The coordinator will run two-phase commit to execute the transaction
○ Prepare
■ Coordinator sends out prepares to all replica groups involved in the transaction
■ Once the coordinator receives PrepareOKs from everyone (not a majority anymore, this ain't Paxos), it can proceed to the commit stage
○ Commit
■ Coordinator sends out commit messages to all replica groups involved in the transaction
■ Participants can unlock keys after receiving the commit
■ Coordinator responds to the client after all CommitOKs are received from participants
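The process(Command c, boolean replicated) design mentioned above might be sketched as follows; the Paxos subnode interface here is a stand-in, not the lab's real API:

```java
// Sketch of the replicate-then-execute pattern from Part 2 (hypothetical API).
class ShardStoreServerSketch {
    interface PaxosSubnode {
        void propose(Object command);   // hands the command to this group's Paxos log
    }

    private final PaxosSubnode paxos;

    ShardStoreServerSketch(PaxosSubnode paxos) {
        this.paxos = paxos;
    }

    void process(Object command, boolean replicated) {
        if (!replicated) {
            // First sight of the command: replicate it through Paxos before acting.
            paxos.propose(command);
            return;
        }
        // The command has come back out of the Paxos log: now it is safe to execute,
        // e.g. start 2PC as coordinator or handle a Prepare/Commit as participant.
        execute(command);
    }

    private void execute(Object command) {
        // Application-specific handling goes here.
    }
}
```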
Transactions: The Normal / Happy Path from the Participant's perspective
● Upon receiving a prepare:
○ Validation: check that config nums match
○ Participants check if the keys are locked
■ If locked, send an abort
● Coordinator should send aborts and wait for the client to retry the entire transaction
■ If not locked, lock the keys and send PrepareOK
● Once the coordinator receives a PrepareOK from everyone (NOT just a majority anymore, this ain't Paxos), it will send commit messages to the participants
● Upon receiving a commit:
○ Replicate it in Paxos
○ Verify that the commit is for the correct transaction, the config num matches, and the attempt num matches
○ If it is a write commit:
■ Perform the write and update application state
○ Update the client address -> highest sequence number seen mapping
○ Unlock the keys
○ Respond with CommitOK
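Put together, a participant's prepare and commit handling might look roughly like this (lock-table layout and method names are illustrative, and this logic would run only after the message is replicated through Paxos):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Participant-side 2PC handling (illustrative names).
class ParticipantSketch {
    private int configNum;                                        // current configuration
    private final Map<String, Long> lockTable = new HashMap<>();  // key -> owning txnId

    // Returns true to send PrepareOK, false to send an abort.
    boolean onPrepare(long txnId, int senderConfigNum, Set<String> keys) {
        if (senderConfigNum != configNum) {
            return false;                         // config mismatch: abort
        }
        for (String k : keys) {
            Long owner = lockTable.get(k);
            if (owner != null && owner != txnId) {
                return false;                     // locked by another txn: abort
            }
        }
        for (String k : keys) {
            lockTable.put(k, txnId);              // lock every key, then PrepareOK
        }
        return true;
    }

    // On commit: after applying any writes, release this transaction's locks.
    void onCommit(long txnId) {
        lockTable.values().removeIf(owner -> owner == txnId);
    }
}
```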
Example walkthrough: MultiGet(keys={b,d})
(The original slides show the same diagram across six steps: R1, R2, and R3 are replica groups, each a set of ShardStoreServers with a Paxos subnode; R1 holds keys={a}, R2 holds keys={d,e}, and R3 holds keys={b,c}.)
● Client sends the Transaction to the coordinator. The coordinator is the max({replica groups involved in this transaction}). In this case, R3 (for key=b) and R2 (for key=d) are involved, and max(3,2) = 3, so replica group 3 is the coordinator.
● Coordinator sends Prepare(command=MultiGet(keys={b,d})) messages to all participants in the transaction. Participants should lock the keys involved in the transaction.
● Each replica group responds with PrepareOK (including the keys and values).
● Coordinator sends Commit messages to the participants.
● Participants reply with CommitOK({b: …}) and CommitOK({d: …}).
● Coordinator sends Reply({b: …, d: …}) back to the client.
Tips
● Groups can be in two roles: participant and coordinator. You may need to handle sending messages to your own group (i.e., if a given group is both a participant and the coordinator). This is similar to how in Paxos nodes can be both acceptors and proposers.
● Keep track of active transactions at each node (each transaction could represent a set of keys that are currently locked)
● What happens if the coordinator receives an abort?
○ It may be the case that the coordinator's config is out of sync
■ Participants may reject a prepare from the coordinator.
○ If the coordinator learns of a new config as part of receiving a prepare response, the coordinator should finish any outstanding transactions, send out aborts for other transactions if needed, and process the config change. The client will then retry and start a new transaction.
○ This means each transaction should have an attempt / retry # associated with it. If a transaction needs to be retried, the coordinator needs to reach out to all participants again as part of a new attempt #. The attempt # is needed so participants can deduplicate and distinguish new transaction attempts from a given coordinator (one way to track this is sketched below).
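One illustrative way to track attempts on the participant side, so stale messages from older attempts can be told apart (purely a sketch, not the lab's required design):

```java
import java.util.HashMap;
import java.util.Map;

// Participants remember the highest attempt # seen per transaction (illustrative).
class AttemptTracker {
    private final Map<Long, Integer> latestAttempt = new HashMap<>(); // txnId -> attempt

    // Messages from an attempt older than the latest seen can safely be ignored.
    boolean isStale(long txnId, int attempt) {
        Integer seen = latestAttempt.get(txnId);
        return seen != null && attempt < seen;
    }

    void record(long txnId, int attempt) {
        latestAttempt.merge(txnId, attempt, Math::max);
    }
}
```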
Aborts
● Never accept aborts from anyone other than the transaction leader
● Cases when aborts can happen:
○ Coordinator sends a prepare for a transaction but a key is locked on the receiver
■ Replicate the abort on the Paxos subnodes, then send an Abort back to the coordinator
■ Coordinator should then try again with a new round of prepares (pay attention to livelock)
○ Coordinator on a newer attempt/config, participant on an older attempt/config
■ Participant should keep sending prepare responses until it gets either a commit or an abort from the coordinator
○ Participant has a higher config number than the coordinator's config number
Use Attempt # to distinguish prepare responses/aborts across different attempts
● Client sends a transaction
● Coordinator sends prepare messages to the participants (P1, P2)
● P1 replies yes, P2 replies no
● Coordinator gets P2's no, sends aborts, and tells the client to retry
● Client retries the same transaction
● Coordinator sends prepare messages to P1 and P2
● P1 replies no, P2 replies yes
● However, P1's "yes" from the previous round arrives only now (maybe delayed, maybe duplicated), together with P2's "yes" from this round, while P1's "no" from this round arrives late. The coordinator could falsely proceed to commit!
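The fix is for the coordinator to count only votes tagged with the attempt currently in progress. A minimal sketch of that guard (all names are mine, not the lab's):

```java
import java.util.HashSet;
import java.util.Set;

// Coordinator-side guard against the delayed-vote race above (illustrative names).
class AttemptGuard {
    private int currentAttempt = 1;
    private final Set<Integer> oksThisAttempt = new HashSet<>();

    // Count a PrepareOK only if it belongs to the attempt in progress.
    void onPrepareOk(int attempt, int groupId) {
        if (attempt != currentAttempt) {
            return;                    // delayed/duplicated vote from an old attempt
        }
        oksThisAttempt.add(groupId);
    }

    // A retry starts a new attempt; votes from earlier attempts can no longer count.
    void startNewAttempt() {
        currentAttempt++;
        oksThisAttempt.clear();
    }
}
```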
Potential Workflow
Tackling the entire two-phase commit process at once can be difficult, so one suggested order to get things done is:
  1. SingleKeyCommands and Transactions that are all on one shard (no 2PC)
  2. Transactions that span shards within one group (no 2PC)
  3. MultiGet and MultiPut through 2PC
  4. Swap through 2PC (slightly more complicated)