本文主要研究一下apache gossip的FailureDetector

FailureDetector

incubator-retired-gossip/gossip-base/src/main/java/org/apache/gossip/accrual/FailureDetector.java

public class FailureDetector {  public static final Logger LOGGER = Logger.getLogger(FailureDetector.class);  private final DescriptiveStatistics descriptiveStatistics;  private final long minimumSamples;  private volatile long latestHeartbeatMs = -1;  private final String distribution;  public FailureDetector(long minimumSamples, int windowSize, String distribution) {    descriptiveStatistics = new DescriptiveStatistics(windowSize);    this.minimumSamples = minimumSamples;    this.distribution = distribution;  }  /**   * Updates the statistics based on the delta between the last   * heartbeat and supplied time   *   * @param now the time of the heartbeat in milliseconds   */  public synchronized void recordHeartbeat(long now) {    if (now <= latestHeartbeatMs) {      return;    }    if (latestHeartbeatMs != -1) {      descriptiveStatistics.addValue(now - latestHeartbeatMs);    }    latestHeartbeatMs = now;  }  public synchronized Double computePhiMeasure(long now) {    if (latestHeartbeatMs == -1 || descriptiveStatistics.getN() < minimumSamples) {      return null;    }    long delta = now - latestHeartbeatMs;    try {      double probability;      if (distribution.equals("normal")) {        double standardDeviation = descriptiveStatistics.getStandardDeviation();        standardDeviation = standardDeviation < 0.1 ? 0.1 : standardDeviation;        probability = new NormalDistributionImpl(descriptiveStatistics.getMean(), standardDeviation).cumulativeProbability(delta);      } else {        probability = new ExponentialDistributionImpl(descriptiveStatistics.getMean()).cumulativeProbability(delta);      }      final double eps = 1e-12;      if (1 - probability < eps) {        probability = 1.0;      }      return -1.0d * Math.log10(1.0d - probability);    } catch (MathException | IllegalArgumentException e) {      LOGGER.debug(e);      return null;    }  }}
  • FailureDetector的构造器接收三个参数,分别是minimumSamples, windowSize, distribution
  • 其中minimumSamples表示最少需要多少统计值的时候才真正计算phi值,windowSize表示统计窗口的大小,distribution表示使用哪种分布,normal表示NormalDistribution,其他表示ExponentialDistribution
  • FailureDetector使用了apache commons math的DescriptiveStatistics来作为Heartbeat Interval的时间窗口统计;使用了NormalDistribution、ExponentialDistribution来完成正态分布、指数分布的累积分布probability,最后使用公式-1.0d * Math.log10(1.0d - probability)来计算phi值

小结

  • The Phi Accrual Failure Detector by Hayashibara et al论文提出了基于phi值的Accrual Failure Detector方法
  • 业界关于Failure Detector的实现大致有两种,一种是以akka为代表的按照论文基于NormalDistribution来计算;一种是以cassandra为代表的基于ExponentialDistribution来计算
  • apache gossip的FailureDetector则集大成地同时支持了NormalDistribution及ExponentialDistribution两种实现方式

doc

  • The Phi Accrual Failure Detector by Hayashibara et al
  • PhiAccrualFailureDetector.scala
  • 聊聊hazelcast的PhiAccrualFailureDetector
  • 聊聊Cassandra的FailureDetector
  • inconsistent implementation of 'cumulative distribution function' for Exponential Distribution