乐趣区

聊聊apache-gossip的FailureDetector

本文主要研究一下 apache gossip 的 FailureDetector

FailureDetector

incubator-retired-gossip/gossip-base/src/main/java/org/apache/gossip/accrual/FailureDetector.java

public class FailureDetector {public static final Logger LOGGER = Logger.getLogger(FailureDetector.class);
  private final DescriptiveStatistics descriptiveStatistics;
  private final long minimumSamples;
  private volatile long latestHeartbeatMs = -1;
  private final String distribution;

  public FailureDetector(long minimumSamples, int windowSize, String distribution) {descriptiveStatistics = new DescriptiveStatistics(windowSize);
    this.minimumSamples = minimumSamples;
    this.distribution = distribution;
  }

  /**
   * Updates the statistics based on the delta between the last
   * heartbeat and supplied time
   *
   * @param now the time of the heartbeat in milliseconds
   */
  public synchronized void recordHeartbeat(long now) {if (now <= latestHeartbeatMs) {return;}
    if (latestHeartbeatMs != -1) {descriptiveStatistics.addValue(now - latestHeartbeatMs);
    }
    latestHeartbeatMs = now;
  }

  public synchronized Double computePhiMeasure(long now) {if (latestHeartbeatMs == -1 || descriptiveStatistics.getN() < minimumSamples) {return null;}
    long delta = now - latestHeartbeatMs;
    try {
      double probability;
      if (distribution.equals("normal")) {double standardDeviation = descriptiveStatistics.getStandardDeviation();
        standardDeviation = standardDeviation < 0.1 ? 0.1 : standardDeviation;
        probability = new NormalDistributionImpl(descriptiveStatistics.getMean(), standardDeviation).cumulativeProbability(delta);
      } else {probability = new ExponentialDistributionImpl(descriptiveStatistics.getMean()).cumulativeProbability(delta);
      }
      final double eps = 1e-12;
      if (1 - probability < eps) {probability = 1.0;}
      return -1.0d * Math.log10(1.0d - probability);
    } catch (MathException | IllegalArgumentException e) {LOGGER.debug(e);
      return null;
    }
  }
}
  • FailureDetector 的构造器接收三个参数,分别是 minimumSamples, windowSize, distribution
  • 其中 minimumSamples 表示最少需要多少统计值的时候才真正计算 phi 值,windowSize 表示统计窗口的大小,distribution 表示使用哪种分布,normal 表示 NormalDistribution,其他表示 ExponentialDistribution
  • FailureDetector 使用了 apache commons math 的 DescriptiveStatistics 来作为 Heartbeat Interval 的时间窗口统计;使用了 NormalDistribution、ExponentialDistribution 来完成正态分布、指数分布的累积分布 probability,最后使用公式 -1.0d * Math.log10(1.0d – probability) 来计算 phi 值

小结

  • The Phi Accrual Failure Detector by Hayashibara et al 论文提出了基于 phi 值的 Accrual Failure Detector 方法
  • 业界关于 Failure Detector 的实现大致有两种,一种是以 akka 为代表的按照论文基于 NormalDistribution 来计算;一种是以 cassandra 为代表的基于 ExponentialDistribution 来计算
  • apache gossip 的 FailureDetector 则集大成地同时支持了 NormalDistribution 及 ExponentialDistribution 两种实现方式

doc

  • The Phi Accrual Failure Detector by Hayashibara et al
  • PhiAccrualFailureDetector.scala
  • 聊聊 hazelcast 的 PhiAccrualFailureDetector
  • 聊聊 Cassandra 的 FailureDetector
  • inconsistent implementation of ‘cumulative distribution function’ for Exponential Distribution
退出移动版