
A Look at Flink's AscendingTimestampExtractor


This article takes a look at Flink's AscendingTimestampExtractor.
AscendingTimestampExtractor
flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/functions/timestamps/AscendingTimestampExtractor.java
/**
* A timestamp assigner and watermark generator for streams where timestamps are monotonously
* ascending. In this case, the local watermarks for the streams are easy to generate, because
* they strictly follow the timestamps.
*
* @param <T> The type of the elements that this function can extract timestamps from
*/
@PublicEvolving
public abstract class AscendingTimestampExtractor<T> implements AssignerWithPeriodicWatermarks<T> {

private static final long serialVersionUID = 1L;

/** The current timestamp. */
private long currentTimestamp = Long.MIN_VALUE;

/** Handler that is called when timestamp monotony is violated. */
private MonotonyViolationHandler violationHandler = new LoggingHandler();

/**
* Extracts the timestamp from the given element. The timestamp must be monotonically increasing.
*
* @param element The element that the timestamp is extracted from.
* @return The new timestamp.
*/
public abstract long extractAscendingTimestamp(T element);

/**
* Sets the handler for violations to the ascending timestamp order.
*
* @param handler The violation handler to use.
* @return This extractor.
*/
public AscendingTimestampExtractor<T> withViolationHandler(MonotonyViolationHandler handler) {
this.violationHandler = requireNonNull(handler);
return this;
}

// ------------------------------------------------------------------------

@Override
public final long extractTimestamp(T element, long elementPrevTimestamp) {
final long newTimestamp = extractAscendingTimestamp(element);
if (newTimestamp >= this.currentTimestamp) {
this.currentTimestamp = newTimestamp;
return newTimestamp;
} else {
violationHandler.handleViolation(newTimestamp, this.currentTimestamp);
return newTimestamp;
}
}

@Override
public final Watermark getCurrentWatermark() {
return new Watermark(currentTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : currentTimestamp - 1);
}

//……
}

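The abstract class AscendingTimestampExtractor implements the extractTimestamp and getCurrentWatermark methods of the AssignerWithPeriodicWatermarks interface, and declares the abstract method extractAscendingTimestamp for subclasses to implement.
AscendingTimestampExtractor is meant for streams whose element timestamps are monotonically ascending (timestamp monotony) within each parallel task. extractTimestamp first calls the subclass's extractAscendingTimestamp to obtain newTimestamp from the element. If newTimestamp is greater than or equal to currentTimestamp, it updates currentTimestamp; otherwise timestamp monotony has been violated and it calls the MonotonyViolationHandler. In both cases it returns newTimestamp.
getCurrentWatermark returns Watermark(currentTimestamp - 1) when currentTimestamp is not Long.MIN_VALUE, and Watermark(Long.MIN_VALUE) otherwise.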
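
To make the bookkeeping concrete, here is a minimal standalone sketch of the same logic. It is not Flink's class and does not depend on Flink: the name AscendingWatermarkSketch and the violations counter are assumptions made for illustration, and the violation branch only counts where the real class would delegate to its handler.

```java
// Standalone model of the bookkeeping described above; it does not depend on Flink.
// The class name AscendingWatermarkSketch and the violations counter are hypothetical.
final class AscendingWatermarkSketch {

    // Highest timestamp seen so far; Long.MIN_VALUE means "no element yet".
    private long currentTimestamp = Long.MIN_VALUE;

    // Number of elements whose timestamp was smaller than the highest one seen.
    private int violations = 0;

    // Mirrors extractTimestamp: the element's own timestamp is always returned,
    // but the tracked maximum only moves forward.
    long extractTimestamp(long newTimestamp) {
        if (newTimestamp >= currentTimestamp) {
            currentTimestamp = newTimestamp;
        } else {
            violations++; // the real class calls its MonotonyViolationHandler here
        }
        return newTimestamp;
    }

    // Mirrors getCurrentWatermark: one less than the highest timestamp seen,
    // with a guard so that Long.MIN_VALUE - 1 cannot overflow.
    long currentWatermark() {
        return currentTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : currentTimestamp - 1;
    }

    int violations() {
        return violations;
    }

    public static void main(String[] args) {
        AscendingWatermarkSketch sketch = new AscendingWatermarkSketch();
        System.out.println("initial watermark=" + sketch.currentWatermark());
        for (long ts : new long[] {13L, 14L, 20L, 19L}) {
            sketch.extractTimestamp(ts);
            System.out.println("timestamp=" + ts + " watermark=" + sketch.currentWatermark());
        }
        System.out.println("violations=" + sketch.violations());
    }
}
```

The watermark trails by one because equal timestamps are allowed (the comparison is >=), so another element carrying the current maximum may still arrive. In this run the out-of-order 19 leaves the watermark at 19, which means that element is no longer ahead of the watermark and downstream event-time operators would typically treat it as late. The sketch reads the watermark after every element only for demonstration; in Flink, getCurrentWatermark is polled periodically.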

MonotonyViolationHandler
flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/functions/timestamps/AscendingTimestampExtractor.java
/**
* Interface for handlers that handle violations of the monotonous ascending timestamps
* property.
*/
public interface MonotonyViolationHandler extends java.io.Serializable {

/**
* Called when the property of monotonously ascending timestamps is violated, i.e.,
* when {@code elementTimestamp < lastTimestamp}.
*
* @param elementTimestamp The timestamp of the current element.
* @param lastTimestamp The last timestamp.
*/
void handleViolation(long elementTimestamp, long lastTimestamp);
}

/**
* Handler that does nothing when timestamp monotony is violated.
*/
public static final class IgnoringHandler implements MonotonyViolationHandler {
private static final long serialVersionUID = 1L;

@Override
public void handleViolation(long elementTimestamp, long lastTimestamp) {}
}

/**
* Handler that fails the program when timestamp monotony is violated.
*/
public static final class FailingHandler implements MonotonyViolationHandler {
private static final long serialVersionUID = 1L;

@Override
public void handleViolation(long elementTimestamp, long lastTimestamp) {
throw new RuntimeException("Ascending timestamps condition violated. Element timestamp "
+ elementTimestamp + " is smaller than last timestamp " + lastTimestamp);
}
}

/**
* Handler that only logs violations of timestamp monotony, on WARN log level.
*/
public static final class LoggingHandler implements MonotonyViolationHandler {
private static final long serialVersionUID = 1L;

private static final Logger LOG = LoggerFactory.getLogger(AscendingTimestampExtractor.class);

@Override
public void handleViolation(long elementTimestamp, long lastTimestamp) {
LOG.warn("Timestamp monotony violated: {} < {}", elementTimestamp, lastTimestamp);
}
}

MonotonyViolationHandler extends Serializable and defines the handleViolation method. Three built-in implementations are provided: IgnoringHandler, FailingHandler, and LoggingHandler.
IgnoringHandler's handleViolation does nothing; FailingHandler's throws a RuntimeException; LoggingHandler's logs a message at WARN level.
AscendingTimestampExtractor uses LoggingHandler by default; a different handler can be set through withViolationHandler.
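
Because the handler is a single-method interface, a custom policy is easy to plug in. The following standalone sketch models that strategy without depending on Flink. ViolationHandlerSketch and its nested Handler interface are hypothetical names; Handler merely has the same shape as MonotonyViolationHandler.handleViolation. In a real job you would presumably implement MonotonyViolationHandler itself and pass the instance to withViolationHandler.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;

// Standalone model of the handler strategy; all names here are hypothetical
// and nothing below depends on Flink.
final class ViolationHandlerSketch {

    // Same shape as MonotonyViolationHandler.handleViolation.
    interface Handler {
        void handleViolation(long elementTimestamp, long lastTimestamp);
    }

    private long currentTimestamp = Long.MIN_VALUE;
    private final Handler handler;

    ViolationHandlerSketch(Handler handler) {
        this.handler = Objects.requireNonNull(handler);
    }

    // Same control flow as extractTimestamp: update on ascending input,
    // otherwise hand the pair of timestamps to the configured handler.
    long extractTimestamp(long newTimestamp) {
        if (newTimestamp >= currentTimestamp) {
            currentTimestamp = newTimestamp;
        } else {
            handler.handleViolation(newTimestamp, currentTimestamp);
        }
        return newTimestamp;
    }

    public static void main(String[] args) {
        // A custom policy: count violations instead of logging or failing.
        AtomicLong count = new AtomicLong();
        ViolationHandlerSketch counting =
                new ViolationHandlerSketch((ts, last) -> count.incrementAndGet());
        counting.extractTimestamp(1000L);
        counting.extractTimestamp(999L);
        System.out.println("violations: " + count.get());

        // A failing policy, analogous to FailingHandler.
        ViolationHandlerSketch failing = new ViolationHandlerSketch((ts, last) -> {
            throw new RuntimeException(
                    "Element timestamp " + ts + " is smaller than last timestamp " + last);
        });
        failing.extractTimestamp(1000L);
        try {
            failing.extractTimestamp(999L);
        } catch (RuntimeException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Note that whichever policy is chosen, only a failing one changes what happens to the element: with the ignoring, logging, or counting variants the out-of-order timestamp is still returned unchanged.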

Example
@Test
public void testWithFailingHandler() {
AscendingTimestampExtractor<Long> extractor = (new AscendingTimestampExtractorTest.LongExtractor()).withViolationHandler(new FailingHandler());
this.runValidTests(extractor);

try {
this.runInvalidTest(extractor);
Assert.fail("should fail with an exception");
} catch (Exception expected) {
// expected: FailingHandler throws on the out-of-order timestamp
}

}

private void runValidTests(AscendingTimestampExtractor<Long> extractor) {
Assert.assertEquals(13L, extractor.extractTimestamp(13L, -1L));
Assert.assertEquals(13L, extractor.extractTimestamp(13L, 0L));
Assert.assertEquals(14L, extractor.extractTimestamp(14L, 0L));
Assert.assertEquals(20L, extractor.extractTimestamp(20L, 0L));
Assert.assertEquals(20L, extractor.extractTimestamp(20L, 0L));
Assert.assertEquals(20L, extractor.extractTimestamp(20L, 0L));
Assert.assertEquals(500L, extractor.extractTimestamp(500L, 0L));
Assert.assertEquals(9223372036854775806L, extractor.extractTimestamp(9223372036854775806L, 99999L));
}

private void runInvalidTest(AscendingTimestampExtractor<Long> extractor) {
Assert.assertEquals(1000L, extractor.extractTimestamp(1000L, 100L));
Assert.assertEquals(1000L, extractor.extractTimestamp(1000L, 100L));
Assert.assertEquals(999L, extractor.extractTimestamp(999L, 100L));
}

private static class LongExtractor extends AscendingTimestampExtractor<Long> {
private static final long serialVersionUID = 1L;

private LongExtractor() {
}

public long extractAscendingTimestamp(Long element) {
return element;
}
}
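Here withViolationHandler sets the violationHandler to FailingHandler. When the timestamp 999 arrives, it is smaller than the previous 1000, so MonotonyViolationHandler.handleViolation is called, and FailingHandler throws a RuntimeException.
Summary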

For convenience, Flink ships several Pre-defined Timestamp Extractors / Watermark Emitters; AscendingTimestampExtractor is one of them.
The abstract class AscendingTimestampExtractor implements the extractTimestamp and getCurrentWatermark methods of the AssignerWithPeriodicWatermarks interface, and declares the abstract method extractAscendingTimestamp for subclasses to implement.
AscendingTimestampExtractor is meant for elements whose timestamps are monotonically ascending within each parallel task. When timestamp monotony is violated, it calls handleViolation on the MonotonyViolationHandler. MonotonyViolationHandler extends Serializable and defines handleViolation; its three built-in implementations are IgnoringHandler, FailingHandler, and LoggingHandler.

doc
Pre-defined Timestamp Extractors / Watermark Emitters
