项目文章

搜索 TinyTools

Apache Flink一个开源强大的流处理和批处理功能框架

25.5k 文档演示 Github仓库 Gitee仓库

Apache Flink

Apache Flink 是一个框架和分布式处理引擎，用于对无界和有界数据流进行有状态计算。Flink 旨在运行在所有常见的集群环境中，以内存速度和任意规模执行计算。

了解更多关于 Flink 的信息，请访问 https://flink.apache.org/

特性

流式优先的运行时，同时支持批处理和数据流程序
优雅流畅的 Java API
同时支持极高吞吐量和低事件延迟的运行时
基于 Dataflow 模型，在 DataStream API 中支持 事件时间 和乱序处理
跨不同时间语义（事件时间、处理时间）的灵活窗口（时间、计数、会话、自定义触发器）
具有 恰好一次 处理保证的容错能力
流式程序中的自然背压
用于图处理（批处理）、机器学习（批处理）和复杂事件处理（流式）的库
自定义内存管理，可在内存和核外数据处理算法之间高效稳健地切换
Apache Hadoop MapReduce 的兼容层
与 YARN、HDFS、HBase 以及 Apache Hadoop 生态系统的其他组件集成

Streaming 示例

java 复制代码

// pojo class WordWithCount
public class WordWithCount {
    public String word;
    public int count;

    public WordWithCount() {}
    
    public WordWithCount(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

// main method
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<String> text = env.socketTextStream(host, port);
DataStream<WordWithCount> windowCounts = text
    .flatMap(
        (FlatMapFunction<String, String>) (line, collector) 
            -> Arrays.stream(line.split("\\s")).forEach(collector::collect)
    ).returns(String.class)
    .map(word -> new WordWithCount(word, 1)).returns(TypeInformation.of(WordWithCount.class))
    .keyBy(wordWithCnt -> wordWithCnt.word)
    .window(TumblingProcessingTimeWindows.of(Duration.ofSeconds(5)))
    .sum("count").returns(TypeInformation.of(WordWithCount.class));

windowCounts.print();
env.execute();
}

Batch Example

java 复制代码

// pojo class WordWithCount
public class WordWithCount {
    public String word;
    public int count;

    public WordWithCount() {}

    public WordWithCount(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

// main method
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRuntimeMode(RuntimeExecutionMode.BATCH);
FileSource<String> source = FileSource.forRecordStreamFormat(new TextLineInputFormat(), new Path("MyInput.txt")).build();
DataStreamSource<String> text = env.fromSource(source, WatermarkStrategy.noWatermarks(), "MySource");
DataStream<WordWithCount> windowCounts = text
        .flatMap((FlatMapFunction<String, String>) (line, collector) -> Arrays
                .stream(line.split("\\s"))
                .forEach(collector::collect)).returns(String.class)
        .map(word -> new WordWithCount(word, 1)).returns(TypeInformation.of(WordWithCount.class))
        .keyBy(wordWithCount -> wordWithCount.word)
        .sum("count").returns(TypeInformation.of(WordWithCount.class));

windowCounts.print();
env.execute();

从源代码构建 Apache Flink

构建 Flink 的前提条件：

类 Unix 环境（我们使用 Linux、Mac OS X、Cygwin 和 WSL）
Git
Maven（我们需要 3.8.6 版本）
Java（版本 11、17 或 21）

基本构建说明

首先，克隆代码库：

复制代码

git clone https://github.com/apache/flink.git
cd flink

然后，根据您的 Java 版本选择以下命令之一：

对于 Java 11

复制代码

./mvnw clean package -DskipTests -Djdk11 -Pjava11-target

对于 Java 17（默认）

复制代码

./mvnw clean package -DskipTests -Djdk17 -Pjava17-target

对于Java 21

复制代码

./mvnw clean package -DskipTests -Djdk21 -Pjava21-target

构建过程大约需要 10 分钟。Flink 将安装在“build-target”中。

注意事项

确保您的 JAVA_HOME 环境变量指向正确的 JDK 版本
构建命令使用 Maven 包装器 (mvnw)，以确保使用正确的 Maven 版本
-DskipTests 标志会跳过正在运行的测试，以加快构建过程
每个 Java 版本都需要其对应的配置文件 (-Pjava-target) 和 JDK 标志 (-Djdk)

开发 Flink

Flink 提交者使用 IntelliJ IDEA 开发 Flink 代码库。
我们推荐使用 IntelliJ IDEA 开发涉及 Scala 代码的项目。

IDE 的最低要求如下：

支持 Java 和 Scala（也支持混合项目）
支持 Maven、Java 和 Scala

IntelliJ IDEA

IntelliJ IDE 开箱即用地支持 Maven，并提供 Scala 开发插件。

IntelliJ 下载：https://www.jetbrains.com/idea/
IntelliJ Scala 插件：https://plugins.jetbrains.com/plugin/?id=1347

查看我们的IntelliJ 设置指南，了解更多详情。

Eclipse Scala IDE

**注意：**根据我们的经验，此设置不适用于 Flink，
原因是 Scala IDE 3.0.3 捆绑的旧 Eclipse 版本存在缺陷，
或者由于 Scala IDE 4.4.1 中捆绑的 Scala 版本不兼容。

我们建议使用 IntelliJ（见上文）

支持

欢迎随时提问！

如果您需要任何帮助，请通过邮件列表联系开发者和社区。

如果您发现 Flink 中的错误，请提交问题。

文档

Apache Flink 的文档位于网站：https://flink.apache.org 或源代码的“docs/”目录中。

分支和贡献

这是一个活跃的开源项目。我们始终欢迎所有想要使用该系统或为其做出贡献的人。
如果您正在寻找适合您技能的实施任务，请联系我们。本文介绍了如何为 Apache Flink 做出贡献。

外部化连接器

大多数 Flink 连接器已外部化到 Apache 软件基金会下的独立代码库中：

关于

Apache Flink 是 Apache 软件基金会 (ASF) 的一个开源项目。Apache Flink 项目源自 Stratosphere 研究项目。

关于项目

Apache-2.0

Java

25,532

13791

929

2014-06-07

2025-11-27

Apache Flink一个开源强大的流处理和批处理功能框架

Apache Flink

特性

Streaming 示例

Batch Example

从源代码构建 Apache Flink

基本构建说明

注意事项

开发 Flink

IntelliJ IDEA

Eclipse Scala IDE

支持

文档

分支和贡献

外部化连接器

关于

关于项目

增长趋势 - stars