) The Apache Flink community is excited to hit the double digits and announce the release of Flink 1. 49. 0! As a result of the biggest community effort to date, with over 1.2k issues implemented and more than 0918 contributors, this release introduces significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and great advances in Python support (PyFlink).
Flink 1. (also marks the completion of the Blink integration
, hardening streaming SQL and bringing mature batch processing to Flink with production -ready Hive integration and TPC-DS coverage. This blog post describes all major new features and improvements, important changes to be aware of and what to expect moving forward.
Downloads page
of the Flink website. For more details, check the complete release changelog and the updated documentation . We encourage you to download the release and share your feedback with the community through the Flink mailing lists (or) (JIRA) . New Features and Improvements Improved Memory Management and Configuration
The current TaskExecutor memory configuration in Flink has some shortcomings that make it hard to reason about or optimize resource utilization, such as: Different configuration models for memory footprint in Streaming and Batch execution;
RocksDBStateBackend can use off-heap memory only. Therefore, to allow users to switch between Streaming and Batch execution without having to modify cluster configurations, managed memory is now always off-heap. Simplified RocksDB Configuration
Configuring an off-heap state backend like RocksDB used to involve a good deal of manual tuning, like decreasing the JVM heap size or setting Flink to use off-heap memory. This can now be achieved through Flink’s out-of-box configuration, and adjusting the memory budget for (RocksDBStateBackend is as simple as resizing the managed memory size.
Another important improvement was to allow Flink to bind RocksDB native memory usage ( (FLINK -)
, preventing it from exceeding its total memory budget – this is especially relevant in containerized environments like Kubernetes. For details on how to enable and tune this feature, refer to
Tuning RocksDB (Note) FLIP – 73 Changes the process of cluster resource configuration, which may require tuning your clusters for upgrades from previous Flink versions. For a comprehensive overview of the changes introduced and tuning guidance, consult this setup . Unified Logic for Job Submission
Prior to this release, job submission was part of the duties of the Execution Environments and closely tied to the different deployment targets (e.g. Yarn, Kubernetes, Mesos). This led to a poor separation of concerns and, over time, to a growing number of customized environments that users needed to configure and manage separately. In Flink 1. , job submission logic is abstracted into the generic (Executor) (interface) (FLIP -)
. The addition of the
(ExecutorCLI
JobClient ( FLINK - ), responsible for fetching the JobExecutionResult .
In particular, these changes make it much easier to programmatically use Flink in downstream frameworks - for example, Apache Beam or Zeppelin interactive notebooks - by providing users with a unified entry point to Flink. For users working with Flink across multiple target environments, the transition to a configuration-based execution process also significantly reduces boilerplate code and maintainability overhead.
Native Kubernetes Integration (Beta) For users looking to get started with Flink on a containerized environment, deploying and managing a standalone cluster on top of Kubernetes requires some upfront knowledge about containers, operators and environment -specific tools like kubectl [FLINK-11956] . In Flink 1. , we rolled out the first phase of (Active Kubernetes Integration) ( FLINK - 13025 with support for session clusters (with per-job planned ). In this context, “active” means that Flink's ResourceManager (
K8sResMngr
natively communicates with Kubernetes to allocate new pods on-demand, similar to Flink's Yarn and Mesos integration. Users can also leverage namespaces to launch Flink clusters for multi-tenant environments with limited aggregate resource consumption. RBAC roles and service accounts with enough permission should be configured beforehand. As introduced in (Unified Logic For Job Submission) , all command-line options in Flink 1. are mapped to a unified configuration. For this reason, users can simply refer to the Kubernetes config options and submit a job to an existing Flink session on Kubernetes in the CLI using: bin / flink run -d -e kubernetes-session -Dkubernetes.cluster-id = (examples / streaming / WindowJoin.jar If you want to try out this preview feature, we encourage you to walk through the Native Kubernetes setup , play around with it and share feedback with the community. Table API / SQL: Production-ready Hive Integration
Hive integration was announced as a preview feature in Flink 1.9. This preview allowed users to persist Flink-specific metadata (e.g. Kafka tables) in Hive Metastore using SQL DDL, call UDFs defined in Hive and use Flink for reading and writing Hive tables. Flink 1. 68 rounds up this effort with further developments that bring production-ready Hive integration to Flink with full compatibility of most Hive versions
. (Native Partition Support for Batch SQL) So far, only writes to non-partitioned Hive tables were supported. In Flink 1. 42, the Flink SQL syntax has been extended with (INSERT OVERWRITE) (and) (PARTITION) (
FLIP - , enabling users to write into both static and dynamic partitions in Hive. Static Partition Writing () (INSERT) {{(INTO) | (OVERWRITE) } TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 (FROM) from_statement ;
()
Dynamic Partition Writing () (INSERT) {{(INTO) | (OVERWRITE) } TABLE tablename1 select_statement1 (FROM) from_statement ; Fully supporting partitioned tables allows users to take advantage of partition pruning on read, which significantly increases the performance of these operations by reducing the amount of data that needs to be scanned. (Further Optimizations)
Besides partition pruning, Flink 1. (introduces more) (read optimizations to Hive integration, such as:
Projection pushdown: Flink leverages projection pushdown to minimize data transfer between Flink and Hive tables by omitting unnecessary fields from table scans. This is especially beneficial for tables with a large number of columns.
LIMIT
clause, Flink will limit the number of output records wherever possible to minimize the amount of data transferred across the network.
ORC Vectorization on Read: boost read performance for ORC files, Flink now uses the native ORC Vectorized Reader by default for Hive versions above 2.0.0 and columns with non-complex data types.
[FLINK-12122] Pluggable Modules as Flink System Objects (Beta)
Flink 1. 42 introduces a generic mechanism for pluggable modules in the Flink table core, with a first focus on system functions ( FLIP –
). With modules, users can extend Flink’s system objects – for example use Hive built-in functions that behave like Flink system functions. This release ships with a pre-implemented
Flink 1. 42 introduces a generic mechanism for pluggable modules in the Flink table core, with a first focus on system functions ( FLIP –
HiveModule , supporting multiple Hive versions, but users are also given the possibility to
write their own pluggable modules
.
(Other Improvements to the Table API / SQL (Watermarks and Computed Columns in SQL DDL) Flink 1. supports stream-specific syntax extensions to define time attributes and watermark generation in Flink SQL DDL ( FLIP – ). This allows time-based operations, like windowing, and the definition of (watermark strategies) on tables created using DDL statements. () CREATE (TABLE) (table_name) ( WATERMARK (FOR) (columnName) AS
TPC-DS is a widely used industry-standard decision support benchmark to evaluate and measure the performance of SQL-based data processing engines. In Flink 1. 42, all TPC-DS queries are supported end-to-end ( (FLINK -)
, reflecting the readiness of its SQL engine to address the needs of modern data warehouse-like workloads. (PyFlink: Support for Native User Defined Functions (UDFs)
A preview of PyFlink was introduced in the previous release, making headway towards the goal of full Python support in Flink. For this release, the focus was to enable users to register and use Python User-Defined Functions (UDF, with UDTF / UDAF planned) in the Table API / SQL ( (FLIP -)
and get involved in the (discussion for requested user features.
Important Changes [
TaskManagers . To use a scheduling strategy that is closer to the pre-FLIP behavior, where Flink tries to spread out the workload across all currently available TaskManagers , users can set cluster.evenly-spread-out-slots: true (in the flink-conf.yaml
(s3-hadoop and s3-presto filesystems no longer use class relocations and should be loaded through plugins , but now seamlessly integrate with all credential providers. Other filesystems are strongly recommended to be used only as plugins, as we will continue to remove relocations.
Flink 1.9 shipped with a refactored Web UI, with the legacy one being kept around as backup in case something wasn’t working as expected. No issues have been reported so far, so (the community voted to drop the legacy Web UI in Flink 1. .
The Apache Flink community would like to thank all contributors that have made this release possible:
Achyuth Samudrala, Aitozi, Alberto Romero, Alec.Ch, Aleksey Pak, Alexander Fedulov, Alice Yan, Aljoscha Krettek, Aloys, Andrey Zagrebin, Arvid Heise, Benchao Li, Benoit Hanotte, Benoît Paris, Bhagavan Das, Biao Liu, Chesnay Schepler, Congxian Qiu, Cyrille Chépélov, César Soto Valero, David Anderson, David Hrbacek, David Moravek, Dawid Wysakowicz, Dezhi Cai, Dian Fu, Dyana , Fabian Hueske, Fawad Halim, Fokko Driesprong, Frey Gao, Gabor Gevay, Gao Yun, Gary Yao, GatsbyNewton, GitHub, Grebennikov Roman, GuoWei Ma, Gyula Fora, Haibo Sun, Hao Dang, Henvealf, Hongtao Zhang, HuangXo , Igal Shilman, Jacob Sevart, Jark Wu, Jeff Martin, Jeff Yang, Jeff Zhang, Jiangjie (Becket) Qin, Jiayi, Jiayi Liao, Jincheng Sun, Jing Zhang, Jingsong Lee, JingsongLi, Joao Boto, John Lonergan, Kaibo Zhou, Konstantin Knauf, Kostas Kloudas, Kurt Young, Leonard Xu, Ling Wang, Lining Jing, Liupengcheng, LouisXu, Mads Chr. Olesen, Marco Zühlke, Marcos Klein, Matyas Orhidi, Maximilian Bode, Maximilian Michels, Nick Pavlakis, Nico Kruber, Nicolas Deslandes, Pablo Valtuille, Paul Lam, Paul Lin, PengFei Li, Piotr Nowojski, Piotr Przybylski, Piyush Chen, Naruto Richard Deurwaarder, Robert Metzger, Roman, Roman Grebennikov, Roman Khachatryan, Rong Rong, Rui Li, Ryan Tao, Scott Kidder, Seth Wiesman, Shannon Carey, Shaobin.Ou, Shuo Cheng, Stefan Richter, Stephan Ewen, Steve OU, Steven Wu , Terry Wang, Thesharing, Thomas Weise, Till Rohrmann, Timo Walther, Tony Wei, TsReaper, Tzu-Li (Gordon) Tai, Victor Wong, WangHengwei, Wei Zhong, WeiZhong , Wind (Jiayi Liao), Xintong Song, XuQianJin-Stars, Xuefu Zhang , Xupingyong, Yadong Xie, Yang Wang, Yangze Guo, Yikun Jiang, Ying, YngwieWang, Yu Li, Yuan Mei, Yun Gao, Yun Tang, Zhanchun Zhang, Zhenghua Gao, Zhijiang, Zhu Z hu, a-suiniaev, azagrebin, beyond biao. liub, blueszheng, bowen.li, caoyingjie, catkint, chendonglin, chenqi, chunpinghe, cyq , danrtsey.wy, dengziming, dianfu, eskabetxe, fanrui, forideal, gentlewang, godfrey he, godfreyhe, haodang, hehuiyuan, hequn , hpeter, huangxingbo , huzheng, ifndef-SleePy, jiemotongxue, joe, jrthe , kevin.cyj, klion 64, lamber-ken, libenchao, liketic, lincoln-lil, lining, liuyongvs, liyafan , lz, mans2singh, mojo, openinx, ouyangwulin, shining-huang, shuai-xu, shuo.cs, stayhsfLee, sunhaibotb , sunjincheng , tianboxiu , tianchen, tianchen 823, tison, tszkitlo 66, unknown, vinoyang, vthinkxie, wangpeibin, wangxiaowei, wangxiyuan, wangxlong, wangyang , whlwanghailong, xuchao , xuyang , yanghua, yangjf 14500, yongqiang chai, yuzhao.cyz, zentol, zhangzhanchum, zhengcanbin, zhijiang, zhongyong jin, zhuzhu.zz, zjuwangg, zoudaokoulife, 砚 田, 谢 磊, 张志豪, 曹建华
(Read More )
GIPHY App Key not set. Please check settings