Add 10-year anniversary blog post #756
Conversation
Preview URL: https://pitrou.github.io/arrow-site. If the preview URL doesn't work, you may have forgotten to configure your fork repository for preview.
> example of how building on top of existing Arrow formats and implementations can
> enable groundbreaking efficiency improvements in a very non-trivial problem space.
>
> It should also be noted that Arrow is often used hand in hand with
Do we want to mention the donation of parquet-cpp and the native Parquet implementations in Rust and Go?
I don't know, how would you word it? It's more of a rationalization of existing development practices rather than a "donation" (i.e. a gift).
At minimum, it's probably worth mentioning that nearly all official parquet libraries live in Arrow repositories
I think readers may be interested in the story of the "donation", which is less common. I asked Gemini to write the anecdote below.
Did you know that the Parquet C++ code used to live in its own repository? In the early years, developers found themselves in a 'circular dependency morass.' To fix a bug in PyArrow's Parquet support, you often had to submit a patch to one project, wait for a release, and then update the other. In 2018, the community decided to stop fighting the logistics and merged the C++ and Python development of both into the Apache Arrow mono-repo. This move streamlined the project and solidified the tight-knit relationship between the world's best on-disk and in-memory columnar formats.
> At minimum, it's probably worth mentioning that nearly all official parquet libraries live in Arrow repositories
Definitely, will do.
alamb left a comment
I love this -- thank you @pitrou for writing it. I left a bunch of editorial comments, but I don't think any of them are required (all "nice to have")
> integration tests that are routinely checked against multiple implementations of
> Arrow have data files [generated in 2019 by Arrow 0.14.1](https://github.com/apache/arrow-testing/tree/master/data/arrow-ipc-stream/integration/0.14.1).
>
> ## The lost Union validity bitmap
I would personally suggest removing this section -- it is already mentioned above and I think it distracts from the main narrative about Arrow's stability and widespread adoption.
> Union types cannot have a top-level validity bitmap anymore.
I suggest adding a link to the mailing list discussion in that text https://lists.apache.org/thread/przo99rtpv4rp66g1h4gn0zyxdq56m27 and then removing this section
I don't know, it might be a bit of interesting trivia for the reader. What do other people think? @ianmcook @paleolimbot @raulcd
I agree with Andrew here, although the post is great either way. This particular piece of trivia caused me mild personal discontent since compatibility with this and previous versions is still exercised in integration testing; however, it does interrupt the narrative a bit and I'm not sure there are many implementors of IPC readers out there.
I agree with @alamb
As a reader, I find this part interesting, and I take it as a callback to the earlier wording. But I think it might be better to highlight the rationale for this change in a short sentence.
I am OK with either keeping it or removing it. I personally find it interesting, and I also find that adding it as a section reinforces the message that the format is stable: there haven't been any breaking changes to the formats since then, and we are very careful and aware of that.
Piggybacking on @raulcd 's comment...
The section could potentially be retitled as "No breaking changes (almost)" or something to that effect. This fits in with the overall narrative while still giving a spot to talk about the trivia.
> Since then, there has been precisely zero breaking change in the Arrow Columnar and IPC
> formats.
>
> ## Apache Arrow 1.0.0
I personally suggest moving this paragraph up to the top (right after the introduction)
I realize that the current blog structure is chronological, but I think ordering it in descending order of importance would improve the flow -- if we moved this paragraph to the start, the blog would start with a victory lap about the stability and wide reaching impact (Arrow 1.0) and then discuss some of the path to get there.
But the "today" part would still be at the end, right? That might read awkwardly?
Yes, you are right -- it would make sense to move the "today" section to the top as well if we go in this direction
> Beyond these subprojects, many third-party efforts have adopted the Arrow formats
> for efficient interoperability. [GeoArrow](https://geoarrow.org/) is an impressive
Would it be worthwhile to mention Polaris (built on top of Arrow), NVIDIA RAPIDS and cuDF (use the Arrow format on the GPU), DuckDB (zero-copy interoperable with Arrow), Dremio (Arrow-native internally, uses Flight SQL), InfluxDB (Flight SQL and Arrow-native), Snowflake returning Arrow, Google BigQuery returning Arrow, Spark Connect using Arrow, etc.?
I don't know, we'll always be misrepresenting reality since there are so many projects that could be mentioned. I thought GeoArrow is interesting to mention because they are adding their own datatypes to address a particular problem space.
Maybe we link to Powered By or something?
We already do above :)
zanmato1984 left a comment
Thanks for the epic post!
> ## How it started
>
> From the start, Arrow has been a joint effort between practitioners of various
> horizons looking to build common grounds to efficiently exchange columnar data
> between different libraries and systems.
> In [this blog post](https://sympathetic.ink/2024/02/06/Chapter-2-From-Parquet-to-Arrow.html),
> Julien Le Dem recalls how some of the founders of the [Apache Parquet](https://parquet.apache.org/)
> project participated in the early days of the Arrow design phase. The idea of Arrow
> as an in-memory format was meant to address the over half of the interoperability
> problem, the natural complement to Parquet as a persistent storage format.
@julienledem Would you like to do a quick read here, in case I'm misrepresenting things?
raulcd left a comment
Thanks @pitrou for working on this. I think it's great! We should celebrate more!
westonpace left a comment
Great article, thanks for writing this!
> In [this blog post](https://sympathetic.ink/2024/02/06/Chapter-2-From-Parquet-to-Arrow.html),
> Julien Le Dem recalls how some of the founders of the [Apache Parquet](https://parquet.apache.org/)
> project participated in the early days of the Arrow design phase. The idea of Arrow
> as an in-memory format was meant to address the over half of the interoperability

Suggested change:
- as an in-memory format was meant to address the over half of the interoperability
+ as an in-memory format was meant to address the other half of the interoperability
> participate constructively. While the specifications are stable, they may still
> welcome additions to cater for new use cases, as they have done in the past.

Suggested change:
- participate constructively. While the specifications are stable, they may still
- welcome additions to cater for new use cases, as they have done in the past.
+ participate constructively. While the specifications are stable, they still
+ welcome additions to cater for new use cases, as they have done in the past.
Feel free to ignore, I don't have any formal justification for this suggestion, the wording just seemed a little off
> welcome additions to cater for new use cases, as they have done in the past.
>
> The Arrow implementations are actively maintained, gaining new features, bug fixes,
> performance improvements. We encourage people to contribute to their implementation

Suggested change:
- performance improvements. We encourage people to contribute to their implementation
+ and performance improvements. We encourage people to contribute to their implementation