Friday, April 24, 2026
in performance, benchmarks, pandas

Benchmarking DataFrame Paths in polars-bio 0.29.0

polars-bio 0.29.0 adds support for pandas >= 3.0.0. pandas 3.0 makes PyArrow-backed data more central: its new default string dtype uses PyArrow under the hood when PyArrow is installed. We wanted to measure what that means for interval workloads in practice.

So instead of comparing different interval libraries, this benchmark compares different input and execution paths through the same polars-bio range engine:

direct Parquet scan through Apache DataFusion
pandas DataFrame
pandas with Arrow-backed dtypes
Polars eager DataFrame
Polars lazy LazyFrame

The question is simple: how much overhead do you pay once data is materialized into a Python DataFrame, and how much of that gap can Arrow-backed pandas close?
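
One way to put a number on that overhead is to time each path several times and divide its median runtime by the median of the direct-scan baseline. The sketch below is an illustration using only the standard library, not the harness used for this post. `relative_overhead` and the stand-in workloads are hypothetical; in a real run each callable would wrap the same range operation on a different input type.

```python
import statistics
import time
from typing import Callable, Dict


def relative_overhead(
    paths: Dict[str, Callable[[], object]],
    baseline: str,
    repeats: int = 5,
) -> Dict[str, float]:
    """Return each path's median runtime divided by the baseline's median."""
    medians: Dict[str, float] = {}
    for name, run in paths.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            samples.append(time.perf_counter() - start)
        medians[name] = statistics.median(samples)
    base = medians[baseline]
    return {name: median / base for name, median in medians.items()}


# Stand-in workloads (hypothetical): replace each lambda with a call that
# runs the same range operation through one of the input paths.
paths = {
    "parquet_scan": lambda: sum(range(100_000)),
    "pandas": lambda: sum(list(range(100_000))),
    "polars_lazy": lambda: sum(range(100_000)),
}

ratios = relative_overhead(paths, baseline="parquet_scan")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the baseline")
```

The median is used rather than the mean so that a single slow run, for example a cold cache on the first iteration, does not dominate the ratio.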