Interval operations benchmark โ update February 2026
Introduction
Back in September 2025 we benchmarked three libraries across three operations. A lot has changed since then. In December 2025, pyranges1 published a preprint describing its Rust-powered backend (ruranges) and an expanded set of interval operations. On the polars-bio side, version 0.24.0 ships a fully rewritten range-operations engine built on upstream DataFusion UDTF providers (OverlapProvider, NearestProvider, and the new coverage/cluster/complement/merge/subtract providers from datafusion-bio-function-ranges), replacing the earlier sequila-native backend.
This rewrite also expanded the operation set from three to eight. In addition to overlap, nearest, and count_overlaps, polars-bio 0.24.0 supports coverage, cluster, complement, merge, and subtract โ covering all the everyday interval manipulation tasks that genomics workflows depend on.
We also added a fourth contender: Bioframe, a pandas-based genomic interval library widely used in the 3D genomics community. This gives us a broader view of the Python genomic interval landscape.
For comparability with our previous benchmarks, we continue to use the same AIList dataset. All benchmark code and raw results are available in the polars-bio-bench repository.
Setup
Software versions
| Library | Version |
|---|---|
| polars-bio | 0.24.0 |
| pyranges1 | 1.2.0 |
| GenomicRanges | 0.8.4 |
| bioframe | 0.8.0 |
Benchmark test cases
Binary operations (overlap, nearest, count_overlaps, coverage):
| Dataset pairs | Size | # of overlaps (1-based) |
|---|---|---|
| 1-2 & 2-1 | Small | 54,246 |
| 3-7 & 7-3 | Medium | 4,408,383 |
| 7-8 & 8-7 | Large | 307,184,634 |
Unary operations (cluster, complement, merge, subtract):
| Dataset | Size | Name (intervals) |
|---|---|---|
| 1 | Small | fBrain (199K) |
| 2 | Small | exons (439K) |
| 7 | Medium | ex-anno (1,194K) |
| 3 | Medium | chainOrnAna1 (1,957K) |
| 8 | Large | ex-rna (9,945K) |
| 5 | Large | chainXenTro3Link (50,981K) |
Operations and tool support
| Operation | polars-bio | PyRanges1 | GenomicRanges | Bioframe |
|---|---|---|---|---|
| overlap | yes | yes | yes | yes |
| nearest | yes | yes | yes | yes |
| count_overlaps | yes | yes | yes | yes |
| coverage | yes | -- | yes | yes |
| cluster | yes | yes | -- | yes |
| complement | yes | yes | yes | yes |
| merge | yes | yes | yes | yes |
| subtract | yes | yes | -- | yes |
Results
Speedup comparison across all operations
Info
- Missing bars indicate that the operation is not supported by the library for the given dataset.
- Crash bars indicate that the library failed to complete.
Key takeaways:
- polars-bio is the fastest library in 7 out of 8 operations on the large dataset (8-7). The sole exception is nearest, where GenomicRanges holds a 1.63x advantage.
- On small datasets (1-2, 2-1), GenomicRanges leads in overlap (1.74x) and count_overlaps, reflecting lower per-call overhead for small inputs.
- Bioframe is consistently the slowest library, falling 5-50x behind polars-bio depending on the operation and dataset size.
- For the new operations (coverage, cluster, complement, merge, subtract), polars-bio leads across the board with PyRanges1 a respectable second (0.23-0.61x relative to polars-bio on the large dataset).
Summary โ dataset 8-7 speedup relative to polars-bio (higher is better for polars-bio):
| Operation | polars-bio | PyRanges1 | GenomicRanges | Bioframe |
|---|---|---|---|---|
| overlap | 1.00x | 0.18x | 0.73x | 0.13x |
| nearest | 1.00x | 0.06x | 1.63x | 0.04x |
| count_overlaps | 1.00x | 0.24x | 0.19x | 0.02x |
| coverage | 1.00x | -- | 0.36x | 0.05x |
| cluster | 1.00x | 0.61x | -- | 0.20x |
| complement | 1.00x | 0.53x | 0.02x | 0.06x |
| merge | 1.00x | 0.51x | 0.02x | 0.12x |
| subtract | 1.00x | 0.23x | -- | 0.05x |
Thread scalability (dataset 8-7)
One of polars-bio's key advantages is transparent multithreaded execution. The table below shows wall-clock times for all eight operations on the large dataset (8-7) as thread count increases:
| Operation | 1 thread | 8 threads | Speedup |
|---|---|---|---|
| overlap | 4.03s | 0.74s | 5.43x |
| nearest | 2.55s | 0.44s | 5.77x |
| count_overlaps | 1.55s | 0.27s | 5.75x |
| coverage | 1.20s | 0.22s | 5.48x |
| subtract | 1.15s | 0.17s | 6.59x |
| merge | 0.29s | 0.05s | 5.34x |
| complement | 0.29s | 0.06s | 4.99x |
| cluster | 0.48s | 0.15s | 3.27x |
Key takeaways:
- Most operations achieve near-linear scaling, with 5-6x speedup at 8 threads.
- subtract scales best at 6.59x, while cluster shows more modest scaling (3.27x) โ expected, as clustering is inherently sequential within each chromosome.
- At 8 threads, overlap on 307M result pairs completes in just 0.74 seconds.
Summary
- polars-bio is the fastest single-threaded library for 7 out of 8 operations at scale, with speedups ranging from 1.5x to 25x over alternatives.
- GenomicRanges wins the nearest operation and leads on small datasets, making it a solid choice when input sizes are modest.
- polars-bio delivers excellent thread scaling (5-6x at 8 threads), turning already-fast single-threaded times into sub-second performance on large datasets.
- PyRanges1, despite its recent preprint claiming "ultrafast" performance, is slower than polars-bio in every operation tested on large datasets (0.06-0.61x). While it is a solid improvement over its predecessor, the "ultrafast" characterization does not hold when compared against polars-bio's DataFusion-based engine.
- Bioframe is consistently the slowest across all operations and dataset sizes.
- The expanded eight-operation benchmark confirms that polars-bio's advantage extends beyond overlap and nearest to all standard range operation categories โ coverage, cluster, complement, merge, and subtract.
All benchmark code, raw results, and additional figures are available at polars-bio-bench.
