Supplemental material

This document provides additional information about the algorithms, benchmarking setup, data, and results that were presented in the manuscript.

Algorithm description

polars-bio implements a set of binary interval operations on genomic ranges, such as overlap, nearest, count-overlaps, and coverage. All these operations share a very similar algorithmic structure, which is presented in the diagram below.

flowchart TB
    %% Define header node
    H["Interval operation"]

    %% Define DataFrame nodes
    I0["left DataFrame"]
    I1["right DataFrame"]

    style I0 stroke-dasharray: 5 5, stroke-width: 1

    %% Draw edges with labels
    H -->|probe /streaming/ side| I0
    H -->|build /search structure/ side| I1

    %% Record batches under left DataFrame within a dotted box
    I0 --> LeftGroup
    subgraph LeftGroup["Record Batches"]
        direction TB
        LB0["Batch 1"]
        LB1["Batch 2"]
        LB2["Batch 3"]
    end
    style LeftGroup stroke-dasharray: 5 5, stroke-width: 1

    %% Record batches under right DataFrame within a dotted box
    I1 --> RightGroup
    subgraph RightGroup["Record Batches"]
        direction TB
        RB0["Batch 1"]
        RB1["Batch 2"]
        RB2["Batch 3"]
    end

The basic concept is that each operation has two sides: the probe side and the build side. The probe side is streamed, while the build side is materialized as a search data structure. For the generic overlap operation, the search structure can be selected with the algorithm parameter; all other operations always use Cache Oblivious Interval Trees (COITrees), which outperformed the other data structures in the benchmark. The nearest operation additionally keeps a sorted list of intervals, which is used to find the closest intervals when no overlap exists.
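The role of the sorted list in the nearest operation can be illustrated with a minimal sketch. This is not the polars-bio implementation: the function names are hypothetical, coordinates are assumed to be closed integers, and plain Python lists with binary search stand in for the real structure.

```python
from bisect import bisect_left, bisect_right


def build_sorted_lists(intervals):
    """Build-side preparation: keep starts and ends in sorted order."""
    starts = sorted(start for start, _ in intervals)
    ends = sorted(end for _, end in intervals)
    return starts, ends


def nearest_distance(query, starts, ends):
    """Distance to the closest interval lying entirely left or right of `query`.

    Intended for the case where `query` overlaps nothing on the build side.
    Returns None when no such interval exists.
    """
    q_start, q_end = query
    candidates = []
    i = bisect_right(starts, q_end)  # first interval starting after the query
    if i < len(starts):
        candidates.append(starts[i] - q_end)
    j = bisect_left(ends, q_start) - 1  # last interval ending before the query
    if j >= 0:
        candidates.append(q_start - ends[j])
    return min(candidates) if candidates else None


build = [(10, 20), (40, 50), (90, 95)]
starts, ends = build_sorted_lists(build)
print(nearest_distance((25, 30), starts, ends))
```

Both lookups are binary searches, so a probe record that overlaps nothing can still be answered in logarithmic time.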

Note

Available search structure implementations for overlap operation:

Once the build-side search structure is ready, records from the probe side, organized as record batches, are processed against it. Each record batch can be processed independently. The nodes of the search structure contain identifiers of build-side rows, which are then used to construct the new records returned as the result of the operation.
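The build/probe flow described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: a sorted list with binary search stands in for the COITree, coordinates are closed, and the class and function names are hypothetical.

```python
from bisect import bisect_right


class SortedIntervalIndex:
    """Toy stand-in for the build-side search structure (not a COITree)."""

    def __init__(self, rows):
        # Store the build-side row identifier next to each interval.
        self._items = sorted((s, e, row_id) for row_id, (s, e) in enumerate(rows))
        self._starts = [s for s, _, _ in self._items]

    def query(self, start, end):
        # Intervals that start after `end` cannot overlap (closed coordinates).
        hi = bisect_right(self._starts, end)
        return [row_id for _, e, row_id in self._items[:hi] if e >= start]


def overlap(probe_batches, build_rows):
    """Yield (probe_start, probe_end, build_start, build_end) for each overlap."""
    index = SortedIntervalIndex(build_rows)  # build side: materialized once
    for batch in probe_batches:  # probe side: one batch at a time
        for start, end in batch:
            for row_id in index.query(start, end):
                # Row identifiers are used to assemble the output record.
                yield (start, end) + build_rows[row_id]


build_rows = [(1, 5), (10, 20), (15, 30)]
probe_batches = [[(4, 12)], [(25, 40), (100, 200)]]
for record in overlap(probe_batches, build_rows):
    print(record)
```

Because the index is read-only after construction, no batch depends on another, which is what makes independent batch processing possible.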

Out-of-core (streaming) processing

This algorithm allows you to process data without requiring all of it to be in memory at the same time. In particular, the probe side can be streamed from a file or from cloud storage, while the build side needs to be materialized in memory. In real applications, the probe side is usually a large file with genomic intervals, while the build side is a smaller file with annotations or other genomic features. This allows you to process large genomic datasets without running out of memory.

Note

In this sense, the order of the sides is important, as the probe side is streamed and processed in batches, while the build side is fully materialized in memory.
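A minimal sketch of this asymmetry, assuming a tab-separated text input with contig, start, and end columns: the probe side is read lazily in fixed-size batches, while the build side is held in a dictionary. The helper names and the in-memory `io.StringIO` input are illustrative only; polars-bio itself relies on Apache DataFusion for streaming.

```python
import io
from itertools import islice


def read_batches(lines, batch_size):
    """Lazily parse lines into batches of (contig, start, end) records."""
    parsed = (line.split("\t") for line in lines if line.strip())
    records = ((contig, int(start), int(end)) for contig, start, end in parsed)
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch


def count_overlaps_streaming(probe_lines, build, batch_size=2):
    """Count overlapping pairs while keeping only one probe batch in memory."""
    by_contig = {}  # build side: fully materialized
    for contig, start, end in build:
        by_contig.setdefault(contig, []).append((start, end))
    total = 0
    for batch in read_batches(probe_lines, batch_size):  # probe side: streamed
        for contig, start, end in batch:
            total += sum(
                1 for s, e in by_contig.get(contig, []) if s <= end and e >= start
            )
    return total


probe = io.StringIO("chr1\t1\t10\nchr1\t50\t60\nchr2\t5\t8\n")
build = [("chr1", 5, 55), ("chr2", 100, 200)]
print(count_overlaps_streaming(probe, build))
```

Swapping the two inputs would give the same count here but would force the large input into memory, which is why the order of the sides matters.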

Parallelization

In the current implementation, the probe side can be processed in parallel using multiple threads on partitioned inputs (implicitly or explicitly partitioned; see partitioning strategies). The build side is predominantly single-threaded, with the notable exception of reading BGZF-compressed or partitioned Parquet/CSV input files, which can be parallelized.
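The partition-level parallelism can be sketched with a thread pool in which every worker reads the same build-side data. This only shows the structure: the function names are hypothetical, the naive nested loop replaces the real search structure, and CPython threads will not actually speed up a pure-Python loop like this one.

```python
from concurrent.futures import ThreadPoolExecutor


def count_partition(partition, build):
    """Count overlapping (probe, build) pairs for one probe-side partition."""
    return sum(
        1
        for p_start, p_end in partition
        for b_start, b_end in build
        if b_start <= p_end and b_end >= p_start
    )


def parallel_count(partitions, build, threads=4):
    """Process probe partitions concurrently against a shared, read-only build side."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(lambda part: count_partition(part, build), partitions))


build = [(1, 5), (10, 20)]
partitions = [[(4, 12)], [(25, 40)], [(0, 0), (20, 20)]]
print(parallel_count(partitions, build))
```

The per-partition results are combined with a simple sum, so no coordination between workers is needed beyond the final reduction.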

Implementation

polars-bio uses the following Apache DataFusion extension points:

Comparison with existing tools

The table below compares polars-bio with other popular Python libraries for genomic ranges operations.

Feature/Library polars-bio Bioframe PyRanges0 PyRanges1 pybedtools PyGenomics GenomicRanges
out-of-core processing ✅ ❌ ❌ ❌ ❌ ❌ ❌
parallel processing ✅ ❌ ✅[1] ❌ ❌ ❌ ❌
vectorized execution engine ✅ ❌ ❌ ❌ ❌ ❌ ❌
cloud object storage support ✅ ✅/❌[2] ❌ ❌ ❌ ❌ ✅
Pandas/Polars DataFrame support ✅/✅ ✅/❌ ✅/❌[3] ✅/❌[4] ❌/❌ ❌/❌ ✅/✅

Note

1 PyRanges0 supports parallel processing with Ray, but it does not bring any performance benefit over single-threaded execution and is not recommended. The overlap and nearest benchmarks (1, 2, 4, 6, and 8 threads) for the 8-7 test case on the Apple M3 Max platform confirm this observation.

overlap
Library Min (s) Max (s) Mean (s) Speedup
pyranges0 16.519153 17.889156 17.118936 1.00x
pyranges0-2 32.539549 34.858773 33.762477 0.51x
pyranges0-4 30.033927 30.367822 30.158362 0.57x
pyranges0-6 27.711752 33.280867 30.089641 0.57x
pyranges0-8 30.049501 33.257462 31.553328 0.54x
nearest
Library Min (s) Max (s) Mean (s) Speedup
pyranges0 1.580677 1.703093 1.630820 1.00x
pyranges0-2 3.954720 4.032619 3.997087 0.41x
pyranges0-4 3.716688 4.004058 3.847917 0.42x
pyranges0-6 3.853526 3.942475 3.883337 0.42x
pyranges0-8 3.861577 3.924950 3.902913 0.42x

2 Some input functions, such as read_table, support cloud object storage.

3 Only export/import with data copying is supported

4 RangeFrame class extends Pandas DataFrame

Benchmark setup

Code and benchmarking scenarios

Repository

Memory profiling

Memory profiling was done with the Python memory-profiler package, version 0.61.0. A helper script, run-memory-profiler.py, was developed to run the tests; a sample invocation is shown in the snippet below:

PROF_FILE="polars_bio_1-2.dat"
mprof run --output "$PROF_FILE" python src/run-memory-profiler.py --bench-config conf/paper/benchmark-e2e-overlap.yaml --tool polars_bio --test-case 1-2
mprof plot "$PROF_FILE"

Note

On each memory profile plot, the maximum memory is marked at the intersection of the two dashed lines.
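The peak value can also be read directly from the profile file rather than from the plot. The sketch below assumes the usual mprof output layout, in which each sample is a line of the form `MEM <MiB> <timestamp>`; the function name and the sample values are made up for illustration and are not taken from the benchmark.

```python
def peak_memory_mib(dat_lines):
    """Return the largest MEM sample found in mprof-style lines, or None."""
    peak = None
    for line in dat_lines:
        fields = line.split()
        # Assumed layout: "MEM <memory in MiB> <unix timestamp>"
        if len(fields) >= 2 and fields[0] == "MEM":
            value = float(fields[1])
            peak = value if peak is None else max(peak, value)
    return peak


# Made-up sample, not real benchmark output.
sample = [
    "CMDLINE python src/run-memory-profiler.py",
    "MEM 100.000000 1700000000.1000",
    "MEM 250.500000 1700000000.2000",
    "MEM 180.000000 1700000000.3000",
]
print(peak_memory_mib(sample))
```

With a real file, the lines could be passed as `open("polars_bio_1-2.dat")`, since the function accepts any iterable of strings.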

Operating systems and hardware configurations

macOS

  • cpu architecture: arm64
  • cpu name: Apple M3 Max
  • cpu cores: 16
  • memory: 64 GB
  • kernel: Darwin Kernel Version 24.2.0: Fri Dec 6 19:02:12 PST 2024; root:xnu-11215.61.5~2/RELEASE_ARM64_T6031
  • system: Darwin
  • os-release: macOS-15.2-arm64-arm-64bit
  • python: 3.12.4
  • polars-bio: 0.8.3

Linux

A Google Cloud c3-standard-22 machine was used for benchmarking.

  • cpu architecture: x86_64
  • cpu name: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
  • cpu cores: 22
  • memory: 88 GB
  • kernel: Linux-6.8.0-1025-gcp-x86_64-with-glibc2.35
  • system: Linux
  • os-release: #27~22.04.1-Ubuntu SMP Mon Feb 24 16:42:24 UTC 2025
  • python: 3.12.8
  • polars-bio: 0.8.3

Software

Data

Real dataset

The AIList dataset after transcoding into the Parquet file format (with the Snappy compression) was used for benchmarking. This dataset was published with the AIList paper:

Jianglin Feng, Aakrosh Ratan, Nathan C Sheffield, Augmented Interval List: a novel data structure for efficient genomic interval search, Bioinformatics 2019.

Dataset # Name Size (x1000) Description
0 chainRn4 2,351 Source
1 fBrain 199 Source
2 exons 439 Dataset used in the BEDTools tutorial.
3 chainOrnAna1 1,957 Source
4 chainVicPac2 7,684 Source
5 chainXenTro3Link 50,981 Source
6 chainMonDom5Link 128,187 Source
7 ex-anno 1,194 Dataset contains GenCode annotations with ~1.2 million lines, mixing all types of features.
8 ex-rna 9,945 Dataset contains ~10 million direct-RNA mappings.

Source: AIList GitHub

All Parquet files from this dataset shared the same schema:

  contig STRING
  pos_start INT32
  pos_end INT32

Synthetic dataset

Randomly generated intervals (100-10,000,000), inspired by the bioframe performance analysis. The dataset was generated with generate_dataset.py:

poetry run python src/generate_dataset.py

All Parquet files from this dataset shared the same schema:

  contig STRING
  pos_start INT64
  pos_end INT64
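To give an idea of what such a generator might look like, here is a hypothetical sketch that produces random rows matching the schema above. It is not the content of generate_dataset.py; the contig names, position range, interval lengths, and seed are all assumptions chosen for illustration.

```python
import random


def generate_intervals(n, contigs=("chr1", "chr2"), max_pos=1_000_000, max_len=1_000, seed=0):
    """Return n random (contig, pos_start, pos_end) rows with pos_start < pos_end."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    rows = []
    for _ in range(n):
        start = rng.randint(1, max_pos)
        length = rng.randint(1, max_len)
        rows.append((rng.choice(contigs), start, start + length))
    return rows


rows = generate_intervals(100)
print(len(rows), rows[0])
```

A fixed seed matters for benchmarking, because every tool should be run on exactly the same intervals.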

Note

Test datasets in the Parquet format can be downloaded from:

Overlap results

Test case polars_bio[1] - # of overlaps bioframe[2] - # of overlaps pyranges0 - # of overlaps pyranges1 - # of overlaps
1-2 54246 54246 54246 54246
8-7 307184634 307184634 307184634 307184634
100 781 781 781 781
1000 8859 8859 8859 8859
10000 90236 90236 90236 90236
100000 902553 902553 902553 902553
1000000 9007817 9007817 9007817 9007817
10000000 90005371 90005371 90005371 90005371

1 bioframe and pyranges are zero-based, so use_zero_based=True (polars-bio >= 0.10.3) must be set in polars-bio to obtain the same results as bioframe and pyranges.

2 The bioframe how parameter is set to inner (the default is left).
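Why the coordinate convention changes the overlap count can be shown with two predicates. The sketch assumes the common conventions, zero-based half-open intervals and one-based closed intervals; the function names are hypothetical and no polars-bio, bioframe, or pyranges API is called.

```python
def overlaps_closed(a, b):
    """Closed intervals [start, end]: sharing a single position counts."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlaps_half_open(a, b):
    """Half-open intervals [start, end): the end position is excluded."""
    return a[0] < b[1] and b[0] < a[1]


a, b = (10, 20), (20, 30)
print(overlaps_closed(a, b), overlaps_half_open(a, b))
```

Intervals that merely touch at one coordinate are counted under one convention and not the other, so all tools must use the same convention before their overlap counts can be compared.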

Single thread results

Single-thread results for the overlap, nearest, count-overlaps, and coverage operations on the apple-m3-max and gcp-linux platforms.

Note

Please note that for pyranges0 we were unable to compute the results of the coverage and count-overlaps operations on macOS and Linux in the synthetic benchmark, so these results are not presented here.

apple-m3-max

1-2
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.035619 0.043113 0.0383 2.70x
bioframe 0.102257 0.104425 0.103354 1.00x
pyranges0 0.025425 0.032821 0.028001 3.69x
pyranges1 0.059608 0.064147 0.061763 1.67x
pybedtools 0.343204 0.352804 0.348434 0.30x
genomicranges 1.042893 1.044245 1.043488 0.10x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.039943 0.045166 0.042109 4.45x
bioframe 0.185452 0.189631 0.187388 1.00x
pyranges0 0.092334 0.09634 0.093688 2.00x
pyranges1 0.133631 0.134179 0.133981 1.40x
pybedtools 0.756676 0.761866 0.75953 0.25x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.026706 0.029754 0.028142 4.69x
bioframe 0.131124 0.133729 0.132052 1.00x
pyranges0 0.039136 0.039774 0.039377 3.35x
pyranges1 0.061976 0.063181 0.062658 2.11x
pybedtools 0.665804 0.673844 0.668534 0.20x
genomicranges 0.994963 1.006435 0.999389 0.13x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.0262 0.028749 0.027418 6.30x
bioframe 0.16949 0.176628 0.172842 1.00x
pyranges0 0.07376 0.076708 0.075369 2.29x
pyranges1 0.128027 0.133263 0.130247 1.33x
pybedtools 0.701817 0.708726 0.705839 0.24x
genomicranges 1.032651 1.049059 1.040799 0.17x
8-7
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 3.987391 4.648581 4.235518 7.17x
bioframe 29.793837 30.991576 30.375518 1.00x
pyranges0 15.632212 15.974075 15.857213 1.92x
pyranges1 31.622804 33.699074 32.680701 0.93x
pybedtools 916.711575 919.974811 918.154834 0.03x
genomicranges 479.214112 487.832054 484.579554 0.06x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 2.116922 2.169534 2.139006 32.13x
bioframe 68.581465 68.992651 68.725495 1.00x
pyranges0 1.381964 1.508513 1.424446 48.25x
pyranges1 2.697684 2.728407 2.717532 25.29x
pybedtools 35.528719 35.876667 35.699544 1.93x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.445467 1.484052 1.46225 58.77x
bioframe 85.632767 86.26148 85.935955 1.00x
pyranges0 9.674847 9.833233 9.753982 8.81x
pyranges1 10.170249 10.254359 10.201813 8.42x
pybedtools 33.101592 33.966188 33.423595 2.57x
genomicranges 488.972732 490.395787 489.548184 0.18x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.195279 1.205765 1.199323 20.45x
bioframe 24.423391 24.682901 24.525909 1.00x
pyranges0 11.093644 11.328071 11.220416 2.19x
pyranges1 11.987003 12.147925 12.066045 2.03x
pybedtools 59.699275 60.04087 59.84965 0.41x
genomicranges 500.041974 503.31936 502.043072 0.05x
100-1p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.002471 0.006262 0.003855 0.54x
bioframe 0.001374 0.002735 0.002067 1.00x
pyranges0 0.000977 0.001952 0.001337 1.55x
pyranges1 0.002276 0.003591 0.002739 0.75x
pybedtools 0.006856 0.010064 0.008032 0.26x
genomicranges 0.001784 0.002115 0.001938 1.07x
pygenomics 0.000475 0.000541 0.000509 4.06x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.002802 0.007312 0.004371 0.51x
bioframe 0.00157 0.00347 0.002251 1.00x
pyranges0 0.00135 0.004085 0.002281 0.99x
pyranges1 0.002084 0.003622 0.002633 0.85x
pybedtools 0.005288 0.023073 0.011717 0.19x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.001892 0.006355 0.003397 0.52x
bioframe 0.001563 0.002165 0.001775 1.00x
pyranges1 0.00181 0.002209 0.001972 0.90x
pybedtools 0.020892 0.062978 0.036866 0.05x
genomicranges 0.001896 0.002057 0.001957 0.91x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.001911 0.006057 0.003343 1.03x
bioframe 0.003065 0.00411 0.003452 1.00x
pyranges1 0.004455 0.005845 0.005021 0.69x
pybedtools 0.02477 0.059532 0.037421 0.09x
1000-1p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.00262 0.004367 0.003278 0.71x
bioframe 0.001909 0.002988 0.002313 1.00x
pyranges0 0.001361 0.00182 0.001543 1.50x
pyranges1 0.002678 0.003166 0.002927 0.79x
pybedtools 0.037238 0.039737 0.038453 0.06x
genomicranges 0.019265 0.019945 0.01957 0.12x
pygenomics 0.006876 0.006994 0.006949 0.33x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.003048 0.0083 0.00553 0.65x
bioframe 0.003269 0.004119 0.003604 1.00x
pyranges0 0.002514 0.003506 0.003099 1.16x
pyranges1 0.003722 0.00418 0.003935 0.92x
pybedtools 0.00881 0.011281 0.009729 0.37x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.001854 0.004714 0.002898 1.00x
bioframe 0.002523 0.003547 0.002898 1.00x
pyranges1 0.002302 0.002838 0.002498 1.16x
pybedtools 0.032681 0.047822 0.037981 0.08x
genomicranges 0.020029 0.02029 0.020192 0.14x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.002202 0.003516 0.002696 1.77x
bioframe 0.004238 0.005691 0.004758 1.00x
pyranges1 0.004909 0.005934 0.005284 0.90x
pybedtools 0.030735 0.045004 0.03646 0.13x
10000-1p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.004603 0.008294 0.006073 1.81x
bioframe 0.010529 0.011367 0.011014 1.00x
pyranges0 0.006498 0.007306 0.006811 1.62x
pyranges1 0.01096 0.012611 0.011684 0.94x
pybedtools 0.94646 0.94995 0.948121 0.01x
genomicranges 0.198868 0.200266 0.199428 0.06x
pygenomics 0.080325 0.08121 0.080663 0.14x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.004851 0.007782 0.005908 4.50x
bioframe 0.025947 0.027779 0.026584 1.00x
pyranges0 0.00501 0.005703 0.00526 5.05x
pyranges1 0.007517 0.007937 0.00769 3.46x
pybedtools 0.040749 0.043864 0.041889 0.63x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.003283 0.008069 0.005083 3.12x
bioframe 0.014669 0.016689 0.015834 1.00x
pyranges1 0.007637 0.008979 0.008178 1.94x
pybedtools 0.720797 0.730655 0.725407 0.02x
genomicranges 0.202131 0.209398 0.204628 0.08x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.002756 0.004613 0.003377 3.06x
bioframe 0.009849 0.011243 0.010339 1.00x
pyranges1 0.01326 0.015308 0.013973 0.74x
pybedtools 0.727294 0.733098 0.73116 0.01x
100000-1p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.030583 0.038892 0.033394 3.33x
bioframe 0.108358 0.115233 0.111059 1.00x
pyranges0 0.059633 0.065599 0.061791 1.80x
pyranges1 0.100074 0.105947 0.102267 1.09x
pybedtools 13.434458 13.602339 13.496321 0.01x
genomicranges 2.030365 2.052434 2.039897 0.05x
pygenomics 1.001974 1.018231 1.009213 0.11x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.03013 0.036718 0.03339 10.61x
bioframe 0.352786 0.356839 0.354241 1.00x
pyranges0 0.032403 0.034701 0.033667 10.52x
pyranges1 0.044958 0.046169 0.045629 7.76x
pybedtools 0.369122 0.379131 0.3729 0.95x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.021035 0.026894 0.023802 13.86x
bioframe 0.308013 0.347919 0.329806 1.00x
pyranges1 0.076199 0.085019 0.079372 4.16x
pybedtools 11.056327 11.280248 11.149039 0.03x
genomicranges 2.057607 2.07651 2.067998 0.16x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.013263 0.014998 0.013874 5.68x
bioframe 0.077717 0.081116 0.078865 1.00x
pyranges1 0.094753 0.114552 0.10257 0.77x
pybedtools 11.374602 11.428316 11.393849 0.01x
1000000-1p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.482548 0.538737 0.507383 2.55x
bioframe 1.26082 1.35031 1.296195 1.00x
pyranges0 0.775969 0.828801 0.810501 1.60x
pyranges1 1.272326 1.29706 1.28585 1.01x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.439544 0.488414 0.458975 14.86x
bioframe 6.592501 7.111734 6.818208 1.00x
pyranges0 0.398173 0.413055 0.406623 16.77x
pyranges1 0.51649 0.520946 0.518407 13.15x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.257781 0.305275 0.28525 17.65x
bioframe 4.640915 5.437883 5.033454 1.00x
pyranges1 0.916714 0.925945 0.920594 5.47x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.128241 0.137198 0.132474 7.71x
bioframe 0.996542 1.065777 1.021738 1.00x
pyranges1 1.115134 1.247674 1.172964 0.87x
10000000-1p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 8.532137 9.738828 8.978132 2.20x
bioframe 19.276665 20.295566 19.708064 1.00x
pyranges0 14.819439 15.339048 15.092611 1.31x
pyranges1 20.153432 22.654892 21.56345 0.91x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 7.12779 7.490779 7.263011 22.17x
bioframe 156.356696 169.531002 160.989714 1.00x
pyranges0 6.402183 6.879779 6.62806 24.29x
pyranges1 7.526236 8.176338 7.857803 20.49x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 4.887937 5.553197 5.165014 20.21x
bioframe 102.637625 105.903506 104.389343 1.00x
pyranges1 13.35283 15.167609 14.19713 7.35x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.627897 1.683304 1.655288 9.86x
bioframe 15.586487 16.774274 16.316676 1.00x
pyranges1 16.99118 17.447484 17.195844 0.95x

gcp-linux

1-2
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.045943 0.064732 0.054234 1.66x
bioframe 0.084137 0.099481 0.090107 1.00x
pyranges0 0.056206 0.065654 0.061844 1.46x
pyranges1 0.09908 0.119018 0.106228 0.85x
pybedtools 0.38246 0.406379 0.39153 0.23x
genomicranges 1.19939 1.224621 1.208255 0.07x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.057012 0.073822 0.064665 2.49x
bioframe 0.158764 0.165707 0.161273 1.00x
pyranges0 0.172297 0.176259 0.17363 0.93x
pyranges1 0.217619 0.234088 0.22335 0.72x
pybedtools 0.845945 0.84898 0.847447 0.19x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.035631 0.043555 0.04066 2.74x
bioframe 0.108015 0.116522 0.111266 1.00x
pyranges0 0.077336 0.080282 0.07844 1.42x
pyranges1 0.100883 0.106671 0.103181 1.08x
pybedtools 0.745958 0.759006 0.754393 0.15x
genomicranges 1.154942 1.164158 1.158506 0.10x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.036476 0.040001 0.037897 5.10x
bioframe 0.189201 0.20046 0.193401 1.00x
pyranges0 0.141659 0.14424 0.143188 1.35x
pyranges1 0.206033 0.224902 0.213089 0.91x
pybedtools 0.773732 0.780424 0.776934 0.25x
genomicranges 1.186341 1.194172 1.189255 0.16x
8-7
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 6.235223 9.61441 7.723144 6.54x
bioframe 50.319263 50.956633 50.537202 1.00x
pyranges0 36.371926 36.581642 36.448645 1.39x
pyranges1 63.336711 63.455435 63.40654 0.80x
pybedtools 1149.001487 1152.127068 1150.070659 0.04x
genomicranges 597.951648 599.960895 599.002871 0.08x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 3.576373 3.679698 3.633697 15.54x
bioframe 56.301865 56.776617 56.464305 1.00x
pyranges0 2.45308 2.60494 2.505172 22.54x
pyranges1 4.975662 5.011008 4.997007 11.30x
pybedtools 44.181913 44.79409 44.386971 1.27x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 2.052196 2.104447 2.075706 38.15x
bioframe 79.174164 79.234115 79.194209 1.00x
pyranges0 18.797436 18.851941 18.824498 4.21x
pyranges1 20.399172 20.436149 20.418562 3.88x
pybedtools 35.850631 36.142479 36.041115 2.20x
genomicranges 612.985873 613.52087 613.229997 0.13x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.829478 1.838981 1.834999 15.44x
bioframe 28.29136 28.361417 28.326821 1.00x
pyranges0 18.611247 20.021441 19.473105 1.45x
pyranges1 22.118838 22.210733 22.161329 1.28x
pybedtools 74.477086 74.868659 74.618066 0.38x
genomicranges 623.865655 623.94955 623.896645 0.05x
100-1p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.004992 0.01456 0.008429 0.47x
bioframe 0.003258 0.005112 0.00392 1.00x
pyranges0 0.002368 0.003408 0.002777 1.41x
pyranges1 0.005606 0.006547 0.005975 0.66x
pybedtools 0.005909 0.006483 0.006194 0.63x
genomicranges 0.003124 0.003404 0.003233 1.21x
pygenomics 0.000777 0.000879 0.000818 4.79x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.039758 0.157059 0.082499 0.06x
bioframe 0.004496 0.005139 0.004808 1.00x
pyranges1 0.005232 0.006285 0.005613 0.86x
pybedtools 0.002655 0.002957 0.002758 1.74x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.004349 0.072032 0.027109 0.16x
bioframe 0.003966 0.004761 0.004247 1.00x
pyranges0 0.002885 0.00314 0.002973 1.43x
pyranges1 0.004525 0.004943 0.004694 0.90x
pybedtools 0.002502 0.002934 0.0027 1.57x
genomicranges 0.003229 0.003376 0.003278 1.30x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.004251 0.062087 0.0237 0.37x
bioframe 0.007449 0.011114 0.008755 1.00x
pyranges1 0.010586 0.012078 0.011134 0.79x
pybedtools 0.002555 0.002829 0.002686 3.26x
1000-1p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.005234 0.008876 0.006523 0.77x
bioframe 0.004581 0.005864 0.005016 1.00x
pyranges0 0.003191 0.003455 0.003296 1.52x
pyranges1 0.008031 0.008103 0.008074 0.62x
pybedtools 0.053782 0.054005 0.053929 0.09x
genomicranges 0.032026 0.032674 0.032265 0.16x
pygenomics 0.010626 0.01142 0.010918 0.46x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.005791 0.006665 0.006123 1.01x
bioframe 0.005982 0.006628 0.00621 1.00x
pyranges0 0.006279 0.006752 0.006447 0.96x
pyranges1 0.009039 0.009504 0.009217 0.67x
pybedtools 0.007826 0.007978 0.007917 0.78x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.004343 0.007692 0.005481 0.98x
bioframe 0.005139 0.005735 0.005359 1.00x
pyranges1 0.005589 0.005976 0.005719 0.94x
pybedtools 0.01436 0.014635 0.014456 0.37x
genomicranges 0.032931 0.03307 0.033016 0.16x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.004259 0.005305 0.004947 2.08x
bioframe 0.009969 0.010782 0.010297 1.00x
pyranges1 0.011982 0.012304 0.012103 0.85x
pybedtools 0.014775 0.015246 0.014956 0.69x
10000-1p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.01015 0.018572 0.013027 1.84x
bioframe 0.02268 0.025605 0.023968 1.00x
pyranges0 0.016065 0.018936 0.017143 1.40x
pyranges1 0.030509 0.031181 0.030868 0.78x
pybedtools 1.335037 1.358509 1.345311 0.02x
genomicranges 0.322956 0.326403 0.324169 0.07x
pygenomics 0.136783 0.141169 0.13853 0.17x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.009616 0.011293 0.010447 3.08x
bioframe 0.031761 0.032938 0.032167 1.00x
pyranges0 0.010939 0.011387 0.01109 2.90x
pyranges1 0.015275 0.015676 0.015419 2.09x
pybedtools 0.059244 0.059899 0.059542 0.54x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.007028 0.007725 0.007363 2.50x
bioframe 0.018051 0.019179 0.018436 1.00x
pyranges1 0.014252 0.014683 0.014423 1.28x
pybedtools 0.926946 1.012523 0.973852 0.02x
genomicranges 0.330064 0.33175 0.331123 0.06x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.005994 0.006967 0.006396 2.78x
bioframe 0.017402 0.018389 0.017779 1.00x
pyranges1 0.022651 0.023034 0.022779 0.78x
pybedtools 0.952175 1.000698 0.97678 0.02x
100000-1p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.059271 0.08788 0.070483 3.02x
bioframe 0.209074 0.218512 0.21252 1.00x
pyranges0 0.144653 0.164863 0.151749 1.40x
pyranges1 0.228314 0.247017 0.234636 0.91x
pybedtools 19.263571 19.313483 19.286741 0.01x
genomicranges 3.290473 3.294306 3.291987 0.06x
pygenomics 1.881858 1.924059 1.896222 0.11x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.054573 0.060741 0.057958 6.31x
bioframe 0.363422 0.368554 0.365524 1.00x
pyranges0 0.062446 0.06448 0.06321 5.78x
pyranges1 0.084614 0.086633 0.085545 4.27x
pybedtools 0.570352 0.57555 0.572301 0.64x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.034675 0.047883 0.039593 7.85x
bioframe 0.309819 0.311936 0.310958 1.00x
pyranges1 0.113469 0.114316 0.113866 2.73x
pybedtools 15.265868 16.802575 16.206183 0.02x
genomicranges 3.369224 3.374411 3.371411 0.09x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.02543 0.026808 0.026079 4.08x
bioframe 0.104575 0.1096 0.106393 1.00x
pyranges1 0.147505 0.151673 0.149512 0.71x
pybedtools 16.382024 17.619212 16.802475 0.01x
1000000-1p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.947673 1.10915 1.003667 2.49x
bioframe 2.490142 2.513556 2.499533 1.00x
pyranges0 2.119717 2.178453 2.148959 1.16x
pyranges1 3.274957 3.298976 3.288601 0.76x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.771199 0.783292 0.777746 6.96x
bioframe 5.394265 5.434618 5.411728 1.00x
pyranges0 0.874484 0.932857 0.901145 6.01x
pyranges1 1.127032 1.149141 1.140538 4.74x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.501775 0.530884 0.514401 8.01x
bioframe 4.117035 4.131015 4.121744 1.00x
pyranges1 1.583204 1.678121 1.631619 2.53x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.247626 0.266565 0.256522 4.86x
bioframe 1.243608 1.250394 1.246153 1.00x
pyranges1 1.916323 2.005555 1.949487 0.64x
10000000-1p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 18.531273 18.806266 18.632293 1.62x
bioframe 30.074841 30.116671 30.097846 1.00x
pyranges0 29.579651 30.536834 29.904783 1.01x
pyranges1 42.196037 42.278681 42.232728 0.71x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 14.43136 14.548645 14.496101 5.56x
bioframe 80.443039 80.705181 80.548879 1.00x
pyranges0 13.64936 14.330292 13.882901 5.80x
pyranges1 17.384461 17.654503 17.561143 4.59x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 9.812223 9.925855 9.856571 6.23x
bioframe 61.348815 61.558393 61.444649 1.00x
pyranges1 24.969282 25.069392 25.029806 2.45x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 3.188742 3.31096 3.235061 6.11x
bioframe 19.748385 19.802079 19.778009 1.00x
pyranges1 30.304058 30.446378 30.353857 0.65x

Parallel performance

Results for parallel operations with 1, 2, 4, 6 and 8 threads.
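In the tables below, the Speedup column is consistent with dividing the single-thread mean time by the mean time of each configuration. The short check below reproduces it for the apple-m3-max, 8-7-8p overlap rows; the function name is illustrative.

```python
def speedup(baseline_mean, mean):
    """Speedup of a run relative to the baseline mean time."""
    return baseline_mean / mean


# Mean times (s) copied from the apple-m3-max, 8-7-8p, overlap table.
baseline = 3.370889  # polars_bio, single thread
means = {2: 1.811417, 4: 1.147355, 6: 0.962915, 8: 0.701048}

for threads, mean in means.items():
    print(f"polars_bio-{threads}: {speedup(baseline, mean):.2f}x")
```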

apple-m3-max

8-7-8p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 3.247022 3.803021 3.370889 1.00x
polars_bio-2 1.798569 1.848162 1.811417 1.86x
polars_bio-4 1.140229 1.158243 1.147355 2.94x
polars_bio-6 0.959703 0.968725 0.962915 3.50x
polars_bio-8 0.694637 0.710492 0.701048 4.81x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 2.186354 2.248171 2.220822 1.00x
polars_bio-2 1.162969 1.222115 1.187505 1.87x
polars_bio-4 0.708508 0.735763 0.720115 3.08x
polars_bio-6 0.632877 0.652955 0.642816 3.45x
polars_bio-8 0.456674 0.476473 0.465284 4.77x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.502551 1.534006 1.515078 1.00x
polars_bio-2 0.811236 0.821365 0.815682 1.86x
polars_bio-4 0.440628 0.46778 0.455358 3.33x
polars_bio-6 0.331317 0.338207 0.334638 4.53x
polars_bio-8 0.280465 0.282707 0.281311 5.39x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.181806 1.185549 1.183889 1.00x
polars_bio-2 0.644288 0.645076 0.644587 1.84x
polars_bio-4 0.362752 0.363411 0.363036 3.26x
polars_bio-6 0.258583 0.272702 0.264111 4.48x
polars_bio-8 0.222888 0.234884 0.229052 5.17x
1000000-8p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.468442 0.523065 0.494609 1.00x
polars_bio-2 0.262861 0.26828 0.265028 1.87x
polars_bio-4 0.1629 0.166657 0.164536 3.01x
polars_bio-6 0.137724 0.146893 0.143772 3.44x
polars_bio-8 0.111952 0.11465 0.113521 4.36x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.393067 0.415076 0.404032 1.00x
polars_bio-2 0.234559 0.235746 0.235051 1.72x
polars_bio-4 0.158996 0.167352 0.16349 2.47x
polars_bio-6 0.14634 0.14935 0.148215 2.73x
polars_bio-8 0.125472 0.128158 0.126606 3.19x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.267875 0.296727 0.277677 1.00x
polars_bio-2 0.163662 0.170045 0.165917 1.67x
polars_bio-4 0.111136 0.114835 0.112891 2.46x
polars_bio-6 0.097944 0.104607 0.101477 2.74x
polars_bio-8 0.099474 0.117493 0.106059 2.62x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.128377 0.131261 0.129598 1.00x
polars_bio-2 0.081762 0.085104 0.08324 1.56x
polars_bio-4 0.064151 0.066197 0.064851 2.00x
polars_bio-6 0.066926 0.06892 0.06768 1.91x
polars_bio-8 0.072767 0.074339 0.073589 1.76x
10000000-8p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 9.081732 9.388126 9.203018 1.00x
polars_bio-2 4.696455 4.912478 4.793254 1.92x
polars_bio-4 2.885023 2.902893 2.896218 3.18x
polars_bio-6 2.196605 2.217945 2.209839 4.16x
polars_bio-8 1.813586 1.860947 1.833498 5.02x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 7.299887 7.659385 7.495962 1.00x
polars_bio-2 4.01928 4.158504 4.069511 1.84x
polars_bio-4 2.683383 2.720981 2.704975 2.77x
polars_bio-6 2.141075 2.162109 2.150595 3.49x
polars_bio-8 1.859186 1.865634 1.862653 4.02x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 5.30938 5.450332 5.381068 1.00x
polars_bio-2 2.893766 2.91378 2.906401 1.85x
polars_bio-4 1.748771 1.797485 1.768895 3.04x
polars_bio-6 1.352671 1.385655 1.369312 3.93x
polars_bio-8 1.178559 1.199971 1.192577 4.51x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.638818 1.678156 1.655573 1.00x
polars_bio-2 0.994195 0.996554 0.995701 1.66x
polars_bio-4 0.678722 0.701234 0.689151 2.40x
polars_bio-6 0.620289 0.662175 0.639026 2.59x
polars_bio-8 0.570659 0.582937 0.57688 2.87x

gcp-linux

8-7-8p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 6.325617 8.185275 7.005925 1.00x
polars_bio-2 3.920645 4.617084 4.198055 1.67x
polars_bio-4 3.036273 3.060781 3.0452 2.30x
polars_bio-6 2.127994 2.134505 2.131016 3.29x
polars_bio-8 1.731485 1.789347 1.752986 4.00x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 4.047329 4.439016 4.198198 1.00x
polars_bio-2 2.624132 2.722843 2.682361 1.57x
polars_bio-4 1.809028 1.917798 1.871763 2.24x
polars_bio-6 1.309557 1.362131 1.333989 3.15x
polars_bio-8 1.066945 1.113168 1.087907 3.86x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 2.426441 2.456318 2.439266 1.00x
polars_bio-2 1.22516 1.272066 1.245401 1.96x
polars_bio-4 0.711421 0.744023 0.724315 3.37x
polars_bio-6 0.563797 0.607321 0.580574 4.20x
polars_bio-8 0.459308 0.493886 0.479126 5.09x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 2.212958 2.23035 2.222531 1.00x
polars_bio-2 1.132056 1.15405 1.146413 1.94x
polars_bio-4 0.645737 0.661564 0.652277 3.41x
polars_bio-6 0.50589 0.511256 0.50839 4.37x
polars_bio-8 0.439503 0.450924 0.447075 4.97x
1000000-8p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.903831 1.098046 0.974229 1.00x
polars_bio-2 0.50099 0.512259 0.504852 1.93x
polars_bio-4 0.300453 0.328605 0.318188 3.06x
polars_bio-6 0.257792 0.278203 0.268718 3.63x
polars_bio-8 0.22321 0.243244 0.230621 4.22x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.758815 0.78919 0.769877 1.00x
polars_bio-2 0.465192 0.47484 0.468824 1.64x
polars_bio-4 0.332101 0.336953 0.334461 2.30x
polars_bio-6 0.276071 0.29266 0.281794 2.73x
polars_bio-8 0.237269 0.263256 0.254046 3.03x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.496938 0.517659 0.505043 1.00x
polars_bio-2 0.295325 0.313859 0.302686 1.67x
polars_bio-4 0.194371 0.20433 0.200853 2.51x
polars_bio-6 0.175505 0.181913 0.178222 2.83x
polars_bio-8 0.15672 0.163036 0.160701 3.14x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.245895 0.250118 0.247479 1.00x
polars_bio-2 0.167378 0.173578 0.171251 1.45x
polars_bio-4 0.122749 0.126635 0.124491 1.99x
polars_bio-6 0.11385 0.119157 0.116185 2.13x
polars_bio-8 0.108127 0.110327 0.10942 2.26x
10000000-8p
overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 18.215782 19.091392 18.510207 1.00x
polars_bio-2 9.399565 9.680242 9.566631 1.93x
polars_bio-4 5.303647 5.555487 5.442898 3.40x
polars_bio-6 4.022274 4.066371 4.051045 4.57x
polars_bio-8 3.369559 3.416123 3.388564 5.46x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 14.325736 14.444885 14.39027 1.00x
polars_bio-2 8.095907 8.178189 8.136852 1.77x
polars_bio-4 5.096407 5.15379 5.122893 2.81x
polars_bio-6 3.986362 4.205706 4.128561 3.49x
polars_bio-8 3.491618 3.711814 3.577309 4.02x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 9.701679 9.796148 9.740035 1.00x
polars_bio-2 5.346433 5.399117 5.370757 1.81x
polars_bio-4 3.150557 3.203458 3.178719 3.06x
polars_bio-6 2.485947 2.56386 2.52768 3.85x
polars_bio-8 2.156472 2.176608 2.163483 4.50x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 3.091184 3.216964 3.135982 1.00x
polars_bio-2 1.998423 2.041581 2.01331 1.56x
polars_bio-4 1.412483 1.45218 1.426102 2.20x
polars_bio-6 1.281432 1.328666 1.301256 2.41x
polars_bio-8 1.176944 1.193294 1.18414 2.65x
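
The Speedup column in the tables above is consistent with dividing the mean time of the baseline `polars_bio` row by the mean time of each configuration. The sketch below recomputes it for the 10000000-8p overlap table and also derives a parallel efficiency figure. It assumes that the `-N` suffix denotes the number of threads; the helper names `speedup` and `efficiency` are illustrative and not part of polars-bio.

```python
# Mean wall-clock times (s) copied from the 10000000-8p overlap table,
# keyed by the assumed thread count (polars_bio = 1, polars_bio-N = N).
mean_s = {1: 18.510207, 2: 9.566631, 4: 5.442898, 6: 4.051045, 8: 3.388564}


def speedup(baseline_s: float, time_s: float) -> float:
    """Speedup relative to the baseline run."""
    return baseline_s / time_s


def efficiency(baseline_s: float, time_s: float, threads: int) -> float:
    """Fraction of ideal linear scaling that was achieved."""
    return speedup(baseline_s, time_s) / threads


for threads, time_s in mean_s.items():
    s = speedup(mean_s[1], time_s)
    e = efficiency(mean_s[1], time_s, threads)
    print(f"{threads} thread(s): speedup {s:.2f}x, efficiency {e:.0%}")
```

An efficiency below 100% at higher thread counts is expected, since building the search structure and merging results are not perfectly parallel.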

End to end tests

Results of end-to-end tests that compute the overlap, nearest, coverage, and count-overlaps operations and save the results to a CSV file.

Note

For pyranges0, we were unable to export the results of the coverage and count-overlaps operations to a CSV file, so those results are not presented here.
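
To illustrate the shape of these measurements, the following is a minimal sketch of how Min/Max/Mean times and a peak-memory figure could be collected for an eager and a streaming CSV export. It is not the benchmark harness used here: it relies on a toy brute-force overlap and on Python's `tracemalloc`, which tracks only Python-level allocations, so the absolute numbers are not comparable to the tables below. The names `toy_overlaps`, `run_eager`, `run_streaming`, and `measure` are hypothetical.

```python
import csv
import os
import statistics
import tempfile
import time
import tracemalloc


def toy_overlaps(left, right):
    """Yield (l_start, l_end, r_start, r_end) for each overlapping pair (brute force)."""
    for l_start, l_end in left:
        for r_start, r_end in right:
            if l_start <= r_end and r_start <= l_end:  # closed-interval test
                yield l_start, l_end, r_start, r_end


def run_eager(left, right, path):
    """Materialize every result row in memory, then write the CSV file."""
    rows = list(toy_overlaps(left, right))
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)
    return len(rows)


def run_streaming(left, right, path):
    """Write each row as it is produced, never holding the full result."""
    count = 0
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        for row in toy_overlaps(left, right):
            writer.writerow(row)
            count += 1
    return count


def measure(fn, *args, repeats=3):
    """Return (min_s, max_s, mean_s, peak_bytes) over `repeats` runs."""
    times, peak = [], 0
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
        peak = max(peak, tracemalloc.get_traced_memory()[1])
        tracemalloc.stop()
    return min(times), max(times), statistics.mean(times), peak


# Synthetic intervals; real benchmarks read the datasets from Parquet files.
left = [(i * 10, i * 10 + 15) for i in range(300)]
right = [(i * 7, i * 7 + 12) for i in range(300)]

with tempfile.TemporaryDirectory() as tmp:
    eager = measure(run_eager, left, right, os.path.join(tmp, "eager.csv"))
    stream = measure(run_streaming, left, right, os.path.join(tmp, "stream.csv"))

for name, (lo, hi, mean, peak) in [("eager", eager), ("streaming", stream)]:
    print(f"{name:10s} {lo:.6f} {hi:.6f} {mean:.6f} {peak / 1e6:.3f} MB")
```

The streaming variant keeps only one row in flight at a time, which is the same reason the `polars_bio_streaming` rows below tend to report a lower peak memory than `polars_bio`.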

apple-m3-max

1-2
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.042378 0.130957 0.071929 3.10x 285.468
polars_bio_streaming 0.035498 0.037438 0.036653 6.09x 274.093
bioframe 0.208548 0.251457 0.223219 1.00x 300.75
pyranges0 0.409707 0.415361 0.412135 0.54x 329.968
pyranges1 0.47518 0.491508 0.482739 0.46x 324.468
e2e-nearest-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.053349 0.058382 0.055362 9.14x 321.062
polars_bio_streaming 0.051385 0.053979 0.052764 9.59x 311.422
bioframe 0.503887 0.510257 0.506123 1.00x 316.969
pyranges0 1.135469 1.183369 1.151801 0.44x 364.594
pyranges1 1.327935 1.334101 1.331346 0.38x 357.734
e2e-coverage-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.034756 0.038718 0.036421 13.40x 290.078
polars_bio_streaming 0.03607 0.037332 0.036534 13.35x 274.344
bioframe 0.48449 0.492328 0.487891 1.00x 419.312
pyranges1 0.971084 0.980085 0.975012 0.50x 407.562
e2e-count-overlaps-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.03452 0.037714 0.035927 9.27x 294.266
polars_bio_streaming 0.035863 0.036756 0.036414 9.14x 278.438
bioframe 0.328145 0.338734 0.332951 1.00x 306.234
pyranges1 0.532739 0.544914 0.538646 0.62x 328.328
8-7
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 22.781745 23.916568 23.161559 16.64x 14677.0468
polars_bio_streaming 18.501279 18.797602 18.676707 20.63x 555.109
bioframe 383.108514 387.500069 385.309331 1.00x 33806.062
pyranges0 276.421312 279.839508 277.845198 1.39x 29777.312
pyranges1 355.703878 367.680249 360.875151 1.07x 34526.859
e2e-nearest-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 2.597955 2.760537 2.674482 32.02x 1060.031
polars_bio_streaming 2.65088 2.685157 2.665171 32.13x 560.453
bioframe 85.238305 86.131916 85.644961 1.00x 6894.062
pyranges0 13.530549 13.705834 13.620471 6.29x 3031.797
pyranges1 16.290782 16.385961 16.322671 5.25x 3509.984
e2e-coverage-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 1.523833 1.555472 1.541038 21.41x 717.984
polars_bio_streaming 1.336613 1.397324 1.364051 24.19x 411.703
bioframe 32.294844 33.421618 32.99334 1.00x 16651.922
pyranges1 26.382409 27.382901 27.020202 1.22x 6119.125
e2e-count-overlaps-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 1.806838 1.845584 1.82594 54.33x 729.078
polars_bio_streaming 1.681187 1.767811 1.714943 57.85x 416.094
bioframe 97.91802 101.736351 99.210461 1.00x 23029.219
pyranges1 19.498264 19.676838 19.561322 5.07x 5270.234
100-1p
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.009118 0.077181 0.032054 1.55x 248.594
polars_bio_streaming 0.003382 0.004769 0.003853 12.92x 247.562
bioframe 0.030154 0.088667 0.049769 1.00x 231.641
pyranges0 0.045764 0.051035 0.047857 1.04x 228.516
pyranges1 0.053751 0.072545 0.060221 0.83x 228.609
e2e-nearest-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.009145 0.038799 0.019201 2.24x 253.156
polars_bio_streaming 0.003964 0.005051 0.004504 9.53x 248.188
bioframe 0.033372 0.061107 0.042931 1.00x 229.906
pyranges0 0.049586 0.057381 0.052364 0.82x 231.812
pyranges1 0.054496 0.059205 0.056362 0.76x 231.688
e2e-coverage-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.005492 0.020652 0.012584 5.20x 245.578
polars_bio_streaming 0.003059 0.003746 0.003397 19.25x 243.5
bioframe 0.060684 0.074157 0.065378 1.00x 230.953
pyranges1 0.093668 0.096265 0.094567 0.69x 243.5
e2e-count-overlaps-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.005291 0.008843 0.006568 5.53x 249.406
polars_bio_streaming 0.003279 0.003697 0.003447 10.53x 245.672
bioframe 0.032914 0.042309 0.036302 1.00x 234.141
pyranges1 0.045085 0.045477 0.045224 0.80x 232.703
10000000-1p
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 11.109423 11.871893 11.397992 10.38x 7064.312
polars_bio_streaming 12.049206 12.327491 12.191582 9.71x 1505.109
bioframe 117.701516 119.51073 118.356016 1.00x 16380.234
pyranges0 235.484308 243.216406 239.726101 0.49x 14245.203
pyranges1 109.722359 112.326873 111.23273 1.06x 19423.172
e2e-nearest-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 7.842314 8.84828 8.510181 21.04x 2301.0
polars_bio_streaming 7.589706 8.153016 7.842404 22.83x 1327.531
bioframe 174.790383 183.458906 179.035999 1.00x 10996.234
pyranges0 32.793505 32.826686 32.809101 5.46x 4882.656
pyranges1 18.866156 19.570609 19.142653 9.35x 5253.281
e2e-coverage-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 1.901833 1.957711 1.928367 12.15x 956.844
polars_bio_streaming 1.797332 1.802527 1.800497 13.01x 651.266
bioframe 23.269774 23.55838 23.430125 1.00x 6493.234
pyranges1 26.370249 27.172173 26.879266 0.87x 10397.531
e2e-count-overlaps-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 5.025462 5.234103 5.129963 20.79x 1036.734
polars_bio_streaming 4.956087 5.076052 5.014242 21.27x 968.719
bioframe 105.322287 107.758078 106.64158 1.00x 12803.828
pyranges1 22.079391 23.069931 22.618209 4.71x 10039.297

gcp-linux

1-2
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.072393 0.151871 0.09916 2.80x 314.234
polars_bio_streaming 0.064092 0.067914 0.066202 4.19x 288.621
bioframe 0.258278 0.31288 0.277225 1.00x 287.101
pyranges0 0.591745 0.599954 0.595204 0.47x 307.218
pyranges1 0.683388 0.702289 0.690362 0.40x 327.863
e2e-nearest-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.123656 1.702108 0.659015 1.99x 331.398
polars_bio_streaming 0.111801 0.762227 0.328874 3.98x 308.738
bioframe 0.881782 2.161628 1.309551 1.00x 297.695
pyranges0 1.728053 2.579086 2.030527 0.64x 308.93
pyranges1 1.953048 2.161655 2.049615 0.64x 337.352
e2e-coverage-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.064626 0.146983 0.093048 8.11x 299.094
polars_bio_streaming 0.065193 0.072839 0.068651 10.99x 280.387
bioframe 0.704155 0.791049 0.754463 1.00x 328.184
pyranges1 1.41166 1.432833 1.42261 0.53x 352.582
e2e-count-overlaps-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.063012 0.207803 0.113156 4.03x 309.176
polars_bio_streaming 0.062735 0.071474 0.065886 6.92x 286.336
bioframe 0.436935 0.491839 0.455688 1.00x 303.07
pyranges1 0.785823 0.786847 0.786487 0.58x 316.227
8-7
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 44.539766 45.543038 45.196903 12.55x 14575.14
polars_bio_streaming 34.007093 35.972075 35.309756 16.06x 480.207
bioframe 566.167037 567.617695 567.13069 1.00x 43295.378
pyranges0 417.291061 421.875539 419.571591 1.35x 22915.917
pyranges1 538.365637 548.624613 543.918168 1.04x 43408.699
e2e-nearest-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 6.565142 7.696104 6.973448 12.21x 1070.016
polars_bio_streaming 5.840416 6.828222 6.203 13.73x 527.008
bioframe 84.30831 86.512823 85.150539 1.00x 2418.629
pyranges0 20.679566 21.424632 20.949203 4.06x 2239.047
pyranges1 25.352803 27.604137 26.544063 3.21x 2534.629
e2e-coverage-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 3.532309 3.660049 3.579428 12.53x 738.887
polars_bio_streaming 3.167344 3.169622 3.168694 14.15x 416.164
bioframe 41.150587 51.89725 44.839673 1.00x 14297.098
pyranges1 40.065526 41.350493 40.892187 1.10x 3096.812
e2e-count-overlaps-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 3.827717 3.83517 3.830823 25.47x 737.059
polars_bio_streaming 3.346898 3.388796 3.372987 28.93x 428.422
bioframe 97.272988 97.790775 97.572564 1.00x 25981.051
pyranges1 30.021737 30.181438 30.124339 3.24x 3102.84
100-1p
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.012022 0.433918 0.153077 0.55x 262.656
polars_bio_streaming 0.007427 0.153294 0.056144 1.49x 259.039
bioframe 0.039406 0.172494 0.08386 1.00x 229.824
pyranges0 0.059086 0.075573 0.06466 1.30x 231.199
pyranges1 0.069077 0.088036 0.075488 1.11x 230.684
e2e-nearest-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.012814 0.408332 0.145315 1.24x 263.16
polars_bio_streaming 0.007605 0.007975 0.00779 23.20x 260.242
bioframe 0.044222 0.45263 0.180742 1.00x 230.684
pyranges0 0.066032 0.074886 0.068992 2.62x 231.195
pyranges1 0.07111 0.075383 0.072851 2.48x 230.68
e2e-coverage-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.008954 0.106466 0.041616 2.35x 258.184
polars_bio_streaming 0.006726 0.007524 0.007124 13.72x 255.258
bioframe 0.07866 0.135591 0.097742 1.00x 230.68
pyranges1 0.120404 0.12302 0.121487 0.80x 231.023
e2e-count-overlaps-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.008886 0.095835 0.038506 1.53x 262.312
polars_bio_streaming 0.006988 0.008915 0.007637 7.71x 259.555
bioframe 0.043766 0.08625 0.058895 1.00x 230.852
pyranges1 0.060574 0.060725 0.06064 0.97x 231.195
10000000-1p
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 35.783773 37.155414 36.274578 4.73x 6926.227
polars_bio_streaming 28.364026 33.834447 32.005189 5.37x 1172.484
bioframe 170.321558 173.826371 171.750864 1.00x 17544.5
pyranges0 374.384106 377.106338 375.972726 0.46x 12951.133
pyranges1 174.205234 176.465726 174.996859 0.98x 23198.973
e2e-nearest-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 16.577973 16.599008 16.591114 5.88x 2208.477
polars_bio_streaming 14.957252 15.214483 15.115055 6.45x 1202.555
bioframe 96.638005 98.21701 97.559755 1.00x 7832.391
pyranges0 54.678927 55.140916 54.905002 1.78x 3125.051
pyranges1 31.874441 33.028755 32.303297 3.02x 4447.332
e2e-coverage-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 4.184608 4.403694 4.281924 6.97x 768.75
polars_bio_streaming 3.792566 3.917329 3.838723 7.77x 591.812
bioframe 29.609636 29.972788 29.838014 1.00x 4040.508
pyranges1 41.671912 42.238756 41.949904 0.71x 8503.844
e2e-count-overlaps-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 11.003474 11.126526 11.052697 6.35x 1012.105
polars_bio_streaming 9.939793 10.434084 10.264927 6.83x 696.078
bioframe 70.00646 70.300308 70.14716 1.00x 8315.176
pyranges1 33.521685 33.726979 33.593637 2.09x 8495.672

Memory profiles

apple-m3-max

1-2

Operation: overlap for dataset: 1-2 on platform: apple-m3-max

Operation: nearest for dataset: 1-2 on platform: apple-m3-max

Operation: coverage for dataset: 1-2 on platform: apple-m3-max

Operation: count-overlaps for dataset: 1-2 on platform: apple-m3-max

8-7

Operation: overlap for dataset: 8-7 on platform: apple-m3-max

Operation: nearest for dataset: 8-7 on platform: apple-m3-max

Operation: coverage for dataset: 8-7 on platform: apple-m3-max

Operation: count-overlaps for dataset: 8-7 on platform: apple-m3-max

100-1p

Operation: overlap for dataset: 100-1p on platform: apple-m3-max

Operation: nearest for dataset: 100-1p on platform: apple-m3-max

Operation: coverage for dataset: 100-1p on platform: apple-m3-max

Operation: count-overlaps for dataset: 100-1p on platform: apple-m3-max

10000000-1p

Operation: overlap for dataset: 10000000-1p on platform: apple-m3-max

Operation: nearest for dataset: 10000000-1p on platform: apple-m3-max

Operation: coverage for dataset: 10000000-1p on platform: apple-m3-max

Operation: count-overlaps for dataset: 10000000-1p on platform: apple-m3-max

gcp-linux

1-2

Operation: overlap for dataset: 1-2 on platform: gcp-linux

Operation: nearest for dataset: 1-2 on platform: gcp-linux

Operation: coverage for dataset: 1-2 on platform: gcp-linux

Operation: count-overlaps for dataset: 1-2 on platform: gcp-linux

8-7

Operation: overlap for dataset: 8-7 on platform: gcp-linux

Operation: nearest for dataset: 8-7 on platform: gcp-linux

Operation: coverage for dataset: 8-7 on platform: gcp-linux

Operation: count-overlaps for dataset: 8-7 on platform: gcp-linux

100-1p

Operation: overlap for dataset: 100-1p on platform: gcp-linux

Operation: nearest for dataset: 100-1p on platform: gcp-linux

Operation: coverage for dataset: 100-1p on platform: gcp-linux

Operation: count-overlaps for dataset: 100-1p on platform: gcp-linux

10000000-1p

Operation: overlap for dataset: 10000000-1p on platform: gcp-linux

Operation: nearest for dataset: 10000000-1p on platform: gcp-linux

Operation: coverage for dataset: 10000000-1p on platform: gcp-linux

Operation: count-overlaps for dataset: 10000000-1p on platform: gcp-linux

Comparison of the output schemas and data types

polars-bio tries to preserve the output schema of the bioframe package, whereas pyranges uses its own internal representation that can be converted to a Pandas DataFrame. pyranges always uses int64 to represent start/end positions, while polars-bio and bioframe choose the type adaptively based on the input file formats or DataFrame data types. Note that polars-bio does not support interval operations on chromosomes longer than 2 Gbp (issue). However, in the analyzed test case (8-7), the input and output data structures have similar memory requirements. Compare the following schemas and memory size estimates of the input and output DataFrames for the 8-7 test case:

import bioframe as bf
import polars_bio as pb
import pandas as pd
import polars as pl
import pyranges0 as pr0


DATA_DIR="/Users/mwiewior/research/polars-bio-benchmarking/data/"
df_1 = f"{DATA_DIR}/ex-anno/*.parquet"
df_2 = f"{DATA_DIR}/ex-rna/*.parquet"
df1 = pd.read_parquet(df_1.replace("*.parquet", ""))
df2 = pd.read_parquet(df_2.replace("*.parquet", ""))
cols = ["contig", "pos_start", "pos_end"]

def df2pr0(df):
    return pr0.PyRanges(
        chromosomes=df.contig,
        starts=df.pos_start,
        ends=df.pos_end,
    )

Input dataset sizes and schemas

df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1194285 entries, 0 to 1194284
Data columns (total 3 columns):
 #   Column     Non-Null Count    Dtype
---  ------     --------------    -----
 0   contig     1194285 non-null  object
 1   pos_start  1194285 non-null  int32
 2   pos_end    1194285 non-null  int32
dtypes: int32(2), object(1)
memory usage: 18.2+ MB

df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9944559 entries, 0 to 9944558
Data columns (total 3 columns):
 #   Column     Dtype
---  ------     -----
 0   contig     object
 1   pos_start  int32
 2   pos_end    int32
dtypes: int32(2), object(1)
memory usage: 151.7+ MB

polars-bio output DataFrame schema and memory used (Polars and Pandas)

df_pb = pb.overlap(df_1, df_2, cols1=cols, cols2=cols, use_zero_based=True)
df_pb.count().collect()
307184634

df_pb.collect_schema()
Schema([('contig_1', String),
        ('pos_start_1', Int32),
        ('pos_end_1', Int32),
        ('contig_2', String),
        ('pos_start_2', Int32),
        ('pos_end_2', Int32)])

df_pb.collect().estimated_size("mb")
7360.232946395874
df_pb.collect().to_pandas().info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307184634 entries, 0 to 307184633
Data columns (total 6 columns):
 #   Column       Dtype
---  ------       -----
 0   contig_1     object
 1   pos_start_1  int32
 2   pos_end_1    int32
 3   contig_2     object
 4   pos_start_2  int32
 5   pos_end_2    int32
dtypes: int32(4), object(2)
memory usage: 9.2+ GB
bioframe output DataFrame schema and memory used (Pandas)

df_bf = bf.overlap(df1, df2, cols1=cols, cols2=cols, how="inner")
len(df_bf)
307184634
df_bf.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307184634 entries, 0 to 307184633
Data columns (total 6 columns):
 #   Column      Dtype
---  ------      -----
 0   contig      object
 1   pos_start   int32
 2   pos_end     int32
 3   contig_     object
 4   pos_start_  int32
 5   pos_end_    int32
dtypes: int32(4), object(2)
memory usage: 9.2+ GB

pyranges0 output DataFrame schema and memory used (Pandas)

df_pr0_1 = df2pr0(df1)
df_pr0_2 = df2pr0(df2)
df_pr0_1.df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1194285 entries, 0 to 1194284
Data columns (total 3 columns):
 #   Column      Non-Null Count    Dtype
---  ------      --------------    -----
 0   Chromosome  1194285 non-null  category
 1   Start       1194285 non-null  int64
 2   End         1194285 non-null  int64
dtypes: category(1), int64(2)
memory usage: 19.4 MB
df_pr0_2.df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9944559 entries, 0 to 9944558
Data columns (total 3 columns):
 #   Column      Dtype
---  ------      -----
 0   Chromosome  category
 1   Start       int64
 2   End         int64
dtypes: category(1), int64(2)
memory usage: 161.2 MB

df_pr0 = df_pr0_1.join(df_pr0_2)
len(df_pr0)
307184634
df_pr0.df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307184634 entries, 0 to 307184633
Data columns (total 5 columns):
 #   Column      Dtype
---  ------      -----
 0   Chromosome  category
 1   Start       int64
 2   End         int64
 3   Start_b     int64
 4   End_b       int64
dtypes: category(1), int64(4)
memory usage: 9.4 GB

Note

Please note that pyranges, unlike bioframe and polars-bio, returns only one chromosome column, but it uses the int64 data type to encode start and end positions even if the input datasets use int32.
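
The memory figures reported by `DataFrame.info()` above can be approximately reproduced with simple arithmetic. The sketch below makes several assumptions: pandas counts 8 bytes per `object` value (the pointer only, which is why a `+` is shown), category codes occupy 1 byte (a small number of chromosomes), index overhead is negligible, and sizes are printed in binary units (1 GB = 2^30 bytes). The helper `shallow_bytes` is hypothetical.

```python
MIB = 2**20
GIB = 2**30


def shallow_bytes(rows: int, column_widths: list[int]) -> int:
    """Rows times the summed per-value width (in bytes) of all columns."""
    return rows * sum(column_widths)


rows_df1 = 1_194_285      # df1 row count reported above
rows_out = 307_184_634    # overlap result row count reported above

# Input df1 in pandas: 1 object pointer (8 B) + 2 int32 (4 B each).
df1_pandas = shallow_bytes(rows_df1, [8, 4, 4])
# Input df1 in pyranges0: 1 category code (assumed 1 B) + 2 int64 (8 B each).
df1_pyranges = shallow_bytes(rows_df1, [1, 8, 8])
# bioframe / polars-bio output as pandas: 2 object pointers + 4 int32.
out_pandas = shallow_bytes(rows_out, [8, 4, 4, 8, 4, 4])
# pyranges0 output: 1 category code (assumed 1 B) + 4 int64.
out_pyranges = shallow_bytes(rows_out, [1, 8, 8, 8, 8])

print(f"df1 (pandas):       {df1_pandas / MIB:.1f} MB")
print(f"df1 (pyranges0):    {df1_pyranges / MIB:.1f} MB")
print(f"output (pandas):    {out_pandas / GIB:.1f} GB")
print(f"output (pyranges0): {out_pyranges / GIB:.1f} GB")
```

Under these assumptions, the single category column roughly offsets the cost of the wider int64 positions, which is consistent with the similar memory requirements observed for this test case.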