This document provides additional information about the algorithms, benchmarking setup, data, and results that were presented in the manuscript.
Algorithm description
polars-bio implements a set of binary interval operations on genomic ranges, such as overlap, nearest, count-overlaps, and coverage. All these operations share a very similar algorithmic structure, which is presented in the diagram below.
flowchart TB
%% Define header node
H["Interval operation"]
%% Define DataFrame nodes
I0["left DataFrame"]
I1["right DataFrame"]
style I0 stroke-dasharray: 5 5, stroke-width: 1
%% Draw edges with labels
H -->|probe /streaming/ side| I0
H -->|build /search structure/ side| I1
%% Record batches under left DataFrame within a dotted box
I0 --> LeftGroup
subgraph LeftGroup["Record Batches"]
direction TB
LB0["Batch 1"]
LB1["Batch 2"]
LB2["Batch 3"]
end
style LeftGroup stroke-dasharray: 5 5, stroke-width: 1
%% Record batches under right DataFrame within a dotted box
I1 --> RightGroup
subgraph RightGroup["Record Batches"]
direction TB
RB0["Batch 1"]
RB1["Batch 2"]
RB2["Batch 3"]
end
The basic concept is that each operation consists of two sides: the probe side and the build side. The probe side is streamed, while the build side is materialized as a search data structure. For the generic overlap operation, the search structure can be changed using the algorithm parameter; for the other operations it is always Cache Oblivious Interval Trees (COITrees), because according to the benchmark COITrees outperform the other data structures. The nearest operation additionally uses a sorted list of intervals to find the closest intervals when no overlap exists.
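The sorted-list fallback used by the nearest operation can be illustrated with a minimal sketch. It assumes the query overlaps none of the stored intervals and uses Python's `bisect` module on plain tuples; the function name `nearest_without_overlap` and the data are hypothetical and do not reflect the actual polars-bio internals.

```python
import bisect


def nearest_without_overlap(intervals, q_start, q_end):
    """Return (distance, interval) for the closest interval to the query.

    Assumes the query overlaps none of the intervals (the fallback case).
    """
    by_start = sorted(intervals)                      # ordered by start
    by_end = sorted(intervals, key=lambda iv: iv[1])  # ordered by end
    starts = [s for s, _ in by_start]
    ends = [e for _, e in by_end]

    candidates = []
    # First interval that starts after the query ends (right neighbour).
    i = bisect.bisect_right(starts, q_end)
    if i < len(by_start):
        candidates.append((starts[i] - q_end, by_start[i]))
    # Last interval that ends before the query starts (left neighbour).
    j = bisect.bisect_left(ends, q_start)
    if j > 0:
        candidates.append((q_start - ends[j - 1], by_end[j - 1]))
    return min(candidates) if candidates else None


hit = nearest_without_overlap([(1, 5), (20, 30), (100, 110)], 40, 50)
print(hit)
```

Both neighbours are found with a binary search, so the fallback costs O(log n) per query once the lists are sorted.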
Note
Available search structure implementations for overlap operation:
Once the build-side search structure is ready, records from the probe side, organized as record batches, are processed against it. Each record batch can be processed independently. Search structure nodes contain identifiers of the rows from the build side, which are then used to construct the new records returned as the result of the operation.
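The build/probe pattern described above can be sketched as follows. This is only an illustration: it swaps the COITrees search structure for a plain list sorted by start position, treats intervals as closed, and uses hypothetical names (`build_index`, `probe_batch`).

```python
def build_index(build_side):
    """Materialize the build side as (start, end, row_id), sorted by start."""
    return sorted((s, e, i) for i, (s, e) in enumerate(build_side))


def probe_batch(index, batch, offset):
    """Return (probe_row_id, build_row_id) pairs for one record batch."""
    pairs = []
    for j, (ps, pe) in enumerate(batch):
        for bs, be, row_id in index:
            if bs > pe:  # index is sorted by start: no later entry can overlap
                break
            if be >= ps:
                pairs.append((offset + j, row_id))
    return pairs


build = [(10, 20), (30, 40), (15, 35)]           # build side, fully in memory
batches = [[(12, 14), (50, 60)], [(36, 38)]]     # probe side, two record batches

index = build_index(build)
result, offset = [], 0
for batch in batches:                            # each batch is independent
    result.extend(probe_batch(index, batch, offset))
    offset += len(batch)
print(result)
```

The stored row identifiers are what allow the output records to be assembled from the original build-side rows after the search.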
Out-of-core (streaming) processing
This algorithm allows you to process your data without requiring all of it to be in memory at the same time. In particular, the probe side can be streamed from a file or from cloud storage, while the build side needs to be materialized in memory. In real applications, the probe side is usually a large file with genomic intervals, while the build side is a smaller file with annotations or other genomic features. This allows you to process large genomic datasets without running out of memory.
Note
In this sense, the order of the sides is important, as the probe side is streamed and processed in batches, while the build side is fully materialized in memory.
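A minimal sketch of this asymmetry, assuming a whitespace-separated text source and a hypothetical `read_batches` helper: the probe side is read lazily in fixed-size batches, while the small build side stays in memory for the whole run.

```python
import io
from itertools import islice


def read_batches(lines, batch_size):
    """Lazily yield batches of (start, end) tuples from an iterable of lines."""
    it = iter(lines)
    while batch := list(islice(it, batch_size)):
        yield [tuple(map(int, line.split())) for line in batch]


# Build side: small, fully materialized (hypothetical data).
build = [(10, 20), (30, 40), (15, 35)]

# Probe side: stands in for a large file that is never loaded at once.
probe_file = io.StringIO("12 14\n50 60\n36 38\n")

counts, largest_batch = [], 0
for batch in read_batches(probe_file, batch_size=2):
    largest_batch = max(largest_batch, len(batch))
    for s, e in batch:
        counts.append(sum(1 for bs, be in build if bs <= e and be >= s))
print(counts, largest_batch)
```

Peak memory here is bounded by the build side plus one batch, which is why the larger input should go on the probe side.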
Parallelization
In the current implementation, the probe side can be processed in parallel using multiple threads on implicitly or explicitly partitioned inputs (see partitioning strategies). The build side is predominantly single-threaded, with the notable exception of reading BGZF-compressed or partitioned Parquet/CSV input files, which can be parallelized.
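Because probe-side partitions are independent, they can be dispatched to a thread pool that shares one read-only build side. The sketch below only illustrates this structure with Python's `concurrent.futures`; the data and the `count_overlaps` helper are hypothetical, and pure-Python threads will not reproduce the speedups of the native implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Build side: constructed once, then shared read-only by all workers.
BUILD = [(10, 20), (30, 40), (15, 35)]


def count_overlaps(partition):
    """Count overlapping build-side intervals for each probe interval."""
    return [sum(1 for bs, be in BUILD if bs <= e and be >= s) for s, e in partition]


# Probe side: already split into independent partitions.
partitions = [[(12, 14), (50, 60)], [(36, 38)], [(0, 100)]]

with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(count_overlaps, partitions))  # keeps input order

sequential = [count_overlaps(p) for p in partitions]
print(parallel)
```

Since no partition depends on another, the parallel result matches the sequential one exactly.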
DefaultPhysicalPlanner and PhysicalOptimizerRule are used for detecting and rewriting generic interval join operations (i.e. overlap and nearest) with optimized execution strategies. This is implemented as part of our other project, sequila-native, which exposes optimized interval join operations for Apache DataFusion with both SQL and DataFrame APIs.
The table below compares polars-bio with other popular Python libraries for genomic ranges operations.
Feature/Library
polars-bio
Bioframe
PyRanges0
PyRanges1
pybedtools
PyGenomics
GenomicRanges
out-of-core processing
โ
โ
โ
โ
โ
โ
โ
parallel processing
โ
โ
โ 1
โ
โ
โ
โ
vectorized execution engine
โ
โ
โ
โ
โ
โ
โ
cloud object storage support
โ
โ /โ2
โ
โ
โ
โ
โ
Pandas/Polars DataFrame support
โ /โ
โ /โ
โ /โ3
โ /โ4
โ/โ
โ/โ
โ /โ
Note
1 PyRanges0 supports parallel processing with Ray, but it does not bring any performance benefit over single-threaded execution and is not recommended. The overlap and nearest operations benchmark (1, 2, 4, 6, and 8 threads) for 8-7 on the Apple M3 Max platform confirms this observation.
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| pyranges0 | 16.519153 | 17.889156 | 17.118936 | 1.00x |
| pyranges0-2 | 32.539549 | 34.858773 | 33.762477 | 0.51x |
| pyranges0-4 | 30.033927 | 30.367822 | 30.158362 | 0.57x |
| pyranges0-6 | 27.711752 | 33.280867 | 30.089641 | 0.57x |
| pyranges0-8 | 30.049501 | 33.257462 | 31.553328 | 0.54x |
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| pyranges0 | 1.580677 | 1.703093 | 1.630820 | 1.00x |
| pyranges0-2 | 3.954720 | 4.032619 | 3.997087 | 0.41x |
| pyranges0-4 | 3.716688 | 4.004058 | 3.847917 | 0.42x |
| pyranges0-6 | 3.853526 | 3.942475 | 3.883337 | 0.42x |
| pyranges0-8 | 3.861577 | 3.924950 | 3.902913 | 0.42x |
2 Some input functions, such as read_table, support cloud object storage.
3 Only export/import with data copying is supported
For memory profiling, the Python memory-profiler package, version 0.61.0, was used. A helper script, run-memory-profiler.py, was developed, and a sample invocation used to run the tests is presented in the snippet below:
The AIList dataset after transcoding into the Parquet file format (with the Snappy compression) was used for benchmarking.
This dataset was published with the AIList paper:
Jianglin Feng, Aakrosh Ratan, Nathan C Sheffield, Augmented Interval List: a novel data structure for efficient genomic interval search, Bioinformatics 2019.
1 bioframe and pyranges are zero-based, so use_zero_based=True (polars-bio >= 0.10.3) must be set in polars-bio to get the same results as in bioframe and pyranges.
2 The bioframe how parameter is set to inner (left by default).
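Why the coordinate convention changes results can be seen in a small sketch. It assumes the usual pairing of zero-based coordinates with half-open intervals and one-based coordinates with closed intervals; the helper functions are hypothetical and are not part of any of the benchmarked libraries.

```python
def overlaps_zero_based(a, b):
    """Overlap test for 0-based, half-open intervals [start, end)."""
    return a[0] < b[1] and b[0] < a[1]


def overlaps_one_based(a, b):
    """Overlap test for 1-based, closed intervals [start, end]."""
    return a[0] <= b[1] and b[0] <= a[1]


def one_to_zero_based(iv):
    """Convert a 1-based closed interval to 0-based half-open."""
    start, end = iv
    return (start - 1, end)


a, b = (1, 10), (10, 20)

# The same numbers give different answers under the two conventions.
as_zero_based = overlaps_zero_based(a, b)
as_one_based = overlaps_one_based(a, b)

# After an explicit conversion the two tests agree again.
converted = overlaps_zero_based(one_to_zero_based(a), one_to_zero_based(b))
print(as_zero_based, as_one_based, converted)
```

Intervals that merely touch are the typical case where mismatched conventions produce different overlap counts between libraries.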
Single thread results
Results for overlap, nearest, count-overlaps, and coverage operations with single-thread performance on apple-m3-max and gcp-linux platforms.
Note
Please note that in case of pyranges0 we were unable to compute the results of coverage and count-overlaps operations for macOS and Linux in the synthetic benchmark, so the results are not presented here.
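The Speedup column in the tables that follow appears to be the mean runtime of the baseline (bioframe, always 1.00x) divided by the mean runtime of each library. The sketch below reproduces this for three rows of the apple-m3-max 1-2 overlap table; the formula is inferred from the reported numbers, not taken from the benchmark code.

```python
def speedup(baseline_mean, library_mean):
    """Speedup of a library relative to the baseline (higher is faster)."""
    return baseline_mean / library_mean


# Mean runtimes in seconds (apple-m3-max, 1-2, overlap).
means = {"polars_bio": 0.0383, "bioframe": 0.103354, "pyranges0": 0.028001}

table = {lib: f"{speedup(means['bioframe'], m):.2f}x" for lib, m in means.items()}
print(table)
```

Values below 1.00x therefore mean that the library was slower than bioframe on that test case.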
apple-m3-max
1-2
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.035619 | 0.043113 | 0.0383 | 2.70x |
| bioframe | 0.102257 | 0.104425 | 0.103354 | 1.00x |
| pyranges0 | 0.025425 | 0.032821 | 0.028001 | 3.69x |
| pyranges1 | 0.059608 | 0.064147 | 0.061763 | 1.67x |
| pybedtools | 0.343204 | 0.352804 | 0.348434 | 0.30x |
| genomicranges | 1.042893 | 1.044245 | 1.043488 | 0.10x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.039943 | 0.045166 | 0.042109 | 4.45x |
| bioframe | 0.185452 | 0.189631 | 0.187388 | 1.00x |
| pyranges0 | 0.092334 | 0.09634 | 0.093688 | 2.00x |
| pyranges1 | 0.133631 | 0.134179 | 0.133981 | 1.40x |
| pybedtools | 0.756676 | 0.761866 | 0.75953 | 0.25x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.026706 | 0.029754 | 0.028142 | 4.69x |
| bioframe | 0.131124 | 0.133729 | 0.132052 | 1.00x |
| pyranges0 | 0.039136 | 0.039774 | 0.039377 | 3.35x |
| pyranges1 | 0.061976 | 0.063181 | 0.062658 | 2.11x |
| pybedtools | 0.665804 | 0.673844 | 0.668534 | 0.20x |
| genomicranges | 0.994963 | 1.006435 | 0.999389 | 0.13x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.0262 | 0.028749 | 0.027418 | 6.30x |
| bioframe | 0.16949 | 0.176628 | 0.172842 | 1.00x |
| pyranges0 | 0.07376 | 0.076708 | 0.075369 | 2.29x |
| pyranges1 | 0.128027 | 0.133263 | 0.130247 | 1.33x |
| pybedtools | 0.701817 | 0.708726 | 0.705839 | 0.24x |
| genomicranges | 1.032651 | 1.049059 | 1.040799 | 0.17x |
8-7
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 3.987391 | 4.648581 | 4.235518 | 7.17x |
| bioframe | 29.793837 | 30.991576 | 30.375518 | 1.00x |
| pyranges0 | 15.632212 | 15.974075 | 15.857213 | 1.92x |
| pyranges1 | 31.622804 | 33.699074 | 32.680701 | 0.93x |
| pybedtools | 916.711575 | 919.974811 | 918.154834 | 0.03x |
| genomicranges | 479.214112 | 487.832054 | 484.579554 | 0.06x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 2.116922 | 2.169534 | 2.139006 | 32.13x |
| bioframe | 68.581465 | 68.992651 | 68.725495 | 1.00x |
| pyranges0 | 1.381964 | 1.508513 | 1.424446 | 48.25x |
| pyranges1 | 2.697684 | 2.728407 | 2.717532 | 25.29x |
| pybedtools | 35.528719 | 35.876667 | 35.699544 | 1.93x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 1.445467 | 1.484052 | 1.46225 | 58.77x |
| bioframe | 85.632767 | 86.26148 | 85.935955 | 1.00x |
| pyranges0 | 9.674847 | 9.833233 | 9.753982 | 8.81x |
| pyranges1 | 10.170249 | 10.254359 | 10.201813 | 8.42x |
| pybedtools | 33.101592 | 33.966188 | 33.423595 | 2.57x |
| genomicranges | 488.972732 | 490.395787 | 489.548184 | 0.18x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 1.195279 | 1.205765 | 1.199323 | 20.45x |
| bioframe | 24.423391 | 24.682901 | 24.525909 | 1.00x |
| pyranges0 | 11.093644 | 11.328071 | 11.220416 | 2.19x |
| pyranges1 | 11.987003 | 12.147925 | 12.066045 | 2.03x |
| pybedtools | 59.699275 | 60.04087 | 59.84965 | 0.41x |
| genomicranges | 500.041974 | 503.31936 | 502.043072 | 0.05x |
100-1p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.002471 | 0.006262 | 0.003855 | 0.54x |
| bioframe | 0.001374 | 0.002735 | 0.002067 | 1.00x |
| pyranges0 | 0.000977 | 0.001952 | 0.001337 | 1.55x |
| pyranges1 | 0.002276 | 0.003591 | 0.002739 | 0.75x |
| pybedtools | 0.006856 | 0.010064 | 0.008032 | 0.26x |
| genomicranges | 0.001784 | 0.002115 | 0.001938 | 1.07x |
| pygenomics | 0.000475 | 0.000541 | 0.000509 | 4.06x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.002802 | 0.007312 | 0.004371 | 0.51x |
| bioframe | 0.00157 | 0.00347 | 0.002251 | 1.00x |
| pyranges0 | 0.00135 | 0.004085 | 0.002281 | 0.99x |
| pyranges1 | 0.002084 | 0.003622 | 0.002633 | 0.85x |
| pybedtools | 0.005288 | 0.023073 | 0.011717 | 0.19x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.001892 | 0.006355 | 0.003397 | 0.52x |
| bioframe | 0.001563 | 0.002165 | 0.001775 | 1.00x |
| pyranges1 | 0.00181 | 0.002209 | 0.001972 | 0.90x |
| pybedtools | 0.020892 | 0.062978 | 0.036866 | 0.05x |
| genomicranges | 0.001896 | 0.002057 | 0.001957 | 0.91x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.001911 | 0.006057 | 0.003343 | 1.03x |
| bioframe | 0.003065 | 0.00411 | 0.003452 | 1.00x |
| pyranges1 | 0.004455 | 0.005845 | 0.005021 | 0.69x |
| pybedtools | 0.02477 | 0.059532 | 0.037421 | 0.09x |
1000-1p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.00262 | 0.004367 | 0.003278 | 0.71x |
| bioframe | 0.001909 | 0.002988 | 0.002313 | 1.00x |
| pyranges0 | 0.001361 | 0.00182 | 0.001543 | 1.50x |
| pyranges1 | 0.002678 | 0.003166 | 0.002927 | 0.79x |
| pybedtools | 0.037238 | 0.039737 | 0.038453 | 0.06x |
| genomicranges | 0.019265 | 0.019945 | 0.01957 | 0.12x |
| pygenomics | 0.006876 | 0.006994 | 0.006949 | 0.33x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.003048 | 0.0083 | 0.00553 | 0.65x |
| bioframe | 0.003269 | 0.004119 | 0.003604 | 1.00x |
| pyranges0 | 0.002514 | 0.003506 | 0.003099 | 1.16x |
| pyranges1 | 0.003722 | 0.00418 | 0.003935 | 0.92x |
| pybedtools | 0.00881 | 0.011281 | 0.009729 | 0.37x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.001854 | 0.004714 | 0.002898 | 1.00x |
| bioframe | 0.002523 | 0.003547 | 0.002898 | 1.00x |
| pyranges1 | 0.002302 | 0.002838 | 0.002498 | 1.16x |
| pybedtools | 0.032681 | 0.047822 | 0.037981 | 0.08x |
| genomicranges | 0.020029 | 0.02029 | 0.020192 | 0.14x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.002202 | 0.003516 | 0.002696 | 1.77x |
| bioframe | 0.004238 | 0.005691 | 0.004758 | 1.00x |
| pyranges1 | 0.004909 | 0.005934 | 0.005284 | 0.90x |
| pybedtools | 0.030735 | 0.045004 | 0.03646 | 0.13x |
10000-1p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.004603 | 0.008294 | 0.006073 | 1.81x |
| bioframe | 0.010529 | 0.011367 | 0.011014 | 1.00x |
| pyranges0 | 0.006498 | 0.007306 | 0.006811 | 1.62x |
| pyranges1 | 0.01096 | 0.012611 | 0.011684 | 0.94x |
| pybedtools | 0.94646 | 0.94995 | 0.948121 | 0.01x |
| genomicranges | 0.198868 | 0.200266 | 0.199428 | 0.06x |
| pygenomics | 0.080325 | 0.08121 | 0.080663 | 0.14x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.004851 | 0.007782 | 0.005908 | 4.50x |
| bioframe | 0.025947 | 0.027779 | 0.026584 | 1.00x |
| pyranges0 | 0.00501 | 0.005703 | 0.00526 | 5.05x |
| pyranges1 | 0.007517 | 0.007937 | 0.00769 | 3.46x |
| pybedtools | 0.040749 | 0.043864 | 0.041889 | 0.63x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.003283 | 0.008069 | 0.005083 | 3.12x |
| bioframe | 0.014669 | 0.016689 | 0.015834 | 1.00x |
| pyranges1 | 0.007637 | 0.008979 | 0.008178 | 1.94x |
| pybedtools | 0.720797 | 0.730655 | 0.725407 | 0.02x |
| genomicranges | 0.202131 | 0.209398 | 0.204628 | 0.08x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.002756 | 0.004613 | 0.003377 | 3.06x |
| bioframe | 0.009849 | 0.011243 | 0.010339 | 1.00x |
| pyranges1 | 0.01326 | 0.015308 | 0.013973 | 0.74x |
| pybedtools | 0.727294 | 0.733098 | 0.73116 | 0.01x |
100000-1p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.030583 | 0.038892 | 0.033394 | 3.33x |
| bioframe | 0.108358 | 0.115233 | 0.111059 | 1.00x |
| pyranges0 | 0.059633 | 0.065599 | 0.061791 | 1.80x |
| pyranges1 | 0.100074 | 0.105947 | 0.102267 | 1.09x |
| pybedtools | 13.434458 | 13.602339 | 13.496321 | 0.01x |
| genomicranges | 2.030365 | 2.052434 | 2.039897 | 0.05x |
| pygenomics | 1.001974 | 1.018231 | 1.009213 | 0.11x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.03013 | 0.036718 | 0.03339 | 10.61x |
| bioframe | 0.352786 | 0.356839 | 0.354241 | 1.00x |
| pyranges0 | 0.032403 | 0.034701 | 0.033667 | 10.52x |
| pyranges1 | 0.044958 | 0.046169 | 0.045629 | 7.76x |
| pybedtools | 0.369122 | 0.379131 | 0.3729 | 0.95x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.021035 | 0.026894 | 0.023802 | 13.86x |
| bioframe | 0.308013 | 0.347919 | 0.329806 | 1.00x |
| pyranges1 | 0.076199 | 0.085019 | 0.079372 | 4.16x |
| pybedtools | 11.056327 | 11.280248 | 11.149039 | 0.03x |
| genomicranges | 2.057607 | 2.07651 | 2.067998 | 0.16x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.013263 | 0.014998 | 0.013874 | 5.68x |
| bioframe | 0.077717 | 0.081116 | 0.078865 | 1.00x |
| pyranges1 | 0.094753 | 0.114552 | 0.10257 | 0.77x |
| pybedtools | 11.374602 | 11.428316 | 11.393849 | 0.01x |
1000000-1p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.482548 | 0.538737 | 0.507383 | 2.55x |
| bioframe | 1.26082 | 1.35031 | 1.296195 | 1.00x |
| pyranges0 | 0.775969 | 0.828801 | 0.810501 | 1.60x |
| pyranges1 | 1.272326 | 1.29706 | 1.28585 | 1.01x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.439544 | 0.488414 | 0.458975 | 14.86x |
| bioframe | 6.592501 | 7.111734 | 6.818208 | 1.00x |
| pyranges0 | 0.398173 | 0.413055 | 0.406623 | 16.77x |
| pyranges1 | 0.51649 | 0.520946 | 0.518407 | 13.15x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.257781 | 0.305275 | 0.28525 | 17.65x |
| bioframe | 4.640915 | 5.437883 | 5.033454 | 1.00x |
| pyranges1 | 0.916714 | 0.925945 | 0.920594 | 5.47x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.128241 | 0.137198 | 0.132474 | 7.71x |
| bioframe | 0.996542 | 1.065777 | 1.021738 | 1.00x |
| pyranges1 | 1.115134 | 1.247674 | 1.172964 | 0.87x |
10000000-1p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 8.532137 | 9.738828 | 8.978132 | 2.20x |
| bioframe | 19.276665 | 20.295566 | 19.708064 | 1.00x |
| pyranges0 | 14.819439 | 15.339048 | 15.092611 | 1.31x |
| pyranges1 | 20.153432 | 22.654892 | 21.56345 | 0.91x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 7.12779 | 7.490779 | 7.263011 | 22.17x |
| bioframe | 156.356696 | 169.531002 | 160.989714 | 1.00x |
| pyranges0 | 6.402183 | 6.879779 | 6.62806 | 24.29x |
| pyranges1 | 7.526236 | 8.176338 | 7.857803 | 20.49x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 4.887937 | 5.553197 | 5.165014 | 20.21x |
| bioframe | 102.637625 | 105.903506 | 104.389343 | 1.00x |
| pyranges1 | 13.35283 | 15.167609 | 14.19713 | 7.35x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 1.627897 | 1.683304 | 1.655288 | 9.86x |
| bioframe | 15.586487 | 16.774274 | 16.316676 | 1.00x |
| pyranges1 | 16.99118 | 17.447484 | 17.195844 | 0.95x |
gcp-linux
1-2
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.045943 | 0.064732 | 0.054234 | 1.66x |
| bioframe | 0.084137 | 0.099481 | 0.090107 | 1.00x |
| pyranges0 | 0.056206 | 0.065654 | 0.061844 | 1.46x |
| pyranges1 | 0.09908 | 0.119018 | 0.106228 | 0.85x |
| pybedtools | 0.38246 | 0.406379 | 0.39153 | 0.23x |
| genomicranges | 1.19939 | 1.224621 | 1.208255 | 0.07x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.057012 | 0.073822 | 0.064665 | 2.49x |
| bioframe | 0.158764 | 0.165707 | 0.161273 | 1.00x |
| pyranges0 | 0.172297 | 0.176259 | 0.17363 | 0.93x |
| pyranges1 | 0.217619 | 0.234088 | 0.22335 | 0.72x |
| pybedtools | 0.845945 | 0.84898 | 0.847447 | 0.19x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.035631 | 0.043555 | 0.04066 | 2.74x |
| bioframe | 0.108015 | 0.116522 | 0.111266 | 1.00x |
| pyranges0 | 0.077336 | 0.080282 | 0.07844 | 1.42x |
| pyranges1 | 0.100883 | 0.106671 | 0.103181 | 1.08x |
| pybedtools | 0.745958 | 0.759006 | 0.754393 | 0.15x |
| genomicranges | 1.154942 | 1.164158 | 1.158506 | 0.10x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.036476 | 0.040001 | 0.037897 | 5.10x |
| bioframe | 0.189201 | 0.20046 | 0.193401 | 1.00x |
| pyranges0 | 0.141659 | 0.14424 | 0.143188 | 1.35x |
| pyranges1 | 0.206033 | 0.224902 | 0.213089 | 0.91x |
| pybedtools | 0.773732 | 0.780424 | 0.776934 | 0.25x |
| genomicranges | 1.186341 | 1.194172 | 1.189255 | 0.16x |
8-7
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 6.235223 | 9.61441 | 7.723144 | 6.54x |
| bioframe | 50.319263 | 50.956633 | 50.537202 | 1.00x |
| pyranges0 | 36.371926 | 36.581642 | 36.448645 | 1.39x |
| pyranges1 | 63.336711 | 63.455435 | 63.40654 | 0.80x |
| pybedtools | 1149.001487 | 1152.127068 | 1150.070659 | 0.04x |
| genomicranges | 597.951648 | 599.960895 | 599.002871 | 0.08x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 3.576373 | 3.679698 | 3.633697 | 15.54x |
| bioframe | 56.301865 | 56.776617 | 56.464305 | 1.00x |
| pyranges0 | 2.45308 | 2.60494 | 2.505172 | 22.54x |
| pyranges1 | 4.975662 | 5.011008 | 4.997007 | 11.30x |
| pybedtools | 44.181913 | 44.79409 | 44.386971 | 1.27x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 2.052196 | 2.104447 | 2.075706 | 38.15x |
| bioframe | 79.174164 | 79.234115 | 79.194209 | 1.00x |
| pyranges0 | 18.797436 | 18.851941 | 18.824498 | 4.21x |
| pyranges1 | 20.399172 | 20.436149 | 20.418562 | 3.88x |
| pybedtools | 35.850631 | 36.142479 | 36.041115 | 2.20x |
| genomicranges | 612.985873 | 613.52087 | 613.229997 | 0.13x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 1.829478 | 1.838981 | 1.834999 | 15.44x |
| bioframe | 28.29136 | 28.361417 | 28.326821 | 1.00x |
| pyranges0 | 18.611247 | 20.021441 | 19.473105 | 1.45x |
| pyranges1 | 22.118838 | 22.210733 | 22.161329 | 1.28x |
| pybedtools | 74.477086 | 74.868659 | 74.618066 | 0.38x |
| genomicranges | 623.865655 | 623.94955 | 623.896645 | 0.05x |
100-1p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.004992 | 0.01456 | 0.008429 | 0.47x |
| bioframe | 0.003258 | 0.005112 | 0.00392 | 1.00x |
| pyranges0 | 0.002368 | 0.003408 | 0.002777 | 1.41x |
| pyranges1 | 0.005606 | 0.006547 | 0.005975 | 0.66x |
| pybedtools | 0.005909 | 0.006483 | 0.006194 | 0.63x |
| genomicranges | 0.003124 | 0.003404 | 0.003233 | 1.21x |
| pygenomics | 0.000777 | 0.000879 | 0.000818 | 4.79x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.039758 | 0.157059 | 0.082499 | 0.06x |
| bioframe | 0.004496 | 0.005139 | 0.004808 | 1.00x |
| pyranges1 | 0.005232 | 0.006285 | 0.005613 | 0.86x |
| pybedtools | 0.002655 | 0.002957 | 0.002758 | 1.74x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.004349 | 0.072032 | 0.027109 | 0.16x |
| bioframe | 0.003966 | 0.004761 | 0.004247 | 1.00x |
| pyranges0 | 0.002885 | 0.00314 | 0.002973 | 1.43x |
| pyranges1 | 0.004525 | 0.004943 | 0.004694 | 0.90x |
| pybedtools | 0.002502 | 0.002934 | 0.0027 | 1.57x |
| genomicranges | 0.003229 | 0.003376 | 0.003278 | 1.30x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.004251 | 0.062087 | 0.0237 | 0.37x |
| bioframe | 0.007449 | 0.011114 | 0.008755 | 1.00x |
| pyranges1 | 0.010586 | 0.012078 | 0.011134 | 0.79x |
| pybedtools | 0.002555 | 0.002829 | 0.002686 | 3.26x |
1000-1p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.005234 | 0.008876 | 0.006523 | 0.77x |
| bioframe | 0.004581 | 0.005864 | 0.005016 | 1.00x |
| pyranges0 | 0.003191 | 0.003455 | 0.003296 | 1.52x |
| pyranges1 | 0.008031 | 0.008103 | 0.008074 | 0.62x |
| pybedtools | 0.053782 | 0.054005 | 0.053929 | 0.09x |
| genomicranges | 0.032026 | 0.032674 | 0.032265 | 0.16x |
| pygenomics | 0.010626 | 0.01142 | 0.010918 | 0.46x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.005791 | 0.006665 | 0.006123 | 1.01x |
| bioframe | 0.005982 | 0.006628 | 0.00621 | 1.00x |
| pyranges0 | 0.006279 | 0.006752 | 0.006447 | 0.96x |
| pyranges1 | 0.009039 | 0.009504 | 0.009217 | 0.67x |
| pybedtools | 0.007826 | 0.007978 | 0.007917 | 0.78x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.004343 | 0.007692 | 0.005481 | 0.98x |
| bioframe | 0.005139 | 0.005735 | 0.005359 | 1.00x |
| pyranges1 | 0.005589 | 0.005976 | 0.005719 | 0.94x |
| pybedtools | 0.01436 | 0.014635 | 0.014456 | 0.37x |
| genomicranges | 0.032931 | 0.03307 | 0.033016 | 0.16x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.004259 | 0.005305 | 0.004947 | 2.08x |
| bioframe | 0.009969 | 0.010782 | 0.010297 | 1.00x |
| pyranges1 | 0.011982 | 0.012304 | 0.012103 | 0.85x |
| pybedtools | 0.014775 | 0.015246 | 0.014956 | 0.69x |
10000-1p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.01015 | 0.018572 | 0.013027 | 1.84x |
| bioframe | 0.02268 | 0.025605 | 0.023968 | 1.00x |
| pyranges0 | 0.016065 | 0.018936 | 0.017143 | 1.40x |
| pyranges1 | 0.030509 | 0.031181 | 0.030868 | 0.78x |
| pybedtools | 1.335037 | 1.358509 | 1.345311 | 0.02x |
| genomicranges | 0.322956 | 0.326403 | 0.324169 | 0.07x |
| pygenomics | 0.136783 | 0.141169 | 0.13853 | 0.17x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.009616 | 0.011293 | 0.010447 | 3.08x |
| bioframe | 0.031761 | 0.032938 | 0.032167 | 1.00x |
| pyranges0 | 0.010939 | 0.011387 | 0.01109 | 2.90x |
| pyranges1 | 0.015275 | 0.015676 | 0.015419 | 2.09x |
| pybedtools | 0.059244 | 0.059899 | 0.059542 | 0.54x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.007028 | 0.007725 | 0.007363 | 2.50x |
| bioframe | 0.018051 | 0.019179 | 0.018436 | 1.00x |
| pyranges1 | 0.014252 | 0.014683 | 0.014423 | 1.28x |
| pybedtools | 0.926946 | 1.012523 | 0.973852 | 0.02x |
| genomicranges | 0.330064 | 0.33175 | 0.331123 | 0.06x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.005994 | 0.006967 | 0.006396 | 2.78x |
| bioframe | 0.017402 | 0.018389 | 0.017779 | 1.00x |
| pyranges1 | 0.022651 | 0.023034 | 0.022779 | 0.78x |
| pybedtools | 0.952175 | 1.000698 | 0.97678 | 0.02x |
100000-1p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.059271 | 0.08788 | 0.070483 | 3.02x |
| bioframe | 0.209074 | 0.218512 | 0.21252 | 1.00x |
| pyranges0 | 0.144653 | 0.164863 | 0.151749 | 1.40x |
| pyranges1 | 0.228314 | 0.247017 | 0.234636 | 0.91x |
| pybedtools | 19.263571 | 19.313483 | 19.286741 | 0.01x |
| genomicranges | 3.290473 | 3.294306 | 3.291987 | 0.06x |
| pygenomics | 1.881858 | 1.924059 | 1.896222 | 0.11x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.054573 | 0.060741 | 0.057958 | 6.31x |
| bioframe | 0.363422 | 0.368554 | 0.365524 | 1.00x |
| pyranges0 | 0.062446 | 0.06448 | 0.06321 | 5.78x |
| pyranges1 | 0.084614 | 0.086633 | 0.085545 | 4.27x |
| pybedtools | 0.570352 | 0.57555 | 0.572301 | 0.64x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.034675 | 0.047883 | 0.039593 | 7.85x |
| bioframe | 0.309819 | 0.311936 | 0.310958 | 1.00x |
| pyranges1 | 0.113469 | 0.114316 | 0.113866 | 2.73x |
| pybedtools | 15.265868 | 16.802575 | 16.206183 | 0.02x |
| genomicranges | 3.369224 | 3.374411 | 3.371411 | 0.09x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.02543 | 0.026808 | 0.026079 | 4.08x |
| bioframe | 0.104575 | 0.1096 | 0.106393 | 1.00x |
| pyranges1 | 0.147505 | 0.151673 | 0.149512 | 0.71x |
| pybedtools | 16.382024 | 17.619212 | 16.802475 | 0.01x |
1000000-1p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.947673 | 1.10915 | 1.003667 | 2.49x |
| bioframe | 2.490142 | 2.513556 | 2.499533 | 1.00x |
| pyranges0 | 2.119717 | 2.178453 | 2.148959 | 1.16x |
| pyranges1 | 3.274957 | 3.298976 | 3.288601 | 0.76x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.771199 | 0.783292 | 0.777746 | 6.96x |
| bioframe | 5.394265 | 5.434618 | 5.411728 | 1.00x |
| pyranges0 | 0.874484 | 0.932857 | 0.901145 | 6.01x |
| pyranges1 | 1.127032 | 1.149141 | 1.140538 | 4.74x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.501775 | 0.530884 | 0.514401 | 8.01x |
| bioframe | 4.117035 | 4.131015 | 4.121744 | 1.00x |
| pyranges1 | 1.583204 | 1.678121 | 1.631619 | 2.53x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.247626 | 0.266565 | 0.256522 | 4.86x |
| bioframe | 1.243608 | 1.250394 | 1.246153 | 1.00x |
| pyranges1 | 1.916323 | 2.005555 | 1.949487 | 0.64x |
10000000-1p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 18.531273 | 18.806266 | 18.632293 | 1.62x |
| bioframe | 30.074841 | 30.116671 | 30.097846 | 1.00x |
| pyranges0 | 29.579651 | 30.536834 | 29.904783 | 1.01x |
| pyranges1 | 42.196037 | 42.278681 | 42.232728 | 0.71x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 14.43136 | 14.548645 | 14.496101 | 5.56x |
| bioframe | 80.443039 | 80.705181 | 80.548879 | 1.00x |
| pyranges0 | 13.64936 | 14.330292 | 13.882901 | 5.80x |
| pyranges1 | 17.384461 | 17.654503 | 17.561143 | 4.59x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 9.812223 | 9.925855 | 9.856571 | 6.23x |
| bioframe | 61.348815 | 61.558393 | 61.444649 | 1.00x |
| pyranges1 | 24.969282 | 25.069392 | 25.029806 | 2.45x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 3.188742 | 3.31096 | 3.235061 | 6.11x |
| bioframe | 19.748385 | 19.802079 | 19.778009 | 1.00x |
| pyranges1 | 30.304058 | 30.446378 | 30.353857 | 0.65x |
Parallel performance
Results for parallel operations with 1, 2, 4, 6 and 8 threads.
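In the parallel tables, the speedup appears to be computed relative to the single-threaded polars_bio run, which also makes it possible to derive a parallel efficiency (speedup divided by thread count). The sketch below uses the mean runtimes of the apple-m3-max 8-7-8p overlap table; the efficiency metric is an addition for illustration and is not reported in the original results.

```python
# Mean runtimes in seconds, keyed by thread count
# (apple-m3-max, 8-7-8p, overlap).
mean_by_threads = {1: 3.370889, 2: 1.811417, 4: 1.147355, 6: 0.962915, 8: 0.701048}

baseline = mean_by_threads[1]
scaling = {}
for threads, mean in mean_by_threads.items():
    speedup = baseline / mean          # relative to the single-threaded run
    efficiency = speedup / threads     # 1.0 would be perfect linear scaling
    scaling[threads] = (round(speedup, 2), round(efficiency, 2))

for threads, (speedup, efficiency) in scaling.items():
    print(f"{threads} threads: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

An efficiency that declines as threads are added is consistent with the build side being predominantly single-threaded.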
apple-m3-max
8-7-8p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 3.247022 | 3.803021 | 3.370889 | 1.00x |
| polars_bio-2 | 1.798569 | 1.848162 | 1.811417 | 1.86x |
| polars_bio-4 | 1.140229 | 1.158243 | 1.147355 | 2.94x |
| polars_bio-6 | 0.959703 | 0.968725 | 0.962915 | 3.50x |
| polars_bio-8 | 0.694637 | 0.710492 | 0.701048 | 4.81x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 2.186354 | 2.248171 | 2.220822 | 1.00x |
| polars_bio-2 | 1.162969 | 1.222115 | 1.187505 | 1.87x |
| polars_bio-4 | 0.708508 | 0.735763 | 0.720115 | 3.08x |
| polars_bio-6 | 0.632877 | 0.652955 | 0.642816 | 3.45x |
| polars_bio-8 | 0.456674 | 0.476473 | 0.465284 | 4.77x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 1.502551 | 1.534006 | 1.515078 | 1.00x |
| polars_bio-2 | 0.811236 | 0.821365 | 0.815682 | 1.86x |
| polars_bio-4 | 0.440628 | 0.46778 | 0.455358 | 3.33x |
| polars_bio-6 | 0.331317 | 0.338207 | 0.334638 | 4.53x |
| polars_bio-8 | 0.280465 | 0.282707 | 0.281311 | 5.39x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 1.181806 | 1.185549 | 1.183889 | 1.00x |
| polars_bio-2 | 0.644288 | 0.645076 | 0.644587 | 1.84x |
| polars_bio-4 | 0.362752 | 0.363411 | 0.363036 | 3.26x |
| polars_bio-6 | 0.258583 | 0.272702 | 0.264111 | 4.48x |
| polars_bio-8 | 0.222888 | 0.234884 | 0.229052 | 5.17x |
1000000-8p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.468442 | 0.523065 | 0.494609 | 1.00x |
| polars_bio-2 | 0.262861 | 0.26828 | 0.265028 | 1.87x |
| polars_bio-4 | 0.1629 | 0.166657 | 0.164536 | 3.01x |
| polars_bio-6 | 0.137724 | 0.146893 | 0.143772 | 3.44x |
| polars_bio-8 | 0.111952 | 0.11465 | 0.113521 | 4.36x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.393067 | 0.415076 | 0.404032 | 1.00x |
| polars_bio-2 | 0.234559 | 0.235746 | 0.235051 | 1.72x |
| polars_bio-4 | 0.158996 | 0.167352 | 0.16349 | 2.47x |
| polars_bio-6 | 0.14634 | 0.14935 | 0.148215 | 2.73x |
| polars_bio-8 | 0.125472 | 0.128158 | 0.126606 | 3.19x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.267875 | 0.296727 | 0.277677 | 1.00x |
| polars_bio-2 | 0.163662 | 0.170045 | 0.165917 | 1.67x |
| polars_bio-4 | 0.111136 | 0.114835 | 0.112891 | 2.46x |
| polars_bio-6 | 0.097944 | 0.104607 | 0.101477 | 2.74x |
| polars_bio-8 | 0.099474 | 0.117493 | 0.106059 | 2.62x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.128377 | 0.131261 | 0.129598 | 1.00x |
| polars_bio-2 | 0.081762 | 0.085104 | 0.08324 | 1.56x |
| polars_bio-4 | 0.064151 | 0.066197 | 0.064851 | 2.00x |
| polars_bio-6 | 0.066926 | 0.06892 | 0.06768 | 1.91x |
| polars_bio-8 | 0.072767 | 0.074339 | 0.073589 | 1.76x |
10000000-8p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 9.081732 | 9.388126 | 9.203018 | 1.00x |
| polars_bio-2 | 4.696455 | 4.912478 | 4.793254 | 1.92x |
| polars_bio-4 | 2.885023 | 2.902893 | 2.896218 | 3.18x |
| polars_bio-6 | 2.196605 | 2.217945 | 2.209839 | 4.16x |
| polars_bio-8 | 1.813586 | 1.860947 | 1.833498 | 5.02x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 7.299887 | 7.659385 | 7.495962 | 1.00x |
| polars_bio-2 | 4.01928 | 4.158504 | 4.069511 | 1.84x |
| polars_bio-4 | 2.683383 | 2.720981 | 2.704975 | 2.77x |
| polars_bio-6 | 2.141075 | 2.162109 | 2.150595 | 3.49x |
| polars_bio-8 | 1.859186 | 1.865634 | 1.862653 | 4.02x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 5.30938 | 5.450332 | 5.381068 | 1.00x |
| polars_bio-2 | 2.893766 | 2.91378 | 2.906401 | 1.85x |
| polars_bio-4 | 1.748771 | 1.797485 | 1.768895 | 3.04x |
| polars_bio-6 | 1.352671 | 1.385655 | 1.369312 | 3.93x |
| polars_bio-8 | 1.178559 | 1.199971 | 1.192577 | 4.51x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 1.638818 | 1.678156 | 1.655573 | 1.00x |
| polars_bio-2 | 0.994195 | 0.996554 | 0.995701 | 1.66x |
| polars_bio-4 | 0.678722 | 0.701234 | 0.689151 | 2.40x |
| polars_bio-6 | 0.620289 | 0.662175 | 0.639026 | 2.59x |
| polars_bio-8 | 0.570659 | 0.582937 | 0.57688 | 2.87x |
gcp-linux
8-7-8p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 6.325617 | 8.185275 | 7.005925 | 1.00x |
| polars_bio-2 | 3.920645 | 4.617084 | 4.198055 | 1.67x |
| polars_bio-4 | 3.036273 | 3.060781 | 3.0452 | 2.30x |
| polars_bio-6 | 2.127994 | 2.134505 | 2.131016 | 3.29x |
| polars_bio-8 | 1.731485 | 1.789347 | 1.752986 | 4.00x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 4.047329 | 4.439016 | 4.198198 | 1.00x |
| polars_bio-2 | 2.624132 | 2.722843 | 2.682361 | 1.57x |
| polars_bio-4 | 1.809028 | 1.917798 | 1.871763 | 2.24x |
| polars_bio-6 | 1.309557 | 1.362131 | 1.333989 | 3.15x |
| polars_bio-8 | 1.066945 | 1.113168 | 1.087907 | 3.86x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 2.426441 | 2.456318 | 2.439266 | 1.00x |
| polars_bio-2 | 1.22516 | 1.272066 | 1.245401 | 1.96x |
| polars_bio-4 | 0.711421 | 0.744023 | 0.724315 | 3.37x |
| polars_bio-6 | 0.563797 | 0.607321 | 0.580574 | 4.20x |
| polars_bio-8 | 0.459308 | 0.493886 | 0.479126 | 5.09x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 2.212958 | 2.23035 | 2.222531 | 1.00x |
| polars_bio-2 | 1.132056 | 1.15405 | 1.146413 | 1.94x |
| polars_bio-4 | 0.645737 | 0.661564 | 0.652277 | 3.41x |
| polars_bio-6 | 0.50589 | 0.511256 | 0.50839 | 4.37x |
| polars_bio-8 | 0.439503 | 0.450924 | 0.447075 | 4.97x |
1000000-8p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.903831 | 1.098046 | 0.974229 | 1.00x |
| polars_bio-2 | 0.50099 | 0.512259 | 0.504852 | 1.93x |
| polars_bio-4 | 0.300453 | 0.328605 | 0.318188 | 3.06x |
| polars_bio-6 | 0.257792 | 0.278203 | 0.268718 | 3.63x |
| polars_bio-8 | 0.22321 | 0.243244 | 0.230621 | 4.22x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.758815 | 0.78919 | 0.769877 | 1.00x |
| polars_bio-2 | 0.465192 | 0.47484 | 0.468824 | 1.64x |
| polars_bio-4 | 0.332101 | 0.336953 | 0.334461 | 2.30x |
| polars_bio-6 | 0.276071 | 0.29266 | 0.281794 | 2.73x |
| polars_bio-8 | 0.237269 | 0.263256 | 0.254046 | 3.03x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.496938 | 0.517659 | 0.505043 | 1.00x |
| polars_bio-2 | 0.295325 | 0.313859 | 0.302686 | 1.67x |
| polars_bio-4 | 0.194371 | 0.20433 | 0.200853 | 2.51x |
| polars_bio-6 | 0.175505 | 0.181913 | 0.178222 | 2.83x |
| polars_bio-8 | 0.15672 | 0.163036 | 0.160701 | 3.14x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 0.245895 | 0.250118 | 0.247479 | 1.00x |
| polars_bio-2 | 0.167378 | 0.173578 | 0.171251 | 1.45x |
| polars_bio-4 | 0.122749 | 0.126635 | 0.124491 | 1.99x |
| polars_bio-6 | 0.11385 | 0.119157 | 0.116185 | 2.13x |
| polars_bio-8 | 0.108127 | 0.110327 | 0.10942 | 2.26x |
10000000-8p
overlap
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 18.215782 | 19.091392 | 18.510207 | 1.00x |
| polars_bio-2 | 9.399565 | 9.680242 | 9.566631 | 1.93x |
| polars_bio-4 | 5.303647 | 5.555487 | 5.442898 | 3.40x |
| polars_bio-6 | 4.022274 | 4.066371 | 4.051045 | 4.57x |
| polars_bio-8 | 3.369559 | 3.416123 | 3.388564 | 5.46x |
nearest
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 14.325736 | 14.444885 | 14.39027 | 1.00x |
| polars_bio-2 | 8.095907 | 8.178189 | 8.136852 | 1.77x |
| polars_bio-4 | 5.096407 | 5.15379 | 5.122893 | 2.81x |
| polars_bio-6 | 3.986362 | 4.205706 | 4.128561 | 3.49x |
| polars_bio-8 | 3.491618 | 3.711814 | 3.577309 | 4.02x |
count-overlaps
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 9.701679 | 9.796148 | 9.740035 | 1.00x |
| polars_bio-2 | 5.346433 | 5.399117 | 5.370757 | 1.81x |
| polars_bio-4 | 3.150557 | 3.203458 | 3.178719 | 3.06x |
| polars_bio-6 | 2.485947 | 2.56386 | 2.52768 | 3.85x |
| polars_bio-8 | 2.156472 | 2.176608 | 2.163483 | 4.50x |
coverage
| Library | Min (s) | Max (s) | Mean (s) | Speedup |
|---|---|---|---|---|
| polars_bio | 3.091184 | 3.216964 | 3.135982 | 1.00x |
| polars_bio-2 | 1.998423 | 2.041581 | 2.01331 | 1.56x |
| polars_bio-4 | 1.412483 | 1.45218 | 1.426102 | 2.20x |
| polars_bio-6 | 1.281432 | 1.328666 | 1.301256 | 2.41x |
| polars_bio-8 | 1.176944 | 1.193294 | 1.18414 | 2.65x |
End to end tests
Results for an end-to-end test that calculates overlaps, nearest, coverage, and count-overlaps and saves the results to a CSV file.
Note
Please note that in case of pyranges0 we were unable to export the results of coverage and count-overlaps operations to a CSV file, so the results are not presented here.
apple-m3-max
1-2
e2e-overlap-csv
| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.042378 | 0.130957 | 0.071929 | 3.10x | 285.468 |
| polars_bio_streaming | 0.035498 | 0.037438 | 0.036653 | 6.09x | 274.093 |
| bioframe | 0.208548 | 0.251457 | 0.223219 | 1.00x | 300.75 |
| pyranges0 | 0.409707 | 0.415361 | 0.412135 | 0.54x | 329.968 |
| pyranges1 | 0.47518 | 0.491508 | 0.482739 | 0.46x | 324.468 |
e2e-nearest-csv
| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.053349 | 0.058382 | 0.055362 | 9.14x | 321.062 |
| polars_bio_streaming | 0.051385 | 0.053979 | 0.052764 | 9.59x | 311.422 |
| bioframe | 0.503887 | 0.510257 | 0.506123 | 1.00x | 316.969 |
| pyranges0 | 1.135469 | 1.183369 | 1.151801 | 0.44x | 364.594 |
| pyranges1 | 1.327935 | 1.334101 | 1.331346 | 0.38x | 357.734 |
e2e-coverage-csv
| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.034756 | 0.038718 | 0.036421 | 13.40x | 290.078 |
| polars_bio_streaming | 0.03607 | 0.037332 | 0.036534 | 13.35x | 274.344 |
| bioframe | 0.48449 | 0.492328 | 0.487891 | 1.00x | 419.312 |
| pyranges1 | 0.971084 | 0.980085 | 0.975012 | 0.50x | 407.562 |
e2e-count-overlaps-csv
| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.03452 | 0.037714 | 0.035927 | 9.27x | 294.266 |
| polars_bio_streaming | 0.035863 | 0.036756 | 0.036414 | 9.14x | 278.438 |
| bioframe | 0.328145 | 0.338734 | 0.332951 | 1.00x | 306.234 |
| pyranges1 | 0.532739 | 0.544914 | 0.538646 | 0.62x | 328.328 |
8-7
e2e-overlap-csv
| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 22.781745 | 23.916568 | 23.161559 | 16.64x | 14677.0468 |
| polars_bio_streaming | 18.501279 | 18.797602 | 18.676707 | 20.63x | 555.109 |
| bioframe | 383.108514 | 387.500069 | 385.309331 | 1.00x | 33806.062 |
| pyranges0 | 276.421312 | 279.839508 | 277.845198 | 1.39x | 29777.312 |
| pyranges1 | 355.703878 | 367.680249 | 360.875151 | 1.07x | 34526.859 |
e2e-nearest-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 2.597955 | 2.760537 | 2.674482 | 32.02x | 1060.031 |
| polars_bio_streaming | 2.65088 | 2.685157 | 2.665171 | 32.13x | 560.453 |
| bioframe | 85.238305 | 86.131916 | 85.644961 | 1.00x | 6894.062 |
| pyranges0 | 13.530549 | 13.705834 | 13.620471 | 6.29x | 3031.797 |
| pyranges1 | 16.290782 | 16.385961 | 16.322671 | 5.25x | 3509.984 |

e2e-coverage-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 1.523833 | 1.555472 | 1.541038 | 21.41x | 717.984 |
| polars_bio_streaming | 1.336613 | 1.397324 | 1.364051 | 24.19x | 411.703 |
| bioframe | 32.294844 | 33.421618 | 32.99334 | 1.00x | 16651.922 |
| pyranges1 | 26.382409 | 27.382901 | 27.020202 | 1.22x | 6119.125 |

e2e-count-overlaps-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 1.806838 | 1.845584 | 1.82594 | 54.33x | 729.078 |
| polars_bio_streaming | 1.681187 | 1.767811 | 1.714943 | 57.85x | 416.094 |
| bioframe | 97.91802 | 101.736351 | 99.210461 | 1.00x | 23029.219 |
| pyranges1 | 19.498264 | 19.676838 | 19.561322 | 5.07x | 5270.234 |

100-1p
e2e-overlap-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.009118 | 0.077181 | 0.032054 | 1.55x | 248.594 |
| polars_bio_streaming | 0.003382 | 0.004769 | 0.003853 | 12.92x | 247.562 |
| bioframe | 0.030154 | 0.088667 | 0.049769 | 1.00x | 231.641 |
| pyranges0 | 0.045764 | 0.051035 | 0.047857 | 1.04x | 228.516 |
| pyranges1 | 0.053751 | 0.072545 | 0.060221 | 0.83x | 228.609 |

e2e-nearest-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.009145 | 0.038799 | 0.019201 | 2.24x | 253.156 |
| polars_bio_streaming | 0.003964 | 0.005051 | 0.004504 | 9.53x | 248.188 |
| bioframe | 0.033372 | 0.061107 | 0.042931 | 1.00x | 229.906 |
| pyranges0 | 0.049586 | 0.057381 | 0.052364 | 0.82x | 231.812 |
| pyranges1 | 0.054496 | 0.059205 | 0.056362 | 0.76x | 231.688 |

e2e-coverage-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.005492 | 0.020652 | 0.012584 | 5.20x | 245.578 |
| polars_bio_streaming | 0.003059 | 0.003746 | 0.003397 | 19.25x | 243.5 |
| bioframe | 0.060684 | 0.074157 | 0.065378 | 1.00x | 230.953 |
| pyranges1 | 0.093668 | 0.096265 | 0.094567 | 0.69x | 243.5 |

e2e-count-overlaps-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.005291 | 0.008843 | 0.006568 | 5.53x | 249.406 |
| polars_bio_streaming | 0.003279 | 0.003697 | 0.003447 | 10.53x | 245.672 |
| bioframe | 0.032914 | 0.042309 | 0.036302 | 1.00x | 234.141 |
| pyranges1 | 0.045085 | 0.045477 | 0.045224 | 0.80x | 232.703 |

10000000-1p
e2e-overlap-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 11.109423 | 11.871893 | 11.397992 | 10.38x | 7064.312 |
| polars_bio_streaming | 12.049206 | 12.327491 | 12.191582 | 9.71x | 1505.109 |
| bioframe | 117.701516 | 119.51073 | 118.356016 | 1.00x | 16380.234 |
| pyranges0 | 235.484308 | 243.216406 | 239.726101 | 0.49x | 14245.203 |
| pyranges1 | 109.722359 | 112.326873 | 111.23273 | 1.06x | 19423.172 |

e2e-nearest-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 7.842314 | 8.84828 | 8.510181 | 21.04x | 2301.0 |
| polars_bio_streaming | 7.589706 | 8.153016 | 7.842404 | 22.83x | 1327.531 |
| bioframe | 174.790383 | 183.458906 | 179.035999 | 1.00x | 10996.234 |
| pyranges0 | 32.793505 | 32.826686 | 32.809101 | 5.46x | 4882.656 |
| pyranges1 | 18.866156 | 19.570609 | 19.142653 | 9.35x | 5253.281 |

e2e-coverage-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 1.901833 | 1.957711 | 1.928367 | 12.15x | 956.844 |
| polars_bio_streaming | 1.797332 | 1.802527 | 1.800497 | 13.01x | 651.266 |
| bioframe | 23.269774 | 23.55838 | 23.430125 | 1.00x | 6493.234 |
| pyranges1 | 26.370249 | 27.172173 | 26.879266 | 0.87x | 10397.531 |

e2e-count-overlaps-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 5.025462 | 5.234103 | 5.129963 | 20.79x | 1036.734 |
| polars_bio_streaming | 4.956087 | 5.076052 | 5.014242 | 21.27x | 968.719 |
| bioframe | 105.322287 | 107.758078 | 106.64158 | 1.00x | 12803.828 |
| pyranges1 | 22.079391 | 23.069931 | 22.618209 | 4.71x | 10039.297 |

gcp-linux
1-2
e2e-overlap-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.072393 | 0.151871 | 0.09916 | 2.80x | 314.234 |
| polars_bio_streaming | 0.064092 | 0.067914 | 0.066202 | 4.19x | 288.621 |
| bioframe | 0.258278 | 0.31288 | 0.277225 | 1.00x | 287.101 |
| pyranges0 | 0.591745 | 0.599954 | 0.595204 | 0.47x | 307.218 |
| pyranges1 | 0.683388 | 0.702289 | 0.690362 | 0.40x | 327.863 |

e2e-nearest-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.123656 | 1.702108 | 0.659015 | 1.99x | 331.398 |
| polars_bio_streaming | 0.111801 | 0.762227 | 0.328874 | 3.98x | 308.738 |
| bioframe | 0.881782 | 2.161628 | 1.309551 | 1.00x | 297.695 |
| pyranges0 | 1.728053 | 2.579086 | 2.030527 | 0.64x | 308.93 |
| pyranges1 | 1.953048 | 2.161655 | 2.049615 | 0.64x | 337.352 |

e2e-coverage-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.064626 | 0.146983 | 0.093048 | 8.11x | 299.094 |
| polars_bio_streaming | 0.065193 | 0.072839 | 0.068651 | 10.99x | 280.387 |
| bioframe | 0.704155 | 0.791049 | 0.754463 | 1.00x | 328.184 |
| pyranges1 | 1.41166 | 1.432833 | 1.42261 | 0.53x | 352.582 |

e2e-count-overlaps-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.063012 | 0.207803 | 0.113156 | 4.03x | 309.176 |
| polars_bio_streaming | 0.062735 | 0.071474 | 0.065886 | 6.92x | 286.336 |
| bioframe | 0.436935 | 0.491839 | 0.455688 | 1.00x | 303.07 |
| pyranges1 | 0.785823 | 0.786847 | 0.786487 | 0.58x | 316.227 |

8-7
e2e-overlap-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 44.539766 | 45.543038 | 45.196903 | 12.55x | 14575.14 |
| polars_bio_streaming | 34.007093 | 35.972075 | 35.309756 | 16.06x | 480.207 |
| bioframe | 566.167037 | 567.617695 | 567.13069 | 1.00x | 43295.378 |
| pyranges0 | 417.291061 | 421.875539 | 419.571591 | 1.35x | 22915.917 |
| pyranges1 | 538.365637 | 548.624613 | 543.918168 | 1.04x | 43408.699 |

e2e-nearest-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 6.565142 | 7.696104 | 6.973448 | 12.21x | 1070.016 |
| polars_bio_streaming | 5.840416 | 6.828222 | 6.203 | 13.73x | 527.008 |
| bioframe | 84.30831 | 86.512823 | 85.150539 | 1.00x | 2418.629 |
| pyranges0 | 20.679566 | 21.424632 | 20.949203 | 4.06x | 2239.047 |
| pyranges1 | 25.352803 | 27.604137 | 26.544063 | 3.21x | 2534.629 |

e2e-coverage-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 3.532309 | 3.660049 | 3.579428 | 12.53x | 738.887 |
| polars_bio_streaming | 3.167344 | 3.169622 | 3.168694 | 14.15x | 416.164 |
| bioframe | 41.150587 | 51.89725 | 44.839673 | 1.00x | 14297.098 |
| pyranges1 | 40.065526 | 41.350493 | 40.892187 | 1.10x | 3096.812 |

e2e-count-overlaps-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 3.827717 | 3.83517 | 3.830823 | 25.47x | 737.059 |
| polars_bio_streaming | 3.346898 | 3.388796 | 3.372987 | 28.93x | 428.422 |
| bioframe | 97.272988 | 97.790775 | 97.572564 | 1.00x | 25981.051 |
| pyranges1 | 30.021737 | 30.181438 | 30.124339 | 3.24x | 3102.84 |

100-1p
e2e-overlap-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.012022 | 0.433918 | 0.153077 | 0.55x | 262.656 |
| polars_bio_streaming | 0.007427 | 0.153294 | 0.056144 | 1.49x | 259.039 |
| bioframe | 0.039406 | 0.172494 | 0.08386 | 1.00x | 229.824 |
| pyranges0 | 0.059086 | 0.075573 | 0.06466 | 1.30x | 231.199 |
| pyranges1 | 0.069077 | 0.088036 | 0.075488 | 1.11x | 230.684 |

e2e-nearest-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.012814 | 0.408332 | 0.145315 | 1.24x | 263.16 |
| polars_bio_streaming | 0.007605 | 0.007975 | 0.00779 | 23.20x | 260.242 |
| bioframe | 0.044222 | 0.45263 | 0.180742 | 1.00x | 230.684 |
| pyranges0 | 0.066032 | 0.074886 | 0.068992 | 2.62x | 231.195 |
| pyranges1 | 0.07111 | 0.075383 | 0.072851 | 2.48x | 230.68 |

e2e-coverage-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.008954 | 0.106466 | 0.041616 | 2.35x | 258.184 |
| polars_bio_streaming | 0.006726 | 0.007524 | 0.007124 | 13.72x | 255.258 |
| bioframe | 0.07866 | 0.135591 | 0.097742 | 1.00x | 230.68 |
| pyranges1 | 0.120404 | 0.12302 | 0.121487 | 0.80x | 231.023 |

e2e-count-overlaps-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 0.008886 | 0.095835 | 0.038506 | 1.53x | 262.312 |
| polars_bio_streaming | 0.006988 | 0.008915 | 0.007637 | 7.71x | 259.555 |
| bioframe | 0.043766 | 0.08625 | 0.058895 | 1.00x | 230.852 |
| pyranges1 | 0.060574 | 0.060725 | 0.06064 | 0.97x | 231.195 |

10000000-1p
e2e-overlap-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 35.783773 | 37.155414 | 36.274578 | 4.73x | 6926.227 |
| polars_bio_streaming | 28.364026 | 33.834447 | 32.005189 | 5.37x | 1172.484 |
| bioframe | 170.321558 | 173.826371 | 171.750864 | 1.00x | 17544.5 |
| pyranges0 | 374.384106 | 377.106338 | 375.972726 | 0.46x | 12951.133 |
| pyranges1 | 174.205234 | 176.465726 | 174.996859 | 0.98x | 23198.973 |

e2e-nearest-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 16.577973 | 16.599008 | 16.591114 | 5.88x | 2208.477 |
| polars_bio_streaming | 14.957252 | 15.214483 | 15.115055 | 6.45x | 1202.555 |
| bioframe | 96.638005 | 98.21701 | 97.559755 | 1.00x | 7832.391 |
| pyranges0 | 54.678927 | 55.140916 | 54.905002 | 1.78x | 3125.051 |
| pyranges1 | 31.874441 | 33.028755 | 32.303297 | 3.02x | 4447.332 |

e2e-coverage-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 4.184608 | 4.403694 | 4.281924 | 6.97x | 768.75 |
| polars_bio_streaming | 3.792566 | 3.917329 | 3.838723 | 7.77x | 591.812 |
| bioframe | 29.609636 | 29.972788 | 29.838014 | 1.00x | 4040.508 |
| pyranges1 | 41.671912 | 42.238756 | 41.949904 | 0.71x | 8503.844 |

e2e-count-overlaps-csv

| Library | Min (s) | Max (s) | Mean (s) | Speedup | Peak memory (MB) |
|---|---|---|---|---|---|
| polars_bio | 11.003474 | 11.126526 | 11.052697 | 6.35x | 1012.105 |
| polars_bio_streaming | 9.939793 | 10.434084 | 10.264927 | 6.83x | 696.078 |
| bioframe | 70.00646 | 70.300308 | 70.14716 | 1.00x | 8315.176 |
| pyranges1 | 33.521685 | 33.726979 | 33.593637 | 2.09x | 8495.672 |

Memory profiles
### apple-m3-max
#### 1-2
Operation: overlap for dataset: 1-2 on platform: apple-m3-max
Operation: nearest for dataset: 1-2 on platform: apple-m3-max
Operation: coverage for dataset: 1-2 on platform: apple-m3-max
Operation: count-overlaps for dataset: 1-2 on platform: apple-m3-max
#### 8-7
Operation: overlap for dataset: 8-7 on platform: apple-m3-max
Operation: nearest for dataset: 8-7 on platform: apple-m3-max
Operation: coverage for dataset: 8-7 on platform: apple-m3-max
Operation: count-overlaps for dataset: 8-7 on platform: apple-m3-max
#### 100-1p
Operation: overlap for dataset: 100-1p on platform: apple-m3-max
Operation: nearest for dataset: 100-1p on platform: apple-m3-max
Operation: coverage for dataset: 100-1p on platform: apple-m3-max
Operation: count-overlaps for dataset: 100-1p on platform: apple-m3-max
#### 10000000-1p
Operation: overlap for dataset: 10000000-1p on platform: apple-m3-max
Operation: nearest for dataset: 10000000-1p on platform: apple-m3-max
Operation: coverage for dataset: 10000000-1p on platform: apple-m3-max
Operation: count-overlaps for dataset: 10000000-1p on platform: apple-m3-max
### gcp-linux
#### 1-2
Operation: overlap for dataset: 1-2 on platform: gcp-linux
Operation: nearest for dataset: 1-2 on platform: gcp-linux
Operation: coverage for dataset: 1-2 on platform: gcp-linux
Operation: count-overlaps for dataset: 1-2 on platform: gcp-linux
#### 8-7
Operation: overlap for dataset: 8-7 on platform: gcp-linux
Operation: nearest for dataset: 8-7 on platform: gcp-linux
Operation: coverage for dataset: 8-7 on platform: gcp-linux
Operation: count-overlaps for dataset: 8-7 on platform: gcp-linux
#### 100-1p
Operation: overlap for dataset: 100-1p on platform: gcp-linux
Operation: nearest for dataset: 100-1p on platform: gcp-linux
Operation: coverage for dataset: 100-1p on platform: gcp-linux
Operation: count-overlaps for dataset: 100-1p on platform: gcp-linux
#### 10000000-1p
Operation: overlap for dataset: 10000000-1p on platform: gcp-linux
Operation: nearest for dataset: 10000000-1p on platform: gcp-linux
Operation: coverage for dataset: 10000000-1p on platform: gcp-linux
Operation: count-overlaps for dataset: 10000000-1p on platform: gcp-linux
Comparison of the output schemas and data types
polars-bio tries to preserve the output schema of the bioframe package, whereas pyranges uses its own internal representation that can be converted to a Pandas DataFrame. Note also that pyranges always uses int64 for start/end positions, while polars-bio and bioframe choose the type adaptively based on the input file formats or the data types of the input DataFrames. polars-bio does not support interval operations on chromosomes longer than 2 Gbp (issue). In the analyzed test case (8-7), however, the input and output data structures have similar memory requirements.
Compare the following schemas and memory size estimates of the input and output DataFrames for the 8-7 test case:
Please note that pyranges, unlike bioframe and polars-bio, returns only one chromosome column but uses int64 to encode start and end positions even if the input datasets use int32.
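To make the int32 versus int64 trade-off concrete, the sketch below estimates the raw size of the start and end columns for a hypothetical number of intervals and prints the largest value a signed 32-bit integer can hold. The interval count and the `positions_bytes` helper are illustrative assumptions. The 2 Gbp limit mentioned above would be consistent with 32-bit signed coordinates, although the source does not state its cause.

```python
import struct

# Largest coordinate representable as a signed 32-bit integer (about 2.1 Gbp).
INT32_MAX = 2**31 - 1


def positions_bytes(n_intervals, fmt):
    """Raw bytes needed for the start and end columns of n_intervals rows."""
    # "<i" is a 4-byte and "<q" an 8-byte integer in struct's standard sizes.
    return 2 * n_intervals * struct.calcsize(fmt)


n = 1_000_000  # hypothetical number of intervals, not taken from the benchmark
int32_bytes = positions_bytes(n, "<i")
int64_bytes = positions_bytes(n, "<q")

print(INT32_MAX)
print(int32_bytes, int64_bytes)
```

Under these assumptions the int64 position columns take twice the space of their int32 counterparts; the estimate ignores the chromosome columns and any per-column overhead of the DataFrame library.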