
🔬 Supplementary material

Supplemental material

This document provides additional information about the benchmarking setup, data, and results that were presented in the manuscript.

Benchmark setup

Code and benchmarking scenarios

Repository

Memory profiling

Memory profiling was done with the Python memory-profiler package, version 0.61.0. A helper script, run-memory-profiler.py, was developed to run the tests; a sample invocation is shown in the snippet below:

PRFOF_FILE="polars_bio_1-2.dat"
mprof run --output $PRFOF_FILE python src/run-memory-profiler.py --bench-config conf/paper/benchmark-e2e-overlap.yaml --tool polars_bio --test-case 1-2
mprof plot $PRFOF_FILE
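
To read the peak memory out of a recorded profile without plotting it, the .dat file can be scanned directly. The sketch below assumes the plain-text layout written by mprof, in which each sample is a line of the form `MEM <MiB> <timestamp>`; the helper name `peak_memory_mib` and the demo file are hypothetical and not part of the benchmark repository.

```python
from pathlib import Path
import tempfile


def peak_memory_mib(dat_path):
    """Return the largest memory sample (MiB) found in an mprof .dat file.

    Assumption: samples are lines of the form 'MEM <MiB> <unix_time>';
    any other lines (e.g. CMDLINE, FUNC) are ignored.
    Returns None when the file contains no samples.
    """
    peak = None
    for line in Path(dat_path).read_text().splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "MEM":
            value = float(parts[1])
            peak = value if peak is None else max(peak, value)
    return peak


# Tiny synthetic profile with made-up numbers, only to show the layout.
sample = (
    "CMDLINE python demo.py\n"
    "MEM 120.5 1700000000.10\n"
    "MEM 285.25 1700000000.20\n"
    "MEM 200.0 1700000000.30\n"
)
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.dat"
    demo.write_text(sample)
    print(peak_memory_mib(demo))  # 285.25
```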

Operating systems and hardware configurations

macOS

  • cpu architecture: arm64
  • cpu name: Apple M3 Max
  • cpu cores: 16
  • memory: 64 GB
  • kernel: Darwin Kernel Version 24.2.0: Fri Dec 6 19:02:12 PST 2024; root:xnu-11215.61.5~2/RELEASE_ARM64_T6031
  • system: Darwin
  • os-release: macOS-15.2-arm64-arm-64bit
  • python: 3.12.4
  • polars-bio: 0.8.3

Linux

A c3-standard-22 machine was used for benchmarking.

  • cpu architecture: x86_64
  • cpu name: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
  • cpu cores: 22
  • memory: 88 GB
  • kernel: Linux-6.8.0-1025-gcp-x86_64-with-glibc2.35
  • system: Linux
  • os-release: #27~22.04.1-Ubuntu SMP Mon Feb 24 16:42:24 UTC 2025
  • python: 3.12.8
  • polars-bio: 0.8.3
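
Most of the host details listed above can be collected from Python's standard library, which helps when reproducing the benchmarks on another machine. The following is a minimal sketch; it is an assumption that values such as the kernel and os-release strings were obtained this way, and `describe_host` is a hypothetical helper rather than part of the benchmark code.

```python
import os
import platform


def describe_host():
    """Collect host details similar to the fields listed above."""
    return {
        "cpu architecture": platform.machine(),
        "cpu cores": os.cpu_count(),
        "system": platform.system(),
        # platform() and version() return strings that resemble the
        # kernel / os-release values above (an assumption, not confirmed).
        "platform": platform.platform(),
        "version": platform.version(),
        "python": platform.python_version(),
    }


for key, value in describe_host().items():
    print(f"{key}: {value}")
```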

Software

Data

Real dataset

The AIList dataset, transcoded into the Parquet file format (with Snappy compression), was used for benchmarking. This dataset was published with the AIList paper:

Jianglin Feng, Aakrosh Ratan, Nathan C Sheffield, Augmented Interval List: a novel data structure for efficient genomic interval search, Bioinformatics 2019.

Dataset #  Name              Size (x1000)  Description
0          chainRn4          2,351         Source
1          fBrain            199           Source
2          exons             439           Dataset used in the BEDTools tutorial.
3          chainOrnAna1      1,957         Source
4          chainVicPac2      7,684         Source
5          chainXenTro3Link  50,981        Source
6          chainMonDom5Link  128,187       Source
7          ex-anno           1,194         Dataset contains GenCode annotations with ~1.2 million lines, mixing all types of features.
8          ex-rna            9,945         Dataset contains ~10 million direct-RNA mappings.

Source: AIList Github

All Parquet files from this dataset shared the same schema:

  contig STRING
  pos_start INT32
  pos_end INT32

Synthetic dataset

Randomly generated intervals (100-10,000,000), inspired by bioframe. The datasets were generated with generate_dataset.py:

poetry run python src/generate_dataset.py

All Parquet files from this dataset shared the same schema:

  contig STRING
  pos_start INT64
  pos_end INT64
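
The actual generator is generate_dataset.py in the repository. As a rough illustration of what producing random intervals with the three columns above might look like, here is a minimal standard-library sketch. The function name, the contig names, the coordinate range, and the length distribution are all assumptions made for the example and do not reflect the real script.

```python
import random


def generate_intervals(n, contigs=("chr1", "chr2"), max_pos=1_000_000,
                       max_len=1_000, seed=0):
    """Return n random (contig, pos_start, pos_end) tuples.

    All parameters are illustrative assumptions; pos_end is always
    greater than pos_start and never exceeds max_pos.
    """
    rng = random.Random(seed)  # seeded for reproducible output
    rows = []
    for _ in range(n):
        start = rng.randint(0, max_pos - max_len)
        length = rng.randint(1, max_len)
        rows.append((rng.choice(contigs), start, start + length))
    return rows


intervals = generate_intervals(100)
print(len(intervals), intervals[0])
```

Seeding the generator keeps the two sides of a benchmark comparison identical across runs, which matters when timings from different libraries are compared.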

Note

Test datasets in the Parquet format can be downloaded from:

Single thread results

Results for overlap, nearest, count-overlaps, and coverage operations with single-thread performance on apple-m3-max and gcp-linux platforms.
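
The Speedup column in the tables that follow is consistent with dividing the bioframe mean time by each library's mean time, so bioframe is always 1.00x. The sketch below shows one way such Min/Max/Mean figures and the speedup could be computed; the number of repetitions and the helper names are assumptions, not the benchmark's actual harness.

```python
import statistics
import time


def time_runs(func, repeats=3):
    """Run func several times and return (min, max, mean) wall time in seconds."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        func()
        samples.append(time.perf_counter() - t0)
    return min(samples), max(samples), statistics.mean(samples)


def speedup(baseline_mean, mean):
    """Speedup relative to the baseline library (values above 1 are faster)."""
    return baseline_mean / mean


# Check against the 1-2 overlap table for apple-m3-max:
# bioframe mean 0.103354 s, polars_bio mean 0.0383 s.
print(f"{speedup(0.103354, 0.0383):.2f}x")  # 2.70x
```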

apple-m3-max

1-2

overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.035619 0.043113 0.0383 2.70x
bioframe 0.102257 0.104425 0.103354 1.00x
pyranges0 0.025425 0.032821 0.028001 3.69x
pyranges1 0.059608 0.064147 0.061763 1.67x
pybedtools 0.343204 0.352804 0.348434 0.30x
genomicranges 1.042893 1.044245 1.043488 0.10x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.039943 0.045166 0.042109 4.45x
bioframe 0.185452 0.189631 0.187388 1.00x
pyranges0 0.092334 0.09634 0.093688 2.00x
pyranges1 0.133631 0.134179 0.133981 1.40x
pybedtools 0.756676 0.761866 0.75953 0.25x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.026706 0.029754 0.028142 4.69x
bioframe 0.131124 0.133729 0.132052 1.00x
pyranges0 0.039136 0.039774 0.039377 3.35x
pyranges1 0.061976 0.063181 0.062658 2.11x
pybedtools 0.665804 0.673844 0.668534 0.20x
genomicranges 0.994963 1.006435 0.999389 0.13x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.0262 0.028749 0.027418 6.30x
bioframe 0.16949 0.176628 0.172842 1.00x
pyranges0 0.07376 0.076708 0.075369 2.29x
pyranges1 0.128027 0.133263 0.130247 1.33x
pybedtools 0.701817 0.708726 0.705839 0.24x
genomicranges 1.032651 1.049059 1.040799 0.17x

8-7

overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 3.987391 4.648581 4.235518 7.17x
bioframe 29.793837 30.991576 30.375518 1.00x
pyranges0 15.632212 15.974075 15.857213 1.92x
pyranges1 31.622804 33.699074 32.680701 0.93x
pybedtools 916.711575 919.974811 918.154834 0.03x
genomicranges 479.214112 487.832054 484.579554 0.06x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 2.116922 2.169534 2.139006 32.13x
bioframe 68.581465 68.992651 68.725495 1.00x
pyranges0 1.381964 1.508513 1.424446 48.25x
pyranges1 2.697684 2.728407 2.717532 25.29x
pybedtools 35.528719 35.876667 35.699544 1.93x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.445467 1.484052 1.46225 58.77x
bioframe 85.632767 86.26148 85.935955 1.00x
pyranges0 9.674847 9.833233 9.753982 8.81x
pyranges1 10.170249 10.254359 10.201813 8.42x
pybedtools 33.101592 33.966188 33.423595 2.57x
genomicranges 488.972732 490.395787 489.548184 0.18x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.195279 1.205765 1.199323 20.45x
bioframe 24.423391 24.682901 24.525909 1.00x
pyranges0 11.093644 11.328071 11.220416 2.19x
pyranges1 11.987003 12.147925 12.066045 2.03x
pybedtools 59.699275 60.04087 59.84965 0.41x
genomicranges 500.041974 503.31936 502.043072 0.05x

100-1p

overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.002471 0.006262 0.003855 0.54x
bioframe 0.001374 0.002735 0.002067 1.00x
pyranges0 0.000977 0.001952 0.001337 1.55x
pyranges1 0.002276 0.003591 0.002739 0.75x
pybedtools 0.006856 0.010064 0.008032 0.26x
genomicranges 0.001784 0.002115 0.001938 1.07x
pygenomics 0.000475 0.000541 0.000509 4.06x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.002802 0.007312 0.004371 0.51x
bioframe 0.00157 0.00347 0.002251 1.00x
pyranges0 0.00135 0.004085 0.002281 0.99x
pyranges1 0.002084 0.003622 0.002633 0.85x
pybedtools 0.005288 0.023073 0.011717 0.19x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.001892 0.006355 0.003397 0.52x
bioframe 0.001563 0.002165 0.001775 1.00x
pyranges1 0.00181 0.002209 0.001972 0.90x
pybedtools 0.020892 0.062978 0.036866 0.05x
genomicranges 0.001896 0.002057 0.001957 0.91x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.001911 0.006057 0.003343 1.03x
bioframe 0.003065 0.00411 0.003452 1.00x
pyranges1 0.004455 0.005845 0.005021 0.69x
pybedtools 0.02477 0.059532 0.037421 0.09x

1000-1p

overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.00262 0.004367 0.003278 0.71x
bioframe 0.001909 0.002988 0.002313 1.00x
pyranges0 0.001361 0.00182 0.001543 1.50x
pyranges1 0.002678 0.003166 0.002927 0.79x
pybedtools 0.037238 0.039737 0.038453 0.06x
genomicranges 0.019265 0.019945 0.01957 0.12x
pygenomics 0.006876 0.006994 0.006949 0.33x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.003048 0.0083 0.00553 0.65x
bioframe 0.003269 0.004119 0.003604 1.00x
pyranges0 0.002514 0.003506 0.003099 1.16x
pyranges1 0.003722 0.00418 0.003935 0.92x
pybedtools 0.00881 0.011281 0.009729 0.37x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.001854 0.004714 0.002898 1.00x
bioframe 0.002523 0.003547 0.002898 1.00x
pyranges1 0.002302 0.002838 0.002498 1.16x
pybedtools 0.032681 0.047822 0.037981 0.08x
genomicranges 0.020029 0.02029 0.020192 0.14x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.002202 0.003516 0.002696 1.77x
bioframe 0.004238 0.005691 0.004758 1.00x
pyranges1 0.004909 0.005934 0.005284 0.90x
pybedtools 0.030735 0.045004 0.03646 0.13x

10000-1p

overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.004603 0.008294 0.006073 1.81x
bioframe 0.010529 0.011367 0.011014 1.00x
pyranges0 0.006498 0.007306 0.006811 1.62x
pyranges1 0.01096 0.012611 0.011684 0.94x
pybedtools 0.94646 0.94995 0.948121 0.01x
genomicranges 0.198868 0.200266 0.199428 0.06x
pygenomics 0.080325 0.08121 0.080663 0.14x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.004851 0.007782 0.005908 4.50x
bioframe 0.025947 0.027779 0.026584 1.00x
pyranges0 0.00501 0.005703 0.00526 5.05x
pyranges1 0.007517 0.007937 0.00769 3.46x
pybedtools 0.040749 0.043864 0.041889 0.63x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.003283 0.008069 0.005083 3.12x
bioframe 0.014669 0.016689 0.015834 1.00x
pyranges1 0.007637 0.008979 0.008178 1.94x
pybedtools 0.720797 0.730655 0.725407 0.02x
genomicranges 0.202131 0.209398 0.204628 0.08x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.002756 0.004613 0.003377 3.06x
bioframe 0.009849 0.011243 0.010339 1.00x
pyranges1 0.01326 0.015308 0.013973 0.74x
pybedtools 0.727294 0.733098 0.73116 0.01x

100000-1p

overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.030583 0.038892 0.033394 3.33x
bioframe 0.108358 0.115233 0.111059 1.00x
pyranges0 0.059633 0.065599 0.061791 1.80x
pyranges1 0.100074 0.105947 0.102267 1.09x
pybedtools 13.434458 13.602339 13.496321 0.01x
genomicranges 2.030365 2.052434 2.039897 0.05x
pygenomics 1.001974 1.018231 1.009213 0.11x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.03013 0.036718 0.03339 10.61x
bioframe 0.352786 0.356839 0.354241 1.00x
pyranges0 0.032403 0.034701 0.033667 10.52x
pyranges1 0.044958 0.046169 0.045629 7.76x
pybedtools 0.369122 0.379131 0.3729 0.95x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.021035 0.026894 0.023802 13.86x
bioframe 0.308013 0.347919 0.329806 1.00x
pyranges1 0.076199 0.085019 0.079372 4.16x
pybedtools 11.056327 11.280248 11.149039 0.03x
genomicranges 2.057607 2.07651 2.067998 0.16x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.013263 0.014998 0.013874 5.68x
bioframe 0.077717 0.081116 0.078865 1.00x
pyranges1 0.094753 0.114552 0.10257 0.77x
pybedtools 11.374602 11.428316 11.393849 0.01x

1000000-1p

overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.482548 0.538737 0.507383 2.55x
bioframe 1.26082 1.35031 1.296195 1.00x
pyranges0 0.775969 0.828801 0.810501 1.60x
pyranges1 1.272326 1.29706 1.28585 1.01x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.439544 0.488414 0.458975 14.86x
bioframe 6.592501 7.111734 6.818208 1.00x
pyranges0 0.398173 0.413055 0.406623 16.77x
pyranges1 0.51649 0.520946 0.518407 13.15x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.257781 0.305275 0.28525 17.65x
bioframe 4.640915 5.437883 5.033454 1.00x
pyranges1 0.916714 0.925945 0.920594 5.47x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.128241 0.137198 0.132474 7.71x
bioframe 0.996542 1.065777 1.021738 1.00x
pyranges1 1.115134 1.247674 1.172964 0.87x

10000000-1p

overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 8.532137 9.738828 8.978132 2.20x
bioframe 19.276665 20.295566 19.708064 1.00x
pyranges0 14.819439 15.339048 15.092611 1.31x
pyranges1 20.153432 22.654892 21.56345 0.91x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 7.12779 7.490779 7.263011 22.17x
bioframe 156.356696 169.531002 160.989714 1.00x
pyranges0 6.402183 6.879779 6.62806 24.29x
pyranges1 7.526236 8.176338 7.857803 20.49x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 4.887937 5.553197 5.165014 20.21x
bioframe 102.637625 105.903506 104.389343 1.00x
pyranges1 13.35283 15.167609 14.19713 7.35x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.627897 1.683304 1.655288 9.86x
bioframe 15.586487 16.774274 16.316676 1.00x
pyranges1 16.99118 17.447484 17.195844 0.95x

gcp-linux

1-2

overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.045943 0.064732 0.054234 1.66x
bioframe 0.084137 0.099481 0.090107 1.00x
pyranges0 0.056206 0.065654 0.061844 1.46x
pyranges1 0.09908 0.119018 0.106228 0.85x
pybedtools 0.38246 0.406379 0.39153 0.23x
genomicranges 1.19939 1.224621 1.208255 0.07x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.057012 0.073822 0.064665 2.49x
bioframe 0.158764 0.165707 0.161273 1.00x
pyranges0 0.172297 0.176259 0.17363 0.93x
pyranges1 0.217619 0.234088 0.22335 0.72x
pybedtools 0.845945 0.84898 0.847447 0.19x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.035631 0.043555 0.04066 2.74x
bioframe 0.108015 0.116522 0.111266 1.00x
pyranges0 0.077336 0.080282 0.07844 1.42x
pyranges1 0.100883 0.106671 0.103181 1.08x
pybedtools 0.745958 0.759006 0.754393 0.15x
genomicranges 1.154942 1.164158 1.158506 0.10x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.036476 0.040001 0.037897 5.10x
bioframe 0.189201 0.20046 0.193401 1.00x
pyranges0 0.141659 0.14424 0.143188 1.35x
pyranges1 0.206033 0.224902 0.213089 0.91x
pybedtools 0.773732 0.780424 0.776934 0.25x
genomicranges 1.186341 1.194172 1.189255 0.16x

8-7

overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 6.235223 9.61441 7.723144 6.54x
bioframe 50.319263 50.956633 50.537202 1.00x
pyranges0 36.371926 36.581642 36.448645 1.39x
pyranges1 63.336711 63.455435 63.40654 0.80x
pybedtools 1149.001487 1152.127068 1150.070659 0.04x
genomicranges 597.951648 599.960895 599.002871 0.08x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 3.576373 3.679698 3.633697 15.54x
bioframe 56.301865 56.776617 56.464305 1.00x
pyranges0 2.45308 2.60494 2.505172 22.54x
pyranges1 4.975662 5.011008 4.997007 11.30x
pybedtools 44.181913 44.79409 44.386971 1.27x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 2.052196 2.104447 2.075706 38.15x
bioframe 79.174164 79.234115 79.194209 1.00x
pyranges0 18.797436 18.851941 18.824498 4.21x
pyranges1 20.399172 20.436149 20.418562 3.88x
pybedtools 35.850631 36.142479 36.041115 2.20x
genomicranges 612.985873 613.52087 613.229997 0.13x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.829478 1.838981 1.834999 15.44x
bioframe 28.29136 28.361417 28.326821 1.00x
pyranges0 18.611247 20.021441 19.473105 1.45x
pyranges1 22.118838 22.210733 22.161329 1.28x
pybedtools 74.477086 74.868659 74.618066 0.38x
genomicranges 623.865655 623.94955 623.896645 0.05x

100-1p

overlap
nearest
count-overlaps
coverage

1000-1p

overlap
nearest
count-overlaps
coverage

10000-1p

overlap
nearest
count-overlaps
coverage

100000-1p

overlap
nearest
count-overlaps
coverage

1000000-1p

overlap
nearest
count-overlaps
coverage

10000000-1p

overlap
nearest
count-overlaps
coverage

Parallel performance

Results for parallel operations with 1, 2, 4, 6 and 8 threads.
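
In these tables the Speedup column is relative to the single-threaded polars_bio run. Dividing the speedup by the thread count gives the parallel efficiency, which makes it easier to see where scaling flattens out. A small worked example, using the 8-7-8p overlap means reported for apple-m3-max (the function name is hypothetical):

```python
def parallel_efficiency(mean_1_thread, mean_n_threads, n_threads):
    """Return (speedup, efficiency) for an n-thread run versus one thread."""
    speedup = mean_1_thread / mean_n_threads
    return speedup, speedup / n_threads


# polars_bio: 3.370889 s with 1 thread, polars_bio-8: 0.701048 s with 8 threads.
s, e = parallel_efficiency(3.370889, 0.701048, 8)
print(f"speedup {s:.2f}x, efficiency {e:.0%}")  # speedup 4.81x, efficiency 60%
```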

apple-m3-max

8-7-8p

overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 3.247022 3.803021 3.370889 1.00x
polars_bio-2 1.798569 1.848162 1.811417 1.86x
polars_bio-4 1.140229 1.158243 1.147355 2.94x
polars_bio-6 0.959703 0.968725 0.962915 3.50x
polars_bio-8 0.694637 0.710492 0.701048 4.81x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 2.186354 2.248171 2.220822 1.00x
polars_bio-2 1.162969 1.222115 1.187505 1.87x
polars_bio-4 0.708508 0.735763 0.720115 3.08x
polars_bio-6 0.632877 0.652955 0.642816 3.45x
polars_bio-8 0.456674 0.476473 0.465284 4.77x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.502551 1.534006 1.515078 1.00x
polars_bio-2 0.811236 0.821365 0.815682 1.86x
polars_bio-4 0.440628 0.46778 0.455358 3.33x
polars_bio-6 0.331317 0.338207 0.334638 4.53x
polars_bio-8 0.280465 0.282707 0.281311 5.39x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.181806 1.185549 1.183889 1.00x
polars_bio-2 0.644288 0.645076 0.644587 1.84x
polars_bio-4 0.362752 0.363411 0.363036 3.26x
polars_bio-6 0.258583 0.272702 0.264111 4.48x
polars_bio-8 0.222888 0.234884 0.229052 5.17x

1000000-8p

overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.468442 0.523065 0.494609 1.00x
polars_bio-2 0.262861 0.26828 0.265028 1.87x
polars_bio-4 0.1629 0.166657 0.164536 3.01x
polars_bio-6 0.137724 0.146893 0.143772 3.44x
polars_bio-8 0.111952 0.11465 0.113521 4.36x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.393067 0.415076 0.404032 1.00x
polars_bio-2 0.234559 0.235746 0.235051 1.72x
polars_bio-4 0.158996 0.167352 0.16349 2.47x
polars_bio-6 0.14634 0.14935 0.148215 2.73x
polars_bio-8 0.125472 0.128158 0.126606 3.19x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.267875 0.296727 0.277677 1.00x
polars_bio-2 0.163662 0.170045 0.165917 1.67x
polars_bio-4 0.111136 0.114835 0.112891 2.46x
polars_bio-6 0.097944 0.104607 0.101477 2.74x
polars_bio-8 0.099474 0.117493 0.106059 2.62x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 0.128377 0.131261 0.129598 1.00x
polars_bio-2 0.081762 0.085104 0.08324 1.56x
polars_bio-4 0.064151 0.066197 0.064851 2.00x
polars_bio-6 0.066926 0.06892 0.06768 1.91x
polars_bio-8 0.072767 0.074339 0.073589 1.76x

10000000-8p

overlap
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 9.081732 9.388126 9.203018 1.00x
polars_bio-2 4.696455 4.912478 4.793254 1.92x
polars_bio-4 2.885023 2.902893 2.896218 3.18x
polars_bio-6 2.196605 2.217945 2.209839 4.16x
polars_bio-8 1.813586 1.860947 1.833498 5.02x
nearest
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 7.299887 7.659385 7.495962 1.00x
polars_bio-2 4.01928 4.158504 4.069511 1.84x
polars_bio-4 2.683383 2.720981 2.704975 2.77x
polars_bio-6 2.141075 2.162109 2.150595 3.49x
polars_bio-8 1.859186 1.865634 1.862653 4.02x
count-overlaps
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 5.30938 5.450332 5.381068 1.00x
polars_bio-2 2.893766 2.91378 2.906401 1.85x
polars_bio-4 1.748771 1.797485 1.768895 3.04x
polars_bio-6 1.352671 1.385655 1.369312 3.93x
polars_bio-8 1.178559 1.199971 1.192577 4.51x
coverage
Library Min (s) Max (s) Mean (s) Speedup
polars_bio 1.638818 1.678156 1.655573 1.00x
polars_bio-2 0.994195 0.996554 0.995701 1.66x
polars_bio-4 0.678722 0.701234 0.689151 2.40x
polars_bio-6 0.620289 0.662175 0.639026 2.59x
polars_bio-8 0.570659 0.582937 0.57688 2.87x

gcp-linux

8-7-8p

overlap
nearest
count-overlaps
coverage

1000000-8p

overlap
nearest
count-overlaps
coverage

10000000-8p

overlap
nearest
count-overlaps
coverage

End to end tests

Results for an end-to-end test that computes the overlap, nearest, coverage, and count-overlaps operations and saves the results to a CSV file.

Note

Please note that for pyranges0 we were unable to export the results of the coverage and count-overlaps operations to a CSV file, so these results are not presented here.
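
The tables in this section include a polars_bio_streaming variant, whose peak memory is often well below that of the non-streaming run. As a generic illustration only (it does not use or model the polars-bio engine), the sketch below compares the peak Python-level allocations when rows are written to CSV from a fully materialized list versus a lazy generator. It uses tracemalloc, which differs from the process-level measurements taken with memory-profiler, and all names and row contents are made up.

```python
import csv
import os
import tempfile
import tracemalloc


def rows(n):
    """Lazily yield n made-up pairs of overlapping intervals."""
    for i in range(n):
        yield ("chr1", i, i + 10, "chr1", i + 5, i + 15)


def write_csv(path, records):
    """Write any iterable of rows to a CSV file, one row at a time."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(records)


def peak_bytes(func):
    """Return the peak Python allocation (bytes) observed while func runs."""
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "out.csv")
    n = 50_000
    # Materialized: the whole result exists in memory before writing.
    materialized = peak_bytes(lambda: write_csv(out, list(rows(n))))
    # Streamed: rows are produced and written incrementally.
    streamed = peak_bytes(lambda: write_csv(out, rows(n)))

print(f"materialized peak: {materialized} B, streamed peak: {streamed} B")
```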

apple-m3-max

1-2
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.042378 0.130957 0.071929 3.10x 285.468
polars_bio_streaming 0.035498 0.037438 0.036653 6.09x 274.093
bioframe 0.208548 0.251457 0.223219 1.00x 300.75
pyranges0 0.409707 0.415361 0.412135 0.54x 329.968
pyranges1 0.47518 0.491508 0.482739 0.46x 324.468
e2e-nearest-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.053349 0.058382 0.055362 9.14x 321.062
polars_bio_streaming 0.051385 0.053979 0.052764 9.59x 311.422
bioframe 0.503887 0.510257 0.506123 1.00x 316.969
pyranges0 1.135469 1.183369 1.151801 0.44x 364.594
pyranges1 1.327935 1.334101 1.331346 0.38x 357.734
e2e-coverage-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.034756 0.038718 0.036421 13.40x 290.078
polars_bio_streaming 0.03607 0.037332 0.036534 13.35x 274.344
bioframe 0.48449 0.492328 0.487891 1.00x 419.312
pyranges1 0.971084 0.980085 0.975012 0.50x 407.562
e2e-count-overlaps-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.03452 0.037714 0.035927 9.27x 294.266
polars_bio_streaming 0.035863 0.036756 0.036414 9.14x 278.438
bioframe 0.328145 0.338734 0.332951 1.00x 306.234
pyranges1 0.532739 0.544914 0.538646 0.62x 328.328
8-7
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 22.781745 23.916568 23.161559 16.64x 14677.0468
polars_bio_streaming 18.501279 18.797602 18.676707 20.63x 555.109
bioframe 383.108514 387.500069 385.309331 1.00x 33806.062
pyranges0 276.421312 279.839508 277.845198 1.39x 29777.312
pyranges1 355.703878 367.680249 360.875151 1.07x 34526.859
e2e-nearest-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 2.597955 2.760537 2.674482 32.02x 1060.031
polars_bio_streaming 2.65088 2.685157 2.665171 32.13x 560.453
bioframe 85.238305 86.131916 85.644961 1.00x 6894.062
pyranges0 13.530549 13.705834 13.620471 6.29x 3031.797
pyranges1 16.290782 16.385961 16.322671 5.25x 3509.984
e2e-coverage-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 1.523833 1.555472 1.541038 21.41x 717.984
polars_bio_streaming 1.336613 1.397324 1.364051 24.19x 411.703
bioframe 32.294844 33.421618 32.99334 1.00x 16651.922
pyranges1 26.382409 27.382901 27.020202 1.22x 6119.125
e2e-count-overlaps-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 1.806838 1.845584 1.82594 54.33x 729.078
polars_bio_streaming 1.681187 1.767811 1.714943 57.85x 416.094
bioframe 97.91802 101.736351 99.210461 1.00x 23029.219
pyranges1 19.498264 19.676838 19.561322 5.07x 5270.234
100-1p
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.009118 0.077181 0.032054 1.55x 248.594
polars_bio_streaming 0.003382 0.004769 0.003853 12.92x 247.562
bioframe 0.030154 0.088667 0.049769 1.00x 231.641
pyranges0 0.045764 0.051035 0.047857 1.04x 228.516
pyranges1 0.053751 0.072545 0.060221 0.83x 228.609
e2e-nearest-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.009145 0.038799 0.019201 2.24x 253.156
polars_bio_streaming 0.003964 0.005051 0.004504 9.53x 248.188
bioframe 0.033372 0.061107 0.042931 1.00x 229.906
pyranges0 0.049586 0.057381 0.052364 0.82x 231.812
pyranges1 0.054496 0.059205 0.056362 0.76x 231.688
e2e-coverage-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.005492 0.020652 0.012584 5.20x 245.578
polars_bio_streaming 0.003059 0.003746 0.003397 19.25x 243.5
bioframe 0.060684 0.074157 0.065378 1.00x 230.953
pyranges1 0.093668 0.096265 0.094567 0.69x 243.5
e2e-count-overlaps-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.005291 0.008843 0.006568 5.53x 249.406
polars_bio_streaming 0.003279 0.003697 0.003447 10.53x 245.672
bioframe 0.032914 0.042309 0.036302 1.00x 234.141
pyranges1 0.045085 0.045477 0.045224 0.80x 232.703
10000000-1p
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 11.109423 11.871893 11.397992 10.38x 7064.312
polars_bio_streaming 12.049206 12.327491 12.191582 9.71x 1505.109
bioframe 117.701516 119.51073 118.356016 1.00x 16380.234
pyranges0 235.484308 243.216406 239.726101 0.49x 14245.203
pyranges1 109.722359 112.326873 111.23273 1.06x 19423.172
e2e-nearest-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 7.842314 8.84828 8.510181 21.04x 2301.0
polars_bio_streaming 7.589706 8.153016 7.842404 22.83x 1327.531
bioframe 174.790383 183.458906 179.035999 1.00x 10996.234
pyranges0 32.793505 32.826686 32.809101 5.46x 4882.656
pyranges1 18.866156 19.570609 19.142653 9.35x 5253.281
e2e-coverage-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 1.901833 1.957711 1.928367 12.15x 956.844
polars_bio_streaming 1.797332 1.802527 1.800497 13.01x 651.266
bioframe 23.269774 23.55838 23.430125 1.00x 6493.234
pyranges1 26.370249 27.172173 26.879266 0.87x 10397.531
e2e-count-overlaps-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 5.025462 5.234103 5.129963 20.79x 1036.734
polars_bio_streaming 4.956087 5.076052 5.014242 21.27x 968.719
bioframe 105.322287 107.758078 106.64158 1.00x 12803.828
pyranges1 22.079391 23.069931 22.618209 4.71x 10039.297

gcp-linux

1-2
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 0.072393 0.151871 0.09916 2.80x 314.234
polars_bio_streaming 0.064092 0.067914 0.066202 4.19x 288.621
bioframe 0.258278 0.31288 0.277225 1.00x 287.101
pyranges0 0.591745 0.599954 0.595204 0.47x 307.218
pyranges1 0.683388 0.702289 0.690362 0.40x 327.863
e2e-nearest-csv
e2e-coverage-csv
e2e-count-overlaps-csv
8-7
e2e-overlap-csv
Library Min (s) Max (s) Mean (s) Speedup Peak memory (MB)
polars_bio 44.539766 45.543038 45.196903 12.55x 14575.14
polars_bio_streaming 34.007093 35.972075 35.309756 16.06x 480.207
bioframe 566.167037 567.617695 567.13069 1.00x 43295.378
pyranges0 417.291061 421.875539 419.571591 1.35x 22915.917
pyranges1 538.365637 548.624613 543.918168 1.04x 43408.699
e2e-nearest-csv
e2e-coverage-csv
e2e-count-overlaps-csv
100-1p
e2e-overlap-csv
e2e-nearest-csv
e2e-coverage-csv
e2e-count-overlaps-csv
10000000-1p
e2e-overlap-csv
e2e-nearest-csv
e2e-coverage-csv
e2e-count-overlaps-csv

Memory profiles

Operation: overlap for dataset: 1-2 on platform: apple-m3-max

[Figure: 5 memory profile plots]

Operation: nearest for dataset: 1-2 on platform: apple-m3-max

[Figure: 5 memory profile plots]

Operation: coverage for dataset: 1-2 on platform: apple-m3-max

[Figure: 4 memory profile plots]

Operation: count-overlaps for dataset: 1-2 on platform: apple-m3-max

[Figure: 4 memory profile plots]

Operation: overlap for dataset: 8-7 on platform: apple-m3-max

[Figure: 5 memory profile plots]

Operation: nearest for dataset: 8-7 on platform: apple-m3-max

[Figure: 5 memory profile plots]

Operation: coverage for dataset: 8-7 on platform: apple-m3-max

[Figure: 4 memory profile plots]

Operation: count-overlaps for dataset: 8-7 on platform: apple-m3-max

[Figure: 4 memory profile plots]

Operation: overlap for dataset: 100-1p on platform: apple-m3-max

[Figure: 5 memory profile plots]

Operation: nearest for dataset: 100-1p on platform: apple-m3-max

[Figure: 5 memory profile plots]

Operation: coverage for dataset: 100-1p on platform: apple-m3-max

[Figure: 4 memory profile plots]

Operation: count-overlaps for dataset: 100-1p on platform: apple-m3-max

[Figure: 4 memory profile plots]

Operation: overlap for dataset: 10000000-1p on platform: apple-m3-max

[Figure: 5 memory profile plots]

Operation: nearest for dataset: 10000000-1p on platform: apple-m3-max

[Figure: 5 memory profile plots]

Operation: coverage for dataset: 10000000-1p on platform: apple-m3-max

[Figure: 4 memory profile plots]

Operation: count-overlaps for dataset: 10000000-1p on platform: apple-m3-max

[Figure: 4 memory profile plots]

Operation: overlap for dataset: 1-2 on platform: gcp-linux

[Figure: 5 memory profile plots]

Operation: nearest for dataset: 1-2 on platform: gcp-linux

Operation: coverage for dataset: 1-2 on platform: gcp-linux

Operation: count-overlaps for dataset: 1-2 on platform: gcp-linux

Operation: overlap for dataset: 8-7 on platform: gcp-linux

Operation: nearest for dataset: 8-7 on platform: gcp-linux

Operation: coverage for dataset: 8-7 on platform: gcp-linux

Operation: count-overlaps for dataset: 8-7 on platform: gcp-linux

Operation: overlap for dataset: 100-1p on platform: gcp-linux

Operation: nearest for dataset: 100-1p on platform: gcp-linux

Operation: coverage for dataset: 100-1p on platform: gcp-linux

Operation: count-overlaps for dataset: 100-1p on platform: gcp-linux

Operation: overlap for dataset: 10000000-1p on platform: gcp-linux

Operation: nearest for dataset: 10000000-1p on platform: gcp-linux

Operation: coverage for dataset: 10000000-1p on platform: gcp-linux

Operation: count-overlaps for dataset: 10000000-1p on platform: gcp-linux

Comparison of the output schemas and data types

polars-bio tries to preserve the output schema of the bioframe package, whereas pyranges uses its own internal representation that can be converted to a Pandas DataFrame. It is also worth mentioning that pyranges always uses int64 to represent start/end positions, while polars-bio and bioframe determine the type adaptively, based on the input file formats or the data types of the input DataFrames. Note that polars-bio does not support interval operations on chromosomes longer than 2 Gbp (issue). However, in the analyzed test case (8-7), the input and output data structures have similar memory requirements. Compare the following schemas and memory size estimates of the input and output DataFrames for the 8-7 test case:

import bioframe as bf
import polars_bio as pb
import pandas as pd
import polars as pl
import pyranges0 as pr0


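# Local path to the benchmark data; adjust to your environment.
DATA_DIR = "/Users/mwiewior/research/polars-bio-benchmarking/data"

# Glob patterns (strings) passed directly to polars-bio.
df_1 = f"{DATA_DIR}/ex-anno/*.parquet"
df_2 = f"{DATA_DIR}/ex-rna/*.parquet"

# Pandas DataFrames read from the same directories, used by bioframe and pyranges0.
df1 = pd.read_parquet(df_1.replace("*.parquet", ""))
df2 = pd.read_parquet(df_2.replace("*.parquet", ""))

# Names of the contig/start/end columns in both datasets.
cols = ["contig", "pos_start", "pos_end"]


def df2pr0(df):
    """Convert a Pandas DataFrame into a pyranges0 PyRanges object."""
    return pr0.PyRanges(
        chromosomes=df.contig,
        starts=df.pos_start,
        ends=df.pos_end,
    )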

Input dataset sizes and schemas

df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1194285 entries, 0 to 1194284
Data columns (total 3 columns):
 #   Column     Non-Null Count    Dtype
---  ------     --------------    -----
 0   contig     1194285 non-null  object
 1   pos_start  1194285 non-null  int32
 2   pos_end    1194285 non-null  int32
dtypes: int32(2), object(1)
memory usage: 18.2+ MB

df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9944559 entries, 0 to 9944558
Data columns (total 3 columns):
 #   Column     Dtype
---  ------     -----
 0   contig     object
 1   pos_start  int32
 2   pos_end    int32
dtypes: int32(2), object(1)
memory usage: 151.7+ MB

polars-bio output DataFrame schema and memory used (Polars and Pandas)

df_pb = pb.overlap(df_1, df_2, cols1=cols, cols2=cols, use_zero_based=True)
df_pb.count().collect()
307184634

df_pb.collect_schema()
Schema([('contig_1', String),
        ('pos_start_1', Int32),
        ('pos_end_1', Int32),
        ('contig_2', String),
        ('pos_start_2', Int32),
        ('pos_end_2', Int32)])

df_pb.collect().estimated_size("mb")
7360.232946395874
df_pb.collect().to_pandas().info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307184634 entries, 0 to 307184633
Data columns (total 6 columns):
 #   Column       Dtype
---  ------       -----
 0   contig_1     object
 1   pos_start_1  int32
 2   pos_end_1    int32
 3   contig_2     object
 4   pos_start_2  int32
 5   pos_end_2    int32
dtypes: int32(4), object(2)
memory usage: 9.2+ GB

bioframe output DataFrame schema and memory used (Pandas)

df_bf = bf.overlap(df1, df2, cols1=cols, cols2=cols, how="inner")
len(df_bf)
307184634
df_bf.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307184634 entries, 0 to 307184633
Data columns (total 6 columns):
 #   Column      Dtype
---  ------      -----
 0   contig      object
 1   pos_start   int32
 2   pos_end     int32
 3   contig_     object
 4   pos_start_  int32
 5   pos_end_    int32
dtypes: int32(4), object(2)
memory usage: 9.2+ GB

pyranges0 output DataFrame schema and memory used (Pandas)

df_pr0_1 = df2pr0(df1)
df_pr0_2 = df2pr0(df2)
df_pr0_1.df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1194285 entries, 0 to 1194284
Data columns (total 3 columns):
 #   Column      Non-Null Count    Dtype
---  ------      --------------    -----
 0   Chromosome  1194285 non-null  category
 1   Start       1194285 non-null  int64
 2   End         1194285 non-null  int64
dtypes: category(1), int64(2)
memory usage: 19.4 MB
df_pr0_2.df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9944559 entries, 0 to 9944558
Data columns (total 3 columns):
 #   Column      Dtype
---  ------      -----
 0   Chromosome  category
 1   Start       int64
 2   End         int64
dtypes: category(1), int64(2)
memory usage: 161.2 MB

df_pr0 = df_pr0_1.join(df_pr0_2)
len(df_pr0)
307184634
df_pr0.df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307184634 entries, 0 to 307184633
Data columns (total 5 columns):
 #   Column      Dtype
---  ------      -----
 0   Chromosome  category
 1   Start       int64
 2   End         int64
 3   Start_b     int64
 4   End_b       int64
dtypes: category(1), int64(4)
memory usage: 9.4 GB

Note

Please note that pyranges, unlike bioframe and polars-bio, returns only one chromosome column but uses the int64 data type to encode start and end positions, even if the input datasets use int32.
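
The Pandas memory figures reported above can be approximately reproduced with simple per-row arithmetic, which helps explain why the narrower int32 columns of polars-bio and bioframe end up close to the pyranges result despite its int64 positions. The sketch below is illustrative only and rests on a few assumptions that are not stated in the original listings: a 64-bit Python build (8-byte pointers for object columns), categorical codes stored as int8 (plausible for a small number of chromosomes), and sizes formatted with 1024-based units as in `DataFrame.info()`. The helper names `shallow_size` and `fmt` are hypothetical. The `+` in the Pandas output indicates that string payloads of object columns are not counted, so these are lower bounds.

```python
# Bytes per value in the shallow (non-deep) Pandas estimate.
# Assumptions: 8-byte object pointers (64-bit build), int8 categorical codes.
BYTES_PER_VALUE = {"object": 8, "int32": 4, "int64": 8, "category": 1}

# Number of overlapping pairs reported for the 8-7 test case.
ROWS = 307_184_634


def shallow_size(rows, dtypes):
    """Estimate the shallow size in bytes of a DataFrame with the given column dtypes."""
    return rows * sum(BYTES_PER_VALUE[d] for d in dtypes)


def fmt(num_bytes):
    """Format a byte count with 1024-based units and one decimal place."""
    for unit in ("bytes", "KB", "MB", "GB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} TB"


# polars-bio / bioframe output: two contig columns and four int32 positions.
two_contig_int32 = ["object", "int32", "int32", "object", "int32", "int32"]
# pyranges0 output: one categorical chromosome column and four int64 positions.
one_chrom_int64 = ["category", "int64", "int64", "int64", "int64"]

print(fmt(shallow_size(ROWS, two_contig_int32)))  # 9.2 GB
print(fmt(shallow_size(ROWS, one_chrom_int64)))   # 9.4 GB
```

Under these assumptions, each output row costs 32 bytes in the polars-bio/bioframe layout and 33 bytes in the pyranges0 layout: the single categorical chromosome column offsets most of the extra cost of the wider int64 positions.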