Next-gen Python DataFrame operations for genomics!
polars-bio is a blazing fast Python DataFrame library for genomics 🧬 built on top of Apache DataFusion, Apache Arrow and polars. It is designed to be easy to use, fast and memory efficient with a focus on genomics data.
Single-thread performance
Parallel performance
Key Features
- optimized for performance and large-scale genomics datasets
- popular genomics operations with a DataFrame API (both Pandas and polars)
- native parallel engine powered by Apache DataFusion and sequila-native
- out-of-core processing (for data too large to fit into a computer's main memory) with Apache DataFusion and polars
- zero-copy data exchange with Apache Arrow
- bioinformatics file formats with exon
- pre-built wheel packages for Linux, Windows and macOS (arm64 and x86_64) available on PyPI
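To make the "genomics operations" item above concrete, here is a minimal plain-Python sketch of what an interval overlap operation computes. It is an illustration only: it does not use the polars-bio API, it is not the algorithm polars-bio implements, and the names `Interval` and `overlap` as well as the closed-interval convention are assumptions made for this example.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical record type for this sketch (not a polars-bio type)."""
    chrom: str
    start: int  # inclusive
    end: int    # inclusive (closed intervals are an assumption here)


def overlap(left, right):
    """Return all (left, right) pairs on the same chromosome that overlap."""
    # Group the right-hand intervals by chromosome and sort each group by start.
    by_chrom = {}
    for iv in right:
        by_chrom.setdefault(iv.chrom, []).append(iv)
    for ivs in by_chrom.values():
        ivs.sort(key=lambda iv: iv.start)

    pairs = []
    for a in left:
        for b in by_chrom.get(a.chrom, []):
            if b.start > a.end:
                break  # sorted by start, so no later interval can overlap `a`
            if b.end >= a.start:
                pairs.append((a, b))
    return pairs


# Made-up example data: "reads" on the left, "genes" on the right.
reads = [Interval("chr1", 100, 200), Interval("chr1", 500, 600), Interval("chr2", 50, 80)]
genes = [Interval("chr1", 150, 250), Interval("chr1", 190, 520), Interval("chr2", 90, 120)]

result = overlap(reads, genes)
for read, gene in result:
    print(read, "overlaps", gene)
```

A DataFrame-based library expresses the same join over columns (chromosome, start, end) instead of Python objects, which is what allows it to run in parallel and out of core.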