Interval joins

| Property Name | Default | Meaning |
|---|---|---|
| spark.biodatageeks.rangejoin.useJoinOrder | false | Whether to always broadcast the right table of a join, or to compute row counts and pick the smaller one. |
| spark.biodatageeks.rangejoin.maxBroadcastSize | 0.1 * spark.driver.memory | The maximum allowed size of the broadcast interval structure. SeQuiLa's optimizer uses this limit to choose the interval join algorithm. |
| spark.biodatageeks.rangejoin.maxGap | 0 | The maximum gap between regions. |
| spark.biodatageeks.rangejoin.minOverlap | 1 | The minimum length of the overlap between regions. |
| spark.biodatageeks.rangejoin.intervalHolderClass | IntervalTreeRedBlack | Pluggable mechanism for supplying a custom interval structure implementation. |
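These properties are Spark configuration keys, so they can be set through the standard SparkSession API. The sketch below is a minimal illustration, not SeQuiLa's documented setup. It assumes SeQuiLa is already on the classpath and registered with the session (not shown). It also assumes maxBroadcastSize is given in bytes. The concrete values are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

object RangeJoinConfigSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only. SeQuiLa's own session setup is
    // assumed to happen elsewhere; without it these keys have no effect.
    val spark = SparkSession.builder()
      .appName("rangejoin-config-sketch")
      .master("local[*]")
      // Always broadcast the right table instead of comparing row counts.
      .config("spark.biodatageeks.rangejoin.useJoinOrder", "true")
      .getOrCreate()

    // Settings can also be changed on a running session.
    // Assumption: maxBroadcastSize is expressed in bytes (here 100 MiB).
    spark.conf.set(
      "spark.biodatageeks.rangejoin.maxBroadcastSize",
      (100L * 1024 * 1024).toString)

    // Illustrative value: require at least 5 overlapping positions.
    spark.conf.set("spark.biodatageeks.rangejoin.minOverlap", "5")

    // Read a value back to confirm it was stored in the session config.
    println(spark.conf.get("spark.biodatageeks.rangejoin.minOverlap"))

    spark.stop()
  }
}
```

Setting useJoinOrder at build time and the remaining keys at runtime only shows that both routes are available. Either route can be used for any of the keys in the table.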

Last modified July 26, 2024: Fix comet condition (#180) (15431c5)