Quasi-Horizontal
auto* lb = gempba::mt::create_load_balancer(gempba::balancing_policy::QUASI_HORIZONTAL);
Implements load_balancer. The primary scheduling strategy of GemPBA and the main algorithmic contribution of the original research. Designed for branching algorithms where subtree sizes are unknown and highly unbalanced, which is the common case.
The problem it solves
In a branch-and-bound search the recursion tree is highly unbalanced: one subtree might contain millions of nodes while its sibling contains three. Naive work-stealing picks whatever task is available — often a node near the leaves, carrying almost no remaining work. Threads finish quickly and idle while one thread drowns in the large subtree.
Quasi-horizontal avoids this by distributing work near the root first. Nodes close to the root are parents of the largest subtrees. Handing them to idle threads early gives each thread roughly equal downstream work.
%%{init: {'theme': 'base'}}%%
flowchart TD
R(("Root")):::root
R -->|"→ Thread 2"| A("A"):::t2
R -->|"→ Thread 3"| B("B"):::t3
R -->|"→ Thread 4"| C("C"):::t4
R -->|"current thread"| D("D"):::t1
A --> a1(" "):::sub
A --> a2(" "):::sub
a1 --> a3(" "):::sub
a1 --> a4(" "):::sub
B --> b1(" "):::sub
B --> b2(" "):::sub
b1 --> b3(" "):::sub
C --> c1(" "):::sub
c1 --> c2(" "):::sub
c1 --> c3(" "):::sub
c1 --> c4(" "):::sub
D --> d1(" "):::sub
D --> d2(" "):::sub
classDef root fill:#b0bec5,color:#37474f,stroke:#000000
classDef t1 fill:#7b1fa2,color:#fff,stroke:#000000
classDef t2 fill:#1565c0,color:#fff,stroke:#000000
classDef t3 fill:#2e7d32,color:#fff,stroke:#000000
classDef t4 fill:#e65100,color:#fff,stroke:#000000
classDef sub fill:#eceff1,color:#546e7a,stroke:#000000
Each thread receives a root-level child — one step below the tree root. Because these nodes are as high as possible, each thread inherits the largest possible share of remaining work.
Per-thread root pointer
Each worker thread maintains a root pointer: a reference to the highest node in the tree this thread currently owns. When a thread pushes a branch to another thread, it hands off that subtree entirely.
As branches get resolved or pruned, the root pointer descends: if the current root has only one remaining child, there is no parallelism left at that level, so the pointer advances to that child. This continues until a real branching point (two or more pending children) or a leaf is reached. This is the root correction step.
%%{init: {'theme': 'base'}}%%
flowchart LR
subgraph s1["Before: branch A resolved"]
direction TB
R1(("root\npointer")):::oldroot
R1 --> Ax("A ✓"):::done
R1 --> By("B"):::active1
By --> By1(" "):::sub1
By --> By2(" "):::sub1
By1 --> By3(" "):::sub1
By2 --> By4(" "):::sub1
By2 --> By5(" "):::sub1
end
subgraph s2["After correction"]
direction TB
R2(("root\npointer (B)")):::newroot
R2 --> Ba(" "):::active
R2 --> Bb(" "):::active
Ba --> Ba1(" "):::sub2
Bb --> Bb1(" "):::sub2
Bb --> Bb2(" "):::sub2
end
s1 -- "root descends\npast single-child nodes" --> s2
classDef oldroot fill:#b0bec5,color:#37474f,stroke:#000000
classDef newroot fill:#b0bec5,color:#37474f,stroke:#000000
classDef done fill:#bdbdbd,color:#9e9e9e,stroke:#000000
classDef active1 fill:#2e7d32,color:#fff,stroke:#000000
classDef active fill:#2e7d32,color:#fff,stroke:#000000
classDef sub1 fill:#eceff1,color:#546e7a,stroke:#000000
classDef sub2 fill:#eceff1,color:#546e7a,stroke:#000000
Once A is resolved, keeping the root pointer at the original root adds traversal overhead on every future submission. Correction moves it down to B so the thread stays focused on live work only.
Submission algorithm
When try_local_submit is called, the strategy:
- Pushes root-level siblings first. If the submitting node has pending siblings at the current root level, it pushes those to idle threads before the current node. This is the horizontal spreading step.
- Prunes the current node once no more root-level siblings remain.
- Delegates to a thread if
should_branch()is true, otherwise marks the nodeDISCARDED. - Corrects the root pointer — descends past any single-child nodes just resolved.
Tree navigation calls used:
p_node.get_root() // current root for this thread
p_root.get_children_count() // branches remaining at root level
p_root.get_second_leftmost_child() // next sibling to push
p_node.get_leftmost_child() // descend during root correction
When to use
This is the default and recommended strategy for all branching algorithms. Use Work-Stealing only when you need a baseline for benchmarking.