# Packages
# README
README
The current doc mainly describes the algorithm for guessing stack nodes.
Question
We represent a call stack from root to leaf as a list. Then the call order is the left calling the right. Therefore, the left side will be relatively more stable and the right side will be more transient.
Sn = [f1, f2, f3, ..., fn]
(root) (leaf)
Assuming some consecutive nodes trimmed on the left side of each list, then the lists cannot align with each other any more. We need to fill in the trimmed nodes so that the lists can align with each other again.
For example: original stack
S1 = [f1, f2, f3, f4, f5, f6, f8, f9]
S2 = [f1, f2, f3, f4, f7]
Assuming that nodes f1, f2
is trimmed from stack S1
on the left trimmed, it will become the following case, and can
not align with each other anymore.
S1 = [ f3, f4, f5, f6, f8, f9 ] // (with f1, f2 trimmed)
S2 = [ f1, f2, f3, f4, f7 ]
After recovering, it should be the following
S1 = [ f1, f2, f3, f4, f5, f6, f8, f9 ]
S2 = [ f1, f2, f3, f4, f7 ]
Mark Agreement
For the convenience of the following analysis, we make the following conventions for lists:
Given two lists
A = [e1, e2, e3]
B = [e4, e5]
We mark A + B
as a list concatenating A
and B
with all elements from A
before B
, shown as following:
A+B = [e1, e2, e3, e4, e5]
B+A = [e4, e5, e1, e2, e3]
mark | A |
as the length of list A
, for example:
|A| = 3
|B| = 2
|A+B| = 5
|B+A| = 5
If A = C + D
, then we call C
a prefix of A
. Suppose C
is the longest list as common prefix for list A
& B
,
we mark
C = A & B
we know there should be a list D
and E
, making
A = C + D
B = C + E
| D & E | = 0
Analysis
If the roots of the two stacks is like following
S1 = A0 + A1 + A2
S2 = A0 + A1 + A3
Where | A2 & A3 | = 0
and |A0| > 0, |A1| > 0
.
Assuming A0
is trimmed from S1
, then after trimmed, the 2 stacks becomes:
S1 = A1 + A2
S2 = A0 + A1 + A3
Here we know the remaining root A1
in S1
should overlap with a middle list of S2
. Therefore, we can guess
that A0
is trimmed from S1
through the overlapping position in S2
.
The longer the overlapping A1
is, the more trustable this guess is.
Solution
If we can devide S1
and S2
in the following way, with A1
has the longest length.
S1 = A1 + A2
S2 = A0 + A1 + A3
Where | A0 | > 0
and | A2 & A3 | = 0
,
Then the max overlapping range (MOR
) is recorded as follows:
start : MORS(S1, S2) = |A1 + A3| // start of max overlapping range in S2, from leaf.
length: MORL(S1, S2) = | A1 | // length of max overlapping range
If MORL (S1, S2) = 0
, then the two stack nodes do not overlap, and they may not have the same root.
When there are multiple stacks (such as: {S1, S2, S3,..., Sn}
), for Si
, we can find MORL
with any other
stack Sj
(MORL(i, j)
).
Among them, the biggest MORL(i, j)
is the most trustable guess: Si
should have the same root as Sj
from
node MORS(Si, Sj)
. We mark MORL(i) = max{ MORL(i, Sj) }
, where j != i
. We call stack j
is the best match for
stack i
.
MORL | 1 | 2 | 3 | ... | n | max MORL |
---|---|---|---|---|---|---|
1 | - | MORL(1,2) | MORL(1,3) | ... | MORL(1,n) | MORL(1) = max{MORL(1,k)} |
2 | MORL(2,1) | - | MORL(2,3) | ... | MORL(2,n) | MORL(2) = max{MORL(2,k)} |
3 | MORL(3,1) | MORL(3,2) | - | ... | MORL(3,n) | MORL(3) = max{MORL(3,k)} |
... | ... | ... | ... | ... | ... | ... |
n | MORL (n, 1) | MORL (n, 2) | MORL (n, 3) | ... | - | MORL (n) = max {MORL (n, k)} |
Then we can fix the stacks Si
in the order of MORL(i)
from big to small.
Not To Dynamic Calculation 'MOR'
Assuming that the best guess of stack Si
is to share root nodes from Sj
, and fix the stack Si
, does any other
unfixed stack need to update the MOR
due to this operation?
The conclusion is: not needed
Proof as following:
Assuming
Si = A1 + A2
Sj = A0 + A1 + A3
Where | A2 & A3 | = 0
. At this time MORL(i, j) = | A1 |
. After fixing Si
, we can know:
Si = A0 + A1 + A2
Assuming that an unfixed stack Sk
previously having best match with MORL(k)
need to be updated due to the fix
of Si
, then the new better MORL(k)
must be of a range starting at a node in A0
. Can the new overlapping range be
bigger than the previous MORL(k)
?
Since we are fixing all stacks in the order of MORL
from big to small, we can know for the present stack Sk
matching Sj
having the following relations.
MORL(k,j) <= MORL(k) < MORL(i) = | A1 |
So, any overlapping range A
for Sk
on Sj
starts from node in A0
, should be a sublist of A0 + A1
. (
otherwise | A |
should have a bigger length than | A1 |
, counter with the previous conclusion MORL(k,j) < |A1|
)
Now Si
has the same range A0 + A1
with Sj
. So any overlapping range starting at a node in A0
will be the same
with that in Sj
(not better than the current MORL(k,j)
, not to say the MORL(k)
)
Therefore, it is certain that in the new Si = A0 + A1 + A2
list, it will not produce a new longer overlapping list for
any unfixed stack Sk
.
Avoid Loop
The above procedure of sharing root nodes can be understood here as constructing a tree.
The process of sharing root nodes is to add one stack to the children list of a node of another stack. Then we need to make sure that it ends up with a tree, with no loop in it.
This limitation can be summed up as following:
- Self-overlap should not be allowed.
- A stack cannot share the root from other stack where it has transitively shared root from the current stack.
Introduce prefix tree to reduce complexity from $O(n^2)$ to O(n)
In the analysis above, we have to compare a stack to all other stacks to find out the best match. Suppose we have n
stacks in total, and comparing one stack to another to calculate the MORL
for the 2 stacks to be a basic operation.
then n * (n-1)
runs of comparisons are required. The complexity is O(n^2)
. However, there is possibility for
improvement. Suppose we have 2 stacks
Si = A1 + A2
Sj = A1 + A3
where | A2 & A3 | = 0
.
If we have compared Si
to Sk
, and get MORL(i,k) < |A1|
. That's to say A1
can fully overlap any part of Sk
.
Then we can just tell MORL(j,k) = MORL(i,k)
, and comparing Sj
to Sk
is a redundant operation. So how to reduce the
redundant comparisons? The prefix tree is just a way to reach it.
First we combine all stacks to a prefix tree, ensuring any path from root to a leaf to be a stack of the input problem.
Then we can compare a stack to the whole tree, we will get the MORL
from all stacks in the tree to the current stack
with just 1 run of comparisons.
Shown as the following diagram. Suppose we have 5 stacks and join them into the same prefix tree.
The stacks are:
S1 = [ 1, 2, 3, 4 ]
S2 = [ 1, 2, 5 ]
S3 = [ 1, 6 ]
S4 = [ 2, 3, 5, 6 ]
S4 = [ 2, 3, 7 ]
flowchart BT
subgraph tree
direction TB
0([ROOT]) --> 1 --> 2 --> 3 --> 4
2 --> 5
1 --> 6
0 --> 20[2]:::match --> 30[3]:::match --> 50[5] --> 60[6]
30 --> 7
end
subgraph stack
direction TB
11[1] --> 21[2]:::match --> 31[3]:::match --> 41[4]
end
classDef match fill:#00cc00,color:#ffffff;
When we match S1
with the tree, we spare [1]
as the root, and compare the remainder substack [ 2, 3, 4 ]
with the
tree, we can easily find the overlapping part [ 2, 3 ]
on the right path of the tree. Then we will know:
S1
/S2
/S3
cannot overlapS1
, i.e.MORL(i,1)=0
, wherei in {1, 2, 3}
.S4
/S5
can overlap it withMORL
to be2
. i.e.MORL(i,1)=2
, wherei
in{4, 5}
.
Therefore, when performing a run of comparisons for stack (j
) to the prefix tree, we can know MORL(i,j)
for all i
in {1..n}
, with O(1)
basic operation. So finally we just need O(n)
basic runs of comparisons.