202 Automata for Unranked Trees
is processed in a bottom-up fashion, starting at the leaves and working toward
the root using the transition rules.
In unranked trees the number of successors of a position p in a tree t is not
determined by its label, and furthermore there is no upper bound on the number
of successors that a position can have. Therefore automata for unranked trees
cannot be defined by explicitly listing all transition rules in the same style as
for ranked tree automata.
To deal with the unbounded branching of unranked trees we have to find a
way to symbolically represent an infinite number of transitions.
Example 8.2.2. Assume we want to define an automaton accepting all un-
ranked trees of height 1 over the alphabet Σ = {a, b} that have an even number
of leaves, all leaves are labeled by b, and the root is labeled by a, i.e. the trees
a, a(bb), a(bbbb), · · · .
For this purpose, we could use the two states q
b
and q. The first transition
rule b → q
b
ensures that the leaves can be labeled by q
b
. To capture all the trees
listed above we need the infinite set of rules
a → q, a(q
b
q
b
) → q, a(q
b
q
b
q
b
q
b
) → q, · · ·
This set can be represented by a single rule a((q
b
q
b
)
∗
) → q using a regular
expression for describing the sequences of states below the position with label
a that enable the automaton to proceed to state q.
In the previous example we have used a regular expression to represent an
infinite number of transition rules. In general, this leads to transition rules of
the form a(R) → q, where R ⊆ Q
∗
is a regular language over the states of the
unranked tree automaton.
Of course it would be possible to allow R to be from a bigger class of lan-
guages, e.g. the context-free languages. But as we are interested in defining
finite automata for unranked trees, i.e. automata using memory of bounded
size, we should choose R such that it can be represented by an automaton using
only bounded memory. Thus, the class of regular languages is a natural choice
in this context.
A nondeterministic finite hedge automaton (NFHA) over Σ is a tuple
A = (Q, Σ, Q
f
, ∆) where Q is a finite set of states, Q
f
⊆ Q is a set of final
states, and ∆ is a finite set of transition rules of the following type:
a(R) → q
where R ⊆ Q
∗
is a regular language over Q. These languages R occurring in
the transition rules are called horizontal languages.
A run of A on a tree t ∈ T (Σ) is a tree r ∈ T (Q) with the same domain
as t such that for each node p ∈ Pos(r) with a = t(p) and q = r(p) there is
a transition rule a(R) → q of A with r(p1) · · · r(pn) ∈ R, where n denotes the
number of successors of p. In particular, to apply a rule at a leaf, the empty
word ε has to be in the horizontal language of the rule.
An unranked tree t is accepted by A if there is a run r of A on t whose
root is labeled by a final state, i.e. with r(ε) ∈ Q
f
. The language L(A) of A
is the set of all unranked trees accepted by A.
TATA — November 18, 2008 —