Kudryavtsev V.B., Rosenberg I.G. Structural Theory of Automata, Semigroups, and Universal Algebra


Algebraic classiﬁcations of regular tree languages 387

(1) $X \cup \Sigma_0 \subseteq T$, and

(2) $f(t_1,\dots,t_m) \in T$ whenever $m > 0$, $f \in \Sigma_m$ and $t_1,\dots,t_m \in T$.

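Clearly, $T_\Sigma(X) \neq \emptyset$ if and only if $\Sigma_0 \cup X \neq \emptyset$, and in that case the $\Sigma X$-term algebra $\mathcal{T}_\Sigma(X) = (T_\Sigma(X), \Sigma)$ is defined by setting $c^{\mathcal{T}_\Sigma(X)} = c$ for $c \in \Sigma_0$, and $f^{\mathcal{T}_\Sigma(X)}(t_1,\dots,t_m) = f(t_1,\dots,t_m)$ for any $m \ge 1$, $f \in \Sigma_m$ and $t_1,\dots,t_m \in T_\Sigma(X)$.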

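The term algebra $\mathcal{T}_\Sigma(X)$ is freely generated by $X$ over the class of all $\Sigma$-algebras, i.e.,

(1) $\mathcal{T}_\Sigma(X)$ is generated by $X$, $\langle X \rangle = T_\Sigma(X)$, and

(2) for any $\Sigma$-algebra $\mathcal{A} = (A, \Sigma)$, every mapping $\alpha : X \to A$ has a unique extension to a homomorphism $\alpha_{\mathcal{A}} : \mathcal{T}_\Sigma(X) \to \mathcal{A}$.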

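The requirement that $\alpha_{\mathcal{A}} : \mathcal{T}_\Sigma(X) \to \mathcal{A}$ be a homomorphism such that $x\alpha_{\mathcal{A}} = \alpha(x)$ for every $x \in X$ gives inductively a unique value $t\alpha_{\mathcal{A}}$ to every $t \in T_\Sigma(X)$:

• $x\alpha_{\mathcal{A}} = \alpha(x)$ for any $x \in X$;

• $c\alpha_{\mathcal{A}} = c^{\mathcal{T}_\Sigma(X)}\alpha_{\mathcal{A}} = c^{\mathcal{A}}$ for any $c \in \Sigma_0$;

• $t\alpha_{\mathcal{A}} = f^{\mathcal{T}_\Sigma(X)}(t_1,\dots,t_m)\alpha_{\mathcal{A}} = f^{\mathcal{A}}(t_1\alpha_{\mathcal{A}},\dots,t_m\alpha_{\mathcal{A}})$, for $t = f(t_1,\dots,t_m)$.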

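The term function $t^{\mathcal{A}} : A^X \to A$ defined by a $\Sigma X$-term $t \in T_\Sigma(X)$ is now obtained by setting $t^{\mathcal{A}}(\alpha) = t\alpha_{\mathcal{A}}$ for any $\alpha : X \to A$ (i.e., $\alpha \in A^X$). For example, if $t = f(g(x), c)$, then $t^{\mathcal{A}}(\alpha) = t\alpha_{\mathcal{A}} = f^{\mathcal{A}}(g^{\mathcal{A}}(\alpha(x)), c^{\mathcal{A}})$.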
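The inductive evaluation of a term under an assignment can be sketched in Python. This is only an illustration under assumptions that are not in the text: a term is encoded as a string (a variable or a nullary symbol) or as a tuple `(f, t1, ..., tm)`, and the algebra on the integers (`f` as addition, `g` as doubling, `c` as 1) is a hypothetical example.

```python
def evaluate(term, ops, alpha):
    """Value of `term` under the homomorphic extension of `alpha`.

    ops maps each operation symbol to a Python function (a nullary
    symbol maps directly to its value); alpha maps variables to elements.
    """
    if isinstance(term, str):
        # Leaf: a variable x (use alpha) or a nullary symbol c (use its constant).
        return alpha[term] if term in alpha else ops[term]
    f, *args = term
    # Composite term: evaluate the arguments first, then apply the operation.
    return ops[f](*(evaluate(s, ops, alpha) for s in args))

# Hypothetical algebra on the integers: f is addition, g doubles, c is 1.
ops = {"f": lambda a, b: a + b, "g": lambda a: 2 * a, "c": 1}
t = ("f", ("g", "x"), "c")          # the term f(g(x), c)
value = evaluate(t, ops, {"x": 5})  # f(g(5), 1) = 10 + 1
```

Here `evaluate(t, ops, alpha)` plays the role of the term function of `t` applied to the assignment `alpha`.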

3 Finite automata and regular languages

In this section we review some relevant parts of the theory of ﬁnite automata and regular

languages. Proofs and further results can be found in [6, 17, 37, 44, 94], for example.

In what follows, an alphabet is a finite non-empty set of symbols called letters. If $X$ is an alphabet, then $X^*$ denotes the set of all (finite) words over $X$, $e$ is the empty word, and $X^+$ is the set of non-empty words over $X$. The free monoid generated by $X$ under the concatenation operation $(u, v) \mapsto uv$ with $e$ as the identity is also denoted $X^*$. Similarly, $X^+$ also stands for the free semigroup generated by $X$. Subsets of $X^*$ are called languages, and subsets of $X^+$ are $e$-free languages.

An $X$-automaton is a triple $(A, X, \delta)$ consisting of a finite non-empty set $A$ of states, the input alphabet $X$, and a transition function $\delta : A \times X \to A$; for any $a \in A$ and $x \in X$, $\delta(a, x)$ is the next state of the automaton if $a$ is the present state and $x$ is the input letter currently read by the automaton. The transition function $\delta$ is extended to a function $\delta^* : A \times X^* \to A$ by setting $\delta^*(a, e) = a$ and $\delta^*(a, ux) = \delta(\delta^*(a, u), x)$ for all $a \in A$, $u \in X^*$ and $x \in X$. Then, for any $a \in A$ and $w \in X^*$, $\delta^*(a, w)$ is the state reached from state $a$ after reading the input word $w$. An $X$-recognizer is a system $\mathbf{A} = (A, X, \delta, a_0, F)$, where $(A, X, \delta)$ is an $X$-automaton, $a_0 \in A$ is the initial state, and $F \subseteq A$ is the set of final states. The language recognized by $\mathbf{A}$ is the set $L(\mathbf{A}) = \{w \in X^* \mid \delta^*(a_0, w) \in F\}$. A language $L \subseteq X^*$ is recognizable, or regular, if $L = L(\mathbf{A})$ for some $X$-recognizer $\mathbf{A}$. Let $\mathrm{Rec}(X)$ denote the set of

388 M. Steinby

all regular languages over $X$ and let $\mathrm{Rec} = \{\mathrm{Rec}(X)\}_X$ be the family of all regular languages, where $X$ ranges over all finite alphabets.
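As a small illustration of the extended transition function and of recognition, the following Python sketch folds the transition function over the letters of a word. The dictionary encoding of the transition function and the example recognizer (words over {x, y} with an even number of x) are assumptions made for this sketch only.

```python
def delta_star(delta, state, word):
    """Extended transition function: apply delta letter by letter."""
    for letter in word:
        state = delta[(state, letter)]
    return state

def accepts(delta, initial, finals, word):
    """True if the state reached from `initial` on `word` is final."""
    return delta_star(delta, initial, word) in finals

# Hypothetical recognizer over X = {x, y}: accepts words with an even number of x.
delta = {("even", "x"): "odd", ("odd", "x"): "even",
         ("even", "y"): "even", ("odd", "y"): "odd"}
```

The empty word leaves the state unchanged, which matches the base case of the inductive definition.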

Let us note some closure properties of the family Rec. At the same time we deﬁne several

important language operations.

3.1 Proposition Let $X$ and $Y$ be any (finite non-empty) alphabets.

(1) (Boolean operations) $\mathrm{Rec}(X)$ forms a field of sets on $X^*$.

(2) (Product) If $K, L \in \mathrm{Rec}(X)$, then $KL := \{uv \mid u \in K, v \in L\} \in \mathrm{Rec}(X)$.

(3) (Iteration) If $L \in \mathrm{Rec}(X)$, then $L^* := \{u_1 u_2 \dots u_n \mid n \ge 0,\ u_1,\dots,u_n \in L\} \in \mathrm{Rec}(X)$.

(4) (Quotients) For any $L \in \mathrm{Rec}(X)$ and $w \in X^*$,

(a) $w^{-1}L := \{u \in X^* \mid wu \in L\} \in \mathrm{Rec}(X)$;

(b) $Lw^{-1} := \{u \in X^* \mid uw \in L\} \in \mathrm{Rec}(X)$.

(5) (Homomorphisms and Inverse Homomorphisms) For any homomorphism $\varphi : X^* \to Y^*$,

(a) $L \in \mathrm{Rec}(X)$ implies $L\varphi \in \mathrm{Rec}(Y)$;

(b) $L \in \mathrm{Rec}(Y)$ implies $L\varphi^{-1} \in \mathrm{Rec}(X)$.

Let us now recall some algebraic characterizations of regular languages that have natural

counterparts in the theory of regular tree languages.

3.2 Theorem (Kleene 1956) A language over an alphabet $X$ is regular if and only if it can be obtained from the empty language $\emptyset$ and the singleton sets $\{x\}$ ($x \in X$) by means of the regular operations union $K \cup L$, product $KL$, and iteration $L^*$.

Hence, a language is regular if and only if it is denoted by a regular expression that shows

how it can be obtained from the elementary languages ∅ and {x} by regular operations.

3.3 Theorem (Nerode 1958) For any language $L \subseteq X^*$ the following are equivalent:

(1) $L$ is regular;

(2) $L$ is saturated by a right congruence on $X^*$ of finite index;

(3) the following Nerode congruence $\rho_L$ on $X^*$ of $L$ is of finite index:

$u \mathrel{\rho_L} v \iff (\forall w \in X^*)(uw \in L \leftrightarrow vw \in L) \quad (u, v \in X^*)$.

The Nerode congruence of a language $L \subseteq X^*$ is easily seen to be the greatest right congruence on $X^*$ that saturates $L$, and if $L$ is regular, it yields a minimal recognizer of $L$ in which the states are the $\rho_L$-classes.

3.4 Theorem (Myhill 1957) For any language $L \subseteq X^*$, the following are equivalent:

(Footnote on the family Rec: since no set-theoretic problems can arise here, we will simply let $X$ range over “all” alphabets. Of course, we could assume that every $X$ is a finite non-empty subset of a given infinite set of symbols.)


(1) $L$ is regular;

(2) $L$ is saturated by a congruence on $X^*$ of finite index;

(3) the following Myhill congruence $\mu_L$ on $X^*$ of $L$ is of finite index:

$u \mathrel{\mu_L} v \iff (\forall s, t \in X^*)(sut \in L \leftrightarrow svt \in L) \quad (u, v \in X^*)$.

It is easy to see that $\mu_L$ is the greatest congruence on $X^*$ that saturates $L$. The congruence $\mu_L$ is also called the syntactic congruence of $L$ and the quotient monoid $M(L) = X^*/\mu_L$ the syntactic monoid of $L$. If $L$ is regular, $M(L)$ can be computed because it is isomorphic to the transition monoid of the minimal recognizer $\mathbf{A} = (A, X, \delta, a_0, F)$ of $L$, that is, the monoid formed by the maps $\delta^*(-, w) : A \to A$, $a \mapsto \delta^*(a, w)$ ($w \in X^*$), under composition. Myhill’s Theorem can be restated as follows. Condition (2) of the following corollary is often used as the definition of regularity in algebraic presentations of the theory of regular languages.
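The computation of the transition monoid mentioned above can be sketched as a closure computation. The sketch below is an illustration under assumptions: the transition function is a dictionary, each map is stored as a tuple of state indices, and the input automaton is assumed to be the minimal recognizer already (otherwise the result need not be isomorphic to the syntactic monoid).

```python
def transition_monoid(states, alphabet, delta):
    """Set of maps delta*(-, w), each encoded as a tuple indexed like `states`."""
    index = {a: i for i, a in enumerate(states)}
    identity = tuple(range(len(states)))          # the map of the empty word
    letter_maps = {x: tuple(index[delta[(a, x)]] for a in states)
                   for x in alphabet}
    monoid, frontier = {identity}, [identity]
    while frontier:
        m = frontier.pop()                        # m is the map of some word w
        for x in alphabet:
            # Map of wx: first apply the map of w, then the letter x.
            nxt = tuple(letter_maps[x][i] for i in m)
            if nxt not in monoid:
                monoid.add(nxt)
                frontier.append(nxt)
    return monoid

# Hypothetical minimal recognizer for "even number of x" over {x, y}.
delta = {("even", "x"): "odd", ("odd", "x"): "even",
         ("even", "y"): "even", ("odd", "y"): "odd"}
monoid = transition_monoid(["even", "odd"], "xy", delta)
```

For this example the monoid has two elements (the identity and the swap of the two states), that is, it is the two-element group.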

3.5 Corollary For any language $L \subseteq X^*$, the following are equivalent:

(1) $L$ is regular;

(2) there exist a finite monoid $M$, a homomorphism $\varphi : X^* \to M$ and a subset $H \subseteq M$ such that $L = H\varphi^{-1}$;

(3) the syntactic monoid $M(L)$ of $L$ is finite.

Several interesting types of regular languages with some special properties have been studied in the literature. Let us define generally a family of regular languages as a mapping $\mathcal{L}$ that assigns to each alphabet $X$ a set $\mathcal{L}(X) \subseteq \mathrm{Rec}(X)$ of regular languages over $X$. We write $\mathcal{L} = \{\mathcal{L}(X)\}_X$ with the understanding that $X$ ranges over all alphabets. For each such family $\mathcal{L}$ one is faced with the problem of finding an algorithm for deciding for any given $X$-recognizer $\mathbf{A}$ whether $L(\mathbf{A}) \in \mathcal{L}(X)$.

The deﬁnite languages already introduced by Kleene [42] form perhaps the ﬁrst non-trivial

proper sub-family of the regular languages that got an eﬀective characterization when Perles,

Rabin and Shamir [56] described the corresponding recognizers. However, a considerably

harder problem was solved when Schützenberger [77] proved that a language is star-free if

and only if its syntactic monoid is aperiodic, that is to say, has only trivial subgroups. The

result was remarkable as this family of languages arises in many natural ways (cf. [50]), but

no other decision method for it was known. Subsequently several other families were similarly

characterized by properties of their syntactic monoids or syntactic semigroups. Finally, in

his Variety Theorem Eilenberg [18] identified the families of languages for which such a characterization is possible. As Eilenberg’s theory is also the starting point for the corresponding

work on tree languages, we review its main notions and results. For systematic expositions

the reader is referred to [2, 18, 63, 64]. Briefer accounts can be found in [37, 45].

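A family $\mathcal{L} = \{\mathcal{L}(X)\}_X$ of regular languages is called a $*$-variety, or a variety of regular languages (VRL), if for all alphabets $X$ and $Y$,

(1) $\mathcal{L}(X) \subseteq \mathrm{Rec}(X)$,

(2) $L \in \mathcal{L}(X)$ implies $X^* \setminus L \in \mathcal{L}(X)$,

(3) $K, L \in \mathcal{L}(X)$ implies $K \cap L \in \mathcal{L}(X)$,

(4) $L \in \mathcal{L}(X)$ implies $w^{-1}L, Lw^{-1} \in \mathcal{L}(X)$ for every $w \in X^*$, and

(5) $L \in \mathcal{L}(Y)$ implies $L\varphi^{-1} \in \mathcal{L}(X)$ for every homomorphism $\varphi : X^* \to Y^*$.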

In [18] the corresponding families of $e$-free languages are called $+$-varieties. The greatest VRL is the family Rec of all regular languages. If we exclude the VRL $\mathcal{L}$ with $\mathcal{L}(X) = \emptyset$ for every $X$, then the least VRL is Triv, where $\mathrm{Triv}(X) = \{\emptyset, X^*\}$ for every $X$. The more interesting examples include the families of star-free, definite, reverse definite, generalized definite, locally testable and piecewise testable languages.

A non-empty class $\mathbf{M}$ of finite monoids is a variety of finite monoids (VFM), or a pseudovariety, if it is closed under the forming of submonoids, images and finite direct products. For any given class $\mathbf{K}$ of finite monoids, there is a unique minimal VFM $V_f(\mathbf{K})$ containing $\mathbf{K}$, the VFM generated by $\mathbf{K}$.

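For any family $\mathcal{L} = \{\mathcal{L}(X)\}_X$ of regular languages, let

$\mathcal{L}^{\mu} = V_f(\{M(L) \mid L \in \mathcal{L}(X) \text{ for some } X\})$,

and for any class $\mathbf{K}$ of finite monoids, define $\mathbf{K}^{\lambda} = \{\mathbf{K}^{\lambda}(X)\}_X$ by setting

$\mathbf{K}^{\lambda}(X) = \{L \subseteq X^* \mid M(L) \in \mathbf{K}\}$ for each $X$.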

If we let VRL and VFM denote the classes of all VRLs and all VFMs, respectively, then

Eilenberg’s Variety Theorem can be stated as follows.

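3.6 Theorem (Eilenberg 1976) The mappings $\mathcal{L} \mapsto \mathcal{L}^{\mu}$ and $\mathbf{M} \mapsto \mathbf{M}^{\lambda}$ define mutually inverse isomorphisms between the lattice $(\mathrm{VRL}, \subseteq)$ of all varieties of regular languages and the lattice $(\mathrm{VFM}, \subseteq)$ of all varieties of finite monoids. In particular,

(1) if $\mathcal{L} \in \mathrm{VRL}$, then $\mathcal{L}^{\mu} \in \mathrm{VFM}$ and $\mathcal{L}^{\mu\lambda} = \mathcal{L}$, and

(2) if $\mathbf{M} \in \mathrm{VFM}$, then $\mathbf{M}^{\lambda} \in \mathrm{VRL}$ and $\mathbf{M}^{\lambda\mu} = \mathbf{M}$.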

If $\mathcal{L} = \{\mathcal{L}(X)\}_X$ is a VRL, then for any $X$ and $L \subseteq X^*$, $L \in \mathcal{L}(X)$ if and only if $M(L) \in \mathcal{L}^{\mu}$. Hence, an effective characterization of $\mathcal{L}$ may be obtained by determining the VFM $\mathcal{L}^{\mu}$. A similar correspondence holds between the varieties of finite semigroups and the $+$-varieties.

Let us also recall an important addition to the Variety Theorem due to Thérien [86, 87]. For any alphabet $X$, let $\mathrm{FCon}(X^*)$ denote the set of congruences on $X^*$ of finite index. Clearly $\mathrm{FCon}(X^*)$ is a filter of the congruence lattice $\mathrm{Con}(X^*)$. A $*$-variety of congruences, or a variety of finite congruences (VFC), is a family $\Gamma = \{\Gamma(X)\}_X$ of sets of congruences such that for all alphabets $X$ and $Y$,

(1) $\Gamma(X) \subseteq \mathrm{FCon}(X^*)$ is a filter of $\mathrm{Con}(X^*)$, and

(2) if $\varphi : X^* \to Y^*$ is a homomorphism and $\theta \in \Gamma(Y)$, then $\varphi \circ \theta \circ \varphi^{-1} \in \Gamma(X)$.

That each VRL corresponds to a unique VFC is a useful fact as many varieties of regular languages are most naturally defined in terms of congruences of the monoids $X^*$.

We conclude this section by noting how ﬁnite automata can be deﬁned as unary algebras.

This approach, already propounded by J. R. Büchi and J. B. Wright in the 1950s, provides

a natural passage to tree automata (cf. [8, 13, 16, 28, 79, 85], for example).

Each input letter $x \in X$ defines in an $X$-automaton $(A, X, \delta)$ a unary operation

$x^{\mathcal{A}} : A \to A, \ a \mapsto \delta(a, x)$,

and these operations determine $\delta$ completely. Hence $(A, X, \delta)$ can be redefined as a unary algebra $\mathcal{A} = (A, X)$ when we view $X$ as a set of unary operation symbols. If $\varepsilon$ is a variable, we may identify each word $w \in X^*$ with an $X$-term $t_w$ over $\{\varepsilon\}$ by setting $t_e = \varepsilon$, and $t_w = x(t_u)$ for $w = ux$ ($u \in X^*$, $x \in X$). For example, $t_{xxy} = y(x(x(\varepsilon)))$. By letting $w$ represent the term $t_w$, the $X\{\varepsilon\}$-term algebra may be taken to be $\mathcal{T}_X(\varepsilon) = (X^*, X)$, where $x^{\mathcal{T}_X(\varepsilon)}(w) = wx$ for any $w \in X^*$ and $x \in X$. The mapping $\delta^*(-, w) : A \to A$, $a \mapsto \delta^*(a, w)$, induced by an input word $w \in X^*$ in $(A, X, \delta)$ now becomes the term function $w^{\mathcal{A}}$ defined by the term $w$ ($= t_w$) in the algebra $\mathcal{A} = (A, X)$. Furthermore, an $X$-recognizer can be defined as a system $\mathbf{A} = (\mathcal{A}, a_0, F)$, where $\mathcal{A} = (A, X)$ is a finite $X$-algebra, $a_0 \in A$ is the initial state, and $F \subseteq A$ is the set of final states. The language recognized by $\mathbf{A}$ is then the term set $L(\mathbf{A}) = \{w \mid w^{\mathcal{A}}(a_0) \in F\}$, and $L \subseteq X^*$ is regular if and only if $L = F\varphi^{-1}$ for some finite $X$-algebra $\mathcal{A} = (A, X)$, a homomorphism $\varphi : \mathcal{T}_X(\varepsilon) \to \mathcal{A}$ and a subset $F \subseteq A$.

Tree recognizers and regular tree languages are now obtained by allowing function symbols

of any ﬁnite arities. Let us note that the unary interpretation described above also suggests

an alternative theory of varieties of regular languages [83].

4 Trees and terms

In mathematics and computer science trees are defined in several different ways, often depending on the applications in mind. The trees to be considered here are finite, their nodes

are labelled with symbols, and the branches leaving any given node have a speciﬁed order.

For example, derivations in context-free grammars can be represented by such trees. For the

algebraic approach to be adopted here it will be convenient to deﬁne our trees formally as

terms of the kind used in algebra and logic, for example.

A ranked alphabet is a finite set of operation symbols, but now these symbols will also be used for labelling nodes of trees. In what follows, $\Sigma$ is always a ranked alphabet. For each $m \ge 0$, the set of $m$-ary symbols in $\Sigma$ is again denoted $\Sigma_m$. If $\Omega$ is also a ranked alphabet, $\Sigma \subseteq \Omega$ means that $\Sigma_m \subseteq \Omega_m$ for every $m \ge 0$. The union $\Sigma \cup \Omega$ may be formed if $\Sigma_m \cap \Omega_n = \emptyset$ whenever $m \neq n$, and then $(\Sigma \cup \Omega)_m = \Sigma_m \cup \Omega_m$ for every $m \ge 0$. In examples we may give a ranked alphabet in the form $\Sigma = \{f_1/m_1, \dots, f_k/m_k\}$ indicating that $\Sigma$ consists of the symbols $f_1, \dots, f_k$ with the respective ranks $m_1, \dots, m_k$.

In addition to ranked alphabets, ordinary finite alphabets, called leaf alphabets, are used

for labelling leaves of trees. These will usually be denoted by X or Y . When a leaf alphabet

is considered together with a ranked alphabet, the two sets are assumed to be disjoint.

Terms will be regarded as syntactic representations of trees, and $\Sigma$-terms with variables in $X$ are called also $\Sigma X$-trees. Any $t \in X \cup \Sigma_0$ represents a one-node tree in which the only node is labelled with the symbol $t$. A composite term $f(t_1,\dots,t_m)$ is interpreted as a tree formed by adjoining the $m$ trees represented by $t_1,\dots,t_m$ to a new $f$-labelled root. Subsets of $T_\Sigma(X)$ are called $\Sigma X$-tree languages. We may also speak about $\Sigma$-trees and $\Sigma$-tree languages without specifying the leaf alphabet, or generally about trees and tree languages without mentioning any alphabet.

If $\Sigma_0 \cup X = \emptyset$, the set $T_\Sigma(X)$ is also empty, and if $\Sigma = \Sigma_0$, the only $\Sigma X$-trees are the finitely many one-node trees labelled with symbols from $\Sigma_0 \cup X$. To exclude these uninteresting trivial cases, we will tacitly assume that $\Sigma_0 \cup X \neq \emptyset$ and $\Sigma \neq \Sigma_0$.

The inductive definition of $T_\Sigma(X)$ yields a Principle of Tree Induction for proving assertions about $\Sigma X$-trees: a statement $S$ holds for every $\Sigma X$-tree if

(1) $S$ holds for every $x \in X$ and for every $c \in \Sigma_0$, and

(2) $S$ holds for $f(t_1,\dots,t_m)$ assuming that $S$ holds for $t_1,\dots,t_m$ ($m > 0$, $f \in \Sigma_m$).

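Similarly, notions related to $\Sigma X$-trees may be defined recursively following the inductive definition of $T_\Sigma(X)$. As useful examples we define the set of subtrees $\mathrm{sub}(t)$, the height $\mathrm{hg}(t)$ and the root (symbol) $\mathrm{root}(t)$ of a $\Sigma X$-tree $t$:

(1) $\mathrm{sub}(t) = \{t\}$, $\mathrm{hg}(t) = 0$ and $\mathrm{root}(t) = t$ for any $t \in X \cup \Sigma_0$;

(2) $\mathrm{sub}(t) = \{t\} \cup \mathrm{sub}(t_1) \cup \dots \cup \mathrm{sub}(t_m)$, $\mathrm{hg}(t) = \max\{\mathrm{hg}(t_1),\dots,\mathrm{hg}(t_m)\} + 1$ and $\mathrm{root}(t) = f$ for $t = f(t_1,\dots,t_m)$.

For example, for the $\Sigma X$-tree $t = g(f(g(y), x))$, where $f \in \Sigma_2$, $g \in \Sigma_1$ and $x, y \in X$, we get $\mathrm{hg}(t) = 3$, $\mathrm{root}(t) = g$ and $\mathrm{sub}(t) = \{t, f(g(y), x), g(y), x, y\}$.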
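The recursive definitions of sub, hg and root translate directly into code. The following Python sketch assumes an encoding that is not part of the text: a one-node tree is a string and a composite tree is a tuple whose first component is the root symbol.

```python
def root(t):
    """Root symbol of a tree."""
    return t if isinstance(t, str) else t[0]

def hg(t):
    """Height: 0 for a one-node tree, else 1 + the maximal height of a child."""
    return 0 if isinstance(t, str) else 1 + max(hg(s) for s in t[1:])

def sub(t):
    """Set of all subtrees of t, including t itself."""
    if isinstance(t, str):
        return {t}
    return {t}.union(*(sub(s) for s in t[1:]))

t = ("g", ("f", ("g", "y"), "x"))   # the tree g(f(g(y), x))
```

Applied to the example tree of the text, these functions give height 3, root g and five subtrees.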

For any $n \ge 0$, let

$T_\Sigma(X)_{\ge n} = \{t \in T_\Sigma(X) \mid \mathrm{hg}(t) \ge n\}$.

Similarly, let $T_\Sigma(X)_{<n}$ be the set of $\Sigma X$-trees of height $< n$.

Let $\xi$ be a special symbol that appears neither in any ranked alphabet nor in any leaf alphabet considered. A $\Sigma(X \cup \{\xi\})$-tree in which $\xi$ appears exactly once is called a $\Sigma X$-context. The set of all $\Sigma X$-contexts is denoted by $C_\Sigma(X)$. If $p, q \in C_\Sigma(X)$, then $p \cdot q = q(p)$ is the $\Sigma X$-context obtained by replacing the $\xi$ in $q$ with $p$. Similarly, if $t \in T_\Sigma(X)$ and $p \in C_\Sigma(X)$, then $t \cdot p = p(t)$ is the $\Sigma X$-tree obtained when the $\xi$ in $p$ is replaced with $t$. The height $\mathrm{hg}(p)$ and the root $\mathrm{root}(p)$ of a $\Sigma X$-context $p$ are defined the same way as for $\Sigma X$-trees treating $\xi$ as a leaf symbol, i.e., $\mathrm{hg}(\xi) = 0$ and $\mathrm{root}(\xi) = \xi$. Moreover, the depth $\mathrm{dp}(p)$ of a $\Sigma X$-context $p$ is the distance of the $\xi$-labelled leaf from the root, that is to say

(1) $\mathrm{dp}(\xi) = 0$;

(2) $\mathrm{dp}(p) = \mathrm{dp}(q) + 1$ for $p = f(t_1,\dots,q,\dots,t_m)$, $t_1,\dots,t_m \in T_\Sigma(X)$ and $q \in C_\Sigma(X)$.
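Substitution into a context and the depth of a context can be sketched as follows. The tuple encoding of trees and the helper names `plug`, `contains_xi` and `dp` are assumptions of this sketch; `plug(q, p)` corresponds to $q(p) = p \cdot q$.

```python
XI = "ξ"  # the special symbol marking the single hole of a context

def plug(p, t):
    """Return p(t): replace the ξ in context p by t (a tree or a context)."""
    if p == XI:
        return t
    if isinstance(p, str):
        return p
    return (p[0],) + tuple(plug(s, t) for s in p[1:])

def contains_xi(t):
    """True if the symbol ξ occurs in t."""
    return t == XI if isinstance(t, str) else any(contains_xi(s) for s in t[1:])

def dp(p):
    """Depth of the ξ-labelled leaf in context p."""
    if p == XI:
        return 0
    q = next(s for s in p[1:] if contains_xi(s))  # the unique child holding ξ
    return dp(q) + 1

p = ("f", XI, "x")   # the context f(ξ, x)
q = ("g", XI)        # the context g(ξ)
```

In this example the depth of the product context is the sum of the depths of its two factors.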

Finally, let us note that in many presentations no separate leaf alphabets are used, but

a special set of nullary symbols is singled out when the need arises. Although this could be

done without any essential loss of generality, leaf alphabets are convenient in many cases and

we shall use them. If $X = \emptyset$ in the above definitions, $T_\Sigma(X)$ becomes the set $T_\Sigma$ of ground $\Sigma$-terms, or ground $\Sigma$-trees, and $C_\Sigma(X)$ becomes the set $C_\Sigma$ of $\Sigma$-contexts.


5 Finite tree recognizers and regular tree languages

A finite $\Sigma$-algebra $\mathcal{A} = (A, \Sigma)$ may be regarded as a tree automaton that reads $\Sigma X$-trees in a bottom-up, or frontier-to-root, fashion starting from the leaves and finishing at the root. The elements of $A$ are then called states. At a leaf labelled with a nullary symbol $c \in \Sigma_0$, $\mathcal{A}$ starts in state $c^{\mathcal{A}}$. The starting states at leaves labelled with symbols from $X$ are specified by a mapping $\alpha : X \to A$. If $\mathcal{A}$ has reached the $m$ immediate descendant nodes of a node $u$ labelled with an $m$-ary symbol $f \in \Sigma_m$, where $m > 0$, in states $a_1,\dots,a_m$, respectively, then it enters $u$ in state $f^{\mathcal{A}}(a_1,\dots,a_m)$. Obviously, the root of a $\Sigma X$-tree $t$ is reached in state $t\alpha_{\mathcal{A}}$, where $\alpha_{\mathcal{A}} : \mathcal{T}_\Sigma(X) \to \mathcal{A}$ is the homomorphic extension of $\alpha$. Specifying now a set of final states, we obtain the following tree recognizers that are also called (deterministic) frontier-to-root tree recognizers.

5.1 Definition A (deterministic bottom-up) $\Sigma X$-recognizer $\mathbf{A} = (\mathcal{A}, \alpha, F)$ consists of a finite $\Sigma$-algebra $\mathcal{A} = (A, \Sigma)$, an initial assignment $\alpha : X \to A$, and a set $F \subseteq A$ of final states; $A$ is the state set. The $\Sigma X$-tree language recognized by $\mathbf{A}$ is the set

$T(\mathbf{A}) = \{t \in T_\Sigma(X) \mid t\alpha_{\mathcal{A}} \in F\}$.

A $\Sigma X$-tree language is called recognizable, or regular, if it is recognized by some $\Sigma X$-recognizer. Let $\mathrm{Rec}_\Sigma(X)$ be the set of all recognizable $\Sigma X$-tree languages.

We may also speak generally about tree recognizers without specifying the alphabets. The

family of all regular tree languages is denoted by Rec. The following example illustrates some

basic capabilities and a limitation of these recognizers.

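5.2 Example Let $\Sigma = \{f/2, g/1\}$ and let $X = \{x, y\}$. For any $t \in T_\Sigma(X)$, we set $g^0(t) = t$ and $g^{k+1}(t) = g(g^k(t))$ for all $k \ge 0$. The $\Sigma X$-tree language

$T_1 = \{p(f(g^m(x), g^n(y))) \mid p \in C_\Sigma(X),\ m \equiv 1 \pmod 2,\ n \ge 2\}$,

formed by the $\Sigma X$-trees that have a subtree $f(g^m(x), g^n(y))$ with $m$ odd and $n \ge 2$, is recognized by the $\Sigma X$-recognizer $\mathbf{A} = (\mathcal{A}, \alpha, F)$ defined as follows. The state set is $A = \{a_0, a_1, b_0, b_1, b_2, a_+, a_-\}$ and the operations of $\mathcal{A}$ are defined thus:

• $g^{\mathcal{A}}(a_0) = a_1$, $g^{\mathcal{A}}(a_1) = a_0$, $g^{\mathcal{A}}(b_0) = b_1$, $g^{\mathcal{A}}(b_1) = g^{\mathcal{A}}(b_2) = b_2$,

• $g^{\mathcal{A}}(a_+) = a_+$, $g^{\mathcal{A}}(a_-) = a_-$,

• $f^{\mathcal{A}}(a_1, b_2) = a_+$, $f^{\mathcal{A}}(a_+, a) = f^{\mathcal{A}}(a, a_+) = a_+$ for all $a \in A$, and

• $f^{\mathcal{A}}(a, b) = a_-$ in all remaining cases.

Furthermore, $\alpha(x) = a_0$, $\alpha(y) = b_0$ and $F = \{a_+\}$.

On the other hand, the $\Sigma X$-tree language $T_2 = \{p(f(g^n(x), g^n(x))) \mid p \in C_\Sigma(X),\ n \ge 0\}$ is not regular. Indeed, if $T_2 = T(\mathbf{A})$ for a $\Sigma X$-recognizer $\mathbf{A} = (\mathcal{A}, \alpha, F)$, then, since $A$ is finite, $g^j(x)\alpha_{\mathcal{A}} = g^k(x)\alpha_{\mathcal{A}}$ for some $0 \le j < k$. Now $f(g^j(x), g^k(x)) \in T(\mathbf{A})$ would follow from

$f(g^j(x), g^k(x))\alpha_{\mathcal{A}} = f^{\mathcal{A}}(g^j(x)\alpha_{\mathcal{A}}, g^k(x)\alpha_{\mathcal{A}}) = f^{\mathcal{A}}(g^k(x)\alpha_{\mathcal{A}}, g^k(x)\alpha_{\mathcal{A}}) \in F$,

although $f(g^j(x), g^k(x)) \notin T_2$ because $j \neq k$.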
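A bottom-up run of a recognizer for the first language of Example 5.2 (trees containing a subtree $f(g^m(x), g^n(y))$ with $m$ odd and $n \ge 2$) can be sketched in Python. The state names are labels chosen for this sketch: `a0`/`a1` track the parity of a g-chain above x, `b0`/`b1`/`b2` count the g's above y up to 2, `ok` marks that the pattern has been found and `no` covers all remaining cases. Trees are encoded as strings and tuples, which is also an assumption of the sketch.

```python
# Unary operation g as a table on the seven states.
G = {"a0": "a1", "a1": "a0", "b0": "b1", "b1": "b2", "b2": "b2",
     "ok": "ok", "no": "no"}

def f_op(a, b):
    """Binary operation f: detect the pattern or propagate a previous success."""
    if (a, b) == ("a1", "b2") or "ok" in (a, b):
        return "ok"
    return "no"

ALPHA = {"x": "a0", "y": "b0"}   # initial assignment at the leaves
FINAL = {"ok"}                   # final states

def run(t):
    """State reached at the root of tree t, computed bottom-up."""
    if isinstance(t, str):
        return ALPHA[t]
    if t[0] == "g":
        return G[run(t[1])]
    return f_op(run(t[1]), run(t[2]))

def accepts(t):
    return run(t) in FINAL
```

For instance, `accepts(("f", ("g", "x"), ("g", ("g", "y"))))` holds, while the same tree with a single g above y is rejected.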


Like the non-regularity of $T_2$ in the above example, the following general Pumping

Lemma also follows from the ﬁniteness of the state sets of tree recognizers.

5.3 Lemma For any recognizable $\Sigma X$-tree language $T$ there is a number $n \ge 1$ such that if $t \in T$ and $\mathrm{hg}(t) \ge n$, then for some $s \in T_\Sigma(X)$ and $p, q \in C_\Sigma(X)$,

(1) $t = s \cdot p \cdot q$,

(2) $\mathrm{dp}(p) \ge 1$, $1 \le \mathrm{hg}(s \cdot p) \le n$, and

(3) $s \cdot p^k \cdot q \in T$ for every $k \ge 0$.

We may choose the number of states of any $\Sigma X$-recognizer $\mathbf{A}$ such that $T = T(\mathbf{A})$ as the limit $n$ in the Pumping Lemma. Moreover, the Pumping Lemma clearly implies that $T$ is infinite if and only if there is a $t \in T$ such that $n \le \mathrm{hg}(t) < 2n$. Although the algorithm suggested by this fact is not very efficient, it proves the decidability of the Finiteness Problem “Is $T(\mathbf{A})$ finite?”. The Emptiness Problem “$T(\mathbf{A}) = \emptyset$?” is decidable, too. Indeed, if $\mathbf{A} = (\mathcal{A}, \alpha, F)$ is any $\Sigma X$-recognizer, then $T(\mathbf{A}) \neq \emptyset$ if and only if some final state $a \in F$ is reachable, i.e., $a = t\alpha_{\mathcal{A}}$ for some $t \in T_\Sigma(X)$. Moreover, the reachable states form the subalgebra of $\mathcal{A}$ generated by $\alpha(X) = \{\alpha(x) \mid x \in X\}$, and this can always be computed.
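The emptiness test via reachable states can be sketched as a fixed-point computation. In this sketch the operations are Python functions, `arity` gives the rank of each symbol (a nullary symbol would be a zero-argument function), and the four-state example algebra is hypothetical.

```python
from itertools import product

def reachable_states(ops, arity, alpha):
    """Least set containing alpha(X) and closed under all operations."""
    reached = set(alpha.values())
    changed = True
    while changed:
        changed = False
        for f, m in arity.items():
            # Try every m-tuple of states reached so far (snapshot of the set).
            for args in product(tuple(reached), repeat=m):
                a = ops[f](*args)
                if a not in reached:
                    reached.add(a)
                    changed = True
    return reached

def is_empty(ops, arity, alpha, finals):
    """True if no final state is reachable, i.e. the recognized language is empty."""
    return reachable_states(ops, arity, alpha).isdisjoint(finals)

# Hypothetical algebra on states {0, 1, 2, 3} with symbols g/1 and f/2.
arity = {"g": 1, "f": 2}
ops = {"g": lambda a: (a + 2) % 4, "f": lambda a, b: (a + b) % 4}
alpha = {"x": 0}
```

The loop terminates because the state set is finite and the reached set only grows.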

Each regular $\Sigma X$-tree language $T$ has a minimal $\Sigma X$-recognizer, unique up to isomorphism, that can be constructed from any given $\Sigma X$-recognizer $\mathbf{A} = (\mathcal{A}, \alpha, F)$ of $T$ by first

deleting all non-reachable states (the result is a connected recognizer) and then merging all

pairs of equivalent states. For deﬁning the equivalence of states, we need the following notion

that will be used later too.

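5.4 Definition Let $\mathbf{A} = (\mathcal{A}, \alpha, F)$ be any $\Sigma X$-recognizer. The translation $p^{\mathbf{A}} : A \to A$ of $\mathbf{A}$ defined by a $\Sigma X$-context $p \in C_\Sigma(X)$ is defined by setting $\xi^{\mathbf{A}}(a) = a$ and

$p^{\mathbf{A}}(a) = f^{\mathcal{A}}(t_1\alpha_{\mathcal{A}}, \dots, q^{\mathbf{A}}(a), \dots, t_m\alpha_{\mathcal{A}})$

for $p(\xi) = f(t_1,\dots,q(\xi),\dots,t_m)$, where $m > 0$, $f \in \Sigma_m$, $t_1,\dots,t_m \in T_\Sigma(X)$, $q \in C_\Sigma(X)$, and $a \in A$. The set $\{p^{\mathbf{A}} \mid p \in C_\Sigma(X)\}$ of all translations of $\mathbf{A}$ is denoted $\mathrm{Tr}(\mathbf{A})$.

It is easy to verify that all translations of a $\Sigma X$-recognizer $\mathbf{A} = (\mathcal{A}, \alpha, F)$ are also translations of the $\Sigma$-algebra $\mathcal{A}$, and that if $\mathbf{A}$ is connected, then $\mathrm{Tr}(\mathbf{A}) = \mathrm{Tr}(\mathcal{A})$. Two states $a$ and $b$ of a $\Sigma X$-recognizer $\mathbf{A} = (\mathcal{A}, \alpha, F)$ are equivalent if for every $p \in C_\Sigma(X)$, $p^{\mathbf{A}}(a) \in F$ if and only if $p^{\mathbf{A}}(b) \in F$. The minimization theory of tree recognizers can be found in [28].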

The regular tree languages are obtained in many other ways too. A standard subset

construction shows that they are exactly the tree languages recognized by nondeterministic

bottom-up tree recognizers. Also the nondeterministic top-down tree recognizers that process a tree starting at the root recognize exactly the regular tree languages. Furthermore, they are defined by regular tree grammars, certain systems of fixed-point equations, monadic second order logic etc. A regular tree grammar $G = (N, \Sigma, X, P, a_0)$ generating a regular $\Sigma X$-tree language consists of a finite set $N$ of non-terminal symbols, the alphabets $\Sigma$ and $X$, an initial symbol $a_0 \in N$, and a finite set $P$ of productions, each of them of the form $a \to x$, $a \to c$ or $a \to f(a_1,\dots,a_m)$, where $a, a_1,\dots,a_m \in N$, $x \in X$, $c \in \Sigma_0$ and $f \in \Sigma_m$ ($m > 0$). Derivations and the tree language $T(G) = \{t \in T_\Sigma(X) \mid a_0 \Rightarrow^* t\}$ generated by $G$ are defined in the

usual manner. We refer the reader to [28, 29], for example, for details and further references.


Finally, let us note some algebraic characterizations of regularity that are of direct interest

here. The following fact is an immediate consequence of Deﬁnition 5.1.

5.5 Proposition A $\Sigma X$-tree language $T$ is regular if and only if there exist a finite $\Sigma$-algebra $\mathcal{A} = (A, \Sigma)$, a homomorphism $\varphi : \mathcal{T}_\Sigma(X) \to \mathcal{A}$ and a subset $F \subseteq A$ such that $T = F\varphi^{-1}$.

We may say that an algebra $\mathcal{A} = (A, \Sigma)$ recognizes a $\Sigma X$-tree language $T$ if $T = F\varphi^{-1}$ for some homomorphism $\varphi : \mathcal{T}_\Sigma(X) \to \mathcal{A}$ and a subset $F \subseteq A$. For every $\Sigma X$-tree language $T$ there is a unique (up to isomorphism) “smallest” algebra recognizing $T$, and this algebra is finite exactly in case $T$ is regular. Moreover, for a regular $T$ the smallest algebra is the underlying algebra of the minimal $\Sigma X$-recognizer of $T$. These ideas can be also expressed in terms of congruences by generalizing Nerode’s Theorem to tree languages.

It is easy to see that for any $\Sigma X$-tree language $T$, there is a greatest congruence $\theta_T$ on $\mathcal{T}_\Sigma(X)$ that saturates $T$. It could be called the Nerode congruence of $T$, and the following result is Nerode’s Theorem for tree languages.

5.6 Proposition For any $\Sigma X$-tree language $T$, the following three conditions are equivalent:

(1) $T \in \mathrm{Rec}_\Sigma(X)$;

(2) $T$ is saturated by a congruence on $\mathcal{T}_\Sigma(X)$ of finite index;

(3) the Nerode congruence $\theta_T$ of $T$ is of finite index.

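Proof The equivalence of (1) and (2) follows by Proposition 5.5 as follows.

(1) If $T = F\varphi^{-1}$ for some homomorphism $\varphi : \mathcal{T}_\Sigma(X) \to \mathcal{A}$ and a subset $F \subseteq A$, where $\mathcal{A} = (A, \Sigma)$ is a finite $\Sigma$-algebra, then $\ker \varphi$ is a congruence of finite index on $\mathcal{T}_\Sigma(X)$ that saturates $T$.

(2) If $\theta \in \mathrm{Con}(\mathcal{T}_\Sigma(X))$ is a congruence of finite index that saturates $T$, then the finite $\Sigma$-algebra $\mathcal{T}_\Sigma(X)/\theta$ recognizes $T$. Indeed, it is easy to see that $T = (T\theta^{\natural})(\theta^{\natural})^{-1}$ for the natural homomorphism $\theta^{\natural} : \mathcal{T}_\Sigma(X) \to \mathcal{T}_\Sigma(X)/\theta$, $t \mapsto t/\theta$.

Of course, (2) and (3) imply each other immediately. □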

A characterization that could be regarded as a counterpart to Myhill’s Theorem is obtained by considering fully invariant congruences on $\mathcal{T}_\Sigma(X)$ (cf. [79, 81]).

6 Tree language operations and closure properties

We introduce now several tree language operations and note that the family of regular tree

languages is closed under most of them. As shown by the following obvious lemma, we may

usually assume that all tree languages involved are over the same alphabets Σ and X.

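6.1 Lemma If $\Sigma$ and $\Omega$ are ranked alphabets such that $\Sigma \subseteq \Omega$, and $X$ and $Y$ are leaf alphabets such that $X \subseteq Y$, then for any $T \subseteq T_\Sigma(X)$, $T \in \mathrm{Rec}_\Sigma(X)$ if and only if $T \in \mathrm{Rec}_\Omega(Y)$.

In particular, if $S \in \mathrm{Rec}_\Sigma(X)$, $T \in \mathrm{Rec}_\Omega(Y)$, and $\Sigma_m \cap \Omega_n = \emptyset$ whenever $m \neq n$, then $S, T \in \mathrm{Rec}_{\Sigma \cup \Omega}(X \cup Y)$.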

As tree languages are sets, we may apply any Boolean operations, i.e., the usual basic

set-theoretic operations, to them.

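6.2 Proposition For any ranked alphabet $\Sigma$ and any leaf alphabet $X$, $\mathrm{Rec}_\Sigma(X)$ is a field of sets on $T_\Sigma(X)$.

Proof It is clear that $\emptyset, T_\Sigma(X) \in \mathrm{Rec}_\Sigma(X)$. If $S, T \in \mathrm{Rec}_\Sigma(X)$, then $S = T(\mathbf{A})$ and $T = T(\mathbf{B})$ for some $\Sigma X$-recognizers $\mathbf{A} = (\mathcal{A}, \alpha, F)$ and $\mathbf{B} = (\mathcal{B}, \beta, G)$. Let $\mathbf{C} = (\mathcal{C}, \gamma, H)$ be a new $\Sigma X$-recognizer, where $\mathcal{C} = (A \times B, \Sigma)$ is the direct product $\mathcal{A} \times \mathcal{B}$, the initial assignment is $\gamma : X \to A \times B$, $x \mapsto (\alpha(x), \beta(x))$, and the set of final states $H \subseteq A \times B$ is specified as appropriate. Since $t\gamma_{\mathcal{C}} = (t\alpha_{\mathcal{A}}, t\beta_{\mathcal{B}})$ for every $t \in T_\Sigma(X)$, it is clear that any Boolean combination of $S$ and $T$ can be recognized by $\mathbf{C}$ by selecting a suitable set $H$ of final states. For example, $T(\mathbf{C}) = S - T$ if $H = F \times (B - G)$. □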
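The product construction used in the proof can be sketched in Python. The two component recognizers below are hypothetical examples over symbols f/2, g/1 and the leaf x: the first tracks the parity of the number of g's, the second records whether an f occurs. The tuple encoding of trees is likewise an assumption of this sketch.

```python
def run(t, ops, alpha):
    """State reached at the root of tree t, computed bottom-up."""
    if isinstance(t, str):
        return alpha[t]
    return ops[t[0]](*(run(s, ops, alpha) for s in t[1:]))

def product_recognizer(ops_a, alpha_a, ops_b, alpha_b):
    """Direct product: every operation acts componentwise on pairs of states."""
    def lift(f):
        return lambda *pairs: (ops_a[f](*(p[0] for p in pairs)),
                               ops_b[f](*(p[1] for p in pairs)))
    ops_c = {f: lift(f) for f in ops_a}
    alpha_c = {x: (alpha_a[x], alpha_b[x]) for x in alpha_a}
    return ops_c, alpha_c

# Recognizer for S: even number of g's (final states F = {0}).
ops_a = {"g": lambda a: 1 - a, "f": lambda a, b: (a + b) % 2}
alpha_a = {"x": 0}
# Recognizer for T: the tree contains an f (final states G = {1}).
ops_b = {"g": lambda a: a, "f": lambda a, b: 1}
alpha_b = {"x": 0}

ops_c, alpha_c = product_recognizer(ops_a, alpha_a, ops_b, alpha_b)
H = {(0, 0)}   # F x (B - G): recognizes the difference S - T
```

With this choice of `H`, the product accepts exactly the trees $g^{2k}(x)$, which have an even number of g's and no f.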

There are a few different natural products of tree languages. The tree language product $T(x \leftarrow T_x \mid x \in X)$ of a $\Sigma X$-tree language $T$ and an $X$-indexed family $(T_x \mid x \in X)$ of $\Sigma X$-tree languages is the set of all $\Sigma X$-trees that can be obtained from a tree $t \in T$ by simultaneously replacing in it every $x \in X$ with a tree from the corresponding set $T_x$. The different occurrences of each $x$ may be replaced with different trees from $T_x$.

6.3 Proposition If the $\Sigma X$-tree languages $T$ and $T_x$ ($x \in X$) are regular, then so is their tree language product $T(x \leftarrow T_x \mid x \in X)$.

Regular tree grammars often provide the simplest way to prove a closure result like this. It is quite easy to construct a regular tree grammar generating $T(x \leftarrow T_x \mid x \in X)$ if we are given grammars generating $T$ and the tree languages $T_x$ ($x \in X$). Note that we may assume that all grammars have pairwise disjoint sets of non-terminals.

It is easy to see that the operation deﬁned in the following corollary is a special case of

the tree language product.

6.4 Corollary For any $m > 0$ and $f \in \Sigma_m$, the $f$-product

$f(T_1,\dots,T_m) := \{f(t_1,\dots,t_m) \mid t_1 \in T_1, \dots, t_m \in T_m\}$

of any $T_1,\dots,T_m \in \mathrm{Rec}_\Sigma(X)$ is also a regular $\Sigma X$-tree language.

For any given $z \in X$, the $z$-product $S \cdot_z T$ of two $\Sigma X$-tree languages $S$ and $T$ is defined as the special tree language product $T(x \leftarrow T_x \mid x \in X)$, where $T_z = S$ and $T_x = \{x\}$ for all $x \neq z$, $x \in X$. In other words, any element of $S \cdot_z T$ is obtained from some tree $t \in T$ by replacing each $z$-labelled leaf with some tree from $S$, and again different $z$-labelled leaves can be replaced with different members of $S$.

6.5 Corollary If $S, T \in \mathrm{Rec}_\Sigma(X)$, then $S \cdot_z T \in \mathrm{Rec}_\Sigma(X)$ for every $z \in X$.

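For any $z \in X$, the $z$-iteration of a $\Sigma X$-tree language $T$ is defined as the union

$T^{*z} = \bigcup \{T^{k,z} \mid k \ge 0\} = \{z\} \cup T \cup (\{z\} \cup T) \cdot_z T \cup \dots$,

where $T^{0,z} = \{z\}$, and $T^{k,z} = T^{k-1,z} \cdot_z T \cup T^{k-1,z}$ for every $k \ge 1$.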