Café Math : Cassipoeia Type System (Part 1)

Cassiopeia is a small computer algebra system I am writing. It is still very basic, but it is aimed at having a rich a powerful type system. This article is the first of a series on the theory behind that type system. The goam is to provide a conveinent dictionary between some sophisticated computer software type system (like that of C#) and the theory of categories which I chose as a foundation of the mathematical engine.

Definition 1.
A *signature* $\Sigma$ is a given by an alphabet $A$ partitioned in two distinguished subalphabets $F$ and $V$, and a mapping $\alpha : A \to \mathbb{N}$ such that for all $v \in V$ we have $\alpha(v) = 0$.

Elements of $F$ are *function symbols*, elements of $V$ are *variable symbols*. Given a symbol $a \in A$, the integer $\alpha(a)$ is the *arity* of the symbol $a$. Variable symbols all have arity zero, and zero arity function symbols are called *constant symbols*.

Definition 2.
A *term* $t$ of signature $\Sigma$, or *$\Sigma$-term*, is a tree with nodes labelled by the alphabet $A$. Outgoing degrees of nodes are constrained by the arity function $\alpha$ in the sense that each node labelled by a symbol $a \in A$ has degree $\alpha(a)$.

A term is uniquely decomposed in an expression of the form $t = (a\,;\,t_1, ..., t_n)$ with $\alpha(a) = n$.
Such an expression can conveniently be written $a(t_1, ..., t_n)$ if no confusion arises. Terms of the form $t = (a\,;\,\,)$ *i.e.* with $\alpha(a)=0$ are identified with $a$ so we can write $t \in F$ to mean a *constant term* and $t \in V$ to mean a *variable term*.

Example 1. We take the signature $\Sigma$ given by $A=F=\{\, a, b \,\}$, $V = \emptyset$, $\alpha(a)=0$ and $\alpha(b)=2$. The set of all terms of that signature is infinite and is composed of binary trees as.

The number of such terms with $n$ nodes labelled by $b$ is the $n^\text{th}$ Catalan number, $$ \begin{align*} C_n = \frac{1}{n+1} \binom{2n}{n} \end{align*} $$ The generating series of this sequence of numbers is, $$ \begin{align*} G(t) &= \frac{1-\sqrt{1-4t}}{2t} \\ &= 1+t+2\,t^2+5\,t^3+14\,t^4+42\,t^5+132\,t^6+... \end{align*} $$

More generally, given any signature with a finite alphabet $A = F = \{\, a_1, ...,a_n \,\}$, $V = \emptyset$ and $\alpha(a_1) = k_1$, ..., $\alpha(a_n)=k_n$, we can count the number $N_k$ of terms of that signature having exactly $k$ nodes by introducing the following generating series, $$ \begin{align*} G(t) = \sum_{k \ge 0}\, N_k \,t^k \end{align*} $$ It is subject to the following algebraic equation, $$ \begin{align*} G(t) = t\,(G(t)^{k_1}+ ... + G(t)^{k_n}) \end{align*} $$ although this is not always easily solved for a compact closed form, one can easily pump the terms of the series using the Newton-Raphson method. For that one defines a sequence $(g_k)_{k\ge 0}$ of series approximations by recurrence, $$ \begin{align*} g_0 &= N_1\,t + o(t)\\ g_{k+1} &= g_k - \frac{g_k - t\,(g_k^{k_1} + ... + g_k^{k_n})}{1 - t\,(k_1\,g_k^{k_1 - 1} +...+k_n\,g_k^{k_n-1})} + o(t ^{2^{k+1}}). \end{align*} $$ where $N_1$ is the number of constant symbols in the signature. The series approximation $g_k$ is exact up to order $2^k$ which provides fast convergence.

Example 2. Taking $A=F=\{\, a, b \,\}$, $V = \emptyset$, $\alpha(a)=0$ and $\alpha(b)=3$ one obtains ternary trees. Applying the above method one finds the following approximation to the generating series after five iterations, $$ \begin{align*} G(t)& = t+t^4+3\,t^7+12\,t^{10}+55\,t^{13}+273\,t^{16}+1428\,t^{19}+7752\,t^{22}+43263\,t^{25}\\ &\quad +246675\,t^{28}+1430715\,t^{31} + o(t^{32}) \end{align*} $$

Definition 3.
An *algebra* $A$ of signature $\Sigma$, or a *$\Sigma$-algebra*, is given by a set $A_\circ$ and for each $a \in F$ with $\alpha(a) = n$ a mapping of sets,
$$
\begin{align*}
f_a\,:\, A_\circ^n \to A_\circ
\end{align*}
$$

The set $A_\circ$ is the set of *elements* of the algebra. The mappings $(f_a)_{a\in F}$ are the *operations* of the algebra.

Definition 4.
A *morphism* $\phi$ between two algebras $A$ and $A'$, is given by a mapping between their set of elements such that for each $a\in F$ with $\alpha(a) = n$ and all $x_1, ...,x_n \in A_\circ$ we have,
$$
\begin{align*}
\phi(f_a(x_1, ..., x_n)) = f'_a(\phi(x_1),...,\phi(x_n))
\end{align*}
$$

Definition 5.
The *universal algebra* $U$ of signature $\Sigma$ is the $\Sigma$-algebra which elements are the terms of signature $\Sigma$ and which family of operations is defined for each $a \in F$ with $\alpha(a) = n$ and any terms $t_1, ..., t_n$ by,
\begin{align*}
f_a(t_1,..., t_n) = (a\,;\,t_1,...,t_n)
\end{align*}

Theorem 1. (*Universality*)
Given any $\Sigma$-algebra $A$ and any interpretation $\iota : V \to A$, there is one and only one morphism of $\Sigma$-algebra $\phi_U : U \to A$ such that for each $v \in V$ we have $\phi_U(v) = \iota(u)$

Proof. Existence and unicity both come from the induction principle.

Corollary. The universal $\Sigma$-algebra is unique up to a unique isomorphism of $\Sigma$-algebras (hence its name).

Proof. As usual, given $U_1$ and $U_2$ satisfying the definition we get two morphisms $\phi_{21} : U_1 \to U_2$ and $\phi_{12} : U_2 \to U_1$. By the unicity of $\phi_{1} : U_1 \to U_1$ and $\phi_{2} : U_2 \to U_2$ we get $\phi_{12} \circ \phi_{21} = \mathrm{Id}_{U_1}$ and $\phi_{21} \circ \phi_{12} = \mathrm{Id}_{U_2}$.

If we chose for one moment to distinguish an algebra $A$ from its set of elements $A_\circ$ and a morphism of algebras $\phi : A \to A'$ from the mapping of the underlying sets $\phi_\circ : A_\circ \to A'_\circ$ we get a seemingly dull covariant functor $(-)_\circ$ from the category of $\Sigma$-algebras to that of sets. The definition of the universal algebra describes another functor in the other direction, from the category of sets to that of $\Sigma$-algebras. We have a formal adjunction between those two functors. $$ \begin{align*} \mathrm{Hom}(U_V, A) \underset{nat.}{\simeq} \mathrm{Hom}(V, A_\circ) \end{align*} $$

The Hom-set on the left is the set of all morphisms of $\Sigma$-algebras from $U_V$ to $A$ and the Hom-set on the right is the set of all *interpretations* of the set of variable $V$ in the algebra $A$, that is, mappings associating an element of $A$ to any variable symbol. The formal adjunction means that any interpretation of the variable symbols in $A$ gives rise to a unique morphism of $\Sigma$-algebras $U_V \to A$ as the definition says, and that the converse is also true as is readily seen.

We consider a type system where types are described by terms called *type terms*.
Variables of a type term are called *type variables*, constants are called *primitive types*, bound type terms are called *generic types*, unbound types are called *non-generic types*. Function symbols are called *type constructors*, and non-constant type terms are called *non-primitive* or *compound types*.

A subtyping relation is a partial order on the set of types. Due to soundness considerations we shall impose some restriction on such a relation.

Given two type terms $A$ and $B$ we say that $B$ extends $A$ or that $B$ is a subtype of $A$ and we note $B :A$

(To be continued...)

Samuel VIDAL posted 2012-04-29 21:05:30

Thanks ;-)

Always a good job right here. Keep rolling on through.