A Mathematical Theory of Grammatical Categories Abstract: Every mainstream linguistic theory categorizes expressions and provides some system for naming the categories (with atomic symbols, function expressions, or feature complexes of some kind), but the conceptual and empirical basis for categorization is seldom explored. In the 1950's the most prominent idea was that categories are sets of substitution-equivalent strings, but the various rigorous formulations of this idea are empirically inaccurate and fail to correspond to the way linguists actually use categories. More sophisticated approaches build on the idea that the categories partition (or cover) the form-meaning pairs in a language in a way that stands in a certain relation to the generative (or relational) structure of the language. After critiquing some of the earlier ideas about categories, this class explores proposals of the latter kind, with attention to the possibility that an optimal categorization of expressions may be induced by the generative structure of grammar rather than being simply stipulated. (We find some models for this kind of "optimal categorization" in formal language theory: e.g. the Nerode equivalence classes as the categories of canonical regular grammars, and the elements of the Rabin&Scott, Schutzenberger "syntactic monoid".) This builds on earlier foundational work by Keenan & Stabler as follows: K&S treat a grammar G as a lexicon and a set F of generating functions; the language L(G) is the closure of the lexicon wrt F. This definition partitions L(G) by equating two expressions iff some automorphism of L(G) a permutatation of L(G)that fixes each function in F) maps one expression to the other. These equivalence classes are unlike linguists' categories because they (typically) distinguish elements of differing derivational complexity. But we expect that having a given category is an automorphism invariant property. And some natural coarsenings of the automorphism partition come close to standard notions and suggest what an "optimal" categorization might be. E.g. consider the coarsest partition of L(G) that is a congruence in the sense that (using = for equivalence) for all f in F and all tuples s,t in L(G)*, if s is in dom(f) and t is the result of replacing the i'th coordinate si in s by ti where si=ti, then f(s)=f(t). This does not fit all reasonable proposals about categorizations, as we show, but it comes close, and variations on the idea will be explored. We also show how fundamental sub- and super-category relations are induced by grammar and so do not need to be stipulated.