GF-based speech grammars

Introduction

The grammars available on this page cover a few fixed domains (e.g. calculator, maps, alarm clock) and define two kinds of languages for talking about these domains: natural languages (Estonian and English) and the application language App.

The syntax of the natural languages is designed to reflect the common ways of expressing the sentences of the domains. Furthermore, it mimics spoken language (e.g. the syntax does not describe any orthographic symbols or conventions).

The language App has the following properties:

The grammars are implemented in Grammatical Framework (GF), a formalism that supports precise translation between the languages. For example, we can automatically translate the English sentence

how much is two liters in cubic centi meters

to Estonian and App:

kui palju on kaks liitrit kuup senti meetrites

convert 2 L to cm^3

The last form can be evaluated directly, e.g. by resolving the URL

http://www.wolframalpha.com/input/?i=convert+2+L+to+cm^3
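
As a rough illustration of that last step, the following shell sketch turns an App string into such a URL by replacing spaces with `+`. The function name `app_to_url` is hypothetical, and the sketch deliberately does not percent-encode any other characters, which a more robust client would probably need to do.

```shell
# Hypothetical helper (not part of the grammars or of GF): build a
# Wolfram Alpha query URL from an App-language string.
# Assumption: the string contains only spaces and URL-safe characters;
# nothing other than spaces is escaped here.
app_to_url() {
  printf 'http://www.wolframalpha.com/input/?i=%s\n' "$(printf '%s' "$1" | tr ' ' '+')"
}

# Prints the URL for the App sentence from the example above.
app_to_url 'convert 2 L to cm^3'
```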

Grammars

The grammars are listed in the following table where the columns stand for:

  1. name of the grammar;
  2. short description;
  3. supported natural languages (note that every grammar also contains the App-language);
  4. randomly generated example sentences (which often do not agree with common sense but still convey some idea of what is syntactically possible in the described languages);
  5. link to a compiled/portable grammar file.

| Name | Description | Languages | Examples | PGF |
|---|---|---|---|---|
| Action | Union of Alarm, Calc, Dial, Direction, and Weather | Est, Eng | txt | pgf |
| Alarm | Simple 24h-clock, e.g. "set alarm to fourteen fifty seven" | Est, Eng | txt | pgf |
| Calc | Union of ArithExpr and Unitconv | Est, Eng | txt | pgf |
| Dial | Sequence of digits forming a phone number | Est, Eng | txt | pgf |
| Direction | FROM-TO queries over Estonian place names (e.g. "Pariisi") and the street addresses of Tallinn and Tartu (e.g. "Lossi plats kaks") | Est | txt | pgf |
| Estvrp | Estonian vehicle registration plate language, e.g. "ABC 123" | Est | txt | pgf |
| ArithExpr | Arithmetical expressions, e.g. 1 + -2.3 * PI ^ 5 | Est, Eng | txt | pgf |
| Go | Tiny example grammar | Est, Eng | txt | pgf |
| Symbols | Arbitrary length sequence of digits and Estonian letters | Est | txt | pgf |
| Tallinndirection | FROM-TO queries over Tallinn's street addresses | Est | txt | pgf |
| Unitconv | Unit conversion expressions, e.g. 12.34 km^2 in ft^2 | Est, Eng | txt | pgf |
| Weather | Weather queries with Estonian place names (e.g. Est: "Lasila ilmateade"; App: weather Lasila, Estonia) | Est | txt | |

Note that the examples only list one form for each example. The App sentences typically do not have more forms anyway, but the natural languages do contain some variation, e.g. articles can be dropped or requests can be optionally prefixed by "please".

All the grammars are modular, e.g. Calc is a union of ArithExpr and Unitconv. The table above mostly lists the higher-level grammars. The module import hierarchy of the Action grammar is displayed below.

Abstract grammar component hierarchy for the Action-grammar

Usage

The grammars can be downloaded in the (binary) PGF format. A PGF file can be embedded into applications (using existing GF libraries for various programming languages), or it can be opened with the GF command-line tool and converted into other formats (e.g. JSGF, JavaScript, ...). The example sentences of the introduction were produced with this Unix shell and GF command line:

echo 'p -lang=Eng "how much is two liters in cubic centi meters" | l -lang=Est,App -bind' |\
gf --run Action.pgf
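
The same pipeline can be wrapped in small shell functions so that other sentences and language combinations are easy to try. This is only a sketch: the function names `gf_translate_cmd` and `gf_translate` are hypothetical, the sentence is assumed to contain no double quotes, and the `gf` binary is assumed to be on the PATH.

```shell
# Hypothetical helper: print the GF shell command that parses a sentence
# in one language and linearizes the result into the given languages.
# Arguments: source language, comma-separated target languages, sentence.
# Assumption: the sentence contains no double quotes.
gf_translate_cmd() {
  printf 'p -lang=%s "%s" | l -lang=%s -bind\n' "$1" "$3" "$2"
}

# Hypothetical helper: run that command against a PGF file.
# Arguments: PGF file, then the same arguments as gf_translate_cmd.
gf_translate() {
  pgf_file="$1"
  shift
  gf_translate_cmd "$@" | gf --run "$pgf_file"
}

# Example (requires gf and a downloaded Action.pgf):
# gf_translate Action.pgf Eng Est,App "how much is two liters in cubic centi meters"
```

Keeping the command builder separate from the `gf` invocation makes it possible to inspect the generated GF command without having GF installed.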

Usage in applications:

Source code

The source code of all the grammars is available at http://github.com/Kaljurand/Grammars/. The PGF files have been generated based on this commit.

Contributing

Everybody is welcome to contribute grammars to this project, provided that the grammar is made available as open source. If you would like to contribute a grammar, please proceed as follows:

  1. fork the Grammars-project on GitHub,
  2. create a new branch in your fork,
  3. add your grammar in this branch, and
  4. submit the branch via a pull request.

Read more about using pull requests.

Publications

Kaarel Kaljurand (kaljurand@gmail.com), 2014-07-01