Quintus Prolog Manual


(PREV) (NEXT)

g-16: Grammar Rules

This chapter describes Quintus Prolog's grammar rules, and the translation of these rules into Prolog clauses. At the end of the chapter is a list of grammar-related built-in predicates.

g-16-1: Definite Clause Grammars

Prolog's grammar rules provide a convenient notation for expressing definite clause grammars, which are useful for the analysis of both artificial and natural languages. The usual way one attempts to make precise the definition of a language, whether it is a natural language or a programming lanaguage, is through a collection of rules called a "grammar". The rules of a grammar define which strings of words or symbols are valid sentences of the language. In addition, the grammar generally analyzes the sentence into a structure which makes its meaning more explicit. A fundamental class of grammar is the context-free grammar (CFG), familiar to the computing community in the notation of "BNF" (Backus-Naur form). In CFGs, the words, or basic symbols, of the language are identified by "terminal symbols", while categories of phrases of the language are identified by non-terminal symbols. Each rule of a CFG expresses a possible form for a non-terminal, as a sequence of terminals and non-terminals. The analysis of a string according to a CFG is a parse tree, showing the constitutent phrases of the string and their hierarchical relationships. Context-free grammars (CFGs) consist of a series of rules of the form:

        nt --> body.

where 'nt' is a non-terminal symbol and body is a sequence of one or more items separated by commas. Each item is either a non-terminal symbol or a sequence of terminal symbols. The meaning of the rule is that 'body' is a possible form for a phrase of type 'nt'. A non-terminal symbol is written as a Prolog atom, while a sequence of terminals is written as a Prolog list, whereas a terminal may be any Prolog term. Definite clause grammars (DCGs) are a generalization of context-free grammars and rules corresponding to DCGs are referred to as "Grammar Rules". A grammar rule in Prolog takes the general form

        head --> body.

meaning "a possible form for head is body". Both body and head are sequences of one or more items linked by the standard Prolog conjunction operator ',' (comma). Definite clause grammars extend context-free grammars in the following ways:

g-16-2: How to Use the Grammar Rule Facility

Following is a summary of the steps which enable you to construct and utilitze definte clause grammars: STEPS:

  1. Write a grammar, using -->/2 to formulate rules.
  2. Compile the file containing the grammar rules. The Load Predicates call expand_term/2 which translates the grammar rules into Prolog clauses.
  3. Use phrase/[2,3] to parse or generate strings.

OPTIONAL STEPS:

  1. Modify the way in which Prolog translates your grammar rules by defining clauses for term_expansion/2.
  2. In debugging or in using the grammar facility for more obscure purposes it may be useful to understand more about expand_term/2 and 'C'/3.

g-16-3: An Example

As an example, here is a simple grammar which parses an arithmetic expression (made up of digits and operators) and computes its value. Create a file, 'grammar.pl' containing the following rules:

expr(Z) --> term(X), "+", expr(Y), {Z is X + Y}.
expr(Z) --> term(X), "-", expr(Y), {Z is X - Y}.
expr(X) --> term(X).

term(Z) --> number(X), "*", term(Y), {Z is X * Y}.
term(Z) --> number(X), "/", term(Y), {Z is X / Y}.
term(Z) --> number(Z).

number(C) --> "+", number(C).
number(C) --> "-", number(X), {C is -X}.
number(X) --> [C], {"0"=<C, C=<"9", X is C - "0"}.

In the last rule, C is the ASCII code of a decimal digit. This grammar can now be used to parse and evaluate an expression by means of the built-in predicates phrase/2 and phrase/3. For example,

        | ?- [grammar].
        | ?-  phrase(expr(Z), "-2+3*5+1").

        Z = 14

        | ?-  phrase(expr(Z), "-2+3*5", Rest).

        Z = 13,
        Rest = [] ;

        Z = 1,
        Rest = "*5" ;

        Z = -2,
        Rest = "+3*5" ;

        no

g-16-4: Translation of Grammar Rules into Prolog Clauses

Grammar rules are merely a convenient abbreviation for ordinary Prolog clauses. Each grammar rule is translated into a Prolog clause as it is compiled. This translation is described below. The procedural interpretation of a grammar rule is that it takes an input list of symbols or character codes, analyzes some initial portion of that list, and produces the remaining portion (possibly enlarged) as output for further analysis. The arguments required for the input and output lists are not written explicitly in a grammar rule, but are added when the rule is translated into an ordinary Prolog clause. The translations shown differ from the output of listing/[0,1] in that internal translations such as variable renaming are not represented. This is done in the interests of clarity. For example, a rule such as (A) will be depicted as translating into (B) rather than (C).

        p(X) --> q(X).                              (A)

        p(X, S0, S) :-
                q(X, S0, S).                        (B)

        p(A,B,C) :-
                q(A,B,C).                           (C)

If there is more than one non-terminal on the right-hand side, as in (D) the corresponding input and output arguments are identified, translating into (E):

        p(X, Y) --> q(X), r(X, Y), s(Y).            (D)

        p(X, Y, S0, S) :-                           (E)
            q(X, S0, S1),
            r(X, Y, S1, S2),
            s(Y, S2, S).

Terminals are translated using the built-in predicate 'C'(S1, X, S2), read as "point S1 is connected by terminal X to point S2", and defined by the single clause

        'C'([X|S], X, S).

(This predicate is not normally useful in itself; it has been given the name uppercase 'c' simply to avoid pre-empting a more useful name.) Then, for instance (F) is translated into (G):

        p(X) --> [go, to], q(X), [stop].            (F)

        p(X, S0, S) :-                              (G)
            'C'(S0, go, S1),
            'C'(S1, to, S2),
            q(X, S2, S3),
            'C'(S3, stop, S).

Extra conditions expressed as explicit procedure calls, enclosed in curly braces, naturally translate into themselves. For example (H) translates to (I):

        p(X) --> [X], {integer(X), X > 0}, q(X).    (H)

        p(X, S0, S) :-                              (I)
            'C'(S0, X, S1),
            integer(X),
            X > 0,
            q(X, S1, S).

Similarly, a cut is translated literally. Terminals on the left-hand side of a rule, enclosed in square brackets, translate into 'C'/3 goals with the first and third arguments reversed. For example, (J) becomes (K):

        is(N), [not] --> [aint].                    (J)

        is(N, S0, S) :-                             (K)
            'C'(S0, aint, S1),
            'C'(S, not, S1).

Disjunction has a fairly obvious translation. For example, (L), a rule which equates phrases like '(sent) a letter to him' and '(sent) him a letter', translates to (M):

        args(X, Y) -->                              (L)
                dir(X), [to], indir(Y) |
                indir(Y), dir(X).

        args(X, Y, S0, S) :-                        (M)
            (   dir(X, S0, S1),
                'C'(S1, to, S2),
                indir(Y, S2, S)
            |   indir(Y, S0, S1),
                dir(X, S1, S)
            ).

Tip: In order to look at these translations, declare the grammar rules dynamic and use listing/[0,1]. However, in a grammar rule like 'head --> body', if head has n arguments, the dynamic declaration is for foo/n+2. For example, the following declaration for grammar rule (L) would enable you to list its translation, (M):

         :- dynamic(args/4).

g-16-5: Summary of Predicates


Copyright (C) 1997 AI International Ltd
contact: product support sales information