Ginger XML: Transport Syntax for Ginger

Author:

Stephen Leach

Email:

stephen.leach@steelypip.com

Created, June 2010

Revised, August 2010, May 2011, Feb 2013, Apr 2013, Jun 2017

Note

Reader, please note that this is very much a work-in-progress. It is being written up as we design the relevant sections in appginger.

Ginger XML Overview

The idea behind GingerXML (*.gnx) is to provide a representation of Ginger code that is neutral. By neutral we mean that it should be both human-readable and machine-friendly. Hence it should be easy for people with some computing background to understand the logic behind it, and easy for people to review but not necessarily write. It must also be easy for them to create programs, in their programming language of choice, that can read, transform and generate GingerXML. And it should be reasonably inexpensive to process, without that being an overriding consideration.

To balance these opposed concerns we borrowed from Lisp, especially Scheme. We selected a simplified subset of XML, called Minimal-XML or MinXML, for short. MinXML is loosely based on s-expressions, which made programming very straightforward. We then designed the elements around Lisp-style primitives. This gave us a very clean design because we did not have to make concessions to programming convenience - because this is a representation that is essentially only read and written by machines, not people.

For example the equivalent of Pop-11’s if f( 0 ) then p else q endif( false ) would be:

<app>
   <if>
         <app>
             <id name="f" />
             <int value="0" />
        </app>
        <id name="p" />
        <id name="q" />
    </if>
    <constant type="bool" value="false" />
</app>

There are disavantages to this approach, of course, such as it being relatively verbose and therefore slowing down compilation. Yet on the whole we believe GingerXML balances the relevant factors and is both effective and pleasing to use.

Element and Meta-attributes

Comment Attribute

Any element may have the optional ‘comment’ attribute. It is a free-text string intended to provide human-readable text. Its context is the node and all nested nodes.

Source File Attribute

Any element may have the optional ‘source’ attribute that describes the source file (or other source) of the element. It is a free-text string. It is considered to apply to all nested nodes.

e.g.

<bind source="hello.common">
    <var name="helloWorld" protected="true/>
    <fn> ... </fn>
</bind>

Span Attributes

Any element may have optional ‘from’ and ‘to’ attributes that describe the span of text of the source file that the element derives from. Each should have the format:

(<digit>+)[.(<digit>+)]

The first group specifies the row number and the second the column. N.B. the column number is optional.

If the ‘from’ attribute is specified then the ‘to’ attribute will default to the same line number as the ‘from’ attribute, although any column will be ignored.

e.g.

<app from="23.8" to="23.64"> .... </app>

Context Attributes

Any element _may_ have an optional context.* attribute which will be printed out to provide additional context. It will typically be a truncated version of the spanned text.

Context - Grammar Role

Any element _may_ be decorated with the grammatical role assigned by the parser.

if test then x else y endif ### may be decorated as follows.

<if context.role="Conditional">
    <id name="test" context.role="Predicate"/>
    <id name="x" context.role="Then-part of Conditional"/>
    <id name="y" context.role="Else-part of Conditional"/>
</if>

Context - Parameter Position

Any expression that is passed as a parameter to a function application _may_ be decorated with its positional value and the name of the function (or description of the function expression) that it is statically called by.

Arguments on the left hand side of the function are numbered negatively, arguments on the right hand side are numbered positively.

x.f( y, z )     ### Original expression.

<app>
    <id name="f"/>
    <seq>
        <id name="x" context.posn="-1" arg.caller="f"/>
        <id name="x" context.posn="1" arg.caller="f"/>
        <id name="x" context.posn="2" arg.caller="f"/>
    </seq>
</app>

Statements

Syntax

STMNT ::=
    DECLARATION
    EXPR

Expressions

Syntax

EXPR ::=
    CONSTANT            ### any literal constant
    VARIABLE            ### reference to a variable
    ASSIGNMENT          ### assignment to a variable
    SEQ                 ### sequence of expressions (comma/semi separated)
    BLOCK               ### introduces a new scope
    FUNCTION            ### a function
    APP                 ### function application
    CONDITIONAL         ### if/unless
    LOOP                ### for loops
    LIST                ### list expressions
    VECTOR              ### vector expressions

Constants

Description

Constants are characterised by having element name ‘constant’ and ‘type’ and ‘value’ attributes. Constants always represent a single IMMUTABLE value. N.B. The compiler is free to share instances of these constants which are equal to each other.

Note that the “type” attribute doesn’t correspond to the class name you may have expected. This is a hangover from early development before the class names were stablised.

Syntax

CONSTANT ::=
    <constant type="absent" value="absent"/>              ### The absent singleton
    <constant type="bool" value=("true"|"false")/>        ### Booleans
    <constant type="indeterminate" value="indeterminate"> ### The indeterminate singleton
    <constant type="int" value=TEXT/>                     ### +/- arbitrary precision
    <constant type="float" value=TEXT/>                   ### We might unify numbers?
    <constant type="string" value=TEXT/>                  ### Immutable strings
    <constant type="symbol" value=TEXT/>                  ### Symbols
    <constant type="char" value=TEXT/>                    ### A single character
    <constant type="sysfn" value=TEXT/>                   ### Named procedure
    <constant type="sysclass" value=TEXT>                 ### Named class
    <constant type="undefined" value="undefined">         ### The undefined singleton

Examples

<constant type="int" value="123"/>
<constant type="float" value="1.2"/>
<constant type="string" value="qwertyuiop"/>
<constant type="char" value="A"/>
<constant type="sysfn" value="+"/>

N.B. Character sequences are multi-valued constants. They are represented as a sequence of characters.

<seq>
    <constant type="char" value="a"/>
    <constant type="char" value="b"/>
    <constant type="char" value="c"/>
</seq>

Available Named Procedures

Note that these constants are not necessarily bound to identifiers in Ginger. These constants are intended as direct support for built-in operators (e.g. arithmetic) and syntactic forms such as list construction, string interpolation, and so on. Here are some examples:

<constant type="sysfn" value="+"/>                  ### }
<constant type="sysfn" value="-"/>                  ### }
<constant type="sysfn" value="*"/>                  ### }- standard arithmetic
<constant type="sysfn" value="/"/>                  ### }
<constant type="sysfn" value="head"/>
<constant type="sysfn" value="newList"/>
<constant type="sysfn" value="newVector"/>
<constant type="sysfn" value="newMap"/>
<constant type="sysfn" value="not"/>                ### Boolean negation
<constant type="sysfn" value="tail"/>

It is intended that all the members of this list are guaranteed to be available from the “std” package. Hence they are functionally equivalent to

<id name=NAME def.pkg="ginger.library"/>

Furthermore, it is important to note that these constants do not have to be implemented efficiently. Compiler writers are permitted to implement these as lambda forms. For example a system function ‘foo’ of 1 argument might be implemented like this:

### permitted possible implementation of unary sysfn called 'foo'
<fn title="foo">
    <var name=”x”/>
    <sysapp name="foo">
        <id name=”x”/>
    </sysapp>
</fn>

In particular it is explicitly permitted that each use of a sysfn _may_ return a different object.

Available Named Classes

There is a built-in class for every type of built-in value, although they are not necessarily bound to identifiers in Ginger. Examples:

<constant type="sysclass" value="Absent"/>          ### class for absent
<constant type="sysclass" value="Bool"/>            ### class for true & false
<constant type="sysclass" value="Small"/>           ### class for 'small' integers
<constant type="sysclass" value="Double"/>          ### class for doubles
<constant type="sysclass" value="String"/>          ### class for strings
<constant type="sysclass" value="Char"/>            ### class for characters
<constant type="sysclass" value="Nil"/>             ### class for nil
<constant type="sysclass" value="Pair"/>            ### class for list pairs
<constant type="sysclass" value="Vector"/>          ### class for vectors
<constant type="sysclass" value="Class"/>           ### class for classes

Note that classes are not exactly he same as types. All function objects share the same class but may have entirely different types.

Variable Reference

Notes: We have to add in name qualification e.g. nicknames. We also should consider a way of allocating local variables guaranteed never to clash with local variables created by the programmer. Maybe have an extra hidden dimension on names??

Note: there are three ways by which a global variable might be referred by.
  1. A qualified reference, using the alias attribute

  2. An unqualified reference, using the enc.pkg (enclosing package) attribute

  3. An absolute reference, using the def.pkg (defining package) attribute

Syntax

VARIABLE ::=
    <id name=NAME
        [enc.pkg=PACKAGE_NAME ]
        [def.pkg=PACKAGE_NAME | alias=NICKNAME   ]
    />

Assignments

Description

N.B. Assignment runs from left-to-right, not following the usual convention. The destination expression may be a complex assignable expression.

Syntax

<set> SRC_EXPR DST_EXPR </set>

Sequences & Blocks

Overview

Sequences are used to create a sequence of expressions. Blocks are sequences with the additional property that they introduce a new scope.

Syntax

SEQ ::=
    <seq> EXPR* </seq>
BLOCK ::=
    <block> EXPR* </block>

Function Applications

Syntax

APP ::=
    <app> EXPR EXPR </app>
    <sysapp name=NAME> EXPR* </sysapp>

SysApps

SysApp’s are invocations of the built-in functions. Each built-in function is named and can be referred to via

  • <sysapp name=NAME> EXPR* </sysapp>, which compiles into a function call

  • <constant type=”sysfn” name=NAME/>, which will compile into a function object

  • <id def.pkg=”ginger.library” name=NAME/>, which will compile into a variable that references a function object.

Of these three methods, only the direct function call is guaranteed to be efficient. The other two forms are permitted to be relatively inefficient. In support of this, the compiler writer is allowed to make reasonable assumptions to help performance e.g. the call may be inlined, computed at compile-time, overflow checking may be deferred until the end of the parent block, no debug information may be available, the garbage collector may be blocked, and so on.

Note that it is also guaranteed that direct calls of sysfns will be as efficient as sysapps.

### This form will be treated as a sysapp.
<app><sysfn value="foo"/> ... </app>

Effectively it turns into

<sysapp name="foo"> ... </sysapp>

See sysapps in detail for more information.

Conditionals

Notes: In progress - I am designing these as multi-part if/then/elseif/../else/endif forms. This means they are an easy target for compiling switches. Short circuits need to be fleshed out.

Syntax

CONDITIONAL ::=
    <if> ( IF_PART THEN_PART )*  [ELSE_PART] </if>
    <and> EXPR* </and>
    <or> EXPR* </or>
    <absand> EXPR* </absand>                          ### &&
    <absor> EXPR* </absor>                            ### ||

IF_PART ::= EXPR
THEN_PART ::= EXPR
ELSE_PART ::= EXPR
SWITCH ::=
    <switch> VALUE_PART ( CASE_VALUE CASE_BODY )* [ ELSE_PART ] </switch>

VALUE_PART ::= EXPR
CASE_VALUE ::= EXPR
CASE_BODY  ::= EXPR
ELSE_PART  ::= EXPR

List Expressions

Description

Lists are implemented as singly linked chains. The list syntax is a shorthand for calling the ‘newList’ function. The lists that are constructed are guaranteed to be immutable and may or may not share. The empty list ‘nil’ is guaranteed to be a unique object.

Syntax

LIST ::= <list> EXPR* </list>

Vector Expressions

Description

Vectors are implemented as contiguous arrays. The vector syntax is a shorthand for calling the ‘newVector’ function. The vectors that are constructed are guaranteed to be immutable and may or may not share.

Syntax

VECTOR ::= <vector> EXPR* </vector>

For Loops

Overview

In Ginger, looping is unified with the process of finding the solutions of a query, so all of the expressive work is carried out by the query. For each solution of the query a new set of bindings is made to the variables of the query.

Queries are designed so that the familiar loops can all be easily composed and all work in the expected way. As an example, iterating a variable n from A to B becomes:

<for>
    <do>
        <from>
            <var name="n"/>
            <var name="A"/>
            <var name="B"/>
        </from>
        THE BODY OF THE LOOP
    </do>
</for>

Syntax

LOOP ::= <for> QUERY </for>

Queries

Overview

Queries are a signature feature of Ginger. They represent multiple alternative bindings to a set of variables and, as such, are a match-based generalisation of assignment.

Syntax

QUERY ::=
    <bind> PATTERN EXPR </bind>
    <from> PATTERN FROM_EXPR [ BY_EXPR [ TO_EXPR ] ] </from>
    <in> PATTERN EXPR </in>
    <do> QUERY EXPR </do>
    <cross> QUERY QUERY </cross>
    <zip> QUERY QUERY </zip>
    <while> QUERY EXPR </while>
    <ok />
    <fail />
    <once />

bind

The bind element is described in the section on Declarations and Patterns. It corresponds to a simple match between a pattern and an expression.

from

The from element binds the pattern successively to the numbers in the inclusive range FROM_EXPR to TO_EXPR (default +infinity) in increments of BY_EXPR (default 1).

in

The in element binds the pattern successively to the members of the iterable value computed by EXPR.

do

The do element runs EXPR each time a solution is found for QUERY.

cross

The cross element combines two queries together finding their cross-product. The combined set of solutions is the set of solutions of the right hand query in the context of the left hand query. The right hand query may assume that the variables of the left hand query are bound (but not vice versa).

zip

The zip elements combines two queries together in parallel. Unlike the cross element, the subqueries share no bindings.

ok, fail, once

The ok element is an atomic query that always succeeds without binding any variables, fail never succeeds and once succeeds only once. All three can be simulated by the from element.

Declarations and Patterns

Overview

Declarations match a pattern with an expression - patterns being limited expressions that contain pattern variables. N.B. The intention is to fit this to the pattern/query proposal.

<bind>
    PATTERN
    EXPR
</bind>

Syntax

<bind>
    PATTERN
    EXPR
</bind>

A PATTERN is any of the following

PATTERN ::= PATTERN_VAR | PATTERN_ANON | PATTERN_SEQ | PATTERN_APP | PATTERN_CONST

PATTERN_VAR ::=
    <var
        name=NAME
        [(match|type)=TYPE_EXPR]
        [protected=BOOL]
        [enc.pkg=PACKAGE_NAME]
        [def.pkg=PACKAGE_NAME |
         qualifier=ALIAS_NAME ]
        ( (tag0|tag1|..)=TAG_VALUE )*
    />

PATTERN_ANON ::=
    <var/>

Note

Qualifier or alias? We have some terminological confusion from different rounds of discussion being exposed.

PATTERN_SEQ ::=
    <seq> PATTERN* </seq>

PATTERN_CONST ::=
    EXPR

PATTERN_APP ::=
    <app> EXPR PATTERN </app>

Note

At the time of writing we have not implemented PATTERN_CONST or PATTERN_APP.

Pattern Variables

These are the most basic and familiar types of pattern. They introduce an optionally typed variable. The protected attribute plays the same role as in Pop-11, protecting the variable from assignment (n.b. this is shallow rather than deep protection.)

name=NAME The “name” attribute is optional. If it is omitted then it is an anonymous variable.

type=TYPE_EXPR The type-check will be made BEFORE assignment and a failed type-check will generate an error.

match=TYPE_EXPR The type-check is made BEFORE the assignment and failure will cause the matcher to backtrack.

protected=BOOL If “true” variable is protected against subsequent assignments. Generated by val and define declarations. If “false” the variable may be assigned to. If omitted the default is “true”.

Top level variables may also be given tags and package qualifiers.

tagN=TAG_VALUE Tags the variable.

qualifier=ALIAS The name is qualified by an import alias.

pkg=PACKAGE_NAME The package name is an absolute reference to a package.

Comment! Qualifier or alias!

Note: we also need to cope with forward declarations.

As a Query

A bind declaration is a type of query that either fails or succeeds once. In particular this loop would execute precisely once:

<for>
    <bind>
        <var name="foo"/>
        <absent value="absent"/>
    </bind>
    STATEMENTS
</for>

Examples

### Note that var/val introduces a query in Ginger. The '=' operator
### is a query operator whose LHS is a pattern. Identifiers are
### parsed as pattern-variables within a pattern, taking on the
### default protection of the var/val.
var x = 99;
<bind><var name="x"/><constant type="int" value="99"/></bind>

### The identifiers can given overrides for protection or type.
val [ x, var y, z : bool ] = f();
<bind>
    <app>
        <id name="newList">
        <seq>
            <var name="x" protected="true"/>
            <var name="y" protected="false"/>
            <var name="z" type="bool" protected="true"/>
        </seq>
    </app>
    <app><id name="f"/></app>
</bind>

### Ensure that p returns a single value which is an integer.
val _ : int = p();
<bind>
    <var type="int" protected="true"/>
    <app><id name="p"/></app>
</bind>

### The 'define' form also introduces an implicit PATTERN = EXPR
### bindings where EXPR will be the arguments to the function.
define K( x )( y ) => x enddefine;
<bind><var name="K"/><fn name="K"><var name="x"/><fn><var name="y"/><id name="x"/></fn></fn></bind>

Packages and Imports

Note

This section did not reflect the current implementation and needs further discussion. In practice the fetchgnx tool discharges the packages and imports before the Ginger Virtual Machine gets to see them. As a consequence it has been moved aside to Packages and Imports.