Software and Programming Language Theory

Domain-specific languages

Intermediate representation languages

https://maxxk.github.io/programming-languages/

Domain-specific languages

https://cacm.acm.org/magazines/2011/7/109910-dsl-for-the-uninitiated/fulltext

Domain-specific programming language is a language which is targetet to the specific application domain.

For the further reference, syllabus for CMU seminar on DSLs: https://github.com/jeanqasaur/dsl-syllabus-fall-2016

Example: HTML and CSS

Domain: text markup (HTML — HyperText Markup Language) and hierarchical styling (CSS — Cascading Style Sheets). Recent application — user interface design.

No red text

Example: UNIX shell

Domain: OS process, input/output management, pipelining

Example: GrGen

Domain: graph modeling and rewriting

http://www.info.uni-karlsruhe.de/software/grgen/

https://en.wikipedia.org/wiki/GrGen

node class GridNode {
    food:int;
    pheromones:int;
}
node class GridCornerNode extends GridNode;
node class AntHill extends GridNode {
    foodCountdown:int = 10;
}
node class Ant {
    hasFood:boolean;
}

edge class GridEdge connect GridNode[1] -> GridNode[1];
edge class PathToHill extends GridEdge;
edge class AntPosition;
rule TakeFood(curAnt:Ant)
{
    curAnt -:AntPosition-> n:GridNode\AntHill;
    if { !curAnt.hasFood && n.food > 0; }
    modify {
        eval {
            curAnt.hasFood = true;
            n.food = n.food - 1;
        }
    }
}

rule SearchAlongPheromones(curAnt:Ant)
{
    curAnt -oldPos:AntPosition-> old:GridNode <-:PathToHill- new:GridNode;
    if { new.pheromones > 9; }
    modify {
        delete(oldPos);
        curAnt -:AntPosition-> new;
    }
}

test ReachedEndOfWorld(curAnt:Ant) : (GridNode)
{
    curAnt -:AntPosition-> n:GridNode\AntHill;
    negative { 
        n <-:PathToHill-;
    }
    return (n);
}

Example: blockchain contract languages

https://blog.chain.com/announcing-ivy-playground-395364675d0a

contract CallOption(
  strikePrice: Amount,
  strikeCurrency: Asset,
  seller: Program,
  buyerKey: PublicKey,
  expiration: Time
) locks underlying {
  clause exercise(
    buyerSig: Signature
  ) requires payment: strikePrice of strikeCurrency {
    verify before(expiration)
    verify checkTxSig(buyerKey, buyerSig)
    lock payment with seller
    unlock underlying
  }
  clause expire() {
    verify after(expiration)
    lock underlying with seller
  }
}

Blockchain contract languages

http://solidity.readthedocs.io/en/latest/

pragma solidity ^0.4.11;

contract SimpleAuction {
    address public beneficiary;
    uint public auctionStart;
    uint public biddingTime;

    address public highestBidder;
    uint public highestBid;

    mapping(address => uint) pendingReturns;

    bool ended;

    event HighestBidIncreased(address bidder, uint amount);
    event AuctionEnded(address winner, uint amount);

    function SimpleAuction(
        uint _biddingTime,
        address _beneficiary
    ) {
        beneficiary = _beneficiary;
        auctionStart = now;
        biddingTime = _biddingTime;
    }

    function bid() payable {
        require(now <= (auctionStart + biddingTime));

        require(msg.value > highestBid);

        if (highestBidder != 0) {
            pendingReturns[highestBidder] += highestBid;
        }
        highestBidder = msg.sender;
        highestBid = msg.value;
        HighestBidIncreased(msg.sender, msg.value);
    }

    function withdraw() returns (bool) {
        var amount = pendingReturns[msg.sender];
        if (amount > 0) {
            pendingReturns[msg.sender] = 0;

            if (!msg.sender.send(amount)) {
                pendingReturns[msg.sender] = amount;
                return false;
            }
        }
        return true;
    }

    function auctionEnd() {
        require(now >= (auctionStart + biddingTime)); // auction did not yet end
        require(!ended); // this function has already been called

        ended = true;
        AuctionEnded(highestBidder, highestBid);

        beneficiary.transfer(highestBid);
    }
}

Embedded and external DSLs

External DSL (all previous examples) — programming language with its own syntax and tools (translator).

Embedded DSL (internal DSL, following examples) uses facilities of some general-purpose host language.

LINQ

Domain: declarative relational queries

Example: Haskell eDSLs

https://github.com/bitemyapp/esqueleto

All “keywords” and “operators” are ordinary functions in Haskell (except do, $ and lambda-abstraction)

Example: Ruby eDSLs

https://github.com/felipecsl/wombat

Example: Lisp DSLs

http://swizard.info/articles/solitaire/article.html

Language-oriented programming

Ward, MP. “Language-Oriented Programming.” Software - Concepts and Tools 15, no. 4 (1994): 147–61.

http://www.cse.dmu.ac.uk/~mward/martin/papers/middle-out-t.pdf

This paper describes the concept of language oriented programming which is a novel way of organising the development of a large software system, leading to a different structure for the finished product. The approach starts by developing a formally specified, domain-oriented, very high-level language which is designed to be well-suited to developing “this kind of program”.

The development process then splits into two independent stages: 1. Implement the system using this “middle level” language 2. Implement a compiler or translator or interpreter for the language, using existing technology.

http://www.onboard.jetbrains.com/articles/04/10/lop/

https://www.martinfowler.com/articles/languageWorkbench.html

Example: JetBrains MPS

https://www.jetbrains.com/mps/

https://confluence.jetbrains.com/display/MPSD20171/Shapes+-+an+introductory+MPS+tutorial

Language composition

  • some syntax subtrees may be common for different languages, for example, boolean and arithmetical expressions
  • we can decompose some existing language to a separate parts, for example in CSS there is a selector syntax and property syntax:

Selectors language is reused in JavaScript (initially reimplemented in jQuery library, in HTML5 actually reused by document.querySelector function).

Language composition

  • some well-designed programming libraries usually provide some kind of embedded DSLs

(example is from Django framework, SQLAlchemy provides even better examples)

  • problems: syntax composition (how to write parser which can detect language boundaries), language interoperability

Towards intermediate representation languages

  • language-oriented approach presumes the extensive usage of different DSLs
  • problem 1: how to write the effecient translators for these languages?
  • problem 2: how do languages interoperate with each other?

Intermediate representation

Intermediate representation programming language is a special case of programming language which serves as an interface between some stages of program translation. Intermediate representations usually are not used by humans, have no concrete syntax (except for the purposes of debugging).

  • DSLs interoperability can be represented in terms of a common intermediate representation
  • common intermediate representation may have formal specification (in this case the semantics of DSL is specified in terms of translation)

Classification

Vasenin V. A., Krivchikov M. A. Program Intermediate Representation Techniques, Programmnaya Ingeneria, 2017, vol. 8, no. 8, pp. 345—353 (in Russian)

Low-level intermediate representations

Bytecode for imperative languages

Bytecode for functional languages

Typed control flow graphs

High-level language-specific representations

Low-level intermediate representations

  • linear code structure
  • data types are specified in terms of memory layout
  • instruction set is close to machine/assembly language

Low-level intermediate representation

  • three-address code, abstract representation, see “Dragon book” (Aho, Seti, Ullman “Compilers: Principles, Techniques, and Tools”)
    • each instruction has opcode and at most 3 operands (“addresses”)
  • RTL (register transfer language), abstract representation
    • GCC uses RTL
  • SSA (static single assignment form), abstract representation
  • TAL (typed assembly language) is formally specified low-level intermediate representation, code blocks are represented by triples (H — heap state, R — register state, I — instruction sequence) Morrisett G. et al. From System F to Typed Assembly Language // ACM Trans. Program. Lang. Syst. 1999. Т. 21, № 3. Сс. 527–568.

Bytecode application virtual machines

  • linear code structure; high-level instructions
  • high-level (usually memory layout-independent) date type representation
  • typing, first-order polymorphism
.method private hidebysig static void DictionaryInitialize () cil managed
{
    .maxstack 3
    .locals init (
        [0] class [mscorlib]System.Collections.Generic.Dictionary`2<string,string>
    )

    newobj instance void class [mscorlib]System.Collections.Generic.Dictionary`2<string,string>::.ctor()
    stloc.0
    ldloc.0
    ldstr "1"
    ldstr "2"
    callvirt instance void class [mscorlib]System.Collections.Generic.Dictionary`2<string,string>::set_Item(!0, !1)
    ldloc.0
    ldstr "3"
    ldstr "4"
    callvirt instance void class [mscorlib]System.Collections.Generic.Dictionary`2<string,string>::set_Item(!0, !1) 
}

Bytecode for imperative languages

  • CLI/JVM ISO. ISO/IEC 23271:2003: Information technology — Common Language Infrastructure.

    Examples of high-level instruction: ldfld, callvirt, isinstance

  • Vortex IL Dean J. et al. Vortex: An Optimizing Compiler for Object-oriented Languages // Proceedings of the 11th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications. New York, NY, USA: ACM, 1996. Сс. 83–100.

class Square isa Rhombus, Quadrilateral;
method draw__6Square_int(this, color):void;
associate draw__6Square_int with draw__5Shape_int(@Square, @any)

send draw__5Shape_int(s, color)

Bytecode for functional languages

  • usually untyped

  • supports closure creation at the level of bytecode

  • ZINC — intermediate representation for ML-family languages Leroy X. The ZINC experiment: an economical implementation of the ML language: Technical report. INRIA, 1990.

  • Python bytecode https://laanwj.github.io/2011/5/4/python-bytecode-archeology example instructions: LOAD_CLOSURE, MAP_ADD (add item to dictionary)

  • Warren’s Abstract Machine (for Prolog language) Aït-Kaci H. Warren’s Abstract Machine: A Tutorial Reconstruction. Cambridge, Mass: The MIT Press, 1991. 114 с. example instructions: try_me_else (representation of choice list), unify_constant

     girl(sally).
     girl(jane).
    
     boy(B) :- \+ girl(B).
    predicate(girl/1):
        switch_on_term(2,1,fail,fail,fail),
    label(1): switch_on_atom([(sally,3),(jane,5)])
    label(2): try_me_else(4)
    label(3): get_atom(sally,0)
            proceed
    label(4): trust_me_else_fail
    label(5): get_atom(jane,0)
            proceed
    
    predicate(boy/1):
        get_variable(x(1),0)
        put_structure(girl/1,0)
        unify_local_value(x(1))
        execute((\+)/1)])

Typed control flow graphs

  • graph-based code structure (special constructions at join points are used instead of labels and jump instructions) Leißa R., Köster M., Hack S. A graph-based higher-order intermediate representation // 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 2015. Сс. 202–212.

High-level language-specific representations

  • Clight in CompCert (previously discussed)
  • LISP (and, less frequently, Forth) also can be used as high-level general-purpose representations

Nanopass compiler

D. Sarkar, O. Waddell, and R. K. Dybvig. A nanopass infrastructure for compiler education. In ICFP ’04: Proceedings of the ninth ACM SIGPLAN International Conference on Functional Programming, pages 201–212, New York, NY, USA, 2004. ACM.

Translation pipeline is specified in terms of ~50 simple transformation steps between languages. Nanopass framework provides the specification DSL for languages. Languages may be specified in terms of modification of output languages from previous pass.