Software and Programming Language Theory
Static and dynamic semantics
Scope
Typing
Programming language specification
Programming language semantics
- Static semantics
- static properties of programs, which can be determined without executing them
- variable bindings (what does variable name correspond to)
- static type checking (do the expressions satisfy constraints of typing system)
- “context-sensitive” restriction on valid programs
- Dynamic semantics
- how do we compute the value of expressions, statements, etc.
- “recursively enumerable” restriction on valid programs
Naming
In programming languages we usually name entities
- variables
- function arguments
- classes
- modules
Binding
Binding is the process of assigning the meaning to a name.Scope
Scope of a binding is a textual region in program in which a binding is active. We may also use term scope to designate the region of a program of maximal size in which no bindings are destroyed (masked).
Lexical scope (implemented in most of statically-typed programming languages, e.g. C/C++) binds identifiers inside some specific parts of program source code (lexical context).
Closure
A closure is a persistent scope which holds on to local variables even after the code execution has moved out of that block [1].
Static semantics: which variables to persist and share between closures?
Dynamic scope
Some languages (like ECMAScript/JavaScript or some kinds of LISP) have a concept of the dynamic scoping. Binding of identifiers occurs at run-time.
this
in an object-oriented language may be seen as either an implicit argument to a function or as a way of introducing a dynamic scope.
Name overloading
template<typename T>
T mul(int i, int j)
{
// If you get a compile error, it's because you did not use
// one of the authorized template specializations
const int k = 25 ; k = 36 ;
}
template<>
int mul<int>(int i, int j)
{
return i * j ;
}
template<>
std::string mul<std::string>(int i, int j)
{
return std::string(j, static_cast<char>(i)) ;
}
If the language supports name overloading, the same identifier in the same place may have a different meaning depending on some external information (like typing). Usually in imperative languages name overloading is restricted to functions names.
int mul(int i, int j) { return i*j; }
std::string mul(char c, int n) { return std::string(n, c); }
int n = mul(6, 3); // n = 18
std::string s = mul(static_cast<char>(6), 2); // s = "110"
int n = mul<int>(6, 3); // n = 18
std::string s = mul<std::string>(54, 2); // s = "110110"
short n2 = mul<short>(6, 3); // error: assignment of read-only variable ‘k’
Languages supporting function overloading: C++, C#
Languages without support for function overloading: Haskell, ECMAScript, C (but C11 has the _Generic
keyword: http://stackoverflow.com/a/25026358)
Lexical scope declarative region
Where does the declaration effect start/stop?
Use before definition
Static semantics representation
Returning to the topic of formal mechanized analysis of programs.
Syntax is a transformation from String
to AST
.
Static semantics, in the same way, is a transformation from AST to AST with variable bindings information. How would we store this information?
Easier approach: generate unique names and store scope as a separate value.
AST+Bindings = Σ (ast : AST /[Id := ℕ]) . (scope : Map<ℕ, AST> ) . (p : ∀ { id : ℕ } ∈ ast, id ∈ scope).
α-equivalence and de Bruijn indices
(see the previous semester: https://maxxk.github.io/formal-models/presentations/06-Markov-Refal-Lambda-calculus.html#%D1%81%D1%82%D1%80%D0%BE%D0%B3%D0%BE%D0%B5-%D0%BE%D0%BF%D1%80%D0%B5%D0%B4%D0%B5%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5 )
Higher order abstract syntax
— the technique to capture the variable binding in abstract syntax tree (Miller, 1987; Phenning, 1988).Abstract syntax:
type var = string
type typ =
| Bool
| Arrow of typ * typ
type exp =
| Var of var
| True
| False
| App of exp * exp
| Abs of var * exp
Parametrized Higher-order syntax captures the types of variables:
Ornaments
Conor McBride. «Ornamental Algebras, Algebraic Ornaments»
One base data type defines the induction structure, basic constructors:
Then we place an additional data (ornament) to the leaves of the induction structure:
Recursion
Suppose we started translating AST to static semantics. Before we process second line and see that B
is a class, what is the static semantics of B
at the first line?
Approach: fixed point (works for classical Domain Theory, doesn’t work well in a constructive setting).
N.S. Papaspyrou. Formal semantics of the C Programming Language (PhD thesis).
Static semantics
The static semantics is the description of the structural constraints (context-sensitive aspects) that cannot be adequately described by context-free grammars.
Source: Anthony Aaby. Introduction to Programming Languages. 1996. Online publication.
In addition to variable binding, static semantics includes the specification of possible values and valid literals, and also the type system (for strongly-typed programming languages).
Static semantcs for ECMAScript
Examples of static semantics for dynamically-typed language in specification of ECMAScript (JavaScript):
ECMA-262. ECMAScript Language Specifcation.
http://www.ecma-international.org/ecma-262/6.0/#sec-static-semantics-mv
Static formal semantics for ANSI C
Example of mathematically specified formal semantics for C programming language:
N.S. Papaspyrou. A Formal Semantics for the C Programming Language. PhD Thesis. 1998 http://www.softlab.ntua.gr/~nickie/Papers/papaspyrou-1998-fscpl.pdf
Part II. Static Semantics.
Type conversions
Type conversion — mapping from the values of one type to the corresponding values of a different type. For example, integers to floating point numbers: 1 → 1.0 or strings to codepoints to integers.
- cast — explicit type conversion
- coercion — implicit type conversion, performed automatically.
Type conversions
- some languages (Ada, Go) do not support any form of the coercion, some (Pascal) support only “lossless” coercions, others (C++, C#) even allow user-defined coercions.
-- Ada:
n: Integer; x: Float;
x := n; -- error
n := x; -- error
x := Float(n);
n := Integer(x); -- rounding
double
allow exact representation for 53-bit integers.
Implicit coercions
public class Author
{
public string First;
public string Last;
public string[] BooksArray;
}
public class Writer
{
public string FirstName { get; set; }
public string LastName { get; set; }
public List<string> Books { get; set; }
}
public static implicit operator Writer(Author a)
{
return new Writer
{
FirstName = a.First,
LastName = a.Last,
Books = a.BooksArray != null ? a.BooksArray.ToList() : null
};
}
Author a = new Author
{
First = "Vijaya",
Last = "Anand",
BooksArray = new string[] { "book1" }
};
Writer w = a; //implicitly casting from Author to Writer.
Homework assignments (1/2)
Task 5.1. ** Implement a translator from simply-typed lambda calculus AST to HOAS in Haskell or any other language of your choice (except OCaml) which is expressive enough to represent HOAS.
Task 5.2. *** Implement a translator from simply-typed lambda calculus AST to PHOAS in Agda or any other language of your choice which is expressive enough to represent PHOAS.
Task 5.3. * Look at the assembly code generated by GCC on example code from “Closure” slide and explain how “closures” are implemented in this case.
Task 5.4. Describe static semantics of BNF in terms of Parametrized Higher Order Syntax.
- “on paper” (**)
- in Coq, following the original paper (4*)
Homework assignments (2/2)
Task 5.5. *** Write a formal specification (in Haskell, Standard ML, Coq or Agda) for ISO C99 Arithmetic operands conversion (section 6.3.1 Arithmetic operands).
Formal specification must be presented as a function with input as an AST with operands annotated with the typing information and with output as an AST with operands replaced by the typing information and transformations specified in draft or with special value “Undefined behavior”. Example:
Inputs in C syntax:
Inputs as a possible AST:
Assign (Variable "pi" Integer32) (Literal (Float32 3.14))
Assign (Variable "i" Integer32) (Literal (Float32 1e30))
Outputs as a possible AST:
Assign (Variable "pi" Integer32) (IntegerPart (Literal (Float32 3.14)))
UndefinedBehavior