C* - Language Reference Manual

GitHub link: https://github.com/kkysen/cstar/blob/main/LRM.md

Table of Contents

Overview

C* is a general-purpose systems programming language. It is between the level of C and Zig on a semantic level, and syntactically it also borrows a lot from Rust (pun intended). It is meant primarily for programs that would otherwise be implemented in C for the speed, simplicity, and explicitness of the language, but want a few simple higher-level language constructs, more expressiveness, and some safety, but not so many overwhelming language features and implicit costs like in Rust, C++, or Zig.

It has manual memory management (no GC) and uses LLVM as its primary codegen backend, so it can be optimized as well as C, or even better in some cases. All of C*'s higher-level language constructs are zero-cost, meaning none of those features give it any overhead over C. This often leads to a highly-optimized style in places where C code would take less efficient shortcuts (e.g. function pointers and type-erased generics) or use dangerous constructs like goto. In the future, it may also have a C backend so that it can target any architecture where there is a C compiler.

While a general-purpose language, C* will probably have the most advantages when used in systems and embedded programming. Its expressivity and high-level features, combined with its relative simplicity, performance, and explicitness, make it a good match for many of these low-level systems and embedded programs.

A C* Program

A C* program is a top-level C* module.

Note that italics will be used here to refer to placeholders for language items, not the items themselves.

Modules

Every C* file (by default using a .cstar extension) must be UTF-8. Each file is implicitly a module, though modules can also be declared inline with the mod name {} keyword*. Everything between the braces belongs to the module name.

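A module is composed of a series of top-level items (aka declarations), which may be one of:

  • use declarations
  • lets
  • fn declarations
  • struct declarations
  • enum declarations
  • union declarations
  • impl blocks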

These items may be preceded by a single publicity modifier and any number of annotations.

Comments may also appear anywhere.

C* is not whitespace sensitive, i.e., any consecutive sequence of whitespace may be replaced by any other consecutive sequence of whitespace without changing the meaning of the program. A unicode character is considered whitespace if it matches the \p{Pattern_White_Space} unicode property.

Identifiers

Identifiers in C* may be any UTF-8 string in which the first character is _, $, or matches the \p{XID_Start} Unicode property, and the remaining characters match the \p{XID_Continue} Unicode property, with the following exceptions:

Identifiers may begin with $, but such identifiers are only definable by the compiler, as intrinsics.

There are no keywords at the lexer level, but identifiers may not be a C* keyword. They may also not be the boolean literals true or false.

_ is a valid C* identifier at the syntactic level, but has a special meaning and cannot be used everywhere. That is, it can only be assigned to.

Examples:

// valid identifiers
let validWord: u32 = 2;
fn get_num() = {}
enum 小笼包 {}

// invalid identifier
let 2words = 2;
struct const {}
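The rules above can be sketched as a small check. This is written in Rust as a stand-in for C*. The name is_user_identifier is hypothetical, and char::is_alphabetic / char::is_alphanumeric only approximate the XID_Start / XID_Continue properties, so treat it as an illustration rather than the lexer's actual rule.

```rust
/// Keywords listed in this manual (including the reserved `trait`),
/// plus the boolean literals.
const RESERVED: &[&str] = &[
    "use", "let", "mut", "pub", "try", "const", "impl", "fn", "struct", "enum", "union",
    "return", "break", "continue", "for", "while", "if", "else", "match", "defer", "undefer",
    "trait", "true", "false",
];

/// Approximate check for an identifier that user code may define.
/// Assumption: `is_alphabetic` / `is_alphanumeric` stand in for
/// XID_Start / XID_Continue; they are close but not identical.
fn is_user_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    let first_ok = match chars.next() {
        // `$` is left out on purpose: only the compiler may define `$` names.
        Some(c) => c == '_' || c.is_alphabetic(),
        None => false,
    };
    first_ok
        && chars.all(|c| c == '_' || c.is_alphanumeric())
        && !RESERVED.contains(&s)
}
```

Under these assumptions, the valid and invalid examples above are accepted and rejected respectively.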

Keywords

Keywords are reserved identifiers that cannot be used as regular identifiers for other purposes.

C* keywords:

  • use
  • let
  • mut
  • pub
  • try
  • const
  • impl
  • fn
  • struct
  • enum
  • union
  • return
  • break
  • continue
  • for
  • while
  • if
  • else
  • match
  • defer
  • undefer

There are also reserved keywords:

  • trait

Comments

C* has multiple types of comments.

// Single-Line Comments

A single-line comment starts at // and runs until the next \n newline.

/// Doc Comments

A doc comment starts at /// and runs until the next \n newline. Doc comments are a form of single-line comment, but may also be processed by tools for generating documentation.

/* */ Nested, Multi-Line Comments

A multi-line comment starts at /*. Multi-line comments can be nested, and end at the next */ that is not a part of an inner multi-line comment. They also do not have to span multiple lines, and can comment out only part of a line.

/- Structural Comments

/- denotes a structural comment. It comments out the next item in the AST, which could be the next expression, function, type definition, etc.

Example:

// This is a regular single line comment.

/// This is a doc comment for the function below.
fn foo() = {}

/* This is a multiline comment
Everything inside here is commented out until "* /"
*/

/* They can be /* nested */, too. */
fn /* and appear in-between things */ bar() = {}

/- let x = 25; // This comments out the entire let expression.
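Nesting means a lexer cannot simply stop at the first */; it has to track depth. A minimal sketch of that idea follows, in Rust as a stand-in for C*. The function name strip_block_comments is hypothetical, and the sketch deliberately ignores string literals and // comments, which a real lexer would have to handle.

```rust
/// Removes `/* ... */` comments, honoring nesting, and returns what is left.
/// An unterminated comment swallows the rest of the input in this sketch.
fn strip_block_comments(src: &str) -> String {
    let bytes = src.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut depth = 0usize; // current comment nesting level
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i..].starts_with(b"/*") {
            depth += 1; // entering a (possibly nested) comment
            i += 2;
        } else if depth > 0 && bytes[i..].starts_with(b"*/") {
            depth -= 1; // leaving the innermost open comment
            i += 2;
        } else {
            if depth == 0 {
                out.push(bytes[i]); // outside all comments: keep the byte
            }
            i += 1;
        }
    }
    // Comment delimiters are ASCII, so cuts always fall on character boundaries.
    String::from_utf8(out).expect("removing whole comments keeps UTF-8 valid")
}
```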

pub Publicity

All top-level items (except impl blocks) may be prefixed with a publicity modifier.

The syntax for this is pub.

Following the pub, there may also be a module path within parentheses, like this: (path).

If there is no publicity modifier, i.e. no pub, then the publicity of the item is private, i.e. pub(self).

Only public items may be used from other modules. Private items may only be used within the current module or its descendants.

Annotations

All items may be prefixed with any number of annotations, which annotate the item with certain metadata.

The syntax for this is @annotation, where annotation is the name of the annotation. Note that annotations may be imported (used) or referred to with their fully-qualified path.

They may also have an argument_list after the annotation. Having no argument_list is equivalent to having an empty, 0-length argument_list. The argument_list is a normal C* argument_list, except this one must be a compile-time constant.

The exact set of available annotations is still being decided, but a few of them may be:

  • @extern
  • @abi("abi"), like @abi("C") or the default @abi("C*")
  • @inline
  • @noinline
  • @impl(type1, ..., typeN)
  • @align(alignment)
  • @packed
  • @allow("warning_name")
  • @non_exhaustive

For now, any available annotations will be implemented in the compiler, though this could change in the future.

Annotations can also be applied to the current module. In this case, they must appear before any other items in the module and are prefixed with an extra @, like @@allow("unused_variable").

use Declarations

use declarations are used to import items/declarations from other modules, such as the standard library, external libraries, your own defined modules, or certain types.

Their syntax is use = use path, where path = identifier.path.

That is, it imports a path to an item to be used without path qualification within the current scope.

path can also end in .*. The * indicates all items, so this imports all items from the parent path.

lets

A let binds an expression to a name. That expression can either be a value or a type.

Normally (in expressions), let bindings can be shadowed, but they cannot be at the module level.

Value lets

For values, the syntax of this is let mut? identifier: type = expr;?.

The mut is optional. If there is no mut, then the variable is an immutable const. If there is a mut, then it is a mutable global variable.

In normal let bindings, expr can be any C* expression, and the : type may be omitted where inferrable, but at the top, global level, the expr must be constant evaluated and the type must be annotated. The way to do the former is by using a const { ... } block, which evaluates the block to a constant at compile time.

A value let can also create zero, one, or multiple bindings at once through destructuring a pattern. If the pattern is tautological, i.e. the pattern always matches, then the bindings are always created. If the pattern may not match, then the let expression is a bool and may be used in ifs or matches. In this case, the let binding(s) are only created if the pattern matches and the let expression evaluated to true. Note that matching a non-tautological let is possible but very un-idiomatic, since the binding could simply be done in the match itself. Thus, it is normally used with if.

See pattern matching for more info on patterns and destructuring.
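Rust has a close analog of both kinds of let, so the distinction can be sketched there. This is an illustration only; C*'s own syntax differs, and the function names are hypothetical.

```rust
/// A tautological pattern: the tuple pattern always matches,
/// so both bindings are always created.
fn sum_pair(pair: (i32, i32)) -> i32 {
    let (a, b) = pair;
    a + b
}

/// A pattern that may not match: the binding `first` only exists
/// in the branch that runs when the match succeeds.
fn first_doubled(values: &[i32]) -> i32 {
    if let Some(first) = values.first() {
        *first * 2
    } else {
        0
    }
}
```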

Type lets aka Type Aliases

For types, the syntax of this is let identifier generic_parameter_list? = type;.

The type here may be any type expression that a value would be annotated with. For example, this includes named types, tuples, arrays, slices, function pointers.

See below for info on the optional generic_parameter_list.

Note that this only creates an alias of the type, but does not actually create a new type. For example, the type alias cannot be used as a namespace for methods or enum variants.

For example, you could have these type aliases:

let Option<T> = Result<T, ()>;
let Bool = Option<()>;
let Point = (f64, f64);

fn Function Declarations

fn declarations declare functions.

The syntax of this is fn identifier generic_parameter_list? parameter_list: type = expr.

The identifier is the name of the function, the generic_parameter_list optional generic parameters, the parameter_list required normal (non-generic) parameters, the type the return type of the function, and the expr the return value of the function.

Generic Parameters

A generic_parameter_list is delimited by < > angle brackets and contains , comma-separated generic parameters. A trailing comma is allowed.

Each generic parameter is a generic type or a generic constant*. If it is a generic constant, then it requires a : type annotation.

Note that an empty generic_parameter_list like <> is semantically distinct from no generic_parameter_list at all. Generic functions are monomorphized (see generics for more).

Also, the < > angle brackets as used for generics have higher precedence than the < > comparison operators.

Parameters

A parameter_list is delimited by ( ) parentheses and contains , comma-separated parameters. A trailing comma is allowed. Each parameter is a let binding except without the let keyword. However, in function declarations, the parameters must have : type annotations. Note that the similar function literals/values do not require this.

Return Type

The : type may be omitted if the type is the unit () type.

Return Value

The expr that the function returns may be any expression. However, normally it is a { ... } block, which is necessary to include multiple statements in a function. The block (like any) may also have modifiers, like try { ... } or const { ... }. Returning a const { ... } from a function in particular marks that function as constant evaluatable*.

Normally a ; is required to end the return value, except if a block is used as the return value, then it does not require the ;.

A function return block is slightly special in that return may be used within it, which is equivalent to a break from that top-level function block.

If a function is annotated with @extern, then it must omit the = expr and end with a ;. In this case, only the function signature is specified and the @externed function must be available as a function symbol at link time or else there will be a compile error.

Note that @abi("C") is usually specified along with @extern because the default @abi("C*") is unstable.

In an @extern @abi("C") function, the last (but not only) parameter may also be ..., which is a C varargs parameter and may be called with multiple arguments. This is only for C FFI for functions like syscall, which otherwise we'd need to implement with some assembly.

Note that @extern and @abi("C") may also be specified for an entire module, in which case it applies to all items within that module.

Function Examples

For example, a non-generic function may look like this:

fn foo(_a: i32, b: usize, _c: String): usize = b * b;

or this:

fn string_len(c: String): usize = {
    c.len()
}

and a generic function may look like this:

fn equals<T>(a: T, b: T): bool = {
    a.equals(b)
}

struct Declarations

struct declarations declare a struct type, which is a product type of its field types. All fields are always initialized.

The syntax of this is struct identifier generic_parameter_list?{ fields }, where identifier is the name of the struct type, generic_parameter_list are its generic parameters, and fields is a , comma-separated list of fields. A trailing comma is allowed. Zero fields is also allowed.

The syntax of each field is a value let without the let and the = expr;. Each field may also be prefixed by a publicity modifier.

Note that mut can be specified for these fields, in which case they have interior mutability, i.e., they can be mutated through a non-mut pointer to the struct.
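Interior mutability can be illustrated in Rust, where std::cell::Cell plays roughly the role that a mut field plays here. This is an analogy under that assumption, not C* code; Counter, record_hit, and describe are hypothetical names.

```rust
use std::cell::Cell;

/// `hits` stands in for a `mut` field: it can change through `&Counter`.
struct Counter {
    label: &'static str,
    hits: Cell<u32>,
}

/// Takes a shared (non-mutable) reference, yet still updates `hits`.
fn record_hit(counter: &Counter) -> u32 {
    counter.hits.set(counter.hits.get() + 1);
    counter.hits.get()
}

/// Reads both fields through the same shared reference.
fn describe(counter: &Counter) -> String {
    format!("{}: {}", counter.label, counter.hits.get())
}
```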

By default, structs use @abi("C*"), which means their layout and alignment are unspecified and unstable. This allows for fields to be rearranged for optimizations. If @abi("C") is specified, however, then the fields are laid out in memory in the order they appear in, and C alignment and padding rules are used.

enum Declarations

enum declarations declare an enum type, which is a sum type of its variants. That is, it is a discriminated union of variants, each of which may have a value or not. A value of an enum type is always one of its variants and cannot be anything except those variants. The discriminant value is stored.

The syntax of this is enum identifier generic_parameter_list?{ variants }, where identifier is the name of the enum type, generic_parameter_list is its generic parameters, and variants is a , comma-separated list of variants. A trailing comma is allowed. Zero variants is also allowed, but note that this means that the enum can never be instantiated because it has no variants.

Each variant may have a value or not. If a variant does not have a value, then the syntax is identifier. By default, the discriminant value of each variant is chosen by the compiler, but this may be overridden for each variant if all the variants of the enum have no value. The syntax for this is identifier = expr, where expr must be a const { ... } block evaluating to the integer to be used for the discriminant.

If a variant does have a value, then the syntax is identifier(type). Note that only one type is allowed here. If you wish to include multiple types, simply use a tuple or struct instead.

All variants of an enum implicitly use pub as their publicity modifier, which cannot be changed.

By default, enums use @abi("C*"), which means their layout and alignment are unspecified and unstable. This allows for the layout, including the discriminant, to be optimized. Generally, though, the size of an enum type is the size of the discriminant plus the size of the largest variant data.

If all the variants have no values, then @abi("C") may be specified. In this case, you must also specify the size of the enum by adding a : type following the identifier name, where the type is a primitive integer type. In this case, all the variant discriminants must fit within that type.

The @non_exhaustive annotation can also be applied to an enum type, in which case matching all the variants is no longer considered an exhaustive match, and a catch-all _ => match arm is required.
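Both flavors of enum described above have close Rust counterparts, sketched below as an illustration. Color, Shape, and area are hypothetical names, and #[repr(u8)] is Rust's way of fixing the discriminant size, used here as a rough analog of specifying an integer type with @abi("C").

```rust
/// Value-less variants with explicit discriminants and a fixed one-byte size.
#[repr(u8)]
enum Color {
    Red = 1,
    Green = 2,
    Blue = 4,
}

/// Variants that carry a value: one payload type per variant.
enum Shape {
    Point,
    Circle(f64),
    Rect((f64, f64)), // several values go through a tuple
}

/// Every variant must be handled for the match to be exhaustive.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Point => 0.0,
        Shape::Circle(r) => std::f64::consts::PI * r * r,
        Shape::Rect((w, h)) => w * h,
    }
}
```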

union Declarations *

union declarations declare a union type, which is a non-discriminated union similar to C unions. It is meant for C FFI and thus defaults to @abi("C").

The syntax of a union type declaration is the same as a struct type declaration, except the struct keyword is replaced by the union keyword.

The two differ in semantics. The size of a union is the size of its largest field and only one field may be active at any time. Reading from an inactive field is undefined.
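The size rule can be checked with a Rust union, which follows the same C layout when marked #[repr(C)]. This is a sketch in Rust rather than C*; Word and round_trip are hypothetical names.

```rust
/// A C-layout union: both fields share the same storage.
#[repr(C)]
union Word {
    byte: u8,
    wide: u64,
}

/// Writes `wide` and reads the same, active field back.
fn round_trip(value: u64) -> u64 {
    let w = Word { wide: value };
    // Reading a union field is `unsafe` in Rust because the compiler
    // cannot check which field is active.
    unsafe { w.wide }
}
```

The union is as large as its largest field (8 bytes here), not the sum of its fields.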

impl Blocks

impl blocks define associated items for a type, which includes methods.

The syntax for this is impl generic_parameter_list? type { items }, where type is the type you are defining associated items for, generic_parameter_list is any generic parameters needed for type, and items are items like those in a module.

Within an impl block, there is an implicit type alias defined: let Self = type;, where type is the same type being implemented.

Items defined within an impl block are available through the type as if it were a module. The exception is methods, which may be called in another way as well. A method is a function in an impl block whose first parameter is self: Self. The : Self may be inferred (an exception to the rule that parameters in function declarations must have type annotations). To call a method, you may also call it using . syntax on a value of the impl type. That is, value.method(args) is syntactic sugar for type.method(value, args) where value: type.
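Rust's impl blocks work the same way with respect to method-call sugar, so the equivalence can be sketched there (Rust writes the path with :: where C* uses .). Point, origin, and translate are hypothetical names.

```rust
struct Point {
    x: f64,
    y: f64,
}

impl Point {
    /// An associated function: no `self`, reached through the type.
    fn origin() -> Self {
        Point { x: 0.0, y: 0.0 }
    }

    /// A method: its first parameter is `self`.
    fn translate(self, dx: f64, dy: f64) -> Self {
        Point { x: self.x + dx, y: self.y + dy }
    }
}
```

Calling Point::origin().translate(1.0, 2.0) and Point::translate(Point::origin(), 1.0, 2.0) produces the same value.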

Type System

C* types can be split up into three kinds of types:

  • primitive types
  • built-in compound types
  • user-defined compound types

Primitive Types

The primitive types in C* are:

  • the () unit type
  • the bool type
  • integer types
  • float types
  • the character type

() Unit Type

bool Type

bool is the boolean type in C*, except it is actually defined as an enum:

@allow("non_title_case_types")
enum bool {
    false = const { 0 },
    true = const { 1 },
}

Normally operator overloading is not allowed in C*. The exception is bool, which defines the normal boolean operators. See operators for details on them.

Integer Types

Float Types

character Type

Built-In Compound Types

The built-in compound types in C* are:

  • reference types
  • slice types
  • array types
  • pointer types
  • tuple types
  • function types

Reference Types

In C*, you can have a reference to any type. That reference is either immutable or mutable.

There is one exception to this. type.$bit_size_of() must be a multiple of 8. That is, bit fields like u1 or i5 may not be referenced.

The syntax for an immutable reference is type&, and the syntax for a mutable reference is type&mut.

An immutable reference can be created using the postfix .& reference operator from either an immutable or mutable binding. A mutable reference can be created using the postfix .&mut mutable reference operator, but only from a mutable binding.

Both immutable and mutable references can be dereferenced using the postfix .* dereference operator. This creates a temporary, unnamed, non-copied, immutable binding. A mutable reference can also be dereferenced mutably using the postfix .*mut mutable dereference operator. This is the same as the .* dereference operator, except the resultant temporary is mutable.

Note that references can only be created by referencing an existing value. Thus, null references are impossible to create. Instead, Option should be used, like Option<T&>.

Slice Types

In C*, you can also have a slice of a type, a contiguous collection of values of the same type. The number of values is only known at runtime.

The syntax for this is type[].

A slice T[] is similar to the struct

struct SliceT {
    len: usize,
    ptr: T&,
}

but there are a few important differences. Slices store their values inline.
They are thus unsized (i.e. dynamically sized) (.$size_of() is non-const for them). However, references to slices are sized. They are so-called fat pointers, i.e. the length and raw pointer both constitute the reference.

Slices are the only fundamentally unsized types. Other compounds may only contain at most one unsized type, and if they do, then they themselves are unsized. Like slices, references to any unsized type are fat pointers.
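Rust slices follow the same fat-pointer model, so the size difference can be observed there. This is a sketch in Rust, not C*; the function names are hypothetical.

```rust
use std::mem::size_of;

/// Size in bytes of a reference to a single, sized value (a thin pointer).
fn thin_ref_size() -> usize {
    size_of::<&u32>()
}

/// Size in bytes of a reference to a slice (raw pointer plus length).
fn slice_ref_size() -> usize {
    size_of::<&[u32]>()
}
```

A slice reference is twice the size of a thin reference, because it carries the length alongside the pointer.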

To access the values of a slice, the [] index operator may be used: value[index], where index is a value of an unsigned integer type and value is a reference to a value of slice type. Note that if you have a slice reference, it must be dereferenced before indexing the slice directly.

Indexing a slice reference T[]& evaluates to Result<T&, IndexBoundsError>, and indexing a mutable slice reference T[]&mut evaluates to Result<T&mut, IndexBoundsError>. Thus, it is always bounds checked. To panic on an out-of-bounds index, simply .unwrap() the Result to get the T& or T&mut, which can then be dereferenced to access. To eliminate bounds checking, the Result can instead be .unwrap_unchecked() to get the T& or T&mut without checking if there was an error, thus eliminating the bounds check.

Bounds checking can also be eliminated in many other safe ways. Bounds checking is usually only a problem when it is done for many elements of a slice when it only needs to be done once. For this case, multiple elements can be indexed using a slice pattern (see patterns), or an iterator can be used, which will eliminate redundant bounds checking.

Slices can also be sliced to yield a smaller view of the original slice. This is also done by the same [] indexing operator, except now the syntax is value[range], where range is a value of range type.

Slicing a slice reference T[]& evaluates to Result<T[]&, SliceBoundsError>, and slicing a mutable slice reference T[]&mut evaluates to Result<T[]&mut, SliceBoundsError>.
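Result-returning indexing and slicing can be sketched in Rust on top of slice::get. The error structs below are hypothetical stand-ins for IndexBoundsError and SliceBoundsError, whose actual fields the text does not specify, and checked_index / checked_slice are illustrative names.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the error type named in the text.
#[derive(Debug, PartialEq)]
struct IndexBoundsError {
    index: usize,
    len: usize,
}

/// Hypothetical stand-in for the error type named in the text.
#[derive(Debug, PartialEq)]
struct SliceBoundsError {
    start: usize,
    end: usize,
    len: usize,
}

/// Checked element access: one bounds check, reported through `Result`.
fn checked_index<T>(slice: &[T], i: usize) -> Result<&T, IndexBoundsError> {
    slice.get(i).ok_or(IndexBoundsError { index: i, len: slice.len() })
}

/// Checked sub-slicing: yields a smaller view into the same storage.
fn checked_slice<T>(slice: &[T], range: Range<usize>) -> Result<&[T], SliceBoundsError> {
    let (start, end, len) = (range.start, range.end, slice.len());
    slice.get(range).ok_or(SliceBoundsError { start, end, len })
}
```

A caller who wants a panic on failure can call .unwrap() on the result, matching the behavior described above.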

Array Types

In C*, there are also arrays of a type, which, like slices, are a contiguous collection of values of the same type, but unlike slices, have a length known at compile time and not stored at runtime. Thus, they are sized, unlike slices.

The syntax for this type is type[size], where size is a const of an unsigned integer type.

Arrays can also be indexed and sliced, but since the length is known at compile time, if the index or range is also known at compile time, then indexing and slicing always succeeds at runtime (i.e. there is no Result) yielding another array, or else is a compile error. The same syntax is used for indexing and slicing as is for slices.

To explicitly turn an array into a slice reference, .$cast<T[]>() can be used.
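The difference between an array and a slice view of it can be sketched in Rust, whose [T; N] arrays also carry their length in the type. This is an analogy, not C* code; total and as_slice_len are hypothetical names.

```rust
/// The length `N` is part of the type, so it is known at compile time
/// and no length field is stored alongside the elements.
fn total<const N: usize>(values: [u32; N]) -> u32 {
    values.iter().sum()
}

/// Viewing an array as a slice attaches the length at run time.
fn as_slice_len(values: &[u32; 3]) -> usize {
    let view: &[u32] = values; // array reference to slice reference
    view.len()
}
```

An array of three u32 values occupies exactly 12 bytes, with no extra space for a length.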

Pointer Types

In C*, you can have a pointer to any type. That pointer is either immutable or mutable.

There is one exception to this. type.$bit_size_of() must be a multiple of 8. That is, bit fields like u1 or i5 may not be pointed to.

The syntax for an immutable pointer is type*, and the syntax for a mutable pointer is type*mut.

A pointer can point to 0, 1, or any number of the pointee type.

A pointer can only be created by an explicit cast from a reference type or through the return type of an @extern function. Pointers are meant primarily for FFI.

A pointer cannot be dereferenced directly. It must be explicitly cast to one of these types to be dereferenced:

  • a reference if it points to 1 pointee type
  • a slice if it points to any number of pointee types of runtime-known amount
  • an array if it points to any number of pointee types of compile-time-known amount
  • None if it is a null pointer
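The "cast before dereferencing" rule resembles how Rust treats raw pointers, so the reference and null cases can be sketched there. This is Rust, not C*; to_reference and read_via_pointer are hypothetical names, and the pointer method used is Rust's own as_ref.

```rust
/// Converts a raw pointer back into an optional reference.
/// A null pointer becomes `None`; anything else becomes `Some`.
///
/// # Safety
/// A non-null `ptr` must point to one valid, live `i32`.
unsafe fn to_reference<'a>(ptr: *const i32) -> Option<&'a i32> {
    unsafe { ptr.as_ref() }
}

/// Reads through a pointer that was created by casting a reference.
fn read_via_pointer(value: &i32) -> Option<i32> {
    let ptr = value as *const i32; // explicit reference-to-pointer cast
    // Sound here because `ptr` was just derived from a live reference.
    let reference = unsafe { to_reference(ptr) };
    reference.copied()
}
```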

Tuple Types

In C*, you can also have a contiguous collection of values of different types, i.e. a heterogeneous array of sorts. This is called a tuple and its length must be known at compile time.

The syntax for this type is (types), where types is a list of , comma-separated types. A trailing , comma is allowed. However, in a single-element tuple, a trailing comma is required to differentiate from general parentheses.

The elements of a tuple can be accessed as fields like in a struct. In fact, a tuple is syntax sugar for an anonymous struct with all public fields, though there is one caveat. The fields of a tuple are decimal integer literals (the index), which would not otherwise be allowed as an identifier for a field name. Note that like structs, tuple elements may not be laid out in memory in order.

Function Types

The type of a function fn(a: A, b: B): C is fn(A, B): C.

The syntax for this is fn tuple_type: type, where tuple_type is a tuple type of the arguments and type is the return type.

Other postfix type modifiers (e.g. *, &, []) applied at the end by default apply to the return type. To apply them to the entire function type, the function type must be parenthesized, like (fn(A): B)&.

User-Defined Compound Types

The user-defined compound types in C* are:

  • struct types
  • enum types
  • union types

They correspond to the item declarations of the same name.

struct Types

See struct declarations for more.

enum Types

See enum declarations for more.

union Types

See union declarations for more.

Destructive Moves

Passing a variable (to a function, to another variable, etc.) is done by moving destructively. That is, a move is a simple memcpy to the new location. There are no move constructors or anything like that. Clones must be explicit with a .clone() call for Clone types (@impl(Clone)). The exception is Copy types (@impl(Copy)), for which clones are implicit.
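Rust uses the same move, Clone, and Copy model, so the three cases can be sketched there. This is Rust rather than C*; consume and demo are hypothetical names.

```rust
/// Takes ownership of `s`: the caller's binding is moved from.
fn consume(s: String) -> usize {
    s.len()
}

/// Shows the three ways a value can be passed.
fn demo() -> (usize, usize, u32) {
    let owned = String::from("hello");
    let kept = consume(owned.clone()); // explicit clone: `owned` is still usable
    let moved = consume(owned); // move: `owned` may not be used after this line
    let n: u32 = 7;
    let copy = n; // `u32` is `Copy`, so `n` stays valid after the implicit copy
    (kept, moved, n + copy)
}
```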

Expressions

Almost everything that is not a type in C* is an expression. This includes all control flow constructs.

Literals

C* Literals:

Unit Literals

In C*, every expression has a type. Even statements that return "nothing" really return unit, or ().
The type of this unit literal is also called unit and written () as well.

Boolean Literals

There are two boolean literals of type bool: true and false. These are actually enum variants of the enum bool. See the bool Type.

Number Literals

In C*, number literals are composed of 4 (potentially optional) parts (in order):

  • the integral part
  • the floating part (optional)
  • the exponent (optional)
  • the suffix (optional)

Each of the integral part, floating part, and exponent contains an optional sign, an optional base, and then a series of one or more digits (the floating part is the exception: it may not have a sign).
Note that each part may specify a different base.

The sign may be + for positive numbers, - for negative numbers, or nothing, which defaults to +.

The base and corresponding digits may be:

| Prefix | Name | Base | Digits |
|--------|------|------|--------|
| none | decimal | 10 | 0-9 |
| 0b | binary | 2 | 0-1 |
| 0o | octal | 8 | 0-7 |
| 0x | hexadecimal | 16 | 0-9, A-F |

The series of digits may also be separated by any number of _ underscores between the digits. It cannot begin or end with _ underscores, however.
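As an illustration of the base and underscore rules for the integral part, here is a minimal parser sketch in Rust. The name parse_integral is hypothetical. Sign, floating part, exponent, and suffix are out of scope, and the sketch also accepts lowercase hex digits, which the rules above do not mention.

```rust
/// Parses the integral part of a number literal: an optional base prefix,
/// then digits that may be separated (but not led or trailed) by `_`.
fn parse_integral(text: &str) -> Option<u64> {
    let (radix, digits): (u32, &str) = match text.get(..2) {
        Some("0b") => (2, &text[2..]),
        Some("0o") => (8, &text[2..]),
        Some("0x") => (16, &text[2..]),
        _ => (10, text),
    };
    if digits.is_empty() || digits.starts_with('_') || digits.ends_with('_') {
        return None;
    }
    let mut value: u64 = 0;
    for c in digits.chars().filter(|&c| c != '_') {
        // `to_digit` rejects characters that are not digits in this base.
        let digit = c.to_digit(radix)? as u64;
        // Overflowing the 64-bit accumulator is treated as an invalid literal.
        value = value.checked_mul(radix as u64)?.checked_add(digit)?;
    }
    Some(value)
}
```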

If there is a floating part, then a decimal point . separates it from the preceding integral part. The floating part may not have a sign and is always positive (in itself).

If there is an exponent, then an e precedes it.

The (optional) suffix contains the type of number and a bit size.

The type of number may be:

  • u: unsigned integer
  • i: signed integer
  • f: floating-point number

The bit size is usually a literal power of 2 number, but may be any positive integer for integer types. It may also be a word whose bit size is architecture-dependent.

For integers (u and i), the common bit sizes are:

  • 8
  • 16
  • 32
  • 64
  • 128
  • size (bit size necessary to store an array index)
  • ptr (bit size necessary to store a pointer or the difference between them)

For floats (f), the bit sizes are:

  • 16
  • 32
  • 64
  • 128

These suffixes are the primitive number types. Thus, in total, they are (with their C equivalent for FFI):

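| C* | C |
|----|---|
| u8 | uint8_t |
| i8 | int8_t |
| u16 | uint16_t |
| i16 | int16_t |
| u32 | uint32_t |
| i32 | int32_t |
| u64 | uint64_t |
| i64 | int64_t |
| u128 | unsigned __int128 |
| i128 | __int128 |
| usize | size_t |
| isize | ssize_t |
| uptr | uintptr_t |
| iptr | intptr_t |
| f16 | _Float16 |
| f32 | float |
| f64 | double |
| f128 | _Float128 |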

Integers always use 2's-complement and floats always are IEEE 754 floating point numbers.

If the type is a float, then it must contain a . decimal point and a floating part. If the type is an integer, then it must not. Both can contain exponents, though for integers, the exponent (in scientific notation) cannot cause the integer to exceed its finite size.

If there is no suffix type, then the type is inferred. If there is a . decimal point, then the type must be a float, and vice versa with integers. If there is a - sign for the integral part, then the type must be a float or a signed integer. To infer the bit size of the number, general type inference is used. If it cannot be unambiguously inferred, then it is an error and the user must explicitly specify the suffix type.

Character Literals

In C*, character literals are of type char and are denoted with single '' quotes. They are unicode scalar values, which are slightly different from unicode code points. This means they are always 32 bits on all architectures.

For the actual char literal within the quotes, it may be any unicode scalar value, but some characters need to be or may be escaped. The ascii values that must be escaped are:

  • \n: newline
  • \r: carriage return
  • \t: tab
  • \0: null char
  • \\: backslash
  • \': single quote

Other ASCII values may also be escaped using the syntax \x7F, where 7F is the hexadecimal value of the ASCII character, from 0 to 127 (aka 0x7F). Thus it may only be two digits.

Unicode scalar values can also be escaped with the syntax \u{7FFF}. The hexadecimal value is the 24-bit unicode character code.
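The escape rules for char literals can be sketched as a small decoder. This is Rust standing in for C*; decode_char and parse_hex are hypothetical names, and the input is the text between the single quotes.

```rust
/// Parses a non-empty run of hex digits (no sign, no prefix).
fn parse_hex(hex: &str) -> Option<u32> {
    if hex.is_empty() || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    u32::from_str_radix(hex, 16).ok()
}

/// Decodes the text between the quotes of a character literal.
/// Returns `None` for anything that is not exactly one valid character.
fn decode_char(body: &str) -> Option<char> {
    let mut chars = body.chars();
    let first = chars.next()?;
    if first != '\\' {
        // A plain character: must be the only one and must not need escaping.
        let needs_escape = "\n\r\t\0'".contains(first);
        return (!needs_escape && chars.next().is_none()).then_some(first);
    }
    let rest = chars.as_str();
    match rest {
        "n" => Some('\n'),
        "r" => Some('\r'),
        "t" => Some('\t'),
        "0" => Some('\0'),
        "\\" => Some('\\'),
        "'" => Some('\''),
        _ => match rest.strip_prefix('x') {
            // `\xHH`: exactly two hex digits, limited to ASCII (0x00 to 0x7F).
            Some(hex) => char::from_u32(parse_hex(hex)?)
                .filter(|c| hex.len() == 2 && c.is_ascii()),
            // `\u{...}`: any Unicode scalar value, written in hexadecimal.
            None => {
                let hex = rest.strip_prefix("u{")?.strip_suffix('}')?;
                char::from_u32(parse_hex(hex)?)
            }
        },
    }
}
```

char::from_u32 rejects surrogates and values above 0x10FFFF, which is exactly the difference between a Unicode scalar value and an arbitrary code point.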

Character literals can also be prefixed with a b: b' ', in which case they are byte literals, i.e. a u8. The required ascii escapes are the same, though the \xFF escape can now go up to 255 (aka 0xFF), and there may not be unicode escapes (since it's only a u8 byte literal now).

String Literals

There are multiple types of strings in C* owing to the inherent complexity of string-handling without incurring overhead. The default string literal type is String, which is UTF-8 encoded and wraps a *[u8]. This is a borrowed slice type and can't change size. To have a growable string, there is the StringBuf type, but there is no special syntactic support for this owned string. Strings are made of chars, unicode scalar values, when iterating (even though they are stored as *[u8]).

Then there are byte strings, which are just *[u8] and do not have to be UTF-8 encoded. String literals for this are prefixed with b, like b"hello". The owning version of this is just a Box<[u8]> (notice the unsized slice use), and the growable owning version is just a Vec<u8>.

Furthermore, for easier C FFI, there is also CString and CStringBuf, which are explicitly null-terminated. All other string types are not null-terminated, since they store their own length, which is more efficient and safer. Literal CStrings have a c prefix, like c"/home".

And finally, there are format strings. Written f"n + m = {n + m}", they can interpolate expressions within {}. Format strings, or f-strings, don't actually evaluate to a string, but rather evaluate to an anonymous struct that has methods to convert it all at once into a real string. Thus, f-strings do not allocate.

The character literals allowed in a C* string depend on the string type. The string types are:

| Prefix | Name | Type |
|--------|------|------|
| none | string | String |
| b | byte-string | *[u8] |
| r | raw-string | type without the r |
| c | c-string | CString |
| f | f-string | anonymous struct with methods |

All of these string prefixes can be combined with each other, except for r and f, since f-strings require escaping, which goes against raw strings.

For r raw strings, no escapes are allowed.

For normal UTF-8 strings (which includes the r, c, and f modifiers), the string must contain character literals, except there are no single ' quotes anymore, double " quotes delimit strings, and double quotes must be escaped (\") instead of single quotes (\'). Obviously the escapes don't apply to raw r strings. For f-strings, braces must also be escaped: \{ and \}, since they are used to delimit expressions within the string. And c-strings must not contain any \0 null characters.

For byte b strings, the string must contain byte literals. The other string modifiers apply in the same way, and again, double quotes (\") must be escaped instead of single quotes (\').

Table of Contents

Struct Literals

Struct literals are literals that create a value of a struct type. That is, if we have a struct Example:

struct Example {
    a: u32,
    b: f64,
    c: String,
}

then we can create a value of type Example with the struct literal

Example {
    a: 0,
    b: 0.0,
    c: "",
}

That is, we first have the struct type name, an open { brace, the list of fields and their values, and then a closing } brace. The fields are separated by , commas (a trailing , comma is allowed), and : colons separate each field name from its value.

If the name of a field and its value expression are the same, then the : colon and value may be omitted, like so:

let c = "";
Example {
    a: 0,
    b: 0.0,
    c,
}

Furthermore, .. can be used to spread the fields of another struct into a struct literal, like so:

struct SmallExample {
    a: u32,
    b: f64,
}

let x = SmallExample {
    a: 0,
    b: 0.0,
};

Example {
    ..x,
    c: "",
}

Note that the struct type does not have to be the same, but the fields that are being spread must match between the struct types in name and type.

Table of Contents

Tuple Literals

C* has tuples, but they are simply shorthand and syntax sugar for structs. A tuple type is a finite, heterogeneous list of types, such as (i32, usize, String), and its field names are unsigned integers (.0, .1, and .2 for this tuple). This is the only difference between tuples and the structs they desugar to: struct field names must be valid C* identifiers, but tuple field names begin with digits. Otherwise, they are exactly the same. The tuple type with 0 element types, (), is also valid, and it is equivalent to the () unit type.

Tuple literals mirror tuple types. The fields are unnamed (unlike in struct literals), so a tuple literal is just a , comma-separated list of values of any types, delimited by open ( and close ) parentheses. There may be a trailing , comma separator, and for 1-element tuple literals, this trailing , comma is required to distinguish the literal from () parentheses used for grouping general expressions.
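
Rust follows the same tuple conventions, so a small Rust sketch can illustrate the trailing-comma rule and the numeric field names. The function names here are hypothetical and the code is an analogue, not C*.

```rust
// A 1-element tuple needs the trailing comma.
fn single() -> (i32,) {
    (2 + 3,)
}

// Without the comma, the parentheses only group an expression.
fn grouped() -> i32 {
    (2 + 3) * 2
}

// Tuple fields are accessed by position: .0, .1, .2.
fn describe(t: (i32, usize, String)) -> String {
    format!("{} {} {}", t.0, t.1, t.2)
}
```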

Table of Contents

Array Literals

In C*, arrays are finite, homogeneous lists of a single type. Array literals are delimited by open [ and close ] brackets, as opposed to () parentheses for tuples. Their values are also , comma separated. Trailing , commas are allowed but never required, unlike in 1-element tuple literals.

Array types are denoted [T; N], where T is any type and N: usize.

Table of Contents

Enum Literals

In an enum, such as

enum Example {
    A,
    B(i32),
}

there are two possible forms of enum literals depending on if the variant has any data or not.

In the case of the variant A, which has no data attached, the enum literal Example.A (or just A if A is imported) is a value of type Example.

In the case of the variant B, which has data attached, the enum literal Example.B is a function of type fn(i32): Example that returns the B variant with the given data attached. Thus, Example.B(0) or Example.B(100) is normally written, though the function can also be referred to by itself.
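
Rust treats data-carrying variants the same way, which makes for a convenient illustration. The sketch below is a Rust analogue (Rust writes Example::B where C* writes Example.B), and the helper names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Example {
    A,
    B(i32),
}

// A data-less variant is simply a value of the enum type.
fn unit_variant() -> Example {
    Example::A
}

// A data-carrying variant can be referred to by itself as a function value.
fn constructor() -> fn(i32) -> Example {
    Example::B
}

// Passing the variant wherever a function is expected.
fn wrap_all(values: &[i32]) -> Vec<Example> {
    values.iter().copied().map(Example::B).collect()
}
```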

Table of Contents

Union Literals

Union literals are the same as struct literals except only one field may be specified.

Table of Contents

Function Literals

In C*, there is very little difference between function declarations and function literals (using them as values).

In function declarations, they are written

PUBLICITY fn FUNC_NAME GENERIC_ARGS ARGS = BODY_EXPRESSION

such as

fn foo<T>(t: T): T = { t * t }

In function literals, there is no publicity modifier and the function name is optional, since it is usually specified by the let binding instead if named:

fn<T>(t: T): T = { t * t }

Furthermore, type inference of function arguments and return type is allowed for function literals, since they cannot be public declarations. If the types are ambiguous, though, type annotations are still required of course.

The type of a function literal is unique and opaque, but can be cast to a function pointer like fn(T): T.

Note that annotations like @abi("C") can still be applied to function literals just like function declarations.

Table of Contents

Closure Literals

Closure literals are very similar to function literals—in fact, they are a superset of function literals—except they also have a closure context. That is, they can "enclose" over values in the current scope.

The syntax for a closure literal is simply a normal function literal with an anonymous struct literal, the closure context, following the fn.

The closure context is an anonymous struct literal in that it has no named struct type. That is, instead of

Example {a: 0, b: 0.0, c: ""}

it would just be

{a: 0, b: 0.0, c: ""}

The fields in this closure context struct are then immediately available within the function body as if they were immediately destructured.

The type of a closure literal is unique and opaque. Unlike function literals (in which there is no context), closure literals cannot be cast to a bare function pointer. The closure function corresponds to a method on the closure context struct, so there is an implicit *Self argument that a bare function pointer cannot carry. Thus, the only way to accept a closure as an argument is by using generics, which ensures there is no pointer indirection and the closure can be inlined into the call site.
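
The "context struct plus method" view of a closure can be sketched in Rust, which also requires generics to accept a closure without indirection. AddContext, call, apply_twice, and demo are hypothetical names; this is an analogue of the model described above, not C* code.

```rust
// Hypothetical closure context: the values the closure captures.
struct AddContext {
    offset: i32,
}

impl AddContext {
    // The closure body as a method with an implicit context argument.
    fn call(&self, x: i32) -> i32 {
        x + self.offset
    }
}

// Accepting a closure through a generic parameter: each call site is
// monomorphized, so no function pointer is involved.
fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(f(x))
}

fn demo(offset: i32, x: i32) -> (i32, i32) {
    let context = AddContext { offset };
    let by_hand = context.call(context.call(x));
    let by_closure = apply_twice(move |v| v + offset, x);
    (by_hand, by_closure)
}
```

Both paths compute the same result; the closure is just a more convenient spelling of the struct and its method.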

Table of Contents

Range Literals

Range literals denote an integer range. There are a few different forms of ranges, which we will define in terms of set interval notation as to what integers the range includes. Here, n refers to the parent length that the range applies to.

| Range | Interval |
| --- | --- |
| a..b | [a, b) |
| a.. | [a, n) |
| ..b | [0, b) |
| .. | [0, n) |
| a..=b | [a, b] |
| ..=b | [0, b] |
| a..+b | [a, a + b) |
| a..+=b | [a, a + b] |
| a..-b | [a, n - b) |
| a..-=b | [a, n - b] |
| ..-b | [0, n - b) |
| ..-=b | [0, n - b] |
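
To make the table concrete, here is a Rust sketch that resolves each range form to a half-open [start, end) pair for a parent of length n. RangeForm and resolve are hypothetical names invented for this illustration, and the sketch assumes b <= n for the forms that subtract from n.

```rust
#[derive(Clone, Copy)]
enum RangeForm {
    Between(usize, usize),          // a..b
    From(usize),                    // a..
    To(usize),                      // ..b
    Full,                           // ..
    BetweenInclusive(usize, usize), // a..=b
    ToInclusive(usize),             // ..=b
    Length(usize, usize),           // a..+b
    LengthInclusive(usize, usize),  // a..+=b
    FromEnd(usize, usize),          // a..-b
    FromEndInclusive(usize, usize), // a..-=b
    ToFromEnd(usize),               // ..-b
    ToFromEndInclusive(usize),      // ..-=b
}

// Returns (start, end) of the half-open interval [start, end).
// An inclusive upper bound x becomes the exclusive bound x + 1.
fn resolve(form: RangeForm, n: usize) -> (usize, usize) {
    use RangeForm::*;
    match form {
        Between(a, b) => (a, b),
        From(a) => (a, n),
        To(b) => (0, b),
        Full => (0, n),
        BetweenInclusive(a, b) => (a, b + 1),
        ToInclusive(b) => (0, b + 1),
        Length(a, b) => (a, a + b),
        LengthInclusive(a, b) => (a, a + b + 1),
        FromEnd(a, b) => (a, n - b),
        FromEndInclusive(a, b) => (a, n - b + 1),
        ToFromEnd(b) => (0, n - b),
        ToFromEndInclusive(b) => (0, n - b + 1),
    }
}
```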

Table of Contents

Function Calls

TODO

Table of Contents

Method Calls

TODO

Table of Contents

Blocks

TODO

Table of Contents

Control Flow

TODO

Table of Contents

Pattern Matching

TODO

Table of Contents

Conditionals

TODO

Table of Contents

match

TODO patterns

Table of Contents

if

if evaluates a block conditionally.

The syntax for this is expr.if block. It is syntax sugar for a match:

expr.match { true => block, false => (), }

Table of Contents

else

An else may immediately follow an if expression, in which case the whole thing becomes an if-else expression.

The syntax for this is expr.if block else block. It is syntax sugar for a match:

expr.match { true => block, false => block, }, where the blocks appear in the same order as in the if-else expression.

Normally the expr following an else must be a block, but it can also be another if expression.
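
The equivalence between if-else and a match on a bool also holds in Rust, which gives a quick way to see the desugaring. The two functions below are hypothetical and use Rust's prefix syntax rather than C*'s postfix expr.if form.

```rust
// Written with if-else.
fn sign_label_if(x: i32) -> &'static str {
    if x < 0 { "negative" } else { "non-negative" }
}

// The same logic after desugaring to a match on the bool condition.
fn sign_label_match(x: i32) -> &'static str {
    match x < 0 {
        true => "negative",
        false => "non-negative",
    }
}
```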

Table of Contents

Labels

TODO

Table of Contents

Loops

TODO

Table of Contents

while

TODO

Table of Contents

for

A for loop allows you to iterate through an iterator. An iterator is just a type Iter that has a fn next(self: Self): Option<T> method, where T is the element type we are iterating over.

The syntax for this is expr.for binding block, where the expr is a value that has a .into_iter() method returning the iterator, the binding is the binding for the element name, and block is the block of the for loop.

It is syntax sugar for:

{ let iter = expr.into_iter(); true.while { let binding = iter.next().?; block } }
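
A Rust analogue shows the same shape: obtain the iterator once, then loop while next() keeps producing elements. Note one difference in this sketch: it ends the loop with an explicit break on None, whereas the C* desugaring above relies on .?. The function names are hypothetical.

```rust
// The loop as the programmer writes it.
fn sum_with_for(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values.iter() {
        total += *v;
    }
    total
}

// Roughly what the loop expands to.
fn sum_desugared(values: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = values.iter(); // obtain the iterator once
    loop {
        match iter.next() {
            Some(v) => total += *v, // the binding for this element
            None => break,          // iterator exhausted
        }
    }
    total
}
```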

Table of Contents

defer

TODO

Table of Contents

Error Handling

TODO

Table of Contents

try

TODO error handling

Table of Contents

Panicking

In C*, all fallible functions and operations return either Result or Option to indicate an error or exceptional case. Normally errors are handled by bubbling up the error with .? or handling the error directly in a match or other Option/Result methods. However, in certain cases you either don't care about handling the exceptional case or you can determine that the error case is statically impossible but the compiler cannot. In this case, you may wish to simply get the Some or Ok value out of the Option or Result. This can be done by panicking on a None or Err.

Panicking in C* means the program immediately prints an error message and then aborts, i.e., calls the libc function abort. No cleanup or unwinding is done in this case. In particular, defers on the stack are not run because the stack is not unwound. Because of this, panicking should only be done under extreme circumstances, such as statically determining the error case is impossible. If you want unwinding and defers to run, simply use .? to bubble up the errors.

The way to panic is to call .unwrap() on a Result. This is the only fundamental way to panic in C*. All other functions that panic or may panic ultimately call Result.unwrap. For example, Option.unwrap converts the Option into a Result and then calls .unwrap() on it. The same is true for Option.expect and Result.expect, which allow you to set an error message to be printed.

The error message that Result.unwrap prints to stderr is implementation defined, but it calls E.error_message to obtain the error message of the e: E in Err(e). Thus, to .unwrap() a Result<T, E>, E must have such a .error_message() method. It may also print a (function call) stack trace or error return trace, but that is not guaranteed.

There is one other option besides panicking. If you know for certain that the error case is impossible, you may call Result.unwrap_unchecked(). This does not panic if the Result is Err; instead, that case is undefined behavior.
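
The choice between bubbling up and unwrapping can be sketched in Rust, with one caveat: Rust's unwrap unwinds by default, while C* aborts without unwinding. The function names are hypothetical, and the snippet is an analogue rather than C* code.

```rust
use std::num::ParseIntError;

fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    text.parse::<u16>()
}

// Bubbling up: the caller decides what to do with the error.
fn next_port(text: &str) -> Result<u32, ParseIntError> {
    let port = parse_port(text)?;
    Ok(u32::from(port) + 1)
}

// Unwrapping: acceptable here because the input is a literal that
// is known to parse, so the Err case cannot occur.
fn default_port() -> u16 {
    parse_port("8080").unwrap()
}
```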

Table of Contents

Operators

| Operator | Arity | In-Place | Type | Description | Example |
| --- | --- | --- | --- | --- | --- |
| + | binary | no | arithmetic | addition | 2 + 2, 4.0 + 2.0 |
| - | binary | no | arithmetic | subtraction | 2 - 2, 4.2 - 2.2 |
| * | binary | no | arithmetic | multiplication | 2 * 2, 4.0 * 2.0 |
| / | binary | no | arithmetic | division | 2 / 2, 4.0 / 2.0 |
| % | binary | no | arithmetic | modulus | 2 % 2 |
| - | unary | no | arithmetic | negation | -a |
| == | binary | no | relational | equal to | a == 2 |
| != | binary | no | relational | not equal to | a != 2 |
| > | binary | no | relational | greater than | a > 2 |
| < | binary | no | relational | less than | a < 2 |
| >= | binary | no | relational | greater than or equal to | a >= 2 |
| <= | binary | no | relational | less than or equal to | a <= 2 |
| && | binary | no | logical | and | a && b |
| \|\| | binary | no | logical | or | a \|\| b |
| !, .! | unary | no | logical | not | !a |
| & | binary | no | bitwise | and | |
| \| | binary | no | bitwise | or | |
| ^ | binary | no | bitwise | xor | |
| ~, .~ | unary | no | bitwise | not | |
| << | binary | no | bitwise | left shift | |
| >> | binary | no | bitwise | right shift | |
| [] | binary | no | indexing | index a slice | a[1] |
| += | binary | yes | arithmetic | addition | |
| -= | binary | yes | arithmetic | subtraction | |
| *= | binary | yes | arithmetic | multiplication | |
| /= | binary | yes | arithmetic | division | |
| %= | binary | yes | arithmetic | modulus | |
| &&= | binary | yes | logical | and | |
| \|\|= | binary | yes | logical | or | |
| &= | binary | yes | bitwise | and | |
| \|= | binary | yes | bitwise | or | |
| ^= | binary | yes | bitwise | xor | |
| <<= | binary | yes | bitwise | left shift | |
| >>= | binary | yes | bitwise | right shift | |
| ++ | unary | yes | arithmetic | increment | |
| -- | unary | yes | arithmetic | decrement | |
| .& | unary | no | reference | reference | |
| .&mut | unary | no | reference | mutable reference | |
| .* | unary | no | reference | dereference | |
| .*mut | unary | no | reference | mutable dereference | |
| .? | unary | no | control flow | try | |

Arithmetic operators operate on expressions of the same number type and evaluate to the same number type as well. .$cast<>() can be used when the operands are of different types. %, ++, and -- are not allowed for floats.

Relational operators operate on expressions of the same type and evaluate to a bool.

Logical operators operate on bool expressions and evaluate to a bool.

Bitwise operators operate on expressions of the same number type and evaluate to the same number type as well. The exception is the shift operators: <<, >>, <<=, and >>=, whose right operand has the smallest unsigned integer type that covers the valid shift amounts (i.e., amounts less than the bit size of the left operand). Shifting by more would otherwise be UB. For example, if the left operand is u64, then the right operand is u6. For signed integer types as the left operand, the sign bit is extended when shifting right.
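
Rust has no u6 type, but its checked shifts show why the shift amount for a u64 must stay below 64, and its signed right shift shows sign extension. This is a Rust analogue with hypothetical function names; C* instead rules out the invalid amounts through the operand type.

```rust
// Returns None when the amount is not a valid shift for a 64-bit value,
// i.e. when it does not fit in the range 0..=63 that a u6 would cover.
fn shift_left(value: u64, amount: u32) -> Option<u64> {
    value.checked_shl(amount)
}

// Right-shifting a signed integer extends the sign bit.
fn arithmetic_shift_right(value: i64, amount: u32) -> i64 {
    value >> amount
}
```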

For indexing operators, see slices and arrays, which may be indexed.

In-place operator= forms evaluate to ().

Table of Contents

Generics

Generics in C* are always monomorphized.

TODO

Table of Contents

Constant Evaluation

TODO

Table of Contents

Builtin Functions

TODO

Table of Contents

Lang Types

Lang types are standard library types that the compiler knows about and may use. The ones described below are Option and Result.

For example, they are used for the .? try operator.

Table of Contents

Option

enum Option<T> {
    Some(T),
    None,
}

Table of Contents

Result

enum Result<T, E> {
    Ok(T),
    Err(E),
}

Table of Contents

List of Annotations

TODO

Table of Contents

Current Restrictions and Unimplemented Features

The following features are currently unimplemented:

  • non-ASCII source code (normally UTF-8 is allowed) -- this will be super low priority for us
  • targets other than x86_64-linux-gnu
  • user-defined modules, except for:
    • the implicit single-file module
    • those defined by the compiler or in the standard library
  • pub publicity modifiers (everything will be public for now)
  • any and all generic programming
  • explicit enum discriminants set to user-decided constants
  • use declarations except for the standard prelude, which is implicitly used
  • strings and characters except for byte ones, i.e.:
    • byte string literals
    • byte literals
  • growable string types (in the standard library)
  • type aliases except for:
    • those implemented by the compiler
  • most attributes except for:
    • @extern and @abi("C") for functions (for calling libc)
    • all other annotations are allowed but ignored
  • ... trailing varargs parameter for @extern @abi("C") functions unless it's needed for the standard library (using libc)
  • unions since they're only for C FFI
  • tuples since they're just sugar for structs [2]
  • if, else, for, which are just sugar for match and while [2]
  • non-temporary unsized types (slices must be references)
  • const generics
  • const evaluation other than constant literals
  • mut fields for interior mutability
  • struct spread .. syntax and field: field => field sugar [2]
  • the only Copy types are primitive types

The following features we hope to implement but will come at the end:

  • generics except for Option and Result (which will definitely be done)
  • defer and undefer (undefer more likely to skip)
  • closures and function pointers
[2] May add this back if we have time since it's just sugar.

Table of Contents

Grammar

Much of the grammar is specified above using italics and in words, but here is the ocamlyacc grammar:

TODO

Old Stuff Below


Statements and Expressions

Table of Contents

Statements

Due to the expression-oriented nature of C*, all control flow statements are themselves expressions.

Table of Contents

If-Else Statements

If-Else statements execute one of two cases. The first consists of typical C-style semantics wherein we have:

if (expr1)
    statement1
else
    statement2

Both statement1 and statement2 must evaluate to the unit type. As in C, the else part of the If-Else control flow block is optional. In addition to the C-style control flow, we can also have:

if (expr1)
    expr2
else
    expr3

In both cases, the expression in the if statement is evaluated; if it evaluates to a non-zero value, execution continues down that path, and otherwise the body of the else statement is executed.

C* uses the same mechanism as C to resolve the "dangling else" ambiguity: an else is grouped with the nearest if. In the case of:

let i: i32 = 6;
let j: i32 = 7;

if(i > 4)
    if(j > i)
        println("j is greater than i!");
    else
        println("j is less than or equal to i!");

The indentation and print statements show which if the else clause belongs to: barring additional braces to direct control flow, the else is grouped with the nearest if above it.

Table of Contents

For Statements

For statements can iterate over an explicit iterator, as in:

for season in seasons.iter()
    println(season);

In addition to the use of an explicit iterator, it is also possible to use a range literal to bound the execution of the body of a for loop, as in:

for x in 1..=365 {
    println(f"Day {x} of 365");
}

Table of Contents

While Statements

Execution of the body of a while statement continues until the expression labeled expr1 evaluates to zero. For example:

while(expr1){
    statement1;
}

Similar to if statements, due to the expression-oriented nature of C*, statement1 must evaluate to the unit type, and it is possible to replace statement1 with expr2.

Table of Contents

Defer

To aid in resource handling, C* has a defer keyword. defer defers the following statement or block until the function returns, and runs it no matter where the function returns from (but not on panics/aborts). Strictly speaking, the defer runs when its enclosing block exits, but it's easier to think about function blocks first.

For example, you can use this to ensure you correctly clean up resources in a function:

extern "C" fn open(path: *u8, flags: i32): i32;
extern "C" fn close(fd: i32): i32;

fn open_file_in_dir(dir: *[u8], filename: *[u8]): Result<i32, String> try = {
    let mut path = Vec.new(Mallocator());
    defer path.free();
    try {
        if (dir.len() > 0) {
            path.extend(dir).?;
            path.push(b'/').?;
        }
        path.extend(filename).?;
        path.push(0).?;
    }.map_err(fn(_) "alloc error").?;
    
    let path = path.as_ptr();
    let fd = open(path, O_RDWR).match {
        -1 => Err("open failed"),
        fd => Ok(fd),
    }.?;
    defer println(f"opened {fd}");
    return fd;
}

In this example, you have to allocate a path to store the directory and filename you combine, and then open that path and return the file descriptor if it was successful. You have to clean up the memory allocation, though, and do that while still handling all the allocation errors and the open error. The latter can be done elegantly with try and .?, but if you mix in the path.free(), you'd have to run it before every error return, which means you have to duplicate it and not use .? anymore.

Instead, you can use defer for this. No matter where you return from the function, it will run its statement right before that. You can also use defer for any statement, not just resource cleanup, like logging for example.

However, sometimes you want to cancel a defer:

struct FilePair {
    fd1: i32,
    fd2: i32,
}

fn open_two_files(path1: *[u8], path2: *[u8]): Result<FilePair, String> try = {
    let fd1 = open_file_in_dir(b"", path1).?;
    close: defer close(fd1);
    let fd2 = open_file_in_dir(b"", path2).?;
    close: defer close(fd2);
    println(f"opened {fd1} and {fd2}");
    undefer close;
    FilePair {fd1, fd2}
}

In this example, you want to open two files and return them if successful. If only one is successful, though, that's an error and you should close the first one before returning the error. In order to do that cleanly, you can use the undefer keyword, which cancels an earlier labeled defer, in this case labeled close.

defer and undefer are actually syntax sugar for something a bit more low-level and wordy:

fn open_two_files(path1: *[u8], path2: *[u8]): Result<FilePair, String> try = {
    let fd1 = open_file_in_dir(b"", path1).?;
    let close1 = {fd1} fn() close(fd1);
    let close1 = close1.$defer();
    let fd2 = open_file_in_dir(b"", path2).?;
    let close2 = {fd2} fn() close(fd2);
    let close2 = close2.$defer();
    println(f"opened {fd1} and {fd2}");
    let close = [close2, close1];
    close.undo();
    FilePair {fd1, fd2}
}

That is, .$defer() places the closure on the stack and returns a Defer struct, which can be undone with Defer.undo() ([Defer].undo() just maps Defer.undo() over the array). Defer.undo() sets a bit in the Defer struct recording that it has been undone. Then, when the stack unwinds, any Defers on the stack that have not been undone are run.
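
The "Defer struct with an undone bit" model maps closely onto a Rust drop guard, so a Rust sketch can illustrate it. Everything here is hypothetical: the Defer type, the log used in place of real close calls, and the second_open_succeeds flag that simulates whether the second open worked.

```rust
use std::cell::RefCell;

// A deferred action that runs when the guard leaves scope,
// unless it has been undone first.
struct Defer<'a> {
    log: &'a RefCell<Vec<String>>,
    action: &'static str,
    undone: bool,
}

impl<'a> Defer<'a> {
    fn new(log: &'a RefCell<Vec<String>>, action: &'static str) -> Self {
        Defer { log, action, undone: false }
    }

    // Sets the bit that cancels the deferred action.
    fn undo(&mut self) {
        self.undone = true;
    }
}

impl Drop for Defer<'_> {
    fn drop(&mut self) {
        if !self.undone {
            // Stand-in for running the deferred statement, e.g. close(fd).
            self.log.borrow_mut().push(self.action.to_string());
        }
    }
}

// Returns the list of deferred actions that actually ran.
fn open_two_files(second_open_succeeds: bool) -> Vec<String> {
    let log: RefCell<Vec<String>> = RefCell::new(Vec::new());
    {
        let mut close1 = Defer::new(&log, "close(fd1)");
        if second_open_succeeds {
            let mut close2 = Defer::new(&log, "close(fd2)");
            // Success: cancel both cleanups and hand the files to the caller.
            close2.undo();
            close1.undo();
        }
        // On failure we fall out of scope here and close1 runs.
    }
    log.into_inner()
}
```

When the second open fails, only the first cleanup runs; when both succeed, neither does.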

Table of Contents

Expressions and Operators

Table of Contents

Unary Operators

Unary operators are operators that act on a single expression. C* uses the unary operators "-" and "!" to represent negation and logical not, respectively. "-" negates a number literal, such as

let x = -2

The logical not "!" represents negation for bool literals or boolean expressions such as

let a = true
let b = !a

where b has the value false.

Table of Contents

Binary Operators

A binary operator acts on two expressions and can be shown as follows:

Binary operator = expr * operator * expr

Table of Contents

Assignment operator

The assignment operator stores values into variables. It uses the keyword "let" and the = symbol so that the variable on the left stores the value of the expression on the right.

Ex.

let a = 23 // a stores the value 23

Table of Contents

Arithmetic Operator
  • The addition operator "+" adds two values of the same type. Automatic type conversion is applied when adding two number literals and can also be applied to string addition.

Ex.

1 + 2 // 3
12.3 + 10 // 22.3
"string" + "test" // "stringtest"
  • The subtraction operator "-" subtracts two values of the same type. Automatic type conversion is applied when subtracting two number literals.

Ex.

1 - 2 // -1
12.3 - 10 // 2.3
  • The multiplication operator "*" multiplies two values of the same type. Automatic type conversion is applied when multiplying two number literals.

Ex.

1 * 2 // 2
12.3 * 10 // 123
  • The division operator "/" divides two values of the same type. Automatic type conversion is applied when dividing two number literals.

Ex.

1 / 2 // .5
12.3 / 10 // 1.23
  • The modulus operator "%" takes the modulus of two values of the same type. Automatic type conversion is applied when taking the modulus of two number literals.

Ex.

1 % 2 // 1
12.3 % 10 // 2.3

Table of Contents

Relational Operators

Relational operators represent how the operands relate to each other. Each expression using a relational operator has two values as inputs and outputs either true or false. The relational operators are: ==, !=, <, >, <=, >=, &, |.

1 < 2 // true
1 > 2 // false
1 != 2 // true
1 == 2 // false
true | false // true
true & false // false

Table of Contents

Functions

Functions are a type of statement that can be declared one of two ways:

fn name(parameters): return type = body

or

fn name(parameters): return type = { body }

It takes in a list of parameters and returns a value based on the expression. Functions can be written with or without specifying the return type.

Ex.

fn hello(): String = "hello world"

fn adding(a, b) = { return a + b }

Table of Contents

Pattern Matching

Instead of having a switch statement like in C, C* has a generalized match statement, which can be used to match many more expressions, including integers (like in C), enum variants, dereferenced pointers, slices, arrays, and strings. Also, there is no fall-through, but match cases can be combined explicitly.

Furthermore, just like you can destructure to pattern match in a match statement, you can also do the same as a general statement, like in a let. It's like an unconditional match.

let cow = CowString.Borrowed("🐄");
let len = match cow {
    Borrowed(s) => s.len(),
    Owned(s) => s.len(),
};
let String {ptr, len} = "🐄";

Note that string literals are of the String type similarly defined as above, and you can redeclare/shadow variables like len.

Table of Contents

Methods

C* has associated functions and simple methods, though these are largely syntactic sugar. To declare these for a type, simply write:

struct Person {
    first_name: String,
    last_name: String,
}

impl Person {

    fn new(first_name: String, last_name: String): Self = {
        Self {first_name, last_name}
    }
    
    fn say_hi1(self: Self) = {
        print(f"Hi {self.first_name} {self.last_name}");
    }
    
    fn say_hi2(self: *Self) = {
        print(f"Hi {self.last_name}, {self.first_name}");
    }
    
    fn remove_last_name(self: *mut Self) = {
        self.last_name = "";
    }
    
}

fn main() = {
    let mut person = Person.new("Khyber", "Sen");
    
    {
        person.say_hi1();
        person.&.say_hi2();
        person.&mut.remove_last_name();
        person.say_hi1();
    }
    {
        Person.say_hi1(person);
        Person.say_hi2(person.&);
        Person.remove_last_name(person.&mut);
        Person.say_hi1(person);
    }
}

In this example, we first declared a struct Person, and then an impl block for Person to define methods/associated functions for it. Note that this impl block can be anywhere, even in other modules.

In the impl block, we first declared an associated function Person.new, which is just a normal function but namespaced to Person. Similarly, the other three methods are just normal functions, too, as seen when we call them explicitly in the second block in main. But we can also use . syntax to call them, which lets us avoid explicitly naming Person.

Inside an impl block, we can also use the Self type as an alias to the type being implemented. This is especially useful with generics.

Note that the .& and *Self are explicit, because we want these kinds of possible costs to be noted explicitly. For example, Person.say_hi1 takes Self by value, which means it must copy the Person every time. If Person were a much larger struct, this could be very expensive and we don't want to hide that information. Also, the difference between .& and .&mut is explicit to make mutability explicit everywhere.

Table of Contents

Postfix

Most unary operators and keywords can be used postfix as well.

  • .if {}
  • .if {} else {}
  • .match {}
  • .for {}
  • .* for dereference
  • .& for pointer to
  • .&mut for mutable pointer to
  • .! for negation
  • .$() for builtins, like cast, size_of, etc.
    • .$cast(T): convert to T, like an int to float cast, or an int widening cast
    • .$ptr_cast<T>(): cast a pointer like *T to *U
    • .$bit_cast<T>(): reinterpret the bits, like from u32 to f32
    • .$size_of(): size of a type
    • .$align_of(): alignment of a type
    • .$call(func): call a function or closure in a unified syntax

Combined with everything being an expression, match, and having methods, this makes it much easier to write programs in a very fluid style.

Furthermore, and perhaps most importantly in practice, this makes autocompletion vastly better, because an IDE can narrow down what you may type next based on the type of the previous expression. This can't be done with prefix operators and functions (rather than methods). You get to think in one forward direction, rather than having to jump from some prefix keywords to some postfix methods and fields.

Table of Contents

Slices

C* also has slices. These are a pointer and length, and are much preferred to passing the pointer and length separately, like you usually have to do in C.

They are implemented like this (not actually, but similarly):

struct Slice<T> {
    ptr: *T,
    len: usize,
}

But they can be written as *[T]. Actually, slices are unsized types, so their type is just [T], but usually *[T] is used and that is what's equivalent to the above Slice<T>.

Unlike pointers like *T, slices can be indexed. By default, using the indexing operator, this is bounds checked for safety, but there are also unchecked methods for indexing. Usually, though, bounds checking can be elided during sequential iteration, so the performance hit is minimal, and can be side-stepped if really needed.

Slices can also be sliced to create subslices by indexing them with a range (e.g. [1..10] or [1..]). Again, this is bounds checked by default.
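
Rust slices are likewise a pointer plus a length, with bounds-checked indexing and range-based subslicing, so a Rust sketch can illustrate the behavior described here. The function names are hypothetical, and middle assumes the slice has at least two elements.

```rust
// Checked access: returns None instead of panicking when out of bounds.
fn third(values: &[u8]) -> Option<u8> {
    values.get(2).copied()
}

// Subslicing with a range; the result borrows from the original slice.
// Assumes values.len() >= 2, otherwise the bounds check fails.
fn middle(values: &[u8]) -> &[u8] {
    &values[1..values.len() - 1]
}

// The length travels with the slice, so no separate length argument is needed.
fn total(values: &[u8]) -> u32 {
    values.iter().map(|&v| u32::from(v)).sum()
}
```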

Table of Contents

Monadic Error-Handling

There are no exceptions in C*, just like C. It uses return values for error handling, similarly to C. But C* has much better support for this using the Option and Result types.

The definitions of these types are:

enum Option<T> {
    None,
    Some(T),
}

enum Result<T, E> {
    Ok(T),
    Err(E),
}

That is, Option represents an optional value, and Result represents either a successful Ok value or an error Err value.

There is special syntactic support for using these two monadic types for error-handling using the .? postfix operator in try blocks:

struct IndexError {
    index: usize,
}

fn get_by_index<T>(a: *[T], i: usize): Result<T, IndexError> = {
    if (i < a.len()) {
        Ok(a[i])
    } else {
        Err(IndexError {index: i})
    }
}

struct IndexPair<T> {
    first: T,
    second: T,
}

fn get_two_by_index<T>(a: *[T], i: usize, j: usize): Result<IndexPair<T>, IndexError> try = {
    let first = try {
        get_by_index(a, i).?
    };
    let second = get_by_index(a, j).?;
    IndexPair {first, second}
}

This desugars to

fn get_two_by_index<T>(a: *[T], i: usize, j: usize): Result<IndexPair<T>, IndexError> = {
    let first = try {
        get_by_index(a, i).match {
            Ok(i) => i,
            Err(e) => return Err(e),
        }
    };
    let second = get_by_index(a, j).match {
        Ok(i) => i,
        Err(e) => return Err(e),
    };
    Ok(IndexPair {first, second})
}

As you can see, without the .? try operator and try blocks, doing all the error handling with just match quickly becomes tedious. This is also kind of like monadic do notation, except that in C* it is limited to just the monads Option<T> and Result<T, E> (over T).

Note also that try blocks can be specified at the function level as well as normal blocks.
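
Rust's ? operator behaves much like the .? described above, so the pair lookup can be sketched in Rust for comparison. This is an analogue, not C* code: it adds a Copy bound that the C* version does not mention, and the names mirror the example only for readability.

```rust
#[derive(Debug, PartialEq)]
struct IndexError {
    index: usize,
}

#[derive(Debug, PartialEq)]
struct IndexPair<T> {
    first: T,
    second: T,
}

fn get_by_index<T: Copy>(a: &[T], i: usize) -> Result<T, IndexError> {
    a.get(i).copied().ok_or(IndexError { index: i })
}

fn get_two_by_index<T: Copy>(
    a: &[T],
    i: usize,
    j: usize,
) -> Result<IndexPair<T>, IndexError> {
    // Each ? returns early with the Err, or yields the Ok value.
    let first = get_by_index(a, i)?;
    let second = get_by_index(a, j)?;
    Ok(IndexPair { first, second })
}
```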

Table of Contents

Uncatchable Panics

While monadic error-handling with Option and Result is usually superior, there are still cases where you have unrecoverable errors (maybe you don't want to handle out of memory conditions), or where you'd rather just end the program than handle the error. In this case, you can panic, which will print an error message and immediately abort.

To do this with an Option or Result, you can just call .unwrap(), which will panic if it was None or Err and return the Some or Ok value.

There is no language-supported unwinding. abort is immediately called after a panic, and only the OS cleans things up. Nothing is stopping you from calling setjmp and longjmp from C, but no unwinding of defer statements is done, and it may result in undefined behavior. There is no undefined behavior, however, in a normal panic because you just simply abort.

Table of Contents

Operator Precedence

The table below shows the operator precedence for binary and unary operators from lowest precedence to highest precedence.

| Operator | Description | Associativity |
| --- | --- | --- |
| ; | sequencing | Left |
| = | assignment | Right |
| . | access | Left |
| \| | or | Left |
| & | and | Left |
| == != | equality/inequality | Left |
| < > <= >= | comparison | Left |
| + - | addition/subtraction | Left |
| * / | multiplication/division | Left |
| - | negation | Right |
| ! | logical NOT | Right |
| ? | conditional | Left |

In C*, generics have a higher precedence than comparison, thus removing ambiguity from "< >".

Table of Contents

Examples

Table of Contents

GCD

Here is how you write simple algorithms like GCD in C*:

fn gcd(a: i64, b: i64): i64 = {
    (fn gcd(a: u64, b: u64): u64 = {
        match b {
            0 => a,
            _ => gcd(b, a % b),
        }
    })(a.abs(), b.abs()).$cast(i64)
}
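
For comparison, here is the same Euclidean algorithm as a Rust sketch. It deviates from the C* example in two ways, both assumptions of this sketch: the helper is a separate named function rather than a nested function literal, and the result stays unsigned so that inputs such as the minimum i64 cannot overflow on the way back.

```rust
// Euclid's algorithm on unsigned values: gcd(a, 0) = a.
fn gcd_unsigned(a: u64, b: u64) -> u64 {
    match b {
        0 => a,
        _ => gcd_unsigned(b, a % b),
    }
}

// Signed entry point: take absolute values first.
fn gcd(a: i64, b: i64) -> u64 {
    gcd_unsigned(a.unsigned_abs(), b.unsigned_abs())
}
```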

Table of Contents

Systems Programming

Here is an example program in C* for part of a simple HTTP/1.0 server, equivalent to part0 of hw3 in Jae's OS class (https://gist.github.com/RyanLee64/hash-redacted). It showcases many of C*'s notable features, like enums, methods, generics, defer, expression-orientedness, postfix operators, pattern matching, closures, monadic error handling, and byte, c, and format strings.

That code (the ported part) is ~230 LOC, while the C* below is only ~80 LOC, and it is more correct in error handling and edge cases, faster in places (though IO dominates here), and the business logic stands out more (while less important aspects like errors, resource cleanup, allocations, and string handling stay in the background). That is, C* allows you to be simultaneously more expressive while still staying correct and explicit, and the performance is just as good if not better.

enum Status {
    Ok,
    NotImplemented,
    BadRequest,
    // rest skipped for brevity
}

struct RequestLine {
    method: *[u8],
    uri: *[u8],
    version: *[u8],
}

impl RequestLine {
    fn check(self: *Self): Result<(), Status> try = {
        let Self {method, uri, version} = self.*;
        match (method, version) {
            (b"GET", b"HTTP/1.0" | b"HTTP/1.1") => {},
            _ => Err(Status.NotImplemented).?,
        }
        if uri.starts_with(b'/').! || uri.equals(b"/..") || uri.contains(b"/../") {
            Err(Status.BadRequest).?;
        }
    }
}

fn main(): Result<(), AnyError> try = {
    let (port, web_root) = std.env.argv().match {
        [_, port, web_root] => (port.parse<u16>().?, web_root),
        [program, ...] => Err(f"usage: {program} <server_port> <web_root>").?,
    };
    let server_socket = Socket.new(PF_INET, SOCK_STREAM, IPPROTO_TCP).?;
    defer server_socket.&.close();
    server_socket.&.bind(SocketAddr {
        family: AF_INET,
        addr: InetAddr {
            addr: INADDR_ANY.to_be(),
        },
        port: port.to_be(),
    }).?;
    server_socket.&.listen(5).?;
    let mut request_line_buf = Vec.new();
    defer request_line_buf.free();
    let mut line_buf = Vec.new();
    defer line_buf.free();
    loop try {
        let client_socket = server_socket.&.accept().?;
client_socket_close:
        defer client_socket.&.close();
        let mut client_stream = fdopen(client_socket.fd, c"r").?;
        undefer client_socket_close; // stream (`FILE *` in C) takes ownership
        defer client_stream.&.close();
        let line_or_status = try {
            // read and parse request line
            let line = client_stream.&mut.read_line(request_line_buf.&mut)
                .map_err(fn(_) Status.BadRequest).?
                .split(fn(b) " \t\r\n".contains(b)).match {
                    [method, uri, version] => RequestLine { method, uri, version },
                    _ => Err(Status.NotImplemented).?,
                };
            line.&.check().?;
            // read headers, skip them
            loop {
                client_stream.&mut.read_line(line_buf.&mut)
                    .map_err(fn(_) Status.BadRequest).?
                    .match {
                        "\n" | "\r\n" => break,
                        _ => {},
                    }
            }
            line
        };
        let (line, status) = match line_or_status {
            Ok(line) => (line, Status.Ok),
            Err(status) => (RequestLine { method: b"", uri: b"", version: b"" }, status),
        };
        client_socket.write(f"HTTP/1.0 {status.code()} {status.reason()}\r\n\r\n").?;
        match line_or_status {
            Ok(_) => handle_request(web_root, line.uri, client_socket).?,
            Err(_) => client_socket.write(f"<html><body>\n<h1>{status.code()} {status.reason()}</h1>\n</body></html>").?,
        }
        eprintln(f"{client_socket.addr} \"{line.method} {line.uri} {line.version}\" {status.code()} {status.reason()}").?;
    }
}
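
To make the validation rules in RequestLine.check concrete, here is a minimal C sketch of the same checks on NUL-terminated strings. It is an illustration under stated assumptions, not the original homework code: the function name check_request_line and the enum constants are hypothetical, and the numeric values are the standard HTTP status codes for these three statuses.

```c
#include <string.h>

/* Hypothetical status enum; values are the standard HTTP status codes. */
enum status {
    STATUS_OK = 200,
    STATUS_BAD_REQUEST = 400,
    STATUS_NOT_IMPLEMENTED = 501
};

/* Mirrors the checks in RequestLine.check:
 *  - only GET with HTTP/1.0 or HTTP/1.1 is implemented;
 *  - the URI must start with '/', must not equal "/..",
 *    and must not contain "/../". */
enum status check_request_line(const char *method, const char *uri,
                               const char *version) {
    if (strcmp(method, "GET") != 0) {
        return STATUS_NOT_IMPLEMENTED;
    }
    if (strcmp(version, "HTTP/1.0") != 0 && strcmp(version, "HTTP/1.1") != 0) {
        return STATUS_NOT_IMPLEMENTED;
    }
    if (uri[0] != '/' || strcmp(uri, "/..") == 0 ||
        strstr(uri, "/../") != NULL) {
        return STATUS_BAD_REQUEST;
    }
    return STATUS_OK;
}
```

Each failure path here is an explicit early return, which is the role the `.?` operator plays in the C* version.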

Table of Contents