edqx/microwave

Microwave

A TOML parser for Zig.

The parser aims to be spec compliant: it accepts all valid TOML, but does not yet reject every invalid document (see Spec Compliancy).

Features

  • Parse all spec-compliant TOML documents.
    • WIP: parses all valid TOML files, but also parses some invalid ones, see Spec Compliancy
  • Use Zig readers and writers
  • Populate structs
  • Populate dynamic values
  • TOML builder/write stream
  • Stringify entire structs and tables

TODO

These features are yet to be implemented, and are actively being worked on, in order of severity:

  • Check files for invalid control sequences and characters
  • Fix parsing issues related to keys and tables being re-defined
  • Check integer literals against the spec (leading zeroes are currently allowed)

Usage

Microwave has 5 sets of APIs:

Parser API

Microwave allows you to parse an entire TOML file from either a slice or reader into a tree-like structure that can be traversed, inspected or modified manually.

From Slice

const document = try microwave.parse.fromSlice(allocator, toml_text);
defer document.deinit(); // all pointers will be freed using the created internal arena

// use document.root_table

From Reader

const document = try microwave.parse.fromReader(allocator, file.reader());
defer document.deinit();

// use document.root_table

Owned Allocations API

If you would like to personally own all of the pointers without creating an arena for them, use the *Owned variation of the functions.

These return a parse.Value.Table directly, representing the root table of the TOML file.

The best way to free the resulting root table is to use parse.deinitTable.

var owned_tree = try microwave.parse.fromSliceOwned(allocator, toml_text); // or .fromReaderOwned
defer microwave.parse.deinitTable(allocator, &owned_tree);

// use owned_tree

Value API

pub const Value = union(enum) {
    pub const Table = struct {
        keys: std.StringArrayHashMapUnmanaged(Value),
    };

    pub const Array = std.ArrayListUnmanaged(Value);
    pub const ArrayOfTables = std.ArrayListUnmanaged(Table);

    pub const DateTime = struct {
        date: ?[]const u8 = null,
        time: ?[]const u8 = null,
        offset: ?[]const u8 = null,

        pub fn dupe(self: DateTime, allocator: std.mem.Allocator) !DateTime;
        pub fn deinit(self: DateTime, allocator: std.mem.Allocator) void;
    };

    none: void,
    table: Table,
    array: Array,
    array_of_tables: ArrayOfTables,
    string: []const u8,
    integer: i64,
    float: f64,
    boolean: bool,
    date_time: DateTime,

    pub fn dupeRecursive(self: Value, allocator: std.mem.Allocator) !Value;
    pub fn deinitRecursive(self: *Value, allocator: std.mem.Allocator) void;
};
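As a rough sketch of how this union might be inspected, the helper below looks up a key in a table and returns its payload only when the value is a string. The module name `microwave` in the import and the helper name `getString` are assumptions made for illustration; only the `keys` field and the `.string` variant come from the definition above.

```zig
const microwave = @import("microwave"); // assumed module name

/// Hypothetical helper: returns the string stored under `key`,
/// or null if the key is absent or holds a non-string value.
fn getString(table: microwave.parse.Value.Table, key: []const u8) ?[]const u8 {
    // `keys` is a std.StringArrayHashMapUnmanaged(Value), so `get` returns ?Value.
    const value = table.keys.get(key) orelse return null;
    return switch (value) {
        .string => |s| s,
        else => null,
    };
}
```

With a parsed document, this could be called as `getString(document.root_table, "name")`. The same `switch` pattern extends to the other variants, for example `.integer => |i| ...` for an `i64` payload.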

Populate API

It's often helpful to map a TOML file directly onto a Zig struct, for example for config files. Microwave lets you do this using the Populate(T) API:

const Dog = struct {
    pub const Friend = struct {
        name: []const u8,
    };

    name: []const u8,
    cross_breeds: []const []const u8,
    age: i64,

    friends: []Friend,
    
    vet_info: microwave.parse.Value.Table,
};

const dog = try microwave.Populate(Dog).createFromSlice(allocator, toml_text); // or .createFromReader
defer dog.deinit();
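For reference, here is a hypothetical `toml_text` that lines up with the `Dog` struct, written as a Zig multiline string. The values are made up, and the sketch assumes struct field names map one-to-one onto TOML keys, with `friends` filled from `[[friends]]` entries and `vet_info` accepting any keys.

```zig
const std = @import("std");

// Hypothetical input for the `Dog` struct; every value here is illustrative.
const toml_text =
    \\name = "Barney"
    \\cross_breeds = ["Labrador", "Poodle"]
    \\age = 16
    \\
    \\[vet_info]
    \\clinic = "Example Vets"
    \\last_visit = 2025-04-19
    \\
    \\[[friends]]
    \\name = "Rex"
    \\
    \\[[friends]]
    \\name = "Luna"
;

pub fn main() void {
    // Print the document so it can be inspected or piped into a file.
    std.debug.print("{s}\n", .{toml_text});
}
```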

Struct Shape

Since TOML only supports a subset of the types that are available in Zig, your destination struct must consist of the following types:

TOML Type        Zig Type               Examples
String           []const u8             "Barney"
Float            f64                    5.0e+2
Integer          i64, f64               16
Boolean          bool                   true, false
Date/Time        parse.Value.DateTime   2025-04-19T00:43:00.500+05:00
Specific Table   struct { ... }         { name = "Barney", age = 16 }
Array of Tables  []struct {}            [[pet]]
Inline Array     []T                    ["Hello", "Bonjour", "Hola"]
Any Table        parse.Value.Table      Any TOML table
Any Value        parse.Value            Any TOML value

You can also accept one of several alternative types by using a union. For example:

const Animal = union(enum) {
    dog: struct {
        name: []const u8,
        breed: []const u8,
    },
    cat: struct {
        name: []const u8,
        number_of_colours: i64,
    },
};

const animal = try microwave.Populate(Animal).createFromSlice(allocator, toml_text);
defer animal.deinit();

If a field is entirely optional and may not exist, use the Zig optional indicator (?) on its type, for example:

const Person = struct {
    name: []const u8,
    age: i64,
    salary: f64,
    job: ?[]const u8, // can be missing from the TOML file
};

const person = try microwave.Populate(Person).createFromSlice(allocator, toml_text);
defer person.deinit();

Owned Allocations API

Like the parser API, you might want to own the pointers yourself rather than delegate them to an arena. You can use the *Owned variations of the functions.

These return the value directly.

You can free the data in the returned value however you want, but if you're using a stack-based allocator such as an arena or fixed buffer allocator, it's best to use Populate(T).deinitRecursive.

var dog = try microwave.Populate(Dog).createFromSliceOwned(allocator, toml_text);
defer microwave.Populate(Dog).deinitRecursive(allocator, &dog);

'Into' API

Instead of making Microwave create the value to populate, you can provide it with a pointer to an existing one to populate using the into* functions:

var dog: Dog = undefined;
try microwave.Populate(Dog).intoFromSliceOwned(allocator, &dog); // or .intoFromReaderOwned
defer microwave.Populate(Dog).deinitRecursive(allocator, &dog);

Stringify API

Microwave makes a best-effort attempt to serialise a given struct value or parse.Value.Table to a writer:

try microwave.stringify.write(allocator, dog, file.writer());
try microwave.stringify.writeTable(allocator, root_table, file.writer());

Note

There's no need to deinit anything; the allocator is only used for temporary allocations.

Write Stream API

You can build a TOML file manually, with safety assertions that the file is well-formed, using the write stream API:

var stream: microwave.write_stream.Stream(@TypeOf(file.writer()), .{
    .newlines = .lf,
    .unicode_full_escape_strings = false,
    .format_float_options = .{
        .mode = .scientific,
        .precision = null,
    },
    .date_time_separator = .t,
}) = .{
    .underlying_writer = file.writer(),
    .allocator = allocator,
};
defer stream.deinit();

You can use the following functions on the write_stream.Stream struct to build your TOML file:

pub fn beginDeepKeyPair(self: *Stream, key_parts: []const []const u8) !void;
pub fn beginKeyPair(self: *Stream, key_name: []const u8) !void;

pub fn writeString(self: *Stream, string: []const u8) !void;
pub fn writeInteger(self: *Stream, integer: i64) !void;
pub fn writeFloat(self: *Stream, float: f64) !void;
pub fn writeBoolean(self: *Stream, boolean: bool) !void;
pub fn writeDateTime(self: *Stream, date_time: parse.Value.DateTime) !void;

pub fn beginArray(self: *Stream) !void;
pub fn arrayLine(self: *Stream) !void;
pub fn endArray(self: *Stream) !void;

pub fn beginInlineTable(self: *Stream) !void;
pub fn endInlineTable(self: *Stream) !void;

pub fn writeDeepTable(self: *Stream, key_parts: []const []const u8) !void;
pub fn writeTable(self: *Stream, key_name: []const u8) !void;

pub fn writeDeepManyTable(self: *Stream, key_parts: []const []const u8) !void;
pub fn writeManyTable(self: *Stream, key_name: []const u8) !void;
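A minimal sketch of how these calls might be sequenced is shown below. It assumes that each key/value pair is a `beginKeyPair` followed by exactly one value, that `writeTable` starts a `[table]` section, and that `writeManyTable` starts a `[[table]]` entry. None of that ordering is confirmed by the list above, so treat it as illustrative. The function name `writeDog` is hypothetical, and it takes `anytype` so the snippet stands on its own.

```zig
/// Hypothetical helper: writes a small document through any value that
/// exposes the write stream functions listed above (pass `&stream`).
fn writeDog(stream: anytype) !void {
    // Top-level key/value pairs: begin the key, then write one value.
    try stream.beginKeyPair("name");
    try stream.writeString("Barney");

    try stream.beginKeyPair("age");
    try stream.writeInteger(16);

    // An array value is opened, filled with values, then closed.
    try stream.beginKeyPair("cross_breeds");
    try stream.beginArray();
    try stream.writeString("Labrador");
    try stream.writeString("Poodle");
    try stream.endArray();

    // Assumed to start a table section; later pairs belong to it.
    try stream.writeTable("vet_info");
    try stream.beginKeyPair("clinic");
    try stream.writeString("Example Vets");

    // Assumed to start one entry of an array of tables.
    try stream.writeManyTable("friends");
    try stream.beginKeyPair("name");
    try stream.writeString("Rex");
}
```

With the stream initialised as above, the call would be `try writeDog(&stream);`.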

Scanner API

As a low level API, Microwave also provides the ability to scan through a file and iterate through individual tokens.

Only basic state checks are done at this stage, and you have to manage that state yourself. The scanner does not guarantee that the file is well-formed TOML; most of those checks are done in the parsing stage.

Whole Slice Scanner

If you have access to the entire slice of the TOML file, you can initialise the scanner directly:

var scanner: microwave.Scanner = .{ .buffer = slice };

while (try scanner.next()) |token| {
    // token.kind, token.range.start, token.range.end

    // modify state with scanner.setState(state)
}

The default scanner may return any of the following errors:

pub const Error = error{ UnexpectedEndOfBuffer, UnexpectedByte };

Buffered Reader Scanner

You can also tokenise the TOML file using a reader:

var scanner = microwave.Scanner.bufferedReaderScanner(file.reader());

// use scanner.next() in the same way

The buffered reader scanner may return any of the following errors:

pub const Error = error{ UnexpectedEndOfBuffer, UnexpectedByte, BufferTooSmall };

Managing State

A TOML file can be tokenised differently depending on what kind of entities need to be read. The scanner API doesn't manage this for you, but with your own reading logic you can update the state of the scanner using the scanner.setState function:

while (try scanner.next()) |token| {
    if (token.kind == .table_start) {
        scanner.setState(.table_key);
    }
    if (token.kind == .table_end) {
        scanner.setState(.root);
    }
}

The valid states are listed below:

State Name       Enum Value        Description
Root             .root             Either ordinary newline-separated keys, or [table] and [[many table]] structures
Table Key        .table_key        The keys inside [ ... ] and [[ ... ]]
Inline Key       .inline_key       Delimiter-separated inline table keys
Value            .value            An ordinary value literal, array or inline table opening token
Array Container  .array_container  Same as .value, but can process array close tokens and value delimiters

The default state is .root.

Handling Errors

When encountering an error, you can use scanner.cursor() to get the file offset that it occurred at.
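One way this might look in practice is sketched below: a loop that drains the scanner and logs the offset when `next()` fails. The helper name `countTokens` is hypothetical. It takes `anytype` so it works with either scanner variant, and it assumes `cursor()` returns an integer offset.

```zig
const std = @import("std");

/// Hypothetical helper: counts tokens, logging the byte offset on failure.
/// Pass a pointer to the scanner, e.g. `countTokens(&scanner)`.
fn countTokens(scanner: anytype) !usize {
    var count: usize = 0;
    while (scanner.next() catch |err| {
        // cursor() reports where in the file the scanner stopped.
        std.log.err("scan failed at offset {d}: {s}", .{ scanner.cursor(), @errorName(err) });
        return err;
    }) |_| {
        count += 1;
    }
    return count;
}
```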

If you encounter error.BufferTooSmall while using the buffered reader scanner, you can increase the size of the buffer for your project by instantiating Scanner.BufferedReaderScanner directly:

var scanner: microwave.Scanner.BufferedReaderScanner(8192, @TypeOf(file.reader())) = .{
    .reader = file.reader(),
};

Token Contents

To access the contents of a token, you can use the scanner.tokenContents function:

while (try scanner.next()) |token| {
    if (token.kind == .string) {
        std.log.info("Found string! {s}", .{ scanner.tokenContents(token) });
    }
}

Note

For the buffered reader scanner, previous token contents may be invalidated at any point while iterating.

Related Projects

Check out my other project, dishwasher, for parsing XML files.

Why 'Microwave'?

Not sure.

Spec Compliancy

See the tests folder to check Microwave against the various official TOML test cases.

All failing tests are false positives: Microwave can read every valid TOML file in the test suite, but it also accepts some invalid ones.

- fail: invalid/control/bare-cr.toml
- fail: invalid/control/comment-cr.toml
- fail: invalid/control/comment-del.toml
- fail: invalid/control/comment-ff.toml
- fail: invalid/control/comment-lf.toml
- fail: invalid/control/comment-null.toml
- fail: invalid/control/comment-us.toml
- fail: invalid/control/multi-cr.toml
- fail: invalid/control/multi-del.toml
- fail: invalid/control/multi-lf.toml
- fail: invalid/control/multi-null.toml
- fail: invalid/control/multi-us.toml
- fail: invalid/control/rawmulti-cr.toml
- fail: invalid/control/rawmulti-del.toml
- fail: invalid/control/rawmulti-lf.toml
- fail: invalid/control/rawmulti-null.toml
- fail: invalid/control/rawmulti-us.toml
- fail: invalid/encoding/bad-codepoint.toml
- fail: invalid/encoding/bad-utf8-in-comment.toml
- fail: invalid/encoding/bad-utf8-in-multiline-literal.toml
- fail: invalid/encoding/bad-utf8-in-string-literal.toml
- fail: invalid/float/leading-zero.toml
- fail: invalid/float/leading-zero-neg.toml
- fail: invalid/float/leading-zero-plus.toml
- fail: invalid/inline-table/duplicate-key-3.toml
- fail: invalid/inline-table/overwrite-02.toml
- fail: invalid/inline-table/overwrite-05.toml
- fail: invalid/inline-table/overwrite-08.toml
- fail: invalid/integer/leading-zero-1.toml
- fail: invalid/integer/leading-zero-2.toml
- fail: invalid/integer/leading-zero-3.toml
- fail: invalid/integer/leading-zero-sign-1.toml
- fail: invalid/integer/leading-zero-sign-2.toml
- fail: invalid/integer/leading-zero-sign-3.toml
- fail: invalid/spec/inline-table-2-0.toml
- fail: invalid/spec/table-9-0.toml
- fail: invalid/spec/table-9-1.toml
- fail: invalid/table/append-with-dotted-keys-1.toml
- fail: invalid/table/append-with-dotted-keys-2.toml
- fail: invalid/table/duplicate.toml
- fail: invalid/table/duplicate-key-dotted-table.toml
- fail: invalid/table/duplicate-key-dotted-table2.toml
- fail: invalid/table/redefine-2.toml
- fail: invalid/table/redefine-3.toml
- fail: invalid/table/super-twice.toml
passing: 512/557

License

All microwave code is under the MIT license.
