Skip to content

ndantam/sycamore

Repository files navigation

SYCAMORE

Fast, purely functional data structures in Common Lisp.

API Documentation: http://ndantam.github.io/sycamore

Comprehensive Set Benchmarks: https://ndantam.github.io/sycamore/bench.html

Features

Installation

Examples

See also ./example.lisp

Hash Sets

Create a set with default EQL test:

CL-USER> (sycamore:make-hash-set)

#<SYCAMORE:HASH-SET {}>

Create a set with default EQL test and initial elements from a list:

CL-USER> (sycamore:list-hash-set '(1 2 -10 40))

#<SYCAMORE:HASH-SET {40 1 -10 2}>

Create a set with EQUALP test:

CL-USER> (sycamore:list-hash-set '(1 1.0 2.0 2) :test #'equalp)

#<SYCAMORE:HASH-SET {1 2.0}>

Insertion:

CL-USER> (sycamore:hash-set-insert (sycamore:list-hash-set '(1 2))
                                   0)
#<SYCAMORE:HASH-SET {1 2 0}>

Lookup:

CL-USER> (sycamore:hash-set-find (sycamore:list-hash-set '(0 1 2))

0
T

Removal:

CL-USER> (sycamore:hash-set-remove (sycamore:list-hash-set '(1 2 0))
                                   0)

#<SYCAMORE:HASH-SET {1 2}>

Union:

CL-USER> (sycamore:hash-set-union (sycamore:list-hash-set '(1 2 0))
                                  (sycamore:list-hash-set '(1 0 3)))


#<SYCAMORE:HASH-SET {3 1 2 0}>

Intersection:

CL-USER> (sycamore:hash-set-intersection (sycamore:list-hash-set '(1 2 0))
                                         (sycamore:list-hash-set '(1 0 3)))

#<SYCAMORE:HASH-SET {1 0}>

Difference:

CL-USER> (sycamore:hash-set-difference (sycamore:list-hash-set '(1 2 0))
                                       (sycamore:list-hash-set '(1 0 3)))

#<SYCAMORE:HASH-SET {2}>

Map set:

CL-USER> (sycamore:map-hash-set 'list #'1+
                                (sycamore:list-hash-set '(1 2 0)))

(2 3 1)

Fold set:

CL-USER> (sycamore:fold-hash-set (lambda (list item) (cons item list))
                                 nil
                                 (sycamore:list-hash-set '(1 2 0)))

(1 2 0)

Tree Sets

Define an ordering function:

CL-USER> (defun compare (a b)
           (cond ((< a b) -1)
                 ((> a b) 1)
                 (t 0)))

COMPARE

Create a set for integers:

CL-USER> (sycamore:tree-set #'compare 1 2 -10 40)

#<TREE-SET {-10 1 2 40}>

Insertion:

CL-USER> (sycamore:tree-set-insert (sycamore:tree-set #'compare 1 2)
                                   0)
#<TREE-SET {0 1 2}>

Lookup:

CL-USER> (sycamore:tree-set-find (sycamore:tree-set #'compare 1 2 0)
                                 0)
0
T

Removal:

CL-USER> (sycamore:tree-set-remove (sycamore:tree-set #'compare 1 2 0)
                                   0)
#<TREE-SET {1 2}>

Union operation:

CL-USER> (sycamore:tree-set-union (sycamore:tree-set #'compare 1 2)
                                  (sycamore:tree-set #'compare 1 0 3))

#<TREE-SET {0 1 2 3}>

Intersection operation:

CL-USER> (sycamore:tree-set-intersection (sycamore:tree-set #'compare 1 2)
                                         (sycamore:tree-set #'compare 1 0 3))

#<TREE-SET {1}>

Difference operation:

CL-USER> (sycamore:tree-set-difference (sycamore:tree-set #'compare 1 2)
                                       (sycamore:tree-set #'compare 1 0 3))

#<TREE-SET {2}>

Map set:

CL-USER> (sycamore:map-tree-set 'list #'1+
                                (sycamore:tree-set #'compare 1 0 10 2))

(1 2 3 11)

Fold set:

CL-USER> (sycamore:fold-tree-set (lambda (list item) (cons item list))
                                 nil
                                 (sycamore:tree-set #'compare 1 0 10 2))

(10 2 1 0)

Ropes

Create a Rope:

CL-USER> (sycamore:rope "Hello" #\Space 'World!)

#<ROPE "Hello WORLD!">

Also works on lists:

CL-USER> (sycamore:rope (list "Hello" #\Space 'World!))

#<ROPE "Hello WORLD!">

And arrays:

CL-USER> (sycamore:rope (vector "Hello" #\Space 'World!))

#<ROPE "Hello WORLD!">

Rope to string:

CL-USER> (sycamore:rope-string (sycamore:rope "Hello" #\Space 'World!))

"Hello WORLD!"

Print a rope:

CL-USER> (sycamore:rope-write (sycamore:rope "Hello" #\Space 'World!)
                              :escape nil :stream *standard-output*)

Hello WORLD!

What collection data structure should I use?

Concept of Operations

  • Two classes of set and map collections: hashes and trees.
  • Hashes require a hash function and an equality function. ANSI CL provides SXHASH and EQ/EQL/EQUAL/EQUALP/=.
  • Trees require a total order (for any two elements A and B: does A come before B, does B come before A, or are A and B equal?)
  • Hashes offer better asymptotic (big-O) and constant factor running times (O(1)) than trees (O(log(n))).
  • In some less-typical cases (e.g., adversarial key selection, very large collections on 32 bit machines), hash collisions could reduce hash performance, while balanced trees guarantee consistent performance in such cases.

Q/A

  • Q: Is your collection tiny or used outside a performance-critical path?

    • Yes: Use a list. Or whatever you care to. For collections that are small and/or not performance-critical, it probably won't much matter.

    • No: Let's discuss!

  • Q: Are you very certain you will only ever have one version of your structure that you need? (no sharing between modules, no backtracking, maybe used only within a single, small function)?

    • Yes: A mutable structure such as ANSI CL hash-tables may offer the best performance.

    • No: Let's discuss!

  • Q: Are your elements/keys hashable and equality-comparable (lisp primitives and lists are hashable/equality-comparable), and are keys not adversarially selected?

    • Yes: Hash Array Mapped Tries (HAMTs) could offer the best performance.

    • No, I either don't have a hashing function or my keys could be adversarially selected.

  • Q: Are your keys adversarially selected but able to be cryptographically hashed?

    • Yes: In principle, HAMTs using a cryptographic hashing function would be robust to adversarially selected keys. However, strong cryptographic hashes require >64 bits, while the current HAMT implementation in Sycamore assumes that hash-codes are FIXNUMs (aligning with ANSI CL SXHASH). This limitation may be addressed in the future by adding support for larger (e.g., bit-vector) hash-codes, but for now WB-trees may be best for adversarial cases.
  • Q: Are your keys not hashable but totally ordered (for any two elements A and B: you can decide if A and B are equal or if A comes before B, or vice versa)?

    • Yes: Weight-balanced trees could offer good performance. They may be slighter slower than HAMTs, but require comparison rather than hashing functions.

Alternatives

There are many other Common Lisp data structure libraries. Here are a few alternatives and their trade-offs relative to Sycamore.

ANSI CL Hash Tables

Mutable hash tables are very fast. If you don't need persistence or set operations (union, intersection, difference, subset), then a mutable hash table could be a good choice. Included benchmarks show that SBCL's hash tables are around twice as fast as Sycamore HAMTs for construction and lookup.

FSet

https://gitlab.common-lisp.net/fset/fset

FSet implements finite sets using a data structure similar to (and which in part inspired) Sycamore's WB-tree implementation. FSet trees support partial orders, whereas Sycamore trees assume a total order (Sycamore HAMTs don't require any order). FSet uses a single, global ordering for elements via a generic function, while Sycamore orders elements with a per-collection COMPARE parameter or hashes elements with per-collection HASH-FUNCTION and TEST parameters (defaulting to ANSI CL SXHASH and EQL).

Included benchmarks show that Sycamore WB-trees are 1.5-3x faster than FSET, and Sycamore HAMTs are 2-10x faster than FSet.

CL-HAMT

https://github.com/danshapero/cl-hamt

CL-HAMT is an implementation of hash array mapped tries. Included benchmarks show that Sycamore HAMTs are 10x faster than CL-HAMT.

CL-Containers

https://github.com/gwkkwg/cl-containers

CL-Containers is stateful/mutable/imperative, while Sycamore is purely-functional/persistent.

Lisp Interface Library (LIL)

https://github.com/fare/lisp-interface-library

Lisp Interface Library (LIL) provides abstracted data structures using Interface Passing Style, while Sycamore provides a few concrete data structures. LIL's Interface Passing Style presumably improves flexibility at the cost of runtime overhead and API complexity. Sycamore's explicit data structures have low-overhead, optimized implementations and a simple, documented API.

References

  • Okasaki, Chris. "Purely Functional Data Structures." Cambridge University Press. June 1999. ISBN: 978-0521663502.

  • Boehm, H.J., Atkinson, R. and Plass, M., 1995. Ropes: an alternative to strings. Software: Practice and Experience, 25(12), pp.1315-1330.

  • Adams, Stephen., 1992. Implementing sets efficiently in a functional language. University of Southampton. Tech Report CSTR 92-10.

  • Bagwell, Phil. Ideal Hash Trees. EPFL Technical Report, 2001.

  • Michael J. Steindorfer and Jurgen J. Vinju. 2015. Optimizing hash-array mapped tries for fast and lean immutable JVM collections. SIGPLAN Not. 50, 10 (October 2015), 783–800. https://doi.org/10.1145/2858965.2814312

Name

http://en.wikipedia.org/wiki/Platanus_occidentalis

Oh, the moonlight's fair tonight along the Wabash,
From the fields there comes the breath of newmown hay.
Through the sycamores the candle lights are gleaming,
On the banks of the Wabash, far away.

    - Paul Dresser
      "On the Banks of the Wabash, Far Away"
      1897

About

A fast, purely functional data structure library in Common Lisp.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages