Testdriving OLLAMA

2023-11-21, Leipzig Gophers Meetup #38, Martin Czygan, L Gopher, Open Data Engineer at IA

Short talk about running local models, using Go tools.

Personal Timeline

"What a difference a week makes"

I am going to assert that Riley is the first Staff Prompt Engineer hired anywhere.

  • on 2023-02-14 (+9w), I ask at the Leipzig Python User Group how long it will take before we can run things locally -- personally, I expected a 2-5 year timeline
  • on 2023-04-18 (+9w), we discuss C/GO and ggml (ai-on-the-edge) at Leipzig Gophers #35
  • on 2023-07-20 (+13w), ollama is released (with two models), HN
  • on 2023-11-21 (+17w), today, 43 models (each with a couple of tags/versions)

Confusion

The Turing Test was proposed in 1950. From Nature, 2023-07-23: Understanding ChatGPT is a bold new challenge for science

This lack of robustness signals a lack of reliability in the real world.

What I cannot create, I do not understand.

Openness of models is not binary. From The Gradient of Generative AI Release: Methods and Considerations:

We propose a framework to assess six levels of access to generative AI systems:

  • fully closed
  • gradual or staged access
  • hosted access
  • cloud-based or API access
  • downloadable access
  • fully open

A prolific AI researcher (with 387K citations in the past 5 years) believes open-source AI is fine for less capable models: Open-Source vs. Closed-Source AI

For today, let's focus on Go. Go is a nice infrastructure language; which projects exist for model infrastructure?

  • we are going to look at one tool, from the outside and a bit from the inside

POLL

OLLAMA

  • first appeared in 07/2023 (~18 weeks ago)
  • very much inspired by docker; not images, but models
  • built on llama (meta) and the GGML ai-on-the-edge ecosystem, especially GGUF, a unified model file format
  • docker may be considered less a glorified nsenter and more (lots of) glue to get from spec to image to process, i.e. code lifecycle management; similarly, ollama may be a way to organize the AI "model lifecycle"
  • clean developer UX

Time-to-chat

From zero to chat in about 5 minutes, on a power-efficient CPU. Ollama started with 2 models; as of 11/2023 it hosts 43 models.

$ git clone git@github.com:jmorganca/ollama.git
$ cd ollama
$ go generate ./... && go build . # cp ollama ...

It follows a client-server model, like docker.

$ ollama serve

Once it is running, we can pull models.

$ ollama pull llama2
pulling manifest
pulling 22f7f8ef5f4c... 100% |..
pulling 8c17c2ebb0ea... 100% |..
pulling 7c23fb36d801... 100% |..
pulling 2e0493f67d0c... 100% |..
pulling 2759286baa87... 100% |..
pulling 5407e3188df9... 100% |..
verifying sha256 digest
writing manifest
removing any unused layers
success

Some examples

$ ollama run zephyr
>>> please complete: {"author": "Turing, Alan", "title" ... }

{
  "author": "Alan Turing",
  "title": "On Computable Numbers, With an Application to the Entscheidungsproblem",
  "publication_date": "1936-07-15",
  "journal": "Proceedings of the London Mathematical Society. Series 2",
  "volume": "42",
  "pages": "230–265"
}

Formatting mine.

More

The whole prompt engineering thing is kind of mysterious to me. Do you get better output by showing emotions, e.g. by appending something like "This is very important to my career." to a prompt? From a recent paper on emotional stimuli in prompts:

To this end, we first conduct automatic experiments on 45 tasks using various LLMs, including Flan-T5-Large, Vicuna, Llama 2, BLOOM, ChatGPT, and GPT-4.

Batch Mode

The server exposes an HTTP API; the route list from the startup log:

[GIN-debug] POST   /api/pull       --> gith...m/jmo...ma/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate   --> gith...m/jmo...ma/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/embeddings --> gith...m/jmo...ma/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST   /api/create     --> gith...m/jmo...ma/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push       --> gith...m/jmo...ma/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy       --> gith...m/jmo...ma/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete     --> gith...m/jmo...ma/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show       --> gith...m/jmo...ma/server.ShowModelHandler (5 handlers)
[GIN-debug] GET    /               --> gith...m/jmo...ma/server.Serve.func2 (5 handlers)
[GIN-debug] GET    /api/tags       --> gith...m/jmo...ma/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /               --> gith...m/jmo...ma/server.Serve.func2 (5 handlers)
[GIN-debug] HEAD   /api/tags       --> gith...m/jmo...ma/server.ListModelsHandler (5 handlers)

Specifically, /api/generate is the endpoint to use for batch generation.
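
A minimal client sketch in Go (assumptions: the server listens on the default http://localhost:11434, and the request/response fields match /api/generate as of ollama 0.1.x; stream: false asks for a single JSON object instead of a stream of chunks):

```go
// genclient posts one prompt to a locally running ollama server and
// prints the completion. Model name and prompt are placeholders.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	payload, err := json.Marshal(map[string]any{
		"model":  "llama2",
		"prompt": "Why is the sky blue? Answer in one sentence.",
		"stream": false, // single JSON object instead of streamed chunks
	})
	if err != nil {
		log.Fatal(err)
	}
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewBuffer(payload))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var result struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		log.Fatal(err)
	}
	fmt.Println(result.Response)
}
```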

Constraints

  • possible to enforce JSON generation (see the sketch below)
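
Building on the client above: recent ollama versions accept a format field in the generate request; setting it to json constrains the output to valid JSON (assumption: the field is available in the version you run). The only change is the payload:

```go
// Payload variation for /api/generate: with "format" set to "json",
// the server constrains generation to valid JSON output.
payload, err := json.Marshal(map[string]any{
	"model":  "llama2",
	"prompt": `please complete: {"author": "Turing, Alan", "title": ...}`,
	"format": "json",
	"stream": false,
})
```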

Customizing models

weights, configuration, and data in a single package

Using a Modelfile.

FROM llama2
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 4096

# sets a custom system prompt to specify the behavior of the chat assistant
SYSTEM You are Mario from super mario bros, acting as an assistant.

Freeze this as a custom package:

$ ollama create llama-mario -f custom/Modelfile.mario
$ ollama run llama-mario

About 16 parameters to tweak: Valid Parameters and Values
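
If I read the API correctly, most of these parameters can also be overridden per request via an options object in the generate call, without baking them into a Modelfile; again a payload variation of the client sketch above:

```go
// Per-request parameter overrides via the "options" field; the keys
// mirror the Modelfile PARAMETER names (temperature, num_ctx, ...).
payload, err := json.Marshal(map[string]any{
	"model":  "llama-mario",
	"prompt": "Who are you?",
	"stream": false,
	"options": map[string]any{
		"temperature": 1.0,
		"num_ctx":     4096,
	},
})
```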

Task 1: "haiku"

  • generate a small volume of Go programming haiku (a batch sketch follows after the haiku below)
// haikugen generates
// JSON output for later eval
// cannot parallelize
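
A sketch of such a batch run, reusing the non-streaming /api/generate call from the batch mode section; since the server processes requests one at a time, a plain sequential loop suffices (hence: cannot parallelize). Model, prompt, and output shape are my choices:

```go
// haikugen requests N haiku sequentially and writes one JSON document
// per line (JSONL) to stdout for later evaluation.
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"os"
)

// generate sends a single prompt to a local ollama server and returns
// the completion (non-streaming).
func generate(prompt string) (string, error) {
	payload, err := json.Marshal(map[string]any{
		"model":  "llama2",
		"prompt": prompt,
		"stream": false,
	})
	if err != nil {
		return "", err
	}
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewBuffer(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var result struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return "", err
	}
	return result.Response, nil
}

func main() {
	enc := json.NewEncoder(os.Stdout)
	for i := 0; i < 10; i++ {
		haiku, err := generate("Write a haiku about the Go programming language.")
		if err != nil {
			log.Fatal(err)
		}
		if err := enc.Encode(map[string]string{"haiku": haiku}); err != nil {
			log.Fatal(err)
		}
	}
}
```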

Task 2: "bibliography"

  • given unstructured strings, parse them to JSON (a sketch follows after this list)
  • unstructured
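
One way to frame the task, combining the JSON constraint from above with a prompt template; the field names and the sample citation are illustrative, not from the talk, and the payload plugs into the client sketch from the batch mode section:

```go
// bibparse: ask the model to turn one unstructured citation string
// into JSON with a fixed set of keys. Keys are illustrative.
raw := "Turing, Alan: On Computable Numbers. Proc. LMS, 1936."
prompt := fmt.Sprintf("Parse the following citation into JSON with keys "+
	"\"author\", \"title\", \"year\" and \"journal\". Citation: %q", raw)

payload, err := json.Marshal(map[string]any{
	"model":  "zephyr",
	"prompt": prompt,
	"format": "json", // constrain output to valid JSON
	"stream": false,
})
```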

Credits
