
Using LLaMA.cpp models in Ruby

Captain's log, stardate d398.y40/AB

Juan Artero
Full-stack developer

In this blog post, we will teach you how to use LLaMA (Meta's AI) models from Ruby in your applications and projects.

LLaMA is Meta’s AI model that was “accidentally” shared via torrent and is now available to everyone. LLaMA.cpp is a project that provides a plain C/C++ implementation with optional 4-bit quantization support for faster, lower-memory inference, optimized for desktop CPUs. Many cool projects have been built on top of LLaMA.cpp, such as GPT4All.

Since the library is written in C/C++, bindings can be created for other languages, and luckily a Ruby binding already exists: yoshoku/llama_cpp.rb! This means we can now use LLaMA models like Alpaca or Vicuna directly in Ruby (and in Node.js too, using hlhr202/llama-node).

The instructions on how to run it were a little cryptic, and I couldn’t find a straightforward step-by-step tutorial on how to use it. The process is actually quite simple once you know the steps.

First of all, you need to add the llama_cpp gem to your project:

$ bundle add llama_cpp
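
This installs the gem and appends a line like the following to your Gemfile (the exact version constraint will depend on the release Bundler picks):

gem "llama_cpp"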

Then you need to download a model; the easiest way is to get it from Hugging Face. Hugging Face models are stored in Git repositories that we can clone, but keep in mind that the models are very large (several GB), so it is recommended to use Git Large File Storage (LFS).

To download ggml-vicuna-7b-4bit, for example, you can run:

$ git lfs install
$ git clone git@hf.co:chharlesonfire/ggml-vicuna-7b-4bit ./models/ggml-vicuna-7b-4bit
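
If you want to make sure Git LFS actually pulled the weights (an LFS pointer file is only a few hundred bytes, while the real file is several GB), a quick Ruby check like this one works; the path matches the model file used in the script below:

# Sanity check: confirm the quantized model file exists and report its size.
model_file = "./models/ggml-vicuna-7b-4bit/ggml-vicuna-7b-q4_0.bin"
abort "Model file not found, did the clone finish?" unless File.exist?(model_file)
puts "Model size: #{(File.size(model_file).to_f / 1024**3).round(2)} GB"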

Then it is ready to be used in your code. I’ll share with you an experiment that I’m doing in Jambots to implement it:

#!/usr/bin/env ruby

# Load Bundler's environment before requiring the gem.
require "bundler/setup"
require "llama_cpp"

# Path to the quantized Vicuna model downloaded from Hugging Face.
model_path = "./models/ggml-vicuna-7b-4bit/ggml-vicuna-7b-q4_0.bin"

# Few-shot prompt that sets up the conversation with the assistant.
prompt = <<~HEREDOC
  Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User’s requests immediately and with precision.
  User: Hello, Bob.
  Bob: Hello. How may I help you today?
  User: Please tell me the largest city in Europe.
  Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
  User:
HEREDOC

if ARGV.empty?
  puts "Message required"
  exit(1)
end

message = ARGV[0]

# Load the model and generate a completion for the prompt plus the user's message.
client = LLaMACpp::Client.new(model_path: model_path, n_threads: 4, seed: 12)
output = client.completions("#{prompt} #{message}")

puts output
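
To try it out, pass your message as the first argument when running the script (the filename here is just an example):

$ ruby bot.rb "Please tell me the smallest country in Europe"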

It runs a bit slowly on my computer, and llama.cpp’s arguments are a bit complicated, but at least now we can use this ecosystem from Ruby and Node.

Enjoy tinkering with it!
