Package: llamaR 0.2.4

llamaR: Interface for Large Language Models via 'llama.cpp'

Provides 'R' bindings to 'llama.cpp' for running Large Language Models ('LLMs') locally with optional 'Vulkan' GPU acceleration via 'ggmlR'. Supports model loading, text generation, 'tokenization', token-to-piece conversion, 'embeddings' (single and batch), encoder-decoder inference, low-level batch management, chat templates, 'LoRA' adapters, explicit backend/device selection, multi-GPU split, and 'NUMA' optimization. Includes a high-level 'ragnar'-compatible embedding provider ('embed_llamar'). Built on top of 'ggmlR' for efficient tensor operations.

Authors: Yuri Baramykov [aut, cre], Georgi Gerganov [cph]

llamaR_0.2.4.tar.gz
llamaR_0.2.4.zip (r-4.7), llamaR_0.2.4.zip (r-4.6), llamaR_0.2.4.zip (r-4.5)
llamaR_0.2.4.tgz (r-4.6-x86_64), llamaR_0.2.4.tgz (r-4.6-arm64), llamaR_0.2.4.tgz (r-4.5-x86_64), llamaR_0.2.4.tgz (r-4.5-arm64)
llamaR_0.2.4.tar.gz (r-4.7-arm64), llamaR_0.2.4.tar.gz (r-4.7-x86_64), llamaR_0.2.4.tar.gz (r-4.6-arm64), llamaR_0.2.4.tar.gz (r-4.6-x86_64)

# Install 'llamaR' in R:
install.packages('llamaR', repos = c('https://zabis13.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/zabis13/llamar/issues

Uses libs:
  • c++: GNU Standard C++ Library v3
  • openmp: GCC OpenMP (GOMP) support library

Topics: cpp, openmp

6.51 score, 5 stars, 64 scripts, 485 downloads, 88 exports, 4 dependencies

Last updated from: 529a31f148. Checks: 12 OK, 1 FAIL. Indexed: yes.

Target | Result | Time
linux-devel-arm64 | OK | 293
linux-devel-x86_64 | OK | 307
source / vignettes | OK | 503
linux-release-arm64 | OK | 277
linux-release-x86_64 | OK | 298
macos-release-arm64 | OK | 207
macos-release-x86_64 | OK | 443
macos-oldrel-arm64 | OK | 214
macos-oldrel-x86_64 | OK | 597
windows-devel | OK | 394
windows-release | OK | 327
windows-oldrel | OK | 405
wasm-release | FAIL | 306

Exports: chat_llamar, chat_llamar_stop, embed_llamar, llama_backend_devices, llama_batch_free, llama_batch_init, llama_chat_apply_template, llama_chat_builtin_templates, llama_chat_template, llama_detokenize, llama_embed_batch, llama_embeddings, llama_encode, llama_free_context, llama_free_model, llama_gen_begin, llama_gen_end, llama_gen_next, llama_generate, llama_generate_batch, llama_get_embeddings, llama_get_embeddings_ith, llama_get_embeddings_seq, llama_get_logits, llama_get_logits_ith, llama_get_model, llama_get_verbosity, llama_hf_cache_clear, llama_hf_cache_dir, llama_hf_cache_info, llama_hf_download, llama_hf_list, llama_load_model, llama_load_model_hf, llama_lora_apply, llama_lora_clear, llama_lora_load, llama_lora_remove, llama_max_devices, llama_memory_breakdown_print, llama_memory_can_shift, llama_memory_clear, llama_memory_seq_add, llama_memory_seq_cp, llama_memory_seq_div, llama_memory_seq_keep, llama_memory_seq_pos_range, llama_memory_seq_rm, llama_model_info, llama_model_meta, llama_model_meta_val, llama_n_batch, llama_n_ctx, llama_n_ctx_seq, llama_n_seq_max, llama_n_threads, llama_n_threads_batch, llama_n_ubatch, llama_new_context, llama_numa_init, llama_perf, llama_perf_print, llama_perf_reset, llama_pooling_type, llama_serve_openai, llama_set_abort_callback, llama_set_causal_attn, llama_set_threads, llama_set_verbosity, llama_set_warmup, llama_state_get_size, llama_state_load, llama_state_save, llama_supports_gpu, llama_supports_mlock, llama_supports_mmap, llama_supports_rpc, llama_synchronize, llama_system_info, llama_time_us, llama_token_to_piece, llama_tokenize, llama_vocab_get_score, llama_vocab_get_text, llama_vocab_info, llama_vocab_is_control, llama_vocab_is_eog, llama_vocab_type

Dependencies: generics, ggmlR, jsonlite, R6

Chat and Agents

Rendered from chat-and-agents.Rmd using knitr::rmarkdown on Jun 01 2026.

Last update: 2026-05-27
Started: 2026-05-27

Getting Started with llamaR

Rendered from getting-started.Rmd using knitr::rmarkdown on Jun 01 2026.

Last update: 2026-05-27
Started: 2026-05-27

Readme and manuals

Help Manual

Help page | Topics
Chat with a local model through an ellmer::Chat object | chat_llamar
Stop the server spawned by chat_llamar() | chat_llamar_stop
Embedding provider for ragnar / standalone use | embed_llamar
List available backend devices | llama_backend_devices
Free a llama batch allocated with 'llama_batch_init()' | llama_batch_free
Initialise a llama batch | llama_batch_init
Apply chat template to messages | llama_chat_apply_template
List built-in chat templates | llama_chat_builtin_templates
Get model's built-in chat template | llama_chat_template
Detokenize token IDs back to text | llama_detokenize
Batch embeddings for multiple texts | llama_embed_batch
Extract embeddings for a text | llama_embeddings
Encode tokens using the encoder (encoder-decoder models only) | llama_encode
Free an inference context | llama_free_context
Free a loaded model | llama_free_model
Begin a streaming (token-by-token) generation | llama_gen_begin
Finish a streaming generation | llama_gen_end
Pull the next chunk of a streaming generation | llama_gen_next
Generate text from a prompt | llama_generate
Generate completions for multiple prompts in parallel | llama_generate_batch
Get all output token embeddings as a matrix | llama_get_embeddings
Get embeddings for the i-th token in the batch | llama_get_embeddings_ith
Get pooled embeddings for a sequence | llama_get_embeddings_seq
Get logits from the last decode step | llama_get_logits
Get logits for a specific token position | llama_get_logits_ith
Get the model associated with a context | llama_get_model
Get current verbosity level | llama_get_verbosity
Clear the model cache | llama_hf_cache_clear
Get the cache directory for downloaded models | llama_hf_cache_dir
Show information about the model cache | llama_hf_cache_info
Download a GGUF model from Hugging Face | llama_hf_download
List GGUF files in a Hugging Face repository | llama_hf_list
Load a GGUF model file | llama_load_model
Load a model directly from Hugging Face | llama_load_model_hf
Apply a LoRA adapter to context | llama_lora_apply
Remove all LoRA adapters from context | llama_lora_clear
Load a LoRA adapter | llama_lora_load
Remove a LoRA adapter from context | llama_lora_remove
Get maximum number of devices | llama_max_devices
Print memory breakdown by device | llama_memory_breakdown_print
Check if the KV cache supports shifting | llama_memory_can_shift
Clear the KV cache | llama_memory_clear
Shift token positions in a sequence | llama_memory_seq_add
Copy a sequence in the KV cache | llama_memory_seq_cp
Integer-divide token positions in a sequence | llama_memory_seq_div
Keep only one sequence in the KV cache | llama_memory_seq_keep
Get position range for a sequence | llama_memory_seq_pos_range
Remove tokens from a sequence in the KV cache | llama_memory_seq_rm
Get model metadata | llama_model_info
Get all model metadata as a named character vector | llama_model_meta
Get a single model metadata value by key | llama_model_meta_val
Get logical batch size | llama_n_batch
Get context window size | llama_n_ctx
Get per-sequence context window size | llama_n_ctx_seq
Get maximum number of sequences | llama_n_seq_max
Get number of threads for single-token generation | llama_n_threads
Get number of threads for batch processing | llama_n_threads_batch
Get physical micro-batch size | llama_n_ubatch
Create an inference context | llama_new_context
Initialize NUMA optimization | llama_numa_init
Get performance statistics | llama_perf
Print performance statistics to the console | llama_perf_print
Reset performance counters | llama_perf_reset
Get pooling type | llama_pooling_type
Serve an OpenAI-compatible HTTP API for a local model | llama_serve_openai
Set or clear the abort callback | llama_set_abort_callback
Set causal attention mode | llama_set_causal_attn
Set the number of threads for a context | llama_set_threads
Set logging verbosity level | llama_set_verbosity
Set warmup mode | llama_set_warmup
Get the size of the serialized context state in bytes | llama_state_get_size
Load context state from file | llama_state_load
Save context state to file | llama_state_save
Check whether GPU offloading is available | llama_supports_gpu
Check whether memory locking is supported | llama_supports_mlock
Check whether memory-mapped file I/O is supported | llama_supports_mmap
Check whether RPC backend is available | llama_supports_rpc
Synchronize asynchronous computation | llama_synchronize
Get system information string | llama_system_info
Get current time in microseconds | llama_time_us
Convert a single token ID to its text piece | llama_token_to_piece
Tokenize text into token IDs | llama_tokenize
Get the score of a token | llama_vocab_get_score
Get the text representation of a token | llama_vocab_get_text
Get vocabulary special token IDs | llama_vocab_info
Check if a token is a control token | llama_vocab_is_control
Check if a token is an end-of-generation token | llama_vocab_is_eog
Get vocabulary type | llama_vocab_type