| Title: | 'GGML' Tensor Operations for Machine Learning |
|---|---|
| Description: | Provides 'R' bindings to the 'GGML' tensor library for machine learning, optimized for 'Vulkan' GPU acceleration with a transparent CPU fallback. The package features a 'Keras'-like sequential API and a 'PyTorch'-style 'autograd' engine for building, training, and deploying neural networks. Key capabilities include high-performance 5D tensor operations, 'f16' precision, and efficient quantization. It supports native 'ONNX' model import (50+ operators) and 'GGUF' weight loading from the 'llama.cpp' and 'Hugging Face' ecosystems. Designed for zero-overhead inference via dedicated weight buffering, it integrates seamlessly as a 'parsnip' engine for 'tidymodels' and provides first-class learners for the 'mlr3' framework. See <https://github.com/ggml-org/ggml> for more information about the underlying library. |
| Authors: | Yuri Baramykov [aut, cre] (ORCID: <https://orcid.org/0009-0000-7627-4217>), Georgi Gerganov [ctb, cph] (Author of the GGML library), Jeffrey Quesnelle [ctb, cph] (Contributor to ops.cpp), Bowen Peng [ctb, cph] (Contributor to ops.cpp), Mozilla Foundation [ctb, cph] (Author of llamafile/sgemm.cpp) |
| Maintainer: | Yuri Baramykov |
| License: | MIT + file LICENSE |
| Version: | 0.7.8 |
| Built: | 2026-06-08 22:28:08 UTC |
| Source: | https://github.com/zabis13/ggmlr |
Computes A + B. If B is [m, 1] and A is
[m, n], B is broadcast across columns (useful for bias
vectors).
ag_add(A, B)
| Argument | Description |
|---|---|
| A | ag_tensor or numeric matrix |
| B | ag_tensor or numeric matrix (may be [m, 1] for column broadcasting) |
ag_tensor
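The broadcasting rule above, and the matching backward pass for a bias vector, can be sketched in base R as follows. This assumes the usual convention that the gradient of a broadcast [m, 1] operand is the upstream gradient summed over the broadcast (column) dimension. The helper names are hypothetical and this is not ggmlR's implementation.

```r
# Hypothetical helpers illustrating column broadcasting; not part of ggmlR.
broadcast_add <- function(A, B) {
  stopifnot(nrow(A) == nrow(B))
  if (ncol(B) == 1L && ncol(A) > 1L) {
    B <- matrix(B, nrow(A), ncol(A))  # repeat the [m, 1] column n times
  }
  A + B
}

# Backward for B: sum the upstream gradient G over the broadcast columns.
broadcast_add_grad_B <- function(G, B) {
  if (ncol(B) == 1L) matrix(rowSums(G), ncol = 1L) else G
}

A <- matrix(1:6, 2, 3)
B <- matrix(c(10, 20), 2, 1)
out <- broadcast_add(A, B)                       # each column gets +10 / +20
gB  <- broadcast_add_grad_B(matrix(1, 2, 3), B)  # [2, 1]
```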
Normalises each feature (row) over the batch dimension.
Learnable scale gamma [F,1] and shift beta [F,1].
ag_batch_norm(num_features, eps = 1e-05, momentum = 0.1)
| Argument | Description |
|---|---|
| num_features | Number of features (rows of input) |
| eps | Numerical stability constant (default 1e-5) |
| momentum | Running-stats momentum (default 0.1) |
Training mode: use batch statistics; update running mean/var. Eval mode: use stored running statistics.
An ag_batch_norm environment
bn <- ag_batch_norm(16L)
x <- ag_tensor(matrix(rnorm(16 * 32), 16, 32))
out <- bn$forward(x)
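To make the training-mode and eval-mode behaviour concrete, here is a minimal base-R sketch of batch normalisation on a [F, N] matrix. It assumes a PyTorch-style running-stat update, running = (1 - momentum) * running + momentum * batch_stat, and a biased per-feature variance. ggmlR's exact update rule is not stated above and may differ. For simplicity, gamma and beta are plain length-F vectors here.

```r
# Sketch of batch normalisation on a [F, N] matrix (features in rows).
# Assumption: running stats use a PyTorch-style exponential moving average.
bn_forward <- function(x, gamma, beta, state, training = TRUE,
                       eps = 1e-5, momentum = 0.1) {
  if (training) {
    mu <- rowMeans(x)
    v  <- rowMeans((x - mu)^2)   # biased variance per feature (row)
    state$running_mean <- (1 - momentum) * state$running_mean + momentum * mu
    state$running_var  <- (1 - momentum) * state$running_var  + momentum * v
  } else {
    mu <- state$running_mean     # eval mode: use stored statistics
    v  <- state$running_var
  }
  x_hat <- (x - mu) / sqrt(v + eps)  # length-F vectors recycle down each column
  list(out = gamma * x_hat + beta, state = state)
}

set.seed(1)
x  <- matrix(rnorm(4 * 8), 4, 8)
st <- list(running_mean = rep(0, 4), running_var = rep(1, 4))
res <- bn_forward(x, gamma = rep(1, 4), beta = rep(0, 4), state = st)
```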
Clamps values to [lo, hi]. The gradient is 1 strictly inside the interval
and 0 at or beyond the bounds (where values are clipped).
ag_clamp(x, lo = -Inf, hi = Inf)
| Argument | Description |
|---|---|
| x | ag_tensor |
| lo | Lower bound (default -Inf) |
| hi | Upper bound (default Inf) |
ag_tensor
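A base-R sketch of the clamp forward pass and its gradient mask follows. The strict inequalities reproduce "0 at the boundary". The helper names are hypothetical, and how ggmlR treats exact ties is an assumption.

```r
# Hypothetical helpers mirroring the clamp rule described above.
clamp_forward <- function(x, lo = -Inf, hi = Inf) pmin(pmax(x, lo), hi)

# Upstream gradient G passes only where lo < x < hi (strict), else 0.
clamp_backward <- function(x, G, lo = -Inf, hi = Inf) G * (x > lo & x < hi)

x <- c(-2, 0.5, 3)
y <- clamp_forward(x, lo = 0, hi = 1)
g <- clamp_backward(x, G = rep(1, 3), lo = 0, hi = 1)
```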
Generic CE: -sum(target * log(pred)) / batch_size.
The gradient w.r.t. pred is -target / pred / n.
Use ag_softmax_cross_entropy_loss() for the numerically stable
combined softmax + CE (fused gradient (p - y) / n).
ag_cross_entropy_loss(pred, target)
| Argument | Description |
|---|---|
| pred | ag_tensor [classes, batch_size] probabilities (any, not just softmax) |
| target | matrix [classes, batch_size] one-hot (or soft) labels |
scalar ag_tensor
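The loss and gradient formulas above can be checked numerically in base R. The sketch below implements -sum(target * log(pred)) / batch_size and the gradient -target / pred / n, then compares one entry against a central finite difference. The function names are hypothetical.

```r
# Generic cross-entropy over a [classes, batch] probability matrix.
ce_loss <- function(pred, target) -sum(target * log(pred)) / ncol(pred)
ce_grad <- function(pred, target) -target / pred / ncol(pred)

pred   <- matrix(c(0.7, 0.2, 0.1,  0.3, 0.3, 0.4), 3, 2)  # columns = samples
target <- matrix(c(1, 0, 0,  0, 0, 1), 3, 2)              # one-hot labels

# Central finite difference on one entry to confirm the analytic gradient.
eps <- 1e-6
p_hi <- pred; p_hi[1, 1] <- p_hi[1, 1] + eps
p_lo <- pred; p_lo[1, 1] <- p_lo[1, 1] - eps
num_g <- (ce_loss(p_hi, target) - ce_loss(p_lo, target)) / (2 * eps)
```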
Returns an iterator environment. Each call to $next_batch() returns
a named list list(x, y) with ag_tensor objects of shape
[features, batch_size] / [labels, batch_size].
After the last batch, $has_next() returns FALSE; call
$reset() (or start a new epoch via $epoch()) to reshuffle
and restart.
ag_dataloader(x, y = NULL, batch_size = 32L, shuffle = TRUE, col_major = TRUE)
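| Argument | Description |
|---|---|
| x | Feature matrix. With col_major = TRUE (default), one sample per column: [features, n_samples]. |
| y | Optional label matrix (default NULL) with the same convention. |
| batch_size | Integer batch size. |
| shuffle | Logical; if TRUE (default), samples are reshuffled at the start of each epoch. |
| col_major | Logical; if TRUE (default), samples are stored as the columns of x and y. |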
An ag_dataloader environment
n <- 128L
x <- matrix(runif(4 * n), 4, n)  # [4, 128] col-major
y <- matrix(runif(2 * n), 2, n)
dl <- ag_dataloader(x, y, batch_size = 32L)
dl$reset()
while (dl$has_next()) {
  batch <- dl$next_batch()
  # batch$x: [4, 32], batch$y: [2, 32]
}
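The per-epoch shuffling and slicing that a column-major loader performs can be sketched in base R as below. `make_batches()` is a hypothetical helper. The text above does not say whether ggmlR keeps or drops a short final batch, so this sketch simply keeps it.

```r
# Hypothetical helper: split sample indices into batches for one epoch.
make_batches <- function(n, batch_size, shuffle = TRUE) {
  idx <- if (shuffle) sample.int(n) else seq_len(n)
  unname(split(idx, ceiling(seq_along(idx) / batch_size)))
}

set.seed(7)
x <- matrix(runif(4 * 10), 4, 10)  # [features, n], one sample per column
batches <- make_batches(ncol(x), batch_size = 4L)
first_x <- x[, batches[[1]], drop = FALSE]  # [4, 4] slice for the first batch
```

Calling `make_batches()` again at the start of the next epoch draws a fresh permutation. This plays the role that `$reset()` plays in the loader.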
Return the current default compute device
ag_default_device()
"cpu" or "gpu"
Return the current default dtype for GPU operations
ag_default_dtype()
"f32", "f16", or "bf16"
Switches all subsequent ag_tensor / ag_param operations to run
on the specified device. Calling ag_device("gpu") initialises the
best available ggml backend (Vulkan, Metal, CUDA, or CPU fallback) the first
time it is called.
ag_device(device)
| Argument | Description |
|---|---|
| device | "cpu" or "gpu" |
Invisibly the previous device string
In training mode, applies inverted dropout: a random Bernoulli mask zeroes each
element with probability rate, and the surviving elements are scaled by 1/(1-rate)
to preserve the expected value. In eval mode it is the identity.
ag_dropout(rate)
| Argument | Description |
|---|---|
| rate | Drop probability in [0, 1) |
An ag_dropout environment
drop <- ag_dropout(0.5)
x <- ag_tensor(matrix(runif(8), 4, 2))
out <- drop$forward(x)   # training mode by default
ag_eval(drop)
out2 <- drop$forward(x)  # identity
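A minimal base-R sketch of inverted dropout is shown below. It illustrates why the 1/(1-rate) rescaling keeps the expected value unchanged. `inverted_dropout()` is a hypothetical helper and is not the package's layer.

```r
# Inverted dropout: keep each element with probability 1 - rate and
# rescale survivors so the expected value is unchanged.
inverted_dropout <- function(x, rate, training = TRUE) {
  if (!training || rate == 0) return(x)
  keep <- matrix(runif(length(x)) >= rate, nrow(x), ncol(x))
  x * keep / (1 - rate)
}

set.seed(123)
x <- matrix(1, 200, 200)
y <- inverted_dropout(x, rate = 0.5)  # entries are 0 or 2, mean close to 1
```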
Controls the dtype used when uploading tensors to the ggml backend.
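"bf16" halves memory usage compared with "f32" while keeping the f32 exponent range;
only the precision is reduced (8 significant bits instead of 24).
The backward pass always uses full-precision R matrices, regardless of this setting.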
ag_dtype(dtype)
| Argument | Description |
|---|---|
| dtype | "f32", "f16", or "bf16" |
Invisibly the previous dtype string
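To see what bf16 storage costs in precision, the sketch below emulates it in base R by keeping only the upper 16 bits of each IEEE f32 value. This is a simplification: it truncates, whereas real conversions usually round to nearest even. It also does not cover "f16", which has a different bit layout (5 exponent bits).

```r
# Emulate bf16 storage: keep only the upper 16 bits of an IEEE f32 value.
# Simplification: this truncates instead of rounding to nearest even.
to_bf16 <- function(x) {
  bytes <- writeBin(as.numeric(x), raw(), size = 4, endian = "little")
  m <- matrix(bytes, nrow = 4)   # one column of 4 bytes per value
  m[1:2, ] <- as.raw(0)          # low-order bytes hold the low mantissa bits
  readBin(as.vector(m), "double", n = length(x), size = 4, endian = "little")
}

x <- c(1, pi, 1000.123, -0.0042)
y <- to_bf16(x)                  # each value now fits in 2 bytes instead of 4
rel_err <- abs(y - x) / abs(x)   # bounded by 2^-7 under truncation
```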
Maps 0-based integer indices to dense vectors via table lookup.
Input: integer matrix or vector of 0-based indices.
Output: float tensor [dim, length(idx)].
ag_embedding(vocab_size, dim)
| Argument | Description |
|---|---|
| vocab_size | Vocabulary size |
| dim | Embedding dimension |
Backward: scatter-add — only the looked-up rows accumulate gradient.
An ag_embedding environment
emb <- ag_embedding(100L, 16L)
idx <- c(0L, 3L, 7L, 2L)
out <- emb$forward(idx)  # [16, 4]
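The lookup and scatter-add backward described above can be sketched in base R. For illustration, this sketch assumes the weight matrix is stored as [vocab_size, dim], so that token i (0-based) is row i + 1 and the output is transposed to [dim, length(idx)]. The real storage orientation inside ggmlR is not stated here.

```r
# Assumed layout for illustration: W is [vocab_size, dim].
emb_forward <- function(W, idx) t(W[idx + 1L, , drop = FALSE])  # [dim, length(idx)]

# Scatter-add backward: each output column's gradient is added to the row
# it was read from, so repeated indices accumulate.
emb_backward <- function(W, idx, G) {
  dW <- matrix(0, nrow(W), ncol(W))
  for (j in seq_along(idx)) {
    r <- idx[j] + 1L
    dW[r, ] <- dW[r, ] + G[, j]
  }
  dW
}

W   <- matrix(rnorm(5 * 3), 5, 3)   # vocab_size = 5, dim = 3
idx <- c(0L, 2L, 2L)
out <- emb_forward(W, idx)          # [3, 3]
dW  <- emb_backward(W, idx, matrix(1, 3, 3))
```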
Switch a layer or sequential model to eval mode
ag_eval(model)
| Argument | Description |
|---|---|
| model | An ag_sequential, ag_batch_norm, or ag_dropout layer |
The model/layer (invisibly)
Element-wise exponential
ag_exp(x)
| Argument | Description |
|---|---|
| x | ag_tensor |
ag_tensor
Compares analytical gradients (from backward()) with finite-difference
numerical gradients for all input tensors with requires_grad = TRUE.
ag_gradcheck(fn, inputs, eps = 1e-05, atol = 1e-04, verbose = FALSE, quiet = FALSE)
| Argument | Description |
|---|---|
| fn | A function that takes a list of ag_tensor inputs and returns a scalar ag_tensor loss (must be used inside ...). |
| inputs | Named list of ag_tensor objects. Only those with requires_grad = TRUE are checked. |
| eps | Finite-difference step size (default 1e-5). |
| atol | Absolute tolerance for pass/fail (default 1e-4). |
| verbose | Print per-element comparison (default FALSE). |
| quiet | Suppress per-parameter and overall status lines (default FALSE). Useful when calling from ... |
Invisibly TRUE if all gradients match, FALSE otherwise.
When quiet = FALSE (default), prints a summary report.
W <- ag_param(matrix(runif(6), 2, 3))
x <- ag_tensor(matrix(runif(3), 3, 1))
ag_gradcheck(
  fn = function(ins) ag_mse_loss(ag_relu(ag_matmul(ins$W, ins$x)), matrix(0, 2, 1)),
  inputs = list(W = W, x = x)
)
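The comparison that a gradient check performs can be illustrated in base R. The sketch below uses central finite differences for a linear-plus-MSE loss and compares them against the hand-derived gradient 2/n * (W x - y) t(x). Central differences are an assumption here: ggmlR's exact finite-difference scheme is not stated above. `numeric_grad()` is a hypothetical helper.

```r
# Central finite differences for every entry of W.
numeric_grad <- function(f, W, eps = 1e-5) {
  g <- W * 0
  for (i in seq_along(W)) {
    Wp <- W; Wp[i] <- Wp[i] + eps
    Wm <- W; Wm[i] <- Wm[i] - eps
    g[i] <- (f(Wp) - f(Wm)) / (2 * eps)
  }
  g
}

set.seed(42)
W <- matrix(runif(6), 2, 3)
x <- matrix(runif(3), 3, 1)
y <- matrix(0, 2, 1)
loss <- function(W) mean((W %*% x - y)^2)           # MSE over all entries
analytic <- 2 / length(y) * (W %*% x - y) %*% t(x)  # dL/dW
max_diff <- max(abs(numeric_grad(loss, W) - analytic))
passed <- max_diff < 1e-4                            # same atol as the default above
```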
Returns a closure-based layer. Because ag_param uses environment semantics, the optimizer updates W and b in-place, and forward() always uses the latest weights.
ag_linear(in_features, out_features, activation = NULL)
| Argument | Description |
|---|---|
| in_features | Input dimension |
| out_features | Output dimension |
| activation | "relu", "sigmoid", "tanh", "softmax", or NULL |
List with W, b, forward(x), params()
layer <- ag_linear(4L, 8L, activation = "relu")
x <- ag_tensor(matrix(runif(4 * 16), 4, 16))
out <- layer$forward(x)
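Under the [features, batch_size] layout used throughout this API, a dense layer computes W %*% x + b followed by an optional activation. The base-R sketch below assumes W is [out_features, in_features] and b is [out_features, 1]. That orientation is inferred from the shape convention, not stated above. Softmax is left out for brevity, and the helper name is hypothetical.

```r
# W is [out_features, in_features], b is [out_features, 1], x is [in, batch].
linear_forward <- function(W, b, x, activation = NULL) {
  z <- W %*% x + as.vector(b)  # bias recycled down every column
  if (is.null(activation)) return(z)
  switch(activation,
    relu    = pmax(z, 0),
    sigmoid = 1 / (1 + exp(-z)),
    tanh    = tanh(z),
    stop("activation not covered by this sketch: ", activation))
}

set.seed(2)
W <- matrix(rnorm(8 * 4), 8, 4)
b <- matrix(0.1, 8, 1)
x <- matrix(runif(4 * 16), 4, 16)
h <- linear_forward(W, b, x, activation = "relu")  # [8, 16]
```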
Reconstructs an ag_* module saved with ag_save_model.
The architecture is rebuilt by calling model_fn (either the one passed
here, or the one stored inside the container at save time), and the saved
parameter and buffer values are copied back by name.
ag_load_model(path, model_fn = NULL, device = NULL)
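| Argument | Description |
|---|---|
| path | File path written by ag_save_model(). |
| model_fn | Optional zero-argument rebuild function. Required if no model_fn was stored in the file at save time. |
| device | Optional device for the rebuilt module ("cpu" or "gpu"). |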
The reconstructed module with restored weights, in eval mode.
build <- function() ag_sequential(ag_linear(4L, 8L), ag_linear(8L, 3L))
f <- tempfile(fileext = ".rds")
ag_save_model(build(), f, model_fn = build)
model <- ag_load_model(f)
Element-wise natural logarithm
ag_log(x)
| Argument | Description |
|---|---|
| x | ag_tensor |
ag_tensor
Computes A %*% B and records the operation on the gradient tape.
ag_matmul(A, B)
| Argument | Description |
|---|---|
| A | ag_tensor or numeric matrix of shape [m, k] |
| B | ag_tensor or numeric matrix of shape [k, n] |
ag_tensor of shape [m, n]
Mean of elements (or along a dim)
ag_mean(x, dim = NULL, keepdim = FALSE)
| Argument | Description |
|---|---|
| x | ag_tensor |
| dim | NULL (all), 1 (row-wise), or 2 (col-wise) |
| keepdim | Logical: keep size-1 dimensions |
ag_tensor
Mean Squared Error loss
ag_mse_loss(pred, target)
| Argument | Description |
|---|---|
| pred | ag_tensor [units, batch_size] |
| target | ag_tensor or matrix [units, batch_size] |
scalar ag_tensor
Element-wise multiplication
ag_mul(A, B)
| Argument | Description |
|---|---|
| A | ag_tensor or numeric matrix |
| B | ag_tensor or numeric matrix |
ag_tensor
Implements scaled dot-product multi-head attention as in "Attention Is All You Need" (Vaswani et al., 2017).
ag_multihead_attention(d_model, n_heads, dropout = 0, bias = TRUE)
| Argument | Description |
|---|---|
| d_model | Model (embedding) dimension |
| n_heads | Number of attention heads |
| dropout | Attention dropout probability (default 0, applied in training mode only) |
| bias | Logical: add bias to output projection (default TRUE) |
Calling convention (mirrors PyTorch nn.MultiheadAttention):
layer$forward(q) — self-attention (k = v = q)
layer$forward(q, k, v) — cross-attention
Tensor layout: [d_model, seq_len] — columns are tokens,
consistent with the rest of the ag_* API.
Forward pass (seq_q and seq_k are the query and key/value lengths, both equal to
seq_len for self-attention; d_k and d_v are the per-head dimensions):
Q = W_q %*% q                                   [d_k * n_heads, seq_q]
K = W_k %*% k                                   [d_k * n_heads, seq_k]
V = W_v %*% v                                   [d_v * n_heads, seq_k]
for each head h = 0, ..., n_heads - 1:
  q_h = Q[(h*d_k + 1):((h+1)*d_k), ]            [d_k, seq_q]
  k_h = K[(h*d_k + 1):((h+1)*d_k), ]            [d_k, seq_k]
  v_h = V[(h*d_v + 1):((h+1)*d_v), ]            [d_v, seq_k]
  S_h = t(k_h) %*% q_h / sqrt(d_k)              [seq_k, seq_q]
  if causal_mask: S_h[j, i] = -Inf for j > i    (query i cannot attend to a later key j)
  A_h = softmax(S_h), each column summing to 1  [seq_k, seq_q]
  head_h = v_h %*% A_h                          [d_v, seq_q]
concat = rbind(head_0, ..., head_{n_heads-1})   [d_v * n_heads, seq_q]
out = W_o %*% concat + b_o                      [d_model, seq_q]
An ag_multihead_attention environment with
$forward(q, k, v, causal_mask) and $parameters()
# Self-attention
mha <- ag_multihead_attention(64L, 8L)
x <- ag_tensor(matrix(rnorm(64 * 10), 64, 10))  # [d_model=64, seq_len=10]
out <- mha$forward(x)                           # [64, 10]
# Cross-attention
q <- ag_tensor(matrix(rnorm(64 * 10), 64, 10))
kv <- ag_tensor(matrix(rnorm(64 * 15), 64, 15))
out <- mha$forward(q, kv, kv)
# Causal (GPT-style)
out <- mha$forward(x, causal_mask = TRUE)
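Here is a base-R sketch of one multi-head attention forward pass in the [d_model, seq_len] layout. It is written so that the shapes also work for cross-attention, where seq_q differs from seq_k. Per-head scores are t(k_h) %*% q_h with shape [seq_k, seq_q]. Causal masking sets future-key scores to -Inf before a column-wise softmax, and each head is v_h %*% A_h. The random projection matrices, d_k = d_v = d_model / n_heads, and all helper names are assumptions for illustration. This is not the package's code, and attention dropout is omitted.

```r
# Column-wise softmax: every column (one query position) sums to 1.
col_softmax <- function(s) {
  e <- exp(sweep(s, 2, apply(s, 2, max)))  # subtract column max for stability
  sweep(e, 2, colSums(e), "/")
}

# q: [d_model, seq_q]; k, v: [d_model, seq_k]; Wq, Wk, Wv, Wo: [d_model, d_model].
mha_forward <- function(q, k, v, Wq, Wk, Wv, Wo, bo, n_heads, causal = FALSE) {
  Q <- Wq %*% q; K <- Wk %*% k; V <- Wv %*% v
  d_k <- nrow(Q) / n_heads
  d_v <- nrow(V) / n_heads
  heads <- lapply(0:(n_heads - 1), function(h) {
    rk <- (h * d_k + 1):((h + 1) * d_k)
    rv <- (h * d_v + 1):((h + 1) * d_v)
    s <- crossprod(K[rk, , drop = FALSE], Q[rk, , drop = FALSE]) / sqrt(d_k)
    if (causal) s[row(s) > col(s)] <- -Inf    # key j may not follow query i
    V[rv, , drop = FALSE] %*% col_softmax(s)  # [d_v, seq_q]
  })
  Wo %*% do.call(rbind, heads) + as.vector(bo)  # [d_model, seq_q]
}

set.seed(3)
d_model <- 8; n_heads <- 2
W <- replicate(4, matrix(rnorm(d_model^2, sd = 0.3), d_model, d_model),
               simplify = FALSE)
bo <- rep(0, d_model)
x <- matrix(rnorm(d_model * 5), d_model, 5)
self_out <- mha_forward(x, x, x, W[[1]], W[[2]], W[[3]], W[[4]], bo, n_heads)

# Causal check: perturbing only the last token must not change earlier outputs.
causal_a <- mha_forward(x, x, x, W[[1]], W[[2]], W[[3]], W[[4]], bo, n_heads, causal = TRUE)
x2 <- x; x2[, 5] <- x2[, 5] + 1
causal_b <- mha_forward(x2, x2, x2, W[[1]], W[[2]], W[[3]], W[[4]], bo, n_heads, causal = TRUE)

# Cross-attention with 7 key/value positions and 5 query positions.
kv <- matrix(rnorm(d_model * 7), d_model, 7)
cross_out <- mha_forward(x, kv, kv, W[[1]], W[[2]], W[[3]], W[[4]], bo, n_heads)
```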
Create a parameter tensor (gradient tracked)
ag_param(data, device = .ag_device_state$device, dtype = .ag_device_state$dtype)
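| Argument | Description |
|---|---|
| data | Numeric matrix or vector |
| device | "cpu" or "gpu" (default: the current default device) |
| dtype | Floating-point precision: "f32", "f16", or "bf16" (default: the current default dtype) |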
An ag_tensor with requires_grad = TRUE
Element-wise power
ag_pow(x, p)
| Argument | Description |
|---|---|
| x | ag_tensor |
| p | Numeric exponent (scalar, not tracked for gradients) |
ag_tensor
Applies the rectified linear unit element-wise: max(0, x).
ag_relu(x)
| Argument | Description |
|---|---|
| x | ag_tensor |
ag_tensor
Reshape tensor
ag_reshape(x, nrow, ncol)
| Argument | Description |
|---|---|
| x | ag_tensor |
| nrow | New number of rows (use -1 to infer) |
| ncol | New number of columns (use -1 to infer) |
ag_tensor with new shape, same data
Serializes the trainable parameters and persistent buffers of an
ag_sequential (or single ag_* layer) module as a portable
state dictionary of plain numeric matrices. This avoids serializing the live
module (environments + closures), which is brittle across ggmlR versions and
carries non-portable GPU pointers.
ag_save_model(model, path, model_fn = NULL)
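| Argument | Description |
|---|---|
| model | An ag_sequential or single ag_* layer module. |
| path | File path to write (an RDS container). |
| model_fn | Optional zero-argument function that rebuilds the module architecture (fresh, untrained). If supplied, it is stored in the container so ag_load_model() can rebuild the module without being given model_fn. |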
Reconstruction requires the architecture. Either pass model_fn here so
it is stored in the file, or pass it later to ag_load_model.
path, invisibly.
build <- function() ag_sequential(ag_linear(4L, 8L), ag_linear(8L, 3L))
model <- build()
ag_save_model(model, tempfile(fileext = ".rds"), model_fn = build)
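The state-dictionary idea above can be sketched in base R. The sketch saves only named plain matrices plus a rebuild function, then rebuilds a fresh module and copies values back by name. The environment-based module and the `state_dict()` / `load_state_dict()` helpers are hypothetical stand-ins. They are not ggmlR's container format.

```r
# Hypothetical module: an environment holding named parameter matrices.
make_module <- function() {
  m <- new.env()
  m$params <- list(W = matrix(0, 3, 4), b = matrix(0, 3, 1))
  m
}

# A state dict is just the named list of plain matrices (no environments).
state_dict <- function(m) m$params

load_state_dict <- function(m, sd) {
  for (nm in names(m$params)) {
    stopifnot(nm %in% names(sd), identical(dim(sd[[nm]]), dim(m$params[[nm]])))
    m$params[[nm]] <- sd[[nm]]   # copy values back by name
  }
  invisible(m)
}

trained <- make_module()
trained$params$W[] <- rnorm(12)
f <- tempfile(fileext = ".rds")
saveRDS(list(state = state_dict(trained), model_fn = make_module), f)

box <- readRDS(f)
restored <- load_state_dict(box$model_fn(), box$state)
```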
Scale tensor by a scalar constant
ag_scale(x, scalar)
| Argument | Description |
|---|---|
| x | ag_tensor |
| scalar | Numeric scalar (not tracked for gradients) |
ag_tensor
Chains layers so that forward(x) passes x through each layer
in order. parameters() collects all trainable params from all layers.
ag_train() / ag_eval() propagate mode to stateful sub-layers.
ag_sequential(...)
| Argument | Description |
|---|---|
| ... | Layer objects (ag_linear, ag_dropout, ag_batch_norm, ag_embedding) or a single list of layers. |
An ag_sequential environment
model <- ag_sequential(
  ag_linear(4L, 16L, activation = "relu"),
  ag_dropout(0.5),
  ag_linear(16L, 2L, activation = "softmax")
)
x <- ag_tensor(matrix(runif(4 * 8), 4, 8))
out <- model$forward(x)
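Chaining layers amounts to a left fold of the input through each layer's `forward()`. A minimal base-R sketch using `Reduce()` is shown below. The `sequential()` container and the toy layers are hypothetical and only mimic the calling convention described above.

```r
# Hypothetical minimal container: forward() folds x through each layer in order.
sequential <- function(...) {
  layers <- list(...)
  list(forward = function(x) Reduce(function(h, layer) layer$forward(h), layers, x))
}

double_it <- list(forward = function(x) 2 * x)
add_one   <- list(forward = function(x) x + 1)
model <- sequential(double_it, add_one)
out <- model$forward(matrix(1:4, 2, 2))   # (2 * x) + 1
```

Order matters: `sequential(add_one, double_it)` would compute 2 * (x + 1) instead.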
Applies the logistic sigmoid element-wise: 1 / (1 + exp(-x)).
ag_sigmoid(x)
| Argument | Description |
|---|---|
| x | ag_tensor |
ag_tensor
Applies a numerically stable softmax to each column (one sample), so that every column sums to 1.
ag_softmax(x)
| Argument | Description |
|---|---|
| x | ag_tensor of shape [classes, batch_size] |
ag_tensor of the same shape as x
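"Numerically stable" here refers to the standard trick of subtracting each column's maximum before exponentiating. The base-R sketch below contrasts a naive version, which overflows to Inf / Inf = NaN on large logits, with the stabilised one. `softmax_cols()` is a hypothetical helper, not the package function.

```r
# Column-wise softmax with the usual max-subtraction trick.
softmax_cols <- function(z) {
  z <- sweep(z, 2, apply(z, 2, max))  # largest entry in each column becomes 0
  e <- exp(z)
  sweep(e, 2, colSums(e), "/")
}

logits <- matrix(c(1000, 1001, 1002,  -5, 0, 5), 3, 2)
naive  <- exp(logits) / rep(colSums(exp(logits)), each = 3)  # column 1 overflows
p      <- softmax_cols(logits)                               # finite, columns sum to 1
```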
Combines softmax and CE in one op using the fused gradient (p - y) / n.
More numerically stable than chaining ag_softmax + ag_cross_entropy_loss.
Use this when your last layer outputs raw logits.
ag_softmax_cross_entropy_loss(logits, target)
| Argument | Description |
|---|---|
| logits | ag_tensor [classes, batch_size] raw (pre-softmax) scores |
| target | matrix [classes, batch_size] one-hot labels |
scalar ag_tensor
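The fused gradient (p - y) / n can be verified numerically in base R. The sketch below computes the softmax cross-entropy on raw logits and compares the closed-form gradient against central finite differences on every entry. All names are hypothetical.

```r
# Fused softmax + cross-entropy on raw logits [classes, batch].
softmax_cols <- function(z) {
  e <- exp(sweep(z, 2, apply(z, 2, max)))
  sweep(e, 2, colSums(e), "/")
}
sce_loss <- function(logits, y) -sum(y * log(softmax_cols(logits))) / ncol(logits)
sce_grad <- function(logits, y) (softmax_cols(logits) - y) / ncol(logits)

set.seed(5)
logits <- matrix(rnorm(3 * 4), 3, 4)
y <- diag(3)[, c(1, 3, 2, 1)]        # one-hot labels, one column per sample

# Compare the fused gradient with central finite differences on every entry.
eps <- 1e-6
num <- logits * 0
for (i in seq_along(logits)) {
  lp <- logits; lp[i] <- lp[i] + eps
  lm <- logits; lm[i] <- lm[i] - eps
  num[i] <- (sce_loss(lp, y) - sce_loss(lm, y)) / (2 * eps)
}
max_err <- max(abs(num - sce_grad(logits, y)))
```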
Element-wise subtraction
ag_sub(A, B)
| Argument | Description |
|---|---|
| A | ag_tensor or numeric matrix |
| B | ag_tensor or numeric matrix |
ag_tensor
Sum all elements (or along a dim): out = sum(x)
ag_sum(x, dim = NULL, keepdim = FALSE)
| Argument | Description |
|---|---|
| x | ag_tensor |
| dim | NULL (all), 1 (row-wise), or 2 (col-wise) |
| keepdim | Logical: keep size-1 dimensions |
scalar (or reduced) ag_tensor
Tanh activation
ag_tanh(x)
| Argument | Description |
|---|---|
| x | ag_tensor |
ag_tensor
ag_tensor is backed by an R environment so all references to the same tensor see updates (like PyTorch tensors).
ag_tensor(data, device = .ag_device_state$device, dtype = .ag_device_state$dtype)
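| Argument | Description |
|---|---|
| data | Numeric matrix or vector |
| device | "cpu" or "gpu" (default: the current default device) |
| dtype | Floating-point precision: "f32", "f16", or "bf16" (default: the current default dtype) |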
An ag_tensor object (environment)
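The reference semantics that ag_tensor relies on come from base R itself: environments are shared by reference, while lists are copied on modification. The short sketch below shows the difference. It uses plain environments and lists, not ggmlR's actual tensor class.

```r
# Environments are passed by reference; lists are copied on modification.
t_env <- new.env()
t_env$data <- matrix(1, 2, 2)
alias_env <- t_env                 # same environment, not a copy
alias_env$data[1, 1] <- 99         # visible through t_env as well

t_list <- list(data = matrix(1, 2, 2))
alias_list <- t_list               # copy-on-modify semantics
alias_list$data[1, 1] <- 99        # t_list is unchanged
```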
Copies an ag_tensor to the target device, returning a new tensor.
The original tensor is not modified.
ag_to_device(tensor, device)
| Argument | Description |
|---|---|
| tensor | An ag_tensor |
| device | "cpu" or "gpu" |
A new ag_tensor on the target device (or the original if
already on the target device)
Switch a layer or sequential model to training mode
ag_train(model)
| Argument | Description |
|---|---|
| model | An ag_sequential, ag_batch_norm, or ag_dropout layer |
The model/layer (invisibly)
Transpose a tensor
ag_transpose(x)
| Argument | Description |
|---|---|
| x | ag_tensor |
ag_tensor with rows and columns swapped
Adds prediction columns to new_data, broom style. For classification this
appends .pred_class plus one .pred_<level> probability column per class;
for regression it appends .pred. Predictions are produced by the existing
predict.ggmlr_parsnip_model path (no duplicate inference logic).
## S3 method for class 'ggmlr_parsnip_model'
augment(x, new_data, ...)
| Argument | Description |
|---|---|
| x | A fitted ggmlr_parsnip_model object. |
| new_data | A data frame of predictors (same columns used for fitting). |
| ... | Unused; for generic compatibility. |
new_data as a tibble with prediction columns appended.
spec <- parsnip::mlp(hidden_units = 8L, epochs = 3L) |>
  parsnip::set_engine("ggml", backend = "cpu") |>
  parsnip::set_mode("regression")
fit_obj <- parsnip::fit(spec, mpg ~ ., data = mtcars)
generics::augment(parsnip::extract_fit_engine(fit_obj), mtcars)
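For the classification case, the broom-style columns can be built from a [n_rows, n_classes] probability matrix as sketched below. `append_class_preds()` is hypothetical. It returns a base data frame rather than a tibble, and the exact column order ggmlR produces is an assumption.

```r
# Hypothetical helper: append broom-style prediction columns given a
# [n_rows, n_classes] probability matrix whose column names are class levels.
append_class_preds <- function(new_data, probs) {
  lv <- colnames(probs)
  new_data$.pred_class <- factor(lv[max.col(probs, ties.method = "first")],
                                 levels = lv)
  for (l in lv) new_data[[paste0(".pred_", l)]] <- probs[, l]
  new_data
}

nd <- data.frame(x1 = c(0.2, 1.5), x2 = c(3, 4))
probs <- matrix(c(0.9, 0.2, 0.1, 0.8), 2, 2,
                dimnames = list(NULL, c("setosa", "virginica")))
aug <- append_class_preds(nd, probs)
```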
Traverses the gradient tape in reverse and accumulates gradients into
tensor$grad for all leaf tensors with requires_grad = TRUE.
backward(loss)
| Argument | Description |
|---|---|
| loss | Scalar ag_tensor |
Named environment: tensor id -> gradient matrix (for use by optimizer$step)
w <- ag_param(matrix(runif(4), 2, 2))
x <- ag_tensor(matrix(c(1, 2), 2, 1))
y <- ag_tensor(matrix(c(0, 1), 2, 1))
with_grad_tape({
  out <- ag_matmul(w, x)
  loss <- ag_mse_loss(out, y)
})
grads <- backward(loss)
Rescales all gradients in grads so that their global L2 norm does
not exceed max_norm. Modifies the grads environment
in-place and returns the pre-clip norm.
clip_grad_norm(params, grads, max_norm)
| Argument | Description |
|---|---|
| params | Named list of ag_param tensors (same as passed to optimizer). |
| grads | Gradient environment returned by backward(). |
| max_norm | Maximum allowed global L2 norm. |
Call this after backward() and before
optimizer$step().
Numeric: the global L2 norm before clipping (invisibly).
w <- ag_param(matrix(runif(4), 2, 2))
x <- ag_tensor(matrix(c(1, 1), 2, 1))
with_grad_tape({
  out <- ag_matmul(w, x)
  loss <- ag_mse_loss(out, matrix(0, 2, 1))
})
grads <- backward(loss)
clip_grad_norm(list(w = w), grads, max_norm = 1.0)
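Global-norm clipping computes one L2 norm across all gradient matrices and, if it exceeds max_norm, rescales every gradient by max_norm / norm. The base-R sketch below modifies an environment in place, as the text describes. It keys gradients by name for readability (ggmlR keys them by tensor id), and the helper name is hypothetical.

```r
# Global-norm clipping on an environment of gradient matrices (modified in place).
clip_global_norm <- function(grads, max_norm) {
  total <- sqrt(sum(vapply(as.list(grads), function(g) sum(g^2), numeric(1))))
  if (total > max_norm) {
    scale <- max_norm / total
    for (nm in ls(grads)) grads[[nm]] <- grads[[nm]] * scale
  }
  invisible(total)   # pre-clip norm
}

grads <- new.env()
grads$a <- matrix(3, 1, 1)
grads$b <- matrix(4, 1, 1)
pre <- clip_global_norm(grads, max_norm = 1)   # global norm sqrt(3^2 + 4^2) = 5
```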
Configures the model for training by setting the optimizer, loss function,
and metrics. This is the keras-compatible interface; it delegates to
ggml_compile.
## S3 method for class 'ggml_sequential_model'
compile(object, optimizer = "adam", loss = "categorical_crossentropy",
        metrics = c("accuracy"), ...)

## S3 method for class 'ggml_functional_model'
compile(object, optimizer = "adam", loss = "categorical_crossentropy",
        metrics = c("accuracy"), ...)
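| Argument | Description |
|---|---|
| object | A model object (e.g. from ggml_model_sequential()). |
| optimizer | Character: optimizer name (default "adam"). |
| loss | Character: loss name (default "categorical_crossentropy"). |
| metrics | Character vector of metrics (default c("accuracy")). |
| ... | Additional arguments passed to ggml_compile(). |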
The compiled model (invisibly).
model <- ggml_model_sequential() |>
  ggml_layer_dense(10, activation = "softmax", input_shape = 4)
model <- compile(model, optimizer = "adam", loss = "categorical_crossentropy")
Converts IQ (integer quantization) data back to float values. IQ formats provide high compression with importance-matrix-aware quantization.
dequantize_row_iq2_xxs(raw_data, n_elements)
dequantize_row_iq2_xs(raw_data, n_elements)
dequantize_row_iq2_s(raw_data, n_elements)
dequantize_row_iq3_xxs(raw_data, n_elements)
dequantize_row_iq3_s(raw_data, n_elements)
dequantize_row_iq4_nl(raw_data, n_elements)
dequantize_row_iq4_xs(raw_data, n_elements)
dequantize_row_iq1_s(raw_data, n_elements)
dequantize_row_iq1_m(raw_data, n_elements)
| Argument | Description |
|---|---|
| raw_data | Raw vector containing quantized data |
| n_elements | Number of elements to dequantize |
Numeric vector of dequantized values
Other quantization:
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Converts MXFP4 (microscaling FP4) quantized data back to float values.
dequantize_row_mxfp4(raw_data, n_elements)
| Argument | Description |
|---|---|
| raw_data | Raw vector containing quantized data |
| n_elements | Number of elements to dequantize |
Numeric vector of dequantized values
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Converts NVFP4 quantized data back to float values.
dequantize_row_nvfp4(raw_data, n_elements)
| Argument | Description |
|---|---|
| raw_data | Raw vector of NVFP4 quantized data |
| n_elements | Number of dequantized elements (must be a multiple of 64) |
Numeric vector of dequantized values
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Converts Q1_0 quantized data back to float values.
dequantize_row_q1_0(raw_data, n_elements)
| Argument | Description |
|---|---|
| raw_data | Raw vector of Q1_0 quantized data |
| n_elements | Number of dequantized elements (must be a multiple of 128) |
Numeric vector of dequantized values
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Converts K-quant quantized data back to float values. K-quants (q2_K through q8_K) generally provide a better quality/size tradeoff than the basic block formats such as Q4_0.
dequantize_row_q2_K(raw_data, n_elements)
dequantize_row_q3_K(raw_data, n_elements)
dequantize_row_q4_K(raw_data, n_elements)
dequantize_row_q5_K(raw_data, n_elements)
dequantize_row_q6_K(raw_data, n_elements)
dequantize_row_q8_K(raw_data, n_elements)
| Argument | Description |
|---|---|
| raw_data | Raw vector containing quantized data |
| n_elements | Number of elements to dequantize |
Numeric vector of dequantized values
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Converts Q4_0, Q4_1, Q5_0, Q5_1, and Q8_0 quantized data back to float values.
dequantize_row_q4_0(raw_data, n_elements)
dequantize_row_q4_1(raw_data, n_elements)
dequantize_row_q5_0(raw_data, n_elements)
dequantize_row_q5_1(raw_data, n_elements)
dequantize_row_q8_0(raw_data, n_elements)
| Argument | Description |
|---|---|
| raw_data | Raw vector containing quantized data |
| n_elements | Number of elements to dequantize |
Numeric vector of dequantized values
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
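To show what Q4_0 dequantization does byte by byte, here is a base-R sketch. It is based on my understanding of the reference ggml Q4_0 layout: 18 bytes per block of 32 values, a little-endian f16 scale d, then 16 bytes of packed 4-bit codes. Low nibbles give elements 1 to 16, high nibbles give elements 17 to 32, and each value is (q - 8) * d. It is meant for understanding only. Use `dequantize_row_q4_0()` for real data.

```r
# Decode one IEEE half-precision value from two little-endian bytes.
half_to_double <- function(lo, hi) {
  bits <- as.integer(lo) + 256L * as.integer(hi)
  s <- if (bits >= 32768L) -1 else 1
  e <- (bits %/% 1024L) %% 32L
  m <- bits %% 1024L
  if (e == 0L) return(s * m * 2^-24)                 # subnormal
  if (e == 31L) return(if (m == 0L) s * Inf else NaN)
  s * (1 + m / 1024) * 2^(e - 15)
}

# Q4_0: blocks of 32 values stored in 18 bytes (f16 scale + 16 bytes of nibbles).
dequantize_q4_0_sketch <- function(raw_data, n_elements) {
  stopifnot(n_elements %% 32 == 0, length(raw_data) == n_elements / 32 * 18)
  out <- numeric(n_elements)
  for (b in seq_len(n_elements / 32) - 1L) {
    blk <- raw_data[b * 18 + 1:18]
    d  <- half_to_double(blk[1], blk[2])
    qs <- as.integer(blk[3:18])
    out[b * 32 + 1:16]  <- ((qs %% 16L) - 8L) * d    # low nibbles
    out[b * 32 + 17:32] <- ((qs %/% 16L) - 8L) * d   # high nibbles
  }
  out
}

# One block with scale 1.0 (f16 bits 0x3C00) and every code byte 0x98:
# low nibble 8 -> 0, high nibble 9 -> 1.
blk <- c(as.raw(c(0x00, 0x3C)), rep(as.raw(0x98), 16))
vals <- dequantize_q4_0_sketch(blk, 32)
```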
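Converts ternary quantized data back to float values. TQ1_0 and TQ2_0 store ternary weights (-1, 0, +1 times a per-block scale) and are extreme compression formats, at roughly 1.69 and 2.06 bits per weight respectively.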
dequantize_row_tq1_0(raw_data, n_elements)
dequantize_row_tq2_0(raw_data, n_elements)
| Argument | Description |
|---|---|
| raw_data | Raw vector containing quantized data |
| n_elements | Number of elements to dequantize |
Numeric vector of dequantized values
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Runs synchronous data-parallel training:
make_model() is called n_gpu times to create one
independent model replica per GPU (each with its own parameters).
Each iteration: the current data item is forwarded through every
replica in parallel; gradients are computed via backward().
Gradients are averaged across all replicas (element-wise mean).
One optimizer step is taken on replica 0; updated weights are then broadcast to replicas 1 … N-1 so all replicas stay in sync.
dp_train(make_model, data, loss_fn = NULL, forward_fn = NULL, target_fn = NULL,
         n_gpu = NULL, n_iter = 10L, lr = 0.001, max_norm = Inf, verbose = 10L)
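| Argument | Description |
|---|---|
| make_model | A zero-argument function that returns a model object with at least forward() and parameters() methods. |
| data | A list of training samples. Each element is passed directly to ... |
| loss_fn | A function(out, target) that returns a scalar loss ag_tensor. |
| forward_fn | Optional function ... |
| target_fn | Optional function(sample) that returns the target for a sample. |
| n_gpu | Number of GPU replicas (default: all available Vulkan devices, minimum 1). |
| n_iter | Number of training iterations (passes over data). |
| lr | Learning rate for the Adam optimizer (default 1e-3). |
| max_norm | Gradient clipping threshold (default Inf, i.e. no clipping). |
| verbose | Print the loss every verbose iterations (default 10); FALSE disables printing. |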
Because all replicas live in the same R process and ag_param uses
environment (reference) semantics, no IPC or NCCL is required — weight
synchronisation is a simple in-place copy.
A list with:
params: Named list of final parameters (from replica 0).
loss_history: Numeric vector of per-iteration mean loss.
model: Replica 0 model object.
make_model <- function() {
  W <- ag_param(matrix(rnorm(4), 2, 2))
  list(
    forward = function(x) ag_matmul(W, x),
    parameters = function() list(W = W)
  )
}
data <- lapply(1:8, function(i) matrix(rnorm(2), 2, 1))
result <- dp_train(
  make_model = make_model,
  data = data,
  loss_fn = function(out, tgt) ag_mse_loss(out, tgt),
  target_fn = function(s) s,
  n_gpu = 1L,
  n_iter = 10L,
  lr = 1e-3,
  verbose = FALSE
)
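The synchronisation step described above can be sketched in base R. It averages per-replica gradients element-wise, takes one update on replica 0, and copies the result into the other replicas, which are environments, so the copy is in place. For simplicity this sketch uses plain SGD rather than the Adam optimizer dp_train uses, and the replica environments and helper names are hypothetical.

```r
# Element-wise mean of per-replica gradient lists (same names in each list).
average_grads <- function(grad_list) {
  summed <- Reduce(function(a, b) Map(`+`, a, b), grad_list)
  lapply(summed, function(g) g / length(grad_list))
}

# Replicas as environments holding parameters by name.
replicas <- lapply(1:3, function(i) {
  e <- new.env(); e$W <- matrix(0, 2, 2); e
})
grad_list <- list(list(W = matrix(1, 2, 2)),
                  list(W = matrix(2, 2, 2)),
                  list(W = matrix(6, 2, 2)))

# One plain SGD step on replica 0, then broadcast the weights to the others.
lr <- 0.5
avg <- average_grads(grad_list)
replicas[[1]]$W <- replicas[[1]]$W - lr * avg$W
for (r in replicas[-1]) r$W <- replicas[[1]]$W
```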
Computes loss and metrics on test data. This is the keras-compatible
interface; it delegates to ggml_evaluate.
## S3 method for class 'ggml_sequential_model'
evaluate(x, test_x, test_y, batch_size = 32L, ...)

## S3 method for class 'ggml_functional_model'
evaluate(x, test_x, test_y, batch_size = 32L, ...)
| Argument | Description |
|---|---|
| x | A trained model object. |
| test_x | Test data. |
| test_y | Test labels. |
| batch_size | Batch size (default 32). |
| ... | Additional arguments passed to ggml_evaluate(). |
A named list with loss and metric values.
Trains the model on data for a fixed number of epochs. This is the
keras-compatible interface; it delegates to ggml_fit.
## S3 method for class 'ggml_sequential_model'
fit(object, x, y, epochs = 1L, batch_size = 32L, validation_split = 0,
    validation_data = NULL, verbose = 1L, callbacks = list(), ...)

## S3 method for class 'ggml_functional_model'
fit(object, x, y, epochs = 1L, batch_size = 32L, validation_split = 0,
    validation_data = NULL, verbose = 1L, callbacks = list(), ...)
object |
A compiled model object. |
x |
Training data. Matrix, array, or list of matrices (multi-input). |
y |
Training labels (matrix, one-hot encoded for classification). |
epochs |
Number of training epochs (default 1). |
batch_size |
Batch size (default 32). |
validation_split |
Fraction of data for validation (default 0). |
validation_data |
Optional |
verbose |
0 = silent, 1 = progress (default 1). |
callbacks |
List of callback objects (default list()). |
... |
Additional arguments passed to ggml_fit. |
The trained model (invisibly), with model$history.
model <- ggml_model_sequential() |>
  ggml_layer_dense(10, activation = "softmax", input_shape = 4)
model <- compile(model, optimizer = "adam", loss = "categorical_crossentropy")
# model <- fit(model, x_train, y_train, epochs = 5, batch_size = 32)
Check if R Abort Handler is Enabled
ggml_abort_is_r_enabled()
Logical indicating if R-compatible abort handling is active
Other logging:
ggml_log_is_r_enabled(),
ggml_log_set_default(),
ggml_log_set_r(),
ggml_set_abort_callback_default(),
ggml_set_abort_callback_r()
Creates a graph node for element-wise absolute value: |x|
ggml_abs(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the abs operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(-2, -1, 1, 2))
result <- ggml_abs(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result) # [2, 1, 1, 2]
ggml_free(ctx)
Creates a graph node for in-place element-wise absolute value.
ggml_abs_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with absolute values
Creates a graph node for element-wise addition. Must be computed using ggml_build_forward_expand() and ggml_graph_compute().
ggml_add(ctx, a, b)
ctx |
GGML context |
a |
First tensor |
b |
Second tensor (same shape as a) |
Tensor representing the addition operation
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(1, 2, 3, 4, 5))
ggml_set_f32(b, c(5, 4, 3, 2, 1))
result <- ggml_add(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_get_f32(result) # [6, 6, 6, 6, 6]
ggml_free(ctx)
Creates a graph node for in-place element-wise addition. Result is stored in tensor a, saving memory allocation. Returns a view of the modified tensor.
ggml_add_inplace(ctx, a, b)
ctx |
GGML context |
a |
First tensor (will be modified in-place) |
b |
Second tensor (same shape as a) |
View of tensor a with the addition result
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(1, 2, 3, 4, 5))
ggml_set_f32(b, c(5, 4, 3, 2, 1))
result <- ggml_add_inplace(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_get_f32(result)
ggml_free(ctx)
Adds width and height relative-position bias to a.
ggml_add_rel_pos(ctx, a, pw, ph)
ctx |
GGML context |
a |
Input tensor |
pw |
Width relative-position tensor |
ph |
Height relative-position tensor |
Tensor with added relative-position bias
Creates a graph node for adding a scalar (1-element tensor) to all elements of a tensor. This is more efficient than creating a full tensor of the same value.
ggml_add1(ctx, a, b)
ctx |
GGML context |
a |
Input tensor |
b |
Scalar tensor (1-element tensor) |
Tensor representing the operation a + b (broadcasted)
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
scalar <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1)
ggml_set_f32(a, c(1, 2, 3, 4, 5))
ggml_set_f32(scalar, 10)
result <- ggml_add1(ctx, a, scalar)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_get_f32(result)
ggml_free(ctx)
Applies a ggml_layer object (created with ggml_dense(),
ggml_lstm(), etc.) to a ggml_tensor_node. Applying the
same layer object to multiple tensor nodes produces shared weights:
the identity of the layer object (layer$layer_id) is used as the
sharing key, not its name.
ggml_apply(tensor, layer)
tensor |
A ggml_tensor_node. |
layer |
A ggml_layer object. |
A new ggml_tensor_node.
encoder <- ggml_dense(64L, activation = "relu")
x1 <- ggml_input(shape = 32L)
x2 <- ggml_input(shape = 32L)
out1 <- x1 |> ggml_apply(encoder)
out2 <- x2 |> ggml_apply(encoder) # shared weights
model <- ggml_model(inputs = list(x1, x2), outputs = list(out1, out2))
Creates a 1D F32 tensor with values from start (inclusive) to
stop (exclusive) in steps of step.
ggml_arange(ctx, start, stop, step = 1)
ctx |
GGML context |
start |
Start value (inclusive) |
stop |
Stop value (exclusive) |
step |
Step between values (default 1) |
1D F32 tensor
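The length of the result follows from the half-open interval [start, stop). Below is a base-R sketch of the expected values. It assumes the element count is ceiling((stop - start) / step), as in the upstream GGML C implementation, which also requires stop > start. `arange_ref` is a hypothetical helper for checking results and is not part of the package. Because the tensor is F32, its values may differ from these doubles in the last bits.

```r
# Hypothetical reference helper (not a package function): mirrors the
# documented ggml_arange() semantics, start inclusive and stop exclusive.
arange_ref <- function(start, stop, step = 1) {
  stopifnot(stop > start, step > 0)
  # Number of elements in the half-open interval [start, stop)
  n <- ceiling((stop - start) / step)
  start + step * (seq_len(n) - 1)
}

arange_ref(0, 5)        # 0 1 2 3 4
arange_ref(0, 1, 0.25)  # 0.00 0.25 0.50 0.75
arange_ref(1, 2, 0.3)   # 1.0 1.3 1.6 1.9 (stop = 2 itself is excluded)
```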
Compares two tensors to check if they have identical type, shape, and strides. Tensors with the same layout can be used interchangeably for memory operations.
ggml_are_same_layout(a, b)
a |
External pointer to first tensor |
b |
External pointer to second tensor |
Logical indicating if tensors have identical layout
Other tensor:
ggml_get_op_params(),
ggml_get_op_params_f32(),
ggml_get_op_params_i32(),
ggml_set_op_params(),
ggml_set_op_params_f32(),
ggml_set_op_params_i32()
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4)
b <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4)
same <- ggml_are_same_layout(a, b) # TRUE
ggml_free(ctx)
Checks if two tensors have the same shape.
ggml_are_same_shape(a, b)
a |
First tensor |
b |
Second tensor |
TRUE if shapes are identical, FALSE otherwise
Check if two tensors have the same stride pattern. Useful for determining if tensors can share operations.
ggml_are_same_stride(a, b)
a |
First tensor |
b |
Second tensor |
Logical indicating if strides are identical
Other tensor_layout:
ggml_can_repeat(),
ggml_count_equal(),
ggml_is_contiguous_0(),
ggml_is_contiguous_1(),
ggml_is_contiguous_2(),
ggml_is_contiguous_channels(),
ggml_is_contiguous_rows(),
ggml_is_contiguously_allocated()
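Two tensors can have the same shape but different strides. The plain arithmetic below makes the difference concrete. The sketch assumes the GGML convention for non-quantized types: nb[0] is the element size in bytes, and each later stride is the previous stride times the previous extent. Quantized block types follow a different rule. `contiguous_strides` is a hypothetical helper and is not a package function.

```r
# Hypothetical helper: byte strides (nb) of a contiguous GGML tensor with
# extents ne = c(ne0, ne1, ne2, ne3), for a non-quantized element type of
# type_size bytes (4 for F32).
contiguous_strides <- function(ne, type_size = 4) {
  cumprod(c(type_size, ne[-length(ne)]))
}

ne_a <- c(4, 3, 1, 1)             # a contiguous 4 x 3 F32 tensor
nb_a <- contiguous_strides(ne_a)  # 4 16 48 48

# A view with dims 0 and 1 permuted (a transpose) swaps both the extents
# and the strides. No data is moved.
ne_t <- ne_a[c(2, 1, 3, 4)]       # 3 4 1 1
nb_t <- nb_a[c(2, 1, 3, 4)]       # 16 4 48 48

# A freshly allocated tensor with the same extents is contiguous:
nb_b <- contiguous_strides(ne_t)  # 4 12 48 48

# The extents match but the stride patterns differ, so a shape check would
# pass for this pair while a stride check would not.
identical(nb_t, nb_b)             # FALSE
```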
Creates a graph node that finds the index of the maximum value. This operation is commonly used for greedy token selection when generating text with LLMs.
ggml_argmax(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor with argmax indices
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(1, 5, 3, 2, 4))
result <- ggml_argmax(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_i32(result) # 1 (0-indexed)
ggml_free(ctx)
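In greedy decoding, the next token is the index of the largest logit. The base-R sketch below performs the same selection on a plain vector, which can help when checking ggml_argmax() output. Note the 0-based offset: the example above returns C-style indices, while which.max() is 1-based. `greedy_token` is a hypothetical helper. which.max() resolves ties to the first maximum; this sketch makes no claim about GGML's tie-breaking.

```r
# Hypothetical helper: 0-based index of the largest value, matching the
# indexing of ggml_argmax() results read back with ggml_get_i32().
greedy_token <- function(logits) {
  which.max(logits) - 1L
}

greedy_token(c(1, 5, 3, 2, 4))  # 1, same as the ggml_argmax() example
```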
Returns indices that would sort the tensor rows. Each row is sorted independently.
ggml_argsort(ctx, a, order = GGML_SORT_ORDER_ASC)
ctx |
GGML context |
a |
Input tensor to sort (F32) |
order |
Sort order: GGML_SORT_ORDER_ASC (0) or GGML_SORT_ORDER_DESC (1) |
Tensor of I32 indices that would sort each row
ctx <- ggml_init(16 * 1024 * 1024)
# Create tensor with values to sort
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(3, 1, 4, 1, 5))
# Get indices for ascending sort
indices <- ggml_argsort(ctx, a, GGML_SORT_ORDER_ASC)
graph <- ggml_build_forward_expand(ctx, indices)
ggml_graph_compute(ctx, graph)
result <- ggml_get_i32(indices)
# result: [1, 3, 0, 2, 4] (0-indexed positions for sorted order);
# the two tied 1s (positions 1 and 3) may be returned in either order
ggml_free(ctx)
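Base R's order() computes the same permutation with 1-based indices, so subtracting 1 gives a convenient reference for ggml_argsort() output. The sketch assumes 0-based indices, as in the example above. order() keeps tied values in their original order, but the GGML result may order ties differently. A robust check therefore compares the gathered values rather than the raw positions. `argsort_ref` is a hypothetical helper and is not a package function.

```r
# Hypothetical reference helper: 0-based argsort in plain R.
argsort_ref <- function(x, descending = FALSE) {
  # order() is 1-based. Negating x sorts in descending order while still
  # leaving tied values in their original relative order.
  idx <- if (descending) order(-x) else order(x)
  idx - 1L
}

x <- c(3, 1, 4, 1, 5)
argsort_ref(x)                     # 1 3 0 2 4
argsort_ref(x, descending = TRUE)  # 4 2 0 1 3

# Tie-insensitive check: gathering x by the indices yields sorted values.
stopifnot(!is.unsorted(x[argsort_ref(x) + 1L]))
```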
Allocates all tensors in a GGML context to a specific backend. Returns a buffer that must be freed with ggml_backend_buffer_free() when no longer needed.
ggml_backend_alloc_ctx_tensors(ctx, backend)
ctx |
GGML context |
backend |
Backend handle |
Backend buffer object
Clear buffer memory
ggml_backend_buffer_clear(buffer, value = 0L)
buffer |
External pointer to buffer |
value |
Byte value to fill with (default 0) |
NULL invisibly
Other backend:
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
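The value argument of ggml_backend_buffer_clear() is a fill byte, not an element value. The base-R sketch below makes no GGML calls. It assumes the buffer holds F32 tensors in IEEE 754 format and shows what a byte-wise fill looks like when the memory is read back as F32 elements. A zero fill gives 0.0; other bytes give unrelated bit patterns. `f32_from_fill_byte` is a hypothetical helper.

```r
# Hypothetical helper: read 4 copies of a fill byte back as one F32 value,
# mimicking what a byte-wise buffer clear leaves in an F32 tensor.
f32_from_fill_byte <- function(byte) {
  readBin(as.raw(rep(byte, 4)), what = "numeric", size = 4, endian = "little")
}

f32_from_fill_byte(0L)     # 0
f32_from_fill_byte(1L)     # about 2.37e-38, not 1
f32_from_fill_byte(0xFFL)  # NaN (all bits set)
```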
Frees a backend buffer and all associated memory.
ggml_backend_buffer_free(buffer)
buffer |
Backend buffer object |
No return value, called for side effects
Returns the total size of a backend buffer.
ggml_backend_buffer_get_size(buffer)
buffer |
Backend buffer object |
Size in bytes
Get buffer usage
ggml_backend_buffer_get_usage(buffer)
buffer |
External pointer to buffer |
Usage constant
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Check if buffer is host memory
ggml_backend_buffer_is_host(buffer)
buffer |
External pointer to buffer |
Logical indicating if buffer is in host memory
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Check if buffer is a multi-buffer
ggml_backend_buffer_is_multi_buffer(buffer)
buffer |
External pointer to buffer |
Logical indicating if buffer is a multi-buffer
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Returns the name/type of a backend buffer.
ggml_backend_buffer_name(buffer)
buffer |
Backend buffer object |
Character string with buffer name
Reset buffer
ggml_backend_buffer_reset(buffer)
buffer |
External pointer to buffer |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Set buffer usage hint
ggml_backend_buffer_set_usage(buffer, usage)
buffer |
External pointer to buffer |
usage |
Usage constant (use ggml_backend_buffer_usage_* functions) |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Buffer usage: Any
ggml_backend_buffer_usage_any()
Integer constant for any buffer usage
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Buffer usage: Compute
ggml_backend_buffer_usage_compute()
Integer constant for compute buffer usage
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Buffer usage: Weights
ggml_backend_buffer_usage_weights()
Integer constant for weights buffer usage
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Creates a new CPU backend instance for graph computation.
ggml_backend_cpu_init()
Backend pointer
Sets the number of threads for CPU backend computation.
ggml_backend_cpu_set_n_threads(backend, n_threads)
backend |
CPU backend pointer |
n_threads |
Number of threads |
NULL invisibly
Get device by name
ggml_backend_dev_by_name(name)
name |
Device name |
External pointer to device, or NULL if not found
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Get device by type
ggml_backend_dev_by_type(type)
type |
Device type (use ggml_backend_device_type_* functions) |
External pointer to first device of given type, or NULL if not found
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Get number of available devices
ggml_backend_dev_count()
Number of devices
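The device count is typically combined with `ggml_backend_dev_get()`, `ggml_backend_dev_name()` and `ggml_backend_dev_description()` to list everything the backend registry can see. The sketch below assumes the `ggmlr` namespace and that the name and description functions each return a single string; `list_devices()` is a hypothetical helper whose dependencies are injectable for testing.

```r
# Hypothetical helper (not part of ggmlr): tabulate all registered devices.
# Indices passed to ggml_backend_dev_get() are 0-based.
list_devices <- function(count_fn = ggmlr::ggml_backend_dev_count,
                         get_dev  = ggmlr::ggml_backend_dev_get,
                         name_fn  = ggmlr::ggml_backend_dev_name,
                         desc_fn  = ggmlr::ggml_backend_dev_description) {
  idx  <- seq_len(count_fn()) - 1L          # 0, 1, ..., n - 1 (empty if n == 0)
  devs <- lapply(idx, get_dev)
  data.frame(index       = idx,
             name        = vapply(devs, name_fn, character(1)),
             description = vapply(devs, desc_fn, character(1)),
             stringsAsFactors = FALSE)
}

# Run only in an interactive session with the package installed.
if (interactive() && requireNamespace("ggmlr", quietly = TRUE)) {
  print(list_devices())
}
```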
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Get device description
ggml_backend_dev_description(device)
device |
External pointer to device |
Device description
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Get device by index
ggml_backend_dev_get(index)
index |
Device index (0-based) |
External pointer to device, or NULL if not found
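Since an out-of-range index yields NULL rather than an error, a thin wrapper can make bad indices fail loudly and make the 0-based convention explicit. This is a sketch assuming the `ggmlr` namespace; `device_at()` is hypothetical.

```r
# Hypothetical helper (not part of ggmlr): fetch a device by 0-based index,
# raising an error instead of silently returning NULL for a bad index.
device_at <- function(index,
                      count_fn = ggmlr::ggml_backend_dev_count,
                      get_dev  = ggmlr::ggml_backend_dev_get) {
  n <- count_fn()
  if (index < 0 || index >= n) {
    stop(sprintf("device index %d is outside [0, %d)",
                 as.integer(index), as.integer(n)))
  }
  get_dev(index)
}

# Run only in an interactive session with the package installed.
if (interactive() && requireNamespace("ggmlr", quietly = TRUE)) {
  first <- device_at(0)
}
```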
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Get device properties
ggml_backend_dev_get_props(device)
device |
External pointer to device |
List with name, description, memory_free, memory_total, type, device_id, caps
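The returned list can be condensed into a one-line summary. The sketch below uses only the documented field names and assumes that `memory_free` and `memory_total` are in bytes, as with `ggml_backend_dev_memory()`; `summarise_props()` is a hypothetical helper and `ggmlr` is the assumed namespace.

```r
# Hypothetical helper (not part of ggmlr): one-line summary built from the
# list returned by ggml_backend_dev_get_props(). Memory is assumed in bytes.
summarise_props <- function(props) {
  gib <- 1024^3
  sprintf("%s [%s]: %.2f of %.2f GiB free",
          props$name, props$description,
          props$memory_free / gib, props$memory_total / gib)
}

# Run only in an interactive session with the package installed.
if (interactive() && requireNamespace("ggmlr", quietly = TRUE)) {
  dev <- ggmlr::ggml_backend_dev_get(0)
  if (!is.null(dev)) cat(summarise_props(ggmlr::ggml_backend_dev_get_props(dev)), "\n")
}
```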
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Initialize backend from device
ggml_backend_dev_init(device, params = NULL)
device |
External pointer to device |
params |
Optional parameters string |
External pointer to backend, or NULL on failure
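Both device lookup and initialisation signal failure by returning NULL, so it can help to convert that into an R error at the point of initialisation. A minimal sketch, assuming the `ggmlr` namespace; `init_backend_or_stop()` is hypothetical.

```r
# Hypothetical helper (not part of ggmlr): initialise a backend and turn the
# documented NULL-on-failure result into an R error.
init_backend_or_stop <- function(device, params = NULL,
                                 init = ggmlr::ggml_backend_dev_init) {
  if (is.null(device)) stop("no device supplied (did the device lookup return NULL?)")
  backend <- init(device, params)
  if (is.null(backend)) stop("ggml_backend_dev_init() returned NULL")
  backend
}

# Run only in an interactive session with the package installed.
if (interactive() && requireNamespace("ggmlr", quietly = TRUE)) {
  cpu <- ggmlr::ggml_backend_dev_by_type(ggmlr::ggml_backend_device_type_cpu())
  backend <- init_backend_or_stop(cpu)
}
```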
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Get device memory
ggml_backend_dev_memory(device)
device |
External pointer to device |
Named numeric vector with 'free' and 'total' memory in bytes
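The `free` element can serve as a pre-check before a large allocation. Free memory is only a snapshot and may change before the allocation happens, so treat this as a heuristic. The sketch assumes the `ggmlr` namespace; `fits_in_memory()` is hypothetical.

```r
# Hypothetical helper (not part of ggmlr): check whether `bytes_needed`
# bytes fit in the device's currently reported free memory.
fits_in_memory <- function(device, bytes_needed,
                           memory_fn = ggmlr::ggml_backend_dev_memory) {
  mem <- memory_fn(device)          # named numeric: c(free = ..., total = ...)
  mem[["free"]] >= bytes_needed
}

# Run only in an interactive session with the package installed.
if (interactive() && requireNamespace("ggmlr", quietly = TRUE)) {
  dev <- ggmlr::ggml_backend_dev_get(0)
  if (!is.null(dev)) print(fits_in_memory(dev, 512 * 1024^2))   # 512 MiB
}
```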
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Get device name
ggml_backend_dev_name(device)
device |
External pointer to device |
Device name
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Check if device should offload operation
ggml_backend_dev_offload_op(device, op)
device |
External pointer to device |
op |
External pointer to tensor/operation |
Logical indicating whether the operation should be offloaded to this device
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Check if device supports buffer type
ggml_backend_dev_supports_buft(device, buft)
device |
External pointer to device |
buft |
External pointer to buffer type |
Logical indicating support
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Check if device supports operation
ggml_backend_dev_supports_op(device, op)
device |
External pointer to device |
op |
External pointer to tensor/operation |
Logical indicating support
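When several devices are available, the support check can choose where an operation runs. The sketch below takes an already-built `op` tensor pointer (creating one requires context and tensor functions not covered in this entry) and a list of device pointers, for example one collected with `ggml_backend_dev_get()` over indices `0` to `ggml_backend_dev_count() - 1`. It assumes the `ggmlr` namespace, and `first_supporting_device()` is hypothetical.

```r
# Hypothetical helper (not part of ggmlr): return the first device in
# `devices` that reports support for the operation `op`, or NULL if none does.
first_supporting_device <- function(op, devices,
                                    supports = ggmlr::ggml_backend_dev_supports_op) {
  for (dev in devices) {
    if (isTRUE(supports(dev, op))) return(dev)
  }
  NULL
}
```

Listing devices in order of preference (for example GPU before CPU) makes the first match the preferred one.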
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Get device type
ggml_backend_dev_type(device)
device |
External pointer to device |
Device type constant
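The returned constant is best compared against the `ggml_backend_device_type_*()` functions rather than against hard-coded integers, whose values are an implementation detail of the underlying library. A minimal sketch, assuming the `ggmlr` namespace; `device_type_label()` is hypothetical.

```r
# Hypothetical helper (not part of ggmlr): map the value returned by
# ggml_backend_dev_type() to a label using the package's type constants.
device_type_label <- function(type,
                              cpu   = ggmlr::ggml_backend_device_type_cpu(),
                              gpu   = ggmlr::ggml_backend_device_type_gpu(),
                              igpu  = ggmlr::ggml_backend_device_type_igpu(),
                              accel = ggmlr::ggml_backend_device_type_accel()) {
  known <- c(CPU = cpu, GPU = gpu, IGPU = igpu, ACCEL = accel)
  hit <- names(known)[known == type]
  if (length(hit) > 0) hit[[1]] else "unknown"
}

# Run only in an interactive session with the package installed.
if (interactive() && requireNamespace("ggmlr", quietly = TRUE)) {
  dev <- ggmlr::ggml_backend_dev_get(0)
  if (!is.null(dev)) print(device_type_label(ggmlr::ggml_backend_dev_type(dev)))
}
```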
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Dynamically registers a new device in the global registry. This is an advanced function for custom backend development.
ggml_backend_device_register(device)
device |
External pointer to device |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Device type: Accelerator
ggml_backend_device_type_accel()
Integer constant for accelerator device type (e.g. BLAS, AMX)
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Device type: CPU
ggml_backend_device_type_cpu()
Integer constant for CPU device type
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Device type: GPU
ggml_backend_device_type_gpu()
Integer constant for GPU device type
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Device type: Integrated GPU
ggml_backend_device_type_igpu()
Integer constant for integrated GPU device type
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Free event
ggml_backend_event_free(event)
event |
External pointer to event |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Create new event
ggml_backend_event_new(device)
device |
External pointer to device |
External pointer to event, or NULL on failure
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Record event
ggml_backend_event_record(event, backend)
event |
External pointer to event |
backend |
External pointer to backend |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Synchronize event
ggml_backend_event_synchronize(event)
event |
External pointer to event |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Wait for event
ggml_backend_event_wait(backend, event)
backend |
External pointer to backend |
event |
External pointer to event |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
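The event functions above are designed to be used together. The following sketch shows one plausible lifecycle. It assumes the package is attached as ggmlr and that ggml_backend_event_new() returns NULL on devices without event support (upstream ggml does this for the CPU device). It records an event on a backend's work queue, blocks the R session until that point is reached, and releases the event.

```r
library(ggmlr)

# Pick a backend and look up the device that owns it
backend <- ggml_backend_init_best()
dev <- ggml_backend_get_device(backend)

# Devices without event support (e.g. the CPU in upstream ggml) yield NULL
event <- ggml_backend_event_new(dev)

if (is.null(event)) {
  message("Events are not supported on this device")
} else {
  # Mark the current point in the backend's work queue
  ggml_backend_event_record(event, backend)
  # A second backend could instead call
  # ggml_backend_event_wait(other_backend, event) to wait on the device side;
  # here the host blocks until the recorded work is done
  ggml_backend_event_synchronize(event)
  ggml_backend_event_free(event)
}

ggml_backend_free(backend)
```

Checking for NULL keeps the same script usable on CPU-only machines, where the event calls are simply skipped.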
Releases resources associated with a backend.
ggml_backend_free(backend)
backend |
Backend pointer |
NULL invisibly
Get device from backend
ggml_backend_get_device(backend)
backend |
External pointer to backend |
External pointer to device
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Executes a computation graph on the specified backend and blocks until the computation has finished.
ggml_backend_graph_compute(backend, graph)
backend |
Backend pointer |
graph |
Graph pointer |
Status code (0 = success)
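A minimal sketch of the blocking variant, mirroring the example given for ggml_backend_graph_compute_async() but without the explicit ggml_backend_synchronize() call. Like that example, it assumes tensors allocated in an ordinary context can be computed on the CPU backend, and that the package is attached as ggmlr.

```r
library(ggmlr)

cpu <- ggml_backend_cpu_init()
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 100)
b <- ggml_relu(ctx, a)
graph <- ggml_build_forward_expand(ctx, b)
ggml_set_f32(a, rnorm(100))

# Blocking call: returns only after the whole graph has been evaluated
status <- ggml_backend_graph_compute(cpu, graph)
if (status != 0) warning("graph computation failed with status ", status)

ggml_backend_free(cpu)
ggml_free(ctx)
```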
Starts graph computation without blocking. Use ggml_backend_synchronize() to wait for completion.
ggml_backend_graph_compute_async(backend, graph)
backend |
External pointer to backend |
graph |
External pointer to computation graph |
Integer status code (0 = success)
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
cpu <- ggml_backend_cpu_init()
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 100)
b <- ggml_relu(ctx, a)
graph <- ggml_build_forward_expand(ctx, b)
ggml_set_f32(a, rnorm(100))
# Start async computation
status <- ggml_backend_graph_compute_async(cpu, graph)
# Do other work while computation runs...
ggml_backend_synchronize(cpu)
ggml_backend_free(cpu)
ggml_free(ctx)
Execute graph plan
ggml_backend_graph_plan_compute(backend, plan)
backend |
External pointer to backend |
plan |
External pointer to plan |
Status code (0 = success)
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Create graph execution plan
ggml_backend_graph_plan_create(backend, graph)
backend |
External pointer to backend |
graph |
External pointer to computation graph |
External pointer to plan, or NULL on failure
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Free graph execution plan
ggml_backend_graph_plan_free(backend, plan)
backend |
External pointer to backend |
plan |
External pointer to plan |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
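The three plan functions form a create/compute/free cycle. The sketch below assumes a plan may be reused for several evaluations while the graph itself is unchanged, with only input values varying between runs. It uses the CPU backend, which implements graph plans in upstream ggml (some GPU backends do not), and treats a NULL plan as "not supported".

```r
library(ggmlr)

cpu <- ggml_backend_cpu_init()
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 100)
b <- ggml_relu(ctx, a)
graph <- ggml_build_forward_expand(ctx, b)

# Plan once; NULL means the backend does not support graph plans
plan <- ggml_backend_graph_plan_create(cpu, graph)

statuses <- integer(0)
if (!is.null(plan)) {
  for (i in 1:3) {
    # Same graph structure, fresh input values on every pass
    ggml_set_f32(a, rnorm(100))
    statuses <- c(statuses, ggml_backend_graph_plan_compute(cpu, plan))
  }
  # Plans are released through the backend that created them
  ggml_backend_graph_plan_free(cpu, plan)
}

ggml_backend_free(cpu)
ggml_free(ctx)
```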
Initialize best available backend
ggml_backend_init_best()
External pointer to backend (GPU if available, otherwise CPU)
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
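ggml_backend_init_best() is a convenient default when the same script must run on machines with and without a GPU. A short sketch, assuming that a NULL return (no usable device found) should fall back to ggml_backend_cpu_init(), the CPU constructor used in the examples of this manual:

```r
library(ggmlr)

# Prefer whatever device ggml ranks highest; fall back to an explicit CPU backend
backend <- ggml_backend_init_best()
if (is.null(backend)) {
  backend <- ggml_backend_cpu_init()
}

# ... allocate tensors and compute graphs on `backend` here ...

ggml_backend_free(backend)
```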
Initialize backend by name
ggml_backend_init_by_name(name, params = NULL)
name |
Backend device name (e.g. "CPU"; Vulkan devices are typically named "Vulkan0", "Vulkan1", ...) |
params |
Optional parameters string |
External pointer to backend, or NULL on failure
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
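Because a failed lookup returns NULL, ggml_backend_init_by_name() fits a try-then-fall-back pattern. The device name "Vulkan0" is an assumption based on how upstream ggml names Vulkan devices (one per GPU, numbered from 0). On a machine without Vulkan, the first call simply returns NULL and the CPU is used instead.

```r
library(ggmlr)

# Try the first Vulkan device (assumed name), then fall back to the CPU
backend <- ggml_backend_init_by_name("Vulkan0")
if (is.null(backend)) {
  backend <- ggml_backend_init_by_name("CPU")
}

ggml_backend_free(backend)
```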
Initialize backend by type
ggml_backend_init_by_type(type, params = NULL)
type |
Device type constant |
params |
Optional parameters string |
External pointer to backend, or NULL on failure
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
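The same fallback can be expressed in terms of device types instead of names. This sketch assumes that ggml_backend_device_type_gpu() and ggml_backend_device_type_cpu(), listed under "Other backend", return the constants expected by the type argument.

```r
library(ggmlr)

# First GPU-type device, if any; otherwise the first CPU-type device
backend <- ggml_backend_init_by_type(ggml_backend_device_type_gpu())
if (is.null(backend)) {
  backend <- ggml_backend_init_by_type(ggml_backend_device_type_cpu())
}

ggml_backend_free(backend)
```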
Load backend from dynamic library
ggml_backend_load(path)
path |
Path to dynamic library |
External pointer to registry, or NULL on failure
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
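Loading only applies when a backend has been built as a separate shared library. The path below is hypothetical and must be replaced with a real file. The sketch skips the call when the file is missing and treats a NULL result as a load failure.

```r
library(ggmlr)

# Hypothetical location of a separately built backend library
lib_path <- "/opt/ggml/lib/libggml-vulkan.so"

reg <- NULL
if (file.exists(lib_path)) {
  reg <- ggml_backend_load(lib_path)
}

if (is.null(reg)) {
  message("No backend registered from ", lib_path)
} else {
  message("Backend registry loaded; its devices are now visible to ggml")
}
```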
Load all available backends
ggml_backend_load_all()
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
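A quick way to check whether ggml_backend_load_all() found anything is to compare the device count before and after the call. The sketch assumes ggml_backend_dev_count(), listed under "Other backend", takes no arguments and returns the number of registered devices.

```r
library(ggmlr)

n_before <- ggml_backend_dev_count()
ggml_backend_load_all()
n_after <- ggml_backend_dev_count()

# Loading registers additional devices; it does not remove existing ones
message(sprintf("Devices before: %d, after: %d",
                as.integer(n_before), as.integer(n_after)))
```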
Creates a "meta" device that wraps multiple "simple" backend devices for
tensor parallelism. Each tensor is split across the wrapped devices
according to the result of split_fn, which is called by ggml when
weight buffers are allocated.
ggml_backend_meta_device(devs, split_fn, env = environment(split_fn))
devs |
A list of |
split_fn |
A function |
env |
An environment in which to evaluate |
The split function is invoked with two arguments:
first, a named list describing the tensor, with fields name (character), type (integer ggml_type enum), ne (numeric vector of dimensions), op (integer op enum) and flags (integer);
second, the number of simple devices wrapped by the meta backend.
It must return a named list with three elements:
the split axis, an integer: one of 0..3 to split along that tensor axis, 10 for MIRRORED (a full copy on each device), 11 for PARTIAL (each device holds a partial sum), or 98/99 for NONE/UNKNOWN;
the slice sizes, an integer or numeric vector of length n_segments * n_devs giving the per-segment, per-device slice size along the split axis;
the segment count, an integer that is usually 1 and larger for fused tensors such as QKV.
If split_fn raises an error or returns a result that cannot be parsed, the meta
backend silently falls back to MIRRORED for that tensor and stops
calling the callback (sticky error). This is intentional: a misbehaving
callback would otherwise raise an error for every tensor in the model.
Note: with a single device this is a degenerate (no-op) configuration, useful for testing but providing no parallelism benefit. The feature is experimental and the API may change.
External pointer to the meta ggml_backend_dev_t.
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
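Creates a buffer that wraps several existing backend buffers so they can be managed as one: clearing the multi-buffer or setting its usage applies to every wrapped buffer, and freeing it also frees the wrapped buffers. In ggml, the multi-buffer takes its buffer type from the first buffer in the list.
ggml_backend_multi_buffer_alloc_buffer(buffers)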
buffers |
List of backend buffer external pointers |
External pointer to multi-buffer
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
cpu <- ggml_backend_cpu_init()
ctx1 <- ggml_init(1024, no_alloc = TRUE)
ctx2 <- ggml_init(2048, no_alloc = TRUE)
a <- ggml_new_tensor_1d(ctx1, GGML_TYPE_F32, 10)
b <- ggml_new_tensor_1d(ctx2, GGML_TYPE_F32, 20)
buf1 <- ggml_backend_alloc_ctx_tensors(ctx1, cpu)
buf2 <- ggml_backend_alloc_ctx_tensors(ctx2, cpu)
multi <- ggml_backend_multi_buffer_alloc_buffer(list(buf1, buf2))
ggml_backend_buffer_free(multi)
ggml_backend_free(cpu)
ggml_free(ctx1)
ggml_free(ctx2)
Set usage for all buffers in a multi-buffer
ggml_backend_multi_buffer_set_usage(buffer, usage)
buffer |
External pointer to multi-buffer |
usage |
Usage constant, as returned by ggml_backend_buffer_usage_any(), ggml_backend_buffer_usage_weights(), or ggml_backend_buffer_usage_compute() |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
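A minimal sketch of marking every sub-buffer of a multi-buffer as weight storage, reusing the allocation pattern from the ggml_backend_multi_buffer_alloc_buffer() example. It assumes the package is attached with library(ggmlr) and runs on the CPU backend only.

```r
library(ggmlr)

cpu  <- ggml_backend_cpu_init()
ctx1 <- ggml_init(1024, no_alloc = TRUE)
ctx2 <- ggml_init(1024, no_alloc = TRUE)
w1 <- ggml_new_tensor_1d(ctx1, GGML_TYPE_F32, 16)
w2 <- ggml_new_tensor_1d(ctx2, GGML_TYPE_F32, 16)

# Allocate each context on the CPU backend, then wrap both buffers
buf1  <- ggml_backend_alloc_ctx_tensors(ctx1, cpu)
buf2  <- ggml_backend_alloc_ctx_tensors(ctx2, cpu)
multi <- ggml_backend_multi_buffer_alloc_buffer(list(buf1, buf2))

# Tag every sub-buffer as holding model weights
ret <- ggml_backend_multi_buffer_set_usage(multi, ggml_backend_buffer_usage_weights())

ggml_backend_buffer_free(multi)  # also releases buf1 and buf2
ggml_backend_free(cpu)
ggml_free(ctx1)
ggml_free(ctx2)
```

Setting usage once on the multi-buffer avoids calling ggml_backend_buffer_set_usage() on each sub-buffer separately.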
Returns the name of the backend (e.g., "CPU").
ggml_backend_name(backend)
backend |
Backend pointer |
Character string name
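For example, querying the CPU backend (assuming library(ggmlr) is attached). The exact string comes from the underlying GGML backend; upstream GGML reports "CPU" for its CPU backend, but the check below only relies on getting a single non-empty string.

```r
library(ggmlr)

cpu <- ggml_backend_cpu_init()
backend_name <- ggml_backend_name(cpu)  # typically "CPU"
print(backend_name)
ggml_backend_free(cpu)
```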
Get backend registry by name
ggml_backend_reg_by_name(name)
name |
Registry name |
External pointer to registry, or NULL if not found
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
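A short sketch of looking up a registry by name and handling the not-found case (assumes library(ggmlr) is attached). Upstream GGML registers its CPU backend under the name "CPU"; "no-such-backend" is a deliberately invalid, hypothetical name.

```r
library(ggmlr)

# Look up the CPU registry by name
cpu_reg <- ggml_backend_reg_by_name("CPU")
if (!is.null(cpu_reg)) {
  cat("Found registry:", ggml_backend_reg_name(cpu_reg), "\n")
  cat("Devices:", ggml_backend_reg_dev_count(cpu_reg), "\n")
}

# An unknown name yields NULL rather than an error
missing_reg <- ggml_backend_reg_by_name("no-such-backend")
```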
Get number of registered backends
ggml_backend_reg_count()
Number of registered backends
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Get number of devices in registry
ggml_backend_reg_dev_count(reg)
reg |
External pointer to registry |
Number of devices
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Get device from registry
ggml_backend_reg_dev_get(reg, index)
reg |
External pointer to registry |
index |
Device index (0-based) |
External pointer to device
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Get backend registry by index
ggml_backend_reg_get(index)
index |
Registry index (0-based) |
External pointer to registry, or NULL if not found
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Get registry name
ggml_backend_reg_name(reg)
reg |
External pointer to registry |
Registry name
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
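The registry functions compose into a simple inventory of every registered backend and its devices. This sketch assumes library(ggmlr) is attached and that ggml_backend_dev_name() (listed above but not documented in this section) accepts the device pointer returned by ggml_backend_reg_dev_get(). Note that both registry and device indices are 0-based.

```r
library(ggmlr)

n_reg <- ggml_backend_reg_count()
reg_names <- character(0)

# Registry and device indices are 0-based
for (i in seq_len(n_reg) - 1L) {
  reg <- ggml_backend_reg_get(i)
  if (is.null(reg)) next
  reg_name <- ggml_backend_reg_name(reg)
  reg_names <- c(reg_names, reg_name)
  n_dev <- ggml_backend_reg_dev_count(reg)
  cat(sprintf("Registry %d: %s (%d device(s))\n", i, reg_name, n_dev))
  for (j in seq_len(n_dev) - 1L) {
    dev <- ggml_backend_reg_dev_get(reg, j)
    # Assumption: ggml_backend_dev_name() takes the device pointer
    cat("  -", ggml_backend_dev_name(dev), "\n")
  }
}
```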
Dynamically registers a new backend in the global registry. This is an advanced function for custom backend development.
ggml_backend_register(reg)
reg |
External pointer to backend registry |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
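Allocates memory for a graph across the scheduler's backends. ggml_backend_sched_graph_compute() performs this step automatically when the scheduler is not already allocated, so an explicit call is mainly needed when tensors must be allocated before computing, for example after assigning backends with ggml_backend_sched_set_tensor_backend().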
ggml_backend_sched_alloc_graph(sched, graph)
sched |
Scheduler pointer |
graph |
Graph pointer |
Logical indicating success
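A CPU-only sketch of allocating a graph explicitly and then computing it, modelled on the package's own scheduler examples (assumes library(ggmlr) is attached). The input values are exactly representable in f32, so the sum can be compared exactly.

```r
library(ggmlr)

cpu   <- ggml_backend_cpu_init()
sched <- ggml_backend_sched_new(list(cpu))

ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(1, 2, 3, 4))
ggml_set_f32(b, c(10, 20, 30, 40))
out   <- ggml_add(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, out)

ok     <- ggml_backend_sched_alloc_graph(sched, graph)     # explicit allocation
status <- ggml_backend_sched_graph_compute(sched, graph)   # reuses that allocation
result <- ggml_get_f32(out)                                # read before freeing ctx

ggml_backend_sched_free(sched)
ggml_backend_free(cpu)
ggml_free(ctx)
```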
Releases resources associated with the backend scheduler.
ggml_backend_sched_free(sched)
sched |
Scheduler pointer from ggml_backend_sched_new() |
NULL (invisible)
cpu <- ggml_backend_cpu_init()
sched <- ggml_backend_sched_new(list(cpu))
ggml_backend_sched_free(sched)
ggml_backend_free(cpu)
Returns a specific backend from the scheduler by index.
ggml_backend_sched_get_backend(sched, index = 0L)
sched |
Scheduler pointer |
index |
Backend index (0-based) |
Backend pointer
Returns the number of backends managed by the scheduler.
ggml_backend_sched_get_n_backends(sched)
sched |
Scheduler pointer |
Integer count of backends
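ggml_backend_sched_get_n_backends() and ggml_backend_sched_get_backend() together let you list what a scheduler actually manages, including any CPU fallback it added. A sketch, assuming library(ggmlr) is attached; indices are 0-based, and in upstream GGML the last scheduler backend is always a CPU backend.

```r
library(ggmlr)

cpu   <- ggml_backend_cpu_init()
sched <- ggml_backend_sched_new(list(cpu))

n <- ggml_backend_sched_get_n_backends(sched)
# Name every backend the scheduler manages (0-based indices)
backend_names <- vapply(seq_len(n) - 1L, function(i) {
  ggml_backend_name(ggml_backend_sched_get_backend(sched, i))
}, character(1))
print(backend_names)

ggml_backend_sched_free(sched)
ggml_backend_free(cpu)
```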
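Returns the number of graph copies the scheduler keeps. In upstream GGML this is 1 unless the scheduler was created with parallel = TRUE, in which case several copies are kept for pipeline parallelism; it is not a count of tensor transfers between backends.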
ggml_backend_sched_get_n_copies(sched)
sched |
Scheduler pointer |
Integer count of copies
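Returns the number of splits in the last computed graph. A split is a contiguous group of nodes assigned to the same backend, so a higher count means the graph switches between backends more often, which usually implies more data transfers.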
ggml_backend_sched_get_n_splits(sched)
sched |
Scheduler pointer |
Integer count of splits
Returns which backend a tensor is assigned to.
ggml_backend_sched_get_tensor_backend(sched, tensor)
sched |
Scheduler pointer |
tensor |
Tensor pointer |
Backend pointer or NULL if not assigned
Computes a graph by distributing work across multiple backends. This is the main function for multi-GPU computation.
ggml_backend_sched_graph_compute(sched, graph)
sched |
Scheduler pointer |
graph |
Graph pointer |
Status code (0 = success)
# Multi-GPU example
if (ggml_vulkan_available() && ggml_vulkan_device_count() >= 2) {
  gpu1 <- ggml_vulkan_init(0)
  gpu2 <- ggml_vulkan_init(1)
  sched <- ggml_backend_sched_new(list(gpu1, gpu2))
  ctx <- ggml_init(64 * 1024 * 1024)
  a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10000)
  b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10000)
  ggml_set_f32(a, rnorm(10000))
  ggml_set_f32(b, rnorm(10000))
  c <- ggml_add(ctx, a, b)
  graph <- ggml_build_forward_expand(ctx, c)
  # Reserve memory
  ggml_backend_sched_reserve(sched, graph)
  # Compute using both GPUs
  ggml_backend_sched_graph_compute(sched, graph)
  result <- ggml_get_f32(c)
  cat("Splits:", ggml_backend_sched_get_n_splits(sched), "\n")
  cat("Copies:", ggml_backend_sched_get_n_copies(sched), "\n")
  ggml_free(ctx)
  ggml_backend_sched_free(sched)
  ggml_vulkan_free(gpu1)
  ggml_vulkan_free(gpu2)
}
Computes a graph asynchronously across backends. Use ggml_backend_sched_synchronize() to wait for completion.
ggml_backend_sched_graph_compute_async(sched, graph)
sched |
Scheduler pointer |
graph |
Graph pointer |
Status code (0 = success)
Creates a scheduler that can distribute computation across multiple backends (GPUs, CPU). A CPU backend is automatically added as a fallback. Backends with lower index have higher priority.
ggml_backend_sched_new(backends, parallel = TRUE, graph_size = 2048)
backends |
List of backend pointers (from ggml_vulkan_init() or ggml_backend_cpu_init()). Note: A CPU backend is automatically added, so you only need to specify GPU backends. |
parallel |
Logical, whether to run backends in parallel (default: TRUE) |
graph_size |
Expected maximum graph size (default: 2048) |
Scheduler pointer
if (ggml_vulkan_available() && ggml_vulkan_device_count() >= 2) {
  # Create two GPU backends (CPU is added automatically)
  gpu1 <- ggml_vulkan_init(0)
  gpu2 <- ggml_vulkan_init(1)
  # Create scheduler with both GPUs + CPU (automatic)
  sched <- ggml_backend_sched_new(list(gpu1, gpu2), parallel = TRUE)
  # The scheduler now has 3 backends: GPU1, GPU2, CPU
  cat("Backends:", ggml_backend_sched_get_n_backends(sched), "\n")
  # Use scheduler...
  # Cleanup
  ggml_backend_sched_free(sched)
  ggml_vulkan_free(gpu1)
  ggml_vulkan_free(gpu2)
}
Pre-allocates scheduler memory using a measurement graph. Call it before computing, ideally with the largest graph you expect to run, so that later graphs of the same or smaller size do not require reallocation.
ggml_backend_sched_reserve(sched, graph)
sched |
Scheduler pointer |
graph |
Graph pointer to measure memory requirements |
Logical indicating success
cpu <- ggml_backend_cpu_init()
sched <- ggml_backend_sched_new(list(cpu))
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1000)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1000)
c <- ggml_add(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, c)
ggml_backend_sched_reserve(sched, graph)
ggml_backend_sched_free(sched)
ggml_backend_free(cpu)
ggml_free(ctx)
Resets the scheduler, deallocating all tensors. Must be called before changing node backends or allocating a new graph.
ggml_backend_sched_reset(sched)
sched |
Scheduler pointer |
NULL (invisible)
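A sketch of reusing one scheduler for two different graphs, resetting in between (assumes library(ggmlr) is attached; CPU only). In upstream GGML a scheduler that is still allocated for the first graph would skip allocation for the next one, which is why the reset matters. Each result is read before the next reset.

```r
library(ggmlr)

cpu   <- ggml_backend_cpu_init()
sched <- ggml_backend_sched_new(list(cpu))
ctx   <- ggml_init(16 * 1024 * 1024)

a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 3)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 3)
ggml_set_f32(a, c(1, 2, 3))
ggml_set_f32(b, c(4, 5, 6))

# First graph: a + b
sum_t <- ggml_add(ctx, a, b)
g1 <- ggml_build_forward_expand(ctx, sum_t)
ggml_backend_sched_graph_compute(sched, g1)
sum_vals <- ggml_get_f32(sum_t)

# Drop the first graph's assignments and allocations
ggml_backend_sched_reset(sched)

# Second graph: a + a
double_t <- ggml_add(ctx, a, a)
g2 <- ggml_build_forward_expand(ctx, double_t)
ggml_backend_sched_graph_compute(sched, g2)
double_vals <- ggml_get_f32(double_t)

ggml_backend_sched_free(sched)
ggml_backend_free(cpu)
ggml_free(ctx)
```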
Manually assigns a specific tensor to run on a specific backend, overriding automatic scheduling. Make the assignment after ggml_backend_sched_reset() and before the graph is allocated.
ggml_backend_sched_set_tensor_backend(sched, tensor, backend)
sched |
Scheduler pointer |
tensor |
Tensor pointer |
backend |
Backend pointer to assign tensor to |
NULL (invisible)
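A sketch of pinning a node to a backend and reading the assignment back (assumes library(ggmlr) is attached). The target is taken from the scheduler itself with ggml_backend_sched_get_backend(), since upstream GGML rejects backends the scheduler does not manage. The graph is allocated explicitly because, in upstream GGML, computing an unallocated graph after a manual assignment resets the scheduler first and discards the assignment.

```r
library(ggmlr)

cpu   <- ggml_backend_cpu_init()
sched <- ggml_backend_sched_new(list(cpu))
ctx   <- ggml_init(16 * 1024 * 1024)

a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(1, 2, 3, 4))
ggml_set_f32(b, c(1, 1, 1, 1))
out   <- ggml_add(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, out)

# Assign on a freshly reset scheduler, before allocation
ggml_backend_sched_reset(sched)
target <- ggml_backend_sched_get_backend(sched, 0L)
target_name <- ggml_backend_name(target)
ggml_backend_sched_set_tensor_backend(sched, out, target)

# Allocate explicitly so the manual assignment is kept
ggml_backend_sched_alloc_graph(sched, graph)
ggml_backend_sched_graph_compute(sched, graph)

assigned <- ggml_backend_sched_get_tensor_backend(sched, out)
assigned_name <- if (is.null(assigned)) NA_character_ else ggml_backend_name(assigned)
result <- ggml_get_f32(out)

ggml_backend_sched_free(sched)
ggml_backend_free(cpu)
ggml_free(ctx)
```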
Waits for all asynchronous operations to complete.
ggml_backend_sched_synchronize(sched)
sched |
Scheduler pointer |
NULL (invisible)
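The asynchronous compute path pairs with ggml_backend_sched_synchronize(): launch the graph, optionally do other work, then wait before reading outputs. A CPU-only sketch, assuming library(ggmlr) is attached; values are chosen to be exact in f32.

```r
library(ggmlr)

cpu   <- ggml_backend_cpu_init()
sched <- ggml_backend_sched_new(list(cpu))
ctx   <- ggml_init(16 * 1024 * 1024)

a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(0.5, 1.5, 2.5, 3.5))
ggml_set_f32(b, c(0.5, 0.5, 0.5, 0.5))
out   <- ggml_add(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, out)

status <- ggml_backend_sched_graph_compute_async(sched, graph)
# ... other R work could run here ...
ggml_backend_sched_synchronize(sched)   # wait before reading results
result <- ggml_get_f32(out)

ggml_backend_sched_free(sched)
ggml_backend_free(cpu)
ggml_free(ctx)
```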
Synchronize backend
ggml_backend_synchronize(backend)
backend |
External pointer to backend |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Copy tensor asynchronously between backends
ggml_backend_tensor_copy_async(backend_src, backend_dst, src, dst)
backend_src |
Source backend |
backend_dst |
Destination backend |
src |
Source tensor |
dst |
Destination tensor |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
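A minimal sketch of copying one backend tensor into another and waiting for the copy to land (assumes library(ggmlr) is attached). Upstream GGML requires src and dst to share the same type and shape; on the CPU backend, which has no asynchronous copy path, it falls back to a blocking copy, so the same code also illustrates the calling pattern a GPU backend would need.

```r
library(ggmlr)

cpu <- ggml_backend_cpu_init()
ctx <- ggml_init(1024 * 1024, no_alloc = TRUE)
src <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
dst <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)   # same type and shape as src
buf <- ggml_backend_alloc_ctx_tensors(ctx, cpu)

ggml_backend_tensor_set_data(src, c(1, 2, 3, 4))
ggml_backend_tensor_copy_async(cpu, cpu, src, dst)
ggml_backend_synchronize(cpu)   # make sure the copy has finished
copied <- ggml_backend_tensor_get_data(dst)

ggml_backend_buffer_free(buf)
ggml_backend_free(cpu)
ggml_free(ctx)
```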
Gets tensor data from a backend with synchronization.
ggml_backend_tensor_get_and_sync(backend, tensor, offset = 0, size)
backend |
Backend pointer (or NULL for CPU) |
tensor |
Tensor pointer |
offset |
Byte offset (default 0) |
size |
Number of bytes to read |
Raw vector with tensor data
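Because ggml_backend_tensor_get_and_sync() returns raw bytes, f32 data has to be decoded in R. A sketch using base R's readBin() with 4-byte floats in the machine's native byte order (assumes library(ggmlr) is attached and that the tensor lives in a CPU backend buffer):

```r
library(ggmlr)

cpu <- ggml_backend_cpu_init()
ctx <- ggml_init(1024 * 1024, no_alloc = TRUE)
x   <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
buf <- ggml_backend_alloc_ctx_tensors(ctx, cpu)
ggml_backend_tensor_set_data(x, c(1.5, -2, 3, 0.25))

# 4 f32 elements = 16 bytes; the result is a raw vector
raw_bytes <- ggml_backend_tensor_get_and_sync(cpu, x, offset = 0, size = 16)
# Decode as 4-byte floats using the platform's byte order
vals <- readBin(raw_bytes, what = "numeric", n = 4, size = 4, endian = .Platform$endian)

ggml_backend_buffer_free(buf)
ggml_backend_free(cpu)
ggml_free(ctx)
```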
Get tensor data asynchronously
ggml_backend_tensor_get_async(backend, tensor, offset = 0, size)
backend |
External pointer to backend |
tensor |
External pointer to tensor |
offset |
Byte offset (default 0) |
size |
Number of bytes to read |
Numeric vector with data
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_set_async(),
ggml_backend_unload()
Gets tensor data using the backend API. This works with tensors allocated on any backend, not just CPU.
ggml_backend_tensor_get_data(tensor, offset = 0, n_elements = NULL)
tensor |
Tensor pointer |
offset |
Byte offset (default: 0) |
n_elements |
Number of elements to retrieve (NULL for all) |
R vector with tensor data
Reads the first f32 element from a backend tensor.
ggml_backend_tensor_get_f32_first(tensor)
tensor |
Tensor pointer |
Float value
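The two readers differ mainly in scope: ggml_backend_tensor_get_data() returns all elements or a leading subset via n_elements, while ggml_backend_tensor_get_f32_first() is a scalar shortcut. A sketch on a CPU-allocated tensor, assuming library(ggmlr) is attached:

```r
library(ggmlr)

cpu <- ggml_backend_cpu_init()
ctx <- ggml_init(1024 * 1024, no_alloc = TRUE)
x   <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
buf <- ggml_backend_alloc_ctx_tensors(ctx, cpu)
ggml_backend_tensor_set_data(x, c(7, 8, 9, 10, 11))

all_vals   <- ggml_backend_tensor_get_data(x)                  # every element
first_two  <- ggml_backend_tensor_get_data(x, n_elements = 2)  # leading elements only
first_only <- ggml_backend_tensor_get_f32_first(x)             # scalar shortcut

ggml_backend_buffer_free(buf)
ggml_backend_free(cpu)
ggml_free(ctx)
```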
Set tensor data asynchronously
ggml_backend_tensor_set_async(backend, tensor, data, offset = 0, size = NULL)
backend |
External pointer to backend |
tensor |
External pointer to tensor |
data |
Numeric or integer vector |
offset |
Byte offset (default 0) |
size |
Number of bytes to copy |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_unload()
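Because `offset` and `size` are given in bytes rather than elements, writing a slice of a tensor means multiplying element counts by the element size. The same arithmetic applies to the `offset` argument of `ggml_backend_tensor_get_data()`. A minimal sketch, assuming an F32 tensor (4 bytes per element) and assuming the byte counts refer to the tensor's storage rather than to R's 8-byte doubles; `f32_byte_range()` is a hypothetical helper, not part of the package:

```r
# Hypothetical helper (not part of ggmlr): byte offset and byte size for
# writing `n` F32 values starting at 1-based element `first`.
f32_byte_range <- function(first, n, elem_size = 4L) {
  list(
    offset = (first - 1L) * elem_size,  # bytes to skip before the first element
    size   = n * elem_size              # bytes covered by n elements
  )
}

r <- f32_byte_range(first = 3, n = 2)
r$offset  # 8
r$size    # 8

# Possible use, assuming `backend` and an F32 `tensor` already exist:
# ggml_backend_tensor_set_async(backend, tensor, c(1.5, 2.5),
#                               offset = r$offset, size = r$size)
```

For F16 tensors the element size would be 2 instead of 4.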
Sets tensor data using the backend API. This works with tensors allocated on any backend, not just CPU.
ggml_backend_tensor_set_data(tensor, data, offset = 0)
tensor |
Tensor pointer |
data |
R vector with data to set |
offset |
Byte offset (default: 0) |
No return value, called for side effects
Unload backend
ggml_backend_unload(reg)
reg |
External pointer to registry |
NULL invisibly
Other backend:
ggml_backend_buffer_clear(),
ggml_backend_buffer_get_usage(),
ggml_backend_buffer_is_host(),
ggml_backend_buffer_is_multi_buffer(),
ggml_backend_buffer_reset(),
ggml_backend_buffer_set_usage(),
ggml_backend_buffer_usage_any(),
ggml_backend_buffer_usage_compute(),
ggml_backend_buffer_usage_weights(),
ggml_backend_dev_by_name(),
ggml_backend_dev_by_type(),
ggml_backend_dev_count(),
ggml_backend_dev_description(),
ggml_backend_dev_get(),
ggml_backend_dev_get_props(),
ggml_backend_dev_init(),
ggml_backend_dev_memory(),
ggml_backend_dev_name(),
ggml_backend_dev_offload_op(),
ggml_backend_dev_supports_buft(),
ggml_backend_dev_supports_op(),
ggml_backend_dev_type(),
ggml_backend_device_register(),
ggml_backend_device_type_accel(),
ggml_backend_device_type_cpu(),
ggml_backend_device_type_gpu(),
ggml_backend_device_type_igpu(),
ggml_backend_event_free(),
ggml_backend_event_new(),
ggml_backend_event_record(),
ggml_backend_event_synchronize(),
ggml_backend_event_wait(),
ggml_backend_get_device(),
ggml_backend_graph_compute_async(),
ggml_backend_graph_plan_compute(),
ggml_backend_graph_plan_create(),
ggml_backend_graph_plan_free(),
ggml_backend_init_best(),
ggml_backend_init_by_name(),
ggml_backend_init_by_type(),
ggml_backend_load(),
ggml_backend_load_all(),
ggml_backend_meta_device(),
ggml_backend_multi_buffer_alloc_buffer(),
ggml_backend_multi_buffer_set_usage(),
ggml_backend_reg_by_name(),
ggml_backend_reg_count(),
ggml_backend_reg_dev_count(),
ggml_backend_reg_dev_get(),
ggml_backend_reg_get(),
ggml_backend_reg_name(),
ggml_backend_register(),
ggml_backend_synchronize(),
ggml_backend_tensor_copy_async(),
ggml_backend_tensor_get_async(),
ggml_backend_tensor_set_async()
Create a Batch Normalization Layer Object
ggml_batch_norm(eps = 1e-05, name = NULL, trainable = TRUE)
eps |
Small constant for numerical stability (default 1e-5). |
name |
Optional character name. |
trainable |
Logical; if TRUE (default), the layer's parameters are updated during training. |
A ggml_layer object.
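Batch normalisation standardises each feature with the batch mean and variance (adding eps for numerical stability), then applies a learnable scale (gamma) and shift (beta). The base-R reference below sketches only the training-mode forward pass. It assumes samples are rows and features are columns; the layer's internal tensor layout and its handling of statistics at inference time are not shown and may differ. `batch_norm_ref()` is hypothetical:

```r
# Hypothetical reference (not the package's code): rows = samples, columns = features.
batch_norm_ref <- function(x, gamma = rep(1, ncol(x)), beta = rep(0, ncol(x)), eps = 1e-5) {
  mu <- colMeans(x)
  centered <- sweep(x, 2, mu)                      # subtract per-feature mean
  v <- colMeans(centered^2)                        # biased (population) variance
  x_hat <- sweep(centered, 2, sqrt(v + eps), "/")  # normalise each feature
  sweep(sweep(x_hat, 2, gamma, "*"), 2, beta, "+") # scale, then shift
}

x <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)
y <- batch_norm_ref(x)
colMeans(y)    # approximately 0 for each feature
colMeans(y^2)  # approximately 1 for each feature
```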
Returns the block size for a GGML type. Quantized types process data in blocks (e.g., 32 elements for Q4_0).
ggml_blck_size(type)
type |
GGML type constant |
Integer block size
Other type_system:
ggml_ftype_to_ggml_type(),
ggml_is_quantized(),
ggml_type_name(),
ggml_type_sizef()
ggml_blck_size(GGML_TYPE_F32)   # 1
ggml_blck_size(GGML_TYPE_Q4_0)  # 32
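Block size matters when estimating memory, because a quantized row must hold a whole number of blocks. In upstream ggml a Q4_0 block stores 32 weights in 18 bytes (one 2-byte f16 scale plus 16 bytes of packed 4-bit values), which works out to 4.5 bits per weight. A small sketch of that arithmetic, with the block layout hard-coded as an assumption rather than queried from the package; `q4_0_row_bytes()` is hypothetical:

```r
# Bytes needed for one Q4_0 row, assuming upstream ggml's block_q4_0 layout:
# 32 elements per block, 2-byte f16 scale + 16 bytes of 4-bit quants = 18 bytes.
q4_0_row_bytes <- function(n_elements, block_size = 32L, bytes_per_block = 18L) {
  if (n_elements %% block_size != 0) stop("row length must be a multiple of the block size")
  (n_elements %/% block_size) * bytes_per_block
}

q4_0_row_bytes(4096)             # 2304 bytes, versus 4096 * 4 = 16384 for F32
q4_0_row_bytes(4096) * 8 / 4096  # 4.5 bits per weight
```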
Builds a computation graph from the output tensor, expanding backwards to include all dependencies.
ggml_build_forward_expand(ctx, tensor)
ctx |
GGML context |
tensor |
Output tensor of the computation |
Graph object (external pointer)
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(1, 2, 3, 4, 5))
ggml_set_f32(b, c(5, 4, 3, 2, 1))
result <- ggml_add(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
ggml_set_f32(a, 1:10)
ggml_set_f32(b, 11:20)
sum_ab <- ggml_add(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, sum_ab)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(sum_ab)
ggml_free(ctx)
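Expanding backwards amounts to a depth-first walk from the output through each node's sources, emitting a node only after all of its inputs. The resulting order can then be evaluated front to back, and nodes the output does not depend on are left out. A toy base-R sketch of that ordering, using a hypothetical list-of-sources representation rather than real GGML tensors:

```r
# Hypothetical graph representation: each node maps to the names of its source nodes.
build_forward_order <- function(nodes, output) {
  visited <- character(0)
  visit <- function(name) {
    if (name %in% visited) return(invisible(NULL))  # already scheduled
    for (src in nodes[[name]]) visit(src)           # schedule inputs first
    visited <<- c(visited, name)                    # then the node itself
  }
  visit(output)
  visited
}

nodes <- list(
  a = character(0), b = character(0),
  sum = c("a", "b"),
  unused = "a",        # not reachable from "out", so it is never scheduled
  out = c("sum", "a")
)
build_forward_order(nodes, "out")  # "a" "b" "sum" "out"
```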
Stops training when the monitored metric does not improve.
ggml_callback_early_stopping(monitor = "val_loss", patience = 5, min_delta = 0, mode = "auto")
monitor |
Metric to monitor: "val_loss", "val_accuracy", "train_loss", "train_accuracy" |
patience |
Number of epochs with no improvement before stopping |
min_delta |
Minimum change to qualify as improvement |
mode |
"min" (lower is better) or "max" (higher is better). "auto" infers from monitor name. |
List with on_epoch_end function
Other callbacks:
ggml_schedule_cosine_decay(),
ggml_schedule_reduce_on_plateau(),
ggml_schedule_step_decay()
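The patience rule can be sketched in a few lines of base R. This is a hypothetical re-implementation for illustration only. It assumes an epoch counts as an improvement when the metric beats the best value so far by more than `min_delta`; the package's exact comparison and its "auto" mode inference are not reproduced here.

```r
# Hypothetical re-implementation of the patience rule (not the package's code).
# Returns the 1-based epoch at which training would stop, or NA if it never does.
early_stop_epoch <- function(values, patience = 5, min_delta = 0, mode = "min") {
  direction <- if (mode == "min") 1 else -1  # flip "max" metrics so lower is better
  best <- Inf
  wait <- 0
  for (epoch in seq_along(values)) {
    v <- direction * values[epoch]
    if (v < best - min_delta) {
      best <- v   # improvement: remember it and reset the counter
      wait <- 0
    } else {
      wait <- wait + 1
      if (wait >= patience) return(epoch)
    }
  }
  NA_integer_
}

early_stop_epoch(c(1.0, 0.8, 0.7, 0.71, 0.72, 0.70), patience = 2)   # 5
early_stop_epoch(c(0.5, 0.6, 0.6, 0.59), patience = 2, mode = "max") # 4
```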
Check if tensor a can be repeated (broadcast) to match tensor b. Used for broadcasting operations.
ggml_can_repeat(a, b)
a |
Source tensor (smaller) |
b |
Target tensor (larger or same size) |
Logical indicating if a can be repeated to match b
Other tensor_layout:
ggml_are_same_stride(),
ggml_count_equal(),
ggml_is_contiguous_0(),
ggml_is_contiguous_1(),
ggml_is_contiguous_2(),
ggml_is_contiguous_channels(),
ggml_is_contiguous_rows(),
ggml_is_contiguously_allocated()
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
b <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 8)
ggml_can_repeat(a, b)  # TRUE: a can be broadcast along dim 1
ggml_free(ctx)
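In upstream ggml, `a` can be repeated to match `b` when every dimension of `b` is a whole multiple of the corresponding dimension of `a`, with missing trailing dimensions counting as 1. A base-R sketch of that rule on plain shape vectors; `can_repeat_ref()` is hypothetical and ignores the special case of empty tensors:

```r
# Hypothetical reference for the repeat rule, on shape vectors (ne) only.
can_repeat_ref <- function(a_ne, b_ne) {
  n <- max(length(a_ne), length(b_ne))
  pad <- function(ne) c(ne, rep(1, n - length(ne)))  # trailing dims default to 1
  all(pad(b_ne) %% pad(a_ne) == 0)
}

can_repeat_ref(4, c(4, 8))        # TRUE: a is repeated 8 times along dim 1
can_repeat_ref(3, c(4, 8))        # FALSE: 4 is not a multiple of 3
can_repeat_ref(c(4, 2), c(4, 8))  # TRUE: 8 is a multiple of 2
```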
Creates a graph node for element-wise ceiling: ceil(x)
ggml_ceil(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the ceil operation
Creates a graph node for in-place element-wise ceiling.
ggml_ceil_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with ceiling values
Creates a graph node for clamping values to a range: clamp(x, min, max)
ggml_clamp(ctx, a, min_val, max_val)
ctx |
GGML context |
a |
Input tensor |
min_val |
Minimum value |
max_val |
Maximum value |
Tensor with values clamped to [min_val, max_val]
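When checking graph output in R, base R has direct counterparts: `pmin(pmax(x, min_val), max_val)` for `ggml_clamp()` and `ceiling()` for `ggml_ceil()`. A minimal sketch; `clamp_ref()` is a hypothetical helper:

```r
# Hypothetical reference helper for comparing against ggml_clamp() output.
clamp_ref <- function(x, min_val, max_val) pmin(pmax(x, min_val), max_val)

clamp_ref(c(-2, 0.5, 3), 0, 1)  # 0.0 0.5 1.0
ceiling(c(-1.5, 0.2, 2))        # -1 1 2
```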
Configures the model for training: infers layer shapes and creates the compute backend. Weight tensors are not allocated here; they are created at training time, once the batch size is known.
## S3 method for class 'ggml_functional_model'
ggml_compile(model, optimizer = "adam", loss = "categorical_crossentropy", metrics = c("accuracy"), backend = "auto")
ggml_compile(model, optimizer = "adam", loss = "categorical_crossentropy", metrics = c("accuracy"), backend = "auto")
## S3 method for class 'ggml_sequential_model'
ggml_compile(model, optimizer = "adam", loss = "categorical_crossentropy", metrics = c("accuracy"), backend = "auto")
model |
A ggml_sequential_model or ggml_functional_model object |
optimizer |
Optimizer name: "adam" or "sgd" |
loss |
Loss function name: "categorical_crossentropy" or "mse" |
metrics |
Character vector of metrics (currently "accuracy") |
backend |
Backend to use: "auto" (GPU if available, else CPU), "cpu", or "vulkan" |
The compiled model (invisibly).
model <- ggml_model_sequential() |>
  ggml_layer_conv_2d(32, c(3,3), activation = "relu", input_shape = c(28, 28, 1)) |>
  ggml_layer_max_pooling_2d(c(2, 2)) |>
  ggml_layer_flatten() |>
  ggml_layer_dense(10, activation = "softmax")
model <- ggml_compile(model, optimizer = "adam", loss = "categorical_crossentropy")
Concatenates two tensors along a specified dimension. A key building block for KV-cache updates in transformers, where new key/value entries are appended to the cache.
ggml_concat(ctx, a, b, dim = 0)
ctx |
GGML context |
a |
First tensor |
b |
Second tensor (must match a in all dimensions except the concat dim) |
dim |
Dimension along which to concatenate (0-3) |
Concatenated tensor
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3)
b <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2)
ggml_set_f32(a, rnorm(12))
ggml_set_f32(b, rnorm(8))
# Concatenate along dimension 1: result is 4x5
ab <- ggml_concat(ctx, a, b, 1)
graph <- ggml_build_forward_expand(ctx, ab)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
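The result keeps every dimension of `a` except `dim`, where the two extents add up; this is how a KV cache grows when new positions are appended. A base-R sketch of the shape rule, using 0-based `dim` as in the usage above; `concat_shape()` is hypothetical:

```r
# Hypothetical shape helper for ggml_concat(); dim is 0-based like GGML's.
concat_shape <- function(a_ne, b_ne, dim = 0) {
  i <- dim + 1  # convert to R's 1-based indexing
  stopifnot(length(a_ne) == length(b_ne), all(a_ne[-i] == b_ne[-i]))
  out <- a_ne
  out[i] <- a_ne[i] + b_ne[i]  # extents add along the concat dimension
  out
}

concat_shape(c(4, 3), c(4, 2), dim = 1)  # 4 5
```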
Makes a tensor contiguous in memory. Required after permute/transpose before some operations.
ggml_cont(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Contiguous tensor
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 3, 4)
ggml_set_f32(a, 1:12)
transposed <- ggml_transpose(ctx, a)
contiguous <- ggml_cont(ctx, transposed)
ggml_free(ctx)
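A transpose in GGML is a view: it swaps the extents (`ne`) and byte strides (`nb`) without moving any data, so the first dimension is no longer packed. A simplified base-R sketch of the contiguity test, assuming a non-quantized type where each stride equals the previous stride times the previous extent. `is_contiguous_ref()` is hypothetical, and the stride values are worked out by hand for the 3x4 F32 tensor in the example above:

```r
# Hypothetical contiguity check for non-quantized types:
# nb[1] must equal the element size and nb[i] = nb[i-1] * ne[i-1].
is_contiguous_ref <- function(ne, nb, elem_size = 4) {
  expected <- elem_size * cumprod(c(1, ne[-length(ne)]))
  all(nb == expected)
}

ne <- c(3, 4); nb <- c(4, 12)         # freshly created 3x4 F32 tensor
is_contiguous_ref(ne, nb)             # TRUE
is_contiguous_ref(rev(ne), rev(nb))   # FALSE: transposed view, strides swapped
is_contiguous_ref(c(4, 3), c(4, 16))  # TRUE: packed layout, as after ggml_cont()
```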
Applies 1D convolution to input data.
ggml_conv_1d(ctx, a, b, s0 = 1L, p0 = 0L, d0 = 1L)
ctx |
GGML context |
a |
Convolution kernel tensor |
b |
Input data tensor |
s0 |
Stride (default 1) |
p0 |
Padding (default 0) |
d0 |
Dilation (default 1) |
Convolved tensor
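The output length follows the usual convolution arithmetic, which is also what upstream ggml computes: `(n + 2*p - d*(k - 1) - 1) %/% s + 1` for input length `n` and kernel length `k`. A small base-R helper for checking shapes (hypothetical, not exported by the package):

```r
# Hypothetical shape helper: output length of a 1D convolution.
conv_out_len <- function(n, k, s = 1, p = 0, d = 1) {
  (n + 2 * p - d * (k - 1) - 1) %/% s + 1
}

conv_out_len(28, 3)         # 26
conv_out_len(28, 3, p = 1)  # 28 ("same" padding for a 3-tap kernel)
conv_out_len(10, 3, s = 2)  # 4
conv_out_len(10, 3, d = 2)  # 6 (dilated kernel spans 5 inputs)
```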
Applies depthwise 1D convolution: each input channel is convolved with its own kernel.
ggml_conv_1d_dw(ctx, a, b, s0 = 1L, p0 = 0L, d0 = 1L)
ctx |
GGML context |
a |
Convolution kernel tensor |
b |
Input data tensor |
s0 |
Stride (default 1) |
p0 |
Padding (default 0) |
d0 |
Dilation (default 1) |
Convolved tensor
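In a depthwise convolution the channels never mix: output channel c depends only on input channel c and kernel c. Below is a naive base-R reference for stride 1, no padding and no dilation. It computes cross-correlation (no kernel flip), as deep-learning libraries conventionally do. The matrix layout (time in rows, channels in columns) is an assumption and need not match the tensor layout ggml expects; `depthwise_conv1d_ref()` is hypothetical:

```r
# Hypothetical reference: x is an L x C matrix, k is a K x C matrix (one kernel per channel).
depthwise_conv1d_ref <- function(x, k) {
  L <- nrow(x); K <- nrow(k)
  out <- matrix(0, nrow = L - K + 1, ncol = ncol(x))
  for (ch in seq_len(ncol(x))) {
    for (t in seq_len(L - K + 1)) {
      out[t, ch] <- sum(x[t:(t + K - 1), ch] * k[, ch])  # channel ch only
    }
  }
  out
}

x <- cbind(1:5, c(1, 0, 0, 0, 1))
k <- cbind(c(1, 1), c(1, -1))
out <- depthwise_conv1d_ref(x, k)
# column 1: 3 5 7 9; column 2: 1 0 0 -1
```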
Applies 2D convolution to input data.
ggml_conv_2d(ctx, a, b, s0 = 1L, s1 = 1L, p0 = 0L, p1 = 0L, d0 = 1L, d1 = 1L)
ctx |
GGML context |
a |
Convolution kernel tensor [KW, KH, IC, OC] |
b |
Input data tensor [W, H, C, N] |
s0 |
Stride dimension 0 (default 1) |
s1 |
Stride dimension 1 (default 1) |
p0 |
Padding dimension 0 (default 0) |
p1 |
Padding dimension 1 (default 0) |
d0 |
Dilation dimension 0 (default 1) |
d1 |
Dilation dimension 1 (default 1) |
Convolved tensor
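With the layouts listed above, the output has one channel per kernel (`OC`) and keeps the batch size `N`. Each spatial axis follows the 1D convolution formula with its own stride, padding and dilation. A sketch assuming the result is laid out as `[OW, OH, OC, N]`, as in upstream ggml; `conv2d_out_ne()` is hypothetical:

```r
# Hypothetical shape helper: input [W, H, C, N], kernel [KW, KH, IC, OC].
conv2d_out_ne <- function(input_ne, kernel_ne, s = c(1, 1), p = c(0, 0), d = c(1, 1)) {
  out_len <- function(n, k, s, p, d) (n + 2 * p - d * (k - 1) - 1) %/% s + 1
  stopifnot(input_ne[3] == kernel_ne[3])  # input channels must match IC
  ow <- out_len(input_ne[1], kernel_ne[1], s[1], p[1], d[1])
  oh <- out_len(input_ne[2], kernel_ne[2], s[2], p[2], d[2])
  c(ow, oh, kernel_ne[4], input_ne[4])    # [OW, OH, OC, N]
}

conv2d_out_ne(c(28, 28, 1, 8), c(3, 3, 1, 32))                          # 26 26 32 8
conv2d_out_ne(c(28, 28, 1, 8), c(3, 3, 1, 32), s = c(2, 2), p = c(1, 1)) # 14 14 32 8
```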
Applies 2D convolution using the direct algorithm (no im2col).
ggml_conv_2d_direct(ctx, a, b, s0 = 1L, s1 = 1L, p0 = 0L, p1 = 0L, d0 = 1L, d1 = 1L)
ctx |
GGML context |
a |
Convolution kernel tensor |
b |
Input data tensor |
s0 |
Stride dimension 0 (default 1) |
s1 |
Stride dimension 1 (default 1) |
p0 |
Padding dimension 0 (default 0) |
p1 |
Padding dimension 1 (default 0) |
d0 |
Dilation dimension 0 (default 1) |
d1 |
Dilation dimension 1 (default 1) |
Convolved tensor
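The practical difference is memory. The im2col route copies every receptive-field patch into a row of a temporary matrix, so the convolution becomes a single matrix multiplication. For a K-tap kernel, though, that matrix holds roughly K copies of the input. The direct algorithm reads the input in place and skips the buffer. A stride-1, no-padding 1D sketch of the im2col idea in base R (hypothetical helper, illustrative only):

```r
# Hypothetical 1D im2col: each output position becomes one row holding its input patch.
im2col_1d <- function(x, k_len) {
  n_out <- length(x) - k_len + 1
  do.call(rbind, lapply(seq_len(n_out), function(t) x[t:(t + k_len - 1)]))
}

x <- c(1, 2, 3, 4, 5)
kernel <- c(1, 1)
patches <- im2col_1d(x, length(kernel))  # 4 x 2 matrix: (1,2), (2,3), (3,4), (4,5)
drop(patches %*% kernel)                 # 3 5 7 9: the convolution as one matmul
```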
Applies depthwise 2D convolution: each input channel is convolved with its own kernel. Uses the im2col-based path.
ggml_conv_2d_dw(ctx, a, b, s0 = 1L, s1 = 1L, p0 = 0L, p1 = 0L, d0 = 1L, d1 = 1L)
ctx |
GGML context |
a |
Convolution kernel tensor |
b |
Input data tensor |
s0, s1 |
Strides along dim 0 and 1 (default 1) |
p0, p1 |
Padding along dim 0 and 1 (default 0) |
d0, d1 |
Dilation along dim 0 and 1 (default 1) |
Convolved tensor
Direct depthwise 2D convolution without an explicit im2col intermediate.
ggml_conv_2d_dw_direct(ctx, a, b, s0 = 1L, s1 = 1L, p0 = 0L, p1 = 0L, d0 = 1L, d1 = 1L)
ctx |
GGML context |
a |
Convolution kernel tensor |
b |
Input data tensor |
s0, s1 |
Strides along dim 0 and 1 (default 1) |
p0, p1 |
Padding along dim 0 and 1 (default 0) |
d0, d1 |
Dilation along dim 0 and 1 (default 1) |
Convolved tensor
Applies transposed 1D convolution (deconvolution) to input data.
ggml_conv_transpose_1d(ctx, a, b, s0 = 1L, p0 = 0L, d0 = 1L)
ctx |
GGML context |
a |
Convolution kernel tensor |
b |
Input data tensor |
s0 |
Stride (default 1) |
p0 |
Padding (default 0) |
d0 |
Dilation (default 1) |
Transposed convolved tensor
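For a transposed convolution the length formula runs the other way: `(n - 1)*s - 2*p + d*(k - 1) + 1`. Feeding that length back through the ordinary convolution formula recovers `n`, which makes a handy sanity check. Both helpers are hypothetical. Upstream ggml's transposed 1D kernel may also only accept `p0 = 0` and `d0 = 1`, so check what this build allows before relying on other values.

```r
# Hypothetical shape helpers (integer arithmetic, as in upstream ggml).
conv_transpose_out_len <- function(n, k, s = 1, p = 0, d = 1) (n - 1) * s - 2 * p + d * (k - 1) + 1
conv_out_len <- function(n, k, s = 1, p = 0, d = 1) (n + 2 * p - d * (k - 1) - 1) %/% s + 1

up <- conv_transpose_out_len(4, 3, s = 2)  # 9
conv_out_len(up, 3, s = 2)                 # 4: back to the input length
```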
Applies transposed 2D convolution (deconvolution) with zero padding.
ggml_conv_transpose_2d_p0(ctx, a, b, stride = 1L)
ctx |
GGML context |
a |
Convolution kernel tensor |
b |
Input data tensor |
stride |
Stride (default 1) |
Transposed convolved tensor
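With zero padding, each spatial axis grows to `(in - 1) * stride + kernel`, which is how stride-2 transposed convolutions roughly double feature-map size in decoder networks. A hypothetical base-R helper for the output width and height:

```r
# Hypothetical helper: output width/height of a zero-padding transposed 2D convolution.
conv_transpose_2d_p0_wh <- function(w, h, kw, kh, stride = 1) {
  c((w - 1) * stride + kw, (h - 1) * stride + kh)
}

conv_transpose_2d_p0_wh(7, 7, 4, 4, stride = 2)  # 16 16
conv_transpose_2d_p0_wh(5, 5, 3, 3)              # 7 7
```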
Creates a graph node for element-wise cosine: cos(x)
ggml_cos(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the cos operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(0, pi/3, pi/2, pi))
result <- ggml_cos(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # approximately [1, 0.5, 0, -1] (F32 rounding)
ggml_free(ctx)
Creates a graph node that counts equal elements between two tensors. Useful for accuracy computation.
ggml_count_equal(ctx, a, b)
ctx |
GGML context |
a |
First tensor |
b |
Second tensor (same shape as a) |
Tensor containing the count of equal elements
Other tensor_layout:
ggml_are_same_stride(),
ggml_can_repeat(),
ggml_is_contiguous_0(),
ggml_is_contiguous_1(),
ggml_is_contiguous_2(),
ggml_is_contiguous_channels(),
ggml_is_contiguous_rows(),
ggml_is_contiguously_allocated()
ctx <- ggml_init(16 * 1024 * 1024)
pred <- ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 100)
labels <- ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 100)
# ... set values ...
correct <- ggml_count_equal(ctx, pred, labels)
graph <- ggml_build_forward_expand(ctx, correct)
ggml_graph_compute(ctx, graph)
# correct now contains the number of matching elements
ggml_free(ctx)
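For classification, `pred` would typically hold the predicted class index per sample and `labels` the true class. Dividing the count by the number of samples gives accuracy. The base-R equivalent, useful for checking the graph result (the vectors are made-up data):

```r
# Made-up predictions and labels, for illustration only.
pred   <- c(1L, 2L, 2L, 0L, 1L)
labels <- c(1L, 2L, 1L, 0L, 0L)
n_correct <- sum(pred == labels)         # 3: what ggml_count_equal should report
accuracy  <- n_correct / length(labels)  # 0.6
```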
Performs element-wise addition of two tensors directly on the CPU and returns the result as an R numeric vector. No computation graph is built or executed.
ggml_cpu_add(a, b)
a |
First tensor (must be F32 type) |
b |
Second tensor (must be F32 type, same size as a) |
Numeric vector containing the element-wise sum
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(1, 2, 3, 4, 5))
ggml_set_f32(b, c(5, 4, 3, 2, 1))
ggml_cpu_add(a, b)
ggml_free(ctx)
Returns a named list with the result of every CPU feature check. Useful for diagnostics, for example to see which SIMD code paths are available on the current machine.
ggml_cpu_features()
Named list with feature names and logical values
Other cpu_features:
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
features <- ggml_cpu_features()
print(features)
# On typical x86-64: sse3=TRUE, avx=TRUE, avx2=TRUE, ...
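Because the result is a named list of logicals, the supported features can be listed with `Filter()`. The list below is a hand-written stand-in with assumed names, so the snippet runs without the package; with ggmlr loaded you would pass `ggml_cpu_features()` instead:

```r
# Stand-in for ggml_cpu_features(); names and values here are assumptions.
features <- list(sse3 = TRUE, avx = TRUE, avx2 = TRUE, avx512 = FALSE, neon = FALSE)
supported <- names(Filter(isTRUE, features))
supported  # "sse3" "avx" "avx2"
```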
Returns the RISC-V RVV vector length in bytes (0 if not supported).
ggml_cpu_get_rvv_vlen()
Integer vector length in bytes
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Returns the SVE vector length in bytes (0 if not supported).
ggml_cpu_get_sve_cnt()
Integer vector length in bytes
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
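Both `ggml_cpu_get_rvv_vlen()` and `ggml_cpu_get_sve_cnt()` report a vector length in bytes. The number of 32-bit float lanes is therefore that value divided by 4, and 0 means the extension is unavailable. A hypothetical helper:

```r
# Hypothetical helper: number of F32 lanes in a vector register of `vlen_bytes` bytes.
f32_lanes <- function(vlen_bytes) vlen_bytes %/% 4L

f32_lanes(0L)   # 0: extension not available
f32_lanes(16L)  # 4: a 128-bit vector
f32_lanes(32L)  # 8: a 256-bit vector
# With ggmlr loaded: f32_lanes(ggml_cpu_get_sve_cnt())
```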
Check if the CPU supports AMX INT8 (Advanced Matrix Extensions). AMX provides hardware acceleration for matrix operations on Intel CPUs.
ggml_cpu_has_amx_int8()
Logical indicating AMX INT8 support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports ARM FMA (Fused Multiply-Add).
ggml_cpu_has_arm_fma()
Logical indicating ARM FMA support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports AVX instructions.
ggml_cpu_has_avx()
Logical indicating AVX support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports AVX-VNNI instructions.
ggml_cpu_has_avx_vnni()
Logical indicating AVX-VNNI support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports AVX2 instructions. AVX2 extends 256-bit SIMD from floating-point to integer operations, which speeds up quantized and integer matrix math.
ggml_cpu_has_avx2()
Logical indicating AVX2 support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports AVX-512 instructions. AVX-512 widens SIMD registers to 512 bits, twice the width of AVX2.
ggml_cpu_has_avx512()
Logical indicating AVX-512 support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports AVX-512 BF16 (bfloat16) instructions.
ggml_cpu_has_avx512_bf16()
Logical indicating AVX-512 BF16 support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports AVX-512 VBMI instructions.
ggml_cpu_has_avx512_vbmi()
Logical indicating AVX-512 VBMI support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports AVX-512 VNNI instructions. VNNI accelerates neural network inference with int8/int16 dot products.
ggml_cpu_has_avx512_vnni()
Logical indicating AVX-512 VNNI support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports BMI2 (Bit Manipulation Instructions 2).
ggml_cpu_has_bmi2()
Logical indicating BMI2 support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports ARM dot product instructions. These accelerate the int8 dot products at the core of quantized matrix multiplication.
ggml_cpu_has_dotprod()
Logical indicating dot product support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports F16C instructions for float16 conversion.
ggml_cpu_has_f16c()
Logical indicating F16C support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports FMA (Fused Multiply-Add) instructions. FMA computes a * b + c in one instruction with a single rounding step, speeding up the multiply-accumulate loops at the heart of matrix operations.
ggml_cpu_has_fma()
Logical indicating FMA support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports ARM half-precision FP16 vector arithmetic.
ggml_cpu_has_fp16_va()
Logical indicating FP16 VA support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the llamafile (sgemm) optimized matrix-multiplication kernels are available.
ggml_cpu_has_llamafile()
Logical indicating llamafile support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports ARM INT8 matrix multiplication.
ggml_cpu_has_matmul_int8()
Logical indicating INT8 MATMUL support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports ARM NEON instructions. NEON is ARM's SIMD extension for vectorized operations.
ggml_cpu_has_neon()
Logical indicating NEON support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports the RISC-V Vector (RVV) extension.
ggml_cpu_has_riscv_v()
Logical indicating RISC-V V support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports ARM SME (Scalable Matrix Extension).
ggml_cpu_has_sme()
Logical indicating SME support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports SSE3 instructions.
ggml_cpu_has_sse3()
Logical indicating SSE3 support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
ggml_cpu_has_sse3()
Check if the CPU supports SSSE3 instructions.
ggml_cpu_has_ssse3()
Logical indicating SSSE3 support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports ARM SVE (Scalable Vector Extension).
ggml_cpu_has_sve()
Logical indicating SVE support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports PowerPC VSX instructions.
ggml_cpu_has_vsx()
Logical indicating VSX support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vxe(),
ggml_cpu_has_wasm_simd()
Check if the CPU supports IBM z/Architecture VXE instructions.
ggml_cpu_has_vxe()
Logical indicating VXE support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_wasm_simd()
Check if the CPU/environment supports WebAssembly SIMD.
ggml_cpu_has_wasm_simd()
Logical indicating WASM SIMD support
Other cpu_features:
ggml_cpu_features(),
ggml_cpu_get_rvv_vlen(),
ggml_cpu_get_sve_cnt(),
ggml_cpu_has_amx_int8(),
ggml_cpu_has_arm_fma(),
ggml_cpu_has_avx(),
ggml_cpu_has_avx2(),
ggml_cpu_has_avx512(),
ggml_cpu_has_avx512_bf16(),
ggml_cpu_has_avx512_vbmi(),
ggml_cpu_has_avx512_vnni(),
ggml_cpu_has_avx_vnni(),
ggml_cpu_has_bmi2(),
ggml_cpu_has_dotprod(),
ggml_cpu_has_f16c(),
ggml_cpu_has_fma(),
ggml_cpu_has_fp16_va(),
ggml_cpu_has_llamafile(),
ggml_cpu_has_matmul_int8(),
ggml_cpu_has_neon(),
ggml_cpu_has_riscv_v(),
ggml_cpu_has_sme(),
ggml_cpu_has_sse3(),
ggml_cpu_has_ssse3(),
ggml_cpu_has_sve(),
ggml_cpu_has_vsx(),
ggml_cpu_has_vxe()
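When reporting hardware capabilities it can help to gather several of the ggml_cpu_has_*() probes into one named vector. The helper below is a hypothetical sketch, not part of the package; ggml_cpu_features() may already return a combined summary, but its return format is not shown in this section. Stand-in probe functions are used so the block runs without the package loaded.

```r
# Hypothetical helper (not part of the package): call each zero-argument
# probe function and collect the results as a named logical vector.
cpu_feature_table <- function(probes) {
  # isTRUE() maps anything other than a single TRUE to FALSE
  vapply(probes, function(probe) isTRUE(probe()), logical(1))
}

# Stand-in probes so the sketch runs on its own; with the package loaded
# pass e.g. list(avx2 = ggml_cpu_has_avx2, neon = ggml_cpu_has_neon).
flags <- cpu_feature_table(list(avx2 = function() TRUE, neon = function() FALSE))
names(flags)[flags]  # names of the supported features
```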
Performs element-wise multiplication of two tensors using direct CPU computation. Returns the result as an R numeric vector. Does NOT use computation graphs.
ggml_cpu_mul(a, b)
a |
First tensor (must be F32 type) |
b |
Second tensor (must be F32 type, same size as a) |
Numeric vector containing the element-wise product
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(1, 2, 3, 4, 5))
ggml_set_f32(b, c(2, 2, 2, 2, 2))
ggml_cpu_mul(a, b)
ggml_free(ctx)
Copies tensor a into tensor b, converting the element type if needed. Both tensors must have the same number of elements. This is the standard way to cast between types (e.g., F32 to F16).
ggml_cpy(ctx, a, b)
ctx |
GGML context |
a |
Source tensor |
b |
Destination tensor (defines output type and shape) |
Tensor representing the copy operation (returns b with a's data)
ctx <- ggml_init(16 * 1024 * 1024)
# Create F32 tensor
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 100)
ggml_set_f32(a, rnorm(100))
# Create F16 tensor for output
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F16, 100)
# Copy with F32 -> F16 conversion
result <- ggml_cpy(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
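An F32 to F16 copy is lossy: half precision keeps only 10 explicit mantissa bits and its largest finite value is 65504. The base-R sketch below emulates IEEE 754 binary16 rounding (round to nearest, ties to even) so you can preview what survives the cast. It is an illustration under that assumption, not the package's converter, and it ignores NaN payloads.

```r
# Emulate rounding doubles to IEEE 754 half precision (binary16) with
# round-to-nearest, ties-to-even. Sketch only: not the package's converter.
f16_round <- function(x) {
  # Binary exponent of each value; clamping at -14 gives subnormals a fixed step
  e <- pmax(floor(log2(abs(x))), -14)
  # binary16 keeps 10 explicit mantissa bits, so the spacing is 2^(e - 10)
  step <- 2^(e - 10)
  r <- round(x / step) * step  # R's round() resolves exact halves to even
  # Magnitudes above the largest finite half value overflow to +/-Inf
  over <- !is.na(r) & abs(r) > 65504
  r[over] <- sign(r[over]) * Inf
  # Infinite inputs stay infinite
  r[is.infinite(x)] <- x[is.infinite(x)]
  r
}

f16_round(c(0.1, 1, 2049, 70000))
```

For example, 0.1 becomes 0.0999755859375 and 2049 becomes 2048, because above 2048 adjacent half-precision values are 2 apart.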
Returns the current CPU cycle count. Useful for low-level benchmarking.
ggml_cycles()
Numeric value representing CPU cycles
ggml_cycles()
Returns an estimate of CPU cycles per millisecond. Useful for converting cycle counts to time.
ggml_cycles_per_ms()
Numeric value representing cycles per millisecond
ggml_cycles_per_ms()
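The two functions above combine into a simple timer: take two cycle readings and divide their difference by the cycles-per-millisecond estimate. The sketch below assumes the counter ticks at that estimated, constant rate; frequency scaling or migration between cores can make the result approximate. The package calls appear only in comments so the block runs on its own.

```r
# Convert the difference between two cycle-counter readings into milliseconds.
cycles_to_ms <- function(start_cycles, end_cycles, cycles_per_ms) {
  (end_cycles - start_cycles) / cycles_per_ms
}

# With the package loaded, a timing would look like:
# t0 <- ggml_cycles()
# ... code to time ...
# elapsed_ms <- cycles_to_ms(t0, ggml_cycles(), ggml_cycles_per_ms())
cycles_to_ms(1e6, 4e6, 3e6)
```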
Constructs an uncompiled sequential multi-layer perceptron suitable as a
starting point for tabular classification or regression. This is the default
model_fn used by LearnerClassifGGML and LearnerRegrGGML
when the user does not supply a custom builder, and it is also exported for
direct use or as a template for user-defined builders.
ggml_default_mlp(
  n_features,
  n_out,
  task_type = c("classif", "regr"),
  hidden_layers = c(128L, 64L),
  activation = "relu",
  dropout = 0.2
)
n_features |
Integer. Number of input features. Required. |
n_out |
Integer. Number of output units. For classification this is the number of classes; for regression this is typically 1. |
task_type |
Character. One of "classif" or "regr". |
hidden_layers |
Integer vector. Widths of the hidden dense layers.
Default c(128L, 64L). |
activation |
Character. Activation applied to each hidden layer.
Default "relu". |
dropout |
Numeric. Dropout rate. Default 0.2. |
The returned model is not compiled: the caller is responsible for
calling ggml_compile with the appropriate loss
("categorical_crossentropy" for classification, "mse" for
regression) before training.
The final layer is chosen based on task_type:
"classif" — dense with units = n_out and softmax activation.
"regr" — dense with units = n_out and no activation
(identity / linear output).
An uncompiled ggml_sequential_model object. Call
ggml_compile before ggml_fit.
ggml_model_sequential, ggml_layer_dense,
ggml_layer_dropout, ggml_compile
## Not run:
# 3-class classifier on 20 features
model <- ggml_default_mlp(
  n_features = 20L, n_out = 3L, task_type = "classif",
  hidden_layers = c(64L, 32L), dropout = 0.1
)
model <- ggml_compile(model, optimizer = "adam",
                      loss = "categorical_crossentropy")
# Single-output regressor
reg <- ggml_default_mlp(n_features = 10L, n_out = 1L, task_type = "regr")
reg <- ggml_compile(reg, optimizer = "adam", loss = "mse")
## End(Not run)
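To gauge the size of the default architecture before building it, you can count its parameters by hand. The base-R sketch below assumes each dense layer holds an [in x out] weight matrix plus an out-length bias, which is the usual dense-layer layout but is not stated in this section; activations, softmax and dropout add no parameters.

```r
# Parameter count of a dense stack n_features -> hidden_layers -> n_out,
# assuming each dense layer holds an [in x out] weight matrix and an
# out-length bias. Activations, softmax, and dropout add no parameters.
mlp_param_count <- function(n_features, n_out, hidden_layers = c(128L, 64L)) {
  widths <- c(n_features, hidden_layers, n_out)
  fan_in <- widths[-length(widths)]
  fan_out <- widths[-1]
  sum((fan_in + 1) * fan_out)  # +1 for the bias of each output unit
}

mlp_param_count(20L, 3L, c(64L, 32L))  # the classifier from the example
mlp_param_count(10L, 1L)               # regressor with default hidden layers
```

Under that assumption the example classifier has 21*64 + 65*32 + 33*3 = 3523 parameters.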
Returns a reusable layer object for use with ggml_apply().
Applying the same object to multiple tensor nodes shares weights.
ggml_dense(units, activation = NULL, name = NULL, trainable = TRUE)
units |
Number of output units. |
activation |
Activation function name or NULL. |
name |
Optional character name. |
trainable |
Logical; whether weights are updated during training. |
A ggml_layer object.
encoder <- ggml_dense(64L, activation = "relu")
x1 <- ggml_input(shape = 32L)
x2 <- ggml_input(shape = 32L)
out1 <- x1 |> ggml_apply(encoder)
out2 <- x2 |> ggml_apply(encoder)  # shared weights
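Weight sharing works because the layer object, not each call, owns the weights. The plain-R closure below is a hypothetical illustration of that idea (make_dense is not a package function, and it uses R's rows-are-samples layout): the weights are created once when the object is built, so applying it to two inputs reuses the same W and b.

```r
# Hypothetical closure-based dense layer (not the package implementation):
# W and b are created once when the layer object is built, so every call
# to apply_layer() reuses the same weights.
make_dense <- function(n_in, units, activation = function(z) pmax(z, 0)) {
  W <- matrix(rnorm(n_in * units, sd = 0.1), n_in, units)
  b <- numeric(units)
  list(
    # Rows of x are samples; add the bias to every row, then activate
    apply_layer = function(x) activation(sweep(x %*% W, 2, b, "+")),
    weights = function() list(W = W, b = b)
  )
}

set.seed(1)
encoder <- make_dense(32L, 64L)
x1 <- matrix(rnorm(5 * 32), 5, 32)
x2 <- matrix(rnorm(5 * 32), 5, 32)
out1 <- encoder$apply_layer(x1)
out2 <- encoder$apply_layer(x2)  # same W and b as out1
```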
Creates a diagonal matrix from a vector: for a vector a of length n, the result is an n x n matrix with a on the diagonal and zeros elsewhere.
ggml_diag(ctx, a)
ctx |
GGML context |
a |
Input vector tensor |
Diagonal matrix tensor
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 3)
ggml_set_f32(a, c(1, 2, 3))
d <- ggml_diag(ctx, a)  # 3x3 diagonal matrix
graph <- ggml_build_forward_expand(ctx, d)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
Creates a graph node that sets elements above the diagonal to -Inf. This is used for causal (autoregressive) attention masking.
ggml_diag_mask_inf(ctx, a, n_past)
ctx |
GGML context |
a |
Input tensor (typically attention scores) |
n_past |
Number of past tokens (shifts the diagonal). Use 0 for standard causal masking where position i can only attend to positions <= i. |
In causal attention, each position should attend only to itself and earlier positions. Setting future positions to -Inf ensures that they receive zero attention weight after the softmax.
The n_past parameter allows for KV-cache scenarios where the diagonal needs to be shifted to account for previously processed tokens.
Tensor with same shape as input, elements above diagonal set to -Inf
ctx <- ggml_init(16 * 1024 * 1024)
# Create attention scores matrix
scores <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4)
ggml_set_f32(scores, rep(1, 16))
# Apply causal mask
masked <- ggml_diag_mask_inf(ctx, scores, 0)
graph <- ggml_build_forward_expand(ctx, masked)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
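The masking rule and its interaction with softmax can be checked with a small base-R reference. This sketch uses R's matrix layout, with rows as query positions and columns as key positions, which is an assumption for illustration (GGML stores tensors with ne0 as the innermost dimension). Query row j may attend to key columns up to j + n_past, so n_past shifts the diagonal to the right as in the KV-cache case.

```r
# Base-R reference for causal masking (R layout: rows = queries, cols = keys).
causal_mask_inf <- function(scores, n_past = 0L) {
  # Query j may see keys 1..(j + n_past); everything to the right is masked
  keep <- col(scores) <= row(scores) + n_past
  scores[!keep] <- -Inf
  scores
}

# Row-wise softmax; subtracting each row's max keeps exp() stable
softmax_rows <- function(m) {
  e <- exp(m - apply(m, 1, max))
  e / rowSums(e)
}

w <- softmax_rows(causal_mask_inf(matrix(1, 4, 4)))
round(w, 3)  # upper triangle is 0, row j spreads weight evenly over j keys
```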
In-place version of ggml_diag_mask_inf. Returns a view of the input tensor.
ggml_diag_mask_inf_inplace(ctx, a, n_past)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
n_past |
Number of past tokens |
View of input tensor with elements above diagonal set to -Inf
Creates a graph node that sets elements above the diagonal to 0. Unlike ggml_diag_mask_inf, masked positions become 0 instead of -Inf, which suits values that are not passed through a softmax afterwards.
ggml_diag_mask_zero(ctx, a, n_past)
ctx |
GGML context |
a |
Input tensor |
n_past |
Number of past tokens |
Tensor with same shape as input, elements above diagonal set to 0
Creates a graph node for element-wise division.
ggml_div(ctx, a, b)
ctx |
GGML context |
a |
First tensor (numerator) |
b |
Second tensor (denominator, same shape as a) |
Tensor representing the division operation (a / b)
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(10, 20, 30, 40, 50))
ggml_set_f32(b, c(2, 2, 2, 2, 2))
result <- ggml_div(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_get_f32(result)
ggml_free(ctx)
Creates a graph node for in-place element-wise division. The result is stored in tensor a, avoiding a new allocation.
ggml_div_inplace(ctx, a, b)
ctx |
GGML context |
a |
First tensor (will be modified in-place) |
b |
Second tensor (same shape as a) |
View of tensor a with the division result
Creates a graph node that copies a tensor. This is a graph operation that must be computed using ggml_build_forward_expand() and ggml_graph_compute(). Unlike ggml_dup_tensor, which only allocates a new tensor of the same shape and type, ggml_dup adds a copy operation to the graph.
ggml_dup(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the copy operation
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(1, 2, 3, 4, 5))
b <- ggml_dup(ctx, a)
graph <- ggml_build_forward_expand(ctx, b)
ggml_graph_compute(ctx, graph)
ggml_get_f32(b)
ggml_free(ctx)
Creates a graph node for in-place tensor duplication. Returns a view of the input tensor.
ggml_dup_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor |
View of tensor a
Allocates a new tensor with the same shape and type as the input. The data is not copied; use ggml_dup to add a graph operation that copies the values.
ggml_dup_tensor(ctx, tensor)
ctx |
GGML context |
tensor |
Tensor to duplicate |
New tensor pointer with same shape
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 100)
b <- ggml_dup_tensor(ctx, a)
ggml_nelements(b)
ggml_free(ctx)
Returns the size of a single element in the tensor.
ggml_element_size(tensor)
tensor |
Tensor pointer |
Element size in bytes
Creates a graph node for ELU (Exponential Linear Unit) activation. ELU(x) = x if x > 0, else alpha * (exp(x) - 1) where alpha = 1.
ggml_elu(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the ELU operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-2, -1, 0, 1, 2))
r <- ggml_elu(ctx, a)
graph <- ggml_build_forward_expand(ctx, r)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(r)
ggml_free(ctx)
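A base-R reference of the ELU formula above is handy for checking the values returned by ggml_get_f32(). It uses alpha = 1 as stated in the description; expm1() is used instead of exp(x) - 1 for accuracy near zero. Since the tensor is F32, compare with a tolerance of about 1e-6 rather than exact equality.

```r
# Reference ELU in base R: x for x > 0, alpha * (exp(x) - 1) otherwise.
elu_ref <- function(x, alpha = 1) {
  ifelse(x > 0, x, alpha * expm1(x))
}

x <- c(-2, -1, 0, 1, 2)
elu_ref(x)
# With the package: all.equal(result, elu_ref(x), tolerance = 1e-6)
```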
Creates a graph node for in-place ELU (Exponential Linear Unit) activation.
ggml_elu_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with ELU applied
Create an Embedding Layer Object
ggml_embedding(vocab_size, dim, name = NULL, trainable = TRUE)
vocab_size |
Number of distinct tokens. |
dim |
Embedding dimension. |
name |
Optional character name. |
trainable |
Logical; whether the embedding weights are updated during training. |
A ggml_layer object.
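Conceptually, an embedding layer is a trainable [vocab_size x dim] weight matrix, and looking up a token returns its row. The base-R sketch below illustrates that lookup; it assumes 0-based token ids (as used by GGML's row-gather operation), and the package's own convention may differ. embedding_lookup is a hypothetical helper, not a package function.

```r
# Hypothetical reference: look up rows of a [vocab_size x emb_dim] matrix.
# Token ids are assumed 0-based here; shift by one for R's 1-based indexing.
embedding_lookup <- function(W, ids) {
  stopifnot(all(ids >= 0 & ids < nrow(W)))
  W[ids + 1L, , drop = FALSE]
}

vocab_size <- 10L
emb_dim <- 4L
W <- matrix(seq_len(vocab_size * emb_dim), vocab_size, emb_dim)
emb <- embedding_lookup(W, c(0L, 3L, 3L))  # repeated ids give identical rows
```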
Helper function to estimate memory needed for a tensor
ggml_estimate_memory(type = GGML_TYPE_F32, ne0, ne1 = 1, ne2 = 1, ne3 = 1)
type |
Tensor type (GGML_TYPE_F32, etc.) |
ne0 |
Size of dimension 0 |
ne1 |
Size of dimension 1 (optional) |
ne2 |
Size of dimension 2 (optional) |
ne3 |
Size of dimension 3 (optional) |
Estimated memory in bytes
# For 1000x1000 F32 matrix
ggml_estimate_memory(GGML_TYPE_F32, 1000, 1000)
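The raw data size of an unquantized tensor is just the element count times the bytes per element, so a 1000x1000 F32 matrix needs 4,000,000 bytes of data. The sketch below computes only that figure; ggml_estimate_memory() may add per-tensor metadata or alignment overhead, so treat the result as a lower bound. The small type table is an assumption covering common unquantized types only.

```r
# Raw data bytes for an unquantized tensor: elements x bytes per element.
# Ignores any metadata or alignment overhead, so this is a lower bound.
bytes_per_element <- c(F32 = 4, F16 = 2, I32 = 4, I8 = 1)

raw_tensor_bytes <- function(type, ne0, ne1 = 1, ne2 = 1, ne3 = 1) {
  bytes_per_element[[type]] * ne0 * ne1 * ne2 * ne3
}

raw_tensor_bytes("F32", 1000, 1000)  # 1000x1000 F32 matrix
```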
Evaluate a Trained Model
## S3 method for class 'ggml_functional_model'
ggml_evaluate(model, x, y, batch_size = 32L, ...)

ggml_evaluate(model, ...)

## S3 method for class 'ggml_sequential_model'
ggml_evaluate(
  model,
  x,
  y,
  batch_size = 32,
  sample_weight = NULL,
  class_weight = NULL,
  ...
)
model |
A trained ggml_sequential_model or ggml_functional_model |
x |
Test data |
y |
Test labels (one-hot encoded) |
batch_size |
Batch size for evaluation |
... |
Additional arguments (ignored). |
sample_weight |
Numeric vector of per-sample weights (length = nrow(x)). |
class_weight |
Named vector of weights per class, e.g. c("0"=1, "1"=10). Cannot be used with sample_weight. |
Named list with loss and accuracy.
n <- 128
x <- matrix(runif(n * 4), nrow = n, ncol = 4)
y <- matrix(0, nrow = n, ncol = 2)
for (i in seq_len(n)) {
  y[i, if (sum(x[i, ]) > 2) 1L else 2L] <- 1
}
model <- ggml_model_sequential() |>
  ggml_layer_dense(8, activation = "relu") |>
  ggml_layer_dense(2, activation = "softmax")
model$input_shape <- 4L
model <- ggml_compile(model, optimizer = "adam",
                      loss = "categorical_crossentropy")
model <- ggml_fit(model, x, y, epochs = 5, batch_size = 32, verbose = 0)
# Basic evaluation
result <- ggml_evaluate(model, x, y, batch_size = 32)
# With sample_weight
sw <- runif(n, 0.5, 1.5)
result <- ggml_evaluate(model, x, y, batch_size = 32, sample_weight = sw)
# With class_weight
result <- ggml_evaluate(model, x, y, batch_size = 32,
                        class_weight = c("0" = 1, "1" = 2))
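The sketch below shows one common way class_weight and sample_weight can enter an evaluation loss: class weights are expanded into per-sample weights, and the categorical cross-entropy is averaged with those weights. It assumes class names "0", "1", ... refer to the columns of the one-hot y in order and that the loss is normalised by the total weight; ggml_evaluate() may differ in either respect. Both helpers are hypothetical.

```r
# Assumption: class "k" corresponds to column k + 1 of the one-hot matrix y.
class_to_sample_weight <- function(y, class_weight) {
  cls <- as.character(max.col(y, ties.method = "first") - 1L)
  unname(class_weight[cls])
}

# Weighted mean of per-sample categorical cross-entropy;
# eps guards against log(0) for zero predicted probabilities.
weighted_ce <- function(y, p, w = rep(1, nrow(y)), eps = 1e-7) {
  per_sample <- -rowSums(y * log(pmax(p, eps)))
  sum(w * per_sample) / sum(w)
}

y <- rbind(c(1, 0), c(0, 1))
p <- rbind(c(0.5, 0.5), c(0.5, 0.5))
w <- class_to_sample_weight(y, c("0" = 1, "1" = 3))
weighted_ce(y, p, w)  # log(2): every sample has the same loss here
```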
Creates a graph node for element-wise exponential: exp(x)
ggml_exp(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the exp operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 3)
ggml_set_f32(a, c(0, 1, 2))
result <- ggml_exp(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # [1, e, e^2]
ggml_free(ctx)
Creates a graph node for in-place element-wise exponential: e^x
ggml_exp_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with exponential values
Trains a model epoch by epoch in R, allowing callbacks for early stopping and learning rate scheduling. Optimizer state (momentum) is preserved across all epochs.
ggml_fit_opt(
  sched,
  ctx_compute,
  inputs,
  outputs,
  dataset,
  loss_type = ggml_opt_loss_type_mse(),
  optimizer = ggml_opt_optimizer_type_adamw(),
  nepoch = 10L,
  nbatch_logical = 32L,
  val_split = 0,
  callbacks = list(),
  silent = FALSE
)
sched |
Backend scheduler |
ctx_compute |
Compute context (for temporary tensors) |
inputs |
Input tensor with shape [ne_datapoint, batch_size] |
outputs |
Output tensor with shape [ne_label, batch_size] |
dataset |
Dataset created with 'ggml_opt_dataset_init()' |
loss_type |
Loss type (default: MSE) |
optimizer |
Optimizer type (default: AdamW) |
nepoch |
Number of epochs |
nbatch_logical |
Logical batch size (for gradient accumulation) |
val_split |
Fraction of data for validation (0.0 to 1.0) |
callbacks |
List of callback lists. Each element may have 'on_epoch_begin(epoch, logs, state)' and/or 'on_epoch_end(epoch, logs, state)'. Built-in factories: 'ggml_callback_early_stopping()', 'ggml_schedule_step_decay()', 'ggml_schedule_cosine_decay()', 'ggml_schedule_reduce_on_plateau()'. 'state' is a mutable environment with fields: 'stop' (set TRUE to stop training), 'lr_ud', 'nepoch'. |
silent |
Whether to suppress per-epoch progress output |
Data frame with columns epoch, train_loss, train_accuracy, val_loss, val_accuracy
Other optimization:
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
if (FALSE) {
  history <- ggml_fit_opt(sched, ctx_compute, inputs, outputs, dataset,
    nepoch = 50, val_split = 0.2,
    callbacks = list(
      ggml_callback_early_stopping(monitor = "val_loss", patience = 5),
      ggml_schedule_cosine_decay()
    ))
}
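Besides the built-in factories, a callback can be written by hand following the interface described for the callbacks argument: a list with an on_epoch_end(epoch, logs, state) function that sets state$stop to TRUE to end training. The sketch below stops once the training loss falls under a threshold. The logs field name "train_loss" is an assumption that mirrors the columns of the returned history data frame, and make_loss_threshold_callback is a hypothetical name.

```r
# Hypothetical custom callback: stop training once train_loss < threshold.
# Assumes logs carries a "train_loss" entry; state is the mutable
# environment passed by ggml_fit_opt(), so assigning state$stop persists.
make_loss_threshold_callback <- function(threshold) {
  list(
    on_epoch_end = function(epoch, logs, state) {
      loss <- logs[["train_loss"]]
      if (!is.null(loss) && is.finite(loss) && loss < threshold) {
        state$stop <- TRUE
      }
      invisible(NULL)
    }
  )
}

# Usage: callbacks = list(make_loss_threshold_callback(0.05))
cb <- make_loss_threshold_callback(0.05)
state <- new.env()
state$stop <- FALSE
cb$on_epoch_end(1L, list(train_loss = 0.2), state)   # keeps training
cb$on_epoch_end(2L, list(train_loss = 0.01), state)  # requests a stop
```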
Dispatcher: if the first argument is a ggml_sequential_model, delegates
to the Keras-style high-level API (ggml_fit_sequential); a
ggml_functional_model is handled by its own method; otherwise
delegates to the low-level optimizer loop (ggml_fit_opt).
## S3 method for class 'ggml_functional_model'
ggml_fit(
  model,
  x,
  y,
  epochs = 1L,
  batch_size = 32L,
  validation_split = 0,
  validation_data = NULL,
  verbose = 1L,
  ...
)

ggml_fit(model, ...)

## S3 method for class 'ggml_sequential_model'
ggml_fit(model, ...)

## Default S3 method:
ggml_fit(model, ...)
model |
A compiled model object. |
x |
Training data (matrix or array). |
y |
Training labels (matrix, one-hot encoded). |
epochs |
Number of training epochs (default: 1). |
batch_size |
Batch size (default: 32). |
validation_split |
Fraction of data for validation (default: 0). |
validation_data |
Optional list(x_val, y_val). Overrides validation_split. |
verbose |
0 = silent, 1 = progress (default: 1). |
... |
Arguments passed to the appropriate implementation. |
Keras-style (Sequential model):
model: A compiled ggml_sequential_model
x: Training data (matrix or array)
y: Training labels (matrix, one-hot encoded for classification)
epochs: Number of training epochs (default: 1)
batch_size: Batch size (default: 32)
validation_split: Fraction of data for validation (default: 0)
validation_data: Optional list(x_val, y_val) for validation. Overrides validation_split.
class_weight: Named vector of weights per class, e.g. c("0"=1, "1"=10). Cannot be used with sample_weight.
sample_weight: Numeric vector of per-sample weights (length = nrow(x)). Cannot be used with class_weight.
verbose: 0 = silent, 1 = progress (default: 1)
Low-level (optimizer loop):
sched: Backend scheduler
ctx_compute: Compute context
inputs: Input tensor
outputs: Output tensor
dataset: Dataset from ggml_opt_dataset_init()
loss_type: Loss type (default: MSE)
optimizer: Optimizer type (default: AdamW)
nepoch: Number of epochs (default: 10)
nbatch_logical: Logical batch size (default: 32)
val_split: Validation fraction (default: 0)
callbacks: List of callback objects
silent: Suppress output (default: FALSE)
For Sequential models: the trained model (invisibly).
For the low-level API: a data frame with columns
epoch, train_loss, train_accuracy,
val_loss, val_accuracy.
n <- 128
x <- matrix(runif(n * 4), nrow = n, ncol = 4)
y <- matrix(0, nrow = n, ncol = 2)
for (i in seq_len(n)) {
  y[i, if (sum(x[i, ]) > 2) 1L else 2L] <- 1
}
model <- ggml_model_sequential() |>
  ggml_layer_dense(8, activation = "relu") |>
  ggml_layer_dense(2, activation = "softmax")
model$input_shape <- 4L
model <- ggml_compile(model, optimizer = "adam",
                      loss = "categorical_crossentropy")
# Basic training
model <- ggml_fit(model, x, y, epochs = 5, batch_size = 32, verbose = 0)
# With validation_data
x_val <- matrix(runif(32 * 4), nrow = 32, ncol = 4)
y_val <- matrix(0, nrow = 32, ncol = 2)
for (i in seq_len(32)) {
  y_val[i, if (sum(x_val[i, ]) > 2) 1L else 2L] <- 1
}
model <- ggml_fit(model, x, y, epochs = 3, batch_size = 32,
                  validation_data = list(x_val, y_val), verbose = 0)
# With class_weight (useful for imbalanced classes)
model <- ggml_fit(model, x, y, epochs = 3, batch_size = 32,
                  class_weight = c("0" = 1, "1" = 2), verbose = 0)
# With sample_weight
sw <- runif(n, 0.5, 1.5)
model <- ggml_fit(model, x, y, epochs = 3, batch_size = 32,
                  sample_weight = sw, verbose = 0)
Backward pass for Flash Attention. Used during training to compute gradients through attention.
ggml_flash_attn_back(ctx, q, k, v, d, masked = TRUE)
ctx |
GGML context |
q |
Query tensor (same as forward pass) |
k |
Key tensor (same as forward pass) |
v |
Value tensor (same as forward pass) |
d |
Gradient tensor from upstream (same shape as forward output) |
masked |
Logical: whether causal masking was used in forward pass |
Gradient tensor
Creates a graph node for Flash Attention computation. This is a memory-efficient implementation of scaled dot-product attention.
ggml_flash_attn_ext( ctx, q, k, v, mask = NULL, scale, max_bias = 0, logit_softcap = 0 )
ctx |
GGML context |
q |
Query tensor of shape [head_dim, n_head, n_tokens, batch] |
k |
Key tensor of shape [head_dim, n_head_kv, n_kv, batch] |
v |
Value tensor of shape [head_dim, n_head_kv, n_kv, batch] |
mask |
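Optional attention mask tensor (NULL for no mask), added to the scaled logits before the softmax. Causal masking is not applied automatically: for causal attention, pass a mask containing -Inf at the positions a query must not attend to. |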
scale |
Attention scale factor, typically 1/sqrt(head_dim) |
max_bias |
Maximum ALiBi bias (0.0 to disable ALiBi) |
logit_softcap |
Logit soft-capping value (0.0 to disable). Used by some models like Gemma 2. |
Flash Attention computes: softmax(scale * Q * K^T + mask) * V, where scale multiplies the logits (typically 1/sqrt(head_dim)).
Key features:
- Memory efficient: O(n) instead of O(n^2) memory for the attention matrix
- Supports grouped-query attention (GQA) when n_head_kv < n_head
- Supports multi-query attention (MQA) when n_head_kv = 1
- Optional ALiBi (Attention with Linear Biases) for position encoding
- Optional logit soft-capping for numerical stability
Attention output tensor of shape [head_dim, n_head, n_tokens, batch]
ctx <- ggml_init(64 * 1024 * 1024)
head_dim <- 64
n_head <- 8
n_head_kv <- 2 # GQA with 4:1 ratio
seq_len <- 32
q <- ggml_new_tensor_4d(ctx, GGML_TYPE_F32, head_dim, n_head, seq_len, 1)
k <- ggml_new_tensor_4d(ctx, GGML_TYPE_F32, head_dim, n_head_kv, seq_len, 1)
v <- ggml_new_tensor_4d(ctx, GGML_TYPE_F32, head_dim, n_head_kv, seq_len, 1)
ggml_set_f32(q, rnorm(head_dim * n_head * seq_len))
ggml_set_f32(k, rnorm(head_dim * n_head_kv * seq_len))
ggml_set_f32(v, rnorm(head_dim * n_head_kv * seq_len))
# Scale = 1/sqrt(head_dim)
scale <- 1.0 / sqrt(head_dim)
# Compute attention
out <- ggml_flash_attn_ext(ctx, q, k, v, NULL, scale, 0.0, 0.0)
graph <- ggml_build_forward_expand(ctx, out)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
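For checking results on small inputs, the attention formula can be written out naively in plain R. This is a reference sketch under stated assumptions, not the package's kernel: each head is held as an R matrix with one row per token, `scale` multiplies the logits, ALiBi and soft-capping are omitted, and query head h shares key/value head floor((h - 1) / (n_head / n_head_kv)) + 1, the usual grouped-query mapping. Unlike Flash Attention, it materialises the full n_tokens x n_kv score matrix. `attention_ref()` is a hypothetical helper name.

```r
# Naive reference for softmax(scale * Q K^T + mask) V with grouped-query heads.
# q: list of n_head matrices, each [n_tokens x head_dim]
# k, v: lists of n_head_kv matrices, each [n_kv x head_dim]
# mask: optional [n_tokens x n_kv] matrix added to the logits (-Inf = masked)
attention_ref <- function(q, k, v, scale, mask = NULL) {
  n_head <- length(q)
  n_head_kv <- length(k)
  stopifnot(n_head %% n_head_kv == 0)
  group <- n_head / n_head_kv
  lapply(seq_len(n_head), function(h) {
    kv <- (h - 1) %/% group + 1                # KV head shared by this query group
    logits <- scale * q[[h]] %*% t(k[[kv]])    # [n_tokens x n_kv]
    if (!is.null(mask)) logits <- logits + mask
    logits <- logits - apply(logits, 1, max)   # subtract row max for stability
    w <- exp(logits)
    w <- w / rowSums(w)                        # softmax over the key axis
    w %*% v[[kv]]                              # [n_tokens x head_dim]
  })
}

set.seed(1)
head_dim <- 4; n_tokens <- 3; n_kv <- 5
q <- replicate(4, matrix(rnorm(n_tokens * head_dim), n_tokens), simplify = FALSE)
k <- replicate(2, matrix(rnorm(n_kv * head_dim), n_kv), simplify = FALSE)
v <- replicate(2, matrix(rnorm(n_kv * head_dim), n_kv), simplify = FALSE)
out <- attention_ref(q, k, v, scale = 1 / sqrt(head_dim))   # 4 query heads, 2 KV heads
```

Comparing such a reference against ggml_get_f32() output also requires matching GGML's dimension order, which this sketch does not attempt.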
Creates a graph node for element-wise floor: floor(x)
ggml_floor(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the floor operation
Creates a graph node for in-place element-wise floor.
ggml_floor_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with floor values
Free GGML context
ggml_free(ctx)
ctx |
Context pointer |
NULL (invisible)
ctx <- ggml_init(1024 * 1024)
ggml_free(ctx)
Sets trainable = FALSE on layers, preventing their weights from being
updated during training. Accepts optional from / to to freeze
a range of layers by index, or layer_names to freeze by name.
If none are provided, all layers are frozen.
ggml_freeze_weights( model, from = 1L, to = length(model$layers), layer_names = NULL, ... )
model |
A model object (ggml_sequential_model or ggml_functional_model) |
from |
Integer index of the first layer to freeze (default: 1) |
to |
Integer index of the last layer to freeze (default: last layer) |
layer_names |
Character vector of layer names to freeze (overrides from/to) |
... |
Additional arguments passed to methods |
The model with selected layers frozen.
model <- ggml_model_sequential() |>
  ggml_layer_dense(64, activation = "relu") |>
  ggml_layer_dense(10, activation = "softmax")
# Freeze all layers
model <- ggml_freeze_weights(model)
# Freeze only the first layer
model <- ggml_freeze_weights(model, from = 1, to = 1)
Converts a file type (ftype) to the corresponding GGML type. Used when loading quantized models.
ggml_ftype_to_ggml_type(ftype)
ftype |
File type constant |
Integer GGML type
Other type_system:
ggml_blck_size(),
ggml_is_quantized(),
ggml_type_name(),
ggml_type_sizef()
Allocates memory for all tensors in the computation graph. This must be called before computing the graph.
ggml_gallocr_alloc_graph(galloc, graph)
galloc |
Graph allocator object |
graph |
Graph object |
TRUE on success, FALSE on failure
ctx <- ggml_init(16 * 1024 * 1024)
galloc <- ggml_gallocr_new()
# Create graph
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
b <- ggml_relu(ctx, a)
graph <- ggml_build_forward_expand(ctx, b)
# Allocate and compute
ggml_gallocr_alloc_graph(galloc, graph)
ggml_graph_compute(ctx, graph)
ggml_gallocr_free(galloc)
ggml_free(ctx)
Frees a graph allocator and all associated buffers.
ggml_gallocr_free(galloc)
galloc |
Graph allocator object |
No return value, called for side effects
Returns the size of the buffer used by the graph allocator.
ggml_gallocr_get_buffer_size(galloc, buffer_id = 0L)
galloc |
Graph allocator object |
buffer_id |
Buffer ID (default: 0 for single-buffer allocator) |
Size in bytes
Creates a new graph allocator for efficient memory management. The allocator can automatically allocate and reuse memory for graph tensors.
ggml_gallocr_new()
Graph allocator object (external pointer)
ctx <- ggml_init(16 * 1024 * 1024)
galloc <- ggml_gallocr_new()
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
b <- ggml_relu(ctx, a)
graph <- ggml_build_forward_expand(ctx, b)
# Allocate graph
ggml_gallocr_alloc_graph(galloc, graph)
ggml_gallocr_free(galloc)
ggml_free(ctx)
Pre-allocates memory for a graph. This is optional but recommended when running the same graph multiple times to avoid reallocation.
ggml_gallocr_reserve(galloc, graph)
galloc |
Graph allocator object |
graph |
Graph object |
TRUE on success, FALSE on failure
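Creates a graph node for the GeGLU operation. GeGLU splits the input in half along the first dimension, applies GELU to the first half and multiplies the result by the second half (the gate). It is used in the gated feed-forward blocks of some transformer models, such as Gemma.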
ggml_geglu(ctx, a)
ctx |
GGML context |
a |
Input tensor (first dimension must be even) |
Formula: output = GELU(x) * gate, where x is the first half of the input and gate is the second half.
Tensor with half the first dimension of input
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 3)
ggml_set_f32(a, rnorm(24))
r <- ggml_geglu(ctx, a)
graph <- ggml_build_forward_expand(ctx, r)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(r) # Shape: 4x3
ggml_free(ctx)
Creates a graph node for fast GeGLU approximation. Uses faster but less accurate GELU approximation for gating.
ggml_geglu_quick(ctx, a)
ctx |
GGML context |
a |
Input tensor (first dimension must be even) |
Tensor with half the first dimension of input
Creates a graph node for GeGLU with separate input and gate tensors.
ggml_geglu_split(ctx, a, b)
ctx |
GGML context |
a |
Input tensor (the values to be gated) |
b |
Gate tensor (same shape as a) |
Formula: output = GELU(a) * b
Tensor with same shape as input tensors
Creates a graph node for GELU (Gaussian Error Linear Unit) activation, the standard activation in the feed-forward layers of GPT-style transformers.
ggml_gelu(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the GELU operation
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-2, -1, 0, 1, 2))
result <- ggml_gelu(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_get_f32(result)
ggml_free(ctx)
Creates a graph node for exact GELU using the error function (erf). GELU(x) = x * 0.5 * (1 + erf(x / sqrt(2))). More accurate than approximate GELU but potentially slower on some backends.
ggml_gelu_erf(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the exact GELU operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-2, -1, 0, 1, 2))
r <- ggml_gelu_erf(ctx, a)
graph <- ggml_build_forward_expand(ctx, r)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(r)
ggml_free(ctx)
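The exact formula can be compared with the usual tanh approximation in plain R. Base R has no erf(), but x * 0.5 * (1 + erf(x / sqrt(2))) simplifies to x * pnorm(x); the sketch checks that identity and measures the gap to the tanh form. It assumes, based on upstream GGML rather than on this manual, that the tanh form is what ggml_gelu() computes; the helper names are hypothetical.

```r
# Exact GELU: x * 0.5 * (1 + erf(x / sqrt(2))), which simplifies to x * pnorm(x)
gelu_erf_ref <- function(x) x * pnorm(x)

# The same formula written with an explicit erf() built from pnorm()
erf <- function(z) 2 * pnorm(z * sqrt(2)) - 1
gelu_erf_formula <- function(x) x * 0.5 * (1 + erf(x / sqrt(2)))

# Tanh approximation (assumed to be the form used by the default GELU)
gelu_tanh_ref <- function(x) {
  0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
}

x <- c(-2, -1, 0, 1, 2)
gelu_erf_ref(x)
max(abs(gelu_erf_ref(x) - gelu_tanh_ref(x)))   # approximation error on these inputs
```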
Creates a graph node for in-place GELU (Gaussian Error Linear Unit) activation. The result overwrites the input tensor, which saves the memory of a separate output tensor.
ggml_gelu_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with GELU applied
Creates a graph node for fast approximation of GELU. Faster than standard GELU with minimal accuracy loss.
ggml_gelu_quick(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the GELU quick operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-2, -1, 0, 1, 2))
result <- ggml_gelu_quick(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)
ggml_free(ctx)
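To see what "minimal accuracy loss" means in numbers, the sketch below measures the worst-case gap between a quick GELU and the exact one on a grid. It assumes the sigmoid form x * sigmoid(1.702 * x), which is what upstream GGML's GELU-quick kernel uses; this manual does not state the constant, so treat it as an assumption.

```r
gelu_exact <- function(x) x * pnorm(x)                 # x * 0.5 * (1 + erf(x / sqrt(2)))
gelu_quick_ref <- function(x) x * plogis(1.702 * x)    # assumed form; plogis() is the sigmoid

grid <- seq(-4, 4, by = 0.01)
err <- abs(gelu_quick_ref(grid) - gelu_exact(grid))
max(err)              # worst-case absolute error on [-4, 4]
grid[which.max(err)]  # where it occurs
```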
Get F32 Data
ggml_get_f32(tensor)
tensor |
Tensor |
Numeric vector with tensor values
ctx <- ggml_init(1024 * 1024)
tensor <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(tensor, c(1, 2, 3, 4, 5))
ggml_get_f32(tensor)
ggml_free(ctx)
Gets a single f32 value from the tensor at position [i0, i1, i2, i3]. Works with any tensor type (auto-converts to float).
ggml_get_f32_nd(tensor, i0, i1 = 0, i2 = 0, i3 = 0)
tensor |
Tensor pointer |
i0, i1, i2, i3 |
Indices (0-based) |
Float value
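Indices follow GGML's dimension order, in which i0 runs along the first, fastest-varying dimension ne0. For a contiguous tensor this fixes where [i0, i1, i2, i3] sits in a flat vector such as the one returned by ggml_get_f32(). The helper below is a hypothetical sketch of that mapping and assumes ggml_get_f32() returns elements in storage order.

```r
# Hypothetical helper: 0-based GGML indices -> 1-based position in a flat
# vector, for a contiguous tensor with dimensions ne = c(ne0, ne1, ne2, ne3).
flat_pos <- function(ne, i0, i1 = 0, i2 = 0, i3 = 0) {
  ne <- c(ne, rep(1, 4 - length(ne)))   # pad missing dimensions with 1
  i0 + ne[1] * (i1 + ne[2] * (i2 + ne[3] * i3)) + 1
}

# A 4 x 3 tensor (ne0 = 4, ne1 = 3) whose flat data is 1..12
vals <- 1:12
vals[flat_pos(c(4, 3), i0 = 1, i1 = 2)]   # element [1, 2] in GGML terms
```

Because R arrays are also column-major, the same element is `array(vals, c(4, 3))[i0 + 1, i1 + 1]`.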
Get First Tensor from Context
ggml_get_first_tensor(ctx)
ctx |
GGML context |
Tensor pointer or NULL
Gets integer data from an I32 tensor (e.g., from ggml_argmax)
ggml_get_i32(tensor)
tensor |
Tensor of type GGML_TYPE_I32 |
Integer vector
ctx <- ggml_init(1024 * 1024)
pos <- ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 10)
ggml_set_i32(pos, 0:9)
ggml_get_i32(pos)
ggml_free(ctx)
Gets a single i32 value from the tensor at position [i0, i1, i2, i3].
ggml_get_i32_nd(tensor, i0, i1 = 0, i2 = 0, i3 = 0)
tensor |
Tensor pointer |
i0, i1, i2, i3 |
Indices (0-based) |
Integer value
Retrieves a layer by name or by integer index (1-based).
ggml_get_layer(model, index = NULL, name = NULL)
model |
A ggml_sequential_model object |
index |
Integer index of the layer (1-based), or NULL |
name |
Character name of the layer, or NULL |
The layer list object
model <- ggml_model_sequential() |>
  ggml_layer_dense(64, activation = "relu", name = "hidden") |>
  ggml_layer_dense(10, activation = "softmax", name = "output")
ggml_get_layer(model, index = 1)
ggml_get_layer(model, name = "output")
Returns the maximum tensor size that can be allocated in the context
ggml_get_max_tensor_size(ctx)
ctx |
GGML context |
Maximum tensor size in bytes
ctx <- ggml_init(1024 * 1024)
ggml_get_max_tensor_size(ctx)
ggml_free(ctx)
Returns the total memory pool size of the context
ggml_get_mem_size(ctx)
ctx |
GGML context |
Total memory size in bytes
ctx <- ggml_init(1024 * 1024)
ggml_get_mem_size(ctx)
ggml_free(ctx)
Get the current number of threads for GGML operations
ggml_get_n_threads()
Number of threads
ggml_get_n_threads()
Retrieves the name of a tensor.
ggml_get_name(tensor)
tensor |
Tensor pointer |
Character string name or NULL if not set
Get Next Tensor from Context
ggml_get_next_tensor(ctx, tensor)
ctx |
GGML context |
tensor |
Current tensor |
Next tensor pointer or NULL
Check if no-allocation mode is enabled
ggml_get_no_alloc(ctx)
ctx |
GGML context |
Logical indicating if no_alloc is enabled
ctx <- ggml_init(1024 * 1024)
ggml_get_no_alloc(ctx)
ggml_free(ctx)
Returns the raw op_params bytes from a tensor. These parameters control operation-specific behavior (e.g., precision, mode).
ggml_get_op_params(tensor)
tensor |
External pointer to tensor |
Raw vector of op_params bytes
Other tensor:
ggml_are_same_layout(),
ggml_get_op_params_f32(),
ggml_get_op_params_i32(),
ggml_set_op_params(),
ggml_set_op_params_f32(),
ggml_set_op_params_i32()
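The raw vector returned by ggml_get_op_params() can be decoded in plain R with readBin(). The sketch assumes the 64-byte buffer is sixteen 4-byte slots stored little-endian (as on common x86-64 and ARM64 hosts), which matches the 0-15 index range documented for ggml_get_op_params_f32() and ggml_get_op_params_i32(). The decoder names are hypothetical, and the buffer is built by hand rather than read from a real tensor.

```r
# Hypothetical decoders for a 64-byte op_params raw vector (16 x 4-byte slots)
op_param_i32 <- function(bytes, index) {
  stopifnot(is.raw(bytes), index >= 0, index <= 15)
  slot <- bytes[(4 * index + 1):(4 * index + 4)]
  readBin(slot, what = "integer", size = 4, endian = "little")
}
op_param_f32 <- function(bytes, index) {
  stopifnot(is.raw(bytes), index >= 0, index <= 15)
  slot <- bytes[(4 * index + 1):(4 * index + 4)]
  readBin(slot, what = "numeric", size = 4, endian = "little")   # 32-bit float
}

# Hand-built buffer: slot 0 holds the integer 7, slot 1 the float 0.5
bytes <- c(writeBin(7L, raw(), size = 4, endian = "little"),
           writeBin(0.5, raw(), size = 4, endian = "little"),
           raw(56))
op_param_i32(bytes, 0)
op_param_f32(bytes, 1)
```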
Gets a single float value from tensor op_params at given index.
ggml_get_op_params_f32(tensor, index)
tensor |
External pointer to tensor |
index |
0-based index (0-15 for 64-byte op_params) |
Numeric value
Other tensor:
ggml_are_same_layout(),
ggml_get_op_params(),
ggml_get_op_params_i32(),
ggml_set_op_params(),
ggml_set_op_params_f32(),
ggml_set_op_params_i32()
Gets a single int32 value from tensor op_params at given index.
ggml_get_op_params_i32(tensor, index)
tensor |
External pointer to tensor |
index |
0-based index (0-15 for 64-byte op_params) |
Integer value
Other tensor:
ggml_are_same_layout(),
ggml_get_op_params(),
ggml_get_op_params_f32(),
ggml_set_op_params(),
ggml_set_op_params_f32(),
ggml_set_op_params_i32()
Gathers relative-position rows for relative-position attention bias.
ggml_get_rel_pos(ctx, a, qh, kh)
ctx |
GGML context |
a |
Input tensor |
qh |
Query height |
kh |
Key height |
Relative-position tensor
Creates a graph node that extracts rows from a tensor by index. This is commonly used for embedding lookup in LLMs.
ggml_get_rows(ctx, a, b)
ctx |
GGML context |
a |
Data tensor of shape [n_embd, n_rows, ...] - the embedding table |
b |
Index tensor (int32) of shape [n_indices] - which rows to extract |
This operation is fundamental for embedding lookup in transformers: given a vocabulary embedding matrix and token indices, it retrieves the corresponding embedding vectors.
Tensor of shape [n_embd, n_indices, ...] containing the selected rows
ctx <- ggml_init(16 * 1024 * 1024)
# Create embedding matrix: 10 tokens, 4-dim embeddings
embeddings <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 10)
ggml_set_f32(embeddings, rnorm(40))
# Token indices to look up
indices <- ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 3)
ggml_set_i32(indices, c(0L, 5L, 2L))
# Get embeddings for tokens 0, 5, 2
result <- ggml_get_rows(ctx, embeddings, indices)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
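In plain R terms, if the table is held as a matrix whose columns are the n_rows embedding vectors (matching GGML's layout, where ne0 = n_embd is the contiguous dimension, and R's column-major storage), the lookup is column indexing with the 0-based indices shifted by one. This is an illustrative sketch with a hypothetical helper, not the package's implementation.

```r
# Embedding table: n_embd = 4, n_rows = 10; column j holds the vector for token j - 1
set.seed(42)
embeddings <- matrix(rnorm(40), nrow = 4, ncol = 10)
indices <- c(0L, 5L, 2L)            # 0-based token ids, as stored in the I32 index tensor

get_rows_ref <- function(table, idx) table[, idx + 1L, drop = FALSE]

selected <- get_rows_ref(embeddings, indices)   # 4 x 3: one column per index
```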
Backward pass for ggml_get_rows operation. Accumulates gradients at the original row positions.
ggml_get_rows_back(ctx, a, b, c)
ctx |
GGML context |
a |
Gradient of get_rows output |
b |
Index tensor (same as forward pass) |
c |
Reference tensor defining output shape |
Gradient tensor for the embedding matrix
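Accumulating gradients at the original row positions is a scatter-add: each column of the upstream gradient is added to the table row it was read from, so an index that appears twice receives the sum of both gradients. A plain-R sketch, assuming the same column-per-row layout as for ggml_get_rows(); `get_rows_back_ref()` is a hypothetical name.

```r
# grad_out: n_embd x n_indices gradient of the get_rows output
# idx: 0-based indices used in the forward pass
# n_rows: number of rows of the original table (the shape given by reference tensor c)
get_rows_back_ref <- function(grad_out, idx, n_rows) {
  grad_table <- matrix(0, nrow = nrow(grad_out), ncol = n_rows)
  for (j in seq_along(idx)) {
    col <- idx[j] + 1L
    grad_table[, col] <- grad_table[, col] + grad_out[, j]   # accumulate, not overwrite
  }
  grad_table
}

grad_out <- matrix(1:6, nrow = 2)   # gradients for 3 looked-up rows
g <- get_rows_back_ref(grad_out, c(1L, 3L, 1L), n_rows = 4)
```

Here row 1 (0-based) was read twice, so its gradient is the sum of the first and third columns of `grad_out`.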
Returns the unary operation type for a unary operation tensor.
ggml_get_unary_op(tensor)
tensor |
Tensor pointer (must be a unary operation result) |
Integer unary operation type
Other op_info:
ggml_op_desc(),
ggml_op_name(),
ggml_op_symbol(),
ggml_unary_op_name()
Creates a graph node for GLU operation with specified gating type. GLU splits the input tensor in half along the first dimension, applies an activation to the first half (x), and multiplies it with the second half (gate).
ggml_glu(ctx, a, op, swapped = FALSE)
ctx |
GGML context |
a |
Input tensor (first dimension must be even) |
op |
GLU operation type (GGML_GLU_OP_REGLU, GGML_GLU_OP_GEGLU, etc.) |
swapped |
If TRUE, swap x and gate halves (default FALSE) |
Formula: output = activation(x) * gate where x and gate are the two halves of the input tensor.
Tensor with shape [n/2, ...] where n is the first dimension of input
ctx <- ggml_init(16 * 1024 * 1024)
# Create a tensor whose first dimension is 10 (split into 5 + 5)
a <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 10, 4)
ggml_set_f32(a, rnorm(40))
# Apply SwiGLU
r <- ggml_glu(ctx, a, GGML_GLU_OP_SWIGLU, FALSE)
graph <- ggml_build_forward_expand(ctx, r)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(r) # Shape: 5x4
ggml_free(ctx)
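The split-and-gate rule can be checked in plain R by treating the first dimension as the rows of an R matrix. The sketch covers three gating types (ReGLU, GeGLU in its exact-erf form, SwiGLU) and implements swapped = TRUE by exchanging the two halves; `glu_ref()` is a hypothetical reference, and the activation for each type is taken from the GGML_GLU_OP_* descriptions rather than from the package source.

```r
glu_ref <- function(a, op = c("reglu", "geglu_erf", "swiglu"), swapped = FALSE) {
  op <- match.arg(op)
  n <- nrow(a)
  stopifnot(n %% 2 == 0)                               # first dimension must be even
  x    <- a[seq_len(n / 2), , drop = FALSE]            # first half
  gate <- a[n / 2 + seq_len(n / 2), , drop = FALSE]    # second half
  if (swapped) { tmp <- x; x <- gate; gate <- tmp }
  act <- switch(op,
    reglu     = pmax(x, 0),        # ReLU
    geglu_erf = x * pnorm(x),      # exact GELU
    swiglu    = x * plogis(x))     # SiLU / Swish
  act * gate
}

a <- matrix(c(-1, 2, 3, 4), nrow = 4, ncol = 1)   # halves: (-1, 2) and (3, 4)
glu_ref(a, "reglu")                   # relu(-1) * 3, relu(2) * 4
glu_ref(a, "reglu", swapped = TRUE)   # relu(3) * -1, relu(4) * 2
```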
Constants for GLU (Gated Linear Unit) operation types. Used with ggml_glu() and ggml_glu_split().
GGML_GLU_OP_REGLU GGML_GLU_OP_GEGLU GGML_GLU_OP_SWIGLU GGML_GLU_OP_SWIGLU_OAI GGML_GLU_OP_GEGLU_ERF GGML_GLU_OP_GEGLU_QUICK
Integer constants
Each constant is an object of class integer of length 1.
GGML_GLU_OP_REGLU (0): ReGLU - ReLU gating
GGML_GLU_OP_GEGLU (1): GeGLU - GELU gating (used in, e.g., Gemma)
GGML_GLU_OP_SWIGLU (2): SwiGLU - SiLU/Swish gating (used in LLaMA, Mistral)
GGML_GLU_OP_SWIGLU_OAI (3): SwiGLU OpenAI variant
GGML_GLU_OP_GEGLU_ERF (4): GeGLU with exact erf implementation
GGML_GLU_OP_GEGLU_QUICK (5): GeGLU with fast approximation
An integer constant representing a GLU operation type
GGML_GLU_OP_REGLU        # 0 - ReLU gating
GGML_GLU_OP_GEGLU        # 1 - GELU gating
GGML_GLU_OP_SWIGLU       # 2 - SiLU/Swish gating
GGML_GLU_OP_SWIGLU_OAI   # 3 - SwiGLU OpenAI
GGML_GLU_OP_GEGLU_ERF    # 4 - GELU with erf
GGML_GLU_OP_GEGLU_QUICK  # 5 - Fast GELU
Creates a graph node for GLU with separate input and gate tensors. Unlike standard GLU which splits a single tensor, this takes two separate tensors.
ggml_glu_split(ctx, a, b, op)
ctx |
GGML context |
a |
Input tensor (the values to be gated) |
b |
Gate tensor (same shape as a) |
op |
GLU operation type (GGML_GLU_OP_REGLU, GGML_GLU_OP_GEGLU, etc.) |
Tensor with same shape as input tensors
Executes all operations in the computation graph using the CPU backend.
ggml_graph_compute(ctx, graph)
ctx |
GGML context |
graph |
Graph object created by ggml_build_forward_expand |
NULL (invisible); called for side effects.
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(1, 2, 3, 4, 5))
result <- ggml_relu(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_get_f32(result)
ggml_free(ctx)

ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
ggml_set_f32(a, 1:10)
ggml_set_f32(b, 11:20)
c <- ggml_add(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, c)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(c)
ggml_free(ctx)
Computes the graph using the context-based method. This is an alternative to the R function ggml_graph_compute(): internally it calls the C library's ggml_graph_plan() and ggml_graph_compute(), taking the required work buffer from the context's memory pool.
ggml_graph_compute_with_ctx(ctx, graph, n_threads = 0L)
ctx |
GGML context |
graph |
Graph object created by ggml_build_forward_expand |
n_threads |
Number of threads to use (0 for auto-detect, default: 0) |
No return value, called for side effects
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
ggml_set_f32(a, 1:10)
c <- ggml_relu(ctx, a)
graph <- ggml_build_forward_expand(ctx, c)
ggml_graph_compute_with_ctx(ctx, graph)
result <- ggml_get_f32(c)
ggml_free(ctx)
Exports the computation graph to a DOT file for visualization. The DOT file can be converted to an image using Graphviz tools.
ggml_graph_dump_dot(graph, leafs = NULL, filename)
graph |
Graph object |
leafs |
Optional graph with leaf tensors (NULL for none) |
filename |
Output filename (should end with .dot) |
No return value, called for side effects
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
b <- ggml_relu(ctx, a)
graph <- ggml_build_forward_expand(ctx, b)
ggml_graph_dump_dot(graph, NULL, tempfile(fileext = ".dot"))
ggml_free(ctx)
Finds a tensor in the computation graph by its name
ggml_graph_get_tensor(graph, name)
graph |
Graph object |
name |
Character string with tensor name |
Tensor pointer or NULL if not found
Returns the number of computation nodes in the graph
ggml_graph_n_nodes(graph)
graph |
Graph object |
Integer number of nodes
Gets a specific node (tensor) from the computation graph by index
ggml_graph_node(graph, i)
graph |
Graph object |
i |
Node index (0-based, negative indices count from end) |
Tensor pointer
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
b <- ggml_add(ctx, a, a)
graph <- ggml_build_forward_expand(ctx, b)
# Get the last node (output)
output <- ggml_graph_node(graph, -1)
ggml_free(ctx)
Returns the memory overhead required for a computation graph
ggml_graph_overhead()
Size in bytes
Prints debug information about the computation graph
ggml_graph_print(graph)
graph |
Graph object |
No return value, called for side effects
Resets the computation graph for a new backward pass. NOTE: This function requires the graph to have gradients allocated (used for training/backpropagation). For inference-only graphs, this function will cause an error.
ggml_graph_reset(graph)
graph |
Graph object with gradients allocated |
No return value, called for side effects
Creates a view of a portion of a computation graph, containing nodes from index i0 to i1 (exclusive). The view shares the underlying nodes but does not include leaf tensors or gradients.
ggml_graph_view(graph, i0, i1)
graph |
External pointer to computation graph |
i0 |
Start index (0-based, inclusive) |
i1 |
End index (exclusive) |
External pointer to graph view
Other graph:
ggml_op_can_inplace()
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
b <- ggml_relu(ctx, a)
graph <- ggml_build_forward_expand(ctx, b)
n_nodes <- ggml_graph_n_nodes(graph)
view <- ggml_graph_view(graph, 0, n_nodes)
ggml_free(ctx)
Creates a graph node for group normalization. The channel dimension (ne2) is split into n_groups groups, and each group is normalized over its ne0 * ne1 * (ne2 / n_groups) elements. Used in Stable Diffusion and other image generation models.
ggml_group_norm(ctx, a, n_groups, eps = 1e-05)
ctx |
GGML context |
a |
Input tensor |
n_groups |
Number of groups to divide channels into |
eps |
Epsilon for numerical stability (default 1e-5) |
Tensor representing the group norm operation
ctx <- ggml_init(16 * 1024 * 1024)
# 4 x 4 spatial, 4 channels on the third dimension (ne2), batch 1;
# 2 groups of 2 channels each
a <- ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 4, 4, 4, 1)
ggml_set_f32(a, rnorm(64))
result <- ggml_group_norm(ctx, a, n_groups = 2)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
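The normalization rule, mean and variance taken over ne0 * ne1 * (channels per group) values, can be sketched in plain R on an array with dimensions c(ne0, ne1, ne2). The sketch assumes channels live on ne2 (as the ne0*ne1 wording implies), that n_groups divides the channel count evenly, and uses the biased variance; it applies no learnable scale or shift. `group_norm_ref()` is a hypothetical name.

```r
group_norm_ref <- function(a, n_groups, eps = 1e-5) {
  d <- dim(a)                           # c(ne0, ne1, ne2); ne2 = channels
  stopifnot(length(d) == 3, d[3] %% n_groups == 0)
  per_group <- d[3] / n_groups
  out <- a
  for (g in seq_len(n_groups)) {
    ch <- (g - 1) * per_group + seq_len(per_group)   # channels in this group
    vals <- a[, , ch]
    mu <- mean(vals)
    v <- mean((vals - mu)^2)            # biased (population) variance
    out[, , ch] <- (vals - mu) / sqrt(v + eps)
  }
  out
}

set.seed(7)
a <- array(rnorm(4 * 4 * 4), dim = c(4, 4, 4))   # 4 x 4 spatial, 4 channels
r <- group_norm_ref(a, n_groups = 2)
```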
Creates a graph node for in-place group normalization.
ggml_group_norm_inplace(ctx, a, n_groups, eps = 1e-05)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
n_groups |
Number of groups |
eps |
Epsilon for numerical stability (default 1e-5) |
View of input tensor with group norm applied
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 8)
ggml_set_f32(a, rnorm(32))
result <- ggml_group_norm_inplace(ctx, a, n_groups = 2)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
Create a GRU Layer Object
ggml_gru( units, return_sequences = FALSE, activation = "tanh", recurrent_activation = "sigmoid", name = NULL, trainable = TRUE )
units |
Integer, number of hidden units. |
return_sequences |
Logical. If TRUE, return the output for every time step; otherwise return only the last output. |
activation |
Candidate activation (default "tanh"). |
recurrent_activation |
Gate activation (default "sigmoid"). |
name |
Optional character name. |
trainable |
Logical. If FALSE, the layer's weights are not updated during training. |
A ggml_layer object.
Creates a graph node for Hard Sigmoid activation. HardSigmoid(x) = ReLU6(x + 3) / 6 = min(max(0, x + 3), 6) / 6. A computationally efficient approximation of the sigmoid function.
ggml_hardsigmoid(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the Hard Sigmoid operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-4, -1, 0, 1, 4))
r <- ggml_hardsigmoid(ctx, a)
graph <- ggml_build_forward_expand(ctx, r)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(r) # [0, 0.333, 0.5, 0.667, 1]
ggml_free(ctx)
Creates a graph node for Hard Swish activation. HardSwish(x) = x * ReLU6(x + 3) / 6 = x * min(max(0, x + 3), 6) / 6. Used in MobileNetV3 and other efficient architectures.
ggml_hardswish(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the Hard Swish operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-4, -1, 0, 1, 4))
r <- ggml_hardswish(ctx, a)
graph <- ggml_build_forward_expand(ctx, r)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(r)
ggml_free(ctx)
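The example above does not list the expected values, so here is a base-R evaluation of the same formula on the same inputs. It is a reference sketch of the math, independent of ggmlR.

```r
# Hard Swish in base R: x * min(max(0, x + 3), 6) / 6
hard_swish <- function(x) x * pmin(pmax(x + 3, 0), 6) / 6

x <- c(-4, -1, 0, 1, 4)
hsw <- hard_swish(x)
print(round(hsw, 3))
```

Unlike Hard Sigmoid, the output is not bounded above (it equals x for x >= 3) and is exactly zero for x <= -3.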
Transforms image data into column format for efficient convolution. This is a low-level operation used internally by convolution implementations.
ggml_im2col( ctx, a, b, s0, s1, p0, p1, d0, d1, is_2D = TRUE, dst_type = GGML_TYPE_F16 )
ctx |
GGML context |
a |
Convolution kernel tensor |
b |
Input data tensor |
s0 |
Stride dimension 0 |
s1 |
Stride dimension 1 |
p0 |
Padding dimension 0 |
p1 |
Padding dimension 1 |
d0 |
Dilation dimension 0 |
d1 |
Dilation dimension 1 |
is_2D |
Whether this is a 2D operation (default TRUE) |
dst_type |
Output type (default GGML_TYPE_F16) |
Transformed tensor in column format
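To make the column transform concrete, here is a minimal one-dimensional, single-channel im2col in base R with stride, padding and dilation. The helper name im2col_1d and its column layout are illustrative assumptions; ggml's actual memory layout (and its F16 output type) differ. Once the receptive fields are laid out as columns, the whole convolution becomes a single matrix product with the flattened kernel.

```r
# Hypothetical 1-D im2col: each column holds one receptive field of the input.
# Single channel; s = stride, p = zero padding, d = dilation.
im2col_1d <- function(x, k, s = 1, p = 0, d = 1) {
  xp <- c(rep(0, p), x, rep(0, p))          # zero-pad both ends
  span <- d * (k - 1) + 1                   # extent of the dilated kernel
  n_out <- (length(xp) - span) %/% s + 1    # number of output positions
  cols <- matrix(0, nrow = k, ncol = n_out)
  for (j in seq_len(n_out)) {
    start <- (j - 1) * s + 1
    cols[, j] <- xp[start + (0:(k - 1)) * d]  # gather the k taps
  }
  cols
}

x <- c(1, 2, 3, 4, 5)
w <- c(1, 0, -1)                            # kernel
cols <- im2col_1d(x, k = length(w))
conv <- as.vector(w %*% cols)               # cross-correlation as one matmul
print(cols)
print(conv)
```

The price of this trick is memory: every input value is copied once per window it falls in, which is why im2col is usually an internal step rather than something called directly.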
Initialize GGML context
ggml_init(mem_size = 16 * 1024 * 1024, no_alloc = FALSE)
mem_size |
Memory size in bytes |
no_alloc |
If TRUE, don't allocate memory for tensors (default: FALSE) |
GGML context pointer
ctx <- ggml_init(1024 * 1024)
ggml_free(ctx)
Creates a context with automatically calculated size based on planned tensors.
ggml_init_auto(..., extra_mb = 10, type = GGML_TYPE_F32, no_alloc = FALSE)
... |
Named arguments with tensor dimensions (integer vectors) |
extra_mb |
Extra megabytes to add (default: 10) |
type |
Tensor type (default: GGML_TYPE_F32) |
no_alloc |
If TRUE, don't allocate memory for tensors (default: FALSE) |
GGML context
ctx <- ggml_init_auto(mat1 = c(1000L, 1000L), mat2 = c(1000L, 1000L))
ggml_free(ctx)
Creates a symbolic input node for the Functional API. The node records
only the shape of one sample (without batch dimension); actual
memory is allocated when ggml_compile() is called.
ggml_input(shape, name = NULL, dtype = "float32")
shape |
Integer vector describing the shape of a single sample.
For flat feature vectors use a scalar, e.g. 64L. |
name |
Optional character name for the input tensor. |
dtype |
Data type of the input, e.g. "float32" (default) or "int32" for token indices. |
A ggml_tensor_node object.
x <- ggml_input(shape = 64L)
x <- ggml_input(shape = c(28L, 28L, 1L), name = "image")
x <- ggml_input(shape = 10L, dtype = "int32")  # token indices
Check if GGML is available
ggml_is_available()
TRUE if GGML library is loaded
ggml_is_available()
Returns TRUE if tensor data is stored contiguously in memory
ggml_is_contiguous(tensor)
tensor |
Tensor pointer |
Logical
ctx <- ggml_init(1024 * 1024)
t <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
ggml_is_contiguous(t)
ggml_free(ctx)
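Contiguity is easiest to picture through ggml's per-dimension element counts (ne) and byte strides (nb). The base-R sketch below is a simplified version of that rule for a non-quantized type, assuming a 4-byte element (F32). The function name is hypothetical, and this is not how the R binding is implemented.

```r
# Hypothetical helper: a tensor is contiguous when the innermost stride equals
# the element size and each higher stride equals the previous stride times the
# previous extent. Dimensions of extent 1 are not checked (simplified rule,
# non-quantized types only).
is_contiguous_sketch <- function(ne, nb, type_size = 4) {
  expected <- type_size
  for (i in seq_along(ne)) {
    if (ne[i] != 1 && nb[i] != expected) return(FALSE)
    expected <- expected * ne[i]
  }
  TRUE
}

# A fresh 10 x 20 F32 tensor: ne = (10, 20, 1, 1), strides grow outward
fresh <- is_contiguous_sketch(ne = c(10, 20, 1, 1), nb = c(4, 40, 800, 800))
# The same data viewed transposed: extents and strides of dims 0 and 1 swapped
transposed <- is_contiguous_sketch(ne = c(20, 10, 1, 1), nb = c(40, 4, 800, 800))
print(c(fresh = fresh, transposed = transposed))
```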
Check if tensor is contiguous. Same as ggml_is_contiguous.
ggml_is_contiguous_0(tensor)
tensor |
Tensor pointer |
Logical indicating contiguity
Other tensor_layout:
ggml_are_same_stride(),
ggml_can_repeat(),
ggml_count_equal(),
ggml_is_contiguous_1(),
ggml_is_contiguous_2(),
ggml_is_contiguous_channels(),
ggml_is_contiguous_rows(),
ggml_is_contiguously_allocated()
Check if tensor is contiguous for dimensions >= 1. Allows non-contiguous first dimension.
ggml_is_contiguous_1(tensor)
tensor |
Tensor pointer |
Logical indicating contiguity for dims >= 1
Other tensor_layout:
ggml_are_same_stride(),
ggml_can_repeat(),
ggml_count_equal(),
ggml_is_contiguous_0(),
ggml_is_contiguous_2(),
ggml_is_contiguous_channels(),
ggml_is_contiguous_rows(),
ggml_is_contiguously_allocated()
Check if tensor is contiguous for dimensions >= 2. Allows non-contiguous first two dimensions.
ggml_is_contiguous_2(tensor)
tensor |
Tensor pointer |
Logical indicating contiguity for dims >= 2
Other tensor_layout:
ggml_are_same_stride(),
ggml_can_repeat(),
ggml_count_equal(),
ggml_is_contiguous_0(),
ggml_is_contiguous_1(),
ggml_is_contiguous_channels(),
ggml_is_contiguous_rows(),
ggml_is_contiguously_allocated()
Check if tensor has contiguous channels (important for CNN operations). Data for each channel should be stored contiguously.
ggml_is_contiguous_channels(tensor)
tensor |
Tensor pointer |
Logical indicating channel-wise contiguity
Other tensor_layout:
ggml_are_same_stride(),
ggml_can_repeat(),
ggml_count_equal(),
ggml_is_contiguous_0(),
ggml_is_contiguous_1(),
ggml_is_contiguous_2(),
ggml_is_contiguous_rows(),
ggml_is_contiguously_allocated()
Check if tensor has contiguous rows (important for matrix operations). Each row should be stored contiguously in memory.
ggml_is_contiguous_rows(tensor)
tensor |
Tensor pointer |
Logical indicating row-wise contiguity
Other tensor_layout:
ggml_are_same_stride(),
ggml_can_repeat(),
ggml_count_equal(),
ggml_is_contiguous_0(),
ggml_is_contiguous_1(),
ggml_is_contiguous_2(),
ggml_is_contiguous_channels(),
ggml_is_contiguously_allocated()
Check whether the tensor's elements occupy a single contiguous block of memory. Unlike the contiguous-layout checks, this tests the actual allocation: there must be no gaps, but permuted dimensions are allowed.
ggml_is_contiguously_allocated(tensor)
tensor |
Tensor pointer |
Logical indicating if data is contiguously allocated
Other tensor_layout:
ggml_are_same_stride(),
ggml_can_repeat(),
ggml_count_equal(),
ggml_is_contiguous_0(),
ggml_is_contiguous_1(),
ggml_is_contiguous_2(),
ggml_is_contiguous_channels(),
ggml_is_contiguous_rows()
Returns TRUE if tensor dimensions have been permuted
ggml_is_permuted(tensor)
tensor |
Tensor pointer |
Logical
ctx <- ggml_init(1024 * 1024)
t <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 10, 20)
ggml_is_permuted(t)
ggml_free(ctx)
Returns TRUE if the GGML type is a quantized format.
ggml_is_quantized(type)
type |
GGML type constant |
Logical indicating if type is quantized
Other type_system:
ggml_blck_size(),
ggml_ftype_to_ggml_type(),
ggml_type_name(),
ggml_type_sizef()
ggml_is_quantized(GGML_TYPE_F32)   # FALSE
ggml_is_quantized(GGML_TYPE_Q4_0)  # TRUE
Returns TRUE if tensor has been transposed
ggml_is_transposed(tensor)
tensor |
Tensor pointer |
Logical
ctx <- ggml_init(1024 * 1024)
t <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 10, 20)
ggml_is_transposed(t)
ggml_free(ctx)
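Both the transposed and permuted predicates can be read off the byte strides alone. As we understand ggml's definitions, a tensor counts as transposed when the stride of dimension 0 exceeds that of dimension 1, and as permuted when any stride exceeds the stride of the next-higher dimension. The sketch below restates that in base R with hypothetical helper names; treat it as an illustration rather than a specification.

```r
# Hypothetical stride-based predicates (nb = byte strides of dims 0..3)
is_transposed_sketch <- function(nb) nb[1] > nb[2]
is_permuted_sketch   <- function(nb) any(nb[-length(nb)] > nb[-1])

nb_fresh <- c(4, 40, 800, 800)   # new F32 tensor with ne = (10, 20, 1, 1)
nb_view  <- c(40, 4, 800, 800)   # transposed view of the same data

print(c(fresh_transposed = is_transposed_sketch(nb_fresh),
        fresh_permuted   = is_permuted_sketch(nb_fresh),
        view_transposed  = is_transposed_sketch(nb_view),
        view_permuted    = is_permuted_sketch(nb_view)))
```

A freshly created tensor has strictly non-decreasing strides, which is why the examples for ggml_is_permuted() and ggml_is_transposed() return FALSE.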
Creates a graph node for L2 normalization (unit norm). Normalizes vectors to unit length: x / ||x||_2. Used in RWKV v7 and embedding normalization.
ggml_l2_norm(ctx, a, eps = 1e-05)
ctx |
GGML context |
a |
Input tensor |
eps |
Epsilon for numerical stability (default 1e-5) |
Tensor representing the L2 norm operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(3, 0, 0, 4))  # Length = 5
result <- ggml_l2_norm(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # [0.6, 0, 0, 0.8] unit vector
ggml_free(ctx)
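The same computation in base R, for checking results by hand. As an assumption of this sketch, eps acts as a lower bound on the norm, so an all-zero vector maps to zeros instead of NaN. For several vectors, each column is normalized independently here.

```r
# L2 (unit-length) normalization; eps guards against division by zero
l2_normalize <- function(x, eps = 1e-5) x / max(sqrt(sum(x^2)), eps)

y <- l2_normalize(c(3, 0, 0, 4))   # norm is 5
print(y)                           # 0.6 0.0 0.0 0.8

# Normalize several vectors at once, one per column
m <- cbind(c(3, 4), c(1, 1), c(0, 0))
nm <- apply(m, 2, l2_normalize)
print(nm)
```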
Creates a graph node for in-place L2 normalization.
ggml_l2_norm_inplace(ctx, a, eps = 1e-05)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
eps |
Epsilon for numerical stability (default 1e-5) |
View of input tensor with L2 norm applied
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(3, 0, 0, 4))
result <- ggml_l2_norm_inplace(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
Adds two (or more) tensor nodes element-wise. All tensors must have the same shape. This is the functional equivalent of a residual / skip connection.
ggml_layer_add(tensors, name = NULL)
tensors |
A list of ggml_tensor_node objects, all with the same shape. |
name |
Optional character name for the layer. |
A new ggml_tensor_node representing the sum.
x <- ggml_input(shape = 64L)
a <- x |> ggml_layer_dense(64, activation = "relu")
b <- x |> ggml_layer_dense(64)
out <- ggml_layer_add(list(a, b))
Despite its name, this layer applies RMS normalization rather than batch
normalization: it RMS-normalizes the input, then scales by gamma and shifts
by beta (both learnable). It uses ggml_rms_norm, which supports a backward
pass for training.
ggml_layer_batch_norm(model, eps = 1e-05, name = NULL, trainable = TRUE)
model |
A ggml_sequential_model object |
eps |
Small constant for numerical stability (default 1e-5) |
name |
Optional character name for the layer. |
trainable |
Logical; whether the layer weights are updated during training. |
The model object with the batch_norm layer appended (invisibly).
model <- ggml_model_sequential() |>
  ggml_layer_dense(128, input_shape = 784) |>
  ggml_layer_batch_norm() |>
  ggml_layer_dense(10, activation = "softmax")
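The normalization is an RMS scaling followed by a learnable affine map. Below is a base-R sketch for individual samples. It assumes the standard RMSNorm definition (divide by sqrt(mean(x^2) + eps)) and shows gamma and beta at the values 1 and 0. Those initial values are an assumption about initialization, not something this page states.

```r
# RMS normalization followed by learnable scale (gamma) and shift (beta)
rms_norm_affine <- function(x, gamma = 1, beta = 0, eps = 1e-5) {
  rms <- sqrt(mean(x^2) + eps)   # root mean square of this sample's features
  gamma * (x / rms) + beta
}

x <- c(1, 2, 3, 4)               # one sample with 4 features
y <- rms_norm_affine(x)
print(round(y, 4))

# Unlike batch normalization, the mean is not subtracted and the statistic is
# computed per sample, so each row of a batch is normalized independently
batch <- rbind(x, 10 * x)
nb <- t(apply(batch, 1, rms_norm_affine))
print(round(nb, 4))
```

Because the statistic is per sample, rescaling an input row (the second row above) leaves its normalized output essentially unchanged. The output also does not depend on batch size or composition.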
Concatenates two or more tensor nodes along the specified axis.
ggml_layer_concatenate(tensors, axis = 0L, name = NULL)
tensors |
A list of ggml_tensor_node objects. |
axis |
Integer axis along which to concatenate (0-based, ggml convention).
Default 0L. |
name |
Optional character name for the layer. |
A new ggml_tensor_node representing the concatenated tensor.
x <- ggml_input(shape = 32L)
y <- ggml_input(shape = 32L)
out <- ggml_layer_concatenate(list(x, y), axis = 0L)
Create a Conv1D Layer Object
Add 1D Convolution Layer
ggml_layer_conv_1d( model, filters, kernel_size, activation = NULL, input_shape = NULL, strides = 1L, padding = "valid", name = NULL, trainable = TRUE )
model |
A ggml_sequential_model object |
filters |
Number of output filters |
kernel_size |
Integer kernel size |
activation |
Activation function name: "relu", "sigmoid", "tanh", "softmax", or NULL |
input_shape |
Input shape c(L, C) - required for first layer only (length, channels) |
strides |
Integer stride (default 1) |
padding |
"valid" (no padding) or "same" (preserve length) |
name |
Optional character name for the layer. |
trainable |
Logical; whether the layer weights are updated during training. |
A ggml_layer object.
The model object with the conv_1d layer appended (invisibly).
model <- ggml_model_sequential() |>
  ggml_layer_conv_1d(32, 3, activation = "relu", input_shape = c(100, 1))
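You can work out how padding and strides change the sequence length before building the model. The sketch below assumes the Keras conventions for "valid" and "same". This Keras-like API appears to follow them, but the page does not give the formula. conv_out_len is a hypothetical helper.

```r
# Output length of a 1-D convolution under Keras-style padding rules
conv_out_len <- function(L, kernel_size, strides = 1, padding = c("valid", "same")) {
  padding <- match.arg(padding)
  if (padding == "valid") {
    (L - kernel_size) %/% strides + 1    # only windows that fit entirely
  } else {
    ceiling(L / strides)                 # input zero-padded to keep L / strides
  }
}

conv_out_len(100, 3)                     # example above: 98 positions x 32 filters
conv_out_len(100, 3, padding = "same")
conv_out_len(100, 3, strides = 2)
```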
Create a Conv2D Layer Object
Add 2D Convolution Layer
ggml_layer_conv_2d( model, filters, kernel_size, activation = NULL, input_shape = NULL, strides = c(1L, 1L), padding = "valid", name = NULL, trainable = TRUE )
model |
A ggml_sequential_model object |
filters |
Number of output filters |
kernel_size |
Integer or vector of 2 integers for kernel height and width |
activation |
Activation function name: "relu", "sigmoid", "tanh", "softmax", or NULL |
input_shape |
Input shape c(H, W, C) - required for first layer only |
strides |
Integer or vector of 2 integers for stride |
padding |
"valid" (no padding) or "same" (preserve spatial dims) |
name |
Optional character name for the layer. |
trainable |
Logical; whether the layer weights are updated during training. |
A ggml_layer object.
The model object with the conv_2d layer appended (invisibly).
model <- ggml_model_sequential() |>
  ggml_layer_conv_2d(32, c(3, 3), activation = "relu", input_shape = c(28, 28, 1))
Add Dense (Fully Connected) Layer
ggml_layer_dense( model, units, activation = NULL, input_shape = NULL, name = NULL, trainable = TRUE )
model |
A ggml_sequential_model object |
units |
Number of output units |
activation |
Activation function name: "relu", "sigmoid", "tanh", "softmax", or NULL |
input_shape |
Integer or integer vector specifying the input shape (only needed for the first layer) |
name |
Optional character name for the layer. |
trainable |
Logical; whether the layer weights are updated during training. |
The model object with the dense layer appended (invisibly).
model <- ggml_model_sequential() |>
  ggml_layer_conv_2d(32, c(3, 3), activation = "relu", input_shape = c(28, 28, 1)) |>
  ggml_layer_flatten() |>
  ggml_layer_dense(128, activation = "relu")
Applies dropout regularization. By default (stochastic = FALSE), training
multiplies all activations by (1 - rate) (deterministic expected-value scaling).
During inference (training = FALSE), the layer is an identity (no change).
ggml_layer_dropout( model, rate, stochastic = FALSE, name = NULL, trainable = FALSE )
model |
A ggml_sequential_model object or a ggml_tensor_node. |
rate |
Dropout rate in |
stochastic |
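Logical. If TRUE, apply a random Bernoulli mask (regenerated once per epoch) instead of deterministic scaling (default FALSE). |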
name |
Optional layer name. |
trainable |
Ignored for dropout (no weights); kept for API consistency. |
The model with the dropout layer appended, or a new tensor node.
Keras implements inverted dropout: during training it applies a random
Bernoulli mask and scales surviving activations up by
1 / (1 - rate), so the expected value of each unit is preserved and
no scaling is needed at inference.
This implementation uses deterministic scaling (multiply by
(1 - rate) at training, identity at inference) — equivalent in
expected value but without stochastic noise. Consequences:
No random mask → the regularization signal is weaker (no co-adaptation breaking).
Activations at training are scaled down, not up — the magnitude seen by subsequent layers differs from Keras behaviour.
Results are fully deterministic and reproducible without setting a seed.
With stochastic = TRUE the Bernoulli mask is regenerated once
per epoch (not per batch), because ggml_opt_fit processes all
batches inside a single C call. This is weaker than per-batch dropout
but stronger than the deterministic variant.
model <- ggml_model_sequential() |>
  ggml_layer_dense(128, activation = "relu", input_shape = 784L) |>
  ggml_layer_dropout(0.5, stochastic = TRUE) |>
  ggml_layer_dense(10, activation = "softmax")
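The two scaling schemes described above differ visibly in the mean activation that reaches the next layer. The base-R simulation below compares the deterministic default with Keras-style inverted dropout at rate = 0.5. It only illustrates the arithmetic and does not call ggmlR.

```r
set.seed(1)
rate <- 0.5
a <- rep(2, 1e5)                               # constant activations of 2

# Deterministic variant: multiply by (1 - rate) while training
det_train <- a * (1 - rate)

# Keras-style inverted dropout: Bernoulli keep-mask, survivors scaled up
keep <- rbinom(length(a), size = 1, prob = 1 - rate)
inv_train <- a * keep / (1 - rate)

# At inference both variants pass activations through unchanged
means <- c(deterministic = mean(det_train), inverted = mean(inv_train), inference = mean(a))
print(round(means, 3))
```

Under the deterministic variant, the next layer sees activations half as large during training as at inference. This is the magnitude difference noted above. Inverted dropout keeps the training mean at the inference value, at the cost of per-unit noise.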
Looks up dense vectors for integer token indices. The input must be an
integer matrix of 0-based indices in [0, vocab_size - 1] (use
ggml_input(shape, dtype = "int32") in Functional mode).
ggml_layer_embedding(model, vocab_size, dim, name = NULL, trainable = TRUE)
model |
A ggml_sequential_model object or a ggml_tensor_node. |
vocab_size |
Number of distinct tokens (vocabulary size). |
dim |
Embedding dimension (vector length per token). |
name |
Optional layer name. |
trainable |
Logical; whether embedding weights are updated during training. |
The model with the embedding layer appended, or a new tensor node.
ggml stores tensors in column-major order, so the output shape is
[dim, seq_len] per sample (ggml convention) rather than
[seq_len, dim] as in Keras. When you call ggml_layer_flatten()
after embedding the result is the same flattened vector regardless of order,
but if you access raw output tensors be aware of this transposition.
Indices must be in [0, vocab_size - 1]. Out-of-range values cause
undefined behaviour inside the ggml kernel (no bounds check is performed at
the R level).
inp <- ggml_input(shape = 10L, dtype = "int32")
out <- inp |>
  ggml_layer_embedding(vocab_size = 1000L, dim = 32L) |>
  ggml_layer_flatten() |>
  ggml_layer_dense(10L, activation = "softmax")
model <- ggml_model(inputs = inp, outputs = out)
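An embedding is a column lookup into a weight matrix. The base-R sketch below assumes a hypothetical [dim, vocab_size] weight layout (one column per token) to mirror the [dim, seq_len] output described above. It shifts the 0-based indices by one for R's 1-based indexing and adds the bounds check that the R level does not perform. It also shows that flattening the [dim, seq_len] result column-major gives the same vector as flattening a Keras-style [seq_len, dim] matrix row by row.

```r
vocab_size <- 5L
dim_emb <- 3L
# Hypothetical weights: column j + 1 holds the vector for token j
E <- matrix(seq_len(vocab_size * dim_emb), nrow = dim_emb, ncol = vocab_size)

tokens <- c(0L, 4L, 2L)                              # 0-based token indices
stopifnot(all(tokens >= 0L & tokens < vocab_size))   # explicit bounds check

out <- E[, tokens + 1L, drop = FALSE]   # [dim, seq_len], ggml-style
keras_style <- t(out)                   # [seq_len, dim], Keras-style

flat_ggml  <- as.vector(out)            # column-major flatten
flat_keras <- as.vector(t(keras_style)) # row-major flatten of the Keras layout
print(flat_ggml)
```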
Flattens the spatial dimensions into a single vector per sample.
ggml_layer_flatten(model, name = NULL, trainable = TRUE)
model |
A ggml_sequential_model object |
name |
Optional character name for the layer. |
trainable |
Logical; reserved for API consistency (no weights). |
The model object with the flatten layer appended (invisibly).
model <- ggml_model_sequential() |>
  ggml_layer_conv_2d(32, c(3, 3), activation = "relu", input_shape = c(28, 28, 1)) |>
  ggml_layer_flatten()
Reduces a [H, W, C] feature map to [C] by averaging all
spatial positions per channel. Equivalent to Keras
GlobalAveragePooling2D().
ggml_layer_global_average_pooling_2d(model, name = NULL, trainable = TRUE)
model |
A ggml_sequential_model object or a ggml_tensor_node. |
name |
Optional character name for the layer. |
trainable |
Logical; reserved for API consistency (no weights). |
Updated model or a new ggml_tensor_node.
model <- ggml_model_sequential() |>
  ggml_layer_conv_2d(32, c(3, 3), activation = "relu", input_shape = c(28, 28, 1)) |>
  ggml_layer_global_average_pooling_2d() |>
  ggml_layer_dense(10, activation = "softmax")
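Global pooling collapses each channel of an [H, W, C] feature map to a single number. In base R this is one apply() over the channel dimension. Swapping mean for max gives the GlobalMaxPooling2D reduction. The random feature map below is illustrative data, not package output.

```r
set.seed(42)
fmap <- array(runif(4 * 4 * 3), dim = c(4, 4, 3))   # H = 4, W = 4, C = 3

gap <- apply(fmap, 3, mean)   # global average pooling: one value per channel
gmp <- apply(fmap, 3, max)    # global max pooling, for comparison
print(rbind(average = gap, max = gmp))
```

Because the output length equals the channel count regardless of H and W, a dense layer placed after global pooling has far fewer weights than one placed after ggml_layer_flatten().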
Reduces a [H, W, C] feature map to [C] by taking the maximum
value per channel across all spatial positions. Equivalent to Keras
GlobalMaxPooling2D().
ggml_layer_global_max_pooling_2d(model, name = NULL, trainable = TRUE)
model |
A ggml_sequential_model object or a ggml_tensor_node. |
name |
Optional character name for the layer. |
trainable |
Logical; reserved for API consistency (no weights). |
Updated model or a new ggml_tensor_node.
model <- ggml_model_sequential() |>
  ggml_layer_conv_2d(32, c(3, 3), activation = "relu", input_shape = c(28, 28, 1)) |>
  ggml_layer_global_max_pooling_2d() |>
  ggml_layer_dense(10, activation = "softmax")
Gated Recurrent Unit (GRU) layer. Implemented as an unrolled computation graph (backpropagation through time, BPTT).
ggml_layer_gru( model, units, return_sequences = FALSE, activation = "tanh", recurrent_activation = "sigmoid", input_shape = NULL, name = NULL, trainable = TRUE )
model |
A ggml_sequential_model object or a ggml_tensor_node. |
units |
Integer, number of hidden units. |
return_sequences |
Logical; return all hidden states or only the last. |
activation |
Activation for the candidate hidden state (default "tanh"). |
recurrent_activation |
Activation for the z/r gates (default "sigmoid"). |
input_shape |
Input shape c(seq_len, input_size). |
name |
Optional layer name. |
trainable |
Logical; whether the layer weights are updated during training. |
Updated model or a new ggml_tensor_node.
W_zh [input_size, 2*units]: input kernel for the z and r gates.
U_zh [units, 2*units]: recurrent kernel for z and r.
b_zh [2*units]: bias for z and r.
W_n [input_size, units]: input kernel for the candidate.
U_n [units, units]: recurrent kernel for the candidate.
b_n [units]: bias for the candidate.
model <- ggml_model_sequential() |>
  ggml_layer_gru(64L, input_shape = c(10L, 32L)) |>
  ggml_layer_dense(10L, activation = "softmax")
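The unrolled graph applies the same cell once per time step. The base-R sketch below shows one common GRU formulation that uses the weight shapes listed above. Two details are assumptions, because the page does not specify them: which columns of W_zh belong to z versus r, and whether the reset gate is applied before U_n. The random weights are illustrative only.

```r
sigmoid <- function(x) 1 / (1 + exp(-x))

# One GRU step for a single sample. Assumed split: the first `units` columns
# of W_zh / U_zh / b_zh drive the update gate z, the next `units` the reset gate r.
gru_step <- function(x, h, W_zh, U_zh, b_zh, W_n, U_n, b_n) {
  units <- length(h)
  zr <- sigmoid(x %*% W_zh + h %*% U_zh + b_zh)   # 1 x (2 * units)
  z <- zr[seq_len(units)]
  r <- zr[units + seq_len(units)]
  n <- tanh(x %*% W_n + (r * h) %*% U_n + b_n)    # candidate hidden state
  as.vector((1 - z) * n + z * h)                  # blend old state and candidate
}

set.seed(0)
input_size <- 4; units <- 3; n_steps <- 5
rand <- function(nr, nc) matrix(rnorm(nr * nc, sd = 0.1), nr, nc)
W_zh <- rand(input_size, 2 * units); U_zh <- rand(units, 2 * units); b_zh <- rep(0, 2 * units)
W_n  <- rand(input_size, units);     U_n  <- rand(units, units);     b_n  <- rep(0, units)

X <- rand(n_steps, input_size)      # one sample: [seq_len, input_size]
h <- rep(0, units)
states <- matrix(0, n_steps, units)
for (step in seq_len(n_steps)) {    # the unrolled loop
  h <- gru_step(X[step, ], h, W_zh, U_zh, b_zh, W_n, U_n, b_n)
  states[step, ] <- h
}
# return_sequences = FALSE corresponds to `h`; TRUE to all rows of `states`
print(round(states, 4))
```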
Long Short-Term Memory recurrent layer. Implemented as an unrolled computation graph (BPTT) so that ggml's automatic differentiation works without any C extensions.
ggml_layer_lstm( model, units, return_sequences = FALSE, activation = "tanh", recurrent_activation = "sigmoid", input_shape = NULL, name = NULL, trainable = TRUE )
model |
A ggml_sequential_model object or a ggml_tensor_node. |
units |
Integer, number of hidden units. |
return_sequences |
Logical; if TRUE, return the hidden state at every time step, otherwise only the last. |
activation |
Activation for the cell gate (default "tanh"). |
recurrent_activation |
Activation for the recurrent gates (default "sigmoid"). |
input_shape |
Input shape c(seq_len, input_size). |
name |
Optional layer name. |
trainable |
Logical; whether the layer weights are updated during training. |
Updated model or a new ggml_tensor_node.
W_gates [input_size, 4*units]: input kernel for all four gates (i, f, g, o), concatenated.
U_gates [units, 4*units]: recurrent kernel.
b_gates [4*units]: bias.
Input: [seq_len, input_size] per sample (rows are time steps, columns are
features), or a 3-D array [N, seq_len, input_size]. In the Functional API the
input node shape should be c(seq_len, input_size).
Output (Sequential): [units] per sample when
return_sequences = FALSE (default), or c(seq_len, units)
when return_sequences = TRUE.
model <- ggml_model_sequential() |>
  ggml_layer_lstm(64L, input_shape = c(10L, 32L)) |>
  ggml_layer_dense(10L, activation = "softmax")
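Here is a single LSTM time step with the concatenated gate layout described above, written in base R. The (i, f, g, o) block order follows this page. Using tanh both for the cell gate g and for squashing the cell state follows the usual LSTM definition, but it is an assumption about this implementation. The weights are random and illustrative.

```r
sigmoid <- function(x) 1 / (1 + exp(-x))

# One LSTM step for a single sample; columns of W/U/b are blocks (i, f, g, o)
lstm_step <- function(x, h, cell, W, U, b) {
  units <- length(h)
  a <- as.vector(x %*% W + h %*% U + b)      # 4 * units pre-activations
  block <- function(k) (k - 1) * units + seq_len(units)
  i <- sigmoid(a[block(1)])                  # input gate
  f <- sigmoid(a[block(2)])                  # forget gate
  g <- tanh(a[block(3)])                     # cell gate (candidate)
  o <- sigmoid(a[block(4)])                  # output gate
  cell <- f * cell + i * g
  list(h = o * tanh(cell), cell = cell)
}

set.seed(1)
input_size <- 4; units <- 3; n_steps <- 6
W_gates <- matrix(rnorm(input_size * 4 * units, sd = 0.1), input_size, 4 * units)
U_gates <- matrix(rnorm(units * 4 * units, sd = 0.1), units, 4 * units)
b_gates <- rep(0, 4 * units)

X <- matrix(rnorm(n_steps * input_size), n_steps, input_size)  # [seq_len, input_size]
state <- list(h = rep(0, units), cell = rep(0, units))
for (step in seq_len(n_steps)) {
  state <- lstm_step(X[step, ], state$h, state$cell, W_gates, U_gates, b_gates)
}
print(round(state$h, 4))   # the [units] output when return_sequences = FALSE
```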
Add 2D Max Pooling Layer
ggml_layer_max_pooling_2d( model, pool_size = c(2L, 2L), strides = NULL, name = NULL, trainable = TRUE )
model |
A ggml_sequential_model object |
pool_size |
Integer or vector of 2 integers for pool height and width |
strides |
Integer or vector of 2 integers (defaults to pool_size) |
name |
Optional character name for the layer. |
trainable |
Logical; reserved for API consistency (no weights). |
The model object with the max pooling layer appended (invisibly).
model <- ggml_model_sequential() |>
  ggml_layer_conv_2d(32, c(3, 3), activation = "relu", input_shape = c(28, 28, 1)) |>
  ggml_layer_max_pooling_2d(c(2, 2))
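Max pooling keeps the largest value in each window. The base-R sketch below handles one channel and uses a hypothetical helper name. It assumes no padding, so windows that do not fit are dropped. Applied per channel, it would turn the 26 x 26 x 32 output of the 3 x 3 "valid" convolution in the example into 13 x 13 x 32, again assuming Keras-style "valid" output sizes.

```r
# 2-D max pooling for a single channel (no padding); strides default to pool_size
max_pool_2d <- function(x, pool_size = c(2, 2), strides = pool_size) {
  out_h <- (nrow(x) - pool_size[1]) %/% strides[1] + 1
  out_w <- (ncol(x) - pool_size[2]) %/% strides[2] + 1
  out <- matrix(NA_real_, out_h, out_w)
  for (i in seq_len(out_h)) {
    for (j in seq_len(out_w)) {
      rows <- (i - 1) * strides[1] + seq_len(pool_size[1])
      cols <- (j - 1) * strides[2] + seq_len(pool_size[2])
      out[i, j] <- max(x[rows, cols])   # largest value in this window
    }
  }
  out
}

x <- matrix(1:16, nrow = 4)   # filled column by column
print(max_pool_2d(x))
print(dim(max_pool_2d(matrix(0, 26, 26))))
```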
Creates a graph node for Leaky ReLU activation. LeakyReLU(x) = x if x > 0, else negative_slope * x. Unlike standard ReLU, Leaky ReLU allows a small gradient for negative values.
ggml_leaky_relu(ctx, a, negative_slope = 0.01, inplace = FALSE)
ctx |
GGML context |
a |
Input tensor |
negative_slope |
Slope for negative values (default: 0.01) |
inplace |
If TRUE, operation is performed in-place (default: FALSE) |
Tensor representing the Leaky ReLU operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-2, -1, 0, 1, 2))
r <- ggml_leaky_relu(ctx, a, negative_slope = 0.1)
graph <- ggml_build_forward_expand(ctx, r)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(r)  # [-0.2, -0.1, 0, 1, 2]
ggml_free(ctx)
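Below is a base-R reference for the forward formula and its derivative, handy for checking results such as the example above. The nonzero derivative for negative inputs is the "small gradient" mentioned in the description. Assigning x = 0 to the negative branch is a choice made in this sketch, not taken from ggml.

```r
leaky_relu <- function(x, negative_slope = 0.01) ifelse(x > 0, x, negative_slope * x)
# Derivative: 1 for x > 0, negative_slope otherwise
leaky_relu_grad <- function(x, negative_slope = 0.01) ifelse(x > 0, 1, negative_slope)

x <- c(-2, -1, 0, 1, 2)
print(leaky_relu(x, 0.1))
print(leaky_relu_grad(x, 0.1))
```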
Restores a model previously saved with ggml_save_model(). The
returned model is compiled and ready for ggml_predict() /
ggml_evaluate(). Call ggml_fit() again to continue training.
ggml_load_model(path, backend = "auto")
path |
File path to an RDS file written by ggml_save_model(). |
backend |
Backend selection: "auto" (default), "cpu", or "vulkan". |
A compiled model object.
model <- ggml_model_sequential() |>
  ggml_layer_dense(16L, activation = "relu", input_shape = 4L) |>
  ggml_layer_dense(2L, activation = "softmax")
model <- ggml_compile(model, optimizer = "adam", loss = "categorical_crossentropy")
x <- matrix(runif(64 * 4), 64, 4)
y <- matrix(c(rep(c(1, 0), 32), rep(c(0, 1), 32)), 64, 2)
model <- ggml_fit(model, x, y, epochs = 1L, batch_size = 32L, verbose = 0L)
tmp <- tempfile(fileext = ".rds")
ggml_save_model(model, tmp)
model2 <- ggml_load_model(tmp)
Loads previously saved weights into a compiled model. The model architecture must match the saved weights (same layer types, sizes, and shapes).
ggml_load_weights(model, path)
model |
A compiled ggml_sequential_model (same architecture as saved) |
path |
File path to load weights from |
The model with loaded weights.
Creates a graph node for element-wise natural logarithm: log(x)
ggml_log(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the log operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 3)
ggml_set_f32(a, c(1, exp(1), exp(2)))
result <- ggml_log(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # [0, 1, 2]
ggml_free(ctx)
Creates a graph node for in-place element-wise natural logarithm.
ggml_log_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with log values
Check if R Logging is Enabled
ggml_log_is_r_enabled()
Logical indicating if R-compatible logging is active
Other logging:
ggml_abort_is_r_enabled(),
ggml_log_set_default(),
ggml_log_set_r(),
ggml_set_abort_callback_default(),
ggml_set_abort_callback_r()
Restores GGML to default logging behavior (stderr output).
ggml_log_set_default()
NULL invisibly
Other logging:
ggml_abort_is_r_enabled(),
ggml_log_is_r_enabled(),
ggml_log_set_r(),
ggml_set_abort_callback_default(),
ggml_set_abort_callback_r()
Redirects GGML log messages to R's output streams:
- INFO/DEBUG messages go to stdout (via Rprintf)
- WARN/ERROR messages go to stderr (via REprintf)
ggml_log_set_r()
NULL invisibly
Other logging:
ggml_abort_is_r_enabled(),
ggml_log_is_r_enabled(),
ggml_log_set_default(),
ggml_set_abort_callback_default(),
ggml_set_abort_callback_r()
ggml_log_set_r()  # GGML messages now appear in the R console
Create an LSTM Layer Object
ggml_lstm( units, return_sequences = FALSE, activation = "tanh", recurrent_activation = "sigmoid", name = NULL, trainable = TRUE )
units |
Integer, number of hidden units. |
return_sequences |
Logical; if TRUE, return the hidden state at every time step, otherwise only the last. |
activation |
Cell gate activation (default "tanh"). |
recurrent_activation |
Recurrent gate activation (default "sigmoid"). |
name |
Optional character name. |
trainable |
Logical; whether the layer weights are updated during training. |
A ggml_layer object.
Serializes a trained sequential or functional ggmlR model into a self-describing raw container suitable for transport between R sessions or parallel workers (e.g. for mlr3 parallel resampling and tuning).
ggml_marshal_model(model)
model |
A compiled ggml_sequential_model or ggml_functional_model. |
The container wraps the bytes produced by ggml_save_model
together with a format tag, schema version, package/R versions, a SHA-256
integrity checksum, and a timestamp. Autograd modules are not
supported in this version and cause the function to signal an error; the
mlr3 learners catch this and fall back to marshaled = FALSE.
A named list with class "ggmlR_marshaled" containing the
serialized payload and metadata. Pass it to
ggml_unmarshal_model to reconstruct the model.
ggml_unmarshal_model, ggml_save_model
Creates a graph node that computes the mean of all elements.
ggml_mean(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Scalar tensor with the mean
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(2, 4, 6, 8, 10))
result <- ggml_mean(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # 6
ggml_free(ctx)
Assembles a ggml_functional_model from symbolic input and output
nodes produced by ggml_input() and ggml_layer_*() calls.
ggml_model(inputs, outputs)
inputs |
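Symbolic input node(s) created with ggml_input(). |
outputs |
Symbolic output node(s) produced by ggml_layer_*() calls. |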
A ggml_functional_model object.
x <- ggml_input(shape = 64L)
out <- x |> ggml_layer_dense(10, activation = "softmax")
model <- ggml_model(inputs = x, outputs = out)
Reports the backend the model was actually compiled for, so that a silent
fallback to CPU under backend = "auto" (when no GPU is available) can be
detected. Works on a raw sequential/functional model or on a fitted parsnip
engine object.
ggml_model_backend(object, verbose = FALSE)
object |
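A compiled/fitted sequential or functional model, or a fitted parsnip engine object. |
verbose |
If TRUE, return a list with details instead of a single backend name (default FALSE). |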
If verbose = FALSE, a length-1 character: the backend in use
("vulkan" or "cpu"). If verbose = TRUE, a list with:
requested (what was asked: "auto"/"cpu"/"vulkan"),
used ("vulkan"/"cpu"), device (GPU device
description, or "cpu") and fallback (logical: TRUE when
a non-CPU backend was requested but CPU was used instead).
Creates an empty sequential model to which layers can be added with the
native pipe operator (|>).
ggml_model_sequential()
A ggml_sequential_model object
## Not run:
model <- ggml_model_sequential() |>
  ggml_layer_conv_2d(32, c(3,3), activation = "relu", input_shape = c(28, 28, 1)) |>
  ggml_layer_max_pooling_2d(c(2,2)) |>
  ggml_layer_flatten() |>
  ggml_layer_dense(128, activation = "relu") |>
  ggml_layer_dense(10, activation = "softmax")
## End(Not run)
Creates a graph node for element-wise multiplication.
ggml_mul(ctx, a, b)
ctx |
GGML context |
a |
First tensor |
b |
Second tensor (same shape as a) |
Tensor representing the multiplication operation
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(1, 2, 3, 4, 5))
ggml_set_f32(b, c(2, 2, 2, 2, 2))
result <- ggml_mul(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_get_f32(result)
ggml_free(ctx)
Creates a graph node for in-place element-wise multiplication. The result is written into tensor a, avoiding a new memory allocation.
ggml_mul_inplace(ctx, a, b)
ctx |
GGML context |
a |
First tensor (will be modified in-place) |
b |
Second tensor (same shape as a) |
View of tensor a with the multiplication result
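Creates a graph node for matrix multiplication, the core operation in LLM inference. Both tensors must have the same size along dimension 0 (ne0 = k): for a with shape [k, m] and b with shape [k, n], the result has shape [m, n], and element [i, j] is the dot product of the i-th length-k vector of a with the j-th length-k vector of b. Viewing a and b as R matrices with ne0 rows, this is t(a) %*% b.
ggml_mul_mat(ctx, a, b)
ctx |
GGML context |
a |
First matrix tensor, shape [k, m] |
b |
Second matrix tensor, shape [k, n] |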
Tensor representing the matrix multiplication
ctx <- ggml_init(1024 * 1024)
A <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3)
B <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2)
ggml_set_f32(A, 1:12)
ggml_set_f32(B, 1:8)
C <- ggml_mul_mat(ctx, A, B)
graph <- ggml_build_forward_expand(ctx, C)
ggml_graph_compute(ctx, graph)
ggml_get_f32(C)
ggml_free(ctx)
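A quick way to check the shape convention is to reproduce the example above in base R. The sketch assumes ggml_set_f32() fills tensor memory in order with ne0 varying fastest, so a tensor with ne = [4, 3] holds the same values as matrix(1:12, nrow = 4); under that assumption, ggml_get_f32(C) in the example should return the same six values computed here.

```r
# Base-R reference for the ggml_mul_mat() example (assumes column-major fill:
# ne0 = number of R rows, ne1 = number of R columns).
A <- matrix(1:12, nrow = 4, ncol = 3)   # ne = [4, 3]
B <- matrix(1:8,  nrow = 4, ncol = 2)   # ne = [4, 2]

# Shared dimension 0 (k = 4); the result has ne = [3, 2].
C <- crossprod(A, B)                    # same as t(A) %*% B
dim(C)          # 3 2
as.vector(C)    # 30 70 110 70 174 278
```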
Indirect matrix multiplication for Mixture of Experts architectures. Selects expert weights based on indices and performs batched matmul.
ggml_mul_mat_id(ctx, as, b, ids)
ctx |
GGML context |
as |
Stacked expert weight matrices [n_embd, n_ff, n_experts] |
b |
Input tensor |
ids |
Expert selection indices tensor (I32) |
Output tensor after expert-selected matrix multiplication
ctx <- ggml_init(64 * 1024 * 1024)
# 4 experts, each with 8x16 weights (small for example)
experts <- ggml_new_tensor_3d(ctx, GGML_TYPE_F32, 8, 16, 4)
ggml_set_f32(experts, rnorm(8 * 16 * 4))
input <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 2)
ggml_set_f32(input, rnorm(16))
# Route input column 0 to expert 0 and input column 1 to expert 2
ids <- ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 2)
ggml_set_i32(ids, c(0L, 2L))
output <- ggml_mul_mat_id(ctx, experts, input, ids)
graph <- ggml_build_forward_expand(ctx, output)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
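The routing is hard to picture from the shapes alone. The base-R sketch below assumes upstream ggml's semantics for this operation: ids has one row per selected expert slot and one column per token, slot s of a token multiplies input column (s mod ncol(b)) of that token by expert ids[s, t] (0-based), and the output has shape [n_ff, n_slots, n_tokens]. Under these assumptions the example above routes input column 0 to expert 0 and input column 1 to expert 2. mul_mat_id_ref() is a hypothetical helper, not part of ggmlR.

```r
# Hypothetical base-R reference for ggml_mul_mat_id(), assuming upstream ggml
# routing rules. Shapes follow ggml order: as [n_embd, n_ff, n_experts],
# b [n_embd, n_b, n_tokens], ids [n_slots, n_tokens] (0-based expert ids).
mul_mat_id_ref <- function(as, b, ids) {
  n_ff <- dim(as)[2]
  out <- array(0, c(n_ff, nrow(ids), ncol(ids)))
  for (tok in seq_len(ncol(ids))) {
    for (s in seq_len(nrow(ids))) {
      e  <- ids[s, tok] + 1L               # 0-based expert id -> R index
      bi <- (s - 1L) %% dim(b)[2] + 1L     # input column used by this slot
      out[, s, tok] <- crossprod(as[, , e], b[, bi, tok])
    }
  }
  out
}

set.seed(1)
experts <- array(rnorm(8 * 16 * 4), c(8, 16, 4))
input   <- array(rnorm(16), c(8, 2, 1))    # the example's [8, 2] input, 1 token
ids     <- matrix(c(0L, 2L), nrow = 2)     # slot 1 -> expert 0, slot 2 -> expert 2
out <- mul_mat_id_ref(experts, input, ids)
dim(out)   # 16 2 1
```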
Returns the number of dimensions of a tensor.
ggml_n_dims(tensor)
tensor |
Tensor pointer |
Integer number of dimensions (1-4)
ctx <- ggml_init(1024 * 1024)
t <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 10, 20)
ggml_n_dims(t)
ggml_free(ctx)
Get Number of Bytes
ggml_nbytes(tensor)
tensor |
Tensor |
Integer number of bytes
ctx <- ggml_init(1024 * 1024)
tensor <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
ggml_nbytes(tensor)
ggml_free(ctx)
Creates a graph node for element-wise negation: -x.
ggml_neg(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the negation operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(1, -2, 3, -4))
result <- ggml_neg(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # [-1, 2, -3, 4]
ggml_free(ctx)
Creates a graph node for in-place element-wise negation: -x.
ggml_neg_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with negated values
Get Number of Elements
ggml_nelements(tensor)
tensor |
Tensor |
Integer number of elements
ctx <- ggml_init(1024 * 1024)
tensor <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 10, 20)
ggml_nelements(tensor)
ggml_free(ctx)
Creates a 1-element tensor containing a single float value. Useful for scalar operations, learning rates, and other scalar floats.
ggml_new_f32(ctx, value)
ctx |
GGML context |
value |
Numeric value |
Tensor pointer (1-element F32 tensor)
ctx <- ggml_init(1024 * 1024)
scalar <- ggml_new_f32(ctx, 3.14)
ggml_get_f32(scalar)
ggml_free(ctx)
Creates a 1-element tensor containing a single integer value. Useful for indices, counters, and other scalar integer operations.
ggml_new_i32(ctx, value)
ctx |
GGML context |
value |
Integer value |
Tensor pointer (1-element I32 tensor)
ctx <- ggml_init(1024 * 1024)
scalar <- ggml_new_i32(ctx, 42)
ggml_get_i32(scalar)
ggml_free(ctx)
Generic tensor constructor for creating tensors with 1-4 dimensions. This is more flexible than the ggml_new_tensor_Nd functions.
ggml_new_tensor(ctx, type = GGML_TYPE_F32, n_dims, ne)
ctx |
GGML context |
type |
Data type (GGML_TYPE_F32, etc.) |
n_dims |
Number of dimensions (1-4) |
ne |
Numeric vector of dimension sizes |
Tensor pointer
ctx <- ggml_init(1024 * 1024)
t <- ggml_new_tensor(ctx, GGML_TYPE_F32, 3, c(10, 20, 30))
ggml_nelements(t)
ggml_free(ctx)
Create 1D Tensor
ggml_new_tensor_1d(ctx, type = GGML_TYPE_F32, ne0)
ctx |
GGML context |
type |
Data type |
ne0 |
Number of elements (size of dimension 0) |
Tensor pointer
ctx <- ggml_init(1024 * 1024)
tensor <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
ggml_nelements(tensor)
ggml_free(ctx)
Create 2D Tensor
ggml_new_tensor_2d(ctx, type = GGML_TYPE_F32, ne0, ne1)
ctx |
GGML context |
type |
Data type |
ne0 |
Rows |
ne1 |
Columns |
Tensor pointer
ctx <- ggml_init(1024 * 1024)
tensor <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 10, 20)
ggml_nelements(tensor)
ggml_free(ctx)
Create 3D Tensor
ggml_new_tensor_3d(ctx, type = GGML_TYPE_F32, ne0, ne1, ne2)
ctx |
GGML context |
type |
Data type (default GGML_TYPE_F32) |
ne0 |
Size of dimension 0 |
ne1 |
Size of dimension 1 |
ne2 |
Size of dimension 2 |
Tensor pointer
ctx <- ggml_init(1024 * 1024)
t <- ggml_new_tensor_3d(ctx, GGML_TYPE_F32, 10, 20, 30)
ggml_nelements(t)
ggml_free(ctx)
Create 4D Tensor
ggml_new_tensor_4d(ctx, type = GGML_TYPE_F32, ne0, ne1, ne2, ne3)
ctx |
GGML context |
type |
Data type (default GGML_TYPE_F32) |
ne0 |
Size of dimension 0 |
ne1 |
Size of dimension 1 |
ne2 |
Size of dimension 2 |
ne3 |
Size of dimension 3 |
Tensor pointer
ctx <- ggml_init(1024 * 1024)
t <- ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 8, 8, 3, 2)
ggml_nelements(t)
ggml_free(ctx)
Creates a graph node for layer normalization. Each row (the values along dimension 0) is normalized to zero mean and unit variance; no learnable scale or shift is applied.
ggml_norm(ctx, a, eps = 1e-05)
ctx |
GGML context |
a |
Input tensor |
eps |
Epsilon value for numerical stability (default: 1e-5) |
Tensor representing the layer normalization operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(1, 2, 3, 4))
result <- ggml_norm(ctx, a, eps = 1e-5)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_get_f32(result)  # Normalized values
ggml_free(ctx)
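For checking results by hand, the base-R sketch below applies the same normalization to one row. It assumes ggml_norm() follows upstream ggml: the variance is the population variance (divided by n, not n - 1) and eps is added inside the square root. layer_norm_ref() is a hypothetical helper, not part of ggmlR; for a multi-row tensor viewed as an R matrix with ne0 rows, it is applied to each column.

```r
# Hypothetical reference for ggml_norm() on one row of values.
layer_norm_ref <- function(x, eps = 1e-5) {
  mu <- mean(x)
  v  <- mean((x - mu)^2)          # population variance
  (x - mu) / sqrt(v + eps)
}

y <- layer_norm_ref(c(1, 2, 3, 4))
round(y, 4)   # -1.3416 -0.4472 0.4472 1.3416

# Multi-row case: each column of m plays the role of one ggml row.
m <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), nrow = 4)
apply(m, 2, layer_norm_ref)
```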
Creates a graph node for in-place layer normalization. Returns a view of the input tensor.
ggml_norm_inplace(ctx, a, eps = 1e-05)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
eps |
Epsilon value for numerical stability (default: 1e-5) |
View of input tensor with layer normalization applied
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(1, 2, 3, 4))
result <- ggml_norm_inplace(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
Returns the number of rows in a tensor (product of all dimensions except ne[0]).
ggml_nrows(tensor)
tensor |
Tensor pointer |
Number of rows
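The size queries in this section are related by simple arithmetic on the shape vector ne. The base-R sketch below assumes a contiguous, non-quantized F32 tensor (4 bytes per element); quantized types are stored in blocks and need a different byte count. tensor_shape_info() is a hypothetical helper, not part of ggmlR.

```r
# Hypothetical helper: shape bookkeeping for a contiguous F32 tensor.
tensor_shape_info <- function(ne, bytes_per_elem = 4) {
  nelements <- prod(ne)
  list(
    nelements = nelements,                 # cf. ggml_nelements()
    nrows     = nelements / ne[1],         # product of all dims except ne[0]; cf. ggml_nrows()
    nbytes    = nelements * bytes_per_elem # cf. ggml_nbytes()
  )
}

str(tensor_shape_info(c(10, 20, 30)))  # 6000 elements, 600 rows, 24000 bytes
tensor_shape_info(10)$nbytes           # 40 for a 10-element F32 tensor
```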
Returns whether a GGML operation can reuse memory from its source tensors. This is useful for memory optimization.
ggml_op_can_inplace(op)
op |
Operation code (integer) |
Logical indicating if operation supports in-place execution
Other graph:
ggml_graph_view()
# Operation code 2 is ADD in ggml's op enum (0 = NONE, 1 = DUP)
can_inplace <- ggml_op_can_inplace(2L)
Returns a description of the operation that produces a tensor.
ggml_op_desc(tensor)
tensor |
Tensor pointer |
Character string describing the operation
Other op_info:
ggml_get_unary_op(),
ggml_op_name(),
ggml_op_symbol(),
ggml_unary_op_name()
Returns the string name of a GGML operation.
ggml_op_name(op)
op |
GGML operation constant |
Character string with operation name
Other op_info:
ggml_get_unary_op(),
ggml_op_desc(),
ggml_op_symbol(),
ggml_unary_op_name()
Returns the mathematical symbol for a GGML operation.
ggml_op_symbol(op)
op |
GGML operation constant |
Character string with operation symbol
Other op_info:
ggml_get_unary_op(),
ggml_op_desc(),
ggml_op_name(),
ggml_unary_op_name()
Allocates the forward graph, or the forward and backward graphs when backward = TRUE. Must be called before ggml_opt_eval.
ggml_opt_alloc(opt_ctx, backward = TRUE)
opt_ctx |
External pointer to optimizer context |
backward |
Whether to allocate backward graph (for training) |
NULL invisibly
Other optimization:
ggml_fit_opt(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Get optimizer type from context
ggml_opt_context_optimizer_type(opt_ctx)
opt_ctx |
External pointer to optimizer context |
Integer optimizer type constant
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Returns the underlying data tensor with shape [ne_datapoint, ndata].
ggml_opt_dataset_data(dataset)
dataset |
External pointer to dataset |
External pointer to data tensor
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Releases memory associated with a dataset.
ggml_opt_dataset_free(dataset)
dataset |
External pointer to dataset |
NULL invisibly
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Copies a batch of data and labels to the provided tensors.
ggml_opt_dataset_get_batch(dataset, data_batch, labels_batch = NULL, ibatch)
dataset |
External pointer to dataset |
data_batch |
Tensor to receive data batch |
labels_batch |
Tensor to receive labels batch (can be NULL) |
ibatch |
Batch index |
NULL invisibly
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
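To make the batch indexing concrete, the base-R sketch below mimics which datapoints a batch copy selects when ndata_shard = 1. It assumes, as in upstream ggml, that datapoints are the columns of the [ne_datapoint, ndata] data tensor, that ibatch is 0-based, and that the batch size is implied by the size of data_batch; perm stands in for the order produced by ggml_opt_dataset_shuffle(). get_batch_ref() is a hypothetical helper, not part of ggmlR.

```r
# Hypothetical sketch: which datapoints land in batch `ibatch` (0-based).
# Columns of `data` are datapoints; `perm` is the current (shuffled) order.
get_batch_ref <- function(data, batch_size, ibatch, perm = seq_len(ncol(data))) {
  cols <- perm[ibatch * batch_size + seq_len(batch_size)]
  data[, cols, drop = FALSE]
}

data <- matrix(1:20, nrow = 2)                     # 10 datapoints, 2 values each
get_batch_ref(data, batch_size = 4, ibatch = 1)    # datapoints 5 to 8

set.seed(42)
perm <- sample(ncol(data))                         # stand-in for a shuffle
get_batch_ref(data, batch_size = 4, ibatch = 0, perm = perm)
```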
Creates a dataset for training with specified data and label types.
ggml_opt_dataset_init( type_data, type_label, ne_datapoint, ne_label, ndata, ndata_shard = 1 )
type_data |
GGML type for data tensor (e.g., GGML_TYPE_F32) |
type_label |
GGML type for label tensor (e.g., GGML_TYPE_F32) |
ne_datapoint |
Number of elements per datapoint |
ne_label |
Number of elements per label (0 if no labels) |
ndata |
Total number of datapoints |
ndata_shard |
Shard size for shuffling (1 is fine for most cases) |
External pointer to dataset
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Returns the underlying labels tensor with shape [ne_label, ndata].
ggml_opt_dataset_labels(dataset)
dataset |
External pointer to dataset |
External pointer to labels tensor, or NULL if no labels
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Get number of datapoints in dataset
ggml_opt_dataset_ndata(dataset)
dataset |
External pointer to dataset |
Number of datapoints
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Shuffles the dataset using the RNG from the optimizer context.
ggml_opt_dataset_shuffle(opt_ctx, dataset, idata = -1)
opt_ctx |
External pointer to optimizer context |
dataset |
External pointer to dataset |
idata |
Number of datapoints to shuffle (-1 for all) |
NULL invisibly
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Returns the (lazily allocated) per-datapoint weights tensor with shape
[1, ndata]. The first call allocates it; fill it via
ggml_backend_tensor_set_data(). Used together with
ggml_opt_loss_type_weighted_mse.
ggml_opt_dataset_weights(dataset)
dataset |
External pointer to dataset |
External pointer to weights tensor
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Returns a list with default optimization parameters.
ggml_opt_default_params(sched, loss_type)
sched |
Backend scheduler |
loss_type |
Loss type constant |
List with loss_type, build_type, opt_period, optimizer
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Performs training on the front portion of the dataset and evaluation on the back portion. This gives more control than ggml_opt_fit.
ggml_opt_epoch( opt_ctx, dataset, result_train = NULL, result_eval = NULL, idata_split, callback_train = TRUE, callback_eval = TRUE )
opt_ctx |
External pointer to optimizer context |
dataset |
External pointer to dataset |
result_train |
Result object to accumulate training stats (or NULL) |
result_eval |
Result object to accumulate evaluation stats (or NULL) |
idata_split |
Data index at which to split training and evaluation |
callback_train |
Callback for training: TRUE for progress bar, FALSE for none, or a function(train, ibatch, ibatch_max, t_start_us, result) |
callback_eval |
Callback for evaluation: TRUE for progress bar, FALSE for none, or a function(train, ibatch, ibatch_max, t_start_us, result) |
NULL invisibly
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
    # Requires a full optimizer setup; see ggml_opt_fit() for a simpler API.
    if (FALSE) {
      # opt_ctx and dataset are assumed to exist already
      result_train <- ggml_opt_result_init()
      result_eval  <- ggml_opt_result_init()
      # Train on datapoints before index 900, evaluate on the rest
      ggml_opt_epoch(opt_ctx, dataset, result_train, result_eval,
                     idata_split = 900, callback_train = TRUE)
      ggml_opt_result_free(result_train)
      ggml_opt_result_free(result_eval)
    }
Runs the forward pass, adds the batch statistics to result if one is given, and runs the backward pass if the context was allocated for it (see ggml_opt_alloc()).
ggml_opt_eval(opt_ctx, result = NULL)

| Argument | Description |
|---|---|
| opt_ctx | External pointer to optimizer context |
| result | External pointer to result object (optional) |

NULL invisibly
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
High-level function that trains a model on a dataset for nepoch epochs, optionally holding out a fraction of the data for validation. This is the recommended way to train models.
ggml_opt_fit(sched, ctx_compute, inputs, outputs, dataset, loss_type = ggml_opt_loss_type_mse(), optimizer = ggml_opt_optimizer_type_adamw(), nepoch = 1, nbatch_logical = 32, val_split = 0, silent = FALSE)

| Argument | Description |
|---|---|
| sched | Backend scheduler |
| ctx_compute | Compute context (for temporary tensors) |
| inputs | Input tensor with shape [ne_datapoint, batch_size] |
| outputs | Output tensor with shape [ne_label, batch_size] |
| dataset | Dataset created with ggml_opt_dataset_init() |
| loss_type | Loss type (default: MSE) |
| optimizer | Optimizer type (default: AdamW) |
| nepoch | Number of epochs |
| nbatch_logical | Logical batch size, i.e. datapoints per optimizer step; if it exceeds the physical batch size (batch_size of inputs), gradients are accumulated over nbatch_logical / batch_size physical batches |
| val_split | Fraction of the data held out for validation, from 0 (no validation) up to but excluding 1 |
| silent | Whether to suppress progress output |

NULL invisibly
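Internally, ggml_opt_fit() has to turn nbatch_logical, the physical batch size and val_split into a batch plan. The helper below is a hypothetical sketch of that bookkeeping, not part of ggmlr. It assumes, as we understand upstream ggml to do, that the physical batch size is the second dimension of inputs, that nbatch_logical must be a multiple of it, that the number of datapoints must be a multiple of nbatch_logical, and that the training portion is rounded down to whole logical batches.

```r
# Hypothetical helper (not part of ggmlr): derive a batch plan for fitting.
fit_plan <- function(ndata, nbatch_logical, nbatch_physical, val_split = 0) {
  stopifnot(ndata %% nbatch_logical == 0,
            nbatch_logical %% nbatch_physical == 0,
            val_split >= 0, val_split < 1)
  # Physical batches accumulated per optimizer step
  opt_period <- nbatch_logical %/% nbatch_physical
  nbatches_logical <- ndata %/% nbatch_logical
  # Training portion rounded down to whole logical batches (in physical batches)
  ibatch_split <- floor((1 - val_split) * nbatches_logical) * opt_period
  list(opt_period  = opt_period,
       train_steps = ibatch_split %/% opt_period,     # optimizer steps per epoch
       idata_split = ibatch_split * nbatch_physical)  # training datapoints; the rest validate
}

plan <- fit_plan(ndata = 1024, nbatch_logical = 32, nbatch_physical = 8,
                 val_split = 0.1)
```

With these numbers each optimizer step accumulates 4 physical batches, and 28 of the 32 logical batches are used for training.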
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
    # Full training requires building a computation graph;
    # see the package vignettes for complete examples.
    if (FALSE) {
      cpu <- ggml_backend_cpu_init()
      sched <- ggml_backend_sched_new(list(cpu))
      dataset <- ggml_opt_dataset_init(GGML_TYPE_F32, GGML_TYPE_F32, 10, 1, 1000)
      # ... build model graph with ctx_compute, inputs, outputs ...
      ggml_opt_fit(sched, ctx_compute, inputs, outputs, dataset,
                   nepoch = 10, val_split = 0.1)
      ggml_opt_dataset_free(dataset)
      ggml_backend_sched_free(sched)
      ggml_backend_free(cpu)
    }
Releases memory associated with an optimizer context.
ggml_opt_free(opt_ctx)

| Argument | Description |
|---|---|
| opt_ctx | External pointer to optimizer context |

NULL invisibly
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Get the current learning rates from the LR userdata returned by ggml_opt_init_for_fit()
ggml_opt_get_lr(lr_ud)

| Argument | Description |
|---|---|
| lr_ud | LR userdata pointer (from 'ggml_opt_init_for_fit()$lr_ud') |

Named numeric vector with 'adamw' and 'sgd' LR values
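The LR userdata is meant for schedules that change the learning rate between epochs. The following is a base-R sketch of one such schedule, a step decay that halves the rate every 10 epochs. The function step_decay() and its defaults are hypothetical. The ggmlr calls appear only in comments, because the exact arguments of ggml_opt_set_lr() are not documented here.

```r
# Hypothetical step-decay schedule (not part of ggmlr): multiply the starting
# learning rate lr0 by `drop` once every `every` epochs.
step_decay <- function(epoch, lr0 = 1e-3, drop = 0.5, every = 10) {
  lr0 * drop^floor((epoch - 1) / every)
}

lrs <- vapply(1:30, step_decay, numeric(1))

# In a real loop, with fit <- ggml_opt_init_for_fit(...), one would read the
# current values via ggml_opt_get_lr(fit$lr_ud) (named "adamw" and "sgd")
# and pass step_decay(epoch) to ggml_opt_set_lr() between epochs; see that
# function's help page for its exact arguments.
```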
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Returns the gradient accumulator tensor for a node from the forward graph.
ggml_opt_grad_acc(opt_ctx, node)

| Argument | Description |
|---|---|
| opt_ctx | External pointer to optimizer context |
| node | External pointer to tensor node |

External pointer to gradient accumulator tensor, or NULL if not found
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Creates a new optimizer context for training.
ggml_opt_init(sched, loss_type, optimizer = ggml_opt_optimizer_type_adamw(), opt_period = 1L, ctx_compute = NULL, inputs = NULL, outputs = NULL)

| Argument | Description |
|---|---|
| sched | Backend scheduler |
| loss_type | Loss type (use ggml_opt_loss_type_* functions) |
| optimizer | Optimizer type (use ggml_opt_optimizer_type_* functions) |
| opt_period | Number of batches over which gradients are accumulated before each optimizer step (1 = step after every batch) |
| ctx_compute | Compute context for static graph mode (or NULL) |
| inputs | Input tensor for static graph mode (or NULL) |
| outputs | Output tensor for static graph mode (or NULL) |

External pointer to optimizer context
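Setting opt_period greater than 1 accumulates gradients over several batches before a single optimizer step. The base-R sketch below uses no ggmlr calls, and its linear model and data are made up for illustration. It checks the property that makes this work for a mean-reduced loss: averaging the gradients of opt_period equal-sized batches gives the gradient of the combined batch. Whether ggml averages or sums the accumulated gradients internally is not stated here, so treat the averaging step as an assumption.

```r
set.seed(1)
n_phys <- 8                          # physical batch size
opt_period <- 4                      # batches accumulated per optimizer step
n_logical <- n_phys * opt_period     # effective (logical) batch size
X <- matrix(rnorm(n_logical * 3), ncol = 3)   # one row per datapoint
y <- rnorm(n_logical)
w <- c(0.5, -1, 2)                   # current weights of a linear model

# Gradient of the mean squared error of X %*% w against y
mse_grad <- function(X, y, w) {
  as.vector(2 * t(X) %*% (X %*% w - y) / nrow(X))
}

full_grad <- mse_grad(X, y, w)       # gradient over the whole logical batch

acc <- numeric(length(w))            # gradient accumulator
for (k in seq_len(opt_period)) {
  rows <- ((k - 1) * n_phys + 1):(k * n_phys)
  acc <- acc + mse_grad(X[rows, , drop = FALSE], y[rows], w)
}
acc_grad <- acc / opt_period         # average of the accumulated gradients
```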
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
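Creates an optimizer context (with the same arguments as ggml_opt_init()) together with a learning-rate userdata pointer, and returns both as a list with elements 'opt_ctx' and 'lr_ud'. Use 'ggml_opt_set_lr()' to update the LR between epochs; the optimizer state (e.g. momentum) is preserved across epochs.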
ggml_opt_init_for_fit(sched, loss_type, optimizer = ggml_opt_optimizer_type_adamw(), opt_period = 1L, ctx_compute = NULL, inputs = NULL, outputs = NULL)

| Argument | Description |
|---|---|
| sched | Backend scheduler |
| loss_type | Loss type constant |
| optimizer | Optimizer type constant |
| opt_period | Gradient accumulation period |
| ctx_compute | Compute context (for static graphs) |
| inputs | Input tensor (for static graphs) |
| outputs | Output tensor (for static graphs) |

List with elements 'opt_ctx' and 'lr_ud'
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Get inputs tensor from optimizer context
ggml_opt_inputs(opt_ctx)

| Argument | Description |
|---|---|
| opt_ctx | External pointer to optimizer context |

External pointer to inputs tensor
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Get labels tensor from optimizer context
ggml_opt_labels(opt_ctx)

| Argument | Description |
|---|---|
| opt_ctx | External pointer to optimizer context |

External pointer to labels tensor
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Get loss tensor from optimizer context
ggml_opt_loss(opt_ctx)

| Argument | Description |
|---|---|
| opt_ctx | External pointer to optimizer context |

External pointer to loss tensor
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Returns the constant for cross entropy loss type. Use for classification tasks.
ggml_opt_loss_type_cross_entropy()
Integer constant for cross entropy loss
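Cross-entropy compares predicted class scores with one-hot labels. The following base-R sketch computes the textbook softmax cross-entropy. It assumes that the outputs are unnormalised logits laid out as [n_classes, batch_size] (one column per datapoint, matching the [ne_label, batch_size] shape used by ggml_opt_fit()) and that the loss is averaged over datapoints. ggml's exact normalisation is not documented here and may differ.

```r
# Column-wise log-softmax: each column holds the logits for one datapoint.
log_softmax_cols <- function(z) {
  z <- sweep(z, 2, apply(z, 2, max))   # subtract column max for stability
  sweep(z, 2, log(colSums(exp(z))))    # z - log(sum(exp(z))) per column
}

# Textbook softmax cross-entropy averaged over datapoints (columns).
cross_entropy <- function(logits, labels) {
  stopifnot(all(dim(logits) == dim(labels)))
  -mean(colSums(labels * log_softmax_cols(logits)))
}

logits <- matrix(0, nrow = 3, ncol = 2)   # 3 classes, 2 datapoints
labels <- diag(3)[, 1:2]                  # one-hot: classes 1 and 2
loss <- cross_entropy(logits, labels)     # uniform prediction gives log(3)
```

Working with log-probabilities directly, rather than taking log() of softmax output, avoids log(0) when a probability underflows.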
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Returns the constant for the mean loss type, a custom loss that reduces the outputs to their mean value.
ggml_opt_loss_type_mean()
Integer constant for mean loss
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Returns the constant for MSE loss type. Use for regression tasks.
ggml_opt_loss_type_mse()
Integer constant for MSE loss
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Returns the constant for the sum loss type, a custom loss that reduces the outputs to their sum.
ggml_opt_loss_type_sum()
Integer constant for sum loss
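With the mean and sum loss types, the outputs are reduced directly, so the model graph presumably computes per-element losses itself. The choice of reduction also scales the gradient: a sum grows with the number of elements, while a mean does not. The following base-R sketch uses a toy one-parameter model and no ggmlr calls to show this factor-of-n difference. This is why a learning rate tuned for one reduction usually needs rescaling for the other.

```r
# Toy model: prediction w * x, per-element squared error (w * x - y)^2.
# A custom-loss graph would emit these per-element values as its outputs.
x <- rep(1, 64)
y <- rep(2, 64)
w <- 0

# Gradient of the reduced loss with respect to w
grad_mean <- mean(2 * x * (w * x - y))   # mean loss type
grad_sum  <- sum(2 * x * (w * x - y))    # sum loss type
```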
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Returns the constant for per-datapoint weighted MSE loss type. Computes
sum(w * (pred - y)^2) / nelements, where w is a per-sample
weight supplied via ggml_opt_dataset_weights.
ggml_opt_loss_type_weighted_mse()
Integer constant for weighted MSE loss
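The formula above translates directly into base R. In this sketch, pred and y are [ne_label, batch_size] matrices (one column per datapoint), and w holds one weight per datapoint. Taking nelements to be the number of elements of pred is an assumption. The function weighted_mse() is illustrative and is not part of ggmlr.

```r
weighted_mse <- function(pred, y, w) {
  stopifnot(all(dim(pred) == dim(y)), length(w) == ncol(pred))
  sq <- (pred - y)^2                          # squared error per element
  sum(sweep(sq, 2, w, "*")) / length(pred)    # weight each column, divide by nelements
}

pred <- matrix(c(1, 2, 3, 4), nrow = 2)   # 2 labels x 2 datapoints
y    <- matrix(0, nrow = 2, ncol = 2)
weighted_mse(pred, y, w = c(1, 1))        # 7.5, equal weights give plain MSE
weighted_mse(pred, y, w = c(2, 0))        # 2.5, second datapoint ignored
```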
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Get the tensor holding the number of correct predictions
ggml_opt_ncorrect(opt_ctx)

| Argument | Description |
|---|---|
| opt_ctx | External pointer to optimizer context |

External pointer to ncorrect tensor
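For classification, the number of correct predictions is typically computed as follows: a datapoint counts as correct when the argmax of its output column equals the argmax of its one-hot label column. The following base-R sketch implements that rule. That ggml uses exactly this rule is an assumption, and count_correct() is a hypothetical name.

```r
count_correct <- function(outputs, labels) {
  pred_class <- apply(outputs, 2, which.max)   # predicted class per datapoint
  true_class <- apply(labels, 2, which.max)    # labelled class per datapoint
  sum(pred_class == true_class)
}

outputs <- matrix(c(2, 1, 0,
                    0, 3, 1,
                    1, 0, 5,
                    4, 0, 0), nrow = 3)   # 3 classes x 4 datapoints
labels <- diag(3)[, c(1, 2, 2, 1)]        # one-hot labels
ncorrect <- count_correct(outputs, labels)
accuracy <- ncorrect / ncol(outputs)
```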
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Get optimizer name
ggml_opt_optimizer_name(optimizer_type)

| Argument | Description |
|---|---|
| optimizer_type | Integer optimizer type constant |

Character string with optimizer name
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Returns the constant for the AdamW optimizer (Adam with decoupled weight decay), which is recommended for most tasks.
ggml_opt_optimizer_type_adamw()
Integer constant for AdamW optimizer
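To show what this optimizer does per parameter, the following base-R sketch implements one textbook AdamW step. It is not ggmlr's kernel. The hyperparameter defaults are the commonly quoted ones, and ggml's defaults may differ. The function adamw_step() and its state list are hypothetical.

```r
# Textbook AdamW step. `state` holds the step count t and the first/second
# moment estimates m and v.
adamw_step <- function(theta, grad, state, lr = 1e-3, beta1 = 0.9,
                       beta2 = 0.999, eps = 1e-8, wd = 0) {
  state$t <- state$t + 1
  state$m <- beta1 * state$m + (1 - beta1) * grad
  state$v <- beta2 * state$v + (1 - beta2) * grad^2
  m_hat <- state$m / (1 - beta1^state$t)   # bias-corrected first moment
  v_hat <- state$v / (1 - beta2^state$t)   # bias-corrected second moment
  # Decoupled weight decay shrinks theta directly, separately from the Adam step
  theta <- theta * (1 - lr * wd) - lr * m_hat / (sqrt(v_hat) + eps)
  list(theta = theta, state = state)
}

state <- list(t = 0, m = 0, v = 0)
out <- adamw_step(theta = 1, grad = 0.5, state = state, lr = 0.1)
```

On the first step, the bias correction makes the update roughly lr in magnitude, regardless of the gradient's scale. This is one reason AdamW tends to need less learning-rate tuning than plain SGD.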
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Returns the constant for the SGD (stochastic gradient descent) optimizer. It is simpler than AdamW but may need more learning-rate tuning.
ggml_opt_optimizer_type_sgd()
Integer constant for SGD optimizer
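For comparison with AdamW, here is a base-R sketch of a plain SGD update applied to a one-dimensional quadratic. The optional decoupled weight-decay term is included for symmetry with the AdamW sketch. Whether ggmlr's SGD applies weight decay this way is an assumption, and sgd_step() is a hypothetical name.

```r
# Plain SGD update with optional decoupled weight decay `wd`.
sgd_step <- function(theta, grad, lr = 0.1, wd = 0) {
  theta * (1 - lr * wd) - lr * grad
}

# Minimise f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta <- 0
for (i in 1:100) theta <- sgd_step(theta, 2 * (theta - 3), lr = 0.1)
```

Here the error shrinks by a factor of 0.8 per step. A learning rate above 1 would make the iteration diverge, which illustrates the tuning sensitivity mentioned above.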
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Get outputs tensor from optimizer context
ggml_opt_outputs(opt_ctx)

| Argument | Description |
|---|---|
| opt_ctx | External pointer to optimizer context |

External pointer to outputs tensor
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Get predictions tensor from optimizer context
ggml_opt_pred(opt_ctx)

| Argument | Description |
|---|---|
| opt_ctx | External pointer to optimizer context |

External pointer to predictions tensor
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Must be called before ggml_opt_alloc() when static graphs are not used. It sets up the optimizer context with the computation graph and the input/output tensors.
ggml_opt_prepare_alloc(opt_ctx, ctx_compute, graph, inputs, outputs)

| Argument | Description |
|---|---|
| opt_ctx | External pointer to optimizer context |
| ctx_compute | Compute context for temporary tensors |
| graph | Computation graph (from ggml_build_forward_expand()) |
| inputs | Input tensor |
| outputs | Output tensor |

NULL invisibly
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Resets gradients to zero, initializes loss, and optionally resets optimizer state.
ggml_opt_reset(opt_ctx, optimizer = FALSE)
opt_ctx |
External pointer to optimizer context |
optimizer |
Whether to also reset optimizer state (momentum, etc.) |
NULL invisibly
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Get accuracy from result
ggml_opt_result_accuracy(result)
result |
External pointer to result object |
Named numeric vector with 'accuracy' and 'uncertainty'
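To help interpret the two returned values, the base-R sketch below computes accuracy as the fraction of correct predictions. The uncertainty formula, the binomial standard error sqrt(p (1 - p) / (n - 1)), is an assumption based on the upstream ggml optimizer source rather than on this package's documentation, and the helper name opt_accuracy_ref() is hypothetical.

```r
# Hypothetical reference helper. The uncertainty formula (binomial standard
# error) is an assumption, not taken from this package's documentation.
opt_accuracy_ref <- function(ncorrect, ndata) {
  acc <- ncorrect / ndata
  unc <- if (ndata >= 2) sqrt(acc * (1 - acc) / (ndata - 1)) else NA_real_
  c(accuracy = acc, uncertainty = unc)
}

opt_accuracy_ref(ncorrect = 90, ndata = 100)
```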
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Free optimization result
ggml_opt_result_free(result)
result |
External pointer to result object |
NULL invisibly
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Creates a new result object to accumulate training statistics.
ggml_opt_result_init()
External pointer to result object
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Get loss from result
ggml_opt_result_loss(result)
result |
External pointer to result object |
Named numeric vector with 'loss' and 'uncertainty'
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Get number of datapoints from result
ggml_opt_result_ndata(result)
result |
External pointer to result object |
Number of datapoints processed
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Returns the predictions as an integer vector. The length equals the number of datapoints processed.
ggml_opt_result_pred(result)
result |
External pointer to result object |
Integer vector of predictions
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_reset(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Reset optimization result
ggml_opt_result_reset(result)
result |
External pointer to result object |
NULL invisibly
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_set_lr(),
ggml_opt_static_graphs()
Updates the learning rate (LR) used by subsequent optimizer steps. It can be called between epochs to implement LR scheduling.
ggml_opt_set_lr(lr_ud, adamw_lr = NA, sgd_lr = NA)
lr_ud |
LR userdata pointer (from 'ggml_opt_init_for_fit()$lr_ud') |
adamw_lr |
New AdamW learning rate (NA to keep current) |
sgd_lr |
New SGD learning rate (NA to keep current) |
NULL invisibly
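As an illustration of LR scheduling between epochs, the sketch below precomputes a cosine-annealing schedule in base R. Cosine annealing is only one common choice, and the values lr_max = 1e-3 and lr_min = 1e-5 are arbitrary. The commented call shows where ggml_opt_set_lr() would be used; fit_state is a hypothetical name for the list returned by ggml_opt_init_for_fit().

```r
# Cosine-annealing schedule (example choice): the LR decays from lr_max at the
# first epoch to lr_min at the last one.
cosine_lr <- function(epoch, n_epochs, lr_max = 1e-3, lr_min = 1e-5) {
  progress <- if (n_epochs > 1) (epoch - 1) / (n_epochs - 1) else 0
  lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * progress))
}

n_epochs <- 10
lrs <- vapply(seq_len(n_epochs), cosine_lr, numeric(1), n_epochs = n_epochs)

# Inside a training loop, one would then call for example:
# ggml_opt_set_lr(fit_state$lr_ud, adamw_lr = lrs[epoch])
# (fit_state is hypothetical: the list returned by ggml_opt_init_for_fit())
```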
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_static_graphs()
Check if using static graphs
ggml_opt_static_graphs(opt_ctx)
opt_ctx |
External pointer to optimizer context |
Logical indicating if graphs are statically allocated
Other optimization:
ggml_fit_opt(),
ggml_opt_alloc(),
ggml_opt_context_optimizer_type(),
ggml_opt_dataset_data(),
ggml_opt_dataset_free(),
ggml_opt_dataset_get_batch(),
ggml_opt_dataset_init(),
ggml_opt_dataset_labels(),
ggml_opt_dataset_ndata(),
ggml_opt_dataset_shuffle(),
ggml_opt_dataset_weights(),
ggml_opt_default_params(),
ggml_opt_epoch(),
ggml_opt_eval(),
ggml_opt_fit(),
ggml_opt_free(),
ggml_opt_get_lr(),
ggml_opt_grad_acc(),
ggml_opt_init(),
ggml_opt_init_for_fit(),
ggml_opt_inputs(),
ggml_opt_labels(),
ggml_opt_loss(),
ggml_opt_loss_type_cross_entropy(),
ggml_opt_loss_type_mean(),
ggml_opt_loss_type_mse(),
ggml_opt_loss_type_sum(),
ggml_opt_loss_type_weighted_mse(),
ggml_opt_ncorrect(),
ggml_opt_optimizer_name(),
ggml_opt_optimizer_type_adamw(),
ggml_opt_optimizer_type_sgd(),
ggml_opt_outputs(),
ggml_opt_pred(),
ggml_opt_prepare_alloc(),
ggml_opt_reset(),
ggml_opt_result_accuracy(),
ggml_opt_result_free(),
ggml_opt_result_init(),
ggml_opt_result_loss(),
ggml_opt_result_ndata(),
ggml_opt_result_pred(),
ggml_opt_result_reset(),
ggml_opt_set_lr()
Computes the outer product of two vectors, C = a b^T. For vectors a[m] and b[n], it produces a matrix C[m, n].
ggml_out_prod(ctx, a, b)
ctx |
GGML context |
a |
First vector tensor |
b |
Second vector tensor |
Matrix tensor representing the outer product
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 3)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(1, 2, 3))
ggml_set_f32(b, c(1, 2, 3, 4))
result <- ggml_out_prod(ctx, a, b)  # Result: 3x4 matrix
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
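The expected values from the example above can be cross-checked with base R's outer(). This sketch assumes that ggml's layout, where dimension 0 varies fastest, corresponds to an R matrix with ne0 rows stored column-major.

```r
a <- c(1, 2, 3)
b <- c(1, 2, 3, 4)
C <- outer(a, b)     # C[i, j] = a[i] * b[j], a 3 x 4 matrix
dim(C)               # 3 4
as.vector(C)[1:4]    # 1 2 3 2: column-major, the index into a varies fastest
```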
Pads tensor dimensions with zeros on the right side. Useful for aligning tensor sizes in attention operations.
ggml_pad(ctx, a, p0 = 0L, p1 = 0L, p2 = 0L, p3 = 0L)
ctx |
GGML context |
a |
Input tensor to pad |
p0 |
Padding for dimension 0 (default 0) |
p1 |
Padding for dimension 1 (default 0) |
p2 |
Padding for dimension 2 (default 0) |
p3 |
Padding for dimension 3 (default 0) |
Padded tensor with shape [ne0+p0, ne1+p1, ne2+p2, ne3+p3]
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 5, 3)
ggml_set_f32(a, 1:15)
# Pad to 8x4
b <- ggml_pad(ctx, a, 3, 1)  # Add 3 zeros to dim0, 1 to dim1
graph <- ggml_build_forward_expand(ctx, b)
ggml_graph_compute(ctx, graph)
# Result shape: [8, 4]
ggml_free(ctx)
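A base-R analogue of right-side zero padding for the 2D case in the example above is sketched below. It assumes that ggml dimensions 0 and 1 map to R rows and columns. pad_right() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: append p0 zero rows and p1 zero columns.
pad_right <- function(m, p0 = 0L, p1 = 0L) {
  out <- matrix(0, nrow = nrow(m) + p0, ncol = ncol(m) + p1)
  out[seq_len(nrow(m)), seq_len(ncol(m))] <- m  # original values stay top-left
  out
}

a <- matrix(1:15, nrow = 5, ncol = 3)
b <- pad_right(a, 3, 1)
dim(b)  # 8 4
```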
Pads the first dimension of a tensor using reflection of its values.
ggml_pad_reflect_1d(ctx, a, p0, p1)
ctx |
GGML context |
a |
Input tensor |
p0 |
Left padding |
p1 |
Right padding |
Padded tensor
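A base-R sketch of reflection padding for a single vector follows. It assumes the common "reflect" convention, where the edge element is not repeated (as in PyTorch's ReflectionPad1d), so both p0 and p1 must be smaller than the input length. reflect_pad_1d() is a hypothetical name.

```r
# Hypothetical reference: mirror values around the edges, excluding the edge itself.
reflect_pad_1d <- function(x, p0, p1) {
  n <- length(x)
  stopifnot(p0 < n, p1 < n)
  left  <- if (p0 > 0) x[(p0 + 1):2] else x[0]        # x[p0+1], ..., x[2]
  right <- if (p1 > 0) x[(n - 1):(n - p1)] else x[0]  # x[n-1], ..., x[n-p1]
  c(left, x, right)
}

reflect_pad_1d(1:5, 2, 2)  # 3 2 1 2 3 4 5 4 3
```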
Permutes the tensor dimensions: argument axisk gives the new position of input axis k. This is used heavily in transformer attention mechanisms, for example to reorder head and sequence dimensions.
ggml_permute(ctx, a, axis0, axis1, axis2, axis3)
ctx |
GGML context |
a |
Input tensor |
axis0 |
New position for axis 0 |
axis1 |
New position for axis 1 |
axis2 |
New position for axis 2 |
axis3 |
New position for axis 3 |
Permuted tensor
ctx <- ggml_init(16 * 1024 * 1024)
# Create 4D tensor: (2, 3, 4, 5)
t <- ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 2, 3, 4, 5)
# Swap axes 0 and 1: result shape (3, 2, 4, 5)
t_perm <- ggml_permute(ctx, t, 1, 0, 2, 3)
ggml_free(ctx)
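Users familiar with base R's aperm() should note that it uses the opposite convention. ggml_permute() gives, for each input axis, its new (0-based) position. aperm() gives, for each output axis, the (1-based) input axis it comes from. The sketch below assumes that ggml dimensions ne0 to ne3 map to R array dimensions 1 to 4. It uses order() to convert between the two conventions, since they are inverse permutations. ggml_axes_to_aperm() is a hypothetical helper.

```r
# Hypothetical helper: convert ggml_permute() axes (0-based new positions)
# into an aperm() perm vector (1-based source axes).
ggml_axes_to_aperm <- function(axes) order(axes)

a <- array(seq_len(120), dim = c(2, 3, 4, 5))
dim(aperm(a, ggml_axes_to_aperm(c(1, 0, 2, 3))))  # 3 2 4 5 (swap axes 0 and 1)
r <- aperm(a, ggml_axes_to_aperm(c(2, 0, 1, 3)))
dim(r)                                             # 3 4 2 5
r[3, 4, 2, 5] == a[2, 3, 4, 5]                     # TRUE
```

One design difference matters here: ggml_permute() returns a view without copying data, whereas aperm() materialises a new array.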
Applies a 1D pooling operation for downsampling.
ggml_pool_1d(ctx, a, op, k0, s0 = k0, p0 = 0L)
GGML_OP_POOL_MAX
GGML_OP_POOL_AVG
ctx |
GGML context |
a |
Input tensor |
op |
Pool operation constant (see details) |
k0 |
Kernel size (window size) |
s0 |
Stride (default = k0 for non-overlapping windows) |
p0 |
Padding (default 0) |
GGML_OP_POOL_MAX: an object of class integer of length 1.
GGML_OP_POOL_AVG: an object of class integer of length 1.
Pool operation constants:
GGML_OP_POOL_MAX (0): Max pooling - takes maximum value in each window
GGML_OP_POOL_AVG (1): Average pooling - takes mean of values in each window
Pooled tensor with reduced dimensions
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8)
ggml_set_f32(a, c(1, 3, 2, 4, 5, 2, 8, 1))
# Max pooling with kernel 2, stride 2
max_pool <- ggml_pool_1d(ctx, a, GGML_OP_POOL_MAX, k0 = 2)
# Average pooling with kernel 2, stride 2
avg_pool <- ggml_pool_1d(ctx, a, GGML_OP_POOL_AVG, k0 = 2)
# Compute each result before reading its values
graph_max <- ggml_build_forward_expand(ctx, max_pool)
ggml_graph_compute(ctx, graph_max)
graph_avg <- ggml_build_forward_expand(ctx, avg_pool)
ggml_graph_compute(ctx, graph_avg)
ggml_get_f32(max_pool)  # 3 4 5 8 (max of each pair)
ggml_get_f32(avg_pool)  # 2 3 3.5 4.5 (mean of each pair)
ggml_free(ctx)
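The window arithmetic behind 1D pooling can be checked against the example values with a small base-R reference. This sketch only covers p0 = 0 (no padding), and pool_1d_ref() is a hypothetical name.

```r
# Hypothetical reference for 1D pooling without padding: a window of k0
# elements that moves s0 elements per step, reduced with `fun`.
pool_1d_ref <- function(x, k0, s0 = k0, fun = max) {
  n_out <- (length(x) - k0) %/% s0 + 1
  vapply(seq_len(n_out), function(i) {
    start <- (i - 1) * s0 + 1
    fun(x[start:(start + k0 - 1)])
  }, numeric(1))
}

x <- c(1, 3, 2, 4, 5, 2, 8, 1)
pool_1d_ref(x, k0 = 2, fun = max)   # 3 4 5 8
pool_1d_ref(x, k0 = 2, fun = mean)  # 2.0 3.0 3.5 4.5
```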
Applies a 2D pooling operation.
ggml_pool_2d(ctx, a, op, k0, k1, s0 = k0, s1 = k1, p0 = 0, p1 = 0)
ctx |
GGML context |
a |
Input tensor |
op |
Pool operation: GGML_OP_POOL_MAX (0) or GGML_OP_POOL_AVG (1) |
k0 |
Kernel size dimension 0 |
k1 |
Kernel size dimension 1 |
s0 |
Stride dimension 0 (default = k0) |
s1 |
Stride dimension 1 (default = k1) |
p0 |
Padding dimension 0 (default 0) |
p1 |
Padding dimension 1 (default 0) |
Pooled tensor
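The size of the pooled output can be estimated for each dimension before building a graph. The sketch below assumes the usual rule floor((n + 2p - k) / s) + 1, applied separately to dimensions 0 and 1. pool_out_size() is a hypothetical helper.

```r
# Hypothetical helper: pooled length for input length n, window k,
# stride s and padding p (vectorised over dimensions).
pool_out_size <- function(n, k, s = k, p = 0) {
  (n + 2 * p - k) %/% s + 1
}

pool_out_size(c(28, 28), k = c(2, 2))                            # 14 14
pool_out_size(c(28, 28), k = c(3, 3), s = c(1, 1), p = c(1, 1))  # 28 28
```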
Removes the last layer from the model. The model must not be compiled.
ggml_pop_layer(model)
model |
A ggml_sequential_model object |
The model with the last layer removed.
model <- ggml_model_sequential() |>
  ggml_layer_dense(64, activation = "relu") |>
  ggml_layer_dense(10, activation = "softmax")
model <- ggml_pop_layer(model)
length(model$layers)  # 1
Returns predicted class indices (1-based) by applying argmax
to the output of ggml_predict().
ggml_predict_classes(model, x, batch_size = 32L)
model |
A trained ggml_sequential_model |
x |
Input data (matrix or array) |
batch_size |
Batch size for inference |
Integer vector of predicted class indices (1-based)
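The argmax step can be pictured in base R on a probability matrix with the [N, output_units] layout that ggml_predict() is documented to return. This sketch uses max.col() with ties.method = "first". How ggml_predict_classes() itself breaks ties is not documented here.

```r
# Class probabilities for two samples (rows) and three classes (columns).
probs <- matrix(c(0.1, 0.7, 0.2,
                  0.6, 0.3, 0.1), nrow = 2, byrow = TRUE)
classes <- max.col(probs, ties.method = "first")  # 1-based argmax per row
classes  # 2 1
```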
Runs forward pass on input data and returns prediction probabilities
(or raw output values for regression). Unlike ggml_evaluate(), this
does not require labels.
## S3 method for class 'ggml_functional_model'
ggml_predict(model, x, batch_size = 32L, ...)

ggml_predict(model, ...)

## S3 method for class 'ggml_sequential_model'
ggml_predict(model, x, batch_size = 32L, ...)
model |
A trained ggml_sequential_model or ggml_functional_model |
x |
Input data (matrix or array) |
batch_size |
Batch size for inference |
... |
Additional arguments (ignored). |
Matrix of predictions with shape [N, output_units]
Prints memory usage information for a GGML context.
ggml_print_mem_status(ctx)
ctx |
GGML context |
List with total, used, free memory (invisible)
ctx <- ggml_init(1024 * 1024)
ggml_print_mem_status(ctx)
ggml_free(ctx)
Debug function to print all objects (tensors) in the context
ggml_print_objects(ctx)
ctx |
GGML context |
NULL (invisible)
ctx <- ggml_init(1024 * 1024)
t <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
ggml_print_objects(ctx)
ggml_free(ctx)
Returns information about a quantization type including name, type size, block size, and whether it's quantized.
ggml_quant_block_info(type)
type |
GGML type constant |
List with type_name, type_size, block_size, is_quantized
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Quantizes a chunk of floating-point data to a lower precision format.
ggml_quantize_chunk(type, src, nrows, n_per_row)
type |
Target GGML type (e.g., GGML_TYPE_Q4_0) |
src |
Source numeric vector (F32 data) |
nrows |
Number of rows |
n_per_row |
Number of elements per row |
Raw vector containing quantized data
# Quantize 256 floats to Q8_0 (block size 32)
data <- rnorm(256)
quantized <- ggml_quantize_chunk(GGML_TYPE_Q8_0, data, 1, 256)
ggml_quantize_free()  # Clean up
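The sketch below shows what Q8_0 does to one block of 32 values, written in base R. Each block stores one scale d = max(|x|) / 127 plus 32 signed 8-bit integers round(x / d). The real format stores d as f16, which this sketch ignores, and R's round() breaks exact ties to even whereas C's roundf() rounds them away from zero. Each packed block is taken to occupy 34 bytes (2-byte scale plus 32 bytes). If the raw vector returned by ggml_quantize_chunk() holds only packed blocks, the example above would produce 272 bytes. The helper names are hypothetical.

```r
# Hypothetical Q8_0 reference for a single 32-value block (f16 scale rounding ignored).
q8_0_block <- function(x) {
  stopifnot(length(x) == 32)
  d <- max(abs(x)) / 127                          # per-block scale
  q <- if (d > 0) round(x / d) else numeric(32)   # integers in [-127, 127]
  list(d = d, q = as.integer(q))
}
dequant_q8_0 <- function(block) block$d * block$q

set.seed(42)
x <- rnorm(32)
blk <- q8_0_block(x)
max_err <- max(abs(dequant_q8_0(blk) - x))  # at most d / 2

# Packed size for 256 values: 8 blocks of 34 bytes each
n_bytes <- (256 / 32) * 34                  # 272
```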
Frees any memory allocated by quantization. Call at end of program to avoid memory leaks.
ggml_quantize_free()
NULL invisibly
Initializes quantization tables for a given type. Called automatically by ggml_quantize_chunk, but can be called manually.
ggml_quantize_init(type)
type |
GGML type (e.g., GGML_TYPE_Q4_0) |
NULL invisibly
Some quantization types require an importance matrix for optimal quality.
ggml_quantize_requires_imatrix(type)
type |
GGML type |
TRUE if importance matrix is required
Creates a graph node for the ReGLU operation. The input is split in half along the first dimension, and ReLU is applied to the first half.
ggml_reglu(ctx, a)
ctx |
GGML context |
a |
Input tensor (first dimension must be even) |
Formula: output = ReLU(x) * gate, where x is the first half and gate is the second half of a along dimension 0.
Tensor with half the first dimension of input
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 3)
ggml_set_f32(a, rnorm(24))
r <- ggml_reglu(ctx, a)
graph <- ggml_build_forward_expand(ctx, r)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(r)  # Shape: 4x3
ggml_free(ctx)
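A base-R reference for the ReGLU formula on the same [8, 3] shape is sketched below. It assumes that ggml's dimension 0 corresponds to the rows of an R matrix, so "first half" means rows 1 to 4. reglu_ref() is a hypothetical name.

```r
# Hypothetical reference: ReLU(first half of rows) * (second half of rows).
reglu_ref <- function(m) {
  h <- nrow(m) / 2
  stopifnot(h == floor(h))                    # first dimension must be even
  x    <- m[seq_len(h), , drop = FALSE]       # values to activate
  gate <- m[h + seq_len(h), , drop = FALSE]   # gate
  pmax(x, 0) * gate
}

set.seed(1)
m <- matrix(rnorm(24), nrow = 8)
out <- reglu_ref(m)
dim(out)  # 4 3
```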
Creates a graph node for ReGLU with separate input and gate tensors.
ggml_reglu_split(ctx, a, b)
ctx |
GGML context |
a |
Input tensor (the values to be gated) |
b |
Gate tensor (same shape as a) |
Formula: output = ReLU(a) * b
Tensor with same shape as input tensors
Creates a graph node for ReLU (Rectified Linear Unit) activation: max(0, x)
ggml_relu(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the ReLU operation
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-2, -1, 0, 1, 2))
result <- ggml_relu(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_get_f32(result)  # 0 0 0 1 2
ggml_free(ctx)
Creates a graph node for in-place ReLU activation: max(0, x)
ggml_relu_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with ReLU applied
Creates a graph node that repeats tensor 'a' to match shape of tensor 'b'.
ggml_repeat(ctx, a, b)
ctx |
GGML context |
a |
Tensor to repeat |
b |
Target tensor (defines output shape) |
Tensor with repeated values
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 1, 2)
ggml_set_f32(a, c(1, 2))
b <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 3, 2)
result <- ggml_repeat(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # [1, 1, 1, 2, 2, 2]
ggml_free(ctx)
Backward pass for repeat operation - sums repetitions back to original shape. Used for gradient computation during training.
ggml_repeat_back(ctx, a, b)
ctx |
GGML context |
a |
Input tensor (gradients from repeated tensor) |
b |
Target shape tensor (original tensor before repeat) |
Tensor with summed gradients matching shape of b
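The relationship between ggml_repeat() and ggml_repeat_back() can be mirrored in base R for 2D data. The forward pass tiles a to the larger shape, and the backward pass sums the incoming gradient over every tile. The sketch assumes that ggml's ne0 maps to R rows, and both helpers are hypothetical.

```r
# Hypothetical references for 2D repeat (tiling) and its backward pass.
repeat_ref <- function(a, nrow_out, ncol_out) {
  stopifnot(nrow_out %% nrow(a) == 0, ncol_out %% ncol(a) == 0)
  i <- (seq_len(nrow_out) - 1) %% nrow(a) + 1   # source row for each output row
  j <- (seq_len(ncol_out) - 1) %% ncol(a) + 1   # source column for each output column
  a[i, j, drop = FALSE]
}
repeat_back_ref <- function(g, nrow_a, ncol_a) {
  out <- matrix(0, nrow_a, ncol_a)
  for (r in seq_len(nrow(g))) {
    for (k in seq_len(ncol(g))) {
      ri <- (r - 1) %% nrow_a + 1
      ki <- (k - 1) %% ncol_a + 1
      out[ri, ki] <- out[ri, ki] + g[r, k]      # accumulate gradient of each tile
    }
  }
  out
}

a <- matrix(c(1, 2), nrow = 1)          # ggml shape [1, 2]
as.vector(repeat_ref(a, 3, 2))          # 1 1 1 2 2 2
repeat_back_ref(matrix(1, 3, 2), 1, 2)  # each entry sums three ones: 3 3
```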
Clears all tensor allocations in the context memory pool. The context can be reused without recreating it. This is more efficient than free + init for temporary operations.
ggml_reset(ctx)
ctx |
GGML context pointer |
NULL (invisible)
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 100)
ggml_reset(ctx)
# Tensors created before the reset (such as a) must not be used afterwards
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 200)
ggml_free(ctx)
Reshapes tensor to 1D with ne0 elements
ggml_reshape_1d(ctx, a, ne0)
ctx |
GGML context |
a |
Input tensor |
ne0 |
Size of dimension 0 |
Reshaped tensor
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 3, 4)
ggml_set_f32(a, 1:12)
result <- ggml_reshape_1d(ctx, a, 12)
ggml_free(ctx)
Reshapes tensor to 2D with shape (ne0, ne1)
ggml_reshape_2d(ctx, a, ne0, ne1)
ctx |
GGML context |
a |
Input tensor |
ne0 |
Size of dimension 0 |
ne1 |
Size of dimension 1 |
Reshaped tensor
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 12)
ggml_set_f32(a, 1:12)
result <- ggml_reshape_2d(ctx, a, 3, 4)
ggml_free(ctx)
Reshapes tensor to 3D with shape (ne0, ne1, ne2)
ggml_reshape_3d(ctx, a, ne0, ne1, ne2)
ctx |
GGML context |
a |
Input tensor |
ne0 |
Size of dimension 0 |
ne1 |
Size of dimension 1 |
ne2 |
Size of dimension 2 |
Reshaped tensor
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 24)
ggml_set_f32(a, 1:24)
result <- ggml_reshape_3d(ctx, a, 2, 3, 4)
ggml_free(ctx)
Reshapes tensor to 4D with shape (ne0, ne1, ne2, ne3)
ggml_reshape_4d(ctx, a, ne0, ne1, ne2, ne3)
ctx |
GGML context |
a |
Input tensor |
ne0 |
Size of dimension 0 |
ne1 |
Size of dimension 1 |
ne2 |
Size of dimension 2 |
ne3 |
Size of dimension 3 |
Reshaped tensor
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 120)
ggml_set_f32(a, 1:120)
result <- ggml_reshape_4d(ctx, a, 2, 3, 4, 5)
ggml_free(ctx)
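The reshape functions reinterpret the same contiguous data under a new shape, so the total element count must not change. Assigning dim() in base R behaves analogously, assuming ggml's ne0-fastest layout matches R's column-major order. The sketch below shows how each dimension's stride follows from the sizes of the dimensions before it.

```r
x <- 1:24
dim(x) <- c(2, 3, 4)  # analogous to ggml_reshape_3d(ctx, a, 2, 3, 4)
x[2, 1, 1]            # 2: next element along dimension 0
x[1, 2, 1]            # 3: dimension 1 advances by ne0 = 2 elements
x[1, 1, 2]            # 7: dimension 2 advances by ne0 * ne1 = 6 elements
```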
Creates a graph node for RMS (Root Mean Square) normalization, computing x / sqrt(mean(x^2) + eps) along the first dimension. RMS normalization is used throughout LLaMA-family models.
ggml_rms_norm(ctx, a, eps = 1e-05)
ctx |
GGML context |
a |
Input tensor |
eps |
Epsilon value for numerical stability (default: 1e-5) |
Tensor representing the RMS normalization operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(1, 2, 3, 4))
result <- ggml_rms_norm(ctx, a, eps = 1e-5)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)
# sqrt(mean(output^2)) should be ~1
ggml_free(ctx)
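The formula can be reproduced in base R to check the output of the example above. This sketch assumes that normalization along ggml's first dimension corresponds to each column of an R matrix. rms_norm_ref() is a hypothetical name.

```r
# Hypothetical reference: x / sqrt(mean(x^2) + eps) for one vector.
rms_norm_ref <- function(x, eps = 1e-5) x / sqrt(mean(x^2) + eps)

y <- rms_norm_ref(c(1, 2, 3, 4))
sqrt(mean(y^2))  # slightly below 1 because of eps

# For a matrix, normalise each column (ggml's dimension 0)
m <- matrix(c(1, 2, 3, 4, 2, 2, 2, 2), nrow = 4)
apply(m, 2, rms_norm_ref)
```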
Creates a graph node for backward pass of RMS normalization. Used in training for computing gradients.
ggml_rms_norm_back(ctx, a, b, eps = 1e-05)
ctx |
GGML context |
a |
Input tensor (x from forward pass) |
b |
Gradient tensor (dy) |
eps |
Epsilon for numerical stability (default 1e-5) |
Tensor representing the gradient with respect to input
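For reference, the gradient of RMS normalization can be derived by hand. With r = sqrt(mean(x^2) + eps) and y = x / r, it is dx = (dy - y * mean(dy * y)) / r. The base-R sketch below checks this derivation against central finite differences. It illustrates the mathematics only and is not taken from the ggml kernel. The helper names are hypothetical.

```r
rms_norm_ref <- function(x, eps = 1e-5) x / sqrt(mean(x^2) + eps)

# Hand-derived gradient of sum(dy * rms_norm(x)) with respect to x
rms_norm_back_ref <- function(x, dy, eps = 1e-5) {
  r <- sqrt(mean(x^2) + eps)
  y <- x / r
  (dy - y * mean(dy * y)) / r
}

x  <- c(1, 2, 3, 4)
dy <- c(0.5, -1, 2, 0.25)
h  <- 1e-6
numeric_dx <- vapply(seq_along(x), function(j) {
  e <- replace(numeric(length(x)), j, h)  # perturb only element j
  (sum(dy * rms_norm_ref(x + e)) - sum(dy * rms_norm_ref(x - e))) / (2 * h)
}, numeric(1))
max(abs(rms_norm_back_ref(x, dy) - numeric_dx))  # close to zero
```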
Creates a graph node for in-place RMS normalization and returns a view of the input tensor. This is useful in LLaMA-style models when memory efficiency matters.
ggml_rms_norm_inplace(ctx, a, eps = 1e-05)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
eps |
Epsilon value for numerical stability (default: 1e-5) |
View of input tensor with RMS normalization applied
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(1, 2, 3, 4))
result <- ggml_rms_norm_inplace(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
Circularly shifts tensor elements along dimensions 0..3.
ggml_roll(ctx, a, shift0 = 0L, shift1 = 0L, shift2 = 0L, shift3 = 0L)
ctx |
GGML context |
a |
Input tensor |
shift0, shift1, shift2, shift3 |
Shift amount along each dimension |
Rolled tensor
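To make the wrap-around concrete, here is a base-R sketch of a circular shift on a vector. It assumes torch.roll-style semantics, where a positive shift moves elements toward higher indices (output[i] = input[(i - shift) mod n]); roll_ref is a hypothetical helper, not a ggmlR function.

```r
# Circular shift of a vector in base R (hypothetical helper roll_ref)
roll_ref <- function(x, shift) {
  n <- length(x)
  idx <- (seq_len(n) - 1L - shift) %% n   # 0-based source index for each output slot
  x[idx + 1L]
}

roll_ref(1:5, 2L)   # 4 5 1 2 3
roll_ref(1:5, -1L)  # 2 3 4 5 1
```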
Creates a graph node for RoPE (Rotary Position Embedding). RoPE is the position encoding used by many modern LLMs, including LLaMA and Mistral.
ggml_rope(ctx, a, b, n_dims, mode = 0L)
ctx |
GGML context |
a |
Input tensor of shape [head_dim, n_head, seq_len, batch] |
b |
Position tensor (int32) of shape [seq_len] containing position indices |
n_dims |
Number of dimensions to apply rotation to (usually head_dim) |
mode |
RoPE mode: GGML_ROPE_TYPE_NORM (0), GGML_ROPE_TYPE_NEOX (2), etc. |
RoPE encodes position information by rotating pairs of dimensions in the embedding space. The rotation angle depends on position and dimension index.
Key benefits of RoPE:
- Relative position information emerges naturally from the rotation
- Often extrapolates to longer sequences better than learned absolute embeddings, although large context extensions still call for scaling (see ggml_rope_ext)
- No additional learned parameters
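The rotation described above can be written out for a single head vector. This base-R sketch assumes the GGML_ROPE_TYPE_NORM layout, in which adjacent pairs (x[2i], x[2i+1]) are rotated by the angle pos * freq_base^(-2i / n_dims); NEOX mode pairs dimension i with i + n_dims/2 instead. The helper rope_norm_ref is hypothetical. The final lines check the relative-position property: the query/key dot product depends only on the position offset.

```r
# Reference RoPE for one head vector in "NORM" layout (adjacent pairs), base R
rope_norm_ref <- function(x, pos, n_dims = length(x), freq_base = 10000) {
  stopifnot(n_dims %% 2 == 0, n_dims <= length(x))
  out <- x
  for (i in seq(0, n_dims - 2, by = 2)) {          # i = 0, 2, 4, ... (0-based)
    theta <- pos * freq_base^(-i / n_dims)           # angle for this pair
    a <- x[i + 1]; b <- x[i + 2]                     # R indexing is 1-based
    out[i + 1] <- a * cos(theta) - b * sin(theta)
    out[i + 2] <- a * sin(theta) + b * cos(theta)
  }
  out                                                # dims beyond n_dims pass through
}

set.seed(42)
q <- rnorm(8); k <- rnorm(8)
# Relative-position property: the score depends only on the offset m - n
s1 <- sum(rope_norm_ref(q, 5) * rope_norm_ref(k, 3))
s2 <- sum(rope_norm_ref(q, 12) * rope_norm_ref(k, 10))
all.equal(s1, s2)  # TRUE
```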
Tensor with same shape as input, with rotary embeddings applied
ctx <- ggml_init(16 * 1024 * 1024)
# Query tensor: head_dim=8, n_head=4, seq_len=16, batch=1
q <- ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 8, 4, 16, 1)
ggml_set_f32(q, rnorm(8 * 4 * 16))
# Position indices
pos <- ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 16)
ggml_set_i32(pos, 0:15)
# Apply RoPE
q_rope <- ggml_rope(ctx, q, pos, 8, GGML_ROPE_TYPE_NORM)
graph <- ggml_build_forward_expand(ctx, q_rope)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
Creates a graph node for extended RoPE with frequency scaling parameters. Supports context extension techniques like YaRN, Linear Scaling, etc.
ggml_rope_ext(ctx, a, b, c = NULL, n_dims, mode = 0L, n_ctx_orig = 0L, freq_base = 10000, freq_scale = 1, ext_factor = 0, attn_factor = 1, beta_fast = 32, beta_slow = 1)
ctx |
GGML context |
a |
Input tensor |
b |
Position tensor (int32) |
c |
Optional frequency factors tensor (NULL for default) |
n_dims |
Number of dimensions to apply rotation to |
mode |
RoPE mode |
n_ctx_orig |
Original context length the model was trained on |
freq_base |
Base frequency for RoPE (default 10000 for most models) |
freq_scale |
Frequency scale factor (1.0 = no scaling) |
ext_factor |
YaRN extension factor (0.0 to disable) |
attn_factor |
Attention scale factor (typically 1.0) |
beta_fast |
YaRN parameter for fast dimensions |
beta_slow |
YaRN parameter for slow dimensions |
This extended version supports several context-extension techniques:
- **Linear Scaling**: set freq_scale = original_ctx / new_ctx
- **YaRN**: set ext_factor > 0 with appropriate beta_fast/beta_slow
- **NTK-aware**: adjust freq_base for NTK-style scaling
Common freq_base values (check each model's own configuration, since variants differ):
- LLaMA 1/2: 10000
- LLaMA 3: 500000
- Mistral 7B v0.1: 10000
- Phi-3: 10000
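The effect of freq_scale and freq_base on the rotation angles can be sketched in base R. This assumes that with ext_factor = 0 (YaRN disabled) the angle for even dimension index i at position p is freq_scale * p * freq_base^(-i / n_dims); rope_angles is a hypothetical helper. With linear scaling, a position in the stretched context receives the same angles as the corresponding position in the original context.

```r
# Rotation angles for each dimension pair under linear RoPE scaling (base R sketch)
rope_angles <- function(pos, n_dims, freq_base = 10000, freq_scale = 1) {
  i <- seq(0, n_dims - 2, by = 2)               # even dimension indices
  freq_scale * pos * freq_base^(-i / n_dims)
}

# Stretch a 4096-token model to 16384 tokens: freq_scale = 4096 / 16384
fs <- 4096 / 16384
a_scaled <- rope_angles(pos = 8000, n_dims = 64, freq_scale = fs)
a_orig   <- rope_angles(pos = 2000, n_dims = 64)
all.equal(a_scaled, a_orig)  # TRUE: position 8000 maps to the angles of position 2000

# A larger freq_base (e.g. LLaMA 3's 500000) slows the rotation of high dimensions
tail(rope_angles(1000, 64, freq_base = 10000), 1)
tail(rope_angles(1000, 64, freq_base = 500000), 1)
```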
Tensor with extended RoPE applied
ctx <- ggml_init(16 * 1024 * 1024)
q <- ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 64, 8, 32, 1)
ggml_set_f32(q, rnorm(64 * 8 * 32))
pos <- ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 32)
ggml_set_i32(pos, 0:31)
# Standard RoPE with default freq_base
q_rope <- ggml_rope_ext(ctx, q, pos, NULL,
                        n_dims = 64, mode = 0L, n_ctx_orig = 4096,
                        freq_base = 10000, freq_scale = 1.0,
                        ext_factor = 0.0, attn_factor = 1.0,
                        beta_fast = 32, beta_slow = 1)
graph <- ggml_build_forward_expand(ctx, q_rope)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
Backward pass for extended RoPE (Rotary Position Embedding). Used during training to compute gradients through RoPE.
ggml_rope_ext_back(ctx, a, b, c = NULL, n_dims, mode = 0L, n_ctx_orig = 0L, freq_base = 10000, freq_scale = 1, ext_factor = 0, attn_factor = 1, beta_fast = 32, beta_slow = 1)
ctx |
GGML context |
a |
Upstream gradient tensor (gradient with respect to the ggml_rope_ext output) |
b |
Position tensor (same as forward pass) |
c |
Optional frequency factors tensor (NULL for default) |
n_dims |
Number of dimensions for rotation |
mode |
RoPE mode |
n_ctx_orig |
Original context length |
freq_base |
Base frequency |
freq_scale |
Frequency scale factor |
ext_factor |
Extension factor (YaRN) |
attn_factor |
Attention factor |
beta_fast |
YaRN fast beta |
beta_slow |
YaRN slow beta |
Gradient tensor for the input
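Because RoPE rotates each dimension pair by a fixed angle, it is a linear, orthogonal map, so its backward pass is the transposed rotation, which equals rotation by the negative angle. The base-R sketch below (hypothetical helper rot2) checks this for a single pair; it illustrates the math, not ggmlR internals.

```r
# One RoPE pair as a 2x2 rotation matrix [cos -sin; sin cos] (base R sketch)
rot2 <- function(theta) matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)

theta <- 0.7
dy <- c(0.3, -1.2)                 # upstream gradient for the pair
dx <- t(rot2(theta)) %*% dy        # chain rule for y = R x gives dx = t(R) %*% dy
all.equal(as.vector(dx), as.vector(rot2(-theta) %*% dy))  # TRUE: backward = inverse rotation
```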
Creates a graph node for extended RoPE, modifying input tensor in place. Returns a view of the input tensor.
ggml_rope_ext_inplace(ctx, a, b, c = NULL, n_dims, mode = 0L, n_ctx_orig = 0L, freq_base = 10000, freq_scale = 1, ext_factor = 0, attn_factor = 1, beta_fast = 32, beta_slow = 1)
ctx |
GGML context |
a |
Input tensor |
b |
Position tensor (int32) |
c |
Optional frequency factors tensor (NULL for default) |
n_dims |
Number of dimensions to apply rotation to |
mode |
RoPE mode |
n_ctx_orig |
Original context length the model was trained on |
freq_base |
Base frequency for RoPE (default 10000 for most models) |
freq_scale |
Frequency scale factor (1.0 = no scaling) |
ext_factor |
YaRN extension factor (0.0 to disable) |
attn_factor |
Attention scale factor (typically 1.0) |
beta_fast |
YaRN parameter for fast dimensions |
beta_slow |
YaRN parameter for slow dimensions |
View of input tensor with RoPE applied in place
Other rope:
ggml_rope_multi(),
ggml_rope_multi_inplace()
In-place version of ggml_rope. Returns a view of the input tensor.
ggml_rope_inplace(ctx, a, b, n_dims, mode = 0L)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
b |
Position tensor (int32) |
n_dims |
Number of dimensions to apply rotation to |
mode |
RoPE mode |
View of input tensor with RoPE applied
Creates a graph node for multi-dimensional RoPE (MRoPE), used in vision-language models such as Qwen2-VL. It applies separate rotations for different positional dimensions (e.g., time, height, width).
ggml_rope_multi(ctx, a, b, c = NULL, n_dims, sections = c(0L, 0L, 0L, 0L), mode = 0L, n_ctx_orig = 0L, freq_base = 10000, freq_scale = 1, ext_factor = 0, attn_factor = 1, beta_fast = 32, beta_slow = 1)
ctx |
GGML context |
a |
Input tensor |
b |
Position tensor (int32) |
c |
Optional frequency factors tensor (NULL for default) |
n_dims |
Number of dimensions to apply rotation to |
sections |
Integer vector of length 4 specifying dimension sections for MRoPE |
mode |
RoPE mode |
n_ctx_orig |
Original context length the model was trained on |
freq_base |
Base frequency for RoPE (default 10000 for most models) |
freq_scale |
Frequency scale factor (1.0 = no scaling) |
ext_factor |
YaRN extension factor (0.0 to disable) |
attn_factor |
Attention scale factor (typically 1.0) |
beta_fast |
YaRN parameter for fast dimensions |
beta_slow |
YaRN parameter for slow dimensions |
Tensor with multi-dimensional RoPE applied
Other rope:
ggml_rope_ext_inplace(),
ggml_rope_multi_inplace()
Creates a graph node for multi-dimensional RoPE, modifying input in place.
ggml_rope_multi_inplace(ctx, a, b, c = NULL, n_dims, sections = c(0L, 0L, 0L, 0L), mode = 0L, n_ctx_orig = 0L, freq_base = 10000, freq_scale = 1, ext_factor = 0, attn_factor = 1, beta_fast = 32, beta_slow = 1)
ctx |
GGML context |
a |
Input tensor |
b |
Position tensor (int32) |
c |
Optional frequency factors tensor (NULL for default) |
n_dims |
Number of dimensions to apply rotation to |
sections |
Integer vector of length 4 specifying dimension sections for MRoPE |
mode |
RoPE mode |
n_ctx_orig |
Original context length the model was trained on |
freq_base |
Base frequency for RoPE (default 10000 for most models) |
freq_scale |
Frequency scale factor (1.0 = no scaling) |
ext_factor |
YaRN extension factor (0.0 to disable) |
attn_factor |
Attention scale factor (typically 1.0) |
beta_fast |
YaRN parameter for fast dimensions |
beta_slow |
YaRN parameter for slow dimensions |
View of input tensor with MRoPE applied in place
Other rope:
ggml_rope_ext_inplace(),
ggml_rope_multi()
Creates a graph node for element-wise rounding: round(x)
ggml_round(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the round operation
Creates a graph node for in-place element-wise rounding.
ggml_round_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with rounded values
Saves both the architecture and trained weights of a model to an RDS file.
Unlike ggml_save_weights(), which requires the model to be manually
reconstructed before loading, ggml_save_model() saves everything
needed to restore the model with a single call to ggml_load_model().
ggml_save_model(model, path)
model |
A trained ggml_sequential_model or ggml_functional_model |
path |
File path (typically with .rds extension) |
The model (invisibly).
For a ggml_sequential_model, the input shape, layer configurations, trained weights, and compilation settings are all saved.
For a ggml_functional_model, the input/output node graphs (pure R lists with no ggml pointers) and the trained node_weights are saved.
model <- ggml_model_sequential() |>
  ggml_layer_dense(16L, activation = "relu", input_shape = 4L) |>
  ggml_layer_dense(2L, activation = "softmax")
model <- ggml_compile(model, optimizer = "adam",
                      loss = "categorical_crossentropy")
x <- matrix(runif(64 * 4), 64, 4)
y <- matrix(c(rep(c(1, 0), 32), rep(c(0, 1), 32)), 64, 2)
model <- ggml_fit(model, x, y, epochs = 1L, batch_size = 32L, verbose = 0L)
tmp <- tempfile(fileext = ".rds")
ggml_save_model(model, tmp)
model2 <- ggml_load_model(tmp)
Saves the trained weights of a sequential model to an RDS file. The file includes both weights and architecture metadata for validation when loading.
ggml_save_weights(model, path)
model |
A trained ggml_sequential_model |
path |
File path to save weights (typically with .rds extension) |
The model (invisibly).
Creates a graph node that multiplies a tensor by a scalar: a * s
ggml_scale(ctx, a, s)
ctx |
GGML context |
a |
Input tensor |
s |
Scalar value to multiply by |
Tensor representing the scaled values
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(1, 2, 3, 4))
result <- ggml_scale(ctx, a, 2.0)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # [2, 4, 6, 8]
ggml_free(ctx)
Creates a graph node for in-place scaling: a * s
ggml_scale_inplace(ctx, a, s)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
s |
Scalar value to multiply by |
View of tensor a with scaled values
Anneals the learning rate (LR) from its initial value to eta_min along a cosine curve.
ggml_schedule_cosine_decay(eta_min = 0, T_max = NULL)
eta_min |
Minimum LR at end of schedule |
T_max |
Total number of epochs (defaults to nepoch from fit state) |
List with on_epoch_begin function
Other callbacks:
ggml_callback_early_stopping(),
ggml_schedule_reduce_on_plateau(),
ggml_schedule_step_decay()
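The shape of a cosine schedule can be previewed in base R. This sketch assumes the common cosine-annealing form lr_t = eta_min + (lr0 - eta_min) * (1 + cos(pi * t / T_max)) / 2 with 0-based epochs t (as in PyTorch's CosineAnnealingLR); whether ggmlR uses exactly this indexing is an assumption, and cosine_lr is a hypothetical helper.

```r
# Cosine-annealed learning rate at epoch t (0-based), hypothetical helper
cosine_lr <- function(t, lr0, eta_min = 0, T_max) {
  eta_min + (lr0 - eta_min) * (1 + cos(pi * t / T_max)) / 2
}

lrs <- cosine_lr(0:10, lr0 = 1e-3, eta_min = 1e-5, T_max = 10)
round(lrs, 6)  # starts at 1e-3 and decreases smoothly to 1e-5
```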
Reduces the learning rate when a monitored metric stops improving.
ggml_schedule_reduce_on_plateau(monitor = "val_loss", factor = 0.5, patience = 5, min_lr = 1e-07, min_delta = 1e-04, mode = "auto")
monitor |
Metric to monitor: "val_loss", "train_loss", etc. |
factor |
Multiplicative factor applied to the LR on each reduction (new LR = LR * factor) |
patience |
Epochs with no improvement before reducing |
min_lr |
Minimum LR |
min_delta |
Minimum change to qualify as improvement |
mode |
"min" or "max". "auto" infers from monitor name. |
List with on_epoch_end function
Other callbacks:
ggml_callback_early_stopping(),
ggml_schedule_cosine_decay(),
ggml_schedule_step_decay()
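The interplay of patience, min_delta, factor and min_lr can be sketched in base R. This assumes Keras-style semantics for mode = "min": an epoch counts as an improvement only if the metric drops below best - min_delta, and after patience non-improving epochs the LR is multiplied by factor (never below min_lr) and the counter resets. plateau_step is a hypothetical helper; ggmlR's exact bookkeeping may differ.

```r
# Minimal plateau logic for mode = "min" (hypothetical helper, base R)
plateau_step <- function(state, metric, factor = 0.5, patience = 5,
                         min_lr = 1e-7, min_delta = 1e-4) {
  if (metric < state$best - min_delta) {        # counts as an improvement
    state$best <- metric
    state$wait <- 0
  } else {
    state$wait <- state$wait + 1
    if (state$wait >= patience) {               # no improvement for `patience` epochs
      state$lr <- max(state$lr * factor, min_lr)
      state$wait <- 0
    }
  }
  state
}

state <- list(lr = 1e-3, best = Inf, wait = 0)
val_loss <- c(1.0, 0.8, 0.79995, 0.8, 0.81, 0.8, 0.8)
for (m in val_loss) state <- plateau_step(state, m, patience = 3)
state$lr  # 5e-04: halved once after three epochs without improvement
```

Note that 0.79995 is lower than 0.8 but not by more than min_delta, so it does not reset the counter.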
Multiplies the learning rate by gamma every step_size epochs.
ggml_schedule_step_decay(step_size = 10, gamma = 0.1)
step_size |
Reduce LR every this many epochs |
gamma |
Multiplicative factor of LR reduction |
List with on_epoch_begin function
Other callbacks:
ggml_callback_early_stopping(),
ggml_schedule_cosine_decay(),
ggml_schedule_reduce_on_plateau()
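Step decay corresponds to lr_t = lr0 * gamma^floor(t / step_size). The base-R sketch below assumes 0-based epoch counting, which may differ from ggmlR's internal counter; step_lr is a hypothetical helper.

```r
# Step-decay learning rate at epoch t (0-based), hypothetical helper
step_lr <- function(t, lr0, step_size = 10, gamma = 0.1) {
  lr0 * gamma^(t %/% step_size)
}

step_lr(c(0, 9, 10, 25), lr0 = 0.01)  # 0.01 0.01 0.001 1e-04
```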
Creates a graph node that writes tensor b into a region of tensor a. The region starts at a byte offset in a and is laid out using the given byte strides, which allows writing to a portion of a tensor.
ggml_set(ctx, a, b, nb1, nb2, nb3, offset)
ctx |
GGML context |
a |
Destination tensor |
b |
Source tensor (data to copy) |
nb1 |
Stride for dimension 1 (in bytes) |
nb2 |
Stride for dimension 2 (in bytes) |
nb3 |
Stride for dimension 3 (in bytes) |
offset |
Byte offset in destination tensor |
Tensor representing the set operation
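Because nb1, nb2, nb3 and offset are given in bytes, it helps to compute them explicitly. This base-R sketch assumes a contiguous F32 tensor (4 bytes per element) with dimensions ne = c(ne0, ne1, ne2, ne3), where nb1 = 4 * ne0, nb2 = nb1 * ne1 and nb3 = nb2 * ne2; the helpers f32_strides and f32_offset are hypothetical.

```r
# Byte strides and offsets for a contiguous F32 tensor (hypothetical helpers)
f32_strides <- function(ne) {
  nb0 <- 4                                      # sizeof(float)
  nb  <- nb0 * cumprod(c(1, ne[1:3]))           # nb0, nb1, nb2, nb3
  setNames(nb, c("nb0", "nb1", "nb2", "nb3"))
}

f32_offset <- function(ne, i0, i1 = 0, i2 = 0, i3 = 0) {
  nb <- f32_strides(ne)
  sum(c(i0, i1, i2, i3) * nb)                   # byte position of element (i0, i1, i2, i3)
}

ne <- c(8, 4, 2, 1)               # ne0..ne3
f32_strides(ne)                   # nb0 = 4, nb1 = 32, nb2 = 128, nb3 = 256
f32_offset(ne, i0 = 0, i1 = 2)    # 64: start of row 2
```

Under these assumptions, overwriting row 2 of an [8, 4] F32 tensor with an 8-element b would use nb1 = 32 and offset = 64.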
Simplified 1D version of ggml_set. Copies tensor b into tensor a starting at a byte offset.
ggml_set_1d(ctx, a, b, offset)
ctx |
GGML context |
a |
Destination tensor |
b |
Source tensor |
offset |
Byte offset in destination tensor |
Tensor representing the set operation
Simplified 2D version of ggml_set.
ggml_set_2d(ctx, a, b, nb1, offset)
ctx |
GGML context |
a |
Destination tensor |
b |
Source tensor |
nb1 |
Stride for dimension 1 (in bytes) |
offset |
Byte offset in destination tensor |
Tensor representing the set operation
Restores GGML to default abort behavior (prints to stderr and aborts).
ggml_set_abort_callback_default()
NULL invisibly
Other logging:
ggml_abort_is_r_enabled(),
ggml_log_is_r_enabled(),
ggml_log_set_default(),
ggml_log_set_r(),
ggml_set_abort_callback_r()
Converts GGML abort calls into R errors (via Rf_error). This allows R to catch GGML failures with tryCatch.
ggml_set_abort_callback_r()
NULL invisibly
Other logging:
ggml_abort_is_r_enabled(),
ggml_log_is_r_enabled(),
ggml_log_set_default(),
ggml_log_set_r(),
ggml_set_abort_callback_default()
ggml_set_abort_callback_r()
# Now GGML aborts will become R errors
result <- tryCatch({
  # ... ggml operations that might fail ...
}, error = function(e) {
  message("GGML error caught: ", e$message)
})
Set F32 data
ggml_set_f32(tensor, data)
tensor |
Tensor |
data |
Numeric vector |
NULL (invisible)
ctx <- ggml_init(1024 * 1024)
tensor <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(tensor, c(1, 2, 3, 4, 5))
ggml_get_f32(tensor)
ggml_free(ctx)
Sets a single f32 value in the tensor at position [i0, i1, i2, i3]. This is a direct data write, not a graph operation.
ggml_set_f32_nd(tensor, i0, i1 = 0, i2 = 0, i3 = 0, value)
tensor |
Tensor pointer |
i0, i1, i2, i3
|
Indices (0-based) |
value |
Float value to set |
NULL (invisible)
Sets integer data in an I32 tensor. Used for indices (ggml_get_rows) and position tensors (ggml_rope).
ggml_set_i32(tensor, data)
tensor |
Tensor of type GGML_TYPE_I32 |
data |
Integer vector |
NULL (invisible)
ctx <- ggml_init(1024 * 1024)
pos <- ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 10)
ggml_set_i32(pos, 0:9)
ggml_get_i32(pos)
ggml_free(ctx)
Sets a single i32 value in the tensor at position [i0, i1, i2, i3].
ggml_set_i32_nd(tensor, i0, i1 = 0, i2 = 0, i3 = 0, value)
tensor |
Tensor pointer |
i0, i1, i2, i3
|
Indices (0-based) |
value |
Integer value to set |
NULL (invisible)
Mark Tensor as Input
ggml_set_input(tensor)
tensor |
Tensor pointer |
The tensor (for chaining)
Set the number of threads for GGML operations
ggml_set_n_threads(n_threads)
n_threads |
Number of threads to use |
Number of threads set
# Use 4 threads
ggml_set_n_threads(4)
# Use all available cores
ggml_set_n_threads(parallel::detectCores())
Assigns a name to a tensor. Useful for debugging and graph visualization.
ggml_set_name(tensor, name)
tensor |
Tensor pointer |
name |
Character string name |
The tensor (for chaining)
When enabled, tensor creation will not allocate memory for data. Useful for creating computation graphs without allocating storage.
ggml_set_no_alloc(ctx, no_alloc)
ctx |
GGML context |
no_alloc |
Logical, TRUE to disable allocation |
NULL (invisible)
ctx <- ggml_init(1024 * 1024)
ggml_set_no_alloc(ctx, TRUE)
ggml_get_no_alloc(ctx)
ggml_set_no_alloc(ctx, FALSE)
ggml_free(ctx)
Directly calls omp_set_num_threads() to limit OpenMP parallelism. Useful in tests to comply with CRAN policy on core usage.
ggml_set_omp_threads(n)
n |
Number of threads |
NULL invisibly
Sets the raw op_params bytes for a tensor.
ggml_set_op_params(tensor, params)
tensor |
External pointer to tensor |
params |
Raw vector of parameters (max 64 bytes) |
NULL invisibly
Other tensor:
ggml_are_same_layout(),
ggml_get_op_params(),
ggml_get_op_params_f32(),
ggml_get_op_params_i32(),
ggml_set_op_params_f32(),
ggml_set_op_params_i32()
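A raw params vector for ggml_set_op_params() can be built with base R's writeBin(). This sketch assumes a little-endian host (x86-64 and most ARM systems), since op_params are read in native byte order, and assumes float slot k occupies bytes 4k to 4k+3, consistent with the 0-15 index range of ggml_set_op_params_f32(). Which slots a given op actually reads is op-specific and not shown here.

```r
# Pack two float32 values into a 64-byte raw vector (little-endian assumed)
vals <- c(1.5, -2.25)
packed <- writeBin(vals, raw(), size = 4, endian = "little")
params <- c(packed, raw(64 - length(packed)))     # zero-pad to the 64-byte limit

length(params)                                                 # 64
readBin(params, "numeric", n = 2, size = 4, endian = "little")  # 1.5 -2.25
```

The resulting params could then be passed as ggml_set_op_params(tensor, params).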
Sets a single float value in tensor op_params at given index.
ggml_set_op_params_f32(tensor, index, value)
tensor |
External pointer to tensor |
index |
0-based index (0-15 for 64-byte op_params) |
value |
Numeric value to set |
NULL invisibly
Other tensor:
ggml_are_same_layout(),
ggml_get_op_params(),
ggml_get_op_params_f32(),
ggml_get_op_params_i32(),
ggml_set_op_params(),
ggml_set_op_params_i32()
Sets a single int32 value in tensor op_params at given index.
ggml_set_op_params_i32(tensor, index, value)
tensor |
External pointer to tensor |
index |
0-based index (0-15 for 64-byte op_params) |
value |
Integer value to set |
NULL invisibly
Other tensor:
ggml_are_same_layout(),
ggml_get_op_params(),
ggml_get_op_params_f32(),
ggml_get_op_params_i32(),
ggml_set_op_params(),
ggml_set_op_params_f32()
Mark Tensor as Output
ggml_set_output(tensor)
tensor |
Tensor pointer |
The tensor (for chaining)
Marks a tensor as a trainable parameter for backpropagation. Gradients are computed for this tensor during the backward pass, and the optimizer updates it during training.
ggml_set_param(tensor)
tensor |
Tensor pointer |
The tensor (for chaining)
Fixes the random seed used by ggmlR for everything that is stochastic:
- weight initialisation (sequential, functional and autograd layers),
- dropout masks (training-time),
- data shuffling in the autograd dataloader / training loops.
ggml_set_seed(seed)
seed |
A single integer (or value coercible to integer) used as the RNG
seed. |
This is a thin wrapper around set.seed: all randomness in
ggmlR is produced by the base R RNG, so a fixed seed gives identical starting
weights, dropout masks and batch ordering across runs. It is the single point
of control used by the mlr3 learners (seed hyperparameter) and
the parsnip "ggml" engine (seed engine argument).
GPU note: this controls the random inputs to the computation, not the floating-point arithmetic itself. GPU (Vulkan) kernels are run-to-run stable on a given device/driver for the standard forward/backward paths, but ggmlR does not guarantee bit-for-bit identical results across different devices, drivers or backends (CPU vs Vulkan). Reproducibility is at the level of training dynamics, not exact bits.
Invisibly returns seed.
ggml_set_seed(42)
a <- runif(3)
ggml_set_seed(42)
b <- runif(3)
identical(a, b)  # TRUE
Sets all elements of a tensor to zero. This is more efficient than manually setting all elements.
ggml_set_zero(tensor)
tensor |
Tensor to zero out |
NULL (invisible)
ctx <- ggml_init(1024 * 1024)
t <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
ggml_set_f32(t, 1:10)
ggml_set_zero(t)
ggml_get_f32(t)
ggml_free(ctx)
Creates a graph node for element-wise sign function. sgn(x) = -1 if x < 0, 0 if x == 0, 1 if x > 0
ggml_sgn(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the sign operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-2, -0.5, 0, 0.5, 2))
r <- ggml_sgn(ctx, a)
graph <- ggml_build_forward_expand(ctx, r)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(r)  # c(-1, -1, 0, 1, 1)
ggml_free(ctx)
Creates a graph node for sigmoid activation: 1 / (1 + exp(-x))
ggml_sigmoid(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the sigmoid operation
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-2, -1, 0, 1, 2))
result <- ggml_sigmoid(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_get_f32(result)
ggml_free(ctx)
Creates a graph node for in-place sigmoid activation: 1 / (1 + e^(-x))
ggml_sigmoid_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with sigmoid applied
Creates a graph node for SiLU (Sigmoid Linear Unit) activation, also known as Swish: x * sigmoid(x). Used in the feed-forward blocks of LLaMA-style models.
ggml_silu(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the SiLU operation
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-2, -1, 0, 1, 2))
result <- ggml_silu(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_get_f32(result)
ggml_free(ctx)
Computes the backward pass for SiLU (Swish) activation. Used during training for gradient computation.
ggml_silu_back(ctx, a, b)
ctx |
GGML context |
a |
Forward input tensor |
b |
Gradient tensor from upstream |
Gradient tensor for the input
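As a reference for ggml_silu() and ggml_silu_back(), the base-R sketch below writes out SiLU and its derivative, d/dx [x * sigmoid(x)] = s * (1 + x * (1 - s)) with s = sigmoid(x), and checks the derivative against finite differences. The helpers are hypothetical and only illustrate the math.

```r
# SiLU and its derivative in base R (reference helpers)
sigmoid  <- function(x) 1 / (1 + exp(-x))
silu_ref <- function(x) x * sigmoid(x)
silu_back_ref <- function(x, dy) {
  s <- sigmoid(x)
  dy * s * (1 + x * (1 - s))      # d/dx [x * sigmoid(x)] times upstream gradient
}

x  <- c(-2, -1, 0, 1, 2)
dy <- rep(1, 5)
h  <- 1e-6
num <- (silu_ref(x + h) - silu_ref(x - h)) / (2 * h)   # central differences
max(abs(num - silu_back_ref(x, dy)))                    # tiny
```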
Creates a graph node for in-place SiLU (Sigmoid Linear Unit) activation, which saves memory in LLaMA-style models.
ggml_silu_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with SiLU applied
Creates a graph node for element-wise sine: sin(x)
ggml_sin(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the sin operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(0, pi/6, pi/2, pi))
result <- ggml_sin(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # approximately [0, 0.5, 1, 0]
ggml_free(ctx)
Creates a graph node for the softmax operation, a core building block of attention. The softmax is computed along the first dimension (ne0) of each row.
ggml_soft_max(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the softmax operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(1, 2, 3, 4))
result <- ggml_soft_max(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # Output sums to 1.0
ggml_free(ctx)
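For checking softmax outputs in R, the usual numerically stable formulation subtracts the maximum before exponentiating. This does not change the result but avoids overflow for large inputs. softmax_ref is a hypothetical base-R helper, not part of ggmlR.

```r
# Numerically stable softmax in base R (reference helper)
softmax_ref <- function(x) {
  e <- exp(x - max(x))   # subtracting the max avoids overflow without changing the result
  e / sum(e)
}

p <- softmax_ref(c(1, 2, 3, 4))
sum(p)                          # 1
softmax_ref(c(1000, 1001))      # finite, unlike exp(1000) / sum(exp(c(1000, 1001)))
```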
Creates a graph node for a fused softmax with optional masking and ALiBi (Attention with Linear Biases) support. Computes softmax(a * scale + mask * slope), where slope is the per-head ALiBi slope (1 when ALiBi is disabled). This fused form is the efficient path for attention in transformers.
ggml_soft_max_ext(ctx, a, mask = NULL, scale = 1, max_bias = 0)
ctx |
GGML context |
a |
Input tensor (typically attention scores) |
mask |
Optional attention mask tensor (F16 or F32). NULL for no mask. Shape must be broadcastable to input tensor. |
scale |
Scaling factor, typically 1/sqrt(head_dim) |
max_bias |
Maximum ALiBi bias (0.0 to disable ALiBi) |
This extended softmax is commonly used in transformer attention:
1. Scale attention scores by 1/sqrt(d_k) for numerical stability
2. Apply an attention mask (e.g., causal mask, padding mask)
3. Optionally apply the ALiBi position bias
4. Compute the softmax
All these operations are fused for efficiency.
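Steps 1, 2 and 4 can be reproduced unfused in base R to see what the fused node computes when max_bias = 0 (ALiBi off). In this sketch each R row holds one query's scores over the keys, and a causal mask sets future positions to -Inf so they receive zero weight; the sizes and the softmax_ref helper are illustrative assumptions, not ggmlR API.

```r
# Scaled, causally masked attention softmax (unfused base R sketch, ALiBi off)
softmax_ref <- function(x) { e <- exp(x - max(x)); e / sum(e) }

n <- 4; head_dim <- 16
set.seed(7)
scores <- matrix(rnorm(n * n), n, n)          # rows = queries, cols = keys
mask <- matrix(0, n, n)
mask[upper.tri(mask)] <- -Inf                 # query i may not see keys j > i

scale <- 1 / sqrt(head_dim)
attn <- t(apply(scores * scale + mask, 1, softmax_ref))  # softmax over each row
round(attn, 3)   # lower-triangular, each row sums to 1
```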
Tensor representing the scaled and masked softmax
ctx <- ggml_init(16 * 1024 * 1024) scores <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 10, 10) ggml_set_f32(scores, rnorm(100)) attn <- ggml_soft_max_ext(ctx, scores, NULL, 1.0, max_bias = 0.0) graph <- ggml_build_forward_expand(ctx, attn) ggml_graph_compute(ctx, graph) ggml_free(ctx)ctx <- ggml_init(16 * 1024 * 1024) scores <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 10, 10) ggml_set_f32(scores, rnorm(100)) attn <- ggml_soft_max_ext(ctx, scores, NULL, 1.0, max_bias = 0.0) graph <- ggml_build_forward_expand(ctx, attn) ggml_graph_compute(ctx, graph) ggml_free(ctx)
Backward pass for extended softmax operation.
ggml_soft_max_ext_back(ctx, a, b, scale = 1, max_bias = 0)
ctx |
GGML context |
a |
Softmax output tensor (from forward pass) |
b |
Gradient tensor from upstream |
scale |
Scale factor (same as forward pass) |
max_bias |
Maximum ALiBi bias (same as forward pass) |
Gradient tensor for the input
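The gradient this node represents follows from the softmax Jacobian. For y = softmax(scale * x) and upstream gradient dy, dx = scale * y * (dy - sum(y * dy)). The base-R sketch below (hypothetical helpers) verifies the formula with finite differences for ALiBi off; it illustrates the math and does not settle which ggmlR argument holds y and which holds dy.

```r
# Backward pass of y = softmax(scale * x) in base R (reference helpers)
softmax_ref <- function(x) { e <- exp(x - max(x)); e / sum(e) }
softmax_back_ref <- function(y, dy, scale = 1) {
  scale * y * (dy - sum(y * dy))        # Jacobian-vector product of softmax
}

set.seed(3)
x <- rnorm(5); dy <- rnorm(5); s <- 0.5
y <- softmax_ref(s * x)
loss <- function(x) sum(dy * softmax_ref(s * x))
h <- 1e-6
num <- sapply(1:5, function(j) {
  e <- replace(numeric(5), j, h)        # perturb one coordinate
  (loss(x + e) - loss(x - e)) / (2 * h)
})
max(abs(num - softmax_back_ref(y, dy, s)))  # tiny
```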
Creates a graph node for the backward pass of extended softmax, modifying in place.
ggml_soft_max_ext_back_inplace(ctx, a, b, scale = 1, max_bias = 0)
ctx |
GGML context |
a |
Gradient tensor from upstream |
b |
Softmax output from forward pass |
scale |
Scaling factor used in forward pass |
max_bias |
Maximum ALiBi bias used in forward pass |
View of input tensor with gradient computed in place
Other softmax:
ggml_soft_max_ext_inplace()
Creates a graph node for extended softmax, modifying input tensor in place. Returns a view of the input tensor.
ggml_soft_max_ext_inplace(ctx, a, mask = NULL, scale = 1, max_bias = 0)
ctx |
GGML context |
a |
Input tensor (typically attention scores) |
mask |
Optional attention mask tensor (F16 or F32). NULL for no mask. Shape must be broadcastable to input tensor. |
scale |
Scaling factor, typically 1/sqrt(head_dim) |
max_bias |
Maximum ALiBi bias (0.0 to disable ALiBi) |
View of input tensor with softmax applied in place
Other softmax:
ggml_soft_max_ext_back_inplace()
Creates a graph node for in-place softmax operation. Returns a view of the input tensor.
ggml_soft_max_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of input tensor with softmax applied
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(1, 2, 3, 4))
result <- ggml_soft_max_inplace(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
Creates a graph node for Softplus activation. Softplus(x) = log(1 + exp(x)). A smooth approximation of ReLU.
ggml_softplus(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the Softplus operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-2, -1, 0, 1, 2))
r <- ggml_softplus(ctx, a)
graph <- ggml_build_forward_expand(ctx, r)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(r)
ggml_free(ctx)
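To compare outputs, softplus can be computed in base R without overflow using the identity log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|)). This is a reference sketch only. GGML's own kernel may use a different formulation, for example a cutoff for large x. softplus_ref() is an illustrative helper.

```r
# Softplus log(1 + exp(x)), written in a form that does not overflow:
# log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|))
softplus_ref <- function(x) pmax(x, 0) + log1p(exp(-abs(x)))

x <- c(-2, -1, 0, 1, 2)
print(softplus_ref(x))      # matches log(1 + exp(x)) for these inputs
print(softplus_ref(1000))   # 1000, whereas log(1 + exp(1000)) is Inf
```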
Creates a graph node for the in-place softplus activation, log(1 + exp(x)).
ggml_softplus_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with softplus applied
Sort Order Constants
GGML_SORT_ORDER_ASC
GGML_SORT_ORDER_DESC
Integer constants
An object of class integer of length 1.
Constants for specifying sort order in argsort operations.
GGML_SORT_ORDER_ASC (0): Ascending order (smallest first)
GGML_SORT_ORDER_DESC (1): Descending order (largest first)
An integer constant representing a sort order
GGML_SORT_ORDER_ASC   # 0 - Ascending order
GGML_SORT_ORDER_DESC  # 1 - Descending order
# Usage with ggml_argsort
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(3, 1, 4, 1, 5))
# Get ascending sort indices
idx_asc <- ggml_argsort(ctx, a, GGML_SORT_ORDER_ASC)
# Get descending sort indices
idx_desc <- ggml_argsort(ctx, a, GGML_SORT_ORDER_DESC)
ggml_free(ctx)
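The two constants correspond to ascending and descending ordering in base R. The minimal cross-check below assumes that ggml_argsort returns 0-based I32 indices, following GGML's indexing convention. Base R's order() is 1-based, hence the - 1L. Tied values (the two 1s here) may come out in a different order from GGML's, so the check compares the sorted values rather than the raw index vectors.

```r
x <- c(3, 1, 4, 1, 5)

# 0-based index vectors, as GGML uses
idx_asc  <- order(x) - 1L                     # GGML_SORT_ORDER_ASC
idx_desc <- order(x, decreasing = TRUE) - 1L  # GGML_SORT_ORDER_DESC

# Map back to 1-based R indexing to read the values in sorted order
print(x[idx_asc + 1L])   # 1 1 3 4 5
print(x[idx_desc + 1L])  # 5 4 3 1 1
```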
Creates a graph node for element-wise squaring: x^2
ggml_sqr(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the square operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(1, 2, 3, 4))
result <- ggml_sqr(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # [1, 4, 9, 16]
ggml_free(ctx)
Creates a graph node for in-place element-wise square: x^2
ggml_sqr_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with squared values
Creates a graph node for element-wise square root: sqrt(x)
ggml_sqrt(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the sqrt operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4)
ggml_set_f32(a, c(1, 4, 9, 16))
result <- ggml_sqrt(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # [1, 2, 3, 4]
ggml_free(ctx)
Creates a graph node for in-place element-wise square root.
ggml_sqrt_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with square root values
Creates a graph node for the element-wise step function: step(x) = 0 if x <= 0, and 1 if x > 0. This is also known as the Heaviside step function.
ggml_step(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the step operation
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-2, -0.5, 0, 0.5, 2))
r <- ggml_step(ctx, a)
graph <- ggml_build_forward_expand(ctx, r)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(r)  # c(0, 0, 0, 1, 1)
ggml_free(ctx)
Creates a graph node for element-wise subtraction.
ggml_sub(ctx, a, b)
ctx |
GGML context |
a |
First tensor |
b |
Second tensor (same shape as a) |
Tensor representing the subtraction operation (a - b)
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
b <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(5, 4, 3, 2, 1))
ggml_set_f32(b, c(1, 1, 1, 1, 1))
result <- ggml_sub(ctx, a, b)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_get_f32(result)
ggml_free(ctx)
Creates a graph node for in-place element-wise subtraction. The result is stored in tensor a, avoiding a new allocation.
ggml_sub_inplace(ctx, a, b)
ctx |
GGML context |
a |
First tensor (will be modified in-place) |
b |
Second tensor (same shape as a) |
View of tensor a with the subtraction result
Creates a graph node that computes the sum of all elements.
ggml_sum(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Scalar tensor with the sum
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(1, 2, 3, 4, 5))
result <- ggml_sum(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # 15
ggml_free(ctx)
Creates a graph node that sums the elements of each row (along dimension 0, ne0), producing one value per row.
ggml_sum_rows(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor with row sums
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 3, 2)
ggml_set_f32(a, c(1, 2, 3, 4, 5, 6))
result <- ggml_sum_rows(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
output <- ggml_get_f32(result)  # [6, 15]
ggml_free(ctx)
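The expected [6, 15] follows from GGML's layout, where ne0 is the fastest-varying dimension. A tensor with ne0 = 3 and ne1 = 2 filled with 1:6 therefore has rows c(1, 2, 3) and c(4, 5, 6). R is column-major, so the same flat buffer read as matrix(buf, nrow = ne0) puts each GGML row in an R column, and a row sum in GGML becomes colSums() in R. This is a base-R illustration of that mapping, not a ggmlR call.

```r
ne0 <- 3; ne1 <- 2
buf <- c(1, 2, 3, 4, 5, 6)          # flat data as passed to ggml_set_f32()

# Each column of m is one GGML row (ne0 contiguous elements)
m <- matrix(buf, nrow = ne0, ncol = ne1)

row_sums <- colSums(m)              # one value per GGML row
print(row_sums)                     # 6 15
```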
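Creates a graph node for the SwiGLU operation. The input is split in half along the first dimension. SiLU (Swish) is applied to the first half, which is then multiplied element-wise by the second half (the gate). SwiGLU is the feed-forward activation used in LLaMA, Mistral, and many other modern LLMs.
ggml_swiglu(ctx, a)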
ctx |
GGML context |
a |
Input tensor (first dimension must be even) |
Formula: output = SiLU(x) * gate
Tensor with half the first dimension of input
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 3)
ggml_set_f32(a, rnorm(24))
r <- ggml_swiglu(ctx, a)
graph <- ggml_build_forward_expand(ctx, r)
ggml_graph_compute(ctx, graph)
result <- ggml_get_f32(r)  # Shape: 4x3
ggml_free(ctx)
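Here is a base-R reference for SwiGLU. It assumes, following the formula above, that SiLU is applied to the first half of dimension 0 and that the second half acts as the gate. The sketch uses an R matrix with nrow = ne0, so each column holds one GGML row (GGML's ne0 varies fastest, which matches R's column-major order). silu() and swiglu_ref() are illustrative helpers.

```r
silu <- function(x) x / (1 + exp(-x))   # SiLU / Swish

# Reference SwiGLU over the first (ne0) dimension. Each column of `m`
# is one GGML row of length ne0, which must be even.
swiglu_ref <- function(m) {
  n <- nrow(m)
  stopifnot(n %% 2 == 0)
  h <- n / 2
  silu(m[seq_len(h), , drop = FALSE]) * m[h + seq_len(h), , drop = FALSE]
}

set.seed(1)
a <- matrix(rnorm(24), nrow = 8, ncol = 3)   # ne0 = 8, ne1 = 3
r <- swiglu_ref(a)
print(dim(r))  # 4 3
```

The same helper covers ggml_swiglu_split(ctx, a, b) if the two halves are passed separately, as silu(a) * b.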
Creates a graph node for SwiGLU with separate input and gate tensors.
ggml_swiglu_split(ctx, a, b)
ctx |
GGML context |
a |
Input tensor (the values to be gated) |
b |
Gate tensor (same shape as a) |
Formula: output = SiLU(a) * b
Tensor with same shape as input tensors
Creates a graph node for hyperbolic tangent activation.
ggml_tanh(ctx, a)
ctx |
GGML context |
a |
Input tensor |
Tensor representing the tanh operation
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 5)
ggml_set_f32(a, c(-2, -1, 0, 1, 2))
result <- ggml_tanh(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
ggml_get_f32(result)
ggml_free(ctx)
Creates a graph node for in-place hyperbolic tangent activation.
ggml_tanh_inplace(ctx, a)
ctx |
GGML context |
a |
Input tensor (will be modified in-place) |
View of tensor a with tanh applied
Copies raw data from the src tensor to the dst tensor. Both tensors must be the same size.
ggml_tensor_copy(dst, src)
dst |
Destination tensor |
src |
Source tensor |
NULL (invisible)
Returns the byte strides for each dimension of the tensor.
ggml_tensor_nb(tensor)
tensor |
Tensor pointer |
Numeric vector of 4 stride values (nb0, nb1, nb2, nb3)
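For a contiguous tensor of a non-quantized type, the strides follow from the element size and shape. nb0 is the element size, and each higher stride is the previous stride times the previous extent. The element at 0-based indices (i0, i1, i2, i3) then sits at byte offset i0*nb0 + i1*nb1 + i2*nb2 + i3*nb3. The plain-R sketch below assumes F32 (4 bytes per element) and computes the values one would expect ggml_tensor_nb() to report for a freshly created contiguous tensor. contiguous_nb() and byte_offset() are hypothetical helpers.

```r
# Byte strides of a contiguous tensor with shape ne = c(ne0, ne1, ne2, ne3)
contiguous_nb <- function(ne, elem_size = 4) {   # 4 bytes = F32
  nb <- numeric(4)
  nb[1] <- elem_size
  for (i in 2:4) nb[i] <- nb[i - 1] * ne[i - 1]
  nb
}

# Byte offset of element (i0, i1, i2, i3), 0-based indices
byte_offset <- function(idx, nb) sum(idx * nb)

nb <- contiguous_nb(c(10, 20, 1, 1))
print(nb)                              # 4 40 800 800
print(byte_offset(c(2, 3, 0, 0), nb))  # 2*4 + 3*40 = 128
```

Views and permutations (for example ggml_transpose) produce non-contiguous tensors whose strides no longer follow this pattern, which is why the strides are exposed separately from the shape.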
Counts the number of tensors allocated in a context.
ggml_tensor_num(ctx)
ctx |
GGML context |
Number of tensors
Returns the memory overhead, in bytes, of the metadata for each tensor.
ggml_tensor_overhead()
Size in bytes
ggml_tensor_overhead()
Sets all elements of a f32 tensor to a single value.
ggml_tensor_set_f32_scalar(tensor, value)
tensor |
Tensor pointer |
value |
Float value to fill with |
NULL (invisible)
Returns the shape of a tensor as a numeric vector of 4 elements (ne0, ne1, ne2, ne3).
ggml_tensor_shape(tensor)
tensor |
Tensor pointer |
Numeric vector of length 4 with dimensions
ctx <- ggml_init(1024 * 1024)
t <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 10, 20)
ggml_tensor_shape(t)
ggml_free(ctx)
Returns the data type of a tensor as an integer code.
ggml_tensor_type(tensor)
tensor |
Tensor pointer |
Integer type code (0 = F32, 1 = F16, etc.)
ctx <- ggml_init(1024 * 1024)
t <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
ggml_tensor_type(t)
ggml_free(ctx)
Runs a GGML library self-test and prints version information.
ggml_test()
TRUE if test passed
ggml_test()
Initializes the GGML timing system. Call this once at the beginning of the program before using ggml_time_ms() or ggml_time_us().
ggml_time_init()
NULL (invisible)
ggml_time_init()
start <- ggml_time_ms()
Sys.sleep(0.01)
elapsed <- ggml_time_ms() - start
Returns the current time in milliseconds since the timer was initialized.
ggml_time_ms()
Numeric value representing milliseconds
ggml_time_init()
start <- ggml_time_ms()
Sys.sleep(0.01)
elapsed <- ggml_time_ms() - start
Returns the current time in microseconds since the timer was initialized. More precise than ggml_time_ms() for micro-benchmarking.
ggml_time_us()
Numeric value representing microseconds
ggml_time_init()
start <- ggml_time_us()
Sys.sleep(0.001)
elapsed <- ggml_time_us() - start
Creates sinusoidal timestep embeddings as used in diffusion models. Reference: timestep_embedding in util.py of CompVis/stable-diffusion.
ggml_timestep_embedding(ctx, timesteps, dim, max_period = 10000L)
ctx |
GGML context |
timesteps |
Input tensor of timestep values [N] |
dim |
Embedding dimension |
max_period |
Maximum period for sinusoidal embedding (default 10000) |
Tensor of shape [N, dim] with timestep embeddings
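The values can be reproduced in plain R from the CompVis formula this function references. With half = dim %/% 2, freqs[j] = exp(-log(max_period) * j / half) for j = 0, ..., half - 1, and the embedding is cos(t * freqs) followed by sin(t * freqs). The sketch lays the result out as N rows by dim columns. GGML's internal dimension order may differ, and the zero-column padding for odd dim follows the CompVis reference, so GGML's handling of that case is an assumption here. timestep_embedding_ref() is a hypothetical helper.

```r
# Sinusoidal timestep embedding, following the CompVis formula:
# freqs[j] = exp(-log(max_period) * j / half), j = 0..half-1
# emb = [cos(t * freqs), sin(t * freqs)], zero-padded if dim is odd
timestep_embedding_ref <- function(timesteps, dim, max_period = 10000) {
  half  <- dim %/% 2
  freqs <- exp(-log(max_period) * (seq_len(half) - 1) / half)
  ang   <- outer(timesteps, freqs)            # N x half
  emb   <- cbind(cos(ang), sin(ang))          # N x (2 * half)
  if (dim %% 2 == 1) emb <- cbind(emb, 0)     # pad odd dim with a zero column
  emb
}

e <- timestep_embedding_ref(c(0, 1, 500), dim = 8)
print(dim(e))   # 3 8
print(e[1, ])   # timestep 0: four 1s (cos) followed by four 0s (sin)
```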
Returns the indices of the top k elements in each row. This is useful for top-k sampling in language models. Note that the returned indices are in no particular order within the top k.
ggml_top_k(ctx, a, k)
ctx |
GGML context |
a |
Input tensor (F32) |
k |
Number of top elements to return per row |
Tensor containing I32 indices of top-k elements (not values)
ctx <- ggml_init(16 * 1024 * 1024)
# Logits from model output
logits <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 100)
ggml_set_f32(logits, rnorm(100))
# Get the indices of the top 5 logits for sampling
top5 <- ggml_top_k(ctx, logits, 5)
graph <- ggml_build_forward_expand(ctx, top5)
ggml_graph_compute(ctx, graph)
ggml_free(ctx)
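A base-R reference for checking top-k results. It assumes that the returned indices are 0-based, following GGML's convention. Because the order within the top k is unspecified, results should be compared as sets. top_k_ref() and same_set() are illustrative helpers.

```r
# Indices of the k largest values, 0-based like GGML's I32 output
top_k_ref <- function(x, k) order(x, decreasing = TRUE)[seq_len(k)] - 1L

# ggml_top_k() does not guarantee an order inside the top k,
# so compare index sets rather than sequences
same_set <- function(a, b) identical(sort(as.integer(a)), sort(as.integer(b)))

set.seed(42)
logits <- rnorm(100)
idx <- top_k_ref(logits, 5)
print(logits[idx + 1L])   # the 5 largest logits, largest first
```

With a computed ggml_top_k() result read back into R as a numeric vector (called, say, `ggml_idx`, a hypothetical name), same_set(ggml_idx, idx) should hold.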
Returns the per-epoch loss / accuracy curve recorded during
ggml_fit, in a tidy data frame. This is the standard accessor
for the loss curve; it works on a raw sequential/functional model or on a
fitted parsnip engine object (e.g. from extract_fit_engine()).
ggml_training_history(object, format = c("wide", "long"), ...)
object |
A fitted |
format |
|
... |
Unused; for extensibility. |
A data frame (tibble if tibble is installed). Wide columns:
epoch, train_loss, train_accuracy, and
val_loss / val_accuracy when a validation split was used.
Returns NULL with a warning if the model has no recorded history
(e.g. not yet fitted).
Creates a graph node for matrix transpose operation.
ggml_transpose(ctx, a)
ctx |
GGML context |
a |
Input tensor (2D matrix) |
Tensor representing the transposed matrix
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 3, 2)
ggml_set_f32(a, 1:6)
result <- ggml_transpose(ctx, a)
graph <- ggml_build_forward_expand(ctx, result)
ggml_graph_compute(ctx, graph)
shape <- ggml_tensor_shape(result)  # c(2, 3, 1, 1)
ggml_free(ctx)
Constants representing different data types supported by GGML.
GGML_TYPE_F32 GGML_TYPE_F16 GGML_TYPE_Q4_0 GGML_TYPE_Q4_1 GGML_TYPE_Q8_0 GGML_TYPE_Q2_K GGML_TYPE_Q3_K GGML_TYPE_Q4_K GGML_TYPE_Q5_K GGML_TYPE_Q6_K GGML_TYPE_I32 GGML_TYPE_BF16
Integer constants
Each constant is an object of class integer of length 1.
GGML_TYPE_F32: 32-bit floating point (default)
GGML_TYPE_F16: 16-bit floating point (half precision)
GGML_TYPE_Q4_0: 4-bit quantization type 0
GGML_TYPE_Q4_1: 4-bit quantization type 1
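GGML_TYPE_Q8_0: 8-bit quantization type 0
GGML_TYPE_Q2_K, GGML_TYPE_Q3_K, GGML_TYPE_Q4_K, GGML_TYPE_Q5_K, GGML_TYPE_Q6_K: 2- to 6-bit k-quant quantization types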
GGML_TYPE_I32: 32-bit integer
GGML_TYPE_BF16: 16-bit brain float (bfloat16)
An integer constant representing a GGML data type
GGML_TYPE_F32
GGML_TYPE_F16
GGML_TYPE_I32
Returns the string name of a GGML type.
ggml_type_name(type)
type |
GGML type constant (e.g., GGML_TYPE_F32) |
Character string with type name
Other type_system:
ggml_blck_size(),
ggml_ftype_to_ggml_type(),
ggml_is_quantized(),
ggml_type_sizef()
ggml_type_name(GGML_TYPE_F32)   # "f32"
ggml_type_name(GGML_TYPE_Q4_0)  # "q4_0"
Returns the size in bytes of one block of the given type. For non-quantized types a block is a single element, so this is the element size.
ggml_type_size(type)
type |
GGML type constant (e.g., GGML_TYPE_F32) |
Size in bytes
Returns the size in bytes of a GGML type as a floating-point number. For quantized types, this is the average bytes per element.
ggml_type_sizef(type)
type |
GGML type constant |
Numeric size in bytes (can be fractional for quantized types)
Other type_system:
ggml_blck_size(),
ggml_ftype_to_ggml_type(),
ggml_is_quantized(),
ggml_type_name()
ggml_type_sizef(GGML_TYPE_F32)  # 4.0
ggml_type_sizef(GGML_TYPE_F16)  # 2.0
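The relationship between bytes per block (ggml_type_size), elements per block (ggml_blck_size) and average bytes per element (ggml_type_sizef) can be worked through by hand. The plain-R sketch below hard-codes the block layouts in its comments as assumptions drawn from GGML's quantization formats. Where possible, prefer the live values returned by the package's own functions. The example also estimates the storage needed for a 4096 x 4096 weight matrix in each format.

```r
# Block layouts assumed here (from GGML's quantization formats):
#   F32 : 1 element per block, 4 bytes
#   F16 : 1 element per block, 2 bytes
#   Q8_0: 32 elements per block, 34 bytes (fp16 scale + 32 int8 values)
#   Q4_0: 32 elements per block, 18 bytes (fp16 scale + 32 packed 4-bit values)
layouts <- data.frame(
  type      = c("f32", "f16", "q8_0", "q4_0"),
  blck_size = c(1, 1, 32, 32),
  type_size = c(4, 2, 34, 18)
)

# Average bytes per element, as ggml_type_sizef() describes
layouts$sizef <- layouts$type_size / layouts$blck_size

# Storage for a 4096 x 4096 weight matrix, in MiB
n <- 4096 * 4096
layouts$mib <- n / layouts$blck_size * layouts$type_size / 2^20
print(layouts)
```

For these layouts Q4_0 needs 0.5625 bytes per element, roughly 7x smaller than F32, because each block of 32 values also carries its own scale.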
Returns the string name of a GGML unary operation.
ggml_unary_op_name(op)
op |
GGML unary operation constant |
Character string with operation name
Other op_info:
ggml_get_unary_op(),
ggml_op_desc(),
ggml_op_name(),
ggml_op_symbol()
Sets trainable = TRUE on layers. Accepts optional from / to
to unfreeze a range of layers, or layer_names to unfreeze by name.
If none are provided, all layers are unfrozen.
ggml_unfreeze_weights( model, from = 1L, to = length(model$layers), layer_names = NULL, ... )
model |
A model object (ggml_sequential_model or ggml_functional_model) |
from |
Integer index of the first layer to unfreeze (default: 1) |
to |
Integer index of the last layer to unfreeze (default: last layer) |
layer_names |
Character vector of layer names to unfreeze (overrides from/to) |
... |
Additional arguments passed to methods |
The model with selected layers unfrozen.
model <- ggml_model_sequential() |>
  ggml_layer_dense(64, activation = "relu") |>
  ggml_layer_dense(10, activation = "softmax")
model <- ggml_freeze_weights(model)
model <- ggml_unfreeze_weights(model, from = 2)  # unfreeze last layer only
Reconstructs a ggmlR model previously produced by
ggml_marshal_model. Validates the container's format tag,
schema version, and (if digest is installed) the SHA-256 checksum of
the payload before deserializing.
ggml_unmarshal_model(x, backend = NULL)
x |
A |
backend |
Backend selection passed through to
|
A compiled ggmlR model object (sequential or functional).
ggml_marshal_model, ggml_load_model
Upscales a tensor by multiplying ne0 and ne1 by the scale factor. Several interpolation modes are supported for image upscaling.
ggml_upscale(ctx, a, scale_factor, mode = 0L)
GGML_SCALE_MODE_NEAREST
GGML_SCALE_MODE_BILINEAR
GGML_SCALE_MODE_BICUBIC
ctx |
GGML context |
a |
Input tensor (typically 2D or 4D for images) |
scale_factor |
Integer scale factor (e.g., 2 = double size) |
mode |
Scale mode constant (see details) |
Each scale-mode constant is an object of class integer of length 1.
Scale mode constants:
GGML_SCALE_MODE_NEAREST (0): Nearest neighbor interpolation - fastest, pixelated
GGML_SCALE_MODE_BILINEAR (1): Bilinear interpolation - smooth, good balance
GGML_SCALE_MODE_BICUBIC (2): Bicubic interpolation - smoothest, most compute
Upscaled tensor with dimensions multiplied by scale_factor
ctx <- ggml_init(16 * 1024 * 1024)
img <- ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 8)
ggml_set_f32(img, rnorm(64))
# Nearest neighbor (fastest, pixelated)
up_nearest <- ggml_upscale(ctx, img, 2, GGML_SCALE_MODE_NEAREST)
# Bilinear (smooth)
up_bilinear <- ggml_upscale(ctx, img, 2, GGML_SCALE_MODE_BILINEAR)
# Bicubic (smoothest)
up_bicubic <- ggml_upscale(ctx, img, 2, GGML_SCALE_MODE_BICUBIC)
graph <- ggml_build_forward_expand(ctx, up_nearest)
ggml_graph_compute(ctx, graph)
# Result is 16x16
ggml_free(ctx)
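Conceptually, GGML_SCALE_MODE_NEAREST with an integer factor s turns each input pixel into an s x s block of identical values. The base-R sketch below shows only that idea. It does not reproduce GGML's exact sampling rules, such as pixel-centre alignment, and it does not cover the bilinear or bicubic kernels. upscale_nearest_ref() is a hypothetical helper.

```r
# Nearest-neighbour upscaling of a 2D array by an integer factor:
# every input value is repeated `s` times along each axis.
upscale_nearest_ref <- function(img, s) {
  img[rep(seq_len(nrow(img)), each = s),
      rep(seq_len(ncol(img)), each = s), drop = FALSE]
}

img <- matrix(1:4, nrow = 2)       # 2 x 2
up  <- upscale_nearest_ref(img, 2)
print(dim(up))                      # 4 4
print(up)
```

The same result can be written as kronecker(img, matrix(1, s, s)). Index repetition is used here because it avoids the multiplication and keeps the input's storage type.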
Returns the amount of memory currently used in the context.
ggml_used_mem(ctx)
ctx |
GGML context |
Used memory in bytes
ctx <- ggml_init(1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 100)
ggml_used_mem(ctx)
ggml_free(ctx)
Returns the version of the GGML library.
ggml_version()
Character string with GGML version
ggml_version()
Creates a 1D view of a tensor starting at a byte offset. The view shares memory with the source tensor.
ggml_view_1d(ctx, a, ne0, offset = 0)
ctx |
GGML context |
a |
Source tensor |
ne0 |
Number of elements in the view |
offset |
Byte offset from the start of tensor data |
View tensor
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 100)
# View of 0-based elements 10-19 (offset = 10 elements * 4 bytes = 40)
v <- ggml_view_1d(ctx, a, 10, 40)
ggml_free(ctx)
Creates a 2D view of a tensor starting at a byte offset. The view shares memory with the source tensor.
ggml_view_2d(ctx, a, ne0, ne1, nb1, offset = 0)
ctx |
GGML context |
a |
Source tensor |
ne0 |
Size of dimension 0 |
ne1 |
Size of dimension 1 |
nb1 |
Stride for dimension 1 (in bytes) |
offset |
Byte offset from the start of tensor data |
View tensor
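To view a sub-block of a contiguous matrix, nb1 is normally the parent's row stride, so that the view steps over full parent rows, and offset is the byte position of the block's first element. The plain-R sketch below works through that arithmetic for a hypothetical F32 parent with ne0 = 8 and ne1 = 6, taking elements 2..4 of rows 1..3 (0-based). It then simulates the view on a buffer whose values equal their flat index and checks the result against ordinary R matrix indexing. The resulting nb1 and offset are the numbers one would pass to ggml_view_2d().

```r
elem <- 4                          # bytes per F32 element (assumed type)
ne0_parent <- 8; ne1_parent <- 6   # contiguous parent tensor
nb1_parent <- ne0_parent * elem    # 32 bytes from one row to the next

# Hypothetical view: elements 2..4 of rows 1..3 (0-based), a 3 x 3 block
i0 <- 2; i1 <- 1
view_ne0 <- 3; view_ne1 <- 3
nb1    <- nb1_parent                    # the view keeps the parent's row stride
offset <- i0 * elem + i1 * nb1_parent   # 2*4 + 1*32 = 40 bytes

# Simulate the view on a parent whose values equal their flat element index
parent <- 0:(ne0_parent * ne1_parent - 1)
view_vals <- outer(0:(view_ne0 - 1), 0:(view_ne1 - 1), function(j0, j1)
  parent[(offset + j0 * elem + j1 * nb1) / elem + 1])

# Same block read directly from the R (column-major) matrix of the parent
ref <- matrix(parent, nrow = ne0_parent)[i0 + seq_len(view_ne0),
                                         i1 + seq_len(view_ne1)]
print(all(view_vals == ref))  # TRUE
```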
Creates a 3D view of a tensor starting at a byte offset. The view shares memory with the source tensor.
ggml_view_3d(ctx, a, ne0, ne1, ne2, nb1, nb2, offset = 0)
ctx |
GGML context |
a |
Source tensor |
ne0 |
Size of dimension 0 |
ne1 |
Size of dimension 1 |
ne2 |
Size of dimension 2 |
nb1 |
Stride for dimension 1 (in bytes) |
nb2 |
Stride for dimension 2 (in bytes) |
offset |
Byte offset from the start of tensor data |
View tensor
Creates a 4D view of a tensor starting at a byte offset. The view shares memory with the source tensor. It is commonly used for KV-cache operations in transformers.
ggml_view_4d(ctx, a, ne0, ne1, ne2, ne3, nb1, nb2, nb3, offset = 0)
ctx |
GGML context |
a |
Source tensor |
ne0 |
Size of dimension 0 |
ne1 |
Size of dimension 1 |
ne2 |
Size of dimension 2 |
ne3 |
Size of dimension 3 |
nb1 |
Stride for dimension 1 (in bytes) |
nb2 |
Stride for dimension 2 (in bytes) |
nb3 |
Stride for dimension 3 (in bytes) |
offset |
Byte offset from the start of tensor data |
View tensor
Creates a view of the tensor (shares data, no copy).
ggml_view_tensor(ctx, src)
ctx |
GGML context |
src |
Source tensor |
View tensor (shares data with src)
ctx <- ggml_init(16 * 1024 * 1024)
a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
view <- ggml_view_tensor(ctx, a)
# view shares data with a
ggml_free(ctx)
Returns TRUE if the package was compiled with Vulkan support. To enable Vulkan, install libvulkan-dev and glslc, then reinstall ggmlR.
ggml_vulkan_available()
Logical indicating if Vulkan is available
ggml_vulkan_available()
Returns the name of the Vulkan backend (includes device info).
ggml_vulkan_backend_name(backend)
backend |
Vulkan backend pointer |
Character string with backend name
if (ggml_vulkan_available() && ggml_vulkan_device_count() > 0) {
  backend <- ggml_vulkan_init(0)
  print(ggml_vulkan_backend_name(backend))
  ggml_vulkan_free(backend)
}
Returns hardware capabilities for the specified Vulkan device.
ggml_vulkan_device_caps(device = 0L)
device |
Device index (0-based, default 0) |
Named list: coopmat_support, coopmat1_fa_support, fp16, subgroup_size, subgroup_no_shmem
Returns the number of available Vulkan-capable GPU devices.
ggml_vulkan_device_count()
Integer count of Vulkan devices (0 if Vulkan not available)
if (ggml_vulkan_available()) {
  ggml_vulkan_device_count()
}
Returns a human-readable description of the specified Vulkan device.
ggml_vulkan_device_description(device = 0L)
device |
Device index (0-based) |
Character string with device description
if (ggml_vulkan_available() && ggml_vulkan_device_count() > 0) {
  ggml_vulkan_device_description(0)
}
Returns free and total memory for the specified Vulkan device.
ggml_vulkan_device_memory(device = 0L)
device |
Device index (0-based) |
Named list with 'free' and 'total' memory in bytes
if (ggml_vulkan_available() && ggml_vulkan_device_count() > 0) {
  mem <- ggml_vulkan_device_memory(0)
  cat("Free:", mem$free / 1e9, "GB\n")
  cat("Total:", mem$total / 1e9, "GB\n")
}
Releases resources associated with the Vulkan backend.
ggml_vulkan_free(backend)
backend |
Vulkan backend pointer from ggml_vulkan_init() |
NULL (invisible)
if (ggml_vulkan_available() && ggml_vulkan_device_count() > 0) {
  backend <- ggml_vulkan_init(0)
  ggml_vulkan_free(backend)
}
Creates a Vulkan backend for the specified device. The backend must be freed with ggml_vulkan_free() when done.
ggml_vulkan_init(device = 0L)
device |
Device index (0-based, default 0) |
Vulkan backend pointer
if (ggml_vulkan_available() && ggml_vulkan_device_count() > 0) {
  backend <- ggml_vulkan_init(0)
  print(ggml_vulkan_backend_name(backend))
  ggml_vulkan_free(backend)
}
Returns TRUE if the given backend is a Vulkan backend.
ggml_vulkan_is_backend(backend)
backend |
Backend pointer |
Logical indicating if backend is Vulkan
if (ggml_vulkan_available() && ggml_vulkan_device_count() > 0) {
  vk_backend <- ggml_vulkan_init(0)
  cpu_backend <- ggml_backend_cpu_init()
  # print() is needed: values inside a block are not auto-printed
  print(ggml_vulkan_is_backend(vk_backend))   # TRUE
  print(ggml_vulkan_is_backend(cpu_backend))  # FALSE
  ggml_vulkan_free(vk_backend)
  ggml_backend_free(cpu_backend)
}
Returns detailed information about all available Vulkan devices.
ggml_vulkan_list_devices()
List of device information (index, name, memory)
if (ggml_vulkan_available() && ggml_vulkan_device_count() > 0) {
  devices <- ggml_vulkan_list_devices()
  print(devices)
}
Prints information about Vulkan availability and devices.
ggml_vulkan_status()
NULL (invisible), prints status to console
ggml_vulkan_status()
Partitions a tensor into non-overlapping windows of size w.
ggml_win_part(ctx, a, w)
ctx |
GGML context |
a |
Input tensor |
w |
Window size |
Partitioned tensor
Reassembles windowed partitions produced by ggml_win_part.
ggml_win_unpart(ctx, a, w0, h0, w)
ctx |
GGML context |
a |
Input tensor |
w0 |
Original width |
h0 |
Original height |
w |
Window size |
Un-partitioned tensor
Creates a temporary context, executes code, and frees it automatically. Useful when you need to create large temporary tensors.
ggml_with_temp_ctx(mem_size, expr)
mem_size |
Context memory size in bytes |
expr |
Expression to evaluate with the temporary context |
Result of the expression
# Create tensors in a temporary context
result <- ggml_with_temp_ctx(1024 * 1024, {
  a <- ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 10)
  ggml_set_f32(a, 1:10)
  ggml_get_f32(a)
})
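In R, the usual way to guarantee that a resource is released after evaluating some code is on.exit(), which runs both on normal return and when an error unwinds the call. The minimal sketch below illustrates that acquire / evaluate / release pattern. with_resource(), acquire() and release() are hypothetical names, not ggmlR functions, and the sketch makes no claim about how ggml_with_temp_ctx() is implemented internally.

```r
# Hypothetical helper illustrating the acquire / evaluate / release pattern
with_resource <- function(acquire, release, expr_fn) {
  res <- acquire()
  on.exit(release(res), add = TRUE)   # runs on normal return and on error
  expr_fn(res)
}

events  <- character()
acquire <- function() { events <<- c(events, "acquire"); "handle" }
release <- function(h) events <<- c(events, paste("release", h))

out <- with_resource(acquire, release, function(h) nchar(h))
print(out)     # 6

try(with_resource(acquire, release, function(h) stop("boom")), silent = TRUE)
print(events)  # "acquire" "release handle" "acquire" "release handle"
```

The sketch passes a function rather than an unevaluated expression, which keeps it free of non-standard evaluation. A helper that accepts a braced block, as ggml_with_temp_ctx() does, has to evaluate that block in an environment where the temporary context is bound.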
Explicitly frees the internal GGUF context. Called automatically by the garbage collector, but can be called manually to release memory sooner.
gguf_free(x)
x |
A |
Called for its side effect (releases the GGUF context); invisibly returns NULL.
Opens a GGUF file and reads its metadata. By default also reads tensor data
into memory; with meta_only = TRUE only the header and key-value
metadata are read (no tensor data is allocated), which is cheap and enough
for inspecting architecture / type fields. Returns an S3 object of class
"gguf" wrapping the internal pointer.
gguf_load(path, meta_only = FALSE)
path |
Path to a .gguf file. |
meta_only |
If |
An object of class "gguf".
Returns all key-value metadata pairs from a GGUF file as a named list.
gguf_metadata(x)
x |
A |
A named list of metadata values.
Dequantizes (if needed) and returns tensor weights as an R numeric array with dimensions matching the tensor shape.
gguf_tensor_data(x, name)
x |
A |
name |
Tensor name (character). |
A numeric array.
Returns name, shape, type, and size in bytes for a single tensor.
gguf_tensor_info(x, name)
x |
A |
name |
Tensor name (character). |
When the file was opened with meta_only = TRUE, the per-dimension
shape is NA (the public GGUF API does not expose tensor
dimensions without allocating tensors); name, type and
size_bytes are still returned.
A list with elements name, shape, type,
size_bytes.
List Tensor Names in a GGUF File
gguf_tensor_names(x)
x |
A gguf object, as returned by gguf_load(). |
Character vector of tensor names.
Returns a single-row tibble summarising the fitted model,
in broom glance() style.
## S3 method for class 'ggmlr_parsnip_model'
glance(x, ...)
x |
A fitted ggmlr_parsnip_model object, e.g. from parsnip::extract_fit_engine(). |
... |
Unused; for generic compatibility. |
A one-row tibble with columns: mode, n_features, n_layers,
total_params, optimizer, loss, backend, epochs, fit_time (wall
seconds) and final_loss (last training loss, NA if no history).
spec <- parsnip::mlp(hidden_units = 8L, epochs = 3L) |>
  parsnip::set_engine("ggml", backend = "cpu") |>
  parsnip::set_mode("regression")
fit_obj <- parsnip::fit(spec, mpg ~ ., data = mtcars)
generics::glance(parsnip::extract_fit_engine(fit_obj))
Frees lookup tables for IQ2 quantization types.
iq2xs_free_impl(type)
type |
GGML type constant |
NULL invisibly
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Initializes the lookup tables for IQ2-family quantization types. Must be called before quantizing to iq2_xxs, iq2_xs or iq2_s (in ggml, iq1_s and iq1_m rely on the same tables).
iq2xs_init_impl(type)
type |
GGML type constant (e.g., GGML_TYPE_IQ2_XXS()) |
NULL invisibly
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Frees lookup tables for IQ3 quantization types.
iq3xs_free_impl(grid_size)
grid_size |
Grid size for IQ3 |
NULL invisibly
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Initializes lookup tables for IQ3 quantization types. Must be called before using iq3_xxs or iq3_s quantization.
iq3xs_init_impl(grid_size)
grid_size |
Grid size for IQ3: 256 for iq3_xxs, 512 for iq3_s |
NULL invisibly
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Varies the learning rate along a cosine curve from the optimizer's initial learning rate (lr_max) down to
lr_min over T_max steps. If restart = TRUE, the cycle restarts from lr_max
after every T_max steps (SGDR-style).
lr_scheduler_cosine(optimizer, T_max, lr_min = 0, restart = FALSE)
optimizer |
Optimizer environment. |
T_max |
Number of steps for one cosine cycle. |
lr_min |
Minimum learning rate (default 0). |
restart |
Logical; if TRUE, restart the cosine cycle every T_max steps (SGDR-style). |
An lr_scheduler_cosine environment
w <- ag_param(matrix(runif(4), 2, 2))
opt <- optimizer_adam(list(w = w), lr = 0.1)
sch <- lr_scheduler_cosine(opt, T_max = 50L)
for (epoch in 1:50) sch$step()
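The cosine curve described above is commonly computed as lr = lr_min + (lr_max - lr_min) * (1 + cos(pi * t / T_max)) / 2, where t is the number of steps taken. The sketch below evaluates that formula in base R. cosine_lr() is a hypothetical helper; treating lr_max as the optimizer's starting learning rate, clamping t at T_max without restarts, and wrapping t modulo T_max with restarts are assumptions about how lr_scheduler_cosine() behaves.

```r
# Hypothetical helper: learning rate after `step` scheduler steps.
# lr_max is assumed to be the optimizer's starting learning rate.
cosine_lr <- function(step, lr_max, T_max, lr_min = 0, restart = FALSE) {
  # Position within the current cycle: wrap around on restart, else clamp.
  t <- if (restart) step %% T_max else pmin(step, T_max)
  lr_min + (lr_max - lr_min) * (1 + cos(pi * t / T_max)) / 2
}

# Schedule for the example above: 0.1 at step 0, 0.05 halfway, 0 at step 50.
lrs <- cosine_lr(0:50, lr_max = 0.1, T_max = 50)
```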
Multiplies the optimizer learning rate by gamma every
step_size calls to $step().
lr_scheduler_step(optimizer, step_size, gamma = 0.1)
optimizer |
An optimizer environment, e.g. from optimizer_adam() or optimizer_sgd(). |
step_size |
Decay every this many steps (epochs). |
gamma |
Multiplicative decay factor (default 0.1). |
An lr_scheduler_step environment
w <- ag_param(matrix(runif(4), 2, 2))
opt <- optimizer_adam(list(w = w), lr = 0.1)
sch <- lr_scheduler_step(opt, step_size = 10L, gamma = 0.5)
for (epoch in 1:30) sch$step()
opt$lr  # 0.1 * 0.5^3 = 0.0125
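The step-decay rule can be mimicked with a small environment that holds a call counter, which also shows why the example ends at 0.0125: with step_size = 10, the 10th, 20th and 30th calls each multiply the rate by gamma. make_step_scheduler() is hypothetical and only assumes that the decay is applied on every step_size-th call to $step().

```r
# Hypothetical step-decay scheduler: multiply lr by gamma on every
# step_size-th call to $step().
make_step_scheduler <- function(lr, step_size, gamma = 0.1) {
  state <- new.env()
  state$lr <- lr
  state$n_calls <- 0L
  state$step <- function() {
    state$n_calls <- state$n_calls + 1L
    if (state$n_calls %% step_size == 0L) state$lr <- state$lr * gamma
    invisible(state$lr)
  }
  state
}

sch <- make_step_scheduler(0.1, step_size = 10L, gamma = 0.5)
for (epoch in 1:30) sch$step()
```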
Topologically sort nodes reachable from output nodes
nn_topo_sort(outputs)
outputs |
List of output ggml_tensor_node objects |
Named list: nodes in topological order (inputs first, outputs last)
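One standard way to produce this ordering is a depth-first search from the outputs that appends each node only after all of its inputs. The sketch below uses a hypothetical graph encoding (a named list mapping each node name to the names of its inputs) rather than ggml_tensor_node objects, and it assumes the graph is acyclic; topo_sort() is not a package function.

```r
# Hypothetical graph encoding: node name -> character vector of input names.
# Leaves (pure inputs) need no entry. The graph is assumed to be acyclic.
topo_sort <- function(graph, outputs) {
  visited <- character(0)
  result <- character(0)
  visit <- function(node) {
    if (node %in% visited) return(invisible(NULL))
    visited <<- c(visited, node)
    for (inp in graph[[node]]) visit(inp)   # inputs first
    result <<- c(result, node)              # then the node itself
  }
  for (out in outputs) visit(out)
  result
}

graph <- list(loss = c("add", "y"), add = c("mul", "b"), mul = c("x", "w"))
sorted <- topo_sort(graph, "loss")
```

Here sorted lists the leaves x and w first and loss last, matching the "inputs first, outputs last" contract.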
Returns information about backend placement: which backends are available, how the scheduler splits the graph, and how many ops are supported by GPU vs CPU-only.
onnx_device_info(model)
model |
An onnx_model object, as returned by onnx_load(). |
A list with:
Character vector of backend names (e.g. "Vulkan0", "CPU")
Number of backends
Number of scheduler splits (1 = all on one backend)
Total graph nodes
Ops supported by GPU backend
Ops that can only run on CPU
Named integer vector: op type => count (empty if all on GPU)
Returns the names and shapes of model inputs (excluding weight
initializers). Use this to know what to pass to onnx_run().
onnx_inputs(model)
model |
An onnx_model object, as returned by onnx_load(). |
A named list where names are input tensor names and values are integer vectors of dimension sizes (-1 for dynamic dimensions).
Parses an .onnx file, builds a ggml computation graph, and allocates tensors on the specified device. Weights are loaded via a memory-mapped file (zero-copy where possible).
onnx_load(path, device = NULL, input_shapes = NULL, n_threads = NULL, dtype = "f32")
path |
Path to .onnx file. |
device |
Backend device: |
input_shapes |
Optional named list of integer vectors specifying
fixed shapes for inputs with dynamic dimensions. Names must match
input tensor names. Each shape must include all dimensions including
batch, e.g. |
n_threads |
Number of CPU threads. |
dtype |
Weight precision: |
An opaque model object (external pointer) for use with
onnx_run(), onnx_summary(), and onnx_inputs().
Run ONNX model inference
onnx_run(model, inputs)
model |
An onnx_model object, as returned by onnx_load(). |
inputs |
A named list of numeric vectors/matrices.
Names must match the model's input tensor names.
Use onnx_inputs() to look up the expected input names and shapes. |
A named list of output tensors (numeric vectors with dim attributes for multi-dimensional outputs).
Returns metadata about a loaded ONNX model.
onnx_summary(model)
model |
An onnx_model object, as returned by onnx_load(). |
A list with ir_version, opset_version,
producer, graph_name, n_nodes,
n_initializers, and ops.
Create an Adam optimizer
optimizer_adam(params, lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-08)
params |
Named list of ag_param tensors |
lr |
Learning rate (default 1e-3) |
beta1 |
First moment decay (default 0.9) |
beta2 |
Second moment decay (default 0.999) |
eps |
Stability constant (default 1e-8) |
An optimizer environment
w <- ag_param(matrix(runif(4), 2, 2))
opt <- optimizer_adam(list(w = w), lr = 1e-3)
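For reference, the standard Adam update (Kingma and Ba) with bias correction is sketched below for a plain numeric parameter. adam_step() is a hypothetical helper, and whether optimizer_adam() applies exactly this bias correction is an assumption, not something this page states. On the first step the corrected moments reduce to g and g^2, so each weight moves by roughly lr in the direction opposite its gradient.

```r
# Hypothetical helper: one Adam step for a numeric parameter `w` with
# gradient `g`. `state` carries the step count and the moment estimates.
adam_step <- function(w, g, state, lr = 1e-3, beta1 = 0.9, beta2 = 0.999,
                      eps = 1e-8) {
  state$t <- state$t + 1
  state$m <- beta1 * state$m + (1 - beta1) * g      # first moment (mean)
  state$v <- beta2 * state$v + (1 - beta2) * g^2    # second moment (uncentred)
  m_hat <- state$m / (1 - beta1^state$t)            # bias correction
  v_hat <- state$v / (1 - beta2^state$t)
  list(w = w - lr * m_hat / (sqrt(v_hat) + eps), state = state)
}

state <- list(t = 0, m = 0, v = 0)
res <- adam_step(w = 1, g = 0.5, state = state, lr = 0.1)
```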
Create an SGD optimizer
optimizer_sgd(params, lr = 0.01, momentum = 0)
params |
Named list of ag_param tensors |
lr |
Learning rate (default 0.01) |
momentum |
Momentum factor (default 0) |
An optimizer environment
w <- ag_param(matrix(runif(4), 2, 2))
opt <- optimizer_sgd(list(w = w), lr = 0.01)
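A common formulation of SGD with momentum keeps a velocity buffer v and updates v <- momentum * v + g, then w <- w - lr * v; with momentum = 0 this reduces to plain gradient descent. The sketch assumes this PyTorch-style form: sgd_step() is hypothetical, and optimizer_sgd() may use an equivalent variant. With a constant gradient of 1, lr = 0.1 and momentum = 0.9, two steps take w from 1 to 0.9 and then to 0.71.

```r
# Hypothetical helper: one SGD-with-momentum step (PyTorch-style velocity).
sgd_step <- function(w, g, velocity, lr = 0.01, momentum = 0) {
  velocity <- momentum * velocity + g   # accumulate a running direction
  list(w = w - lr * velocity, velocity = velocity)
}

w <- 1
vel <- 0
for (i in 1:2) {
  res <- sgd_step(w, g = 1, velocity = vel, lr = 0.1, momentum = 0.9)
  w <- res$w
  vel <- res$velocity
}
```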
Plots loss and accuracy curves over epochs.
## S3 method for class 'ggml_history'
plot(x, ...)
x |
A ggml_history object |
... |
Additional arguments (ignored) |
The history object (invisibly).
Generates predictions from a trained model. Uses the standard R
predict generic for compatibility with keras3 and
the broader R ecosystem.
## S3 method for class 'ggml_sequential_model'
predict(object, x, batch_size = 32L, ...)
## S3 method for class 'ggml_functional_model'
predict(object, x, batch_size = 32L, ...)
object |
A trained model object. |
x |
Input data (matrix, array, or list for multi-input models). |
batch_size |
Batch size for inference (default 32). |
... |
Additional arguments (ignored). |
Matrix of predictions.
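The batch_size argument bounds how many rows are pushed through the model at once. The sketch below shows the usual chunk-and-bind approach for any row-wise prediction function; predict_in_batches() and the linear stand-in model are hypothetical, and the final, possibly smaller, batch is handled by truncating the index range.

```r
# Hypothetical helper: apply a row-wise prediction function to `x` in
# chunks of at most `batch_size` rows and stack the results.
predict_in_batches <- function(predict_fn, x, batch_size = 32L) {
  n <- nrow(x)
  stopifnot(n >= 1L, batch_size >= 1L)
  starts <- seq(1L, n, by = batch_size)
  pieces <- lapply(starts, function(s) {
    idx <- s:min(s + batch_size - 1L, n)    # last batch may be smaller
    predict_fn(x[idx, , drop = FALSE])
  })
  do.call(rbind, pieces)
}

# Stand-in "model": a fixed linear map from 2 features to 1 output.
W <- matrix(c(1, 2), nrow = 2)
linear_model <- function(batch) batch %*% W
x <- matrix(seq_len(200), nrow = 100, ncol = 2)
preds <- predict_in_batches(linear_model, x, batch_size = 32L)
```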
Print method for ag_tensor
## S3 method for class 'ag_tensor'
print(x, ...)
x |
An ag_tensor object. |
... |
Ignored |
The input x, returned invisibly (called for its side effect of printing).
Print method for ggml_functional_model
## S3 method for class 'ggml_functional_model'
print(x, ...)
x |
A ggml_functional_model object |
... |
Additional arguments (ignored) |
The model object (invisibly).
Print method for ggml_history
## S3 method for class 'ggml_history'
print(x, ...)
x |
A ggml_history object |
... |
Additional arguments (ignored) |
The history object (invisibly).
Prints a summary of the model architecture including layer types, output shapes, and parameter counts.
## S3 method for class 'ggml_sequential_model'
print(x, ...)
x |
A ggml_sequential_model object |
... |
Additional arguments (ignored) |
The model object (invisibly).
Print ONNX model summary
## S3 method for class 'onnx_model'
print(x, ...)
x |
An onnx_model object, as returned by onnx_load(). |
... |
Ignored. |
Invisibly returns x.
Quantizes float data to an IQ format. The IQ1, IQ2 and IQ3 formats need their lookup tables initialized before use (see iq2xs_init_impl, iq3xs_init_impl); the optional importance matrix itself is passed via imatrix.
quantize_iq2_xxs(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_iq2_xs(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_iq2_s(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_iq3_xxs(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_iq3_s(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_iq1_s(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_iq1_m(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_iq4_nl(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_iq4_xs(src_data, n_rows, n_per_row, imatrix = NULL)
src_data |
Numeric vector of float values to quantize |
n_rows |
Number of rows |
n_per_row |
Number of elements per row |
imatrix |
Optional importance matrix (numeric vector or NULL) |
Raw vector of quantized data
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Quantizes float data to MXFP4 (microscaling FP4) format.
quantize_mxfp4(src_data, n_rows, n_per_row, imatrix = NULL)
src_data |
Numeric vector of float values to quantize |
n_rows |
Number of rows |
n_per_row |
Number of elements per row |
imatrix |
Optional importance matrix (numeric vector or NULL) |
Raw vector of quantized data
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Quantizes float data to NVFP4 format (NVIDIA FP4 with UE4M3 per-sub-block scale).
quantize_nvfp4(src_data, n_rows, n_per_row, imatrix = NULL)
src_data |
Numeric vector of float values to quantize |
n_rows |
Number of rows |
n_per_row |
Number of elements per row (must be multiple of 64) |
imatrix |
Optional importance matrix (currently ignored) |
Raw vector of quantized data
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Quantizes float data to Q1_0 format (1-bit-per-weight sign quantization).
quantize_q1_0(src_data, n_rows, n_per_row, imatrix = NULL)
src_data |
Numeric vector of float values to quantize |
n_rows |
Number of rows |
n_per_row |
Number of elements per row (must be multiple of 128) |
imatrix |
Optional importance matrix (currently ignored) |
Raw vector of quantized data
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Quantizes float data to K-quant format with optional importance matrix. K-quants provide better quality/size tradeoffs than basic quants.
quantize_q2_K(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_q3_K(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_q4_K(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_q5_K(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_q6_K(src_data, n_rows, n_per_row, imatrix = NULL)
src_data |
Numeric vector of float values to quantize |
n_rows |
Number of rows |
n_per_row |
Number of elements per row (must be a multiple of 256, the K-quant super-block size) |
imatrix |
Optional importance matrix (numeric vector or NULL) |
Raw vector of quantized data
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Quantizes float data to one of the basic block formats (Q4_0, Q4_1, Q5_0, Q5_1 or Q8_0) with an optional importance matrix.
quantize_q4_0(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_q4_1(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_q5_0(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_q5_1(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_q8_0(src_data, n_rows, n_per_row, imatrix = NULL)
src_data |
Numeric vector of float values to quantize |
n_rows |
Number of rows |
n_per_row |
Number of elements per row (must be a multiple of 32, the block size of these formats) |
imatrix |
Optional importance matrix (numeric vector or NULL) |
Raw vector of quantized data
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
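As an illustration of the simplest block format, the sketch below follows the Q4_0 reference scheme as we understand it from ggml: each block of 32 values stores one scale d = m / -8, where m is the value with the largest magnitude, plus 32 four-bit codes q in 0..15, and decodes as (q - 8) * d. It is a simplification with hypothetical function names: the scale is kept as a double rather than rounded to fp16, and codes are not packed two per byte, so its output is not byte-compatible with quantize_q4_0().

```r
# Simplified Q4_0-style block quantizer (no fp16 scale, no nibble packing).
quantize_q4_0_block <- function(x) {
  stopifnot(length(x) == 32L)
  m <- x[which.max(abs(x))]              # signed value with largest magnitude
  d <- m / -8                            # maps m to code 0: (0 - 8) * d == m
  id <- if (d != 0) 1 / d else 0
  q <- pmin(15L, as.integer(floor(x * id + 8.5)))   # 4-bit codes in 0..15
  list(d = d, q = q)
}

dequantize_q4_0_block <- function(blk) (blk$q - 8L) * blk$d

set.seed(42)
x <- rnorm(32)
blk <- quantize_q4_0_block(x)
x_hat <- dequantize_q4_0_block(blk)
```

The largest-magnitude value is reconstructed exactly, and every other value lands within one quantization step |d| of its original.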
Basic row-level IQ quantization.
quantize_row_iq3_xxs_ref(src_data, n_elements)
quantize_row_iq4_nl_ref(src_data, n_elements)
quantize_row_iq4_xs_ref(src_data, n_elements)
quantize_row_iq3_s_ref(src_data, n_elements)
quantize_row_iq2_s_ref(src_data, n_elements)
src_data |
Numeric vector of float values to quantize |
n_elements |
Number of elements to quantize |
Raw vector of quantized data
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Basic row-level MXFP4 quantization.
quantize_row_mxfp4_ref(src_data, n_elements)
src_data |
Numeric vector of float values to quantize |
n_elements |
Number of elements to quantize |
Raw vector of quantized data
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Basic row-level K-quant quantization without importance matrix.
quantize_row_q2_K_ref(src_data, n_elements)
quantize_row_q3_K_ref(src_data, n_elements)
quantize_row_q4_K_ref(src_data, n_elements)
quantize_row_q5_K_ref(src_data, n_elements)
quantize_row_q6_K_ref(src_data, n_elements)
quantize_row_q8_K_ref(src_data, n_elements)
src_data |
Numeric vector of float values to quantize |
n_elements |
Number of elements to quantize |
Raw vector of quantized data
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Basic row-level quantization without importance matrix. These are reference implementations.
quantize_row_q4_0_ref(src_data, n_elements)
quantize_row_q4_1_ref(src_data, n_elements)
quantize_row_q5_0_ref(src_data, n_elements)
quantize_row_q5_1_ref(src_data, n_elements)
quantize_row_q8_0_ref(src_data, n_elements)
quantize_row_q8_1_ref(src_data, n_elements)
src_data |
Numeric vector of float values to quantize |
n_elements |
Number of elements to quantize |
Raw vector of quantized data
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_tq1_0_ref(),
quantize_tq1_0()
Basic row-level ternary quantization.
quantize_row_tq1_0_ref(src_data, n_elements)
quantize_row_tq2_0_ref(src_data, n_elements)
src_data |
Numeric vector of float values to quantize |
n_elements |
Number of elements to quantize |
Raw vector of quantized data
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_tq1_0()
Quantizes float data to ternary format with optional importance matrix.
quantize_tq1_0(src_data, n_rows, n_per_row, imatrix = NULL)
quantize_tq2_0(src_data, n_rows, n_per_row, imatrix = NULL)
src_data |
Numeric vector of float values to quantize |
n_rows |
Number of rows |
n_per_row |
Number of elements per row |
imatrix |
Optional importance matrix (numeric vector or NULL) |
Raw vector of quantized data
Other quantization:
dequantize_row_iq2_xxs(),
dequantize_row_mxfp4(),
dequantize_row_nvfp4(),
dequantize_row_q1_0(),
dequantize_row_q2_K(),
dequantize_row_q4_0(),
dequantize_row_tq1_0(),
ggml_quant_block_info(),
iq2xs_free_impl(),
iq2xs_init_impl(),
iq3xs_free_impl(),
iq3xs_init_impl(),
quantize_iq2_xxs(),
quantize_mxfp4(),
quantize_nvfp4(),
quantize_q1_0(),
quantize_q2_K(),
quantize_q4_0(),
quantize_row_iq3_xxs_ref(),
quantize_row_mxfp4_ref(),
quantize_row_q2_K_ref(),
quantize_row_q4_0_ref(),
quantize_row_tq1_0_ref()
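The ternary formats keep a single scale per block and round each value to -1, 0 or +1 times that scale. The sketch below assumes an absmax scale (d = max |x|), which is our reading of ggml's reference TQ2_0 code; the function names are hypothetical, a short vector stands in for a full block, and it ignores fp16 rounding of d and the bit packing used by TQ1_0 and TQ2_0. Note that R's round() breaks exact ties to even, which can differ from the C reference on exact halves.

```r
# Simplified ternary block quantizer: absmax scale, codes in {-1, 0, +1}.
quantize_ternary_block <- function(x) {
  d <- max(abs(x))
  id <- if (d > 0) 1 / d else 0
  q <- as.integer(round(x * id))   # |x * id| <= 1, so codes are -1, 0 or 1
  list(d = d, q = q)
}

dequantize_ternary_block <- function(blk) blk$q * blk$d

x <- c(0.9, -0.1, 0.4, -1.2, 0.7, 0.05, -0.65, 1.0)
blk <- quantize_ternary_block(x)
x_hat <- dequantize_ternary_block(blk)
```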
RoPE (Rotary Position Embedding) Type Constants
GGML_ROPE_TYPE_NORM
GGML_ROPE_TYPE_NEOX
GGML_ROPE_TYPE_MROPE
GGML_ROPE_TYPE_VISION
Integer constants; each is an object of class integer of length 1.
Constants for RoPE (Rotary Position Embedding) modes used in transformer models. Different models use different RoPE implementations.
GGML_ROPE_TYPE_NORM (0): Standard RoPE as in the original paper, rotating adjacent element pairs (LLaMA, Mistral)
GGML_ROPE_TYPE_NEOX (2): GPT-NeoX style RoPE, rotating element i together with element i + n_dims/2 instead of adjacent pairs
GGML_ROPE_TYPE_MROPE (8): Multi-RoPE for multimodal models (Qwen2-VL)
GGML_ROPE_TYPE_VISION (24): Vision model RoPE variant
An integer constant representing a RoPE type
GGML_ROPE_TYPE_NORM    # 0  - Standard RoPE (LLaMA, Mistral)
GGML_ROPE_TYPE_NEOX    # 2  - GPT-NeoX style
GGML_ROPE_TYPE_MROPE   # 8  - Multi-RoPE (Qwen2-VL)
GGML_ROPE_TYPE_VISION  # 24 - Vision models
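To make the NORM versus NEOX distinction concrete, the sketch rotates one head vector of even length d at position pos, using the usual frequencies theta_i = pos * base^(-2i/d) with base = 10000. rope_rotate() is a hypothetical helper, and only the pairing rule differs between the two modes: NORM rotates adjacent elements (1, 2), (3, 4), ..., while NEOX rotates element i with element i + d/2. MROPE and VISION are not covered.

```r
# Hypothetical helper: apply rotary position embedding to one vector `x`.
rope_rotate <- function(x, pos, mode = c("norm", "neox"), base = 10000) {
  mode <- match.arg(mode)
  d <- length(x)
  stopifnot(d %% 2 == 0)
  half <- d / 2
  theta <- pos * base^(-2 * (0:(half - 1)) / d)   # one angle per pair
  if (mode == "norm") {
    i1 <- seq(1, d, by = 2)          # pairs (1, 2), (3, 4), ...
    i2 <- i1 + 1
  } else {
    i1 <- seq_len(half)              # pairs (1, 1 + d/2), (2, 2 + d/2), ...
    i2 <- i1 + half
  }
  out <- x
  out[i1] <- x[i1] * cos(theta) - x[i2] * sin(theta)
  out[i2] <- x[i1] * sin(theta) + x[i2] * cos(theta)
  out
}

x <- c(1, 2, 3, 4)
r_norm <- rope_rotate(x, pos = 5, mode = "norm")
r_neox <- rope_rotate(x, pos = 5, mode = "neox")
```

Both modes are pure rotations, so they leave position 0 unchanged and preserve the vector's length at every position.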
Prints a detailed summary including input shape, layer details, trainable/non-trainable parameter counts, and memory estimate.
## S3 method for class 'ggml_sequential_model'
summary(object, ...)
object |
A ggml_sequential_model object |
... |
Additional arguments (ignored) |
The model object (invisibly).
Returns one row per layer of the underlying sequential network, in broom style. Useful for comparing architectures across experiments in an R Markdown / Quarto report.
## S3 method for class 'ggmlr_parsnip_model'
tidy(x, ...)
x |
A fitted ggmlr_parsnip_model object, e.g. from parsnip::extract_fit_engine(). |
... |
Unused; for generic compatibility. |
A tibble with columns: layer (name), type,
units (output units, NA if not applicable), activation,
output_shape (character), params (trainable parameter count) and
trainable (logical).
spec <- parsnip::mlp(hidden_units = 8L, epochs = 3L) |>
  parsnip::set_engine("ggml", backend = "cpu") |>
  parsnip::set_mode("regression")
fit_obj <- parsnip::fit(spec, mpg ~ ., data = mtcars)
generics::tidy(parsnip::extract_fit_engine(fit_obj))
Records all ag_* operations inside expr for later backward().
When the default device is "gpu", the ggml context is reset at the
start of each tape.
with_grad_tape(expr)
expr |
Expression to evaluate under gradient tape |
Value of the last expression in expr (invisibly)
w <- ag_param(matrix(c(1, 0, 0, 1), 2, 2))
x <- ag_tensor(matrix(c(1, 2), 2, 1))
y <- ag_tensor(matrix(c(1, 2), 2, 1))
with_grad_tape({
  out <- ag_matmul(w, x)
  loss <- ag_mse_loss(out, y)
})
backward(loss)
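The tape idea behind with_grad_tape() can be sketched with scalar nodes. Each operation records itself on the tape together with a closure that pushes its output gradient to its inputs, and the backward pass replays the tape in reverse. This is a toy reverse-mode example, not the package's implementation; new_tape(), tape_var(), tape_add(), tape_mul() and tape_backward() are hypothetical names. For y = x * w + w at x = 3 and w = 2, it yields dy/dx = 2 and dy/dw = 4.

```r
new_tape <- function() {
  tape <- new.env()
  tape$nodes <- list()            # nodes in creation (topological) order
  tape
}

# Record a leaf or intermediate value on the tape.
tape_var <- function(tape, value) {
  node <- new.env()
  node$value <- value
  node$grad <- 0
  node$backward <- function() invisible(NULL)   # leaves propagate nothing
  tape$nodes[[length(tape$nodes) + 1L]] <- node
  node
}

tape_mul <- function(tape, a, b) {
  out <- tape_var(tape, a$value * b$value)
  out$backward <- function() {                  # product rule
    a$grad <- a$grad + b$value * out$grad
    b$grad <- b$grad + a$value * out$grad
  }
  out
}

tape_add <- function(tape, a, b) {
  out <- tape_var(tape, a$value + b$value)
  out$backward <- function() {                  # gradient passes through
    a$grad <- a$grad + out$grad
    b$grad <- b$grad + out$grad
  }
  out
}

# Seed d(out)/d(out) = 1, then replay the tape from last to first.
tape_backward <- function(tape, out) {
  out$grad <- 1
  for (node in rev(tape$nodes)) node$backward()
  invisible(NULL)
}

tp <- new_tape()
x <- tape_var(tp, 3)
w <- tape_var(tp, 2)
y <- tape_add(tp, tape_mul(tp, x, w), w)
tape_backward(tp, y)
```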