Scripting Guide

Lesson 11: AI in Your Agents (llm)

Connecting Groq, Mistral, or Gemini and calling models from agents with llm.chat: options, limits, and patterns.

Setup

AI calls run on your own API key. Run /llm, pick a provider, and paste a key from that provider's console. You can connect any or all of the three supported providers; keys are stored encrypted and never exposed to agent code. The llm helper is the third argument of your agent function.

Provider	Default model	Notes
Groq	`llama-3.3-70b-versatile`	Fast and has a generous free tier; great default.
Mistral	`mistral-small-latest`
Gemini	`gemini-2.0-flash`

The llm API

await llm.providers() returns the list of connected provider names, e.g. ["groq"].
await llm.chat(options) sends a chat completion request and returns the reply.

chat option	Type	Notes
`provider`	string (required)	`"groq"`, `"mistral"`, or `"gemini"`; must be connected.
`messages`	array (required)	`{ role, content }` objects; max 50 messages, 64,000 characters total.
`model`	string	Override the default model.
`maxTokens`	number	Default 512, capped at 4096.
`temperature`	number	Optional sampling temperature.
`stop`	string or array	Optional stop sequences.

On success the result has text (the reply), finishReason, model, provider, and usage (token counts). On failure it has an error field instead; AI calls never throw.
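The two calls combine naturally: ask llm.providers() what is connected, then pass one of those names to llm.chat. The following is a minimal sketch, not part of the platform API. The names chooseProvider and chatWithAnyProvider, the preference order, and the "no provider connected" string are all assumptions made for illustration.

```javascript
// Illustrative preference order; an assumption, not a platform default.
const PREFERRED = ["groq", "mistral", "gemini"];

// Pure helper: the first preferred provider that is actually connected, else null.
function chooseProvider(connected, preferred = PREFERRED) {
  return preferred.find((name) => connected.includes(name)) ?? null;
}

// Hypothetical wrapper: resolves to an llm.chat result, or to an object with
// an error field (this sketch's own message) when nothing is connected.
async function chatWithAnyProvider(llm, messages, options = {}) {
  const provider = chooseProvider(await llm.providers());
  if (!provider) return { error: "no provider connected" };
  return llm.chat({ ...options, provider, messages });
}
```

Because the wrapper returns an object with an error field in the same shape llm.chat uses on failure, callers can keep a single if (res.error) check.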

An AI question command

ask.js (event: messageCreate)

export async function onMessage(message, db, llm) {
  if (message.author.bot) return;
  if (!message.content.startsWith("!ask ")) return;

  const question = message.content.slice(5).trim();
  if (!question) return;

  // Show a typing indicator while the model thinks.
  await message.channel.sendTyping();

  const res = await llm.chat({
    provider: "groq",
    messages: [
      {
        role: "system",
        content:
          "You are a helpful assistant in a Discord server. " +
          "Answer in under 150 words. Plain text only.",
      },
      { role: "user", content: question },
    ],
    maxTokens: 400,
  });

  if (res.error) {
    await message.reply("AI error: " + res.error);
    return;
  }

  // Discord messages cap at 2,000 characters; slice to stay safe.
  await message.reply(res.text.slice(0, 1900));
}

Expected behavior: !ask why is the sky blue shows the bot typing, then replies with a short model-written answer. If the key is invalid or the provider is down, the reply is AI error: ... instead of silence.

Timing matters

LLM runs get extra time

Each AI request is cut off after 5 seconds on Free (10 seconds with Premium). Runs in servers with an LLM provider connected get an extended execution window: about 8 seconds on Free (5 s + 3 s) and 15 seconds with Premium (10 s + 5 s). Budget for one AI call per run; chaining several can exhaust the window before the last one returns.
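If an agent sometimes needs a second call, one way to stay inside the window is to check the remaining time first. This is a rough sketch under stated assumptions: it uses the approximate Free-tier figures quoted above, treats the moment the handler starts as the start of the run, and createBudget is a hypothetical helper, not a platform function.

```javascript
// Approximate Free-tier figures from this lesson; swap in 15000 / 10000 for Premium.
const RUN_WINDOW_MS = 8000; // extended execution window
const REQUEST_CUTOFF_MS = 5000; // longest a single AI request can take

// Returns a function that reports whether a worst-case AI call still fits
// in the time left. Assumes startedAt is close to when the run began.
function createBudget(
  startedAt = Date.now(),
  windowMs = RUN_WINDOW_MS,
  cutoffMs = REQUEST_CUTOFF_MS
) {
  return function canAfford(now = Date.now()) {
    const remaining = windowMs - (now - startedAt);
    return remaining >= cutoffMs;
  };
}
```

Call const canAfford = createBudget() on the first line of the handler, and only make a follow-up llm.chat call when canAfford() returns true; otherwise reply with what you already have.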

Pattern: AI moderation assist

ai-mod.js (event: messageCreate)

export async function onMessage(message, db, llm) {
  if (message.author.bot) return;
  if (message.content.length < 12) return; // skip tiny messages

  const res = await llm.chat({
    provider: "groq",
    messages: [
      {
        role: "system",
        content:
          "Classify the message as SAFE or TOXIC. Reply with exactly one word.",
      },
      { role: "user", content: message.content.slice(0, 1000) },
    ],
    maxTokens: 3,
    temperature: 0,
  });

  if (res.error) return; // fail open: never punish on an API error

  if (res.text.trim().toUpperCase().startsWith("TOXIC")) {
    await message.delete();
    await message.author.send(
      "Your message was removed by the AI moderator. A human can review it on request."
    );
  }
}

Cost and quota awareness

An agent on messageCreate calls your provider on every message it does not filter out. That spends your API quota and counts against your GuildScript run limits. Filter aggressively (length checks, channel checks, prefixes) before calling llm.chat.
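The filters mentioned above can live in one predicate that runs before any AI call. The sketch below is illustrative only: shouldCallModel, the FILTER settings, and the placeholder channel ID are hypothetical, and it assumes the message exposes its channel ID as message.channel.id.

```javascript
// Hypothetical filter settings; the channel ID is a placeholder.
const FILTER = {
  minLength: 12, // skip tiny messages
  prefix: null, // e.g. "!ask " to require a command prefix
  allowedChannelIds: ["123456789012345678"], // empty array = all channels
};

// Returns true only for messages worth spending API quota on.
function shouldCallModel(message, filter = FILTER) {
  if (message.author.bot) return false;

  const text = message.content.trim();
  if (text.length < filter.minLength) return false;
  if (filter.prefix && !text.startsWith(filter.prefix)) return false;

  // Assumption: the channel ID is available as message.channel.id.
  const channels = filter.allowedChannelIds;
  if (channels.length > 0 && !channels.includes(message.channel.id)) return false;

  return true;
}
```

In a handler this becomes a single guard at the top: if (!shouldCallModel(message)) return;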

Common pitfalls

Provider not connected. llm.chat returns { error: "provider not connected" }; run /llm first, or pick a provider from the list returned by await llm.providers().
Replying with raw model output. Model replies can exceed Discord's 2,000-character message limit; always slice before replying, as res.text.slice(0, 1900) does in ask.js.
No persistence between runs. The model does not remember earlier conversations; build context yourself by storing past exchanges in db and replaying them in messages (mind the 50-message / 64k-char caps).
Acting on errors. For destructive actions (delete, ban) treat an API error as SAFE; never punish users because an API hiccuped.
Slow interactions. In interactionCreate handlers call deferReply() before llm.chat, then editReply() with the answer.
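For the persistence pitfall, whatever you replay from db has to fit the 50-message / 64k-character caps. The helper below is a minimal sketch of one trimming strategy: keep a leading system prompt and drop the oldest exchanges first. trimHistory is a hypothetical name, and the sketch assumes the character cap is measured as the sum of the content lengths, which the lesson does not spell out.

```javascript
// Caps from the chat options table in this lesson.
const MAX_MESSAGES = 50;
const MAX_CHARS = 64000;

// Keeps a leading system message (if any) plus the newest messages that fit
// both caps. Assumes the character cap counts content lengths only.
function trimHistory(messages, maxMessages = MAX_MESSAGES, maxChars = MAX_CHARS) {
  const hasSystem = messages.length > 0 && messages[0].role === "system";
  const head = hasSystem ? [messages[0]] : [];
  const rest = hasSystem ? messages.slice(1) : messages;

  let chars = head.reduce((total, m) => total + m.content.length, 0);
  const kept = [];

  // Walk from newest to oldest so recent context wins.
  for (let i = rest.length - 1; i >= 0; i--) {
    const size = rest[i].content.length;
    if (head.length + kept.length + 1 > maxMessages) break;
    if (chars + size > maxChars) break;
    kept.unshift(rest[i]);
    chars += size;
  }

  return [...head, ...kept];
}
```

Pass the result straight to llm.chat as messages. Stopping at the first message that does not fit keeps the history contiguous, so the model never sees an answer without the question that preceded it.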

Exercise

Build !translate <text> that asks the model to translate the text to English and replies with the translation. Then extend it: store each user's last 4 !ask exchanges in db and include them as prior messages so the model can handle follow-up questions.