Compare commits

..

1 Commits

Author SHA1 Message Date
Bo-Onyx
3d2cc175a8 feat(pruning): Wire Prometheus metrics into the Heavy Celery worker 2026-04-07 15:16:05 -07:00
188 changed files with 3188 additions and 7809 deletions

View File

@@ -1 +0,0 @@
../../../cli/internal/embedded/SKILL.md

View File

@@ -0,0 +1,186 @@
---
name: onyx-cli
description: Query the Onyx knowledge base using the onyx-cli command. Use when the user wants to search company documents, ask questions about internal knowledge, query connected data sources, or look up information stored in Onyx.
---
# Onyx CLI — Agent Tool
Onyx is an enterprise search and Gen-AI platform that connects to company documents, apps, and people. `onyx-cli` provides non-interactive commands to query the Onyx knowledge base and list available agents.
## Prerequisites
### 1. Check if installed
```bash
which onyx-cli
```
### 2. Install (if needed)
**Primary — pip:**
```bash
pip install onyx-cli
```
**From source (Go):**
```bash
cd cli && go build -o onyx-cli . && sudo mv onyx-cli /usr/local/bin/
```
### 3. Check if configured
```bash
onyx-cli validate-config
```
This checks the config file exists, API key is present, and tests the server connection via `/api/me`. Exit code 0 on success, non-zero with a descriptive error on failure.
If unconfigured, you have two options:
**Option A — Interactive setup (requires user input):**
```bash
onyx-cli configure
```
This prompts for the Onyx server URL and API key, tests the connection, and saves config.
**Option B — Environment variables (non-interactive, preferred for agents):**
```bash
export ONYX_SERVER_URL="https://your-onyx-server.com" # default: https://cloud.onyx.app
export ONYX_API_KEY="your-api-key"
```
Environment variables override the config file. If these are set, no config file is needed.
| Variable | Required | Description |
|----------|----------|-------------|
| `ONYX_SERVER_URL` | No | Onyx server base URL (default: `https://cloud.onyx.app`) |
| `ONYX_API_KEY` | Yes | API key for authentication |
| `ONYX_PERSONA_ID` | No | Default agent/persona ID |
If neither the config file nor environment variables are set, tell the user that `onyx-cli` needs to be configured and ask them to either:
- Run `onyx-cli configure` interactively, or
- Set `ONYX_SERVER_URL` and `ONYX_API_KEY` environment variables
## Commands
### Validate configuration
```bash
onyx-cli validate-config
```
Checks config file exists, API key is present, and tests the server connection. Use this before `ask` or `agents` to confirm the CLI is properly set up.
### List available agents
```bash
onyx-cli agents
```
Prints a table of agent IDs, names, and descriptions. Use `--json` for structured output:
```bash
onyx-cli agents --json
```
Use agent IDs with `ask --agent-id` to query a specific agent.
### Basic query (plain text output)
```bash
onyx-cli ask "What is our company's PTO policy?"
```
Streams the answer as plain text to stdout. Exit code 0 on success, non-zero on error.
### JSON output (structured events)
```bash
onyx-cli ask --json "What authentication methods do we support?"
```
Outputs JSON-encoded parsed stream events (one object per line). Key event objects include message deltas, stop, errors, search-start, and citation payloads.
Each line is a JSON object with this envelope:
```json
{"type": "<event_type>", "event": { ... }}
```
| Event Type | Description |
|------------|-------------|
| `message_delta` | Content token — concatenate all `content` fields for the full answer |
| `stop` | Stream complete |
| `error` | Error with `error` message field |
| `search_tool_start` | Onyx started searching documents |
| `citation_info` | Source citation — see shape below |
`citation_info` event shape:
```json
{
"type": "citation_info",
"event": {
"citation_number": 1,
"document_id": "abc123def456",
"placement": {"turn_index": 0, "tab_index": 0, "sub_turn_index": null}
}
}
```
`placement` is metadata about where in the conversation the citation appeared and can be ignored for most use cases.
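When consuming `--json` output programmatically, the envelope above can be folded into a final answer plus a citation list. A minimal Python sketch (the event handling follows the table above; the helper name and error policy are our own choices, not part of the CLI):

```python
import json

def collect_answer(ndjson_lines):
    """Fold onyx-cli `ask --json` NDJSON lines (one JSON object per line)
    into (full_answer, cited_document_ids)."""
    answer_parts = []
    citations = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        etype = obj.get("type")
        event = obj.get("event", {})
        if etype == "message_delta":
            # Concatenate all content tokens for the full answer.
            answer_parts.append(event.get("content", ""))
        elif etype == "citation_info":
            citations.append(event.get("document_id"))
        elif etype == "error":
            raise RuntimeError(event.get("error"))
        elif etype == "stop":
            break
    return "".join(answer_parts), citations
```
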
### Specify an agent
```bash
onyx-cli ask --agent-id 5 "Summarize our Q4 roadmap"
```
Uses a specific Onyx agent/persona instead of the default.
### All flags
| Flag | Type | Description |
|------|------|-------------|
| `--agent-id` | int | Agent ID to use (overrides default) |
| `--json` | bool | Output raw NDJSON events instead of plain text |
## Statelessness
Each `onyx-cli ask` call creates an independent chat session. There is no built-in way to chain context across multiple `ask` invocations — every call starts fresh. If you need multi-turn conversation with memory, use the interactive TUI (`onyx-cli` or `onyx-cli chat`) instead.
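Because each call is stateless, one workaround for light multi-turn use is to inline prior exchanges into the next question. A purely illustrative sketch — `onyx-cli` itself has no such feature:

```python
def ask_with_context(history, question):
    """Build one self-contained prompt from prior (question, answer) pairs,
    since every `onyx-cli ask` invocation starts a fresh session."""
    parts = []
    for prev_q, prev_a in history:
        parts.append(f"Previously asked: {prev_q}\nPrevious answer: {prev_a}")
    parts.append(f"New question: {question}")
    return "\n\n".join(parts)
```
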
## When to Use
Use `onyx-cli ask` when:
- The user asks about company-specific information (policies, docs, processes)
- You need to search internal knowledge bases or connected data sources
- The user references Onyx, asks you to "search Onyx", or wants to query their documents
- You need context from company wikis, Confluence, Google Drive, Slack, or other connected sources
Do NOT use when:
- The question is about general programming knowledge (use your own knowledge)
- The user is asking about code in the current repository (use grep/read tools)
- The user hasn't mentioned Onyx and the question doesn't require internal company data
## Examples
```bash
# Simple question
onyx-cli ask "What are the steps to deploy to production?"
# Get structured output for parsing
onyx-cli ask --json "List all active API integrations"
# Use a specialized agent
onyx-cli ask --agent-id 3 "What were the action items from last week's standup?"
# Pipe the answer into another command
onyx-cli ask "What is the database schema for users?" | head -20
```
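From Python, the same invocations can be scripted via `subprocess`. A sketch assuming `onyx-cli` is on `PATH`; only the flags documented above are used, and the helper names are ours:

```python
import subprocess

def build_ask_command(question, agent_id=None, json_output=False):
    """Assemble an `onyx-cli ask` argument list from the documented flags."""
    cmd = ["onyx-cli", "ask"]
    if agent_id is not None:
        cmd += ["--agent-id", str(agent_id)]
    if json_output:
        cmd.append("--json")
    cmd.append(question)
    return cmd

def ask(question, agent_id=None):
    # check=True surfaces the CLI's non-zero exit code as an exception.
    result = subprocess.run(
        build_ask_command(question, agent_id),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```
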

View File

@@ -228,7 +228,7 @@ jobs:
- name: Create GitHub Release
id: create-release
uses: softprops/action-gh-release@153bb8e04406b158c6c84fc1615b65b24149a1fe # ratchet:softprops/action-gh-release@v2
uses: softprops/action-gh-release@da05d552573ad5aba039eaac05058a918a7bf631 # ratchet:softprops/action-gh-release@v2
with:
tag_name: ${{ steps.release-tag.outputs.tag }}
name: ${{ steps.release-tag.outputs.tag }}

View File

@@ -21,7 +21,7 @@ jobs:
persist-credentials: false
- name: Install Helm CLI
uses: azure/setup-helm@dda3372f752e03dde6b3237bc9431cdc2f7a02a2 # ratchet:azure/setup-helm@v5.0.0
uses: azure/setup-helm@1a275c3b69536ee54be43f2070a358922e12c8d4 # ratchet:azure/setup-helm@v4
with:
version: v3.12.1

View File

@@ -13,7 +13,7 @@ jobs:
runs-on: ubuntu-latest
timeout-minutes: 45
steps:
- uses: actions/stale@b5d41d4e1d5dceea10e7104786b73624c18a190f # ratchet:actions/stale@v10
- uses: actions/stale@997185467fa4f803885201cee163a9f38240193d # ratchet:actions/stale@v10
with:
stale-issue-message: 'This issue is stale because it has been open 75 days with no activity. Remove stale label or comment or this will be closed in 15 days.'
stale-pr-message: 'This PR is stale because it has been open 75 days with no activity. Remove stale label or comment or this will be closed in 15 days.'

View File

@@ -36,7 +36,7 @@ jobs:
persist-credentials: false
- name: Set up Helm
uses: azure/setup-helm@dda3372f752e03dde6b3237bc9431cdc2f7a02a2 # ratchet:azure/setup-helm@v5.0.0
uses: azure/setup-helm@1a275c3b69536ee54be43f2070a358922e12c8d4 # ratchet:azure/setup-helm@v4.3.1
with:
version: v3.19.0

.gitignore
View File

@@ -59,6 +59,3 @@ node_modules
# plans
plans/
# Added context for LLMs
onyx-llm-context/

View File

@@ -1,4 +1,4 @@
from typing import Any
from typing import Any, Literal
from onyx.db.engine.iam_auth import get_iam_auth_token
from onyx.configs.app_configs import USE_IAM_AUTH
from onyx.configs.app_configs import POSTGRES_HOST
@@ -19,6 +19,7 @@ from logging.config import fileConfig
from alembic import context
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.sql.schema import SchemaItem
from onyx.configs.constants import SSL_CERT_FILE
from shared_configs.configs import (
MULTI_TENANT,
@@ -44,6 +45,8 @@ if config.config_file_name is not None and config.attributes.get(
target_metadata = [Base.metadata, ResultModelBase.metadata]
EXCLUDE_TABLES = {"kombu_queue", "kombu_message"}
logger = logging.getLogger(__name__)
ssl_context: ssl.SSLContext | None = None
@@ -53,6 +56,25 @@ if USE_IAM_AUTH:
ssl_context = ssl.create_default_context(cafile=SSL_CERT_FILE)
def include_object(
object: SchemaItem, # noqa: ARG001
name: str | None,
type_: Literal[
"schema",
"table",
"column",
"index",
"unique_constraint",
"foreign_key_constraint",
],
reflected: bool, # noqa: ARG001
compare_to: SchemaItem | None, # noqa: ARG001
) -> bool:
if type_ == "table" and name in EXCLUDE_TABLES:
return False
return True
def filter_tenants_by_range(
tenant_ids: list[str], start_range: int | None = None, end_range: int | None = None
) -> list[str]:
@@ -209,6 +231,7 @@ def do_run_migrations(
context.configure(
connection=connection,
target_metadata=target_metadata, # type: ignore
include_object=include_object,
version_table_schema=schema_name,
include_schemas=True,
compare_type=True,
@@ -382,6 +405,7 @@ def run_migrations_offline() -> None:
url=url,
target_metadata=target_metadata, # type: ignore
literal_binds=True,
include_object=include_object,
version_table_schema=schema,
include_schemas=True,
script_location=config.get_main_option("script_location"),
@@ -423,6 +447,7 @@ def run_migrations_offline() -> None:
url=url,
target_metadata=target_metadata, # type: ignore
literal_binds=True,
include_object=include_object,
version_table_schema=schema,
include_schemas=True,
script_location=config.get_main_option("script_location"),
@@ -465,6 +490,7 @@ def run_migrations_online() -> None:
context.configure(
connection=connection,
target_metadata=target_metadata, # type: ignore
include_object=include_object,
version_table_schema=schema_name,
include_schemas=True,
compare_type=True,

View File

@@ -1,9 +1,11 @@
import asyncio
from logging.config import fileConfig
from typing import Literal
from sqlalchemy import pool
from sqlalchemy.engine import Connection
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.schema import SchemaItem
from alembic import context
from onyx.db.engine.sql_engine import build_connection_string
@@ -33,6 +35,27 @@ target_metadata = [PublicBase.metadata]
# my_important_option = config.get_main_option("my_important_option")
# ... etc.
EXCLUDE_TABLES = {"kombu_queue", "kombu_message"}
def include_object(
object: SchemaItem, # noqa: ARG001
name: str | None,
type_: Literal[
"schema",
"table",
"column",
"index",
"unique_constraint",
"foreign_key_constraint",
],
reflected: bool, # noqa: ARG001
compare_to: SchemaItem | None, # noqa: ARG001
) -> bool:
if type_ == "table" and name in EXCLUDE_TABLES:
return False
return True
def run_migrations_offline() -> None:
"""Run migrations in 'offline' mode.
@@ -62,6 +85,7 @@ def do_run_migrations(connection: Connection) -> None:
context.configure(
connection=connection,
target_metadata=target_metadata, # type: ignore[arg-type]
include_object=include_object,
)
with context.begin_transaction():

View File

@@ -5,7 +5,6 @@ from celery import Task
from celery.exceptions import SoftTimeLimitExceeded
from redis.lock import Lock as RedisLock
from ee.onyx.server.tenants.product_gating import get_gated_tenants
from onyx.background.celery.apps.app_base import task_logger
from onyx.background.celery.tasks.beat_schedule import BEAT_EXPIRES_DEFAULT
from onyx.configs.constants import CELERY_GENERIC_BEAT_LOCK_TIMEOUT
@@ -31,7 +30,6 @@ def cloud_beat_task_generator(
queue: str = OnyxCeleryTask.DEFAULT,
priority: int = OnyxCeleryPriority.MEDIUM,
expires: int = BEAT_EXPIRES_DEFAULT,
skip_gated: bool = True,
) -> bool | None:
"""a lightweight task used to kick off individual beat tasks per tenant."""
time_start = time.monotonic()
@@ -50,22 +48,20 @@ def cloud_beat_task_generator(
last_lock_time = time.monotonic()
tenant_ids: list[str] = []
num_processed_tenants = 0
num_skipped_gated = 0
try:
tenant_ids = get_all_tenant_ids()
# Per-task control over whether gated tenants are included. Most periodic tasks
# do no useful work on gated tenants and just waste DB connections fanning out
# to ~10k+ inactive tenants. A small number of cleanup tasks (connector deletion,
# checkpoint/index attempt cleanup) need to run on gated tenants and pass
# `skip_gated=False` from the beat schedule.
gated_tenants: set[str] = get_gated_tenants() if skip_gated else set()
# NOTE: for now, we are running tasks for gated tenants, since we want to allow
# connector deletion to run successfully. The new plan is to continuously prune
# the gated tenants set, so we won't have a build-up of old, unused gated tenants.
# Keeping this around in case we want to revert to the previous behavior.
# gated_tenants = get_gated_tenants()
for tenant_id in tenant_ids:
if tenant_id in gated_tenants:
num_skipped_gated += 1
continue
# Same comment here as the above NOTE
# if tenant_id in gated_tenants:
# continue
current_time = time.monotonic()
if current_time - last_lock_time >= (CELERY_GENERIC_BEAT_LOCK_TIMEOUT / 4):
@@ -108,7 +104,6 @@ def cloud_beat_task_generator(
f"cloud_beat_task_generator finished: "
f"task={task_name} "
f"num_processed_tenants={num_processed_tenants} "
f"num_skipped_gated={num_skipped_gated} "
f"num_tenants={len(tenant_ids)} "
f"elapsed={time_elapsed:.2f}"
)
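The fan-out loop this hunk touches — dispatch the task once per tenant, optionally skipping gated (inactive) tenants — can be sketched in isolation. Names here are simplified stand-ins for the real `cloud_beat_task_generator`:

```python
def fan_out(tenant_ids, gated_tenants, skip_gated, dispatch):
    """Per-tenant beat fan-out: call dispatch(tenant_id) for each tenant,
    skipping gated tenants when skip_gated is set."""
    num_processed = 0
    num_skipped = 0
    for tenant_id in tenant_ids:
        if skip_gated and tenant_id in gated_tenants:
            num_skipped += 1
            continue
        dispatch(tenant_id)
        num_processed += 1
    return num_processed, num_skipped
```
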

View File

@@ -27,13 +27,13 @@ from shared_configs.configs import MULTI_TENANT
from shared_configs.configs import TENANT_ID_PREFIX
# Maximum tenants to provision in a single task run.
# Each tenant takes ~80s (alembic migrations), so 15 tenants ≈ 20 minutes.
_MAX_TENANTS_PER_RUN = 15
# Each tenant takes ~80s (alembic migrations), so 5 tenants ≈ 7 minutes.
_MAX_TENANTS_PER_RUN = 5
# Time limits sized for worst-case: provisioning up to _MAX_TENANTS_PER_RUN new tenants
# (~90s each) plus migrating up to TARGET_AVAILABLE_TENANTS pool tenants (~90s each).
_TENANT_PROVISIONING_SOFT_TIME_LIMIT = 60 * 40 # 40 minutes
_TENANT_PROVISIONING_TIME_LIMIT = 60 * 45 # 45 minutes
_TENANT_PROVISIONING_SOFT_TIME_LIMIT = 60 * 20 # 20 minutes
_TENANT_PROVISIONING_TIME_LIMIT = 60 * 25 # 25 minutes
@shared_task(

View File

@@ -1,14 +1,20 @@
from datetime import datetime
from datetime import timezone
from uuid import UUID
from celery import shared_task
from celery import Task
from ee.onyx.background.celery_utils import should_perform_chat_ttl_check
from ee.onyx.background.task_name_builders import name_chat_ttl_task
from onyx.configs.app_configs import JOB_TIMEOUT
from onyx.configs.constants import OnyxCeleryTask
from onyx.db.chat import delete_chat_session
from onyx.db.chat import get_chat_sessions_older_than
from onyx.db.engine.sql_engine import get_session_with_current_tenant
from onyx.db.enums import TaskStatus
from onyx.db.tasks import mark_task_as_finished_with_id
from onyx.db.tasks import register_task
from onyx.server.settings.store import load_settings
from onyx.utils.logger import setup_logger
@@ -23,42 +29,59 @@ logger = setup_logger()
trail=False,
)
def perform_ttl_management_task(
self: Task, retention_limit_days: int, *, tenant_id: str # noqa: ARG001
self: Task, retention_limit_days: int, *, tenant_id: str
) -> None:
task_id = self.request.id
if not task_id:
raise RuntimeError("No task id defined for this task; cannot identify it")
start_time = datetime.now(tz=timezone.utc)
user_id: UUID | None = None
session_id: UUID | None = None
try:
with get_session_with_current_tenant() as db_session:
# we generally want to move off this, but keeping for now
register_task(
db_session=db_session,
task_name=name_chat_ttl_task(retention_limit_days, tenant_id),
task_id=task_id,
status=TaskStatus.STARTED,
start_time=start_time,
)
old_chat_sessions = get_chat_sessions_older_than(
retention_limit_days, db_session
)
for user_id, session_id in old_chat_sessions:
try:
with get_session_with_current_tenant() as db_session:
delete_chat_session(
user_id,
session_id,
db_session,
include_deleted=True,
hard_delete=True,
)
except Exception:
logger.exception(
"Failed to delete chat session "
f"user_id={user_id} session_id={session_id}, "
"continuing with remaining sessions"
# one session per delete so that we don't blow up if a deletion fails.
with get_session_with_current_tenant() as db_session:
delete_chat_session(
user_id,
session_id,
db_session,
include_deleted=True,
hard_delete=True,
)
with get_session_with_current_tenant() as db_session:
mark_task_as_finished_with_id(
db_session=db_session,
task_id=task_id,
success=True,
)
except Exception:
logger.exception(
f"delete_chat_session exceptioned. user_id={user_id} session_id={session_id}"
)
with get_session_with_current_tenant() as db_session:
mark_task_as_finished_with_id(
db_session=db_session,
task_id=task_id,
success=False,
)
raise
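The session-per-delete pattern adopted above can be reduced to a small sketch (the factory and delete helper are stand-ins for the real `get_session_with_current_tenant` and `delete_chat_session`):

```python
def delete_old_sessions(sessions, session_factory, delete_fn):
    """Delete each chat session in its own DB session/transaction, so a
    failure only affects the one in-flight delete, never prior work."""
    for user_id, session_id in sessions:
        with session_factory() as db_session:
            delete_fn(user_id, session_id, db_session)
```
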

View File

@@ -1,7 +1,6 @@
# Overview of Onyx Background Jobs
The background jobs take care of:
1. Pulling/Indexing documents (from connectors)
2. Updating document metadata (from connectors)
3. Cleaning up checkpoints and logic around indexing work (indexing checkpoints and index attempt metadata)
@@ -10,41 +9,37 @@ The background jobs take care of:
## Worker → Queue Mapping
| Worker | File | Queues |
| ------------------------- | ------------------------------ | -------------------------------------------------------------------------------------------------------------------- |
| Primary | `apps/primary.py` | `celery` |
| Light | `apps/light.py` | `vespa_metadata_sync`, `connector_deletion`, `doc_permissions_upsert`, `checkpoint_cleanup`, `index_attempt_cleanup` |
| Heavy | `apps/heavy.py` | `connector_pruning`, `connector_doc_permissions_sync`, `connector_external_group_sync`, `csv_generation`, `sandbox` |
| Docprocessing | `apps/docprocessing.py` | `docprocessing` |
| Docfetching | `apps/docfetching.py` | `connector_doc_fetching` |
| User File Processing | `apps/user_file_processing.py` | `user_file_processing`, `user_file_project_sync`, `user_file_delete` |
| Monitoring | `apps/monitoring.py` | `monitoring` |
| Background (consolidated) | `apps/background.py` | All queues above except `celery` |
| Worker | File | Queues |
|--------|------|--------|
| Primary | `apps/primary.py` | `celery` |
| Light | `apps/light.py` | `vespa_metadata_sync`, `connector_deletion`, `doc_permissions_upsert`, `checkpoint_cleanup`, `index_attempt_cleanup` |
| Heavy | `apps/heavy.py` | `connector_pruning`, `connector_doc_permissions_sync`, `connector_external_group_sync`, `csv_generation`, `sandbox` |
| Docprocessing | `apps/docprocessing.py` | `docprocessing` |
| Docfetching | `apps/docfetching.py` | `connector_doc_fetching` |
| User File Processing | `apps/user_file_processing.py` | `user_file_processing`, `user_file_project_sync`, `user_file_delete` |
| Monitoring | `apps/monitoring.py` | `monitoring` |
| Background (consolidated) | `apps/background.py` | All queues above except `celery` |
## Non-Worker Apps
| App | File | Purpose |
| ---------- | ----------- | ----------------------------------------------------------------------------------------------------- |
| **Beat** | `beat.py` | Celery beat scheduler with `DynamicTenantScheduler` that generates per-tenant periodic task schedules |
| **Client** | `client.py` | Minimal app for task submission from non-worker processes (e.g., API server) |
| App | File | Purpose |
|-----|------|---------|
| **Beat** | `beat.py` | Celery beat scheduler with `DynamicTenantScheduler` that generates per-tenant periodic task schedules |
| **Client** | `client.py` | Minimal app for task submission from non-worker processes (e.g., API server) |
### Shared Module
`app_base.py` provides:
- `TenantAwareTask` - Base task class that sets tenant context
- Signal handlers for logging, cleanup, and lifecycle events
- Readiness probes and health checks
## Worker Details
### Primary (Coordinator and task dispatcher)
It is the single worker that handles tasks from the default `celery` queue. Its singleton status is enforced by the `PRIMARY_WORKER` Redis lock,
which it touches every `CELERY_PRIMARY_WORKER_LOCK_TIMEOUT / 8` seconds (using Celery Bootsteps).
On startup:
- waits for redis, postgres, document index to all be healthy
- acquires the singleton lock
- cleans all the redis states associated with background jobs
@@ -52,34 +47,34 @@ On startup:
Then it cycles through its tasks as scheduled by Celery Beat:
| Task | Frequency | Description |
| --------------------------------- | --------- | ------------------------------------------------------------------------------------------ |
| `check_for_indexing` | 15s | Scans for connectors needing indexing → dispatches to `DOCFETCHING` queue |
| `check_for_vespa_sync_task` | 20s | Finds stale documents/document sets → dispatches sync tasks to `VESPA_METADATA_SYNC` queue |
| `check_for_pruning` | 20s | Finds connectors due for pruning → dispatches to `CONNECTOR_PRUNING` queue |
| `check_for_connector_deletion` | 20s | Processes deletion requests → dispatches to `CONNECTOR_DELETION` queue |
| `check_for_user_file_processing` | 20s | Checks for user uploads → dispatches to `USER_FILE_PROCESSING` queue |
| `check_for_checkpoint_cleanup` | 1h | Cleans up old indexing checkpoints |
| `check_for_index_attempt_cleanup` | 30m | Cleans up old index attempts |
| `celery_beat_heartbeat` | 1m | Heartbeat for Beat watchdog |
| Task | Frequency | Description |
|------|-----------|-------------|
| `check_for_indexing` | 15s | Scans for connectors needing indexing → dispatches to `DOCFETCHING` queue |
| `check_for_vespa_sync_task` | 20s | Finds stale documents/document sets → dispatches sync tasks to `VESPA_METADATA_SYNC` queue |
| `check_for_pruning` | 20s | Finds connectors due for pruning → dispatches to `CONNECTOR_PRUNING` queue |
| `check_for_connector_deletion` | 20s | Processes deletion requests → dispatches to `CONNECTOR_DELETION` queue |
| `check_for_user_file_processing` | 20s | Checks for user uploads → dispatches to `USER_FILE_PROCESSING` queue |
| `check_for_checkpoint_cleanup` | 1h | Cleans up old indexing checkpoints |
| `check_for_index_attempt_cleanup` | 30m | Cleans up old index attempts |
| `kombu_message_cleanup_task` | periodic | Cleans orphaned Kombu messages from DB (Kombu being the messaging framework used by Celery) |
| `celery_beat_heartbeat` | 1m | Heartbeat for Beat watchdog |
Watchdog is a separate Python process, managed by supervisord, that runs alongside the Celery workers. It checks the `ONYX_CELERY_BEAT_HEARTBEAT_KEY` in
Redis to ensure Celery Beat is not dead; Beat schedules `celery_beat_heartbeat` for Primary to touch the key and signal that it is still alive.
See `supervisord.conf` for the watchdog config.
### Light
### Light
Fast, short-lived tasks that are not resource intensive. High concurrency:
up to 24 concurrent workers, each with a prefetch of 8, for a total of 192 tasks in flight at once.
Tasks it handles:
- Syncs access/permissions, document sets, boosts, hidden state
- Deletes documents that are marked for deletion in Postgres
- Cleanup of checkpoints and index attempts
### Heavy
### Heavy
Long-running, resource-intensive tasks; handles pruning and sandbox operations. Low concurrency: max concurrency of 4 with a prefetch of 1.
Does not interact with the Document Index; it handles syncs with external systems, making large-volume API calls for pruning, fetching permissions, etc.
@@ -88,24 +83,16 @@ Generates CSV exports which may take a long time with significant data in Postgr
Sandbox (a new feature) for running Next.js, a Python virtual env, the OpenCode AI Agent, and accessing knowledge files
### Docprocessing, Docfetching, User File Processing
Docprocessing and Docfetching are for indexing documents:
- Docfetching runs connectors to pull documents from external APIs (Google Drive, Confluence, etc.), stores batches to file storage, and dispatches docprocessing tasks
- Docprocessing retrieves batches, runs the indexing pipeline (chunking, embedding), and indexes into the Document Index
- User Files come from uploads directly via the input bar
- Docprocessing retrieves batches, runs the indexing pipeline (chunking, embedding), and indexes into the Document Index
User Files come from uploads directly via the input bar
### Monitoring
Observability and metrics collection:
- Queue lengths, connector success/failure, connector latencies
- Queue lengths, connector success/failure, connector latencies
- Memory of supervisor managed processes (workers, beat, slack)
- Cloud- and multi-tenant-specific monitoring
## Prometheus Metrics
Workers can expose Prometheus metrics via a standalone HTTP server. Currently docfetching and docprocessing have push-based task lifecycle metrics; the monitoring worker runs pull-based collectors for queue depth and connector health.
For the full metric reference, integration guide, and PromQL examples, see [`docs/METRICS.md`](../../../docs/METRICS.md#celery-worker-metrics).

View File

@@ -13,6 +13,12 @@ from celery.signals import worker_shutdown
import onyx.background.celery.apps.app_base as app_base
from onyx.configs.constants import POSTGRES_CELERY_WORKER_HEAVY_APP_NAME
from onyx.db.engine.sql_engine import SqlEngine
from onyx.server.metrics.celery_task_metrics import on_celery_task_postrun
from onyx.server.metrics.celery_task_metrics import on_celery_task_prerun
from onyx.server.metrics.celery_task_metrics import on_celery_task_rejected
from onyx.server.metrics.celery_task_metrics import on_celery_task_retry
from onyx.server.metrics.celery_task_metrics import on_celery_task_revoked
from onyx.server.metrics.metrics_server import start_metrics_server
from onyx.utils.logger import setup_logger
from shared_configs.configs import MULTI_TENANT
@@ -34,6 +40,7 @@ def on_task_prerun(
**kwds: Any,
) -> None:
app_base.on_task_prerun(sender, task_id, task, args, kwargs, **kwds)
on_celery_task_prerun(task_id, task)
@signals.task_postrun.connect
@@ -48,6 +55,31 @@ def on_task_postrun(
**kwds: Any,
) -> None:
app_base.on_task_postrun(sender, task_id, task, args, kwargs, retval, state, **kwds)
on_celery_task_postrun(task_id, task, state)
@signals.task_retry.connect
def on_task_retry(sender: Any | None = None, **kwargs: Any) -> None: # noqa: ARG001
task_id = getattr(getattr(sender, "request", None), "id", None)
on_celery_task_retry(task_id, sender)
@signals.task_revoked.connect
def on_task_revoked(sender: Any | None = None, **kwargs: Any) -> None:
task_name = getattr(sender, "name", None) or str(sender)
on_celery_task_revoked(kwargs.get("task_id"), task_name)
@signals.task_rejected.connect
def on_task_rejected(sender: Any | None = None, **kwargs: Any) -> None: # noqa: ARG001
message = kwargs.get("message")
task_name: str | None = None
if message is not None:
headers = getattr(message, "headers", None) or {}
task_name = headers.get("task")
if task_name is None:
task_name = "unknown"
on_celery_task_rejected(None, task_name)
@celeryd_init.connect
@@ -76,6 +108,7 @@ def on_worker_init(sender: Worker, **kwargs: Any) -> None:
@worker_ready.connect
def on_worker_ready(sender: Any, **kwargs: Any) -> None:
start_metrics_server("heavy")
app_base.on_worker_ready(sender, **kwargs)
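The prerun/postrun pairing these signal handlers feed can be sketched without Celery or Prometheus. This is a simplified stand-in for the real `celery_task_metrics` module: prerun records a start time keyed by task ID, postrun computes the duration and bumps a per-state counter.

```python
import time

class TaskMetrics:
    """Minimal push-based task lifecycle metrics (illustrative stand-in)."""

    def __init__(self):
        self.start_times = {}   # task_id -> monotonic start time
        self.state_counts = {}  # (task_name, state) -> count
        self.durations = []     # (task_name, elapsed seconds)

    def on_prerun(self, task_id, task_name):
        self.start_times[task_id] = time.monotonic()

    def on_postrun(self, task_id, task_name, state):
        started = self.start_times.pop(task_id, None)
        if started is not None:
            self.durations.append((task_name, time.monotonic() - started))
        key = (task_name, state)
        self.state_counts[key] = self.state_counts.get(key, 0) + 1
```

A real worker would export `state_counts` as a Counter and `durations` as a Histogram via the metrics HTTP server started in `on_worker_ready`.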

View File

@@ -317,6 +317,7 @@ celery_app.autodiscover_tasks(
"onyx.background.celery.tasks.docprocessing",
"onyx.background.celery.tasks.evals",
"onyx.background.celery.tasks.hierarchyfetching",
"onyx.background.celery.tasks.periodic",
"onyx.background.celery.tasks.pruning",
"onyx.background.celery.tasks.shared",
"onyx.background.celery.tasks.vespa",

View File

@@ -75,8 +75,6 @@ beat_task_templates: list[dict] = [
"options": {
"priority": OnyxCeleryPriority.LOW,
"expires": BEAT_EXPIRES_DEFAULT,
# Run on gated tenants too — they may still have stale checkpoints to clean.
"skip_gated": False,
},
},
{
@@ -86,8 +84,6 @@ beat_task_templates: list[dict] = [
"options": {
"priority": OnyxCeleryPriority.MEDIUM,
"expires": BEAT_EXPIRES_DEFAULT,
# Run on gated tenants too — they may still have stale index attempts.
"skip_gated": False,
},
},
{
@@ -97,8 +93,6 @@ beat_task_templates: list[dict] = [
"options": {
"priority": OnyxCeleryPriority.MEDIUM,
"expires": BEAT_EXPIRES_DEFAULT,
# Gated tenants may still have connectors awaiting deletion.
"skip_gated": False,
},
},
{
@@ -272,7 +266,7 @@ def make_cloud_generator_task(task: dict[str, Any]) -> dict[str, Any]:
cloud_task["kwargs"] = {}
cloud_task["kwargs"]["task_name"] = task["task"]
optional_fields = ["queue", "priority", "expires", "skip_gated"]
optional_fields = ["queue", "priority", "expires"]
for field in optional_fields:
if field in task["options"]:
cloud_task["kwargs"][field] = task["options"][field]
@@ -308,7 +302,7 @@ beat_cloud_tasks: list[dict] = [
{
"name": f"{ONYX_CLOUD_CELERY_TASK_PREFIX}_check-available-tenants",
"task": OnyxCeleryTask.CLOUD_CHECK_AVAILABLE_TENANTS,
"schedule": timedelta(minutes=2),
"schedule": timedelta(minutes=10),
"options": {
"queue": OnyxCeleryQueues.MONITORING,
"priority": OnyxCeleryPriority.HIGH,
@@ -365,13 +359,7 @@ if not MULTI_TENANT:
]
)
# `skip_gated` is a cloud-only hint consumed by `cloud_beat_task_generator`. Strip
# it before extending the self-hosted schedule so it doesn't leak into apply_async
# as an unrecognised option on every fired task message.
for _template in beat_task_templates:
_self_hosted_template = copy.deepcopy(_template)
_self_hosted_template["options"].pop("skip_gated", None)
tasks_to_schedule.append(_self_hosted_template)
tasks_to_schedule.extend(beat_task_templates)
def generate_cloud_tasks(

View File

@@ -36,7 +36,6 @@ from onyx.configs.constants import OnyxRedisLocks
from onyx.db.engine.sql_engine import get_session_with_current_tenant
from onyx.db.opensearch_migration import build_sanitized_to_original_doc_id_mapping
from onyx.db.opensearch_migration import get_vespa_visit_state
from onyx.db.opensearch_migration import is_migration_completed
from onyx.db.opensearch_migration import (
mark_migration_completed_time_if_not_set_with_commit,
)
@@ -107,19 +106,14 @@ def migrate_chunks_from_vespa_to_opensearch_task(
acquired; effectively a no-op. True if the task completed
successfully. False if the task errored.
"""
# 1. Check if we should run the task.
# 1.a. If OpenSearch indexing is disabled, we don't run the task.
if not ENABLE_OPENSEARCH_INDEXING_FOR_ONYX:
task_logger.warning(
"OpenSearch migration is not enabled, skipping chunk migration task."
)
return None
task_logger.info("Starting chunk-level migration from Vespa to OpenSearch.")
task_start_time = time.monotonic()
# 1.b. Only one instance per tenant of this task may run concurrently at
# once. If we fail to acquire a lock, we assume it is because another task
# has one and we exit.
r = get_redis_client()
lock: RedisLock = r.lock(
name=OnyxRedisLocks.OPENSEARCH_MIGRATION_BEAT_LOCK,
@@ -142,11 +136,10 @@ def migrate_chunks_from_vespa_to_opensearch_task(
f"Token: {lock.local.token}"
)
# 2. Prepare to migrate.
total_chunks_migrated_this_task = 0
total_chunks_errored_this_task = 0
try:
# 2.a. Double-check that tenant info is correct.
# Double check that tenant info is correct.
if tenant_id != get_current_tenant_id():
err_str = (
f"Tenant ID mismatch in the OpenSearch migration task: "
@@ -155,62 +148,16 @@ def migrate_chunks_from_vespa_to_opensearch_task(
task_logger.error(err_str)
return False
# Do as much as we can with a DB session in one spot to not hold a
# session during a migration batch.
with get_session_with_current_tenant() as db_session:
# 2.b. Immediately check to see if this tenant is done, to save
# having to do any other work. This function does not require a
# migration record to necessarily exist.
if is_migration_completed(db_session):
return True
# 2.c. Try to insert the OpenSearchTenantMigrationRecord table if it
# does not exist.
with (
get_session_with_current_tenant() as db_session,
get_vespa_http_client(
timeout=VESPA_MIGRATION_REQUEST_TIMEOUT_S
) as vespa_client,
):
try_insert_opensearch_tenant_migration_record_with_commit(db_session)
# 2.d. Get search settings.
search_settings = get_current_search_settings(db_session)
indexing_setting = IndexingSetting.from_db_model(search_settings)
# 2.e. Build sanitized to original doc ID mapping to check for
# conflicts in the event we sanitize a doc ID to an
# already-existing doc ID.
# We reconstruct this mapping for every task invocation because
# a document may have been added in the time between two tasks.
sanitized_doc_start_time = time.monotonic()
sanitized_to_original_doc_id_mapping = (
build_sanitized_to_original_doc_id_mapping(db_session)
)
task_logger.debug(
f"Built sanitized_to_original_doc_id_mapping with {len(sanitized_to_original_doc_id_mapping)} entries "
f"in {time.monotonic() - sanitized_doc_start_time:.3f} seconds."
)
# 2.f. Get the current migration state.
continuation_token_map, total_chunks_migrated = get_vespa_visit_state(
db_session
)
# 2.f.1. Double-check that the migration state does not imply
# completion. Really we should never have to enter this block as we
# would expect is_migration_completed to return True, but in the
# strange event that the migration is complete but the migration
# completed time was never stamped, we stamp it here.
if is_continuation_token_done_for_all_slices(continuation_token_map):
task_logger.info(
f"OpenSearch migration COMPLETED for tenant {tenant_id}. Total chunks migrated: {total_chunks_migrated}."
)
mark_migration_completed_time_if_not_set_with_commit(db_session)
return True
task_logger.debug(
f"Read the tenant migration record. Total chunks migrated: {total_chunks_migrated}. "
f"Continuation token map: {continuation_token_map}"
)
with get_vespa_http_client(
timeout=VESPA_MIGRATION_REQUEST_TIMEOUT_S
) as vespa_client:
# 2.g. Create the OpenSearch and Vespa document indexes.
tenant_state = TenantState(tenant_id=tenant_id, multitenant=MULTI_TENANT)
indexing_setting = IndexingSetting.from_db_model(search_settings)
opensearch_document_index = OpenSearchDocumentIndex(
tenant_state=tenant_state,
index_name=search_settings.index_name,
@@ -224,14 +171,22 @@ def migrate_chunks_from_vespa_to_opensearch_task(
httpx_client=vespa_client,
)
# 2.h. Get the approximate chunk count in Vespa as of this time to
# update the migration record.
sanitized_doc_start_time = time.monotonic()
# We reconstruct this mapping for every task invocation because a
# document may have been added in the time between two tasks.
sanitized_to_original_doc_id_mapping = (
build_sanitized_to_original_doc_id_mapping(db_session)
)
task_logger.debug(
f"Built sanitized_to_original_doc_id_mapping with {len(sanitized_to_original_doc_id_mapping)} entries "
f"in {time.monotonic() - sanitized_doc_start_time:.3f} seconds."
)
approx_chunk_count_in_vespa: int | None = None
get_chunk_count_start_time = time.monotonic()
try:
approx_chunk_count_in_vespa = vespa_document_index.get_chunk_count()
except Exception:
# This failure should not be blocking.
task_logger.exception(
"Error getting approximate chunk count in Vespa. Moving on..."
)
@@ -240,12 +195,25 @@ def migrate_chunks_from_vespa_to_opensearch_task(
f"approximate chunk count in Vespa. Got {approx_chunk_count_in_vespa}."
)
# 3. Do the actual migration in batches until we run out of time.
while (
time.monotonic() - task_start_time < MIGRATION_TASK_SOFT_TIME_LIMIT_S
and lock.owned()
):
# 3.a. Get the next batch of raw chunks from Vespa.
(
continuation_token_map,
total_chunks_migrated,
) = get_vespa_visit_state(db_session)
if is_continuation_token_done_for_all_slices(continuation_token_map):
task_logger.info(
f"OpenSearch migration COMPLETED for tenant {tenant_id}. Total chunks migrated: {total_chunks_migrated}."
)
mark_migration_completed_time_if_not_set_with_commit(db_session)
break
task_logger.debug(
f"Read the tenant migration record. Total chunks migrated: {total_chunks_migrated}. "
f"Continuation token map: {continuation_token_map}"
)
get_vespa_chunks_start_time = time.monotonic()
raw_vespa_chunks, next_continuation_token_map = (
vespa_document_index.get_all_raw_document_chunks_paginated(
@@ -258,7 +226,6 @@ def migrate_chunks_from_vespa_to_opensearch_task(
f"seconds. Next continuation token map: {next_continuation_token_map}"
)
# 3.b. Transform the raw chunks to OpenSearch chunks in memory.
opensearch_document_chunks, errored_chunks = (
transform_vespa_chunks_to_opensearch_chunks(
raw_vespa_chunks,
@@ -273,7 +240,6 @@ def migrate_chunks_from_vespa_to_opensearch_task(
"errored."
)
# 3.c. Index the OpenSearch chunks into OpenSearch.
index_opensearch_chunks_start_time = time.monotonic()
opensearch_document_index.index_raw_chunks(
chunks=opensearch_document_chunks
@@ -285,38 +251,12 @@ def migrate_chunks_from_vespa_to_opensearch_task(
total_chunks_migrated_this_task += len(opensearch_document_chunks)
total_chunks_errored_this_task += len(errored_chunks)
# Do as much as we can with a DB session in one spot to not hold a
# session during a migration batch.
with get_session_with_current_tenant() as db_session:
# 3.d. Update the migration state.
update_vespa_visit_progress_with_commit(
db_session,
continuation_token_map=next_continuation_token_map,
chunks_processed=len(opensearch_document_chunks),
chunks_errored=len(errored_chunks),
approx_chunk_count_in_vespa=approx_chunk_count_in_vespa,
)
# 3.e. Get the current migration state. Even though we
# technically have it in-memory since we just wrote it, we
# want to reference the DB as the source of truth at all
# times.
continuation_token_map, total_chunks_migrated = (
get_vespa_visit_state(db_session)
)
# 3.e.1. Check if the migration is done.
if is_continuation_token_done_for_all_slices(
continuation_token_map
):
task_logger.info(
f"OpenSearch migration COMPLETED for tenant {tenant_id}. Total chunks migrated: {total_chunks_migrated}."
)
mark_migration_completed_time_if_not_set_with_commit(db_session)
return True
task_logger.debug(
f"Read the tenant migration record. Total chunks migrated: {total_chunks_migrated}. "
f"Continuation token map: {continuation_token_map}"
update_vespa_visit_progress_with_commit(
db_session,
continuation_token_map=next_continuation_token_map,
chunks_processed=len(opensearch_document_chunks),
chunks_errored=len(errored_chunks),
approx_chunk_count_in_vespa=approx_chunk_count_in_vespa,
)
except Exception:
traceback.print_exc()
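The migration loop above runs batches only while two conditions hold: the soft time budget (`MIGRATION_TASK_SOFT_TIME_LIMIT_S`) has not been exhausted and the Redis lock is still owned. That control flow can be sketched in isolation; this is a minimal illustration with hypothetical names (`fetch_batch`, `process_batch`), not the real task:

```python
import time

def run_batches_within_budget(fetch_batch, process_batch, soft_limit_s: float) -> int:
    """Process batches until the time budget is exhausted or batches run out."""
    start = time.monotonic()
    processed = 0
    while time.monotonic() - start < soft_limit_s:
        batch = fetch_batch()
        if not batch:
            # No more work: stop early rather than burning the remaining budget.
            break
        process_batch(batch)
        processed += len(batch)
    return processed

# Simulate three fetches: two real batches, then an empty one signalling "done".
batches = iter([[1, 2, 3], [4, 5], []])
total = run_batches_within_budget(lambda: next(batches), lambda b: None, soft_limit_s=5.0)
print(total)  # 5
```

The real task additionally checks `lock.owned()` each iteration so a lost lock ends the run as promptly as an exhausted budget.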

View File

@@ -0,0 +1,138 @@
#####
# Periodic Tasks
#####
import json
from typing import Any
from celery import shared_task
from celery.contrib.abortable import AbortableTask # type: ignore
from celery.exceptions import TaskRevokedError
from sqlalchemy import inspect
from sqlalchemy import text
from sqlalchemy.orm import Session
from onyx.background.celery.apps.app_base import task_logger
from onyx.configs.app_configs import JOB_TIMEOUT
from onyx.configs.constants import OnyxCeleryTask
from onyx.configs.constants import PostgresAdvisoryLocks
from onyx.db.engine.sql_engine import get_session_with_current_tenant
@shared_task(
name=OnyxCeleryTask.KOMBU_MESSAGE_CLEANUP_TASK,
soft_time_limit=JOB_TIMEOUT,
bind=True,
base=AbortableTask,
)
def kombu_message_cleanup_task(self: Any, tenant_id: str) -> int: # noqa: ARG001
"""Runs periodically to clean up the kombu_message table"""
# we will select messages older than this amount to clean up
KOMBU_MESSAGE_CLEANUP_AGE = 7 # days
KOMBU_MESSAGE_CLEANUP_PAGE_LIMIT = 1000
ctx = {}
ctx["last_processed_id"] = 0
ctx["deleted"] = 0
ctx["cleanup_age"] = KOMBU_MESSAGE_CLEANUP_AGE
ctx["page_limit"] = KOMBU_MESSAGE_CLEANUP_PAGE_LIMIT
with get_session_with_current_tenant() as db_session:
# Exit the task if we can't take the advisory lock
result = db_session.execute(
text("SELECT pg_try_advisory_lock(:id)"),
{"id": PostgresAdvisoryLocks.KOMBU_MESSAGE_CLEANUP_LOCK_ID.value},
).scalar()
if not result:
return 0
while True:
if self.is_aborted():
raise TaskRevokedError("kombu_message_cleanup_task was aborted.")
b = kombu_message_cleanup_task_helper(ctx, db_session)
if not b:
break
db_session.commit()
if ctx["deleted"] > 0:
task_logger.info(
f"Deleted {ctx['deleted']} orphaned messages from kombu_message."
)
return ctx["deleted"]
def kombu_message_cleanup_task_helper(ctx: dict, db_session: Session) -> bool:
"""
Helper function to clean up old messages from the `kombu_message` table that are no longer relevant.
This function retrieves messages from the `kombu_message` table that are no longer visible and
older than a specified interval. It checks if the corresponding task_id exists in the
`celery_taskmeta` table. If the task_id does not exist, the message is deleted.
Args:
ctx (dict): A context dictionary containing configuration parameters such as:
- 'cleanup_age' (int): The age in days after which messages are considered old.
- 'page_limit' (int): The maximum number of messages to process in one batch.
- 'last_processed_id' (int): The ID of the last processed message to handle pagination.
- 'deleted' (int): A counter to track the number of deleted messages.
db_session (Session): The SQLAlchemy database session for executing queries.
Returns:
bool: Returns True if there are more rows to process, False if not.
"""
inspector = inspect(db_session.bind)
if not inspector:
return False
# With the move to redis as celery's broker and backend, kombu tables may not even exist.
# We can fail silently.
if not inspector.has_table("kombu_message"):
return False
query = text(
"""
SELECT id, timestamp, payload
FROM kombu_message WHERE visible = 'false'
AND timestamp < CURRENT_TIMESTAMP - INTERVAL :interval_days
AND id > :last_processed_id
ORDER BY id
LIMIT :page_limit
"""
)
kombu_messages = db_session.execute(
query,
{
"interval_days": f"{ctx['cleanup_age']} days",
"page_limit": ctx["page_limit"],
"last_processed_id": ctx["last_processed_id"],
},
).fetchall()
if len(kombu_messages) == 0:
return False
for msg in kombu_messages:
payload = json.loads(msg[2])
task_id = payload["headers"]["id"]
# Check if task_id exists in celery_taskmeta
task_exists = db_session.execute(
text("SELECT 1 FROM celery_taskmeta WHERE task_id = :task_id"),
{"task_id": task_id},
).fetchone()
# If task_id does not exist, delete the message
if not task_exists:
result = db_session.execute(
text("DELETE FROM kombu_message WHERE id = :message_id"),
{"message_id": msg[0]},
)
if result.rowcount > 0: # type: ignore
ctx["deleted"] += 1
ctx["last_processed_id"] = msg[0]
return True
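The helper above pages through `kombu_message` with keyset pagination (`id > :last_processed_id ... ORDER BY id LIMIT :page_limit`), carrying `last_processed_id` in `ctx` so each batch resumes where the previous one stopped. A self-contained sketch of the same pattern against an in-memory SQLite table (the table and deletion rule here are illustrative, not the real kombu schema or the celery_taskmeta check):

```python
import sqlite3

def cleanup_batch(conn: sqlite3.Connection, ctx: dict) -> bool:
    """Delete one page of rows; return True if more rows may remain."""
    rows = conn.execute(
        "SELECT id FROM messages WHERE id > ? ORDER BY id LIMIT ?",
        (ctx["last_processed_id"], ctx["page_limit"]),
    ).fetchall()
    if not rows:
        return False
    for (row_id,) in rows:
        conn.execute("DELETE FROM messages WHERE id = ?", (row_id,))
        ctx["deleted"] += 1
        # Advance the keyset cursor so the next page starts after this row.
        ctx["last_processed_id"] = row_id
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO messages (id) VALUES (?)", [(i,) for i in range(1, 26)])

ctx = {"last_processed_id": 0, "deleted": 0, "page_limit": 10}
while cleanup_batch(conn, ctx):
    pass
print(ctx["deleted"])  # 25
```

Keyset pagination avoids the `OFFSET` pitfall where concurrent deletes shift row positions between pages.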

View File

@@ -217,7 +217,7 @@ def check_for_pruning(self: Task, *, tenant_id: str) -> bool | None:
try:
# the entire task needs to run frequently in order to finalize pruning
# but pruning only kicks off once per hour
# but pruning only kicks off once per min
if not r.exists(OnyxRedisSignals.BLOCK_PRUNING):
task_logger.info("Checking for pruning due")

View File

@@ -996,7 +996,6 @@ def _run_models(
def _run_model(model_idx: int) -> None:
"""Run one LLM loop inside a worker thread, writing packets to ``merged_queue``."""
model_emitter = Emitter(
model_idx=model_idx,
merged_queue=merged_queue,
@@ -1103,33 +1102,33 @@ def _run_models(
finally:
merged_queue.put((model_idx, _MODEL_DONE))
def _save_errored_message(model_idx: int, context: str) -> None:
"""Save an error message to a reserved ChatMessage that failed during execution."""
def _delete_orphaned_message(model_idx: int, context: str) -> None:
"""Delete a reserved ChatMessage that was never populated due to a model error."""
try:
msg = db_session.get(ChatMessage, setup.reserved_messages[model_idx].id)
if msg is not None:
error_text = f"Error from {setup.model_display_names[model_idx]}: model encountered an error during generation."
msg.message = error_text
msg.error = error_text
orphaned = db_session.get(
ChatMessage, setup.reserved_messages[model_idx].id
)
if orphaned is not None:
db_session.delete(orphaned)
db_session.commit()
except Exception:
logger.exception(
"%s error save failed for model %d (%s)",
"%s orphan cleanup failed for model %d (%s)",
context,
model_idx,
setup.model_display_names[model_idx],
)
# Each worker thread needs its own Context copy — a single Context object
# cannot be entered concurrently by multiple threads (RuntimeError).
# Copy contextvars before submitting futures — ThreadPoolExecutor does NOT
# auto-propagate contextvars in Python 3.11; threads would inherit a blank context.
worker_context = contextvars.copy_context()
executor = ThreadPoolExecutor(
max_workers=n_models, thread_name_prefix="multi-model"
)
completion_persisted: bool = False
try:
for i in range(n_models):
ctx = contextvars.copy_context()
executor.submit(ctx.run, _run_model, i)
executor.submit(worker_context.run, _run_model, i)
# ── Main thread: merge and yield packets ────────────────────────────
models_remaining = n_models
@@ -1146,7 +1145,7 @@ def _run_models(
# save "stopped by user" for a model that actually threw an exception.
for i in range(n_models):
if model_errored[i]:
_save_errored_message(i, "stop-button")
_delete_orphaned_message(i, "stop-button")
continue
try:
succeeded = model_succeeded[i]
@@ -1212,7 +1211,7 @@ def _run_models(
for i in range(n_models):
if not model_succeeded[i]:
# Model errored — delete its orphaned reserved message.
_save_errored_message(i, "normal")
_delete_orphaned_message(i, "normal")
continue
try:
llm_loop_completion_handle(
@@ -1265,7 +1264,7 @@ def _run_models(
setup.model_display_names[i],
)
elif model_errored[i]:
_save_errored_message(i, "disconnect")
_delete_orphaned_message(i, "disconnect")
# 4. Drain buffered packets from memory — no consumer is running.
while not merged_queue.empty():
try:

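The comment in the diff above notes that `ThreadPoolExecutor` does not propagate contextvars: a worker thread starts with an empty context, so anything set in the submitting thread is invisible unless the caller snapshots its context with `contextvars.copy_context()` and submits through `ctx.run`. A minimal demonstration (variable names are illustrative):

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

request_id: contextvars.ContextVar[str] = contextvars.ContextVar("request_id", default="unset")
request_id.set("req-42")

def read_request_id() -> str:
    return request_id.get()

with ThreadPoolExecutor(max_workers=1, thread_name_prefix="demo") as pool:
    # A new thread starts with an empty context: the var falls back to its default.
    naive = pool.submit(read_request_id).result()
    # copy_context() snapshots the caller's context; ctx.run executes inside it.
    ctx = contextvars.copy_context()
    propagated = pool.submit(ctx.run, read_request_id).result()

print(naive)       # unset
print(propagated)  # req-42
```

Note that a single `Context` object cannot be entered by two threads at once, which is why the diff's two variants differ on whether one shared copy or a per-worker copy is taken.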
View File

@@ -379,14 +379,6 @@ POSTGRES_HOST = os.environ.get("POSTGRES_HOST") or "127.0.0.1"
POSTGRES_PORT = os.environ.get("POSTGRES_PORT") or "5432"
POSTGRES_DB = os.environ.get("POSTGRES_DB") or "postgres"
AWS_REGION_NAME = os.environ.get("AWS_REGION_NAME") or "us-east-2"
# Comma-separated replica / multi-host list. If unset, defaults to POSTGRES_HOST
# only.
_POSTGRES_HOSTS_STR = os.environ.get("POSTGRES_HOSTS", "").strip()
POSTGRES_HOSTS: list[str] = (
[h.strip() for h in _POSTGRES_HOSTS_STR.split(",") if h.strip()]
if _POSTGRES_HOSTS_STR
else [POSTGRES_HOST]
)
POSTGRES_API_SERVER_POOL_SIZE = int(
os.environ.get("POSTGRES_API_SERVER_POOL_SIZE") or 40

View File

@@ -12,11 +12,6 @@ SLACK_USER_TOKEN_PREFIX = "xoxp-"
SLACK_BOT_TOKEN_PREFIX = "xoxb-"
ONYX_EMAILABLE_LOGO_MAX_DIM = 512
# The mask_string() function in encryption.py uses "•" (U+2022 BULLET) to mask secrets.
MASK_CREDENTIAL_CHAR = "\u2022"
# Pattern produced by mask_string for strings >= 14 chars: "abcd...wxyz" (exactly 11 chars)
MASK_CREDENTIAL_LONG_RE = re.compile(r"^.{4}\.{3}.{4}$")
SOURCE_TYPE = "source_type"
# stored in the `metadata` of a chunk. Used to signify that this chunk should
# not be used for QA. For example, Google Drive file types which can't be parsed
@@ -396,6 +391,10 @@ class MilestoneRecordType(str, Enum):
REQUESTED_CONNECTOR = "requested_connector"
class PostgresAdvisoryLocks(Enum):
KOMBU_MESSAGE_CLEANUP_LOCK_ID = auto()
class OnyxCeleryQueues:
# "celery" is the default queue defined by celery and also the queue
# we are running in the primary worker to run system tasks
@@ -578,6 +577,7 @@ class OnyxCeleryTask:
MONITOR_PROCESS_MEMORY = "monitor_process_memory"
CELERY_BEAT_HEARTBEAT = "celery_beat_heartbeat"
KOMBU_MESSAGE_CLEANUP_TASK = "kombu_message_cleanup_task"
CONNECTOR_PERMISSION_SYNC_GENERATOR_TASK = (
"connector_permission_sync_generator_task"
)

View File

@@ -44,7 +44,7 @@ _NOTION_CALL_TIMEOUT = 30 # 30 seconds
_MAX_PAGES = 1000
# TODO: Pages need to have their metadata ingested
# TODO: Tables need to be ingested, Pages need to have their metadata ingested
class NotionPage(BaseModel):
@@ -452,19 +452,6 @@ class NotionConnector(LoadConnector, PollConnector):
sub_inner_dict: dict[str, Any] | list[Any] | str = inner_dict
while isinstance(sub_inner_dict, dict) and "type" in sub_inner_dict:
type_name = sub_inner_dict["type"]
# Notion user objects (people properties, created_by, etc.) have
# "name" at the same level as "type": "person"/"bot". If we drill
# into the person/bot sub-dict we lose the name. Capture it here
# before descending, but skip "title"-type properties where "name"
# is not the display value we want.
if (
"name" in sub_inner_dict
and isinstance(sub_inner_dict["name"], str)
and type_name not in ("title",)
):
return sub_inner_dict["name"]
sub_inner_dict = sub_inner_dict[type_name]
# If the innermost layer is None, the value is not set
@@ -676,19 +663,6 @@ class NotionConnector(LoadConnector, PollConnector):
text = rich_text["text"]["content"]
cur_result_text_arr.append(text)
# table_row blocks store content in "cells" (list of lists
# of rich text objects) rather than "rich_text"
if "cells" in result_obj:
row_cells: list[str] = []
for cell in result_obj["cells"]:
cell_texts = [
rt.get("plain_text", "")
for rt in cell
if isinstance(rt, dict)
]
row_cells.append(" ".join(cell_texts))
cur_result_text_arr.append("\t".join(row_cells))
if result["has_children"]:
if result_type == "child_page":
# Child pages will not be included at this top level, it will be a separate document.

View File

@@ -190,23 +190,16 @@ def delete_messages_and_files_from_chat_session(
chat_session_id: UUID, db_session: Session
) -> None:
# Select messages older than cutoff_time with files
messages_with_files = (
db_session.execute(
select(ChatMessage.id, ChatMessage.files).where(
ChatMessage.chat_session_id == chat_session_id,
)
messages_with_files = db_session.execute(
select(ChatMessage.id, ChatMessage.files).where(
ChatMessage.chat_session_id == chat_session_id,
)
.tuples()
.all()
)
).fetchall()
file_store = get_default_file_store()
for _, files in messages_with_files:
file_store = get_default_file_store()
for file_info in files or []:
if file_info.get("user_file_id"):
# user files are managed by the user file lifecycle
continue
file_store.delete_file(file_id=file_info["id"], error_on_missing=False)
file_store.delete_file(file_id=file_info.get("id"))
# Delete ChatMessage records - CASCADE constraints will automatically handle:
# - ChatMessage__StandardAnswer relationship records

View File

@@ -8,8 +8,6 @@ from sqlalchemy.orm import selectinload
from sqlalchemy.orm import Session
from onyx.configs.constants import FederatedConnectorSource
from onyx.configs.constants import MASK_CREDENTIAL_CHAR
from onyx.configs.constants import MASK_CREDENTIAL_LONG_RE
from onyx.db.engine.sql_engine import get_session_with_current_tenant
from onyx.db.models import DocumentSet
from onyx.db.models import FederatedConnector
@@ -47,23 +45,6 @@ def fetch_all_federated_connectors_parallel() -> list[FederatedConnector]:
return fetch_all_federated_connectors(db_session)
def _reject_masked_credentials(credentials: dict[str, Any]) -> None:
"""Raise if any credential string value contains mask placeholder characters.
mask_string() has two output formats:
- Short strings (< 14 chars): "••••••••••••" (U+2022 BULLET)
- Long strings (>= 14 chars): "abcd...wxyz" (first4 + "..." + last4)
Both must be rejected.
"""
for key, val in credentials.items():
if isinstance(val, str) and (
MASK_CREDENTIAL_CHAR in val or MASK_CREDENTIAL_LONG_RE.match(val)
):
raise ValueError(
f"Credential field '{key}' contains masked placeholder characters. Please provide the actual credential value."
)
def validate_federated_connector_credentials(
source: FederatedConnectorSource,
credentials: dict[str, Any],
@@ -85,8 +66,6 @@ def create_federated_connector(
config: dict[str, Any] | None = None,
) -> FederatedConnector:
"""Create a new federated connector with credential and config validation."""
_reject_masked_credentials(credentials)
# Validate credentials before creating
if not validate_federated_connector_credentials(source, credentials):
raise ValueError(
@@ -298,8 +277,6 @@ def update_federated_connector(
)
if credentials is not None:
_reject_masked_credentials(credentials)
# Validate credentials before updating
if not validate_federated_connector_credentials(
federated_connector.source, credentials

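The two mask formats that `_reject_masked_credentials` guards against can be checked in isolation. This sketch copies the constant and regex from the diff above and assumes `mask_string` behaves as the comments describe (all bullets for short secrets, `first4 + "..." + last4` for long ones):

```python
import re

MASK_CREDENTIAL_CHAR = "\u2022"  # U+2022 BULLET, used for short masked strings
# Long-string mask: first 4 chars + "..." + last 4 chars, exactly 11 chars total.
MASK_CREDENTIAL_LONG_RE = re.compile(r"^.{4}\.{3}.{4}$")

def looks_masked(value: str) -> bool:
    """Return True if a credential value looks like a mask placeholder."""
    return MASK_CREDENTIAL_CHAR in value or bool(MASK_CREDENTIAL_LONG_RE.match(value))

print(looks_masked("\u2022" * 12))       # True  (short-form mask)
print(looks_masked("sk-a...wxyz"))       # True  (long-form mask, 11 chars)
print(looks_masked("xoxb-real-token"))   # False (a plausible real token)
```

Rejecting these up front prevents a round-tripped masked value from silently overwriting a stored credential.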
View File

@@ -236,15 +236,14 @@ def upsert_llm_provider(
db_session.add(existing_llm_provider)
# Filter out empty strings and None values from custom_config to allow
# providers like Bedrock to fall back to IAM roles when credentials are not provided.
# NOTE: An empty dict ({}) is preserved as-is — it signals that the provider was
# created via the custom modal and must be reopened with CustomModal, not a
# provider-specific modal. Only None means "no custom config at all".
# providers like Bedrock to fall back to IAM roles when credentials are not provided
custom_config = llm_provider_upsert_request.custom_config
if custom_config:
custom_config = {
k: v for k, v in custom_config.items() if v is not None and v.strip() != ""
}
# Set to None if the dict is empty after filtering
custom_config = custom_config or None
api_base = llm_provider_upsert_request.api_base or None
existing_llm_provider.provider = llm_provider_upsert_request.provider
@@ -304,7 +303,16 @@ def upsert_llm_provider(
).delete(synchronize_session="fetch")
db_session.flush()
# Import here to avoid circular imports
from onyx.llm.utils import get_max_input_tokens
for model_config in llm_provider_upsert_request.model_configurations:
max_input_tokens = model_config.max_input_tokens
if max_input_tokens is None:
max_input_tokens = get_max_input_tokens(
model_name=model_config.name,
model_provider=llm_provider_upsert_request.provider,
)
supported_flows = [LLMModelFlowType.CHAT]
if model_config.supports_image_input:
@@ -317,7 +325,7 @@ def upsert_llm_provider(
model_configuration_id=existing.id,
supported_flows=supported_flows,
is_visible=model_config.is_visible,
max_input_tokens=model_config.max_input_tokens,
max_input_tokens=max_input_tokens,
display_name=model_config.display_name,
)
else:
@@ -327,7 +335,7 @@ def upsert_llm_provider(
model_name=model_config.name,
supported_flows=supported_flows,
is_visible=model_config.is_visible,
max_input_tokens=model_config.max_input_tokens,
max_input_tokens=max_input_tokens,
display_name=model_config.display_name,
)

View File

@@ -324,15 +324,6 @@ def mark_migration_completed_time_if_not_set_with_commit(
db_session.commit()
def is_migration_completed(db_session: Session) -> bool:
"""Returns True if the migration is completed.
Can be run even if the migration record does not exist.
"""
record = db_session.query(OpenSearchTenantMigrationRecord).first()
return record is not None and record.migration_completed_at is not None
def build_sanitized_to_original_doc_id_mapping(
db_session: Session,
) -> dict[str, str]:

View File

@@ -1,4 +1,3 @@
import hashlib
from datetime import datetime
from datetime import timezone
from typing import Any
@@ -21,13 +20,9 @@ from onyx.document_index.opensearch.constants import DEFAULT_MAX_CHUNK_SIZE
from onyx.document_index.opensearch.constants import EF_CONSTRUCTION
from onyx.document_index.opensearch.constants import EF_SEARCH
from onyx.document_index.opensearch.constants import M
from onyx.document_index.opensearch.string_filtering import DocumentIDTooLongError
from onyx.document_index.opensearch.string_filtering import (
filter_and_validate_document_id,
)
from onyx.document_index.opensearch.string_filtering import (
MAX_DOCUMENT_ID_ENCODED_LENGTH,
)
from onyx.utils.tenant import get_tenant_id_short_string
from shared_configs.configs import MULTI_TENANT
from shared_configs.contextvars import get_current_tenant_id
@@ -80,50 +75,17 @@ def get_opensearch_doc_chunk_id(
This will be the string used to identify the chunk in OpenSearch. Any direct
chunk queries should use this function.
If the document ID is too long, a hash of the ID is used instead.
"""
opensearch_doc_chunk_id_suffix: str = f"__{max_chunk_size}__{chunk_index}"
encoded_suffix_length: int = len(opensearch_doc_chunk_id_suffix.encode("utf-8"))
max_encoded_permissible_doc_id_length: int = (
MAX_DOCUMENT_ID_ENCODED_LENGTH - encoded_suffix_length
sanitized_document_id = filter_and_validate_document_id(document_id)
opensearch_doc_chunk_id = (
f"{sanitized_document_id}__{max_chunk_size}__{chunk_index}"
)
opensearch_doc_chunk_id_tenant_prefix: str = ""
if tenant_state.multitenant:
short_tenant_id: str = get_tenant_id_short_string(tenant_state.tenant_id)
# Use tenant ID because in multitenant mode each tenant has its own
# Documents table, so there is a very small chance that doc IDs are not
# actually unique across all tenants.
opensearch_doc_chunk_id_tenant_prefix = f"{short_tenant_id}__"
encoded_prefix_length: int = len(
opensearch_doc_chunk_id_tenant_prefix.encode("utf-8")
)
max_encoded_permissible_doc_id_length -= encoded_prefix_length
try:
sanitized_document_id: str = filter_and_validate_document_id(
document_id, max_encoded_length=max_encoded_permissible_doc_id_length
)
except DocumentIDTooLongError:
# If the document ID is too long, use a hash instead.
# We use blake2b because it is faster and equally secure as SHA256, and
# accepts digest_size which controls the number of bytes returned in the
# hash.
# digest_size is the size of the returned hash in bytes. Since we're
# decoding the hash bytes as a hex string, the digest_size should be
# half the max target size of the hash string.
# Subtract 1 because filter_and_validate_document_id compares on >= on
# max_encoded_length.
# 64 is the max digest_size blake2b returns.
digest_size: int = min((max_encoded_permissible_doc_id_length - 1) // 2, 64)
sanitized_document_id = hashlib.blake2b(
document_id.encode("utf-8"), digest_size=digest_size
).hexdigest()
opensearch_doc_chunk_id: str = (
f"{opensearch_doc_chunk_id_tenant_prefix}{sanitized_document_id}{opensearch_doc_chunk_id_suffix}"
)
short_tenant_id = get_tenant_id_short_string(tenant_state.tenant_id)
opensearch_doc_chunk_id = f"{short_tenant_id}__{opensearch_doc_chunk_id}"
# Do one more validation to ensure we haven't exceeded the max length.
opensearch_doc_chunk_id = filter_and_validate_document_id(opensearch_doc_chunk_id)
return opensearch_doc_chunk_id
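The removed fallback above hashes over-long document IDs with BLAKE2b, sizing the digest to the remaining byte budget: `hexdigest()` emits two characters per digest byte, so the digest size is half the budget (minus one for the validator's `>=` comparison), capped at BLAKE2b's 64-byte maximum. A simplified sketch of just that sizing logic, ignoring the chunk suffix and tenant prefix the real code also subtracts:

```python
import hashlib

MAX_ENCODED = 512  # max UTF-8 bytes allowed for the sanitized ID (illustrative)

def shorten_doc_id(document_id: str, max_encoded_length: int = MAX_ENCODED) -> str:
    if len(document_id.encode("utf-8")) < max_encoded_length:
        return document_id
    # hexdigest() is 2 chars per byte, so use half the budget; subtract 1
    # because the validator compares with >=; 64 is blake2b's max digest_size.
    digest_size = min((max_encoded_length - 1) // 2, 64)
    return hashlib.blake2b(
        document_id.encode("utf-8"), digest_size=digest_size
    ).hexdigest()

short = shorten_doc_id("abc")
long_ = shorten_doc_id("a" * 2000)
print(short)       # abc (unchanged: already under budget)
print(len(long_))  # 128 (64-byte digest rendered as hex)
```

The hash is deterministic, so repeated indexing of the same long document ID always maps to the same OpenSearch chunk ID.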

View File

@@ -1,15 +1,7 @@
import re
MAX_DOCUMENT_ID_ENCODED_LENGTH: int = 512
class DocumentIDTooLongError(ValueError):
"""Raised when a document ID is too long for OpenSearch after filtering."""
def filter_and_validate_document_id(
document_id: str, max_encoded_length: int = MAX_DOCUMENT_ID_ENCODED_LENGTH
) -> str:
def filter_and_validate_document_id(document_id: str) -> str:
"""
Filters and validates a document ID such that it can be used as an ID in
OpenSearch.
@@ -27,13 +19,9 @@ def filter_and_validate_document_id(
Args:
document_id: The document ID to filter and validate.
max_encoded_length: The maximum length of the document ID after
filtering in bytes. Compared with >= for extra resilience, so
encoded values of this length will fail.
Raises:
DocumentIDTooLongError: If the document ID is too long after filtering.
ValueError: If the document ID is empty after filtering.
ValueError: If the document ID is empty or too long after filtering.
Returns:
str: The filtered document ID.
@@ -41,8 +29,6 @@ def filter_and_validate_document_id(
filtered_document_id = re.sub(r"[^A-Za-z0-9_.\-~]", "", document_id)
if not filtered_document_id:
raise ValueError(f"Document ID {document_id} is empty after filtering.")
if len(filtered_document_id.encode("utf-8")) >= max_encoded_length:
raise DocumentIDTooLongError(
f"Document ID {document_id} is too long after filtering."
)
if len(filtered_document_id.encode("utf-8")) >= 512:
raise ValueError(f"Document ID {document_id} is too long after filtering.")
return filtered_document_id

View File

@@ -52,21 +52,9 @@ KNOWN_OPENPYXL_BUGS = [
def get_markitdown_converter() -> "MarkItDown":
global _MARKITDOWN_CONVERTER
from markitdown import MarkItDown
if _MARKITDOWN_CONVERTER is None:
from markitdown import MarkItDown
# Patch this function to effectively no-op because we were seeing this
# module take an inordinate amount of time to convert charts to markdown,
# making some powerpoint files with many or complicated charts nearly
# unindexable.
from markitdown.converters._pptx_converter import PptxConverter
setattr(
PptxConverter,
"_convert_chart_to_markdown",
lambda self, chart: "\n\n[chart omitted]\n\n", # noqa: ARG005
)
_MARKITDOWN_CONVERTER = MarkItDown(enable_plugins=False)
return _MARKITDOWN_CONVERTER
@@ -217,26 +205,18 @@ def read_pdf_file(
try:
pdf_reader = PdfReader(file)
if pdf_reader.is_encrypted:
# Try the explicit password first, then fall back to an empty
# string. Owner-password-only PDFs (permission restrictions but
# no open password) decrypt successfully with "".
# See https://github.com/onyx-dot-app/onyx/issues/9754
passwords = [p for p in [pdf_pass, ""] if p is not None]
if pdf_reader.is_encrypted and pdf_pass is not None:
decrypt_success = False
for pw in passwords:
try:
if pdf_reader.decrypt(pw) != 0:
decrypt_success = True
break
except Exception:
pass
try:
decrypt_success = pdf_reader.decrypt(pdf_pass) != 0
except Exception:
logger.error("Unable to decrypt pdf")
if not decrypt_success:
logger.error(
"Encrypted PDF could not be decrypted, returning empty text."
)
return "", metadata, []
elif pdf_reader.is_encrypted:
logger.warning("No password for an encrypted PDF, returning empty text.")
return "", metadata, []
# Basic PDF metadata
if pdf_reader.metadata is not None:

View File

@@ -33,20 +33,8 @@ def is_pdf_protected(file: IO[Any]) -> bool:
with preserve_position(file):
reader = PdfReader(file)
if not reader.is_encrypted:
return False
# PDFs with only an owner password (permission restrictions like
# print/copy disabled) use an empty user password — any viewer can open
# them without prompting. decrypt("") returns 0 only when a real user
# password is required. See https://github.com/onyx-dot-app/onyx/issues/9754
try:
return reader.decrypt("") == 0
except Exception:
logger.exception(
"Failed to evaluate PDF encryption; treating as password protected"
)
return True
return bool(reader.is_encrypted)
def is_docx_protected(file: IO[Any]) -> bool:

View File

@@ -136,14 +136,12 @@ class FileStore(ABC):
"""
@abstractmethod
def delete_file(self, file_id: str, error_on_missing: bool = True) -> None:
def delete_file(self, file_id: str) -> None:
"""
Delete a file by its ID.
Parameters:
- file_id: ID of file to delete
- error_on_missing: If False, silently return when the file record
does not exist instead of raising.
- file_id: ID of file to delete
"""
@abstractmethod
@@ -454,23 +452,12 @@ class S3BackedFileStore(FileStore):
logger.warning(f"Error getting file size for {file_id}: {e}")
return None
def delete_file(
self,
file_id: str,
error_on_missing: bool = True,
db_session: Session | None = None,
) -> None:
def delete_file(self, file_id: str, db_session: Session | None = None) -> None:
with get_session_with_current_tenant_if_none(db_session) as db_session:
try:
file_record = get_filerecord_by_file_id_optional(
file_record = get_filerecord_by_file_id(
file_id=file_id, db_session=db_session
)
if file_record is None:
if error_on_missing:
raise RuntimeError(
f"File by id {file_id} does not exist or was deleted"
)
return
if not file_record.bucket_name:
logger.error(
f"File record {file_id} with key {file_record.object_key} "

View File

@@ -222,23 +222,12 @@ class PostgresBackedFileStore(FileStore):
logger.warning(f"Error getting file size for {file_id}: {e}")
return None
def delete_file(
self,
file_id: str,
error_on_missing: bool = True,
db_session: Session | None = None,
) -> None:
def delete_file(self, file_id: str, db_session: Session | None = None) -> None:
with get_session_with_current_tenant_if_none(db_session) as session:
try:
file_content = get_file_content_by_file_id_optional(
file_content = get_file_content_by_file_id(
file_id=file_id, db_session=session
)
if file_content is None:
if error_on_missing:
raise RuntimeError(
f"File content for file_id {file_id} does not exist or was deleted"
)
return
raw_conn = _get_raw_connection(session)
try:

View File

@@ -26,7 +26,6 @@ class LlmProviderNames(str, Enum):
MISTRAL = "mistral"
LITELLM_PROXY = "litellm_proxy"
BIFROST = "bifrost"
OPENAI_COMPATIBLE = "openai_compatible"
def __str__(self) -> str:
"""Needed so things like:
@@ -47,7 +46,6 @@ WELL_KNOWN_PROVIDER_NAMES = [
LlmProviderNames.LM_STUDIO,
LlmProviderNames.LITELLM_PROXY,
LlmProviderNames.BIFROST,
LlmProviderNames.OPENAI_COMPATIBLE,
]
@@ -66,7 +64,6 @@ PROVIDER_DISPLAY_NAMES: dict[str, str] = {
LlmProviderNames.LM_STUDIO: "LM Studio",
LlmProviderNames.LITELLM_PROXY: "LiteLLM Proxy",
LlmProviderNames.BIFROST: "Bifrost",
LlmProviderNames.OPENAI_COMPATIBLE: "OpenAI Compatible",
"groq": "Groq",
"anyscale": "Anyscale",
"deepseek": "DeepSeek",
@@ -119,7 +116,6 @@ AGGREGATOR_PROVIDERS: set[str] = {
LlmProviderNames.AZURE,
LlmProviderNames.LITELLM_PROXY,
LlmProviderNames.BIFROST,
LlmProviderNames.OPENAI_COMPATIBLE,
}
# Model family name mappings for display name generation

View File

@@ -327,19 +327,12 @@ class LitellmLLM(LLM):
):
model_kwargs[VERTEX_LOCATION_KWARG] = "global"
# Bifrost and OpenAI-compatible: OpenAI-compatible proxies that send
# model names directly to the endpoint. We route through LiteLLM's
# openai provider with the server's base URL, and ensure /v1 is appended.
if model_provider in (
LlmProviderNames.BIFROST,
LlmProviderNames.OPENAI_COMPATIBLE,
):
# Bifrost: OpenAI-compatible proxy that expects model names in
# provider/model format (e.g. "anthropic/claude-sonnet-4-6").
# We route through LiteLLM's openai provider with the Bifrost base URL,
# and ensure /v1 is appended.
if model_provider == LlmProviderNames.BIFROST:
self._custom_llm_provider = "openai"
# LiteLLM's OpenAI client requires an api_key to be set.
# Many OpenAI-compatible servers don't need auth, so supply a
# placeholder to prevent LiteLLM from raising AuthenticationError.
if not self._api_key:
model_kwargs.setdefault("api_key", "not-needed")
if self._api_base is not None:
base = self._api_base.rstrip("/")
self._api_base = base if base.endswith("/v1") else f"{base}/v1"
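The base-URL handling in this hunk can be restated as a standalone helper (the function name is illustrative, not from the codebase): strip trailing slashes, then append `/v1` only if it is not already present.

```python
def normalize_openai_base(api_base: str) -> str:
    # Sketch of the normalization above: OpenAI-compatible servers expect
    # requests under /v1, so ensure the base URL ends with exactly one /v1.
    base = api_base.rstrip("/")
    return base if base.endswith("/v1") else f"{base}/v1"
```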
@@ -456,20 +449,17 @@ class LitellmLLM(LLM):
optional_kwargs: dict[str, Any] = {}
# Model name
is_openai_compatible_proxy = self._model_provider in (
LlmProviderNames.BIFROST,
LlmProviderNames.OPENAI_COMPATIBLE,
)
is_bifrost = self._model_provider == LlmProviderNames.BIFROST
model_provider = (
f"{self.config.model_provider}/responses"
if is_openai_model # Uses litellm's completions -> responses bridge
else self.config.model_provider
)
if is_openai_compatible_proxy:
# OpenAI-compatible proxies (Bifrost, generic OpenAI-compatible
# servers) expect model names sent directly to their endpoint.
# We use custom_llm_provider="openai" so LiteLLM doesn't try
# to route based on the provider prefix.
if is_bifrost:
# Bifrost expects model names in provider/model format
# (e.g. "anthropic/claude-sonnet-4-6") sent directly to its
# OpenAI-compatible endpoint. We use custom_llm_provider="openai"
# so LiteLLM doesn't try to route based on the provider prefix.
model = self.config.deployment_name or self.config.model_name
else:
model = f"{model_provider}/{self.config.deployment_name or self.config.model_name}"
@@ -560,10 +550,7 @@ class LitellmLLM(LLM):
if structured_response_format:
optional_kwargs["response_format"] = structured_response_format
if (
not (is_claude_model or is_ollama or is_mistral)
or is_openai_compatible_proxy
):
if not (is_claude_model or is_ollama or is_mistral) or is_bifrost:
# Litellm bug: tool_choice is dropped silently if not specified here for OpenAI
# However, this param breaks Anthropic and Mistral models,
# so it must be conditionally included unless the request is

View File

@@ -15,8 +15,6 @@ LITELLM_PROXY_PROVIDER_NAME = "litellm_proxy"
BIFROST_PROVIDER_NAME = "bifrost"
OPENAI_COMPATIBLE_PROVIDER_NAME = "openai_compatible"
# Providers that use optional Bearer auth from custom_config
PROVIDERS_WITH_SPECIAL_API_KEY_HANDLING: dict[str, str] = {
LlmProviderNames.OLLAMA_CHAT: OLLAMA_API_KEY_CONFIG_KEY,

View File

@@ -19,7 +19,6 @@ from onyx.llm.well_known_providers.constants import BIFROST_PROVIDER_NAME
from onyx.llm.well_known_providers.constants import LITELLM_PROXY_PROVIDER_NAME
from onyx.llm.well_known_providers.constants import LM_STUDIO_PROVIDER_NAME
from onyx.llm.well_known_providers.constants import OLLAMA_PROVIDER_NAME
from onyx.llm.well_known_providers.constants import OPENAI_COMPATIBLE_PROVIDER_NAME
from onyx.llm.well_known_providers.constants import OPENAI_PROVIDER_NAME
from onyx.llm.well_known_providers.constants import OPENROUTER_PROVIDER_NAME
from onyx.llm.well_known_providers.constants import VERTEXAI_PROVIDER_NAME
@@ -52,7 +51,6 @@ def _get_provider_to_models_map() -> dict[str, list[str]]:
OPENROUTER_PROVIDER_NAME: [], # Dynamic - fetched from OpenRouter API
LITELLM_PROXY_PROVIDER_NAME: [], # Dynamic - fetched from LiteLLM proxy API
BIFROST_PROVIDER_NAME: [], # Dynamic - fetched from Bifrost API
OPENAI_COMPATIBLE_PROVIDER_NAME: [], # Dynamic - fetched from OpenAI-compatible API
}
@@ -338,7 +336,6 @@ def get_provider_display_name(provider_name: str) -> str:
VERTEXAI_PROVIDER_NAME: "Google Vertex AI",
OPENROUTER_PROVIDER_NAME: "OpenRouter",
LITELLM_PROXY_PROVIDER_NAME: "LiteLLM Proxy",
OPENAI_COMPATIBLE_PROVIDER_NAME: "OpenAI Compatible",
}
if provider_name in _ONYX_PROVIDER_DISPLAY_NAMES:

View File

@@ -3,8 +3,6 @@
from datetime import datetime
from typing import Any
import httpx
from onyx.configs.constants import DocumentSource
from onyx.mcp_server.api import mcp_server
from onyx.mcp_server.utils import get_http_client
@@ -17,21 +15,6 @@ from onyx.utils.variable_functionality import global_version
logger = setup_logger()
def _extract_error_detail(response: httpx.Response) -> str:
"""Extract a human-readable error message from a failed backend response.
The backend returns OnyxError responses as
``{"error_code": "...", "detail": "..."}``.
"""
try:
body = response.json()
if detail := body.get("detail"):
return str(detail)
except Exception:
pass
return f"Request failed with status {response.status_code}"
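The removed `_extract_error_detail` helper above can be sketched independently of httpx: given a status code and a parsed body, prefer the backend's `{"error_code": ..., "detail": ...}` payload and fall back to a generic message.

```python
def extract_error_detail(status_code: int, body: object) -> str:
    # Decoupled sketch of the helper above (httpx-free for illustration):
    # use the "detail" field of an OnyxError-shaped body when present.
    try:
        if isinstance(body, dict) and (detail := body.get("detail")):
            return str(detail)
    except Exception:
        pass
    return f"Request failed with status {status_code}"
```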
@mcp_server.tool()
async def search_indexed_documents(
query: str,
@@ -175,14 +158,7 @@ async def search_indexed_documents(
json=search_request,
headers=auth_headers,
)
if not response.is_success:
error_detail = _extract_error_detail(response)
return {
"documents": [],
"total_results": 0,
"query": query,
"error": error_detail,
}
response.raise_for_status()
result = response.json()
# Check for error in response
@@ -258,13 +234,7 @@ async def search_web(
json=request_payload,
headers={"Authorization": f"Bearer {access_token.token}"},
)
if not response.is_success:
error_detail = _extract_error_detail(response)
return {
"error": error_detail,
"results": [],
"query": query,
}
response.raise_for_status()
response_payload = response.json()
results = response_payload.get("results", [])
return {
@@ -310,12 +280,7 @@ async def open_urls(
json={"urls": urls},
headers={"Authorization": f"Bearer {access_token.token}"},
)
if not response.is_success:
error_detail = _extract_error_detail(response)
return {
"error": error_detail,
"results": [],
}
response.raise_for_status()
response_payload = response.json()
results = response_payload.get("results", [])
return {

View File

@@ -6,7 +6,6 @@ from onyx.configs.app_configs import MCP_SERVER_ENABLED
from onyx.configs.app_configs import MCP_SERVER_HOST
from onyx.configs.app_configs import MCP_SERVER_PORT
from onyx.utils.logger import setup_logger
from onyx.utils.variable_functionality import set_is_ee_based_on_env_variable
logger = setup_logger()
@@ -17,7 +16,6 @@ def main() -> None:
logger.info("MCP server is disabled (MCP_SERVER_ENABLED=false)")
return
set_is_ee_based_on_env_variable()
logger.info(f"Starting MCP server on {MCP_SERVER_HOST}:{MCP_SERVER_PORT}")
from onyx.mcp_server.api import mcp_app

View File

@@ -1,5 +1,6 @@
from fastapi import APIRouter
from fastapi import Depends
from fastapi import HTTPException
from sqlalchemy.orm import Session
from onyx.auth.users import current_user
@@ -8,8 +9,6 @@ from onyx.db.engine.sql_engine import get_session
from onyx.db.models import User
from onyx.db.web_search import fetch_active_web_content_provider
from onyx.db.web_search import fetch_active_web_search_provider
from onyx.error_handling.error_codes import OnyxErrorCode
from onyx.error_handling.exceptions import OnyxError
from onyx.server.features.web_search.models import OpenUrlsToolRequest
from onyx.server.features.web_search.models import OpenUrlsToolResponse
from onyx.server.features.web_search.models import WebSearchToolRequest
@@ -62,10 +61,9 @@ def _get_active_search_provider(
) -> tuple[WebSearchProviderView, WebSearchProvider]:
provider_model = fetch_active_web_search_provider(db_session)
if provider_model is None:
raise OnyxError(
OnyxErrorCode.INVALID_INPUT,
"No web search provider configured. Please configure one in "
"Admin > Web Search settings.",
raise HTTPException(
status_code=400,
detail="No web search provider configured.",
)
provider_view = WebSearchProviderView(
@@ -78,10 +76,9 @@ def _get_active_search_provider(
)
if provider_model.api_key is None:
raise OnyxError(
OnyxErrorCode.INVALID_INPUT,
"Web search provider requires an API key. Please configure one in "
"Admin > Web Search settings.",
raise HTTPException(
status_code=400,
detail="Web search provider requires an API key.",
)
try:
@@ -91,7 +88,7 @@ def _get_active_search_provider(
config=provider_model.config or {},
)
except ValueError as exc:
raise OnyxError(OnyxErrorCode.INVALID_INPUT, str(exc)) from exc
raise HTTPException(status_code=400, detail=str(exc)) from exc
return provider_view, provider
@@ -113,9 +110,9 @@ def _get_active_content_provider(
if provider_model.api_key is None:
# TODO - this is not a great error, in fact, this key should not be nullable.
raise OnyxError(
OnyxErrorCode.INVALID_INPUT,
"Web content provider requires an API key.",
raise HTTPException(
status_code=400,
detail="Web content provider requires an API key.",
)
try:
@@ -128,12 +125,12 @@ def _get_active_content_provider(
config=config,
)
except ValueError as exc:
raise OnyxError(OnyxErrorCode.INVALID_INPUT, str(exc)) from exc
raise HTTPException(status_code=400, detail=str(exc)) from exc
if provider is None:
raise OnyxError(
OnyxErrorCode.INVALID_INPUT,
"Unable to initialize the configured web content provider.",
raise HTTPException(
status_code=400,
detail="Unable to initialize the configured web content provider.",
)
provider_view = WebContentProviderView(
@@ -157,13 +154,12 @@ def _run_web_search(
for query in request.queries:
try:
search_results = provider.search(query)
except OnyxError:
except HTTPException:
raise
except Exception as exc:
logger.exception("Web search provider failed for query '%s'", query)
raise OnyxError(
OnyxErrorCode.BAD_GATEWAY,
"Web search provider failed to execute query.",
raise HTTPException(
status_code=502, detail="Web search provider failed to execute query."
) from exc
filtered_results = filter_web_search_results_with_no_title_or_snippet(
@@ -196,13 +192,12 @@ def _open_urls(
docs = filter_web_contents_with_no_title_or_content(
list(provider.contents(urls))
)
except OnyxError:
except HTTPException:
raise
except Exception as exc:
logger.exception("Web content provider failed to fetch URLs")
raise OnyxError(
OnyxErrorCode.BAD_GATEWAY,
"Web content provider failed to fetch URLs.",
raise HTTPException(
status_code=502, detail="Web content provider failed to fetch URLs."
) from exc
results: list[LlmOpenUrlResult] = []

View File

@@ -74,8 +74,6 @@ from onyx.server.manage.llm.models import ModelConfigurationUpsertRequest
from onyx.server.manage.llm.models import OllamaFinalModelResponse
from onyx.server.manage.llm.models import OllamaModelDetails
from onyx.server.manage.llm.models import OllamaModelsRequest
from onyx.server.manage.llm.models import OpenAICompatibleFinalModelResponse
from onyx.server.manage.llm.models import OpenAICompatibleModelsRequest
from onyx.server.manage.llm.models import OpenRouterFinalModelResponse
from onyx.server.manage.llm.models import OpenRouterModelDetails
from onyx.server.manage.llm.models import OpenRouterModelsRequest
@@ -1577,95 +1575,3 @@ def _get_bifrost_models_response(api_base: str, api_key: str | None = None) -> d
source_name="Bifrost",
api_key=api_key,
)
@admin_router.post("/openai-compatible/available-models")
def get_openai_compatible_server_available_models(
request: OpenAICompatibleModelsRequest,
_: User = Depends(current_admin_user),
db_session: Session = Depends(get_session),
) -> list[OpenAICompatibleFinalModelResponse]:
"""Fetch available models from a generic OpenAI-compatible /v1/models endpoint."""
response_json = _get_openai_compatible_server_response(
api_base=request.api_base, api_key=request.api_key
)
models = response_json.get("data", [])
if not isinstance(models, list) or len(models) == 0:
raise OnyxError(
OnyxErrorCode.VALIDATION_ERROR,
"No models found from your OpenAI-compatible endpoint",
)
results: list[OpenAICompatibleFinalModelResponse] = []
for model in models:
try:
model_id = model.get("id", "")
model_name = model.get("name", model_id)
if not model_id:
continue
# Skip embedding models
if is_embedding_model(model_id):
continue
results.append(
OpenAICompatibleFinalModelResponse(
name=model_id,
display_name=model_name,
max_input_tokens=model.get("context_length"),
supports_image_input=infer_vision_support(model_id),
supports_reasoning=is_reasoning_model(model_id, model_name),
)
)
except Exception as e:
logger.warning(
"Failed to parse OpenAI-compatible model entry",
extra={"error": str(e), "item": str(model)[:1000]},
)
if not results:
raise OnyxError(
OnyxErrorCode.VALIDATION_ERROR,
"No compatible models found from OpenAI-compatible endpoint",
)
sorted_results = sorted(results, key=lambda m: m.name.lower())
# Sync new models to DB if provider_name is specified
if request.provider_name:
_sync_fetched_models(
db_session=db_session,
provider_name=request.provider_name,
models=[
SyncModelEntry(
name=r.name,
display_name=r.display_name,
max_input_tokens=r.max_input_tokens,
supports_image_input=r.supports_image_input,
)
for r in sorted_results
],
source_label="OpenAI Compatible",
)
return sorted_results
def _get_openai_compatible_server_response(
api_base: str, api_key: str | None = None
) -> dict:
"""Perform GET to an OpenAI-compatible /v1/models and return parsed JSON."""
cleaned_api_base = api_base.strip().rstrip("/")
# Ensure we hit /v1/models
if cleaned_api_base.endswith("/v1"):
url = f"{cleaned_api_base}/models"
else:
url = f"{cleaned_api_base}/v1/models"
return _get_openai_compatible_models_response(
url=url,
source_name="OpenAI Compatible",
api_key=api_key,
)
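The URL rule in the removed `_get_openai_compatible_server_response` helper above boils down to one branch, shown here as a minimal sketch (function name is illustrative): always target `/v1/models`, whether or not the configured base already ends in `/v1`.

```python
def models_url(api_base: str) -> str:
    # Same rule as the helper above: hit the /v1/models listing endpoint,
    # avoiding a doubled /v1 when the base already includes it.
    cleaned = api_base.strip().rstrip("/")
    return f"{cleaned}/models" if cleaned.endswith("/v1") else f"{cleaned}/v1/models"
```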

View File

@@ -79,9 +79,7 @@ class LLMProviderDescriptor(BaseModel):
provider=provider,
provider_display_name=get_provider_display_name(provider),
model_configurations=filter_model_configurations(
llm_provider_model.model_configurations,
provider,
use_stored_display_name=llm_provider_model.custom_config is not None,
llm_provider_model.model_configurations, provider
),
)
@@ -158,9 +156,7 @@ class LLMProviderView(LLMProvider):
personas=personas,
deployment_name=llm_provider_model.deployment_name,
model_configurations=filter_model_configurations(
llm_provider_model.model_configurations,
provider,
use_stored_display_name=llm_provider_model.custom_config is not None,
llm_provider_model.model_configurations, provider
),
)
@@ -202,13 +198,13 @@ class ModelConfigurationView(BaseModel):
cls,
model_configuration_model: "ModelConfigurationModel",
provider_name: str,
use_stored_display_name: bool = False,
) -> "ModelConfigurationView":
# For dynamic providers (OpenRouter, Bedrock, Ollama) and custom-config
# providers, use the display_name stored in DB. Skip LiteLLM parsing.
# For dynamic providers (OpenRouter, Bedrock, Ollama), use the display_name
# stored in DB from the source API. Skip LiteLLM parsing entirely.
if (
provider_name in DYNAMIC_LLM_PROVIDERS or use_stored_display_name
) and model_configuration_model.display_name:
provider_name in DYNAMIC_LLM_PROVIDERS
and model_configuration_model.display_name
):
# Extract vendor from model name for grouping (e.g., "Anthropic", "OpenAI")
vendor = extract_vendor_from_model_name(
model_configuration_model.name, provider_name
@@ -468,18 +464,3 @@ class BifrostFinalModelResponse(BaseModel):
max_input_tokens: int | None
supports_image_input: bool
supports_reasoning: bool
# OpenAI Compatible dynamic models fetch
class OpenAICompatibleModelsRequest(BaseModel):
api_base: str
api_key: str | None = None
provider_name: str | None = None # Optional: to save models to existing provider
class OpenAICompatibleFinalModelResponse(BaseModel):
name: str # Model ID (e.g. "meta-llama/Llama-3-8B-Instruct")
display_name: str # Human-readable name from API
max_input_tokens: int | None
supports_image_input: bool
supports_reasoning: bool

View File

@@ -26,7 +26,6 @@ DYNAMIC_LLM_PROVIDERS = frozenset(
LlmProviderNames.OLLAMA_CHAT,
LlmProviderNames.LM_STUDIO,
LlmProviderNames.BIFROST,
LlmProviderNames.OPENAI_COMPATIBLE,
}
)
@@ -309,15 +308,12 @@ def should_filter_as_dated_duplicate(
def filter_model_configurations(
model_configurations: list,
provider: str,
use_stored_display_name: bool = False,
) -> list:
"""Filter out obsolete and dated duplicate models from configurations.
Args:
model_configurations: List of ModelConfiguration DB models
provider: The provider name (e.g., "openai", "anthropic")
use_stored_display_name: If True, prefer the display_name stored in the
DB over LiteLLM enrichments. Set for custom-config providers.
Returns:
List of ModelConfigurationView objects with obsolete/duplicate models removed
@@ -337,9 +333,7 @@ def filter_model_configurations(
if should_filter_as_dated_duplicate(model_configuration.name, all_model_names):
continue
filtered_configs.append(
ModelConfigurationView.from_model(
model_configuration, provider, use_stored_display_name
)
ModelConfigurationView.from_model(model_configuration, provider)
)
return filtered_configs

View File

@@ -26,6 +26,7 @@ _DEFAULT_PORTS: dict[str, int] = {
"monitoring": 9096,
"docfetching": 9092,
"docprocessing": 9093,
"heavy": 9094,
}
_server_started = False
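The new `"heavy"` entry above reserves port 9094 for the heavy Celery worker's metrics endpoint. A minimal lookup sketch (the helper name is hypothetical; in practice the resolved port would be handed to an HTTP metrics exporter such as prometheus_client's `start_http_server`):

```python
_DEFAULT_PORTS: dict[str, int] = {
    "monitoring": 9096,
    "docfetching": 9092,
    "docprocessing": 9093,
    "heavy": 9094,  # added for the heavy Celery worker's exporter
}

def metrics_port(service: str) -> int:
    # Resolve the metrics port reserved for a given worker service.
    return _DEFAULT_PORTS[service]
```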

View File

@@ -186,7 +186,7 @@ class TestDocumentIndexNew:
)
document_index.index(chunks=[pre_chunk], indexing_metadata=pre_metadata)
time.sleep(2)
time.sleep(1)
# Now index a batch with the existing doc and a new doc.
chunks = [

View File

@@ -9,7 +9,6 @@ This test verifies the full flow: provisioning failure → rollback → schema c
"""
import uuid
from unittest.mock import MagicMock
from unittest.mock import patch
from sqlalchemy import text
@@ -56,28 +55,18 @@ class TestTenantProvisioningRollback:
created_tenant_id = tenant_id
return create_schema_if_not_exists(tenant_id)
# Mock setup_tenant to fail after schema creation.
# Also mock the Redis lock so the test doesn't compete with a live
# monitoring worker that may already hold the provision lock.
mock_lock = MagicMock()
mock_lock.acquire.return_value = True
# Mock setup_tenant to fail after schema creation
with patch(
"ee.onyx.background.celery.tasks.tenant_provisioning.tasks.get_redis_client"
) as mock_redis:
mock_redis.return_value.lock.return_value = mock_lock
"ee.onyx.background.celery.tasks.tenant_provisioning.tasks.setup_tenant"
) as mock_setup:
mock_setup.side_effect = Exception("Simulated provisioning failure")
with patch(
"ee.onyx.background.celery.tasks.tenant_provisioning.tasks.setup_tenant"
) as mock_setup:
mock_setup.side_effect = Exception("Simulated provisioning failure")
with patch(
"ee.onyx.background.celery.tasks.tenant_provisioning.tasks.create_schema_if_not_exists",
side_effect=track_schema_creation,
):
# Run pre-provisioning - it should fail and trigger rollback
pre_provision_tenant()
"ee.onyx.background.celery.tasks.tenant_provisioning.tasks.create_schema_if_not_exists",
side_effect=track_schema_creation,
):
# Run pre-provisioning - it should fail and trigger rollback
pre_provision_tenant()
# Verify that the schema was created and then cleaned up
assert created_tenant_id is not None, "Schema should have been created"
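The Redis-lock stub that appears in this hunk (the `mock_lock` with `acquire.return_value = True`) is a reusable pattern; factored out, it looks like this sketch, so the provisioning body runs without a live Redis instance:

```python
from unittest.mock import MagicMock

def make_provision_lock_mock() -> MagicMock:
    # Lock stub as in the hunk above: acquire() always succeeds and
    # release() is a no-op, so no real Redis lock is ever contended.
    lock = MagicMock()
    lock.acquire.return_value = True
    lock.release.return_value = None
    return lock
```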

View File

@@ -1,58 +0,0 @@
import pytest
from onyx.configs.constants import MASK_CREDENTIAL_CHAR
from onyx.db.federated import _reject_masked_credentials
class TestRejectMaskedCredentials:
"""Verify that masked credential values are never accepted for DB writes.
mask_string() has two output formats:
- Short strings (< 14 chars): "••••••••••••" (U+2022 BULLET)
- Long strings (>= 14 chars): "abcd...wxyz" (first4 + "..." + last4)
_reject_masked_credentials must catch both.
"""
def test_rejects_fully_masked_value(self) -> None:
masked = MASK_CREDENTIAL_CHAR * 12 # "••••••••••••"
with pytest.raises(ValueError, match="masked placeholder"):
_reject_masked_credentials({"client_id": masked})
def test_rejects_long_string_masked_value(self) -> None:
"""mask_string returns 'first4...last4' for long strings — the real
format used for OAuth credentials like client_id and client_secret."""
with pytest.raises(ValueError, match="masked placeholder"):
_reject_masked_credentials({"client_id": "1234...7890"})
def test_rejects_when_any_field_is_masked(self) -> None:
"""Even if client_id is real, a masked client_secret must be caught."""
with pytest.raises(ValueError, match="client_secret"):
_reject_masked_credentials(
{
"client_id": "1234567890.1234567890",
"client_secret": MASK_CREDENTIAL_CHAR * 12,
}
)
def test_accepts_real_credentials(self) -> None:
# Should not raise
_reject_masked_credentials(
{
"client_id": "1234567890.1234567890",
"client_secret": "test_client_secret_value",
}
)
def test_accepts_empty_dict(self) -> None:
# Should not raise — empty credentials are handled elsewhere
_reject_masked_credentials({})
def test_ignores_non_string_values(self) -> None:
# Non-string values (None, bool, int) should pass through
_reject_masked_credentials(
{
"client_id": "real_value",
"redirect_uri": None,
"some_flag": True,
}
)
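The docstring of the deleted test file above describes the two `mask_string` output formats it guards. An illustrative reconstruction of that format (the real implementation lives in the Onyx codebase; this only mirrors what the docstring states):

```python
MASK_CREDENTIAL_CHAR = "\u2022"  # U+2022 BULLET, per the test docstring

def mask_string(value: str) -> str:
    # Short strings (< 14 chars) become twelve bullets; longer strings
    # keep their first and last four characters around "...".
    if len(value) < 14:
        return MASK_CREDENTIAL_CHAR * 12
    return f"{value[:4]}...{value[-4:]}"
```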

View File

@@ -1,318 +0,0 @@
"""Unit tests for Notion connector handling of people properties and table blocks.
Reproduces two bugs:
1. ENG-3970: People-type database properties (user mentions) are not extracted —
the user's "name" field is lost when _recurse_properties drills into the
"person" sub-dict.
2. ENG-3971: Inline table blocks (table/table_row) are not indexed — table_row
blocks store content in "cells" rather than "rich_text", so no text is extracted.
"""
from unittest.mock import patch
from onyx.connectors.notion.connector import NotionConnector
def _make_connector() -> NotionConnector:
connector = NotionConnector()
connector.load_credentials({"notion_integration_token": "fake-token"})
return connector
class TestPeoplePropertyExtraction:
"""ENG-3970: Verifies that 'people' type database properties extract user names."""
def test_single_person_property(self) -> None:
"""A database cell with a single @mention should extract the user name."""
properties = {
"Team Lead": {
"id": "abc",
"type": "people",
"people": [
{
"object": "user",
"id": "user-uuid-1",
"name": "Arturo Martinez",
"type": "person",
"person": {"email": "arturo@example.com"},
}
],
}
}
result = NotionConnector._properties_to_str(properties)
assert (
"Arturo Martinez" in result
), f"Expected 'Arturo Martinez' in extracted text, got: {result!r}"
def test_multiple_people_property(self) -> None:
"""A database cell with multiple @mentions should extract all user names."""
properties = {
"Members": {
"id": "def",
"type": "people",
"people": [
{
"object": "user",
"id": "user-uuid-1",
"name": "Arturo Martinez",
"type": "person",
"person": {"email": "arturo@example.com"},
},
{
"object": "user",
"id": "user-uuid-2",
"name": "Jane Smith",
"type": "person",
"person": {"email": "jane@example.com"},
},
],
}
}
result = NotionConnector._properties_to_str(properties)
assert (
"Arturo Martinez" in result
), f"Expected 'Arturo Martinez' in extracted text, got: {result!r}"
assert (
"Jane Smith" in result
), f"Expected 'Jane Smith' in extracted text, got: {result!r}"
def test_bot_user_property(self) -> None:
"""Bot users (integrations) have 'type': 'bot' — name should still be extracted."""
properties = {
"Created By": {
"id": "ghi",
"type": "people",
"people": [
{
"object": "user",
"id": "bot-uuid-1",
"name": "Onyx Integration",
"type": "bot",
"bot": {},
}
],
}
}
result = NotionConnector._properties_to_str(properties)
assert (
"Onyx Integration" in result
), f"Expected 'Onyx Integration' in extracted text, got: {result!r}"
def test_person_without_person_details(self) -> None:
"""Some user objects may have an empty/null person sub-dict."""
properties = {
"Assignee": {
"id": "jkl",
"type": "people",
"people": [
{
"object": "user",
"id": "user-uuid-3",
"name": "Ghost User",
"type": "person",
"person": {},
}
],
}
}
result = NotionConnector._properties_to_str(properties)
assert (
"Ghost User" in result
), f"Expected 'Ghost User' in extracted text, got: {result!r}"
def test_people_mixed_with_other_properties(self) -> None:
"""People property should work alongside other property types."""
properties = {
"Name": {
"id": "aaa",
"type": "title",
"title": [
{
"plain_text": "Project Alpha",
"type": "text",
"text": {"content": "Project Alpha"},
}
],
},
"Lead": {
"id": "bbb",
"type": "people",
"people": [
{
"object": "user",
"id": "user-uuid-1",
"name": "Arturo Martinez",
"type": "person",
"person": {"email": "arturo@example.com"},
}
],
},
"Status": {
"id": "ccc",
"type": "status",
"status": {"name": "In Progress", "id": "status-1"},
},
}
result = NotionConnector._properties_to_str(properties)
assert "Arturo Martinez" in result
assert "In Progress" in result
class TestTableBlockExtraction:
"""ENG-3971: Verifies that inline table blocks (table/table_row) are indexed."""
def _make_blocks_response(self, results: list) -> dict:
return {"results": results, "next_cursor": None}
def test_table_row_cells_are_extracted(self) -> None:
"""table_row blocks store content in 'cells', not 'rich_text'.
The connector should extract text from cells."""
connector = _make_connector()
connector.workspace_id = "ws-1"
table_block = {
"id": "table-block-1",
"type": "table",
"table": {
"has_column_header": True,
"has_row_header": False,
"table_width": 3,
},
"has_children": True,
}
header_row = {
"id": "row-1",
"type": "table_row",
"table_row": {
"cells": [
[
{
"type": "text",
"text": {"content": "Name"},
"plain_text": "Name",
}
],
[
{
"type": "text",
"text": {"content": "Role"},
"plain_text": "Role",
}
],
[
{
"type": "text",
"text": {"content": "Team"},
"plain_text": "Team",
}
],
]
},
"has_children": False,
}
data_row = {
"id": "row-2",
"type": "table_row",
"table_row": {
"cells": [
[
{
"type": "text",
"text": {"content": "Arturo Martinez"},
"plain_text": "Arturo Martinez",
}
],
[
{
"type": "text",
"text": {"content": "Engineer"},
"plain_text": "Engineer",
}
],
[
{
"type": "text",
"text": {"content": "Platform"},
"plain_text": "Platform",
}
],
]
},
"has_children": False,
}
with patch.object(
connector,
"_fetch_child_blocks",
side_effect=[
self._make_blocks_response([table_block]),
self._make_blocks_response([header_row, data_row]),
],
):
output = connector._read_blocks("page-1")
all_text = " ".join(block.text for block in output.blocks)
assert "Arturo Martinez" in all_text, (
f"Expected 'Arturo Martinez' in table row text, got blocks: "
f"{[(b.id, b.text) for b in output.blocks]}"
)
assert "Engineer" in all_text, (
f"Expected 'Engineer' in table row text, got blocks: "
f"{[(b.id, b.text) for b in output.blocks]}"
)
assert "Platform" in all_text, (
f"Expected 'Platform' in table row text, got blocks: "
f"{[(b.id, b.text) for b in output.blocks]}"
)
def test_table_with_empty_cells(self) -> None:
"""Table rows with some empty cells should still extract non-empty content."""
connector = _make_connector()
connector.workspace_id = "ws-1"
table_block = {
"id": "table-block-2",
"type": "table",
"table": {
"has_column_header": False,
"has_row_header": False,
"table_width": 2,
},
"has_children": True,
}
row_with_empty = {
"id": "row-3",
"type": "table_row",
"table_row": {
"cells": [
[
{
"type": "text",
"text": {"content": "Has Value"},
"plain_text": "Has Value",
}
],
[], # empty cell
]
},
"has_children": False,
}
with patch.object(
connector,
"_fetch_child_blocks",
side_effect=[
self._make_blocks_response([table_block]),
self._make_blocks_response([row_with_empty]),
],
):
output = connector._read_blocks("page-2")
all_text = " ".join(block.text for block in output.blocks)
assert "Has Value" in all_text, (
f"Expected 'Has Value' in table row text, got blocks: "
f"{[(b.id, b.text) for b in output.blocks]}"
)
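The people-property tests above pin down one behavior: read each user object's top-level `"name"` before descending into the `"person"`/`"bot"` sub-dict. A standalone sketch of that rule (helper name is illustrative, not the connector's actual method):

```python
def extract_people_names(prop: dict) -> list[str]:
    # ENG-3970 behavior sketched: collect the "name" of every user in a
    # people-type property, tolerating bots and empty person sub-dicts.
    if prop.get("type") != "people":
        return []
    return [user["name"] for user in prop.get("people", []) if user.get("name")]
```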

View File

@@ -1,100 +0,0 @@
"""Regression tests for delete_messages_and_files_from_chat_session.
Verifies that user-owned files (those with user_file_id) are never deleted
during chat session cleanup — only chat-only files should be removed.
"""
from unittest.mock import call
from unittest.mock import MagicMock
from unittest.mock import patch
from uuid import uuid4
from onyx.db.chat import delete_messages_and_files_from_chat_session
_MODULE = "onyx.db.chat"
def _make_db_session(
rows: list[tuple[int, list[dict[str, str]] | None]],
) -> MagicMock:
db_session = MagicMock()
db_session.execute.return_value.tuples.return_value.all.return_value = rows
return db_session
@patch(f"{_MODULE}.delete_orphaned_search_docs")
@patch(f"{_MODULE}.get_default_file_store")
def test_user_files_are_not_deleted(
mock_get_file_store: MagicMock,
_mock_orphan_cleanup: MagicMock,
) -> None:
"""User files (with user_file_id) must be skipped during cleanup."""
file_store = MagicMock()
mock_get_file_store.return_value = file_store
db_session = _make_db_session(
[
(
1,
[
{"id": "chat-file-1", "type": "image"},
{"id": "user-file-1", "type": "document", "user_file_id": "uf-1"},
{"id": "chat-file-2", "type": "image"},
],
),
]
)
delete_messages_and_files_from_chat_session(uuid4(), db_session)
assert file_store.delete_file.call_count == 2
file_store.delete_file.assert_has_calls(
[
call(file_id="chat-file-1", error_on_missing=False),
call(file_id="chat-file-2", error_on_missing=False),
]
)
@patch(f"{_MODULE}.delete_orphaned_search_docs")
@patch(f"{_MODULE}.get_default_file_store")
def test_only_user_files_means_no_deletions(
mock_get_file_store: MagicMock,
_mock_orphan_cleanup: MagicMock,
) -> None:
"""When every file in the session is a user file, nothing should be deleted."""
file_store = MagicMock()
mock_get_file_store.return_value = file_store
db_session = _make_db_session(
[
(1, [{"id": "uf-a", "type": "document", "user_file_id": "uf-1"}]),
(2, [{"id": "uf-b", "type": "document", "user_file_id": "uf-2"}]),
]
)
delete_messages_and_files_from_chat_session(uuid4(), db_session)
file_store.delete_file.assert_not_called()
@patch(f"{_MODULE}.delete_orphaned_search_docs")
@patch(f"{_MODULE}.get_default_file_store")
def test_messages_with_no_files(
mock_get_file_store: MagicMock,
_mock_orphan_cleanup: MagicMock,
) -> None:
"""Messages with None or empty file lists should not trigger any deletions."""
file_store = MagicMock()
mock_get_file_store.return_value = file_store
db_session = _make_db_session(
[
(1, None),
(2, []),
]
)
delete_messages_and_files_from_chat_session(uuid4(), db_session)
file_store.delete_file.assert_not_called()

View File

@@ -1,203 +0,0 @@
import pytest
from onyx.document_index.interfaces_new import TenantState
from onyx.document_index.opensearch.constants import DEFAULT_MAX_CHUNK_SIZE
from onyx.document_index.opensearch.schema import get_opensearch_doc_chunk_id
from onyx.document_index.opensearch.string_filtering import (
MAX_DOCUMENT_ID_ENCODED_LENGTH,
)
from shared_configs.configs import POSTGRES_DEFAULT_SCHEMA_STANDARD_VALUE
SINGLE_TENANT_STATE = TenantState(
tenant_id=POSTGRES_DEFAULT_SCHEMA_STANDARD_VALUE, multitenant=False
)
MULTI_TENANT_STATE = TenantState(
tenant_id="tenant_abcdef12-3456-7890-abcd-ef1234567890", multitenant=True
)
EXPECTED_SHORT_TENANT = "abcdef12"
class TestGetOpensearchDocChunkIdSingleTenant:
def test_basic(self) -> None:
result = get_opensearch_doc_chunk_id(
SINGLE_TENANT_STATE, "my-doc-id", chunk_index=0
)
assert result == f"my-doc-id__{DEFAULT_MAX_CHUNK_SIZE}__0"
def test_custom_chunk_size(self) -> None:
result = get_opensearch_doc_chunk_id(
SINGLE_TENANT_STATE, "doc1", chunk_index=3, max_chunk_size=1024
)
assert result == "doc1__1024__3"
def test_special_chars_are_stripped(self) -> None:
"""Tests characters not matching [A-Za-z0-9_.-~] are removed."""
result = get_opensearch_doc_chunk_id(
SINGLE_TENANT_STATE, "doc/with?special#chars&more%stuff", chunk_index=0
)
assert "/" not in result
assert "?" not in result
assert "#" not in result
assert result == f"docwithspecialcharsmorestuff__{DEFAULT_MAX_CHUNK_SIZE}__0"
def test_short_doc_id_not_hashed(self) -> None:
"""
Tests that a short doc ID should appear directly in the result, not as a
hash.
"""
doc_id = "short-id"
result = get_opensearch_doc_chunk_id(SINGLE_TENANT_STATE, doc_id, chunk_index=0)
assert "short-id" in result
def test_long_doc_id_is_hashed(self) -> None:
"""
Tests that a doc ID exceeding the max length should be replaced with a
blake2b hash.
"""
# Create a doc ID that will exceed max length after the suffix is
# appended.
doc_id = "a" * MAX_DOCUMENT_ID_ENCODED_LENGTH
result = get_opensearch_doc_chunk_id(SINGLE_TENANT_STATE, doc_id, chunk_index=0)
# The original doc ID should NOT appear in the result.
assert doc_id not in result
# The suffix should still be present.
assert f"__{DEFAULT_MAX_CHUNK_SIZE}__0" in result
def test_long_doc_id_hash_is_deterministic(self) -> None:
doc_id = "x" * MAX_DOCUMENT_ID_ENCODED_LENGTH
result1 = get_opensearch_doc_chunk_id(
SINGLE_TENANT_STATE, doc_id, chunk_index=5
)
result2 = get_opensearch_doc_chunk_id(
SINGLE_TENANT_STATE, doc_id, chunk_index=5
)
assert result1 == result2
def test_long_doc_id_different_inputs_produce_different_hashes(self) -> None:
doc_id_a = "a" * MAX_DOCUMENT_ID_ENCODED_LENGTH
doc_id_b = "b" * MAX_DOCUMENT_ID_ENCODED_LENGTH
result_a = get_opensearch_doc_chunk_id(
SINGLE_TENANT_STATE, doc_id_a, chunk_index=0
)
result_b = get_opensearch_doc_chunk_id(
SINGLE_TENANT_STATE, doc_id_b, chunk_index=0
)
assert result_a != result_b
def test_result_never_exceeds_max_length(self) -> None:
"""
Tests that the final result should always be under
MAX_DOCUMENT_ID_ENCODED_LENGTH bytes.
"""
doc_id = "z" * (MAX_DOCUMENT_ID_ENCODED_LENGTH * 2)
result = get_opensearch_doc_chunk_id(
SINGLE_TENANT_STATE, doc_id, chunk_index=999, max_chunk_size=99999
)
assert len(result.encode("utf-8")) < MAX_DOCUMENT_ID_ENCODED_LENGTH
def test_no_tenant_prefix_in_single_tenant(self) -> None:
result = get_opensearch_doc_chunk_id(
SINGLE_TENANT_STATE, "mydoc", chunk_index=0
)
assert not result.startswith(SINGLE_TENANT_STATE.tenant_id)
class TestGetOpensearchDocChunkIdMultiTenant:
def test_includes_tenant_prefix(self) -> None:
result = get_opensearch_doc_chunk_id(MULTI_TENANT_STATE, "mydoc", chunk_index=0)
assert result.startswith(f"{EXPECTED_SHORT_TENANT}__")
def test_format(self) -> None:
result = get_opensearch_doc_chunk_id(
MULTI_TENANT_STATE, "mydoc", chunk_index=2, max_chunk_size=256
)
assert result == f"{EXPECTED_SHORT_TENANT}__mydoc__256__2"
def test_long_doc_id_is_hashed_multitenant(self) -> None:
doc_id = "d" * MAX_DOCUMENT_ID_ENCODED_LENGTH
result = get_opensearch_doc_chunk_id(MULTI_TENANT_STATE, doc_id, chunk_index=0)
# Should still have tenant prefix.
assert result.startswith(f"{EXPECTED_SHORT_TENANT}__")
# The original doc ID should NOT appear in the result.
assert doc_id not in result
# The suffix should still be present.
assert f"__{DEFAULT_MAX_CHUNK_SIZE}__0" in result
def test_result_never_exceeds_max_length_multitenant(self) -> None:
doc_id = "q" * (MAX_DOCUMENT_ID_ENCODED_LENGTH * 2)
result = get_opensearch_doc_chunk_id(
MULTI_TENANT_STATE, doc_id, chunk_index=999, max_chunk_size=99999
)
assert len(result.encode("utf-8")) < MAX_DOCUMENT_ID_ENCODED_LENGTH
def test_different_tenants_produce_different_ids(self) -> None:
tenant_a = TenantState(
tenant_id="tenant_aaaaaaaa-0000-0000-0000-000000000000", multitenant=True
)
tenant_b = TenantState(
tenant_id="tenant_bbbbbbbb-0000-0000-0000-000000000000", multitenant=True
)
result_a = get_opensearch_doc_chunk_id(tenant_a, "same-doc", chunk_index=0)
result_b = get_opensearch_doc_chunk_id(tenant_b, "same-doc", chunk_index=0)
assert result_a != result_b
class TestGetOpensearchDocChunkIdEdgeCases:
def test_chunk_index_zero(self) -> None:
result = get_opensearch_doc_chunk_id(SINGLE_TENANT_STATE, "doc", chunk_index=0)
assert result.endswith("__0")
def test_large_chunk_index(self) -> None:
result = get_opensearch_doc_chunk_id(
SINGLE_TENANT_STATE, "doc", chunk_index=99999
)
assert result.endswith("__99999")
def test_doc_id_with_only_special_chars_raises(self) -> None:
"""
Tests that a doc ID that becomes empty after filtering should raise
ValueError.
"""
with pytest.raises(ValueError, match="empty after filtering"):
get_opensearch_doc_chunk_id(SINGLE_TENANT_STATE, "###???///", chunk_index=0)
def test_doc_id_at_boundary_length(self) -> None:
"""
Tests that a doc ID right at the boundary should not be hashed.
"""
suffix = f"__{DEFAULT_MAX_CHUNK_SIZE}__0"
suffix_len = len(suffix.encode("utf-8"))
# Max doc ID length that won't trigger hashing (must be <
# max_encoded_length).
max_doc_len = MAX_DOCUMENT_ID_ENCODED_LENGTH - suffix_len - 1
doc_id = "a" * max_doc_len
result = get_opensearch_doc_chunk_id(SINGLE_TENANT_STATE, doc_id, chunk_index=0)
assert doc_id in result
def test_doc_id_at_boundary_length_multitenant(self) -> None:
"""
Tests that a doc ID right at the boundary should not be hashed in
multitenant mode.
"""
suffix = f"__{DEFAULT_MAX_CHUNK_SIZE}__0"
suffix_len = len(suffix.encode("utf-8"))
prefix = f"{EXPECTED_SHORT_TENANT}__"
prefix_len = len(prefix.encode("utf-8"))
# Max doc ID length that won't trigger hashing (must be <
# max_encoded_length).
max_doc_len = MAX_DOCUMENT_ID_ENCODED_LENGTH - suffix_len - prefix_len - 1
doc_id = "a" * max_doc_len
result = get_opensearch_doc_chunk_id(MULTI_TENANT_STATE, doc_id, chunk_index=0)
assert doc_id in result
def test_doc_id_one_over_boundary_is_hashed(self) -> None:
"""
Tests that a doc ID one byte over the boundary should be hashed.
"""
suffix = f"__{DEFAULT_MAX_CHUNK_SIZE}__0"
suffix_len = len(suffix.encode("utf-8"))
# This length will trigger the >= check in filter_and_validate_document_id
doc_id = "a" * (MAX_DOCUMENT_ID_ENCODED_LENGTH - suffix_len)
result = get_opensearch_doc_chunk_id(SINGLE_TENANT_STATE, doc_id, chunk_index=0)
assert doc_id not in result


@@ -1,76 +0,0 @@
%PDF-1.3 (encrypted single-page PDF test fixture: /Filter /Standard, /R 3, 128-bit key, owner and user password entries; binary stream and mis-encoded bytes omitted)
%%EOF


@@ -54,12 +54,6 @@ class TestReadPdfFile:
text, _, _ = read_pdf_file(_load("encrypted.pdf"), pdf_pass="wrong")
assert text == ""
def test_owner_password_only_pdf_extracts_text(self) -> None:
"""A PDF encrypted with only an owner password (no user password)
should still yield its text content. Regression for #9754."""
text, _, _ = read_pdf_file(_load("owner_protected.pdf"))
assert "Hello World" in text
def test_empty_pdf(self) -> None:
text, _, _ = read_pdf_file(_load("empty.pdf"))
assert text.strip() == ""
@@ -123,12 +117,6 @@ class TestIsPdfProtected:
def test_protected_pdf(self) -> None:
assert is_pdf_protected(_load("encrypted.pdf")) is True
def test_owner_password_only_is_not_protected(self) -> None:
"""A PDF with only an owner password (permission restrictions) but no
user password should NOT be considered protected — any viewer can open
it without prompting for a password."""
assert is_pdf_protected(_load("owner_protected.pdf")) is False
def test_preserves_file_position(self) -> None:
pdf = _load("simple.pdf")
pdf.seek(42)


@@ -1,79 +0,0 @@
import io
from pptx import Presentation # type: ignore[import-untyped]
from pptx.chart.data import CategoryChartData # type: ignore[import-untyped]
from pptx.enum.chart import XL_CHART_TYPE # type: ignore[import-untyped]
from pptx.util import Inches # type: ignore[import-untyped]
from onyx.file_processing.extract_file_text import pptx_to_text
def _make_pptx_with_chart() -> io.BytesIO:
"""Create an in-memory pptx with one text slide and one chart slide."""
prs = Presentation()
# Slide 1: text only
slide1 = prs.slides.add_slide(prs.slide_layouts[1])
slide1.shapes.title.text = "Introduction"
slide1.placeholders[1].text = "This is the first slide."
# Slide 2: chart
slide2 = prs.slides.add_slide(prs.slide_layouts[5]) # Blank layout
chart_data = CategoryChartData()
chart_data.categories = ["Q1", "Q2", "Q3"]
chart_data.add_series("Revenue", (100, 200, 300))
slide2.shapes.add_chart(
XL_CHART_TYPE.COLUMN_CLUSTERED,
Inches(1),
Inches(1),
Inches(6),
Inches(4),
chart_data,
)
buf = io.BytesIO()
prs.save(buf)
buf.seek(0)
return buf
def _make_pptx_without_chart() -> io.BytesIO:
"""Create an in-memory pptx with a single text-only slide."""
prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[1])
slide.shapes.title.text = "Hello World"
slide.placeholders[1].text = "Some content here."
buf = io.BytesIO()
prs.save(buf)
buf.seek(0)
return buf
class TestPptxToText:
def test_chart_is_omitted(self) -> None:
# Precondition
pptx_file = _make_pptx_with_chart()
# Under test
result = pptx_to_text(pptx_file)
# Postcondition
assert "Introduction" in result
assert "first slide" in result
assert "[chart omitted]" in result
# The actual chart data should NOT appear in the output.
assert "Revenue" not in result
assert "Q1" not in result
def test_text_only_pptx(self) -> None:
# Precondition
pptx_file = _make_pptx_without_chart()
# Under test
result = pptx_to_text(pptx_file)
# Postcondition
assert "Hello World" in result
assert "Some content" in result
assert "[chart omitted]" not in result


@@ -1,91 +0,0 @@
"""Tests for FileStore.delete_file error_on_missing behavior."""
from unittest.mock import MagicMock
from unittest.mock import patch
import pytest
_S3_MODULE = "onyx.file_store.file_store"
_PG_MODULE = "onyx.file_store.postgres_file_store"
def _mock_db_session() -> MagicMock:
session = MagicMock()
session.__enter__ = MagicMock(return_value=session)
session.__exit__ = MagicMock(return_value=False)
return session
# ── S3BackedFileStore ────────────────────────────────────────────────
@patch(f"{_S3_MODULE}.get_session_with_current_tenant_if_none")
@patch(f"{_S3_MODULE}.get_filerecord_by_file_id_optional", return_value=None)
def test_s3_delete_missing_file_raises_by_default(
_mock_get_record: MagicMock,
mock_ctx: MagicMock,
) -> None:
from onyx.file_store.file_store import S3BackedFileStore
mock_ctx.return_value = _mock_db_session()
store = S3BackedFileStore(bucket_name="b")
with pytest.raises(RuntimeError, match="does not exist"):
store.delete_file("nonexistent")
@patch(f"{_S3_MODULE}.get_session_with_current_tenant_if_none")
@patch(f"{_S3_MODULE}.get_filerecord_by_file_id_optional", return_value=None)
@patch(f"{_S3_MODULE}.delete_filerecord_by_file_id")
def test_s3_delete_missing_file_silent_when_error_on_missing_false(
mock_delete_record: MagicMock,
_mock_get_record: MagicMock,
mock_ctx: MagicMock,
) -> None:
from onyx.file_store.file_store import S3BackedFileStore
mock_ctx.return_value = _mock_db_session()
store = S3BackedFileStore(bucket_name="b")
store.delete_file("nonexistent", error_on_missing=False)
mock_delete_record.assert_not_called()
# ── PostgresBackedFileStore ──────────────────────────────────────────
@patch(f"{_PG_MODULE}.get_session_with_current_tenant_if_none")
@patch(f"{_PG_MODULE}.get_file_content_by_file_id_optional", return_value=None)
def test_pg_delete_missing_file_raises_by_default(
_mock_get_content: MagicMock,
mock_ctx: MagicMock,
) -> None:
from onyx.file_store.postgres_file_store import PostgresBackedFileStore
mock_ctx.return_value = _mock_db_session()
store = PostgresBackedFileStore()
with pytest.raises(RuntimeError, match="does not exist"):
store.delete_file("nonexistent")
@patch(f"{_PG_MODULE}.get_session_with_current_tenant_if_none")
@patch(f"{_PG_MODULE}.get_file_content_by_file_id_optional", return_value=None)
@patch(f"{_PG_MODULE}.delete_file_content_by_file_id")
@patch(f"{_PG_MODULE}.delete_filerecord_by_file_id")
def test_pg_delete_missing_file_silent_when_error_on_missing_false(
mock_delete_record: MagicMock,
mock_delete_content: MagicMock,
_mock_get_content: MagicMock,
mock_ctx: MagicMock,
) -> None:
from onyx.file_store.postgres_file_store import PostgresBackedFileStore
mock_ctx.return_value = _mock_db_session()
store = PostgresBackedFileStore()
store.delete_file("nonexistent", error_on_missing=False)
mock_delete_record.assert_not_called()
mock_delete_content.assert_not_called()


@@ -98,7 +98,6 @@ Useful hardening flags:
| `serve` | Serve the interactive chat TUI over SSH |
| `configure` | Configure server URL and API key |
| `validate-config` | Validate configuration and test connection |
| `install-skill` | Install the agent skill file into a project |
## Slash Commands (in TUI)


@@ -7,7 +7,6 @@ import (
"github.com/onyx-dot-app/onyx/cli/internal/api"
"github.com/onyx-dot-app/onyx/cli/internal/config"
"github.com/onyx-dot-app/onyx/cli/internal/exitcodes"
"github.com/spf13/cobra"
)
@@ -17,23 +16,16 @@ func newAgentsCmd() *cobra.Command {
cmd := &cobra.Command{
Use: "agents",
Short: "List available agents",
Long: `List all visible agents configured on the Onyx server.
By default, output is a human-readable table with ID, name, and description.
Use --json for machine-readable output.`,
Example: ` onyx-cli agents
onyx-cli agents --json
onyx-cli agents --json | jq '.[].name'`,
RunE: func(cmd *cobra.Command, args []string) error {
cfg := config.Load()
if !cfg.IsConfigured() {
return exitcodes.New(exitcodes.NotConfigured, "onyx CLI is not configured\n Run: onyx-cli configure")
return fmt.Errorf("onyx CLI is not configured — run 'onyx-cli configure' first")
}
client := api.NewClient(cfg)
agents, err := client.ListAgents(cmd.Context())
if err != nil {
return fmt.Errorf("failed to list agents: %w\n Check your connection with: onyx-cli validate-config", err)
return fmt.Errorf("failed to list agents: %w", err)
}
if agentsJSON {


@@ -4,65 +4,33 @@ import (
"context"
"encoding/json"
"fmt"
"io"
"os"
"os/signal"
"strings"
"syscall"
"github.com/onyx-dot-app/onyx/cli/internal/api"
"github.com/onyx-dot-app/onyx/cli/internal/config"
"github.com/onyx-dot-app/onyx/cli/internal/exitcodes"
"github.com/onyx-dot-app/onyx/cli/internal/models"
"github.com/onyx-dot-app/onyx/cli/internal/overflow"
"github.com/spf13/cobra"
"golang.org/x/term"
)
const defaultMaxOutputBytes = 4096
func newAskCmd() *cobra.Command {
var (
askAgentID int
askJSON bool
askQuiet bool
askPrompt string
maxOutput int
)
cmd := &cobra.Command{
Use: "ask [question]",
Short: "Ask a one-shot question (non-interactive)",
Long: `Send a one-shot question to an Onyx agent and print the response.
The question can be provided as a positional argument, via --prompt, or piped
through stdin. When stdin contains piped data, it is sent as context along
with the question from --prompt (or used as the question itself).
When stdout is not a TTY (e.g., called by a script or AI agent), output is
automatically truncated to --max-output bytes and the full response is saved
to a temp file. Set --max-output 0 to disable truncation.`,
Args: cobra.MaximumNArgs(1),
Example: ` onyx-cli ask "What connectors are available?"
onyx-cli ask --agent-id 3 "Summarize our Q4 revenue"
onyx-cli ask --json "List all users" | jq '.event.content'
cat error.log | onyx-cli ask --prompt "Find the root cause"
echo "what is onyx?" | onyx-cli ask`,
Args: cobra.ExactArgs(1),
RunE: func(cmd *cobra.Command, args []string) error {
cfg := config.Load()
if !cfg.IsConfigured() {
return exitcodes.New(exitcodes.NotConfigured, "onyx CLI is not configured\n Run: onyx-cli configure")
}
if askJSON && askQuiet {
return exitcodes.New(exitcodes.BadRequest, "--json and --quiet cannot be used together")
}
question, err := resolveQuestion(args, askPrompt)
if err != nil {
return err
return fmt.Errorf("onyx CLI is not configured — run 'onyx-cli configure' first")
}
question := args[0]
agentID := cfg.DefaultAgentID
if cmd.Flags().Changed("agent-id") {
agentID = askAgentID
@@ -82,23 +50,9 @@ to a temp file. Set --max-output 0 to disable truncation.`,
nil,
)
// Determine truncation threshold.
isTTY := term.IsTerminal(int(os.Stdout.Fd()))
truncateAt := 0 // 0 means no truncation
if cmd.Flags().Changed("max-output") {
truncateAt = maxOutput
} else if !isTTY {
truncateAt = defaultMaxOutputBytes
}
var sessionID string
var lastErr error
gotStop := false
// Overflow writer: tees to stdout and optionally to a temp file.
// In quiet mode, buffer everything and print once at the end.
ow := &overflow.Writer{Limit: truncateAt, Quiet: askQuiet}
for event := range ch {
if e, ok := event.(models.SessionCreatedEvent); ok {
sessionID = e.ChatSessionID
@@ -128,50 +82,22 @@ to a temp file. Set --max-output 0 to disable truncation.`,
switch e := event.(type) {
case models.MessageDeltaEvent:
ow.Write(e.Content)
case models.SearchStartEvent:
if isTTY && !askQuiet {
if e.IsInternetSearch {
fmt.Fprintf(os.Stderr, "\033[2mSearching the web...\033[0m\n")
} else {
fmt.Fprintf(os.Stderr, "\033[2mSearching documents...\033[0m\n")
}
}
case models.SearchQueriesEvent:
if isTTY && !askQuiet {
for _, q := range e.Queries {
fmt.Fprintf(os.Stderr, "\033[2m → %s\033[0m\n", q)
}
}
case models.SearchDocumentsEvent:
if isTTY && !askQuiet && len(e.Documents) > 0 {
fmt.Fprintf(os.Stderr, "\033[2mFound %d documents\033[0m\n", len(e.Documents))
}
case models.ReasoningStartEvent:
if isTTY && !askQuiet {
fmt.Fprintf(os.Stderr, "\033[2mThinking...\033[0m\n")
}
case models.ToolStartEvent:
if isTTY && !askQuiet && e.ToolName != "" {
fmt.Fprintf(os.Stderr, "\033[2mUsing %s...\033[0m\n", e.ToolName)
}
fmt.Print(e.Content)
case models.ErrorEvent:
ow.Finish()
return fmt.Errorf("%s", e.Error)
case models.StopEvent:
ow.Finish()
fmt.Println()
return nil
}
}
if !askJSON {
ow.Finish()
}
if ctx.Err() != nil {
if sessionID != "" {
client.StopChatSession(context.Background(), sessionID)
}
if !askJSON {
fmt.Println()
}
return nil
}
@@ -179,56 +105,20 @@ to a temp file. Set --max-output 0 to disable truncation.`,
return lastErr
}
if !gotStop {
if !askJSON {
fmt.Println()
}
return fmt.Errorf("stream ended unexpectedly")
}
if !askJSON {
fmt.Println()
}
return nil
},
}
cmd.Flags().IntVar(&askAgentID, "agent-id", 0, "Agent ID to use")
cmd.Flags().BoolVar(&askJSON, "json", false, "Output raw JSON events")
cmd.Flags().BoolVarP(&askQuiet, "quiet", "q", false, "Buffer output and print once at end (no streaming)")
cmd.Flags().StringVar(&askPrompt, "prompt", "", "Question text (use with piped stdin context)")
cmd.Flags().IntVar(&maxOutput, "max-output", defaultMaxOutputBytes,
"Max bytes to print before truncating (0 to disable, auto-enabled for non-TTY)")
// Suppress cobra's default error/usage on RunE errors
return cmd
}
// resolveQuestion builds the final question string from args, --prompt, and stdin.
func resolveQuestion(args []string, prompt string) (string, error) {
hasArg := len(args) > 0
hasPrompt := prompt != ""
hasStdin := !term.IsTerminal(int(os.Stdin.Fd()))
if hasArg && hasPrompt {
return "", exitcodes.New(exitcodes.BadRequest, "specify the question as an argument or --prompt, not both")
}
var stdinContent string
if hasStdin {
const maxStdinBytes = 10 * 1024 * 1024 // 10MB
data, err := io.ReadAll(io.LimitReader(os.Stdin, maxStdinBytes))
if err != nil {
return "", fmt.Errorf("failed to read stdin: %w", err)
}
stdinContent = strings.TrimSpace(string(data))
}
switch {
case hasArg && stdinContent != "":
// arg is the question, stdin is context
return args[0] + "\n\n" + stdinContent, nil
case hasArg:
return args[0], nil
case hasPrompt && stdinContent != "":
// --prompt is the question, stdin is context
return prompt + "\n\n" + stdinContent, nil
case hasPrompt:
return prompt, nil
case stdinContent != "":
return stdinContent, nil
default:
return "", exitcodes.New(exitcodes.BadRequest, "no question provided\n Usage: onyx-cli ask \"your question\"\n Or: echo \"context\" | onyx-cli ask --prompt \"your question\"")
}
}


@@ -10,16 +10,9 @@ import (
)
func newChatCmd() *cobra.Command {
var noStreamMarkdown bool
cmd := &cobra.Command{
return &cobra.Command{
Use: "chat",
Short: "Launch the interactive chat TUI (default)",
Long: `Launch the interactive terminal UI for chatting with your Onyx agent.
This is the default command when no subcommand is specified. On first run,
an interactive setup wizard will guide you through configuration.`,
Example: ` onyx-cli chat
onyx-cli`,
RunE: func(cmd *cobra.Command, args []string) error {
cfg := config.Load()
@@ -32,12 +25,6 @@ an interactive setup wizard will guide you through configuration.`,
cfg = *result
}
// CLI flag overrides config/env
if cmd.Flags().Changed("no-stream-markdown") {
v := !noStreamMarkdown
cfg.Features.StreamMarkdown = &v
}
starprompt.MaybePrompt()
m := tui.NewModel(cfg)
@@ -46,8 +33,4 @@ an interactive setup wizard will guide you through configuration.`,
return err
},
}
cmd.Flags().BoolVar(&noStreamMarkdown, "no-stream-markdown", false, "Disable progressive markdown rendering during streaming")
return cmd
}


@@ -1,126 +1,19 @@
package cmd
import (
"context"
"errors"
"fmt"
"io"
"os"
"strings"
"time"
"github.com/onyx-dot-app/onyx/cli/internal/api"
"github.com/onyx-dot-app/onyx/cli/internal/config"
"github.com/onyx-dot-app/onyx/cli/internal/exitcodes"
"github.com/onyx-dot-app/onyx/cli/internal/onboarding"
"github.com/spf13/cobra"
"golang.org/x/term"
)
func newConfigureCmd() *cobra.Command {
var (
serverURL string
apiKey string
apiKeyStdin bool
dryRun bool
)
cmd := &cobra.Command{
return &cobra.Command{
Use: "configure",
Short: "Configure server URL and API key",
Long: `Set up the Onyx CLI with your server URL and API key.
When --server-url and --api-key are both provided, the configuration is saved
non-interactively (useful for scripts and AI agents). Otherwise, an interactive
setup wizard is launched.
If --api-key is omitted but stdin has piped data, the API key is read from
stdin automatically. You can also use --api-key-stdin to make this explicit.
This avoids leaking the key in shell history.
Use --dry-run to test the connection without saving the configuration.`,
Example: ` onyx-cli configure
onyx-cli configure --server-url https://my-onyx.com --api-key sk-...
echo "$ONYX_API_KEY" | onyx-cli configure --server-url https://my-onyx.com
echo "$ONYX_API_KEY" | onyx-cli configure --server-url https://my-onyx.com --api-key-stdin
onyx-cli configure --server-url https://my-onyx.com --api-key sk-... --dry-run`,
RunE: func(cmd *cobra.Command, args []string) error {
// Read API key from stdin if piped (implicit) or --api-key-stdin (explicit)
if apiKeyStdin && apiKey != "" {
return exitcodes.New(exitcodes.BadRequest, "--api-key and --api-key-stdin cannot be used together")
}
if (apiKey == "" && !term.IsTerminal(int(os.Stdin.Fd()))) || apiKeyStdin {
data, err := io.ReadAll(os.Stdin)
if err != nil {
return fmt.Errorf("failed to read API key from stdin: %w", err)
}
apiKey = strings.TrimSpace(string(data))
}
if serverURL != "" && apiKey != "" {
return configureNonInteractive(serverURL, apiKey, dryRun)
}
if dryRun {
return exitcodes.New(exitcodes.BadRequest, "--dry-run requires --server-url and --api-key")
}
if serverURL != "" || apiKey != "" {
return exitcodes.New(exitcodes.BadRequest, "both --server-url and --api-key are required for non-interactive setup\n Run 'onyx-cli configure' without flags for interactive setup")
}
cfg := config.Load()
onboarding.Run(&cfg)
return nil
},
}
cmd.Flags().StringVar(&serverURL, "server-url", "", "Onyx server URL (e.g., https://cloud.onyx.app)")
cmd.Flags().StringVar(&apiKey, "api-key", "", "API key for authentication (or pipe via stdin)")
cmd.Flags().BoolVar(&apiKeyStdin, "api-key-stdin", false, "Read API key from stdin (explicit; also happens automatically when stdin is piped)")
cmd.Flags().BoolVar(&dryRun, "dry-run", false, "Test connection without saving config (requires --server-url and --api-key)")
return cmd
}
func configureNonInteractive(serverURL, apiKey string, dryRun bool) error {
cfg := config.OnyxCliConfig{
ServerURL: serverURL,
APIKey: apiKey,
DefaultAgentID: 0,
}
// Preserve existing default agent ID from disk (not env overrides)
if existing := config.LoadFromDisk(); existing.DefaultAgentID != 0 {
cfg.DefaultAgentID = existing.DefaultAgentID
}
// Test connection
client := api.NewClient(cfg)
ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
defer cancel()
if err := client.TestConnection(ctx); err != nil {
var authErr *api.AuthError
if errors.As(err, &authErr) {
return exitcodes.Newf(exitcodes.AuthFailure, "authentication failed: %v\n Check your API key", err)
}
return exitcodes.Newf(exitcodes.Unreachable, "connection failed: %v\n Check your server URL", err)
}
if dryRun {
fmt.Printf("Server: %s\n", serverURL)
fmt.Println("Status: connected and authenticated")
fmt.Println("Dry run: config was NOT saved")
return nil
}
if err := config.Save(cfg); err != nil {
return fmt.Errorf("could not save config: %w", err)
}
fmt.Printf("Config: %s\n", config.ConfigFilePath())
fmt.Printf("Server: %s\n", serverURL)
fmt.Println("Status: connected and authenticated")
return nil
}


@@ -1,20 +0,0 @@
package cmd
import (
"fmt"
"github.com/onyx-dot-app/onyx/cli/internal/config"
"github.com/spf13/cobra"
)
func newExperimentsCmd() *cobra.Command {
return &cobra.Command{
Use: "experiments",
Short: "List experimental features and their status",
RunE: func(cmd *cobra.Command, args []string) error {
cfg := config.Load()
_, _ = fmt.Fprintln(cmd.OutOrStdout(), config.ExperimentsText(cfg.Features))
return nil
},
}
}


@@ -1,176 +0,0 @@
package cmd
import (
"fmt"
"os"
"path/filepath"
"github.com/onyx-dot-app/onyx/cli/internal/embedded"
"github.com/onyx-dot-app/onyx/cli/internal/fsutil"
"github.com/spf13/cobra"
)
// agentSkillDirs maps agent names to their skill directory paths (relative to
// the project or home root). "Universal" agents like Cursor and Codex read
// from .agents/skills directly, so they don't need their own entry here.
var agentSkillDirs = map[string]string{
"claude-code": filepath.Join(".claude", "skills"),
}
const (
canonicalDir = ".agents/skills"
skillName = "onyx-cli"
)
func newInstallSkillCmd() *cobra.Command {
var (
global bool
copyMode bool
agents []string
)
cmd := &cobra.Command{
Use: "install-skill",
Short: "Install the Onyx CLI agent skill file",
Long: `Install the bundled SKILL.md so that AI coding agents can discover and use
the Onyx CLI as a tool.
Files are written to the canonical .agents/skills/onyx-cli/ directory. For
agents that use their own skill directory (e.g. Claude Code uses .claude/skills/),
a symlink is created pointing back to the canonical copy.
By default the skill is installed at the project level (current directory).
Use --global to install under your home directory instead.
Use --copy to write independent copies instead of symlinks.
Use --agent to target specific agents (can be repeated).`,
Example: ` onyx-cli install-skill
onyx-cli install-skill --global
onyx-cli install-skill --agent claude-code
onyx-cli install-skill --copy`,
RunE: func(cmd *cobra.Command, args []string) error {
base, err := installBase(global)
if err != nil {
return err
}
// Write the canonical copy.
canonicalSkillDir := filepath.Join(base, canonicalDir, skillName)
dest := filepath.Join(canonicalSkillDir, "SKILL.md")
content := []byte(embedded.SkillMD)
status, err := fsutil.CompareFile(dest, content)
if err != nil {
return err
}
switch status {
case fsutil.StatusUpToDate:
_, _ = fmt.Fprintf(cmd.OutOrStdout(), "Up to date %s\n", dest)
case fsutil.StatusDiffers:
_, _ = fmt.Fprintf(cmd.ErrOrStderr(), "Warning: overwriting modified %s\n", dest)
if err := os.WriteFile(dest, content, 0o644); err != nil {
return fmt.Errorf("could not write skill file: %w", err)
}
_, _ = fmt.Fprintf(cmd.OutOrStdout(), "Installed %s\n", dest)
default: // statusMissing
if err := os.MkdirAll(canonicalSkillDir, 0o755); err != nil {
return fmt.Errorf("could not create directory: %w", err)
}
if err := os.WriteFile(dest, content, 0o644); err != nil {
return fmt.Errorf("could not write skill file: %w", err)
}
_, _ = fmt.Fprintf(cmd.OutOrStdout(), "Installed %s\n", dest)
}
// Determine which agents to link.
targets := agentSkillDirs
if len(agents) > 0 {
targets = make(map[string]string)
for _, a := range agents {
dir, ok := agentSkillDirs[a]
if !ok {
_, _ = fmt.Fprintf(cmd.ErrOrStderr(), "Unknown agent %q (skipped) — known agents:", a)
for name := range agentSkillDirs {
_, _ = fmt.Fprintf(cmd.ErrOrStderr(), " %s", name)
}
_, _ = fmt.Fprintln(cmd.ErrOrStderr())
continue
}
targets[a] = dir
}
}
// Create symlinks (or copies) from agent-specific dirs to canonical.
for name, skillsDir := range targets {
agentSkillDir := filepath.Join(base, skillsDir, skillName)
if copyMode {
copyDest := filepath.Join(agentSkillDir, "SKILL.md")
if err := fsutil.EnsureDirForCopy(agentSkillDir); err != nil {
return fmt.Errorf("could not prepare %s directory: %w", name, err)
}
if err := os.MkdirAll(agentSkillDir, 0o755); err != nil {
return fmt.Errorf("could not create %s directory: %w", name, err)
}
if err := os.WriteFile(copyDest, []byte(embedded.SkillMD), 0o644); err != nil {
return fmt.Errorf("could not write %s skill file: %w", name, err)
}
_, _ = fmt.Fprintf(cmd.OutOrStdout(), "Copied %s\n", copyDest)
continue
}
// Compute relative symlink target. Symlinks resolve relative to
// the parent directory of the link, not the link itself.
rel, err := filepath.Rel(filepath.Dir(agentSkillDir), canonicalSkillDir)
if err != nil {
return fmt.Errorf("could not compute relative path for %s: %w", name, err)
}
if err := os.MkdirAll(filepath.Dir(agentSkillDir), 0o755); err != nil {
return fmt.Errorf("could not create %s directory: %w", name, err)
}
// Remove existing symlink/dir before creating.
_ = os.Remove(agentSkillDir)
if err := os.Symlink(rel, agentSkillDir); err != nil {
// Fall back to copy if symlink fails (e.g. Windows without dev mode).
copyDest := filepath.Join(agentSkillDir, "SKILL.md")
if mkErr := os.MkdirAll(agentSkillDir, 0o755); mkErr != nil {
return fmt.Errorf("could not create %s directory: %w", name, mkErr)
}
if wErr := os.WriteFile(copyDest, []byte(embedded.SkillMD), 0o644); wErr != nil {
return fmt.Errorf("could not write %s skill file: %w", name, wErr)
}
_, _ = fmt.Fprintf(cmd.OutOrStdout(), "Copied %s (symlink failed)\n", copyDest)
continue
}
_, _ = fmt.Fprintf(cmd.OutOrStdout(), "Linked %s -> %s\n", agentSkillDir, rel)
}
return nil
},
}
cmd.Flags().BoolVarP(&global, "global", "g", false, "Install to home directory instead of project")
cmd.Flags().BoolVar(&copyMode, "copy", false, "Copy files instead of symlinking")
cmd.Flags().StringSliceVarP(&agents, "agent", "a", nil, "Target specific agents (e.g. claude-code)")
return cmd
}
func installBase(global bool) (string, error) {
if global {
home, err := os.UserHomeDir()
if err != nil {
return "", fmt.Errorf("could not determine home directory: %w", err)
}
return home, nil
}
cwd, err := os.Getwd()
if err != nil {
return "", fmt.Errorf("could not determine working directory: %w", err)
}
return cwd, nil
}


@@ -97,8 +97,6 @@ func Execute() error {
rootCmd.AddCommand(newConfigureCmd())
rootCmd.AddCommand(newValidateConfigCmd())
rootCmd.AddCommand(newServeCmd())
rootCmd.AddCommand(newInstallSkillCmd())
rootCmd.AddCommand(newExperimentsCmd())
// Default command is chat, but intercept --version first
rootCmd.RunE = func(cmd *cobra.Command, args []string) error {

View File

@@ -23,7 +23,6 @@ import (
"github.com/charmbracelet/wish/ratelimiter"
"github.com/onyx-dot-app/onyx/cli/internal/api"
"github.com/onyx-dot-app/onyx/cli/internal/config"
"github.com/onyx-dot-app/onyx/cli/internal/exitcodes"
"github.com/onyx-dot-app/onyx/cli/internal/tui"
"github.com/spf13/cobra"
"golang.org/x/time/rate"
@@ -296,15 +295,15 @@ provided via the ONYX_API_KEY environment variable to skip the prompt:
The server URL is taken from the server operator's config. The server
auto-generates an Ed25519 host key on first run if the key file does not
already exist. The host key path can also be set via the ONYX_SSH_HOST_KEY
environment variable (the --host-key flag takes precedence).`,
Example: ` onyx-cli serve --port 2222
ssh localhost -p 2222
onyx-cli serve --host 0.0.0.0 --port 2222
onyx-cli serve --idle-timeout 30m --max-session-timeout 2h`,
environment variable (the --host-key flag takes precedence).
Example:
onyx-cli serve --port 2222
ssh localhost -p 2222`,
RunE: func(cmd *cobra.Command, args []string) error {
serverCfg := config.Load()
if serverCfg.ServerURL == "" {
return exitcodes.New(exitcodes.NotConfigured, "server URL is not configured\n Run: onyx-cli configure")
return fmt.Errorf("server URL is not configured; run 'onyx-cli configure' first")
}
if !cmd.Flags().Changed("host-key") {
if v := os.Getenv(config.EnvSSHHostKey); v != "" {

View File

@@ -2,13 +2,11 @@ package cmd
import (
"context"
"errors"
"fmt"
"time"
"github.com/onyx-dot-app/onyx/cli/internal/api"
"github.com/onyx-dot-app/onyx/cli/internal/config"
"github.com/onyx-dot-app/onyx/cli/internal/exitcodes"
"github.com/onyx-dot-app/onyx/cli/internal/version"
log "github.com/sirupsen/logrus"
"github.com/spf13/cobra"
@@ -18,21 +16,17 @@ func newValidateConfigCmd() *cobra.Command {
return &cobra.Command{
Use: "validate-config",
Short: "Validate configuration and test server connection",
Long: `Check that the CLI is configured, the server is reachable, and the API key
is valid. Also reports the server version and warns if it is below the
minimum required.`,
Example: ` onyx-cli validate-config`,
RunE: func(cmd *cobra.Command, args []string) error {
// Check config file
if !config.ConfigExists() {
return exitcodes.Newf(exitcodes.NotConfigured, "config file not found at %s\n Run: onyx-cli configure", config.ConfigFilePath())
return fmt.Errorf("config file not found at %s\n Run 'onyx-cli configure' to set up", config.ConfigFilePath())
}
cfg := config.Load()
// Check API key
if !cfg.IsConfigured() {
return exitcodes.New(exitcodes.NotConfigured, "API key is missing\n Run: onyx-cli configure")
return fmt.Errorf("API key is missing\n Run 'onyx-cli configure' to set up")
}
_, _ = fmt.Fprintf(cmd.OutOrStdout(), "Config: %s\n", config.ConfigFilePath())
@@ -41,11 +35,7 @@ minimum required.`,
// Test connection
client := api.NewClient(cfg)
if err := client.TestConnection(cmd.Context()); err != nil {
var authErr *api.AuthError
if errors.As(err, &authErr) {
return exitcodes.Newf(exitcodes.AuthFailure, "authentication failed: %v\n Reconfigure with: onyx-cli configure", err)
}
return exitcodes.Newf(exitcodes.Unreachable, "connection failed: %v\n Reconfigure with: onyx-cli configure", err)
return fmt.Errorf("connection failed: %w", err)
}
_, _ = fmt.Fprintln(cmd.OutOrStdout(), "Status: connected and authenticated")

View File

@@ -149,12 +149,12 @@ func (c *Client) TestConnection(ctx context.Context) error {
if resp2.StatusCode == 401 || resp2.StatusCode == 403 {
if isHTML || strings.Contains(respServer, "awselb") {
return &AuthError{Message: fmt.Sprintf("HTTP %d from a reverse proxy (not the Onyx backend).\n Check your deployment's ingress / proxy configuration", resp2.StatusCode)}
return fmt.Errorf("HTTP %d from a reverse proxy (not the Onyx backend).\n Check your deployment's ingress / proxy configuration", resp2.StatusCode)
}
if resp2.StatusCode == 401 {
return &AuthError{Message: fmt.Sprintf("invalid API key or token.\n %s", body)}
return fmt.Errorf("invalid API key or token.\n %s", body)
}
return &AuthError{Message: fmt.Sprintf("access denied — check that the API key is valid.\n %s", body)}
return fmt.Errorf("access denied — check that the API key is valid.\n %s", body)
}
detail := fmt.Sprintf("HTTP %d", resp2.StatusCode)

View File

@@ -11,12 +11,3 @@ type OnyxAPIError struct {
func (e *OnyxAPIError) Error() string {
return fmt.Sprintf("HTTP %d: %s", e.StatusCode, e.Detail)
}
// AuthError is returned when authentication or authorization fails.
type AuthError struct {
Message string
}
func (e *AuthError) Error() string {
return e.Message
}

View File

@@ -9,47 +9,28 @@ import (
)
const (
EnvServerURL = "ONYX_SERVER_URL"
EnvAPIKey = "ONYX_API_KEY"
EnvAgentID = "ONYX_PERSONA_ID"
EnvSSHHostKey = "ONYX_SSH_HOST_KEY"
EnvStreamMarkdown = "ONYX_STREAM_MARKDOWN"
EnvServerURL = "ONYX_SERVER_URL"
EnvAPIKey = "ONYX_API_KEY"
EnvAgentID = "ONYX_PERSONA_ID"
EnvSSHHostKey = "ONYX_SSH_HOST_KEY"
)
// Features holds experimental feature flags for the CLI.
type Features struct {
// StreamMarkdown enables progressive markdown rendering during streaming,
// so output is formatted as it arrives rather than after completion.
// nil means use the app default (true).
StreamMarkdown *bool `json:"stream_markdown,omitempty"`
}
// OnyxCliConfig holds the CLI configuration.
type OnyxCliConfig struct {
ServerURL string `json:"server_url"`
APIKey string `json:"api_key"`
DefaultAgentID int `json:"default_persona_id"`
Features Features `json:"features,omitempty"`
ServerURL string `json:"server_url"`
APIKey string `json:"api_key"`
DefaultAgentID int `json:"default_persona_id"`
}
// DefaultConfig returns a config with default values.
func DefaultConfig() OnyxCliConfig {
return OnyxCliConfig{
ServerURL: "https://cloud.onyx.app",
APIKey: "",
ServerURL: "https://cloud.onyx.app",
APIKey: "",
DefaultAgentID: 0,
}
}
// StreamMarkdownEnabled returns whether stream markdown is enabled,
// defaulting to true when the user hasn't set an explicit preference.
func (f Features) StreamMarkdownEnabled() bool {
if f.StreamMarkdown != nil {
return *f.StreamMarkdown
}
return true
}
// IsConfigured returns true if the config has an API key.
func (c OnyxCliConfig) IsConfigured() bool {
return c.APIKey != ""
@@ -78,10 +59,8 @@ func ConfigExists() bool {
return err == nil
}
// LoadFromDisk reads config from the file only, without applying environment
// variable overrides. Use this when you need the persisted config values
// (e.g., to preserve them during a save operation).
func LoadFromDisk() OnyxCliConfig {
// Load reads config from file and applies environment variable overrides.
func Load() OnyxCliConfig {
cfg := DefaultConfig()
data, err := os.ReadFile(ConfigFilePath())
@@ -91,13 +70,6 @@ func LoadFromDisk() OnyxCliConfig {
}
}
return cfg
}
// Load reads config from file and applies environment variable overrides.
func Load() OnyxCliConfig {
cfg := LoadFromDisk()
// Environment overrides
if v := os.Getenv(EnvServerURL); v != "" {
cfg.ServerURL = v
@@ -110,13 +82,6 @@ func Load() OnyxCliConfig {
cfg.DefaultAgentID = id
}
}
if v := os.Getenv(EnvStreamMarkdown); v != "" {
if b, err := strconv.ParseBool(v); err == nil {
cfg.Features.StreamMarkdown = &b
} else {
fmt.Fprintf(os.Stderr, "warning: invalid value %q for %s (expected true/false), ignoring\n", v, EnvStreamMarkdown)
}
}
return cfg
}

View File

@@ -9,7 +9,7 @@ import (
func clearEnvVars(t *testing.T) {
t.Helper()
for _, key := range []string{EnvServerURL, EnvAPIKey, EnvAgentID, EnvStreamMarkdown} {
for _, key := range []string{EnvServerURL, EnvAPIKey, EnvAgentID} {
t.Setenv(key, "")
if err := os.Unsetenv(key); err != nil {
t.Fatal(err)
@@ -199,48 +199,6 @@ func TestSaveAndReload(t *testing.T) {
}
}
func TestDefaultFeaturesStreamMarkdownNil(t *testing.T) {
cfg := DefaultConfig()
if cfg.Features.StreamMarkdown != nil {
t.Error("expected StreamMarkdown to be nil by default")
}
if !cfg.Features.StreamMarkdownEnabled() {
t.Error("expected StreamMarkdownEnabled() to return true when nil")
}
}
func TestEnvOverrideStreamMarkdownFalse(t *testing.T) {
clearEnvVars(t)
dir := t.TempDir()
t.Setenv("XDG_CONFIG_HOME", dir)
t.Setenv(EnvStreamMarkdown, "false")
cfg := Load()
if cfg.Features.StreamMarkdown == nil || *cfg.Features.StreamMarkdown {
t.Error("expected StreamMarkdown=false from env override")
}
}
func TestLoadFeaturesFromFile(t *testing.T) {
clearEnvVars(t)
dir := t.TempDir()
t.Setenv("XDG_CONFIG_HOME", dir)
data, _ := json.Marshal(map[string]interface{}{
"server_url": "https://example.com",
"api_key": "key",
"features": map[string]interface{}{
"stream_markdown": true,
},
})
writeConfig(t, dir, data)
cfg := Load()
if cfg.Features.StreamMarkdown == nil || !*cfg.Features.StreamMarkdown {
t.Error("expected StreamMarkdown=true from config file")
}
}
func TestSaveCreatesParentDirs(t *testing.T) {
clearEnvVars(t)
dir := t.TempDir()

View File

@@ -1,46 +0,0 @@
package config
import "fmt"
// Experiment describes an experimental feature flag.
type Experiment struct {
Name string
Flag string // CLI flag name
EnvVar string // environment variable name
Config string // JSON path in config file
Enabled bool
Desc string
}
// Experiments returns the list of available experimental features
// with their current status based on the given feature flags.
func Experiments(f Features) []Experiment {
return []Experiment{
{
Name: "Stream Markdown",
Flag: "--no-stream-markdown",
EnvVar: EnvStreamMarkdown,
Config: "features.stream_markdown",
Enabled: f.StreamMarkdownEnabled(),
Desc: "Render markdown progressively as the response streams in (enabled by default)",
},
}
}
// ExperimentsText formats the experiments list for display.
func ExperimentsText(f Features) string {
exps := Experiments(f)
text := "Experimental Features\n\n"
for _, e := range exps {
status := "off"
if e.Enabled {
status = "on"
}
text += fmt.Sprintf(" %-20s [%s]\n", e.Name, status)
text += fmt.Sprintf(" %s\n", e.Desc)
text += fmt.Sprintf(" flag: %s env: %s config: %s\n\n", e.Flag, e.EnvVar, e.Config)
}
text += "Toggle via CLI flag, environment variable, or config file.\n"
text += "Example: onyx-cli chat --no-stream-markdown"
return text
}

View File

@@ -1,187 +0,0 @@
---
name: onyx-cli
description: Query the Onyx knowledge base using the onyx-cli command. Use when the user wants to search company documents, ask questions about internal knowledge, query connected data sources, or look up information stored in Onyx.
---
# Onyx CLI — Agent Tool
Onyx is an enterprise search and Gen-AI platform that connects to company documents, apps, and people. The `onyx-cli` command provides non-interactive commands to query the Onyx knowledge base and list available agents.
## Prerequisites
### 1. Check if installed
```bash
which onyx-cli
```
### 2. Install (if needed)
**Primary — pip:**
```bash
pip install onyx-cli
```
**From source (Go):**
```bash
go build -o onyx-cli github.com/onyx-dot-app/onyx/cli && sudo mv onyx-cli /usr/local/bin/
```
### 3. Check if configured
```bash
onyx-cli validate-config
```
This checks that the config file exists and the API key is present, then tests the server connection via `/api/me`. Exit code 0 on success; non-zero with a descriptive error on failure.
If unconfigured, you have two options:
**Option A — Interactive setup (requires user input):**
```bash
onyx-cli configure
```
This prompts for the Onyx server URL and API key, tests the connection, and saves config.
**Option B — Environment variables (non-interactive, preferred for agents):**
```bash
export ONYX_SERVER_URL="https://your-onyx-server.com" # default: https://cloud.onyx.app
export ONYX_API_KEY="your-api-key"
```
Environment variables override the config file. If these are set, no config file is needed.
| Variable | Required | Description |
| ----------------- | -------- | -------------------------------------------------------- |
| `ONYX_SERVER_URL` | No | Onyx server base URL (default: `https://cloud.onyx.app`) |
| `ONYX_API_KEY` | Yes | API key for authentication |
| `ONYX_PERSONA_ID` | No | Default agent/persona ID |
If neither the config file nor environment variables are set, tell the user that `onyx-cli` needs to be configured and ask them to either:
- Run `onyx-cli configure` interactively, or
- Set `ONYX_SERVER_URL` and `ONYX_API_KEY` environment variables
## Commands
### Validate configuration
```bash
onyx-cli validate-config
```
Checks that the config file exists and the API key is present, then tests the server connection. Use this before `ask` or `agents` to confirm the CLI is properly set up.
### List available agents
```bash
onyx-cli agents
```
Prints a table of agent IDs, names, and descriptions. Use `--json` for structured output:
```bash
onyx-cli agents --json
```
Use agent IDs with `ask --agent-id` to query a specific agent.
### Basic query (plain text output)
```bash
onyx-cli ask "What is our company's PTO policy?"
```
Streams the answer as plain text to stdout. Exit code 0 on success, non-zero on error.
### JSON output (structured events)
```bash
onyx-cli ask --json "What authentication methods do we support?"
```
Outputs newline-delimited JSON: one parsed stream event object per line. Key event types include message deltas, stop, errors, search start, and citations.
Each line is a JSON object with this envelope:
```json
{"type": "<event_type>", "event": { ... }}
```
| Event Type | Description |
| ------------------- | -------------------------------------------------------------------- |
| `message_delta` | Content token — concatenate all `content` fields for the full answer |
| `stop` | Stream complete |
| `error` | Error with `error` message field |
| `search_tool_start` | Onyx started searching documents |
| `citation_info` | Source citation — see shape below |
`citation_info` event shape:
```json
{
"type": "citation_info",
"event": {
"citation_number": 1,
"document_id": "abc123def456",
"placement": { "turn_index": 0, "tab_index": 0, "sub_turn_index": null }
}
}
```
`placement` is metadata about where in the conversation the citation appeared and can be ignored for most use cases.
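As an illustration of consuming this envelope (this script is not part of the CLI; the sample events below are hypothetical, not real server output), an agent can read stdout line by line, concatenate `message_delta` content, and collect citations:

```python
import json

# Hypothetical sample of `onyx-cli ask --json` output (NDJSON, one event per line).
sample = "\n".join([
    '{"type": "message_delta", "event": {"content": "PTO is "}}',
    '{"type": "message_delta", "event": {"content": "20 days."}}',
    '{"type": "citation_info", "event": {"citation_number": 1, "document_id": "abc123def456"}}',
    '{"type": "stop", "event": {}}',
])

answer_parts, citations = [], []
for line in sample.splitlines():
    evt = json.loads(line)
    if evt["type"] == "message_delta":
        # Concatenate content tokens to build the full answer.
        answer_parts.append(evt["event"]["content"])
    elif evt["type"] == "citation_info":
        citations.append(evt["event"]["document_id"])
    elif evt["type"] == "stop":
        break  # stream complete

answer = "".join(answer_parts)
print(answer)     # full answer text
print(citations)  # cited document IDs
```

In practice you would replace `sample.splitlines()` with iterating over the subprocess's stdout.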
### Specify an agent
```bash
onyx-cli ask --agent-id 5 "Summarize our Q4 roadmap"
```
Uses a specific Onyx agent/persona instead of the default.
### All flags
| Flag | Type | Description |
| ------------ | ---- | ---------------------------------------------- |
| `--agent-id` | int | Agent ID to use (overrides default) |
| `--json` | bool | Output raw NDJSON events instead of plain text |
## Statelessness
Each `onyx-cli ask` call creates an independent chat session. There is no built-in way to chain context across multiple `ask` invocations — every call starts fresh. If you need multi-turn conversation with memory, use the interactive TUI (`onyx-cli` or `onyx-cli chat`) instead.
## When to Use
Use `onyx-cli ask` when:
- The user asks about company-specific information (policies, docs, processes)
- You need to search internal knowledge bases or connected data sources
- The user references Onyx, asks you to "search Onyx", or wants to query their documents
- You need context from company wikis, Confluence, Google Drive, Slack, or other connected sources
Do NOT use when:
- The question is about general programming knowledge (use your own knowledge)
- The user is asking about code in the current repository (use grep/read tools)
- The user hasn't mentioned Onyx and the question doesn't require internal company data
## Examples
```bash
# Simple question
onyx-cli ask "What are the steps to deploy to production?"
# Get structured output for parsing
onyx-cli ask --json "List all active API integrations"
# Use a specialized agent
onyx-cli ask --agent-id 3 "What were the action items from last week's standup?"
# Pipe the answer into another command
onyx-cli ask "What is the database schema for users?" | head -20
```

View File

@@ -1,7 +0,0 @@
// Package embedded holds files that are compiled into the onyx-cli binary.
package embedded
import _ "embed"
//go:embed SKILL.md
var SkillMD string

View File

@@ -1,33 +0,0 @@
// Package exitcodes defines semantic exit codes for the Onyx CLI.
package exitcodes
import "fmt"
const (
Success = 0
General = 1
BadRequest = 2 // invalid args / command-line errors (convention)
NotConfigured = 3
AuthFailure = 4
Unreachable = 5
)
// ExitError wraps an error with a specific exit code.
type ExitError struct {
Code int
Err error
}
func (e *ExitError) Error() string {
return e.Err.Error()
}
// New creates an ExitError with the given code and message.
func New(code int, msg string) *ExitError {
return &ExitError{Code: code, Err: fmt.Errorf("%s", msg)}
}
// Newf creates an ExitError with a formatted message.
func Newf(code int, format string, args ...any) *ExitError {
return &ExitError{Code: code, Err: fmt.Errorf(format, args...)}
}

View File

@@ -1,40 +0,0 @@
package exitcodes
import (
"errors"
"fmt"
"testing"
)
func TestExitError_Error(t *testing.T) {
e := New(NotConfigured, "not configured")
if e.Error() != "not configured" {
t.Fatalf("expected 'not configured', got %q", e.Error())
}
if e.Code != NotConfigured {
t.Fatalf("expected code %d, got %d", NotConfigured, e.Code)
}
}
func TestExitError_Newf(t *testing.T) {
e := Newf(Unreachable, "cannot reach %s", "server")
if e.Error() != "cannot reach server" {
t.Fatalf("expected 'cannot reach server', got %q", e.Error())
}
if e.Code != Unreachable {
t.Fatalf("expected code %d, got %d", Unreachable, e.Code)
}
}
func TestExitError_ErrorsAs(t *testing.T) {
e := New(BadRequest, "bad input")
wrapped := fmt.Errorf("wrapper: %w", e)
var exitErr *ExitError
if !errors.As(wrapped, &exitErr) {
t.Fatal("errors.As should find ExitError")
}
if exitErr.Code != BadRequest {
t.Fatalf("expected code %d, got %d", BadRequest, exitErr.Code)
}
}

View File

@@ -1,50 +0,0 @@
// Package fsutil provides filesystem helper functions.
package fsutil
import (
"bytes"
"errors"
"fmt"
"os"
)
// FileStatus describes how an on-disk file compares to expected content.
type FileStatus int
const (
StatusMissing FileStatus = iota
StatusUpToDate // file exists with identical content
StatusDiffers // file exists with different content
)
// CompareFile checks whether the file at path matches the expected content.
func CompareFile(path string, expected []byte) (FileStatus, error) {
existing, err := os.ReadFile(path)
if err != nil {
if errors.Is(err, os.ErrNotExist) {
return StatusMissing, nil
}
return 0, fmt.Errorf("could not read %s: %w", path, err)
}
if bytes.Equal(existing, expected) {
return StatusUpToDate, nil
}
return StatusDiffers, nil
}
// EnsureDirForCopy makes sure path is a real directory, not a symlink or
// regular file. If a symlink or file exists at path it is removed so the
// caller can create a directory with independent content.
func EnsureDirForCopy(path string) error {
info, err := os.Lstat(path)
if err == nil {
if info.Mode()&os.ModeSymlink != 0 || !info.IsDir() {
if err := os.Remove(path); err != nil {
return err
}
}
} else if !errors.Is(err, os.ErrNotExist) {
return err
}
return nil
}

View File

@@ -1,116 +0,0 @@
package fsutil
import (
"os"
"path/filepath"
"testing"
)
// TestCompareFile verifies that CompareFile correctly distinguishes between a
// missing file, a file with matching content, and a file with different content.
func TestCompareFile(t *testing.T) {
tmpDir := t.TempDir()
path := filepath.Join(tmpDir, "skill.md")
expected := []byte("expected content")
status, err := CompareFile(path, expected)
if err != nil {
t.Fatalf("CompareFile on missing file failed: %v", err)
}
if status != StatusMissing {
t.Fatalf("expected StatusMissing, got %v", status)
}
if err := os.WriteFile(path, expected, 0o644); err != nil {
t.Fatalf("write expected file failed: %v", err)
}
status, err = CompareFile(path, expected)
if err != nil {
t.Fatalf("CompareFile on matching file failed: %v", err)
}
if status != StatusUpToDate {
t.Fatalf("expected StatusUpToDate, got %v", status)
}
if err := os.WriteFile(path, []byte("different content"), 0o644); err != nil {
t.Fatalf("write different file failed: %v", err)
}
status, err = CompareFile(path, expected)
if err != nil {
t.Fatalf("CompareFile on different file failed: %v", err)
}
if status != StatusDiffers {
t.Fatalf("expected StatusDiffers, got %v", status)
}
}
// TestEnsureDirForCopy verifies that EnsureDirForCopy clears symlinks and
// regular files so --copy can write a real directory, while leaving existing
// directories and missing paths untouched.
func TestEnsureDirForCopy(t *testing.T) {
t.Run("removes symlink", func(t *testing.T) {
tmpDir := t.TempDir()
targetDir := filepath.Join(tmpDir, "target")
linkPath := filepath.Join(tmpDir, "link")
if err := os.MkdirAll(targetDir, 0o755); err != nil {
t.Fatalf("mkdir target failed: %v", err)
}
if err := os.Symlink(targetDir, linkPath); err != nil {
t.Fatalf("create symlink failed: %v", err)
}
if err := EnsureDirForCopy(linkPath); err != nil {
t.Fatalf("EnsureDirForCopy failed: %v", err)
}
if _, err := os.Lstat(linkPath); !os.IsNotExist(err) {
t.Fatalf("expected symlink path to be removed, got err=%v", err)
}
})
t.Run("removes regular file", func(t *testing.T) {
tmpDir := t.TempDir()
filePath := filepath.Join(tmpDir, "onyx-cli")
if err := os.WriteFile(filePath, []byte("x"), 0o644); err != nil {
t.Fatalf("write file failed: %v", err)
}
if err := EnsureDirForCopy(filePath); err != nil {
t.Fatalf("EnsureDirForCopy failed: %v", err)
}
if _, err := os.Lstat(filePath); !os.IsNotExist(err) {
t.Fatalf("expected file path to be removed, got err=%v", err)
}
})
t.Run("keeps existing directory", func(t *testing.T) {
tmpDir := t.TempDir()
dirPath := filepath.Join(tmpDir, "onyx-cli")
if err := os.MkdirAll(dirPath, 0o755); err != nil {
t.Fatalf("mkdir failed: %v", err)
}
if err := EnsureDirForCopy(dirPath); err != nil {
t.Fatalf("EnsureDirForCopy failed: %v", err)
}
info, err := os.Lstat(dirPath)
if err != nil {
t.Fatalf("lstat directory failed: %v", err)
}
if !info.IsDir() {
t.Fatalf("expected directory to remain, got mode %v", info.Mode())
}
})
t.Run("missing path is no-op", func(t *testing.T) {
tmpDir := t.TempDir()
missingPath := filepath.Join(tmpDir, "does-not-exist")
if err := EnsureDirForCopy(missingPath); err != nil {
t.Fatalf("EnsureDirForCopy failed: %v", err)
}
})
}

View File

@@ -1,121 +0,0 @@
// Package overflow provides a streaming writer that auto-truncates output
// for non-TTY callers (e.g., AI agents, scripts). Full content is saved to
// a temp file on disk; only the first N bytes are printed to stdout.
package overflow
import (
"fmt"
"os"
"strings"
log "github.com/sirupsen/logrus"
)
// Writer handles streaming output with optional truncation.
// When Limit > 0, it streams to a temp file on disk (not memory) and stops
// writing to stdout after Limit bytes. When Limit == 0, it writes directly
// to stdout. In Quiet mode, it buffers in memory and prints once at the end.
type Writer struct {
Limit int
Quiet bool
written int
totalBytes int
truncated bool
buf strings.Builder // used only in quiet mode
tmpFile *os.File // used only in truncation mode (Limit > 0)
}
// Write sends a chunk of content through the writer.
func (w *Writer) Write(s string) {
w.totalBytes += len(s)
// Quiet mode: buffer in memory, print nothing
if w.Quiet {
w.buf.WriteString(s)
return
}
if w.Limit <= 0 {
fmt.Print(s)
return
}
// Truncation mode: stream all content to temp file on disk
if w.tmpFile == nil {
f, err := os.CreateTemp("", "onyx-ask-*.txt")
if err != nil {
// Fall back to no-truncation if we can't create the file
fmt.Fprintf(os.Stderr, "warning: could not create temp file: %v\n", err)
w.Limit = 0
fmt.Print(s)
return
}
w.tmpFile = f
}
if _, err := w.tmpFile.WriteString(s); err != nil {
// Disk write failed — abandon truncation, stream directly to stdout
fmt.Fprintf(os.Stderr, "warning: temp file write failed: %v\n", err)
w.closeTmpFile(true)
w.Limit = 0
w.truncated = false
fmt.Print(s)
return
}
if w.truncated {
return
}
remaining := w.Limit - w.written
if len(s) <= remaining {
fmt.Print(s)
w.written += len(s)
} else {
if remaining > 0 {
fmt.Print(s[:remaining])
w.written += remaining
}
w.truncated = true
}
}
// Finish flushes remaining output. Call once after all Write calls are done.
func (w *Writer) Finish() {
// Quiet mode: print buffered content at once
if w.Quiet {
fmt.Println(w.buf.String())
return
}
if !w.truncated {
w.closeTmpFile(true) // clean up unused temp file
fmt.Println()
return
}
// Close the temp file so it's readable
tmpPath := w.tmpFile.Name()
w.closeTmpFile(false) // close but keep the file
fmt.Printf("\n\n--- response truncated (%d bytes total) ---\n", w.totalBytes)
fmt.Printf("Full response: %s\n", tmpPath)
fmt.Printf("Explore:\n")
fmt.Printf(" cat %s | grep \"<pattern>\"\n", tmpPath)
fmt.Printf(" cat %s | tail -50\n", tmpPath)
}
// closeTmpFile closes and optionally removes the temp file.
func (w *Writer) closeTmpFile(remove bool) {
if w.tmpFile == nil {
return
}
if err := w.tmpFile.Close(); err != nil {
log.Debugf("warning: failed to close temp file: %v", err)
}
if remove {
if err := os.Remove(w.tmpFile.Name()); err != nil {
log.Debugf("warning: failed to remove temp file: %v", err)
}
}
w.tmpFile = nil
}

View File

@@ -1,95 +0,0 @@
package overflow
import (
"os"
"testing"
)
func TestWriter_NoLimit(t *testing.T) {
w := &Writer{Limit: 0}
w.Write("hello world")
if w.truncated {
t.Fatal("should not be truncated with limit 0")
}
if w.totalBytes != 11 {
t.Fatalf("expected 11 total bytes, got %d", w.totalBytes)
}
}
func TestWriter_UnderLimit(t *testing.T) {
w := &Writer{Limit: 100}
w.Write("hello")
w.Write(" world")
if w.truncated {
t.Fatal("should not be truncated when under limit")
}
if w.written != 11 {
t.Fatalf("expected 11 written bytes, got %d", w.written)
}
}
func TestWriter_OverLimit(t *testing.T) {
w := &Writer{Limit: 5}
w.Write("hello world") // 11 bytes, limit 5
if !w.truncated {
t.Fatal("should be truncated")
}
if w.written != 5 {
t.Fatalf("expected 5 written bytes, got %d", w.written)
}
if w.totalBytes != 11 {
t.Fatalf("expected 11 total bytes, got %d", w.totalBytes)
}
if w.tmpFile == nil {
t.Fatal("temp file should have been created")
}
_ = w.tmpFile.Close()
data, _ := os.ReadFile(w.tmpFile.Name())
_ = os.Remove(w.tmpFile.Name())
if string(data) != "hello world" {
t.Fatalf("temp file should contain full content, got %q", string(data))
}
}
func TestWriter_MultipleChunks(t *testing.T) {
w := &Writer{Limit: 10}
w.Write("hello") // 5 bytes
w.Write(" ") // 6 bytes
w.Write("world") // 11 bytes, crosses limit
w.Write("!") // 12 bytes, already truncated
if !w.truncated {
t.Fatal("should be truncated")
}
if w.written != 10 {
t.Fatalf("expected 10 written bytes, got %d", w.written)
}
if w.totalBytes != 12 {
t.Fatalf("expected 12 total bytes, got %d", w.totalBytes)
}
if w.tmpFile == nil {
t.Fatal("temp file should have been created")
}
_ = w.tmpFile.Close()
data, _ := os.ReadFile(w.tmpFile.Name())
_ = os.Remove(w.tmpFile.Name())
if string(data) != "hello world!" {
t.Fatalf("temp file should contain full content, got %q", string(data))
}
}
func TestWriter_QuietMode(t *testing.T) {
w := &Writer{Limit: 0, Quiet: true}
w.Write("hello")
w.Write(" world")
if w.written != 0 {
t.Fatalf("quiet mode should not write to stdout, got %d written", w.written)
}
if w.totalBytes != 11 {
t.Fatalf("expected 11 total bytes, got %d", w.totalBytes)
}
if w.buf.String() != "hello world" {
t.Fatalf("buffer should contain full content, got %q", w.buf.String())
}
}

View File

@@ -55,7 +55,7 @@ func NewModel(cfg config.OnyxCliConfig) Model {
return Model{
config: cfg,
client: client,
viewport: newViewport(80, cfg.Features.StreamMarkdownEnabled()),
viewport: newViewport(80),
input: newInputModel(),
status: newStatusBar(),
agentID: cfg.DefaultAgentID,

View File

@@ -67,10 +67,6 @@ func handleSlashCommand(m Model, text string) (Model, tea.Cmd) {
}
return m, nil
case "/experiments":
m.viewport.addInfo(m.experimentsText())
return m, nil
case "/quit":
return m, tea.Quit

View File

@@ -1,8 +0,0 @@
package tui
import "github.com/onyx-dot-app/onyx/cli/internal/config"
// experimentsText returns the formatted experiments list for the current config.
func (m Model) experimentsText() string {
return config.ExperimentsText(m.config.Features)
}

View File

@@ -10,7 +10,6 @@ const helpText = `Onyx CLI Commands
/configure Re-run connection setup
/connectors Open connectors page in browser
/settings Open Onyx settings in browser
/experiments List experimental features and their status
/quit Exit Onyx CLI
Keyboard Shortcuts

View File

@@ -24,7 +24,6 @@ var slashCommands = []slashCommand{
{"/configure", "Re-run connection setup"},
{"/connectors", "Open connectors in browser"},
{"/settings", "Open settings in browser"},
{"/experiments", "List experimental features"},
{"/quit", "Exit Onyx CLI"},
}

View File

@@ -4,7 +4,6 @@ import (
"fmt"
"sort"
"strings"
"time"
"github.com/charmbracelet/glamour"
"github.com/charmbracelet/glamour/styles"
@@ -45,9 +44,6 @@ type pickerItem struct {
label string
}
// streamRenderInterval is the minimum time between markdown re-renders during streaming.
const streamRenderInterval = 100 * time.Millisecond
// viewport manages the chat display.
type viewport struct {
entries []chatEntry
@@ -61,12 +57,6 @@ type viewport struct {
pickerIndex int
pickerType pickerKind
scrollOffset int // lines scrolled up from bottom (0 = pinned to bottom)
// Progressive markdown rendering during streaming
streamMarkdown bool // feature flag: render markdown while streaming
streamRendered string // cached rendered output during streaming
lastRenderTime time.Time
lastRenderLen int // length of streamBuf at last render (skip if unchanged)
}
// newMarkdownRenderer creates a Glamour renderer with zero left margin.
@@ -81,11 +71,10 @@ func newMarkdownRenderer(width int) *glamour.TermRenderer {
return r
}
func newViewport(width int, streamMarkdown bool) *viewport {
func newViewport(width int) *viewport {
return &viewport{
width: width,
renderer: newMarkdownRenderer(width),
streamMarkdown: streamMarkdown,
width: width,
renderer: newMarkdownRenderer(width),
}
}
@@ -119,27 +108,12 @@ func (v *viewport) addUserMessage(msg string) {
func (v *viewport) startAgent() {
v.streaming = true
v.streamBuf = ""
v.streamRendered = ""
v.lastRenderLen = 0
v.lastRenderTime = time.Time{}
// Add a blank-line spacer entry before the agent message
v.entries = append(v.entries, chatEntry{kind: entryInfo, rendered: ""})
}
func (v *viewport) appendToken(token string) {
v.streamBuf += token
if !v.streamMarkdown {
return
}
now := time.Now()
bufLen := len(v.streamBuf)
if bufLen != v.lastRenderLen && now.Sub(v.lastRenderTime) >= streamRenderInterval {
v.streamRendered = v.renderAgentContent(v.streamBuf)
v.lastRenderTime = now
v.lastRenderLen = bufLen
}
}
func (v *viewport) finishAgent() {
@@ -161,8 +135,6 @@ func (v *viewport) finishAgent() {
})
v.streaming = false
v.streamBuf = ""
v.streamRendered = ""
v.lastRenderLen = 0
}
func (v *viewport) renderAgentContent(content string) string {
@@ -386,22 +358,6 @@ func (v *viewport) renderPicker(width, height int) string {
return lipgloss.Place(width, height, lipgloss.Center, lipgloss.Center, panel)
}
// streamingContent returns the display content for the in-progress stream.
func (v *viewport) streamingContent() string {
if v.streamMarkdown && v.streamRendered != "" {
return v.streamRendered
}
// Fall back to raw text with agent dot prefix
bufLines := strings.Split(v.streamBuf, "\n")
if len(bufLines) > 0 {
bufLines[0] = agentDot + " " + bufLines[0]
for i := 1; i < len(bufLines); i++ {
bufLines[i] = " " + bufLines[i]
}
}
return strings.Join(bufLines, "\n")
}
// totalLines computes the total number of rendered content lines.
func (v *viewport) totalLines() int {
var lines []string
@@ -412,7 +368,14 @@ func (v *viewport) totalLines() int {
lines = append(lines, e.rendered)
}
if v.streaming && v.streamBuf != "" {
lines = append(lines, v.streamingContent())
bufLines := strings.Split(v.streamBuf, "\n")
if len(bufLines) > 0 {
bufLines[0] = agentDot + " " + bufLines[0]
for i := 1; i < len(bufLines); i++ {
bufLines[i] = " " + bufLines[i]
}
}
lines = append(lines, strings.Join(bufLines, "\n"))
} else if v.streaming {
lines = append(lines, agentDot+" ")
}
@@ -436,9 +399,16 @@ func (v *viewport) view(height int) string {
lines = append(lines, e.rendered)
}
// Streaming buffer
// Streaming buffer (plain text, not markdown)
if v.streaming && v.streamBuf != "" {
lines = append(lines, v.streamingContent())
bufLines := strings.Split(v.streamBuf, "\n")
if len(bufLines) > 0 {
bufLines[0] = agentDot + " " + bufLines[0]
for i := 1; i < len(bufLines); i++ {
bufLines[i] = " " + bufLines[i]
}
}
lines = append(lines, strings.Join(bufLines, "\n"))
} else if v.streaming {
lines = append(lines, agentDot+" ")
}


@@ -4,7 +4,6 @@ import (
"regexp"
"strings"
"testing"
"time"
)
// stripANSI removes ANSI escape sequences for test comparisons.
@@ -15,7 +14,7 @@ func stripANSI(s string) string {
}
func TestAddUserMessage(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
v.addUserMessage("hello world")
if len(v.entries) != 1 {
@@ -38,7 +37,7 @@ func TestAddUserMessage(t *testing.T) {
}
func TestStartAndFinishAgent(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
v.startAgent()
if !v.streaming {
@@ -84,7 +83,7 @@ func TestStartAndFinishAgent(t *testing.T) {
}
func TestFinishAgentNoPadding(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
v.startAgent()
v.appendToken("Test message")
v.finishAgent()
@@ -99,7 +98,7 @@ func TestFinishAgentNoPadding(t *testing.T) {
}
func TestFinishAgentMultiline(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
v.startAgent()
v.appendToken("Line one\n\nLine three")
v.finishAgent()
@@ -116,7 +115,7 @@ func TestFinishAgentMultiline(t *testing.T) {
}
func TestFinishAgentEmpty(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
v.startAgent()
v.finishAgent()
@@ -129,7 +128,7 @@ func TestFinishAgentEmpty(t *testing.T) {
}
func TestAddInfo(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
v.addInfo("test info")
if len(v.entries) != 1 {
@@ -146,7 +145,7 @@ func TestAddInfo(t *testing.T) {
}
func TestAddError(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
v.addError("something broke")
if len(v.entries) != 1 {
@@ -163,7 +162,7 @@ func TestAddError(t *testing.T) {
}
func TestAddCitations(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
v.addCitations(map[int]string{1: "doc-a", 2: "doc-b"})
if len(v.entries) != 1 {
@@ -183,7 +182,7 @@ func TestAddCitations(t *testing.T) {
}
func TestAddCitationsEmpty(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
v.addCitations(map[int]string{})
if len(v.entries) != 0 {
@@ -192,7 +191,7 @@ func TestAddCitationsEmpty(t *testing.T) {
}
func TestCitationVisibility(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
v.addInfo("hello")
v.addCitations(map[int]string{1: "doc"})
@@ -212,7 +211,7 @@ func TestCitationVisibility(t *testing.T) {
}
func TestClearAll(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
v.addUserMessage("test")
v.startAgent()
v.appendToken("response")
@@ -231,7 +230,7 @@ func TestClearAll(t *testing.T) {
}
func TestClearDisplay(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
v.addUserMessage("test")
v.clearDisplay()
@@ -241,7 +240,7 @@ func TestClearDisplay(t *testing.T) {
}
func TestViewPadsShortContent(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
v.addInfo("hello")
view := v.view(10)
@@ -252,7 +251,7 @@ func TestViewPadsShortContent(t *testing.T) {
}
func TestViewTruncatesTallContent(t *testing.T) {
v := newViewport(80, false)
v := newViewport(80)
for i := 0; i < 20; i++ {
v.addInfo("line")
}
@@ -263,93 +262,3 @@ func TestViewTruncatesTallContent(t *testing.T) {
t.Errorf("expected 5 lines (truncated), got %d", len(lines))
}
}
func TestStreamMarkdownRendersOnThrottle(t *testing.T) {
v := newViewport(80, true)
v.startAgent()
// First token: no prior render, so it should render immediately
v.appendToken("**bold text**")
if v.streamRendered == "" {
t.Error("expected streamRendered to be populated after first token")
}
plain := stripANSI(v.streamRendered)
if !strings.Contains(plain, "bold text") {
t.Errorf("expected rendered to contain 'bold text', got %q", plain)
}
// Should not contain raw markdown asterisks
if strings.Contains(plain, "**") {
t.Errorf("expected markdown to be rendered (no **), got %q", plain)
}
// Second token within throttle window: should NOT re-render
v.lastRenderTime = time.Now() // simulate recent render
prevRendered := v.streamRendered
v.appendToken(" more")
if v.streamRendered != prevRendered {
t.Error("expected streamRendered to be unchanged within throttle window")
}
// After throttle interval: should re-render
v.lastRenderTime = time.Now().Add(-streamRenderInterval - time.Millisecond)
v.appendToken("!")
if v.streamRendered == prevRendered {
t.Error("expected streamRendered to update after throttle interval")
}
plain = stripANSI(v.streamRendered)
if !strings.Contains(plain, "bold text more!") {
t.Errorf("expected updated rendered content, got %q", plain)
}
}
func TestStreamMarkdownDisabledNoRender(t *testing.T) {
v := newViewport(80, false)
v.startAgent()
v.appendToken("**bold**")
if v.streamRendered != "" {
t.Error("expected no streamRendered when streamMarkdown is disabled")
}
// View should show raw markdown
view := v.view(10)
plain := stripANSI(view)
if !strings.Contains(plain, "**bold**") {
t.Errorf("expected raw markdown in view, got %q", plain)
}
}
func TestStreamMarkdownViewUsesRendered(t *testing.T) {
v := newViewport(80, true)
v.startAgent()
v.appendToken("**formatted**")
view := v.view(10)
plain := stripANSI(view)
// Should show rendered content, not raw **formatted**
if strings.Contains(plain, "**") {
t.Errorf("expected rendered markdown in view (no **), got %q", plain)
}
if !strings.Contains(plain, "formatted") {
t.Errorf("expected 'formatted' in view, got %q", plain)
}
}
func TestStreamMarkdownResetOnStart(t *testing.T) {
v := newViewport(80, true)
// First stream cycle
v.startAgent()
v.appendToken("first")
v.finishAgent()
// Start second stream - state should be clean
v.startAgent()
if v.streamRendered != "" {
t.Error("expected streamRendered cleared on startAgent")
}
if v.lastRenderLen != 0 {
t.Error("expected lastRenderLen reset on startAgent")
}
}


@@ -1,12 +1,10 @@
package main
import (
"errors"
"fmt"
"os"
"github.com/onyx-dot-app/onyx/cli/cmd"
"github.com/onyx-dot-app/onyx/cli/internal/exitcodes"
)
var (
@@ -20,10 +18,6 @@ func main() {
if err := cmd.Execute(); err != nil {
fmt.Fprintf(os.Stderr, "Error: %v\n", err)
var exitErr *exitcodes.ExitError
if errors.As(err, &exitErr) {
os.Exit(exitErr.Code)
}
os.Exit(1)
}
}


@@ -19,6 +19,6 @@ dependencies:
version: 5.4.0
- name: code-interpreter
repository: https://onyx-dot-app.github.io/python-sandbox/
version: 0.3.2
digest: sha256:74908ea45ace2b4be913ff762772e6d87e40bab64e92c6662aa51730eaeb9d87
generated: "2026-04-06T15:34:02.597166-07:00"
version: 0.3.1
digest: sha256:4965b6ea3674c37163832a2192cd3bc8004f2228729fca170af0b9f457e8f987
generated: "2026-03-02T15:29:39.632344-08:00"


@@ -5,7 +5,7 @@ home: https://www.onyx.app/
sources:
- "https://github.com/onyx-dot-app/onyx"
type: application
version: 0.4.40
version: 0.4.39
appVersion: latest
annotations:
category: Productivity
@@ -45,6 +45,6 @@ dependencies:
repository: https://charts.min.io/
condition: minio.enabled
- name: code-interpreter
version: 0.3.2
version: 0.3.1
repository: https://onyx-dot-app.github.io/python-sandbox/
condition: codeInterpreter.enabled


@@ -67,9 +67,6 @@ spec:
- "/bin/sh"
- "-c"
- |
{{- if .Values.api.runUpdateCaCertificates }}
update-ca-certificates &&
{{- end }}
alembic upgrade head &&
echo "Starting Onyx Api Server" &&
uvicorn onyx.main:app --host {{ .Values.global.host }} --port {{ .Values.api.containerPorts.server }}


@@ -504,18 +504,6 @@ api:
tolerations: []
affinity: {}
# Run update-ca-certificates before starting the server.
# Useful when mounting custom CA certificates via volumes/volumeMounts.
# NOTE: Requires the container to run as root (runAsUser: 0).
# CA certificate files must be mounted under /usr/local/share/ca-certificates/
# with a .crt extension (e.g. /usr/local/share/ca-certificates/my-ca.crt).
# NOTE: Python HTTP clients (requests, httpx) use certifi's bundle by default
# and will not pick up the system CA store automatically. Set the following
# environment variables via configMap values (loaded through envFrom) to make them use the updated system bundle:
# REQUESTS_CA_BUNDLE: /etc/ssl/certs/ca-certificates.crt
# SSL_CERT_FILE: /etc/ssl/certs/ca-certificates.crt
runUpdateCaCertificates: false
######################################################################
#


@@ -30,10 +30,7 @@ target "backend" {
context = "backend"
dockerfile = "Dockerfile"
cache-from = [
"type=registry,ref=${BACKEND_REPOSITORY}:latest",
"type=registry,ref=${BACKEND_REPOSITORY}:edge",
]
cache-from = ["type=registry,ref=${BACKEND_REPOSITORY}:latest"]
cache-to = ["type=inline"]
tags = ["${BACKEND_REPOSITORY}:${TAG}"]
@@ -43,10 +40,7 @@ target "web" {
context = "web"
dockerfile = "Dockerfile"
cache-from = [
"type=registry,ref=${WEB_SERVER_REPOSITORY}:latest",
"type=registry,ref=${WEB_SERVER_REPOSITORY}:edge",
]
cache-from = ["type=registry,ref=${WEB_SERVER_REPOSITORY}:latest"]
cache-to = ["type=inline"]
tags = ["${WEB_SERVER_REPOSITORY}:${TAG}"]
@@ -57,10 +51,7 @@ target "model-server" {
dockerfile = "Dockerfile.model_server"
cache-from = [
"type=registry,ref=${MODEL_SERVER_REPOSITORY}:latest",
"type=registry,ref=${MODEL_SERVER_REPOSITORY}:edge",
]
cache-from = ["type=registry,ref=${MODEL_SERVER_REPOSITORY}:latest"]
cache-to = ["type=inline"]
tags = ["${MODEL_SERVER_REPOSITORY}:${TAG}"]
@@ -82,10 +73,7 @@ target "cli" {
context = "cli"
dockerfile = "Dockerfile"
cache-from = [
"type=registry,ref=${CLI_REPOSITORY}:latest",
"type=registry,ref=${CLI_REPOSITORY}:edge",
]
cache-from = ["type=registry,ref=${CLI_REPOSITORY}:latest"]
cache-to = ["type=inline"]
tags = ["${CLI_REPOSITORY}:${TAG}"]


@@ -6,11 +6,11 @@ All Prometheus metrics live in the `backend/onyx/server/metrics/` package. Follo
### 1. Choose the right file (or create a new one)
| File | Purpose |
| ------------------------------------- | -------------------------------------------- |
| `metrics/slow_requests.py` | Slow request counter + callback |
| `metrics/postgres_connection_pool.py` | SQLAlchemy connection pool metrics |
| `metrics/prometheus_setup.py` | FastAPI instrumentator config (orchestrator) |
| File | Purpose |
|------|---------|
| `metrics/slow_requests.py` | Slow request counter + callback |
| `metrics/postgres_connection_pool.py` | SQLAlchemy connection pool metrics |
| `metrics/prometheus_setup.py` | FastAPI instrumentator config (orchestrator) |
If your metric is a standalone concern (e.g. cache hit rates, queue depths), create a new file under `metrics/` and keep one metric concept per file.
@@ -30,7 +30,6 @@ _my_counter = Counter(
```
**Naming conventions:**
- Prefix all metric names with `onyx_`
- Counters: `_total` suffix (e.g. `onyx_api_slow_requests_total`)
- Histograms: `_seconds` or `_bytes` suffix for durations/sizes
@@ -108,26 +107,26 @@ These metrics are exposed at `GET /metrics` on the API server.
### Built-in (via `prometheus-fastapi-instrumentator`)
| Metric | Type | Labels | Description |
| ------------------------------------- | --------- | ----------------------------- | ------------------------------------------------- |
| `http_requests_total` | Counter | `method`, `status`, `handler` | Total request count |
| `http_request_duration_highr_seconds` | Histogram | _(none)_ | High-resolution latency (many buckets, no labels) |
| `http_request_duration_seconds` | Histogram | `method`, `handler` | Latency by handler (custom buckets for P95/P99) |
| `http_request_size_bytes` | Summary | `handler` | Incoming request content length |
| `http_response_size_bytes` | Summary | `handler` | Outgoing response content length |
| `http_requests_inprogress` | Gauge | `method`, `handler` | Currently in-flight requests |
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `http_requests_total` | Counter | `method`, `status`, `handler` | Total request count |
| `http_request_duration_highr_seconds` | Histogram | _(none)_ | High-resolution latency (many buckets, no labels) |
| `http_request_duration_seconds` | Histogram | `method`, `handler` | Latency by handler (custom buckets for P95/P99) |
| `http_request_size_bytes` | Summary | `handler` | Incoming request content length |
| `http_response_size_bytes` | Summary | `handler` | Outgoing response content length |
| `http_requests_inprogress` | Gauge | `method`, `handler` | Currently in-flight requests |
### Custom (via `onyx.server.metrics`)
| Metric | Type | Labels | Description |
| ------------------------------ | ------- | ----------------------------- | ---------------------------------------------------------------- |
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `onyx_api_slow_requests_total` | Counter | `method`, `handler`, `status` | Requests exceeding `SLOW_REQUEST_THRESHOLD_SECONDS` (default 1s) |
### Configuration
| Env Var | Default | Description |
| -------------------------------- | ------- | -------------------------------------------- |
| `SLOW_REQUEST_THRESHOLD_SECONDS` | `1.0` | Duration threshold for slow request counting |
| Env Var | Default | Description |
|---------|---------|-------------|
| `SLOW_REQUEST_THRESHOLD_SECONDS` | `1.0` | Duration threshold for slow request counting |
### Instrumentator Settings
@@ -142,188 +141,44 @@ These metrics provide visibility into SQLAlchemy connection pool state across al
### Pool State (via custom Prometheus collector — snapshot on each scrape)
| Metric | Type | Labels | Description |
| -------------------------- | ----- | -------- | ----------------------------------------------- |
| `onyx_db_pool_checked_out` | Gauge | `engine` | Currently checked-out connections |
| `onyx_db_pool_checked_in` | Gauge | `engine` | Idle connections available in the pool |
| `onyx_db_pool_overflow` | Gauge | `engine` | Current overflow connections beyond `pool_size` |
| `onyx_db_pool_size` | Gauge | `engine` | Configured pool size (constant) |
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `onyx_db_pool_checked_out` | Gauge | `engine` | Currently checked-out connections |
| `onyx_db_pool_checked_in` | Gauge | `engine` | Idle connections available in the pool |
| `onyx_db_pool_overflow` | Gauge | `engine` | Current overflow connections beyond `pool_size` |
| `onyx_db_pool_size` | Gauge | `engine` | Configured pool size (constant) |
### Pool Lifecycle (via SQLAlchemy pool event listeners)
| Metric | Type | Labels | Description |
| ---------------------------------------- | ------- | -------- | ---------------------------------------- |
| `onyx_db_pool_checkout_total` | Counter | `engine` | Total connection checkouts from the pool |
| `onyx_db_pool_checkin_total` | Counter | `engine` | Total connection checkins to the pool |
| `onyx_db_pool_connections_created_total` | Counter | `engine` | Total new database connections created |
| `onyx_db_pool_invalidations_total` | Counter | `engine` | Total connection invalidations |
| `onyx_db_pool_checkout_timeout_total` | Counter | `engine` | Total connection checkout timeouts |
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `onyx_db_pool_checkout_total` | Counter | `engine` | Total connection checkouts from the pool |
| `onyx_db_pool_checkin_total` | Counter | `engine` | Total connection checkins to the pool |
| `onyx_db_pool_connections_created_total` | Counter | `engine` | Total new database connections created |
| `onyx_db_pool_invalidations_total` | Counter | `engine` | Total connection invalidations |
| `onyx_db_pool_checkout_timeout_total` | Counter | `engine` | Total connection checkout timeouts |
### Per-Endpoint Attribution (via pool events + endpoint context middleware)
| Metric | Type | Labels | Description |
| -------------------------------------- | --------- | ------------------- | ----------------------------------------------- |
| `onyx_db_connections_held_by_endpoint` | Gauge | `handler`, `engine` | DB connections currently held, by endpoint |
| `onyx_db_connection_hold_seconds` | Histogram | `handler`, `engine` | Duration a DB connection is held by an endpoint |
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `onyx_db_connections_held_by_endpoint` | Gauge | `handler`, `engine` | DB connections currently held, by endpoint |
| `onyx_db_connection_hold_seconds` | Histogram | `handler`, `engine` | Duration a DB connection is held by an endpoint |
Engine label values: `sync` (main read-write), `async` (async sessions), `readonly` (read-only user).
Connections from background tasks (Celery) or boot-time warmup appear as `handler="unknown"`.
## Celery Worker Metrics
Celery workers expose metrics via a standalone Prometheus HTTP server (separate from the API server's `/metrics` endpoint). Each worker type runs its own server on a dedicated port.
### Metrics Server (`onyx.server.metrics.metrics_server`)
| Env Var | Default | Description |
| ---------------------------- | ------------------- | ----------------------------------------------------- |
| `PROMETHEUS_METRICS_PORT` | _(per worker type)_ | Override the default port for this worker |
| `PROMETHEUS_METRICS_ENABLED` | `true` | Set to `false` to disable the metrics server entirely |
Default ports:
| Worker | Port |
| --------------- | ---- |
| `docfetching` | 9092 |
| `docprocessing` | 9093 |
| `monitoring` | 9096 |
Workers without a default port and no `PROMETHEUS_METRICS_PORT` env var will skip starting the server.
### Generic Task Lifecycle Metrics (`onyx.server.metrics.celery_task_metrics`)
Push-based metrics that fire on Celery signals for all tasks on the worker.
| Metric | Type | Labels | Description |
| ----------------------------------- | --------- | ------------------------------- | ----------------------------------------------------------------------------- |
| `onyx_celery_task_started_total` | Counter | `task_name`, `queue` | Total tasks started |
| `onyx_celery_task_completed_total` | Counter | `task_name`, `queue`, `outcome` | Total tasks completed (`outcome`: `success` or `failure`) |
| `onyx_celery_task_duration_seconds` | Histogram | `task_name`, `queue` | Task execution duration. Buckets: 1, 5, 15, 30, 60, 120, 300, 600, 1800, 3600 |
| `onyx_celery_tasks_active` | Gauge | `task_name`, `queue` | Currently executing tasks |
| `onyx_celery_task_retried_total` | Counter | `task_name`, `queue` | Total task retries |
| `onyx_celery_task_revoked_total` | Counter | `task_name` | Total tasks revoked (cancelled) |
| `onyx_celery_task_rejected_total` | Counter | `task_name` | Total tasks rejected by worker |
Stale start-time entries (tasks killed via SIGTERM/OOM where `task_postrun` never fires) are evicted after 1 hour.
### Per-Connector Indexing Metrics (`onyx.server.metrics.indexing_task_metrics`)
Enriches docfetching and docprocessing tasks with connector-level labels. Silently no-ops for all other tasks.
| Metric | Type | Labels | Description |
| ------------------------------------- | --------- | ----------------------------------------------------------- | ---------------------------------------- |
| `onyx_indexing_task_started_total` | Counter | `task_name`, `source`, `tenant_id`, `cc_pair_id` | Indexing tasks started per connector |
| `onyx_indexing_task_completed_total` | Counter | `task_name`, `source`, `tenant_id`, `cc_pair_id`, `outcome` | Indexing tasks completed per connector |
| `onyx_indexing_task_duration_seconds` | Histogram | `task_name`, `source`, `tenant_id` | Indexing task duration by connector type |
`connector_name` is intentionally excluded from these push-based counters to avoid unbounded cardinality (it's a free-form user string). The pull-based collectors on the monitoring worker include it since they have bounded cardinality (one series per connector).
### Pull-Based Collectors (`onyx.server.metrics.indexing_pipeline`)
Registered only in the **Monitoring** worker. Collectors query Redis/Postgres at scrape time with a 30-second TTL cache.
| Metric | Type | Labels | Description |
| ------------------------------------ | ----- | ------- | ----------------------------------- |
| `onyx_queue_depth` | Gauge | `queue` | Celery queue length |
| `onyx_queue_unacked` | Gauge | `queue` | Unacknowledged messages per queue |
| `onyx_queue_oldest_task_age_seconds` | Gauge | `queue` | Age of the oldest task in the queue |
Plus additional connector health, index attempt, and worker heartbeat metrics — see `indexing_pipeline.py` for the full list.
### Adding Metrics to a Worker
Currently only the docfetching and docprocessing workers have push-based task metrics wired up. To add metrics to another worker (e.g. heavy, light, primary):
**1. Import and call the generic handlers from the worker's signal handlers:**
```python
from onyx.server.metrics.celery_task_metrics import (
on_celery_task_prerun,
on_celery_task_postrun,
on_celery_task_retry,
on_celery_task_revoked,
on_celery_task_rejected,
)
@signals.task_prerun.connect
def on_task_prerun(sender, task_id, task, args, kwargs, **kwds):
app_base.on_task_prerun(sender, task_id, task, args, kwargs, **kwds)
on_celery_task_prerun(task_id, task)
```
Do the same for `task_postrun`, `task_retry`, `task_revoked`, and `task_rejected` — see `apps/docfetching.py` for the complete example.
**2. Start the metrics server on `worker_ready`:**
```python
from onyx.server.metrics.metrics_server import start_metrics_server
@worker_ready.connect
def on_worker_ready(sender, **kwargs):
start_metrics_server("your_worker_type")
app_base.on_worker_ready(sender, **kwargs)
```
Add a default port for your worker type in `metrics_server.py`'s `_DEFAULT_PORTS` dict, or set `PROMETHEUS_METRICS_PORT` in the environment.
**3. (Optional) Add domain-specific enrichment:**
If your tasks need richer labels beyond `task_name`/`queue`, create a new module in `server/metrics/` following `indexing_task_metrics.py`:
- Define Counters/Histograms with your domain labels
- Write `on_<domain>_task_prerun` / `on_<domain>_task_postrun` handlers that filter by task name and no-op for others
- Call them from the worker's signal handlers alongside the generic ones
**Cardinality warning:** Never use user-defined free-form strings as metric labels — they create unbounded cardinality. Use IDs or enum values. If you need free-form labels, use pull-based collectors (monitoring worker) where cardinality is naturally bounded.
### Current Worker Integration Status
| Worker | Generic Task Metrics | Domain Metrics | Metrics Server |
| -------------------- | -------------------- | -------------- | ------------------------------------ |
| Docfetching | ✓ | ✓ (indexing) | ✓ (port 9092) |
| Docprocessing | ✓ | ✓ (indexing) | ✓ (port 9093) |
| Monitoring | — | — | ✓ (port 9096, pull-based collectors) |
| Primary | — | — | — |
| Light | — | — | — |
| Heavy | — | — | — |
| User File Processing | — | — | — |
| KG Processing | — | — | — |
### Example PromQL Queries (Celery)
```promql
# Task completion rate by worker queue
sum by (queue) (rate(onyx_celery_task_completed_total[5m]))
# P95 task duration for pruning tasks
histogram_quantile(0.95,
sum by (le) (rate(onyx_celery_task_duration_seconds_bucket{task_name=~".*pruning.*"}[5m])))
# Task failure rate
sum by (task_name) (rate(onyx_celery_task_completed_total{outcome="failure"}[5m]))
/ sum by (task_name) (rate(onyx_celery_task_completed_total[5m]))
# Active tasks per queue
sum by (queue) (onyx_celery_tasks_active)
# Indexing throughput by source type
sum by (source) (rate(onyx_indexing_task_completed_total{outcome="success"}[5m]))
# Queue depth — are tasks backing up?
onyx_queue_depth > 100
```
## OpenSearch Search Metrics
These metrics track OpenSearch search latency and throughput. Collected via `onyx.server.metrics.opensearch_search`.
| Metric | Type | Labels | Description |
| ------------------------------------------------ | --------- | ------------- | --------------------------------------------------------------------------- |
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `onyx_opensearch_search_client_duration_seconds` | Histogram | `search_type` | Client-side end-to-end latency (network + serialization + server execution) |
| `onyx_opensearch_search_server_duration_seconds` | Histogram | `search_type` | Server-side execution time from OpenSearch `took` field |
| `onyx_opensearch_search_total` | Counter | `search_type` | Total search requests sent to OpenSearch |
| `onyx_opensearch_searches_in_progress` | Gauge | `search_type` | Currently in-flight OpenSearch searches |
| `onyx_opensearch_search_server_duration_seconds` | Histogram | `search_type` | Server-side execution time from OpenSearch `took` field |
| `onyx_opensearch_search_total` | Counter | `search_type` | Total search requests sent to OpenSearch |
| `onyx_opensearch_searches_in_progress` | Gauge | `search_type` | Currently in-flight OpenSearch searches |
Search type label values: See `OpenSearchSearchType`.


@@ -70,10 +70,6 @@ backend = [
"lazy_imports==1.0.1",
"lxml==5.3.0",
"Mako==1.2.4",
# NOTE: Do not update without understanding the patching behavior in
# get_markitdown_converter in
# backend/onyx/file_processing/extract_file_text.py and what impacts
# updating might have on this behavior.
"markitdown[pdf, docx, pptx, xlsx, xls]==0.1.2",
"mcp[cli]==1.26.0",
"msal==1.34.0",


@@ -127,7 +127,7 @@ function SidebarTab({
rightChildren={truncationSpacer}
/>
) : (
<div className="flex flex-row items-center gap-2 w-full">
<div className="flex flex-row items-center gap-2 flex-1">
{Icon && (
<div className="flex items-center justify-center p-0.5">
<Icon className="h-[1rem] w-[1rem] text-text-03" />
@@ -153,7 +153,7 @@ function SidebarTab({
side="right"
sideOffset={4}
>
{children}
<Text>{children}</Text>
</TooltipPrimitive.Content>
</TooltipPrimitive.Portal>
</TooltipPrimitive.Root>


@@ -1,22 +1,18 @@
import { Card } from "@opal/components/cards/card/components";
import { Content, SizePreset } from "@opal/layouts";
import { Content } from "@opal/layouts";
import { SvgEmpty } from "@opal/icons";
import type {
IconFunctionComponent,
PaddingVariants,
RichStr,
} from "@opal/types";
import type { IconFunctionComponent, PaddingVariants } from "@opal/types";
// ---------------------------------------------------------------------------
// Types
// ---------------------------------------------------------------------------
type EmptyMessageCardBaseProps = {
type EmptyMessageCardProps = {
/** Icon displayed alongside the title. */
icon?: IconFunctionComponent;
/** Primary message text. */
title: string | RichStr;
title: string;
/** Padding preset for the card. @default "md" */
padding?: PaddingVariants;
@@ -25,30 +21,16 @@ type EmptyMessageCardBaseProps = {
ref?: React.Ref<HTMLDivElement>;
};
type EmptyMessageCardProps =
| (EmptyMessageCardBaseProps & {
/** @default "secondary" */
sizePreset?: "secondary";
})
| (EmptyMessageCardBaseProps & {
sizePreset: "main-ui";
/** Description text. Only supported when `sizePreset` is `"main-ui"`. */
description?: string | RichStr;
});
// ---------------------------------------------------------------------------
// EmptyMessageCard
// ---------------------------------------------------------------------------
function EmptyMessageCard(props: EmptyMessageCardProps) {
const {
sizePreset = "secondary",
icon = SvgEmpty,
title,
padding = "md",
ref,
} = props;
function EmptyMessageCard({
icon = SvgEmpty,
title,
padding = "md",
ref,
}: EmptyMessageCardProps) {
return (
<Card
ref={ref}
@@ -57,23 +39,13 @@ function EmptyMessageCard(props: EmptyMessageCardProps) {
padding={padding}
rounding="md"
>
{sizePreset === "secondary" ? (
<Content
icon={icon}
title={title}
sizePreset="secondary"
variant="body"
prominence="muted"
/>
) : (
<Content
icon={icon}
title={title}
description={"description" in props ? props.description : undefined}
sizePreset={sizePreset}
variant="section"
/>
)}
<Content
icon={icon}
title={title}
sizePreset="secondary"
variant="body"
prominence="muted"
/>
</Card>
);
}


@@ -1,16 +1,41 @@
"use client";
import "@opal/core/animations/styles.css";
import React from "react";
import React, { createContext, useContext, useState, useCallback } from "react";
import { cn } from "@opal/utils";
import type { WithoutStyles, ExtremaSizeVariants } from "@opal/types";
import { widthVariants } from "@opal/shared";
// ---------------------------------------------------------------------------
// Types
// Context-per-group registry
// ---------------------------------------------------------------------------
type HoverableInteraction = "rest" | "hover";
/**
* Lazily-created map of group names to React contexts.
*
* Each group gets its own `React.Context<boolean | null>` so that a
* `Hoverable.Item` only re-renders when its *own* group's hover state
* changes — not when any unrelated group changes.
*
* The default value is `null` (no provider found), which lets
* `Hoverable.Item` distinguish "no Root ancestor" from "Root says
* not hovered" and throw when `group` was explicitly specified.
*/
const contextMap = new Map<string, React.Context<boolean | null>>();
function getOrCreateContext(group: string): React.Context<boolean | null> {
let ctx = contextMap.get(group);
if (!ctx) {
ctx = createContext<boolean | null>(null);
ctx.displayName = `HoverableContext(${group})`;
contextMap.set(group, ctx);
}
return ctx;
}
// ---------------------------------------------------------------------------
// Types
// ---------------------------------------------------------------------------
interface HoverableRootProps
extends WithoutStyles<React.HTMLAttributes<HTMLDivElement>> {
@@ -18,17 +43,6 @@ interface HoverableRootProps
group: string;
/** Width preset. @default "auto" */
widthVariant?: ExtremaSizeVariants;
/**
* JS-controllable interaction state override.
*
* - `"rest"` (default): items are shown/hidden by CSS `:hover`.
* - `"hover"`: forces items visible regardless of hover state. Useful when
* a hoverable action opens a modal — set `interaction="hover"` while the
* modal is open so the user can see which element they're interacting with.
*
* @default "rest"
*/
interaction?: HoverableInteraction;
/** Ref forwarded to the root `<div>`. */
ref?: React.Ref<HTMLDivElement>;
}
@@ -51,10 +65,12 @@ interface HoverableItemProps
/**
* Hover-tracking container for a named group.
*
* Uses a `data-hover-group` attribute and CSS `:hover` to control
 * descendant `Hoverable.Item` visibility. No React state or context is
 * needed; the browser natively removes `:hover` when modals/portals steal
* pointer events, preventing stale hover state.
* Wraps children in a `<div>` that tracks mouse-enter / mouse-leave and
* provides the hover state via a per-group React context.
*
* Nesting works because each `Hoverable.Root` creates a **new** context
* provider that shadows the parent — so an inner `Hoverable.Item group="b"`
* reads from the inner provider, not the outer `group="a"` provider.
*
* @example
* ```tsx
@@ -71,20 +87,70 @@ function HoverableRoot({
group,
children,
widthVariant = "full",
interaction = "rest",
ref,
onMouseEnter: consumerMouseEnter,
onMouseLeave: consumerMouseLeave,
onFocusCapture: consumerFocusCapture,
onBlurCapture: consumerBlurCapture,
...props
}: HoverableRootProps) {
const [hovered, setHovered] = useState(false);
const [focused, setFocused] = useState(false);
const onMouseEnter = useCallback(
(e: React.MouseEvent<HTMLDivElement>) => {
setHovered(true);
consumerMouseEnter?.(e);
},
[consumerMouseEnter]
);
const onMouseLeave = useCallback(
(e: React.MouseEvent<HTMLDivElement>) => {
setHovered(false);
consumerMouseLeave?.(e);
},
[consumerMouseLeave]
);
const onFocusCapture = useCallback(
(e: React.FocusEvent<HTMLDivElement>) => {
setFocused(true);
consumerFocusCapture?.(e);
},
[consumerFocusCapture]
);
const onBlurCapture = useCallback(
(e: React.FocusEvent<HTMLDivElement>) => {
if (
!(e.relatedTarget instanceof Node) ||
!e.currentTarget.contains(e.relatedTarget)
) {
setFocused(false);
}
consumerBlurCapture?.(e);
},
[consumerBlurCapture]
);
const active = hovered || focused;
const GroupContext = getOrCreateContext(group);
return (
<div
{...props}
ref={ref}
className={cn(widthVariants[widthVariant])}
data-hover-group={group}
data-interaction={interaction !== "rest" ? interaction : undefined}
>
{children}
</div>
<GroupContext.Provider value={active}>
<div
{...props}
ref={ref}
className={cn(widthVariants[widthVariant])}
onMouseEnter={onMouseEnter}
onMouseLeave={onMouseLeave}
onFocusCapture={onFocusCapture}
onBlurCapture={onBlurCapture}
>
{children}
</div>
</GroupContext.Provider>
);
}
@@ -96,10 +162,13 @@ function HoverableRoot({
* An element whose visibility is controlled by hover state.
*
* **Local mode** (`group` omitted): the item handles hover on its own
* element via CSS `:hover`.
* element via CSS `:hover`. This is the core abstraction.
*
* **Group mode** (`group` provided): visibility is driven by CSS `:hover`
* on the nearest `Hoverable.Root` ancestor via `[data-hover-group]:hover`.
* **Group mode** (`group` provided): visibility is driven by a matching
* `Hoverable.Root` ancestor's hover state via React context. If no
* matching Root is found, an error is thrown.
*
* Uses data-attributes for variant styling (see `styles.css`).
*
* @example
* ```tsx
@@ -115,6 +184,8 @@ function HoverableRoot({
* </Hoverable.Item>
* </Hoverable.Root>
* ```
*
* @throws If `group` is specified but no matching `Hoverable.Root` ancestor exists.
*/
function HoverableItem({
group,
@@ -123,6 +194,17 @@ function HoverableItem({
ref,
...props
}: HoverableItemProps) {
const contextValue = useContext(
group ? getOrCreateContext(group) : NOOP_CONTEXT
);
if (group && contextValue === null) {
throw new Error(
`Hoverable.Item group="${group}" has no matching Hoverable.Root ancestor. ` +
`Either wrap it in <Hoverable.Root group="${group}"> or remove the group prop for local hover.`
);
}
const isLocal = group === undefined;
return (
@@ -131,6 +213,9 @@ function HoverableItem({
ref={ref}
className={cn("hoverable-item")}
data-hoverable-variant={variant}
data-hoverable-active={
isLocal ? undefined : contextValue ? "true" : undefined
}
data-hoverable-local={isLocal ? "true" : undefined}
>
{children}
@@ -138,6 +223,9 @@ function HoverableItem({
);
}
/** Stable context used when no group is specified (local mode). */
const NOOP_CONTEXT = createContext<boolean | null>(null);
// ---------------------------------------------------------------------------
// Compound export
// ---------------------------------------------------------------------------
@@ -145,16 +233,18 @@ function HoverableItem({
/**
* Hoverable compound component for hover-to-reveal patterns.
*
* Entirely CSS-driven — no React state or context. The browser's native
* `:hover` pseudo-class handles all state, which means hover is
* automatically cleared when modals/portals steal pointer events.
* Provides two sub-components:
*
* - `Hoverable.Root` — Container with `data-hover-group`. CSS `:hover`
* on this element reveals descendant `Hoverable.Item` elements.
* - `Hoverable.Root` — A container that tracks hover state for a named group
* and provides it via React context.
*
* - `Hoverable.Item` — Hidden by default. In group mode, revealed when
* the ancestor Root is hovered. In local mode (no `group`), revealed
* when the item itself is hovered.
* - `Hoverable.Item` — The core abstraction. On its own (no `group`), it
* applies local CSS `:hover` for the variant effect. When `group` is
* specified, it reads hover state from the nearest matching
* `Hoverable.Root` — and throws if no matching Root is found.
*
* Supports nesting: a child `Hoverable.Root` shadows the parent's context,
* so each group's items only respond to their own root's hover.
*
* @example
* ```tsx
@@ -186,5 +276,4 @@ export {
type HoverableRootProps,
type HoverableItemProps,
type HoverableItemVariant,
type HoverableInteraction,
};
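The top of this diff shows the tail of a `getOrCreateContext` helper: contexts are created lazily and memoized in a module-level map, so every `Hoverable.Root`/`Hoverable.Item` pair naming the same group shares a single context object. Reduced to plain TypeScript (the names below are illustrative; a plain object stands in for React's `createContext()` result):

```typescript
// Per-key memoization, as used for per-group contexts in this diff.
type Ctx = { key: string };

const contextMap = new Map<string, Ctx>();

function getContextFor(key: string): Ctx {
  let ctx = contextMap.get(key);
  if (!ctx) {
    // Created once per key; later calls return the cached instance,
    // which is what lets nested Providers shadow each other correctly.
    ctx = { key };
    contextMap.set(key, ctx);
  }
  return ctx;
}
```

Identity matters here: `useContext` only picks up a Provider's value when both sides reference the exact same context object, which is why a shared cache, rather than repeated `createContext` calls, is required.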


@@ -7,20 +7,8 @@
opacity: 0;
}
/* Group mode — Root :hover controls descendant item visibility via CSS.
Exclude local-mode items so they aren't revealed by an ancestor root. */
[data-hover-group]:hover
.hoverable-item[data-hoverable-variant="opacity-on-hover"]:not(
[data-hoverable-local]
) {
opacity: 1;
}
/* Interaction override — force items visible via JS */
[data-hover-group][data-interaction="hover"]
.hoverable-item[data-hoverable-variant="opacity-on-hover"]:not(
[data-hoverable-local]
) {
/* Group mode — Root controls visibility via React context */
.hoverable-item[data-hoverable-variant="opacity-on-hover"][data-hoverable-active="true"] {
opacity: 1;
}
@@ -29,16 +17,7 @@
opacity: 1;
}
/* Group focus — any focusable descendant of the Root receives keyboard focus,
revealing all group items (same behavior as hover). */
[data-hover-group]:focus-within
.hoverable-item[data-hoverable-variant="opacity-on-hover"]:not(
[data-hoverable-local]
) {
opacity: 1;
}
/* Local focus — item (or a focusable descendant) receives keyboard focus */
/* Focus — item (or a focusable descendant) receives keyboard focus */
.hoverable-item[data-hoverable-variant="opacity-on-hover"]:has(:focus-visible) {
opacity: 1;
}
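The replacement selector `.hoverable-item[data-hoverable-active="true"]` only fires when the item itself computes that attribute from context. Per the component diff, that computation reduces to a small pure mapping (the helper name here is hypothetical, not from the source):

```typescript
// How Hoverable.Item derives its data attributes in group vs local mode
// (sketch; mirrors the logic in the component diff).
function itemAttrs(
  group: string | undefined,
  groupHovered: boolean
): {
  "data-hoverable-active": "true" | undefined;
  "data-hoverable-local": "true" | undefined;
} {
  const isLocal = group === undefined;
  return {
    // Group mode: attribute is present only while the Root reports hover/focus.
    "data-hoverable-active": !isLocal && groupHovered ? "true" : undefined,
    // Local mode: flag lets the stylesheet scope plain :hover rules.
    "data-hoverable-local": isLocal ? "true" : undefined,
  };
}
```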


@@ -8,7 +8,7 @@ const SvgBifrost = ({ size, className, ...props }: IconProps) => (
viewBox="0 0 37 46"
fill="none"
xmlns="http://www.w3.org/2000/svg"
className={cn(className, "!text-[#33C19E]")}
className={cn(className, "text-[#33C19E] dark:text-white")}
{...props}
>
<title>Bifrost</title>


@@ -1,116 +0,0 @@
# Card
**Import:** `import { Card } from "@opal/layouts";`
A namespace of card layout primitives. Each sub-component handles a specific region of a card.
## Card.Header
A card header layout that pairs a [`Content`](../content/README.md) block with a right-side column and an optional full-width children slot.
### Why Card.Header?
[`ContentAction`](../content-action/README.md) provides a single `rightChildren` slot. Card headers typically need two distinct right-side regions — a primary action on top and secondary actions on the bottom. `Card.Header` provides this with `rightChildren` and `bottomRightChildren` slots, plus a `children` slot for full-width content below the header row (e.g., search bars, expandable tool lists).
### Props
Inherits **all** props from [`Content`](../content/README.md) (icon, title, description, sizePreset, variant, editable, onTitleChange, suffix, etc.) plus:
| Prop | Type | Default | Description |
|---|---|---|---|
| `rightChildren` | `ReactNode` | `undefined` | Content rendered to the right of the Content block (top of right column). |
| `bottomRightChildren` | `ReactNode` | `undefined` | Content rendered below `rightChildren` in the same column. Laid out as `flex flex-row`. |
| `children` | `ReactNode` | `undefined` | Content rendered below the full header row, spanning the entire width. |
### Layout Structure
```
+---------------------------------------------------------+
| [Content (p-2, self-start)] [rightChildren] |
| icon + title + description [bottomRightChildren] |
+---------------------------------------------------------+
| [children — full width] |
+---------------------------------------------------------+
```
- Outer wrapper: `flex flex-col w-full`
- Header row: `flex flex-row items-stretch w-full`
- Content area: `flex-1 min-w-0 self-start p-2` — top-aligned with fixed padding
- Right column: `flex flex-col items-end shrink-0` — no padding, no gap
- `bottomRightChildren` wrapper: `flex flex-row` — lays children out horizontally
- `children` wrapper: `w-full` — only rendered when children are provided
### Usage
#### Card with primary and secondary actions
```tsx
import { Card } from "@opal/layouts";
import { Button } from "@opal/components";
import { SvgGlobe, SvgSettings, SvgUnplug, SvgCheckSquare } from "@opal/icons";
<Card.Header
icon={SvgGlobe}
title="Google Search"
description="Web search provider"
sizePreset="main-ui"
variant="section"
rightChildren={
<Button icon={SvgCheckSquare} variant="action" prominence="tertiary">
Current Default
</Button>
}
bottomRightChildren={
<>
<Button icon={SvgUnplug} size="sm" prominence="tertiary" tooltip="Disconnect" />
<Button icon={SvgSettings} size="sm" prominence="tertiary" tooltip="Edit" />
</>
}
/>
```
#### Card with only a connect action
```tsx
<Card.Header
icon={SvgCloud}
title="OpenAI"
description="Not configured"
sizePreset="main-ui"
variant="section"
rightChildren={
<Button rightIcon={SvgArrowExchange} prominence="tertiary">
Connect
</Button>
}
/>
```
#### Card with expandable children
```tsx
<Card.Header
icon={SvgServer}
title="MCP Server"
description="12 tools available"
sizePreset="main-ui"
variant="section"
rightChildren={<Button icon={SvgSettings} prominence="tertiary" />}
>
<SearchBar placeholder="Search tools..." />
</Card.Header>
```
#### No right children
```tsx
<Card.Header
icon={SvgInfo}
title="Section Header"
description="Description text"
sizePreset="main-content"
variant="section"
/>
```
When both `rightChildren` and `bottomRightChildren` are omitted and no `children` are provided, the component renders only the padded `Content`.
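The closing note describes when each region renders; that slot-presence rule can be sketched in plain TypeScript (the helper and type names are hypothetical, not from the component source):

```typescript
// Which Card.Header regions render, per the README's layout rules.
interface HeaderSlots {
  rightChildren?: unknown;
  bottomRightChildren?: unknown;
  children?: unknown;
}

function regionsToRender(slots: HeaderSlots) {
  return {
    // The right column appears when either right-side slot is filled.
    rightColumn:
      slots.rightChildren != null || slots.bottomRightChildren != null,
    // The full-width row below the header only renders with children present.
    fullWidthRow: slots.children != null,
  };
}
```

With every slot empty, only the padded `Content` block remains, matching the note above.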


@@ -1,5 +1,5 @@
import type { Meta, StoryObj } from "@storybook/react";
import { Card } from "@opal/layouts";
import { CardHeaderLayout } from "@opal/layouts";
import { Button } from "@opal/components";
import {
SvgArrowExchange,
@@ -18,14 +18,14 @@ const withTooltipProvider: Decorator = (Story) => (
);
const meta = {
title: "Layouts/Card.Header",
component: Card.Header,
title: "Layouts/CardHeaderLayout",
component: CardHeaderLayout,
tags: ["autodocs"],
decorators: [withTooltipProvider],
parameters: {
layout: "centered",
},
} satisfies Meta<typeof Card.Header>;
} satisfies Meta<typeof CardHeaderLayout>;
export default meta;
@@ -38,7 +38,7 @@ type Story = StoryObj<typeof meta>;
export const Default: Story = {
render: () => (
<div className="w-[28rem] border rounded-16">
<Card.Header
<CardHeaderLayout
sizePreset="main-ui"
variant="section"
icon={SvgGlobe}
@@ -57,7 +57,7 @@ export const Default: Story = {
export const WithBothSlots: Story = {
render: () => (
<div className="w-[28rem] border rounded-16">
<Card.Header
<CardHeaderLayout
sizePreset="main-ui"
variant="section"
icon={SvgGlobe}
@@ -92,7 +92,7 @@ export const WithBothSlots: Story = {
export const RightChildrenOnly: Story = {
render: () => (
<div className="w-[28rem] border rounded-16">
<Card.Header
<CardHeaderLayout
sizePreset="main-ui"
variant="section"
icon={SvgGlobe}
@@ -111,7 +111,7 @@ export const RightChildrenOnly: Story = {
export const NoRightChildren: Story = {
render: () => (
<div className="w-[28rem] border rounded-16">
<Card.Header
<CardHeaderLayout
sizePreset="main-ui"
variant="section"
icon={SvgGlobe}
@@ -125,7 +125,7 @@ export const NoRightChildren: Story = {
export const LongContent: Story = {
render: () => (
<div className="w-[28rem] border rounded-16">
<Card.Header
<CardHeaderLayout
sizePreset="main-ui"
variant="section"
icon={SvgGlobe}

Some files were not shown because too many files have changed in this diff.