.

chore: preview modal (#8665 )
fix(search): Improve Speed (#8430 )
2026-02-24 03:05:48 +00:00 · 2026-02-23 16:30:30 -08:00 · 2026-02-23 16:29:13 -08:00 · 2026-02-23 16:29:13 -08:00 · 2026-02-23 16:29:13 -08:00 · 2026-02-23 16:29:13 -08:00
542 changed files with 28904 additions and 9497 deletions
--- a/.claude/skills
+++ b/.claude/skills
@@ -0,0 +1 @@
+../.cursor/skills
--- a/.cursor/skills/playwright/SKILL.md
+++ b/.cursor/skills/playwright/SKILL.md
@@ -0,0 +1,248 @@
+---
+name: playwright-e2e-tests
+description: Write and maintain Playwright end-to-end tests for the Onyx application. Use when creating new E2E tests, debugging test failures, adding test coverage, or when the user mentions Playwright, E2E tests, or browser testing.
+---
+
+# Playwright E2E Tests
+
+## Project Layout
+
+- **Tests**: `web/tests/e2e/` — organized by feature (`auth/`, `admin/`, `chat/`, `assistants/`, `connectors/`, `mcp/`)
+- **Config**: `web/playwright.config.ts`
+- **Utilities**: `web/tests/e2e/utils/`
+- **Constants**: `web/tests/e2e/constants.ts`
+- **Global setup**: `web/tests/e2e/global-setup.ts`
+- **Output**: `web/output/playwright/`
+
+## Imports
+
+Always use absolute imports with the `@tests/e2e/` prefix — never relative paths (`../`, `../../`). The alias is defined in `web/tsconfig.json` and resolves to `web/tests/`.
+
+```typescript
+import { loginAs } from "@tests/e2e/utils/auth";
+import { OnyxApiClient } from "@tests/e2e/utils/onyxApiClient";
+import { TEST_ADMIN_CREDENTIALS } from "@tests/e2e/constants";
+```
+
+All new files should be `.ts`, not `.js`.
+
+## Running Tests
+
+```bash
+# Run a specific test file
+npx playwright test web/tests/e2e/chat/default_assistant.spec.ts
+
+# Run a specific project
+npx playwright test --project admin
+npx playwright test --project exclusive
+```
+
+## Test Projects
+
+| Project | Description | Parallelism |
+|---------|-------------|-------------|
+| `admin` | Standard tests (excludes `@exclusive`) | Parallel |
+| `exclusive` | Serial, slower tests (tagged `@exclusive`) | 1 worker |
+
+All tests use `admin_auth.json` storage state by default (pre-authenticated admin session).
+
+## Authentication
+
+Global setup (`global-setup.ts`) runs automatically before all tests and handles:
+
+- Server readiness check (polls health endpoint, 60s timeout)
+- Provisioning test users: admin, admin2, and a **pool of worker users** (`worker0@example.com` through `worker7@example.com`) (idempotent)
+- API login + saving storage states: `admin_auth.json`, `admin2_auth.json`, and `worker{N}_auth.json` for each worker user
+- Setting display name to `"worker"` for each worker user
+- Promoting admin2 to admin role
+- Ensuring a public LLM provider exists
+
+Both test projects set `storageState: "admin_auth.json"`, so **every test starts pre-authenticated as admin with no login code needed**.
+
+When a test needs a different user, use API-based login — never drive the login UI:
+
+```typescript
+import { loginAs } from "@tests/e2e/utils/auth";
+
+await page.context().clearCookies();
+await loginAs(page, "admin2");
+
+// Log in as the worker-specific user (preferred for test isolation):
+import { loginAsWorkerUser } from "@tests/e2e/utils/auth";
+await page.context().clearCookies();
+await loginAsWorkerUser(page, testInfo.workerIndex);
+```
+
+## Test Structure
+
+Tests start pre-authenticated as admin — navigate and test directly:
+
+```typescript
+import { test, expect } from "@playwright/test";
+
+test.describe("Feature Name", () => {
+  test("should describe expected behavior clearly", async ({ page }) => {
+    await page.goto("/app");
+    await page.waitForLoadState("networkidle");
+    // Already authenticated as admin — go straight to testing
+  });
+});
+```
+
+**User isolation** — tests that modify visible app state (creating assistants, sending chat messages, pinning items) should run as a **worker-specific user** and clean up resources in `afterAll`. Global setup provisions a pool of worker users (`worker0@example.com` through `worker7@example.com`). `loginAsWorkerUser` maps `testInfo.workerIndex` to a pool slot via modulo, so retry workers (which get incrementing indices beyond the pool size) safely reuse existing users. This ensures parallel workers never share user state, keeps usernames deterministic for screenshots, and avoids cross-contamination:
+
+```typescript
+import { test } from "@playwright/test";
+import { loginAsWorkerUser } from "@tests/e2e/utils/auth";
+
+test.beforeEach(async ({ page }, testInfo) => {
+  await page.context().clearCookies();
+  await loginAsWorkerUser(page, testInfo.workerIndex);
+});
+```
+
+If the test requires admin privileges *and* modifies visible state, use `"admin2"` instead — it's a pre-provisioned admin account that keeps the primary `"admin"` clean for other parallel tests. Switch to `"admin"` only for privileged setup (creating providers, configuring tools), then back to the worker user for the actual test. See `chat/default_assistant.spec.ts` for a full example.
+
+`loginAsRandomUser` exists for the rare case where the test requires a brand-new user (e.g. onboarding flows). Avoid it elsewhere — it produces non-deterministic usernames that complicate screenshots.
+
+**API resource setup** — only when tests need to create backend resources (image gen configs, web search providers, MCP servers). Use `beforeAll`/`afterAll` with `OnyxApiClient` to create and clean up. See `chat/default_assistant.spec.ts` or `mcp/mcp_oauth_flow.spec.ts` for examples. This is uncommon (~4 of 37 test files).
+
+## Key Utilities
+
+### `OnyxApiClient` (`@tests/e2e/utils/onyxApiClient`)
+
+Backend API client for test setup/teardown. Key methods:
+
+- **Connectors**: `createFileConnector()`, `deleteCCPair()`, `pauseConnector()`
+- **LLM Providers**: `ensurePublicProvider()`, `createRestrictedProvider()`, `setProviderAsDefault()`
+- **Assistants**: `createAssistant()`, `deleteAssistant()`, `findAssistantByName()`
+- **User Groups**: `createUserGroup()`, `deleteUserGroup()`, `setUserRole()`
+- **Tools**: `createWebSearchProvider()`, `createImageGenerationConfig()`
+- **Chat**: `createChatSession()`, `deleteChatSession()`
+
+### `chatActions` (`@tests/e2e/utils/chatActions`)
+
+- `sendMessage(page, message)` — sends a message and waits for AI response
+- `startNewChat(page)` — clicks new-chat button and waits for intro
+- `verifyDefaultAssistantIsChosen(page)` — checks Onyx logo is visible
+- `verifyAssistantIsChosen(page, name)` — checks assistant name display
+- `switchModel(page, modelName)` — switches LLM model via popover
+
+### `visualRegression` (`@tests/e2e/utils/visualRegression`)
+
+- `expectScreenshot(page, { name, mask?, hide?, fullPage? })`
+- `expectElementScreenshot(locator, { name, mask?, hide? })`
+- Controlled by `VISUAL_REGRESSION=true` env var
+
+### `theme` (`@tests/e2e/utils/theme`)
+
+- `THEMES` — `["light", "dark"] as const` array for iterating over both themes
+- `setThemeBeforeNavigation(page, theme)` — sets `next-themes` theme via `localStorage` before navigation
+
+When tests need light/dark screenshots, loop over `THEMES` at the `test.describe` level and call `setThemeBeforeNavigation` in `beforeEach` **before** any `page.goto()`. Include the theme in screenshot names. See `admin/admin_pages.spec.ts` or `chat/chat_message_rendering.spec.ts` for examples:
+
+```typescript
+import { THEMES, setThemeBeforeNavigation } from "@tests/e2e/utils/theme";
+
+for (const theme of THEMES) {
+  test.describe(`Feature (${theme} mode)`, () => {
+    test.beforeEach(async ({ page }) => {
+      await setThemeBeforeNavigation(page, theme);
+    });
+
+    test("renders correctly", async ({ page }) => {
+      await page.goto("/app");
+      await expectScreenshot(page, { name: `feature-${theme}` });
+    });
+  });
+}
+```
+
+### `tools` (`@tests/e2e/utils/tools`)
+
+- `TOOL_IDS` — centralized `data-testid` selectors for tool options
+- `openActionManagement(page)` — opens the tool management popover
+
+## Locator Strategy
+
+Use locators in this priority order:
+
+1. **`data-testid` / `aria-label`** — preferred for Onyx components
+   ```typescript
+   page.getByTestId("AppSidebar/new-session")
+   page.getByLabel("admin-page-title")
+   ```
+
+2. **Role-based** — for standard HTML elements
+   ```typescript
+   page.getByRole("button", { name: "Create" })
+   page.getByRole("dialog")
+   ```
+
+3. **Text/Label** — for visible text content
+   ```typescript
+   page.getByText("Custom Assistant")
+   page.getByLabel("Email")
+   ```
+
+4. **CSS selectors** — last resort, only when above won't work
+   ```typescript
+   page.locator('input[name="name"]')
+   page.locator("#onyx-chat-input-textarea")
+   ```
+
+**Never use** `page.locator` with complex CSS/XPath when a built-in locator works.
+
+## Assertions
+
+Use web-first assertions — they auto-retry until the condition is met:
+
+```typescript
+// Visibility
+await expect(page.getByTestId("onyx-logo")).toBeVisible({ timeout: 5000 });
+
+// Text content
+await expect(page.getByTestId("assistant-name-display")).toHaveText("My Assistant");
+
+// Count
+await expect(page.locator('[data-testid="onyx-ai-message"]')).toHaveCount(2, { timeout: 30000 });
+
+// URL
+await expect(page).toHaveURL(/chatId=/);
+
+// Element state
+await expect(toggle).toBeChecked();
+await expect(button).toBeEnabled();
+```
+
+**Never use** `assert` statements or hardcoded `page.waitForTimeout()`.
+
+## Waiting Strategy
+
+```typescript
+// Wait for load state after navigation
+await page.goto("/app");
+await page.waitForLoadState("networkidle");
+
+// Wait for specific element
+await page.getByTestId("chat-intro").waitFor({ state: "visible", timeout: 10000 });
+
+// Wait for URL change
+await page.waitForFunction(() => window.location.href.includes("chatId="), null, { timeout: 10000 });
+
+// Wait for network response
+await page.waitForResponse(resp => resp.url().includes("/api/chat") && resp.status() === 200);
+```
+
+## Best Practices
+
+1. **Descriptive test names** — clearly state expected behavior: `"should display greeting message when opening new chat"`
+2. **API-first setup** — use `OnyxApiClient` for backend state; reserve UI interactions for the behavior under test
+3. **User isolation** — tests that modify visible app state (sidebar, chat history) should run as the worker-specific user via `loginAsWorkerUser(page, testInfo.workerIndex)` (not admin) and clean up resources in `afterAll`. Each parallel worker gets its own user, preventing cross-contamination. Reserve `loginAsRandomUser` for flows that require a brand-new user (e.g. onboarding)
+4. **DRY helpers** — extract reusable logic into `utils/` with JSDoc comments
+5. **No hardcoded waits** — use `waitFor`, `waitForLoadState`, or web-first assertions
+6. **Parallel-safe** — no shared mutable state between tests. Prefer static, human-readable names (e.g. `"E2E-CMD Chat 1"`) and clean up resources by ID in `afterAll`. This keeps screenshots deterministic and avoids needing to mask/hide dynamic text. Only fall back to timestamps (`\`test-${Date.now()}\``) when resources cannot be reliably cleaned up or when name collisions across parallel workers would cause functional failures
+7. **Error context** — catch and re-throw with useful debug info (page text, URL, etc.)
+8. **Tag slow tests** — mark serial/slow tests with `@exclusive` in the test title
+9. **Visual regression** — use `expectScreenshot()` for UI consistency checks
+10. **Minimal comments** — only comment to clarify non-obvious intent; never restate what the next line of code does
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@@ -8,5 +8,5 @@

 ## Additional Options

- [ ] [Required] I have considered whether this PR needs to be cherry-picked to the latest beta branch.
+- [ ] [Optional] Please cherry-pick this PR to the latest release version.
 - [ ] [Optional] Override Linear Check
--- a/.github/workflows/helm-chart-releases.yml
+++ b/.github/workflows/helm-chart-releases.yml
@@ -33,7 +33,7 @@ jobs:
          helm repo add cloudnative-pg https://cloudnative-pg.github.io/charts
          helm repo add ot-container-kit https://ot-container-kit.github.io/helm-charts
          helm repo add minio https://charts.min.io/
-          helm repo add code-interpreter https://onyx-dot-app.github.io/code-interpreter/
+          helm repo add code-interpreter https://onyx-dot-app.github.io/python-sandbox/
          helm repo update

      - name: Build chart dependencies
--- a/.github/workflows/nightly-scan-licenses.yml
+++ b/.github/workflows/nightly-scan-licenses.yml
@@ -1,151 +0,0 @@
-# Scan for problematic software licenses
-
-# trivy has their own rate limiting issues causing this action to flake
-# we worked around it by hardcoding to different db repos in env
-# can re-enable when they figure it out
-# https://github.com/aquasecurity/trivy/discussions/7538
-# https://github.com/aquasecurity/trivy-action/issues/389
-
-name: 'Nightly - Scan licenses'
-on:
-#   schedule:
-#     - cron: '0 14 * * *'  # Runs every day at 6 AM PST / 7 AM PDT / 2 PM UTC
-  workflow_dispatch:  # Allows manual triggering
-
-permissions:
-  actions: read
-  contents: read
-
-jobs:
-  scan-licenses:
-    # See https://runs-on.com/runners/linux/
-    runs-on: [runs-on,runner=2cpu-linux-x64,"run-id=${{ github.run_id }}-scan-licenses"]
-    timeout-minutes: 45
-    permissions:
-      actions: read
-      contents: read
-      security-events: write
-
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # ratchet:actions/checkout@v6
-        with:
-          persist-credentials: false
-
-      - name: Set up Python
-        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # ratchet:actions/setup-python@v6
-        with:
-          python-version: '3.11'
-          cache: 'pip'
-          cache-dependency-path: |
-            backend/requirements/default.txt
-            backend/requirements/dev.txt
-            backend/requirements/model_server.txt
-
-      - name: Get explicit and transitive dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install --retries 5 --timeout 30 -r backend/requirements/default.txt
-          pip install --retries 5 --timeout 30 -r backend/requirements/dev.txt
-          pip install --retries 5 --timeout 30 -r backend/requirements/model_server.txt
-          pip freeze > requirements-all.txt
-
-      - name: Check python
-        id: license_check_report
-        uses: pilosus/action-pip-license-checker@e909b0226ff49d3235c99c4585bc617f49fff16a # ratchet:pilosus/action-pip-license-checker@v3
-        with:
-          requirements: 'requirements-all.txt'
-          fail: 'Copyleft'
-          exclude: '(?i)^(pylint|aio[-_]*).*'
-
-      - name: Print report
-        if: always()
-        env:
-          REPORT: ${{ steps.license_check_report.outputs.report }}
-        run: echo "$REPORT"
-
-      - name: Install npm dependencies
-        working-directory: ./web
-        run: npm ci
-
-        # be careful enabling the sarif and upload as it may spam the security tab
-        # with a huge amount of items. Work out the issues before enabling upload.
-#       - name: Run Trivy vulnerability scanner in repo mode
-#         if: always()
-#         uses: aquasecurity/trivy-action@b6643a29fecd7f34b3597bc6acb0a98b03d33ff8 # ratchet:aquasecurity/trivy-action@0.33.1
-#         with:
-#           scan-type: fs
-#           scan-ref: .
-#           scanners: license
-#           format: table
-#           severity: HIGH,CRITICAL
-# #           format: sarif
-# #           output: trivy-results.sarif
-#
-# #       - name: Upload Trivy scan results to GitHub Security tab
-# #         uses: github/codeql-action/upload-sarif@v3
-# #         with:
-# #           sarif_file: trivy-results.sarif
-
-  scan-trivy:
-    # See https://runs-on.com/runners/linux/
-    runs-on: [runs-on,runner=2cpu-linux-x64,"run-id=${{ github.run_id }}-scan-trivy"]
-    timeout-minutes: 45
-
-    steps:
-    - name: Set up Docker Buildx
-      uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # ratchet:docker/setup-buildx-action@v3
-
-    - name: Login to Docker Hub
-      uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # ratchet:docker/login-action@v3
-      with:
-        username: ${{ secrets.DOCKER_USERNAME }}
-        password: ${{ secrets.DOCKER_TOKEN }}
-
-    # Backend
-    - name: Pull backend docker image
-      run: docker pull onyxdotapp/onyx-backend:latest
-
-    - name: Run Trivy vulnerability scanner on backend
-      uses: aquasecurity/trivy-action@b6643a29fecd7f34b3597bc6acb0a98b03d33ff8 # ratchet:aquasecurity/trivy-action@0.33.1
-      env:
-        TRIVY_DB_REPOSITORY: 'public.ecr.aws/aquasecurity/trivy-db:2'
-        TRIVY_JAVA_DB_REPOSITORY: 'public.ecr.aws/aquasecurity/trivy-java-db:1'
-      with:
-        image-ref: onyxdotapp/onyx-backend:latest
-        scanners: license
-        severity: HIGH,CRITICAL
-        vuln-type: library
-        exit-code: 0  # Set to 1 if we want a failed scan to fail the workflow
-
-    # Web server
-    - name: Pull web server docker image
-      run: docker pull onyxdotapp/onyx-web-server:latest
-
-    - name: Run Trivy vulnerability scanner on web server
-      uses: aquasecurity/trivy-action@b6643a29fecd7f34b3597bc6acb0a98b03d33ff8 # ratchet:aquasecurity/trivy-action@0.33.1
-      env:
-        TRIVY_DB_REPOSITORY: 'public.ecr.aws/aquasecurity/trivy-db:2'
-        TRIVY_JAVA_DB_REPOSITORY: 'public.ecr.aws/aquasecurity/trivy-java-db:1'
-      with:
-        image-ref: onyxdotapp/onyx-web-server:latest
-        scanners: license
-        severity: HIGH,CRITICAL
-        vuln-type: library
-        exit-code: 0
-
-    # Model server
-    - name: Pull model server docker image
-      run: docker pull onyxdotapp/onyx-model-server:latest
-
-    - name: Run Trivy vulnerability scanner
-      uses: aquasecurity/trivy-action@b6643a29fecd7f34b3597bc6acb0a98b03d33ff8 # ratchet:aquasecurity/trivy-action@0.33.1
-      env:
-        TRIVY_DB_REPOSITORY: 'public.ecr.aws/aquasecurity/trivy-db:2'
-        TRIVY_JAVA_DB_REPOSITORY: 'public.ecr.aws/aquasecurity/trivy-java-db:1'
-      with:
-        image-ref: onyxdotapp/onyx-model-server:latest
-        scanners: license
-        severity: HIGH,CRITICAL
-        vuln-type: library
-        exit-code: 0
--- a/.github/workflows/post-merge-beta-cherry-pick.yml
+++ b/.github/workflows/post-merge-beta-cherry-pick.yml
@@ -0,0 +1,79 @@
+name: Post-Merge Beta Cherry-Pick
+
+on:
+  push:
+    branches:
+      - main
+
+permissions:
+  contents: write
+  pull-requests: write
+
+jobs:
+  cherry-pick-to-latest-release:
+    runs-on: ubuntu-latest
+    timeout-minutes: 45
+    steps:
+      - name: Resolve merged PR and checkbox state
+        id: gate
+        env:
+          GH_TOKEN: ${{ github.token }}
+        run: |
+          # For the commit that triggered this workflow (HEAD on main), fetch all
+          # associated PRs and keep only the PR that was actually merged into main
+          # with this exact merge commit SHA.
+          pr_numbers="$(gh api "repos/${GITHUB_REPOSITORY}/commits/${GITHUB_SHA}/pulls" | jq -r --arg sha "${GITHUB_SHA}" '.[] | select(.merged_at != null and .base.ref == "main" and .merge_commit_sha == $sha) | .number')"
+          match_count="$(printf '%s\n' "$pr_numbers" | sed '/^[[:space:]]*$/d' | wc -l | tr -d ' ')"
+          pr_number="$(printf '%s\n' "$pr_numbers" | sed '/^[[:space:]]*$/d' | head -n 1)"
+
+          if [ "${match_count}" -gt 1 ]; then
+            echo "::warning::Multiple merged PRs matched commit ${GITHUB_SHA}. Using PR #${pr_number}."
+          fi
+
+          if [ -z "$pr_number" ]; then
+            echo "No merged PR associated with commit ${GITHUB_SHA}; skipping."
+            echo "should_cherrypick=false" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          # Read the PR body and check whether the helper checkbox is checked.
+          pr_body="$(gh api "repos/${GITHUB_REPOSITORY}/pulls/${pr_number}" --jq '.body // ""')"
+          echo "pr_number=$pr_number" >> "$GITHUB_OUTPUT"
+
+          if echo "$pr_body" | grep -qiE "\\[x\\][[:space:]]*(\\[[^]]+\\][[:space:]]*)?Please cherry-pick this PR to the latest release version"; then
+            echo "should_cherrypick=true" >> "$GITHUB_OUTPUT"
+            echo "Cherry-pick checkbox checked for PR #${pr_number}."
+            exit 0
+          fi
+
+          echo "should_cherrypick=false" >> "$GITHUB_OUTPUT"
+          echo "Cherry-pick checkbox not checked for PR #${pr_number}. Skipping."
+
+      - name: Checkout repository
+        if: steps.gate.outputs.should_cherrypick == 'true'
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # ratchet:actions/checkout@v6
+        with:
+          fetch-depth: 0
+          persist-credentials: true
+          ref: main
+
+      - name: Install the latest version of uv
+        if: steps.gate.outputs.should_cherrypick == 'true'
+        uses: astral-sh/setup-uv@61cb8a9741eeb8a550a1b8544337180c0fc8476b # ratchet:astral-sh/setup-uv@v7
+        with:
+          enable-cache: false
+          version: "0.9.9"
+
+      - name: Configure git identity
+        if: steps.gate.outputs.should_cherrypick == 'true'
+        run: |
+          git config user.name "github-actions[bot]"
+          git config user.email "github-actions[bot]@users.noreply.github.com"
+
+      - name: Create cherry-pick PR to latest release
+        if: steps.gate.outputs.should_cherrypick == 'true'
+        env:
+          GH_TOKEN: ${{ github.token }}
+          GITHUB_TOKEN: ${{ github.token }}
+        run: |
+          uv run --no-sync --with onyx-devtools ods cherry-pick "${GITHUB_SHA}" --yes --no-verify
--- a/.github/workflows/pr-beta-cherrypick-check.yml
+++ b/.github/workflows/pr-beta-cherrypick-check.yml
@@ -1,28 +0,0 @@
-name: Require beta cherry-pick consideration
-concurrency:
-  group: Require-Beta-Cherrypick-Consideration-${{ github.workflow }}-${{ github.head_ref || github.event.workflow_run.head_branch || github.run_id }}
-  cancel-in-progress: true
-
-on:
-  pull_request:
-    types: [opened, edited, reopened, synchronize]
-
-permissions:
-  contents: read
-
-jobs:
-  beta-cherrypick-check:
-    runs-on: ubuntu-latest
-    timeout-minutes: 45
-    steps:
-      - name: Check PR body for beta cherry-pick consideration
-        env:
-          PR_BODY: ${{ github.event.pull_request.body }}
-        run: |
-          if echo "$PR_BODY" | grep -qiE "\\[x\\][[:space:]]*\\[Required\\][[:space:]]*I have considered whether this PR needs to be cherry[- ]picked to the latest beta branch"; then
-            echo "Cherry-pick consideration box is checked. Check passed."
-            exit 0
-          fi
-
-          echo "::error::Please check the 'I have considered whether this PR needs to be cherry-picked to the latest beta branch' box in the PR description."
-          exit 1
--- a/.github/workflows/pr-external-dependency-unit-tests.yml
+++ b/.github/workflows/pr-external-dependency-unit-tests.yml
@@ -45,9 +45,6 @@ env:
  # TODO: debug why this is failing and enable
  CODE_INTERPRETER_BASE_URL: http://localhost:8000

-  # OpenSearch
-  OPENSEARCH_ADMIN_PASSWORD: "StrongPassword123!"
-
 jobs:
  discover-test-dirs:
    # NOTE: Github-hosted runners have about 20s faster queue times and are preferred here.
@@ -118,9 +115,10 @@ jobs:
      - name: Create .env file for Docker Compose
        run: |
          cat <<EOF > deployment/docker_compose/.env
-          COMPOSE_PROFILES=s3-filestore
+          COMPOSE_PROFILES=s3-filestore,opensearch-enabled
          CODE_INTERPRETER_BETA_ENABLED=true
          DISABLE_TELEMETRY=true
+          OPENSEARCH_FOR_ONYX_ENABLED=true
          EOF

      - name: Set up Standard Dependencies
@@ -129,7 +127,6 @@ jobs:
          docker compose \
            -f docker-compose.yml \
            -f docker-compose.dev.yml \
-            -f docker-compose.opensearch.yml \
            up -d \
            minio \
            relational_db \
--- a/.github/workflows/pr-helm-chart-testing.yml
+++ b/.github/workflows/pr-helm-chart-testing.yml
@@ -41,8 +41,7 @@ jobs:
          version: v3.19.0

      - name: Set up chart-testing
-        # NOTE: This is Jamison's patch from https://github.com/helm/chart-testing-action/pull/194
-        uses: helm/chart-testing-action@8958a6ac472cbd8ee9a8fbb6f1acbc1b0e966e44 # zizmor: ignore[impostor-commit]
+        uses: helm/chart-testing-action@b5eebdd9998021f29756c53432f48dab66394810
        with:
          uv_version: "0.9.9"

@@ -92,7 +91,7 @@ jobs:
          helm repo add cloudnative-pg https://cloudnative-pg.github.io/charts
          helm repo add ot-container-kit https://ot-container-kit.github.io/helm-charts
          helm repo add minio https://charts.min.io/
-          helm repo add code-interpreter https://onyx-dot-app.github.io/code-interpreter/
+          helm repo add code-interpreter https://onyx-dot-app.github.io/python-sandbox/
          helm repo update

      - name: Install Redis operator
--- a/.github/workflows/pr-playwright-tests.yml
+++ b/.github/workflows/pr-playwright-tests.yml
@@ -22,6 +22,9 @@ env:
  SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
  GEN_AI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  EXA_API_KEY: ${{ secrets.EXA_API_KEY }}
+  FIRECRAWL_API_KEY: ${{ secrets.FIRECRAWL_API_KEY }}
+  GOOGLE_PSE_API_KEY: ${{ secrets.GOOGLE_PSE_API_KEY }}
+  GOOGLE_PSE_SEARCH_ENGINE_ID: ${{ secrets.GOOGLE_PSE_SEARCH_ENGINE_ID }}

  # for federated slack tests
  SLACK_CLIENT_ID: ${{ secrets.SLACK_CLIENT_ID }}
@@ -300,6 +303,7 @@ jobs:
          # TODO(Nik): https://linear.app/onyx-app/issue/ENG-1/update-test-infra-to-use-test-license
          LICENSE_ENFORCEMENT_ENABLED=false
          AUTH_TYPE=basic
+          INTEGRATION_TESTS_MODE=true
          GEN_AI_API_KEY=${OPENAI_API_KEY_VALUE}
          EXA_API_KEY=${EXA_API_KEY_VALUE}
          REQUIRE_EMAIL_VERIFICATION=false
@@ -589,7 +593,10 @@ jobs:
  # Post a single combined visual regression comment after all matrix jobs finish
  visual-regression-comment:
    needs: [playwright-tests]
-    if: always() && github.event_name == 'pull_request'
+    if: >-
+      always() &&
+      github.event_name == 'pull_request' &&
+      needs.playwright-tests.result != 'cancelled'
    runs-on: ubuntu-slim
    timeout-minutes: 5
    permissions:
--- a/.github/workflows/zizmor.yml
+++ b/.github/workflows/zizmor.yml
@@ -5,6 +5,8 @@ on:
    branches: ["main"]
  pull_request:
    branches: ["**"]
+    paths:
+      - ".github/**"

 permissions: {}

@@ -21,29 +23,18 @@ jobs:
        with:
          persist-credentials: false

-      - name: Detect changes
-        id: filter
-        uses: dorny/paths-filter@de90cc6fb38fc0963ad72b210f1f284cd68cea36 # ratchet:dorny/paths-filter@v3
-        with:
-          filters: |
-            zizmor:
-              - '.github/**'
-
      - name: Install the latest version of uv
-        if: steps.filter.outputs.zizmor == 'true' || github.ref_name == 'main'
        uses: astral-sh/setup-uv@61cb8a9741eeb8a550a1b8544337180c0fc8476b # ratchet:astral-sh/setup-uv@v7
        with:
          enable-cache: false
          version: "0.9.9"

      - name: Run zizmor
-        if: steps.filter.outputs.zizmor == 'true' || github.ref_name == 'main'
        run: uv run --no-sync --with zizmor zizmor --format=sarif . > results.sarif
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Upload SARIF file
-        if: steps.filter.outputs.zizmor == 'true' || github.ref_name == 'main'
        uses: github/codeql-action/upload-sarif@ba454b8ab46733eb6145342877cd148270bb77ab # ratchet:github/codeql-action/upload-sarif@codeql-bundle-v2.23.5
        with:
          sarif_file: results.sarif
--- a/.gitignore
+++ b/.gitignore
@@ -7,6 +7,7 @@
 .zed
 .cursor
 !/.cursor/mcp.json
+!/.cursor/skills/

 # macos
 .DS_store
--- a/backend/alembic/run_multitenant_migrations.py
+++ b/backend/alembic/run_multitenant_migrations.py
@@ -21,15 +21,14 @@ import sys
 import threading
 import time
 from concurrent.futures import ThreadPoolExecutor, as_completed
-from typing import List, NamedTuple
+from typing import NamedTuple

 from alembic.config import Config
 from alembic.script import ScriptDirectory
-from sqlalchemy import text

-from onyx.db.engine.sql_engine import is_valid_schema_name
 from onyx.db.engine.sql_engine import SqlEngine
 from onyx.db.engine.tenant_utils import get_all_tenant_ids
+from onyx.db.engine.tenant_utils import get_schemas_needing_migration
 from shared_configs.configs import TENANT_ID_PREFIX


@@ -105,56 +104,6 @@ def get_head_revision() -> str | None:
    return script.get_current_head()


-def get_schemas_needing_migration(
-    tenant_schemas: List[str], head_rev: str
-) -> List[str]:
-    """Return only schemas whose current alembic version is not at head."""
-    if not tenant_schemas:
-        return []
-
-    engine = SqlEngine.get_engine()
-
-    with engine.connect() as conn:
-        # Find which schemas actually have an alembic_version table
-        rows = conn.execute(
-            text(
-                "SELECT table_schema FROM information_schema.tables "
-                "WHERE table_name = 'alembic_version' "
-                "AND table_schema = ANY(:schemas)"
-            ),
-            {"schemas": tenant_schemas},
-        )
-        schemas_with_table = set(row[0] for row in rows)
-
-        # Schemas without the table definitely need migration
-        needs_migration = [s for s in tenant_schemas if s not in schemas_with_table]
-
-        if not schemas_with_table:
-            return needs_migration
-
-        # Validate schema names before interpolating into SQL
-        for schema in schemas_with_table:
-            if not is_valid_schema_name(schema):
-                raise ValueError(f"Invalid schema name: {schema}")
-
-        # Single query to get every schema's current revision at once.
-        # Use integer tags instead of interpolating schema names into
-        # string literals to avoid quoting issues.
-        schema_list = list(schemas_with_table)
-        union_parts = [
-            f'SELECT {i} AS idx, version_num FROM "{schema}".alembic_version'
-            for i, schema in enumerate(schema_list)
-        ]
-        rows = conn.execute(text(" UNION ALL ".join(union_parts)))
-        version_by_schema = {schema_list[row[0]]: row[1] for row in rows}
-
-        needs_migration.extend(
-            s for s in schemas_with_table if version_by_schema.get(s) != head_rev
-        )
-
-    return needs_migration
-
-
 def run_migrations_parallel(
    schemas: list[str],
    max_workers: int,
--- a/backend/alembic/versions/0bb4558f35df_add_scim_username_to_scim_user_mapping.py
+++ b/backend/alembic/versions/0bb4558f35df_add_scim_username_to_scim_user_mapping.py
@@ -0,0 +1,28 @@
+"""add scim_username to scim_user_mapping
+
+Revision ID: 0bb4558f35df
+Revises: 631fd2504136
+Create Date: 2026-02-20 10:45:30.340188
+
+"""
+
+from alembic import op
+import sqlalchemy as sa
+
+
+# revision identifiers, used by Alembic.
+revision = "0bb4558f35df"
+down_revision = "631fd2504136"
+branch_labels = None
+depends_on = None
+
+
+def upgrade() -> None:
+    op.add_column(
+        "scim_user_mapping",
+        sa.Column("scim_username", sa.String(), nullable=True),
+    )
+
+
+def downgrade() -> None:
+    op.drop_column("scim_user_mapping", "scim_username")
--- a/backend/alembic/versions/631fd2504136_add_approx_chunk_count_in_vespa_to_.py
+++ b/backend/alembic/versions/631fd2504136_add_approx_chunk_count_in_vespa_to_.py
@@ -0,0 +1,32 @@
+"""add approx_chunk_count_in_vespa to opensearch tenant migration
+
+Revision ID: 631fd2504136
+Revises: c7f2e1b4a9d3
+Create Date: 2026-02-18 21:07:52.831215
+
+"""
+
+from alembic import op
+import sqlalchemy as sa
+
+
+# revision identifiers, used by Alembic.
+revision = "631fd2504136"
+down_revision = "c7f2e1b4a9d3"
+branch_labels = None
+depends_on = None
+
+
+def upgrade() -> None:
+    op.add_column(
+        "opensearch_tenant_migration_record",
+        sa.Column(
+            "approx_chunk_count_in_vespa",
+            sa.Integer(),
+            nullable=True,
+        ),
+    )
+
+
+def downgrade() -> None:
+    op.drop_column("opensearch_tenant_migration_record", "approx_chunk_count_in_vespa")
--- a/backend/alembic/versions/7cb492013621_code_interpreter_server_model.py
+++ b/backend/alembic/versions/7cb492013621_code_interpreter_server_model.py
@@ -0,0 +1,31 @@
+"""code interpreter server model
+
+Revision ID: 7cb492013621
+Revises: 0bb4558f35df
+Create Date: 2026-02-22 18:54:54.007265
+
+"""
+
+from alembic import op
+import sqlalchemy as sa
+
+
+# revision identifiers, used by Alembic.
+revision = "7cb492013621"
+down_revision = "0bb4558f35df"
+branch_labels = None
+depends_on = None
+
+
+def upgrade() -> None:
+    op.create_table(
+        "code_interpreter_server",
+        sa.Column("id", sa.Integer, primary_key=True),
+        sa.Column(
+            "server_enabled", sa.Boolean, nullable=False, server_default=sa.true()
+        ),
+    )
+
+
+def downgrade() -> None:
+    op.drop_table("code_interpreter_server")
--- a/backend/alembic/versions/c7f2e1b4a9d3_add_sharing_scope_to_build_session.py
+++ b/backend/alembic/versions/c7f2e1b4a9d3_add_sharing_scope_to_build_session.py
@@ -0,0 +1,31 @@
+"""add sharing_scope to build_session
+
+Revision ID: c7f2e1b4a9d3
+Revises: 19c0ccb01687
+Create Date: 2026-02-17 12:00:00.000000
+
+"""
+
+from alembic import op
+import sqlalchemy as sa
+
+revision = "c7f2e1b4a9d3"
+down_revision = "19c0ccb01687"
+branch_labels = None
+depends_on = None
+
+
+def upgrade() -> None:
+    op.add_column(
+        "build_session",
+        sa.Column(
+            "sharing_scope",
+            sa.String(),
+            nullable=False,
+            server_default="private",
+        ),
+    )
+
+
+def downgrade() -> None:
+    op.drop_column("build_session", "sharing_scope")
--- a/backend/ee/onyx/background/celery/tasks/cleanup/init.py
+++ b/backend/ee/onyx/background/celery/tasks/cleanup/init.py
--- a/backend/ee/onyx/background/celery/tasks/cloud/init.py
+++ b/backend/ee/onyx/background/celery/tasks/cloud/init.py
--- a/backend/ee/onyx/background/celery/tasks/doc_permission_syncing/init.py
+++ b/backend/ee/onyx/background/celery/tasks/doc_permission_syncing/init.py
--- a/backend/ee/onyx/background/celery/tasks/external_group_syncing/init.py
+++ b/backend/ee/onyx/background/celery/tasks/external_group_syncing/init.py
--- a/backend/ee/onyx/background/celery/tasks/tenant_provisioning/init.py
+++ b/backend/ee/onyx/background/celery/tasks/tenant_provisioning/init.py
--- a/backend/ee/onyx/background/celery/tasks/ttl_management/init.py
+++ b/backend/ee/onyx/background/celery/tasks/ttl_management/init.py
--- a/backend/ee/onyx/background/celery/tasks/usage_reporting/init.py
+++ b/backend/ee/onyx/background/celery/tasks/usage_reporting/init.py
--- a/backend/ee/onyx/background/celery/tasks/vespa/init.py
+++ b/backend/ee/onyx/background/celery/tasks/vespa/init.py
--- a/backend/ee/onyx/db/license.py
+++ b/backend/ee/onyx/db/license.py
@@ -263,9 +263,15 @@ def refresh_license_cache(

    try:
        payload = verify_license_signature(license_record.license_data)
+        # Derive source from payload: manual licenses lack stripe_customer_id
+        source: LicenseSource = (
+            LicenseSource.AUTO_FETCH
+            if payload.stripe_customer_id
+            else LicenseSource.MANUAL_UPLOAD
+        )
        return update_license_cache(
            payload,
-            source=LicenseSource.AUTO_FETCH,
+            source=source,
            tenant_id=tenant_id,
        )
    except ValueError as e:
--- a/backend/ee/onyx/db/scim.py
+++ b/backend/ee/onyx/db/scim.py
@@ -0,0 +1,688 @@
+"""SCIM Data Access Layer.
+
+All database operations for SCIM provisioning — token management, user
+mappings, and group mappings. Extends the base DAL (see ``onyx.db.dal``).
+
+Usage from FastAPI::
+
+    def get_scim_dal(db_session: Session = Depends(get_session)) -> ScimDAL:
+        return ScimDAL(db_session)
+
+    @router.post("/tokens")
+    def create_token(dal: ScimDAL = Depends(get_scim_dal)) -> ...:
+        token = dal.create_token(name=..., hashed_token=..., ...)
+        dal.commit()
+        return token
+
+Usage from background tasks::
+
+    with ScimDAL.from_tenant("tenant_abc") as dal:
+        mapping = dal.create_user_mapping(external_id="idp-123", user_id=uid)
+        dal.commit()
+"""
+
+from __future__ import annotations
+
+from uuid import UUID
+
+from sqlalchemy import delete as sa_delete
+from sqlalchemy import func
+from sqlalchemy import Select
+from sqlalchemy import select
+from sqlalchemy import SQLColumnExpression
+from sqlalchemy.dialects.postgresql import insert as pg_insert
+
+from ee.onyx.server.scim.filtering import ScimFilter
+from ee.onyx.server.scim.filtering import ScimFilterOperator
+from onyx.db.dal import DAL
+from onyx.db.models import ScimGroupMapping
+from onyx.db.models import ScimToken
+from onyx.db.models import ScimUserMapping
+from onyx.db.models import User
+from onyx.db.models import User__UserGroup
+from onyx.db.models import UserGroup
+from onyx.db.models import UserRole
+from onyx.utils.logger import setup_logger
+
+logger = setup_logger()
+
+
+class ScimDAL(DAL):
+    """Data Access Layer for SCIM provisioning operations.
+
+    Methods mutate but do NOT commit — call ``dal.commit()`` explicitly
+    when you want to persist changes. This follows the existing ``_no_commit``
+    convention and lets callers batch multiple operations into one transaction.
+    """
+
+    # ------------------------------------------------------------------
+    # Token operations
+    # ------------------------------------------------------------------
+
+    def create_token(
+        self,
+        name: str,
+        hashed_token: str,
+        token_display: str,
+        created_by_id: UUID,
+    ) -> ScimToken:
+        """Create a new SCIM bearer token.
+
+        Only one token is active at a time — this method automatically revokes
+        all existing active tokens before creating the new one.
+        """
+        # Revoke any currently active tokens
+        active_tokens = list(
+            self._session.scalars(
+                select(ScimToken).where(ScimToken.is_active.is_(True))
+            ).all()
+        )
+        for t in active_tokens:
+            t.is_active = False
+
+        token = ScimToken(
+            name=name,
+            hashed_token=hashed_token,
+            token_display=token_display,
+            created_by_id=created_by_id,
+        )
+        self._session.add(token)
+        self._session.flush()
+        return token
+
+    def get_active_token(self) -> ScimToken | None:
+        """Return the single currently active token, or None."""
+        return self._session.scalar(
+            select(ScimToken).where(ScimToken.is_active.is_(True))
+        )
+
+    def get_token_by_hash(self, hashed_token: str) -> ScimToken | None:
+        """Look up a token by its SHA-256 hash."""
+        return self._session.scalar(
+            select(ScimToken).where(ScimToken.hashed_token == hashed_token)
+        )
+
+    def revoke_token(self, token_id: int) -> None:
+        """Deactivate a token by ID.
+
+        Raises:
+            ValueError: If the token does not exist.
+        """
+        token = self._session.get(ScimToken, token_id)
+        if not token:
+            raise ValueError(f"SCIM token with id {token_id} not found")
+        token.is_active = False
+
+    def update_token_last_used(self, token_id: int) -> None:
+        """Update the last_used_at timestamp for a token."""
+        token = self._session.get(ScimToken, token_id)
+        if token:
+            token.last_used_at = func.now()  # type: ignore[assignment]
+
+    # ------------------------------------------------------------------
+    # User mapping operations
+    # ------------------------------------------------------------------
+
+    def create_user_mapping(
+        self,
+        external_id: str,
+        user_id: UUID,
+        scim_username: str | None = None,
+    ) -> ScimUserMapping:
+        """Create a mapping between a SCIM externalId and an Onyx user."""
+        mapping = ScimUserMapping(
+            external_id=external_id,
+            user_id=user_id,
+            scim_username=scim_username,
+        )
+        self._session.add(mapping)
+        self._session.flush()
+        return mapping
+
+    def get_user_mapping_by_external_id(
+        self, external_id: str
+    ) -> ScimUserMapping | None:
+        """Look up a user mapping by the IdP's external identifier."""
+        return self._session.scalar(
+            select(ScimUserMapping).where(ScimUserMapping.external_id == external_id)
+        )
+
+    def get_user_mapping_by_user_id(self, user_id: UUID) -> ScimUserMapping | None:
+        """Look up a user mapping by the Onyx user ID."""
+        return self._session.scalar(
+            select(ScimUserMapping).where(ScimUserMapping.user_id == user_id)
+        )
+
+    def list_user_mappings(
+        self,
+        start_index: int = 1,
+        count: int = 100,
+    ) -> tuple[list[ScimUserMapping], int]:
+        """List user mappings with SCIM-style pagination.
+
+        Args:
+            start_index: 1-based start index (SCIM convention).
+            count: Maximum number of results to return.
+
+        Returns:
+            A tuple of (mappings, total_count).
+        """
+        total = (
+            self._session.scalar(select(func.count()).select_from(ScimUserMapping)) or 0
+        )
+
+        offset = max(start_index - 1, 0)
+        mappings = list(
+            self._session.scalars(
+                select(ScimUserMapping)
+                .order_by(ScimUserMapping.id)
+                .offset(offset)
+                .limit(count)
+            ).all()
+        )
+
+        return mappings, total
+
+    def update_user_mapping_external_id(
+        self,
+        mapping_id: int,
+        external_id: str,
+    ) -> ScimUserMapping:
+        """Update the external ID on a user mapping.
+
+        Raises:
+            ValueError: If the mapping does not exist.
+        """
+        mapping = self._session.get(ScimUserMapping, mapping_id)
+        if not mapping:
+            raise ValueError(f"SCIM user mapping with id {mapping_id} not found")
+        mapping.external_id = external_id
+        return mapping
+
+    def delete_user_mapping(self, mapping_id: int) -> None:
+        """Delete a user mapping by ID. No-op if already deleted."""
+        mapping = self._session.get(ScimUserMapping, mapping_id)
+        if not mapping:
+            logger.warning("SCIM user mapping %d not found during delete", mapping_id)
+            return
+        self._session.delete(mapping)
+
+    # ------------------------------------------------------------------
+    # User query operations
+    # ------------------------------------------------------------------
+
+    def get_user(self, user_id: UUID) -> User | None:
+        """Fetch a user by ID."""
+        return self._session.scalar(
+            select(User).where(User.id == user_id)  # type: ignore[arg-type]
+        )
+
+    def get_user_by_email(self, email: str) -> User | None:
+        """Fetch a user by email (case-insensitive)."""
+        return self._session.scalar(
+            select(User).where(func.lower(User.email) == func.lower(email))
+        )
+
+    def add_user(self, user: User) -> None:
+        """Add a new user to the session and flush to assign an ID."""
+        self._session.add(user)
+        self._session.flush()
+
+    def update_user(
+        self,
+        user: User,
+        *,
+        email: str | None = None,
+        is_active: bool | None = None,
+        personal_name: str | None = None,
+    ) -> None:
+        """Update user attributes. Only sets fields that are provided."""
+        if email is not None:
+            user.email = email
+        if is_active is not None:
+            user.is_active = is_active
+        if personal_name is not None:
+            user.personal_name = personal_name
+
+    def deactivate_user(self, user: User) -> None:
+        """Mark a user as inactive."""
+        user.is_active = False
+
+    def list_users(
+        self,
+        scim_filter: ScimFilter | None,
+        start_index: int = 1,
+        count: int = 100,
+    ) -> tuple[list[tuple[User, ScimUserMapping | None]], int]:
+        """Query users with optional SCIM filter and pagination.
+
+        Returns:
+            A tuple of (list of (user, mapping) pairs, total_count).
+
+        Raises:
+            ValueError: If the filter uses an unsupported attribute.
+        """
+        query = select(User).where(
+            User.role.notin_([UserRole.SLACK_USER, UserRole.EXT_PERM_USER])
+        )
+
+        if scim_filter:
+            attr = scim_filter.attribute.lower()
+            if attr == "username":
+                # arg-type: fastapi-users types User.email as str, not a column expression
+                # assignment: union return type widens but query is still Select[tuple[User]]
+                query = _apply_scim_string_op(query, User.email, scim_filter)  # type: ignore[arg-type, assignment]
+            elif attr == "active":
+                query = query.where(
+                    User.is_active.is_(scim_filter.value.lower() == "true")  # type: ignore[attr-defined]
+                )
+            elif attr == "externalid":
+                mapping = self.get_user_mapping_by_external_id(scim_filter.value)
+                if not mapping:
+                    return [], 0
+                query = query.where(User.id == mapping.user_id)  # type: ignore[arg-type]
+            else:
+                raise ValueError(
+                    f"Unsupported filter attribute: {scim_filter.attribute}"
+                )
+
+        # Count total matching rows first, then paginate. SCIM uses 1-based
+        # indexing (RFC 7644 §3.4.2), so we convert to a 0-based offset.
+        total = (
+            self._session.scalar(select(func.count()).select_from(query.subquery()))
+            or 0
+        )
+
+        offset = max(start_index - 1, 0)
+        users = list(
+            self._session.scalars(
+                query.order_by(User.id).offset(offset).limit(count)  # type: ignore[arg-type]
+            )
+            .unique()
+            .all()
+        )
+
+        # Batch-fetch SCIM mappings to avoid N+1 queries
+        mapping_map = self._get_user_mappings_batch([u.id for u in users])
+        return [(u, mapping_map.get(u.id)) for u in users], total
+
+    def sync_user_external_id(
+        self,
+        user_id: UUID,
+        new_external_id: str | None,
+        scim_username: str | None = None,
+    ) -> None:
+        """Create, update, or delete the external ID mapping for a user."""
+        mapping = self.get_user_mapping_by_user_id(user_id)
+        if new_external_id:
+            if mapping:
+                if mapping.external_id != new_external_id:
+                    mapping.external_id = new_external_id
+                if scim_username is not None:
+                    mapping.scim_username = scim_username
+            else:
+                self.create_user_mapping(
+                    external_id=new_external_id,
+                    user_id=user_id,
+                    scim_username=scim_username,
+                )
+        elif mapping:
+            self.delete_user_mapping(mapping.id)
+
+    def _get_user_mappings_batch(
+        self, user_ids: list[UUID]
+    ) -> dict[UUID, ScimUserMapping]:
+        """Batch-fetch SCIM user mappings keyed by user ID."""
+        if not user_ids:
+            return {}
+        mappings = self._session.scalars(
+            select(ScimUserMapping).where(ScimUserMapping.user_id.in_(user_ids))
+        ).all()
+        return {m.user_id: m for m in mappings}
+
+    def get_user_groups(self, user_id: UUID) -> list[tuple[int, str]]:
+        """Get groups a user belongs to as ``(group_id, group_name)`` pairs.
+
+        Excludes groups marked for deletion.
+        """
+        rels = self._session.scalars(
+            select(User__UserGroup).where(User__UserGroup.user_id == user_id)
+        ).all()
+
+        group_ids = [r.user_group_id for r in rels]
+        if not group_ids:
+            return []
+
+        groups = self._session.scalars(
+            select(UserGroup).where(
+                UserGroup.id.in_(group_ids),
+                UserGroup.is_up_for_deletion.is_(False),
+            )
+        ).all()
+        return [(g.id, g.name) for g in groups]
+
+    def get_users_groups_batch(
+        self, user_ids: list[UUID]
+    ) -> dict[UUID, list[tuple[int, str]]]:
+        """Batch-fetch group memberships for multiple users.
+
+        Returns a mapping of ``user_id → [(group_id, group_name), ...]``.
+        Avoids N+1 queries when building user list responses.
+        """
+        if not user_ids:
+            return {}
+
+        rels = self._session.scalars(
+            select(User__UserGroup).where(User__UserGroup.user_id.in_(user_ids))
+        ).all()
+
+        group_ids = list({r.user_group_id for r in rels})
+        if not group_ids:
+            return {}
+
+        groups = self._session.scalars(
+            select(UserGroup).where(
+                UserGroup.id.in_(group_ids),
+                UserGroup.is_up_for_deletion.is_(False),
+            )
+        ).all()
+        groups_by_id = {g.id: g.name for g in groups}
+
+        result: dict[UUID, list[tuple[int, str]]] = {}
+        for r in rels:
+            if r.user_id and r.user_group_id in groups_by_id:
+                result.setdefault(r.user_id, []).append(
+                    (r.user_group_id, groups_by_id[r.user_group_id])
+                )
+        return result
+
+    # ------------------------------------------------------------------
+    # Group mapping operations
+    # ------------------------------------------------------------------
+
+    def create_group_mapping(
+        self,
+        external_id: str,
+        user_group_id: int,
+    ) -> ScimGroupMapping:
+        """Create a mapping between a SCIM externalId and an Onyx user group."""
+        mapping = ScimGroupMapping(external_id=external_id, user_group_id=user_group_id)
+        self._session.add(mapping)
+        self._session.flush()
+        return mapping
+
+    def get_group_mapping_by_external_id(
+        self, external_id: str
+    ) -> ScimGroupMapping | None:
+        """Look up a group mapping by the IdP's external identifier."""
+        return self._session.scalar(
+            select(ScimGroupMapping).where(ScimGroupMapping.external_id == external_id)
+        )
+
+    def get_group_mapping_by_group_id(
+        self, user_group_id: int
+    ) -> ScimGroupMapping | None:
+        """Look up a group mapping by the Onyx user group ID."""
+        return self._session.scalar(
+            select(ScimGroupMapping).where(
+                ScimGroupMapping.user_group_id == user_group_id
+            )
+        )
+
+    def list_group_mappings(
+        self,
+        start_index: int = 1,
+        count: int = 100,
+    ) -> tuple[list[ScimGroupMapping], int]:
+        """List group mappings with SCIM-style pagination.
+
+        Args:
+            start_index: 1-based start index (SCIM convention).
+            count: Maximum number of results to return.
+
+        Returns:
+            A tuple of (mappings, total_count).
+        """
+        total = (
+            self._session.scalar(select(func.count()).select_from(ScimGroupMapping))
+            or 0
+        )
+
+        offset = max(start_index - 1, 0)
+        mappings = list(
+            self._session.scalars(
+                select(ScimGroupMapping)
+                .order_by(ScimGroupMapping.id)
+                .offset(offset)
+                .limit(count)
+            ).all()
+        )
+
+        return mappings, total
+
+    def delete_group_mapping(self, mapping_id: int) -> None:
+        """Delete a group mapping by ID. No-op if already deleted."""
+        mapping = self._session.get(ScimGroupMapping, mapping_id)
+        if not mapping:
+            logger.warning("SCIM group mapping %d not found during delete", mapping_id)
+            return
+        self._session.delete(mapping)
+
+    # ------------------------------------------------------------------
+    # Group query operations
+    # ------------------------------------------------------------------
+
+    def get_group(self, group_id: int) -> UserGroup | None:
+        """Fetch a group by ID, returning None if deleted or missing."""
+        group = self._session.get(UserGroup, group_id)
+        if group and group.is_up_for_deletion:
+            return None
+        return group
+
+    def get_group_by_name(self, name: str) -> UserGroup | None:
+        """Fetch a group by exact name."""
+        return self._session.scalar(select(UserGroup).where(UserGroup.name == name))
+
+    def add_group(self, group: UserGroup) -> None:
+        """Add a new group to the session and flush to assign an ID."""
+        self._session.add(group)
+        self._session.flush()
+
+    def update_group(
+        self,
+        group: UserGroup,
+        *,
+        name: str | None = None,
+    ) -> None:
+        """Update group attributes and set the modification timestamp."""
+        if name is not None:
+            group.name = name
+        group.time_last_modified_by_user = func.now()
+
+    def delete_group(self, group: UserGroup) -> None:
+        """Delete a group from the session."""
+        self._session.delete(group)
+
+    def list_groups(
+        self,
+        scim_filter: ScimFilter | None,
+        start_index: int = 1,
+        count: int = 100,
+    ) -> tuple[list[tuple[UserGroup, str | None]], int]:
+        """Query groups with optional SCIM filter and pagination.
+
+        Returns:
+            A tuple of (list of (group, external_id) pairs, total_count).
+
+        Raises:
+            ValueError: If the filter uses an unsupported attribute.
+        """
+        query = select(UserGroup).where(UserGroup.is_up_for_deletion.is_(False))
+
+        if scim_filter:
+            attr = scim_filter.attribute.lower()
+            if attr == "displayname":
+                # assignment: union return type widens but query is still Select[tuple[UserGroup]]
+                query = _apply_scim_string_op(query, UserGroup.name, scim_filter)  # type: ignore[assignment]
+            elif attr == "externalid":
+                mapping = self.get_group_mapping_by_external_id(scim_filter.value)
+                if not mapping:
+                    return [], 0
+                query = query.where(UserGroup.id == mapping.user_group_id)
+            else:
+                raise ValueError(
+                    f"Unsupported filter attribute: {scim_filter.attribute}"
+                )
+
+        total = (
+            self._session.scalar(select(func.count()).select_from(query.subquery()))
+            or 0
+        )
+
+        offset = max(start_index - 1, 0)
+        groups = list(
+            self._session.scalars(
+                query.order_by(UserGroup.id).offset(offset).limit(count)
+            ).all()
+        )
+
+        ext_id_map = self._get_group_external_ids([g.id for g in groups])
+        return [(g, ext_id_map.get(g.id)) for g in groups], total
+
+    def get_group_members(self, group_id: int) -> list[tuple[UUID, str | None]]:
+        """Get group members as (user_id, email) pairs."""
+        rels = self._session.scalars(
+            select(User__UserGroup).where(User__UserGroup.user_group_id == group_id)
+        ).all()
+
+        user_ids = [r.user_id for r in rels if r.user_id]
+        if not user_ids:
+            return []
+
+        users = (
+            self._session.scalars(
+                select(User).where(User.id.in_(user_ids))  # type: ignore[attr-defined]
+            )
+            .unique()
+            .all()
+        )
+        users_by_id = {u.id: u for u in users}
+
+        return [
+            (
+                r.user_id,
+                users_by_id[r.user_id].email if r.user_id in users_by_id else None,
+            )
+            for r in rels
+            if r.user_id
+        ]
+
+    def validate_member_ids(self, uuids: list[UUID]) -> list[UUID]:
+        """Return the subset of UUIDs that don't exist as users.
+
+        Returns an empty list if all IDs are valid.
+        """
+        if not uuids:
+            return []
+        existing_users = (
+            self._session.scalars(
+                select(User).where(User.id.in_(uuids))  # type: ignore[attr-defined]
+            )
+            .unique()
+            .all()
+        )
+        existing_ids = {u.id for u in existing_users}
+        return [uid for uid in uuids if uid not in existing_ids]
+
+    def upsert_group_members(self, group_id: int, user_ids: list[UUID]) -> None:
+        """Add user-group relationships, ignoring duplicates."""
+        if not user_ids:
+            return
+        self._session.execute(
+            pg_insert(User__UserGroup)
+            .values([{"user_id": uid, "user_group_id": group_id} for uid in user_ids])
+            .on_conflict_do_nothing(
+                index_elements=[
+                    User__UserGroup.user_group_id,
+                    User__UserGroup.user_id,
+                ]
+            )
+        )
+
+    def replace_group_members(self, group_id: int, user_ids: list[UUID]) -> None:
+        """Replace all members of a group."""
+        self._session.execute(
+            sa_delete(User__UserGroup).where(User__UserGroup.user_group_id == group_id)
+        )
+        self.upsert_group_members(group_id, user_ids)
+
+    def remove_group_members(self, group_id: int, user_ids: list[UUID]) -> None:
+        """Remove specific members from a group."""
+        if not user_ids:
+            return
+        self._session.execute(
+            sa_delete(User__UserGroup).where(
+                User__UserGroup.user_group_id == group_id,
+                User__UserGroup.user_id.in_(user_ids),
+            )
+        )
+
+    def delete_group_with_members(self, group: UserGroup) -> None:
+        """Remove all member relationships and delete the group."""
+        self._session.execute(
+            sa_delete(User__UserGroup).where(User__UserGroup.user_group_id == group.id)
+        )
+        self._session.delete(group)
+
+    def sync_group_external_id(
+        self, group_id: int, new_external_id: str | None
+    ) -> None:
+        """Create, update, or delete the external ID mapping for a group."""
+        mapping = self.get_group_mapping_by_group_id(group_id)
+        if new_external_id:
+            if mapping:
+                if mapping.external_id != new_external_id:
+                    mapping.external_id = new_external_id
+            else:
+                self.create_group_mapping(
+                    external_id=new_external_id, user_group_id=group_id
+                )
+        elif mapping:
+            self.delete_group_mapping(mapping.id)
+
+    def _get_group_external_ids(self, group_ids: list[int]) -> dict[int, str]:
+        """Batch-fetch external IDs for a list of group IDs."""
+        if not group_ids:
+            return {}
+        mappings = self._session.scalars(
+            select(ScimGroupMapping).where(
+                ScimGroupMapping.user_group_id.in_(group_ids)
+            )
+        ).all()
+        return {m.user_group_id: m.external_id for m in mappings}
+
+
+# ---------------------------------------------------------------------------
+# Module-level helpers (used by DAL methods above)
+# ---------------------------------------------------------------------------
+
+
+def _apply_scim_string_op(
+    query: Select[tuple[User]] | Select[tuple[UserGroup]],
+    column: SQLColumnExpression[str],
+    scim_filter: ScimFilter,
+) -> Select[tuple[User]] | Select[tuple[UserGroup]]:
+    """Apply a SCIM string filter operator using SQLAlchemy column operators.
+
+    Handles eq (case-insensitive exact), co (contains), and sw (starts with).
+    SQLAlchemy's operators handle LIKE-pattern escaping internally.
+    """
+    val = scim_filter.value
+    if scim_filter.operator == ScimFilterOperator.EQUAL:
+        return query.where(func.lower(column) == val.lower())
+    elif scim_filter.operator == ScimFilterOperator.CONTAINS:
+        return query.where(column.icontains(val, autoescape=True))
+    elif scim_filter.operator == ScimFilterOperator.STARTS_WITH:
+        return query.where(column.istartswith(val, autoescape=True))
+    else:
+        raise ValueError(f"Unsupported string filter operator: {scim_filter.operator}")
--- a/backend/ee/onyx/db/user_group.py
+++ b/backend/ee/onyx/db/user_group.py
@@ -9,6 +9,7 @@ from sqlalchemy import Select
 from sqlalchemy import select
 from sqlalchemy import update
 from sqlalchemy.dialects.postgresql import insert
+from sqlalchemy.orm import selectinload
 from sqlalchemy.orm import Session

 from ee.onyx.server.user_group.models import SetCuratorRequest
@@ -18,11 +19,15 @@ from onyx.db.connector_credential_pair import get_connector_credential_pair_from
 from onyx.db.enums import AccessType
 from onyx.db.enums import ConnectorCredentialPairStatus
 from onyx.db.models import ConnectorCredentialPair
+from onyx.db.models import Credential
 from onyx.db.models import Credential__UserGroup
 from onyx.db.models import Document
 from onyx.db.models import DocumentByConnectorCredentialPair
+from onyx.db.models import DocumentSet
 from onyx.db.models import DocumentSet__UserGroup
+from onyx.db.models import FederatedConnector__DocumentSet
 from onyx.db.models import LLMProvider__UserGroup
+from onyx.db.models import Persona
 from onyx.db.models import Persona__UserGroup
 from onyx.db.models import TokenRateLimit__UserGroup
 from onyx.db.models import User
@@ -195,8 +200,60 @@ def fetch_user_group(db_session: Session, user_group_id: int) -> UserGroup | Non
    return db_session.scalar(stmt)


+def _add_user_group_snapshot_eager_loads(
+    stmt: Select,
+) -> Select:
+    """Add eager loading options needed by UserGroup.from_model snapshot creation."""
+    return stmt.options(
+        selectinload(UserGroup.users),
+        selectinload(UserGroup.user_group_relationships),
+        selectinload(UserGroup.cc_pair_relationships)
+        .selectinload(UserGroup__ConnectorCredentialPair.cc_pair)
+        .options(
+            selectinload(ConnectorCredentialPair.connector),
+            selectinload(ConnectorCredentialPair.credential).selectinload(
+                Credential.user
+            ),
+        ),
+        selectinload(UserGroup.document_sets).options(
+            selectinload(DocumentSet.connector_credential_pairs).selectinload(
+                ConnectorCredentialPair.connector
+            ),
+            selectinload(DocumentSet.users),
+            selectinload(DocumentSet.groups),
+            selectinload(DocumentSet.federated_connectors).selectinload(
+                FederatedConnector__DocumentSet.federated_connector
+            ),
+        ),
+        selectinload(UserGroup.personas).options(
+            selectinload(Persona.tools),
+            selectinload(Persona.hierarchy_nodes),
+            selectinload(Persona.attached_documents).selectinload(
+                Document.parent_hierarchy_node
+            ),
+            selectinload(Persona.labels),
+            selectinload(Persona.document_sets).options(
+                selectinload(DocumentSet.connector_credential_pairs).selectinload(
+                    ConnectorCredentialPair.connector
+                ),
+                selectinload(DocumentSet.users),
+                selectinload(DocumentSet.groups),
+                selectinload(DocumentSet.federated_connectors).selectinload(
+                    FederatedConnector__DocumentSet.federated_connector
+                ),
+            ),
+            selectinload(Persona.user),
+            selectinload(Persona.user_files),
+            selectinload(Persona.users),
+            selectinload(Persona.groups),
+        ),
+    )
+
+
 def fetch_user_groups(
-    db_session: Session, only_up_to_date: bool = True
+    db_session: Session,
+    only_up_to_date: bool = True,
+    eager_load_for_snapshot: bool = False,
 ) -> Sequence[UserGroup]:
    """
    Fetches user groups from the database.
@@ -209,6 +266,8 @@ def fetch_user_groups(
        db_session (Session): The SQLAlchemy session used to query the database.
        only_up_to_date (bool, optional): Flag to determine whether to filter the results
            to include only up to date user groups. Defaults to `True`.
+        eager_load_for_snapshot: If True, adds eager loading for all relationships
+            needed by UserGroup.from_model snapshot creation.

    Returns:
        Sequence[UserGroup]: A sequence of `UserGroup` objects matching the query criteria.
@@ -216,11 +275,16 @@ def fetch_user_groups(
    stmt = select(UserGroup)
    if only_up_to_date:
        stmt = stmt.where(UserGroup.is_up_to_date == True)  # noqa: E712
-    return db_session.scalars(stmt).all()
+    if eager_load_for_snapshot:
+        stmt = _add_user_group_snapshot_eager_loads(stmt)
+    return db_session.scalars(stmt).unique().all()


 def fetch_user_groups_for_user(
-    db_session: Session, user_id: UUID, only_curator_groups: bool = False
+    db_session: Session,
+    user_id: UUID,
+    only_curator_groups: bool = False,
+    eager_load_for_snapshot: bool = False,
 ) -> Sequence[UserGroup]:
    stmt = (
        select(UserGroup)
@@ -230,7 +294,9 @@ def fetch_user_groups_for_user(
    )
    if only_curator_groups:
        stmt = stmt.where(User__UserGroup.is_curator == True)  # noqa: E712
-    return db_session.scalars(stmt).all()
+    if eager_load_for_snapshot:
+        stmt = _add_user_group_snapshot_eager_loads(stmt)
+    return db_session.scalars(stmt).unique().all()


 def construct_document_id_select_by_usergroup(
--- a/backend/ee/onyx/external_permissions/sharepoint/group_sync.py
+++ b/backend/ee/onyx/external_permissions/sharepoint/group_sync.py
@@ -1,9 +1,13 @@
 from collections.abc import Generator

+from office365.sharepoint.client_context import ClientContext  # type: ignore[import-untyped]
+
 from ee.onyx.db.external_perm import ExternalUserGroup
 from ee.onyx.external_permissions.sharepoint.permission_utils import (
    get_sharepoint_external_groups,
 )
+from onyx.configs.app_configs import SHAREPOINT_EXHAUSTIVE_AD_ENUMERATION
+from onyx.connectors.sharepoint.connector import acquire_token_for_rest
 from onyx.connectors.sharepoint.connector import SharepointConnector
 from onyx.db.models import ConnectorCredentialPair
 from onyx.utils.logger import setup_logger
@@ -43,14 +47,27 @@ def sharepoint_group_sync(

    logger.info(f"Processing {len(site_descriptors)} sites for group sync")

-    # Process each site
+    enumerate_all = connector_config.get(
+        "exhaustive_ad_enumeration", SHAREPOINT_EXHAUSTIVE_AD_ENUMERATION
+    )
+
+    msal_app = connector.msal_app
+    sp_tenant_domain = connector.sp_tenant_domain
+    sp_domain_suffix = connector.sharepoint_domain_suffix
    for site_descriptor in site_descriptors:
        logger.debug(f"Processing site: {site_descriptor.url}")

-        ctx = connector._create_rest_client_context(site_descriptor.url)
+        ctx = ClientContext(site_descriptor.url).with_access_token(
+            lambda: acquire_token_for_rest(msal_app, sp_tenant_domain, sp_domain_suffix)
+        )

-        # Get external groups for this site
-        external_groups = get_sharepoint_external_groups(ctx, connector.graph_client)
+        external_groups = get_sharepoint_external_groups(
+            ctx,
+            connector.graph_client,
+            graph_api_base=connector.graph_api_base,
+            get_access_token=connector._get_graph_access_token,
+            enumerate_all_ad_groups=enumerate_all,
+        )

        # Yield each group
        for group in external_groups:
--- a/backend/ee/onyx/external_permissions/sharepoint/permission_utils.py
+++ b/backend/ee/onyx/external_permissions/sharepoint/permission_utils.py
@@ -1,9 +1,13 @@
 import re
+import time
 from collections import deque
+from collections.abc import Callable
+from collections.abc import Generator
 from typing import Any
 from urllib.parse import unquote
 from urllib.parse import urlparse

+import requests as _requests
 from office365.graph_client import GraphClient  # type: ignore[import-untyped]
 from office365.onedrive.driveitems.driveItem import DriveItem  # type: ignore[import-untyped]
 from office365.runtime.client_request import ClientRequestException  # type: ignore
@@ -14,7 +18,10 @@ from pydantic import BaseModel
 from ee.onyx.db.external_perm import ExternalUserGroup
 from onyx.access.models import ExternalAccess
 from onyx.access.utils import build_ext_group_name_for_onyx
+from onyx.configs.app_configs import REQUEST_TIMEOUT_SECONDS
 from onyx.configs.constants import DocumentSource
+from onyx.connectors.sharepoint.connector import GRAPH_API_MAX_RETRIES
+from onyx.connectors.sharepoint.connector import GRAPH_API_RETRYABLE_STATUSES
 from onyx.connectors.sharepoint.connector import SHARED_DOCUMENTS_MAP_REVERSE
 from onyx.connectors.sharepoint.connector import sleep_and_retry
 from onyx.utils.logger import setup_logger
@@ -33,6 +40,70 @@ LIMITED_ACCESS_ROLE_TYPES = [1, 9]
 LIMITED_ACCESS_ROLE_NAMES = ["Limited Access", "Web-Only Limited Access"]


+AD_GROUP_ENUMERATION_THRESHOLD = 100_000
+
+
+def _graph_api_get(
+    url: str,
+    get_access_token: Callable[[], str],
+    params: dict[str, str] | None = None,
+) -> dict[str, Any]:
+    """Authenticated Graph API GET with retry on transient errors."""
+    for attempt in range(GRAPH_API_MAX_RETRIES + 1):
+        access_token = get_access_token()
+        headers = {"Authorization": f"Bearer {access_token}"}
+        try:
+            resp = _requests.get(
+                url, headers=headers, params=params, timeout=REQUEST_TIMEOUT_SECONDS
+            )
+            if (
+                resp.status_code in GRAPH_API_RETRYABLE_STATUSES
+                and attempt < GRAPH_API_MAX_RETRIES
+            ):
+                wait = min(int(resp.headers.get("Retry-After", str(2**attempt))), 60)
+                logger.warning(
+                    f"Graph API {resp.status_code} on attempt {attempt + 1}, "
+                    f"retrying in {wait}s: {url}"
+                )
+                time.sleep(wait)
+                continue
+            resp.raise_for_status()
+            return resp.json()
+        except (_requests.ConnectionError, _requests.Timeout, _requests.HTTPError):
+            if attempt < GRAPH_API_MAX_RETRIES:
+                wait = min(2**attempt, 60)
+                logger.warning(
+                    f"Graph API connection error on attempt {attempt + 1}, "
+                    f"retrying in {wait}s: {url}"
+                )
+                time.sleep(wait)
+                continue
+            raise
+    raise RuntimeError(
+        f"Graph API request failed after {GRAPH_API_MAX_RETRIES + 1} attempts: {url}"
+    )
+
+
+def _iter_graph_collection(
+    initial_url: str,
+    get_access_token: Callable[[], str],
+    params: dict[str, str] | None = None,
+) -> Generator[dict[str, Any], None, None]:
+    """Paginate through a Graph API collection, yielding items one at a time."""
+    url: str | None = initial_url
+    while url:
+        data = _graph_api_get(url, get_access_token, params)
+        params = None
+        yield from data.get("value", [])
+        url = data.get("@odata.nextLink")
+
+
+def _normalize_email(email: str) -> str:
+    if MICROSOFT_DOMAIN in email:
+        return email.replace(MICROSOFT_DOMAIN, "")
+    return email
+
+
 class SharepointGroup(BaseModel):
    model_config = {"frozen": True}

@@ -572,8 +643,65 @@ def get_external_access_from_sharepoint(
    )


+def _enumerate_ad_groups_paginated(
+    get_access_token: Callable[[], str],
+    already_resolved: set[str],
+    graph_api_base: str,
+) -> Generator[ExternalUserGroup, None, None]:
+    """Paginate through all Azure AD groups and yield ExternalUserGroup for each.
+
+    Skips groups whose suffixed name is already in *already_resolved*.
+    Stops early if the number of groups exceeds AD_GROUP_ENUMERATION_THRESHOLD.
+    """
+    groups_url = f"{graph_api_base}/groups"
+    groups_params: dict[str, str] = {"$select": "id,displayName", "$top": "999"}
+    total_groups = 0
+
+    for group_json in _iter_graph_collection(
+        groups_url, get_access_token, groups_params
+    ):
+        group_id: str = group_json.get("id", "")
+        display_name: str = group_json.get("displayName", "")
+        if not group_id or not display_name:
+            continue
+
+        total_groups += 1
+        if total_groups > AD_GROUP_ENUMERATION_THRESHOLD:
+            logger.warning(
+                f"Azure AD group enumeration exceeded {AD_GROUP_ENUMERATION_THRESHOLD} "
+                "groups — stopping to avoid excessive memory/API usage. "
+                "Remaining groups will be resolved from role assignments only."
+            )
+            return
+
+        name = f"{display_name}_{group_id}"
+        if name in already_resolved:
+            continue
+
+        member_emails: list[str] = []
+        members_url = f"{graph_api_base}/groups/{group_id}/members"
+        members_params: dict[str, str] = {
+            "$select": "userPrincipalName,mail",
+            "$top": "999",
+        }
+        for member_json in _iter_graph_collection(
+            members_url, get_access_token, members_params
+        ):
+            email = member_json.get("userPrincipalName") or member_json.get("mail")
+            if email:
+                member_emails.append(_normalize_email(email))
+
+        yield ExternalUserGroup(id=name, user_emails=member_emails)
+
+    logger.info(f"Enumerated {total_groups} Azure AD groups via paginated Graph API")
+
+
 def get_sharepoint_external_groups(
-    client_context: ClientContext, graph_client: GraphClient
+    client_context: ClientContext,
+    graph_client: GraphClient,
+    graph_api_base: str,
+    get_access_token: Callable[[], str] | None = None,
+    enumerate_all_ad_groups: bool = False,
 ) -> list[ExternalUserGroup]:

    groups: set[SharepointGroup] = set()
@@ -629,57 +757,22 @@ def get_sharepoint_external_groups(
        client_context, graph_client, groups, is_group_sync=True
    )

-    # get all Azure AD groups because if any group is assigned to the drive item, we don't want to miss them
-    # We can't assign sharepoint groups to drive items or drives, so we don't need to get all sharepoint groups
-    azure_ad_groups = sleep_and_retry(
-        graph_client.groups.get_all(page_loaded=lambda _: None),
-        "get_sharepoint_external_groups:get_azure_ad_groups",
-    )
-    logger.info(f"Azure AD Groups: {len(azure_ad_groups)}")
-    identified_groups: set[str] = set(groups_and_members.groups_to_emails.keys())
-    ad_groups_to_emails: dict[str, set[str]] = {}
-    for group in azure_ad_groups:
-        # If the group is already identified, we don't need to get the members
-        if group.display_name in identified_groups:
-            continue
-        # AD groups allows same display name for multiple groups, so we need to add the GUID to the name
-        name = group.display_name
-        name = _get_group_name_with_suffix(group.id, name, graph_client)
+    external_user_groups: list[ExternalUserGroup] = [
+        ExternalUserGroup(id=group_name, user_emails=list(emails))
+        for group_name, emails in groups_and_members.groups_to_emails.items()
+    ]

-        members = sleep_and_retry(
-            group.members.get_all(page_loaded=lambda _: None),
-            "get_sharepoint_external_groups:get_azure_ad_groups:get_members",
+    if not enumerate_all_ad_groups or get_access_token is None:
+        logger.info(
+            "Skipping exhaustive Azure AD group enumeration. "
+            "Only groups found in site role assignments are included."
        )
-        for member in members:
-            member_data = member.to_json()
-            user_principal_name = member_data.get("userPrincipalName")
-            mail = member_data.get("mail")
-            if not ad_groups_to_emails.get(name):
-                ad_groups_to_emails[name] = set()
-            if user_principal_name:
-                if MICROSOFT_DOMAIN in user_principal_name:
-                    user_principal_name = user_principal_name.replace(
-                        MICROSOFT_DOMAIN, ""
-                    )
-                ad_groups_to_emails[name].add(user_principal_name)
-            elif mail:
-                if MICROSOFT_DOMAIN in mail:
-                    mail = mail.replace(MICROSOFT_DOMAIN, "")
-                ad_groups_to_emails[name].add(mail)
+        return external_user_groups

-    external_user_groups: list[ExternalUserGroup] = []
-    for group_name, emails in groups_and_members.groups_to_emails.items():
-        external_user_group = ExternalUserGroup(
-            id=group_name,
-            user_emails=list(emails),
-        )
-        external_user_groups.append(external_user_group)
-
-    for group_name, emails in ad_groups_to_emails.items():
-        external_user_group = ExternalUserGroup(
-            id=group_name,
-            user_emails=list(emails),
-        )
-        external_user_groups.append(external_user_group)
+    already_resolved = set(groups_and_members.groups_to_emails.keys())
+    for group in _enumerate_ad_groups_paginated(
+        get_access_token, already_resolved, graph_api_base
+    ):
+        external_user_groups.append(group)

    return external_user_groups
--- a/backend/ee/onyx/main.py
+++ b/backend/ee/onyx/main.py
@@ -31,6 +31,7 @@ from ee.onyx.server.query_and_chat.query_backend import (
 from ee.onyx.server.query_and_chat.search_backend import router as search_router
 from ee.onyx.server.query_history.api import router as query_history_router
 from ee.onyx.server.reporting.usage_export_api import router as usage_export_router
+from ee.onyx.server.scim.api import scim_router
 from ee.onyx.server.seeding import seed_db
 from ee.onyx.server.tenants.api import router as tenants_router
 from ee.onyx.server.token_rate_limits.api import (
@@ -162,6 +163,11 @@ def get_application() -> FastAPI:
        # Tenant management
        include_router_with_global_prefix_prepended(application, tenants_router)

+    # SCIM 2.0 — protocol endpoints (unauthenticated by Onyx session auth;
+    # they use their own SCIM bearer token auth).
+    # Not behind APP_API_PREFIX because IdPs expect /scim/v2/... directly.
+    application.include_router(scim_router)
+
    # Ensure all routes have auth enabled or are explicitly marked as public
    check_ee_router_auth(application)

--- a/backend/ee/onyx/server/auth_check.py
+++ b/backend/ee/onyx/server/auth_check.py
@@ -5,6 +5,11 @@ from onyx.server.auth_check import PUBLIC_ENDPOINT_SPECS


 EE_PUBLIC_ENDPOINT_SPECS = PUBLIC_ENDPOINT_SPECS + [
+    # SCIM 2.0 service discovery — unauthenticated so IdPs can probe
+    # before bearer token configuration is complete
+    ("/scim/v2/ServiceProviderConfig", {"GET"}),
+    ("/scim/v2/ResourceTypes", {"GET"}),
+    ("/scim/v2/Schemas", {"GET"}),
    # needs to be accessible prior to user login
    ("/enterprise-settings", {"GET"}),
    ("/enterprise-settings/logo", {"GET"}),
--- a/backend/ee/onyx/server/enterprise_settings/api.py
+++ b/backend/ee/onyx/server/enterprise_settings/api.py
@@ -13,6 +13,7 @@ from pydantic import BaseModel
 from pydantic import Field
 from sqlalchemy.orm import Session

+from ee.onyx.db.scim import ScimDAL
 from ee.onyx.server.enterprise_settings.models import AnalyticsScriptUpload
 from ee.onyx.server.enterprise_settings.models import EnterpriseSettings
 from ee.onyx.server.enterprise_settings.store import get_logo_filename
@@ -22,6 +23,10 @@ from ee.onyx.server.enterprise_settings.store import load_settings
 from ee.onyx.server.enterprise_settings.store import store_analytics_script
 from ee.onyx.server.enterprise_settings.store import store_settings
 from ee.onyx.server.enterprise_settings.store import upload_logo
+from ee.onyx.server.scim.auth import generate_scim_token
+from ee.onyx.server.scim.models import ScimTokenCreate
+from ee.onyx.server.scim.models import ScimTokenCreatedResponse
+from ee.onyx.server.scim.models import ScimTokenResponse
 from onyx.auth.users import current_admin_user
 from onyx.auth.users import current_user_with_expired_token
 from onyx.auth.users import get_user_manager
@@ -198,3 +203,63 @@ def upload_custom_analytics_script(
@basic_router.get("/custom-analytics-script")
 def fetch_custom_analytics_script() -> str | None:
    return load_analytics_script()
+
+
+# ---------------------------------------------------------------------------
+# SCIM token management
+# ---------------------------------------------------------------------------
+
+
+def _get_scim_dal(db_session: Session = Depends(get_session)) -> ScimDAL:
+    return ScimDAL(db_session)
+
+
+@admin_router.get("/scim/token")
+def get_active_scim_token(
+    _: User = Depends(current_admin_user),
+    dal: ScimDAL = Depends(_get_scim_dal),
+) -> ScimTokenResponse:
+    """Return the currently active SCIM token's metadata, or 404 if none."""
+    token = dal.get_active_token()
+    if not token:
+        raise HTTPException(status_code=404, detail="No active SCIM token")
+    return ScimTokenResponse(
+        id=token.id,
+        name=token.name,
+        token_display=token.token_display,
+        is_active=token.is_active,
+        created_at=token.created_at,
+        last_used_at=token.last_used_at,
+    )
+
+
+@admin_router.post("/scim/token", status_code=201)
+def create_scim_token(
+    body: ScimTokenCreate,
+    user: User = Depends(current_admin_user),
+    dal: ScimDAL = Depends(_get_scim_dal),
+) -> ScimTokenCreatedResponse:
+    """Create a new SCIM bearer token.
+
+    Only one token is active at a time — creating a new token automatically
+    revokes all previous tokens. The raw token value is returned exactly once
+    in the response; it cannot be retrieved again.
+    """
+    raw_token, hashed_token, token_display = generate_scim_token()
+    token = dal.create_token(
+        name=body.name,
+        hashed_token=hashed_token,
+        token_display=token_display,
+        created_by_id=user.id,
+    )
+    dal.commit()
+
+    return ScimTokenCreatedResponse(
+        id=token.id,
+        name=token.name,
+        token_display=token.token_display,
+        is_active=token.is_active,
+        created_at=token.created_at,
+        last_used_at=token.last_used_at,
+        raw_token=raw_token,
+    )
--- a/backend/ee/onyx/server/query_and_chat/models.py
+++ b/backend/ee/onyx/server/query_and_chat/models.py
@@ -34,7 +34,7 @@ class SendSearchQueryRequest(BaseModel):
    filters: BaseFilters | None = None
    num_docs_fed_to_llm_selection: int | None = None
    run_query_expansion: bool = False
-    num_hits: int = 50
+    num_hits: int = 30

    include_content: bool = False
    stream: bool = False
--- a/backend/ee/onyx/server/scim/api.py
+++ b/backend/ee/onyx/server/scim/api.py
@@ -0,0 +1,722 @@
+"""SCIM 2.0 API endpoints (RFC 7644).
+
+This module provides the FastAPI router for SCIM service discovery,
+User CRUD, and Group CRUD. Identity providers (Okta, Azure AD) call
+these endpoints to provision and manage users and groups.
+
+Service discovery endpoints are unauthenticated — IdPs may probe them
+before bearer token configuration is complete. All other endpoints
+require a valid SCIM bearer token.
+"""
+
+from __future__ import annotations
+
+from uuid import UUID
+
+from fastapi import APIRouter
+from fastapi import Depends
+from fastapi import Query
+from fastapi import Response
+from fastapi.responses import JSONResponse
+from fastapi_users.password import PasswordHelper
+from sqlalchemy import func
+from sqlalchemy.exc import IntegrityError
+from sqlalchemy.orm import Session
+
+from ee.onyx.db.scim import ScimDAL
+from ee.onyx.server.scim.auth import verify_scim_token
+from ee.onyx.server.scim.filtering import parse_scim_filter
+from ee.onyx.server.scim.models import ScimError
+from ee.onyx.server.scim.models import ScimGroupMember
+from ee.onyx.server.scim.models import ScimGroupResource
+from ee.onyx.server.scim.models import ScimListResponse
+from ee.onyx.server.scim.models import ScimName
+from ee.onyx.server.scim.models import ScimPatchRequest
+from ee.onyx.server.scim.models import ScimResourceType
+from ee.onyx.server.scim.models import ScimSchemaDefinition
+from ee.onyx.server.scim.models import ScimServiceProviderConfig
+from ee.onyx.server.scim.models import ScimUserResource
+from ee.onyx.server.scim.patch import apply_group_patch
+from ee.onyx.server.scim.patch import apply_user_patch
+from ee.onyx.server.scim.patch import ScimPatchError
+from ee.onyx.server.scim.providers.base import get_default_provider
+from ee.onyx.server.scim.providers.base import ScimProvider
+from ee.onyx.server.scim.schema_definitions import GROUP_RESOURCE_TYPE
+from ee.onyx.server.scim.schema_definitions import GROUP_SCHEMA_DEF
+from ee.onyx.server.scim.schema_definitions import SERVICE_PROVIDER_CONFIG
+from ee.onyx.server.scim.schema_definitions import USER_RESOURCE_TYPE
+from ee.onyx.server.scim.schema_definitions import USER_SCHEMA_DEF
+from onyx.db.engine.sql_engine import get_session
+from onyx.db.models import ScimToken
+from onyx.db.models import User
+from onyx.db.models import UserGroup
+from onyx.db.models import UserRole
+from onyx.utils.variable_functionality import fetch_ee_implementation_or_noop
+
+# NOTE: All URL paths in this router (/ServiceProviderConfig, /ResourceTypes,
+# /Schemas, /Users, /Groups) are mandated by the SCIM spec (RFC 7643/7644).
+# IdPs like Okta and Azure AD hardcode these exact paths, so they cannot be
+# changed to kebab-case.
+scim_router = APIRouter(prefix="/scim/v2", tags=["SCIM"])
+
+_pw_helper = PasswordHelper()
+
+
+def _get_provider(
+    _token: ScimToken = Depends(verify_scim_token),
+) -> ScimProvider:
+    """Resolve the SCIM provider for the current request.
+
+    Currently returns OktaProvider for all requests. When multi-provider
+    support is added (ENG-3652), this will resolve based on token metadata
+    or tenant configuration — no endpoint changes required.
+    """
+    return get_default_provider()
+
+
+# ---------------------------------------------------------------------------
+# Service Discovery Endpoints (unauthenticated)
+# ---------------------------------------------------------------------------
+
+
+@scim_router.get("/ServiceProviderConfig")
+def get_service_provider_config() -> ScimServiceProviderConfig:
+    """Advertise supported SCIM features (RFC 7643 §5)."""
+    return SERVICE_PROVIDER_CONFIG
+
+
+@scim_router.get("/ResourceTypes")
+def get_resource_types() -> list[ScimResourceType]:
+    """List available SCIM resource types (RFC 7643 §6)."""
+    return [USER_RESOURCE_TYPE, GROUP_RESOURCE_TYPE]
+
+
+@scim_router.get("/Schemas")
+def get_schemas() -> list[ScimSchemaDefinition]:
+    """Return SCIM schema definitions (RFC 7643 §7)."""
+    return [USER_SCHEMA_DEF, GROUP_SCHEMA_DEF]
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _scim_error_response(status: int, detail: str) -> JSONResponse:
+    """Build a SCIM-compliant error response (RFC 7644 §3.12)."""
+    body = ScimError(status=str(status), detail=detail)
+    return JSONResponse(
+        status_code=status,
+        content=body.model_dump(exclude_none=True),
+    )
+
+
+def _check_seat_availability(dal: ScimDAL) -> str | None:
+    """Return an error message if seat limit is reached, else None."""
+    check_fn = fetch_ee_implementation_or_noop(
+        "onyx.db.license", "check_seat_availability", None
+    )
+    if check_fn is None:
+        return None
+    result = check_fn(dal.session, seats_needed=1)
+    if not result.available:
+        return result.error_message or "Seat limit reached"
+    return None
+
+
+def _fetch_user_or_404(user_id: str, dal: ScimDAL) -> User | JSONResponse:
+    """Parse *user_id* as UUID, look up the user, or return a 404 error."""
+    try:
+        uid = UUID(user_id)
+    except ValueError:
+        return _scim_error_response(404, f"User {user_id} not found")
+    user = dal.get_user(uid)
+    if not user:
+        return _scim_error_response(404, f"User {user_id} not found")
+    return user
+
+
+def _scim_name_to_str(name: ScimName | None) -> str | None:
+    """Extract a display name string from a SCIM name object.
+
+    Returns None if no name is provided, so the caller can decide
+    whether to update the user's personal_name.
+    """
+    if not name:
+        return None
+    # Build from givenName/familyName first — IdPs like Okta may send a stale
+    # ``formatted`` value while updating the individual name components.
+    parts = " ".join(part for part in [name.givenName, name.familyName] if part)
+    return parts or name.formatted
+
+
+# ---------------------------------------------------------------------------
+# User CRUD (RFC 7644 §3)
+# ---------------------------------------------------------------------------
+
+
+@scim_router.get("/Users", response_model=None)
+def list_users(
+    filter: str | None = Query(None),
+    startIndex: int = Query(1, ge=1),
+    count: int = Query(100, ge=0, le=500),
+    _token: ScimToken = Depends(verify_scim_token),
+    provider: ScimProvider = Depends(_get_provider),
+    db_session: Session = Depends(get_session),
+) -> ScimListResponse | JSONResponse:
+    """List users with optional SCIM filter and pagination."""
+    dal = ScimDAL(db_session)
+    dal.update_token_last_used(_token.id)
+
+    try:
+        scim_filter = parse_scim_filter(filter)
+    except ValueError as e:
+        return _scim_error_response(400, str(e))
+
+    try:
+        users_with_mappings, total = dal.list_users(scim_filter, startIndex, count)
+    except ValueError as e:
+        return _scim_error_response(400, str(e))
+
+    user_groups_map = dal.get_users_groups_batch([u.id for u, _ in users_with_mappings])
+    resources: list[ScimUserResource | ScimGroupResource] = [
+        provider.build_user_resource(
+            user,
+            mapping.external_id if mapping else None,
+            groups=user_groups_map.get(user.id, []),
+            scim_username=mapping.scim_username if mapping else None,
+        )
+        for user, mapping in users_with_mappings
+    ]
+
+    return ScimListResponse(
+        totalResults=total,
+        startIndex=startIndex,
+        itemsPerPage=count,
+        Resources=resources,
+    )
+
+
+@scim_router.get("/Users/{user_id}", response_model=None)
+def get_user(
+    user_id: str,
+    _token: ScimToken = Depends(verify_scim_token),
+    provider: ScimProvider = Depends(_get_provider),
+    db_session: Session = Depends(get_session),
+) -> ScimUserResource | JSONResponse:
+    """Get a single user by ID."""
+    dal = ScimDAL(db_session)
+    dal.update_token_last_used(_token.id)
+
+    result = _fetch_user_or_404(user_id, dal)
+    if isinstance(result, JSONResponse):
+        return result
+    user = result
+
+    mapping = dal.get_user_mapping_by_user_id(user.id)
+    return provider.build_user_resource(
+        user,
+        mapping.external_id if mapping else None,
+        groups=dal.get_user_groups(user.id),
+        scim_username=mapping.scim_username if mapping else None,
+    )
+
+
+@scim_router.post("/Users", status_code=201, response_model=None)
+def create_user(
+    user_resource: ScimUserResource,
+    _token: ScimToken = Depends(verify_scim_token),
+    provider: ScimProvider = Depends(_get_provider),
+    db_session: Session = Depends(get_session),
+) -> ScimUserResource | JSONResponse:
+    """Create a new user from a SCIM provisioning request."""
+    dal = ScimDAL(db_session)
+    dal.update_token_last_used(_token.id)
+
+    email = user_resource.userName.strip()
+
+    # externalId is how the IdP correlates this user on subsequent requests.
+    # Without it, the IdP can't find the user and will try to re-create,
+    # hitting a 409 conflict — so we require it up front.
+    if not user_resource.externalId:
+        return _scim_error_response(400, "externalId is required")
+
+    # Enforce seat limit
+    seat_error = _check_seat_availability(dal)
+    if seat_error:
+        return _scim_error_response(403, seat_error)
+
+    # Check for existing user
+    if dal.get_user_by_email(email):
+        return _scim_error_response(409, f"User with email {email} already exists")
+
+    # Create user with a random password (SCIM users authenticate via IdP)
+    personal_name = _scim_name_to_str(user_resource.name)
+    user = User(
+        email=email,
+        hashed_password=_pw_helper.hash(_pw_helper.generate()),
+        role=UserRole.BASIC,
+        is_active=user_resource.active,
+        is_verified=True,
+        personal_name=personal_name,
+    )
+
+    try:
+        dal.add_user(user)
+    except IntegrityError:
+        dal.rollback()
+        return _scim_error_response(409, f"User with email {email} already exists")
+
+    # Create SCIM mapping (externalId is validated above, always present)
+    external_id = user_resource.externalId
+    scim_username = user_resource.userName.strip()
+    dal.create_user_mapping(
+        external_id=external_id, user_id=user.id, scim_username=scim_username
+    )
+
+    dal.commit()
+
+    return provider.build_user_resource(user, external_id, scim_username=scim_username)
+
+
+@scim_router.put("/Users/{user_id}", response_model=None)
+def replace_user(
+    user_id: str,
+    user_resource: ScimUserResource,
+    _token: ScimToken = Depends(verify_scim_token),
+    provider: ScimProvider = Depends(_get_provider),
+    db_session: Session = Depends(get_session),
+) -> ScimUserResource | JSONResponse:
+    """Replace a user entirely (RFC 7644 §3.5.1)."""
+    dal = ScimDAL(db_session)
+    dal.update_token_last_used(_token.id)
+
+    result = _fetch_user_or_404(user_id, dal)
+    if isinstance(result, JSONResponse):
+        return result
+    user = result
+
+    # Handle activation (need seat check) / deactivation
+    if user_resource.active and not user.is_active:
+        seat_error = _check_seat_availability(dal)
+        if seat_error:
+            return _scim_error_response(403, seat_error)
+
+    personal_name = _scim_name_to_str(user_resource.name)
+
+    dal.update_user(
+        user,
+        email=user_resource.userName.strip(),
+        is_active=user_resource.active,
+        personal_name=personal_name,
+    )
+
+    new_external_id = user_resource.externalId
+    scim_username = user_resource.userName.strip()
+    dal.sync_user_external_id(user.id, new_external_id, scim_username=scim_username)
+
+    dal.commit()
+
+    return provider.build_user_resource(
+        user,
+        new_external_id,
+        groups=dal.get_user_groups(user.id),
+        scim_username=scim_username,
+    )
+
+
+@scim_router.patch("/Users/{user_id}", response_model=None)
+def patch_user(
+    user_id: str,
+    patch_request: ScimPatchRequest,
+    _token: ScimToken = Depends(verify_scim_token),
+    provider: ScimProvider = Depends(_get_provider),
+    db_session: Session = Depends(get_session),
+) -> ScimUserResource | JSONResponse:
+    """Partially update a user (RFC 7644 §3.5.2).
+
+    This is the primary endpoint for user deprovisioning — Okta sends
+    ``PATCH {"active": false}`` rather than DELETE.
+    """
+    dal = ScimDAL(db_session)
+    dal.update_token_last_used(_token.id)
+
+    result = _fetch_user_or_404(user_id, dal)
+    if isinstance(result, JSONResponse):
+        return result
+    user = result
+
+    mapping = dal.get_user_mapping_by_user_id(user.id)
+    external_id = mapping.external_id if mapping else None
+    current_scim_username = mapping.scim_username if mapping else None
+
+    current = provider.build_user_resource(
+        user,
+        external_id,
+        groups=dal.get_user_groups(user.id),
+        scim_username=current_scim_username,
+    )
+
+    try:
+        patched = apply_user_patch(
+            patch_request.Operations, current, provider.ignored_patch_paths
+        )
+    except ScimPatchError as e:
+        return _scim_error_response(e.status, e.detail)
+
+    # Apply changes back to the DB model
+    if patched.active != user.is_active:
+        if patched.active:
+            seat_error = _check_seat_availability(dal)
+            if seat_error:
+                return _scim_error_response(403, seat_error)
+
+    # Track the scim_username — if userName was patched, update it
+    new_scim_username = patched.userName.strip() if patched.userName else None
+
+    # If displayName was explicitly patched (different from the original), use
+    # it as personal_name directly.  Otherwise, derive from name components.
+    personal_name: str | None
+    if patched.displayName and patched.displayName != current.displayName:
+        personal_name = patched.displayName
+    else:
+        personal_name = _scim_name_to_str(patched.name)
+
+    dal.update_user(
+        user,
+        email=(
+            patched.userName.strip()
+            if patched.userName.strip().lower() != user.email.lower()
+            else None
+        ),
+        is_active=patched.active if patched.active != user.is_active else None,
+        personal_name=personal_name,
+    )
+
+    dal.sync_user_external_id(
+        user.id, patched.externalId, scim_username=new_scim_username
+    )
+
+    dal.commit()
+
+    return provider.build_user_resource(
+        user,
+        patched.externalId,
+        groups=dal.get_user_groups(user.id),
+        scim_username=new_scim_username,
+    )
+
+
+@scim_router.delete("/Users/{user_id}", status_code=204, response_model=None)
+def delete_user(
+    user_id: str,
+    _token: ScimToken = Depends(verify_scim_token),
+    db_session: Session = Depends(get_session),
+) -> Response | JSONResponse:
+    """Delete a user (RFC 7644 §3.6).
+
+    Deactivates the user and removes the SCIM mapping. Note that Okta
+    typically uses PATCH active=false instead of DELETE.
+    """
+    dal = ScimDAL(db_session)
+    dal.update_token_last_used(_token.id)
+
+    result = _fetch_user_or_404(user_id, dal)
+    if isinstance(result, JSONResponse):
+        return result
+    user = result
+
+    dal.deactivate_user(user)
+
+    mapping = dal.get_user_mapping_by_user_id(user.id)
+    if mapping:
+        dal.delete_user_mapping(mapping.id)
+
+    dal.commit()
+
+    return Response(status_code=204)
+
+
+# ---------------------------------------------------------------------------
+# Group helpers
+# ---------------------------------------------------------------------------
+
+
+def _fetch_group_or_404(group_id: str, dal: ScimDAL) -> UserGroup | JSONResponse:
+    """Parse *group_id* as int, look up the group, or return a 404 error."""
+    try:
+        gid = int(group_id)
+    except ValueError:
+        return _scim_error_response(404, f"Group {group_id} not found")
+    group = dal.get_group(gid)
+    if not group:
+        return _scim_error_response(404, f"Group {group_id} not found")
+    return group
+
+
+def _parse_member_uuids(
+    members: list[ScimGroupMember],
+) -> tuple[list[UUID], str | None]:
+    """Parse member value strings to UUIDs.
+
+    Returns (uuid_list, error_message). error_message is None on success.
+    """
+    uuids: list[UUID] = []
+    for m in members:
+        try:
+            uuids.append(UUID(m.value))
+        except ValueError:
+            return [], f"Invalid member ID: {m.value}"
+    return uuids, None
+
+
+def _validate_and_parse_members(
+    members: list[ScimGroupMember], dal: ScimDAL
+) -> tuple[list[UUID], str | None]:
+    """Parse and validate member UUIDs exist in the database.
+
+    Returns (uuid_list, error_message). error_message is None on success.
+    """
+    uuids, err = _parse_member_uuids(members)
+    if err:
+        return [], err
+
+    if uuids:
+        missing = dal.validate_member_ids(uuids)
+        if missing:
+            return [], f"Member(s) not found: {', '.join(str(u) for u in missing)}"
+
+    return uuids, None
+
+
+# ---------------------------------------------------------------------------
+# Group CRUD (RFC 7644 §3)
+# ---------------------------------------------------------------------------
+
+
+@scim_router.get("/Groups", response_model=None)
+def list_groups(
+    filter: str | None = Query(None),
+    startIndex: int = Query(1, ge=1),
+    count: int = Query(100, ge=0, le=500),
+    _token: ScimToken = Depends(verify_scim_token),
+    provider: ScimProvider = Depends(_get_provider),
+    db_session: Session = Depends(get_session),
+) -> ScimListResponse | JSONResponse:
+    """List groups with optional SCIM filter and pagination."""
+    dal = ScimDAL(db_session)
+    dal.update_token_last_used(_token.id)
+
+    try:
+        scim_filter = parse_scim_filter(filter)
+    except ValueError as e:
+        return _scim_error_response(400, str(e))
+
+    try:
+        groups_with_ext_ids, total = dal.list_groups(scim_filter, startIndex, count)
+    except ValueError as e:
+        return _scim_error_response(400, str(e))
+
+    resources: list[ScimUserResource | ScimGroupResource] = [
+        provider.build_group_resource(group, dal.get_group_members(group.id), ext_id)
+        for group, ext_id in groups_with_ext_ids
+    ]
+
+    return ScimListResponse(
+        totalResults=total,
+        startIndex=startIndex,
+        itemsPerPage=count,
+        Resources=resources,
+    )
+
+
+@scim_router.get("/Groups/{group_id}", response_model=None)
+def get_group(
+    group_id: str,
+    _token: ScimToken = Depends(verify_scim_token),
+    provider: ScimProvider = Depends(_get_provider),
+    db_session: Session = Depends(get_session),
+) -> ScimGroupResource | JSONResponse:
+    """Get a single group by ID."""
+    dal = ScimDAL(db_session)
+    dal.update_token_last_used(_token.id)
+
+    result = _fetch_group_or_404(group_id, dal)
+    if isinstance(result, JSONResponse):
+        return result
+    group = result
+
+    mapping = dal.get_group_mapping_by_group_id(group.id)
+    members = dal.get_group_members(group.id)
+
+    return provider.build_group_resource(
+        group, members, mapping.external_id if mapping else None
+    )
+
+
+@scim_router.post("/Groups", status_code=201, response_model=None)
+def create_group(
+    group_resource: ScimGroupResource,
+    _token: ScimToken = Depends(verify_scim_token),
+    provider: ScimProvider = Depends(_get_provider),
+    db_session: Session = Depends(get_session),
+) -> ScimGroupResource | JSONResponse:
+    """Create a new group from a SCIM provisioning request."""
+    dal = ScimDAL(db_session)
+    dal.update_token_last_used(_token.id)
+
+    if dal.get_group_by_name(group_resource.displayName):
+        return _scim_error_response(
+            409, f"Group with name '{group_resource.displayName}' already exists"
+        )
+
+    member_uuids, err = _validate_and_parse_members(group_resource.members, dal)
+    if err:
+        return _scim_error_response(400, err)
+
+    db_group = UserGroup(
+        name=group_resource.displayName,
+        is_up_to_date=True,
+        time_last_modified_by_user=func.now(),
+    )
+    try:
+        dal.add_group(db_group)
+    except IntegrityError:
+        dal.rollback()
+        return _scim_error_response(
+            409, f"Group with name '{group_resource.displayName}' already exists"
+        )
+
+    dal.upsert_group_members(db_group.id, member_uuids)
+
+    external_id = group_resource.externalId
+    if external_id:
+        dal.create_group_mapping(external_id=external_id, user_group_id=db_group.id)
+
+    dal.commit()
+
+    members = dal.get_group_members(db_group.id)
+    return provider.build_group_resource(db_group, members, external_id)
+
+
+@scim_router.put("/Groups/{group_id}", response_model=None)
+def replace_group(
+    group_id: str,
+    group_resource: ScimGroupResource,
+    _token: ScimToken = Depends(verify_scim_token),
+    provider: ScimProvider = Depends(_get_provider),
+    db_session: Session = Depends(get_session),
+) -> ScimGroupResource | JSONResponse:
+    """Replace a group entirely (RFC 7644 §3.5.1)."""
+    dal = ScimDAL(db_session)
+    dal.update_token_last_used(_token.id)
+
+    result = _fetch_group_or_404(group_id, dal)
+    if isinstance(result, JSONResponse):
+        return result
+    group = result
+
+    member_uuids, err = _validate_and_parse_members(group_resource.members, dal)
+    if err:
+        return _scim_error_response(400, err)
+
+    dal.update_group(group, name=group_resource.displayName)
+    dal.replace_group_members(group.id, member_uuids)
+    dal.sync_group_external_id(group.id, group_resource.externalId)
+
+    dal.commit()
+
+    members = dal.get_group_members(group.id)
+    return provider.build_group_resource(group, members, group_resource.externalId)
+
+
+@scim_router.patch("/Groups/{group_id}", response_model=None)
+def patch_group(
+    group_id: str,
+    patch_request: ScimPatchRequest,
+    _token: ScimToken = Depends(verify_scim_token),
+    provider: ScimProvider = Depends(_get_provider),
+    db_session: Session = Depends(get_session),
+) -> ScimGroupResource | JSONResponse:
+    """Partially update a group (RFC 7644 §3.5.2).
+
+    Handles member add/remove operations from Okta and Azure AD.
+    """
+    dal = ScimDAL(db_session)
+    dal.update_token_last_used(_token.id)
+
+    result = _fetch_group_or_404(group_id, dal)
+    if isinstance(result, JSONResponse):
+        return result
+    group = result
+
+    mapping = dal.get_group_mapping_by_group_id(group.id)
+    external_id = mapping.external_id if mapping else None
+
+    current_members = dal.get_group_members(group.id)
+    current = provider.build_group_resource(group, current_members, external_id)
+
+    try:
+        patched, added_ids, removed_ids = apply_group_patch(
+            patch_request.Operations, current, provider.ignored_patch_paths
+        )
+    except ScimPatchError as e:
+        return _scim_error_response(e.status, e.detail)
+
+    new_name = patched.displayName if patched.displayName != group.name else None
+    dal.update_group(group, name=new_name)
+
+    if added_ids:
+        add_uuids = [UUID(mid) for mid in added_ids if _is_valid_uuid(mid)]
+        if add_uuids:
+            missing = dal.validate_member_ids(add_uuids)
+            if missing:
+                return _scim_error_response(
+                    400,
+                    f"Member(s) not found: {', '.join(str(u) for u in missing)}",
+                )
+            dal.upsert_group_members(group.id, add_uuids)
+
+    if removed_ids:
+        remove_uuids = [UUID(mid) for mid in removed_ids if _is_valid_uuid(mid)]
+        dal.remove_group_members(group.id, remove_uuids)
+
+    dal.sync_group_external_id(group.id, patched.externalId)
+    dal.commit()
+
+    members = dal.get_group_members(group.id)
+    return provider.build_group_resource(group, members, patched.externalId)
+
+
+@scim_router.delete("/Groups/{group_id}", status_code=204, response_model=None)
+def delete_group(
+    group_id: str,
+    _token: ScimToken = Depends(verify_scim_token),
+    db_session: Session = Depends(get_session),
+) -> Response | JSONResponse:
+    """Delete a group (RFC 7644 §3.6)."""
+    dal = ScimDAL(db_session)
+    dal.update_token_last_used(_token.id)
+
+    result = _fetch_group_or_404(group_id, dal)
+    if isinstance(result, JSONResponse):
+        return result
+    group = result
+
+    mapping = dal.get_group_mapping_by_group_id(group.id)
+    if mapping:
+        dal.delete_group_mapping(mapping.id)
+
+    dal.delete_group_with_members(group)
+    dal.commit()
+
+    return Response(status_code=204)
+
+
+def _is_valid_uuid(value: str) -> bool:
+    """Check if a string is a valid UUID."""
+    try:
+        UUID(value)
+        return True
+    except ValueError:
+        return False
--- a/backend/ee/onyx/server/scim/auth.py
+++ b/backend/ee/onyx/server/scim/auth.py
@@ -0,0 +1,104 @@
+"""SCIM bearer token authentication.
+
+SCIM endpoints are authenticated via bearer tokens that admins create in the
+Onyx UI. This module provides:
+
+  - ``verify_scim_token``: FastAPI dependency that extracts, hashes, and
+    validates the token from the Authorization header.
+  - ``generate_scim_token``: Creates a new cryptographically random token
+    and returns the raw value, its SHA-256 hash, and a display suffix.
+
+Token format: ``onyx_scim_<random>`` where ``<random>`` is 48 bytes of
+URL-safe base64 from ``secrets.token_urlsafe``.
+
+The hash is stored in the ``scim_token`` table; the raw value is shown to
+the admin exactly once at creation time.
+"""
+
+import hashlib
+import secrets
+
+from fastapi import Depends
+from fastapi import HTTPException
+from fastapi import Request
+from sqlalchemy.orm import Session
+
+from ee.onyx.db.scim import ScimDAL
+from onyx.auth.utils import get_hashed_bearer_token_from_request
+from onyx.db.engine.sql_engine import get_session
+from onyx.db.models import ScimToken
+
+SCIM_TOKEN_PREFIX = "onyx_scim_"
+SCIM_TOKEN_LENGTH = 48
+
+
+def _hash_scim_token(token: str) -> str:
+    """SHA-256 hash a SCIM token. No salt needed — tokens are random."""
+    return hashlib.sha256(token.encode("utf-8")).hexdigest()
+
+
+def generate_scim_token() -> tuple[str, str, str]:
+    """Generate a new SCIM bearer token.
+
+    Returns:
+        A tuple of ``(raw_token, hashed_token, token_display)`` where
+        ``token_display`` is a masked version showing only the last 4 chars.
+    """
+    raw_token = SCIM_TOKEN_PREFIX + secrets.token_urlsafe(SCIM_TOKEN_LENGTH)
+    hashed_token = _hash_scim_token(raw_token)
+    token_display = SCIM_TOKEN_PREFIX + "****" + raw_token[-4:]
+    return raw_token, hashed_token, token_display
+
+
+def _get_hashed_scim_token_from_request(request: Request) -> str | None:
+    """Extract and hash a SCIM token from the request Authorization header."""
+    return get_hashed_bearer_token_from_request(
+        request,
+        valid_prefixes=[SCIM_TOKEN_PREFIX],
+        hash_fn=_hash_scim_token,
+    )
+
+
+def _get_scim_dal(db_session: Session = Depends(get_session)) -> ScimDAL:
+    return ScimDAL(db_session)
+
+
+def verify_scim_token(
+    request: Request,
+    dal: ScimDAL = Depends(_get_scim_dal),
+) -> ScimToken:
+    """FastAPI dependency that authenticates SCIM requests.
+
+    Extracts the bearer token from the Authorization header, hashes it,
+    looks it up in the database, and verifies it is active.
+
+    Note:
+        This dependency does NOT update ``last_used_at`` — the endpoint
+        should do that via ``ScimDAL.update_token_last_used()`` so the
+        timestamp write is part of the endpoint's transaction.
+
+    Raises:
+        HTTPException(401): If the token is missing, invalid, or inactive.
+    """
+    hashed = _get_hashed_scim_token_from_request(request)
+    if not hashed:
+        raise HTTPException(
+            status_code=401,
+            detail="Missing or invalid SCIM bearer token",
+        )
+
+    token = dal.get_token_by_hash(hashed)
+
+    if not token:
+        raise HTTPException(
+            status_code=401,
+            detail="Invalid SCIM bearer token",
+        )
+
+    if not token.is_active:
+        raise HTTPException(
+            status_code=401,
+            detail="SCIM token has been revoked",
+        )
+
+    return token
--- a/backend/ee/onyx/server/scim/models.py
+++ b/backend/ee/onyx/server/scim/models.py
@@ -30,6 +30,7 @@ SCIM_SERVICE_PROVIDER_CONFIG_SCHEMA = (
    "urn:ietf:params:scim:schemas:core:2.0:ServiceProviderConfig"
 )
 SCIM_RESOURCE_TYPE_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:ResourceType"
+SCIM_SCHEMA_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:Schema"


 # ---------------------------------------------------------------------------
@@ -62,6 +63,13 @@ class ScimMeta(BaseModel):
    location: str | None = None


+class ScimUserGroupRef(BaseModel):
+    """Group reference within a User resource (RFC 7643 §4.1.2, read-only)."""
+
+    value: str
+    display: str | None = None
+
+
 class ScimUserResource(BaseModel):
    """SCIM User resource representation (RFC 7643 §4.1).

@@ -75,8 +83,10 @@ class ScimUserResource(BaseModel):
    externalId: str | None = None  # IdP's identifier for this user
    userName: str  # Typically the user's email address
    name: ScimName | None = None
+    displayName: str | None = None
    emails: list[ScimEmail] = Field(default_factory=list)
    active: bool = True
+    groups: list[ScimUserGroupRef] = Field(default_factory=list)
    meta: ScimMeta | None = None


@@ -120,12 +130,40 @@ class ScimPatchOperationType(str, Enum):
    REMOVE = "remove"


+class ScimPatchResourceValue(BaseModel):
+    """Partial resource dict for path-less PATCH replace operations.
+
+    When an IdP sends a PATCH without a ``path``, the ``value`` is a dict
+    of resource attributes to set.  IdPs may include read-only fields
+    (``id``, ``schemas``, ``meta``) alongside actual changes — these are
+    stripped by the provider's ``ignored_patch_paths`` before processing.
+
+    ``extra="allow"`` lets unknown attributes pass through so the patch
+    handler can decide what to do with them (ignore or reject).
+    """
+
+    model_config = ConfigDict(extra="allow")
+
+    active: bool | None = None
+    userName: str | None = None
+    displayName: str | None = None
+    externalId: str | None = None
+    name: ScimName | None = None
+    members: list[ScimGroupMember] | None = None
+    id: str | None = None
+    schemas: list[str] | None = None
+    meta: ScimMeta | None = None
+
+
+ScimPatchValue = str | bool | list[ScimGroupMember] | ScimPatchResourceValue | None
+
+
 class ScimPatchOperation(BaseModel):
    """Single PATCH operation (RFC 7644 §3.5.2)."""

    op: ScimPatchOperationType
    path: str | None = None
-    value: str | list[dict[str, str]] | dict[str, str | bool] | bool | None = None
+    value: ScimPatchValue = None


 class ScimPatchRequest(BaseModel):
@@ -195,10 +233,39 @@ class ScimServiceProviderConfig(BaseModel):
    )


+class ScimSchemaAttribute(BaseModel):
+    """Attribute definition within a SCIM Schema (RFC 7643 §7)."""
+
+    name: str
+    type: str
+    multiValued: bool = False
+    required: bool = False
+    description: str = ""
+    caseExact: bool = False
+    mutability: str = "readWrite"
+    returned: str = "default"
+    uniqueness: str = "none"
+    subAttributes: list["ScimSchemaAttribute"] = Field(default_factory=list)
+
+
+class ScimSchemaDefinition(BaseModel):
+    """SCIM Schema definition (RFC 7643 §7).
+
+    Served at GET /scim/v2/Schemas. Describes the attributes available
+    on each resource type so IdPs know which fields they can provision.
+    """
+
+    schemas: list[str] = Field(default_factory=lambda: [SCIM_SCHEMA_SCHEMA])
+    id: str
+    name: str
+    description: str
+    attributes: list[ScimSchemaAttribute] = Field(default_factory=list)
+
+
 class ScimSchemaExtension(BaseModel):
    """Schema extension reference within ResourceType (RFC 7643 §6)."""

-    model_config = ConfigDict(populate_by_name=True)
+    model_config = ConfigDict(populate_by_name=True, serialize_by_alias=True)

    schema_: str = Field(alias="schema")
    required: bool
@@ -211,7 +278,7 @@ class ScimResourceType(BaseModel):
    types are available (Users, Groups) and their respective endpoints.
    """

-    model_config = ConfigDict(populate_by_name=True)
+    model_config = ConfigDict(populate_by_name=True, serialize_by_alias=True)

    schemas: list[str] = Field(default_factory=lambda: [SCIM_RESOURCE_TYPE_SCHEMA])
    id: str
--- a/backend/ee/onyx/server/scim/patch.py
+++ b/backend/ee/onyx/server/scim/patch.py
@@ -16,9 +16,12 @@ from __future__ import annotations

 import re

+from ee.onyx.server.scim.models import ScimGroupMember
 from ee.onyx.server.scim.models import ScimGroupResource
 from ee.onyx.server.scim.models import ScimPatchOperation
 from ee.onyx.server.scim.models import ScimPatchOperationType
+from ee.onyx.server.scim.models import ScimPatchResourceValue
+from ee.onyx.server.scim.models import ScimPatchValue
 from ee.onyx.server.scim.models import ScimUserResource


@@ -41,9 +44,15 @@ _MEMBER_FILTER_RE = re.compile(
 def apply_user_patch(
    operations: list[ScimPatchOperation],
    current: ScimUserResource,
+    ignored_paths: frozenset[str] = frozenset(),
 ) -> ScimUserResource:
    """Apply SCIM PATCH operations to a user resource.

+    Args:
+        operations: The PATCH operations to apply.
+        current: The current user resource state.
+        ignored_paths: SCIM attribute paths to silently skip (from provider).
+
    Returns a new ``ScimUserResource`` with the modifications applied.
    The original object is not mutated.

@@ -55,9 +64,9 @@ def apply_user_patch(

    for op in operations:
        if op.op == ScimPatchOperationType.REPLACE:
-            _apply_user_replace(op, data, name_data)
+            _apply_user_replace(op, data, name_data, ignored_paths)
        elif op.op == ScimPatchOperationType.ADD:
-            _apply_user_replace(op, data, name_data)
+            _apply_user_replace(op, data, name_data, ignored_paths)
        else:
            raise ScimPatchError(
                f"Unsupported operation '{op.op.value}' on User resource"
@@ -71,30 +80,34 @@ def _apply_user_replace(
    op: ScimPatchOperation,
    data: dict,
    name_data: dict,
+    ignored_paths: frozenset[str],
 ) -> None:
    """Apply a replace/add operation to user data."""
    path = (op.path or "").lower()

    if not path:
-        # No path — value is a dict of top-level attributes to set
-        if isinstance(op.value, dict):
-            for key, val in op.value.items():
-                _set_user_field(key.lower(), val, data, name_data)
+        # No path — value is a resource dict of top-level attributes to set
+        if isinstance(op.value, ScimPatchResourceValue):
+            for key, val in op.value.model_dump(exclude_unset=True).items():
+                _set_user_field(key.lower(), val, data, name_data, ignored_paths)
        else:
            raise ScimPatchError("Replace without path requires a dict value")
        return

-    _set_user_field(path, op.value, data, name_data)
+    _set_user_field(path, op.value, data, name_data, ignored_paths)


 def _set_user_field(
    path: str,
-    value: str | bool | dict | list | None,
+    value: ScimPatchValue,
    data: dict,
    name_data: dict,
+    ignored_paths: frozenset[str],
 ) -> None:
    """Set a single field on user data by SCIM path."""
-    if path == "active":
+    if path in ignored_paths:
+        return
+    elif path == "active":
        data["active"] = value
    elif path == "username":
        data["userName"] = value
@@ -107,7 +120,7 @@ def _set_user_field(
    elif path == "name.formatted":
        name_data["formatted"] = value
    elif path == "displayname":
-        # Some IdPs send displayName on users; map to formatted name
+        data["displayName"] = value
        name_data["formatted"] = value
    else:
        raise ScimPatchError(f"Unsupported path '{path}' for User PATCH")
@@ -116,9 +129,15 @@ def _set_user_field(
 def apply_group_patch(
    operations: list[ScimPatchOperation],
    current: ScimGroupResource,
+    ignored_paths: frozenset[str] = frozenset(),
 ) -> tuple[ScimGroupResource, list[str], list[str]]:
    """Apply SCIM PATCH operations to a group resource.

+    Args:
+        operations: The PATCH operations to apply.
+        current: The current group resource state.
+        ignored_paths: SCIM attribute paths to silently skip (from provider).
+
    Returns:
        A tuple of (modified group, added member IDs, removed member IDs).
        The caller uses the member ID lists to update the database.
@@ -133,7 +152,9 @@ def apply_group_patch(

    for op in operations:
        if op.op == ScimPatchOperationType.REPLACE:
-            _apply_group_replace(op, data, current_members, added_ids, removed_ids)
+            _apply_group_replace(
+                op, data, current_members, added_ids, removed_ids, ignored_paths
+            )
        elif op.op == ScimPatchOperationType.ADD:
            _apply_group_add(op, current_members, added_ids)
        elif op.op == ScimPatchOperationType.REMOVE:
@@ -154,38 +175,48 @@ def _apply_group_replace(
    current_members: list[dict],
    added_ids: list[str],
    removed_ids: list[str],
+    ignored_paths: frozenset[str],
 ) -> None:
    """Apply a replace operation to group data."""
    path = (op.path or "").lower()

    if not path:
-        if isinstance(op.value, dict):
-            for key, val in op.value.items():
+        if isinstance(op.value, ScimPatchResourceValue):
+            dumped = op.value.model_dump(exclude_unset=True)
+            for key, val in dumped.items():
                if key.lower() == "members":
                    _replace_members(val, current_members, added_ids, removed_ids)
                else:
-                    _set_group_field(key.lower(), val, data)
+                    _set_group_field(key.lower(), val, data, ignored_paths)
        else:
            raise ScimPatchError("Replace without path requires a dict value")
        return

    if path == "members":
-        _replace_members(op.value, current_members, added_ids, removed_ids)
+        _replace_members(
+            _members_to_dicts(op.value), current_members, added_ids, removed_ids
+        )
        return

-    _set_group_field(path, op.value, data)
+    _set_group_field(path, op.value, data, ignored_paths)
+
+
+def _members_to_dicts(
+    value: str | bool | list[ScimGroupMember] | ScimPatchResourceValue | None,
+) -> list[dict]:
+    """Convert a member list value to a list of dicts for internal processing."""
+    if not isinstance(value, list):
+        raise ScimPatchError("Replace members requires a list value")
+    return [m.model_dump(exclude_none=True) for m in value]


 def _replace_members(
-    value: str | list | dict | bool | None,
+    value: list[dict],
    current_members: list[dict],
    added_ids: list[str],
    removed_ids: list[str],
 ) -> None:
    """Replace the entire group member list."""
-    if not isinstance(value, list):
-        raise ScimPatchError("Replace members requires a list value")
-
    old_ids = {m["value"] for m in current_members}
    new_ids = {m.get("value", "") for m in value}

@@ -197,11 +228,14 @@ def _replace_members(

 def _set_group_field(
    path: str,
-    value: str | bool | dict | list | None,
+    value: ScimPatchValue,
    data: dict,
+    ignored_paths: frozenset[str],
 ) -> None:
    """Set a single field on group data by SCIM path."""
-    if path == "displayname":
+    if path in ignored_paths:
+        return
+    elif path == "displayname":
        data["displayName"] = value
    elif path == "externalid":
        data["externalId"] = value
@@ -223,8 +257,10 @@ def _apply_group_add(
    if not isinstance(op.value, list):
        raise ScimPatchError("Add members requires a list value")

+    member_dicts = [m.model_dump(exclude_none=True) for m in op.value]
+
    existing_ids = {m["value"] for m in members}
-    for member_data in op.value:
+    for member_data in member_dicts:
        member_id = member_data.get("value", "")
        if member_id and member_id not in existing_ids:
            members.append(member_data)
--- a/backend/ee/onyx/server/scim/providers/init.py
+++ b/backend/ee/onyx/server/scim/providers/init.py
--- a/backend/ee/onyx/server/scim/providers/base.py
+++ b/backend/ee/onyx/server/scim/providers/base.py
@@ -0,0 +1,123 @@
+"""Base SCIM provider abstraction."""
+
+from __future__ import annotations
+
+from abc import ABC
+from abc import abstractmethod
+from uuid import UUID
+
+from ee.onyx.server.scim.models import ScimEmail
+from ee.onyx.server.scim.models import ScimGroupMember
+from ee.onyx.server.scim.models import ScimGroupResource
+from ee.onyx.server.scim.models import ScimMeta
+from ee.onyx.server.scim.models import ScimName
+from ee.onyx.server.scim.models import ScimUserGroupRef
+from ee.onyx.server.scim.models import ScimUserResource
+from onyx.db.models import User
+from onyx.db.models import UserGroup
+
+
+class ScimProvider(ABC):
+    """Base class for provider-specific SCIM behavior.
+
+    Subclass this to handle IdP-specific quirks. The base class provides
+    RFC 7643-compliant response builders that populate all standard fields.
+    """
+
+    @property
+    @abstractmethod
+    def name(self) -> str:
+        """Short identifier for this provider (e.g. ``"okta"``)."""
+        ...
+
+    @property
+    @abstractmethod
+    def ignored_patch_paths(self) -> frozenset[str]:
+        """SCIM attribute paths to silently skip in PATCH value-object dicts.
+
+        IdPs may include read-only or meta fields alongside actual changes
+        (e.g. Okta sends ``{"id": "...", "active": false}``). Paths listed
+        here are silently dropped instead of raising an error.
+        """
+        ...
+
+    def build_user_resource(
+        self,
+        user: User,
+        external_id: str | None = None,
+        groups: list[tuple[int, str]] | None = None,
+        scim_username: str | None = None,
+    ) -> ScimUserResource:
+        """Build a SCIM User response from an Onyx User.
+
+        Args:
+            user: The Onyx user model.
+            external_id: The IdP's external identifier for this user.
+            groups: List of ``(group_id, group_name)`` tuples for the
+                ``groups`` read-only attribute. Pass ``None`` or ``[]``
+                for newly-created users.
+            scim_username: The original-case userName from the IdP. Falls
+                back to ``user.email`` (lowercase) when not available.
+        """
+        group_refs = [
+            ScimUserGroupRef(value=str(gid), display=gname)
+            for gid, gname in (groups or [])
+        ]
+
+        # Use original-case userName if stored, otherwise fall back to the
+        # lowercased email from the User model.
+        username = scim_username or user.email
+
+        return ScimUserResource(
+            id=str(user.id),
+            externalId=external_id,
+            userName=username,
+            name=self._build_scim_name(user),
+            displayName=user.personal_name,
+            emails=[ScimEmail(value=username, type="work", primary=True)],
+            active=user.is_active,
+            groups=group_refs,
+            meta=ScimMeta(resourceType="User"),
+        )
+
+    def build_group_resource(
+        self,
+        group: UserGroup,
+        members: list[tuple[UUID, str | None]],
+        external_id: str | None = None,
+    ) -> ScimGroupResource:
+        """Build a SCIM Group response from an Onyx UserGroup."""
+        scim_members = [
+            ScimGroupMember(value=str(uid), display=email) for uid, email in members
+        ]
+        return ScimGroupResource(
+            id=str(group.id),
+            externalId=external_id,
+            displayName=group.name,
+            members=scim_members,
+            meta=ScimMeta(resourceType="Group"),
+        )
+
+    @staticmethod
+    def _build_scim_name(user: User) -> ScimName | None:
+        """Extract SCIM name components from a user's personal name."""
+        if not user.personal_name:
+            return None
+        parts = user.personal_name.split(" ", 1)
+        return ScimName(
+            givenName=parts[0],
+            familyName=parts[1] if len(parts) > 1 else None,
+            formatted=user.personal_name,
+        )
+
+
+def get_default_provider() -> ScimProvider:
+    """Return the default SCIM provider.
+
+    Currently returns ``OktaProvider`` since Okta is the primary supported
+    IdP. When provider detection is added (via token metadata or tenant
+    config), this can be replaced with dynamic resolution.
+    """
+    from ee.onyx.server.scim.providers.okta import OktaProvider
+
+    return OktaProvider()
--- a/backend/ee/onyx/server/scim/providers/okta.py
+++ b/backend/ee/onyx/server/scim/providers/okta.py
@@ -0,0 +1,25 @@
+"""Okta SCIM provider."""
+
+from __future__ import annotations
+
+from ee.onyx.server.scim.providers.base import ScimProvider
+
+
+class OktaProvider(ScimProvider):
+    """Okta SCIM provider.
+
+    Okta behavioral notes:
+      - Uses ``PATCH {"active": false}`` for deprovisioning (not DELETE)
+      - Sends path-less PATCH with value dicts containing extra fields
+        (``id``, ``schemas``)
+      - Expects ``displayName`` and ``groups`` in user responses
+      - Only uses ``eq`` operator for ``userName`` filter
+    """
+
+    @property
+    def name(self) -> str:
+        return "okta"
+
+    @property
+    def ignored_patch_paths(self) -> frozenset[str]:
+        return frozenset({"id", "schemas", "meta"})
--- a/backend/ee/onyx/server/scim/schema_definitions.py
+++ b/backend/ee/onyx/server/scim/schema_definitions.py
@@ -0,0 +1,144 @@
+"""Static SCIM service discovery responses (RFC 7643 §5, §6, §7).
+
+Pre-built at import time — these never change at runtime. Separated from
+api.py to keep the endpoint module focused on request handling.
+"""
+
+from ee.onyx.server.scim.models import SCIM_GROUP_SCHEMA
+from ee.onyx.server.scim.models import SCIM_USER_SCHEMA
+from ee.onyx.server.scim.models import ScimResourceType
+from ee.onyx.server.scim.models import ScimSchemaAttribute
+from ee.onyx.server.scim.models import ScimSchemaDefinition
+from ee.onyx.server.scim.models import ScimServiceProviderConfig
+
+SERVICE_PROVIDER_CONFIG = ScimServiceProviderConfig()
+
+USER_RESOURCE_TYPE = ScimResourceType.model_validate(
+    {
+        "id": "User",
+        "name": "User",
+        "endpoint": "/scim/v2/Users",
+        "description": "SCIM User resource",
+        "schema": SCIM_USER_SCHEMA,
+    }
+)
+
+GROUP_RESOURCE_TYPE = ScimResourceType.model_validate(
+    {
+        "id": "Group",
+        "name": "Group",
+        "endpoint": "/scim/v2/Groups",
+        "description": "SCIM Group resource",
+        "schema": SCIM_GROUP_SCHEMA,
+    }
+)
+
+USER_SCHEMA_DEF = ScimSchemaDefinition(
+    id=SCIM_USER_SCHEMA,
+    name="User",
+    description="SCIM core User schema",
+    attributes=[
+        ScimSchemaAttribute(
+            name="userName",
+            type="string",
+            required=True,
+            uniqueness="server",
+            description="Unique identifier for the user, typically an email address.",
+        ),
+        ScimSchemaAttribute(
+            name="name",
+            type="complex",
+            description="The components of the user's name.",
+            subAttributes=[
+                ScimSchemaAttribute(
+                    name="givenName",
+                    type="string",
+                    description="The user's first name.",
+                ),
+                ScimSchemaAttribute(
+                    name="familyName",
+                    type="string",
+                    description="The user's last name.",
+                ),
+                ScimSchemaAttribute(
+                    name="formatted",
+                    type="string",
+                    description="The full name, including all middle names and titles.",
+                ),
+            ],
+        ),
+        ScimSchemaAttribute(
+            name="emails",
+            type="complex",
+            multiValued=True,
+            description="Email addresses for the user.",
+            subAttributes=[
+                ScimSchemaAttribute(
+                    name="value",
+                    type="string",
+                    description="Email address value.",
+                ),
+                ScimSchemaAttribute(
+                    name="type",
+                    type="string",
+                    description="Label for this email (e.g. 'work').",
+                ),
+                ScimSchemaAttribute(
+                    name="primary",
+                    type="boolean",
+                    description="Whether this is the primary email.",
+                ),
+            ],
+        ),
+        ScimSchemaAttribute(
+            name="active",
+            type="boolean",
+            description="Whether the user account is active.",
+        ),
+        ScimSchemaAttribute(
+            name="externalId",
+            type="string",
+            description="Identifier from the provisioning client (IdP).",
+            caseExact=True,
+        ),
+    ],
+)
+
+GROUP_SCHEMA_DEF = ScimSchemaDefinition(
+    id=SCIM_GROUP_SCHEMA,
+    name="Group",
+    description="SCIM core Group schema",
+    attributes=[
+        ScimSchemaAttribute(
+            name="displayName",
+            type="string",
+            required=True,
+            description="Human-readable name for the group.",
+        ),
+        ScimSchemaAttribute(
+            name="members",
+            type="complex",
+            multiValued=True,
+            description="Members of the group.",
+            subAttributes=[
+                ScimSchemaAttribute(
+                    name="value",
+                    type="string",
+                    description="User ID of the group member.",
+                ),
+                ScimSchemaAttribute(
+                    name="display",
+                    type="string",
+                    mutability="readOnly",
+                    description="Display name of the group member.",
+                ),
+            ],
+        ),
+        ScimSchemaAttribute(
+            name="externalId",
+            type="string",
+            description="Identifier from the provisioning client (IdP).",
+            caseExact=True,
+        ),
+    ],
+)
--- a/backend/ee/onyx/server/settings/api.py
+++ b/backend/ee/onyx/server/settings/api.py
@@ -1,10 +1,13 @@
 """EE Settings API - provides license-aware settings override."""

 from redis.exceptions import RedisError
+from sqlalchemy.exc import SQLAlchemyError

 from ee.onyx.configs.app_configs import LICENSE_ENFORCEMENT_ENABLED
 from ee.onyx.db.license import get_cached_license_metadata
+from ee.onyx.db.license import refresh_license_cache
 from onyx.configs.app_configs import ENTERPRISE_EDITION_ENABLED
+from onyx.db.engine.sql_engine import get_session_with_current_tenant
 from onyx.server.settings.models import ApplicationStatus
 from onyx.server.settings.models import Settings
 from onyx.utils.logger import setup_logger
@@ -41,6 +44,14 @@ def check_ee_features_enabled() -> bool:
    tenant_id = get_current_tenant_id()
    try:
        metadata = get_cached_license_metadata(tenant_id)
+        if not metadata:
+            # Cache miss — warm from DB so cold-start doesn't block EE features
+            try:
+                with get_session_with_current_tenant() as db_session:
+                    metadata = refresh_license_cache(db_session, tenant_id)
+            except SQLAlchemyError as db_error:
+                logger.warning(f"Failed to load license from DB: {db_error}")
+
        if metadata and metadata.status != _BLOCKING_STATUS:
            # Has a valid license (GRACE_PERIOD/PAYMENT_REMINDER still allow EE features)
            return True
@@ -82,6 +93,18 @@ def apply_license_status_to_settings(settings: Settings) -> Settings:
    tenant_id = get_current_tenant_id()
    try:
        metadata = get_cached_license_metadata(tenant_id)
+        if not metadata:
+            # Cache miss (e.g. after TTL expiry). Fall back to DB so
+            # the /settings request doesn't falsely return GATED_ACCESS
+            # while the cache is cold.
+            try:
+                with get_session_with_current_tenant() as db_session:
+                    metadata = refresh_license_cache(db_session, tenant_id)
+            except SQLAlchemyError as db_error:
+                logger.warning(
+                    f"Failed to load license from DB for settings: {db_error}"
+                )
+
        if metadata:
            if metadata.status == _BLOCKING_STATUS:
                settings.application_status = metadata.status
@@ -90,7 +113,7 @@ def apply_license_status_to_settings(settings: Settings) -> Settings:
                # Has a valid license (GRACE_PERIOD/PAYMENT_REMINDER still allow EE features)
                settings.ee_features_enabled = True
        else:
-            # No license found.
+            # No license found in cache or DB.
            if ENTERPRISE_EDITION_ENABLED:
                # Legacy EE flag is set → prior EE usage (e.g. permission
                # syncing) means indexed data may need protection.
--- a/backend/ee/onyx/server/user_group/api.py
+++ b/backend/ee/onyx/server/user_group/api.py
@@ -37,12 +37,15 @@ def list_user_groups(
    db_session: Session = Depends(get_session),
 ) -> list[UserGroup]:
    if user.role == UserRole.ADMIN:
-        user_groups = fetch_user_groups(db_session, only_up_to_date=False)
+        user_groups = fetch_user_groups(
+            db_session, only_up_to_date=False, eager_load_for_snapshot=True
+        )
    else:
        user_groups = fetch_user_groups_for_user(
            db_session=db_session,
            user_id=user.id,
            only_curator_groups=user.role == UserRole.CURATOR,
+            eager_load_for_snapshot=True,
        )
    return [UserGroup.from_model(user_group) for user_group in user_groups]

--- a/backend/ee/onyx/server/user_group/models.py
+++ b/backend/ee/onyx/server/user_group/models.py
@@ -53,7 +53,8 @@ class UserGroup(BaseModel):
                    id=cc_pair_relationship.cc_pair.id,
                    name=cc_pair_relationship.cc_pair.name,
                    connector=ConnectorSnapshot.from_connector_db_model(
-                        cc_pair_relationship.cc_pair.connector
+                        cc_pair_relationship.cc_pair.connector,
+                        credential_ids=[cc_pair_relationship.cc_pair.credential_id],
                    ),
                    credential=CredentialSnapshot.from_credential_db_model(
                        cc_pair_relationship.cc_pair.credential
--- a/backend/onyx/auth/users.py
+++ b/backend/onyx/auth/users.py
@@ -121,6 +121,7 @@ from onyx.db.pat import fetch_user_for_pat
 from onyx.db.users import get_user_by_email
 from onyx.redis.redis_pool import get_async_redis_connection
 from onyx.redis.redis_pool import get_redis_client
+from onyx.server.settings.store import load_settings
 from onyx.server.utils import BasicAuthenticationError
 from onyx.utils.logger import setup_logger
 from onyx.utils.telemetry import mt_cloud_telemetry
@@ -137,6 +138,8 @@ from shared_configs.contextvars import get_current_tenant_id

 logger = setup_logger()

+REGISTER_INVITE_ONLY_CODE = "REGISTER_INVITE_ONLY"
+

 def is_user_admin(user: User) -> bool:
    return user.role == UserRole.ADMIN
@@ -208,22 +211,34 @@ def anonymous_user_enabled(*, tenant_id: str | None = None) -> bool:
    return int(value.decode("utf-8")) == 1


+def workspace_invite_only_enabled() -> bool:
+    settings = load_settings()
+    return settings.invite_only_enabled
+
+
 def verify_email_is_invited(email: str) -> None:
    if AUTH_TYPE in {AuthType.SAML, AuthType.OIDC}:
        # SSO providers manage membership; allow JIT provisioning regardless of invites
        return

-    whitelist = get_invited_users()
-    if not whitelist:
+    if not workspace_invite_only_enabled():
        return

+    whitelist = get_invited_users()
+
    if not email:
-        raise PermissionError("Email must be specified")
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail={"reason": "Email must be specified"},
+        )

    try:
        email_info = validate_email(email, check_deliverability=False)
    except EmailUndeliverableError:
-        raise PermissionError("Email is not valid")
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail={"reason": "Email is not valid"},
+        )

    for email_whitelist in whitelist:
        try:
@@ -240,7 +255,13 @@ def verify_email_is_invited(email: str) -> None:
        if email_info.normalized.lower() == email_info_whitelist.normalized.lower():
            return

-    raise PermissionError("User not on allowed user whitelist")
+    raise HTTPException(
+        status_code=status.HTTP_403_FORBIDDEN,
+        detail={
+            "code": REGISTER_INVITE_ONLY_CODE,
+            "reason": "This workspace is invite-only. Please ask your admin to invite you.",
+        },
+    )


 def verify_email_in_whitelist(email: str, tenant_id: str) -> None:
@@ -256,13 +277,32 @@ def verify_email_domain(email: str) -> None:
            detail="Email is not valid",
        )

-    domain = email.split("@")[-1].lower()
+    local_part, domain = email.split("@")
+    domain = domain.lower()
+
+    if AUTH_TYPE == AuthType.CLOUD:
+        # Normalize googlemail.com to gmail.com (they deliver to the same inbox)
+        if domain == "googlemail.com":
+            raise HTTPException(
+                status_code=status.HTTP_400_BAD_REQUEST,
+                detail={"reason": "Please use @gmail.com instead of @googlemail.com."},
+            )
+
+        if "+" in local_part and domain != "onyx.app":
+            raise HTTPException(
+                status_code=status.HTTP_400_BAD_REQUEST,
+                detail={
+                    "reason": "Email addresses with '+' are not allowed. Please use your base email address."
+                },
+            )

    # Check if email uses a disposable/temporary domain
    if is_disposable_email(email):
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
-            detail="Disposable email addresses are not allowed. Please use a permanent email address.",
+            detail={
+                "reason": "Disposable email addresses are not allowed. Please use a permanent email address."
+            },
        )

    # Check domain whitelist if configured
@@ -1650,7 +1690,10 @@ def get_oauth_router(
        if redirect_url is not None:
            authorize_redirect_url = redirect_url
        else:
-            authorize_redirect_url = str(request.url_for(callback_route_name))
+            # Use WEB_DOMAIN instead of request.url_for() to prevent host
+            # header poisoning — request.url_for() trusts the Host header.
+            callback_path = request.app.url_path_for(callback_route_name)
+            authorize_redirect_url = f"{WEB_DOMAIN}{callback_path}"

        next_url = request.query_params.get("next", "/")

--- a/backend/onyx/background/celery/celery_utils.py
+++ b/backend/onyx/background/celery/celery_utils.py
@@ -1,25 +1,30 @@
 from collections.abc import Generator
 from collections.abc import Iterator
+from collections.abc import Sequence
 from datetime import datetime
 from datetime import timezone
 from pathlib import Path
 from typing import Any
 from typing import cast
+from typing import TypeVar

 import httpx
+from pydantic import BaseModel

 from onyx.configs.app_configs import MAX_PRUNING_DOCUMENT_RETRIEVAL_PER_MINUTE
 from onyx.configs.app_configs import VESPA_REQUEST_TIMEOUT
-from onyx.connectors.connector_runner import batched_doc_ids
+from onyx.connectors.connector_runner import CheckpointOutputWrapper
 from onyx.connectors.cross_connector_utils.rate_limit_wrapper import (
    rate_limit_builder,
 )
 from onyx.connectors.interfaces import BaseConnector
 from onyx.connectors.interfaces import CheckpointedConnector
+from onyx.connectors.interfaces import ConnectorCheckpoint
 from onyx.connectors.interfaces import LoadConnector
 from onyx.connectors.interfaces import PollConnector
 from onyx.connectors.interfaces import SlimConnector
 from onyx.connectors.interfaces import SlimConnectorWithPermSync
+from onyx.connectors.models import ConnectorFailure
 from onyx.connectors.models import Document
 from onyx.connectors.models import HierarchyNode
 from onyx.connectors.models import SlimDocument
@@ -29,63 +34,129 @@ from onyx.utils.logger import setup_logger


 logger = setup_logger()
-PRUNING_CHECKPOINTED_BATCH_SIZE = 32
+
+CT = TypeVar("CT", bound=ConnectorCheckpoint)


-def document_batch_to_ids(
-    doc_batch: (
-        Iterator[list[Document | HierarchyNode]]
-        | Iterator[list[SlimDocument | HierarchyNode]]
-    ),
-) -> Generator[set[str], None, None]:
-    for doc_list in doc_batch:
-        yield {
-            doc.raw_node_id if isinstance(doc, HierarchyNode) else doc.id
-            for doc in doc_list
-        }
+class SlimConnectorExtractionResult(BaseModel):
+    """Result of extracting document IDs and hierarchy nodes from a connector."""
+
+    doc_ids: set[str]
+    hierarchy_nodes: list[HierarchyNode]
+
+
+def _checkpointed_batched_items(
+    connector: CheckpointedConnector[CT],
+    start: float,
+    end: float,
+) -> Generator[list[Document | HierarchyNode | ConnectorFailure], None, None]:
+    """Loop through all checkpoint steps and yield batched items.
+
+    Some checkpointed connectors (e.g. IMAP) are multi-step: the first
+    checkpoint call may only initialize internal state without yielding
+    any documents. This function loops until checkpoint.has_more is False
+    to ensure all items are collected across every step.
+    """
+    checkpoint = connector.build_dummy_checkpoint()
+    while True:
+        checkpoint_output = connector.load_from_checkpoint(
+            start=start, end=end, checkpoint=checkpoint
+        )
+        wrapper: CheckpointOutputWrapper[CT] = CheckpointOutputWrapper()
+        batch: list[Document | HierarchyNode | ConnectorFailure] = []
+        for document, hierarchy_node, failure, next_checkpoint in wrapper(
+            checkpoint_output
+        ):
+            if document is not None:
+                batch.append(document)
+            elif hierarchy_node is not None:
+                batch.append(hierarchy_node)
+            elif failure is not None:
+                batch.append(failure)
+
+            if next_checkpoint is not None:
+                checkpoint = next_checkpoint
+
+        if batch:
+            yield batch
+
+        if not checkpoint.has_more:
+            break
+
+
+def _get_failure_id(failure: ConnectorFailure) -> str | None:
+    """Extract the document/entity ID from a ConnectorFailure."""
+    if failure.failed_document:
+        return failure.failed_document.document_id
+    if failure.failed_entity:
+        return failure.failed_entity.entity_id
+    return None
+
+
+def _extract_from_batch(
+    doc_list: Sequence[Document | SlimDocument | HierarchyNode | ConnectorFailure],
+) -> tuple[set[str], list[HierarchyNode]]:
+    """Separate a batch into document IDs and hierarchy nodes.
+
+    ConnectorFailure items have their failed document/entity IDs added to the
+    ID set so that failed-to-retrieve documents are not accidentally pruned.
+    """
+    ids: set[str] = set()
+    hierarchy_nodes: list[HierarchyNode] = []
+    for item in doc_list:
+        if isinstance(item, HierarchyNode):
+            hierarchy_nodes.append(item)
+            ids.add(item.raw_node_id)
+        elif isinstance(item, ConnectorFailure):
+            failed_id = _get_failure_id(item)
+            if failed_id:
+                ids.add(failed_id)
+            logger.warning(
+                f"Failed to retrieve document {failed_id}: " f"{item.failure_message}"
+            )
+        else:
+            ids.add(item.id)
+    return ids, hierarchy_nodes


 def extract_ids_from_runnable_connector(
    runnable_connector: BaseConnector,
    callback: IndexingHeartbeatInterface | None = None,
-) -> set[str]:
+) -> SlimConnectorExtractionResult:
    """
-    If the given connector is neither a SlimConnector nor a SlimConnectorWithPermSync, just pull
-    all docs using the load_from_state and grab out the IDs.
+    Extract document IDs and hierarchy nodes from a runnable connector.
+
+    Hierarchy nodes yielded alongside documents/slim docs are collected and
+    returned in the result. ConnectorFailure items have their IDs preserved
+    so that failed-to-retrieve documents are not accidentally pruned.

    Optionally, a callback can be passed to handle the length of each document batch.
    """
    all_connector_doc_ids: set[str] = set()
+    all_hierarchy_nodes: list[HierarchyNode] = []
+
+    # Sequence (covariant) lets all the specific list[...] iterator types unify here
+    raw_batch_generator: (
+        Iterator[Sequence[Document | SlimDocument | HierarchyNode | ConnectorFailure]]
+        | None
+    ) = None

-    doc_batch_id_generator = None
    if isinstance(runnable_connector, SlimConnector):
-        doc_batch_id_generator = document_batch_to_ids(
-            runnable_connector.retrieve_all_slim_docs()
-        )
+        raw_batch_generator = runnable_connector.retrieve_all_slim_docs()
    elif isinstance(runnable_connector, SlimConnectorWithPermSync):
-        doc_batch_id_generator = document_batch_to_ids(
-            runnable_connector.retrieve_all_slim_docs_perm_sync()
-        )
+        raw_batch_generator = runnable_connector.retrieve_all_slim_docs_perm_sync()
    # If the connector isn't slim, fall back to running it normally to get ids
    elif isinstance(runnable_connector, LoadConnector):
-        doc_batch_id_generator = document_batch_to_ids(
-            runnable_connector.load_from_state()
-        )
+        raw_batch_generator = runnable_connector.load_from_state()
    elif isinstance(runnable_connector, PollConnector):
        start = datetime(1970, 1, 1, tzinfo=timezone.utc).timestamp()
        end = datetime.now(timezone.utc).timestamp()
-        doc_batch_id_generator = document_batch_to_ids(
-            runnable_connector.poll_source(start=start, end=end)
-        )
+        raw_batch_generator = runnable_connector.poll_source(start=start, end=end)
    elif isinstance(runnable_connector, CheckpointedConnector):
        start = datetime(1970, 1, 1, tzinfo=timezone.utc).timestamp()
        end = datetime.now(timezone.utc).timestamp()
-        checkpoint = runnable_connector.build_dummy_checkpoint()
-        checkpoint_generator = runnable_connector.load_from_checkpoint(
-            start=start, end=end, checkpoint=checkpoint
-        )
-        doc_batch_id_generator = batched_doc_ids(
-            checkpoint_generator, batch_size=PRUNING_CHECKPOINTED_BATCH_SIZE
+        raw_batch_generator = _checkpointed_batched_items(
+            runnable_connector, start, end
        )
    else:
        raise RuntimeError("Pruning job could not find a valid runnable_connector.")
@@ -99,19 +170,24 @@ def extract_ids_from_runnable_connector(
        else lambda x: x
    )

-    for doc_batch_ids in doc_batch_id_generator:
-        if callback:
-            if callback.should_stop():
-                raise RuntimeError(
-                    "extract_ids_from_runnable_connector: Stop signal detected"
-                )
+    # process raw batches to extract both IDs and hierarchy nodes
+    for doc_list in raw_batch_generator:
+        if callback and callback.should_stop():
+            raise RuntimeError(
+                "extract_ids_from_runnable_connector: Stop signal detected"
+            )

-        all_connector_doc_ids.update(doc_batch_processing_func(doc_batch_ids))
+        batch_ids, batch_nodes = _extract_from_batch(doc_list)
+        all_connector_doc_ids.update(doc_batch_processing_func(batch_ids))
+        all_hierarchy_nodes.extend(batch_nodes)

        if callback:
-            callback.progress("extract_ids_from_runnable_connector", len(doc_batch_ids))
+            callback.progress("extract_ids_from_runnable_connector", len(batch_ids))

-    return all_connector_doc_ids
+    return SlimConnectorExtractionResult(
+        doc_ids=all_connector_doc_ids,
+        hierarchy_nodes=all_hierarchy_nodes,
+    )


 def celery_is_listening_to_queue(worker: Any, name: str) -> bool:
--- a/backend/onyx/background/celery/tasks/connector_deletion/init.py
+++ b/backend/onyx/background/celery/tasks/connector_deletion/init.py
--- a/backend/onyx/background/celery/tasks/docfetching/init.py
+++ b/backend/onyx/background/celery/tasks/docfetching/init.py
--- a/backend/onyx/background/celery/tasks/docprocessing/init.py
+++ b/backend/onyx/background/celery/tasks/docprocessing/init.py
--- a/backend/onyx/background/celery/tasks/evals/init.py
+++ b/backend/onyx/background/celery/tasks/evals/init.py
--- a/backend/onyx/background/celery/tasks/hierarchyfetching/init.py
+++ b/backend/onyx/background/celery/tasks/hierarchyfetching/init.py
@@ -1,10 +0,0 @@
-"""Celery tasks for hierarchy fetching."""
-
-from onyx.background.celery.tasks.hierarchyfetching.tasks import (  # noqa: F401
-    check_for_hierarchy_fetching,
-)
-from onyx.background.celery.tasks.hierarchyfetching.tasks import (  # noqa: F401
-    connector_hierarchy_fetching_task,
-)
-
-__all__ = ["check_for_hierarchy_fetching", "connector_hierarchy_fetching_task"]
--- a/backend/onyx/background/celery/tasks/llm_model_update/init.py
+++ b/backend/onyx/background/celery/tasks/llm_model_update/init.py
--- a/backend/onyx/background/celery/tasks/monitoring/init.py
+++ b/backend/onyx/background/celery/tasks/monitoring/init.py
--- a/backend/onyx/background/celery/tasks/opensearch_migration/init.py
+++ b/backend/onyx/background/celery/tasks/opensearch_migration/init.py
--- a/backend/onyx/background/celery/tasks/opensearch_migration/constants.py
+++ b/backend/onyx/background/celery/tasks/opensearch_migration/constants.py
@@ -41,3 +41,14 @@ assert (
 CHECK_FOR_DOCUMENTS_TASK_LOCK_BLOCKING_TIMEOUT_S = 30  # 30 seconds.

 TOTAL_ALLOWABLE_DOC_MIGRATION_ATTEMPTS_BEFORE_PERMANENT_FAILURE = 15
+
+# WARNING: Do not change these values without knowing what changes also need to
+# be made to OpenSearchTenantMigrationRecord.
+GET_VESPA_CHUNKS_PAGE_SIZE = 500
+GET_VESPA_CHUNKS_SLICE_COUNT = 4
+
+# String used to indicate in the vespa_visit_continuation_token mapping that the
+# slice has finished and there is nothing left to visit.
+FINISHED_VISITING_SLICE_CONTINUATION_TOKEN = (
+    "FINISHED_VISITING_SLICE_CONTINUATION_TOKEN"
+)
--- a/backend/onyx/background/celery/tasks/opensearch_migration/tasks.py
+++ b/backend/onyx/background/celery/tasks/opensearch_migration/tasks.py
@@ -8,6 +8,12 @@ from celery import Task
 from redis.lock import Lock as RedisLock

 from onyx.background.celery.apps.app_base import task_logger
+from onyx.background.celery.tasks.opensearch_migration.constants import (
+    FINISHED_VISITING_SLICE_CONTINUATION_TOKEN,
+)
+from onyx.background.celery.tasks.opensearch_migration.constants import (
+    GET_VESPA_CHUNKS_PAGE_SIZE,
+)
 from onyx.background.celery.tasks.opensearch_migration.constants import (
    MIGRATION_TASK_LOCK_BLOCKING_TIMEOUT_S,
 )
@@ -47,7 +53,13 @@ from shared_configs.configs import MULTI_TENANT
 from shared_configs.contextvars import get_current_tenant_id


-GET_VESPA_CHUNKS_PAGE_SIZE = 1000
+def is_continuation_token_done_for_all_slices(
+    continuation_token_map: dict[int, str | None],
+) -> bool:
+    return all(
+        continuation_token == FINISHED_VISITING_SLICE_CONTINUATION_TOKEN
+        for continuation_token in continuation_token_map.values()
+    )


 # shared_task allows this task to be shared across celery app instances.
@@ -76,11 +88,15 @@ def migrate_chunks_from_vespa_to_opensearch_task(

    Uses Vespa's Visit API to iterate through ALL chunks in bulk (not
    per-document), transform them, and index them into OpenSearch. Progress is
-    tracked via a continuation token stored in the
+    tracked via a continuation token map stored in the
    OpenSearchTenantMigrationRecord.

-    The first time we see no continuation token and non-zero chunks migrated, we
-    consider the migration complete and all subsequent invocations are no-ops.
+    The first time we see no continuation token map and non-zero chunks
+    migrated, we consider the migration complete and all subsequent invocations
+    are no-ops.
+
+    We divide the index into GET_VESPA_CHUNKS_SLICE_COUNT independent slices
+    where progress is tracked for each slice.

    Returns:
        None if OpenSearch migration is not enabled, or if the lock could not be
@@ -153,15 +169,28 @@ def migrate_chunks_from_vespa_to_opensearch_task(
                f"in {time.monotonic() - sanitized_doc_start_time:.3f} seconds."
            )

+            approx_chunk_count_in_vespa: int | None = None
+            get_chunk_count_start_time = time.monotonic()
+            try:
+                approx_chunk_count_in_vespa = vespa_document_index.get_chunk_count()
+            except Exception:
+                task_logger.exception(
+                    "Error getting approximate chunk count in Vespa. Moving on..."
+                )
+            task_logger.debug(
+                f"Took {time.monotonic() - get_chunk_count_start_time:.3f} seconds to attempt to get "
+                f"approximate chunk count in Vespa. Got {approx_chunk_count_in_vespa}."
+            )
+
            while (
                time.monotonic() - task_start_time < MIGRATION_TASK_SOFT_TIME_LIMIT_S
                and lock.owned()
            ):
                (
-                    continuation_token,
+                    continuation_token_map,
                    total_chunks_migrated,
                ) = get_vespa_visit_state(db_session)
-                if continuation_token is None and total_chunks_migrated > 0:
+                if is_continuation_token_done_for_all_slices(continuation_token_map):
                    task_logger.info(
                        f"OpenSearch migration COMPLETED for tenant {tenant_id}. "
                        f"Total chunks migrated: {total_chunks_migrated}."
@@ -170,19 +199,19 @@ def migrate_chunks_from_vespa_to_opensearch_task(
                    break
                task_logger.debug(
                    f"Read the tenant migration record. Total chunks migrated: {total_chunks_migrated}. "
-                    f"Continuation token: {continuation_token}"
+                    f"Continuation token map: {continuation_token_map}"
                )

                get_vespa_chunks_start_time = time.monotonic()
-                raw_vespa_chunks, next_continuation_token = (
+                raw_vespa_chunks, next_continuation_token_map = (
                    vespa_document_index.get_all_raw_document_chunks_paginated(
-                        continuation_token=continuation_token,
+                        continuation_token_map=continuation_token_map,
                        page_size=GET_VESPA_CHUNKS_PAGE_SIZE,
                    )
                )
                task_logger.debug(
                    f"Read {len(raw_vespa_chunks)} chunks from Vespa in {time.monotonic() - get_vespa_chunks_start_time:.3f} "
-                    f"seconds. Next continuation token: {next_continuation_token}"
+                    f"seconds. Next continuation token map: {next_continuation_token_map}"
                )

                opensearch_document_chunks, errored_chunks = (
@@ -212,14 +241,11 @@ def migrate_chunks_from_vespa_to_opensearch_task(
                total_chunks_errored_this_task += len(errored_chunks)
                update_vespa_visit_progress_with_commit(
                    db_session,
-                    continuation_token=next_continuation_token,
+                    continuation_token_map=next_continuation_token_map,
                    chunks_processed=len(opensearch_document_chunks),
                    chunks_errored=len(errored_chunks),
+                    approx_chunk_count_in_vespa=approx_chunk_count_in_vespa,
                )
-
-                if next_continuation_token is None and len(raw_vespa_chunks) == 0:
-                    task_logger.info("Vespa reported no more chunks to migrate.")
-                    break
    except Exception:
        traceback.print_exc()
        task_logger.exception("Error in the OpenSearch migration task.")
--- a/backend/onyx/background/celery/tasks/opensearch_migration/transformer.py
+++ b/backend/onyx/background/celery/tasks/opensearch_migration/transformer.py
@@ -37,6 +37,35 @@ from shared_configs.configs import MULTI_TENANT
 logger = setup_logger(__name__)


+FIELDS_NEEDED_FOR_TRANSFORMATION: list[str] = [
+    DOCUMENT_ID,
+    CHUNK_ID,
+    TITLE,
+    TITLE_EMBEDDING,
+    CONTENT,
+    EMBEDDINGS,
+    SOURCE_TYPE,
+    METADATA_LIST,
+    DOC_UPDATED_AT,
+    HIDDEN,
+    BOOST,
+    SEMANTIC_IDENTIFIER,
+    IMAGE_FILE_NAME,
+    SOURCE_LINKS,
+    BLURB,
+    DOC_SUMMARY,
+    CHUNK_CONTEXT,
+    METADATA_SUFFIX,
+    DOCUMENT_SETS,
+    USER_PROJECT,
+    PRIMARY_OWNERS,
+    SECONDARY_OWNERS,
+    ACCESS_CONTROL_LIST,
+]
+if MULTI_TENANT:
+    FIELDS_NEEDED_FOR_TRANSFORMATION.append(TENANT_ID)
+
+
 def _extract_content_vector(embeddings: Any) -> list[float]:
    """Extracts the full chunk embedding vector from Vespa's embeddings tensor.

--- a/backend/onyx/background/celery/tasks/pruning/init.py
+++ b/backend/onyx/background/celery/tasks/pruning/init.py
@@ -1,8 +0,0 @@
-"""Celery tasks for connector pruning."""
-
-from onyx.background.celery.tasks.pruning.tasks import check_for_pruning  # noqa: F401
-from onyx.background.celery.tasks.pruning.tasks import (  # noqa: F401
-    connector_pruning_generator_task,
-)
-
-__all__ = ["check_for_pruning", "connector_pruning_generator_task"]
--- a/backend/onyx/background/celery/tasks/pruning/tasks.py
+++ b/backend/onyx/background/celery/tasks/pruning/tasks.py
@@ -43,9 +43,11 @@ from onyx.db.connector_credential_pair import get_connector_credential_pair_from
 from onyx.db.connector_credential_pair import get_connector_credential_pairs
 from onyx.db.document import get_documents_for_connector_credential_pair
 from onyx.db.engine.sql_engine import get_session_with_current_tenant
+from onyx.db.enums import AccessType
 from onyx.db.enums import ConnectorCredentialPairStatus
 from onyx.db.enums import SyncStatus
 from onyx.db.enums import SyncType
+from onyx.db.hierarchy import upsert_hierarchy_nodes_batch
 from onyx.db.models import ConnectorCredentialPair
 from onyx.db.sync_record import insert_sync_record
 from onyx.db.sync_record import update_sync_record_status
@@ -53,6 +55,9 @@ from onyx.db.tag import delete_orphan_tags__no_commit
 from onyx.redis.redis_connector import RedisConnector
 from onyx.redis.redis_connector_prune import RedisConnectorPrune
 from onyx.redis.redis_connector_prune import RedisConnectorPrunePayload
+from onyx.redis.redis_hierarchy import cache_hierarchy_nodes_batch
+from onyx.redis.redis_hierarchy import ensure_source_node_exists
+from onyx.redis.redis_hierarchy import HierarchyNodeCacheEntry
 from onyx.redis.redis_pool import get_redis_client
 from onyx.redis.redis_pool import get_redis_replica_client
 from onyx.server.runtime.onyx_runtime import OnyxRuntime
@@ -526,10 +531,44 @@ def connector_pruning_generator_task(
                timeout_seconds=JOB_TIMEOUT,
            )

-            # a list of docs in the source
-            all_connector_doc_ids: set[str] = extract_ids_from_runnable_connector(
+            # Extract docs and hierarchy nodes from the source
+            extraction_result = extract_ids_from_runnable_connector(
                runnable_connector, callback
            )
+            all_connector_doc_ids = extraction_result.doc_ids
+
+            # Process hierarchy nodes (same as docfetching):
+            # upsert to Postgres and cache in Redis
+            if extraction_result.hierarchy_nodes:
+                is_connector_public = cc_pair.access_type == AccessType.PUBLIC
+
+                redis_client = get_redis_client(tenant_id=tenant_id)
+                ensure_source_node_exists(
+                    redis_client, db_session, cc_pair.connector.source
+                )
+
+                upserted_nodes = upsert_hierarchy_nodes_batch(
+                    db_session=db_session,
+                    nodes=extraction_result.hierarchy_nodes,
+                    source=cc_pair.connector.source,
+                    commit=True,
+                    is_connector_public=is_connector_public,
+                )
+
+                cache_entries = [
+                    HierarchyNodeCacheEntry.from_db_model(node)
+                    for node in upserted_nodes
+                ]
+                cache_hierarchy_nodes_batch(
+                    redis_client=redis_client,
+                    source=cc_pair.connector.source,
+                    entries=cache_entries,
+                )
+
+                task_logger.info(
+                    f"Pruning: persisted and cached {len(extraction_result.hierarchy_nodes)} "
+                    f"hierarchy nodes for cc_pair={cc_pair_id}"
+                )

            # a list of docs in our local index
            all_indexed_document_ids = {
--- a/backend/onyx/background/celery/tasks/shared/init.py
+++ b/backend/onyx/background/celery/tasks/shared/init.py
--- a/backend/onyx/background/celery/tasks/user_file_processing/init.py
+++ b/backend/onyx/background/celery/tasks/user_file_processing/init.py
--- a/backend/onyx/background/celery/tasks/user_file_processing/tasks.py
+++ b/backend/onyx/background/celery/tasks/user_file_processing/tasks.py
@@ -13,6 +13,7 @@ from sqlalchemy import select
 from sqlalchemy.orm import Session

 from onyx.background.celery.apps.app_base import task_logger
+from onyx.background.celery.celery_redis import celery_get_queue_length
 from onyx.background.celery.celery_utils import httpx_init_vespa_pool
 from onyx.background.celery.tasks.shared.RetryDocumentIndex import RetryDocumentIndex
 from onyx.configs.app_configs import DISABLE_VECTOR_DB
@@ -21,12 +22,14 @@ from onyx.configs.app_configs import VESPA_CLOUD_CERT_PATH
 from onyx.configs.app_configs import VESPA_CLOUD_KEY_PATH
 from onyx.configs.constants import CELERY_GENERIC_BEAT_LOCK_TIMEOUT
 from onyx.configs.constants import CELERY_USER_FILE_PROCESSING_LOCK_TIMEOUT
+from onyx.configs.constants import CELERY_USER_FILE_PROCESSING_TASK_EXPIRES
 from onyx.configs.constants import CELERY_USER_FILE_PROJECT_SYNC_LOCK_TIMEOUT
 from onyx.configs.constants import DocumentSource
 from onyx.configs.constants import OnyxCeleryPriority
 from onyx.configs.constants import OnyxCeleryQueues
 from onyx.configs.constants import OnyxCeleryTask
 from onyx.configs.constants import OnyxRedisLocks
+from onyx.configs.constants import USER_FILE_PROCESSING_MAX_QUEUE_DEPTH
 from onyx.connectors.file.connector import LocalFileConnector
 from onyx.connectors.models import Document
 from onyx.connectors.models import HierarchyNode
@@ -57,6 +60,17 @@ def _user_file_lock_key(user_file_id: str | UUID) -> str:
    return f"{OnyxRedisLocks.USER_FILE_PROCESSING_LOCK_PREFIX}:{user_file_id}"


+def _user_file_queued_key(user_file_id: str | UUID) -> str:
+    """Key that exists while a process_single_user_file task is sitting in the queue.
+
+    The beat generator sets this with a TTL equal to CELERY_USER_FILE_PROCESSING_TASK_EXPIRES
+    before enqueuing and the worker deletes it as its first action.  This prevents
+    the beat from adding duplicate tasks for files that already have a live task
+    in flight.
+    """
+    return f"{OnyxRedisLocks.USER_FILE_QUEUED_PREFIX}:{user_file_id}"
+
+
 def _user_file_project_sync_lock_key(user_file_id: str | UUID) -> str:
    return f"{OnyxRedisLocks.USER_FILE_PROJECT_SYNC_LOCK_PREFIX}:{user_file_id}"

@@ -120,7 +134,24 @@ def _get_document_chunk_count(
 def check_user_file_processing(self: Task, *, tenant_id: str) -> None:
    """Scan for user files with PROCESSING status and enqueue per-file tasks.

-    Uses direct Redis locks to avoid overlapping runs.
+    Three mechanisms prevent queue runaway:
+
+    1. **Queue depth backpressure** – if the broker queue already has more than
+       USER_FILE_PROCESSING_MAX_QUEUE_DEPTH items we skip this beat cycle
+       entirely.  Workers are clearly behind; adding more tasks would only make
+       the backlog worse.
+
+    2. **Per-file queued guard** – before enqueuing a task we set a short-lived
+       Redis key (TTL = CELERY_USER_FILE_PROCESSING_TASK_EXPIRES).  If that key
+       already exists the file already has a live task in the queue, so we skip
+       it.  The worker deletes the key the moment it picks up the task so the
+       next beat cycle can re-enqueue if the file is still PROCESSING.
+
+    3. **Task expiry** – every enqueued task carries an `expires` value equal to
+       CELERY_USER_FILE_PROCESSING_TASK_EXPIRES.  If a task is still sitting in
+       the queue after that deadline, Celery discards it without touching the DB.
+       This is a belt-and-suspenders defence: even if the guard key is lost (e.g.
+       Redis restart), stale tasks evict themselves rather than piling up forever.
    """
    task_logger.info("check_user_file_processing - Starting")

@@ -135,7 +166,21 @@ def check_user_file_processing(self: Task, *, tenant_id: str) -> None:
        return None

    enqueued = 0
+    skipped_guard = 0
    try:
+        # --- Protection 1: queue depth backpressure ---
+        r_celery = self.app.broker_connection().channel().client  # type: ignore
+        queue_len = celery_get_queue_length(
+            OnyxCeleryQueues.USER_FILE_PROCESSING, r_celery
+        )
+        if queue_len > USER_FILE_PROCESSING_MAX_QUEUE_DEPTH:
+            task_logger.warning(
+                f"check_user_file_processing - Queue depth {queue_len} exceeds "
+                f"{USER_FILE_PROCESSING_MAX_QUEUE_DEPTH}, skipping enqueue for "
+                f"tenant={tenant_id}"
+            )
+            return None
+
        with get_session_with_current_tenant() as db_session:
            user_file_ids = (
                db_session.execute(
@@ -148,12 +193,35 @@ def check_user_file_processing(self: Task, *, tenant_id: str) -> None:
            )

            for user_file_id in user_file_ids:
-                self.app.send_task(
-                    OnyxCeleryTask.PROCESS_SINGLE_USER_FILE,
-                    kwargs={"user_file_id": str(user_file_id), "tenant_id": tenant_id},
-                    queue=OnyxCeleryQueues.USER_FILE_PROCESSING,
-                    priority=OnyxCeleryPriority.HIGH,
+                # --- Protection 2: per-file queued guard ---
+                queued_key = _user_file_queued_key(user_file_id)
+                guard_set = redis_client.set(
+                    queued_key,
+                    1,
+                    ex=CELERY_USER_FILE_PROCESSING_TASK_EXPIRES,
+                    nx=True,
                )
+                if not guard_set:
+                    skipped_guard += 1
+                    continue
+
+                # --- Protection 3: task expiry ---
+                # If task submission fails, clear the guard immediately so the
+                # next beat cycle can retry enqueuing this file.
+                try:
+                    self.app.send_task(
+                        OnyxCeleryTask.PROCESS_SINGLE_USER_FILE,
+                        kwargs={
+                            "user_file_id": str(user_file_id),
+                            "tenant_id": tenant_id,
+                        },
+                        queue=OnyxCeleryQueues.USER_FILE_PROCESSING,
+                        priority=OnyxCeleryPriority.HIGH,
+                        expires=CELERY_USER_FILE_PROCESSING_TASK_EXPIRES,
+                    )
+                except Exception:
+                    redis_client.delete(queued_key)
+                    raise
                enqueued += 1

    finally:
@@ -161,7 +229,8 @@ def check_user_file_processing(self: Task, *, tenant_id: str) -> None:
            lock.release()

    task_logger.info(
-        f"check_user_file_processing - Enqueued {enqueued} tasks for tenant={tenant_id}"
+        f"check_user_file_processing - Enqueued {enqueued} skipped_guard={skipped_guard} "
+        f"tasks for tenant={tenant_id}"
    )
    return None

@@ -304,6 +373,12 @@ def process_single_user_file(
    start = time.monotonic()

    redis_client = get_redis_client(tenant_id=tenant_id)
+
+    # Clear the "queued" guard set by the beat generator so that the next beat
+    # cycle can re-enqueue this file if it is still in PROCESSING state after
+    # this task completes or fails.
+    redis_client.delete(_user_file_queued_key(user_file_id))
+
    file_lock: RedisLock = redis_client.lock(
        _user_file_lock_key(user_file_id),
        timeout=CELERY_USER_FILE_PROCESSING_LOCK_TIMEOUT,
--- a/backend/onyx/background/celery/tasks/vespa/init.py
+++ b/backend/onyx/background/celery/tasks/vespa/init.py
--- a/backend/onyx/chat/COMPRESSION.md
+++ b/backend/onyx/chat/COMPRESSION.md
@@ -9,10 +9,8 @@ Summaries are stored as `ChatMessage` records with two key fields:
 - `parent_message_id` → last message when compression triggered (places summary in the tree)
 - `last_summarized_message_id` → pointer to an older message up the chain (the cutoff). Messages after this are kept verbatim.

-**Why store summary as a separate message?** If we embedded the summary in the `last_summarized_message_id` message itself, that message would contain context from messages that came after it—context that doesn't exist in other branches. By creating the summary as a new message attached to the branch tip, it only applies to the specific branch where compression occurred.
-
-### Timestamp-Based Ordering
-Messages are filtered by `time_sent` (not ID) so the logic remains intact if IDs are changed to UUIDs in the future.
+**Why store summary as a separate message?** If we embedded the summary in the `last_summarized_message_id` message itself, that message would contain context from messages that came after it—context that doesn't exist in other branches. By creating the summary as a new message attached to the branch tip, it only applies to the specific branch where compression occurred. It's only back-pointed to by the
+branch which it applies to. All of this is necessary because we keep the last few messages verbatim and also to support branching logic.

 ### Progressive Summarization
 Subsequent compressions incorporate the existing summary text + new messages, preventing information loss in very long conversations.
@@ -26,10 +24,11 @@ Context window breakdown:
 - `max_context_tokens` — LLM's total context window
 - `reserved_tokens` — space for system prompt, tools, files, etc.
 - Available for chat history = `max_context_tokens - reserved_tokens`
+Note: If there is a lot of reserved tokens, chat compression may happen fairly frequently which is costly, slow, and leads to a bad user experience. Possible area of future improvement.

 Configurable ratios:
 - `COMPRESSION_TRIGGER_RATIO` (default 0.75) — compress when chat history exceeds this ratio of available space
- `RECENT_MESSAGES_RATIO` (default 0.25) — portion of chat history to keep verbatim when compressing
+- `RECENT_MESSAGES_RATIO` (default 0.2) — portion of chat history to keep verbatim when compressing

 ## Flow

--- a/backend/onyx/chat/chat_utils.py
+++ b/backend/onyx/chat/chat_utils.py
@@ -1,3 +1,4 @@
+import json
 import re
 from collections.abc import Callable
 from typing import cast
@@ -45,6 +46,7 @@ from onyx.utils.timing import log_function_time


 logger = setup_logger()
+IMAGE_GENERATION_TOOL_NAME = "generate_image"


 def create_chat_session_from_request(
@@ -422,6 +424,40 @@ def convert_chat_history_basic(
    return list(reversed(trimmed_reversed))


+def _build_tool_call_response_history_message(
+    tool_name: str,
+    generated_images: list[dict] | None,
+    tool_call_response: str | None,
+) -> str:
+    if tool_name != IMAGE_GENERATION_TOOL_NAME:
+        return TOOL_CALL_RESPONSE_CROSS_MESSAGE
+
+    if generated_images:
+        llm_image_context: list[dict[str, str]] = []
+        for image in generated_images:
+            file_id = image.get("file_id")
+            revised_prompt = image.get("revised_prompt")
+            if not isinstance(file_id, str):
+                continue
+
+            llm_image_context.append(
+                {
+                    "file_id": file_id,
+                    "revised_prompt": (
+                        revised_prompt if isinstance(revised_prompt, str) else ""
+                    ),
+                }
+            )
+
+        if llm_image_context:
+            return json.dumps(llm_image_context)
+
+    if tool_call_response:
+        return tool_call_response
+
+    return TOOL_CALL_RESPONSE_CROSS_MESSAGE
+
+
 def convert_chat_history(
    chat_history: list[ChatMessage],
    files: list[ChatLoadedFile],
@@ -582,10 +618,24 @@ def convert_chat_history(

                    # Add TOOL_CALL_RESPONSE messages for each tool call in this turn
                    for tool_call in turn_tool_calls:
+                        tool_name = tool_id_to_name_map.get(
+                            tool_call.tool_id, "unknown"
+                        )
+                        tool_response_message = (
+                            _build_tool_call_response_history_message(
+                                tool_name=tool_name,
+                                generated_images=tool_call.generated_images,
+                                tool_call_response=tool_call.tool_call_response,
+                            )
+                        )
                        simple_messages.append(
                            ChatMessageSimple(
-                                message=TOOL_CALL_RESPONSE_CROSS_MESSAGE,
-                                token_count=20,  # Tiny overestimate
+                                message=tool_response_message,
+                                token_count=(
+                                    token_counter(tool_response_message)
+                                    if tool_name == IMAGE_GENERATION_TOOL_NAME
+                                    else 20
+                                ),
                                message_type=MessageType.TOOL_CALL_RESPONSE,
                                tool_call_id=tool_call.tool_call_id,
                                image_files=None,
--- a/backend/onyx/chat/compression.py
+++ b/backend/onyx/chat/compression.py
@@ -17,20 +17,26 @@ from onyx.configs.chat_configs import COMPRESSION_TRIGGER_RATIO
 from onyx.configs.constants import MessageType
 from onyx.db.models import ChatMessage
 from onyx.llm.interfaces import LLM
+from onyx.llm.models import AssistantMessage
+from onyx.llm.models import ChatCompletionMessage
 from onyx.llm.models import SystemMessage
 from onyx.llm.models import UserMessage
 from onyx.natural_language_processing.utils import get_tokenizer
-from onyx.prompts.compression_prompts import PROGRESSIVE_SUMMARY_PROMPT
+from onyx.prompts.compression_prompts import PROGRESSIVE_SUMMARY_SYSTEM_PROMPT_BLOCK
 from onyx.prompts.compression_prompts import PROGRESSIVE_USER_REMINDER
 from onyx.prompts.compression_prompts import SUMMARIZATION_CUTOFF_MARKER
 from onyx.prompts.compression_prompts import SUMMARIZATION_PROMPT
-from onyx.prompts.compression_prompts import USER_FINAL_REMINDER
+from onyx.prompts.compression_prompts import USER_REMINDER
+from onyx.tracing.framework.create import ensure_trace
+from onyx.tracing.llm_utils import llm_generation_span
+from onyx.tracing.llm_utils import record_llm_response
 from onyx.utils.logger import setup_logger
+from shared_configs.contextvars import get_current_tenant_id

 logger = setup_logger()

 # Ratio of available context to allocate for recent messages after compression
-RECENT_MESSAGES_RATIO = 0.25
+RECENT_MESSAGES_RATIO = 0.2


 class CompressionResult(BaseModel):
@@ -187,6 +193,11 @@ def get_messages_to_summarize(
        recent_messages.insert(0, msg)
        tokens_used += msg_tokens

+    # Ensure cutoff is right before a user message by moving any leading
+    # non-user messages from recent_messages to older_messages
+    while recent_messages and recent_messages[0].message_type != MessageType.USER:
+        recent_messages.pop(0)
+
    # Everything else gets summarized
    recent_ids = {m.id for m in recent_messages}
    older_messages = [m for m in messages if m.id not in recent_ids]
@@ -196,31 +207,47 @@ def get_messages_to_summarize(
    )


-def format_messages_for_summary(
+def _build_llm_messages_for_summarization(
    messages: list[ChatMessage],
    tool_id_to_name: dict[int, str],
-) -> str:
-    """Format messages into a string for the summarization prompt.
+) -> list[UserMessage | AssistantMessage]:
+    """Convert ChatMessage objects to LLM message format for summarization.

-    Tool call messages are formatted compactly to save tokens.
+    This is intentionally different from translate_history_to_llm_format in llm_step.py:
+    - Compacts tool calls to "[Used tools: tool1, tool2]" to save tokens in summaries
+    - Skips TOOL_CALL_RESPONSE messages entirely (tool usage captured in assistant message)
+    - No image/multimodal handling (summaries are text-only)
+    - No caching or LLMConfig-specific behavior needed
    """
-    formatted = []
+    result: list[UserMessage | AssistantMessage] = []
+
    for msg in messages:
-        # Format assistant messages with tool calls compactly
-        if msg.message_type == MessageType.ASSISTANT and msg.tool_calls:
-            tool_names = [
-                tool_id_to_name.get(tc.tool_id, "unknown") for tc in msg.tool_calls
-            ]
-            formatted.append(f"[assistant used tools: {', '.join(tool_names)}]")
+        # Skip empty messages
+        if not msg.message:
+            continue
+
+        # Handle assistant messages with tool calls compactly
+        if msg.message_type == MessageType.ASSISTANT:
+            if msg.tool_calls:
+                tool_names = [
+                    tool_id_to_name.get(tc.tool_id, "unknown") for tc in msg.tool_calls
+                ]
+                result.append(
+                    AssistantMessage(content=f"[Used tools: {', '.join(tool_names)}]")
+                )
+            else:
+                result.append(AssistantMessage(content=msg.message))
            continue

        # Skip tool call response messages - tool calls are captured above via assistant messages
        if msg.message_type == MessageType.TOOL_CALL_RESPONSE:
            continue

-        role = msg.message_type.value
-        formatted.append(f"[{role}]: {msg.message}")
-    return "\n\n".join(formatted)
+        # Handle user messages
+        if msg.message_type == MessageType.USER:
+            result.append(UserMessage(content=msg.message))
+
+    return result


 def generate_summary(
@@ -236,6 +263,9 @@ def generate_summary(
    The cutoff marker tells the LLM to summarize only older messages,
    while using recent messages as context to inform what's important.

+    Messages are sent as separate UserMessage/AssistantMessage objects rather
+    than being concatenated into a single message.
+
    Args:
        older_messages: Messages to compress into summary (before cutoff)
        recent_messages: Messages kept verbatim (after cutoff, for context only)
@@ -246,37 +276,54 @@ def generate_summary(
    Returns:
        Summary text
    """
-    older_messages_str = format_messages_for_summary(older_messages, tool_id_to_name)
-    recent_messages_str = format_messages_for_summary(recent_messages, tool_id_to_name)
-
-    # Build user prompt with cutoff marker
+    # Build system prompt
+    system_content = SUMMARIZATION_PROMPT
    if existing_summary:
-        # Progressive summarization: include existing summary
-        user_prompt = PROGRESSIVE_SUMMARY_PROMPT.format(
-            existing_summary=existing_summary
+        # Progressive summarization: append existing summary to system prompt
+        system_content += PROGRESSIVE_SUMMARY_SYSTEM_PROMPT_BLOCK.format(
+            previous_summary=existing_summary
        )
-        user_prompt += f"\n\n{older_messages_str}"
        final_reminder = PROGRESSIVE_USER_REMINDER
    else:
-        # Initial summarization
-        user_prompt = older_messages_str
-        final_reminder = USER_FINAL_REMINDER
+        final_reminder = USER_REMINDER

-    # Add cutoff marker and recent messages as context
-    user_prompt += f"\n\n{SUMMARIZATION_CUTOFF_MARKER}"
-    if recent_messages_str:
-        user_prompt += f"\n\n{recent_messages_str}"
+    # Convert messages to LLM format (using compression-specific conversion)
+    older_llm_messages = _build_llm_messages_for_summarization(
+        older_messages, tool_id_to_name
+    )
+    recent_llm_messages = _build_llm_messages_for_summarization(
+        recent_messages, tool_id_to_name
+    )
+
+    # Build message list with separate messages
+    input_messages: list[ChatCompletionMessage] = [
+        SystemMessage(content=system_content),
+    ]
+
+    # Add older messages (to be summarized)
+    input_messages.extend(older_llm_messages)
+
+    # Add cutoff marker as a user message
+    input_messages.append(UserMessage(content=SUMMARIZATION_CUTOFF_MARKER))
+
+    # Add recent messages (for context only)
+    input_messages.extend(recent_llm_messages)

    # Add final reminder
-    user_prompt += f"\n\n{final_reminder}"
+    input_messages.append(UserMessage(content=final_reminder))

-    response = llm.invoke(
-        [
-            SystemMessage(content=SUMMARIZATION_PROMPT),
-            UserMessage(content=user_prompt),
-        ]
-    )
-    return response.choice.message.content or ""
+    with llm_generation_span(
+        llm=llm,
+        flow="chat_history_summarization",
+        input_messages=input_messages,
+    ) as span_generation:
+        response = llm.invoke(input_messages)
+        record_llm_response(span_generation, response)
+
+    content = response.choice.message.content
+    if not (content and content.strip()):
+        raise ValueError("LLM returned empty summary")
+    return content.strip()


 def compress_chat_history(
@@ -292,6 +339,19 @@ def compress_chat_history(
    The summary message's parent_message_id points to the last message in
    chat_history, making it branch-aware via the tree structure.

+    Note: This takes the entire chat history as input, splits it into older
+    messages (to summarize) and recent messages (kept verbatim within the
+    token budget), generates a summary of the older part, and persists the
+    new summary message with its parent set to the last message in history.
+
+    Past summary is taken into context (progressive summarization): we find
+    at most one existing summary for this branch. If present, only messages
+    after that summary's last_summarized_message_id are considered; the
+    existing summary text is passed into the LLM so the new summary
+    incorporates it instead of summarizing from scratch.
+
+    For more details, see the COMPRESSION.md file.
+
    Args:
        db_session: Database session
        chat_history: Branch-aware list of messages
@@ -305,74 +365,84 @@ def compress_chat_history(
    if not chat_history:
        return CompressionResult(summary_created=False, messages_summarized=0)

+    chat_session_id = chat_history[0].chat_session_id
+
    logger.info(
-        f"Starting compression for session {chat_history[0].chat_session_id}, "
+        f"Starting compression for session {chat_session_id}, "
        f"history_len={len(chat_history)}, tokens_for_recent={compression_params.tokens_for_recent}"
    )

-    try:
-        # Find existing summary for this branch
-        existing_summary = find_summary_for_branch(db_session, chat_history)
+    with ensure_trace(
+        "chat_history_compression",
+        group_id=str(chat_session_id),
+        metadata={
+            "tenant_id": get_current_tenant_id(),
+            "chat_session_id": str(chat_session_id),
+        },
+    ):
+        try:
+            # Find existing summary for this branch
+            existing_summary = find_summary_for_branch(db_session, chat_history)

-        # Get messages to summarize
-        summary_content = get_messages_to_summarize(
-            chat_history,
-            existing_summary,
-            tokens_for_recent=compression_params.tokens_for_recent,
-        )
+            # Get messages to summarize
+            summary_content = get_messages_to_summarize(
+                chat_history,
+                existing_summary,
+                tokens_for_recent=compression_params.tokens_for_recent,
+            )

-        if not summary_content.older_messages:
-            logger.debug("No messages to summarize, skipping compression")
-            return CompressionResult(summary_created=False, messages_summarized=0)
+            if not summary_content.older_messages:
+                logger.debug("No messages to summarize, skipping compression")
+                return CompressionResult(summary_created=False, messages_summarized=0)

-        # Generate summary (incorporate existing summary if present)
-        existing_summary_text = existing_summary.message if existing_summary else None
-        summary_text = generate_summary(
-            older_messages=summary_content.older_messages,
-            recent_messages=summary_content.recent_messages,
-            llm=llm,
-            tool_id_to_name=tool_id_to_name,
-            existing_summary=existing_summary_text,
-        )
+            # Generate summary (incorporate existing summary if present)
+            existing_summary_text = (
+                existing_summary.message if existing_summary else None
+            )
+            summary_text = generate_summary(
+                older_messages=summary_content.older_messages,
+                recent_messages=summary_content.recent_messages,
+                llm=llm,
+                tool_id_to_name=tool_id_to_name,
+                existing_summary=existing_summary_text,
+            )

-        # Calculate token count for the summary
-        tokenizer = get_tokenizer(None, None)
-        summary_token_count = len(tokenizer.encode(summary_text))
-        logger.debug(
-            f"Generated summary ({summary_token_count} tokens): {summary_text[:200]}..."
-        )
+            # Calculate token count for the summary
+            tokenizer = get_tokenizer(None, None)
+            summary_token_count = len(tokenizer.encode(summary_text))
+            logger.debug(
+                f"Generated summary ({summary_token_count} tokens): {summary_text[:200]}..."
+            )

-        # Create new summary as a ChatMessage
-        # Parent is the last message in history - this makes the summary branch-aware
-        summary_message = ChatMessage(
-            chat_session_id=chat_history[0].chat_session_id,
-            message_type=MessageType.ASSISTANT,
-            message=summary_text,
-            token_count=summary_token_count,
-            parent_message_id=chat_history[-1].id,
-            last_summarized_message_id=summary_content.older_messages[-1].id,
-        )
-        db_session.add(summary_message)
-        db_session.commit()
+            # Create new summary as a ChatMessage
+            # Parent is the last message in history - this makes the summary branch-aware
+            summary_message = ChatMessage(
+                chat_session_id=chat_session_id,
+                message_type=MessageType.ASSISTANT,
+                message=summary_text,
+                token_count=summary_token_count,
+                parent_message_id=chat_history[-1].id,
+                last_summarized_message_id=summary_content.older_messages[-1].id,
+            )
+            db_session.add(summary_message)
+            db_session.commit()

-        logger.info(
-            f"Compressed {len(summary_content.older_messages)} messages into summary "
-            f"(session_id={chat_history[0].chat_session_id}, "
-            f"summary_tokens={summary_token_count})"
-        )
+            logger.info(
+                f"Compressed {len(summary_content.older_messages)} messages into summary "
+                f"(session_id={chat_session_id}, "
+                f"summary_tokens={summary_token_count})"
+            )

-        return CompressionResult(
-            summary_created=True,
-            messages_summarized=len(summary_content.older_messages),
-        )
+            return CompressionResult(
+                summary_created=True,
+                messages_summarized=len(summary_content.older_messages),
+            )

-    except Exception as e:
-        logger.exception(
-            f"Compression failed for session {chat_history[0].chat_session_id}: {e}"
-        )
-        db_session.rollback()
-        return CompressionResult(
-            summary_created=False,
-            messages_summarized=0,
-            error=str(e),
-        )
+        except Exception as e:
+            logger.exception(f"Compression failed for session {chat_session_id}: {e}")
+            db_session.rollback()
+            return CompressionResult(
+                summary_created=False,
+                messages_summarized=0,
+                error=str(e),
+            )
--- a/backend/onyx/chat/llm_loop.py
+++ b/backend/onyx/chat/llm_loop.py
@@ -48,6 +48,7 @@ from onyx.server.query_and_chat.streaming_models import TopLevelBranching
 from onyx.tools.built_in_tools import CITEABLE_TOOLS_NAMES
 from onyx.tools.built_in_tools import STOPPING_TOOLS_NAMES
 from onyx.tools.interface import Tool
+from onyx.tools.models import ChatFile
 from onyx.tools.models import MemoryToolResponseSnapshot
 from onyx.tools.models import ToolCallInfo
 from onyx.tools.models import ToolCallKickoff
@@ -56,6 +57,7 @@ from onyx.tools.tool_implementations.images.models import (
    FinalImageGenerationResponse,
 )
 from onyx.tools.tool_implementations.memory.models import MemoryToolResponse
+from onyx.tools.tool_implementations.python.python_tool import PythonTool
 from onyx.tools.tool_implementations.search.search_tool import SearchTool
 from onyx.tools.tool_implementations.web_search.utils import extract_url_snippet_map
 from onyx.tools.tool_implementations.web_search.web_search_tool import WebSearchTool
@@ -67,6 +69,18 @@ from shared_configs.contextvars import get_current_tenant_id
 logger = setup_logger()


+def _looks_like_xml_tool_call_payload(text: str | None) -> bool:
+    """Detect XML-style marshaled tool calls emitted as plain text."""
+    if not text:
+        return False
+    lowered = text.lower()
+    return (
+        "<function_calls" in lowered
+        and "<invoke" in lowered
+        and "<parameter" in lowered
+    )
+
+
 def _should_keep_bedrock_tool_definitions(
    llm: object, simple_chat_history: list[ChatMessageSimple]
 ) -> bool:
@@ -121,38 +135,56 @@ def _try_fallback_tool_extraction(
    reasoning_but_no_answer_or_tools = (
        llm_step_result.reasoning and not llm_step_result.answer and no_tool_calls
    )
+    xml_tool_call_text_detected = no_tool_calls and (
+        _looks_like_xml_tool_call_payload(llm_step_result.answer)
+        or _looks_like_xml_tool_call_payload(llm_step_result.raw_answer)
+        or _looks_like_xml_tool_call_payload(llm_step_result.reasoning)
+    )
    should_try_fallback = (
-        tool_choice == ToolChoiceOptions.REQUIRED and no_tool_calls
-    ) or reasoning_but_no_answer_or_tools
+        (tool_choice == ToolChoiceOptions.REQUIRED and no_tool_calls)
+        or reasoning_but_no_answer_or_tools
+        or xml_tool_call_text_detected
+    )

    if not should_try_fallback:
        return llm_step_result, False

    # Try to extract from answer first, then fall back to reasoning
    extracted_tool_calls: list[ToolCallKickoff] = []
+
    if llm_step_result.answer:
        extracted_tool_calls = extract_tool_calls_from_response_text(
            response_text=llm_step_result.answer,
            tool_definitions=tool_defs,
            placement=Placement(turn_index=turn_index),
        )
+    if (
+        not extracted_tool_calls
+        and llm_step_result.raw_answer
+        and llm_step_result.raw_answer != llm_step_result.answer
+    ):
+        extracted_tool_calls = extract_tool_calls_from_response_text(
+            response_text=llm_step_result.raw_answer,
+            tool_definitions=tool_defs,
+            placement=Placement(turn_index=turn_index),
+        )
    if not extracted_tool_calls and llm_step_result.reasoning:
        extracted_tool_calls = extract_tool_calls_from_response_text(
            response_text=llm_step_result.reasoning,
            tool_definitions=tool_defs,
            placement=Placement(turn_index=turn_index),
        )
-
    if extracted_tool_calls:
        logger.info(
            f"Extracted {len(extracted_tool_calls)} tool call(s) from response text "
-            f"as fallback (tool_choice was REQUIRED but no tool calls returned)"
+            "as fallback"
        )
        return (
            LlmStepResult(
                reasoning=llm_step_result.reasoning,
                answer=llm_step_result.answer,
                tool_calls=extracted_tool_calls,
+                raw_answer=llm_step_result.raw_answer,
            ),
            True,
        )
@@ -450,7 +482,42 @@ def construct_message_history(
    if reminder_message:
        result.append(reminder_message)

-    return result
+    return _drop_orphaned_tool_call_responses(result)
+
+
+def _drop_orphaned_tool_call_responses(
+    messages: list[ChatMessageSimple],
+) -> list[ChatMessageSimple]:
+    """Drop tool response messages whose tool_call_id is not in prior assistant tool calls.
+
+    This can happen when history truncation drops an ASSISTANT tool-call message but
+    leaves a later TOOL_CALL_RESPONSE message in context. Some providers (e.g. Ollama)
+    reject such history with an "unexpected tool call id" error.
+    """
+    known_tool_call_ids: set[str] = set()
+    sanitized: list[ChatMessageSimple] = []
+
+    for msg in messages:
+        if msg.message_type == MessageType.ASSISTANT and msg.tool_calls:
+            for tool_call in msg.tool_calls:
+                known_tool_call_ids.add(tool_call.tool_call_id)
+            sanitized.append(msg)
+            continue
+
+        if msg.message_type == MessageType.TOOL_CALL_RESPONSE:
+            if msg.tool_call_id and msg.tool_call_id in known_tool_call_ids:
+                sanitized.append(msg)
+            else:
+                logger.debug(
+                    "Dropping orphaned tool response with tool_call_id=%s while "
+                    "constructing message history",
+                    msg.tool_call_id,
+                )
+            continue
+
+        sanitized.append(msg)
+
+    return sanitized


 def _create_file_tool_metadata_message(
@@ -525,6 +592,7 @@ def run_llm_loop(
    forced_tool_id: int | None = None,
    user_identity: LLMUserIdentity | None = None,
    chat_session_id: str | None = None,
+    chat_files: list[ChatFile] | None = None,
    include_citations: bool = True,
    all_injected_file_metadata: dict[str, FileToolMetadata] | None = None,
    inject_memories_in_prompt: bool = True,
@@ -584,6 +652,7 @@ def run_llm_loop(
        ran_image_gen: bool = False
        just_ran_web_search: bool = False
        has_called_search_tool: bool = False
+        code_interpreter_file_generated: bool = False
        fallback_extraction_attempted: bool = False
        citation_mapping: dict[int, str] = {}  # Maps citation_num -> document_id/URL

@@ -694,6 +763,7 @@ def run_llm_loop(
                    ),
                    include_citation_reminder=should_cite_documents
                    or always_cite_documents,
+                    include_file_reminder=code_interpreter_file_generated,
                    is_last_cycle=out_of_cycles,
                )

@@ -806,6 +876,7 @@ def run_llm_loop(
                next_citation_num=citation_processor.get_next_citation_number(),
                max_concurrent_tools=None,
                skip_search_query_expansion=has_called_search_tool,
+                chat_files=chat_files,
                url_snippet_map=extract_url_snippet_map(gathered_documents or []),
                inject_memories_in_prompt=inject_memories_in_prompt,
            )
@@ -832,6 +903,18 @@ def run_llm_loop(
                if tool_call.tool_name == SearchTool.NAME:
                    has_called_search_tool = True

+                # Track if code interpreter generated files with download links
+                if (
+                    tool_call.tool_name == PythonTool.NAME
+                    and not code_interpreter_file_generated
+                ):
+                    try:
+                        parsed = json.loads(tool_response.llm_facing_response)
+                        if parsed.get("generated_files"):
+                            code_interpreter_file_generated = True
+                    except (json.JSONDecodeError, AttributeError):
+                        pass
+
                # Build a mapping of tool names to tool objects for getting tool_id
                tools_by_name = {tool.name: tool for tool in final_tools}

--- a/backend/onyx/chat/llm_step.py
+++ b/backend/onyx/chat/llm_step.py
@@ -1,10 +1,12 @@
 import json
+import re
 import time
 import uuid
 from collections.abc import Callable
 from collections.abc import Generator
 from collections.abc import Mapping
 from collections.abc import Sequence
+from html import unescape
 from typing import Any
 from typing import cast

@@ -18,6 +20,7 @@ from onyx.configs.app_configs import PROMPT_CACHE_CHAT_HISTORY
 from onyx.configs.constants import MessageType
 from onyx.context.search.models import SearchDoc
 from onyx.file_store.models import ChatFileType
+from onyx.llm.constants import LlmProviderNames
 from onyx.llm.interfaces import LanguageModelInput
 from onyx.llm.interfaces import LLM
 from onyx.llm.interfaces import LLMConfig
@@ -56,6 +59,112 @@ from onyx.utils.text_processing import find_all_json_objects

 logger = setup_logger()

+_XML_INVOKE_BLOCK_RE = re.compile(
+    r"<invoke\b(?P<attrs>[^>]*)>(?P<body>.*?)</invoke>",
+    re.IGNORECASE | re.DOTALL,
+)
+_XML_PARAMETER_RE = re.compile(
+    r"<parameter\b(?P<attrs>[^>]*)>(?P<value>.*?)</parameter>",
+    re.IGNORECASE | re.DOTALL,
+)
+_FUNCTION_CALLS_OPEN_MARKER = "<function_calls"
+_FUNCTION_CALLS_CLOSE_MARKER = "</function_calls>"
+
+
+class _XmlToolCallContentFilter:
+    """Streaming filter that strips XML-style tool call payload blocks from text."""
+
+    def __init__(self) -> None:
+        self._pending = ""
+        self._inside_function_calls_block = False
+
+    def process(self, content: str) -> str:
+        if not content:
+            return ""
+
+        self._pending += content
+        output_parts: list[str] = []
+
+        while self._pending:
+            pending_lower = self._pending.lower()
+
+            if self._inside_function_calls_block:
+                end_idx = pending_lower.find(_FUNCTION_CALLS_CLOSE_MARKER)
+                if end_idx == -1:
+                    # Keep buffering until we see the close marker.
+                    return "".join(output_parts)
+
+                # Drop the whole function_calls block.
+                self._pending = self._pending[
+                    end_idx + len(_FUNCTION_CALLS_CLOSE_MARKER) :
+                ]
+                self._inside_function_calls_block = False
+                continue
+
+            start_idx = _find_function_calls_open_marker(pending_lower)
+            if start_idx == -1:
+                # Keep only a possible prefix of "<function_calls" in the buffer so
+                # marker splits across chunks are handled correctly.
+                tail_len = _matching_open_marker_prefix_len(self._pending)
+                emit_upto = len(self._pending) - tail_len
+                if emit_upto > 0:
+                    output_parts.append(self._pending[:emit_upto])
+                    self._pending = self._pending[emit_upto:]
+                return "".join(output_parts)
+
+            if start_idx > 0:
+                output_parts.append(self._pending[:start_idx])
+
+            # Enter block-stripping mode and keep scanning for close marker.
+            self._pending = self._pending[start_idx:]
+            self._inside_function_calls_block = True
+
+        return "".join(output_parts)
+
+    def flush(self) -> str:
+        if self._inside_function_calls_block:
+            # Drop any incomplete block at stream end.
+            self._pending = ""
+            self._inside_function_calls_block = False
+            return ""
+
+        remaining = self._pending
+        self._pending = ""
+        return remaining
+
+
+def _matching_open_marker_prefix_len(text: str) -> int:
+    """Return longest suffix of text that matches prefix of "<function_calls"."""
+    max_len = min(len(text), len(_FUNCTION_CALLS_OPEN_MARKER) - 1)
+    text_lower = text.lower()
+    marker_lower = _FUNCTION_CALLS_OPEN_MARKER
+
+    for candidate_len in range(max_len, 0, -1):
+        if text_lower.endswith(marker_lower[:candidate_len]):
+            return candidate_len
+
+    return 0
+
+
+def _is_valid_function_calls_open_follower(char: str | None) -> bool:
+    return char is None or char in {">", " ", "\t", "\n", "\r"}
+
+
+def _find_function_calls_open_marker(text_lower: str) -> int:
+    """Find '<function_calls' with a valid tag boundary follower."""
+    search_from = 0
+    while True:
+        idx = text_lower.find(_FUNCTION_CALLS_OPEN_MARKER, search_from)
+        if idx == -1:
+            return -1
+
+        follower_pos = idx + len(_FUNCTION_CALLS_OPEN_MARKER)
+        follower = text_lower[follower_pos] if follower_pos < len(text_lower) else None
+        if _is_valid_function_calls_open_follower(follower):
+            return idx
+
+        search_from = idx + 1
+

 def _sanitize_llm_output(value: str) -> str:
    """Remove characters that PostgreSQL's text/JSONB types cannot store.
@@ -272,14 +381,7 @@ def _extract_tool_call_kickoffs(
    tab_index_calculated = 0
    for tool_call_data in id_to_tool_call_map.values():
        if tool_call_data.get("id") and tool_call_data.get("name"):
-            try:
-                tool_args = _parse_tool_args_to_dict(tool_call_data.get("arguments"))
-            except json.JSONDecodeError:
-                # If parsing fails, try empty dict, most tools would fail though
-                logger.error(
-                    f"Failed to parse tool call arguments: {tool_call_data['arguments']}"
-                )
-                tool_args = {}
+            tool_args = _parse_tool_args_to_dict(tool_call_data.get("arguments"))

            tool_calls.append(
                ToolCallKickoff(
@@ -307,8 +409,9 @@ def extract_tool_calls_from_response_text(
    """Extract tool calls from LLM response text by matching JSON against tool definitions.

    This is a fallback mechanism for when the LLM was expected to return tool calls
-    but didn't use the proper tool call format. It searches for JSON objects in the
-    response text that match the structure of available tools.
+    but didn't use the proper tool call format. It searches for tool calls embedded
+    in response text (JSON first, then XML-like invoke blocks) that match available
+    tool definitions.

    Args:
        response_text: The LLM's text response to search for tool calls
@@ -333,10 +436,9 @@ def extract_tool_calls_from_response_text(
    if not tool_name_to_def:
        return []

+    matched_tool_calls: list[tuple[str, dict[str, Any]]] = []
    # Find all JSON objects in the response text
    json_objects = find_all_json_objects(response_text)
-
-    matched_tool_calls: list[tuple[str, dict[str, Any]]] = []
    prev_json_obj: dict[str, Any] | None = None
    prev_tool_call: tuple[str, dict[str, Any]] | None = None

@@ -364,6 +466,14 @@ def extract_tool_calls_from_response_text(
        prev_json_obj = json_obj
        prev_tool_call = matched_tool_call

+    # Some providers/models emit XML-style function calls instead of JSON objects.
+    # Keep this as a fallback behind JSON extraction to preserve current behavior.
+    if not matched_tool_calls:
+        matched_tool_calls = _extract_xml_tool_calls_from_response_text(
+            response_text=response_text,
+            tool_name_to_def=tool_name_to_def,
+        )
+
    tool_calls: list[ToolCallKickoff] = []
    for tab_index, (tool_name, tool_args) in enumerate(matched_tool_calls):
        tool_calls.append(
@@ -386,6 +496,88 @@ def extract_tool_calls_from_response_text(
    return tool_calls


+def _extract_xml_tool_calls_from_response_text(
+    response_text: str,
+    tool_name_to_def: dict[str, dict],
+) -> list[tuple[str, dict[str, Any]]]:
+    """Extract XML-style tool calls from response text.
+
+    Supports formats such as:
+    <function_calls>
+      <invoke name="internal_search">
+        <parameter name="queries" string="false">["foo"]</parameter>
+      </invoke>
+    </function_calls>
+    """
+    matched_tool_calls: list[tuple[str, dict[str, Any]]] = []
+
+    for invoke_match in _XML_INVOKE_BLOCK_RE.finditer(response_text):
+        invoke_attrs = invoke_match.group("attrs")
+        tool_name = _extract_xml_attribute(invoke_attrs, "name")
+        if not tool_name or tool_name not in tool_name_to_def:
+            continue
+
+        tool_args: dict[str, Any] = {}
+        invoke_body = invoke_match.group("body")
+        for parameter_match in _XML_PARAMETER_RE.finditer(invoke_body):
+            parameter_attrs = parameter_match.group("attrs")
+            parameter_name = _extract_xml_attribute(parameter_attrs, "name")
+            if not parameter_name:
+                continue
+
+            string_attr = _extract_xml_attribute(parameter_attrs, "string")
+            tool_args[parameter_name] = _parse_xml_parameter_value(
+                raw_value=parameter_match.group("value"),
+                string_attr=string_attr,
+            )
+
+        matched_tool_calls.append((tool_name, tool_args))
+
+    return matched_tool_calls
+
+
+def _extract_xml_attribute(attrs: str, attr_name: str) -> str | None:
+    """Extract a single XML-style attribute value from a tag attribute string."""
+    attr_match = re.search(
+        rf"""\b{re.escape(attr_name)}\s*=\s*(['"])(.*?)\1""",
+        attrs,
+        flags=re.IGNORECASE | re.DOTALL,
+    )
+    if not attr_match:
+        return None
+    return _sanitize_llm_output(unescape(attr_match.group(2).strip()))
+
+
+def _parse_xml_parameter_value(raw_value: str, string_attr: str | None) -> Any:
+    """Parse a parameter value from XML-style tool call payloads."""
+    value = _sanitize_llm_output(unescape(raw_value).strip())
+
+    if string_attr and string_attr.lower() == "true":
+        return value
+
+    try:
+        return json.loads(value)
+    except json.JSONDecodeError:
+        return value
+
+
+def _resolve_tool_arguments(obj: dict[str, Any]) -> dict[str, Any] | None:
+    """Extract and parse an arguments/parameters value from a tool-call-like object.
+
+    Looks for "arguments" or "parameters" keys, handles JSON-string values,
+    and returns a dict if successful, or None otherwise.
+    """
+    arguments = obj.get("arguments", obj.get("parameters", {}))
+    if isinstance(arguments, str):
+        try:
+            arguments = json.loads(arguments)
+        except json.JSONDecodeError:
+            arguments = {}
+    if isinstance(arguments, dict):
+        return arguments
+    return None
+
+
 def _try_match_json_to_tool(
    json_obj: dict[str, Any],
    tool_name_to_def: dict[str, dict],
@@ -408,13 +600,8 @@ def _try_match_json_to_tool(
    # Format 1: Direct tool call format {"name": "...", "arguments": {...}}
    if "name" in json_obj and json_obj["name"] in tool_name_to_def:
        tool_name = json_obj["name"]
-        arguments = json_obj.get("arguments", json_obj.get("parameters", {}))
-        if isinstance(arguments, str):
-            try:
-                arguments = json.loads(arguments)
-            except json.JSONDecodeError:
-                arguments = {}
-        if isinstance(arguments, dict):
+        arguments = _resolve_tool_arguments(json_obj)
+        if arguments is not None:
            return (tool_name, arguments)

    # Format 2: Function call format {"function": {"name": "...", "arguments": {...}}}
@@ -422,13 +609,8 @@ def _try_match_json_to_tool(
        func_obj = json_obj["function"]
        if "name" in func_obj and func_obj["name"] in tool_name_to_def:
            tool_name = func_obj["name"]
-            arguments = func_obj.get("arguments", func_obj.get("parameters", {}))
-            if isinstance(arguments, str):
-                try:
-                    arguments = json.loads(arguments)
-                except json.JSONDecodeError:
-                    arguments = {}
-            if isinstance(arguments, dict):
+            arguments = _resolve_tool_arguments(func_obj)
+            if arguments is not None:
                return (tool_name, arguments)

    # Format 3: Tool name as key {"tool_name": {...arguments...}}
@@ -495,6 +677,107 @@ def _extract_nested_arguments_obj(
    return None


+def _build_structured_assistant_message(msg: ChatMessageSimple) -> AssistantMessage:
+    tool_calls_list: list[ToolCall] | None = None
+    if msg.tool_calls:
+        tool_calls_list = [
+            ToolCall(
+                id=tc.tool_call_id,
+                type="function",
+                function=FunctionCall(
+                    name=tc.tool_name,
+                    arguments=json.dumps(tc.tool_arguments),
+                ),
+            )
+            for tc in msg.tool_calls
+        ]
+
+    return AssistantMessage(
+        role="assistant",
+        content=msg.message or None,
+        tool_calls=tool_calls_list,
+    )
+
+
+def _build_structured_tool_response_message(msg: ChatMessageSimple) -> ToolMessage:
+    if not msg.tool_call_id:
+        raise ValueError(
+            "Tool call response message encountered but tool_call_id is not available. "
+            f"Message: {msg}"
+        )
+
+    return ToolMessage(
+        role="tool",
+        content=msg.message,
+        tool_call_id=msg.tool_call_id,
+    )
+
+
+class _HistoryMessageFormatter:
+    def format_assistant_message(self, msg: ChatMessageSimple) -> AssistantMessage:
+        raise NotImplementedError
+
+    def format_tool_response_message(
+        self, msg: ChatMessageSimple
+    ) -> ToolMessage | UserMessage:
+        raise NotImplementedError
+
+
+class _DefaultHistoryMessageFormatter(_HistoryMessageFormatter):
+    def format_assistant_message(self, msg: ChatMessageSimple) -> AssistantMessage:
+        return _build_structured_assistant_message(msg)
+
+    def format_tool_response_message(self, msg: ChatMessageSimple) -> ToolMessage:
+        return _build_structured_tool_response_message(msg)
+
+
+class _OllamaHistoryMessageFormatter(_HistoryMessageFormatter):
+    def format_assistant_message(self, msg: ChatMessageSimple) -> AssistantMessage:
+        if not msg.tool_calls:
+            return _build_structured_assistant_message(msg)
+
+        tool_call_lines = [
+            (
+                f"[Tool Call] name={tc.tool_name} "
+                f"id={tc.tool_call_id} args={json.dumps(tc.tool_arguments)}"
+            )
+            for tc in msg.tool_calls
+        ]
+        assistant_content = (
+            "\n".join([msg.message, *tool_call_lines])
+            if msg.message
+            else "\n".join(tool_call_lines)
+        )
+        return AssistantMessage(
+            role="assistant",
+            content=assistant_content,
+            tool_calls=None,
+        )
+
+    def format_tool_response_message(self, msg: ChatMessageSimple) -> UserMessage:
+        if not msg.tool_call_id:
+            raise ValueError(
+                "Tool call response message encountered but tool_call_id is not available. "
+                f"Message: {msg}"
+            )
+
+        return UserMessage(
+            role="user",
+            content=f"[Tool Result] id={msg.tool_call_id}\n{msg.message}",
+        )
+
+
+_DEFAULT_HISTORY_MESSAGE_FORMATTER = _DefaultHistoryMessageFormatter()
+_OLLAMA_HISTORY_MESSAGE_FORMATTER = _OllamaHistoryMessageFormatter()
+
+
+def _get_history_message_formatter(llm_config: LLMConfig) -> _HistoryMessageFormatter:
+    if llm_config.model_provider == LlmProviderNames.OLLAMA_CHAT:
+        return _OLLAMA_HISTORY_MESSAGE_FORMATTER
+
+    return _DEFAULT_HISTORY_MESSAGE_FORMATTER
+
+
 def translate_history_to_llm_format(
    history: list[ChatMessageSimple],
    llm_config: LLMConfig,
@@ -505,6 +788,10 @@ def translate_history_to_llm_format(
    handling different message types and image files for multimodal support.
    """
    messages: list[ChatCompletionMessage] = []
+    history_message_formatter = _get_history_message_formatter(llm_config)
+    # Note: cacheability is computed from pre-translation ChatMessageSimple types.
+    # Some providers flatten tool history into plain assistant/user text, so this split
+    # may be less semantically meaningful, but it remains safe and order-preserving.
    last_cacheable_msg_idx = -1
    all_previous_msgs_cacheable = True

@@ -586,39 +873,10 @@ def translate_history_to_llm_format(
            messages.append(reminder_msg)

        elif msg.message_type == MessageType.ASSISTANT:
-            tool_calls_list: list[ToolCall] | None = None
-            if msg.tool_calls:
-                tool_calls_list = [
-                    ToolCall(
-                        id=tc.tool_call_id,
-                        type="function",
-                        function=FunctionCall(
-                            name=tc.tool_name,
-                            arguments=json.dumps(tc.tool_arguments),
-                        ),
-                    )
-                    for tc in msg.tool_calls
-                ]
-
-            assistant_msg = AssistantMessage(
-                role="assistant",
-                content=msg.message or None,
-                tool_calls=tool_calls_list,
-            )
-            messages.append(assistant_msg)
+            messages.append(history_message_formatter.format_assistant_message(msg))

        elif msg.message_type == MessageType.TOOL_CALL_RESPONSE:
-            if not msg.tool_call_id:
-                raise ValueError(
-                    f"Tool call response message encountered but tool_call_id is not available. Message: {msg}"
-                )
-
-            tool_msg = ToolMessage(
-                role="tool",
-                content=msg.message,
-                tool_call_id=msg.tool_call_id,
-            )
-            messages.append(tool_msg)
+            messages.append(history_message_formatter.format_tool_response_message(msg))

        else:
            logger.warning(
@@ -698,7 +956,8 @@ def run_llm_step_pkt_generator(
        tool_definitions: List of tool definitions available to the LLM.
        tool_choice: Tool choice configuration (e.g., "auto", "required", "none").
        llm: Language model interface to use for generation.
-        turn_index: Current turn index in the conversation.
+        placement: Placement info (turn_index, tab_index, sub_turn_index) for
+            positioning packets in the conversation UI.
        state_container: Container for storing chat state (reasoning, answers).
        citation_processor: Optional processor for extracting and formatting citations
            from the response. If provided, processes tokens to identify citations.
@@ -710,7 +969,14 @@ def run_llm_step_pkt_generator(
        custom_token_processor: Optional callable that processes each token delta
            before yielding. Receives (delta, processor_state) and returns
            (modified_delta, new_processor_state). Can return None for delta to skip.
-        sub_turn_index: Optional sub-turn index for nested tool/agent calls.
+        max_tokens: Optional maximum number of tokens for the LLM response.
+        use_existing_tab_index: If True, use the tab_index from placement for all
+            tool calls instead of auto-incrementing.
+        is_deep_research: If True, treat content before tool calls as reasoning
+            when tool_choice is REQUIRED.
+        pre_answer_processing_time: Optional time spent processing before the
+            answer started, recorded in state_container for analytics.
+        timeout_override: Optional timeout override for the LLM call.

    Yields:
        Packet: Streaming packets containing:
@@ -736,8 +1002,15 @@ def run_llm_step_pkt_generator(
    tab_index = placement.tab_index
    sub_turn_index = placement.sub_turn_index

+    def _current_placement() -> Placement:
+        return Placement(
+            turn_index=turn_index,
+            tab_index=tab_index,
+            sub_turn_index=sub_turn_index,
+        )
+
    llm_msg_history = translate_history_to_llm_format(history, llm.config)
-    has_reasoned = 0
+    has_reasoned = False

    if LOG_ONYX_MODEL_INTERACTIONS:
        logger.debug(
@@ -749,6 +1022,8 @@ def run_llm_step_pkt_generator(
    answer_start = False
    accumulated_reasoning = ""
    accumulated_answer = ""
+    accumulated_raw_answer = ""
+    xml_tool_call_content_filter = _XmlToolCallContentFilter()

    processor_state: Any = None

@@ -764,6 +1039,112 @@ def run_llm_step_pkt_generator(
        )
        stream_start_time = time.monotonic()
        first_action_recorded = False
+
+        def _emit_citation_results(
+            results: Generator[str | CitationInfo, None, None],
+        ) -> Generator[Packet, None, None]:
+            """Yield packets for citation processor results (str or CitationInfo)."""
+            nonlocal accumulated_answer
+
+            for result in results:
+                if isinstance(result, str):
+                    accumulated_answer += result
+                    if state_container:
+                        state_container.set_answer_tokens(accumulated_answer)
+                    yield Packet(
+                        placement=_current_placement(),
+                        obj=AgentResponseDelta(content=result),
+                    )
+                elif isinstance(result, CitationInfo):
+                    yield Packet(
+                        placement=_current_placement(),
+                        obj=result,
+                    )
+                    if state_container:
+                        state_container.add_emitted_citation(result.citation_number)
+
+        def _close_reasoning_if_active() -> Generator[Packet, None, None]:
+            """Emit ReasoningDone and increment turns if reasoning is in progress."""
+            nonlocal reasoning_start
+            nonlocal has_reasoned
+            nonlocal turn_index
+            nonlocal sub_turn_index
+
+            if reasoning_start:
+                yield Packet(
+                    placement=Placement(
+                        turn_index=turn_index,
+                        tab_index=tab_index,
+                        sub_turn_index=sub_turn_index,
+                    ),
+                    obj=ReasoningDone(),
+                )
+                has_reasoned = True
+                turn_index, sub_turn_index = _increment_turns(
+                    turn_index, sub_turn_index
+                )
+                reasoning_start = False
+
+        def _emit_content_chunk(content_chunk: str) -> Generator[Packet, None, None]:
+            nonlocal accumulated_answer
+            nonlocal accumulated_reasoning
+            nonlocal answer_start
+            nonlocal reasoning_start
+            nonlocal turn_index
+            nonlocal sub_turn_index
+
+            # When tool_choice is REQUIRED, content before tool calls is reasoning/thinking
+            # about which tool to call, not an actual answer to the user.
+            # Treat this content as reasoning instead of answer.
+            if is_deep_research and tool_choice == ToolChoiceOptions.REQUIRED:
+                accumulated_reasoning += content_chunk
+                if state_container:
+                    state_container.set_reasoning_tokens(accumulated_reasoning)
+                if not reasoning_start:
+                    yield Packet(
+                        placement=_current_placement(),
+                        obj=ReasoningStart(),
+                    )
+                yield Packet(
+                    placement=_current_placement(),
+                    obj=ReasoningDelta(reasoning=content_chunk),
+                )
+                reasoning_start = True
+                return
+
+            # Normal flow for AUTO or NONE tool choice
+            yield from _close_reasoning_if_active()
+
+            if not answer_start:
+                # Store pre-answer processing time in state container for save_chat
+                if state_container and pre_answer_processing_time is not None:
+                    state_container.set_pre_answer_processing_time(
+                        pre_answer_processing_time
+                    )
+
+                yield Packet(
+                    placement=_current_placement(),
+                    obj=AgentResponseStart(
+                        final_documents=final_documents,
+                        pre_answer_processing_seconds=pre_answer_processing_time,
+                    ),
+                )
+                answer_start = True
+
+            if citation_processor:
+                yield from _emit_citation_results(
+                    citation_processor.process_token(content_chunk)
+                )
+            else:
+                accumulated_answer += content_chunk
+                # Save answer incrementally to state container
+                if state_container:
+                    state_container.set_answer_tokens(accumulated_answer)
+                yield Packet(
+                    placement=_current_placement(),
+                    obj=AgentResponseDelta(content=content_chunk),
+                )
+
        for packet in llm.stream(
            prompt=llm_msg_history,
            tools=tool_definitions,
@@ -822,152 +1203,34 @@ def run_llm_step_pkt_generator(
                    state_container.set_reasoning_tokens(accumulated_reasoning)
                if not reasoning_start:
                    yield Packet(
-                        placement=Placement(
-                            turn_index=turn_index,
-                            tab_index=tab_index,
-                            sub_turn_index=sub_turn_index,
-                        ),
+                        placement=_current_placement(),
                        obj=ReasoningStart(),
                    )
                yield Packet(
-                    placement=Placement(
-                        turn_index=turn_index,
-                        tab_index=tab_index,
-                        sub_turn_index=sub_turn_index,
-                    ),
+                    placement=_current_placement(),
                    obj=ReasoningDelta(reasoning=delta.reasoning_content),
                )
                reasoning_start = True

            if delta.content:
-                # When tool_choice is REQUIRED, content before tool calls is reasoning/thinking
-                # about which tool to call, not an actual answer to the user.
-                # Treat this content as reasoning instead of answer.
-                if is_deep_research and tool_choice == ToolChoiceOptions.REQUIRED:
-                    # Treat content as reasoning when we know tool calls are coming
-                    accumulated_reasoning += delta.content
-                    if state_container:
-                        state_container.set_reasoning_tokens(accumulated_reasoning)
-                    if not reasoning_start:
-                        yield Packet(
-                            placement=Placement(
-                                turn_index=turn_index,
-                                tab_index=tab_index,
-                                sub_turn_index=sub_turn_index,
-                            ),
-                            obj=ReasoningStart(),
-                        )
-                    yield Packet(
-                        placement=Placement(
-                            turn_index=turn_index,
-                            tab_index=tab_index,
-                            sub_turn_index=sub_turn_index,
-                        ),
-                        obj=ReasoningDelta(reasoning=delta.content),
-                    )
-                    reasoning_start = True
-                else:
-                    # Normal flow for AUTO or NONE tool choice
-                    if reasoning_start:
-                        yield Packet(
-                            placement=Placement(
-                                turn_index=turn_index,
-                                tab_index=tab_index,
-                                sub_turn_index=sub_turn_index,
-                            ),
-                            obj=ReasoningDone(),
-                        )
-                        has_reasoned = 1
-                        turn_index, sub_turn_index = _increment_turns(
-                            turn_index, sub_turn_index
-                        )
-                        reasoning_start = False
-
-                    if not answer_start:
-                        # Store pre-answer processing time in state container for save_chat
-                        if state_container and pre_answer_processing_time is not None:
-                            state_container.set_pre_answer_processing_time(
-                                pre_answer_processing_time
-                            )
-
-                        yield Packet(
-                            placement=Placement(
-                                turn_index=turn_index,
-                                tab_index=tab_index,
-                                sub_turn_index=sub_turn_index,
-                            ),
-                            obj=AgentResponseStart(
-                                final_documents=final_documents,
-                                pre_answer_processing_seconds=pre_answer_processing_time,
-                            ),
-                        )
-                        answer_start = True
-
-                    if citation_processor:
-                        for result in citation_processor.process_token(delta.content):
-                            if isinstance(result, str):
-                                accumulated_answer += result
-                                # Save answer incrementally to state container
-                                if state_container:
-                                    state_container.set_answer_tokens(
-                                        accumulated_answer
-                                    )
-                                yield Packet(
-                                    placement=Placement(
-                                        turn_index=turn_index,
-                                        tab_index=tab_index,
-                                        sub_turn_index=sub_turn_index,
-                                    ),
-                                    obj=AgentResponseDelta(content=result),
-                                )
-                            elif isinstance(result, CitationInfo):
-                                yield Packet(
-                                    placement=Placement(
-                                        turn_index=turn_index,
-                                        tab_index=tab_index,
-                                        sub_turn_index=sub_turn_index,
-                                    ),
-                                    obj=result,
-                                )
-                                # Track emitted citation for saving
-                                if state_container:
-                                    state_container.add_emitted_citation(
-                                        result.citation_number
-                                    )
-                    else:
-                        # When citation_processor is None, use delta.content directly without modification
-                        accumulated_answer += delta.content
-                        # Save answer incrementally to state container
-                        if state_container:
-                            state_container.set_answer_tokens(accumulated_answer)
-                        yield Packet(
-                            placement=Placement(
-                                turn_index=turn_index,
-                                tab_index=tab_index,
-                                sub_turn_index=sub_turn_index,
-                            ),
-                            obj=AgentResponseDelta(content=delta.content),
-                        )
+                # Keep raw content for fallback extraction. Display content can be
+                # filtered and, in deep-research REQUIRED mode, routed as reasoning.
+                accumulated_raw_answer += delta.content
+                filtered_content = xml_tool_call_content_filter.process(delta.content)
+                if filtered_content:
+                    yield from _emit_content_chunk(filtered_content)

            if delta.tool_calls:
-                if reasoning_start:
-                    yield Packet(
-                        placement=Placement(
-                            turn_index=turn_index,
-                            tab_index=tab_index,
-                            sub_turn_index=sub_turn_index,
-                        ),
-                        obj=ReasoningDone(),
-                    )
-                    has_reasoned = 1
-                    turn_index, sub_turn_index = _increment_turns(
-                        turn_index, sub_turn_index
-                    )
-                    reasoning_start = False
+                yield from _close_reasoning_if_active()

                for tool_call_delta in delta.tool_calls:
                    _update_tool_call_with_delta(id_to_tool_call_map, tool_call_delta)

+        # Flush any tail text buffered while checking for split "<function_calls" markers.
+        filtered_content_tail = xml_tool_call_content_filter.flush()
+        if filtered_content_tail:
+            yield from _emit_content_chunk(filtered_content_tail)
+
        # Flush custom token processor to get any final tool calls
        if custom_token_processor:
            flush_delta, processor_state = custom_token_processor(None, processor_state)
@@ -1023,50 +1286,14 @@ def run_llm_step_pkt_generator(

    # This may happen if the custom token processor is used to modify other packets into reasoning
    # Then there won't necessarily be anything else to come after the reasoning tokens
-    if reasoning_start:
-        yield Packet(
-            placement=Placement(
-                turn_index=turn_index,
-                tab_index=tab_index,
-                sub_turn_index=sub_turn_index,
-            ),
-            obj=ReasoningDone(),
-        )
-        has_reasoned = 1
-        turn_index, sub_turn_index = _increment_turns(turn_index, sub_turn_index)
-        reasoning_start = False
+    yield from _close_reasoning_if_active()

    # Flush any remaining content from citation processor
    # Reasoning is always first so this should use the post-incremented value of turn_index
    # Note that this doesn't need to handle any sub-turns as those docs will not have citations
    # as clickable items and will be stripped out instead.
    if citation_processor:
-        for result in citation_processor.process_token(None):
-            if isinstance(result, str):
-                accumulated_answer += result
-                # Save answer incrementally to state container
-                if state_container:
-                    state_container.set_answer_tokens(accumulated_answer)
-                yield Packet(
-                    placement=Placement(
-                        turn_index=turn_index,
-                        tab_index=tab_index,
-                        sub_turn_index=sub_turn_index,
-                    ),
-                    obj=AgentResponseDelta(content=result),
-                )
-            elif isinstance(result, CitationInfo):
-                yield Packet(
-                    placement=Placement(
-                        turn_index=turn_index,
-                        tab_index=tab_index,
-                        sub_turn_index=sub_turn_index,
-                    ),
-                    obj=result,
-                )
-                # Track emitted citation for saving
-                if state_container:
-                    state_container.add_emitted_citation(result.citation_number)
+        yield from _emit_citation_results(citation_processor.process_token(None))

    # Note: Content (AgentResponseDelta) doesn't need an explicit end packet - OverallStop handles it
    # Tool calls are handled by tool execution code and emit their own packets (e.g., SectionEnd)
@@ -1088,8 +1315,9 @@ def run_llm_step_pkt_generator(
            reasoning=accumulated_reasoning if accumulated_reasoning else None,
            answer=accumulated_answer if accumulated_answer else None,
            tool_calls=tool_calls if tool_calls else None,
+            raw_answer=accumulated_raw_answer if accumulated_raw_answer else None,
        ),
-        bool(has_reasoned),
+        has_reasoned,
    )


@@ -1144,4 +1372,4 @@ def run_llm_step(
            emitter.emit(packet)
        except StopIteration as e:
            llm_step_result, has_reasoned = e.value
-            return llm_step_result, bool(has_reasoned)
+            return llm_step_result, has_reasoned
--- a/backend/onyx/chat/models.py
+++ b/backend/onyx/chat/models.py
@@ -185,3 +185,6 @@ class LlmStepResult(BaseModel):
    reasoning: str | None
    answer: str | None
    tool_calls: list[ToolCallKickoff] | None
+    # Raw LLM text before any display-oriented filtering/sanitization.
+    # Used for fallback tool-call extraction when providers emit calls as text.
+    raw_answer: str | None = None
--- a/backend/onyx/chat/process_message.py
+++ b/backend/onyx/chat/process_message.py
@@ -4,7 +4,6 @@ An overview can be found in the README.md file in this directory.
 """

 import re
-import time
 import traceback
 from collections.abc import Callable
 from contextvars import Token
@@ -89,6 +88,7 @@ from onyx.server.query_and_chat.streaming_models import Packet
 from onyx.server.usage_limits import check_llm_cost_limit_for_provider
 from onyx.tools.constants import SEARCH_TOOL_ID
 from onyx.tools.interface import Tool
+from onyx.tools.models import ChatFile
 from onyx.tools.models import SearchToolUsage
 from onyx.tools.tool_constructor import construct_tools
 from onyx.tools.tool_constructor import CustomToolConfig
@@ -169,6 +169,29 @@ def _should_enable_slack_search(
    )


+def _convert_loaded_files_to_chat_files(
+    loaded_files: list[ChatLoadedFile],
+) -> list[ChatFile]:
+    """Convert ChatLoadedFile objects to ChatFile for tool usage (e.g., PythonTool).
+
+    Args:
+        loaded_files: List of ChatLoadedFile objects from the chat history
+
+    Returns:
+        List of ChatFile objects that can be passed to tools
+    """
+    chat_files = []
+    for loaded_file in loaded_files:
+        if len(loaded_file.content) > 0:
+            chat_files.append(
+                ChatFile(
+                    filename=loaded_file.filename or f"file_{loaded_file.file_id}",
+                    content=loaded_file.content,
+                )
+            )
+    return chat_files
+
+
 def _extract_project_file_texts_and_images(
    project_id: int | None,
    user_id: UUID | None,
@@ -431,7 +454,6 @@ def handle_stream_message_objects(
    external_state_container: ChatStateContainer | None = None,
 ) -> AnswerStream:
    tenant_id = get_current_tenant_id()
-    processing_start_time = time.monotonic()
    mock_response_token: Token[str | None] | None = None

    llm: LLM | None = None
@@ -735,6 +757,9 @@ def handle_stream_message_objects(
        # load all files needed for this chat chain in memory
        files = load_all_chat_files(chat_history, db_session)

+        # Convert loaded files to ChatFile format for tools like PythonTool
+        chat_files_for_tools = _convert_loaded_files_to_chat_files(files)
+
        # TODO Need to think of some way to support selected docs from the sidebar

        # Reserve a message id for the assistant response for frontend to track packets
@@ -829,7 +854,6 @@ def handle_stream_message_objects(
                assistant_message=assistant_response,
                llm=llm,
                reserved_tokens=reserved_token_count,
-                processing_start_time=processing_start_time,
            )

        # The stream generator can resume on a different worker thread after early yields.
@@ -886,6 +910,7 @@ def handle_stream_message_objects(
                forced_tool_id=forced_tool_id,
                user_identity=user_identity,
                chat_session_id=str(chat_session.id),
+                chat_files=chat_files_for_tools,
                include_citations=new_msg_req.include_citations,
                all_injected_file_metadata=all_injected_file_metadata,
                inject_memories_in_prompt=user.use_memories,
@@ -962,7 +987,6 @@ def llm_loop_completion_handle(
    assistant_message: ChatMessage,
    llm: LLM,
    reserved_tokens: int,
-    processing_start_time: float | None = None,  # noqa: ARG001
 ) -> None:
    chat_session_id = assistant_message.chat_session_id

--- a/backend/onyx/chat/prompt_utils.py
+++ b/backend/onyx/chat/prompt_utils.py
@@ -10,6 +10,7 @@ from onyx.db.user_file import calculate_user_files_token_count
 from onyx.file_store.models import FileDescriptor
 from onyx.prompts.chat_prompts import CITATION_REMINDER
 from onyx.prompts.chat_prompts import DEFAULT_SYSTEM_PROMPT
+from onyx.prompts.chat_prompts import FILE_REMINDER
 from onyx.prompts.chat_prompts import LAST_CYCLE_CITATION_REMINDER
 from onyx.prompts.chat_prompts import REQUIRE_CITATION_GUIDANCE
 from onyx.prompts.prompt_utils import get_company_context
@@ -125,6 +126,7 @@ def calculate_reserved_tokens(
 def build_reminder_message(
    reminder_text: str | None,
    include_citation_reminder: bool,
+    include_file_reminder: bool,
    is_last_cycle: bool,
 ) -> str | None:
    reminder = reminder_text.strip() if reminder_text else ""
@@ -132,6 +134,8 @@ def build_reminder_message(
        reminder += "\n\n" + LAST_CYCLE_CITATION_REMINDER
    if include_citation_reminder:
        reminder += "\n\n" + CITATION_REMINDER
+    if include_file_reminder:
+        reminder += "\n\n" + FILE_REMINDER
    reminder = reminder.strip()
    return reminder if reminder else None

@@ -186,7 +190,7 @@ def _build_user_information_section(
    if not sections:
        return ""

-    return USER_INFORMATION_HEADER + "".join(sections)
+    return USER_INFORMATION_HEADER + "\n".join(sections)


 def build_system_prompt(
@@ -224,23 +228,21 @@ def build_system_prompt(
        system_prompt += REQUIRE_CITATION_GUIDANCE

    if include_all_guidance:
-        system_prompt += (
-            TOOL_SECTION_HEADER
-            + TOOL_DESCRIPTION_SEARCH_GUIDANCE
-            + INTERNAL_SEARCH_GUIDANCE
-            + WEB_SEARCH_GUIDANCE.format(
+        tool_sections = [
+            TOOL_DESCRIPTION_SEARCH_GUIDANCE,
+            INTERNAL_SEARCH_GUIDANCE,
+            WEB_SEARCH_GUIDANCE.format(
                site_colon_disabled=WEB_SEARCH_SITE_DISABLED_GUIDANCE
-            )
-            + OPEN_URLS_GUIDANCE
-            + PYTHON_TOOL_GUIDANCE
-            + GENERATE_IMAGE_GUIDANCE
-            + MEMORY_GUIDANCE
-        )
+            ),
+            OPEN_URLS_GUIDANCE,
+            PYTHON_TOOL_GUIDANCE,
+            GENERATE_IMAGE_GUIDANCE,
+            MEMORY_GUIDANCE,
+        ]
+        system_prompt += TOOL_SECTION_HEADER + "\n".join(tool_sections)
        return system_prompt

    if tools:
-        system_prompt += TOOL_SECTION_HEADER
-
        has_web_search = any(isinstance(tool, WebSearchTool) for tool in tools)
        has_internal_search = any(isinstance(tool, SearchTool) for tool in tools)
        has_open_urls = any(isinstance(tool, OpenURLTool) for tool in tools)
@@ -250,12 +252,14 @@ def build_system_prompt(
        )
        has_memory = any(isinstance(tool, MemoryTool) for tool in tools)

+        tool_guidance_sections: list[str] = []
+
        if has_web_search or has_internal_search or include_all_guidance:
-            system_prompt += TOOL_DESCRIPTION_SEARCH_GUIDANCE
+            tool_guidance_sections.append(TOOL_DESCRIPTION_SEARCH_GUIDANCE)

        # These are not included at the Tool level because the ordering may matter.
        if has_internal_search or include_all_guidance:
-            system_prompt += INTERNAL_SEARCH_GUIDANCE
+            tool_guidance_sections.append(INTERNAL_SEARCH_GUIDANCE)

        if has_web_search or include_all_guidance:
            site_disabled_guidance = ""
@@ -265,20 +269,23 @@ def build_system_prompt(
                )
                if web_search_tool and not web_search_tool.supports_site_filter:
                    site_disabled_guidance = WEB_SEARCH_SITE_DISABLED_GUIDANCE
-            system_prompt += WEB_SEARCH_GUIDANCE.format(
-                site_colon_disabled=site_disabled_guidance
+            tool_guidance_sections.append(
+                WEB_SEARCH_GUIDANCE.format(site_colon_disabled=site_disabled_guidance)
            )

        if has_open_urls or include_all_guidance:
-            system_prompt += OPEN_URLS_GUIDANCE
+            tool_guidance_sections.append(OPEN_URLS_GUIDANCE)

        if has_python or include_all_guidance:
-            system_prompt += PYTHON_TOOL_GUIDANCE
+            tool_guidance_sections.append(PYTHON_TOOL_GUIDANCE)

        if has_generate_image or include_all_guidance:
-            system_prompt += GENERATE_IMAGE_GUIDANCE
+            tool_guidance_sections.append(GENERATE_IMAGE_GUIDANCE)

        if has_memory or include_all_guidance:
-            system_prompt += MEMORY_GUIDANCE
+            tool_guidance_sections.append(MEMORY_GUIDANCE)
+
+        if tool_guidance_sections:
+            system_prompt += TOOL_SECTION_HEADER + "\n".join(tool_guidance_sections)

    return system_prompt
--- a/backend/onyx/configs/app_configs.py
+++ b/backend/onyx/configs/app_configs.py
@@ -251,7 +251,9 @@ DEFAULT_OPENSEARCH_QUERY_TIMEOUT_S = int(
    os.environ.get("DEFAULT_OPENSEARCH_QUERY_TIMEOUT_S") or 50
 )
 OPENSEARCH_ADMIN_USERNAME = os.environ.get("OPENSEARCH_ADMIN_USERNAME", "admin")
-OPENSEARCH_ADMIN_PASSWORD = os.environ.get("OPENSEARCH_ADMIN_PASSWORD", "")
+OPENSEARCH_ADMIN_PASSWORD = os.environ.get(
+    "OPENSEARCH_ADMIN_PASSWORD", "StrongPassword123!"
+)
 USING_AWS_MANAGED_OPENSEARCH = (
    os.environ.get("USING_AWS_MANAGED_OPENSEARCH", "").lower() == "true"
 )
@@ -263,6 +265,18 @@ OPENSEARCH_PROFILING_DISABLED = (
    os.environ.get("OPENSEARCH_PROFILING_DISABLED", "").lower() == "true"
 )

+# When enabled, OpenSearch returns detailed score breakdowns for each hit.
+# Useful for debugging and tuning search relevance. Has ~10-30% performance overhead according to documentation.
+# Seems for Hybrid Search in practice, the impact is actually more like 1000x slower.
+OPENSEARCH_EXPLAIN_ENABLED = (
+    os.environ.get("OPENSEARCH_EXPLAIN_ENABLED", "").lower() == "true"
+)
+
+# Analyzer used for full-text fields (title, content). Use OpenSearch built-in analyzer
+# names (e.g. "english", "standard", "german"). Affects stemming and tokenization;
+# existing indices need reindexing after a change.
+OPENSEARCH_TEXT_ANALYZER = os.environ.get("OPENSEARCH_TEXT_ANALYZER") or "english"
+
 # This is the "base" config for now, the idea is that at least for our dev
 # environments we always want to be dual indexing into both OpenSearch and Vespa
 # to stress test the new codepaths. Only enable this if there is some instance
@@ -270,6 +284,9 @@ OPENSEARCH_PROFILING_DISABLED = (
 ENABLE_OPENSEARCH_INDEXING_FOR_ONYX = (
    os.environ.get("ENABLE_OPENSEARCH_INDEXING_FOR_ONYX", "").lower() == "true"
 )
+# NOTE: This effectively does nothing anymore, admins can now toggle whether
+# retrieval is through OpenSearch. This value is only used as a final fallback
+# in case that doesn't work for whatever reason.
 # Given that the "base" config above is true, this enables whether we want to
 # retrieve from OpenSearch or Vespa. We want to be able to quickly toggle this
 # in the event we see issues with OpenSearch retrieval in our dev environments.
@@ -625,6 +642,14 @@ SHAREPOINT_CONNECTOR_SIZE_THRESHOLD = int(
    os.environ.get("SHAREPOINT_CONNECTOR_SIZE_THRESHOLD", 20 * 1024 * 1024)
 )

+# When True, group sync enumerates every Azure AD group in the tenant (expensive).
+# When False (default), only groups found in site role assignments are synced.
+# Can be overridden per-connector via the "exhaustive_ad_enumeration" key in
+# connector_specific_config.
+SHAREPOINT_EXHAUSTIVE_AD_ENUMERATION = (
+    os.environ.get("SHAREPOINT_EXHAUSTIVE_AD_ENUMERATION", "").lower() == "true"
+)
+
 BLOB_STORAGE_SIZE_THRESHOLD = int(
    os.environ.get("BLOB_STORAGE_SIZE_THRESHOLD", 20 * 1024 * 1024)
 )
--- a/backend/onyx/configs/constants.py
+++ b/backend/onyx/configs/constants.py
@@ -157,6 +157,17 @@ CELERY_EXTERNAL_GROUP_SYNC_LOCK_TIMEOUT = 300  # 5 min

 CELERY_USER_FILE_PROCESSING_LOCK_TIMEOUT = 30 * 60  # 30 minutes (in seconds)

+# How long a queued user-file task is valid before workers discard it.
+# Should be longer than the beat interval (20 s) but short enough to prevent
+# indefinite queue growth.  Workers drop tasks older than this without touching
+# the DB, so a shorter value = faster drain of stale duplicates.
+CELERY_USER_FILE_PROCESSING_TASK_EXPIRES = 60  # 1 minute (in seconds)
+
+# Maximum number of tasks allowed in the user-file-processing queue before the
+# beat generator stops adding more.  Prevents unbounded queue growth when workers
+# fall behind.
+USER_FILE_PROCESSING_MAX_QUEUE_DEPTH = 500
+
 CELERY_USER_FILE_PROJECT_SYNC_LOCK_TIMEOUT = 5 * 60  # 5 minutes (in seconds)

 CELERY_SANDBOX_FILE_SYNC_LOCK_TIMEOUT = 5 * 60  # 5 minutes (in seconds)
@@ -443,6 +454,9 @@ class OnyxRedisLocks:
    # User file processing
    USER_FILE_PROCESSING_BEAT_LOCK = "da_lock:check_user_file_processing_beat"
    USER_FILE_PROCESSING_LOCK_PREFIX = "da_lock:user_file_processing"
+    # Short-lived key set when a task is enqueued; cleared when the worker picks it up.
+    # Prevents the beat from re-enqueuing the same file while a task is already queued.
+    USER_FILE_QUEUED_PREFIX = "da_lock:user_file_queued"
    USER_FILE_PROJECT_SYNC_BEAT_LOCK = "da_lock:check_user_file_project_sync_beat"
    USER_FILE_PROJECT_SYNC_LOCK_PREFIX = "da_lock:user_file_project_sync"
    USER_FILE_DELETE_BEAT_LOCK = "da_lock:check_user_file_delete_beat"
--- a/backend/onyx/connectors/confluence/connector.py
+++ b/backend/onyx/connectors/confluence/connector.py
@@ -905,6 +905,10 @@ class ConfluenceConnector(
                self.confluence_client, self.is_cloud
            )

+        # Yield space hierarchy nodes first
+        for node in self._yield_space_hierarchy_nodes():
+            doc_metadata_list.append(node)
+
        def get_external_access(
            doc_id: str, restrictions: dict[str, Any], ancestors: list[dict[str, Any]]
        ) -> ExternalAccess | None:
@@ -919,6 +923,10 @@ class ConfluenceConnector(
            expand=restrictions_expand,
            limit=_SLIM_DOC_BATCH_SIZE,
        ):
+            # Yield ancestor hierarchy nodes for this page
+            for node in self._yield_ancestor_hierarchy_nodes(page):
+                doc_metadata_list.append(node)
+
            page_id = _get_page_id(page)
            page_restrictions = page.get("restrictions") or {}
            page_space_key = page.get("space", {}).get("key")
@@ -939,6 +947,7 @@ class ConfluenceConnector(
            )

            # Query attachments for each page
+            page_hierarchy_node_yielded = False
            attachment_query = self._construct_attachment_query(_get_page_id(page))
            for attachment in self.confluence_client.cql_paginate_all_expansions(
                cql=attachment_query,
@@ -952,6 +961,14 @@ class ConfluenceConnector(
                ):
                    continue

+                # If this page has valid attachments and we haven't yielded it as a
+                # hierarchy node yet, do so now (attachments are children of the page)
+                if not page_hierarchy_node_yielded:
+                    page_node = self._maybe_yield_page_hierarchy_node(page)
+                    if page_node:
+                        doc_metadata_list.append(page_node)
+                    page_hierarchy_node_yielded = True
+
                attachment_restrictions = attachment.get("restrictions", {})
                if not attachment_restrictions:
                    attachment_restrictions = page_restrictions or {}
--- a/backend/onyx/connectors/google_drive/connector.py
+++ b/backend/onyx/connectors/google_drive/connector.py
@@ -46,6 +46,7 @@ from onyx.connectors.google_drive.file_retrieval import get_external_access_for_
 from onyx.connectors.google_drive.file_retrieval import get_files_in_shared_drive
 from onyx.connectors.google_drive.file_retrieval import get_folder_metadata
 from onyx.connectors.google_drive.file_retrieval import get_root_folder_id
+from onyx.connectors.google_drive.file_retrieval import get_shared_drive_name
 from onyx.connectors.google_drive.file_retrieval import has_link_only_permission
 from onyx.connectors.google_drive.models import DriveRetrievalStage
 from onyx.connectors.google_drive.models import GoogleDriveCheckpoint
@@ -156,10 +157,7 @@ def _is_shared_drive_root(folder: GoogleDriveFileType) -> bool:
        return False

    # For shared drive content, the root has id == driveId
-    if drive_id and folder_id == drive_id:
-        return True
-
-    return False
+    return bool(drive_id and folder_id == drive_id)


 def _public_access() -> ExternalAccess:
@@ -616,6 +614,16 @@ class GoogleDriveConnector(
                # empty parents due to permission limitations)
                # Check shared drive root first (simple ID comparison)
                if _is_shared_drive_root(folder):
+                    # files().get() returns 'Drive' for shared drive roots;
+                    # fetch the real name via drives().get().
+                    # Try both the retriever and admin since the admin may
+                    # not have access to private shared drives.
+                    drive_name = self._get_shared_drive_name(
+                        current_id, file.user_email
+                    )
+                    if drive_name:
+                        node.display_name = drive_name
+                    node.node_type = HierarchyNodeType.SHARED_DRIVE
                    reached_terminal = True
                    break

@@ -691,6 +699,15 @@ class GoogleDriveConnector(
        )
        return None

+    def _get_shared_drive_name(self, drive_id: str, retriever_email: str) -> str | None:
+        """Fetch the name of a shared drive, trying both the retriever and admin."""
+        for email in {retriever_email, self.primary_admin_email}:
+            svc = get_drive_service(self.creds, email)
+            name = get_shared_drive_name(svc, drive_id)
+            if name:
+                return name
+        return None
+
    def get_all_drive_ids(self) -> set[str]:
        return self._get_all_drives_for_user(self.primary_admin_email)

--- a/backend/onyx/connectors/google_drive/file_retrieval.py
+++ b/backend/onyx/connectors/google_drive/file_retrieval.py
@@ -154,6 +154,26 @@ def _get_hierarchy_fields_for_file_type(field_type: DriveFileFieldType) -> str:
        return HIERARCHY_FIELDS


+def get_shared_drive_name(
+    service: Resource,
+    drive_id: str,
+) -> str | None:
+    """Fetch the actual name of a shared drive via the drives().get() API.
+
+    The files().get() API returns 'Drive' as the name for shared drive root
+    folders. Only drives().get() returns the real user-assigned name.
+    """
+    try:
+        drive = service.drives().get(driveId=drive_id, fields="name").execute()
+        return drive.get("name")
+    except HttpError as e:
+        if e.resp.status in (403, 404):
+            logger.debug(f"Cannot access drive {drive_id}: {e}")
+        else:
+            raise
+    return None
+
+
 def get_external_access_for_folder(
    folder: GoogleDriveFileType,
    google_domain: str,
--- a/backend/onyx/connectors/sharepoint/connector.py
+++ b/backend/onyx/connectors/sharepoint/connector.py
--- a/backend/onyx/connectors/teams/connector.py
+++ b/backend/onyx/connectors/teams/connector.py
@@ -50,12 +50,15 @@ class TeamsCheckpoint(ConnectorCheckpoint):
    todo_team_ids: list[str] | None = None


+DEFAULT_AUTHORITY_HOST = "https://login.microsoftonline.com"
+DEFAULT_GRAPH_API_HOST = "https://graph.microsoft.com"
+
+
 class TeamsConnector(
    CheckpointedConnectorWithPermSync[TeamsCheckpoint],
    SlimConnectorWithPermSync,
 ):
    MAX_WORKERS = 10
-    AUTHORITY_URL_PREFIX = "https://login.microsoftonline.com/"

    def __init__(
        self,
@@ -63,11 +66,15 @@ class TeamsConnector(
        # are not necessarily guaranteed to be unique
        teams: list[str] = [],
        max_workers: int = MAX_WORKERS,
+        authority_host: str = DEFAULT_AUTHORITY_HOST,
+        graph_api_host: str = DEFAULT_GRAPH_API_HOST,
    ) -> None:
        self.graph_client: GraphClient | None = None
        self.msal_app: msal.ConfidentialClientApplication | None = None
        self.max_workers = max_workers
        self.requested_team_list: list[str] = teams
+        self.authority_host = authority_host.rstrip("/")
+        self.graph_api_host = graph_api_host.rstrip("/")

    # impls for BaseConnector

@@ -76,7 +83,7 @@ class TeamsConnector(
        teams_client_secret = credentials["teams_client_secret"]
        teams_directory_id = credentials["teams_directory_id"]

-        authority_url = f"{TeamsConnector.AUTHORITY_URL_PREFIX}{teams_directory_id}"
+        authority_url = f"{self.authority_host}/{teams_directory_id}"
        self.msal_app = msal.ConfidentialClientApplication(
            authority=authority_url,
            client_id=teams_client_id,
@@ -91,7 +98,7 @@ class TeamsConnector(
                raise RuntimeError("MSAL app is not initialized")

            token = self.msal_app.acquire_token_for_client(
-                scopes=["https://graph.microsoft.com/.default"]
+                scopes=[f"{self.graph_api_host}/.default"]
            )

            if not isinstance(token, dict):
--- a/backend/onyx/context/search/federated/slack_search.py
+++ b/backend/onyx/context/search/federated/slack_search.py
@@ -32,6 +32,7 @@ from onyx.context.search.federated.slack_search_utils import should_include_mess
 from onyx.context.search.models import ChunkIndexRequest
 from onyx.context.search.models import InferenceChunk
 from onyx.db.document import DocumentSource
+from onyx.db.models import SearchSettings
 from onyx.db.search_settings import get_current_search_settings
 from onyx.document_index.document_index_utils import (
    get_multipass_config,
@@ -905,13 +906,15 @@ def convert_slack_score(slack_score: float) -> float:
 def slack_retrieval(
    query: ChunkIndexRequest,
    access_token: str,
-    db_session: Session,
+    db_session: Session | None = None,
    connector: FederatedConnectorDetail | None = None,  # noqa: ARG001
    entities: dict[str, Any] | None = None,
    limit: int | None = None,
    slack_event_context: SlackContext | None = None,
    bot_token: str | None = None,  # Add bot token parameter
    team_id: str | None = None,
+    # Pre-fetched data — when provided, avoids DB query (no session needed)
+    search_settings: SearchSettings | None = None,
 ) -> list[InferenceChunk]:
    """
    Main entry point for Slack federated search with entity filtering.
@@ -925,7 +928,7 @@ def slack_retrieval(
    Args:
        query: Search query object
        access_token: User OAuth access token
-        db_session: Database session
+        db_session: Database session (optional if search_settings provided)
        connector: Federated connector detail (unused, kept for backwards compat)
        entities: Connector-level config (entity filtering configuration)
        limit: Maximum number of results
@@ -1153,7 +1156,10 @@ def slack_retrieval(

    # chunk index docs into doc aware chunks
    # a single index doc can get split into multiple chunks
-    search_settings = get_current_search_settings(db_session)
+    if search_settings is None:
+        if db_session is None:
+            raise ValueError("Either db_session or search_settings must be provided")
+        search_settings = get_current_search_settings(db_session)
    embedder = DefaultIndexingEmbedder.from_db_search_settings(
        search_settings=search_settings
    )
--- a/backend/onyx/context/search/pipeline.py
+++ b/backend/onyx/context/search/pipeline.py
@@ -18,8 +18,10 @@ from onyx.context.search.utils import inference_section_from_chunks
 from onyx.db.models import Persona
 from onyx.db.models import User
 from onyx.document_index.interfaces import DocumentIndex
+from onyx.federated_connectors.federated_retrieval import FederatedRetrievalInfo
 from onyx.llm.interfaces import LLM
 from onyx.natural_language_processing.english_stopwords import strip_stopwords
+from onyx.natural_language_processing.search_nlp_models import EmbeddingModel
 from onyx.secondary_llm_flows.source_filter import extract_source_filter
 from onyx.secondary_llm_flows.time_filter import extract_time_filter
 from onyx.utils.logger import setup_logger
@@ -41,7 +43,7 @@ def _build_index_filters(
    user_file_ids: list[UUID] | None,
    persona_document_sets: list[str] | None,
    persona_time_cutoff: datetime | None,
-    db_session: Session,
+    db_session: Session | None = None,
    auto_detect_filters: bool = False,
    query: str | None = None,
    llm: LLM | None = None,
@@ -49,18 +51,19 @@ def _build_index_filters(
    # Assistant knowledge filters
    attached_document_ids: list[str] | None = None,
    hierarchy_node_ids: list[int] | None = None,
+    # Pre-fetched ACL filters (skips DB query when provided)
+    acl_filters: list[str] | None = None,
 ) -> IndexFilters:
    if auto_detect_filters and (llm is None or query is None):
        raise RuntimeError("LLM and query are required for auto detect filters")

    base_filters = user_provided_filters or BaseFilters()

-    if (
-        user_provided_filters
-        and user_provided_filters.document_set is None
-        and persona_document_sets is not None
-    ):
-        base_filters.document_set = persona_document_sets
+    document_set_filter = (
+        base_filters.document_set
+        if base_filters.document_set is not None
+        else persona_document_sets
+    )

    time_filter = base_filters.time_cutoff or persona_time_cutoff
    source_filter = base_filters.source_type
@@ -103,15 +106,20 @@ def _build_index_filters(
            source_filter = list(source_filter) + [DocumentSource.USER_FILE]
            logger.debug("Added USER_FILE to source_filter for user knowledge search")

-    user_acl_filters = (
-        None if bypass_acl else build_access_filters_for_user(user, db_session)
-    )
+    if bypass_acl:
+        user_acl_filters = None
+    elif acl_filters is not None:
+        user_acl_filters = acl_filters
+    else:
+        if db_session is None:
+            raise ValueError("Either db_session or acl_filters must be provided")
+        user_acl_filters = build_access_filters_for_user(user, db_session)

    final_filters = IndexFilters(
        user_file_ids=user_file_ids,
        project_id=project_id,
        source_type=source_filter,
-        document_set=persona_document_sets,
+        document_set=document_set_filter,
        time_cutoff=time_filter,
        tags=base_filters.tags,
        access_control_list=user_acl_filters,
@@ -252,11 +260,15 @@ def search_pipeline(
    user: User,
    # Used for default filters and settings
    persona: Persona | None,
-    db_session: Session,
+    db_session: Session | None = None,
    auto_detect_filters: bool = False,
    llm: LLM | None = None,
    # If a project ID is provided, it will be exclusively scoped to that project
    project_id: int | None = None,
+    # Pre-fetched data — when provided, avoids DB queries (no session needed)
+    acl_filters: list[str] | None = None,
+    embedding_model: EmbeddingModel | None = None,
+    prefetched_federated_retrieval_infos: list[FederatedRetrievalInfo] | None = None,
 ) -> list[InferenceChunk]:
    user_uploaded_persona_files: list[UUID] | None = (
        [user_file.id for user_file in persona.user_files] if persona else None
@@ -297,6 +309,7 @@ def search_pipeline(
        bypass_acl=chunk_search_request.bypass_acl,
        attached_document_ids=attached_document_ids,
        hierarchy_node_ids=hierarchy_node_ids,
+        acl_filters=acl_filters,
    )

    query_keywords = strip_stopwords(chunk_search_request.query)
@@ -315,6 +328,8 @@ def search_pipeline(
        user_id=user.id if user else None,
        document_index=document_index,
        db_session=db_session,
+        embedding_model=embedding_model,
+        prefetched_federated_retrieval_infos=prefetched_federated_retrieval_infos,
    )

    # For some specific connectors like Salesforce, a user that has access to an object doesn't mean
--- a/backend/onyx/context/search/retrieval/search_runner.py
+++ b/backend/onyx/context/search/retrieval/search_runner.py
@@ -14,9 +14,11 @@ from onyx.context.search.utils import get_query_embedding
 from onyx.context.search.utils import inference_section_from_chunks
 from onyx.document_index.interfaces import DocumentIndex
 from onyx.document_index.interfaces import VespaChunkRequest
+from onyx.federated_connectors.federated_retrieval import FederatedRetrievalInfo
 from onyx.federated_connectors.federated_retrieval import (
    get_federated_retrieval_functions,
 )
+from onyx.natural_language_processing.search_nlp_models import EmbeddingModel
 from onyx.utils.logger import setup_logger
 from onyx.utils.threadpool_concurrency import run_functions_tuples_in_parallel

@@ -50,9 +52,14 @@ def combine_retrieval_results(
 def _embed_and_search(
    query_request: ChunkIndexRequest,
    document_index: DocumentIndex,
-    db_session: Session,
+    db_session: Session | None = None,
+    embedding_model: EmbeddingModel | None = None,
 ) -> list[InferenceChunk]:
-    query_embedding = get_query_embedding(query_request.query, db_session)
+    query_embedding = get_query_embedding(
+        query_request.query,
+        db_session=db_session,
+        embedding_model=embedding_model,
+    )

    hybrid_alpha = query_request.hybrid_alpha or HYBRID_ALPHA

@@ -78,7 +85,9 @@ def search_chunks(
    query_request: ChunkIndexRequest,
    user_id: UUID | None,
    document_index: DocumentIndex,
-    db_session: Session,
+    db_session: Session | None = None,
+    embedding_model: EmbeddingModel | None = None,
+    prefetched_federated_retrieval_infos: list[FederatedRetrievalInfo] | None = None,
 ) -> list[InferenceChunk]:
    run_queries: list[tuple[Callable, tuple]] = []

@@ -88,14 +97,22 @@ def search_chunks(
        else None
    )

-    # Federated retrieval
-    federated_retrieval_infos = get_federated_retrieval_functions(
-        db_session=db_session,
-        user_id=user_id,
-        source_types=list(source_filters) if source_filters else None,
-        document_set_names=query_request.filters.document_set,
-        user_file_ids=query_request.filters.user_file_ids,
-    )
+    # Federated retrieval — use pre-fetched if available, otherwise query DB
+    if prefetched_federated_retrieval_infos is not None:
+        federated_retrieval_infos = prefetched_federated_retrieval_infos
+    else:
+        if db_session is None:
+            raise ValueError(
+                "Either db_session or prefetched_federated_retrieval_infos "
+                "must be provided"
+            )
+        federated_retrieval_infos = get_federated_retrieval_functions(
+            db_session=db_session,
+            user_id=user_id,
+            source_types=list(source_filters) if source_filters else None,
+            document_set_names=query_request.filters.document_set,
+            user_file_ids=query_request.filters.user_file_ids,
+        )

    federated_sources = set(
        federated_retrieval_info.source.to_non_federated_source()
@@ -114,7 +131,10 @@ def search_chunks(

    if normal_search_enabled:
        run_queries.append(
-            (_embed_and_search, (query_request, document_index, db_session))
+            (
+                _embed_and_search,
+                (query_request, document_index, db_session, embedding_model),
+            )
        )

    parallel_search_results = run_functions_tuples_in_parallel(run_queries)
--- a/backend/onyx/context/search/utils.py
+++ b/backend/onyx/context/search/utils.py
@@ -64,23 +64,34 @@ def inference_section_from_single_chunk(
    )


-def get_query_embeddings(queries: list[str], db_session: Session) -> list[Embedding]:
-    search_settings = get_current_search_settings(db_session)
+def get_query_embeddings(
+    queries: list[str],
+    db_session: Session | None = None,
+    embedding_model: EmbeddingModel | None = None,
+) -> list[Embedding]:
+    if embedding_model is None:
+        if db_session is None:
+            raise ValueError("Either db_session or embedding_model must be provided")
+        search_settings = get_current_search_settings(db_session)
+        embedding_model = EmbeddingModel.from_db_model(
+            search_settings=search_settings,
+            server_host=MODEL_SERVER_HOST,
+            server_port=MODEL_SERVER_PORT,
+        )

-    model = EmbeddingModel.from_db_model(
-        search_settings=search_settings,
-        # The below are globally set, this flow always uses the indexing one
-        server_host=MODEL_SERVER_HOST,
-        server_port=MODEL_SERVER_PORT,
-    )
-
-    query_embedding = model.encode(queries, text_type=EmbedTextType.QUERY)
+    query_embedding = embedding_model.encode(queries, text_type=EmbedTextType.QUERY)
    return query_embedding


@log_function_time(print_only=True, debug_only=True)
-def get_query_embedding(query: str, db_session: Session) -> Embedding:
-    return get_query_embeddings([query], db_session)[0]
+def get_query_embedding(
+    query: str,
+    db_session: Session | None = None,
+    embedding_model: EmbeddingModel | None = None,
+) -> Embedding:
+    return get_query_embeddings(
+        [query], db_session=db_session, embedding_model=embedding_model
+    )[0]


 def convert_inference_sections_to_search_docs(
--- a/backend/onyx/db/api_key.py
+++ b/backend/onyx/db/api_key.py
@@ -4,6 +4,7 @@ from fastapi_users.password import PasswordHelper
 from sqlalchemy import select
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy.orm import joinedload
+from sqlalchemy.orm import selectinload
 from sqlalchemy.orm import Session

 from onyx.auth.api_key import ApiKeyDescriptor
@@ -54,6 +55,7 @@ async def fetch_user_for_api_key(
        select(User)
        .join(ApiKey, ApiKey.user_id == User.id)
        .where(ApiKey.hashed_api_key == hashed_api_key)
+        .options(selectinload(User.memories))
    )


--- a/backend/onyx/db/auth.py
+++ b/backend/onyx/db/auth.py
@@ -13,6 +13,7 @@ from sqlalchemy import func
 from sqlalchemy import Select
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy.future import select
+from sqlalchemy.orm import selectinload
 from sqlalchemy.orm import Session

 from onyx.auth.schemas import UserRole
@@ -97,6 +98,11 @@ async def get_user_count(only_admin_users: bool = False) -> int:

 # Need to override this because FastAPI Users doesn't give flexibility for backend field creation logic in OAuth flow
 class SQLAlchemyUserAdminDB(SQLAlchemyUserDatabase[UP, ID]):
+    async def _get_user(self, statement: Select) -> UP | None:
+        statement = statement.options(selectinload(User.memories))
+        results = await self.session.execute(statement)
+        return results.unique().scalar_one_or_none()
+
    async def create(
        self,
        create_dict: Dict[str, Any],
--- a/backend/onyx/db/connector_credential_pair.py
+++ b/backend/onyx/db/connector_credential_pair.py
@@ -116,12 +116,15 @@ def get_connector_credential_pairs_for_user(
    order_by_desc: bool = False,
    source: DocumentSource | None = None,
    processing_mode: ProcessingMode | None = ProcessingMode.REGULAR,
+    defer_connector_config: bool = False,
 ) -> list[ConnectorCredentialPair]:
    """Get connector credential pairs for a user.

    Args:
        processing_mode: Filter by processing mode. Defaults to REGULAR to hide
            FILE_SYSTEM connectors from standard admin UI. Pass None to get all.
+        defer_connector_config: If True, skips loading Connector.connector_specific_config
+            to avoid fetching large JSONB blobs when they aren't needed.
    """
    if eager_load_user:
        assert (
@@ -130,7 +133,10 @@ def get_connector_credential_pairs_for_user(
    stmt = select(ConnectorCredentialPair).distinct()

    if eager_load_connector:
-        stmt = stmt.options(selectinload(ConnectorCredentialPair.connector))
+        connector_load = selectinload(ConnectorCredentialPair.connector)
+        if defer_connector_config:
+            connector_load = connector_load.defer(Connector.connector_specific_config)
+        stmt = stmt.options(connector_load)

    if eager_load_credential:
        load_opts = selectinload(ConnectorCredentialPair.credential)
@@ -170,6 +176,7 @@ def get_connector_credential_pairs_for_user_parallel(
    order_by_desc: bool = False,
    source: DocumentSource | None = None,
    processing_mode: ProcessingMode | None = ProcessingMode.REGULAR,
+    defer_connector_config: bool = False,
 ) -> list[ConnectorCredentialPair]:
    with get_session_with_current_tenant() as db_session:
        return get_connector_credential_pairs_for_user(
@@ -183,6 +190,7 @@ def get_connector_credential_pairs_for_user_parallel(
            order_by_desc=order_by_desc,
            source=source,
            processing_mode=processing_mode,
+            defer_connector_config=defer_connector_config,
        )


--- a/backend/onyx/db/dal.py
+++ b/backend/onyx/db/dal.py
@@ -0,0 +1,74 @@
+"""Base Data Access Layer (DAL) for database operations.
+
+The DAL pattern groups related database operations into cohesive classes
+with explicit session management. It supports two usage modes:
+
+  1. **External session** (FastAPI endpoints) — the caller provides a session
+     whose lifecycle is managed by FastAPI's dependency injection.
+
+  2. **Self-managed session** (Celery tasks, scripts) — the DAL creates its
+     own session via the tenant-aware session factory.
+
+Subclasses add domain-specific query methods while inheriting session
+management. See ``ee.onyx.db.scim.ScimDAL`` for a concrete example.
+
+Example (FastAPI)::
+
+    def get_scim_dal(db_session: Session = Depends(get_session)) -> ScimDAL:
+        return ScimDAL(db_session)
+
+    @router.get("/users")
+    def list_users(dal: ScimDAL = Depends(get_scim_dal)) -> ...:
+        return dal.list_user_mappings(...)
+
+Example (Celery)::
+
+    with ScimDAL.from_tenant("tenant_abc") as dal:
+        dal.create_user_mapping(...)
+        dal.commit()
+"""
+
+from __future__ import annotations
+
+from collections.abc import Generator
+from contextlib import contextmanager
+
+from sqlalchemy.orm import Session
+
+from onyx.db.engine.sql_engine import get_session_with_tenant
+
+
+class DAL:
+    """Base Data Access Layer.
+
+    Holds a SQLAlchemy session and provides transaction control helpers.
+    Subclasses add domain-specific query methods.
+    """
+
+    def __init__(self, db_session: Session) -> None:
+        self._session = db_session
+
+    @property
+    def session(self) -> Session:
+        """Direct access to the underlying session for advanced use cases."""
+        return self._session
+
+    def commit(self) -> None:
+        self._session.commit()
+
+    def flush(self) -> None:
+        self._session.flush()
+
+    def rollback(self) -> None:
+        self._session.rollback()
+
+    @classmethod
+    @contextmanager
+    def from_tenant(cls, tenant_id: str) -> Generator["DAL", None, None]:
+        """Create a DAL with a self-managed session for the given tenant.
+
+        The session is automatically closed when the context manager exits.
+        The caller must explicitly call ``commit()`` to persist changes.
+        """
+        with get_session_with_tenant(tenant_id=tenant_id) as session:
+            yield cls(session)
--- a/backend/onyx/db/document_set.py
+++ b/backend/onyx/db/document_set.py
@@ -554,10 +554,19 @@ def fetch_all_document_sets_for_user(
    stmt = (
        select(DocumentSetDBModel)
        .distinct()
-        .options(selectinload(DocumentSetDBModel.federated_connectors))
+        .options(
+            selectinload(DocumentSetDBModel.connector_credential_pairs).selectinload(
+                ConnectorCredentialPair.connector
+            ),
+            selectinload(DocumentSetDBModel.users),
+            selectinload(DocumentSetDBModel.groups),
+            selectinload(DocumentSetDBModel.federated_connectors).selectinload(
+                FederatedConnector__DocumentSet.federated_connector
+            ),
+        )
    )
    stmt = _add_user_filters(stmt, user, get_editable=get_editable)
-    return db_session.scalars(stmt).all()
+    return db_session.scalars(stmt).unique().all()


 def fetch_documents_for_document_set_paginated(
--- a/backend/onyx/db/engine/tenant_utils.py
+++ b/backend/onyx/db/engine/tenant_utils.py
@@ -1,11 +1,102 @@
 from sqlalchemy import text

 from onyx.db.engine.sql_engine import get_session_with_shared_schema
+from onyx.db.engine.sql_engine import SqlEngine
 from shared_configs.configs import MULTI_TENANT
 from shared_configs.configs import POSTGRES_DEFAULT_SCHEMA
 from shared_configs.configs import TENANT_ID_PREFIX


+def get_schemas_needing_migration(
+    tenant_schemas: list[str], head_rev: str
+) -> list[str]:
+    """Return only schemas whose current alembic version is not at head.
+
+    Uses a server-side PL/pgSQL loop to collect each schema's alembic version
+    into a temp table one at a time. This avoids building a massive UNION ALL
+    query (which locks the DB and times out at 17k+ schemas) and instead
+    acquires locks sequentially, one schema per iteration.
+    """
+    if not tenant_schemas:
+        return []
+
+    engine = SqlEngine.get_engine()
+
+    with engine.connect() as conn:
+        # Populate a temp input table with exactly the schemas we care about.
+        # The DO block reads from this table so it only iterates the requested
+        # schemas instead of every tenant_% schema in the database.
+        conn.execute(text("DROP TABLE IF EXISTS _alembic_version_snapshot"))
+        conn.execute(text("DROP TABLE IF EXISTS _tenant_schemas_input"))
+        conn.execute(text("CREATE TEMP TABLE _tenant_schemas_input (schema_name text)"))
+        conn.execute(
+            text(
+                "INSERT INTO _tenant_schemas_input (schema_name) "
+                "SELECT unnest(CAST(:schemas AS text[]))"
+            ),
+            {"schemas": tenant_schemas},
+        )
+        conn.execute(
+            text(
+                "CREATE TEMP TABLE _alembic_version_snapshot "
+                "(schema_name text, version_num text)"
+            )
+        )
+
+        conn.execute(
+            text(
+                """
+                DO $$
+                DECLARE
+                    s        text;
+                    schemas  text[];
+                BEGIN
+                    SELECT array_agg(schema_name) INTO schemas
+                    FROM _tenant_schemas_input;
+
+                    IF schemas IS NULL THEN
+                        RAISE NOTICE 'No tenant schemas found.';
+                        RETURN;
+                    END IF;
+
+                    FOREACH s IN ARRAY schemas LOOP
+                        BEGIN
+                            EXECUTE format(
+                                'INSERT INTO _alembic_version_snapshot
+                                 SELECT %L, version_num FROM %I.alembic_version',
+                                s, s
+                            );
+                        EXCEPTION
+                            -- undefined_table: schema exists but has no alembic_version
+                            --   table yet (new tenant, not yet migrated).
+                            -- invalid_schema_name: tenant is registered but its
+                            --   PostgreSQL schema does not exist yet (e.g. provisioning
+                            --   incomplete). Both cases mean no version is available and
+                            --   the schema will be included in the migration list.
+                            WHEN undefined_table THEN NULL;
+                            WHEN invalid_schema_name THEN NULL;
+                        END;
+                    END LOOP;
+                END;
+                $$
+                """
+            )
+        )
+
+        rows = conn.execute(
+            text("SELECT schema_name, version_num FROM _alembic_version_snapshot")
+        )
+        version_by_schema = {row[0]: row[1] for row in rows}
+
+        conn.execute(text("DROP TABLE IF EXISTS _alembic_version_snapshot"))
+        conn.execute(text("DROP TABLE IF EXISTS _tenant_schemas_input"))
+
+    # Schemas missing from the snapshot have no alembic_version table yet and
+    # also need migration. version_by_schema.get(s) returns None for those,
+    # and None != head_rev, so they are included automatically.
+    return [s for s in tenant_schemas if version_by_schema.get(s) != head_rev]
+
+
 def get_all_tenant_ids() -> list[str]:
    """Returning [None] means the only tenant is the 'public' or self hosted tenant."""

--- a/backend/onyx/db/enums.py
+++ b/backend/onyx/db/enums.py
@@ -232,6 +232,12 @@ class BuildSessionStatus(str, PyEnum):
    IDLE = "idle"


+class SharingScope(str, PyEnum):
+    PRIVATE = "private"
+    PUBLIC_ORG = "public_org"
+    PUBLIC_GLOBAL = "public_global"
+
+
 class SandboxStatus(str, PyEnum):
    PROVISIONING = "provisioning"
    RUNNING = "running"
--- a/backend/onyx/db/llm.py
+++ b/backend/onyx/db/llm.py
@@ -430,7 +430,7 @@ def fetch_existing_models(

 def fetch_existing_llm_providers(
    db_session: Session,
-    flow_types: list[LLMModelFlowType],
+    flow_type_filter: list[LLMModelFlowType],
    only_public: bool = False,
    exclude_image_generation_providers: bool = True,
 ) -> list[LLMProviderModel]:
@@ -438,30 +438,27 @@ def fetch_existing_llm_providers(

    Args:
        db_session: Database session
-        flow_types: List of flow types to filter by
+        flow_type_filter: List of flow types to filter by, empty list for no filter
        only_public: If True, only return public providers
        exclude_image_generation_providers: If True, exclude providers that are
            used for image generation configs
    """
-    providers_with_flows = (
-        select(ModelConfiguration.llm_provider_id)
-        .join(LLMModelFlow)
-        .where(LLMModelFlow.llm_model_flow_type.in_(flow_types))
-        .distinct()
-    )
+    stmt = select(LLMProviderModel)
+
+    if flow_type_filter:
+        providers_with_flows = (
+            select(ModelConfiguration.llm_provider_id)
+            .join(LLMModelFlow)
+            .where(LLMModelFlow.llm_model_flow_type.in_(flow_type_filter))
+            .distinct()
+        )
+        stmt = stmt.where(LLMProviderModel.id.in_(providers_with_flows))

    if exclude_image_generation_providers:
-        stmt = select(LLMProviderModel).where(
-            LLMProviderModel.id.in_(providers_with_flows)
-        )
-    else:
        image_gen_provider_ids = select(ModelConfiguration.llm_provider_id).join(
            ImageGenerationConfig
        )
-        stmt = select(LLMProviderModel).where(
-            LLMProviderModel.id.in_(providers_with_flows)
-            | LLMProviderModel.id.in_(image_gen_provider_ids)
-        )
+        stmt = stmt.where(~LLMProviderModel.id.in_(image_gen_provider_ids))

    stmt = stmt.options(
        selectinload(LLMProviderModel.model_configurations),
@@ -797,13 +794,15 @@ def sync_auto_mode_models(
                changes += 1
        else:
            # Add new model - all models from GitHub config are visible
-            new_model = ModelConfiguration(
+            insert_new_model_configuration__no_commit(
+                db_session=db_session,
                llm_provider_id=provider.id,
-                name=model_config.name,
-                display_name=model_config.display_name,
+                model_name=model_config.name,
+                supported_flows=[LLMModelFlowType.CHAT],
                is_visible=True,
+                max_input_tokens=None,
+                display_name=model_config.display_name,
            )
-            db_session.add(new_model)
            changes += 1

    # In Auto mode, default model is always set from GitHub config
--- a/backend/onyx/db/models.py
+++ b/backend/onyx/db/models.py
@@ -77,6 +77,7 @@ from onyx.db.enums import (
    ThemePreference,
    DefaultAppMode,
    SwitchoverType,
+    SharingScope,
 )
 from onyx.configs.constants import NotificationType
 from onyx.configs.constants import SearchFeedbackType
@@ -286,7 +287,7 @@ class User(SQLAlchemyBaseUserTableUUID, Base):

    # relationships
    credentials: Mapped[list["Credential"]] = relationship(
-        "Credential", back_populates="user", lazy="joined"
+        "Credential", back_populates="user"
    )
    chat_sessions: Mapped[list["ChatSession"]] = relationship(
        "ChatSession", back_populates="user"
@@ -320,7 +321,6 @@ class User(SQLAlchemyBaseUserTableUUID, Base):
        "Memory",
        back_populates="user",
        cascade="all, delete-orphan",
-        lazy="selectin",
        order_by="desc(Memory.id)",
    )
    oauth_user_tokens: Mapped[list["OAuthUserToken"]] = relationship(
@@ -806,12 +806,13 @@ class HierarchyNode(Base):
        "HierarchyNode", remote_side=[id], back_populates="children"
    )
    children: Mapped[list["HierarchyNode"]] = relationship(
-        "HierarchyNode", back_populates="parent"
+        "HierarchyNode", back_populates="parent", passive_deletes=True
    )
    child_documents: Mapped[list["Document"]] = relationship(
        "Document",
        back_populates="parent_hierarchy_node",
        foreign_keys="Document.parent_hierarchy_node_id",
+        passive_deletes=True,
    )
    # Personas that have this hierarchy node attached for scoped search
    personas: Mapped[list["Persona"]] = relationship(
@@ -936,6 +937,7 @@ class Document(Base):
        "HierarchyNode",
        back_populates="document",
        foreign_keys="HierarchyNode.document_id",
+        passive_deletes=True,
    )
    # Personas that have this document directly attached for scoped search
    attached_personas: Mapped[list["Persona"]] = relationship(
@@ -1038,7 +1040,9 @@ class OpenSearchTenantMigrationRecord(Base):
        nullable=False,
    )
    # Opaque continuation token from Vespa's Visit API.
-    # NULL means "not started" or "visit completed".
+    # NULL means "not started".
+    # Otherwise contains a serialized mapping between slice ID and continuation
+    # token for that slice.
    vespa_visit_continuation_token: Mapped[str | None] = mapped_column(
        Text, nullable=True
    )
@@ -1062,6 +1066,9 @@ class OpenSearchTenantMigrationRecord(Base):
    enable_opensearch_retrieval: Mapped[bool] = mapped_column(
        Boolean, nullable=False, default=False
    )
+    approx_chunk_count_in_vespa: Mapped[int | None] = mapped_column(
+        Integer, nullable=True
+    )


 class KGEntityType(Base):
@@ -4710,6 +4717,12 @@ class BuildSession(Base):
    demo_data_enabled: Mapped[bool] = mapped_column(
        Boolean, nullable=False, server_default=text("true")
    )
+    sharing_scope: Mapped[SharingScope] = mapped_column(
+        String,
+        nullable=False,
+        default=SharingScope.PRIVATE,
+        server_default="private",
+    )

    # Relationships
    user: Mapped[User | None] = relationship("User", foreign_keys=[user_id])
@@ -4926,6 +4939,7 @@ class ScimUserMapping(Base):
    user_id: Mapped[UUID] = mapped_column(
        ForeignKey("user.id", ondelete="CASCADE"), unique=True, nullable=False
    )
+    scim_username: Mapped[str | None] = mapped_column(String, nullable=True)

    created_at: Mapped[datetime.datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), nullable=False
@@ -4964,3 +4978,12 @@ class ScimGroupMapping(Base):
    user_group: Mapped[UserGroup] = relationship(
        "UserGroup", foreign_keys=[user_group_id]
    )
+
+
+class CodeInterpreterServer(Base):
+    """Details about the code interpreter server"""
+
+    __tablename__ = "code_interpreter_server"
+
+    id: Mapped[int] = mapped_column(Integer, primary_key=True)
+    server_enabled: Mapped[bool] = mapped_column(Boolean, nullable=False, default=True)
--- a/backend/onyx/db/opensearch_migration.py
+++ b/backend/onyx/db/opensearch_migration.py
@@ -4,6 +4,7 @@ This module provides functions to track the progress of migrating documents
 from Vespa to OpenSearch.
 """

+import json
 from datetime import datetime
 from datetime import timezone

@@ -12,6 +13,9 @@ from sqlalchemy import text
 from sqlalchemy.dialects.postgresql import insert
 from sqlalchemy.orm import Session

+from onyx.background.celery.tasks.opensearch_migration.constants import (
+    GET_VESPA_CHUNKS_SLICE_COUNT,
+)
 from onyx.background.celery.tasks.opensearch_migration.constants import (
    TOTAL_ALLOWABLE_DOC_MIGRATION_ATTEMPTS_BEFORE_PERMANENT_FAILURE,
 )
@@ -243,29 +247,37 @@ def should_document_migration_be_permanently_failed(

 def get_vespa_visit_state(
    db_session: Session,
-) -> tuple[str | None, int]:
+) -> tuple[dict[int, str | None], int]:
    """Gets the current Vespa migration state from the tenant migration record.

    Requires the OpenSearchTenantMigrationRecord to exist.

    Returns:
-        Tuple of (continuation_token, total_chunks_migrated). continuation_token
-            is None if not started or completed.
+        Tuple of (continuation_token_map, total_chunks_migrated).
    """
    record = db_session.query(OpenSearchTenantMigrationRecord).first()
    if record is None:
        raise RuntimeError("OpenSearchTenantMigrationRecord not found.")
-    return (
-        record.vespa_visit_continuation_token,
-        record.total_chunks_migrated,
-    )
+    if record.vespa_visit_continuation_token is None:
+        continuation_token_map: dict[int, str | None] = {
+            slice_id: None for slice_id in range(GET_VESPA_CHUNKS_SLICE_COUNT)
+        }
+    else:
+        json_loaded_continuation_token_map = json.loads(
+            record.vespa_visit_continuation_token
+        )
+        continuation_token_map = {
+            int(key): value for key, value in json_loaded_continuation_token_map.items()
+        }
+    return continuation_token_map, record.total_chunks_migrated


 def update_vespa_visit_progress_with_commit(
    db_session: Session,
-    continuation_token: str | None,
+    continuation_token_map: dict[int, str | None],
    chunks_processed: int,
    chunks_errored: int,
+    approx_chunk_count_in_vespa: int | None,
 ) -> None:
    """Updates the Vespa migration progress and commits.

@@ -273,19 +285,26 @@ def update_vespa_visit_progress_with_commit(

    Args:
        db_session: SQLAlchemy session.
-        continuation_token: The new continuation token. None means the visit
-            is complete.
+        continuation_token_map: The new continuation token map. None entry means
+            the visit is complete for that slice.
        chunks_processed: Number of chunks processed in this batch (added to
            the running total).
        chunks_errored: Number of chunks errored in this batch (added to the
            running errored total).
+        approx_chunk_count_in_vespa: Approximate number of chunks in Vespa. If
+            None, the existing value is used.
    """
    record = db_session.query(OpenSearchTenantMigrationRecord).first()
    if record is None:
        raise RuntimeError("OpenSearchTenantMigrationRecord not found.")
-    record.vespa_visit_continuation_token = continuation_token
+    record.vespa_visit_continuation_token = json.dumps(continuation_token_map)
    record.total_chunks_migrated += chunks_processed
    record.total_chunks_errored += chunks_errored
+    record.approx_chunk_count_in_vespa = (
+        approx_chunk_count_in_vespa
+        if approx_chunk_count_in_vespa is not None
+        else record.approx_chunk_count_in_vespa
+    )
    db_session.commit()


@@ -353,25 +372,27 @@ def build_sanitized_to_original_doc_id_mapping(

 def get_opensearch_migration_state(
    db_session: Session,
-) -> tuple[int, datetime | None, datetime | None]:
+) -> tuple[int, datetime | None, datetime | None, int | None]:
    """Returns the state of the Vespa to OpenSearch migration.

    If the tenant migration record is not found, returns defaults of 0, None,
-    None.
+    None, None.

    Args:
        db_session: SQLAlchemy session.

    Returns:
-        Tuple of (total_chunks_migrated, created_at, migration_completed_at).
+        Tuple of (total_chunks_migrated, created_at, migration_completed_at,
+            approx_chunk_count_in_vespa).
    """
    record = db_session.query(OpenSearchTenantMigrationRecord).first()
    if record is None:
-        return 0, None, None
+        return 0, None, None, None
    return (
        record.total_chunks_migrated,
        record.created_at,
        record.migration_completed_at,
+        record.approx_chunk_count_in_vespa,
    )


--- a/backend/onyx/db/pat.py
+++ b/backend/onyx/db/pat.py
@@ -8,6 +8,7 @@ from uuid import UUID
 from sqlalchemy import select
 from sqlalchemy import update
 from sqlalchemy.ext.asyncio import AsyncSession
+from sqlalchemy.orm import selectinload
 from sqlalchemy.orm import Session

 from onyx.auth.pat import build_displayable_pat
@@ -31,55 +32,61 @@ async def fetch_user_for_pat(

    NOTE: This is async since it's used during auth (which is necessarily async due to FastAPI Users).
    NOTE: Expired includes both naturally expired and user-revoked tokens (revocation sets expires_at=NOW()).
+
+    Uses select(User) as primary entity so that joined-eager relationships (e.g. oauth_accounts)
+    are loaded correctly — matching the pattern in fetch_user_for_api_key.
    """
-    # Single joined query with all filters pushed to database
    now = datetime.now(timezone.utc)
-    result = await async_db_session.execute(
-        select(PersonalAccessToken, User)
-        .join(User, PersonalAccessToken.user_id == User.id)
+
+    user = await async_db_session.scalar(
+        select(User)
+        .join(PersonalAccessToken, PersonalAccessToken.user_id == User.id)
        .where(PersonalAccessToken.hashed_token == hashed_token)
        .where(User.is_active)  # type: ignore
        .where(
            (PersonalAccessToken.expires_at.is_(None))
            | (PersonalAccessToken.expires_at > now)
        )
-        .limit(1)
+        .options(selectinload(User.memories))
    )
-    row = result.first()
-
-    if not row:
+    if not user:
        return None

-    pat, user = row
-
-    # Throttle last_used_at updates to reduce DB load (5-minute granularity sufficient for auditing)
-    # For request-level auditing, use application logs or a dedicated audit table
-    should_update = (
-        pat.last_used_at is None or (now - pat.last_used_at).total_seconds() > 300
-    )
-
-    if should_update:
-        # Update in separate session to avoid transaction coupling (fire-and-forget)
-        async def _update_last_used() -> None:
-            try:
-                tenant_id = get_current_tenant_id()
-                async with get_async_session_context_manager(
-                    tenant_id
-                ) as separate_session:
-                    await separate_session.execute(
-                        update(PersonalAccessToken)
-                        .where(PersonalAccessToken.hashed_token == hashed_token)
-                        .values(last_used_at=now)
-                    )
-                    await separate_session.commit()
-            except Exception as e:
-                logger.warning(f"Failed to update last_used_at for PAT: {e}")
-
-        asyncio.create_task(_update_last_used())
-
+    _schedule_pat_last_used_update(hashed_token, now)
    return user


+def _schedule_pat_last_used_update(hashed_token: str, now: datetime) -> None:
+    """Fire-and-forget update of last_used_at, throttled to 5-minute granularity."""
+
+    async def _update() -> None:
+        try:
+            tenant_id = get_current_tenant_id()
+            async with get_async_session_context_manager(tenant_id) as session:
+                pat = await session.scalar(
+                    select(PersonalAccessToken).where(
+                        PersonalAccessToken.hashed_token == hashed_token
+                    )
+                )
+                if not pat:
+                    return
+                if (
+                    pat.last_used_at is not None
+                    and (now - pat.last_used_at).total_seconds() <= 300
+                ):
+                    return
+                await session.execute(
+                    update(PersonalAccessToken)
+                    .where(PersonalAccessToken.hashed_token == hashed_token)
+                    .values(last_used_at=now)
+                )
+                await session.commit()
+        except Exception as e:
+            logger.warning(f"Failed to update last_used_at for PAT: {e}")
+
+    asyncio.create_task(_update())
+
+
 def create_pat(
    db_session: Session,
    user_id: UUID,
--- a/backend/onyx/db/persona.py
+++ b/backend/onyx/db/persona.py
@@ -28,6 +28,7 @@ from onyx.db.document_access import get_accessible_documents_by_ids
 from onyx.db.models import ConnectorCredentialPair
 from onyx.db.models import Document
 from onyx.db.models import DocumentSet
+from onyx.db.models import FederatedConnector__DocumentSet
 from onyx.db.models import HierarchyNode
 from onyx.db.models import Persona
 from onyx.db.models import Persona__User
@@ -420,9 +421,16 @@ def get_minimal_persona_snapshots_for_user(
    stmt = stmt.options(
        selectinload(Persona.tools),
        selectinload(Persona.labels),
-        selectinload(Persona.document_sets)
-        .selectinload(DocumentSet.connector_credential_pairs)
-        .selectinload(ConnectorCredentialPair.connector),
+        selectinload(Persona.document_sets).options(
+            selectinload(DocumentSet.connector_credential_pairs).selectinload(
+                ConnectorCredentialPair.connector
+            ),
+            selectinload(DocumentSet.users),
+            selectinload(DocumentSet.groups),
+            selectinload(DocumentSet.federated_connectors).selectinload(
+                FederatedConnector__DocumentSet.federated_connector
+            ),
+        ),
        selectinload(Persona.hierarchy_nodes),
        selectinload(Persona.attached_documents).selectinload(
            Document.parent_hierarchy_node
@@ -453,7 +461,16 @@ def get_persona_snapshots_for_user(
            Document.parent_hierarchy_node
        ),
        selectinload(Persona.labels),
-        selectinload(Persona.document_sets),
+        selectinload(Persona.document_sets).options(
+            selectinload(DocumentSet.connector_credential_pairs).selectinload(
+                ConnectorCredentialPair.connector
+            ),
+            selectinload(DocumentSet.users),
+            selectinload(DocumentSet.groups),
+            selectinload(DocumentSet.federated_connectors).selectinload(
+                FederatedConnector__DocumentSet.federated_connector
+            ),
+        ),
        selectinload(Persona.user),
        selectinload(Persona.user_files),
        selectinload(Persona.users),
@@ -550,9 +567,16 @@ def get_minimal_persona_snapshots_paginated(
            Document.parent_hierarchy_node
        ),
        selectinload(Persona.labels),
-        selectinload(Persona.document_sets)
-        .selectinload(DocumentSet.connector_credential_pairs)
-        .selectinload(ConnectorCredentialPair.connector),
+        selectinload(Persona.document_sets).options(
+            selectinload(DocumentSet.connector_credential_pairs).selectinload(
+                ConnectorCredentialPair.connector
+            ),
+            selectinload(DocumentSet.users),
+            selectinload(DocumentSet.groups),
+            selectinload(DocumentSet.federated_connectors).selectinload(
+                FederatedConnector__DocumentSet.federated_connector
+            ),
+        ),
        selectinload(Persona.user),
    )

@@ -611,7 +635,16 @@ def get_persona_snapshots_paginated(
            Document.parent_hierarchy_node
        ),
        selectinload(Persona.labels),
-        selectinload(Persona.document_sets),
+        selectinload(Persona.document_sets).options(
+            selectinload(DocumentSet.connector_credential_pairs).selectinload(
+                ConnectorCredentialPair.connector
+            ),
+            selectinload(DocumentSet.users),
+            selectinload(DocumentSet.groups),
+            selectinload(DocumentSet.federated_connectors).selectinload(
+                FederatedConnector__DocumentSet.federated_connector
+            ),
+        ),
        selectinload(Persona.user),
        selectinload(Persona.user_files),
        selectinload(Persona.users),
--- a/backend/onyx/document_index/opensearch/client.py
+++ b/backend/onyx/document_index/opensearch/client.py
@@ -54,6 +54,9 @@ class SearchHit(BaseModel, Generic[SchemaDocumentModel]):
    # Maps schema property name to a list of highlighted snippets with match
    # terms wrapped in tags (e.g. "something <hi>keyword</hi> other thing").
    match_highlights: dict[str, list[str]] = {}
+    # Score explanation from OpenSearch when "explain": true is set in the query.
+    # Contains detailed breakdown of how the score was calculated.
+    explanation: dict[str, Any] | None = None


 def get_new_body_without_vectors(body: dict[str, Any]) -> dict[str, Any]:
@@ -706,10 +709,12 @@ class OpenSearchClient:
                )
            document_chunk_score = hit.get("_score", None)
            match_highlights: dict[str, list[str]] = hit.get("highlight", {})
+            explanation: dict[str, Any] | None = hit.get("_explanation", None)
            search_hit = SearchHit[DocumentChunk](
                document_chunk=DocumentChunk.model_validate(document_chunk_source),
                score=document_chunk_score,
                match_highlights=match_highlights,
+                explanation=explanation,
            )
            search_hits.append(search_hit)
        logger.debug(
--- a/backend/onyx/document_index/opensearch/constants.py
+++ b/backend/onyx/document_index/opensearch/constants.py
@@ -10,31 +10,31 @@ EF_CONSTRUCTION = 256
 # quality but increase memory footprint. Values typically range between 12 - 48.
 M = 32  # Set relatively high for better accuracy.

-# Number of vectors to examine for top k neighbors for the HNSW method.
-# Should be >= DEFAULT_K_NUM_CANDIDATES for good recall; higher = better accuracy, slower search.
-# Bumped this to 1000, for dataset of low 10,000 docs, did not see improvement in recall.
-EF_SEARCH = 256
+# When performing hybrid search, we need to consider more candidates than the number of results to be returned.
+# This is because the scoring is hybrid and the results are reordered due to the hybrid scoring.
+# Higher = more candidates for hybrid fusion = better retrieval accuracy, but results in more computation per query.
+# Imagine a simple case with a single keyword query and a single vector query and we want 10 final docs.
+# If we only fetch 10 candidates from each of keyword and vector, they would have to have perfect overlap to get a good hybrid
+# ranking for the 10 results. If we fetch 1000 candidates from each, we have a much higher chance of all 10 of the final desired
+# docs showing up and getting scored. In worse situations, the final 10 docs don't even show up as the final 10 (worse than just
+# a miss at the reranking step).
+DEFAULT_NUM_HYBRID_SEARCH_CANDIDATES = 750

-# The default number of neighbors to consider for knn vector similarity search.
-# We need this higher than the number of results because the scoring is hybrid.
-# If there is only 1 query, setting k equal to the number of results is enough,
-# but since there is heavy reordering due to hybrid scoring, we need to set k higher.
-# Higher = more candidates for hybrid fusion = better retrieval accuracy, more query cost.
-DEFAULT_K_NUM_CANDIDATES = 50  # TODO likely need to bump this way higher
+# Number of vectors to examine for top k neighbors for the HNSW method.
+EF_SEARCH = DEFAULT_NUM_HYBRID_SEARCH_CANDIDATES

 # Since the titles are included in the contents, they are heavily downweighted as they act as a boost
 # rather than an independent scoring component.
 SEARCH_TITLE_VECTOR_WEIGHT = 0.1
-SEARCH_TITLE_KEYWORD_WEIGHT = 0.1
-SEARCH_CONTENT_VECTOR_WEIGHT = 0.4
-SEARCH_CONTENT_KEYWORD_WEIGHT = 0.4
+SEARCH_CONTENT_VECTOR_WEIGHT = 0.45
+# Single keyword weight for both title and content (merged from former title keyword + content keyword).
+SEARCH_KEYWORD_WEIGHT = 0.45

 # NOTE: it is critical that the order of these weights matches the order of the sub-queries in the hybrid search.
 HYBRID_SEARCH_NORMALIZATION_WEIGHTS = [
    SEARCH_TITLE_VECTOR_WEIGHT,
-    SEARCH_TITLE_KEYWORD_WEIGHT,
    SEARCH_CONTENT_VECTOR_WEIGHT,
-    SEARCH_CONTENT_KEYWORD_WEIGHT,
+    SEARCH_KEYWORD_WEIGHT,
 ]

 assert sum(HYBRID_SEARCH_NORMALIZATION_WEIGHTS) == 1.0
--- a/backend/onyx/document_index/opensearch/opensearch_document_index.py
+++ b/backend/onyx/document_index/opensearch/opensearch_document_index.py
@@ -842,6 +842,8 @@ class OpenSearchDocumentIndex(DocumentIndex):
            body=query_body,
            search_pipeline_id=ZSCORE_NORMALIZATION_PIPELINE_NAME,
        )
+
+        # Good place for a breakpoint to inspect the search hits if you have "explain" enabled.
        inference_chunks_uncleaned: list[InferenceChunkUncleaned] = [
            _convert_retrieved_opensearch_chunk_to_inference_chunk_uncleaned(
                search_hit.document_chunk, search_hit.score, search_hit.match_highlights
--- a/backend/onyx/document_index/opensearch/schema.py
+++ b/backend/onyx/document_index/opensearch/schema.py
@@ -11,6 +11,7 @@ from pydantic import model_serializer
 from pydantic import model_validator
 from pydantic import SerializerFunctionWrapHandler

+from onyx.configs.app_configs import OPENSEARCH_TEXT_ANALYZER
 from onyx.document_index.interfaces_new import TenantState
 from onyx.document_index.opensearch.constants import DEFAULT_MAX_CHUNK_SIZE
 from onyx.document_index.opensearch.constants import EF_CONSTRUCTION
@@ -54,6 +55,11 @@ SECONDARY_OWNERS_FIELD_NAME = "secondary_owners"
 ANCESTOR_HIERARCHY_NODE_IDS_FIELD_NAME = "ancestor_hierarchy_node_ids"


+# Faiss was also tried but it didn't have any benefits
+# NMSLIB is deprecated, not recommended
+OPENSEARCH_KNN_ENGINE = "lucene"
+
+
 def get_opensearch_doc_chunk_id(
    tenant_state: TenantState,
    document_id: str,
@@ -343,6 +349,9 @@ class DocumentSchema:
            "properties": {
                TITLE_FIELD_NAME: {
                    "type": "text",
+                    # Language analyzer (e.g. english) stems at index and search time for variant matching.
+                    # Configure via OPENSEARCH_TEXT_ANALYZER. Existing indices need reindexing after a change.
+                    "analyzer": OPENSEARCH_TEXT_ANALYZER,
                    "fields": {
                        # Subfield accessed as title.keyword. Not indexed for
                        # values longer than 256 chars.
@@ -357,9 +366,7 @@ class DocumentSchema:
                CONTENT_FIELD_NAME: {
                    "type": "text",
                    "store": True,
-                    # This makes highlighting text during queries more efficient
-                    # at the cost of disk space. See
-                    # https://docs.opensearch.org/latest/search-plugins/searching-data/highlight/#methods-of-obtaining-offsets
+                    "analyzer": OPENSEARCH_TEXT_ANALYZER,
                    "index_options": "offsets",
                },
                TITLE_VECTOR_FIELD_NAME: {
@@ -368,7 +375,7 @@ class DocumentSchema:
                    "method": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
-                        "engine": "lucene",
+                        "engine": OPENSEARCH_KNN_ENGINE,
                        "parameters": {"ef_construction": EF_CONSTRUCTION, "m": M},
                    },
                },
@@ -380,7 +387,7 @@ class DocumentSchema:
                    "method": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
-                        "engine": "lucene",
+                        "engine": OPENSEARCH_KNN_ENGINE,
                        "parameters": {"ef_construction": EF_CONSTRUCTION, "m": M},
                    },
                },
--- a/backend/onyx/document_index/opensearch/search.py
+++ b/backend/onyx/document_index/opensearch/search.py
@@ -6,13 +6,16 @@ from typing import Any
 from uuid import UUID

 from onyx.configs.app_configs import DEFAULT_OPENSEARCH_QUERY_TIMEOUT_S
+from onyx.configs.app_configs import OPENSEARCH_EXPLAIN_ENABLED
 from onyx.configs.app_configs import OPENSEARCH_PROFILING_DISABLED
 from onyx.configs.constants import DocumentSource
 from onyx.configs.constants import INDEX_SEPARATOR
 from onyx.context.search.models import IndexFilters
 from onyx.context.search.models import Tag
 from onyx.document_index.interfaces_new import TenantState
-from onyx.document_index.opensearch.constants import DEFAULT_K_NUM_CANDIDATES
+from onyx.document_index.opensearch.constants import (
+    DEFAULT_NUM_HYBRID_SEARCH_CANDIDATES,
+)
 from onyx.document_index.opensearch.constants import HYBRID_SEARCH_NORMALIZATION_WEIGHTS
 from onyx.document_index.opensearch.schema import ACCESS_CONTROL_LIST_FIELD_NAME
 from onyx.document_index.opensearch.schema import ANCESTOR_HIERARCHY_NODE_IDS_FIELD_NAME
@@ -240,6 +243,9 @@ class DocumentQuery:
        Returns:
            A dictionary representing the final hybrid search query.
        """
+        # WARNING: Profiling does not work with hybrid search; do not add it at
+        # this level. See https://github.com/opensearch-project/neural-search/issues/1255
+
        if num_hits > DEFAULT_OPENSEARCH_MAX_RESULT_WINDOW:
            raise ValueError(
                f"Bug: num_hits ({num_hits}) is greater than the current maximum allowed "
@@ -247,7 +253,7 @@ class DocumentQuery:
            )

        hybrid_search_subqueries = DocumentQuery._get_hybrid_search_subqueries(
-            query_text, query_vector, num_candidates=DEFAULT_K_NUM_CANDIDATES
+            query_text, query_vector
        )
        hybrid_search_filters = DocumentQuery._get_search_filters(
            tenant_state=tenant_state,
@@ -275,25 +281,31 @@ class DocumentQuery:
        hybrid_search_query: dict[str, Any] = {
            "hybrid": {
                "queries": hybrid_search_subqueries,
-                # Applied to all the sub-queries. Source:
+                # Max results per subquery per shard before aggregation. Ensures keyword and vector
+                # subqueries contribute equally to the candidate pool for hybrid fusion.
+                # Sources:
+                # https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/pagination/
+                # https://opensearch.org/blog/navigating-pagination-in-hybrid-queries-with-the-pagination_depth-parameter/
+                "pagination_depth": DEFAULT_NUM_HYBRID_SEARCH_CANDIDATES,
+                # Applied to all the sub-queries independently (this avoids having subqueries having a lot of results thrown out).
+                # Sources:
                # https://docs.opensearch.org/latest/query-dsl/compound/hybrid/
+                # https://opensearch.org/blog/introducing-common-filter-support-for-hybrid-search-queries
                # Does AND for each filter in the list.
                "filter": {"bool": {"filter": hybrid_search_filters}},
            }
        }

-        # NOTE: By default, hybrid search retrieves "size"-many results from
-        # each OpenSearch shard before aggregation. Source:
-        # https://docs.opensearch.org/latest/vector-search/ai-search/hybrid-search/pagination/
-
        final_hybrid_search_body: dict[str, Any] = {
            "query": hybrid_search_query,
            "size": num_hits,
            "highlight": match_highlights_configuration,
            "timeout": f"{DEFAULT_OPENSEARCH_QUERY_TIMEOUT_S}s",
        }
-        # WARNING: Profiling does not work with hybrid search; do not add it at
-        # this level. See https://github.com/opensearch-project/neural-search/issues/1255
+
+        # Explain is for scoring breakdowns.
+        if OPENSEARCH_EXPLAIN_ENABLED:
+            final_hybrid_search_body["explain"] = True

        return final_hybrid_search_body

@@ -355,7 +367,12 @@ class DocumentQuery:

    @staticmethod
    def _get_hybrid_search_subqueries(
-        query_text: str, query_vector: list[float], num_candidates: int
+        query_text: str,
+        query_vector: list[float],
+        # The default number of neighbors to consider for knn vector similarity search.
+        # This is higher than the number of results because the scoring is hybrid.
+        # for a detailed breakdown, see where the default value is set.
+        vector_candidates: int = DEFAULT_NUM_HYBRID_SEARCH_CANDIDATES,
    ) -> list[dict[str, Any]]:
        """Returns subqueries for hybrid search.

@@ -367,9 +384,8 @@ class DocumentQuery:

        Matches:
          - Title vector
-          - Title keyword
          - Content vector
-          - Content keyword + phrase
+          - Keyword (title + content, match and phrase)

        Normalization is not performed here.
        The weights of each of these subqueries should be configured in a search
@@ -390,9 +406,9 @@ class DocumentQuery:
        NOTE: Options considered and rejected:
        - minimum_should_match: Since it's hybrid search and users often provide semantic queries, there is often a lot of terms,
          and very low number of meaningful keywords (and a low ratio of keywords).
-        - fuzziness AUTO: typo tolerance (0/1/2 edit distance by term length). This is reasonable but in reality seeing the
-          user usage patterns, this is not very common and people tend to not be confused when a miss happens for this reason.
-          In testing datasets, this makes recall slightly worse.
+        - fuzziness AUTO: typo tolerance (0/1/2 edit distance by term length). It's mostly for typos as the analyzer ("english by
+          default") already does some stemming and tokenization. In testing datasets, this makes recall slightly worse. It also is
+          less performant so not really any reason to do it.

        Args:
            query_text: The text of the query to search for.
@@ -401,64 +417,56 @@ class DocumentQuery:
                similarity search.
        """
        # Build sub-queries for hybrid search. Order must match normalization
-        # pipeline weights: title vector, title keyword, content vector,
-        # content keyword.
+        # pipeline weights: title vector, content vector, keyword (title + content).
        hybrid_search_queries: list[dict[str, Any]] = [
            # 1. Title vector search
            {
                "knn": {
                    TITLE_VECTOR_FIELD_NAME: {
                        "vector": query_vector,
-                        "k": num_candidates,
+                        "k": vector_candidates,
                    }
                }
            },
-            # 2. Title keyword + phrase search.
-            {
-                "bool": {
-                    "should": [
-                        {
-                            "match": {
-                                TITLE_FIELD_NAME: {
-                                    "query": query_text,
-                                    # operator "or" = match doc if any query term matches (default, explicit for clarity).
-                                    "operator": "or",
-                                }
-                            }
-                        },
-                        {
-                            "match_phrase": {
-                                TITLE_FIELD_NAME: {
-                                    "query": query_text,
-                                    # Slop = 1 allows one extra word or transposition in phrase match.
-                                    "slop": 1,
-                                    # Boost phrase over bag-of-words; exact phrase is a stronger signal.
-                                    "boost": 1.5,
-                                }
-                            }
-                        },
-                    ]
-                }
-            },
-            # 3. Content vector search
+            # 2. Content vector search
            {
                "knn": {
                    CONTENT_VECTOR_FIELD_NAME: {
                        "vector": query_vector,
-                        "k": num_candidates,
+                        "k": vector_candidates,
                    }
                }
            },
-            # 4. Content keyword + phrase search.
+            # 3. Keyword (title + content) match and phrase search.
            {
                "bool": {
                    "should": [
+                        {
+                            "match": {
+                                TITLE_FIELD_NAME: {
+                                    "query": query_text,
+                                    "operator": "or",
+                                    # The title fields are strongly discounted as they are included in the content.
+                                    # It just acts as a minor boost
+                                    "boost": 0.1,
+                                }
+                            }
+                        },
+                        {
+                            "match_phrase": {
+                                TITLE_FIELD_NAME: {
+                                    "query": query_text,
+                                    "slop": 1,
+                                    "boost": 0.2,
+                                }
+                            }
+                        },
                        {
                            "match": {
                                CONTENT_FIELD_NAME: {
                                    "query": query_text,
-                                    # operator "or" = match doc if any query term matches (default, explicit for clarity).
                                    "operator": "or",
+                                    "boost": 1.0,
                                }
                            }
                        },
@@ -466,9 +474,7 @@ class DocumentQuery:
                            "match_phrase": {
                                CONTENT_FIELD_NAME: {
                                    "query": query_text,
-                                    # Slop = 1 allows one extra word or transposition in phrase match.
                                    "slop": 1,
-                                    # Boost phrase over bag-of-words; exact phrase is a stronger signal.
                                    "boost": 1.5,
                                }
                            }
--- a/Show More
+++ b/Show More
Author	SHA1	Message	Date
Dane Urban	80cf389774	.	2026-02-23 16:30:30 -08:00
Danelegend	e775aaacb7	chore: preview modal (#8665 )	2026-02-23 16:29:13 -08:00
Justin Tahara	e5b08b3d92	fix(search): Improve Speed (#8430 )	2026-02-23 16:29:13 -08:00
Jamison Lahman	7c91304ba2	chore(playwright): warn user if setup takes longer than usual (#8690 )	2026-02-23 16:29:13 -08:00
roshan	68a292b500	fix(ui): Clean up NRF settings button styling (#8678 ) Co-authored-by: Claude <noreply@anthropic.com>	2026-02-23 16:29:13 -08:00
Justin Tahara	e553b80030	fix(db): Multitenant Schema migration update (#8679 )	2026-02-23 16:29:13 -08:00
Justin Tahara	f3949f8e09	chore(ods): Automated Cherry-pick backport (#8642 )	2026-02-23 16:29:13 -08:00
Nikolas Garza	c7c064e296	feat(scim): Okta compatibility + provider abstraction (#8568 )	2026-02-23 16:29:13 -08:00
Wenxi	68b91a8862	fix: domain rules for signup on cloud (#8671 )	2026-02-23 16:29:13 -08:00
roshan	c23e5a196d	fix: Handle unauthenticated state gracefully on NRF page (#8491 ) Co-authored-by: Claude <noreply@anthropic.com>	2026-02-23 16:29:13 -08:00
Raunak Bhagat	093223c6c4	refactor: migrate Web Search page to SettingsLayouts + Content (#8662 )	2026-02-23 16:29:13 -08:00
Danelegend	89517111d4	feat: Add code interpreter server db model (#8669 )	2026-02-23 16:29:13 -08:00
Wenxi	883d4b4ceb	chore: set trial api usage to 0 and show ui (#8664 )	2026-02-23 16:29:13 -08:00
Dane Urban	f3672b6819	CSV rendering	2026-02-22 18:33:39 -08:00
Dane Urban	921f5d9e96	preview modal	2026-02-22 17:42:30 -08:00
SubashMohan	15fe47adc5	fix: improve connector status display, agent search tool detection, and prompt formatting (#8579 )	2026-02-22 07:22:06 +00:00
SubashMohan	29958f1a52	perf(open-url): parallelize URL fetching with split connect/read timeouts (#8580 )	2026-02-22 06:45:26 +00:00
SubashMohan	ac7f9838bc	feat: add mixed content handler for chat and image generation packets (#8494 )	2026-02-22 06:12:48 +00:00
Raunak Bhagat	d0fa4b3319	fix: limit connectors section to 3 items in Chat Preferences (#8661 )	2026-02-22 01:01:42 +00:00
Raunak Bhagat	3fb4fb422e	feat: refreshed admin Chat Preferences page (#8488 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-21 04:52:04 +00:00
acaprau	ba5da22ea1	chore(documentation): Add comment in `contributing_guides/best_practices.md` about accountability in TODOs (#8659 )	2026-02-21 04:47:48 +00:00
Evan Lohn	9909049047	fix: improve eager loading behavior (#8576 )	2026-02-21 04:02:34 +00:00
Nikolas Garza	c516aa3e3c	fix(search): eliminate DB pool exhaustion from SearchTool parallel queries (#8651 )	2026-02-21 03:56:00 +00:00
Evan Lohn	5cc6220417	feat: checkpointing mid drive (#8606 )	2026-02-21 03:47:28 +00:00
acaprau	15da1e0a88	feat(opensearch): Add OpenSearch in docker compose (#8611 )	2026-02-21 02:57:31 +00:00
Jamison Lahman	e9ff00890b	chore(fe): fix Pinned icon hover background when card is hovered (#8656 )	2026-02-21 02:37:18 +00:00
Wenxi	67747a9d93	fix: unify provider retrieval into swr hook (#8609 )	2026-02-21 02:15:28 +00:00
Jamison Lahman	edfc51b439	fix(fe): Truncated tooltip is disabled when not needed to be shown (#8629 )	2026-02-21 02:07:18 +00:00
Evan Lohn	ac4fba947e	feat: sharepoint scalability5 (#8631 )	2026-02-21 01:56:11 +00:00
Nikolas Garza	c142b2db02	chore: special sauce (#8646 )	2026-02-21 01:30:36 +00:00
Jamison Lahman	fb7e7e4395	fix(fe): keep focus on input on empty (#8627 )	2026-02-21 00:13:26 +00:00
Justin Tahara	113f23398e	fix(celery): Guardrail for User File Processing (#8633 )	2026-02-20 22:43:01 +00:00
Danelegend	5a8716026a	feat: Python tool call replay packets (#8649 )	2026-02-20 22:28:06 +00:00
Jamison Lahman	3389140bfd	chore(deps): @radix-ui/react-tooltip: v1.1.3->v1.2.8 (#8647 )	2026-02-20 22:27:49 +00:00
Justin Tahara	13109e7b81	chore(llm): Removal of Retired Models + Cleanup (#8645 )	2026-02-20 21:47:53 +00:00
Jessica Singh	56ad457168	chore(auth): tests cleanup (#8559 )	2026-02-20 20:51:29 +00:00
Nikolas Garza	a81aea2afc	feat(scim): add scim_username column to ScimUserMapping (#8635 )	2026-02-20 12:44:16 -08:00
Jamison Lahman	7cb5c9c4a6	chore(fe): replace BlinkingDot with a BlinkingBar (#8640 )	2026-02-20 20:02:19 +00:00
Evan Lohn	3520c58a22	feat: sharepoint scalability4 (#8551 )	2026-02-20 19:35:36 +00:00
Nikolas Garza	bd9d1bfa27	chore(helm): update code-interpreter chart repo URL to python-sandbox (#8625 )	2026-02-20 11:39:08 -08:00
Evan Lohn	14416cc3db	fix: search tool enabled when nothing selected (#8637 )	2026-02-20 19:22:28 +00:00
Jamison Lahman	d7fce14d26	chore(devtools): upgrade `ods`: 0.5.7->0.6.0 (#8628 )	2026-02-20 10:18:27 -08:00
Wenxi	39a8d8ed05	fix: boolean form field click on text, open url tool checkbox in default assistant, and simple tooltip rendering (#8480 )	2026-02-20 17:38:29 +00:00
Jamison Lahman	82f735a434	chore(fe): rm settings page top padding (#8621 )	2026-02-20 17:37:19 +00:00
Jamison Lahman	aadb58518b	fix(fe): popover width can fit trigger element (#8624 )	2026-02-20 07:29:21 +00:00
Jamison Lahman	0755499e0f	chore(fe): whitelabel logo layout followup (#8626 )	2026-02-19 23:27:37 -08:00
Danelegend	27aaf977a2	feat(code interpreter): improve produced code artifacts (#8507 ) Co-authored-by: Jamison Lahman <jamison@lahman.dev>	2026-02-20 07:10:09 +00:00
Evan Lohn	9f707f195e	feat: sharepoint scalability3 (#8537 )	2026-02-20 05:34:50 +00:00
Justin Tahara	3e35570f70	feat(vertex ai): Image Gen Historical Context (#8603 )	2026-02-20 05:31:41 +00:00
Jamison Lahman	53b1bf3b2c	chore(fe): fix nested button hydration error (#8599 )	2026-02-20 05:14:24 +00:00
Jamison Lahman	5a3fa6b648	chore(fe): fix whitelabel logo moving on sidebar close (#8577 )	2026-02-20 04:30:01 +00:00
Evan Lohn	fc6a37850b	feat: delta sync sharepoint (#8532 ) Co-authored-by: CE11-Kishan <CE11-Kishan@users.noreply.github.com>	2026-02-20 03:26:54 +00:00
Raunak Bhagat	aa6fec3d58	Fix/pat UI (#8617 )	2026-02-19 19:23:26 -08:00
Raunak Bhagat	efa6005e36	feat(opal): Add widthVariant to Interactive.Container (#8610 )	2026-02-20 03:14:30 +00:00
Nikolas Garza	921bfc72f4	fix(helm): route /openapi.json to api_server in nginx config (#8612 )	2026-02-19 19:05:14 -08:00
Justin Tahara	812603152d	feat(web): FE Changes for Brave Web Search 3/3 (#8597 )	2026-02-20 02:43:30 +00:00
Raunak Bhagat	6779d8fbd7	feat(opal): Add Content layout component (#8534 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 02:35:27 +00:00
Justin Tahara	2c9826e4a9	feat(web): Logical Hardening for Brave Web Search 2/3 (#8595 )	2026-02-20 02:09:49 +00:00
Justin Tahara	5b54687077	feat(web): Initial Framework for Brave Web Search 1/3 (#8594 )	2026-02-20 01:38:19 +00:00
Raunak Bhagat	0f7e2ee674	feat(opal): Add Tag component and resync colors with Figma (#8533 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 01:14:52 +00:00
Danelegend	ea466648d9	feat: file preview from llm (#8604 )	2026-02-20 00:47:03 +00:00
Evan Lohn	a402911ee6	feat: sharepoint scalability 1 (#8531 )	2026-02-20 00:41:48 +00:00
Wenxi	7ae9ba807d	fix(manage-users): exclude slack users from /users list (#8602 )	2026-02-19 23:41:25 +00:00
Nikolas Garza	1f79223c42	feat(ods): add --continue flag and cp alias to cherry-pick (#8601 )	2026-02-19 22:31:06 +00:00
Danelegend	c0c2247d5a	feat: Modal Header no icon (#8596 )	2026-02-19 21:59:18 +00:00
Danelegend	2989ceda41	feat: add file formatting reminder (#8524 )	2026-02-19 21:51:06 +00:00
Wenxi	c825f5eca6	feat: whitelist invite setting, allow users to register and invite, new accoount blocked page (#8527 )	2026-02-19 20:16:50 +00:00
Jamison Lahman	a8965def79	chore(playwright): fix chat scrolling non-determinism (#8584 )	2026-02-19 19:26:09 +00:00
Jamison Lahman	59e1ad51ba	chore(fe): fix drop-down overflow in API Key modal (#8574 )	2026-02-19 19:15:10 +00:00
Jamison Lahman	0e70a8f826	chore(fe): remove close button from image gen tooltip (#8585 )	2026-02-19 18:31:02 +00:00
SubashMohan	0891737dfd	fix: update SourceTag component to use variant prop for sizing (#8582 )	2026-02-19 17:27:04 +00:00
victoria reese	5a20112670	fix: define fallback only on custom metrics (#8566 )	2026-02-19 09:19:44 -08:00
SubashMohan	584f2e2638	fix(ui): Fix admin UI layout, sticky headers, LLM filtering, and project view issues (#8496 )	2026-02-19 14:02:22 +00:00
Yuhong Sun	aa24b16ec1	chore: Opensearch tuning (#8518 )	2026-02-19 05:43:45 +00:00
acaprau	50aa9d7df6	feat(opensearch): Display percentage progress in the migration page (#8575 )	2026-02-19 05:33:29 +00:00
acaprau	bfda586054	chore(opensearch): Make the migration visit from 4 independent Vespa slices concurrently (#8570 )	2026-02-19 04:26:59 +00:00
roshan	e04392fbb1	feat(craft): shareable webapp URLs with two-tier access control (#8510 ) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 04:04:21 +00:00
Wenxi	e46c6c5175	fix: init all celery tasks for autodiscovery (#8539 )	2026-02-19 03:45:40 +00:00
Nikolas Garza	f59792b4ac	feat(metrics): add SQLAlchemy connection pool Prometheus metrics (#8543 )	2026-02-19 02:32:42 +00:00
Justin Tahara	973b9456e9	fix(vertex ai): Sort Model Names (#8572 )	2026-02-19 02:24:47 +00:00
Justin Tahara	aa8d126513	fix(ollama): Ollama Chat fixes (#8522 )	2026-02-19 01:37:07 +00:00
Jamison Lahman	a6da5add49	chore(fe): header text content wraps when responsive (#8565 )	2026-02-19 01:25:12 +00:00
Jamison Lahman	3356f90437	chore(fe): Button handles empty string as text, use in header (#8563 )	2026-02-19 01:24:10 +00:00
Danelegend	27c254ecf9	fix: /llm/provider route returns all providers (#8545 )	2026-02-19 00:13:27 +00:00
Jamison Lahman	09678b3c8e	chore(fe): whitelabeling header nits (#8561 )	2026-02-18 23:39:54 +00:00
Justin Tahara	ecdb962e24	chore(llm): Cleaning up arg parsing (#8555 )	2026-02-18 23:06:19 +00:00
Justin Tahara	63b9b91565	chore(llm): Extract _close_reasoning (#8550 )	2026-02-18 22:59:07 +00:00
Justin Tahara	14770e6e90	chore(llm): Extract citation flush helper (#8554 )	2026-02-18 22:31:51 +00:00
Justin Tahara	14807d986a	chore(llm): Using bool for has_reasoned (#8549 )	2026-02-18 22:18:09 +00:00
Justin Tahara	290eb98020	chore(llm): Extract _make_placement (#8552 )	2026-02-18 21:58:58 +00:00
Nikolas Garza	fe6fa3d034	feat(scim): add SCIM Group CRUD endpoints (#8456 )	2026-02-18 21:52:56 +00:00
Nikolas Garza	2a60a02e0e	feat(scim): add admin SCIM token management API (#8538 )	2026-02-18 21:41:09 +00:00
Justin Tahara	3bcd666e90	chore(llm): Cleaning up _extract_tool_call_kickoffs (#8548 )	2026-02-18 21:39:36 +00:00
Jamison Lahman	684013732c	chore(fe): update human message size (#8547 )	2026-02-18 21:27:02 +00:00
Nikolas Garza	367dcb8f8b	feat(scim): add SCIM User CRUD endpoints (#8455 )	2026-02-18 21:26:52 +00:00
Justin Tahara	59dfed0bc8	chore(llm): Cleaning up Docstring (#8546 )	2026-02-18 21:25:32 +00:00
Jamison Lahman	7a719b54bb	chore(playwright): worker-aware users for test isolation (#8544 )	2026-02-18 21:13:07 +00:00
Jamison Lahman	25ef5ff010	chore(fe): update SourceTag tag size (#8540 )	2026-02-18 20:49:11 +00:00
Danelegend	53f9f042a1	fix: model config not populating flow during sync (#8542 )	2026-02-18 20:08:59 +00:00
Jamison Lahman	3469f0c979	chore(gha): rm nightly license scan workflow (#8541 )	2026-02-18 19:55:23 +00:00
Jamison Lahman	f688efbcd6	chore(gha): update `helm/chart-testing-action` version (#8536 )	2026-02-18 11:21:17 -08:00
Jamison Lahman	250658a8b2	chore(playwright): screenshots wait for animations (#8530 )	2026-02-18 18:33:33 +00:00
SubashMohan	5150ffc3e0	feat(chat): new chat sharing UI (#8471 )	2026-02-18 15:15:34 +00:00
Jamison Lahman	858c1dbe4a	chore(playwright): chat rendering tests (#8526 )	2026-02-18 05:41:54 +00:00
Evan Lohn	a8e7353227	fix: shared drive node names (#8520 )	2026-02-18 05:34:00 +00:00
Nikolas Garza	343cda35cb	feat(observability): add production Prometheus instrumentation module (#8503 )	2026-02-18 05:09:32 +00:00
Wenxi	1cbe47d85e	chore: increase firecrawl test timeout to 60s for e2e test (#8529 )	2026-02-18 05:02:45 +00:00
Jamison Lahman	221658132a	chore(playwright): skip visual regression report on cancelled (#8528 ) Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-02-17 20:56:22 -08:00
Jamison Lahman	fe8fb9eb75	fix(fe): center-align modals relative to chat container (#8517 )	2026-02-18 04:27:38 +00:00
Wenxi	f7925584b8	fix: create release notifications on cloud (#8525 )	2026-02-18 03:47:23 +00:00
Nikolas Garza	00b0e15ed7	feat(scim): add SCIM 2.0 service discovery endpoints (#8454 )	2026-02-18 02:07:49 +00:00
Jamison Lahman	c2968e3bfe	chore(playwright): update skill re: user isolation best-practices (#8521 )	2026-02-17 17:52:03 -08:00
Justin Tahara	978f0a9d35	fix(ollama): Cleaning up DeepSeek (#8519 )	2026-02-18 01:44:34 +00:00
Danelegend	410340fe37	chore: add linguist-language (#8515 ) Co-authored-by: Jamison Lahman <jamison@lahman.dev>	2026-02-18 00:57:27 +00:00
Wenxi	a0545c7eb3	fix: open_url broken on non-normalized urls and enable web crawl tests (#8508 )	2026-02-18 00:41:21 +00:00
Jamison Lahman	aa46a8bba2	chore(gha): only run zizmor when .github/ changes (#8516 )	2026-02-17 16:25:13 -08:00
Jamison Lahman	cd5aaa0302	chore(playwright): prefer absolute imports (#8511 )	2026-02-17 16:22:51 -08:00
roshan	db33efaeaa	fix: Enable search UI in Chrome extension side panel (#8486 ) Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-02-17 23:59:05 +00:00
Justin Tahara	c28d37dff8	chore(playwright): Cleanup MCP Tests (#8512 )	2026-02-17 23:57:22 +00:00
Jamison Lahman	696e72bcbb	chore(playwright): organize tests into directories (#8509 )	2026-02-17 15:37:35 -08:00
Nikolas Garza	cfdac8083a	feat(scim): add SCIM bearer token authentication (#8423 )	2026-02-17 23:28:00 +00:00
Jamison Lahman	781aab67fa	chore(cursor): playwright SKILL.md (#8506 )	2026-02-17 15:12:01 -08:00
Justin Tahara	b14d357d55	chore(mcp): Adding more Playwright Tests (#8452 )	2026-02-17 22:55:35 +00:00
Nikolas Garza	60bc1ce8a1	feat(scim): add SCIM database CRUD operations (DAL) (#8424 )	2026-02-17 22:43:54 +00:00
roshan	c89e82ee58	feat(craft): ACP session persistence across sandbox sleep/wake (#8466 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 22:32:49 +00:00
Evan Lohn	529ab8179f	chore: undeprecate doc sets (#8505 )	2026-02-17 21:56:49 +00:00
Nikolas Garza	19716874b2	fix(ee): small ux fixes for licensing (#8498 )	2026-02-17 21:55:06 +00:00
Justin Tahara	1ac3b8515d	chore(playwright): Cleanup for LLM Tests (#8504 )	2026-02-17 21:38:24 +00:00
Justin Tahara	0d0c8580ca	fix(helm): Log Level Passthrough for Celery (#8495 )	2026-02-17 21:15:46 +00:00
roshan	96a38dcc06	fix(craft): ephemeral ACP clients to prevent multi-replica session corruption (#8465 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 21:10:21 +00:00
Jamison Lahman	6fe72e5524	chore(playwright): use `user` for llm runtime tests (#8502 )	2026-02-17 20:21:28 +00:00
Jamison Lahman	1f4ee4d550	chore(playwright): hide toast elements in screenshots (#8501 )	2026-02-17 19:42:54 +00:00
Danelegend	534db103fc	fix: Edit messages w/ files (#8461 )	2026-02-17 18:27:40 +00:00
Justin Tahara	f6f2e2d7c0	fix(desktop): Link clicking within App (#8493 )	2026-02-17 18:17:16 +00:00
Raunak Bhagat	b2836c5a13	fix: Interactive.Base href rendering and container self-stretch (#8492 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 18:16:40 +00:00
Justin Tahara	47177c536d	fix(vertex): Additional fix for Opus 4.6 (#8497 )	2026-02-17 18:12:19 +00:00
victoria reese	3920a1554e	Chore/mcp server host (#8472 )	2026-02-17 07:19:29 -08:00
SubashMohan	8b12bf34ed	fix(frontend): simplify onboarding logic and fix UI issues (#8470 )	2026-02-17 08:42:30 +00:00
roshan	dd5b7edc6b	fix(ui): remove data-state from Switch (#8394 )	2026-02-17 05:22:38 +00:00
Danelegend	7977017aab	fix: Code interpreter takes files (#8489 ) Co-authored-by: dat <dat@purplefish.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Dat Bac Do <dat.b.do@gmail.com>	2026-02-16 18:53:20 -08:00
Evan Lohn	efd8388239	feat: slim confluence hiernodes (#8490 )	2026-02-17 02:37:10 +00:00
Evan Lohn	ac6668309f	fix: custom MCP auth Headers in API [mirror of #8345 ] (#8487 ) Co-authored-by: Clément Rigo <rigoclement@mydnic.be>	2026-02-17 01:21:35 +00:00
Evan Lohn	a8bf63315c	chore: hiernode passive deletes (#8484 )	2026-02-17 00:30:18 +00:00
Evan Lohn	d905eced87	feat: pruning picks up hierarchynodes (#8481 )	2026-02-16 23:56:20 +00:00
Evan Lohn	1e6f8fa6bf	fix: allow public oauth clients via DCR (#8482 )	2026-02-16 23:13:27 +00:00
Raunak Bhagat	faeac13693	feat: agent view only modal (#7989 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 18:23:44 +00:00
SubashMohan	ddb14ec762	fix(ui): fix few common ui bugs (#8425 )	2026-02-16 11:22:43 +00:00
Justin Tahara	f31f589860	chore(llm): Adding more FE Unit Tests (#8457 )	2026-02-16 03:02:19 +00:00
Evan Lohn	63b9a869af	fix: CheckpointedConnector pruning only processes first checkpoint step (mirror of #8464 ) (#8468 ) Co-authored-by: Yves Grolet <yves@grolet.com>	2026-02-16 00:45:43 +00:00
Yuhong Sun	6aea36b573	chore: Context summarization update (#8467 )	2026-02-15 23:39:47 +00:00
Wenxi	3d8e8d0846	refactor: connector config refresh elements/cleanup (#8428 )	2026-02-15 20:12:51 +00:00