Compare commits


10 Commits

Author SHA1 Message Date
pablodanswer   d1b9f11750   k   2025-02-17 19:17:48 -08:00

pablodanswer   9b067d8179   update error   2025-02-17 19:17:48 -08:00

pablonyx   cf0e3d1ff4   fix main   2025-02-17 18:23:15 -08:00

evan-danswer   10c81f75e2   consistent refined answer improvement (#4027)   2025-02-17 21:02:03 +00:00

evan-danswer   5ca898bde2   Force use tool overrides (#4024)   2025-02-17 21:01:24 +00:00
    * initial rename + timeout bump
    * querry override

pablonyx   58b252727f   UX (#4014)   2025-02-17 13:21:43 -08:00

joachim-danswer   86bd121806   no reranking if local model w/o GPU for Agent Search (#4011)   2025-02-17 14:13:24 +00:00
    * no reranking if locql model w/o GPU
    * more efficient gpu status calling
    * fix unit tests
    Co-authored-by: Evan Lohn <evan@danswer.ai>

evan-danswer   9324f426c0   added timeouts for agent llm calls (#4019)   2025-02-17 07:02:19 +00:00
    * added timeouts for agent llm calls
    * timing suggestions in agent config
    * improved timeout that actually exits early
    * added new global timeout and connection timeout distinction
    * fixed error raising bug and made entity extraction recoverable
    * warnings and refactor
    * mypy
    Co-authored-by: joachim-danswer <joachim@danswer.ai>

joachim-danswer   20d3efc86e   By default, use primary LLM for initial & refined answer (#4012)   2025-02-16 23:20:07 +00:00
    * By default, use primary LLM for initial & refined answer
      Use of new env variable
    * simplification

pablonyx   ec0e55fd39   Seeding count issue (#4009)   2025-02-16 20:49:25 +00:00
    * k
    * k
    * quick nit
    * nit
55 changed files with 847 additions and 323 deletions

README.md

@@ -24,95 +24,113 @@
</a>
</p>
<strong>[Onyx](https://www.onyx.app/)</strong> (formerly Danswer) is the AI platform connected to your company's docs, apps, and people.
Onyx provides a feature rich Chat interface and plugs into any LLM of your choice.
There are over 40 supported connectors such as Google Drive, Slack, Confluence, Salesforce, etc. which keep knowledge and permissions up to date.
Create custom AI agents with unique prompts, knowledge, and actions the agents can take.
Onyx can be deployed securely anywhere and for any scale - on a laptop, on-premise, or to cloud.
<strong>[Onyx](https://www.onyx.app/)</strong> (formerly Danswer) is the AI Assistant connected to your company's docs, apps, and people.
Onyx provides a Chat interface and plugs into any LLM of your choice. Onyx can be deployed anywhere and for any
scale - on a laptop, on-premise, or to cloud. Since you own the deployment, your user data and chats are fully in your
own control. Onyx is dual licensed, with most of it under the MIT license, and is designed to be modular and easily extensible. The system also comes fully ready
for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for
configuring AI Assistants.
Onyx also serves as an Enterprise Search across all common workplace tools such as Slack, Google Drive, Confluence, etc.
By combining LLMs and team specific knowledge, Onyx becomes a subject matter expert for the team. Imagine ChatGPT if
it had access to your team's unique knowledge! It enables questions such as "A customer wants feature X, is this already
supported?" or "Where's the pull request for feature Y?"
<h3>Feature Showcase</h3>
<h3>Usage</h3>
**Deep research over your team's knowledge:**
Onyx Web App:
https://private-user-images.githubusercontent.com/32520769/414509312-48392e83-95d0-4fb5-8650-a396e05e0a32.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk5Mjg2MzYsIm5iZiI6MTczOTkyODMzNiwicGF0aCI6Ii8zMjUyMDc2OS80MTQ1MDkzMTItNDgzOTJlODMtOTVkMC00ZmI1LTg2NTAtYTM5NmUwNWUwYTMyLm1wND9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE5VDAxMjUzNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWFhMzk5Njg2Y2Y5YjFmNDNiYTQ2YzM5ZTg5YWJiYTU2NWMyY2YwNmUyODE2NWUxMDRiMWQxZWJmODI4YTA0MTUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.a9D8A0sgKE9AoaoE-mfFbJ6_OKYeqaf7TZ4Han2JfW8
https://github.com/onyx-dot-app/onyx/assets/32520769/563be14c-9304-47b5-bf0a-9049c2b6f410
Or, plug Onyx into your existing Slack workflows (more integrations to come 😁):
**Use Onyx as a secure AI Chat with any LLM:**
![Onyx Chat Silent Demo](https://github.com/onyx-dot-app/onyx/releases/download/v0.21.1/OnyxChatSilentDemo.gif)
**Easily set up connectors to your apps:**
![Onyx Connector Silent Demo](https://github.com/onyx-dot-app/onyx/releases/download/v0.21.1/OnyxConnectorSilentDemo.gif)
**Access Onyx where your team already works:**
![Onyx Bot Demo](https://github.com/onyx-dot-app/onyx/releases/download/v0.21.1/OnyxBot.png)
https://github.com/onyx-dot-app/onyx/assets/25087905/3e19739b-d178-4371-9a38-011430bdec1b
For more details on the Admin UI to manage connectors and users, check out our
<strong><a href="https://www.youtube.com/watch?v=geNzY1nbCnU">Full Video Demo</a></strong>!
## Deployment
> [!TIP]
> To try it out for free and get started in seconds, check out **[Onyx Cloud](https://cloud.onyx.app/signup)**.
Onyx can also be run locally (even on a laptop) or deployed on a virtual machine with a single
Onyx can easily be run locally (even on a laptop) or deployed on a virtual machine with a single
`docker compose` command. Check out our [docs](https://docs.onyx.app/quickstart) to learn more.
We also have built-in support for high-availability/scalable deployment on Kubernetes.
References [here](https://github.com/onyx-dot-app/onyx/tree/main/deployment).
We also have built-in support for deployment on Kubernetes. Files for that can be found [here](https://github.com/onyx-dot-app/onyx/tree/main/deployment/kubernetes).
## 💃 Main Features
- Chat UI with the ability to select documents to chat with.
- Create custom AI Assistants with different prompts and backing knowledge sets.
- Connect Onyx with LLM of your choice (self-host for a fully airgapped solution).
- Document Search + AI Answers for natural language queries.
- Connectors to all common workplace tools like Google Drive, Confluence, Slack, etc.
- Slack integration to get answers and search results directly in Slack.
## 🚧 Roadmap
- Extensions to the Chrome Plugin
- Latest methods in information retrieval (StructRAG, LightGraphRAG, etc.)
- Personalized Search
- Chat/Prompt sharing with specific teammates and user groups.
- Multimodal model support, chat with images, video etc.
- Choosing between LLMs and parameters during chat session.
- Tool calling and agent configurations options.
- Organizational understanding and ability to locate and suggest experts from your team.
- Code Search
- SQL and other structured query languages
## Other Notable Benefits of Onyx
## 🔍 Other Notable Benefits of Onyx
- Custom deep learning models only through Onyx + learn from user feedback.
- Flexible security features like SSO (OIDC/SAML/OAuth2), RBAC, encryption of credentials, etc.
- Knowledge curation features like document-sets, query history, usage analytics, etc.
- Scalable deployment options tested up to many tens of thousands of users and hundreds of millions of documents.
- User Authentication with document level access management.
- Best in class Hybrid Search across all sources (BM-25 + prefix aware embedding models).
- Admin Dashboard to configure connectors, document-sets, access, etc.
- Custom deep learning models + learn from user feedback.
- Easy deployment and ability to host Onyx anywhere of your choosing.
## 🔌 Connectors
Keep knowledge and access in sync across 40+ connectors:
Efficiently pulls the latest changes from:
- Slack
- GitHub
- Google Drive
- Confluence
- Slack
- Gmail
- Salesforce
- Microsoft Sharepoint
- Github
- Jira
- Zendesk
- Gmail
- Notion
- Gong
- Microsoft Teams
- Dropbox
- Slab
- Linear
- Productboard
- Guru
- Bookstack
- Document360
- Sharepoint
- Hubspot
- Local Files
- Websites
- And more ...
See the full list [here](https://docs.onyx.app/connectors).
## 📚 Editions
## 📚 Licensing
There are two editions of Onyx:
- Onyx Community Edition (CE) is available freely under the MIT Expat license. Simply follow the Deployment guide above.
- Onyx Enterprise Edition (EE) includes extra features that are primarily useful for larger organizations.
For feature details, check out [our website](https://www.onyx.app/pricing).
- Onyx Community Edition (CE) is available freely under the MIT Expat license. This version has ALL the core features discussed above. This is the version of Onyx you will get if you follow the Deployment guide above.
- Onyx Enterprise Edition (EE) includes extra features that are primarily useful for larger organizations. Specifically, this includes:
- Single Sign-On (SSO), with support for both SAML and OIDC
- Role-based access control
- Document permission inheritance from connected sources
- Usage analytics and query history accessible to admins
- Whitelabeling
- API key authentication
- Encryption of secrets
- And many more! Checkout [our website](https://www.onyx.app/) for the latest.
To try the Onyx Enterprise Edition:
1. Check out [Onyx Cloud](https://cloud.onyx.app/signup).
2. For self-hosting the Enterprise Edition, contact us at [founders@onyx.app](mailto:founders@onyx.app) or book a call with us on our [Cal](https://cal.com/team/onyx/founders).
1. Check out our [Cloud product](https://cloud.onyx.app/signup).
2. For self-hosting, contact us at [founders@onyx.app](mailto:founders@onyx.app) or book a call with us on our [Cal](https://cal.com/team/onyx/founders).
## 💡 Contributing
Looking to contribute? Please check out the [Contribution Guide](CONTRIBUTING.md) for more details.
## ⭐ Star History
[![Star History Chart](https://api.star-history.com/svg?repos=onyx-dot-app/onyx&type=Date)](https://star-history.com/#onyx-dot-app/onyx&Date)


@@ -28,11 +28,11 @@ RUN apt-get update && \
curl \
zip \
ca-certificates \
libgnutls30=3.7.9-2+deb12u3 \
libblkid1=2.38.1-5+deb12u1 \
libmount1=2.38.1-5+deb12u1 \
libsmartcols1=2.38.1-5+deb12u1 \
libuuid1=2.38.1-5+deb12u1 \
libgnutls30 \
libblkid1 \
libmount1 \
libsmartcols1 \
libuuid1 \
libxmlsec1-dev \
pkg-config \
gcc \


@@ -5,14 +5,14 @@ from langgraph.graph import StateGraph
from onyx.agents.agent_search.basic.states import BasicInput
from onyx.agents.agent_search.basic.states import BasicOutput
from onyx.agents.agent_search.basic.states import BasicState
from onyx.agents.agent_search.orchestration.nodes.basic_use_tool_response import (
basic_use_tool_response,
)
from onyx.agents.agent_search.orchestration.nodes.llm_tool_choice import llm_tool_choice
from onyx.agents.agent_search.orchestration.nodes.call_tool import call_tool
from onyx.agents.agent_search.orchestration.nodes.choose_tool import choose_tool
from onyx.agents.agent_search.orchestration.nodes.prepare_tool_input import (
prepare_tool_input,
)
from onyx.agents.agent_search.orchestration.nodes.tool_call import tool_call
from onyx.agents.agent_search.orchestration.nodes.use_tool_response import (
basic_use_tool_response,
)
from onyx.utils.logger import setup_logger
logger = setup_logger()
@@ -33,13 +33,13 @@ def basic_graph_builder() -> StateGraph:
)
graph.add_node(
node="llm_tool_choice",
action=llm_tool_choice,
node="choose_tool",
action=choose_tool,
)
graph.add_node(
node="tool_call",
action=tool_call,
node="call_tool",
action=call_tool,
)
graph.add_node(
@@ -51,12 +51,12 @@ def basic_graph_builder() -> StateGraph:
graph.add_edge(start_key=START, end_key="prepare_tool_input")
graph.add_edge(start_key="prepare_tool_input", end_key="llm_tool_choice")
graph.add_edge(start_key="prepare_tool_input", end_key="choose_tool")
graph.add_conditional_edges("llm_tool_choice", should_continue, ["tool_call", END])
graph.add_conditional_edges("choose_tool", should_continue, ["call_tool", END])
graph.add_edge(
start_key="tool_call",
start_key="call_tool",
end_key="basic_use_tool_response",
)
@@ -73,7 +73,7 @@ def should_continue(state: BasicState) -> str:
# If there are no tool calls, basic graph already streamed the answer
END
if state.tool_choice is None
else "tool_call"
else "call_tool"
)
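
The renames in this hunk (llm_tool_choice to choose_tool, tool_call to call_tool) change node names and the functions behind them but leave the graph wiring intact. For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the same choose-then-maybe-call routing, using toy state and node functions rather than Onyx's actual types:

from typing import TypedDict

from langgraph.graph import END, START, StateGraph


class State(TypedDict, total=False):
    question: str
    tool_choice: str | None
    result: str


def choose_tool(state: State) -> State:
    # Toy decision: use the search tool only when the question asks for it.
    wants_tool = "search" in state.get("question", "")
    return {"tool_choice": "search" if wants_tool else None}


def call_tool(state: State) -> State:
    return {"result": f"ran {state['tool_choice']}"}


def should_continue(state: State) -> str:
    # Mirrors the routing above: no tool choice means the graph is done.
    return END if state.get("tool_choice") is None else "call_tool"


graph = StateGraph(State)
graph.add_node("choose_tool", choose_tool)
graph.add_node("call_tool", call_tool)
graph.add_edge(START, "choose_tool")
graph.add_conditional_edges("choose_tool", should_continue, ["call_tool", END])
graph.add_edge("call_tool", END)

print(graph.compile().invoke({"question": "search for onyx"}))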


@@ -31,12 +31,14 @@ from onyx.agents.agent_search.shared_graph_utils.utils import (
get_langgraph_node_log_string,
)
from onyx.agents.agent_search.shared_graph_utils.utils import parse_question_id
from onyx.configs.agent_configs import AGENT_TIMEOUT_OVERRIDE_LLM_SUBANSWER_CHECK
from onyx.configs.agent_configs import AGENT_TIMEOUT_CONNECT_LLM_SUBANSWER_CHECK
from onyx.configs.agent_configs import AGENT_TIMEOUT_LLM_SUBANSWER_CHECK
from onyx.llm.chat_llm import LLMRateLimitError
from onyx.llm.chat_llm import LLMTimeoutError
from onyx.prompts.agent_search import SUB_ANSWER_CHECK_PROMPT
from onyx.prompts.agent_search import UNKNOWN_ANSWER
from onyx.utils.logger import setup_logger
from onyx.utils.threadpool_concurrency import run_with_timeout
from onyx.utils.timing import log_function_time
logger = setup_logger()
@@ -85,9 +87,11 @@ def check_sub_answer(
agent_error: AgentErrorLog | None = None
response: BaseMessage | None = None
try:
response = fast_llm.invoke(
response = run_with_timeout(
AGENT_TIMEOUT_LLM_SUBANSWER_CHECK,
fast_llm.invoke,
prompt=msg,
timeout_override=AGENT_TIMEOUT_OVERRIDE_LLM_SUBANSWER_CHECK,
timeout_override=AGENT_TIMEOUT_CONNECT_LLM_SUBANSWER_CHECK,
)
quality_str: str = cast(str, response.content)
@@ -96,7 +100,7 @@ def check_sub_answer(
)
log_result = f"Answer quality: {quality_str}"
except LLMTimeoutError:
except (LLMTimeoutError, TimeoutError):
agent_error = AgentErrorLog(
error_type=AgentLLMErrorType.TIMEOUT,
error_message=AGENT_LLM_TIMEOUT_MESSAGE,
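
This hunk introduces the two-level timeout pattern used throughout this compare: the connect timeout (timeout_override, renamed from the old OVERRIDE constant) is passed through to the LLM client, while run_with_timeout (a helper added to onyx.utils.threadpool_concurrency later in this compare) caps total wall-clock time and raises a plain TimeoutError, which is why the handler now catches both exception types. A self-contained sketch with a stand-in for the LLM call:

import time

from onyx.utils.threadpool_concurrency import run_with_timeout  # added in this compare


def fake_invoke(prompt: str, timeout_override: int | None = None) -> str:
    # Stand-in for fast_llm.invoke; simulate a provider that hangs.
    time.sleep(30)
    return "yes"


try:
    quality = run_with_timeout(8, fake_invoke, prompt="check sub-answer", timeout_override=2)
except TimeoutError:
    quality = None  # the real node records an AgentErrorLog timeout instead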


@@ -1,5 +1,4 @@
from datetime import datetime
from typing import Any
from typing import cast
from langchain_core.messages import merge_message_runs
@@ -47,11 +46,13 @@ from onyx.chat.models import StreamStopInfo
from onyx.chat.models import StreamStopReason
from onyx.chat.models import StreamType
from onyx.configs.agent_configs import AGENT_MAX_ANSWER_CONTEXT_DOCS
from onyx.configs.agent_configs import AGENT_TIMEOUT_OVERRIDE_LLM_SUBANSWER_GENERATION
from onyx.configs.agent_configs import AGENT_TIMEOUT_CONNECT_LLM_SUBANSWER_GENERATION
from onyx.configs.agent_configs import AGENT_TIMEOUT_LLM_SUBANSWER_GENERATION
from onyx.llm.chat_llm import LLMRateLimitError
from onyx.llm.chat_llm import LLMTimeoutError
from onyx.prompts.agent_search import NO_RECOVERED_DOCS
from onyx.utils.logger import setup_logger
from onyx.utils.threadpool_concurrency import run_with_timeout
from onyx.utils.timing import log_function_time
logger = setup_logger()
@@ -110,15 +111,14 @@ def generate_sub_answer(
config=fast_llm.config,
)
response: list[str | list[str | dict[str, Any]]] = []
dispatch_timings: list[float] = []
agent_error: AgentErrorLog | None = None
response: list[str] = []
try:
def stream_sub_answer() -> list[str]:
for message in fast_llm.stream(
prompt=msg,
timeout_override=AGENT_TIMEOUT_OVERRIDE_LLM_SUBANSWER_GENERATION,
timeout_override=AGENT_TIMEOUT_CONNECT_LLM_SUBANSWER_GENERATION,
):
# TODO: in principle, the answer here COULD contain images, but we don't support that yet
content = message.content
@@ -142,8 +142,15 @@ def generate_sub_answer(
(end_stream_token - start_stream_token).microseconds
)
response.append(content)
return response
except LLMTimeoutError:
try:
response = run_with_timeout(
AGENT_TIMEOUT_LLM_SUBANSWER_GENERATION,
stream_sub_answer,
)
except (LLMTimeoutError, TimeoutError):
agent_error = AgentErrorLog(
error_type=AgentLLMErrorType.TIMEOUT,
error_message=AGENT_LLM_TIMEOUT_MESSAGE,


@@ -1,5 +1,4 @@
from datetime import datetime
from typing import Any
from typing import cast
from langchain_core.messages import HumanMessage
@@ -60,11 +59,15 @@ from onyx.agents.agent_search.shared_graph_utils.utils import write_custom_event
from onyx.chat.models import AgentAnswerPiece
from onyx.chat.models import ExtendedToolResponse
from onyx.chat.models import StreamingError
from onyx.configs.agent_configs import AGENT_ANSWER_GENERATION_BY_FAST_LLM
from onyx.configs.agent_configs import AGENT_MAX_ANSWER_CONTEXT_DOCS
from onyx.configs.agent_configs import AGENT_MAX_STREAMED_DOCS_FOR_INITIAL_ANSWER
from onyx.configs.agent_configs import AGENT_MIN_ORIG_QUESTION_DOCS
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_OVERRIDE_LLM_INITIAL_ANSWER_GENERATION,
AGENT_TIMEOUT_CONNECT_LLM_INITIAL_ANSWER_GENERATION,
)
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_LLM_INITIAL_ANSWER_GENERATION,
)
from onyx.llm.chat_llm import LLMRateLimitError
from onyx.llm.chat_llm import LLMTimeoutError
@@ -77,6 +80,7 @@ from onyx.prompts.agent_search import (
)
from onyx.prompts.agent_search import UNKNOWN_ANSWER
from onyx.tools.tool_implementations.search.search_tool import yield_search_responses
from onyx.utils.threadpool_concurrency import run_with_timeout
from onyx.utils.timing import log_function_time
_llm_node_error_strings = LLMNodeErrorStrings(
@@ -230,7 +234,11 @@ def generate_initial_answer(
sub_questions = all_sub_questions # Replace the original assignment
model = graph_config.tooling.fast_llm
model = (
graph_config.tooling.fast_llm
if AGENT_ANSWER_GENERATION_BY_FAST_LLM
else graph_config.tooling.primary_llm
)
doc_context = format_docs(answer_generation_documents.context_documents)
doc_context = trim_prompt_piece(
@@ -260,15 +268,16 @@ def generate_initial_answer(
)
]
streamed_tokens: list[str | list[str | dict[str, Any]]] = [""]
streamed_tokens: list[str] = [""]
dispatch_timings: list[float] = []
agent_error: AgentErrorLog | None = None
try:
def stream_initial_answer() -> list[str]:
response: list[str] = []
for message in model.stream(
msg,
timeout_override=AGENT_TIMEOUT_OVERRIDE_LLM_INITIAL_ANSWER_GENERATION,
timeout_override=AGENT_TIMEOUT_CONNECT_LLM_INITIAL_ANSWER_GENERATION,
):
# TODO: in principle, the answer here COULD contain images, but we don't support that yet
content = message.content
@@ -292,9 +301,16 @@ def generate_initial_answer(
dispatch_timings.append(
(end_stream_token - start_stream_token).microseconds
)
streamed_tokens.append(content)
response.append(content)
return response
except LLMTimeoutError:
try:
streamed_tokens = run_with_timeout(
AGENT_TIMEOUT_LLM_INITIAL_ANSWER_GENERATION,
stream_initial_answer,
)
except (LLMTimeoutError, TimeoutError):
agent_error = AgentErrorLog(
error_type=AgentLLMErrorType.TIMEOUT,
error_message=AGENT_LLM_TIMEOUT_MESSAGE,


@@ -36,7 +36,10 @@ from onyx.chat.models import StreamType
from onyx.chat.models import SubQuestionPiece
from onyx.configs.agent_configs import AGENT_NUM_DOCS_FOR_DECOMPOSITION
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_OVERRIDE_LLM_SUBQUESTION_GENERATION,
AGENT_TIMEOUT_CONNECT_LLM_SUBQUESTION_GENERATION,
)
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_LLM_SUBQUESTION_GENERATION,
)
from onyx.llm.chat_llm import LLMRateLimitError
from onyx.llm.chat_llm import LLMTimeoutError
@@ -47,6 +50,7 @@ from onyx.prompts.agent_search import (
INITIAL_QUESTION_DECOMPOSITION_PROMPT_ASSUMING_REFINEMENT,
)
from onyx.utils.logger import setup_logger
from onyx.utils.threadpool_concurrency import run_with_timeout
from onyx.utils.timing import log_function_time
logger = setup_logger()
@@ -131,10 +135,12 @@ def decompose_orig_question(
streamed_tokens: list[BaseMessage_Content] = []
try:
streamed_tokens = dispatch_separated(
streamed_tokens = run_with_timeout(
AGENT_TIMEOUT_LLM_SUBQUESTION_GENERATION,
dispatch_separated,
model.stream(
msg,
timeout_override=AGENT_TIMEOUT_OVERRIDE_LLM_SUBQUESTION_GENERATION,
timeout_override=AGENT_TIMEOUT_CONNECT_LLM_SUBQUESTION_GENERATION,
),
dispatch_subquestion(0, writer),
sep_callback=dispatch_subquestion_sep(0, writer),
@@ -154,7 +160,7 @@ def decompose_orig_question(
)
write_custom_event("stream_finished", stop_event, writer)
except LLMTimeoutError as e:
except (LLMTimeoutError, TimeoutError) as e:
logger.error("LLM Timeout Error - decompose orig question")
raise e # fail loudly on this critical step
except LLMRateLimitError as e:
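
For streaming calls, the same total cap is applied by moving stream consumption into the timed worker: model.stream(...) returns a lazy generator, so no tokens are pulled until dispatch_separated iterates it inside run_with_timeout, and the whole stream is therefore bounded. A rough sketch of that shape, with a toy generator and consumer standing in for the real model.stream and dispatch_separated:

from collections.abc import Iterator

from onyx.utils.threadpool_concurrency import run_with_timeout


def token_stream() -> Iterator[str]:
    # Lazy stand-in for model.stream(msg, timeout_override=...): nothing
    # runs until the consumer iterates it on the timeout thread.
    yield from ("1. sub-question one\n", "2. sub-question two\n")


def consume(stream: Iterator[str]) -> list[str]:
    return list(stream)  # stand-in for dispatch_separated


streamed_tokens = run_with_timeout(5, consume, token_stream())
print(streamed_tokens)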


@@ -25,7 +25,7 @@ logger = setup_logger()
def route_initial_tool_choice(
state: MainState, config: RunnableConfig
) -> Literal["tool_call", "start_agent_search", "logging_node"]:
) -> Literal["call_tool", "start_agent_search", "logging_node"]:
"""
LangGraph edge to route to agent search.
"""
@@ -38,7 +38,7 @@ def route_initial_tool_choice(
):
return "start_agent_search"
else:
return "tool_call"
return "call_tool"
else:
return "logging_node"


@@ -43,14 +43,14 @@ from onyx.agents.agent_search.deep_search.main.states import MainState
from onyx.agents.agent_search.deep_search.refinement.consolidate_sub_answers.graph_builder import (
answer_refined_query_graph_builder,
)
from onyx.agents.agent_search.orchestration.nodes.basic_use_tool_response import (
basic_use_tool_response,
)
from onyx.agents.agent_search.orchestration.nodes.llm_tool_choice import llm_tool_choice
from onyx.agents.agent_search.orchestration.nodes.call_tool import call_tool
from onyx.agents.agent_search.orchestration.nodes.choose_tool import choose_tool
from onyx.agents.agent_search.orchestration.nodes.prepare_tool_input import (
prepare_tool_input,
)
from onyx.agents.agent_search.orchestration.nodes.tool_call import tool_call
from onyx.agents.agent_search.orchestration.nodes.use_tool_response import (
basic_use_tool_response,
)
from onyx.agents.agent_search.shared_graph_utils.utils import get_test_config
from onyx.utils.logger import setup_logger
@@ -77,13 +77,13 @@ def main_graph_builder(test_mode: bool = False) -> StateGraph:
# Choose the initial tool
graph.add_node(
node="initial_tool_choice",
action=llm_tool_choice,
action=choose_tool,
)
# Call the tool, if required
graph.add_node(
node="tool_call",
action=tool_call,
node="call_tool",
action=call_tool,
)
# Use the tool response
@@ -168,11 +168,11 @@ def main_graph_builder(test_mode: bool = False) -> StateGraph:
graph.add_conditional_edges(
"initial_tool_choice",
route_initial_tool_choice,
["tool_call", "start_agent_search", "logging_node"],
["call_tool", "start_agent_search", "logging_node"],
)
graph.add_edge(
start_key="tool_call",
start_key="call_tool",
end_key="basic_use_tool_response",
)
graph.add_edge(


@@ -33,13 +33,15 @@ from onyx.agents.agent_search.shared_graph_utils.utils import (
)
from onyx.agents.agent_search.shared_graph_utils.utils import write_custom_event
from onyx.chat.models import RefinedAnswerImprovement
from onyx.configs.agent_configs import AGENT_TIMEOUT_OVERRIDE_LLM_COMPARE_ANSWERS
from onyx.configs.agent_configs import AGENT_TIMEOUT_CONNECT_LLM_COMPARE_ANSWERS
from onyx.configs.agent_configs import AGENT_TIMEOUT_LLM_COMPARE_ANSWERS
from onyx.llm.chat_llm import LLMRateLimitError
from onyx.llm.chat_llm import LLMTimeoutError
from onyx.prompts.agent_search import (
INITIAL_REFINED_ANSWER_COMPARISON_PROMPT,
)
from onyx.utils.logger import setup_logger
from onyx.utils.threadpool_concurrency import run_with_timeout
from onyx.utils.timing import log_function_time
logger = setup_logger()
@@ -105,11 +107,14 @@ def compare_answers(
refined_answer_improvement: bool | None = None
# no need to stream this
try:
resp = model.invoke(
msg, timeout_override=AGENT_TIMEOUT_OVERRIDE_LLM_COMPARE_ANSWERS
resp = run_with_timeout(
AGENT_TIMEOUT_LLM_COMPARE_ANSWERS,
model.invoke,
prompt=msg,
timeout_override=AGENT_TIMEOUT_CONNECT_LLM_COMPARE_ANSWERS,
)
except LLMTimeoutError:
except (LLMTimeoutError, TimeoutError):
agent_error = AgentErrorLog(
error_type=AgentLLMErrorType.TIMEOUT,
error_message=AGENT_LLM_TIMEOUT_MESSAGE,


@@ -44,7 +44,10 @@ from onyx.agents.agent_search.shared_graph_utils.utils import make_question_id
from onyx.agents.agent_search.shared_graph_utils.utils import write_custom_event
from onyx.chat.models import StreamingError
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_OVERRIDE_LLM_REFINED_SUBQUESTION_GENERATION,
AGENT_TIMEOUT_CONNECT_LLM_REFINED_SUBQUESTION_GENERATION,
)
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_LLM_REFINED_SUBQUESTION_GENERATION,
)
from onyx.llm.chat_llm import LLMRateLimitError
from onyx.llm.chat_llm import LLMTimeoutError
@@ -53,6 +56,7 @@ from onyx.prompts.agent_search import (
)
from onyx.tools.models import ToolCallKickoff
from onyx.utils.logger import setup_logger
from onyx.utils.threadpool_concurrency import run_with_timeout
from onyx.utils.timing import log_function_time
logger = setup_logger()
@@ -134,15 +138,17 @@ def create_refined_sub_questions(
agent_error: AgentErrorLog | None = None
streamed_tokens: list[BaseMessage_Content] = []
try:
streamed_tokens = dispatch_separated(
streamed_tokens = run_with_timeout(
AGENT_TIMEOUT_LLM_REFINED_SUBQUESTION_GENERATION,
dispatch_separated,
model.stream(
msg,
timeout_override=AGENT_TIMEOUT_OVERRIDE_LLM_REFINED_SUBQUESTION_GENERATION,
timeout_override=AGENT_TIMEOUT_CONNECT_LLM_REFINED_SUBQUESTION_GENERATION,
),
dispatch_subquestion(1, writer),
sep_callback=dispatch_subquestion_sep(1, writer),
)
except LLMTimeoutError:
except (LLMTimeoutError, TimeoutError):
agent_error = AgentErrorLog(
error_type=AgentLLMErrorType.TIMEOUT,
error_message=AGENT_LLM_TIMEOUT_MESSAGE,


@@ -22,11 +22,17 @@ from onyx.agents.agent_search.shared_graph_utils.utils import (
get_langgraph_node_log_string,
)
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_OVERRIDE_LLM_ENTITY_TERM_EXTRACTION,
AGENT_TIMEOUT_CONNECT_LLM_ENTITY_TERM_EXTRACTION,
)
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_LLM_ENTITY_TERM_EXTRACTION,
)
from onyx.configs.constants import NUM_EXPLORATORY_DOCS
from onyx.llm.chat_llm import LLMRateLimitError
from onyx.llm.chat_llm import LLMTimeoutError
from onyx.prompts.agent_search import ENTITY_TERM_EXTRACTION_PROMPT
from onyx.prompts.agent_search import ENTITY_TERM_EXTRACTION_PROMPT_JSON_EXAMPLE
from onyx.utils.threadpool_concurrency import run_with_timeout
from onyx.utils.timing import log_function_time
@@ -84,30 +90,42 @@ def extract_entities_terms(
]
fast_llm = graph_config.tooling.fast_llm
# Grader
llm_response = fast_llm.invoke(
prompt=msg,
timeout_override=AGENT_TIMEOUT_OVERRIDE_LLM_ENTITY_TERM_EXTRACTION,
)
cleaned_response = (
str(llm_response.content).replace("```json\n", "").replace("\n```", "")
)
first_bracket = cleaned_response.find("{")
last_bracket = cleaned_response.rfind("}")
cleaned_response = cleaned_response[first_bracket : last_bracket + 1]
try:
entity_extraction_result = EntityExtractionResult.model_validate_json(
cleaned_response
llm_response = run_with_timeout(
AGENT_TIMEOUT_LLM_ENTITY_TERM_EXTRACTION,
fast_llm.invoke,
prompt=msg,
timeout_override=AGENT_TIMEOUT_CONNECT_LLM_ENTITY_TERM_EXTRACTION,
)
except ValueError:
logger.error("Failed to parse LLM response as JSON in Entity-Term Extraction")
cleaned_response = (
str(llm_response.content).replace("```json\n", "").replace("\n```", "")
)
first_bracket = cleaned_response.find("{")
last_bracket = cleaned_response.rfind("}")
cleaned_response = cleaned_response[first_bracket : last_bracket + 1]
try:
entity_extraction_result = EntityExtractionResult.model_validate_json(
cleaned_response
)
except ValueError:
logger.error(
"Failed to parse LLM response as JSON in Entity-Term Extraction"
)
entity_extraction_result = EntityExtractionResult(
retrieved_entities_relationships=EntityRelationshipTermExtraction(),
)
except (LLMTimeoutError, TimeoutError):
logger.error("LLM Timeout Error - extract entities terms")
entity_extraction_result = EntityExtractionResult(
retrieved_entities_relationships=EntityRelationshipTermExtraction(
entities=[],
relationships=[],
terms=[],
),
retrieved_entities_relationships=EntityRelationshipTermExtraction(),
)
except LLMRateLimitError:
logger.error("LLM Rate Limit Error - extract entities terms")
entity_extraction_result = EntityExtractionResult(
retrieved_entities_relationships=EntityRelationshipTermExtraction(),
)
return EntityTermExtractionUpdate(
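
Beyond the timeout, this restructuring is what makes entity extraction recoverable: the fence-stripping and brace-slicing now sit inside the try block, and every failure path (bad JSON, timeout, rate limit) falls back to an empty EntityExtractionResult instead of raising. The cleanup itself is a common trick for parsing LLM JSON output; here it is standalone, with a hypothetical Extraction model in place of Onyx's:

from pydantic import BaseModel


class Extraction(BaseModel):  # hypothetical stand-in for EntityExtractionResult
    entities: list[str] = []


def parse_llm_json(raw: str) -> Extraction:
    # Strip a ```json fence, then slice to the outermost braces so stray
    # prose around the object does not break parsing.
    cleaned = raw.replace("```json\n", "").replace("\n```", "")
    first, last = cleaned.find("{"), cleaned.rfind("}")
    cleaned = cleaned[first : last + 1]
    try:
        return Extraction.model_validate_json(cleaned)
    except ValueError:  # pydantic's ValidationError subclasses ValueError
        return Extraction()  # recoverable: fall back to an empty result


print(parse_llm_json('```json\n{"entities": ["Onyx"]}\n```'))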


@@ -1,5 +1,4 @@
from datetime import datetime
from typing import Any
from typing import cast
from langchain_core.messages import HumanMessage
@@ -66,14 +65,21 @@ from onyx.agents.agent_search.shared_graph_utils.utils import write_custom_event
from onyx.chat.models import AgentAnswerPiece
from onyx.chat.models import ExtendedToolResponse
from onyx.chat.models import StreamingError
from onyx.configs.agent_configs import AGENT_ANSWER_GENERATION_BY_FAST_LLM
from onyx.configs.agent_configs import AGENT_MAX_ANSWER_CONTEXT_DOCS
from onyx.configs.agent_configs import AGENT_MAX_STREAMED_DOCS_FOR_REFINED_ANSWER
from onyx.configs.agent_configs import AGENT_MIN_ORIG_QUESTION_DOCS
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_OVERRIDE_LLM_REFINED_ANSWER_GENERATION,
AGENT_TIMEOUT_CONNECT_LLM_REFINED_ANSWER_GENERATION,
)
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_OVERRIDE_LLM_REFINED_ANSWER_VALIDATION,
AGENT_TIMEOUT_CONNECT_LLM_REFINED_ANSWER_VALIDATION,
)
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_LLM_REFINED_ANSWER_GENERATION,
)
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_LLM_REFINED_ANSWER_VALIDATION,
)
from onyx.llm.chat_llm import LLMRateLimitError
from onyx.llm.chat_llm import LLMTimeoutError
@@ -92,6 +98,7 @@ from onyx.prompts.agent_search import (
from onyx.prompts.agent_search import UNKNOWN_ANSWER
from onyx.tools.tool_implementations.search.search_tool import yield_search_responses
from onyx.utils.logger import setup_logger
from onyx.utils.threadpool_concurrency import run_with_timeout
from onyx.utils.timing import log_function_time
logger = setup_logger()
@@ -253,7 +260,12 @@ def generate_validate_refined_answer(
else REFINED_ANSWER_PROMPT_WO_SUB_QUESTIONS
)
model = graph_config.tooling.fast_llm
model = (
graph_config.tooling.fast_llm
if AGENT_ANSWER_GENERATION_BY_FAST_LLM
else graph_config.tooling.primary_llm
)
relevant_docs_str = format_docs(answer_generation_documents.context_documents)
relevant_docs_str = trim_prompt_piece(
model.config,
@@ -284,13 +296,13 @@ def generate_validate_refined_answer(
)
]
streamed_tokens: list[str | list[str | dict[str, Any]]] = [""]
streamed_tokens: list[str] = [""]
dispatch_timings: list[float] = []
agent_error: AgentErrorLog | None = None
try:
def stream_refined_answer() -> list[str]:
for message in model.stream(
msg, timeout_override=AGENT_TIMEOUT_OVERRIDE_LLM_REFINED_ANSWER_GENERATION
msg, timeout_override=AGENT_TIMEOUT_CONNECT_LLM_REFINED_ANSWER_GENERATION
):
# TODO: in principle, the answer here COULD contain images, but we don't support that yet
content = message.content
@@ -315,8 +327,15 @@ def generate_validate_refined_answer(
(end_stream_token - start_stream_token).microseconds
)
streamed_tokens.append(content)
return streamed_tokens
except LLMTimeoutError:
try:
streamed_tokens = run_with_timeout(
AGENT_TIMEOUT_LLM_REFINED_ANSWER_GENERATION,
stream_refined_answer,
)
except (LLMTimeoutError, TimeoutError):
agent_error = AgentErrorLog(
error_type=AgentLLMErrorType.TIMEOUT,
error_message=AGENT_LLM_TIMEOUT_MESSAGE,
@@ -383,16 +402,20 @@ def generate_validate_refined_answer(
)
]
validation_model = graph_config.tooling.fast_llm
try:
validation_response = model.invoke(
msg, timeout_override=AGENT_TIMEOUT_OVERRIDE_LLM_REFINED_ANSWER_VALIDATION
validation_response = run_with_timeout(
AGENT_TIMEOUT_LLM_REFINED_ANSWER_VALIDATION,
validation_model.invoke,
prompt=msg,
timeout_override=AGENT_TIMEOUT_CONNECT_LLM_REFINED_ANSWER_VALIDATION,
)
refined_answer_quality = binary_string_test_after_answer_separator(
text=cast(str, validation_response.content),
positive_value=AGENT_POSITIVE_VALUE_STR,
separator=AGENT_ANSWER_SEPARATOR,
)
except LLMTimeoutError:
except (LLMTimeoutError, TimeoutError):
refined_answer_quality = True
logger.error("LLM Timeout Error - validate refined answer")


@@ -34,14 +34,16 @@ from onyx.agents.agent_search.shared_graph_utils.utils import (
)
from onyx.agents.agent_search.shared_graph_utils.utils import parse_question_id
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_OVERRIDE_LLM_QUERY_REWRITING_GENERATION,
AGENT_TIMEOUT_CONNECT_LLM_QUERY_REWRITING_GENERATION,
)
from onyx.configs.agent_configs import AGENT_TIMEOUT_LLM_QUERY_REWRITING_GENERATION
from onyx.llm.chat_llm import LLMRateLimitError
from onyx.llm.chat_llm import LLMTimeoutError
from onyx.prompts.agent_search import (
QUERY_REWRITING_PROMPT,
)
from onyx.utils.logger import setup_logger
from onyx.utils.threadpool_concurrency import run_with_timeout
from onyx.utils.timing import log_function_time
logger = setup_logger()
@@ -69,7 +71,7 @@ def expand_queries(
node_start_time = datetime.now()
question = state.question
llm = graph_config.tooling.fast_llm
model = graph_config.tooling.fast_llm
sub_question_id = state.sub_question_id
if sub_question_id is None:
level, question_num = 0, 0
@@ -88,10 +90,12 @@ def expand_queries(
rewritten_queries = []
try:
llm_response_list = dispatch_separated(
llm.stream(
llm_response_list = run_with_timeout(
AGENT_TIMEOUT_LLM_QUERY_REWRITING_GENERATION,
dispatch_separated,
model.stream(
prompt=msg,
timeout_override=AGENT_TIMEOUT_OVERRIDE_LLM_QUERY_REWRITING_GENERATION,
timeout_override=AGENT_TIMEOUT_CONNECT_LLM_QUERY_REWRITING_GENERATION,
),
dispatch_subquery(level, question_num, writer),
)
@@ -101,7 +105,7 @@ def expand_queries(
rewritten_queries = llm_response.split("\n")
log_result = f"Number of expanded queries: {len(rewritten_queries)}"
except LLMTimeoutError:
except (LLMTimeoutError, TimeoutError):
agent_error = AgentErrorLog(
error_type=AgentLLMErrorType.TIMEOUT,
error_message=AGENT_LLM_TIMEOUT_MESSAGE,


@@ -55,6 +55,7 @@ def rerank_documents(
# Note that these are passed in values from the API and are overrides which are typically None
rerank_settings = graph_config.inputs.search_request.rerank_settings
allow_agent_reranking = graph_config.behavior.allow_agent_reranking
if rerank_settings is None:
with get_session_context_manager() as db_session:
@@ -62,23 +63,31 @@ def rerank_documents(
if not search_settings.disable_rerank_for_streaming:
rerank_settings = RerankingDetails.from_db_model(search_settings)
# Initial default: no reranking. Will be overwritten below if reranking is warranted
reranked_documents = verified_documents
if should_rerank(rerank_settings) and len(verified_documents) > 0:
if len(verified_documents) > 1:
reranked_documents = rerank_sections(
query_str=question,
# if runnable, then rerank_settings is not None
rerank_settings=cast(RerankingDetails, rerank_settings),
sections_to_rerank=verified_documents,
)
if not allow_agent_reranking:
logger.info("Use of local rerank model without GPU, skipping reranking")
# No reranking, stay with verified_documents as default
else:
# Reranking is warranted, use the rerank_sections function
reranked_documents = rerank_sections(
query_str=question,
# if runnable, then rerank_settings is not None
rerank_settings=cast(RerankingDetails, rerank_settings),
sections_to_rerank=verified_documents,
)
else:
logger.warning(
f"{len(verified_documents)} verified document(s) found, skipping reranking"
)
reranked_documents = verified_documents
# No reranking, stay with verified_documents as default
else:
logger.warning("No reranking settings found, using unranked documents")
reranked_documents = verified_documents
# No reranking, stay with verified_documents as default
if AGENT_RERANKING_STATS:
fit_scores = get_fit_scores(verified_documents, reranked_documents)
else:


@@ -25,13 +25,15 @@ from onyx.agents.agent_search.shared_graph_utils.models import LLMNodeErrorStrin
from onyx.agents.agent_search.shared_graph_utils.utils import (
get_langgraph_node_log_string,
)
from onyx.configs.agent_configs import AGENT_TIMEOUT_OVERRIDE_LLM_DOCUMENT_VERIFICATION
from onyx.configs.agent_configs import AGENT_TIMEOUT_CONNECT_LLM_DOCUMENT_VERIFICATION
from onyx.configs.agent_configs import AGENT_TIMEOUT_LLM_DOCUMENT_VERIFICATION
from onyx.llm.chat_llm import LLMRateLimitError
from onyx.llm.chat_llm import LLMTimeoutError
from onyx.prompts.agent_search import (
DOCUMENT_VERIFICATION_PROMPT,
)
from onyx.utils.logger import setup_logger
from onyx.utils.threadpool_concurrency import run_with_timeout
from onyx.utils.timing import log_function_time
logger = setup_logger()
@@ -86,8 +88,11 @@ def verify_documents(
] # default is to treat document as relevant
try:
response = fast_llm.invoke(
msg, timeout_override=AGENT_TIMEOUT_OVERRIDE_LLM_DOCUMENT_VERIFICATION
response = run_with_timeout(
AGENT_TIMEOUT_LLM_DOCUMENT_VERIFICATION,
fast_llm.invoke,
prompt=msg,
timeout_override=AGENT_TIMEOUT_CONNECT_LLM_DOCUMENT_VERIFICATION,
)
assert isinstance(response.content, str)
@@ -96,7 +101,7 @@ def verify_documents(
):
verified_documents = []
except LLMTimeoutError:
except (LLMTimeoutError, TimeoutError):
# In this case, we decide to continue and don't raise an error, as
# little harm in letting some docs through that are less relevant.
logger.error("LLM Timeout Error - verify documents")


@@ -67,6 +67,7 @@ class GraphSearchConfig(BaseModel):
# Whether to allow creation of refinement questions (and entity extraction, etc.)
allow_refinement: bool = True
skip_gen_ai_answer_generation: bool = False
allow_agent_reranking: bool = False
class GraphConfig(BaseModel):


@@ -28,7 +28,7 @@ def emit_packet(packet: AnswerPacket, writer: StreamWriter) -> None:
write_custom_event("basic_response", packet, writer)
def tool_call(
def call_tool(
state: ToolChoiceUpdate,
config: RunnableConfig,
writer: StreamWriter = lambda _: None,


@@ -25,7 +25,7 @@ logger = setup_logger()
# and a function that handles extracting the necessary fields
# from the state and config
# TODO: fan-out to multiple tool call nodes? Make this configurable?
def llm_tool_choice(
def choose_tool(
state: ToolChoiceState,
config: RunnableConfig,
writer: StreamWriter = lambda _: None,


@@ -43,8 +43,9 @@ from onyx.chat.models import StreamStopReason
from onyx.chat.models import StreamType
from onyx.chat.prompt_builder.answer_prompt_builder import AnswerPromptBuilder
from onyx.configs.agent_configs import (
AGENT_TIMEOUT_OVERRIDE_LLM_HISTORY_SUMMARY_GENERATION,
AGENT_TIMEOUT_CONNECT_LLM_HISTORY_SUMMARY_GENERATION,
)
from onyx.configs.agent_configs import AGENT_TIMEOUT_LLM_HISTORY_SUMMARY_GENERATION
from onyx.configs.chat_configs import CHAT_TARGET_CHUNK_PERCENTAGE
from onyx.configs.chat_configs import MAX_CHUNKS_FED_TO_CHAT
from onyx.configs.constants import DEFAULT_PERSONA_ID
@@ -80,6 +81,7 @@ from onyx.tools.tool_implementations.search.search_tool import SearchResponseSum
from onyx.tools.tool_implementations.search.search_tool import SearchTool
from onyx.tools.utils import explicit_tool_calling_supported
from onyx.utils.logger import setup_logger
from onyx.utils.threadpool_concurrency import run_with_timeout
logger = setup_logger()
@@ -395,11 +397,13 @@ def summarize_history(
)
try:
history_response = llm.invoke(
history_response = run_with_timeout(
AGENT_TIMEOUT_LLM_HISTORY_SUMMARY_GENERATION,
llm.invoke,
history_context_prompt,
timeout_override=AGENT_TIMEOUT_OVERRIDE_LLM_HISTORY_SUMMARY_GENERATION,
timeout_override=AGENT_TIMEOUT_CONNECT_LLM_HISTORY_SUMMARY_GENERATION,
)
except LLMTimeoutError:
except (LLMTimeoutError, TimeoutError):
logger.error("LLM Timeout Error - summarize history")
return (
history # this is what is done at this point anyway, so we default to this


@@ -94,6 +94,7 @@ from onyx.db.models import User
from onyx.db.users import get_user_by_email
from onyx.redis.redis_pool import get_async_redis_connection
from onyx.redis.redis_pool import get_redis_client
from onyx.server.utils import BasicAuthenticationError
from onyx.utils.logger import setup_logger
from onyx.utils.telemetry import create_milestone_and_report
from onyx.utils.telemetry import optional_telemetry
@@ -107,11 +108,6 @@ from shared_configs.contextvars import CURRENT_TENANT_ID_CONTEXTVAR
logger = setup_logger()
class BasicAuthenticationError(HTTPException):
def __init__(self, detail: str):
super().__init__(status_code=status.HTTP_403_FORBIDDEN, detail=detail)
def is_user_admin(user: User | None) -> bool:
if AUTH_TYPE == AuthType.DISABLED:
return True


@@ -105,6 +105,7 @@ def document_by_cc_pair_cleanup_task(
tenant_id=tenant_id,
chunk_count=chunk_count,
)
delete_documents_complete__no_commit(
db_session=db_session,
document_ids=[document_id],


@@ -27,8 +27,10 @@ from onyx.file_store.utils import InMemoryChatFile
from onyx.llm.interfaces import LLM
from onyx.tools.force import ForceUseTool
from onyx.tools.tool import Tool
from onyx.tools.tool_implementations.search.search_tool import QUERY_FIELD
from onyx.tools.tool_implementations.search.search_tool import SearchTool
from onyx.tools.utils import explicit_tool_calling_supported
from onyx.utils.gpu_utils import gpu_status_request
from onyx.utils.logger import setup_logger
logger = setup_logger()
@@ -80,6 +82,26 @@ class Answer:
and not skip_explicit_tool_calling
)
rerank_settings = search_request.rerank_settings
using_cloud_reranking = (
rerank_settings is not None
and rerank_settings.rerank_provider_type is not None
)
allow_agent_reranking = gpu_status_request() or using_cloud_reranking
# TODO: this is a hack to force the query to be used for the search tool
# this should be removed once we fully unify graph inputs (i.e.
# remove SearchQuery entirely)
if (
force_use_tool.force_use
and search_tool
and force_use_tool.args
and force_use_tool.tool_name == search_tool.name
and QUERY_FIELD in force_use_tool.args
):
search_request.query = force_use_tool.args[QUERY_FIELD]
self.graph_inputs = GraphInputs(
search_request=search_request,
prompt_builder=prompt_builder,
@@ -94,7 +116,6 @@ class Answer:
force_use_tool=force_use_tool,
using_tool_calling_llm=using_tool_calling_llm,
)
assert db_session, "db_session must be provided for agentic persistence"
self.graph_persistence = GraphPersistence(
db_session=db_session,
chat_session_id=chat_session_id,
@@ -104,6 +125,7 @@ class Answer:
use_agentic_search=use_agentic_search,
skip_gen_ai_answer_generation=skip_gen_ai_answer_generation,
allow_refinement=True,
allow_agent_reranking=allow_agent_reranking,
)
self.graph_config = GraphConfig(
inputs=self.graph_inputs,
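
Two behavior changes land in Answer.__init__ here: agent reranking is only allowed when a GPU is available or a cloud rerank provider is configured, and a forced search-tool query is copied into search_request.query (flagged in the diff itself as a temporary hack). The gating condition, isolated as a plain function with illustrative names rather than Onyx API:

def compute_allow_agent_reranking(
    gpu_available: bool, rerank_provider_type: str | None
) -> bool:
    # Local rerank models need a GPU to be fast enough; hosted ("cloud")
    # rerank providers work either way.
    using_cloud_reranking = rerank_provider_type is not None
    return gpu_available or using_cloud_reranking


assert compute_allow_agent_reranking(False, "cohere") is True
assert compute_allow_agent_reranking(False, None) is False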


@@ -7,7 +7,7 @@ from typing import cast
from sqlalchemy.orm import Session
from onyx.agents.agent_search.orchestration.nodes.tool_call import ToolCallException
from onyx.agents.agent_search.orchestration.nodes.call_tool import ToolCallException
from onyx.chat.answer import Answer
from onyx.chat.chat_utils import create_chat_chain
from onyx.chat.chat_utils import create_temporary_persona


@@ -31,22 +31,9 @@ AGENT_DEFAULT_MIN_ORIG_QUESTION_DOCS = 3
AGENT_DEFAULT_MAX_ANSWER_CONTEXT_DOCS = 10
AGENT_DEFAULT_MAX_STATIC_HISTORY_WORD_LENGTH = 2000
AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_GENERAL_GENERATION = 30 # in seconds
AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_HISTORY_SUMMARY_GENERATION = 10 # in seconds
AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_ENTITY_TERM_EXTRACTION = 25 # in seconds
AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_QUERY_REWRITING_GENERATION = 4 # in seconds
AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_DOCUMENT_VERIFICATION = 1 # in seconds
AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_SUBQUESTION_GENERATION = 3 # in seconds
AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_SUBANSWER_GENERATION = 12 # in seconds
AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_SUBANSWER_CHECK = 8 # in seconds
AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_INITIAL_ANSWER_GENERATION = 25 # in seconds
AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_REFINED_SUBQUESTION_GENERATION = 6 # in seconds
AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_REFINED_ANSWER_GENERATION = 25 # in seconds
AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_REFINED_ANSWER_VALIDATION = 8 # in seconds
AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_COMPARE_ANSWERS = 8 # in seconds
AGENT_ANSWER_GENERATION_BY_FAST_LLM = (
os.environ.get("AGENT_ANSWER_GENERATION_BY_FAST_LLM", "").lower() == "true"
)
AGENT_RETRIEVAL_STATS = (
not os.environ.get("AGENT_RETRIEVAL_STATS") == "False"
@@ -178,80 +165,172 @@ AGENT_MAX_STATIC_HISTORY_WORD_LENGTH = int(
) # 2000
AGENT_TIMEOUT_OVERRIDE_LLM_ENTITY_TERM_EXTRACTION = int(
os.environ.get("AGENT_TIMEOUT_OVERRIDE_LLM_ENTITY_TERM_EXTRACTION")
or AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_ENTITY_TERM_EXTRACTION
) # 25
AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_ENTITY_TERM_EXTRACTION = 10 # in seconds
AGENT_TIMEOUT_CONNECT_LLM_ENTITY_TERM_EXTRACTION = int(
os.environ.get("AGENT_TIMEOUT_CONNECT_LLM_ENTITY_TERM_EXTRACTION")
or AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_ENTITY_TERM_EXTRACTION
)
AGENT_DEFAULT_TIMEOUT_LLM_ENTITY_TERM_EXTRACTION = 30 # in seconds
AGENT_TIMEOUT_LLM_ENTITY_TERM_EXTRACTION = int(
os.environ.get("AGENT_TIMEOUT_LLM_ENTITY_TERM_EXTRACTION")
or AGENT_DEFAULT_TIMEOUT_LLM_ENTITY_TERM_EXTRACTION
)
AGENT_TIMEOUT_OVERRIDE_LLM_DOCUMENT_VERIFICATION = int(
os.environ.get("AGENT_TIMEOUT_OVERRIDE_LLM_DOCUMENT_VERIFICATION")
or AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_DOCUMENT_VERIFICATION
) # 3
AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_DOCUMENT_VERIFICATION = 2 # in seconds
AGENT_TIMEOUT_CONNECT_LLM_DOCUMENT_VERIFICATION = int(
os.environ.get("AGENT_TIMEOUT_CONNECT_LLM_DOCUMENT_VERIFICATION")
or AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_DOCUMENT_VERIFICATION
)
AGENT_TIMEOUT_OVERRIDE_LLM_GENERAL_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_OVERRIDE_LLM_GENERAL_GENERATION")
or AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_GENERAL_GENERATION
) # 30
AGENT_DEFAULT_TIMEOUT_LLM_DOCUMENT_VERIFICATION = 4 # in seconds
AGENT_TIMEOUT_LLM_DOCUMENT_VERIFICATION = int(
os.environ.get("AGENT_TIMEOUT_LLM_DOCUMENT_VERIFICATION")
or AGENT_DEFAULT_TIMEOUT_LLM_DOCUMENT_VERIFICATION
)
AGENT_TIMEOUT_OVERRIDE_LLM_SUBQUESTION_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_OVERRIDE_LLM_SUBQUESTION_GENERATION")
or AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_SUBQUESTION_GENERATION
) # 8
AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_GENERAL_GENERATION = 5 # in seconds
AGENT_TIMEOUT_CONNECT_LLM_GENERAL_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_CONNECT_LLM_GENERAL_GENERATION")
or AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_GENERAL_GENERATION
)
AGENT_DEFAULT_TIMEOUT_LLM_GENERAL_GENERATION = 30 # in seconds
AGENT_TIMEOUT_LLM_GENERAL_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_LLM_GENERAL_GENERATION")
or AGENT_DEFAULT_TIMEOUT_LLM_GENERAL_GENERATION
)
AGENT_TIMEOUT_OVERRIDE_LLM_SUBANSWER_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_OVERRIDE_LLM_SUBANSWER_GENERATION")
or AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_SUBANSWER_GENERATION
) # 12
AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_SUBQUESTION_GENERATION = 2 # in seconds
AGENT_TIMEOUT_CONNECT_LLM_SUBQUESTION_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_CONNECT_LLM_SUBQUESTION_GENERATION")
or AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_SUBQUESTION_GENERATION
)
AGENT_DEFAULT_TIMEOUT_LLM_SUBQUESTION_GENERATION = 5 # in seconds
AGENT_TIMEOUT_LLM_SUBQUESTION_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_LLM_SUBQUESTION_GENERATION")
or AGENT_DEFAULT_TIMEOUT_LLM_SUBQUESTION_GENERATION
)
AGENT_TIMEOUT_OVERRIDE_LLM_INITIAL_ANSWER_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_OVERRIDE_LLM_INITIAL_ANSWER_GENERATION")
or AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_INITIAL_ANSWER_GENERATION
) # 25
AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_SUBANSWER_GENERATION = 3 # in seconds
AGENT_TIMEOUT_CONNECT_LLM_SUBANSWER_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_CONNECT_LLM_SUBANSWER_GENERATION")
or AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_SUBANSWER_GENERATION
)
AGENT_DEFAULT_TIMEOUT_LLM_SUBANSWER_GENERATION = 30 # in seconds
AGENT_TIMEOUT_LLM_SUBANSWER_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_LLM_SUBANSWER_GENERATION")
or AGENT_DEFAULT_TIMEOUT_LLM_SUBANSWER_GENERATION
)
AGENT_TIMEOUT_OVERRIDE_LLM_REFINED_ANSWER_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_OVERRIDE_LLM_REFINED_ANSWER_GENERATION")
or AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_REFINED_ANSWER_GENERATION
) # 25
AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_INITIAL_ANSWER_GENERATION = 5 # in seconds
AGENT_TIMEOUT_CONNECT_LLM_INITIAL_ANSWER_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_CONNECT_LLM_INITIAL_ANSWER_GENERATION")
or AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_INITIAL_ANSWER_GENERATION
)
AGENT_DEFAULT_TIMEOUT_LLM_INITIAL_ANSWER_GENERATION = 25 # in seconds
AGENT_TIMEOUT_LLM_INITIAL_ANSWER_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_LLM_INITIAL_ANSWER_GENERATION")
or AGENT_DEFAULT_TIMEOUT_LLM_INITIAL_ANSWER_GENERATION
)
AGENT_TIMEOUT_OVERRIDE_LLM_SUBANSWER_CHECK = int(
os.environ.get("AGENT_TIMEOUT_OVERRIDE_LLM_SUBANSWER_CHECK")
or AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_SUBANSWER_CHECK
) # 8
AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_REFINED_ANSWER_GENERATION = 5 # in seconds
AGENT_TIMEOUT_CONNECT_LLM_REFINED_ANSWER_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_CONNECT_LLM_REFINED_ANSWER_GENERATION")
or AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_REFINED_ANSWER_GENERATION
)
AGENT_DEFAULT_TIMEOUT_LLM_REFINED_ANSWER_GENERATION = 30 # in seconds
AGENT_TIMEOUT_LLM_REFINED_ANSWER_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_LLM_REFINED_ANSWER_GENERATION")
or AGENT_DEFAULT_TIMEOUT_LLM_REFINED_ANSWER_GENERATION
)
AGENT_TIMEOUT_OVERRIDE_LLM_REFINED_SUBQUESTION_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_OVERRIDE_LLM_REFINED_SUBQUESTION_GENERATION")
or AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_REFINED_SUBQUESTION_GENERATION
) # 6
AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_SUBANSWER_CHECK = 2 # in seconds
AGENT_TIMEOUT_CONNECT_LLM_SUBANSWER_CHECK = int(
os.environ.get("AGENT_TIMEOUT_CONNECT_LLM_SUBANSWER_CHECK")
or AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_SUBANSWER_CHECK
)
AGENT_DEFAULT_TIMEOUT_LLM_SUBANSWER_CHECK = 8 # in seconds
AGENT_TIMEOUT_LLM_SUBANSWER_CHECK = int(
os.environ.get("AGENT_TIMEOUT_LLM_SUBANSWER_CHECK")
or AGENT_DEFAULT_TIMEOUT_LLM_SUBANSWER_CHECK
)
AGENT_TIMEOUT_OVERRIDE_LLM_QUERY_REWRITING_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_OVERRIDE_LLM_QUERY_REWRITING_GENERATION")
or AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_QUERY_REWRITING_GENERATION
) # 1
AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_REFINED_SUBQUESTION_GENERATION = 3 # in seconds
AGENT_TIMEOUT_CONNECT_LLM_REFINED_SUBQUESTION_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_CONNECT_LLM_REFINED_SUBQUESTION_GENERATION")
or AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_REFINED_SUBQUESTION_GENERATION
)
AGENT_DEFAULT_TIMEOUT_LLM_REFINED_SUBQUESTION_GENERATION = 8 # in seconds
AGENT_TIMEOUT_LLM_REFINED_SUBQUESTION_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_LLM_REFINED_SUBQUESTION_GENERATION")
or AGENT_DEFAULT_TIMEOUT_LLM_REFINED_SUBQUESTION_GENERATION
)
AGENT_TIMEOUT_OVERRIDE_LLM_HISTORY_SUMMARY_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_OVERRIDE_LLM_HISTORY_SUMMARY_GENERATION")
or AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_HISTORY_SUMMARY_GENERATION
) # 4
AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_QUERY_REWRITING_GENERATION = 1 # in seconds
AGENT_TIMEOUT_CONNECT_LLM_QUERY_REWRITING_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_CONNECT_LLM_QUERY_REWRITING_GENERATION")
or AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_QUERY_REWRITING_GENERATION
)
AGENT_DEFAULT_TIMEOUT_LLM_QUERY_REWRITING_GENERATION = 3 # in seconds
AGENT_TIMEOUT_LLM_QUERY_REWRITING_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_LLM_QUERY_REWRITING_GENERATION")
or AGENT_DEFAULT_TIMEOUT_LLM_QUERY_REWRITING_GENERATION
)
AGENT_TIMEOUT_OVERRIDE_LLM_COMPARE_ANSWERS = int(
os.environ.get("AGENT_TIMEOUT_OVERRIDE_LLM_COMPARE_ANSWERS")
or AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_COMPARE_ANSWERS
) # 8
AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_HISTORY_SUMMARY_GENERATION = 2 # in seconds
AGENT_TIMEOUT_CONNECT_LLM_HISTORY_SUMMARY_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_CONNECT_LLM_HISTORY_SUMMARY_GENERATION")
or AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_HISTORY_SUMMARY_GENERATION
)
AGENT_DEFAULT_TIMEOUT_LLM_HISTORY_SUMMARY_GENERATION = 5 # in seconds
AGENT_TIMEOUT_LLM_HISTORY_SUMMARY_GENERATION = int(
os.environ.get("AGENT_TIMEOUT_LLM_HISTORY_SUMMARY_GENERATION")
or AGENT_DEFAULT_TIMEOUT_LLM_HISTORY_SUMMARY_GENERATION
)
AGENT_TIMEOUT_OVERRIDE_LLM_REFINED_ANSWER_VALIDATION = int(
os.environ.get("AGENT_TIMEOUT_OVERRIDE_LLM_REFINED_ANSWER_VALIDATION")
or AGENT_DEFAULT_TIMEOUT_OVERRIDE_LLM_REFINED_ANSWER_VALIDATION
) # 8
AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_COMPARE_ANSWERS = 2 # in seconds
AGENT_TIMEOUT_CONNECT_LLM_COMPARE_ANSWERS = int(
os.environ.get("AGENT_TIMEOUT_CONNECT_LLM_COMPARE_ANSWERS")
or AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_COMPARE_ANSWERS
)
AGENT_DEFAULT_TIMEOUT_LLM_COMPARE_ANSWERS = 8 # in seconds
AGENT_TIMEOUT_LLM_COMPARE_ANSWERS = int(
os.environ.get("AGENT_TIMEOUT_LLM_COMPARE_ANSWERS")
or AGENT_DEFAULT_TIMEOUT_LLM_COMPARE_ANSWERS
)
AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_REFINED_ANSWER_VALIDATION = 2 # in seconds
AGENT_TIMEOUT_CONNECT_LLM_REFINED_ANSWER_VALIDATION = int(
os.environ.get("AGENT_TIMEOUT_CONNECT_LLM_REFINED_ANSWER_VALIDATION")
or AGENT_DEFAULT_TIMEOUT_CONNECT_LLM_REFINED_ANSWER_VALIDATION
)
AGENT_DEFAULT_TIMEOUT_LLM_REFINED_ANSWER_VALIDATION = 8 # in seconds
AGENT_TIMEOUT_LLM_REFINED_ANSWER_VALIDATION = int(
os.environ.get("AGENT_TIMEOUT_LLM_REFINED_ANSWER_VALIDATION")
or AGENT_DEFAULT_TIMEOUT_LLM_REFINED_ANSWER_VALIDATION
)
GRAPH_VERSION_NAME: str = "a"
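
The bulk rename from AGENT_TIMEOUT_OVERRIDE_LLM_* to AGENT_TIMEOUT_CONNECT_LLM_*, plus the new AGENT_TIMEOUT_LLM_* values, gives every agent LLM call two knobs: a connect timeout handed to the LLM client and a total timeout enforced by run_with_timeout. All of them follow the same pattern, where chaining with "or" rather than passing a default to os.environ.get means an empty environment variable also falls back to the built-in default. A hypothetical override (values chosen for illustration; the variables must be set before the module is first imported):

import os

os.environ["AGENT_TIMEOUT_LLM_SUBANSWER_CHECK"] = "5"
os.environ["AGENT_TIMEOUT_CONNECT_LLM_SUBANSWER_CHECK"] = "1"

from onyx.configs.agent_configs import (
    AGENT_TIMEOUT_CONNECT_LLM_SUBANSWER_CHECK,
    AGENT_TIMEOUT_LLM_SUBANSWER_CHECK,
)

assert AGENT_TIMEOUT_LLM_SUBANSWER_CHECK == 5
assert AGENT_TIMEOUT_CONNECT_LLM_SUBANSWER_CHECK == 1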


@@ -628,7 +628,7 @@ def create_new_chat_message(
commit: bool = True,
reserved_message_id: int | None = None,
overridden_model: str | None = None,
refined_answer_improvement: bool = True,
refined_answer_improvement: bool | None = None,
) -> ChatMessage:
if reserved_message_id is not None:
# Edit existing message


@@ -409,10 +409,6 @@ class DefaultMultiLLM(LLM):
self._record_call(processed_prompt)
try:
print(
"model is",
f"{self.config.model_provider}/{self.config.deployment_name or self.config.model_name}",
)
return litellm.completion(
mock_response=MOCK_LLM_RESPONSE,
# model choice


@@ -61,10 +61,10 @@ def _create_indexable_chunks(
doc_updated_at=None,
primary_owners=[],
secondary_owners=[],
chunk_count=1,
chunk_count=preprocessed_doc["chunk_ind"] + 1,
)
if preprocessed_doc["chunk_ind"] == 0:
ids_to_documents[document.id] = document
ids_to_documents[document.id] = document
chunk = DocMetadataAwareIndexChunk(
chunk_id=preprocessed_doc["chunk_ind"],
@@ -92,6 +92,7 @@ def _create_indexable_chunks(
boost=DEFAULT_BOOST,
large_chunk_id=None,
)
chunks.append(chunk)
return list(ids_to_documents.values()), chunks
@@ -192,6 +193,7 @@ def seed_initial_documents(
last_successful_index_time=last_index_time,
seeding_flow=True,
)
cc_pair_id = cast(int, result.data)
processed_docs = fetch_versioned_implementation(
"onyx.seeding.load_docs",
@@ -249,4 +251,5 @@ def seed_initial_documents(
.values(chunk_count=doc.chunk_count)
)
db_session.commit()
kv_store.store(KV_DOCUMENTS_SEEDED_KEY, True)


@@ -213,8 +213,6 @@ def get_chat_session(
# we need the tool call objs anyways, so just fetch them in a single call
prefetch_tool_calls=True,
)
for message in session_messages:
translate_db_message_to_chat_message_detail(message)
return ChatSessionDetailResponse(
chat_session_id=session_id,


@@ -58,6 +58,7 @@ SEARCH_RESPONSE_SUMMARY_ID = "search_response_summary"
SEARCH_DOC_CONTENT_ID = "search_doc_content"
SECTION_RELEVANCE_LIST_ID = "section_relevance_list"
SEARCH_EVALUATION_ID = "llm_doc_eval"
QUERY_FIELD = "query"
class SearchResponseSummary(SearchQueryInfo):
@@ -179,12 +180,12 @@ class SearchTool(Tool[SearchToolOverrideKwargs]):
"parameters": {
"type": "object",
"properties": {
"query": {
QUERY_FIELD: {
"type": "string",
"description": "What to search for",
},
},
"required": ["query"],
"required": [QUERY_FIELD],
},
},
}
@@ -223,7 +224,7 @@ class SearchTool(Tool[SearchToolOverrideKwargs]):
rephrased_query = history_based_query_rephrase(
query=query, history=history, llm=llm
)
return {"query": rephrased_query}
return {QUERY_FIELD: rephrased_query}
"""Actual tool execution"""
@@ -279,7 +280,7 @@ class SearchTool(Tool[SearchToolOverrideKwargs]):
def run(
self, override_kwargs: SearchToolOverrideKwargs | None = None, **llm_kwargs: Any
) -> Generator[ToolResponse, None, None]:
query = cast(str, llm_kwargs["query"])
query = cast(str, llm_kwargs[QUERY_FIELD])
force_no_rerank = False
alternate_db_session = None
retrieved_sections_callback = None
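
Replacing the string literal "query" with a QUERY_FIELD constant keeps the tool's JSON-schema definition, its argument construction, and the new force-use override in answer.py all pointing at the same name. The idea in miniature (the schema fragment is illustrative):

QUERY_FIELD = "query"  # single source of truth for the argument name

tool_schema = {
    "parameters": {
        "type": "object",
        "properties": {QUERY_FIELD: {"type": "string"}},
        "required": [QUERY_FIELD],
    }
}
args = {QUERY_FIELD: "where is the pull request for feature Y?"}
query = args[QUERY_FIELD]  # renaming the field now touches one line only
print(query)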


@@ -1,3 +1,4 @@
import threading
import uuid
from collections.abc import Callable
from concurrent.futures import as_completed
@@ -13,6 +14,10 @@ logger = setup_logger()
R = TypeVar("R")
# WARNING: it is not currently well understood whether we lose access to contextvars when functions are
# executed through this wrapper. Do NOT try to acquire a db session in a function run through this unless
# you have heavily tested that multi-tenancy is respected. If/when we know for sure that it is or
# is not safe, update this comment.
def run_functions_tuples_in_parallel(
functions_with_args: list[tuple[Callable, tuple]],
allow_failures: bool = False,
@@ -78,6 +83,10 @@ class FunctionCall(Generic[R]):
return self.func(*self.args, **self.kwargs)
# WARNING: it is not currently well understood whether we lose access to contextvars when functions are
# executed through this wrapper. Do NOT try to acquire a db session in a function run through this unless
# you have heavily tested that multi-tenancy is respected. If/when we know for sure that it is or
# is not safe, update this comment.
def run_functions_in_parallel(
function_calls: list[FunctionCall],
allow_failures: bool = False,
@@ -109,3 +118,49 @@ def run_functions_in_parallel(
raise
return results
class TimeoutThread(threading.Thread):
def __init__(
self, timeout: float, func: Callable[..., R], *args: Any, **kwargs: Any
):
super().__init__()
self.timeout = timeout
self.func = func
self.args = args
self.kwargs = kwargs
self.exception: Exception | None = None
def run(self) -> None:
try:
self.result = self.func(*self.args, **self.kwargs)
except Exception as e:
self.exception = e
def end(self) -> None:
raise TimeoutError(
f"Function {self.func.__name__} timed out after {self.timeout} seconds"
)
# WARNING: it is not currently well understood whether we lose access to contextvars when functions are
# executed through this wrapper. Do NOT try to acquire a db session in a function run through this unless
# you have heavily tested that multi-tenancy is respected. If/when we know for sure that it is or
# is not safe, update this comment.
def run_with_timeout(
timeout: float, func: Callable[..., R], *args: Any, **kwargs: Any
) -> R:
"""
Executes a function with a timeout. If the function doesn't complete within the specified
timeout, raises TimeoutError.
"""
task = TimeoutThread(timeout, func, *args, **kwargs)
task.start()
task.join(timeout)
if task.exception is not None:
raise task.exception
if task.is_alive():
task.end()
return task.result
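A usage sketch for the new helper (the import matches the test file below; the worker function is illustrative). One design caveat worth noting: the worker thread is not killed on timeout — `join(timeout)` merely stops waiting, `TimeoutError` is raised in the caller, and the thread runs to completion in the background (and, since it is not a daemon thread, can delay interpreter shutdown).

```python
import time

from onyx.utils.threadpool_concurrency import run_with_timeout

def double(x: int) -> int:  # illustrative worker
    time.sleep(0.05)
    return x * 2

print(run_with_timeout(1.0, double, x=21))  # 42

try:
    run_with_timeout(0.1, time.sleep, 5)  # join() gives up after 0.1s
except TimeoutError as e:
    print(e)  # Function sleep timed out after 0.1 seconds
```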

View File

@@ -11,6 +11,7 @@ from langchain_core.messages import HumanMessage
from langchain_core.messages import SystemMessage
from langchain_core.messages import ToolCall
from langchain_core.messages import ToolCallChunk
from pytest_mock import MockerFixture
from sqlalchemy.orm import Session
from onyx.chat.answer import Answer
@@ -25,6 +26,7 @@ from onyx.chat.models import StreamStopReason
from onyx.chat.prompt_builder.answer_prompt_builder import AnswerPromptBuilder
from onyx.chat.prompt_builder.answer_prompt_builder import default_build_system_message
from onyx.chat.prompt_builder.answer_prompt_builder import default_build_user_message
from onyx.context.search.models import RerankingDetails
from onyx.context.search.models import SearchRequest
from onyx.llm.interfaces import LLM
from onyx.tools.force import ForceUseTool
@@ -35,6 +37,7 @@ from onyx.tools.tool_implementations.search.search_tool import SEARCH_DOC_CONTEN
from onyx.tools.tool_implementations.search_like_tool_utils import (
FINAL_CONTEXT_DOCUMENTS_ID,
)
from shared_configs.enums import RerankerProvider
from tests.unit.onyx.chat.conftest import DEFAULT_SEARCH_ARGS
from tests.unit.onyx.chat.conftest import QUERY
@@ -44,6 +47,20 @@ def answer_instance(
mock_llm: LLM,
answer_style_config: AnswerStyleConfig,
prompt_config: PromptConfig,
mocker: MockerFixture,
) -> Answer:
mocker.patch(
"onyx.chat.answer.gpu_status_request",
return_value=True,
)
return _answer_fixture_impl(mock_llm, answer_style_config, prompt_config)
def _answer_fixture_impl(
mock_llm: LLM,
answer_style_config: AnswerStyleConfig,
prompt_config: PromptConfig,
rerank_settings: RerankingDetails | None = None,
) -> Answer:
return Answer(
prompt_builder=AnswerPromptBuilder(
@@ -64,13 +81,13 @@ def answer_instance(
llm=mock_llm,
fast_llm=mock_llm,
force_use_tool=ForceUseTool(force_use=False, tool_name="", args=None),
search_request=SearchRequest(query=QUERY),
search_request=SearchRequest(query=QUERY, rerank_settings=rerank_settings),
chat_session_id=UUID("123e4567-e89b-12d3-a456-426614174000"),
current_agent_message_id=0,
)
def test_basic_answer(answer_instance: Answer) -> None:
def test_basic_answer(answer_instance: Answer, mocker: MockerFixture) -> None:
mock_llm = cast(Mock, answer_instance.graph_config.tooling.primary_llm)
mock_llm.stream.return_value = [
AIMessageChunk(content="This is a "),
@@ -363,3 +380,49 @@ def test_is_cancelled(answer_instance: Answer) -> None:
# Verify LLM calls
mock_llm.stream.assert_called_once()
@pytest.mark.parametrize(
"gpu_enabled,is_local_model",
[
(True, False),
(False, True),
(True, True),
(False, False),
],
)
def test_no_slow_reranking(
gpu_enabled: bool,
is_local_model: bool,
mock_llm: LLM,
answer_style_config: AnswerStyleConfig,
prompt_config: PromptConfig,
mocker: MockerFixture,
) -> None:
mocker.patch(
"onyx.chat.answer.gpu_status_request",
return_value=gpu_enabled,
)
rerank_settings = (
None
if is_local_model
else RerankingDetails(
rerank_model_name="test_model",
rerank_api_url="test_url",
rerank_api_key="test_key",
num_rerank=10,
rerank_provider_type=RerankerProvider.COHERE,
)
)
answer_instance = _answer_fixture_impl(
mock_llm, answer_style_config, prompt_config, rerank_settings=rerank_settings
)
assert (
answer_instance.graph_config.inputs.search_request.rerank_settings
== rerank_settings
)
assert (
answer_instance.graph_config.behavior.allow_agent_reranking == gpu_enabled
or not is_local_model
)
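Read together with the assertions above, the expected policy appears to be (inferred here, not stated explicitly in the diff): agent reranking is disabled only when a local rerank model would have to run without a GPU.

```python
# Hedged reconstruction of the policy the parametrized test encodes.
def allow_agent_reranking(gpu_enabled: bool, is_local_model: bool) -> bool:
    return gpu_enabled or not is_local_model

assert allow_agent_reranking(True, True)
assert allow_agent_reranking(True, False)
assert allow_agent_reranking(False, False)
assert not allow_agent_reranking(False, True)  # local model, no GPU
```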

View File

@@ -36,7 +36,12 @@ def test_skip_gen_ai_answer_generation_flag(
mock_search_tool: SearchTool,
answer_style_config: AnswerStyleConfig,
prompt_config: PromptConfig,
mocker: MockerFixture,
) -> None:
mocker.patch(
"onyx.chat.answer.gpu_status_request",
return_value=True,
)
question = config["question"]
skip_gen_ai_answer_generation = config["skip_gen_ai_answer_generation"]

View File

@@ -0,0 +1,61 @@
import time
import pytest
from onyx.utils.threadpool_concurrency import run_with_timeout
def test_run_with_timeout_completes() -> None:
"""Test that a function that completes within timeout works correctly"""
def quick_function(x: int) -> int:
return x * 2
result = run_with_timeout(1.0, quick_function, x=21)
assert result == 42
@pytest.mark.parametrize("slow,timeout", [(1, 0.1), (0.3, 0.2)])
def test_run_with_timeout_raises_on_timeout(slow: float, timeout: float) -> None:
"""Test that a function that exceeds timeout raises TimeoutError"""
def slow_function() -> None:
time.sleep(slow)  # sleep for the parametrized duration
with pytest.raises(TimeoutError) as exc_info:
start = time.time()
run_with_timeout(timeout, slow_function)  # timeout is shorter than the sleep
end = time.time()
assert end - start >= timeout
assert end - start < (slow + timeout) / 2
assert f"timed out after {timeout} seconds" in str(exc_info.value)
@pytest.mark.filterwarnings("ignore::pytest.PytestUnhandledThreadExceptionWarning")
def test_run_with_timeout_propagates_exceptions() -> None:
"""Test that other exceptions from the function are propagated properly"""
def error_function() -> None:
raise ValueError("Test error")
with pytest.raises(ValueError) as exc_info:
run_with_timeout(1.0, error_function)
assert "Test error" in str(exc_info.value)
def test_run_with_timeout_with_args_and_kwargs() -> None:
"""Test that args and kwargs are properly passed to the function"""
def complex_function(x: int, y: int, multiply: bool = False) -> int:
if multiply:
return x * y
return x + y
# Test with just positional args
result1 = run_with_timeout(1.0, complex_function, x=5, y=3)
assert result1 == 8
# Test with positional and keyword args
result2 = run_with_timeout(1.0, complex_function, x=5, y=3, multiply=True)
assert result2 == 15
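One note on the timing bound in the timeout test above: `(slow + timeout) / 2` sits strictly between the timeout and the sleep duration for both parameter pairs, so the assertion can only pass if `run_with_timeout` returns at the timeout instead of waiting out the full sleep.

```python
# Worked check of the early-exit bound for the parametrized cases.
for slow, timeout in [(1, 0.1), (0.3, 0.2)]:
    bound = (slow + timeout) / 2
    assert timeout < bound < slow  # 0.55 for the first pair, 0.25 for the second
```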

View File

@@ -1,6 +1,11 @@
"use client";
import { redirect, useRouter, useSearchParams } from "next/navigation";
import {
redirect,
usePathname,
useRouter,
useSearchParams,
} from "next/navigation";
import {
BackendChatSession,
BackendMessage,
@@ -130,6 +135,7 @@ import {
} from "@/lib/browserUtilities";
import { Button } from "@/components/ui/button";
import { ConfirmEntityModal } from "@/components/modals/ConfirmEntityModal";
const TEMP_USER_MESSAGE_ID = -1;
const TEMP_ASSISTANT_MESSAGE_ID = -2;
@@ -1145,6 +1151,7 @@ export function ChatPage({
regenerationRequest?: RegenerationRequest | null;
overrideFileDescriptors?: FileDescriptor[];
} = {}) => {
navigatingAway.current = false;
let frozenSessionId = currentSessionId();
updateCanContinue(false, frozenSessionId);
@@ -1267,7 +1274,6 @@ export function ChatPage({
let stackTrace: string | null = null;
let sub_questions: SubQuestionDetail[] = [];
let second_level_sub_questions: SubQuestionDetail[] = [];
let is_generating: boolean = false;
let second_level_generating: boolean = false;
let finalMessage: BackendMessage | null = null;
@@ -1291,7 +1297,7 @@ export function ChatPage({
const stack = new CurrentMessageFIFO();
updateCurrentMessageFIFO(stack, {
signal: controller.signal, // Add this line
signal: controller.signal,
message: currMessage,
alternateAssistantId: currentAssistantId,
fileDescriptors: overrideFileDescriptors || currentMessageFiles,
@@ -1712,7 +1718,10 @@ export function ChatPage({
const newUrl = buildChatUrl(searchParams, currChatSessionId, null);
// newUrl is like /chat?chatId=10
// current page is like /chat
router.push(newUrl, { scroll: false });
if (pathname == "/chat" && !navigatingAway.current) {
router.push(newUrl, { scroll: false });
}
}
}
if (
@@ -2086,6 +2095,31 @@ export function ChatPage({
llmOverrideManager.updateImageFilesPresent(imageFileInMessageHistory);
}, [imageFileInMessageHistory]);
const pathname = usePathname();
useEffect(() => {
return () => {
// Cleanup which only runs when the component unmounts (i.e. when you navigate away).
const currentSession = currentSessionId();
const controller = abortControllersRef.current.get(currentSession);
if (controller) {
controller.abort();
navigatingAway.current = true;
setAbortControllers((prev) => {
const newControllers = new Map(prev);
newControllers.delete(currentSession);
return newControllers;
});
}
};
}, [pathname]);
const navigatingAway = useRef(false);
// Keep a ref to abortControllers to ensure we always have the latest value
const abortControllersRef = useRef(abortControllers);
useEffect(() => {
abortControllersRef.current = abortControllers;
}, [abortControllers]);
useSidebarShortcut(router, toggleSidebar);
const [sharedChatSession, setSharedChatSession] =
@@ -2300,7 +2334,7 @@ export function ChatPage({
fixed
left-0
z-40
bg-background-100
bg-neutral-200
h-screen
transition-all
bg-opacity-80
@@ -2557,12 +2591,21 @@ export function ChatPage({
) {
return <></>;
}
const nextMessage =
messageHistory.length > i + 1
? messageHistory[i + 1]
: null;
return (
<div
id={`message-${message.messageId}`}
key={messageReactComponentKey}
>
<HumanMessage
disableSwitchingForStreaming={
(nextMessage &&
nextMessage.is_generating) ||
false
}
stopGenerating={stopGenerating}
content={message.message}
files={message.files}

View File

@@ -94,7 +94,7 @@ export function AgenticToggle({
Agent Search (BETA)
</h3>
</div>
<p className="text-xs text-neutarl-600 dark:text-neutral-700 mb-2">
<p className="text-xs text-neutral-600 dark:text-neutral-700 mb-2">
Use AI agents to break down questions and run deep iterative
research through promising pathways. Gives more thorough and
accurate responses but takes slightly longer.

View File

@@ -113,7 +113,7 @@ export default function LLMPopover({
<Popover open={isOpen} onOpenChange={setIsOpen}>
<PopoverTrigger asChild>
<button
className="focus:outline-none"
className="dark:text-[#fff] text-[#000] focus:outline-none"
data-testid="llm-popover-trigger"
>
<ChatInputOption

View File

@@ -250,7 +250,7 @@ export async function* sendMessage({
throw new Error(`HTTP error! status: ${response.status}`);
}
yield* handleSSEStream<PacketType>(response);
yield* handleSSEStream<PacketType>(response, signal);
}
export async function nameChatSession(chatSessionId: string) {

View File

@@ -9,6 +9,12 @@ import React, {
useMemo,
useState,
} from "react";
import {
Tooltip,
TooltipContent,
TooltipProvider,
TooltipTrigger,
} from "@/components/ui/tooltip";
import ReactMarkdown from "react-markdown";
import { OnyxDocument, FilteredOnyxDocument } from "@/lib/search/interfaces";
import remarkGfm from "remark-gfm";
@@ -308,7 +314,7 @@ export const AgenticMessage = ({
const renderedAlternativeMarkdown = useMemo(() => {
return (
<ReactMarkdown
className="prose max-w-full text-base"
className="prose dark:prose-invert max-w-full text-base"
components={{
...markdownComponents,
code: ({ node, className, children }: any) => {
@@ -335,7 +341,7 @@ export const AgenticMessage = ({
const renderedMarkdown = useMemo(() => {
return (
<ReactMarkdown
className="prose max-w-full text-base"
className="prose dark:prose-invert max-w-full text-base"
components={markdownComponents}
remarkPlugins={[remarkGfm, remarkMath]}
rehypePlugins={[[rehypePrism, { ignoreMissing: true }], rehypeKatex]}
@@ -530,6 +536,7 @@ export const AgenticMessage = ({
{includeMessageSwitcher && (
<div className="-mx-1 mr-auto">
<MessageSwitcher
disableForStreaming={!isComplete}
currentPage={currentMessageInd + 1}
totalPages={otherMessagesCanSwitchTo.length}
handlePrevious={() => {
@@ -616,6 +623,7 @@ export const AgenticMessage = ({
{includeMessageSwitcher && (
<div className="-mx-1 mr-auto">
<MessageSwitcher
disableForStreaming={!isComplete}
currentPage={currentMessageInd + 1}
totalPages={otherMessagesCanSwitchTo.length}
handlePrevious={() => {
@@ -694,27 +702,52 @@ function MessageSwitcher({
totalPages,
handlePrevious,
handleNext,
disableForStreaming,
}: {
currentPage: number;
totalPages: number;
handlePrevious: () => void;
handleNext: () => void;
disableForStreaming?: boolean;
}) {
return (
<div className="flex items-center text-sm space-x-0.5">
<Hoverable
icon={FiChevronLeft}
onClick={currentPage === 1 ? undefined : handlePrevious}
/>
<TooltipProvider>
<Tooltip>
<TooltipTrigger asChild>
<div>
<Hoverable
icon={FiChevronLeft}
onClick={currentPage === 1 ? undefined : handlePrevious}
/>
</div>
</TooltipTrigger>
<TooltipContent>
{disableForStreaming ? "Disabled" : "Previous"}
</TooltipContent>
</Tooltip>
</TooltipProvider>
<span className="text-text-darker select-none">
{currentPage} / {totalPages}
{disableForStreaming ? "Complete" : "Generating"}
</span>
<Hoverable
icon={FiChevronRight}
onClick={currentPage === totalPages ? undefined : handleNext}
/>
<TooltipProvider>
<Tooltip>
<TooltipTrigger asChild>
<div>
<Hoverable
icon={FiChevronRight}
onClick={currentPage === totalPages ? undefined : handleNext}
/>
</div>
</TooltipTrigger>
<TooltipContent>
{disableForStreaming ? "Disabled" : "Next"}
</TooltipContent>
</Tooltip>
</TooltipProvider>
</div>
);
}

View File

@@ -383,7 +383,7 @@ export const AIMessage = ({
dangerouslySetInnerHTML={{ __html: htmlContent }}
/>
<ReactMarkdown
className="prose max-w-full text-base"
className="prose dark:prose-invert max-w-full text-base"
components={markdownComponents}
remarkPlugins={[remarkGfm, remarkMath]}
rehypePlugins={[[rehypePrism, { ignoreMissing: true }], rehypeKatex]}
@@ -495,7 +495,10 @@ export const AIMessage = ({
{docs && docs.length > 0 && (
<div
className={`mobile:hidden ${
query && "mt-2"
(query ||
toolCall?.tool_name ===
INTERNET_SEARCH_TOOL_NAME) &&
"mt-2"
} -mx-8 w-full mb-4 flex relative`}
>
<div className="w-full">
@@ -795,27 +798,67 @@ function MessageSwitcher({
totalPages,
handlePrevious,
handleNext,
disableForStreaming,
}: {
currentPage: number;
totalPages: number;
handlePrevious: () => void;
handleNext: () => void;
disableForStreaming?: boolean;
}) {
return (
<div className="flex items-center text-sm space-x-0.5">
<Hoverable
icon={FiChevronLeft}
onClick={currentPage === 1 ? undefined : handlePrevious}
/>
<TooltipProvider>
<Tooltip>
<TooltipTrigger asChild>
<div>
<Hoverable
icon={FiChevronLeft}
onClick={
disableForStreaming
? () => null
: currentPage === 1
? undefined
: handlePrevious
}
/>
</div>
</TooltipTrigger>
<TooltipContent>
{disableForStreaming
? "Wait for agent message to complete"
: "Previous"}
</TooltipContent>
</Tooltip>
</TooltipProvider>
<span className="text-text-darker select-none">
{currentPage} / {totalPages}
</span>
<Hoverable
icon={FiChevronRight}
onClick={currentPage === totalPages ? undefined : handleNext}
/>
<TooltipProvider>
<Tooltip>
<TooltipTrigger>
<div>
<Hoverable
icon={FiChevronRight}
onClick={
disableForStreaming
? () => null
: currentPage === totalPages
? undefined
: handleNext
}
/>
</div>
</TooltipTrigger>
<TooltipContent>
{disableForStreaming
? "Wait for agent message to complete"
: "Next"}
</TooltipContent>
</Tooltip>
</TooltipProvider>
</div>
);
}
@@ -829,6 +872,7 @@ export const HumanMessage = ({
onMessageSelection,
shared,
stopGenerating = () => null,
disableSwitchingForStreaming = false,
}: {
shared?: boolean;
content: string;
@@ -838,6 +882,7 @@ export const HumanMessage = ({
onEdit?: (editedContent: string) => void;
onMessageSelection?: (messageId: number) => void;
stopGenerating?: () => void;
disableSwitchingForStreaming?: boolean;
}) => {
const textareaRef = useRef<HTMLTextAreaElement>(null);
@@ -1067,6 +1112,7 @@ export const HumanMessage = ({
otherMessagesCanSwitchTo.length > 1 && (
<div className="ml-auto mr-3">
<MessageSwitcher
disableForStreaming={disableSwitchingForStreaming}
currentPage={currentMessageInd + 1}
totalPages={otherMessagesCanSwitchTo.length}
handlePrevious={() => {

View File

@@ -294,7 +294,7 @@ const SubQuestionDisplay: React.FC<{
const renderedMarkdown = useMemo(() => {
return (
<ReactMarkdown
className="prose max-w-full text-base"
className="prose dark:prose-invert max-w-full text-base"
components={markdownComponents}
remarkPlugins={[remarkGfm, remarkMath]}
rehypePlugins={[rehypeKatex]}
@@ -340,7 +340,7 @@ const SubQuestionDisplay: React.FC<{
{subQuestion?.question || temporaryDisplay?.question}
</div>
<ChevronDown
className={`mt-0.5 text-text-darker transition-transform duration-500 ease-in-out ${
className={`mt-0.5 flex-none text-text-darker transition-transform duration-500 ease-in-out ${
toggled ? "" : "-rotate-90"
}`}
size={20}
@@ -632,9 +632,7 @@ const SubQuestionsDisplay: React.FC<SubQuestionsDisplayProps> = ({
}
`}</style>
<div className="relative">
{/* {subQuestions.map((subQuestion, index) => ( */}
{memoizedSubQuestions.map((subQuestion, index) => (
// {dynamicSubQuestions.map((subQuestion, index) => (
<SubQuestionDisplay
currentlyOpen={
currentlyOpenQuestion?.level === subQuestion.level &&

View File

@@ -131,7 +131,7 @@ const StandardAnswersTableRow = ({
/>,
<ReactMarkdown
key={`answer-${standardAnswer.id}`}
className="prose"
className="prose dark:prose-invert"
remarkPlugins={[remarkGfm]}
>
{standardAnswer.answer}

View File

@@ -562,6 +562,7 @@ body {
.prose :where(pre):not(:where([class~="not-prose"], [class~="not-prose"] *)) {
background-color: theme("colors.code-bg");
font-size: theme("fontSize.code-sm");
color: #fff;
}
pre[class*="language-"],
@@ -655,16 +656,3 @@ ul > li > p {
display: inline;
/* Make paragraphs inline to reduce vertical space */
}
.dark strong {
color: white;
}
.prose.dark li,
.prose.dark h1,
.prose.dark h2,
.prose.dark h3,
.prose.dark h4,
.prose.dark h5 {
color: #e5e5e5;
}

View File

@@ -17,7 +17,7 @@ export const Hoverable: React.FC<{
<div className="flex items-center">
<Icon
size={size}
className="hover:bg-background-chat-hover dark:text-[#B4B4B4] text-neutral-600 rounded h-fit cursor-pointer"
className="dark:text-[#B4B4B4] text-neutral-600 rounded h-fit cursor-pointer"
/>
{hoverText && (
<div className="max-w-0 leading-none whitespace-nowrap overflow-hidden transition-all duration-300 ease-in-out group-hover:max-w-xs group-hover:ml-2">

View File

@@ -50,7 +50,7 @@ export function SearchResultIcon({ url }: { url: string }) {
return <SourceIcon sourceType={ValidSources.Web} iconSize={18} />;
}
if (url.includes("docs.onyx.app")) {
return <OnyxIcon size={18} />;
return <OnyxIcon size={18} className="dark:text-[#fff] text-[#000]" />;
}
return (

View File

@@ -23,7 +23,7 @@ export function WebResultIcon({
return (
<>
{hostname == "docs.onyx.app" ? (
<OnyxIcon size={size} />
<OnyxIcon size={size} className="dark:text-[#fff] text-[#000]" />
) : !error ? (
<img
className="my-0 rounded-full py-0"

View File

@@ -432,7 +432,10 @@ export const MarkdownFormField = ({
</div>
{isPreviewOpen ? (
<div className="p-4 border-t border-background-300">
<ReactMarkdown className="prose" remarkPlugins={[remarkGfm]}>
<ReactMarkdown
className="prose dark:prose-invert"
remarkPlugins={[remarkGfm]}
>
{field.value}
</ReactMarkdown>
</div>

View File

@@ -9,7 +9,7 @@ export default function BlurBackground({
<div
onClick={onClick}
className={`
desktop:hidden w-full h-full fixed inset-0 bg-black bg-opacity-50 backdrop-blur-sm z-30 transition-opacity duration-300 ease-in-out ${
desktop:hidden w-full h-full fixed inset-0 bg-neutral-700 bg-opacity-50 backdrop-blur-sm z-30 transition-opacity duration-300 ease-in-out ${
visible
? "opacity-100 pointer-events-auto"
: "opacity-0 pointer-events-none"

View File

@@ -35,7 +35,7 @@ export const MinimalMarkdown: React.FC<MinimalMarkdownProps> = ({
return (
<ReactMarkdown
className={`w-full text-wrap break-word ${className}`}
className={`w-full text-wrap break-word prose dark:prose-invert ${className}`}
components={markdownComponents}
remarkPlugins={[remarkGfm]}
>

View File

@@ -78,7 +78,7 @@ export function getUniqueIcons(docs: OnyxDocument[]): JSX.Element[] {
for (const doc of docs) {
// If it's a web source, we check domain uniqueness
if (doc.source_type === ValidSources.Web && doc.link) {
if ((doc.is_internet || doc.source_type === ValidSources.Web) && doc.link) {
const domain = getDomainFromUrl(doc.link);
if (domain && !seenDomains.has(domain)) {
seenDomains.add(domain);

View File

@@ -47,7 +47,7 @@ export default function LogoWithText({
className="flex gap-x-2 items-center ml-0 cursor-pointer desktop:hidden "
>
{!toggled ? (
<Logo className="desktop:hidden -my-2" height={24} width={24} />
<Logo className="desktop:hidden" height={24} width={24} />
) : (
<LogoComponent
show={toggled}

View File

@@ -23,8 +23,11 @@ import { AllUsersResponse } from "./types";
import { Credential } from "./connectors/credentials";
import { SettingsContext } from "@/components/settings/SettingsProvider";
import { Persona, PersonaLabel } from "@/app/admin/assistants/interfaces";
import { LLMProviderDescriptor } from "@/app/admin/configuration/llm/interfaces";
import { isAnthropic } from "@/app/admin/configuration/llm/interfaces";
import {
isAnthropic,
LLMProviderDescriptor,
} from "@/app/admin/configuration/llm/interfaces";
import { getSourceMetadata } from "./sources";
import { AuthType, NEXT_PUBLIC_CLOUD_ENABLED } from "./constants";
import { useUser } from "@/components/user/UserProvider";

View File

@@ -79,12 +79,18 @@ export async function* handleStream<T extends NonEmptyObject>(
}
export async function* handleSSEStream<T extends PacketType>(
streamingResponse: Response
streamingResponse: Response,
signal?: AbortSignal
): AsyncGenerator<T, void, unknown> {
const reader = streamingResponse.body?.getReader();
const decoder = new TextDecoder();
let buffer = "";
if (signal) {
signal.addEventListener("abort", () => {
console.log("aborting");
reader?.cancel();
});
}
while (true) {
const rawChunk = await reader?.read();
if (!rawChunk) {

View File

@@ -21,7 +21,6 @@ module.exports = {
transitionProperty: {
spacing: "margin, padding",
},
keyframes: {
"subtle-pulse": {
"0%, 100%": { opacity: 0.9 },
@@ -148,7 +147,6 @@ module.exports = {
"text-mobile-sidebar": "var(--text-800)",
"background-search-filter": "var(--neutral-100-border-light)",
"background-search-filter-dropdown": "var(--neutral-100-border-light)",
"tw-prose-bold": "var(--text-800)",
"user-bubble": "var(--off-white)",