* Outline
* fixConnector
* fixTest
* The date filtering is implemented correctly as client-side filtering, which is the only way to achieve it with the Outline API since it doesn't support date parameters natively.
* Update web/src/lib/connectors/connectors.tsx
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
* no connector config for outline
* Update backend/onyx/connectors/outline/client.py
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
* Fix all PR review issues: document ID prefixes, error handling, test assertions, and null guards
* Update backend/onyx/connectors/outline/client.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* The test no longer depends on external network connectivity to httpbin.org
* Enhanced the OutlineApiClient.post() method in backend/onyx/connectors/outline/client.py to properly handle network-level exceptions that could crash the connector during synchronization
* Polling mechanism
* Removed flag-based approach
* commentOnClasses
* commentOnClasses
* commentOnClasses
* responseStatus
* startBound
* Changed the method signature to match the interface
* ConnectorMissingCredentials
* Time Out shared config
* Missing Credential message
---------
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
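
The client-side date filtering mentioned in the Outline commits above could look roughly like the sketch below. This is an illustration only, not the connector's actual code: the `filter_by_updated_at` helper and the `updatedAt` field name are assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, Iterator


def filter_by_updated_at(
    docs: Iterable[dict[str, Any]],
    start: datetime,
    end: datetime,
) -> Iterator[dict[str, Any]]:
    """Yield only documents whose timestamp falls within [start, end].

    Hypothetical helper: assumes each document carries an ISO-8601
    `updatedAt` string, which may not match the real API payload.
    """
    for doc in docs:
        raw = doc.get("updatedAt")
        if not raw:
            continue  # skip documents with no timestamp rather than guessing
        # Swap a trailing "Z" for an explicit offset so older Pythons parse it
        updated = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        if updated.tzinfo is None:
            # Assumption: naive timestamps are treated as UTC
            updated = updated.replace(tzinfo=timezone.utc)
        if start <= updated <= end:
            yield doc
```

Because the filter runs after the fetch, every document is still downloaded; the sketch only narrows what is passed on for indexing.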
* don't skip ccpairs if embedding swap in progress
* refactor check_for_indexing to properly handle search setting swaps
* mypy
* mypy
* comment debugging log
* nits and more efficient active index attempt check
* Add popular connectors sections and cleanup connectors page
* Add other connectors env var
* other connectors env var to vscode env template
* update playwright tests
* sort by popularity
* recategorize and sort by popularity
* replacement of "message_delta" etc as Enums + removal
* prompt changes
* cubic fixes where appropriate
* schema fixes + citation symbols
* various fixes
* fix for kg context in new search
* cw comments
* updates
* Explicitly add limit to the function calls
This means we miss fewer messages. The default limit is 100.
Signed-off-by: nigel brown <nigel@stacklok.com>
* Update backend/onyx/connectors/discord/connector.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Signed-off-by: nigel brown <nigel@stacklok.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* squash: combine all DR commits into one
Co-authored-by: Joachim Rahmfeld <joachim@onyx.app>
Co-authored-by: Rei Meguro <rmeguro@umich.edu>
* Fixes
* show KG in Assistant only if available
* KG only usable for KG Beta (for now)
* base file upload
* improvements
* raise error if uploaded context is too long
* More improvements
* Fix citations
* jank implementation of internet search with deep research that can kind of work
* early implementation for google api support
* .
---------
Co-authored-by: Weves <chrisweaver101@gmail.com>
Co-authored-by: Joachim Rahmfeld <joachim@onyx.app>
Co-authored-by: Rei Meguro <rmeguro@umich.edu>
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* add api/versions to onyx
* add test and rename onyx
* cubic nit
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
* move api version constants and add explanatory comment
---------
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
* squash: combine all DR commits into one
Co-authored-by: Joachim Rahmfeld <joachim@onyx.app>
Co-authored-by: Rei Meguro <rmeguro@umich.edu>
* Fixes
* show KG in Assistant only if available
* KG only usable for KG Beta (for now)
* base file upload
* raise error if uploaded context is too long
* improvements
* More improvements
* Fix citations
* better decision making
* improved decision-making in Orchestrator
* generic_internal tools
* Small tweak
* tool use improvements
* add on
* More image gen stuff
* fixes
* Small color improvements
* Markdown utils
* fixed end conditions (incl early exit for image generation)
* remove agent search + image fixes
* Okta tool support for reload
* Some cleanup
* Stream back search tool results as they come
* tool forcing
* fixed no-Tool-Assistant
* Support anthropic tool calling
* Support anthropic models better
* More stuff
* prompt fixes and search step numbers
* Fix hook ordering issue
* internal search fix
* Improve citation look
* Small UI improvements
* Improvements
* Improve dot
* Small chat fixes
* Small UI tweaks
* Small improvements
* Remove un-used code
* Fix
* Remove test_answer.py for now
* Fix
* improvements
* Add foreign keys
* early forcing
* Fix tests
* Fix tests
---------
Co-authored-by: Joachim Rahmfeld <joachim@onyx.app>
Co-authored-by: Rei Meguro <rmeguro@umich.edu>
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* indexing status optimization first draft
* refactor: update pagination logic and enhance UI for indexing status table
* add index attempt pruning job and display federated connectors in index status page
* update celery worker command to include index_attempt_cleanup queue
* refactor: enhance indexing status table and remove deprecated components
* mypy fix
* address review comments
* fix pagination reset issue
* add TODO for optimizing connector materialization and performance in future deployments
* enhance connector indexing status retrieval by adding 'get_all_connectors' option and updating pagination logic
* refactor: transition to paginated connector indexing status retrieval and update related components
* fix: initialize latest_index_attempt_docs_indexed to 0 in CCPairIndexingStatusTable component
* feat: add mock connector file support for indexing status retrieval and update indexing_statuses type to Sequence
* mypy fix
* refactor: rename indexing status endpoint to simplify API and update related components
* move api-based embeddings/reranking calls to api server out of model server, added/modified unit tests
* ran pre-commit
* fix mypy errors
* mypy and precommit
* move utils to right place and add requirements
* precommit check
* removed extra constants, changed error msg
* Update backend/onyx/utils/search_nlp_models_utils.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* greptile
* addressed comments
* added code enforcement to throw error
---------
Co-authored-by: Jessica Singh <jessicasingh@Mac.attlocal.net>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* feat: add non-root user to backend and model-server image
* feat: update values to support security context for index, inference, and celery_shared
* feat: add security context support for index and inference
* feat: add celery_shared security context support to celery worker templates
* fix: cache management strategy
* fix: update deployment files for volume mount
* fix: address comments
* fix: bump helm chart version for new security context template changes
* fix: bump helm chart version for new security context template changes
* feat: move useradd earlier in build for reduced image size
---------
Co-authored-by: Phil Critchfield <phil.critchfield@liatrio.com>
* fix(connector): #5178 Add error handling and logging for empty answer text in LoopioConnector
* fix(connector): onyx-dot-app#5178: Improve handling of empty answer text in LoopioConnector
---------
Co-authored-by: Jose Bañez <jose@4gclinical.com>
* feat(infra): Adding new AWS Terraform Template Code
* Addressing greptile comments
* Applying some updates after the cubic reviews as well
* Adding one detail
* Removing unused var
* Addressing more cubic comments
* feat: make sharepoint documents and sharepoint pages optional
* fix: address review feedback for PR #5183
* fix: exclude personal sites from sharepoint connector
---------
Co-authored-by: Nils Kleinrahm <nils.kleinrahm@pledoc.de>
* fix: sf connector docs
* more sf logs
* better logs and new attempt
* add fields to error temporarily
* fix sf
---------
Co-authored-by: Wenxi <wenxi@onyx.app>
* initial migration
* getting metadata from tags
* complete migration
* migration override for cloud
* fix: more robust structured tag gen
* tag and indexing update
* fix: move is_list to tags
* migration rebase
* test cases + bugfix on unique constraint
* fix logging
* Update names in map-comprehension
* Make default name for ungrounded types public
* Return the default name for ungrounded entity-types
* Update backend/onyx/db/entities.py
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
---------
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
* github perm sync initial draft
* introduce github doc sync and perm sync
* remove specific start time check
* Refactor GitHub connector to use SlimCheckpointOutputWrapper for improved document handling
* Update GitHub sync frequency defaults from 30 minutes to 5 minutes
* Add stop signal handling and progress reporting in GitHub document sync
* Refactor tests for Confluence and Google Drive connectors to use a mock fetch function for document access
* change the doc_sync approach
* add static typing for document columns and where clause
* remove prefix logic in connector runner
* mypy fix
* code review changes
* mypy fix
* fix review comments
* add sort order
* Implement merge heads migration for Alembic and update Confluence and Google Drive test
* github unit tests fix
* delete merge head and rebase the docmetadata field migration
---------
Co-authored-by: Subash <subash@onyx.app>
* Use props instead of inline type def
* Add new AppProvider
* Remove unused component file
* Move `sessionSidebar` to be inside of `components` instead of `app/chat`
* Change name of `sessionSidebar` to `sidebar`
* Remove `AppModeProvider`
* Fix bug in how the cookies were set
* WIP
* renamed and moved tasks (WIP)
* minio migration
* bug fixes and finally add document batch storage
* WIP: can succeed but status is error
* WIP
* import fixes
* working v1 of decoupled
* catastrophe handling
* refactor
* remove unused db session in prep for new approach
* renaming and docstrings (untested)
* renames
* WIP with no more indexing fences
* robustness improvements
* clean up rebase
* migration and salesforce rate limits
* minor tweaks
* test fix
* connector pausing behavior
* correct checkpoint resumption logic
* cleanups in docfetching
* add heartbeat file
* update template jsonc
* deployment fixes
* fix vespa httpx pool
* error handling
* cosmetic fixes
* dumb
* logging improvements and non checkpointed connector fixes
* didnt save
* misc fixes
* fix import
* fix deletion of old files
* add in attempt prefix
* fix attempt prefix
* tiny log improvement
* minor changes
* fixed resumption behavior
* passing int tests
* fix unit test
* fixed unit tests
* trying timeout bump to see if int tests pass
* trying timeout bump to see if int tests pass
* fix autodiscovery
* helm chart fixes
* helm and logging
* Improve check_for_indexing + check_for_vespa_sync_task
* Remove unused
* Fix
* Simplify query
* Add more logging
* Address bot comments
* Increase # of tasks generated since we're not going cc-pair by cc-pair
* Only index 50 user files at a time
* Add basic structure for frontend email connector
* Update names of credentials-json keys
* Fix up configurations workflow
* Edit logic on how `mail_client` is used
- imaplib.IMAP4_SSL is supposed to be treated as an ephemeral object
* Edit helper name and add docs
* Fix invalid mailbox selection error
* Implement greptile suggestions
* Make recipients optional and add sender to primary-owners
* Add sender to external-access too; perform dedupe-ing of emails
* Simplify logic
* Make constant a global
* Add ability to specify vertex location
* Add period
* Add a hardcoding path to the frontend
* Add docs
* Add default value to `CustomConfigKey`
* Consume default value from custom-config-key on frontend
* Use markdown renderer instead
* Update description
* Remove macro stylings from HTML tree
* Add params
* Handle multiple cases of `ac:structured-macro` being found.
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Implement fetching; still need to work on document parsing
* Add basic skeleton of parsing email bodies
* Add id field
* Add email body parsing
* Implement checkpointed imap-connector
* Add testing logic for basic iteration
* Add logic to get different header if "to" isn't present
- possible in mailing-list workflows
* Add ability to index specific mailboxes
* Add breaking when indexing has been fully exhausted
* Sanitize all mailbox names + add space between stripped strings after parsing
* Add multi-recipient parsing
* Change around semantic-identifier and title
* Add imap tests
* Add recipients and content assertions to tests
* Add envvars to github actions workflow file
* Remove encoding header
* Update logic to not immediately establish connection upon init of `ImapConnector`
* Add start and end datetime filtering + edit when connection is established / how login is done
* Remove content-type header
* Add note about guards
* Change default parameters to be `None` instead of `[]`
* Address comment on PR
* Implement more PR suggestions
* More PR suggestions
* Implement more PR suggestions
* Change up login/logout flow (PR suggestion)
* Move port number to be envvar
* Make globals variants in enum instead (PR suggestion)
* Fix more documentation related suggestions on PR
* Have the imap connector implement `CheckpointedConnectorWithPermSync` instead
* Add helper for loading all docs with permission syncing
* fixed id extraction in drive connector
* WIP migration
* full migration script
* migration works single tenant without duplicates
* tested single tenant with duplicate docs
* migrations and frontend
* tested multitenant
* fix connector tests
* make tests pass
* Fix bug with incorrect model icon being shown
* Update web/src/app/chat/input/LLMPopover.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update web/src/app/chat/input/LLMPopover.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update web/src/app/chat/input/LLMPopover.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update web/src/app/chat/input/LLMPopover.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add visibility to filtering
* Update the model names which are shown in the popup
* Fix incorrect llm updating bug
* Fix bug in which the provider name would be used instead
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add new convenience method
* Fix bug in which emails would be fetched for initial indexing
* Improve tests for MS Teams connector
* Fix test_gdrive_perm_sync_with_real_data patching
* Protect against incorrect truthiness
---------
Co-authored-by: Weves <chrisweaver101@gmail.com>
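
The commit "Change default parameters to be `None` instead of `[]`" in the group above addresses a common Python pitfall. The sketch below illustrates the general issue; the function names and the `mailboxes` parameter are hypothetical and not taken from the connector.

```python
from typing import Optional


def collect_buggy(name: str, mailboxes: list[str] = []) -> list[str]:
    # The default list is created once, at definition time,
    # and is shared by every call that omits the argument.
    mailboxes.append(name)
    return mailboxes


def collect_fixed(name: str, mailboxes: Optional[list[str]] = None) -> list[str]:
    # A fresh list is created on each call that omits the argument.
    if mailboxes is None:
        mailboxes = []
    mailboxes.append(name)
    return mailboxes
```

The explicit `is None` check also avoids treating an intentionally empty list passed by the caller as "missing", which a plain truthiness test would do.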
* feat: move vespa at end in try block
* simplify query
* mypy
* added order by just in case for consistent pagination
* liveness probe
* kg_p check for both extraction and clustering
* fix: better vespa logging
* Add function stubs for Teams
* Implement more boilerplate code
* Change structure of helper functions
* Implement teams perms for the initial index
* Make private functions start with underscore
* Implement slim_doc retrieval and fix up doc_sync
* Simplify how doc-sync is done
* Refactor jira doc-sync
* Make locally used function start with an underscore
* Update backend/ee/onyx/configs/app_configs.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add docstring to helper function
* Update tests
* Add an expected failure
* Address comment on PR
* Skip expert-info if user does not have a display-name
* Add doc comments
* Fix error in generic_doc_sync
* Move callback invocation to earlier in the loop
* Update tests to include proper list of user emails
* Update logic to grab user emails as well
* Only fetch expert-info if channel is not public
* Pull expert-info creation outside of loop
* Remove unnecessary call to `iter`
* Switch from `dataclass` to `BaseModel`
* Simplify boolean logic
* Simplify logic for determining if channel is public
* Remove unnecessary channel membership-type
* Add log-warns
* Only perform another API fetch if email is not present
* Address comments on PR
* Add message on assertion failure
* Address typo
* Make exception message more descriptive
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* kg cleanup
* more cleanup
* fix: copy over _get_classification_content_from_call_chunks for content formatting
* added back deep extraction logic
* feat: making deep extraction and clustering work
* nit
Changed the restart policy to unless-stopped to ensure containers
automatically restart after failures or reboots but allow manual stop
without immediate restart.
This is preferable over always because it prevents containers from
restarting automatically after a manual stop, enabling controlled
shutdowns and maintenance without unintended restarts.
* Split up engine file
* Switch to schema_translate_map
* Fix mass search/replace
* Remove unused
* Fix mypy
* Fix
* Add back __init__.py
* kg fix for new session management
Adding "<tenant_id>" in front of all views.
* additional kg fix
* better handling
* improve naming
---------
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* updates
- no classification if deep extraction is False
- separate names for views in LLM generation
- better prompts
- any relationship type provided to LLM that relates to identified entities
* CW feedback/comment update
* GCS metadata processing
* Unprocessable files should still be indexed to be searched by title
* Moved re-used logic to utils. Combined file metadata PR with GCS metadata changes
* Added OnyxMetadata type, adjusted timestamp naming consistency, clarified timestamp logic
* Use BaseModel
---------
Co-authored-by: Wenxi Onyx <wenxi-onyx@Wenxis-MacBook-Pro.local>
* Create Entity-Only path for simple entity-focussed queries. Plus
other fixes.
* fix: use env var
* mypy fix
* fix: mypy
---------
Co-authored-by: Rei Meguro <36625832+Orbital-Web@users.noreply.github.com>
* refactor salesforce sqlite db access
* more refactoring
* refactor again
* refactor again
* rename object
* add finalizer to ensure db connection is always closed
* avoid unnecessarily nesting connections and commit regularly when possible
* remove db usage from csv download
* dead code
* hide deprecation warning in ddtrace
* remove unused param
* local testing WIP
* stuff for pytest-dotenv
* autodetect filter types instead of assuming last modified always works (it doesn't)
Move filtering responsibility up instead of making utility calls excessively stateful
* fix how changed parent id's are yielded
* remove slow part of test
* clean up comments
* small refactor
* more refactor
* add normalize test
* checkpoint and comments
* add helper function
* fix gitignore
* add gitignore
* update pyproject
* delta updates
* remove comments
* fix time import
* fix set init
* add salesforce env vars
* cleanup
* more cleanup
* filtered item is unbound here
* typo
* fix suffix check
* fix empty type query
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* db setup
* transfer 1 - incomplete
* more adjustments
* relationship table + query update
* temp view creation
* restructuring
* nits
* updates
* separate read_only engine
* extraction revamp
* focus on metadata relationships 1
* dev
* migration downgrade fix
* rebase migration change
* a3+
* progress
* base
* new extraction
* progress
* fixed KG extraction
* nits
* updates
* simplifications & cleanup
* fixes
* updates
* more feature flag checks
* fixes
* extraction process fix
* read-only user creation as part of setup
* fix for missing entity attributes
* kg read-only user creation as part of migration
* typo
* EL initial comments
* initial Account/SF Connector changes
* SF Connector update
- include account information
* base w/ salesforce
* evan updates + quite a bit more
* kg-filtered search
* EL changes pt 2
* migrations and env vars
* quick migration fix
* migration update
* post_rebase fixes
* mypy fixes
* test fixes
* test fix
* test fix
* read_only pool + misc
* nf
* env vars
* test improvements
* salesforce fix
* test update
* small changes
* small adjustments
* SF Connector fix & kg_stage removal for one table
* mypy fix
* small fixes
* EL + RK (pt 1) comments
* nit
* setting updated
* Salesforce test update
* EL comments
* read-only user replacement & cleanup
* SQL View fix
* converting entity type-name separators
* sql view group ownership
* view fix
* SQL tweak
* dealing with docs that were skipped by indexing
* increased error handling
* more error handling
* Output formatting fix
* kg-incremental-reindexing
* 0-doc found improvement
* celery
* migration correction
* timeout adjustments
* nit
* Updated migration
* Entity Normalization for KG Dev 1 (#4746)
* feat: trigrams column
* fix: reranking and db
* feat: v1
* fix: convert to orm
* feat: parallel
* fix: default to id_name
* fix: renamed semantic_id and semantic_id_trigrams
* fix: scalar subquery
* fix: tuning + redundancy
* fix: threshold
* fix: typo
* fix: shorten names
* wip
* fix: reverted
* feat: config
* feat: works but it was dumb
* feat: clustering works
* fix: mypy
* normalization <-> language awareness for SQL generation
* small type fixes
---------
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* mypy
* typo and dead code
* kg_time_fencing
* feat: remove temp views on migration downgrade
* remove functions and triggers for now
* rebase adjustments
* EL code review results
* quick fix + trigger/funcs for single tenant
* fix: typo, mypy, dead code
* fix: autoflake
* small updates
* nit
* fix: typo
* early + faster view creation
* Extension creation in MT migration
* nit changes to default ETs
* Incremental Clustering and KG Refactor V1 (#4784)
Optimized/restructured incremental clustering. The new pipeline moves vespa updates to clustering.
Also, celery configuration has been updated.
---------
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* Move file
* Fix all prior imports
* Clean sidebar items logic; add kg page
* Add kg_processing celery background task
* prompt tweak & ET extraction reset
* more general hierarchical structure
* feat: better vespa reset logic
* Add basic knowledge graph configuration
* Add configurations for KG entity-type
* prompt optimization and entity replacements
* small prompt changes
* Implement backend APIs
* KG Refactor V2 (#4814)
Clustering & Extraction improvements & various nits
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* add connector-level coverage days
* Update APIs to be more frontend ergonomic
* Add simple test
* Make config optional in test
* fix: nit
* initial EL responses
* refactor: helper functions for formatting
* fix: more helper fns & comments
* fix: comment code that's been implemented elsewhere
* Add entity-types APIs
* Hook up frontend to backend
* Finish hooking up entity-types to backend
* Update ordering of entity-types and fix form submitting
* Add backend API to get kg-exposed
* Add kg-exposed to sidebar
* Fix path
* Use existing values, even if kg-enabled is false
* Update what initial values are used
* Add skeleton for kg resetting
* Add return type
* Add default entity-type population when fetching entity-types
* Remove circular deps
* Minor fixes to logic
* Edit logic for default entity-types population
* Add re-index API + skeleton
* Update verbiage for KG
* Remove templatization in favour of function
* Address comments on PR
* Pull call out into its own binding
* Remove re-index API and revert implement of reset back to stub
* Fix circular import error
* Remove 'reindex' button
* Edit how the empty vendor name list is handled
* Edit how exposed is processed
* Redirect if navigated to `/admin/kg` and kg is not exposed
* Address comments on PR
* reset + entity type table display & updating updates
* Update fetching entity-types
* Make KG entity types refresh when reset
* Edit verbiage of reset button
* Update package-lock.json file
* Protect against overflowing
* Re-implement refreshing table after reset
* Edit message when nothing is shown.
* UI enhancements
* small fixes
* remove form validation?
* fix
* nit
* nit
* nit
* nit
* fix configure max coverage days
* EL comments for JR
* refactor: moved functions where they belong to fix circular import
* feat: intuitive coverage days
* feat: intuitive coverage days
* fix: safe date picker
* fix: startdate
* evan fixes
* fix: evan comment on enable/disable
* fix: style
* fix: ui issues
* fix: ui issues for reset too
* fix: tests
* fix: kg entity is not enabled
* fix: entity type reload on enable
---------
Co-authored-by: joachim-danswer <joachim@danswer.ai>
Co-authored-by: Rei Meguro <36625832+Orbital-Web@users.noreply.github.com>
* Edit logic for default entity-types population
* Remove templatization in favour of function
* Address comments on PR
* Pull call out into its own binding
* Address comments on PR
* Add rate-limiting to Teams API request
* Add comment for rate-limiting
* Implement rate-limiting for office365 library.
* Remove hardcoded value
* Fix nits on PR
* initial model switching changes
* Update image generation output format and revise prompt handling
* Add validation for output format in ImageGenerationTool and implement tests
---------
Co-authored-by: Subash <subash@onyx.app>
* Add perm sync to indexing for google drive
* Applying changes elsewhere
* Turn on EE for perm sync slack tests
* Add new load_from_checkpoint_with_perm_sync
* Adjust way perm sync configs are represented
* Adjust run_indexing to handle perm sync on first run
* Add missing file
* Add sync on index for slack
* Add test + fixes
* Update permission
* Fix connector tests
* skip perm sync test if running MIT tests
* Address EL comments
* Add error clarity to restart containers script
* erroneous cleanup on exit
* fix when starting containers for the first time
---------
Co-authored-by: Wenxi Onyx <wenxi-onyx@Wenxis-MacBook-Pro.local>
* Fixed indexing when no sites are specified
* Added test for Sharepoint all sites index
* Accounted for paginated results.
* Typing
* Typing
---------
Co-authored-by: Wenxi Onyx <wenxi-onyx@Wenxis-MacBook-Pro.local>
* add percentage progress
* range checking
* formatting
* for new channels, skip them if the most recent messages are all from bots
* comments
* bypass bot channels
* code review
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* try fixing slack bot
* add logging
* just use if
* safe msg get
* .close isn't async
* enforce block list size limit
* various fixes and notes
* don't use self
* switch to punkt_tab
* fix return condition
* synchronize waiting, use non thread local redis locks
* fix log format, make collection copy more explicit for readability
* fix some logging
* unnecessary function
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* add more info
* fix headers
* add filename as param (merge)
* db manager entry in launch template
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* Add replies to document construction and edit tests
* Update tests
* Add replies processing to teams
* Fix test
* Add try-except block around potential failure
* Update entity-id during ConnectorFailure raise
* Change query-exporting to use generators instead of expanding fully into memory
* Fix pagination logic
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add type annotation
* Add early break if list of chat_sessions is empty
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
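
The switch to generators for query exporting, together with the pagination fix and the early break on an empty page, can be sketched as follows. This is a minimal illustration under stated assumptions: `fetch_page` is a hypothetical stand-in for the real paginated query, and the ten integer rows are placeholder data.

```python
from typing import Iterator


def fetch_page(page: int, page_size: int) -> list[int]:
    """Hypothetical stand-in for a paginated query over 10 rows."""
    rows = list(range(10))
    return rows[page * page_size : (page + 1) * page_size]


def iter_rows(page_size: int = 4) -> Iterator[int]:
    """Yield rows one page at a time instead of building the full list."""
    page = 0
    while True:
        batch = fetch_page(page, page_size)
        if not batch:
            break  # stop as soon as a page comes back empty
        yield from batch
        page += 1
```

A consumer such as a CSV writer can iterate over `iter_rows()` directly, so only one page is held in memory at a time.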
* Highlight active link in AdminSidebar based on current pathname
* Refactor AdminSidebar to declare pathname variable earlier
---------
Co-authored-by: Subash <subash@onyx.app>
* Add basic foundation for teams checkpointing classes
* Fix slack connector main entrypoint
* Saving changes
* Finish teams checkpointing impl
* Remove commented out code
* Remove more unused code
* Move code around
* Add threadpool to process requests in parallel
* Fix mypy errors / warnings
* Move test import to main function only
* Address nits on PR
* Remove unnecessary check prior to entering while-loop
* Remove print statement
* Change exception message
* Address more nits
* Use indexing instead of destructuring
* Add back invocation of `run_with_timeout` instead of a direct call
* Revert slack testing code
* Move early return to before second API call
* Pull fetch to team outside of loop
* Address nits on PR
* Add back client-side filtering
* Updated connector to return after a team's indexing is finished
* Add type ignore
* Implement proper datetime range fetching
* Address comment on PR
* Rename function
* Change exception type when no team with the given id was found
* Address nit on PR
* Add comment on why `page_loaded` is needed to be specified explicitly
* Remove duplicated calls to fetching channels
* Use helper function for thread-based yielding instead of manual logic
* Move datetime filtering to message-level instead
* Address more comments on PR
* Add new utility function for yielding sections
* Add additional utility function
* Add teams tests
* Edit error message
* Address nits on PR
* Promote url-prefix to be a class level constant
* Fix mypy error
* Remove start/end parameters from function that doesn't use them anymore; move around comments
* Address more nits on PR
* Add comment
* add utility function
* add utility functions to DocExternalAccess
* refactor db access out of individual celery tasks and put it directly into the heavy task
* code review and remove leftovers
* fix circular imports
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* ensure we don't tag 'latest' with cloud images
* add docker login to trivy
* fix tag names
* flavor latest false (no auto latest tags)
* fix typo
* only run the appropriate workflow for web
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* set field size limit
* don't use sys.maxsize
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* run testing
* need to break on success
* add a readme
* raise vespa to 6GB
* allow test to retry
* add 20 attempts
* put memory limits back to normal
* restore chart testing on changes only
* increase retries to 40
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* Add more logging for confluence perm-sync + handle case where permissions are removed from the access token
* Make required permissions explicit
* more
* Add slim fetch limit + mark all cc pairs of source type as successful upon group sync
* Add to dev compose
* Small teams fix
* Add file
* Add single limit pagination for confluence
* Restrict to server only
* more logging
* cleanup
* Cleanup
* Remove CONFLUENCE_CONNECTOR_SLIM_FETCH_LIMIT
* Handle teams error
* Fix ut
* Remove db dependency from confluence_doc_sync
* move stuff back to debug
* restore caching and fix up some prefixing
* try backend matrix build and fix artifact names
* need id
* add backslashes to be consistent
* fix no-cache
* leave docker tags to the meta action
* need checkout in merge
* add comment
* move spammy logs to debug status
* bunch of no-cache updates
* prefix
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* Update mode to be a default parameter in `FileStore.read`
* Move query history exporting process to be a background job instead
* Move hardcoded report-file-naming to a common utility function
* Add type annotations
* Update download component
* Implement button to re-ping and download CSV file; fix up some backend file-checking logic
* De-indent logic (w/ early return)
* Return different error codes depending on the type of task status
* Add more resistant failure retrying mechanisms
* Remove default parameter in helper function
* Use popup for error messaging
* Update return code
* Update web/src/app/ee/admin/performance/query-history/DownloadAsCSV.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add type to useState call
* Update backend/ee/onyx/server/query_history/api.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/onyx/file_store/file_store.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/ee/onyx/background/celery/apps/primary.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Move rerender call to after check
* Run formatter
* Add type conversions back (smh greptile)
* Remove duplicated call to save_file
* Move non-fallible logic out of try-except block
* Pass date-ranges into API call
* Convert to ISO strings before passing it into the API call
* Add API to list all tasks
* Create new pydantic model to represent tasks to return instead
* Change helper to only fetch query-history tasks
* Use `shared_tasks` instead of old method
* Address more comments from PR; consolidate how task name is generated
* Mark task as failed if any exception is raised
* Change the task object which is returned back to the FE
* Add a table to display previously generated query-history-csv's
* Add timestamps to task; delete tasks as soon as file finishes processing
* Raise exception if start_time is not present
* Convert hard-coded string to constant
* Add "Generated At" field to table
* Return task list in sorted order (based off of start-time)
* Implement pagination
* Remove unused props and cleanup tailwind classes
* Change the name of kickoff button
* Redesign how previous query exports are viewed
* Make button a constant width even when contents change
* Remove timezone information before comparing
* Decrease interval time for re-pinging API
* Add timezone to start-time creation
* Add a refreshInterval for getting updated task status
* Add new background queue
* Edit small verbiage and remove error popup when max-retries is hit
* Change up heavy worker to recognize new task in new module
* Ensure `celery_app` is imported
* Change how `celery_app` is imported and defined
* Update comment on why `celery_app` must be imported
* Add basic skeleton for new beat task to cleanup any dead / failed query-history-export tasks
* Move cleanup task to different worker / queue
* Implement cleanup task
* Add return type
* Address comment on PR
* Remove delimiter from prefix
* Change name of function to be more descriptive
* Remove delimiter from prefix constant
* Move function invocation closer to usage location
* Move imports to top of file
* Move variable up a scope due to undefined error
* Remove dangling if-statement
* Make function more pure-functional
* Remove redefinition
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* it will never happen again.
* fix perm sync issue
* fix perm sync issue2
* ensure member emails map is populated
* other fix for perm sync
* address CW comments
* nit
* don't log all channels
* print number of channels
* sanitize indexing exception messages
* harden vespa index swap
* use constants and fix list generation
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* memory optimize task generation for connector deletion
* test
* fix up integration test docker file
* more no-cache
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
- created env variable AGENT_ALLOW_REFINEMENT with default "". Must be set to true to enable Refinement.
- added an environment variable for the upper limit of docs that can be sent to verification
* don't hardcode -1
* extra spaces
* fix binary data in blurb
* add note to binary handling
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* tolerance of confluence api weirdness
* remove checkpointing
* remove skipping logic from checkpointing
* add back checkpointing
* switch confluence checkpointing to be based on page starts
* address CW comments and fix unit tests
* some mitigations of bad confluence api
* new checkpointing approach and testing fixes
* fix test
* CW comments
* Fix migration
* Fix migration to take care of various nullability cases
* Address comments on PR
* Rename variables to be more descriptive
* Make helpers private
* Fix select statement
* Add comments to explain the involved logic
* Saving changes
* Finish script to revalidate `display_model_names`
* Address comments on PR by greptile
* Add missing columns
* Pull difference operator out into binding
* Add deletion prior to re-insertion
* Use map from shared llm-provider file instead
* Use helper function instead of copying code
* Remove delete and convert into an update statement
* Use pydantic for ModelConfigurations
* Update to do nothing on-conflict rather than update
* Address nits on PR
* Add default visible model(s) for bedrock
* Perform an update on conflict instead of doing nothing
* Fix migration
* Fix migration to take care of various nullability cases
* Address comments on PR
* Rename variables to be more descriptive
* Make helpers private
* Fix select statement
* Add comments to explain the involved logic
* Add helpers for viewing visible model names
* Fix logic for missing model + display-model names in migration
* refactor salesforce sqlite db access
* more refactoring
* refactor again
* refactor again
* rename object
* add finalizer to ensure db connection is always closed
* avoid unnecessarily nesting connections and commit regularly when possible
* remove db usage from csv download
* dead code
* hide deprecation warning in ddtrace
* remove unused param
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* debug script + slight refactor of db class
* better comments
* move setup logger
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* friendlier handling of slack channel retrieval
* retry on downgrade_postgres deadlock
* fix comment
* text
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* Convert the model_names and display_model_names into a set instead
* Update backend/alembic/versions/7a70b7664e37_add_model_configuration_table.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Return default value instead of throwing error
* Add default parameter
* Move logic around
* Use dummy value for max_input_tokens in testing flow
* Remove unnecessary assignment
* tool to generate vespa schema variations for our cloud
* extraneous assign
* use a real templating system instead of search/replace
* fix float
* maybe this should be double
* remove redundant var
* template the other files
* try a spawned process
* move the wrapper
* fix args
* increase timeout
* run multitenant reset operations out of process as well
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* add emails to retry with on 403
* attempted fix for connector test
* CW comments
* connector test fix
* test fixes and continue on 403
* fix tests
* fix tests
* fix concurrency tests
* fix integration tests with llmprovider eager loading
* Add multi text array field
* Add multiple values to model configuration for a custom LLM provider
* Fix reference to old field name
* Add migration
* Update all instances of model_names / display_model_names to use new schema migration
* Update background task
* Update endpoints to not throw errors
* Add test
* Update backend/alembic/versions/7a70b7664e37_add_models_configuration_table.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/onyx/background/celery/tasks/llm_model_update/tasks.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Fix list comprehension nits
* Update web/src/components/admin/connectors/Field.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update web/src/app/admin/configuration/llm/interfaces.ts
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Implement greptile recommendations
* Update backend/onyx/db/llm.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/onyx/server/manage/llm/api.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/onyx/background/celery/tasks/llm_model_update/tasks.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/onyx/db/llm.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Fix more greptile suggestions
* Run formatter again
* Update backend/onyx/db/models.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add relationship to `LLMProvider` and `ModelConfigurations` classes
* Use sqlalchemy ORM relationships instead of manually populating fields
* Upgrade migration
* Update interface
* Remove all instances of model_names and display_model_names from backend
* Add more tests and fix bugs
* Run prettier
* Add types
* Update migration to perform data transformation
* Ensure native llm providers don't have custom max input tokens
* Start updating frontend logic to support custom max input tokens
* Pass max input tokens to LLM class (to be passed into `litellm.completion` call later)
* Add ModelConfigurationField component for custom llm providers
* Edit spacing and styling of model configuration matrix
* Fix error message displaying bug
* Edit opacity of `FiX` field for first index
* Change opacity back
* Change roundness
* Address comments on PR
* Perform fetching of `max_input_tokens` at the beginning of the callgraph and thread it through the entire callstack
* Change `add` to `execute`
* Move `max_input_tokens` into `LLMConfig`
* Fix bug with error messages not being cleared
* Change field used to fetch LLMProvider
* Fix model-configuration UI
* Address comments
* Remove circular import
* Fix failing tests in GH
* Fix failing tests
* Use `isSubset` instead of equality to determine native vs custom LLM Provider
* Remove unused import
* Make responses always display max_input_tokens
* Fix api endpoint to hit
* Update types in web application
* Update object field
* Fix more type errors
* Fix failing llm provider tests
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* add o3 + o4 mini
* k
* see which ones fail
* attempt
* k
* k
* llm ordering passing
* all tests passing
* quick bump
* Revert "add o3 + o4 mini"
This reverts commit 4cfa1984ec.
* k
* k
* tool to generate vespa schema variations for our cloud
* extraneous assign
* use a real templating system instead of search/replace
* fix float
* maybe this should be double
* remove redundant var
* template the other files
* try a spawned process
* move the wrapper
* fix args
* increase timeout
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* tool to generate vespa schema variations for our cloud
* extraneous assign
* float, not double
* back to double
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* refactor a mega function for readability and make sure to increment retry_count on exception so that we don't infinitely loop
* improve session and page level context handling
* don't use pydantic for the session context
* we don't need retry success
* move playwright handling into the session context
* need to break on ok
* return doc from scrape
* fix comment
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* Fix duplicate kwarg issue
* Change how vertex_credentials are passed
* Modify temporary dict instead
* Change string to a global constant
* Add extra condition to if-check during population of map
* small improvement to checking for image attachments
* better comments
* check centralized list of types instead of hardcoding them in the connector
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* upgrade celery to release version
* make the watchdog script more reusable
* use constant
* code review
* catch interrupt
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* rollback properly on exception
* rollback on exception
* don't continue if we can't set the search path
* cleaner handling via context manager
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* initial working version
* ranking profile
* modification for keyword/instruction retrieval
* mypy fixes
* EL comments
* added env var (True for now)
* flipped default to False
* mypy & final EL/CW comments + import issue
* refactor salesforce sqlite db access
* more refactoring
* refactor again
* refactor again
* rename object
* add finalizer to ensure db connection is always closed
* avoid unnecessarily nesting connections and commit regularly when possible
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* bump fastapi and starlette
* bumping llama index and nltk and associated deps
* bump to fix python-multipart
* bump aiohttp
* update package lock for examples/widget
* bump black
* sentencesplitter has changed namespaces
* fix reorder import check, fix missing passlib
* update package-lock.json
* black formatter updated
* reformatted again
* change to black compatible reorder
* change to black compatible reorder-python-imports fork
* fix pytest dependency
* black format again
* we don't need cdk.txt. update packages to be consistent across all packages
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* pass through various id's and log them in the model server for better tracking
* fix test
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* use send_task to be consistent
* add pidbox monitoring task
* add logging so we can track the task execution
* log the idletime of the pidbox
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
The code for token cost calculation fails when using a LiteLLM proxy due to a mismatch with the provider naming. For now, just handle this exception and assume a cost of 0 when that happens instead of breaking the flow. A more precise, LiteLLM-proxy-based cost calculation (relying on the LiteLLM Proxy `/model/info` method) will be needed.
* Add gemini well-known-llm-provider
* Edit styling of anonymous function
* Remove space
* Edit how advanced options are displayed
* Add VertexAI to acceptable llm providers
* Add new `FileUploadFormField` component
* Edit FileUpload component
* Clean up logic for displaying native llm providers; add support for more complex `CustomConfigKey` types
* Fix minor nits in web app
* Add ability to pass vertex credentials to `litellm`
* Remove unused prop
* Change name of enum value
* Add back ability to change form based on first time configurations
* Create new Error with string instead of throwing raw string
* Add more Gemini models
* Edit mappings for Gemini models
* Edit comment
* Rearrange llm models
* Run black formatter
* Remove complex configurations during first time registration
* Fix nit
* Update llm provider name
* Edit temporary formik field to also have the filename
* Run reformatter
* Reorder commits
* Add advanced configurations for enabled LLM Providers
* WIP
* WIP almost done, but realized we can just do basic retrieval
* rebased and added scripts
* improved approach to extracting smart chips
* remove files from previous branch
* fix connector tests
* fix test
* Update web connector implementation and fix line length issues
* Update configurations and fix connector issues
* Update Slack connector
* Update connectors and add jira_test_env to gitignore, removing sensitive information
* Restore checkpointing functionality and remove sensitive information
* Fix agent mode to properly handle thinking tokens
* up
* Enhance ThinkingBox component with improved content handling and animations. Added support for partial thinking tokens, refined scrolling behavior, and updated CSS for better visual feedback during thinking states.
* Create clean branch with frontend thinking mode changes only
* Update ThinkingBox component to include new props for completion and streaming states. Refactor smooth scrolling logic into a dedicated function for improved readability. Add new entry to .gitignore for jira_test_env.
* Remove autoCollapse prop from AIMessage component for improved flexibility in message display.
* Update thinking tokens handling in chat utils
* Remove unused cleanThinkingContent import from Messages component to streamline code.
---------
Co-authored-by: ferdinand loesch <f.loesch@sportradar.com>
Co-authored-by: EC2 Default User <ec2-user@ip-10-73-128-233.eu-central-1.compute.internal>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Chris Weaver <25087905+Weves@users.noreply.github.com>
* working around a gong race condition in their api
* add back gong basic test
* formatting
* add the call index
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* add some gc
* small refactoring for temp directories
* WIP
* add some gc collects and size calculations
* un-xfail
* fix salesforce test
* loose check for number of docs
* adjust test again
* cleanup
* nuke directory param, remove using sqlite db to cache email / id mappings
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* Enhance Highspot connector with error handling and add unit tests for poll_source functionality
* Fix file extension validation logic to allow either plain text or document format
* gong debugging
* add retries via class level session, add debugging
* add gong connector test
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* add prometheus metrics endpoints via helper package
* model server specific requirements
* mark as public endpoint
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* fix large docs selected in chat pruning
* better approach to length restriction
* comments
* comments
* fix unit tests and minor pruning bug
* remove prints
* stubbing out request id
* passthru or create request id's in api and model server
* add onyx request id
* get request id logging into uvicorn
* no logs
* change prefixes
* fix comment
* docker image needs specific shared files
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* use slack's built in rate limit handler for the bot
* WIP
* fix the slack rate limit handler
* change default to 8
* cleanup
* try catch int conversion just in case
* linearize this logic better
* code review comments
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* new mit integration test template
* edit
* fix problem with ACL type tags and MIT testing for test_connector_deletion
* fix test_connector_deletion_for_overlapping_connectors
* disable some enterprise only tests in MIT version
* disable a bunch of user group / curator tests in MIT version
* wire off more tests
* typo fix
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* fix acl prefixing
* increase timeout a tad
* block access to init'ing DocumentAccess directly, fix test to work with ee/MIT
* fix env var checks
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* refactor file extension checking and add test for blob s3
* code review
* fix checking ext
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* possible fix for confluence query filter
* nuke the attachment filter query ... it doesn't work!
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* fix issue with drive connector service account indexing
* correct checkpoint resumption
* final set of fixes
* nit
* fix typing
* logging and CW comments
* nit
* wire off image downloading for confluence and gdrive if not enabled in settings
* fix partial func
* fix confluence basic test
* add test for skipping/allowing images
* review comments
* skip allow images test
* mock function using the db
* mock at the proper level
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* sanitize llm keys and handle updates properly
* fix llm provider testing
* fix test
* mypy
* fix default model editing
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Checkpointed Jira connector
* nit
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* typing improvements and test fixes
* cleaner typing
* remove default because it is from the future
* mypy
* Address EL comments
---------
Co-authored-by: evan-danswer <evan@danswer.ai>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* work in progress
* work in progress
* WIP
* refactor, use inline attachment for image (base64 encoding doesn't work)
* pretty sure this belongs behind a multi_tenant check
* code review / refactor
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* remove title for slack
* initial working code
* simplification
* improvements
* name change to information_content_model
* avoid boost_score > 1.0
* nit
* EL comments and improvements
Improvements:
- proper import of information content model from cache or HF
- warm up for information content model
Other:
- EL PR review comments
* nit
* requirements version update
* fixed docker file
* new home for model_server configs
* default off
* small updates
* YS comments - pt 1
* renaming to chunk_boost & chunk table def
* saving and deleting chunk stats in new table
* saving and updating chunk stats
* improved dict score update
* create columns for individual boost factors
* RK comments
* Update migration
* manual import reordering
* fix oauth downloading and size limits in confluence
* bump black to get past corrupt hash
* try working around another corrupt package
* fix raw_bytes
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* rename agent test script to prevent pytest autodiscovery
* first cut
* fix log message
* fix up typing
* add a sample test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* functional initial auth modal
* k
* k
* k
* looking good
* k
* k
* k
* k
* update
* k
* k
* misc bunch
* improvements
* k
* address comments
* k
* nit
* update
* k
* early work in progress
* rename utility script
* move actual data seeding to a shareable function
* add test
* make the test pass with the fix
* fix comment
* slight improvements and notes to query history and seeding
* update test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* add ingress for api and web
* helm setup docs
* add letsencrypt. close blocks
* use pathType ImplementationSpecific as Prefix is deprecated
* fix backend labels. configure nginx routes. update annotations
* fix linting
---------
Co-authored-by: Sajjad Anwar <sajjadkm@gmail.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* early work in progress
* rename utility script
* move actual data seeding to a shareable function
* add test
* make the test pass with the fix
* fix comment
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* Replaces Amazon and Anthropic icons with versions better suited to both dark and light modes
* Adds icon for DeepSeek
* Simplifies logic on icon selection
* Adds entries for Phi-4, Claude 3.7, Ministral and Gemini 2.0 models
* nit
* k
* k
---------
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
* Update text embedding model to version 005 and enhance embedding retrieval process
* re
* Fix formatting issues
* Add support for Bedrock reranking provider and AWS credentials handling
* fix: improve AWS key format validation and error messages
* Fix vertex embedding model crash
* feat: add environment template for local development setup
* Add display name for Claude 3.7 Sonnet model
* Add display names for Gemini 2.0 models and update Claude 3.7 Sonnet entry
* Fix ruff errors by ensuring lines are within 130 characters
* revert to currently default onyx browser settings
* add / fix boto requirements
---------
Co-authored-by: ferdinand loesch <f.loesch@sportradar.com>
Co-authored-by: Ferdinand Loesch <ferdinandloesch@me.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* fix blowing up the entire task on exception and trying to reuse an invalid db session
* list comprehension
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
A new setting 'is_ephemeral' has been added to the Slack channel configurations.
Key features/effects:
- if is_ephemeral is set for a standard channel (and a Search Assistant is chosen):
  - the answer is only shown to the user as an ephemeral message
  - the user has access to their private documents for a search (as the answer is only shown to them)
  - the user has the ability to share the answer with the channel or keep it private
  - a recipient list cannot be defined if the channel is set up as ephemeral
- if is_ephemeral is set and the user is in a DM with the bot:
  - the user has access to private docs in searches
  - the message is not sent as ephemeral, as it is a 1:1 discussion with the bot
- if is_ephemeral is not set but a recipient list is set:
  - the user's search does *not* have access to their private documents, as the information goes to the recipient list team members, and they may have different access rights
- Overall:
  - Unless the channel is set to is_ephemeral or it is a direct conversation with the bot, only public docs are accessible
  - The ACL is never bypassed, not even in cases where the admin explicitly attached a document set to the bot config.
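The rules above can be sketched as a small decision table. This is an illustration only: `ChannelConfig`, `can_access_private_docs`, and `send_as_ephemeral` are hypothetical names, not the connector's actual code, and the sketch assumes the ACL is still enforced separately on every search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelConfig:
    """Hypothetical model of the two settings discussed above."""

    is_ephemeral: bool = False
    has_recipient_list: bool = False

    def __post_init__(self) -> None:
        # A recipient list cannot be defined for an ephemeral channel.
        if self.is_ephemeral and self.has_recipient_list:
            raise ValueError("recipient list not allowed on an ephemeral channel")


def can_access_private_docs(config: ChannelConfig, is_dm: bool) -> bool:
    # Private docs are searchable only when the answer is seen by the user
    # alone: an ephemeral channel or a direct conversation with the bot.
    return config.is_ephemeral or is_dm


def send_as_ephemeral(config: ChannelConfig, is_dm: bool) -> bool:
    # A DM is already 1:1, so the message is not sent as ephemeral there.
    return config.is_ephemeral and not is_dm
```

For example, a channel with a recipient list and no `is_ephemeral` flag yields `can_access_private_docs(...) == False`, since the answer reaches people who may hold different access rights.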
* print the test name when it runs
* type hints
* can't reuse session after an exception
* better logging
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* first cut at slack oauth flow
* fix usage of hooks
* fix button spacing
* add additional error logging
* no dev redirect
* early cut at google drive oauth
* second pass
* switch to production uri's
* try handling oauth_interactive differently
* pass through client id and secret if uploaded
* fix call
* fix test
* temporarily disable check for testing
* Revert "temporarily disable check for testing"
This reverts commit 4b5a022a5f.
* support visibility in test
* missed file
* first cut at confluence oauth
* work in progress
* work in progress
* work in progress
* work in progress
* work in progress
* first cut at distributed locking
* WIP to make test work
* add some dev mode affordances and gate usage of redis behind dynamic credentials
* mypy and credentials provider fixes
* WIP
* fix created at
* fix setting initialValue on everything
* remove debugging, fix ??? some TextFormField issues
* npm fixes
* comment cleanup
* fix comments
* pin the size of the card section
* more review fixes
* more fixes
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* trying out a fix
* add ability to manually run model tests
* add log dump
* check status code, not text?
* just the model server
* add port mapping to host
* pass through more api keys
* add azure tests
* fix litellm env vars
* fix env vars in github workflow
* temp disable litellm test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* prompt addition for gpt o-series to encourage markdown formatting of code blocks
* fix to match https://simonwillison.net/tags/markdown/
* chris comment
* chris comment
* thread utils respect contextvars now
* address pablo comments
* removed tenant id from places it was already being passed
* fix rate limit check and pablo comment
* WIP
* implement hard timeout
* fix callbacks
* put back the timeout
* missed a file
* fixes
* try installing playwright deps
* Revert "try installing playwright deps"
This reverts commit 4217427568.
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* added timeouts for agent llm calls
* timing suggestions in agent config
* improved timeout that actually exits early
* added new global timeout and connection timeout distinction
* fixed error raising bug and made entity extraction recoverable
* warnings and refactor
* mypy
---------
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* wip checkpointing/continue on failure
more stuff for checkpointing
Basic implementation
FE stuff
More checkpointing/failure handling
rebase
rebase
initial scaffolding for IT
IT to test checkpointing
Cleanup
cleanup
Fix it
Rebase
Add todo
Fix actions IT
Test more
Pagination + fixes + cleanup
Fix IT networking
fix it
* rebase
* Address misc comments
* Address comments
* Remove unused router
* rebase
* Fix mypy
* Fixes
* fix it
* Fix tests
* Add drop index
* Add retries
* reset lock timeout
* Try hard drop of schema
* Add timeout/retries to downgrade
* rebase
* test
* test
* test
* Close all connections
* test closing idle only
* Fix it
* fix
* try using null pool
* Test
* fix
* rebase
* log
* Fix
* apply null pool
* Fix other test
* Fix quality checks
* Test not using the fixture
* Fix ordering
* fix test
* Change pooling behavior
* better propagation of exceptions up the stack
* remove debug testing
* refactor the watchdog more to emit data consistently at the end of the function
* enumerate a lot more terminal statuses
* handle more codes
* improve logging
* handle "-9"
* single line exception logging
* typo/grammar
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* ignore result when using send_task on lightweight tasks
* fix ignore_result
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* no thread local locks in callbacks and raise permission sync timeout by a lot based on empirical log observations
* more fixes
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* move indexing
* all monitor work moved
* reacquire lock more
* remove monitor task completely
* fix import
* fix pruning finalization
* no multiplier on system/cloud tasks
* monitor queues every 30 seconds in the cloud
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* dedupe make_private_persona and update test
* add comment
* comments, and just have duplicate user id's for the test instead of modifying edit
* found the magic word
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* add validation for pruning
* fix missing class
* get external group sync validation working
* backport fix for pruning check
* fix pruning
* log the payload id
* remove scan_iter from pruning
* missed removed scan_iter, also remove other scan_iters and replace with sscan_iter of the lookup table
* external group sync needs active signal.
* log the payload id when the task starts
* log the payload id in more places
* use the replica
* increase primary pool and slow down beat
* scale sql pool based on concurrency
* fix concurrency
* add debugging for external group sync and tenant
* remove debugging and fix payload id
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* WIP
* migrate most beat tasks to fan out strategy
* fix kwargs
* migrate EE tasks
* lock on the task_name level
* typo fix
* transform beat tasks for cloud
* cloud multiplier is only for cloud tasks
* bumpity
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* WIP
* trigger indexing immediately when the ccpair is created
* add some logging and indexing trigger to the mock-credential endpoint
* better comments
* fix integration test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* try adding back some params
* raise timeout
* update chromatic version
* fix typo
* use chromatic imports
* update gitignore
* slim down the config file
* update readme
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* initial commit for helm chart refactoring
* Continue refactoring helm. I was able to use helm to deploy all of the apps to a cluster in aws. The bottleneck was setting up PVC dynamic provisioning.
* use default storage class
* Fix linter errors
* Fix broken helm test
* update
* Helm chart fixes
* remove reference to ebsstorage
* Fix linter errors
---------
Co-authored-by: jpb80 <jordan.buttkevitz@gmail.com>
- summarize history if long
- introduced cited_docs from SQ as those must be provided to answer generations
- limit number of docs
TODO: same for refined flow
* initial commit for helm chart refactoring
* Continue refactoring helm. I was able to use helm to deploy all of the apps to a cluster in aws. The bottleneck was setting up PVC dynamic provisioning.
* use default storage class
* Fix linter errors
* Fix broken helm test
---------
Co-authored-by: jpb80 <jordan.buttkevitz@gmail.com>
* Fix airtable connector w/ mt cloud + move telem logic to match new standard
* Address Greptile comment
* Small fixes/improvements
* Revert back monitoring frequency
* Small monitoring fix
* WIP for external group sync lock fixes
* prototyping permissions validation
* validate permission sync tasks in celery
* mypy
* cleanup and wire off external group sync checks for now
* add active key to reset
* improve logging
* reset on payload format change
* return False on exception
* missed a return
* add count of tasks scanned
* add comment
* better logging
* add return
* more return
* catch payload exceptions
* code review fixes
* push to restart test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* add timings for syncing
* add more logging
* more debugging
* refactor multipass/db check out of VespaIndex
* circular imports?
* more debugging
* add logs
* various improvements
* additional logs to narrow down issue
* use global httpx pool for the main vespa flows in celery. Use in more places eventually.
* cleanup debug logging, etc
* remove debug logging
* this should use the secondary index
* mypy
* missed some logging
* review fixes
* refactor get_default_document_index to use search settings
* more missed logging
* fix circular refs
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: pablodanswer <pablo@danswer.ai>
* Add support for filtering 0xFDD0-0xFDEF Unicode range
- Update remove_invalid_unicode_chars to handle 0xFDD0-0xFDEF range
- Add comprehensive test cases for Unicode character sanitization
- Fix issue with illegal code point 0xFDDB in Vespa indexing
Co-Authored-By: Chris Weaver <chris@onyx.app>
* Remove unused pytest import
Co-Authored-By: Chris Weaver <chris@onyx.app>
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Chris Weaver <chris@onyx.app>
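The 0xFDD0-0xFDEF filtering described above can be sketched as follows. This is a minimal illustration, not the project's `remove_invalid_unicode_chars` implementation: the helper name `strip_fdd0_fdef` is hypothetical, and it handles only this one range (32 code points that Unicode reserves as noncharacters), whereas the real function may cover other ranges as well.

```python
import re

# U+FDD0..U+FDEF are Unicode noncharacters; the commit above reports that
# one of them (0xFDDB) was rejected as an illegal code point during Vespa indexing.
_FDD0_FDEF = re.compile("[\ufdd0-\ufdef]")


def strip_fdd0_fdef(text: str) -> str:
    """Return `text` with every code point in U+FDD0..U+FDEF removed.

    Hypothetical helper for illustration only.
    """
    return _FDD0_FDEF.sub("", text)
```

Neighbouring code points such as U+FDCF and U+FDF0 fall outside the range and are left untouched.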
* feat: add option to treat all non-attachment fields as metadata in Airtable connector
- Added new UI option 'treat_all_non_attachment_fields_as_metadata'
- Updated backend logic to support treating all fields except attachments as metadata
- Added tests for both default and all-metadata behaviors
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: handle missing environment variables gracefully in airtable tests
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: clean up test file and handle environment variables properly
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add missing test fixture and fix formatting
Co-Authored-By: Chris Weaver <chris@onyx.app>
* chore: fix black formatting
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add type annotation for metadata dict in airtable tests
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add type annotation for mock_get_api_key fixture
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: update Generator import to use collections.abc
Co-Authored-By: Chris Weaver <chris@onyx.app>
* refactor: make treat_all_non_attachment_fields_as_metadata a direct required parameter
- Move parameter from connector_config to direct class parameter
- Place parameter right under table_name_or_id argument
- Make parameter required in UI with no default value
- Update tests to use new parameter structure
Co-Authored-By: Chris Weaver <chris@onyx.app>
* chore: fix black formatting
Co-Authored-By: Chris Weaver <chris@onyx.app>
* chore: rename _METADATA_FIELD_TYPES to DEFAULT_METADATA_FIELD_TYPES and clarify usage
Co-Authored-By: Chris Weaver <chris@onyx.app>
* chore: fix black formatting in docstring
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: make airtable tests fail loudly on missing env vars
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: fix black formatting in test file
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: add required newline between test functions
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: update error message pattern in parameter validation test
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: fix black formatting in test file
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: fix error message pattern in parameter validation test
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: fix line length in test file
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: simplify error message pattern in parameter validation test
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: add type validation test for treat_all_non_attachment_fields_as_metadata
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add missing required parameter in test
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: remove parameter from test to properly validate it is required
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add type validation for treat_all_non_attachment_fields_as_metadata parameter
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: fix black formatting in airtable_connector.py
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: update type validation test to handle mypy errors
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: specify mypy ignore type for call-arg
Co-Authored-By: Chris Weaver <chris@onyx.app>
* Also handle rows w/o sections
* style: fix black formatting in test assertion
Co-Authored-By: Chris Weaver <chris@onyx.app>
* add TODO
* Remove unnecessary check
* Fix test
* Do not break existing airtable connectors
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Chris Weaver <chris@onyx.app>
Co-authored-by: Weves <chrisweaver101@gmail.com>
* try using a redis replica in some areas
* harden up replica usage
* comment
* slow down cloud dispatch temporarily
* add ignored syncing list back
* raise multiplier to 8
* comment out per tenant code (no longer used by fanout)
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* WIP
* migrate most beat tasks to fan out strategy
* fix kwargs
* migrate EE tasks
* lock on the task_name level
* typo fix
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* cloud check for migrations
* fix table declaration
* change back interval
* Fix usage of POSTGRES_DEFAULT_SCHEMA
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* signal from the watchdog so that the monitor task doesn't try to clean up before it can exit
* ttl constants
* improve comment
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* Added ability to use a tag to insert the current datetime in prompts
* made tagging logic more robust
* rename
* k
---------
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
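A rough sketch of what a datetime tag in a prompt could look like. Everything here is an assumption made for illustration: the tag text `[[CURRENT_DATETIME]]`, the function name, and the ISO 8601 output format are not taken from the commits above.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical tag; the real tag name is not stated in the commit message.
DATETIME_TAG = "[[CURRENT_DATETIME]]"


def insert_current_datetime(prompt: str, now: Optional[datetime] = None) -> str:
    """Replace every occurrence of DATETIME_TAG with the current datetime.

    `now` can be injected so the behaviour is deterministic in tests.
    """
    if DATETIME_TAG not in prompt:
        return prompt  # nothing to substitute
    current = now or datetime.now(timezone.utc)
    return prompt.replace(DATETIME_TAG, current.isoformat(timespec="minutes"))
```

Passing `now` explicitly keeps the substitution testable; in normal use the argument is omitted and the current UTC time is used.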
* Various fixes/improvements to document counting
* Add new column + index
* Avoid double scan
* comment fixes
* Fix revision history
* Fix IT
* Fix IT
* Fix migration
* Rebase
* Made copy button and cmd+c work for cmd+v and cmd+shift+v
* made sub selections work as well
* ok it works
* fixed npm run build
* im not from earth
* added logging
* more logging
* bye logs
* should work now
* whoops
* added stuff
* made it robust
* ctrl shift v behavior
* WIP
* WIP
* try spinning out check for indexing into a system task
* check for the correct delimiter
* use constants
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Combined Persona and Prompt API
* quality
* added tests
* consolidated models and got rid of redundant fields
* tenant appreciation day
* reverted default
* added missing dependency, missing api key placeholder, updated docs
* Apply black formatting and validate bot token functionality
* acknowledging black formatting
* added the validation to update tokens as well
* Made the token validation errors look nicer
* getting rid of duplicate dependency
* testing some tweaks based on issues seen with okteto
* shorten session usage in indexing. still a couple of long running sessions to clean up
* merge sessions
* fixing detached session issues
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* prototype tools for handling prod issues
* add some commands
* add batching and dry run options
* custom redis tool
* comment
* default to app config settings for redis
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* add index to speed up get last attempt
* use descending order
* put back unique param
* how did this not get formatted?
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* more debugging
* test reacquire outside of loop
* more logging
* move lock_beat test outside the try catch so that we don't worry about testing locks we never took
* use a larger scan_iter value for performance
* batch stale document sync batches
* add debug logging for a particular timeout issue
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* Added Permission Syncing for Salesforce
* cleanup
* updated connector doc conversion
* finished salesforce permission syncing
* fixed connector to batch Salesforce queries
* tests!
* k
* Added error handling and check for ee and sync type for postprocessing
* comments
* minor touchups
* tested to work!
* done
* my pie
* lil cleanup
* minor comment
* discord: frontend and backend poll connector
* added requirements for discord installation
* fixed the mypy errors
* process messages not part of any thread
* minor change
* updated the connector; this logic works and I am able to see docs when I print
* minor change
* ability to enter a start date to pull docs from and refactor
* added the load connector and fixed mypy errors
* local commit test
done!
* minor refactor and properly commented everything
* updated the logic to handle permissions and index active/archived threads
* basic discord test template
* cleanup
* going away with the danswer discord client class ; using an async context manager
* moved to proper folder
* minor fixes
* needs improvement
* fixed discord icon
---------
Co-authored-by: hagen-danswer <hagen@danswer.ai>
- renamed post-reranking/validation citation information consistently to final_... (example: doc_id_to_rank_map -> final_doc_id_to_rank_map)
- changed and renamed objects containing initial ranking information (now: display_...) consistent with final rankings (final_...). Specifically, {} to [] for displayed_search_results
- for CitationInfo, changed citation_num from 'x-th citation in response stream' to the initial position of the doc [NOTE: test implications]
- changed tests:
onyx/backend/tests/unit/onyx/chat/stream_processing/test_citation_processing.py
onyx/backend/tests/unit/onyx/chat/stream_processing/test_citation_substitution.py
* re-prep user group deletion on the actual deletion
* user group needs to be synced to be prepped
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* improve model server logging
* improve exception logging with provider/model names
* get everything into one log line
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* try fixing exception in cloud
* raise beat expiry ... 60 seconds might be starving certain tasks completely
* adjust expiry down to 10 min
* raise concurrency overflow for indexing worker.
* parent pid check
* fix comment
* fix parent pid check, also actually raise an exception from the task if the spawned task exit status is bad
* fix pid check
* some cleanup and task wait fixes
* review fixes
* comment some code so we don't change too many things at once
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* old oauth file left behind
* fix function change that was lost in merge
* fix some testing vars
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* associating credentials with connectors is not considered editing
* formatting
* formatting
* Update credentials.py
---------
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
* temporarily disabling validate indexing fences
* add back a few startup checks in the cloud
* use common vespa client to perform health check
* log vespa url and try using http1 on light worker index methods
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* k
* functional iam auth
* k
* k
* improve typing
* add deployment options
* cleanup
* quick clean up
* minor cleanup
* additional clarity for db session operations
* nit
* k
* k
* update configs
* docker compose spacing
* allow beat tasks to expire. it isn't important that they all run
* validate fences are in a good state and cancel/fail them if not
* add function timings for important beat tasks
* optimize lookups, add lots of comments
* review changes
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* first cut at slack oauth flow
* fix usage of hooks
* fix button spacing
* add additional error logging
* no dev redirect
* early cut at google drive oauth
* second pass
* switch to production uri's
* try handling oauth_interactive differently
* pass through client id and secret if uploaded
* fix call
* fix test
* temporarily disable check for testing
* Revert "temporarily disable check for testing"
This reverts commit 4b5a022a5f.
* support visibility in test
* missed file
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* Mismatch issue of Documents shown and Citation number in text fix
When document order presented to LLM differs from order shown to user, wrong doc numbers are cited.
Fix:
- SearchTool.get_search_result now returns both the final and the initial ranking
- initial ranking is passed through a few objects and used for replacement in citation processing
Notes:
- the citation_num in the CitationInfo() object has not been changed.
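The substitution idea in this fix can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the PR: the function name, the `[n]` citation syntax, and the two dictionary parameters (doc id to 1-based rank) are all hypothetical.

```python
import re


def substitute_citations(
    text: str,
    final_doc_id_to_rank: dict[str, int],
    display_doc_id_to_rank: dict[str, int],
) -> str:
    """Rewrite [n] citations from LLM (final) order to displayed (initial) order."""
    # Invert the final ranking so a cited number can be traced back to a doc id.
    final_rank_to_doc_id = {rank: doc_id for doc_id, rank in final_doc_id_to_rank.items()}

    def replace(match: re.Match) -> str:
        doc_id = final_rank_to_doc_id.get(int(match.group(1)))
        display_rank = display_doc_id_to_rank.get(doc_id) if doc_id is not None else None
        # Leave unknown citation numbers untouched rather than guessing.
        return match.group(0) if display_rank is None else f"[{display_rank}]"

    # A single pass over the text avoids re-substituting already rewritten numbers.
    return re.sub(r"\[(\d+)\]", replace, text)
```

For example, if the LLM saw document `a` first and `b` second while the user sees `b` first, the text `See [1] and [2].` becomes `See [2] and [1].`.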
* PR fixes
- linting
- removed erroneous tab
- added a substitution test case
- adjusted original citation extraction use case
* Included a key test and
* Fixed extra spaces
* Updated test documentation
Updated:
- test_citation_substitution (changed description)
- test_citation_processing (removed data only relevant for the substitution)
* better handling around index attempts that don't exist and remove unnecessary index attempt deletions
* don't delete index attempts, just update them
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* change text and formatting to guide users away from thinking "Back to Danswer" is a back button
* regular text color and different icon
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* More logging for external group syncing
* Fixed edge case where some spaces were not being fetched
* made refresh frequency for confluence syncs configurable
* clarity
* first cut at slack oauth flow
* fix usage of hooks
* fix button spacing
* add additional error logging
* no dev redirect
* cleanup
* comment work in progress
* move some stuff to ee, add some playwright tests for the oauth callback edge cases
* fix ee, fix test name
* fix tests
* code review fixes
if [[ ! "$PR_TITLE" =~ ^(feat|fix|docs|test|ci|refactor|perf|chore|revert|build)(\(.+\))?:\ .+ ]]; then
echo "::error::❌ Your PR title does not follow the Conventional Commits format.
This check ensures that all pull requests use clear, consistent titles that help automate changelogs and improve project history.
Please update your PR title to follow the Conventional Commits style.
Here is a link to a blog explaining the reason why we've included the Conventional Commits style into our PR titles: https://xfuture-blog.com/working-with-conventional-commits
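For checking a title locally before opening a PR, the same pattern can be exercised in Python. This is a minimal sketch: the regular expression is copied from the shell condition above, while the function name `is_conventional_title` is hypothetical and not part of the workflow.

```python
import re

# Same pattern as the shell check: type, optional "(scope)", then ": " and a description.
_TITLE_RE = re.compile(
    r"^(feat|fix|docs|test|ci|refactor|perf|chore|revert|build)(\(.+\))?: .+"
)


def is_conventional_title(title: str) -> bool:
    """Return True if `title` starts with a Conventional Commits style prefix."""
    return _TITLE_RE.match(title) is not None
```

A title such as `feat(connectors): add Outline connector` passes, while `Add Outline connector` or `feat:missing space` does not.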
- [Nginx](https://nginx.org/) (Not needed for development flows generally)
> **Note:**
> This guide provides instructions to build and run Danswer locally from source with Docker containers providing the above external software. We believe this combination is easier for
> development purposes. If you prefer to use pre-built container images, we provide instructions on running the full Danswer stack within Docker below.
> This guide provides instructions to build and run Onyx locally from source with Docker containers providing the above external software. We believe this combination is easier for
> development purposes. If you prefer to use pre-built container images, we provide instructions on running the full Onyx stack within Docker below.
### Local Set Up
Be sure to use Python version 3.11. For instructions on installing Python 3.11 on macOS, refer to the [CONTRIBUTING_MACOS.md](./CONTRIBUTING_MACOS.md) readme.
If using a lower version, modifications will have to be made to the code.
If using a higher version, some libraries may not be available (e.g. we had problems with TensorFlow in the past with higher versions of Python).
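If you want a quick way to confirm which case applies to your interpreter, a small check along these lines can help. It is only a sketch; the function name and the returned labels are made up for illustration and are not part of the repository.

```python
import sys

REQUIRED = (3, 11)  # the version this guide asks for


def python_version_status(version=None) -> str:
    """Classify a (major, minor, ...) version tuple against REQUIRED.

    Defaults to the running interpreter when `version` is omitted.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor == REQUIRED:
        return "ok"
    return "too old" if major_minor < REQUIRED else "too new"


if __name__ == "__main__":
    print(python_version_status())
```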
#### Backend: Python requirements
Currently, we use pip and recommend creating a virtual environment.
For convenience here's a command for it:
```bash
python -m venv .venv
source .venv/bin/activate
```
> **Note:**
> This virtual environment MUST NOT be set up WITHIN the danswer directory if you plan on using mypy within certain IDEs.
> For simplicity, we recommend setting up the virtual environment outside of the danswer directory.
> This virtual environment MUST NOT be set up WITHIN the onyx directory if you plan on using mypy within certain IDEs.
> For simplicity, we recommend setting up the virtual environment outside of the onyx directory.
_For Windows, activate the virtual environment using Command Prompt:_
```bash
.venv\Scripts\activate
```
If using PowerShell, the command slightly differs:
_For Windows (for compatibility with both PowerShell and Command Prompt):_
```bash
powershell -Command "
$env:AUTH_TYPE='disabled'
uvicorn danswer.main:app --reload --port 8080
uvicorn onyx.main:app --reload --port 8080
"
```
@@ -182,57 +246,32 @@ You should now have 4 servers running:
- Model server
- Background jobs
Now, visit `http://localhost:3000` in your browser. You should see the Danswer onboarding wizard where you can connect your external LLM provider to Danswer.
Now, visit `http://localhost:3000` in your browser. You should see the Onyx onboarding wizard where you can connect your external LLM provider to Onyx.
You've successfully set up a local Danswer instance! 🏁
You've successfully set up a local Onyx instance! 🏁
#### Running the Danswer application in a container
#### Running the Onyx application in a container
You can run the full Danswer application stack from pre-built images including all external software dependencies.
You can run the full Onyx application stack from pre-built images including all external software dependencies.
Navigate to `danswer/deployment/docker_compose` and run:
Navigate to `onyx/deployment/docker_compose` and run:
```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d
docker compose -f docker-compose.dev.yml -p onyx-stack up -d
```
After Docker pulls and starts these containers, navigate to `http://localhost:3000` to use Danswer.
After Docker pulls and starts these containers, navigate to `http://localhost:3000` to use Onyx.
If you want to make changes to Danswer and run those changes in Docker, you can also build a local version of the Danswer container images that incorporates your changes like so:
If you want to make changes to Onyx and run those changes in Docker, you can also build a local version of the Onyx container images that incorporates your changes like so:
```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d --build
docker compose -f docker-compose.dev.yml -p onyx-stack up -d --build
```
### Formatting and Linting
#### Backend
For the backend, you'll need to set up pre-commit hooks (black / reorder-python-imports).
First, install pre-commit (if you don't have it already) following the instructions
[here](https://pre-commit.com/#installation).
With the virtual environment active, install the pre-commit library with:
```bash
pip install pre-commit
```
Then, from the `danswer/backend` directory, run:
```bash
pre-commit install
```
Additionally, we use `mypy` for static type checking.
Danswer is fully type-annotated, and we want to keep it that way!
To run the mypy checks manually, run `python -m mypy .` from the `danswer/backend` directory.
#### Web
We use `prettier` for formatting. The desired version (2.8.8) will be installed via `npm i` from the `danswer/web` directory.
To run the formatter, use `npx prettier --write .` from the `danswer/web` directory.
Please double-check that prettier passes before creating a pull request.
### Release Process
Danswer loosely follows the SemVer versioning standard.
Onyx loosely follows the SemVer versioning standard.
Major changes are released with a "minor" version bump. Currently we use patch release versions to indicate small feature changes.
A set of Docker containers will be pushed automatically to DockerHub with every tag.
You can see the containers [here](https://hub.docker.com/search?q=danswer%2F).
You can see the containers [here](https://hub.docker.com/search?q=onyx%2F).
The base instructions to set up the development environment are located in [CONTRIBUTING.md](https://github.com/danswer-ai/danswer/blob/main/CONTRIBUTING.md).
The base instructions to set up the development environment are located in [CONTRIBUTING.md](https://github.com/onyx-dot-app/onyx/blob/main/CONTRIBUTING.md).
### Setting up Python
Ensure [Homebrew](https://brew.sh/) is already set up.
Then install python 3.11.
```bash
brew install python@3.11
```
Add python 3.11 to your path: add the following line to ~/.zshrc
@@ -2,9 +2,9 @@ Copyright (c) 2023-present DanswerAI, Inc.
Portions of this software are licensed as follows:
* All content that resides under "ee" directories of this repository, if that directory exists, is licensed under the license defined in "backend/ee/LICENSE". Specifically all content under "backend/ee" and "web/src/app/ee" is licensed under the license defined in "backend/ee/LICENSE".
* All third party components incorporated into the Danswer Software are licensed under the original license provided by the owner of the applicable component.
* Content outside of the above mentioned directories or restrictions above is available under the "MIT Expat" license as defined below.
- All content that resides under "ee" directories of this repository, if that directory exists, is licensed under the license defined in "backend/ee/LICENSE". Specifically all content under "backend/ee" and "web/src/app/ee" is licensed under the license defined in "backend/ee/LICENSE".
- All third party components incorporated into the Onyx Software are licensed under the original license provided by the owner of the applicable component.
- Content outside of the above mentioned directories or restrictions above is available under the "MIT Expat" license as defined below.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
**To try it out for free and get started in seconds, check out [Onyx Cloud](https://cloud.onyx.app/signup)**.
Danswer can easily be run locally (even on a laptop) or deployed on a virtual machine with a single
`docker compose` command. Check out our [docs](https://docs.danswer.dev/quickstart) to learn more.
Onyx can also be run locally (even on a laptop) or deployed on a virtual machine with a single
`docker compose` command. Check out our [docs](https://docs.onyx.app/deployment/getting_started/quickstart) to learn more.
We also have built-in support for deployment on Kubernetes. Files for that can be found [here](https://github.com/danswer-ai/danswer/tree/main/deployment/kubernetes).
We also have built-in support for high-availability/scalable deployment on Kubernetes.
* Chat UI with the ability to select documents to chat with.
* Create custom AI Assistants with different prompts and backing knowledge sets.
* Connect Danswer with LLM of your choice (self-host for a fully airgapped solution).
* Document Search + AI Answers for natural language queries.
* Connectors to all common workplace tools like Google Drive, Confluence, Slack, etc.
* Slack integration to get answers and search results directly in Slack.
## 🔍 Other Notable Benefits of Onyx
- Custom deep learning models for indexing and inference time, only through Onyx + learning from user feedback.
- Flexible security features like SSO (OIDC/SAML/OAuth2), RBAC, encryption of credentials, etc.
- Knowledge curation features like document-sets, query history, usage analytics, etc.
- Scalable deployment options tested up to many tens of thousands of users and hundreds of millions of documents.
## 🚧 Roadmap
* Chat/Prompt sharing with specific teammates and user groups.
* Multimodal model support, chat with images, video etc.
* Choosing between LLMs and parameters during chat session.
* Tool calling and agent configuration options.
* Organizational understanding and ability to locate and suggest experts from your team.
## Other Notable Benefits of Danswer
* User Authentication with document level access management.
* Best in class Hybrid Search across all sources (BM-25 + prefix aware embedding models).
* Admin Dashboard to configure connectors, document-sets, access, etc.
* Custom deep learning models + learn from user feedback.
* Easy deployment and ability to host Danswer anywhere of your choosing.
- New methods in information retrieval (StructRAG, LightGraphRAG, etc.)
- Personalized Search
- Organizational understanding and ability to locate and suggest experts from your team.
- Code Search
- SQL and Structured Query Language
## 🔌 Connectors
Efficiently pulls the latest changes from:
* Slack
* GitHub
* Google Drive
* Confluence
* Jira
* Zendesk
* Gmail
* Notion
* Gong
* Slab
* Linear
* Productboard
* Guru
* Bookstack
* Document360
* Sharepoint
* Hubspot
* Local Files
* Websites
* And more ...
Keep knowledge and access in sync across 40+ connectors:
## 📚 Editions
- Google Drive
- Confluence
- Slack
- Gmail
- Salesforce
- Microsoft Sharepoint
- Github
- Jira
- Zendesk
- Gong
- Microsoft Teams
- Dropbox
- Local Files
- Websites
- And more ...
There are two editions of Danswer:
See the full list [here](https://docs.onyx.app/admin/connectors/overview).
* Danswer Community Edition (CE) is available freely under the MIT Expat license. This version has ALL the core features discussed above. This is the version of Danswer you will get if you follow the Deployment guide above.
* Danswer Enterprise Edition (EE) includes extra features that are primarily useful for larger organizations. Specifically, this includes:
* Single Sign-On (SSO), with support for both SAML and OIDC
* Role-based access control
* Document permission inheritance from connected sources
* Usage analytics and query history accessible to admins
* Whitelabeling
* API key authentication
* Encryption of secrets
* And many more! Check out [our website](https://www.danswer.ai/) for the latest.
To try the Danswer Enterprise Edition:
## 📚 Licensing
There are two editions of Onyx:
- Onyx Community Edition (CE) is available freely under the MIT Expat license. Simply follow the Deployment guide above.
- Onyx Enterprise Edition (EE) includes extra features that are primarily useful for larger organizations.
For feature details, check out [our website](https://www.onyx.app/pricing).
2. For self-hosting the Enterprise Edition, contact us at [founders@onyx.app](mailto:founders@onyx.app) or book a call with us on our [Cal](https://cal.com/team/onyx/founders).
2. For self-hosting, contact us at [founders@danswer.ai](mailto:founders@danswer.ai) or book a call with us on our [Cal](https://cal.com/team/danswer/founders).
## 💡 Contributing
Looking to contribute? Please check out the [Contribution Guide](CONTRIBUTING.md) for more details.
## ⭐Star History
[](https://star-history.com/#danswer-ai/danswer&Date)
Some files were not shown because too many files have changed in this diff