* use slack's built in rate limit handler for the bot
* WIP
* fix the slack rate limit handler
* change default to 8
* cleanup
* try catch int conversion just in case
* linearize this logic better
* code review comments
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
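The "change default to 8" and "try catch int conversion just in case" items above can be illustrated with a minimal sketch. The helper name `parse_int_setting` is hypothetical, and what the value 8 actually controls is not stated in the commit messages; only the default and the guarded conversion come from the log.

```python
from typing import Optional

DEFAULT_VALUE = 8  # default named in the commit message; its exact meaning is an assumption


def parse_int_setting(raw: Optional[str], default: int = DEFAULT_VALUE) -> int:
    """Convert a raw setting string to int, falling back to the default on bad input."""
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        # A malformed value should not take the bot down; use the default instead.
        return default
```

Guarding the conversion keeps a bad configuration value from raising at startup, which appears to be the intent of the "just in case" commit.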
* new mit integration test template
* edit
* fix problem with ACL type tags and MIT testing for test_connector_deletion
* fix test_connector_deletion_for_overlapping_connectors
* disable some enterprise only tests in MIT version
* disable a bunch of user group / curator tests in MIT version
* wire off more tests
* typo fix
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* fix acl prefixing
* increase timeout a tad
* block access to init'ing DocumentAccess directly, fix test to work with ee/MIT
* fix env var checks
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* refactor file extension checking and add test for blob s3
* code review
* fix checking ext
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* possible fix for confluence query filter
* nuke the attachment filter query ... it doesn't work!
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* fix issue with drive connector service account indexing
* correct checkpoint resumption
* final set of fixes
* nit
* fix typing
* logging and CW comments
* nit
* wire off image downloading for confluence and gdrive if not enabled in settings
* fix partial func
* fix confluence basic test
* add test for skipping/allowing images
* review comments
* skip allow images test
* mock function using the db
* mock at the proper level
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* sanitize llm keys and handle updates properly
* fix llm provider testing
* fix test
* mypy
* fix default model editing
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Checkpointed Jira connector
* nit
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* typing improvements and test fixes
* cleaner typing
* remove default because it is from the future
* mypy
* Address EL comments
---------
Co-authored-by: evan-danswer <evan@danswer.ai>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* work in progress
* work in progress
* WIP
* refactor, use inline attachment for image (base64 encoding doesn't work)
* pretty sure this belongs behind a multi_tenant check
* code review / refactor
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* remove title for slack
* initial working code
* simplification
* improvements
* name change to information_content_model
* avoid boost_score > 1.0
* nit
* EL comments and improvements
Improvements:
- proper import of information content model from cache or HF
- warm up for information content model
Other:
- EL PR review comments
* nit
* requirements version update
* fixed docker file
* new home for model_server configs
* default off
* small updates
* YS comments - pt 1
* renaming to chunk_boost & chunk table def
* saving and deleting chunk stats in new table
* saving and updating chunk stats
* improved dict score update
* create columns for individual boost factors
* RK comments
* Update migration
* manual import reordering
* fix oauth downloading and size limits in confluence
* bump black to get past corrupt hash
* try working around another corrupt package
* fix raw_bytes
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* rename agent test script to prevent pytest autodiscovery
* first cut
* fix log message
* fix up typing
* add a sample test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* functional initial auth modal
* k
* k
* k
* looking good
* k
* k
* k
* k
* update
* k
* k
* misc bunch
* improvements
* k
* address comments
* k
* nit
* update
* k
* early work in progress
* rename utility script
* move actual data seeding to a shareable function
* add test
* make the test pass with the fix
* fix comment
* slight improvements and notes to query history and seeding
* update test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* add ingress for api and web
* helm setup docs
* add letsencrypt. close blocks
* use pathType ImplementationSpecific as Prefix is deprecated
* fix backend labels. configure nginx routes. update annotations
* fix linting
---------
Co-authored-by: Sajjad Anwar <sajjadkm@gmail.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* early work in progress
* rename utility script
* move actual data seeding to a shareable function
* add test
* make the test pass with the fix
* fix comment
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* Replaces Amazon and Anthropic icons with versions better suited to both dark and light modes;
* Adds icon for DeepSeek;
* Simplify logic on icon selection;
* Adds entries for Phi-4, Claude 3.7, Ministral and Gemini 2.0 models
* nit
* k
* k
---------
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
* Update text embedding model to version 005 and enhance embedding retrieval process
* re
* Fix formatting issues
* Add support for Bedrock reranking provider and AWS credentials handling
* fix: improve AWS key format validation and error messages
* Fix vertex embedding model crash
* feat: add environment template for local development setup
* Add display name for Claude 3.7 Sonnet model
* Add display names for Gemini 2.0 models and update Claude 3.7 Sonnet entry
* Fix ruff errors by ensuring lines are within 130 characters
* revert to currently default onyx browser settings
* add / fix boto requirements
---------
Co-authored-by: ferdinand loesch <f.loesch@sportradar.com>
Co-authored-by: Ferdinand Loesch <ferdinandloesch@me.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* fix blowing up the entire task on exception and trying to reuse an invalid db session
* list comprehension
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
A new setting 'is_ephemeral' has been added to the Slack channel configurations.
Key features/effects:
- if is_ephemeral is set for a standard channel (and a Search Assistant is chosen):
  - the answer is only shown to the user as an ephemeral message
  - the user has access to their private documents for a search (as the answer is only shown to them)
  - the user can share the answer with the channel or keep it private
  - a recipient list cannot be defined if the channel is set up as ephemeral
- if is_ephemeral is set and the user is in a DM with the bot:
  - the user has access to private docs in searches
  - the message is not sent as ephemeral, as it is a 1:1 discussion with the bot
- if is_ephemeral is not set but a recipient list is set:
  - the user's search does *not* have access to their private documents, as the information goes to the recipient list team members, who may have different access rights
- Overall:
  - Unless the channel is set to is_ephemeral or it is a direct conversation with the bot, only public docs are accessible
  - The ACL is never bypassed, not even in cases where the admin explicitly attached a document set to the bot config.
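The is_ephemeral rules above can be condensed into a small decision function. This is a sketch under stated assumptions: `ResponsePolicy`, `resolve_policy`, and the boolean parameters are hypothetical names, not the project's API, and the case of a DM that also has a recipient list is not specified in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponsePolicy:
    include_private_docs: bool  # may the search see the user's private documents?
    send_as_ephemeral: bool     # is the answer posted as an ephemeral message?


def resolve_policy(is_ephemeral: bool, is_dm: bool, has_recipient_list: bool) -> ResponsePolicy:
    """Derive the response policy from the channel configuration (hypothetical helper)."""
    if is_ephemeral and has_recipient_list:
        # The description says a recipient list cannot be defined for an ephemeral channel.
        raise ValueError("a recipient list cannot be combined with is_ephemeral")
    # Private docs are only reachable when the answer is seen by the asking user alone.
    # Assumption: a DM keeps this access even if a recipient list exists (unspecified in the source).
    include_private = is_ephemeral or is_dm
    # In a DM the conversation is already 1:1, so the message is not sent as ephemeral.
    send_ephemeral = is_ephemeral and not is_dm
    return ResponsePolicy(include_private, send_ephemeral)
```

Note that this only decides which documents are candidates; as stated above, the ACL itself is never bypassed.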
* print the test name when it runs
* type hints
* can't reuse session after an exception
* better logging
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* first cut at slack oauth flow
* fix usage of hooks
* fix button spacing
* add additional error logging
* no dev redirect
* early cut at google drive oauth
* second pass
* switch to production uri's
* try handling oauth_interactive differently
* pass through client id and secret if uploaded
* fix call
* fix test
* temporarily disable check for testing
* Revert "temporarily disable check for testing"
This reverts commit 4b5a022a5f.
* support visibility in test
* missed file
* first cut at confluence oauth
* work in progress
* work in progress
* work in progress
* work in progress
* work in progress
* first cut at distributed locking
* WIP to make test work
* add some dev mode affordances and gate usage of redis behind dynamic credentials
* mypy and credentials provider fixes
* WIP
* fix created at
* fix setting initialValue on everything
* remove debugging, fix ??? some TextFormField issues
* npm fixes
* comment cleanup
* fix comments
* pin the size of the card section
* more review fixes
* more fixes
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* trying out a fix
* add ability to manually run model tests
* add log dump
* check status code, not text?
* just the model server
* add port mapping to host
* pass through more api keys
* add azure tests
* fix litellm env vars
* fix env vars in github workflow
* temp disable litellm test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* prompt addition for gpt o-series to encourage markdown formatting of code blocks
* fix to match https://simonwillison.net/tags/markdown/
* chris comment
* chris comment
* thread utils respect contextvars now
* address pablo comments
* removed tenant id from places it was already being passed
* fix rate limit check and pablo comment
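For the "thread utils respect contextvars now" item above, a minimal sketch of the general technique follows. The names `CURRENT_TENANT`, `read_tenant`, and `run_in_thread_with_context` are hypothetical and do not reflect the project's actual thread utilities; the sketch only shows how a caller's context can be carried into a worker thread with the standard library.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical context variable standing in for per-request state such as a tenant id.
CURRENT_TENANT = contextvars.ContextVar("current_tenant", default="public")


def read_tenant() -> str:
    """Return the tenant visible in the current context."""
    return CURRENT_TENANT.get()


def run_in_thread_with_context(func, *args):
    """Run func in a worker thread using a copy of the caller's context."""
    ctx = contextvars.copy_context()  # snapshot taken in the calling thread
    with ThreadPoolExecutor(max_workers=1) as pool:
        # ctx.run executes func with the snapshot active inside the worker thread.
        return pool.submit(ctx.run, func, *args).result()
```

With this pattern a value such as a tenant id no longer has to be passed explicitly to every function that runs in a thread, which fits the neighbouring "removed tenant id from places it was already being passed" item.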
* WIP
* implement hard timeout
* fix callbacks
* put back the timeout
* missed a file
* fixes
* try installing playwright deps
* Revert "try installing playwright deps"
This reverts commit 4217427568.
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* added timeouts for agent llm calls
* timing suggestions in agent config
* improved timeout that actually exits early
* added new global timeout and connection timeout distinction
* fixed error raising bug and made entity extraction recoverable
* warnings and refactor
* mypy
---------
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* wip checkpointing/continue on failure
more stuff for checkpointing
Basic implementation
FE stuff
More checkpointing/failure handling
rebase
rebase
initial scaffolding for IT
IT to test checkpointing
Cleanup
cleanup
Fix it
Rebase
Add todo
Fix actions IT
Test more
Pagination + fixes + cleanup
Fix IT networking
fix it
* rebase
* Address misc comments
* Address comments
* Remove unused router
* rebase
* Fix mypy
* Fixes
* fix it
* Fix tests
* Add drop index
* Add retries
* reset lock timeout
* Try hard drop of schema
* Add timeout/retries to downgrade
* rebase
* test
* test
* test
* Close all connections
* test closing idle only
* Fix it
* fix
* try using null pool
* Test
* fix
* rebase
* log
* Fix
* apply null pool
* Fix other test
* Fix quality checks
* Test not using the fixture
* Fix ordering
* fix test
* Change pooling behavior
* better propagation of exceptions up the stack
* remove debug testing
* refactor the watchdog more to emit data consistently at the end of the function
* enumerate a lot more terminal statuses
* handle more codes
* improve logging
* handle "-9"
* single line exception logging
* typo/grammar
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* ignore result when using send_task on lightweight tasks
* fix ignore_result
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* no thread local locks in callbacks and raise permission sync timeout by a lot based on empirical log observations
* more fixes
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* move indexing
* all monitor work moved
* reacquire lock more
* remove monitor task completely
* fix import
* fix pruning finalization
* no multiplier on system/cloud tasks
* monitor queues every 30 seconds in the cloud
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* dedupe make_private_persona and update test
* add comment
* comments, and just have duplicate user id's for the test instead of modifying edit
* found the magic word
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* add validation for pruning
* fix missing class
* get external group sync validation working
* backport fix for pruning check
* fix pruning
* log the payload id
* remove scan_iter from pruning
* missed removed scan_iter, also remove other scan_iters and replace with sscan_iter of the lookup table
* external group sync needs active signal
* log the payload id when the task starts
* log the payload id in more places
* use the replica
* increase primary pool and slow down beat
* scale sql pool based on concurrency
* fix concurrency
* add debugging for external group sync and tenant
* remove debugging and fix payload id
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* WIP
* migrate most beat tasks to fan out strategy
* fix kwargs
* migrate EE tasks
* lock on the task_name level
* typo fix
* transform beat tasks for cloud
* cloud multiplier is only for cloud tasks
* bumpity
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* WIP
* trigger indexing immediately when the ccpair is created
* add some logging and indexing trigger to the mock-credential endpoint
* better comments
* fix integration test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* try adding back some params
* raise timeout
* update chromatic version
* fix typo
* use chromatic imports
* update gitignore
* slim down the config file
* update readme
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* initial commit for helm chart refactoring
* Continue refactoring helm. I was able to use helm to deploy all of the apps to a cluster in aws. The bottleneck was setting up PVC dynamic provisioning.
* use default storage class
* Fix linter errors
* Fix broken helm test
* update
* Helm chart fixes
* remove reference to ebsstorage
* Fix linter errors
---------
Co-authored-by: jpb80 <jordan.buttkevitz@gmail.com>
- summarize history if long
- introduced cited_docs from SQ as those must be provided to answer generations
- limit number of docs
TODO: same for refined flow
* initial commit for helm chart refactoring
* Continue refactoring helm. I was able to use helm to deploy all of the apps to a cluster in aws. The bottleneck was setting up PVC dynamic provisioning.
* use default storage class
* Fix linter errors
* Fix broken helm test
---------
Co-authored-by: jpb80 <jordan.buttkevitz@gmail.com>
* Fix airtable connector w/ mt cloud + move telem logic to match new standard
* Address Greptile comment
* Small fixes/improvements
* Revert back monitoring frequency
* Small monitoring fix
* WIP for external group sync lock fixes
* prototyping permissions validation
* validate permission sync tasks in celery
* mypy
* cleanup and wire off external group sync checks for now
* add active key to reset
* improve logging
* reset on payload format change
* return False on exception
* missed a return
* add count of tasks scanned
* add comment
* better logging
* add return
* more return
* catch payload exceptions
* code review fixes
* push to restart test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* add timings for syncing
* add more logging
* more debugging
* refactor multipass/db check out of VespaIndex
* circular imports?
* more debugging
* add logs
* various improvements
* additional logs to narrow down issue
* use global httpx pool for the main vespa flows in celery. Use in more places eventually.
* cleanup debug logging, etc
* remove debug logging
* this should use the secondary index
* mypy
* missed some logging
* review fixes
* refactor get_default_document_index to use search settings
* more missed logging
* fix circular refs
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: pablodanswer <pablo@danswer.ai>
* Add support for filtering 0xFDD0-0xFDEF Unicode range
- Update remove_invalid_unicode_chars to handle 0xFDD0-0xFDEF range
- Add comprehensive test cases for Unicode character sanitization
- Fix issue with illegal code point 0xFDDB in Vespa indexing
Co-Authored-By: Chris Weaver <chris@onyx.app>
* Remove unused pytest import
Co-Authored-By: Chris Weaver <chris@onyx.app>
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Chris Weaver <chris@onyx.app>
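The Unicode filtering described above can be sketched as follows. The function name `strip_fdd0_range` is hypothetical and covers only the 0xFDD0-0xFDEF range named in the commit message; the project's `remove_invalid_unicode_chars` presumably handles other ranges as well, which are not shown here.

```python
import re

# Noncharacter code points U+FDD0..U+FDEF, the range named in the commit message.
_FDD0_RANGE_RE = re.compile("[\uFDD0-\uFDEF]")


def strip_fdd0_range(text: str) -> str:
    """Remove code points in U+FDD0..U+FDEF, leaving all other characters intact."""
    return _FDD0_RANGE_RE.sub("", text)
```

For example, a string containing U+FDDB (the code point mentioned in the Vespa indexing fix) comes back with that character removed and everything else unchanged.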
* feat: add option to treat all non-attachment fields as metadata in Airtable connector
- Added new UI option 'treat_all_non_attachment_fields_as_metadata'
- Updated backend logic to support treating all fields except attachments as metadata
- Added tests for both default and all-metadata behaviors
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: handle missing environment variables gracefully in airtable tests
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: clean up test file and handle environment variables properly
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add missing test fixture and fix formatting
Co-Authored-By: Chris Weaver <chris@onyx.app>
* chore: fix black formatting
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add type annotation for metadata dict in airtable tests
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add type annotation for mock_get_api_key fixture
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: update Generator import to use collections.abc
Co-Authored-By: Chris Weaver <chris@onyx.app>
* refactor: make treat_all_non_attachment_fields_as_metadata a direct required parameter
- Move parameter from connector_config to direct class parameter
- Place parameter right under table_name_or_id argument
- Make parameter required in UI with no default value
- Update tests to use new parameter structure
Co-Authored-By: Chris Weaver <chris@onyx.app>
* chore: fix black formatting
Co-Authored-By: Chris Weaver <chris@onyx.app>
* chore: rename _METADATA_FIELD_TYPES to DEFAULT_METADATA_FIELD_TYPES and clarify usage
Co-Authored-By: Chris Weaver <chris@onyx.app>
* chore: fix black formatting in docstring
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: make airtable tests fail loudly on missing env vars
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: fix black formatting in test file
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: add required newline between test functions
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: update error message pattern in parameter validation test
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: fix black formatting in test file
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: fix error message pattern in parameter validation test
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: fix line length in test file
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: simplify error message pattern in parameter validation test
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: add type validation test for treat_all_non_attachment_fields_as_metadata
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add missing required parameter in test
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: remove parameter from test to properly validate it is required
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add type validation for treat_all_non_attachment_fields_as_metadata parameter
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: fix black formatting in airtable_connector.py
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: update type validation test to handle mypy errors
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: specify mypy ignore type for call-arg
Co-Authored-By: Chris Weaver <chris@onyx.app>
* Also handle rows w/o sections
* style: fix black formatting in test assertion
Co-Authored-By: Chris Weaver <chris@onyx.app>
* add TODO
* Remove unnecessary check
* Fix test
* Do not break existing airtable connectors
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Chris Weaver <chris@onyx.app>
Co-authored-by: Weves <chrisweaver101@gmail.com>
* try using a redis replica in some areas
* harden up replica usage
* comment
* slow down cloud dispatch temporarily
* add ignored syncing list back
* raise multiplier to 8
* comment out per tenant code (no longer used by fanout)
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* WIP
* migrate most beat tasks to fan out strategy
* fix kwargs
* migrate EE tasks
* lock on the task_name level
* typo fix
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* cloud check for migrations
* fix table declaration
* change back interval
* Fix usage of POSTGRES_DEFAULT_SCHEMA
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* signal from the watchdog so that the monitor task doesn't try to clean up before it can exit
* ttl constants
* improve comment
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* Added ability to use a tag to insert the current datetime in prompts
* made tagging logic more robust
* rename
* k
---------
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
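The datetime tag feature above can be illustrated with a small sketch. The tag syntax `[[CURRENT_DATETIME]]`, the output format, and the function name are all assumptions made for illustration; the commit messages do not state the real tag or format.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Hypothetical tag syntax; tolerant of case and inner whitespace to make matching more robust.
_DATETIME_TAG = re.compile(r"\[\[\s*CURRENT_DATETIME\s*\]\]", re.IGNORECASE)


def insert_current_datetime(prompt: str, now: Optional[datetime] = None) -> str:
    """Replace every datetime tag in the prompt with an ISO 8601 timestamp."""
    moment = now if now is not None else datetime.now(timezone.utc)
    stamp = moment.isoformat(timespec="minutes")
    # A function replacement avoids any backslash interpretation in the stamp.
    return _DATETIME_TAG.sub(lambda _match: stamp, prompt)
```

Passing `now` explicitly keeps the function deterministic in tests, while production callers can rely on the current UTC time.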
* Various fixes/improvements to document counting
* Add new column + index
* Avoid double scan
* comment fixes
* Fix revision history
* Fix IT
* Fix IT
* Fix migration
* Rebase
* Made copy button and cmd+c work for cmd+v and cmd+shift+v
* made sub selections work as well
* ok it works
* fixed npm run build
* im not from earth
* added logging
* more logging
* bye logs
* should work now
* whoops
* added stuff
* made it robust
* ctrl shift v behavior
* WIP
* WIP
* try spinning out check for indexing into a system task
* check for the correct delimiter
* use constants
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Combined Persona and Prompt API
* quality
* added tests
* consolidated models and got rid of redundant fields
* tenant appreciation day
* reverted default
* added missing dependency, missing api key placeholder, updated docs
* Apply black formatting and validate bot token functionality
* acknowledging black formatting
* added the validation to update tokens as well
* Made the token validation errors look nicer
* getting rid of duplicate dependency
* testing some tweaks based on issues seen with okteto
* shorten session usage in indexing. still a couple of long running sessions to clean up
* merge sessions
* fixing detached session issues
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* prototype tools for handling prod issues
* add some commands
* add batching and dry run options
* custom redis tool
* comment
* default to app config settings for redis
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* add index to speed up get last attempt
* use descending order
* put back unique param
* how did this not get formatted?
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* more debugging
* test reacquire outside of loop
* more logging
* move lock_beat test outside the try catch so that we don't worry about testing locks we never took
* use a larger scan_iter value for performance
* batch stale document sync batches
* add debug logging for a particular timeout issue
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* Added Permission Syncing for Salesforce
* cleanup
* updated connector doc conversion
* finished salesforce permission syncing
* fixed connector to batch Salesforce queries
* tests!
* k
* Added error handling and check for ee and sync type for postprocessing
* comments
* minor touchups
* tested to work!
* done
* my pie
* lil cleanup
* minor comment
* discord: frontend and backend poll connector
* added requirements for discord installation
* fixed the mypy errors
* process messages not part of any thread
* minor change
* updated the connector; this logic works & am able to see docs when I print
* minor change
* ability to enter a start date to pull docs from and refactor
* added the load connector and fixed mypy errors
* local commit test
done!
* minor refactor and properly commented everything
* updated the logic to handle permissions and index active/archived threads
* basic discord test template
* cleanup
* doing away with the danswer discord client class; using an async context manager
* moved to proper folder
* minor fixes
* needs improvement
* fixed discord icon
---------
Co-authored-by: hagen-danswer <hagen@danswer.ai>
- renamed post-reranking/validation citation information consistently to final_... (example: doc_id_to_rank_map -> final_doc_id_to_rank_map)
- changed and renamed objects containing initial ranking information (now: display_...) consistent with final rankings (final_...). Specifically, {} to [] for displayed_search_results
- for CitationInfo, changed citation_num from 'x-th citation in response stream' to the initial position of the doc [NOTE: test implications]
- changed tests:
onyx/backend/tests/unit/onyx/chat/stream_processing/test_citation_processing.py
onyx/backend/tests/unit/onyx/chat/stream_processing/test_citation_substitution.py
* re-prep user group deletion on the actual deletion
* user group needs to be synced to be prepped
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* improve model server logging
* improve exception logging with provider/model names
* get everything into one log line
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* try fixing exception in cloud
* raise beat expiry ... 60 seconds might be starving certain tasks completely
* adjust expiry down to 10 min
* raise concurrency overflow for indexing worker.
* parent pid check
* fix comment
* fix parent pid check, also actually raise an exception from the task if the spawned task exit status is bad
* fix pid check
* some cleanup and task wait fixes
* review fixes
* comment some code so we don't change too many things at once
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* old oauth file left behind
* fix function change that was lost in merge
* fix some testing vars
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* associating credentials with connectors is not considered editing
* formatting
* formatting
* Update credentials.py
---------
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
* temporarily disabling validate indexing fences
* add back a few startup checks in the cloud
* use common vespa client to perform health check
* log vespa url and try using http1 on light worker index methods
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* k
* functional iam auth
* k
* k
* improve typing
* add deployment options
* cleanup
* quick clean up
* minor cleanup
* additional clarity for db session operations
* nit
* k
* k
* update configs
* docker compose spacing
* allow beat tasks to expire. it isn't important that they all run
* validate fences are in a good state and cancel/fail them if not
* add function timings for important beat tasks
* optimize lookups, add lots of comments
* review changes
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* first cut at slack oauth flow
* fix usage of hooks
* fix button spacing
* add additional error logging
* no dev redirect
* early cut at google drive oauth
* second pass
* switch to production uri's
* try handling oauth_interactive differently
* pass through client id and secret if uploaded
* fix call
* fix test
* temporarily disable check for testing
* Revert "temporarily disable check for testing"
This reverts commit 4b5a022a5f.
* support visibility in test
* missed file
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* Fix mismatch between documents shown and citation numbers in text
When the document order presented to the LLM differs from the order shown to the user, the wrong doc numbers are cited.
Fix:
- SearchTool.get_search_result now returns both the final and initial ranking
- the initial ranking is passed through a few objects and used for replacement in citation processing
Notes:
- the citation_num in the CitationInfo() object has not been changed.
* PR fixes
- linting
- removed erroneous tab
- added a substitution test case
- adjusted original citation extraction use case
* Included a key test and
* Fixed extra spaces
* Updated test documentation
Updated:
- test_citation_substitution (changed description)
- test_citation_processing (removed data only relevant for the substitution)
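The citation substitution described above can be sketched as follows. This is an illustration under assumptions, not the project's implementation: the function names, the `[n]` citation format, and the 1-based ranks are hypothetical. The idea is that the LLM cites documents by their position in the final ranking, and each number is rewritten to the document's position in the ranking shown to the user.

```python
import re


def build_rank_map(doc_ids: list[str]) -> dict[str, int]:
    """Map each document id to its 1-based position in the given ordering."""
    return {doc_id: rank for rank, doc_id in enumerate(doc_ids, start=1)}


def substitute_citations(text: str, final_order: list[str], display_order: list[str]) -> str:
    """Rewrite [n] citations from final-ranking numbers to displayed-ranking numbers."""
    final_rank_to_doc = {rank: doc for doc, rank in build_rank_map(final_order).items()}
    display_doc_id_to_rank_map = build_rank_map(display_order)

    def _swap(match: re.Match) -> str:
        doc_id = final_rank_to_doc.get(int(match.group(1)))
        if doc_id is None or doc_id not in display_doc_id_to_rank_map:
            return match.group(0)  # leave citations that cannot be resolved untouched
        return f"[{display_doc_id_to_rank_map[doc_id]}]"

    return re.sub(r"\[(\d+)\]", _swap, text)
```

With a final order of `b, a, c` and a displayed order of `a, b, c`, a citation `[1]` in the LLM output refers to document `b` and is rewritten to `[2]`, matching what the user sees.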
* better handling around index attempts that don't exist and remove unnecessary index attempt deletions
* don't delete index attempts, just update them
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* change text and formatting to guide users away from thinking "Back to Danswer" is a back button
* regular text color and different icon
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* More logging for external group syncing
* Fixed edge case where some spaces were not being fetched
* made refresh frequency for confluence syncs configurable
* clarity
* first cut at slack oauth flow
* fix usage of hooks
* fix button spacing
* add additional error logging
* no dev redirect
* cleanup
* comment work in progress
* move some stuff to ee, add some playwright tests for the oauth callback edge cases
* fix ee, fix test name
* fix tests
* code review fixes
* checkpoint
* add celery termination of the task
* rename to RedisConnectorPermissionSyncPayload, add RedisLock to more places, add get_active_search_settings
* rename payload
* pretty sure these weren't named correctly
* testing in progress
* cleanup
* remove space
* merge fix
* three dots animation on Pausing
* improve messaging when connector is stopped or killed and animate buttons
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* use indexing flag in db for manually trigger indexing
* add comment.
* only try to release the lock if we actually succeeded with the lock
* ensure we don't trigger manual indexing on anything but the primary search settings
* comment usage of primary search settings
* run check for indexing immediately after indexing triggers are set
* reorder fix
* all done except routing
* fixed initial changes
* added backend endpoint for duplicating a chat session from Slack
* got chat duplication routing done
* got login routing working
* improved answer handling
* finished all checks
* finished all!
* made sure it works with google oauth
* dont remove that lol
* fixed weird thing
* bad comments
* Add description for Google Gemini models and custom model icons for LiteLLM (OpenAI) proxied models
* Adds Vertex AI aliases for Claude
---------
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
* shared admin level test dependency
* change to on - push (recommended by chromatic)
* change playwright reporter to list, name test jobs
* use test tags ... much cleaner
* test vs prod
* try copying templates
* run with localhost?
* revert to dev
* new tests and a bit of refactoring
* add additional checks so that page snapshots reflect loaded state
* more admin tests
* User Management tests
* remaining admin pages
* test search and chat
* await fix and exclude UI that changes with dates.
* test overlapping connectors (but using a source that is way too big and slow, fix that next)
* pass thru secrets
* rename
* rename again
* now we are fixing it
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* standardized escaping of CQL strings
* think i found it
* fix
* should be fixed
* added handling for special linking behavior in confluence
* Update onyx_confluence.py
* Update onyx_confluence.py
---------
Co-authored-by: rkuo-danswer <rkuo@danswer.ai>
* more logs
* this fence should be set to None
* type hinting
* reset deletion attempt if conditions are inconsistent
* always clean up in db if we reach reconciliation
* add reset method
* more logging
* harden up error checking
* Made external permissioned users and slack users show diff
* finished
* Fix typing
* k
* Fix
* k
---------
Co-authored-by: Weves <chrisweaver101@gmail.com>
* initial PoC
* preliminary working config
* first cut at chromatic tests
* first cut at chromatic tests
* fix yaml
* fix yaml again
* use workingDir
* adapt playwright example
* remove env
* fix working directory
* fix more paths
* fix dir
* add playwright setup
* accidentally deleted a step
* update test
* think we don't need home.png right now
* remove unused home.png
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* add creator id to cc pair
* fix alembic head
* show email instead of UUID
* safer check on email
* make foreign key relationships optional
* always allow creator to edit (per hagen)
* use primary join
* no index_doc_batch spam
* try this again
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Make curators able to create permission synced connectors
* removed editing permission synced connectors for curators
* updated tests to use access type instead of is_public
* update copy
* in progress PoC
* working limited user, needs routes to be marked next
* make selected endpoint available to limited user role
* xfail on test_slack_prune
* add comment to sync function
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* cloud auth referral source
* minor clarity
* k
* minor modification to be best practice
* typing
* Update ReferralSourceSelector.tsx
* Update ReferralSourceSelector.tsx
---------
Co-authored-by: hagen-danswer <hagen@danswer.ai>
* doc_sync is refactored
* maybe this works
* tested to work!
* mypy fixes
* enabled integration tests
* fixed the test
* added external group sync
* testing should work now
* mypy
* confluence doc id fix
* got group sync working
* addressed feedback
* renamed some vars and fixed mypy
* conf fix?
* added wiki handling to confluence connector
* test fixes
* revert google drive connector
* fixed groups
* hotfix
* re-enable helm
* allow manual triggering
* change vespa host
* change vespa chart location
* update Chart.lock
* update ct.yaml with new vespa chart repo
* bump vespa to 0.2.5
* update Chart.lock
* update to vespa 0.2.6
* bump vespa to 0.2.7
* bump to 0.2.8
* bump version
* try appending the ordinal
* try new configmap
* bump vespa
* bump vespa
* add debug to see if we can figure out what ct install thinks is failing
* add debug flag to helm
* try disabling nginx because of KinD
* use helm-extra-set-args
* try command line
* try pointing test connection to the correct service name
* bump vespa to 0.2.12
* update chart.lock
* bump vespa to 0.2.13
* bump vespa to 0.2.14
* bump vespa
* bump vespa
* re-enable chart testing only on changes
* name the check more specifically than "lint-test"
* add some debugging
* try setting remote
* might have to specify chart dirs directly
* add comments
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
- [Nginx](https://nginx.org/) (Not needed for development flows generally)
> **Note:**
> This guide provides instructions to build and run Danswer locally from source with Docker containers providing the above external software. We believe this combination is easier for
> development purposes. If you prefer to use pre-built container images, we provide instructions on running the full Danswer stack within Docker below.
> This guide provides instructions to build and run Onyx locally from source with Docker containers providing the above external software. We believe this combination is easier for
> development purposes. If you prefer to use pre-built container images, we provide instructions on running the full Onyx stack within Docker below.
### Local Set Up
Be sure to use Python version 3.11. For instructions on installing Python 3.11 on macOS, refer to the [CONTRIBUTING_MACOS.md](./CONTRIBUTING_MACOS.md) readme.
If using a lower version, modifications will have to be made to the code.
If using a higher version, some libraries may not be available (e.g. we had problems with TensorFlow in the past with higher versions of Python).
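A quick way to confirm which interpreter is on your path before creating the virtual environment is sketched below. This is only an illustration: the helper name `is_supported_python` is hypothetical and not part of the repository, and the check assumes the interpreter is invoked as `python3`.

```bash
# Hypothetical helper: succeeds only when the given "major.minor" string is 3.11.
is_supported_python() {
  [ "$1" = "3.11" ]
}

# Ask the interpreter on PATH for its major.minor version ("none" if it is missing).
current="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null || echo "none")"

if is_supported_python "$current"; then
  echo "Python $current matches the version this guide expects"
else
  echo "Found Python '$current'; this guide expects 3.11"
fi
```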
#### Backend: Python requirements
Currently, we use pip and recommend creating a virtual environment.
For convenience here's a command for it:
```bash
python -m venv .venv
source .venv/bin/activate
```
> **Note:**
> This virtual environment MUST NOT be set up WITHIN the danswer directory if you plan on using mypy within certain IDEs.
> For simplicity, we recommend setting up the virtual environment outside of the danswer directory.
> This virtual environment MUST NOT be set up WITHIN the onyx directory if you plan on using mypy within certain IDEs.
> For simplicity, we recommend setting up the virtual environment outside of the onyx directory.
_For Windows, activate the virtual environment using Command Prompt:_
```bash
.venv\Scripts\activate
```
If using PowerShell, the command slightly differs:
_For Windows (for compatibility with both PowerShell and Command Prompt):_
```bash
powershell -Command "
$env:AUTH_TYPE='disabled'
uvicorn danswer.main:app --reload --port 8080
uvicorn onyx.main:app --reload --port 8080
"
```
@@ -182,57 +245,32 @@ You should now have 4 servers running:
- Model server
- Background jobs
Now, visit `http://localhost:3000` in your browser. You should see the Danswer onboarding wizard where you can connect your external LLM provider to Danswer.
Now, visit `http://localhost:3000` in your browser. You should see the Onyx onboarding wizard where you can connect your external LLM provider to Onyx.
You've successfully set up a local Danswer instance! 🏁
You've successfully set up a local Onyx instance! 🏁
#### Running the Danswer application in a container
#### Running the Onyx application in a container
You can run the full Danswer application stack from pre-built images including all external software dependencies.
You can run the full Onyx application stack from pre-built images including all external software dependencies.
Navigate to `danswer/deployment/docker_compose` and run:
Navigate to `onyx/deployment/docker_compose` and run:
```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d
docker compose -f docker-compose.dev.yml -p onyx-stack up -d
```
After Docker pulls and starts these containers, navigate to `http://localhost:3000` to use Danswer.
After Docker pulls and starts these containers, navigate to `http://localhost:3000` to use Onyx.
If you want to make changes to Danswer and run those changes in Docker, you can also build a local version of the Danswer container images that incorporates your changes like so:
If you want to make changes to Onyx and run those changes in Docker, you can also build a local version of the Onyx container images that incorporates your changes like so:
```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d --build
docker compose -f docker-compose.dev.yml -p onyx-stack up -d --build
```
### Formatting and Linting
#### Backend
For the backend, you'll need to set up pre-commit hooks (black / reorder-python-imports).
First, install pre-commit (if you don't have it already) following the instructions
[here](https://pre-commit.com/#installation).
With the virtual environment active, install the pre-commit library with:
```bash
pip install pre-commit
```
Then, from the `danswer/backend` directory, run:
```bash
pre-commit install
```
Additionally, we use `mypy` for static type checking.
Danswer is fully type-annotated, and we want to keep it that way!
To run the mypy checks manually, run `python -m mypy .` from the `danswer/backend` directory.
#### Web
We use `prettier` for formatting. The desired version (2.8.8) will be installed by running `npm i` from the `danswer/web` directory.
To run the formatter, use `npx prettier --write .` from the `danswer/web` directory.
Please double-check that prettier passes before creating a pull request.
### Release Process
Danswer loosely follows the SemVer versioning standard.
Onyx loosely follows the SemVer versioning standard.
Major changes are released with a "minor" version bump. Currently we use patch release versions to indicate small feature changes.
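As an illustration of the convention described above, the sketch below computes the next version string for a "minor" bump (major changes) or a "patch" bump (small feature changes). The function name `bump_version` and the version numbers are hypothetical; this is not the project's release tooling.

```bash
# Hypothetical helper: print the next version for a "minor" or "patch" bump.
bump_version() {
  local version="$1" part="$2"
  local major minor patch

  # Split "MAJOR.MINOR.PATCH" on dots.
  IFS=. read -r major minor patch <<EOF
$version
EOF

  case "$part" in
    minor) minor=$((minor + 1)); patch=0 ;;  # major changes: bump minor, reset patch
    patch) patch=$((patch + 1)) ;;           # small feature changes: bump patch
    *) echo "unknown part: $part" >&2; return 1 ;;
  esac

  echo "${major}.${minor}.${patch}"
}
```

For example, `bump_version 1.4.2 minor` prints `1.5.0`, and `bump_version 1.4.2 patch` prints `1.4.3`.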
A set of Docker containers will be pushed automatically to DockerHub with every tag.
You can see the containers [here](https://hub.docker.com/search?q=danswer%2F).
You can see the containers [here](https://hub.docker.com/search?q=onyx%2F).
The base instructions to set up the development environment are located in [CONTRIBUTING.md](https://github.com/danswer-ai/danswer/blob/main/CONTRIBUTING.md).
The base instructions to set up the development environment are located in [CONTRIBUTING.md](https://github.com/onyx-dot-app/onyx/blob/main/CONTRIBUTING.md).
### Setting up Python
Ensure [Homebrew](https://brew.sh/) is already set up.
Then install python 3.11.
```bash
brew install python@3.11
```
Add python 3.11 to your path: add the following line to ~/.zshrc
@@ -2,9 +2,9 @@ Copyright (c) 2023-present DanswerAI, Inc.
Portions of this software are licensed as follows:
* All content that resides under "ee" directories of this repository, if that directory exists, is licensed under the license defined in "backend/ee/LICENSE". Specifically all content under "backend/ee" and "web/src/app/ee" is licensed under the license defined in "backend/ee/LICENSE".
* All third party components incorporated into the Danswer Software are licensed under the original license provided by the owner of the applicable component.
* Content outside of the above mentioned directories or restrictions above is available under the "MIT Expat" license as defined below.
- All content that resides under "ee" directories of this repository, if that directory exists, is licensed under the license defined in "backend/ee/LICENSE". Specifically all content under "backend/ee" and "web/src/app/ee" is licensed under the license defined in "backend/ee/LICENSE".
- All third party components incorporated into the Onyx Software are licensed under the original license provided by the owner of the applicable component.
- Content outside of the above mentioned directories or restrictions above is available under the "MIT Expat" license as defined below.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
**To try it out for free and get started in seconds, check out [Onyx Cloud](https://cloud.onyx.app/signup)**.
Danswer can easily be run locally (even on a laptop) or deployed on a virtual machine with a single
`docker compose` command. Check out our [docs](https://docs.danswer.dev/quickstart) to learn more.
Onyx can also be run locally (even on a laptop) or deployed on a virtual machine with a single
`docker compose` command. Check out our [docs](https://docs.onyx.app/quickstart) to learn more.
We also have built-in support for deployment on Kubernetes. Files for that can be found [here](https://github.com/danswer-ai/danswer/tree/main/deployment/kubernetes).
We also have built-in support for high-availability/scalable deployment on Kubernetes.
* Chat UI with the ability to select documents to chat with.
* Create custom AI Assistants with different prompts and backing knowledge sets.
* Connect Danswer with LLM of your choice (self-host for a fully airgapped solution).
* Document Search + AI Answers for natural language queries.
* Connectors to all common workplace tools like Google Drive, Confluence, Slack, etc.
* Slack integration to get answers and search results directly in Slack.
## 🔍 Other Notable Benefits of Onyx
- Custom deep learning models for indexing and inference time, only through Onyx + learning from user feedback.
- Flexible security features like SSO (OIDC/SAML/OAuth2), RBAC, encryption of credentials, etc.
- Knowledge curation features like document-sets, query history, usage analytics, etc.
- Scalable deployment options tested up to tens of thousands of users and hundreds of millions of documents.
## 🚧 Roadmap
* Chat/Prompt sharing with specific teammates and user groups.
* Multimodal model support, chat with images, video, etc.
* Choosing between LLMs and parameters during chat session.
* Tool calling and agent configuration options.
* Organizational understanding and ability to locate and suggest experts from your team.
## Other Notable Benefits of Danswer
* User Authentication with document level access management.
* Best in class Hybrid Search across all sources (BM-25 + prefix aware embedding models).
* Admin Dashboard to configure connectors, document-sets, access, etc.
* Custom deep learning models + learn from user feedback.
* Easy deployment and ability to host Danswer anywhere of your choosing.
- New methods in information retrieval (StructRAG, LightGraphRAG, etc.)
- Personalized Search
- Organizational understanding and ability to locate and suggest experts from your team.
- Code Search
- SQL and Structured Query Language
## 🔌 Connectors
Efficiently pulls the latest changes from:
* Slack
* GitHub
* Google Drive
* Confluence
* Jira
* Zendesk
* Gmail
* Notion
* Gong
* Slab
* Linear
* Productboard
* Guru
* Bookstack
* Document360
* Sharepoint
* Hubspot
* Local Files
* Websites
* And more ...
Keep knowledge and access in sync across 40+ connectors:
## 📚 Editions
- Google Drive
- Confluence
- Slack
- Gmail
- Salesforce
- Microsoft Sharepoint
- Github
- Jira
- Zendesk
- Gong
- Microsoft Teams
- Dropbox
- Local Files
- Websites
- And more ...
There are two editions of Danswer:
See the full list [here](https://docs.onyx.app/connectors).
* Danswer Community Edition (CE) is available freely under the MIT Expat license. This version has ALL the core features discussed above. This is the version of Danswer you will get if you follow the Deployment guide above.
* Danswer Enterprise Edition (EE) includes extra features that are primarily useful for larger organizations. Specifically, this includes:
  * Single Sign-On (SSO), with support for both SAML and OIDC
  * Role-based access control
  * Document permission inheritance from connected sources
  * Usage analytics and query history accessible to admins
  * Whitelabeling
  * API key authentication
  * Encryption of secrets
  * And many more! Check out [our website](https://www.danswer.ai/) for the latest.
To try the Danswer Enterprise Edition:
## 📚 Licensing
There are two editions of Onyx:
- Onyx Community Edition (CE) is available freely under the MIT Expat license. Simply follow the Deployment guide above.
- Onyx Enterprise Edition (EE) includes extra features that are primarily useful for larger organizations.
For feature details, check out [our website](https://www.onyx.app/pricing).
2. For self-hosting the Enterprise Edition, contact us at [founders@onyx.app](mailto:founders@onyx.app) or book a call with us on our [Cal](https://cal.com/team/onyx/founders).
2. For self-hosting, contact us at [founders@danswer.ai](mailto:founders@danswer.ai) or book a call with us on our [Cal](https://cal.com/team/danswer/founders).
## 💡 Contributing
Looking to contribute? Please check out the [Contribution Guide](CONTRIBUTING.md) for more details.
## ⭐Star History
[](https://star-history.com/#danswer-ai/danswer&Date)
Some files were not shown because too many files have changed in this diff