* add ingress for api and web
* helm setup docs
* add letsencrypt. close blocks
* use pathType ImplementationSpecific as Prefix is deprecated
* fix backend labels. configure nginx routes. update annotations
* fix linting
---------
Co-authored-by: Sajjad Anwar <sajjadkm@gmail.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* early work in progress
* rename utility script
* move actual data seeding to a shareable function
* add test
* make the test pass with the fix
* fix comment
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* Replaces Amazon and Anthropic icons with versions better suited to both dark and light modes;
* Adds icon for DeepSeek;
* Simplify logic on icon selection;
* Adds entries for Phi-4, Claude 3.7, Ministral and Gemini 2.0 models
* nit
* k
* k
---------
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
* Update text embedding model to version 005 and enhance embedding retrieval process
* re
* Fix formatting issues
* Add support for Bedrock reranking provider and AWS credentials handling
* fix: improve AWS key format validation and error messages
* Fix vertex embedding model crash
* feat: add environment template for local development setup
* Add display name for Claude 3.7 Sonnet model
* Add display names for Gemini 2.0 models and update Claude 3.7 Sonnet entry
* Fix ruff errors by ensuring lines are within 130 characters
* revert to currently default onyx browser settings
* add / fix boto requirements
---------
Co-authored-by: ferdinand loesch <f.loesch@sportradar.com>
Co-authored-by: Ferdinand Loesch <ferdinandloesch@me.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* fix blowing up the entire task on exception and trying to reuse an invalid db session
* list comprehension
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
A new setting 'is_ephemeral' has been added to the Slack channel configurations.
Key features/effects:
- if is_ephemeral is set for a standard channel (and a Search Assistant is chosen):
  - the answer is only shown to the user as an ephemeral message
  - the user has access to their private documents for a search (as the answer is only shown to them)
  - the user can share the answer with the channel or keep it private
  - a recipient list cannot be defined if the channel is set up as ephemeral
- if is_ephemeral is set and the conversation is a DM with the bot:
  - the user has access to private docs in searches
  - the message is not sent as ephemeral, as it is a 1:1 discussion with the bot
- if is_ephemeral is not set but a recipient list is set:
  - the user's search does *not* have access to their private documents, as the information goes to the recipient list team members, who may have different access rights
- Overall:
  - Unless the channel is set to is_ephemeral or it is a direct conversation with the bot, only public docs are accessible
  - The ACL is never bypassed, not even in cases where the admin explicitly attached a document set to the bot config.
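The rules above can be sketched as a small decision function. This is an illustrative sketch only, not the actual implementation: `SlackAnswerPolicy` and `resolve_policy` are hypothetical names, and the precedence (recipient list checked before DM) is an assumption, since the description does not cover a DM that also has a recipient list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlackAnswerPolicy:
    """Hypothetical container for the two outcomes described above."""

    can_search_private_docs: bool
    send_as_ephemeral: bool


def resolve_policy(
    is_ephemeral: bool, is_dm: bool, has_recipient_list: bool
) -> SlackAnswerPolicy:
    # A recipient list cannot be defined on an ephemeral channel.
    if is_ephemeral and has_recipient_list:
        raise ValueError("a recipient list cannot be combined with is_ephemeral")
    # Answers go to other team members, so only public docs are searched.
    # (Checking this before the DM case is an assumption of this sketch.)
    if has_recipient_list:
        return SlackAnswerPolicy(can_search_private_docs=False, send_as_ephemeral=False)
    # 1:1 conversation with the bot: private docs allowed, no ephemeral message.
    if is_dm:
        return SlackAnswerPolicy(can_search_private_docs=True, send_as_ephemeral=False)
    # Standard channel with is_ephemeral: private docs allowed, answer is ephemeral.
    if is_ephemeral:
        return SlackAnswerPolicy(can_search_private_docs=True, send_as_ephemeral=True)
    # Everything else: public docs only.
    return SlackAnswerPolicy(can_search_private_docs=False, send_as_ephemeral=False)
```

In every branch the ACL still applies; the flags only decide whether the user's private documents are in scope, never whether access checks are skipped.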
* print the test name when it runs
* type hints
* can't reuse session after an exception
* better logging
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* first cut at slack oauth flow
* fix usage of hooks
* fix button spacing
* add additional error logging
* no dev redirect
* early cut at google drive oauth
* second pass
* switch to production uri's
* try handling oauth_interactive differently
* pass through client id and secret if uploaded
* fix call
* fix test
* temporarily disable check for testing
* Revert "temporarily disable check for testing"
This reverts commit 4b5a022a5f.
* support visibility in test
* missed file
* first cut at confluence oauth
* work in progress
* work in progress
* work in progress
* work in progress
* work in progress
* first cut at distributed locking
* WIP to make test work
* add some dev mode affordances and gate usage of redis behind dynamic credentials
* mypy and credentials provider fixes
* WIP
* fix created at
* fix setting initialValue on everything
* remove debugging, fix ??? some TextFormField issues
* npm fixes
* comment cleanup
* fix comments
* pin the size of the card section
* more review fixes
* more fixes
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* trying out a fix
* add ability to manually run model tests
* add log dump
* check status code, not text?
* just the model server
* add port mapping to host
* pass through more api keys
* add azure tests
* fix litellm env vars
* fix env vars in github workflow
* temp disable litellm test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* prompt addition for gpt o-series to encourage markdown formatting of code blocks
* fix to match https://simonwillison.net/tags/markdown/
* chris comment
* chris comment
* thread utils respect contextvars now
* address pablo comments
* removed tenant id from places it was already being passed
* fix rate limit check and pablo comment
* WIP
* implement hard timeout
* fix callbacks
* put back the timeout
* missed a file
* fixes
* try installing playwright deps
* Revert "try installing playwright deps"
This reverts commit 4217427568.
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
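For the "thread utils respect contextvars" change above, one common way to carry context variables into worker threads is to copy the caller's context and run the function inside it. The following is a minimal sketch of that general technique, not the project's actual thread utilities; `CURRENT_TENANT` and `run_in_thread_with_context` are hypothetical names.

```python
import contextvars
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable

# Hypothetical context variable standing in for per-request state.
CURRENT_TENANT = contextvars.ContextVar("current_tenant", default="public")


def run_in_thread_with_context(
    executor: ThreadPoolExecutor, func: Callable[..., Any], *args: Any, **kwargs: Any
) -> Future:
    # Snapshot the caller's context at submit time ...
    ctx = contextvars.copy_context()
    # ... and execute func inside that snapshot on the worker thread.
    return executor.submit(ctx.run, func, *args, **kwargs)
```

With this wrapper, a value set via `CURRENT_TENANT.set(...)` before submission is visible to the function running in the pool, so it does not need to be passed as an explicit argument.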
* added timeouts for agent llm calls
* timing suggestions in agent config
* improved timeout that actually exits early
* added new global timeout and connection timeout distinction
* fixed error raising bug and made entity extraction recoverable
* warnings and refactor
* mypy
---------
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* wip checkpointing/continue on failure
more stuff for checkpointing
Basic implementation
FE stuff
More checkpointing/failure handling
rebase
rebase
initial scaffolding for IT
IT to test checkpointing
Cleanup
cleanup
Fix it
Rebase
Add todo
Fix actions IT
Test more
Pagination + fixes + cleanup
Fix IT networking
fix it
* rebase
* Address misc comments
* Address comments
* Remove unused router
* rebase
* Fix mypy
* Fixes
* fix it
* Fix tests
* Add drop index
* Add retries
* reset lock timeout
* Try hard drop of schema
* Add timeout/retries to downgrade
* rebase
* test
* test
* test
* Close all connections
* test closing idle only
* Fix it
* fix
* try using null pool
* Test
* fix
* rebase
* log
* Fix
* apply null pool
* Fix other test
* Fix quality checks
* Test not using the fixture
* Fix ordering
* fix test
* Change pooling behavior
* better propagation of exceptions up the stack
* remove debug testing
* refactor the watchdog more to emit data consistently at the end of the function
* enumerate a lot more terminal statuses
* handle more codes
* improve logging
* handle "-9"
* single line exception logging
* typo/grammar
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* ignore result when using send_task on lightweight tasks
* fix ignore_result
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* no thread local locks in callbacks and raise permission sync timeout by a lot based on empirical log observations
* more fixes
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* move indexing
* all monitor work moved
* reacquire lock more
* remove monitor task completely
* fix import
* fix pruning finalization
* no multiplier on system/cloud tasks
* monitor queues every 30 seconds in the cloud
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* dedupe make_private_persona and update test
* add comment
* comments, and just have duplicate user id's for the test instead of modifying edit
* found the magic word
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* add validation for pruning
* fix missing class
* get external group sync validation working
* backport fix for pruning check
* fix pruning
* log the payload id
* remove scan_iter from pruning
* missed removed scan_iter, also remove other scan_iters and replace with sscan_iter of the lookup table
* external group sync needs active signal. h
* log the payload id when the task starts
* log the payload id in more places
* use the replica
* increase primary pool and slow down beat
* scale sql pool based on concurrency
* fix concurrency
* add debugging for external group sync and tenant
* remove debugging and fix payload id
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* WIP
* migrate most beat tasks to fan out strategy
* fix kwargs
* migrate EE tasks
* lock on the task_name level
* typo fix
* transform beat tasks for cloud
* cloud multiplier is only for cloud tasks
* bumpity
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* WIP
* trigger indexing immediately when the ccpair is created
* add some logging and indexing trigger to the mock-credential endpoint
* better comments
* fix integration test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* try adding back some params
* raise timeout
* update chromatic version
* fix typo
* use chromatic imports
* update gitignore
* slim down the config file
* update readme
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* initial commit for helm chart refactoring
* Continue refactoring helm. I was able to use helm to deploy all of the apps to a cluster in aws. The bottleneck was setting up PVC dynamic provisioning.
* use default storage class
* Fix linter errors
* Fix broken helm test
* update
* Helm chart fixes
* remove reference to ebsstorage
* Fix linter errors
---------
Co-authored-by: jpb80 <jordan.buttkevitz@gmail.com>
- summarize history if long
- introduced cited_docs from SQ as those must be provided to answer generations
- limit number of docs
TODO: same for refined flow
* initial commit for helm chart refactoring
* Continue refactoring helm. I was able to use helm to deploy all of the apps to a cluster in aws. The bottleneck was setting up PVC dynamic provisioning.
* use default storage class
* Fix linter errors
* Fix broken helm test
---------
Co-authored-by: jpb80 <jordan.buttkevitz@gmail.com>
* Fix airtable connector w/ mt cloud + move telem logic to match new standard
* Address Greptile comment
* Small fixes/improvements
* Revert back monitoring frequency
* Small monitoring fix
* WIP for external group sync lock fixes
* prototyping permissions validation
* validate permission sync tasks in celery
* mypy
* cleanup and wire off external group sync checks for now
* add active key to reset
* improve logging
* reset on payload format change
* return False on exception
* missed a return
* add count of tasks scanned
* add comment
* better logging
* add return
* more return
* catch payload exceptions
* code review fixes
* push to restart test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* add timings for syncing
* add more logging
* more debugging
* refactor multipass/db check out of VespaIndex
* circular imports?
* more debugging
* add logs
* various improvements
* additional logs to narrow down issue
* use global httpx pool for the main vespa flows in celery. Use in more places eventually.
* cleanup debug logging, etc
* remove debug logging
* this should use the secondary index
* mypy
* missed some logging
* review fixes
* refactor get_default_document_index to use search settings
* more missed logging
* fix circular refs
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: pablodanswer <pablo@danswer.ai>
* Add support for filtering 0xFDD0-0xFDEF Unicode range
- Update remove_invalid_unicode_chars to handle 0xFDD0-0xFDEF range
- Add comprehensive test cases for Unicode character sanitization
- Fix issue with illegal code point 0xFDDB in Vespa indexing
Co-Authored-By: Chris Weaver <chris@onyx.app>
* Remove unused pytest import
Co-Authored-By: Chris Weaver <chris@onyx.app>
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Chris Weaver <chris@onyx.app>
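The 0xFDD0-0xFDEF filtering described above can be sketched roughly as follows. The function name `remove_invalid_unicode_chars` comes from the commit message, but the body is an assumption for illustration; in particular, which other code points the real function strips is not stated here.

```python
import re

# Assumed character class: C0 control characters other than tab, newline and
# carriage return, the noncharacter block U+FDD0-U+FDEF, and U+FFFE/U+FFFF.
_INVALID_UNICODE_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufdd0-\ufdef\ufffe\uffff]"
)


def remove_invalid_unicode_chars(text: str) -> str:
    """Drop code points that are assumed to be rejected at indexing time."""
    return _INVALID_UNICODE_CHARS.sub("", text)
```

For example, a string containing U+FDDB (the code point mentioned in the fix) comes back with that character removed and everything else untouched.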
* feat: add option to treat all non-attachment fields as metadata in Airtable connector
- Added new UI option 'treat_all_non_attachment_fields_as_metadata'
- Updated backend logic to support treating all fields except attachments as metadata
- Added tests for both default and all-metadata behaviors
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: handle missing environment variables gracefully in airtable tests
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: clean up test file and handle environment variables properly
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add missing test fixture and fix formatting
Co-Authored-By: Chris Weaver <chris@onyx.app>
* chore: fix black formatting
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add type annotation for metadata dict in airtable tests
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add type annotation for mock_get_api_key fixture
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: update Generator import to use collections.abc
Co-Authored-By: Chris Weaver <chris@onyx.app>
* refactor: make treat_all_non_attachment_fields_as_metadata a direct required parameter
- Move parameter from connector_config to direct class parameter
- Place parameter right under table_name_or_id argument
- Make parameter required in UI with no default value
- Update tests to use new parameter structure
Co-Authored-By: Chris Weaver <chris@onyx.app>
* chore: fix black formatting
Co-Authored-By: Chris Weaver <chris@onyx.app>
* chore: rename _METADATA_FIELD_TYPES to DEFAULT_METADATA_FIELD_TYPES and clarify usage
Co-Authored-By: Chris Weaver <chris@onyx.app>
* chore: fix black formatting in docstring
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: make airtable tests fail loudly on missing env vars
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: fix black formatting in test file
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: add required newline between test functions
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: update error message pattern in parameter validation test
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: fix black formatting in test file
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: fix error message pattern in parameter validation test
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: fix line length in test file
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: simplify error message pattern in parameter validation test
Co-Authored-By: Chris Weaver <chris@onyx.app>
* test: add type validation test for treat_all_non_attachment_fields_as_metadata
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add missing required parameter in test
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: remove parameter from test to properly validate it is required
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: add type validation for treat_all_non_attachment_fields_as_metadata parameter
Co-Authored-By: Chris Weaver <chris@onyx.app>
* style: fix black formatting in airtable_connector.py
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: update type validation test to handle mypy errors
Co-Authored-By: Chris Weaver <chris@onyx.app>
* fix: specify mypy ignore type for call-arg
Co-Authored-By: Chris Weaver <chris@onyx.app>
* Also handle rows w/o sections
* style: fix black formatting in test assertion
Co-Authored-By: Chris Weaver <chris@onyx.app>
* add TODO
* Remove unnecessary check
* Fix test
* Do not break existing airtable connectors
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Chris Weaver <chris@onyx.app>
Co-authored-by: Weves <chrisweaver101@gmail.com>
* try using a redis replica in some areas
* harden up replica usage
* comment
* slow down cloud dispatch temporarily
* add ignored syncing list back
* raise multiplier to 8
* comment out per tenant code (no longer used by fanout)
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* WIP
* migrate most beat tasks to fan out strategy
* fix kwargs
* migrate EE tasks
* lock on the task_name level
* typo fix
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* cloud check for migrations
* fix table declaration
* change back interval
* Fix usage of POSTGRES_DEFAULT_SCHEMA
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* signal from the watchdog so that the monitor task doesn't try to clean up before it can exit
* ttl constants
* improve comment
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* Added ability to use a tag to insert the current datetime in prompts
* made tagging logic more robust
* rename
* k
---------
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
* Various fixes/improvements to document counting
* Add new column + index
* Avoid double scan
* comment fixes
* Fix revision history
* Fix IT
* Fix IT
* Fix migration
* Rebase
* Made copy button and cmd+c work for cmd+v and cmd+shift+v
* made sub selections work as well
* ok it works
* fixed npm run build
* im not from earth
* added logging
* more logging
* bye logs
* should work now
* whoops
* added stuff
* made it robust
* ctrl shift v behavior
* WIP
* WIP
* try spinning out check for indexing into a system task
* check for the correct delimiter
* use constants
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Combined Persona and Prompt API
* quality
* added tests
* consolidated models and got rid of redundant fields
* tenant appreciation day
* reverted default
* added missing dependency, missing api key placeholder, updated docs
* Apply black formatting and validate bot token functionality
* acknowledging black formatting
* added the validation to update tokens as well
* Made the token validation errors look nicer
* getting rid of duplicate dependency
* testing some tweaks based on issues seen with okteto
* shorten session usage in indexing. still a couple of long running sessions to clean up
* merge sessions
* fixing detached session issues
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* prototype tools for handling prod issues
* add some commands
* add batching and dry run options
* custom redis tool
* comment
* default to app config settings for redis
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* add index to speed up get last attempt
* use descending order
* put back unique param
* how did this not get formatted?
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* more debugging
* test reacquire outside of loop
* more logging
* move lock_beat test outside the try catch so that we don't worry about testing locks we never took
* use a larger scan_iter value for performance
* batch stale document sync batches
* add debug logging for a particular timeout issue
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* Added Permission Syncing for Salesforce
* cleanup
* updated connector doc conversion
* finished salesforce permission syncing
* fixed connector to batch Salesforce queries
* tests!
* k
* Added error handling and check for ee and sync type for postprocessing
* comments
* minor touchups
* tested to work!
* done
* my pie
* lil cleanup
* minor comment
* discord: frontend and backend poll connector
* added requirements for discord installation
* fixed the mypy errors
* process messages not part of any thread
* minor change
* updated the connector; this logic works & am able to see docs when i print
* minor change
* ability to enter a start date to pull docs from and refactor
* added the load connector and fixed mypy errors
* local commit test
done!
* minor refactor and properly commented everything
* updated the logic to handle permissions and index active/archived threads
* basic discord test template
* cleanup
* doing away with the danswer discord client class; using an async context manager
* moved to proper folder
* minor fixes
* needs improvement
* fixed discord icon
---------
Co-authored-by: hagen-danswer <hagen@danswer.ai>
- renamed post-reranking/validation citation information consistently to final_... (example: doc_id_to_rank_map -> final_doc_id_to_rank_map)
- changed and renamed objects containing initial ranking information (now: display_...) consistent with final rankings (final_...). Specifically, {} to [] for displayed_search_results
- for CitationInfo, changed citation_num from 'x-th citation in response stream' to the initial position of the doc [NOTE: test implications]
- changed tests:
onyx/backend/tests/unit/onyx/chat/stream_processing/test_citation_processing.py
onyx/backend/tests/unit/onyx/chat/stream_processing/test_citation_substitution.py
* re-prep user group deletion on the actual deletion
* user group needs to be synced to be prepped
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* improve model server logging
* improve exception logging with provider/model names
* get everything into one log line
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* try fixing exception in cloud
* raise beat expiry ... 60 seconds might be starving certain tasks completely
* adjust expiry down to 10 min
* raise concurrency overflow for indexing worker.
* parent pid check
* fix comment
* fix parent pid check, also actually raise an exception from the task if the spawned task exit status is bad
* fix pid check
* some cleanup and task wait fixes
* review fixes
* comment some code so we don't change too many things at once
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* old oauth file left behind
* fix function change that was lost in merge
* fix some testing vars
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* associating credentials with connectors is not considered editing
* formatting
* formatting
* Update credentials.py
---------
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
* temporarily disabling validate indexing fences
* add back a few startup checks in the cloud
* use common vespa client to perform health check
* log vespa url and try using http1 on light worker index methods
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* k
* functional iam auth
* k
* k
* improve typing
* add deployment options
* cleanup
* quick clean up
* minor cleanup
* additional clarity for db session operations
* nit
* k
* k
* update configs
* docker compose spacing
* allow beat tasks to expire. it isn't important that they all run
* validate fences are in a good state and cancel/fail them if not
* add function timings for important beat tasks
* optimize lookups, add lots of comments
* review changes
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* first cut at slack oauth flow
* fix usage of hooks
* fix button spacing
* add additional error logging
* no dev redirect
* early cut at google drive oauth
* second pass
* switch to production uri's
* try handling oauth_interactive differently
* pass through client id and secret if uploaded
* fix call
* fix test
* temporarily disable check for testing
* Revert "temporarily disable check for testing"
This reverts commit 4b5a022a5f.
* support visibility in test
* missed file
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* Fix mismatch between documents shown and citation numbers in text
When document order presented to LLM differs from order shown to user, wrong doc numbers are cited.
Fix:
- SearchTool.get_search_result now returns both the final and the initial ranking
- initial ranking is passed through a few objects and used for replacement in citation processing
Notes:
- the citation_num in the CitationInfo() object has not been changed.
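The substitution idea in this fix can be illustrated with a short sketch: citation numbers emitted by the LLM refer to positions in the final (post-reranking) order, and are rewritten to positions in the order shown to the user. This is a simplified illustration under assumed inputs (plain lists of document ids and `[n]`-style citations); `remap_citations` is a hypothetical helper, not the project's citation processor.

```python
import re


def remap_citations(
    text: str, final_doc_ids: list[str], display_doc_ids: list[str]
) -> str:
    """Rewrite [n] citations from final-ranking positions to displayed positions."""
    # 1-based rank in the order presented to the LLM -> document id.
    final_rank_to_doc = dict(enumerate(final_doc_ids, start=1))
    # Document id -> 1-based rank in the order shown to the user.
    doc_to_display_rank = {
        doc_id: rank for rank, doc_id in enumerate(display_doc_ids, start=1)
    }

    def _swap(match: re.Match) -> str:
        doc_id = final_rank_to_doc.get(int(match.group(1)))
        if doc_id is None or doc_id not in doc_to_display_rank:
            # Unknown citation number: leave the text untouched.
            return match.group(0)
        return f"[{doc_to_display_rank[doc_id]}]"

    # A single pass over the text avoids re-substituting already-mapped numbers.
    return re.sub(r"\[(\d+)\]", _swap, text)
```

With `final_doc_ids = ["b", "a", "c"]` and `display_doc_ids = ["a", "b", "c"]`, a citation `[1]` in the LLM output points at document `b`, which the user sees in position 2, so it is rewritten to `[2]`.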
* PR fixes
- linting
- removed erroneous tab
- added a substitution test case
- adjusted original citation extraction use case
* Included a key test and
* Fixed extra spaces
* Updated test documentation
Updated:
- test_citation_substitution (changed description)
- test_citation_processing (removed data only relevant for the substitution)
* better handling around index attempts that don't exist and remove unnecessary index attempt deletions
* don't delete index attempts, just update them
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* change text and formatting to guide users away from thinking "Back to Danswer" is a back button
* regular text color and different icon
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* More logging for external group syncing
* Fixed edge case where some spaces were not being fetched
* made refresh frequency for confluence syncs configurable
* clarity
* first cut at slack oauth flow
* fix usage of hooks
* fix button spacing
* add additional error logging
* no dev redirect
* cleanup
* comment work in progress
* move some stuff to ee, add some playwright tests for the oauth callback edge cases
* fix ee, fix test name
* fix tests
* code review fixes
* checkpoint
* add celery termination of the task
* rename to RedisConnectorPermissionSyncPayload, add RedisLock to more places, add get_active_search_settings
* rename payload
* pretty sure these weren't named correctly
* testing in progress
* cleanup
* remove space
* merge fix
* three dots animation on Pausing
* improve messaging when connector is stopped or killed and animate buttons
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* use indexing flag in db for manually triggering indexing
* add comment.
* only try to release the lock if we actually succeeded with the lock
* ensure we don't trigger manual indexing on anything but the primary search settings
* comment usage of primary search settings
* run check for indexing immediately after indexing triggers are set
* reorder fix
* all done except routing
* fixed initial changes
* added backend endpoint for duplicating a chat session from Slack
* got chat duplication routing done
* got login routing working
* improved answer handling
* finished all checks
* finished all!
* made sure it works with google oauth
* dont remove that lol
* fixed weird thing
* bad comments
* Add description for Google Gemini models and custom model icons for LiteLLM (OpenAI) proxied models
* Adds Vertex AI aliases for Claude
---------
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
* shared admin level test dependency
* change to on - push (recommended by chromatic)
* change playwright reporter to list, name test jobs
* use test tags ... much cleaner
* test vs prod
* try copying templates
* run with localhost?
* revert to dev
* new tests and a bit of refactoring
* add additional checks so that page snapshots reflect loaded state
* more admin tests
* User Management tests
* remaining admin pages
* test search and chat
* await fix and exclude UI that changes with dates.
* test overlapping connectors (but using a source that is way too big and slow, fix that next)
* pass thru secrets
* rename
* rename again
* now we are fixing it
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* standardized escaping of CQL strings
* think i found it
* fix
* should be fixed
* added handling for special linking behavior in confluence
* Update onyx_confluence.py
* Update onyx_confluence.py
---------
Co-authored-by: rkuo-danswer <rkuo@danswer.ai>
* more logs
* this fence should be set to None
* type hinting
* reset deletion attempt if conditions are inconsistent
* always clean up in db if we reach reconciliation
* add reset method
* more logging
* harden up error checking
* Made external permissioned users and slack users show diff
* finished
* Fix typing
* k
* Fix
* k
---------
Co-authored-by: Weves <chrisweaver101@gmail.com>
* initial PoC
* preliminary working config
* first cut at chromatic tests
* first cut at chromatic tests
* fix yaml
* fix yaml again
* use workingDir
* adapt playwright example
* remove env
* fix working directory
* fix more paths
* fix dir
* add playwright setup
* accidentally deleted a step
* update test
* think we don't need home.png right now
* remove unused home.png
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* add creator id to cc pair
* fix alembic head
* show email instead of UUID
* safer check on email
* make foreign key relationships optional
* always allow creator to edit (per hagen)
* use primary join
* no index_doc_batch spam
* try this again
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Make curators able to create permission synced connectors
* removed editing permission synced connectors for curators
* updated tests to use access type instead of is_public
* update copy
* in progress PoC
* working limited user, needs routes to be marked next
* make selected endpoint available to limited user role
* xfail on test_slack_prune
* add comment to sync function
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* cloud auth referral source
* minor clarity
* k
* minor modification to be best practice
* typing
* Update ReferralSourceSelector.tsx
* Update ReferralSourceSelector.tsx
---------
Co-authored-by: hagen-danswer <hagen@danswer.ai>
* doc_sync is refactored
* maybe this works
* tested to work!
* mypy fixes
* enabled integration tests
* fixed the test
* added external group sync
* testing should work now
* mypy
* confluence doc id fix
* got group sync working
* addressed feedback
* renamed some vars and fixed mypy
* conf fix?
* added wiki handling to confluence connector
* test fixes
* revert google drive connector
* fixed groups
* hotfix
* re-enable helm
* allow manual triggering
* change vespa host
* change vespa chart location
* update Chart.lock
* update ct.yaml with new vespa chart repo
* bump vespa to 0.2.5
* update Chart.lock
* update to vespa 0.2.6
* bump vespa to 0.2.7
* bump to 0.2.8
* bump version
* try appending the ordinal
* try new configmap
* bump vespa
* bump vespa
* add debug to see if we can figure out what ct install thinks is failing
* add debug flag to helm
* try disabling nginx because of KinD
* use helm-extra-set-args
* try command line
* try pointing test connection to the correct service name
* bump vespa to 0.2.12
* update chart.lock
* bump vespa to 0.2.13
* bump vespa to 0.2.14
* bump vespa
* bump vespa
* re-enable chart testing only on changes
* name the check more specifically than "lint-test"
* add some debugging
* try setting remote
* might have to specify chart dirs directly
* add comments
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* k
* clean up test embeddings
* nit
* minor update to ensure consistency
* minor organizational update
* minor updates
---------
Co-authored-by: pablodanswer <pablo@danswer.ai>
* add provisioning on data plane
* functional but scrappy
* minor cleanup
* minor clean up
* k
* simplify
* update provisioning
* improve import logic
* ensure proper conditional
* minor pydantic update
* minor config update
* nit
* wait for db before allowing worker to proceed (reduces error spam on container startup)
* fix session usage
* rework readiness probe logic to be less confusing and word ongoing probes better
* add vespa probe too
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* refactor RedisConnectorDeletion into RedisConnector
* refactor redis stop and deletion
* port pruning
* nest pruning
* port deletion
* port indexing
* refactor into individual files
* refactor redis connector index to take search settings at init
* move back to debug level log
* refactor doc set and user group (mostly)
* mypy fixes
* make pywikibot store its working files in a system provided temp directory
* move the config setting around
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* refactoring changes
* everything working for service account
* works with service account
* combined scopes
* copy change
* oauth prep
* Works for oauth and service account credentials
* mypy
* merge fixes
* Refactor Google Drive connector
* finished backend
* auth changes
* if it's stupid but it works, it's not stupid
* npm run dev fixes
* addressed change requests
* string fix
* minor fixes and cleanup
* spacing cleanup
* Update connector.py
* everything done
* testing!
* Delete backend/tests/daily/connectors/google_drive/file_generator.py
* cleaned up
---------
Co-authored-by: Chris Weaver <25087905+Weves@users.noreply.github.com>
* cleaner initial chat screen
* slightly cleaner animation
* cleaner cards
* use display name + minor updates to models
* minor update to ui
* remove logs
* update based on feedback
* minor nits
* formatting
* logging cleanup
* raise vespa_timeout to 15 by default
* implement backoff for document index methods specifically
* do not retry on 400 BAD_REQUEST
* handle RetryError
* actually check status code and fix type errors
* check for index swap
* initial bones
* kk
* k
* k:
* nit
* nit
* rebase + update
* nit
* minor update
* k
* minor integration test fixes
* nit
* ensure we build test docker image
* remove one space
* k
* ensure we wipe volumes
* remove log
* typo
* nit
* k
* k
* fresh indexing feature branch
* cherry pick test
* Revert "cherry pick test"
This reverts commit 2a62422068.
* set multitenant so that vespa fields match when indexing
* cleanup pass
* mypy
* pass through env var to control celery indexing concurrency
* comments on task kickoff and some logging improvements
* disentangle configuration for different workers and beats.
* use get_session_with_tenant
* comment out all of update.py
* rename to RedisConnectorIndexingFenceData
* first check num_indexing_workers
* refactor RedisConnectorIndexingFenceData
* comment out on_worker_process_init
* missed a file
* scope db sessions to short lengths
* update launch.json template
* fix types
* keep index button disabled until indexing is truly finished
* change priority order of tooltips
* should be using the logger from app_base
* if we run out of retries, just mark the doc as modified so it gets synced later
* tighten up the logging ... we know these are ID's
* add logging
* fresh indexing feature branch
* cherry pick test
* Revert "cherry pick test"
This reverts commit 2a62422068.
* set multitenant so that vespa fields match when indexing
* cleanup pass
* mypy
* pass through env var to control celery indexing concurrency
* comments on task kickoff and some logging improvements
* disentangle configuration for different workers and beats.
* use get_session_with_tenant
* comment out all of update.py
* rename to RedisConnectorIndexingFenceData
* first check num_indexing_workers
* refactor RedisConnectorIndexingFenceData
* comment out on_worker_process_init
* missed a file
* scope db sessions to short lengths
* update launch.json template
* fix types
* code review
Error Handling: Add more specific error handling to make it easier to debug issues.
Configuration Management: Use environment variables or a configuration file for settings like DOCUMENT_INDEX_NAME and DOCUMENT_ID_ENDPOINT.
Logging: Improve logging to include more details about the operations.
Retry Mechanism: Add a retry mechanism for network requests to handle transient errors.
Testing: Add unit tests for the functions to ensure they work as expected
* fresh indexing feature branch
* cherry pick test
* Revert "cherry pick test"
This reverts commit 2a62422068.
* set multitenant so that vespa fields match when indexing
* cleanup pass
* mypy
* pass through env var to control celery indexing concurrency
* comments on task kickoff and some logging improvements
* use get_session_with_tenant
* comment out all of update.py
* rename to RedisConnectorIndexingFenceData
* first check num_indexing_workers
* refactor RedisConnectorIndexingFenceData
* comment out on_worker_process_init
* fix where num_indexing_workers falls back
* remove extra brace
* use with for update instead of serializable
* remove tenant logic handled now by get_session_with_tenant
* remove usage of begin_nested ... it's not necessary
* use native rate limiting in the confluence client
* upgrade urllib3 to v2.2.3 to support retries in confluence client
* improve logging so that progress is visible.
* check last_pruned instead of is_pruning
* try using the ThreadingHTTPServer class for stability and avoiding blocking single-threaded behavior
* add startup delay to web server in test
* just explicitly return None if we can't parse the datetime
* switch to uvicorn for test stability
* try rate limiting through redis
* fix circular import issue
* fix bad formatting of family string
* Revert "fix bad formatting of family string"
This reverts commit be688899e5.
* redis usage optional
* disable test that doesn't match with new design
* fix formatting
* fix poorly structured doc id, fix empty page id, fix family_class_dispatch invalid name (no spaces), fix setting id with int pageid
* fix mediawiki test
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* in flight
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date
* add back writing to vespa on indexing
* actually working connector deletion
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
* fix imports
* refactor to use update_single
* mypy fixes
* add vespa test
* multiple celery workers
* update logs as well and set prefetch multipliers appropriate to the worker intent
* add db refresh to connector deletion
* add some preliminary locking
* organize tasks into separate files
* celery auto associates tasks created inside another task, which bloats the result metadata considerably. trail=False prevents this.
* code review fixes
* move monitor_usergroup_taskset to ee, improve logging
* add multi workers to dev_run_background_jobs.py
* update supervisord with some recommended settings for celery
* name celery workers and shorten dev script prefixing
* add configurable sql alchemy engine settings on startup (needed for various intents like API server, different celery workers and tasks, etc)
* fix comments
* autoscale sqlalchemy pool size to celery concurrency (allow override later?)
* supervisord needs the percent symbols escaped
* use name as primary check, some minor refactoring and type hinting too.
* stash merge (may not function yet)
* remove dead code
* more cleanup
* remove dead file
* we shouldn't be checking for deletion attempts in the db any more
* print cc_pair_id
* print status on status mismatch again
* add logging when cc_pair isn't present
* don't index any ingestion type connectors, and don't pause any connectors that aren't active
* add more specific check for deletion completion
* remove flaky mediawiki test site
* move is_pruning
* remove unused code
* remove old function
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* add tenant provisioning to data plane
* minor typing update
* ensure tenant router included
* proper auth check
* update disabling logic
* validated basic provisioning
* use new kv store
* set broker_connection_retry_on_startup to silence deprecation warning (we're OK with retrying on startup)
* env var for CELERY_BROKER_POOL_LIMIT
* add redis retry on timeout and health check interval
* set socket_keepalive = True
* remove shadow declaration of REDIS_HEALTH_CHECK_INTERVAL, add socket_keepalive_options where possible
* fix mypy complaint
* pass through vars in docker compose
* remove extra '='
* wrap in a try
* Allow config of background concurrency
* Add comment
* Fix light worker
* use backslashes to continue lines in supervisord with bash
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@danswer.ai>
* Added permission sync tests for Slack
* moved folders
* prune test + mypy
* added wait for indexing to cc_pair creation
* commented out check
* should fix other tests
* added slack channel pool
* fixed everything and mypy
* reduced flake
* disable trivy for the moment due to db download flakiness on their end causing the action to fail
* try hardcoding to amazon registry as others have suggested
* checkpoint
* k
* k
* need frontend
* add api key check + ui component
* add proper ports + icons + functions
* k
* k
* k
---------
Co-authored-by: pablodanswer <pablo@danswer.ai>
* experiment with build and no push
* use slightly more descriptive and consistent tags and names
* name integration test workflow consistently with other workflows
* put the tag back
* try runs-on s3 backend
* try adding runs-on cache
* add with key
* add a dummy path
* forget about multiline
* maybe we don't need runs-on cache immediately
* lower ram slightly, name test with a version bump
* don't need to explicitly include runs-on/cache for docker caching
* comment out flaky portion of knowledge chat test
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Xenforo forum parser support
* clarify ssl cert reqs
* missed a file
* add isLoadState function, fix up xenforo for data driven connector approach
* fixing a new edge case to skip an unexpected parsed element
* change documentsource to xenforo
* make doc id unique and comment what's happening
* remove stray log line
* address code review
---------
Co-authored-by: sime2408 <simun.sunjic@gmail.com>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* in flight
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date
* add back writing to vespa on indexing
* actually working connector deletion
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
* fix imports
* refactor to use update_single
* mypy fixes
* add vespa test
* multiple celery workers
* update logs as well and set prefetch multipliers appropriate to the worker intent
* add db refresh to connector deletion
* add some preliminary locking
* organize tasks into separate files
* celery auto associates tasks created inside another task, which bloats the result metadata considerably. trail=False prevents this.
* code review fixes
* move monitor_usergroup_taskset to ee, improve logging
* add multi workers to dev_run_background_jobs.py
* update supervisord with some recommended settings for celery
* name celery workers and shorten dev script prefixing
* add configurable sql alchemy engine settings on startup (needed for various intents like API server, different celery workers and tasks, etc)
* fix comments
* autoscale sqlalchemy pool size to celery concurrency (allow override later?)
* supervisord needs the percent symbols escaped
* use name as primary check, some minor refactoring and type hinting too.
* addressing code review
* fix import
* fix prune_documents_task references
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* rename classes and ignore deprecation warnings we mostly don't have control over
* copy pytest.ini
* ignore CryptographyDeprecationWarning
* fully qualify the warning
* test self hosted runner
* update more docker builds with self hosted runner
* convert everything to runs-on (except web container)
* try upping the RAM for future flake proofing
* initial Asana connector
* hint on how to get Asana workspace ID
* re-format with black
* re-order imports
* update asana connector for clarity
* minor robustification
* minor update to naming
* update for best practice
* update connector
---------
Co-authored-by: Daniel Naber <naber@danielnaber.de>
* Added permission syncing on the backend
* Reworked to work with celery
alembic fix
fixed test
* frontend changes
* got groups working
* added comments and fixed public docs
* fixed merge issues
* frontend complete!
* frontend cleanup and mypy fixes
* refactored connector access_type selection
* mypy fixes
* minor refactor and frontend improvements
* get to fetch
* renames and comments
* minor change to var names
* got curator stuff working
* addressed pablo's comments
* refactored user_external_group to reference users table
* implemented polling
* small refactor
* fixed a whoopsies on the frontend
* added scripts to seed dummy docs and test query times
* fixed frontend build issue
* alembic fix
* handled is_public overlap
* yuhong feedback
* added more checks for sync
* black
* mypy
* fixed circular import
* todos
* alembic fix
* alembic
* add pip retries to the github workflows too
* let's try running on amd64 ... docker builds are unusually flaky
* bump
* try large
* no yaml anchors
* switch back down to Amd64
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Deleting a connector should redirect to the indexing status page
* minor update to dev background jobs
* update refresh logic
* remove print statement
---------
Co-authored-by: pablodanswer <pablo@danswer.ai>
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* in flight
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date
* add back writing to vespa on indexing
* actually working connector deletion
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
* fix imports
* refactor to use update_single
* mypy fixes
* add vespa test
* add db refresh to connector deletion
* code review fixes
* move monitor_usergroup_taskset to ee, improve logging
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* use separate database number for celery result backend
* add comments
* add env var for celery's result_expires
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Move StandardAnswer to EE section of danswer/db/models
* Move StandardAnswer DB layer to EE
* Add EERequiredError for distinct error handling here
* Handle EE fallback for slack bot config
* Migrate all standard answer models to ee
* Flagging categories for removal
* Add missing versioned impl for update_slack_bot_config
---------
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
* persona
* all prepared excluding configuration
* more sensical model structure
* update tstream
* type updates
* rm
* quick and simple updates
* minor updates
* te
* ensure typing + naming
* remove old todo + rebase update
* remove unnecessary check
* allow setting of CORS origin
* simplify
* add environment variable + rename
* slightly more efficient
* simplify so mypy doesn't complain
* temp
* go back to my preferred formatting
* make it impossible to switch to non-image
* revert ports
* proper provider support
* remove unused imports
* minor rename
* simplify interface
* remove logs
* migration: add column "match_any_keywords" to StandardAnswer
* Implement any/all keyword matching for standard answers
* Add match_any_keywords to non-searchable fields
* Remove stray print
* Simplify Slack messages for any and all cases
---------
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
* Migrate standard answers implementations to ee/
* renaming
* Clean up slackbot non-ee standard answers import
* Move backend api/manage/standard_answer route to ee
* Move standard answers web UI to ee
* Hide standard answer controls in bot edit page
* Kwargs for fetch_versioned_implementation
* Add docstring explaining return types for handle_standard_answers
* Consolidate blocks into ee/handle_standard_answers
---------
Co-authored-by: Hyeong Joon Suh <hyeongjoonsuh@Hyeongs-MacBook-Pro.local>
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
* Support regex in standard answers
* fix mypy
* Add match_regex boolean column to StandardAnswer
* Add match_regex flag and validation to Pydantic models
* GET /manage/admin/standard-answer: add match_regex to create_standard_answer
* PATCH /manage/admin/standard-answer/:id: add match_regex to update_standard_answer
* Add "Match Regex" toggle to standard answer form
* Decode error pattern in case it's bytes
* Refactor regex support to use match_regex flag instead of supplemental tuple
* Better error handling for invalid regexes
* Show "match regex" in table and style keywords appropriately
* Fix stale UI copy for non-"match_regex" branch
* Fix stale docstring in find_matching_standard_answers
* Update down_revision to reflect most recent migration
* Update UI copy
* Initial implementation of match group display
* Fix pydantic StandardAnswer vs SQLAlchemy StandardAnswer model usage
* Update docstring return type
* Fix missing key prop
---------
Co-authored-by: Hyeong Joon Suh <hyeongjoonsuh@Hyeongs-MacBook-Pro.local>
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
* Reorder and clarify dependency installation instructions
* Clarify instructions for local development with Docker external deps vs full Docker stack
* Final words at the end of the local setup process
---------
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date
* add back writing to vespa on indexing
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Fail instead of continuing if vespa cannot be reached within the timeout period
* improve startup readability
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Add user when they interact outside of UI (e.g. Slack bot)
* fix mypy errors
* don't use user manager to avoid async messiness
* fix email is none scenario
* fix mypy
* make code slightly clearer
* PR comments
* get slack email in generate button as well
* fix alembic migration
* update name to be more descriptive
---------
Co-authored-by: Hyeong Joon Suh <hyeongjoonsuh@Hyeongs-MacBook-Pro.local>
The commit skips reading 'external_object_instance_page' blocks in the NotionConnector due to the lack of support in the Notion API. This change is in response to the issue #1761.
Co-authored-by: Cola Chen <6825116+colachg@users.noreply.github.com>
* validate web list
* update pdf extraction of metadata
* remove pdf + log
* stricter type enforcing
* fix up indexing widths
* minor formatting
* add list case
* check for empty metadata
* first cut at redis
* fix startup dependencies on redis
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* update contributing guide
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Added pagination to individual connector pages
* I cooked
* Gordon Ramsay in this b
* meepe
* properly calculated max chunk and switch dict to array
* chunks -> batches
* increased max page size
* renamed var
* initial commit
* almost done
* finished 3 tests
* minor refactor
* built out initial permission tests
* reworked test_deletion
* removed logging
* all original tests have been converted
* renamed user_groups to user_group
* mypy
* added test for doc set permissions
* unified naming for manager methods
* Refactored models and added new deletion test
* minor additions
* better logging+fixed input variables
* commented out failed tests
* Added readme
* readme update
* Added auth to IT
set auth_type to basic and require_email_verification to false
* Update run-it.yml
* used verify and added to readme
* added api key manager
* get accurate model output max
* squash
* updated max default tokens
* rename + use fallbacks
* functional
* remove max tokens
* update naming
* comment out function to prevent mypy issues
* ran bump-pydantic
* replace root_validator with model_validator
* mostly working. some alternate assistant error. changed root_validator and typing_extensions
* working generation chat. changed type
* replacing .dict with .model_dump
* argument needed to bring model_dump up to parity with dict()
* fix a few remaining issues -- working with llama and gpt
* updating requirements file
* more requirement updates
* more requirement updates
* fix to make search work
* return type fix:
* halfway types change
* fixes for mypy and pydantic:
* endpoint fix
* fix pydantic protected namespaces
* it works!
* removed unnecessary None initializations
* better logging
* changed default values to empty lists
* mypy fixes
* fixed array defaulting
---------
Co-authored-by: hagen-danswer <hagen@danswer.ai>
* add new user provider hook
* account for additional logic
* add users
* remove is loading
* Curator polish
* useeffect -> provider + effect
* squash
* use use user for user default models
* squash
* Added ability to add users to groups among other things
* final polish
* added connection button to groups
* mypy fix
* Improved document set clarity
* string fixes
---------
Co-authored-by: pablodanswer <pablo@danswer.ai>
* Added backend support for curator role
* modal refactor
* finalized first 2 commits
same as before
finally
what was it for
* added credential, cc_pair, and cleanup
mypy is super helpful hahahahahahahahahahahaha
* curator support for personas
* added connector management permission checks
* fixed the connector creation flow
* added document access to curator
* small cleanup added comments and started ui
* groups and assistant editor
* Persona frontend
* Document set frontend
* cleaned up the entire frontend
* alembic fix
* Minor fixes
* credentials section
* some credential updates
* removed logging statements
* fixed try catch
* fixed model name
* made everything happen in one db commit
* Final cleanup
* cleaned up fast code
* mypy/build fixes
* polish
* more token rate limit polish
* fixed weird credential permissions
* Addressed chris feedback
* addressed pablo feedback
* fixed alembic
* removed deduping and caching
* polish!!!!
* add regenerate
* functional once again post rebase but quite ugly
* validated + cleaner UI
* more robust implementation for first messages
* squash
* remove parameter
* proper margin
* clarify for future programmers
* remove some logs
* self nit pick - smoother ux
* more self-nits
* stroke line cap
* rebase
* support indexing attachments as separate docs when not part of a page
* fix time filter, fix batch handling, fix returned number of attachments processed
* backend changes to handle partial completion of index attempts
* typo fix
* Display partial success in UI
* make log timing more readable by limiting printed precision to milliseconds
* forgot alembic
* initial cut at "completed with errors" indexing
* remove and reorganize unused imports
* show view errors while indexing is in progress
* code review fixes
* add ux improvements
* add danswer version display
* show version properly
* improve copy + add web version to settings context
* update copy + danswer version
* stopgap: clarify text on standard answer page for improved UX
* replace apostrophe
* using tailwind:
---------
Co-authored-by: Jos Van der westhuizen <jos@danser.ai>
* allow admin role api keys
* bump to rerun deployment
* types needs explicit export now for APIKey
* remove api_key.role, use User.role instead
* fix formatting
* formatting
* formatting
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* add send-message-simple-with-history endpoint to support ramp. avoids bad json output in models and allows client to pass history in instead of maintaining it in our own session
* slightly better error checking
* addressing code review
* reject on any empty message
* update test naming
* avoid reindexing secondary indexes after they succeed
* use postgres application names to facilitate connection debugging
* centralize all postgres application_name constants in the constants file
* missed a couple of files
* mypy fixes
* update dev background script
* also allow access to a persona if the user is in the list of authorized users or groups
* add comment on potential performance improvements
* work around for mypy typing
My website (https://shukantpal.com) uses Let's Encrypt certificates, which aren't accepted by the Python urllib certificate verifier for some reason. My website is set up correctly otherwise (https://www.sslshopper.com/ssl-checker.html#hostname=www.shukantpal.com)
This change adds a fix so the correct traceback is shown in Danswer, instead of a generic "unable to connect, check your Internet connection".
* Added ability to control LLM access based on group
* completed relationship deletion
* cleaned up function
* added comments
* fixed frontend strings
* mypy fixes
* added case handling for deletion of user groups
* hidden advanced options now
* removed unnecessary code
- cron: '0 11 * * *' # Runs every day at 3 AM PST / 4 AM PDT / 11 AM UTC
permissions:
  # contents: write # only for delete-branch option
  issues: write
  pull-requests: write
jobs:
  stale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/stale@v9
        with:
          stale-issue-message: 'This issue is stale because it has been open 75 days with no activity. Remove stale label or comment or this will be closed in 15 days.'
          stale-pr-message: 'This PR is stale because it has been open 75 days with no activity. Remove stale label or comment or this will be closed in 15 days.'
          close-issue-message: 'This issue was closed because it has been stalled for 90 days with no activity.'
          close-pr-message: 'This PR was closed because it has been stalled for 90 days with no activity.'
          days-before-stale: 75
          # days-before-close: 90 # uncomment after we test stale behavior
# Copy this file to .env at the base of the repo and fill in the <REPLACE THIS> values
# This will help with development iteration speed and reduce repeat tasks for dev
# Copy this file to .env in the .vscode folder
# Fill in the <REPLACE THIS> values as needed, it is recommended to set the GEN_AI_API_KEY value to avoid having to set up an LLM in the UI
# Also check out danswer/backend/scripts/restart_containers.sh for a script to restart the containers which Danswer relies on outside of VSCode/Cursor processes
# For local dev, often user Authentication is not needed
AUTH_TYPE=disabled
# Skip warm up for dev
SKIP_WARM_UP=True
# Always keep these on for Dev
# Logs all model prompts to stdout
@@ -15,7 +17,7 @@ LOG_LEVEL=debug
# This passes top N results to LLM an additional time for reranking prior to answer generation
# This step is quite heavy on token usage so we disable it for dev generally
DISABLE_LLM_CHUNK_FILTER=True
DISABLE_LLM_DOC_RELEVANCE=False
# Useful if you want to toggle auth on/off (google_oauth/OIDC specifically)
- [Nginx](https://nginx.org/) (Not needed for development flows generally)
This guide provides instructions to set up the Danswer specific services outside of Docker because it's easier for
development purposes but also feel free to just use the containers and update with local changes by providing the
`--build` flag.
> **Note:**
> This guide provides instructions to build and run Onyx locally from source with Docker containers providing the above external software. We believe this combination is easier for
> development purposes. If you prefer to use pre-built container images, we provide instructions on running the full Onyx stack within Docker below.
### Local Set Up
It is recommended to use Python version 3.11
Be sure to use Python version 3.11. For instructions on installing Python 3.11 on macOS, refer to the [CONTRIBUTING_MACOS.md](./CONTRIBUTING_MACOS.md) readme.
If using a lower version, modifications will have to be made to the code.
If using a higher version, the version of Tensorflow we use may not be available for your platform.
If using a higher version, sometimes some libraries will not be available (i.e. we had problems with Tensorflow in the past with higher versions of python).
#### Backend: Python requirements
#### Installing Requirements
Currently, we use pip and recommend creating a virtual environment.
For convenience here's a command for it:
```bash
python -m venv .venv
source .venv/bin/activate
```
--> Note that this virtual environment MUST NOT be set up WITHIN the danswer
directory
> **Note:**
> This virtual environment MUST NOT be set up WITHIN the onyx directory if you plan on using mypy within certain IDEs.
> For simplicity, we recommend setting up the virtual environment outside of the onyx directory.
_For Windows, activate the virtual environment using Command Prompt:_
```bash
.venv\Scripts\activate
```
If using PowerShell, the command slightly differs:
_For Windows (for compatibility with both PowerShell and Command Prompt):_
```bash
powershell -Command "
$env:AUTH_TYPE='disabled'
uvicorn danswer.main:app --reload --port 8080
uvicorn onyx.main:app --reload --port 8080
"
```
Note: if you need finer logging, add the additional environment variable `LOG_LEVEL=DEBUG` to the relevant services.
> **Note:**
> If you need finer logging, add the additional environment variable `LOG_LEVEL=DEBUG` to the relevant services.
#### Wrapping up
You should now have 4 servers running:
- Web server
- Backend API
- Model server
- Background jobs
Now, visit `http://localhost:3000` in your browser. You should see the Onyx onboarding wizard where you can connect your external LLM provider to Onyx.
You've successfully set up a local Onyx instance! 🏁
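As a quick sanity check after the steps above, the following is a minimal sketch that reports whether anything is listening on the two ports this guide names (3000 for the web server, 8080 for the backend API). It assumes bash (it relies on bash's `/dev/tcp` pseudo-device) and that the services are bound to `localhost`. The helper name `port_open` is hypothetical and not part of the Onyx tooling. The model server and background jobs are not checked because their ports are not stated here.

```bash
#!/usr/bin/env bash
# Sketch only: port_open is a hypothetical helper, not an Onyx script.
# Assumes bash, since /dev/tcp is a bash feature.

port_open() {
  # Succeeds if a TCP connection to localhost:$1 can be opened.
  # The subshell closes the connection again as soon as it exits.
  (exec 3<>"/dev/tcp/localhost/$1") 2>/dev/null
}

# 3000 = web server, 8080 = backend API (the ports used in this guide)
for port in 3000 8080; do
  if port_open "$port"; then
    echo "port $port: listening"
  else
    echo "port $port: nothing listening"
  fi
done
```

A "nothing listening" line usually means the corresponding process has not started yet or exited with an error, so its terminal output is the next place to look.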
#### Running the Onyx application in a container
You can run the full Onyx application stack from pre-built images including all external software dependencies.
Navigate to `onyx/deployment/docker_compose` and run:
### Formatting and Linting
#### Backend
For the backend, you'll need to set up pre-commit hooks (black / reorder-python-imports).
First, install pre-commit (if you don't have it already) following the instructions
[here](https://pre-commit.com/#installation).
Then, from the `danswer/backend` directory, run:
```bash
pre-commit install
docker compose -f docker-compose.dev.yml -p onyx-stack up -d
```
Additionally, we use `mypy` for static type checking.
Danswer is fully type-annotated, and we would like to keep it that way!
To run the mypy checks manually, run `python -m mypy .` from the `danswer/backend` directory.
After Docker pulls and starts these containers, navigate to `http://localhost:3000` to use Onyx.
If you want to make changes to Onyx and run those changes in Docker, you can also build a local version of the Onyx container images that incorporates your changes like so:
#### Web
We use `prettier` for formatting. The desired version (2.8.8) will be installed via a `npm i` from the `danswer/web` directory.
To run the formatter, use `npx prettier --write .` from the `danswer/web` directory.
Please double check that prettier passes before creating a pull request.
```bash
docker compose -f docker-compose.dev.yml -p onyx-stack up -d --build
```
### Release Process
Danswer follows the semver versioning standard.
Onyx loosely follows the SemVer versioning standard.
Major changes are released with a "minor" version bump. Currently we use patch release versions to indicate small feature changes.
A set of Docker containers will be pushed automatically to DockerHub with every tag.
You can see the containers [here](https://hub.docker.com/search?q=danswer%2F).
You can see the containers [here](https://hub.docker.com/search?q=onyx%2F).
The base instructions to set up the development environment are located in [CONTRIBUTING.md](https://github.com/onyx-dot-app/onyx/blob/main/CONTRIBUTING.md).
### Setting up Python
Ensure [Homebrew](https://brew.sh/) is already set up.
Then install python 3.11.
```bash
brew install python@3.11
```
Add python 3.11 to your path: add the following line to ~/.zshrc
@@ -2,9 +2,9 @@ Copyright (c) 2023-present DanswerAI, Inc.
Portions of this software are licensed as follows:
* All content that resides under "ee" directories of this repository, if that directory exists, is licensed under the license defined in "backend/ee/LICENSE". Specifically all content under "backend/ee" and "web/src/app/ee" is licensed under the license defined in "backend/ee/LICENSE".
* All third party components incorporated into the Danswer Software are licensed under the original license provided by the owner of the applicable component.
* Content outside of the above mentioned directories or restrictions above is available under the "MIT Expat" license as defined below.
- All content that resides under "ee" directories of this repository, if that directory exists, is licensed under the license defined in "backend/ee/LICENSE". Specifically all content under "backend/ee" and "web/src/app/ee" is licensed under the license defined in "backend/ee/LICENSE".
- All third party components incorporated into the Onyx Software are licensed under the original license provided by the owner of the applicable component.
- Content outside of the above mentioned directories or restrictions above is available under the "MIT Expat" license as defined below.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
**To try it out for free and get started in seconds, check out [Onyx Cloud](https://cloud.onyx.app/signup)**.
Danswer can easily be run locally (even on a laptop) or deployed on a virtual machine with a single
`docker compose` command. Check out our [docs](https://docs.danswer.dev/quickstart) to learn more.
Onyx can also be run locally (even on a laptop) or deployed on a virtual machine with a single
`docker compose` command. Check out our [docs](https://docs.onyx.app/quickstart) to learn more.
We also have built-in support for deployment on Kubernetes. Files for that can be found [here](https://github.com/danswer-ai/danswer/tree/main/deployment/kubernetes).
We also have built-in support for high-availability/scalable deployment on Kubernetes.
* Chat UI with the ability to select documents to chat with.
* Create custom AI Assistants with different prompts and backing knowledge sets.
* Connect Danswer with LLM of your choice (self-host for a fully airgapped solution).
* Document Search + AI Answers for natural language queries.
* Connectors to all common workplace tools like Google Drive, Confluence, Slack, etc.
* Slack integration to get answers and search results directly in Slack.
## 🔍 Other Notable Benefits of Onyx
- Custom deep learning models for indexing and inference time, only through Onyx + learning from user feedback.
- Flexible security features like SSO (OIDC/SAML/OAuth2), RBAC, encryption of credentials, etc.
- Knowledge curation features like document-sets, query history, usage analytics, etc.
- Scalable deployment options tested up to tens of thousands of users and hundreds of millions of documents.
## 🚧 Roadmap
* Chat/Prompt sharing with specific teammates and user groups.
* Multimodal model support, chat with images, video etc.
* Choosing between LLMs and parameters during chat session.
* Tool calling and agent configuration options.
* Organizational understanding and ability to locate and suggest experts from your team.
## Other Notable Benefits of Danswer
* User Authentication with document level access management.
* Best in class Hybrid Search across all sources (BM-25 + prefix aware embedding models).
* Admin Dashboard to configure connectors, document-sets, access, etc.
* Custom deep learning models + learn from user feedback.
* Easy deployment and ability to host Danswer anywhere of your choosing.
- New methods in information retrieval (StructRAG, LightGraphRAG, etc.)
- Personalized Search
- Organizational understanding and ability to locate and suggest experts from your team.
- Code Search
- SQL and Structured Query Language
## 🔌 Connectors
Efficiently pulls the latest changes from:
* Slack
* GitHub
* Google Drive
* Confluence
* Jira
* Zendesk
* Gmail
* Notion
* Gong
* Slab
* Linear
* Productboard
* Guru
* Bookstack
* Document360
* Sharepoint
* Hubspot
* Local Files
* Websites
* And more ...
Keep knowledge and access in sync across 40+ connectors:
## 📚 Editions
- Google Drive
- Confluence
- Slack
- Gmail
- Salesforce
- Microsoft Sharepoint
- Github
- Jira
- Zendesk
- Gong
- Microsoft Teams
- Dropbox
- Local Files
- Websites
- And more ...
There are two editions of Danswer:
See the full list [here](https://docs.onyx.app/connectors).
* Danswer Community Edition (CE) is available freely under the MIT Expat license. This version has ALL the core features discussed above. This is the version of Danswer you will get if you follow the Deployment guide above.
* Danswer Enterprise Edition (EE) includes extra features that are primarily useful for larger organizations. Specifically, this includes:
  * Single Sign-On (SSO), with support for both SAML and OIDC
  * Role-based access control
  * Document permission inheritance from connected sources
  * Usage analytics and query history accessible to admins
  * Whitelabeling
  * API key authentication
  * Encryption of secrets
  * And many more! Check out [our website](https://www.danswer.ai/) for the latest.
To try the Danswer Enterprise Edition:
## 📚 Licensing
There are two editions of Onyx:
- Onyx Community Edition (CE) is available freely under the MIT Expat license. Simply follow the Deployment guide above.
- Onyx Enterprise Edition (EE) includes extra features that are primarily useful for larger organizations.
For feature details, check out [our website](https://www.onyx.app/pricing).
2. For self-hosting the Enterprise Edition, contact us at [founders@onyx.app](mailto:founders@onyx.app) or book a call with us on our [Cal](https://cal.com/team/onyx/founders).
2. For self-hosting, contact us at [founders@danswer.ai](mailto:founders@danswer.ai) or book a call with us on our [Cal](https://cal.com/team/danswer/founders).
## 💡 Contributing
Looking to contribute? Please check out the [Contribution Guide](CONTRIBUTING.md) for more details.
Some files were not shown because too many files have changed in this diff