* temporarily disabling validate indexing fences
* add back a few startup checks in the cloud
* use common vespa client to perform health check
* log vespa url and try using http1 on light worker index methods
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* k
* functional iam auth
* k
* k
* improve typing
* add deployment options
* cleanup
* quick clean up
* minor cleanup
* additional clarity for db session operations
* nit
* k
* k
* update configs
* docker compose spacing
* allow beat tasks to expire. it isn't important that they all run
* validate fences are in a good state and cancel/fail them if not
* add function timings for important beat tasks
* optimize lookups, add lots of comments
* review changes
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* first cut at slack oauth flow
* fix usage of hooks
* fix button spacing
* add additional error logging
* no dev redirect
* early cut at google drive oauth
* second pass
* switch to production URIs
* try handling oauth_interactive differently
* pass through client id and secret if uploaded
* fix call
* fix test
* temporarily disable check for testing
* Revert "temporarily disable check for testing"
This reverts commit 4b5a022a5f.
* support visibility in test
* missed file
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* Fix mismatch between documents shown and citation numbers in text
When the document order presented to the LLM differs from the order shown to the user, the wrong doc numbers are cited.
Fix:
- SearchTool.get_search_result now returns both the final and the initial ranking
- initial ranking is passed through a few objects and used for replacement in citation processing
Notes:
- the citation_num in the CitationInfo() object has not been changed.
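A minimal sketch of the substitution idea described above, not the code from this change. The names `build_citation_map` and `remap_citations` and the `[n]` citation syntax are assumptions made for illustration only.

```python
import re


def build_citation_map(llm_order: list[str], display_order: list[str]) -> dict[int, int]:
    """Map each 1-based position in the LLM's document order to the
    1-based position of the same document in the order shown to the user."""
    display_pos = {doc_id: i + 1 for i, doc_id in enumerate(display_order)}
    return {
        i + 1: display_pos[doc_id]
        for i, doc_id in enumerate(llm_order)
        if doc_id in display_pos
    }


def remap_citations(text: str, mapping: dict[int, int]) -> str:
    """Rewrite every [n] in one pass so that swapped numbers are not replaced twice."""

    def _sub(match: re.Match) -> str:
        num = int(match.group(1))
        # Numbers without a mapping entry are left unchanged.
        return f"[{mapping.get(num, num)}]"

    return re.sub(r"\[(\d+)\]", _sub, text)


# Hypothetical example: the LLM saw the documents as b, a, c; the user sees a, b, c.
mapping = build_citation_map(["b", "a", "c"], ["a", "b", "c"])
answer = remap_citations("See [1] and [2], also [3].", mapping)
```

Doing the replacement in a single regex pass avoids the double-substitution bug that chained string replaces would hit when two numbers swap.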
* PR fixes
- linting
- removed erroneous tab
- added a substitution test case
- adjusted original citation extraction use case
* Included a key test and
* Fixed extra spaces
* Updated test documentation
Updated:
- test_citation_substitution (changed description)
- test_citation_processing (removed data only relevant for the substitution)
* better handling around index attempts that don't exist and remove unnecessary index attempt deletions
* don't delete index attempts, just update them
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* change text and formatting to guide users away from thinking "Back to Danswer" is a back button
* regular text color and different icon
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* More logging for external group syncing
* Fixed edge case where some spaces were not being fetched
* made refresh frequency for confluence syncs configurable
* clarity
* first cut at slack oauth flow
* fix usage of hooks
* fix button spacing
* add additional error logging
* no dev redirect
* cleanup
* comment work in progress
* move some stuff to ee, add some playwright tests for the oauth callback edge cases
* fix ee, fix test name
* fix tests
* code review fixes
* checkpoint
* add celery termination of the task
* rename to RedisConnectorPermissionSyncPayload, add RedisLock to more places, add get_active_search_settings
* rename payload
* pretty sure these weren't named correctly
* testing in progress
* cleanup
* remove space
* merge fix
* three dots animation on Pausing
* improve messaging when connector is stopped or killed and animate buttons
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* use indexing flag in db for manually trigger indexing
* add comment.
* only try to release the lock if we actually succeeded with the lock
* ensure we don't trigger manual indexing on anything but the primary search settings
* comment usage of primary search settings
* run check for indexing immediately after indexing triggers are set
* reorder fix
* all done except routing
* fixed initial changes
* added backend endpoint for duplicating a chat session from Slack
* got chat duplication routing done
* got login routing working
* improved answer handling
* finished all checks
* finished all!
* made sure it works with google oauth
* don't remove that lol
* fixed weird thing
* bad comments
* Add description for Google Gemini models and custom model icons for LiteLLM (OpenAI) proxied models
* Adds Vertex AI aliases for Claude
---------
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
* shared admin level test dependency
* change to on - push (recommended by chromatic)
* change playwright reporter to list, name test jobs
* use test tags ... much cleaner
* test vs prod
* try copying templates
* run with localhost?
* revert to dev
* new tests and a bit of refactoring
* add additional checks so that page snapshots reflect loaded state
* more admin tests
* User Management tests
* remaining admin pages
* test search and chat
* await fix and exclude UI that changes with dates.
* test overlapping connectors (but using a source that is way too big and slow, fix that next)
* pass thru secrets
* rename
* rename again
* now we are fixing it
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* standardized escaping of CQL strings
* think i found it
* fix
* should be fixed
* added handling for special linking behavior in confluence
* Update onyx_confluence.py
* Update onyx_confluence.py
---------
Co-authored-by: rkuo-danswer <rkuo@danswer.ai>
* more logs
* this fence should be set to None
* type hinting
* reset deletion attempt if conditions are inconsistent
* always clean up in db if we reach reconciliation
* add reset method
* more logging
* harden up error checking
* Made external permissioned users and slack users show diff
* finished
* Fix typing
* k
* Fix
* k
---------
Co-authored-by: Weves <chrisweaver101@gmail.com>
* initial PoC
* preliminary working config
* first cut at chromatic tests
* first cut at chromatic tests
* fix yaml
* fix yaml again
* use workingDir
* adapt playwright example
* remove env
* fix working directory
* fix more paths
* fix dir
* add playwright setup
* accidentally deleted a step
* update test
* think we don't need home.png right now
* remove unused home.png
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* add creator id to cc pair
* fix alembic head
* show email instead of UUID
* safer check on email
* make foreign key relationships optional
* always allow creator to edit (per hagen)
* use primary join
* no index_doc_batch spam
* try this again
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Make curators able to create permission synced connectors
* removed editing permission synced connectors for curators
* updated tests to use access type instead of is_public
* update copy
* in progress PoC
* working limited user, needs routes to be marked next
* make selected endpoint available to limited user role
* xfail on test_slack_prune
* add comment to sync function
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* cloud auth referral source
* minor clarity
* k
* minor modification to be best practice
* typing
* Update ReferralSourceSelector.tsx
* Update ReferralSourceSelector.tsx
---------
Co-authored-by: hagen-danswer <hagen@danswer.ai>
* doc_sync is refactored
* maybe this works
* tested to work!
* mypy fixes
* enabled integration tests
* fixed the test
* added external group sync
* testing should work now
* mypy
* confluence doc id fix
* got group sync working
* addressed feedback
* renamed some vars and fixed mypy
* conf fix?
* added wiki handling to confluence connector
* test fixes
* revert google drive connector
* fixed groups
* hotfix
* re-enable helm
* allow manual triggering
* change vespa host
* change vespa chart location
* update Chart.lock
* update ct.yaml with new vespa chart repo
* bump vespa to 0.2.5
* update Chart.lock
* update to vespa 0.2.6
* bump vespa to 0.2.7
* bump to 0.2.8
* bump version
* try appending the ordinal
* try new configmap
* bump vespa
* bump vespa
* add debug to see if we can figure out what ct install thinks is failing
* add debug flag to helm
* try disabling nginx because of KinD
* use helm-extra-set-args
* try command line
* try pointing test connection to the correct service name
* bump vespa to 0.2.12
* update chart.lock
* bump vespa to 0.2.13
* bump vespa to 0.2.14
* bump vespa
* bump vespa
* re-enable chart testing only on changes
* name the check more specifically than "lint-test"
* add some debugging
* try setting remote
* might have to specify chart dirs directly
* add comments
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* k
* clean up test embeddings
* nit
* minor update to ensure consistency
* minor organizational update
* minor updates
---------
Co-authored-by: pablodanswer <pablo@danswer.ai>
* add provisioning on data plane
* functional but scrappy
* minor cleanup
* minor clean up
* k
* simplify
* update provisioning
* improve import logic
* ensure proper conditional
* minor pydantic update
* minor config update
* nit
* wait for db before allowing worker to proceed (reduces error spam on container startup)
* fix session usage
* rework readiness probe logic to be less confusing and word ongoing probes better
* add vespa probe too
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* refactor RedisConnectorDeletion into RedisConnector
* refactor redis stop and deletion
* port pruning
* nest pruning
* port deletion
* port indexing
* refactor into individual files
* refactor redis connector index to take search settings at init
* move back to debug level log
* refactor doc set and user group (mostly)
* mypy fixes
* make pywikibot store its working files in a system provided temp directory
* move the config setting around
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* refactoring changes
* everything working for service account
* works with service account
* combined scopes
* copy change
* oauth prep
* Works for oauth and service account credentials
* mypy
* merge fixes
* Refactor Google Drive connector
* finished backend
* auth changes
* if it's stupid but it works, it's not stupid
* npm run dev fixes
* addressed change requests
* string fix
* minor fixes and cleanup
* spacing cleanup
* Update connector.py
* everything done
* testing!
* Delete backend/tests/daily/connectors/google_drive/file_generator.py
* cleaned up
---------
Co-authored-by: Chris Weaver <25087905+Weves@users.noreply.github.com>
* cleaner initial chat screen
* slightly cleaner animation
* cleaner cards
* use display name + minor updates to models
* minor update to UI
* remove logs
* update based on feedback
* minor nits
* formatting
* logging cleanup
* raise vespa_timeout to 15 by default
* implement backoff for document index methods specifically
* do not retry on 400 BAD_REQUEST
* handle RetryError
* actually check status code and fix type errors
* check for index swap
* initial bones
* kk
* k
* k
* nit
* nit
* rebase + update
* nit
* minor update
* k
* minor integration test fixes
* nit
* ensure we build test docker image
* remove one space
* k
* ensure we wipe volumes
* remove log
* typo
* nit
* k
* k
* fresh indexing feature branch
* cherry pick test
* Revert "cherry pick test"
This reverts commit 2a62422068.
* set multitenant so that vespa fields match when indexing
* cleanup pass
* mypy
* pass through env var to control celery indexing concurrency
* comments on task kickoff and some logging improvements
* disentangle configuration for different workers and beats.
* use get_session_with_tenant
* comment out all of update.py
* rename to RedisConnectorIndexingFenceData
* first check num_indexing_workers
* refactor RedisConnectorIndexingFenceData
* comment out on_worker_process_init
* missed a file
* scope db sessions to short lengths
* update launch.json template
* fix types
* keep index button disabled until indexing is truly finished
* change priority order of tooltips
* should be using the logger from app_base
* if we run out of retries, just mark the doc as modified so it gets synced later
* tighten up the logging ... we know these are IDs
* add logging
* code review
- Error Handling: Add more specific error handling to make it easier to debug issues.
- Configuration Management: Use environment variables or a configuration file for settings like DOCUMENT_INDEX_NAME and DOCUMENT_ID_ENDPOINT.
- Logging: Improve logging to include more details about the operations.
- Retry Mechanism: Add a retry mechanism for network requests to handle transient errors.
- Testing: Add unit tests for the functions to ensure they work as expected.
* fresh indexing feature branch
* cherry pick test
* Revert "cherry pick test"
This reverts commit 2a62422068.
* set multitenant so that vespa fields match when indexing
* cleanup pass
* mypy
* pass through env var to control celery indexing concurrency
* comments on task kickoff and some logging improvements
* use get_session_with_tenant
* comment out all of update.py
* rename to RedisConnectorIndexingFenceData
* first check num_indexing_workers
* refactor RedisConnectorIndexingFenceData
* comment out on_worker_process_init
* fix where num_indexing_workers falls back
* remove extra brace
* use with for update instead of serializable
* remove tenant logic handled now by get_session_with_tenant
* remove usage of begin_nested ... it's not necessary
* use native rate limiting in the confluence client
* upgrade urllib3 to v2.2.3 to support retries in confluence client
* improve logging so that progress is visible.
* check last_pruned instead of is_pruning
* try using the ThreadingHTTPServer class for stability and avoiding blocking single-threaded behavior
* add startup delay to web server in test
* just explicitly return None if we can't parse the datetime
* switch to uvicorn for test stability
* try rate limiting through redis
* fix circular import issue
* fix bad formatting of family string
* Revert "fix bad formatting of family string"
This reverts commit be688899e5.
* redis usage optional
* disable test that doesn't match with new design
* fix formatting
* fix poorly structured doc id, fix empty page id, fix family_class_dispatch invalid name (no spaces), fix setting id with int pageid
* fix mediawiki test
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* in flight
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date
* add back writing to vespa on indexing
* actually working connector deletion
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
* fix imports
* refactor to use update_single
* mypy fixes
* add vespa test
* multiple celery workers
* update logs as well and set prefetch multipliers appropriate to the worker intent
* add db refresh to connector deletion
* add some preliminary locking
* organize tasks into separate files
* celery auto associates tasks created inside another task, which bloats the result metadata considerably. trail=False prevents this.
* code review fixes
* move monitor_usergroup_taskset to ee, improve logging
* add multi workers to dev_run_background_jobs.py
* update supervisord with some recommended settings for celery
* name celery workers and shorten dev script prefixing
* add configurable sql alchemy engine settings on startup (needed for various intents like API server, different celery workers and tasks, etc)
* fix comments
* autoscale sqlalchemy pool size to celery concurrency (allow override later?)
* supervisord needs the percent symbols escaped
* use name as primary check, some minor refactoring and type hinting too.
* stash merge (may not function yet)
* remove dead code
* more cleanup
* remove dead file
* we shouldn't be checking for deletion attempts in the db any more
* print cc_pair_id
* print status on status mismatch again
* add logging when cc_pair isn't present
* don't index any ingestion-type connectors, and don't pause any connectors that aren't active
* add more specific check for deletion completion
* remove flaky mediawiki test site
* move is_pruning
* remove unused code
* remove old function
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* add tenant provisioning to data plane
* minor typing update
* ensure tenant router included
* proper auth check
* update disabling logic
* validated basic provisioning
* use new kv store
* set broker_connection_retry_on_startup to silence deprecation warning (we're OK with retrying on startup)
* env var for CELERY_BROKER_POOL_LIMIT
* add redis retry on timeout and health check interval
* set socket_keepalive = True
* remove shadow declaration of REDIS_HEALTH_CHECK_INTERVAL, add socket_keepalive_options where possible
* fix mypy complaint
* pass through vars in docker compose
* remove extra '='
* wrap in a try
* Allow config of background concurrency
* Add comment
* Fix light worker
* use backslashes to continue lines in supervisord with bash
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@danswer.ai>
* Added permission sync tests for Slack
* moved folders
* prune test + mypy
* added wait for indexing to cc_pair creation
* commented out check
* should fix other tests
* added slack channel pool
* fixed everything and mypy
* reduced flake
* disable trivy for the moment due to db download flakiness on their end causing the action to fail
* try hardcoding to amazon registry as others have suggested
* checkpoint
* k
* k
* need frontend
* add api key check + ui component
* add proper ports + icons + functions
* k
* k
* k
---------
Co-authored-by: pablodanswer <pablo@danswer.ai>
* experiment with build and no push
* use slightly more descriptive and consistent tags and names
* name integration test workflow consistently with other workflows
* put the tag back
* try runs-on s3 backend
* try adding runs-on cache
* add with key
* add a dummy path
* forget about multiline
* maybe we don't need runs-on cache immediately
* lower ram slightly, name test with a version bump
* don't need to explicitly include runs-on/cache for docker caching
* comment out flaky portion of knowledge chat test
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Xenforo forum parser support
* clarify ssl cert reqs
* missed a file
* add isLoadState function, fix up xenforo for data driven connector approach
* fixing a new edge case to skip an unexpected parsed element
* change documentsource to xenforo
* make doc id unique and comment what's happening
* remove stray log line
* address code review
---------
Co-authored-by: sime2408 <simun.sunjic@gmail.com>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* in flight
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date
* add back writing to vespa on indexing
* actually working connector deletion
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
* fix imports
* refactor to use update_single
* mypy fixes
* add vespa test
* multiple celery workers
* update logs as well and set prefetch multipliers appropriate to the worker intent
* add db refresh to connector deletion
* add some preliminary locking
* organize tasks into separate files
* celery auto associates tasks created inside another task, which bloats the result metadata considerably. trail=False prevents this.
* code review fixes
* move monitor_usergroup_taskset to ee, improve logging
* add multi workers to dev_run_background_jobs.py
* update supervisord with some recommended settings for celery
* name celery workers and shorten dev script prefixing
* add configurable sql alchemy engine settings on startup (needed for various intents like API server, different celery workers and tasks, etc)
* fix comments
* autoscale sqlalchemy pool size to celery concurrency (allow override later?)
* supervisord needs the percent symbols escaped
* use name as primary check, some minor refactoring and type hinting too.
* addressing code review
* fix import
* fix prune_documents_task references
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* rename classes and ignore deprecation warnings we mostly don't have control over
* copy pytest.ini
* ignore CryptographyDeprecationWarning
* fully qualify the warning
* test self hosted runner
* update more docker builds with self hosted runner
* convert everything to runs-on (except web container)
* try upping the RAM for future flake proofing
* initial Asana connector
* hint on how to get Asana workspace ID
* re-format with black
* re-order imports
* update asana connector for clarity
* minor robustification
* minor update to naming
* update for best practice
* update connector
---------
Co-authored-by: Daniel Naber <naber@danielnaber.de>
* Added permission syncing on the backend
* Reworked to work with celery
alembic fix
fixed test
* frontend changes
* got groups working
* added comments and fixed public docs
* fixed merge issues
* frontend complete!
* frontend cleanup and mypy fixes
* refactored connector access_type selection
* mypy fixes
* minor refactor and frontend improvements
* get to fetch
* renames and comments
* minor change to var names
* got curator stuff working
* addressed pablo's comments
* refactored user_external_group to reference users table
* implemented polling
* small refactor
* fixed a whoopsies on the frontend
* added scripts to seed dummy docs and test query times
* fixed frontend build issue
* alembic fix
* handled is_public overlap
* yuhong feedback
* added more checks for sync
* black
* mypy
* fixed circular import
* todos
* alembic fix
* alembic
* add pip retries to the github workflows too
* let's try running on amd64 ... docker builds are unusually flaky
* bump
* try large
* no yaml anchors
* switch back down to Amd64
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Deleting a connector should redirect to the indexing status page
* minor update to dev background jobs
* update refresh logic
* remove print statement
---------
Co-authored-by: pablodanswer <pablo@danswer.ai>
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* in flight
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date
* add back writing to vespa on indexing
* actually working connector deletion
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
* fix imports
* refactor to use update_single
* mypy fixes
* add vespa test
* add db refresh to connector deletion
* code review fixes
* move monitor_usergroup_taskset to ee, improve logging
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* use separate database number for celery result backend
* add comments
* add env var for celery's result_expires
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Move StandardAnswer to EE section of danswer/db/models
* Move StandardAnswer DB layer to EE
* Add EERequiredError for distinct error handling here
* Handle EE fallback for slack bot config
* Migrate all standard answer models to ee
* Flagging categories for removal
* Add missing versioned impl for update_slack_bot_config
---------
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
* persona
* all prepared excluding configuration
* more sensible model structure
* update tstream
* type updates
* rm
* quick and simple updates
* minor updates
* te
* ensure typing + naming
* remove old todo + rebase update
* remove unnecessary check
* allow setting of CORS origin
* simplify
* add environment variable + rename
* slightly more efficient
* simplify so mypy doesn't complain
* temp
* go back to my preferred formatting
* make it impossible to switch to non-image
* revert ports
* proper provider support
* remove unused imports
* minor rename
* simplify interface
* remove logs
* migration: add column "match_any_keywords" to StandardAnswer
* Implement any/all keyword matching for standard answers
* Add match_any_keywords to non-searchable fields
* Remove stray print
* Simplify Slack messages for any and all cases
---------
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
* Migrate standard answers implementations to ee/
* renaming
* Clean up slackbot non-ee standard answers import
* Move backend api/manage/standard_answer route to ee
* Move standard answers web UI to ee
* Hide standard answer controls in bot edit page
* Kwargs for fetch_versioned_implementation
* Add docstring explaining return types for handle_standard_answers
* Consolidate blocks into ee/handle_standard_answers
---------
Co-authored-by: Hyeong Joon Suh <hyeongjoonsuh@Hyeongs-MacBook-Pro.local>
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
* Support regex in standard answers
* fix mypy
* Add match_regex boolean column to StandardAnswer
* Add match_regex flag and validation to Pydantic models
* GET /manage/admin/standard-answer: add match_regex to create_standard_answer
* PATCH /manage/admin/standard-answer/:id: add match_regex to update_standard_answer
* Add "Match Regex" toggle to standard answer form
* Decode error pattern in case it's bytes
* Refactor regex support to use match_regex flag instead of supplemental tuple
* Better error handling for invalid regexes
* Show "match regex" in table and style keywords appropriately
* Fix stale UI copy for non-"match_regex" branch
* Fix stale docstring in find_matching_standard_answers
* Update down_revision to reflect most recent migration
* Update UI copy
* Initial implementation of match group display
* Fix pydantic StandardAnswer vs SQLAlchemy StandardAnswer model usage
* Update docstring return type
* Fix missing key prop
---------
Co-authored-by: Hyeong Joon Suh <hyeongjoonsuh@Hyeongs-MacBook-Pro.local>
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
* Reorder and clarify dependency installation instructions
* Clarify instructions for local development with Docker external deps vs full Docker stack
* Final words at the end of the local setup process
---------
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date
* add back writing to vespa on indexing
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Fail instead of continuing if vespa cannot be reached within the timeout period
* improve startup readability
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Add user when they interact outside of UI (e.g. Slack bot)
* fix mypy errors
* don't use user manager to avoid async messiness
* fix email is none scenario
* fix mypy
* make code slightly clearer
* PR comments
* get slack email in generate button as well
* fix alembic migration
* update name to be more descriptive
---------
Co-authored-by: Hyeong Joon Suh <hyeongjoonsuh@Hyeongs-MacBook-Pro.local>
The commit skips reading 'external_object_instance_page' blocks in the NotionConnector due to the lack of support in the Notion API. This change is in response to the issue #1761.
Co-authored-by: Cola Chen <6825116+colachg@users.noreply.github.com>
* validate web list
* update pdf extraction of metadata
* remove pdf + log
* stricter type enforcing
* fix up indexing widths
* minor formatting
* add list case
* check for empty metadata
* first cut at redis
* fix startup dependencies on redis
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* update contributing guide
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Added pagination to individual connector pages
* I cooked
* Gordon Ramsay in this b
* meepe
* properly calculated max chunk and switch dict to array
* chunks -> batches
* increased max page size
* renamed var
    - cron: '0 11 * * *' # Runs every day at 3 AM PST / 4 AM PDT / 11 AM UTC
permissions:
  # contents: write # only for delete-branch option
  issues: write
  pull-requests: write
jobs:
  stale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/stale@v9
        with:
          stale-issue-message: 'This issue is stale because it has been open 75 days with no activity. Remove stale label or comment or this will be closed in 15 days.'
          stale-pr-message: 'This PR is stale because it has been open 75 days with no activity. Remove stale label or comment or this will be closed in 15 days.'
          close-issue-message: 'This issue was closed because it has been stalled for 90 days with no activity.'
          close-pr-message: 'This PR was closed because it has been stalled for 90 days with no activity.'
          days-before-stale: 75
          # days-before-close: 90 # uncomment after we test stale behavior
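The timing implied by the workflow messages above can be sketched in Python. This is a minimal illustration, not part of the workflow: `stale_timeline` is a hypothetical helper, the 75-day value comes from `days-before-stale`, and the 15-day gap is an assumption taken from the wording of the stale messages (the `days-before-close` setting is still commented out).

```python
from datetime import date, timedelta

DAYS_BEFORE_STALE = 75    # from `days-before-stale` in the workflow
DAYS_STALE_TO_CLOSE = 15  # assumption: read off the "closed in 15 days" message text


def stale_timeline(last_activity: date) -> tuple[date, date]:
    """Return (date marked stale, date closed) for an item with no further activity."""
    stale_on = last_activity + timedelta(days=DAYS_BEFORE_STALE)
    close_on = stale_on + timedelta(days=DAYS_STALE_TO_CLOSE)
    return stale_on, close_on


if __name__ == "__main__":
    stale_on, close_on = stale_timeline(date(2025, 1, 1))
    print(f"stale on {stale_on}, closed on {close_on}")
```

Under these assumptions an untouched issue is closed 90 days after its last activity, which matches the close messages.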
# Copy this file to .env at the base of the repo and fill in the <REPLACE THIS> values
# This will help with development iteration speed and reduce repeat tasks for dev
# Copy this file to .env in the .vscode folder
# Fill in the <REPLACE THIS> values as needed, it is recommended to set the GEN_AI_API_KEY value to avoid having to set up an LLM in the UI
# Also check out danswer/backend/scripts/restart_containers.sh for a script to restart the containers which Danswer relies on outside of VSCode/Cursor processes
# For local dev, often user Authentication is not needed
@@ -15,7 +15,7 @@ LOG_LEVEL=debug
# This passes top N results to LLM an additional time for reranking prior to answer generation
# This step is quite heavy on token usage so we disable it for dev generally
DISABLE_LLM_DOC_RELEVANCE=True
DISABLE_LLM_DOC_RELEVANCE=False
# Useful if you want to toggle auth on/off (google_oauth/OIDC specifically)
- [Nginx](https://nginx.org/) (Not needed for development flows generally)
This guide provides instructions to set up the Danswer specific services outside of Docker because it's easier for
development purposes but also feel free to just use the containers and update with local changes by providing the
`--build` flag.
> **Note:**
> This guide provides instructions to build and run Onyx locally from source with Docker containers providing the above external software. We believe this combination is easier for
> development purposes. If you prefer to use pre-built container images, we provide instructions on running the full Onyx stack within Docker below.
### Local Set Up
It is recommended to use Python version 3.11
Be sure to use Python version 3.11. For instructions on installing Python 3.11 on macOS, refer to the [CONTRIBUTING_MACOS.md](./CONTRIBUTING_MACOS.md) readme.
If using a lower version, modifications will have to be made to the code.
If using a higher version, the version of Tensorflow we use may not be available for your platform.
If using a higher version, sometimes some libraries will not be available (i.e. we had problems with Tensorflow in the past with higher versions of python).
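To confirm the interpreter version before installing requirements, a small check like the following can help. It is a minimal sketch; `is_supported_python` is a hypothetical helper name and is not part of the repository.

```python
import sys

REQUIRED = (3, 11)  # the version this guide asks for


def is_supported_python(version_info=sys.version_info, required=REQUIRED) -> bool:
    """Return True when the interpreter's major.minor version equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    if is_supported_python():
        print("Python version OK")
    else:
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Expected Python 3.11, found {found}")
```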
#### Backend: Python requirements
#### Installing Requirements
Currently, we use pip and recommend creating a virtual environment.
For convenience here's a command for it:
```bash
python -m venv .venv
source .venv/bin/activate
```
--> Note that this virtual environment MUST NOT be set up WITHIN the danswer
directory
> **Note:**
> This virtual environment MUST NOT be set up WITHIN the onyx directory if you plan on using mypy within certain IDEs.
> For simplicity, we recommend setting up the virtual environment outside of the onyx directory.
_For Windows, activate the virtual environment using Command Prompt:_
```bash
.venv\Scripts\activate
```
If using PowerShell, the command slightly differs:
_For Windows (for compatibility with both PowerShell and Command Prompt):_
```bash
powershell -Command "
$env:AUTH_TYPE='disabled'
uvicorn danswer.main:app --reload --port 8080
uvicorn onyx.main:app --reload --port 8080
"
```
Note: if you need finer logging, add the additional environment variable `LOG_LEVEL=DEBUG` to the relevant services.
> **Note:**
> If you need finer logging, add the additional environment variable `LOG_LEVEL=DEBUG` to the relevant services.
#### Wrapping up
You should now have 4 servers running:
- Web server
- Backend API
- Model server
- Background jobs
Now, visit `http://localhost:3000` in your browser. You should see the Onyx onboarding wizard where you can connect your external LLM provider to Onyx.
You've successfully set up a local Onyx instance! 🏁
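If the page does not load, a quick way to see which servers are accepting connections is to probe their ports. The sketch below is an illustration only: `port_is_open` and `check_services` are hypothetical helpers, and only the two ports that appear in this guide (3000 for the web server, 8080 for the backend API) are listed, since no ports are stated here for the model server or background jobs.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the commands and URLs in this guide.
SERVICES = {"web server": 3000, "backend API": 8080}


def check_services(host: str = "localhost", services: dict = SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'up' if is_up else 'not reachable'}")
```

An open port only shows that something is listening; it does not prove the service behind it is healthy.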
#### Running the Onyx application in a container
You can run the full Onyx application stack from pre-built images including all external software dependencies.
Navigate to `onyx/deployment/docker_compose` and run:
```bash
docker compose -f docker-compose.dev.yml -p onyx-stack up -d
```
After Docker pulls and starts these containers, navigate to `http://localhost:3000` to use Onyx.
If you want to make changes to Onyx and run those changes in Docker, you can also build a local version of the Onyx container images that incorporates your changes like so:
```bash
docker compose -f docker-compose.dev.yml -p onyx-stack up -d --build
```
### Formatting and Linting
#### Backend
For the backend, you'll need to set up pre-commit hooks (black / reorder-python-imports).
First, install pre-commit (if you don't have it already) following the instructions
[here](https://pre-commit.com/#installation).
Then, from the `danswer/backend` directory, run:
With the virtual environment active, install the pre-commit library with:
```bash
pip install pre-commit
```
Then, from the `onyx/backend` directory, run:
```bash
pre-commit install
```
Additionally, we use `mypy` for static type checking.
Danswer is fully type-annotated, and we would like to keep it that way!
To run the mypy checks manually, run `python -m mypy .` from the `danswer/backend` directory.
Onyx is fully type-annotated, and we want to keep it that way!
To run the mypy checks manually, run `python -m mypy .` from the `onyx/backend` directory.
#### Web
We use `prettier` for formatting. The desired version (2.8.8) will be installed via a `npm i` from the `danswer/web` directory.
To run the formatter, use `npx prettier --write .` from the `danswer/web` directory.
We use `prettier` for formatting. The desired version (2.8.8) will be installed via a `npm i` from the `onyx/web` directory.
To run the formatter, use `npx prettier --write .` from the `onyx/web` directory.
Please double check that prettier passes before creating a pull request.
### Release Process
Danswer follows the semver versioning standard.
Onyx loosely follows the SemVer versioning standard.
Major changes are released with a "minor" version bump. Currently we use patch release versions to indicate small feature changes.
A set of Docker containers will be pushed automatically to DockerHub with every tag.
You can see the containers [here](https://hub.docker.com/search?q=danswer%2F).
You can see the containers [here](https://hub.docker.com/search?q=onyx%2F).
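The versioning convention described above (major changes bump the minor version, small feature changes bump the patch version) can be sketched as follows. This is a hedged illustration, not the project's release tooling: `next_version` is a hypothetical helper, and the tag format `MAJOR.MINOR.PATCH` with an optional leading `v` is an assumption.

```python
import re

_VERSION_RE = re.compile(r"^(v?)(\d+)\.(\d+)\.(\d+)$")


def next_version(current: str, major_change: bool) -> str:
    """Bump the minor version for major changes, otherwise bump the patch version."""
    match = _VERSION_RE.match(current)
    if match is None:
        raise ValueError(f"unrecognized version string: {current!r}")
    prefix = match.group(1)
    major, minor, patch = (int(part) for part in match.groups()[1:])
    if major_change:
        # A minor bump resets the patch component.
        return f"{prefix}{major}.{minor + 1}.0"
    return f"{prefix}{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    print(next_version("v0.4.2", major_change=True))
    print(next_version("v0.4.2", major_change=False))
```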
The base instructions to set up the development environment are located in [CONTRIBUTING.md](https://github.com/onyx-dot-app/onyx/blob/main/CONTRIBUTING.md).
### Setting up Python
Ensure [Homebrew](https://brew.sh/) is already set up.
Then install python 3.11.
```bash
brew install python@3.11
```
Add python 3.11 to your path: add the following line to ~/.zshrc
@@ -2,9 +2,9 @@ Copyright (c) 2023-present DanswerAI, Inc.
Portions of this software are licensed as follows:
* All content that resides under "ee" directories of this repository, if that directory exists, is licensed under the license defined in "backend/ee/LICENSE". Specifically all content under "backend/ee" and "web/src/app/ee" is licensed under the license defined in "backend/ee/LICENSE".
* All third party components incorporated into the Danswer Software are licensed under the original license provided by the owner of the applicable component.
* Content outside of the above mentioned directories or restrictions above is available under the "MIT Expat" license as defined below.
- All content that resides under "ee" directories of this repository, if that directory exists, is licensed under the license defined in "backend/ee/LICENSE". Specifically all content under "backend/ee" and "web/src/app/ee" is licensed under the license defined in "backend/ee/LICENSE".
- All third party components incorporated into the Onyx Software are licensed under the original license provided by the owner of the applicable component.
- Content outside of the above mentioned directories or restrictions above is available under the "MIT Expat" license as defined below.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
<strong>[Danswer](https://www.danswer.ai/)</strong> is the AI Assistant connected to your company's docs, apps, and people.
Danswer provides a Chat interface and plugs into any LLM of your choice. Danswer can be deployed anywhere and for any
scale - on a laptop, on-premise, or to cloud. Since you own the deployment, your user data and chats are fully in your
own control. Danswer is MIT licensed and designed to be modular and easily extensible. The system also comes fully ready
for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for
configuring Personas (AI Assistants) and their Prompts.
<strong>[Onyx](https://www.onyx.app/)</strong> (Formerly Danswer) is the AI Assistant connected to your company's docs, apps, and people.
Onyx provides a Chat interface and plugs into any LLM of your choice. Onyx can be deployed anywhere and for any
scale - on a laptop, on-premise, or to cloud. Since you own the deployment, your user data and chats are fully in your
own control. Onyx is dual-licensed, with most of it under the MIT license, and is designed to be modular and easily extensible. The system also comes fully ready
for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for
configuring AI Assistants.
Danswer also serves as a Unified Search across all common workplace tools such as Slack, Google Drive, Confluence, etc.
By combining LLMs and team specific knowledge, Danswer becomes a subject matter expert for the team. Imagine ChatGPT if
Onyx also serves as an Enterprise Search across all common workplace tools such as Slack, Google Drive, Confluence, etc.
By combining LLMs and team specific knowledge, Onyx becomes a subject matter expert for the team. Imagine ChatGPT if
it had access to your team's unique knowledge! It enables questions such as "A customer wants feature X, is this already
supported?" or "Where's the pull request for feature Y?"
For more details on the Admin UI to manage connectors and users, check out our
For more details on the Admin UI to manage connectors and users, check out our
<strong><a href="https://www.youtube.com/watch?v=geNzY1nbCnU">Full Video Demo</a></strong>!
## Deployment
Danswer can easily be run locally (even on a laptop) or deployed on a virtual machine with a single
`docker compose` command. Checkout our [docs](https://docs.danswer.dev/quickstart) to learn more.
Onyx can easily be run locally (even on a laptop) or deployed on a virtual machine with a single
`docker compose` command. Check out our [docs](https://docs.onyx.app/quickstart) to learn more.
We also have built-in support for deployment on Kubernetes. Files for that can be found [here](https://github.com/danswer-ai/danswer/tree/main/deployment/kubernetes).
We also have built-in support for deployment on Kubernetes. Files for that can be found [here](https://github.com/onyx-dot-app/onyx/tree/main/deployment/kubernetes).
## 💃 Main Features
## 💃 Main Features
* Chat UI with the ability to select documents to chat with.
* Create custom AI Assistants with different prompts and backing knowledge sets.
* Connect Danswer with LLM of your choice (self-host for a fully airgapped solution).
* Document Search + AI Answers for natural language queries.
* Connectors to all common workplace tools like Google Drive, Confluence, Slack, etc.
* Slack integration to get answers and search results directly in Slack.
- Chat UI with the ability to select documents to chat with.
- Create custom AI Assistants with different prompts and backing knowledge sets.
- Connect Onyx with LLM of your choice (self-host for a fully airgapped solution).
- Document Search + AI Answers for natural language queries.
- Connectors to all common workplace tools like Google Drive, Confluence, Slack, etc.
- Slack integration to get answers and search results directly in Slack.
## 🚧 Roadmap
* Chat/Prompt sharing with specific teammates and user groups.
* Multi-Model model support, chat with images, video etc.
* Choosing between LLMs and parameters during chat session.
* Tool calling and agent configurations options.
* Organizational understanding and ability to locate and suggest experts from your team.
- Chat/Prompt sharing with specific teammates and user groups.
- Multimodal model support, chat with images, video etc.
- Choosing between LLMs and parameters during chat session.
- Tool calling and agent configurations options.
- Organizational understanding and ability to locate and suggest experts from your team.
## Other Noteable Benefits of Danswer
* User Authentication with document level access management.
* Best in class Hybrid Search across all sources (BM-25 + prefix aware embedding models).
* Admin Dashboard to configure connectors, document-sets, access, etc.
* Custom deep learning models + learn from user feedback.
* Easy deployment and ability to host Danswer anywhere of your choosing.
## Other Notable Benefits of Onyx
- User Authentication with document level access management.
- Best in class Hybrid Search across all sources (BM-25 + prefix aware embedding models).
- Admin Dashboard to configure connectors, document-sets, access, etc.
- Custom deep learning models + learn from user feedback.
- Easy deployment and ability to host Onyx anywhere of your choosing.
## 🔌 Connectors
Efficiently pulls the latest changes from:
* Slack
* GitHub
* Google Drive
* Confluence
* Jira
* Zendesk
* Gmail
* Notion
* Gong
* Slab
* Linear
* Productboard
* Guru
* Bookstack
* Document360
* Sharepoint
* Hubspot
* Local Files
* Websites
* And more ...
- Slack
- GitHub
- Google Drive
- Confluence
- Jira
- Zendesk
- Gmail
- Notion
- Gong
- Slab
- Linear
- Productboard
- Guru
- Bookstack
- Document360
- Sharepoint
- Hubspot
- Local Files
- Websites
- And more ...
## 📚 Editions
There are two editions of Danswer:
There are two editions of Onyx:
* Danswer Community Edition (CE) is available freely under the MIT Expat license. This version has ALL the core features discussed above. This is the version of Danswer you will get if you follow the Deployment guide above.
* Danswer Enterprise Edition (EE) includes extra features that are primarily useful for larger organizations. Specifically, this includes:
  * Single Sign-On (SSO), with support for both SAML and OIDC
  * Role-based access control
  * Document permission inheritance from connected sources
  * Usage analytics and query history accessible to admins
  * Whitelabeling
  * API key authentication
  * Encryption of secrets
  * Any many more! Checkout [our website](https://www.danswer.ai/) for the latest.
- Onyx Community Edition (CE) is available freely under the MIT Expat license. This version has ALL the core features discussed above. This is the version of Onyx you will get if you follow the Deployment guide above.
- Onyx Enterprise Edition (EE) includes extra features that are primarily useful for larger organizations. Specifically, this includes:
  - Single Sign-On (SSO), with support for both SAML and OIDC
  - Role-based access control
  - Document permission inheritance from connected sources
  - Usage analytics and query history accessible to admins
  - Whitelabeling
  - API key authentication
  - Encryption of secrets
  - And many more! Check out [our website](https://www.onyx.app/) for the latest.
2. For self-hosting, contact us at [founders@danswer.ai](mailto:founders@danswer.ai) or book a call with us on our [Cal](https://cal.com/team/danswer/founders).
2. For self-hosting, contact us at [founders@onyx.app](mailto:founders@onyx.app) or book a call with us on our [Cal](https://cal.com/team/danswer/founders).
## 💡 Contributing
Looking to contribute? Please check out the [Contribution Guide](CONTRIBUTING.md) for more details.
## ⭐Star History
[](https://star-history.com/#onyx-dot-app/onyx&Date)
Some files were not shown because too many files have changed in this diff