* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* in flight
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date
* add back writing to vespa on indexing
* actually working connector deletion
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
* fix imports
* refactor to use update_single
* mypy fixes
* add vespa test
* multiple celery workers
* update logs as well and set prefetch multipliers appropriate to the worker intent
* add db refresh to connector deletion
* add some preliminary locking
* organize tasks into separate files
* celery auto associates tasks created inside another task, which bloats the result metadata considerably. trail=False prevents this.
* code review fixes
* move monitor_usergroup_taskset to ee, improve logging
* add multi workers to dev_run_background_jobs.py
* update supervisord with some recommended settings for celery
* name celery workers and shorten dev script prefixing
* add configurable sql alchemy engine settings on startup (needed for various intents like API server, different celery workers and tasks, etc)
* fix comments
* autoscale sqlalchemy pool size to celery concurrency (allow override later?)
* supervisord needs the percent symbols escaped
* use name as primary check, some minor refactoring and type hinting too.
* stash merge (may not function yet)
* remove dead code
* more cleanup
* remove dead file
* we shouldn't be checking for deletion attempts in the db any more
* print cc_pair_id
* print status on status mismatch again
* add logging when cc_pair isn't present
* don't index any ingestion type connectors, and don't pause any connectors that aren't active
* add more specific check for deletion completion
* remove flaky mediawiki test site
* move is_pruning
* remove unused code
* remove old function
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* add tenant provisioning to data plane
* minor typing update
* ensure tenant router included
* proper auth check
* update disabling logic
* validated basic provisioning
* use new kv store
* set broker_connection_retry_on_startup to silence deprecation warning (we're OK with retrying on startup)
* env var for CELERY_BROKER_POOL_LIMIT
* add redis retry on timeout and health check interval
* set socket_keepalive = True
* remove shadow declaration of REDIS_HEALTH_CHECK_INTERVAL, add socket_keepalive_options where possible
* fix mypy complaint
* pass through vars in docker compose
* remove extra '='
* wrap in a try
* Allow config of background concurrency
* Add comment
* Fix light worker
* use backslashes to continue lines in supervisord with bash
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@danswer.ai>
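The broker and Redis settings touched in this change can be gathered in one place. The following is a minimal sketch, not the project's actual `celeryconfig.py`: the helper name `build_celery_settings` and the default values (10 and 60) are assumptions for illustration, while `CELERY_BROKER_POOL_LIMIT` and `REDIS_HEALTH_CHECK_INTERVAL` are the names mentioned in the commit messages above.

```python
import os


def build_celery_settings(env=None):
    """Build a dict of Celery settings from environment variables.

    Hypothetical helper: the function name and default values are
    assumptions, not taken from the repository.
    """
    env = os.environ if env is None else env
    health_check_interval = int(env.get("REDIS_HEALTH_CHECK_INTERVAL", "60"))
    return {
        # Retrying on startup is acceptable; setting this explicitly
        # silences the deprecation warning.
        "broker_connection_retry_on_startup": True,
        "broker_pool_limit": int(env.get("CELERY_BROKER_POOL_LIMIT", "10")),
        # Result backend connection hardening.
        "redis_socket_keepalive": True,
        "redis_retry_on_timeout": True,
        "redis_backend_health_check_interval": health_check_interval,
        # Broker (Redis transport) connection hardening.
        "broker_transport_options": {
            "socket_keepalive": True,
            "retry_on_timeout": True,
            "health_check_interval": health_check_interval,
        },
    }
```

A dict like this could be applied with `app.conf.update(build_celery_settings())`; keeping it as a plain function makes the environment overrides easy to check without a running broker.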
* Added permission sync tests for Slack
* moved folders
* prune test + mypy
* added wait for indexing to cc_pair creation
* commented out check
* should fix other tests
* added slack channel pool
* fixed everything and mypy
* reduced flake
* disable trivy for the moment due to db download flakiness on their end causing the action to fail
* try hardcoding to amazon registry as others have suggested
* checkpoint
* k
* k
* need frontend
* add api key check + ui component
* add proper ports + icons + functions
* k
* k
* k
---------
Co-authored-by: pablodanswer <pablo@danswer.ai>
* experiment with build and no push
* use slightly more descriptive and consistent tags and names
* name integration test workflow consistently with other workflows
* put the tag back
* try runs-on s3 backend
* try adding runs-on cache
* add with key
* add a dummy path
* forget about multiline
* maybe we don't need runs-on cache immediately
* lower ram slightly, name test with a version bump
* don't need to explicitly include runs-on/cache for docker caching
* comment out flaky portion of knowledge chat test
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Xenforo forum parser support
* clarify ssl cert reqs
* missed a file
* add isLoadState function, fix up xenforo for data driven connector approach
* fixing a new edge case to skip an unexpected parsed element
* change documentsource to xenforo
* make doc id unique and comment what's happening
* remove stray log line
* address code review
---------
Co-authored-by: sime2408 <simun.sunjic@gmail.com>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* in flight
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date
* add back writing to vespa on indexing
* actually working connector deletion
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
* fix imports
* refactor to use update_single
* mypy fixes
* add vespa test
* multiple celery workers
* update logs as well and set prefetch multipliers appropriate to the worker intent
* add db refresh to connector deletion
* add some preliminary locking
* organize tasks into separate files
* celery auto associates tasks created inside another task, which bloats the result metadata considerably. trail=False prevents this.
* code review fixes
* move monitor_usergroup_taskset to ee, improve logging
* add multi workers to dev_run_background_jobs.py
* update supervisord with some recommended settings for celery
* name celery workers and shorten dev script prefixing
* add configurable sql alchemy engine settings on startup (needed for various intents like API server, different celery workers and tasks, etc)
* fix comments
* autoscale sqlalchemy pool size to celery concurrency (allow override later?)
* supervisord needs the percent symbols escaped
* use name as primary check, some minor refactoring and type hinting too.
* addressing code review
* fix import
* fix prune_documents_task references
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* rename classes and ignore deprecation warnings we mostly don't have control over
* copy pytest.ini
* ignore CryptographyDeprecationWarning
* fully qualify the warning
* test self hosted runner
* update more docker builds with self hosted runner
* convert everything to runs-on (except web container)
* try upping the RAM for future flake proofing
* initial Asana connector
* hint on how to get Asana workspace ID
* re-format with black
* re-order imports
* update asana connector for clarity
* minor robustification
* minor update to naming
* update for best practice
* update connector
---------
Co-authored-by: Daniel Naber <naber@danielnaber.de>
* Added permission syncing on the backend
* Reworked to work with celery
alembic fix
fixed test
* frontend changes
* got groups working
* added comments and fixed public docs
* fixed merge issues
* frontend complete!
* frontend cleanup and mypy fixes
* refactored connector access_type selection
* mypy fixes
* minor refactor and frontend improvements
* get to fetch
* renames and comments
* minor change to var names
* got curator stuff working
* addressed pablo's comments
* refactored user_external_group to reference users table
* implemented polling
* small refactor
* fixed a whoopsies on the frontend
* added scripts to seed dummy docs and test query times
* fixed frontend build issue
* alembic fix
* handled is_public overlap
* yuhong feedback
* added more checks for sync
* black
* mypy
* fixed circular import
* todos
* alembic fix
* alembic
* add pip retries to the github workflows too
* let's try running on amd64 ... docker builds are unusually flaky
* bump
* try large
* no yaml anchors
* switch back down to Amd64
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Deleting a connector should redirect to the indexing status page
* minor update to dev background jobs
* update refresh logic
* remove print statement
---------
Co-authored-by: pablodanswer <pablo@danswer.ai>
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* in flight
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date
* add back writing to vespa on indexing
* actually working connector deletion
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
* fix imports
* refactor to use update_single
* mypy fixes
* add vespa test
* add db refresh to connector deletion
* code review fixes
* move monitor_usergroup_taskset to ee, improve logging
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* use separate database number for celery result backend
* add comments
* add env var for celery's result_expires
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
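Giving the result backend its own Redis database number keeps task results apart from broker queues on the same server. The sketch below shows one way this might look; the environment variable names (`REDIS_DB_NUMBER_CELERY`, `REDIS_DB_NUMBER_CELERY_RESULT_BACKEND`, `CELERY_RESULT_EXPIRES`), the defaults, and both helper functions are hypothetical and only illustrate the idea.

```python
import os
from urllib.parse import quote


def redis_url(host, port, db, password=""):
    """Format a redis:// URL for the given database number."""
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}/{db}"


def build_broker_and_backend(env=None):
    """Return broker URL, result backend URL, and result expiry.

    All variable names and defaults here are assumptions.
    """
    env = os.environ if env is None else env
    host = env.get("REDIS_HOST", "localhost")
    port = int(env.get("REDIS_PORT", "6379"))
    password = env.get("REDIS_PASSWORD", "")
    broker_db = int(env.get("REDIS_DB_NUMBER_CELERY", "15"))
    backend_db = int(env.get("REDIS_DB_NUMBER_CELERY_RESULT_BACKEND", "14"))
    if broker_db == backend_db:
        raise ValueError("broker and result backend must use different db numbers")
    return {
        "broker_url": redis_url(host, port, broker_db, password),
        "result_backend": redis_url(host, port, backend_db, password),
        # Seconds before stored task results expire.
        "result_expires": int(env.get("CELERY_RESULT_EXPIRES", "86400")),
    }
```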
* Move StandardAnswer to EE section of danswer/db/models
* Move StandardAnswer DB layer to EE
* Add EERequiredError for distinct error handling here
* Handle EE fallback for slack bot config
* Migrate all standard answer models to ee
* Flagging categories for removal
* Add missing versioned impl for update_slack_bot_config
---------
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
* persona
* all prepared excluding configuration
* more sensical model structure
* update tstream
* type updates
* rm
* quick and simple updates
* minor updates
* te
* ensure typing + naming
* remove old todo + rebase update
* remove unnecessary check
* allow setting of CORS origin
* simplify
* add environment variable + rename
* slightly more efficient
* simplify so mypy doesn't complain
* temp
* go back to my preferred formatting
* make it impossible to switch to non-image
* revert ports
* proper provider support
* remove unused imports
* minor rename
* simplify interface
* remove logs
* migration: add column "match_any_keywords" to StandardAnswer
* Implement any/all keyword matching for standard answers
* Add match_any_keywords to non-searchable fields
* Remove stray print
* Simplify Slack messages for any and all cases
---------
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
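The any/all distinction behind `match_any_keywords` can be illustrated with a small matcher. This is a sketch under assumptions: the function name `matches_keywords` and the word-level, case-insensitive tokenization are illustrative choices, not the matching logic used in the repository.

```python
import re


def matches_keywords(message, keywords, match_any_keywords):
    """Return True if the message satisfies the keyword rule.

    match_any_keywords=True  -> at least one keyword must appear
    match_any_keywords=False -> every keyword must appear
    Hypothetical helper; tokenization is an assumption.
    """
    words = set(re.findall(r"\w+", message.lower()))
    hits = [keyword.lower() in words for keyword in keywords]
    if not hits:
        # No keywords configured: never match.
        return False
    return any(hits) if match_any_keywords else all(hits)
```

For example, with keywords `["reset", "vpn"]` the message "How do I reset my password?" matches only when `match_any_keywords` is true.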
* Migrate standard answers implementations to ee/
* renaming
* Clean up slackbot non-ee standard answers import
* Move backend api/manage/standard_answer route to ee
* Move standard answers web UI to ee
* Hide standard answer controls in bot edit page
* Kwargs for fetch_versioned_implementation
* Add docstring explaining return types for handle_standard_answers
* Consolidate blocks into ee/handle_standard_answers
---------
Co-authored-by: Hyeong Joon Suh <hyeongjoonsuh@Hyeongs-MacBook-Pro.local>
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
* Support regex in standard answers
* fix mypy
* Add match_regex boolean column to StandardAnswer
* Add match_regex flag and validation to Pydantic models
* GET /manage/admin/standard-answer: add match_regex to create_standard_answer
* PATCH /manage/admin/standard-answer/:id: add match_regex to update_standard_answer
* Add "Match Regex" toggle to standard answer form
* Decode error pattern in case it's bytes
* Refactor regex support to use match_regex flag instead of supplemental tuple
* Better error handling for invalid regexes
* Show "match regex" in table and style keywords appropriately
* Fix stale UI copy for non-"match_regex" branch
* Fix stale docstring in find_matching_standard_answers
* Update down_revision to reflect most recent migration
* Update UI copy
* Initial implementation of match group display
* Fix pydantic StandardAnswer vs SQLAlchemy StandardAnswer model usage
* Update docstring return type
* Fix missing key prop
---------
Co-authored-by: Hyeong Joon Suh <hyeongjoonsuh@Hyeongs-MacBook-Pro.local>
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
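Regex support of this kind usually needs two pieces: validating the pattern when it is saved, and returning the matched text when it is applied. The sketch below assumes both; `validate_pattern` and `find_matched_text` are hypothetical names, and the case-insensitive search is an assumption rather than the repository's behavior. The bytes check mirrors the "decode error pattern" item above, since `re.error.pattern` can be `bytes`.

```python
import re


def validate_pattern(pattern):
    """Raise ValueError with a readable message if the regex is invalid."""
    try:
        re.compile(pattern)
    except re.error as e:
        bad = e.pattern
        if isinstance(bad, bytes):
            # The offending pattern may be bytes; decode it for display.
            bad = bad.decode("utf-8", errors="replace")
        raise ValueError(f"invalid regex {bad!r}: {e.msg}") from e
    return pattern


def find_matched_text(message, keyword, match_regex):
    """Return the matched substring, or None when there is no match."""
    if match_regex:
        match = re.search(keyword, message, flags=re.IGNORECASE)
        return match.group(0) if match else None
    # Plain keyword: case-insensitive substring check.
    return keyword if keyword.lower() in message.lower() else None
```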
* Reorder and clarify dependency installation instructions
* Clarify instructions for local development with Docker external deps vs full Docker stack
* Final words at the end of the local setup process
---------
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date
* add back writing to vespa on indexing
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Fail instead of continuing if vespa cannot be reached within the timeout period
* improve startup readability
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Add user when they interact outside of UI (e.g. Slack bot)
* fix mypy errors
* don't use user manager to avoid async messiness
* fix email is none scenario
* fix mypy
* make code slightly clearer
* PR comments
* get slack email in generate button as well
* fix alembic migration
* update name to be more descriptive
---------
Co-authored-by: Hyeong Joon Suh <hyeongjoonsuh@Hyeongs-MacBook-Pro.local>
The commit skips reading 'external_object_instance_page' blocks in the NotionConnector due to the lack of support in the Notion API. This change is in response to the issue #1761.
Co-authored-by: Cola Chen <6825116+colachg@users.noreply.github.com>
* validate web list
* update pdf extraction of metadata
* remove pdf + log
* stricter type enforcing
* fix up indexing widths
* minor formatting
* add list case
* check for empty metadata
* first cut at redis
* fix startup dependencies on redis
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* update contributing guide
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Added pagination to individual connector pages
* I cooked
* Gordon Ramsay in this b
* meepe
* properly calculated max chunk and switch dict to array
* chunks -> batches
* increased max page size
* renamed var
* initial commit
* almost done
* finished 3 tests
* minor refactor
* built out initial permission tests
* reworked test_deletion
* removed logging
* all original tests have been converted
* renamed user_groups to user_group
* mypy
* added test for doc set permissions
* unified naming for manager methods
* Refactored models and added new deletion test
* minor additions
* better logging+fixed input variables
* commented out failed tests
* Added readme
* readme update
* Added auth to IT
set auth_type to basic and require_email_verification to false
* Update run-it.yml
* used verify and added to readme
* added api key manager
* get accurate model output max
* squash
* updated max default tokens
* rename + use fallbacks
* functional
* remove max tokens
* update naming
* comment out function to prevent mypy issues
* ran bump-pydantic
* replace root_validator with model_validator
* mostly working. some alternate assistant error. changed root_validator and typing_extensions
* working generation chat. changed type
* replacing .dict with .model_dump
* argument needed to bring model_dump up to parity with dict()
* fix a few remaining issues -- working with llama and gpt
* updating requirements file
* more requirement updates
* more requirement updates
* fix to make search work
* return type fix:
* half way types change
* fixes for mypy and pydantic:
* endpoint fix
* fix pydantic protected namespaces
* it works!
* removed unnecessary None initializations
* better logging
* changed default values to empty lists
* mypy fixes
* fixed array defaulting
---------
Co-authored-by: hagen-danswer <hagen@danswer.ai>
* add new user provider hook
* account for additional logic
* add users
* remove is loading
* Curator polish
* useeffect -> provider + effect
* squash
* use use user for user default models
* squash
* Added ability to add users to groups among other things
* final polish
* added connection button to groups
* mypy fix
* Improved document set clarity
* string fixes
---------
Co-authored-by: pablodanswer <pablo@danswer.ai>
* Added backend support for curator role
* modal refactor
* finalized first 2 commits
same as before
finally
what was it for
* added credential, cc_pair, and cleanup
mypy is super helpful hahahahahahahahahahahaha
* curator support for personas
* added connector management permission checks
* fixed the connector creation flow
* added document access to curator
* small cleanup added comments and started ui
* groups and assistant editor
* Persona frontend
* Document set frontend
* cleaned up the entire frontend
* alembic fix
* Minor fixes
* credentials section
* some credential updates
* removed logging statements
* fixed try catch
* fixed model name
* made everything happen in one db commit
* Final cleanup
* cleaned up fast code
* mypy/build fixes
* polish
* more token rate limit polish
* fixed weird credential permissions
* Addressed chris feedback
* addressed pablo feedback
* fixed alembic
* removed deduping and caching
* polish!!!!
* add regenerate
* functional once again post rebase but quite ugly
* validated + cleaner UI
* more robust implementation for first messages
* squash
* remove parameter
* proper margin
* clarify for future programmers
* remove some logs
* self nit pick - smoother ux
* more self-nits
* stroke line cap
* rebase
* support indexing attachments as separate docs when not part of a page
* fix time filter, fix batch handling, fix returned number of attachments processed
* backend changes to handle partial completion of index attempts
* typo fix
* Display partial success in UI
* make log timing more readable by limiting printed precision to milliseconds
* forgot alembic
* initial cut at "completed with errors" indexing
* remove and reorganize unused imports
* show view errors while indexing is in progress
* code review fixes
* add ux improvements
* add danswer version display
* show version properly
* improve copy + add web version to settings context
* update copy + danswer version
* stopgap: clarify text on standard answer page for improved UX
* replace apostrophe
* using tailwind:
---------
Co-authored-by: Jos Van der westhuizen <jos@danser.ai>
* allow admin role api keys
* bump to rerun deployment
* types needs explicit export now for APIKey
* remove api_key.role, use User.role instead
* fix formatting
* formatting
* formatting
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* add send-message-simple-with-history endpoint to support ramp. avoids bad json output in models and allows client to pass history in instead of maintaining it in our own session
* slightly better error checking
* addressing code review
* reject on any empty message
* update test naming
* avoid reindexing secondary indexes after they succeed
* use postgres application names to facilitate connection debugging
* centralize all postgres application_name constants in the constants file
* missed a couple of files
* mypy fixes
* update dev background script
* also allow access to a persona if the user is in the list of authorized users or groups
* add comment on potential performance improvements
* work around for mypy typing
My website (https://shukantpal.com) uses Let's Encrypt certificates, which aren't accepted by the Python urllib certificate verifier for some reason. My website is set up correctly otherwise (https://www.sslshopper.com/ssl-checker.html#hostname=www.shukantpal.com)
This change adds a fix so the correct traceback is shown in Danswer, instead of a generic "unable to connect, check your Internet connection".
* Added ability to control LLM access based on group
* completed relationship deletion
* cleaned up function
* added comments
* fixed frontend strings
* mypy fixes
* added case handling for deletion of user groups
* hidden advanced options now
* removed unnecessary code
* quick fix to test on ec2
* quick cleanup
* modify a name
* address full doc as well
* additional timing info + handling
* clean up
* squash
* Print only
* added retries and multithreading for cloud embedding
* refactored a bit
* cleaned up code
* got the errors to bubble up to the ui correctly
* added exception printing
* added requirements
* touchups
---------
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
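Retries combined with a thread pool, as described for cloud embedding above, might be structured roughly as follows. This is a sketch, not the repository's code: `with_retries`, `embed_batches`, and the `embed_fn` callable (standing in for a provider API call) are all hypothetical, as are the retry count and backoff.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(fn, attempts=3, delay=0.0):
    """Wrap fn so it is retried with exponential backoff on any exception."""
    def wrapped(*args):
        for attempt in range(attempts):
            try:
                return fn(*args)
            except Exception:
                if attempt == attempts - 1:
                    # Out of attempts: let the error bubble up to the caller.
                    raise
                time.sleep(delay * (2 ** attempt))
    return wrapped


def embed_batches(batches, embed_fn, max_workers=4, attempts=3):
    """Embed batches concurrently; results keep the input order."""
    retrying = with_retries(embed_fn, attempts=attempts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves order and re-raises worker exceptions here.
        per_batch = list(pool.map(retrying, batches))
    return [vector for batch in per_batch for vector in batch]
```

Because `pool.map` re-raises exceptions when results are consumed, a batch that still fails after its last attempt surfaces to the caller instead of being silently dropped.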
Makes it so if you change which LLM you are using in a given ChatSession, that is persisted and sticks around if you reload the page / come back to the ChatSession later
* Added connector for clickup
* Fixed mypy issues
* Fallback to description if markdown is not available
* Added extra information in metadata, and support to index comments
* Fixes for fields parsing
* updated fetcher to errorHandlingFetcher
---------
Co-authored-by: hagen-danswer <hagen@danswer.ai>
* Confluence: Add page attachments indexing
* used the centralized file processing to extract file content
* flipped input order for extract_file_text
* added bytes support for pdf converter
* brought out the io.BytesIO to the confluence connector
---------
Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
* Changes for Gitlab connector
* Changes to Rebase from Main
* Changes to Rebase from Main
* Changes to Rebase from Main
* Changes to Rebase from Main
* made indexing code files a config setting
* Update app_configs.py
created env variable
* Update app_configs.py
added false
---------
Co-authored-by: Varun Gaur <vgaur@roku.com>
Co-authored-by: hagen-danswer <hagen@danswer.ai>
Vespa does not work with the current configuration; see https://github.com/unoplat/vespa-helm-charts/issues/20.
The podLabels must be set as follows to be consistent with the rest of the configuration:
vespa:
  podLabels:
    app: vespa
    app.kubernetes.io/instance: danswer <-------------
* start dropbox connector
* add wip ui
* polish ui
* Fix some ci
* ignore types
* addressed, fixed, and tested all comments
* ran prettier
* ran mypy fixes
---------
Co-authored-by: Bill Yang <bill@Bills-MacBook-Pro.local>
Co-authored-by: hagen-danswer <hagen@danswer.ai>
* start dropbox connector
* add wip ui
* polish ui
* Fix some ci
* ignore types
* addressed, fixed, and tested all comments
* ran prettier
* ran mypy fixes
---------
Co-authored-by: Bill Yang <bill@Bills-MacBook-Pro.local>
Co-authored-by: hagen-danswer <hagen@danswer.ai>
* Add MediaWikiConnector first draft
* Add MediaWikiConnector first draft
* Add MediaWikiConnector first draft
* Add MediaWikiConnector sections for each document
* Add MediaWikiConnector to constants and factory
* Integrate MediaWikiConnector with connectors page
* Unit tests + bug fixes
* Allow adding multiple mediawikiconnectors
* add wikipedia connector
* add wikipedia connector to factory
* improve docstrings of mediawiki connector backend
* improve docstrings of mediawiki connector backend
* move wikipedia and mediawiki icon locations in admin page
* undo accidental commit of modified docker compose yaml
- Implement `extract_text_from_content` to parse nested text elements from comment bodies.
- Modify `_get_comment_strs` to use the new text extraction method, improving handling of various content structures.
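Extracting text from nested comment bodies is typically a recursive walk. The sketch below uses the function name from the change description, but the body is an assumption: it presumes comment bodies are nested dicts and lists in which leaf nodes carry a `"text"` string and children sit under a `"content"` key, which may not match the connector's real payload.

```python
def extract_text_from_content(content):
    """Collect every "text" value from a nested content structure.

    Assumed shape (hypothetical): dict nodes with an optional "text"
    string and an optional "content" list of child nodes.
    """
    texts = []

    def walk(node):
        if isinstance(node, dict):
            text = node.get("text")
            if isinstance(text, str):
                texts.append(text)
            walk(node.get("content"))
        elif isinstance(node, list):
            for child in node:
                walk(child)
        # Any other type (None, numbers, plain strings) carries no nested text.

    walk(content)
    return " ".join(texts)
```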
# Fill in the <REPLACE THIS> values as needed. Setting GEN_AI_API_KEY is recommended so you don't have to set up an LLM in the UI
# Also check out danswer/backend/scripts/restart_containers.sh for a script to restart the containers which Danswer relies on outside of VSCode/Cursor processes
# For local dev, often user Authentication is not needed
AUTH_TYPE=disabled
# Always keep these on for Dev
# Logs all model prompts to stdout
LOG_DANSWER_MODEL_INTERACTIONS=True
# More verbose logging
LOG_LEVEL=debug
# This passes top N results to LLM an additional time for reranking prior to answer generation
# This step is quite heavy on token usage so we disable it for dev generally
DISABLE_LLM_DOC_RELEVANCE=False
# Useful if you want to toggle auth on/off (google_oauth/OIDC specifically)
OAUTH_CLIENT_ID=<REPLACE THIS>
OAUTH_CLIENT_SECRET=<REPLACE THIS>
# Generally not useful for dev, we don't generally want to set up an SMTP server for dev
REQUIRE_EMAIL_VERIFICATION=False
# Set these so if you wipe the DB, you don't end up having to go through the UI every time
GEN_AI_API_KEY=<REPLACE THIS>
# If answer quality isn't important for dev, use gpt-4o-mini since it's cheaper
GEN_AI_MODEL_VERSION=gpt-4o
FAST_GEN_AI_MODEL_VERSION=gpt-4o
# For Danswer Slack Bot, overrides the UI values so no need to set this up via UI every time
# Only needed if using DanswerBot
#DANSWER_BOT_SLACK_APP_TOKEN=<REPLACE THIS>
#DANSWER_BOT_SLACK_BOT_TOKEN=<REPLACE THIS>
# Python stuff
PYTHONPATH=../backend
PYTHONUNBUFFERED=1
# Internet Search
BING_API_KEY=<REPLACE THIS>
# Enable the full set of Danswer Enterprise Edition features
# NOTE: DO NOT ENABLE THIS UNLESS YOU HAVE A PAID ENTERPRISE LICENSE (or if you are using this for local testing/development)
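
The template above leaves several values as `<REPLACE THIS>`. A minimal sketch of how one might check which keys still need a real value before starting the services. The helper names `parse_env` and `unfilled_keys` are hypothetical and not part of Danswer; the parser only handles the plain `KEY=VALUE` lines and `#` comments seen in this template.

```python
PLACEHOLDER = "<REPLACE THIS>"


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Commented-out settings (e.g. the Slack bot tokens) are ignored.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unfilled_keys(text: str) -> list[str]:
    """Return the keys whose value is still the template placeholder."""
    return [key for key, value in parse_env(text).items() if value == PLACEHOLDER]
```

Calling `unfilled_keys` on the contents of your copy of the template returns the keys that are still unset, in file order.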
- [Nginx](https://nginx.org/) (Not needed for development flows generally)
This guide provides instructions to set up the Danswer specific services outside of Docker because it's easier for
development purposes but also feel free to just use the containers and update with local changes by providing the
`--build` flag.
> **Note:**
> This guide provides instructions to build and run Danswer locally from source with Docker containers providing the above external software. We believe this combination is easier for
> development purposes. If you prefer to use pre-built container images, we provide instructions on running the full Danswer stack within Docker below.
### Local Set Up
It is recommended to use Python version 3.11
Be sure to use Python version 3.11. For instructions on installing Python 3.11 on macOS, refer to the [CONTRIBUTING_MACOS.md](./CONTRIBUTING_MACOS.md) readme.
If using a lower version, modifications will have to be made to the code.
If using a higher version, the version of Tensorflow we use may not be available for your platform.
If using a higher version, sometimes some libraries will not be available (e.g. we had problems with Tensorflow in the past with higher versions of Python).
#### Installing Requirements
#### Backend: Python requirements
Currently, we use pip and recommend creating a virtual environment.
For convenience here's a command for it:
@@ -72,6 +75,11 @@ For convenience here's a command for it:
python -m venv .venv
source .venv/bin/activate
```
> **Note:**
> This virtual environment MUST NOT be set up WITHIN the danswer directory if you plan on using mypy within certain IDEs.
> For simplicity, we recommend setting up the virtual environment outside of the danswer directory.
_For Windows, activate the virtual environment using Command Prompt:_
```bash
.venv\Scripts\activate
@@ -85,34 +93,38 @@ Install the required python dependencies:
_For Windows (for compatibility with both PowerShell and Command Prompt):_
```bash
powershell -Command "
@@ -158,20 +170,58 @@ powershell -Command "
"
```
Note: if you need finer logging, add the additional environment variable `LOG_LEVEL=DEBUG` to the relevant services.
> **Note:**
> If you need finer logging, add the additional environment variable `LOG_LEVEL=DEBUG` to the relevant services.
#### Wrapping up
You should now have 4 servers running:
- Web server
- Backend API
- Model server
- Background jobs
Now, visit `http://localhost:3000` in your browser. You should see the Danswer onboarding wizard where you can connect your external LLM provider to Danswer.
You've successfully set up a local Danswer instance! 🏁
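
If the page does not load, it can help to confirm that each server is actually accepting connections. Below is a minimal sketch using only the Python standard library. Only port 3000 for the web server comes from this guide; the function names are hypothetical, and you would need to supply the ports of the other three servers from your own setup.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def report(services: dict[str, int], host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in services.items()}
```

For example, `report({"Web server": 3000})` returns `{"Web server": True}` once the web server is up. A successful TCP connection only shows that something is listening on the port, not that the service is healthy.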
#### Running the Danswer application in a container
You can run the full Danswer application stack from pre-built images including all external software dependencies.
Navigate to `danswer/deployment/docker_compose` and run:
```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d
```
After Docker pulls and starts these containers, navigate to `http://localhost:3000` to use Danswer.
If you want to make changes to Danswer and run those changes in Docker, you can also build a local version of the Danswer container images that incorporates your changes like so:
```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d --build
```
### Formatting and Linting
#### Backend
For the backend, you'll need to set up pre-commit hooks (black / reorder-python-imports).
First, install pre-commit (if you don't have it already) following the instructions
[here](https://pre-commit.com/#installation).
With the virtual environment active, install the pre-commit library with:
```bash
pip install pre-commit
```
Then, from the `danswer/backend` directory, run:
```bash
pre-commit install
```
Additionally, we use `mypy` for static type checking.
Danswer is fully type-annotated, and we would like to keep it that way!
Danswer is fully type-annotated, and we want to keep it that way!
To run the mypy checks manually, run `python -m mypy .` from the `danswer/backend` directory.
@@ -182,6 +232,7 @@ Please double check that prettier passes before creating a pull request.
### Release Process
Danswer follows the semver versioning standard.
Danswer loosely follows the SemVer versioning standard.
Major changes are released with a "minor" version bump. Currently we use patch release versions to indicate small feature changes.
A set of Docker containers will be pushed automatically to DockerHub with every tag.
You can see the containers [here](https://hub.docker.com/search?q=danswer%2F).
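
To make the versioning convention above concrete, here is a small sketch of how the next tag could be computed. This is an illustration only: the `bump` function and its `"major"` / `"small"` labels are hypothetical and are not part of any Danswer release tooling. It assumes `MAJOR.MINOR.PATCH` tags with an optional leading `v`, and resets the patch number on a minor bump as SemVer prescribes.

```python
def bump(version: str, change: str) -> str:
    """Return the next version under the convention described above.

    change: "major" for a major change (bumps the minor version),
            "small" for a small feature change (bumps the patch version).
    """
    prefix = "v" if version.startswith("v") else ""
    major, minor, patch = (int(part) for part in version[len(prefix):].split("."))
    if change == "major":
        return f"{prefix}{major}.{minor + 1}.0"
    if change == "small":
        return f"{prefix}{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

Under these assumptions, `bump("0.3.7", "major")` gives `0.4.0` and `bump("0.3.7", "small")` gives `0.3.8`.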
The base instructions to set up the development environment are located in [CONTRIBUTING.md](https://github.com/danswer-ai/danswer/blob/main/CONTRIBUTING.md).
### Setting up Python
Ensure [Homebrew](https://brew.sh/) is already set up.
Then install python 3.11.
```bash
brew install python@3.11
```
Add python 3.11 to your path: add the following line to ~/.zshrc
Portions of this software are licensed as follows:
* All content that resides under "ee" directories of this repository, if that directory exists, is licensed under the license defined in "backend/ee/LICENSE". Specifically all content under "backend/ee" and "web/src/app/ee" is licensed under the license defined in "backend/ee/LICENSE".
* All third party components incorporated into the Danswer Software are licensed under the original license provided by the owner of the applicable component.
* Content outside of the above mentioned directories or restrictions above is available under the "MIT Expat" license as defined below.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
@@ -105,5 +105,25 @@ Efficiently pulls the latest changes from:
* Websites
* And more ...
## 📚 Editions
There are two editions of Danswer:
* Danswer Community Edition (CE) is available freely under the MIT Expat license. This version has ALL the core features discussed above. This is the version of Danswer you will get if you follow the Deployment guide above.
* Danswer Enterprise Edition (EE) includes extra features that are primarily useful for larger organizations. Specifically, this includes:
* Single Sign-On (SSO), with support for both SAML and OIDC
* Role-based access control
* Document permission inheritance from connected sources
* Usage analytics and query history accessible to admins
* Whitelabeling
* API key authentication
* Encryption of secrets
* And many more! Check out [our website](https://www.danswer.ai/) for the latest.
2. For self-hosting, contact us at [founders@danswer.ai](mailto:founders@danswer.ai) or book a call with us on our [Cal](https://cal.com/team/danswer/founders).
## 💡 Contributing
Looking to contribute? Please check out the [Contribution Guide](CONTRIBUTING.md) for more details.
Some files were not shown because too many files have changed in this diff