Compare commits

...

857 Commits

Author SHA1 Message Date
Weves
5f82de7c45 Debug test 2024-09-23 11:05:27 -07:00
pablodanswer
45f67368a2 Add support for o1 (#2538)
* add o1 support + bump litellm/openai

* ports

* update exception message for testing
2024-09-22 23:16:28 +00:00
pablodanswer
014ba9e220 Begin distinguishing upsert operations for clarity (#2535)
* additional clarity for llm provider creation / updates

* update provider APIs

* update typing (minor)
2024-09-21 22:36:22 +00:00
pablodanswer
ba64543dd7 Updated modals for clarity (#2529)
* updated modals for clarity

* fix build
2024-09-21 19:55:54 +00:00
pablodanswer
18c62a0c24 Add additional custom tooling configuration (#2426)
* add custom headers

* add tool seeding

* squash

* temp

* validated

* rm

* update typing

* update alembic

* update import name

* reformat

* alembic
2024-09-20 23:12:52 +00:00
Chris Weaver
33f555922c Fix duplicate users from slack / web (#2530) 2024-09-20 21:51:33 +00:00
pablodanswer
05f6f6d5b5 update default search assistant selection (#2527)
* update default search assistant selection

* update language
2024-09-20 21:21:44 +00:00
hagen-danswer
19dae1d870 Wrote tests for the chat apis (#2525)
* Wrote tests for the chat apis

* slight changes to the case
2024-09-20 19:00:03 +00:00
rkuo-danswer
6d859bd37c try adding build essential (#2526) 2024-09-20 11:51:44 -07:00
pablodanswer
122e3fa3fa Access type (#2523) 2024-09-20 11:16:37 -07:00
pablodanswer
87b542b335 align alembic 2024-09-20 11:13:00 -07:00
pablodanswer
00229d2abe Add start date to persona (#2407)
* add start date to persona

* remove logs

* rename

* update assistant editor

* update alembic

* update alembic

* update alembic

* update alembic

* remove rebase artifacts
2024-09-20 16:39:34 +00:00
pablodanswer
5f2644985c Route name (#2520)
* clearer refresh logic

* rename path
2024-09-20 15:44:28 +00:00
pablodanswer
c82a36ad68 Saml account fastapi deletion (#2512)
* saml account fastapi deletion

* update error detail
2024-09-20 00:20:50 +00:00
hagen-danswer
16d1c19d9f Added bool to disable chat_session_id check for search_docs for api 2024-09-19 17:36:46 -07:00
pablodanswer
9f179940f8 Asana connector (community originated) (#2485)
* initial Asana connector

* hint on how to get Asana workspace ID

* re-format with black

* re-order imports

* update asana connector for clarity

* minor robustification

* minor update to naming

* update for best practice

* update connector

---------

Co-authored-by: Daniel Naber <naber@danielnaber.de>
2024-09-19 23:54:18 +00:00
pablodanswer
8a8e2b310e Assistants panel rework (#2509)
* update user model

* squash - update assistant gallery

* rework assistant display logic + ux

* update tool + assistant display

* update a couple function names

* update typing + some logic

* remove unnecessary comments

* finalize functionality

* updated logic

* fully functional

* remove logs + ports

* small update to logic

* update typing

* allow seeding of display priority

* reorder migrations

* update for alembic
2024-09-19 23:36:15 +00:00
hagen-danswer
2274cab554 Added permission syncing (#2340)
* Added permission syncing on the backend

* Reworked to work with celery

alembic fix

fixed test

* frontend changes

* got groups working

* added comments and fixed public docs

* fixed merge issues

* frontend complete!

* frontend cleanup and mypy fixes

* refactored connector access_type selection

* mypy fixes

* minor refactor and frontend improvements

* get to fetch

* renames and comments

* minor change to var names

* got curator stuff working

* addressed pablo's comments

* refactored user_external_group to reference users table

* implemented polling

* small refactor

* fixed a whoopsies on the frontend

* added scripts to seed dummy docs and test query times

* fixed frontend build issue

* alembic fix

* handled is_public overlap

* yuhong feedback

* added more checks for sync

* black

* mypy

* fixed circular import

* todos

* alembic fix

* alembic
2024-09-19 22:07:36 +00:00
pablodanswer
ef104e9a82 Non-spotfix deletion of users (#2499)
* add description / robustify

* additional minor robustification (ideally we organized cascades slightly better)

* update deletion for simplicity

* minor typing update
2024-09-19 20:02:36 +00:00
hagen-danswer
a575d7f1eb Citations prompt for slack now includes thread history (#2510) 2024-09-19 19:31:26 +00:00
pablodanswer
f404c4b448 Move code block default language creation to citation processing (#2501)
* move code block default language creation to citation processing

* add test cases

* update copy
2024-09-19 06:00:58 +00:00
rkuo-danswer
3884f1d70a Bugfix/larger test runner (#2508)
* add pip retries to the github workflows too

* let's try running on amd64 ... docker builds are unusually flaky

* bump

* try large

* no yaml anchors

* switch back down to amd64

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-09-19 05:36:07 +00:00
rkuo-danswer
bc9d5fece7 prevent trying to submit to jobclient when it can't take any more work (reduces log spam) (#2482) 2024-09-19 04:01:15 +00:00
rkuo-danswer
bb279a8580 add pip retries. should help with github's occasional flaky network during build/test (#2506) 2024-09-19 00:46:41 +00:00
pablodanswer
a9403016c9 fix basic auth (#2505) 2024-09-18 22:45:58 +00:00
hagen-danswer
f3cea79c1c Deleting a connector should redirect to the indexing status page (#2504)
* Deleting a connector should redirect to the indexing status page

* minor update to dev background jobs

* update refresh logic

* remove print statement

---------

Co-authored-by: pablodanswer <pablo@danswer.ai>
2024-09-18 21:38:35 +00:00
hagen-danswer
54bb79303c corrected error message (#2502) 2024-09-18 19:13:28 +00:00
pablodanswer
d3dfabb20e fix parentheses (#2486) 2024-09-18 18:39:23 +00:00
pablodanswer
7d1ec1095c proper z index for chat bubbles (#2500) 2024-09-18 18:02:50 +00:00
rkuo-danswer
f531d071af Feature/background deletion (#2337)
* first cut at redis

* some new helper functions for the db

* ignore kombu tables in alembic migrations (used by celery)

* multiline commands for readability, add vespa_metadata_sync queue to worker

* typo fix

* fix returning tuple fields

* add constants

* fix _get_access_for_document

* docstrings!

* fix double function declaration and typing

* fix type hinting

* add a global redis pool

* Add get_document function

* use task_logger in various celery tasks

* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit

* Add celery redis helper. used in a subsequent PR

* kombu warning getting spammy since celery is not self managing its queue in Postgres any more

* add last_modified and last_synced to documents

* fix task naming convention

* use celeryconfig.py

* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc

* change vespa index log line to debug

* mypy fixes

* update alembic migration

* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call

* mypy

* switch to monotonic time

* fix startup dependencies on redis

* rebase alembic migration

* kombu cleanup - fail silently

* mypy

* add redis_host environment override

* update REDIS_HOST env var in docker-compose.dev.yml

* update the rest of the docker files

* in flight

* harden indexing-status endpoint against db changes happening in the background.  Needs further improvement but OK for now.

* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date

* add back writing to vespa on indexing

* actually working connector deletion

* update contributing guide

* backporting fixes from background_deletion

* renaming cache to cache_volume

* add redis password to various deployments

* try setting up pr testing for helm

* fix indent

* hopefully this release version actually exists

* fix command line option to --chart-dirs

* fetch-depth 0

* edit values.yaml

* try setting ct working directory

* bypass testing only on change for now

* move files and lint them

* update helm testing

* some issues suggest using --config works

* add vespa repo

* add postgresql repo

* increase timeout

* try amd64 runner

* fix redis password reference

* add comment to helm chart testing workflow

* rename helm testing workflow to disable it

* adding clarifying comments

* address code review

* missed a file

* remove commented warning ... just not needed

* fix imports

* refactor to use update_single

* mypy fixes

* add vespa test

* add db refresh to connector deletion

* code review fixes

* move monitor_usergroup_taskset to ee, improve logging

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-09-18 16:50:11 +00:00
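
The entry above mentions adding "a global redis pool" with a REDIS_HOST environment override. A minimal sketch of that pattern using redis-py, where every name other than REDIS_HOST is an assumption:

```python
import os

import redis

# Environment overrides; REDIS_HOST comes from the commit messages,
# the port/password variables and their defaults are assumptions.
REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")
REDIS_PORT = int(os.environ.get("REDIS_PORT", "6379"))
REDIS_PASSWORD = os.environ.get("REDIS_PASSWORD", "")

# One module-level pool shared by all callers so that clients reuse
# connections instead of opening a new one per task.
_redis_pool = redis.ConnectionPool(
    host=REDIS_HOST,
    port=REDIS_PORT,
    password=REDIS_PASSWORD,
    max_connections=20,
)


def get_redis_client() -> redis.Redis:
    """Return a client backed by the shared global pool."""
    return redis.Redis(connection_pool=_redis_pool)
```
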
Chris Weaver
4218814385 Add flow to query history CSV (#2492) 2024-09-18 14:23:56 +00:00
rkuo-danswer
e662e3b57d clarify ssl cert reqs (#2494)
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-09-18 05:35:57 +00:00
pablodanswer
2073820e33 Update default assistants to all visible (#2490)
* update default assistants to all visible

* update with catch-all

* minor update

* update
2024-09-18 02:08:11 +00:00
Chris Weaver
5f25b243c5 Add back llm_chunks_indices (#2491) 2024-09-18 01:21:31 +00:00
pablodanswer
a9427f190a Extend time range (contributor submission) (#2484)
* added new options for time range; removed duplicated code

* refactor + remove unused code

---------

Co-authored-by: Zoltan Szabo <zoltan.szabo@eaudeweb.ro>
2024-09-17 22:36:25 +00:00
pablodanswer
18fbe9d7e8 Warn users of gpu-sensitive operation (#2488)
* warn users of gpu-sensitive operation

* update copy
2024-09-17 21:59:43 +00:00
Chris Weaver
75c9b1cafe Fix concatenate string with toolcallkickoff issue (#2487) 2024-09-17 21:25:06 +00:00
rkuo-danswer
632a8f700b Feature/celery backend db number (#2475)
* use separate database number for celery result backend

* add comments

* add env var for celery's result_expires

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-09-17 21:06:36 +00:00
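
Putting Celery's result backend on a separate Redis database number, as this entry describes, keeps stored task results isolated from queued messages. A sketch of a celeryconfig.py, assuming DB numbers 0/1 and the CELERY_RESULT_EXPIRES variable name:

```python
# celeryconfig.py -- loaded via app.config_from_object("celeryconfig")
import os

REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")

# Broker and result backend on different Redis DB numbers, so expiring
# or flushing results can never touch in-flight queue messages.
broker_url = f"redis://{REDIS_HOST}:6379/0"
result_backend = f"redis://{REDIS_HOST}:6379/1"

# Seconds before stored task results expire (the env var for
# result_expires mentioned in the commit; the name here is assumed).
result_expires = int(os.environ.get("CELERY_RESULT_EXPIRES", "86400"))
```
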
pablodanswer
cd58c96014 Memoize AI message component (#2483)
* memoize AI message component

* rename memoized file

* remove "zz"

* update name

* memoize for coverage

* add display name
2024-09-17 18:47:23 +00:00
pablodanswer
c5032d25c9 Minor clarity update for connectors (#2480) 2024-09-17 10:25:39 -07:00
pablodanswer
72acde6fd4 Handle tool errors in display properly (can show ValueError to user) (#2481)
* handle tool errors in display properly (can show ValueErrors to user)

* update for clarity
2024-09-17 17:08:46 +00:00
rkuo-danswer
5596a68d08 harden migration (#2476)
* harden migration

* remove duplicate line
2024-09-17 16:44:53 +00:00
Weves
5b18409c89 Change user-message to user-prompt 2024-09-16 21:53:27 -07:00
Chris Weaver
84272af5ac Add back scrolling to ExceptionTraceModal (#2473) 2024-09-17 02:25:53 +00:00
pablodanswer
6bef70c8b7 ensure disabled gets propagated 2024-09-16 19:27:31 -07:00
pablodanswer
7f7559e3d2 Allow users to share assistants (#2434)
* enable assistant sharing

* functional

* remove logs

* revert ports

* remove accidental update

* minor updates to copy

* update formatting

* update for merge queue
2024-09-17 01:35:29 +00:00
Chris Weaver
7ba829a585 Add top_documents to APIs (#2469)
* Add top_documents

* Fix test

---------

Co-authored-by: hagen-danswer <hagen@danswer.ai>
2024-09-16 23:48:33 +00:00
trial-danswer
8b2ecb4eab EE movement followup for Standard Answers (#2467)
* Move StandardAnswer to EE section of danswer/db/models

* Move StandardAnswer DB layer to EE

* Add EERequiredError for distinct error handling here

* Handle EE fallback for slack bot config

* Migrate all standard answer models to ee

* Flagging categories for removal

* Add missing versioned impl for update_slack_bot_config

---------

Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
2024-09-16 22:05:53 +00:00
pablodanswer
2dd3870504 Add ability to specify persona in API request (#2302)
* persona

* all prepared excluding configuration

* more sensical model structure

* update tstream

* type updates

* rm

* quick and simple updates

* minor updates

* te

* ensure typing + naming

* remove old todo + rebase update

* remove unnecessary check
2024-09-16 21:31:01 +00:00
pablodanswer
df464fc54b Allow for CORS Origin Setting (#2449)
* allow setting of CORS origin

* simplify

* add environment variable + rename

* slightly more efficient

* simplify so mypy doesn't complain

* temp

* go back to my preferred formatting
2024-09-16 18:54:36 +00:00
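
A configurable CORS origin like the one added above is typically wired through FastAPI's CORSMiddleware. A sketch, with the CORS_ALLOWED_ORIGIN variable name and its permissive default both assumed:

```python
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Comma-separated allow-list of origins, e.g. "https://a.com,https://b.com".
cors_origins = os.environ.get("CORS_ALLOWED_ORIGIN", "*").split(",")

app.add_middleware(
    CORSMiddleware,
    allow_origins=[origin.strip() for origin in cors_origins],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```
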
pablodanswer
96b98fbc4a Make it impossible to switch to non-image (#2440)
* make it impossible to switch to non-image

* revert ports

* proper provider support

* remove unused imports

* minor rename

* simplify interface

* remove logs
2024-09-16 18:35:40 +00:00
trial-danswer
66cf67d04d hotfix: sqlalchemy default -> server_default (#2442)
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
2024-09-16 17:49:01 +00:00
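
The hotfix above swaps a Python-side column default for a database-side one. The distinction in SQLAlchemy terms, on a hypothetical model (the two keyword arguments are the real API):

```python
from sqlalchemy import Boolean, Column, Integer, text
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Example(Base):  # hypothetical table, for illustration only
    __tablename__ = "example"

    id = Column(Integer, primary_key=True)

    # `default` is applied by the Python layer at INSERT time; rows
    # written by raw SQL or a migration backfill would get NULL.
    flag_py = Column(Boolean, default=False)

    # `server_default` becomes part of the DDL (DEFAULT false), so the
    # database fills in the value no matter who performs the INSERT.
    flag_db = Column(Boolean, server_default=text("false"), nullable=False)
```
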
pablodanswer
285bdbbaf9 Fix stop generating locally (#2452)
* fix stop generating locally

* .
2024-09-15 23:55:30 +00:00
pablodanswer
e2c37d6847 Test stream + Update Copy (#2317)
* update copy + conditional ordering

* answer stream checks

* update

* add basic tests for chat streams

* slightly simplify

* fix typing

* quick typing updates + nits
2024-09-15 19:40:48 +00:00
Yuhong Sun
3ff2ba7ee4 k (#2450) 2024-09-15 17:32:58 +00:00
pablodanswer
290f4f0f8c add some minor ux updates (#2441) 2024-09-15 08:29:31 +00:00
rkuo-danswer
3c934a93cd using is_up_to_date cached outside of the fence was causing a race condition where the same sync could be kicked off again (#2433) 2024-09-15 06:27:05 +00:00
Yuhong Sun
a51b0f636e Logs from API Server Container on Merge Queue (#2448)
* k

* k
2024-09-14 20:32:18 +00:00
pablodanswer
a50c2e30ec Very minor polish (#2445)
* fix minor polish

* cleaner chat flow

* remove keys

* slight robustification to copying
2024-09-14 17:54:29 +00:00
pablodanswer
ee278522ef update indexing status clarity (#2446) 2024-09-14 17:19:55 +00:00
trial-danswer
430c9a47d7 Match any/all keywords in Standard Answers (#2443)
* migration: add column "match_any_keywords" to StandardAnswer

* Implement any/all keyword matching for standard answers

* Add match_any_keywords to non-searchable fields

* Remove stray print

* Simplify Slack messages for any and all cases

---------

Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
2024-09-14 05:28:07 +00:00
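
The match_any_keywords column added above sits alongside the match_regex flag from the "Support regex in standard answers" entry further down. A sketch of the combined matching semantics, where only the two flag names come from the commits and everything else is assumed:

```python
import re


def standard_answer_matches(
    message: str,
    keywords: list[str],
    match_regex: bool,
    match_any_keywords: bool,
) -> bool:
    if match_regex:
        try:
            # Each stored keyword is treated as a regex pattern.
            return any(re.search(pattern, message) for pattern in keywords)
        except re.error:
            # The regex PR mentions guarding against invalid patterns.
            return False

    words = set(message.lower().split())
    if match_any_keywords:
        return any(kw.lower() in words for kw in keywords)  # any one suffices
    return all(kw.lower() in words for kw in keywords)  # all are required
```
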
hj-danswer
974f85da66 Migrate standard answers implementations to ee/ (#2378)
* Migrate standard answers implementations to ee/

* renaming

* Clean up slackbot non-ee standard answers import

* Move backend api/manage/standard_answer route to ee

* Move standard answers web UI to ee

* Hide standard answer controls in bot edit page

* Kwargs for fetch_versioned_implementation

* Add docstring explaining return types for handle_standard_answers

* Consolidate blocks into ee/handle_standard_answers

---------

Co-authored-by: Hyeong Joon Suh <hyeongjoonsuh@Hyeongs-MacBook-Pro.local>
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
2024-09-14 01:57:03 +00:00
hagen-danswer
a63cb9da43 fixed /danswer handling (#2436)
* fixed

* mypy

* cleaned up and commented

* mypy

* Update handle_regular_answer.py
2024-09-14 01:21:13 +00:00
rkuo-danswer
d807ad7699 fix document set connection removal sync, add tests for document set and user group removal (#2437) 2024-09-14 01:01:26 +00:00
hj-danswer
3cb00de6d4 Support regex in standard answers (#2377)
* Support regex in standard answers

* fix mypy

* Add match_regex boolean column to StandardAnswer

* Add match_regex flag and validation to Pydantic models

* GET /manage/admin/standard-answer: add match_regex to create_standard_answer

* PATCH /manage/admin/standard-answer/:id: add match_regex to update_standard_answer

* Add "Match Regex" toggle to standard answer form

* Decode error pattern in case it's bytes

* Refactor regex support to use match_regex flag instead of supplemental tuple

* Better error handling for invalid regexes

* Show "match regex" in table and style keywords appropriately

* Fix stale UI copy for non-"match_regex" branch

* Fix stale docstring in find_matching_standard_answers

* Update down_revision to reflect most recent migration

* Update UI copy

* Initial implementation of match group display

* Fix pydantic StandardAnswer vs SQLAlchemy StandardAnswer model usage

* Update docstring return type

* Fix missing key prop

---------

Co-authored-by: Hyeong Joon Suh <hyeongjoonsuh@Hyeongs-MacBook-Pro.local>
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
2024-09-14 00:07:42 +00:00
Chris Weaver
da6e46ae75 Slack flow improvements (#2366) 2024-09-13 16:56:45 -07:00
pablodanswer
648c2531f9 Add custom tool chat session / message ID dynamic prompting (#2404)
* add custom tool chat session / message ID dynamic prompting

* update some formatting

* code organization + remove unnecessary card

* remove log

* update for clarity
2024-09-13 18:42:21 +00:00
pablodanswer
fc98c560a4 Add fix for logging (#2431) 2024-09-13 11:27:20 -07:00
pablodanswer
566f44fcd6 Minor update to llm image ability tracking (#2423)
* minor update to llm image ability tracking

* quick robustification
2024-09-13 17:24:51 +00:00
rkuo-danswer
2fe49e5efb add ssl testing for redis against a cloud instance (#2422) 2024-09-13 10:28:04 -07:00
rkuo-danswer
f58acd4e2a Add redis to helm chart (#2390) 2024-09-13 10:26:51 -07:00
pablodanswer
53008a0271 update multipass indexing server default 2024-09-13 10:24:26 -07:00
pablodanswer
13278663d9 Update refresh + robustify embeddings (#2420)
* update refresh + robustify embeddings

* squash
2024-09-13 14:26:33 +00:00
pablodanswer
31ca6857fb Custom Refresh on Client Side (#2376) 2024-09-13 00:04:03 -07:00
pablodanswer
6dd91414be delete chat session immediately 2024-09-13 00:02:43 -07:00
rkuo-danswer
140c34e59e ephemeral behavior for redis (#2373)
* ephemeral behavior for redis

* notes for redis command line consistency
2024-09-13 04:48:50 +00:00
rkuo-danswer
da8e68b320 reformat celery logging to match danswer style logging across services (#2409)
* reformat celery logging to match danswer style logging across services

* mypy fixes

* handle logfile argument
2024-09-13 01:51:51 +00:00
hagen-danswer
e9a616e579 Added search_doc_ids to the simple api to allow for skipping search (#2421)
* Added search_doc_ids to the simple api to allow for skipping search

* comment

* fixed behaviour
2024-09-12 23:22:41 +00:00
pablodanswer
cb2169f2a3 Warm up reranker on model switch (#2408)
* warm up reranker on model switch

* properly type

* fix issue

* Update search_settings.py
2024-09-12 22:12:17 +00:00
pablodanswer
79aa5dd6e0 add a tiny bit of clarity to index doc counts (#2414) 2024-09-12 21:59:10 +00:00
hagen-danswer
604ebafe6c simple apis now return cited/context doc indices (#2419)
* simple apis now return cited/context doc indices

* minor fixes
2024-09-12 21:29:24 +00:00
pablodanswer
a2d775efbd Reformatted tailwind config (#2417)
* reformatted tailwind config

* minor update
2024-09-12 19:41:11 +00:00
rkuo-danswer
641690e3f7 fix enabling ssl in connection pool (#2418) 2024-09-12 19:18:04 +00:00
rkuo-danswer
eebf98e3a6 fix setting redis_scheme (#2416) 2024-09-12 18:07:38 +00:00
rkuo-danswer
4bc4da29f5 add SSL parameter support for redis (#2389)
* add SSL parameter support for redis

* add ssl support to redis pool
2024-09-12 16:18:11 +00:00
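
The SSL parameters added above map directly onto redis-py's client keyword arguments. A sketch with all environment variable names assumed:

```python
import os

import redis

ssl_client = redis.Redis(
    host=os.environ.get("REDIS_HOST", "localhost"),
    port=int(os.environ.get("REDIS_PORT", "6379")),
    password=os.environ.get("REDIS_PASSWORD", ""),
    ssl=True,  # encrypt the connection (rediss:// scheme)
    ssl_cert_reqs="required",  # verify the server certificate
    ssl_ca_certs=os.environ.get("REDIS_SSL_CA_CERTS"),  # optional CA bundle
)
```
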
pablodanswer
7af572d0e7 display only failed (#2413) 2024-09-12 16:01:17 +00:00
pablodanswer
58bdf9d684 Add connector deletion failure message (#2392) 2024-09-11 22:38:15 -07:00
pablodanswer
f69922fff7 Add environment variable for setting vespa search threads (#2400) 2024-09-11 22:37:38 -07:00
pablodanswer
d4d37c9cdd add bedrock models (#2405) 2024-09-12 04:34:43 +00:00
Yuhong Sun
2654df49fd Update CONTRIBUTING.md 2024-09-11 19:17:23 -07:00
pablodanswer
aee5fcd4e0 Add env variables for overriding embedding batch size (#2395)
* add env variables for overriding

* proper ports

* proper overrides
2024-09-12 00:51:45 +00:00
pablodanswer
2c77dd241b Add error table to re-indexing (#2388)
* add error table to re-indexing

* robustify

* update with proper comment

* add popup

* update typo
2024-09-11 22:55:55 +00:00
pablodanswer
d90c90dd92 simplify unnecessary display logic (#2406) 2024-09-11 21:35:50 +00:00
pablodanswer
2c971cf774 add claude image-support 2024-09-11 13:31:27 -07:00
trial-danswer
eab55bdd85 Misc clarifications for CONTRIBUTING.md (#2401)
* Reorder and clarify dependency installation instructions

* Clarify instructions for local development with Docker external deps vs full Docker stack

* Final words at the end of the local setup process

---------

Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
2024-09-11 19:16:37 +00:00
rkuo-danswer
f4f2fb5943 Bugfix/connector deletion test (#2402)
* fixes a bug with deleting connectors and foreign keys

* test foreign key handling on deletion
2024-09-11 12:04:27 -07:00
rkuo-danswer
71f2f1a90a fixes a bug with deleting connectors and foreign keys (#2398) 2024-09-11 12:03:51 -07:00
hagen-danswer
74a2271422 Added HARD_DELETE_CHATS to environment variables (#2397) 2024-09-11 18:08:29 +00:00
trial-danswer
d42fb6ce34 Add link to macOS contributions doc for installing Python 3.11 (#2396)
Co-authored-by: danswer-trial <danswer-trial@danswer-trials-MacBook-Pro.local>
2024-09-11 17:45:52 +00:00
pablodanswer
0d749ebd46 add ccpair id to logging (#2391) 2024-09-11 01:27:03 +00:00
pablodanswer
9f6e8bd124 Improve Dev Experience (#2347)
* clean interfaces + improve dev experience

* update formatting

* update ports

* ports

* remove some number of unnecessary lines

* remove unnecessary isPublicGroupSelector checks in all spots

* add comment

* update building
2024-09-10 20:49:04 +00:00
pablodanswer
3a2a6abed4 Add basic virtualization (#2370)
* add basic virtualization

* functioning perfectly

* squash

* change ports

* remove some comments

* remove comment

* update buffering clarity
2024-09-10 19:06:04 +00:00
pablodanswer
07f49a384f Update spread order (#2386)
* update spread

* update
2024-09-10 18:04:47 +00:00
rkuo-danswer
f1c5e80f17 Feature/background processing (#2275)
* first cut at redis

* some new helper functions for the db

* ignore kombu tables in alembic migrations (used by celery)

* multiline commands for readability, add vespa_metadata_sync queue to worker

* typo fix

* fix returning tuple fields

* add constants

* fix _get_access_for_document

* docstrings!

* fix double function declaration and typing

* fix type hinting

* add a global redis pool

* Add get_document function

* use task_logger in various celery tasks

* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit

* Add celery redis helper. used in a subsequent PR

* kombu warning getting spammy since celery is not self managing its queue in Postgres any more

* add last_modified and last_synced to documents

* fix task naming convention

* use celeryconfig.py

* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc

* change vespa index log line to debug

* mypy fixes

* update alembic migration

* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call

* mypy

* switch to monotonic time

* fix startup dependencies on redis

* rebase alembic migration

* kombu cleanup - fail silently

* mypy

* add redis_host environment override

* update REDIS_HOST env var in docker-compose.dev.yml

* update the rest of the docker files

* harden indexing-status endpoint against db changes happening in the background.  Needs further improvement but OK for now.

* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date

* add back writing to vespa on indexing

* update contributing guide

* backporting fixes from background_deletion

* renaming cache to cache_volume

* add redis password to various deployments

* try setting up pr testing for helm

* fix indent

* hopefully this release version actually exists

* fix command line option to --chart-dirs

* fetch-depth 0

* edit values.yaml

* try setting ct working directory

* bypass testing only on change for now

* move files and lint them

* update helm testing

* some issues suggest using --config works

* add vespa repo

* add postgresql repo

* increase timeout

* try amd64 runner

* fix redis password reference

* add comment to helm chart testing workflow

* rename helm testing workflow to disable it

* adding clarifying comments

* address code review

* missed a file

* remove commented warning ... just not needed

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-09-10 16:28:19 +00:00
pablodanswer
b7ad810d83 Prevent spam search (#2367) 2024-09-10 08:44:50 -07:00
pablodanswer
99b28643f7 show groups if they exist for user (#2384) 2024-09-10 15:14:30 +00:00
rkuo-danswer
f52d1142eb Fail instead of continuing if vespa cannot be reached within the time… (#2379)
* Fail instead of continuing if vespa cannot be reached within the timeout period

* improve startup readability

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-09-10 03:10:25 +00:00
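
Failing instead of continuing when Vespa is unreachable, as this entry describes, amounts to a bounded readiness poll that raises on timeout. A sketch; the health URL, timeout, and poll interval are all assumptions:

```python
import time

import requests


def wait_for_vespa(health_url: str, timeout_secs: int = 300) -> None:
    """Poll a health endpoint and raise rather than start up without it."""
    deadline = time.monotonic() + timeout_secs
    while time.monotonic() < deadline:
        try:
            if requests.get(health_url, timeout=5).status_code == 200:
                return
        except requests.RequestException:
            pass  # not up yet; keep polling
        time.sleep(5)
    raise RuntimeError(f"Vespa was not reachable within {timeout_secs}s")
```
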
pablodanswer
e563746730 Consent screen (#2381)
* update

* add consent popup

* rm
2024-09-10 02:40:32 +00:00
Yuhong Sun
aa86830bde mypy 2024-09-09 16:43:45 -07:00
James Jordan
4558351801 Zendesk tickets (#2192) 2024-09-09 16:36:53 -07:00
Sebastian Müller
a4dcae57cd Google Drive Plaintext Types (#2371) 2024-09-09 15:37:47 -07:00
pablodanswer
dbd56f946f address pablo's nits (#2368) 2024-09-09 14:44:27 -07:00
hj-danswer
e4e4765c60 Add user when they interact outside of UI (e.g. Slack bot) (#2369)
* Add user when they interact outside of UI (e.g. Slack bot)

* fix mypy errors

* don't use user manager to avoid async messiness

* fix email is none scenario

* fix mypy

* make code slightly clearer

* PR comments

* get slack email in generate button as well

* fix alembic migration

* update name to be more descriptive

---------

Co-authored-by: Hyeong Joon Suh <hyeongjoonsuh@Hyeongs-MacBook-Pro.local>
2024-09-09 20:21:31 +00:00
rkuo-danswer
c967f53c02 docker versions have been deprecated for a while, so fixing the annoying warning (#2372) 2024-09-09 18:26:12 +00:00
pablodanswer
3a9b964d5c Add Litellm Rerank proxy (#2346)
* add ability to set reranking litellm proxy

* add fully functional rerank litellm cards

* minor formatting enforcement

* remove logs
2024-09-09 15:57:01 +00:00
Yuhong Sun
f04ecbf87a Un-bump nltk due to llamaindex issue 2024-09-08 16:39:19 -07:00
Shukant Pal
362156f97e Model inference for connector classifier on queries (#2137) 2024-09-08 14:46:00 -07:00
Andres Jose Sebastian Rincon Gonzalez
3fa9676478 [1802] adjust the code to support a different db schemas (#1803) 2024-09-08 14:16:54 -07:00
Chris Weaver
be4b6189d2 Fix streaming auth locally (#2357) 2024-09-08 14:01:26 -07:00
pablodanswer
ace041415a Clearer onboarding + Provider Updates (#2361) 2024-09-08 13:35:20 -07:00
Yuhong Sun
148c2a7375 Remove wordnet (#2365) 2024-09-08 12:34:09 -07:00
pablodanswer
1555ac9dab More explicit credential creation flow (#2363)
* more explicit drive credential creation flow

* remove logs

* update naming

* fix user-contributed formatting

* fix (^) v2
2024-09-08 12:09:23 -07:00
Weves
80de408cef Fix formatting 2024-09-08 12:09:14 -07:00
Cola Chen
e20c825e16 Notion Connector to skip reading external blocks in NotionConnector
The commit skips reading 'external_object_instance_page' blocks in the NotionConnector due to the lack of support in the Notion API. This change is in response to issue #1761.

Co-authored-by: Cola Chen <6825116+colachg@users.noreply.github.com>
2024-09-08 11:34:04 -07:00
mattboret
b0568ac8ae Sharepoint: Fix get all sites (#1700)
Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2024-09-08 11:28:11 -07:00
Art Matsak
0896d3b7da Fix content extraction from JIRA with API v2 vs. v3 (#1678) 2024-09-08 11:27:14 -07:00
Kshitiz Gupta
87b27046bd changes to the docker file for mac (#1773) 2024-09-08 11:02:18 -07:00
dependabot[bot]
5e9c6d1499 Bump aiohttp from 3.9.4 to 3.10.2 in /backend/requirements (#2097)
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.9.4 to 3.10.2.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.9.4...v3.10.2)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-08 10:59:47 -07:00
dependabot[bot]
50211ec401 Bump nltk from 3.8.1 to 3.9 in /backend/requirements (#2174)
Bumps [nltk](https://github.com/nltk/nltk) from 3.8.1 to 3.9.
- [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
- [Commits](https://github.com/nltk/nltk/compare/3.8.1...3.9)

---
updated-dependencies:
- dependency-name: nltk
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-08 10:50:36 -07:00
Bart Schuller
6012a7cbd9 Fix multilingual .env embedding dimension (#1976) 2024-09-08 10:25:07 -07:00
dependabot[bot]
1e4b27185d Bump torch from 2.0.1 to 2.2.0 in /backend/requirements (#1933)
Bumps [torch](https://github.com/pytorch/pytorch) from 2.0.1 to 2.2.0.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v2.0.1...v2.2.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-08 10:17:17 -07:00
Moshe Zada
0c66da17bb Web Connector - Get doc_updated_at from Last-Modified header (#1693) 2024-09-08 10:05:04 -07:00
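
Deriving doc_updated_at from the Last-Modified header, as the entry above describes, can lean on the stdlib's RFC 2822 date parser. A sketch; the fall-back-to-now behavior is an assumption:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

import requests


def doc_updated_at(response: requests.Response) -> datetime:
    last_modified = response.headers.get("Last-Modified")
    if last_modified:
        # HTTP dates are RFC 2822-style, which the stdlib parses directly.
        return parsedate_to_datetime(last_modified)
    return datetime.now(timezone.utc)  # assumed fallback when header is absent
```
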
Art Matsak
d985cd4352 Fix JIRA comment indexing when author has no email (#1663) 2024-09-08 09:43:09 -07:00
Yuhong Sun
c8891a5829 Remove LangChain Community (#2362) 2024-09-08 09:41:20 -07:00
Art Matsak
51a13f5fc7 Implement indexing of simple tables in Word files (#1651) 2024-09-08 09:38:46 -07:00
dependabot[bot]
57c1deb8b8 Bump braces from 3.0.2 to 3.0.3 in /web (#1628)
Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3.
- [Changelog](https://github.com/micromatch/braces/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/braces/compare/3.0.2...3.0.3)

---
updated-dependencies:
- dependency-name: braces
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-07 21:06:34 -07:00
dependabot[bot]
e2e04af7e2 Bump msal from 1.26.0 to 1.28.0 in /backend/requirements (#1626)
Bumps [msal](https://github.com/AzureAD/microsoft-authentication-library-for-python) from 1.26.0 to 1.28.0.
- [Release notes](https://github.com/AzureAD/microsoft-authentication-library-for-python/releases)
- [Commits](https://github.com/AzureAD/microsoft-authentication-library-for-python/compare/1.26.0...1.28.0)

---
updated-dependencies:
- dependency-name: msal
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-07 21:05:11 -07:00
lombax85
c1735fcd3a Google Drive connector - txt and markdown support (#1469) 2024-09-07 20:28:23 -07:00
hj-danswer
b43e5735d7 Use user information in Slack bot DMs (#2360)
* Use user information from Slack bot DMs

* fix lint

---------

Co-authored-by: Hyeong Joon Suh <hyeongjoonsuh@Hyeongs-MacBook-Pro.local>
2024-09-08 03:08:24 +00:00
pablodanswer
7d4f8ef4e8 Minor Confluence Fixes for Robustification (#2349)
* add connector config

* update confluence connector
2024-09-08 01:39:49 +00:00
Weves
7c03b6f521 Fix responses for HTTPExceptions 2024-09-07 17:40:21 -07:00
Chris Weaver
ccf986808c Add retries (#2358)
* Add retries

* fix

* add

* remove --build

* Remove cache-to

* Don't push

* Add back push

* Add newline

* Remove alembic logs
2024-09-08 00:12:32 +00:00
pablodanswer
350482e53e Squash misc UX bugs (#2356) 2024-09-07 14:26:14 -07:00
pablodanswer
fb3d7330fa minor QOL improvement on first chat (#2353) 2024-09-07 14:25:05 -07:00
Yuhong Sun
6cec31088d CONTRIBUTING updates (#2354) 2024-09-07 14:05:36 -07:00
pablodanswer
491f3254a5 regeneration - don't remove human message unnecessarily 2024-09-06 15:38:02 -07:00
pablodanswer
5abf67fbf0 PDF metadata + list defaults (#2341)
* validate web list

* update pdf extraction of metadata

* remove pdf + log

* stricter type enforcing

* fix up indexing widths

* minor formatting

* add list case

* check for empty metadata
2024-09-06 21:21:24 +00:00
rkuo-danswer
2933c3598b first cut at redis (#2226)
* first cut at redis

* fix startup dependencies on redis

* kombu cleanup - fail silently

* mypy

* add redis_host environment override

* update REDIS_HOST env var in docker-compose.dev.yml

* update the rest of the docker files

* update contributing guide

* renaming cache to cache_volume

* add redis password to various deployments

* try setting up pr testing for helm

* fix indent

* hopefully this release version actually exists

* fix command line option to --chart-dirs

* fetch-depth 0

* edit values.yaml

* try setting ct working directory

* bypass testing only on change for now

* move files and lint them

* update helm testing

* some issues suggest using --config works

* add vespa repo

* add postgresql repo

* increase timeout

* try amd64 runner

* fix redis password reference

* add comment to helm chart testing workflow

* rename helm testing workflow to disable it

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-09-06 19:21:29 +00:00
pablodanswer
aeb6060854 Add ability to delete users (#2342)
* add ability to delete users

* fix tiny build issue

* Add comments
2024-09-06 17:37:04 +00:00
hagen-danswer
8977b1b5fc Paginate connector page (#2328)
* Added pagination to individual connector pages

* I cooked

* Gordon Ramsay in this b

* meepe

* properly calculated max chunk and switch dict to array

* chunks -> batches

* increased max page size

* renamed var
2024-09-06 17:00:25 +00:00
pablodanswer
69c0419146 Updated refreshing (#2327)
* clean up + add environment variables

* remove log

* update

* update api settings

* somewhat cleaner refresh functionality

* fully functional

* update settings

* validated

* remove random logs

* remove unneeded paramter + log

* move to ee + remove comments

* Cleanup unused

---------

Co-authored-by: Weves <chrisweaver101@gmail.com>
2024-09-06 04:36:55 +00:00
pablodanswer
2bd3833c55 Update search settings + chat/search handling (#2333)
* validate web list

* update search settings + chat/search handling

* remove accidentally added search manager

* minor build fix

* push from local
2024-09-06 00:07:39 +00:00
rkuo-danswer
2d7b312e6c harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now. (#2338) 2024-09-05 20:09:33 +00:00
pablodanswer
ebe3674ca7 update for edge case (#2336) 2024-09-05 17:58:49 +00:00
pablodanswer
04f83eb1e1 Proper popover behavior, no showing queries with no docs, + bubbles (#2330) 2024-09-04 21:26:19 -07:00
pablodanswer
420aabc963 Update UX (#2324) 2024-09-04 18:45:52 -07:00
pablodanswer
61a17319c9 rename directory if needed 2024-09-04 17:22:59 -07:00
hagen-danswer
e4c85352b4 made connectors summary page faster (#2320)
* made connectors summary page faster

* not worth risk
2024-09-04 23:25:45 +00:00
pablodanswer
34ba3181ff Update auth for litellm proxy (#2316)
* update for auth

* validated embedding model names

* remove embedding provider

* remove logs

* add ability to delete search setting

* add ability to delete models + more streamlined API endpoints

* remove upsert

* minor typing fix

* add connector utils
2024-09-04 20:59:07 +00:00
rkuo-danswer
630e2248bd fixing a race condition in celery task wrapper. could randomly blow up any task. (#2321) 2024-09-04 04:17:29 +00:00
hagen-danswer
c358c91e4c Added instance domain to telemetry (#2310) 2024-09-03 21:04:40 -07:00
Yuhong Sun
2b7915f33b Update Connector README PATH (#2323) 2024-09-03 20:56:37 -07:00
pablodanswer
0ff1a023cd Minor search setting clarity (#2300)
* minor search setting clarity

* 5433

* squash

* remove logs
2024-09-03 20:48:34 -07:00
Yuhong Sun
d68d281e1c Slight copy update (#2322) 2024-09-03 20:14:03 -07:00
hagen-danswer
ebce3ff6ba added wait for sync after creating document set in tests (#2319) 2024-09-04 00:34:40 +00:00
pablodanswer
f96bd12ab8 prevent accidental submission (#2318) 2024-09-03 16:44:54 -07:00
pablodanswer
32359d2dff Add user dropdown seed-able list (#2308)
* add user dropdown seedable list

* minor cleanup

* fix build issue

* minor type update

* remove log

* quick update to divider logic (squash)

* tiny icon updates
2024-09-03 19:24:50 +00:00
Chris Weaver
5da6d792de Add ingestion as a "Source" for the FE + improve typing (#2312) 2024-09-03 12:34:31 -07:00
pablodanswer
fb95398e5b Cleaner stream handling in Answer class (#2314)
* add cleaner stream

* add cleaner stream handling
2024-09-03 18:36:01 +00:00
rkuo-danswer
af66650ee3 fail safely if lookup for document fails (#2309) 2024-09-03 10:01:17 -07:00
pablodanswer
5b1f3c8d4e Formatting nits (#2311)
* stream in all cases

* update code block

* code formatting nits

* proper ports

* proper ports

* remove unnecessary lines
2024-09-03 16:05:02 +00:00
hagen-danswer
a3b1b1db38 fixed doc set table (#2306) 2024-09-03 15:36:07 +00:00
Weves
7520fae068 Add back test 2024-09-02 18:04:55 -07:00
Weves
39c946536c Fix deletion due to foreign key issue 2024-09-02 17:56:43 -07:00
Yuhong Sun
90528ba195 k 2024-09-02 17:33:33 -07:00
pablodanswer
6afcaafe54 Continue Generating (#2286)
* add stop reason

* add initial propagation

* add continue generating full functionality

* proper continue across chat session

* add new look

* propagate proper types

* fix typing

* cleaner continue generating functionality

* update types

* remove unused imports

* proper infodump

* temp

* add standardized stream handling

* validateing chosen tool args

* properly handle tools

* proper ports

* remove logs + build

* minor typing fix

* fix more minor typing issues

* add stashed reversion for tool call chunks

* ignore model dump types

* remove stop stream

* fix typing
2024-09-02 22:49:56 +00:00
Yuhong Sun
812ca69949 Vespa Degraded Handling (#2304) 2024-09-02 15:53:37 -07:00
rkuo-danswer
abe01144ca Update CONTRIBUTING.md (#2298) 2024-09-02 15:30:18 -07:00
Yuhong Sun
d988a3e736 Productboard Minor Fix (#2303) 2024-09-02 14:46:35 -07:00
pablodanswer
2b14afe878 Add proper typing such that tests pass mypy (#2301)
* add proper typing such that tests pass mypy

* nit (squash)

* minor update
2024-09-02 21:03:53 +00:00
Chris Weaver
033ec0b6b1 Remove unused env variables (#2299) 2024-09-02 20:29:14 +00:00
pablodanswer
14a9fecc64 update code block (#2297) 2024-09-02 13:33:18 -07:00
Weves
0027f161d7 Fix revisions 2024-09-02 11:13:55 -07:00
Yuhong Sun
32e551b69c Vespa Log No Response (#2295) 2024-09-02 09:14:28 -07:00
pablodanswer
299cb5035c Add litellm proxy embeddings (#2291)
* add litellm proxy

* formatting

* move `api_url` to cloud provider + nits

* remove log

* typing

* quick typing fix

* update LiteLLM selection logic

* remove logs + validate functionality

* rename proxy var

* update path casing

* remove pricing for custom models

* functional values
2024-09-02 09:08:35 -07:00
pablodanswer
910821c723 Ordered indexing status (#2292) 2024-09-02 08:39:18 -07:00
hagen-danswer
aa84846298 Connector deletion fix (#2293)
---------

Co-authored-by: Weves <chrisweaver101@gmail.com>
2024-09-01 23:32:20 -07:00
pablodanswer
c122be2f6a More explicit Confluence Connector (#2289) 2024-09-01 20:35:29 -07:00
Weves
f871b4c6eb Update default anthropic / bedrock models 2024-09-01 20:35:00 -07:00
hagen-danswer
a96cea2ce0 logging improvements 2024-09-01 16:21:35 -07:00
hagen-danswer
8d443ada5b Integration tests (#2256)
* initial commit

* almost done

* finished 3 tests

* minor refactor

* built out initial permisison tests

* reworked test_deletion

* removed logging

* all original tests have been converted

* renamed user_groups to user_group

* mypy

* added test for doc set permissions

* unified naming for manager methods

* Refactored models and added new deletion test

* minor additions

* better logging+fixed input variables

* commented out failed tests

* Added readme

* readme update

* Added auth to IT

set auth_type to basic and require_email_verification to false

* Update run-it.yml

* used verify and added to readme

* added api key manager
2024-09-01 22:21:00 +00:00
pablodanswer
634de83d72 Very minor update to divider logic (#2287) 2024-08-31 14:40:15 -07:00
Yuhong Sun
580848cf8c mypy (#2283) 2024-08-30 18:02:18 -07:00
Yuhong Sun
f01027cfb7 Catch LLM Eval Failures (#2272) 2024-08-30 17:42:58 -07:00
pablodanswer
76db4b765a Detect GPU on startup for default multi-pass indexing value (#2242) 2024-08-30 17:38:31 -07:00
pablodanswer
5800c7158e Add typing to pdf extraction (#2280) 2024-08-30 17:16:56 -07:00
Weves
21af852073 Add connector creation docs 2024-08-30 16:43:42 -07:00
hagen-danswer
355326f935 Added frontend logical polish (#2274) 2024-08-30 16:42:54 -07:00
Chris Weaver
762b7b1047 Connector tests (#2273) 2024-08-30 15:48:26 -07:00
pablodanswer
df31cac1f1 allow users to deselect reranking (#2243) 2024-08-30 15:40:54 -07:00
pablodanswer
4181124e7a add metadata to pdf extraction (#2278) 2024-08-30 15:14:02 -07:00
pablodanswer
44c45cbf2a Minor simplification to chat header (#2277) 2024-08-30 15:01:55 -07:00
pablodanswer
f2e8680955 Account for edge case in indexing times with connectors (#2190) 2024-08-30 14:07:07 -07:00
pablodanswer
b952dbef42 Minor search formatting updates (#2276) 2024-08-30 14:02:35 -07:00
pablodanswer
e2f4145cd2 add better spacing (#2265) 2024-08-30 11:56:24 -07:00
pablodanswer
183569061b Minor search UX improvements + Critical connector fixes (#2259) 2024-08-30 11:47:52 -07:00
pablodanswer
8f26728a29 update command keys (#2271) 2024-08-30 10:54:24 -07:00
hagen-danswer
1734a4a18c Added DanswerBot response limit environment variables (#2266)
* Added DanswerBot response limit environment variables

* mypy fix

* changed defaults
2024-08-29 19:25:11 +00:00
rkuo-danswer
766652de14 ignore kombu tables used by celery in alembic (#2261) 2024-08-29 18:49:35 +00:00
pablodanswer
00fa36d591 Get accurate model output max (#2260)
* get accurate model output max

* squash

* updated max default tokens

* rename + use fallbacks

* functional

* remove max tokens

* update naming

* comment out function to prevent mypy issues
2024-08-29 18:01:56 +00:00
pablodanswer
3b596fd6a8 Default rerank API key to None (new Pydantic compatibility) (#2258)
* default to None

* rm
2024-08-28 16:02:06 +00:00
pablodanswer
5a83b00190 change backg (#2255) 2024-08-28 03:20:06 +00:00
Chris Weaver
57491ceaae Lowercase slack channels automatically (#2254)
* Improve slack channel selection

* Lowercasing slack channels
2024-08-28 03:07:26 +00:00
hagen-danswer
e4e67c61ef Some additional curator polish (#2253) 2024-08-28 02:44:24 +00:00
Chris Weaver
8afa53c6bf Confluence improvements (#2248)
* Confluence improvements

* Improve CONFLUENCE_CONNECTOR_INDEX_ONLY_ACTIVE_PAGES
2024-08-28 02:16:10 +00:00
Weves
fb6637d5b3 Fix quality-checks on merge queue 2024-08-27 19:15:53 -07:00
Yuhong Sun
1e67332078 Remove warning on user signup (#2252) 2024-08-27 18:49:05 -07:00
Weves
effce919bd Remove redundant merge queue files 2024-08-27 18:03:43 -07:00
Weves
e5b3843ef8 Add othe checks to merge queue 2024-08-27 17:55:36 -07:00
josvdw
50c17438d5 Litellm bump (#2195)
* ran bump-pydantic

* replace root_validator with model_validator

* mostly working. some alternate assistant error. changed root_validator and typing_extensions

* working generation chat. changed type

* replacing .dict with .model_dump

* argument needed to bring model_dump up to parity with dict()

* fix a few remaining issues -- working with llama and gpt

* updating requirements file

* more requirement updates

* more requirement updates

* fix to make search work

* return type fix:

* halfway types change

* fixes for mypy and pydantic:

* endpoint fix

* fix pydantic protected namespaces

* it works!

* removed unnecessary None initializations

* better logging

* changed default values to empty lists

* mypy fixes

* fixed array defaulting

---------

Co-authored-by: hagen-danswer <hagen@danswer.ai>
2024-08-28 00:00:27 +00:00
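
The bump-pydantic changes listed above (root_validator to model_validator, .dict() to .model_dump()) follow the standard Pydantic v1-to-v2 migration. A sketch on a hypothetical model:

```python
from pydantic import BaseModel, model_validator


class ChatRequest(BaseModel):  # hypothetical model, for illustration only
    query: str
    persona_id: int | None = None

    # Pydantic v1's @root_validator becomes @model_validator in v2.
    @model_validator(mode="after")
    def check_query(self) -> "ChatRequest":
        if not self.query.strip():
            raise ValueError("query must not be empty")
        return self


request = ChatRequest(query="hello")
payload = request.model_dump()  # v1's .dict() becomes .model_dump() in v2
```
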
Yuhong Sun
657d2050a5 Confluence Internal Error Handling (#2247) 2024-08-27 15:23:02 -07:00
Yuhong Sun
3640d0c550 Better Web Connector Logging (#2246) 2024-08-27 15:06:24 -07:00
pablodanswer
336ddbd1fe Filter by user for docset display (#2245)
* filter by user for docset display

* spacing
2024-08-27 21:01:04 +00:00
Chris Weaver
8614cd8934 Handle missing email more gracefully (#2244) 2024-08-27 20:29:25 +00:00
pablodanswer
525f3e01f5 remove constant refresh artifact (#2241) 2024-08-27 17:51:59 +00:00
pablodanswer
feaa85f764 new util for modal edge cases 2024-08-27 09:50:17 -07:00
pablodanswer
b36cd4937f Cleaner + cleaner assistants creation flow etc. (#2232)
* rework assistants creation flow + components

* remove unnecessary padding + validate each page

* remove additional spacing

* rebase + form
2024-08-27 16:01:57 +00:00
pablodanswer
97ba71e1b3 Db search (#2235)
* k

* update enum imports

* add functional types + model swaps

* remove a log

* remove kv

* fully functional + robustified for kv swap

* validated with hosted + cloud

* ensure not updating current search settings when reindexing

* add instance check

* revert back to updating search settings (will need a slight refactor for endpoint)

* protect advanced config override1

* run pretty

* fix typing

* update typing

* remove unnecessary function

* update model name

* clearer interface names

* validated foreign key constraint

* proper migration

* squash

---------

Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2024-08-27 04:26:51 +00:00
pablodanswer
5f12b7ad58 Rebased concurrent chats (#2214)
* refactored for stop / regenerate

* properly reset blank screen

* functional new message carry-over

* robust chat session state persistence

* add env variable

* rebased onto regenerate

* squash

* squash

* squash

* rebase + robustify tool calling

* squash

* alembic

* remove environment variable

* simplify interface

* squash

* minor streaming improvement

* some robustification
2024-08-27 02:57:31 +00:00
Chris Weaver
a873fc6483 Fix Confluence freezing (#2239) 2024-08-26 19:44:01 -07:00
Chris Weaver
c0e1a02e8e Add it on merge queue (#2112)
* Github action to run integration tests

* Improve

* Fix build

* Add pull

* Fix readiness script

* Add IT runner

* Add IT runner

* Add logs

* update

* Fix

* Fix path

* file path

* test

* fix

* fix

* fix

* test

* network

* fix

* cleanup

* fix

* test

* Fix downgrade

* Add OpenAI API key

* Add VESPA_HOST

* test pulling first

* Add API server host

* Cache tweak

* Fix pull/push settings:

* Stop pushing to latest tag

* test cache change

* test

* test

* test

* remove cache temporarily

* Fix

* Enable EE

* test

* Remove duplicate funcs

* add back build

* Update all

* Fix stop cmd

* Add to merge queue

* Cleanup image tag
2024-08-26 07:20:28 +00:00
hagen-danswer
205c3c3fc8 Combined the get document set endpoints (#2234)
* Combined the get document set endpoints

* removed unused function

* fixed permissioning for document sets
2024-08-25 19:02:27 +00:00
Christian Köberl
e5ceb76de8 Fix icons in personas (assistants) - AWS and Azure were mixed up (#2027)
Fixes #2025
2024-08-25 06:36:50 +00:00
hagen-danswer
c21b0ee3f5 Curator polish (#2229)
* add new user provider hook

* account for additional logic

* add users

* remove is loading

* Curator polish

* useeffect -> provider + effect

* squash

* use use user for user default models

* squash

* Added ability to add users to groups among other things

* final polish

* added connection button to groups

* mypy fix

* Improved document set clarity

* string fixes

---------

Co-authored-by: pablodanswer <pablo@danswer.ai>
2024-08-25 01:10:24 +00:00
pablodanswer
1e1b2a0901 add some quick search filter logic (UX) (#2218) 2024-08-24 15:45:46 -07:00
hagen-danswer
c1c35b00cb Fixed slack bot auto filters for document sets (#2231) 2024-08-24 18:54:21 +00:00
pablodanswer
1bc899cc67 Add CSS identifiers to main sections (#2224)
* squash

* add initial ids
2024-08-23 23:59:17 +00:00
pablodanswer
6fc6ee5c37 Update white-labelling to be clearer (advanced settings) (#2228)
* update white labelling to be somewhat clearer

* ensure logotype set to null post submission
2024-08-23 20:47:16 +00:00
Weves
7d201f67d4 Fix typing for custom tool response 2024-08-23 13:34:53 -07:00
pablodanswer
e749fa0f28 Update search tool selection (#2223)
* update search tool selection

* squash
2024-08-23 16:58:39 +00:00
pablodanswer
2e0222d1c1 logotype from toggle -> redirect (#2222) 2024-08-23 16:15:50 +00:00
pablodanswer
c152123ef4 alembic once again (#2221) 2024-08-23 05:28:13 +00:00
Chris Weaver
5cb9c17ddf Add better logging for connectors (#2219)
* Add better logging for connectors

* fix
2024-08-23 03:58:29 +00:00
Chris Weaver
b1302303b2 Add chat_session_id + message_pair_num (#2220) 2024-08-22 20:55:21 -07:00
pablodanswer
e89dc67e5d Update embedding interface (#2205)
* squash

* simplify interface

* some updates to typing

* cloud provider type

* update typing to be even clearer

* push local commit (squash)

* cleaner interfaces

* another quick pass

* squash

* cleaner alembic

* cleaner

* remove trailing whitespace

* add sequence

* quick circle back to double check

* update

* update naming

* update naming
2024-08-23 03:52:02 +00:00
pablodanswer
7da6d33451 slightly updated settings error (#2217)
* update settings issues

* slightly updated settings error
2024-08-23 02:19:00 +00:00
hagen-danswer
c042a19c00 Curator role (#2166)
* Added backend support for curator role

* modal refactor

* finalized first 2 commits

same as before

finally

what was it for

* added credential, cc_pair, and cleanup

mypy is super helpful hahahahahahahahahahahaha

* curator support for personas

* added connector management permission checks

* fixed the connector creation flow

* added document access to curator

* small cleanup added comments and started ui

* groups and assistant editor

* Persona frontend

* Document set frontend

* cleaned up the entire frontend

* alembic fix

* Minor fixes

* credentials section

* some credential updates

* removed logging statements

* fixed try catch

* fixed model name

* made everything happen in one db commit

* Final cleanup

* cleaned up fast code

* mypy/build fixes

* polish

* more token rate limit polish

* fixed weird credential permissions

* Addressed chris feedback

* addressed pablo feedback

* fixed alembic

* removed deduping and caching

* polish!!!!
2024-08-23 01:39:37 +00:00
pablodanswer
5409777e0b add edge case (#2216) 2024-08-22 20:40:18 +00:00
josvdw
5f4b7dd23e clarify what model and provider name should be for custom models (#2215) 2024-08-22 20:04:10 +00:00
Chris Weaver
99db27d989 Add metadata for simple doc (#2212) 2024-08-22 12:30:28 -07:00
pablodanswer
197b62aed1 Regenerate (branch of stop) (#2157)
* add regenerate

* functional once again post rebase but quite ugly

* validated + cleaner UI

* more robust implementation for first messages

* squash

* remove parameter

* proper margin

* clarify for future programmers

* remove some logs

* self nit pick - smoother ux

* more self-nits

* stroke line cap

* rebase
2024-08-22 19:06:44 +00:00
Yuhong Sun
9d5db05e4b Add Migration (#2213) 2024-08-22 10:44:42 -07:00
pablodanswer
27e094d2ec allow graceful 404s (#2211) 2024-08-22 17:00:20 +00:00
Yuhong Sun
1a9e5da7c0 Enable Surrounding Context (#2210) 2024-08-22 09:59:13 -07:00
Chris Weaver
8afcb03f3c Fix OIDC expiry issues (#2206)
* Fix oidc expiry issues

* fix

* fix
2024-08-22 03:15:17 +00:00
Chris Weaver
9bf42d2303 Fix connectors running while deleting (#2204)
* Fix connectors running while deleting

* fix
2024-08-22 02:18:01 +00:00
rkuo-danswer
e50b558b5b prevent usage of combinedSettings if endpoints fail (which none of them should) (#2201) 2024-08-22 01:27:38 +00:00
Chris Weaver
020dff52f7 Remove settings cache (#2203) 2024-08-21 17:55:23 -07:00
pablodanswer
13303edf29 Jira email optional + PAT (#2198)
* make jira email optional

* remove logs

* remove more logs

* change wording from PAT -> Personal Access Token

* ensure name fits in default width
2024-08-21 22:59:08 +00:00
rkuo-danswer
584eae17e3 fix message param to use query instead of rephrased query (#2199) 2024-08-21 18:00:55 +00:00
rkuo-danswer
b9b633bb74 support indexing attachments as separate docs when not part of a page (#2194)
* support indexing attachments as separate docs when not part of a page

* fix time filter, fix batch handling, fix returned number of attachments processed
2024-08-21 17:15:13 +00:00
Yuhong Sun
bb1916d5d0 Warm Up Models Prep (#2196) 2024-08-20 20:53:02 -07:00
pablodanswer
048cb8dd55 update alembic version (for rebase) (#2193) 2024-08-21 02:58:22 +00:00
Yuhong Sun
3b035d791e Fix Model Server (#2191) 2024-08-20 17:57:09 -07:00
pablodanswer
53387ab3eb Simplify index and model name swap logic (#2188) 2024-08-20 17:31:00 -07:00
Yuhong Sun
ec6e2369a1 Log YQL (#2189) 2024-08-20 17:03:57 -07:00
hagen-danswer
075eacdd91 added collection and collection type to Guru metadata (#2187)
* added collection and collection type to metadata

* removed collection type
2024-08-20 23:29:40 +00:00
hagen-danswer
f77b1ebd87 Updated pruning defaults (#2186)
* Updated pruning defaults

* changed minutes to days
2024-08-20 23:29:19 +00:00
rkuo-danswer
1ddb4b2025 normalize emails on bulk invite, normalize/lowercase emails on invite… (#2184)
* normalize emails on bulk invite, normalize/lowercase emails on invite matching

* fix validate_email import
2024-08-20 22:15:42 +00:00
Yuhong Sun
42f0fea9f8 Fix Assistant vs Persona (#2185) 2024-08-20 14:43:15 -07:00
Yuhong Sun
8de04acb7f k 2024-08-20 14:06:49 -07:00
pablodanswer
5053f4e383 Add granularity to filter widths (#2183) 2024-08-20 13:39:08 -07:00
Chris Weaver
730a757090 Disable oidc_expiry by default (#2182) 2024-08-20 13:24:58 -07:00
pablodanswer
006cfa1d3d fix text selection + closing modal 2024-08-20 13:15:17 -07:00
pablodanswer
69f6b7d148 Update SSE handling to accommodate slow networks (#2180) 2024-08-20 12:57:17 -07:00
pablodanswer
53a3fb8e52 Scrollable user model (#2177) 2024-08-20 12:25:06 -07:00
pablodanswer
919110a655 Untoggle sidebar fully on untoggling (#2179)
* add explicit untoggle

* add to all history sidebars

* add back commented out line

* add comment
2024-08-20 19:19:17 +00:00
pablodanswer
19cccd267d show full stack trace 2024-08-20 11:45:14 -07:00
pablodanswer
71c2b16a01 Pull out stripping of model suffix (#2175) 2024-08-20 11:32:03 -07:00
Yuhong Sun
12f0dbcfc5 Background Container Logs (#2176) 2024-08-20 11:26:45 -07:00
rkuo-danswer
583bd1d207 add kombu message cleanup task (#2172)
* add kombu message cleanup task

* added some logging if we find an associated task (since tasks shouldn't be around for longer than 7 days)
2024-08-20 05:15:44 +00:00
pablodanswer
8a4e47781b remove history sidebar on mouse exiting window (#2173) 2024-08-19 23:15:54 +00:00
Chris Weaver
af647959f6 Performance Improvements (#2162) 2024-08-19 11:07:00 -07:00
pablodanswer
ea53977617 prevent empty doc link click (#2170) 2024-08-19 18:03:36 +00:00
Weves
c44c22a009 Fix model server 2024-08-19 07:23:24 -07:00
Yuhong Sun
5ab4d94d94 Logging Level Update (#2165) 2024-08-18 21:53:40 -07:00
Yuhong Sun
119aefba88 Add log files to containers (#2164) 2024-08-18 19:18:28 -07:00
pablodanswer
12fccfeffd Add stop generating functionality (#2100)
* functional types + sidebar

* remove commits

* remove logs

* functional rework of temporary user/assistant ID

* robustify switching

* remove logs

* typing

* robustify frontend handling

* cleaner loop + data persistence

* migrate to streaming response

* formatting

* add new loading state to prevent collisions

* add `ChatState` for more robust handling

* remove logs

* robustify typing

* unnecessary list removed

* robustify

* remove log

* remove false comment

* slightly more robust chat state

* update utility + copy

* improve clarity + new SSE handling utility function

* remove comments

* clearer

* add back stack trace detail

* cleaner messages

* clean final message handling

* tiny formatting (remove newline)

* add synchronous wrapper to avoid hampering main event loop

* update typing

* include logs

* slightly more specific logs

* add `critical` error just in case
2024-08-18 22:15:55 +00:00
Yuhong Sun
8a7bc4e411 Log Level Default (#2163) 2024-08-18 14:35:32 -07:00
rkuo-danswer
492797c9f3 Feature/indexing errors (#2148)
* backend changes to handle partial completion of index attempts

* typo fix

* Display partial success in UI

* make log timing more readable by limiting printed precision to milliseconds

* forgot alembic

* initial cut at "completed with errors" indexing

* remove and reorganize unused imports

* show view errors while indexing is in progress

* code review fixes
2024-08-18 19:14:32 +00:00
Yuhong Sun
739058aacc Logging updates (#2159) 2024-08-17 22:05:09 -07:00
Chris Weaver
17570038bb Add PG query logging (#2156) 2024-08-16 21:53:54 -07:00
Yuhong Sun
c0edfb50df k 2024-08-16 21:43:14 -07:00
pablodanswer
22573aba2a Improve Search (#2105) 2024-08-16 21:29:15 -07:00
Chris Weaver
efae24acd0 improve model seeding (#2155) 2024-08-17 01:30:13 +00:00
pablodanswer
f8e0e6f015 Extremely robustified Index Attempt migration (#2151)
* account for connector_id edge case

* robustified
2024-08-17 01:12:18 +00:00
pablodanswer
3cbc341b60 Enable persistence / removal of assistant icons + remove accidental regression (#2153)
* enable persistence / removal of assistant icons + remove accidental regression

* simpler env seeding for web building
2024-08-17 01:11:04 +00:00
pablodanswer
46c7089328 Enable seeding of analytics via file path (#2146)
* enable seeding of analytics via file path

* remove log
2024-08-16 03:14:56 +00:00
pablodanswer
3ffbe659e3 add handling for poorly formatting model names (#2143) 2024-08-15 22:01:57 +00:00
pablodanswer
33fed955d9 Add verbose error messages + robustify assistant switching (#2144)
* add verbose error messages + robustify assistant switching and chat sessions

* fix typing

* cleaner errors + add stack trace
2024-08-15 21:05:04 +00:00
rkuo-danswer
9fa4280f96 add configurable support for memory tracing during indexing (#2140) 2024-08-15 20:40:17 +00:00
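Python's standard-library `tracemalloc` is the natural tool for configurable memory tracing during indexing. A sketch of the idea, with the env-var name and wrapper assumed rather than taken from the PR:

```python
import os
import tracemalloc

# assumed flag name: gate tracing so normal runs pay no overhead
TRACE_INDEXING_MEMORY = os.environ.get("TRACE_INDEXING_MEMORY", "").lower() == "true"


def run_with_memory_trace(fn, *args, **kwargs):
    if not TRACE_INDEXING_MEMORY:
        return fn(*args, **kwargs)

    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    try:
        return fn(*args, **kwargs)
    finally:
        after = tracemalloc.take_snapshot()
        # top 10 allocation deltas by source line
        for stat in after.compare_to(before, "lineno")[:10]:
            print(stat)
        tracemalloc.stop()
```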
Yuhong Sun
4d194bc86a Cohere No Large Chunks (#2145) 2024-08-15 10:18:54 -07:00
Weves
0853d1a8f1 Update force deletion script 2024-08-14 23:29:26 -07:00
Weves
f6547a64a0 More logging for SAML endpoints 2024-08-14 23:25:42 -07:00
hagen-danswer
61b5bd569b Reworked chunking to support mega chunks (#2032) 2024-08-14 22:18:53 -07:00
pablodanswer
680388537b UX clarity + minor new features (#2136) 2024-08-14 15:23:36 -07:00
pablodanswer
d9bcacfae7 validate messages (#2139) 2024-08-14 22:06:48 +00:00
hagen-danswer
2ab192933b Added import statement to fix typescript error (#2138) 2024-08-14 20:10:08 +00:00
Yuhong Sun
1c10f54294 GPU Model Server (#2135) 2024-08-14 11:04:28 -07:00
josvdw
0530f4283e updating readme for widget (#2132)
Co-authored-by: Jos Van der westhuizen <jos@danser.ai>
2024-08-14 16:55:59 +00:00
pablodanswer
3540aa579b Add ux improvements (#2130)
* add ux improvements

* add danswer version display

* show version properly

* improve copy + add web version to settings context

* update copy + danswer version
2024-08-14 16:43:52 +00:00
josvdw
54732a83c9 stopgap: clarify text on standard answer page for improved UX (#2122)
* stopgap: clarify text on standard answer page for improved UX

* replace apostrophe

* using tailwind:

---------

Co-authored-by: Jos Van der westhuizen <jos@danser.ai>
2024-08-14 01:28:49 +00:00
pablodanswer
5e6365c449 Minor update to clarify user adding (#2126)
* minor update to clarify user adding

* Update page.tsx

* run pretty
2024-08-13 21:09:51 +00:00
rkuo-danswer
20369fc451 Refactor/default indexing embedder (#2073)
* refactor embedding model instantiation

* remove unused UNCERTAINTY_PAT constant

* typo fixes

* fix mypy typing issues

* more typing fixes

* log attempt.id on dispatch

* unnecessary check removed after fixing type
2024-08-13 21:01:34 +00:00
rkuo-danswer
f15d6d2b59 allow admin role api keys (#2124)
* allow admin role api keys

* bump to rerun deployment

* types needs explicit export now for APIKey

* remove api_key.role, use User.role instead

* fix formatting

* formatting

* formatting

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-08-13 21:00:57 +00:00
pablodanswer
5dda047999 Always show search filters (#2128) 2024-08-13 13:36:46 -07:00
pablodanswer
ffd9b0180b Fix overflow for quotes in search section (#2123)
* fix overflow for quotes in search section

* proper overflow check
2024-08-13 20:32:11 +00:00
Yuhong Sun
5ad54fec87 Inference to handle no link docs (#2129) 2024-08-13 12:40:11 -07:00
hagen-danswer
d636181aa5 Added catch for empty link (#2037) 2024-08-12 20:08:56 -07:00
pablodanswer
e12ed7750a Add scrollbar to search / chat (#2121)
* add scrollbar to search / chat

* show overflow for lists
2024-08-13 03:07:37 +00:00
hagen-danswer
bbb8c5ff0b Speed up docker launch (#2099)
* use move instead of copy

* added logging

* fix overwrites

* tested thoroughly

* fixes

* clearer commenting
2024-08-13 00:45:05 +00:00
pablodanswer
83e945ba57 add cleaner / consolidate no docs found message (#2119) 2024-08-12 16:04:59 -07:00
rkuo-danswer
26df869b91 Feature/harden memory limits (#2118)
* log warning in indexer when size exceeds INDEXING_SIZE_WARNING_THRESHOLD

* add configurable attachment size limit for confluence

* specify "attachments"
2024-08-12 15:12:34 -07:00
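A configurable attachment size limit is just a guard before download/indexing. A sketch under assumed names (the PR's actual env var and default may differ):

```python
import os

# assumed name and 10 MB default, for illustration only
CONFLUENCE_ATTACHMENT_SIZE_LIMIT = int(
    os.environ.get("CONFLUENCE_ATTACHMENT_SIZE_LIMIT", 10 * 1024 * 1024)
)


def should_index_attachment(name: str, size_bytes: int) -> bool:
    """Skip oversized Confluence attachments instead of ballooning memory."""
    if size_bytes > CONFLUENCE_ATTACHMENT_SIZE_LIMIT:
        print(f"Skipping attachment {name!r}: {size_bytes} bytes exceeds limit")
        return False
    return True
```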
Weves
1a4df1d65e Remove unnecessary LLM settings 2024-08-12 11:33:49 -07:00
Chris Weaver
0a165aae0b Slack improvements (#2113) 2024-08-11 21:27:37 -07:00
rkuo-danswer
e517f47a89 add send-message-simple-with-history endpoint to avoid… (#2101)
* add send-message-simple-with-history endpoint to support ramp. avoids bad json output in models and allows client to pass history in instead of maintaining it in our own session

* slightly better error checking

* addressing code review

* reject on any empty message

* update test naming
2024-08-12 03:33:52 +00:00
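The point of the endpoint is that the client owns the history, so the server stays stateless. A sketch of what the request body plausibly looks like (field names are assumptions, not the endpoint's actual schema):

```python
from pydantic import BaseModel


class ThreadMessage(BaseModel):
    message: str
    role: str  # "user" or "assistant"


class SimpleChatWithHistoryRequest(BaseModel):
    """Client passes the entire conversation; no server-side session needed."""
    messages: list[ThreadMessage]  # full history, oldest first
    persona_id: int
```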
Nathan Schwerdfeger
c7e5b11c63 EE Connector Deletion Bugfix + Refactor (#2042)
---------

Co-authored-by: Weves <chrisweaver101@gmail.com>
2024-08-11 20:33:07 -07:00
Yuhong Sun
79523f2e0a Warm up reranker (#2111) 2024-08-11 15:20:51 -07:00
pablodanswer
7fae66b766 provider type default to none (#2110) 2024-08-11 14:51:12 -07:00
Yuhong Sun
386b229ed3 Cohere Rerank (#2109) 2024-08-11 14:22:42 -07:00
Yuhong Sun
ce666f3320 Propagate Embedding Enum (#2108) 2024-08-11 12:17:54 -07:00
Yuhong Sun
d60fb15ad3 Allowing users to set Search Settings (#2106) 2024-08-10 20:48:58 -07:00
pablodanswer
7358ece008 enable assistant editing 2024-08-10 14:38:34 -07:00
josvdw
9c5d33e198 open chat document links in a new tab instead of overriding danswer (#2090)
Co-authored-by: Jos Van der westhuizen <jos@danser.ai>
2024-08-10 21:37:59 +00:00
pablodanswer
7d5cfd2fa3 Add user specific model defaults (#2043) 2024-08-10 14:37:33 -07:00
Yuhong Sun
a4caf66a35 User Notification Backend (#2104) 2024-08-10 11:39:21 -07:00
pablodanswer
0a8d44b44c quote processing for lengthy intros (#2103) 2024-08-10 11:09:45 -07:00
pablodanswer
cc8a6da8e3 improve llm-generated citations (account for edge case) (#2096)
* improve llm-generated citations (account for edge case)

* additional test case
2024-08-10 02:06:39 +00:00
pablodanswer
54d4526b73 (Minor) Add cleaner search, feedback model, and connector view (#2098)
* add cleaner search, feedback model, and connector view

* Update ChatPage.tsx
2024-08-10 01:54:31 +00:00
Yuhong Sun
c8ead6a0dc Need Reindexing Flag Setup (#2102) 2024-08-09 17:44:57 -07:00
pablodanswer
7bfa99766d Add support for google slides (#2083)
* add support for google slides

* remove log + account for dead code

* squash
2024-08-09 17:12:51 +00:00
hagen-danswer
b230082891 Openai encoding temp hotfix (#2094) 2024-08-09 08:17:31 -07:00
Yuhong Sun
8cd1eda8b1 Rework Rerankers (#2093) 2024-08-08 21:33:49 -07:00
Yuhong Sun
7dcc42aa95 Intent Model Update (#2069) 2024-08-08 20:45:53 -07:00
pablodanswer
e59d1a0294 fix edge case with simpler code block + python formatting (#2092) 2024-08-08 20:44:32 -07:00
pablodanswer
384e61f4b0 add new gpt-4o model 2024-08-08 16:32:57 -07:00
pablodanswer
f28b930475 Image -> img (#2087) 2024-08-08 21:46:42 +00:00
pablodanswer
1d989f5343 Fix model override for persisting default assistant (#2081)
* fix model override for persisting default assistant

* run pretty

* don't modify

* Update ChatPage.tsx
2024-08-08 21:22:19 +00:00
pablodanswer
c1e3a1b3e7 Select proper assistant override (#2068)
* encode images properly

* proper assistant default model updates

* remove now unneeded image encoding update

* update naming of persona llm option gathering
2024-08-08 21:02:11 +00:00
rkuo-danswer
be9ed319d5 add unit test for quotes (#2085)
* add unit test for quotes

* test answer and quotes together
2024-08-08 18:20:07 +00:00
pablodanswer
c630fcffee Improve code block formatting (#2084)
* initial update to styling

* fix chat input bar padding

* improve color choices
2024-08-08 17:12:35 +00:00
josvdw
f411b9cb55 quality of life improvements for the launch.json template (#2082)
Co-authored-by: Jos Van der westhuizen <jos@danser.ai>
2024-08-08 06:39:30 +00:00
Richard Kuo (Danswer)
bdaaebe955 use re.search instead of re.match (which searches from start of string only) 2024-08-07 20:55:18 -07:00
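The distinction is easy to show: `re.match` anchors at position 0, while `re.search` scans the whole string.

```python
import re

text = "answer: 42"

print(re.match(r"\d+", text))   # None: match() only tries the start of the string
print(re.search(r"\d+", text))  # <re.Match object; span=(8, 10), match='42'>
```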
pablodanswer
9eb48ca2c3 account for empty links + fix quote processing 2024-08-07 20:55:18 -07:00
rkuo-danswer
509fa3a994 add postgres configuration (#2076) 2024-08-08 00:13:59 +00:00
pablodanswer
5097c7f284 Handle saved search docs in eval flow (#2075) 2024-08-07 16:18:34 -07:00
pablodanswer
c4e1c62c00 Admin UX updates (#2057) 2024-08-07 14:55:16 -07:00
pablodanswer
eab82782ca Add proper delay for assistant switching (#2070)
* add proper delay for assistant switching

* persist input if possible
2024-08-07 14:46:15 -07:00
pablodanswer
53d976234a proper new chat button redirects (#2074) 2024-08-07 14:44:42 -07:00
pablodanswer
44d8e34b5a Improve seeding (includes all enterprise features) (#2065) 2024-08-07 10:44:33 -07:00
pablodanswer
d2e16a599d Improve shared chat page (#2066)
* improve look of shared chat page

* remove log

* cleaner display

* add initializing loader to shared chat page

* updated danswer loaders (for prism)

* remove default share
2024-08-07 16:13:55 +00:00
pablodanswer
291e6c4198 somewhat clearer API errors (#2064) 2024-08-07 03:04:26 +00:00
Chris Weaver
bb7e1d6e55 Add integration tests for document set syncing (#1904) 2024-08-06 18:00:19 -07:00
rkuo-danswer
fcc4c30ead don't skip the start of the json answer value (#2067) 2024-08-06 23:59:13 +00:00
pablodanswer
f20984ea1d Don't persist error perennially (#2061)
* don't persist error perennially

* proper functionality

* remove logs

* remove another log

* add comments for clarity + reverse conditional

* add comment back

* remove comment
2024-08-06 23:09:25 +00:00
pablodanswer
e0f0cfd92e Ensure relevance functions for selected docs (#2063)
* ensure relevance functions for selected docs

* remove logs

* remove log
2024-08-06 21:06:44 +00:00
pablodanswer
57aec7d02a doc sidebar width fix 2024-08-06 13:48:47 -07:00
pablodanswer
6350219143 Add proper default temperature + overrides (#2059)
* add proper default temperature + overrides

* remove unclear comment

* amend defaults + include internet search
2024-08-06 19:57:14 +00:00
pablodanswer
3bc2cf9946 update tool display bubbles to have cursor-default 2024-08-06 12:49:42 -07:00
pablodanswer
7f7452dc98 Whitelabelling consistency (#2058)
* add white labelling to admin sidebar

* even more consistency
2024-08-06 19:45:38 +00:00
pablodanswer
dc2a50034d Clean chat banner (#2056)
* fully functional

* formatting

* ensure consistency with large logos

* ensure mobile support
2024-08-06 19:44:14 +00:00
pablodanswer
ab564a9ec8 Add cleaner loading / streaming for image loading (#2055)
* add image loading

* clean

* add loading skeleton

* clean up

* clearer comments
2024-08-06 19:28:48 +00:00
rkuo-danswer
cc3856ef6d enforce index attempt deduping on secondary indexing. (#2054)
* enforce index attempt deduping on secondary indexing.

* black fix

* typo fixes

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-08-06 17:45:16 +00:00
Yuhong Sun
a8a4ad9546 Chunk Filter Metadata Format (#2053) 2024-08-05 15:12:36 -07:00
pablodanswer
5bfdecacad fix assistant drag transform effect (#2052) 2024-08-05 14:53:38 -07:00
pablodanswer
0bde66a888 remove "quotes" section (#2049) 2024-08-05 18:51:43 +00:00
pablodanswer
5825d01d53 Better assistant interactions + UI (#2029)
* add assistant re-ordering, selections, etc.

* squash

* remove unnecessary comment

* squash

* adapt dragging for all IDs + smoother animation + consistency

* fix minor typing issue

* fix minor typing issue

* remove logs
2024-08-05 18:22:57 +00:00
pablodanswer
cd22cca4e8 remove non-EE public connector options 2024-08-05 11:14:20 -07:00
pablodanswer
a3ea217f40 ensure consistency of answers + update llm relevance prompting (#2045) 2024-08-05 08:27:15 -07:00
pablodanswer
66e4dded91 Add properly random icons to assistant creation page (#2044) 2024-08-04 23:30:17 -07:00
pablodanswer
6d67d472cd Add answers to search (#2020) 2024-08-04 23:02:55 -07:00
Weves
76b7792e69 Harden embedding calls 2024-08-04 15:11:45 -07:00
Chris Weaver
9d7100a287 Fix secondary index attempts showing up as the primary index status + scheduling while in-progress (#2039) 2024-08-04 13:29:44 -07:00
pablodanswer
876feecd6f Fix code pasting formatting (#2033)
* fix pasting formatting

* add back small comments
2024-08-04 09:56:48 -07:00
pablodanswer
0261d689dc Various Admin Page + User Flow Improvements (#1987) 2024-08-03 18:09:46 -07:00
pablodanswer
aa4a00cbc2 fix minor html error (#2034) 2024-08-03 12:40:07 -07:00
Nathan Schwerdfeger
52c505c210 Remove partially implemented reply cancellation (#2031)
* fix: remove partially implemented response cancellation

* feat: notify user when unsupported chat cancellation is requested

* fix: correct ChatInputBar streaming detection logic
2024-08-03 18:12:04 +00:00
pablodanswer
ed455394fc detect foreign key composition sessions (#2024) 2024-08-02 17:26:57 +00:00
hagen-danswer
57cc53ab94 Added content tags to zendesk connector (#2017) 2024-08-02 10:09:53 -07:00
rkuo-danswer
6a61331cba Feature/log despam (#2022)
* move a lot of log spam to debug level. Consolidate some info level logging

* reformat more indexing logging
2024-08-02 15:28:53 +00:00
Weves
51731ad0dd Fix issue where large docs/batches break openai embedding 2024-08-02 01:07:09 -07:00
rkuo-danswer
f280586e68 pass function to Process correctly instead of running it inline (#2018)
* pass function to Process correctly instead of running it inline

* mypy fixes and pass back return result (even tho we don't use it right now)
2024-08-02 00:06:35 +00:00
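This is the classic `multiprocessing` footgun: `Process(target=fn())` calls `fn` inline in the parent and hands `Process` the return value, instead of passing the callable. A minimal illustration:

```python
from multiprocessing import Process


def run_job(job_id: int) -> None:
    print(f"running job {job_id}")


if __name__ == "__main__":
    # BUG: runs run_job(1) inline in the parent, then Process gets target=None
    # p = Process(target=run_job(1))

    # Fix: pass the callable and its arguments separately
    p = Process(target=run_job, args=(1,))
    p.start()
    p.join()
```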
hagen-danswer
e31d6be4ce Switched build to use a larger runner (#2019) 2024-08-01 14:29:45 -07:00
hagen-danswer
e6a92aa936 support confluence single page only indexing (#2008)
* added index recursively checkbox

* mypy fixes

* added migration to not break existing connectors
2024-08-01 20:32:46 +00:00
pablodanswer
a54ea9f9fa Fix cartesian issue with index attempts (#2015) 2024-08-01 10:25:25 -07:00
Yuhong Sun
73a92c046d Fix chunker (#2014) 2024-08-01 10:18:02 -07:00
pablodanswer
459bd46846 Add Prompt library (#1990) 2024-08-01 08:40:35 -07:00
Chris Weaver
445f7e70ba Fix image generation (#2009) 2024-08-01 00:27:02 -07:00
Yuhong Sun
ca893f9918 Rerank Handle Null (#2010) 2024-07-31 22:59:02 -07:00
hagen-danswer
1be1959d80 Changed default local model to nomic (#1943) 2024-07-31 18:54:02 -07:00
Chris Weaver
1654378850 Fix user dropdown font (#2007) 2024-08-01 00:29:14 +00:00
Chris Weaver
d6d391d244 Fix not_applicable (#2003) 2024-07-31 21:30:07 +00:00
rkuo-danswer
7c283b090d Feature/postgres connection names (#1998)
* avoid reindexing secondary indexes after they succeed

* use postgres application names to facilitate connection debugging

* centralize all postgres application_name constants in the constants file

* missed a couple of files

* mypy fixes

* update dev background script
2024-07-31 20:36:30 +00:00
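With psycopg2, the application name is a plain connection parameter, so SQLAlchemy can set it via `connect_args`. A sketch (the constant name and URL are placeholders):

```python
from sqlalchemy import create_engine

POSTGRES_APP_NAME = "danswer_background"  # placeholder constant name

engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost:5432/danswer",  # placeholder URL
    # appears in pg_stat_activity.application_name for each connection
    connect_args={"application_name": POSTGRES_APP_NAME},
)
```

After that, `SELECT application_name, count(*) FROM pg_stat_activity GROUP BY 1;` shows at a glance which component holds which connections.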
pablodanswer
40226678af Add proper default values for assistant editing / creation (#2001) 2024-07-31 13:34:42 -07:00
rkuo-danswer
288e6fa606 Bugfix/pg connections (#2002)
* increase max_connections to 150 in all docker files

* lower celery worker concurrency to 6
2024-07-31 19:49:20 +00:00
hagen-danswer
5307d38472 Fixed tokenizer logic (#1986) 2024-07-31 09:59:45 -07:00
Yuhong Sun
d619602a6f Skip shortcut docs (#1999) 2024-07-31 09:51:01 -07:00
Yuhong Sun
348a2176f0 Fix Dropped Documents (#1997) 2024-07-31 09:33:36 -07:00
pablodanswer
89b6da36a6 process files with null title (#1989) 2024-07-31 08:18:50 -07:00
Yuhong Sun
036d5c737e No Null Embeddings (#1982) 2024-07-30 19:54:49 -07:00
pablodanswer
60a87d9472 Add back modals on chat page (#1983) 2024-07-30 17:42:59 -07:00
pablodanswer
eb9bb56829 Add initial mobile support (#1962) 2024-07-30 17:13:50 -07:00
hagen-danswer
d151082871 Moved warmup_encoders into scope (#1978) 2024-07-30 16:37:32 +00:00
pablodanswer
e4b1f5b963 fix index attempt migration where no credential ID 2024-07-30 08:57:57 -07:00
hagen-danswer
3938a053aa Rework tokenizer (#1957) 2024-07-29 23:01:49 -07:00
pablodanswer
7932e764d6 Make chat page layout cleaner + fix updating assistant images (#1973)
* ux updates for clarity
- [x] 'folders' -> 'chat folders'
- [x] sidebar to bottom left and smaller
- [x] Sidebar -> smaller logo
- [x] Align things properly
- [x] Explicit Pin: immediate + "Pin / Unpin"
- [x] Logo size smaller
- [x] Align things properly
- [x] Optionally fix gradient in sidebar
- [x] Upload logo to existing assistants

* remove unneeded logs

* run pretty

* actually run pretty!

* fix web file type

* fix very minor typo

* clean type for buildPersonaAPIBody

* fix span formatting

* HUGE ui change
2024-07-30 03:44:35 +00:00
Chris Weaver
fb6695a983 Fix flow where oidc_expiry is different from token expiry (#1974) 2024-07-30 03:17:08 +00:00
rkuo-danswer
015f415b71 avoid reindexing secondary indexes after they succeed (#1971) 2024-07-30 03:12:58 +00:00
rkuo-danswer
96b582070b authorized users and groups only have read access (#1960)
* authorized users and groups only have read access

* slightly better variable naming
2024-07-29 19:53:42 +00:00
rkuo-danswer
4a0a927a64 fix removed parameter in MediaWikiConnector (#1970) 2024-07-29 18:47:30 +00:00
hagen-danswer
ea9a9cb553 Fix typing for previous message 2024-07-29 10:01:38 -07:00
pablodanswer
38af12ab97 remove unnecessary index drop (#1968) 2024-07-29 09:51:53 -07:00
hagen-danswer
1b3154188d Fixed default indexing frequency (#1965)
* Fixed default indexing frequency

* fixed more defaults
2024-07-29 08:14:49 -07:00
Weves
1f321826ad Bigger images 2024-07-28 23:47:06 -07:00
Weves
cbfbe4e5d8 Fix image generation follow up q 2024-07-28 23:47:06 -07:00
pablodanswer
3aa0e0124b Add new admin page (#1947)
* add admin page

* credential + typing fix

* rebase fix

* on add, cleaner buttons

* functional G + Drive

* organized auth sections

* update types and remove logs

* ccs -> connectors

* validated formik

* update styling + connector-handling logic

* update colors

* separate out hooks + util functions

* update to adhere to rest standards

* remove "todos"

* rebase

* copy + formatting + sidebar

* update statuses + configuration possibilities

* update interfaces to be clearer

* update indexing status page

* formatting

* address backend security + comments

* update font

* fix form routing

* fix hydration error

* add statuses, fix bugs, etc. (squash)

* fix color (squash)

* squash

* add functionality to sidebar

* disable buttons if deleting

* add color

* minor copy + formatting updates
- on modify credential, close
- update copy for deletion of connectors

* fix build error

* copy

---------

Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2024-07-28 20:57:43 -07:00
Yuhong Sun
f2f60c9cc0 Fix EE Import backoff Logic (#1959) 2024-07-27 11:06:11 -07:00
Emerson Gomes
6c32821ad4 Allow removal of max_output_tokens by setting GEN_AI_MAX_OUTPUT_TOKENS=0 (#1958)
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
2024-07-27 09:07:29 -07:00
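The 0-means-unset convention reduces to mapping the env var to `None` and omitting the parameter from the completion call. A sketch of the idea:

```python
import os

_raw = int(os.environ.get("GEN_AI_MAX_OUTPUT_TOKENS", "0"))
# 0 (or unset) means "no explicit cap"; let the model/provider default apply
GEN_AI_MAX_OUTPUT_TOKENS: int | None = _raw if _raw > 0 else None

completion_kwargs: dict = {}
if GEN_AI_MAX_OUTPUT_TOKENS is not None:
    completion_kwargs["max_tokens"] = GEN_AI_MAX_OUTPUT_TOKENS
```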
Weves
d839595330 Add query override 2024-07-26 17:40:21 -07:00
Yuhong Sun
e422f96dff Pull Request Template (#1956) 2024-07-26 17:34:05 -07:00
Weves
d28f460330 Fix black 2024-07-26 16:43:15 -07:00
Eugene Astroner
8e441d975d Issue fix 2024-07-26 16:40:31 -07:00
pablodanswer
5c78af1f07 Deduplicate model names (#1950) 2024-07-26 16:30:49 -07:00
rkuo-danswer
e325e063ed Bugfix/persona access (#1951)
* also allow access to a persona if the user is in the list of authorized users or groups

* add comment on potential performance improvements

* work around for mypy typing
2024-07-26 22:05:57 +00:00
pablodanswer
c81b45300b Configurable models + updated assistants bar (#1942) 2024-07-26 11:00:49 -07:00
pablodanswer
26a1e963d1 Update personas.yaml (#1948) 2024-07-25 20:35:49 -07:00
pablodanswer
2a983263c7 Small update- Danswer update icons as well (#1945) 2024-07-25 20:31:41 -07:00
Yuhong Sun
2a37c95a5e Types for Migrations (#1944) 2024-07-25 18:18:48 -07:00
pablodanswer
c277a74f82 Add icons to assistants! (#1930) 2024-07-25 18:02:39 -07:00
rkuo-danswer
e4b31cd0d9 allow setting secondary worker count via environment variable. default to primary worker count if unset. (#1941) 2024-07-25 20:25:43 +00:00
hagen-danswer
a40d2a1e2e Change the way we get sqlalchemy session (#1940)
* changed default fast model to gpt-4o-mini

* Changed the way we get the sqlalchemy session
2024-07-25 18:36:14 +00:00
hagen-danswer
c9fb99d719 changed default fast model to gpt-4o-mini (#1939) 2024-07-25 10:50:02 -07:00
hagen-danswer
a4d71e08aa Added check for unknown tool names (#1924)
* answer.py

* Let it continue if broken
2024-07-25 00:19:08 +00:00
rkuo-danswer
546bfbd24b autoscale with pool=thread crashes celery. remove and use concurrency… (#1929)
* autoscale with pool=thread crashes celery. remove and use concurrency instead (to be improved later)

* update dev background script as well
2024-07-25 00:15:27 +00:00
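In Celery settings form, that change amounts to dropping autoscale and pinning a fixed worker count. A sketch, with the app name and count illustrative:

```python
from celery import Celery

app = Celery("background")  # placeholder app name

# fixed concurrency instead of --autoscale, which crashed with the threads pool
app.conf.worker_pool = "threads"
app.conf.worker_concurrency = 6
```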
hagen-danswer
27824d6cc6 Fixed login issue (#1920)
* included check for existing emails

* cleaned up logic
2024-07-25 00:03:29 +00:00
Weves
9d5c4ad634 Small fix for non tool calling LLMs 2024-07-24 15:41:43 -07:00
Shukant Pal
9b32003816 Handle SSL error tracebacks in site indexing connector (#1911)
My website (https://shukantpal.com) uses Let's Encrypt certificates, which aren't accepted by the Python urllib certificate verifier for some reason. My website is set up correctly otherwise (https://www.sslshopper.com/ssl-checker.html#hostname=www.shukantpal.com)

This change adds a fix so the correct traceback is shown in Danswer, instead of a generic "unable to connect, check your Internet connection".
2024-07-24 22:36:29 +00:00
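For certificate failures, `urlopen` raises a `URLError` whose `reason` is the underlying `ssl.SSLError`, so surfacing the real cause is a matter of inspecting `reason` instead of collapsing everything into one generic message. A sketch of the pattern (not the connector's actual code):

```python
import ssl
import urllib.error
import urllib.request


def fetch_page(url: str) -> bytes:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    except urllib.error.URLError as e:
        if isinstance(e.reason, ssl.SSLError):
            # show the real certificate problem, not a generic
            # "unable to connect, check your Internet connection"
            raise RuntimeError(f"SSL verification failed for {url}: {e.reason}") from e
        raise
```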
pablodanswer
8bc4123ed7 add modern health check banner + expiration tracking (#1730)
---------

Co-authored-by: Weves <chrisweaver101@gmail.com>
2024-07-24 15:34:22 -07:00
pablodanswer
d58aaf7a59 add href 2024-07-24 14:33:56 -07:00
pablodanswer
a0056a1b3c add files (images) (#1926) 2024-07-24 21:26:01 +00:00
pablodanswer
d2584c773a slightly clearer description of model settings in assistants creation tab (#1925) 2024-07-24 21:25:30 +00:00
pablodanswer
807bef8ada Add environment variable for defaulted sidebar toggling (#1923)
* add env variable for defaulted sidebar toggling

* formatting

* update naming
2024-07-24 21:23:37 +00:00
rkuo-danswer
5afddacbb2 order list of new attempts from oldest to newest to prevent connector starvation (#1918) 2024-07-24 21:02:20 +00:00
hagen-danswer
4fb6a88f1e Quick fix (#1919) 2024-07-24 11:56:14 -07:00
rkuo-danswer
7057be6a88 Bugfix/indexing progress (#1916)
* mark in progress should always be committed

* no_commit version of mark_attempt is not needed
2024-07-24 11:39:44 -07:00
Yuhong Sun
91be8e7bfb Skip Null Docs (#1917) 2024-07-24 11:31:33 -07:00
Yuhong Sun
9651ea828b Handling Metadata by Vector and Keyword (#1909) 2024-07-24 11:05:56 -07:00
rkuo-danswer
6ee74bd0d1 fix pointers to various background tasks and scripts (#1914) 2024-07-24 10:12:51 -07:00
pablodanswer
48a0d29a5c Fix empty / reverted embeddings (#1910) 2024-07-23 22:41:31 -07:00
hagen-danswer
6ff8e6c0ea Improve eval pipeline qol (#1908) 2024-07-23 17:16:34 -07:00
Yuhong Sun
2470c68506 Don't rephrase first chat query (#1907) 2024-07-23 16:20:11 -07:00
hagen-danswer
866bc803b1 Implemented LLM disabling for api call (#1905) 2024-07-23 16:12:51 -07:00
pablodanswer
9c6084bd0d Embeddings- Clean up modal + "Important" call out (#1903) 2024-07-22 21:29:22 -07:00
hagen-danswer
a0b46c60c6 Switched eval api target back to oneshotqa (#1902) 2024-07-22 20:55:18 -07:00
pablodanswer
4029233df0 hide incomplete sources for non-admins (#1901) 2024-07-22 13:40:11 -07:00
hagen-danswer
6c88c0156c Added file upload retry logic (#1889) 2024-07-22 13:13:22 -07:00
pablodanswer
33332d08f2 fix citation title (#1900)
* fix citation title

* remove title function
2024-07-22 17:37:04 +00:00
hagen-danswer
17005fb705 switched default pruning behavior and removed some logging (#1898) 2024-07-22 17:36:26 +00:00
hagen-danswer
48a7fe80b1 Committed LLM updates to db (#1899) 2024-07-22 10:30:24 -07:00
pablodanswer
1276732409 Misc bug fixes (#1895) 2024-07-22 10:22:43 -07:00
Weves
f91b92a898 Make is_public default true for LLMProvider 2024-07-21 22:22:37 -07:00
Weves
6222f533be Update force delete script to handle user groups 2024-07-21 22:22:37 -07:00
hagen-danswer
1b49d17239 Added ability to control LLM access based on group (#1870)
* Added ability to control LLM access based on group

* completed relationship deletion

* cleaned up function

* added comments

* fixed frontend strings

* mypy fixes

* added case handling for deletion of user groups

* hidden advanced options now

* removed unnecessary code
2024-07-22 04:31:44 +00:00
Yuhong Sun
2f5f19642e Double Check Max Tokens for Indexing (#1893) 2024-07-21 21:12:39 -07:00
Yuhong Sun
6db4634871 Token Truncation (#1892) 2024-07-21 16:26:32 -07:00
Yuhong Sun
5cfed45cef Handle Empty Titles (#1891) 2024-07-21 14:59:23 -07:00
Weves
581ffde35a Fix jira connector failures for server deployments 2024-07-21 14:44:25 -07:00
pablodanswer
6313e6d91d Remove visit api when unneeded (#1885)
* quick fix to test on ec2

* quick cleanup

* modify a name

* address full doc as well

* additional timing info + handling

* clean up

* squash

* Print only
2024-07-21 20:57:24 +00:00
Weves
c09c94bf32 Fix assistant swap 2024-07-21 13:57:36 -07:00
Yuhong Sun
0e8ba111c8 Model Touchups (#1887) 2024-07-21 12:31:00 -07:00
Yuhong Sun
2ba24b1734 Reenable Search Pipeline (#1886) 2024-07-21 10:33:29 -07:00
Yuhong Sun
44820b4909 k 2024-07-21 10:27:57 -07:00
hagen-danswer
eb3e7610fc Added retries and multithreading for cloud embedding (#1879)
* added retries and multithreading for cloud embedding

* refactored a bit

* cleaned up code

* got the errors to bubble up to the ui correctly

* added exception printing

* added requirements

* touchups

---------

Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2024-07-20 22:10:18 -07:00
pablodanswer
7fbbb174bb minor fixes (#1882)
- Assistants tab size
- Fixed logo -> absolute
2024-07-20 21:02:57 -07:00
pablodanswer
3854ca11af add newlines for message content 2024-07-20 18:57:29 -07:00
Yuhong Sun
e95bfa0e0b Suffix Test (#1880) 2024-07-20 15:54:55 -07:00
Yuhong Sun
4848b5f1de Suffix Edits (#1878) 2024-07-20 13:59:14 -07:00
Yuhong Sun
7ba5c434fa Missing Comma (#1877) 2024-07-19 22:15:45 -07:00
Yuhong Sun
59bf5ba848 File Connector Metadata (#1876) 2024-07-19 20:45:18 -07:00
Weves
f66c33380c Improve widget README 2024-07-19 20:21:07 -07:00
Weves
115650ce9f Add example widget code 2024-07-19 20:14:52 -07:00
Weves
7aa3602fca Fix black 2024-07-19 18:55:09 -07:00
Weves
864c552a17 Fix UT 2024-07-19 18:55:09 -07:00
Brent Kwok
07b2ed3d8f Fix HTTP 422 error for api_inference_sample.py (#1868) 2024-07-19 18:54:43 -07:00
Yuhong Sun
38290057f2 Search Eval (#1873) 2024-07-19 16:48:58 -07:00
Weves
2344edf158 Change default login time to 7 days 2024-07-19 13:58:50 -07:00
versecafe
86d1804eb0 Add GPT-4o-Mini & fix a missing gpt-4o 2024-07-19 12:10:27 -07:00
pablodanswer
1ebae50d0c minor update 2024-07-19 10:53:28 -07:00
Weves
a9fbaa396c Stop building on every PR 2024-07-19 10:21:19 -07:00
pablodanswer
27d5f69427 update to headers (#1864) 2024-07-19 08:38:54 -07:00
pablodanswer
5d98421ae8 show "analysis" (#1863) 2024-07-18 18:18:36 -07:00
Kevin Shi
6b561b8ca9 Add config to skip zendesk article labels 2024-07-18 18:00:51 -07:00
pablodanswer
2dc7e64dd7 fix internet search icons / text + assistants tab (#1862) 2024-07-18 16:15:19 -07:00
Yuhong Sun
5230f7e22f Enforce Disable GenAI if set (#1860) 2024-07-18 13:25:55 -07:00
hagen-danswer
a595d43ae3 Fixed deleting toolcall by message 2024-07-18 12:52:28 -07:00
Yuhong Sun
ee561f42ff Cleaner Layout (#1857) 2024-07-18 11:13:16 -07:00
Yuhong Sun
f00b3d76b3 Touchup NoOp (#1856) 2024-07-18 08:44:27 -07:00
Yuhong Sun
e4984153c0 Touchups (#1855) 2024-07-17 23:47:10 -07:00
pablodanswer
87fadb07ea COMPLETE USER EXPERIENCE OVERHAUL (#1822) 2024-07-17 19:44:21 -07:00
pablodanswer
2b07c102f9 fix discourse connector rate limiting + topic fetching (#1820) 2024-07-17 14:57:40 -07:00
hagen-danswer
e93de602c3 Use SHA instead of branch and save more data (#1850) 2024-07-17 14:56:24 -07:00
hagen-danswer
1c77395503 Fixed llm_indices from document search api (#1853) 2024-07-17 14:52:49 -07:00
Victorivus
cdf6089b3e Fix bug XML files in chat (#1804) 2024-07-17 08:09:40 -07:00
pablodanswer
d01f46af2b fix search doc bug (#1851) 2024-07-16 15:27:04 -07:00
hagen-danswer
b83f435bb0 Catch dropped eval questions and added multiprocessing (#1849) 2024-07-16 12:33:02 -07:00
hagen-danswer
25b3dacaba Separated model caching volumes (#1845) 2024-07-15 15:32:04 -07:00
hagen-danswer
a1e638a73d Improved eval logging and stability (#1843) 2024-07-15 14:58:45 -07:00
Yuhong Sun
bd1e0c5969 Add Enum File (#1842) 2024-07-15 09:13:27 -07:00
Yuhong Sun
4d295ab97d Model Server Logging (#1839) 2024-07-15 09:00:27 -07:00
Weves
6fe3eeaa48 Fix model serer startup 2024-07-14 23:33:58 -07:00
Chris Weaver
078d5defbb Update Slack link in README.md 2024-07-14 16:50:48 -07:00
Weves
0d52e99bd4 Improve confluence rate limiting 2024-07-14 16:40:45 -07:00
hagen-danswer
1b864a00e4 Added support for multiple Eval Pipeline UIs (#1830) 2024-07-14 15:16:20 -07:00
Weves
dae4f6a0bd Fix latency caused by large numbers of tags 2024-07-14 14:21:07 -07:00
Yuhong Sun
f63d0ca3ad Title Truncation Logic (#1828) 2024-07-14 13:54:36 -07:00
Yuhong Sun
da31da33e7 Fix Title for docs without (#1827) 2024-07-14 13:51:11 -07:00
Yuhong Sun
56b175f597 Fix Sitemap Robo (#1826) 2024-07-14 13:29:26 -07:00
Zoltan Szabo
1b311d092e Try to find the sitemap for a given site (#1538) 2024-07-14 13:24:10 -07:00
Moshe Zada
6ee1292757 Fix semantic id for web pdfs (#1823) 2024-07-14 11:38:11 -07:00
Yuhong Sun
017af052be Global Tokenizer Fix (#1825) 2024-07-14 11:37:10 -07:00
pablodanswer
e7f81d1688 add third party embedding models (#1818) 2024-07-14 10:19:53 -07:00
Weves
b6bd818e60 Fix user groups page when a persona is deleted 2024-07-13 15:35:50 -07:00
hagen-danswer
36da2e4b27 Fixed slack groups (#1814)
* Simplified slackbot response groups and fixed need more help bug

* mypy fixes

* added exceptions for the couldn't-find passthrough arrays
2024-07-13 22:34:35 +00:00
pablodanswer
c7af6a4601 add new standard answer test endpoint (#1789) 2024-07-12 10:06:30 -07:00
Yuhong Sun
e90c66c1b6 Include Titles in Chunks (#1817) 2024-07-12 09:42:24 -07:00
hagen-danswer
8c312482c1 fixed id retrieval from zip metadata (#1813) 2024-07-11 20:38:12 -07:00
Weves
e50820e65e Remove Internet 'Connector' that mistakenly appears on the Add Connector page 2024-07-11 18:00:59 -07:00
hagen-danswer
991ee79e47 some qol improvements for search pipeline (#1809) 2024-07-11 17:42:11 -07:00
hagen-danswer
3e645a510e Fix slack error logging (#1800) 2024-07-11 08:31:48 -07:00
Yuhong Sun
08c6e821e7 Merge Sections Logic (#1801) 2024-07-10 20:14:02 -07:00
hagen-danswer
47a550221f slackbot doesn't respond without citations/quotes (#1798)
* slackbot doesn't respond without citations/quotes

fixed logical issues

fixed dict logic

* added slackbot shim for the llm source/time feature

* mypy fixes

* slackbot doesn't respond without citations/quotes

fixed logical issues

fixed dict logic

* Update handle_regular_answer.py

* added bypass_filter check

* final fixes
2024-07-11 00:18:26 +00:00
Weves
511f619212 Add content to /document-search response 2024-07-10 15:44:58 -07:00
Varun Gaur
6c51f001dc Confluence Connector to Sync Child pages only (#1629)
---------

Co-authored-by: Varun Gaur <vgaur@roku.com>
Co-authored-by: hagen-danswer <hagen@danswer.ai>
Co-authored-by: pablodanswer <pablo@danswer.ai>
2024-07-10 14:17:03 -07:00
pablodanswer
09a11b5e1a Fix citations + unit tests (#1760) 2024-07-10 10:05:20 -07:00
pablodanswer
aa0f7abdac add basic table wrapping (#1791) 2024-07-09 19:14:41 -07:00
Yuhong Sun
7c8f8dba17 Break the Danswer LLM logging from LiteLLM Verbose (#1795) 2024-07-09 18:18:29 -07:00
Yuhong Sun
39982e5fdc Info propagating to allow Chunk Merging (#1794) 2024-07-09 18:15:07 -07:00
pablodanswer
5e0de111f9 fix wrapping in error hover connector (#1790) 2024-07-09 11:54:35 -07:00
pablodanswer
727d80f168 fix gpt-4o image issue (#1786) 2024-07-08 23:07:53 +00:00
rashad-danswer
146f85936b Internet Search Tool (#1666)
---------

Co-authored-by: Weves <chrisweaver101@gmail.com>
2024-07-06 18:01:24 -07:00
Chris Weaver
e06f8a0a4b Standard Answers (#1753)
---------

Co-authored-by: druhinsgoel <druhin@danswer.ai>
2024-07-06 16:11:11 -07:00
Yuhong Sun
f0888f2f61 Eval Script Incremental Write (#1784) 2024-07-06 15:43:40 -07:00
Yuhong Sun
d35d7ee833 Evaluation Pipeline Touchup (#1783) 2024-07-06 13:17:05 -07:00
Yuhong Sun
c5bb3fde94 Ignore Eval Files (#1782) 2024-07-06 12:15:03 -07:00
Yuhong Sun
79190030a5 New Env File for Eval (#1781) 2024-07-06 12:07:31 -07:00
Yuhong Sun
8e8f262ed3 Docker Compose Eval Pipeline Cleanup (#1780) 2024-07-06 12:04:57 -07:00
hagen-danswer
ac14369716 Added search quality testing pipeline (#1774) 2024-07-06 11:51:50 -07:00
Weves
de4d8e9a65 Fix shared chats 2024-07-04 11:41:16 -07:00
hagen-danswer
0b384c5b34 fixed salesforce url generation (#1777) 2024-07-04 10:43:21 -07:00
Weves
fa049f4f98 Add UI support for github configs 2024-07-03 17:37:59 -07:00
pablodanswer
72d6a0ef71 minor updates to assistant UI (#1771) 2024-07-03 18:28:25 +00:00
pablodanswer
ae4e643266 Update Assistants Creation UI (#1714)
* slide up "Tools"

* rework assistants page

* update layout

* reorg complete

- pending: useful header text?

* add tooltips

* alter organizational structure

* rm shadcn

* rm dependencies

* revalidate dependencies

* restore

* update component structure

* [s] format

* rm package json

* add package-lock.json [s]

* collapsible

* naming + width

* formatting

* formatting

* updated user flow

- Fix error/detail messages
- Fix tooltip delay
- Fix icons

* 1 -> 2

* naming fixes

* ran pretty

* fix build issue?

* web build issues?
2024-07-03 17:11:14 +00:00
hagen-danswer
a7da07afc0 allowed arbitrary types to handle the sqlalchemy datatype (#1758)
* allowed arbitrary types to handle the sqlalchemy datatype

* changed persona_upsert to take in ids instead of objects
2024-07-03 07:10:57 +00:00
Weves
7f1bb67e52 Pass through API base to ImageGenerationTool 2024-07-02 23:31:04 -07:00
Weves
982b1b0c49 Add litellm.set_verbose support 2024-07-02 23:22:17 -07:00
Daniel Naber
2db128fb36 Notion date filter fix (#1755)
* fix filter logic

* make comparison better readable
2024-07-02 15:39:35 -07:00
Christoph Petzold
3ebac6256f Fix "cannot access local variable" for bot direct messages (#1737)
* Update handle_message.py

* Update handle_message.py

* Update handle_message.py

---------

Co-authored-by: hagen-danswer <hagen@danswer.ai>
2024-07-02 15:36:23 -07:00
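The "cannot access local variable" error is Python's `UnboundLocalError`: a name assigned on only one branch. A minimal reproduction and fix (shapes are illustrative, not the handler's real code):

```python
# buggy shape: channel_config is only assigned for non-DM messages
def handle_message_buggy(is_dm: bool):
    if not is_dm:
        channel_config = {"respond_to": "everyone"}
    return channel_config  # UnboundLocalError when is_dm is True


# fix: give the name a value on every code path
def handle_message_fixed(is_dm: bool):
    channel_config = None
    if not is_dm:
        channel_config = {"respond_to": "everyone"}
    return channel_config
```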
Weves
1a3ec59610 Fix build caused by bad seeding config 2024-07-01 23:41:43 -07:00
hagen-danswer
581cb827bb added settings and persona seeding options (#1742)
* added settings and persona seeding options

* updated recency_bias

* changed variable type

* another fix

* Update seeding.py

* fixed mypy

* push
2024-07-01 22:22:17 +00:00
Weves
393b3c9343 Fix misc chat bugs 2024-06-30 18:08:03 -07:00
Weves
2035e9f39c Fix docker build 2024-06-30 14:34:47 -07:00
Weves
52c3a5e9d2 Fix slackbot citation images 2024-06-30 13:42:56 -07:00
pablodanswer
3e45a41617 Bugfix/scroll (#1748) 2024-06-30 12:58:57 -07:00
Weves
415960564d Fix fast models 2024-06-29 15:19:09 -07:00
pablodanswer
ed550986a6 Feature/assistants (#1581)
* include alternate assistant

- migrate models
- migrate db

* functional alternate assistant selection

* refactor chat components for persona API

* functional assistants api

* add full functionality- assistants

* add functional assistants dropdown handler

* refactor assistants for full compatibility

- hooks
- track the live assistant for edge cases
- UI updates

* add assistant UI features

- Autotab
- Arrow selection
- Icons
- Proper @ detection
- Info Popup

prune unnecessary comments

* functional search toggling for assistants

* add functional cross-page assistants

rebase with main

* add proper interactivity for edge cases

- click outside of input / text box
- "force search" assistant consistency

* refactor alt assistant consistency

* update alembic versions

* rebased

* undo formatting changes

* additional formatting

* current processing

* merge fixes

* formatting

* colors

* 2 -> 1

* 1 -> 2

---------

Co-authored-by: “Pablo <“pablo@danswer.ai”>
2024-06-29 00:18:39 +00:00
hagen-danswer
60dd77393d Disallowed simultaneous pruning jobs (#1704)
* Added TTL to EE Celery tasks

* fixed alembic files

* fixed frontend build issue and reworked file deletion

* FileD

* revert change

* reworked delete chatmessage

* added orphan cleanup

* ensured syntax

* Disallowed simultaneous pruning jobs

* added rate limiting and env vars

* i hope this is how you use decorators

* nonsense

* cleaned up names, added config

* renamed other utils

* Update celery_utils.py

* reverted changes
2024-06-28 23:26:00 +00:00
hagen-danswer
3fe5313b02 renamed alembic table (#1741) 2024-06-28 22:54:19 +00:00
hagen-danswer
bd0925611a Added TTL to EE Celery tasks (#1713)
* Added TTL to EE Celery tasks

* fixed alembic files

* fixed frontend build issue and reworked file deletion

* FileD

* revert change

* reworked delete chatmessage

* added orphan cleanup

* ensured syntax

* default value to None

* made all deletions manual

* added fix

* Use tremor buttons now

* removed words

* Update 23957775e5f5_remove_feedback_foreignkey_constraint.py

* fixed alembic version
2024-06-28 22:13:47 +00:00
pablodanswer
de6d040349 add boto3 typing to default requirements (#1740) 2024-06-28 13:06:59 -07:00
Chris Weaver
38da3128d8 Pass headers into image generation (#1739) 2024-06-28 12:33:53 -07:00
Chris Weaver
e47da0d688 Small readme improvement (#1735) 2024-06-27 19:12:24 -07:00
rkuo-danswer
2c0e0c5f11 Merge pull request #1731 from danswer-ai/feature/merge-queue-workflows
add workflows that automate docker builds against merge group events
2024-06-27 18:25:05 -07:00
Richard Kuo (Danswer)
29d57f6354 remove obsolete comment 2024-06-27 17:39:17 -07:00
rkuo-danswer
369e607631 Fixes DAN-189 (safari bug in admin). Removed td/absolute positioning behavior which is unde… (#1718) 2024-06-27 17:15:42 -07:00
pablodanswer
f03f97307f Blob Storage (#1705)
S3 + OCI + Google Cloud Storage + R2
---------

Co-authored-by: Art Matsak <5328078+artmatsak@users.noreply.github.com>
2024-06-27 17:12:20 -07:00
Weves
145cdb69b7 Remove duplicate tool check 2024-06-27 17:05:50 -07:00
pablodanswer
9310a8edc2 Feature/scroll (#1694)
---------

Co-authored-by: “Pablo <“pablo@danswer.ai”>
2024-06-27 16:40:23 -07:00
Yuhong Sun
2140f80891 Tidy up Actions ported from EE (#1732) 2024-06-27 16:20:34 -07:00
Richard Kuo (Danswer)
52dab23295 add workflows that automate docker builds against merge group events 2024-06-27 14:31:39 -07:00
Weves
91c9b2eb42 Add more logging for num workers in simple job client 2024-06-27 11:49:08 -07:00
Weves
5764cdd469 Use FiEdit2 as the standard edit icon 2024-06-27 11:32:47 -07:00
Weves
8fea6d7f64 Fix share for insecure: 2024-06-27 11:19:04 -07:00
pablodanswer
5324b15397 Chat overflow (#1723) 2024-06-27 11:02:11 -07:00
Yuhong Sun
8be42a5f98 Touchup for Multilingual Users (#1725) 2024-06-26 22:44:06 -07:00
Weves
062dc98719 Fix search tool 2024-06-26 21:28:16 -07:00
pablodanswer
43557f738b add copy-paste images (#1722) 2024-06-26 20:12:34 -07:00
Weves
b5aa7370a2 Make seeded model default 2024-06-26 16:03:24 -07:00
Weves
4ba6e45128 Small template fix 2024-06-26 13:58:30 -07:00
pablodanswer
d6e5a98a22 Minor Update to UI (#1692)
New sidebar / chatbar color, hidable right-panel, and many more small tweaks.

---------

Co-authored-by: pablodanswer <“pablo@danswer.ai”>
2024-06-26 13:54:41 -07:00
Yuhong Sun
20c4cdbdda Catch LLM Generation Failure (#1712) 2024-06-26 10:44:22 -07:00
Yuhong Sun
0d814939ee Bugfix for Selected Doc when the message it is selected from failed (#1711) 2024-06-26 10:32:30 -07:00
Yuhong Sun
7d2b0ffcc5 Developer Env Setup (#1710) 2024-06-26 09:45:19 -07:00
hagen-danswer
8c6cd661f5 Ignore messages from Slack's official bot (#1703)
* Ignore messages from Slack's official bot

* re-added message filter
2024-06-25 20:33:36 -07:00
Weves
5d552705aa Change EE environment variable name 2024-06-25 16:59:09 -07:00
Weves
1ee8ee9e8b Prepare EE to merge with MIT 2024-06-25 15:07:56 -07:00
Chris Weaver
f0b2b57d81 Usage reports (#118)
---------

Co-authored-by: amohamdy99 <a.mohamdy99@gmail.com>
2024-06-25 15:07:56 -07:00
hagen-danswer
5c12a3e872 brought out the UsersResponse interface (#119) 2024-06-25 15:07:56 -07:00
Weves
3af81ca96b Fix seed config when left empty 2024-06-25 15:07:56 -07:00
Weves
f55e5415bb Add empty assets folder 2024-06-25 15:07:56 -07:00
Weves
3d434c2c9e Fix persona access for answer-with-quote API 2024-06-25 15:07:56 -07:00
pablodanswer
90ec156791 formatting 2024-06-25 15:07:56 -07:00
pablodanswer
8ba48e24a6 minor build fix 2024-06-25 15:07:56 -07:00
pablodanswer
e34bcbbd06 Add persistent name and logo seeding (#107) 2024-06-25 15:07:56 -07:00
pablodanswer
db319168f8 stronger wording 2024-06-25 15:07:56 -07:00
pablodanswer
010ce5395f Minor/ee optional branding (#105) 2024-06-25 15:07:56 -07:00
rashad-danswer
98a58337a7 Query history speed fix (#109) 2024-06-25 15:07:56 -07:00
Weves
733d4e666b Add support for private file connectors 2024-06-25 15:07:56 -07:00
Weves
2937fe9e7d Fix backend build 2024-06-25 15:07:56 -07:00
Weves
457527ac86 Try different runner groups for each build 2024-06-25 15:07:56 -07:00
Weves
7cc51376f2 Allow basic seeding of Danswer via env variable 2024-06-25 15:07:56 -07:00
Weves
7278d45552 Fix rebase issue 2024-06-25 15:07:56 -07:00
Yuhong Sun
1c343bbee7 Enable Dedup Flag for Doc Search Endpoint 2024-06-25 15:07:56 -07:00
Weves
bdcfb39724 Add whitelabeled name to login page 2024-06-25 15:07:56 -07:00
Weves
694d20ea8f Fix user groups issue from rebase 2024-06-25 15:07:56 -07:00
Weves
45402d0755 Add back custom logo/name to sidebar header 2024-06-25 15:07:56 -07:00
Weves
69740ba3d5 Fix rebase issue 2024-06-25 15:07:56 -07:00
Yuhong Sun
6162283beb Fix formatting issues (#93) 2024-06-25 15:07:56 -07:00
Yuhong Sun
44284f7912 Fix Rebase Issues (#92) 2024-06-25 15:07:56 -07:00
Weves
775ca5787b Move web build to a matrix build 2024-06-25 15:07:56 -07:00
Weves
c6e49a3034 Don't get duplicate docs during user group syncing 2024-06-25 15:07:56 -07:00
Weves
9c8cfd9175 Fix mypy 2024-06-25 15:07:56 -07:00
Weves
fc3ed76d12 Add pagination to user group syncing 2024-06-25 15:07:56 -07:00
Weves
a2597d5f21 Fix rebase issue with dev compose file 2024-06-25 15:07:56 -07:00
Yuhong Sun
af588461d2 Enable Encryption 2024-06-25 15:07:56 -07:00
Weves
460e61b3a7 Fix document lock acquisition for user group sync 2024-06-25 15:07:56 -07:00
Weves
c631ac0c3a Change secret name 2024-06-25 15:07:56 -07:00
Yuhong Sun
10be91a8cc Track Slack questions Autoresolved (#86) 2024-06-25 15:07:56 -07:00
Weves
eadad34a77 Fix /send-message-simple-api endpoint 2024-06-25 15:07:56 -07:00
Weves
b19d88a151 Fix rebase issue with file_store 2024-06-25 15:07:56 -07:00
Weves
e33b469915 Remove unused Chat.tsx file 2024-06-25 15:07:56 -07:00
Weves
719fc06604 Fix rebase issue with UI-based LLM selection 2024-06-25 15:07:56 -07:00
Alan Hagedorn
d7a704c0d9 Token Rate Limiting
WIP

Cleanup 🧹

Remove existing rate limiting logic

Cleanup 🧼

Undo nit

Cleanup 🧽

Move db constants (avoids circular import)

WIP

WIP

Cleanup

Lint

Resolve alembic conflict

Fix mypy

Add backfill to migration

Update comment

Make unauthenticated users still adhere to global limits

Use Depends

Remove enum from table

Update migration error handling + deletion

Address verbal feedback, cleanup urls, minor nits
2024-06-25 15:07:56 -07:00
Yuhong Sun
7a408749cf Fix Web Compile Issue (#81) 2024-06-25 15:07:56 -07:00
Yuhong Sun
d9acd03a85 Query History Include Feedback Text (#80) 2024-06-25 15:07:56 -07:00
Yuhong Sun
af94c092e7 Reduce sync jobs batch size (#79) 2024-06-25 15:07:56 -07:00
Yuhong Sun
f55a4ef9bd Remove Nested Session (#78) 2024-06-25 15:07:56 -07:00
Yuhong Sun
6c6e33e001 Allow Empty API Names (#77) 2024-06-25 15:07:56 -07:00
Yuhong Sun
336c046e5d Better Naming for API Keys (#76) 2024-06-25 15:07:56 -07:00
Weves
9a9b89f073 Fix rebase issue with public assistants 2024-06-25 15:07:56 -07:00
Weves
89fac98534 Fix ee redirect 2024-06-25 15:07:56 -07:00
Weves
65b65518de Add back ChatBanner 2024-06-25 15:07:56 -07:00
Yuhong Sun
0c827d1e6c Permission Sync Framework (#44) 2024-06-25 15:07:56 -07:00
Weves
1984f2c1ca Add automated auth checks for ee 2024-06-25 15:07:56 -07:00
Yuhong Sun
50f006557f Add message id to simple message endpoint (#69) 2024-06-25 15:07:56 -07:00
Yuhong Sun
c00bd44bcc Add Chunk Context options for EE APIs (#68) 2024-06-25 15:07:56 -07:00
Yuhong Sun
680aca68e5 Make EE containers public changes (#67) 2024-06-25 15:07:56 -07:00
Weves
22a2f86fb9 FE build fix 2024-06-25 15:07:56 -07:00
Weves
c055dc1535 Add custom analytics script 2024-06-25 15:07:56 -07:00
Alan Hagedorn
81e9880d9d Add names to API Keys (#63) 2024-06-25 15:07:56 -07:00
Weves
3466f6d3a4 Custom banner 2024-06-25 15:07:56 -07:00
Weves
91cf45165f Small fixes + adding 'Powered by Danswer' 2024-06-25 15:07:56 -07:00
Weves
ee2a5bbf49 Add custom-styling ability via themes 2024-06-25 15:07:56 -07:00
Weves
153007c57c Whitelableing for Logo / Name via Admin panel 2024-06-25 15:07:56 -07:00
Yuhong Sun
fa8cc10063 Allow Optional Rerank in APIs (#60) 2024-06-25 15:07:56 -07:00
Yuhong Sun
2c3ba5f021 Include User in Query Export (#59) 2024-06-25 15:07:56 -07:00
Yuhong Sun
e3ef620094 Query History Now Handles Old Messages (#58) 2024-06-25 15:07:56 -07:00
Yuhong Sun
40369e0538 Formatter (#57) 2024-06-25 15:07:56 -07:00
Yuhong Sun
d6c5c65b51 Fix Query History (#56) 2024-06-25 15:07:56 -07:00
Yuhong Sun
7b16cb9562 Rebase search changes to EE APIs (#55) 2024-06-25 15:07:56 -07:00
Weves
ef4f06a375 Fix SAML for /manage/me 2024-06-25 15:07:56 -07:00
Chris Weaver
17cc262f5d Private personas doc sets (#52)
Private Personas and Document Sets

---------

Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2024-06-25 15:07:56 -07:00
Yuhong Sun
680482bd06 Metadata filter for document search API (#53) 2024-06-25 15:07:56 -07:00
Weves
64874d2737 Change runner for backend 2024-06-25 15:07:56 -07:00
Weves
00ade322f1 Fix small answer-with-quote bug 2024-06-25 15:07:56 -07:00
Weves
eab5d054d5 Add env variable to control hash rounds 2024-06-25 15:07:56 -07:00
Weves
a09d60d7d0 Fix user group deletion bug 2024-06-25 15:07:56 -07:00
Yuhong Sun
f17dc52b37 One Shot API No Stream (#41) 2024-06-25 15:07:56 -07:00
Yuhong Sun
c1862e961b Simple API No Longer Require Specify Prompt (#40) 2024-06-25 15:07:56 -07:00
Yuhong Sun
6b46a71cb5 Fix Empty Chat for API (#39) 2024-06-25 15:07:56 -07:00
Yuhong Sun
9ae3a4af7f Basic Chat API (#38) 2024-06-25 15:07:56 -07:00
Weves
328b96c9ff Make public/not-public selector prettier 2024-06-25 15:07:56 -07:00
Yuhong Sun
bac34a47b2 Embedding Model Swap Changes (#35) 2024-06-25 15:07:56 -07:00
Weves
15934ee268 Fix limit/offset for document-search endpoint 2024-06-25 15:07:56 -07:00
Weves
fe975c3357 Add global prefix to EE endpoints 2024-06-25 15:07:56 -07:00
Weves
8bf483904d Fix users page with API keys + add spinner on key creation 2024-06-25 15:07:56 -07:00
Yuhong Sun
db338bfddf Introduce EE only Backend APIs (#29) 2024-06-25 15:07:56 -07:00
Weves
ae02a5199a Add API key generation in the UI + allow it to be used across all endpoints 2024-06-25 15:07:56 -07:00
Yuhong Sun
4b44073d9a CVEs (#26) 2024-06-25 15:07:56 -07:00
Weves
ce36530c79 Fix viewing other users' chat histories in query history 2024-06-25 15:07:56 -07:00
Weves
39d69838c5 Make query history fetch client-side 2024-06-25 15:07:56 -07:00
Weves
e11f0f6202 Fix /chat-session-history/{chat_session_id} endpoint when auth is enabled 2024-06-25 15:07:56 -07:00
Weves
ce870ff577 Re-style user group pages 2024-06-25 15:07:56 -07:00
Weves
a52711967f Fix analytics + query history 2024-06-25 15:07:56 -07:00
Weves
67a4eb6f6f Fix frontend typing rebase issue 2024-06-25 15:07:56 -07:00
Weves
9599388db8 Fix sidebar typo 2024-06-25 15:07:56 -07:00
Weves
f82ae158ea Mark indexing jobs as ee when running ee supervisord 2024-06-25 15:07:56 -07:00
Weves
670de6c00d Add new env variable to EE supervisord 2024-06-25 15:07:56 -07:00
Yuhong Sun
56c52bddff Fix missing supervisord change from Danswer MIT (#18) 2024-06-25 15:07:56 -07:00
Chris Weaver
3984350ff9 Improvements to Query History (#17)
* Add option to download query-history as a CSV

* Add user email + more complete timestamp
2024-06-25 15:07:56 -07:00
Weves
f799d9aa11 Fix EE import 2024-06-25 15:07:56 -07:00
Yuhong Sun
529f2c8c2d Danswer EE Version Text (#12) 2024-06-25 15:07:56 -07:00
Yuhong Sun
b4683dc841 Fix Rebase Issue 2024-06-25 15:07:56 -07:00
Chris Weaver
db8ce61ff4 Fix group prefix (#11) 2024-06-25 15:07:56 -07:00
Yuhong Sun
d016e8335e Update default SAML config location (#10) 2024-06-25 15:07:56 -07:00
Yuhong Sun
0c295d1de5 Enable EE features for no-letsencrypt deployment (#9) 2024-06-25 15:07:56 -07:00
Chris Weaver
e9f273d99a Admin Analytics/Query History dashboards (#6) 2024-06-25 15:07:56 -07:00
Weves
428f5edd21 Update ee supervisord 2024-06-25 15:07:56 -07:00
Weves
50170cc97e Move user group syncing to Celery Beat 2024-06-25 15:07:56 -07:00
Chris Weaver
7503f8f37b Add User Groups (a.k.a. RBAC) (#4) 2024-06-25 15:07:56 -07:00
Yuhong Sun
92de6acc6f Initial EE features (#3) 2024-06-25 15:07:56 -07:00
mattboret
65d5808ea7 Confluence: add pages labels indexation (#1635)
* Confluence: add pages labels indexation

* changed the default and fixed the dict building

* Update app_configs.py

* Update connector.py

---------

Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
Co-authored-by: hagen-danswer <hagen@danswer.ai>
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2024-06-25 10:41:29 -07:00
Yuhong Sun
061dab7f37 Touchup (#1702) 2024-06-25 10:34:03 -07:00
hagen-danswer
e65d9e155d fixed confluence breaking on unknown filetypes (#1698) 2024-06-25 10:19:01 -07:00
hagen-danswer
50f799edf4 Merge pull request #1672 from danswer-ai/add-groups-to-slack-bot-responses
add slack groups to user response list
2024-06-24 15:10:12 -07:00
pablodanswer
c1d8f6cb66 add basic 403 support for healthcheck (#1689) 2024-06-23 11:19:46 -07:00
pablodanswer
6c71bc05ea modify script deletion name (#1690) 2024-06-23 08:29:37 -07:00
Yuhong Sun
123ec4342a Relari (#1687)
Also includes some bugfixes
2024-06-22 18:52:48 -07:00
pablodanswer
7253316b9e Add script for forced connector deletion (#1683) 2024-06-22 17:15:25 -07:00
Weves
4ae924662c Add migration for usage reports 2024-06-22 16:21:41 -07:00
Yuhong Sun
094eea2742 Discourse Edge Case (#1685) 2024-06-22 15:17:33 -07:00
pablodanswer
8178d536b4 Add functional thread modification endpoints (#1668)
Makes it so if you change which LLM you are using in a given ChatSession, that is persisted and sticks around if you reload the page / come back to the ChatSession later
2024-06-21 18:10:30 -07:00
Yuhong Sun
5cafc96cae Enable Internet Search for Deployment Options (#1684) 2024-06-21 17:38:49 -07:00
Weves
3e39a921b0 Fix image generation output 2024-06-21 15:41:55 -07:00
Weves
98b2507045 Improve persona access 2024-06-21 13:35:14 -07:00
hagen-danswer
3dfe17a54d google drive ignores shortcut filetypes now 2024-06-20 18:50:12 -07:00
hagen-danswer
b4675082b1 Update handle_message.py 2024-06-20 21:16:26 -04:00
hagen-danswer
287a706e89 combined the input fields 2024-06-20 11:48:14 -07:00
Kevin Shi
ba58208a85 Transform HTML links to markdown behind config option (#1671) 2024-06-20 10:43:15 -07:00
hagen-danswer
694e9e8679 finished first draft 2024-06-19 22:11:33 -07:00
pablodanswer
9e30ec1f1f hide popup for non admin + if search is disabled 2024-06-19 13:08:24 -07:00
pablodanswer
1b56c75527 [minor] proper assistant line length 2024-06-19 12:14:42 -07:00
Weves
b07fdbf1d1 Cleanup user management 2024-06-18 14:37:47 -07:00
hagen-danswer
54c2547d89 Add connector document pruning task (#1652) 2024-06-18 14:26:12 -07:00
Liam Norris
58b5e25c97 User Management: Invite, Deactivate, Search, & Paginate (#1631) 2024-06-18 11:28:47 -07:00
hagen-danswer
4e15ba78d5 replicated drive fix for gmail connector (#1658) 2024-06-18 08:47:06 -07:00
Yuhong Sun
c798ade127 Code for ease of eval (#1656) 2024-06-17 20:32:12 -07:00
hagen-danswer
93cc5a9e77 improved salesforce description text (#1655) 2024-06-17 20:14:34 -07:00
Weves
7746375bfd Custom tools 2024-06-17 15:12:50 -07:00
Weves
c6d094b2ee Fix google drive connector page refresh 2024-06-16 14:56:36 -07:00
hagen-danswer
7a855192c3 added google slides format to gdrive connector (#1645) 2024-06-16 13:59:20 -07:00
hagen-danswer
c3577cf346 salesforce qol changes (#1644) 2024-06-16 11:12:37 -07:00
hagen-danswer
722a1dd919 Add salesforce connector (#1621) 2024-06-16 10:04:44 -07:00
hagen-danswer
e4999266ca added azure models to vision capable list (#1638) 2024-06-16 08:38:30 -07:00
Weves
f294dba095 Fix google drive page 2024-06-15 16:04:21 -07:00
Weves
03105ad551 Fix bypass_acl support for Slack bot 2024-06-14 16:57:06 -07:00
hagen-danswer
4b0ff95b26 added pptx to drive reader (#1634) 2024-06-13 22:50:28 -07:00
Vikas Neha Ojha
ff06d62acf Added ClickUp Connector (#1521)
* Added connector for clickup

* Fixed mypy issues

* Fallback to description if markdown is not available

* Added extra information in metadata, and support to index comments

* Fixes for fields parsing

* updated fetcher to errorHandlingFetcher

---------

Co-authored-by: hagen-danswer <hagen@danswer.ai>
2024-06-13 17:49:31 -07:00
Yuhong Sun
26fee36ed4 Catch Slack Greeting Msgs Generic (#1633) 2024-06-13 09:41:01 -07:00
Scott Davidson
428439447e Fix Helm vespa resource limits (#1606)
Co-authored-by: sd109 <scott@stackhpc.com>
2024-06-11 21:01:26 -07:00
hagen-danswer
e8cfbc1dd8 added a check for empty URL list in web connector (#1573)
* added a check for empty URL list in web connector

* added raise condition for improper sitemap designation
2024-06-11 18:26:44 -07:00
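Both checks boil down to failing fast instead of silently indexing nothing. A sketch under assumed names:

```python
def validate_web_connector_urls(urls: list[str], source: str) -> None:
    # hypothetical guard, illustrating the shape of the fix
    if not urls:
        raise ValueError(
            f"No URLs collected from {source}; if this was configured as a "
            "sitemap, check that the URL actually points at one"
        )
```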
hagen-danswer
486b0ecb31 Confluence: Add page attachments indexing (#1617)
* Confluence: Add page attachments indexing

* used the centralized file processing to extract file content

* flipped input order for extract_file_text

* added bytes support for pdf converter

* brought out the io.BytesIO to the confluence connector

---------

Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2024-06-11 18:23:13 -07:00
dependabot[bot]
8c324f8f01 Bump requests from 2.31.0 to 2.32.2 in /backend/requirements (#1509)
Bumps [requests](https://github.com/psf/requests) from 2.31.0 to 2.32.2.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.31.0...v2.32.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-11 18:20:08 -07:00
Varun Gaur
5a577f9a00 Added a Logic to Index entire Gitlab Project (#1586)
* Changes for Gitlab connector

* Changes to Rebase from Main

* Changes to Rebase from Main

* Changes to Rebase from Main

* Changes to Rebase from Main

* made indexing code files a config setting

* Update app_configs.py

created env variable

* Update app_configs.py

added false

---------

Co-authored-by: Varun Gaur <vgaur@roku.com>
Co-authored-by: hagen-danswer <hagen@danswer.ai>
2024-06-11 18:18:14 -07:00
Weves
e6d5b95b4a Fix web build 2024-06-11 15:06:55 -07:00
Weves
cc0320b50a Apply passthrough headers for chat renaming 2024-06-10 18:24:46 -07:00
Yuhong Sun
36afa9370f Vespa remove apostrophe in URLs (#1618) 2024-06-10 17:19:47 -07:00
Yuhong Sun
7c9d037b7c Env Template Update (#1615) 2024-06-10 14:24:17 -07:00
Yuhong Sun
a2065f018a Environment Template for VSCode/Cursor (#1614) 2024-06-10 13:36:20 -07:00
Weves
b723627e0c Ability to pass through headers to LLM call 2024-06-10 13:18:31 -07:00
hagen-danswer
180b592afe added html support to file text extractor (#1611) 2024-06-10 12:46:05 -07:00
Weves
e8306b0fa5 Fix web build 2024-06-10 11:40:49 -07:00
Weves
64ee5ffff5 Fix slack bot creation with document sets 2024-06-10 11:29:44 -07:00
hagen-danswer
ead6a851cc Merge pull request #1144 from hagen6835/add-teams-connector
added teams connector
2024-06-10 13:25:54 -04:00
hagen-danswer
73575f22d8 prettier 2024-06-10 09:44:20 -07:00
hagen-danswer
be5dd3eefb final revisions fr 2024-06-10 09:32:39 -07:00
hagen-danswer
f18aa2368e Merge pull request #1601 from danswer-ai/prune-model-list
chatpage now checks for llm override for image uploads
2024-06-09 20:17:13 -04:00
hagen-danswer
3ec559ade2 added null inputs for other usages 2024-06-09 17:11:06 -07:00
hagen-danswer
4d0794f4f5 chatpage now checks for llm override for image uploads 2024-06-09 17:05:41 -07:00
hagen-danswer
64a042b94d cleaned up sharepoint connector (#1599)
* cleaned up sharepoint connector

* additional cleanup

* fixed dropbox connector string
2024-06-09 12:15:52 -07:00
Yuhong Sun
fa3a3d348c Precommit Fixes (#1596) 2024-06-09 00:44:36 -07:00
mattboret
a0e10ac9c2 Slack: add support to rephrase user message (#1528)
* Slack: add support to rephrase user message

* fix: handle rephrase error

* Update listener.py

---------

Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2024-06-08 18:11:32 -07:00
Art Matsak
e1ece4a27a Fix Helm chart run failures due to low resources (#1536)
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2024-06-08 18:05:44 -07:00
Bijay Regmi
260149b35a Feature/support all gpus (#1515)
* add `count` explicitly to ensure backwards compat

* fix main document URL
2024-06-08 17:45:26 -07:00
Alexander L
5f2737f9ee Add OAuth env vars to API Server deployment (#1500)
* Add oauth vars to the kubernetes api server deployment

* Add example google oauth id and secret to kubernetes secrets
2024-06-08 17:43:36 -07:00
Moncho Pena
e1d8b88318 Changed the vespa podLabels to work properly (#1594)
Vespa is not working because of this configuration.

As you can see in this issue: https://github.com/unoplat/vespa-helm-charts/issues/20

You have to use these podLabels to accord with the rest of the configuration:

vespa:
  podLabels:
    app: vespa
    app.kubernetes.io/instance: danswer
2024-06-08 17:40:27 -07:00
Joe Shindelar
fc5337d4db Set ignore_danswer_metadata=False when calling read_text_file in file connector. (#1501)
Fixes an issue where metadata specified in a #DANSWER_METADATA line in a file read by the file connector is ignored.
2024-06-08 17:39:18 -07:00
hagen-danswer
bd9335e832 disabled reindexing for dropbox and added warning (#1593) 2024-06-08 17:26:07 -07:00
hagen-danswer
cbc53fd500 moved methods to top and fixed logic errors 2024-06-07 20:34:24 -07:00
Nils
7a3c102c74 Added support for Sites.Selected permissions to the SharePoint Connector and enabled the selection of individual subfolders (#1583) 2024-06-07 15:47:17 -07:00
Weves
4274c114c5 Fix display for std markdown without a language 2024-06-07 15:22:06 -07:00
Weves
d56e6c495a Fix code block copy 2024-06-07 15:22:06 -07:00
Weves
c9160c705a Fix list of assistants in Assistants Modal 2024-06-07 14:21:57 -07:00
Chris Weaver
3bc46ef40e Fix slack bot with document set (#1588)
Also includes a Persona db layer refactor
2024-06-07 14:14:08 -07:00
hagen-danswer
ff59858327 final bugfixes 2024-06-07 13:19:48 -07:00
Yuhong Sun
643176407c Fix Dedupe (#1587) 2024-06-07 11:35:27 -07:00
Weves
eacfd8f33f Use errorHandlingFetcher 2024-06-07 11:25:48 -07:00
Weves
f6fb963419 Fix indexing status page crashing 2024-06-07 11:25:48 -07:00
hagen-danswer
16e023a8ce Revert "ran prettier"
This reverts commit 750c1df0bb.
2024-06-07 11:20:58 -07:00
hagen-danswer
b79820a309 Revert "Update init-letsencrypt.sh (#1380)"
This reverts commit 9e0b6aa531.
2024-06-07 11:05:57 -07:00
hagen-danswer
754b735174 Revert "fix gitlab-connector - wrong datetime format (#1559)"
This reverts commit 8dfba97c09.
2024-06-07 11:00:01 -07:00
hagen-danswer
58c305a539 Revert "Add Dropbox connector (#956)"
This reverts commit 914dc27a8f.
2024-06-07 10:58:37 -07:00
hagen-danswer
26bc785625 Revert "Update README.md with fixed Slack link round 2"
This reverts commit 0b6e85c26b.
2024-06-07 10:53:15 -07:00
Yuhong Sun
09da456bba Remove Redundant Dedupe Logic (#1577) 2024-06-06 14:36:41 -07:00
Yuhong Sun
da43bac456 Dedupe Flag (#1576) 2024-06-06 14:10:40 -07:00
Weves
adcbd354f4 Fix fast model update + slight reword on model selection 2024-06-05 18:43:37 -07:00
Weves
41fbaf5698 Fix prompt access 2024-06-05 18:43:13 -07:00
Hagen O'Neill
0b83396c4d disabled dropbox polling 2024-06-05 15:14:08 -07:00
Hagen O'Neill
785d7736ed extracted semantic identifier into its own method 2024-06-05 14:37:53 -07:00
Hagen O'Neill
9a9a879aee bugfixes 2024-06-05 14:30:34 -07:00
Hagen O'Neill
7b36f7aa4f chat_message: ChatMessage 2024-06-05 14:18:28 -07:00
Hagen O'Neill
8d74176348 completed code revision suggestions 2024-06-05 14:11:38 -07:00
Hagen O'Neill
713d325f42 fixed rebase issues 2024-06-04 21:25:16 -07:00
Hagen O'Neill
f34b26b3d0 separated teams and sharepoint environment variables 2024-06-04 20:08:24 -07:00
Hagenoneill
a2349af65c added teams connector 2024-06-04 20:08:07 -07:00
Bill Yang
914dc27a8f Add Dropbox connector (#956)
* start dropbox connector

* add wip ui

* polish ui

* Fix some ci

* ignore types

* addressed, fixed, and tested all comments

* ran prettier

* ran mypy fixes

---------

Co-authored-by: Bill Yang <bill@Bills-MacBook-Pro.local>
Co-authored-by: hagen-danswer <hagen@danswer.ai>
2024-06-04 20:04:50 -07:00
Chris Weaver
0b6e85c26b Update README.md with fixed Slack link round 2 2024-06-04 20:04:50 -07:00
Chris Weaver
291a3f9ca0 Fix Slack link in README.md 2024-06-04 20:04:50 -07:00
NP
8dfba97c09 fix gitlab-connector - wrong datetime format (#1559) 2024-06-04 20:04:50 -07:00
Keiron Stoddart
9e0b6aa531 Update init-letsencrypt.sh (#1380)
Add ability to accept a `.ev.nginx` configuration that lists a subdomain.
2024-06-04 20:04:50 -07:00
Bill Yang
1fb47d70b3 Add Dropbox connector (#956)
* start dropbox connector

* add wip ui

* polish ui

* Fix some ci

* ignore types

* addressed, fixed, and tested all comments

* ran prettier

* ran mypy fixes

---------

Co-authored-by: Bill Yang <bill@Bills-MacBook-Pro.local>
Co-authored-by: hagen-danswer <hagen@danswer.ai>
2024-06-04 17:58:01 -07:00
Chris Weaver
25d40f8daa Update README.md with fixed Slack link round 2 2024-06-04 17:19:59 -07:00
Chris Weaver
154cdec0db Fix Slack link in README.md 2024-06-04 16:42:33 -07:00
Hagen O'Neill
f5deb37fde fixed mypy issues 2024-06-04 13:35:30 -07:00
Hagen O'Neill
750c1df0bb ran prettier 2024-06-04 13:23:02 -07:00
Hagen O'Neill
e608febb7f fixed messages nesting 2024-06-04 13:17:19 -07:00
hagen-danswer
c8e10282b2 Update connector.py
removed unused var
2024-06-04 15:49:05 -04:00
hagen-danswer
d7f66ba8c4 Update connector.py
changed raise output
2024-06-04 15:46:55 -04:00
NP
4d5a39628f fix gitlab-connector - wrong datetime format (#1559) 2024-06-04 10:41:19 -07:00
Keiron Stoddart
b6d0ecec4f Update init-letsencrypt.sh (#1380)
Add ability to accept a `.ev.nginx` configuration that lists a subdomain.
2024-06-04 10:38:01 -07:00
Hagen O'Neill
14a39e88e8 separated teams and sharepoint environment variables 2024-06-04 09:45:10 -07:00
hagen-danswer
ea71b9830c Update types.ts
removed semi
2024-06-04 12:11:19 -04:00
hagen-danswer
a9834853ef Merge branch 'main' into add-teams-connector 2024-06-04 12:09:52 -04:00
hagen-danswer
8b10535c93 Merge pull request #1505 from mboret/fix/sharepoint-connector-missing-drive-items
fix sharepoint connector missing objects
2024-06-04 10:27:22 -04:00
pablodanswer
e1e1f036a7 Remove React Markdown for human messages (#1562)
* Remove React Markdown for human messages

* Update globals.css

* add formatting

* formatting

---------

Co-authored-by: Pablo <pablo@danswer.ai>
2024-06-03 18:20:46 -07:00
Weves
0c642f25dd Add image generation for gpt-4o 2024-06-03 16:38:49 -07:00
Weves
3788041115 Fix missing end of answer for quotes processor 2024-06-03 16:37:27 -07:00
Weves
5a75470d23 Fix no-search assistants with DISABLE_LLM_CHOOSE_SEARCH enabled 2024-06-03 14:38:42 -07:00
Weves
81aada7c0f Add option to hide logout 2024-06-03 12:13:40 -07:00
Chris Weaver
e4a08c5546 Improve error msg in chat UI (#1557) 2024-06-02 19:53:04 -07:00
Weves
61d096533c Allow multiple files to be selected for file upload 2024-06-02 16:20:59 -07:00
Weves
0543abac9a Move to matrix builds 2024-06-02 15:57:53 -07:00
Weves
1d0ce49c05 Fix slowness due to hitting async Postgres driver pool limit 2024-06-01 20:18:14 -07:00
Weves
6e9d7acb9c Latency logging 2024-06-01 20:18:14 -07:00
Yuhong Sun
026652d827 Helm tuning (#1553) 2024-06-01 17:29:58 -07:00
Yuhong Sun
2363698c20 Consolidate Helm Charts (#1552) 2024-06-01 16:46:44 -07:00
Clay Rosenthal
0ea257d030 adding secrets to helm (#1541)
* adding secrets to helm

* incrementing chart version
2024-05-31 19:11:36 -07:00
Yuhong Sun
d141e637d0 Disable Search if User uploads files in Chat (#1550) 2024-05-31 19:07:56 -07:00
Yuhong Sun
4b53cb56a6 Fix File Type Migration (#1549) 2024-05-31 18:35:36 -07:00
Weves
b690ae05b4 Add assistant gallery 2024-05-29 21:05:31 -07:00
Matthieu Boret
fbdf882299 fix sharepoint connector missing objects 2024-05-29 10:13:41 +02:00
Yuhong Sun
44d57f1b53 Reenable force search (#1531) 2024-05-28 11:36:02 -07:00
Art Matsak
d2b58bdb40 Fix DISABLE_LLM_CHOOSE_SEARCH being ignored (#1523) 2024-05-28 08:39:02 -07:00
Yuhong Sun
aa98200bec SlackBot Disable AI option (#1527) 2024-05-28 01:35:57 -07:00
Yuhong Sun
32c37f8b17 Update Prompt (#1526) 2024-05-28 01:08:29 -07:00
Weves
008a91bff0 Partial fix for links not working 2024-05-27 17:31:10 -07:00
Weves
9a3613eb44 Fix unknown languages causing the chat to crash 2024-05-27 17:31:10 -07:00
Yuhong Sun
90d5b41901 Fix Citation Prompt Optionality (#1524) 2024-05-27 14:11:58 -07:00
Yuhong Sun
8688226003 Remove Tag Unique Constraint Bug (#1519) 2024-05-26 15:10:56 -07:00
Weves
97d058b8b2 Fix mypy for mediawiki tests 2024-05-25 17:16:47 -07:00
Weves
26ef5b897d Code highlighting 2024-05-25 17:12:12 -07:00
Weves
dfd233b985 Fix mypy for mediawiki connector 2024-05-25 13:04:23 -07:00
Weves
2dab9c576c Add Slack payload log 2024-05-24 19:24:17 -07:00
Weves
a9f5952510 Fix migration multiple head issue 2024-05-24 14:43:52 -07:00
Andrew Sansom
94018e83b0 Add MediaWiki and Wikipedia Connectors (#1250)
* Add MediaWikiConnector first draft

* Add MediaWikiConnector first draft

* Add MediaWikiConnector first draft

* Add MediaWikiConnector sections for each document

* Add MediaWikiConnector to constants and factory

* Integrate MediaWikiConnector with connectors page

* Unit tests + bug fixes

* Allow adding multiple mediawikiconnectors

* add wikipedia connector

* add wikipedia connector to factory

* improve docstrings of mediawiki connector backend

* improve docstrings of mediawiki connector backend

* move wikipedia and mediawiki icon locations in admin page

* undo accidental commit of modified docker compose yaml
2024-05-24 08:51:20 -07:00
Hagenoneill
818dfd0413 JUST get_all LOL 2024-04-01 14:39:37 -04:00
Hagenoneill
51b4e63218 organized documents by post instead of by channel 2024-04-01 14:31:26 -04:00
Hagenoneill
73b063b66c added teams connector 2024-02-29 12:36:05 -05:00
1084 changed files with 91512 additions and 28337 deletions


@@ -0,0 +1,76 @@
name: 'Build and Push Docker Image with Retry'
description: 'Attempts to build and push a Docker image, with a retry on failure'
inputs:
context:
description: 'Build context'
required: true
file:
description: 'Dockerfile location'
required: true
platforms:
description: 'Target platforms'
required: true
pull:
description: 'Always attempt to pull a newer version of the image'
required: false
default: 'true'
push:
description: 'Push the image to registry'
required: false
default: 'true'
load:
description: 'Load the image into Docker daemon'
required: false
default: 'true'
tags:
description: 'Image tags'
required: true
cache-from:
description: 'Cache sources'
required: false
cache-to:
description: 'Cache destinations'
required: false
retry-wait-time:
description: 'Time to wait before retry in seconds'
required: false
default: '5'
runs:
using: "composite"
steps:
- name: Build and push Docker image (First Attempt)
id: buildx1
uses: docker/build-push-action@v5
continue-on-error: true
with:
context: ${{ inputs.context }}
file: ${{ inputs.file }}
platforms: ${{ inputs.platforms }}
pull: ${{ inputs.pull }}
push: ${{ inputs.push }}
load: ${{ inputs.load }}
tags: ${{ inputs.tags }}
cache-from: ${{ inputs.cache-from }}
cache-to: ${{ inputs.cache-to }}
- name: Wait to retry
if: steps.buildx1.outcome != 'success'
run: |
echo "First attempt failed. Waiting ${{ inputs.retry-wait-time }} seconds before retry..."
sleep ${{ inputs.retry-wait-time }}
shell: bash
- name: Build and push Docker image (Retry Attempt)
if: steps.buildx1.outcome != 'success'
uses: docker/build-push-action@v5
with:
context: ${{ inputs.context }}
file: ${{ inputs.file }}
platforms: ${{ inputs.platforms }}
pull: ${{ inputs.pull }}
push: ${{ inputs.push }}
load: ${{ inputs.load }}
tags: ${{ inputs.tags }}
cache-from: ${{ inputs.cache-from }}
cache-to: ${{ inputs.cache-to }}
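For context, the integration-test workflow later in this diff invokes this composite action roughly like so (a minimal sketch; the tag and cache values are illustrative):

```yaml
# Example workflow step calling the composite retry action above
- name: Build Backend Docker image
  uses: ./.github/actions/custom-build-and-push
  with:
    context: ./backend
    file: ./backend/Dockerfile
    platforms: linux/amd64
    tags: danswer/danswer-backend:it          # illustrative tag
    cache-from: type=registry,ref=danswer/danswer-backend:it
    cache-to: |
      type=registry,ref=danswer/danswer-backend:it,mode=max
      type=inline
```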

.github/pull_request_template.md

@@ -0,0 +1,25 @@
## Description
[Provide a brief description of the changes in this PR]
## How Has This Been Tested?
[Describe the tests you ran to verify your changes]
## Accepted Risk
[Any known risks or failure modes to point out to reviewers]
## Related Issue(s)
[If applicable, link to the issue(s) this PR addresses]
## Checklist:
- [ ] All of the automated tests pass
- [ ] All PR comments are addressed and marked resolved
- [ ] If there are migrations, they have been rebased to latest main
- [ ] If there are new dependencies, they are added to the requirements
- [ ] If there are new environment variables, they are added to all of the deployment methods
- [ ] If there are new APIs that don't require auth, they are added to PUBLIC_ENDPOINT_SPECS
- [ ] Docker images build and basic functionalities work
- [ ] Author has done a final read through of the PR right before merge


@@ -5,9 +5,14 @@ on:
tags:
- '*'
env:
REGISTRY_IMAGE: danswer/danswer-backend
jobs:
build-and-push:
runs-on: ubuntu-latest
# TODO: make this a matrix build like the web containers
runs-on:
group: amd64-image-builders
steps:
- name: Checkout code
@@ -22,6 +27,11 @@ jobs:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_TOKEN }}
- name: Install build-essential
run: |
sudo apt-get update
sudo apt-get install -y build-essential
- name: Backend Image Docker Build and Push
uses: docker/build-push-action@v5
with:
@@ -30,8 +40,8 @@ jobs:
platforms: linux/amd64,linux/arm64
push: true
tags: |
danswer/danswer-backend:${{ github.ref_name }}
danswer/danswer-backend:latest
${{ env.REGISTRY_IMAGE }}:${{ github.ref_name }}
${{ env.REGISTRY_IMAGE }}:latest
build-args: |
DANSWER_VERSION=${{ github.ref_name }}
@@ -39,6 +49,6 @@ jobs:
uses: aquasecurity/trivy-action@master
with:
# To run locally: trivy image --severity HIGH,CRITICAL danswer/danswer-backend
image-ref: docker.io/danswer/danswer-backend:${{ github.ref_name }}
image-ref: docker.io/${{ env.REGISTRY_IMAGE }}:${{ github.ref_name }}
severity: 'CRITICAL,HIGH'
trivyignores: ./backend/.trivyignore


@@ -7,7 +7,8 @@ on:
jobs:
build-and-push:
runs-on: ubuntu-latest
runs-on:
group: amd64-image-builders
steps:
- name: Checkout code


@@ -5,40 +5,115 @@ on:
tags:
- '*'
env:
REGISTRY_IMAGE: danswer/danswer-web-server
jobs:
build-and-push:
runs-on: ubuntu-latest
build:
runs-on:
group: ${{ matrix.platform == 'linux/amd64' && 'amd64-image-builders' || 'arm64-image-builders' }}
strategy:
fail-fast: false
matrix:
platform:
- linux/amd64
- linux/arm64
steps:
- name: Checkout code
uses: actions/checkout@v2
- name: Prepare
run: |
platform=${{ matrix.platform }}
echo "PLATFORM_PAIR=${platform//\//-}" >> $GITHUB_ENV
- name: Checkout
uses: actions/checkout@v4
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY_IMAGE }}
tags: |
type=raw,value=${{ env.REGISTRY_IMAGE }}:${{ github.ref_name }}
type=raw,value=${{ env.REGISTRY_IMAGE }}:latest
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_TOKEN }}
- name: Build and push by digest
id: build
uses: docker/build-push-action@v5
with:
context: ./web
file: ./web/Dockerfile
platforms: ${{ matrix.platform }}
push: true
build-args: |
DANSWER_VERSION=${{ github.ref_name }}
# needed due to weird interactions with the builds for different platforms
no-cache: true
labels: ${{ steps.meta.outputs.labels }}
outputs: type=image,name=${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true
- name: Export digest
run: |
mkdir -p /tmp/digests
digest="${{ steps.build.outputs.digest }}"
touch "/tmp/digests/${digest#sha256:}"
- name: Upload digest
uses: actions/upload-artifact@v4
with:
name: digests-${{ env.PLATFORM_PAIR }}
path: /tmp/digests/*
if-no-files-found: error
retention-days: 1
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
merge:
runs-on: ubuntu-latest
needs:
- build
steps:
- name: Download digests
uses: actions/download-artifact@v4
with:
path: /tmp/digests
pattern: digests-*
merge-multiple: true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY_IMAGE }}
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_TOKEN }}
- name: Create manifest list and push
working-directory: /tmp/digests
run: |
docker buildx imagetools create $(jq -cr '.tags | map("-t " + .) | join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON") \
$(printf '${{ env.REGISTRY_IMAGE }}@sha256:%s ' *)
- name: Inspect image
run: |
docker buildx imagetools inspect ${{ env.REGISTRY_IMAGE }}:${{ steps.meta.outputs.version }}
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_TOKEN }}
- name: Web Image Docker Build and Push
uses: docker/build-push-action@v5
with:
context: ./web
file: ./web/Dockerfile
platforms: linux/amd64,linux/arm64
push: true
tags: |
danswer/danswer-web-server:${{ github.ref_name }}
danswer/danswer-web-server:latest
build-args: |
DANSWER_VERSION=${{ github.ref_name }}
# needed due to weird interactions with the builds for different platforms
no-cache: true
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: docker.io/danswer/danswer-web-server:${{ github.ref_name }}
severity: 'CRITICAL,HIGH'
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: docker.io/${{ env.REGISTRY_IMAGE }}:${{ github.ref_name }}
severity: 'CRITICAL,HIGH'
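One dense spot in the workflow above is the "Create manifest list and push" step: the jq filter turns the tag list from `DOCKER_METADATA_OUTPUT_JSON` into `-t` flags, and the printf appends one `image@digest` reference per downloaded digest file. A rough sketch of what the command expands to (the version tag and digests are illustrative placeholders):

```yaml
# Expanded form of the merge step for two platform digests (illustrative)
- name: Create manifest list and push (expanded, illustrative)
  working-directory: /tmp/digests
  run: |
    docker buildx imagetools create \
      -t danswer/danswer-web-server:v0.4.0 \
      -t danswer/danswer-web-server:latest \
      danswer/danswer-web-server@sha256:<amd64-digest> \
      danswer/danswer-web-server@sha256:<arm64-digest>
```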


@@ -0,0 +1,67 @@
# This workflow is intentionally disabled while we're still working on it
# It's close to ready, but a race condition between API server and Vespa
# startup needs to be fixed, and it needs a way to build/test against
# local containers
name: Helm - Lint and Test Charts
on:
merge_group:
pull_request:
branches: [ main ]
jobs:
lint-test:
runs-on: Amd64
# fetch-depth 0 is required for helm/chart-testing-action
steps:
- name: Checkout code
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Helm
uses: azure/setup-helm@v4.2.0
with:
version: v3.14.4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
cache: 'pip'
cache-dependency-path: |
backend/requirements/default.txt
backend/requirements/dev.txt
backend/requirements/model_server.txt
- run: |
python -m pip install --upgrade pip
pip install --retries 5 --timeout 30 -r backend/requirements/default.txt
pip install --retries 5 --timeout 30 -r backend/requirements/dev.txt
pip install --retries 5 --timeout 30 -r backend/requirements/model_server.txt
- name: Set up chart-testing
uses: helm/chart-testing-action@v2.6.1
- name: Run chart-testing (list-changed)
id: list-changed
run: |
changed=$(ct list-changed --target-branch ${{ github.event.repository.default_branch }})
if [[ -n "$changed" ]]; then
echo "changed=true" >> "$GITHUB_OUTPUT"
fi
- name: Run chart-testing (lint)
# if: steps.list-changed.outputs.changed == 'true'
run: ct lint --all --config ct.yaml --target-branch ${{ github.event.repository.default_branch }}
- name: Create kind cluster
# if: steps.list-changed.outputs.changed == 'true'
uses: helm/kind-action@v1.10.0
- name: Run chart-testing (install)
# if: steps.list-changed.outputs.changed == 'true'
run: ct install --all --config ct.yaml
# run: ct install --target-branch ${{ github.event.repository.default_branch }}


@@ -1,6 +1,7 @@
name: Python Checks
on:
merge_group:
pull_request:
branches: [ main ]
@@ -23,9 +24,9 @@ jobs:
backend/requirements/model_server.txt
- run: |
python -m pip install --upgrade pip
pip install -r backend/requirements/default.txt
pip install -r backend/requirements/dev.txt
pip install -r backend/requirements/model_server.txt
pip install --retries 5 --timeout 30 -r backend/requirements/default.txt
pip install --retries 5 --timeout 30 -r backend/requirements/dev.txt
pip install --retries 5 --timeout 30 -r backend/requirements/model_server.txt
- name: Run MyPy
run: |


@@ -0,0 +1,57 @@
name: Connector Tests
on:
pull_request:
branches: [main]
schedule:
# This cron expression runs the job daily at 16:00 UTC (9am PT)
- cron: "0 16 * * *"
env:
# Confluence
CONFLUENCE_TEST_SPACE_URL: ${{ secrets.CONFLUENCE_TEST_SPACE_URL }}
CONFLUENCE_TEST_SPACE: ${{ secrets.CONFLUENCE_TEST_SPACE }}
CONFLUENCE_IS_CLOUD: ${{ secrets.CONFLUENCE_IS_CLOUD }}
CONFLUENCE_TEST_PAGE_ID: ${{ secrets.CONFLUENCE_TEST_PAGE_ID }}
CONFLUENCE_USER_NAME: ${{ secrets.CONFLUENCE_USER_NAME }}
CONFLUENCE_ACCESS_TOKEN: ${{ secrets.CONFLUENCE_ACCESS_TOKEN }}
jobs:
connectors-check:
runs-on: ubuntu-latest
env:
PYTHONPATH: ./backend
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.11"
cache: "pip"
cache-dependency-path: |
backend/requirements/default.txt
backend/requirements/dev.txt
- name: Install Dependencies
run: |
python -m pip install --upgrade pip
pip install --retries 5 --timeout 30 -r backend/requirements/default.txt
pip install --retries 5 --timeout 30 -r backend/requirements/dev.txt
- name: Run Tests
shell: script -q -e -c "bash --noprofile --norc -eo pipefail {0}"
run: py.test -o junit_family=xunit2 -xv --ff backend/tests/daily/connectors
- name: Alert on Failure
if: failure() && github.event_name == 'schedule'
env:
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
run: |
curl -X POST \
-H 'Content-type: application/json' \
--data '{"text":"Scheduled Connector Tests failed! Check the run at: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}"}' \
$SLACK_WEBHOOK


@@ -1,6 +1,7 @@
name: Python Unit Tests
on:
merge_group:
pull_request:
branches: [ main ]
@@ -10,7 +11,8 @@ jobs:
env:
PYTHONPATH: ./backend
REDIS_CLOUD_PYTEST_PASSWORD: ${{ secrets.REDIS_CLOUD_PYTEST_PASSWORD }}
steps:
- name: Checkout code
uses: actions/checkout@v4
@@ -27,8 +29,8 @@ jobs:
- name: Install Dependencies
run: |
python -m pip install --upgrade pip
pip install -r backend/requirements/default.txt
pip install -r backend/requirements/dev.txt
pip install --retries 5 --timeout 30 -r backend/requirements/default.txt
pip install --retries 5 --timeout 30 -r backend/requirements/dev.txt
- name: Run Tests
shell: script -q -e -c "bash --noprofile --norc -eo pipefail {0}"


@@ -4,18 +4,19 @@ concurrency:
cancel-in-progress: true
on:
merge_group:
pull_request: null
jobs:
quality-checks:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-python@v5
with:
python-version: '3.11'
- uses: pre-commit/action@v3.0.0
with:
extra_args: --from-ref ${{ github.event.pull_request.base.sha }} --to-ref ${{ github.event.pull_request.head.sha }}
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- uses: pre-commit/action@v3.0.0
with:
extra_args: ${{ github.event_name == 'pull_request' && format('--from-ref {0} --to-ref {1}', github.event.pull_request.base.sha, github.event.pull_request.head.sha) || '' }}

.github/workflows/run-it.yml

@@ -0,0 +1,161 @@
name: Run Integration Tests
concurrency:
group: Run-Integration-Tests-${{ github.head_ref }}
cancel-in-progress: true
on:
merge_group:
pull_request:
branches: [ main ]
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
jobs:
integration-tests:
runs-on: Amd64
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_TOKEN }}
# NOTE: we don't need to build the Web Docker image since it's not used
# during the IT for now. We have a separate action to verify it builds
# successfully
- name: Pull Web Docker image
run: |
docker pull danswer/danswer-web-server:latest
docker tag danswer/danswer-web-server:latest danswer/danswer-web-server:it
- name: Build Backend Docker image
uses: ./.github/actions/custom-build-and-push
with:
context: ./backend
file: ./backend/Dockerfile
platforms: linux/amd64
tags: danswer/danswer-backend:it
cache-from: type=registry,ref=danswer/danswer-backend:it
cache-to: |
type=registry,ref=danswer/danswer-backend:it,mode=max
type=inline
- name: Build Model Server Docker image
uses: ./.github/actions/custom-build-and-push
with:
context: ./backend
file: ./backend/Dockerfile.model_server
platforms: linux/amd64
tags: danswer/danswer-model-server:it
cache-from: type=registry,ref=danswer/danswer-model-server:it
cache-to: |
type=registry,ref=danswer/danswer-model-server:it,mode=max
type=inline
- name: Build integration test Docker image
uses: ./.github/actions/custom-build-and-push
with:
context: ./backend
file: ./backend/tests/integration/Dockerfile
platforms: linux/amd64
tags: danswer/integration-test-runner:it
cache-from: type=registry,ref=danswer/integration-test-runner:it
cache-to: |
type=registry,ref=danswer/integration-test-runner:it,mode=max
type=inline
- name: Start Docker containers
run: |
cd deployment/docker_compose
ENABLE_PAID_ENTERPRISE_EDITION_FEATURES=true \
AUTH_TYPE=basic \
REQUIRE_EMAIL_VERIFICATION=false \
DISABLE_TELEMETRY=true \
IMAGE_TAG=it \
docker compose -f docker-compose.dev.yml -p danswer-stack up -d
id: start_docker
- name: Wait for service to be ready
run: |
echo "Starting wait-for-service script..."
docker logs -f danswer-stack-api_server-1 &
start_time=$(date +%s)
timeout=300 # 5 minutes in seconds
while true; do
current_time=$(date +%s)
elapsed_time=$((current_time - start_time))
if [ $elapsed_time -ge $timeout ]; then
echo "Timeout reached. Service did not become ready in 5 minutes."
exit 1
fi
# Use curl with error handling to ignore specific exit code 56
response=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health || echo "curl_error")
if [ "$response" = "200" ]; then
echo "Service is ready!"
break
elif [ "$response" = "curl_error" ]; then
echo "Curl encountered an error, possibly exit code 56. Continuing to retry..."
else
echo "Service not ready yet (HTTP status $response). Retrying in 5 seconds..."
fi
sleep 5
done
echo "Finished waiting for service."
- name: Run integration tests
run: |
echo "Running integration tests..."
docker run --rm --network danswer-stack_default \
-e POSTGRES_HOST=relational_db \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=password \
-e POSTGRES_DB=postgres \
-e VESPA_HOST=index \
-e REDIS_HOST=cache \
-e API_SERVER_HOST=api_server \
-e OPENAI_API_KEY=${OPENAI_API_KEY} \
danswer/integration-test-runner:it
continue-on-error: true
id: run_tests
- name: Check test results
run: |
if [ ${{ steps.run_tests.outcome }} == 'failure' ]; then
echo "Integration tests failed. Exiting with error."
exit 1
else
echo "All integration tests passed successfully."
fi
- name: Save Docker logs
if: success() || failure()
run: |
cd deployment/docker_compose
docker compose -f docker-compose.dev.yml -p danswer-stack logs > docker-compose.log
mv docker-compose.log ${{ github.workspace }}/docker-compose.log
- name: Upload logs
if: success() || failure()
uses: actions/upload-artifact@v3
with:
name: docker-logs
path: ${{ github.workspace }}/docker-compose.log
- name: Stop Docker containers
run: |
cd deployment/docker_compose
docker compose -f docker-compose.dev.yml -p danswer-stack down -v

.gitignore

@@ -4,4 +4,6 @@
.mypy_cache
.idea
/deployment/data/nginx/app.conf
.vscode/launch.json
.vscode/
*.sw?
/backend/tests/regression/answer_quality/search_test_config.yaml

.vscode/env_template.txt

@@ -0,0 +1,51 @@
# Copy this file to .env in the .vscode folder
# Fill in the <REPLACE THIS> values as needed; it is recommended to set the GEN_AI_API_KEY value to avoid having to set up an LLM in the UI
# Also check out danswer/backend/scripts/restart_containers.sh for a script that restarts the containers Danswer relies on outside of VSCode/Cursor processes
# For local dev, user authentication is often not needed
AUTH_TYPE=disabled
# Always keep these on for Dev
# Logs all model prompts to stdout
LOG_DANSWER_MODEL_INTERACTIONS=True
# More verbose logging
LOG_LEVEL=debug
# This passes the top N results to the LLM an additional time for reranking prior to answer generation
# This step is quite heavy on token usage so we generally disable it for dev
DISABLE_LLM_DOC_RELEVANCE=False
# Useful if you want to toggle auth on/off (google_oauth/OIDC specifically)
OAUTH_CLIENT_ID=<REPLACE THIS>
OAUTH_CLIENT_SECRET=<REPLACE THIS>
# Generally not useful for dev; we don't want to set up an SMTP server for dev
REQUIRE_EMAIL_VERIFICATION=False
# Set these so if you wipe the DB, you don't end up having to go through the UI every time
GEN_AI_API_KEY=<REPLACE THIS>
# If answer quality isn't important for dev, use gpt-4o-mini since it's cheaper
GEN_AI_MODEL_VERSION=gpt-4o
FAST_GEN_AI_MODEL_VERSION=gpt-4o
# For Danswer Slack Bot, overrides the UI values so no need to set this up via UI every time
# Only needed if using DanswerBot
#DANSWER_BOT_SLACK_APP_TOKEN=<REPLACE THIS>
#DANSWER_BOT_SLACK_BOT_TOKEN=<REPLACE THIS>
# Python stuff
PYTHONPATH=../backend
PYTHONUNBUFFERED=1
# Internet Search
BING_API_KEY=<REPLACE THIS>
# Enable the full set of Danswer Enterprise Edition features
# NOTE: DO NOT ENABLE THIS UNLESS YOU HAVE A PAID ENTERPRISE LICENSE (or if you are using this for local testing/development)
ENABLE_PAID_ENTERPRISE_EDITION_FEATURES=False


@@ -1,15 +1,23 @@
/*
Copy this file into '.vscode/launch.json' or merge its
contents into your existing configurations.
*/
/* Copy this file into '.vscode/launch.json' or merge its contents into your existing configurations. */
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"compounds": [
{
"name": "Run All Danswer Services",
"configurations": [
"Web Server",
"Model Server",
"API Server",
"Indexing",
"Background Jobs",
"Slack Bot"
]
}
],
"configurations": [
{
"name": "Web Server",
@@ -17,6 +25,7 @@
"request": "launch",
"cwd": "${workspaceRoot}/web",
"runtimeExecutable": "npm",
"envFile": "${workspaceFolder}/.vscode/.env",
"runtimeArgs": [
"run", "dev"
],
@@ -24,10 +33,12 @@
},
{
"name": "Model Server",
"type": "python",
"consoleName": "Model Server",
"type": "debugpy",
"request": "launch",
"module": "uvicorn",
"cwd": "${workspaceFolder}/backend",
"envFile": "${workspaceFolder}/.vscode/.env",
"env": {
"LOG_LEVEL": "DEBUG",
"PYTHONUNBUFFERED": "1"
@@ -41,12 +52,14 @@
},
{
"name": "API Server",
"type": "python",
"consoleName": "API Server",
"type": "debugpy",
"request": "launch",
"module": "uvicorn",
"cwd": "${workspaceFolder}/backend",
"envFile": "${workspaceFolder}/.vscode/.env",
"env": {
"LOG_ALL_MODEL_INTERACTIONS": "True",
"LOG_DANSWER_MODEL_INTERACTIONS": "True",
"LOG_LEVEL": "DEBUG",
"PYTHONUNBUFFERED": "1"
},
@@ -59,12 +72,14 @@
},
{
"name": "Indexing",
"type": "python",
"consoleName": "Indexing",
"type": "debugpy",
"request": "launch",
"program": "danswer/background/update.py",
"cwd": "${workspaceFolder}/backend",
"envFile": "${workspaceFolder}/.vscode/.env",
"env": {
"ENABLE_MINI_CHUNK": "false",
"ENABLE_MULTIPASS_INDEXING": "false",
"LOG_LEVEL": "DEBUG",
"PYTHONUNBUFFERED": "1",
"PYTHONPATH": "."
@@ -73,11 +88,14 @@
// Celery and all async jobs, usually would include indexing as well but this is handled separately above for dev
{
"name": "Background Jobs",
"type": "python",
"consoleName": "Background Jobs",
"type": "debugpy",
"request": "launch",
"program": "scripts/dev_run_background_jobs.py",
"cwd": "${workspaceFolder}/backend",
"envFile": "${workspaceFolder}/.vscode/.env",
"env": {
"LOG_DANSWER_MODEL_INTERACTIONS": "True",
"LOG_LEVEL": "DEBUG",
"PYTHONUNBUFFERED": "1",
"PYTHONPATH": "."
@@ -90,16 +108,46 @@
// DANSWER_BOT_SLACK_APP_TOKEN & DANSWER_BOT_SLACK_BOT_TOKEN need to be set in .env file located in the root of the project
{
"name": "Slack Bot",
"type": "python",
"consoleName": "Slack Bot",
"type": "debugpy",
"request": "launch",
"program": "danswer/danswerbot/slack/listener.py",
"cwd": "${workspaceFolder}/backend",
"envFile": "${workspaceFolder}/.env",
"envFile": "${workspaceFolder}/.vscode/.env",
"env": {
"LOG_LEVEL": "DEBUG",
"PYTHONUNBUFFERED": "1",
"PYTHONPATH": "."
}
},
{
"name": "Pytest",
"consoleName": "Pytest",
"type": "debugpy",
"request": "launch",
"module": "pytest",
"cwd": "${workspaceFolder}/backend",
"envFile": "${workspaceFolder}/.vscode/.env",
"env": {
"LOG_LEVEL": "DEBUG",
"PYTHONUNBUFFERED": "1",
"PYTHONPATH": "."
},
"args": [
"-v"
// Specify a specific module/test to run or provide nothing to run all tests
//"tests/unit/danswer/llm/answering/test_prune_and_merge.py"
]
},
{
"name": "Clear and Restart External Volumes and Containers",
"type": "node",
"request": "launch",
"runtimeExecutable": "bash",
"runtimeArgs": ["${workspaceFolder}/backend/scripts/restart_containers.sh"],
"cwd": "${workspaceFolder}",
"console": "integratedTerminal",
"stopOnEntry": true
}
]
}
}


@@ -48,23 +48,26 @@ We would love to see you there!
## Get Started 🚀
Danswer being a fully functional app, relies on some external pieces of software, specifically:
Danswer being a fully functional app, relies on some external software, specifically:
- [Postgres](https://www.postgresql.org/) (Relational DB)
- [Vespa](https://vespa.ai/) (Vector DB/Search Engine)
- [Redis](https://redis.io/) (Cache)
- [Nginx](https://nginx.org/) (Not needed for development flows generally)
This guide provides instructions to set up the Danswer specific services outside of Docker because it's easier for
development purposes but also feel free to just use the containers and update with local changes by providing the
`--build` flag.
> **Note:**
> This guide provides instructions to build and run Danswer locally from source with Docker containers providing the above external software. We believe this combination is easier for
> development purposes. If you prefer to use pre-built container images, we provide instructions on running the full Danswer stack within Docker below.
### Local Set Up
It is recommended to use Python version 3.11
Be sure to use Python version 3.11. For instructions on installing Python 3.11 on macOS, refer to the [CONTRIBUTING_MACOS.md](./CONTRIBUTING_MACOS.md) readme.
If using a lower version, modifications will have to be made to the code.
If using a higher version, the version of Tensorflow we use may not be available for your platform.
If using a higher version, sometimes some libraries will not be available (e.g. we had problems with Tensorflow in the past with higher versions of Python).
#### Installing Requirements
#### Backend: Python requirements
Currently, we use pip and recommend creating a virtual environment.
For convenience here's a command for it:
@@ -72,6 +75,11 @@ For convenience here's a command for it:
python -m venv .venv
source .venv/bin/activate
```
> **Note:**
> This virtual environment MUST NOT be set up WITHIN the danswer directory if you plan on using mypy within certain IDEs.
> For simplicity, we recommend setting up the virtual environment outside of the danswer directory.
_For Windows, activate the virtual environment using Command Prompt:_
```bash
.venv\Scripts\activate
@@ -85,34 +93,38 @@ Install the required python dependencies:
```bash
pip install -r danswer/backend/requirements/default.txt
pip install -r danswer/backend/requirements/dev.txt
pip install -r danswer/backend/requirements/ee.txt
pip install -r danswer/backend/requirements/model_server.txt
```
Install Playwright for Python (headless browser required by the Web Connector)
In the activated Python virtualenv, install Playwright for Python by running:
```bash
playwright install
```
You may have to deactivate and reactivate your virtualenv for `playwright` to appear on your path.
#### Frontend: Node dependencies
Install [Node.js and npm](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm) for the frontend.
Once the above is done, navigate to `danswer/web` run:
```bash
npm i
```
Install Playwright (required by the Web Connector)
#### Docker containers for external software
You will need Docker installed to run these containers.
> Note: If you have just done the pip install, open a new terminal and source the python virtual-env again.
This will update the path to include playwright
Then install Playwright by running:
First navigate to `danswer/deployment/docker_compose`, then start up Postgres/Vespa/Redis with:
```bash
playwright install
docker compose -f docker-compose.dev.yml -p danswer-stack up -d index relational_db cache
```
(index refers to Vespa, relational_db refers to Postgres, and cache refers to Redis)
#### Dependent Docker Containers
First navigate to `danswer/deployment/docker_compose`, then start up Vespa and Postgres with:
```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d index relational_db
```
(index refers to Vespa and relational_db refers to Postgres)
#### Running Danswer
#### Running Danswer locally
To start the frontend, navigate to `danswer/web` and run:
```bash
npm run dev
@@ -123,11 +135,10 @@ Navigate to `danswer/backend` and run:
```bash
uvicorn model_server.main:app --reload --port 9000
```
_For Windows (for compatibility with both PowerShell and Command Prompt):_
```bash
powershell -Command "
uvicorn model_server.main:app --reload --port 9000
"
powershell -Command "uvicorn model_server.main:app --reload --port 9000"
```
The first time running Danswer, you will need to run the DB migrations for Postgres.
@@ -150,6 +161,7 @@ To run the backend API server, navigate back to `danswer/backend` and run:
```bash
AUTH_TYPE=disabled uvicorn danswer.main:app --reload --port 8080
```
_For Windows (for compatibility with both PowerShell and Command Prompt):_
```bash
powershell -Command "
@@ -158,20 +170,58 @@ powershell -Command "
"
```
Note: if you need finer logging, add the additional environment variable `LOG_LEVEL=DEBUG` to the relevant services.
> **Note:**
> If you need finer logging, add the additional environment variable `LOG_LEVEL=DEBUG` to the relevant services.
#### Wrapping up
You should now have 4 servers running:
- Web server
- Backend API
- Model server
- Background jobs
Now, visit `http://localhost:3000` in your browser. You should see the Danswer onboarding wizard where you can connect your external LLM provider to Danswer.
You've successfully set up a local Danswer instance! 🏁
#### Running the Danswer application in a container
You can run the full Danswer application stack from pre-built images including all external software dependencies.
Navigate to `danswer/deployment/docker_compose` and run:
```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d
```
After Docker pulls and starts these containers, navigate to `http://localhost:3000` to use Danswer.
If you want to make changes to Danswer and run those changes in Docker, you can also build a local version of the Danswer container images that incorporates your changes like so:
```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d --build
```
### Formatting and Linting
#### Backend
For the backend, you'll need to set up pre-commit hooks (black / reorder-python-imports).
First, install pre-commit (if you don't have it already) following the instructions
[here](https://pre-commit.com/#installation).
With the virtual environment active, install the pre-commit library with:
```bash
pip install pre-commit
```
Then, from the `danswer/backend` directory, run:
```bash
pre-commit install
```
Additionally, we use `mypy` for static type checking.
Danswer is fully type-annotated, and we would like to keep it that way!
Danswer is fully type-annotated, and we want to keep it that way!
To run the mypy checks manually, run `python -m mypy .` from the `danswer/backend` directory.
@@ -182,6 +232,7 @@ Please double check that prettier passes before creating a pull request.
### Release Process
Danswer follows the semver versioning standard.
Danswer loosely follows the SemVer versioning standard.
Major changes are released with a "minor" version bump. Currently we use patch release versions to indicate small feature changes.
A set of Docker containers will be pushed automatically to DockerHub with every tag.
You can see the containers [here](https://hub.docker.com/search?q=danswer%2F).

CONTRIBUTING_MACOS.md

@@ -0,0 +1,31 @@
## Some additional notes for Mac Users
The base instructions to set up the development environment are located in [CONTRIBUTING.md](https://github.com/danswer-ai/danswer/blob/main/CONTRIBUTING.md).
### Setting up Python
Ensure [Homebrew](https://brew.sh/) is already set up.
Then install python 3.11.
```bash
brew install python@3.11
```
Add python 3.11 to your path: add the following line to ~/.zshrc
```
export PATH="$(brew --prefix)/opt/python@3.11/libexec/bin:$PATH"
```
> **Note:**
> You will need to open a new terminal for the path change above to take effect.
### Setting up Docker
On macOS, you will need to install [Docker Desktop](https://www.docker.com/products/docker-desktop/) and
ensure it is running before continuing with the docker commands.
### Formatting and Linting
MacOS will likely require you to remove some quarantine attributes on some of the hooks for them to execute properly.
After installing pre-commit, run the following command:
```bash
sudo xattr -r -d com.apple.quarantine ~/.cache/pre-commit
```


@@ -1,6 +1,10 @@
MIT License
Copyright (c) 2023-present DanswerAI, Inc.
Copyright (c) 2023 Yuhong Sun, Chris Weaver
Portions of this software are licensed as follows:
* All content that resides under "ee" directories of this repository, if that directory exists, is licensed under the license defined in "backend/ee/LICENSE". Specifically all content under "backend/ee" and "web/src/app/ee" is licensed under the license defined in "backend/ee/LICENSE".
* All third party components incorporated into the Danswer Software are licensed under the original license provided by the owner of the applicable component.
* Content outside of the above mentioned directories or restrictions above is available under the "MIT Expat" license as defined below.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal


@@ -11,7 +11,7 @@
<a href="https://docs.danswer.dev/" target="_blank">
<img src="https://img.shields.io/badge/docs-view-blue" alt="Documentation">
</a>
<a href="https://join.slack.com/t/danswer/shared_invite/zt-2afut44lv-Rw3kSWu6_OmdAXRpCv80DQ" target="_blank">
<a href="https://join.slack.com/t/danswer/shared_invite/zt-2lcmqw703-071hBuZBfNEOGUsLa5PXvQ" target="_blank">
<img src="https://img.shields.io/badge/slack-join-blue.svg?logo=slack" alt="Slack">
</a>
<a href="https://discord.gg/TDJ59cGV2X" target="_blank">
@@ -105,5 +105,25 @@ Efficiently pulls the latest changes from:
* Websites
* And more ...
## 📚 Editions
There are two editions of Danswer:
* Danswer Community Edition (CE) is available freely under the MIT Expat license. This version has ALL the core features discussed above. This is the version of Danswer you will get if you follow the Deployment guide above.
* Danswer Enterprise Edition (EE) includes extra features that are primarily useful for larger organizations. Specifically, this includes:
* Single Sign-On (SSO), with support for both SAML and OIDC
* Role-based access control
* Document permission inheritance from connected sources
* Usage analytics and query history accessible to admins
* Whitelabeling
* API key authentication
* Encryption of secrets
* And many more! Check out [our website](https://www.danswer.ai/) for the latest.
To try the Danswer Enterprise Edition:
1. Check out our [Cloud product](https://app.danswer.ai/signup).
2. For self-hosting, contact us at [founders@danswer.ai](mailto:founders@danswer.ai) or book a call with us on our [Cal](https://cal.com/team/danswer/founders).
## 💡 Contributing
Looking to contribute? Please check out the [Contribution Guide](CONTRIBUTING.md) for more details.

backend/.gitignore

@@ -5,7 +5,7 @@ site_crawls/
.ipynb_checkpoints/
api_keys.py
*ipynb
.env
.env*
vespa-app.zip
dynamic_config_storage/
celerybeat-schedule*


@@ -1,15 +1,18 @@
FROM python:3.11.7-slim-bookworm
LABEL com.danswer.maintainer="founders@danswer.ai"
LABEL com.danswer.description="This image is for the backend of Danswer. It is MIT Licensed and \
free for all to use. You can find it at https://hub.docker.com/r/danswer/danswer-backend. For \
more details, visit https://github.com/danswer-ai/danswer."
LABEL com.danswer.description="This image is the web/frontend container of Danswer which \
contains code for both the Community and Enterprise editions of Danswer. If you do not \
have a contract or agreement with DanswerAI, you are not permitted to use the Enterprise \
Edition features outside of personal development or testing purposes. Please reach out to \
founders@danswer.ai for more information. Please visit https://github.com/danswer-ai/danswer"
# Default DANSWER_VERSION, typically overridden during builds by GitHub Actions.
ARG DANSWER_VERSION=0.3-dev
ENV DANSWER_VERSION=${DANSWER_VERSION}
RUN echo "DANSWER_VERSION: ${DANSWER_VERSION}"
ENV DANSWER_VERSION=${DANSWER_VERSION} \
DANSWER_RUNNING_IN_DOCKER="true"
RUN echo "DANSWER_VERSION: ${DANSWER_VERSION}"
# Install system dependencies
# cmake needed for psycopg (postgres)
# libpq-dev needed for psycopg (postgres)
@@ -17,18 +20,34 @@ RUN echo "DANSWER_VERSION: ${DANSWER_VERSION}"
# zip for Vespa step further down
# ca-certificates for HTTPS
RUN apt-get update && \
apt-get install -y cmake curl zip ca-certificates libgnutls30=3.7.9-2+deb12u2 \
libblkid1=2.38.1-5+deb12u1 libmount1=2.38.1-5+deb12u1 libsmartcols1=2.38.1-5+deb12u1 \
libuuid1=2.38.1-5+deb12u1 && \
apt-get install -y \
cmake \
curl \
zip \
ca-certificates \
libgnutls30=3.7.9-2+deb12u3 \
libblkid1=2.38.1-5+deb12u1 \
libmount1=2.38.1-5+deb12u1 \
libsmartcols1=2.38.1-5+deb12u1 \
libuuid1=2.38.1-5+deb12u1 \
libxmlsec1-dev \
pkg-config \
gcc && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean
# Install Python dependencies
# Remove py which is pulled in by retry, py is not needed and is a CVE
COPY ./requirements/default.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /tmp/requirements.txt && \
COPY ./requirements/ee.txt /tmp/ee-requirements.txt
RUN pip install --no-cache-dir --upgrade \
--retries 5 \
--timeout 30 \
-r /tmp/requirements.txt \
-r /tmp/ee-requirements.txt && \
pip uninstall -y py && \
playwright install chromium && playwright install-deps chromium && \
playwright install chromium && \
playwright install-deps chromium && \
ln -s /usr/local/bin/supervisord /usr/bin/supervisord
# Cleanup for CVEs and size reduction
@@ -36,29 +55,52 @@ RUN pip install --no-cache-dir --upgrade -r /tmp/requirements.txt && \
# xserver-common and xvfb included by playwright installation but not needed after
# perl-base is part of the base Python Debian image but not needed for Danswer functionality
# perl-base could only be removed with --allow-remove-essential
RUN apt-get remove -y --allow-remove-essential perl-base xserver-common xvfb cmake \
libldap-2.5-0 libldap-2.5-0 && \
RUN apt-get update && \
apt-get remove -y --allow-remove-essential \
perl-base \
xserver-common \
xvfb \
cmake \
libldap-2.5-0 \
libxmlsec1-dev \
pkg-config \
gcc && \
apt-get install -y libxmlsec1-openssl && \
apt-get autoremove -y && \
rm -rf /var/lib/apt/lists/* && \
rm /usr/local/lib/python3.11/site-packages/tornado/test/test.key
rm -f /usr/local/lib/python3.11/site-packages/tornado/test/test.key
# Pre-downloading models for setups with limited egress
RUN python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('intfloat/e5-base-v2')"
RUN python -c "from tokenizers import Tokenizer; \
Tokenizer.from_pretrained('nomic-ai/nomic-embed-text-v1')"
# Pre-downloading NLTK for setups with limited egress
RUN python -c "import nltk; \
nltk.download('stopwords', quiet=True); \
nltk.download('wordnet', quiet=True); \
nltk.download('punkt', quiet=True);"
# nltk.download('wordnet', quiet=True); introduce this back if lemmatization is needed
# Set up application files
WORKDIR /app
# Enterprise Version Files
COPY ./ee /app/ee
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# Set up application files
COPY ./danswer /app/danswer
COPY ./shared_configs /app/shared_configs
COPY ./alembic /app/alembic
COPY ./alembic.ini /app/alembic.ini
COPY supervisord.conf /usr/etc/supervisord.conf
# Escape hatch
COPY ./scripts/force_delete_connector_by_id.py /app/scripts/force_delete_connector_by_id.py
# Put logo in assets
COPY ./assets /app/assets
ENV PYTHONPATH /app
# Default command which does nothing


@@ -8,24 +8,38 @@ visit https://github.com/danswer-ai/danswer."
# Default DANSWER_VERSION, typically overridden during builds by GitHub Actions.
ARG DANSWER_VERSION=0.3-dev
ENV DANSWER_VERSION=${DANSWER_VERSION}
ENV DANSWER_VERSION=${DANSWER_VERSION} \
DANSWER_RUNNING_IN_DOCKER="true"
RUN echo "DANSWER_VERSION: ${DANSWER_VERSION}"
COPY ./requirements/model_server.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /tmp/requirements.txt
RUN pip install --no-cache-dir --upgrade \
--retries 5 \
--timeout 30 \
-r /tmp/requirements.txt
RUN apt-get remove -y --allow-remove-essential perl-base && \
apt-get autoremove -y
# Pre-downloading models for setups with limited egress
RUN python -c "from transformers import AutoModel, AutoTokenizer, TFDistilBertForSequenceClassification; \
from huggingface_hub import snapshot_download; \
AutoTokenizer.from_pretrained('danswer/intent-model'); \
AutoTokenizer.from_pretrained('intfloat/e5-base-v2'); \
# Download tokenizers, distilbert for the Danswer model
# Download model weights
# Run Nomic to pull in the custom architecture and have it cached locally
RUN python -c "from transformers import AutoTokenizer; \
AutoTokenizer.from_pretrained('distilbert-base-uncased'); \
AutoTokenizer.from_pretrained('mixedbread-ai/mxbai-rerank-xsmall-v1'); \
snapshot_download('danswer/intent-model'); \
snapshot_download('intfloat/e5-base-v2'); \
snapshot_download('mixedbread-ai/mxbai-rerank-xsmall-v1')"
from huggingface_hub import snapshot_download; \
snapshot_download(repo_id='danswer/hybrid-intent-token-classifier', revision='v1.0.3'); \
snapshot_download('nomic-ai/nomic-embed-text-v1'); \
snapshot_download('mixedbread-ai/mxbai-rerank-xsmall-v1'); \
from sentence_transformers import SentenceTransformer; \
SentenceTransformer(model_name_or_path='nomic-ai/nomic-embed-text-v1', trust_remote_code=True);"
# In case the user has volumes mounted to /root/.cache/huggingface that they've downloaded while
# running Danswer, don't overwrite it with the built in cache folder
RUN mv /root/.cache/huggingface /root/.cache/temp_huggingface
WORKDIR /app


@@ -8,6 +8,7 @@ from sqlalchemy import pool
from sqlalchemy.engine import Connection
from sqlalchemy.ext.asyncio import create_async_engine
from celery.backends.database.session import ResultModelBase # type: ignore
from sqlalchemy.schema import SchemaItem
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
@@ -15,7 +16,9 @@ config = context.config
# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
if config.config_file_name is not None and config.attributes.get(
"configure_logger", True
):
fileConfig(config.config_file_name)
# add your model's MetaData object here
@@ -29,6 +32,20 @@ target_metadata = [Base.metadata, ResultModelBase.metadata]
# my_important_option = config.get_main_option("my_important_option")
# ... etc.
EXCLUDE_TABLES = {"kombu_queue", "kombu_message"}
def include_object(
object: SchemaItem,
name: str,
type_: str,
reflected: bool,
compare_to: SchemaItem | None,
) -> bool:
if type_ == "table" and name in EXCLUDE_TABLES:
return False
return True
def run_migrations_offline() -> None:
"""Run migrations in 'offline' mode.
@@ -55,7 +72,11 @@ def run_migrations_offline() -> None:
def do_run_migrations(connection: Connection) -> None:
context.configure(connection=connection, target_metadata=target_metadata) # type: ignore
context.configure(
connection=connection,
target_metadata=target_metadata, # type: ignore
include_object=include_object,
) # type: ignore
with context.begin_transaction():
context.run_migrations()


@@ -0,0 +1,27 @@
"""Add thread specific model selection
Revision ID: 0568ccf46a6b
Revises: e209dc5a8156
Create Date: 2024-06-19 14:25:36.376046
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "0568ccf46a6b"
down_revision = "e209dc5a8156"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"chat_session",
sa.Column("current_alternate_model", sa.String(), nullable=True),
)
def downgrade() -> None:
op.drop_column("chat_session", "current_alternate_model")


@@ -0,0 +1,32 @@
"""add search doc relevance details
Revision ID: 05c07bf07c00
Revises: b896bbd0d5a7
Create Date: 2024-07-10 17:48:15.886653
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "05c07bf07c00"
down_revision = "b896bbd0d5a7"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"search_doc",
sa.Column("is_relevant", sa.Boolean(), nullable=True),
)
op.add_column(
"search_doc",
sa.Column("relevance_explanation", sa.String(), nullable=True),
)
def downgrade() -> None:
op.drop_column("search_doc", "relevance_explanation")
op.drop_column("search_doc", "is_relevant")


@@ -0,0 +1,26 @@
"""add_indexing_start_to_connector
Revision ID: 08a1eda20fe1
Revises: 8a87bd6ec550
Create Date: 2024-07-23 11:12:39.462397
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "08a1eda20fe1"
down_revision = "8a87bd6ec550"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"connector", sa.Column("indexing_start", sa.DateTime(), nullable=True)
)
def downgrade() -> None:
op.drop_column("connector", "indexing_start")

View File

@@ -0,0 +1,27 @@
"""add ccpair deletion failure message
Revision ID: 0ebb1d516877
Revises: 52a219fb5233
Create Date: 2024-09-10 15:03:48.233926
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "0ebb1d516877"
down_revision = "52a219fb5233"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"connector_credential_pair",
sa.Column("deletion_failure_message", sa.String(), nullable=True),
)
def downgrade() -> None:
op.drop_column("connector_credential_pair", "deletion_failure_message")

View File

@@ -0,0 +1,102 @@
"""add_user_delete_cascades
Revision ID: 1b8206b29c5d
Revises: 35e6853a51d5
Create Date: 2024-09-18 11:48:59.418726
"""
from alembic import op
# revision identifiers, used by Alembic.
revision = "1b8206b29c5d"
down_revision = "35e6853a51d5"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.drop_constraint("credential_user_id_fkey", "credential", type_="foreignkey")
op.create_foreign_key(
"credential_user_id_fkey",
"credential",
"user",
["user_id"],
["id"],
ondelete="CASCADE",
)
op.drop_constraint("chat_session_user_id_fkey", "chat_session", type_="foreignkey")
op.create_foreign_key(
"chat_session_user_id_fkey",
"chat_session",
"user",
["user_id"],
["id"],
ondelete="CASCADE",
)
op.drop_constraint("chat_folder_user_id_fkey", "chat_folder", type_="foreignkey")
op.create_foreign_key(
"chat_folder_user_id_fkey",
"chat_folder",
"user",
["user_id"],
["id"],
ondelete="CASCADE",
)
op.drop_constraint("prompt_user_id_fkey", "prompt", type_="foreignkey")
op.create_foreign_key(
"prompt_user_id_fkey", "prompt", "user", ["user_id"], ["id"], ondelete="CASCADE"
)
op.drop_constraint("notification_user_id_fkey", "notification", type_="foreignkey")
op.create_foreign_key(
"notification_user_id_fkey",
"notification",
"user",
["user_id"],
["id"],
ondelete="CASCADE",
)
op.drop_constraint("inputprompt_user_id_fkey", "inputprompt", type_="foreignkey")
op.create_foreign_key(
"inputprompt_user_id_fkey",
"inputprompt",
"user",
["user_id"],
["id"],
ondelete="CASCADE",
)
def downgrade() -> None:
op.drop_constraint("credential_user_id_fkey", "credential", type_="foreignkey")
op.create_foreign_key(
"credential_user_id_fkey", "credential", "user", ["user_id"], ["id"]
)
op.drop_constraint("chat_session_user_id_fkey", "chat_session", type_="foreignkey")
op.create_foreign_key(
"chat_session_user_id_fkey", "chat_session", "user", ["user_id"], ["id"]
)
op.drop_constraint("chat_folder_user_id_fkey", "chat_folder", type_="foreignkey")
op.create_foreign_key(
"chat_folder_user_id_fkey", "chat_folder", "user", ["user_id"], ["id"]
)
op.drop_constraint("prompt_user_id_fkey", "prompt", type_="foreignkey")
op.create_foreign_key("prompt_user_id_fkey", "prompt", "user", ["user_id"], ["id"])
op.drop_constraint("notification_user_id_fkey", "notification", type_="foreignkey")
op.create_foreign_key(
"notification_user_id_fkey", "notification", "user", ["user_id"], ["id"]
)
op.drop_constraint("inputprompt_user_id_fkey", "inputprompt", type_="foreignkey")
op.create_foreign_key(
"inputprompt_user_id_fkey", "inputprompt", "user", ["user_id"], ["id"]
)
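The net effect of the upgrade is that deleting a row from "user" now cascades through these six tables instead of failing with a foreign key violation. A minimal sanity check against the migrated database — the DSN is hypothetical, and other tables that reference these six may still block the delete in practice:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://localhost/danswer")  # hypothetical DSN
with engine.begin() as conn:
    user_id = conn.execute(text('SELECT id FROM "user" LIMIT 1')).scalar()
    conn.execute(text('DELETE FROM "user" WHERE id = :id'), {"id": user_id})
    # credentials, chat sessions, folders, prompts, notifications and input
    # prompts owned by that user are removed in the same statement
    remaining = conn.execute(
        text("SELECT count(*) FROM chat_session WHERE user_id = :id"), {"id": user_id}
    ).scalar()
    assert remaining == 0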

View File

@@ -0,0 +1,135 @@
"""embedding model -> search settings
Revision ID: 1f60f60c3401
Revises: f17bf3b0d9f1
Create Date: 2024-08-25 12:39:51.731632
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from danswer.configs.chat_configs import NUM_POSTPROCESSED_RESULTS
# revision identifiers, used by Alembic.
revision = "1f60f60c3401"
down_revision = "f17bf3b0d9f1"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.drop_constraint(
"index_attempt__embedding_model_fk", "index_attempt", type_="foreignkey"
)
# Rename the table
op.rename_table("embedding_model", "search_settings")
# Add new columns
op.add_column(
"search_settings",
sa.Column(
"multipass_indexing", sa.Boolean(), nullable=False, server_default="false"
),
)
op.add_column(
"search_settings",
sa.Column(
"multilingual_expansion",
postgresql.ARRAY(sa.String()),
nullable=False,
server_default="{}",
),
)
op.add_column(
"search_settings",
sa.Column(
"disable_rerank_for_streaming",
sa.Boolean(),
nullable=False,
server_default="false",
),
)
op.add_column(
"search_settings", sa.Column("rerank_model_name", sa.String(), nullable=True)
)
op.add_column(
"search_settings", sa.Column("rerank_provider_type", sa.String(), nullable=True)
)
op.add_column(
"search_settings", sa.Column("rerank_api_key", sa.String(), nullable=True)
)
op.add_column(
"search_settings",
sa.Column(
"num_rerank",
sa.Integer(),
nullable=False,
server_default=str(NUM_POSTPROCESSED_RESULTS),
),
)
# Add the new column as nullable initially
op.add_column(
"index_attempt", sa.Column("search_settings_id", sa.Integer(), nullable=True)
)
# Populate the new column with data from the existing embedding_model_id
op.execute("UPDATE index_attempt SET search_settings_id = embedding_model_id")
# Create the foreign key constraint
op.create_foreign_key(
"fk_index_attempt_search_settings",
"index_attempt",
"search_settings",
["search_settings_id"],
["id"],
)
# Make the new column non-nullable
op.alter_column("index_attempt", "search_settings_id", nullable=False)
# Drop the old embedding_model_id column
op.drop_column("index_attempt", "embedding_model_id")
def downgrade() -> None:
# Add back the embedding_model_id column
op.add_column(
"index_attempt", sa.Column("embedding_model_id", sa.Integer(), nullable=True)
)
# Populate the old column with data from search_settings_id
op.execute("UPDATE index_attempt SET embedding_model_id = search_settings_id")
# Make the old column non-nullable
op.alter_column("index_attempt", "embedding_model_id", nullable=False)
# Drop the foreign key constraint
op.drop_constraint(
"fk_index_attempt_search_settings", "index_attempt", type_="foreignkey"
)
# Drop the new search_settings_id column
op.drop_column("index_attempt", "search_settings_id")
# Rename the table back
op.rename_table("search_settings", "embedding_model")
# Remove added columns
op.drop_column("embedding_model", "num_rerank")
op.drop_column("embedding_model", "rerank_api_key")
op.drop_column("embedding_model", "rerank_provider_type")
op.drop_column("embedding_model", "rerank_model_name")
op.drop_column("embedding_model", "disable_rerank_for_streaming")
op.drop_column("embedding_model", "multilingual_expansion")
op.drop_column("embedding_model", "multipass_indexing")
op.create_foreign_key(
"index_attempt__embedding_model_fk",
"index_attempt",
"embedding_model",
["embedding_model_id"],
["id"],
)
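The upgrade above follows the usual expand-and-contract recipe for swapping a column without data loss: add the replacement as nullable, backfill it, attach the constraint, tighten to NOT NULL, then drop the old column. A generic sketch of the same recipe as a reusable helper — table and column names are hypothetical, and it only works inside a migration where the Alembic op context is active:

import sqlalchemy as sa
from alembic import op

def swap_fk_column(table: str, old_col: str, new_col: str, ref_table: str) -> None:
    # expand: add the replacement column as nullable
    op.add_column(table, sa.Column(new_col, sa.Integer(), nullable=True))
    # backfill it from the old column
    op.execute(f"UPDATE {table} SET {new_col} = {old_col}")
    # attach the constraint, then tighten to NOT NULL
    op.create_foreign_key(f"fk_{table}_{new_col}", table, ref_table, [new_col], ["id"])
    op.alter_column(table, new_col, nullable=False)
    # contract: drop the old column
    op.drop_column(table, old_col)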

View File

@@ -0,0 +1,44 @@
"""notifications
Revision ID: 213fd978c6d8
Revises: 5fc1f54cc252
Create Date: 2024-08-10 11:13:36.070790
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "213fd978c6d8"
down_revision = "5fc1f54cc252"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"notification",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column(
"notif_type",
sa.String(),
nullable=False,
),
sa.Column(
"user_id",
sa.UUID(),
nullable=True,
),
sa.Column("dismissed", sa.Boolean(), nullable=False),
sa.Column("last_shown", sa.DateTime(timezone=True), nullable=False),
sa.Column("first_shown", sa.DateTime(timezone=True), nullable=False),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
)
def downgrade() -> None:
op.drop_table("notification")

View File

@@ -0,0 +1,86 @@
"""remove-feedback-foreignkey-constraint
Revision ID: 23957775e5f5
Revises: bc9771dccadf
Create Date: 2024-06-27 16:04:51.480437
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "23957775e5f5"
down_revision = "bc9771dccadf"
branch_labels = None # type: ignore
depends_on = None # type: ignore
def upgrade() -> None:
op.drop_constraint(
"chat_feedback__chat_message_fk", "chat_feedback", type_="foreignkey"
)
op.create_foreign_key(
"chat_feedback__chat_message_fk",
"chat_feedback",
"chat_message",
["chat_message_id"],
["id"],
ondelete="SET NULL",
)
op.alter_column(
"chat_feedback", "chat_message_id", existing_type=sa.Integer(), nullable=True
)
op.drop_constraint(
"document_retrieval_feedback__chat_message_fk",
"document_retrieval_feedback",
type_="foreignkey",
)
op.create_foreign_key(
"document_retrieval_feedback__chat_message_fk",
"document_retrieval_feedback",
"chat_message",
["chat_message_id"],
["id"],
ondelete="SET NULL",
)
op.alter_column(
"document_retrieval_feedback",
"chat_message_id",
existing_type=sa.Integer(),
nullable=True,
)
def downgrade() -> None:
op.alter_column(
"chat_feedback", "chat_message_id", existing_type=sa.Integer(), nullable=False
)
op.drop_constraint(
"chat_feedback__chat_message_fk", "chat_feedback", type_="foreignkey"
)
op.create_foreign_key(
"chat_feedback__chat_message_fk",
"chat_feedback",
"chat_message",
["chat_message_id"],
["id"],
)
op.alter_column(
"document_retrieval_feedback",
"chat_message_id",
existing_type=sa.Integer(),
nullable=False,
)
op.drop_constraint(
"document_retrieval_feedback__chat_message_fk",
"document_retrieval_feedback",
type_="foreignkey",
)
op.create_foreign_key(
"document_retrieval_feedback__chat_message_fk",
"document_retrieval_feedback",
"chat_message",
["chat_message_id"],
["id"],
)
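Both foreign keys now carry ON DELETE SET NULL, and the referencing columns are made nullable so the rule can actually fire; feedback rows therefore outlive the chat messages they point at. A quick way to confirm the rule landed, assuming a session against the migrated schema (hypothetical DSN):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://localhost/danswer")  # hypothetical DSN
with engine.connect() as conn:
    rule = conn.execute(text(
        """
        SELECT delete_rule FROM information_schema.referential_constraints
        WHERE constraint_name = 'chat_feedback__chat_message_fk'
        """
    )).scalar()
    assert rule == "SET NULL"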

View File

@@ -160,12 +160,28 @@ def downgrade() -> None:
nullable=False,
),
)
-    op.drop_constraint(
-        "fk_index_attempt_credential_id", "index_attempt", type_="foreignkey"
-    )
-    op.drop_constraint(
-        "fk_index_attempt_connector_id", "index_attempt", type_="foreignkey"
-    )
+    # Check if the constraint exists before dropping
+    conn = op.get_bind()
+    inspector = sa.inspect(conn)
+    constraints = inspector.get_foreign_keys("index_attempt")
+    if any(
+        constraint["name"] == "fk_index_attempt_credential_id"
+        for constraint in constraints
+    ):
+        op.drop_constraint(
+            "fk_index_attempt_credential_id", "index_attempt", type_="foreignkey"
+        )
+    if any(
+        constraint["name"] == "fk_index_attempt_connector_id"
+        for constraint in constraints
+    ):
+        op.drop_constraint(
+            "fk_index_attempt_connector_id", "index_attempt", type_="foreignkey"
+        )
op.drop_column("index_attempt", "credential_id")
op.drop_column("index_attempt", "connector_id")
op.drop_table("connector_credential_pair")

View File

@@ -0,0 +1,32 @@
"""Add Above Below to Persona
Revision ID: 2d2304e27d8c
Revises: 4b08d97e175a
Create Date: 2024-08-21 19:15:15.762948
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "2d2304e27d8c"
down_revision = "4b08d97e175a"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column("persona", sa.Column("chunks_above", sa.Integer(), nullable=True))
op.add_column("persona", sa.Column("chunks_below", sa.Integer(), nullable=True))
op.execute(
"UPDATE persona SET chunks_above = 1, chunks_below = 1 WHERE chunks_above IS NULL AND chunks_below IS NULL"
)
op.alter_column("persona", "chunks_above", nullable=False)
op.alter_column("persona", "chunks_below", nullable=False)
def downgrade() -> None:
op.drop_column("persona", "chunks_below")
op.drop_column("persona", "chunks_above")

View File

@@ -0,0 +1,70 @@
"""Add icon_color and icon_shape to Persona
Revision ID: 325975216eb3
Revises: 91ffac7e65b3
Create Date: 2024-07-24 21:29:31.784562
"""
import random
from alembic import op
import sqlalchemy as sa
from sqlalchemy.sql import table, column, select
# revision identifiers, used by Alembic.
revision = "325975216eb3"
down_revision = "91ffac7e65b3"
branch_labels: None = None
depends_on: None = None
colorOptions = [
"#FF6FBF",
"#6FB1FF",
"#B76FFF",
"#FFB56F",
"#6FFF8D",
"#FF6F6F",
"#6FFFFF",
]
# Function to generate a random shape ensuring at least 3 of the middle 4 squares are filled
def generate_random_shape() -> int:
center_squares = [12, 10, 6, 14, 13, 11, 7, 15]
center_fill = random.choice(center_squares)
remaining_squares = [i for i in range(16) if not (center_fill & (1 << i))]
random.shuffle(remaining_squares)
for i in range(10 - bin(center_fill).count("1")):
center_fill |= 1 << remaining_squares[i]
return center_fill
def upgrade() -> None:
op.add_column("persona", sa.Column("icon_color", sa.String(), nullable=True))
op.add_column("persona", sa.Column("icon_shape", sa.Integer(), nullable=True))
op.add_column("persona", sa.Column("uploaded_image_id", sa.String(), nullable=True))
persona = table(
"persona",
column("id", sa.Integer),
column("icon_color", sa.String),
column("icon_shape", sa.Integer),
)
conn = op.get_bind()
personas = conn.execute(select(persona.c.id))
for persona_id in personas:
random_color = random.choice(colorOptions)
random_shape = generate_random_shape()
conn.execute(
persona.update()
.where(persona.c.id == persona_id[0])
.values(icon_color=random_color, icon_shape=random_shape)
)
def downgrade() -> None:
op.drop_column("persona", "icon_shape")
op.drop_column("persona", "uploaded_image_id")
op.drop_column("persona", "icon_color")

View File

@@ -0,0 +1,90 @@
"""Add curator fields
Revision ID: 351faebd379d
Revises: ee3f4b47fad5
Create Date: 2024-08-15 22:37:08.397052
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "351faebd379d"
down_revision = "ee3f4b47fad5"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
# Add is_curator column to User__UserGroup table
op.add_column(
"user__user_group",
sa.Column("is_curator", sa.Boolean(), nullable=False, server_default="false"),
)
# Use batch mode to modify the enum type
with op.batch_alter_table("user", schema=None) as batch_op:
batch_op.alter_column( # type: ignore[attr-defined]
"role",
type_=sa.Enum(
"BASIC",
"ADMIN",
"CURATOR",
"GLOBAL_CURATOR",
name="userrole",
native_enum=False,
),
existing_type=sa.Enum("BASIC", "ADMIN", name="userrole", native_enum=False),
existing_nullable=False,
)
# Create the association table
op.create_table(
"credential__user_group",
sa.Column("credential_id", sa.Integer(), nullable=False),
sa.Column("user_group_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["credential_id"],
["credential.id"],
),
sa.ForeignKeyConstraint(
["user_group_id"],
["user_group.id"],
),
sa.PrimaryKeyConstraint("credential_id", "user_group_id"),
)
op.add_column(
"credential",
sa.Column(
"curator_public", sa.Boolean(), nullable=False, server_default="false"
),
)
def downgrade() -> None:
# Update existing records to ensure they fit within the BASIC/ADMIN roles
op.execute(
"UPDATE \"user\" SET role = 'ADMIN' WHERE role IN ('CURATOR', 'GLOBAL_CURATOR')"
)
# Remove is_curator column from User__UserGroup table
op.drop_column("user__user_group", "is_curator")
with op.batch_alter_table("user", schema=None) as batch_op:
batch_op.alter_column( # type: ignore[attr-defined]
"role",
type_=sa.Enum(
"BASIC", "ADMIN", name="userrole", native_enum=False, length=20
),
existing_type=sa.Enum(
"BASIC",
"ADMIN",
"CURATOR",
"GLOBAL_CURATOR",
name="userrole",
native_enum=False,
),
existing_nullable=False,
)
# Drop the association table
op.drop_table("credential__user_group")
op.drop_column("credential", "curator_public")

View File

@@ -0,0 +1,64 @@
"""server default chosen assistants
Revision ID: 35e6853a51d5
Revises: c99d76fcd298
Create Date: 2024-09-13 13:20:32.885317
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "35e6853a51d5"
down_revision = "c99d76fcd298"
branch_labels = None
depends_on = None
DEFAULT_ASSISTANTS = [-2, -1, 0]
def upgrade() -> None:
    # Step 1: Update any NULL values to the default value.
    # This backfills existing users without an ordered assistant list so that
    # their default assistants become the visible assistants that are
    # accessible to them.
op.execute(
"""
UPDATE "user" u
SET chosen_assistants = (
SELECT jsonb_agg(
p.id ORDER BY
COALESCE(p.display_priority, 2147483647) ASC,
p.id ASC
)
FROM persona p
LEFT JOIN persona__user pu ON p.id = pu.persona_id AND pu.user_id = u.id
WHERE p.is_visible = true
AND (p.is_public = true OR pu.user_id IS NOT NULL)
)
WHERE chosen_assistants IS NULL
OR chosen_assistants = 'null'
OR jsonb_typeof(chosen_assistants) = 'null'
OR (jsonb_typeof(chosen_assistants) = 'string' AND chosen_assistants = '"null"')
"""
)
# Step 2: Alter the column to make it non-nullable
op.alter_column(
"user",
"chosen_assistants",
type_=postgresql.JSONB(astext_type=sa.Text()),
nullable=False,
server_default=sa.text(f"'{DEFAULT_ASSISTANTS}'::jsonb"),
)
def downgrade() -> None:
op.alter_column(
"user",
"chosen_assistants",
type_=postgresql.JSONB(astext_type=sa.Text()),
nullable=True,
server_default=None,
)
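The WHERE clause in Step 1 guards against three distinct "null" shapes that can hide in a JSONB column: SQL NULL, the JSON null literal, and the JSON string "null". A scratch check that tells them apart, assuming any Postgres connection (hypothetical DSN):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://localhost/danswer")  # hypothetical DSN
with engine.connect() as conn:
    for literal in ("NULL::jsonb", "'null'::jsonb", "'\"null\"'::jsonb"):
        kind = conn.execute(text(f"SELECT jsonb_typeof({literal})")).scalar()
        print(literal, "->", kind)  # None, 'null', 'string' respectively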

View File

@@ -0,0 +1,35 @@
"""add alternate assistant to chat message
Revision ID: 3a7802814195
Revises: 23957775e5f5
Create Date: 2024-06-05 11:18:49.966333
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "3a7802814195"
down_revision = "23957775e5f5"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"chat_message", sa.Column("alternate_assistant_id", sa.Integer(), nullable=True)
)
op.create_foreign_key(
"fk_chat_message_persona",
"chat_message",
"persona",
["alternate_assistant_id"],
["id"],
)
def downgrade() -> None:
op.drop_constraint("fk_chat_message_persona", "chat_message", type_="foreignkey")
op.drop_column("chat_message", "alternate_assistant_id")

View File

@@ -0,0 +1,42 @@
"""Rename index_origin to index_recursively
Revision ID: 1d6ad76d1f37
Revises: e1392f05e840
Create Date: 2024-08-01 12:38:54.466081
"""
from alembic import op
# revision identifiers, used by Alembic.
revision = "1d6ad76d1f37"
down_revision = "e1392f05e840"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.execute(
"""
UPDATE connector
SET connector_specific_config = jsonb_set(
connector_specific_config,
'{index_recursively}',
'true'::jsonb
) - 'index_origin'
WHERE connector_specific_config ? 'index_origin'
"""
)
def downgrade() -> None:
op.execute(
"""
UPDATE connector
SET connector_specific_config = jsonb_set(
connector_specific_config,
'{index_origin}',
connector_specific_config->'index_recursively'
) - 'index_recursively'
WHERE connector_specific_config ? 'index_recursively'
"""
)

View File

@@ -0,0 +1,65 @@
"""add cloud embedding model and update embedding_model
Revision ID: 44f856ae2a4a
Revises: d716b0791ddd
Create Date: 2024-06-28 20:01:05.927647
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "44f856ae2a4a"
down_revision = "d716b0791ddd"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
# Create embedding_provider table
op.create_table(
"embedding_provider",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("name", sa.String(), nullable=False),
sa.Column("api_key", sa.LargeBinary(), nullable=True),
sa.Column("default_model_id", sa.Integer(), nullable=True),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint("name"),
)
# Add cloud_provider_id to embedding_model table
op.add_column(
"embedding_model", sa.Column("cloud_provider_id", sa.Integer(), nullable=True)
)
# Add foreign key constraints
op.create_foreign_key(
"fk_embedding_model_cloud_provider",
"embedding_model",
"embedding_provider",
["cloud_provider_id"],
["id"],
)
op.create_foreign_key(
"fk_embedding_provider_default_model",
"embedding_provider",
"embedding_model",
["default_model_id"],
["id"],
)
def downgrade() -> None:
# Remove foreign key constraints
op.drop_constraint(
"fk_embedding_model_cloud_provider", "embedding_model", type_="foreignkey"
)
op.drop_constraint(
"fk_embedding_provider_default_model", "embedding_provider", type_="foreignkey"
)
# Remove cloud_provider_id column
op.drop_column("embedding_model", "cloud_provider_id")
# Drop embedding_provider table
op.drop_table("embedding_provider")

View File

@@ -0,0 +1,23 @@
"""added is_internet to DBDoc
Revision ID: 4505fd7302e1
Revises: c18cdf4b497e
Create Date: 2024-06-18 20:46:09.095034
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "4505fd7302e1"
down_revision = "c18cdf4b497e"
def upgrade() -> None:
op.add_column("search_doc", sa.Column("is_internet", sa.Boolean(), nullable=True))
op.add_column("tool", sa.Column("display_name", sa.String(), nullable=True))
def downgrade() -> None:
op.drop_column("tool", "display_name")
op.drop_column("search_doc", "is_internet")

View File

@@ -0,0 +1,49 @@
"""Add display_model_names to llm_provider
Revision ID: 473a1a7ca408
Revises: 325975216eb3
Create Date: 2024-07-25 14:31:02.002917
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "473a1a7ca408"
down_revision = "325975216eb3"
branch_labels: None = None
depends_on: None = None
default_models_by_provider = {
"openai": ["gpt-4", "gpt-4o", "gpt-4o-mini"],
"bedrock": [
"meta.llama3-1-70b-instruct-v1:0",
"meta.llama3-1-8b-instruct-v1:0",
"anthropic.claude-3-opus-20240229-v1:0",
"mistral.mistral-large-2402-v1:0",
"anthropic.claude-3-5-sonnet-20240620-v1:0",
],
"anthropic": ["claude-3-opus-20240229", "claude-3-5-sonnet-20240620"],
}
def upgrade() -> None:
op.add_column(
"llm_provider",
sa.Column("display_model_names", postgresql.ARRAY(sa.String()), nullable=True),
)
connection = op.get_bind()
for provider, models in default_models_by_provider.items():
connection.execute(
sa.text(
"UPDATE llm_provider SET display_model_names = :models WHERE provider = :provider"
),
{"models": models, "provider": provider},
)
def downgrade() -> None:
op.drop_column("llm_provider", "display_model_names")

View File

@@ -0,0 +1,61 @@
"""Add support for custom tools
Revision ID: 48d14957fe80
Revises: b85f02ec1308
Create Date: 2024-06-09 14:58:19.946509
"""
from alembic import op
import fastapi_users_db_sqlalchemy
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "48d14957fe80"
down_revision = "b85f02ec1308"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"tool",
sa.Column(
"openapi_schema",
postgresql.JSONB(astext_type=sa.Text()),
nullable=True,
),
)
op.add_column(
"tool",
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=True,
),
)
op.create_foreign_key("tool_user_fk", "tool", "user", ["user_id"], ["id"])
op.create_table(
"tool_call",
sa.Column("id", sa.Integer(), primary_key=True),
sa.Column("tool_id", sa.Integer(), nullable=False),
sa.Column("tool_name", sa.String(), nullable=False),
sa.Column(
"tool_arguments", postgresql.JSONB(astext_type=sa.Text()), nullable=False
),
sa.Column(
"tool_result", postgresql.JSONB(astext_type=sa.Text()), nullable=False
),
sa.Column(
"message_id", sa.Integer(), sa.ForeignKey("chat_message.id"), nullable=False
),
)
def downgrade() -> None:
op.drop_table("tool_call")
op.drop_constraint("tool_user_fk", "tool", type_="foreignkey")
op.drop_column("tool", "user_id")
op.drop_column("tool", "openapi_schema")

View File

@@ -0,0 +1,80 @@
"""Moved status to connector credential pair
Revision ID: 4a951134c801
Revises: 7477a5f5d728
Create Date: 2024-08-10 19:20:34.527559
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "4a951134c801"
down_revision = "7477a5f5d728"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"connector_credential_pair",
sa.Column(
"status",
sa.Enum(
"ACTIVE",
"PAUSED",
"DELETING",
name="connectorcredentialpairstatus",
native_enum=False,
),
nullable=True,
),
)
# Update status of connector_credential_pair based on connector's disabled status
op.execute(
"""
UPDATE connector_credential_pair
SET status = CASE
WHEN (
SELECT disabled
FROM connector
WHERE connector.id = connector_credential_pair.connector_id
) = FALSE THEN 'ACTIVE'
ELSE 'PAUSED'
END
"""
)
# Make the status column not nullable after setting values
op.alter_column("connector_credential_pair", "status", nullable=False)
op.drop_column("connector", "disabled")
def downgrade() -> None:
op.add_column(
"connector",
sa.Column("disabled", sa.BOOLEAN(), autoincrement=False, nullable=True),
)
# Update disabled status of connector based on connector_credential_pair's status
op.execute(
"""
UPDATE connector
SET disabled = CASE
WHEN EXISTS (
SELECT 1
FROM connector_credential_pair
WHERE connector_credential_pair.connector_id = connector.id
AND connector_credential_pair.status = 'ACTIVE'
) THEN FALSE
ELSE TRUE
END
"""
)
# Make the disabled column not nullable after setting values
op.alter_column("connector", "disabled", nullable=False)
op.drop_column("connector_credential_pair", "status")

View File

@@ -0,0 +1,34 @@
"""change default prune_freq
Revision ID: 4b08d97e175a
Revises: d9ec13955951
Create Date: 2024-08-20 15:28:52.993827
"""
from alembic import op
# revision identifiers, used by Alembic.
revision = "4b08d97e175a"
down_revision = "d9ec13955951"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.execute(
"""
UPDATE connector
SET prune_freq = 2592000
WHERE prune_freq = 86400
"""
)
def downgrade() -> None:
op.execute(
"""
UPDATE connector
SET prune_freq = 86400
WHERE prune_freq = 2592000
"""
)

View File

@@ -0,0 +1,72 @@
"""Add type to credentials
Revision ID: 4ea2c93919c1
Revises: 473a1a7ca408
Create Date: 2024-07-18 13:07:13.655895
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "4ea2c93919c1"
down_revision = "473a1a7ca408"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
# Add the new 'source' column to the 'credential' table
op.add_column(
"credential",
sa.Column(
"source",
sa.String(length=100), # Use String instead of Enum
nullable=True, # Initially allow NULL values
),
)
op.add_column(
"credential",
sa.Column(
"name",
sa.String(),
nullable=True,
),
)
# Create a temporary table that maps each credential to a single connector source.
# This is needed because a credential can be associated with multiple connectors,
# but we want to assign a single source to each credential.
# We use DISTINCT ON to ensure we only get one row per credential_id.
op.execute(
"""
CREATE TEMPORARY TABLE temp_connector_credential AS
SELECT DISTINCT ON (cc.credential_id)
cc.credential_id,
c.source AS connector_source
FROM connector_credential_pair cc
JOIN connector c ON cc.connector_id = c.id
"""
)
# Update the 'source' column in the 'credential' table
op.execute(
"""
UPDATE credential cred
SET source = COALESCE(
(SELECT connector_source
FROM temp_connector_credential temp
WHERE cred.id = temp.credential_id),
'NOT_APPLICABLE'
)
"""
)
# If no exception was raised, alter the column
op.alter_column("credential", "source", nullable=True) # TODO modify
# # ### end Alembic commands ###
def downgrade() -> None:
op.drop_column("credential", "source")
op.drop_column("credential", "name")

View File

@@ -0,0 +1,66 @@
"""Add last synced and last modified to document table
Revision ID: 52a219fb5233
Revises: f7e58d357687
Create Date: 2024-08-28 17:40:46.077470
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.sql import func
# revision identifiers, used by Alembic.
revision = "52a219fb5233"
down_revision = "f7e58d357687"
branch_labels = None
depends_on = None
def upgrade() -> None:
    # last_modified represents the last time anything that needs syncing to Vespa
    # changed, including row metadata and the document itself. It deliberately does
    # not include the last_synced column.
op.add_column(
"document",
sa.Column(
"last_modified",
sa.DateTime(timezone=True),
nullable=False,
server_default=func.now(),
),
)
# last synced represents the last time this document was synced to Vespa
op.add_column(
"document",
sa.Column("last_synced", sa.DateTime(timezone=True), nullable=True),
)
# Set last_synced to the same value as last_modified for existing rows
op.execute(
"""
UPDATE document
SET last_synced = last_modified
"""
)
op.create_index(
op.f("ix_document_last_modified"),
"document",
["last_modified"],
unique=False,
)
op.create_index(
op.f("ix_document_last_synced"),
"document",
["last_synced"],
unique=False,
)
def downgrade() -> None:
op.drop_index(op.f("ix_document_last_synced"), table_name="document")
op.drop_index(op.f("ix_document_last_modified"), table_name="document")
op.drop_column("document", "last_synced")
op.drop_column("document", "last_modified")

View File

@@ -0,0 +1,79 @@
"""assistant_rework
Revision ID: 55546a7967ee
Revises: 61ff3651add4
Create Date: 2024-09-18 17:00:23.755399
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "55546a7967ee"
down_revision = "61ff3651add4"
branch_labels = None
depends_on = None
def upgrade() -> None:
# Reworking persona and user tables for new assistant features
# keep track of user's chosen assistants separate from their `ordering`
op.add_column("persona", sa.Column("builtin_persona", sa.Boolean(), nullable=True))
op.execute("UPDATE persona SET builtin_persona = default_persona")
op.alter_column("persona", "builtin_persona", nullable=False)
op.drop_index("_default_persona_name_idx", table_name="persona")
op.create_index(
"_builtin_persona_name_idx",
"persona",
["name"],
unique=True,
postgresql_where=sa.text("builtin_persona = true"),
)
op.add_column(
"user", sa.Column("visible_assistants", postgresql.JSONB(), nullable=True)
)
op.add_column(
"user", sa.Column("hidden_assistants", postgresql.JSONB(), nullable=True)
)
op.execute(
"UPDATE \"user\" SET visible_assistants = '[]'::jsonb, hidden_assistants = '[]'::jsonb"
)
op.alter_column(
"user",
"visible_assistants",
nullable=False,
server_default=sa.text("'[]'::jsonb"),
)
op.alter_column(
"user",
"hidden_assistants",
nullable=False,
server_default=sa.text("'[]'::jsonb"),
)
op.drop_column("persona", "default_persona")
op.add_column(
"persona", sa.Column("is_default_persona", sa.Boolean(), nullable=True)
)
def downgrade() -> None:
# Reverting changes made in upgrade
op.drop_column("user", "hidden_assistants")
op.drop_column("user", "visible_assistants")
op.drop_index("_builtin_persona_name_idx", table_name="persona")
op.drop_column("persona", "is_default_persona")
op.add_column("persona", sa.Column("default_persona", sa.Boolean(), nullable=True))
op.execute("UPDATE persona SET default_persona = builtin_persona")
op.alter_column("persona", "default_persona", nullable=False)
op.drop_column("persona", "builtin_persona")
op.create_index(
"_default_persona_name_idx",
"persona",
["name"],
unique=True,
postgresql_where=sa.text("default_persona = true"),
)
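Note the switch to a partial unique index: name uniqueness is enforced only WHERE builtin_persona = true, so user-created assistants may share names while built-in ones may not. A scratch way to inspect the index definition after the upgrade, assuming a session against the migrated schema (hypothetical DSN):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://localhost/danswer")  # hypothetical DSN
with engine.connect() as conn:
    indexdef = conn.execute(text(
        "SELECT indexdef FROM pg_indexes WHERE indexname = '_builtin_persona_name_idx'"
    )).scalar()
    print(indexdef)  # ... UNIQUE ... ON persona ... WHERE (builtin_persona = true)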

View File

@@ -0,0 +1,35 @@
"""match_any_keywords flag for standard answers
Revision ID: 5c7fdadae813
Revises: efb35676026c
Create Date: 2024-09-13 18:52:59.256478
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "5c7fdadae813"
down_revision = "efb35676026c"
branch_labels = None
depends_on = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.add_column(
"standard_answer",
sa.Column(
"match_any_keywords",
sa.Boolean(),
nullable=False,
server_default=sa.false(),
),
)
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.drop_column("standard_answer", "match_any_keywords")
# ### end Alembic commands ###

View File

@@ -0,0 +1,25 @@
"""hybrid-enum
Revision ID: 5fc1f54cc252
Revises: 1d6ad76d1f37
Create Date: 2024-08-06 15:35:40.278485
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "5fc1f54cc252"
down_revision = "1d6ad76d1f37"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.drop_column("persona", "search_type")
def downgrade() -> None:
op.add_column("persona", sa.Column("search_type", sa.String(), nullable=True))
op.execute("UPDATE persona SET search_type = 'SEMANTIC'")
op.alter_column("persona", "search_type", nullable=False)

View File

@@ -0,0 +1,162 @@
"""Add Permission Syncing
Revision ID: 61ff3651add4
Revises: 1b8206b29c5d
Create Date: 2024-09-05 13:57:11.770413
"""
import fastapi_users_db_sqlalchemy
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "61ff3651add4"
down_revision = "1b8206b29c5d"
branch_labels = None
depends_on = None
def upgrade() -> None:
    # The admin user who set up the connectors will temporarily lose access to the docs;
    # currently the only way to restore access is to rerun the permission sync from the beginning
op.add_column(
"connector_credential_pair",
sa.Column(
"access_type",
sa.String(),
nullable=True,
),
)
op.execute(
"UPDATE connector_credential_pair SET access_type = 'PUBLIC' WHERE is_public = true"
)
op.execute(
"UPDATE connector_credential_pair SET access_type = 'PRIVATE' WHERE is_public = false"
)
op.alter_column("connector_credential_pair", "access_type", nullable=False)
op.add_column(
"connector_credential_pair",
sa.Column(
"auto_sync_options",
postgresql.JSONB(astext_type=sa.Text()),
nullable=True,
),
)
op.add_column(
"connector_credential_pair",
sa.Column("last_time_perm_sync", sa.DateTime(timezone=True), nullable=True),
)
op.drop_column("connector_credential_pair", "is_public")
op.add_column(
"document",
sa.Column("external_user_emails", postgresql.ARRAY(sa.String()), nullable=True),
)
op.add_column(
"document",
sa.Column(
"external_user_group_ids", postgresql.ARRAY(sa.String()), nullable=True
),
)
op.add_column(
"document",
sa.Column("is_public", sa.Boolean(), nullable=True),
)
op.create_table(
"user__external_user_group_id",
sa.Column(
"user_id", fastapi_users_db_sqlalchemy.generics.GUID(), nullable=False
),
sa.Column("external_user_group_id", sa.String(), nullable=False),
sa.Column("cc_pair_id", sa.Integer(), nullable=False),
sa.PrimaryKeyConstraint("user_id"),
)
op.drop_column("external_permission", "user_id")
op.drop_column("email_to_external_user_cache", "user_id")
op.drop_table("permission_sync_run")
op.drop_table("external_permission")
op.drop_table("email_to_external_user_cache")
def downgrade() -> None:
op.add_column(
"connector_credential_pair",
sa.Column("is_public", sa.BOOLEAN(), nullable=True),
)
op.execute(
"UPDATE connector_credential_pair SET is_public = (access_type = 'PUBLIC')"
)
op.alter_column("connector_credential_pair", "is_public", nullable=False)
op.drop_column("connector_credential_pair", "auto_sync_options")
op.drop_column("connector_credential_pair", "access_type")
op.drop_column("connector_credential_pair", "last_time_perm_sync")
op.drop_column("document", "external_user_emails")
op.drop_column("document", "external_user_group_ids")
op.drop_column("document", "is_public")
op.drop_table("user__external_user_group_id")
    # Recreate the tables that were dropped in the upgrade
op.create_table(
"permission_sync_run",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column(
"source_type",
sa.String(),
nullable=False,
),
sa.Column("update_type", sa.String(), nullable=False),
sa.Column("cc_pair_id", sa.Integer(), nullable=True),
sa.Column(
"status",
sa.String(),
nullable=False,
),
sa.Column("error_msg", sa.Text(), nullable=True),
sa.Column(
"updated_at",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.ForeignKeyConstraint(
["cc_pair_id"],
["connector_credential_pair.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"external_permission",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("user_id", sa.UUID(), nullable=True),
sa.Column("user_email", sa.String(), nullable=False),
sa.Column(
"source_type",
sa.String(),
nullable=False,
),
sa.Column("external_permission_group", sa.String(), nullable=False),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"email_to_external_user_cache",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("external_user_id", sa.String(), nullable=False),
sa.Column("user_id", sa.UUID(), nullable=True),
sa.Column("user_email", sa.String(), nullable=False),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
)

View File

@@ -0,0 +1,24 @@
"""Added model defaults for users
Revision ID: 7477a5f5d728
Revises: 213fd978c6d8
Create Date: 2024-08-04 19:00:04.512634
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "7477a5f5d728"
down_revision = "213fd978c6d8"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column("user", sa.Column("default_model", sa.Text(), nullable=True))
def downgrade() -> None:
op.drop_column("user", "default_model")

View File

@@ -28,5 +28,9 @@ def upgrade() -> None:
def downgrade() -> None:
-    # This wasn't really required by the code either, no good reason to make it unique again
-    pass
+    op.create_unique_constraint(
+        "connector_credential_pair__name__key", "connector_credential_pair", ["name"]
+    )
+    op.alter_column(
+        "connector_credential_pair", "name", existing_type=sa.String(), nullable=True
+    )

View File

@@ -10,7 +10,7 @@ import sqlalchemy as sa
from danswer.db.models import IndexModelStatus
from danswer.search.enums import RecencyBiasSetting
-from danswer.search.models import SearchType
+from danswer.search.enums import SearchType
# revision identifiers, used by Alembic.
revision = "776b3bbe9092"

View File

@@ -0,0 +1,41 @@
"""add_llm_group_permissions_control
Revision ID: 795b20b85b4b
Revises: 05c07bf07c00
Create Date: 2024-07-19 11:54:35.701558
"""
from alembic import op
import sqlalchemy as sa
revision = "795b20b85b4b"
down_revision = "05c07bf07c00"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"llm_provider__user_group",
sa.Column("llm_provider_id", sa.Integer(), nullable=False),
sa.Column("user_group_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["llm_provider_id"],
["llm_provider.id"],
),
sa.ForeignKeyConstraint(
["user_group_id"],
["user_group.id"],
),
sa.PrimaryKeyConstraint("llm_provider_id", "user_group_id"),
)
op.add_column(
"llm_provider",
sa.Column("is_public", sa.Boolean(), nullable=False, server_default="true"),
)
def downgrade() -> None:
op.drop_table("llm_provider__user_group")
op.drop_column("llm_provider", "is_public")

View File

@@ -0,0 +1,27 @@
"""persona_start_date
Revision ID: 797089dfb4d2
Revises: 55546a7967ee
Create Date: 2024-09-11 14:51:49.785835
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "797089dfb4d2"
down_revision = "55546a7967ee"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"persona",
sa.Column("search_start_date", sa.DateTime(timezone=True), nullable=True),
)
def downgrade() -> None:
op.drop_column("persona", "search_start_date")

View File

@@ -0,0 +1,35 @@
"""added slack_auto_filter
Revision ID: 7aea705850d5
Revises: 4505fd7302e1
Create Date: 2024-07-10 11:01:23.581015
"""
from alembic import op
import sqlalchemy as sa
revision = "7aea705850d5"
down_revision = "4505fd7302e1"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"slack_bot_config",
sa.Column("enable_auto_filters", sa.Boolean(), nullable=True),
)
op.execute(
"UPDATE slack_bot_config SET enable_auto_filters = FALSE WHERE enable_auto_filters IS NULL"
)
op.alter_column(
"slack_bot_config",
"enable_auto_filters",
existing_type=sa.Boolean(),
nullable=False,
server_default=sa.false(),
)
def downgrade() -> None:
op.drop_column("slack_bot_config", "enable_auto_filters")

View File

@@ -0,0 +1,107 @@
"""associate index attempts with ccpair
Revision ID: 8a87bd6ec550
Revises: 4ea2c93919c1
Create Date: 2024-07-22 15:15:52.558451
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "8a87bd6ec550"
down_revision = "4ea2c93919c1"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
# Add the new connector_credential_pair_id column
op.add_column(
"index_attempt",
sa.Column("connector_credential_pair_id", sa.Integer(), nullable=True),
)
# Create a foreign key constraint to the connector_credential_pair table
op.create_foreign_key(
"fk_index_attempt_connector_credential_pair_id",
"index_attempt",
"connector_credential_pair",
["connector_credential_pair_id"],
["id"],
)
# Populate the new connector_credential_pair_id column using existing connector_id and credential_id
op.execute(
"""
UPDATE index_attempt ia
SET connector_credential_pair_id = (
SELECT id FROM connector_credential_pair ccp
WHERE
(ia.connector_id IS NULL OR ccp.connector_id = ia.connector_id)
AND (ia.credential_id IS NULL OR ccp.credential_id = ia.credential_id)
LIMIT 1
)
WHERE ia.connector_id IS NOT NULL OR ia.credential_id IS NOT NULL
"""
)
    # Clean up any index attempts that could not be matched to a cc pair
op.execute(
"""
DELETE FROM index_attempt
WHERE connector_credential_pair_id IS NULL
"""
)
# Make the new connector_credential_pair_id column non-nullable
op.alter_column("index_attempt", "connector_credential_pair_id", nullable=False)
# Drop the old connector_id and credential_id columns
op.drop_column("index_attempt", "connector_id")
op.drop_column("index_attempt", "credential_id")
# Update the index to use connector_credential_pair_id
op.create_index(
"ix_index_attempt_latest_for_connector_credential_pair",
"index_attempt",
["connector_credential_pair_id", "time_created"],
)
def downgrade() -> None:
# Add back the old connector_id and credential_id columns
op.add_column(
"index_attempt", sa.Column("connector_id", sa.Integer(), nullable=True)
)
op.add_column(
"index_attempt", sa.Column("credential_id", sa.Integer(), nullable=True)
)
# Populate the old connector_id and credential_id columns using the connector_credential_pair_id
op.execute(
"""
UPDATE index_attempt ia
SET connector_id = ccp.connector_id, credential_id = ccp.credential_id
FROM connector_credential_pair ccp
WHERE ia.connector_credential_pair_id = ccp.id
"""
)
# Make the old connector_id and credential_id columns non-nullable
op.alter_column("index_attempt", "connector_id", nullable=False)
op.alter_column("index_attempt", "credential_id", nullable=False)
# Drop the new connector_credential_pair_id column
op.drop_constraint(
"fk_index_attempt_connector_credential_pair_id",
"index_attempt",
type_="foreignkey",
)
op.drop_column("index_attempt", "connector_credential_pair_id")
op.create_index(
"ix_index_attempt_latest_for_connector_credential_pair",
"index_attempt",
["connector_id", "credential_id", "time_created"],
)

View File

@@ -0,0 +1,26 @@
"""add expiry time
Revision ID: 91ffac7e65b3
Revises: 795b20b85b4b
Create Date: 2024-06-24 09:39:56.462242
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "91ffac7e65b3"
down_revision = "795b20b85b4b"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"user", sa.Column("oidc_expiry", sa.DateTime(timezone=True), nullable=True)
)
def downgrade() -> None:
op.drop_column("user", "oidc_expiry")

View File

@@ -0,0 +1,158 @@
"""migration confluence to be explicit
Revision ID: a3795dce87be
Revises: 1f60f60c3401
Create Date: 2024-09-01 13:52:12.006740
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from sqlalchemy.sql import table, column
revision = "a3795dce87be"
down_revision = "1f60f60c3401"
branch_labels: None = None
depends_on: None = None
def extract_confluence_keys_from_url(wiki_url: str) -> tuple[str, str, str, bool]:
from urllib.parse import urlparse
def _extract_confluence_keys_from_cloud_url(wiki_url: str) -> tuple[str, str, str]:
parsed_url = urlparse(wiki_url)
wiki_base = f"{parsed_url.scheme}://{parsed_url.netloc}{parsed_url.path.split('/spaces')[0]}"
path_parts = parsed_url.path.split("/")
space = path_parts[3]
page_id = path_parts[5] if len(path_parts) > 5 else ""
return wiki_base, space, page_id
def _extract_confluence_keys_from_datacenter_url(
wiki_url: str,
) -> tuple[str, str, str]:
DISPLAY = "/display/"
PAGE = "/pages/"
parsed_url = urlparse(wiki_url)
wiki_base = f"{parsed_url.scheme}://{parsed_url.netloc}{parsed_url.path.split(DISPLAY)[0]}"
space = DISPLAY.join(parsed_url.path.split(DISPLAY)[1:]).split("/")[0]
page_id = ""
if (content := parsed_url.path.split(PAGE)) and len(content) > 1:
page_id = content[1]
return wiki_base, space, page_id
is_confluence_cloud = (
".atlassian.net/wiki/spaces/" in wiki_url
or ".jira.com/wiki/spaces/" in wiki_url
)
if is_confluence_cloud:
wiki_base, space, page_id = _extract_confluence_keys_from_cloud_url(wiki_url)
else:
wiki_base, space, page_id = _extract_confluence_keys_from_datacenter_url(
wiki_url
)
return wiki_base, space, page_id, is_confluence_cloud
def reconstruct_confluence_url(
wiki_base: str, space: str, page_id: str, is_cloud: bool
) -> str:
if is_cloud:
url = f"{wiki_base}/spaces/{space}"
if page_id:
url += f"/pages/{page_id}"
else:
url = f"{wiki_base}/display/{space}"
if page_id:
url += f"/pages/{page_id}"
return url
def upgrade() -> None:
connector = table(
"connector",
column("id", sa.Integer),
column("source", sa.String()),
column("input_type", sa.String()),
column("connector_specific_config", postgresql.JSONB),
)
# Fetch all Confluence connectors
connection = op.get_bind()
confluence_connectors = connection.execute(
sa.select(connector).where(
sa.and_(
connector.c.source == "CONFLUENCE", connector.c.input_type == "POLL"
)
)
).fetchall()
for row in confluence_connectors:
config = row.connector_specific_config
wiki_page_url = config["wiki_page_url"]
wiki_base, space, page_id, is_cloud = extract_confluence_keys_from_url(
wiki_page_url
)
new_config = {
"wiki_base": wiki_base,
"space": space,
"page_id": page_id,
"is_cloud": is_cloud,
}
for key, value in config.items():
if key not in ["wiki_page_url"]:
new_config[key] = value
op.execute(
connector.update()
.where(connector.c.id == row.id)
.values(connector_specific_config=new_config)
)
def downgrade() -> None:
connector = table(
"connector",
column("id", sa.Integer),
column("source", sa.String()),
column("input_type", sa.String()),
column("connector_specific_config", postgresql.JSONB),
)
confluence_connectors = (
op.get_bind()
.execute(
sa.select(connector).where(
connector.c.source == "CONFLUENCE", connector.c.input_type == "POLL"
)
)
.fetchall()
)
for row in confluence_connectors:
config = row.connector_specific_config
if all(key in config for key in ["wiki_base", "space", "is_cloud"]):
wiki_page_url = reconstruct_confluence_url(
config["wiki_base"],
config["space"],
config.get("page_id", ""),
config["is_cloud"],
)
new_config = {"wiki_page_url": wiki_page_url}
new_config.update(
{
k: v
for k, v in config.items()
if k not in ["wiki_base", "space", "page_id", "is_cloud"]
}
)
op.execute(
connector.update()
.where(connector.c.id == row.id)
.values(connector_specific_config=new_config)
)
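A quick round trip through the two helpers above, assuming a typical Confluence Cloud URL; note that reconstruction intentionally drops the human-readable title slug:

url = "https://example.atlassian.net/wiki/spaces/ENG/pages/123456/Some+Page"
wiki_base, space, page_id, is_cloud = extract_confluence_keys_from_url(url)
assert (wiki_base, space, page_id, is_cloud) == (
    "https://example.atlassian.net/wiki", "ENG", "123456", True
)
# the canonical URL omits the trailing title segment
assert reconstruct_confluence_url(wiki_base, space, page_id, is_cloud) == (
    "https://example.atlassian.net/wiki/spaces/ENG/pages/123456"
)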

View File

@@ -0,0 +1,27 @@
"""Add chosen_assistants to User table
Revision ID: a3bfd0d64902
Revises: ec85f2b3c544
Create Date: 2024-05-26 17:22:24.834741
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "a3bfd0d64902"
down_revision = "ec85f2b3c544"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"user",
sa.Column("chosen_assistants", postgresql.ARRAY(sa.Integer()), nullable=True),
)
def downgrade() -> None:
op.drop_column("user", "chosen_assistants")

View File

@@ -16,7 +16,6 @@ depends_on: None = None
def upgrade() -> None:
-# ### commands auto generated by Alembic - please adjust! ###
op.alter_column(
"connector_credential_pair",
"last_attempt_status",
@@ -29,11 +28,9 @@ def upgrade() -> None:
),
nullable=True,
)
-# ### end Alembic commands ###
def downgrade() -> None:
-# ### commands auto generated by Alembic - please adjust! ###
op.alter_column(
"connector_credential_pair",
"last_attempt_status",
@@ -46,4 +43,3 @@ def downgrade() -> None:
),
nullable=False,
)
-# ### end Alembic commands ###

View File

@@ -0,0 +1,28 @@
"""fix-file-type-migration
Revision ID: b85f02ec1308
Revises: a3bfd0d64902
Create Date: 2024-05-31 18:09:26.658164
"""
from alembic import op
# revision identifiers, used by Alembic.
revision = "b85f02ec1308"
down_revision = "a3bfd0d64902"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.execute(
"""
UPDATE file_store
SET file_origin = UPPER(file_origin)
"""
)
def downgrade() -> None:
# Let's not break anything on purpose :)
pass

View File

@@ -0,0 +1,23 @@
"""backfill is_internet data to False
Revision ID: b896bbd0d5a7
Revises: 44f856ae2a4a
Create Date: 2024-07-16 15:21:05.718571
"""
from alembic import op
# revision identifiers, used by Alembic.
revision = "b896bbd0d5a7"
down_revision = "44f856ae2a4a"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.execute("UPDATE search_doc SET is_internet = FALSE WHERE is_internet IS NULL")
def downgrade() -> None:
pass

View File

@@ -0,0 +1,26 @@
"""add support for litellm proxy in reranking
Revision ID: ba98eba0f66a
Revises: bceb1e139447
Create Date: 2024-09-06 10:36:04.507332
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "ba98eba0f66a"
down_revision = "bceb1e139447"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"search_settings", sa.Column("rerank_api_url", sa.String(), nullable=True)
)
def downgrade() -> None:
op.drop_column("search_settings", "rerank_api_url")

View File

@@ -0,0 +1,51 @@
"""create usage reports table
Revision ID: bc9771dccadf
Revises: 0568ccf46a6b
Create Date: 2024-06-18 10:04:26.800282
"""
from alembic import op
import sqlalchemy as sa
import fastapi_users_db_sqlalchemy
# revision identifiers, used by Alembic.
revision = "bc9771dccadf"
down_revision = "0568ccf46a6b"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"usage_reports",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("report_name", sa.String(), nullable=False),
sa.Column(
"requestor_user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=True,
),
sa.Column(
"time_created",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.Column("period_from", sa.DateTime(timezone=True), nullable=True),
sa.Column("period_to", sa.DateTime(timezone=True), nullable=True),
sa.ForeignKeyConstraint(
["report_name"],
["file_store.file_name"],
),
sa.ForeignKeyConstraint(
["requestor_user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
)
def downgrade() -> None:
op.drop_table("usage_reports")

View File

@@ -0,0 +1,26 @@
"""Add base_url to CloudEmbeddingProvider
Revision ID: bceb1e139447
Revises: a3795dce87be
Create Date: 2024-08-28 17:00:52.554580
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "bceb1e139447"
down_revision = "a3795dce87be"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"embedding_provider", sa.Column("api_url", sa.String(), nullable=True)
)
def downgrade() -> None:
op.drop_column("embedding_provider", "api_url")

View File

@@ -0,0 +1,43 @@
"""non nullable default persona
Revision ID: bd2921608c3a
Revises: 797089dfb4d2
Create Date: 2024-09-20 10:28:37.992042
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "bd2921608c3a"
down_revision = "797089dfb4d2"
branch_labels = None
depends_on = None
def upgrade() -> None:
# Set existing NULL values to False
op.execute(
"UPDATE persona SET is_default_persona = FALSE WHERE is_default_persona IS NULL"
)
# Alter the column to be not nullable with a default value of False
op.alter_column(
"persona",
"is_default_persona",
existing_type=sa.Boolean(),
nullable=False,
server_default=sa.text("false"),
)
def downgrade() -> None:
# Revert the changes
op.alter_column(
"persona",
"is_default_persona",
existing_type=sa.Boolean(),
nullable=True,
server_default=None,
)

View File

@@ -0,0 +1,75 @@
"""Add standard_answer tables
Revision ID: c18cdf4b497e
Revises: 3a7802814195
Create Date: 2024-06-06 15:15:02.000648
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "c18cdf4b497e"
down_revision = "3a7802814195"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"standard_answer",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("keyword", sa.String(), nullable=False),
sa.Column("answer", sa.String(), nullable=False),
sa.Column("active", sa.Boolean(), nullable=False),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint("keyword"),
)
op.create_table(
"standard_answer_category",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("name", sa.String(), nullable=False),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint("name"),
)
op.create_table(
"standard_answer__standard_answer_category",
sa.Column("standard_answer_id", sa.Integer(), nullable=False),
sa.Column("standard_answer_category_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["standard_answer_category_id"],
["standard_answer_category.id"],
),
sa.ForeignKeyConstraint(
["standard_answer_id"],
["standard_answer.id"],
),
sa.PrimaryKeyConstraint("standard_answer_id", "standard_answer_category_id"),
)
op.create_table(
"slack_bot_config__standard_answer_category",
sa.Column("slack_bot_config_id", sa.Integer(), nullable=False),
sa.Column("standard_answer_category_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["slack_bot_config_id"],
["slack_bot_config.id"],
),
sa.ForeignKeyConstraint(
["standard_answer_category_id"],
["standard_answer_category.id"],
),
sa.PrimaryKeyConstraint("slack_bot_config_id", "standard_answer_category_id"),
)
op.add_column(
"chat_session", sa.Column("slack_thread_id", sa.String(), nullable=True)
)
def downgrade() -> None:
op.drop_column("chat_session", "slack_thread_id")
op.drop_table("slack_bot_config__standard_answer_category")
op.drop_table("standard_answer__standard_answer_category")
op.drop_table("standard_answer_category")
op.drop_table("standard_answer")

View File

@@ -0,0 +1,57 @@
"""Add index_attempt_errors table
Revision ID: c5b692fa265c
Revises: 4a951134c801
Create Date: 2024-08-08 14:06:39.581972
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "c5b692fa265c"
down_revision = "4a951134c801"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"index_attempt_errors",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("index_attempt_id", sa.Integer(), nullable=True),
sa.Column("batch", sa.Integer(), nullable=True),
sa.Column(
"doc_summaries",
postgresql.JSONB(astext_type=sa.Text()),
nullable=False,
),
sa.Column("error_msg", sa.Text(), nullable=True),
sa.Column("traceback", sa.Text(), nullable=True),
sa.Column(
"time_created",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.ForeignKeyConstraint(
["index_attempt_id"],
["index_attempt.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.create_index(
"index_attempt_id",
"index_attempt_errors",
["time_created"],
unique=False,
)
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.drop_index("index_attempt_id", table_name="index_attempt_errors")
op.drop_table("index_attempt_errors")
# ### end Alembic commands ###

View File

@@ -0,0 +1,31 @@
"""add nullable to persona id in Chat Session
Revision ID: c99d76fcd298
Revises: 5c7fdadae813
Create Date: 2024-07-09 19:27:01.579697
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "c99d76fcd298"
down_revision = "5c7fdadae813"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.alter_column(
"chat_session", "persona_id", existing_type=sa.INTEGER(), nullable=True
)
def downgrade() -> None:
op.alter_column(
"chat_session",
"persona_id",
existing_type=sa.INTEGER(),
nullable=False,
)

View File

@@ -19,6 +19,9 @@ depends_on: None = None
def upgrade() -> None:
op.drop_table("deletion_attempt")
# Remove the DeletionStatus enum
op.execute("DROP TYPE IF EXISTS deletionstatus;")
def downgrade() -> None:
op.create_table(

View File

@@ -0,0 +1,45 @@
"""combined slack id fields
Revision ID: d716b0791ddd
Revises: 7aea705850d5
Create Date: 2024-07-10 17:57:45.630550
"""
from alembic import op
# revision identifiers, used by Alembic.
revision = "d716b0791ddd"
down_revision = "7aea705850d5"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.execute(
"""
UPDATE slack_bot_config
SET channel_config = jsonb_set(
channel_config,
'{respond_member_group_list}',
coalesce(channel_config->'respond_team_member_list', '[]'::jsonb) ||
coalesce(channel_config->'respond_slack_group_list', '[]'::jsonb)
) - 'respond_team_member_list' - 'respond_slack_group_list'
"""
)
def downgrade() -> None:
op.execute(
"""
UPDATE slack_bot_config
SET channel_config = jsonb_set(
jsonb_set(
channel_config - 'respond_member_group_list',
'{respond_team_member_list}',
'[]'::jsonb
),
'{respond_slack_group_list}',
'[]'::jsonb
)
"""
)
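
The upgrade above folds two legacy JSONB list fields into a single respond_member_group_list key. A minimal Python sketch of the same merge, using a hypothetical channel_config value to show the before/after shape:

def merge_slack_id_fields(channel_config: dict) -> dict:
    # Mirrors the jsonb_set expression in upgrade(): concatenate the two
    # legacy lists, then drop the old keys.
    merged = channel_config.get("respond_team_member_list", []) + channel_config.get(
        "respond_slack_group_list", []
    )
    new_config = {
        k: v
        for k, v in channel_config.items()
        if k not in ("respond_team_member_list", "respond_slack_group_list")
    }
    new_config["respond_member_group_list"] = merged
    return new_config

assert merge_slack_id_fields(
    {"respond_team_member_list": ["U123"], "respond_slack_group_list": ["eng"]}
) == {"respond_member_group_list": ["U123", "eng"]}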


@@ -0,0 +1,31 @@
"""Remove _alt suffix from model_name
Revision ID: d9ec13955951
Revises: da4c21c69164
Create Date: 2024-08-20 16:31:32.955686
"""
from alembic import op
# revision identifiers, used by Alembic.
revision = "d9ec13955951"
down_revision = "da4c21c69164"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.execute(
"""
UPDATE embedding_model
SET model_name = regexp_replace(model_name, '__danswer_alt_index$', '')
WHERE model_name LIKE '%__danswer_alt_index'
"""
)
def downgrade() -> None:
# We can't reliably add the __danswer_alt_index suffix back, so we'll leave this empty
pass
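
Since the downgrade is irreversible, it helps to see exactly what the anchored pattern strips. A Python equivalent of the regexp_replace above (the model name is hypothetical):

import re

# Only a trailing "__danswer_alt_index" is removed, matching the SQL's $ anchor.
assert (
    re.sub(r"__danswer_alt_index$", "", "intfloat/e5-base-v2__danswer_alt_index")
    == "intfloat/e5-base-v2"
)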


@@ -0,0 +1,65 @@
"""chosen_assistants changed to jsonb
Revision ID: da4c21c69164
Revises: c5b692fa265c
Create Date: 2024-08-18 19:06:47.291491
"""
import json
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "da4c21c69164"
down_revision = "c5b692fa265c"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
conn = op.get_bind()
existing_ids_and_chosen_assistants = conn.execute(
sa.text("select id, chosen_assistants from public.user")
)
op.drop_column(
"user",
"chosen_assistants",
)
op.add_column(
"user",
sa.Column(
"chosen_assistants",
postgresql.JSONB(astext_type=sa.Text()),
nullable=True,
),
)
for id, chosen_assistants in existing_ids_and_chosen_assistants:
conn.execute(
sa.text(
"update public.user set chosen_assistants = :chosen_assistants where id = :id"
),
{"chosen_assistants": json.dumps(chosen_assistants), "id": id},
)
def downgrade() -> None:
conn = op.get_bind()
existing_ids_and_chosen_assistants = conn.execute(
sa.text("select id, chosen_assistants from public.user")
)
op.drop_column(
"user",
"chosen_assistants",
)
op.add_column(
"user",
sa.Column("chosen_assistants", postgresql.ARRAY(sa.Integer()), nullable=True),
)
for id, chosen_assistants in existing_ids_and_chosen_assistants:
conn.execute(
sa.text(
"update public.user set chosen_assistants = :chosen_assistants where id = :id"
),
{"chosen_assistants": chosen_assistants, "id": id},
)


@@ -9,7 +9,7 @@ from alembic import op
import sqlalchemy as sa
from sqlalchemy import table, column, String, Integer, Boolean
from danswer.db.embedding_model import (
from danswer.db.search_settings import (
get_new_default_embedding_model,
get_old_default_embedding_model,
user_has_overridden_embedding_model,
@@ -71,14 +71,14 @@ def upgrade() -> None:
"query_prefix": old_embedding_model.query_prefix,
"passage_prefix": old_embedding_model.passage_prefix,
"index_name": old_embedding_model.index_name,
"status": old_embedding_model.status,
"status": IndexModelStatus.PRESENT,
}
],
)
# if the user has not overridden the default embedding model via env variables,
# insert the new default model into the database to auto-upgrade them
if not user_has_overridden_embedding_model():
new_embedding_model = get_new_default_embedding_model(is_present=False)
new_embedding_model = get_new_default_embedding_model()
op.bulk_insert(
EmbeddingModel,
[
@@ -136,4 +136,4 @@ def downgrade() -> None:
)
op.drop_column("index_attempt", "embedding_model_id")
op.drop_table("embedding_model")
op.execute("DROP TYPE indexmodelstatus;")
op.execute("DROP TYPE IF EXISTS indexmodelstatus;")


@@ -0,0 +1,58 @@
"""Added input prompts
Revision ID: e1392f05e840
Revises: 08a1eda20fe1
Create Date: 2024-07-13 19:09:22.556224
"""
import fastapi_users_db_sqlalchemy
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "e1392f05e840"
down_revision = "08a1eda20fe1"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"inputprompt",
sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
sa.Column("prompt", sa.String(), nullable=False),
sa.Column("content", sa.String(), nullable=False),
sa.Column("active", sa.Boolean(), nullable=False),
sa.Column("is_public", sa.Boolean(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=True,
),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"inputprompt__user",
sa.Column("input_prompt_id", sa.Integer(), nullable=False),
sa.Column("user_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["input_prompt_id"],
["inputprompt.id"],
),
sa.ForeignKeyConstraint(
["user_id"],
["inputprompt.id"],
),
sa.PrimaryKeyConstraint("input_prompt_id", "user_id"),
)
def downgrade() -> None:
op.drop_table("inputprompt__user")
op.drop_table("inputprompt")


@@ -0,0 +1,22 @@
"""added-prune-frequency
Revision ID: e209dc5a8156
Revises: 48d14957fe80
Create Date: 2024-06-16 16:02:35.273231
"""
from alembic import op
import sqlalchemy as sa
revision = "e209dc5a8156"
down_revision = "48d14957fe80"
branch_labels = None # type: ignore
depends_on = None # type: ignore
def upgrade() -> None:
op.add_column("connector", sa.Column("prune_freq", sa.Integer(), nullable=True))
def downgrade() -> None:
op.drop_column("connector", "prune_freq")


@@ -10,7 +10,7 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "ec85f2b3c544"
down_revision = "3879338f8ba1"
down_revision = "70f00c45c0f2"
branch_labels: None = None
depends_on: None = None


@@ -0,0 +1,28 @@
"""Added alternate model to chat message
Revision ID: ee3f4b47fad5
Revises: 2d2304e27d8c
Create Date: 2024-08-12 00:11:50.915845
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "ee3f4b47fad5"
down_revision = "2d2304e27d8c"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"chat_message",
sa.Column("overridden_model", sa.String(length=255), nullable=True),
)
def downgrade() -> None:
op.drop_column("chat_message", "overridden_model")


@@ -0,0 +1,32 @@
"""standard answer match_regex flag
Revision ID: efb35676026c
Revises: 0ebb1d516877
Create Date: 2024-09-11 13:55:46.101149
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "efb35676026c"
down_revision = "0ebb1d516877"
branch_labels = None
depends_on = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.add_column(
"standard_answer",
sa.Column(
"match_regex", sa.Boolean(), nullable=False, server_default=sa.false()
),
)
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.drop_column("standard_answer", "match_regex")
# ### end Alembic commands ###


@@ -0,0 +1,172 @@
"""embedding provider by provider type
Revision ID: f17bf3b0d9f1
Revises: 351faebd379d
Create Date: 2024-08-21 13:13:31.120460
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "f17bf3b0d9f1"
down_revision = "351faebd379d"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
# Add provider_type column to embedding_provider
op.add_column(
"embedding_provider",
sa.Column("provider_type", sa.String(50), nullable=True),
)
# Update provider_type with existing name values
op.execute("UPDATE embedding_provider SET provider_type = UPPER(name)")
# Make provider_type not nullable
op.alter_column("embedding_provider", "provider_type", nullable=False)
# Drop the foreign key constraint in embedding_model table
op.drop_constraint(
"fk_embedding_model_cloud_provider", "embedding_model", type_="foreignkey"
)
# Drop the existing primary key constraint
op.drop_constraint("embedding_provider_pkey", "embedding_provider", type_="primary")
# Create a new primary key constraint on provider_type
op.create_primary_key(
"embedding_provider_pkey", "embedding_provider", ["provider_type"]
)
# Add provider_type column to embedding_model
op.add_column(
"embedding_model",
sa.Column("provider_type", sa.String(50), nullable=True),
)
# Update provider_type for existing embedding models
op.execute(
"""
UPDATE embedding_model
SET provider_type = (
SELECT provider_type
FROM embedding_provider
WHERE embedding_provider.id = embedding_model.cloud_provider_id
)
"""
)
# Drop the old id column from embedding_provider
op.drop_column("embedding_provider", "id")
# Drop the name column from embedding_provider
op.drop_column("embedding_provider", "name")
# Drop the default_model_id column from embedding_provider
op.drop_column("embedding_provider", "default_model_id")
# Drop the old cloud_provider_id column from embedding_model
op.drop_column("embedding_model", "cloud_provider_id")
# Create the new foreign key constraint
op.create_foreign_key(
"fk_embedding_model_cloud_provider",
"embedding_model",
"embedding_provider",
["provider_type"],
["provider_type"],
)
def downgrade() -> None:
# Drop the foreign key constraint in embedding_model table
op.drop_constraint(
"fk_embedding_model_cloud_provider", "embedding_model", type_="foreignkey"
)
# Add back the cloud_provider_id column to embedding_model
op.add_column(
"embedding_model", sa.Column("cloud_provider_id", sa.Integer(), nullable=True)
)
op.add_column("embedding_provider", sa.Column("id", sa.Integer(), nullable=True))
# Assign incrementing IDs to embedding providers
op.execute(
"""
CREATE SEQUENCE IF NOT EXISTS embedding_provider_id_seq;"""
)
op.execute(
"""
UPDATE embedding_provider SET id = nextval('embedding_provider_id_seq');
"""
)
# Update cloud_provider_id based on provider_type
op.execute(
"""
UPDATE embedding_model
SET cloud_provider_id = CASE
WHEN provider_type IS NULL THEN NULL
ELSE (
SELECT id
FROM embedding_provider
WHERE embedding_provider.provider_type = embedding_model.provider_type
)
END
"""
)
# Drop the provider_type column from embedding_model
op.drop_column("embedding_model", "provider_type")
# Add back the columns to embedding_provider
op.add_column("embedding_provider", sa.Column("name", sa.String(50), nullable=True))
op.add_column(
"embedding_provider", sa.Column("default_model_id", sa.Integer(), nullable=True)
)
# Drop the existing primary key constraint on provider_type
op.drop_constraint("embedding_provider_pkey", "embedding_provider", type_="primary")
# Create the original primary key constraint on id
op.create_primary_key("embedding_provider_pkey", "embedding_provider", ["id"])
# Update name with existing provider_type values
op.execute(
"""
UPDATE embedding_provider
SET name = CASE
WHEN provider_type = 'OPENAI' THEN 'OpenAI'
WHEN provider_type = 'COHERE' THEN 'Cohere'
WHEN provider_type = 'GOOGLE' THEN 'Google'
WHEN provider_type = 'VOYAGE' THEN 'Voyage'
ELSE provider_type
END
"""
)
# Drop the provider_type column from embedding_provider
op.drop_column("embedding_provider", "provider_type")
# Recreate the foreign key constraint in embedding_model table
op.create_foreign_key(
"fk_embedding_model_cloud_provider",
"embedding_model",
"embedding_provider",
["cloud_provider_id"],
["id"],
)
# Recreate the default-model foreign key constraint on the embedding_provider table
op.create_foreign_key(
"fk_embedding_provider_default_model",
"embedding_provider",
"embedding_model",
["default_model_id"],
["id"],
)
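
A small sketch of the name <-> provider_type round trip this migration performs; the display-name mapping is taken from the CASE expression in downgrade():

DISPLAY_NAMES = {"OPENAI": "OpenAI", "COHERE": "Cohere", "GOOGLE": "Google", "VOYAGE": "Voyage"}

def to_provider_type(name: str) -> str:
    # upgrade(): provider_type = UPPER(name)
    return name.upper()

def to_display_name(provider_type: str) -> str:
    # downgrade(): map known provider types back to their display names
    return DISPLAY_NAMES.get(provider_type, provider_type)

assert to_provider_type("OpenAI") == "OPENAI"
assert to_display_name("OPENAI") == "OpenAI"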


@@ -0,0 +1,26 @@
"""add custom headers to tools
Revision ID: f32615f71aeb
Revises: bd2921608c3a
Create Date: 2024-09-12 20:26:38.932377
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "f32615f71aeb"
down_revision = "bd2921608c3a"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"tool", sa.Column("custom_headers", postgresql.JSONB(), nullable=True)
)
def downgrade() -> None:
op.drop_column("tool", "custom_headers")


@@ -0,0 +1,26 @@
"""add has_web_login column to user
Revision ID: f7e58d357687
Revises: ba98eba0f66a
Create Date: 2024-09-07 20:20:54.522620
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "f7e58d357687"
down_revision = "ba98eba0f66a"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"user",
sa.Column("has_web_login", sa.Boolean(), nullable=False, server_default="true"),
)
def downgrade() -> None:
op.drop_column("user", "has_web_login")

backend/assets/.gitignore

@@ -0,0 +1,2 @@
*
!.gitignore


@@ -1,49 +1,95 @@
from sqlalchemy.orm import Session
from danswer.access.models import DocumentAccess
from danswer.access.utils import prefix_user_email
from danswer.configs.constants import PUBLIC_DOC_PAT
from danswer.db.document import get_acccess_info_for_documents
from danswer.db.document import get_access_info_for_document
from danswer.db.document import get_access_info_for_documents
from danswer.db.models import User
from danswer.server.documents.models import ConnectorCredentialPairIdentifier
from danswer.utils.variable_functionality import fetch_versioned_implementation
def _get_access_for_document(
document_id: str,
db_session: Session,
) -> DocumentAccess:
info = get_access_info_for_document(
db_session=db_session,
document_id=document_id,
)
return DocumentAccess.build(
user_emails=info[1] if info and info[1] else [],
user_groups=[],
external_user_emails=[],
external_user_group_ids=[],
is_public=info[2] if info else False,
)
def get_access_for_document(
document_id: str,
db_session: Session,
) -> DocumentAccess:
versioned_get_access_for_document_fn = fetch_versioned_implementation(
"danswer.access.access", "_get_access_for_document"
)
return versioned_get_access_for_document_fn(document_id, db_session) # type: ignore
def get_null_document_access() -> DocumentAccess:
return DocumentAccess(
user_emails=set(),
user_groups=set(),
is_public=False,
external_user_emails=set(),
external_user_group_ids=set(),
)
def _get_access_for_documents(
document_ids: list[str],
db_session: Session,
cc_pair_to_delete: ConnectorCredentialPairIdentifier | None = None,
) -> dict[str, DocumentAccess]:
document_access_info = get_acccess_info_for_documents(
document_access_info = get_access_info_for_documents(
db_session=db_session,
document_ids=document_ids,
cc_pair_to_delete=cc_pair_to_delete,
)
return {
document_id: DocumentAccess.build(user_ids, is_public)
for document_id, user_ids, is_public in document_access_info
doc_access = {
document_id: DocumentAccess(
user_emails=set([email for email in user_emails if email]),
# MIT version will wipe all groups and external groups on update
user_groups=set(),
is_public=is_public,
external_user_emails=set(),
external_user_group_ids=set(),
)
for document_id, user_emails, is_public in document_access_info
}
# Sometimes the document has not been indexed by the indexing job yet; in those cases
# the document does not exist, so we use the least permissive access. Specifically, the EE
# version checks the MIT version's permissions and creates a superset. This ensures that
# this flow does not fail even if the document has not yet been indexed.
for doc_id in document_ids:
if doc_id not in doc_access:
doc_access[doc_id] = get_null_document_access()
return doc_access
def get_access_for_documents(
document_ids: list[str],
db_session: Session,
cc_pair_to_delete: ConnectorCredentialPairIdentifier | None = None,
) -> dict[str, DocumentAccess]:
"""Fetches all access information for the given documents."""
versioned_get_access_for_documents_fn = fetch_versioned_implementation(
"danswer.access.access", "_get_access_for_documents"
)
return versioned_get_access_for_documents_fn(
document_ids, db_session, cc_pair_to_delete
document_ids, db_session
) # type: ignore
def prefix_user(user_id: str) -> str:
"""Prefixes a user ID to eliminate collision with group names.
This assumes that groups are prefixed with a different prefix."""
return f"user_id:{user_id}"
def _get_acl_for_user(user: User | None, db_session: Session) -> set[str]:
"""Returns a list of ACL entries that the user has access to. This is meant to be
used downstream to filter out documents that the user does not have access to. The
@@ -51,7 +97,7 @@ def _get_acl_for_user(user: User | None, db_session: Session) -> set[str]:
matches one entry in the returned set.
"""
if user:
return {prefix_user(str(user.id)), PUBLIC_DOC_PAT}
return {prefix_user_email(user.email), PUBLIC_DOC_PAT}
return {PUBLIC_DOC_PAT}
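
A short sketch of the least-permissive fallback described in the comment above, using the definitions from this diff (the document ids are hypothetical):

# "doc-2" stands in for a document the indexing job has not processed yet.
doc_access = {"doc-1": DocumentAccess.build([], [], [], [], is_public=True)}
for doc_id in ["doc-1", "doc-2"]:
    if doc_id not in doc_access:
        doc_access[doc_id] = get_null_document_access()
assert doc_access["doc-2"].to_acl() == set()  # matches no ACL entry, not public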


@@ -1,20 +1,72 @@
from dataclasses import dataclass
from uuid import UUID
from danswer.access.utils import prefix_external_group
from danswer.access.utils import prefix_user_email
from danswer.access.utils import prefix_user_group
from danswer.configs.constants import PUBLIC_DOC_PAT
@dataclass(frozen=True)
class DocumentAccess:
user_ids: set[str] # stringified UUIDs
class ExternalAccess:
# Emails of external users with access to the doc externally
external_user_emails: set[str]
# Names or external IDs of groups with access to the doc
external_user_group_ids: set[str]
# Whether the document is public in the external system or Danswer
is_public: bool
def to_acl(self) -> list[str]:
return list(self.user_ids) + ([PUBLIC_DOC_PAT] if self.is_public else [])
@dataclass(frozen=True)
class DocumentAccess(ExternalAccess):
# User emails for Danswer users, None indicates admin
user_emails: set[str | None]
# Names of user groups associated with this document
user_groups: set[str]
def to_acl(self) -> set[str]:
return set(
[
prefix_user_email(user_email)
for user_email in self.user_emails
if user_email
]
+ [prefix_user_group(group_name) for group_name in self.user_groups]
+ [
prefix_user_email(user_email)
for user_email in self.external_user_emails
]
+ [
# The group names are already prefixed by the source type
# This adds an additional prefix of "external_group:"
prefix_external_group(group_name)
for group_name in self.external_user_group_ids
]
+ ([PUBLIC_DOC_PAT] if self.is_public else [])
)
@classmethod
def build(cls, user_ids: list[UUID | None], is_public: bool) -> "DocumentAccess":
def build(
cls,
user_emails: list[str | None],
user_groups: list[str],
external_user_emails: list[str],
external_user_group_ids: list[str],
is_public: bool,
) -> "DocumentAccess":
return cls(
user_ids={str(user_id) for user_id in user_ids if user_id},
external_user_emails={
prefix_user_email(external_email)
for external_email in external_user_emails
},
external_user_group_ids={
prefix_external_group(external_group_id)
for external_group_id in external_user_group_ids
},
user_emails={
prefix_user_email(user_email)
for user_email in user_emails
if user_email
},
user_groups=set(user_groups),
is_public=is_public,
)
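
Given the definitions above, a short worked example of the ACL entries to_acl() yields (hypothetical email and group name; the dataclass is constructed directly here since build() applies the same prefixes at construction time):

access = DocumentAccess(
    external_user_emails=set(),
    external_user_group_ids=set(),
    is_public=False,
    user_emails={"alice@example.com"},
    user_groups={"eng"},
)
# Each entry is namespaced so emails and group names cannot collide:
assert access.to_acl() == {"user_email:alice@example.com", "group:eng"}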


@@ -0,0 +1,24 @@
from danswer.configs.constants import DocumentSource
def prefix_user_email(user_email: str) -> str:
"""Prefixes a user email to eliminate collision with group names.
This applies to both a Danswer user and an external user; sharing one prefix for both
keeps query-time filtering efficient"""
return f"user_email:{user_email}"
def prefix_user_group(user_group_name: str) -> str:
"""Prefixes a user group name to eliminate collision with user emails.
This assumes that user emails are prefixed with a different prefix."""
return f"group:{user_group_name}"
def prefix_external_group(ext_group_name: str) -> str:
"""Prefixes an external group name to eliminate collision with user emails / Danswer groups."""
return f"external_group:{ext_group_name}"
def prefix_group_w_source(ext_group_name: str, source: DocumentSource) -> str:
"""External groups may collide across sources, every source needs its own prefix."""
return f"{source.value.upper()}_{ext_group_name}"


@@ -0,0 +1,20 @@
from typing import cast
from danswer.configs.constants import KV_USER_STORE_KEY
from danswer.dynamic_configs.factory import get_dynamic_config_store
from danswer.dynamic_configs.interface import ConfigNotFoundError
from danswer.dynamic_configs.interface import JSON_ro
def get_invited_users() -> list[str]:
try:
store = get_dynamic_config_store()
return cast(list, store.load(KV_USER_STORE_KEY))
except ConfigNotFoundError:
return list()
def write_invited_users(emails: list[str]) -> int:
store = get_dynamic_config_store()
store.store(KV_USER_STORE_KEY, cast(JSON_ro, emails))
return len(emails)
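
A round-trip sketch of the helpers above, assuming a configured dynamic config store (the emails are hypothetical):

count = write_invited_users(["alice@example.com", "bob@example.com"])
assert count == 2  # write_invited_users returns the number of invited emails
assert get_invited_users() == ["alice@example.com", "bob@example.com"]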


@@ -0,0 +1,38 @@
from collections.abc import Mapping
from typing import Any
from typing import cast
from danswer.auth.schemas import UserRole
from danswer.configs.constants import KV_NO_AUTH_USER_PREFERENCES_KEY
from danswer.dynamic_configs.store import ConfigNotFoundError
from danswer.dynamic_configs.store import DynamicConfigStore
from danswer.server.manage.models import UserInfo
from danswer.server.manage.models import UserPreferences
def set_no_auth_user_preferences(
store: DynamicConfigStore, preferences: UserPreferences
) -> None:
store.store(KV_NO_AUTH_USER_PREFERENCES_KEY, preferences.model_dump())
def load_no_auth_user_preferences(store: DynamicConfigStore) -> UserPreferences:
try:
preferences_data = cast(
Mapping[str, Any], store.load(KV_NO_AUTH_USER_PREFERENCES_KEY)
)
return UserPreferences(**preferences_data)
except ConfigNotFoundError:
return UserPreferences(chosen_assistants=None, default_model=None)
def fetch_no_auth_user(store: DynamicConfigStore) -> UserInfo:
return UserInfo(
id="__no_auth_user__",
email="anonymous@danswer.ai",
is_active=True,
is_superuser=False,
is_verified=True,
role=UserRole.ADMIN,
preferences=load_no_auth_user_preferences(store),
)


@@ -5,8 +5,26 @@ from fastapi_users import schemas
class UserRole(str, Enum):
"""
User roles
- Basic can't perform any admin actions
- Admin can perform all admin actions
- Curator can perform admin actions for
groups they are curators of
- Global Curator can perform admin actions
for all groups they are a member of
"""
BASIC = "basic"
ADMIN = "admin"
CURATOR = "curator"
GLOBAL_CURATOR = "global_curator"
class UserStatus(str, Enum):
LIVE = "live"
INVITED = "invited"
DEACTIVATED = "deactivated"
class UserRead(schemas.BaseUser[uuid.UUID]):
@@ -15,7 +33,9 @@ class UserRead(schemas.BaseUser[uuid.UUID]):
class UserCreate(schemas.BaseUserCreate):
role: UserRole = UserRole.BASIC
has_web_login: bool | None = True
class UserUpdate(schemas.BaseUserUpdate):
role: UserRole
has_web_login: bool | None = True


@@ -1,19 +1,24 @@
import os
import smtplib
import uuid
from collections.abc import AsyncGenerator
from datetime import datetime
from datetime import timezone
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Optional
from typing import Tuple
from email_validator import EmailNotValidError
from email_validator import validate_email
from fastapi import APIRouter
from fastapi import Depends
from fastapi import HTTPException
from fastapi import Request
from fastapi import Response
from fastapi import status
from fastapi.security import OAuth2PasswordRequestForm
from fastapi_users import BaseUserManager
from fastapi_users import exceptions
from fastapi_users import FastAPIUsers
from fastapi_users import models
from fastapi_users import schemas
@@ -27,8 +32,10 @@ from fastapi_users.openapi import OpenAPIResponseType
from fastapi_users_db_sqlalchemy import SQLAlchemyUserDatabase
from sqlalchemy.orm import Session
from danswer.auth.invited_users import get_invited_users
from danswer.auth.schemas import UserCreate
from danswer.auth.schemas import UserRole
from danswer.auth.schemas import UserUpdate
from danswer.configs.app_configs import AUTH_TYPE
from danswer.configs.app_configs import DISABLE_AUTH
from danswer.configs.app_configs import EMAIL_FROM
@@ -38,6 +45,7 @@ from danswer.configs.app_configs import SMTP_PASS
from danswer.configs.app_configs import SMTP_PORT
from danswer.configs.app_configs import SMTP_SERVER
from danswer.configs.app_configs import SMTP_USER
from danswer.configs.app_configs import TRACK_EXTERNAL_IDP_EXPIRY
from danswer.configs.app_configs import USER_AUTH_SECRET
from danswer.configs.app_configs import VALID_EMAIL_DOMAINS
from danswer.configs.app_configs import WEB_DOMAIN
@@ -46,21 +54,28 @@ from danswer.configs.constants import DANSWER_API_KEY_DUMMY_EMAIL_DOMAIN
from danswer.configs.constants import DANSWER_API_KEY_PREFIX
from danswer.configs.constants import UNNAMED_KEY_PLACEHOLDER
from danswer.db.auth import get_access_token_db
from danswer.db.auth import get_default_admin_user_emails
from danswer.db.auth import get_user_count
from danswer.db.auth import get_user_db
from danswer.db.engine import get_session
from danswer.db.engine import get_sqlalchemy_engine
from danswer.db.models import AccessToken
from danswer.db.models import User
from danswer.db.users import get_user_by_email
from danswer.utils.logger import setup_logger
from danswer.utils.telemetry import optional_telemetry
from danswer.utils.telemetry import RecordType
from danswer.utils.variable_functionality import fetch_versioned_implementation
logger = setup_logger()
USER_WHITELIST_FILE = "/home/danswer_whitelist.txt"
_user_whitelist: list[str] | None = None
def is_user_admin(user: User | None) -> bool:
if AUTH_TYPE == AuthType.DISABLED:
return True
if user and user.role == UserRole.ADMIN:
return True
return False
def verify_auth_setting() -> None:
@@ -69,7 +84,7 @@ def verify_auth_setting() -> None:
"User must choose a valid user authentication method: "
"disabled, basic, or google_oauth"
)
logger.info(f"Using Auth Type: {AUTH_TYPE.value}")
logger.notice(f"Using Auth Type: {AUTH_TYPE.value}")
def get_display_email(email: str | None, space_less: bool = False) -> str:
@@ -92,22 +107,36 @@ def user_needs_to_be_verified() -> bool:
return AUTH_TYPE != AuthType.BASIC or REQUIRE_EMAIL_VERIFICATION
def get_user_whitelist() -> list[str]:
global _user_whitelist
if _user_whitelist is None:
if os.path.exists(USER_WHITELIST_FILE):
with open(USER_WHITELIST_FILE, "r") as file:
_user_whitelist = [line.strip() for line in file]
else:
_user_whitelist = []
def verify_email_is_invited(email: str) -> None:
whitelist = get_invited_users()
if not whitelist:
return
return _user_whitelist
if not email:
raise PermissionError("Email must be specified")
email_info = validate_email(email) # can raise EmailNotValidError
for email_whitelist in whitelist:
try:
# normalized emails are now being inserted into the db
# we can remove this normalization on read after some time has passed
email_info_whitelist = validate_email(email_whitelist)
except EmailNotValidError:
continue
# oddly, normalization does not include lowercasing the user part of the
# email address ... which we want to allow
if email_info.normalized.lower() == email_info_whitelist.normalized.lower():
return
raise PermissionError("User not on allowed user whitelist")
def verify_email_in_whitelist(email: str) -> None:
whitelist = get_user_whitelist()
if (whitelist and email not in whitelist) or not email:
raise PermissionError("User not on allowed user whitelist")
with Session(get_sqlalchemy_engine()) as db_session:
if not get_user_by_email(email, db_session):
verify_email_is_invited(email)
def verify_email_domain(email: str) -> None:
@@ -158,16 +187,36 @@ class UserManager(UUIDIDMixin, BaseUserManager[User, uuid.UUID]):
user_create: schemas.UC | UserCreate,
safe: bool = False,
request: Optional[Request] = None,
) -> models.UP:
verify_email_in_whitelist(user_create.email)
) -> User:
verify_email_is_invited(user_create.email)
verify_email_domain(user_create.email)
if hasattr(user_create, "role"):
user_count = await get_user_count()
if user_count == 0:
if user_count == 0 or user_create.email in get_default_admin_user_emails():
user_create.role = UserRole.ADMIN
else:
user_create.role = UserRole.BASIC
return await super().create(user_create, safe=safe, request=request) # type: ignore
user = None
try:
user = await super().create(user_create, safe=safe, request=request) # type: ignore
except exceptions.UserAlreadyExists:
user = await self.get_by_email(user_create.email)
# Handle the case where the user has used the product outside the web UI and is now creating an account through the web UI
if (
not user.has_web_login
and hasattr(user_create, "has_web_login")
and user_create.has_web_login
):
user_update = UserUpdate(
password=user_create.password,
has_web_login=True,
role=user_create.role,
is_verified=user_create.is_verified,
)
user = await self.update(user_update, user)
else:
raise exceptions.UserAlreadyExists()
return user
async def oauth_callback(
self: "BaseUserManager[models.UOAP, models.ID]",
@@ -185,7 +234,7 @@ class UserManager(UUIDIDMixin, BaseUserManager[User, uuid.UUID]):
verify_email_in_whitelist(account_email)
verify_email_domain(account_email)
return await super().oauth_callback( # type: ignore
user = await super().oauth_callback( # type: ignore
oauth_name=oauth_name,
access_token=access_token,
account_id=account_id,
@@ -197,10 +246,35 @@ class UserManager(UUIDIDMixin, BaseUserManager[User, uuid.UUID]):
is_verified_by_default=is_verified_by_default,
)
# NOTE: Most IdPs have very short expiry times, and we don't want to force the user to
# re-authenticate that frequently, so by default this is disabled
if expires_at and TRACK_EXTERNAL_IDP_EXPIRY:
oidc_expiry = datetime.fromtimestamp(expires_at, tz=timezone.utc)
await self.user_db.update(user, update_dict={"oidc_expiry": oidc_expiry})
# this is needed if an organization goes from `TRACK_EXTERNAL_IDP_EXPIRY=true` to `false`
# otherwise, the oidc expiry will always be old, and the user will never be able to login
if user.oidc_expiry and not TRACK_EXTERNAL_IDP_EXPIRY:
await self.user_db.update(user, update_dict={"oidc_expiry": None})
# Handle the case where the user has used the product outside the web UI and is now creating an account through the web UI
if not user.has_web_login:
await self.user_db.update(
user,
update_dict={
"is_verified": is_verified_by_default,
"has_web_login": True,
},
)
user.is_verified = is_verified_by_default
user.has_web_login = True
return user
async def on_after_register(
self, user: User, request: Optional[Request] = None
) -> None:
logger.info(f"User {user.id} has registered.")
logger.notice(f"User {user.id} has registered.")
optional_telemetry(
record_type=RecordType.SIGN_UP,
data={"action": "create"},
@@ -210,19 +284,45 @@ class UserManager(UUIDIDMixin, BaseUserManager[User, uuid.UUID]):
async def on_after_forgot_password(
self, user: User, token: str, request: Optional[Request] = None
) -> None:
logger.info(f"User {user.id} has forgot their password. Reset token: {token}")
logger.notice(f"User {user.id} has forgot their password. Reset token: {token}")
async def on_after_request_verify(
self, user: User, token: str, request: Optional[Request] = None
) -> None:
verify_email_domain(user.email)
logger.info(
logger.notice(
f"Verification requested for user {user.id}. Verification token: {token}"
)
send_user_verification_email(user.email, token)
async def authenticate(
self, credentials: OAuth2PasswordRequestForm
) -> Optional[User]:
try:
user = await self.get_by_email(credentials.username)
except exceptions.UserNotExists:
self.password_helper.hash(credentials.password)
return None
if not user.has_web_login:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="NO_WEB_LOGIN_AND_HAS_NO_PASSWORD",
)
verified, updated_password_hash = self.password_helper.verify_and_update(
credentials.password, user.hashed_password
)
if not verified:
return None
if updated_password_hash is not None:
await self.user_db.update(user, {"hashed_password": updated_password_hash})
return user
async def get_user_manager(
user_db: SQLAlchemyUserDatabase = Depends(get_user_db),
@@ -239,10 +339,12 @@ cookie_transport = CookieTransport(
def get_database_strategy(
access_token_db: AccessTokenDatabase[AccessToken] = Depends(get_access_token_db),
) -> DatabaseStrategy:
return DatabaseStrategy(
strategy = DatabaseStrategy(
access_token_db, lifetime_seconds=SESSION_EXPIRE_TIME_SECONDS # type: ignore
)
return strategy
auth_backend = AuthenticationBackend(
name="database",
@@ -323,6 +425,7 @@ async def optional_user(
async def double_check_user(
user: User | None,
optional: bool = DISABLE_AUTH,
include_expired: bool = False,
) -> User | None:
if optional:
return None
@@ -339,15 +442,53 @@ async def double_check_user(
detail="Access denied. User is not verified.",
)
if (
user.oidc_expiry
and user.oidc_expiry < datetime.now(timezone.utc)
and not include_expired
):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Access denied. User's OIDC token has expired.",
)
return user
async def current_user_with_expired_token(
user: User | None = Depends(optional_user),
) -> User | None:
return await double_check_user(user, include_expired=True)
async def current_user(
user: User | None = Depends(optional_user),
) -> User | None:
return await double_check_user(user)
async def current_curator_or_admin_user(
user: User | None = Depends(current_user),
) -> User | None:
if DISABLE_AUTH:
return None
if not user or not hasattr(user, "role"):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Access denied. User is not authenticated or lacks role information.",
)
allowed_roles = {UserRole.GLOBAL_CURATOR, UserRole.CURATOR, UserRole.ADMIN}
if user.role not in allowed_roles:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Access denied. User is not a curator or admin.",
)
return user
async def current_admin_user(user: User | None = Depends(current_user)) -> User | None:
if DISABLE_AUTH:
return None
@@ -355,6 +496,12 @@ async def current_admin_user(user: User | None = Depends(current_user)) -> User
if not user or not hasattr(user, "role") or user.role != UserRole.ADMIN:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Access denied. User is not an admin.",
detail="Access denied. User must be an admin to perform this action.",
)
return user
def get_default_admin_user_emails_() -> list[str]:
# No default seeding available for Danswer MIT
return []
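
A minimal standalone sketch of the normalized-email comparison used by verify_email_is_invited above, assuming email-validator >= 2.0 (where validate_email returns an object with a .normalized attribute):

from email_validator import validate_email

def emails_match(candidate: str, whitelisted: str) -> bool:
    a = validate_email(candidate, check_deliverability=False)
    b = validate_email(whitelisted, check_deliverability=False)
    # normalization does not lowercase the local part, so lowercase explicitly
    return a.normalized.lower() == b.normalized.lower()

assert emails_match("Alice@Example.com", "alice@example.com")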


@@ -1,214 +0,0 @@
from datetime import timedelta
from typing import cast
from celery import Celery # type: ignore
from sqlalchemy.orm import Session
from danswer.background.connector_deletion import delete_connector_credential_pair
from danswer.background.task_utils import build_celery_task_wrapper
from danswer.background.task_utils import name_cc_cleanup_task
from danswer.background.task_utils import name_document_set_sync_task
from danswer.configs.app_configs import JOB_TIMEOUT
from danswer.db.connector_credential_pair import get_connector_credential_pair
from danswer.db.deletion_attempt import check_deletion_attempt_is_allowed
from danswer.db.document import prepare_to_modify_documents
from danswer.db.document_set import delete_document_set
from danswer.db.document_set import fetch_document_sets
from danswer.db.document_set import fetch_document_sets_for_documents
from danswer.db.document_set import fetch_documents_for_document_set_paginated
from danswer.db.document_set import get_document_set_by_id
from danswer.db.document_set import mark_document_set_as_synced
from danswer.db.engine import build_connection_string
from danswer.db.engine import get_sqlalchemy_engine
from danswer.db.engine import SYNC_DB_API
from danswer.db.models import DocumentSet
from danswer.db.tasks import check_live_task_not_timed_out
from danswer.db.tasks import get_latest_task
from danswer.document_index.document_index_utils import get_both_index_names
from danswer.document_index.factory import get_default_document_index
from danswer.document_index.interfaces import UpdateRequest
from danswer.utils.logger import setup_logger
logger = setup_logger()
connection_string = build_connection_string(db_api=SYNC_DB_API)
celery_broker_url = f"sqla+{connection_string}"
celery_backend_url = f"db+{connection_string}"
celery_app = Celery(__name__, broker=celery_broker_url, backend=celery_backend_url)
_SYNC_BATCH_SIZE = 100
#####
# Tasks that need to be run in job queue, registered via APIs
#
# If imports from this module are needed, use local imports to avoid circular importing
#####
@build_celery_task_wrapper(name_cc_cleanup_task)
@celery_app.task(soft_time_limit=JOB_TIMEOUT)
def cleanup_connector_credential_pair_task(
connector_id: int,
credential_id: int,
) -> int:
"""Connector deletion task. This is run as an async task because it is a somewhat slow job.
Needs to potentially update a large number of Postgres and Vespa docs, including deleting them
or updating the ACL"""
engine = get_sqlalchemy_engine()
with Session(engine) as db_session:
# validate that the connector / credential pair is deletable
cc_pair = get_connector_credential_pair(
db_session=db_session,
connector_id=connector_id,
credential_id=credential_id,
)
if not cc_pair:
raise ValueError(
f"Cannot run deletion attempt - connector_credential_pair with Connector ID: "
f"{connector_id} and Credential ID: {credential_id} does not exist."
)
deletion_attempt_disallowed_reason = check_deletion_attempt_is_allowed(
connector_credential_pair=cc_pair, db_session=db_session
)
if deletion_attempt_disallowed_reason:
raise ValueError(deletion_attempt_disallowed_reason)
try:
# The bulk of the work is in here, updates Postgres and Vespa
curr_ind_name, sec_ind_name = get_both_index_names(db_session)
document_index = get_default_document_index(
primary_index_name=curr_ind_name, secondary_index_name=sec_ind_name
)
return delete_connector_credential_pair(
db_session=db_session,
document_index=document_index,
cc_pair=cc_pair,
)
except Exception as e:
logger.exception(f"Failed to run connector_deletion due to {e}")
raise e
@build_celery_task_wrapper(name_document_set_sync_task)
@celery_app.task(soft_time_limit=JOB_TIMEOUT)
def sync_document_set_task(document_set_id: int) -> None:
"""For document sets marked as not up to date, sync the state from postgres
into the datastore. Also handles deletions."""
def _sync_document_batch(document_ids: list[str], db_session: Session) -> None:
logger.debug(f"Syncing document sets for: {document_ids}")
# Acquires a lock on the documents so that no other process can modify them
with prepare_to_modify_documents(
db_session=db_session, document_ids=document_ids
):
# get current state of document sets for these documents
document_set_map = {
document_id: document_sets
for document_id, document_sets in fetch_document_sets_for_documents(
document_ids=document_ids, db_session=db_session
)
}
# update Vespa
curr_ind_name, sec_ind_name = get_both_index_names(db_session)
document_index = get_default_document_index(
primary_index_name=curr_ind_name, secondary_index_name=sec_ind_name
)
update_requests = [
UpdateRequest(
document_ids=[document_id],
document_sets=set(document_set_map.get(document_id, [])),
)
for document_id in document_ids
]
document_index.update(update_requests=update_requests)
with Session(get_sqlalchemy_engine()) as db_session:
try:
cursor = None
while True:
document_batch, cursor = fetch_documents_for_document_set_paginated(
document_set_id=document_set_id,
db_session=db_session,
current_only=False,
last_document_id=cursor,
limit=_SYNC_BATCH_SIZE,
)
_sync_document_batch(
document_ids=[document.id for document in document_batch],
db_session=db_session,
)
if cursor is None:
break
# if there are no connectors, then delete the document set. Otherwise, just
# mark it as successfully synced.
document_set = cast(
DocumentSet,
get_document_set_by_id(
db_session=db_session, document_set_id=document_set_id
),
) # casting since we "know" a document set with this ID exists
if not document_set.connector_credential_pairs:
delete_document_set(
document_set_row=document_set, db_session=db_session
)
logger.info(
f"Successfully deleted document set with ID: '{document_set_id}'!"
)
else:
mark_document_set_as_synced(
document_set_id=document_set_id, db_session=db_session
)
logger.info(f"Document set sync for '{document_set_id}' complete!")
except Exception:
logger.exception("Failed to sync document set %s", document_set_id)
raise
#####
# Periodic Tasks
#####
@celery_app.task(
name="check_for_document_sets_sync_task",
soft_time_limit=JOB_TIMEOUT,
)
def check_for_document_sets_sync_task() -> None:
"""Runs periodically to check if any document sets are out of sync
Creates a task to sync the set if needed"""
with Session(get_sqlalchemy_engine()) as db_session:
# check if any document sets are not synced
document_set_info = fetch_document_sets(
user_id=None, db_session=db_session, include_outdated=True
)
for document_set, _ in document_set_info:
if not document_set.is_up_to_date:
task_name = name_document_set_sync_task(document_set.id)
latest_sync = get_latest_task(task_name, db_session)
if latest_sync and check_live_task_not_timed_out(
latest_sync, db_session
):
logger.info(
f"Document set '{document_set.id}' is already syncing. Skipping."
)
continue
logger.info(f"Document set {document_set.id} syncing now!")
sync_document_set_task.apply_async(
kwargs=dict(document_set_id=document_set.id),
)
#####
# Celery Beat (Periodic Tasks) Settings
#####
celery_app.conf.beat_schedule = {
"check-for-document-set-sync": {
"task": "check_for_document_sets_sync_task",
"schedule": timedelta(seconds=5),
},
}

File diff suppressed because it is too large


@@ -0,0 +1,361 @@
# These are helper objects for tracking the keys we need to write in redis
import time
from abc import ABC
from abc import abstractmethod
from typing import cast
from uuid import uuid4
import redis
from celery import Celery
from redis import Redis
from sqlalchemy.orm import Session
from danswer.background.celery.celeryconfig import CELERY_SEPARATOR
from danswer.configs.constants import CELERY_VESPA_SYNC_BEAT_LOCK_TIMEOUT
from danswer.configs.constants import DanswerCeleryPriority
from danswer.configs.constants import DanswerCeleryQueues
from danswer.db.connector_credential_pair import get_connector_credential_pair_from_id
from danswer.db.document import construct_document_select_for_connector_credential_pair
from danswer.db.document import (
construct_document_select_for_connector_credential_pair_by_needs_sync,
)
from danswer.db.document_set import construct_document_select_by_docset
from danswer.utils.variable_functionality import fetch_versioned_implementation
class RedisObjectHelper(ABC):
PREFIX = "base"
FENCE_PREFIX = PREFIX + "_fence"
TASKSET_PREFIX = PREFIX + "_taskset"
def __init__(self, id: int):
self._id: int = id
@property
def task_id_prefix(self) -> str:
return f"{self.PREFIX}_{self._id}"
@property
def fence_key(self) -> str:
# example: documentset_fence_1
return f"{self.FENCE_PREFIX}_{self._id}"
@property
def taskset_key(self) -> str:
# example: documentset_taskset_1
return f"{self.TASKSET_PREFIX}_{self._id}"
@staticmethod
def get_id_from_fence_key(key: str) -> int | None:
"""
Extracts the object ID from a fence key in the format `PREFIX_fence_X`.
Args:
key (str): The fence key string.
Returns:
Optional[int]: The extracted ID if the key is in the correct format, otherwise None.
"""
parts = key.split("_")
if len(parts) != 3:
return None
try:
object_id = int(parts[2])
except ValueError:
return None
return object_id
@staticmethod
def get_id_from_task_id(task_id: str) -> int | None:
"""
Extracts the object ID from a task ID string.
This method assumes the task ID is formatted as `prefix_objectid_suffix`, where:
- `prefix` is an arbitrary string (e.g., the name of the task or entity),
- `objectid` is the ID you want to extract,
- `suffix` is another arbitrary string (e.g., a UUID).
Example:
If the input `task_id` is `documentset_1_cbfdc96a-80ca-4312-a242-0bb68da3c1dc`,
this method will return the integer 1.
Args:
task_id (str): The task ID string from which to extract the object ID.
Returns:
int | None: The extracted object ID if the task ID is in the correct format, otherwise None.
"""
# example: task_id=documentset_1_cbfdc96a-80ca-4312-a242-0bb68da3c1dc
parts = task_id.split("_")
if len(parts) != 3:
return None
try:
object_id = int(parts[1])
except ValueError:
return None
return object_id
@abstractmethod
def generate_tasks(
self,
celery_app: Celery,
db_session: Session,
redis_client: Redis,
lock: redis.lock.Lock,
) -> int | None:
pass
class RedisDocumentSet(RedisObjectHelper):
PREFIX = "documentset"
FENCE_PREFIX = PREFIX + "_fence"
TASKSET_PREFIX = PREFIX + "_taskset"
def generate_tasks(
self,
celery_app: Celery,
db_session: Session,
redis_client: Redis,
lock: redis.lock.Lock,
) -> int | None:
last_lock_time = time.monotonic()
async_results = []
stmt = construct_document_select_by_docset(self._id, current_only=False)
for doc in db_session.scalars(stmt).yield_per(1):
current_time = time.monotonic()
if current_time - last_lock_time >= (
CELERY_VESPA_SYNC_BEAT_LOCK_TIMEOUT / 4
):
lock.reacquire()
last_lock_time = current_time
# celery's default task id format is "dd32ded3-00aa-4884-8b21-42f8332e7fac"
# the key for the result is "celery-task-meta-dd32ded3-00aa-4884-8b21-42f8332e7fac"
# we prefix the task id so it's easier to keep track of who created the task
# aka "documentset_1_6dd32ded3-00aa-4884-8b21-42f8332e7fac"
custom_task_id = f"{self.task_id_prefix}_{uuid4()}"
# add to the set BEFORE creating the task.
redis_client.sadd(self.taskset_key, custom_task_id)
result = celery_app.send_task(
"vespa_metadata_sync_task",
kwargs=dict(document_id=doc.id),
queue=DanswerCeleryQueues.VESPA_METADATA_SYNC,
task_id=custom_task_id,
priority=DanswerCeleryPriority.LOW,
)
async_results.append(result)
return len(async_results)
class RedisUserGroup(RedisObjectHelper):
PREFIX = "usergroup"
FENCE_PREFIX = PREFIX + "_fence"
TASKSET_PREFIX = PREFIX + "_taskset"
def generate_tasks(
self,
celery_app: Celery,
db_session: Session,
redis_client: Redis,
lock: redis.lock.Lock,
) -> int | None:
last_lock_time = time.monotonic()
async_results = []
try:
construct_document_select_by_usergroup = fetch_versioned_implementation(
"danswer.db.user_group",
"construct_document_select_by_usergroup",
)
except ModuleNotFoundError:
return 0
stmt = construct_document_select_by_usergroup(self._id)
for doc in db_session.scalars(stmt).yield_per(1):
current_time = time.monotonic()
if current_time - last_lock_time >= (
CELERY_VESPA_SYNC_BEAT_LOCK_TIMEOUT / 4
):
lock.reacquire()
last_lock_time = current_time
# celery's default task id format is "dd32ded3-00aa-4884-8b21-42f8332e7fac"
# the key for the result is "celery-task-meta-dd32ded3-00aa-4884-8b21-42f8332e7fac"
# we prefix the task id so it's easier to keep track of who created the task
# aka "documentset_1_6dd32ded3-00aa-4884-8b21-42f8332e7fac"
custom_task_id = f"{self.task_id_prefix}_{uuid4()}"
# add to the set BEFORE creating the task.
redis_client.sadd(self.taskset_key, custom_task_id)
result = celery_app.send_task(
"vespa_metadata_sync_task",
kwargs=dict(document_id=doc.id),
queue=DanswerCeleryQueues.VESPA_METADATA_SYNC,
task_id=custom_task_id,
priority=DanswerCeleryPriority.LOW,
)
async_results.append(result)
return len(async_results)
class RedisConnectorCredentialPair(RedisObjectHelper):
"""This class differs from the default in that the taskset used spans
all connectors and is not per connector."""
PREFIX = "connectorsync"
FENCE_PREFIX = PREFIX + "_fence"
TASKSET_PREFIX = PREFIX + "_taskset"
@classmethod
def get_fence_key(cls) -> str:
return RedisConnectorCredentialPair.FENCE_PREFIX
@classmethod
def get_taskset_key(cls) -> str:
return RedisConnectorCredentialPair.TASKSET_PREFIX
@property
def taskset_key(self) -> str:
"""Notice that this is intentionally reusing the same taskset for all
connector syncs"""
# example: connectorsync_taskset
return f"{self.TASKSET_PREFIX}"
def generate_tasks(
self,
celery_app: Celery,
db_session: Session,
redis_client: Redis,
lock: redis.lock.Lock,
) -> int | None:
last_lock_time = time.monotonic()
async_results = []
cc_pair = get_connector_credential_pair_from_id(self._id, db_session)
if not cc_pair:
return None
stmt = construct_document_select_for_connector_credential_pair_by_needs_sync(
cc_pair.connector_id, cc_pair.credential_id
)
for doc in db_session.scalars(stmt).yield_per(1):
current_time = time.monotonic()
if current_time - last_lock_time >= (
CELERY_VESPA_SYNC_BEAT_LOCK_TIMEOUT / 4
):
lock.reacquire()
last_lock_time = current_time
# celery's default task id format is "dd32ded3-00aa-4884-8b21-42f8332e7fac"
# the key for the result is "celery-task-meta-dd32ded3-00aa-4884-8b21-42f8332e7fac"
# we prefix the task id so it's easier to keep track of who created the task
# aka "documentset_1_6dd32ded3-00aa-4884-8b21-42f8332e7fac"
custom_task_id = f"{self.task_id_prefix}_{uuid4()}"
# add to the tracking taskset in redis BEFORE creating the celery task.
# note that for the moment we are using a single taskset key, not differentiated by cc_pair id
redis_client.sadd(
RedisConnectorCredentialPair.get_taskset_key(), custom_task_id
)
# Priority on syncs triggered by new indexing should be medium
result = celery_app.send_task(
"vespa_metadata_sync_task",
kwargs=dict(document_id=doc.id),
queue=DanswerCeleryQueues.VESPA_METADATA_SYNC,
task_id=custom_task_id,
priority=DanswerCeleryPriority.MEDIUM,
)
async_results.append(result)
return len(async_results)
class RedisConnectorDeletion(RedisObjectHelper):
PREFIX = "connectordeletion"
FENCE_PREFIX = PREFIX + "_fence"
TASKSET_PREFIX = PREFIX + "_taskset"
def generate_tasks(
self,
celery_app: Celery,
db_session: Session,
redis_client: Redis,
lock: redis.lock.Lock,
) -> int | None:
last_lock_time = time.monotonic()
async_results = []
cc_pair = get_connector_credential_pair_from_id(self._id, db_session)
if not cc_pair:
return None
stmt = construct_document_select_for_connector_credential_pair(
cc_pair.connector_id, cc_pair.credential_id
)
for doc in db_session.scalars(stmt).yield_per(1):
current_time = time.monotonic()
if current_time - last_lock_time >= (
CELERY_VESPA_SYNC_BEAT_LOCK_TIMEOUT / 4
):
lock.reacquire()
last_lock_time = current_time
# celery's default task id format is "dd32ded3-00aa-4884-8b21-42f8332e7fac"
# the actual redis key is "celery-task-meta-dd32ded3-00aa-4884-8b21-42f8332e7fac"
# we prefix the task id so it's easier to keep track of who created the task
# aka "documentset_1_6dd32ded3-00aa-4884-8b21-42f8332e7fac"
custom_task_id = f"{self.task_id_prefix}_{uuid4()}"
# add to the tracking taskset in redis BEFORE creating the celery task.
# note that for the moment we are using a single taskset key, not differentiated by cc_pair id
redis_client.sadd(self.taskset_key, custom_task_id)
# Priority on syncs triggered by new indexing should be medium
result = celery_app.send_task(
"document_by_cc_pair_cleanup_task",
kwargs=dict(
document_id=doc.id,
connector_id=cc_pair.connector_id,
credential_id=cc_pair.credential_id,
),
queue=DanswerCeleryQueues.CONNECTOR_DELETION,
task_id=custom_task_id,
priority=DanswerCeleryPriority.MEDIUM,
)
async_results.append(result)
return len(async_results)
def celery_get_queue_length(queue: str, r: Redis) -> int:
"""This is a redis specific way to get the length of a celery queue.
It is priority aware and knows how to count across the multiple redis lists
used to implement task prioritization.
This operation is not atomic."""
total_length = 0
for i in range(len(DanswerCeleryPriority)):
queue_name = queue
if i > 0:
queue_name += CELERY_SEPARATOR
queue_name += str(i)
length = r.llen(queue_name)
total_length += cast(int, length)
return total_length
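
A usage sketch for the helper above, assuming a local Redis instance and a hypothetical queue name standing in for one of the DanswerCeleryQueues values:

import redis

r = redis.Redis(host="localhost", port=6379, db=15)
# Sums the base list plus each priority-suffixed list ("queue:1", "queue:2", ...)
length = celery_get_queue_length("vespa_metadata_sync", r)
print(f"vespa_metadata_sync backlog: {length}")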


@@ -0,0 +1,9 @@
"""Entry point for running celery worker / celery beat."""
from danswer.utils.variable_functionality import fetch_versioned_implementation
from danswer.utils.variable_functionality import set_is_ee_based_on_env_variable
set_is_ee_based_on_env_variable()
celery_app = fetch_versioned_implementation(
"danswer.background.celery.celery_app", "celery_app"
)


@@ -1,23 +1,143 @@
from datetime import datetime
from datetime import timezone
from sqlalchemy.orm import Session
from danswer.background.task_utils import name_cc_cleanup_task
from danswer.background.celery.celery_redis import RedisConnectorDeletion
from danswer.background.task_utils import name_cc_prune_task
from danswer.configs.app_configs import ALLOW_SIMULTANEOUS_PRUNING
from danswer.configs.app_configs import MAX_PRUNING_DOCUMENT_RETRIEVAL_PER_MINUTE
from danswer.connectors.cross_connector_utils.rate_limit_wrapper import (
rate_limit_builder,
)
from danswer.connectors.interfaces import BaseConnector
from danswer.connectors.interfaces import IdConnector
from danswer.connectors.interfaces import LoadConnector
from danswer.connectors.interfaces import PollConnector
from danswer.connectors.models import Document
from danswer.db.connector_credential_pair import get_connector_credential_pair
from danswer.db.engine import get_db_current_time
from danswer.db.enums import TaskStatus
from danswer.db.models import Connector
from danswer.db.models import Credential
from danswer.db.models import TaskQueueState
from danswer.db.tasks import check_task_is_live_and_not_timed_out
from danswer.db.tasks import get_latest_task
from danswer.db.tasks import get_latest_task_by_type
from danswer.redis.redis_pool import RedisPool
from danswer.server.documents.models import DeletionAttemptSnapshot
from danswer.utils.logger import setup_logger
logger = setup_logger()
redis_pool = RedisPool()
def get_deletion_status(
def _get_deletion_status(
connector_id: int, credential_id: int, db_session: Session
) -> TaskQueueState | None:
"""We no longer store TaskQueueState in the DB for a deletion attempt.
This function populates TaskQueueState by just checking redis.
"""
cc_pair = get_connector_credential_pair(
connector_id=connector_id, credential_id=credential_id, db_session=db_session
)
if not cc_pair:
return None
rcd = RedisConnectorDeletion(cc_pair.id)
r = redis_pool.get_client()
if not r.exists(rcd.fence_key):
return None
return TaskQueueState(
task_id="", task_name=rcd.fence_key, status=TaskStatus.STARTED
)
def get_deletion_attempt_snapshot(
connector_id: int, credential_id: int, db_session: Session
) -> DeletionAttemptSnapshot | None:
cleanup_task_name = name_cc_cleanup_task(
connector_id=connector_id, credential_id=credential_id
)
task_state = get_latest_task(task_name=cleanup_task_name, db_session=db_session)
if not task_state:
deletion_task = _get_deletion_status(connector_id, credential_id, db_session)
if not deletion_task:
return None
return DeletionAttemptSnapshot(
connector_id=connector_id,
credential_id=credential_id,
status=task_state.status,
status=deletion_task.status,
)
def should_prune_cc_pair(
connector: Connector, credential: Credential, db_session: Session
) -> bool:
if not connector.prune_freq:
return False
pruning_task_name = name_cc_prune_task(
connector_id=connector.id, credential_id=credential.id
)
last_pruning_task = get_latest_task(pruning_task_name, db_session)
current_db_time = get_db_current_time(db_session)
if not last_pruning_task:
time_since_initialization = current_db_time - connector.time_created
if time_since_initialization.total_seconds() >= connector.prune_freq:
return True
return False
if not ALLOW_SIMULTANEOUS_PRUNING:
pruning_type_task_name = name_cc_prune_task()
last_pruning_type_task = get_latest_task_by_type(
pruning_type_task_name, db_session
)
if last_pruning_type_task and check_task_is_live_and_not_timed_out(
last_pruning_type_task, db_session
):
return False
if check_task_is_live_and_not_timed_out(last_pruning_task, db_session):
return False
if not last_pruning_task.start_time:
return False
time_since_last_pruning = current_db_time - last_pruning_task.start_time
return time_since_last_pruning.total_seconds() >= connector.prune_freq
def document_batch_to_ids(doc_batch: list[Document]) -> set[str]:
return {doc.id for doc in doc_batch}
def extract_ids_from_runnable_connector(runnable_connector: BaseConnector) -> set[str]:
"""
If the PruneConnector hasn't been implemented for the given connector, just pull
all docs using load_from_state and extract the IDs
"""
all_connector_doc_ids: set[str] = set()
doc_batch_generator = None
if isinstance(runnable_connector, IdConnector):
all_connector_doc_ids = runnable_connector.retrieve_all_source_ids()
elif isinstance(runnable_connector, LoadConnector):
doc_batch_generator = runnable_connector.load_from_state()
elif isinstance(runnable_connector, PollConnector):
start = datetime(1970, 1, 1, tzinfo=timezone.utc).timestamp()
end = datetime.now(timezone.utc).timestamp()
doc_batch_generator = runnable_connector.poll_source(start=start, end=end)
else:
raise RuntimeError("Pruning job could not find a valid runnable_connector.")
if doc_batch_generator:
doc_batch_processing_func = document_batch_to_ids
if MAX_PRUNING_DOCUMENT_RETRIEVAL_PER_MINUTE:
doc_batch_processing_func = rate_limit_builder(
max_calls=MAX_PRUNING_DOCUMENT_RETRIEVAL_PER_MINUTE, period=60
)(document_batch_to_ids)
for doc_batch in doc_batch_generator:
all_connector_doc_ids.update(doc_batch_processing_func(doc_batch))
return all_connector_doc_ids
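
The pruning path above optionally wraps the batch-to-ids function in a rate limiter. A sketch of that wrapping with an assumed limit (50 calls/minute stands in for MAX_PRUNING_DOCUMENT_RETRIEVAL_PER_MINUTE):

limited_batch_to_ids = rate_limit_builder(max_calls=50, period=60)(document_batch_to_ids)

all_ids: set[str] = set()
for batch in doc_batch_generator:  # batches yielded by the connector, as above
    all_ids.update(limited_batch_to_ids(batch))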


@@ -0,0 +1,76 @@
# docs: https://docs.celeryq.dev/en/stable/userguide/configuration.html
from danswer.configs.app_configs import CELERY_RESULT_EXPIRES
from danswer.configs.app_configs import REDIS_DB_NUMBER_CELERY
from danswer.configs.app_configs import REDIS_DB_NUMBER_CELERY_RESULT_BACKEND
from danswer.configs.app_configs import REDIS_HOST
from danswer.configs.app_configs import REDIS_PASSWORD
from danswer.configs.app_configs import REDIS_PORT
from danswer.configs.app_configs import REDIS_SSL
from danswer.configs.app_configs import REDIS_SSL_CA_CERTS
from danswer.configs.app_configs import REDIS_SSL_CERT_REQS
from danswer.configs.constants import DanswerCeleryPriority
CELERY_SEPARATOR = ":"
CELERY_PASSWORD_PART = ""
if REDIS_PASSWORD:
CELERY_PASSWORD_PART = f":{REDIS_PASSWORD}@"
REDIS_SCHEME = "redis"
# SSL-specific query parameters for Redis URL
SSL_QUERY_PARAMS = ""
if REDIS_SSL:
REDIS_SCHEME = "rediss"
SSL_QUERY_PARAMS = f"?ssl_cert_reqs={REDIS_SSL_CERT_REQS}"
if REDIS_SSL_CA_CERTS:
SSL_QUERY_PARAMS += f"&ssl_ca_certs={REDIS_SSL_CA_CERTS}"
# example celery_broker_url: "redis://:password@localhost:6379/15"
broker_url = f"{REDIS_SCHEME}://{CELERY_PASSWORD_PART}{REDIS_HOST}:{REDIS_PORT}/{REDIS_DB_NUMBER_CELERY}{SSL_QUERY_PARAMS}"
result_backend = f"{REDIS_SCHEME}://{CELERY_PASSWORD_PART}{REDIS_HOST}:{REDIS_PORT}/{REDIS_DB_NUMBER_CELERY_RESULT_BACKEND}{SSL_QUERY_PARAMS}"
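As an illustration with hypothetical values (host redis.example.com, password mypassword, REDIS_SSL_CERT_REQS set to required, DB number 15), the assembled SSL broker URL would look like:

rediss://:mypassword@redis.example.com:6379/15?ssl_cert_reqs=required&ssl_ca_certs=/etc/ssl/certs/ca.pem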
# NOTE: prefetch 4 is significantly faster than prefetch 1 for small tasks
# however, prefetching is bad when tasks are lengthy as those tasks
# can stall other tasks.
worker_prefetch_multiplier = 4
broker_transport_options = {
"priority_steps": list(range(len(DanswerCeleryPriority))),
"sep": CELERY_SEPARATOR,
"queue_order_strategy": "priority",
}
task_default_priority = DanswerCeleryPriority.MEDIUM
task_acks_late = True
# It's possible we don't even need celery's result backend, in which case all of the optimization below
# might be irrelevant
result_expires = CELERY_RESULT_EXPIRES # 86400 seconds is the default
# Option 0: Defaults (json serializer, no compression)
# about 1.5 KB per queued task: ~1 KB in the queue, ~400 B for the result, ~100 B as a child entry in the generator result
# Option 1: Reduces generator task result sizes by roughly 20%
# task_compression = "bzip2"
# task_serializer = "pickle"
# result_compression = "bzip2"
# result_serializer = "pickle"
# accept_content=["pickle"]
# Option 2: this significantly reduces the size of the result for generator tasks since the list of children
# can be large; small tasks change very little
# def pickle_bz2_encoder(data):
# return bz2.compress(pickle.dumps(data))
# def pickle_bz2_decoder(data):
# return pickle.loads(bz2.decompress(data))
# from kombu import serialization # To register custom serialization with Celery/Kombu
# serialization.register('pickle-bzip2', pickle_bz2_encoder, pickle_bz2_decoder, 'application/x-pickle-bz2', 'binary')
# task_serializer = "pickle-bzip2"
# result_serializer = "pickle-bzip2"
# accept_content=["pickle", "pickle-bzip2"]

View File

@@ -10,27 +10,15 @@ are multiple connector / credential pairs that have indexed it
connector / credential pair from the access list
(6) delete all relevant entries from postgres
"""
import time
from sqlalchemy.orm import Session
from danswer.access.access import get_access_for_documents
from danswer.db.connector import fetch_connector_by_id
from danswer.db.connector_credential_pair import (
delete_connector_credential_pair__no_commit,
)
from danswer.db.document import delete_documents_by_connector_credential_pair__no_commit
from danswer.db.document import delete_documents_complete__no_commit
from danswer.db.document import get_documents_for_connector_credential_pair
from danswer.db.document import get_document_connector_counts
from danswer.db.document import prepare_to_modify_documents
from danswer.db.document_set import get_document_sets_by_ids
from danswer.db.document_set import (
mark_cc_pair__document_set_relationships_to_be_deleted__no_commit,
)
from danswer.db.document_set import fetch_document_sets_for_documents
from danswer.db.engine import get_sqlalchemy_engine
from danswer.db.index_attempt import delete_index_attempts
from danswer.db.models import ConnectorCredentialPair
from danswer.document_index.interfaces import DocumentIndex
from danswer.document_index.interfaces import UpdateRequest
from danswer.server.documents.models import ConnectorCredentialPairIdentifier
@@ -41,7 +29,7 @@ logger = setup_logger()
_DELETION_BATCH_SIZE = 1000
def delete_connector_credential_pair_batch(
document_ids: list[str],
connector_id: int,
credential_id: int,
@@ -57,13 +45,15 @@ def _delete_connector_credential_pair_batch(
with prepare_to_modify_documents(
db_session=db_session, document_ids=document_ids
):
        document_connector_counts = get_document_connector_counts(
db_session=db_session, document_ids=document_ids
)
# figure out which docs need to be completely deleted
document_ids_to_delete = [
            document_id
            for document_id, cnt in document_connector_counts
            if cnt == 1
]
logger.debug(f"Deleting documents: {document_ids_to_delete}")
@@ -76,28 +66,40 @@ def _delete_connector_credential_pair_batch(
# figure out which docs need to be updated
document_ids_to_update = [
document_id for document_id, cnt in document_connector_counts if cnt > 1
]
        # maps document id to the set of document set names it belongs to
new_doc_sets_for_documents: dict[str, set[str]] = {
document_id_and_document_set_names_tuple[0]: set(
document_id_and_document_set_names_tuple[1]
)
for document_id_and_document_set_names_tuple in fetch_document_sets_for_documents(
db_session=db_session,
document_ids=document_ids_to_update,
)
}
# determine future ACLs for documents in batch
access_for_documents = get_access_for_documents(
document_ids=document_ids_to_update,
db_session=db_session,
cc_pair_to_delete=ConnectorCredentialPairIdentifier(
connector_id=connector_id,
credential_id=credential_id,
),
)
# update Vespa
logger.debug(f"Updating documents: {document_ids_to_update}")
update_requests = [
UpdateRequest(
document_ids=[document_id],
access=access,
document_sets=new_doc_sets_for_documents[document_id],
)
for document_id, access in access_for_documents.items()
]
logger.debug(f"Updating documents: {document_ids_to_update}")
document_index.update(update_requests=update_requests)
# clean up Postgres
delete_documents_by_connector_credential_pair__no_commit(
db_session=db_session,
document_ids=document_ids_to_update,
connector_credential_pair_identifier=ConnectorCredentialPairIdentifier(
@@ -106,105 +108,3 @@ def _delete_connector_credential_pair_batch(
),
)
db_session.commit()
def cleanup_synced_entities(
cc_pair: ConnectorCredentialPair, db_session: Session
) -> None:
"""Updates the document sets associated with the connector / credential pair,
then relies on the document set sync script to kick off Celery jobs which will
sync these updates to Vespa.
Waits until the document sets are synced before returning."""
logger.info(f"Cleaning up Document Sets for CC Pair with ID: '{cc_pair.id}'")
document_sets_ids_to_sync = list(
mark_cc_pair__document_set_relationships_to_be_deleted__no_commit(
cc_pair_id=cc_pair.id,
db_session=db_session,
)
)
db_session.commit()
# wait till all document sets are synced before continuing
while True:
all_synced = True
document_sets = get_document_sets_by_ids(
db_session=db_session, document_set_ids=document_sets_ids_to_sync
)
for document_set in document_sets:
if not document_set.is_up_to_date:
all_synced = False
if all_synced:
break
# wait for 30 seconds before checking again
db_session.commit() # end transaction
logger.info(
f"Document sets '{document_sets_ids_to_sync}' not synced yet, waiting 30s"
)
time.sleep(30)
logger.info(
f"Finished cleaning up Document Sets for CC Pair with ID: '{cc_pair.id}'"
)
def delete_connector_credential_pair(
db_session: Session,
document_index: DocumentIndex,
cc_pair: ConnectorCredentialPair,
) -> int:
connector_id = cc_pair.connector_id
credential_id = cc_pair.credential_id
num_docs_deleted = 0
while True:
documents = get_documents_for_connector_credential_pair(
db_session=db_session,
connector_id=connector_id,
credential_id=credential_id,
limit=_DELETION_BATCH_SIZE,
)
if not documents:
break
_delete_connector_credential_pair_batch(
document_ids=[document.id for document in documents],
connector_id=connector_id,
credential_id=credential_id,
document_index=document_index,
)
num_docs_deleted += len(documents)
# Clean up document sets / access information from Postgres
# and sync these updates to Vespa
# TODO: add user group cleanup with `fetch_versioned_implementation`
cleanup_synced_entities(cc_pair, db_session)
# clean up the rest of the related Postgres entities
delete_index_attempts(
db_session=db_session,
connector_id=connector_id,
credential_id=credential_id,
)
delete_connector_credential_pair__no_commit(
db_session=db_session,
connector_id=connector_id,
credential_id=credential_id,
)
# if there are no credentials left, delete the connector
connector = fetch_connector_by_id(
db_session=db_session,
connector_id=connector_id,
)
if not connector or not len(connector.credentials):
logger.debug("Found no credentials left for connector, deleting connector")
db_session.delete(connector)
db_session.commit()
logger.info(
"Successfully deleted connector_credential_pair with connector_id:"
f" '{connector_id}' and credential_id: '{credential_id}'. Deleted {num_docs_deleted} docs."
)
return num_docs_deleted

View File

@@ -41,6 +41,12 @@ def _initializer(
return func(*args, **kwargs)
def _run_in_process(
func: Callable, args: list | tuple, kwargs: dict[str, Any] | None = None
) -> None:
_initializer(func, args, kwargs)
@dataclass
class SimpleJob:
"""Drop in replacement for `dask.distributed.Future`"""
@@ -105,13 +111,15 @@ class SimpleJobClient:
"""NOTE: `pure` arg is needed so this can be a drop in replacement for Dask"""
self._cleanup_completed_jobs()
if len(self.jobs) >= self.n_workers:
logger.debug("No available workers to run job")
logger.debug(
f"No available workers to run job. Currently running '{len(self.jobs)}' jobs, with a limit of '{self.n_workers}'."
)
return None
job_id = self.job_id_counter
self.job_id_counter += 1
process = Process(target=_run_in_process, args=(func, args), daemon=True)
job = SimpleJob(id=job_id, process=process)
process.start()
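The `Process(target=...)` change above fixes a subtle bug: `target=_initializer(func=func, args=args)` invokes `_initializer` immediately in the parent process and hands its return value to `Process`, so nothing runs in the child. A minimal standalone illustration (the `work` function is a toy example):

from multiprocessing import Process


def work(n: int) -> None:
    print(f"computed {n * n}")


if __name__ == "__main__":
    # wrong: `work(3)` executes right here in the parent; Process receives
    # its return value (None) as the target and the child does nothing
    broken = Process(target=work(3), daemon=True)

    # right: the child invokes `work(3)` only after start()
    fixed = Process(target=work, args=(3,), daemon=True)
    fixed.start()
    fixed.join()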

View File

@@ -6,27 +6,22 @@ from datetime import timezone
from sqlalchemy.orm import Session
from danswer.background.connector_deletion import (
_delete_connector_credential_pair_batch,
)
from danswer.background.indexing.checkpointing import get_time_windows_for_index_attempt
from danswer.configs.app_configs import DISABLE_DOCUMENT_CLEANUP
from danswer.background.indexing.tracer import DanswerTracer
from danswer.configs.app_configs import INDEXING_SIZE_WARNING_THRESHOLD
from danswer.configs.app_configs import INDEXING_TRACER_INTERVAL
from danswer.configs.app_configs import POLL_CONNECTOR_OFFSET
from danswer.connectors.connector_runner import ConnectorRunner
from danswer.connectors.factory import instantiate_connector
from danswer.connectors.interfaces import GenerateDocumentsOutput
from danswer.connectors.interfaces import LoadConnector
from danswer.connectors.interfaces import PollConnector
from danswer.connectors.models import IndexAttemptMetadata
from danswer.connectors.models import InputType
from danswer.db.connector import disable_connector
from danswer.db.connector_credential_pair import get_last_successful_attempt_time
from danswer.db.connector_credential_pair import update_connector_credential_pair
from danswer.db.credentials import backend_update_credential_json
from danswer.db.document import get_documents_for_connector_credential_pair
from danswer.db.engine import get_sqlalchemy_engine
from danswer.db.enums import ConnectorCredentialPairStatus
from danswer.db.index_attempt import get_index_attempt
from danswer.db.index_attempt import mark_attempt_failed
from danswer.db.index_attempt import mark_attempt_in_progress
from danswer.db.index_attempt import mark_attempt_partially_succeeded
from danswer.db.index_attempt import mark_attempt_succeeded
from danswer.db.index_attempt import update_docs_indexed
from danswer.db.models import IndexAttempt
@@ -37,16 +32,19 @@ from danswer.indexing.embedder import DefaultIndexingEmbedder
from danswer.indexing.indexing_pipeline import build_indexing_pipeline
from danswer.utils.logger import IndexAttemptSingleton
from danswer.utils.logger import setup_logger
from danswer.utils.variable_functionality import global_version
logger = setup_logger()
INDEXING_TRACER_NUM_PRINT_ENTRIES = 5
def _get_connector_runner(
db_session: Session,
attempt: IndexAttempt,
start_time: datetime,
end_time: datetime,
) -> ConnectorRunner:
"""
NOTE: `start_time` and `end_time` are only used for poll connectors
@@ -54,47 +52,31 @@ def _get_document_generator(
    are the complete list of existing documents of the connector. If the task
    is of type LOAD_STATE, the list will be considered complete; otherwise it is incomplete.
"""
task = attempt.connector_credential_pair.connector.input_type
try:
runnable_connector = instantiate_connector(
db_session=db_session,
source=attempt.connector_credential_pair.connector.source,
input_type=task,
connector_specific_config=attempt.connector_credential_pair.connector.connector_specific_config,
credential=attempt.connector_credential_pair.credential,
)
except Exception as e:
logger.exception(f"Unable to instantiate connector due to {e}")
# since we failed to even instantiate the connector, we pause the CCPair since
# it will never succeed
update_connector_credential_pair(
db_session=db_session,
connector_id=attempt.connector_credential_pair.connector.id,
credential_id=attempt.connector_credential_pair.credential.id,
status=ConnectorCredentialPairStatus.PAUSED,
)
raise e
return ConnectorRunner(
connector=runnable_connector, time_range=(start_time, end_time)
)
def _run_indexing(
@@ -108,46 +90,62 @@ def _run_indexing(
"""
start_time = time.time()
search_settings = index_attempt.search_settings
index_name = search_settings.index_name
# Only update cc-pair status for primary index jobs
# Secondary index syncs at the end when swapping
is_primary = search_settings.status == IndexModelStatus.PRESENT
# Indexing is only done into one index at a time
document_index = get_default_document_index(
primary_index_name=index_name, secondary_index_name=None
)
embedding_model = DefaultIndexingEmbedder.from_db_search_settings(
search_settings=search_settings
)
indexing_pipeline = build_indexing_pipeline(
attempt_id=index_attempt.id,
embedder=embedding_model,
document_index=document_index,
ignore_time_skip=index_attempt.from_beginning
or (search_settings.status == IndexModelStatus.FUTURE),
db_session=db_session,
)
db_cc_pair = index_attempt.connector_credential_pair
db_connector = index_attempt.connector_credential_pair.connector
db_credential = index_attempt.connector_credential_pair.credential
earliest_index_time = (
db_connector.indexing_start.timestamp() if db_connector.indexing_start else 0
)
last_successful_index_time = (
earliest_index_time
if index_attempt.from_beginning
else get_last_successful_attempt_time(
connector_id=db_connector.id,
credential_id=db_credential.id,
earliest_index=earliest_index_time,
search_settings=index_attempt.search_settings,
db_session=db_session,
)
)
if INDEXING_TRACER_INTERVAL > 0:
logger.debug(f"Memory tracer starting: interval={INDEXING_TRACER_INTERVAL}")
tracer = DanswerTracer()
tracer.start()
tracer.snap()
index_attempt_md = IndexAttemptMetadata(
connector_id=db_connector.id,
credential_id=db_credential.id,
)
batch_num = 0
net_doc_change = 0
document_count = 0
chunk_count = 0
@@ -166,7 +164,7 @@ def _run_indexing(
datetime(1970, 1, 1, tzinfo=timezone.utc),
)
connector_runner = _get_connector_runner(
db_session=db_session,
attempt=index_attempt,
start_time=window_start,
@@ -174,15 +172,23 @@ def _run_indexing(
)
tracer_counter = 0
if INDEXING_TRACER_INTERVAL > 0:
tracer.snap()
for doc_batch in connector_runner.run():
            # Check if the connector was disabled mid-run and stop if so, unless it's the
            # secondary index being built. We want to populate that index even for paused
            # connectors, since paused sources are often updated infrequently but their
            # contents still need to be pulled at least once.
db_session.refresh(db_connector)
if (
(
db_cc_pair.status == ConnectorCredentialPairStatus.PAUSED
and search_settings.status != IndexModelStatus.FUTURE
)
# if it's deleting, we don't care if this is a secondary index
or db_cc_pair.status == ConnectorCredentialPairStatus.DELETING
):
# let the `except` block handle this
raise RuntimeError("Connector was disabled mid run")
@@ -192,17 +198,30 @@ def _run_indexing(
# Likely due to user manually disabling it or model swap
raise RuntimeError("Index Attempt was canceled")
batch_description = []
for doc in doc_batch:
batch_description.append(doc.to_short_descriptor())
doc_size = 0
for section in doc.sections:
doc_size += len(section.text)
if doc_size > INDEXING_SIZE_WARNING_THRESHOLD:
logger.warning(
f"Document size: doc='{doc.to_short_descriptor()}' "
f"size={doc_size} "
f"threshold={INDEXING_SIZE_WARNING_THRESHOLD}"
)
logger.debug(f"Indexing batch of documents: {batch_description}")
index_attempt_md.batch_num = batch_num + 1 # use 1-index for this
new_docs, total_batch_chunks = indexing_pipeline(
document_batch=doc_batch,
index_attempt_metadata=index_attempt_md,
)
batch_num += 1
net_doc_change += new_docs
chunk_count += total_batch_chunks
document_count += len(doc_batch)
@@ -224,38 +243,16 @@ def _run_indexing(
docs_removed_from_index=0,
)
            tracer_counter += 1
            if (
                INDEXING_TRACER_INTERVAL > 0
                and tracer_counter % INDEXING_TRACER_INTERVAL == 0
            ):
                logger.debug(
                    f"Running trace comparison for batch {tracer_counter}. interval={INDEXING_TRACER_INTERVAL}"
                )
                tracer.snap()
                tracer.log_previous_diff(INDEXING_TRACER_NUM_PRINT_ENTRIES)
run_end_dt = window_end
if is_primary:
@@ -267,7 +264,7 @@ def _run_indexing(
run_dt=run_end_dt,
)
except Exception as e:
logger.exception(
f"Connector run ran into exception after elapsed time: {time.time() - start_time} seconds"
)
# Only mark the attempt as a complete failure if this is the first indexing window.
@@ -279,7 +276,7 @@ def _run_indexing(
# to give better clarity in the UI, as the next run will never happen.
if (
ind == 0
or not db_cc_pair.status.is_active()
or index_attempt.status != IndexingStatus.IN_PROGRESS
):
mark_attempt_failed(
@@ -291,17 +288,66 @@ def _run_indexing(
if is_primary:
update_connector_credential_pair(
db_session=db_session,
connector_id=db_connector.id,
credential_id=db_credential.id,
net_docs=net_doc_change,
)
if INDEXING_TRACER_INTERVAL > 0:
tracer.stop()
raise e
# break => similar to success case. As mentioned above, if the next run fails for the same
# reason it will then be marked as a failure
break
if INDEXING_TRACER_INTERVAL > 0:
logger.debug(
f"Running trace comparison between start and end of indexing. {tracer_counter} batches processed."
)
tracer.snap()
tracer.log_first_diff(INDEXING_TRACER_NUM_PRINT_ENTRIES)
tracer.stop()
logger.debug("Memory tracer stopped.")
if (
index_attempt_md.num_exceptions > 0
and index_attempt_md.num_exceptions >= batch_num
):
mark_attempt_failed(
index_attempt,
db_session,
failure_reason="All batches exceptioned.",
)
if is_primary:
update_connector_credential_pair(
db_session=db_session,
connector_id=index_attempt.connector_credential_pair.connector.id,
credential_id=index_attempt.connector_credential_pair.credential.id,
)
raise Exception(
f"Connector failed - All batches exceptioned: batches={batch_num}"
)
elapsed_time = time.time() - start_time
if index_attempt_md.num_exceptions == 0:
mark_attempt_succeeded(index_attempt, db_session)
logger.info(
f"Connector succeeded: "
f"docs={document_count} chunks={chunk_count} elapsed={elapsed_time:.2f}s"
)
else:
mark_attempt_partially_succeeded(index_attempt, db_session)
logger.info(
f"Connector completed with some errors: "
f"exceptions={index_attempt_md.num_exceptions} "
f"batches={batch_num} "
f"docs={document_count} "
f"chunks={chunk_count} "
f"elapsed={elapsed_time:.2f}s"
)
if is_primary:
update_connector_credential_pair(
db_session=db_session,
@@ -310,13 +356,6 @@ def _run_indexing(
run_dt=run_end_dt,
)
def _prepare_index_attempt(db_session: Session, index_attempt_id: int) -> IndexAttempt:
# make sure that the index attempt can't change in between checking the
@@ -329,6 +368,7 @@ def _prepare_index_attempt(db_session: Session, index_attempt_id: int) -> IndexA
db_session=db_session,
index_attempt_id=index_attempt_id,
)
if attempt is None:
raise RuntimeError(f"Unable to find IndexAttempt for ID '{index_attempt_id}'")
@@ -339,21 +379,27 @@ def _prepare_index_attempt(db_session: Session, index_attempt_id: int) -> IndexA
)
    mark_attempt_in_progress(attempt, db_session)
return attempt
def run_indexing_entrypoint(
index_attempt_id: int, connector_credential_pair_id: int, is_ee: bool = False
) -> None:
"""Entrypoint for indexing run when using dask distributed.
Wraps the actual logic in a `try` block so that we can catch any exceptions
and mark the attempt as failed."""
try:
if is_ee:
global_version.set_ee()
# set the indexing attempt ID so that all log messages from this process
# will have it added as a prefix
IndexAttemptSingleton.set_cc_and_index_id(
index_attempt_id, connector_credential_pair_id
)
with Session(get_sqlalchemy_engine()) as db_session:
# make sure that it is valid to run this indexing attempt + mark it
@@ -361,17 +407,19 @@ def run_indexing_entrypoint(index_attempt_id: int) -> None:
attempt = _prepare_index_attempt(db_session, index_attempt_id)
logger.info(
f"Running indexing attempt for connector: '{attempt.connector.name}', "
f"with config: '{attempt.connector.connector_specific_config}', and "
f"with credentials: '{attempt.credential_id}'"
f"Indexing starting: "
f"connector='{attempt.connector_credential_pair.connector.name}' "
f"config='{attempt.connector_credential_pair.connector.connector_specific_config}' "
f"credentials='{attempt.connector_credential_pair.connector_id}'"
)
_run_indexing(db_session, attempt)
logger.info(
f"Completed indexing attempt for connector: '{attempt.connector.name}', "
f"with config: '{attempt.connector.connector_specific_config}', and "
f"with credentials: '{attempt.credential_id}'"
f"Indexing finished: "
f"connector='{attempt.connector_credential_pair.connector.name}' "
f"config='{attempt.connector_credential_pair.connector.connector_specific_config}' "
f"credentials='{attempt.connector_credential_pair.connector_id}'"
)
except Exception as e:
logger.exception(f"Indexing job with ID '{index_attempt_id}' failed due to {e}")

View File

@@ -0,0 +1,77 @@
import tracemalloc
from danswer.utils.logger import setup_logger
logger = setup_logger()
DANSWER_TRACEMALLOC_FRAMES = 10
class DanswerTracer:
def __init__(self) -> None:
self.snapshot_first: tracemalloc.Snapshot | None = None
self.snapshot_prev: tracemalloc.Snapshot | None = None
self.snapshot: tracemalloc.Snapshot | None = None
def start(self) -> None:
tracemalloc.start(DANSWER_TRACEMALLOC_FRAMES)
def stop(self) -> None:
tracemalloc.stop()
def snap(self) -> None:
snapshot = tracemalloc.take_snapshot()
# Filter out irrelevant frames (e.g., from tracemalloc itself or importlib)
snapshot = snapshot.filter_traces(
(
tracemalloc.Filter(False, tracemalloc.__file__), # Exclude tracemalloc
tracemalloc.Filter(
False, "<frozen importlib._bootstrap>"
), # Exclude importlib
tracemalloc.Filter(
False, "<frozen importlib._bootstrap_external>"
), # Exclude external importlib
)
)
if not self.snapshot_first:
self.snapshot_first = snapshot
if self.snapshot:
self.snapshot_prev = self.snapshot
self.snapshot = snapshot
def log_snapshot(self, numEntries: int) -> None:
if not self.snapshot:
return
stats = self.snapshot.statistics("traceback")
for s in stats[:numEntries]:
logger.debug(f"Tracer snap: {s}")
for line in s.traceback:
logger.debug(f"* {line}")
@staticmethod
def log_diff(
snap_current: tracemalloc.Snapshot,
snap_previous: tracemalloc.Snapshot,
numEntries: int,
) -> None:
stats = snap_current.compare_to(snap_previous, "traceback")
for s in stats[:numEntries]:
logger.debug(f"Tracer diff: {s}")
for line in s.traceback.format():
logger.debug(f"* {line}")
def log_previous_diff(self, numEntries: int) -> None:
if not self.snapshot or not self.snapshot_prev:
return
DanswerTracer.log_diff(self.snapshot, self.snapshot_prev, numEntries)
def log_first_diff(self, numEntries: int) -> None:
if not self.snapshot or not self.snapshot_first:
return
DanswerTracer.log_diff(self.snapshot, self.snapshot_first, numEntries)
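A short usage sketch for the tracer above, mirroring how _run_indexing drives it; the loop and the allocation stand-in are illustrative, and 5 matches INDEXING_TRACER_NUM_PRINT_ENTRIES:

from danswer.background.indexing.tracer import DanswerTracer

tracer = DanswerTracer()
tracer.start()  # begin tracemalloc with DANSWER_TRACEMALLOC_FRAMES of traceback
tracer.snap()  # baseline snapshot

for _ in range(3):
    data = [bytes(1024) for _ in range(1000)]  # stand-in for a work batch
    tracer.snap()
    tracer.log_previous_diff(5)  # top 5 allocation diffs vs. the previous snapshot

tracer.log_first_diff(5)  # top 5 allocation diffs vs. the very first snapshot
tracer.stop()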

Some files were not shown because too many files have changed in this diff.