Compare commits

...

785 Commits

Author SHA1 Message Date
Weves
30983657ec Fix indexing of whitespace only 2024-01-05 19:35:38 -08:00
Yuhong Sun
6b6b3daab7 Reenable option to run Danswer without Gen AI (#906) 2024-01-03 18:31:16 -08:00
Chris Weaver
20441df4a4 Add Tag Filter UI + other UI cleanup (#905) 2024-01-02 11:30:36 -08:00
Yuhong Sun
d7141df5fc Metadata and Title Search (#903) 2024-01-02 11:25:50 -08:00
Yuhong Sun
615bb7b095 Update CONTRIBUTING.md 2024-01-01 18:07:50 -08:00
Yuhong Sun
e759718c3e Update CONTRIBUTING.md 2024-01-01 18:06:56 -08:00
Yuhong Sun
06d8d0e53c Update CONTRIBUTING.md 2024-01-01 18:06:17 -08:00
Weves
ae9b556876 Revamp new chat screen for chat UI 2023-12-30 18:13:24 -08:00
Chris Weaver
f883611e94 Add query editing in Chat UI (#899) 2023-12-30 12:46:48 -08:00
Yuhong Sun
13c536c033 Final Backend CVEs (#900) 2023-12-30 11:57:49 -08:00
Yuhong Sun
2e6be57880 Model Server CVEs (#898) 2023-12-29 21:14:08 -08:00
Weves
b352d83b8c Increase max upload size 2023-12-29 21:11:57 -08:00
Yuhong Sun
aa67768c79 CVEs continued (#889) 2023-12-29 20:42:16 -08:00
Weves
6004e540f3 Improve Vespa invalid char cleanup 2023-12-29 20:36:03 -08:00
eukub
64d2cea396 Reduced redundancy and changed concatenation of strings to f-strings 2023-12-29 00:35:04 -08:00
Weves
b5947a1c74 Add illegal char stripping to title field 2023-12-29 00:17:40 -08:00
Weves
cdf260b277 Fix chat refresh + add stop button 2023-12-28 23:33:41 -08:00
Weves
73483b5e09 Fix more auth disabled flakiness 2023-12-27 01:23:29 -08:00
Yuhong Sun
a6a444f365 Bump Python Version for security (#887) 2023-12-26 16:15:14 -08:00
Yuhong Sun
449a403c73 Automatic Security Scan (#886) 2023-12-26 14:41:23 -08:00
Yuhong Sun
4aebf824d2 Fix broken build SHA issue (#885) 2023-12-26 14:36:40 -08:00
Weves
26946198de Fix disabled auth 2023-12-26 12:51:58 -08:00
Yuhong Sun
e5035b8992 Move some util functions around (#883) 2023-12-26 00:38:29 -08:00
Weves
2e9af3086a Remove old comment 2023-12-25 21:36:54 -08:00
Weves
dab3ba8a41 Add support for basic auth on FE 2023-12-25 21:19:59 -08:00
Yuhong Sun
1e84b0daa4 Fix escape character handling in DanswerBot (#880) 2023-12-25 12:28:35 -08:00
Yuhong Sun
f4c8abdf21 Remove Extraneous Persona Config (#878) 2023-12-24 22:48:48 -08:00
sweep-ai[bot]
ccc5bb1e67 Configure Sweep (#875)
* Create sweep.yaml

* Create sweep template

* Update sweep.yaml

---------

Co-authored-by: sweep-ai[bot] <128439645+sweep-ai[bot]@users.noreply.github.com>
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2023-12-24 19:04:52 -08:00
Yuhong Sun
c3cf9134bb Telemetry Revision (#868) 2023-12-24 17:39:37 -08:00
Weves
0370b9b38d Stop copying local node_modules / .next dir into web docker image 2023-12-24 15:27:11 -08:00
Weves
95bf1c13ad Add http2 dependency 2023-12-24 14:49:31 -08:00
Yuhong Sun
00c1f93b12 Zendesk Tiny Cleanup (#867) 2023-12-23 16:39:15 -08:00
Yuhong Sun
a122510cee Zendesk Connector Metadata and small batch fix (#866) 2023-12-23 16:34:48 -08:00
Weves
dca4f7a72b Adding http2 support to Vespa 2023-12-23 16:23:24 -08:00
Weves
535dc265c5 Fix boost resetting on document update + fix refresh on re-index 2023-12-23 15:23:21 -08:00
Weves
56882367ba Fix migrations 2023-12-23 12:58:00 -08:00
Weves
d9fbd7ffe2 Add hiding + re-ordering to personas 2023-12-22 23:04:43 -08:00
Yuhong Sun
8b7d01fb3b Allow Duplicate Naming for CC-Pair (#862) 2023-12-22 23:03:44 -08:00
voarsh2
016a087b10 Refactor environment variable handling using ConfigMap for Kubernetes deployment (#515)
---------

Co-authored-by: Reese Jenner <reesevader@hotmail.co.uk>
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2023-12-22 21:33:36 -08:00
Sam Jakos
241b886976 fix: parse INDEX_BATCH_SIZE to an int (#858) 2023-12-22 13:03:21 -08:00
Yuhong Sun
ff014e4f5a Bump Transformer Version (#857) 2023-12-22 01:47:18 -08:00
Aliaksandr_С
0318507911 Indexing settings and logging improve (#821)
---------

Co-authored-by: Aliaksandr Chernak <aliaksandr_chernak@epam.com>
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2023-12-22 01:13:24 -08:00
Yuhong Sun
6650f01dc6 Multilingual Docs Updates (#856) 2023-12-22 00:26:00 -08:00
Yuhong Sun
962e3f726a Slack Feedback Message Tweaks (#855) 2023-12-21 20:52:11 -08:00
mattboret
25a73b9921 Slack bot improve source feedback (#827)
---------

Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2023-12-21 20:33:20 -08:00
Yuhong Sun
dc0b3672ac git push --set-upstream origin danswerbot-format (#854) 2023-12-21 18:46:30 -08:00
Yuhong Sun
c4ad03a65d Handle DanswerBot case where no updated at (#853) 2023-12-21 18:33:42 -08:00
mattboret
c6f354fd03 Add the latest document update to the Slack bot answer (#817)
* Add the latest source update to the Slack bot answer

* fix mypy errors

---------

Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2023-12-21 18:16:05 -08:00
Yuhong Sun
2f001c23b7 Confluence add tag to replaced names (#852) 2023-12-21 18:03:56 -08:00
mattboret
4d950aa60d Replace user id by the user display name in the exported Confluence page (#815)
Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2023-12-21 17:52:28 -08:00
Yuhong Sun
56406a0b53 Bump Vespa to 8.277.17 (#851) 2023-12-21 17:23:27 -08:00
sam lockart
eb31c08461 Update Vespa to 8.267.29 (#812) 2023-12-21 17:18:16 -08:00
Weves
26f94c9890 Improve re-sizing 2023-12-21 10:03:03 -08:00
Weves
a9570e01e2 Make document sidebar scrollbar darker 2023-12-21 10:03:03 -08:00
Weves
402d83e167 Make it so docs without links aren't clickable in chat citations 2023-12-21 10:03:03 -08:00
Ikko Eltociear Ashimine
10dcd49fc8 Update CONTRIBUTING.md
Nagivate -> Navigate
2023-12-21 09:10:52 -08:00
Yuhong Sun
0fdad0e777 Update Demo Video 2023-12-20 19:05:23 -08:00
Weves
fab767d794 Fix persona document sets 2023-12-20 15:24:32 -08:00
Weves
7dd70ca4c0 Change danswer header link in chat page 2023-12-20 11:38:33 -08:00
Weves
370760eeee Fix editing deleted personas, editing personas with no prompts, and model selection 2023-12-19 14:42:13 -08:00
Weves
24a62cb33d Fix persona + prompt apis 2023-12-19 10:23:06 -08:00
Weves
9e4a4ddf39 Update search helper styling 2023-12-19 07:08:11 -08:00
Yuhong Sun
c281859509 Google Drive handle invalid PDFs (#838) 2023-12-18 23:39:45 -08:00
Yuhong Sun
2180a40bd3 Disable Chain of Thought for now (#837) 2023-12-18 21:44:47 -08:00
Weves
997f9c3191 Fix ccPair pages crashing 2023-12-17 23:28:26 -08:00
Weves
677c32ea79 Fix issue where a message that errors out creates a bad state 2023-12-17 23:28:26 -08:00
Yuhong Sun
edfc849652 Search more frequently (#834) 2023-12-17 22:45:46 -08:00
Yuhong Sun
9d296b623b Shield Update (#833) 2023-12-17 22:17:44 -08:00
Yuhong Sun
5957b888a5 DanswerBot Chat (#831) 2023-12-17 18:18:48 -08:00
Chris Weaver
c7a91b1819 Allow re-sizing of document sidebar + make central chat smaller on small screens (#832) 2023-12-17 18:17:43 -08:00
Weves
a099f8e296 Rework header a bit + remove assumption of all personas having a prompt 2023-12-14 23:06:39 -08:00
Weves
16c8969028 Chat UI 2023-12-14 22:18:42 -08:00
Yuhong Sun
65fde8f1b3 Chat Backend (#801) 2023-12-14 22:14:37 -08:00
Yuhong Sun
229db47e5d Update LLM Key Check Logic (#825) 2023-12-09 13:41:31 -08:00
Weves
2e3397feb0 Check for slack bot token changes every 60 seconds 2023-12-08 14:14:22 -08:00
Weves
d5658ce477 Persona enhancements 2023-12-07 14:29:37 -08:00
Weves
ddf3f99da4 Add support for global API prefix env variable 2023-12-07 12:42:17 -08:00
Weves
56785e6065 Add model choice to Persona 2023-12-07 00:20:42 -08:00
Weves
26e808d2a1 Fix welcome modal 2023-12-06 21:07:34 -08:00
Yuhong Sun
e3ac373f05 Make Default Fast LLM not identical to main LLM (#818) 2023-12-06 16:14:04 -08:00
Yuhong Sun
9e9a578921 Option to speed up DanswerBot by turning off chain of thought (#816) 2023-12-05 00:43:45 -08:00
Weves
f7172612e1 Allow persona usage for Slack bots 2023-12-04 19:20:03 -08:00
Yuhong Sun
5aa2de7a40 Fix Weak Models Concurrency Issue (#811) 2023-12-04 15:40:10 -08:00
Yuhong Sun
e0b87d9d4e Fix Weak Model Prompt (#810) 2023-12-04 15:02:08 -08:00
Weves
5607fdcddd Make Slack Bot setup UI more similar to Persona setup 2023-12-03 23:36:54 -08:00
Yuhong Sun
651de071f7 Improve English rephrasing for multilingual use case (#808) 2023-12-03 14:34:12 -08:00
John Bergvall
5629ca7d96 Copy SearchQuery model with updated attribute due to Config.frozen=True (#806)
Fixes the following TypeError:

api_server_1     |   File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
api_server_1     |     return await get_asynclib().run_sync_in_worker_thread(
api_server_1     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api_server_1     |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
api_server_1     |     return await future
api_server_1     |            ^^^^^^^^^^^^
api_server_1     |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
api_server_1     |     result = context.run(func, *args)
api_server_1     |              ^^^^^^^^^^^^^^^^^^^^^^^^
api_server_1     |   File "/usr/local/lib/python3.11/site-packages/starlette/concurrency.py", line 53, in _next
api_server_1     |     return next(iterator)
api_server_1     |            ^^^^^^^^^^^^^^
api_server_1     |   File "/app/danswer/utils/timing.py", line 47, in wrapped_func
api_server_1     |     value = next(gen)
api_server_1     |             ^^^^^^^^^
api_server_1     |   File "/app/danswer/direct_qa/answer_question.py", line 243, in answer_qa_query_stream
api_server_1     |     top_chunks = cast(list[InferenceChunk], next(search_generator))
api_server_1     |                                             ^^^^^^^^^^^^^^^^^^^^^^
api_server_1     |   File "/app/danswer/search/search_runner.py", line 469, in full_chunk_search_generator
api_server_1     |     retrieved_chunks = retrieve_chunks(
api_server_1     |                        ^^^^^^^^^^^^^^^^
api_server_1     |   File "/app/danswer/search/search_runner.py", line 353, in retrieve_chunks
api_server_1     |     q_copy.query = rephrase
api_server_1     |     ^^^^^^^^^^^^
api_server_1     |   File "pydantic/main.py", line 359, in pydantic.main.BaseModel.__setattr__
api_server_1     | TypeError: "SearchQuery" is immutable and does not support item assignment
2023-12-03 13:47:11 -08:00
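The fix named in the commit title replaces the in-place assignment shown in the traceback with an updated copy of the frozen model. A minimal sketch of that pattern (illustrative only, not the actual Danswer code), assuming pydantic's `Config.frozen = True` as stated in the title:

```python
# Sketch of the pattern from the commit title: a frozen pydantic model cannot be
# mutated in place, so build an updated copy instead. Names mirror the traceback.
from pydantic import BaseModel


class SearchQuery(BaseModel):
    query: str

    class Config:
        frozen = True  # makes instances immutable (and hashable)


original = SearchQuery(query="how do I configure the slack bot")
rephrase = "slack bot configuration"

# original.query = rephrase  # would raise the TypeError seen in the traceback
q_copy = original.copy(update={"query": rephrase})  # returns a new, updated instance

print(q_copy.query)  # "slack bot configuration"
```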
Yuhong Sun
bc403d97f2 Organize Prompts for Chat implementation (#807) 2023-12-03 13:27:11 -08:00
Weves
292c78b193 Always pull latest data when visiting main search page 2023-12-03 03:25:13 -08:00
Weves
ac35719038 FE improvements to make initial setup more intuitive 2023-12-02 16:40:44 -08:00
Yuhong Sun
02095e9281 Restructure APIs (#803) 2023-12-02 14:48:08 -08:00
Yuhong Sun
8954a04602 Reorder Tables for cleaner extending (#800) 2023-12-01 17:46:13 -08:00
Yuhong Sun
8020db9e9a Update connector interface with optional Owners information (#798) 2023-11-30 23:08:16 -08:00
Yuhong Sun
17c2f06338 Add more metadata options for File connector (#797) 2023-11-30 13:24:22 -08:00
Weves
9cff294a71 Increase retries for google drive connector 2023-11-30 03:03:26 -08:00
Weves
e983aaeca7 Add more logging on existing jobs 2023-11-30 02:58:37 -08:00
Weves
7ea774f35b Change in-progress status color 2023-11-29 20:57:45 -08:00
Weves
d1846823ba Associate a user with web/file connectors 2023-11-29 18:18:56 -08:00
Yuhong Sun
fda89ac810 Expert Recommendation Heuristic Only (#791) 2023-11-29 15:53:57 -08:00
Yuhong Sun
006fd4c438 Ingestion API now always updates regardless of document updated_at (#786) 2023-11-29 02:08:50 -08:00
Weves
9b7069a043 Disallow re-indexing for File connector 2023-11-29 02:01:11 -08:00
Weves
c64c25b2e1 Fix temp file deletion 2023-11-29 02:00:20 -08:00
Yuhong Sun
c2727a3f19 Custom OpenAI Model Server (#782) 2023-11-29 01:41:56 -08:00
Chris Weaver
37daf4f3e4 Remove AI Thoughts by default (#783)
- Removes AI Thoughts by default - only shows when validation fails
- Removes punctuation "words" from queries in addition to stopwords (Vespa ignores punctuation anyways)
- Fixes Vespa deletion script for larger doc counts
2023-11-29 01:00:53 -08:00
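The second bullet describes stripping punctuation-only "words" (and stopwords) from queries before they reach Vespa. A small illustrative sketch of that kind of cleanup, with a placeholder stopword list rather than Danswer's actual one:

```python
# Illustrative only: drop tokens that are purely punctuation, plus stopwords,
# from a query before keyword search. The stopword set here is a stand-in.
import string

STOPWORDS = {"the", "a", "an", "of", "to", "is"}  # placeholder set


def clean_query(query: str) -> str:
    kept = []
    for token in query.split():
        if all(ch in string.punctuation for ch in token):
            continue  # punctuation-only "word" -- Vespa ignores these anyway
        if token.lower() in STOPWORDS:
            continue
        kept.append(token)
    return " ".join(kept)


print(clean_query("What is the status of PR #783 ??"))  # -> "What status PR #783"
```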
Yuhong Sun
fcb7f6fcc0 Accept files with character issues (#781) 2023-11-28 22:43:58 -08:00
Weves
429016d4a2 Fix zulip page 2023-11-28 16:28:51 -08:00
Chris Weaver
c83a450ec4 Remove personal connectors page (#779) 2023-11-28 16:11:42 -08:00
Yuhong Sun
187b94a7d8 Blurb Key Error (#778) 2023-11-28 16:09:33 -08:00
Weves
30225fd4c5 Fix filter hiding 2023-11-28 04:13:11 -08:00
Weves
a4f053fa5b Fix persona refresh 2023-11-28 02:53:18 -08:00
Weves
eab4fe83a0 Remove Slack bot personas from web UI 2023-11-28 02:53:18 -08:00
Chris Weaver
78d1ae0379 Customizable personas (#772)
Also includes a small fix to LLM filtering when combined with reranking
2023-11-28 00:57:48 -08:00
Yuhong Sun
87beb1f4d1 Log LLM details on server start (#773) 2023-11-27 21:32:48 -08:00
Yuhong Sun
05c2b7d34e Update LLM related Libs (#771) 2023-11-26 19:54:16 -08:00
Yuhong Sun
39d09a162a Danswer APIs Document Ingestion Endpoint (#716) 2023-11-26 19:09:22 -08:00
Yuhong Sun
d291fea020 Turn off Reranking for Streaming Flows (#770) 2023-11-26 16:45:23 -08:00
Yuhong Sun
2665bff78e Option to turn off LLM for eval script (#769) 2023-11-26 15:31:03 -08:00
Yuhong Sun
65d38ac8c3 Slack to respect LLM chunk filter settings (#768) 2023-11-26 01:06:12 -08:00
Yuhong Sun
8391d89bea Fix Indexing Concurrency (#767) 2023-11-25 21:40:36 -08:00
Yuhong Sun
ac2ed31726 Indexing Jobs to have shorter lived DB sessions (#766) 2023-11-24 21:38:16 -08:00
Chris Weaver
47f947b045 Use torch.multiprocessing + enable SimpleJobClient by default (#765) 2023-11-24 18:29:28 -08:00
dependabot[bot]
63b051b342 Bump sharp from 0.32.5 to 0.32.6 in /web
Bumps [sharp](https://github.com/lovell/sharp) from 0.32.5 to 0.32.6.
- [Release notes](https://github.com/lovell/sharp/releases)
- [Changelog](https://github.com/lovell/sharp/blob/main/docs/changelog.md)
- [Commits](https://github.com/lovell/sharp/compare/v0.32.5...v0.32.6)

---
updated-dependencies:
- dependency-name: sharp
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-11-24 18:14:45 -08:00
Weves
a5729e2fa6 Add new model server env vars to the compose file 2023-11-24 00:12:04 -08:00
Weves
3cec854c5c Allow different model servers for different models / indexing jobs 2023-11-23 23:39:03 -08:00
Weves
26c6651a03 Improve LLM answer parsing 2023-11-23 15:03:35 -08:00
Yuhong Sun
13001ede98 Search Regression Test and Save/Load State updates (#761) 2023-11-23 00:00:30 -08:00
Yuhong Sun
fda377a2fa Regression Script for Search quality (#760) 2023-11-22 19:33:28 -08:00
Yuhong Sun
bdfb894507 Slack Role Override (#755) 2023-11-22 17:47:18 -08:00
Weves
35c3511daa Increase Vespa timeout 2023-11-22 01:42:59 -08:00
Chris Weaver
c1e19d0d93 Add selected docs in UI + rework the backend flow a bit (#754)
Changes the flow so that the selected docs are sent over in a separate packet rather than as part of the initial packet for the streaming QA endpoint.
2023-11-21 19:46:12 -08:00
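The commit body says the selected documents now arrive in their own packet rather than inside the initial packet of the streaming QA endpoint. A hypothetical sketch of that packet flow (packet types and field names are assumptions for illustration, not Danswer's actual API):

```python
# Hypothetical packet flow: metadata first, then a dedicated packet for the
# user-selected documents, then answer tokens streamed one packet at a time.
import json
from typing import Iterator


def stream_qa_packets(query: str, selected_doc_ids: list[str]) -> Iterator[str]:
    # 1. initial packet: lightweight metadata only
    yield json.dumps({"type": "query_info", "query": query})

    # 2. separate packet carrying the selected documents
    yield json.dumps({"type": "selected_documents", "document_ids": selected_doc_ids})

    # 3. answer tokens
    for token in ["The", " answer", " is", " 42", "."]:
        yield json.dumps({"type": "answer_piece", "token": token})


for packet in stream_qa_packets("meaning of life?", ["doc-1", "doc-2"]):
    print(packet)
```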
mattboret
e78aefb408 Add script to analyse the sources selection (#721)
---------

Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2023-11-21 18:35:26 -08:00
Bryan Peterson
aa2e859b46 add missing dependencies in model_server dockerfile (#752)
Thanks for catching this! Super helpful!
2023-11-21 17:59:28 -08:00
Yuhong Sun
c0c8ae6c08 Minor Tuning for Filters (#753) 2023-11-21 15:47:58 -08:00
Weves
1225c663eb Add new env variable to compose file 2023-11-20 21:40:54 -08:00
Weves
e052d607d5 Add option to log Vespa timing info 2023-11-20 21:37:22 -08:00
Yuhong Sun
8e5e11a554 Add md files to File Connector (#749) 2023-11-20 19:56:06 -08:00
Yuhong Sun
57f0323f52 NLP Model Warmup Reworked (#748) 2023-11-20 17:28:23 -08:00
Weves
6e9f31d1e9 Fix ResourceLogger blocking main thread 2023-11-20 16:46:18 -08:00
Weves
eeb844e35e Fix bug with Google Drive shortcut error case 2023-11-20 16:34:07 -08:00
Sid Ravinutala
d6a84ab413 fix for url parsing google site 2023-11-20 16:08:43 -08:00
Weves
68160d49dd Small mods to enable deployment on AWS EKS 2023-11-20 01:42:48 -08:00
Yuhong Sun
0cc3d65839 Add option to run a faster/cheaper LLM for secondary flows (#742) 2023-11-19 17:48:42 -08:00
Weves
df37387146 Fix a couple bugs with google sites link finding 2023-11-19 15:35:54 -08:00
Yuhong Sun
f72825cd46 Provide Metadata to the LLM (#740) 2023-11-19 12:28:45 -08:00
Yuhong Sun
6fb07d20cc Multilingual Query Expansion (#737) 2023-11-19 10:55:55 -08:00
Chris Weaver
b258ec1bed Adjust checks for removal from existing_jobs dict + add more logging + only one scheduled job for a connector at a time (#739) 2023-11-19 02:03:17 -08:00
Yuhong Sun
4fd55b8928 Fix GPT4All (#738) 2023-11-18 21:21:02 -08:00
Yuhong Sun
b3ea53fa46 Fix Build Version (#736) 2023-11-18 17:16:25 -08:00
Yuhong Sun
fa0d19cc8c LLM Chunk Filtering (#735) 2023-11-18 17:12:24 -08:00
Weves
d5916e420c Fix duplicated query event for 'answer_qa_query_stream' and missing llm_answer in 'answer_qa_query' 2023-11-17 21:10:23 -08:00
Weves
39b912befd Enable show GPT answer option immediately 2023-11-17 17:08:38 -08:00
Weves
37c5f24d91 Fix logout redirect 2023-11-17 16:43:24 -08:00
Weves
ae72cd56f8 Add a bit more logging in indexing pipeline 2023-11-16 12:00:19 -08:00
Yuhong Sun
be5ef77896 Optional Anonymous Telemetry (#727) 2023-11-16 09:22:36 -08:00
Weves
0ed8f14015 Improve Vespa filtering performance 2023-11-15 14:30:12 -08:00
Weves
a03e443541 Add root_page_id option for Notion connector 2023-11-15 12:46:41 -08:00
Weves
4935459798 Fix hover being transparent 2023-11-15 11:52:40 -08:00
Weves
efb52873dd Prettier fix 2023-11-14 22:22:42 -08:00
Bradley
442f7595cc Added connector configuration link and external link icon to web connector page. 2023-11-14 22:19:00 -08:00
Weves
81cbcbb403 Fix connector deletion bug 2023-11-14 09:07:59 -08:00
Weves
0a0e672b35 Fix no letsencrypt 2023-11-13 14:32:51 -08:00
Yuhong Sun
69644b266e Hybrid Search Alpha Parameter (#714) 2023-11-09 17:11:10 -08:00
Yuhong Sun
5a4820c55f Skip Index on Docs with no newer updated at (#713) 2023-11-09 16:27:32 -08:00
Weves
a5d69bb392 Add back end time to Gong 2023-11-09 14:03:46 -08:00
Weves
23ee45c033 Enhance document explorer 2023-11-09 00:58:51 -08:00
Yuhong Sun
31bfd015ae Request Tracker Connector (#709)
Contributed by Evan! Thanks for the contribution!

- Minor linting and rebasing done by Yuhong, everything else from Evan

---------

Co-authored-by: Evan Sarmiento <e.sarmiento@soax.com>
Co-authored-by: Evan <esarmien@fas.harvard.edu>
2023-11-07 16:55:10 -08:00
Yuhong Sun
0125d8a0f6 Source Filter Extraction (#708) 2023-11-07 14:21:04 -08:00
Yuhong Sun
4f64444f0f Fix Version from Tag not picked up (#705) 2023-11-06 20:01:20 -08:00
Weves
abf9cc3248 Add timeout to all Notion calls 2023-11-06 19:29:42 -08:00
Chris Weaver
f5bf2e6374 Fix experimental checkpointing + move check for disabled connector to the start of the batch (#703) 2023-11-06 17:14:31 -08:00
Yuhong Sun
24b3b1fa9e Fix GitHub Actions Naming (#702) 2023-11-06 16:40:49 -08:00
Yuhong Sun
7433dddac3 Model Server (#695)
Provides the ability to pull out the NLP models into a separate model server which can then be hosted on a GPU instance if desired.
2023-11-06 16:36:09 -08:00
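The commit body explains the motivation: the NLP models can be pulled out into a separate model server and hosted on a GPU instance. A hedged sketch of the client-side idea (the env variable name, endpoint path, payload shape, and model name are assumptions for illustration, not the actual Danswer interface):

```python
# Sketch only: embed locally unless a remote model server is configured, in which
# case the heavy model runs on that (possibly GPU-backed) server instead.
import os

import requests
from sentence_transformers import SentenceTransformer

MODEL_SERVER_URL = os.environ.get("MODEL_SERVER_URL")  # e.g. "http://gpu-box:9000" (assumed name)


def embed(texts: list[str]) -> list[list[float]]:
    if MODEL_SERVER_URL:
        # remote path: ship the texts to the model server over HTTP
        resp = requests.post(f"{MODEL_SERVER_URL}/encode", json={"texts": texts})
        resp.raise_for_status()
        return resp.json()["embeddings"]
    # local fallback: load an embedding model in-process (example model name)
    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts).tolist()
```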
Weves
fe938b6fc6 Add experimental checkpointing 2023-11-04 14:51:28 -07:00
dependabot[bot]
2db029672b Bump pypdf from 3.16.4 to 3.17.0 in /backend/requirements (#667)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 3.16.4 to 3.17.0.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/3.16.4...3.17.0)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-11-03 18:54:29 -07:00
Yuhong Sun
602f9c4a0a Default Version to 0.2-dev (#690) 2023-11-03 18:37:01 -07:00
Bradley
551705ad62 Implemented Danswer versioning system. (#649)
* Web & API server versioning system. Displayed on UI.

* Remove some debugging code.

* Integrated backend version into GitHub Action & Docker build workflow using env variables.

* Fixed web container environment variable name.

* Revise Dockerfiles for GitHub Actions workflow.

* Added system information page to admin panel with version info. Updated github workflows to include tagged version, and corresponding changes in the dockerfiles and codebases for web&backend to use env variables if present. Changed to 'dev' naming scheme if no env var is present to indicate local setup. Removed version from admin panel header.

* Added missing systeminfo dir to remote repo.
2023-11-03 18:02:39 -07:00
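The PR description outlines the scheme: the tagged version is injected through env variables by the GitHub Actions / Docker build, and the app falls back to a 'dev' name when no variable is present (the very next commit sets that default to 0.2-dev). A minimal sketch of the fallback, with an illustrative env variable name:

```python
# Minimal sketch of the fallback described above; the env variable name is
# illustrative, and the real value is injected at build time by the workflow.
import os


def get_version() -> str:
    return os.environ.get("DANSWER_VERSION") or "0.2-dev"


print(get_version())  # "0.2-dev" on a local setup with no env variable set
```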
Weves
d9581ce0ae Fix Notion recursive search for non-shared database 2023-11-03 15:46:23 -07:00
Yuhong Sun
e27800d501 Formatting 2023-11-02 23:31:19 -07:00
Yuhong Sun
927dffecb5 Prompt Layer Rework (#688) 2023-11-02 23:26:47 -07:00
Weves
68b23b6339 Enable database reading in recursive notion crawl 2023-11-02 23:14:54 -07:00
Weves
174f54473e Fix notion recursive search for blocks with children 2023-11-02 22:21:55 -07:00
Weves
329824ab22 Address issue with links for Google Sites connector 2023-11-02 22:01:08 -07:00
Yuhong Sun
b0f76b97ef Guru and Productboard Time Updated (#683) 2023-11-02 14:27:06 -07:00
Weves
80eedebe86 Add env variables to dev docker compose file 2023-11-01 22:00:32 -07:00
Weves
e8786e1a20 Small formatting fixes 2023-11-01 21:46:23 -07:00
Bryan Peterson
44e3dcb19f support for zendesk help center (#661) 2023-11-01 21:11:56 -07:00
Weves
e8f778ccb5 Improve index attempt display 2023-11-01 18:33:54 -07:00
Weves
d9adee168b Add simple job client to try and get rid of some of the flakiness / weirdness that we are seeing with Dask 2023-11-01 17:43:58 -07:00
Yuhong Sun
73b653d324 More Cleanup and Deduplication (#675) 2023-11-01 16:03:48 -07:00
Weves
9cd0c197e7 Fix frozen jobs 2023-11-01 14:30:51 -07:00
Weves
0b07d615b1 Add env variable to control Gong start time 2023-11-01 14:09:13 -07:00
Weves
5c9c70dffb Remove more native enums 2023-11-01 12:51:33 -07:00
Yuhong Sun
61c9343a7e Clean Up Duplicate Code (#670) 2023-10-31 23:25:26 -07:00
Yuhong Sun
53353f9b62 Custom Model Server Note (#668) 2023-10-31 19:05:38 -07:00
Yuhong Sun
fbf7c642a3 Reworking the LLM layer (#666) 2023-10-31 18:22:42 -07:00
meherhendi
d9e5795b36 Removing unused code from gdrive connector
Signed-off-by: meherhendi <meherhendi0@gmail.com>
2023-10-31 14:48:39 -07:00
meherhendi
acb60f67e1 Adding credential.is_admin to fix Gdrive indexing bug 2023-10-31 14:48:39 -07:00
meherhendi
4990aacc0d Google Drive step 3 indexing not starting bug fix 2023-10-31 14:48:39 -07:00
Yuhong Sun
947d4d0a2e Pin Litellm Version (#665) 2023-10-31 12:09:39 -07:00
dependabot[bot]
d7a90aeb2b Bump langchain from 0.0.312 to 0.0.325 in /backend/requirements (#658)
Bumps [langchain](https://github.com/langchain-ai/langchain) from 0.0.312 to 0.0.325.
- [Release notes](https://github.com/langchain-ai/langchain/releases)
- [Commits](https://github.com/langchain-ai/langchain/compare/v0.0.312...v0.0.325)

---
updated-dependencies:
- dependency-name: langchain
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-31 12:05:55 -07:00
Krish Dholakia
ee0d092dcc Add LiteLLM Support - Anthropic, Bedrock, Huggingface, TogetherAI, Replicate, etc. (#510)
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2023-10-31 12:01:15 -07:00
Sam Jakos
c6663d83d5 Fix nginx conf in docker-compose.prod 2023-10-31 08:50:52 -07:00
Yuhong Sun
0618b59de6 Fix Indexing Frozen (#660) 2023-10-30 20:49:39 -07:00
Weves
517a539d7e Make indexing jobs use more cores again 2023-10-30 18:50:40 -07:00
Weves
a1da4dfac6 Fix document-search endpoint with auth disabled 2023-10-30 00:46:55 -07:00
Yuhong Sun
e968e1d14b Fix Rare Vespa Document ID Issue (#656) 2023-10-30 00:06:51 -07:00
Chris Weaver
8215a7859a Small UI fixes (#655) 2023-10-29 23:17:25 -07:00
Yuhong Sun
37bba3dbe9 Gong to accept workspace IDs (#654) 2023-10-29 22:46:01 -07:00
Yuhong Sun
52c0d6e68b Hybrid Search (#653) 2023-10-29 22:18:00 -07:00
Weves
08909b40b0 Add rate limiting wrapper + add to Document360 2023-10-29 18:00:17 -07:00
Yuhong Sun
64ebaf2dda Pin Vespa Version (#651) 2023-10-29 14:30:33 -07:00
Weves
815c30c9d0 Fix jobs erroring while waiting on queue 2023-10-29 01:35:04 -07:00
Yuhong Sun
57ecab0098 Fix Json Output Issue and Fix miscount of new docs per Index Attempt (#641) 2023-10-29 00:34:28 -07:00
Yuhong Sun
26b491fb0c Prep for Hybrid Search (#648) 2023-10-29 00:13:21 -07:00
Weves
bfa338e142 Adjust time_updated assignment + increase frozen timeout to 3hrs 2023-10-28 22:27:18 -07:00
Weves
e744c6b75a Re-style cc pair status table 2023-10-28 21:11:01 -07:00
Yuhong Sun
7d6a41243c Fix Use Keyword Default (#646) 2023-10-28 14:57:24 -07:00
Weves
25814d7a23 Go back to hiding filters if screen width is too small 2023-10-28 14:52:07 -07:00
Weves
59b16ac320 Upgrade to latest NextJS version 2023-10-28 14:52:07 -07:00
Yuhong Sun
11d96b2807 Rename DanswerBot (#644) 2023-10-28 14:41:36 -07:00
Yuhong Sun
fe117513b0 Reorganize and Cleanup for Hybrid Search (#643) 2023-10-28 14:24:28 -07:00
Chris Weaver
fcce2b5a60 Individual connector page (#640) 2023-10-27 21:32:18 -07:00
Weves
ad6ea1679a Sleep for a little before starting nginx 2023-10-27 16:40:33 -07:00
Yuhong Sun
fad311282b Remove extra missed Enum (#638) 2023-10-27 13:58:20 -07:00
Yuhong Sun
ca0f186b0e Remove all in-postgres Enums (#637) 2023-10-27 12:13:57 -07:00
Weves
c9edc2711c Add basic retries to Vespa insert calls 2023-10-26 14:20:38 -07:00
Weves
dcbb7b85d9 Fix null author for Confluence connector 2023-10-26 13:18:59 -07:00
Yuhong Sun
2df9f4d7fc Confluence Author Optional (#634) 2023-10-26 12:07:05 -07:00
Yuhong Sun
7bc34ce182 Fix Container Name (#633) 2023-10-25 23:22:24 -07:00
Chris Weaver
76275b29d4 Adjust the way LLM class is instantiated + fix issue where .env file GEN_AI_API_KEY wasn't being used (#630) 2023-10-25 22:33:18 -07:00
Yuhong Sun
604e511c09 Alternative solution to up the number of threads for torch (#632) 2023-10-25 22:30:57 -07:00
Yuhong Sun
379e71160a Confluence Data Center Edge Cases (#631) 2023-10-25 21:52:07 -07:00
Chris Weaver
a8b7155b5e Add support for non-letsencrypt-based https in docker compose setup (#628) 2023-10-25 20:35:47 -07:00
Yuhong Sun
9a51745fc9 Updated Contributing for Celery (#629) 2023-10-25 18:26:02 -07:00
Weves
fbb05e630d Add more retries in Google Drive connector 2023-10-24 20:24:45 -07:00
Chris Weaver
ef2b445201 Support Confluence data center + allow for specifying labels to ignore (#624) 2023-10-24 17:40:42 -07:00
Yuhong Sun
17bd68be4c Wrap errors in an object instead of plain dict (#623) 2023-10-24 16:07:45 -07:00
Yuhong Sun
890eb7901e Capping negative boost at 0.5 (#622) 2023-10-24 15:08:27 -07:00
Chris Weaver
0a6c2afb8a Notion extra logging + small improvements (#621) 2023-10-24 15:00:50 -07:00
meherhendi
7ffba2aa60 Google Drive shared files bug fix 2023-10-23 23:51:35 -07:00
Weves
816ec5e3ca Graceful failure for pages without a navbar links in Google Sites connector 2023-10-23 23:40:00 -07:00
Weves
3554e29b8d Add updated_at to UI + add time range selector 2023-10-23 23:32:16 -07:00
Yuhong Sun
88eaae62d9 Rework Boost and Time Decay Calculations for No-Reranker flow (#618) 2023-10-23 23:25:06 -07:00
Weves
a014cb7792 Fix admin search 2023-10-23 00:25:50 -07:00
Chris Weaver
89807c8c05 Fix deletion status display + add celery util + fix seg faults (#615) 2023-10-22 19:41:29 -07:00
Yuhong Sun
8403b94722 Default Personas to have Document Sets (#614) 2023-10-22 16:57:16 -07:00
Weves
4fa96788f6 Don't try to decrypt when no pw is specified 2023-10-22 15:34:23 -07:00
Yuhong Sun
e279918f95 Introduce Time Filters (#610) 2023-10-22 15:06:52 -07:00
Weves
8e3258981e Adjust default Tremor color 2023-10-21 13:16:29 -07:00
Weves
b14b220d89 Remove double import 2023-10-21 00:52:28 -07:00
Buglover
cc5d27bff7 Update connector.py
fix issue #606
2023-10-21 00:42:31 -07:00
Yuhong Sun
5ddc9b34ab Add Document UpdatedAt times for most connectors (#605) 2023-10-20 17:03:28 -07:00
Weves
a7099a1917 Add retrieved_document_ids to QueryEvent 2023-10-20 12:43:15 -07:00
Weves
47ab273353 Add tremor 2023-10-20 11:05:34 -07:00
Yuhong Sun
7be3730038 Fix LLM error reporting (#600) 2023-10-19 18:20:24 -07:00
Yuhong Sun
f6982b03b6 Handle PDF parse failures gracefully (#599) 2023-10-19 17:46:13 -07:00
Chris Weaver
76f1f17710 Fix hidden documents (#598) 2023-10-19 17:38:11 -07:00
Yuhong Sun
bc1de6562d Fix Explorer (#597) 2023-10-19 17:33:49 -07:00
dependabot[bot]
93d4eef61d Bump @babel/traverse from 7.22.11 to 7.23.2 in /web (#591)
Bumps [@babel/traverse](https://github.com/babel/babel/tree/HEAD/packages/babel-traverse) from 7.22.11 to 7.23.2.
- [Release notes](https://github.com/babel/babel/releases)
- [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
- [Commits](https://github.com/babel/babel/commits/v7.23.2/packages/babel-traverse)

---
updated-dependencies:
- dependency-name: "@babel/traverse"
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-19 17:12:12 -07:00
dependabot[bot]
4ffbdbb8b0 Bump langchain from 0.0.308 to 0.0.312 in /backend/requirements (#551)
Bumps [langchain](https://github.com/langchain-ai/langchain) from 0.0.308 to 0.0.312.
- [Release notes](https://github.com/langchain-ai/langchain/releases)
- [Commits](https://github.com/langchain-ai/langchain/compare/v0.0.308...v0.0.312)

---
updated-dependencies:
- dependency-name: langchain
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-19 17:11:42 -07:00
meherhendi
764aab3e53 increasing OpenAi api key verification timeout (#587) 2023-10-19 17:09:49 -07:00
Yuhong Sun
7c34744655 Use shared PDF utility function to not error on encrypted PDFs (#596) 2023-10-19 17:01:55 -07:00
Yuhong Sun
2037e11495 Replace PyPDF2 with pypdf (#595) 2023-10-19 16:12:31 -07:00
Yuhong Sun
6a449f1fb1 Introduce Recency Bias (#592) 2023-10-19 12:54:35 -07:00
Yuhong Sun
d9076a6ff6 Use strict=False for Parsing LLM Jsons (#594) 2023-10-19 12:28:16 -07:00
Chris Weaver
1bd76f528f Document explorer admin page (#590) 2023-10-18 18:41:39 -07:00
Yuhong Sun
a5d2759fbc Recreate Tables from HTML (#588) 2023-10-18 11:16:40 -07:00
Yuhong Sun
022f59e5b2 Fix Slack Link Parsing (#589) 2023-10-18 11:14:12 -07:00
Weves
5bf998219e Add missing arg 2023-10-17 20:19:33 -07:00
Chris Weaver
5da81a3d0d Add hiding of documents to feedback page (#585) 2023-10-17 20:06:12 -07:00
Yuhong Sun
e73739547a Fix Kubernetes Templates (#584) 2023-10-17 13:32:00 -07:00
Chris Weaver
e519dfc849 Torch more cpus (#583) 2023-10-17 09:53:20 -07:00
Yuhong Sun
bf5844578c Personas to have option to be aware of current date and time (#582) 2023-10-16 23:42:39 -07:00
Chris Weaver
37e9ccf864 Make docs indexed cnt more accurate (#579) 2023-10-16 20:18:19 -07:00
Yuhong Sun
bb9a18b22c Slack Connector to not Index Bots (#581) 2023-10-16 20:08:03 -07:00
Yuhong Sun
b5982c10c3 Celery Beat (#575) 2023-10-16 14:59:42 -07:00
Chris Weaver
a7ddb22e50 Only log vespa error on second attempt (#578) 2023-10-15 21:36:25 -07:00
Yuhong Sun
595f61ea3a Add Retrieval to Chat History (#577) 2023-10-15 13:40:07 -07:00
Weves
d2f7dff464 Add max upload size to HTTPS NGINX listener 2023-10-15 12:52:13 -07:00
Weves
ae0dbfadc6 Fix Google Drive Connector when using OAuth 2023-10-15 00:08:19 -07:00
Yuhong Sun
38d516cc7a Update Danswer Docs Pointers (#573) 2023-10-14 12:12:37 -07:00
Yuhong Sun
7f029a0304 Reorder Imports (#572) 2023-10-14 10:12:27 -07:00
Yuhong Sun
2c867b5143 Fix Slack premature Reacts and Notification (#571) 2023-10-13 22:52:21 -07:00
Yuhong Sun
af510cc965 API support for Chat to have citations (#569) 2023-10-13 17:38:25 -07:00
Weves
f0337d2eba Auto-delete unlinked connectors on creation of a new connector with the same name 2023-10-13 13:40:37 -07:00
Weves
17e00b186e Add back Gong connector to sidebar + fix formatting issues 2023-10-13 12:10:51 -07:00
Yuhong Sun
dbf59d2acc Dockerfile to build smaller Images (#567) 2023-10-12 13:08:47 -07:00
Weves
41964031bf Fix FE build 2023-10-12 11:05:43 -07:00
Yuhong Sun
a7578c9707 Fix SlackBot still tagging groups (#564) 2023-10-12 00:32:43 -07:00
Yuhong Sun
51490b5cd9 Favor tz aware objects (#562) 2023-10-11 21:19:09 -07:00
Chris Weaver
e6866c92cf Fix call to versioned_get_access_for_documents_fn args order (#563) 2023-10-11 21:18:22 -07:00
Yuhong Sun
8c61e6997b Document 360 Touchups (#561) 2023-10-11 20:16:42 -07:00
nlp8899
90828008e1 Document360 Connector (#552) 2023-10-11 20:10:01 -07:00
Weves
12442c1c06 Make it harder to use unversioned access functions 2023-10-11 17:52:38 -07:00
Weves
876c6fdaa6 Address bug with automatic document set cleanup on connector deletion 2023-10-11 14:42:45 -07:00
Yuhong Sun
3e05c4fa67 Move DanswerBot Configs (#559) 2023-10-11 10:24:16 -07:00
Yuhong Sun
90fbe1ab48 Comments on advanced DanswerBot options (#557) 2023-10-10 19:11:48 -07:00
Yuhong Sun
31d5fc6d31 Officially support Slack DMs to DanswerBot (#556) 2023-10-10 18:07:29 -07:00
Weves
fa460f4da1 Small tweak to the connector deletion flow 2023-10-10 16:38:36 -07:00
Weves
e7cc0f235c Add migration 2023-10-10 15:32:30 -07:00
Weves
091c2c8a80 Automatically delete document set relationships when deleting a ConnectorCredentialPair 2023-10-10 15:32:18 -07:00
Weves
3142e2eed2 Add user group prefix + access filter utility 2023-10-10 14:07:01 -07:00
Weves
5deb12523e Allow large file uploads 2023-10-10 09:11:07 -07:00
Yuhong Sun
744c95e1e1 Remove Stopword Highlighting (#546) 2023-10-09 18:54:40 -07:00
Yuhong Sun
0d505ffea1 Provide Env variable to have chat flow always use the tools prompt (#548) 2023-10-09 09:26:00 -07:00
Yuhong Sun
30cdc5c9de Slack Bot to respond very quickly to acknowledge seeing the question (#544) 2023-10-09 09:24:28 -07:00
Weves
dff7a4ba1e Fix Google sites doc link 2023-10-08 23:13:37 -07:00
Chris Weaver
d95da554ea Add Google Sites connector (#532) 2023-10-08 19:20:38 -07:00
Weves
fb1fbbee5c Add some additional FE components 2023-10-08 17:24:12 -07:00
Chris Weaver
f045bbed70 Add infinite retry for starting up Slack bot (#540) 2023-10-08 17:02:40 -07:00
Weves
ca74884bd7 Pin pytorch version to fix segmentation fault in Docker 2023-10-08 16:11:53 -07:00
Yuhong Sun
9b185f469f Vespa edge case ID does not follow expected format (#541) 2023-10-08 13:36:00 -07:00
Weves
e8d3190770 Fix really long words / strings 2023-10-08 11:56:29 -07:00
Yuhong Sun
a6e6be4037 Fix Divide by Zero Edge Case (#535) 2023-10-08 09:30:30 -07:00
Yuhong Sun
7d3f8b7c8c Gong Connector (#529)
---------

Co-authored-by: Weves <chrisweaver101@gmail.com>
2023-10-08 00:27:15 -07:00
Weves
c658ffd0b6 Fix drive connector for service accounts with shared files turned off 2023-10-08 00:24:56 -07:00
Weves
9425ccd043 Fix import 2023-10-07 23:36:35 -07:00
Weves
829b50571d Remove outdated experimental config in NextJS 2023-10-07 23:36:35 -07:00
Chris Weaver
d09c320538 Fix logout (#536) 2023-10-07 23:22:27 -07:00
Yuhong Sun
478fb4f999 Default to API key in file (#531) 2023-10-07 17:50:17 -07:00
dependabot[bot]
0fd51409ad Bump zod and next in /web (#530)
Removes [zod](https://github.com/colinhacks/zod). It's no longer used after updating ancestor dependency [next](https://github.com/vercel/next.js). These dependencies need to be updated together.


Removes `zod`

Updates `next` from 13.4.19 to 13.5.4
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](https://github.com/vercel/next.js/compare/v13.4.19...v13.5.4)

---
updated-dependencies:
- dependency-name: zod
  dependency-type: indirect
- dependency-name: next
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-07 17:19:49 -07:00
dependabot[bot]
21aa233170 Bump langchain from 0.0.273 to 0.0.308 in /backend/requirements (#516)
Bumps [langchain](https://github.com/langchain-ai/langchain) from 0.0.273 to 0.0.308.
- [Release notes](https://github.com/langchain-ai/langchain/releases)
- [Commits](https://github.com/langchain-ai/langchain/compare/v0.0.273...v0.0.308)

---
updated-dependencies:
- dependency-name: langchain
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-07 17:18:51 -07:00
dependabot[bot]
30efe3df88 Bump postcss from 8.4.29 to 8.4.31 in /web (#528)
Bumps [postcss](https://github.com/postcss/postcss) from 8.4.29 to 8.4.31.
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](https://github.com/postcss/postcss/compare/8.4.29...8.4.31)

---
updated-dependencies:
- dependency-name: postcss
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-07 17:17:57 -07:00
Yuhong Sun
09ba0a49b3 Default to not respond to every channel (#527) 2023-10-06 14:54:34 -07:00
Yuhong Sun
beb54eaa5d Fix migrations that went to the same revision (#525) 2023-10-05 21:58:14 -07:00
Yuhong Sun
0632e92144 Chat without tools should use a less complex prompt (#524) 2023-10-05 21:44:13 -07:00
Chris Weaver
9c89ae78ba Move is_public from Credential to ConnectorCredentialPair (#523) 2023-10-05 20:55:41 -07:00
Yuhong Sun
a85e73edbe Fix exception if no filters configured (#520) 2023-10-05 15:01:35 -07:00
Weves
5a63b689eb Add DANSWER_BOT_ONLY_ANSWER_WHEN_SLACK_BOT_CONFIG_IS_PRESENT to dev compose 2023-10-05 11:52:31 -07:00
Yuhong Sun
aee573cd76 Chat Feedback Backend (#513) 2023-10-04 20:52:24 -07:00
Weves
ec7697fcfe Add session description to get-user-chat-sessions endpoint 2023-10-04 17:09:36 -07:00
mattboret
b801937299 fix: remove \xa0 from code blocks (#509)
Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2023-10-04 09:46:32 -07:00
Yuhong Sun
499dfb59da Reenable Google Colab Model (#507) 2023-10-03 21:50:55 -07:00
Weves
7cc54eed0f Add env variable to make DanswerBot only respond when a config is present 2023-10-03 18:11:31 -07:00
Weves
d04716c99d Fix import ordering 2023-10-03 17:51:18 -07:00
Weves
732f5efb12 Fix AUTH_TYPE env variable bug on frontend 2023-10-03 17:35:50 -07:00
Weves
29a0a45518 Replace 'respond_sender_only' with 'respond_tag_only' + prettify UI 2023-10-03 16:26:02 -07:00
Yuhong Sun
59bac1ca8f Support more Slack Config Options (#494)
---------

Co-authored-by: Weves <chrisweaver101@gmail.com>
2023-10-03 14:55:29 -07:00
Yuhong Sun
c2721c7889 Option to have very verbose LLM logging (#500) 2023-10-02 22:56:24 -07:00
Yuhong Sun
ab65b19c4c Add OAuth configurability (#499) 2023-10-02 11:05:08 -07:00
nlp8899
c666f35cd0 create a hubspot connector (#482) 2023-10-02 10:13:23 -07:00
Weves
dbe33959c0 Move auto-ACL update to background job 2023-10-02 00:37:51 -07:00
Weves
829d04c904 Add multi-threading to improve speed of updates / indexing 2023-10-02 00:37:51 -07:00
Yuhong Sun
351475de28 Consolidate versions for easier extension (#495) 2023-10-01 23:49:38 -07:00
Jignesh Solanki
a808c733b8 allow pdf file in File Connector (#488) 2023-10-01 22:54:40 -07:00
Chris Weaver
2d06008f6f Add document set-based filters in UI (#497) 2023-10-01 10:46:04 -07:00
Weves
aa9071e441 Only run set_acl_for_vespa once 2023-09-30 14:20:41 -07:00
Weves
22f2398269 Make launch.json a template so that devs can customize 2023-09-30 13:45:54 -07:00
Weves
1abce83626 Add option to select Scrape Type in the web connector UI 2023-09-29 19:10:00 -07:00
Chris Weaver
0c6077ee7e Fix service accounts + shared drives (#490) 2023-09-29 17:26:42 -07:00
Weves
bfab9d1ee7 More notion testing 2023-09-28 16:24:51 -07:00
Weves
28859fe127 Try to explicitly use a root page for notion 2023-09-28 16:24:51 -07:00
Weves
79c28e1988 Fix backend Dockerfile 2023-09-26 22:53:01 -07:00
Weves
7afcf3489f Auto-populate ACL fields on server startup 2023-09-26 22:53:01 -07:00
Weves
c09f00990e Update CONTRIBUTING.md with celery cmds 2023-09-26 18:55:45 -07:00
Weves
5f25826a98 Handle document_set deletion better + prevent updates while syncing 2023-09-26 18:06:40 -07:00
Weves
60cddee310 Add better message in Slack management UI 2023-09-26 15:32:40 -07:00
Chris Weaver
d41d844116 Slack bot management dashboard (#483) 2023-09-26 14:03:27 -07:00
Chris Weaver
0c58c8d6cb Adding Document Sets (#477)
Adds:
- name for connector credential pairs + frontend changes to start populating this field
- document set table migration
- during indexing, document sets are now checked and inserted into Vespa
- background job to check if document sets need to be synced
- document set management APIs
- document set management dashboard in the UI
2023-09-26 12:53:19 -07:00
Chris Weaver
8594bac30b Transition to using access_control_list to manage access in Vespa (#450) 2023-09-26 12:26:39 -07:00
Weves
c4e4e88301 Add NOTION_CONNECTOR_ENABLE_RECURSIVE_PAGE_LOOKUP to the .dev deployment setup 2023-09-25 23:21:41 -07:00
Yuhong Sun
6d376d3cf6 Fix Type with new OpenAI Endpoint (#480) 2023-09-25 11:15:56 -07:00
Rajeesh
0c3ecbfa2f Ability to provide different base URL for the open AI LLM endpoint (#475) 2023-09-25 11:07:07 -07:00
Yuhong Sun
8b95e2631d Make Cross Encoders Optional (#476) 2023-09-23 17:17:54 -07:00
Yuhong Sun
3c65317538 Fix Slackbot Tagging people or groups (#473) 2023-09-21 21:44:35 -07:00
Yuhong Sun
5cc17d39f0 Chat Backend API edge cases handled (#472) 2023-09-21 20:24:47 -07:00
Yuhong Sun
b416c85f0f Add Metrics to Regression Test (#470) 2023-09-20 20:42:02 -07:00
Weves
4912beb283 Add recursive Notion search 2023-09-20 15:42:18 -07:00
Weves
db024ad7b7 Provide default value for 'retrieval_enabled' column in migration 2023-09-20 13:30:57 -07:00
Yuhong Sun
4b98e47036 Add more flexibility to the Web Connector (#462) 2023-09-19 20:18:25 -07:00
Yuhong Sun
da6dd5b617 Prefix Slack Channel Identifier with a pound sign (#461) 2023-09-19 15:04:10 -07:00
Yuhong Sun
32eee88628 Special Danswer flow for Chat (#459) 2023-09-18 21:10:20 -07:00
Yuhong Sun
3641102672 Verification prompt 3.5 tuneup (#458) 2023-09-18 13:16:26 -07:00
Weves
0fcedfec17 Fix dynamic summary parsing 2023-09-18 09:54:12 -07:00
Yuhong Sun
5b1109d5c1 Clean up Slack Bot formatting (#455) 2023-09-17 22:47:33 -07:00
Yuhong Sun
b337a521f8 Slack Bot Interface Rework (#454) 2023-09-17 19:23:59 -07:00
Yuhong Sun
d7b7714d86 Cleanup for Mintlify Websites (#453) 2023-09-16 23:43:24 -07:00
Yuhong Sun
6b305c56b3 Use Sentence Aware Splitter (#452) 2023-09-16 16:28:16 -07:00
Yuhong Sun
63215e9c9a Fix Migration Conflict (#449) 2023-09-15 17:56:03 -07:00
Yuhong Sun
f802351d85 Fix Vespa Issue where Documents with no Content could be retrieved via Vector Search (#448) 2023-09-15 13:03:14 -07:00
Weves
1d945becab Update node version from 18 -> 20 to address security scan issues 2023-09-15 12:17:54 -07:00
Yuhong Sun
e549d2bb4a Chat with Context Backend (#441) 2023-09-15 12:17:05 -07:00
Weves
a16ce56f6b Fix for notion connector 2023-09-14 20:15:05 -07:00
Weves
c4e0face9b Move connector / credential pair deletion to celery 2023-09-14 16:23:13 -07:00
Weves
3fc7a13a31 Add extra logging for failure to fetch page blocks 2023-09-13 18:53:15 -07:00
Weves
e433e27bc8 Small highlighting fix 2023-09-13 18:23:46 -07:00
Weves
2bf38fa996 Fix slack link bug with non-thread messages 2023-09-13 17:48:01 -07:00
Chris Weaver
4e359bc731 Fix bugs in Notion connector (#440)
* Fix bad pagination
* Make each block be a section -> we can link to individual blocks 
* Don't have a page include all content from child pages
2023-09-13 13:30:41 -07:00
Weves
ffa24e2f09 Handle newlines + code blocks in answer 2023-09-12 15:54:16 -07:00
Yuhong Sun
9738e5e628 minor touchups 2023-09-12 15:44:15 -07:00
Weves
6f50f6710a Fix slack links for messages inside of a thread 2023-09-12 13:57:01 -07:00
Weves
d130a93b0f Fix small spacing issue with keyword highlighting 2023-09-12 13:10:10 -07:00
Weves
cf2bd8a40c highlighting 2023-09-12 11:35:37 -07:00
Yuhong Sun
b5fc2a5775 Regression Test (#434) 2023-09-11 19:06:01 -07:00
Yuhong Sun
101ff2f392 Fix LLM warm up (#433) 2023-09-11 14:47:36 -07:00
Yuhong Sun
9316b78f47 Evaluate LLM Answers via Reflexion (#430) 2023-09-11 14:45:13 -07:00
Weves
ddfa8cf8a6 Disable indexing for empty docs 2023-09-11 13:28:13 -07:00
Yuhong Sun
6c795dfa6c Slack CoT Scratchpad (#421) 2023-09-10 16:56:44 -07:00
Weves
1d847bfd23 Fix timeout for new LLM class 2023-09-10 11:18:05 -07:00
Weves
05a5419c8e Add backwards compatibility for users who don't have groups:read 2023-09-10 11:03:23 -07:00
Weves
e72f26ef53 Fix indexing job cleanup 2023-09-10 11:01:34 -07:00
Weves
67c26f89e8 Add new Github fields to FE 2023-09-10 10:34:35 -07:00
Yuhong Sun
f34f373b08 Fix Github Metadata (#427) 2023-09-10 10:27:01 -07:00
Yuhong Sun
f126dfdbd0 Add Github Polling and Issues (#424) 2023-09-09 23:11:00 -07:00
Yuhong Sun
4a0c2bf866 Vespa Save and Load (#422) 2023-09-09 20:25:31 -07:00
Andrea Nassi
0e65688166 Change max upload size setting (#410)
---------

Co-authored-by: Gabriele Capitani <gabrielecapitani2005@gmail.com>
2023-09-09 17:01:44 -07:00
Weves
eae6f58450 Fix slack bot retrieval 2023-09-09 12:57:23 -07:00
Weves
648706d48c Allow indexing of private channels 2023-09-08 17:18:07 -07:00
Yuhong Sun
b1fe120021 Make Vespa Deployment file as simple as possible (#416)
2023-09-08 00:47:33 -07:00
Yuhong Sun
4ae2680384 Minor Tweaks to Default Prompt (#415) 2023-09-07 23:42:34 -07:00
Yuhong Sun
20a6de0635 Fix Github Actions names (#414) 2023-09-07 22:33:17 -07:00
Yuhong Sun
c9492bf624 Split the build on Tag actions into two jobs (#413) 2023-09-07 22:13:38 -07:00
Yuhong Sun
52fa71eaff Better QA Prompts (#409) 2023-09-06 22:46:25 -07:00
Weves
ccbc69d153 Misc slack bot improvements 2023-09-06 18:12:31 -07:00
Yuhong Sun
7972c8a71e Make Boosting give more consistent scores (#406) 2023-09-06 15:50:53 -07:00
Weves
2d077a9544 Improve quote hover display 2023-09-06 15:45:53 -07:00
Weves
78e1806688 Add more logging for notion connector + add retries 2023-09-06 11:45:46 -07:00
Chris Weaver
6a79ddce37 New prompt + show quotes on hover (#404) 2023-09-06 01:44:48 -07:00
Yuhong Sun
5977a28f58 No Context Chat Backend (#397) 2023-09-05 22:32:00 -07:00
Chris Weaver
630386c8c4 Remove tornado key + remove nodejs once copied into playwright + remove old semver module (#402) 2023-09-05 19:18:35 -07:00
Chris Weaver
b06e53a51e Feed in docs till we reach a token limit (#401) 2023-09-05 15:20:42 -07:00
Weves
58b75122f1 Fix deletion for overlapping connectors 2023-09-04 21:12:08 -07:00
Yuhong Sun
d593818996 Use Vespa Doc ID directly instead of from fields (#399) 2023-09-04 17:24:31 -07:00
Weves
f7cc7190fe Allow connectors that have documents with feedback to be deleted 2023-09-04 16:05:43 -07:00
Weves
adb22273b6 Fix Vespa limit 2023-09-04 15:49:11 -07:00
Weves
742a016175 Remove empty files from QA 2023-09-04 15:49:11 -07:00
Chris Weaver
0fcac74df1 Add WEB_DOMAIN env variable to dev compose file (#395) 2023-09-04 10:43:06 -07:00
Chris Weaver
50101a8cac Use newest version of LTS node for playwright (#393) 2023-09-04 10:42:49 -07:00
Yuhong Sun
f4866bfefc Remove libc-dev and uninstall py (#392) 2023-09-03 17:18:41 -07:00
Yuhong Sun
c28f4d4527 Remove py library due to denial of service CVE (#391) 2023-09-03 16:36:13 -07:00
Weves
884f746211 Fix oauth redirect 2023-09-02 15:28:22 -07:00
Weves
f4d55479c4 Fix popup overlap 2023-09-02 14:26:56 -07:00
Yuhong Sun
28480d19de Fix Web Connector Docker Dependencies (#388) 2023-09-02 14:20:41 -07:00
Weves
2885240183 Default to semantic search 2023-09-02 11:59:32 -07:00
Yuhong Sun
c95cf5ca74 Playwright only install Chrome (#386) 2023-09-02 10:15:15 -07:00
Yuhong Sun
4aebb69883 Upgrade packages for security reasons (#384) 2023-09-01 20:35:48 -07:00
Chris Weaver
c68afbe9d0 UI for AI thoughts (#385) 2023-09-01 20:32:22 -07:00
Yuhong Sun
06c1afce42 Remove ANSWERABLE text from model out (#383) 2023-09-01 15:10:14 -07:00
Yuhong Sun
d73d81c867 Scripts to Reset Postgres and Vespa (#382) 2023-09-01 14:43:04 -07:00
Yuhong Sun
493648d28b Reduce Slack Bot Log Spamming (#381) 2023-09-01 10:43:57 -07:00
Weves
b89a06f03b Allow admins to connect public credentials to connectors 2023-09-01 10:42:49 -07:00
Weves
0d4244f990 Fix API key specification bug 2023-09-01 10:30:02 -07:00
Yuhong Sun
bddf03cd54 Tag Latest Image on Code Tag (#380) 2023-09-01 10:22:04 -07:00
Weves
5a6abbf39e Fix query ID when giving feedback 2023-08-31 20:05:07 -07:00
Yuhong Sun
e1fbffd141 Index all Google Drive file types (#373) 2023-08-31 19:20:32 -07:00
Yuhong Sun
6bae93ad3c Notion connector test separately (#372) 2023-08-31 18:18:19 -07:00
Yuhong Sun
43efa9da94 Mark incomplete Index Attempts as Failed on job restart (#371) 2023-08-31 17:43:03 -07:00
Yuhong Sun
dac5aaea94 Fix Ruff generated mistake (#370) 2023-08-31 17:03:57 -07:00
Yuhong Sun
80a08bbf0c Return empty string for encrypted PDF (#369) 2023-08-31 16:59:28 -07:00
Yohann Fabri
d6e87df548 gdrive connector ignore encrypted pdf file (#353) (#362)
Thanks for your contribution!
2023-08-31 16:57:08 -07:00
Yuhong Sun
ac2a4f9051 Ruff Styling (#368) 2023-08-31 15:55:01 -07:00
Yuhong Sun
51ec2517cb LLM to validate user Query (#365)
Backend Only
2023-08-31 15:33:39 -07:00
Patrick Decat
0a7775860c feat(dev): use ruff for python linting (#355) 2023-08-31 15:29:59 -07:00
Yuhong Sun
8bf82ac144 Better logging for Google Drive follow shortcuts (#367) 2023-08-31 15:24:23 -07:00
Yuhong Sun
c1727e63ad bug squashed 2023-08-31 14:32:13 -07:00
Weves
5dc855c4fc Fix null document ID 2023-08-31 13:04:43 -07:00
Weves
f316c8569f Show scores 2023-08-31 11:25:02 -07:00
Weves
4bce20b5c4 Switchup icons 2023-08-31 11:25:02 -07:00
Weves
996420f92c Fix negative values for feedback page 2023-08-31 11:25:02 -07:00
Yuhong Sun
ec4d0b856c Added boost to rerank step (#360) 2023-08-30 23:12:55 -07:00
Weves
5b3abb4cb3 Update document boost UI 2023-08-30 20:02:21 -07:00
Weves
faa73b3088 Add env variables for slack bot to .dev compose file 2023-08-30 17:43:02 -07:00
Yuhong Sun
681eb6e9f2 Fix Multiarchitecture Docker (#358) 2023-08-30 16:00:32 -07:00
Weves
a6ea40714e Disable langchain retries 2023-08-30 14:28:14 -07:00
Weves
cea3e1f3d5 Add support for multiple allowed email domains + make slack bot logs go to stdout 2023-08-30 13:47:46 -07:00
Chris Weaver
038f646c09 Fe for feedback (#346) 2023-08-30 12:52:24 -07:00
Yuhong Sun
856061c7ea Fix ReDoS and Directory Traversal (#352)
Co-authored-by: Weves <chrisweaver101@gmail.com>
2023-08-29 21:20:15 -07:00
Weves
9e82dbf8bb Fix confluence connector styling 2023-08-29 14:45:52 -07:00
Weves
1c3d0a1f3d Add environment variable which disables answering when an answer is not found 2023-08-29 14:45:52 -07:00
Patrick Decat
3c5cdb07c1 fix(confluence): add missing import (#350) 2023-08-29 11:36:14 -07:00
Patrick Decat
681a8a423f fix(confluence): ignore empty pages (#349) 2023-08-29 09:45:35 -07:00
Yuhong Sun
548f0a41cb Confluence handle pages without body.storage.value (#347)
Workaround for: https://jira.atlassian.com/browse/CONFCLOUD-76433
2023-08-28 18:35:13 -07:00
Yuhong Sun
b2a51283d1 Learn from feedback backend (#343)
---------

Co-authored-by: Weves <chrisweaver101@gmail.com>
2023-08-28 13:29:29 -07:00
Chris Weaver
c43a403b71 Update README.md 2023-08-28 02:43:46 -07:00
Chris Weaver
cddd86dd1c Update README.md
Add embeds + update features / roadmap
2023-08-28 02:42:01 -07:00
Yuhong Sun
96575bf893 Remove Unused Imports and Variables (#344) 2023-08-27 17:39:41 -07:00
Weves
4469447fde Add LangChain-based LLM 2023-08-26 21:57:15 -07:00
Weves
20b6369eea Add ability to respond with error message in slack thread 2023-08-26 15:47:01 -07:00
Yuhong Sun
a2ec1e2cda Vespa Deployment (#330)
Large Change!
2023-08-26 15:35:19 -07:00
Chris Weaver
642862bede Make public credentials accessible by all admins (#337) 2023-08-25 17:06:38 -07:00
Weves
b27107c184 Add user management page 2023-08-25 12:25:39 -07:00
Weves
8cda11c701 Respond to slack messages with file attachments 2023-08-25 11:12:26 -07:00
Yuhong Sun
384bf1befe Warm up models before first document indexed (#333) 2023-08-24 20:01:50 -07:00
Yuhong Sun
cb13f5b18b Fix non-json model output processing (#332) 2023-08-24 19:43:58 -07:00
Chris Weaver
6897416fe6 Support service accounts for Google Drive connector (#325) 2023-08-24 14:50:05 -07:00
Chris Weaver
8976ed3bcd Fix mypy (#331) 2023-08-24 12:23:04 -07:00
Yuhong Sun
81d2226b5f Minor Vespa Updates (#329) 2023-08-24 09:01:15 -07:00
Yuhong Sun
8159fdcdce Add Vespa and rework Document Indices (#317) 2023-08-24 08:46:28 -07:00
Weves
a2d3a3f116 Fix image build by pinning safetensors version 2023-08-23 22:23:45 -07:00
Weves
7836e91a20 Add extra logging in the case of null document_id 2023-08-23 18:16:19 -07:00
Chris Weaver
e307275774 Add support for multiple indexing workers (#322) 2023-08-22 18:11:31 -07:00
Weves
3ea205279f Fix bug where slack bot would tag users / everyone 2023-08-22 16:15:47 -07:00
Weves
e5352b6af8 Fix issue with Confluence errors not being ignored 2023-08-21 09:14:21 -07:00
Chris Weaver
9f1898c384 Add basic chain of thought PromptProcessor (#316) 2023-08-20 18:48:24 -07:00
Ikko Eltociear Ashimine
3ec602b47f Fix typo in indexing_pipeline.py (#318)
pipline -> pipeline
2023-08-20 09:15:23 -07:00
Weves
ab905e9fe6 Include channel name in slack bot logs 2023-08-19 22:23:57 -07:00
Chris Weaver
067503bc84 Background logs to stdout (#315) 2023-08-18 19:02:32 -07:00
Chris Weaver
f541a3ee85 Continue on some connector failures (#314) 2023-08-18 17:59:33 -07:00
Weves
70d7ca5c73 Better error for missing allowed_users / allowed_groups 2023-08-18 10:21:27 -07:00
Chris Weaver
bf4b63de19 Linear connector (#312) 2023-08-17 15:17:57 -07:00
Chris Weaver
f37ac76d3c Stop using untyped dicts to represent quotes (#310) 2023-08-17 14:53:55 -07:00
Sid Ravinutala
81a4934bb8 Google drive shared files fix + shortcuts (#300)
Also fixes foreign key constraint issue when manually wiping postgres + keeps track of accessed folders
2023-08-17 08:54:00 -07:00
Weves
11c071da33 Fix document display for docs with identical semantic IDs 2023-08-16 13:00:24 -07:00
Weves
0aa04ad616 Add chunk level logging when indexing 2023-08-15 18:43:46 -07:00
Weves
820f8b7b48 Add document-level logging for each batch of indexed documents 2023-08-15 18:09:39 -07:00
Weves
8fc74a4313 Fix slack pagination 2023-08-15 17:58:36 -07:00
Weves
78b49f546c Make LOG_LEVEL work for .dev docker compose deployment 2023-08-15 16:19:03 -07:00
Weves
a6e08b42e2 Improve slack connector logging 2023-08-15 16:19:03 -07:00
Yuhong Sun
c845a91eb0 Fix UI link to Zulip docs (#304) 2023-08-15 02:03:27 -07:00
Yuhong Sun
620280db92 Fix formatting according to precommit hooks (#303) 2023-08-15 01:32:09 -07:00
Yuhong Sun
b73d19f35f Fix Azure OpenAI Docker Deployment (#302) 2023-08-15 01:06:23 -07:00
Chris Weaver
a905373c83 Fix typing for Zulip connector (#298) 2023-08-14 16:56:50 -07:00
Yuhong Sun
e97c1226d8 Recolor Zulip Logo (#297) 2023-08-14 15:55:11 -07:00
Michał Flak
286445f9ba Zulip connector (#247)
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2023-08-14 15:29:34 -07:00
Yuhong Sun
848e5653a9 More permissive quote matching (#295) 2023-08-14 15:03:21 -07:00
Yuhong Sun
59db40cf36 Add Azure OpenAI parameters to background job for Slackbot (#294) 2023-08-14 14:37:02 -07:00
Matthew Holland
204d89a148 Azure OpenAI integration (#293) 2023-08-14 14:30:44 -07:00
Yuhong Sun
bb58dce1c5 Default Empty Timeout Value breaks Docker Compose (#292) 2023-08-14 10:22:31 -07:00
Weves
e0cbd087f7 Fix count of docs for connector failure 2023-08-13 17:31:55 -07:00
Yuhong Sun
be318433e3 Reset the Default GenAI model choice to OpenAI (#288) 2023-08-13 16:23:55 -07:00
Weves
67fd244e66 Make docs show up immediately rather than wait until first answer token 2023-08-13 16:11:02 -07:00
jabdoa2
a73ea23e2c add simple local llm (#202)
A very simple local llm. Not as good as OpenAI but works as a drop-in replacement for on premise deployments.

---------

Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2023-08-13 15:54:59 -07:00
James Choncholas
758015baa5 Allow setting QA_TIMEOUT from env var (#258)
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2023-08-13 10:37:32 -07:00
Yuhong Sun
b1bd0b42e5 Add Blog Link (#286) 2023-08-12 18:29:45 -07:00
Weves
ecb26ddaf7 Add polling range for updates 2023-08-12 18:04:48 -07:00
Yuhong Sun
bcca8daab1 Fix misleading comment about HuggingFace (#284) 2023-08-12 17:40:03 -07:00
Weves
156ccc15a8 Fix fetching of latest index attempt 2023-08-12 17:18:09 -07:00
Weves
95f52a26df Fix error message popup z-index 2023-08-12 17:13:15 -07:00
Chris Weaver
ec478d97fb Better display of connector metadata on main status page (#280) 2023-08-12 17:03:20 -07:00
Weves
0381715fdd Add 'calculating rate' message 2023-08-12 15:27:28 -07:00
Chris Weaver
d5bb10b61f Improve indexing status display (#278)
Adds:
- actual error message in UI for indexing failure
- if a connector is disabled, stops indexing immediately (after the current batch of documents) to allow for deletion
- adds num docs indexed for the current run + a speed
2023-08-12 14:49:04 -07:00
Yuhong Sun
bca63e5a76 Do not stream Quote when using freeform prompt (#277) 2023-08-12 14:26:44 -07:00
Weves
54ee323e59 Fix duplicate documents with Slack connector 2023-08-10 10:54:12 -07:00
Yuhong Sun
a03818e6f6 Fix Google Colab Demo (#275) 2023-08-09 01:35:48 -07:00
Chris Weaver
89f71ac335 Support deletion of documents when a connector is deleted (#271) 2023-08-09 00:53:42 -07:00
Yuhong Sun
b6dec6dcdb Standardize model config naming (#274) 2023-08-08 00:18:13 -07:00
Yuhong Sun
02c3139bc9 Add Request Model Class for Google Colab Demo (#273)
Need to add the blog links later
2023-08-08 00:09:11 -07:00
Sid Ravinutala
ca72027b28 Allow slack channels to be specified (#238)
Adds the capability to specify specific channels to index when using the Slack connector
2023-08-07 22:09:27 -07:00
Yuhong Sun
3bfc72484d Support for Request accessed GenAI Models (#270) 2023-08-06 18:31:47 -07:00
Pratik Kabra
0e667d3384 Huggingface Inference backend internal models (#265) 2023-08-05 11:33:19 -07:00
Weves
df62648bbf Increase timeout for answer generation for slack bot 2023-08-04 18:14:52 -07:00
Weves
70a379b601 Set OAuth type to google by default 2023-08-01 11:03:13 -07:00
Chris Weaver
132a9f750d Add Github Action to run mypy / reorder-python-imports / black on all PRs (#251)
Also fixes import ordering (previously, local imports weren't grouped together as they should have been)
2023-07-29 16:53:38 -07:00
Yuhong Sun
87fe6f7575 Add ingestion metrics (#256) 2023-07-29 16:37:22 -07:00
Yuhong Sun
eec4e21bad Update README.md 2023-07-29 14:16:54 -07:00
Yuhong Sun
fe40e72b5c Require Semantic Identifier to not be None (#255) 2023-07-29 14:12:30 -07:00
jabdoa2
63780113d3 Add support for openid connect (#206)
This allows using Danswer in typical (non-Google) enterprise environments.

* Access Tokens can be very large. A token without claims is already 1100 bytes for me (larger than allowed in Danswer by default). With roles I got a 12kB token. For that reason I changed the field to TEXT in the database.
* Danswer used to swallow most errors when OIDC would fail. Node.js forwards a request to the backend and swallows all errors. Even within the backend we caught all ValueErrors and only returned the last exception with the request. Added full stack trace logging to allow debugging issues with userinfo and other endpoints.
* Allow changing the name of the login provider on the login button.
* Changed variables and URLs to generic OAUTH_XX (without google in the name) but kept compatibility with the existing Google integration.
* Tested against Keycloak with OpenID Connect.

Next steps:
* Claim to role mappings
* Auto login/SSO (Login button is just an extra click)
2023-07-29 14:04:32 -07:00
jabdoa2
878d4e367f prevent crash when semantic_identifier is None (#201)
This is a workaround for intermittent issues where semantic_identifier becomes None for some reason. It usually recovers when documents are rescraped.

Obviously, we do not yet understand the issue and are interested in a better solution.
2023-07-29 12:37:02 -07:00
Yuhong Sun
17e2008027 Add TODOs and minor style changes to web connector (#254) 2023-07-29 12:35:38 -07:00
jabdoa2
0d7d54fddb Improve Web Connector Output, Add Config Options and add OAuth Backend Flow (#199) 2023-07-29 12:21:23 -07:00
cqian-github
b6b549357f Update Contributing.md with Windows Commands (#252)
Co-authored-by: AD\cqian <cqian@ucsd.edu>
2023-07-28 19:03:25 -07:00
Chris Weaver
3e8f5fa47e Fix a few bugs with Google Drive polling (#250)
- Adds some offset to the `start` for the Google Drive connector to give time for `modifiedTime` to propagate so we don't miss updates
- Moves fetching folders into a separate call since folder `modifiedTime` doesn't get updated when a file in the folder is updated
- Uses `connector_credential_pair.last_successful_index_time` instead of `updated_at` to determine the `start` for poll connectors
2023-07-28 18:27:32 -07:00
lokeshwar lakhineni
62afbcb178 added shell command for windows (#194) 2023-07-28 18:02:27 -07:00
Yuhong Sun
55adde5e27 Fix import location and mypy issue (#249) 2023-07-28 16:06:25 -07:00
Yuhong Sun
2a339ec34b Prevent too many tokens to GPT (#245) 2023-07-28 16:00:26 -07:00
Weves
d53ce3bda1 Fix arg to GuruIcon 2023-07-28 14:37:43 -07:00
Yuhong Sun
d03ac44744 Guru Connector (#177)
Co-authored-by: Weves <chrisweaver101@gmail.com>
2023-07-28 14:27:02 -07:00
Weves
555f8bbf08 Allow shared files for drive connector 2023-07-27 17:20:34 -07:00
Yuhong Sun
4d0732395d Standalone Script to Test OpenAI API Key (#243) 2023-07-27 16:33:04 -07:00
Yuhong Sun
2a0d3b38e9 Google Drive Connector Debug Logging (#241) 2023-07-27 09:27:57 -07:00
Chris Weaver
3b546ba1c3 Make Google Drive connectors editable (#237) 2023-07-26 22:20:12 -07:00
Weves
9e6467a0c9 Fix specifying folders for Google Drive connector 2023-07-26 21:39:31 -07:00
meherhendi
1a22666810 Adding vscode run & debug config (#216)
Also adds `.env` to `.gitignore` files outside of the `deployment` dir
2023-07-26 12:35:31 -07:00
Weves
d5f172c292 Handle google drive connectors without folder_path 2023-07-26 12:15:06 -07:00
Yuhong Sun
273802eff0 Disable Gpt4all due to mac not supporting it currently (#233) 2023-07-25 22:19:15 -07:00
Yuhong Sun
e019db0bc7 Indexing Job has timezone discrepancy with DB making Poll timeframes incorrect (#231) 2023-07-23 21:59:00 -07:00
Yuhong Sun
59f27e83bf Merge pull request #227 from IDinsight/docx-googledrive
Added support for docx in Google Drive
2023-07-23 21:33:41 -07:00
Sid Ravinutala
d6d3d5291b added docx2txt 2023-07-24 01:42:39 +00:00
Sid Ravinutala
a4b47e0243 added support for docx in gdrive
rebase from main
2023-07-24 01:41:35 +00:00
Yuhong Sun
d6ca865034 Support GPT4All in memory (#230) 2023-07-23 12:26:14 -07:00
Weves
6684f1e5d5 Use approved icon colors 2023-07-22 19:22:05 -07:00
Chris Weaver
dd084d40f6 Product board connector (#228)
Also fixes misc mypy issues across the repo
2023-07-22 13:00:51 -07:00
Yuhong Sun
25a028c4a7 Merge pull request #195 from pkabra/notion-connector
Notion connector
2023-07-21 00:04:12 -07:00
Pratik Kabra
b33c8b1d7c Reorg public-private functions 2023-07-20 18:04:48 -05:00
Pratik Kabra
610fe6ebc4 Prettier fixes for web 2023-07-20 18:02:41 -05:00
Pratik Kabra
7ad98480be Black fixes for python files 2023-07-20 18:01:23 -05:00
Pratik Kabra
ab3bb13493 Fix notion titles missing in some cases 2023-07-20 17:58:09 -05:00
Yuhong Sun
0708002953 Check for Credential delete before running queued index attempt (#221) 2023-07-19 23:52:48 -07:00
Yuhong Sun
191c166ab6 Merge pull request #200 from jabdoa2/do_not_crash_when_deleting_source
catch crash when deleting a datasource
2023-07-19 23:46:14 -07:00
Chris Weaver
4958962855 Merge pull request #208 from chrisedington/ce/slack-archive-fix
Fix: Don't include archived Slack channels
2023-07-19 21:47:25 -07:00
Yuhong Sun
c41421ccf4 Add model caching to docker compose prod (#219) 2023-07-19 20:01:23 -07:00
Yuhong Sun
aed88e8b9e Merge pull request #198 from jabdoa2/cache_models_for_development
cache models for faster development cycles in docker compose
2023-07-19 19:57:06 -07:00
Yuhong Sun
f7be76dab3 Prod Template Typesense API Key Typo (#209) 2023-07-18 15:14:35 -07:00
Chris Edington
dac2fdc163 Fix: Don't include archived Slack channels, as they cannot be called on conversations.join API 2023-07-18 22:04:30 +02:00
Weves
ab37b8e8ea Fix deletion of web/slack/jira/confluence/... connector when creating a new one 2023-07-18 12:06:49 -07:00
Jan Kantert
7290f1893d catch crash when deleting a datasource
The Danswer background job crashes when the index task for a deleted source is still in the task queue. Without this fix, it won't recover without manual database cleanup.
2023-07-18 13:42:16 +02:00
Jan Kantert
b3ebda714d cache models for faster development cycles in docker compose 2023-07-18 13:31:26 +02:00
Pratik Kabra
af921fb179 Add some more docstrings 2023-07-17 20:06:43 -05:00
Pratik Kabra
2a42d2df9c Notion connector frontend 2023-07-17 20:06:43 -05:00
Pratik Kabra
4c263b7130 Notion connector backend 2023-07-17 20:06:43 -05:00
Chris Weaver
3b1a8274a9 Allow specification of specific google drive folders to index (#197) 2023-07-17 14:51:16 -07:00
Chris Weaver
bc24ac53c0 Make init-letsencrypt.sh use the same stack name as docs (#192) 2023-07-16 16:49:13 -07:00
Yuhong Sun
60f05284f5 Docker compose debug option (#193) 2023-07-16 16:35:41 -07:00
Chris Weaver
676538da61 Better error message on GPT failures (#187)
* Better error message on GPT-call failures

* Add support for disabling Generative AI
2023-07-16 16:25:33 -07:00
Yuhong Sun
6c584f0650 Option for Enabling/Disabling User Auth for Docker Compose Dev (#191) 2023-07-16 15:54:55 -07:00
Yuhong Sun
554f6f3fe7 Combine Images Cleanup (#188) 2023-07-16 15:31:52 -07:00
Weves
d1003b913b Sync package-lock.json with package.json 2023-07-16 15:19:03 -07:00
Chris Weaver
9252807a51 Specify specific prettier version (#186)
* Add explicit prettier version

* Update CONTRIBUTING.md

* Add .prettierrc.json file to ensure we always use es5 trailing commas
2023-07-16 12:52:28 -07:00
Yuhong Sun
4b699fdab3 Better Logging (#184) 2023-07-16 01:41:48 -07:00
Yuhong Sun
3436b864a3 Fix missing Import (#183) 2023-07-15 18:11:24 -07:00
Yuhong Sun
1c042a8e95 Update README.md 2023-07-15 16:23:23 -07:00
Yuhong Sun
5657c5b799 Update CONTRIBUTING.md 2023-07-15 16:20:04 -07:00
Yuhong Sun
c5c1b01a4e Update README.md 2023-07-15 16:16:22 -07:00
Yuhong Sun
cdd097a4bb connectors README (#182) 2023-07-15 16:15:18 -07:00
Yuhong Sun
dbca4a7de7 Update CONTRIBUTING.md 2023-07-15 11:57:53 -07:00
Yuhong Sun
20589d8d78 Merge pull request #173 from ssddanbrown/merge_images
Merged background and api-server images
2023-07-15 11:29:48 -07:00
Dan Brown
e3a4614bfe Updated k8s and prod compose setups to work with merged images 2023-07-15 18:54:45 +01:00
Yuhong Sun
e4820045f9 Add metadata to GPT (#140) 2023-07-14 16:54:42 -07:00
Yuhong Sun
8928d61492 Merge pull request #179 from IDinsight/contributing_updates
Added missing instructions to CONTRIBUTING.md
2023-07-14 12:38:13 -07:00
Sid Ravinutala
ac2f040cef added missing instructions 2023-07-14 15:49:42 +00:00
Chris Weaver
33463b45e8 Fix issue with web connector for pages not ending with / (#176) 2023-07-13 22:30:10 -07:00
Dan Brown
f27364a442 Merged background and api-server images 2023-07-13 23:59:22 +01:00
Yuhong Sun
c6bcd5e1aa Add Contributing.md (#167) 2023-07-12 19:29:13 -07:00
Yuhong Sun
fec484d4de Merge pull request #164 from eltociear/patch-1
Fix typo in GoogleDriveCard.tsx
2023-07-12 08:59:19 -07:00
Ikko Eltociear Ashimine
741bf508ac Fix typo in GoogleDriveCard.tsx
recieved -> received
2023-07-13 00:40:02 +09:00
Chris Weaver
3889e01d86 Control streaming vs non-streaming on frontend with env variable (#162) 2023-07-12 01:12:42 -07:00
Yuhong Sun
d53ec8a905 DAN-169 Users whitelist (#153) 2023-07-11 21:23:35 -07:00
Yuhong Sun
c2fa3d5074 Fix Github Actions (#151) 2023-07-08 17:28:37 -07:00
Chris Weaver
d135bc7efa Merge pull request #139 from ssddanbrown/bookstack_connector
BookStack connector
2023-07-08 17:18:59 -07:00
Yuhong Sun
452a9f0ad6 DAN-168 Build Push Docker Images on Tag (#150) 2023-07-08 16:31:59 -07:00
Yuhong Sun
367330d27a DAN-165 Option to pull image from hub (#149) 2023-07-08 15:53:21 -07:00
Weves
3b64d62896 Make the Google Drive connector poll rather than pull everything 2023-07-08 13:26:47 -07:00
Weves
e55c23ad6f Make slab token a PW input 2023-07-08 12:40:16 -07:00
Weves
3494d6a13a Replace IDs with names in Slack connector 2023-07-07 18:10:19 -07:00
Yuhong Sun
79013ac9fd DAN-164 Background slack job to give up after 5 tries
also minor docker compose change
2023-07-07 17:19:24 -07:00
Chris Weaver
b4759403ac Adjust slack bot (#144)
* Add handling for cases where an answer is not found

* Make danswer bot slightly more configurable

* Don't respond to messages in thread + add better formatting for slack messages
2023-07-07 09:56:01 -07:00
Yuhong Sun
ef48fef62b Tiny fix for certain doc names (#143) 2023-07-07 00:26:15 -07:00
Weves
7874862902 Proper slack message batching 2023-07-06 21:33:33 -07:00
Weves
6978573a07 Make confluence connector use polling 2023-07-06 20:24:09 -07:00
Dan Brown
148d9c358f Fixed incorrect active panel in BookStack connector 2023-07-06 17:24:04 +01:00
Dan Brown
019e474a4e BookStack connector: Changed to use id-based document ids 2023-07-06 17:04:31 +01:00
Dan Brown
104a248b11 Cleaned up bookstack connector admin panel
Also fixed ESLint issues
2023-07-06 16:35:21 +01:00
Dan Brown
f587161577 Added bookstack to filters, changed inputType 2023-07-06 16:02:53 +01:00
Dan Brown
44f905ef80 Added BookStack connector code
Got to the point of working sync for shelves, books, chapters and pages.
2023-07-06 14:56:28 +01:00
Dan Brown
bfde5fd809 Got basic bookstack connector setup UI/backend working 2023-07-06 10:50:27 +01:00
Weves
7f222f376d Fix Jira connector page description 2023-07-04 14:15:38 -07:00
Weves
07fd7246d4 Fix link 2023-07-03 17:37:03 -07:00
Weves
967f9294f7 Fix duplicated docs for non-quoted docs 2023-07-03 17:34:43 -07:00
Weves
eb5eb003e2 Add slab filter 2023-07-03 16:50:59 -07:00
Chris Weaver
10b36f4ce9 Slab connector UI (#130)
Also added in missing dateutil req
2023-07-03 15:57:22 -07:00
Yuhong Sun
675a5aec9e DAN-158 Slab Connector (#129)
No support for comments or topics
2023-07-03 14:27:23 -07:00
Chris Weaver
2f54795631 Basic Slack Bot Support (#128) 2023-07-03 14:26:33 -07:00
Yuhong Sun
381b3719c9 DAN-139 web connector recursion (#126)
Now handled by checking final page after redirects
2023-07-02 19:01:13 -07:00
Chris Weaver
24e61a646d Updating env file setup (#125)
* Updating env file setup
* Update qdrant version
2023-07-01 15:16:51 -07:00
Marcel
ab83f5d17f Disable qdrant telemetry by default (#121)
* Disable qdrant telemetry by default

Signed-off-by: Marcel Coetzee <marcel@mooncoon.com>

* Add K8s config

Signed-off-by: Marcel Coetzee <marcel@mooncoon.com>

---------

Signed-off-by: Marcel Coetzee <marcel@mooncoon.com>
2023-06-30 20:53:59 -07:00
Weves
af329d31fb Cancelling searches when submitting a new one, no longer truncating at 7 docs, showing a warning message when no quotes are found 2023-06-30 10:10:54 -07:00
Weves
cb59e77278 Fix weird typesense search behavior 2023-06-29 18:02:28 -07:00
Weves
858b0582aa Fix slack message filtering 2023-06-29 17:54:47 -07:00
Weves
8f2f63bbec Update semver 2023-06-28 08:43:40 -07:00
Yuhong Sun
930d872ff0 Update README.md 2023-06-28 08:21:03 -07:00
Steven Pousty
d0cede2c3f Minor typo 2023-06-26 23:18:45 -06:00
Yuhong Sun
e6a5b7c731 Update Discord Link 2023-06-25 11:12:37 -07:00
Yuhong Sun
03006743ab DAN-118 Jira connector (#102)
* Small confluence page QoL changes

* Prevent getting into a bad state with orphan connectors for Jira / Confluence

* Jira connector + admin page
---------

Co-authored-by: Weves <chrisweaver101@gmail.com>
2023-06-24 18:48:38 -06:00
Yuhong Sun
3701239283 DAN-141 Confluence Poll Connector (#114) 2023-06-24 00:01:09 -07:00
Yuhong Sun
34861013f8 DAN-142 OpenAI key once a day (#113) 2023-06-24 00:00:43 -07:00
Yuhong Sun
2fb2c40851 DAN-145 Danswer save state (#115) 2023-06-24 00:00:06 -07:00
Yuhong Sun
4bddafe297 DAN-136 Fix incorrect num of github docs (#112) 2023-06-22 01:19:55 -07:00
Weves
0cd18947ec fix form refresh 2023-06-22 00:39:36 -06:00
Chris Weaver
785d289c68 Fix handling for QA when documents are split into multiple mini-chunks (#110) 2023-06-21 22:46:04 -06:00
Chris Weaver
5a04df7eb0 Fix typesense search when auth is on (#108) 2023-06-20 20:27:09 -07:00
Weves
620579cbec Fix connection pooling 2023-06-19 14:45:26 -06:00
Weves
490d39f081 Removing deprecated field 2023-06-19 14:45:26 -06:00
Weves
3863ee3ce1 Force users to provide access token before creating connector for Github/Confluence 2023-06-19 12:31:12 -06:00
Weves
1d9a9a60c8 Fix bug with revoking of credentials + adding new credential + adding back connector for google drive 2023-06-19 12:31:12 -06:00
Weves
15543feac1 Fix missing sources when auth is on 2023-06-19 09:31:16 -06:00
Yuhong Sun
8ba739b4d2 Update README.md 2023-06-18 01:43:23 -07:00
Yuhong Sun
7101b1ed03 Update README.md 2023-06-18 01:38:07 -07:00
Weves
9357ba3f39 Fix docker compose for typesense 2023-06-17 22:16:56 -06:00
Weves
88399a5d7f Fix certbot 2023-06-17 17:39:50 -06:00
Yuhong Sun
02f79c3357 Enable typo search typesense (#101) 2023-06-15 22:46:43 -07:00
Yuhong Sun
6fe54a4eed DAN-115 Document Polling (#91)
Includes updated document counting for polling
2023-06-15 21:07:05 -07:00
Yuhong Sun
97b9b56b03 Update README.md 2023-06-14 00:41:26 -07:00
Yuhong Sun
590fbbc253 DAN-120 Kubernetes (#98)
Sample Kubernetes deployment with Auth default on
Also includes a bugfix for credentialed connectors with Auth turned on
2023-06-14 00:11:25 -07:00
Weves
329d0640eb Fix page crash when backend is down 2023-06-13 23:28:28 -07:00
Yuhong Sun
df79214fd6 fixed 2023-06-13 14:00:47 -07:00
Chris Weaver
b4f340b8bd Adding more details to file connector description (#97)
* Adding more details to file connector description
2023-06-12 00:53:19 -07:00
Yuhong Sun
71c1b75a02 Update README.md 2023-06-11 23:35:31 -07:00
Chris Weaver
8f5b9c0bcd Danswer assistant (#96)
Add helper!
2023-06-11 17:54:41 -07:00
Yuhong Sun
2bfbf037ee Set a minimum distance angle cutoff (#95) 2023-06-11 17:36:05 -07:00
Chris Weaver
f20563c9bc File connector (#93)
* Initial backend changes for file connector

* Add another background job to clean up files

* UI + tweaks for backend
2023-06-09 21:28:50 -07:00
Yuhong Sun
f10ece4411 Danswer Helper QA Flow Backend (#90) 2023-06-09 17:48:17 -07:00
Weves
1facd58938 Fix bug with get_connector_indexing_status 2023-06-09 01:33:03 -07:00
Yuhong Sun
7c97cc4626 DAN-55 Intent Model (#89)
Includes:
- Intent Model
- Heuristic Classifications
- GPT self error classification
- Bugfix on finding end of answer stream
2023-06-07 15:27:06 -07:00
Chris Weaver
0f1f16880a Update README.md
Fix small typo
2023-06-06 08:31:23 -07:00
Chris Weaver
e0ebdc2fc1 Keyword search (#88)
* Add keyword search support

* Fix filters display

* Make documents appear immediately
2023-06-05 22:25:15 -07:00
Chris Weaver
e202aa440e Make filters only display in-use connectors (#87) 2023-06-05 22:13:36 -07:00
Chris Weaver
711e66184e Add Filters to UI (#86)
* Adding filters

* Fix get_connector_indexing_status endpoint bug
2023-06-05 00:41:48 -07:00
Yuhong Sun
c4e8afe4d2 DAN-81 Improve search round 2 (#82)
Includes:
- Multi vector indexing/search
- Ensemble model reranking
- Keyword Search backend
2023-06-04 20:02:32 -07:00
Chris Weaver
7cc64efc3a Enable non-admin credentials + add page for google drive (#84)
* Enable non-admin credentials + add page for google drive

* Return one indexing status entry for each connector / credential pair

* Remove some logs

* Small fixes

* Sort index status by source
2023-06-04 11:26:50 -07:00
Yuhong Sun
8c9b3079aa Delete conflicting records from IndexAttempt on upgrade/downgrade (#83) 2023-05-31 08:54:09 -07:00
Yuhong Sun
bae83bc101 Update README.md 2023-05-30 21:15:42 -07:00
Yuhong Sun
6891e4f198 Standardize connectors + permissioning + new frontend for admin pages + small fixes / improvements (#75)
Introducing permissioning, standardize onboarding for connectors, re-make the data model for connectors / credentials / index-attempts, making all environment variables optional, a bunch of small fixes + improvements.

Co-authored-by: Weves <chrisweaver101@gmail.com>
2023-05-30 19:59:57 -07:00
Weves
b05bf963bf Add confluence connector page 2023-05-23 13:33:12 -07:00
Weves
49804dcc44 Fix dev web server setup 2023-05-22 14:11:27 -07:00
Weves
f4ef92e279 Fix OpenAI key validation 2023-05-22 13:42:20 -07:00
Yuhong Sun
8e9e284849 Fix Pull vs Poll naming (#77) 2023-05-22 11:29:50 -07:00
Chris Weaver
0c4dcb13c3 Small fix for indexing (#78) 2023-05-22 11:29:25 -07:00
Weves
dd79b9bf79 Fix quote loading 2023-05-21 20:28:35 -07:00
Weves
806653dcb0 Add timeout option to OpenAI models 2023-05-21 14:31:15 -07:00
Yuhong Sun
62e86efec3 Update models.py 2023-05-21 14:10:23 -07:00
Yuhong Sun
6d7e7d5b71 DAN-19 Confluence Connector Backend for Public Docs (#73)
By public we mean docs accessible to an admin account with an API key that is set up in Danswer. This just means there is no support yet for OAuth for individual users to add docs.
2023-05-21 13:27:37 -07:00
Yuhong Sun
7559ba6e9d DAN-93 Standardize Connectors (#70) 2023-05-21 13:24:25 -07:00
Weves
51e05e3948 Just display docs if QA fails 2023-05-21 11:43:21 -07:00
Weves
0b8c69ceeb Prompt user for OpenAI key 2023-05-20 21:11:07 -07:00
Weves
544ba8f50d initial health check 2023-05-20 12:21:56 -07:00
Yuhong Sun
16dd429826 DAN-91 Fix Web Connector Bugs (#68)
Added pdf support
2023-05-19 17:39:13 -07:00
Chris Weaver
0b46ea76e8 Don't create collection if it already exists + fix OpenAI API Key name (#66)
* Don't create collection if it already exists

* Fix openai api key name
2023-05-17 17:12:00 -07:00
Weves
8685beceb2 Add create collection to app startup 2023-05-17 16:16:09 -07:00
Weves
bdebb9d441 Fix dev setup 2023-05-17 16:15:44 -07:00
Weves
25d12ea604 Temporarily pass .env file to web_server to fix DISABLE_AUTH issues 2023-05-17 15:26:07 -07:00
Weves
784358e35d Remove TODO 2023-05-17 15:22:26 -07:00
Weves
10e1c387e4 Small style changes 2023-05-17 15:19:55 -07:00
Yuhong Sun
eef2788606 Update README.md 2023-05-17 09:27:22 -07:00
Weves
5f7d2853b8 Fix connectors 2023-05-16 22:45:32 -07:00
Weves
5ff8924a90 Fix connector button for noauth 2023-05-16 22:36:14 -07:00
Yuhong Sun
d72acd65d8 Update README.md 2023-05-16 21:27:54 -07:00
Weves
33a6b0c0ab Fix noauth FE 2023-05-16 21:26:28 -07:00
Weves
d447b66039 Add support for DISABLE_AUTH on the FE 2023-05-16 21:21:44 -07:00
Weves
16ca6f760c Switch over favicon 2023-05-16 21:07:06 -07:00
Yuhong Sun
795828180e DAN-87 Need to warn user against using docker compose down (#52) 2023-05-16 20:25:16 -07:00
Chris Weaver
494514dc68 Quote loading UI + adding back period to end of answer + adding custom logo (#55)
* Logo

* Add spinners + some small housekeeping on the backend
2023-05-16 20:14:06 -07:00
Weves
821df50fa9 Make slack periodic use the DB 2023-05-16 17:56:15 -07:00
Weves
5ce5077833 Fix doc count 2023-05-16 17:00:22 -07:00
Weves
17ed660166 Add general status page + standardize the experience a bit 2023-05-16 01:19:27 -07:00
Yuhong Sun
0d9595733b DAN-80 Example Env files (#48)
Also added alembic migrations running automatically
2023-05-16 01:18:08 -07:00
Weves
d76dbce09b Add Google Drive admin page 2023-05-15 17:41:37 -07:00
Yuhong Sun
ebf9459ae8 Update README.md 2023-05-15 12:55:52 -07:00
Yuhong Sun
266f75be24 Update README.md 2023-05-15 12:54:01 -07:00
Yuhong Sun
17544e5b40 DAN-86 Github Connector should not return API page (#49) 2023-05-15 11:19:24 -07:00
Weves
5c98310b79 Add Github admin page + adjust way index APIs work 2023-05-14 12:45:47 -07:00
Yuhong Sun
3d1fffb38b Update README.md 2023-05-14 01:26:26 -07:00
Yuhong Sun
20884bbb47 DAN-79 Update Logo Colors to Blue Red 2023-05-14 00:10:47 -07:00
Yuhong Sun
5cb6cdb152 Update Logo Colors 2023-05-14 00:07:09 -07:00
Yuhong Sun
dc4fc02ba5 DAN-60 Add streaming for chat model (#46) 2023-05-13 23:05:06 -07:00
Yuhong Sun
17bc0f89ff DAN-56 Make google drive connector production ready (#45) 2023-05-13 23:04:16 -07:00
Weves
b2cde3e4bb Add redirects for unauthenticated users 2023-05-13 17:56:02 -07:00
Weves
c68220103d Split chunks up by newline 2023-05-13 17:08:53 -07:00
Weves
b825b39763 Add link to connectors page 2023-05-13 16:10:45 -07:00
Weves
288f43111e Use HTTP/1.1 with nginx for chunked transfers 2023-05-13 14:08:14 -07:00
Yuhong Sun
fb9c3e530b DAN-58 Email validation (#39) 2023-05-13 14:05:21 -07:00
Yuhong Sun
6ed86ed369 DAN-74 Admin routers need permission (#40) 2023-05-13 13:47:49 -07:00
Yuhong Sun
6d1b750077 DAN-59 Fix all the mypy issues (#38) 2023-05-12 20:37:52 -07:00
Yuhong Sun
5af35cf07c DAN-57 Make qa / admin endpoints permissioned optionally (#37) 2023-05-12 20:34:30 -07:00
Weves
090578f1f3 Fix backend Dockerfiles 2023-05-12 16:52:04 -07:00
Weves
8de65a6536 Add streaming 2023-05-12 16:35:42 -07:00
Yuhong Sun
ae2a1d3121 DAN-71 Give back all ranked results from search (#34) 2023-05-12 15:47:58 -07:00
Weves
66130c8845 Initial document display 2023-05-12 14:38:29 -07:00
Yuhong Sun
6028523198 Minor touchup 2023-05-11 23:08:53 -07:00
Chris Weaver
da8031c1aa Fix sync DB engine (#32) 2023-05-11 22:49:44 -07:00
Yuhong Sun
20b25e322f DAN-23 Stream model output (#30) 2023-05-11 22:49:26 -07:00
Weves
c6a0baed13 Initial login flow 2023-05-11 22:33:20 -07:00
Weves
560822a327 cleanup connector interface 2023-05-11 20:10:23 -07:00
Yuhong Sun
0b610502e0 DAN-54 Github PR Connector (#29)
also fixed some mypy stuff as well
2023-05-11 18:47:32 -07:00
Yuhong Sun
279c5e0eb1 DAN-50 References should include blurb (#26) 2023-05-10 21:03:15 -07:00
Yuhong Sun
38bcb3ee6b DAN-51 Model warm up on start (#27)
Also added a minor prompt update
2023-05-10 21:01:14 -07:00
Yuhong Sun
632a643b7a DAN-52 Check user role endpoint (#28) 2023-05-10 20:48:59 -07:00
Yuhong Sun
73d83b648e Make logo workable in dark mode 2023-05-10 18:21:09 -07:00
Chris Weaver
911fcf4dd1 Adding initial admin pages (#25)
* Adding sidebar to admin page + adding scaffolding for Web connector + a little styling

* Rename APIs

* Restyling
2023-05-10 17:45:39 -07:00
Yuhong Sun
25b59217ef DAN-25 Semantic Identifier for Documents (#24) 2023-05-09 22:46:45 -07:00
Yuhong Sun
6e59b02c91 DAN-47 Use only MIT Apache models (#23)
Update this repo license as well as model attribution
2023-05-09 21:01:57 -07:00
Yuhong Sun
312366eae1 DAN-45 Fix OAuth User Creation Flow (#22) 2023-05-09 20:45:43 -07:00
Yuhong Sun
e896d0786e DAN 17 QOL Model Output (#20) 2023-05-08 20:57:52 -07:00
Weves
e8bf6b0364 Fix time_updated default 😓 2023-05-08 12:57:51 -07:00
Yuhong Sun
df35a5352b Update Logo 2023-05-07 20:52:48 -07:00
Weves
ab36ebc332 Add QDrant to onebox setup 2023-05-07 17:36:29 -07:00
Yuhong Sun
9babe7fb95 DAN-40 Admin User Support (#18) 2023-05-07 16:30:49 -07:00
Weves
5aabc5abe6 Add arbitrary domain support + website hosting to onebox setup 2023-05-07 00:38:27 -07:00
Yuhong Sun
e20179048d DAN-5 OAuth Backend (#17)
Also added in an exception handler for logging
2023-05-06 23:47:21 -07:00
Weves
4f4c65acac Fix icons + add retries for indexing 2023-05-06 23:35:34 -07:00
Yuhong Sun
63f93594a3 DAN-3 Authentication and User Registration (#12)
also added mypy.ini
2023-05-05 17:25:24 -07:00
Chris Weaver
e2a949ebaf Setup Postgres to docker compose + add web indexing APIs + update background runner to look for web indices to run (#13)
* Adding Postgres to docker compose

* Model / migrations for indexing logs
2023-05-05 13:08:32 -07:00
Yuhong Sun
22b7f7e89f DAN-26 Enable GPT 4 through chat completion endpoint (#10)
Also touched up front page README which had a typo
2023-05-02 12:08:44 -07:00
Yuhong Sun
c00d37a7d7 DAN-21 Modularize QA model for easy swapping (#9) 2023-05-01 23:12:47 -07:00
Yuhong Sun
e7b901f292 DAN-2 Backend Support for Filters (#8)
Additionally added an __init__.py for mypy issue
2023-05-01 22:29:09 -07:00
Yuhong Sun
02a6677e21 DAN-1 Dedupe index (#6) 2023-05-01 18:11:16 -07:00
Weves
213f29fde5 Fix tiny icons for sources 2023-05-01 00:30:53 -07:00
Weves
4e96283a0b Style changes to the main app page 2023-04-30 01:12:08 -07:00
Chris Weaver
e390906ac1 Make Document source required (#4) 2023-04-29 16:49:27 -07:00
Yuhong Sun
f2d3d8269d Prompt Tuning and minor QOA changes (#2) 2023-04-29 15:22:16 -07:00
Chris Weaver
f1936fb755 Background update improvements (#3)
* Adding full indexing to background loading

* Add oldest/latest support for slack pull
2023-04-29 15:15:26 -07:00
Weves
ed8fe75dd3 Fix Dockerfiles 2023-04-29 13:50:49 -07:00
Yuhong Sun
2368166cd1 Update requirements default 2023-04-29 01:23:34 -07:00
Yuhong Sun
402b89b6ec Initial Commit 2023-04-28 22:40:46 -07:00
Yuhong Sun
751a8ea69e Update README.md 2023-04-28 21:54:35 -07:00
Yuhong Sun
49d0bb29ef Initial README 2023-04-27 20:25:39 -07:00
578 changed files with 64903 additions and 661 deletions


@@ -0,0 +1,15 @@
name: Sweep Issue
title: 'Sweep: '
description: For small bugs, features, refactors, and tests to be handled by Sweep, an AI-powered junior developer.
labels: sweep
body:
  - type: textarea
    id: description
    attributes:
      label: Details
      description: Tell Sweep where and what to edit and provide enough context for a new developer to the codebase
      placeholder: |
        Unit Tests: Write unit tests for <FILE>. Test each function in the file. Make sure to test edge cases.
        Bugs: The bug might be in <FILE>. Here are the logs: ...
        Features: the new endpoint should use the ... class from <FILE> because it contains ... logic.
        Refactors: We are migrating this function to ... version because ...


@@ -0,0 +1,42 @@
name: Build and Push Backend Image on Tag
on:
  push:
    tags:
      - '*'
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Login to Docker Hub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_TOKEN }}
      - name: Backend Image Docker Build and Push
        uses: docker/build-push-action@v2
        with:
          context: ./backend
          file: ./backend/Dockerfile
          platforms: linux/amd64,linux/arm64
          push: true
          tags: |
            danswer/danswer-backend:${{ github.ref_name }}
            danswer/danswer-backend:latest
          build-args: |
            DANSWER_VERSION=${{ github.ref_name }}
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: docker.io/danswer/danswer-backend:${{ github.ref_name }}
          severity: 'CRITICAL,HIGH'


@@ -0,0 +1,42 @@
name: Build and Push Model Server Image on Tag
on:
  push:
    tags:
      - '*'
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Login to Docker Hub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_TOKEN }}
      - name: Model Server Image Docker Build and Push
        uses: docker/build-push-action@v2
        with:
          context: ./backend
          file: ./backend/Dockerfile.model_server
          platforms: linux/amd64,linux/arm64
          push: true
          tags: |
            danswer/danswer-model-server:${{ github.ref_name }}
            danswer/danswer-model-server:latest
          build-args: |
            DANSWER_VERSION=${{ github.ref_name }}
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: docker.io/danswer/danswer-model-server:${{ github.ref_name }}
          severity: 'CRITICAL,HIGH'


@@ -0,0 +1,42 @@
name: Build and Push Web Image on Tag
on:
  push:
    tags:
      - '*'
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Login to Docker Hub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_TOKEN }}
      - name: Web Image Docker Build and Push
        uses: docker/build-push-action@v2
        with:
          context: ./web
          file: ./web/Dockerfile
          platforms: linux/amd64,linux/arm64
          push: true
          tags: |
            danswer/danswer-web-server:${{ github.ref_name }}
            danswer/danswer-web-server:latest
          build-args: |
            DANSWER_VERSION=${{ github.ref_name }}
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: docker.io/danswer/danswer-web-server:${{ github.ref_name }}
          severity: 'CRITICAL,HIGH'
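Taken together, the three workflows above build and push the backend, model server, and web images to DockerHub whenever a tag is pushed. A minimal sketch of cutting a release that would trigger them, assuming you have push access to the repo; the version string `v0.0.1` is only illustrative:

```bash
# Create an annotated tag (the version string here is just an example)
git tag -a v0.0.1 -m "Release v0.0.1"

# Pushing the tag is what triggers the three build-and-push workflows
git push origin v0.0.1
```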

.github/workflows/docker-tag-latest.yml

@@ -0,0 +1,32 @@
name: Tag Latest Version
on:
  workflow_dispatch:
    inputs:
      version:
        description: 'The version (ie v0.0.1) to tag as latest'
        required: true
jobs:
  tag:
    runs-on: ubuntu-latest
    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Login to Docker Hub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_TOKEN }}
      - name: Enable Docker CLI experimental features
        run: echo "DOCKER_CLI_EXPERIMENTAL=enabled" >> $GITHUB_ENV
      - name: Pull, Tag and Push Web Server Image
        run: |
          docker buildx imagetools create -t danswer/danswer-web-server:latest danswer/danswer-web-server:${{ github.event.inputs.version }}
      - name: Pull, Tag and Push API Server Image
        run: |
          docker buildx imagetools create -t danswer/danswer-backend:latest danswer/danswer-backend:${{ github.event.inputs.version }}
.github/workflows/pr-python-checks.yml

@@ -0,0 +1,46 @@
name: Python Checks
on:
  pull_request:
    branches: [ main ]
jobs:
  mypy-check:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
          cache: 'pip'
          cache-dependency-path: |
            backend/requirements/default.txt
            backend/requirements/dev.txt
      - run: |
          python -m pip install --upgrade pip
          pip install -r backend/requirements/default.txt
          pip install -r backend/requirements/dev.txt
      - name: Run MyPy
        run: |
          cd backend
          mypy .
      - name: Run ruff
        run: |
          cd backend
          ruff .
      - name: Check import order with reorder-python-imports
        run: |
          cd backend
          find ./danswer -name "*.py" | xargs reorder-python-imports --py311-plus
      - name: Check code formatting with Black
        run: |
          cd backend
          black --check .
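The same checks can be run locally before opening a PR. A rough equivalent of the workflow steps above, assuming the backend requirements (default and dev) are already installed in your virtual environment:

```bash
cd backend

# Static type checking
mypy .

# Linting
ruff .

# Import ordering (same invocation the workflow uses)
find ./danswer -name "*.py" | xargs reorder-python-imports --py311-plus

# Formatting check only; drop --check to apply fixes
black --check .
```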

.gitignore

@@ -0,0 +1,4 @@
.env
.DS_store
.venv
.mypy_cache

.vscode/launch.template.jsonc

@@ -0,0 +1,82 @@
/*
  Copy this file into '.vscode/launch.json' or merge its
  contents into your existing configurations.
*/
{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "name": "API Server",
      "type": "python",
      "request": "launch",
      "module": "uvicorn",
      "cwd": "${workspaceFolder}/backend",
      "env": {
        "LOG_LEVEL": "DEBUG",
        "DISABLE_AUTH": "True",
        "TYPESENSE_API_KEY": "typesense_api_key",
        "DYNAMIC_CONFIG_DIR_PATH": "./dynamic_config_storage"
      },
      "args": [
        "danswer.main:app",
        "--reload",
        "--port",
        "8080"
      ]
    },
    {
      "name": "Indexer",
      "type": "python",
      "request": "launch",
      "program": "danswer/background/update.py",
      "cwd": "${workspaceFolder}/backend",
      "env": {
        "LOG_LEVEL": "DEBUG",
        "PYTHONPATH": ".",
        "TYPESENSE_API_KEY": "typesense_api_key",
        "DYNAMIC_CONFIG_DIR_PATH": "./dynamic_config_storage"
      }
    },
    {
      "name": "Temp File Deletion",
      "type": "python",
      "request": "launch",
      "program": "danswer/background/file_deletion.py",
      "cwd": "${workspaceFolder}/backend",
      "env": {
        "LOG_LEVEL": "DEBUG",
        "PYTHONPATH": "${workspaceFolder}/backend"
      }
    },
    // For the listener to access the Slack API,
    // DANSWER_BOT_SLACK_APP_TOKEN & DANSWER_BOT_SLACK_BOT_TOKEN need to be set in a .env file located in the root of the project
    {
      "name": "Slack Bot Listener",
      "type": "python",
      "request": "launch",
      "program": "danswer/listeners/slack_listener.py",
      "cwd": "${workspaceFolder}/backend",
      "envFile": "${workspaceFolder}/.env",
      "env": {
        "LOG_LEVEL": "DEBUG"
      }
    },
    {
      "name": "Web Server",
      "type": "node",
      "request": "launch",
      "cwd": "${workspaceRoot}/web",
      "runtimeExecutable": "npm",
      "runtimeArgs": [
        "run", "dev"
      ],
      "console": "integratedTerminal"
    }
  ]
}
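As the comment at the top of the template notes, VS Code only picks up these configurations from `.vscode/launch.json`. A simple way to adopt the template, assuming you are at the repo root and have no existing launch.json to merge with:

```bash
mkdir -p .vscode
cp .vscode/launch.template.jsonc .vscode/launch.json
```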

CONTRIBUTING.md

@@ -0,0 +1,192 @@
<!-- DANSWER_METADATA={"link": "https://github.com/danswer-ai/danswer/blob/main/CONTRIBUTING.md"} -->
# Contributing to Danswer
Hey there! We are so excited that you're interested in Danswer.
As an open source project in a rapidly changing space, we welcome all contributions.
## 💃 Guidelines
### Contribution Opportunities
The [GitHub Issues](https://github.com/danswer-ai/danswer/issues) page is a great place to start for contribution ideas.
Issues that have been explicitly approved by the maintainers (aligned with the direction of the project)
will be marked with the `approved by maintainers` label.
Issues marked `good first issue` are an especially great place to start.
**Connectors** to other tools are another great place to contribute. For details on how, refer to this
[README.md](https://github.com/danswer-ai/danswer/blob/main/backend/danswer/connectors/README.md).
If you have a new/different contribution in mind, we'd love to hear about it!
Your input is vital to making sure that Danswer moves in the right direction.
Before starting on implementation, please raise a GitHub issue.
And always feel free to message us (Chris Weaver / Yuhong Sun) on
[Slack](https://join.slack.com/t/danswer/shared_invite/zt-1u3h3ke3b-VGh1idW19R8oiNRiKBYv2w) /
[Discord](https://discord.gg/TDJ59cGV2X) directly about anything at all.
### Contributing Code
To contribute to this project, please follow the
["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow.
When opening a pull request, mention related issues and feel free to tag relevant maintainers.
Before creating a pull request please make sure that the new changes conform to the formatting and linting requirements.
See the [Formatting and Linting](#-formatting-and-linting) section for how to run these checks locally.
### Getting Help 🙋
Our goal is to make contributing as easy as possible. If you run into any issues please don't hesitate to reach out.
That way, we can help future contributors and users avoid the same issue.
We also have support channels and generally interesting discussions on our
[Slack](https://join.slack.com/t/danswer/shared_invite/zt-1u3h3ke3b-VGh1idW19R8oiNRiKBYv2w)
and
[Discord](https://discord.gg/TDJ59cGV2X).
We would love to see you there!
## Get Started 🚀
Danswer, being a fully functional app, relies on some external pieces of software, specifically:
- [Postgres](https://www.postgresql.org/) (Relational DB)
- [Vespa](https://vespa.ai/) (Vector DB/Search Engine)
This guide provides instructions to set up the Danswer-specific services outside of Docker, since that is easier for development purposes.
You can also just use the containers and apply local changes by providing the `--build` flag.
### Local Set Up
It is recommended to use Python versions >= 3.11.
This guide skips setting up User Authentication for the sake of simplicity.
#### Installing Requirements
Currently, we use pip and recommend creating a virtual environment.
For convenience here's a command for it:
```bash
python -m venv .venv
source .venv/bin/activate
```
_For Windows activate via:_
```bash
.venv\Scripts\activate
```
Install the required python dependencies:
```bash
pip install -r danswer/backend/requirements/default.txt
pip install -r danswer/backend/requirements/dev.txt
```
Install [Node.js and npm](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm) for the frontend.
Once the above is done, navigate to `danswer/web` run:
```bash
npm i
```
Install Playwright (required by the Web Connector)
> Note: If you have just done the pip install, open a new terminal and source the python virtual-env again.
This will update the path to include Playwright.
Then install Playwright by running:
```bash
playwright install
```
#### Dependent Docker Containers
First navigate to `danswer/deployment/docker_compose`, then start up Vespa and Postgres with:
```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d index relational_db
```
(index refers to Vespa and relational_db refers to Postgres)
#### Running Danswer
Setup a folder to store config. Navigate to `danswer/backend` and run:
```bash
mkdir dynamic_config_storage
```
To start the frontend, navigate to `danswer/web` and run:
```bash
npm run dev
```
Package the Vespa schema. This will only need to be done when the Vespa schema is updated locally.
Navigate to `danswer/backend/danswer/document_index/vespa/app_config` and run:
```bash
zip -r ../vespa-app.zip .
```
- Note: If you don't have the `zip` utility, you will need to install it prior to running the above
The first time running Danswer, you will also need to run the DB migrations for Postgres.
After the first time, this is no longer required unless the DB models change.
Navigate to `danswer/backend` and with the venv active, run:
```bash
alembic upgrade head
```
Next, start the task queue which orchestrates the background jobs.
Jobs that take more time are run async from the API server.
Still in `danswer/backend`, run:
```bash
python ./scripts/dev_run_background_jobs.py
```
To run the backend API server, navigate back to `danswer/backend` and run:
```bash
AUTH_TYPE=disabled \
DYNAMIC_CONFIG_DIR_PATH=./dynamic_config_storage \
VESPA_DEPLOYMENT_ZIP=./danswer/document_index/vespa/vespa-app.zip \
uvicorn danswer.main:app --reload --port 8080
```
_For Windows (for compatibility with both PowerShell and Command Prompt):_
```bash
powershell -Command "
$env:AUTH_TYPE='disabled'
$env:DYNAMIC_CONFIG_DIR_PATH='./dynamic_config_storage'
$env:VESPA_DEPLOYMENT_ZIP='./danswer/document_index/vespa/vespa-app.zip'
uvicorn danswer.main:app --reload --port 8080
"
```
Note: if you need finer logging, add the additional environment variable `LOG_LEVEL=DEBUG` to the relevant services.
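For example, to run the API server with debug logging, the variable can simply be prepended to the command above (a sketch; the other environment variables stay the same):

```bash
LOG_LEVEL=DEBUG \
AUTH_TYPE=disabled \
DYNAMIC_CONFIG_DIR_PATH=./dynamic_config_storage \
VESPA_DEPLOYMENT_ZIP=./danswer/document_index/vespa/vespa-app.zip \
uvicorn danswer.main:app --reload --port 8080
```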
### Formatting and Linting
#### Backend
For the backend, you'll need to setup pre-commit hooks (black / reorder-python-imports).
First, install pre-commit (if you don't have it already) following the instructions
[here](https://pre-commit.com/#installation).
Then, from the `danswer/backend` directory, run:
```bash
pre-commit install
```
Additionally, we use `mypy` for static type checking.
Danswer is fully type-annotated, and we would like to keep it that way!
Right now there is no automated type checking (coming soon), but we ask you to run it manually before
creating a pull request with `python -m mypy .` from the `danswer/backend` directory.
#### Web
We use `prettier` for formatting. The desired version (2.8.8) will be installed via `npm i` from the `danswer/web` directory.
To run the formatter, use `npx prettier --write .` from the `danswer/web` directory.
Like `mypy`, we have no automated formatting yet (coming soon), but we request that, for now,
you run this manually before creating a pull request.
### Release Process
Danswer follows the semver versioning standard.
A set of Docker containers will be pushed automatically to DockerHub with every tag.
You can see the containers [here](https://hub.docker.com/search?q=danswer%2F).
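Once a tagged release has been built, the published images can be pulled straight from DockerHub, for example (image names as used in the build workflows; the tag value is illustrative):

```bash
docker pull danswer/danswer-backend:v0.0.1
docker pull danswer/danswer-web-server:v0.0.1
docker pull danswer/danswer-model-server:v0.0.1
```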
Since Danswer is pre-1.0 software, even patch releases may contain breaking or non-backwards-compatible changes.

LICENSE

@@ -1,661 +1,21 @@
GNU AFFERO GENERAL PUBLIC LICENSE
Version 3, 19 November 2007
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU Affero General Public License is a free, copyleft license for
software and other kinds of works, specifically designed to ensure
cooperation with the community in the case of network server software.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
our General Public Licenses are intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
Developers that use our General Public Licenses protect your rights
with two steps: (1) assert copyright on the software, and (2) offer
you this License which gives you legal permission to copy, distribute
and/or modify the software.
A secondary benefit of defending all users' freedom is that
improvements made in alternate versions of the program, if they
receive widespread use, become available for other developers to
incorporate. Many developers of free software are heartened and
encouraged by the resulting cooperation. However, in the case of
software used on network servers, this result may fail to come about.
The GNU General Public License permits making a modified version and
letting the public access it on a server without ever releasing its
source code to the public.
The GNU Affero General Public License is designed specifically to
ensure that, in such cases, the modified source code becomes available
to the community. It requires the operator of a network server to
provide the source code of the modified version running there to the
users of that server. Therefore, public use of a modified version, on
a publicly accessible server, gives the public access to the source
code of the modified version.
An older license, called the Affero General Public License and
published by Affero, was designed to accomplish similar goals. This is
a different license, not a version of the Affero GPL, but Affero has
released a new version of the Affero GPL which permits relicensing under
this license.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU Affero General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Remote Network Interaction; Use with the GNU General Public License.
Notwithstanding any other provision of this License, if you modify the
Program, your modified version must prominently offer all users
interacting with it remotely through a computer network (if your version
supports such interaction) an opportunity to receive the Corresponding
Source of your version by providing access to the Corresponding Source
from a network server at no charge, through some standard or customary
means of facilitating copying of software. This Corresponding Source
shall include the Corresponding Source for any work covered by version 3
of the GNU General Public License that is incorporated pursuant to the
following paragraph.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the work with which it is combined will remain governed by version
3 of the GNU General Public License.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU Affero General Public License from time to time. Such new versions
will be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU Affero General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU Affero General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU Affero General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If your software can interact with users remotely through a computer
network, you should also make sure that it provides a way for users to
get its source. For example, if your program is a web application, its
interface could display a "Source" link that leads users to an archive
of the code. There are many ways you could offer source, and different
solutions will be better for different programs; see section 13 for the
specific requirements.
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU AGPL, see
<https://www.gnu.org/licenses/>.
MIT License
Copyright (c) 2023 Yuhong Sun, Chris Weaver
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

93
README.md Normal file
View File

@@ -0,0 +1,93 @@
<!-- DANSWER_METADATA={"link": "https://github.com/danswer-ai/danswer/blob/main/README.md"} -->
<h2 align="center">
<a href="https://www.danswer.ai/"> <img width="50%" src="https://github.com/danswer-owners/danswer/blob/1fabd9372d66cd54238847197c33f091a724803b/DanswerWithName.png?raw=true)" /></a>
</h2>
<p align="center">
<p align="center">OpenSource Enterprise Question-Answering</p>
<p align="center">
<a href="https://docs.danswer.dev/" target="_blank">
<img src="https://img.shields.io/badge/docs-view-blue" alt="Documentation">
</a>
<a href="https://join.slack.com/t/danswer/shared_invite/zt-1u3h3ke3b-VGh1idW19R8oiNRiKBYv2w" target="_blank">
<img src="https://img.shields.io/badge/slack-join-blue.svg?logo=slack" alt="Slack">
</a>
<a href="https://discord.gg/TDJ59cGV2X" target="_blank">
<img src="https://img.shields.io/badge/discord-join-blue.svg?logo=discord&logoColor=white" alt="Discord">
</a>
<a href="https://github.com/danswer-ai/danswer/blob/main/README.md" target="_blank">
<img src="https://img.shields.io/static/v1?label=license&message=MIT&color=blue" alt="License">
</a>
</p>
<strong>[Danswer](https://www.danswer.ai/)</strong> allows you to ask natural language questions against internal documents and get back reliable answers backed by quotes and references from the source material, so you can always trust what you get back. You can connect it to a number of common tools such as Slack, GitHub, and Confluence, among others.
<h3>Usage</h3>
Danswer provides a fully-featured web UI:
https://github.com/danswer-ai/danswer/assets/32520769/563be14c-9304-47b5-bf0a-9049c2b6f410
Or, if you prefer, you can plug Danswer into your existing Slack workflows (more integrations to come 😁):
https://github.com/danswer-ai/danswer/assets/25087905/3e19739b-d178-4371-9a38-011430bdec1b
For more details on the admin controls, check out our <strong><a href="https://www.youtube.com/watch?v=geNzY1nbCnU">Full Video Demo</a></strong>!
<h3>Deployment</h3>
Danswer can easily be tested locally or deployed on a virtual machine with a single `docker compose` command. Check out our [docs](https://docs.danswer.dev/quickstart) to learn more.
We also have built-in support for deployment on Kubernetes. Files for that can be found [here](https://github.com/danswer-ai/danswer/tree/main/deployment/kubernetes).
## 💃 Features
* Direct QA + Chat powered by Generative AI models with answers backed by quotes and source links.
* Intelligent Document Retrieval (Hybrid Search + Reranking) using the latest NLP models.
* Automatic time/source filter extraction from natural language + custom model to identify user intent.
* User authentication and document level access management.
* Support for LLMs of your choice (GPT-4, Mixtral, Llama 2, etc.)
* Management dashboards to configure connectors and set up features such as live update fetching.
* One line Docker Compose (or Kubernetes) deployment to host Danswer anywhere.
## 🔌 Connectors
Efficiently pulls the latest changes from:
* Slack
* GitHub
* Google Drive
* Confluence
* Jira
* Zendesk
* Notion
* Gong
* Slab
* Linear
* Productboard
* Guru
* Zulip
* Bookstack
* Document360
* Request Tracker
* Hubspot
* Local Files
* Websites
* With more to come...
## 🚧 Roadmap
* Organizational understanding.
* Ability to locate and suggest experts from your team.
* Code Search
* Structured Query Languages (SQL, Excel formulas, etc.)
## 💡 Contributing
Looking to contribute? Please check out the [Contribution Guide](CONTRIBUTING.md) for more details.

17
backend/.dockerignore Normal file
View File

@@ -0,0 +1,17 @@
**/__pycache__
venv/
env/
*.egg-info
.cache
.git/
.svn/
.vscode/
.idea/
*.log
log/
.env
secrets.yaml
build/
dist/
.coverage
htmlcov/

10
backend/.gitignore vendored Normal file
View File

@@ -0,0 +1,10 @@
__pycache__/
.mypy_cache
.idea/
site_crawls/
.ipynb_checkpoints/
api_keys.py
*ipynb
.env
vespa-app.zip
dynamic_config_storage/

View File

@@ -0,0 +1,58 @@
repos:
- repo: https://github.com/psf/black
rev: 23.3.0
hooks:
- id: black
language_version: python3.11
- repo: https://github.com/asottile/reorder_python_imports
rev: v3.9.0
hooks:
- id: reorder-python-imports
args: ['--py311-plus', '--application-directories=backend/']
# need to ignore alembic files, since reorder-python-imports gets confused
# and thinks that alembic is a local package since there is a folder
# in the backend directory called `alembic`
exclude: ^backend/alembic/
# These settings will remove unused imports with side effects
# Note: The repo currently does not and should not have imports with side effects
- repo: https://github.com/PyCQA/autoflake
rev: v2.2.0
hooks:
- id: autoflake
args: [ '--remove-all-unused-imports', '--remove-unused-variables', '--in-place' , '--recursive']
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.0.286
hooks:
- id: ruff
# We would like to have a mypy pre-commit hook, but due to the fact that
# pre-commit runs in its own isolated environment, we would need to install
# and keep in sync all dependencies so mypy has access to the appropriate type
# stubs. This does not seem worth it at the moment, so for now we will stick to
# having mypy run via Github Actions / manually by contributors
# - repo: https://github.com/pre-commit/mirrors-mypy
# rev: v1.1.1
# hooks:
# - id: mypy
# exclude: ^tests/
# # below are needed for type stubs since pre-commit runs in its own
# # isolated environment. Unfortunately, this needs to be kept in sync
# # with requirements/dev.txt + requirements/default.txt
# additional_dependencies: [
# alembic==1.10.4,
# types-beautifulsoup4==4.12.0.3,
# types-html5lib==1.1.11.13,
# types-oauthlib==3.2.0.9,
# types-psycopg2==2.9.21.10,
# types-python-dateutil==2.8.19.13,
# types-regex==2023.3.23.1,
# types-requests==2.28.11.17,
# types-retry==0.9.9.3,
# types-urllib3==1.26.25.11
# ]
# # TODO: add back once errors are addressed
# # args: [--strict]

54
backend/Dockerfile Normal file
View File

@@ -0,0 +1,54 @@
FROM python:3.11.7-slim-bookworm
# Default DANSWER_VERSION, typically overridden during builds by GitHub Actions.
ARG DANSWER_VERSION=0.3-dev
ENV DANSWER_VERSION=${DANSWER_VERSION}
RUN echo "DANSWER_VERSION: ${DANSWER_VERSION}"
# Install system dependencies
# cmake needed for psycopg (postgres)
# libpq-dev needed for psycopg (postgres)
# curl included just for users' convenience
# zip for Vespa step further down
# ca-certificates for HTTPS
RUN apt-get update && \
apt-get install -y cmake curl zip ca-certificates && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean
# Install Python dependencies
# Remove py, which is pulled in by retry; py is not needed and has a known CVE
COPY ./requirements/default.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /tmp/requirements.txt && \
pip uninstall -y py && \
playwright install chromium && playwright install-deps chromium && \
ln -s /usr/local/bin/supervisord /usr/bin/supervisord
# Cleanup for CVEs and size reduction
# https://github.com/tornadoweb/tornado/issues/3107
# xserver-common and xvfb are pulled in by the playwright installation but not needed afterwards
# perl-base is part of the base Python Debian image but not needed for Danswer functionality
# perl-base could only be removed with --allow-remove-essential
RUN apt-get remove -y --allow-remove-essential perl-base xserver-common xvfb cmake libldap-2.5-0 && \
apt-get autoremove -y && \
rm -rf /var/lib/apt/lists/* && \
rm /usr/local/lib/python3.11/site-packages/tornado/test/test.key
# Set up application files
WORKDIR /app
COPY ./danswer /app/danswer
COPY ./shared_models /app/shared_models
COPY ./alembic /app/alembic
COPY ./alembic.ini /app/alembic.ini
COPY supervisord.conf /usr/etc/supervisord.conf
# Create Vespa app zip
WORKDIR /app/danswer/document_index/vespa/app_config
RUN zip -r /app/danswer/vespa-app.zip .
WORKDIR /app
ENV PYTHONPATH /app
# Default command which does nothing
# This container is used by api server and background which specify their own CMD
CMD ["tail", "-f", "/dev/null"]

View File

@@ -0,0 +1,39 @@
FROM python:3.11.7-slim-bookworm
# Default DANSWER_VERSION, typically overridden during builds by GitHub Actions.
ARG DANSWER_VERSION=0.3-dev
ENV DANSWER_VERSION=${DANSWER_VERSION}
RUN echo "DANSWER_VERSION: ${DANSWER_VERSION}"
COPY ./requirements/model_server.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /tmp/requirements.txt
RUN apt-get remove -y --allow-remove-essential perl-base && \
apt-get autoremove -y
WORKDIR /app
# Needed for model configs and defaults
COPY ./danswer/configs /app/danswer/configs
COPY ./danswer/dynamic_configs /app/danswer/dynamic_configs
# Utils used by model server
COPY ./danswer/utils/logger.py /app/danswer/utils/logger.py
COPY ./danswer/utils/timing.py /app/danswer/utils/timing.py
COPY ./danswer/utils/telemetry.py /app/danswer/utils/telemetry.py
# Place to fetch version information
COPY ./danswer/__init__.py /app/danswer/__init__.py
# Shared implementations for running NLP models locally
COPY ./danswer/search/search_nlp_models.py /app/danswer/search/search_nlp_models.py
# Request/Response models
COPY ./shared_models /app/shared_models
# Model Server main code
COPY ./model_server /app/model_server
ENV PYTHONPATH /app
CMD ["uvicorn", "model_server.main:app", "--host", "0.0.0.0", "--port", "9000"]

108
backend/alembic.ini Normal file
View File

@@ -0,0 +1,108 @@
# A generic, single database configuration.
[alembic]
# path to migration scripts
script_location = alembic
# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
# Uncomment the line below if you want the files to be prepended with date and time
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s
# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory.
prepend_sys_path = .
# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the python-dateutil library that can be
# installed by adding `alembic[tz]` to the pip requirements
# string value is passed to dateutil.tz.gettz()
# leave blank for localtime
# timezone =
# max length of characters to apply to the
# "slug" field
# truncate_slug_length = 40
# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false
# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false
# version location specification; This defaults
# to alembic/versions. When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "version_path_separator" below.
# version_locations = %(here)s/bar:%(here)s/bat:alembic/versions
# version path separator; As mentioned above, this is the character used to split
# version_locations. The default within new alembic.ini files is "os", which uses os.pathsep.
# If this key is omitted entirely, it falls back to the legacy behavior of splitting on spaces and/or commas.
# Valid values for version_path_separator are:
#
# version_path_separator = :
# version_path_separator = ;
# version_path_separator = space
version_path_separator = os # Use os.pathsep. Default configuration used for new projects.
# set to 'true' to search source files recursively
# in each "version_locations" directory
# new in Alembic version 1.10
# recursive_version_locations = false
# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8
# sqlalchemy.url = driver://user:pass@localhost/dbname
[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts. See the documentation for further
# detail and examples
# format using "black" - use the console_scripts runner, against the "black" entrypoint
hooks = black
black.type = console_scripts
black.entrypoint = black
black.options = -l 79 REVISION_SCRIPT_FILENAME
# Logging configuration
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARN
handlers = console
qualname =
[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S

19
backend/alembic/README.md Normal file
View File

@@ -0,0 +1,19 @@
<!-- DANSWER_METADATA={"link": "https://github.com/danswer-ai/danswer/blob/main/backend/alembic/README.md"} -->
# Alembic DB Migrations
These files are for creating/updating the tables in the Relational DB (Postgres).
Danswer migrations use a generic single-database configuration with an async dbapi.
## To generate new migrations:
run from danswer/backend:
`alembic revision --autogenerate -m <DESCRIPTION_OF_MIGRATION>`
More info can be found here: https://alembic.sqlalchemy.org/en/latest/autogenerate.html
## Running migrations
To run all un-applied migrations:
`alembic upgrade head`
To undo migrations:
`alembic downgrade -X`
where X is the number of migrations you want to undo from the current state
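If you ever need to drive migrations from Python rather than the CLI (for example from a setup script), Alembic's `command` API covers the same operations. A minimal sketch, assuming it is run from `danswer/backend` next to `alembic.ini`:
from alembic import command
from alembic.config import Config
# Load the same alembic.ini that the CLI uses
alembic_cfg = Config("alembic.ini")
# Apply all un-applied migrations (equivalent to `alembic upgrade head`)
command.upgrade(alembic_cfg, "head")
# Undo the most recent migration (equivalent to `alembic downgrade -1`)
command.downgrade(alembic_cfg, "-1")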

90
backend/alembic/env.py Normal file
View File

@@ -0,0 +1,90 @@
import asyncio
from logging.config import fileConfig
from alembic import context
from danswer.db.engine import build_connection_string
from danswer.db.models import Base
from sqlalchemy import pool
from sqlalchemy.engine import Connection
from sqlalchemy.ext.asyncio import create_async_engine
from celery.backends.database.session import ResultModelBase # type: ignore
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config
# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
fileConfig(config.config_file_name)
# add your model's MetaData object here
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
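# Celery's result-backend tables are included alongside Danswer's own models so
# that autogenerate treats them as part of the expected schema rather than
# flagging them for removal.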
target_metadata = [Base.metadata, ResultModelBase.metadata]
# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.
def run_migrations_offline() -> None:
"""Run migrations in 'offline' mode.
This configures the context with just a URL
and not an Engine, though an Engine is acceptable
here as well. By skipping the Engine creation
we don't even need a DBAPI to be available.
Calls to context.execute() here emit the given string to the
script output.
"""
url = build_connection_string()
context.configure(
url=url,
target_metadata=target_metadata, # type: ignore
literal_binds=True,
dialect_opts={"paramstyle": "named"},
)
with context.begin_transaction():
context.run_migrations()
def do_run_migrations(connection: Connection) -> None:
context.configure(connection=connection, target_metadata=target_metadata) # type: ignore
with context.begin_transaction():
context.run_migrations()
async def run_async_migrations() -> None:
"""In this scenario we need to create an Engine
and associate a connection with the context.
"""
connectable = create_async_engine(
build_connection_string(),
poolclass=pool.NullPool,
)
async with connectable.connect() as connection:
await connection.run_sync(do_run_migrations)
await connectable.dispose()
def run_migrations_online() -> None:
"""Run migrations in 'online' mode."""
asyncio.run(run_async_migrations())
if context.is_offline_mode():
run_migrations_offline()
else:
run_migrations_online()

View File

@@ -0,0 +1,24 @@
"""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
from alembic import op
import sqlalchemy as sa
${imports if imports else ""}
# revision identifiers, used by Alembic.
revision = ${repr(up_revision)}
down_revision = ${repr(down_revision)}
branch_labels = ${repr(branch_labels)}
depends_on = ${repr(depends_on)}
def upgrade() -> None:
${upgrades if upgrades else "pass"}
def downgrade() -> None:
${downgrades if downgrades else "pass"}

View File

@@ -0,0 +1,37 @@
"""Introduce Danswer APIs
Revision ID: 15326fcec57e
Revises: 77d07dffae64
Create Date: 2023-11-11 20:51:24.228999
"""
from alembic import op
import sqlalchemy as sa
from danswer.configs.constants import DocumentSource
# revision identifiers, used by Alembic.
revision = "15326fcec57e"
down_revision = "77d07dffae64"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.alter_column("credential", "is_admin", new_column_name="admin_public")
op.add_column(
"document",
sa.Column("from_ingestion_api", sa.Boolean(), nullable=True),
)
op.alter_column(
"connector",
"source",
type_=sa.String(length=50),
existing_type=sa.Enum(DocumentSource, native_enum=False),
existing_nullable=False,
)
def downgrade() -> None:
op.drop_column("document", "from_ingestion_api")
op.alter_column("credential", "admin_public", new_column_name="is_admin")

View File

@@ -0,0 +1,55 @@
"""Google OAuth2
Revision ID: 2666d766cb9b
Revises: 6d387b3196c2
Create Date: 2023-05-05 15:49:35.716016
"""
import fastapi_users_db_sqlalchemy
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision = "2666d766cb9b"
down_revision = "6d387b3196c2"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"oauth_account",
sa.Column("id", fastapi_users_db_sqlalchemy.generics.GUID(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=False,
),
sa.Column("oauth_name", sa.String(length=100), nullable=False),
sa.Column("access_token", sa.String(length=1024), nullable=False),
sa.Column("expires_at", sa.Integer(), nullable=True),
sa.Column("refresh_token", sa.String(length=1024), nullable=True),
sa.Column("account_id", sa.String(length=320), nullable=False),
sa.Column("account_email", sa.String(length=320), nullable=False),
sa.ForeignKeyConstraint(["user_id"], ["user.id"], ondelete="cascade"),
sa.PrimaryKeyConstraint("id"),
)
op.create_index(
op.f("ix_oauth_account_account_id"),
"oauth_account",
["account_id"],
unique=False,
)
op.create_index(
op.f("ix_oauth_account_oauth_name"),
"oauth_account",
["oauth_name"],
unique=False,
)
def downgrade() -> None:
op.drop_index(op.f("ix_oauth_account_oauth_name"), table_name="oauth_account")
op.drop_index(op.f("ix_oauth_account_account_id"), table_name="oauth_account")
op.drop_table("oauth_account")

View File

@@ -0,0 +1,173 @@
"""Permission Framework
Revision ID: 27c6ecc08586
Revises: 2666d766cb9b
Create Date: 2023-05-24 18:45:17.244495
"""
import fastapi_users_db_sqlalchemy
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "27c6ecc08586"
down_revision = "2666d766cb9b"
branch_labels = None
depends_on = None
def upgrade() -> None:
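# Index attempts are wiped up front: the columns that described them
# (source, input_type, connector_specific_config) are dropped below in favor of
# the new connector / credential tables, so old rows would be left dangling.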
op.execute("TRUNCATE TABLE index_attempt")
op.create_table(
"connector",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("name", sa.String(), nullable=False),
sa.Column(
"source",
sa.Enum(
"SLACK",
"WEB",
"GOOGLE_DRIVE",
"GITHUB",
"CONFLUENCE",
name="documentsource",
native_enum=False,
),
nullable=False,
),
sa.Column(
"input_type",
sa.Enum(
"LOAD_STATE",
"POLL",
"EVENT",
name="inputtype",
native_enum=False,
),
nullable=True,
),
sa.Column(
"connector_specific_config",
postgresql.JSONB(astext_type=sa.Text()),
nullable=False,
),
sa.Column("refresh_freq", sa.Integer(), nullable=True),
sa.Column(
"time_created",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.Column(
"time_updated",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.Column("disabled", sa.Boolean(), nullable=False),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"credential",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column(
"credential_json",
postgresql.JSONB(astext_type=sa.Text()),
nullable=False,
),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=True,
),
sa.Column("public_doc", sa.Boolean(), nullable=False),
sa.Column(
"time_created",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.Column(
"time_updated",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"connector_credential_pair",
sa.Column("connector_id", sa.Integer(), nullable=False),
sa.Column("credential_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["connector_id"],
["connector.id"],
),
sa.ForeignKeyConstraint(
["credential_id"],
["credential.id"],
),
sa.PrimaryKeyConstraint("connector_id", "credential_id"),
)
op.add_column(
"index_attempt",
sa.Column("connector_id", sa.Integer(), nullable=True),
)
op.add_column(
"index_attempt",
sa.Column("credential_id", sa.Integer(), nullable=True),
)
op.create_foreign_key(
"fk_index_attempt_credential_id",
"index_attempt",
"credential",
["credential_id"],
["id"],
)
op.create_foreign_key(
"fk_index_attempt_connector_id",
"index_attempt",
"connector",
["connector_id"],
["id"],
)
op.drop_column("index_attempt", "connector_specific_config")
op.drop_column("index_attempt", "source")
op.drop_column("index_attempt", "input_type")
def downgrade() -> None:
op.execute("TRUNCATE TABLE index_attempt")
op.add_column(
"index_attempt",
sa.Column("input_type", sa.VARCHAR(), autoincrement=False, nullable=False),
)
op.add_column(
"index_attempt",
sa.Column("source", sa.VARCHAR(), autoincrement=False, nullable=False),
)
op.add_column(
"index_attempt",
sa.Column(
"connector_specific_config",
postgresql.JSONB(astext_type=sa.Text()),
autoincrement=False,
nullable=False,
),
)
op.drop_constraint(
"fk_index_attempt_credential_id", "index_attempt", type_="foreignkey"
)
op.drop_constraint(
"fk_index_attempt_connector_id", "index_attempt", type_="foreignkey"
)
op.drop_column("index_attempt", "credential_id")
op.drop_column("index_attempt", "connector_id")
op.drop_table("connector_credential_pair")
op.drop_table("credential")
op.drop_table("connector")

View File

@@ -0,0 +1,37 @@
"""Persona Datetime Aware
Revision ID: 30c1d5744104
Revises: 7f99be1cb9f5
Create Date: 2023-10-16 23:21:01.283424
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "30c1d5744104"
down_revision = "7f99be1cb9f5"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column("persona", sa.Column("datetime_aware", sa.Boolean(), nullable=True))
op.execute("UPDATE persona SET datetime_aware = TRUE")
op.alter_column("persona", "datetime_aware", nullable=False)
op.create_index(
"_default_persona_name_idx",
"persona",
["name"],
unique=True,
postgresql_where=sa.text("default_persona = true"),
)
def downgrade() -> None:
op.drop_index(
"_default_persona_name_idx",
table_name="persona",
postgresql_where=sa.text("default_persona = true"),
)
op.drop_column("persona", "datetime_aware")

View File

@@ -0,0 +1,49 @@
"""Move is_public to cc_pair
Revision ID: 3b25685ff73c
Revises: e0a68a81d434
Create Date: 2023-10-05 18:47:09.582849
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "3b25685ff73c"
down_revision = "e0a68a81d434"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"connector_credential_pair",
sa.Column("is_public", sa.Boolean(), nullable=True),
)
# fill in is_public for existing rows
op.execute(
"UPDATE connector_credential_pair SET is_public = true WHERE is_public IS NULL"
)
op.alter_column("connector_credential_pair", "is_public", nullable=False)
op.add_column(
"credential",
sa.Column("is_admin", sa.Boolean(), nullable=True),
)
op.execute("UPDATE credential SET is_admin = true WHERE is_admin IS NULL")
op.alter_column("credential", "is_admin", nullable=False)
op.drop_column("credential", "public_doc")
def downgrade() -> None:
op.add_column(
"credential",
sa.Column("public_doc", sa.Boolean(), nullable=True),
)
# setting public_doc to false for all existing rows to be safe
# NOTE: this is likely not the correct state of the world but it's the best we can do
op.execute("UPDATE credential SET public_doc = false WHERE public_doc IS NULL")
op.alter_column("credential", "public_doc", nullable=False)
op.drop_column("connector_credential_pair", "is_public")
op.drop_column("credential", "is_admin")

View File

@@ -0,0 +1,52 @@
"""Polling Document Count
Revision ID: 3c5e35aa9af0
Revises: 27c6ecc08586
Create Date: 2023-06-14 23:45:51.760440
"""
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision = "3c5e35aa9af0"
down_revision = "27c6ecc08586"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"connector_credential_pair",
sa.Column(
"last_successful_index_time",
sa.DateTime(timezone=True),
nullable=True,
),
)
op.add_column(
"connector_credential_pair",
sa.Column(
"last_attempt_status",
sa.Enum(
"NOT_STARTED",
"IN_PROGRESS",
"SUCCESS",
"FAILED",
name="indexingstatus",
native_enum=False,
),
nullable=False,
),
)
op.add_column(
"connector_credential_pair",
sa.Column("total_docs_indexed", sa.Integer(), nullable=False),
)
def downgrade() -> None:
op.drop_column("connector_credential_pair", "total_docs_indexed")
op.drop_column("connector_credential_pair", "last_attempt_status")
op.drop_column("connector_credential_pair", "last_successful_index_time")

View File

@@ -0,0 +1,24 @@
"""Larger Access Tokens for OAUTH
Revision ID: 465f78d9b7f9
Revises: 3c5e35aa9af0
Create Date: 2023-07-18 17:33:40.365034
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "465f78d9b7f9"
down_revision = "3c5e35aa9af0"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.alter_column("oauth_account", "access_token", type_=sa.Text())
def downgrade() -> None:
op.alter_column("oauth_account", "access_token", type_=sa.String(length=1024))

View File

@@ -0,0 +1,31 @@
"""Remove Native Enum
Revision ID: 46625e4745d4
Revises: 9d97fecfab7f
Create Date: 2023-10-27 11:38:33.803145
"""
from alembic import op
from sqlalchemy import String
# revision identifiers, used by Alembic.
revision = "46625e4745d4"
down_revision = "9d97fecfab7f"
branch_labels = None
depends_on = None
def upgrade() -> None:
# At this point, we directly changed some previous migrations,
# https://github.com/danswer-ai/danswer/pull/637
# Using Postgres native Enums caused some complications for first-time users.
# To remove those complications, all Enums are only handled application side moving forward.
# This migration exists to ensure that existing users don't run into upgrade issues.
op.alter_column("index_attempt", "status", type_=String)
op.alter_column("connector_credential_pair", "last_attempt_status", type_=String)
op.execute("DROP TYPE IF EXISTS indexingstatus")
def downgrade() -> None:
# We don't want Native Enums, do nothing
pass
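As a side note, the application-side declaration this standardizes on looks roughly like the sketch below; the column and enum names are illustrative only, not taken from the codebase. With native_enum=False the column is stored as a plain VARCHAR, so no Postgres enum type is created and new values can be added application side without another migration.
import sqlalchemy as sa
# Hypothetical column definition; the values and name here are examples only
example_status = sa.Column(
"status",
sa.Enum("NOT_STARTED", "SUCCESS", name="examplestatus", native_enum=False),
nullable=False,
)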

View File

@@ -0,0 +1,73 @@
"""Create IndexAttempt table
Revision ID: 47433d30de82
Revises:
Create Date: 2023-05-04 00:55:32.971991
"""
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "47433d30de82"
down_revision = None
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"index_attempt",
sa.Column("id", sa.Integer(), nullable=False),
# String type since python enum will change often
sa.Column(
"source",
sa.String(),
nullable=False,
),
# String type to easily accommodate new ways of pulling
# in documents
sa.Column(
"input_type",
sa.String(),
nullable=False,
),
sa.Column(
"connector_specific_config",
postgresql.JSONB(),
nullable=False,
),
sa.Column(
"time_created",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=True,
),
sa.Column(
"time_updated",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
server_onupdate=sa.text("now()"), # type: ignore
nullable=True,
),
sa.Column(
"status",
sa.Enum(
"NOT_STARTED",
"IN_PROGRESS",
"SUCCESS",
"FAILED",
name="indexingstatus",
native_enum=False,
),
nullable=False,
),
sa.Column("document_ids", postgresql.ARRAY(sa.String()), nullable=True),
sa.Column("error_msg", sa.String(), nullable=True),
sa.PrimaryKeyConstraint("id"),
)
def downgrade() -> None:
op.drop_table("index_attempt")

View File

@@ -0,0 +1,28 @@
"""Add additional retrieval controls to Persona
Revision ID: 50b683a8295c
Revises: 7da0ae5ad583
Create Date: 2023-11-27 17:23:29.668422
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "50b683a8295c"
down_revision = "7da0ae5ad583"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column("persona", sa.Column("num_chunks", sa.Integer(), nullable=True))
op.add_column(
"persona",
sa.Column("apply_llm_relevance_filter", sa.Boolean(), nullable=True),
)
def downgrade() -> None:
op.drop_column("persona", "apply_llm_relevance_filter")
op.drop_column("persona", "num_chunks")

View File

@@ -0,0 +1,59 @@
"""Add document set tables
Revision ID: 57b53544726e
Revises: 800f48024ae9
Create Date: 2023-09-20 16:59:39.097177
"""
from alembic import op
import fastapi_users_db_sqlalchemy
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "57b53544726e"
down_revision = "800f48024ae9"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"document_set",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("name", sa.String(), nullable=False),
sa.Column("description", sa.String(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=True,
),
sa.Column("is_up_to_date", sa.Boolean(), nullable=False),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint("name"),
)
op.create_table(
"document_set__connector_credential_pair",
sa.Column("document_set_id", sa.Integer(), nullable=False),
sa.Column("connector_credential_pair_id", sa.Integer(), nullable=False),
sa.Column("is_current", sa.Boolean(), nullable=False),
sa.ForeignKeyConstraint(
["connector_credential_pair_id"],
["connector_credential_pair.id"],
),
sa.ForeignKeyConstraint(
["document_set_id"],
["document_set.id"],
),
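# Note that is_current is part of the composite primary key below, so the same
# document set / cc-pair link can exist both as a current and a non-current row.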
sa.PrimaryKeyConstraint(
"document_set_id", "connector_credential_pair_id", "is_current"
),
)
def downgrade() -> None:
op.drop_table("document_set__connector_credential_pair")
op.drop_table("document_set")

View File

@@ -0,0 +1,85 @@
"""Add Chat Sessions
Revision ID: 5809c0787398
Revises: d929f0c1c6af
Create Date: 2023-09-04 15:29:44.002164
"""
import fastapi_users_db_sqlalchemy
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "5809c0787398"
down_revision = "d929f0c1c6af"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"chat_session",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=True,
),
sa.Column("description", sa.Text(), nullable=False),
sa.Column("deleted", sa.Boolean(), nullable=False),
sa.Column(
"time_updated",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.Column(
"time_created",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"chat_message",
sa.Column("chat_session_id", sa.Integer(), nullable=False),
sa.Column("message_number", sa.Integer(), nullable=False),
sa.Column("edit_number", sa.Integer(), nullable=False),
sa.Column("parent_edit_number", sa.Integer(), nullable=True),
sa.Column("latest", sa.Boolean(), nullable=False),
sa.Column("message", sa.Text(), nullable=False),
sa.Column(
"message_type",
sa.Enum(
"SYSTEM",
"USER",
"ASSISTANT",
"DANSWER",
name="messagetype",
native_enum=False,
),
nullable=False,
),
sa.Column(
"time_sent",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.ForeignKeyConstraint(
["chat_session_id"],
["chat_session.id"],
),
sa.PrimaryKeyConstraint("chat_session_id", "message_number", "edit_number"),
)
def downgrade() -> None:
op.drop_table("chat_message")
op.drop_table("chat_session")

View File

@@ -0,0 +1,36 @@
"""Add docs_indexed_column + time_started to index_attempt table
Revision ID: 5e84129c8be3
Revises: e6a4bbc13fe4
Create Date: 2023-08-10 21:43:09.069523
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "5e84129c8be3"
down_revision = "e6a4bbc13fe4"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"index_attempt",
sa.Column("num_docs_indexed", sa.Integer()),
)
op.add_column(
"index_attempt",
sa.Column(
"time_started",
sa.DateTime(timezone=True),
nullable=True,
),
)
def downgrade() -> None:
op.drop_column("index_attempt", "time_started")
op.drop_column("index_attempt", "num_docs_indexed")

View File

@@ -0,0 +1,92 @@
"""Basic Auth
Revision ID: 6d387b3196c2
Revises: 47433d30de82
Create Date: 2023-05-05 14:40:10.242502
"""
import fastapi_users_db_sqlalchemy
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "6d387b3196c2"
down_revision = "47433d30de82"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"user",
sa.Column("id", fastapi_users_db_sqlalchemy.generics.GUID(), nullable=False),
sa.Column("email", sa.String(length=320), nullable=False),
sa.Column("hashed_password", sa.String(length=1024), nullable=False),
sa.Column("is_active", sa.Boolean(), nullable=False),
sa.Column("is_superuser", sa.Boolean(), nullable=False),
sa.Column("is_verified", sa.Boolean(), nullable=False),
sa.Column(
"role",
sa.Enum("BASIC", "ADMIN", name="userrole", native_enum=False),
default="BASIC",
nullable=False,
),
sa.PrimaryKeyConstraint("id"),
)
op.create_index(op.f("ix_user_email"), "user", ["email"], unique=True)
op.create_table(
"accesstoken",
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=False,
),
sa.Column("token", sa.String(length=43), nullable=False),
sa.Column(
"created_at",
fastapi_users_db_sqlalchemy.generics.TIMESTAMPAware(timezone=True),
nullable=False,
),
sa.ForeignKeyConstraint(["user_id"], ["user.id"], ondelete="cascade"),
sa.PrimaryKeyConstraint("token"),
)
op.create_index(
op.f("ix_accesstoken_created_at"),
"accesstoken",
["created_at"],
unique=False,
)
op.alter_column(
"index_attempt",
"time_created",
existing_type=postgresql.TIMESTAMP(timezone=True),
nullable=False,
existing_server_default=sa.text("now()"), # type: ignore
)
op.alter_column(
"index_attempt",
"time_updated",
existing_type=postgresql.TIMESTAMP(timezone=True),
nullable=False,
)
def downgrade() -> None:
op.alter_column(
"index_attempt",
"time_updated",
existing_type=postgresql.TIMESTAMP(timezone=True),
nullable=True,
)
op.alter_column(
"index_attempt",
"time_created",
existing_type=postgresql.TIMESTAMP(timezone=True),
nullable=True,
existing_server_default=sa.text("now()"), # type: ignore
)
op.drop_index(op.f("ix_accesstoken_created_at"), table_name="accesstoken")
op.drop_table("accesstoken")
op.drop_index(op.f("ix_user_email"), table_name="user")
op.drop_table("user")

View File

@@ -0,0 +1,26 @@
"""Count Chat Tokens
Revision ID: 767f1c2a00eb
Revises: dba7f71618f5
Create Date: 2023-09-21 10:03:21.509899
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "767f1c2a00eb"
down_revision = "dba7f71618f5"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"chat_message", sa.Column("token_count", sa.Integer(), nullable=False)
)
def downgrade() -> None:
op.drop_column("chat_message", "token_count")

View File

@@ -0,0 +1,32 @@
"""CC-Pair Name not Unique
Revision ID: 76b60d407dfb
Revises: b156fa702355
Create Date: 2023-12-22 21:42:10.018804
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "76b60d407dfb"
down_revision = "b156fa702355"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.execute("DELETE FROM connector_credential_pair WHERE name IS NULL")
op.drop_constraint(
"connector_credential_pair__name__key",
"connector_credential_pair",
type_="unique",
)
op.alter_column(
"connector_credential_pair", "name", existing_type=sa.String(), nullable=False
)
def downgrade() -> None:
# The uniqueness constraint was never relied on by the application code, so there is no need to restore it
pass

View File

@@ -0,0 +1,35 @@
"""forcibly remove more enum types from postgres
Revision ID: 77d07dffae64
Revises: d61e513bef0a
Create Date: 2023-11-01 12:33:01.999617
"""
from alembic import op
from sqlalchemy import String
# revision identifiers, used by Alembic.
revision = "77d07dffae64"
down_revision = "d61e513bef0a"
branch_labels = None
depends_on = None
def upgrade() -> None:
# In a PR:
# https://github.com/danswer-ai/danswer/pull/397/files#diff-f05fb341f6373790b91852579631b64ca7645797a190837156a282b67e5b19c2
# we directly changed some previous migrations. This caused some users to have native enums
# while others wouldn't. This has caused some issues when adding new fields to these enums.
# This migration manually changes the enum types to ensure that nobody uses native enums.
op.alter_column("query_event", "selected_search_flow", type_=String)
op.alter_column("query_event", "feedback", type_=String)
op.alter_column("document_retrieval_feedback", "feedback", type_=String)
op.execute("DROP TYPE IF EXISTS searchtype")
op.execute("DROP TYPE IF EXISTS qafeedbacktype")
op.execute("DROP TYPE IF EXISTS searchfeedbacktype")
def downgrade() -> None:
# We don't want Native Enums, do nothing
pass

View File
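
As a quick sanity check after running this migration, the native enum types it targets can be listed straight from Postgres's pg_type catalog. A minimal sketch, assuming a reachable connection string (the URL below is a placeholder):

from sqlalchemy import create_engine, text

# Placeholder connection string; point this at the actual Danswer database.
DATABASE_URL = "postgresql://postgres:password@localhost:5432/danswer"

def list_native_enum_types(database_url: str = DATABASE_URL) -> list[str]:
    """Return the names of native enum types still present in the database."""
    engine = create_engine(database_url)
    with engine.connect() as conn:
        rows = conn.execute(
            text("SELECT typname FROM pg_type WHERE typtype = 'e' ORDER BY typname")
        )
        return [row[0] for row in rows]

if __name__ == "__main__":
    # After this migration, names like 'searchtype' or 'qafeedbacktype' should no longer appear.
    print(list_native_enum_types() or "no native enum types found")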

@@ -0,0 +1,48 @@
"""Task Tracking
Revision ID: 78dbe7e38469
Revises: 7ccea01261f6
Create Date: 2023-10-15 23:40:50.593262
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "78dbe7e38469"
down_revision = "7ccea01261f6"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"task_queue_jobs",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("task_id", sa.String(), nullable=False),
sa.Column("task_name", sa.String(), nullable=False),
sa.Column(
"status",
sa.Enum(
"PENDING",
"STARTED",
"SUCCESS",
"FAILURE",
name="taskstatus",
native_enum=False,
),
nullable=False,
),
sa.Column("start_time", sa.DateTime(timezone=True), nullable=True),
sa.Column(
"register_time",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.PrimaryKeyConstraint("id"),
)
def downgrade() -> None:
op.drop_table("task_queue_jobs")

View File

@@ -0,0 +1,31 @@
"""Store Chat Retrieval Docs
Revision ID: 7ccea01261f6
Revises: a570b80a5f20
Create Date: 2023-10-15 10:39:23.317453
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "7ccea01261f6"
down_revision = "a570b80a5f20"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"chat_message",
sa.Column(
"reference_docs",
postgresql.JSONB(astext_type=sa.Text()),
nullable=True,
),
)
def downgrade() -> None:
op.drop_column("chat_message", "reference_docs")

View File

@@ -0,0 +1,23 @@
"""Add description to persona
Revision ID: 7da0ae5ad583
Revises: e86866a9c78a
Create Date: 2023-11-27 00:16:19.959414
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "7da0ae5ad583"
down_revision = "e86866a9c78a"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column("persona", sa.Column("description", sa.String(), nullable=True))
def downgrade() -> None:
op.drop_column("persona", "description")

View File

@@ -0,0 +1,38 @@
"""Add SlackBotConfig table
Revision ID: 7da543f5672f
Revises: febe9eaa0644
Create Date: 2023-09-24 16:34:17.526128
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "7da543f5672f"
down_revision = "febe9eaa0644"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"slack_bot_config",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("persona_id", sa.Integer(), nullable=True),
sa.Column(
"channel_config",
postgresql.JSONB(astext_type=sa.Text()),
nullable=False,
),
sa.ForeignKeyConstraint(
["persona_id"],
["persona.id"],
),
sa.PrimaryKeyConstraint("id"),
)
def downgrade() -> None:
op.drop_table("slack_bot_config")

View File

@@ -0,0 +1,35 @@
"""Add index for getting documents just by connector id / credential id
Revision ID: 7f99be1cb9f5
Revises: 78dbe7e38469
Create Date: 2023-10-15 22:48:15.487762
"""
from alembic import op
# revision identifiers, used by Alembic.
revision = "7f99be1cb9f5"
down_revision = "78dbe7e38469"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_index(
op.f(
"ix_document_by_connector_credential_pair_pkey__connector_id__credential_id"
),
"document_by_connector_credential_pair",
["connector_id", "credential_id"],
unique=False,
)
def downgrade() -> None:
op.drop_index(
op.f(
"ix_document_by_connector_credential_pair_pkey__connector_id__credential_id"
),
table_name="document_by_connector_credential_pair",
)

View File
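
The index above exists to serve lookups of all document rows belonging to one connector/credential pair. A minimal sketch of that access pattern, assuming a reachable connection string (placeholder URL below); the table and column names come from the document_by_connector_credential_pair migration elsewhere in this changeset:

from sqlalchemy import create_engine, text

# Placeholder connection string for illustration only.
engine = create_engine("postgresql://postgres:password@localhost:5432/danswer")

def document_ids_for_cc_pair(connector_id: int, credential_id: int) -> list[str]:
    """Fetch document ids for one connector/credential pair; this is the query shape
    the index above is designed to accelerate."""
    query = text(
        "SELECT id FROM document_by_connector_credential_pair "
        "WHERE connector_id = :connector_id AND credential_id = :credential_id"
    )
    with engine.connect() as conn:
        rows = conn.execute(
            query, {"connector_id": connector_id, "credential_id": credential_id}
        )
        return [row[0] for row in rows]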

@@ -0,0 +1,60 @@
"""Add ID to ConnectorCredentialPair
Revision ID: 800f48024ae9
Revises: 767f1c2a00eb
Create Date: 2023-09-19 16:13:42.299715
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.schema import Sequence, CreateSequence
# revision identifiers, used by Alembic.
revision = "800f48024ae9"
down_revision = "767f1c2a00eb"
branch_labels = None
depends_on = None
def upgrade() -> None:
sequence = Sequence("connector_credential_pair_id_seq")
op.execute(CreateSequence(sequence)) # type: ignore
op.add_column(
"connector_credential_pair",
sa.Column(
"id", sa.Integer(), nullable=True, server_default=sequence.next_value()
),
)
op.add_column(
"connector_credential_pair",
sa.Column("name", sa.String(), nullable=True),
)
# fill in IDs for existing rows
op.execute(
"UPDATE connector_credential_pair SET id = nextval('connector_credential_pair_id_seq') WHERE id IS NULL"
)
op.alter_column("connector_credential_pair", "id", nullable=False)
op.create_unique_constraint(
"connector_credential_pair__name__key", "connector_credential_pair", ["name"]
)
op.create_unique_constraint(
"connector_credential_pair__id__key", "connector_credential_pair", ["id"]
)
def downgrade() -> None:
op.drop_constraint(
"connector_credential_pair__name__key",
"connector_credential_pair",
type_="unique",
)
op.drop_constraint(
"connector_credential_pair__id__key",
"connector_credential_pair",
type_="unique",
)
op.drop_column("connector_credential_pair", "name")
op.drop_column("connector_credential_pair", "id")
op.execute("DROP SEQUENCE connector_credential_pair_id_seq")

View File
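
The migration above is an instance of the usual pattern for introducing a required column on a table that already contains rows: create a sequence, add the column as nullable with the sequence as its server default, backfill older rows, and only then mark it NOT NULL. A condensed, purely hypothetical sketch of the same pattern (the widget table and widget_position_seq sequence are invented for illustration):

from alembic import op
import sqlalchemy as sa
from sqlalchemy.schema import CreateSequence, Sequence

def upgrade() -> None:
    # 1. Create a sequence so the database can assign values on its own.
    sequence = Sequence("widget_position_seq")
    op.execute(CreateSequence(sequence))
    # 2. Add the column as nullable so existing rows do not immediately violate it.
    op.add_column(
        "widget",
        sa.Column(
            "position", sa.Integer(), nullable=True, server_default=sequence.next_value()
        ),
    )
    # 3. Backfill rows that existed before the column was added.
    op.execute(
        "UPDATE widget SET position = nextval('widget_position_seq') WHERE position IS NULL"
    )
    # 4. Only now is it safe to require a value on every row.
    op.alter_column("widget", "position", nullable=False)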

@@ -0,0 +1,36 @@
"""Add chat session to query_event
Revision ID: 80696cf850ae
Revises: 15326fcec57e
Create Date: 2023-11-26 02:38:35.008070
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "80696cf850ae"
down_revision = "15326fcec57e"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"query_event",
sa.Column("chat_session_id", sa.Integer(), nullable=True),
)
op.create_foreign_key(
"fk_query_event_chat_session_id",
"query_event",
"chat_session",
["chat_session_id"],
["id"],
)
def downgrade() -> None:
op.drop_constraint(
"fk_query_event_chat_session_id", "query_event", type_="foreignkey"
)
op.drop_column("query_event", "chat_session_id")

View File

@@ -0,0 +1,34 @@
"""Add is_visible to Persona
Revision ID: 891cd83c87a8
Revises: 76b60d407dfb
Create Date: 2023-12-21 11:55:54.132279
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "891cd83c87a8"
down_revision = "76b60d407dfb"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"persona",
sa.Column("is_visible", sa.Boolean(), nullable=True),
)
op.execute("UPDATE persona SET is_visible = true")
op.alter_column("persona", "is_visible", nullable=False)
op.add_column(
"persona",
sa.Column("display_priority", sa.Integer(), nullable=True),
)
def downgrade() -> None:
op.drop_column("persona", "is_visible")
op.drop_column("persona", "display_priority")

View File

@@ -0,0 +1,39 @@
"""Restructure Document Indices
Revision ID: 8aabb57f3b49
Revises: 5e84129c8be3
Create Date: 2023-08-18 21:15:57.629515
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "8aabb57f3b49"
down_revision = "5e84129c8be3"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.drop_table("chunk")
op.execute("DROP TYPE IF EXISTS documentstoretype")
def downgrade() -> None:
op.create_table(
"chunk",
sa.Column("id", sa.VARCHAR(), autoincrement=False, nullable=False),
sa.Column(
"document_store_type",
postgresql.ENUM("VECTOR", "KEYWORD", name="documentstoretype"),
autoincrement=False,
nullable=False,
),
sa.Column("document_id", sa.VARCHAR(), autoincrement=False, nullable=False),
sa.ForeignKeyConstraint(
["document_id"], ["document.id"], name="chunk_document_id_fkey"
),
sa.PrimaryKeyConstraint("id", "document_store_type", name="chunk_pkey"),
)

View File

@@ -0,0 +1,40 @@
"""Chat Context Addition
Revision ID: 8e26726b7683
Revises: 5809c0787398
Create Date: 2023-09-13 18:34:31.327944
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "8e26726b7683"
down_revision = "5809c0787398"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"persona",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("name", sa.String(), nullable=False),
sa.Column("system_text", sa.Text(), nullable=True),
sa.Column("tools_text", sa.Text(), nullable=True),
sa.Column("hint_text", sa.Text(), nullable=True),
sa.Column("default_persona", sa.Boolean(), nullable=False),
sa.Column("deleted", sa.Boolean(), nullable=False),
sa.PrimaryKeyConstraint("id"),
)
op.add_column("chat_message", sa.Column("persona_id", sa.Integer(), nullable=True))
op.create_foreign_key(
"fk_chat_message_persona_id", "chat_message", "persona", ["persona_id"], ["id"]
)
def downgrade() -> None:
op.drop_constraint("fk_chat_message_persona_id", "chat_message", type_="foreignkey")
op.drop_column("chat_message", "persona_id")
op.drop_table("persona")

View File

@@ -0,0 +1,32 @@
"""Store Tool Details
Revision ID: 904451035c9b
Revises: 3b25685ff73c
Create Date: 2023-10-05 12:29:26.620000
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "904451035c9b"
down_revision = "3b25685ff73c"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"persona",
sa.Column("tools", postgresql.JSONB(astext_type=sa.Text()), nullable=True),
)
op.drop_column("persona", "tools_text")
def downgrade() -> None:
op.add_column(
"persona",
sa.Column("tools_text", sa.TEXT(), autoincrement=False, nullable=True),
)
op.drop_column("persona", "tools")

View File

@@ -0,0 +1,61 @@
"""Tags
Revision ID: 904e5138fffb
Revises: 891cd83c87a8
Create Date: 2024-01-01 10:44:43.733974
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "904e5138fffb"
down_revision = "891cd83c87a8"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"tag",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("tag_key", sa.String(), nullable=False),
sa.Column("tag_value", sa.String(), nullable=False),
sa.Column("source", sa.String(), nullable=False),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint(
"tag_key", "tag_value", "source", name="_tag_key_value_source_uc"
),
)
op.create_table(
"document__tag",
sa.Column("document_id", sa.String(), nullable=False),
sa.Column("tag_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["document_id"],
["document.id"],
),
sa.ForeignKeyConstraint(
["tag_id"],
["tag.id"],
),
sa.PrimaryKeyConstraint("document_id", "tag_id"),
)
op.add_column(
"search_doc",
sa.Column(
"doc_metadata",
postgresql.JSONB(astext_type=sa.Text()),
nullable=True,
),
)
op.execute("UPDATE search_doc SET doc_metadata = '{}' WHERE doc_metadata IS NULL")
op.alter_column("search_doc", "doc_metadata", nullable=False)
def downgrade() -> None:
op.drop_table("document__tag")
op.drop_table("tag")
op.drop_column("search_doc", "doc_metadata")

View File
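
To show how the new tables fit together: a document's tags are reachable through the document__tag join table, keyed by tag id. A minimal lookup by tag key/value, assuming a reachable connection string (placeholder below); table and column names are taken from the migration above:

from sqlalchemy import create_engine, text

# Placeholder connection string for illustration only.
engine = create_engine("postgresql://postgres:password@localhost:5432/danswer")

def document_ids_with_tag(tag_key: str, tag_value: str) -> list[str]:
    """Return ids of documents carrying the given tag key/value pair."""
    query = text(
        "SELECT dt.document_id "
        "FROM document__tag dt "
        "JOIN tag t ON t.id = dt.tag_id "
        "WHERE t.tag_key = :tag_key AND t.tag_value = :tag_value"
    )
    with engine.connect() as conn:
        rows = conn.execute(query, {"tag_key": tag_key, "tag_value": tag_value})
        return [row[0] for row in rows]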

@@ -0,0 +1,31 @@
"""Added retrieved docs to query event
Revision ID: 9d97fecfab7f
Revises: ffc707a226b4
Create Date: 2023-10-20 12:22:31.930449
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "9d97fecfab7f"
down_revision = "ffc707a226b4"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"query_event",
sa.Column(
"retrieved_document_ids",
postgresql.ARRAY(sa.String()),
nullable=True,
),
)
def downgrade() -> None:
op.drop_column("query_event", "retrieved_document_ids")

View File

@@ -0,0 +1,67 @@
"""UserGroup tables
Revision ID: a570b80a5f20
Revises: 904451035c9b
Create Date: 2023-10-02 12:27:10.265725
"""
from alembic import op
import fastapi_users_db_sqlalchemy
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "a570b80a5f20"
down_revision = "904451035c9b"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"user_group",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("name", sa.String(), nullable=False),
sa.Column("is_up_to_date", sa.Boolean(), nullable=False),
sa.Column("is_up_for_deletion", sa.Boolean(), nullable=False),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint("name"),
)
op.create_table(
"user__user_group",
sa.Column("user_group_id", sa.Integer(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=False,
),
sa.ForeignKeyConstraint(
["user_group_id"],
["user_group.id"],
),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("user_group_id", "user_id"),
)
op.create_table(
"user_group__connector_credential_pair",
sa.Column("user_group_id", sa.Integer(), nullable=False),
sa.Column("cc_pair_id", sa.Integer(), nullable=False),
sa.Column("is_current", sa.Boolean(), nullable=False),
sa.ForeignKeyConstraint(
["cc_pair_id"],
["connector_credential_pair.id"],
),
sa.ForeignKeyConstraint(
["user_group_id"],
["user_group.id"],
),
sa.PrimaryKeyConstraint("user_group_id", "cc_pair_id", "is_current"),
)
def downgrade() -> None:
op.drop_table("user_group__connector_credential_pair")
op.drop_table("user__user_group")
op.drop_table("user_group")

View File

@@ -0,0 +1,47 @@
"""Add SAML Accounts
Revision ID: ae62505e3acc
Revises: 7da543f5672f
Create Date: 2023-09-26 16:19:30.933183
"""
import fastapi_users_db_sqlalchemy
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "ae62505e3acc"
down_revision = "7da543f5672f"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"saml",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=False,
),
sa.Column("encrypted_cookie", sa.Text(), nullable=False),
sa.Column("expires_at", sa.DateTime(timezone=True), nullable=True),
sa.Column(
"updated_at",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint("encrypted_cookie"),
sa.UniqueConstraint("user_id"),
)
def downgrade() -> None:
op.drop_table("saml")

View File

@@ -0,0 +1,49 @@
"""Make 'last_attempt_status' nullable
Revision ID: b082fec533f0
Revises: df0c7ad8a076
Create Date: 2023-08-06 12:05:47.087325
"""
from alembic import op
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "b082fec533f0"
down_revision = "df0c7ad8a076"
branch_labels = None
depends_on = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.alter_column(
"connector_credential_pair",
"last_attempt_status",
existing_type=postgresql.ENUM(
"NOT_STARTED",
"IN_PROGRESS",
"SUCCESS",
"FAILED",
name="indexingstatus",
),
nullable=True,
)
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.alter_column(
"connector_credential_pair",
"last_attempt_status",
existing_type=postgresql.ENUM(
"NOT_STARTED",
"IN_PROGRESS",
"SUCCESS",
"FAILED",
name="indexingstatus",
),
nullable=False,
)
# ### end Alembic commands ###

View File

@@ -0,0 +1,520 @@
"""Chat Reworked
Revision ID: b156fa702355
Revises: baf71f781b9e
Create Date: 2023-12-12 00:57:41.823371
"""
import fastapi_users_db_sqlalchemy
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import ENUM
from danswer.configs.constants import DocumentSource
# revision identifiers, used by Alembic.
revision = "b156fa702355"
down_revision = "baf71f781b9e"
branch_labels = None
depends_on = None
searchtype_enum = ENUM(
"KEYWORD", "SEMANTIC", "HYBRID", name="searchtype", create_type=True
)
recencybiassetting_enum = ENUM(
"FAVOR_RECENT",
"BASE_DECAY",
"NO_DECAY",
"AUTO",
name="recencybiassetting",
create_type=True,
)
def upgrade() -> None:
bind = op.get_bind()
searchtype_enum.create(bind)
recencybiassetting_enum.create(bind)
# Existing feedback rows reference the old chat message schema and cannot be migrated, so drop them
op.execute("DELETE FROM chat_feedback")
op.execute("DELETE FROM document_retrieval_feedback")
op.create_table(
"search_doc",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("document_id", sa.String(), nullable=False),
sa.Column("chunk_ind", sa.Integer(), nullable=False),
sa.Column("semantic_id", sa.String(), nullable=False),
sa.Column("link", sa.String(), nullable=True),
sa.Column("blurb", sa.String(), nullable=False),
sa.Column("boost", sa.Integer(), nullable=False),
sa.Column(
"source_type",
sa.Enum(DocumentSource, native=False),
nullable=False,
),
sa.Column("hidden", sa.Boolean(), nullable=False),
sa.Column("score", sa.Float(), nullable=False),
sa.Column("match_highlights", postgresql.ARRAY(sa.String()), nullable=False),
sa.Column("updated_at", sa.DateTime(timezone=True), nullable=True),
sa.Column("primary_owners", postgresql.ARRAY(sa.String()), nullable=True),
sa.Column("secondary_owners", postgresql.ARRAY(sa.String()), nullable=True),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"prompt",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=True,
),
sa.Column("name", sa.String(), nullable=False),
sa.Column("description", sa.String(), nullable=False),
sa.Column("system_prompt", sa.Text(), nullable=False),
sa.Column("task_prompt", sa.Text(), nullable=False),
sa.Column("include_citations", sa.Boolean(), nullable=False),
sa.Column("datetime_aware", sa.Boolean(), nullable=False),
sa.Column("default_prompt", sa.Boolean(), nullable=False),
sa.Column("deleted", sa.Boolean(), nullable=False),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"persona__prompt",
sa.Column("persona_id", sa.Integer(), nullable=False),
sa.Column("prompt_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["persona_id"],
["persona.id"],
),
sa.ForeignKeyConstraint(
["prompt_id"],
["prompt.id"],
),
sa.PrimaryKeyConstraint("persona_id", "prompt_id"),
)
# Changes to persona first so chat_sessions can have the right persona
# The empty persona will be overwritten on server startup
op.add_column(
"persona",
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=True,
),
)
op.add_column(
"persona",
sa.Column(
"search_type",
searchtype_enum,
nullable=True,
),
)
op.execute("UPDATE persona SET search_type = 'HYBRID'")
op.alter_column("persona", "search_type", nullable=False)
op.add_column(
"persona",
sa.Column("llm_relevance_filter", sa.Boolean(), nullable=True),
)
op.execute("UPDATE persona SET llm_relevance_filter = TRUE")
op.alter_column("persona", "llm_relevance_filter", nullable=False)
op.add_column(
"persona",
sa.Column("llm_filter_extraction", sa.Boolean(), nullable=True),
)
op.execute("UPDATE persona SET llm_filter_extraction = TRUE")
op.alter_column("persona", "llm_filter_extraction", nullable=False)
op.add_column(
"persona",
sa.Column(
"recency_bias",
recencybiassetting_enum,
nullable=True,
),
)
op.execute("UPDATE persona SET recency_bias = 'BASE_DECAY'")
op.alter_column("persona", "recency_bias", nullable=False)
op.alter_column("persona", "description", existing_type=sa.VARCHAR(), nullable=True)
op.execute("UPDATE persona SET description = ''")
op.alter_column("persona", "description", nullable=False)
op.create_foreign_key("persona__user_fk", "persona", "user", ["user_id"], ["id"])
op.drop_column("persona", "datetime_aware")
op.drop_column("persona", "tools")
op.drop_column("persona", "hint_text")
op.drop_column("persona", "apply_llm_relevance_filter")
op.drop_column("persona", "retrieval_enabled")
op.drop_column("persona", "system_text")
# Need to create a persona row so fk can work
result = bind.execute(sa.text("SELECT 1 FROM persona WHERE id = 0"))
exists = result.fetchone()
if not exists:
op.execute(
sa.text(
"""
INSERT INTO persona (
id, user_id, name, description, search_type, num_chunks,
llm_relevance_filter, llm_filter_extraction, recency_bias,
llm_model_version_override, default_persona, deleted
) VALUES (
0, NULL, '', '', 'HYBRID', NULL,
TRUE, TRUE, 'BASE_DECAY', NULL, TRUE, FALSE
)
"""
)
)
delete_statement = sa.text(
"""
DELETE FROM persona
WHERE name = 'Danswer' AND default_persona = TRUE AND id != 0
"""
)
bind.execute(delete_statement)
op.add_column(
"chat_feedback",
sa.Column("chat_message_id", sa.Integer(), nullable=False),
)
op.drop_constraint(
"chat_feedback_chat_message_chat_session_id_chat_message_me_fkey",
"chat_feedback",
type_="foreignkey",
)
op.drop_column("chat_feedback", "chat_message_edit_number")
op.drop_column("chat_feedback", "chat_message_chat_session_id")
op.drop_column("chat_feedback", "chat_message_message_number")
op.add_column(
"chat_message",
sa.Column(
"id",
sa.Integer(),
primary_key=True,
autoincrement=True,
nullable=False,
unique=True,
),
)
op.add_column(
"chat_message",
sa.Column("parent_message", sa.Integer(), nullable=True),
)
op.add_column(
"chat_message",
sa.Column("latest_child_message", sa.Integer(), nullable=True),
)
op.add_column(
"chat_message", sa.Column("rephrased_query", sa.Text(), nullable=True)
)
op.add_column("chat_message", sa.Column("prompt_id", sa.Integer(), nullable=True))
op.add_column(
"chat_message",
sa.Column("citations", postgresql.JSONB(astext_type=sa.Text()), nullable=True),
)
op.add_column("chat_message", sa.Column("error", sa.Text(), nullable=True))
op.drop_constraint("fk_chat_message_persona_id", "chat_message", type_="foreignkey")
op.create_foreign_key(
"chat_message__prompt_fk", "chat_message", "prompt", ["prompt_id"], ["id"]
)
op.drop_column("chat_message", "parent_edit_number")
op.drop_column("chat_message", "persona_id")
op.drop_column("chat_message", "reference_docs")
op.drop_column("chat_message", "edit_number")
op.drop_column("chat_message", "latest")
op.drop_column("chat_message", "message_number")
op.add_column("chat_session", sa.Column("one_shot", sa.Boolean(), nullable=True))
op.execute("UPDATE chat_session SET one_shot = TRUE")
op.alter_column("chat_session", "one_shot", nullable=False)
op.alter_column(
"chat_session",
"persona_id",
existing_type=sa.INTEGER(),
nullable=True,
)
op.execute("UPDATE chat_session SET persona_id = 0")
op.alter_column("chat_session", "persona_id", nullable=False)
op.add_column(
"document_retrieval_feedback",
sa.Column("chat_message_id", sa.Integer(), nullable=False),
)
op.drop_constraint(
"document_retrieval_feedback_qa_event_id_fkey",
"document_retrieval_feedback",
type_="foreignkey",
)
op.create_foreign_key(
"document_retrieval_feedback__chat_message_fk",
"document_retrieval_feedback",
"chat_message",
["chat_message_id"],
["id"],
)
op.drop_column("document_retrieval_feedback", "qa_event_id")
# Relation table must be created after the other tables are correct
op.create_table(
"chat_message__search_doc",
sa.Column("chat_message_id", sa.Integer(), nullable=False),
sa.Column("search_doc_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["chat_message_id"],
["chat_message.id"],
),
sa.ForeignKeyConstraint(
["search_doc_id"],
["search_doc.id"],
),
sa.PrimaryKeyConstraint("chat_message_id", "search_doc_id"),
)
# Needs to be created after chat_message id field is added
op.create_foreign_key(
"chat_feedback__chat_message_fk",
"chat_feedback",
"chat_message",
["chat_message_id"],
["id"],
)
op.drop_table("query_event")
def downgrade() -> None:
op.drop_constraint(
"chat_feedback__chat_message_fk", "chat_feedback", type_="foreignkey"
)
op.drop_constraint(
"document_retrieval_feedback__chat_message_fk",
"document_retrieval_feedback",
type_="foreignkey",
)
op.drop_constraint("persona__user_fk", "persona", type_="foreignkey")
op.drop_constraint("chat_message__prompt_fk", "chat_message", type_="foreignkey")
op.drop_constraint(
"chat_message__search_doc_chat_message_id_fkey",
"chat_message__search_doc",
type_="foreignkey",
)
op.add_column(
"persona",
sa.Column("system_text", sa.TEXT(), autoincrement=False, nullable=True),
)
op.add_column(
"persona",
sa.Column(
"retrieval_enabled",
sa.BOOLEAN(),
autoincrement=False,
nullable=True,
),
)
op.execute("UPDATE persona SET retrieval_enabled = TRUE")
op.alter_column("persona", "retrieval_enabled", nullable=False)
op.add_column(
"persona",
sa.Column(
"apply_llm_relevance_filter",
sa.BOOLEAN(),
autoincrement=False,
nullable=True,
),
)
op.add_column(
"persona",
sa.Column("hint_text", sa.TEXT(), autoincrement=False, nullable=True),
)
op.add_column(
"persona",
sa.Column(
"tools",
postgresql.JSONB(astext_type=sa.Text()),
autoincrement=False,
nullable=True,
),
)
op.add_column(
"persona",
sa.Column("datetime_aware", sa.BOOLEAN(), autoincrement=False, nullable=True),
)
op.execute("UPDATE persona SET datetime_aware = TRUE")
op.alter_column("persona", "datetime_aware", nullable=False)
op.alter_column("persona", "description", existing_type=sa.VARCHAR(), nullable=True)
op.drop_column("persona", "recency_bias")
op.drop_column("persona", "llm_filter_extraction")
op.drop_column("persona", "llm_relevance_filter")
op.drop_column("persona", "search_type")
op.drop_column("persona", "user_id")
op.add_column(
"document_retrieval_feedback",
sa.Column("qa_event_id", sa.INTEGER(), autoincrement=False, nullable=False),
)
op.drop_column("document_retrieval_feedback", "chat_message_id")
op.alter_column(
"chat_session", "persona_id", existing_type=sa.INTEGER(), nullable=True
)
op.drop_column("chat_session", "one_shot")
op.add_column(
"chat_message",
sa.Column(
"message_number",
sa.INTEGER(),
autoincrement=False,
nullable=False,
primary_key=True,
),
)
op.add_column(
"chat_message",
sa.Column("latest", sa.BOOLEAN(), autoincrement=False, nullable=False),
)
op.add_column(
"chat_message",
sa.Column(
"edit_number",
sa.INTEGER(),
autoincrement=False,
nullable=False,
primary_key=True,
),
)
op.add_column(
"chat_message",
sa.Column(
"reference_docs",
postgresql.JSONB(astext_type=sa.Text()),
autoincrement=False,
nullable=True,
),
)
op.add_column(
"chat_message",
sa.Column("persona_id", sa.INTEGER(), autoincrement=False, nullable=True),
)
op.add_column(
"chat_message",
sa.Column(
"parent_edit_number",
sa.INTEGER(),
autoincrement=False,
nullable=True,
),
)
op.create_foreign_key(
"fk_chat_message_persona_id",
"chat_message",
"persona",
["persona_id"],
["id"],
)
op.drop_column("chat_message", "error")
op.drop_column("chat_message", "citations")
op.drop_column("chat_message", "prompt_id")
op.drop_column("chat_message", "rephrased_query")
op.drop_column("chat_message", "latest_child_message")
op.drop_column("chat_message", "parent_message")
op.drop_column("chat_message", "id")
op.add_column(
"chat_feedback",
sa.Column(
"chat_message_message_number",
sa.INTEGER(),
autoincrement=False,
nullable=False,
),
)
op.add_column(
"chat_feedback",
sa.Column(
"chat_message_chat_session_id",
sa.INTEGER(),
autoincrement=False,
nullable=False,
primary_key=True,
),
)
op.add_column(
"chat_feedback",
sa.Column(
"chat_message_edit_number",
sa.INTEGER(),
autoincrement=False,
nullable=False,
),
)
op.drop_column("chat_feedback", "chat_message_id")
op.create_table(
"query_event",
sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
sa.Column("query", sa.VARCHAR(), autoincrement=False, nullable=False),
sa.Column(
"selected_search_flow",
sa.VARCHAR(),
autoincrement=False,
nullable=True,
),
sa.Column("llm_answer", sa.VARCHAR(), autoincrement=False, nullable=True),
sa.Column("feedback", sa.VARCHAR(), autoincrement=False, nullable=True),
sa.Column("user_id", sa.UUID(), autoincrement=False, nullable=True),
sa.Column(
"time_created",
postgresql.TIMESTAMP(timezone=True),
server_default=sa.text("now()"),
autoincrement=False,
nullable=False,
),
sa.Column(
"retrieved_document_ids",
postgresql.ARRAY(sa.VARCHAR()),
autoincrement=False,
nullable=True,
),
sa.Column("chat_session_id", sa.INTEGER(), autoincrement=False, nullable=True),
sa.ForeignKeyConstraint(
["chat_session_id"],
["chat_session.id"],
name="fk_query_event_chat_session_id",
),
sa.ForeignKeyConstraint(
["user_id"], ["user.id"], name="query_event_user_id_fkey"
),
sa.PrimaryKeyConstraint("id", name="query_event_pkey"),
)
op.drop_table("chat_message__search_doc")
op.drop_table("persona__prompt")
op.drop_table("prompt")
op.drop_table("search_doc")
op.create_unique_constraint(
"uq_chat_message_combination",
"chat_message",
["chat_session_id", "message_number", "edit_number"],
)
op.create_foreign_key(
"chat_feedback_chat_message_chat_session_id_chat_message_me_fkey",
"chat_feedback",
"chat_message",
[
"chat_message_chat_session_id",
"chat_message_message_number",
"chat_message_edit_number",
],
["chat_session_id", "message_number", "edit_number"],
)
op.create_foreign_key(
"document_retrieval_feedback_qa_event_id_fkey",
"document_retrieval_feedback",
"query_event",
["qa_event_id"],
["id"],
)
op.execute("DROP TYPE IF EXISTS searchtype")
op.execute("DROP TYPE IF EXISTS recencybiassetting")
op.execute("DROP TYPE IF EXISTS documentsource")

View File
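
The reworked schema stores each chat session as a tree of messages: parent_message points upward, and latest_child_message marks which child sits on the currently selected branch. A small, self-contained sketch of rebuilding that branch from rows already loaded into plain dicts (the "message" text field is assumed for illustration; the two pointer columns come from the migration above):

def latest_branch(messages_by_id: dict[int, dict], root_id: int) -> list[dict]:
    """Follow latest_child_message pointers from the root message (the row whose
    parent_message is NULL) to rebuild the currently selected branch, oldest first."""
    branch: list[dict] = []
    current_id: int | None = root_id
    while current_id is not None:
        message = messages_by_id[current_id]
        branch.append(message)
        # latest_child_message is NULL at the tip of the branch.
        current_id = message.get("latest_child_message")
    return branch

# Example usage with three hand-built rows mirroring the new columns.
rows = {
    1: {"id": 1, "parent_message": None, "latest_child_message": 2, "message": "hi"},
    2: {"id": 2, "parent_message": 1, "latest_child_message": 3, "message": "hello!"},
    3: {"id": 3, "parent_message": 2, "latest_child_message": None, "message": "thanks"},
}
print([m["message"] for m in latest_branch(rows, root_id=1)])  # ['hi', 'hello!', 'thanks']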

@@ -0,0 +1,26 @@
"""Add llm_model_version_override to Persona
Revision ID: baf71f781b9e
Revises: 50b683a8295c
Create Date: 2023-12-06 21:56:50.286158
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "baf71f781b9e"
down_revision = "50b683a8295c"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"persona",
sa.Column("llm_model_version_override", sa.String(), nullable=True),
)
def downgrade() -> None:
op.drop_column("persona", "llm_model_version_override")

View File

@@ -0,0 +1,73 @@
"""Remove deletion_attempt table
Revision ID: d5645c915d0e
Revises: 8e26726b7683
Create Date: 2023-09-14 15:04:14.444909
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "d5645c915d0e"
down_revision = "8e26726b7683"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.drop_table("deletion_attempt")
def downgrade() -> None:
op.create_table(
"deletion_attempt",
sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
sa.Column("connector_id", sa.INTEGER(), autoincrement=False, nullable=False),
sa.Column("credential_id", sa.INTEGER(), autoincrement=False, nullable=False),
sa.Column(
"status",
postgresql.ENUM(
"NOT_STARTED",
"IN_PROGRESS",
"SUCCESS",
"FAILED",
name="deletionstatus",
),
autoincrement=False,
nullable=False,
),
sa.Column(
"num_docs_deleted",
sa.INTEGER(),
autoincrement=False,
nullable=False,
),
sa.Column("error_msg", sa.VARCHAR(), autoincrement=False, nullable=True),
sa.Column(
"time_created",
postgresql.TIMESTAMP(timezone=True),
server_default=sa.text("now()"),
autoincrement=False,
nullable=False,
),
sa.Column(
"time_updated",
postgresql.TIMESTAMP(timezone=True),
server_default=sa.text("now()"),
autoincrement=False,
nullable=False,
),
sa.ForeignKeyConstraint(
["connector_id"],
["connector.id"],
name="deletion_attempt_connector_id_fkey",
),
sa.ForeignKeyConstraint(
["credential_id"],
["credential.id"],
name="deletion_attempt_credential_id_fkey",
),
sa.PrimaryKeyConstraint("id", name="deletion_attempt_pkey"),
)

View File

@@ -0,0 +1,32 @@
"""Add Total Docs for Index Attempt
Revision ID: d61e513bef0a
Revises: 46625e4745d4
Create Date: 2023-10-27 23:02:43.369964
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "d61e513bef0a"
down_revision = "46625e4745d4"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"index_attempt",
sa.Column("new_docs_indexed", sa.Integer(), nullable=True),
)
op.alter_column(
"index_attempt", "num_docs_indexed", new_column_name="total_docs_indexed"
)
def downgrade() -> None:
op.alter_column(
"index_attempt", "total_docs_indexed", new_column_name="num_docs_indexed"
)
op.drop_column("index_attempt", "new_docs_indexed")

View File

@@ -0,0 +1,32 @@
"""Remove Document IDs
Revision ID: d7111c1238cd
Revises: 465f78d9b7f9
Create Date: 2023-07-29 15:06:25.126169
"""
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "d7111c1238cd"
down_revision = "465f78d9b7f9"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.drop_column("index_attempt", "document_ids")
def downgrade() -> None:
op.add_column(
"index_attempt",
sa.Column(
"document_ids",
postgresql.ARRAY(sa.VARCHAR()),
autoincrement=False,
nullable=True,
),
)

View File

@@ -0,0 +1,94 @@
"""Feedback Feature
Revision ID: d929f0c1c6af
Revises: 8aabb57f3b49
Create Date: 2023-08-27 13:03:54.274987
"""
import fastapi_users_db_sqlalchemy
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "d929f0c1c6af"
down_revision = "8aabb57f3b49"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"query_event",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("query", sa.String(), nullable=False),
sa.Column(
"selected_search_flow",
sa.Enum("KEYWORD", "SEMANTIC", name="searchtype", native_enum=False),
nullable=True,
),
sa.Column("llm_answer", sa.String(), nullable=True),
sa.Column(
"feedback",
sa.Enum("LIKE", "DISLIKE", name="qafeedbacktype", native_enum=False),
nullable=True,
),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=True,
),
sa.Column(
"time_created",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"document_retrieval_feedback",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("qa_event_id", sa.Integer(), nullable=False),
sa.Column("document_id", sa.String(), nullable=False),
sa.Column("document_rank", sa.Integer(), nullable=False),
sa.Column("clicked", sa.Boolean(), nullable=False),
sa.Column(
"feedback",
sa.Enum(
"ENDORSE",
"REJECT",
"HIDE",
"UNHIDE",
name="searchfeedbacktype",
native_enum=False,
),
nullable=True,
),
sa.ForeignKeyConstraint(
["document_id"],
["document.id"],
),
sa.ForeignKeyConstraint(
["qa_event_id"],
["query_event.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.add_column("document", sa.Column("boost", sa.Integer(), nullable=False))
op.add_column("document", sa.Column("hidden", sa.Boolean(), nullable=False))
op.add_column("document", sa.Column("semantic_id", sa.String(), nullable=False))
op.add_column("document", sa.Column("link", sa.String(), nullable=True))
def downgrade() -> None:
op.drop_column("document", "link")
op.drop_column("document", "semantic_id")
op.drop_column("document", "hidden")
op.drop_column("document", "boost")
op.drop_table("document_retrieval_feedback")
op.drop_table("query_event")

View File

@@ -0,0 +1,29 @@
"""Danswer Custom Tool Flow
Revision ID: dba7f71618f5
Revises: d5645c915d0e
Create Date: 2023-09-18 15:18:37.370972
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "dba7f71618f5"
down_revision = "d5645c915d0e"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"persona",
sa.Column("retrieval_enabled", sa.Boolean(), nullable=True),
)
op.execute("UPDATE persona SET retrieval_enabled = true")
op.alter_column("persona", "retrieval_enabled", nullable=False)
def downgrade() -> None:
op.drop_column("persona", "retrieval_enabled")

View File

@@ -0,0 +1,111 @@
"""Added deletion_attempt table
Revision ID: df0c7ad8a076
Revises: d7111c1238cd
Create Date: 2023-08-05 13:35:39.609619
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "df0c7ad8a076"
down_revision = "d7111c1238cd"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"document",
sa.Column("id", sa.String(), nullable=False),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"chunk",
sa.Column("id", sa.String(), nullable=False),
sa.Column(
"document_store_type",
sa.Enum(
"VECTOR",
"KEYWORD",
name="documentstoretype",
native_enum=False,
),
nullable=False,
),
sa.Column("document_id", sa.String(), nullable=False),
sa.ForeignKeyConstraint(
["document_id"],
["document.id"],
),
sa.PrimaryKeyConstraint("id", "document_store_type"),
)
op.create_table(
"deletion_attempt",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("connector_id", sa.Integer(), nullable=False),
sa.Column("credential_id", sa.Integer(), nullable=False),
sa.Column(
"status",
sa.Enum(
"NOT_STARTED",
"IN_PROGRESS",
"SUCCESS",
"FAILED",
name="deletionstatus",
native_enum=False,
),
nullable=False,
),
sa.Column("num_docs_deleted", sa.Integer(), nullable=False),
sa.Column("error_msg", sa.String(), nullable=True),
sa.Column(
"time_created",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.Column(
"time_updated",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.ForeignKeyConstraint(
["connector_id"],
["connector.id"],
),
sa.ForeignKeyConstraint(
["credential_id"],
["credential.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"document_by_connector_credential_pair",
sa.Column("id", sa.String(), nullable=False),
sa.Column("connector_id", sa.Integer(), nullable=False),
sa.Column("credential_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["connector_id"],
["connector.id"],
),
sa.ForeignKeyConstraint(
["credential_id"],
["credential.id"],
),
sa.ForeignKeyConstraint(
["id"],
["document.id"],
),
sa.PrimaryKeyConstraint("id", "connector_id", "credential_id"),
)
def downgrade() -> None:
op.drop_table("document_by_connector_credential_pair")
op.drop_table("deletion_attempt")
op.drop_table("chunk")
op.drop_table("document")

View File

@@ -0,0 +1,44 @@
"""Add Chat Feedback
Revision ID: e0a68a81d434
Revises: ae62505e3acc
Create Date: 2023-10-04 20:22:33.380286
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "e0a68a81d434"
down_revision = "ae62505e3acc"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"chat_feedback",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("chat_message_chat_session_id", sa.Integer(), nullable=False),
sa.Column("chat_message_message_number", sa.Integer(), nullable=False),
sa.Column("chat_message_edit_number", sa.Integer(), nullable=False),
sa.Column("is_positive", sa.Boolean(), nullable=True),
sa.Column("feedback_text", sa.Text(), nullable=True),
sa.ForeignKeyConstraint(
[
"chat_message_chat_session_id",
"chat_message_message_number",
"chat_message_edit_number",
],
[
"chat_message.chat_session_id",
"chat_message.message_number",
"chat_message.edit_number",
],
),
sa.PrimaryKeyConstraint("id"),
)
def downgrade() -> None:
op.drop_table("chat_feedback")

View File

@@ -0,0 +1,31 @@
"""Add index for retrieving latest index_attempt
Revision ID: e6a4bbc13fe4
Revises: b082fec533f0
Create Date: 2023-08-10 12:37:23.335471
"""
from alembic import op
# revision identifiers, used by Alembic.
revision = "e6a4bbc13fe4"
down_revision = "b082fec533f0"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_index(
op.f("ix_index_attempt_latest_for_connector_credential_pair"),
"index_attempt",
["connector_id", "credential_id", "time_created"],
unique=False,
)
def downgrade() -> None:
op.drop_index(
op.f("ix_index_attempt_latest_for_connector_credential_pair"),
table_name="index_attempt",
)

View File

@@ -0,0 +1,27 @@
"""Add persona to chat_session
Revision ID: e86866a9c78a
Revises: 80696cf850ae
Create Date: 2023-11-26 02:51:47.657357
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "e86866a9c78a"
down_revision = "80696cf850ae"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column("chat_session", sa.Column("persona_id", sa.Integer(), nullable=True))
op.create_foreign_key(
"fk_chat_session_persona_id", "chat_session", "persona", ["persona_id"], ["id"]
)
def downgrade() -> None:
op.drop_constraint("fk_chat_session_persona_id", "chat_session", type_="foreignkey")
op.drop_column("chat_session", "persona_id")

View File

@@ -0,0 +1,37 @@
"""Add document_set / persona relationship table
Revision ID: febe9eaa0644
Revises: 57b53544726e
Create Date: 2023-09-24 13:06:24.018610
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "febe9eaa0644"
down_revision = "57b53544726e"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"persona__document_set",
sa.Column("persona_id", sa.Integer(), nullable=False),
sa.Column("document_set_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["document_set_id"],
["document_set.id"],
),
sa.ForeignKeyConstraint(
["persona_id"],
["persona.id"],
),
sa.PrimaryKeyConstraint("persona_id", "document_set_id"),
)
def downgrade() -> None:
op.drop_table("persona__document_set")

View File

@@ -0,0 +1,37 @@
"""Basic Document Metadata
Revision ID: ffc707a226b4
Revises: 30c1d5744104
Create Date: 2023-10-18 16:52:25.967592
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "ffc707a226b4"
down_revision = "30c1d5744104"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.add_column(
"document",
sa.Column("doc_updated_at", sa.DateTime(timezone=True), nullable=True),
)
op.add_column(
"document",
sa.Column("primary_owners", postgresql.ARRAY(sa.String()), nullable=True),
)
op.add_column(
"document",
sa.Column("secondary_owners", postgresql.ARRAY(sa.String()), nullable=True),
)
def downgrade() -> None:
op.drop_column("document", "secondary_owners")
op.drop_column("document", "primary_owners")
op.drop_column("document", "doc_updated_at")

View File

@@ -0,0 +1,3 @@
import os
__version__ = os.environ.get("DANSWER_VERSION", "") or "0.3-dev"

View File

View File

@@ -0,0 +1,62 @@
from sqlalchemy.orm import Session
from danswer.access.models import DocumentAccess
from danswer.configs.constants import PUBLIC_DOC_PAT
from danswer.db.document import get_acccess_info_for_documents
from danswer.db.models import User
from danswer.server.documents.models import ConnectorCredentialPairIdentifier
from danswer.utils.variable_functionality import fetch_versioned_implementation
def _get_access_for_documents(
document_ids: list[str],
db_session: Session,
cc_pair_to_delete: ConnectorCredentialPairIdentifier | None = None,
) -> dict[str, DocumentAccess]:
document_access_info = get_acccess_info_for_documents(
db_session=db_session,
document_ids=document_ids,
cc_pair_to_delete=cc_pair_to_delete,
)
return {
document_id: DocumentAccess.build(user_ids, is_public)
for document_id, user_ids, is_public in document_access_info
}
def get_access_for_documents(
document_ids: list[str],
db_session: Session,
cc_pair_to_delete: ConnectorCredentialPairIdentifier | None = None,
) -> dict[str, DocumentAccess]:
"""Fetches all access information for the given documents."""
versioned_get_access_for_documents_fn = fetch_versioned_implementation(
"danswer.access.access", "_get_access_for_documents"
)
return versioned_get_access_for_documents_fn(
document_ids, db_session, cc_pair_to_delete
) # type: ignore
def prefix_user(user_id: str) -> str:
"""Prefixes a user ID to eliminate collision with group names.
This assumes that group names use a different prefix."""
return f"user_id:{user_id}"
def _get_acl_for_user(user: User | None, db_session: Session) -> set[str]:
"""Returns a list of ACL entries that the user has access to. This is meant to be
used downstream to filter out documents that the user does not have access to. The
user should have access to a document if at least one entry in the document's ACL
matches one entry in the returned set.
"""
if user:
return {prefix_user(str(user.id)), PUBLIC_DOC_PAT}
return {PUBLIC_DOC_PAT}
def get_acl_for_user(user: User | None, db_session: Session | None = None) -> set[str]:
versioned_acl_for_user_fn = fetch_versioned_implementation(
"danswer.access.access", "_get_acl_for_user"
)
return versioned_acl_for_user_fn(user, db_session) # type: ignore

View File

@@ -0,0 +1,20 @@
from dataclasses import dataclass
from uuid import UUID
from danswer.configs.constants import PUBLIC_DOC_PAT
@dataclass(frozen=True)
class DocumentAccess:
user_ids: set[str] # stringified UUIDs
is_public: bool
def to_acl(self) -> list[str]:
return list(self.user_ids) + ([PUBLIC_DOC_PAT] if self.is_public else [])
@classmethod
def build(cls, user_ids: list[UUID | None], is_public: bool) -> "DocumentAccess":
return cls(
user_ids={str(user_id) for user_id in user_ids if user_id},
is_public=is_public,
)

View File
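
Putting the two access modules together: a document's ACL is whatever DocumentAccess.to_acl() returns, the user's ACL is their prefixed user id plus the public marker, and access is granted when the two sets overlap. A minimal sketch, assuming the danswer package is importable (the uuid value is made up):

import uuid

from danswer.access.access import prefix_user
from danswer.access.models import DocumentAccess
from danswer.configs.constants import PUBLIC_DOC_PAT

def has_access(document_acl: list[str], user_acl: set[str]) -> bool:
    """A document is visible when at least one of its ACL entries matches the user's ACL."""
    return bool(set(document_acl) & user_acl)

owner_id = uuid.uuid4()
private_doc = DocumentAccess.build(user_ids=[owner_id], is_public=False)
public_doc = DocumentAccess.build(user_ids=[], is_public=True)

# Mirrors _get_acl_for_user for a logged-in user: their own prefixed id plus the public marker.
user_acl = {prefix_user(str(owner_id)), PUBLIC_DOC_PAT}

print(has_access(private_doc.to_acl(), user_acl))  # True: the owner's entry matches
print(has_access(public_doc.to_acl(), user_acl))   # True: the public marker matches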

View File

@@ -0,0 +1,21 @@
import uuid
from enum import Enum
from fastapi_users import schemas
class UserRole(str, Enum):
BASIC = "basic"
ADMIN = "admin"
class UserRead(schemas.BaseUser[uuid.UUID]):
role: UserRole
class UserCreate(schemas.BaseUserCreate):
role: UserRole = UserRole.BASIC
class UserUpdate(schemas.BaseUserUpdate):
role: UserRole

View File

@@ -0,0 +1,316 @@
import os
import smtplib
import uuid
from collections.abc import AsyncGenerator
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Optional
from typing import Tuple
from fastapi import APIRouter
from fastapi import Depends
from fastapi import HTTPException
from fastapi import Request
from fastapi import Response
from fastapi import status
from fastapi_users import BaseUserManager
from fastapi_users import FastAPIUsers
from fastapi_users import models
from fastapi_users import schemas
from fastapi_users import UUIDIDMixin
from fastapi_users.authentication import AuthenticationBackend
from fastapi_users.authentication import CookieTransport
from fastapi_users.authentication import Strategy
from fastapi_users.authentication.strategy.db import AccessTokenDatabase
from fastapi_users.authentication.strategy.db import DatabaseStrategy
from fastapi_users.db import SQLAlchemyUserDatabase
from fastapi_users.openapi import OpenAPIResponseType
from sqlalchemy.orm import Session
from danswer.auth.schemas import UserCreate
from danswer.auth.schemas import UserRole
from danswer.configs.app_configs import AUTH_TYPE
from danswer.configs.app_configs import DISABLE_AUTH
from danswer.configs.app_configs import REQUIRE_EMAIL_VERIFICATION
from danswer.configs.app_configs import SECRET
from danswer.configs.app_configs import SESSION_EXPIRE_TIME_SECONDS
from danswer.configs.app_configs import SMTP_PASS
from danswer.configs.app_configs import SMTP_PORT
from danswer.configs.app_configs import SMTP_SERVER
from danswer.configs.app_configs import SMTP_USER
from danswer.configs.app_configs import VALID_EMAIL_DOMAINS
from danswer.configs.app_configs import WEB_DOMAIN
from danswer.configs.constants import AuthType
from danswer.db.auth import get_access_token_db
from danswer.db.auth import get_user_count
from danswer.db.auth import get_user_db
from danswer.db.engine import get_session
from danswer.db.models import AccessToken
from danswer.db.models import User
from danswer.utils.logger import setup_logger
from danswer.utils.telemetry import optional_telemetry
from danswer.utils.telemetry import RecordType
from danswer.utils.variable_functionality import fetch_versioned_implementation
logger = setup_logger()
USER_WHITELIST_FILE = "/home/danswer_whitelist.txt"
_user_whitelist: list[str] | None = None
def verify_auth_setting() -> None:
if AUTH_TYPE not in [AuthType.DISABLED, AuthType.BASIC, AuthType.GOOGLE_OAUTH]:
raise ValueError(
"User must choose a valid user authentication method: "
"disabled, basic, or google_oauth"
)
logger.info(f"Using Auth Type: {AUTH_TYPE.value}")
def user_needs_to_be_verified() -> bool:
# all other auth types besides basic should require users to be
# verified
return AUTH_TYPE != AuthType.BASIC or REQUIRE_EMAIL_VERIFICATION
def get_user_whitelist() -> list[str]:
global _user_whitelist
if _user_whitelist is None:
if os.path.exists(USER_WHITELIST_FILE):
with open(USER_WHITELIST_FILE, "r") as file:
_user_whitelist = [line.strip() for line in file]
else:
_user_whitelist = []
return _user_whitelist
def verify_email_in_whitelist(email: str) -> None:
whitelist = get_user_whitelist()
if (whitelist and email not in whitelist) or not email:
raise PermissionError("User not on allowed user whitelist")
def verify_email_domain(email: str) -> None:
if VALID_EMAIL_DOMAINS:
if email.count("@") != 1:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Email is not valid",
)
domain = email.split("@")[-1]
if domain not in VALID_EMAIL_DOMAINS:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Email domain is not valid",
)
def send_user_verification_email(user_email: str, token: str) -> None:
msg = MIMEMultipart()
msg["Subject"] = "Danswer Email Verification"
msg["To"] = user_email
link = f"{WEB_DOMAIN}/auth/verify-email?token={token}"
body = MIMEText(f"Click the following link to verify your email address: {link}")
msg.attach(body)
with smtplib.SMTP(SMTP_SERVER, SMTP_PORT) as s:
s.starttls()
# If authentication fails with Gmail, note that an app password is required, not the regular account password
# https://support.google.com/accounts/answer/185833?sjid=8512343437447396151-NA
s.login(SMTP_USER, SMTP_PASS)
s.send_message(msg)
class UserManager(UUIDIDMixin, BaseUserManager[User, uuid.UUID]):
reset_password_token_secret = SECRET
verification_token_secret = SECRET
async def create(
self,
user_create: schemas.UC | UserCreate,
safe: bool = False,
request: Optional[Request] = None,
) -> models.UP:
verify_email_in_whitelist(user_create.email)
verify_email_domain(user_create.email)
if hasattr(user_create, "role"):
user_count = await get_user_count()
if user_count == 0:
user_create.role = UserRole.ADMIN
else:
user_create.role = UserRole.BASIC
return await super().create(user_create, safe=safe, request=request) # type: ignore
async def oauth_callback(
self: "BaseUserManager[models.UOAP, models.ID]",
oauth_name: str,
access_token: str,
account_id: str,
account_email: str,
expires_at: Optional[int] = None,
refresh_token: Optional[str] = None,
request: Optional[Request] = None,
*,
associate_by_email: bool = False,
is_verified_by_default: bool = False,
) -> models.UOAP:
verify_email_in_whitelist(account_email)
verify_email_domain(account_email)
return await super().oauth_callback( # type: ignore
oauth_name=oauth_name,
access_token=access_token,
account_id=account_id,
account_email=account_email,
expires_at=expires_at,
refresh_token=refresh_token,
request=request,
associate_by_email=associate_by_email,
is_verified_by_default=is_verified_by_default,
)
async def on_after_register(
self, user: User, request: Optional[Request] = None
) -> None:
logger.info(f"User {user.id} has registered.")
optional_telemetry(record_type=RecordType.SIGN_UP, data={"user": "create"})
async def on_after_forgot_password(
self, user: User, token: str, request: Optional[Request] = None
) -> None:
logger.info(f"User {user.id} has forgot their password. Reset token: {token}")
async def on_after_request_verify(
self, user: User, token: str, request: Optional[Request] = None
) -> None:
verify_email_domain(user.email)
logger.info(
f"Verification requested for user {user.id}. Verification token: {token}"
)
send_user_verification_email(user.email, token)
async def get_user_manager(
user_db: SQLAlchemyUserDatabase = Depends(get_user_db),
) -> AsyncGenerator[UserManager, None]:
yield UserManager(user_db)
cookie_transport = CookieTransport(cookie_max_age=SESSION_EXPIRE_TIME_SECONDS)
def get_database_strategy(
access_token_db: AccessTokenDatabase[AccessToken] = Depends(get_access_token_db),
) -> DatabaseStrategy:
return DatabaseStrategy(
access_token_db, lifetime_seconds=SESSION_EXPIRE_TIME_SECONDS # type: ignore
)
auth_backend = AuthenticationBackend(
name="database",
transport=cookie_transport,
get_strategy=get_database_strategy,
)
class FastAPIUserWithLogoutRouter(FastAPIUsers[models.UP, models.ID]):
def get_logout_router(
self,
backend: AuthenticationBackend,
requires_verification: bool = REQUIRE_EMAIL_VERIFICATION,
) -> APIRouter:
"""
Provide a router for logout only for OAuth/OIDC Flows.
This way the login router does not need to be included
"""
router = APIRouter()
get_current_user_token = self.authenticator.current_user_token(
active=True, verified=requires_verification
)
logout_responses: OpenAPIResponseType = {
**{
status.HTTP_401_UNAUTHORIZED: {
"description": "Missing token or inactive user."
}
},
**backend.transport.get_openapi_logout_responses_success(),
}
@router.post(
"/logout", name=f"auth:{backend.name}.logout", responses=logout_responses
)
async def logout(
user_token: Tuple[models.UP, str] = Depends(get_current_user_token),
strategy: Strategy[models.UP, models.ID] = Depends(backend.get_strategy),
) -> Response:
user, token = user_token
return await backend.logout(strategy, user, token)
return router
fastapi_users = FastAPIUserWithLogoutRouter[User, uuid.UUID](
get_user_manager, [auth_backend]
)
# NOTE: verified=REQUIRE_EMAIL_VERIFICATION is not used here since we
# take care of that in `double_check_user` ourself. This is needed, since
# we want the /me endpoint to still return a user even if they are not
# yet verified, so that the frontend knows they exist
optional_valid_user = fastapi_users.current_user(active=True, optional=True)
async def double_check_user(
request: Request,
user: User | None,
db_session: Session,
optional: bool = DISABLE_AUTH,
) -> User | None:
if optional:
return None
if user is None:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Access denied. User is not authenticated.",
)
if user_needs_to_be_verified() and not user.is_verified:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Access denied. User is not verified.",
)
return user
async def current_user(
request: Request,
user: User | None = Depends(optional_valid_user),
db_session: Session = Depends(get_session),
) -> User | None:
double_check_user = fetch_versioned_implementation(
"danswer.auth.users", "double_check_user"
)
user = await double_check_user(request, user, db_session)
return user
async def current_admin_user(user: User | None = Depends(current_user)) -> User | None:
if DISABLE_AUTH:
return None
if not user or not hasattr(user, "role") or user.role != UserRole.ADMIN:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Access denied. User is not an admin.",
)
return user

View File
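
For a sense of how these pieces are consumed, the dependencies defined above plug into FastAPI routes in the usual Depends style, and the fastapi_users instance can build the cookie login/logout routers from auth_backend. A minimal sketch, assuming the danswer package is importable; the /me and /admin/ping routes are invented for illustration:

from fastapi import Depends, FastAPI

from danswer.auth.users import auth_backend, current_admin_user, current_user, fastapi_users
from danswer.db.models import User

app = FastAPI()

# Cookie-based login/logout endpoints built from the database-backed auth backend above.
app.include_router(fastapi_users.get_auth_router(auth_backend), prefix="/auth", tags=["auth"])

@app.get("/me")
async def who_am_i(user: User | None = Depends(current_user)) -> dict:
    # When auth is disabled, current_user returns None instead of raising.
    return {"user_id": str(user.id) if user else None}

@app.get("/admin/ping")
async def admin_ping(_: User | None = Depends(current_admin_user)) -> dict:
    # current_admin_user raises a 403 for any non-admin user.
    return {"status": "ok"}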

@@ -0,0 +1,230 @@
import os
from datetime import timedelta
from pathlib import Path
from typing import cast
from celery import Celery # type: ignore
from sqlalchemy.orm import Session
from danswer.background.connector_deletion import delete_connector_credential_pair
from danswer.background.task_utils import build_celery_task_wrapper
from danswer.background.task_utils import name_cc_cleanup_task
from danswer.background.task_utils import name_document_set_sync_task
from danswer.configs.app_configs import FILE_CONNECTOR_TMP_STORAGE_PATH
from danswer.configs.app_configs import JOB_TIMEOUT
from danswer.connectors.file.utils import file_age_in_hours
from danswer.db.connector_credential_pair import get_connector_credential_pair
from danswer.db.deletion_attempt import check_deletion_attempt_is_allowed
from danswer.db.document import prepare_to_modify_documents
from danswer.db.document_set import delete_document_set
from danswer.db.document_set import fetch_document_sets
from danswer.db.document_set import fetch_document_sets_for_documents
from danswer.db.document_set import fetch_documents_for_document_set
from danswer.db.document_set import get_document_set_by_id
from danswer.db.document_set import mark_document_set_as_synced
from danswer.db.engine import build_connection_string
from danswer.db.engine import get_sqlalchemy_engine
from danswer.db.engine import SYNC_DB_API
from danswer.db.models import DocumentSet
from danswer.db.tasks import check_live_task_not_timed_out
from danswer.db.tasks import get_latest_task
from danswer.document_index.factory import get_default_document_index
from danswer.document_index.interfaces import DocumentIndex
from danswer.document_index.interfaces import UpdateRequest
from danswer.utils.batching import batch_generator
from danswer.utils.logger import setup_logger
logger = setup_logger()
connection_string = build_connection_string(db_api=SYNC_DB_API)
celery_broker_url = f"sqla+{connection_string}"
celery_backend_url = f"db+{connection_string}"
celery_app = Celery(__name__, broker=celery_broker_url, backend=celery_backend_url)
_SYNC_BATCH_SIZE = 1000
#####
# Tasks that need to be run in job queue, registered via APIs
#
# If imports from this module are needed, use local imports to avoid circular importing
#####
@build_celery_task_wrapper(name_cc_cleanup_task)
@celery_app.task(soft_time_limit=JOB_TIMEOUT)
def cleanup_connector_credential_pair_task(
connector_id: int,
credential_id: int,
) -> int:
"""Connector deletion task. This is run as an async task because it is a somewhat slow job.
Needs to potentially update a large number of Postgres and Vespa docs, including deleting them
or updating the ACL"""
engine = get_sqlalchemy_engine()
with Session(engine) as db_session:
# validate that the connector / credential pair is deletable
cc_pair = get_connector_credential_pair(
db_session=db_session,
connector_id=connector_id,
credential_id=credential_id,
)
if not cc_pair or not check_deletion_attempt_is_allowed(
connector_credential_pair=cc_pair
):
raise ValueError(
"Cannot run deletion attempt - connector_credential_pair is not deletable. "
"This is likely because there is an ongoing / planned indexing attempt OR the "
"connector is not disabled."
)
try:
# The bulk of the work is in here, updates Postgres and Vespa
return delete_connector_credential_pair(
db_session=db_session,
document_index=get_default_document_index(),
cc_pair=cc_pair,
)
except Exception as e:
logger.exception(f"Failed to run connector_deletion due to {e}")
raise e
@build_celery_task_wrapper(name_document_set_sync_task)
@celery_app.task(soft_time_limit=JOB_TIMEOUT)
def sync_document_set_task(document_set_id: int) -> None:
"""For document sets marked as not up to date, sync the state from postgres
into the datastore. Also handles deletions."""
def _sync_document_batch(
document_ids: list[str], document_index: DocumentIndex
) -> None:
logger.debug(f"Syncing document sets for: {document_ids}")
# begin a transaction, release lock at the end
with Session(get_sqlalchemy_engine()) as db_session:
# acquires a lock on the documents so that no other process can modify them
prepare_to_modify_documents(
db_session=db_session, document_ids=document_ids
)
# get current state of document sets for these documents
document_set_map = {
document_id: document_sets
for document_id, document_sets in fetch_document_sets_for_documents(
document_ids=document_ids, db_session=db_session
)
}
# update Vespa
document_index.update(
update_requests=[
UpdateRequest(
document_ids=[document_id],
document_sets=set(document_set_map.get(document_id, [])),
)
for document_id in document_ids
]
)
with Session(get_sqlalchemy_engine()) as db_session:
try:
document_index = get_default_document_index()
documents_to_update = fetch_documents_for_document_set(
document_set_id=document_set_id,
db_session=db_session,
current_only=False,
)
for document_batch in batch_generator(
documents_to_update, _SYNC_BATCH_SIZE
):
_sync_document_batch(
document_ids=[document.id for document in document_batch],
document_index=document_index,
)
# if there are no connectors, then delete the document set. Otherwise, just
# mark it as successfully synced.
document_set = cast(
DocumentSet,
get_document_set_by_id(
db_session=db_session, document_set_id=document_set_id
),
) # casting since we "know" a document set with this ID exists
if not document_set.connector_credential_pairs:
delete_document_set(
document_set_row=document_set, db_session=db_session
)
logger.info(
f"Successfully deleted document set with ID: '{document_set_id}'!"
)
else:
mark_document_set_as_synced(
document_set_id=document_set_id, db_session=db_session
)
logger.info(f"Document set sync for '{document_set_id}' complete!")
except Exception:
logger.exception("Failed to sync document set %s", document_set_id)
raise
#####
# Periodic Tasks
#####
@celery_app.task(
name="check_for_document_sets_sync_task",
soft_time_limit=JOB_TIMEOUT,
)
def check_for_document_sets_sync_task() -> None:
"""Runs periodically to check if any document sets are out of sync
Creates a task to sync the set if needed"""
with Session(get_sqlalchemy_engine()) as db_session:
# check if any document sets are not synced
document_set_info = fetch_document_sets(
db_session=db_session, include_outdated=True
)
for document_set, _ in document_set_info:
if not document_set.is_up_to_date:
task_name = name_document_set_sync_task(document_set.id)
latest_sync = get_latest_task(task_name, db_session)
if latest_sync and check_live_task_not_timed_out(
latest_sync, db_session
):
logger.info(
f"Document set '{document_set.id}' is already syncing. Skipping."
)
continue
logger.info(f"Document set {document_set.id} syncing now!")
sync_document_set_task.apply_async(
kwargs=dict(document_set_id=document_set.id),
)
@celery_app.task(name="clean_old_temp_files_task", soft_time_limit=JOB_TIMEOUT)
def clean_old_temp_files_task(
age_threshold_in_hours: float | int = 24 * 7, # 1 week,
base_path: Path | str = FILE_CONNECTOR_TMP_STORAGE_PATH,
) -> None:
"""Files added via the File connector need to be deleted after ingestion
Currently handled async of the indexing job"""
os.makedirs(base_path, exist_ok=True)
for file in os.listdir(base_path):
full_file_path = Path(base_path) / file
if file_age_in_hours(full_file_path) > age_threshold_in_hours:
logger.info(f"Cleaning up uploaded file: {full_file_path}")
os.remove(full_file_path)
#####
# Celery Beat (Periodic Tasks) Settings
#####
celery_app.conf.beat_schedule = {
"check-for-document-set-sync": {
"task": "check_for_document_sets_sync_task",
"schedule": timedelta(seconds=5),
},
"clean-old-temp-files": {
"task": "clean_old_temp_files_task",
"schedule": timedelta(minutes=30),
},
}
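A hypothetical sketch of enqueueing the wrapped cleanup task above from an API handler; the surrounding handler and the `connector_id` / `credential_id` variables are assumptions, and the `apply_async(kwargs=...)` pattern mirrors what `check_for_document_sets_sync_task` already does for document-set syncs:

# inside an (assumed) API handler that has already validated the ids
cleanup_connector_credential_pair_task.apply_async(
    kwargs=dict(connector_id=connector_id, credential_id=credential_id),
)
# build_celery_task_wrapper takes care of inserting the matching row into the custom
# task-tracking table, so get_deletion_status() below can report on the attempt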

View File

@@ -0,0 +1,23 @@
from sqlalchemy.orm import Session
from danswer.background.task_utils import name_cc_cleanup_task
from danswer.db.tasks import get_latest_task
from danswer.server.documents.models import DeletionAttemptSnapshot
def get_deletion_status(
connector_id: int, credential_id: int, db_session: Session
) -> DeletionAttemptSnapshot | None:
cleanup_task_name = name_cc_cleanup_task(
connector_id=connector_id, credential_id=credential_id
)
task_state = get_latest_task(task_name=cleanup_task_name, db_session=db_session)
if not task_state:
return None
return DeletionAttemptSnapshot(
connector_id=connector_id,
credential_id=credential_id,
status=task_state.status,
)

View File

@@ -0,0 +1,201 @@
"""
To delete a connector / credential pair:
(1) find all documents associated with the connector / credential pair where this
is the only connector / credential pair that has indexed it
(2) delete all documents from document stores
(3) delete all entries from postgres
(4) find all documents associated with connector / credential pair where there
are multiple connector / credential pairs that have indexed it
(5) update document store entries to remove access associated with the
connector / credential pair from the access list
(6) delete all relevant entries from postgres
"""
import time
from sqlalchemy.orm import Session
from danswer.access.access import get_access_for_documents
from danswer.db.connector import fetch_connector_by_id
from danswer.db.connector_credential_pair import (
delete_connector_credential_pair__no_commit,
)
from danswer.db.document import delete_document_by_connector_credential_pair
from danswer.db.document import delete_documents_complete
from danswer.db.document import get_document_connector_cnts
from danswer.db.document import get_documents_for_connector_credential_pair
from danswer.db.document import prepare_to_modify_documents
from danswer.db.document_set import get_document_sets_by_ids
from danswer.db.document_set import (
mark_cc_pair__document_set_relationships_to_be_deleted__no_commit,
)
from danswer.db.engine import get_sqlalchemy_engine
from danswer.db.index_attempt import delete_index_attempts
from danswer.db.models import ConnectorCredentialPair
from danswer.document_index.interfaces import DocumentIndex
from danswer.document_index.interfaces import UpdateRequest
from danswer.server.documents.models import ConnectorCredentialPairIdentifier
from danswer.utils.logger import setup_logger
logger = setup_logger()
_DELETION_BATCH_SIZE = 1000
def _delete_connector_credential_pair_batch(
document_ids: list[str],
connector_id: int,
credential_id: int,
document_index: DocumentIndex,
) -> None:
with Session(get_sqlalchemy_engine()) as db_session:
# acquire lock for all documents in this batch so that indexing can't
# override the deletion
prepare_to_modify_documents(db_session=db_session, document_ids=document_ids)
document_connector_cnts = get_document_connector_cnts(
db_session=db_session, document_ids=document_ids
)
# figure out which docs need to be completely deleted
document_ids_to_delete = [
document_id for document_id, cnt in document_connector_cnts if cnt == 1
]
logger.debug(f"Deleting documents: {document_ids_to_delete}")
document_index.delete(doc_ids=document_ids_to_delete)
delete_documents_complete(
db_session=db_session,
document_ids=document_ids_to_delete,
)
# figure out which docs need to be updated
document_ids_to_update = [
document_id for document_id, cnt in document_connector_cnts if cnt > 1
]
access_for_documents = get_access_for_documents(
document_ids=document_ids_to_update,
db_session=db_session,
cc_pair_to_delete=ConnectorCredentialPairIdentifier(
connector_id=connector_id,
credential_id=credential_id,
),
)
update_requests = [
UpdateRequest(
document_ids=[document_id],
access=access,
)
for document_id, access in access_for_documents.items()
]
logger.debug(f"Updating documents: {document_ids_to_update}")
document_index.update(update_requests=update_requests)
delete_document_by_connector_credential_pair(
db_session=db_session,
document_ids=document_ids_to_update,
connector_credential_pair_identifier=ConnectorCredentialPairIdentifier(
connector_id=connector_id,
credential_id=credential_id,
),
)
db_session.commit()
def cleanup_synced_entities(
cc_pair: ConnectorCredentialPair, db_session: Session
) -> None:
"""Updates the document sets associated with the connector / credential pair,
then relies on the document set sync script to kick off Celery jobs which will
sync these updates to Vespa.
Waits until the document sets are synced before returning."""
logger.info(f"Cleaning up Document Sets for CC Pair with ID: '{cc_pair.id}'")
document_sets_ids_to_sync = list(
mark_cc_pair__document_set_relationships_to_be_deleted__no_commit(
cc_pair_id=cc_pair.id,
db_session=db_session,
)
)
db_session.commit()
# wait till all document sets are synced before continuing
while True:
all_synced = True
document_sets = get_document_sets_by_ids(
db_session=db_session, document_set_ids=document_sets_ids_to_sync
)
for document_set in document_sets:
if not document_set.is_up_to_date:
all_synced = False
if all_synced:
break
# wait for 30 seconds before checking again
db_session.commit() # end transaction
logger.info(
f"Document sets '{document_sets_ids_to_sync}' not synced yet, waiting 30s"
)
time.sleep(30)
logger.info(
f"Finished cleaning up Document Sets for CC Pair with ID: '{cc_pair.id}'"
)
def delete_connector_credential_pair(
db_session: Session,
document_index: DocumentIndex,
cc_pair: ConnectorCredentialPair,
) -> int:
connector_id = cc_pair.connector_id
credential_id = cc_pair.credential_id
num_docs_deleted = 0
while True:
documents = get_documents_for_connector_credential_pair(
db_session=db_session,
connector_id=connector_id,
credential_id=credential_id,
limit=_DELETION_BATCH_SIZE,
)
if not documents:
break
_delete_connector_credential_pair_batch(
document_ids=[document.id for document in documents],
connector_id=connector_id,
credential_id=credential_id,
document_index=document_index,
)
num_docs_deleted += len(documents)
# Clean up document sets / access information from Postgres
# and sync these updates to Vespa
# TODO: add user group cleanup with `fetch_versioned_implementation`
cleanup_synced_entities(cc_pair, db_session)
# clean up the rest of the related Postgres entities
delete_index_attempts(
db_session=db_session,
connector_id=connector_id,
credential_id=credential_id,
)
delete_connector_credential_pair__no_commit(
db_session=db_session,
connector_id=connector_id,
credential_id=credential_id,
)
# if there are no credentials left, delete the connector
connector = fetch_connector_by_id(
db_session=db_session,
connector_id=connector_id,
)
if not connector or not len(connector.credentials):
logger.debug("Found no credentials left for connector, deleting connector")
db_session.delete(connector)
db_session.commit()
logger.info(
"Successfully deleted connector_credential_pair with connector_id:"
f" '{connector_id}' and credential_id: '{credential_id}'. Deleted {num_docs_deleted} docs."
)
return num_docs_deleted
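A small illustration of the cnt == 1 vs. cnt > 1 split performed in `_delete_connector_credential_pair_batch` above, with made-up counts:

# made-up (document_id, number_of_cc_pairs_that_indexed_it) pairs
document_connector_cnts = [("doc_a", 1), ("doc_b", 3)]
to_delete = [doc_id for doc_id, cnt in document_connector_cnts if cnt == 1]  # ["doc_a"] -> removed everywhere
to_update = [doc_id for doc_id, cnt in document_connector_cnts if cnt > 1]   # ["doc_b"] -> only ACL / doc-set info updated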

View File

@@ -0,0 +1,75 @@
"""Experimental functionality related to splitting up indexing
into a series of checkpoints to better handle intermittent failures
/ jobs being killed by cloud providers."""
import datetime
from danswer.configs.app_configs import EXPERIMENTAL_CHECKPOINTING_ENABLED
from danswer.configs.constants import DocumentSource
from danswer.connectors.cross_connector_utils.miscellaneous_utils import datetime_to_utc
def _2010_dt() -> datetime.datetime:
return datetime.datetime(year=2010, month=1, day=1, tzinfo=datetime.timezone.utc)
def _2020_dt() -> datetime.datetime:
return datetime.datetime(year=2020, month=1, day=1, tzinfo=datetime.timezone.utc)
def _default_end_time(
last_successful_run: datetime.datetime | None,
) -> datetime.datetime:
"""If year is before 2010, go to the beginning of 2010.
If year is 2010-2020, go in 5 year increments.
If year > 2020, go in 180 day increments.
For connectors that don't support a `filter_by` and instead rely on `sort_by`
for polling, this will cause a massive duplication of fetches. For these
connectors, you may want to override this function to return a more reasonable
plan (e.g. extending the 2020+ windows to 6 months, 1 year, or higher)."""
last_successful_run = (
datetime_to_utc(last_successful_run) if last_successful_run else None
)
if last_successful_run is None or last_successful_run < _2010_dt():
return _2010_dt()
if last_successful_run < _2020_dt():
return min(last_successful_run + datetime.timedelta(days=365 * 5), _2020_dt())
return last_successful_run + datetime.timedelta(days=180)
def find_end_time_for_indexing_attempt(
last_successful_run: datetime.datetime | None, source_type: DocumentSource
) -> datetime.datetime | None:
# NOTE: source_type can be used to override the default for certain connectors
end_of_window = _default_end_time(last_successful_run)
now = datetime.datetime.now(tz=datetime.timezone.utc)
if end_of_window < now:
return end_of_window
# None signals that we should index up to current time
return None
def get_time_windows_for_index_attempt(
last_successful_run: datetime.datetime, source_type: DocumentSource
) -> list[tuple[datetime.datetime, datetime.datetime]]:
if not EXPERIMENTAL_CHECKPOINTING_ENABLED:
return [(last_successful_run, datetime.datetime.now(tz=datetime.timezone.utc))]
time_windows: list[tuple[datetime.datetime, datetime.datetime]] = []
start_of_window: datetime.datetime | None = last_successful_run
while start_of_window:
end_of_window = find_end_time_for_indexing_attempt(
last_successful_run=start_of_window, source_type=source_type
)
time_windows.append(
(
start_of_window,
end_of_window or datetime.datetime.now(tz=datetime.timezone.utc),
)
)
start_of_window = end_of_window
return time_windows
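A hedged usage sketch of the windowing helper above, assuming `EXPERIMENTAL_CHECKPOINTING_ENABLED` is turned on; the last-run date and the `DocumentSource.WEB` member are illustrative choices:

import datetime

from danswer.background.indexing.checkpointing import get_time_windows_for_index_attempt
from danswer.configs.constants import DocumentSource

last_run = datetime.datetime(2019, 6, 1, tzinfo=datetime.timezone.utc)
for window_start, window_end in get_time_windows_for_index_attempt(
    last_successful_run=last_run, source_type=DocumentSource.WEB
):
    # each (start, end) pair becomes its own indexing pass, so a killed job only
    # loses the current window rather than the whole backfill
    print(window_start.isoformat(), "->", window_end.isoformat())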

View File

@@ -0,0 +1,33 @@
import asyncio
import psutil
from dask.distributed import WorkerPlugin
from distributed import Worker
from danswer.utils.logger import setup_logger
logger = setup_logger()
class ResourceLogger(WorkerPlugin):
def __init__(self, log_interval: int = 60 * 5):
self.log_interval = log_interval
def setup(self, worker: Worker) -> None:
"""This method will be called when the plugin is attached to a worker."""
self.worker = worker
worker.loop.add_callback(self.log_resources)
async def log_resources(self) -> None:
"""Periodically log CPU and memory usage.
NOTE: must be async or else it will clog up the worker indefinitely, because
Dask uses Tornado under the hood (which is async)"""
while True:
cpu_percent = psutil.cpu_percent(interval=None)
memory_available_gb = psutil.virtual_memory().available / (1024.0**3)
# You can now log these values or send them to a monitoring service
logger.debug(
f"Worker {self.worker.address}: CPU usage {cpu_percent}%, Memory available {memory_available_gb}GB"
)
await asyncio.sleep(self.log_interval)
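A minimal sketch of attaching the plugin to a Dask client, mirroring how the indexing supervisor below registers it when debug logging is enabled; the cluster sizing here is an arbitrary example:

from dask.distributed import Client, LocalCluster

from danswer.background.indexing.dask_utils import ResourceLogger

cluster = LocalCluster(n_workers=1, threads_per_worker=1)  # example sizing
client = Client(cluster)
client.register_worker_plugin(ResourceLogger(log_interval=60))  # log CPU / memory every 60s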

View File

@@ -0,0 +1,104 @@
"""Custom client that works similarly to Dask, but simpler and more lightweight.
Dask jobs behaved very strangely - they would die all the time, retries would
not follow the expected behavior, etc.
NOTE: cannot use Celery directly due to
https://github.com/celery/celery/issues/7007#issuecomment-1740139367"""
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any
from typing import Literal
from torch import multiprocessing
from danswer.utils.logger import setup_logger
logger = setup_logger()
JobStatusType = (
Literal["error"]
| Literal["finished"]
| Literal["pending"]
| Literal["running"]
| Literal["cancelled"]
)
@dataclass
class SimpleJob:
"""Drop in replacement for `dask.distributed.Future`"""
id: int
process: multiprocessing.Process | None = None
def cancel(self) -> bool:
return self.release()
def release(self) -> bool:
if self.process is not None and self.process.is_alive():
self.process.terminate()
return True
return False
@property
def status(self) -> JobStatusType:
if not self.process:
return "pending"
elif self.process.is_alive():
return "running"
elif self.process.exitcode is None:
return "cancelled"
elif self.process.exitcode > 0:
return "error"
else:
return "finished"
def done(self) -> bool:
return (
self.status == "finished"
or self.status == "cancelled"
or self.status == "error"
)
def exception(self) -> str:
"""Needed to match the Dask API, but not implemented since we don't currently
have a way to get back the exception information from the child process."""
return (
f"Job with ID '{self.id}' was killed or encountered an unhandled exception."
)
class SimpleJobClient:
"""Drop in replacement for `dask.distributed.Client`"""
def __init__(self, n_workers: int = 1) -> None:
self.n_workers = n_workers
self.job_id_counter = 0
self.jobs: dict[int, SimpleJob] = {}
def _cleanup_completed_jobs(self) -> None:
current_job_ids = list(self.jobs.keys())
for job_id in current_job_ids:
job = self.jobs.get(job_id)
if job and job.done():
logger.debug(f"Cleaning up job with id: '{job.id}'")
del self.jobs[job.id]
def submit(self, func: Callable, *args: Any, pure: bool = True) -> SimpleJob | None:
"""NOTE: `pure` arg is needed so this can be a drop in replacement for Dask"""
self._cleanup_completed_jobs()
if len(self.jobs) >= self.n_workers:
logger.debug("No available workers to run job")
return None
job_id = self.job_id_counter
self.job_id_counter += 1
process = multiprocessing.Process(target=func, args=args, daemon=True)
job = SimpleJob(id=job_id, process=process)
process.start()
self.jobs[job_id] = job
return job
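A hypothetical usage sketch of the client above; `some_work` is a made-up function standing in for the real indexing entrypoint that the supervisor submits:

import time

from danswer.background.indexing.job_client import SimpleJobClient

def some_work(x: int) -> None:  # hypothetical workload
    print(f"working on {x}")

client = SimpleJobClient(n_workers=1)
job = client.submit(some_work, 42, pure=False)
if job is None:
    pass  # all workers busy - caller retries on the next scheduling pass
else:
    while not job.done():
        time.sleep(0.1)
    if job.status == "error":
        print(job.exception())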

View File

@@ -0,0 +1,260 @@
import time
from datetime import datetime
from datetime import timezone
import torch
from sqlalchemy.orm import Session
from danswer.background.indexing.checkpointing import get_time_windows_for_index_attempt
from danswer.connectors.factory import instantiate_connector
from danswer.connectors.interfaces import GenerateDocumentsOutput
from danswer.connectors.interfaces import LoadConnector
from danswer.connectors.interfaces import PollConnector
from danswer.connectors.models import IndexAttemptMetadata
from danswer.connectors.models import InputType
from danswer.db.connector import disable_connector
from danswer.db.connector_credential_pair import get_last_successful_attempt_time
from danswer.db.connector_credential_pair import update_connector_credential_pair
from danswer.db.credentials import backend_update_credential_json
from danswer.db.engine import get_sqlalchemy_engine
from danswer.db.index_attempt import get_index_attempt
from danswer.db.index_attempt import mark_attempt_failed
from danswer.db.index_attempt import mark_attempt_in_progress
from danswer.db.index_attempt import mark_attempt_succeeded
from danswer.db.index_attempt import update_docs_indexed
from danswer.db.models import IndexAttempt
from danswer.db.models import IndexingStatus
from danswer.indexing.indexing_pipeline import build_indexing_pipeline
from danswer.utils.logger import IndexAttemptSingleton
from danswer.utils.logger import setup_logger
logger = setup_logger()
def _get_document_generator(
db_session: Session,
attempt: IndexAttempt,
start_time: datetime,
end_time: datetime,
) -> GenerateDocumentsOutput:
"""NOTE: `start_time` and `end_time` are only used for poll connectors"""
task = attempt.connector.input_type
try:
runnable_connector, new_credential_json = instantiate_connector(
attempt.connector.source,
task,
attempt.connector.connector_specific_config,
attempt.credential.credential_json,
)
if new_credential_json is not None:
backend_update_credential_json(
attempt.credential, new_credential_json, db_session
)
except Exception as e:
logger.exception(f"Unable to instantiate connector due to {e}")
disable_connector(attempt.connector.id, db_session)
raise e
if task == InputType.LOAD_STATE:
assert isinstance(runnable_connector, LoadConnector)
doc_batch_generator = runnable_connector.load_from_state()
elif task == InputType.POLL:
assert isinstance(runnable_connector, PollConnector)
if attempt.connector_id is None or attempt.credential_id is None:
raise ValueError(
f"Polling attempt {attempt.id} is missing connector_id or credential_id, "
f"can't fetch time range."
)
logger.info(f"Polling for updates between {start_time} and {end_time}")
doc_batch_generator = runnable_connector.poll_source(
start=start_time.timestamp(), end=end_time.timestamp()
)
else:
# Event types cannot be handled by a background type
raise RuntimeError(f"Invalid task type: {task}")
return doc_batch_generator
def _run_indexing(
db_session: Session,
index_attempt: IndexAttempt,
) -> None:
"""
1. Get documents which are either new or updated from the specified application
2. Embed and index these documents into the chosen datastore (vespa)
3. Update Postgres to record the indexed documents + the outcome of this run
"""
start_time = time.time()
# mark as started
mark_attempt_in_progress(index_attempt, db_session)
update_connector_credential_pair(
db_session=db_session,
connector_id=index_attempt.connector.id,
credential_id=index_attempt.credential.id,
attempt_status=IndexingStatus.IN_PROGRESS,
)
indexing_pipeline = build_indexing_pipeline()
db_connector = index_attempt.connector
db_credential = index_attempt.credential
last_successful_index_time = get_last_successful_attempt_time(
connector_id=db_connector.id,
credential_id=db_credential.id,
db_session=db_session,
)
net_doc_change = 0
document_count = 0
chunk_count = 0
run_end_dt = None
for ind, (window_start, window_end) in enumerate(
get_time_windows_for_index_attempt(
last_successful_run=datetime.fromtimestamp(
last_successful_index_time, tz=timezone.utc
),
source_type=db_connector.source,
)
):
doc_batch_generator = _get_document_generator(
db_session=db_session,
attempt=index_attempt,
start_time=window_start,
end_time=window_end,
)
try:
for doc_batch in doc_batch_generator:
# check if connector is disabled mid run and stop if so
db_session.refresh(db_connector)
if db_connector.disabled:
# let the `except` block handle this
raise RuntimeError("Connector was disabled mid run")
logger.debug(
f"Indexing batch of documents: {[doc.to_short_descriptor() for doc in doc_batch]}"
)
new_docs, total_batch_chunks = indexing_pipeline(
documents=doc_batch,
index_attempt_metadata=IndexAttemptMetadata(
connector_id=db_connector.id,
credential_id=db_credential.id,
),
)
net_doc_change += new_docs
chunk_count += total_batch_chunks
document_count += len(doc_batch)
# commit transaction so that the `update` below begins
# with a brand new transaction. Postgres uses the start
# of the transaction when computing `NOW()`, so if we have
# a long running transaction, the `time_updated` field will
# be inaccurate
db_session.commit()
# This new value is updated every batch, so UI can refresh per batch update
update_docs_indexed(
db_session=db_session,
index_attempt=index_attempt,
total_docs_indexed=document_count,
new_docs_indexed=net_doc_change,
)
run_end_dt = window_end
update_connector_credential_pair(
db_session=db_session,
connector_id=db_connector.id,
credential_id=db_credential.id,
attempt_status=IndexingStatus.IN_PROGRESS,
net_docs=net_doc_change,
run_dt=run_end_dt,
)
except Exception as e:
logger.info(
f"Connector run ran into exception after elapsed time: {time.time() - start_time} seconds"
)
# Only mark the attempt as a complete failure if this is the first indexing window.
# Otherwise, some progress was made - the next run will not start from the beginning.
# In this case, it is not accurate to mark it as a failure. When the next run begins,
# if that fails immediately, it will be marked as a failure.
#
# NOTE: if the connector is manually disabled, we should mark it as a failure regardless
# to give better clarity in the UI, as the next run will never happen.
if ind == 0 or db_connector.disabled:
mark_attempt_failed(index_attempt, db_session, failure_reason=str(e))
update_connector_credential_pair(
db_session=db_session,
connector_id=index_attempt.connector.id,
credential_id=index_attempt.credential.id,
attempt_status=IndexingStatus.FAILED,
net_docs=net_doc_change,
)
raise e
# break => similar to success case. As mentioned above, if the next run fails for the same
# reason it will then be marked as a failure
break
mark_attempt_succeeded(index_attempt, db_session)
update_connector_credential_pair(
db_session=db_session,
connector_id=db_connector.id,
credential_id=db_credential.id,
attempt_status=IndexingStatus.SUCCESS,
net_docs=net_doc_change,
run_dt=run_end_dt,
)
logger.info(
f"Indexed or refreshed {document_count} total documents for a total of {chunk_count} indexed chunks"
)
logger.info(
f"Connector successfully finished, elapsed time: {time.time() - start_time} seconds"
)
def run_indexing_entrypoint(index_attempt_id: int, num_threads: int) -> None:
"""Entrypoint for indexing run when using dask distributed.
Wraps the actual logic in a `try` block so that we can catch any exceptions
and mark the attempt as failed."""
try:
# set the indexing attempt ID so that all log messages from this process
# will have it added as a prefix
IndexAttemptSingleton.set_index_attempt_id(index_attempt_id)
logger.info(f"Setting task to use {num_threads} threads")
torch.set_num_threads(num_threads)
with Session(get_sqlalchemy_engine()) as db_session:
attempt = get_index_attempt(
db_session=db_session, index_attempt_id=index_attempt_id
)
if attempt is None:
raise RuntimeError(
f"Unable to find IndexAttempt for ID '{index_attempt_id}'"
)
logger.info(
f"Running indexing attempt for connector: '{attempt.connector.name}', "
f"with config: '{attempt.connector.connector_specific_config}', and "
f"with credentials: '{attempt.credential_id}'"
)
_run_indexing(
db_session=db_session,
index_attempt=attempt,
)
logger.info(
f"Completed indexing attempt for connector: '{attempt.connector.name}', "
f"with config: '{attempt.connector.connector_specific_config}', and "
f"with credentials: '{attempt.credential_id}'"
)
except Exception as e:
logger.exception(f"Indexing job with ID '{index_attempt_id}' failed due to {e}")

View File

@@ -0,0 +1,117 @@
from collections.abc import Callable
from functools import wraps
from typing import Any
from typing import cast
from typing import TypeVar
from celery import Task
from celery.result import AsyncResult
from sqlalchemy.orm import Session
from danswer.db.engine import get_sqlalchemy_engine
from danswer.db.tasks import mark_task_finished
from danswer.db.tasks import mark_task_start
from danswer.db.tasks import register_task
def name_cc_cleanup_task(connector_id: int, credential_id: int) -> str:
return f"cleanup_connector_credential_pair_{connector_id}_{credential_id}"
def name_document_set_sync_task(document_set_id: int) -> str:
return f"sync_doc_set_{document_set_id}"
T = TypeVar("T", bound=Callable)
def build_run_wrapper(build_name_fn: Callable[..., str]) -> Callable[[T], T]:
"""Utility meant to wrap the celery task `run` function in order to
automatically update our custom `task_queue_jobs` table appropriately"""
def wrap_task_fn(task_fn: T) -> T:
@wraps(task_fn)
def wrapped_task_fn(*args: list, **kwargs: dict) -> Any:
engine = get_sqlalchemy_engine()
task_name = build_name_fn(*args, **kwargs)
with Session(engine) as db_session:
# mark the task as started
mark_task_start(task_name=task_name, db_session=db_session)
result = None
exception = None
try:
result = task_fn(*args, **kwargs)
except Exception as e:
exception = e
with Session(engine) as db_session:
mark_task_finished(
task_name=task_name,
db_session=db_session,
success=exception is None,
)
if not exception:
return result
else:
raise exception
return cast(T, wrapped_task_fn)
return wrap_task_fn
# rough type signature for `apply_async`
AA = TypeVar("AA", bound=Callable[..., AsyncResult])
def build_apply_async_wrapper(build_name_fn: Callable[..., str]) -> Callable[[AA], AA]:
"""Utility meant to wrap celery `apply_async` function in order to automatically
update create an entry in our `task_queue_jobs` table"""
def wrapper(fn: AA) -> AA:
@wraps(fn)
def wrapped_fn(
args: tuple | None = None,
kwargs: dict[str, Any] | None = None,
*other_args: list,
**other_kwargs: dict[str, Any],
) -> Any:
# `apply_async` takes in args / kwargs directly as arguments
args_for_build_name = args or tuple()
kwargs_for_build_name = kwargs or {}
task_name = build_name_fn(*args_for_build_name, **kwargs_for_build_name)
with Session(get_sqlalchemy_engine()) as db_session:
# create the corresponding row in our `task_queue_jobs` table for the newly queued task
task = fn(args, kwargs, *other_args, **other_kwargs)
register_task(task.id, task_name, db_session)
return task
return cast(AA, wrapped_fn)
return wrapper
def build_celery_task_wrapper(
build_name_fn: Callable[..., str]
) -> Callable[[Task], Task]:
"""Utility meant to wrap celery task functions in order to automatically
update our custom `task_queue_jobs` table appropriately.
On task creation (e.g. `apply_async`), a row is inserted into the table with
status `PENDING`.
On task start, the latest row is updated to have status `STARTED`.
On task success, the latest row is updated to have status `SUCCESS`.
On the task raising an unhandled exception, the latest row is updated to have
status `FAILURE`.
"""
def wrap_task(task: Task) -> Task:
task.run = build_run_wrapper(build_name_fn)(task.run) # type: ignore
task.apply_async = build_apply_async_wrapper(build_name_fn)(task.apply_async) # type: ignore
return task
return wrap_task
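A hedged sketch of applying the wrapper to a new task; `name_my_task` and `my_task` are hypothetical, and `celery_app` is assumed to be the Celery instance defined in the background tasks file above (the decorator ordering matches the real tasks there):

def name_my_task(document_id: str) -> str:  # hypothetical naming helper
    return f"my_task_{document_id}"

@build_celery_task_wrapper(name_my_task)
@celery_app.task(soft_time_limit=60)
def my_task(document_id: str) -> None:  # hypothetical task body
    ...

# calling my_task.apply_async(kwargs=dict(document_id="abc")) now records a PENDING
# row named "my_task_abc", which moves to STARTED and then SUCCESS / FAILURE as it runs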

View File

@@ -0,0 +1,354 @@
import logging
import time
from datetime import datetime
import dask
import torch
from dask.distributed import Client
from dask.distributed import Future
from distributed import LocalCluster
from sqlalchemy.orm import Session
from danswer.background.indexing.dask_utils import ResourceLogger
from danswer.background.indexing.job_client import SimpleJob
from danswer.background.indexing.job_client import SimpleJobClient
from danswer.background.indexing.run_indexing import run_indexing_entrypoint
from danswer.configs.app_configs import CLEANUP_INDEXING_JOBS_TIMEOUT
from danswer.configs.app_configs import DASK_JOB_CLIENT_ENABLED
from danswer.configs.app_configs import LOG_LEVEL
from danswer.configs.app_configs import MODEL_SERVER_HOST
from danswer.configs.app_configs import NUM_INDEXING_WORKERS
from danswer.configs.model_configs import MIN_THREADS_ML_MODELS
from danswer.db.connector import fetch_connectors
from danswer.db.connector_credential_pair import mark_all_in_progress_cc_pairs_failed
from danswer.db.connector_credential_pair import update_connector_credential_pair
from danswer.db.engine import get_db_current_time
from danswer.db.engine import get_sqlalchemy_engine
from danswer.db.index_attempt import create_index_attempt
from danswer.db.index_attempt import get_index_attempt
from danswer.db.index_attempt import get_inprogress_index_attempts
from danswer.db.index_attempt import get_last_attempt
from danswer.db.index_attempt import get_not_started_index_attempts
from danswer.db.index_attempt import mark_attempt_failed
from danswer.db.models import Connector
from danswer.db.models import IndexAttempt
from danswer.db.models import IndexingStatus
from danswer.search.search_nlp_models import warm_up_models
from danswer.utils.logger import setup_logger
logger = setup_logger()
# If the indexing dies, it's most likely due to resource constraints,
# restarting just delays the eventual failure, not useful to the user
dask.config.set({"distributed.scheduler.allowed-failures": 0})
_UNEXPECTED_STATE_FAILURE_REASON = (
"Stopped mid run, likely due to the background process being killed"
)
"""Util funcs"""
def _get_num_threads() -> int:
"""Get # of "threads" to use for ML models in an indexing job. By default uses
the torch implementation, which returns the # of physical cores on the machine.
"""
return max(MIN_THREADS_ML_MODELS, torch.get_num_threads())
def _should_create_new_indexing(
connector: Connector, last_index: IndexAttempt | None, db_session: Session
) -> bool:
if connector.refresh_freq is None:
return False
if not last_index:
return True
# only one scheduled job per connector at a time
if last_index.status == IndexingStatus.NOT_STARTED:
return False
current_db_time = get_db_current_time(db_session)
time_since_index = current_db_time - last_index.time_updated
return time_since_index.total_seconds() >= connector.refresh_freq
def _is_indexing_job_marked_as_finished(index_attempt: IndexAttempt | None) -> bool:
if index_attempt is None:
return False
return (
index_attempt.status == IndexingStatus.FAILED
or index_attempt.status == IndexingStatus.SUCCESS
)
def _mark_run_failed(
db_session: Session, index_attempt: IndexAttempt, failure_reason: str
) -> None:
"""Marks the `index_attempt` row as failed + updates the `
connector_credential_pair` to reflect that the run failed"""
logger.warning(
f"Marking in-progress attempt 'connector: {index_attempt.connector_id}, "
f"credential: {index_attempt.credential_id}' as failed due to {failure_reason}"
)
mark_attempt_failed(
index_attempt=index_attempt,
db_session=db_session,
failure_reason=failure_reason,
)
if (
index_attempt.connector_id is not None
and index_attempt.credential_id is not None
):
update_connector_credential_pair(
db_session=db_session,
connector_id=index_attempt.connector_id,
credential_id=index_attempt.credential_id,
attempt_status=IndexingStatus.FAILED,
)
"""Main funcs"""
def create_indexing_jobs(existing_jobs: dict[int, Future | SimpleJob]) -> None:
"""Creates new indexing jobs for each connector / credential pair which is:
1. Enabled
2. `refresh_freq` time has passed since the last indexing run for this pair
3. There is not already an ongoing indexing attempt for this pair
"""
with Session(get_sqlalchemy_engine()) as db_session:
ongoing_pairs: set[tuple[int | None, int | None]] = set()
for attempt_id in existing_jobs:
attempt = get_index_attempt(
db_session=db_session, index_attempt_id=attempt_id
)
if attempt is None:
logger.error(
f"Unable to find IndexAttempt for ID '{attempt_id}' when creating "
"indexing jobs"
)
continue
ongoing_pairs.add((attempt.connector_id, attempt.credential_id))
enabled_connectors = fetch_connectors(db_session, disabled_status=False)
for connector in enabled_connectors:
for association in connector.credentials:
credential = association.credential
# check if there is an ongoing indexing attempt for this connector + credential pair
if (connector.id, credential.id) in ongoing_pairs:
continue
last_attempt = get_last_attempt(connector.id, credential.id, db_session)
if not _should_create_new_indexing(connector, last_attempt, db_session):
continue
create_index_attempt(connector.id, credential.id, db_session)
update_connector_credential_pair(
db_session=db_session,
connector_id=connector.id,
credential_id=credential.id,
attempt_status=IndexingStatus.NOT_STARTED,
)
def cleanup_indexing_jobs(
existing_jobs: dict[int, Future | SimpleJob],
timeout_hours: int = CLEANUP_INDEXING_JOBS_TIMEOUT,
) -> dict[int, Future | SimpleJob]:
existing_jobs_copy = existing_jobs.copy()
# clean up completed jobs
with Session(get_sqlalchemy_engine()) as db_session:
for attempt_id, job in existing_jobs.items():
index_attempt = get_index_attempt(
db_session=db_session, index_attempt_id=attempt_id
)
# do nothing for ongoing jobs that haven't been stopped
if not job.done() and not _is_indexing_job_marked_as_finished(
index_attempt
):
continue
if job.status == "error":
logger.error(job.exception())
job.release()
del existing_jobs_copy[attempt_id]
if not index_attempt:
logger.error(
f"Unable to find IndexAttempt for ID '{attempt_id}' when cleaning "
"up indexing jobs"
)
continue
if (
index_attempt.status == IndexingStatus.IN_PROGRESS
or job.status == "error"
):
_mark_run_failed(
db_session=db_session,
index_attempt=index_attempt,
failure_reason=_UNEXPECTED_STATE_FAILURE_REASON,
)
# clean up in-progress jobs that were never completed
connectors = fetch_connectors(db_session)
for connector in connectors:
in_progress_indexing_attempts = get_inprogress_index_attempts(
connector.id, db_session
)
for index_attempt in in_progress_indexing_attempts:
if index_attempt.id in existing_jobs:
# check to see if the job has been updated in last n hours, if not
# assume it is frozen in some bad state and just mark it as failed. Note: this relies
# on the fact that the `time_updated` field is constantly updated every
# batch of documents indexed
current_db_time = get_db_current_time(db_session=db_session)
time_since_update = current_db_time - index_attempt.time_updated
if time_since_update.total_seconds() > 60 * 60 * timeout_hours:
existing_jobs[index_attempt.id].cancel()
_mark_run_failed(
db_session=db_session,
index_attempt=index_attempt,
failure_reason="Indexing run frozen - no updates in an hour. "
"The run will be re-attempted at next scheduled indexing time.",
)
else:
# If job isn't known, simply mark it as failed
_mark_run_failed(
db_session=db_session,
index_attempt=index_attempt,
failure_reason=_UNEXPECTED_STATE_FAILURE_REASON,
)
return existing_jobs_copy
def kickoff_indexing_jobs(
existing_jobs: dict[int, Future | SimpleJob],
client: Client | SimpleJobClient,
) -> dict[int, Future | SimpleJob]:
existing_jobs_copy = existing_jobs.copy()
engine = get_sqlalchemy_engine()
# Don't include jobs waiting in the Dask queue that just haven't started running
# Also (rarely) don't include jobs that started but haven't yet updated the indexing tables
with Session(engine) as db_session:
new_indexing_attempts = [
attempt
for attempt in get_not_started_index_attempts(db_session)
if attempt.id not in existing_jobs
]
logger.info(f"Found {len(new_indexing_attempts)} new indexing tasks.")
if not new_indexing_attempts:
return existing_jobs
for attempt in new_indexing_attempts:
if attempt.connector is None:
logger.warning(
f"Skipping index attempt as Connector has been deleted: {attempt}"
)
with Session(engine) as db_session:
mark_attempt_failed(
attempt, db_session, failure_reason="Connector is null"
)
continue
if attempt.credential is None:
logger.warning(
f"Skipping index attempt as Credential has been deleted: {attempt}"
)
with Session(engine) as db_session:
mark_attempt_failed(
attempt, db_session, failure_reason="Credential is null"
)
continue
run = client.submit(
run_indexing_entrypoint, attempt.id, _get_num_threads(), pure=False
)
if run:
logger.info(
f"Kicked off indexing attempt for connector: '{attempt.connector.name}', "
f"with config: '{attempt.connector.connector_specific_config}', and "
f"with credentials: '{attempt.credential_id}'"
)
existing_jobs_copy[attempt.id] = run
return existing_jobs_copy
def update_loop(delay: int = 10, num_workers: int = NUM_INDEXING_WORKERS) -> None:
client: Client | SimpleJobClient
if DASK_JOB_CLIENT_ENABLED:
cluster = LocalCluster(
n_workers=num_workers,
threads_per_worker=1,
# there are warnings about high memory usage + "Event loop unresponsive"
# which are not relevant to us since our workers are expected to use a
# lot of memory + involve CPU intensive tasks that will not relinquish
# the event loop
silence_logs=logging.ERROR,
)
client = Client(cluster)
if LOG_LEVEL.lower() == "debug":
client.register_worker_plugin(ResourceLogger())
else:
client = SimpleJobClient(n_workers=num_workers)
existing_jobs: dict[int, Future | SimpleJob] = {}
engine = get_sqlalchemy_engine()
with Session(engine) as db_session:
# Previous version did not always clean up cc-pairs well, leaving some connectors undeletable
# This ensures that bad states get cleaned up
mark_all_in_progress_cc_pairs_failed(db_session)
while True:
start = time.time()
start_time_utc = datetime.utcfromtimestamp(start).strftime("%Y-%m-%d %H:%M:%S")
logger.info(f"Running update, current UTC time: {start_time_utc}")
if existing_jobs:
# TODO: make this debug level once the "no jobs are being scheduled" issue is resolved
logger.info(
"Found existing indexing jobs: "
f"{[(attempt_id, job.status) for attempt_id, job in existing_jobs.items()]}"
)
try:
existing_jobs = cleanup_indexing_jobs(existing_jobs=existing_jobs)
create_indexing_jobs(existing_jobs=existing_jobs)
existing_jobs = kickoff_indexing_jobs(
existing_jobs=existing_jobs, client=client
)
except Exception as e:
logger.exception(f"Failed to run update due to {e}")
sleep_time = delay - (time.time() - start)
if sleep_time > 0:
time.sleep(sleep_time)
def update__main() -> None:
# needed for CUDA to work with multiprocessing
# NOTE: needs to be done on application startup
# before any other torch code has been run
if not DASK_JOB_CLIENT_ENABLED:
torch.multiprocessing.set_start_method("spawn")
if not MODEL_SERVER_HOST:
logger.info("Warming up Embedding Model(s)")
warm_up_models(indexer_only=True, skip_cross_encoders=True)
logger.info("Starting Indexing Loop")
update_loop()
if __name__ == "__main__":
update__main()

View File

@@ -0,0 +1,479 @@
import re
from collections.abc import Callable
from collections.abc import Iterator
from functools import lru_cache
from typing import cast
from langchain.schema.messages import BaseMessage
from langchain.schema.messages import HumanMessage
from langchain.schema.messages import SystemMessage
from sqlalchemy.orm import Session
from danswer.chat.models import CitationInfo
from danswer.chat.models import DanswerAnswerPiece
from danswer.chat.models import LlmDoc
from danswer.configs.chat_configs import MULTILINGUAL_QUERY_EXPANSION
from danswer.configs.chat_configs import NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL
from danswer.configs.constants import IGNORE_FOR_QA
from danswer.configs.model_configs import GEN_AI_HISTORY_CUTOFF
from danswer.configs.model_configs import GEN_AI_MAX_INPUT_TOKENS
from danswer.db.chat import get_chat_messages_by_session
from danswer.db.models import ChatMessage
from danswer.db.models import Prompt
from danswer.indexing.models import InferenceChunk
from danswer.llm.utils import check_number_of_tokens
from danswer.prompts.chat_prompts import CHAT_USER_CONTEXT_FREE_PROMPT
from danswer.prompts.chat_prompts import CHAT_USER_PROMPT
from danswer.prompts.chat_prompts import CITATION_REMINDER
from danswer.prompts.chat_prompts import DEFAULT_IGNORE_STATEMENT
from danswer.prompts.chat_prompts import NO_CITATION_STATEMENT
from danswer.prompts.chat_prompts import REQUIRE_CITATION_STATEMENT
from danswer.prompts.constants import CODE_BLOCK_PAT
from danswer.prompts.direct_qa_prompts import LANGUAGE_HINT
from danswer.prompts.prompt_utils import get_current_llm_day_time
# Maps connector enum string to a more natural language representation for the LLM
# If not on the list, uses the original but slightly cleaned up, see below
CONNECTOR_NAME_MAP = {
"web": "Website",
"requesttracker": "Request Tracker",
"github": "GitHub",
"file": "File Upload",
}
def clean_up_source(source_str: str) -> str:
if source_str in CONNECTOR_NAME_MAP:
return CONNECTOR_NAME_MAP[source_str]
return source_str.replace("_", " ").title()
def build_context_str(
context_docs: list[LlmDoc | InferenceChunk],
include_metadata: bool = True,
) -> str:
context_str = ""
for ind, doc in enumerate(context_docs, start=1):
if include_metadata:
context_str += f"DOCUMENT {ind}: {doc.semantic_identifier}\n"
context_str += f"Source: {clean_up_source(doc.source_type)}\n"
if doc.updated_at:
update_str = doc.updated_at.strftime("%B %d, %Y %H:%M")
context_str += f"Updated: {update_str}\n"
context_str += f"{CODE_BLOCK_PAT.format(doc.content.strip())}\n\n\n"
return context_str.strip()
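# Illustrative shape of the string built above (values are made up, and CODE_BLOCK_PAT
# is assumed to wrap the content in a fenced block):
#   DOCUMENT 1: Onboarding Guide
#   Source: Website
#   Updated: January 02, 2024 11:30
#   <document content wrapped per CODE_BLOCK_PAT>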
@lru_cache()
def build_chat_system_message(
prompt: Prompt,
context_exists: bool,
llm_tokenizer: Callable,
citation_line: str = REQUIRE_CITATION_STATEMENT,
no_citation_line: str = NO_CITATION_STATEMENT,
) -> tuple[SystemMessage | None, int]:
system_prompt = prompt.system_prompt.strip()
if prompt.include_citations:
if context_exists:
system_prompt += citation_line
else:
system_prompt += no_citation_line
if prompt.datetime_aware:
if system_prompt:
system_prompt += (
f"\n\nAdditional Information:\n\t- {get_current_llm_day_time()}."
)
else:
system_prompt = get_current_llm_day_time()
if not system_prompt:
return None, 0
token_count = len(llm_tokenizer(system_prompt))
system_msg = SystemMessage(content=system_prompt)
return system_msg, token_count
def build_task_prompt_reminders(
prompt: Prompt,
use_language_hint: bool = bool(MULTILINGUAL_QUERY_EXPANSION),
citation_str: str = CITATION_REMINDER,
language_hint_str: str = LANGUAGE_HINT,
) -> str:
base_task = prompt.task_prompt
citation_or_nothing = citation_str if prompt.include_citations else ""
language_hint_or_nothing = language_hint_str.lstrip() if use_language_hint else ""
return base_task + citation_or_nothing + language_hint_or_nothing
def llm_doc_from_inference_chunk(inf_chunk: InferenceChunk) -> LlmDoc:
return LlmDoc(
document_id=inf_chunk.document_id,
content=inf_chunk.content,
semantic_identifier=inf_chunk.semantic_identifier,
source_type=inf_chunk.source_type,
updated_at=inf_chunk.updated_at,
link=inf_chunk.source_links[0] if inf_chunk.source_links else None,
)
def map_document_id_order(
chunks: list[InferenceChunk | LlmDoc], one_indexed: bool = True
) -> dict[str, int]:
order_mapping = {}
current = 1 if one_indexed else 0
for chunk in chunks:
if chunk.document_id not in order_mapping:
order_mapping[chunk.document_id] = current
current += 1
return order_mapping
def build_chat_user_message(
chat_message: ChatMessage,
prompt: Prompt,
context_docs: list[LlmDoc],
llm_tokenizer: Callable,
all_doc_useful: bool,
user_prompt_template: str = CHAT_USER_PROMPT,
context_free_template: str = CHAT_USER_CONTEXT_FREE_PROMPT,
ignore_str: str = DEFAULT_IGNORE_STATEMENT,
) -> tuple[HumanMessage, int]:
user_query = chat_message.message
if not context_docs:
# Simpler prompt for cases where there is no context
user_prompt = (
context_free_template.format(
task_prompt=prompt.task_prompt, user_query=user_query
)
if prompt.task_prompt
else user_query
)
user_prompt = user_prompt.strip()
token_count = len(llm_tokenizer(user_prompt))
user_msg = HumanMessage(content=user_prompt)
return user_msg, token_count
context_docs_str = build_context_str(
cast(list[LlmDoc | InferenceChunk], context_docs)
)
optional_ignore = "" if all_doc_useful else ignore_str
task_prompt_with_reminder = build_task_prompt_reminders(prompt)
user_prompt = user_prompt_template.format(
optional_ignore_statement=optional_ignore,
context_docs_str=context_docs_str,
task_prompt=task_prompt_with_reminder,
user_query=user_query,
)
user_prompt = user_prompt.strip()
token_count = len(llm_tokenizer(user_prompt))
user_msg = HumanMessage(content=user_prompt)
return user_msg, token_count
def _get_usable_chunks(
chunks: list[InferenceChunk], token_limit: int
) -> list[InferenceChunk]:
total_token_count = 0
usable_chunks = []
for chunk in chunks:
chunk_token_count = check_number_of_tokens(chunk.content)
if total_token_count + chunk_token_count > token_limit:
break
total_token_count += chunk_token_count
usable_chunks.append(chunk)
# try and return at least one chunk if possible. This chunk will
# get truncated later on in the pipeline. This would only occur if
# the first chunk is larger than the token limit (usually due to character
# count -> token count mismatches caused by special characters / non-ascii
# languages)
if not usable_chunks and chunks:
usable_chunks = [chunks[0]]
return usable_chunks
def get_usable_chunks(
chunks: list[InferenceChunk],
token_limit: int = NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL,
offset: int = 0,
) -> list[InferenceChunk]:
offset_into_chunks = 0
usable_chunks: list[InferenceChunk] = []
for _ in range(max(offset + 1, 1)):  # go through this process at least once
if offset_into_chunks >= len(chunks) and offset_into_chunks > 0:
raise ValueError(
"Chunks offset too large, should not retry this many times"
)
usable_chunks = _get_usable_chunks(
chunks=chunks[offset_into_chunks:], token_limit=token_limit
)
offset_into_chunks += len(usable_chunks)
return usable_chunks
def get_chunks_for_qa(
chunks: list[InferenceChunk],
llm_chunk_selection: list[bool],
token_limit: float | None = NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL,
batch_offset: int = 0,
) -> list[int]:
"""
Gives back indices of chunks to pass into the LLM for Q&A.
Only selects chunks viable for Q&A, within the token limit, and prioritizes those selected
by the LLM in a separate flow (this can be turned off).
Note: the batch_offset calculation has to count the batches from the beginning each time, as
there's no way to know which chunks were included in the prior batches without recounting atm;
this is somewhat slow as it requires tokenizing all the chunks again
"""
batch_index = 0
latest_batch_indices: list[int] = []
token_count = 0
# First iterate the LLM selected chunks, then iterate the rest if tokens remaining
for selection_target in [True, False]:
for ind, chunk in enumerate(chunks):
if llm_chunk_selection[ind] is not selection_target or chunk.metadata.get(
IGNORE_FOR_QA
):
continue
# We calculate it live in case the user uses a different LLM + tokenizer
chunk_token = check_number_of_tokens(chunk.content)
# 50 is an approximate/slight overestimate of the # of tokens needed for the chunk's metadata
token_count += chunk_token + 50
# Always use at least 1 chunk
if (
token_limit is None
or token_count <= token_limit
or not latest_batch_indices
):
latest_batch_indices.append(ind)
current_chunk_unused = False
else:
current_chunk_unused = True
if token_limit is not None and token_count >= token_limit:
if batch_index < batch_offset:
batch_index += 1
if current_chunk_unused:
latest_batch_indices = [ind]
token_count = chunk_token
else:
latest_batch_indices = []
token_count = 0
else:
return latest_batch_indices
return latest_batch_indices
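# Worked illustration of the batching above (made-up sizes): with three LLM-selected
# chunks of ~150 tokens each (~200 with the +50 metadata overestimate) and
# token_limit=400, batch_offset=0 returns [0, 1]; batch_offset=1 re-walks the chunks,
# skips that first full batch, and returns [2].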
def create_chat_chain(
chat_session_id: int,
db_session: Session,
) -> tuple[ChatMessage, list[ChatMessage]]:
"""Build the linear chain of messages without including the root message"""
mainline_messages: list[ChatMessage] = []
all_chat_messages = get_chat_messages_by_session(
chat_session_id=chat_session_id,
user_id=None,
db_session=db_session,
skip_permission_check=True,
)
id_to_msg = {msg.id: msg for msg in all_chat_messages}
if not all_chat_messages:
raise ValueError("No messages in Chat Session")
root_message = all_chat_messages[0]
if root_message.parent_message is not None:
raise RuntimeError(
"Invalid root message, unable to fetch valid chat message sequence"
)
current_message: ChatMessage | None = root_message
while current_message is not None:
child_msg = current_message.latest_child_message
if not child_msg:
break
current_message = id_to_msg.get(child_msg)
if current_message is None:
raise RuntimeError(
"Invalid message chain,"
"could not find next message in the same session"
)
mainline_messages.append(current_message)
if not mainline_messages:
raise RuntimeError("Could not trace chat message history")
return mainline_messages[-1], mainline_messages[:-1]
def combine_message_chain(
messages: list[ChatMessage],
msg_limit: int | None = 10,
token_limit: int | None = GEN_AI_HISTORY_CUTOFF,
) -> str:
"""Used for secondary LLM flows that require the chat history"""
message_strs: list[str] = []
total_token_count = 0
if msg_limit is not None:
messages = messages[-msg_limit:]
for message in reversed(messages):
message_token_count = message.token_count
if (
token_limit is not None
and total_token_count + message_token_count > token_limit
):
break
role = message.message_type.value.upper()
message_strs.insert(0, f"{role}:\n{message.message}")
total_token_count += message_token_count
return "\n\n".join(message_strs)
def find_last_index(
lst: list[int], max_prompt_tokens: int = GEN_AI_MAX_INPUT_TOKENS
) -> int:
"""From the back, find the index of the last element to include
before the list exceeds the maximum"""
running_sum = 0
last_ind = 0
for i in range(len(lst) - 1, -1, -1):
running_sum += lst[i]
if running_sum > max_prompt_tokens:
last_ind = i + 1
break
if last_ind >= len(lst):
raise ValueError("Last message alone is too large!")
return last_ind
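# Worked example with made-up counts: find_last_index([3, 5, 2, 6], max_prompt_tokens=10)
# sums from the back (6, then 8, then 13 > 10) and returns 2, so only the last two
# token counts (2 and 6, totalling 8) are kept.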
def drop_messages_history_overflow(
system_msg: BaseMessage | None,
system_token_count: int,
history_msgs: list[BaseMessage],
history_token_counts: list[int],
final_msg: BaseMessage,
final_msg_token_count: int,
) -> list[BaseMessage]:
"""As message history grows, messages need to be dropped starting from the furthest in the past.
The system message should be kept if at all possible, and the latest user input (which is inserted into the
prompt template) must be included"""
if len(history_msgs) != len(history_token_counts):
# This should never happen
raise ValueError("Need exactly 1 token count per message for tracking overflow")
prompt: list[BaseMessage] = []
# Start dropping from the history if necessary
all_tokens = history_token_counts + [system_token_count, final_msg_token_count]
ind_prev_msg_start = find_last_index(all_tokens)
if system_msg and ind_prev_msg_start <= len(history_msgs):
prompt.append(system_msg)
prompt.extend(history_msgs[ind_prev_msg_start:])
prompt.append(final_msg)
return prompt
def extract_citations_from_stream(
tokens: Iterator[str],
context_docs: list[LlmDoc],
doc_id_to_rank_map: dict[str, int],
) -> Iterator[DanswerAnswerPiece | CitationInfo]:
max_citation_num = len(context_docs)
curr_segment = ""
prepend_bracket = False
cited_inds = set()
for token in tokens:
# Special case of [1][ where ][ is a single token
# This is where the model attempts to do consecutive citations like [1][2]
if prepend_bracket:
curr_segment += "[" + curr_segment
prepend_bracket = False
curr_segment += token
possible_citation_pattern = r"(\[\d*$)" # [1, [, etc
possible_citation_found = re.search(possible_citation_pattern, curr_segment)
citation_pattern = r"\[(\d+)\]" # [1], [2] etc
citation_found = re.search(citation_pattern, curr_segment)
if citation_found:
numerical_value = int(citation_found.group(1))
if 1 <= numerical_value <= max_citation_num:
context_llm_doc = context_docs[
numerical_value - 1
] # remove 1 index offset
link = context_llm_doc.link
target_citation_num = doc_id_to_rank_map[context_llm_doc.document_id]
# Use the citation number for the document's rank in
# the search (or selected docs) results
curr_segment = re.sub(
rf"\[{numerical_value}\]", f"[{target_citation_num}]", curr_segment
)
if target_citation_num not in cited_inds:
cited_inds.add(target_citation_num)
yield CitationInfo(
citation_num=target_citation_num,
document_id=context_llm_doc.document_id,
)
if link:
curr_segment = re.sub(r"\[", "[[", curr_segment, count=1)
curr_segment = re.sub("]", f"]]({link})", curr_segment, count=1)
# In case there's another open bracket like [1][, don't want to match this
possible_citation_found = None
# if we see "[", but haven't seen the right side, hold back - this may be a
# citation that needs to be replaced with a link
if possible_citation_found:
continue
# Special case with back to back citations [1][2]
if curr_segment and curr_segment[-1] == "[":
curr_segment = curr_segment[:-1]
prepend_bracket = True
yield DanswerAnswerPiece(answer_piece=curr_segment)
curr_segment = ""
if curr_segment:
if prepend_bracket:
yield DanswerAnswerPiece(answer_piece="[" + curr_segment)
else:
yield DanswerAnswerPiece(answer_piece=curr_segment)
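# Worked example (illustrative, not from the source): if the model streams
# "As noted [2]," and context_docs[1] has document_id "doc-b" with
# doc_id_to_rank_map["doc-b"] == 1 and a link, the emitted answer pieces contain
# "As noted [[1]](<link>)," and a CitationInfo(citation_num=1, document_id="doc-b")
# is yielded the first time that document is cited.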


@@ -0,0 +1,106 @@
from typing import cast
import yaml
from sqlalchemy.orm import Session
from danswer.configs.chat_configs import DEFAULT_NUM_CHUNKS_FED_TO_CHAT
from danswer.configs.chat_configs import PERSONAS_YAML
from danswer.configs.chat_configs import PROMPTS_YAML
from danswer.db.chat import get_prompt_by_name
from danswer.db.chat import upsert_persona
from danswer.db.chat import upsert_prompt
from danswer.db.document_set import get_or_create_document_set_by_name
from danswer.db.engine import get_sqlalchemy_engine
from danswer.db.models import DocumentSet as DocumentSetDBModel
from danswer.db.models import Prompt as PromptDBModel
from danswer.search.models import RecencyBiasSetting
def load_prompts_from_yaml(prompts_yaml: str = PROMPTS_YAML) -> None:
with open(prompts_yaml, "r") as file:
data = yaml.safe_load(file)
all_prompts = data.get("prompts", [])
with Session(get_sqlalchemy_engine()) as db_session:
for prompt in all_prompts:
upsert_prompt(
user_id=None,
prompt_id=prompt.get("id"),
name=prompt["name"],
description=prompt["description"].strip(),
system_prompt=prompt["system"].strip(),
task_prompt=prompt["task"].strip(),
include_citations=prompt["include_citations"],
datetime_aware=prompt.get("datetime_aware", True),
default_prompt=True,
personas=None,
shared=True,
db_session=db_session,
commit=True,
)
def load_personas_from_yaml(
personas_yaml: str = PERSONAS_YAML,
default_chunks: float = DEFAULT_NUM_CHUNKS_FED_TO_CHAT,
) -> None:
with open(personas_yaml, "r") as file:
data = yaml.safe_load(file)
all_personas = data.get("personas", [])
with Session(get_sqlalchemy_engine()) as db_session:
for persona in all_personas:
doc_set_names = persona["document_sets"]
doc_sets: list[DocumentSetDBModel] | None = [
get_or_create_document_set_by_name(db_session, name)
for name in doc_set_names
]
# If the user hasn't set any document sets for the persona, assume they may want to
# attach document sets manually later; therefore, don't overwrite/reset the document
# sets for the persona
if not doc_sets:
doc_sets = None
prompt_set_names = persona["prompts"]
if not prompt_set_names:
prompts: list[PromptDBModel | None] | None = None
else:
prompts = [
get_prompt_by_name(
prompt_name, user_id=None, shared=True, db_session=db_session
)
for prompt_name in prompt_set_names
]
if any([prompt is None for prompt in prompts]):
raise ValueError("Invalid Persona configs, not all prompts exist")
if not prompts:
prompts = None
upsert_persona(
user_id=None,
persona_id=persona.get("id"),
name=persona["name"],
description=persona["description"],
num_chunks=persona.get("num_chunks")
if persona.get("num_chunks") is not None
else default_chunks,
llm_relevance_filter=persona.get("llm_relevance_filter"),
llm_filter_extraction=persona.get("llm_filter_extraction"),
llm_model_version_override=None,
recency_bias=RecencyBiasSetting(persona["recency_bias"]),
prompts=cast(list[PromptDBModel] | None, prompts),
document_sets=doc_sets,
default_persona=True,
shared=True,
db_session=db_session,
)
def load_chat_yamls(
prompt_yaml: str = PROMPTS_YAML,
personas_yaml: str = PERSONAS_YAML,
) -> None:
load_prompts_from_yaml(prompt_yaml)
load_personas_from_yaml(personas_yaml)
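# Hedged usage sketch (the call site is an assumption, not shown in this diff):
# seeding the default prompts/personas only requires the YAML paths, e.g.
#   load_chat_yamls()  # reads PROMPTS_YAML and PERSONAS_YAML and upserts both
# which is the sort of call a startup hook would make after migrations run.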


@@ -0,0 +1,100 @@
from collections.abc import Iterator
from datetime import datetime
from typing import Any
from pydantic import BaseModel
from danswer.configs.constants import DocumentSource
from danswer.search.models import QueryFlow
from danswer.search.models import RetrievalDocs
from danswer.search.models import SearchResponse
from danswer.search.models import SearchType
class LlmDoc(BaseModel):
"""This contains the minimal set information for the LLM portion including citations"""
document_id: str
content: str
semantic_identifier: str
source_type: DocumentSource
updated_at: datetime | None
link: str | None
# First chunk of info for streaming QA
class QADocsResponse(RetrievalDocs):
rephrased_query: str | None = None
predicted_flow: QueryFlow | None
predicted_search: SearchType | None
applied_source_filters: list[DocumentSource] | None
applied_time_cutoff: datetime | None
recency_bias_multiplier: float
def dict(self, *args: list, **kwargs: dict[str, Any]) -> dict[str, Any]: # type: ignore
initial_dict = super().dict(*args, **kwargs) # type: ignore
initial_dict["applied_time_cutoff"] = (
self.applied_time_cutoff.isoformat() if self.applied_time_cutoff else None
)
return initial_dict
# Second chunk of info for streaming QA
class LLMRelevanceFilterResponse(BaseModel):
relevant_chunk_indices: list[int]
class DanswerAnswerPiece(BaseModel):
# A small piece of a complete answer. Used for streaming back answers.
answer_piece: str | None # if None, specifies the end of an Answer
# An intermediate representation of citations, later translated into
# a mapping of the citation [n] number to SearchDoc
class CitationInfo(BaseModel):
citation_num: int
document_id: str
class StreamingError(BaseModel):
error: str
class DanswerQuote(BaseModel):
# This is during inference so everything is a string by this point
quote: str
document_id: str
link: str | None
source_type: str
semantic_identifier: str
blurb: str
class DanswerQuotes(BaseModel):
quotes: list[DanswerQuote]
class DanswerAnswer(BaseModel):
answer: str | None
class QAResponse(SearchResponse, DanswerAnswer):
quotes: list[DanswerQuote] | None
predicted_flow: QueryFlow
predicted_search: SearchType
eval_res_valid: bool | None = None
llm_chunks_indices: list[int] | None = None
error_msg: str | None = None
AnswerQuestionReturn = tuple[DanswerAnswer, DanswerQuotes]
AnswerQuestionStreamReturn = Iterator[
DanswerAnswerPiece | DanswerQuotes | StreamingError
]
class LLMMetricsContainer(BaseModel):
prompt_tokens: int
response_tokens: int


@@ -0,0 +1,65 @@
# Currently in the UI, each Persona only has one prompt, which is why there are 3 very similar personas defined below.
personas:
# This id field can be left blank for other default personas; however, an id 0 persona must exist.
# It is used by DanswerBot when tagged in a non-configured channel.
# Be careful setting specific IDs; this won't auto-increment the next ID value for Postgres.
- id: 0
name: "Default"
description: >
Default Danswer Question Answering functionality.
# Default Prompt objects attached to the persona, see prompts.yaml
prompts:
- "Answer-Question"
# Default number of chunks to include as context, set to 0 to disable retrieval
# Remove the field to set to the system default number of chunks/tokens to pass to Gen AI
# If the user has selected documents, this can be bypassed up to NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL
# Each chunk is 512 tokens long
num_chunks: 5
# Enable/Disable usage of the LLM chunk filter feature whereby each chunk is passed to the LLM to determine
# if the chunk is useful or not towards the latest user query
# This feature can be overridden for all personas via the DISABLE_LLM_CHUNK_FILTER env variable
llm_relevance_filter: true
# Enable/Disable usage of the LLM to extract query time filters including source type and time range filters
llm_filter_extraction: true
# Decay documents priority as they age, options are:
# - favor_recent (2x base by default, configurable)
# - base_decay
# - no_decay
# - auto (model chooses between favor_recent and base_decay based on user query)
recency_bias: "auto"
# Default Document Sets for this persona, specified as a list of names here.
# If the document set by the name exists, it will be attached to the persona
# If the document set by the name does not exist, it will be created as an empty document set with no connectors
# The admin can then use the UI to add new connectors to the document set
# Example:
# document_sets:
# - "HR Resources"
# - "Engineer Onboarding"
# - "Benefits"
document_sets: []
- name: "Summarize"
description: >
A less creative assistant which summarizes relevant documents but does not try to
extrapolate any answers for you.
prompts:
- "Summarize"
num_chunks: 5
llm_relevance_filter: true
llm_filter_extraction: true
recency_bias: "auto"
document_sets: []
- name: "Paraphrase"
description: >
The least creative default assistant that only provides quotes from the documents.
prompts:
- "Paraphrase"
num_chunks: 5
llm_relevance_filter: true
llm_filter_extraction: true
recency_bias: "auto"
document_sets: []


@@ -0,0 +1,471 @@
from collections.abc import Callable
from collections.abc import Iterator
from functools import partial
from typing import cast
from sqlalchemy.orm import Session
from danswer.chat.chat_utils import build_chat_system_message
from danswer.chat.chat_utils import build_chat_user_message
from danswer.chat.chat_utils import create_chat_chain
from danswer.chat.chat_utils import drop_messages_history_overflow
from danswer.chat.chat_utils import extract_citations_from_stream
from danswer.chat.chat_utils import get_chunks_for_qa
from danswer.chat.chat_utils import llm_doc_from_inference_chunk
from danswer.chat.chat_utils import map_document_id_order
from danswer.chat.models import CitationInfo
from danswer.chat.models import DanswerAnswerPiece
from danswer.chat.models import LlmDoc
from danswer.chat.models import LLMRelevanceFilterResponse
from danswer.chat.models import QADocsResponse
from danswer.chat.models import StreamingError
from danswer.configs.chat_configs import CHUNK_SIZE
from danswer.configs.chat_configs import DEFAULT_NUM_CHUNKS_FED_TO_CHAT
from danswer.configs.constants import DISABLED_GEN_AI_MSG
from danswer.configs.constants import MessageType
from danswer.db.chat import create_db_search_doc
from danswer.db.chat import create_new_chat_message
from danswer.db.chat import get_chat_message
from danswer.db.chat import get_chat_session_by_id
from danswer.db.chat import get_db_search_doc_by_id
from danswer.db.chat import get_doc_query_identifiers_from_model
from danswer.db.chat import get_or_create_root_message
from danswer.db.chat import translate_db_message_to_chat_message_detail
from danswer.db.chat import translate_db_search_doc_to_server_search_doc
from danswer.db.models import ChatMessage
from danswer.db.models import SearchDoc as DbSearchDoc
from danswer.db.models import User
from danswer.document_index.factory import get_default_document_index
from danswer.indexing.models import InferenceChunk
from danswer.llm.exceptions import GenAIDisabledException
from danswer.llm.factory import get_default_llm
from danswer.llm.interfaces import LLM
from danswer.llm.utils import get_default_llm_token_encode
from danswer.llm.utils import translate_history_to_basemessages
from danswer.search.models import OptionalSearchSetting
from danswer.search.models import RetrievalDetails
from danswer.search.request_preprocessing import retrieval_preprocessing
from danswer.search.search_runner import chunks_to_search_docs
from danswer.search.search_runner import full_chunk_search_generator
from danswer.search.search_runner import inference_documents_from_ids
from danswer.secondary_llm_flows.choose_search import check_if_need_search
from danswer.secondary_llm_flows.query_expansion import history_based_query_rephrase
from danswer.server.query_and_chat.models import CreateChatMessageRequest
from danswer.server.utils import get_json_line
from danswer.utils.logger import setup_logger
from danswer.utils.timing import log_generator_function_time
logger = setup_logger()
def generate_ai_chat_response(
query_message: ChatMessage,
history: list[ChatMessage],
context_docs: list[LlmDoc],
doc_id_to_rank_map: dict[str, int],
llm: LLM | None,
llm_tokenizer: Callable,
all_doc_useful: bool,
) -> Iterator[DanswerAnswerPiece | CitationInfo | StreamingError]:
if llm is None:
try:
llm = get_default_llm()
except GenAIDisabledException:
# Not an error if it's a user configuration
yield DanswerAnswerPiece(answer_piece=DISABLED_GEN_AI_MSG)
return
if query_message.prompt is None:
raise RuntimeError("No prompt received for generating Gen AI answer.")
try:
context_exists = len(context_docs) > 0
system_message_or_none, system_tokens = build_chat_system_message(
prompt=query_message.prompt,
context_exists=context_exists,
llm_tokenizer=llm_tokenizer,
)
history_basemessages, history_token_counts = translate_history_to_basemessages(
history
)
# Be sure the context_docs passed to build_chat_user_message
# is the same as the one passed in later for extracting citations
user_message, user_tokens = build_chat_user_message(
chat_message=query_message,
prompt=query_message.prompt,
context_docs=context_docs,
llm_tokenizer=llm_tokenizer,
all_doc_useful=all_doc_useful,
)
prompt = drop_messages_history_overflow(
system_msg=system_message_or_none,
system_token_count=system_tokens,
history_msgs=history_basemessages,
history_token_counts=history_token_counts,
final_msg=user_message,
final_msg_token_count=user_tokens,
)
# Good Debug/Breakpoint
tokens = llm.stream(prompt)
yield from extract_citations_from_stream(
tokens, context_docs, doc_id_to_rank_map
)
except Exception as e:
logger.exception(f"LLM failed to produce valid chat message, error: {e}")
yield StreamingError(error=str(e))
def translate_citations(
citations_list: list[CitationInfo], db_docs: list[DbSearchDoc]
) -> dict[int, int]:
"""Always cites the first instance of the document_id, assumes the db_docs
are sorted in the order displayed in the UI"""
doc_id_to_saved_doc_id_map: dict[str, int] = {}
for db_doc in db_docs:
if db_doc.document_id not in doc_id_to_saved_doc_id_map:
doc_id_to_saved_doc_id_map[db_doc.document_id] = db_doc.id
citation_to_saved_doc_id_map: dict[int, int] = {}
for citation in citations_list:
if citation.citation_num not in citation_to_saved_doc_id_map:
citation_to_saved_doc_id_map[
citation.citation_num
] = doc_id_to_saved_doc_id_map[citation.document_id]
return citation_to_saved_doc_id_map
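# Illustrative example (values are not from the source): if db_docs holds saved
# SearchDocs with ids [7, 9] for document_ids ["doc-a", "doc-b"], and the stream
# produced CitationInfo(citation_num=1, document_id="doc-a") and
# CitationInfo(citation_num=2, document_id="doc-b"), the returned mapping is {1: 7, 2: 9}.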
@log_generator_function_time()
def stream_chat_message(
new_msg_req: CreateChatMessageRequest,
user: User | None,
db_session: Session,
# Needed to translate persona num_chunks into a token limit for the LLM
default_num_chunks: float = DEFAULT_NUM_CHUNKS_FED_TO_CHAT,
default_chunk_size: int = CHUNK_SIZE,
) -> Iterator[str]:
"""Streams in order:
1. [conditional] Retrieved documents if a search needs to be run
2. [conditional] LLM selected chunk indices if LLM chunk filtering is turned on
3. [always] A set of streamed LLM tokens or an error anywhere along the line if something fails
4. [always] Details on the final AI response message that is created
"""
try:
user_id = user.id if user is not None else None
chat_session = get_chat_session_by_id(
chat_session_id=new_msg_req.chat_session_id,
user_id=user_id,
db_session=db_session,
)
message_text = new_msg_req.message
chat_session_id = new_msg_req.chat_session_id
parent_id = new_msg_req.parent_message_id
prompt_id = new_msg_req.prompt_id
reference_doc_ids = new_msg_req.search_doc_ids
retrieval_options = new_msg_req.retrieval_options
persona = chat_session.persona
query_override = new_msg_req.query_override
if reference_doc_ids is None and retrieval_options is None:
raise RuntimeError(
"Must specify a set of documents for chat or specify search options"
)
try:
llm = get_default_llm()
except GenAIDisabledException:
llm = None
llm_tokenizer = get_default_llm_token_encode()
document_index = get_default_document_index()
# Every chat Session begins with an empty root message
root_message = get_or_create_root_message(
chat_session_id=chat_session_id, db_session=db_session
)
if parent_id is not None:
parent_message = get_chat_message(
chat_message_id=parent_id,
user_id=user_id,
db_session=db_session,
)
else:
parent_message = root_message
# Create new message at the right place in the tree and update the parent's child pointer
# Don't commit yet until we verify the chat message chain
new_user_message = create_new_chat_message(
chat_session_id=chat_session_id,
parent_message=parent_message,
prompt_id=prompt_id,
message=message_text,
token_count=len(llm_tokenizer(message_text)),
message_type=MessageType.USER,
db_session=db_session,
commit=False,
)
# Create linear history of messages
final_msg, history_msgs = create_chat_chain(
chat_session_id=chat_session_id, db_session=db_session
)
if final_msg.id != new_user_message.id:
db_session.rollback()
raise RuntimeError(
"The new message was not on the mainline. "
"Be sure to update the chat pointers before calling this."
)
# Commit now so the latest chat message is saved
db_session.commit()
run_search = False
# Retrieval options are only None if reference_doc_ids are provided
if retrieval_options is not None and persona.num_chunks != 0:
if retrieval_options.run_search == OptionalSearchSetting.ALWAYS:
run_search = True
elif retrieval_options.run_search == OptionalSearchSetting.NEVER:
run_search = False
else:
run_search = check_if_need_search(
query_message=final_msg, history=history_msgs, llm=llm
)
rephrased_query = None
if reference_doc_ids:
identifier_tuples = get_doc_query_identifiers_from_model(
search_doc_ids=reference_doc_ids,
chat_session=chat_session,
user_id=user_id,
db_session=db_session,
)
# Generates full documents currently
# May extend to include chunk ranges
llm_docs: list[LlmDoc] = inference_documents_from_ids(
doc_identifiers=identifier_tuples,
document_index=get_default_document_index(),
)
doc_id_to_rank_map = map_document_id_order(
cast(list[InferenceChunk | LlmDoc], llm_docs)
)
# In case the search doc is deleted, just don't include it
# though this should never happen
db_search_docs_or_none = [
get_db_search_doc_by_id(doc_id=doc_id, db_session=db_session)
for doc_id in reference_doc_ids
]
reference_db_search_docs = [
db_sd for db_sd in db_search_docs_or_none if db_sd
]
elif run_search:
rephrased_query = (
history_based_query_rephrase(
query_message=final_msg, history=history_msgs, llm=llm
)
if query_override is None
else query_override
)
(
retrieval_request,
predicted_search_type,
predicted_flow,
) = retrieval_preprocessing(
query=rephrased_query,
retrieval_details=cast(RetrievalDetails, retrieval_options),
persona=persona,
user=user,
db_session=db_session,
)
documents_generator = full_chunk_search_generator(
search_query=retrieval_request,
document_index=document_index,
)
time_cutoff = retrieval_request.filters.time_cutoff
recency_bias_multiplier = retrieval_request.recency_bias_multiplier
run_llm_chunk_filter = not retrieval_request.skip_llm_chunk_filter
# First fetch and return the top chunks to the UI so the user can
# immediately see some results
top_chunks = cast(list[InferenceChunk], next(documents_generator))
# Get ranking of the documents for citation purposes later
doc_id_to_rank_map = map_document_id_order(
cast(list[InferenceChunk | LlmDoc], top_chunks)
)
top_docs = chunks_to_search_docs(top_chunks)
reference_db_search_docs = [
create_db_search_doc(server_search_doc=top_doc, db_session=db_session)
for top_doc in top_docs
]
response_docs = [
translate_db_search_doc_to_server_search_doc(db_search_doc)
for db_search_doc in reference_db_search_docs
]
initial_response = QADocsResponse(
rephrased_query=rephrased_query,
top_documents=response_docs,
predicted_flow=predicted_flow,
predicted_search=predicted_search_type,
applied_source_filters=retrieval_request.filters.source_type,
applied_time_cutoff=time_cutoff,
recency_bias_multiplier=recency_bias_multiplier,
).dict()
yield get_json_line(initial_response)
# Get the final ordering of chunks for the LLM call
llm_chunk_selection = cast(list[bool], next(documents_generator))
# Yield the list of LLM selected chunks for showing the LLM selected icons in the UI
llm_relevance_filtering_response = LLMRelevanceFilterResponse(
relevant_chunk_indices=[
index for index, value in enumerate(llm_chunk_selection) if value
]
if run_llm_chunk_filter
else []
).dict()
yield get_json_line(llm_relevance_filtering_response)
# Prep chunks to pass to LLM
num_llm_chunks = (
persona.num_chunks
if persona.num_chunks is not None
else default_num_chunks
)
llm_chunks_indices = get_chunks_for_qa(
chunks=top_chunks,
llm_chunk_selection=llm_chunk_selection,
token_limit=num_llm_chunks * default_chunk_size,
)
llm_chunks = [top_chunks[i] for i in llm_chunks_indices]
llm_docs = [llm_doc_from_inference_chunk(chunk) for chunk in llm_chunks]
else:
llm_docs = []
doc_id_to_rank_map = {}
reference_db_search_docs = None
# Cannot determine these without the LLM step or breaking out early
partial_response = partial(
create_new_chat_message,
chat_session_id=chat_session_id,
parent_message=new_user_message,
prompt_id=prompt_id,
# message=,
rephrased_query=rephrased_query,
# token_count=,
message_type=MessageType.ASSISTANT,
# error=,
reference_docs=reference_db_search_docs,
db_session=db_session,
commit=True,
)
# If no prompt is provided, this is interpreted as not wanting an AI Answer
# Simply provide/save the retrieval results
if final_msg.prompt is None:
gen_ai_response_message = partial_response(
message="",
token_count=0,
citations=None,
error=None,
)
msg_detail_response = translate_db_message_to_chat_message_detail(
gen_ai_response_message
)
yield get_json_line(msg_detail_response.dict())
# Stop here after saving the message details; the above still needs to be sent so that the
# message id is available for the next follow-up message
return
# LLM prompt building, response capturing, etc.
response_packets = generate_ai_chat_response(
query_message=final_msg,
history=history_msgs,
context_docs=llm_docs,
doc_id_to_rank_map=doc_id_to_rank_map,
llm=llm,
llm_tokenizer=llm_tokenizer,
all_doc_useful=reference_doc_ids is not None,
)
# Capture outputs and errors
llm_output = ""
error: str | None = None
citations: list[CitationInfo] = []
for packet in response_packets:
if isinstance(packet, DanswerAnswerPiece):
token = packet.answer_piece
if token:
llm_output += token
elif isinstance(packet, StreamingError):
error = packet.error
elif isinstance(packet, CitationInfo):
citations.append(packet)
continue
yield get_json_line(packet.dict())
except Exception as e:
logger.exception(e)
# Frontend will erase whatever answer and show this instead
# This will be the issue 99% of the time
error_packet = StreamingError(
error="LLM failed to respond, have you set your API key?"
)
yield get_json_line(error_packet.dict())
return
# Post-LLM answer processing
try:
db_citations = None
if reference_db_search_docs:
db_citations = translate_citations(
citations_list=citations,
db_docs=reference_db_search_docs,
)
# Saving Gen AI answer and responding with message info
gen_ai_response_message = partial_response(
message=llm_output,
token_count=len(llm_tokenizer(llm_output)),
citations=db_citations,
error=error,
)
msg_detail_response = translate_db_message_to_chat_message_detail(
gen_ai_response_message
)
yield get_json_line(msg_detail_response.dict())
except Exception as e:
logger.exception(e)
# Frontend will erase whatever answer and show this instead
error_packet = StreamingError(error="Failed to parse LLM output")
yield get_json_line(error_packet.dict())


@@ -0,0 +1,68 @@
prompts:
# This id field can be left blank for other default prompts; however, an id 0 prompt must exist.
# It acts as the default prompt.
# Be careful setting specific IDs; this won't auto-increment the next ID value for Postgres.
- id: 0
name: "Answer-Question"
description: "Answers user questions using retrieved context!"
# System Prompt (as shown in UI)
system: >
You are a question answering system that is constantly learning and improving.
You can process and comprehend vast amounts of text and utilize this knowledge to provide
grounded, accurate, and concise answers to diverse queries.
You always clearly communicate ANY UNCERTAINTY in your answer.
# Task Prompt (as shown in UI)
task: >
Answer my query based on the documents provided.
The documents may not all be relevant, ignore any documents that are not directly relevant
to the most recent user query.
I have not read or seen any of the documents and do not want to read them.
If there are no relevant documents, refer to the chat history and existing knowledge.
# Inject a statement at the end of system prompt to inform the LLM of the current date/time
# Format looks like: "October 16, 2023 14:30"
datetime_aware: true
# Prompts the LLM to include citations in the form [1], [2], etc.
# which get parsed to match the passed in sources
include_citations: true
- name: "Summarize"
description: "Summarize relevant information from retrieved context!"
system: >
You are a text summarizing assistant that highlights the most important knowledge from the
context provided, prioritizing the information that relates to the user query.
You ARE NOT creative and always stick to the provided documents.
If there are no documents, refer to the conversation history.
IMPORTANT: YOU ONLY SUMMARIZE THE IMPORTANT INFORMATION FROM THE PROVIDED DOCUMENTS,
NEVER USE YOUR OWN KNOWLEDGE.
task: >
Summarize the documents provided in relation to the query below.
NEVER refer to the documents by number, I do not have them in the same order as you.
Do not make up any facts, only use what is in the documents.
datetime_aware: true
include_citations: true
- name: "Paraphrase"
description: "Recites information from retrieved context! Least creative but most safe!"
system: >
Quote and cite relevant information from provided context based on the user query.
You only provide quotes that are EXACT substrings from provided documents!
If there are no documents provided,
simply tell the user that there are no documents to reference.
You NEVER generate new text or phrases outside of the citation.
DO NOT explain your responses, only provide the quotes and NOTHING ELSE.
task: >
Provide EXACT quotes from the provided documents above. Do not generate any new text that is not
directly from the documents.
datetime_aware: true
include_citations: true


@@ -0,0 +1,115 @@
from typing import TypedDict
from pydantic import BaseModel
from danswer.prompts.chat_tools import DANSWER_TOOL_DESCRIPTION
from danswer.prompts.chat_tools import DANSWER_TOOL_NAME
from danswer.prompts.chat_tools import TOOL_FOLLOWUP
from danswer.prompts.chat_tools import TOOL_LESS_FOLLOWUP
from danswer.prompts.chat_tools import TOOL_LESS_PROMPT
from danswer.prompts.chat_tools import TOOL_TEMPLATE
from danswer.prompts.chat_tools import USER_INPUT
class ToolInfo(TypedDict):
name: str
description: str
class DanswerChatModelOut(BaseModel):
model_raw: str
action: str
action_input: str
def call_tool(
model_actions: DanswerChatModelOut,
) -> str:
raise NotImplementedError("There are no additional tool integrations right now")
def form_user_prompt_text(
query: str,
tool_text: str | None,
hint_text: str | None,
user_input_prompt: str = USER_INPUT,
tool_less_prompt: str = TOOL_LESS_PROMPT,
) -> str:
user_prompt = tool_text or tool_less_prompt
user_prompt += user_input_prompt.format(user_input=query)
if hint_text:
if user_prompt[-1] != "\n":
user_prompt += "\n"
user_prompt += "\nHint: " + hint_text
return user_prompt.strip()
def form_tool_section_text(
tools: list[ToolInfo] | None, retrieval_enabled: bool, template: str = TOOL_TEMPLATE
) -> str | None:
if not tools and not retrieval_enabled:
return None
if retrieval_enabled and tools:
tools.append(
{"name": DANSWER_TOOL_NAME, "description": DANSWER_TOOL_DESCRIPTION}
)
tools_intro = []
if tools:
num_tools = len(tools)
for tool in tools:
description_formatted = tool["description"].replace("\n", " ")
tools_intro.append(f"> {tool['name']}: {description_formatted}")
prefix = "Must be one of " if num_tools > 1 else "Must be "
tools_intro_text = "\n".join(tools_intro)
tool_names_text = prefix + ", ".join([tool["name"] for tool in tools])
else:
return None
return template.format(
tool_overviews=tools_intro_text, tool_names=tool_names_text
).strip()
def form_tool_followup_text(
tool_output: str,
query: str,
hint_text: str | None,
tool_followup_prompt: str = TOOL_FOLLOWUP,
ignore_hint: bool = False,
) -> str:
# If multi-line query, it likely confuses the model more than helps
if "\n" not in query:
optional_reminder = f"\nAs a reminder, my query was: {query}\n"
else:
optional_reminder = ""
if not ignore_hint and hint_text:
hint_text_spaced = f"\nHint: {hint_text}\n"
else:
hint_text_spaced = ""
return tool_followup_prompt.format(
tool_output=tool_output,
optional_reminder=optional_reminder,
hint=hint_text_spaced,
).strip()
def form_tool_less_followup_text(
tool_output: str,
query: str,
hint_text: str | None,
tool_followup_prompt: str = TOOL_LESS_FOLLOWUP,
) -> str:
hint = f"Hint: {hint_text}" if hint_text else ""
return tool_followup_prompt.format(
context_str=tool_output, user_query=query, hint_text=hint
).strip()
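# Hedged usage sketch (the query and hint strings are illustrative, not from the source):
# with no tools configured, form_user_prompt_text falls back to TOOL_LESS_PROMPT,
# appends the formatted user input, and adds the hint on its own line, e.g.
#   form_user_prompt_text(
#       query="How do I rotate my API key?",
#       tool_text=None,
#       hint_text="Prefer the newest documentation",
#   )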


@@ -0,0 +1,224 @@
import os
from danswer.configs.constants import AuthType
from danswer.configs.constants import DocumentIndexType
#####
# App Configs
#####
APP_HOST = "0.0.0.0"
APP_PORT = 8080
# API_PREFIX is used to prepend a base path for all API routes
# generally used if using a reverse proxy which doesn't support stripping the `/api`
# prefix from requests directed towards the API server. In these cases, set this to `/api`
APP_API_PREFIX = os.environ.get("API_PREFIX", "")
#####
# User Facing Features Configs
#####
BLURB_SIZE = 128 # Number Encoder Tokens included in the chunk blurb
GENERATIVE_MODEL_ACCESS_CHECK_FREQ = 86400 # 1 day
DISABLE_GENERATIVE_AI = os.environ.get("DISABLE_GENERATIVE_AI", "").lower() == "true"
#####
# Web Configs
#####
# WEB_DOMAIN is used to set the redirect_uri after login flows
WEB_DOMAIN = os.environ.get("WEB_DOMAIN") or "http://localhost:3000"
#####
# Auth Configs
#####
AUTH_TYPE = AuthType((os.environ.get("AUTH_TYPE") or AuthType.DISABLED.value).lower())
DISABLE_AUTH = AUTH_TYPE == AuthType.DISABLED
# Turn off masking if admin users should see full credentials for data connectors.
MASK_CREDENTIAL_PREFIX = (
os.environ.get("MASK_CREDENTIAL_PREFIX", "True").lower() != "false"
)
SECRET = os.environ.get("SECRET", "")
SESSION_EXPIRE_TIME_SECONDS = int(
os.environ.get("SESSION_EXPIRE_TIME_SECONDS") or 86400
) # 1 day
# set `VALID_EMAIL_DOMAINS` to a comma-separated list of domains in order to
# restrict access to Danswer to only users with emails from those domains.
# E.g. `VALID_EMAIL_DOMAINS=example.com,example.org` will restrict Danswer
# signups to users with either an @example.com or an @example.org email.
# NOTE: maintaining `VALID_EMAIL_DOMAIN` to keep backwards compatibility
_VALID_EMAIL_DOMAIN = os.environ.get("VALID_EMAIL_DOMAIN", "")
_VALID_EMAIL_DOMAINS_STR = (
os.environ.get("VALID_EMAIL_DOMAINS", "") or _VALID_EMAIL_DOMAIN
)
VALID_EMAIL_DOMAINS = (
[domain.strip() for domain in _VALID_EMAIL_DOMAINS_STR.split(",")]
if _VALID_EMAIL_DOMAINS_STR
else []
)
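# Worked example: VALID_EMAIL_DOMAINS="example.com, example.org" parses to
# ["example.com", "example.org"], restricting signups to @example.com and
# @example.org addresses; leaving it unset yields [] (no restriction).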
# OAuth Login Flow
# Used for both Google OAuth2 and OIDC flows
OAUTH_CLIENT_ID = (
os.environ.get("OAUTH_CLIENT_ID", os.environ.get("GOOGLE_OAUTH_CLIENT_ID")) or ""
)
OAUTH_CLIENT_SECRET = (
os.environ.get("OAUTH_CLIENT_SECRET", os.environ.get("GOOGLE_OAUTH_CLIENT_SECRET"))
or ""
)
# for basic auth
REQUIRE_EMAIL_VERIFICATION = (
os.environ.get("REQUIRE_EMAIL_VERIFICATION", "").lower() == "true"
)
SMTP_SERVER = os.environ.get("SMTP_SERVER") or "smtp.gmail.com"
SMTP_PORT = int(os.environ.get("SMTP_PORT") or "587")
SMTP_USER = os.environ.get("SMTP_USER", "your-email@gmail.com")
SMTP_PASS = os.environ.get("SMTP_PASS", "your-gmail-password")
#####
# DB Configs
#####
DOCUMENT_INDEX_NAME = "danswer_index"
# Vespa is now the default document index store for both keyword and vector
DOCUMENT_INDEX_TYPE = os.environ.get(
"DOCUMENT_INDEX_TYPE", DocumentIndexType.COMBINED.value
)
VESPA_HOST = os.environ.get("VESPA_HOST") or "localhost"
VESPA_PORT = os.environ.get("VESPA_PORT") or "8081"
VESPA_TENANT_PORT = os.environ.get("VESPA_TENANT_PORT") or "19071"
# The default below is for dockerized deployment
VESPA_DEPLOYMENT_ZIP = (
os.environ.get("VESPA_DEPLOYMENT_ZIP") or "/app/danswer/vespa-app.zip"
)
# Number of documents in a batch during indexing (further batching done by chunks before passing to bi-encoder)
try:
INDEX_BATCH_SIZE = int(os.environ.get("INDEX_BATCH_SIZE", 16))
except ValueError:
INDEX_BATCH_SIZE = 16
# Below are intended to match the env variables names used by the official postgres docker image
# https://hub.docker.com/_/postgres
POSTGRES_USER = os.environ.get("POSTGRES_USER") or "postgres"
POSTGRES_PASSWORD = os.environ.get("POSTGRES_PASSWORD") or "password"
POSTGRES_HOST = os.environ.get("POSTGRES_HOST") or "localhost"
POSTGRES_PORT = os.environ.get("POSTGRES_PORT") or "5432"
POSTGRES_DB = os.environ.get("POSTGRES_DB") or "postgres"
#####
# Connector Configs
#####
GOOGLE_DRIVE_INCLUDE_SHARED = False
GOOGLE_DRIVE_FOLLOW_SHORTCUTS = False
FILE_CONNECTOR_TMP_STORAGE_PATH = os.environ.get(
"FILE_CONNECTOR_TMP_STORAGE_PATH", "/home/file_connector_storage"
)
# TODO these should be available for frontend configuration, via advanced options expandable
WEB_CONNECTOR_IGNORED_CLASSES = os.environ.get(
"WEB_CONNECTOR_IGNORED_CLASSES", "sidebar,footer"
).split(",")
WEB_CONNECTOR_IGNORED_ELEMENTS = os.environ.get(
"WEB_CONNECTOR_IGNORED_ELEMENTS", "nav,footer,meta,script,style,symbol,aside"
).split(",")
WEB_CONNECTOR_OAUTH_CLIENT_ID = os.environ.get("WEB_CONNECTOR_OAUTH_CLIENT_ID")
WEB_CONNECTOR_OAUTH_CLIENT_SECRET = os.environ.get("WEB_CONNECTOR_OAUTH_CLIENT_SECRET")
WEB_CONNECTOR_OAUTH_TOKEN_URL = os.environ.get("WEB_CONNECTOR_OAUTH_TOKEN_URL")
NOTION_CONNECTOR_ENABLE_RECURSIVE_PAGE_LOOKUP = (
os.environ.get("NOTION_CONNECTOR_ENABLE_RECURSIVE_PAGE_LOOKUP", "").lower()
== "true"
)
CONFLUENCE_CONNECTOR_LABELS_TO_SKIP = [
ignored_tag
for ignored_tag in os.environ.get("CONFLUENCE_CONNECTOR_LABELS_TO_SKIP", "").split(
","
)
if ignored_tag
]
GONG_CONNECTOR_START_TIME = os.environ.get("GONG_CONNECTOR_START_TIME")
DASK_JOB_CLIENT_ENABLED = (
os.environ.get("DASK_JOB_CLIENT_ENABLED", "").lower() == "true"
)
EXPERIMENTAL_CHECKPOINTING_ENABLED = (
os.environ.get("EXPERIMENTAL_CHECKPOINTING_ENABLED", "").lower() == "true"
)
#####
# Indexing Configs
#####
# NOTE: Currently only supported in the Confluence and Google Drive connectors +
# only handles some failures (Confluence = handles API call failures, Google
# Drive = handles failures pulling files / parsing them)
CONTINUE_ON_CONNECTOR_FAILURE = os.environ.get(
"CONTINUE_ON_CONNECTOR_FAILURE", ""
).lower() not in ["false", ""]
# Controls how many worker processes we spin up to index documents in the
# background. This is useful for speeding up indexing, but does require a
# fairly large amount of memory in order to increase substantially, since
# each worker loads the embedding models into memory.
NUM_INDEXING_WORKERS = int(os.environ.get("NUM_INDEXING_WORKERS") or 1)
CHUNK_OVERLAP = 0
# More accurate results at the expense of indexing speed and index size (stores additional 4 MINI_CHUNK vectors)
ENABLE_MINI_CHUNK = os.environ.get("ENABLE_MINI_CHUNK", "").lower() == "true"
# Finer grained chunking for more detail retention
# Slightly larger since the sentence-aware split is a max cutoff, so most mini-chunks will be under MINI_CHUNK_SIZE
# tokens. But we need it to be at least as big as 1/4th chunk size to avoid having a tiny mini-chunk at the end
MINI_CHUNK_SIZE = 150
# Timeout to wait for job's last update before killing it, in hours
CLEANUP_INDEXING_JOBS_TIMEOUT = int(os.environ.get("CLEANUP_INDEXING_JOBS_TIMEOUT", 1))
#####
# Model Server Configs
#####
# If MODEL_SERVER_HOST is set, the NLP models required for Danswer are offloaded to the server via
# requests. Be sure to include the scheme in the MODEL_SERVER_HOST value.
MODEL_SERVER_HOST = os.environ.get("MODEL_SERVER_HOST") or None
MODEL_SERVER_ALLOWED_HOST = os.environ.get("MODEL_SERVER_ALLOWED_HOST") or "0.0.0.0"
MODEL_SERVER_PORT = int(os.environ.get("MODEL_SERVER_PORT") or "9000")
# specify this env variable directly to have a different model server for the background
# indexing job vs the api server so that background indexing does not affect query-time
# performance
INDEXING_MODEL_SERVER_HOST = (
os.environ.get("INDEXING_MODEL_SERVER_HOST") or MODEL_SERVER_HOST
)
#####
# Miscellaneous
#####
DYNAMIC_CONFIG_STORE = os.environ.get(
"DYNAMIC_CONFIG_STORE", "FileSystemBackedDynamicConfigStore"
)
DYNAMIC_CONFIG_DIR_PATH = os.environ.get("DYNAMIC_CONFIG_DIR_PATH", "/home/storage")
JOB_TIMEOUT = 60 * 60 * 6 # 6 hours default
# used to allow the background indexing jobs to use a different embedding
# model server than the API server
CURRENT_PROCESS_IS_AN_INDEXING_JOB = (
os.environ.get("CURRENT_PROCESS_IS_AN_INDEXING_JOB", "").lower() == "true"
)
# Logs every model prompt and output, mostly used for development or exploration purposes
LOG_ALL_MODEL_INTERACTIONS = (
os.environ.get("LOG_ALL_MODEL_INTERACTIONS", "").lower() == "true"
)
# If set to `true` will enable additional logs about Vespa query performance
# (time spent on finding the right docs + time spent fetching summaries from disk)
LOG_VESPA_TIMING_INFORMATION = (
os.environ.get("LOG_VESPA_TIMING_INFORMATION", "").lower() == "true"
)
# Anonymous usage telemetry
DISABLE_TELEMETRY = os.environ.get("DISABLE_TELEMETRY", "").lower() == "true"
# notset, debug, info, warning, error, or critical
LOG_LEVEL = os.environ.get("LOG_LEVEL", "info")


@@ -0,0 +1,75 @@
import os
from danswer.configs.model_configs import CHUNK_SIZE
PROMPTS_YAML = "./danswer/chat/prompts.yaml"
PERSONAS_YAML = "./danswer/chat/personas.yaml"
NUM_RETURNED_HITS = 50
NUM_RERANKED_RESULTS = 15
# We feed in document chunks until we reach this token limit.
# Default is ~5 full chunks (max chunk size is 2000 chars), although some chunks may be
# significantly smaller, which could result in passing in more total chunks.
# There is also a slight bit of overhead not accounted for here, such as separator patterns
# between the docs, metadata for the docs, etc.
# Finally, this is combined with the rest of the QA prompt, so don't set this too close to the
# model token limit
NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL = int(
os.environ.get("NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL") or (CHUNK_SIZE * 5)
)
DEFAULT_NUM_CHUNKS_FED_TO_CHAT: float = (
float(NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL) / CHUNK_SIZE
)
NUM_DOCUMENT_TOKENS_FED_TO_CHAT = int(
os.environ.get("NUM_DOCUMENT_TOKENS_FED_TO_CHAT") or (CHUNK_SIZE * 3)
)
# For selecting a different LLM question-answering prompt format
# Valid values: default, cot, weak
QA_PROMPT_OVERRIDE = os.environ.get("QA_PROMPT_OVERRIDE") or None
# 1 / (1 + DOC_TIME_DECAY * doc-age-in-years), set to 0 to have no decay
# Capped in Vespa at 0.5
DOC_TIME_DECAY = float(
os.environ.get("DOC_TIME_DECAY") or 0.5 # Hits limit at 2 years by default
)
FAVOR_RECENT_DECAY_MULTIPLIER = 2.0
# Currently this next one is not configurable via env
DISABLE_LLM_QUERY_ANSWERABILITY = QA_PROMPT_OVERRIDE == "weak"
DISABLE_LLM_FILTER_EXTRACTION = (
os.environ.get("DISABLE_LLM_FILTER_EXTRACTION", "").lower() == "true"
)
# Whether the LLM should evaluate all of the document chunks passed in for usefulness
# in relation to the user query
DISABLE_LLM_CHUNK_FILTER = (
os.environ.get("DISABLE_LLM_CHUNK_FILTER", "").lower() == "true"
)
# Whether the LLM should be used to decide if a search would help given the chat history
DISABLE_LLM_CHOOSE_SEARCH = (
os.environ.get("DISABLE_LLM_CHOOSE_SEARCH", "").lower() == "true"
)
# 1 edit per 20 characters, currently unused due to fuzzy match being too slow
QUOTE_ALLOWED_ERROR_PERCENT = 0.05
QA_TIMEOUT = int(os.environ.get("QA_TIMEOUT") or "60") # 60 seconds
# Include additional document/chunk metadata in prompt to GenerativeAI
INCLUDE_METADATA = False
# Keyword Search Drop Stopwords
# If the user has changed the default model, it is most likely to use a multilingual
# model; the stopwords are NLTK English stopwords, so in that case we would not want to drop keywords
if os.environ.get("EDIT_KEYWORD_QUERY"):
EDIT_KEYWORD_QUERY = os.environ.get("EDIT_KEYWORD_QUERY", "").lower() == "true"
else:
EDIT_KEYWORD_QUERY = not os.environ.get("DOCUMENT_ENCODER_MODEL")
# Weighting factor between Vector and Keyword Search, 1 for completely vector search
HYBRID_ALPHA = max(0, min(1, float(os.environ.get("HYBRID_ALPHA") or 0.66)))
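# Worked example: HYBRID_ALPHA=0.66 weights vector search at 0.66 (presumably
# leaving 0.34 for keyword search); out-of-range values are clamped, so
# HYBRID_ALPHA=1.5 becomes 1 (pure vector search).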
# Weighting factor between Title and Content of documents during search, 1 for completely
# Title based. Default heavily favors Content because Title is also included at the top of
# Content. This is to avoid cases where the Content is very relevant but it may not be clear
# if the title is separated out. Title is more of a "boost" than a separate field.
TITLE_CONTENT_RATIO = max(
0, min(1, float(os.environ.get("TITLE_CONTENT_RATIO") or 0.20))
)
# A list of languages passed to the LLM to rephrase the query
# For example "English,French,Spanish", be sure to use the "," separator
MULTILINGUAL_QUERY_EXPANSION = os.environ.get("MULTILINGUAL_QUERY_EXPANSION") or None
# The backend logic for this being True isn't fully supported yet
HARD_DELETE_CHATS = False


@@ -0,0 +1,108 @@
from enum import Enum
DOCUMENT_ID = "document_id"
CHUNK_ID = "chunk_id"
BLURB = "blurb"
CONTENT = "content"
SOURCE_TYPE = "source_type"
SOURCE_LINKS = "source_links"
SOURCE_LINK = "link"
SEMANTIC_IDENTIFIER = "semantic_identifier"
TITLE = "title"
SECTION_CONTINUATION = "section_continuation"
EMBEDDINGS = "embeddings"
TITLE_EMBEDDING = "title_embedding"
ALLOWED_USERS = "allowed_users"
ACCESS_CONTROL_LIST = "access_control_list"
DOCUMENT_SETS = "document_sets"
TIME_FILTER = "time_filter"
METADATA = "metadata"
METADATA_LIST = "metadata_list"
MATCH_HIGHLIGHTS = "match_highlights"
# stored in the `metadata` of a chunk. Used to signify that this chunk should
# not be used for QA. For example, Google Drive file types which can't be parsed
# are still useful as a search result but not for QA.
IGNORE_FOR_QA = "ignore_for_qa"
GEN_AI_API_KEY_STORAGE_KEY = "genai_api_key"
PUBLIC_DOC_PAT = "PUBLIC"
PUBLIC_DOCUMENT_SET = "__PUBLIC"
QUOTE = "quote"
BOOST = "boost"
DOC_UPDATED_AT = "doc_updated_at" # Indexed as seconds since epoch
PRIMARY_OWNERS = "primary_owners"
SECONDARY_OWNERS = "secondary_owners"
RECENCY_BIAS = "recency_bias"
HIDDEN = "hidden"
SCORE = "score"
ID_SEPARATOR = ":;:"
DEFAULT_BOOST = 0
SESSION_KEY = "session"
QUERY_EVENT_ID = "query_event_id"
LLM_CHUNKS = "llm_chunks"
# For chunking/processing chunks
TITLE_SEPARATOR = "\n\r\n"
SECTION_SEPARATOR = "\n\n"
# For combining attributes, doesn't have to be unique/perfect to work
INDEX_SEPARATOR = "==="
# Messages
DISABLED_GEN_AI_MSG = (
"Your System Admin has disabled the Generative AI functionalities of Danswer.\n"
"Please contact them if you wish to have this enabled.\n"
"You can still use Danswer as a search engine."
)
class DocumentSource(str, Enum):
# Special case, document passed in via Danswer APIs without specifying a source type
INGESTION_API = "ingestion_api"
SLACK = "slack"
WEB = "web"
GOOGLE_DRIVE = "google_drive"
REQUESTTRACKER = "requesttracker"
GITHUB = "github"
GURU = "guru"
BOOKSTACK = "bookstack"
CONFLUENCE = "confluence"
SLAB = "slab"
JIRA = "jira"
PRODUCTBOARD = "productboard"
FILE = "file"
NOTION = "notion"
ZULIP = "zulip"
LINEAR = "linear"
HUBSPOT = "hubspot"
DOCUMENT360 = "document360"
GONG = "gong"
GOOGLE_SITES = "google_sites"
ZENDESK = "zendesk"
class DocumentIndexType(str, Enum):
COMBINED = "combined" # Vespa
SPLIT = "split" # Typesense + Qdrant
class AuthType(str, Enum):
DISABLED = "disabled"
BASIC = "basic"
GOOGLE_OAUTH = "google_oauth"
OIDC = "oidc"
SAML = "saml"
class SearchFeedbackType(str, Enum):
ENDORSE = "endorse" # boost this document for all future queries
REJECT = "reject" # down-boost this document for all future queries
HIDE = "hide" # mark this document as untrusted, hide from LLM
UNHIDE = "unhide"
class MessageType(str, Enum):
# Using OpenAI standards, Langchain equivalent shown in comment
# System message is always constructed on the fly, not saved
SYSTEM = "system" # SystemMessage
USER = "user" # HumanMessage
ASSISTANT = "assistant" # AIMessage


@@ -0,0 +1,50 @@
import os
#####
# Danswer Slack Bot Configs
#####
DANSWER_BOT_NUM_RETRIES = int(os.environ.get("DANSWER_BOT_NUM_RETRIES", "5"))
DANSWER_BOT_ANSWER_GENERATION_TIMEOUT = int(
os.environ.get("DANSWER_BOT_ANSWER_GENERATION_TIMEOUT", "90")
)
# Number of docs to display in "Reference Documents"
DANSWER_BOT_NUM_DOCS_TO_DISPLAY = int(
os.environ.get("DANSWER_BOT_NUM_DOCS_TO_DISPLAY", "5")
)
# If the LLM fails to answer, Danswer can still show the "Reference Documents"
DANSWER_BOT_DISABLE_DOCS_ONLY_ANSWER = os.environ.get(
"DANSWER_BOT_DISABLE_DOCS_ONLY_ANSWER", ""
).lower() not in ["false", ""]
# When Danswer is considering a message, what emoji does it react with
DANSWER_REACT_EMOJI = os.environ.get("DANSWER_REACT_EMOJI") or "eyes"
# Should DanswerBot send an apology message if it's not able to find an answer
# That way the user isn't confused as to why DanswerBot reacted but then said nothing
# Off by default to be less intrusive (don't want to give a notif that just says we couldn't help)
NOTIFY_SLACKBOT_NO_ANSWER = (
os.environ.get("NOTIFY_SLACKBOT_NO_ANSWER", "").lower() == "true"
)
# Mostly for debugging purposes, but it explains what went wrong
# if DanswerBot couldn't find an answer
DANSWER_BOT_DISPLAY_ERROR_MSGS = os.environ.get(
"DANSWER_BOT_DISPLAY_ERROR_MSGS", ""
).lower() not in [
"false",
"",
]
# Default is to only respond in channels that are included by a Slack config set in the UI
DANSWER_BOT_RESPOND_EVERY_CHANNEL = (
os.environ.get("DANSWER_BOT_RESPOND_EVERY_CHANNEL", "").lower() == "true"
)
# Auto detect query options like time cutoff or heavily favor recently updated docs
DISABLE_DANSWER_BOT_FILTER_DETECT = (
os.environ.get("DISABLE_DANSWER_BOT_FILTER_DETECT", "").lower() == "true"
)
# Add a second LLM call post Answer to verify if the Answer is valid
# Throws out answers that don't directly or fully answer the user query
# This is the default for all DanswerBot channels unless the channel is configured individually
# Set/unset by "Hide Non Answers"
ENABLE_DANSWERBOT_REFLEXION = (
os.environ.get("ENABLE_DANSWERBOT_REFLEXION", "").lower() == "true"
)
# Chain of thought is currently not supported; it will probably be added back later
DANSWER_BOT_DISABLE_COT = True


@@ -0,0 +1,104 @@
import os
#####
# Embedding/Reranking Model Configs
#####
CHUNK_SIZE = 512
# Important considerations when choosing models
# Max tokens count needs to be high considering use case (at least 512)
# Models used must be MIT or Apache license
# Inference/Indexing speed
# https://huggingface.co/DOCUMENT_ENCODER_MODEL
# The usable models configured as below must be SentenceTransformer compatible
DOCUMENT_ENCODER_MODEL = (
os.environ.get("DOCUMENT_ENCODER_MODEL") or "thenlper/gte-small"
)
# If the below is changed, Vespa deployment must also be changed
DOC_EMBEDDING_DIM = 384
# Model should be chosen with 512 context size, ideally don't change this
DOC_EMBEDDING_CONTEXT_SIZE = 512
NORMALIZE_EMBEDDINGS = (
os.environ.get("NORMALIZE_EMBEDDINGS") or "False"
).lower() == "true"
# These are only used if reranking is turned off, to normalize the direct retrieval scores for display
# Currently unused
SIM_SCORE_RANGE_LOW = float(os.environ.get("SIM_SCORE_RANGE_LOW") or 0.0)
SIM_SCORE_RANGE_HIGH = float(os.environ.get("SIM_SCORE_RANGE_HIGH") or 1.0)
# Certain models like e5, BGE, etc use a prefix for asymmetric retrievals (query generally shorter than docs)
ASYM_QUERY_PREFIX = os.environ.get("ASYM_QUERY_PREFIX", "")
ASYM_PASSAGE_PREFIX = os.environ.get("ASYM_PASSAGE_PREFIX", "")
# Purely an optimization, memory limitation consideration
BATCH_SIZE_ENCODE_CHUNKS = 8
# This controls the minimum number of pytorch "threads" to allocate to the embedding
# model. If torch finds more threads on its own, this value is not used.
MIN_THREADS_ML_MODELS = int(os.environ.get("MIN_THREADS_ML_MODELS") or 1)
# Cross Encoder Settings
ENABLE_RERANKING_ASYNC_FLOW = (
os.environ.get("ENABLE_RERANKING_ASYNC_FLOW", "").lower() == "true"
)
ENABLE_RERANKING_REAL_TIME_FLOW = (
os.environ.get("ENABLE_RERANKING_REAL_TIME_FLOW", "").lower() == "true"
)
# https://www.sbert.net/docs/pretrained-models/ce-msmarco.html
CROSS_ENCODER_MODEL_ENSEMBLE = [
"cross-encoder/ms-marco-MiniLM-L-4-v2",
"cross-encoder/ms-marco-TinyBERT-L-2-v2",
]
# For score normalizing purposes, only way is to know the expected ranges
CROSS_ENCODER_RANGE_MAX = 12
CROSS_ENCODER_RANGE_MIN = -12
CROSS_EMBED_CONTEXT_SIZE = 512
# Unused currently, can't be used with the current default encoder model due to its output range
SEARCH_DISTANCE_CUTOFF = 0
# Intent model max context size
QUERY_MAX_CONTEXT_SIZE = 256
# Danswer custom Deep Learning Models
INTENT_MODEL_VERSION = "danswer/intent-model"
#####
# Generative AI Model Configs
#####
# If changing GEN_AI_MODEL_PROVIDER or GEN_AI_MODEL_VERSION from the default,
# be sure to use one that is LiteLLM compatible:
# https://litellm.vercel.app/docs/providers/azure#completion---using-env-variables
# The provider is the prefix before / in the model argument
# Additionally Danswer supports GPT4All and custom request library based models
# Set GEN_AI_MODEL_PROVIDER to "custom" to use the custom requests approach
# Set GEN_AI_MODEL_PROVIDER to "gpt4all" to use gpt4all models running locally
GEN_AI_MODEL_PROVIDER = os.environ.get("GEN_AI_MODEL_PROVIDER") or "openai"
# If using Azure, it's the engine name, for example: Danswer
GEN_AI_MODEL_VERSION = os.environ.get("GEN_AI_MODEL_VERSION") or "gpt-3.5-turbo"
# For secondary flows like extracting filters or deciding if a chunk is useful, we don't need
# as powerful a model as, say, GPT-4, so we can use an alternative that is faster and cheaper
FAST_GEN_AI_MODEL_VERSION = (
os.environ.get("FAST_GEN_AI_MODEL_VERSION") or GEN_AI_MODEL_VERSION
)
# Set this if the Generative AI model requires an API key for access; otherwise it can be left blank
GEN_AI_API_KEY = (
os.environ.get("GEN_AI_API_KEY", os.environ.get("OPENAI_API_KEY")) or None
)
# API Base, such as (for Azure): https://danswer.openai.azure.com/
GEN_AI_API_ENDPOINT = os.environ.get("GEN_AI_API_ENDPOINT") or None
# API Version, such as (for Azure): 2023-09-15-preview
GEN_AI_API_VERSION = os.environ.get("GEN_AI_API_VERSION") or None
# LiteLLM custom_llm_provider
GEN_AI_LLM_PROVIDER_TYPE = os.environ.get("GEN_AI_LLM_PROVIDER_TYPE") or None
# Set this to be enough for an answer + quotes. Also used for Chat
GEN_AI_MAX_OUTPUT_TOKENS = int(os.environ.get("GEN_AI_MAX_OUTPUT_TOKENS") or 1024)
# This next restriction is only used for chat ATM, used to expire old messages as needed
GEN_AI_MAX_INPUT_TOKENS = int(os.environ.get("GEN_AI_MAX_INPUT_TOKENS") or 3000)
# History for secondary LLM flows, not the primary chat flow; generally we don't need to
# include as much history as possible, as this just bumps up the cost unnecessarily
GEN_AI_HISTORY_CUTOFF = int(0.5 * GEN_AI_MAX_INPUT_TOKENS)
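# Worked example: with the default GEN_AI_MAX_INPUT_TOKENS of 3000, the history
# cutoff for secondary flows is 0.5 * 3000 = 1500 tokens.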
GEN_AI_TEMPERATURE = float(os.environ.get("GEN_AI_TEMPERATURE") or 0)


@@ -0,0 +1,84 @@
<!-- DANSWER_METADATA={"link": "https://github.com/danswer-ai/danswer/blob/main/backend/danswer/connectors/README.md"} -->
# Writing a new Danswer Connector
This README covers how to contribute a new Connector for Danswer. It includes an overview of the design, interfaces,
and required changes.
Thank you for your contribution!
### Connector Overview
Connectors come in 3 different flows:
- Load Connector:
- Bulk indexes documents to reflect a point in time. This type of connector generally works by either pulling all
documents via the source's API or loading them from some sort of dump file.
- Poll connector:
- Incrementally updates documents based on a provided time range. It is used by the background job to pull the latest
additions and changes since the last round of polling. This connector helps keep the document index up to date
without needing to fetch/embed/index every document, which would generally be too slow to do frequently on large sets of
documents.
- Event Based connectors:
- Connectors that listen to events and update documents accordingly.
- Currently not used by the background job, this exists for future design purposes.
### Connector Implementation
Refer to [interfaces.py](https://github.com/danswer-ai/danswer/blob/main/backend/danswer/connectors/interfaces.py)
and this first contributor created Pull Request for a new connector (Shoutout to Dan Brown):
[Reference Pull Request](https://github.com/danswer-ai/danswer/pull/139)
#### Implementing the new Connector
The connector must subclass one or more of LoadConnector, PollConnector, or EventConnector.
The `__init__` should take arguments for configuring what documents the connector will fetch and where it finds those
documents. For example, if you have a wiki site, it may include the configuration for the team, topic, folder, etc. of
the documents to fetch. It may also include the base domain of the wiki. Alternatively, if all the access information
of the connector is stored in the credential/token, then there may be no required arguments.
`load_credentials` should take a dictionary which provides all the access information that the connector might need.
For example this could be the user's username and access token.
Refer to the existing connectors for `load_from_state` and `poll_source` examples. There is not yet a process to listen
for EventConnector events; this will come down the line.
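As a rough illustration, a connector supporting both the load and poll flows might be shaped as below. The class name, constructor arguments, and credential fields are placeholders (not from the repo); see `interfaces.py` for the authoritative signatures and return types.
```python
from typing import Any

from danswer.connectors.interfaces import LoadConnector, PollConnector


class NewConnector(LoadConnector, PollConnector):
    def __init__(self, space: str, batch_size: int = 16) -> None:
        # Configuration describing *which* documents to fetch
        self.space = space
        self.batch_size = batch_size
        self.access_token: str | None = None

    def load_credentials(self, credentials: dict[str, Any]) -> dict[str, Any] | None:
        # All access information lives in the credential, e.g. a token
        self.access_token = credentials["access_token"]
        return None

    def load_from_state(self):
        # Yield batches of Document objects representing a point-in-time dump
        ...

    def poll_source(self, start, end):
        # Yield batches of Documents added/updated between the two timestamps
        ...
```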
#### Development Tip
It may be handy to test your new connector separately from the rest of the stack while developing.
Follow the below template:
```commandline
if __name__ == "__main__":
import time
test_connector = NewConnector(space="engineering")
test_connector.load_credentials({
"user_id": "foobar",
"access_token": "fake_token"
})
all_docs = test_connector.load_from_state()
current = time.time()
one_day_ago = current - 24 * 60 * 60 # 1 day
latest_docs = test_connector.poll_source(one_day_ago, current)
```
### Additional Required Changes:
#### Backend Changes
- Add a new type to
[DocumentSource](https://github.com/danswer-ai/danswer/blob/main/backend/danswer/configs/constants.py)
- Add a mapping from DocumentSource (and optionally connector type) to the right connector class
[here](https://github.com/danswer-ai/danswer/blob/main/backend/danswer/connectors/factory.py#L33)
#### Frontend Changes
- Create the new connector directory and admin page under `danswer/web/src/app/admin/connectors/`
- Create the new icon, type, source, and filter changes
(refer to existing [PR](https://github.com/danswer-ai/danswer/pull/139))
#### Docs Changes
Create the new connector page (with guiding images!) with how to get the connector credentials and how to set up the
connector in Danswer. Then create a Pull Request in https://github.com/danswer-ai/danswer-docs
### Before opening PR
1. Be sure to fully test changes end to end by setting up the connector and updating the index with new docs from the
new connector.
2. Be sure to run the linting/formatting, refer to the formatting and linting section in
[CONTRIBUTING.md](https://github.com/danswer-ai/danswer/blob/main/CONTRIBUTING.md#formatting-and-linting)


@@ -0,0 +1,56 @@
from typing import Any
import requests
class BookStackClientRequestFailedError(ConnectionError):
def __init__(self, status: int, error: str) -> None:
super().__init__(
"BookStack Client request failed with status {status}: {error}".format(
status=status, error=error
)
)
class BookStackApiClient:
def __init__(
self,
base_url: str,
token_id: str,
token_secret: str,
) -> None:
self.base_url = base_url
self.token_id = token_id
self.token_secret = token_secret
def get(self, endpoint: str, params: dict[str, str]) -> dict[str, Any]:
url: str = self._build_url(endpoint)
headers = self._build_headers()
response = requests.get(url, headers=headers, params=params)
try:
json = response.json()
except Exception:
json = {}
if response.status_code >= 300:
error = response.reason
response_error = json.get("error", {}).get("message", "")
if response_error:
error = response_error
raise BookStackClientRequestFailedError(response.status_code, error)
return json
def _build_headers(self) -> dict[str, str]:
auth = "Token " + self.token_id + ":" + self.token_secret
return {
"Authorization": auth,
"Accept": "application/json",
}
def _build_url(self, endpoint: str) -> str:
return self.base_url.rstrip("/") + "/api/" + endpoint.lstrip("/")
def build_app_url(self, endpoint: str) -> str:
return self.base_url.rstrip("/") + "/" + endpoint.lstrip("/")
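# Hedged usage sketch (the endpoint and params below are illustrative, not taken
# from this diff):
#   client = BookStackApiClient(
#       base_url="https://bookstack.example.com",
#       token_id="<token id>",
#       token_secret="<token secret>",
#   )
#   books = client.get("/books", params={"count": "50"})   # GET .../api/books
#   page_url = client.build_app_url("/books/some-book")    # .../books/some-book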

Some files were not shown because too many files have changed in this diff.