Compare commits


543 Commits

Author SHA1 Message Date
Weves
5da0751ac6 Move web Dockerfile to single stage since multi-stage has been causing issues w/ github actions 2024-05-17 00:34:18 -07:00
Weves
c934ed78f4 Upgrade FE packages 2024-05-16 22:13:02 -07:00
Weves
0369ddef58 Fix document set editor refresh 2024-05-16 14:42:28 -07:00
Weves
8c17c77ed9 Small fixes to LLM configuration screen 2024-05-14 22:51:28 -07:00
Weves
f087d3eac0 Fix sleep + handle duplicates 2024-05-14 17:44:33 -07:00
Weves
10232c7c54 Bump openai version 2024-05-14 17:28:50 -07:00
Weves
7928ea2fff Improve document set page UX 2024-05-14 15:43:26 -07:00
Weves
05bc6b1c65 Add pagination to document set syncing + improve speed 2024-05-14 15:43:26 -07:00
Weves
6f90308278 Add gpt-4o support 2024-05-14 13:32:00 -07:00
Weves
d0850a0288 Fix model names for enabled LLM providers 2024-05-14 13:32:00 -07:00
Weves
e573ba80b9 Revert black bump 2024-05-14 00:20:03 -07:00
Weves
5d1a81001e bump litellm 2024-05-13 18:29:33 -07:00
Weves
8b95395f34 Remove chunk limit of 20 2024-05-13 11:35:16 -07:00
Weves
e8b38d5f63 Hide search tool if no connectors exist 2024-05-13 01:22:37 -07:00
Weves
c2cdce4d49 Tool calling framework 2024-05-13 00:47:39 -07:00
Yuhong Sun
546815dc8c Consolidate File Processing (#1449) 2024-05-11 23:11:22 -07:00
Yuhong Sun
e89c81de76 Make User Promotion Demotion sync calls (#1448) 2024-05-11 16:25:56 -07:00
Ryan Gordon
5bf123da53 Added user demotion functionality. (#1444) 2024-05-11 15:59:47 -07:00
Yuhong Sun
7a02fd7ad7 Touchups from Contributor PRs (#1447) 2024-05-11 15:58:33 -07:00
Weves
4e759717ab Fix mypy 2024-05-11 12:36:49 -07:00
Weves
2e0be9f2da Folder support 2024-05-11 12:29:35 -07:00
mattboret
eb1b604b8c Allow defining custom conditions for the answer validation prompt (#1347)
Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2024-05-11 10:43:05 -07:00
Shravan Vishwanathan
b8af38bb95 Refactor comment extraction in JIRA connector to handle nested content (#1329)
- Implement `extract_text_from_content` to parse nested text elements from comment bodies.
- Modify `_get_comment_strs` to use the new text extraction method, improving handling of various content structures.
2024-05-11 10:41:27 -07:00
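A rough sketch of the recursive extraction this commit describes, using the extract_text_from_content name from the commit message; the nested comment schema (nodes with optional "text" and "content" fields, as in Atlassian Document Format) is an assumption for illustration:

from typing import Any

def extract_text_from_content(content: list[dict[str, Any]]) -> list[str]:
    # Each node may carry its own "text" and/or a nested "content" list
    # (paragraphs, bullet lists, mentions, ...), so recurse through both.
    texts: list[str] = []
    for node in content:
        if "text" in node:
            texts.append(node["text"])
        if "content" in node:
            texts.extend(extract_text_from_content(node["content"]))
    return texts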
dependabot[bot]
cfd9159b27 Bump aiohttp from 3.9.2 to 3.9.4 in /backend/requirements (#1349) 2024-05-11 10:33:08 -07:00
dependabot[bot]
52fd18d3bd Bump pydantic from 1.10.7 to 1.10.13 in /backend/requirements (#1377) 2024-05-11 10:32:31 -07:00
EdmundKorley
b72e6861e7 Add handling for unsupported block types in NotionConnector (#1231) 2024-05-11 10:32:02 -07:00
Davy Peter Braun
20a22e2bc0 fix(config): password auth to be url-encoded to avoid some deployment errors (#1422) 2024-05-11 10:29:51 -07:00
mattboret
a467999984 Add Slack feedback reminder (#1262)
---------

Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2024-05-11 10:27:09 -07:00
mattboret
1729f78930 set follow-up emoji on an invalid answer (#1263)
Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2024-05-11 10:16:42 -07:00
dependabot[bot]
94a6db51c8 Bump black from 23.3.0 to 24.3.0 in /backend/requirements (#1236) 2024-05-11 10:14:52 -07:00
Matthew Holland
d729066194 Feature: Added File connector support for .docx, .pptx, .xlsx, .csv, .eml, and .epub file types (#1284) 2024-05-10 19:06:13 -07:00
Yuhong Sun
c6b45a550f Update Launch Json (#1443) 2024-05-10 16:52:10 -07:00
Yuhong Sun
34d05f4599 Mypy fixes for default configs (#1442) 2024-05-10 16:46:28 -07:00
Weves
7f1ffa3921 Add predefined feedback option 2024-05-09 19:11:17 -07:00
Weves
957d3625c2 Add autorefresh for document sets page 2024-05-09 14:33:55 -07:00
Weves
683addc390 Use Vespa Visit to handle long documents 2024-05-09 14:33:55 -07:00
Moshe Zada
2952b1dd96 Split slack messages up to 3K messages (#1379) 2024-05-09 11:05:02 -07:00
Moshe Zada
9e08ab98a0 show error when warming up encoders (#1314) 2024-05-09 10:52:54 -07:00
Bijay Regmi
436806f2e3 add gpu support and README for documentation (#1398) 2024-05-09 10:51:37 -07:00
JayGhiya
ffea041398 Helm Chart Support (#1177) 2024-05-08 18:06:00 -07:00
Weves
eef54c8a86 Add non-ee fallback to fetch_versioned_implementation 2024-05-08 16:26:45 -07:00
Weves
7ed176b7cc Lock improvement 2024-05-08 16:11:56 -07:00
Weves
8cbf7c8097 Custom LLM provider fix 2024-05-07 17:17:38 -07:00
Weves
76a5f26fe1 Add display names to LLMProvider + allow multiple configs from the same provider 2024-05-07 16:26:04 -07:00
Weves
d6522426c9 Make access key and secret optional for AWS Bedrock 2024-05-07 01:11:09 -07:00
Yuhong Sun
45d5d7af4a Ingestion API Additions (#1424) 2024-05-06 21:53:35 -07:00
Yuhong Sun
01476a37c3 Encrypted Sensitive Fields (#1423) 2024-05-06 18:02:42 -07:00
Yuhong Sun
060a8d0aad Discourse Connector (#1420) 2024-05-05 16:54:08 -07:00
Yuhong Sun
03911de8b2 Danswerbot Stats (#1421) 2024-05-04 18:12:54 -07:00
Weves
1d3d84456a Small cleanup 2024-05-03 17:41:51 -07:00
Yuhong Sun
745f68241d Chat Folders Backend (#1419) 2024-05-03 16:37:18 -07:00
Mehmet Bektas
6cbfe1bcdb support for passing extra headers to litellm using env variables 2024-05-03 14:42:22 -07:00
mattboret
2ff207218e Confluence: Add config to index only active pages (#1348)
Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2024-05-03 09:04:09 -07:00
Vikas Neha Ojha
143b50c519 Save correct document url from document360 (#1413) 2024-05-01 18:28:34 -07:00
Yuhong Sun
577c870acb Manage me endpoint key rename (#1410) 2024-04-30 23:27:33 -07:00
Yuhong Sun
7b94159115 Api key email display for manage/me endpoint (#1409) 2024-04-30 22:05:40 -07:00
Weves
96762cfe44 Misc UI improvements 2024-04-30 20:54:59 -07:00
Weves
b89e9127d7 Fix double message send 2024-04-30 13:10:40 -07:00
Weves
3fb68af405 Address rate limiting for Notion 2024-04-30 01:43:08 -07:00
Weves
5b93e786ad Add image upload capabilities 2024-04-29 01:53:34 -07:00
Davy Peter Braun
350e548b2d fix(gitignore): properly ignore celery-generated file 2024-04-28 11:23:07 -07:00
Weves
a2156dd836 Cancel scheduled indexing attempts on deletion request 2024-04-27 16:02:30 -07:00
Weves
a19290cb27 Address 'PGRES_TUPLES_OK and no message from the libpq' issues 2024-04-27 14:42:18 -07:00
Weves
f5b3333df3 Add UI-based LLM selection 2024-04-27 01:41:26 -07:00
Yuhong Sun
4c740060aa Fix Citation Sort Tiebreak (#1397) 2024-04-26 18:33:09 -07:00
Yuhong Sun
6f2d6fc5f2 Set message logging to debug (#1396) 2024-04-26 17:27:56 -07:00
Weves
73d94086d6 Fix compose file dependencies 2024-04-26 16:57:20 -07:00
Weves
9211334597 Fix document360 mypy issue 2024-04-26 00:49:00 -07:00
Weves
60d5abae3c Add token rate limit tables to MIT 2024-04-26 00:37:28 -07:00
Weves
85a8f9926c Fix credential form refresh 2024-04-25 17:46:21 -07:00
Vikas Neha Ojha
fe03747a1a Support looping through all nested subcategories (#1382)
* Fix for parser failing if doc is blank

* Support looping through all nested child categories
2024-04-25 17:27:30 -07:00
Yuhong Sun
ead7a80297 Fix Tag Integer Enums (#1388) 2024-04-25 17:20:35 -07:00
Yuhong Sun
d756ad34f3 Connectors Whitelist Fix (#1387) 2024-04-25 16:43:25 -07:00
Yuhong Sun
b4842e3a0d Rework Disabled Connector options to Whitelist instead (#1386) 2024-04-25 16:23:11 -07:00
Mehmet Bektas
ee6b8b7f50 properly yield non-streaming responses 2024-04-25 14:49:25 -07:00
Weves
648f2d06bf Add env variable to disable streaming for the DefaultMultiLLM class 2024-04-25 09:22:14 -07:00
Yuhong Sun
66d95690cb Disabled Connectors List (#1376) 2024-04-24 17:24:17 -07:00
Yuhong Sun
d2774f8979 k 2024-04-24 16:50:40 -07:00
Jignesh Solanki
0b1695f616 fix error: NextRouter was not mounted 2024-04-23 11:05:21 -07:00
Yuhong Sun
8b4e55ca82 Vespa Batch Size (#1368) 2024-04-22 23:45:54 -07:00
Yuhong Sun
7044cae0e2 Remove Nested DB Sessions (#1367) 2024-04-22 23:05:43 -07:00
Weves
832d40e490 Allow separate vespa config server host 2024-04-22 21:36:54 -07:00
Weves
df216eafa5 Assistant rework fixes 2024-04-22 13:16:13 -07:00
Weves
b407edbe49 Personal assistants 2024-04-21 21:06:16 -07:00
Yuhong Sun
f616b7e6e5 Web Connector to only allow Global IPs (#1357) 2024-04-20 15:24:00 -07:00
Yuhong Sun
7d51549b1b Remove Unused Volumes (#1356) 2024-04-20 10:27:41 -07:00
Yuhong Sun
4e9605e652 Only Log Index Attempt CC Pair Miscount (#1355) 2024-04-20 09:25:08 -07:00
Yuhong Sun
58545ccf3a Pre download models (#1354) 2024-04-19 21:52:53 -07:00
Yuhong Sun
87f304dfd0 Swap Index Early (#1353) 2024-04-19 10:38:15 -07:00
Weves
82b9cb4cc1 Add check to ensure auth is enabled for every endpoint unless explicitly whitelisted 2024-04-19 01:26:24 -07:00
Yuhong Sun
e361e92230 Healthcheck for model server (#1350) 2024-04-18 16:22:38 -07:00
Yuhong Sun
89ff07a96b Slack improvement 2024-04-17 21:38:00 -07:00
Yuhong Sun
be12e4fa64 Double Check Files/URLs (#1344) 2024-04-17 21:34:29 -07:00
Yuhong Sun
26f8d884e1 Allow NLTK Failures (#1340) 2024-04-17 21:34:29 -07:00
Alan Hagedorn
654c103f36 test 2024-04-15 12:20:04 -07:00
Yuhong Sun
599db71238 Permission Sync Models (#1334) 2024-04-14 23:29:32 -07:00
Yuhong Sun
1b41ec2b50 Remove Search Only Model (#1331) 2024-04-14 18:49:07 -07:00
Yuhong Sun
a17060af5a Provide Additional Context for Chunk Options in APIs (#1330) 2024-04-14 18:32:22 -07:00
Alan Hagedorn
b9b1e22fac Add name to API Key (#1327) 2024-04-13 12:36:46 -07:00
Yuhong Sun
d2d042a2cc Add Container Descriptions (#1326) 2024-04-13 12:10:46 -07:00
2pac
7810e931f3 fix: Consider Hubspot ticket notes body 2024-04-12 10:14:18 -07:00
2pac
6be5f51440 fix: Stop reading Notion pages on polling 2024-04-12 09:27:45 -07:00
Yuhong Sun
b59912884b Fix Model Server (#1320) 2024-04-10 23:13:22 -07:00
Yuhong Sun
f346c2fc86 Axero Link Fix (#1317) 2024-04-10 09:44:12 -07:00
Weves
714a3c867d Add option to skip Jira tickets with a certain label 2024-04-09 19:40:36 -07:00
Weves
31bfbe5d16 Fix chat sharing 2024-04-09 11:57:09 -07:00
Weves
dac4be62e0 Fix prod compose files 2024-04-08 16:01:56 -07:00
Yuhong Sun
b432d42205 Mypy Fix (#1308) 2024-04-08 00:52:14 -07:00
Yuhong Sun
2db906b7a2 Always Use Model Server (#1306) 2024-04-07 21:25:06 -07:00
Chris Weaver
795243283d Update README.md
Remove 'Danswer is the ChatGPT for teams'
2024-04-07 14:30:26 -07:00
Weves
eb367de44d Small token budget tweaks 2024-04-04 20:58:45 -07:00
Chris Weaver
447791b455 Token budgets (#1302)
---------

Co-authored-by: Nick Donohue <ndonohue@gmail.com>
2024-04-04 20:43:24 -07:00
Weves
7ba7224929 Allow seeding of chat sessions via POST 2024-04-04 12:59:39 -07:00
Yuhong Sun
33da86c802 Reranker Warning Log (#1299) 2024-04-04 06:59:48 -07:00
Yuhong Sun
58dc620c28 Add Check for Enabling Reranking (#1298) 2024-04-04 06:29:00 -07:00
Yuhong Sun
7298cc2835 Add verbose logging in case of query failure (#1297) 2024-04-04 05:30:23 -07:00
Yuhong Sun
4abf5f27a0 Axero Forums Support (#1287) 2024-04-04 03:51:10 -07:00
Weves
c7efce3bde Enable bedrock models in dev compose file 2024-04-03 23:21:10 -07:00
ThomaciousD
d329061f92 Fixed: Web connector - documents deleted when no internet #1161 (#1292)
* fixing check connection before scrape in web connector #1161

* reformat

---------

Co-authored-by: ThomaciousD <ThomaciousD@me>
2024-04-02 23:17:53 -07:00
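A minimal sketch of the idea behind this fix, with an assumed helper name and probe URL: fail the indexing attempt loudly when offline instead of scraping zero pages, which downstream cleanup would read as "every document was deleted".

import requests

def check_internet_connection(url: str = "https://docs.danswer.dev") -> None:
    # Raising here aborts the indexing attempt; silently returning no
    # documents would instead look like every indexed page disappeared.
    try:
        requests.get(url, timeout=3).raise_for_status()
    except requests.RequestException as e:
        raise ConnectionError(f"Unable to reach {url} - check your internet connection") from e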
Weves
b06b95dc3a Bump litellm version to support latest Anthropic models 2024-04-02 19:32:19 -07:00
Yuhong Sun
b0e0557d63 Update Contributing (#1288) 2024-04-01 22:41:40 -07:00
Weves
87019fc18e Notion 404 graceful error handling 2024-04-01 17:37:35 -07:00
Weves
e82061a5ec Add support for specifying title via search params 2024-04-01 11:25:57 -07:00
Weves
0b0fc785a1 Fix fetch settings SS 2024-03-31 23:51:11 -07:00
Yuhong Sun
5b8cdd4eee Gpt-3.5-0125 Option (#1282) 2024-03-31 21:58:00 -07:00
Weves
a4869b727d Add ability to control available pages 2024-03-31 21:49:34 -07:00
Yuhong Sun
15f7b42e2b More Options for Search APIs (#1280) 2024-03-31 21:45:46 -07:00
Weves
32f55ddb8f URL-based chat seeding 2024-03-31 18:10:49 -07:00
Yuhong Sun
b8af1377ba Trivy Ignore Path (#1278) 2024-03-31 16:22:48 -07:00
Yuhong Sun
29f251660b Trivy Security Scan (#1277) 2024-03-31 15:32:22 -07:00
Yuhong Sun
783696a671 Axero Spaces (#1276) 2024-03-31 14:45:20 -07:00
Yuhong Sun
22477b1aca Chunker Gmail Issue Logging (#1274) 2024-03-30 00:18:57 -07:00
Weves
49acde0a8f URL-based chat sharing 2024-03-29 00:51:17 -07:00
Yuhong Sun
055cab2944 Public Slack Feedback Option (#1270) 2024-03-28 18:21:28 -07:00
Yuhong Sun
f46e65be92 Save One Shot Docs (#1269) 2024-03-28 12:48:01 -07:00
Yuhong Sun
d46b475410 Make porting from persistent volumes optional (#1268) 2024-03-28 11:26:11 -07:00
Yuhong Sun
fd69203be8 More accurate input token count for LLM (#1267) 2024-03-28 11:11:37 -07:00
Yuhong Sun
9757fbee90 Axero Connector (#1253)
---------

Co-authored-by: Weves <chrisweaver101@gmail.com>
2024-03-27 11:12:01 -07:00
Weves
5a967322fd Add ability to specify custom embedding models 2024-03-27 00:02:51 -07:00
Yuhong Sun
fbff5b5784 Save Retrieved Docs for One Shot Flows (#1259) 2024-03-26 22:48:40 -07:00
Weves
efc7d6e098 Add support for Github Flavored Markdown 2024-03-26 11:15:06 -07:00
Weves
f135ba9c0c Rework LLM answering flow 2024-03-25 13:34:03 -07:00
Weves
1ba74ee4df Refactor search pipeline 2024-03-25 13:34:03 -07:00
Yuhong Sun
7a861ecec4 Session Dependency for Chat Streaming (#1256) 2024-03-24 19:40:06 -07:00
Johannes Vass
3107edc921 Do not obtain DB session via Depends() (#1238)
Endpoints that use Depends(get_session) with a StreamingResponse have
the problem that Depends() releases the session again after the endpoint
function returns. At that point, the streaming response is not
finished yet but still holds a reference to the session and uses it.
However, there is no cleanup of the session after the answer stream
finishes which leads to the connections accumulating in state "idle in
transaction".

This was due to a breaking change in FastAPI 0.106.0
https://fastapi.tiangolo.com/release-notes/#01060

Co-authored-by: Johannes Vass <johannes.vass@cloudflight.io>
2024-03-24 19:31:07 -07:00
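A minimal sketch of the resulting pattern, with illustrative names (router, run_answer_stream) rather than Danswer's actual code: the session is opened inside the generator, so it is released when the stream finishes instead of when the endpoint function returns.

from collections.abc import Iterator

from fastapi import APIRouter
from fastapi.responses import StreamingResponse
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

router = APIRouter()
engine = create_engine("sqlite://")  # placeholder; the real app uses its Postgres engine

def run_answer_stream(session: Session) -> Iterator[str]:
    # Stand-in for the real answer pipeline, which reads via the session.
    yield "packet-1\n"
    yield "packet-2\n"

@router.post("/stream-answer")
def stream_answer() -> StreamingResponse:
    def packet_generator() -> Iterator[str]:
        # Opened here instead of via Depends(get_session): the session now
        # lives exactly as long as the stream and is closed afterwards,
        # avoiding connections stuck "idle in transaction".
        with Session(engine) as session:
            yield from run_answer_stream(session)

    return StreamingResponse(packet_generator(), media_type="application/json")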
Yuhong Sun
49263ed146 Linting (#1255) 2024-03-24 19:07:57 -07:00
Matthew Holland
bd1df9649b Added check for internet connection (#1214) 2024-03-24 19:04:40 -07:00
Yuhong Sun
d3674b02e6 Add Llama2 Prompt Option (#1254) 2024-03-24 19:01:38 -07:00
Arslan
b28b3cfa37 Make searching docs the default option (#904) 2024-03-24 18:54:38 -07:00
Arnaud Ritti
12e8fd852c feat: add Helm chart (#1186) 2024-03-24 18:37:27 -07:00
Weves
b8f767adf2 Fix persona client side error 2024-03-23 14:58:01 -07:00
Arthur De Kimpe
920d059da5 Bugfix: Support more Confluence Cloud hostname (*.jira.com) (#1244) 2024-03-23 14:04:26 -07:00
Yuhong Sun
aaa7b26a4d Remove All Enums from Postgres (#1247) 2024-03-22 23:01:05 -07:00
Weves
89e72783a7 Add some private Persona / Document Set stuff 2024-03-22 21:44:31 -07:00
Weves
ec48142a2d Move some of the user re-work stuff to MIT repo 2024-03-22 16:29:24 -07:00
Yuhong Sun
c28a95e367 Port File Store from Volume to PG (#1241) 2024-03-21 20:10:08 -07:00
Weves
8dbe5cbaa6 Add private Persona / Document Set migration 2024-03-21 19:57:51 -07:00
Yuhong Sun
d66b6c0559 Fix Tag Document Source Enum (#1240) 2024-03-21 12:27:56 -07:00
Weves
6a776648b3 Fix LLM max tokens 2024-03-19 18:02:28 -07:00
Yuhong Sun
3a6d32da7c Port KV Store to Postgres (#1227) 2024-03-19 16:21:22 -07:00
Yuhong Sun
fab2be510a Update README (#1226) 2024-03-19 00:22:40 -07:00
Weves
04ae8b1bf9 Increase SQLAlchemy pool size 2024-03-18 12:19:49 -07:00
Weves
4b9c4667f6 Increase connection pool size 2024-03-18 12:19:49 -07:00
Weves
8e89d00e32 Improve Confluence rate limit handling 2024-03-14 19:33:16 -07:00
Yuhong Sun
f45e2476d0 Sharepoint Logging (#1218) 2024-03-14 18:35:31 -07:00
Yuhong Sun
4036e7c6c6 Remove DocumentSource Enum from postgres (#1217) 2024-03-14 18:19:40 -07:00
Kevin Shi
2a8e53c94f Skip draft zendesk articles 2024-03-11 12:00:18 -07:00
Yuhong Sun
90a6e23546 Jira Version Option (#1205) 2024-03-10 12:30:47 -07:00
teocns
19c7ebdc26 connector: ensure absolute URL integrity (#1196) 2024-03-10 01:04:05 -08:00
George 
f292ede85a Jira connector improvements (#1199)
* Jira connector:
- Add feature exclude comments from particular users
- Add feature common and custom fields to jira tasks

Fix bug on web in ConfigDisplay.tsx

* Mypy fixes

* move to metadata field

* k

---------

Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2024-03-10 01:00:19 -08:00
Riccardo Schirone
0442513539 backend: remove duplicated word in ANSWER_VALIDITY_PROMPT 2024-03-09 16:14:39 -08:00
Yuhong Sun
db77d8d7cc File connector metadata (#1203) 2024-03-09 16:13:37 -08:00
Dan Brown
fd5294ed82 Fix broken link when specified in metadata (#1200) 2024-03-09 12:49:15 -08:00
Weves
e752e6d671 Fix bug with persona creation caused by starter messages 2024-03-08 00:57:14 -08:00
Yuhong Sun
3f1cd1ad12 Better description of the document index interfaces (#1188) 2024-03-06 00:07:12 -08:00
Chris Weaver
2ace03081c More fetch_versioned_implementation logging (#1187)
---------

Co-authored-by: Kevin Shi <kevinshisvf@gmail.com>
2024-03-05 09:46:02 -08:00
Weves
40c420f845 Fix disappearing chat sessions 2024-03-04 21:05:55 -08:00
Weves
7869f23e12 Improve slack flow 2024-03-04 19:22:46 -08:00
Kevin Shi
0b0665044f @lru_cache on fetch_versioned_implementation (#1178) 2024-03-04 14:50:51 -08:00
Chris Weaver
a7c820147e Add confluence rate limit handling (#1174) 2024-03-04 01:02:57 -08:00
Weves
563df1f952 Add env variable to allow people to control what clicking on New Chat does 2024-03-03 15:32:37 -08:00
Weves
a8cc3d5a07 Add ability to add starter messages 2024-03-03 14:23:34 -08:00
Yuhong Sun
9051ebfed7 Map to local network for EC2 deployments (#1167) 2024-03-03 13:30:44 -08:00
Weves
197392a95f Change AI Message name to Persona name 2024-03-02 23:28:24 -08:00
Yuhong Sun
81cb1ae399 Fix fail case when conf empty (#1163) 2024-03-02 18:10:37 -08:00
Yuhong Sun
f934e0a5ce Fix Ollama (#1162) 2024-03-02 17:35:41 -08:00
Weves
0366f3313a Fix gen ai model correctness check frequency 2024-03-01 21:59:18 -08:00
Weves
5df2f00e80 Upgrade nextjs version 2024-02-29 17:31:43 -08:00
Matthew Holland
ddc8640504 Skip indexing pages returning HTTP 4XX & 5XX codes 2024-02-29 16:54:26 -08:00
dependabot[bot]
5e7d740814 Bump aiohttp from 3.9.0rc0 to 3.9.2 in /backend/requirements
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.9.0rc0 to 3.9.2.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.9.0rc0...v3.9.2)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-29 16:52:31 -08:00
Bijay Regmi
cfcc1db338 add back line 2024-02-29 16:44:41 -08:00
Bijay Regmi
05f0ed6414 fix #955 2024-02-29 16:44:41 -08:00
Weves
c65e862adc Add option to disable document cleanup 2024-02-29 16:33:03 -08:00
Thomas Ritaine
86215238fc fix(nginx): adjust config to correctly route /openapi.json and /api/* 2024-02-29 16:32:42 -08:00
Weves
9308ba02a1 Small bug with persona prompt selection 2024-02-29 15:15:44 -08:00
Weves
7b7561533f Fix early breakout causing us to not update ConnectorByCredentialPair 2024-02-29 13:54:37 -08:00
Weves
2331bf9b36 Add trace for db session creation 2024-02-29 13:54:11 -08:00
Yuhong Sun
31d3ae0e3e Fix Slack Document Only Persona (#1150) 2024-02-29 13:53:37 -08:00
Weves
10cb4ab1d2 Revert "Trace sqla get_session"
This reverts commit d07345c533.
2024-02-29 13:41:28 -08:00
Kevin Shi
d07345c533 Trace sqla get_session 2024-02-29 13:33:04 -08:00
Yuhong Sun
c7d228e292 Trim Chunks if LLM tokenizer differs from Embedding tokenizer (#1143) 2024-02-28 13:01:32 -08:00
Weves
cd198ba368 mypy fixes 2024-02-27 16:02:24 -08:00
Weves
3941111685 Update mypy version 2024-02-27 16:02:24 -08:00
Weves
78f2e07d23 Improve tag handling 2024-02-27 16:02:24 -08:00
Weves
02d81c4be5 Bump up packages + add ddtrace 2024-02-27 12:18:11 -08:00
robertoamoreno
59c416b777 Update Document360 Connector (#1113)
expects primary_owners to be a list of dictionaries, but it's being provided with a list of strings instead.

https://github.com/danswer-ai/danswer/issues/1111
2024-02-27 08:40:52 -08:00
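An illustrative sketch of the shape mismatch this commit fixes; the field name below is assumed, not Document360's actual schema:

owner_names = ["Jane Doe", "John Smith"]

# Before: a list of plain strings, which downstream validation rejects
# primary_owners = owner_names

# After: each owner wrapped in the expected dictionary shape
primary_owners = [{"display_name": name} for name in owner_names]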
Weves
b38be416b7 Make init-letsencrypt bring up everything 2024-02-25 22:10:34 -08:00
Yuhong Sun
6d5340ae07 Default Values (#1130) 2024-02-24 14:01:59 -08:00
Yuhong Sun
0f23effe7e Change Endpoint Assumption (#1127) 2024-02-23 20:19:23 -08:00
Weves
9dac17d3e1 Add support for overriding semantic_identifier for file connector 2024-02-23 14:53:24 -08:00
Yuhong Sun
eed45f8410 Update README.md 2024-02-23 10:29:07 -08:00
Yuhong Sun
0e3894f27b Bump Reqs (#1116) 2024-02-22 07:08:00 -08:00
Yuhong Sun
7874eadb00 Bump Requirements (#1114) 2024-02-21 23:15:20 -08:00
Yuhong Sun
cfad36b828 Update CONTRIBUTING.md 2024-02-21 17:44:13 -08:00
Lawyered
76092a5cf0 Improve Virtual Env Activation Instructions for Windows (#1082)
This update enhances the CONTRIBUTING.md guide by providing clear, separate instructions for activating the Python virtual environment on Windows, tailored for both Command Prompt and PowerShell users. Previously, the guide only included a generic command, which might not work across different shells without slight modifications. This change aims to make the setup process more accessible and straightforward for contributors using Windows, ensuring they have the correct commands for their specific environment. By reducing potential setup hurdles, we hope to streamline the initial contribution process for new developers.
2024-02-21 17:15:23 -08:00
Ikko Eltociear Ashimine
0e4677e3db Update connector.py
recieved -> received
2024-02-21 16:54:40 -08:00
Weves
3a9d5b4d90 Style change for the docs_removed_from_index field in the admin UI 2024-02-21 16:53:58 -08:00
Weves
4c7c1b468b Fix mypy errors 2024-02-21 16:53:58 -08:00
Yuhong Sun
7748f4df94 Auto-Detect if Better Default LLM available for OpenAI (#1106) 2024-02-21 16:10:22 -08:00
Johannes Vass
918bc385a2 Remove documents from index which are not returned by connector 2024-02-21 16:09:04 -08:00
Weves
cc69ba03a6 Make WelcomeModal only appear for admins + only if no connectors are setup 2024-02-20 08:28:58 -08:00
Weves
db21d82ea2 Bump tf version 2024-02-19 20:17:20 -08:00
Weves
e246ea9d3b Fix embedding model migration with existing index_attempts 2024-02-19 18:23:59 -08:00
Weves
4eaf2b1200 Add more logging to run-nginx.sh 2024-02-19 16:52:29 -08:00
Weves
9ede8b727d Fix init-letsencrypt script 2024-02-19 16:52:29 -08:00
Weves
d20d2b0970 Bump up fastapi-users version 2024-02-19 15:57:51 -08:00
Weves
6b3ad15c90 Fix persona id change 2024-02-19 15:20:45 -08:00
dependabot[bot]
aa6d86accd Bump python-multipart from 0.0.6 to 0.0.7 in /backend/requirements (#1075)
Bumps [python-multipart](https://github.com/andrew-d/python-multipart) from 0.0.6 to 0.0.7.
- [Release notes](https://github.com/andrew-d/python-multipart/releases)
- [Changelog](https://github.com/Kludex/python-multipart/blob/master/CHANGELOG.md)
- [Commits](https://github.com/andrew-d/python-multipart/compare/0.0.6...0.0.7)

---
updated-dependencies:
- dependency-name: python-multipart
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-19 14:28:04 -08:00
dependabot[bot]
33c1cc491f Bump fastapi from 0.103.0 to 0.109.1 in /backend/requirements (#1043)
Bumps [fastapi](https://github.com/tiangolo/fastapi) from 0.103.0 to 0.109.1.
- [Release notes](https://github.com/tiangolo/fastapi/releases)
- [Commits](https://github.com/tiangolo/fastapi/compare/0.103.0...0.109.1)

---
updated-dependencies:
- dependency-name: fastapi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-19 14:27:39 -08:00
George
54fb7792c8 Add Jira Server connector form (#1046) 2024-02-19 14:17:36 -08:00
Yuhong Sun
c1d1651b43 Option to stop sync-ing primary index when building secondary one (#1096) 2024-02-18 22:53:26 -08:00
Yuhong Sun
15335dcd7d Standardize Chat Message Stream (#1098) 2024-02-18 22:48:28 -08:00
Yuhong Sun
31278fc52a k 2024-02-18 18:42:45 -08:00
Yuhong Sun
46ee5b2071 k 2024-02-18 18:42:45 -08:00
Weves
6059339e61 Improve initial flow 2024-02-18 18:40:44 -08:00
Weves
f9733f9870 Handle missing ri:userkey gracefully in Confluence connector 2024-02-18 15:20:45 -08:00
Weves
4d2959f1cc Fix 'View Full Trace' 2024-02-18 01:20:40 -08:00
Weves
61e2e68cf9 Improve FE for no-retrieval personas 2024-02-17 23:42:36 -08:00
Yuhong Sun
927e85319c Memory Reduction (#1092) 2024-02-17 21:20:34 -08:00
Yuhong Sun
d2ce3033a2 Enable Chat Without Connectors (#1090) 2024-02-17 21:12:44 -08:00
Yuhong Sun
6e8acdb20d Fix Remove Index Attempt (#1091) 2024-02-17 19:53:17 -08:00
Yuhong Sun
e505486ca4 Zendesk Tags (#1089) 2024-02-17 10:40:19 -08:00
Yuhong Sun
514e7f6e41 Kill Index Attempts for previous model (#1088) 2024-02-16 18:35:01 -08:00
Weves
269431cc9d Remove accidental console.log 2024-02-16 16:51:32 -08:00
Yuhong Sun
92500d448c Guru Error Logging (#1085) 2024-02-16 12:41:18 -08:00
Weves
10ad9babef Fix paused connectors 2024-02-15 15:46:32 -08:00
Weves
064d129592 Allow specifying Postgres User / Password / DB for dev compose file 2024-02-15 14:52:25 -08:00
Yuhong Sun
5fb688df02 Dont Always Warm Up Slackbot Model (#1080) 2024-02-13 22:28:46 -08:00
Yuhong Sun
23bf6ad4c7 Sample API Script (#1079) 2024-02-13 14:47:28 -08:00
Weves
aa7c811a9a Fix view full trace styling 2024-02-12 10:38:01 -08:00
Weves
3c2fb21c11 Fix source display for Persona intro 2024-02-11 22:09:33 -08:00
Yuhong Sun
1b55e617ad Offset Github by 3 hours to not lose updates (#1073) 2024-02-11 17:08:43 -08:00
Yuhong Sun
1c4f7fe7ef Pass Tags to LLM (#1071) 2024-02-11 15:58:42 -08:00
Weves
4629df06ef Fix force search when selecting docs to chat with + fix selected document de-selection on chat switch 2024-02-11 15:58:14 -08:00
Weves
7d11f5ffb8 Fix initial session creation + add Force Search 2024-02-11 13:57:08 -08:00
Weves
591e9831e7 Fix black for Notion connector 2024-02-10 21:09:14 -08:00
Eugene Yaroslavtsev
01bd1a84c4 fix: Notion connector now skips parsing ai_block blocks instead of erroring out (ai_blocks are currently unsupported by Notion API) 2024-02-10 21:01:19 -08:00
Weves
236fa947ee Add full exception trace to UI 2024-02-10 20:52:10 -08:00
Weves
6b5c20dd54 Don't get rid of answer if something goes wrong during quote generation 2024-02-10 19:10:22 -08:00
Weves
d5168deac8 Fix feedback display 2024-02-10 00:19:20 -08:00
Weves
37110df2de Increase session timeout default 2024-02-09 21:17:20 -08:00
Yuhong Sun
517c27c5ed Dev Script to Restart Containers (#1063) 2024-02-08 17:34:15 -08:00
Weves
81f53ff3d8 Fix run-nginx script when initiated from a windows machine 2024-02-08 15:19:25 -08:00
Yuhong Sun
1a1c91a7d9 Support Detection of LLM Max Context for Non OpenAI Models (#1060) 2024-02-08 15:15:58 -08:00
Yuhong Sun
cd8d8def1e Reformat Slack Message Display (#1056) 2024-02-08 14:37:46 -08:00
Yuhong Sun
5a056f1c0c Bump Vulnerable Libs (#1055) 2024-02-07 21:06:40 -08:00
Weves
0fb3fb8a1f Improve Google Drive connector naming 2024-02-07 01:25:47 -08:00
Yuhong Sun
35fe86e931 Option to only include Domain/Org wide docs (#1052) 2024-02-07 01:16:14 -08:00
Weves
4d6b3c8f08 FE to allow full re-indexing 2024-02-07 00:10:19 -08:00
Yuhong Sun
2362c2bdcc Reindex All Backend (#1049) 2024-02-06 23:07:24 -08:00
Weves
62000c1e46 Misc frontend fixes 2024-02-06 23:06:05 -08:00
Weves
c903d92fcc Fix issue with empty issues 2024-02-05 15:47:29 -08:00
Yuhong Sun
988e9aa682 Change Vespa Query to Post from Get (#1044) 2024-02-05 13:40:39 -08:00
Yuhong Sun
6768c24723 Default LLM Update (#1042) 2024-02-05 01:25:51 -08:00
Yuhong Sun
b3b88f05d3 Anonymous User Telem (#1041) 2024-02-04 13:41:19 -08:00
Weves
e54ce779fd Enable selection of long documents 2024-02-03 17:55:24 -08:00
Yuhong Sun
4c9709ae4a Chat History Docs sometimes wrongly ordered (#1039) 2024-02-03 13:25:24 -08:00
Itay
c435bf3854 CI: new pre-commit check (#1037) 2024-02-03 11:23:31 -08:00
Yuhong Sun
bb2b517124 Relari Test Script (#1033) 2024-02-02 09:50:48 -08:00
Szymon Planeta
dc2f4297b5 Add return contexts (#1018) 2024-02-01 22:22:22 -08:00
Yuhong Sun
0060a1dd58 Immediate Mark Resolved SlackBot Option and Respond to Bots Option (#1031) 2024-02-01 22:18:15 -08:00
Weves
29e74c0877 Fix nginx startup issues 2024-02-01 21:54:49 -08:00
Yuhong Sun
779c2829bf Update Base README (#1027) 2024-02-01 16:33:41 -08:00
Yuhong Sun
6a2b7514fe Miscount but it's not used (#1025) 2024-01-30 22:48:45 -08:00
Weves
8b9e6a91a4 Fix change model popup 2024-01-30 10:15:58 -08:00
Weves
b076c3d1ea Add regex support for Slack channels 2024-01-29 20:18:49 -08:00
Weves
d75ca0542a Support Slack channel regex 2024-01-29 20:18:49 -08:00
Yuhong Sun
ce12dd4a5a Fix Secondary Index Polling (#1020) 2024-01-29 19:34:25 -08:00
Weves
0a9b854667 Make final add connector modal slightly prettier 2024-01-29 18:12:13 -08:00
Weves
159453f8d7 Fix SwitchModelModal 2024-01-29 00:33:25 -08:00
Weves
2138c0b69d UI for model selection 2024-01-29 00:14:46 -08:00
Yuhong Sun
4b45164496 Background Index Attempt Creation (#1010) 2024-01-28 23:14:20 -08:00
Moshe Zada
c0c9c67534 Moshe.download nltk data on start (#1014) 2024-01-28 13:05:42 -08:00
Itay
a4053501d0 CI: adding prettier to pre-commit (#1009) 2024-01-28 13:03:39 -08:00
Moshe Zada
60a16fa46d Add space before new line in order to fix typo (#1013) 2024-01-28 13:00:53 -08:00
Itay
0ce992e22e CI: Run Python tests (#1001) 2024-01-28 12:59:51 -08:00
Bill Yang
35105f951b Add launch.json to gitignore (#961)
Co-authored-by: Bill Yang <bill@Bills-MacBook-Pro.local>
2024-01-28 12:57:33 -08:00
Weves
f1a5460739 Fix connected sources display 2024-01-27 12:00:24 -08:00
Weves
824677ca75 Add option to add citations to Personas + allow for more chunks if an LLM model override is specified 2024-01-27 10:16:17 -08:00
Yuhong Sun
cf4ede2130 Embedding Models Table (#1006) 2024-01-26 18:40:53 -08:00
Weves
81c33cc325 Fix import order 2024-01-25 17:26:34 -08:00
Weves
ec93ad9e6d Sharepoint fixes 2024-01-25 17:24:26 -08:00
Yuhong Sun
d0fa02c8dc Multiple Indices in Vespa (#1000) 2024-01-25 13:56:29 -08:00
Hagen O'Neill
d6d83e79f1 Added sharepoint connector (#963) 2024-01-25 13:16:10 -08:00
Chris Weaver
e94fd8b022 Remove un-needed imports (#999) 2024-01-25 12:10:19 -08:00
Yuhong Sun
92628357df Prevent Scheduling Multiple Queued Indexings (#997) 2024-01-24 16:31:29 -08:00
Yuhong Sun
50086526e2 Fix Vespa Title Overly Punished when Missing (#995) 2024-01-24 15:13:36 -08:00
Weves
7174ea3908 Fix hubspot connector 2024-01-24 15:10:08 -08:00
Jeremi Joslin
d07647c597 Fix typo in gmail test connector (#981) 2024-01-24 12:01:26 -08:00
Yuhong Sun
3a6712e3a0 Default Embedding Size (#993) 2024-01-24 12:00:25 -08:00
Yuhong Sun
bcc40224fa Embed Dim Env Var (#988) 2024-01-23 19:32:51 -08:00
Yuhong Sun
5d26290c5d Vespa Hyperparameter Changes (#986) 2024-01-23 17:57:19 -08:00
Yuhong Sun
9d1aa7401e Variable Embedding Dim for Vespa (#985) 2024-01-23 17:38:50 -08:00
Weves
c2b34f623c Handle github rate limiting + fix Slack rate limiting bug + change frozen indexing time to 3 hours 2024-01-23 00:37:33 -08:00
Itay
692fdb4597 Gmail Connector (#946)
---------

Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2024-01-22 16:25:10 -08:00
Weves
2c38033ef5 More latency logging + add limit/offset support 2024-01-21 18:52:55 -08:00
Weves
777521a437 Move delete to the right for consistency + disabled -> paused 2024-01-20 20:04:16 -08:00
Yuhong Sun
0e793e972b Slack Give Resolver Name (#973) 2024-01-20 20:01:23 -08:00
Yuhong Sun
a2a171999a Guru Metadata (#967) 2024-01-18 23:14:26 -08:00
Yuhong Sun
5504c9f289 GitLab Connector Logic Fixes (#966) 2024-01-18 16:44:07 -08:00
Yuhong Sun
5edc464c9a Fix GitLabs CI (#965) 2024-01-18 16:12:46 -08:00
Rutik Thakre
1670d923aa Gitlab Connector (#931) 2024-01-18 15:43:17 -08:00
Weves
1981a02473 Add tags to file connector 2024-01-18 12:12:11 -08:00
Yuhong Sun
4dc8eab014 Fix Linting (#962) 2024-01-17 22:49:37 -08:00
Weves
3a8d89afd3 Fix newlines in answers 2024-01-17 02:10:52 -08:00
Weves
fa879f7d7f Add new APIs specifically for GPTs 2024-01-15 22:35:58 -08:00
Yuhong Sun
f5be0cc2c0 Tiny Mail From Fix (#953) 2024-01-15 22:03:38 -08:00
Roman
621967d2b6 Add MAIL_FROM env variable (#949)
Co-authored-by: Roman Tyshyk <roman.tyshyk@711media.de>
2024-01-15 21:53:03 -08:00
Yuhong Sun
44905d36e5 Slack Rate Limit Options (#952) 2024-01-15 21:47:24 -08:00
mattboret
53add2c801 Add support to limit the number of Slack questions per minute (#908)
Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2024-01-15 21:26:35 -08:00
Yuhong Sun
d17426749d Group support for Slack Followup (#951) 2024-01-15 19:21:06 -08:00
Chris Weaver
d099b931d8 Slack confirmation UI (#950) 2024-01-15 15:33:55 -08:00
Yuhong Sun
4cd9122ba5 Slack Followup Option (#948) 2024-01-15 14:26:20 -08:00
Yuhong Sun
22fb7c3352 Slack LLM Filter Enabled by Default (#943) 2024-01-13 17:37:51 -08:00
dependabot[bot]
4ff3bee605 Bump pycryptodome from 3.19.0 to 3.19.1 in /backend/requirements (#909)
Bumps [pycryptodome](https://github.com/Legrandin/pycryptodome) from 3.19.0 to 3.19.1.
- [Release notes](https://github.com/Legrandin/pycryptodome/releases)
- [Changelog](https://github.com/Legrandin/pycryptodome/blob/master/Changelog.rst)
- [Commits](https://github.com/Legrandin/pycryptodome/compare/v3.19.0...v3.19.1)

---
updated-dependencies:
- dependency-name: pycryptodome
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-13 15:42:37 -08:00
Yuhong Sun
7029bdb291 Remove unused doc endpoint (#942) 2024-01-13 15:22:23 -08:00
Yuhong Sun
cf4c3c57ed Log size limit to prevent more disk usage (#941) 2024-01-13 15:22:01 -08:00
Weves
1b6eb0a52f Add API Key table 2024-01-12 17:48:25 -08:00
Yuhong Sun
503a709e37 Update Slack Link (#936) 2024-01-11 17:42:26 -08:00
Yuhong Sun
0fd36c3120 Remove Sweep Issues (#935) 2024-01-11 12:31:38 -08:00
Yuhong Sun
8d12c7c202 Remove Sweep Conf (#934) 2024-01-11 11:38:17 -08:00
Sam Jakos
a4d5ac816e Add metadata file loader to ZIP file connector (#920) 2024-01-11 11:14:30 -08:00
Yuhong Sun
2a139fd529 Poll Connector Window Overlap (#930) 2024-01-11 11:10:01 -08:00
Weves
54347e100f Fix Notion recursive 2024-01-10 17:17:56 -08:00
Yuhong Sun
936e69bc2b Stop Streaming Pattern (#923) 2024-01-09 23:52:21 -08:00
Yuhong Sun
0056cdcf44 GitHub Base URL (#922) 2024-01-09 22:31:07 -08:00
Yuhong Sun
1791edec03 Fix Occasionally dropped Issues (#921) 2024-01-09 22:16:02 -08:00
Yuhong Sun
6201c1b585 Update CONTRIBUTING.md 2024-01-08 22:44:49 -08:00
Yuhong Sun
c8f34e3103 Update CONTRIBUTING.md 2024-01-08 22:44:11 -08:00
Weves
77b0d76f53 Small improvement to persona table 2024-01-08 19:17:39 -08:00
Weves
733626f277 Fix modal z scores 2024-01-08 18:26:36 -08:00
Weves
1da79c8627 Fix mypy errors on loopio connector 2024-01-08 16:19:07 -08:00
Weves
4e3d57b1b9 Disallow Google Drive credential delete if connectors exist 2024-01-08 15:57:18 -08:00
Weves
e473ad0412 Go back to localhost:3000 2024-01-08 15:35:52 -08:00
Weves
7efd3ba42f Add retries to Slack 2024-01-08 15:24:01 -08:00
mikewolfxyou
879e873310 Add deployment/data/nginx/app.conf to .gitignore (#912)
In order to prevent local development changes from appearing in git diff and being committed

Co-authored-by: Mike Zhiguo Zhang <zhiguo.zhang@real-digital.de>
2024-01-07 10:32:27 -08:00
Yuhong Sun
adc747e66c Explain Chat History Structure (#913) 2024-01-07 10:30:01 -08:00
Mike P. Sinn
a29c1ff05c Change WEB_DOMAIN default to 127.0.0.1 instead of localhost (#901)
Got it, thanks for the contribution! Neither of us (maintainers) use Windows so we weren't aware of this. Thanks a bunch for pointing it out and fixing it!
2024-01-05 23:50:57 -08:00
Yuhong Sun
49415e4615 Don't replace citations in code blocks (#911) 2024-01-05 23:32:28 -08:00
Sam Jakos
885e698d5d Add Loopio Connector (#850)
Looks good! I couldn't verify that it end-to-end because Loopio still hasn't granted me API access but the code looks good. Thanks a bunch for the contribution!

Would you be open to also writing the docs page for the setup? It's just adding an md file with some images or gifs:
https://github.com/danswer-ai/danswer-docs

I can provide a template branch if that would make it easier, just let me know 🙏
2024-01-05 23:32:10 -08:00
Weves
30983657ec Fix indexing of whitespace only 2024-01-05 19:35:38 -08:00
Yuhong Sun
6b6b3daab7 Reenable option to run Danswer without Gen AI (#906) 2024-01-03 18:31:16 -08:00
Chris Weaver
20441df4a4 Add Tag Filter UI + other UI cleanup (#905) 2024-01-02 11:30:36 -08:00
Yuhong Sun
d7141df5fc Metadata and Title Search (#903) 2024-01-02 11:25:50 -08:00
Yuhong Sun
615bb7b095 Update CONTRIBUTING.md 2024-01-01 18:07:50 -08:00
Yuhong Sun
e759718c3e Update CONTRIBUTING.md 2024-01-01 18:06:56 -08:00
Yuhong Sun
06d8d0e53c Update CONTRIBUTING.md 2024-01-01 18:06:17 -08:00
Weves
ae9b556876 Revamp new chat screen for chat UI 2023-12-30 18:13:24 -08:00
Chris Weaver
f883611e94 Add query editing in Chat UI (#899) 2023-12-30 12:46:48 -08:00
Yuhong Sun
13c536c033 Final Backend CVEs (#900) 2023-12-30 11:57:49 -08:00
Yuhong Sun
2e6be57880 Model Server CVEs (#898) 2023-12-29 21:14:08 -08:00
Weves
b352d83b8c Increase max upload size 2023-12-29 21:11:57 -08:00
Yuhong Sun
aa67768c79 CVEs continued (#889) 2023-12-29 20:42:16 -08:00
Weves
6004e540f3 Improve Vespa invalid char cleanup 2023-12-29 20:36:03 -08:00
eukub
64d2cea396 reduced redundancy and changed concatenation of strings to f-strings 2023-12-29 00:35:04 -08:00
Weves
b5947a1c74 Add illegal char stripping to title field 2023-12-29 00:17:40 -08:00
Weves
cdf260b277 Fix chat refresh + add stop button 2023-12-28 23:33:41 -08:00
Weves
73483b5e09 Fix more auth disabled flakiness 2023-12-27 01:23:29 -08:00
Yuhong Sun
a6a444f365 Bump Python Version for security (#887) 2023-12-26 16:15:14 -08:00
Yuhong Sun
449a403c73 Automatic Security Scan (#886) 2023-12-26 14:41:23 -08:00
Yuhong Sun
4aebf824d2 Fix broken build SHA issue (#885) 2023-12-26 14:36:40 -08:00
Weves
26946198de Fix disabled auth 2023-12-26 12:51:58 -08:00
Yuhong Sun
e5035b8992 Move some util functions around (#883) 2023-12-26 00:38:29 -08:00
Weves
2e9af3086a Remove old comment 2023-12-25 21:36:54 -08:00
Weves
dab3ba8a41 Add support for basic auth on FE 2023-12-25 21:19:59 -08:00
Yuhong Sun
1e84b0daa4 Fix escape character handling in DanswerBot (#880) 2023-12-25 12:28:35 -08:00
Yuhong Sun
f4c8abdf21 Remove Extraneous Persona Config (#878) 2023-12-24 22:48:48 -08:00
sweep-ai[bot]
ccc5bb1e67 Configure Sweep (#875)
* Create sweep.yaml

* Create sweep template

* Update sweep.yaml

---------

Co-authored-by: sweep-ai[bot] <128439645+sweep-ai[bot]@users.noreply.github.com>
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2023-12-24 19:04:52 -08:00
Yuhong Sun
c3cf9134bb Telemetry Revision (#868) 2023-12-24 17:39:37 -08:00
Weves
0370b9b38d Stop copying local node_modules / .next dir into web docker image 2023-12-24 15:27:11 -08:00
Weves
95bf1c13ad Add http2 dependency 2023-12-24 14:49:31 -08:00
Yuhong Sun
00c1f93b12 Zendesk Tiny Cleanup (#867) 2023-12-23 16:39:15 -08:00
Yuhong Sun
a122510cee Zendesk Connector Metadata and small batch fix (#866) 2023-12-23 16:34:48 -08:00
Weves
dca4f7a72b Adding http2 support to Vespa 2023-12-23 16:23:24 -08:00
Weves
535dc265c5 Fix boost resetting on document update + fix refresh on re-index 2023-12-23 15:23:21 -08:00
Weves
56882367ba Fix migrations 2023-12-23 12:58:00 -08:00
Weves
d9fbd7ffe2 Add hiding + re-ordering to personas 2023-12-22 23:04:43 -08:00
Yuhong Sun
8b7d01fb3b Allow Duplicate Naming for CC-Pair (#862) 2023-12-22 23:03:44 -08:00
voarsh2
016a087b10 Refactor environment variable handling using ConfigMap for Kubernetes deployment (#515)
---------

Co-authored-by: Reese Jenner <reesevader@hotmail.co.uk>
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2023-12-22 21:33:36 -08:00
Sam Jakos
241b886976 fix: parse INDEX_BATCH_SIZE to an int (#858) 2023-12-22 13:03:21 -08:00
Yuhong Sun
ff014e4f5a Bump Transformer Version (#857) 2023-12-22 01:47:18 -08:00
Aliaksandr_С
0318507911 Indexing settings and logging improve (#821)
---------

Co-authored-by: Aliaksandr Chernak <aliaksandr_chernak@epam.com>
Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2023-12-22 01:13:24 -08:00
Yuhong Sun
6650f01dc6 Multilingual Docs Updates (#856) 2023-12-22 00:26:00 -08:00
Yuhong Sun
962e3f726a Slack Feedback Message Tweaks (#855) 2023-12-21 20:52:11 -08:00
mattboret
25a73b9921 Slack bot improve source feedback (#827)
---------

Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2023-12-21 20:33:20 -08:00
Yuhong Sun
dc0b3672ac git push --set-upstream origin danswerbot-format (#854) 2023-12-21 18:46:30 -08:00
Yuhong Sun
c4ad03a65d Handle DanswerBot case where no updated at (#853) 2023-12-21 18:33:42 -08:00
mattboret
c6f354fd03 Add the latest document update to the Slack bot answer (#817)
* Add the latest source update to the Slack bot answer

* fix mypy errors

---------

Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2023-12-21 18:16:05 -08:00
Yuhong Sun
2f001c23b7 Confluence add tag to replaced names (#852) 2023-12-21 18:03:56 -08:00
mattboret
4d950aa60d Replace user id by the user display name in the exported Confluence page (#815)
Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2023-12-21 17:52:28 -08:00
Yuhong Sun
56406a0b53 Bump Vespa to 8.277.17 (#851) 2023-12-21 17:23:27 -08:00
sam lockart
eb31c08461 Update Vespa to 8.267.29 (#812) 2023-12-21 17:18:16 -08:00
Weves
26f94c9890 Improve re-sizing 2023-12-21 10:03:03 -08:00
Weves
a9570e01e2 Make document sidebar scrollbar darker 2023-12-21 10:03:03 -08:00
Weves
402d83e167 Make it so docs without links aren't clickable in chat citations 2023-12-21 10:03:03 -08:00
Ikko Eltociear Ashimine
10dcd49fc8 Update CONTRIBUTING.md
Nagivate -> Navigate
2023-12-21 09:10:52 -08:00
Yuhong Sun
0fdad0e777 Update Demo Video 2023-12-20 19:05:23 -08:00
Weves
fab767d794 Fix persona document sets 2023-12-20 15:24:32 -08:00
Weves
7dd70ca4c0 Change danswer header link in chat page 2023-12-20 11:38:33 -08:00
Weves
370760eeee Fix editing deleted personas, editing personas with no prompts, and model selection 2023-12-19 14:42:13 -08:00
Weves
24a62cb33d Fix persona + prompt apis 2023-12-19 10:23:06 -08:00
Weves
9e4a4ddf39 Update search helper styling 2023-12-19 07:08:11 -08:00
Yuhong Sun
c281859509 Google Drive handle invalid PDFs (#838) 2023-12-18 23:39:45 -08:00
Yuhong Sun
2180a40bd3 Disable Chain of Thought for now (#837) 2023-12-18 21:44:47 -08:00
Weves
997f9c3191 Fix ccPair pages crashing 2023-12-17 23:28:26 -08:00
Weves
677c32ea79 Fix issue where a message that errors out creates a bad state 2023-12-17 23:28:26 -08:00
Yuhong Sun
edfc849652 Search more frequently (#834) 2023-12-17 22:45:46 -08:00
Yuhong Sun
9d296b623b Shield Update (#833) 2023-12-17 22:17:44 -08:00
Yuhong Sun
5957b888a5 DanswerBot Chat (#831) 2023-12-17 18:18:48 -08:00
Chris Weaver
c7a91b1819 Allow re-sizing of document sidebar + make central chat smaller on small screens (#832) 2023-12-17 18:17:43 -08:00
Weves
a099f8e296 Rework header a bit + remove assumption of all personas having a prompt 2023-12-14 23:06:39 -08:00
Weves
16c8969028 Chat UI 2023-12-14 22:18:42 -08:00
Yuhong Sun
65fde8f1b3 Chat Backend (#801) 2023-12-14 22:14:37 -08:00
Yuhong Sun
229db47e5d Update LLM Key Check Logic (#825) 2023-12-09 13:41:31 -08:00
Weves
2e3397feb0 Check for slack bot token changes every 60 seconds 2023-12-08 14:14:22 -08:00
Weves
d5658ce477 Persona enhancements 2023-12-07 14:29:37 -08:00
Weves
ddf3f99da4 Add support for global API prefix env variable 2023-12-07 12:42:17 -08:00
Weves
56785e6065 Add model choice to Persona 2023-12-07 00:20:42 -08:00
Weves
26e808d2a1 Fix welcome modal 2023-12-06 21:07:34 -08:00
Yuhong Sun
e3ac373f05 Make Default Fast LLM not identical to main LLM (#818) 2023-12-06 16:14:04 -08:00
Yuhong Sun
9e9a578921 Option to speed up DanswerBot by turning off chain of thought (#816) 2023-12-05 00:43:45 -08:00
Weves
f7172612e1 Allow persona usage for Slack bots 2023-12-04 19:20:03 -08:00
Yuhong Sun
5aa2de7a40 Fix Weak Models Concurrency Issue (#811) 2023-12-04 15:40:10 -08:00
Yuhong Sun
e0b87d9d4e Fix Weak Model Prompt (#810) 2023-12-04 15:02:08 -08:00
Weves
5607fdcddd Make Slack Bot setup UI more similar to Persona setup 2023-12-03 23:36:54 -08:00
Yuhong Sun
651de071f7 Improve English rephrasing for multilingual use case (#808) 2023-12-03 14:34:12 -08:00
John Bergvall
5629ca7d96 Copy SearchQuery model with updated attribute due to Config.frozen=True (#806)
Fixes the following TypeError:

api_server_1     |   File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
api_server_1     |     return await get_asynclib().run_sync_in_worker_thread(
api_server_1     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api_server_1     |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
api_server_1     |     return await future
api_server_1     |            ^^^^^^^^^^^^
api_server_1     |   File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
api_server_1     |     result = context.run(func, *args)
api_server_1     |              ^^^^^^^^^^^^^^^^^^^^^^^^
api_server_1     |   File "/usr/local/lib/python3.11/site-packages/starlette/concurrency.py", line 53, in _next
api_server_1     |     return next(iterator)
api_server_1     |            ^^^^^^^^^^^^^^
api_server_1     |   File "/app/danswer/utils/timing.py", line 47, in wrapped_func
api_server_1     |     value = next(gen)
api_server_1     |             ^^^^^^^^^
api_server_1     |   File "/app/danswer/direct_qa/answer_question.py", line 243, in answer_qa_query_stream
api_server_1     |     top_chunks = cast(list[InferenceChunk], next(search_generator))
api_server_1     |                                             ^^^^^^^^^^^^^^^^^^^^^^
api_server_1     |   File "/app/danswer/search/search_runner.py", line 469, in full_chunk_search_generator
api_server_1     |     retrieved_chunks = retrieve_chunks(
api_server_1     |                        ^^^^^^^^^^^^^^^^
api_server_1     |   File "/app/danswer/search/search_runner.py", line 353, in retrieve_chunks
api_server_1     |     q_copy.query = rephrase
api_server_1     |     ^^^^^^^^^^^^
api_server_1     |   File "pydantic/main.py", line 359, in pydantic.main.BaseModel.__setattr__
api_server_1     | TypeError: "SearchQuery" is immutable and does not support item assignment
2023-12-03 13:47:11 -08:00
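The failure and fix in miniature, using pydantic v1 semantics (the repo pins pydantic 1.10.x): a frozen model rejects attribute assignment, but .copy(update=...) returns a new instance with the changed field.

from pydantic import BaseModel

class SearchQuery(BaseModel):
    query: str

    class Config:
        frozen = True  # implies immutability (and hashability) in pydantic v1

q = SearchQuery(query="original phrasing")
# q.query = "rephrase"  # would raise: "SearchQuery" is immutable ...
q_copy = q.copy(update={"query": "rephrase"})
print(q_copy.query)  # "rephrase"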
Yuhong Sun
bc403d97f2 Organize Prompts for Chat implementation (#807) 2023-12-03 13:27:11 -08:00
Weves
292c78b193 Always pull latest data when visiting main search page 2023-12-03 03:25:13 -08:00
Weves
ac35719038 FE improvements to make initial setup more intuitive 2023-12-02 16:40:44 -08:00
Yuhong Sun
02095e9281 Restructure APIs (#803) 2023-12-02 14:48:08 -08:00
Yuhong Sun
8954a04602 Reorder Tables for cleaner extending (#800) 2023-12-01 17:46:13 -08:00
Yuhong Sun
8020db9e9a Update connector interface with optional Owners information (#798) 2023-11-30 23:08:16 -08:00
Yuhong Sun
17c2f06338 Add more metadata options for File connector (#797) 2023-11-30 13:24:22 -08:00
Weves
9cff294a71 Increase retries for google drive connector 2023-11-30 03:03:26 -08:00
Weves
e983aaeca7 Add more logging on existing jobs 2023-11-30 02:58:37 -08:00
Weves
7ea774f35b Change in-progress status color 2023-11-29 20:57:45 -08:00
Weves
d1846823ba Associate a user with web/file connectors 2023-11-29 18:18:56 -08:00
Yuhong Sun
fda89ac810 Expert Recommendation Heuristic Only (#791) 2023-11-29 15:53:57 -08:00
Yuhong Sun
006fd4c438 Ingestion API now always updates regardless of document updated_at (#786) 2023-11-29 02:08:50 -08:00
Weves
9b7069a043 Disallow re-indexing for File connector 2023-11-29 02:01:11 -08:00
Weves
c64c25b2e1 Fix temp file deletion 2023-11-29 02:00:20 -08:00
Yuhong Sun
c2727a3f19 Custom OpenAI Model Server (#782) 2023-11-29 01:41:56 -08:00
Chris Weaver
37daf4f3e4 Remove AI Thoughts by default (#783)
- Removes AI Thoughts by default - only shows when validation fails
- Removes punctuation "words" from queries in addition to stopwords (Vespa ignores punctuation anyways)
- Fixes Vespa deletion script for larger doc counts
2023-11-29 01:00:53 -08:00
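For the punctuation-stripping piece of this commit, a small sketch of the idea (helper name illustrative):

import string

def remove_punctuation_words(words: list[str]) -> list[str]:
    # Drop tokens that consist only of punctuation; Vespa ignores
    # punctuation anyway, so they just add noise to the query.
    return [w for w in words if not all(ch in string.punctuation for ch in w)]

print(remove_punctuation_words(["how", "-", "does", "it", "work", "?!"]))
# -> ['how', 'does', 'it', 'work']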
Yuhong Sun
fcb7f6fcc0 Accept files with character issues (#781) 2023-11-28 22:43:58 -08:00
Weves
429016d4a2 Fix zulip page 2023-11-28 16:28:51 -08:00
Chris Weaver
c83a450ec4 Remove personal connectors page (#779) 2023-11-28 16:11:42 -08:00
Yuhong Sun
187b94a7d8 Blurb Key Error (#778) 2023-11-28 16:09:33 -08:00
Weves
30225fd4c5 Fix filter hiding 2023-11-28 04:13:11 -08:00
Weves
a4f053fa5b Fix persona refresh 2023-11-28 02:53:18 -08:00
Weves
eab4fe83a0 Remove Slack bot personas from web UI 2023-11-28 02:53:18 -08:00
Chris Weaver
78d1ae0379 Customizable personas (#772)
Also includes a small fix to LLM filtering when combined with reranking
2023-11-28 00:57:48 -08:00
Yuhong Sun
87beb1f4d1 Log LLM details on server start (#773) 2023-11-27 21:32:48 -08:00
Yuhong Sun
05c2b7d34e Update LLM related Libs (#771) 2023-11-26 19:54:16 -08:00
Yuhong Sun
39d09a162a Danswer APIs Document Ingestion Endpoint (#716) 2023-11-26 19:09:22 -08:00
Yuhong Sun
d291fea020 Turn off Reranking for Streaming Flows (#770) 2023-11-26 16:45:23 -08:00
Yuhong Sun
2665bff78e Option to turn off LLM for eval script (#769) 2023-11-26 15:31:03 -08:00
Yuhong Sun
65d38ac8c3 Slack to respect LLM chunk filter settings (#768) 2023-11-26 01:06:12 -08:00
Yuhong Sun
8391d89bea Fix Indexing Concurrency (#767) 2023-11-25 21:40:36 -08:00
Yuhong Sun
ac2ed31726 Indexing Jobs to have shorter lived DB sessions (#766) 2023-11-24 21:38:16 -08:00
Chris Weaver
47f947b045 Use torch.multiprocessing + enable SimpleJobClient by default (#765) 2023-11-24 18:29:28 -08:00
dependabot[bot]
63b051b342 Bump sharp from 0.32.5 to 0.32.6 in /web
Bumps [sharp](https://github.com/lovell/sharp) from 0.32.5 to 0.32.6.
- [Release notes](https://github.com/lovell/sharp/releases)
- [Changelog](https://github.com/lovell/sharp/blob/main/docs/changelog.md)
- [Commits](https://github.com/lovell/sharp/compare/v0.32.5...v0.32.6)

---
updated-dependencies:
- dependency-name: sharp
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-11-24 18:14:45 -08:00
Weves
a5729e2fa6 Add new model server env vars to the compose file 2023-11-24 00:12:04 -08:00
Weves
3cec854c5c Allow different model servers for different models / indexing jobs 2023-11-23 23:39:03 -08:00
Weves
26c6651a03 Improve LLM answer parsing 2023-11-23 15:03:35 -08:00
Yuhong Sun
13001ede98 Search Regression Test and Save/Load State updates (#761) 2023-11-23 00:00:30 -08:00
Yuhong Sun
fda377a2fa Regression Script for Search quality (#760) 2023-11-22 19:33:28 -08:00
Yuhong Sun
bdfb894507 Slack Role Override (#755) 2023-11-22 17:47:18 -08:00
Weves
35c3511daa Increase Vespa timeout 2023-11-22 01:42:59 -08:00
Chris Weaver
c1e19d0d93 Add selected docs in UI + rework the backend flow a bit (#754)
Changes the flow so that the selected docs are sent over in a separate packet rather than as part of the initial packet for the streaming QA endpoint.
2023-11-21 19:46:12 -08:00
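A sketch of the packet split described here, with illustrative packet keys: selected documents are emitted as their own packet up front rather than bundled into the first packet of the stream.

import json
from collections.abc import Iterator

def qa_response_packets(
    selected_docs: list[dict],
    answer_pieces: Iterator[str],
) -> Iterator[str]:
    # Selected documents go out as a dedicated packet...
    yield json.dumps({"selected_documents": selected_docs}) + "\n"
    # ...followed by the usual streamed answer pieces.
    for piece in answer_pieces:
        yield json.dumps({"answer_piece": piece}) + "\n"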
mattboret
e78aefb408 Add script to analyse the sources selection (#721)
---------

Co-authored-by: Matthieu Boret <matthieu.boret@fr.clara.net>
2023-11-21 18:35:26 -08:00
Bryan Peterson
aa2e859b46 add missing dependencies in model_server dockerfile (#752)
Thanks for catching this! Super helpful!
2023-11-21 17:59:28 -08:00
Yuhong Sun
c0c8ae6c08 Minor Tuning for Filters (#753) 2023-11-21 15:47:58 -08:00
Weves
1225c663eb Add new env variable to compose file 2023-11-20 21:40:54 -08:00
Weves
e052d607d5 Add option to log Vespa timing info 2023-11-20 21:37:22 -08:00
Yuhong Sun
8e5e11a554 Add md files to File Connector (#749) 2023-11-20 19:56:06 -08:00
Yuhong Sun
57f0323f52 NLP Model Warmup Reworked (#748) 2023-11-20 17:28:23 -08:00
Weves
6e9f31d1e9 Fix ResourceLogger blocking main thread 2023-11-20 16:46:18 -08:00
Weves
eeb844e35e Fix bug with Google Drive shortcut error case 2023-11-20 16:34:07 -08:00
Sid Ravinutala
d6a84ab413 fix for url parsing google site 2023-11-20 16:08:43 -08:00
Weves
68160d49dd Small mods to enable deployment on AWS EKS 2023-11-20 01:42:48 -08:00
Yuhong Sun
0cc3d65839 Add option to run a faster/cheaper LLM for secondary flows (#742) 2023-11-19 17:48:42 -08:00
Weves
df37387146 Fix a couple bugs with google sites link finding 2023-11-19 15:35:54 -08:00
Yuhong Sun
f72825cd46 Provide Metadata to the LLM (#740) 2023-11-19 12:28:45 -08:00
Yuhong Sun
6fb07d20cc Multilingual Query Expansion (#737) 2023-11-19 10:55:55 -08:00
Chris Weaver
b258ec1bed Adjust checks for removal from existing_jobs dict + add more logging + only one scheduled job for a connector at a time (#739) 2023-11-19 02:03:17 -08:00
Yuhong Sun
4fd55b8928 Fix GPT4All (#738) 2023-11-18 21:21:02 -08:00
Yuhong Sun
b3ea53fa46 Fix Build Version (#736) 2023-11-18 17:16:25 -08:00
Yuhong Sun
fa0d19cc8c LLM Chunk Filtering (#735) 2023-11-18 17:12:24 -08:00
Weves
d5916e420c Fix duplicated query event for 'answer_qa_query_stream' and missing llm_answer in 'answer_qa_query' 2023-11-17 21:10:23 -08:00
Weves
39b912befd Enable show GPT answer option immediately 2023-11-17 17:08:38 -08:00
Weves
37c5f24d91 Fix logout redirect 2023-11-17 16:43:24 -08:00
Weves
ae72cd56f8 Add a bit more logging in indexing pipeline 2023-11-16 12:00:19 -08:00
Yuhong Sun
be5ef77896 Optional Anonymous Telemetry (#727) 2023-11-16 09:22:36 -08:00
Weves
0ed8f14015 Improve Vespa filtering performance 2023-11-15 14:30:12 -08:00
Weves
a03e443541 Add root_page_id option for Notion connector 2023-11-15 12:46:41 -08:00
Weves
4935459798 Fix hover being transparent 2023-11-15 11:52:40 -08:00
Weves
efb52873dd Prettier fix 2023-11-14 22:22:42 -08:00
Bradley
442f7595cc Added connector configuration link and external link icon to web connector page. 2023-11-14 22:19:00 -08:00
Weves
81cbcbb403 Fix connector deletion bug 2023-11-14 09:07:59 -08:00
Weves
0a0e672b35 Fix no letsencrypt 2023-11-13 14:32:51 -08:00
Yuhong Sun
69644b266e Hybrid Search Alpha Parameter (#714) 2023-11-09 17:11:10 -08:00
Yuhong Sun
5a4820c55f Skip Index on Docs with no newer updated at (#713) 2023-11-09 16:27:32 -08:00
Weves
a5d69bb392 Add back end time to Gong 2023-11-09 14:03:46 -08:00
Weves
23ee45c033 Enhance document explorer 2023-11-09 00:58:51 -08:00
Yuhong Sun
31bfd015ae Request Tracker Connector (#709)
Contributed by Evan! Thanks for the contribution!

- Minor linting and rebasing done by Yuhong, everything else from Evan

---------

Co-authored-by: Evan Sarmiento <e.sarmiento@soax.com>
Co-authored-by: Evan <esarmien@fas.harvard.edu>
2023-11-07 16:55:10 -08:00
Yuhong Sun
0125d8a0f6 Source Filter Extraction (#708) 2023-11-07 14:21:04 -08:00
Yuhong Sun
4f64444f0f Fix Version from Tag not picked up (#705) 2023-11-06 20:01:20 -08:00
Weves
abf9cc3248 Add timeout to all Notion calls 2023-11-06 19:29:42 -08:00
Chris Weaver
f5bf2e6374 Fix experimental checkpointing + move check for disabled connector to the start of the batch (#703) 2023-11-06 17:14:31 -08:00
Yuhong Sun
24b3b1fa9e Fix GitHub Actions Naming (#702) 2023-11-06 16:40:49 -08:00
Yuhong Sun
7433dddac3 Model Server (#695)
Provides the ability to pull out the NLP models into a separate model server which can then be hosted on a GPU instance if desired.
2023-11-06 16:36:09 -08:00
Weves
fe938b6fc6 Add experimental checkpointing 2023-11-04 14:51:28 -07:00
dependabot[bot]
2db029672b Bump pypdf from 3.16.4 to 3.17.0 in /backend/requirements (#667)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 3.16.4 to 3.17.0.
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/3.16.4...3.17.0)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-11-03 18:54:29 -07:00
Yuhong Sun
602f9c4a0a Default Version to 0.2-dev (#690) 2023-11-03 18:37:01 -07:00
Bradley
551705ad62 Implemented Danswer versioning system. (#649)
* Web & API server versioning system. Displayed on UI.

* Remove some debugging code.

* Integrated backend version into GitHub Action & Docker build workflow using env variables.

* Fixed web container environment variable name.

* Revise Dockerfiles for GitHub Actions workflow.

* Added system information page to admin panel with version info. Updated github workflows to include tagged version, and corresponding changes in the dockerfiles and codebases for web&backend to use env variables if present. Changed to 'dev' naming scheme if no env var is present to indicate local setup. Removed version from admin panel header.

* Added missing systeminfo dir to remote repo.
2023-11-03 18:02:39 -07:00
Weves
d9581ce0ae Fix Notion recursive search for non-shared database 2023-11-03 15:46:23 -07:00
Yuhong Sun
e27800d501 Formatting 2023-11-02 23:31:19 -07:00
Yuhong Sun
927dffecb5 Prompt Layer Rework (#688) 2023-11-02 23:26:47 -07:00
Weves
68b23b6339 Enable database reading in recursive notion crawl 2023-11-02 23:14:54 -07:00
Weves
174f54473e Fix notion recursive search for blocks with children 2023-11-02 22:21:55 -07:00
Weves
329824ab22 Address issue with links for Google Sites connector 2023-11-02 22:01:08 -07:00
Yuhong Sun
b0f76b97ef Guru and Productboard Time Updated (#683) 2023-11-02 14:27:06 -07:00
753 changed files with 57241 additions and 15059 deletions

View File

@@ -1,4 +1,4 @@
name: Build and Push Backend Images on Tagging
name: Build and Push Backend Image on Tag
on:
push:
@@ -32,3 +32,13 @@ jobs:
tags: |
danswer/danswer-backend:${{ github.ref_name }}
danswer/danswer-backend:latest
build-args: |
DANSWER_VERSION=${{ github.ref_name }}
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
# To run locally: trivy image --severity HIGH,CRITICAL danswer/danswer-backend
image-ref: docker.io/danswer/danswer-backend:${{ github.ref_name }}
severity: 'CRITICAL,HIGH'
trivyignores: ./backend/.trivyignore

View File

@@ -0,0 +1,42 @@
name: Build and Push Model Server Image on Tag
on:
push:
tags:
- '*'
jobs:
build-and-push:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Login to Docker Hub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_TOKEN }}
- name: Model Server Image Docker Build and Push
uses: docker/build-push-action@v2
with:
context: ./backend
file: ./backend/Dockerfile.model_server
platforms: linux/amd64,linux/arm64
push: true
tags: |
danswer/danswer-model-server:${{ github.ref_name }}
danswer/danswer-model-server:latest
build-args: |
DANSWER_VERSION=${{ github.ref_name }}
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: docker.io/danswer/danswer-model-server:${{ github.ref_name }}
severity: 'CRITICAL,HIGH'

View File

@@ -1,4 +1,4 @@
name: Build and Push Web Images on Tagging
name: Build and Push Web Image on Tag
on:
push:
@@ -32,3 +32,11 @@ jobs:
tags: |
danswer/danswer-web-server:${{ github.ref_name }}
danswer/danswer-web-server:latest
build-args: |
DANSWER_VERSION=${{ github.ref_name }}
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: docker.io/danswer/danswer-web-server:${{ github.ref_name }}
severity: 'CRITICAL,HIGH'

View File

@@ -20,10 +20,12 @@ jobs:
cache-dependency-path: |
backend/requirements/default.txt
backend/requirements/dev.txt
backend/requirements/model_server.txt
- run: |
python -m pip install --upgrade pip
pip install -r backend/requirements/default.txt
pip install -r backend/requirements/dev.txt
pip install -r backend/requirements/model_server.txt
- name: Run MyPy
run: |

.github/workflows/pr-python-tests.yml
View File

@@ -0,0 +1,35 @@
name: Python Unit Tests
on:
pull_request:
branches: [ main ]
jobs:
backend-check:
runs-on: ubuntu-latest
env:
PYTHONPATH: ./backend
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
cache: 'pip'
cache-dependency-path: |
backend/requirements/default.txt
backend/requirements/dev.txt
- name: Install Dependencies
run: |
python -m pip install --upgrade pip
pip install -r backend/requirements/default.txt
pip install -r backend/requirements/dev.txt
- name: Run Tests
shell: script -q -e -c "bash --noprofile --norc -eo pipefail {0}"
run: py.test -o junit_family=xunit2 -xv --ff backend/tests/unit

.github/workflows/pr-quality-checks.yml
View File

@@ -0,0 +1,21 @@
name: Quality Checks PR
concurrency:
group: Quality-Checks-PR-${{ github.head_ref }}
cancel-in-progress: true
on:
pull_request: null
jobs:
quality-checks:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-python@v5
with:
python-version: '3.11'
- uses: pre-commit/action@v3.0.0
with:
extra_args: --from-ref ${{ github.event.pull_request.base.sha }} --to-ref ${{ github.event.pull_request.head.sha }}

.gitignore
View File

@@ -1,3 +1,7 @@
.env
.DS_store
.venv
.venv
.mypy_cache
.idea
/deployment/data/nginx/app.conf
.vscode/launch.json

View File

@@ -28,6 +28,13 @@ repos:
rev: v0.0.286
hooks:
- id: ruff
- repo: https://github.com/pre-commit/mirrors-prettier
rev: v3.1.0
hooks:
- id: prettier
types_or: [html, css, javascript, ts, tsx]
additional_dependencies:
- prettier
# We would like to have a mypy pre-commit hook, but due to the fact that
# pre-commit runs in its own isolated environment, we would need to install

View File

@@ -11,62 +11,6 @@
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "API Server",
"type": "python",
"request": "launch",
"module": "uvicorn",
"cwd": "${workspaceFolder}/backend",
"env": {
"LOG_LEVEL": "DEBUG",
"DISABLE_AUTH": "True",
"TYPESENSE_API_KEY": "typesense_api_key",
"DYNAMIC_CONFIG_DIR_PATH": "./dynamic_config_storage"
},
"args": [
"danswer.main:app",
"--reload",
"--port",
"8080"
]
},
{
"name": "Indexer",
"type": "python",
"request": "launch",
"program": "danswer/background/update.py",
"cwd": "${workspaceFolder}/backend",
"env": {
"LOG_LEVEL": "DEBUG",
"PYTHONPATH": ".",
"TYPESENSE_API_KEY": "typesense_api_key",
"DYNAMIC_CONFIG_DIR_PATH": "./dynamic_config_storage"
}
},
{
"name": "Temp File Deletion",
"type": "python",
"request": "launch",
"program": "danswer/background/file_deletion.py",
"cwd": "${workspaceFolder}/backend",
"env": {
"LOG_LEVEL": "DEBUG",
"PYTHONPATH": "${workspaceFolder}/backend"
}
},
// For the listener to access the Slack API,
// DANSWER_BOT_SLACK_APP_TOKEN & DANSWER_BOT_SLACK_BOT_TOKEN need to be set in a .env file located in the root of the project
{
"name": "Slack Bot Listener",
"type": "python",
"request": "launch",
"program": "danswer/listeners/slack_listener.py",
"cwd": "${workspaceFolder}/backend",
"envFile": "${workspaceFolder}/.env",
"env": {
"LOG_LEVEL": "DEBUG"
}
},
{
"name": "Web Server",
"type": "node",
@@ -77,6 +21,85 @@
"run", "dev"
],
"console": "integratedTerminal"
},
{
"name": "Model Server",
"type": "python",
"request": "launch",
"module": "uvicorn",
"cwd": "${workspaceFolder}/backend",
"env": {
"LOG_LEVEL": "DEBUG",
"PYTHONUNBUFFERED": "1"
},
"args": [
"model_server.main:app",
"--reload",
"--port",
"9000"
]
},
{
"name": "API Server",
"type": "python",
"request": "launch",
"module": "uvicorn",
"cwd": "${workspaceFolder}/backend",
"env": {
"LOG_ALL_MODEL_INTERACTIONS": "True",
"LOG_LEVEL": "DEBUG",
"PYTHONUNBUFFERED": "1"
},
"args": [
"danswer.main:app",
"--reload",
"--port",
"8080"
]
},
{
"name": "Indexing",
"type": "python",
"request": "launch",
"program": "danswer/background/update.py",
"cwd": "${workspaceFolder}/backend",
"env": {
"ENABLE_MINI_CHUNK": "false",
"LOG_LEVEL": "DEBUG",
"PYTHONUNBUFFERED": "1",
"PYTHONPATH": "."
}
},
// Celery and all async jobs; this would usually include indexing as well, but indexing is handled separately above for dev
{
"name": "Background Jobs",
"type": "python",
"request": "launch",
"program": "scripts/dev_run_background_jobs.py",
"cwd": "${workspaceFolder}/backend",
"env": {
"LOG_LEVEL": "DEBUG",
"PYTHONUNBUFFERED": "1",
"PYTHONPATH": "."
},
"args": [
"--no-indexing"
]
},
// For the listener to access the Slack API,
// DANSWER_BOT_SLACK_APP_TOKEN & DANSWER_BOT_SLACK_BOT_TOKEN need to be set in a .env file located in the root of the project
{
"name": "Slack Bot",
"type": "python",
"request": "launch",
"program": "danswer/danswerbot/slack/listener.py",
"cwd": "${workspaceFolder}/backend",
"envFile": "${workspaceFolder}/.env",
"env": {
"LOG_LEVEL": "DEBUG",
"PYTHONUNBUFFERED": "1",
"PYTHONPATH": "."
}
}
]
}

View File

@@ -1,3 +1,5 @@
<!-- DANSWER_METADATA={"link": "https://github.com/danswer-ai/danswer/blob/main/CONTRIBUTING.md"} -->
# Contributing to Danswer
Hey there! We are so excited that you're interested in Danswer.
@@ -20,7 +22,7 @@ Your input is vital to making sure that Danswer moves in the right direction.
Before starting on implementation, please raise a GitHub issue.
And always feel free to message us (Chris Weaver / Yuhong Sun) on
[Slack](https://join.slack.com/t/danswer/shared_invite/zt-1u3h3ke3b-VGh1idW19R8oiNRiKBYv2w) /
[Slack](https://join.slack.com/t/danswer/shared_invite/zt-2afut44lv-Rw3kSWu6_OmdAXRpCv80DQ) /
[Discord](https://discord.gg/TDJ59cGV2X) directly about anything at all.
@@ -38,7 +40,7 @@ Our goal is to make contributing as easy as possible. If you run into any issues
That way we can help future contributors and users can avoid the same issue.
We also have support channels and generally interesting discussions on our
[Slack](https://join.slack.com/t/danswer/shared_invite/zt-1u3h3ke3b-VGh1idW19R8oiNRiKBYv2w)
[Slack](https://join.slack.com/t/danswer/shared_invite/zt-2afut44lv-Rw3kSWu6_OmdAXRpCv80DQ)
and
[Discord](https://discord.gg/TDJ59cGV2X).
@@ -56,9 +58,10 @@ development purposes but also feel free to just use the containers and update wi
### Local Set Up
It is recommended to use Python versions >= 3.11.
It is recommended to use Python version 3.11.
This guide skips setting up User Authentication for the purpose of simplicity
If using a lower version, modifications will have to be made to the code.
If using a higher version, the version of Tensorflow we use may not be available for your platform.
#### Installing Requirements
@@ -69,15 +72,20 @@ For convenience here's a command for it:
python -m venv .venv
source .venv/bin/activate
```
_For Windows activate via:_
_For Windows, activate the virtual environment using Command Prompt:_
```bash
.venv\Scripts\activate
```
If using PowerShell, the command slightly differs:
```powershell
.venv\Scripts\Activate.ps1
```
Install the required python dependencies:
```bash
pip install -r danswer/backend/requirements/default.txt
pip install -r danswer/backend/requirements/dev.txt
pip install -r danswer/backend/requirements/model_server.txt
```
Install [Node.js and npm](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm) for the frontend.
@@ -86,7 +94,12 @@ Once the above is done, navigate to `danswer/web` run:
npm i
```
Install Playwright (required by the Web Connector), with the python venv active, run:
Install Playwright (required by the Web Connector)
> Note: If you have just done the pip install, open a new terminal and source the python virtual-env again.
This will update the path to include Playwright.
Then install Playwright by running:
```bash
playwright install
```
@@ -100,26 +113,24 @@ docker compose -f docker-compose.dev.yml -p danswer-stack up -d index relational
(index refers to Vespa and relational_db refers to Postgres)
#### Running Danswer
Setup a folder to store config. Navigate to `danswer/backend` and run:
```bash
mkdir dynamic_config_storage
```
To start the frontend, navigate to `danswer/web` and run:
```bash
npm run dev
```
Package the Vespa schema. This will only need to be done when the Vespa schema is updated locally.
Navigate to `danswer/backend/danswer/document_index/vespa/app_config` and run:
Next, start the model server which runs the local NLP models.
Navigate to `danswer/backend` and run:
```bash
zip -r ../vespa-app.zip .
uvicorn model_server.main:app --reload --port 9000
```
_For Windows (for compatibility with both PowerShell and Command Prompt):_
```bash
powershell -Command "
uvicorn model_server.main:app --reload --port 9000
"
```
- Note: If you don't have the `zip` utility, you will need to install it prior to running the above
The first time running Danswer, you will also need to run the DB migrations for Postgres.
The first time running Danswer, you will need to run the DB migrations for Postgres.
After the first time, this is no longer required unless the DB models change.
Navigate to `danswer/backend` and with the venv active, run:
@@ -137,17 +148,12 @@ python ./scripts/dev_run_background_jobs.py
To run the backend API server, navigate back to `danswer/backend` and run:
```bash
AUTH_TYPE=disabled \
DYNAMIC_CONFIG_DIR_PATH=./dynamic_config_storage \
VESPA_DEPLOYMENT_ZIP=./danswer/document_index/vespa/vespa-app.zip \
uvicorn danswer.main:app --reload --port 8080
AUTH_TYPE=disabled uvicorn danswer.main:app --reload --port 8080
```
_For Windows (for compatibility with both PowerShell and Command Prompt):_
```bash
powershell -Command "
$env:AUTH_TYPE='disabled'
$env:DYNAMIC_CONFIG_DIR_PATH='./dynamic_config_storage'
$env:VESPA_DEPLOYMENT_ZIP='./danswer/document_index/vespa/vespa-app.zip'
uvicorn danswer.main:app --reload --port 8080
"
```
@@ -166,20 +172,16 @@ pre-commit install
Additionally, we use `mypy` for static type checking.
Danswer is fully type-annotated, and we would like to keep it that way!
Right now there is no automated type checking (coming soon), but we ask you to manually run it before
creating a pull request with `python -m mypy .` from the `danswer/backend` directory.
To run the mypy checks manually, run `python -m mypy .` from the `danswer/backend` directory.
#### Web
We use `prettier` for formatting. The desired version (2.8.8) will be installed via an `npm i` from the `danswer/web` directory.
To run the formatter, use `npx prettier --write .` from the `danswer/web` directory.
Like `mypy`, we have no automated formatting yet (coming soon), but we request that, for now,
you run this manually before creating a pull request.
Please double check that prettier passes before creating a pull request.
### Release Process
Danswer follows the semver versioning standard.
A set of Docker containers will be pushed automatically to DockerHub with every tag.
You can see the containers [here](https://hub.docker.com/search?q=danswer%2F).
As pre-1.0 software, even patch releases may contain breaking or non-backwards-compatible changes.

View File

@@ -1,15 +1,17 @@
<!-- DANSWER_METADATA={"link": "https://github.com/danswer-ai/danswer/blob/main/README.md"} -->
<h2 align="center">
<a href="https://www.danswer.ai/"> <img width="50%" src="https://github.com/danswer-owners/danswer/blob/1fabd9372d66cd54238847197c33f091a724803b/DanswerWithName.png?raw=true)" /></a>
</h2>
<p align="center">
<p align="center">OpenSource Enterprise Question-Answering</p>
<p align="center">Open Source Gen-AI Chat + Unified Search.</p>
<p align="center">
<a href="https://docs.danswer.dev/" target="_blank">
<img src="https://img.shields.io/badge/docs-view-blue" alt="Documentation">
</a>
<a href="https://join.slack.com/t/danswer/shared_invite/zt-1u5ycen3o-6SJbWfivLWP5LPyp_jftuw" target="_blank">
<a href="https://join.slack.com/t/danswer/shared_invite/zt-2afut44lv-Rw3kSWu6_OmdAXRpCv80DQ" target="_blank">
<img src="https://img.shields.io/badge/slack-join-blue.svg?logo=slack" alt="Slack">
</a>
<a href="https://discord.gg/TDJ59cGV2X" target="_blank">
@@ -20,62 +22,88 @@
</a>
</p>
<strong>[Danswer](https://www.danswer.ai/)</strong> allows you to ask natural language questions against internal documents and get back reliable answers backed by quotes and references from the source material so that you can always trust what you get back. You can connect to a number of common tools such as Slack, GitHub, Confluence, amongst others.
<strong>[Danswer](https://www.danswer.ai/)</strong> is the AI Assistant connected to your company's docs, apps, and people.
Danswer provides a Chat interface and plugs into any LLM of your choice. Danswer can be deployed anywhere and for any
scale - on a laptop, on-premise, or to cloud. Since you own the deployment, your user data and chats are fully in your
own control. Danswer is MIT licensed and designed to be modular and easily extensible. The system also comes fully ready
for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for
configuring Personas (AI Assistants) and their Prompts.
Danswer also serves as a Unified Search across all common workplace tools such as Slack, Google Drive, Confluence, etc.
By combining LLMs and team specific knowledge, Danswer becomes a subject matter expert for the team. Imagine ChatGPT if
it had access to your team's unique knowledge! It enables questions such as "A customer wants feature X, is this already
supported?" or "Where's the pull request for feature Y?"
<h3>Usage</h3>
Danswer provides a fully-featured web UI:
Danswer Web App:
https://github.com/danswer-ai/danswer/assets/32520769/563be14c-9304-47b5-bf0a-9049c2b6f410
https://github.com/danswer-ai/danswer/assets/25087905/619607a1-4ad2-41a0-9728-351752acc26e
Or, if you prefer, you can plug Danswer into your existing Slack workflows (more integrations to come 😁):
Or, plug Danswer into your existing Slack workflows (more integrations to come 😁):
https://github.com/danswer-ai/danswer/assets/25087905/3e19739b-d178-4371-9a38-011430bdec1b
For more details on the admin controls, check out our <strong><a href="https://www.youtube.com/watch?v=geNzY1nbCnU">Full Video Demo</a></strong>!
For more details on the Admin UI to manage connectors and users, check out our
<strong><a href="https://www.youtube.com/watch?v=geNzY1nbCnU">Full Video Demo</a></strong>!
<h3>Deployment</h3>
## Deployment
Danswer can easily be tested locally or deployed on a virtual machine with a single `docker compose` command. Check out our [docs](https://docs.danswer.dev/quickstart) to learn more.
Danswer can easily be run locally (even on a laptop) or deployed on a virtual machine with a single
`docker compose` command. Check out our [docs](https://docs.danswer.dev/quickstart) to learn more.
We also have built-in support for deployment on Kubernetes. Files for that can be found [here](https://github.com/danswer-ai/danswer/tree/main/deployment/kubernetes).
## 💃 Features
* Direct QA powered by Generative AI models with answers backed by quotes and source links.
* Intelligent Document Retrieval (Semantic Search/Reranking) using the latest LLMs.
* An AI Helper backed by a custom Deep Learning model to interpret user intent.
* User authentication and document level access management.
* Support for an LLM of your choice (GPT-4, Llama2, Orca, etc.)
* Management Dashboard to manage connectors and set up features such as live update fetching.
* One line Docker Compose (or Kubernetes) deployment to host Danswer anywhere.
## 🔌 Connectors
## 💃 Main Features
* Chat UI with the ability to select documents to chat with.
* Create custom AI Assistants with different prompts and backing knowledge sets.
* Connect Danswer with an LLM of your choice (self-host for a fully airgapped solution).
* Document Search + AI Answers for natural language queries.
* Connectors to all common workplace tools like Google Drive, Confluence, Slack, etc.
* Slack integration to get answers and search results directly in Slack.
Danswer currently syncs documents (every 10 minutes) from:
## 🚧 Roadmap
* Chat/Prompt sharing with specific teammates and user groups.
* Multi-modal model support, chat with images, video, etc.
* Choosing between LLMs and parameters during a chat session.
* Tool calling and agent configuration options.
* Organizational understanding and ability to locate and suggest experts from your team.
## Other Notable Benefits of Danswer
* User Authentication with document-level access management.
* Best-in-class Hybrid Search across all sources (BM-25 + prefix-aware embedding models).
* Admin Dashboard to configure connectors, document-sets, access, etc.
* Custom deep learning models + learning from user feedback.
* Easy deployment and the ability to host Danswer anywhere of your choosing.
## 🔌 Connectors
Efficiently pulls the latest changes from:
* Slack
* GitHub
* Google Drive
* Confluence
* Jira
* Zendesk
* Gmail
* Notion
* Gong
* Slab
* Linear
* Productboard
* Guru
* Zulip
* Bookstack
* Document360
* Sharepoint
* Hubspot
* Local Files
* Websites
* With more to come...
## 🚧 Roadmap
* Chat/Conversation support.
* Organizational understanding.
* Ability to locate and suggest experts.
* And more ...
## 💡 Contributing
Looking to contribute? Please check out the [Contribution Guide](CONTRIBUTING.md) for more details.

backend/.dockerignore
View File

@@ -0,0 +1,17 @@
**/__pycache__
venv/
env/
*.egg-info
.cache
.git/
.svn/
.vscode/
.idea/
*.log
log/
.env
secrets.yaml
build/
dist/
.coverage
htmlcov/

backend/.gitignore
View File

@@ -1,4 +1,5 @@
__pycache__/
.mypy_cache
.idea/
site_crawls/
.ipynb_checkpoints/
@@ -7,3 +8,4 @@ api_keys.py
.env
vespa-app.zip
dynamic_config_storage/
celerybeat-schedule*

backend/.trivyignore
View File

@@ -0,0 +1,46 @@
# https://github.com/madler/zlib/issues/868
# Pulled in with base Debian image, it's part of the contrib folder but unused
# zlib1g is fine
# Will be gone with Debian image upgrade
# No impact in our settings
CVE-2023-45853
# krb5 related, worst case is denial of service by resource exhaustion
# Accept the risk
CVE-2024-26458
CVE-2024-26461
CVE-2024-26462
CVE-2024-26458
CVE-2024-26461
CVE-2024-26462
CVE-2024-26458
CVE-2024-26461
CVE-2024-26462
CVE-2024-26458
CVE-2024-26461
CVE-2024-26462
# Specific to Firefox which we do not use
# No impact in our settings
CVE-2024-0743
# bind9 related, worst case is denial of service by CPU resource exhaustion
# Accept the risk
CVE-2023-50387
CVE-2023-50868
CVE-2023-50387
CVE-2023-50868
# libexpat1, XML parsing resource exhaustion
# We don't parse any user provided XMLs
# No impact in our settings
CVE-2023-52425
CVE-2024-28757
# sqlite, only used by NLTK library to grab word lemmatizer and stopwords
# No impact in our settings
CVE-2023-7104
# libharfbuzz0b, O(n^2) growth, worst case is denial of service
# Accept the risk
CVE-2023-25193

View File

@@ -1,10 +1,25 @@
FROM python:3.11.4-slim-bookworm
FROM python:3.11.7-slim-bookworm
LABEL com.danswer.maintainer="founders@danswer.ai"
LABEL com.danswer.description="This image is for the backend of Danswer. It is MIT Licensed and \
free for all to use. You can find it at https://hub.docker.com/r/danswer/danswer-backend. For \
more details, visit https://github.com/danswer-ai/danswer."
# Default DANSWER_VERSION, typically overridden during builds by GitHub Actions.
ARG DANSWER_VERSION=0.3-dev
ENV DANSWER_VERSION=${DANSWER_VERSION}
RUN echo "DANSWER_VERSION: ${DANSWER_VERSION}"
# Install system dependencies
# cmake needed for psycopg (postgres)
# libpq-dev needed for psycopg (postgres)
# curl included just for users' convenience
# zip for the Vespa step further down
# ca-certificates for HTTPS
RUN apt-get update && \
apt-get install -y git cmake pkg-config libprotobuf-c-dev protobuf-compiler \
libprotobuf-dev libgoogle-perftools-dev libpq-dev build-essential cron curl \
supervisor zip ca-certificates gnupg && \
apt-get install -y cmake curl zip ca-certificates libgnutls30=3.7.9-2+deb12u2 \
libblkid1=2.38.1-5+deb12u1 libmount1=2.38.1-5+deb12u1 libsmartcols1=2.38.1-5+deb12u1 \
libuuid1=2.38.1-5+deb12u1 && \
rm -rf /var/lib/apt/lists/* && \
apt-get clean
@@ -13,45 +28,36 @@ RUN apt-get update && \
COPY ./requirements/default.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /tmp/requirements.txt && \
pip uninstall -y py && \
playwright install chromium && \
playwright install-deps chromium
# install nodejs and replace nodejs packaged with playwright (18.17.0) with the one installed below
# based on the instructions found here:
# https://nodejs.org/en/download/package-manager#debian-and-ubuntu-based-linux-distributions
# this is temporarily needed until playwright updates their packaged node version to
# 20.5.1+
RUN mkdir -p /etc/apt/keyrings && \
curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg && \
echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_20.x nodistro main" | tee /etc/apt/sources.list.d/nodesource.list && \
apt-get update && \
apt-get install -y nodejs && \
cp /usr/bin/node /usr/local/lib/python3.11/site-packages/playwright/driver/node && \
apt-get remove -y nodejs
playwright install chromium && playwright install-deps chromium && \
ln -s /usr/local/bin/supervisord /usr/bin/supervisord
# Cleanup for CVEs and size reduction
# Remove tornado test key to placate vulnerability scanners
# More details can be found here:
# https://github.com/tornadoweb/tornado/issues/3107
RUN apt-get remove -y linux-libc-dev && \
# xserver-common and xvfb included by playwright installation but not needed after
# perl-base is part of the base Python Debian image but not needed for Danswer functionality
# perl-base could only be removed with --allow-remove-essential
RUN apt-get remove -y --allow-remove-essential perl-base xserver-common xvfb cmake \
libldap-2.5-0 libldap-2.5-0 && \
apt-get autoremove -y && \
rm -rf /var/lib/apt/lists/* && \
rm /usr/local/lib/python3.11/site-packages/tornado/test/test.key
# Pre-downloading models for setups with limited egress
RUN python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('intfloat/e5-base-v2')"
# Pre-downloading NLTK for setups with limited egress
RUN python -c "import nltk; \
nltk.download('stopwords', quiet=True); \
nltk.download('wordnet', quiet=True); \
nltk.download('punkt', quiet=True);"
# Set up application files
WORKDIR /app
COPY ./danswer /app/danswer
COPY ./shared_configs /app/shared_configs
COPY ./alembic /app/alembic
COPY ./alembic.ini /app/alembic.ini
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# Create Vespa app zip
WORKDIR /app/danswer/document_index/vespa/app_config
RUN zip -r /app/danswer/vespa-app.zip .
WORKDIR /app
# TODO: remove this once all users have migrated
COPY ./scripts/migrate_vespa_to_acl.py /app/migrate_vespa_to_acl.py
COPY supervisord.conf /usr/etc/supervisord.conf
ENV PYTHONPATH /app

View File

@@ -0,0 +1,46 @@
FROM python:3.11.7-slim-bookworm
LABEL com.danswer.maintainer="founders@danswer.ai"
LABEL com.danswer.description="This image is for the Danswer model server which runs all of the \
AI models for Danswer. This container and all the code is MIT Licensed and free for all to use. \
You can find it at https://hub.docker.com/r/danswer/danswer-model-server. For more details, \
visit https://github.com/danswer-ai/danswer."
# Default DANSWER_VERSION, typically overridden during builds by GitHub Actions.
ARG DANSWER_VERSION=0.3-dev
ENV DANSWER_VERSION=${DANSWER_VERSION}
RUN echo "DANSWER_VERSION: ${DANSWER_VERSION}"
COPY ./requirements/model_server.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /tmp/requirements.txt
RUN apt-get remove -y --allow-remove-essential perl-base && \
apt-get autoremove -y
# Pre-downloading models for setups with limited egress
RUN python -c "from transformers import AutoModel, AutoTokenizer, TFDistilBertForSequenceClassification; \
from huggingface_hub import snapshot_download; \
AutoTokenizer.from_pretrained('danswer/intent-model'); \
AutoTokenizer.from_pretrained('intfloat/e5-base-v2'); \
AutoTokenizer.from_pretrained('mixedbread-ai/mxbai-rerank-xsmall-v1'); \
snapshot_download('danswer/intent-model'); \
snapshot_download('intfloat/e5-base-v2'); \
snapshot_download('mixedbread-ai/mxbai-rerank-xsmall-v1')"
WORKDIR /app
# Utils used by model server
COPY ./danswer/utils/logger.py /app/danswer/utils/logger.py
# Place to fetch version information
COPY ./danswer/__init__.py /app/danswer/__init__.py
# Shared between Danswer Backend and Model Server
COPY ./shared_configs /app/shared_configs
# Model Server main code
COPY ./model_server /app/model_server
ENV PYTHONPATH /app
CMD ["uvicorn", "model_server.main:app", "--host", "0.0.0.0", "--port", "9000"]

View File

@@ -1,4 +1,8 @@
Generic single-database configuration with an async dbapi.
<!-- DANSWER_METADATA={"link": "https://github.com/danswer-ai/danswer/blob/main/backend/alembic/README.md"} -->
# Alembic DB Migrations
These files are for creating/updating the tables in the Relational DB (Postgres).
Danswer migrations use a generic single-database configuration with an async dbapi.
## To generate new migrations:
run from danswer/backend:
@@ -7,7 +11,6 @@ run from danswer/backend:
More info can be found here: https://alembic.sqlalchemy.org/en/latest/autogenerate.html
## Running migrations
To run all un-applied migrations:
`alembic upgrade head`
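For contributors who prefer to drive migrations from code (for example in test fixtures), the same upgrade can also be run programmatically through Alembic's Python API. A minimal sketch, assuming it is executed from `danswer/backend` where the standard `alembic.ini` lives:

```python
from alembic import command
from alembic.config import Config

# Programmatic equivalent of `alembic upgrade head`; assumes the working
# directory is danswer/backend so that alembic.ini and the migration
# scripts resolve correctly.
cfg = Config("alembic.ini")
command.upgrade(cfg, "head")
```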

View File

@@ -0,0 +1,31 @@
"""Add starter prompts
Revision ID: 0a2b51deb0b8
Revises: 5f4b8568a221
Create Date: 2024-03-02 23:23:49.960309
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "0a2b51deb0b8"
down_revision = "5f4b8568a221"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"persona",
sa.Column(
"starter_messages",
postgresql.JSONB(astext_type=sa.Text()),
nullable=True,
),
)
def downgrade() -> None:
op.drop_column("persona", "starter_messages")

View File

@@ -0,0 +1,113 @@
"""Enable Encrypted Fields
Revision ID: 0a98909f2757
Revises: 570282d33c49
Create Date: 2024-05-05 19:30:34.317972
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.sql import table
from sqlalchemy.dialects import postgresql
import json
from danswer.utils.encryption import encrypt_string_to_bytes
# revision identifiers, used by Alembic.
revision = "0a98909f2757"
down_revision = "570282d33c49"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
connection = op.get_bind()
op.alter_column("key_value_store", "value", nullable=True)
op.add_column(
"key_value_store",
sa.Column(
"encrypted_value",
sa.LargeBinary,
nullable=True,
),
)
# Need a temporary column to translate the JSONB to binary
op.add_column("credential", sa.Column("temp_column", sa.LargeBinary()))
creds_table = table(
"credential",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column(
"credential_json",
postgresql.JSONB(astext_type=sa.Text()),
nullable=False,
),
sa.Column(
"temp_column",
sa.LargeBinary(),
nullable=False,
),
)
results = connection.execute(sa.select(creds_table))
# This uses the MIT build's encrypt, which does not actually encrypt the credentials
# In other words, this upgrade does not apply the encryption. Porting existing sensitive data
# and key rotation are currently not supported and will come in a future release
for row_id, creds, _ in results:
creds_binary = encrypt_string_to_bytes(json.dumps(creds))
connection.execute(
creds_table.update()
.where(creds_table.c.id == row_id)
.values(temp_column=creds_binary)
)
op.drop_column("credential", "credential_json")
op.alter_column("credential", "temp_column", new_column_name="credential_json")
op.add_column("llm_provider", sa.Column("temp_column", sa.LargeBinary()))
llm_table = table(
"llm_provider",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column(
"api_key",
sa.String(),
nullable=False,
),
sa.Column(
"temp_column",
sa.LargeBinary(),
nullable=False,
),
)
results = connection.execute(sa.select(llm_table))
for row_id, api_key, _ in results:
llm_key = encrypt_string_to_bytes(api_key)
connection.execute(
llm_table.update()
.where(llm_table.c.id == row_id)
.values(temp_column=llm_key)
)
op.drop_column("llm_provider", "api_key")
op.alter_column("llm_provider", "temp_column", new_column_name="api_key")
def downgrade() -> None:
# Some information loss but this is ok. Should not allow decryption via downgrade.
op.drop_column("credential", "credential_json")
op.drop_column("llm_provider", "api_key")
op.add_column("llm_provider", sa.Column("api_key", sa.String()))
op.add_column(
"credential",
sa.Column("credential_json", postgresql.JSONB(astext_type=sa.Text())),
)
op.execute("DELETE FROM key_value_store WHERE value IS NULL")
op.alter_column("key_value_store", "value", nullable=False)
op.drop_column("key_value_store", "encrypted_value")

View File

@@ -0,0 +1,37 @@
"""Introduce Danswer APIs
Revision ID: 15326fcec57e
Revises: 77d07dffae64
Create Date: 2023-11-11 20:51:24.228999
"""
from alembic import op
import sqlalchemy as sa
from danswer.configs.constants import DocumentSource
# revision identifiers, used by Alembic.
revision = "15326fcec57e"
down_revision = "77d07dffae64"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.alter_column("credential", "is_admin", new_column_name="admin_public")
op.add_column(
"document",
sa.Column("from_ingestion_api", sa.Boolean(), nullable=True),
)
op.alter_column(
"connector",
"source",
type_=sa.String(length=50),
existing_type=sa.Enum(DocumentSource, native_enum=False),
existing_nullable=False,
)
def downgrade() -> None:
op.drop_column("document", "from_ingestion_api")
op.alter_column("credential", "admin_public", new_column_name="is_admin")

View File

@@ -0,0 +1,29 @@
"""Port Config Store
Revision ID: 173cae5bba26
Revises: e50154680a5c
Create Date: 2024-03-19 15:30:44.425436
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "173cae5bba26"
down_revision = "e50154680a5c"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"key_value_store",
sa.Column("key", sa.String(), nullable=False),
sa.Column("value", postgresql.JSONB(astext_type=sa.Text()), nullable=False),
sa.PrimaryKeyConstraint("key"),
)
def downgrade() -> None:
op.drop_table("key_value_store")

View File

@@ -13,8 +13,8 @@ from alembic import op
# revision identifiers, used by Alembic.
revision = "2666d766cb9b"
down_revision = "6d387b3196c2"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -13,8 +13,8 @@ from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "27c6ecc08586"
down_revision = "2666d766cb9b"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -11,8 +11,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "30c1d5744104"
down_revision = "7f99be1cb9f5"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,45 @@
"""Add tool table
Revision ID: 3879338f8ba1
Revises: f1c6478c3fd8
Create Date: 2024-05-11 16:11:23.718084
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "3879338f8ba1"
down_revision = "f1c6478c3fd8"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"tool",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("name", sa.String(), nullable=False),
sa.Column("description", sa.Text(), nullable=True),
sa.Column("in_code_tool_id", sa.String(), nullable=True),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"persona__tool",
sa.Column("persona_id", sa.Integer(), nullable=False),
sa.Column("tool_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["persona_id"],
["persona.id"],
),
sa.ForeignKeyConstraint(
["tool_id"],
["tool.id"],
),
sa.PrimaryKeyConstraint("persona_id", "tool_id"),
)
def downgrade() -> None:
op.drop_table("persona__tool")
op.drop_table("tool")

View File

@@ -0,0 +1,41 @@
"""Add chat session sharing
Revision ID: 38eda64af7fe
Revises: 776b3bbe9092
Create Date: 2024-03-27 19:41:29.073594
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "38eda64af7fe"
down_revision = "776b3bbe9092"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"chat_session",
sa.Column(
"shared_status",
sa.Enum(
"PUBLIC",
"PRIVATE",
name="chatsessionsharedstatus",
native_enum=False,
),
nullable=True,
),
)
op.execute("UPDATE chat_session SET shared_status='PRIVATE'")
op.alter_column(
"chat_session",
"shared_status",
nullable=False,
)
def downgrade() -> None:
op.drop_column("chat_session", "shared_status")

View File

@@ -11,8 +11,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "3b25685ff73c"
down_revision = "e0a68a81d434"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -12,8 +12,8 @@ from alembic import op
# revision identifiers, used by Alembic.
revision = "3c5e35aa9af0"
down_revision = "27c6ecc08586"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,49 @@
"""Add tables for UI-based LLM configuration
Revision ID: 401c1ac29467
Revises: 703313b75876
Create Date: 2024-04-13 18:07:29.153817
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "401c1ac29467"
down_revision = "703313b75876"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"llm_provider",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("name", sa.String(), nullable=False),
sa.Column("api_key", sa.String(), nullable=True),
sa.Column("api_base", sa.String(), nullable=True),
sa.Column("api_version", sa.String(), nullable=True),
sa.Column(
"custom_config",
postgresql.JSONB(astext_type=sa.Text()),
nullable=True,
),
sa.Column("default_model_name", sa.String(), nullable=False),
sa.Column("fast_default_model_name", sa.String(), nullable=True),
sa.Column("is_default_provider", sa.Boolean(), unique=True, nullable=True),
sa.Column("model_names", postgresql.ARRAY(sa.String()), nullable=True),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint("name"),
)
op.add_column(
"persona",
sa.Column("llm_model_provider_override", sa.String(), nullable=True),
)
def downgrade() -> None:
op.drop_column("persona", "llm_model_provider_override")
op.drop_table("llm_provider")

View File

@@ -12,8 +12,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "465f78d9b7f9"
down_revision = "3c5e35aa9af0"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -11,8 +11,8 @@ from sqlalchemy import String
# revision identifiers, used by Alembic.
revision = "46625e4745d4"
down_revision = "9d97fecfab7f"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,28 @@
"""PG File Store
Revision ID: 4738e4b3bae1
Revises: e91df4e935ef
Create Date: 2024-03-20 18:53:32.461518
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "4738e4b3bae1"
down_revision = "e91df4e935ef"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"file_store",
sa.Column("file_name", sa.String(), nullable=False),
sa.Column("lobj_oid", sa.Integer(), nullable=False),
sa.PrimaryKeyConstraint("file_name"),
)
def downgrade() -> None:
op.drop_table("file_store")

View File

@@ -11,9 +11,9 @@ from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "47433d30de82"
down_revision = None
branch_labels = None
depends_on = None
down_revision: None = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,23 @@
"""Add name to api_key
Revision ID: 475fcefe8826
Revises: ecab2b3f1a3b
Create Date: 2024-04-11 11:05:18.414438
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "475fcefe8826"
down_revision = "ecab2b3f1a3b"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column("api_key", sa.Column("name", sa.String(), nullable=True))
def downgrade() -> None:
op.drop_column("api_key", "name")

View File

@@ -0,0 +1,28 @@
"""Add additional retrieval controls to Persona
Revision ID: 50b683a8295c
Revises: 7da0ae5ad583
Create Date: 2023-11-27 17:23:29.668422
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "50b683a8295c"
down_revision = "7da0ae5ad583"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column("persona", sa.Column("num_chunks", sa.Integer(), nullable=True))
op.add_column(
"persona",
sa.Column("apply_llm_relevance_filter", sa.Boolean(), nullable=True),
)
def downgrade() -> None:
op.drop_column("persona", "apply_llm_relevance_filter")
op.drop_column("persona", "num_chunks")

View File

@@ -0,0 +1,27 @@
"""Track Danswerbot Explicitly
Revision ID: 570282d33c49
Revises: 7547d982db8f
Create Date: 2024-05-04 17:49:28.568109
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "570282d33c49"
down_revision = "7547d982db8f"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"chat_session", sa.Column("danswerbot_flow", sa.Boolean(), nullable=True)
)
op.execute("UPDATE chat_session SET danswerbot_flow = one_shot")
op.alter_column("chat_session", "danswerbot_flow", nullable=False)
def downgrade() -> None:
op.drop_column("chat_session", "danswerbot_flow")

View File

@@ -12,8 +12,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "57b53544726e"
down_revision = "800f48024ae9"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -13,8 +13,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "5809c0787398"
down_revision = "d929f0c1c6af"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -12,8 +12,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "5e84129c8be3"
down_revision = "e6a4bbc13fe4"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,27 @@
"""add removed documents to index_attempt
Revision ID: 5f4b8568a221
Revises: dbaa756c2ccf
Create Date: 2024-02-16 15:02:03.319907
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "5f4b8568a221"
down_revision = "8987770549c0"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"index_attempt",
sa.Column("docs_removed_from_index", sa.Integer()),
)
op.execute("UPDATE index_attempt SET docs_removed_from_index = 0")
def downgrade() -> None:
op.drop_column("index_attempt", "docs_removed_from_index")

View File

@@ -0,0 +1,45 @@
"""Add user-configured names to LLMProvider
Revision ID: 643a84a42a33
Revises: 0a98909f2757
Create Date: 2024-05-07 14:54:55.493100
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "643a84a42a33"
down_revision = "0a98909f2757"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column("llm_provider", sa.Column("provider", sa.String(), nullable=True))
# move "name" -> "provider" to match the new schema
op.execute("UPDATE llm_provider SET provider = name")
# pretty up display name
op.execute("UPDATE llm_provider SET name = 'OpenAI' WHERE name = 'openai'")
op.execute("UPDATE llm_provider SET name = 'Anthropic' WHERE name = 'anthropic'")
op.execute("UPDATE llm_provider SET name = 'Azure OpenAI' WHERE name = 'azure'")
op.execute("UPDATE llm_provider SET name = 'AWS Bedrock' WHERE name = 'bedrock'")
# update personas to use the new provider names
op.execute(
"UPDATE persona SET llm_model_provider_override = 'OpenAI' WHERE llm_model_provider_override = 'openai'"
)
op.execute(
"UPDATE persona SET llm_model_provider_override = 'Anthropic' WHERE llm_model_provider_override = 'anthropic'"
)
op.execute(
"UPDATE persona SET llm_model_provider_override = 'Azure OpenAI' WHERE llm_model_provider_override = 'azure'"
)
op.execute(
"UPDATE persona SET llm_model_provider_override = 'AWS Bedrock' WHERE llm_model_provider_override = 'bedrock'"
)
def downgrade() -> None:
op.execute("UPDATE llm_provider SET name = provider")
op.drop_column("llm_provider", "provider")

View File

@@ -13,8 +13,8 @@ from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "6d387b3196c2"
down_revision = "47433d30de82"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,83 @@
"""Add TokenRateLimit Tables
Revision ID: 703313b75876
Revises: fad14119fb92
Create Date: 2024-04-15 01:36:02.952809
"""
import json
from typing import cast
from alembic import op
import sqlalchemy as sa
from danswer.dynamic_configs.factory import get_dynamic_config_store
# revision identifiers, used by Alembic.
revision = "703313b75876"
down_revision = "fad14119fb92"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"token_rate_limit",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("enabled", sa.Boolean(), nullable=False),
sa.Column("token_budget", sa.Integer(), nullable=False),
sa.Column("period_hours", sa.Integer(), nullable=False),
sa.Column(
"scope",
sa.String(length=10),
nullable=False,
),
sa.Column(
"created_at",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"token_rate_limit__user_group",
sa.Column("rate_limit_id", sa.Integer(), nullable=False),
sa.Column("user_group_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["rate_limit_id"],
["token_rate_limit.id"],
),
sa.ForeignKeyConstraint(
["user_group_id"],
["user_group.id"],
),
sa.PrimaryKeyConstraint("rate_limit_id", "user_group_id"),
)
try:
settings_json = cast(
str, get_dynamic_config_store().load("token_budget_settings")
)
settings = json.loads(settings_json)
is_enabled = settings.get("enable_token_budget", False)
token_budget = settings.get("token_budget", -1)
period_hours = settings.get("period_hours", -1)
if is_enabled and token_budget > 0 and period_hours > 0:
op.execute(
f"INSERT INTO token_rate_limit \
(enabled, token_budget, period_hours, scope) VALUES \
({is_enabled}, {token_budget}, {period_hours}, 'GLOBAL')"
)
# Delete the dynamic config
get_dynamic_config_store().delete("token_budget_settings")
except Exception:
# Ignore if the dynamic config is not found
pass
def downgrade() -> None:
op.drop_table("token_rate_limit__user_group")
op.drop_table("token_rate_limit")

View File

@@ -0,0 +1,81 @@
"""Permission Auto Sync Framework
Revision ID: 72bdc9929a46
Revises: 475fcefe8826
Create Date: 2024-04-14 21:15:28.659634
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "72bdc9929a46"
down_revision = "475fcefe8826"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"email_to_external_user_cache",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("external_user_id", sa.String(), nullable=False),
sa.Column("user_id", sa.UUID(), nullable=True),
sa.Column("user_email", sa.String(), nullable=False),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"external_permission",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("user_id", sa.UUID(), nullable=True),
sa.Column("user_email", sa.String(), nullable=False),
sa.Column(
"source_type",
sa.String(),
nullable=False,
),
sa.Column("external_permission_group", sa.String(), nullable=False),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"permission_sync_run",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column(
"source_type",
sa.String(),
nullable=False,
),
sa.Column("update_type", sa.String(), nullable=False),
sa.Column("cc_pair_id", sa.Integer(), nullable=True),
sa.Column(
"status",
sa.String(),
nullable=False,
),
sa.Column("error_msg", sa.Text(), nullable=True),
sa.Column(
"updated_at",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.ForeignKeyConstraint(
["cc_pair_id"],
["connector_credential_pair.id"],
),
sa.PrimaryKeyConstraint("id"),
)
def downgrade() -> None:
op.drop_table("permission_sync_run")
op.drop_table("external_permission")
op.drop_table("email_to_external_user_cache")

View File

@@ -0,0 +1,51 @@
"""Chat Folders
Revision ID: 7547d982db8f
Revises: ef7da92f7213
Create Date: 2024-05-02 15:18:56.573347
"""
from alembic import op
import sqlalchemy as sa
import fastapi_users_db_sqlalchemy
# revision identifiers, used by Alembic.
revision = "7547d982db8f"
down_revision = "ef7da92f7213"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"chat_folder",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=True,
),
sa.Column("name", sa.String(), nullable=True),
sa.Column("display_priority", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.add_column("chat_session", sa.Column("folder_id", sa.Integer(), nullable=True))
op.create_foreign_key(
"chat_session_chat_folder_fk",
"chat_session",
"chat_folder",
["folder_id"],
["id"],
)
def downgrade() -> None:
op.drop_constraint(
"chat_session_chat_folder_fk", "chat_session", type_="foreignkey"
)
op.drop_column("chat_session", "folder_id")
op.drop_table("chat_folder")

View File

@@ -12,8 +12,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "767f1c2a00eb"
down_revision = "dba7f71618f5"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,32 @@
"""CC-Pair Name not Unique
Revision ID: 76b60d407dfb
Revises: b156fa702355
Create Date: 2023-12-22 21:42:10.018804
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "76b60d407dfb"
down_revision = "b156fa702355"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.execute("DELETE FROM connector_credential_pair WHERE name IS NULL")
op.drop_constraint(
"connector_credential_pair__name__key",
"connector_credential_pair",
type_="unique",
)
op.alter_column(
"connector_credential_pair", "name", existing_type=sa.String(), nullable=False
)
def downgrade() -> None:
# This wasn't really required by the code either; there's no good reason to make it unique again
pass

View File

@@ -0,0 +1,71 @@
"""Remove Remaining Enums
Revision ID: 776b3bbe9092
Revises: 4738e4b3bae1
Create Date: 2024-03-22 21:34:27.629444
"""
from alembic import op
import sqlalchemy as sa
from danswer.db.models import IndexModelStatus
from danswer.search.enums import RecencyBiasSetting
from danswer.search.models import SearchType
# revision identifiers, used by Alembic.
revision = "776b3bbe9092"
down_revision = "4738e4b3bae1"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.alter_column(
"persona",
"search_type",
type_=sa.String,
existing_type=sa.Enum(SearchType, native_enum=False),
existing_nullable=False,
)
op.alter_column(
"persona",
"recency_bias",
type_=sa.String,
existing_type=sa.Enum(RecencyBiasSetting, native_enum=False),
existing_nullable=False,
)
# Because the indexmodelstatus enum does not have a mapping to a string type
# we need this workaround instead of directly changing the type
op.add_column("embedding_model", sa.Column("temp_status", sa.String))
op.execute("UPDATE embedding_model SET temp_status = status::text")
op.drop_column("embedding_model", "status")
op.alter_column("embedding_model", "temp_status", new_column_name="status")
op.execute("DROP TYPE IF EXISTS searchtype")
op.execute("DROP TYPE IF EXISTS recencybiassetting")
op.execute("DROP TYPE IF EXISTS indexmodelstatus")
def downgrade() -> None:
op.alter_column(
"persona",
"search_type",
type_=sa.Enum(SearchType, native_enum=False),
existing_type=sa.String(length=50),
existing_nullable=False,
)
op.alter_column(
"persona",
"recency_bias",
type_=sa.Enum(RecencyBiasSetting, native_enum=False),
existing_type=sa.String(length=50),
existing_nullable=False,
)
op.alter_column(
"embedding_model",
"status",
type_=sa.Enum(IndexModelStatus, native_enum=False),
existing_type=sa.String(length=50),
existing_nullable=False,
)

View File

@@ -12,8 +12,8 @@ from sqlalchemy import String
# revision identifiers, used by Alembic.
revision = "77d07dffae64"
down_revision = "d61e513bef0a"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -11,8 +11,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "78dbe7e38469"
down_revision = "7ccea01261f6"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,48 @@
"""Add api_key table
Revision ID: 79acd316403a
Revises: 904e5138fffb
Create Date: 2024-01-11 17:56:37.934381
"""
from alembic import op
import fastapi_users_db_sqlalchemy
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "79acd316403a"
down_revision = "904e5138fffb"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"api_key",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("hashed_api_key", sa.String(), nullable=False),
sa.Column("api_key_display", sa.String(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=False,
),
sa.Column(
"owner_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=True,
),
sa.Column(
"created_at",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint("api_key_display"),
sa.UniqueConstraint("hashed_api_key"),
)
def downgrade() -> None:
op.drop_table("api_key")

View File

@@ -12,8 +12,8 @@ from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "7ccea01261f6"
down_revision = "a570b80a5f20"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,23 @@
"""Add description to persona
Revision ID: 7da0ae5ad583
Revises: e86866a9c78a
Create Date: 2023-11-27 00:16:19.959414
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "7da0ae5ad583"
down_revision = "e86866a9c78a"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column("persona", sa.Column("description", sa.String(), nullable=True))
def downgrade() -> None:
op.drop_column("persona", "description")

View File

@@ -12,8 +12,8 @@ from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "7da543f5672f"
down_revision = "febe9eaa0644"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,26 @@
"""Slack Followup
Revision ID: 7f726bad5367
Revises: 79acd316403a
Create Date: 2024-01-15 00:19:55.991224
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "7f726bad5367"
down_revision = "79acd316403a"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"chat_feedback",
sa.Column("required_followup", sa.Boolean(), nullable=True),
)
def downgrade() -> None:
op.drop_column("chat_feedback", "required_followup")

View File

@@ -11,8 +11,8 @@ from alembic import op
# revision identifiers, used by Alembic.
revision = "7f99be1cb9f5"
down_revision = "78dbe7e38469"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -12,8 +12,8 @@ from sqlalchemy.schema import Sequence, CreateSequence
# revision identifiers, used by Alembic.
revision = "800f48024ae9"
down_revision = "767f1c2a00eb"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,36 @@
"""Add chat session to query_event
Revision ID: 80696cf850ae
Revises: 15326fcec57e
Create Date: 2023-11-26 02:38:35.008070
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "80696cf850ae"
down_revision = "15326fcec57e"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"query_event",
sa.Column("chat_session_id", sa.Integer(), nullable=True),
)
op.create_foreign_key(
"fk_query_event_chat_session_id",
"query_event",
"chat_session",
["chat_session_id"],
["id"],
)
def downgrade() -> None:
op.drop_constraint(
"fk_query_event_chat_session_id", "query_event", type_="foreignkey"
)
op.drop_column("query_event", "chat_session_id")

View File

@@ -0,0 +1,34 @@
"""Add is_visible to Persona
Revision ID: 891cd83c87a8
Revises: 76b60d407dfb
Create Date: 2023-12-21 11:55:54.132279
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "891cd83c87a8"
down_revision = "76b60d407dfb"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"persona",
sa.Column("is_visible", sa.Boolean(), nullable=True),
)
op.execute("UPDATE persona SET is_visible = true")
op.alter_column("persona", "is_visible", nullable=False)
op.add_column(
"persona",
sa.Column("display_priority", sa.Integer(), nullable=True),
)
def downgrade() -> None:
op.drop_column("persona", "is_visible")
op.drop_column("persona", "display_priority")

View File

@@ -0,0 +1,25 @@
"""Add full exception stack trace
Revision ID: 8987770549c0
Revises: ec3ec2eabf7b
Create Date: 2024-02-10 19:31:28.339135
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "8987770549c0"
down_revision = "ec3ec2eabf7b"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"index_attempt", sa.Column("full_exception_trace", sa.Text(), nullable=True)
)
def downgrade() -> None:
op.drop_column("index_attempt", "full_exception_trace")

View File

@@ -12,8 +12,8 @@ from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "8aabb57f3b49"
down_revision = "5e84129c8be3"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -12,8 +12,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "8e26726b7683"
down_revision = "5809c0787398"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -12,8 +12,8 @@ from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "904451035c9b"
down_revision = "3b25685ff73c"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,61 @@
"""Tags
Revision ID: 904e5138fffb
Revises: 891cd83c87a8
Create Date: 2024-01-01 10:44:43.733974
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "904e5138fffb"
down_revision = "891cd83c87a8"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"tag",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("tag_key", sa.String(), nullable=False),
sa.Column("tag_value", sa.String(), nullable=False),
sa.Column("source", sa.String(), nullable=False),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint(
"tag_key", "tag_value", "source", name="_tag_key_value_source_uc"
),
)
op.create_table(
"document__tag",
sa.Column("document_id", sa.String(), nullable=False),
sa.Column("tag_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["document_id"],
["document.id"],
),
sa.ForeignKeyConstraint(
["tag_id"],
["tag.id"],
),
sa.PrimaryKeyConstraint("document_id", "tag_id"),
)
op.add_column(
"search_doc",
sa.Column(
"doc_metadata",
postgresql.JSONB(astext_type=sa.Text()),
nullable=True,
),
)
op.execute("UPDATE search_doc SET doc_metadata = '{}' WHERE doc_metadata IS NULL")
op.alter_column("search_doc", "doc_metadata", nullable=False)
def downgrade() -> None:
op.drop_table("document__tag")
op.drop_table("tag")
op.drop_column("search_doc", "doc_metadata")

View File

@@ -0,0 +1,36 @@
"""Remove DocumentSource from Tag
Revision ID: 91fd3b470d1a
Revises: 173cae5bba26
Create Date: 2024-03-21 12:05:23.956734
"""
from alembic import op
import sqlalchemy as sa
from danswer.configs.constants import DocumentSource
# revision identifiers, used by Alembic.
revision = "91fd3b470d1a"
down_revision = "173cae5bba26"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.alter_column(
"tag",
"source",
type_=sa.String(length=50),
existing_type=sa.Enum(DocumentSource, native_enum=False),
existing_nullable=False,
)
def downgrade() -> None:
op.alter_column(
"tag",
"source",
type_=sa.Enum(DocumentSource, native_enum=False),
existing_type=sa.String(length=50),
existing_nullable=False,
)

View File

@@ -12,8 +12,8 @@ from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "9d97fecfab7f"
down_revision = "ffc707a226b4"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -12,8 +12,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "a570b80a5f20"
down_revision = "904451035c9b"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -12,8 +12,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "ae62505e3acc"
down_revision = "7da543f5672f"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -11,8 +11,8 @@ from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "b082fec533f0"
down_revision = "df0c7ad8a076"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,520 @@
"""Chat Reworked
Revision ID: b156fa702355
Revises: baf71f781b9e
Create Date: 2023-12-12 00:57:41.823371
"""
import fastapi_users_db_sqlalchemy
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import ENUM
from danswer.configs.constants import DocumentSource
# revision identifiers, used by Alembic.
revision = "b156fa702355"
down_revision = "baf71f781b9e"
branch_labels: None = None
depends_on: None = None
searchtype_enum = ENUM(
"KEYWORD", "SEMANTIC", "HYBRID", name="searchtype", create_type=True
)
recencybiassetting_enum = ENUM(
"FAVOR_RECENT",
"BASE_DECAY",
"NO_DECAY",
"AUTO",
name="recencybiassetting",
create_type=True,
)
def upgrade() -> None:
bind = op.get_bind()
searchtype_enum.create(bind)
recencybiassetting_enum.create(bind)
# This feedback cannot be mapped onto the new schema, so it is dropped
op.execute("DELETE FROM chat_feedback")
op.execute("DELETE FROM document_retrieval_feedback")
op.create_table(
"search_doc",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("document_id", sa.String(), nullable=False),
sa.Column("chunk_ind", sa.Integer(), nullable=False),
sa.Column("semantic_id", sa.String(), nullable=False),
sa.Column("link", sa.String(), nullable=True),
sa.Column("blurb", sa.String(), nullable=False),
sa.Column("boost", sa.Integer(), nullable=False),
sa.Column(
"source_type",
sa.Enum(DocumentSource, native=False),
nullable=False,
),
sa.Column("hidden", sa.Boolean(), nullable=False),
sa.Column("score", sa.Float(), nullable=False),
sa.Column("match_highlights", postgresql.ARRAY(sa.String()), nullable=False),
sa.Column("updated_at", sa.DateTime(timezone=True), nullable=True),
sa.Column("primary_owners", postgresql.ARRAY(sa.String()), nullable=True),
sa.Column("secondary_owners", postgresql.ARRAY(sa.String()), nullable=True),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"prompt",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=True,
),
sa.Column("name", sa.String(), nullable=False),
sa.Column("description", sa.String(), nullable=False),
sa.Column("system_prompt", sa.Text(), nullable=False),
sa.Column("task_prompt", sa.Text(), nullable=False),
sa.Column("include_citations", sa.Boolean(), nullable=False),
sa.Column("datetime_aware", sa.Boolean(), nullable=False),
sa.Column("default_prompt", sa.Boolean(), nullable=False),
sa.Column("deleted", sa.Boolean(), nullable=False),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("id"),
)
op.create_table(
"persona__prompt",
sa.Column("persona_id", sa.Integer(), nullable=False),
sa.Column("prompt_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["persona_id"],
["persona.id"],
),
sa.ForeignKeyConstraint(
["prompt_id"],
["prompt.id"],
),
sa.PrimaryKeyConstraint("persona_id", "prompt_id"),
)
# Change persona first so chat_sessions can reference the right persona
# The empty persona will be overwritten on server startup
op.add_column(
"persona",
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=True,
),
)
op.add_column(
"persona",
sa.Column(
"search_type",
searchtype_enum,
nullable=True,
),
)
op.execute("UPDATE persona SET search_type = 'HYBRID'")
op.alter_column("persona", "search_type", nullable=False)
op.add_column(
"persona",
sa.Column("llm_relevance_filter", sa.Boolean(), nullable=True),
)
op.execute("UPDATE persona SET llm_relevance_filter = TRUE")
op.alter_column("persona", "llm_relevance_filter", nullable=False)
op.add_column(
"persona",
sa.Column("llm_filter_extraction", sa.Boolean(), nullable=True),
)
op.execute("UPDATE persona SET llm_filter_extraction = TRUE")
op.alter_column("persona", "llm_filter_extraction", nullable=False)
op.add_column(
"persona",
sa.Column(
"recency_bias",
recencybiassetting_enum,
nullable=True,
),
)
op.execute("UPDATE persona SET recency_bias = 'BASE_DECAY'")
op.alter_column("persona", "recency_bias", nullable=False)
op.alter_column("persona", "description", existing_type=sa.VARCHAR(), nullable=True)
op.execute("UPDATE persona SET description = ''")
op.alter_column("persona", "description", nullable=False)
op.create_foreign_key("persona__user_fk", "persona", "user", ["user_id"], ["id"])
op.drop_column("persona", "datetime_aware")
op.drop_column("persona", "tools")
op.drop_column("persona", "hint_text")
op.drop_column("persona", "apply_llm_relevance_filter")
op.drop_column("persona", "retrieval_enabled")
op.drop_column("persona", "system_text")
# Need to create a persona row so the foreign key can resolve
result = bind.execute(sa.text("SELECT 1 FROM persona WHERE id = 0"))
exists = result.fetchone()
if not exists:
op.execute(
sa.text(
"""
INSERT INTO persona (
id, user_id, name, description, search_type, num_chunks,
llm_relevance_filter, llm_filter_extraction, recency_bias,
llm_model_version_override, default_persona, deleted
) VALUES (
0, NULL, '', '', 'HYBRID', NULL,
TRUE, TRUE, 'BASE_DECAY', NULL, TRUE, FALSE
)
"""
)
)
delete_statement = sa.text(
"""
DELETE FROM persona
WHERE name = 'Danswer' AND default_persona = TRUE AND id != 0
"""
)
bind.execute(delete_statement)
op.add_column(
"chat_feedback",
sa.Column("chat_message_id", sa.Integer(), nullable=False),
)
op.drop_constraint(
"chat_feedback_chat_message_chat_session_id_chat_message_me_fkey",
"chat_feedback",
type_="foreignkey",
)
op.drop_column("chat_feedback", "chat_message_edit_number")
op.drop_column("chat_feedback", "chat_message_chat_session_id")
op.drop_column("chat_feedback", "chat_message_message_number")
op.add_column(
"chat_message",
sa.Column(
"id",
sa.Integer(),
primary_key=True,
autoincrement=True,
nullable=False,
unique=True,
),
)
op.add_column(
"chat_message",
sa.Column("parent_message", sa.Integer(), nullable=True),
)
op.add_column(
"chat_message",
sa.Column("latest_child_message", sa.Integer(), nullable=True),
)
op.add_column(
"chat_message", sa.Column("rephrased_query", sa.Text(), nullable=True)
)
op.add_column("chat_message", sa.Column("prompt_id", sa.Integer(), nullable=True))
op.add_column(
"chat_message",
sa.Column("citations", postgresql.JSONB(astext_type=sa.Text()), nullable=True),
)
op.add_column("chat_message", sa.Column("error", sa.Text(), nullable=True))
op.drop_constraint("fk_chat_message_persona_id", "chat_message", type_="foreignkey")
op.create_foreign_key(
"chat_message__prompt_fk", "chat_message", "prompt", ["prompt_id"], ["id"]
)
op.drop_column("chat_message", "parent_edit_number")
op.drop_column("chat_message", "persona_id")
op.drop_column("chat_message", "reference_docs")
op.drop_column("chat_message", "edit_number")
op.drop_column("chat_message", "latest")
op.drop_column("chat_message", "message_number")
op.add_column("chat_session", sa.Column("one_shot", sa.Boolean(), nullable=True))
op.execute("UPDATE chat_session SET one_shot = TRUE")
op.alter_column("chat_session", "one_shot", nullable=False)
op.alter_column(
"chat_session",
"persona_id",
existing_type=sa.INTEGER(),
nullable=True,
)
op.execute("UPDATE chat_session SET persona_id = 0")
op.alter_column("chat_session", "persona_id", nullable=False)
op.add_column(
"document_retrieval_feedback",
sa.Column("chat_message_id", sa.Integer(), nullable=False),
)
op.drop_constraint(
"document_retrieval_feedback_qa_event_id_fkey",
"document_retrieval_feedback",
type_="foreignkey",
)
op.create_foreign_key(
"document_retrieval_feedback__chat_message_fk",
"document_retrieval_feedback",
"chat_message",
["chat_message_id"],
["id"],
)
op.drop_column("document_retrieval_feedback", "qa_event_id")
# The relation table must be created after the other tables are in their final state
op.create_table(
"chat_message__search_doc",
sa.Column("chat_message_id", sa.Integer(), nullable=False),
sa.Column("search_doc_id", sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(
["chat_message_id"],
["chat_message.id"],
),
sa.ForeignKeyConstraint(
["search_doc_id"],
["search_doc.id"],
),
sa.PrimaryKeyConstraint("chat_message_id", "search_doc_id"),
)
# Needs to be created after chat_message id field is added
op.create_foreign_key(
"chat_feedback__chat_message_fk",
"chat_feedback",
"chat_message",
["chat_message_id"],
["id"],
)
op.drop_table("query_event")
def downgrade() -> None:
op.drop_constraint(
"chat_feedback__chat_message_fk", "chat_feedback", type_="foreignkey"
)
op.drop_constraint(
"document_retrieval_feedback__chat_message_fk",
"document_retrieval_feedback",
type_="foreignkey",
)
op.drop_constraint("persona__user_fk", "persona", type_="foreignkey")
op.drop_constraint("chat_message__prompt_fk", "chat_message", type_="foreignkey")
op.drop_constraint(
"chat_message__search_doc_chat_message_id_fkey",
"chat_message__search_doc",
type_="foreignkey",
)
op.add_column(
"persona",
sa.Column("system_text", sa.TEXT(), autoincrement=False, nullable=True),
)
op.add_column(
"persona",
sa.Column(
"retrieval_enabled",
sa.BOOLEAN(),
autoincrement=False,
nullable=True,
),
)
op.execute("UPDATE persona SET retrieval_enabled = TRUE")
op.alter_column("persona", "retrieval_enabled", nullable=False)
op.add_column(
"persona",
sa.Column(
"apply_llm_relevance_filter",
sa.BOOLEAN(),
autoincrement=False,
nullable=True,
),
)
op.add_column(
"persona",
sa.Column("hint_text", sa.TEXT(), autoincrement=False, nullable=True),
)
op.add_column(
"persona",
sa.Column(
"tools",
postgresql.JSONB(astext_type=sa.Text()),
autoincrement=False,
nullable=True,
),
)
op.add_column(
"persona",
sa.Column("datetime_aware", sa.BOOLEAN(), autoincrement=False, nullable=True),
)
op.execute("UPDATE persona SET datetime_aware = TRUE")
op.alter_column("persona", "datetime_aware", nullable=False)
op.alter_column("persona", "description", existing_type=sa.VARCHAR(), nullable=True)
op.drop_column("persona", "recency_bias")
op.drop_column("persona", "llm_filter_extraction")
op.drop_column("persona", "llm_relevance_filter")
op.drop_column("persona", "search_type")
op.drop_column("persona", "user_id")
op.add_column(
"document_retrieval_feedback",
sa.Column("qa_event_id", sa.INTEGER(), autoincrement=False, nullable=False),
)
op.drop_column("document_retrieval_feedback", "chat_message_id")
op.alter_column(
"chat_session", "persona_id", existing_type=sa.INTEGER(), nullable=True
)
op.drop_column("chat_session", "one_shot")
op.add_column(
"chat_message",
sa.Column(
"message_number",
sa.INTEGER(),
autoincrement=False,
nullable=False,
primary_key=True,
),
)
op.add_column(
"chat_message",
sa.Column("latest", sa.BOOLEAN(), autoincrement=False, nullable=False),
)
op.add_column(
"chat_message",
sa.Column(
"edit_number",
sa.INTEGER(),
autoincrement=False,
nullable=False,
primary_key=True,
),
)
op.add_column(
"chat_message",
sa.Column(
"reference_docs",
postgresql.JSONB(astext_type=sa.Text()),
autoincrement=False,
nullable=True,
),
)
op.add_column(
"chat_message",
sa.Column("persona_id", sa.INTEGER(), autoincrement=False, nullable=True),
)
op.add_column(
"chat_message",
sa.Column(
"parent_edit_number",
sa.INTEGER(),
autoincrement=False,
nullable=True,
),
)
op.create_foreign_key(
"fk_chat_message_persona_id",
"chat_message",
"persona",
["persona_id"],
["id"],
)
op.drop_column("chat_message", "error")
op.drop_column("chat_message", "citations")
op.drop_column("chat_message", "prompt_id")
op.drop_column("chat_message", "rephrased_query")
op.drop_column("chat_message", "latest_child_message")
op.drop_column("chat_message", "parent_message")
op.drop_column("chat_message", "id")
op.add_column(
"chat_feedback",
sa.Column(
"chat_message_message_number",
sa.INTEGER(),
autoincrement=False,
nullable=False,
),
)
op.add_column(
"chat_feedback",
sa.Column(
"chat_message_chat_session_id",
sa.INTEGER(),
autoincrement=False,
nullable=False,
primary_key=True,
),
)
op.add_column(
"chat_feedback",
sa.Column(
"chat_message_edit_number",
sa.INTEGER(),
autoincrement=False,
nullable=False,
),
)
op.drop_column("chat_feedback", "chat_message_id")
op.create_table(
"query_event",
sa.Column("id", sa.INTEGER(), autoincrement=True, nullable=False),
sa.Column("query", sa.VARCHAR(), autoincrement=False, nullable=False),
sa.Column(
"selected_search_flow",
sa.VARCHAR(),
autoincrement=False,
nullable=True,
),
sa.Column("llm_answer", sa.VARCHAR(), autoincrement=False, nullable=True),
sa.Column("feedback", sa.VARCHAR(), autoincrement=False, nullable=True),
sa.Column("user_id", sa.UUID(), autoincrement=False, nullable=True),
sa.Column(
"time_created",
postgresql.TIMESTAMP(timezone=True),
server_default=sa.text("now()"),
autoincrement=False,
nullable=False,
),
sa.Column(
"retrieved_document_ids",
postgresql.ARRAY(sa.VARCHAR()),
autoincrement=False,
nullable=True,
),
sa.Column("chat_session_id", sa.INTEGER(), autoincrement=False, nullable=True),
sa.ForeignKeyConstraint(
["chat_session_id"],
["chat_session.id"],
name="fk_query_event_chat_session_id",
),
sa.ForeignKeyConstraint(
["user_id"], ["user.id"], name="query_event_user_id_fkey"
),
sa.PrimaryKeyConstraint("id", name="query_event_pkey"),
)
op.drop_table("chat_message__search_doc")
op.drop_table("persona__prompt")
op.drop_table("prompt")
op.drop_table("search_doc")
op.create_unique_constraint(
"uq_chat_message_combination",
"chat_message",
["chat_session_id", "message_number", "edit_number"],
)
op.create_foreign_key(
"chat_feedback_chat_message_chat_session_id_chat_message_me_fkey",
"chat_feedback",
"chat_message",
[
"chat_message_chat_session_id",
"chat_message_message_number",
"chat_message_edit_number",
],
["chat_session_id", "message_number", "edit_number"],
)
op.create_foreign_key(
"document_retrieval_feedback_qa_event_id_fkey",
"document_retrieval_feedback",
"query_event",
["qa_event_id"],
["id"],
)
op.execute("DROP TYPE IF EXISTS searchtype")
op.execute("DROP TYPE IF EXISTS recencybiassetting")
op.execute("DROP TYPE IF EXISTS documentsource")

View File

@@ -0,0 +1,26 @@
"""Add llm_model_version_override to Persona
Revision ID: baf71f781b9e
Revises: 50b683a8295c
Create Date: 2023-12-06 21:56:50.286158
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "baf71f781b9e"
down_revision = "50b683a8295c"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"persona",
sa.Column("llm_model_version_override", sa.String(), nullable=True),
)
def downgrade() -> None:
op.drop_column("persona", "llm_model_version_override")

View File

@@ -12,8 +12,8 @@ from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "d5645c915d0e"
down_revision = "8e26726b7683"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -11,8 +11,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "d61e513bef0a"
down_revision = "46625e4745d4"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -12,8 +12,8 @@ from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "d7111c1238cd"
down_revision = "465f78d9b7f9"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -13,8 +13,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "d929f0c1c6af"
down_revision = "8aabb57f3b49"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -12,8 +12,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "dba7f71618f5"
down_revision = "d5645c915d0e"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,139 @@
"""Embedding Models
Revision ID: dbaa756c2ccf
Revises: 7f726bad5367
Create Date: 2024-01-25 17:12:31.813160
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy import table, column, String, Integer, Boolean
from danswer.db.embedding_model import (
get_new_default_embedding_model,
get_old_default_embedding_model,
user_has_overridden_embedding_model,
)
from danswer.db.models import IndexModelStatus
# revision identifiers, used by Alembic.
revision = "dbaa756c2ccf"
down_revision = "7f726bad5367"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"embedding_model",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("model_name", sa.String(), nullable=False),
sa.Column("model_dim", sa.Integer(), nullable=False),
sa.Column("normalize", sa.Boolean(), nullable=False),
sa.Column("query_prefix", sa.String(), nullable=False),
sa.Column("passage_prefix", sa.String(), nullable=False),
sa.Column("index_name", sa.String(), nullable=False),
sa.Column(
"status",
sa.Enum(IndexModelStatus, native=False),
nullable=False,
),
sa.PrimaryKeyConstraint("id"),
)
# since all index attempts must be associated with an embedding model,
# a placeholder row is needed here to avoid nulls. On server startup,
# this value will be overridden
EmbeddingModel = table(
"embedding_model",
column("id", Integer),
column("model_name", String),
column("model_dim", Integer),
column("normalize", Boolean),
column("query_prefix", String),
column("passage_prefix", String),
column("index_name", String),
column(
"status", sa.Enum(IndexModelStatus, name="indexmodelstatus", native=False)
),
)
# insert an embedding model row that corresponds to the embedding model
# the user selected via env variables before this change. This is needed since
# all index_attempts must be associated with an embedding model, so without this
# we will run into violations of non-null constraints
old_embedding_model = get_old_default_embedding_model()
op.bulk_insert(
EmbeddingModel,
[
{
"model_name": old_embedding_model.model_name,
"model_dim": old_embedding_model.model_dim,
"normalize": old_embedding_model.normalize,
"query_prefix": old_embedding_model.query_prefix,
"passage_prefix": old_embedding_model.passage_prefix,
"index_name": old_embedding_model.index_name,
"status": old_embedding_model.status,
}
],
)
# if the user has not overridden the default embedding model via env variables,
# insert the new default model into the database to auto-upgrade them
if not user_has_overridden_embedding_model():
new_embedding_model = get_new_default_embedding_model(is_present=False)
op.bulk_insert(
EmbeddingModel,
[
{
"model_name": new_embedding_model.model_name,
"model_dim": new_embedding_model.model_dim,
"normalize": new_embedding_model.normalize,
"query_prefix": new_embedding_model.query_prefix,
"passage_prefix": new_embedding_model.passage_prefix,
"index_name": new_embedding_model.index_name,
"status": IndexModelStatus.FUTURE,
}
],
)
op.add_column(
"index_attempt",
sa.Column("embedding_model_id", sa.Integer(), nullable=True),
)
op.execute(
"UPDATE index_attempt SET embedding_model_id=1 WHERE embedding_model_id IS NULL"
)
op.alter_column(
"index_attempt",
"embedding_model_id",
existing_type=sa.Integer(),
nullable=False,
)
op.create_foreign_key(
"index_attempt__embedding_model_fk",
"index_attempt",
"embedding_model",
["embedding_model_id"],
["id"],
)
op.create_index(
"ix_embedding_model_present_unique",
"embedding_model",
["status"],
unique=True,
postgresql_where=sa.text("status = 'PRESENT'"),
)
op.create_index(
"ix_embedding_model_future_unique",
"embedding_model",
["status"],
unique=True,
postgresql_where=sa.text("status = 'FUTURE'"),
)
def downgrade() -> None:
op.drop_constraint(
"index_attempt__embedding_model_fk", "index_attempt", type_="foreignkey"
)
op.drop_column("index_attempt", "embedding_model_id")
op.drop_table("embedding_model")
op.execute("DROP TYPE indexmodelstatus;")

View File

@@ -12,8 +12,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "df0c7ad8a076"
down_revision = "d7111c1238cd"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -11,8 +11,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "e0a68a81d434"
down_revision = "ae62505e3acc"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,38 @@
"""No Source Enum
Revision ID: e50154680a5c
Revises: fcd135795f21
Create Date: 2024-03-14 18:06:08.523106
"""
from alembic import op
import sqlalchemy as sa
from danswer.configs.constants import DocumentSource
# revision identifiers, used by Alembic.
revision = "e50154680a5c"
down_revision = "fcd135795f21"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.alter_column(
"search_doc",
"source_type",
type_=sa.String(length=50),
existing_type=sa.Enum(DocumentSource, native_enum=False),
existing_nullable=False,
)
op.execute("DROP TYPE IF EXISTS documentsource")
def downgrade() -> None:
op.alter_column(
"search_doc",
"source_type",
type_=sa.Enum(DocumentSource, native_enum=False),
existing_type=sa.String(length=50),
existing_nullable=False,
)

View File

@@ -11,8 +11,8 @@ from alembic import op
# revision identifiers, used by Alembic.
revision = "e6a4bbc13fe4"
down_revision = "b082fec533f0"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,27 @@
"""Add persona to chat_session
Revision ID: e86866a9c78a
Revises: 80696cf850ae
Create Date: 2023-11-26 02:51:47.657357
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "e86866a9c78a"
down_revision = "80696cf850ae"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column("chat_session", sa.Column("persona_id", sa.Integer(), nullable=True))
op.create_foreign_key(
"fk_chat_session_persona_id", "chat_session", "persona", ["persona_id"], ["id"]
)
def downgrade() -> None:
op.drop_constraint("fk_chat_session_persona_id", "chat_session", type_="foreignkey")
op.drop_column("chat_session", "persona_id")

View File

@@ -0,0 +1,118 @@
"""Private Personas DocumentSets
Revision ID: e91df4e935ef
Revises: 91fd3b470d1a
Create Date: 2024-03-17 11:47:24.675881
"""
import fastapi_users_db_sqlalchemy
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "e91df4e935ef"
down_revision = "91fd3b470d1a"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.create_table(
"document_set__user",
sa.Column("document_set_id", sa.Integer(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=False,
),
sa.ForeignKeyConstraint(
["document_set_id"],
["document_set.id"],
),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("document_set_id", "user_id"),
)
op.create_table(
"persona__user",
sa.Column("persona_id", sa.Integer(), nullable=False),
sa.Column(
"user_id",
fastapi_users_db_sqlalchemy.generics.GUID(),
nullable=False,
),
sa.ForeignKeyConstraint(
["persona_id"],
["persona.id"],
),
sa.ForeignKeyConstraint(
["user_id"],
["user.id"],
),
sa.PrimaryKeyConstraint("persona_id", "user_id"),
)
op.create_table(
"document_set__user_group",
sa.Column("document_set_id", sa.Integer(), nullable=False),
sa.Column(
"user_group_id",
sa.Integer(),
nullable=False,
),
sa.ForeignKeyConstraint(
["document_set_id"],
["document_set.id"],
),
sa.ForeignKeyConstraint(
["user_group_id"],
["user_group.id"],
),
sa.PrimaryKeyConstraint("document_set_id", "user_group_id"),
)
op.create_table(
"persona__user_group",
sa.Column("persona_id", sa.Integer(), nullable=False),
sa.Column(
"user_group_id",
sa.Integer(),
nullable=False,
),
sa.ForeignKeyConstraint(
["persona_id"],
["persona.id"],
),
sa.ForeignKeyConstraint(
["user_group_id"],
["user_group.id"],
),
sa.PrimaryKeyConstraint("persona_id", "user_group_id"),
)
op.add_column(
"document_set",
sa.Column("is_public", sa.Boolean(), nullable=True),
)
# fill in is_public for existing rows
op.execute("UPDATE document_set SET is_public = true WHERE is_public IS NULL")
op.alter_column("document_set", "is_public", nullable=False)
op.add_column(
"persona",
sa.Column("is_public", sa.Boolean(), nullable=True),
)
# fill in is_public for existing rows
op.execute("UPDATE persona SET is_public = true WHERE is_public IS NULL")
op.alter_column("persona", "is_public", nullable=False)
def downgrade() -> None:
op.drop_column("persona", "is_public")
op.drop_column("document_set", "is_public")
op.drop_table("persona__user")
op.drop_table("document_set__user")
op.drop_table("persona__user_group")
op.drop_table("document_set__user_group")

View File

@@ -0,0 +1,27 @@
"""Index From Beginning
Revision ID: ec3ec2eabf7b
Revises: dbaa756c2ccf
Create Date: 2024-02-06 22:03:28.098158
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "ec3ec2eabf7b"
down_revision = "dbaa756c2ccf"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"index_attempt", sa.Column("from_beginning", sa.Boolean(), nullable=True)
)
op.execute("UPDATE index_attempt SET from_beginning = False")
op.alter_column("index_attempt", "from_beginning", nullable=False)
def downgrade() -> None:
op.drop_column("index_attempt", "from_beginning")

View File

@@ -0,0 +1,40 @@
"""Add overrides to the chat session
Revision ID: ecab2b3f1a3b
Revises: 38eda64af7fe
Create Date: 2024-04-01 19:08:21.359102
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "ecab2b3f1a3b"
down_revision = "38eda64af7fe"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"chat_session",
sa.Column(
"llm_override",
postgresql.JSONB(astext_type=sa.Text()),
nullable=True,
),
)
op.add_column(
"chat_session",
sa.Column(
"prompt_override",
postgresql.JSONB(astext_type=sa.Text()),
nullable=True,
),
)
def downgrade() -> None:
op.drop_column("chat_session", "prompt_override")
op.drop_column("chat_session", "llm_override")

View File

@@ -0,0 +1,27 @@
"""Add files to ChatMessage
Revision ID: ef7da92f7213
Revises: 401c1ac29467
Create Date: 2024-04-28 16:59:33.199153
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "ef7da92f7213"
down_revision = "401c1ac29467"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"chat_message",
sa.Column("files", postgresql.JSONB(astext_type=sa.Text()), nullable=True),
)
def downgrade() -> None:
op.drop_column("chat_message", "files")

View File

@@ -0,0 +1,25 @@
"""Add pre-defined feedback
Revision ID: f1c6478c3fd8
Revises: 643a84a42a33
Create Date: 2024-05-09 18:11:49.210667
"""
from alembic import op
import sqlalchemy as sa
revision = "f1c6478c3fd8"
down_revision = "643a84a42a33"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"chat_feedback",
sa.Column("predefined_feedback", sa.String(), nullable=True),
)
def downgrade() -> None:
op.drop_column("chat_feedback", "predefined_feedback")

View File

@@ -0,0 +1,39 @@
"""Delete Tags with wrong Enum
Revision ID: fad14119fb92
Revises: 72bdc9929a46
Create Date: 2024-04-25 17:05:09.695703
"""
from alembic import op
# revision identifiers, used by Alembic.
revision = "fad14119fb92"
down_revision = "72bdc9929a46"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
# Some documents may lose their tags, but this is unavoidable: the enum
# mapping may have changed before the tag source column switched to a string
# (the affected documents will be reindexed anyway)
op.execute(
"""
DELETE FROM document__tag
WHERE tag_id IN (
SELECT id FROM tag
WHERE source ~ '^[0-9]+$'
)
"""
)
op.execute(
"""
DELETE FROM tag
WHERE source ~ '^[0-9]+$'
"""
)
def downgrade() -> None:
pass

View File

@@ -0,0 +1,39 @@
"""Add slack bot display type
Revision ID: fcd135795f21
Revises: 0a2b51deb0b8
Create Date: 2024-03-04 17:03:27.116284
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "fcd135795f21"
down_revision = "0a2b51deb0b8"
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:
op.add_column(
"slack_bot_config",
sa.Column(
"response_type",
sa.Enum(
"QUOTES",
"CITATIONS",
name="slackbotresponsetype",
native_enum=False,
),
nullable=True,
),
)
op.execute(
"UPDATE slack_bot_config SET response_type = 'QUOTES' WHERE response_type IS NULL"
)
op.alter_column("slack_bot_config", "response_type", nullable=False)
def downgrade() -> None:
op.drop_column("slack_bot_config", "response_type")

View File

@@ -12,8 +12,8 @@ import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = "febe9eaa0644"
down_revision = "57b53544726e"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -12,8 +12,8 @@ from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision = "ffc707a226b4"
down_revision = "30c1d5744104"
branch_labels = None
depends_on = None
branch_labels: None = None
depends_on: None = None
def upgrade() -> None:

View File

@@ -0,0 +1,3 @@
import os
__version__ = os.environ.get("DANSWER_VERSION", "") or "0.3-dev"

View File

@@ -4,7 +4,7 @@ from danswer.access.models import DocumentAccess
from danswer.configs.constants import PUBLIC_DOC_PAT
from danswer.db.document import get_acccess_info_for_documents
from danswer.db.models import User
from danswer.server.models import ConnectorCredentialPairIdentifier
from danswer.server.documents.models import ConnectorCredentialPairIdentifier
from danswer.utils.variable_functionality import fetch_versioned_implementation

View File

@@ -23,24 +23,28 @@ from fastapi_users.authentication import CookieTransport
from fastapi_users.authentication import Strategy
from fastapi_users.authentication.strategy.db import AccessTokenDatabase
from fastapi_users.authentication.strategy.db import DatabaseStrategy
from fastapi_users.db import SQLAlchemyUserDatabase
from fastapi_users.openapi import OpenAPIResponseType
from fastapi_users_db_sqlalchemy import SQLAlchemyUserDatabase
from sqlalchemy.orm import Session
from danswer.auth.schemas import UserCreate
from danswer.auth.schemas import UserRole
from danswer.configs.app_configs import AUTH_TYPE
from danswer.configs.app_configs import DISABLE_AUTH
from danswer.configs.app_configs import EMAIL_FROM
from danswer.configs.app_configs import REQUIRE_EMAIL_VERIFICATION
from danswer.configs.app_configs import SECRET
from danswer.configs.app_configs import SESSION_EXPIRE_TIME_SECONDS
from danswer.configs.app_configs import SMTP_PASS
from danswer.configs.app_configs import SMTP_PORT
from danswer.configs.app_configs import SMTP_SERVER
from danswer.configs.app_configs import SMTP_USER
from danswer.configs.app_configs import USER_AUTH_SECRET
from danswer.configs.app_configs import VALID_EMAIL_DOMAINS
from danswer.configs.app_configs import WEB_DOMAIN
from danswer.configs.constants import AuthType
from danswer.configs.constants import DANSWER_API_KEY_DUMMY_EMAIL_DOMAIN
from danswer.configs.constants import DANSWER_API_KEY_PREFIX
from danswer.configs.constants import UNNAMED_KEY_PLACEHOLDER
from danswer.db.auth import get_access_token_db
from danswer.db.auth import get_user_count
from danswer.db.auth import get_user_db
@@ -48,6 +52,8 @@ from danswer.db.engine import get_session
from danswer.db.models import AccessToken
from danswer.db.models import User
from danswer.utils.logger import setup_logger
from danswer.utils.telemetry import optional_telemetry
from danswer.utils.telemetry import RecordType
from danswer.utils.variable_functionality import fetch_versioned_implementation
@@ -66,6 +72,26 @@ def verify_auth_setting() -> None:
logger.info(f"Using Auth Type: {AUTH_TYPE.value}")
def get_display_email(email: str | None, space_less: bool = False) -> str:
if email and email.endswith(DANSWER_API_KEY_DUMMY_EMAIL_DOMAIN):
name = email.split("@")[0]
if name == DANSWER_API_KEY_PREFIX + UNNAMED_KEY_PLACEHOLDER:
return "Unnamed API Key"
if space_less:
return name
return name.replace("API_KEY__", "API Key: ")
return email or ""
def user_needs_to_be_verified() -> bool:
# all other auth types besides basic should require users to be
# verified
return AUTH_TYPE != AuthType.BASIC or REQUIRE_EMAIL_VERIFICATION
def get_user_whitelist() -> list[str]:
global _user_whitelist
if _user_whitelist is None:
@@ -99,13 +125,18 @@ def verify_email_domain(email: str) -> None:
)
def send_user_verification_email(user_email: str, token: str) -> None:
def send_user_verification_email(
user_email: str,
token: str,
mail_from: str = EMAIL_FROM,
) -> None:
msg = MIMEMultipart()
msg["Subject"] = "Danswer Email Verification"
msg["From"] = "no-reply@danswer.dev"
msg["To"] = user_email
if mail_from:
msg["From"] = mail_from
link = f"{WEB_DOMAIN}/verify-email?token={token}"
link = f"{WEB_DOMAIN}/auth/verify-email?token={token}"
body = MIMEText(f"Click the following link to verify your email address: {link}")
msg.attach(body)
@@ -119,8 +150,8 @@ def send_user_verification_email(user_email: str, token: str) -> None:
class UserManager(UUIDIDMixin, BaseUserManager[User, uuid.UUID]):
reset_password_token_secret = SECRET
verification_token_secret = SECRET
reset_password_token_secret = USER_AUTH_SECRET
verification_token_secret = USER_AUTH_SECRET
async def create(
self,
@@ -170,6 +201,11 @@ class UserManager(UUIDIDMixin, BaseUserManager[User, uuid.UUID]):
self, user: User, request: Optional[Request] = None
) -> None:
logger.info(f"User {user.id} has registered.")
optional_telemetry(
record_type=RecordType.SIGN_UP,
data={"action": "create"},
user_id=str(user.id),
)
async def on_after_forgot_password(
self, user: User, token: str, request: Optional[Request] = None
@@ -194,7 +230,10 @@ async def get_user_manager(
yield UserManager(user_db)
cookie_transport = CookieTransport(cookie_max_age=SESSION_EXPIRE_TIME_SECONDS)
cookie_transport = CookieTransport(
cookie_max_age=SESSION_EXPIRE_TIME_SECONDS,
cookie_secure=WEB_DOMAIN.startswith("https"),
)
def get_database_strategy(
@@ -253,15 +292,36 @@ fastapi_users = FastAPIUserWithLogoutRouter[User, uuid.UUID](
)
optional_valid_user = fastapi_users.current_user(
active=True, verified=REQUIRE_EMAIL_VERIFICATION, optional=True
)
# NOTE: verified=REQUIRE_EMAIL_VERIFICATION is not used here since we
# take care of that in `double_check_user` ourself. This is needed, since
# we want the /me endpoint to still return a user even if they are not
# yet verified, so that the frontend knows they exist
optional_fastapi_current_user = fastapi_users.current_user(active=True, optional=True)
async def double_check_user(
async def optional_user_(
request: Request,
user: User | None,
db_session: Session,
) -> User | None:
"""NOTE: `request` and `db_session` are not used here, but are included
for the EE version of this function."""
return user
async def optional_user(
request: Request,
user: User | None = Depends(optional_fastapi_current_user),
db_session: Session = Depends(get_session),
) -> User | None:
versioned_fetch_user = fetch_versioned_implementation(
"danswer.auth.users", "optional_user_"
)
return await versioned_fetch_user(request, user, db_session)
async def double_check_user(
user: User | None,
optional: bool = DISABLE_AUTH,
) -> User | None:
if optional:
@@ -273,19 +333,19 @@ async def double_check_user(
detail="Access denied. User is not authenticated.",
)
if user_needs_to_be_verified() and not user.is_verified:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Access denied. User is not verified.",
)
return user
async def current_user(
request: Request,
user: User | None = Depends(optional_valid_user),
db_session: Session = Depends(get_session),
user: User | None = Depends(optional_user),
) -> User | None:
double_check_user = fetch_versioned_implementation(
"danswer.auth.users", "double_check_user"
)
user = await double_check_user(request, user, db_session)
return user
return await double_check_user(user)
async def current_admin_user(user: User | None = Depends(current_user)) -> User | None:

View File

@@ -1,6 +1,4 @@
import os
from datetime import timedelta
from pathlib import Path
from typing import cast
from celery import Celery # type: ignore
@@ -10,16 +8,14 @@ from danswer.background.connector_deletion import delete_connector_credential_pa
from danswer.background.task_utils import build_celery_task_wrapper
from danswer.background.task_utils import name_cc_cleanup_task
from danswer.background.task_utils import name_document_set_sync_task
from danswer.configs.app_configs import FILE_CONNECTOR_TMP_STORAGE_PATH
from danswer.configs.app_configs import JOB_TIMEOUT
from danswer.connectors.file.utils import file_age_in_hours
from danswer.db.connector_credential_pair import get_connector_credential_pair
from danswer.db.deletion_attempt import check_deletion_attempt_is_allowed
from danswer.db.document import prepare_to_modify_documents
from danswer.db.document_set import delete_document_set
from danswer.db.document_set import fetch_document_sets
from danswer.db.document_set import fetch_document_sets_for_documents
from danswer.db.document_set import fetch_documents_for_document_set
from danswer.db.document_set import fetch_documents_for_document_set_paginated
from danswer.db.document_set import get_document_set_by_id
from danswer.db.document_set import mark_document_set_as_synced
from danswer.db.engine import build_connection_string
@@ -28,20 +24,20 @@ from danswer.db.engine import SYNC_DB_API
from danswer.db.models import DocumentSet
from danswer.db.tasks import check_live_task_not_timed_out
from danswer.db.tasks import get_latest_task
from danswer.document_index.document_index_utils import get_both_index_names
from danswer.document_index.factory import get_default_document_index
from danswer.document_index.interfaces import DocumentIndex
from danswer.document_index.interfaces import UpdateRequest
from danswer.utils.batching import batch_generator
from danswer.utils.logger import setup_logger
logger = setup_logger()
celery_broker_url = "sqla+" + build_connection_string(db_api=SYNC_DB_API)
celery_backend_url = "db+" + build_connection_string(db_api=SYNC_DB_API)
connection_string = build_connection_string(db_api=SYNC_DB_API)
celery_broker_url = f"sqla+{connection_string}"
celery_backend_url = f"db+{connection_string}"
celery_app = Celery(__name__, broker=celery_broker_url, backend=celery_backend_url)
_SYNC_BATCH_SIZE = 1000
_SYNC_BATCH_SIZE = 100
#####
@@ -66,20 +62,25 @@ def cleanup_connector_credential_pair_task(
connector_id=connector_id,
credential_id=credential_id,
)
if not cc_pair or not check_deletion_attempt_is_allowed(
connector_credential_pair=cc_pair
):
if not cc_pair:
raise ValueError(
"Cannot run deletion attempt - connector_credential_pair is not deletable. "
"This is likely because there is an ongoing / planned indexing attempt OR the "
"connector is not disabled."
f"Cannot run deletion attempt - connector_credential_pair with Connector ID: "
f"{connector_id} and Credential ID: {credential_id} does not exist."
)
deletion_attempt_disallowed_reason = check_deletion_attempt_is_allowed(cc_pair)
if deletion_attempt_disallowed_reason:
raise ValueError(deletion_attempt_disallowed_reason)
try:
# The bulk of the work is in here, updates Postgres and Vespa
curr_ind_name, sec_ind_name = get_both_index_names(db_session)
document_index = get_default_document_index(
primary_index_name=curr_ind_name, secondary_index_name=sec_ind_name
)
return delete_connector_credential_pair(
db_session=db_session,
document_index=get_default_document_index(),
document_index=document_index,
cc_pair=cc_pair,
)
except Exception as e:
@@ -93,17 +94,13 @@ def sync_document_set_task(document_set_id: int) -> None:
"""For document sets marked as not up to date, sync the state from postgres
into the datastore. Also handles deletions."""
def _sync_document_batch(
document_ids: list[str], document_index: DocumentIndex
) -> None:
def _sync_document_batch(document_ids: list[str], db_session: Session) -> None:
logger.debug(f"Syncing document sets for: {document_ids}")
# begin a transaction, release lock at the end
with Session(get_sqlalchemy_engine()) as db_session:
# acquires a lock on the documents so that no other process can modify them
prepare_to_modify_documents(
db_session=db_session, document_ids=document_ids
)
# Acquires a lock on the documents so that no other process can modify them
with prepare_to_modify_documents(
db_session=db_session, document_ids=document_ids
):
# get current state of document sets for these documents
document_set_map = {
document_id: document_sets
@@ -113,31 +110,36 @@ def sync_document_set_task(document_set_id: int) -> None:
}
# update Vespa
document_index.update(
update_requests=[
UpdateRequest(
document_ids=[document_id],
document_sets=set(document_set_map.get(document_id, [])),
)
for document_id in document_ids
]
curr_ind_name, sec_ind_name = get_both_index_names(db_session)
document_index = get_default_document_index(
primary_index_name=curr_ind_name, secondary_index_name=sec_ind_name
)
update_requests = [
UpdateRequest(
document_ids=[document_id],
document_sets=set(document_set_map.get(document_id, [])),
)
for document_id in document_ids
]
document_index.update(update_requests=update_requests)
with Session(get_sqlalchemy_engine()) as db_session:
try:
document_index = get_default_document_index()
documents_to_update = fetch_documents_for_document_set(
document_set_id=document_set_id,
db_session=db_session,
current_only=False,
)
for document_batch in batch_generator(
documents_to_update, _SYNC_BATCH_SIZE
):
cursor = None
while True:
document_batch, cursor = fetch_documents_for_document_set_paginated(
document_set_id=document_set_id,
db_session=db_session,
current_only=False,
last_document_id=cursor,
limit=_SYNC_BATCH_SIZE,
)
_sync_document_batch(
document_ids=[document.id for document in document_batch],
document_index=document_index,
db_session=db_session,
)
if cursor is None:
break
# if there are no connectors, then delete the document set. Otherwise, just
# mark it as successfully synced.
@@ -178,7 +180,7 @@ def check_for_document_sets_sync_task() -> None:
with Session(get_sqlalchemy_engine()) as db_session:
# check if any document sets are not synced
document_set_info = fetch_document_sets(
db_session=db_session, include_outdated=True
user_id=None, db_session=db_session, include_outdated=True
)
for document_set, _ in document_set_info:
if not document_set.is_up_to_date:
@@ -199,19 +201,6 @@ def check_for_document_sets_sync_task() -> None:
)
@celery_app.task(name="clean_old_temp_files_task", soft_time_limit=JOB_TIMEOUT)
def clean_old_temp_files_task(
age_threshold_in_hours: float | int = 24 * 7, # 1 week,
base_path: Path | str = FILE_CONNECTOR_TMP_STORAGE_PATH,
) -> None:
"""Files added via the File connector need to be deleted after ingestion
Currently handled async of the indexing job"""
os.makedirs(base_path, exist_ok=True)
for file in os.listdir(base_path):
if file_age_in_hours(file) > age_threshold_in_hours:
os.remove(Path(base_path) / file)
#####
# Celery Beat (Periodic Tasks) Settings
#####
@@ -220,8 +209,4 @@ celery_app.conf.beat_schedule = {
"task": "check_for_document_sets_sync_task",
"schedule": timedelta(seconds=5),
},
"clean-old-temp-files": {
"task": "clean_old_temp_files_task",
"schedule": timedelta(minutes=30),
},
}
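The sync loop above replaces a single unbounded fetch with cursor-based (keyset) pagination: each query resumes after the last document id seen, and a None cursor signals completion. The general shape, as a minimal sketch against a hypothetical document table:

import sqlalchemy as sa
from sqlalchemy.orm import Session

def fetch_page(
    db_session: Session, last_id: str | None, limit: int = 100
) -> tuple[list[str], str | None]:
    if last_id is None:
        stmt = sa.text("SELECT id FROM document ORDER BY id LIMIT :limit")
        params: dict = {"limit": limit}
    else:
        stmt = sa.text(
            "SELECT id FROM document WHERE id > :last_id ORDER BY id LIMIT :limit"
        )
        params = {"last_id": last_id, "limit": limit}
    ids = list(db_session.execute(stmt, params).scalars())
    # a full page means there may be more; otherwise stop
    return ids, (ids[-1] if len(ids) == limit else None)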

View File

@@ -2,7 +2,7 @@ from sqlalchemy.orm import Session
from danswer.background.task_utils import name_cc_cleanup_task
from danswer.db.tasks import get_latest_task
from danswer.server.models import DeletionAttemptSnapshot
from danswer.server.documents.models import DeletionAttemptSnapshot
def get_deletion_status(

View File

@@ -11,8 +11,6 @@ connector / credential pair from the access list
(6) delete all relevant entries from postgres
"""
import time
from collections.abc import Callable
from typing import cast
from sqlalchemy.orm import Session
@@ -21,8 +19,8 @@ from danswer.db.connector import fetch_connector_by_id
from danswer.db.connector_credential_pair import (
delete_connector_credential_pair__no_commit,
)
from danswer.db.document import delete_document_by_connector_credential_pair
from danswer.db.document import delete_documents_complete
from danswer.db.document import delete_document_by_connector_credential_pair__no_commit
from danswer.db.document import delete_documents_complete__no_commit
from danswer.db.document import get_document_connector_cnts
from danswer.db.document import get_documents_for_connector_credential_pair
from danswer.db.document import prepare_to_modify_documents
@@ -35,9 +33,8 @@ from danswer.db.index_attempt import delete_index_attempts
from danswer.db.models import ConnectorCredentialPair
from danswer.document_index.interfaces import DocumentIndex
from danswer.document_index.interfaces import UpdateRequest
from danswer.server.models import ConnectorCredentialPairIdentifier
from danswer.server.documents.models import ConnectorCredentialPairIdentifier
from danswer.utils.logger import setup_logger
from danswer.utils.variable_functionality import fetch_versioned_implementation
logger = setup_logger()
@@ -50,56 +47,65 @@ def _delete_connector_credential_pair_batch(
credential_id: int,
document_index: DocumentIndex,
) -> None:
"""
Removes a batch of document IDs from a cc-pair. If no other cc-pair still
references a document, it is permanently deleted.
"""
with Session(get_sqlalchemy_engine()) as db_session:
# acquire lock for all documents in this batch so that indexing can't
# override the deletion
prepare_to_modify_documents(db_session=db_session, document_ids=document_ids)
document_connector_cnts = get_document_connector_cnts(
with prepare_to_modify_documents(
db_session=db_session, document_ids=document_ids
)
# figure out which docs need to be completely deleted
document_ids_to_delete = [
document_id for document_id, cnt in document_connector_cnts if cnt == 1
]
logger.debug(f"Deleting documents: {document_ids_to_delete}")
document_index.delete(doc_ids=document_ids_to_delete)
delete_documents_complete(
db_session=db_session,
document_ids=document_ids_to_delete,
)
# figure out which docs need to be updated
document_ids_to_update = [
document_id for document_id, cnt in document_connector_cnts if cnt > 1
]
access_for_documents = get_access_for_documents(
document_ids=document_ids_to_update,
db_session=db_session,
cc_pair_to_delete=ConnectorCredentialPairIdentifier(
connector_id=connector_id,
credential_id=credential_id,
),
)
update_requests = [
UpdateRequest(
document_ids=[document_id],
access=access,
):
document_connector_cnts = get_document_connector_cnts(
db_session=db_session, document_ids=document_ids
)
for document_id, access in access_for_documents.items()
]
logger.debug(f"Updating documents: {document_ids_to_update}")
document_index.update(update_requests=update_requests)
delete_document_by_connector_credential_pair(
db_session=db_session,
document_ids=document_ids_to_update,
connector_credential_pair_identifier=ConnectorCredentialPairIdentifier(
connector_id=connector_id,
credential_id=credential_id,
),
)
db_session.commit()
# figure out which docs need to be completely deleted
document_ids_to_delete = [
document_id for document_id, cnt in document_connector_cnts if cnt == 1
]
logger.debug(f"Deleting documents: {document_ids_to_delete}")
document_index.delete(doc_ids=document_ids_to_delete)
delete_documents_complete__no_commit(
db_session=db_session,
document_ids=document_ids_to_delete,
)
# figure out which docs need to be updated
document_ids_to_update = [
document_id for document_id, cnt in document_connector_cnts if cnt > 1
]
access_for_documents = get_access_for_documents(
document_ids=document_ids_to_update,
db_session=db_session,
cc_pair_to_delete=ConnectorCredentialPairIdentifier(
connector_id=connector_id,
credential_id=credential_id,
),
)
update_requests = [
UpdateRequest(
document_ids=[document_id],
access=access,
)
for document_id, access in access_for_documents.items()
]
logger.debug(f"Updating documents: {document_ids_to_update}")
document_index.update(update_requests=update_requests)
delete_document_by_connector_credential_pair__no_commit(
db_session=db_session,
document_ids=document_ids_to_update,
connector_credential_pair_identifier=ConnectorCredentialPairIdentifier(
connector_id=connector_id,
credential_id=credential_id,
),
)
db_session.commit()
def cleanup_synced_entities(
@@ -173,14 +179,8 @@ def delete_connector_credential_pair(
# Clean up document sets / access information from Postgres
# and sync these updates to Vespa
cleanup_synced_entities__versioned = cast(
Callable[[ConnectorCredentialPair, Session], None],
fetch_versioned_implementation(
"danswer.background.connector_deletion",
"cleanup_synced_entities",
),
)
cleanup_synced_entities__versioned(cc_pair, db_session)
# TODO: add user group cleanup with `fetch_versioned_implementation`
cleanup_synced_entities(cc_pair, db_session)
# clean up the rest of the related Postgres entities
delete_index_attempts(

View File

@@ -0,0 +1,80 @@
"""Experimental functionality related to splitting up indexing
into a series of checkpoints to better handle intermittent failures
/ jobs being killed by cloud providers."""
import datetime
from danswer.configs.app_configs import EXPERIMENTAL_CHECKPOINTING_ENABLED
from danswer.configs.constants import DocumentSource
from danswer.connectors.cross_connector_utils.miscellaneous_utils import datetime_to_utc
def _2010_dt() -> datetime.datetime:
return datetime.datetime(year=2010, month=1, day=1, tzinfo=datetime.timezone.utc)
def _2020_dt() -> datetime.datetime:
return datetime.datetime(year=2020, month=1, day=1, tzinfo=datetime.timezone.utc)
def _default_end_time(
last_successful_run: datetime.datetime | None,
) -> datetime.datetime:
"""If year is before 2010, go to the beginning of 2010.
If year is 2010-2020, go in 5 year increments.
If year > 2020, then go in 180 day increments.
For connectors that don't support a `filter_by` and instead rely on `sort_by`
for polling, then this will cause a massive duplication of fetches. For these
connectors, you may want to override this function to return a more reasonable
plan (e.g. extending the 2020+ windows to 6 months, 1 year, or higher)."""
last_successful_run = (
datetime_to_utc(last_successful_run) if last_successful_run else None
)
if last_successful_run is None or last_successful_run < _2010_dt():
return _2010_dt()
if last_successful_run < _2020_dt():
return min(last_successful_run + datetime.timedelta(days=365 * 5), _2020_dt())
return last_successful_run + datetime.timedelta(days=180)
def find_end_time_for_indexing_attempt(
last_successful_run: datetime.datetime | None,
# source_type can be used to override the default for certain connectors, currently unused
source_type: DocumentSource,
) -> datetime.datetime | None:
"""Is the current time unless the connector is run over a large period, in which case it is
split up into large time segments that become smaller as it approaches the present
"""
# NOTE: source_type can be used to override the default for certain connectors
end_of_window = _default_end_time(last_successful_run)
now = datetime.datetime.now(tz=datetime.timezone.utc)
if end_of_window < now:
return end_of_window
# None signals that we should index up to current time
return None
def get_time_windows_for_index_attempt(
last_successful_run: datetime.datetime, source_type: DocumentSource
) -> list[tuple[datetime.datetime, datetime.datetime]]:
if not EXPERIMENTAL_CHECKPOINTING_ENABLED:
return [(last_successful_run, datetime.datetime.now(tz=datetime.timezone.utc))]
time_windows: list[tuple[datetime.datetime, datetime.datetime]] = []
start_of_window: datetime.datetime | None = last_successful_run
while start_of_window:
end_of_window = find_end_time_for_indexing_attempt(
last_successful_run=start_of_window, source_type=source_type
)
time_windows.append(
(
start_of_window,
end_of_window or datetime.datetime.now(tz=datetime.timezone.utc),
)
)
start_of_window = end_of_window
return time_windows
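A usage sketch of the windowing logic above: with checkpointing enabled, a connector last run in mid-2019 would first be indexed up to the start of 2020, then move forward in 180-day hops until the window reaches the present (the source type is illustrative):

import datetime

last_run = datetime.datetime(2019, 6, 1, tzinfo=datetime.timezone.utc)
windows = get_time_windows_for_index_attempt(
    last_successful_run=last_run, source_type=DocumentSource.CONFLUENCE
)
# roughly: (2019-06-01, 2020-01-01), (2020-01-01, 2020-06-29), ... up to now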

View File

@@ -0,0 +1,33 @@
import asyncio
import psutil
from dask.distributed import WorkerPlugin
from distributed import Worker
from danswer.utils.logger import setup_logger
logger = setup_logger()
class ResourceLogger(WorkerPlugin):
def __init__(self, log_interval: int = 60 * 5):
self.log_interval = log_interval
def setup(self, worker: Worker) -> None:
"""This method will be called when the plugin is attached to a worker."""
self.worker = worker
worker.loop.add_callback(self.log_resources)
async def log_resources(self) -> None:
"""Periodically log CPU and memory usage.
NOTE: must be async, or it will block the worker indefinitely, since
Dask uses Tornado (which is async) under the hood"""
while True:
cpu_percent = psutil.cpu_percent(interval=None)
memory_available_gb = psutil.virtual_memory().available / (1024.0**3)
# You can now log these values or send them to a monitoring service
logger.debug(
f"Worker {self.worker.address}: CPU usage {cpu_percent}%, Memory available {memory_available_gb}GB"
)
await asyncio.sleep(self.log_interval)
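Attaching the plugin to a running cluster uses Dask's standard worker-plugin registration; a short usage sketch:

from dask.distributed import Client

client = Client()  # connect to the scheduler
client.register_worker_plugin(ResourceLogger(log_interval=60))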

Some files were not shown because too many files have changed in this diff.