1
0
forked from github/onyx

3096 Commits

Author SHA1 Message Date
SubashMohan
fe029eccae chore: add SharePoint sync environment variables to integration test (#5197)
* chore: add SharePoint sync environment variables to integration test workflows

* fix cubic comments

* test: skip SharePoint permission tests for non-enterprise

* test: update SharePoint permission tests to skip for non-enterprise environments
2025-08-18 03:21:04 +00:00
Wenxi
ea72af7698 fix sharepoint tests (#5209) 2025-08-17 22:25:47 +00:00
Wenxi
17abf85533 fix unpaused user files (#5205) 2025-08-16 01:39:16 +00:00
Wenxi
3bd162acb9 fix: sharepoint tests and indexing logic (#5204)
* don't index onedrive personal sites in sharepoint

* fix sharepoint tests and indexing behavior

* remove print
2025-08-15 18:19:42 -07:00
Evan Lohn
664ce441eb generous timeout between docfetching finishing and docprocessing starting (#5201) 2025-08-15 15:43:01 -07:00
Wenxi
6863fbee54 fix: validate sharepoint connector with validate_connector_settings (#5199)
* validate sharepoint connector with validate_connector_settings

* fix test

* fix tests
2025-08-15 00:38:31 +00:00
Nils
a605bd4ca4 feat: make sharepoint documents and sharepoint pages optional (#5183)
* feat: make sharepoint documents and sharepoint pages optional

* fix: address review feedback for PR #5183

* fix: exclude personal sites from sharepoint connector

---------

Co-authored-by: Nils Kleinrahm <nils.kleinrahm@pledoc.de>
2025-08-14 15:17:23 -07:00
Dominic Feliton
0e8b5af619 fix(connector): user file helm start cmd + legacy file connector incompatibility (#5195)
* Fix user file helm start cmd + legacy file connector incompatibility

* typo

* remove unnecessary logic

* undo

* make recommended changes

* keep comment

* cleanup

* format

---------

Co-authored-by: Dominic Feliton <37809476+dominicfeliton@users.noreply.github.com>
2025-08-14 13:20:19 -07:00
SubashMohan
46f3af4f68 enhance file processing with content type handling (#5196) 2025-08-14 08:59:53 +00:00
Evan Lohn
2af64ebf4c fix: ensure exception strings don't get swallowed (#5192)
* ensure exception strings don't get swallowed

* just send exception code
2025-08-13 20:05:16 +00:00
Evan Lohn
0eb1824158 fix: sf connector docs (#5171)
* fix: sf connector docs

* more sf logs

* better logs and new attempt

* add fields to error temporarily

* fix sf

---------

Co-authored-by: Wenxi <wenxi@onyx.app>
2025-08-13 17:52:32 +00:00
Chris Weaver
e0a9a6fb66 feat: okta profile tool (#5184)
* Initial Okta profile tool

* Improve

* Fix

* Improve

* Improve

* Address EL comments
2025-08-13 09:57:31 -07:00
Wenxi
55dc24fd27 fix: seeded total doc count (#5188)
* fix seeded total doc count

* fix seeded total doc count
2025-08-13 00:19:06 +00:00
Evan Lohn
da02962a67 fix: thread safe approach to docprocessing logging (#5185)
* thread safe approach to docprocessing logging

* unify approaches

* reset
2025-08-12 02:25:47 +00:00
SubashMohan
9bc62cc803 feat: sharepoint perm sync (#5033)
* sharepoint perm sync first draft

* feat: Implement SharePoint permission synchronization

* mypy fix

* remove commented code

* bot comments fixes and job failure fixes

* introduce generic way to upload certificates in credentials

* mypy fix

* add checkpoiting to sharepoint connector

* add sharepoint integration tests

* Refactor SharePoint connector to derive tenant domain from verified domains and remove direct tenant domain input from credentials

* address review comments

* add permission sync to site pages

* mypy fix

* fix tests error

* fix tests and address comments

* Update file extraction behavior in SharePoint connector to continue processing on unprocessable files
2025-08-11 16:59:16 +00:00
Evan Lohn
bf6705a9a5 fix: max tokens param (#5174)
* max tokens param

* fix unit test

* fix unit test
2025-08-11 09:57:44 -07:00
Rei Meguro
df2fef3383 fix: removal of old tags + is_list differentiation (#5147)
* initial migration

* getting metadata from tags

* complete migration

* migration override for cloud

* fix: more robust structured tag gen

* tag and indexing update

* fix: move is_list to tags

* migration rebase

* test cases + bugfix on unique constraint

* fix logging
2025-08-10 22:39:33 +00:00
SubashMohan
8cec3448d7 fix: restrict user file access to current user only (#5177)
* fix: restrict user file access to current user only

* fix: enhance user file access control for recent folder
2025-08-10 19:00:18 +00:00
Wenxi
bacee0d09d fix: sanitize slack payload before logging (#5167)
* sanitize slack payload before logging

* nit
2025-08-08 02:10:00 +00:00
Evan Lohn
297720c132 refactor: file processing (#5136)
* file processing refactor

* mypy

* CW comments

* address CW
2025-08-08 00:34:35 +00:00
Evan Lohn
bd4bd00cef feat: office parsing markitdown (#5115)
* switch to markitdown untested

* passing tests

* reset file

* dotenv version

* docs

* add test file

* add doc

* fix integration test
2025-08-07 23:26:02 +00:00
Wenxi
cf193dee29 feat: support gpt5 models (#5169)
* support gpt5 models

* gpt5mini visible
2025-08-07 12:35:46 -07:00
Evan Lohn
1b47fa2700 fix: remove erroneous error case and add valid error (#5163)
* fix: remove erroneous error case and add valid error

* also address docfetching-docprocessing limbo
2025-08-07 18:17:00 +00:00
Wenxi Onyx
e1a305d18a mask llm api key from logs 2025-08-07 00:01:29 -07:00
Evan Lohn
e2233d22c9 feat: salesforce custom query (#5158)
* WIP merged approach untested

* tested custom configs

* JT comments

* fix unit test

* CW comments

* fix unit test
2025-08-07 02:37:23 +00:00
Wenxi
1b2f4f3b87 fix: slash command slackbot to respond in private msg (#5151)
* fix slash command slackbot to respond in private msg

* rename confusing variable. fix slash message response in DMs
2025-08-05 19:03:38 -07:00
Evan Lohn
d85b55a9d2 no more scheduled stalling (#5154) 2025-08-05 20:17:44 +00:00
Chris Weaver
258e08abcd feat: add customization via env vars for curator role (#5150)
* Add customization via env vars for curator role

* Simplify

* Simplify more

* Address comments
2025-08-05 09:58:36 -07:00
SubashMohan
146628e734 fix unsupported character error in minio migration (#5145)
* fix unsupported character error in minio migration

* slash fix
2025-08-04 12:42:07 -07:00
Wenxi
c1d4b08132 fix: minio file names (#5138)
* nit var clarity

* maintain file names in connector config for display

* remove unused util

* migration draft

* optional file names to not break existing instances

* backwards compatible

* backwards compatible

* migration logging

* update file ocnn tests

* unncessary none

* mypy + explanatory comments
2025-08-01 20:31:29 +00:00
Wenxi
554cd0f891 fix: accept multiple zip types and fallback to extension (#5135)
* accept multiple zip types and fallback to extension

* move zip check to util

* mypy nit
2025-07-30 22:21:16 +00:00
Raunak Bhagat
f87d3e9849 fix: Make ungrounded types have a default name when sending to the frontend (#5133)
* Update names in map-comprehension

* Make default name for ungrounded types public

* Return the default name for ungrounded entity-types

* Update backend/onyx/db/entities.py

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-07-30 20:46:30 +00:00
SubashMohan
c442ebaff6 Feature/GitHub permission sync (#4996)
* github perm sync initial draft

* introduce github  doc sync and perm sync

* remove specific start time check

* Refactor GitHub connector to use SlimCheckpointOutputWrapper for improved document handling

* Update GitHub sync frequency defaults from 30 minutes to 5 minutes

* Add stop signal handling and progress reporting in GitHub document sync

* Refactor tests for Confluence and Google Drive connectors to use a mock fetch function for document access

* change the doc_sync approach

* add static typing for ocument columns and where clause

* remove prefix logic in connector runner

* mypy fix

* code review changes

* mypy fix

* fix review comments

* add sort order

* Implement merge heads migration for Alembic and update Confluence and Google Drive test

* github unit tests fix

* delete merge head and rebase the docmetadata field migration

---------

Co-authored-by: Subash <subash@onyx.app>
2025-07-30 02:42:18 +00:00
Justin Tahara
0157ae099a [Vespa] Update to optimized configuration pt.2 (#5113) 2025-07-28 20:42:31 +00:00
justin-tahara
565fb42457 Let's do this properly 2025-07-28 10:42:31 -07:00
justin-tahara
a50a8b4a12 [Vespa] Update to optimized configuration 2025-07-28 10:38:48 -07:00
Evan Lohn
4baf4e7d96 feat: pruning freq (#5097)
* pruning frequency increase

* add logs
2025-07-26 22:29:43 +00:00
Wenxi
8b7ab2eb66 onyx metadata minio fix + permissive unstructured fail (#5085) 2025-07-25 21:26:02 +00:00
Evan Lohn
650884d76a fix: preserve error traces (#5083) 2025-07-25 18:56:11 +00:00
Evan Lohn
71037678c3 attempt to fix parsing of tricky template files (#5080) 2025-07-25 02:18:35 +00:00
Chris Weaver
68de1015e1 feat: support aspx files (#5068)
* Support aspx files

* Add fetching of site pages

* Improve

* Small enhancement

* more improvements

* Improvements

* Fix tests
2025-07-24 19:19:24 -07:00
Evan Lohn
e2b3a6e144 fix: drive external links (#5079) 2025-07-24 17:42:12 -07:00
Evan Lohn
4f04b09efa add library to fall back to for tokenizing (#5078) 2025-07-24 11:15:07 -07:00
SubashMohan
5c4f44d258 fix: sharepoint lg files issue (#5065)
* add SharePoint file size threshold check

* Implement retry logic for SharePoint queries to handle rate limiting and server error

* mypy fix

* add content none check

* remove unreachable code from retry logic in sharepoint connector
2025-07-24 14:26:01 +00:00
Evan Lohn
19652ad60e attempt fix for broken excel files (#5071) 2025-07-24 01:21:13 +00:00
Evan Lohn
70c96b6ab3 fix: remove locks from indexing callback (#5070) 2025-07-23 23:05:35 +00:00
Evan Lohn
bf1e2a2661 feat: avoid full rerun (#5063)
* fix: remove extra group sync

* second extra task

* minor improvement for non-checkpointed connectors
2025-07-23 18:01:23 +00:00
Evan Lohn
991d5e4203 fix: regen api key (#5064) 2025-07-23 03:36:51 +00:00
Evan Lohn
d21f012b04 fix: remove extra group sync (#5061)
* fix: remove extra group sync

* second extra task
2025-07-22 23:24:42 +00:00
Wenxi
86b7beab01 fix: too many internet chunks (#5060)
* minor internet search env vars

* add limit to internet search chunks

* note

* nits
2025-07-22 23:11:10 +00:00