logic: After processing FILES_MD, the stage is reset to PRS instead of moving to the next repository. This will cause an infinite loop
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
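A minimal sketch of the stage transition this review comment asks for, not the connector's actual code. Only `PRS` and `FILES_MD` come from the comment; the `Stage` enum, `advance`, `walk`, and the `repo_index` bookkeeping are hypothetical names used for illustration.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Stage(Enum):
    # Only PRS and FILES_MD are named in the review comment; any other
    # stages in the real connector are unknown and omitted here.
    PRS = "prs"
    FILES_MD = "files_md"


def advance(stage: Stage, repo_index: int, repo_count: int) -> Optional[Tuple[Stage, int]]:
    """Return the next (stage, repo_index), or None when every repo is done."""
    if stage is Stage.PRS:
        return Stage.FILES_MD, repo_index
    # FILES_MD is treated as the last stage for a repository: move to the
    # next repository and reset the stage together. Resetting the stage to
    # PRS without incrementing repo_index would revisit the same repository
    # forever.
    next_repo = repo_index + 1
    if next_repo >= repo_count:
        return None
    return Stage.PRS, next_repo


def walk(repo_count: int) -> List[Tuple[str, int]]:
    """Collect every (stage, repo) pair visited, to show the walk terminates."""
    visited: List[Tuple[str, int]] = []
    state = (Stage.PRS, 0) if repo_count > 0 else None
    while state is not None:
        stage, repo_index = state
        visited.append((stage.name, repo_index))
        state = advance(stage, repo_index, repo_count)
    return visited
```

With two repositories, `walk(2)` visits each stage of each repository once and then stops.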
New field include_files_md is declared as required, but older connector records won’t contain it, so the type should be optional to avoid undefined values at runtime
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
* WIP
* renamed and moved tasks (WIP)
* minio migration
* bug fixes and finally add document batch storage
* WIP: can succeed but status is error
* WIP
* import fixes
* working v1 of decoupled
* catastrophe handling
* refactor
* remove unused db session in prep for new approach
* renaming and docstrings (untested)
* renames
* WIP with no more indexing fences
* robustness improvements
* clean up rebase
* migration and salesforce rate limits
* minor tweaks
* test fix
* connector pausing behavior
* correct checkpoint resumption logic
* cleanups in docfetching
* add heartbeat file
* update template jsonc
* deployment fixes
* fix vespa httpx pool
* error handling
* cosmetic fixes
* dumb
* logging improvements and non checkpointed connector fixes
* didnt save
* misc fixes
* fix import
* fix deletion of old files
* add in attempt prefix
* fix attempt prefix
* tiny log improvement
* minor changes
* fixed resumption behavior
* passing int tests
* fix unit test
* fixed unit tests
* trying timeout bump to see if int tests pass
* trying timeout bump to see if int tests pass
* fix autodiscovery
* helm chart fixes
* helm and logging
* Improve check_for_indexing + check_for_vespa_sync_task
* Remove unused
* Fix
* Simplify query
* Add more logging
* Address bot comments
* Increase # of tasks generated since we're not going cc-pair by cc-pair
* Only index 50 user files at a time
* Add basic structure for frontend email connector
* Update names of credentials-json keys
* Fix up configurations workflow
* Edit logic on how `mail_client` is used
- imaplib.IMAP4_SSL is supposed to be treated as an ephemeral object
* Edit helper name and add docs
* Fix invalid mailbox selection error
* Implement greptile suggestions
* Make recipients optional and add sender to primary-owners
* Add sender to external-access too; perform dedupe-ing of emails
* Simplify logic
* Make constant a global
* Add ability to specify vertex location
* Add period
* Add a hardcoding path to the frontend
* Add docs
* Add default value to `CustomConfigKey`
* Consume default value from custom-config-key on frontend
* Use markdown renderer instead
* Update description
* Remove macro stylings from HTML tree
* Add params
* Handle multiple cases of `ac:structured-macro` being found.
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Implement fetching; still need to work on document parsing
* Add basic skeleton of parsing email bodies
* Add id field
* Add email body parsing
* Implement checkpointed imap-connector
* Add testing logic for basic iteration
* Add logic to get different header if "to" isn't present
- possible in mailing-list workflows
* Add ability to index specific mailboxes
* Add breaking when indexing has been fully exhausted
* Sanitize all mailbox names + add space between stripped strings after parsing
* Add multi-recipient parsing
* Change around semantic-identifier and title
* Add imap tests
* Add recipients and content assertions to tests
* Add envvars to github actions workflow file
* Remove encoding header
* Update logic to not immediately establish connection upon init of `ImapConnector`
* Add start and end datetime filtering + edit when connection is established / how login is done
* Remove content-type header
* Add note about guards
* Change default parameters to be `None` instead of `[]`
* Address comment on PR
* Implement more PR suggestions
* More PR suggestions
* Implement more PR suggestions
* Change up login/logout flow (PR suggestion)
* Move port number to be envvar
* Make globals variants in enum instead (PR suggestion)
* Fix more documentation related suggestions on PR
* Have the imap connector implement `CheckpointedConnectorWithPermSync` instead
* Add helper for loading all docs with permission syncing
* fixed id extraction in drive connector
* WIP migration
* full migration script
* migration works single tenant without duplicates
* tested single tenant with duplicate docs
* migrations and frontend
* tested multitenant
* fix connector tests
* make tests pass
* Fix bug with incorrect model icon being shown
* Update web/src/app/chat/input/LLMPopover.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update web/src/app/chat/input/LLMPopover.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update web/src/app/chat/input/LLMPopover.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update web/src/app/chat/input/LLMPopover.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add visibility to filtering
* Update the model names which are shown in the popup
* Fix incorrect llm updating bug
* Fix bug in which the provider name would be used instead
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add new convenience method
* Fix bug in which emails would be fetched for initial indexing
* Improve tests for MS Teams connector
* Fix test_gdrive_perm_sync_with_real_data patching
* Protect against incorrect truthiness
---------
Co-authored-by: Weves <chrisweaver101@gmail.com>
* feat: move vespa at end in try block
* simplify query
* mypy
* added order by just in case for consistent pagination
* liveness probe
* kg_p check for both extraction and clustering
* fix: better vespa logging
* Add function stubs for Teams
* Implement more boilerplate code
* Change structure of helper functions
* Implement teams perms for the initial index
* Make private functions start with underscore
* Implement slim_doc retrieval and fix up doc_sync
* Simplify how doc-sync is done
* Refactor jira doc-sync
* Make locally used function start with an underscore
* Update backend/ee/onyx/configs/app_configs.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add docstring to helper function
* Update tests
* Add an expected failure
* Address comment on PR
* Skip expert-info if user does not have a display-name
* Add doc comments
* Fix error in generic_doc_sync
* Move callback invocation to earlier in the loop
* Update tests to include proper list of user emails
* Update logic to grab user emails as well
* Only fetch expert-info if channel is not public
* Pull expert-info creation outside of loop
* Remove unnecessary call to `iter`
* Switch from `dataclass` to `BaseModel`
* Simplify boolean logic
* Simplify logic for determining if channel is public
* Remove unnecessary channel membership-type
* Add log-warns
* Only perform another API fetch if email is not present
* Address comments on PR
* Add message on assertion failure
* Address typo
* Make exception message more descriptive
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add function stubs for Teams
* Implement more boilerplate code
* Change structure of helper functions
* Implement teams perms for the initial index
* Make private functions start with underscore
* Implement slim_doc retrieval and fix up doc_sync
* Simplify how doc-sync is done
* Refactor jira doc-sync
* Make locally used function start with an underscore
* Update backend/ee/onyx/configs/app_configs.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add docstring to helper function
* Update tests
* Add an expected failure
* Address comment on PR
* Skip expert-info if user does not have a display-name
* Add doc comments
* Fix error in generic_doc_sync
* Move callback invocation to earlier in the loop
* Update tests to include proper list of user emails
* Update logic to grab user emails as well
* Only fetch expert-info if channel is not public
* Pull expert-info creation outside of loop
* Remove unnecessary call to `iter`
* Switch from `dataclass` to `BaseModel`
* Simplify boolean logic
* Simplify logic for determining if channel is public
* Remove unnecessary channel membership-type
* Add log-warns
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* kg cleanup
* more cleanup
* fix: copy over _get_classification_content_from_call_chunks for content formatting
* added back deep extraction logic
* feat: making deep extraction and clustering work
* nit
Changed the restart policy to unless-stopped to ensure containers
automatically restart after failures or reboots but allow manual stop
without immediate restart.
This is preferable over always because it prevents containers from
restarting automatically after a manual stop, enabling controlled
shutdowns and maintenance without unintended restarts.
* Split up engine file
* Switch to schema_translate_map
* Fix mass search/replace
* Remove unused
* Fix mypy
* Fix
* Add back __init__.py
* kg fix for new session management
Adding "<tenant_id>" in front of all views.
* additional kg fix
* better handling
* improve naming
---------
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* updates
- no classification if deep extraction is False
- separate names for views in LLM generation
- better prompts
- any relationship type provided to LLM that relates to identified entities
* CW feedback/comment update
* GCS metadata processing
* Unprocessable files should still be indexed to be searched by title
* Moved re-used logic to utils. Combined file metadata PR with GCS metadata changes
* Added OnyxMetadata type, adjusted timestamp naming consistency, clarified timestamp logic
* Use BaseModel
---------
Co-authored-by: Wenxi Onyx <wenxi-onyx@Wenxis-MacBook-Pro.local>
* Create Entity-Only path for simple entity-focussed queries. Plus
other fixes.
* fix: use env var
* mypy fix
* fix: mypy
---------
Co-authored-by: Rei Meguro <36625832+Orbital-Web@users.noreply.github.com>
* refactor salesforce sqlite db access
* more refactoring
* refactor again
* refactor again
* rename object
* add finalizer to ensure db connection is always closed
* avoid unnecessarily nesting connections and commit regularly when possible
* remove db usage from csv download
* dead code
* hide deprecation warning in ddtrace
* remove unused param
* local testing WIP
* stuff for pytest-dotenv
* autodetect filter types instead of assuming last modified always works (it doesn't)
Move filtering responsibility up instead of making utility calls excessively stateful
* fix how changed parent id's are yielded
* remove slow part of test
* clean up comments
* small refactor
* more refactor
* add normalize test
* checkpoint and comments
* add helper function
* fix gitignore
* add gitignore
* update pyproject
* delta updates
* remove comments
* fix time import
* fix set init
* add salesforce env vars
* cleanup
* more cleanup
* filtered item is unbound here
* typo
* fix suffix check
* fix empty type query
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* db setup
* transfer 1 - incomplete
* more adjustments
* relationship table + query update
* temp view creation
* restructuring
* nits
* updates
* separate read_only engine
* extraction revamp
* focus on metadata relationships 1
* dev
* migration downgrade fix
* rebase migration change
* a3+
* progress
* base
* new extraction
* progress
* fixed KG extraction
* nits
* updates
* simplifications & cleanup
* fixes
* updates
* more feature flag checks
* fixes
* extraction process fix
* read-only user creation as part of setup
* fix for missing entity attributes
* kg read-only user creation as part of migration
* typo
* EL initial comments
* initial Account/SF Connector changes
* SF Connector update
- include account information
* base w/ salesforce
* evan updates + quite a bit more
* kg-filtered search
* EL changes pt 2
* migrations and env vars
* quick migration fix
* migration update
* post_rebase fixes
* mypy fixes
* test fixes
* test fix
* test fix
* read_only pool + misc
* nf
* env vars
* test improvements
* salesforce fix
* test update
* small changes
* small adjustments
* SF Connector fix & kg_stage removal for one table
* mypy fix
* small fixes
* EL + RK (pt 1) comments
* nit
* setting updated
* Salesforce test update
* EL comments
* read-only user replacement & cleanup
* SQL View fix
* converting entity type-name separators
* sql view group ownership
* view fix
* SQL tweak
* dealing with docs that were skipped by indexing
* increased error handling
* more error handling
* Output formatting fix
* kg-incremental-reindexing
* 0-doc found improvement
* celery
* migration correction
* timeout adjustments
* nit
* Updated migration
* Entity Normalization for KG Dev 1 (#4746)
* feat: trigrams column
* fix: reranking and db
* feat: v1
* fix: convert to orm
* feat: parallel
* fix: default to id_name
* fix: renamed semantic_id and semantic_id_trigrams
* fix: scalar subquery
* fix: tuning + redundancy
* fix: threshold
* fix: typo
* fix: shorten names
* wip
* fix: reverted
* feat: config
* feat: works but it was dumb
* feat: clustering works
* fix: mypy
* normalization <-> language awareness for SQL generation
* small type fixes
---------
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* mypy
* typo and dead code
* kg_time_fencing
* feat: remove temp views on migration downgrade
* remove functions and triggers for now
* rebase adjustments
* EL code review results
* quick fix + trigger/funcs for single tenant
* fix: typo, mypy, dead code
* fix: autoflake
* small updates
* nit
* fix: typo
* early + faster view creation
* Extension creation in MT migration
* nit changes to default ETs
* Incremental Clustering and KG Refactor V1 (#4784)
Optimized/restructured incremental clustering. The new pipeline moves vespa updates to clustering.
Also, celery configuration has been updated.
---------
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* Move file
* Fix all prior imports
* Clean sidebar items logic; add kg page
* Add kg_processing celery background task
* prompt tweak & ET extraction reset
* more general hierarchical structure
* feat: better vespa reset logic
* Add basic knowledge graph configuration
* Add configurations for KG entity-type
* prompt optimization and entity replacements
* small prompt changes
* Implement backend APIs
* KG Refactor V2 (#4814)
Clustering & Extraction improvements & various nits
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* add connector-level coverage days
* Update APIs to be more frontend ergonomic
* Add simple test
* Make config optional in test
* fix: nit
* initial EL responses
* refactor: helper functions for formatting
* fix: more helper fns & comments
* fix: comment code that's been implemented elsewhere
* Add entity-types APIs
* Hook up frontend to backend
* Finish hooking up entity-types to backend
* Update ordering of entity-types and fix form submitting
* Add backend API to get kg-exposed
* Add kg-exposed to sidebar
* Fix path
* Use existing values, even if kg-enabled is false
* Update what initial values are used
* Add skeleton for kg resetting
* Add return type
* Add default entity-type population when fetching entity-types
* Remove circular deps
* Minor fixes to logic
* Edit logic for default entity-types population
* Add re-index API + skeleton
* Update verbiage for KG
* Remove templatization in favour of function
* Address comments on PR
* Pull call out into its own binding
* Remove re-index API and revert implement of reset back to stub
* Fix circular import error
* Remove 'reindex' button
* Edit how the empty vendor name list is handled
* Edit how exposed is processed
* Redirect if navigated to `/admin/kg` and kg is not exposed
* Address comments on PR
* reset + entity type table display & updating updates
* Update fetching entity-types
* Make KG entity types refresh when reset
* Edit verbiage of reset button
* Update package-lock.json file
* Protect against overflowing
* Re-implement refreshing table after reset
* Edit message when nothing is shown.
* UI enhancements
* small fixes
* remove form validation?
* fix
* nit
* nit
* nit
* nit
* fix configure max coverage days
* EL comments for JR
* refactor: moved functions where they belong to fix circular import
* feat: intuitive coverage days
* feat: intuitive coverage days
* fix: safe date picker
* fix: startdate
* evan fixes
* fix: evan comment on enable/disable
* fix: style
* fix: ui issues
* fix: ui issues for reset too
* fix: tests
* fix: kg entity is not enabled
* fix: entity type reload on enable
---------
Co-authored-by: joachim-danswer <joachim@danswer.ai>
Co-authored-by: Rei Meguro <36625832+Orbital-Web@users.noreply.github.com>
* Edit logic for default entity-types population
* Remove templatization in favour of function
* Address comments on PR
* Pull call out into its own binding
* Address comments on PR
* Add rate-limiting to Teams API request
* Add comment for rate-limiting
* Implement rate-limiting for office365 library.
* Remove hardcoded value
* Fix nits on PR
* initial model switching changes
* Update image generation output format and revise prompt handling
* Add validation for output format in ImageGenerationTool and implement tests
---------
Co-authored-by: Subash <subash@onyx.app>
* Add perm sync to indexing for google drive
* Applying changes elsewhere
* Turn on EE for perm sync slack tests
* Add new load_from_checkpoint_with_perm_sync
* Adjust way perm sync configs are represented
* Adjust run_indexing to handle perm sync on first run
* Add missing file
* Add sync on index for slack
* Add test + fixes
* Update permission
* Fix connector tests
* skip perm sync test if running MIT tests
* Address EL comments
* Add error clarity to restart containers script
* erroneous cleanup on exit
* fix when starting containers for the first time
---------
Co-authored-by: Wenxi Onyx <wenxi-onyx@Wenxis-MacBook-Pro.local>
* Fixed indexing when no sites are specified
* Added test for Sharepoint all sites index
* Accounted for paginated results.
* Typing
* Typing
---------
Co-authored-by: Wenxi Onyx <wenxi-onyx@Wenxis-MacBook-Pro.local>
* add percentage progress
* range checking
* formatting
* for new channels, skip them if the most recent messages are all from bots
* comments
* bypass bot channels
* code review
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* try fixing slack bot
* add logging
* just use if
* safe msg get
* .close isn't async
* enforce block list size limit
* various fixes and notes
* don't use self
* switch to punkt_tab
* fix return condition
* synchronize waiting, use non thread local redis locks
* fix log format, make collection copy more explicit for readability
* fix some logging
* unnecessary function
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* add more info
* fix headers
* add filename as param (merge)
* db manager entry in launch template
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* Add replies to document construction and edit tests
* Update tests
* Add replies processing to teams
* Fix test
* Add try-except block around potential failure
* Update entity-id during ConnectorFailure raise
* Change query-exporting to use generators instead of expanding fully into memory
* Fix pagination logic
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add type annotation
* Add early break if list of chat_sessions is empty
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
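The generator-based export with pagination and an early break on an empty page, as described in the commits above, might look roughly like this sketch. `iter_rows`, `make_fake_fetcher`, and the `fetch_page(page, page_size)` signature are hypothetical stand-ins, not the project's real functions.

```python
from typing import Callable, Dict, Iterator, List, Sequence

Row = Dict[str, int]


def iter_rows(
    fetch_page: Callable[[int, int], Sequence[Row]],
    page_size: int = 2,
) -> Iterator[Row]:
    """Yield rows one page at a time instead of building the full list."""
    page = 0
    while True:
        batch = fetch_page(page, page_size)
        if not batch:
            # Early break: an empty page means the source is exhausted.
            break
        yield from batch
        page += 1


def make_fake_fetcher(rows: List[Row]) -> Callable[[int, int], Sequence[Row]]:
    """Stand-in for a paginated database query (illustration only)."""

    def fetch_page(page: int, page_size: int) -> Sequence[Row]:
        start = page * page_size
        return rows[start:start + page_size]

    return fetch_page
```

A caller can stream the result straight into a writer (for example `csv.writer(...).writerows(...)`), so only one page is held in memory at a time.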
* Highlight active link in AdminSidebar based on current pathname
* Refactor AdminSidebar to declare pathname variable earlier
---------
Co-authored-by: Subash <subash@onyx.app>
* Add basic foundation for teams checkpointing classes
* Fix slack connector main entrypoint
* Saving changes
* Finish teams checkpointing impl
* Remove commented out code
* Remove more unused code
* Move code around
* Add threadpool to process requests in parallel
* Fix mypy errors / warnings
* Move test import to main function only
* Address nits on PR
* Remove unnecessary check prior to entering while-loop
* Remove print statement
* Change exception message
* Address more nits
* Use indexing instead of destructuring
* Add back invocation of `run_with_timeout` instead of a direct call
* Revert slack testing code
* Move early return to before second API call
* Pull fetch to team outside of loop
* Address nits on PR
* Add back client-side filtering
* Updated connector to return after a team's indexing is finished
* Add type ignore
* Implement proper datetime range fetching
* Address comment on PR
* Rename function
* Change exception type when no team with the given id was found
* Address nit on PR
* Add comment on why `page_loaded` is needed to be specified explicitly
* Remove duplicated calls to fetching channels
* Use helper function for thread-based yielding instead of manual logic
* Move datetime filtering to message-level instead
* Address more comments on PR
* Add new utility function for yielding sections
* Add additional utility function
* Add teams tests
* Edit error message
* Address nits on PR
* Promote url-prefix to be a class level constant
* Fix mypy error
* Remove start/end parameters from function that doesn't use them anymore; move around comments
* Address more nits on PR
* Add comment
* add utility function
* add utility functions to DocExternalAccess
* refactor db access out of individual celery tasks and put it directly into the heavy task
* code review and remove leftovers
* fix circular imports
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* ensure we don't tag 'latest' with cloud images
* add docker login to trivy
* fix tag names
* flavor latest false (no auto latest tags)
* fix typo
* only run the appropriate workflow for web
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* set field size limit
* don't use sys.maxsize
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
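A small sketch of raising the CSV field size limit with an explicit bound. The commit does not say why `sys.maxsize` was dropped; one plausible reason is that `csv.field_size_limit(sys.maxsize)` can raise `OverflowError` on platforms where a C `long` is 32 bits. `MAX_FIELD_SIZE` and `read_rows` are hypothetical, and the 16 MiB value is an arbitrary illustration.

```python
import csv
import io
from typing import List

# Hypothetical explicit bound; the value actually chosen in the commit is
# not shown in the log.
MAX_FIELD_SIZE = 16 * 1024 * 1024


def read_rows(text: str, limit: int = MAX_FIELD_SIZE) -> List[List[str]]:
    """Parse CSV text after raising the parser's per-field size limit."""
    # Note: csv.field_size_limit changes a process-wide setting.
    csv.field_size_limit(limit)
    return list(csv.reader(io.StringIO(text)))
```

Fields longer than the `csv` module's default limit then parse without a "field larger than field limit" error, as long as they stay under the chosen bound.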
* run testing
* need to break on success
* add a readme
* raise vespa to 6GB
* allow test to retry
* add 20 attempts
* put memory limits back to normal
* restore chart testing on changes only
* increase retries to 40
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* Add more logging for confluence perm-sync + handle case where permissions are removed from the access token
* Make required permissions explicit
* more
* Add slim fetch limit + mark all cc pairs of source type as successful upon group sync
* Add to dev compose
* Small teams fix
* Add file
* Add single limit pagination for confluence
* Restrict to server only
* more logging
* cleanup
* Cleanup
* Remove CONFLUENCE_CONNECTOR_SLIM_FETCH_LIMIT
* Handle teams error
* Fix ut
* Remove db dependency from confluence_doc_sync
* move stuff back to debug
* restore caching and fix up some prefixing
* try backend matrix build and fix artifact names
* need id
* add backslashes to be consistent
* fix no-cache
* leave docker tags to the meta action
* need checkout in merge
* add comment
* move spammy logs to debug status
* bunch of no-cache updates
* prefix
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* Update mode to be a default parameter in `FileStore.read`
* Move query history exporting process to be a background job instead
* Move hardcoded report-file-naming to a common utility function
* Add type annotations
* Update download component
* Implement button to re-ping and download CSV file; fix up some backend file-checking logic
* De-indent logic (w/ early return)
* Return different error codes depending on the type of task status
* Add more resistant failure retrying mechanisms
* Remove default parameter in helper function
* Use popup for error messaging
* Update return code
* Update web/src/app/ee/admin/performance/query-history/DownloadAsCSV.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add type to useState call
* Update backend/ee/onyx/server/query_history/api.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/onyx/file_store/file_store.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/ee/onyx/background/celery/apps/primary.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Move rerender call to after check
* Run formatter
* Add type conversions back (smh greptile)
* Remove duplicated call to save_file
* Move non-fallible logic out of try-except block
* Pass date-ranges into API call
* Convert to ISO strings before passing it into the API call
* Add API to list all tasks
* Create new pydantic model to represent tasks to return instead
* Change helper to only fetch query-history tasks
* Use `shared_tasks` instead of old method
* Address more comments from PR; consolidate how task name is generated
* Mark task as failed if any exception is raised
* Change the task object which is returned back to the FE
* Add a table to display previously generated query-history-csv's
* Add timestamps to task; delete tasks as soon as file finishes processing
* Raise exception if start_time is not present
* Convert hard-coded string to constant
* Add "Generated At" field to table
* Return task list in sorted order (based off of start-time)
* Implement pagination
* Remove unused props and cleanup tailwind classes
* Change the name of kickoff button
* Redesign how previous query exports are viewed
* Make button a constant width even when contents change
* Remove timezone information before comparing
* Decrease interval time for re-pinging API
* Add timezone to start-time creation
* Add a refreshInterval for getting updated task status
* Add new background queue
* Edit small verbiage and remove error popup when max-retries is hit
* Change up heavy worker to recognize new task in new module
* Ensure `celery_app` is imported
* Change how `celery_app` is imported and defined
* Update comment on why `celery_app` must be imported
* Add basic skeleton for new beat task to cleanup any dead / failed query-history-export tasks
* Move cleanup task to different worker / queue
* Implement cleanup task
* Add return type
* Address comment on PR
* Remove delimiter from prefix
* Change name of function to be more descriptive
* Remove delimiter from prefix constant
* Move function invocation closer to usage location
* Move imports to top of file
* Move variable up a scope due to undefined error
* Remove dangling if-statement
* Make function more pure-functional
* Remove redefinition
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
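The two timezone commits above ("Remove timezone information before comparing" and "Add timezone to start-time creation") touch a common Python pitfall: comparing or subtracting an offset-aware `datetime` with an offset-naive one raises `TypeError`. A sketch of one way to normalize before comparing follows. `to_naive_utc` and `is_older_than` are hypothetical helpers, and treating naive values as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone


def to_naive_utc(dt: datetime) -> datetime:
    """Convert to UTC and drop tzinfo so values compare consistently."""
    if dt.tzinfo is None:
        # Assumption: naive values are already expressed in UTC.
        return dt
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


def is_older_than(start_time: datetime, now: datetime, max_age: timedelta) -> bool:
    """True if more than max_age has elapsed between start_time and now."""
    return to_naive_utc(now) - to_naive_utc(start_time) > max_age
```

Normalizing both operands in one helper keeps the comparison safe whichever side carries timezone information.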
* it will never happen again.
* fix perm sync issue
* fix perm sync issue2
* ensure member emails map is populated
* other fix for perm sync
* address CW comments
* nit
* don't log all channels
* print number of channels
* sanitize indexing exception messages
* harden vespa index swap
* use constants and fix list generation
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* memory optimize task generation for connector deletion
* test
* fix up integration test docker file
* more no-cache
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
- created env variable AGENT_ALLOW_REFINEMENT with default "". Must be set to true to enable Refinement.
- added an environment variable for the upper limit of docs that can be sent to verification
* don't hardcode -1
* extra spaces
* fix binary data in blurb
* add note to binary handling
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* tolerance of confluence api weirdness
* remove checkpointing
* remove skipping logic from checkpointing
* add back checkpointing
* switch confluence checkpointing to be based on page starts
* address CW comments and fix unit tests
* some mitigations of bad confluence api
* new checkpointing approach and testing fixes
* fix test
* CW comments
* Fix migration
* Fix migration to take care of various nullability cases
* Address comments on PR
* Rename variables to be more descriptive
* Make helpers private
* Fix select statement
* Add comments to explain the involved logic
* Saving changes
* Finish script to revalidate `display_model_names`
* Address comments on PR by greptile
* Add missing columns
* Pull difference operator out into binding
* Add deletion prior to re-insertion
* Use map from shared llm-provider file instead
* Use helper function instead of copying code
* Remove delete and convert into an update statement
* Use pydantic for ModelConfigurations
* Update to do nothing on-conflict rather than update
* Address nits on PR
* Add default visible model(s) for bedrock
* Perform an update on conflict instead of doing nothing
* Fix migration
* Fix migration to take care of various nullability cases
* Address comments on PR
* Rename variables to be more descriptive
* Make helpers private
* Fix select statement
* Add comments to explain the involved logic
* Add helpers for viewing visible model names
* Fix logic for missing model + display-model names in migration
* refactor salesforce sqlite db access
* more refactoring
* refactor again
* refactor again
* rename object
* add finalizer to ensure db connection is always closed
* avoid unnecessarily nesting connections and commit regularly when possible
* remove db usage from csv download
* dead code
* hide deprecation warning in ddtrace
* remove unused param
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* debug script + slight refactor of db class
* better comments
* move setup logger
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* friendlier handling of slack channel retrieval
* retry on downgrade_postgres deadlock
* fix comment
* text
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* Convert the model_names and display_model_names into a set instead
* Update backend/alembic/versions/7a70b7664e37_add_model_configuration_table.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Return default value instead of throwing error
* Add default parameter
* Move logic around
* Use dummy value for max_input_tokens in testing flow
* Remove unnecessary assignment
* tool to generate vespa schema variations for our cloud
* extraneous assign
* use a real templating system instead of search/replace
* fix float
* maybe this should be double
* remove redundant var
* template the other files
* try a spawned process
* move the wrapper
* fix args
* increase timeout
* run multitenant reset operations out of process as well
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* add emails to retry with on 403
* attempted fix for connector test
* CW comments
* connector test fix
* test fixes and continue on 403
* fix tests
* fix tests
* fix concurrency tests
* fix integration tests with llmprovider eager loading
* Add multi text array field
* Add multiple values to model configuration for a custom LLM provider
* Fix reference to old field name
* Add migration
* Update all instances of model_names / display_model_names to use new schema migration
* Update background task
* Update endpoints to not throw errors
* Add test
* Update backend/alembic/versions/7a70b7664e37_add_models_configuration_table.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/onyx/background/celery/tasks/llm_model_update/tasks.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Fix list comprehension nits
* Update web/src/components/admin/connectors/Field.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update web/src/app/admin/configuration/llm/interfaces.ts
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Implement greptile recommendations
* Update backend/onyx/db/llm.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/onyx/server/manage/llm/api.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/onyx/background/celery/tasks/llm_model_update/tasks.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/onyx/db/llm.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Fix more greptile suggestions
* Run formatter again
* Update backend/onyx/db/models.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add relationship to `LLMProvider` and `ModelConfigurations` classes
* Use sqlalchemy ORM relationships instead of manually populating fields
* Upgrade migration
* Update interface
* Remove all instances of model_names and display_model_names from backend
* Add more tests and fix bugs
* Run prettier
* Add types
* Update migration to perform data transformation
* Ensure native llm providers don't have custom max input tokens
* Start updating frontend logic to support custom max input tokens
* Pass max input tokens to LLM class (to be passed into `litellm.completion` call later)
* Add ModelConfigurationField component for custom llm providers
* Edit spacing and styling of model configuration matrix
* Fix error message displaying bug
* Edit opacity of `FiX` field for first index
* Change opacity back
* Change roundness
* Address comments on PR
* Perform fetching of `max_input_tokens` at the beginning of the callgraph and rope it throughout the entire callstack
* Change `add` to `execute`
* Move `max_input_tokens` into `LLMConfig`
* Fix bug with error messages not being cleared
* Change field used to fetch LLMProvider
* Fix model-configuration UI
* Address comments
* Remove circular import
* Fix failing tests in GH
* Fix failing tests
* Use `isSubset` instead of equality to determine native vs custom LLM Provider
* Remove unused import
* Make responses always display max_input_tokens
* Fix api endpoint to hit
* Update types in web application
* Update object field
* Fix more type errors
* Fix failing llm provider tests
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* add o3 + o4 mini
* k
* see which ones fail
* attempt
* k
* k
* llm ordering passing
* all tests passing
* quick bump
* Revert "add o3 + o4 mini"
This reverts commit 4cfa1984ec.
* k
* k
* tool to generate vespa schema variations for our cloud
* extraneous assign
* use a real templating system instead of search/replace
* fix float
* maybe this should be double
* remove redundant var
* template the other files
* try a spawned process
* move the wrapper
* fix args
* increase timeout
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* tool to generate vespa schema variations for our cloud
* extraneous assign
* float, not double
* back to double
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* refactor a mega function for readability and make sure to increment retry_count on exception so that we don't infinitely loop
* improve session and page level context handling
* don't use pydantic for the session context
* we don't need retry success
* move playwright handling into the session context
* need to break on ok
* return doc from scrape
* fix comment
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
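The retry fix described in the commits above (increment `retry_count` on exception, stop on success) can be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: `call_with_retries`, `operation`, and `max_retries` are hypothetical names chosen for the example.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T], max_retries: int = 3) -> T:
    """Run `operation` until it succeeds or `max_retries` attempts have failed.

    Hypothetical helper: the names and the default limit are assumptions.
    """
    retry_count = 0
    last_error: Optional[Exception] = None

    while retry_count < max_retries:
        try:
            result = operation()
        except Exception as exc:
            # Count the failed attempt. Without this increment the loop
            # condition never changes and a persistent failure spins forever.
            last_error = exc
            retry_count += 1
            continue

        # Success: leave the loop immediately instead of retrying again.
        return result

    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error
```

The counter is only advanced on the failure path, so a successful call returns on its first try and a permanently failing call is attempted exactly `max_retries` times.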
* Fix duplicate kwarg issue
* Change how vertex_credentials are passed
* Modify temporary dict instead
* Change string to a global constant
* Add extra condition to if-check during population of map
* small improvement to checking for image attachments
* better comments
* check centralized list of types instead of hardcoding them in the connector
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* upgrade celery to release version
* make the watchdog script more reusable
* use constant
* code review
* catch interrupt
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* rollback properly on exception
* rollback on exception
* don't continue if we can't set the search path
* cleaner handling via context manager
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* initial working version
* ranking profile
* modification for keyword/instruction retrieval
* mypy fixes
* EL comments
* added env var (True for now)
* flipped default to False
* mypy & final EL/CW comments + import issue
* refactor salesforce sqlite db access
* more refactoring
* refactor again
* refactor again
* rename object
* add finalizer to ensure db connection is always closed
* avoid unnecessarily nesting connections and commit regularly when possible
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* bump fastapi and starlette
* bumping llama index and nltk and associated deps
* bump to fix python-multipart
* bump aiohttp
* update package lock for examples/widget
* bump black
* sentencesplitter has changed namespaces
* fix reorder import check, fix missing passlib
* update package-lock.json
* black formatter updated
* reformatted again
* change to black compatible reorder
* change to black compatible reorder-python-imports fork
* fix pytest dependency
* black format again
* we don't need cdk.txt. update packages to be consistent across all packages
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* pass through various id's and log them in the model server for better tracking
* fix test
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* use send_task to be consistent
* add pidbox monitoring task
* add logging so we can track the task execution
* log the idletime of the pidbox
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
The code for token cost calculation fails when using a LiteLLM proxy due to a mismatch in provider naming. For now, just handle this exception and assume a cost of 0 when that happens instead of breaking the flow. A more precise, LiteLLM proxy-based cost calculation (relying on the LiteLLM proxy's `/model/info` method) will be needed.
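The fallback described above might look something like the sketch below. Everything here is an assumption made for illustration: `safe_token_cost`, `lookup_cost`, and the `PRICES_PER_TOKEN` table are hypothetical stand-ins, not the project's code or LiteLLM's API, and the prices are made up.

```python
from typing import Callable, Dict, Tuple

# Hypothetical price table: (prompt price, completion price) per token.
PRICES_PER_TOKEN: Dict[str, Tuple[float, float]] = {
    "example-model": (0.5, 1.0),
}


def lookup_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Stand-in cost function; raises KeyError for names it does not know."""
    prompt_price, completion_price = PRICES_PER_TOKEN[model]
    return prompt_tokens * prompt_price + completion_tokens * completion_price


def safe_token_cost(
    calculate_cost: Callable[[str, int, int], float],
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
) -> float:
    """Return the computed cost, or 0.0 if the cost calculation fails."""
    try:
        return calculate_cost(model, prompt_tokens, completion_tokens)
    except Exception:
        # A proxy-prefixed or otherwise unrecognized model name should not
        # break the request flow, so the cost is treated as unknown (0).
        return 0.0
```

With this shape, a name the cost table does not recognize, such as a proxy-specific alias, yields a cost of 0 rather than an exception, while known models are priced as before.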
* Add gemini well-known-llm-provider
* Edit styling of anonymous function
* Remove space
* Edit how advanced options are displayed
* Add VertexAI to acceptable llm providers
* Add new `FileUploadFormField` component
* Edit FileUpload component
* Clean up logic for displaying native llm providers; add support for more complex `CustomConfigKey` types
* Fix minor nits in web app
* Add ability to pass vertex credentials to `litellm`
* Remove unused prop
* Change name of enum value
* Add back ability to change form based on first time configurations
* Create new Error with string instead of throwing raw string
* Add more Gemini models
* Edit mappings for Gemini models
* Edit comment
* Rearrange llm models
* Run black formatter
* Remove complex configurations during first time registration
* Fix nit
* Update llm provider name
* Edit temporary formik field to also have the filename
* Run reformatter
* Reorder commits
* Add advanced configurations for enabled LLM Providers
* WIP
* WIP almost done, but realized we can just do basic retrieval
* rebased and added scripts
* improved approach to extracting smart chips
* remove files from previous branch
* fix connector tests
* fix test
* Update web connector implementation and fix line length issues
* Update configurations and fix connector issues
* Update Slack connector
* Update connectors and add jira_test_env to gitignore, removing sensitive information
* Restore checkpointing functionality and remove sensitive information
* Fix agent mode to properly handle thinking tokens
* up
* Enhance ThinkingBox component with improved content handling and animations. Added support for partial thinking tokens, refined scrolling behavior, and updated CSS for better visual feedback during thinking states.
* Create clean branch with frontend thinking mode changes only
* Update ThinkingBox component to include new props for completion and streaming states. Refactor smooth scrolling logic into a dedicated function for improved readability. Add new entry to .gitignore for jira_test_env.
* Remove autoCollapse prop from AIMessage component for improved flexibility in message display.
* Update thinking tokens handling in chat utils
* Remove unused cleanThinkingContent import from Messages component to streamline code.
---------
Co-authored-by: ferdinand loesch <f.loesch@sportradar.com>
Co-authored-by: EC2 Default User <ec2-user@ip-10-73-128-233.eu-central-1.compute.internal>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Chris Weaver <25087905+Weves@users.noreply.github.com>
* working around a gong race condition in their api
* add back gong basic test
* formatting
* add the call index
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* add some gc
* small refactoring for temp directories
* WIP
* add some gc collects and size calculations
* un-xfail
* fix salesforce test
* loose check for number of docs
* adjust test again
* cleanup
* nuke directory param, remove using sqlite db to cache email / id mappings
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* Enhance Highspot connector with error handling and add unit tests for poll_source functionality
* Fix file extension validation logic to allow either plain text or document format
* gong debugging
* add retries via class level session, add debugging
* add gong connector test
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* add prometheus metrics endpoints via helper package
* model server specific requirements
* mark as public endpoint
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* fix large docs selected in chat pruning
* better approach to length restriction
* comments
* comments
* fix unit tests and minor pruning bug
* remove prints
* stubbing out request id
* passthru or create request id's in api and model server
* add onyx request id
* get request id logging into uvicorn
* no logs
* change prefixes
* fix comment
* docker image needs specific shared files
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* use slack's built in rate limit handler for the bot
* WIP
* fix the slack rate limit handler
* change default to 8
* cleanup
* try catch int conversion just in case
* linearize this logic better
* code review comments
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* new mit integration test template
* edit
* fix problem with ACL type tags and MIT testing for test_connector_deletion
* fix test_connector_deletion_for_overlapping_connectors
* disable some enterprise only tests in MIT version
* disable a bunch of user group / curator tests in MIT version
* wire off more tests
* typo fix
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* fix acl prefixing
* increase timeout a tad
* block access to init'ing DocumentAccess directly, fix test to work with ee/MIT
* fix env var checks
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* refactor file extension checking and add test for blob s3
* code review
* fix checking ext
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* possible fix for confluence query filter
* nuke the attachment filter query ... it doesn't work!
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* fix issue with drive connector service account indexing
* correct checkpoint resumption
* final set of fixes
* nit
* fix typing
* logging and CW comments
* nit
* wire off image downloading for confluence and gdrive if not enabled in settings
* fix partial func
* fix confluence basic test
* add test for skipping/allowing images
* review comments
* skip allow images test
* mock function using the db
* mock at the proper level
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* sanitize llm keys and handle updates properly
* fix llm provider testing
* fix test
* mypy
* fix default model editing
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Checkpointed Jira connector
* nit
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* typing improvements and test fixes
* cleaner typing
* remove default because it is from the future
* mypy
* Address EL comments
---------
Co-authored-by: evan-danswer <evan@danswer.ai>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* work in progress
* work in progress
* WIP
* refactor, use inline attachment for image (base64 encoding doesn't work)
* pretty sure this belongs behind a multi_tenant check
* code review / refactor
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* remove title for slack
* initial working code
* simplification
* improvements
* name change to information_content_model
* avoid boost_score > 1.0
* nit
* EL comments and improvements
Improvements:
- proper import of information content model from cache or HF
- warm up for information content model
Other:
- EL PR review comments
* nit
* requirements version update
* fixed docker file
* new home for model_server configs
* default off
* small updates
* YS comments - pt 1
* renaming to chunk_boost & chunk table def
* saving and deleting chunk stats in new table
* saving and updating chunk stats
* improved dict score update
* create columns for individual boost factors
* RK comments
* Update migration
* manual import reordering
* fix oauth downloading and size limits in confluence
* bump black to get past corrupt hash
* try working around another corrupt package
* fix raw_bytes
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* rename agent test script to prevent pytest autodiscovery
* first cut
* fix log message
* fix up typing
* add a sample test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* functional initial auth modal
* k
* k
* k
* looking good
* k
* k
* k
* k
* update
* k
* k
* misc bunch
* improvements
* k
* address comments
* k
* nit
* update
* k
* early work in progress
* rename utility script
* move actual data seeding to a shareable function
* add test
* make the test pass with the fix
* fix comment
* slight improvements and notes to query history and seeding
* update test
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* add ingress for api and web
* helm setup docs
* add letsencrypt. close blocks
* use pathType ImplementationSpecific as Prefix is deprecated
* fix backend labels. configure nginx routes. update annotations
* fix linting
---------
Co-authored-by: Sajjad Anwar <sajjadkm@gmail.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* early work in progress
* rename utility script
* move actual data seeding to a shareable function
* add test
* make the test pass with the fix
* fix comment
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* * Replaces Amazon and Anthropic icons with versions better suited to both Dark and Light modes;
* Adds icon for DeepSeek;
* Simplify logic on icon selection;
* Adds entries for Phi-4, Claude 3.7, Ministral and Gemini 2.0 models
* nit
* k
* k
---------
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
* Update text embedding model to version 005 and enhance embedding retrieval process
* re
* Fix formatting issues
* Add support for Bedrock reranking provider and AWS credentials handling
* fix: improve AWS key format validation and error messages
* Fix vertex embedding model crash
* feat: add environment template for local development setup
* Add display name for Claude 3.7 Sonnet model
* Add display names for Gemini 2.0 models and update Claude 3.7 Sonnet entry
* Fix ruff errors by ensuring lines are within 130 characters
* revert to currently default onyx browser settings
* add / fix boto requirements
---------
Co-authored-by: ferdinand loesch <f.loesch@sportradar.com>
Co-authored-by: Ferdinand Loesch <ferdinandloesch@me.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* fix blowing up the entire task on exception and trying to reuse an invalid db session
* list comprehension
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
2025-03-04 00:57:27 +00:00
1509 changed files with 114317 additions and 31828 deletions
Some files were not shown because too many files have changed in this diff