Mirror of https://github.com/onyx-dot-app/onyx.git (synced 2026-02-16 23:35:46 +00:00)

Comparing 1 commit: experiment...pdf_fix (commit 03dfa0fcc0)

Changed file: README.md (64 lines)
@@ -30,30 +30,26 @@ Keep knowledge and access controls sync-ed across over 40 connectors like Google

Create custom AI agents with unique prompts, knowledge, and actions that the agents can take.

Onyx can be deployed securely anywhere and at any scale - on a laptop, on-premise, or in the cloud.

<h3>Feature Highlights</h3>

**Deep research over your team's knowledge:**

https://private-user-images.githubusercontent.com/32520769/414509312-48392e83-95d0-4fb5-8650-a396e05e0a32.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk5Mjg2MzYsIm5iZiI6MTczOTkyODMzNiwicGF0aCI6Ii8zMjUyMDc2OS80MTQ1MDkzMTItNDgzOTJlODMtOTVkMC00ZmI1LTg2NTAtYTM5NmUwNWUwYTMyLm1wND9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE5VDAxMjUzNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWFhMzk5Njg2Y2Y5YjFmNDNiYTQ2YzM5ZTg5YWJiYTU2NWMyY2YwNmUyODE2NWUxMDRiMWQxZWJmODI4YTA0MTUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.a9D8A0sgKE9AoaoE-mfFbJ6_OKYeqaf7TZ4Han2JfW8

**Use Onyx as a secure AI Chat with any LLM:**

![Onyx chat demo](https://github.com/user-attachments/assets/f7d11d35-ff90-43bd-94a1-4e63c5b1d982)

**Easily set up connectors to your apps:**

![Onyx connector demo](https://github.com/user-attachments/assets/c72c8d28-c61d-4d6c-9047-95bb831811a7)

**Access Onyx where your team already works:**

![Onyx bot demo](https://github.com/user-attachments/assets/5f94dbe4-15be-4a3c-8b47-c6bb81d47d01)

## Deployment

**To try it out for free and get started in seconds, check out [Onyx Cloud](https://cloud.onyx.app/signup)**.

Onyx can also be run locally (even on a laptop) or deployed on a virtual machine with a single
@@ -62,23 +58,23 @@ Onyx can also be run locally (even on a laptop) or deployed on a virtual machine

We also have built-in support for high-availability/scalable deployment on Kubernetes.
References [here](https://github.com/onyx-dot-app/onyx/tree/main/deployment).

## 🔍 Other Notable Benefits of Onyx

- Custom deep learning models for indexing and inference, available only through Onyx and improved by learning from user feedback.
- Flexible security features like SSO (OIDC/SAML/OAuth2), RBAC, encryption of credentials, etc.
- Knowledge curation features like document sets, query history, usage analytics, etc.
- Scalable deployment options tested up to many tens of thousands of users and hundreds of millions of documents.

## 🚧 Roadmap

- New methods in information retrieval (StructRAG, LightGraphRAG, etc.)
- Personalized Search
- Organizational understanding and the ability to locate and suggest experts from your team.
- Code Search
- SQL and structured query support

## 🔌 Connectors

Keep knowledge and access in sync across 40+ connectors:

- Google Drive
@@ -99,19 +95,65 @@ Keep knowledge and access in sync across 40+ connectors:

See the full list [here](https://docs.onyx.app/connectors).

## 📚 Licensing

There are two editions of Onyx:

- Onyx Community Edition (CE) is freely available under the MIT Expat license. Simply follow the Deployment guide above.
- Onyx Enterprise Edition (EE) includes extra features that are primarily useful for larger organizations.
  For feature details, check out [our website](https://www.onyx.app/pricing).

To try the Onyx Enterprise Edition:

1. Check out [Onyx Cloud](https://cloud.onyx.app/signup).
2. For self-hosting the Enterprise Edition, contact us at [founders@onyx.app](mailto:founders@onyx.app) or book a call with us on our [Cal](https://cal.com/team/onyx/founders).

## 💡 Contributing

Looking to contribute? Please check out the [Contribution Guide](CONTRIBUTING.md) for more details.
# YC Company Twitter Scraper

A script that scrapes YC company pages and extracts Twitter/X.com links.

## Requirements

- Python 3.7+
- Playwright

## Installation

1. Install the required packages:

```
pip install -r requirements.txt
```

2. Install Playwright browsers:

```
playwright install
```

## Usage

Run the script with default settings:

```
python scrape_yc_twitter.py
```

This will scrape the YC companies from recent batches (W23, S23, S24, F24, S22, W22) and save the Twitter links to `twitter_links.txt`.

### Custom URL and Output

```
python scrape_yc_twitter.py --url "https://www.ycombinator.com/companies?batch=W24" --output "w24_twitter.txt"
```

## How it works

1. Navigates to the specified YC companies page
2. Scrolls down to load all company cards
3. Extracts links to individual company pages
4. Visits each company page and extracts Twitter/X.com links
5. Saves the results to a text file
New file: YC_SCRAPER_README.md (45 lines)

@@ -0,0 +1,45 @@
# YC Company Twitter Scraper

A script that scrapes YC company pages and extracts Twitter/X.com links.

## Requirements

- Python 3.7+
- Playwright

## Installation

1. Install the required packages:

```
pip install -r requirements.txt
```

2. Install Playwright browsers:

```
playwright install
```

## Usage

Run the script with default settings:

```
python scrape_yc_twitter.py
```

This will scrape the YC companies from recent batches (W23, S23, S24, F24, S22, W22) and save the Twitter links to `twitter_links.txt`.

### Custom URL and Output

```
python scrape_yc_twitter.py --url "https://www.ycombinator.com/companies?batch=W24" --output "w24_twitter.txt"
```

## How it works

1. Navigates to the specified YC companies page
2. Scrolls down to load all company cards
3. Extracts links to individual company pages
4. Visits each company page and extracts Twitter/X.com links
5. Saves the results to a text file
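Step 4 above (pulling Twitter/X.com links out of a company page) can be sketched as a small, self-contained helper. This is a hypothetical illustration, not the script itself: it assumes the links appear as plain `href` attributes in the rendered HTML, whereas the real script first renders the page with Playwright.

```python
import re

# Matches href values pointing at twitter.com or x.com (with optional
# www./mobile. prefixes), as seen in the scraped data.
TWITTER_RE = re.compile(
    r'href="(https?://(?:www\.|mobile\.)?(?:twitter|x)\.com/[^"]+)"'
)

def extract_twitter_links(html: str) -> list:
    """Return unique Twitter/X.com links in document order."""
    seen = set()
    links = []
    for url in TWITTER_RE.findall(html):
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

For example, `extract_twitter_links('<a href="https://x.com/foo">Follow</a>')` returns `["https://x.com/foo"]`, and duplicate hrefs are collapsed to one entry.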
@@ -15,6 +15,7 @@ from pathlib import Path
from typing import Any
from typing import IO
from typing import NamedTuple
+from typing import Optional

import chardet
import docx  # type: ignore
@@ -568,7 +569,9 @@ def extract_text_and_images(
        return ExtractionResult(text_content="", embedded_images=[], metadata={})


-def convert_docx_to_txt(file: UploadFile, file_store: FileStore) -> str:
+def convert_docx_to_txt(
+    file: UploadFile, file_store: FileStore, file_path: Optional[str] = None
+) -> str:
    """
    Helper to convert docx to a .txt file in the same filestore.
    """
@@ -580,7 +583,8 @@ def convert_docx_to_txt(file: UploadFile, file_store: FileStore) -> str:
    all_paras = [p.text for p in doc.paragraphs]
    text_content = "\n".join(all_paras)

-    text_file_name = docx_to_txt_filename(file.filename or f"docx_{uuid.uuid4()}")
+    file_name = file.filename or f"docx_{uuid.uuid4()}"
+    text_file_name = docx_to_txt_filename(file_path if file_path else file_name)
    file_store.save_file(
        file_name=text_file_name,
        content=BytesIO(text_content.encode("utf-8")),
@@ -593,3 +597,27 @@ def convert_docx_to_txt(file: UploadFile, file_store: FileStore) -> str:

def docx_to_txt_filename(file_path: str) -> str:
    return file_path.rsplit(".", 1)[0] + ".txt"
+
+
+def convert_pdf_to_txt(file: UploadFile, file_store: FileStore, file_path: str) -> str:
+    """
+    Helper to convert PDF to a .txt file in the same filestore.
+    """
+    file.file.seek(0)
+
+    # Extract text from the PDF
+    text_content, _, _ = read_pdf_file(file.file)
+
+    text_file_name = pdf_to_txt_filename(file_path)
+    file_store.save_file(
+        file_name=text_file_name,
+        content=BytesIO(text_content.encode("utf-8")),
+        display_name=file.filename,
+        file_origin=FileOrigin.CONNECTOR,
+        file_type="text/plain",
+    )
+    return text_file_name
+
+
+def pdf_to_txt_filename(file_path: str) -> str:
+    return file_path.rsplit(".", 1)[0] + ".txt"
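Both filename helpers rely on `str.rsplit(".", 1)`, which splits only at the last dot, so dotted directory names survive intact. A minimal standalone sketch (the function body mirrors the helper in the diff above):

```python
def pdf_to_txt_filename(file_path: str) -> str:
    # Swap only the final extension for ".txt"; rsplit with maxsplit=1
    # splits from the right, so earlier dots in the path are untouched.
    return file_path.rsplit(".", 1)[0] + ".txt"

print(pdf_to_txt_filename("uploads/v1.2/report.pdf"))  # uploads/v1.2/report.txt
```

Note that a path with no extension still gets `.txt` appended, since `rsplit` then returns the whole string as its only element.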
@@ -100,6 +100,7 @@ from onyx.db.models import UserGroup__ConnectorCredentialPair
from onyx.db.search_settings import get_current_search_settings
from onyx.db.search_settings import get_secondary_search_settings
from onyx.file_processing.extract_file_text import convert_docx_to_txt
+from onyx.file_processing.extract_file_text import convert_pdf_to_txt
from onyx.file_store.file_store import get_default_file_store
from onyx.key_value_store.interface import KvKeyNotFoundError
from onyx.redis.redis_connector import RedisConnector
@@ -435,8 +436,16 @@ def upload_files(files: list[UploadFile], db_session: Session) -> FileUploadResp
        if file.content_type and file.content_type.startswith(
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
        ):
-            file_path = convert_docx_to_txt(file, file_store)
-            deduped_file_paths.append(file_path)
+            file_path = os.path.join(str(uuid.uuid4()), cast(str, file.filename))
+            text_file_path = convert_docx_to_txt(file, file_store)
+            deduped_file_paths.append(text_file_path)
            continue

+        # Special handling for PDF files - only store the plaintext version
+        if file.content_type and file.content_type.startswith("application/pdf"):
+            file_path = os.path.join(str(uuid.uuid4()), cast(str, file.filename))
+            text_file_path = convert_pdf_to_txt(file, file_store, file_path)
+            deduped_file_paths.append(text_file_path)
+            continue

        # Default handling for all other file types
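The upload branch above dispatches on MIME content type: DOCX and PDF uploads are converted to plaintext, everything else falls through to default handling. A hypothetical, simplified sketch of just that dispatch logic (`pick_converter` is an illustrative name, not a function in the codebase):

```python
DOCX_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

def pick_converter(content_type: str) -> str:
    """Route an upload to a converter based on its MIME type prefix."""
    if content_type.startswith(DOCX_TYPE):
        return "docx"
    if content_type.startswith("application/pdf"):
        return "pdf"
    return "default"
```

Using `startswith` rather than equality mirrors the diff: it tolerates MIME parameters such as a trailing `; charset=...`.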
New file: company_links.csv (529 lines)

@@ -0,0 +1,529 @@
Company,Link
1849-bio,https://x.com/1849bio
1stcollab,https://twitter.com/ycombinator
abundant,https://x.com/abundant_labs
activepieces,https://mobile.twitter.com/mabuaboud
acx,https://twitter.com/ycombinator
adri-ai,https://twitter.com/darshitac_
affil-ai,https://twitter.com/ycombinator
agave,https://twitter.com/moyicat
aglide,https://twitter.com/pdmcguckian
ai-2,https://twitter.com/the_yuppy
ai-sell,https://x.com/liuzjerry
airtrain-ai,https://twitter.com/neutralino1
aisdr,https://twitter.com/YuriyZaremba
alex,https://x.com/DanielEdrisian
alga-biosciences,https://twitter.com/algabiosciences
alguna,https://twitter.com/aleks_djekic
alixia,https://twitter.com/ycombinator
aminoanalytica,https://x.com/lilwuuzivert
anara,https://twitter.com/naveedjanmo
andi,https://twitter.com/MiamiAngela
andoria,https://x.com/dbudimane
andromeda-surgical,https://twitter.com/nickdamian0
anglera,https://twitter.com/ycombinator
angstrom-ai,https://twitter.com/JaviAC7
ankr-health,https://twitter.com/Ankr_us
apoxy,https://twitter.com/ycombinator
apten,https://twitter.com/dho1357
aragorn-ai,https://twitter.com/ycombinator
arc-2,https://twitter.com/DarkMirage
archilabs,https://twitter.com/ycombinator
arcimus,https://twitter.com/husseinsyed73
argovox,https://www.argovox.com/
artemis-search,https://twitter.com/ycombinator
artie,https://x.com/JacquelineSYC19
asklio,https://twitter.com/butterflock
atlas-2,https://twitter.com/jobryan
attain,https://twitter.com/aamir_hudda
autocomputer,https://twitter.com/madhavsinghal_
automat,https://twitter.com/lucas0choa
automorphic,https://twitter.com/sandkoan
autopallet-robotics,https://twitter.com/ycombinator
autumn-labs,https://twitter.com/ycombinator
aviary,https://twitter.com/ycombinator
azuki,https://twitter.com/VamptVo
banabo,https://twitter.com/ycombinator
baseline-ai,https://twitter.com/ycombinator
baserun,https://twitter.com/effyyzhang
benchify,https://www.x.com/maxvonhippel
berry,https://twitter.com/annchanyt
bifrost,https://twitter.com/0xMysterious
bifrost-orbital,https://x.com/ionkarbatra
biggerpicture,https://twitter.com/ycombinator
biocartesian,https://twitter.com/ycombinator
bland-ai,https://twitter.com/zaygranet
blast,https://x.com/useblast
blaze,https://twitter.com/larfy_rothwell
bluebirds,https://twitter.com/RohanPunamia
bluedot,https://twitter.com/selinayfilizp
bluehill-payments,https://twitter.com/HimanshuMinocha
blyss,https://twitter.com/blyssdev
bolto,https://twitter.com/mrinalsingh02?lang=en
botcity,https://twitter.com/lorhancaproni
boundo,https://twitter.com/ycombinator
bramble,https://x.com/meksikanpijha
bricksai,https://twitter.com/ycombinator
broccoli-ai,https://twitter.com/abhishekjain25
bronco-ai,https://twitter.com/dluozhang
bunting-labs,https://twitter.com/normconstant
byterat,https://twitter.com/penelopekjones_
callback,https://twitter.com/ycombinator
cambio-2,https://twitter.com/ycombinator
camfer,https://x.com/AryaBastani
campfire-2,https://twitter.com/ycombinator
campfire-applied-ai-company,https://twitter.com/siamakfr
candid,https://x.com/kesavkosana
canvas,https://x.com/essamsleiman
capsule,https://twitter.com/kelsey_pedersen
cardinal,http://twitter.com/nadavwiz
cardinal-gray,https://twitter.com/ycombinator
cargo,https://twitter.com/aureeaubert
cartage,https://twitter.com/ycombinator
cashmere,https://twitter.com/shashankbuilds
cedalio,https://twitter.com/LucianaReznik
cekura-2,https://x.com/tarush_agarwal_
central,https://twitter.com/nilaymod
champ,https://twitter.com/ycombinator
cheers,https://twitter.com/ycombinator
chequpi,https://twitter.com/sudshekhar02
chima,https://twitter.com/nikharanirghin
cinapse,https://www.twitter.com/hgphillipsiv
ciro,https://twitter.com/davidjwiner
clara,https://x.com/levinsonjon
cleancard,https://twitter.com/_tom_dot_com
clearspace,https://twitter.com/rbfasho
cobbery,https://twitter.com/Dan_The_Goodman
codeviz,https://x.com/liam_prev
coil-inc,https://twitter.com/ycombinator
coldreach,https://twitter.com/ycombinator
combinehealth,https://twitter.com/ycombinator
comfy-deploy,https://twitter.com/nicholaskkao
complete,https://twitter.com/ranimavram
conductor-quantum,https://twitter.com/BrandonSeverin
conduit,https://twitter.com/ycombinator
continue,https://twitter.com/tylerjdunn
contour,https://twitter.com/ycombinator
coperniq,https://twitter.com/abdullahzandani
corgea,https://twitter.com/asadeddin
corgi,https://twitter.com/nico_laqua?lang=en
corgi-labs,https://twitter.com/ycombinator
coris,https://twitter.com/psvinodh
cosine,https://twitter.com/AlistairPullen
courtyard-io,https://twitter.com/lejeunedall
coverage-cat,https://twitter.com/coveragecats
craftos,https://twitter.com/wa3l
craniometrix,https://craniometrix.com
ctgt,https://twitter.com/cyrilgorlla
curo,https://x.com/EnergizedAndrew
dagworks-inc,https://twitter.com/dagworks
dart,https://twitter.com/milad3malek
dashdive,https://twitter.com/micahawheat
dataleap,https://twitter.com/jh_damm
decisional-ai,https://x.com/groovetandon
decoda-health,https://twitter.com/ycombinator
deepsilicon,https://x.com/abhireddy2004
delfino-ai,https://twitter.com/ycombinator
demo-gorilla,https://twitter.com/ycombinator
demospace,https://www.twitter.com/nick_fiacco
dench-com,https://www.twitter.com/markrachapoom
denormalized,https://twitter.com/IAmMattGreen
dev-tools-ai,https://twitter.com/ycombinator
diffusion-studio,https://x.com/MatthiasRuiz22
digitalcarbon,https://x.com/CtrlGuruDelete
dimely,https://x.com/UseDimely
disputeninja,https://twitter.com/legitmaxwu
diversion,https://twitter.com/sasham1
dmodel,https://twitter.com/dmooooon
doctor-droid,https://twitter.com/TheBengaluruGuy
dodo,https://x.com/dominik_moehrle
dojah-inc,https://twitter.com/ololaday
domu-technology-inc,https://twitter.com/ycombinator
dr-treat,https://twitter.com/rakeshtondon
dreamrp,https://x.com/dreamrpofficial
drivingforce,https://twitter.com/drivingforcehq
dynamo-ai,https://twitter.com/dynamo_fl
edgebit,https://twitter.com/robszumski
educato-ai,https://x.com/FelixGabler
electric-air-2,https://twitter.com/JezOsborne
ember,https://twitter.com/hsinleiwang
ember-robotics,https://twitter.com/ycombinator
emergent,https://twitter.com/mukundjha
emobi,https://twitter.com/ycombinator
entangl,https://twitter.com/Shapol_m
envelope,https://twitter.com/joshuakcockrell
et-al,https://twitter.com/ycombinator
eugit-therapeutics,http://www.eugittx.com
eventual,https://twitter.com/sammy_sidhu
evoly,https://twitter.com/ycombinator
expand-ai,https://twitter.com/timsuchanek
ezdubs,https://twitter.com/PadmanabhanKri
fabius,https://twitter.com/adayNU
fazeshift,https://twitter.com/ycombinator
felafax,https://twitter.com/ThatNithin
fetchr,https://twitter.com/CalvinnChenn
fiber-ai,https://twitter.com/AdiAgashe
ficra,https://x.com/ficra_ai
fiddlecube,https://twitter.com/nupoor_neha
finic,https://twitter.com/jfan001
finta,https://www.twitter.com/andywang
fintool,https://twitter.com/nicbstme
finvest,https://twitter.com/shivambharuka
firecrawl,https://x.com/ericciarla
firstwork,https://twitter.com/techie_Shubham
fixa,https://x.com/jonathanzliu
flair-health,https://twitter.com/adivawhocodes
fleek,https://twitter.com/ycombinator
fleetworks,https://twitter.com/ycombinator
flike,https://twitter.com/yajmch
flint-2,https://twitter.com/hungrysohan
floworks,https://twitter.com/sarthaks92
focus-buddy,https://twitter.com/yash14700/
forerunner-ai,https://x.com/willnida0
founders,https://twitter.com/ycombinator
foundry,https://x.com/FoundryAI_
freestyle,https://x.com/benswerd
fresco,https://twitter.com/ycombinator
friday,https://x.com/AllenNaliath
frigade,https://twitter.com/FrigadeHQ
futureclinic,https://twitter.com/usamasyedmd
gait,https://twitter.com/AlexYHsia
galini,https://twitter.com/ycombinator
gauge,https://twitter.com/the1024th
gecko-security,https://x.com/jjjutla
general-analysis,https://twitter.com/ycombinator
giga-ml,https://twitter.com/varunvummadi
glade,https://twitter.com/ycombinator
glass-health,https://twitter.com/dereckwpaul
goodfin,https://twitter.com/ycombinator
grai,https://twitter.com/ycombinator
greenlite,https://twitter.com/will_lawrenceTO
grey,https://www.twitter.com/kingidee
happyrobot,https://twitter.com/pablorpalafox
haystack-software,https://x.com/AkshaySubr42403
health-harbor,https://twitter.com/AlanLiu96
healthspark,https://twitter.com/stephengrinich
hedgehog-2,https://twitter.com/ycombinator
helicone,https://twitter.com/justinstorre
heroui,https://x.com/jrgarciadev
hoai,https://twitter.com/ycombinator
hockeystack,https://twitter.com/ycombinator
hokali,https://twitter.com/hokalico
homeflow,https://twitter.com/ycombinator
hubble-network,https://twitter.com/BenWild10
humand,https://twitter.com/nicolasbenenzon
humanlayer,https://twitter.com/dexhorthy
hydra,https://twitter.com/JoeSciarrino
hyperbound,https://twitter.com/sguduguntla
ideate-xyz,https://twitter.com/nomocodes
inbuild,https://twitter.com/TySharp_iB
indexical,https://twitter.com/try_nebula
industrial-next,https://twitter.com/ycombinator
infisical,https://twitter.com/matsiiako
inkeep,https://twitter.com/nickgomezc
inlet-2,https://twitter.com/inlet_ai
innkeeper,https://twitter.com/tejasybhakta
instant,https://twitter.com/JoeAverbukh
integrated-reasoning,https://twitter.com/d4r5c2
interlock,https://twitter.com/ycombinator
intryc,https://x.com/alexmarantelos?lang=en
invert,https://twitter.com/purrmin
iollo,https://twitter.com/daniel_gomari
jamble,https://twitter.com/ycombinator
joon-health,https://twitter.com/IsaacVanEaves
juicebox,https://twitter.com/davepaffenholz
julius,https://twitter.com/0interestrates
karmen,https://twitter.com/ycombinator
kenley,https://x.com/KenleyAI
keylika,https://twitter.com/buddhachaudhuri
khoj,https://twitter.com/debanjum
kite,https://twitter.com/DerekFeehrer
kivo-health,https://twitter.com/vaughnkoch
knowtex,https://twitter.com/CarolineCZhang
koala,https://twitter.com/studioseinstein?s=11
kopra-bio,https://x.com/AF_Haddad
kura,https://x.com/kura_labs
laminar,https://twitter.com/skull8888888888
lancedb,https://twitter.com/changhiskhan
latent,https://twitter.com/ycombinator
layerup,https://twitter.com/arnavbathla20
lazyeditor,https://twitter.com/jee_cash
ledgerup,https://twitter.com/josephrjohnson
lifelike,https://twitter.com/alecxiang1
lighthouz-ai,https://x.com/srijankedia
lightski,https://www.twitter.com/hansenq
ligo-biosciences,https://x.com/ArdaGoreci/status/1830744265007480934
line-build,https://twitter.com/ycombinator
lingodotdev,https://twitter.com/maxprilutskiy
linkgrep,https://twitter.com/linkgrep
linum,https://twitter.com/schopra909
livedocs,https://twitter.com/arsalanbashir
luca,https://twitter.com/LucaPricingHq
lumenary,https://twitter.com/vivekhaz
lune,https://x.com/samuelp4rk
lynx,https://twitter.com/ycombinator
magic-loops,https://twitter.com/jumploops
manaflow,https://twitter.com/austinywang
mandel-ai,https://twitter.com/shmkkr
martin,https://twitter.com/martinvoiceai
matano,https://twitter.com/AhmedSamrose
mdhub,https://twitter.com/ealamolda
mederva-health,http://twitter.com/sabihmir
medplum,https://twitter.com/ReshmaKhilnani
melty,https://x.com/charliebholtz
mem0,https://twitter.com/taranjeetio
mercator,https://www.twitter.com/ajdstein
mercoa,https://twitter.com/Sarora27
meru,https://twitter.com/rohanarora_
metalware,https://twitter.com/ryanchowww
metriport,https://twitter.com/dimagoncharov_
mica-ai,https://twitter.com/ycombinator
middleware,https://twitter.com/laduramvishnoi
midship,https://twitter.com/_kietay
mintlify,https://twitter.com/hanwangio
minusx,https://twitter.com/nuwandavek
miracle,https://twitter.com/ycombinator
miru-ml,https://twitter.com/armelwtalla
mito-health,https://twitter.com/teemingchew
mocha,https://twitter.com/nichochar
modern-realty,https://x.com/RIsanians
modulari-t,https://twitter.com/ycombinator
mogara,https://twitter.com/ycombinator
monterey-ai,https://twitter.com/chunonline
moonglow,https://twitter.com/leilavclark
moonshine,https://x.com/useMoonshine
moreta,https://twitter.com/ycombinator
mutable-ai,https://x.com/smahsramo
myria,https://twitter.com/reyflemings
nango,https://twitter.com/rguldener
nanograb,https://twitter.com/lauhoyeung
nara,https://twitter.com/join_nara
narrative,https://twitter.com/axitkhurana
nectar,https://twitter.com/AllenWang314
neosync,https://twitter.com/evisdrenova
nerve,https://x.com/fortress_build
networkocean,https://twitter.com/sammendel4
ngrow-ai,https://twitter.com/ycombinator
no-cap,https://x.com/nocapso
nowadays,https://twitter.com/ycombinator
numeral,https://www.twitter.com/mduvall_
obento-health,https://twitter.com/ycombinator
octopipe,https://twitter.com/abhishekray07
odo,https://twitter.com/ycombinator
ofone,https://twitter.com/ycombinator
onetext,http://twitter.com/jfudem
openfunnel,https://x.com/fenilsuchak
opensight,https://twitter.com/OpenSightAI
ora-ai,https://twitter.com/ryan_rl_phelps
orchid,https://twitter.com/ycombinator
origami-agents,https://x.com/fin465
outerbase,https://www.twitter.com/burcs
outerport,https://x.com/yongyuanxi
outset,https://twitter.com/AaronLCannon
overeasy,https://twitter.com/skyflylu
overlap,https://x.com/jbaerofficial
oway,https://twitter.com/owayinc
ozone,https://twitter.com/maxvwolff
pair-ai,https://twitter.com/ycombinator
palmier,https://twitter.com/ycombinator
panora,https://twitter.com/rflih_
parabolic,https://twitter.com/ycombinator
paragon-ai,https://twitter.com/ycombinator
parahelp,https://twitter.com/ankerbachryhl
parity,https://x.com/wilson_spearman
parley,https://twitter.com/ycombinator
patched,https://x.com/rohan_sood15
pearson-labs,https://twitter.com/ycombinator
pelm,https://twitter.com/ycombinator
penguin-ai,https://twitter.com/ycombinator
peoplebox,https://twitter.com/abhichugh
permitflow,https://twitter.com/ycombinator
permitportal,https://twitter.com/rgmazilu
persana-ai,https://www.twitter.com/tweetsreez
pharos,https://x.com/felix_brann
phind,https://twitter.com/michaelroyzen
phonely,https://x.com/phonely_ai
pier,https://twitter.com/ycombinator
pierre,https://twitter.com/fat
pinnacle,https://twitter.com/SeanRoades
pipeshift,https://x.com/FerraoEnrique
pivot,https://twitter.com/raimietang
planbase,https://twitter.com/ycombinator
plover-parametrics,https://twitter.com/ycombinator
plutis,https://twitter.com/kamil_m_ali
poka-labs,https://twitter.com/ycombinator
poly,https://twitter.com/Denizen_Kane
polymath-robotics,https://twitter.com/stefanesa
ponyrun,https://twitter.com/ycombinator
poplarml,https://twitter.com/dnaliu17
posh,https://twitter.com/PoshElectric
power-to-the-brand,https://twitter.com/ycombinator
primevault,https://twitter.com/prashantupd
prohostai,https://twitter.com/bilguunu
promptloop,https://twitter.com/PeterbMangan
propaya,https://x.com/PropayaOfficial
proper,https://twitter.com/kylemaloney_
proprise,https://twitter.com/kragerDev
protegee,https://x.com/kirthibanothu
pump-co,https://www.twitter.com/spndn07/
pumpkin,https://twitter.com/SamuelCrombie
pure,https://twitter.com/collectpure
pylon-2,https://x.com/marty_kausas
pyq-ai,https://twitter.com/araghuvanshi2
query-vary,https://twitter.com/DJFinetunes
rankai,https://x.com/rankai_ai
rastro,https://twitter.com/baptiste_cumin
reactwise,https://twitter.com/ycombinator
read-bean,https://twitter.com/maggieqzhang
readily,https://twitter.com/ycombinator
redouble-ai,https://twitter.com/pneumaticdill?s=21
refine,https://twitter.com/civanozseyhan
reflex,https://twitter.com/getreflex
reforged-labs,https://twitter.com/ycombinator
relace,https://twitter.com/ycombinator
relate,https://twitter.com/chrischae__
remade,https://x.com/Christos_antono
remy,https://twitter.com/ycombinator
remy-2,https://x.com/remysearch
rentflow,https://twitter.com/ycombinator
requestly,https://twitter.com/sachinjain024
resend,https://x.com/zenorocha
respaid,https://twitter.com/johnbanr
reticular,https://x.com/nithinparsan
retrofix-ai,https://twitter.com/danieldoesdev
revamp,https://twitter.com/getrevamp_ai
revyl,https://x.com/landseerenga
reworkd,https://twitter.com/asimdotshrestha
reworks,https://twitter.com/ycombinator
rift,https://twitter.com/FilipTwarowski
riskangle,https://twitter.com/ycombinator
riskcube,https://x.com/andrei_risk
rivet,https://twitter.com/nicholaskissel
riveter-ai,https://x.com/AGrillz
roame,https://x.com/timtqin
roforco,https://x.com/brain_xiang
rome,https://twitter.com/craigzLiszt
roomplays,https://twitter.com/criyaco
rosebud-biosciences,https://twitter.com/KitchenerWilson
rowboat-labs,https://twitter.com/segmenta
rubber-ducky-labs,https://twitter.com/alexandraj777
ruleset,https://twitter.com/LoganFrederick
ryvn,https://x.com/ryvnai
safetykit,https://twitter.com/ycombinator
sage-ai,https://twitter.com/akhilmurthy20
saldor,https://x.com/notblandjacob
salient,https://twitter.com/ycombinator
schemeflow,https://x.com/browninghere
sculpt,https://twitter.com/ycombinator
seals-ai,https://x.com/luismariogm
seis,https://twitter.com/TrevMcKendrick
sensei,https://twitter.com/ycombinator
sensorsurf,https://twitter.com/noahjepstein
sepal-ai,https://www.twitter.com/katqhu1
serial,https://twitter.com/Serialmfg
serif-health,https://www.twitter.com/mfrobben
serra,https://twitter.com/ycombinator
shasta-health,https://twitter.com/SrinjoyMajumdar
shekel-mobility,https://twitter.com/ShekelMobility
shortbread,https://twitter.com/ShortbreadAI
showandtell,https://twitter.com/ycombinator
sidenote,https://twitter.com/jclin22009
sieve,https://twitter.com/mokshith_v
silkchart,https://twitter.com/afakerele
simple-ai,https://twitter.com/catheryn_li
simplehash,https://twitter.com/Alex_Kilkka
simplex,https://x.com/simplexdata
simplifine,https://x.com/egekduman
sizeless,https://twitter.com/cornelius_einem
skyvern,https://x.com/itssuchintan
slingshot,https://twitter.com/ycombinator
snowpilot,https://x.com/snowpilotai
soff,https://x.com/BernhardHausle1
solum-health,https://twitter.com/ycombinator
sonnet,https://twitter.com/ycombinator
sophys,https://twitter.com/ycombinator
sorcerer,https://x.com/big_veech
soteri-skin,https://twitter.com/SoteriSkin
sphere,https://twitter.com/nrudder_
spine-ai,https://twitter.com/BudhkarAkshay
spongecake,https://twitter.com/ycombinator
spur,https://twitter.com/sneha8sivakumar
sre-ai,https://twitter.com/ycombinator
stably,https://x.com/JinjingLiang
stack-ai,https://twitter.com/bernaceituno
stellar,https://twitter.com/ycombinator
stormy-ai-autonomous-marketing-agent,https://twitter.com/karmedge/
strada,https://twitter.com/AmirProd1
stream,https://twitter.com/ycombinator
structured-labs,https://twitter.com/amruthagujjar
studdy,https://twitter.com/mike_lamma
subscriptionflow,https://twitter.com/KashifSaleemCEO
subsets,https://twitter.com/ycombinator
supercontrast,https://twitter.com/ycombinator
supertone,https://twitter.com/trysupertone
superunit,https://x.com/peter_marler
sweep,https://twitter.com/wwzeng1
syncly,https://x.com/synclyhq
synnax,https://x.com/Emilbon99
syntheticfi,https://x.com/SyntheticFi_SF
t3-chat-prev-ping-gg,https://twitter.com/t3dotgg
tableflow,https://twitter.com/mitchpatin
tai,https://twitter.com/Tragen_ai
tandem-2,https://x.com/Tandemspace
taxgpt,https://twitter.com/ChKashifAli
taylor-ai,https://twitter.com/brian_j_kim
teamout,https://twitter.com/ycombinator
tegon,https://twitter.com/harshithb4h
terminal,https://x.com/withterminal
theneo,https://twitter.com/robakid
theya,https://twitter.com/vikasch
thyme,https://twitter.com/ycombinator
tiny,https://twitter.com/ycombinator
tola,https://twitter.com/alencvisic
trainy,https://twitter.com/TrainyAI
trendex-we-tokenize-talent,https://twitter.com/ycombinator
trueplace,https://twitter.com/ycombinator
truewind,https://twitter.com/AlexLee611
trusty,https://twitter.com/trustyhomes
truva,https://twitter.com/gaurav_aggarwal
tuesday,https://twitter.com/kai_jiabo_feng
twenty,https://twitter.com/twentycrm
twine,https://twitter.com/anandvalavalkar
two-dots,https://twitter.com/HensonOrser1
typa,https://twitter.com/sounhochung
typeless,https://twitter.com/ycombinator
|
||||
unbound,https://twitter.com/ycombinator
|
||||
undermind,https://twitter.com/UndermindAI
|
||||
unison,https://twitter.com/maxim_xyz
|
||||
unlayer,https://twitter.com/adeelraza
|
||||
unstatiq,https://twitter.com/NishSingaraju
|
||||
unusual,https://x.com/willwjack
|
||||
upfront,https://twitter.com/KnowUpfront
|
||||
vaero,https://twitter.com/ycombinator
|
||||
vango-ai,https://twitter.com/vango_ai
|
||||
variance,https://twitter.com/karinemellata
|
||||
variant,https://twitter.com/bnj
|
||||
velos,https://twitter.com/OscarMHBF
|
||||
velt,https://twitter.com/rakesh_goyal
|
||||
vendra,https://x.com/vendraHQ
|
||||
vera-health,https://x.com/_maximall
|
||||
verata,https://twitter.com/ycombinator
|
||||
versive,https://twitter.com/getversive
|
||||
vessel,https://twitter.com/vesselapi
|
||||
vibe,https://twitter.com/ycombinator
|
||||
videogen,https://twitter.com/ycombinator
|
||||
vigilant,https://twitter.com/BenShumaker_
|
||||
vitalize-care,https://twitter.com/nikhiljdsouza
|
||||
viva-labs,https://twitter.com/vishal_the_jain
|
||||
vizly,https://twitter.com/vizlyhq
|
||||
vly-ai-2,https://x.com/victorxheng
|
||||
vocode,https://twitter.com/kianhooshmand
|
||||
void,https://x.com/parel_es
|
||||
voltic,https://twitter.com/ycombinator
|
||||
vooma,https://twitter.com/jessebucks
|
||||
wingback,https://twitter.com/tfriehe_
|
||||
winter,https://twitter.com/AzianMike
|
||||
wolfia,https://twitter.com/narenmano
|
||||
wordware,https://twitter.com/kozerafilip
|
||||
zenbase-ai,https://twitter.com/CyrusOfEden
|
||||
zeropath,https://x.com/zeropathAI
|
||||
29 dedup_links.py Normal file
@@ -0,0 +1,29 @@
import csv

companies = {}

with open("twitter_links.txt", "r") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue

        parts = line.split(":", 1)
        if len(parts) != 2:
            continue

        company, url = parts
        url = url.strip()

        # Store only the first URL for each company
        if company not in companies:
            companies[company] = url

# Write to CSV
with open("company_links.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Company", "Link"])
    for company, url in sorted(companies.items()):
        writer.writerow([company, url])

print(f"Deduped {len(companies)} companies to company_links.csv")
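The first-occurrence-wins dedup rule implemented above can be checked in isolation. A minimal sketch, assuming the `company: url` input shape that dedup_links.py parses; the sample lines are hypothetical, not taken from the real twitter_links.txt:

```python
# Same rule as dedup_links.py: split each "company: url" line once
# on the first colon, keep only the first URL seen per company.
lines = [
    "examplecorp: https://x.com/examplecorp",
    "examplecorp: https://x.com/other_handle",  # later duplicate is ignored
    "sampleco: https://twitter.com/sampleco",
]

companies = {}
for line in lines:
    company, url = line.split(":", 1)
    if company not in companies:  # first occurrence wins
        companies[company] = url.strip()

print(companies)
# {'examplecorp': 'https://x.com/examplecorp', 'sampleco': 'https://twitter.com/sampleco'}
```

Splitting with `maxsplit=1` is what keeps the colons inside `https://` intact: only the first colon (after the company slug) delimits the two fields.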
1 requirements.txt Normal file
@@ -0,0 +1 @@
playwright==1.42.0
161 scrape_yc_twitter.py Normal file
@@ -0,0 +1,161 @@
#!/usr/bin/env python3
import argparse
import asyncio

from playwright.async_api import async_playwright


async def scrape_twitter_links(url, headless=False):
    async with async_playwright() as p:
        # Non-headless tends to scroll infinite lists more reliably
        browser = await p.chromium.launch(headless=headless)
        page = await browser.new_page(viewport={"width": 1280, "height": 800})

        print(f"Navigating to main page: {url}")
        await page.goto(url)
        await page.wait_for_load_state("networkidle")

        # Aggressive scrolling to load all company cards
        company_links = set()  # Use a set for automatic deduplication
        no_new_links_count = 0

        print("Starting to scroll and collect company links...")

        # First, try scrolling to the very bottom
        await scroll_to_bottom(page)

        # Then collect all links
        prev_size = 0
        while True:
            # Get all company links
            elements = await page.query_selector_all('a[href^="/companies/"]')

            for element in elements:
                href = await element.get_attribute("href")
                if href and "/companies/" in href and "?" not in href:
                    company_url = f"https://www.ycombinator.com{href}"
                    company_links.add(company_url)

            current_size = len(company_links)
            print(f"Found {current_size} unique company links so far...")

            if current_size == prev_size:
                no_new_links_count += 1
                if no_new_links_count >= 3:
                    print("No new links found after multiple attempts, ending scroll.")
                    break
            else:
                no_new_links_count = 0

            prev_size = current_size

            # Try to click "Load More" button if it exists
            try:
                load_more = await page.query_selector('button:has-text("Load More")')
                if load_more:
                    await load_more.click()
                    print("Clicked 'Load More' button")
                    await page.wait_for_timeout(3000)
                    await scroll_to_bottom(page)
                    continue
            except Exception as e:
                print(f"Error clicking Load More: {str(e)}")

            # Scroll more
            try:
                await scroll_to_bottom(page)
            except Exception as e:
                print(f"Error scrolling: {str(e)}")
                break

        print(f"Found {len(company_links)} total unique company links after scrolling")

        # Visit each company page and extract Twitter links
        twitter_data = []

        for i, company_url in enumerate(sorted(company_links)):
            print(f"Processing company {i+1}/{len(company_links)}: {company_url}")
            try:
                await page.goto(company_url)
                await page.wait_for_load_state("networkidle")

                # Extract company name from URL
                company_name = company_url.split("/")[-1]

                # Find all links on the page
                all_links = await page.query_selector_all("a")
                twitter_links = []

                for link in all_links:
                    href = await link.get_attribute("href")
                    if href and ("twitter.com" in href or "x.com" in href):
                        twitter_links.append(href)

                if twitter_links:
                    for twitter_link in twitter_links:
                        twitter_data.append(f"{company_name}: {twitter_link}")
                else:
                    twitter_data.append(f"{company_name}: No Twitter/X link found")

            except Exception as e:
                print(f"Error processing {company_url}: {str(e)}")

        await browser.close()
        return twitter_data


async def scroll_to_bottom(page):
    """Aggressively scroll to the bottom of the page."""
    print("Scrolling to bottom...")

    # Scroll to the bottom and wait for lazy-loaded content
    await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    await page.wait_for_timeout(2000)

    # Additional scrolls for extra measure
    for _ in range(3):
        await page.keyboard.press("End")
        await page.wait_for_timeout(500)


async def main():
    parser = argparse.ArgumentParser(
        description="Scrape Twitter links from YC company pages"
    )
    parser.add_argument(
        "--url",
        default="https://www.ycombinator.com/companies?batch=W23&batch=S23&batch=S24&batch=F24&batch=S22&batch=W22&query=San%20Francisco",
        help="URL to scrape (default: YC companies from recent batches)",
    )
    parser.add_argument(
        "--output",
        default="twitter_links.txt",
        help="Output file name (default: twitter_links.txt)",
    )
    parser.add_argument(
        "--headless", action="store_true", help="Run in headless mode (default: False)"
    )

    args = parser.parse_args()

    # Pass the flag through; previously it was parsed but never used
    twitter_links = await scrape_twitter_links(args.url, headless=args.headless)

    # Save to file
    with open(args.output, "w") as f:
        f.write("\n".join(twitter_links))

    print(f"Saved {len(twitter_links)} results to {args.output}")


if __name__ == "__main__":
    asyncio.run(main())
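The href filter inside the scroll loop above (accept only clean `/companies/<slug>` paths, skip query-string variants, prefix the YC origin) is pure string logic and can be exercised without a browser. A small sketch; the sample hrefs are hypothetical:

```python
# Same predicate as in scrape_twitter_links: keep /companies/ paths
# without query strings, build absolute URLs, dedupe via a set.
def to_company_url(href):
    if href and "/companies/" in href and "?" not in href:
        return f"https://www.ycombinator.com{href}"
    return None

hrefs = ["/companies/examplecorp", "/companies/examplecorp?tab=jobs", "/about", None]
company_links = {u for u in map(to_company_url, hrefs) if u}
print(sorted(company_links))
# ['https://www.ycombinator.com/companies/examplecorp']
```

Rejecting hrefs containing `?` is what keeps per-company tab links (jobs, news, etc.) from producing duplicate entries for the same slug.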
1419 twitter_links.txt Normal file
File diff suppressed because it is too large
@@ -3018,7 +3018,11 @@ export function ChatPage({
               currentAlternativeAssistant
             }
             messageId={message.messageId}
-            content={message.message}
+            content={
+              userFiles
+                ? message.message
+                : "message.message"
+            }
             files={message.files}
             query={
               messageHistory[i]?.query || undefined
@@ -508,7 +508,11 @@ export const AIMessage = ({
          userKnowledgeFiles={userKnowledgeFiles}
        />
      )}

      {userKnowledgeFiles ? (
        <div className="h-10 w-10 rounded-full bg-black" />
      ) : (
        <div className="h-10 w-10 rounded-full bg-red-400" />
      )}
      {!userKnowledgeFiles &&
        toolCall &&
        !TOOLS_WITH_CUSTOM_HANDLING.includes(