This repo houses the code for the HubSpot DevRel Labs LLMs.txt File Generator, a project that generates llms.txt files from website content. It is built on the HubSpot Projects framework so that you can use the app in your HubSpot account.
There are a few things that must be set up before you can make use of this getting started project.
- You must have an active unified HubSpot developer account.
- You must have the HubSpot CLI installed and set up.
- You must have a Vercel account to set up the serverless function.
- You must have Node.js installed to run the local version.
- You must have npm installed to run the local version.
This repo houses both a local Node.js version and a Vercel serverless function version. The Node.js version is located in the local-node-version folder, and the Vercel serverless function is located in the vercel-serverless folder.
This project is also a HubSpot Projects version 2025.2 project that you can upload to your HubSpot account. It uses static auth, which means it can be installed in a single portal.
The HubSpot CLI lets you run this project locally so that you can test and iterate quickly. To get started, run this HubSpot CLI command in the hubspot-project directory and follow the prompts:
hs project dev
Upload the project to your HubSpot account using either the HubSpot CLI or the HubSpot developer MCP server. For more information on how to use the HubSpot project, please refer to the HubSpot Project README.md file.
Then deploy the serverless function to Vercel, copy its API endpoint, and update the endpoint in your app settings TSX file. For more information on how to use the Vercel serverless function, please refer to the Vercel Serverless Function README.md file.
To run the local version, navigate to the local-node-version folder and run the following command:
npm run build
node dist/index.js <url-to-process> <optional-url-limit>
The output is written to a txt file in the output directory. If you run the script again, a new file is created each time, using the following naming convention: llms-1.txt, llms-2.txt, etc. For more information on how to use the local version, please refer to the Local Node Version README.md file.
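The naming convention above can be sketched as a small helper. This is an illustration only, not the script's actual code: the function name nextOutputName is hypothetical, and it assumes that numbering starts at 1 and always continues from the highest existing number.

```typescript
// Hypothetical helper: pick the next free name in the llms-N.txt sequence.
// Assumes numbering starts at 1 and continues from the highest existing number.
function nextOutputName(existingFiles: string[]): string {
  let highest = 0;
  for (const fileName of existingFiles) {
    const match = /^llms-(\d+)\.txt$/.exec(fileName);
    if (match) {
      highest = Math.max(highest, Number(match[1]));
    }
  }
  return `llms-${highest + 1}.txt`;
}

console.log(nextOutputName([])); // llms-1.txt
console.log(nextOutputName(["llms-1.txt", "llms-2.txt"])); // llms-3.txt
```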
Architecture: How the HubSpot App Settings Page Frontend Connects to the Vercel Serverless Function Backend
This project uses a distributed architecture where a HubSpot UI Extension communicates with a Vercel serverless function to process web content and generate llms.txt files.
┌───────────────────────────────────────────────────────────────────┐
│                          HubSpot Account                          │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │  UI Extension (App Settings Page)                           │  │
│  │  - User inputs sitemap/single URL                           │  │
│  │  - Selects processing mode (sitemap/single URL)             │  │
│  │  - Sets URL limit (optional)                                │  │
│  └──────────────────────┬──────────────────────────────────────┘  │
│                         │                                         │
│                         │  1. hubspot.fetch() POST request        │
│                         │     {mode, url, urlLimit, portalId}     │
│                         ▼                                         │
│            ┌─────────────────────────┐                            │
│            │ Vercel Serverless API   │                            │
│            │ /api/sitemap-processor  │                            │
│            └────────────┬────────────┘                            │
│                         │                                         │
│                         │  2. Process Request                     │
│                         │     - Validate payload                  │
│                         │     - Fetch sitemap/URL                 │
│                         ▼     - Extract content                   │
│            ┌─────────────────────────┐                            │
│            │ Content Processing      │                            │
│            │ - Fetch URLs            │                            │
│            │ - Parse HTML (cheerio)  │                            │
│            │ - Extract text content  │                            │
│            └────────────┬────────────┘                            │
│                         │                                         │
│                         │  3. Upload to HubSpot                   │
│                         │     POST `/files/v3/files`              │
│                         ▼     (Bearer Token Auth)                 │
│            ┌─────────────────────────┐                            │
│            │ HubSpot Files API       │                            │
│            │ - Creates llms-*.txt    │                            │
│            │ - Stores in /llm-gen..  │                            │
│            └────────────┬────────────┘                            │
│                         │                                         │
│                         │  4. Return Response                     │
│                         │     {success, fileUrl, ...}             │
│                         ▼                                         │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │  UI Extension - Display Results                             │  │
│  │  - Shows success/error alert                                │  │
│  │  - Displays link to file in File Manager                    │  │
│  │  - Shows processing stats (URLs processed, time elapsed)    │  │
│  └─────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘
1. User Interaction (HubSpot UI Extension)
- Component: LLMsGeneratorSettingsPage.tsx
- User selects mode: sitemap or single_url
- Enters target URL and optional URL limit
- Clicks "Generate LLM File" button
2. API Request
- Uses hubspot.fetch() to make a CORS-compliant POST request
- Endpoint: Configured in VERCEL_CONFIG.endpoint
- Payload includes: {mode, url, urlLimit, portalId}
- Portal ID automatically retrieved from context.portal.id
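As a rough sketch of the payload described in this step, the following types and helper show how the request body could be assembled and checked before it is sent. The field names come from the payload list above; the helper name buildGenerateRequest, the validation rules, and the numeric type of portalId are assumptions made for illustration.

```typescript
// Processing modes named in the settings page description.
type Mode = "sitemap" | "single_url";

// Shape of the POST body; field names follow the payload list above.
interface GenerateRequest {
  mode: Mode;
  url: string;
  urlLimit?: number; // optional cap on how many URLs to process
  portalId: number; // assumed to be numeric
}

// Hypothetical helper: validate the form values and build the payload.
function buildGenerateRequest(
  mode: Mode,
  url: string,
  portalId: number,
  urlLimit?: number
): GenerateRequest {
  const parsed = new URL(url); // throws a TypeError on malformed input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  if (urlLimit !== undefined && (!Number.isInteger(urlLimit) || urlLimit < 1)) {
    throw new Error("urlLimit must be a positive integer");
  }
  const payload: GenerateRequest = { mode, url: parsed.toString(), portalId };
  if (urlLimit !== undefined) {
    payload.urlLimit = urlLimit;
  }
  return payload;
}
```

The resulting object would then be passed as the body of the POST request made with hubspot.fetch() to the endpoint stored in VERCEL_CONFIG.endpoint.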
3. Vercel Serverless Processing
- Function: /api/sitemap-processor.ts
- Validates request and extracts parameters
- Fetches sitemap XML or single URL
- Iterates through URLs extracting text content
- Implements rate limiting and timeout protection (14 min max)
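The processing loop in this step might look roughly like the sketch below. It is not the code in sitemap-processor.ts: the function names, the regex used to read the sitemap, the 250 ms delay between requests, and the injected fetchPage callback are all assumptions. Only the 14-minute budget comes from the description above.

```typescript
// Pull <loc> entries out of sitemap XML and apply the optional URL limit.
function extractSitemapUrls(xml: string, urlLimit?: number): string[] {
  const urls: string[] = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/gi;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    urls.push(match[1].replace(/&amp;/g, "&")); // sitemaps escape "&" as "&amp;"
  }
  return urlLimit === undefined ? urls : urls.slice(0, urlLimit);
}

// Time budget from the description above: stop before 14 minutes have elapsed.
const MAX_RUNTIME_MS = 14 * 60 * 1000;

function hasTimeLeft(startedAtMs: number, nowMs: number, budgetMs: number = MAX_RUNTIME_MS): boolean {
  return nowMs - startedAtMs < budgetMs;
}

// Process URLs one at a time, pausing between requests (a crude rate limit)
// and returning early with partial results once the time budget is used up.
async function processUrls(
  urls: string[],
  fetchPage: (url: string) => Promise<string>, // injected so the loop is testable
  delayMs: number = 250, // assumed delay between requests
  now: () => number = Date.now
): Promise<{ pages: string[]; stoppedEarly: boolean }> {
  const startedAt = now();
  const pages: string[] = [];
  for (const url of urls) {
    if (!hasTimeLeft(startedAt, now())) {
      return { pages, stoppedEarly: true };
    }
    pages.push(await fetchPage(url));
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return { pages, stoppedEarly: false };
}
```

Returning partial results when the budget runs out, rather than throwing, lets the function still upload whatever content it collected.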
4. Content Extraction
- Removes scripts, styles, and non-content elements using regex
- Extracts clean text from main content areas using regex
- Formats with page headers and metadata using regex
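A minimal sketch of regex-based extraction along these lines is shown below. The exact patterns and the page header layout used by the project are not documented here, so extractText, formatPage, and the header format are assumptions chosen for illustration.

```typescript
// Strip non-content markup with regular expressions and return plain text.
function extractText(html: string): string {
  return html
    .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ") // drop code and styles
    .replace(/<!--[\s\S]*?-->/g, " ") // drop HTML comments
    .replace(/<[^>]+>/g, " ") // drop the remaining tags
    .replace(/&nbsp;/gi, " ")
    .replace(/&amp;/gi, "&")
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}

// Hypothetical page layout: a heading, the source URL, then the extracted text.
function formatPage(url: string, title: string, html: string): string {
  return `## ${title}\nURL: ${url}\n\n${extractText(html)}\n`;
}

const sampleHtml =
  "<html><head><style>p{color:red}</style><script>alert(1)</script></head>" +
  "<body><h1>Hello</h1><p>World &amp; all</p></body></html>";
console.log(extractText(sampleHtml)); // Hello World & all
```

Regex stripping is simple and dependency-free, but it can be fooled by unusual markup, which is one reason the local version uses an HTML parser instead.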
5. HubSpot File Upload
- Service: hubspot-uploader.ts
- API: HubSpot Files API v3 (/files/v3/files)
- Authentication: Bearer token from HUBSPOT_ACCESS_TOKEN env var
- Uploads to /llm-generator folder
- Sets file as PUBLIC_INDEXABLE
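The upload request could be assembled roughly as follows. This is a sketch rather than the contents of hubspot-uploader.ts: buildUploadRequest is a hypothetical helper, and it assumes the standard multipart form fields for this endpoint (file, folderPath, and a JSON-encoded options field).

```typescript
const FILES_API_URL = "https://api.hubapi.com/files/v3/files";

// Hypothetical helper: build the multipart upload request without sending it.
function buildUploadRequest(accessToken: string, fileName: string, content: string) {
  const form = new FormData();
  form.append("file", new Blob([content], { type: "text/plain" }), fileName);
  form.append("folderPath", "/llm-generator");
  form.append("options", JSON.stringify({ access: "PUBLIC_INDEXABLE" }));
  return {
    url: FILES_API_URL,
    init: {
      method: "POST",
      // No Content-Type header here: fetch adds the multipart boundary itself.
      headers: { Authorization: `Bearer ${accessToken}` },
      body: form,
    },
  };
}
```

In the serverless function, the token would be read from the HUBSPOT_ACCESS_TOKEN environment variable and the result passed to fetch(url, init).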
6. Response & Display
- Returns JSON with: {success, fileUrl, processedUrls, totalUrls, siteName, timeElapsed}
- UI displays success alert with clickable link
- Link opens File Manager directly to the generated file
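To illustrate how the settings page might turn this response into an alert, here is a small sketch. The field names come from the list above; the error field, the assumption that timeElapsed is in seconds, and the describeResult helper are hypothetical.

```typescript
// Response shape based on the fields listed above.
interface GenerateResponse {
  success: boolean;
  fileUrl?: string;
  processedUrls?: number;
  totalUrls?: number;
  siteName?: string;
  timeElapsed?: number; // unit assumed to be seconds
  error?: string; // hypothetical field for failure details
}

// Hypothetical helper: map the response to an alert variant and message.
function describeResult(res: GenerateResponse): { variant: "success" | "error"; message: string } {
  if (!res.success || !res.fileUrl) {
    return { variant: "error", message: res.error ?? "File generation failed." };
  }
  const stats = `Processed ${res.processedUrls ?? 0} of ${res.totalUrls ?? 0} URLs`;
  const timing = res.timeElapsed !== undefined ? ` in ${res.timeElapsed}s` : "";
  return { variant: "success", message: `${stats}${timing} for ${res.siteName ?? "site"}.` };
}
```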
Key configuration points:
- HubSpot Project: Must update VERCEL_CONFIG.endpoint in LLMsGeneratorSettingsPage.tsx
- Vercel Function: Must set HUBSPOT_ACCESS_TOKEN environment variable
- CORS: Vercel function configured to allow https://app.hubspot.com origin
- Timeouts: Function respects Vercel's 15-minute serverless limit
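The CORS requirement can be sketched as a helper that returns response headers only for the allowed origin. The helper name corsHeaders and the specific allowed methods and headers are assumptions; only the https://app.hubspot.com origin comes from the configuration notes above.

```typescript
const ALLOWED_ORIGIN = "https://app.hubspot.com";

// Hypothetical helper: CORS response headers for a given request origin.
function corsHeaders(requestOrigin: string | undefined): Record<string, string> {
  if (requestOrigin !== ALLOWED_ORIGIN) {
    return {}; // no CORS headers, so browsers reject the cross-origin response
  }
  return {
    "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
    "Access-Control-Allow-Methods": "POST, OPTIONS", // assumed method list
    "Access-Control-Allow-Headers": "Content-Type", // assumed header list
    Vary: "Origin",
  };
}
```

The function handler would apply these headers to every response, including the reply to the preflight OPTIONS request.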
This project uses GitHub Actions to auto-deploy to your HubSpot account whenever you push to the main branch. The workflow is defined in .github/workflows/main.yml.
In your GitHub repository, create two new secrets:
- HUBSPOT_ACCOUNT_ID: the ID of your HubSpot account.
- HUBSPOT_PERSONAL_ACCESS_KEY: your personal access key.
This project relies on a serverless function that fetches the sitemap or URL, parses the content, and generates the llms.txt file. The function then uses the HubSpot Files API to upload the file to the HubSpot File Manager. To call the function, the app uses hubspot.fetch() to send a POST request to the function's API endpoint.
The local version is a Node.js script. It uses the axios library to fetch the content of each URL and the cheerio library to parse it. It also relies on Node.js built-ins: the fs module to write the content to a txt file, the path module to create the output directory, the url module to normalize the URL, and the __dirname value to get the current directory.
This project uses a third-party backend service for several strategic advantages:
- Better processing: Depending on the backend service you choose, your processing time can be extended so that users can generate comprehensive llms.txt files from entire website sitemaps in a single operation
- Resource Isolation: The backend scales independently from your HubSpot account, preventing heavy processing from impacting other HubSpot operations
- Optimized Environment: Third-party platforms are specifically designed for compute-intensive tasks like web scraping and content parsing
- Concurrent Processing: Handle multiple requests simultaneously without affecting HubSpot app responsiveness
- Library Freedom: Use any npm package without HubSpot's runtime restrictions (e.g., cheerio, axios, advanced XML parsers)
- Native APIs: Direct access to Node.js APIs and filesystem operations during processing
- Flexibility: Easily integrate with other services, databases, or APIs as needed
- Independent Versioning: Update and deploy backend logic without re-uploading your entire HubSpot Project
- Environment Variables: Securely manage sensitive credentials (like HUBSPOT_ACCESS_TOKEN) using your backend's environment variable system
- Easy Testing: Test backend logic locally or in staging environments before affecting production HubSpot apps
- CI/CD Integration: Leverage your backend's automatic deployments from Git repositories
- Pay-per-Use: Only pay for compute resources when processing requests, rather than maintaining always-on infrastructure
- Reduced HubSpot Limits: Offload processing to avoid hitting HubSpot's API rate limits or execution quotas
- Provider Flexibility: The same architecture pattern works with AWS Lambda, Google Cloud Functions, or any serverless platform
- Feature Growth: Easily add new capabilities like PDF generation, image processing, or database storage without HubSpot constraints
- Multi-Cloud: Deploy to multiple providers for redundancy or geographic distribution
This third-party backend pattern is ideal for HubSpot Projects that require:
- Long-running operations (>10 seconds)
- Heavy data processing or web scraping
- Access to specialized npm packages
- Independent scaling from HubSpot
- Frequent backend updates without HubSpot redeployment
This project is made with ❤️ by the HubSpot DevRel Team and is part of the HubSpot DevRel Labs initiative.
To contribute to this project, please create a new branch and submit a pull request. A member of the HubSpot DevRel Team will review the pull request and merge it into the main branch.