Bilibili Analyzer

A Python tool to fetch video information from Bilibili using either a video URL/BVID or user UID.

Features

Fetch detailed information about a single video using its URL or BVID
Fetch all videos from a user using their UID
Auto-detect if an identifier is a user UID or video BVID
Extract video subtitles using multiple methods (API, yt-dlp, Whisper AI)
Export all subtitles from a user's videos to a single text file
Beautiful console output using rich formatting with progress indicators
Support for both bilibili.com and b23.tv URLs
Automatic handling of video duration and timestamp formatting
LLM post-processing for Whisper transcripts

Requirements

Python >= 3.12, < 4.0
Poetry for dependency management
ffmpeg (for audio extraction)

Installation

Clone this repository
Install dependencies using Poetry:

poetry install

Usage

Fetch information about a single video

Using a Bilibili URL:

poetry run python main.py "https://www.bilibili.com/video/BV1xx411c7mD"

Using a BVID:

poetry run python main.py "BV1xx411c7mD"

Fetch all videos from a user

Using a user's UID (automatic detection):

poetry run python main.py 123456

You can also explicitly specify to fetch user videos:

poetry run python main.py 123456 --user

This will:

Fetch all videos uploaded by the user with the given UID
Display a table with video details (BVID, title, duration, view count, upload time)
Show the total number of videos found

Extract Video Text Content

You can extract all text content from a video, including description, subtitles, and comments.

Extract with default settings

python main.py BV1xx411c7mD --text

This will attempt to extract subtitles using three methods in order:

Bilibili API
yt-dlp subtitle extraction
Whisper AI audio transcription (if the first two methods fail)

Authentication for premium content

For videos that require authentication:

python main.py BV1xx411c7mD --text --browser chrome

This extracts cookies directly from your browser for authenticated access.

Control which content to include

python main.py BV1xx411c7mD --text --content subtitles,comments,uploader

Subtitles are always exported as plain text. There is no markdown formatting option.

Format options

Subtitles are always exported as plain text. The --format option is no longer available.

Save output to file

python main.py BV1xx411c7mD --text -o output.txt

Export All User Subtitles

You can export all subtitles from a user's videos into a single text file:

python main.py 12345678 --export-user-subtitles

This will:

Get the list of all videos from the user
Extract subtitles from each video using the same 3-step approach (API → yt-dlp → Whisper)
Clean up and remove timestamps from the subtitles
Add video information header to each video's subtitles
Combine all subtitles into a single text file
Save the file in a folder named after the user's UID (e.g., user_12345678/all_subtitles.txt)
Generate a statistics file with processing details

Authentication for subtitle export

For users with private or premium videos, authentication is required:

python main.py 12345678 --export-user-subtitles --browser chrome

If authentication is needed but not provided, the program will exit early with a clear message.

Customize the export process

Limit the number of videos to process:

python main.py 12345678 --export-user-subtitles --subtitle-limit 10

Exclude video descriptions from headers:

python main.py 12345678 --export-user-subtitles --no-description

Specify output file:

python main.py 12345678 --export-user-subtitles -o custom_output.txt

Output files

The tool creates a folder structure for the output:

user_12345678/
  ├── all_subtitles.txt    # Combined subtitles from all videos
  └── stats.txt            # Statistics about the processing

Command Line Options

Option	Description
`identifier`	Bilibili video URL, BVID, or user UID (required)
`--user`	Explicitly fetch videos from a user (overrides auto-detection)
`--text`	Get video text content including subtitles in plain text format
`--content`	Comma-separated list of content to include (subtitles,comments,uploader)
`--comment-limit`	Number of top comments to include (default: 10)
`--output`, `-o`	Output file path (if not specified, print to console)
`--browser`	Browser to extract cookies from (chrome or firefox) for authenticated access
`--debug`	Enable debug logging output
`--retry-llm`	Retry LLM post-processing for an existing Whisper transcript
`--export-user-subtitles`	Export all subtitles from a user's videos to a single text file
`--subtitle-limit`	Limit the number of videos to process when exporting subtitles
`--no-description`	Don't include video descriptions in exported subtitles
`--no-meta-info`	Don't include meta info (title, views, coins, etc.) in the header of each video in exported subtitles
`--force-charging`	Force download attempt for charging exclusive videos without confirmation prompts
`--skip-charging`	Skip charging exclusive videos completely when batch processing

Auto-Detection

The tool automatically detects if the identifier is a user UID or a video BVID/URL:

If the identifier is a purely numeric value with 10 or fewer digits, it's treated as a user UID
Otherwise, it's treated as a video BVID or URL
The --user flag can be used to override this detection and force user mode

Handling Charging Exclusive Videos

The tool detects Bilibili's charging exclusive videos (付费充电视频) which require payment for full access:

# Video info will show charging status
python main.py BV1xx411c7mD

When downloading charging exclusive videos:

By default, the tool warns you that only a preview (typically ~1 minute) is available
You can choose to continue with the limited content or cancel the download
After downloading, the tool verifies if the content is complete or truncated

Options for handling charging videos:

# Skip charging videos entirely when batch processing
python main.py 12345678 --export-user-subtitles --skip-charging

# Force download attempts without confirmation prompts
python main.py 12345678 --export-user-subtitles --force-charging

Note: These options do not bypass Bilibili's content protection. You will still only get the free preview portion without proper payment.

Logging and Debug Output

Control the verbosity of logs:

# Show detailed debug information
python main.py BV1xx411c7mD --text --debug

# Normal operation with only important info
python main.py BV1xx411c7mD --text

The debug mode shows:

Detailed yt-dlp commands and outputs
API request details
Whisper processing details
LLM processing information

Subtitle Extraction Features

Subtitle Source Priority

Bilibili Official API: Attempts to get subtitles through the official API
yt-dlp Subtitle Extraction: Uses yt-dlp to extract available subtitles
Whisper AI Transcription: Fallback method that downloads audio and uses Whisper AI for transcription

Retry LLM Post-Processing

If the LLM post-processing of Whisper transcripts fails or you want to reprocess:

python main.py BV1xx411c7mD --retry-llm

Environment Variables

Create a .env file with the following variables:

# Bilibili credentials (optional)
BILIBILI_SESSDATA=your_sessdata
BILIBILI_BILI_JCT=your_bili_jct
BILIBILI_BUVID3=your_buvid3

# LLM configuration for transcript improvement
LLM_MODEL=openai:gpt-4.1-nano
LLM_API_KEY=your_api_key
LLM_BASE_URL=https://api.openai.com # or your custom endpoint

Output

The tool will display the information in a nicely formatted table, including:

Video title and description
Duration (in seconds)
View, like, coin, favorite, and share counts
Upload time (in YYYY-MM-DD HH:MM:SS format)
Uploader information

For video text content, it will include:

Video basic info
Uploader details
Tags and categories
Subtitles (from API, yt-dlp or Whisper)
Top comments

Notes

Rate limiting may apply
Some videos require authentication for subtitle access
Whisper transcription quality varies depending on audio quality
LLM post-processing requires a valid API key and connection

Streamlit Web Interface

The project now includes a beautiful Streamlit web interface for easier use of the Bilibili Analyzer.

Starting the Streamlit App

Since the project uses Poetry for dependency management, you can start the Streamlit app directly after installing dependencies:

# After running poetry install
poetry run streamlit run app.py

Or run it directly after activating the Poetry environment:

# After activating the environment
streamlit run app.py

Streamlit App Features

The Streamlit interface provides a user-friendly way to use the Bilibili Analyzer:

Video Analysis: Fetch information and text content from Bilibili videos
User Analysis: Export subtitles from a user's videos
Visualizations: Generate charts and graphs to analyze a user's content
Settings: Configure default options for the application
Help: Access comprehensive documentation

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
.streamlit		.streamlit
pages		pages
tests		tests
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
bilibili_client.py		bilibili_client.py
extract_cookies.py		extract_cookies.py
main.py		main.py
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
utilities.py		utilities.py

License

zhoudaxia233/bilibili-analyzer

Folders and files

Latest commit

History

Repository files navigation