A command-line interface for Pocketcast that adds AI-powered features like transcription and search to your podcast listening experience. Built in just half a day as an experiment in AI-assisted development, inspired by Simon Willison's post about Gemini transcription.
- Browse and search your Pocketcast episodes with a beautiful terminal UI
- Interactive real-time search with the `/` key and live filtering
- Quick filter cycling with the `f` key (All → Downloaded → Starred → Archived → Transcribed)
- Multiple sorting options with the `s` key:
  - Date (newest first)
  - Duration (longest/shortest first)
  - Podcast name
- Toggle starring with the `*` key
- Automatic downloads when playing episodes
- Episode ID display for reference and debugging
- Detailed show notes with HTML formatting stripped
- Status indicators for downloaded (↓), starred (★), and transcribed (T) episodes
- Real-time episode list updates during search and filtering
- Play/pause with Enter key
- Skip forward/backward 30 seconds with arrow keys
- Real-time progress bar and duration display
- Automatic download handling before playback
- Uses ffmpeg for reliable audio extraction and seeking
- Clean process management for reliable playback
- Status bar with playback controls and current state
- Download progress indicator with percentage
- Automatic transcription when playing downloaded episodes
- Uses Google's Gemini API with specific model selection
- Real-time progress tracking during transcription
- Live segment updates as transcription progresses
- Highlighted current segment during playback
- Navigate transcript with Up/Down and PgUp/PgDn keys
- Smart scrolling that keeps about 1/3 of the view for past segments and 2/3 for upcoming ones
- Scroll indicators (↑/↓) when more content is available
- Dimmed past segments for better context
- Interact with episode transcripts through a chat interface
- Automatically enables when transcript is complete
- Clear status indicators showing chat availability
- Maintains context about episode content
- Preserves playback state during chat sessions
- Press 'c' to enter chat mode when transcript is ready
Watch as the transcript automatically follows along with the audio
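The 1/3 past, 2/3 future scrolling ratio amounts to a small window calculation over the list of transcript segments. The sketch below is not the project's code: `visible_window`, `scroll_indicators`, and the choice to keep the window full near the end of the transcript are assumptions made for illustration.

```ruby
# Hypothetical helper: pick which transcript segments to show so that roughly
# one third of the visible rows hold past segments and two thirds upcoming ones.
# Returns a Range of segment indices.
def visible_window(current_index, total_segments, height)
  return (0...0) if total_segments.zero? || height <= 0

  past_rows = height / 3                        # about 1/3 of the rows for context
  start = current_index - past_rows
  start = [start, total_segments - height].min  # keep the last screen full
  start = [start, 0].max                        # never start before the first segment
  finish = [start + height, total_segments].min
  (start...finish)
end

# Scroll indicators: ↑ when segments are hidden above, ↓ when hidden below.
def scroll_indicators(window, total_segments)
  { up: window.begin.positive?, down: window.end < total_segments }
end
```

For example, with 100 segments, a 9-row view, and segment 10 playing, the window covers segments 7 through 15, so both indicators are shown. Pinning the window to the end of the transcript keeps the final screen full instead of scrolling into empty rows.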
- Ruby 3.0+
- llm-gemini for transcription
- ffmpeg for audio processing and seeking
- Clone the repository:

      git clone https://github.com/The-Focus-AI/pocketcast-collaborator.git
      cd pocketcast-collaborator

- Install dependencies:

      bundle install
      brew install ffmpeg # Required for audio processing
      pip install llm-gemini

- Store your Pocketcast credentials in 1Password; the application looks them up there.
- Start the application:

      bin/pocketcast
Episode browser keys:

| Key | Action |
|-----|--------|
| `f` | Cycle through filter options (All → Downloaded → Starred → Archived → Transcribed) |
| `*` | Toggle starred status for the current episode |
| `s` | Cycle through sort options (Date → Duration → Duration Asc → Podcast) |
| `/` | Start interactive real-time search: search by title or podcast name, use Up/Down to navigate results while searching, Enter to select, Escape to cancel. Special searches: "longest" (the longest 10% of episodes) and "shortest" (the shortest 10%) |
| `r` | Refresh episode list from Pocketcast |
| `↑`/`↓` or `k`/`j` | Navigate through episodes |
| `Enter` | Select and play episode |
| `q` | Quit current view |

Player keys:

| Key | Action |
|-----|--------|
| `Enter` | Play/Pause |
| `←`/`→` | Skip backward/forward 30 seconds |
| `↑`/`↓` | Navigate transcript segments |
| `PgUp`/`PgDn` | Scroll transcript pages |
| `c` | Open chat interface (when transcript is available) |
| `q` | Quit current view/player |
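The `f` and `s` keys both step through a fixed list of options and wrap around at the end. A minimal Ruby sketch, assuming filters and sort orders are represented as symbols; `FILTERS`, `SORT_ORDERS`, `next_option`, `apply_filter`, and the `ListedEpisode` struct are hypothetical names, not identifiers from the codebase.

```ruby
# Hypothetical option lists mirroring the key bindings above.
FILTERS = %i[all downloaded starred archived transcribed].freeze
SORT_ORDERS = %i[date duration duration_asc podcast].freeze

# Advance to the next option, wrapping from the last back to the first.
# An unrecognised current value restarts the cycle at the first option.
def next_option(options, current)
  index = options.index(current) || -1
  options[(index + 1) % options.size]
end

# Hypothetical episode row with one boolean flag per filter.
ListedEpisode = Struct.new(:title, :downloaded, :starred, :archived, :transcribed,
                           keyword_init: true)

# Keep only the episodes whose flag matching the filter name is set.
def apply_filter(episodes, filter)
  return episodes if filter == :all

  episodes.select { |episode| episode.public_send(filter) }
end
```

Pressing `f` on Transcribed therefore returns to All, and pressing `s` on Podcast returns to Date.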
Episode Management:
- Connects to your Pocketcast account to access your episodes
- Maintains a local cache for quick filtering and searching
- Supports complex filtering combinations
- Real-time search with instant results
- Smart sorting with proper handling of missing data
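One way to sort while handling missing data is to partition out episodes that lack the sort key and append them at the end; the "longest"/"shortest" special searches then reduce to `max_by`/`min_by` on duration. This is a sketch under assumptions: the `SortableEpisode` fields and the helper names are hypothetical, and the real Episode model may store dates and durations differently.

```ruby
# Hypothetical episode record; published_at is a Time, duration is in seconds.
SortableEpisode = Struct.new(:title, :podcast, :published_at, :duration, keyword_init: true)

def sort_key(episode, order)
  case order
  when :date then episode.published_at
  when :duration, :duration_asc then episode.duration
  when :podcast then episode.podcast&.downcase
  end
end

# Sort without crashing on nil values: episodes missing the key go last.
def sort_episodes(episodes, order)
  present, missing = episodes.partition { |episode| !sort_key(episode, order).nil? }
  sorted = present.sort_by { |episode| sort_key(episode, order) }
  sorted.reverse! if %i[date duration].include?(order) # newest / longest first
  sorted + missing
end

# "longest" / "shortest": the top 10% by duration (rounded up), ignoring
# episodes with an unknown duration. Returns nil for any other query.
def special_search(episodes, query)
  timed = episodes.reject { |episode| episode.duration.nil? }
  count = (timed.size * 0.1).ceil
  case query
  when "longest" then timed.max_by(count, &:duration)
  when "shortest" then timed.min_by(count, &:duration)
  end
end
```

Returning `nil` for ordinary queries lets a caller fall back to plain title/podcast matching when the text is not a special search.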
Audio Processing:
- Uses ffmpeg to extract audio segments for precise seeking
- Handles various audio formats and bitrates automatically
- Provides accurate playback position for transcript sync
- Enables instant seeking without re-buffering
- Clean process management for reliable playback
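A skip can be modelled as clamping the new position to the episode length and restarting ffmpeg at that offset. The sketch below assumes that design; the helper names, the output file, and the particular flag combination are illustrative and not taken from `player_service.rb`. The flags themselves (`-y`, `-loglevel`, `-ss`, `-i`, `-vn`) are standard ffmpeg options, and placing `-ss` before `-i` makes ffmpeg seek in the input rather than decode from the start.

```ruby
SKIP_SECONDS = 30

# Apply a ±30 s skip without running past either end of the episode.
def skip_position(position, delta, duration)
  (position + delta).clamp(0, duration)
end

# Hypothetical: command that extracts audio from `offset` seconds into `output`.
# `-ss` before `-i` seeks in the input; `-vn` drops any embedded cover art stream.
def ffmpeg_command(input, offset, output)
  ["ffmpeg", "-y", "-loglevel", "error",
   "-ss", format("%.3f", offset), "-i", input,
   "-vn", output]
end

# Stop a previously spawned ffmpeg (or player) process and reap it.
def stop_process(pid)
  Process.kill("TERM", pid)
  Process.wait(pid)
rescue Errno::ESRCH, Errno::ECHILD
  # The process already exited; nothing left to clean up.
end

# Example (not run here):
#   pid = Process.spawn(*ffmpeg_command("episode.mp3", 90, "segment.wav"))
#   stop_process(pid)
```

Passing the command to `Process.spawn` as separate arguments bypasses the shell, so file names containing spaces or quotes need no escaping.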
Transcription:
- Automatically transcribes downloaded episodes using Google's Gemini API
- Displays real-time progress during transcription
- Saves transcripts for future playback
- Synchronizes transcript display with audio position
- Smart scrolling to maintain context while reading
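Synchronizing the transcript with the audio comes down to finding the segment being spoken at the current playback position. Assuming segments are stored sorted by start time, a binary search finds the last segment that has already started. The `Segment` struct and helper names are hypothetical; the real Transcript model may store timestamps differently.

```ruby
# Hypothetical segment shape: start time in seconds plus the spoken text.
Segment = Struct.new(:start, :text)

# Index of the last segment whose start is <= position, or nil if playback
# is before the first segment. Segments must be sorted by start time.
def current_segment_index(segments, position)
  first_after = segments.bsearch_index { |segment| segment.start > position }
  index = (first_after || segments.size) - 1
  index.negative? ? nil : index
end

# Classify a segment for rendering: past segments are dimmed, the current one highlighted.
def segment_state(index, current)
  return :future if current.nil? || index > current

  index == current ? :current : :past
end
```

`Array#bsearch_index` keeps each lookup logarithmic in the number of segments, which matters when the position is re-checked on every progress tick of a long episode.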
This project was built in just half a day (March 26, 2024) as an experiment in AI-assisted development. Inspired by Simon Willison's exploration of Gemini's transcription capabilities, I wanted to create a tool that would make podcast listening more interactive and searchable.
Using Claude in Cursor, we were able to rapidly:
- Set up the basic Pocketcast API integration
- Implement a robust terminal UI
- Add audio playback with ffmpeg
- Integrate real-time transcription
- Create a synchronized transcript display
- Add intelligent search and filtering
- Implement smart scrolling and navigation
See WORKLOG.md for the detailed development timeline and progress.
The codebase follows a clean, modular architecture with clear separation of concerns:
lib/pocketcast_cli/
├── commands/ # UI and CLI components (presentation layer)
│ ├── chat.rb # Chat command interface
│ ├── cli_command.rb # Main CLI command hub
│ ├── episode_selector_command.rb # Episode browsing interface
│ ├── podcast_player.rb # Audio player interface
│ └── transcribe.rb # Transcription command
├── models/ # Data models (domain layer)
│ ├── episode.rb # Episode data representation
│ └── transcript.rb # Transcript data handling
└── services/ # Business logic (service layer)
├── chat_service.rb # Chat interaction business logic
├── episode_service.rb # Episode management services
├── path_service.rb # Centralized file path handling
├── player_service.rb # Audio playback functionality
├── pocketcast_service.rb # API integration with Pocketcast
└── transcription_service.rb # Transcript generation logic
Key architectural principles:
- Service-Oriented Design: Core business logic is encapsulated in dedicated service classes
- Dependency Injection: Services are passed as dependencies rather than instantiated directly
- Clean Command Pattern: UI commands act as thin wrappers around service layer
- Clear Responsibility Boundaries: Each component has a single, well-defined responsibility
- Avoidance of Circular Dependencies: Services maintain clear unidirectional relationships
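The dependency-injection and thin-command principles can be illustrated with a small example. All class and method names below are hypothetical stand-ins, not the project's real API: a command receives a service through its constructor, and the service receives an API client, so a test can swap in a fake client without touching the network.

```ruby
# Hypothetical service: business logic only, no UI and no direct network setup.
class EpisodeQueries
  def initialize(api)
    @api = api # injected client, e.g. a Pocketcast API wrapper or a test double
  end

  def starred_episodes
    @api.fetch_episodes.select { |episode| episode[:starred] }
  end
end

# Hypothetical command: a thin presentation wrapper around the injected service.
class StarredListCommand
  def initialize(episode_service:)
    @episode_service = episode_service
  end

  def render
    @episode_service.starred_episodes.map { |episode| "★ #{episode[:title]}" }.join("\n")
  end
end

# Test double standing in for the real API client.
class FakePocketcastApi
  def fetch_episodes
    [{ title: "Episode 1", starred: true }, { title: "Episode 2", starred: false }]
  end
end

command = StarredListCommand.new(episode_service: EpisodeQueries.new(FakePocketcastApi.new))
puts command.render
```

Because dependencies only flow inward (command → service → client), none of these classes needs to know how its collaborators are constructed, which keeps the relationships unidirectional.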
Benefits of this architecture:
- Easier addition of new features and commands
- Isolated testing of business logic
- Improved maintainability with proper separation
- Better component reuse across different UI elements
- Enhanced stability through clear boundaries
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -am 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Pocketcasts for their amazing podcast platform
- llm-gemini for transcription capabilities
- FFmpeg for audio processing and seeking support
- Simon Willison for the inspiration and Gemini transcription exploration