🎙️ Voice Command Identity Dataset (VocID)

VocID is a dataset of short, command-based utterances collected for research on speaker verification and voice spoofing in realistic access-control scenarios.
It includes over 10,000 recordings from 30 Italian-native speakers, captured under controlled acoustic conditions using multiple mobile devices.

The dataset is particularly suited for evaluating the effectiveness of deepfake attacks, biometric verification pipelines, and automatic speech recognition (ASR) in voice-controlled systems (e.g., smart homes, banking interfaces).

📦 Dataset Features

30 speakers (18 male, 12 female)
8 fixed voice commands (in Italian and English)
Reading tasks and multi-command sessions
Recorded with Google Pixel 3a and iPhone 14
16 kHz, 16-bit PCM WAV format
Annotated metadata: speaker ID, gender, device, language
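
Once access has been granted, it can be useful to confirm that a recording matches the audio format listed above. The following is a minimal sketch using only the Python standard library. The function name matches_vocid_format is hypothetical and not part of the dataset, and the check covers only the sample rate and bit depth stated in this README (the channel count is not specified here, so it is not checked).

```python
import wave

# Expected audio format, taken from the feature list in this README.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM = 2 bytes per sample


def matches_vocid_format(path):
    """Return True if the WAV file at `path` is 16 kHz, 16-bit PCM.

    Hypothetical helper for illustration; not shipped with the dataset.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )
```

Calling matches_vocid_format on the path of a downloaded recording returns True or False. The actual directory layout and file naming are not described in this README, so any path passed in is up to the user.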

🔐 Access Request

Due to the inclusion of human voice data, the VocID dataset is distributed only for non-commercial research purposes and requires an access request.

To obtain the dataset, follow the steps below.

Download the License Agreement (PDF): release_agreement_VocID.pdf.
Fill in the License Agreement and send it to us ([email protected]), signed by the Requestor of the data and by a legal representative of your institution. The document must be certified with the stamp of your company or institution or with an electronic signature.
