Hiring Through Computer Voice Profiling, a Moral Achievement?

Rana King

10 years ago

English (North American), Male, Adult, Corporate, Production Skills – these are some of the requirements clients set when searching for the right voiceover talent for a campaign or project. Major voiceover marketplace sites use an algorithm that filters their talent pool based on how completely each talent has filled out his or her profile, then sends audition invitations to those who match the client's requirements.

But what if we take it one step further?

Whose voice is engaging? Who is trustworthy? Who can convince the audience to purchase? Who is the voice of reason, the voice of authority? Who can the audience relate to? Depending on their project needs, these are some of the questions that go through the minds of casting directors, producers, and clients while they listen to voiceover submissions.

Judging a voiceover actor on skill, talent, and delivery is like running a voice beauty contest with a set of criteria the talent needs to tick off. These criteria are based on years of research and consumer feedback about what appeals to the target audience.

But what if we can eliminate this process and save hours of listening to recordings?

Humans are hardwired to judge and to form perceptions the moment they interact. Whether we are seeing, touching, or listening, our brains start to form impressions. Within the first few seconds we create perceptions that lead to feelings and then to snap decisions. What if these perceptions and feelings could be bottled up, or more accurately, computerized?

One company says it has done just that.

Jobaline has taken years of scientific studies and focus group results on the human voice and fed them into algorithms. The program categorizes and interprets the emotions evoked when listening to a person speak, then validates them with real human listeners. In an interview with NPR, Jobaline CEO Luis Salazar said, “We’re not analyzing how the speaker feels. That’s irrelevant.” What the company is homing in on is the “emotion that that voice is going to generate on the listener.”

Is this not what casting directors, producers, and clients are looking for: someone who can evoke the emotions they want their audience to feel?

Regardless of intonation, expressed emotion, speech rate, and the other qualities that shape a listener’s perception, there is always an underlying quality in a person’s voice. This is each speaker’s unique vocal fingerprint. Perception, though, varies from person to person. What one listener perceives as high-pitched and excitable, another may hear as an expression of happiness. Jobaline’s approach was to identify interactions among an array of features, from pitch to energy, accumulated over time. With this, the company aims to predict accurately which voice suits a particular job by finding the right combination of vocal features. So far, the company says its formula can determine whether a voice is engaging, calming, and/or trustworthy.

The company hails this as a moral achievement. “That’s the beauty of math,” Salazar says. “It’s blind.” This automation is said not only to cut costs but also to eliminate bias. It is unaware of differences in race, gender, sexual orientation, or age and, for the sake of argument, of the years you have spent honing your skill or craft.

The problem is that this “blind audition” is still riddled with unfairness and prejudice. Like any form of profiling, voice profiling is scary. We hand an impersonal machine the first say on who advances to the next stage. Voice profiling at this level is dangerous because we pass human prejudices on to an unfeeling binary program that only a few can comprehend.

Can voiceover platforms use this technology to help clients screen talent? Isn’t this the essence of screening: the years of experience of casting directors, producers, and clients, backed by customer data and made efficient by binary code?

Imagine reading the script and drawing on years of training and experience to deliver the copy perfectly, only to have your recording fed through the system and be passed over for the project because a combination of ones and zeroes says your voice is lacking.

Are we going to let robots take over humanity?