There has already been some discussion about ways AI could advance game development in areas other than traditional visual quality, and here's one that follows right in the footsteps of the earlier ChatGPT discussion.
Skyrim, being such a rich playground for modders, just got its Dragonborn voice over mod, which adds an AI-generated voice to the previously silent protagonist. Here is a short sample (for now only a female voice is available):
You can play with this tool at their website: add whatever (English) text you want, select from various voice presets and listen to the result. In the future there could be all kinds of voice types, accents, tones, moods, etc. It will be interesting to watch how the system develops.
So now we can have quest texts generated by the AI and now also voiced by the AI. A few more steps and we'll probably be able to generate entire games using AI toolsets... But joking aside (or perhaps it's not a joke at all), we seem to be entering a new deep transformation in gaming (and much more, but that's not what this site is about).
Here, have a listen to their AI-narrated excerpt from The Great Gatsby.
Comments
Games don't usually use TTS because it can't read between the lines to work out what emotion should be added to the speech, and because, as long as you don't hire any big names, voice actors are cheap. But the tech to turn any text in games into speech is there and has been for some time; it's just a matter of whether the devs bother to make their game use it.
Godfred's Tomb Trailer: https://youtu.be/-nsXGddj_4w
Original Skyrim: https://www.nexusmods.com/skyrim/mods/109547
Serph toze kindly has started a walk-through. https://youtu.be/UIelCK-lldo
Now, the M&B mod that uses ChatGPT to hold conversations with NPCs is a whole other can of worms: the simulation cost of each query is insane, and the queries have to run in the cloud, so that is not going to be a thing anytime soon unless we figure out the simulation costs of these things.
It will probably be possible for programmers to manually make that sort of adjustment to make AI-generated voiceovers express emotion properly, but it's going to take a lot of manual tweaking. If you want good voiceovers, it might well be easier and cheaper to just hire a human voice actor. Even if a programmer can get a computerized voice to read a line just right after five minutes of tweaking, a human voice actor can do that and move on to the next line a lot faster.
Logic, my dear, merely enables one to be wrong with great authority.
This is handy for some things in my mind, but wouldn't really change much in general. Hogwarts Legacy, for example, insisted on a player voiceover, but they were limited to a single male and a single female voice with a pitch shift slider that...kinda sucks and makes the voices break.
If the voiceovers had any big range of expression I might argue against it, but the voiceover in that game already is pretty calm even at the most emotive moments. All in all, you could use a set of AI voices to add more choice to a game like that, and have it arguably come out better than the weird tinny pitch shifted options the game provides.
But it's not really doing that much more for a game in and of itself.
Someone who is registered as being a flex offender is a person who feels the need to flex about everything they say.
Always be the guy that paints the house in the dark.
Lucidity can be forged with enough liquidity and pharmed for decades with enough compound interest that a reachable profit would never end.
“Microtransactions? In a single player role-playing game? Are you nuts?”
― CD PROJEKT RED
Of course it's not new, as you wrote, it's been around for a few decades in some form. But the quality of this new approach and ease of use versus the old tech are not comparable at all.
True, their showcases will favour texts that are especially suitable for this tech demo and will mostly aim for a narrative character, but even with that caveat, you must admit the AI voice is very convincing. Have a listen to their samples or just paste in some random text from this discussion, select a voice and listen. The flow sounds very natural: intonation, melody, stress - it's all there.
Now, the added value is also the apparent simplicity of its implementation in games, as we can see with this use of a simple plugin for Skyrim.
As others wrote, once they develop it into a form that can work with different moods and emotions and adapt the voice accordingly, this will be big.
Voice acting is a huge expense, both financially and in terms of the time and logistics needed to get it done. Every bigger narrative game means hundreds of hours spent in studios and audio booths.
I am not saying it will go away. But the way voice acting is done will most likely change. Small devs and indie studios will have the possibility to quickly add hours of dialogue for dozens of characters just by writing the lines. Big studios will probably keep using big voice talents for the main roles, while AI can populate the game worlds with nearly endless background conversations. Just imagine RPGs like the Witcher with Novigrad full of hundreds of people in the background chatting, conversing, arguing - not to mention MMORPGs, which are ideal for this, given their naturally sprawling and persistent world design...
Throughout human history "tools" have replaced human jobs; the loom put a lot of seamstresses out of work. This tool, AI, is often thought of as eventually leading to mass unemployment or the death of humankind. That's the typical negative scenario, so overplayed in SF, but I want to raise a different one.
If they can now compose music, draw and write somewhat well, how long before they can do that well consistently? Think of the speed they are capable of: once they are consistently good, how long would it take them to put out a library of fiction and reference works equal to everything humanity has written so far? Sure, it would only be good, not great, but then most of what is already published is not great, and who says one day great works would not be possible?
With the aid of 3D printers, the only thing stopping them from putting out, say, a good painting for every one ever created would be resources, and so on for the rest of art.
The unemployment issue comes with this idea that we will still do more intellectual work, that there will always be something for us to do, particularly creatives. But that is just another step for the AI to master; no job is safe. Any achievement humans might attempt could end up being done just as well by an AI.
As societies we rarely think about why we are here, what the aim of our societies should be and what the purpose of existence is. Which is just as well, because if we were inclined that way we could be facing an existential crisis, what would be the point of humanity anymore if the AI we created does everything as well as we do?
Being social media stars may be all that is left to us, what a wonderful world.
That sort of mimicry can never surpass or even catch up to what humans can do outside of very simple situations. What it can sometimes do is to be close enough while being much cheaper. That, for example, allows for search engines to quickly give you good enough results. That's critical when high volume is essential. It's far less useful when humans can readily produce ample or even excessive volume on our own.
If you had your choice between one great novel written by a human or any one of a trillion mediocre novels written by AI, a whole lot of people will be interested in the one great novel. The only real reason to care about any of the AI novels is if it's highly customized to suit your personal tastes and you value that more than the various stylistic ways that the AI novel is inferior.
Another problem with AI is that it makes a lot of glaring mistakes that a competent human would not. That's really not a fixable problem, either, as it's intrinsic to how the algorithms work, at least in complicated situations. That's acceptable for some things, as if a search engine returns six things that you're interested in among its top ten results, having a few that are completely irrelevant isn't a problem. But it's a crippling problem when being right 99.9% of the time isn't good enough. That's why, for example, self-driving cars aren't going to be mainstream anytime soon.
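To put a rough number on the 99.9% point, here is a minimal sketch. It assumes every decision is independent and has the same chance of being right, which is an idealization for illustration and not a claim about any real system; the function name and the sample counts are made up for this example.

```python
def chance_of_at_least_one_error(per_decision_accuracy: float, decisions: int) -> float:
    """Probability that at least one of `decisions` independent decisions is wrong.

    Assumption (not from the discussion above): decisions are independent and
    each one is correct with probability `per_decision_accuracy`.
    """
    # All decisions are right with probability accuracy ** decisions,
    # so at least one is wrong with the complementary probability.
    return 1.0 - per_decision_accuracy ** decisions


if __name__ == "__main__":
    # Illustrative decision counts only; they are not measurements of anything.
    for n in (10, 1_000, 100_000):
        p = chance_of_at_least_one_error(0.999, n)
        print(f"{n:>7} decisions: {p:.1%} chance of at least one mistake")
```

Under those assumptions the chance of at least one mistake climbs quickly as the number of decisions grows, which is the gap between a search engine that can afford a few bad results and a task where every single decision has to be right.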
The worry about jobs going away has been a perpetual one for centuries. Many particular jobs did go away, but plenty of new jobs arose in their place. AI will just be the latest iteration of this old phenomenon. There will probably be quite a few jobs for humans working to improve AI by providing better training data. There will almost certainly be a lot of jobs in which AI assists humans and the human acts as an editor to catch and fix the mistakes that AI makes.
Recently our Chancellor used Chat AI to write the introduction to his speech. As you can expect, it got headlines along the lines of "who needs politicians?" What they didn't ask was who needs songwriters when these chat AIs can write a song in the style of a given songwriter. From what I understand it was a good introduction, but the artist whose songwriting style was copied said it was rubbish. But then he would, wouldn't he?
I think you are right about humans always being able to get a supervisor/editing role. The question is, for how long?
https://biturl.top/rU7bY3
Beyond the shadows there's always light
AI can do mimicry, but not novelty. If you ask an AI to write something similar to what it already has many examples of, it might well do a respectable job. For example, it wouldn't surprise me if ChatGPT gave a decent output if you asked it for a patriotic speech that an American mayor might deliver on the Fourth of July. Even if ChatGPT can't do that quite yet, some future AI probably will.
But if you want AI to write the speech making the case for some new initiative, the output is going to be a disaster unless you end up with a "stone soup" style of speech where you have to basically feed the AI all of the points that you want to make in the speech. At that point, the AI that "writes" the speech barely does any more than a word processor.
What's not going to happen in the foreseeable future is to get good voice acting out of a purely text to speech program that doesn't give the AI any special information about the cadence, tone, or emotion involved.
Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio | Ars Technica
AI Emotion and Sentiment Analysis With Computer Vision in 2023 - viso.ai
It's common for AI to be able to mimic something decently well without understanding it at all. But that's also why it often makes ridiculous, howling errors at seemingly random points in the middle of otherwise sensible output. It doesn't actually understand what it is saying, but only observes that it scores as the closest that it can get to the training data.
If working out which posts were adbots was my job this is one human who would be feeling the heat.