\chapter{Literature Review}
\label{ch:review}

\section{Theoretical background}

To navigate this thesis, several theoretical terms need to be briefly explained first, since the work
touches on both technical and psychological domains. The terms and their definitions are listed below:


\subsection{UX}

User Experience (UX) is a holistic term that encompasses the whole interaction of the user with a given product, service,
company or place. It is the sum of the individual moments, feelings, thoughts, emotions and actions that take place during
the interaction \parencite{ibmWhatUserExperience2025}.

In this thesis UX is important because immersion is deeply linked to the experience that the user has.
It is difficult to discuss the immersiveness of an experience without mentioning UX, since the two are
so closely connected. Moreover, UX research offers tools and guidelines for conducting interviews and
measuring the experience, even in a qualitative sense, which can help to assess whether the baseline
for immersion is achieved.

In other words, UX is the sum of the whole experience, and this thesis is predominantly about the user
experience.

\subsection{A/B testing}

A/B testing is a technique used to isolate and measure how two competing options perform against each other.
It is mostly used in the commercial space of digital products, where a certain share of the users is presented
with option A while the other group sees option B, with logging in the background measuring the desired metrics.
For example, when a company is trying to decide which button colour to use for a sign-up action, half
of the users might see a red button and the other half a grey one. The conversion rate is then measured
in both cases, effectively signalling which choice performs better.
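The logic of such an experiment can be sketched in a few lines of Python (a minimal illustration: the even/odd traffic split, the conversion numbers and the variant labels are all invented for this example):

```python
def assign_variant(user_id: int) -> str:
    """Deterministically split users into two equal groups."""
    return "A" if user_id % 2 == 0 else "B"


def conversion_rate(conversions: int, visitors: int) -> float:
    """Fraction of visitors who performed the desired action."""
    return conversions / visitors


# Invented example numbers: red sign-up button (A) vs grey one (B).
rate_a = conversion_rate(conversions=120, visitors=1000)  # 0.12
rate_b = conversion_rate(conversions=95, visitors=1000)   # 0.095

better = "A" if rate_a > rate_b else "B"
print(f"Variant {better} converts better at {max(rate_a, rate_b):.1%}")
```

A real experiment would additionally test whether the observed difference is statistically significant before acting on it.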

In the context of this thesis this type of comparative analysis is important because it allows complex
objects with many features to be distilled into directly comparable variants. That ability proves useful
when measuring features that are difficult to observe, such as immersion.

\subsection{Immersion}

Immersion is the feeling and action of being completely involved in something. In an immersed state the 
person is so deeply engaged with the medium that they forget about the outside world. This state can 
occur when interacting with many different artwork types, such as movies, video games or books. 

In the context of creating video games immersion is often one of the most important qualities, since engaging
with the virtual world on a deeper level makes for games that feel more real and enjoyable.

Immersion is not a binary state; it exists on a spectrum. Things can be more or less immersive,
depending on the many factors that create the experience. For example, a movie played without sound is
going to be less immersive than the same movie with the sound on.

Despite its broad definition, immersion can be measured both qualitatively and quantitatively: qualitatively
by interviewing test participants, and quantitatively by measuring eye movement, breathing patterns, pulse,
brain activity and other biomarkers.

Since immersion exists on a scale and can be measured, it is possible to isolate the individual
features that deepen or weaken it via methods such as A/B testing. Game designers often rely on certain
features to make immersion deeper. Engaging multiple senses, such as hearing and touch, commonly proves to work.
Realistic visuals also ease the entrance into an immersive state, and above all a smooth,
lag-free experience matters: a game that stutters and is unresponsive is a sure-fire way of breaking its player's immersion.



For this thesis immersion is the core focus. It is a fascinating phenomenon in itself: something that feels
natural to the end-user, yet is not a mere byproduct of the artwork, but rather a hand-crafted part of the
experience. That ingredient is what makes it possible to transform pixels moving on a screen into an experience
that makes people believe they are living another life, even if only for a brief moment.

\subsection{Suspension of disbelief}

This is a state in which the player is immersed so deeply in the experience that even when something unrealistic
happens inside of it, they accept it instead of questioning it. For instance, a person who sees a dragon on the street
would instinctively question its reality, since dragons are not real. The same person watching a fantasy
movie that strives to be as immersive as possible might attune to the rules of the depicted world to the
degree that a sudden appearance of a dragon will not surprise them or cause disbelief in the way that seeing one in person
would. It is an important marker that can show whether the immersive state is reached or the presented world is too off-putting.

This term comes into play in this thesis during the discussion of the world portrayed by the video game accompanying
this publication.

\subsection{Input and game modality}

In the realm of video games, modality refers to all of the different modes or channels through which communication, interaction
and experiences are relayed between the player and the game. As a concept it comes from linguistics and human-computer interaction,
and it can be broken into a few main dimensions, such as input or social modalities. In the context of this thesis, input modality
is the main focus.

Input modalities are all of the ways in which the player communicates with the game, such as keyboard, gamepad, touch or voice.
They are mirrored by output modalities, which correspond to the different senses the game can engage through visual, audio or haptic cues.

This term plays a role in this thesis by making it possible to label different input methods in a way that allows for clear
distinction and gives them universally recognised names.

\subsection{VUI}

Voice user interfaces (VUIs) allow people to control computers and other devices through speech recognition \parencite{interactiondesignfoundationWhatAreVoice2025}.
The response from the VUI can take many forms, ranging from an audio response to visual or haptic feedback.
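The basic recognise-then-respond loop of a VUI can be sketched as follows (the speech recogniser is stubbed out and all of the feedback strings are invented for illustration; a real system would call an actual speech-recognition library or service):

```python
def recognise_speech(audio: bytes) -> str:
    """Stand-in for a real speech recogniser: returns a fixed transcript."""
    return "turn on the lights"


def respond(transcript: str) -> dict:
    """Map a transcript to multimodal feedback: audio, visual and haptic."""
    if "lights" in transcript:
        return {"audio": "Lights are on.",
                "visual": "show lamp icon",
                "haptic": "short pulse"}
    return {"audio": "Sorry, I did not catch that."}


feedback = respond(recognise_speech(b""))
print(feedback["audio"])  # Lights are on.
```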

It can be seen as an alternative interaction method to graphical user interfaces, which rely on the
visual modality.

This emergent type of user interface is the focus of this thesis, which explores it by creating a video game that
employs voice communication.

\subsection{Accessibility \& Inclusive Game Design}

Accessibility is the quality of being able to be accessed and used by all people, including those with disabilities.
The term originally emerged in architecture, where architects started to embrace aids for people with disabilities.
In a way, architectural accessibility can be easier to spot, since people have designed buildings for much longer than
digital interfaces. Features such as elevators, braille signs or proper wayfinding are often required by law these days,
and they are generally simpler solutions than their digital counterparts.

In the digital realm accessibility touches on the same merit as physical accessibility: making it possible
for people to access and use digital systems regardless of their potential disabilities. Digital accessibility is an
evolving topic with growing scrutiny from various governments.\todo{add sources for EU and Poland for example}


Inclusive Game Design is the answer to the question of ``who can potentially be excluded from this experience, and how
can they be included instead?''. Exclusion can happen for many different reasons and is in itself a very deep topic.
While commercially launched games almost always target some demographic for their player base,
Inclusive Game Design is in a way the antithesis of that narrow field of view, since it aims to bring in as many people
as possible to enjoy the experience. The two approaches do not eliminate one another; inclusive design is rather a complementary
mechanism that makes sure that marginalised groups in society do not get pushed even further away. \todo{cite The Last of Us II}

While the primary goal of this thesis is measuring player immersion, it is worth noting the potential impact of voice-based
navigation on the accessibility of video games. Moreover, the game designed for the evaluation in this thesis
strives to follow Inclusive Game Design principles so as not to cause harm by exclusion.

\subsection{Voice Commands vs Natural language input}

Given that voice input is the main focus of this thesis, it is important to establish one crucial distinction.
In the realm of video games there are many titles that allow some degree of control through the audio modality
by means of issuing direct commands. For example, when the player says ``Forward'', the game recognises that they want to
move the character forward, just as pressing the ``W'' key on the keyboard would.
However, if the player were to say ``I would like to move my character in the direction in front of him'', the game would
not be able to recognise that, since that phrase is not a command it expects.
On the other hand, a game that can handle natural language input is able to process such a request by extracting the
player's intent from the spoken phrase. This kind of processing is what this thesis focuses on, as opposed to implementing
just a list of commands that the player has to commit to memory.

Natural language input was chosen because it gives the player more freedom to express their actions
while reducing the cognitive load required to play the game, since the player no longer has to remember all of the
possible actions. It has another benefit as well: the potential choices can be hidden from the player, letting
them figure out what can be done in a given situation. To illustrate this, assume the player is in the control room of
a submarine. They can either leave the room, take control of the submarine, use the radio or talk to the crew member on
board. If the game employed a command list, all of these options would have dedicated commands that activate them.
Those commands would have to be visible to the player to make the controls usable, which in turn would make it
impossible for the player to guess what they can do in their current predicament. In the same environment, a game that
processes natural language can simply show the player their surroundings and let them discover what they can do by
expressing their actions in natural language, such as ``I want to use the radio''.

\subsection{NLP}

\subsection{Juice}





% definitions:

% Accessabiliy in games ( Inclusive Design etc. )
% VUI
% Define the role of audio modality in interactive systems UX
% Paragraph about disability statistics by WHO


\section{Voice interaction in computing}

% short history of voice interaction in computing 

\section{Voice input in Video Games}

% review of existing games that use voice with my examples 

% how current approaches are limited 

\section{Accessibility and inclusiveness in Video Games}

% existing literature on the Accessability and inclusiveness in Video Games
% benefits of voice input modality for people with impairments
% challenges of the voice input (false positives, edge cases, memorisation, immersion issues if false negatives)

\section{Gaps and Opportunities}

% most work on voice as a command input, not deep integration into the game engine
% few bodies of research test natural language as an input source
% gap: immersive natural usage of voice in games for navigation in complex game environments


\section{Summary}

% this thesis has the potential blah blah blah



\section{These are just my notes:}

This is the literature review \parencite{liResearchVoiceInteraction2019}.


Blah blah this is a bad sentence \todo{This is something I need to fix.}

15\% of people have some sort of disability according to WHO \todo{source}

VUI -- Voice User Interface. Very nice definition in this paper \parencite{harneskExploringVoiceInteractions}.

VUI user journey: in the context of specific game actions the user journey could be much shorter than with conventional modalities; for example, instead of selecting a waypoint on a map manually, the player could just ask the game to set the waypoint for them.

There's this thing called all-encompassing design:
designing games not only for fully able-bodied or disabled people, but rather for everyone.


\section{Examples of games using audio modality}

Alien: Isolation used the sound modality to alert the enemy if the player was making sounds. Super clever use in a horror game.

Phasmophobia uses mic input to change the game according to what the player is saying

Narrative games seem to be a good fit for the audio modality, given their pacing, focus on storyline and language-driven approach.


There are also games that use audio only \parencite{PtolemsSingingCatacombs}.

There's also Facade, which used written text for player input. The game was praised in its time.


There's also Talon's deep script integration, which adds a layer of control.
\todo{Explain how different it is from your thesis - how what you're doing is another layer of control by calling game engine functions directly, not the input}
Talon offers script integration exposing voice commands as a Python module - an interesting interface approach.

Most of the games found focus on voice in terms of commands, not deep integration into the game.
An example is Kinect voice commands.

Lexical analysis models can help in the tokenisation process - could they potentially help the LLM interpret the commands?
