Hi, I'm Lulu. Today I'm going to introduce a bidirectional human-computer interaction device from the MIT Media Lab, AlterEgo, and I hope this device can reflect the
future trend of human-computer interaction. The idea is that human intelligence should be complemented by machines, such that computers are tasked with performing routine,
formulaic work, paving the way for insights and decisions by humans. Consider that our current interfaces are a barrier to effortless and private
human-machine communication: people either have to shift their attention away from their surroundings to type, or they have to say their private messages
out loud in public. AlterEgo was presented to enable users to communicate with devices, AI assistants, applications or other people in a silent, concealed and
seamless manner. So here, let's start from a TED talk from one of the principal researchers. As the video shows, AlterEgo
is a wearable silent speech interface that enables discreet, seamless and bi-directional communication with a computing device in natural language,
without visible movements or voice input. It is distinguished from other systems on five points. First, the system captures neuromuscular signals from the
surface of the user's skin via a wearable mask, rather than in an invasive way. Second, the existing non-invasive real-time methods with robust accuracies
require users to explicitly mouth their speech with pronounced, apparent facial movements; the AlterEgo system performs robustly even when the user
does not open their mouth or make any sound. Third, the system achieves a median accuracy of 92% on a standard digit recognition test,
which outperforms conventional methods, among others. Fourth, the device is a wearable system: the user just
needs to wear it for it to function, and the device connects wirelessly over Bluetooth to any external computing device. Last, unlike proposed
traditional brain-computer interfaces, the platform does not have access to private information or thoughts, and the input is voluntary on the user's part, since the
signals are captured from the facial and neck area. I will expand in more detail on the following slides. Before actually getting to AlterEgo, let's get to know
internal vocalization, or subvocalization. It is the internal speech typically made when reading; it provides the sound of the word as it is
read. It is a natural process when reading, and it helps the mind to access meanings, to comprehend and to remember what is read. Most of these
movements are in fact undetectable, without the aid of machines, by the person who is reading. Therefore the silent speech interface was
introduced to achieve that goal. A silent speech interface allows communication without using the sound made when people vocalize their speech sounds. It is an
exciting and growing technology, as it is very suitable for human-machine interaction. The goal of a silent speech interface can be generating actual sound
or text, or serving as a means of interfacing between humans and computer systems. Such devices are created as aids to those who aren't
able to create the sound needed for audible speech. Another use is for communication when speech is masked by background noise. A further practical use is where a
need exists for silent communication, such as when privacy is required in a public place, or when hands-free silent data transmission is needed during a military
or security operation. These systems can be categorized under two primary approaches: invasive and non-invasive systems. There
have been several previous attempts at achieving silent speech communication. Sensors placed on the tongue to measure tongue movements, or direct brain
implants to achieve silent speech recognition, are invasive; non-invasive approaches include recognition from facial muscle
movements using surface electromyography. Both have their pros and cons, and AlterEgo belongs to the latter group. Yet
speech is complex: brain signals precisely coordinate nearly 100 muscles to move the lips, jaw, tongue and larynx. Simply
put, a neuron receives neural impulses, mapped from linguistic instructions in the brain, and delivers them to
neuromuscular junctions, where a single neuron innervates multiple muscle fibers. The ionic movement caused by
muscle fiber resistance generates time-varying potential difference patterns that occur in the facial and neck muscles while intending to speak, leading to a corresponding
myoelectric signal that is detected by the system from the surface of the facial skin. Amongst the various muscle articulators involved in
speech production, the AlterEgo team focused their investigation on the laryngeal and hyoid regions, along with the
buccal, mental, oral and infraorbital regions, to detect signal signatures in a non-invasive manner.
To determine the spatial locations of the detection points, they selected seven target areas from an initial 30-point grid spatially covering the
aforementioned selected regions. The selection was based on experimental data recordings and ranking: they ranked the 30 potential target locations according
to a feature ranking, evaluating how well signals sourced from each target were able to differentiate between word labels in the datasets, and
chose the first seven as detection areas for the device. The final positions of the electrodes on the skin within the selected
regions were then adjusted empirically. Signals are captured using electrodes from the seven selected
target areas. First, the signals are filtered by a fourth-order infinite impulse response Butterworth filter: the high-pass stage avoids movement artifacts
in the signal, while the low-pass stage prevents signal aliasing artifacts. A notch filter is then applied at 60 Hz to cancel out line interference in
the hardware. After independent component analysis to further remove movement artifacts, the signals are digitally rectified,
normalized to a range of 0 to 1 and concatenated as integer streams. The streams are sent to a mobile computational device through
Bluetooth, which subsequently sends the data to a server hosting the recognition model to classify silent words. The data
corpus for experiments includes datasets of various vocabulary sizes, created by collecting data from three participants during two main phases.
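To make the signal-conditioning steps described above more concrete, here is a minimal Python sketch of three of them: the 60 Hz notch filter, rectification, and normalization to the 0-to-1 range. It is an illustration under stated assumptions, not the researchers' code. The 250 Hz sampling rate, the notch quality factor and every function name are hypothetical, and the Butterworth band-pass stage and the independent component analysis are left out for brevity.

```python
import math


def notch_coefficients(f0_hz, fs_hz, q=30.0):
    """Second-order (biquad) notch centered on f0_hz, normalized so a[0] == 1.

    The quality factor q is an assumed value; the talk does not give one.
    """
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def biquad_filter(samples, b, a):
    """Apply the difference equation (direct form I) with zero initial state."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def rectify_and_normalize(samples):
    """Full-wave rectify, then min-max scale into the range 0 to 1."""
    rectified = [abs(s) for s in samples]
    lo, hi = min(rectified), max(rectified)
    if hi == lo:  # flat signal: avoid dividing by zero
        return [0.0] * len(rectified)
    return [(s - lo) / (hi - lo) for s in rectified]


def condition_channel(samples, fs_hz=250.0, line_hz=60.0):
    """Notch out line interference, then rectify and normalize one channel."""
    b, a = notch_coefficients(line_hz, fs_hz)
    return rectify_and_normalize(biquad_filter(samples, b, a))
```

In this sketch, `condition_channel` would be called once per electrode channel, and the resulting streams would then be combined before being sent on for recognition.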
First, the participants joined a pilot study to investigate the feasibility of signal detection and to determine electrode positions. The initial datasets
recorded with the participants were binary, with the word labels being yes or no. The vocabulary size was gradually increased as more words were added.
The data collected during this study has about five hours of subvocalized text. In the second phase, the data corpus was
created for training a classifier. The corpus has about 31 hours of internal speech text recorded in different sessions, to be
able to regularize the recognition model for session independence. The corpus comprises multiple
datasets. For example, in one category the word labels are numerical digits along with fundamental mathematical operations, to facilitate externalizing arithmetic
computations through the interface. They use an external trigger signal to slice the data into word instances. In each recording session, signals were
recorded for randomly chosen words from a specific vocabulary set. This data is used to train the recognition model for
various applications such as world clock, calendar and chess. Before the signal is fed to the recognition
model, a filter is used to identify and omit single spikes in the stream with amplitudes greater than the average value of their nearest four points
before and after. During signal processing, mel-frequency cepstral coefficients and a discrete cosine transform are applied to frame the signal stream
and process the signal without hand-picking any features. This feature representation is passed through a one-dimensional convolutional neural network
to classify it into word labels, with the architecture described as follows. The hidden layer convolves 400 filters of kernel size 3 with step size 1 over the
processed input, and the result is then passed through a rectifier nonlinearity. This is subsequently followed
by a max pooling layer. This unit is repeated twice before globally max pooling over its input. This is followed by a fully connected layer of dimension
200 passed through a rectifier nonlinearity, which is followed by another fully connected layer with a sigmoid activation. The network was optimized
using first-order gradient descent, and parameters were updated using Adam during training. The output is then
transmitted to the user by bone conduction headphones as aural output. Bone conduction headphones allow the system to seamlessly respond to subvocal queries.
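Returning to the recognition network described above, the layer stack can be sketched as a plain-Python forward pass: two rounds of convolution (kernel size 3, step size 1), rectifier and max pooling, then global max pooling, a rectified fully connected layer and a sigmoid output layer. This is only a rough sketch under assumptions. The pooling window of 2, the number of features per frame, the number of output classes, the random untrained weights and all function names are hypothetical, and training with Adam is not shown.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    # Numerically stable form for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def conv1d_relu(x, weights, bias):
    """x: [time][channels]; weights: [filters][kernel][channels].

    No padding, step size 1, followed by the rectifier.
    """
    k = len(weights[0])
    out = []
    for t in range(len(x) - k + 1):
        frame = []
        for w, b in zip(weights, bias):
            s = b
            for dk in range(k):
                s += sum(xi * wi for xi, wi in zip(x[t + dk], w[dk]))
            frame.append(relu(s))
        out.append(frame)
    return out


def max_pool(x, size=2):
    """Non-overlapping max over time; an incomplete last window is dropped."""
    return [[max(col) for col in zip(*x[t:t + size])]
            for t in range(0, len(x) - size + 1, size)]


def global_max_pool(x):
    """Max over the whole time axis, one value per channel."""
    return [max(col) for col in zip(*x)]


def dense(v, weights, bias, activation):
    return [activation(sum(vi * wi for vi, wi in zip(v, w)) + b)
            for w, b in zip(weights, bias)]


def init_params(n_features, n_classes, n_filters=400, kernel=3, hidden=200, seed=0):
    """Random placeholder weights; a real model would learn these."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "conv1": ([mat(kernel, n_features) for _ in range(n_filters)], [0.0] * n_filters),
        "conv2": ([mat(kernel, n_filters) for _ in range(n_filters)], [0.0] * n_filters),
        "fc1": (mat(hidden, n_filters), [0.0] * hidden),
        "fc2": (mat(n_classes, hidden), [0.0] * n_classes),
    }


def forward(x, params, pool=2):
    """Return one sigmoid score per word label for a [time][features] input."""
    h = x
    for name in ("conv1", "conv2"):  # the conv + rectifier + pool unit, twice
        w, b = params[name]
        h = max_pool(conv1d_relu(h, w, b), pool)
    h = global_max_pool(h)
    h = dense(h, *params["fc1"], relu)
    return dense(h, *params["fc2"], sigmoid)
```

With the default 400 filters this runs slowly in pure Python, so for a quick experiment smaller values such as `init_params(n_features=4, n_classes=10, n_filters=8, hidden=5)` are more practical. With a pooling window of 2, the input needs at least 10 frames for both convolution units to produce an output.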
This potentially facilitates a complementary synergy between human users and machines. After an internally vocalized phrase is recognized, the
computer contextually processes the phrase according to the relevant application the user is accessing. For example, an Internet of Things application would
assign the internally vocalized digit 3 to device number 3, whereas the mathematics application would consider the same input as the actual number 3.
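The idea that the same recognized word means different things in different applications can be illustrated with a small Python sketch. Everything here is hypothetical and chosen for illustration: the application names, the returned tuples and the `interpret` function are assumptions, not part of the actual system.

```python
# Recognized word labels for the digits (an assumed label set).
DIGIT_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def interpret(word, application):
    """Map one recognized digit word to an application-specific meaning."""
    value = DIGIT_WORDS[word]
    if application == "iot":
        # In an Internet of Things context the digit selects a device.
        return ("select_device", value)
    if application == "arithmetic":
        # In an arithmetic context the digit is the number itself.
        return ("operand", value)
    raise ValueError(f"unknown application: {application}")
```

For instance, `interpret("three", "iot")` would select device number 3, while `interpret("three", "arithmetic")` would treat the same word as the operand 3.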
The output computed by the application is then converted using text-to-speech and aurally transmitted to the user. The device enables users to
silently interface with a computing device, where certain computing applications respond to subvocal queries
with aural feedback, thereby enabling a closed loop. For example, the system allows users to internally vocalize
an arithmetic expression like 3 plus 5; the application would then output the answer, 8, to the user through bone conduction headphones. The device can also be used as a
reminder and scheduler set for a specific time, which is aurally output to the user at the corresponding time, thereby providing a form of memory augmentation
to the user. Silent communication can also be used with artificial intelligence. The researchers implemented human-AI collaborative chess
and Go through bi-directional silent speech, where the user would silently convey the game state and the AI would compute and then aurally output the
next move to be played. The device can also be used solely as a silent speech input to applications, to
control devices and to avail of services. An example is connecting to a smart home controller: the device enables the user
to control home appliances, switching them on and off, with television control, home lighting and so on, through internal speech.
The interface can be personalized and trained to recognize phrases meant to access specific services.
For example, internally vocalizing "Uber to home" could be used to book transport from the user's current location. The
device can also augment the way people communicate. It can be used as a back channel to communicate with other people, or even in a public session in the
future. In the current version of the device, the user can internally communicate five common conversational phrases to
another person through the interface. This interface could be expanded with further training. The researchers have also created an environment
for the device that allows peripheral devices to be directly interfaced with the system. For instance, smart glasses could directly communicate
with the device and provide contextual information to and from the device. To evaluate the robustness of the platform,
ten participants were invited to the evaluation experiment, none of whom had taken part in the same study before. The arithmetic computation
application was used as the basis for accuracy evaluation. Each participant was shown a total of 750 digits, randomly sequenced on a computer
screen, and was instructed to read each number to themselves without producing a sound or moving their lips. The digits were randomly picked from
0 to 9 such that each digit appeared exactly 75 times. The data was recorded for each user with the trigger
voltage marking word label alignments. The experiment was conducted in two phases: first, collecting silent speech data
from users and evaluating word accuracy on a train-test split; second, assessing the recognition latency of the interface by testing live inputs on the model
trained in the previous step. The experiment achieved a median accuracy of 92% on a standard digit recognition test, which outperforms
conventional methods, among others. So, thanks for watching.