00:00:00
Hi, I'm Lulu. Today I'm going to introduce a bidirectional human-computer interaction device from the MIT Media Lab called AlterEgo, and I hope this device can reflect the
00:00:12
future trend of human-computer interaction, since the idea is that human intelligence should be complemented by machines, such that computers are tasked with performing the routine,
00:00:25
formulaic work, paving the way for insights and decisions by humans. Consider that our current interfaces are a barrier to effortless and private
00:00:37
human-machine communication: people either have to shift their attention away from their surroundings to type, or they have to say their private messages
00:00:47
out loud in public. AlterEgo was presented to enable users to communicate with devices, AI assistants, applications, or other people in a silent, concealed, and
00:01:02
seamless manner. So let's start with a TED talk from one of the principal researchers. As the video shows, AlterEgo
00:01:15
is a wearable silent speech interface that enables discreet, seamless, and bidirectional communication with a computing device in natural language
00:01:27
without visible movements or voice input. It is distinguished from other systems on five points. First, the system captures neuromuscular signals from the
00:01:41
surface of the user's skin via a wearable device rather than in an invasive way. Second, existing non-invasive real-time methods with robust accuracies
00:01:54
require the users to explicitly mouth their speech with pronounced, apparent facial movements; the AlterEgo system performs robustly even when the user
00:02:08
does not open their mouth or make any sound. Third, the system achieves a median accuracy of 92% on a standard digit recognition test,
00:02:20
outperforming conventional methods, among others. Fourth, the device is a self-contained wearable system: a user just
00:02:34
needs to wear it for it to function, and the device connects wirelessly over Bluetooth to any external computing device. Last, unlike proposed
00:02:49
traditional brain-computer interfaces, the platform does not have access to private information or thoughts, and the input is voluntary on the user's part, since the
00:03:02
signals are captured from the facial and neck areas. I will expand on the details in the following slides. Before actually getting to AlterEgo, let's get to know
00:03:15
internal vocalization, also called subvocalization. It is the internal speech typically made when reading, providing the sound of the word as it is
00:03:26
read. It is a natural process when reading and helps the mind to access meanings, to comprehend, and to remember what is read. Most of these
00:03:39
movements are in fact undetectable by the person who is reading without the aid of machines. Therefore, silent speech interfaces were
00:03:51
introduced to achieve this goal. A silent speech interface allows communication without using the sound made when people vocalize their speech sounds. It is an
00:04:05
exciting and growing technology, as it is very suitable for human-machine interaction. The goal of a silent speech interface can be to generate actual sound
00:04:17
or text, or to be used as a means of interfacing between humans and computer systems. Such devices were created to help those who aren't
00:04:29
able to produce sound for audible speech. Another use is for communication when speech is masked by background noise. A further practical use is where a
00:04:41
need exists for silent communication, such as when privacy is required in a public place, or when silent, hands-free data transmission is needed during a military
00:04:55
or security operation. These systems can be categorized under two primary approaches: invasive and non-invasive. There
00:05:08
have been several previous attempts at achieving silent speech communication, like sensors placed on the tongue to measure tongue movements, or direct brain
00:05:22
implants to achieve silent speech recognition on the invasive side, versus non-invasive approaches, like reading facial muscle
00:05:33
movements using surface electromyography. Both have their pros and cons; AlterEgo belongs to the latter category. Yet
00:05:49
speech is complex: brain signals precisely coordinate nearly 100 muscles to move the lips, jaw, tongue, and larynx. Simply
00:06:00
put, neurons receive neural impulses mapped from linguistic intent in the brain and deliver them to
00:06:14
neuromuscular junctions, where a single neuron innervates multiple muscle fibers. The ionic movement caused by
00:06:26
muscle fiber recruitment generates time-varying potential difference patterns in the facial and neck muscles while intending to speak, leading to a corresponding
00:06:40
myoelectric signal that is detected by the system from the surface of the facial skin. Among the various muscle articulators involved in
00:06:55
speech production, the AlterEgo team focused their investigation on the laryngeal and hyoid regions, along with the
00:07:09
buccal, mental, oral, and infraorbital regions, to detect signal signatures in a non-invasive manner.
00:07:21
To determine the spatial locations of the detection points, they selected seven target areas from an initial 30-point grid spatially covering the
00:07:35
aforementioned regions. The selection was done by recording experimental data and ranking: they ranked the 30 potential target locations according
00:07:48
to a feature ranking evaluating how well signals sourced from each target were able to differentiate between word labels in the datasets, and
00:07:59
chose the top seven as the detection areas for the device. The final positions of the electrodes on the skin within the selected
00:08:15
regions were then adjusted empirically.
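To make the idea concrete, here is a hypothetical Python sketch of such a ranking step; the ANOVA F-score is my stand-in for the unspecified feature-ranking criterion, and all names are illustrative:

```python
import numpy as np
from sklearn.feature_selection import f_classif

def top_channels(X, y, k=7):
    """X: (n_recordings, 30) per-location signal features; y: word labels."""
    scores, _ = f_classif(X, y)          # how well each location separates the labels
    return np.argsort(scores)[::-1][:k]  # indices of the k best locations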
00:08:26
Signals are captured using electrodes at the seven selected target areas. First, the signals are filtered by a fourth-order infinite impulse response Butterworth filter: the low-pass filter prevents signal aliasing
00:08:42
artifacts, while the high-pass filter avoids movement artifacts in the signal. A notch filter is also applied at 60 Hz to cancel power-line interference in
00:08:57
the hardware. After independent component analysis to further remove movement artifacts, the signals are digitally rectified,
00:09:10
normalized to a range of 0 to 1, and concatenated as an integer stream. The stream is sent to a mobile computational device through
00:09:24
Bluetooth, which subsequently sends the data to a server hosting the recognition model to classify the silent words.
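As a rough illustration, a preprocessing chain like the one described might look as follows in Python; the sampling rate and cutoff frequencies are placeholders I chose, not the published parameters:

```python
import numpy as np
from scipy.signal import butter, iirnotch, sosfiltfilt, tf2sos
from sklearn.decomposition import FastICA

FS = 250.0  # assumed sampling rate in Hz

def preprocess(emg):
    """emg: (n_samples, 7) raw readings from the seven electrodes."""
    # 4th-order IIR Butterworth band-pass: the low-pass side prevents
    # aliasing artifacts, the high-pass side removes movement artifacts.
    sos = butter(4, [1.3, 100.0], btype="bandpass", fs=FS, output="sos")
    x = sosfiltfilt(sos, emg, axis=0)
    # Notch filter at 60 Hz to cancel power-line interference.
    b, a = iirnotch(60.0, 30.0, fs=FS)
    x = sosfiltfilt(tf2sos(b, a), x, axis=0)
    # ICA to further suppress movement artifacts (components that would be
    # rejected by inspection are simply kept in this sketch).
    x = FastICA(n_components=x.shape[1], random_state=0).fit_transform(x)
    # Digital rectification, then per-channel normalization to [0, 1].
    x = np.abs(x)
    x = (x - x.min(axis=0)) / (np.ptp(x, axis=0) + 1e-9)
    # Quantize and concatenate as an integer stream for Bluetooth transfer.
    return (x * 255).astype(np.uint8).ravel()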
00:09:37
The data corpus for the experiments includes datasets of various vocabulary sizes, created by collecting data from three participants during two main phases.
00:09:52
First, the participants joined a pilot study to investigate the feasibility of signal detection and to determine electrode positions. The initial datasets
00:10:06
recorded with the participants were binary, with the word label being yes or no. The vocabulary size gradually increased with more words. In sum,
00:10:20
the data collected during this study amounts to about five hours of subvocalized text. In the second phase, a data corpus was
00:10:32
created for training a classifier. The corpus has about 31 hours of internally spoken text recorded in different sessions, to be
00:10:46
able to train the recognition model for session independence. The corpus comprises multiple differently sized
00:11:00
datasets. For example, in one category the word labels are numerical digits along with fundamental mathematical operations, to facilitate externalizing arithmetic
00:11:15
computation through the interface. They used an external trigger signal to slice the data into word instances. In each recording session, signals were
00:11:29
recorded for randomly chosen words from a specific vocabulary set. This data is used to train the recognition model for
00:11:41
various applications, like a world clock, a calendar, and a chess game. Before the signals are fed to the recognition
00:11:53
model, a filter is used to identify and omit single spikes in the stream with amplitudes greater than the average value of their nearest four points
00:12:08
before and after.
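A minimal sketch of that spike-rejection rule might look like this; the threshold factor K is my assumption, since the talk only says the spikes exceed the local average:

```python
import numpy as np

K = 2.0  # assumed factor above the local average that marks a spike

def remove_spikes(x):
    """x: 1-D signal stream; lone spikes are replaced by the local average."""
    y = x.copy()
    for i in range(4, len(x) - 4):
        local = np.r_[x[i-4:i], x[i+1:i+5]]   # four points before and after
        if x[i] > K * local.mean():           # single outlying spike
            y[i] = local.mean()               # omit it by smoothing it away
    return y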
00:12:21
During signal processing, mel-frequency cepstral analysis and a discrete cosine transform are applied to frame the signal stream and process it without hand-picking any features. This feature representation is passed through a one-dimensional convolutional neural network
00:12:35
to classify it into word labels, with the architecture described as follows. The hidden layer convolves 400 filters of kernel size 3 with stride 1 to
00:12:50
process the input, which then passes through a rectifying nonlinearity. This is subsequently followed
00:13:06
by a max pooling layer. This unit is repeated twice before globally max pooling over its input. This is followed by a fully connected layer of dimension
00:13:21
200 passed through a rectifying nonlinearity, which is followed by another fully connected layer with a sigmoid activation. The network was optimized
00:13:36
using first-order gradient descent, and parameters were updated using Adam during training.
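Putting the description together, a sketch of this network in Keras could look like the following; the input dimensions and loss function are assumptions on my part, since only the layer structure is specified:

```python
from tensorflow.keras import layers, models

n_frames, n_coeffs, n_words = 100, 13, 10  # assumed shapes, not published values

model = models.Sequential([
    layers.Input(shape=(n_frames, n_coeffs)),
    # Hidden unit: 400 filters of kernel size 3, stride 1, ReLU, max pooling...
    layers.Conv1D(400, 3, strides=1, activation="relu"),
    layers.MaxPooling1D(),
    # ...the unit is repeated before globally max pooling over its input.
    layers.Conv1D(400, 3, strides=1, activation="relu"),
    layers.MaxPooling1D(),
    layers.GlobalMaxPooling1D(),
    layers.Dense(200, activation="relu"),        # fully connected, dimension 200
    layers.Dense(n_words, activation="sigmoid"), # sigmoid output over word labels
])
# First-order gradient descent with Adam parameter updates.
model.compile(optimizer="adam", loss="binary_crossentropy")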
00:13:49
The output is transmitted to the user through bone conduction headphones as aural output. Bone conduction headphones allow the system to aurally respond to subvocal queries;
00:14:03
this potentially facilitates a complementary synergy between human users and machines. After an internally vocalized phrase is recognized, the
00:14:17
computer can contextually process the phrase according to the relevant application the user accesses. For example, an Internet of Things application would
00:14:32
map the internally vocalized digit 3 to device number 3, whereas the mathematics application would treat the same input as the actual number 3.
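A tiny hypothetical sketch of this application-dependent interpretation; the application names and return values here are mine, not the system's actual API:

```python
def interpret(label: str, application: str):
    """Route one recognized word label to the active application."""
    if application == "iot":
        return ("select_device", int(label))  # "3" -> device number 3
    if application == "math":
        return ("operand", int(label))        # "3" -> the number 3
    raise ValueError(f"unknown application: {application}")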
00:14:44
The output thus computed by the application is then converted using text-to-speech and aurally transmitted to the user. The device enables users to
00:15:00
silently interface with a computing device, where certain computing applications respond to subvocal queries
00:15:11
with aural feedback, thereby enabling a closed loop. For example, the system allows users to internally vocalize
00:15:22
arithmetic expressions like 3 plus 5, and the application would output the answer 8 to the user through bone conduction headphones.
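Here is a minimal sketch of that closed loop, assuming the recognizer emits tokens like "3", "plus", "5"; the token names and the speak() hook are illustrative stand-ins:

```python
OPS = {"plus": "+", "minus": "-", "times": "*"}

def speak(text):
    """Stand-in for text-to-speech over the bone conduction headphones."""
    print(f"[aural output] {text}")

def answer(tokens):
    expr = " ".join(OPS.get(t, t) for t in tokens)
    speak(str(eval(expr)))  # eval is for illustration only

answer(["3", "plus", "5"])  # -> [aural output] 8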
00:15:37
The device can also be used as a reminder and a scheduler: a reminder set for a specific time is aurally output at the corresponding time, thereby providing a form of memory augmentation
00:15:50
to the user. The silent communication can also be used with artificial intelligence: the researchers implemented human-AI collaborative chess
00:16:05
and Go through bidirectional silent speech, where the user silently conveys the game state and the AI computes and then aurally outputs the
00:16:17
next move to be played. The device can also be used as a simple silent speech input to applications to
00:16:28
control devices and to avail services. One example is connecting to a smartphone controller: the device enables the user
00:16:39
to control home switches (turning them on and off), television controls, home lighting, etc., through internal speech.
00:16:54
The interface can also be personalized and trained to recognize phrases meant to access specific services.
00:17:07
As an example, internally vocalizing "Uber to home" could be used to book transport from the user's current location. The
00:17:19
device can also augment the way people communicate: it can be used as a back channel to communicate with other people, or even in a public session, in the
00:17:33
future. With the current version of the device, the user can internally communicate five common conversational phrases to
00:17:46
another person through the interface. This could be expanded with further training, since the researchers have created an environment
00:17:57
for the device which allows peripheral devices to be directly interfaced with the system. For instance, smart glasses could directly communicate
00:18:13
with the device and exchange contextual information to and from the device. To evaluate the robustness of the platform,
00:18:27
ten participants were invited to the evaluation experiment; none of them had taken part in a similar study before. The arithmetic computation
00:18:40
application was used as the basis for accuracy evaluation. Each participant was shown a total of 750 digits in a random sequence on a computer
00:18:55
screen and instructed to read each number to themselves without producing a sound or moving their lips. The digits were randomly picked from
00:19:09
0 to 9 such that each digit appeared exactly 75 times.
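A short sketch of how such a balanced random sequence can be constructed:

```python
import random

sequence = [d for d in range(10) for _ in range(75)]  # 75 copies of each digit
random.shuffle(sequence)                              # randomized presentation order
assert len(sequence) == 750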
00:19:23
The data was recorded for each user, with a trigger voltage marking word label alignments. The experiment was conducted in two phases: first, collect silent speech data
00:19:36
from users and evaluate word accuracy on a train-test split; second, assess the recognition latency of the interface by testing live inputs on the model
00:19:50
trained in the previous step. The experiment achieved a median accuracy of 92% on a standard recognition test, outperforming
00:20:04
conventional methods, among others. So, thanks for watching!