Auditory Perception Psychology Pdf Part-2



In this article we continue our discussion of auditory perception. This second article starts with multi-modal speech perception, as demonstrated by the McGurk effect. We will also talk about how mirror neurons may play a role in speech production and perception, and about some challenges to the motor theory of speech perception. The article closes with the general auditory approach to speech perception and the Fuzzy Logic Model of Speech Perception (FLMP).

If you remember, in the last article we talked a little about the physiology of the ear and some physical characteristics of sound, and we started on theories of speech perception. The theory we were discussing when I ended the last article was the motor theory of speech perception, and we will continue from there. An interesting implication of the motor theory is that understanding speech requires the listener to figure out which gestures created a given acoustic signal. The system will therefore use any sort of information that can help identify these gestures.

The McGurk Effect


Acoustics offer only cues as to what those gestures might be, so help can be taken from other perceptual systems as well. If other perceptual systems can provide any kind of clue about the gestures, the motor theory says that the speech perception system will take up this information and use it in understanding speech. In fact, two non-auditory perceptual systems, vision and touch, have been shown to affect speech perception. The most famous demonstration of this multi-modal effect is the McGurk effect, first reported by McGurk and MacDonald in 1976.


You can find many videos of the McGurk effect on YouTube, but the crux is this: people watch a video of a person talking, but the audio portion of the tape has been altered. For example, the video might show a person saying ga while the audio signal is of a person saying ba. What generally happens is that people perceive neither ga nor ba. They perceive a combination of the two, which comes out as the sound da.

So you are hearing ba, but the video shows the person saying ga, and the system combines these two pieces of information and comes up with an entirely new sound, da. We know this is what is happening because if the visual information is removed, the auditory information is accurately perceived and the person hears ba. If you close your eyes and listen, the visual information is gone, and you correctly perceive what was actually said, ba.


The McGurk effect has been shown to be incredibly robust. It happens even when people are fully warned: even if they are told that the audio and the video are different, they still cannot control the integration of the two sources of information, and they still come up with this third category.

Speech Perception System

The McGurk effect happens because our speech perception system combines visual and auditory information when perceiving speech, rather than relying on either alone. Of course, auditory information by itself is sometimes sufficient for perception to work, for example when you are talking on the phone, where these kinds of effects are not there. But the McGurk effect shows that visual information also influences speech perception whenever it is available. The McGurk effect is an example of what is called multi-modal perception, because two sensory modalities, visual and auditory, are used to create the subjective experience of the sound.

Another way to create a variant of the McGurk effect is to combine haptic information with auditory information to change the way people perceive a spoken syllable. This kind of speech perception through touch occurs outside the laboratory from time to time, in a specialized mode called Tadoma.


Helen Keller and other people with hearing and visual impairments have learned to speak by using their sense of touch to feel the articulatory movements involved in speech. Remember, we are still talking about the motor theory of speech perception: the goal is still to identify whatever gestures were used, and that is supposed to help us understand the sound.

Carol Fowler ran an experiment in which participants felt her lips while they listened to a recording of a female speaker producing a variety of syllables.

Blindfolded and Under Gloves

Blindfolded and wearing gloves, the participants heard the syllable ga over the speaker while Carol Fowler simultaneously mouthed the syllable ba. As in the traditional McGurk effect, they reported hearing the syllable da.

The motor theory explains both versions of the McGurk effect, the visual and the haptic, as stemming from the same basic process. The goal of the speech perception system is not a spectral analysis of the auditory input. Rather, it is to figure out the set of gestures that produced the sounds. According to the motor theory, both visual and haptic information are used in making this judgment, and that is what leads to the combined percept. Under natural circumstances, the visual, auditory and touch information all line up and do not conflict as they do in the McGurk video.


So in natural settings it all works fine, but an experimental situation like this one can lead to confusion, as we saw in the perception of the syllable da.

The motor theory of speech perception repeatedly stresses the importance of the motor aspects of speech, and it has been a very popular theory. Further support for it came from a chance discovery by researchers working on macaque monkeys. They discovered that particular neurons in the monkey's frontal cortex responded when the monkey performed an action, but also when the monkey observed that action being performed. These neurons were called mirror neurons.

The Mirror Neurons

The existence of mirror neurons in monkeys was established by invasive single-cell recordings. Because recordings of this kind are not generally possible in humans, it remains a hypothesis that the human brain, which is very similar to the monkey brain, contains similar neurons. However, the part of the macaque brain that has mirror neurons is similar to Broca's area in the human brain, which, incidentally, is also involved in the production of speech, so it too does something motor that is related to speech. You may remember Broca's area from the earlier article on brain and behavior. Neuroimaging, and research involving direct recording from neurons in Broca's area, show that Broca's area participates in speech perception.


The researchers who discovered mirror neurons proposed that they could be the neurological mechanism that the motor theory of speech perception requires. That is, mirror neurons in Broca's area could fire both when an individual produces a particular set of phonemes and when the individual hears that set of phonemes, providing the bridge between speaking and listening.

So if you are speaking a set of phonemes, the mirror neurons in Broca's area are firing, and if you are listening to the same sounds, the same neurons fire again. The same neurological structure is involved in both speaking and listening.

Experiments have been conducted to find non-invasive evidence for the participation of mirror neurons, or of the motor cortex, in speech perception. The motor cortex has long been known to participate in speech production, but remember that here we are talking about speech perception. The motor theory says that accessing representations of specific speech gestures must underlie speech perception. These representations must be stored in the parts of the brain that control articulatory movements: the parts of the brain involved in making the movements must also be the parts that store the information about which gestures have been used. The parts of the brain that control articulation are the motor cortex in the frontal lobes and the adjacent premotor cortex, so on this account these areas should also be used when we perceive speech.

Proponents of mirror neurons argue that they are the neural mechanism that establishes the link between heard speech and the motor representations that underlie speech production. Mirror neurons have more recently been found in the monkey equivalent of the motor cortex as well, and proponents view this as evidence that the motor cortex responds during speech perception too. Some mirror neuron theorists argue further that mirror neurons play a role in modern humans because our speech perception and production processes evolved from a manual gesture system. On this theory of language evolution, it is because we initially used manual gestures that the mirror neurons, or the motor cortex, are involved in the perception and production of speech. That is a story for another day. For now, let us examine some evidence about the involvement of mirror neurons, or the motor cortex, in human speech perception.


Pulvermüller and colleagues conducted a study in which participants, on listening trials, heard syllables that began with either bilabial stops like pa or ba, or alveolar stops like ta or da. On silent production trials, the participants imagined themselves making these sounds, so the study covered both production and perception. Brain activity was measured using fMRI, which, if you remember, measures the flow of oxygenated blood to the areas involved in a particular cognitive task.

Listening to speech caused substantial activity in the superior parts of the temporal lobes on both sides of the participants' brains, but it also caused a lot of activity in the motor cortex in the participants' frontal lobes. Further, the activity in the motor cortex depended on what kind of speech sound the participants were listening to: the activation differed depending on whether the sound was a bilabial stop or an alveolar stop. The motor theory explains this result by saying that the same areas that produce speech are involved in perceiving it.


So this is one kind of confirmatory evidence. In another study, when transcranial magnetic stimulation (TMS) was applied to participants' motor cortex, they were less able to tell the difference between two similar phonemes. If those areas are not working normally, your perception of the two phonemes is also attenuated. Further, when people listen to speech sounds that involve tongue movements while TMS is applied to the parts of the motor cortex that control the tongue, increased motor evoked potentials are observed in their tongue muscles.

So there is some processing happening there as well. Put together, these experiments show that the motor cortex does generate neural activity in response to heard speech, consistent with what the motor theory says.

But there have been challenges to the motor theory of speech perception as well. Some of them are rooted in the link the theory makes between perception and production. For example, infants are fully capable of perceiving speech sounds despite being thoroughly incapable of producing them. To account for this, we have to conclude either that infants are born with an innate set of speech motor representations, or that having speech motor representations is not necessary for perceiving phonemes. If we accept the latter, we violate what the motor theory claims. Additional experiments have also cast doubt on whether speech motor representations are necessary for speech perception.


No one would suggest that non-human animals have a supply of speech motor representations for human speech sounds. Yet animals such as Japanese quail and chinchillas have been found to respond to one class of speech sounds and refrain from responding to other classes, which tells us that they have this aspect of speech perception and can differentiate between these sounds.

For the motor theory to hold, these animals would have to know which gestures are involved in producing these sounds, which is a non-starter: because they lack the human articulatory apparatus, they cannot have speech motor representations. But since they respond to these aspects of speech very much as humans do, the motor theory's claim that speech motor representations are a necessary part of speech perception is weakened.


Research with aphasic patients also casts doubt on the motor theory. Broca and Wernicke showed that some brain-damaged patients could understand speech but not produce it, while others could produce speech but not understand it. According to the motor theory's claims, this should not be possible. If speech perception requires access to intact motor representations, then brain damage that impairs spoken language output should also impair spoken language comprehension. The existence of these clear dissociations between the speech perception and production systems therefore provides strong evidence against the motor theory.

Another Problem

Another problem for this account is that the same speech sound can be produced by different articulatory gestures, as shown in a study by MacNeilage in 1970. More specifically, different people can produce the same phoneme using different configurations of the vocal tract. The vocal tract offers a number of locations where the airflow can be restricted, and because different combinations of airflow restrictions have the same physical effect, they wind up producing similar acoustic signals that are indistinguishable to the perceiver. So a perceiver might hear the same sound produced by different articulators and different gestural scores. It then becomes very difficult to say that a single gesture is responsible for a sound, because there are multiple gestures that could have produced it.


There are also interesting experiments on what are called bite-block vowels, in which people hold something between their teeth while producing the sound, and listeners can still understand it. This weakens the motor theory further. The motor theory could try to account for these findings in one of two ways. Either more than one speech motor representation goes with a given phoneme, or there is a single set of prototype speech motor representations, and an acoustic analysis of the speech signal determines which of these ideal gestures most closely matches the acoustic signal.

If you look closely, both of these options violate the spirit of the motor theory. To repeat: the theory can say either that more than one speech motor representation goes with a given phoneme, or that there is a single prototype that is selected by acoustic analysis. Both contradict what the motor theory of speech perception originally claimed. In that sense the theory is weakened, and it is not able to explain all the findings.

Other Theories of Speech Perception

When a big theory has flaws, other theories step in. The other important theory of speech perception is the general auditory approach.


The general auditory approach starts with the assumption that speech sounds are perceived using the same mechanisms of audition and perceptual learning that have evolved in humans to handle all other classes of sounds. In other words, speech perception is not special: you understand speech the way you understand all other sounds. Researchers in the general auditory tradition look for consistent patterns in the acoustic signal that appear whenever particular speech properties are present. Further, they seek to explain commonalities in the way different people, and even different species, react to aspects of speech. For example, some studies have looked at the way people and animals respond to voicing contrasts.

Take the example I was talking about, pa versus ba. These studies suggest that our ability to perceive voicing is related to fundamental properties of the auditory system, and not to a special module of the kind proposed by the motor theory. We can tell that two sounds did not occur simultaneously only if they begin more than 20 milliseconds apart. So it is just a matter of timing, not of anything special about the speech signal. If two sounds are presented within 20 milliseconds of each other, we perceive them as simultaneous. If one starts more than 20 milliseconds before or after the other, we perceive them as separate sounds, one before the other. The voicing boundary for people and for quail sits right at this same difference of 20 milliseconds.

You can see the generality here between the human auditory system and the quail auditory system (a quail, by the way, is a kind of bird). If vocal fold vibration starts within 20 milliseconds of the burst, we perceive the phoneme as voiced, as in ba. If it starts more than 20 milliseconds after the burst, the phoneme is perceived as unvoiced, as in pa, the same example we have been using.

So this general aspect of phonological perception can be said to be based on a fundamental property of auditory perception, rather than on the peculiarities of the gestures that go into voiced and unvoiced sounds.
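The timing rule described above can be sketched in a few lines of Python. This is only an illustration: it assumes a hard cutoff at the 20 millisecond figure quoted in the text, and the names `VOICING_BOUNDARY_MS` and `classify_stop` are hypothetical, not taken from any published model.

```python
# Assumed boundary, taken from the 20 ms figure quoted in the text.
VOICING_BOUNDARY_MS = 20.0


def classify_stop(voice_onset_delay_ms):
    """Label a bilabial stop from timing alone.

    voice_onset_delay_ms is the delay between the release burst and the
    start of vocal fold vibration. No knowledge of gestures is used.
    """
    if voice_onset_delay_ms <= VOICING_BOUNDARY_MS:
        return "ba"  # voicing starts within the boundary: heard as voiced
    return "pa"      # voicing starts later: heard as unvoiced


for delay in (5, 20, 60):
    print(delay, "ms ->", classify_stop(delay))
```

The point of the sketch is that the decision needs nothing but a clock, which is why a quail could in principle make the same distinction.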

If I were an auditory perception system, I would not need to keep tabs on whether a sound is voiced or not voiced. I would just keep track of time, and if the onsets were sufficiently far apart I would treat them as different. I could be a quail or any other animal and still do this task perfectly.

The general auditory approach does not offer an explanation of the full range of human speech perception capabilities, but its chief advantage lies in its ability to explain common characteristics of human and non-human speech perception and production. Since the theory is not committed to gestures as the fundamental unit of phonological representation, it is also not vulnerable to the flaws of the motor theory, which held that the link between speech perception and production is necessary.

A different, more recent and more popular model of speech perception is the Fuzzy Logic Model of Speech Perception (FLMP).


FLMP is one of the better-known approaches within the general auditory tradition. It differs from the motor theory of speech perception in holding that there is a single set of ideal, or prototype, representations of speech sounds, defined by their acoustic characteristics.


According to FLMP, speech perception reflects the outcome of two kinds of processes. Bottom-up processes are the mental operations that analyze the acoustic properties of the incoming speech stimulus and activate a set of potentially matching phonological representations. Top-down processes use what is stored in memory to choose among them. So imagine someone is speaking to you: one set of operations is already analyzing the incoming signal in terms of its basic physical characteristics, and the other set is looking into your memory for whatever information you have about this sound. They meet somewhere in the middle, and then you can understand the sound that was produced.

FLMP specifies that there are many stored representations of phonemes, and that they are activated to different degrees depending on how similar they are to the acoustic properties of the speech stimulus. More similar phonemes attain higher degrees of activation, and less similar phonemes attain lower degrees. So when you hear a particular sound, all the stored sounds similar to it are activated, and each can potentially be matched against the incoming sound. The top-down processes are the mental operations that use information in long-term memory to select the best candidate from among the set activated by the bottom-up processes.

So the bottom-up analysis activates many candidates that could match the incoming information, and the top-down operations do the matching, selecting the candidate that best fits the incoming stimulus. Once that match is made, you understand the stimulus, provided you have heard it before. This may be especially important if the incoming information is ambiguous or degraded.

Let me take an example. When the phoneme n precedes the phoneme b, as in lean bacon, and the phrase is said very fast, co-articulation comes into play: because n and b come so close together, the n can come out sounding like m. The word lean can then be perceived as leam, and many people will report hearing leam bacon. If you want to try it, say the phrase very fast to yourself again and again and notice what happens.

Lean Bacon

When somebody listens to lean bacon, the bottom-up processes activate both prototypes, n and m, so lean bacon and leam bacon are both activated. According to the Fuzzy Logic Model of Speech Perception, our knowledge that lean is a meaningful word and a likely one in English causes us to favor lean bacon over leam bacon, and that is how we understand what was said. However, if the n sound appears in a non-word, such as pleen bacon, a listener is more likely to favor the m interpretation, because the n sound does not receive any support from top-down processes: I do not know any word pleen.
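Here is a rough sketch of how the two sources of information in this example could be combined. Published descriptions of FLMP combine the support for each candidate by multiplying the sources together and then dividing by the total across candidates. The support values below are invented purely for illustration, the non-word leam and the non-word pair pleen and pleem are used only as labels, and the function name `integrate` is hypothetical.

```python
def integrate(bottom_up, top_down):
    """Multiply the two supports for each candidate, then normalize to sum to 1."""
    combined = {c: bottom_up[c] * top_down[c] for c in bottom_up}
    total = sum(combined.values())
    return {c: value / total for c, value in combined.items()}


# Bottom-up support: co-articulation before b makes the sound slightly more m-like.
# (Invented numbers.)
acoustic = {"n": 0.4, "m": 0.6}

# Top-down support when the candidates are "lean" (a word) and "leam" (not a word).
word_context = {"n": 0.9, "m": 0.1}

# Top-down support when both candidates are non-words ("pleen", "pleem"): neutral.
nonword_context = {"n": 0.5, "m": 0.5}

word_result = integrate(acoustic, word_context)
nonword_result = integrate(acoustic, nonword_context)

print("word context favors:", max(word_result, key=word_result.get))
print("non-word context favors:", max(nonword_result, key=nonword_result.get))
```

With these made-up values, lexical knowledge outweighs the slightly m-like acoustics in the word case, while in the non-word case nothing pushes back and the bottom-up evidence decides.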

So I go by the bottom-up analysis and pick whatever is coming from there. This tendency to perceive ambiguous speech stimuli as real words whenever possible is called the Ganong effect, named after William Ganong (1980).

FLMP also offers a mechanism that can produce what are called phonemic restoration effects. Phonemic restoration occurs when speech stimuli are edited to create gaps. If you remember, I talked in an earlier article about an experiment with the word legislatures. People heard the word over headphones, with a coughing sound in place of the s, so what was played was legi, cough, latures. People report hearing the s even though no s was present in the signal. What is happening is that you use your previous knowledge of the word legislatures to fill in the s, and you do it so well that you are convinced there was an s in the signal.

Phonemic restoration effects are stronger for longer words than for shorter words, and stronger for sentences that are grammatical and make sense than for sentences that are ungrammatical or do not make sense. Further, the specific phoneme that is restored can depend on the meaning of the sentence that was edited. For example, if you hear the wagon lost its, then a coughing sound, then eel, you will most likely hear the phoneme w.


You hear the wagon lost its wheel, because that is the more probable completion. But if you hear the circus has lost a trained, then a coughing sound, then eel, you will probably hear an s, because circuses generally have animals like seals. So how you do the phonemic restoration depends on the context built up earlier in the sentence. Research involving event-related potentials (ERPs) has shown that the nervous system does register the presence of the coughing noise very soon after it appears in the stimulus.

So you register the coughing sound very early, but you carry out whatever mental processes you can to fill the gap that the cough creates. All of this suggests that there is a variety of possible sources of top-down information, and that these sources can affect the way an acoustic signal is perceived. Further, it suggests that perceiving speech involves analyzing the signal and also biasing the results of that analysis according to how well the different candidate representations fit with other aspects of the message.

So you might be biased to hear wheel or seal or anything else, but because I was talking about a wagon you fill the gap with wheel, because that fits better, and in the circus example you fill in s, because wheel does not fit there. You are doing these different calculations and correcting your perception of speech online. The other aspects that matter can include whether the phonological representation results in a real word, whether the semantic interpretation of the sentence makes sense, and how intact the top-down information is. If you do not remember the word exactly, or if you have no knowledge of which animals a circus has, you might not be able to fill the gap with s to get seal. If you live in a place where circuses typically do have seals, you will build the sentence the circus has lost a trained seal.

That is all about speech perception. In the next article we will begin talking about attention as a cognitive phenomenon. Thank you.
