Theories of Object Recognition Psychology
Today we will talk about theories of object recognition. Until now we have been talking about perception in general and about various theories of perception.
This article covers several approaches to object recognition, namely template matching theory, feature analysis theory, and recognition by components theory, all of which are primarily bottom-up accounts. How memory and expectations help in identifying objects is covered under top-down influences on object recognition. The article closes with a fascinating question: why is face perception special?
Theory of Object Recognition
Today we will take a particular case and see how we understand, recognize, and interact with objects in the external world. A variety of theories have been proposed to explain how visual recognition is achieved. These theories differ in the theoretical stance they take, for example whether they are bottom-up or top-down. As you may know from the previous articles, bottom-up theories focus on using the information coming from sensory experience to develop mental representations. Top-down theories favor the stance that it is our memory, experience, and knowledge of the world that help us build mental representations of the external world.
We are just adding a case here: is it the sensory information that leads us to form representations of objects and lets us recognize and interact with them, or is it our memory and our knowledge of the world? That is the difference we can talk about.
All in all, the aim of any object recognition theory is to account for our excellent capacity for object recognition: we rarely make errors, and we recognize objects rather quickly. There are two problems of object recognition, which we referred to while talking about Marr: two kinds of representation are possible.
Template Matching Theory
For a particular object, one form of representation is an object-centered representation, a view-invariant description of the object; the other is a viewer-centered representation. Say I am looking at a flower pot standing on the window of my house. How does that flower pot look as I move around, versus what is the general percept of that flower pot, which does not change however I move about the room? These are some of the problems of object recognition, and we will see how the various theories have attempted to solve them.
One of the most basic theories of object recognition is template matching theory. It says that we compare the sensory input with a set of templates that we already have.
You might already have a template of how a particular ball or a particular toy looks. You hold these templates as specific patterns, and you try to match a stored template with the sensory input. You compare the incoming input with the variety of templates you have and select the one that matches it best. Say you have templates A, B, and C, and the input is D. You will choose among A, B, and C depending on the degree of match with D. If A matches 65 percent and the other two match, say, 20 and 15 percent, you will probably choose A, because it matches D more than the other two.
Strictly, though, the template matching account looks for an exact match between the stored template and the input representation. It is not 65 percent or so; we are thinking of a hundred percent match between the stored template and the incoming sensory input. I will show you an example.
Suppose there are different ways of writing, say, the words "pattern perception". Depending on people’s handwriting, the templates for just this set of letters can be really different. Now suppose you are a system that is supposed to recognize each of these. As soon as the input varies from the template by a degree or two, you will already have trouble recognizing the pattern. An example is the machine recognition systems that examine the signatures on your checks. Say you make a very small mistake in your signature, for example a superscript or a subscript is shifted a little. The system matches this signature with the one you gave when opening the account, and if the match falls below a particular amount, it will not recognize it.
That is the problem with template matching accounts: they are extremely inflexible. If a letter differs from the appropriate template even slightly, the pattern will never be recognized; that degree of freedom is not there. Our own recognition systems, by contrast, are very good at this. If you and I write the same word, our handwriting will obviously be really different, yet you are still able to read that word perfectly. This tells us that maybe we are not really using template matching, so we will move on to other models.
Template models are useful, by the way, but they work only for isolated letters, numbers, and other simple two-dimensional shapes.
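
To make the inflexibility concrete, here is a minimal sketch of template matching in Python. Everything in it is an assumption made for illustration: the 5x5 glyphs, the names `match_score`, `best_match`, `exact_match`, and `shift_right`, and the idea of scoring a match as the fraction of agreeing cells. It is not a model taken from the literature.

```python
# Hypothetical 5x5 templates: '#' is ink, '.' is background.
TEMPLATES = {
    "T": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
    "I": ["..#..",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
}


def match_score(template, stimulus):
    """Fraction of cells on which template and stimulus agree (0.0 to 1.0)."""
    pairs = [(t, s)
             for t_row, s_row in zip(template, stimulus)
             for t, s in zip(t_row, s_row)]
    return sum(t == s for t, s in pairs) / len(pairs)


def best_match(stimulus, templates=TEMPLATES):
    """Graded version: pick the template with the highest degree of match."""
    scores = {name: match_score(t, stimulus) for name, t in templates.items()}
    return max(scores, key=scores.get), scores


def exact_match(stimulus, templates=TEMPLATES):
    """Strict version: accept only a hundred percent match, else None."""
    for name, template in templates.items():
        if template == stimulus:
            return name
    return None


def shift_right(glyph):
    """Shift the glyph one cell to the right, like slightly sloppy handwriting."""
    return ["." + row[:-1] for row in glyph]


shifted_t = shift_right(TEMPLATES["T"])
print(exact_match(shifted_t))    # the strict matcher gives up on the shifted T
print(best_match(shifted_t)[0])  # the graded matcher still picks a template
```

A one-cell shift is enough to defeat the strict matcher, which is the inflexibility described above. The graded matcher tolerates the shift here, but only because the three toy templates are so different from one another.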
Feature Analysis Theory
If you have a really simple setup, then maybe the template matching account would work, but not for complex configurations such as a long word in cursive handwriting. That brings us to a different kind of theory of object recognition, known as feature analysis theory. It proposes a more flexible approach: any visual stimulus, say a particular letter, is composed of a small number of characteristics or components.
A Feature-Analysis Approach
Let us not have a fixed template. Let us talk instead about the components that make up a particular object or pattern; what we look for is not the exact template but the presence of these components. Take the letter R. It has three components: a vertical component, a curved component, and a slanted component. If I am a system that is supposed to recognize the letter R, then irrespective of whose handwriting I am reading, I can assume that the letter will have at least these three features. In that sense I am not troubled by people’s handwriting, because anybody who writes the letter R will write at least these three components.
This is where feature analysis theory does well. In this example you see the different letters and how the features are organized: straight horizontal, vertical, and diagonal lines, closed curves, intersections, and symmetry. These are the distinctive features that Gibson used to show how letters differ from each other. Gibson believed that this is how we recognize letters, and that our higher object recognition mechanisms work the same way.
Feature analysis theories propose that the distinctive features of each letter of the alphabet remain constant, whether the letter is handwritten, typed, or photographed. These models can also explain how we perceive a wide variety of two-dimensional patterns, such as figures in a painting or designs on fabric. There are many feature analysis theories, and we are discussing them at a rather generic level, but as a group they are consistent with both psychological and neuroscience research.
Eleanor Gibson, whose research we have just been talking about, demonstrated that people take relatively longer to decide whether two letters are the same or different when the letters share a critical number of features. Say you have to decide whether a particular letter is an N or an M. These two letters share a lot of features: both have a slanted line and two vertical lines. Because they share these critical features, deciding that the two letters are different takes more time, since according to this theory you check each feature in turn. You check for a pair of vertical lines and find them in both. You check for a slanted line and find that in both as well. Only when you look for the second slanted line, the one in the other direction, do you find that M has it but N does not (I am talking about capital letters here). At that point you know the two are different. This is what feature analysis theory says about recognizing letters.
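
The feature-by-feature check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature names, the fixed checking order, and the idea that features are checked strictly one after another are choices made for the sketch, not details given by Gibson.

```python
# Hypothetical feature lists for three capital letters.
FEATURES = {
    "N": {"left vertical", "right vertical", "diagonal down-right"},
    "M": {"left vertical", "right vertical",
          "diagonal down-right", "diagonal up-right"},
    "O": {"closed curve"},
}

# Assumed order in which a serial checker looks for features.
CHECK_ORDER = ["left vertical", "right vertical",
               "diagonal down-right", "diagonal up-right", "closed curve"]


def shared_features(a, b):
    """Features present in both letters."""
    return FEATURES[a] & FEATURES[b]


def checks_until_difference(a, b, order=CHECK_ORDER):
    """Count the feature checks needed before the two letters first disagree.

    Returns None if the letters agree on every feature in `order`.
    """
    for count, feature in enumerate(order, start=1):
        if (feature in FEATURES[a]) != (feature in FEATURES[b]):
            return count
    return None


print(sorted(shared_features("N", "M")))
print(checks_until_difference("N", "M"))  # many shared features, late difference
print(checks_until_difference("N", "O"))  # no shared features, early difference
```

With this ordering, N and M agree on the first three checks and differ only on the fourth, while N and O differ on the very first check. If each check takes time, the sketch reproduces the qualitative pattern described above: the more features two letters share, the longer the "different" decision takes.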
Larsen and Bundesen designed a model based on feature analysis that correctly recognized an impressive 95 percent or so of the numbers written in street addresses and zip codes. Neuroscience research has also shown that feature analysis is something we do. Hubel and Wiesel recorded from neurons and found that they can be tuned to particular orientations, say horizontal versus slanted versus vertical lines.
So you have sets of neurons that code for specific features, and in that sense feature analysis theory has some support from the neuroscience data as well. But feature analysis also has problems, and there is some criticism of these theories, so let us talk about that. A theory of object recognition should not simply list the features. If you talk only at the level of isolated features, you are not capturing how those features are linked with each other or in what way they are joined.
Also, you may remember from the Gestalt view of perception that an object is not just components joined together: the whole is more than the sum of its parts.
A theory of object recognition, then, should not simply list the features contained in a stimulus. It must also describe the physical relationship between those features and how they are linked together. In the letter T, for example, the vertical line supports the horizontal line, whereas in the letter L the vertical line rests at the side of the horizontal line.
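
A small Python sketch shows why a bare feature list is not enough. The `Letter` type, the relation labels, and the function names are all assumptions made for this illustration; the only point taken from the text is that T and L contain the same two lines in different arrangements.

```python
from typing import NamedTuple


class Letter(NamedTuple):
    features: frozenset  # which strokes are present
    relation: str        # how the strokes are joined (hypothetical labels)


LINES = frozenset({"vertical line", "horizontal line"})
T_RELATION = "vertical under the middle of horizontal"
L_RELATION = "vertical at the end of horizontal"

LETTERS = {
    "T": Letter(LINES, T_RELATION),
    "L": Letter(LINES, L_RELATION),
}


def candidates_by_features(features):
    """Letters whose feature list matches, ignoring how the strokes are joined."""
    wanted = frozenset(features)
    return sorted(name for name, letter in LETTERS.items()
                  if letter.features == wanted)


def recognize(features, relation):
    """Letters matching both the feature list and the relation between strokes."""
    wanted = frozenset(features)
    matches = [name for name, letter in LETTERS.items()
               if letter.features == wanted and letter.relation == relation]
    return matches[0] if len(matches) == 1 else None


print(candidates_by_features(LINES))  # features alone leave two candidates
print(recognize(LINES, T_RELATION))   # adding the relation settles it
```

Features alone return both letters as candidates; only when the relation between the lines is part of the description does the sketch settle on one.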
So you can look for a horizontal and a vertical line, but you will find that both are present in L and in T. You need to specify how they are linked; only then will you be able to understand that, for example, an L is different from a T.
Feature analysis theories were constructed to explain the relatively simple recognition of letters, whereas object recognition in itself is a bigger problem. They started from a very simple scenario and a very simple problem. But objects in the world, such as plants, animals, automobiles, and houses, are much more complex than a concatenated set of features.
In that sense, feature analysis falls short of explaining the myriad problems involved in recognizing different objects. Another thing you will see is that objects move, and their features are distorted as they do. Say you are recognizing a cheetah or a horse. If you are looking at a horse while it is running, its features are changing, and so is the sensory input you are getting. That makes it much more difficult for a feature analysis theory to explain what a particular object looks like.
Imagine a display in which the features of the letters mix with each other or move. If you are reading something written on a flag while the flag moves in the wind, a stroke that is actually horizontal looks slanted at that moment. Feature analysis theory finds such cases slightly harder to explain.
So let us move to a different kind of theory: Biederman’s recognition by components theory. Biederman developed it to explain the recognition of three-dimensional shapes. The idea is similar to what Gibson was saying: in the external world we are dealing not with two-dimensional entities but with three-dimensional ones.
The Recognition by Components Theory
So let us develop a theory that explains the recognition of three-dimensional rather than two-dimensional objects. The basic assumption of recognition by components theory is that a specific view of an object can be represented as an arrangement of simple 3D shapes. Biederman called these simple 3D shapes geons. I will show you the geons shortly. Geons are supposed to combine to represent particular shapes or objects, and the idea is that recognizing each of the geons is what leads to recognizing the object. In the figure, you have geons on the left and objects on the right, and you can see that each object on the right is composed of different geons.
The telephone, for example, is composed of geons 1, 3, and 5, and the cup is made of geons 3 and 5. In some sense you extract these higher-level components from the three-dimensional objects you see and make sense of them. If you combine geon 3 with geon 5, you get what we call a cup. In other words, we recognize objects by these discrete components.
All objects can be understood as permutations and combinations of these various geons. This is pretty much what Irving Biederman’s recognition by components model says. To elaborate a little: in general, an arrangement of three geons gives people enough information to classify a particular object. In that sense, as I was already saying, Biederman’s recognition by components theory is essentially a feature analysis theory for three-dimensional objects.
Look at the figure again. You have a cylinder, which is geon 3, and a curved, pipe-like shape, which is geon 5. Combine the cylinder with the pipe-like shape and you get a cup, which is an object. These components are pretty much features, and you can combine them to get representations of objects in the world.
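
As a rough sketch, the geon idea can be written as a lookup from sets of geons to objects. The geon numbers follow the figure described above (telephone: 1, 3, 5; cup: 3, 5); the dictionary, the function names, and the decision to ignore how the geons are arranged are simplifying assumptions of this illustration, not part of Biederman’s model.

```python
# Objects described by the geon numbers used in the figure.
OBJECTS = {
    "telephone": frozenset({1, 3, 5}),
    "cup": frozenset({3, 5}),
}


def recognize(observed_geons):
    """Names of objects made of exactly the observed set of geons."""
    observed = frozenset(observed_geons)
    return sorted(name for name, geons in OBJECTS.items() if geons == observed)


def still_possible(observed_geons):
    """Objects that contain every geon seen so far (useful for a partial view)."""
    observed = frozenset(observed_geons)
    return sorted(name for name, geons in OBJECTS.items() if observed <= geons)


print(recognize([3, 5]))     # cylinder plus curved handle
print(recognize([1, 3, 5]))
print(still_possible([3]))   # a cylinder alone fits both objects
```

Because the sketch compares only sets, it cannot tell apart two objects built from the same geons in different arrangements. The theory itself speaks of an arrangement of geons, so a fuller version would also have to record how the parts are attached.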
Biederman and colleagues
Biederman and colleagues conducted fMRI research with humans and single-cell recordings with monkeys. Their findings show that areas of the cortex beyond the primary visual cortex respond to geons like those presented earlier. So we do have some coding for these three-dimensional components. Maybe we are sensitive to how these components occur as parts of different objects, and maybe that helps us recognize those objects.
So there is some data that supports recognition by components theory. The theory also requires an important modification, however, because people recognize an object more quickly when it is seen from a standard viewpoint. The phone or the cup in the figure is one representation of that object. Suppose I invert the cup, or tilt it. Then the componential analysis might change a little; at the least, the feature extraction will change.
So you need to modify recognition by components theory a little, because people recognize objects more quickly from a standard, or canonical, viewpoint than from a very different, non-canonical viewpoint. The modification is called the viewer-centered approach. Recognition by components in its standard form is an object-centered approach. But if you are a person moving around the room and getting different views of these objects, you need a viewer-centered approach. It proposes that we store a small number of views of a three-dimensional object rather than just one view.
We store just a small number of views. Think of the three or four views you can have of a cup: you store each of them and develop a componential analysis of how each view is made up, and that leads to recognizing the object.
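
Here is a minimal sketch of the viewer-centered idea in Python. It assumes tiny 3x3 silhouettes, a handful of stored views per object, and an incoming view that is turned in 90-degree steps until it lines up with a stored one. The glyphs, the names `STORED_VIEWS`, `rotate_90`, and `recognize`, and the restriction to quarter turns in the picture plane are all assumptions of the illustration.

```python
# A small number of stored views per object (hypothetical 3x3 silhouettes).
STORED_VIEWS = {
    "cup": [
        ["#.#",
         "#.#",
         "###"],   # upright, seen from the side
        ["###",
         "#.#",
         "###"],   # seen from above
    ],
    "pen": [
        [".#.",
         ".#.",
         ".#."],   # standing upright
    ],
}


def rotate_90(grid):
    """Rotate a grid of strings by 90 degrees clockwise."""
    return ["".join(row[col] for row in reversed(grid))
            for col in range(len(grid[0]))]


def recognize(view, stored=STORED_VIEWS, max_turns=4):
    """Turn the incoming view until it equals a stored view.

    Returns (object name, number of quarter turns needed), or None.
    """
    current = view
    for turns in range(max_turns):
        for name, views in stored.items():
            if current in views:
                return name, turns
        current = rotate_90(current)
    return None


inverted_cup = ["###",
                "#.#",
                "#.#"]
print(recognize(STORED_VIEWS["cup"][0]))  # canonical view: no turning needed
print(recognize(inverted_cup))            # non-canonical view: needs turning
```

In this sketch a canonical view matches immediately, while the inverted cup matches only after being turned, so it costs extra steps. That is in the spirit of the finding that objects are recognized more quickly from a standard viewpoint, although real mental rotation is continuous and three-dimensional.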
When you come across such an object, you mentally rotate the image of the object until it matches one of the views already stored in memory. Then, by a combination of a top-down and a bottom-up approach, you recognize that object.
This is a good point to talk about top-down influences. Top-down influences emphasize how a person’s concepts and higher-level mental processes influence object recognition, and more precisely how a person’s expectations and memory help in recognizing objects.
We expect certain shapes to be found in certain locations, and we expect to encounter them there because of past experience. If you are looking at your study desk, you expect to find, say, a notebook or a pen there. Suppose I bring a cylinder and keep it on the desk. If there is no light and you cannot see it, and you touch that cylinder, more often than not you will expect it to be a pen. Maybe it is not; maybe it is a pipe.
These expectations help us fill the gaps in our sensory input. Here is an example. You can probably read this as "the man ran", but both the A’s have their top part cut off, so perceptually the shape is not strictly an A. In the first word you read that shape as an H; in the second and third words you read it as an A, even though it exactly matches the H in the first word.
You know what kind of thing to expect where, and that is how top-down influence modulates your sensory experience and helps you generate a perceptual representation. Here is another example, a message I found floating around the internet; there are a lot of Facebook memes about it. It says that you can read this, and some letters, for example the first letter, are replaced by numbers that perceptually resemble the letters they stand for. In the first word, the 7 resembles a T and the 5 resembles an S, so you can still understand that the word is "this". In the second word, the 3 resembles an E and the 5 resembles an S.
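
The digit-for-letter trick can be sketched in Python as a lookup that is accepted only when the result is a word the reader already knows. The mappings 7 to T, 5 to S, and 3 to E come from the example above; the mappings for 1, 4, and 0, the tiny `KNOWN_WORDS` vocabulary, and the function name `read_word` are assumptions added for the illustration.

```python
# Digits that look like letters. 7, 5, and 3 are from the example;
# 1, 4, and 0 are assumed here so that whole words can be decoded.
LOOKALIKES = str.maketrans({"7": "T", "5": "S", "3": "E",
                            "1": "I", "4": "A", "0": "O"})

# Stand-in for the reader's stored knowledge of words (top-down information).
KNOWN_WORDS = {"THIS", "MESSAGE"}


def read_word(word):
    """Read digits as letters only if that produces a known word."""
    if word in KNOWN_WORDS:
        return word
    candidate = word.translate(LOOKALIKES)
    return candidate if candidate in KNOWN_WORDS else word


print(" ".join(read_word(w) for w in "7H15 M3554G3".split()))
print(read_word("2015"))  # no known word results, so the digits stay digits
```

The vocabulary check is what makes this a top-down sketch: the same shapes are read as letters when they complete an expected word and left as digits when they do not.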
So even with digits standing in for letters, you can still make out that the second word is "message".
Now let us move to a special case of object recognition: face recognition. According to psychologists, most people perceive faces in a slightly different fashion from other stimuli, and they say that face perception is somehow special. There is a lot of research on face perception, and all of it suggests that faces are special stimuli compared with other objects in the world. For example, young infants track the movements of a photographed human face much more than those of other similar stimuli. This shows that faces are socially important even to young infants. We are probably evolutionarily wired to treat faces as special stimuli.
This makes sense because humans are social animals, so recognizing a face is a socially important skill. Recognizing the emotion on a face is socially important too. Suppose you are looking at a person and he suddenly shouts at you. If you did not recognize from his face that he was angry and might act aggressively, and you are not already on your feet, you will pay a huge cost. So faces are a socially important, special class of stimuli, and a lot of research speaks to that. Tanaka and Farah (1993), for example, found that people were significantly more accurate in recognizing facial features when they appeared within the context of a whole face rather than in isolation.
So you really have to look at the whole face; you are not tuned to recognizing facial features without the whole face. That is probably not true of other objects and their features. In contrast, when participants judged houses, they were just as accurate in recognizing isolated house features, say a door, a window, or a gate, as in recognizing them within a whole house. This shows that we recognize faces on a holistic basis. We have an overall understanding of what a face should look like, and it is the organization of the eyes, the nose, the mouth, and the ears that we look at as a face.
Neuroscience Research on Face Recognition
In that sense it is slightly difficult if the eyes or the nose are presented separately; it takes you slightly more time to recognize them. This makes sense because face perception has a special status, given the importance of our social interactions, as I was already saying.
There is also a lot of neuroscience research on face recognition. McNeil and Warrington studied a professional who had lost his ability to recognize human faces after he experienced several strokes. This patient later changed his career and started to raise sheep. Surprisingly, he could recognize many of his sheep’s faces, though he still could not recognize human faces.
This condition is termed prosopagnosia: people with it cannot recognize human faces visually, though they perceive other objects relatively normally.
It is again a clue to how important the recognition of faces is for us as humans. The location in the brain most responsible for face recognition is the temporal cortex, at the side of the brain, generally on the right side. Specifically, it is the inferior temporal cortex, the lower portion of the temporal cortex, that is implicated. It has also been shown that certain cells in the inferior temporal cortex respond especially vigorously to faces.
Region of the Brain used in identifying Faces
This is the area that lights up when you are presented with a face. Many fMRI studies have also reported that the brain responds much more quickly to faces presented upright than to faces presented in an inverted position. Inversion changes the configuration, and you would then probably have to apply a feature analysis or recognition by components kind of approach to work out what the face is. If the face is in the canonical position, you probably gain more information from it, much more quickly.
In this representation, the yellow region is the fusiform gyrus, the region that is responsible for our recognizing faces.
That was all about object recognition, so let us sum up. We saw that object recognition can be achieved by a combination of top-down and bottom-up approaches. We also saw that the perception of faces is a special case of object recognition, because faces carry much more information and social salience than some of the other objects we interact with. Thank you.