DETAILED ACTION
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
 


Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Rodriguez-Serrano et al. (US 20170083792 A1), in view of Zadeh (US 20180204111 A1), and further in view of Strong (US 20190043201 A1, DATE FILED: September 26, 2018), and further in view of Birdwell (US 20130054603 A1).
Re Claim 1, Rodriguez-Serrano discloses a tangible, non-transitory, computer-readable medium storing computer program instructions that when executed by one or more processors effectuate operations (see Rodriguez-Serrano: e.g. -- The memory 14 may represent any type of non-transitory computer readable medium such as random access memory (RAM)… the processor 18 and memory 14 may be combined in a single chip.  Memory 14 stores instructions for performing the exemplary method--, in [0035]) comprising:
obtaining, with a computer system, (i) an image captured by a mobile computing device (see Rodriguez-Serrano: e.g. -- a query image 12. The system includes memory 14, which stores software instructions 16 for performing the method described with reference to FIG. 2 and a processor 18 in communication with the memory for executing the instructions. The system may be resident on one or more computing devices, such as the illustrated computer 20. One or more input output devices (I/O) 22, 24 allow the system 10 to communicate with external devices, such as the illustrated image capture device 26--, in [0027]-[0028], and, --The computer system 10 may include one or more computing devices 20, such as a desktop, a laptop, palmtop computer, portable digital assistant (PDA), server computer, cellular telephone, tablet computer, pager, computing device integral with or associated with the camera 26,--, in [0034]-[0035]);
Rodriguez-Serrano, however, does not explicitly disclose (ii) coordinates indicating an input location of an input detected on a display screen of the mobile computing device.
Zadeh teaches {obtaining} (ii) coordinates indicating an input location of an input detected on a display screen of the mobile computing device (see Zadeh: e.g., -- once the user clicks on some object on screen, which is traceable, as an input device (such as screen of APPLE® IPHONE), the system can find what object is chosen by the user, based on extracted objects or based on the coordinate of the objects on screen… the system will understand what the user wants to select from its screen coordinate and location of the objects,--, in [2197]-[2198]);
Rodriguez-Serrano and Zadeh are combinable as they are in the same field of endeavor: object identification/recognition/detection through feature extraction and classification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Rodriguez-Serrano’s medium using Zadeh’s teachings by adding {obtaining} (ii) coordinates indicating an input location of an input detected on a display screen of the mobile computing device to Rodriguez-Serrano’s application of object detection and classification based on a similarity metric, in order to match the images on the webpage to similar products in their catalogs (see Zadeh: e.g., in [2774]);
Rodriguez-Serrano as modified by Zadeh further disclose wherein: the input caused the image to be captured, the input location is a location in pixel-space of the image (see Zadeh: e.g., -- once the user clicks on some object on screen, which is traceable, as an input device (such as screen of APPLE® IPHONE), the system can find what object is chosen by the user, based on extracted objects or based on the coordinate of the objects on screen… the system will understand what the user wants to select from its screen coordinate and location of the objects, and then the system gives all info, docs, specs, links, web sites, history, dictionary, encyclopedia, merchants, manufacturers, agents, characteristics, related objects, suggested objects, suggested similar or replacement or complementary objects--, in [2197]-[2206]), and
the image depicts a first object located at a first location in the image (see Rodriguez-Serrano: e.g., --In object localization, the input image is expected to contain an object (or more than one), and the aim is to output information about the location of each object, such as the rectangle that tightly encompasses the object. In some instances, it may be desirable to identify a single, prominent object. The definition of “prominent object” may be specified by examples: for example, training dataset of images may be provided with their corresponding annotations of the true object locations.--, in [0003], and, -- Object location information is then transferred from at least one of the subset of annotated images to the query image. Information based on the transferred object location information is output.--, in [0016]; also see Zadeh: e.g., -- once the user clicks on some object on screen, which is traceable, as an input device (such as screen of APPLE® IPHONE), the system can find what object is chosen by the user, based on extracted objects or based on the coordinate of the objects on screen… the system will understand what the user wants to select from its screen coordinate and location of the objects, and then the system gives all info, docs, specs, links, web sites, history, dictionary, encyclopedia, merchants, manufacturers, agents, characteristics, related objects, suggested objects, suggested similar or replacement or complementary objects--, in [2197]-[2206]);
obtaining, with the computer system, a computer-vision object recognition model trained using a training data set comprising images depicting objects (see Rodriguez-Serrano: e.g. -- the CNN 43 may be/have been trained by end-to-end learning of the parameters of the neural network using a set of training images labeled by class--, in [0041], and, -- annotated image 38 is annotated with a bounding box which identifies a location of an object of interest.--, in [0029], and [0042]; also see: -- a classifier which classifies the localized object, e.g., a bird, animal, vehicle, or the like, into one of a predefined set of classes, or outputs a probabilistic assignment over some or all the classes.  In another embodiment, the information 36 may simply be a crop of the image 12 which includes the region in the bounding box, or other information extracted therefrom.  An output component 62 outputs the information 32.--, in [0032]-[0033], and see “appearance models for human pose estimation” in [0094]);
wherein: each image of the training data set is labeled with an object identifier (see Rodriguez-Serrano: e.g. -- the CNN 43 may be/have been trained by end-to-end learning of the parameters of the neural network using a set of training images labeled by class--, in [0041], and, -- annotated image 38 is annotated with a bounding box which identifies a location of an object of interest.--, in [0029], and [0042]; also see: -- a classifier which classifies the localized object, e.g., a bird, animal, vehicle, or the like, into one of a predefined set of classes, or outputs a probabilistic assignment over some or all the classes.  In another embodiment, the information 36 may simply be a crop of the image 12 which includes the region in the bounding box, or other information extracted therefrom.  An output component 62 outputs the information 32.--, in [0032]-[0033], {so that herein “object classifier”, or “bounding box” is object identifier}),
Rodriguez-Serrano as modified by Zadeh further disclose that the obtained training set depicts objects including more than 100 different objects (see Rodriguez-Serrano: e.g., -- a classifier which classifies the localized object, e.g., a bird, animal, vehicle, or the like, into one of a predefined set of classes, or outputs a probabilistic assignment over some or all the classes.  In another embodiment, the information 36 may simply be a crop of the image 12 which includes the region in the bounding box, or other information extracted therefrom.  An output component 62 outputs the information 32.--, in [0032]-[0033]),
Rodriguez-Serrano as modified by Zadeh, however, do not explicitly disclose that the above training set depicts objects in an ontology of objects.
Strong teaches that the training set depicts objects in an object ontology depicted by a corresponding image (see Strong: e.g., -- the additional features can be identified based on a defined ontology, such as an object ontology that represents or defines a hierarchy of objects and their associated relationships at multiple levels of abstraction.--, in [0659]-[0661]),
Rodriguez-Serrano (as modified by Zadeh) and Strong are combinable as they are in the same field of endeavor: object identification/recognition/detection through feature extraction and classification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the medium of Rodriguez-Serrano (as modified by Zadeh) using Strong’s teachings by adding a training set that depicts objects in an object ontology depicted by a corresponding image to Rodriguez-Serrano’s training data set, in order to determine whether any of the additional features are detected in the visual data, and based on any such detected features, the semantic processing phase can similarly be performed again to further identify additional features that are expected in the visual data. The process may continue cycling through multiple iterations of the feature recognition and semantic processing phases in this manner in order to continue detecting features in the visual data (see Strong: e.g., [0660]);
Rodriguez-Serrano as modified by Zadeh and Strong further disclose:
the object ontology comprises the first object (see Rodriguez-Serrano: e.g. -- the CNN 43 may be/have been trained by end-to-end learning of the parameters of the neural network using a set of training images labeled by class--, in [0041], and, -- annotated image 38 is annotated with a bounding box which identifies a location of an object of interest.--, in [0029], and [0042]; also see: -- a classifier which classifies the localized object, e.g., a bird, animal, vehicle, or the like, into one of a predefined set of classes, or outputs a probabilistic assignment over some or all the classes.  In another embodiment, the information 36 may simply be a crop of the image 12 which includes the region in the bounding box, or other information extracted therefrom.  An output component 62 outputs the information 32.--, in [0032]-[0033]);
	detecting, with the computer system, with the computer-vision object recognition model , the first object (see Rodriguez-Serrano: e.g. -- encodes each image using a single, spatially-variant feature vector, and expresses detection as a "query" to an annotated training set: given a new image, the method first computes its similarity to all images in the training set….--, in [0004]-[0006], and, --For a query image, a query image representation is generated, based on activations output by the new layer of the model.  A subset of the annotated images is identified by computing a similarity between the query image representation and each of the annotated image representations.  Object location information from at least one of the subset of annotated images is transferred to the query image and information based on the transferred object location information is output.--, in [0019] {herein the “subset of annotated images” is “the first training set”}, and, -- the similarity is computed, at S112, between a representation 46 of the query image and representations 48 of the annotated images, and the bounding box annotations 102 of the top-ranked annotated images 56 are then used (at S114) to compute a bounding box 34 for the query image 12, as graphically illustrated in FIG. 5….. similarity measure… such as the Euclidean distance, is used--, in [0070]-[0072]; -- the metric 52, such as a matrix, is learned, at S122, on a set of annotated training images.  This may include jointly learning the metric and adapting weights of the convolutional layers of the model 42 by backpropagation.  The metric learning may take place at any time before embedding the annotated images and query image into the new feature space.--, in [0051], [0078]) based on:
	a first distance in a feature space of the computer-vision object recognition model between an image feature vector of the image and a first feature vector of the first object in the computer-vision object recognition model (see Rodriguez-Serrano: e.g., -- In such a case, the dot product between the feature vectors of two images 12, 38 is equivalent to the cosine similarity. Then at S110, for a query image 12, denoted Q, q=p.sub.5(Q) is computed in the same manner and l.sub.2-normalized…. another similarity measure which does not require normalization, such as the Euclidean distance, is used…. This results in a matrix where the annotated images with a higher bounding box overlap are likely to be considered more similar in the new feature space. --, in [0072]-[0079]);
	Rodriguez-Serrano as modified by Zadeh and Strong, however, do not explicitly disclose a first distance in the pixel-space of the image between the input location of the input and the first location of the first object.
Birdwell discloses a first distance in the pixel-space of the image between the input location of the input and the first location of the first object (see Birdwell: e.g., -- known metrics will be individually discussed as known in the art: inner product, Euclidean distance, Mahalanobis distance, Manhattan distance--, in [0020]-[0022], and, -- a contour structure yields similar values and all points with a similarity distance may be determined with a similarity distance value.--, in [0027], and, -- a database that implements efficient similarity-based, or nearest-neighbor search. This means that a request to search the content of the database will return identifiers for objects that are within a specified distance to a reference, or target, object but may not precisely match the target's characteristics. One way to define the term "distance" uses a metric that is defined on the stored objects--, in [0131]-[0132], and, -- calculating a distance to a centroid for the reference or training group and comparing the distance to a threshold--, in [0237]),
Rodriguez-Serrano (as modified by Zadeh and Strong) and Birdwell are combinable as they are in the same field of endeavor: object identification/recognition/detection through feature extraction and classification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the medium of Rodriguez-Serrano (as modified by Zadeh and Strong) using Birdwell’s teachings by adding a first distance in the pixel-space of the image between the input location of the input and the first location of the first object to the object detection and classification based on a similarity metric of Rodriguez-Serrano (as modified by Zadeh and Strong), in order to improve the search time by organizing vectors of attributes for the identification of the objects (see Birdwell: e.g., in [0164]);
Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose causing, with the computer system, a first object identifier of the first object from the object ontology to be stored in memory (see Rodriguez-Serrano: e.g., Fig. 5, and, -- to identify a subset 56 of one or more similar annotated image(s), i.e., those which have the most similar representations (in the new feature space).--, in [0046]).
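
For illustration only, the two distances recited in claim 1 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from the application or from any cited reference: the function names, the use of the Euclidean metric in both spaces, and the use of a bounding-box center as the "first location" are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def box_center(box):
    """Center (x, y) of a bounding box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def claim1_distances(image_vec, object_vec, input_xy, object_box):
    """Return (feature-space distance, pixel-space distance).

    Assumption: the object's location is taken to be its bounding-box center.
    """
    feature_d = euclidean(image_vec, object_vec)
    pixel_d = euclidean(input_xy, box_center(object_box))
    return feature_d, pixel_d


# Hypothetical values: a 2-D feature space and a tap at the box center.
feature_d, pixel_d = claim1_distances([0.0, 0.0], [3.0, 4.0], (10, 10), (8, 8, 12, 12))
```

How the two distances would be combined into a detection decision is left open here, since the rejection above does not specify it.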

Re Claim 2, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose the image depicts a second object located at a second location in the image (see Rodriguez-Serrano: e.g. --other types of information are contemplated.  For example, the information extraction component 60 may include a classifier which classifies the localized object, e.g., a bird, animal, vehicle, or the like, into one of a predefined set of classes, or outputs a probabilistic assignment over some or all the classes.--; in [0033], {so that “image of license plate” as the first object, then “a bird” is the second object}; also see: Fig. 5, training data set 40, and I1, I2, in [0070]-[0072]);
the object ontology comprises the second object (see Strong: e.g., -- the additional features can be identified based on a defined ontology, such as an object ontology that represents or defines a hierarchy of objects and their associated relationships at multiple levels of abstraction.--, in [0659]-[0661]); and 
the first object is detected based on
a second distance in the feature space of the computer-vision object recognition model between the image feature vector of the image and a second feature vector of the second object in the computer-vision object recognition model (See Rodriguez-Serrano: e.g. --For a query image, a query image representation is generated, based on activations output by the new layer of the model.  A subset of the annotated images is identified by computing a similarity between the query image representation and each of the annotated image representations.  Object location information from at least one of the subset of annotated images is transferred to the query image and information based on the transferred object location information is output.--, in [0019] {herein the “subset of annotated images” is “the first training set”}, and, -- the similarity is computed, at S112, between a representation 46 of the query image and representations 48 of the annotated images, and the bounding box annotations 102 of the top-ranked annotated images 56 are then used (at S114) to compute a bounding box 34 for the query image 12, as graphically illustrated in FIG. 5….. similarity measure… such as the Euclidean distance, is used--, in [0070]-[0072]; -- the metric 52, such as a matrix, is learned, at S122, on a set of annotated training images.  This may include jointly learning the metric and adapting weights of the convolutional layers of the model 42 by backpropagation.  The metric learning may take place at any time before embedding the annotated images and query image into the new feature space.--, in [0051], [0078]), and
a second distance in the pixel-space of the image between the input location of the input and the second location of the second object (see Birdwell: e.g., -- known metrics will be individually discussed as known in the art: inner product, Euclidean distance, Mahalanobis distance, Manhattan distance--, in [0020]-[0022], and, -- a contour structure yields similar values and all points with a similarity distance may be determined with a similarity distance value.--, in [0027], and, -- a database that implements efficient similarity-based, or nearest-neighbor search. This means that a request to search the content of the database will return identifiers for objects that are within a specified distance to a reference, or target, object but may not precisely match the target's characteristics. One way to define the term "distance" uses a metric that is defined on the stored objects--, in [0131]-[0132], and, -- calculating a distance to a centroid for the reference or training group and comparing the distance to a threshold--, in [0237]).

Re Claim 3, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose wherein the object ontology further comprises a third object not depicted in the image (see Rodriguez-Serrano: e.g. -- leverages a CNN 43 trained for one task (classification) on one dataset (typically, ImageNet.RTM.) for another task (DDD) by retraining it on another dataset (a detection dataset) with small amounts of data.--, in [0090], and, -- This is a prominent person detection task (where the subject is in a challenging "sportive" pose.  This dataset contains 10,000 images gathered from Flickr searches for the tags `parkour`, `gymnastics`, and `athletics` and consists of poses deemed to be challenging to estimate.  Each image has a corresponding annotation gathered from Amazon Mechanical Turk and as such cannot be guaranteed to be highly accurate.  The images have been scaled such that the annotated person is roughly 150 pixels in length.  Each image has been annotated with up to 14 visible joint locations.--, in [0094]; -- to produce a compact representation 46 of the query image 12. … projecting the representations 46, 48 into a feature space (e.g., one of lower dimensionality)--, in [0031], and [0072]), detecting the first object further comprises:
detecting, with the computer system, the first object based on a distance in the feature space of the computer-vision object recognition model between the image vector of the image and a third feature vector of the third object in the computer-vision object recognition model (See Rodriguez-Serrano: e.g. --For a query image, a query image representation is generated, based on activations output by the new layer of the model.  A subset of the annotated images is identified by computing a similarity between the query image representation and each of the annotated image representations.  Object location information from at least one of the subset of annotated images is transferred to the query image and information based on the transferred object location information is output.--, in [0019] {herein the “subset of annotated images” is “the first training set”}, and, -- the similarity is computed, at S112, between a representation 46 of the query image and representations 48 of the annotated images, and the bounding box annotations 102 of the top-ranked annotated images 56 are then used (at S114) to compute a bounding box 34 for the query image 12, as graphically illustrated in FIG. 5….. similarity measure… such as the Euclidean distance, is used--, in [0070]-[0072]; -- the metric 52, such as a matrix, is learned, at S122, on a set of annotated training images.  This may include jointly learning the metric and adapting weights of the convolutional layers of the model 42 by backpropagation.  The metric learning may take place at any time before embedding the annotated images and query image into the new feature space.--, in [0051], [0078]).

Re Claim 4, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose determining, with the computer system, whether the first distance in the feature space and the second distance in the feature space are less than a predefined threshold distance; and selecting, with the computer system, based on the first distance in the feature space being less than the predefined threshold distance and the second distance in the feature space being greater than the predefined threshold distance, the first object (see Rodriguez-Serrano: e.g. --For a query image, a query image representation is generated, based on activations output by the new layer of the model.  A subset of the annotated images is identified by computing a similarity between the query image representation and each of the annotated image representations.  Object location information from at least one of the subset of annotated images is transferred to the query image and information based on the transferred object location information is output.--, in [0019] {herein the “subset of annotated images” is “the first training set”}, and, -- the similarity is computed, at S112, between a representation 46 of the query image and representations 48 of the annotated images, and the bounding box annotations 102 of the top-ranked annotated images 56 are then used (at S114) to compute a bounding box 34 for the query image 12, as graphically illustrated in FIG. 5….. similarity measure… such as the Euclidean distance, is used--, in [0070]-[0072]; -- the metric 52, such as a matrix, is learned, at S122, on a set of annotated training images.  This may include jointly learning the metric and adapting weights of the convolutional layers of the model 42 by backpropagation.  The metric learning may take place at any time before embedding the annotated images and query image into the new feature space.--, in [0051], [0078]; also see Birdwell: e.g., -- known metrics will be individually discussed as known in the art: inner product, Euclidean distance, Mahalanobis distance, Manhattan distance--, in [0020]-[0022], and, -- a contour structure yields similar values and all points with a similarity distance may be determined with a similarity distance value.--, in [0027], and, -- a database that implements efficient similarity-based, or nearest-neighbor search. This means that a request to search the content of the database will return identifiers for objects that are within a specified distance to a reference, or target, object but may not precisely match the target's characteristics. One way to define the term "distance" uses a metric that is defined on the stored objects--, in [0131]-[0132], and, -- calculating a distance to a centroid for the reference or training group and comparing the distance to a threshold--, in [0237]).
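
For illustration only, the threshold comparison recited in claim 4 can be sketched as below. This is a hedged sketch, not the method of the application or of any cited reference: the function name, the dictionary layout, and the example distances and threshold are hypothetical.

```python
def select_within_threshold(feature_distances, threshold):
    """Return identifiers whose feature-space distance is below the threshold.

    feature_distances maps a (hypothetical) object identifier to its distance
    from the image feature vector in the model's feature space.
    """
    return [obj_id for obj_id, d in feature_distances.items() if d < threshold]


# Hypothetical distances: the first object falls under the threshold,
# the second does not, so only the first object is selected.
selected = select_within_threshold({"first": 0.4, "second": 0.9}, threshold=0.5)
```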

Re Claim 5, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose selecting, with the computer system, the first object based on the first distance in the pixel-space being less than the second distance in the pixel-space indicating that the input is directed to the first object (see Birdwell: e.g., -- known metrics will be individually discussed as known in the art: inner product, Euclidean distance, Mahalanobis distance, Manhattan distance--, in [0020]-[0022], and, -- a contour structure yields similar values and all points with a similarity distance may be determined with a similarity distance value.--, in [0027], and, -- a database that implements efficient similarity-based, or nearest-neighbor search. This means that a request to search the content of the database will return identifiers for objects that are within a specified distance to a reference, or target, object but may not precisely match the target's characteristics. One way to define the term "distance" uses a metric that is defined on the stored objects--, in [0131]-[0132], and, -- calculating a distance to a centroid for the reference or training group and comparing the distance to a threshold--, in [0237]).
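
For illustration only, the pixel-space comparison recited in claim 5 can be sketched as below. This is a hedged sketch under stated assumptions, not drawn from the application or from any cited reference: the function name, the representation of each object location as a single (x, y) point, and the example coordinates are hypothetical.

```python
import math


def nearest_to_input(input_xy, object_locations):
    """Return the identifier of the object closest to the input location.

    object_locations maps a (hypothetical) object identifier to an (x, y)
    point in the image's pixel space.
    """
    if not object_locations:
        raise ValueError("at least one object location is required")

    def pixel_distance(item):
        _, (x, y) = item
        return math.hypot(x - input_xy[0], y - input_xy[1])

    obj_id, _ = min(object_locations.items(), key=pixel_distance)
    return obj_id


# Hypothetical tap at (100, 100): the first object is nearer than the second.
chosen = nearest_to_input((100, 100), {"first": (110, 105), "second": (300, 40)})
```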

Re Claim 6, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose determining, with the computer system, a second object identifier of the second object from the object ontology based on the first object identifier of the first object (see Rodriguez-Serrano: e.g., --other types of information are contemplated.  For example, the information extraction component 60 may include a classifier which classifies the localized object, e.g., a bird, animal, vehicle, or the like, into one of a predefined set of classes, or outputs a probabilistic assignment over some or all the classes.--; in [0033], {so that “image of license plate” as the first object, then “a bird” is the second object}); and
causing, with the computer system, the second object identifier of the second object to be stored in the memory (see Rodriguez-Serrano: e.g., -- another similarity measure which does not require normalization, such as the Euclidean distance, is used…. This results in a matrix where the annotated images with a higher bounding box overlap are likely to be considered more similar in the new feature space. --, in [0072]-[0079]).

Re Claim 7, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose the operations further comprise:
causing, with the computer system, a first search to be performed for first information related to the first object using the first object identifier as a first query input for the first search (see Rodriguez-Serrano: e.g. -- leverages a CNN 43 trained for one task (classification) on one dataset (typically, ImageNet.RTM.) for another task (DDD) by retraining it on another dataset (a detection dataset) with small amounts of data.--, in [0090], and, -- This is a prominent person detection task (where the subject is in a challenging "sportive" pose.  This dataset contains 10,000 images gathered from Flickr searches for the tags `parkour`, `gymnastics`, and `athletics` and consists of poses deemed to be challenging to estimate.  Each image has a corresponding annotation gathered from Amazon Mechanical Turk and as such cannot be guaranteed to be highly accurate.  The images have been scaled such that the annotated person is roughly 150 pixels in length.  Each image has been annotated with up to 14 visible joint locations.--, in [0094]; -- to produce a compact representation 46 of the query image 12. … projecting the representations 46, 48 into a feature space (e.g., one of lower dimensionality)--, in [0031], and [0072]; also see Birdwell: e.g., -- First, the database's tree-structured index can be maintained in memory, as well as vectors of attributes for the stored objects. Second, the operations that must be performed at each node of the index are a small number of vector inner products (to obtain the scores for a search target for each principal component used by the node), followed by evaluation of a set of Boolean expressions involving a small number of comparisons.--, in [0164]);
providing, with the computer system, for display on the display screen of the mobile computing device, a kiosk device including a display screen, or the display screen of the mobile computing device and the kiosk device, at least some of the first information and at least some of the second information (see Zadeh: e.g., -- once the user clicks on some object on screen, which is traceable, as an input device (such as screen of APPLE® IPHONE), the system can find what object is chosen by the user, based on extracted objects or based on the coordinate of the objects on screen… the system will understand what the user wants to select from its screen coordinate and location of the objects,--, in [2197]-[2198]; -- a kiosk in a store with camera… the kiosk acts as a recognition unit--, in [2455], [2563], and, -- A problem scenario (as for example depicted in Fig. B1 of Appendix 4): A user sees an item, e.g., in a store or at a party or on a website, and wants to have it.  The item may be hard to describe by words beyond few generic terms.  In typical search engines (by words) the desired item… to sort through the results (even if the desired item is there), and not being taken to the targeted (desired) product item (webpage)…. to match the images on the webpage to similar products in their catalogs--, in [2774]; --the desired item based on image taken by the user or identified by the user--, in [2774]-[2775]).

Re Claim 8, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose wherein detecting the first object comprises:
determining, with the computer system, a first score indicating how similar the first object in the image is to a first identified object from the object ontology represented by the first object identifier (see Birdwell: e.g., -- First, the database's tree-structured index can be maintained in memory, as well as vectors of attributes for the stored objects. Second, the operations that must be performed at each node of the index are a small number of vector inner products (to obtain the scores for a search target for each principal component used by the node), followed by evaluation of a set of Boolean expressions involving a small number of comparisons. Depending upon the complexity of the application, search times for exact matches of microseconds to 10s of milliseconds are feasible for a database…. The methodology exhibits good scalability, with the largest runs to date involving over 100 million stored objects. Search times typically scale logarithmically with database size.--, in [0164]; similarly, also see Zadeh: e.g., -- the additional facts are tested against specific test scenarios for scoring and validations.  In one embodiment, additional facts are promoted/tagged as general facts after a validation process and/or passing a validation threshold.--, in [1491], and, -- The test score evaluation module assigns a second test score to the antecedent part based on the antecedent part, set of candidate probability distributions, and the first test score.  The fuzzy logic inference engine determines whether the antecedent part is satisfied beyond a threshold, based on the second test score.--, in [1532], and, -- (j) searching for objects, search engines,--, in [1545], and, -- auxiliary queue/list/table (e.g., 12125) is scanned and items marked for removal (e.g., A.sub.2) are removed having fuzzy ending value(s) (e.g., x.sub.m) less than current value--, in [1514]);
determining, with the computer system, a second score indicating how similar the second object in the image is to a second identified object from the object ontology represented by a second object identifier (see Birdwell: e.g., -- First, the database's tree-structured index can be maintained in memory, as well as vectors of attributes for the stored objects. Second, the operations that must be performed at each node of the index are a small number of vector inner products (to obtain the scores for a search target for each principal component used by the node), followed by evaluation of a set of Boolean expressions involving a small number of comparisons. Depending upon the complexity of the application, search times for exact matches of microseconds to 10s of milliseconds are feasible for a database…. The methodology exhibits good scalability, with the largest runs to date involving over 100 million stored objects. Search times typically scale logarithmically with database size.--, in [0164]; similarly, also see Zadeh: e.g., -- the additional facts are tested against specific test scenarios for scoring and validations.  In one embodiment, additional facts are promoted/tagged as general facts after a validation process and/or passing a validation threshold.--, in [1491], and, -- The test score evaluation module assigns a second test score to the antecedent part based on the antecedent part, set of candidate probability distributions, and the first test score.  The fuzzy logic inference engine determines whether the antecedent part is satisfied beyond a threshold, based on the second test score.--, in [1532], and, -- (j) searching for objects, search engines,--, in [1545], and, -- auxiliary queue/list/table (e.g., 12125) is scanned and items marked for removal (e.g., A.sub.2) are removed having fuzzy ending value(s) (e.g., x.sub.m) less than current value--, in [1514]);
generating, with the computer system, a first revised score based on the first score and a first weight applied to the first score, wherein the first weight is determined based on a first distance between the input location of the input and the first location of the first object (See Rodriguez-Serrano: e.g. --For a query image, a query image representation is generated, based on activations output by the new layer of the model.  A subset of the annotated images is identified by computing a similarity between the query image representation and each of the annotated image representations.  Object location information from at least one of the subset of annotated images is transferred to the query image and information based on the transferred object location information is output.--, in [0019] {herein the “subset of annotated images” is “the first training set”}, and, -- the similarity is computed, at S112, between a representation 46 of the query image and representations 48 of the annotated images, and the bounding box annotations 102 of the top-ranked annotated images 56 are then used (at S114) to compute a bounding box 34 for the query image 12, as graphically illustrated in FIG. 5….. similarity measure… such as the Euclidean distance, is used--, in [0070]-[0072]; -- the metric 52, such as a matrix, is learned, at S122, on a set of annotated training images.  This may include jointly learning the metric and adapting weights of the convolutional layers of the model 42 by backpropagation.  The metric learning may take place at any time before embedding the annotated images and query image into the new feature space.--, in [0051], [0078]; and see Birdwell: e.g., -- First, the database's tree-structured index can be maintained in memory, as well as vectors of attributes for the stored objects. Second, the operations that must be performed at each node of the index are a small number of vector inner products (to obtain the scores for a search target for each principal component used by the node), followed by evaluation of a set of Boolean expressions involving a small number of comparisons. Depending upon the complexity of the application, search times for exact matches of microseconds to 10s of milliseconds are feasible for a database…. The methodology exhibits good scalability, with the largest runs to date involving over 100 million stored objects. Search times typically scale logarithmically with database size.--, in [0164]; similarly, also see Zadeh: e.g., -- the additional facts are tested against specific test scenarios for scoring and validations.  In one embodiment, additional facts are promoted/tagged as general facts after a validation process and/or passing a validation threshold.--, in [1491], and, -- The test score evaluation module assigns a second test score to the antecedent part based on the antecedent part, set of candidate probability distributions, and the first test score.  The fuzzy logic inference engine determines whether the antecedent part is satisfied beyond a threshold, based on the second test score.--, in [1532], and, -- (j) searching for objects, search engines,--, in [1545], and, -- auxiliary queue/list/table (e.g., 12125) is scanned and items marked for removal (e.g., A.sub.2) are removed having fuzzy ending value(s) (e.g., x.sub.m) less than current value--, in [1514]);
generating, with the computer system, a second revised score based on the second score and a second weight applied to the second score, wherein the second weight is determined based on a second distance between the input location of the input and the second location of the first object; and selecting, with the computer system, the first object based on the first revised score and the second revised score (See Rodriguez-Serrano: e.g. --For a query image, a query image representation is generated, based on activations output by the new layer of the model.  A subset of the annotated images is identified by computing a similarity between the query image representation and each of the annotated image representations.  Object location information from at least one of the subset of annotated images is transferred to the query image and information based on the transferred object location information is output.--, in [0019] {herein the “subset of annotated images” is “the first training set”}, and, -- the similarity is computed, at S112, between a representation 46 of the query image and representations 48 of the annotated images, and the bounding box annotations 102 of the top-ranked annotated images 56 are then used (at S114) to compute a bounding box 34 for the query image 12, as graphically illustrated in FIG. 5….. similarity measure… such as the Euclidean distance, is used--, in [0070]-[0072]; -- the metric 52, such as a matrix, is learned, at S122, on a set of annotated training images.  This may include jointly learning the metric and adapting weights of the convolutional layers of the model 42 by backpropagation.  The metric learning may take place at any time before embedding the annotated images and query image into the new feature space.--, in [0051], [0078]).
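
For illustration only, the score-revision limitations recited in claim 8 can be sketched as follows. This sketch is not taken from the application or from any cited reference: the function names, the `scale` parameter, and the particular weighting formula (a weight that decays with Euclidean distance from the input location) are assumptions chosen solely to show one way a distance-based weight might be applied to a similarity score before selection.

```python
import math


def distance_weight(input_xy, object_xy, scale=100.0):
    """Hypothetical weight: 1.0 at the input location, decaying with distance.

    The claim only requires that the weight be determined based on the
    distance; this specific formula is an assumption for illustration.
    """
    d = math.dist(input_xy, object_xy)  # Euclidean distance in pixels
    return 1.0 / (1.0 + d / scale)


def select_object(candidates, input_xy):
    """Revise each similarity score by its distance weight and pick the best.

    candidates: list of (object_identifier, similarity_score, (x, y)) tuples.
    Returns the selected identifier and the dict of revised scores.
    """
    revised = {
        ident: score * distance_weight(input_xy, location)
        for ident, score, location in candidates
    }
    best = max(revised, key=revised.get)
    return best, revised


# Usage: the lamp has the higher raw score, but the mug is nearer the input.
best, revised = select_object(
    [("mug", 0.80, (100, 100)), ("lamp", 0.90, (400, 400))],
    input_xy=(110, 100),
)
```

Under these assumed values, the object nearer the input location is selected even though its unweighted similarity score is lower.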

Re Claim 9, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose wherein the operations further comprise: generating, with the computer system, an enhanced version of the image by enhancing the image in a region of the image surrounding the input location, wherein the enhancing the image comprises performing, to the region surrounding the input location, at least one of: light balance enhancement, shadow removal, pattern recognition, or color spectrum recognition (see Zadeh: e.g., --the user (e.g., browser) is taken to the catalog item webpage, saving the user time and trouble of sorting through thousands of irrelevant items.  In one embodiment, the exact, similar and matching items are shown/provided to the user, based on color, pattern, or style identified/recognized in the image.  In one embodiment, complementary items (e.g., in an outfit), e.g., by pattern, style, size, material, model, brand, price, and merchant, are shown/provided to the user, in a computing device such as a mobile device, laptop, or desktop--, and, --a user's computing device sends or uploads an image to a server (e.g., a merchant server or website).  In one embodiment, the user captures the image via built-in camera on the computing device (e.g., a mobile device) or from an album repository on the device.  In one embodiment, the user via the computing device provides a URI for the image (e.g., residing in a cloud or network) to the server and the image is uploaded to the server based on the URI.--, in [2774]-[2782]).
 
Re Claim 10, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose wherein the operations further comprise: generating, with the computer system, a compressed version of the image by compressing portions of the image further than a threshold distance from the input location, wherein compressing the portions of the image comprise: segmenting the image into blocks, identifying a set of blocks encompassing the input location, compressing pixels in each remaining block from the blocks excluding the set of blocks with a first amount of loss, and compressing pixels in each block of the set of blocks with a second amount of loss, wherein the second amount of loss is smaller than the first amount of loss (see Rodriguez-Serrano: e.g. -- encodes each image using a single, spatially-variant feature vector, and expresses detection as a "query" to an annotated training set: given a new image, the method first computes its similarity to all images in the training set….--, in [0004]-[0006], -- compression of the feature vectors is beneficial in order to ensure a fast lookup.  DDD offers two variants.  In the case of metric learning, a low-rank metric learning algorithm is used which allows working in a projected (lower-dimensional) subspace--, in [0007]-[0008], -- to produce a compact representation 46 of the query image 12. … projecting the representations 46, 48 into a feature space (e.g., one of lower dimensionality)--, in [0031], and [0072]; and, --evaluates the loss of a mini-batch of training images and the derivatives.  The above loss function and derivatives correspond to a mini-batch of size 1, but they are readily extendable to a larger value.--, in [0078], and [0088]).
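
For illustration only, the block-wise compression recited in claim 10 can be sketched as follows. This sketch is not taken from the application or from any cited reference: the function names, the block size, the `radius` parameter, and the use of pixel quantization as a stand-in for lossy compression are all assumptions. It shows blocks around the input location being compressed with a smaller amount of loss than the remaining blocks.

```python
def quantize(value, step):
    """Lossy stand-in: snap a pixel to a multiple of step (larger step, more loss)."""
    return (value // step) * step


def compress_around(image, input_rc, block=4, radius=1, near_step=2, far_step=32):
    """Quantize a grayscale image (list of rows of ints) block by block.

    Blocks within `radius` blocks of the block containing the input location
    use near_step (less loss); all remaining blocks use far_step (more loss).
    All parameter names and defaults are hypothetical.
    """
    in_br, in_bc = input_rc[0] // block, input_rc[1] // block
    out = [row[:] for row in image]  # copy so the input image is not modified
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            br, bc = r // block, c // block  # indices of the block holding (r, c)
            near = abs(br - in_br) <= radius and abs(bc - in_bc) <= radius
            out[r][c] = quantize(pixel, near_step if near else far_step)
    return out


# Usage: a flat 16x16 image with the input location at the top-left corner.
image = [[77] * 16 for _ in range(16)]
compressed = compress_around(image, input_rc=(0, 0), radius=0)
```

In this sketch the quantization error at a pixel is at most `step - 1`, so the region encompassing the input location retains more detail than the rest of the image.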

Re Claim 11, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose wherein the operations further comprise: providing, with the computer system, the first object identifier to a computer vision search system to obtain information indicating at least one of: a location of the first object, an availability to purchase the first object, one or more related objects, or a name of the first object (see Zadeh: e.g., --the user (e.g., browser) is taken to the catalog item webpage, saving the user time and trouble of sorting through thousands of irrelevant items.  In one embodiment, the exact, similar and matching items are shown/provided to the user, based on color, pattern, or style identified/recognized in the image.  In one embodiment, complementary items (e.g., in an outfit), e.g., by pattern, style, size, material, model, brand, price, and merchant, are shown/provided to the user, in a computing device such as a mobile device, laptop, or desktop--, and, --a user's computing device sends or uploads an image to a server (e.g., a merchant server or website).  In one embodiment, the user captures the image via built-in camera on the computing device (e.g., a mobile device) or from an album repository on the device.  In one embodiment, the user via the computing device provides a URI for the image (e.g., residing in a cloud or network) to the server and the image is uploaded to the server based on the URI.--, in [2774]-[2782]).

Re Claim 12, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose wherein causing the first object identifier of the first object to be stored in the memory comprises:
causing, with the computer system, in response to the first object being detected, the first object identifier of the first object to be stored in the memory (see Rodriguez-Serrano: e.g., Fig. 5, and, -- to identify a subset 56 of one or more similar annotated image(s), i.e., those which have the most similar representations (in the new feature space).--, in [0046]), wherein:

the first object identifier of the first object is stored in the memory in association with the first image, one or more features extracted from the first image, or the first image and the one or more features extracted from the first image (See Rodriguez-Serrano: e.g. --For a query image, a query image representation is generated, based on activations output by the new layer of the model.  A subset of the annotated images is identified by computing a similarity between the query image representation and each of the annotated image representations.  Object location information from at least one of the subset of annotated images is transferred to the query image and information based on the transferred object location information is output.--, in [0019] {herein the “subset of annotated images” is “the first training set”}, and, -- the similarity is computed, at S112, between a representation 46 of the query image and representations 48 of the annotated images, and the bounding box annotations 102 of the top-ranked annotated images 56 are then used (at S114) to compute a bounding box 34 for the query image 12, as graphically illustrated in FIG. 5….. similarity measure… such as the Euclidean distance, is used--, in [0070]-[0072]; -- the metric 52, such as a matrix, is learned, at S122, on a set of annotated training images.  This may include jointly learning the metric and adapting weights of the convolutional layers of the model 42 by backpropagation.  The metric learning may take place at any time before embedding the annotated images and query image into the new feature space.--, in [0051], [0078]; also see Birdwell: e.g., -- First, the database's tree-structured index can be maintained in memory, as well as vectors of attributes for the stored objects. Second, the operations that must be performed at each node of the index are a small number of vector inner products (to obtain the scores for a search target for each principal component used by the node), followed by evaluation of a set of Boolean expressions involving a small number of comparisons. Depending upon the complexity of the application, search times for exact matches of microseconds to 10s of milliseconds are feasible for a database…. The methodology exhibits good scalability, with the largest runs to date involving over 100 million stored objects. Search times typically scale logarithmically with database size.--, in [0164]).

Re Claim 13, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose wherein the input comprises at least one of: a touch event whereby a user is determined to have touched the display screen of the mobile computing device at the input location, wherein the display screen comprises a capacitive touch screen;
a gesture detected by the mobile computing device, wherein the gesture is determined to be directed to the input location; or an eye gaze detected by the mobile computing device, wherein the eye gaze is determined by tracking a user’s eyes, wherein the input location is determined based on the user’s eyes being tracked to the input location and dwelling on the input location for more than a threshold amount of time (see Zadeh: e.g., -- once the user clicks on some object on screen, which is traceable, as an input device (such as screen of APPLE® IPHONE), the system can find what object is chosen by the user, based on extracted objects or based on the coordinate of the objects on screen… the system will understand what the user wants to select from its screen coordinate and location of the objects,--, in [2197]-[2198]).

Re Claim 14, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose wherein the computer-vision object recognition model comprises a convolutional neural network having three or more layers (See Rodriguez-Serrano: e.g. --For a query image, a query image representation is generated, based on activations output by the new layer of the model.  A subset of the annotated images is identified by computing a similarity between the query image representation and each of the annotated image representations.  Object location information from at least one of the subset of annotated images is transferred to the query image and information based on the transferred object location information is output.--, in [0019] {herein the “subset of annotated images” is “the first training set”}, and, -- the similarity is computed, at S112, between a representation 46 of the query image and representations 48 of the annotated images, and the bounding box annotations 102 of the top-ranked annotated images 56 are then used (at S114) to compute a bounding box 34 for the query image 12, as graphically illustrated in FIG. 5….. similarity measure… such as the Euclidean distance, is used--, in [0070]-[0072]; -- the metric 52, such as a matrix, is learned, at S122, on a set of annotated training images.  This may include jointly learning the metric and adapting weights of the convolutional layers of the model 42 by backpropagation.  The metric learning may take place at any time before embedding the annotated images and query image into the new feature space.--, in [0051], [0078]).

Re Claim 15, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose wherein the operations comprise steps for performing depthwise separable convolutions (See Rodriguez-Serrano: e.g. --For a query image, a query image representation is generated, based on activations output by the new layer of the model.  A subset of the annotated images is identified by computing a similarity between the query image representation and each of the annotated image representations.  Object location information from at least one of the subset of annotated images is transferred to the query image and information based on the transferred object location information is output.--, in [0019] {herein the “subset of annotated images” is “the first training set”}, and, -- the similarity is computed, at S112, between a representation 46 of the query image and representations 48 of the annotated images, and the bounding box annotations 102 of the top-ranked annotated images 56 are then used (at S114) to compute a bounding box 34 for the query image 12, as graphically illustrated in FIG. 5….. similarity measure… such as the Euclidean distance, is used--, in [0070]-[0072]; -- the metric 52, such as a matrix, is learned, at S122, on a set of annotated training images.  This may include jointly learning the metric and adapting weights of the convolutional layers of the model 42 by backpropagation.  The metric learning may take place at any time before embedding the annotated images and query image into the new feature space.--, in [0051], [0078]).

Re Claim 16, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose wherein the operations comprise steps for sparse learning for computer vision (see Zadeh: e.g., -- Learning Higher Details Iteratively:… the preprocessed thumbnail is applied to the visible layer, as for example depicted in FIG. 197(b), by clamping a thumbnail pixel value (e.g., obtained by averaging the data/image pixel values) to a corresponding (sparse) visible unit in V layer, according to the resolution reduction from the image/data to the thumbnail.--, in [1745]-[1746], -- use Vapnik's support vector machines (SVM) to classify the data or recognize the object. In one embodiment, in addition, we use kernels (e.g. using Gaussian processes or models) to be able to handle any shape of data distribution with respect to feature space, to transfer the space in such a way that the separation of classes or clusters becomes easier. In one embodiment, we use sparse kernel machines, maximum margin classifiers, multiclass SVMs, logistic regression method, multivariate linear regression, or relevance vector machines (RVM) (which is a variation of SVM with less limitations), for classification or recognition.--, in [2114], [2554], [3154] and [3174]).

Re Claim 17, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose wherein the operations comprise steps for training data collection for computer vision (see Rodriguez-Serrano: e.g., -- trained neural network 43 may be used directly as the model 42 or the model may be generated by adapting the trained CNN… the CNN 43 may be/have been trained by end-to-end learning of the parameters of the neural network using a set of training images labeled by class--, in [0041], and, -- annotated image 38 is annotated with a bounding box which identifies a location of an object of interest.--, in [0029], and [0042]; also see: -- a classifier which classifies the localized object, e.g., a bird, animal, vehicle, or the like, into one of a predefined set of classes, or outputs a probabilistic assignment over some or all the classes.  In another embodiment, the information 36 may simply be a crop of the image 12 which includes the region in the bounding box, or other information extracted therefrom.  An output component 62 outputs the information 32.--, in [0032]-[0033]).

Re Claim 18, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose wherein the operations comprise steps for context-aided visual search (see Zadeh: e.g., -- related to General AI, versus Specific or Vertical or Narrow AI, machine learning, using/requiring only a small number of training samples (same as humans can do), learning one concept and use it in another context or environment (same as humans can do), addition of reasoning and cognitive layers to the learning module (same as humans can do), continuous learning and updating the learning machine continuously (same as humans can do), simultaneous learning and recognition--, in [0225], and, [1249]-[1251]; and, -- default properties for a given object, so that they are applicable in the absence of any other data. In one embodiment, we define general knowledge and contextual knowledge, for specific situations. In one embodiment, having a large knowledge base and large training samples are very helpful for learning and recognition purposes.--, in [2160]).

Re Claim 19, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell further disclose wherein the operations comprise steps for tap-to-search (see Zadeh: e.g., -- the search engine works on music or sound or speech or talking pieces or notes, to find or match or compare, for taped e-books, text-to-voice conversions, people's speech, notes, music, sound effects, sound sources, ring tones, movie's music, or the like, e.g. to find a specific corresponding music title or movie title, by just humming or whistling the sound (or imitate the music or notes by mouth, or tapping or beating the table with hand), as the input. The output is all the similar sounds or sequence of notes that resemble the input, extracted and searched from Internet or a music or sound repository.--, in [1405]).

Re Claim 20, claim 20 is the method claim corresponding to claim 1.  Claim 20 is thus rejected for reasons similar to those given for claim 1. See the above discussion with regard to claim 1. Further, Rodriguez-Serrano as modified by Zadeh and Strong and Birdwell discloses a method to perform the steps (see Rodriguez-Serrano: e.g. -- The memory 14 may represent any type of non-transitory computer readable medium such as random access memory (RAM)… the processor 18 and memory 14 may be combined in a single chip.  Memory 14 stores instructions for performing the exemplary method--, in [0035]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEIWEN YANG whose telephone number is (571)270-5670.  The examiner can normally be reached Monday-Friday, 8:30am-4:30pm Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at 571-272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/WEI WEN YANG/
Primary Examiner, Art Unit 2667