DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3, 9, 11, and 16-19 are rejected under 35 U.S.C. 102(a)(1) & 102(a)(2) as being anticipated by Philbin et al (US 2016/0180151).
Regarding Claim 1, Philbin teaches a method of training a neural network ([0037], Fig. 3, process 300 for training a neural network on a triplet), the method comprising: 
generating a plurality of triplets of training vectors, each triplet in the plurality of triplets comprising a reference data point, a positive data point, and a negative data point, the reference data point representing a first object, the positive data point representing a second object similar to the first object, and the negative data point representing a third object dissimilar to the first object ([0035], system generates a set of triplets from the set of training images, each triplet includes a respective anchor image, a respective positive image, and a respective negative image, for each triplet, the anchor image and the positive image are images that have both been classified as being images of the same object of the particular object type, the negative image is an image that has been classified as being an image of a different object of the particular object type from the anchor image and the positive image);
for each triplet in the plurality of triplets: passing the reference data point, the positive data point, and the negative data point in each triplet through the neural network to generate extracted features ([0038-0040], Fig. 3, system processes the anchor image in the triplet using the neural network in accordance with current values of the parameters of the neural network to generate a numeric embedding of the anchor image at 302, system processes the positive image in the triplet using the neural network in accordance with the current values of the parameters of the neural network to generate a numeric embedding of the positive image at 304, and system processes the negative image in the triplet using the neural network in accordance with the current values of the parameters of the neural network to generate a numeric embedding of the negative image at 306); 
calculating a loss from the extracted features ([0041], Fig. 3, system computes a triplet loss from the numeric embedding of the anchor image, the positive image, and the negative image at 308); and 
adjusting the parameters of the neural network based on the loss ([0043], Fig. 3, system adjusts the current values of the parameters of the neural network using the triplet loss at 310).
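The four steps mapped above for Claim 1 (generate triplets, pass each point through the network, calculate a loss, adjust the parameters) can be sketched as follows. This is only an illustration under stated assumptions, not Philbin's implementation: a single linear layer stands in for the neural network, the loss is a hinge on squared Euclidean distances, and the names embed, make_triplets, triplet_loss, and train_step are hypothetical.

```python
import random

def embed(weights, x):
    """Linear embedding f(x) = W x; a stand-in for the neural network (assumption)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def make_triplets(samples, count, rng):
    """samples: list of (vector, label). The anchor and positive share a label;
    the negative carries a different label."""
    triplets = []
    while len(triplets) < count:
        (a, la), (p, lp), (n, ln) = rng.sample(samples, 3)
        if la == lp and la != ln:
            triplets.append((a, p, n))
    return triplets

def triplet_loss(weights, triplet, margin):
    """Hinge loss: zero once the negative is farther than the positive by the margin."""
    fa, fp, fn = (embed(weights, x) for x in triplet)
    return max(0.0, sq_dist(fa, fp) - sq_dist(fa, fn) + margin)

def mean_loss(weights, triplets, margin):
    return sum(triplet_loss(weights, t, margin) for t in triplets) / len(triplets)

def train_step(weights, triplet, margin, lr):
    """One gradient-descent update of W from a single triplet."""
    if triplet_loss(weights, triplet, margin) == 0.0:
        return weights  # hinge inactive, gradient is zero
    a, p, n = triplet
    dp = [ai - pi for ai, pi in zip(a, p)]  # anchor minus positive
    dn = [ai - ni for ai, ni in zip(a, n)]  # anchor minus negative
    updated = []
    for row in weights:
        yp = sum(w * d for w, d in zip(row, dp))  # this row of W(a - p)
        yn = sum(w * d for w, d in zip(row, dn))  # this row of W(a - n)
        # dL/dW_ij = 2 * (yp_i * dp_j - yn_i * dn_j)
        updated.append([w - lr * 2.0 * (yp * dpj - yn * dnj)
                        for w, dpj, dnj in zip(row, dp, dn)])
    return updated
```

Calling train_step repeatedly over the generated triplets corresponds to the loop "for each triplet in the plurality of triplets" recited in the claim.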
Regarding Claim 2, Philbin teaches all aspects of the claimed invention as disclosed in Claim 1 above. Philbin further teaches wherein generating a first triplet in the plurality of triplets comprises classifying an image of a first object as the reference data point of the first triplet and classifying an image of a second object different than the first object as the positive data point of the first triplet ([0035], system generates a set of triplets from the set of training images, each triplet includes a respective anchor image, a respective positive image, and a respective negative image, for each triplet, the anchor image and the positive image are images that have both been classified as being images of the same object of the particular object type, the negative image is an image that has been classified as being an image of a different object of the particular object type from the anchor image and the positive image).
Regarding Claim 3, Philbin teaches all aspects of the claimed invention as disclosed in Claim 2 above. Philbin further teaches wherein the first object is visually similar to the second object ([0035], system generates a set of triplets from the set of training images, each triplet includes a respective anchor image, a respective positive image, and a respective negative image, for each triplet, the anchor image and the positive image are images that have both been classified as being images of the same object of the particular object type).
Regarding Claim 9, Philbin teaches all aspects of the claimed invention as disclosed in Claim 1 above. Philbin further teaches wherein generating a first triplet in the plurality of triplets comprises classifying an image of a first object as a first reference data point and an image of a second object different than the first object as a first negative data point ([0035], system generates a set of triplets from the set of training images, each triplet includes a respective anchor image, a respective positive image, and a respective negative image, for each triplet, the anchor image and the positive image are images that have both been classified as being images of the same object of the particular object type, the negative image is an image that has been classified as being an image of a different object of the particular object type from the anchor image and the positive image).
Regarding Claim 11, Philbin teaches all aspects of the claimed invention as disclosed in Claim 9 above. Philbin further teaches wherein the first object is visually dissimilar to the second object ([0035], system generates a set of triplets from the set of training images, each triplet includes a respective anchor image, a respective positive image, and a respective negative image, for each triplet, the anchor image and the positive image are images that have both been classified as being images of the same object of the particular object type, the negative image is an image that has been classified as being an image of a different object of the particular object type from the anchor image and the positive image).
Regarding Claim 16, Philbin teaches all aspects of the claimed invention as disclosed in Claim 1 above. Philbin further teaches wherein calculating the loss comprises calculating at least one of an N-pair loss, a triplet loss with L1 norm, a triplet loss with L2 norm, a lifted structure loss, or a margin-based loss ([0041-0042], system computes a triplet loss L from embedding of anchor image, positive image, and negative image, where the triplet loss is expressed such that it is minimized when an image of a specific object has an embedding that is closer to the embeddings of all other images of the specific object than it is to the embedding of any other image of any other object, with a margin between positive and negative pairs of at least α).
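For reference, a margin-based triplet loss of the kind described in the cited passage is commonly written as shown below. This is a sketch that assumes squared Euclidean (L2) distance between embeddings and writes the margin as \alpha; this Office action does not reproduce Philbin's formula, so the symbols f, x_i^a, x_i^p, x_i^n, and N are illustrative.

```latex
L = \sum_{i=1}^{N} \max\left(0,\;
    \lVert f(x_i^{a}) - f(x_i^{p}) \rVert_2^{2}
  - \lVert f(x_i^{a}) - f(x_i^{n}) \rVert_2^{2}
  + \alpha \right)
```

Each term is zero once the anchor embedding is closer to the positive than to the negative by at least the margin.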
Regarding Claim 17, Philbin teaches all aspects of the claimed invention as disclosed in Claim 1 above. Philbin further teaches a system comprising: a memory to store the parameters of the neural network adjusted in claim 1; and a processor, operably coupled to the memory, to implement the neural network of claim 1 ([0049-0055], embodiments of disclosed functional operations in computer software encoded in memory and executed by a processor).
Regarding Claim 18, Philbin teaches a system comprising: a server ([0057-0058], embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server) to implement a neural network trained on a plurality of triplets of training vectors ([0037], Fig. 3, process 300 for training a neural network on a triplet), 
each triplet in the plurality of triplets comprising a reference data point, a positive data point, and a negative data point, the reference data point representing a first object, the positive data point representing a second object similar to the first object, and the negative data point representing a third object dissimilar to the first object ([0035], system generates a set of triplets from the set of training images, each triplet includes a respective anchor image, a respective positive image, and a respective negative image, for each triplet, the anchor image and the positive image are images that have both been classified as being images of the same object of the particular object type, the negative image is an image that has been classified as being an image of a different object of the particular object type from the anchor image and the positive image).
Regarding Claim 19, Philbin teaches all aspects of the claimed invention as disclosed in Claim 18 above. Philbin further teaches wherein the first object is visually similar to the second object ([0035], system generates a set of triplets from the set of training images, each triplet includes a respective anchor image, a respective positive image, and a respective negative image, for each triplet, the anchor image and the positive image are images that have both been classified as being images of the same object of the particular object type, the negative image is an image that has been classified as being an image of a different object of the particular object type from the anchor image and the positive image).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 4, 6, 10, 13, 20, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Philbin et al (US 2016/0180151), in view of Wang et al (US 2017/0140248).
Regarding Claims 4 and 20, Philbin teaches all aspects of the claimed invention as disclosed in Claims 2 and 18 above. Philbin fails to teach wherein the first object is visually dissimilar to the second object.
In the same field of endeavor, Wang teaches wherein the first object is visually dissimilar to the second object ([0017-0018], third network is trained for triplet similarity based on image group information, where group membership can be used to define relevance between images, typically two images belonging to the same group are relevant or similar to each other, a triplet can be formed of a reference image, a positive image (i.e., an image that comes from the same group as a reference image) and a negative image (i.e., an image that comes from a different group than the reference image), triplet sampling can be carried out similarly as in the single-task network and all the label information (i.e., object category, style field, image group) can be utilized to sample positive and negative images (~visual similarity (a.k.a. object category) considered separately from other types of image information, therefore visually dissimilar images (a.k.a. different objects) belonging to the same image group can still be considered similar/positive images)).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the training of a neural network on a plurality of triplets by processing an anchor image, a positive image similar to the anchor image, and a negative image dissimilar to the anchor image, as taught in Philbin, to further include determination of similarity based on more characteristics than simply visual similarity, such as image group or label information, as taught in Wang, in order to better predict tags to associate with images across multiple types of tasks and similarities, providing more accurate characterization of images overall. (See Wang [0001-0002])
Regarding Claims 6, 13, and 22, Philbin teaches all aspects of the claimed invention as disclosed in Claims 2, 9, and 18 above. Philbin fails to teach wherein the first object has an extrinsic similarity with respect to the second object.
In the same field of endeavor, Wang teaches wherein the first object has an extrinsic similarity with respect to the second object ([0017-0018], third network is trained for triplet similarity based on image group information widely available from project, album, post, etc., where group membership can be used to define relevance between images, typically two images belonging to the same group are relevant or similar to each other, a triplet can be formed of a reference image, a positive image (i.e., an image that comes from the same group as a reference image) and a negative image (i.e., an image that comes from a different group than the reference image), triplet sampling can be carried out similarly as in the single-task network and all the label information (i.e., object category, style field, image group) can be utilized to sample positive and negative images (~style and group information define extrinsic characteristics)).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the training of a neural network on a plurality of triplets by processing an anchor image, a positive image similar to the anchor image, and a negative image dissimilar to the anchor image, as taught in Philbin, to further include determination of similarity based on more characteristics than simply visual similarity, such as image group or label information, as taught in Wang, in order to better predict tags to associate with images across multiple types of tasks and similarities, providing more accurate characterization of images overall. (See Wang [0001-0002])
Regarding Claim 10, Philbin teaches all aspects of the claimed invention as disclosed in Claim 9 above. Philbin fails to teach wherein the first object is visually similar to the second object.
In the same field of endeavor, Wang teaches wherein the first object is visually similar to the second object ([0017-0018], third network is trained for triplet similarity based on image group information, where group membership can be used to define relevance between images, typically two images belonging to the same group are relevant or similar to each other, a triplet can be formed of a reference image, a positive image (i.e., an image that comes from the same group as a reference image) and a negative image (i.e., an image that comes from a different group than the reference image), triplet sampling can be carried out similarly as in the single-task network and all the label information (i.e., object category, style field, image group) can be utilized to sample positive and negative images (~visual similarity (a.k.a. object category) considered separately from other types of image information, therefore visually similar images (a.k.a. same object) belonging to different image groups can still be considered dissimilar/negative images)).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the training of a neural network on a plurality of triplets by processing an anchor image, a positive image similar to the anchor image, and a negative image dissimilar to the anchor image, as taught in Philbin, to further include determination of similarity based on more characteristics than simply visual similarity, such as image group or label information, as taught in Wang, in order to better predict tags to associate with images across multiple types of tasks and similarities, providing more accurate characterization of images overall. (See Wang [0001-0002])
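The group-based triplet sampling attributed to Wang in the rejections above can be sketched as follows. This is a minimal illustration, not Wang's implementation: it assumes each image carries a single group label (for example an album identifier), and the function name sample_group_triplets and the uniform random selection are hypothetical choices.

```python
import random
from collections import defaultdict

def sample_group_triplets(group_of, count, seed=0):
    """group_of: dict mapping image id -> group id (assumed label format).
    Returns (reference, positive, negative) tuples in which the positive shares
    the reference's group and the negative comes from a different group.
    Visual similarity is never consulted, only group membership."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for image_id, group_id in group_of.items():
        by_group[group_id].append(image_id)
    # Only groups with two or more images can supply a reference and a positive.
    anchor_groups = [g for g, members in by_group.items() if len(members) >= 2]
    if not anchor_groups or len(by_group) < 2:
        raise ValueError("need a group with two images and at least two groups")
    triplets = []
    for _ in range(count):
        group = rng.choice(anchor_groups)
        reference, positive = rng.sample(by_group[group], 2)
        other_group = rng.choice([g for g in by_group if g != group])
        negative = rng.choice(by_group[other_group])
        triplets.append((reference, positive, negative))
    return triplets
```

Because the sampler looks only at group labels, two images of different objects in the same group can form a reference/positive pair, and two images of the same object in different groups can form a reference/negative pair, which is the distinction the rejections of Claims 4, 10, and 20 rely on.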

Claims 5, 7-8, 12, 21, and 25-26 are rejected under 35 U.S.C. 103 as being unpatentable over Philbin et al (US 2016/0180151), in view of Shlens et al (US 2016/0378863).
Regarding Claims 5, 12, and 21, Philbin teaches all aspects of the claimed invention as disclosed in Claims 2, 9, and 18 above. Philbin fails to teach wherein the first object has a non-stationary similarity distribution with respect to the second object.
In the same field of endeavor, Shlens teaches wherein the first object has a non-stationary similarity distribution with respect to the second object ([0024], similarities determined between terms, where relative locations of the terms reflect semantic and syntactic similarities, vector subtraction and vector addition operations performed on the locations can be used to determine relationships between terms, [0030], system then combines the term representations for the query terms to generate the query representation (~different similarities determined based on which terms included in query)).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the training of a neural network on a plurality of triplets by processing an anchor image, a positive image similar to the anchor image, and a negative image dissimilar to the anchor image, as taught in Philbin, to further include determination of similarity based on more characteristics than simply visual similarity, such as location and relationship to other terms provided in a query, as taught in Shlens, in order to select more relevant responses dependent on the received search queries thereby improving user experience. (See Shlens [0005])
Regarding Claim 7, Philbin teaches all aspects of the claimed invention as disclosed in Claim 2 above. Philbin fails to teach wherein classifying the image of the second object as the first positive data point comprises: conducting a search of an image database using a name of the first object; and identifying a result of the search of the image database as the image of the second object.
In the same field of endeavor, Shlens teaches wherein classifying the image of the second object as the first positive data point comprises: conducting a search of an image database using a name of the first object ([0055], Fig. 5, system obtains, for each training video, search queries that are associated with the training video (step 504), the search queries associated with a given training video are search queries that users have submitted to a video search engine and that resulted in the users selecting a search result identifying the training video); and identifying a result of the search of the image database as the image of the second object ([0056-0059], Fig. 5, system computes, for each training video, the query representations of the queries associated with the training video (step 506), system generates training triplets for training the modified image classification neural network (step 508), each triplet includes a video frame from a training video, a positive query representation, and a negative query representation, the positive query representation is a query representation for a query associated with the training video and the negative query representation is a query representation for a query that is not associated with the training video but that is associated with a different training video, system selects as the positive query representation for the training triple that includes the frame the query representation that is the closest to the frame representation for the frame from among the representations for queries associated with the training video).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the training of a neural network on a plurality of triplets by processing an anchor image, a positive image similar to the anchor image, and a negative image dissimilar to the anchor image, as taught in Philbin, to further include determination of similarity in response to a query received from a user, the responses to the query used in classifying the returned objects, as taught in Shlens, in order to select more relevant responses dependent on the received search queries thereby improving user experience. (See Shlens [0005])
Regarding Claim 8, Philbin, as modified by Shlens, teaches all aspects of the claimed invention as disclosed in Claim 7 above. The combination, particularly Shlens, further teaches randomly selecting an image in the image database not returned in the search of the image database as the negative data point in the first triplet ([0056-0059], Fig. 5, each triplet includes a video frame from a training video, a positive query representation, and a negative query representation, the positive query representation is a query representation for a query associated with the training video and the negative query representation is a query representation for a query that is not associated with the training video but that is associated with a different training video).
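The triplet construction described in the Shlens passages cited for Claims 7 and 8 can be sketched as follows. This is a hedged illustration, not Shlens's implementation: frame and query representations are assumed to be plain vectors, closeness is assumed to be squared Euclidean distance, the negative is assumed to be drawn uniformly at random, and the function name build_query_triplet is hypothetical.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def build_query_triplet(frame_vec, own_queries, other_queries, rng):
    """frame_vec: representation of a frame from a training video.
    own_queries: representations of queries associated with that video.
    other_queries: representations of queries associated only with other videos.
    Returns (frame, positive query, negative query)."""
    # Positive: the associated query representation closest to the frame.
    positive = min(own_queries, key=lambda q: sq_dist(frame_vec, q))
    # Negative: a query tied to a different video (random pick is an assumption).
    negative = rng.choice(other_queries)
    return frame_vec, positive, negative
```

For example, with a frame at the origin and associated queries at [3, 4], [1, 0], and [0, 2], the positive is [1, 0], the nearest of the three.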
Regarding Claim 25, Philbin teaches all aspects of the claimed invention as disclosed in Claim 18 above. Philbin fails to teach a user interface, operably coupled to the server, to display an image of the first object to a user and to receive a query from the user about the first object.
In the same field of endeavor, Shlens teaches a user interface, operably coupled to the server ([0016-0018], the video search system 114 provides a user interface to the user device 104 through which the user 102 can interact with the video search system 114), to display an image of the first object to a user and to receive a query from the user about the first object ([0018-0020], user 102 submits a query 110, the query 110 may be transmitted through the network 112 to the video search system 114, when the query 110 is received by the video search engine 130, the video search engine 130 identifies responsive videos for the query 110 from the videos that are indexed in the index 122, after the video search engine 130 has selected responsive videos for the query 110, the representative frame system 150 selects a representative video frame from each of the responsive videos, the video search system 114 then generates a response to the query 110 that includes video search results).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the training of a neural network on a plurality of triplets by processing an anchor image, a positive image similar to the anchor image, and a negative image dissimilar to the anchor image, as taught in Philbin, to further include determination of similarity in response to a query received from a user, the responses to the query used in classifying the returned objects, as taught in Shlens, in order to select more relevant responses dependent on the received search queries thereby improving user experience. (See Shlens [0005])
Regarding Claim 26, Philbin, as modified by Shlens, teaches all aspects of the claimed invention as disclosed in Claim 25 above. The combination, particularly Shlens, further teaches wherein the server is configured to return an image of an object similar to the first object in response to the query from the user and the user interface is configured to display an image of the object similar to the first object to the user ([0021], each of the video search results identifies a respective one of the responsive videos and includes a presentation of the representative frame selected for the responsive video by the representative frame system 150, the presentation of the representative frame may be, e.g., a thumbnail of the representative frame or another image that includes content from the representative frame, each video search result also generally includes a link that, when selected by a user, initiates playback of the video identified by the video search result).

Claims 14-15 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Philbin et al (US 2016/0180151), in view of Kalai et al (US 2012/0296776).
Regarding Claims 14 and 23, Philbin teaches all aspects of the claimed invention as disclosed in Claims 1 and 18 above. Philbin fails to teach wherein the plurality of triplets comprises between about 5 x 10^6 triplets to about 1.25 x 10^12 triplets.
In the same field of endeavor, Kalai teaches wherein the plurality of triplets comprises between about 5 x 10^6 triplets to about 1.25 x 10^12 triplets ([0042], FIGS. 2a-2c, illustrate building a similarity model for a set of images, each of which depicts a flag of a different country, the set of images may comprise any suitable number of images and may contain tens, hundreds, thousands, and/or millions of images, [0064-0065], number of fixed subsets corresponding to each of the N inputted items may be any suitable number between 1 and N, though preferably it would be a number smaller than N, each subset may consist of three items (a triplet), each subset may consist of two items or four or five items, not all subsets may consist of the same number of items (~generating triplets for millions of images discloses the claimed range)).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the training of a neural network on a plurality of triplets by processing an anchor image, a positive image similar to the anchor image, and a negative image dissimilar to the anchor image, as taught in Philbin, to further include generation of triplets from any suitable number of images, preferably larger numbers such as thousands or millions, thereby producing millions of triplets for training the network, as taught in Kalai, in order to better tune the classifier based on a plethora of training data.
Regarding Claims 15 and 24, Philbin teaches all aspects of the claimed invention as disclosed in Claims 1 and 18 above. Philbin fails to teach wherein the plurality of triplets is generated from a plurality of images comprising between about 2000 total images and about 1,000,000 total images and at least 4 images per class of similar object.
In the same field of endeavor, Kalai teaches wherein the plurality of triplets is generated from a plurality of images comprising between about 2000 total images and about 1,000,000 total images and at least 4 images per class of similar object ([0042], FIGS. 2a-2c, illustrate building a similarity model for a set of images, each of which depicts a flag of a different country, the set of images may comprise any suitable number of images and may contain tens, hundreds, thousands, and/or millions of images, [0064-0065], number of fixed subsets corresponding to each of the N inputted items may be any suitable number between 1 and N, though preferably it would be a number smaller than N, each subset may consist of three items (a triplet), each subset may consist of two items or four or five items, not all subsets may consist of the same number of items).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the training of a neural network on a plurality of triplets by processing an anchor image, a positive image similar to the anchor image, and a negative image dissimilar to the anchor image, as taught in Philbin, to further include generation of triplets from any suitable number of images, preferably larger numbers such as thousands or millions, thereby producing millions of triplets for training the network, as taught in Kalai, in order to better tune the classifier based on a plethora of training data.
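To give a sense of scale for the image and triplet counts discussed for Claims 14-15 and 23-24, the sketch below counts how many ordered (reference, positive, negative) triplets an image set can supply at most. It is illustrative arithmetic only: it assumes every class holds the same number of images and that every valid combination is enumerated, neither of which is stated in the claims or in the Kalai passages quoted above. The function name max_ordered_triplets is hypothetical.

```python
def max_ordered_triplets(total_images, images_per_class):
    """Upper bound on ordered (reference, positive, negative) triplets when
    every class has exactly images_per_class images (assumption)."""
    if images_per_class < 2 or total_images % images_per_class != 0:
        raise ValueError("classes must be equal-sized with at least 2 images")
    positives = images_per_class - 1             # same class, excluding the reference
    negatives = total_images - images_per_class  # every image outside the class
    return total_images * positives * negatives

# 2,000 images at 4 per class:     2000 * 3 * 1996      = 11,976,000
# 1,000,000 images at 4 per class: 1000000 * 3 * 999996 = 2,999,988,000,000
```

The count grows roughly with the square of the number of images, so image sets in the thousands to millions yield candidate triplets in the millions to trillions.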
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Schroeder et al (US 2017/0161919) discloses ([0015-0016]) training a neural network based on comparing a triplet of known images of the plurality, where each known data structure of the plurality is a respective known N dimensional vector in an N dimensional space, a first known image of the triplet may be a matching image for a second known image of the triplet, a third known image of the triplet may be a non-matching image for the first known image of the triplet; Spizhevoy et al (US 2018/0018451) discloses ([0034]) an eye authentication trainer 104 utilizing a deep neural network (DNN) 112 with a triplet network architecture to learn the embedding 108, where the triplet network architecture can include three identical embedding networks, for example an anchor embedding network (ENetworkA) 124a, a positive embedding network (ENetworkP) 124p, and a negative embedding network (ENetworkN) 124n, which can map eye images from the eye image space into embedding space representations of the eye images in the embedding space.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARGARET G MASTRODONATO whose telephone number is (571)270-7803. The examiner can normally be reached M-F 9:00 AM-6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Appiah, can be reached on (571) 272-7904. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARGARET G MASTRODONATO/Primary Examiner, Art Unit 2641