DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is in response to the application submitted on 1/14/2019.
Claims 15 and 16 are cancelled.
Claims 1-14 and 17-28 are presented for examination.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-14, 20-22, and 26-28 are rejected under 35 U.S.C. 103 as being unpatentable over Philbin et al. (WO2016/100717 A1, herein Philbin) and Corrado et al. (US 2015/0178383 A1, herein Corrado).
Regarding claim 1,
	Philbin teaches a method comprising: obtaining training data for training a machine learning model having a plurality of parameters (Philbin, Fig. 1, step 120, and par. [0004], ln. 1 “… one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of obtaining a plurality of training images…”, and par. [0027], ln. 2 “…trains the neural network 120 on training images to determine trained values of the parameters of the neural network…”

[Image: Philbin, Fig. 1]

In other words, the neural network is the machine learning model, obtaining a plurality of training images is obtaining training data, and training the neural network on training images to determine trained values of the parameters of the neural network is training a machine learning model having a plurality of parameters.), wherein
	the machine learning model is configured to process input images to generate, for each input image, a predicted point in an embedding space (Philbin, par. [0025], ln. 1 “convolutional neural network, that is configured to receive an input image and to process the input image..” and Fig. 1, step 124, “Numeric embedding”. In other words, convolutional neural network is machine learning model, configured to process input images is configured to process input images, and generate a numeric embedding is generate, for each input image, a predicted point in an embedding space.), and wherein
	the training data comprises a plurality of training images and, (Philbin, Fig. 1, and par. [0027], ln. 2 “…trains the neural network 120 on training images…” In other words, trains the neural network on training images is training data comprises a plurality of training images.),
	for each training image, label data that identifies one or more object categories from a set of object categories to which one or more objects depicted in the training image belong (Philbin, par. [0027], ln. 3 “The training images are images that have been classified as being images of objects…” In other words, training images is training image, classified as being images of objects is label data that identifies one or more object categories…objects depicted in the training image.);
	determining, from the label data for the training images in the training data, a respective numeric embedding in the embedding space of each of the object categories in the [set of object categories] (Philbin, Fig. 1, par. [0028], ln. 1 “Once the neural network 120 has been trained, the numeric embeddings generated by the numeric embedding system… can be used in various image processing tasks.” And, par. [0029], ln. 2 “In this example, the numeric embedding system 120 receives two input images of objects of the particular object type.  The numeric embedding system 120 processes each of the input images using the neural network 120 in order to generate a respective numeric embedding of each of the two images.”  In other words, generate a respective numeric embedding is determining… a respective numeric embedding in the embedding space of each of the object categories. ), wherein
	a distance in the embedding space between the numeric embeddings of any [two] object categories reflects a degree of visual co-occurrence of the two object categories in the training images (Philbin, par. [0029], ln. 7 “…may determine that the two images are of the same object when the distance between the numeric embeddings of the two images is less than a threshold distance.” In other words, distance between the numeric embeddings is distance in the embedding space between the numeric embeddings, and may determine that the two images are of the same object is a degree of visual co-occurrence of the two objects in the training images.), wherein
	[the degree of visual co-occurrence is based on a relative frequency with which the label data for a training image associates both of the object categories with the training image; and]
	training the machine learning model on the training data, comprising, for each of the training images (Philbin, FIG. 2, and par. [0032] “FIG. 2 is a flow diagram of an example process 200 for training a neural network to generate numeric embeddings.”


FIG. 2
In other words, training the neural network is training the machine learning model, and training images is training images.):
	processing the training image using the machine learning model in accordance with current values of the parameters to generate a predicted point in the embedding space for the training image (Philbin, FIG. 3, and par. [0037], “The system processes the anchor image in the triplet using the neural network in accordance with current values of the parameters of the neural network to generate a numeric embedding of the anchor image (Step 202).”

[Image: Philbin, FIG. 3]

In other words, process images is processing the training image, neural network is machine learning model, current values of the parameters of the neural network is current values of the parameters, and generate a numeric embedding is generate a predicted point in the embedding space.); and
	adjusting the current values of the parameters to reduce a distance between the predicted point in the embedding space and the numeric embeddings of the object categories identified in the label data for the training image (Philbin, par. [0042], “The system adjusts the current values of the parameters of the neural network using the triplet loss (step 310).  That is, the system adjusts the current values of the parameters of the neural network to minimize the triplet loss.  The system can adjust the current values of the parameters of the neural network using conventional neural network training techniques, e.g., stochastic gradient descent with back propagation.”  In other words, adjust the current values of the parameters is adjusting the current values of the parameters, to minimize the triplet loss is to reduce the distance between the predicted point in the embedding space and the numeric embeddings of the object categories identified in the label data.).
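For illustration only, the parameter adjustment quoted from Philbin par. [0042] can be sketched as a hinge-style triplet loss. The squared Euclidean distance, the margin value of 0.2, and the names `squared_distance` and `triplet_loss` are assumptions introduced for this sketch and are not quoted from Philbin.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss (assumed form, margin is a hypothetical value).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`; otherwise it is positive.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


# Anchor is already much closer to the positive than to the negative.
satisfied = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])

# Anchor is closer to the negative, so the loss is positive.
violated = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.1, 0.0])
```

A loss of zero means the triplet already satisfies the ordering, so a gradient step would leave the parameters unchanged for that triplet; a positive loss is what stochastic gradient descent would act on.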
	Thus far, Philbin does not explicitly teach a set of object categories or more than one object category, such as two.
	Corrado teaches a set of object categories and at least two object categories (Corrado, Fig. 3, and par. [0003], ln. 1 “Data object classification systems can classify data objects into one or more pre-determined categories. For example, visual recognition systems can identify objects in images, i.e., classify input images as including objects from one or more object categories. Some data object classification systems use one or more neural networks to classify an input data object.” And, par. [0005], ln. 8 “wherein the classification data includes a respective score for each of the plurality of categories represents a likelihood that the data object belongs to the category, and wherein each of the categories is associated with a respective category label; computing an aggregate high-dimensional representation for the data object from high-dimensional representations for the category labels associated with the categories and the respective scores;” And, par. [0039], ln. 6 “For example, the system can determine the closest representation by identifying the representation that has the largest cosine similarity value with the aggregate representation for the input data object or that is the closest representation to the aggregate representation using a different distance metric, e.g., Euclidian distance, Hamming distance, and so on.” In other words, one or more object categories is a set of object categories, and more object categories is at least two object categories.)
	Corrado teaches the degree of visual co-occurrence is based on a relative frequency with which the label data for a training image associates both of the object categories with the training image (Corrado, par. [0042], ln. 12 “That is, the specified level of generality may be associated with a threshold frequency of occurrence, and the system can determine that any term that has a frequency of occurrence that exceeds the threshold frequency has at least the specified level of generality.” And, par. [0007], ln. 9 “Further, labels that are inaccurately predicted by the data object classification system may be semantically or syntactically related to the correct label for the input data object.  Additionally, the visual recognition system may be able to easily predict labels that are specific, generic, or both for a given data object. Data object classification systems can classify data objects into one or more pre-determined categories. For example, visual recognition systems can identify objects in images, i.e., classify from one or more object categories.”  In other words, threshold frequency of occurrence is degree of co-occurrence based on relative frequency. Label data for a training image was previously mapped. See above mapping of claim 1.);
	Both Philbin and Corrado are directed to generating numeric embeddings of objects for the purpose of classification. Philbin teaches training and using a neural network to generate numeric embeddings of image objects, but does not teach two or more object categories or a relative frequency of co-occurrence associated with both of the object categories. Corrado teaches data object classification into one or more pre-determined categories as well as a degree of visual co-occurrence based on a relative frequency with which the label data associates both categories with an image. In view of the teaching of Philbin, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Corrado into Philbin. This would result in being able to generate numeric embeddings of objects with more than one category of objects.
	One of ordinary skill in the art would have been motivated to do this because objects frequently belong to more than one category. By representing category labels in a high-dimensional space, category labels can be more accurately predicted. (Corrado, par. [0007], ln. 1 “Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. By configuring a data object classification system to predict representations of labels in a high-dimensional space, category labels for data objects can be accurately predicted. Additionally, the accuracy of zero-shot predictions, i.e., predictions of labels that were not observed during training, can be improved.”)
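As a hedged illustration of the co-occurrence limitation discussed for claim 1, a relative frequency of co-occurrence can be computed from per-image label sets as the fraction of training images whose label data names both categories. The function name `cooccurrence_frequency` and the sample labels are hypothetical and appear in neither reference.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_frequency(label_sets):
    """Map each unordered category pair to the fraction of images labeled with both."""
    pair_counts = Counter()
    for labels in label_sets:
        # Sort so that each unordered pair always has the same key.
        for pair in combinations(sorted(set(labels)), 2):
            pair_counts[pair] += 1
    total = len(label_sets)
    return {pair: count / total for pair, count in pair_counts.items()}


# Hypothetical label data for four training images.
labels = [
    {"dog", "leash"},
    {"dog", "leash", "person"},
    {"cat"},
    {"dog", "person"},
]
freq = cooccurrence_frequency(labels)
```

In this sample, "dog" and "leash" appear together in two of four images, so their relative frequency is 0.5, while "leash" and "person" appear together in only one image.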
Regarding claim 5,
	The combination of Philbin and Corrado teaches the method of claim 1, wherein
adjusting the current values of the parameters comprises:   
	determining a combined embedding from the numeric embeddings of the object categories identified in the label data for the training image (Corrado, par. [0005], ln. 8 “wherein the classification data includes a respective score for each of the plurality of categories represents a likelihood that the data object belongs to the category, and wherein each of the categories is associated with a respective category label; computing an aggregate high-dimensional representation for the data object from high-dimensional representations for the category labels associated with the categories and the respective scores;” In other words, computing an aggregate high-dimensional representation for the data object is determining a combined embedding from the numeric embeddings of the object categories, and category labels is label data for the training image.); and
	adjusting the current values of the parameters  (Philbin, par. [0012], “Adjusting the current values of the parameters may comprise adjusting the current values of the parameters to minimize the triplet loss.”  In other words, adjusts the current parameters is adjusting the current parameters.) to
	reduce a cosine proximity between the combined embedding and the predicted point in the embedding space for the training image (Corrado, par. [0039], ln. 6 “For example, the system can determine the closest representation by identifying the representation that has the largest cosine similarity value with the aggregate representation for the input data object or that is the closest representation to the aggregate representation using a different distance metric, e.g., Euclidian distance, Hamming distance, and so on.” In other words, determine the closest representation by identifying the representation that has the largest cosine similarity value with the aggregate representation for the input data object is reducing the cosine proximity between the combined embedding and the predicted point in the embedding space for the training image.)
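The cosine-based comparison quoted from Corrado par. [0039] can be sketched as follows. This is a minimal illustration that assumes non-zero vectors; the helper names `cosine_similarity` and `closest_label` and the sample embeddings are hypothetical and are not taken from Corrado.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_label(query, label_embeddings):
    """Return the label whose embedding has the largest cosine similarity to `query`."""
    return max(
        label_embeddings,
        key=lambda name: cosine_similarity(query, label_embeddings[name]),
    )


# Hypothetical two-dimensional label embeddings.
label_embeddings = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
best = closest_label([1.0, 0.1], label_embeddings)
```

Because cosine similarity depends only on direction, scaling an embedding does not change which label is selected.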
Regarding claim 6,
	The combination of Philbin and Corrado teaches the method of claim 5, wherein
	determining the combined embedding comprises summing the numeric embeddings of the object categories identified in the label data for the training image (Corrado, Fig. 3, step 304, “Compute aggregate representation,” And, par. [0037], ln. 8 “The system then combines the products to generate the aggregate representation for the input data object, e.g., by summing the respective products to generate the aggregate representation.” And par. [0038], ln. 1 “The system selects one or more category labels for the input data object using the aggregate high-dimensional representation (step 306).” In other words, compute is determining, aggregate high-dimensional representation is combined embedding, and compute aggregate representation…by summing is determining the combined embedding comprises summing the embeddings identified in the label data.)
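A minimal sketch of the summing step quoted from Corrado par. [0037] follows. It assumes that each "product" is a category score multiplied by that category's label embedding, which is an interpretation of the quoted passage; the function name `aggregate_representation` and the sample values are hypothetical.

```python
def aggregate_representation(scores, label_embeddings):
    """Sum score-weighted label embeddings into a single vector."""
    dim = len(next(iter(label_embeddings.values())))
    total = [0.0] * dim
    for label, score in scores.items():
        for i, value in enumerate(label_embeddings[label]):
            total[i] += score * value
    return total


# Hypothetical embeddings and scores.
label_embeddings = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
weighted = aggregate_representation({"dog": 0.5, "cat": 0.5}, label_embeddings)

# With every score set to 1.0 the aggregate reduces to a plain sum of embeddings.
plain_sum = aggregate_representation({"dog": 1.0, "cat": 1.0}, label_embeddings)
```

The second call shows the unweighted case, which is the closest analogue to summing the numeric embeddings of the categories named in the label data.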
Regarding claim 7,
	The combination of Philbin and Corrado teaches the method of claim 1, wherein
the machine learning model is a deep convolutional neural network (Philbin, par. [0025], “The neural network 120 is a neural network having multiple parameters, e.g., a deep convolutional neural network…”  In other words, deep convolutional neural network is deep convolutional neural network.).
Claim 8 is a method comprising: maintaining data that corresponds to method claim 1.  In addition, claim 8 has the added limitation of “classifying the input image as including images of one or more objects that belong to object categories represented by the one or more numeric embeddings.” The combination of Philbin and Corrado teaches this. (Philbin, par. [0027], ln. 3 “The training images are images that have been classified as being images of objects…” and, par. [0004], ln. 11 “…input image of an object of the particular object type and to process the input image to generate a numeric embedding of the input image…” In other words, classified as being images of objects is classifying the input image, images of objects is images of one or more objects belong to object categories, and generate a numeric embedding is represented by the one or more numeric embeddings.) Otherwise, they are the same.  It is implicit that a computer implemented method requires a way to store and/or maintain data in order to execute.  Therefore, claim 8 is rejected for the same reasons as claim 1.
Claim 9 is a method claim that depends from claim 8 corresponding to method claim 7 which depends from claim 1.  Otherwise, they are the same.  Therefore, claim 9 is rejected for the same reasons as claim 7.
Regarding claim 10,
	The combination of Philbin and Corrado teaches the method of claim 9, wherein determining, from the maintained data, one or more numeric embeddings that are closest to the predicted point in the embedding space comprises:
	determining a predetermined number of numeric embeddings that are closest to the predicted point in the embedding space (Philbin, par. [0041], ln. 5 “Thus, the triplet loss is expressed such that it is minimized when an image of a specific object has an embedding that is closer to the embeddings of all other images of the specific object than it is to the embedding of any other image of any other object…” In other words, triplet is predetermined number of numeric embeddings, and minimized when an image of a specific object has an embedding that is closer to the embeddings of all other images of the specific object than it is to the embedding of any other image is numeric embeddings that are closest to the predicted point in the embedding space.)
Regarding claim 11,
	The combination of Philbin and Corrado teaches the method of claim 8, wherein determining, from the maintained data, one or more numeric embeddings that are closest to the predicted point in the embedding space comprises:
	identifying each numeric embedding that is closer than a threshold distance to the predicted point in the embedding space.  (Philbin, par. [0029], ln. 3 “The numeric embedding system 120 processes each of the input images using the neural network….For example, the numeric embedding system 120 may determine that the two images are of the same object when the distance between the numeric embeddings of the two images is less than a threshold distance.” In other words, determine is identifying, numeric embedding is numeric embedding, each of the input images is each numeric embedding, and less than a threshold distance is closer than a threshold distance to the predicted point in the embedding space.)
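The two selection rules recited in claims 10 and 11 (a predetermined number of closest embeddings, and all embeddings within a threshold distance) can be contrasted in a short sketch. Euclidean distance, the function names `k_nearest` and `within_threshold`, and the sample embeddings are assumptions made for illustration only.

```python
import math


def k_nearest(point, embeddings, k):
    """Return the names of the k embeddings closest to `point`, nearest first."""
    ranked = sorted(embeddings, key=lambda name: math.dist(point, embeddings[name]))
    return ranked[:k]


def within_threshold(point, embeddings, threshold):
    """Return the names of all embeddings closer than `threshold` to `point`."""
    return [
        name
        for name, vector in embeddings.items()
        if math.dist(point, vector) < threshold
    ]


# Hypothetical category embeddings and a hypothetical predicted point.
category_embeddings = {"dog": [0.0, 0.0], "cat": [1.0, 0.0], "car": [5.0, 5.0]}
predicted_point = [0.2, 0.0]

top_two = k_nearest(predicted_point, category_embeddings, 2)
nearby = within_threshold(predicted_point, category_embeddings, 1.0)
```

The first rule always returns a fixed number of categories, while the second can return none or many depending on how the threshold compares with the distances.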
Regarding claim 12,
	The combination of Philbin and Corrado teaches the method of claim 8, wherein determining, from the maintained data, one or more numeric embeddings that are closest to the predicted point in the embedding space comprises: 
	using cosine proximity to determine the one or more numeric embeddings that are closest to the predicted point.  (Corrado, par. [0039], ln. 6 “For example, the system can determine the closest representation that has the largest cosine similarity value with the aggregate representation of the input data object…” In other words, cosine similarity value is cosine proximity, aggregate representation of the input data object is one or more numeric embeddings that are closest to the predicted point. Examiner notes that numeric embedding was previously mapped in claim 1.)
Claim 13 is a system claim corresponding to method claim 1.  Otherwise, they are the same.  It is implicit that a computer implemented method requires a system in order to execute.  Therefore, claim 13 is rejected for the same reasons as claim 1.
Claim 14 is a non-transitory computer-readable storage medium claim corresponding to method claim 1.  Otherwise, they are the same.  It is implicit that a computer implemented method requires a non-transitory computer-readable storage medium encoded with instructions executed by one or more computers in order to execute.  Therefore, claim 14 is rejected for the same reasons as claim 1.
Claims 20-22 are system claims corresponding to method claims 5-7, respectively.  Otherwise, they are the same.  Therefore, claims 20-22 are rejected for the same reasons as claims 5-7, respectively.
Claims 26-28 are computer-readable storage medium claims corresponding to claims 5-7, respectively. Otherwise, they are the same. Therefore, claims 26-28 are rejected for the same reasons as claims 5-7, respectively.
Claims 2-4, 17-19, and 23-25 are rejected under 35 U.S.C. 103 as being unpatentable over Philbin, Corrado, and Li et al. (“A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution,” herein Li).
Regarding claim 2,
	The combination of Philbin and Corrado teaches the method of claim 1, wherein determining the respective embedding of each of the object categories comprises:
	Thus far, the combination of Philbin and Corrado does not explicitly teach determining a respective pointwise mutual information measure between each possible pair of object categories in the set of object categories as measured in the training data.
	Li teaches determining a respective pointwise mutual information measure between each possible pair of [object categories in the set of object categories (see Corrado, par. [0003], ln. 1, as mapped for claim 1 above)] as measured in the training data (Li, pg. 3, col. 1, par. 3, ln. 1 “the Pointwise Mutual Information between two words s_i, s_j is defined as
PMI(s_i, s_j) = log( P(s_i, s_j) / (P(s_i) P(s_j)) ).” And, pg. 7, col. 2, par. 2, ln. 1 “All models were trained on the English Wikipedia snapshot in March 2015.” In other words, PMI(s_i, s_j) = log( P(s_i, s_j) / (P(s_i) P(s_j)) ) is a respective pointwise mutual information measure between each possible pair of objects, and models were trained is training data. Examiner notes that pointwise mutual information measures, including eigenvalue decompositions, are known in the art of semantic language processing.)
	constructing a matrix of the pointwise mutual information measures (Li, Table 1, and pg. 2, col. 2, par. 2, ln. 13, “Thereby a generative model of documents is constructed, parameterized by embeddings and residuals. The learning objective is to maximize the corpus likelihood, which reduces to a weighted low-rank positive semidefinite (PSD) approximation problem of the PMI matrix.” And, pg. 3, col. 2, par. 3, “(3) of all bigrams is represented in matrix form: V^T V + A = G, where G is the PMI matrix.”

[Image: excerpt from Li]

In other words, model of document is constructed is constructing a matrix, and PMI matrix is matrix of pointwise mutual information measures.)
performing an eigen-decomposition of the matrix of pointwise mutual information measures to determine an embedding matrix  (Li, Algorithm 1, and, Table 1, and pg. 6, col. 2, par. 3, ln. 1 “In Algorithm 1, the essential subroutine PSD_Approximate() does eigendecomposition on Gt, which is dense due to the logarithm transformation.”

[Image: excerpt from Li]

In other words, Algorithm 1 does eigendecomposition on Gt is perform eigendecomposition, G is PMI matrix, and V is embedding matrix.); and 
	determining the numeric embeddings from the rows of the embedding matrix (Li, pg. 6, col. 1, par. 3, ln. 1 “We summarize our learning algorithm in Algorithm 1. Here “∘” is the entry-wise product. We suppose the eigenvalues λ returned by Eigen_Decomposition(X) are in descending order. Q^T[1:N] extracts the 1 to N rows from Q^T.” In other words, Algorithm 1 determines the numeric embeddings, and extracts the 1 to N rows is from the rows of the embedding matrix.)
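To make the pointwise mutual information steps of claim 2 concrete, the following sketch builds a PMI matrix over object categories from per-image label sets, using the definition quoted from Li. It is an illustration under stated assumptions: probabilities are estimated as image fractions, and entries for the diagonal and for pairs that never co-occur are set to 0.0 (a hypothetical convention, since the logarithm is undefined there). The function name `pmi_matrix` and the sample labels are not from any reference.

```python
import math


def pmi_matrix(label_sets, categories):
    """Build a symmetric PMI matrix over `categories` from per-image label sets."""
    n = len(label_sets)
    size = len(categories)
    index = {category: i for i, category in enumerate(categories)}
    single = [0] * size                       # images containing category i
    joint = [[0] * size for _ in range(size)]  # images containing both i and j
    for labels in label_sets:
        present = [index[category] for category in set(labels)]
        for i in present:
            single[i] += 1
            for j in present:
                if i != j:
                    joint[i][j] += 1
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i != j and joint[i][j] > 0:
                p_ij = joint[i][j] / n
                p_i = single[i] / n
                p_j = single[j] / n
                matrix[i][j] = math.log(p_ij / (p_i * p_j))
    return matrix


# Hypothetical label data for four training images.
labels = [{"dog", "leash"}, {"dog", "leash"}, {"cat"}, {"cat"}]
pmi = pmi_matrix(labels, ["dog", "leash", "cat"])
```

Here "dog" and "leash" always appear together, so their PMI is log 2, while "dog" and "cat" never co-occur and keep the placeholder value.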
	Both Li and the combination of Philbin and Corrado are directed to embedding objects, among other things. The combination of Philbin and Corrado teaches embedding multiple image objects but does not explicitly teach using pointwise mutual information measures or optimization using eigen-decomposition. Li teaches a word embedding model that uses pointwise mutual information measures and optimization using eigen-decomposition. In view of the teaching of the combination of Philbin and Corrado, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Li into the combination of Philbin and Corrado. This would result in embedding multiple image objects and using pointwise mutual information measures and eigen-decomposition for optimization.
	One of ordinary skill in the art would have been motivated to do this in order to simplify and speed up training as well as improve performance. (Li, pg. 1, col. 2, ln. 9 “However, there are interaction matrices between the embeddings in all these models, which complicate and slow down the training, hindering them from being trained on huge corpora. Mikolov et al. (2013a) and Mikolov et al. (2013b) greatly simplify the conditional distribution, where the two embeddings interact directly. They implemented the well-known “word2vec”, which can be trained efficiently on huge corpora. The obtained embeddings show excellent performance on various tasks.”)
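The eigen-decomposition and row-extraction steps mapped to Li for claim 2 can be summarized in a short worked form. This is a sketch under stated assumptions: the square-root scaling, the requirement that the N retained eigenvalues be non-negative, and the symbols Q_N and Λ_N are introduced here for illustration and are not quoted from Li.

```latex
\begin{aligned}
G &= Q \Lambda Q^{\top}, \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_W), \quad \lambda_1 \ge \dots \ge \lambda_W, \\
V &= \Lambda_N^{1/2}\, Q_N^{\top}, \qquad Q_N^{\top} = Q^{\top}[1{:}N], \quad \Lambda_N = \operatorname{diag}(\lambda_1, \dots, \lambda_N), \\
V^{\top} V &= Q_N \Lambda_N Q_N^{\top} \approx G .
\end{aligned}
```

Under these assumptions each column of V (equivalently, each row of V^T) is the embedding of one word, which is the sense in which embeddings can be read from the rows of an embedding matrix.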
Regarding claim 3,
	The combination of Philbin, Corrado, and Li teaches the method of claim 2, wherein 
determining the numeric embeddings from the rows of the embedding matrix comprises:
restricting the embedding matrix to its first k columns to generate a restricted embedding matrix (Li, Table 1, V is the embedding matrix {vsi,…vsw}, vsi is an embedding word.  In other words, w columns is k columns, and each word vsi of characters represents a row.); and
using the rows of the restricted embedding matrix as the numeric embeddings (Li, see above. In other words, embedding words vsi is using rows as the numeric embeddings.).  
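The restriction recited in claim 3 can be pictured as simple column slicing. The sketch below assumes the embedding matrix is stored as a list of rows, one row per category; the function name `restrict_columns`, the category names, and the sample values are hypothetical.

```python
def restrict_columns(embedding_matrix, k):
    """Keep only the first k columns of a row-major matrix."""
    return [row[:k] for row in embedding_matrix]


# Hypothetical 2 x 3 embedding matrix, one row per category.
categories = ["dog", "cat"]
embedding_matrix = [
    [0.9, 0.1, 0.05],
    [0.2, 0.8, 0.01],
]

restricted = restrict_columns(embedding_matrix, 2)

# Each row of the restricted matrix serves as that category's numeric embedding.
embeddings = dict(zip(categories, restricted))
```

The number of rows is unchanged, so every category still receives an embedding; only its dimensionality drops to k.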
Regarding claim 4,
	The combination of Philbin, Corrado, and Li teaches the method of claim 2, wherein performing an eigen-decomposition of the matrix of pointwise mutual information measures to determine an embedding matrix comprises:
	decomposing the matrix of pointwise mutual information measures PMI into a matrix product of matrices that satisfies:
		 
[Equation image: media_image6.png]

wherein the embedding matrix E satisfies:
   
[Equation image: media_image7.png]

(Li,  Algorithm 1, and Table 1, and pg. 6, col. 2, par. 2, ln. 1 “In Algorithm 1, the essential subroutine PSD_Approximate() does eigendecomposition on Gt, which is dense due to the logarithm transformation.” In other words, Gt is the PMI matrix, V is the embedding matrix and eigendecomposition is eigen-decomposition.  Examiner notes that though Li does not explicitly show the equations as shown in the limitation, Li teaches the requirement of the limitation.  The equations are implicitly taught by Li and are derivative.)
Claims 17-19 are system claims corresponding to method claims 2-4, respectively. Otherwise, they are the same.  Therefore, claims 17-19 are rejected for the same reasons as claims 2-4, respectively.
Claims 23-25 are computer-readable storage medium claims corresponding to method claims 2-4, respectively. Otherwise, they are the same. Therefore, claims 23-25 are rejected for the same reasons as claims 2-4, respectively.
Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/B.I.R./Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124