Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on March 3, 2021 has been entered.

Remarks
	 This Office Action is in response to applicant’s amendment and RCE filed on March 3, 2021. Claims 1-20 remain pending and under consideration.

Response to Arguments
Applicant’s arguments with respect to the rejection of claims 1-7, 15, and 18 have been considered but are moot under the new ground of rejection.
	Applicant argued that the newly added limitations of “wherein a number of the plurality of first latent vector representations is determined based on a density of the first input” and “wherein a number of the plurality of second latent vector representations is determined based on a density of the second input” recited in claim 1 distinguish over the previously cited references Qu and Qiu. In the new ground of rejection set forth below, this new limitation is taught by the newly cited reference He.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


1.	Claims 1-7, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Qu et al., “Product-Based Neural Networks for User Response Prediction,” 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, 2016, pp. 1149-1154, doi: 10.1109/ICDM.2016.0151 (hereinafter “Qu”) in view of Qiu et al., “Context-Dependent Sense Embedding,” Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 183–191, Austin, Texas, November 1-5, 2016 (hereinafter “Qiu”), and He et al., “Neural Factorization Machines for Sparse Predictive Analytics.” SIGIR’17, August 7-11, 2017, Shinjuku, Tokyo, Japan (hereinafter “He”).
As to claim 1, Qu teaches a method comprising:
by a computing device [it is understood that the method of Qu is performed by a computing device, since Qu discloses experiments using large datasets (§ IV) and comparisons of computational performance (see § IV.D in particular). Therefore, Qu teaches this and the subsequent “by the computing device” limitations], accessing a neural network having a module [FIG. 1: “Product-based Neural Network Architecture” (caption). The bottom three sections of the neural network architecture correspond to a module.], the module having a first input and a second input; [FIG. 1: “input” layer comprising a plurality of “fields.” Fields are described on page 1151, left column, second paragraph (“input feature vector containing multiple fields”)] and
by the computing device, using the module to: 
process the first input to generate a first latent vector representation of the first input; [In general, Qu teaches that embedding vectors are generated from respective fields, as shown in FIG. 1 (embedding vectors output by the “embedding layer”) and described on page 1151, left column, paragraph 2 (which states: “The embedding vector $f_i$ of field $i$, is the output of the embedding layer”). Such embedding vectors constitute “latent vector representations,” because they are representations of respective fields computed based on parameters $W_0$ (as described on page 1151, left column, paragraph 2). See also § IV.E (p. 1153), paragraph 3, describing “latent vectors.” Here, a first field in Qu (e.g., “Field 1” in FIG. 1) may correspond to the instant “first input,” and an embedding is generated from this field, as discussed above.]
process the second input to generate a second latent vector representation of the second input; [A second field in Qu (e.g., “Field 2” in FIG. 1) may correspond to the instant “second input.” As described above, embedding vectors are generated from respective fields.]
determine unique pairwise combinations of latent vector representations, wherein each unique pairwise combination comprises latent vector representations selected from (the vectors in the embedding layer) [As shown in FIG. 1, the “product layer” computes unique pairwise combinations of items from the embedding layer. This process is described on page 1151, left column, paragraph 1: “$p_{i,j} = g(f_i, f_j)$ defines the pairwise interaction.” For example, § III.B, paragraph 1, teaches the pairwise interaction may be the vector inner product $g(f_i, f_j) = \langle f_i, f_j \rangle$. That is, the combinations of $f_i$, $f_j$ are determined as pairwise combinations. Qu teaches pairwise interactions between all embedding vectors in the embedding layer.]
model pairwise interactions between the unique pairwise combinations of latent vector representations; [As noted above, Qu models “pairwise interactions” between the unique pairwise combinations in the “product layer” of FIG. 1. Page 1151, left column, paragraph 1 teaches that “$p_{i,j} = g(f_i, f_j)$ defines the pairwise interaction” and § III.B, paragraph 1, teaches the pairwise interaction may be the vector inner product $g(f_i, f_j) = \langle f_i, f_j \rangle$.] and
produce an intermediate output by combining results of the modeled pairwise interactions. [As shown in eq. (7) on page 1151, left column ($p = \{p_{i,j}\},\ i = 1 \ldots N,\ j = 1 \ldots N$), the different pairwise interactions $p_{i,j}$ are combined to generate the output $p$ (i.e., an “intermediate output”).]
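For illustration only, the product-layer operations mapped above can be sketched in Python as follows. The sketch assumes the inner-product form of $g$ from Qu, § III.B, and enumerates each unordered pair once with `itertools.combinations`, whereas eq. (7) of Qu indexes both $i$ and $j$ over $1 \ldots N$. The function names and the toy embedding values are hypothetical and do not appear in Qu.

```python
from itertools import combinations


def inner_product(f_i, f_j):
    """Pairwise interaction g(f_i, f_j), taken here as a vector inner product."""
    return sum(a * b for a, b in zip(f_i, f_j))


def product_layer(embeddings):
    """Return the list p of interactions p_ij, one per unordered pair i < j."""
    return [inner_product(f_i, f_j) for f_i, f_j in combinations(embeddings, 2)]


# Hypothetical embedding vectors for three fields (values chosen arbitrarily).
embeddings = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]

# Intermediate output: one entry per unique pair (1,2), (1,3), (2,3).
p = product_layer(embeddings)
```

With three embedding vectors the sketch yields three interaction values; in general, N vectors yield N(N-1)/2 unordered pairs.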
Although Qu teaches a plurality of inputs and pairwise interactions between all embedding vectors in the embedding layer, Qu does not teach that a plurality of different latent vector representations are generated by processing the same input. Therefore, Qu does not teach the limitations of: 
(1)	“a plurality of first latent vector representations of the first input, the plurality of first latent vector representations being different from each other”; 
(2)	“a plurality of second latent vector representations of the second input, the plurality of second latent vector representations being different from each other”; 
(3)	the latent vector representations being “selected from (1) the plurality of first latent vector representations of the first input; and (2) the plurality of second latent vector representations of the second input”; and
(4)	“wherein a number of the plurality of first latent vector representations is determined based on a density of the first input” and “wherein a number of the plurality of second latent vector representations is determined based on a density of the second input.” 
Qiu, in an analogous art, teaches or suggests the limitations (1) through (3) listed above. Qiu generally pertains to natural language processing using neural networks (abstract and § 1). Therefore, Qiu is analogous art for at least the reason of being in the field of machine learning. 
In particular, Qiu teaches inputs [a word, as described in Qiu’s abstract and § 1, corresponds to an input] and generating “a plurality of first latent vector representations of the first input, the plurality of first latent vector representations being different from each other” and “a plurality of second latent vector representations of the second input, the plurality of second latent vector representations being different from each other” [§ 1, paragraph 2: “learning one vector for each word may not cover all the senses of the word…and…may not be a good representation of any of the senses. A possible solution is sense embedding which trains a vector for each sense of a word. There are two key steps in training sense embeddings. First, we need to perform word sense disambiguation (WSD) or word sense induction (WSI) to determine the senses of words in the training corpus. Then, we need to train embedding vectors for word senses according to their contexts.”]. 
The Examiner notes that a “word” in Qiu is analogous to a field (or input) in Qu, particularly given that Qu relates to information retrieval wherein field vectors represent words (e.g., “Tuesday,” “male,” or “London”), as described in Qu, § I, paragraph 2. Qiu’s teachings are applicable to Qu, particularly in light of the fact that Qu discusses the application of word embeddings used to represent the information of a word (see Qu, § IV). Furthermore, with respect to multiple inputs (i.e., a first input and a second input), Qiu’s teachings are applicable to multiple words.
Furthermore, Qiu suggests “selected from (1) the plurality of first latent vector representations of the first input; and (2) the plurality of second latent vector representations of the second input” [As noted above, Qiu teaches a plurality of latent vector representations per input, and teaches the elements of “the plurality of first latent vector representations of the first input” and “the plurality of second latent vector representations of the second input.” Moreover, Qu already teaches pairwise interactions between all embedding vectors in the embedding layer in general. Therefore, the instant limitation results from the combination of Qiu and Qu set forth below.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Qu and Qiu by modifying Qu such that the module is used to process the first input to generate “a plurality of first latent vector representations of the first input, the plurality of first latent vector representations being different from each other” and to process the second input to generate “a plurality of second latent vector representations of the second input, the plurality of second latent vector representations being different from each other,” such that the latent vector representations are “selected from (1) the plurality of first latent vector representations of the first input; and (2) the plurality of second latent vector representations of the second input.” The motivation for doing so would have been to obtain good representations of different meanings (senses) of the first and second inputs, as suggested by Qiu (§ 1, paragraph 2).
The limitation of “selected from (1) the plurality of first latent vector representations of the first input; and (2) the plurality of second latent vector representations of the second input,” would alternatively have been obvious as a combination of prior art elements according to known methods to yield predictable results. As noted above, Qiu teaches a plurality of latent vectors per input, and Qu teaches pairwise interactions between all embedding vectors in general. The use of pairwise interactions (e.g., inner product) between different pairs of embedding vectors generated from different inputs is a combination of known elements with no change in their respective functions, and the combination yielded nothing more than predictable results to one of ordinary skill in the art. The different embeddings for different senses taught in Qiu still function in representing the different senses, and Qu’s technique of computing pairwise interactions still functions in the same way. The results would have been predictable because the instant combination of features would have been recognized as a way of adding additional information to be processed, particularly information that distinguishes between word senses. 
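For illustration only, the combination discussed above (several latent vectors per input, as in Qiu, fed to pairwise interactions over all embedding vectors, as in Qu) might look like the following Python sketch. It assumes that every unordered pair drawn from the pooled vectors is used, including pairs from the same input; the variable names and the vector values are hypothetical and are taken from neither reference.

```python
from itertools import combinations


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical sense embeddings: two vectors per input.
first_input_vectors = [[1.0, 0.0], [0.0, 1.0]]
second_input_vectors = [[2.0, 2.0], [1.0, -1.0]]

# Pool every latent vector, tagging each with its source input for readability.
pooled = [("first", v) for v in first_input_vectors] + [
    ("second", v) for v in second_input_vectors
]

# Assumption: every unordered pair from the pool is used, mirroring Qu's
# interactions between all embedding vectors in the embedding layer.
interactions = [
    (src_a, src_b, dot(a, b)) for (src_a, a), (src_b, b) in combinations(pooled, 2)
]
```

Each sense embedding keeps its role as a representation of one sense, and the interaction function is unchanged; only the number of vectors entering the pairing step grows.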
He, in an analogous art, teaches the remaining limitations. He generally relates to neural factorization machines for sparse predictive analytics (title), and is analogous art for at least the reason of being in the field of machine learning. 
He teaches “wherein a number of the plurality of first latent vector representations is determined based on a density of the first input” and “wherein a number of the plurality of second latent vector representations is determined based on a density of the second input” [FIG. 2 and accompanying description in § 3.1, paragraph 2 (“Embedding Layer”): “we obtain a set of embedding vectors Vx …. to represent the input feature vector x. Owing to sparse representation of x, we only need to include the embedding vectors for non-zero features.” As shown in FIG. 2, an embedding vector is created for each non-zero feature of the “input feature vector.” Since the density of an input is the number of non-zero features, and an embedding vector is generated for each non-zero feature, He teaches that the number of embedding vectors is determined based on the density of the input vector. In Qu, each field is an input vector. Therefore, the teachings of He are applicable to both the “first input” and the “second input.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Qu and Qiu with the teachings of He by modifying the method of Qu (as modified by Qiu) to include the features that a number of the plurality of first latent vector representations is determined based on a density of the first input and a number of the plurality of second latent vector representations is determined based on a density of the second input. The motivation for doing so would have been to obtain, for the first input and the second input, a set of latent vector representations that adequately represents the non-zero features of the input, as suggested by He (see parts quoted above, which teach including embedding vectors only for non-zero features).
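For illustration only, the relationship relied upon from He (one embedding vector per non-zero feature, so that the number of vectors tracks the density of the input) can be sketched in Python as follows. The sketch only looks up and counts vectors; the function name, the embedding table, and the example inputs are hypothetical and are not taken from He.

```python
def embed_nonzero(x, embedding_table):
    """Return one embedding vector for each non-zero feature of x."""
    return [embedding_table[i] for i, x_i in enumerate(x) if x_i != 0]


# Hypothetical embedding table: one 2-dimensional vector per feature index.
embedding_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

sparse_x = [0, 1, 0, 0]  # one non-zero feature
denser_x = [1, 1, 0, 1]  # three non-zero features

sparse_vectors = embed_nonzero(sparse_x, embedding_table)
denser_vectors = embed_nonzero(denser_x, embedding_table)
```

Under this rule, two inputs of the same length produce different numbers of embedding vectors whenever their counts of non-zero features differ.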

As to claim 2, the combination of Qu, Qiu, and He teaches the method of claim 1, wherein the pairwise interactions are modeled by dot product operations or cosine similarity operations. [Qu, § III.B, paragraph 1, teaching that pairwise interaction may be the vector inner product $g(f_i, f_j) = \langle f_i, f_j \rangle$. The vector inner product reads on the limitation of “dot product.”]
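For illustration only, the two alternatives recited in claim 2 can be sketched in Python as follows. Only the inner (dot) product is relied upon from Qu; cosine similarity is shown solely for comparison with the claim language, and the function names and vector values are hypothetical.

```python
import math


def dot(u, v):
    """Dot (inner) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product normalized by the Euclidean norms of both vectors."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(u, v) / (norm_u * norm_v)


# Hypothetical embedding vectors f_i and f_j.
f_i = [3.0, 4.0]
f_j = [4.0, 3.0]

dot_interaction = dot(f_i, f_j)
cosine_interaction = cosine_similarity(f_i, f_j)
```

The two measures differ only by the normalization step, so the cosine value is bounded between -1 and 1 while the dot product is not.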

As to claim 3, the combination of Qu, Qiu, and He teaches the method of claim 1, wherein results of the modeled pairwise interactions are combined by being concatenated together. [As stated in Qu, eq. (7) on page 1151, left column (which states that: $p = \{p_{i,j}\},\ i = 1 \ldots N,\ j = 1 \ldots N$), the different pairwise interactions $p_{i,j}$ are combined to generate the output $p$. Since $p$ is a vector (see FIG. 1, eq. (5), and the last paragraph on page 1150) that includes the pairwise interactions, the pairwise interactions are concatenated together as the final $p$.]

As to claim 4, the combination of Qu, Qiu, and He teaches the method of claim 1, wherein the first and second latent vector representations are generated using linear transformation or embedding operations. [Qu, page 1151, left column, paragraph 2, teaching an embedding operation to generate embedding vector $f_i$.]

As to claim 5, the combination of Qu, Qiu, and He teaches the method of claim 1, wherein the plurality of second latent vector representations of the second input are generated based on different embeddings associated with the second input. [Qiu, § 1, paragraph 2, teaching different “sense embedding” for a word.]

As to claim 6, the combination of Qu, Qiu, and He teaches the method of claim 5, wherein the different embeddings represent different related meanings of the second input. [Qiu, § 1, paragraph 2, teaching that the different embeddings represent different meanings (senses) of a word.]

As to claim 7, the combination of Qu, Qiu, and He teaches the method of claim 1, wherein the first input and second input are both dense feature vectors or both sparse feature vectors. [Qu, § IV.E (p. 1153), paragraph 2, teaches that its inputs are sparse feature vectors: “The embedding layer is to convert sparse binary inputs to dense real-value vectors.” Qu, § III, paragraph 2, also teaches: “the information is represented as a multi-field categorical feature vector, where each field (e.g. City) is one-hot encoded as discussed in Section I. Such a field-wise one-hot encoding representation results in curse of dimensionality and enormous sparsity.” § I, paragraph 2, illustrates examples of three sparse feature vectors]. 
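For illustration only, the field-wise one-hot encoding that Qu describes as producing sparse inputs can be sketched in Python as follows. The field values “Tuesday,” “male,” and “London” are the examples noted from Qu; the vocabularies, the function name, and the variable names are hypothetical.

```python
def one_hot(value, vocabulary):
    """Encode value as a binary vector with a single 1 at its vocabulary index."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if item == value else 0 for item in vocabulary]


# Hypothetical vocabularies for three categorical fields.
weekday_vocab = [
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
]
gender_vocab = ["male", "female"]
city_vocab = ["London", "Paris", "Tokyo"]

# Each field is encoded separately, then the fields are joined end to end.
fields = [
    one_hot("Tuesday", weekday_vocab),
    one_hot("male", gender_vocab),
    one_hot("London", city_vocab),
]
feature_vector = [bit for field in fields for bit in field]
nonzero_count = sum(feature_vector)
```

Each field contributes exactly one non-zero entry regardless of its vocabulary size, which is why larger vocabularies make the overall feature vector sparser.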

As to claim 15, this claim is directed to one or more computer-readable non-transitory storage media embodying software that is operable when executed to perform operations that are the same or substantially the same as those recited in claim 1. Therefore, the rejection of claim 1 is applied to claim 15. 
In addition, Qu teaches “one or more computer-readable non-transitory storage media embodying software” because Qu’s method is performed by a computing device, which is understood to be controlled by software stored in computer-readable non-transitory storage media, such as the computer’s memory.
Furthermore, with respect to the feature of “the module being a module of nodes,” this feature is taught by Qu, FIG. 1, for example, which shows nodes in the product layer. 

As to claim 18, this claim is directed to a system comprising “one or more processors” and “one or more computer-readable non-transitory storage media coupled to one or more of the processors and comprising instructions operable when executed by one or more of the processors to cause the system to” perform operations that are the same or substantially the same as those recited in claim 1. Therefore, the rejection made to claim 1 is applied to claim 18.
Additionally, Qu teaches “one or more processors” and “one or more computer-readable non-transitory storage media coupled to one or more of the processors and comprising instructions operable when executed by one or more of the processors to cause the system to” because it is understood that Qu’s computing device comprises such features.
 Furthermore, with respect to the feature of “the module being a module of nodes,” this feature is taught by Qu, FIG. 1, for example, which shows nodes in the product layer. 

2.	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Qu in view of Qiu and He and further in view of Wang et al., “Deep & Cross Network for Ad Click Predictions,” arXiv:1708.05123v1 [cs.LG] 17 Aug 2017 (hereinafter “Wang”).
As to claim 8, the combination of Qu, Qiu and He teaches the method of claim 1, wherein the second input is a sparse feature vector [Qu, § IV.E (p. 1153), paragraph 2, teaches that the inputs are sparse feature vectors: “The embedding layer is to convert sparse binary inputs to dense real-value vectors.” Qu, § III, paragraph 2, also teaches: “the information is represented as a multi-field categorical feature vector, where each field (e.g. City) is one-hot encoded as discussed in Section I. Such a field-wise one-hot encoding representation results in curse of dimensionality and enormous sparsity.” § I, paragraph 2, illustrates examples of three sparse feature vectors].
Qu does not teach that “the first input is a dense feature vector” and “the dense feature vector is associated with a greater number of latent vector representations than the sparse feature vector.”
Wang, in an analogous art, teaches “the first input is a dense feature vector.” Wang generally relates to techniques for deep learning (abstract) for web-scale data (§ 1, paragraph). Therefore, Wang is analogous art at least on the basis of being in the field of machine learning. 
In particular, Wang teaches “the first input is a dense feature vector” [§ 1.2: “Web-scale automatic feature learning with both sparse and dense inputs”; § 2.1: “We consider input data with sparse and dense features.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Qu, Qiu and He with the teachings of Wang by modifying the combination of Qu, Qiu and He such that the first input is a dense feature vector, in order to apply the model of Qu, Qiu and He to a dense input. Such a modification would also have been a simple substitution of one known element for another to obtain predictable results because the model of Qu, Qiu and He is applicable to dense vectors.
With respect to “the dense feature vector is associated with a greater number of latent vector representations than the sparse feature vector,” He (see portions cited in the rejection of claim 1) teaches that the number of embeddings depends on the number of non-zero features of the input vector. Therefore, He teaches that the denser the feature vector (i.e., the more non-zero features), the greater the number of latent vector representations. Therefore, given, for example, two vectors of the same size (or similar sizes but with a comparatively large difference in density), the denser vector would have a greater number of latent vector representations. Furthermore, one of ordinary skill in the art would have recognized that the size of the vector is a result-effective variable, since the size relates to the amount of possible information that the vector conveys. For example, Qu, § I, paragraph 2, teaches that the size of the vector depends on the number of possible values. Therefore, in the combined teachings of the references set forth above, the instant limitation would have been obvious as a result of the discovery of an optimum or workable range for vector sizes, under the principle that “where the general conditions of a claim are disclosed in the prior art, it is not inventive to discover the optimum or workable ranges by routine experimentation.” MPEP § 2144.05(II)(A) (citing In re Aller, 220 F.2d 454, 456, 105 USPQ 233, 235 (C.C.P.A. 1955)). 
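For illustration only, the relationship relied upon above (one latent vector representation per non-zero feature, so that a denser vector of the same size yields more latent vector representations) can be sketched as follows. The function name and the example vectors are hypothetical and are not taken from He or any other cited reference; the sketch assumes exactly one embedding per non-zero entry.

```python
def count_latent_vectors(feature_vector):
    """Count latent vectors, assuming one embedding per non-zero feature.

    Hypothetical helper for illustration; not taken from any cited reference.
    """
    return sum(1 for value in feature_vector if value != 0)


# Two example vectors of the same size but different density (made-up values).
sparse_vector = [0, 0, 1, 0, 0, 0, 0, 0]
dense_vector = [0.5, 1.2, 0, 0.7, 0.3, 0, 0.9, 0.1]

sparse_count = count_latent_vectors(sparse_vector)
dense_count = count_latent_vectors(dense_vector)
```

Under these assumptions, the denser vector is associated with the greater number of latent vector representations.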

3.	Claims 9 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Qu in view of Qiu and He and further in view of Liu et al., “Collaborative Prediction for Multi-entity Interaction with Hierarchical Representation.” CIKM’15, October 19–23, 2015, Melbourne, Australia (hereinafter “Liu”).
As to claim 9, the combination of Qu, Qiu and He teaches the method of claim 1, but does not teach that the neural network has the further limitations recited in the instant claim. 
Liu, in an analogous art, suggests “a plurality of said modules, including a first module and a second module, and the intermediate output of the first module being fed forward to the first input of the second module.” Liu generally relates to a hierarchical interaction representation model comprising hidden layers (abstract), suitable for use in web-related applications such as “recommender systems, information retrieval and social network analysis” (§ 1, paragraph 1). Liu is analogous art at least on the basis of being in the field of machine learning.
In particular, Liu teaches a model [FIG. 2: illustrating a model comprising hidden layers] having a plurality of modules, including a first module and a second module, [as shown in FIG. 2; for example, layer 1 and layer 2 constitute a module, while a subsequent set of layers represented by layer m-1 and layer m constitute another module] and the intermediate output of the first module being fed forward to the first input of the second module [FIG. 2: The intermediate output $r_{k_1 k_2}^{(2)}$ of the first module is fed to the first input of the second module (the input $r_{k_1 k_2, \ldots, k_{m-1}}^{(m-1)}$) for a second set of layers. In general, each subsequent module receives the input (generally represented as $r_{k_1 k_2, \ldots, k_{m-1}}^{(m-1)}$) from the previous module, and outputs $r_{k_1 k_2, \ldots, k_m}^{(m-1)}$ to the next module.]. That is, as shown in FIG. 2 and described in § 3.3, Liu teaches duplicating a particular module (e.g., layers 1-2) to form a model that includes a series of similar modules. The purpose of the model is to account for interactions between multiple entities, wherein the interaction of two entities is further interacted with another entity in an iterative manner by progressing through the series of modules.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Qu, Qiu and He and the teachings of Liu by duplicating the module of Qu, Qiu and He as “a plurality of modules, including a first module and a second module” configured such that “the intermediate output of the first module being fed forward to the first input of the second module,” in the manner taught by Qiu, in order to achieve improvements in predictive performance by modeling multi-entity interaction (Liu, § 6) in a manner that reveals the joint characteristics of entities (see Liu, § 6 and abstract).
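For illustration only, the feed-forward arrangement discussed above, in which the intermediate output of one module becomes the first input of the next module while each module receives its own second input, can be sketched as follows. The element-wise product used inside `module` is a hypothetical stand-in for the interaction and is not the specific function disclosed in Liu or Qu; the names `module` and `chain_modules` are likewise assumptions.

```python
def module(first_input, second_input):
    """Combine two inputs into an intermediate output.

    The element-wise product is a hypothetical placeholder interaction.
    """
    return [a * b for a, b in zip(first_input, second_input)]


def chain_modules(entities):
    """Feed each module's intermediate output to the next module's first input."""
    intermediate = module(entities[0], entities[1])  # first module
    for entity in entities[2:]:                      # each subsequent module
        intermediate = module(intermediate, entity)
    return intermediate


# Three made-up entity vectors, so two chained modules are applied.
result = chain_modules([[1, 2], [3, 4], [5, 6]])
```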

As to claim 11, the combination of Qu, Qiu, He, and Liu teaches the method of claim 9, wherein the second input of the second module receives an input different than the second input of the first module [Liu, § 3.1, generally teaches different entities (corresponding to inputs). For example, in FIG. 2 of Liu, the second input $e_{k_1 k_2}^{(m)}$ of the second module (layers m and m-1) is different from the second input $e_{k_2}^{(2)}$ of the first module (layers 1-2)].

4.	Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Qu in view of Qiu, He and Liu and further in view of He et al., “Neural Collaborative Filtering.” arXiv:1708.05031v2 [cs.IR] 26 August 2017 (hereinafter referred to as “He 2017b” to distinguish from the other He et al. reference).
As to claim 10, the combination of Qu, Qiu, He and Liu teaches the method of claim 9, and teaches “a plurality of different second latent vector representations,” as set forth in the rejection of claim 1, above. However, the combination of references does not teach the further limitations of the instant claim.
He 2017b, in an analogous art, suggests “wherein the second input of the second module is coupled to receive the same second input of the first module, the first module and second module generating a different…second latent vector representations for their respective second input.” He 2017b generally relates to collaborative filtering using deep neural networks (abstract) with applicability to online services, including E-commerce, online news and social media sites (§ 1, paragraph 1). He 2017b is analogous art at least on the basis of being in the field of machine learning.
In particular, He 2017b, FIG. 3 and § 3.4, teaches a model comprising a first module [General Matrix Factorization (GMF) component, including “GMF layer” in the left portion of FIG. 3 and associated inputs] and a second module [Multi-Layer Perceptron (MLP) component, including MLP layers 1 through x shown in the right portion of FIG. 3, along with associated inputs] wherein the second input of the second module is coupled to receive the same second input of the first module [FIG. 3: both the GMF component and the MLP component receive the same input, the “item (i)” vector], the first module and second module generating different second latent vector representations for their respective second input [FIG. 3: the “MF Item Vector” and the “MLP Item Vector” respectively generated by the GMF component and the MLP component. Note that these features are latent vectors, as described in § 2.2, paragraph 1 (describing a “latent vector for…item i”) and § 3.3, paragraph 1 (describing “item latent features”).]. While He 2017b does not specifically teach “a plurality of different second latent vector representations,” it is noted that the combination of Qu, Qiu, He, and Liu teaches this feature, and that He 2017b’s respective latent vector representations are each analogous to the plurality of second latent vector representations taught by Qiu.  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the model of the combination of Qu, Qiu, He, and Liu based on the teachings of He 2017b to the result that “the second input of the second module is coupled to receive the same second input of the first module, the first module and second module generating a different, plurality of different second latent vector representations for their respective second input.” One of ordinary skill would have been motivated to do so because allowing two modules of differing characteristics to process a common input would result in improved performance, as suggested by He 2017b (see § 4.2, paragraph 2).
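For illustration only, the arrangement discussed above, in which two modules receive the same item input but each generates its own latent vector for that input, can be sketched with two separate lookup tables. The table contents, dimensions, and names below are hypothetical values chosen for the example and are not taken from He 2017b.

```python
# Each module keeps its own embedding table (made-up values and dimensions).
gmf_item_table = {0: [0.1, 0.2], 1: [0.3, 0.4]}
mlp_item_table = {0: [0.9, 0.8, 0.7], 1: [0.6, 0.5, 0.4]}


def embed(table, item_id):
    """Look up the latent vector that a given module assigns to an item."""
    return table[item_id]


item_id = 1  # the same input is supplied to both modules
mf_item_vector = embed(gmf_item_table, item_id)
mlp_item_vector = embed(mlp_item_table, item_id)
```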

5.	Claims 12-13, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Qu in view of Qiu and He and further in view of Liu and Guo et al., “DeepFM: A Factorization-Machine based Neural Network for CTR Prediction.” arXiv:1703.04247v1 [cs.IR] 13 Mar 2017 (hereinafter “Guo”).  
As to claim 12, the combination of Qu, Qiu, and He teaches the method of claim 1, wherein the module is a first module, [as set forth in the rejection of claim 1, above] but does not teach that the neural network further comprises a second module.
Liu, in an analogous art, suggests “a second module different than the first module…wherein the intermediate output of the first module is fed forward to one of the third, fourth or fifth input of the second module.” Liu generally relates to a hierarchical interaction representation model comprising hidden layers (abstract), suitable for use in web-related applications such as “recommender systems, information retrieval and social network analysis” (§ 1, paragraph 1). Liu is analogous art at least on the basis of being in the field of machine learning.
In particular, Liu teaches a model [FIG. 2: illustrating a model comprising hidden layers, such model corresponding to the “neural network” of the instant claim] having a first module [as shown in FIG. 2; for example, layers 1-2 constitute a module] and further comprising a second module different than the first module, [FIG. 2: and layer m-1 and m constitute a second module, which is different than the first module at least with respect to the inputs] wherein the intermediate output of the first module is fed forward to one of the [inputs] of the second module [FIG. 2: The intermediate output $r_{k_1 k_2}^{(2)}$ of the first module is fed to the first input of the second module (the input $r_{k_1 k_2, \ldots, k_{m-1}}^{(m-1)}$).]. That is, as shown in FIG. 2 and described in § 3.3, Liu teaches duplicating a particular module (e.g., layers 1-2) to form a model that includes a series of similar modules. The purpose of the model is to account for interactions between multiple entities, wherein the interaction of two entities is further interacted with another entity in an iterative manner by progressing through the series of modules. 
It is noted that Liu’s teaching of the intermediate output being fed forward to one of the inputs of the second module is analogous to feeding such an input to “one of the third, fourth or fifth input of the second module,” when the second module has three inputs, as set forth below.
Duplication of the “first module” of the combination of Qu, Qiu, and He to create a “second module,” in the manner taught by Liu, would satisfy the features of the second module having a third input, a fourth input, and a fifth input [As shown in FIG. 1 of Qu, there can be any number of inputs (fields) per module; FIG. 1 shows an example with 3 fields] and the method further comprising:
by the computing device, the second module generating one third latent vector representation of the third input, one fourth latent vector representation of the fourth input, and one fifth latent vector representation of the fifth input; [Qu, FIG. 1 and page 1151, left column, paragraph 2, teaching respective embedding vectors $f_i$ for the fields. It is noted that while the claim recites “one third latent vector representation,” “one fourth latent vector representation,” and “one fifth latent vector representation,” the claim is open-ended and permits additional latent vector representations to be generated in the manner of Qiu. Therefore, the “first module” that is being duplicated may be the “first module” as disclosed in Qu with or without the modification by Qiu.]
by the computing device, the second module modeling second pairwise interactions between unique pairwise combination of the third latent vector representation, the fourth latent vector representation, and the fifth latent vector representation; [Interactions in the “product layer” of FIG. 1. In detail, Qu, page 1151, left column, paragraph 1 teaches that “$p_{i,j} = g(f_i, f_j)$ defines the pairwise interaction.” For example, § III.B, paragraph 1, teaches the pairwise interaction may be the vector inner product (i.e., dot product) $g(f_i, f_j) = \langle f_i, f_j \rangle$.]
by the computing device, the second module producing a second intermediate output by combining results of the second modeled pairwise interactions [As shown in Qu, eq. (7) on page 1151, left column ($p = \{p_{i,j}\},\ i = 1 \ldots N,\ j = 1 \ldots N$), the different pairwise interactions $p_{i,j}$ are combined to generate the output p (i.e., an “intermediate output”).]. 
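For illustration only, the pairwise interaction $p_{i,j} = \langle f_i, f_j \rangle$ and the collection of those interactions into an output p, as discussed in the limitations above, can be sketched as follows. The sketch enumerates each unique pair once (following the claim language “unique pairwise combination”), which is an assumption about indexing; the function names and example embedding values are hypothetical and are not taken from Qu.

```python
from itertools import combinations


def inner_product(f_i, f_j):
    """Pairwise interaction g(f_i, f_j) taken as the vector inner product."""
    return sum(a * b for a, b in zip(f_i, f_j))


def product_layer(embeddings):
    """Collect p_{i,j} for every unique pair (i < j) of field embeddings."""
    return [
        inner_product(embeddings[i], embeddings[j])
        for i, j in combinations(range(len(embeddings)), 2)
    ]


# Made-up embedding vectors for three fields (e.g., third, fourth, fifth inputs).
field_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
p = product_layer(field_embeddings)
```

With three fields there are three unique pairs, so p holds three interaction values in this sketch.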
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Qu, Qiu, and He and the teachings of Liu by: (1) duplicating the “first module” to obtain a “second module” that is “different than the first module” and has the above-discussed limitations of the “second module” of the instant claim; and by (2) modifying the neural network to further include said second module such that “the intermediate output of the first module is fed forward to one of the third, fourth or fifth input of the second module,” in the manner shown in Qiu, in order to achieve improvements in predictive performance by modeling multi-entity interaction (Liu, § 6) in a manner that reveals the joint characteristics of entities (see Liu, § 6 and abstract). 
The combination of Qu, Qiu, He, and Liu does not teach that the second intermediate output is obtained by combining results of the second modeled pairwise interactions “with the third input.”
Guo, in an analogous art, teaches combining results of the second modeled pairwise interactions “with the third input.” Guo generally relates to implementing factorization-machines based on deep neural network (see title and abstract), with applicability to recommender systems (§ 1, paragraph 1). Therefore, Guo is analogous art for at least the reason of being in the field of machine learning.
In particular, Guo teaches combining results of the second modeled pairwise interactions with the third input [FIGS. 1-2. Initially, it is noted that the “FM layer” models pairwise interactions between dense embeddings (i.e., latent vectors $V_i$, as described in § 1, paragraph 1) corresponding to various inputs (fields i, j, and m) via an inner product operation. Then, in the FM layer, the pairwise interactions are combined with the original fields (any of which may correspond to the “third input” of the instant claim). This combination is formulated in equation 2 in § 2.1, which shows that the inner products $\langle V_i, V_j \rangle$ (i.e., pairwise interactions of the latent vectors) are combined with the input vector x according to the operations associated with the terms $\langle w, x \rangle$ and also $\langle V_i, V_j \rangle \, x_{j_1} \cdot x_{j_2}$. In particular, the term
                            
                                
                                    w
                                    ,
                                    x
                                
                            
                        
                     “reflects the importance of order-1 features,” as described in the text below equation 2.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Qu, Qiu, He, and Liu based on the teachings of Guo by combining results of the second modeled pairwise interactions with the third input, as taught by Guo, in order to model the effect of linear (order-1) interactions on the intermediate output, as suggested by Guo, § 2.1, paragraphs 2-3.
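
For illustration only, the combination of order-1 and pairwise terms attributed above to equation 2 of Guo may be sketched as follows. This is a minimal sketch and not a reproduction of Guo's implementation. It assumes the standard factorization-machine form in which the pairwise term is summed over all index pairs j1 < j2; the function names (dot, fm_output) and the toy values of w, x, and V are hypothetical and do not come from any cited reference.

```python
from itertools import combinations


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def fm_output(w, x, V):
    """Order-1 term <w, x> plus the pairwise (order-2) term.

    Assumption: the pairwise term is the sum, over all index pairs
    j1 < j2, of <V[j1], V[j2]> * x[j1] * x[j2].
    """
    order1 = dot(w, x)  # importance of order-1 features
    order2 = sum(
        dot(V[j1], V[j2]) * x[j1] * x[j2]
        for j1, j2 in combinations(range(len(x)), 2)
    )
    return order1 + order2


# Hypothetical toy values: three input features, two-dimensional latent vectors.
x = [1.0, 0.0, 1.0]
w = [0.5, -0.2, 0.1]
V = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]

y = fm_output(w, x, V)
```

In this sketch the input vector x enters both terms, so the pairwise interactions of the latent vectors are combined with the input rather than computed in isolation.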

As to claim 13, the combination of Qu, Qiu, He, Liu, and Guo teaches the method of claim 12, wherein:
the third input [Since the third, fourth, and fifth inputs are interchangeable in the combination of Qu, Qiu, He, Liu, and Guo, a particular input that receives the intermediate output of the previous module may be regarded as the “third input.”] is a dense feature vector [The vector p in Qu (i.e., the “intermediate output,” which can be used as the third input) is composed of the individual values of the dot product between embedding vectors. Since Qu teaches that the embeddings may be dense vectors (see Qu, § IV.E (p. 1153), paragraph 2), the resulting vector p would also be a dense feature vector. Therefore, the instant limitation naturally flows from the combined teachings of the references.] and the fourth and fifth inputs are both sparse feature vectors; [In general, the inputs in Qu are sparse feature vectors. Qu, § IV.E (p. 1153), paragraph 2: “The embedding layer is to convert sparse binary inputs to dense real-value vectors.” Qu, § III, paragraph 2, also teaches: “the information is represented as a multi-field categorical feature vector, where each field (e.g. City) is one-hot encoded as discussed in Section I. Such a field-wise one-hot encoding representation results in curse of dimensionality and enormous sparsity.” § I, paragraph 2, illustrates examples of three sparse feature vectors. Therefore, the instant limitation of the fourth and fifth inputs (which are not predetermined, unlike the third input) being sparse is taught by the combination of references.]
the neural network has a plurality of said second modules arranged in a feedforward arrangement; [Liu, FIG. 2, wherein the ellipsis between the bottom two boxes indicates that there may be any arbitrary number of modules.] and
the second intermediate output of one second module feeds into the third input of the next second module in the feedforward arrangement. [Since the third, fourth, and fifth inputs are interchangeable in the combination of Qu, Qiu, He, Liu, and Guo, the particular input that receives the intermediate output of the previous module may be regarded as the “third input.”]
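
For illustration only, the relationship between sparse one-hot inputs, dense embeddings, and a dense vector of pairwise dot products (as characterized above with respect to Qu) may be sketched as follows. This is a minimal sketch under stated assumptions and not Qu's implementation: the field names, embedding tables, and the helper names (embed, pairwise_products) are hypothetical.

```python
from itertools import combinations


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def embed(one_hot, table):
    """Convert a sparse one-hot input to a dense vector.

    Multiplying a one-hot vector by an embedding table selects the row
    of the table at the position of the single nonzero entry.
    """
    dim = len(table[0])
    return [
        sum(one_hot[i] * table[i][k] for i in range(len(one_hot)))
        for k in range(dim)
    ]


def pairwise_products(embeddings):
    """Vector p: one dot product per pair of field embeddings."""
    return [dot(a, b) for a, b in combinations(embeddings, 2)]


# Hypothetical one-hot encoded fields (sparse binary inputs).
city = [0, 1, 0]
day = [1, 0]
gender = [0, 1]

# Hypothetical embedding tables; in practice these would be learned.
city_table = [[0.1, 0.2], [0.5, -1.0], [0.3, 0.3]]
day_table = [[2.0, 0.5], [0.0, 4.0]]
gender_table = [[1.0, 1.0], [-1.0, 3.0]]

dense = [
    embed(city, city_table),
    embed(day, day_table),
    embed(gender, gender_table),
]
p = pairwise_products(dense)
```

In this sketch the inputs are mostly zeros, while p is an ordinary real-valued vector that could in turn be supplied as an input to a subsequent module.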

As to claim 16, the combination of Qu, Qiu, and He teaches the media of parent claim 15, as set forth above. The additional limitations recited in claim 16 are the same or substantially the same as those recited in claim 12. Therefore, the rejection of claim 12 is applied to claim 16.

As to claim 19, the combination of Qu, Qiu, and He teaches the system of parent claim 18, as set forth above. The additional limitations recited in claim 19 are the same or substantially the same as those recited in claim 12. Therefore, the rejection made to claim 12 is applied to claim 19.

6.	Claims 14, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Qu in view of Qiu and He and further in view of Li et al. (US 2015/0019640A1) (hereinafter “Li”).
As to claim 14, the combination of Qu, Qiu, and He teaches the method of claim 1, wherein:
the plurality of first latent vector representations correspond to different related meanings of the first input; [Qiu, § 1, paragraph 2, teaching that the different embeddings represent different meanings (senses) of a word.]
the plurality of second latent vector representations correspond to different related meanings of the second input; [Qiu, § 1, paragraph 2, teaching that the different embeddings represent different meanings (senses) of a word. Note that this teaching of Qiu has been applied to both the first and second latent vector representations, as set forth in the rejection of claim 1.];
execution of the neural network is used to provide a ranking of content determined to be of interest…based on pairwise interactions of related meanings of the first input and second input [Qu, § 1: “This predicted probability indicates the user’s interest on the specific item such as a news article, a commercial item or an advertising post, which influences the subsequent decision making such as document ranking.” Note that the “predicted probability” is based on the overall output of the model described in Qu; thus, the “document ranking” is based on pairwise interactions of related meanings of the first input and second input.]
However, the combination of the references does not disclose the details that the neural network is “executable on a computer network having a network client” and that the ranking is of “content determined to be of interest to the network client.”
Li, in an analogous art, teaches the limitations of “executable on a computer network having a network client” and a ranking of “content determined to be of interest to the network client.” Li generally relates to page recommendations on online social networks (title), and is analogous art for at least the reason of being in the field of social networking systems or being pertinent to the problems thereof. 
In particular, Li teaches a matrix factorization model executable on a computer network [FIG. 1: social-networking system 160 comprising a plurality of servers 162, as described in [0015]] having a network client [FIG. 1: client system 130], and teaches that execution of the model is used to provide a ranking of content determined to be of interest to the network client [[0035]: “Based on the calculated ratings functions, social-networking system 160 may then rank the concepts with respect to each user based on the scores and store the ranking (e.g., a list of 50 top-ranked concept-profile pages) for the each user.” Since interactions with the “user” are performed via client system 130 of FIG. 1 (see [0013], [0030]), Li teaches ranking of content determined to be of interest to the network client.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Qu, Qiu, and He with the teachings of Li by configuring the neural network to be executable on a computer network having a network client such that execution of the neural network is used to provide a ranking of content determined to be of interest to the network client based on the pairwise interactions of related meanings of the first input and second input, in order to provide services, such as recommended content, to users of a social networking system, as suggested by Li ([0004] and [0035]).  

As to claim 17, the combination of Qu, Qiu, and He teaches the media of claim 15, wherein:
the second input is a sparse feature vector; [Qu, § IV.E (p. 1153), paragraph 2, teaches that its inputs are sparse feature vectors: “The embedding layer is to convert sparse binary inputs to dense real-value vectors.” Qu, § III, paragraph 2, also teaches: “the information is represented as a multi-field categorical feature vector, where each field (e.g. City) is one-hot encoded as discussed in Section I. Such a field-wise one-hot encoding representation results in curse of dimensionality and enormous sparsity.” § I, paragraph 2, illustrates examples of three sparse feature vectors]
the plurality of different second latent vector representations of the second input are based on different embeddings associated with the second input, each embedding being trained to represent a different related meaning to the second input; [Qiu, § 1, paragraph 2, teaching that the different embeddings represent different meanings (senses) of a word. Qu, § IV.E (page 1153), paragraph 3, teaches that embeddings (latent vectors) are learned (i.e., trained). Therefore, the combination of Qu, Qiu, and He teaches the instant limitation.];
the plurality of different first latent vector representations correspond to different related meanings of the first input; [Qiu, § 1, paragraph 2, teaching that the different embeddings represent different meanings (senses) of a word. Note that this teaching of Qiu has been applied to both the first and second latent vector representations, as set forth in the rejection of claim 1.]; and
execution of the neural network is used to provide a ranking of content determined to be of interest…based on pairwise interactions of related meanings of the first input and second input [Qu, § 1: “This predicted probability indicates the user’s interest on the specific item such as a news article, a commercial item or an advertising post, which influences the subsequent decision making such as document ranking.” Note that the “predicted probability” is based on the overall output of the model described in Qu; thus, the “document ranking” is based on pairwise interactions of related meanings of the first input and second input.]
However, the combination of the references does not disclose the details that the neural network is “executable on a computer network having a network client” and that the ranking is of “content determined to be of interest to the network client.”
Li, in an analogous art, teaches the limitations of “executable on a computer network having a network client” and a ranking of “content determined to be of interest to the network client.” Li generally relates to page recommendations on online social networks (title), and is analogous art for at least the reason of being in the field of social networking systems or being pertinent to the problems thereof. 
In particular, Li teaches a matrix factorization model executable on a computer network [FIG. 1: social-networking system 160 comprising a plurality of servers 162, as described in [0015]] having a network client [FIG. 1: client system 130], and teaches that execution of the model is used to provide a ranking of content determined to be of interest to the network client [[0035]: “Based on the calculated ratings functions, social-networking system 160 may then rank the concepts with respect to each user based on the scores and store the ranking (e.g., a list of 50 top-ranked concept-profile pages) for the each user.” Since interactions with the “user” are performed via client system 130 of FIG. 1 (see [0013], [0030]), Li teaches ranking of content determined to be of interest to the network client.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Qu, Qiu, and He with the teachings of Li by configuring the neural network to be executable on a computer network having a network client such that execution of the neural network is used to provide a ranking of content determined to be of interest to the network client based on the pairwise interactions of related meanings of the first input and second input, in order to provide services, such as recommended content, to users of a social networking system, as suggested by Li ([0004] and [0035]).  

As to claim 20, the combination of Qu, Qiu, and He teaches the system of parent claim 18, as set forth above. The additional limitations recited in claim 20 are the same or substantially the same as those recited in claim 17. Therefore, the rejection made to claim 17 is applied to claim 20.  

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US9514404B1 teaches a plurality of embedding functions, wherein each of the embedding functions operates independently of each other embedding function.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764.  The examiner can normally be reached on Monday - Friday 8:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Y.D.H./Examiner, Art Unit 2124                                                                                                                                                                                                        



/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124