DETAILED ACTION
1.	This communication is in response to the amendment filed on April 12, 2022 for Application No. 15/943,445 in which claims 1, 3-5, and 16-21 are presented for examination.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
3.	The amendments filed April 12, 2022 have been considered. Claims 1 and 4 are amended, claims 2 and 6-15 have been cancelled, and claims 16-21 are newly added. Claims 1, 3-5, and 16-21 are pending and presented for examination.

4.	Applicant’s amendment to the specification, filed April 12, 2022, regarding the mis-numbered elements in Par. [0032-0034] has been fully considered. The objections to the specification have been withdrawn.

5.	Applicant’s arguments filed with regard to the 35 U.S.C. 101 rejection of Claims 1-15 have been fully considered but they are not persuasive. Further, as mentioned by the applicant on Page 10 of the applicant’s arguments, the first action of record was the First Action Interview Pilot Program Pre-Interview Communication, which was abbreviated and did not require an in-depth prong-by-prong analysis or analysis of each additional element. However, the examiner has now taken into consideration the amendments submitted by the applicant, and the complete 35 U.S.C. 101 analysis and rejection of amended claims 1, 3-5, and 16-21 is presented in the subsequent section below.
	Applicant further states that the claims are not directed to a judicial exception and that any alleged exception is integrated into a practical application. Examiner respectfully disagrees, as Claims 1, 3-5, and 16-21 are all directed towards both mental processes and mathematical concepts. That is, other than the recitation of a processor and/or device, which are generic computer components, each limitation may be performed within the human mind or by mathematical calculation. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components, then it falls within the “Mathematical Concepts” grouping of abstract ideas. Similarly, if a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.

6.	Applicant’s arguments with respect to claims 1-15 (now amended to claims 1, 3-5, and 16-21) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. While the Hinton and Rendle references were silent on deriving factor matrices from a transformation matrix, or replacing a transformation matrix with factor matrices, the newly introduced Roy reference (US PG-PUB 20180075353) teaches deriving factor matrices from a transformation matrix and replacing a transformation matrix with factor matrices. The updated claim limitation mapping has been presented in the subsequent 35 U.S.C. 103 section below.

7.	Applicant also filed new claims 16-20, which are directed to computer-readable media and an apparatus. These claims recite similar elements to those in amended claims 1 and 3-5 and hence are rejected under the same rationale. Applicant’s arguments with respect to newly filed Claim 21 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. The claim limitation mapping has been presented in the subsequent 35 U.S.C. 103 section below.

Claim Objections
8.	Claims 1, 16, and 20 are objected to because step (h) has been incorrectly recited twice. The last line in each claim recites “(h) reconstructing, by the processor, the first transformation matrix using the first factor matrix and the second factor matrix” but should instead be corrected to read as “(j) reconstructing, by the processor, the first transformation matrix using the first factor matrix and the second factor matrix” as step (h) already refers to a different “processing” step in the preceding lines of the claim. Appropriate correction is required.

Claim Rejections - 35 USC § 101
9.	35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

10.	Claims 1, 3-5, and 16-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
	Claim 1 recites a method comprising: (a) instantiating, by a processor of a device, a capsule network having a plurality of capsules arranged in one or more layer, wherein each capsule includes a trainable transformation matrix; (b) receiving, by the device, training data for the capsule network and a desired output associated with the training data; (c) receiving, by the processor, a value c for a factor matrix inner dimension; (d) deriving, by the processor, from a first transformation matrix of a first capsule of the plurality of capsules, a first factor matrix and a second factor matrix for the first capsule, the first transformation matrix having dimensions m x j, the derived first factor matrix having dimensions m x c, and the derived second factor matrix having dimensions c x j; (e) processing, by the device using the capsule network with the derived first factor matrix and the derived second factor matrix replacing the first transformation matrix, the training data to generate actual output; (f) comparing, by the processor, the actual output of the capsule network with the desired output associated with the training data; (g) generating, by the processor, a system of equations associated with the first factor matrix and the second factor matrix based, at least in part, on differences determined by the comparison of the actual output with the desired output; and (h) processing, by the processor, the system of equations via a factorization machine to determine updated values for entries in the first factor matrix and the second factor matrix; (i) repeating, by the device, steps (d)-(h) with the updated values for entries in the first factor matrix and the second factor matrix until the actual output of the capsule network corresponds to the desired output associated with the training data; and (j) reconstructing, by the processor, the first transformation matrix using the first factor matrix and the second factor matrix.
	2A Prong 1: The limitation of (d) deriving, by the processor, from a first transformation matrix of a first capsule of the plurality of capsules, a first factor matrix and a second factor matrix for the first capsule, the first transformation matrix having dimensions m x j, the derived first factor matrix having dimensions m x c, and the derived second factor matrix having dimensions c x j, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components. That is, other than reciting “by the processor”, deriving a first factor matrix and a second factor matrix from a first transformation matrix in this context encompasses generic matrix decomposition, in which a matrix is decomposed into a product of matrices. Similarly, (f) comparing, by the processor, the actual output of the capsule network with the desired output associated with the training data, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “by the processor”, comparing the actual output of the capsule network with the desired output associated with the training data may be performed manually by the user. Similarly, (g) generating, by the processor, a system of equations associated with the first factor matrix and the second factor matrix based, at least in part, on differences determined by the comparison of the actual output with the desired output, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components. 
That is, other than reciting “by the processor”, generating a system of equations associated with the first and second factor matrices based on the determined differences in output encompasses mathematical calculation to determine a system of equations based on the differences in output. Similarly, (i) repeating, by the device, steps (d)-(h) with the updated values for entries in the first factor matrix and the second factor matrix until the actual output of the capsule network corresponds to the desired output associated with the training data, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind and by mathematical calculation but for the recitation of generic computer components. That is, other than reciting “by the device”, repeating steps (d)-(h) with the updated values for entries in the first and second factor matrices until the actual output corresponds to the desired output encompasses the series of mathematical concepts and mental processes identified for the corresponding steps above. Similarly, (j) reconstructing, by the processor, the first transformation matrix using the first factor matrix and the second factor matrix, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components. That is, other than reciting “by the processor”, reconstructing the first transformation matrix using the first and second factor matrices encompasses generic matrix multiplication. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components, then it falls within the “Mathematical Concepts” grouping of abstract ideas. 
Similarly, if a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
	2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements – (a) instantiating, by a processor of a device, a capsule network having a plurality of capsules arranged in one or more layers, wherein each capsule includes a trainable transformation matrix. The processor and capsule network are both recited at a high-level of generality (i.e., as a generic processor performing a generic machine learning function of instantiating a generic capsule network) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Further, the claim recites (b) receiving, by the device, training data for the capsule network and a desired output associated with the training data and (c) receiving, by the processor, a value c for a factor matrix inner dimension. Both of these steps relate to receiving data and are recited at a high level of generality, amounting to merely receiving data over a network, which is a form of insignificant extra-solution activity. Further, the claim recites additional elements – (e) processing, by the device using the capsule network with the derived first factor matrix and the derived second factor matrix replacing the first transformation matrix, the training data to generate actual output. The device using the capsule network is recited at a high-level of generality (i.e., as a generic processing device performing a generic machine learning function to process training data and generate output) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Further, the claim recites additional elements – (h) processing, by the processor, the system of equations via a factorization machine to determine updated values for entries in the first factor matrix and the second factor matrix. 
The processor and factorization machine are both recited at a high-level of generality (i.e., as a generic processor paired to a generic factorization machine to process a system of equations and determine updated values for first and second matrices) such that it amounts to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
	2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements within steps (a), (e), and (h) amount to no more than mere instructions to apply the exception using generic computer components and cannot provide an inventive concept. Further, as discussed above with respect to integration of the abstract idea into a practical application, receiving training data and receiving a value as per steps (b) and (c) were considered to be insignificant extra-solution activities in Step 2A Prong 2, and thus are re-evaluated in Step 2B to determine if they are more than well-understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that merely “Receiving or transmitting data over a network” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed steps are well-understood, routine, conventional activity is supported under Berkheimer. The claim is not patent eligible.
	For the reasons above, Claim 1 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claims 3-5. The additional limitations of the dependent claims are addressed below. 
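
	For orientation only, the dimensional relationship recited in steps (d) and (j) of Claim 1 can be sketched as below: an m x c factor matrix multiplied by a c x j factor matrix yields an m x j matrix. The helper names (`matmul`, `shape`) and the numeric values are hypothetical and are not taken from the application or from any cited reference; this is a minimal sketch of generic matrix multiplication, not the applicant's implementation.

```python
def matmul(a, b):
    """Multiply an m x c matrix by a c x j matrix (both given as lists of rows)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    cols = len(b[0])
    return [[sum(row[k] * b[k][col] for k in range(inner)) for col in range(cols)]
            for row in a]


def shape(mat):
    """Return (rows, columns) of a list-of-rows matrix."""
    return (len(mat), len(mat[0]))


# Hypothetical factor matrices with m = 4, c = 3, j = 4.
first_factor = [[1.0, 0.0, 2.0],
                [0.0, 1.0, 0.0],
                [3.0, 0.0, 1.0],
                [0.0, 2.0, 0.0]]
second_factor = [[1.0, 0.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]]

# Step (j) analogue: rebuild an m x j matrix from the two factors.
reconstructed = matmul(first_factor, second_factor)
```

	In this sketch the product has shape (4, 4), matching the m x j dimensions of the matrix the two factors stand in for.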

	Claim 3 recites the method of claim 1, wherein the value of the factor matrix inner dimension is greater than or equal to three (3) and less than or equal to six (6). Dependent claim 3 recites insignificant extra-solution activity for configuring data input; this limitation is just another activity referring to receiving data over a network. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of Claim 1.

	Claim 4 recites the method of claim 1, wherein step (i) further comprises: (i-1) determining, by the processor, that the actual output capsule network is not converging to the desired output associated with the training data; and (i-2) increasing, by the processor, the value c of the factor matrix inner dimension to a second value c' responsive to the determination that the capsule network is not converging; and wherein steps (d)-(h) are repeated with the derived first factor matrix having dimensions m x c', and the derived second factor matrix having dimensions c' x j. Dependent Claim 4 recites the mental process “determining, by the processor, that the actual output capsule network is not converging to the desired output associated with the training data”, such that, other than reciting “by the processor”, the determining may be performed within the mind. Further, dependent claim 4 recites the mental process “increasing, by the processor, the value c of the factor matrix inner dimension to a second value c' responsive to the determination that the capsule network is not converging”, such that, other than reciting “by the processor”, the step of increasing the value may be performed manually by a user after determining the capsule network is not converging. Further, dependent claim 4 recites “wherein steps (d)-(h) are repeated with the derived first factor matrix having dimensions m x c', and the derived second factor matrix having dimensions c' x j”, which refers to the steps performed in Claim 1 and covers performance of the limitation in the mind and by mathematical calculation but for the recitation of generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of Claim 1.
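
	As an illustration only, the determining and increasing steps (i-1) and (i-2) might be sketched as below. The convergence criterion (a loss history, a look-back window, and a tolerance), the increment of 1, and the names `is_converging`, `adjust_inner_dimension`, and `factor_shapes` are all assumptions made for this sketch; this action does not describe how the application decides that the network is not converging.

```python
def is_converging(losses, window=3, tol=1e-3):
    """Hypothetical test: the loss fell by more than `tol` over the last `window` steps."""
    if len(losses) <= window:
        return True  # too little history to conclude otherwise
    return losses[-window - 1] - losses[-1] > tol


def adjust_inner_dimension(c, losses, step=1):
    """Return c unchanged while converging, otherwise a larger value c' = c + step."""
    return c if is_converging(losses) else c + step


def factor_shapes(m, j, c):
    """Shapes of the first (m x c) and second (c x j) factor matrices."""
    return (m, c), (c, j)


# Loss still falling: the inner dimension stays at c = 3.
c_kept = adjust_inner_dimension(3, [1.0, 0.5, 0.25, 0.1, 0.05])
# Loss has stalled: the inner dimension is raised to c' = 4.
c_raised = adjust_inner_dimension(3, [1.0, 0.9, 0.9, 0.9, 0.9])
```

	With c' in hand, `factor_shapes(m, j, c_raised)` gives the m x c' and c' x j dimensions with which steps (d)-(h) would be repeated.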

	Claim 5 recites the method of claim 1, further comprising configuring the factorization machine to utilize stochastic gradient descent as a learning mode. Dependent claim 5 is just another activity referring to configuring the generic factorization machine additional element, such that it amounts to no more than mere instructions to apply the exception using generic computer components. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of Claim 1.
	
	Claim 16 recites substantially the same limitations as Claim 1, in the form of one or more non-transitory machine-readable media comprising program code, including generic computer components, that carry out the limitations recited in claim 1. The claim is likewise directed to an abstract idea (mental processes and mathematical calculations) without significantly more; therefore, it is rejected under the same rationale.

	Claim 17, which is dependent on Claim 16, recites substantially the same limitations as Claim 3, in the form of one or more non-transitory machine-readable media comprising program code, including generic computer components, that carry out the limitations recited in claim 3. The claim is likewise directed to an abstract idea (mental processes and mathematical calculations) without significantly more; therefore, it is rejected under the same rationale.

Claim 18, which is dependent on Claim 16, recites substantially the same limitations as Claim 4, in the form of one or more non-transitory machine-readable media comprising program code, including generic computer components, that carry out the limitations recited in claim 4. The claim is likewise directed to an abstract idea (mental processes and mathematical calculations) without significantly more; therefore, it is rejected under the same rationale.

Claim 19, which is dependent on Claim 16, recites substantially the same limitations as Claim 5, in the form of one or more non-transitory machine-readable media comprising program code, including generic computer components, that carry out the limitations recited in claim 5. The claim is likewise directed to an abstract idea (mental processes and mathematical calculations) without significantly more; therefore, it is rejected under the same rationale.

Claim 20 recites substantially the same limitations as Claim 1, in the form of an apparatus comprising at least one processor and a memory storing instructions to be executed by the processor, that carry out the limitations recited in claim 1. The claim is likewise directed to an abstract idea (mental processes and mathematical calculations) without significantly more; therefore, it is rejected under the same rationale.

Claim 21 recites a method for training a capsule network, comprising: deriving, by a processor of a device, from a first transformation matrix of a capsule of a capsule network, a first factor matrix and a second factor matrix for the first capsule, the first transformation matrix having dimensions m x j, the derived first factor matrix having dimensions m x c, and the derived second factor matrix having dimensions c x j; iteratively processing a set of training data to generate a set of output data, by the processor using the capsule network with the derived first factor matrix and the derived second factor matrix replacing the first transformation matrix, until the generated output data corresponds to predetermined output data for the set of training data after a final iteration, with each iteration using modified values for one or both of the derived first factor matrix and the derived second factor matrix; responsive to the generated output data corresponding to the predetermined output data, generating new values for the first transformation matrix, by the processor, using values of the derived first factor matrix and the derived second factor matrix from the final iteration; and replacing, by the processor, the values of the first transformation matrix with the generated new values.
2A Prong 1: The limitation of deriving, by a processor of a device, from a first transformation matrix of a capsule of a capsule network, a first factor matrix and a second factor matrix for the first capsule, the first transformation matrix having dimensions m x j, the derived first factor matrix having dimensions m x c, and the derived second factor matrix having dimensions c x j, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components. That is, other than reciting “by the processor”, deriving a first factor matrix and a second factor matrix from a first transformation matrix in this context encompasses generic matrix decomposition, in which a matrix is decomposed into a product of matrices. Similarly, iteratively processing a set of training data to generate a set of output data, by the processor using the capsule network with the derived first factor matrix and the derived second factor matrix replacing the first transformation matrix, until the generated output data corresponds to predetermined output data for the set of training data after a final iteration, with each iteration using modified values for one or both of the derived first factor matrix and the derived second factor matrix, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components. 
That is, other than reciting “by the processor”, iteratively processing a set of training data to generate a set of output until the generated output data corresponds to predetermined output data encompasses repetitive mathematical calculation. Similarly, responsive to the generated output data corresponding to the predetermined output data, generating new values for the first transformation matrix, by the processor, using values of the derived first factor matrix and the derived second factor matrix from the final iteration, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components. That is, other than reciting “by the processor”, generating new values for the first transformation matrix using values of the derived first and second factor matrices encompasses mathematical calculation to generate new values for a matrix based on generated output data. Similarly, replacing, by the processor, the values of the first transformation matrix with the generated new values, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “by the processor”, replacing values in a matrix with new values may be performed manually by a user. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components, then it falls within the “Mathematical Concepts” grouping of abstract ideas. Similarly, if a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of a processor and a capsule network. The processor and capsule network are both recited at a high-level of generality (i.e., as a generic processor performing a generic machine learning function of a generic capsule network) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Further, the claim recites iteratively processing a set of training data to generate a set of output data, by the processor using the capsule network with the derived first factor matrix and the derived second factor matrix replacing the first transformation matrix, until the generated output data corresponds to predetermined output data for the set of training data after a final iteration, with each iteration using modified values for one or both of the derived first factor matrix and the derived second factor matrix and responsive to the generated output data corresponding to the predetermined output data, generating new values for the first transformation matrix, by the processor, using values of the derived first factor matrix and the derived second factor matrix from the final iteration. Both of these steps relate to iteratively performing calculations and are recited at a high level of generality, amounting to merely performing repetitive calculations, which is a form of insignificant extra-solution activity. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception using generic computer components and cannot provide an inventive concept. Further, as discussed above with respect to integration of the abstract idea into a practical application, iteratively performing mathematical calculations was considered to be insignificant extra-solution activity in Step 2A Prong 2, and thus is re-evaluated in Step 2B to determine if it is more than well-understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that merely “Performing repetitive calculations” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed steps are well-understood, routine, conventional activity is supported under Berkheimer. The claim is not patent eligible.
For the reasons above, Claim 21 is rejected as being directed to an abstract idea without significantly more.
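
To make the sequence recited in Claim 21 concrete, the following is a minimal sketch under stated assumptions: a single linear map stands in for the capsule network, plain gradient descent on a squared-error loss stands in for whatever update rule the application uses, and the input, target, initial factor values, learning rate, and tolerance are hypothetical. It illustrates only the order of operations (iterate on the two factor matrices until the output matches the predetermined output, then rebuild the transformation matrix from the final factors) and is not the applicant's method or a factorization machine.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][col] for k in range(len(b)))
             for col in range(len(b[0]))] for r in range(len(a))]


def train_factors(x, target, first, second, lr=0.1, tol=1e-6, max_iters=1000):
    """Gradient descent on 0.5 * ||x @ first @ second - target||^2 for one 1 x m input."""
    for _ in range(max_iters):
        hidden = matmul(x, first)        # 1 x c
        output = matmul(hidden, second)  # 1 x j
        err = [o - t for o, t in zip(output[0], target[0])]
        if max(abs(e) for e in err) < tol:  # output corresponds to the target
            break
        # Both gradients use the current factor values, before either update.
        back = [sum(err[col] * second[k][col] for col in range(len(err)))
                for k in range(len(second))]  # err @ second^T, length c
        grad_second = [[hidden[0][k] * e for e in err] for k in range(len(second))]
        grad_first = [[x[0][i] * back[k] for k in range(len(back))]
                      for i in range(len(first))]
        first = [[first[i][k] - lr * grad_first[i][k] for k in range(len(back))]
                 for i in range(len(first))]
        second = [[second[k][col] - lr * grad_second[k][col] for col in range(len(err))]
                  for k in range(len(second))]
    return first, second


x = [[1.0, 1.0]]       # hypothetical training input, 1 x m with m = 2
target = [[1.0, 2.0]]  # hypothetical predetermined output, 1 x j with j = 2
first, second = train_factors(x, target, [[0.5], [0.5]], [[0.5, 0.5]])  # c = 1

# Generate new values for the m x j transformation matrix from the final factors.
transformation = matmul(first, second)
output = matmul(x, transformation)
```

After the loop ends, `transformation` is a 2 x 2 matrix whose product with the input approximates the target, which corresponds to the final "generating new values" and "replacing" steps of the claim.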

Claim Rejections - 35 USC § 103
11.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


12.	Claims 1, 3-5, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hinton et al. (hereinafter Hinton) (“Matrix Capsules with EM Routing”), in view of Rendle (“Factorization Machines”), further in view of Roy (US PG-PUB 20180075353).
Regarding Claim 1, Hinton in view of Rendle further in view of Roy teaches a method comprising: 
(a) instantiating, by a processor of a device, a capsule network having a plurality of capsules arranged in one or more layer, wherein each capsule includes a trainable transformation matrix (Hinton, Pg. 1, Abstract, “A capsule is a group of neurons whose outputs represent different properties of the same entity. Each layer in a capsule network contains many capsules. We describe a version of capsules in which each capsule has a logistic unit to represent the presence of an entity and a 4x4 matrix which could learn to represent the relationship between that entity and the viewer (the pose). A capsule in one layer votes for the pose matrix of many different capsules in the layer above by multiplying its own pose matrix by trainable viewpoint-invariant transformation matrices that could learn to represent part-whole relationships.”, therefore a capsule network having a plurality of capsules, including a trainable transformation matrix in each capsule is disclosed); 
(b) receiving, by the device, training data for the capsule network (Hinton, Figure 1, depicts the general architecture of the capsule network with the trainable transformation matrix. The first step shows an inputted training image. Page 4, Section 5 (Experiments), "The smallNORB dataset (LeCun et al. (2004)) has gray-level stereo images of 5 classes of toys: airplanes, cars, trucks, humans and animals. There are 10 physical instances of each class which are painted matte green. 5 physical instances of a class are selected for the training data and the other 5 for the test data.", thus, the capsule network receives training data) and a desired output associated with the training data (Hinton, Page 6, Figure 2, depicts the histogram of votes after each routing iteration & shows the actual output/prediction of the capsule network (left) compared to the desired output associated with the training data input (right), thus the training data and desired output are received); 
(c) receiving, by the processor, a value c for a factor matrix inner dimension (Rendle, Page 996, Section III (Factorization Machines (FM)), “A row v_i within V describes the i-th variable with k factors. k ∈ ℕ₀⁺ is a hyperparameter that defines the dimensionality of the factorization.” In machine learning, it is known that a hyperparameter is a parameter whose value is used to control the learning process – this is further supported by evidence from Pedregosa (“Hyperparameter optimization with approximate gradient”, Introduction) which recites “that many models in machine learning contain at least one hyperparameter to control for model complexity”); 
(d) deriving, by the processor, from a first transformation matrix of a first capsule of the plurality of capsules (Hinton, Page 2, Section 2 (How Capsules Work), "The set of capsules in layer L is denoted as ΩL. Each capsule has a 4x4 pose matrix, M, and an activation probability, a. These are like the activities in a standard neural net: they depend on the current input and are not stored. In between each capsule i in layer L and each capsule j in layer L + 1 is a 4x4 trainable transformation matrix, Wij. These Wij’s (and two learned biases per capsule) are the only stored parameters and they are learned discriminatively. The pose matrix of capsule i is transformed by Wij to cast a vote Vij = MiWij for the pose matrix of capsule j." Therefore, the capsule network contains a trainable transformation matrix for each capsule), a first factor matrix and a second factor matrix for the first capsule, the first transformation matrix having dimensions m x j, the derived first factor matrix having dimensions m x c, and the derived second factor matrix having dimensions c x j (Roy, Par. [0030-0032], “At block 406, an M by N matrix of relevance scores is generated based on the N dimensional vector of estimated emotional responses and the received M by N matrix of implicit feedback. In one embodiment, an M by K user matrix and a K by N item matrix are generated, such that when multiplied together, produce an M by N matrix of relevance scores. In this way, the M by K user matrix and the K by N item matrix are latent factor representations of the M by N user-item relevance matrix” & “In one embodiment, the K by N item matrix is generated indirectly by generating an N by N transformation matrix, that, when multiplied by the N dimensional vector of estimated emotional responses, generates the K by N item matrix. One example of this is equation 204 of FIG. 
2”, thus, the transformation matrix may be broken down into a first factor matrix having dimensions M x K and a second factor matrix having dimensions K x N, wherein K may represent the factor matrix inner dimension c); 
(e) processing, by the device using the capsule network [[with the derived first factor matrix and the derived second factor matrix replacing the first transformation matrix]] (the derived first factor matrix and second factor matrix substituting the transformation matrix is taught in step (d) above), the training data to generate actual output (Hinton, Pg. 7, Section 5.1 Generalization to novel viewpoints, “A more severe test of generalization is to use a limited range of viewpoints for training and to test on a much wider range. We trained both our convolutional baseline and our capsule model on one-third of the training data containing azimuths of (300, 320, 340, 0, 20, 40) and tested on the two-thirds of the test data that contained azimuths from 60 to 280. In a separate experiment, we trained on the 3 smaller elevations and tested on the 6 larger elevations.”, thus, the training data is processed by the capsule network to generate actual output); 
(f) comparing, by the processor, the actual output of the capsule network with the desired output associated with the training data (Hinton, Page 6, Figure 2, depicts the histogram of votes after each routing iteration & shows the actual output/prediction of the capsule network (left) compared to the desired output associated with the training data input (right) & Pg. 7, Section 5.1 Generalization to novel viewpoints, “It is hard to decide if the capsules model is better at generalizing to novel viewpoints because it achieves better test accuracy on all viewpoints. To eliminate this confounding factor, we stopped training the capsule model when its performance matched the baseline CNN on the third of the test set that used the training viewpoints. Then, we compared these matched models on the two thirds of the test set with novel viewpoints. Results in Tab. 2 show that compared with the baseline CNN capsules with matched performance on familiar viewpoints reduce the test error rate on novel viewpoints by about 30% for both novel azimuths and novel elevations.”, thus, the actual output of the capsule network is compared to the desired output associated with the training data and a baseline model); 
(g) generating, by the processor, a system of equations associated with the first factor matrix and the second factor matrix based, at least in part, on differences determined by the comparison of the actual output with the desired output (Hinton, Page 2, Section 2 (How Capsules Work), "The poses and activations of all the capsules in layer L + 1 are calculated by using a non-linear routing procedure which gets as input Vij and ai for all i ∈ ΩL, j ∈ ΩL+1. The non-linear procedure is a version of the Expectation-Maximization procedure. It iteratively adjusts the means, variances, and activation probabilities of the capsules in layer L + 1 and the assignment probabilities between all i ∈ ΩL, j ∈ ΩL+1." The system of equations used is the Expectation-Maximization routing procedure, which takes into consideration the first and second factor matrices between capsules in different layers.); and 
(h) processing, by the processor, the system of equations via a factorization machine to determine updated values for entries in the first factor matrix and the second factor matrix (Rendle, Page 995, Abstract, "In this paper, we introduce Factorization Machines (FM) which are a new model class that combines the advantages of Support Vector Machines (SVM) with factorization models. Like SVMs, FMs are a general predictor working with any real valued feature vector. In contrast to SVMs, FMs model all interactions between variables using factorized parameters." & Page 999, Section A (Matrix and Tensor Factorization), "Matrix factorization (MF) is one of the most studied factorization models (e.g. [7], [8], [2]). It factorizes a relationship between two categorical variables (e.g. U and I)." Therefore, a factorization machine may take the inputs of a first factor matrix and a second factor matrix to determine updated values.); 
(i) repeating, by the device, steps (d)-(h) with the updated values for entries in the first factor matrix and the second factor matrix (Roy, Par. [0033], “In one embodiment, the M by K user matrix and the N by N transformation matrix are generated by random seeding and iterative updating based on an objective function, such as function 206 of FIG. 2. The objective function 206 is defined in terms of user matrix U (the M by N user matrix) and transformation matrix T (the N by N transformation matrix). The M by K user matrix and the N by N transformation matrix are, in one embodiment, updated by adding the result of the previous iteration to the partial derivative of the objective function. Specifically, the M by K user matrix U is updated by adding the partial derivative of the objective function with respect to U (function 208 of FIG. 2) to the previous iteration of U. Similarly, the N by N transformation matrix T is updated by adding the partial derivative of the objective function with respect to T (function 210 of FIG. 2) to the previous iteration of T. A more detailed description of this process is described above with regard to FIG. 3.”, thus, the entries in the first and second factor matrices are updated iteratively) until the actual output of the capsule network corresponds to the desired output associated with the training data (Hinton, Page 6, Figure 2, depicts the histogram of votes after each routing iteration & shows the actual output/prediction of the capsule network (left) compared to the desired output associated with the training data input (right) & Pg. 7, Section 5.1 Generalization to novel viewpoints, “It is hard to decide if the capsules model is better at generalizing to novel viewpoints because it achieves better test accuracy on all viewpoints. To eliminate this confounding factor, we stopped training the capsule model when its performance matched the baseline CNN on the third of the test set that used the training viewpoints. 
Then, we compared these matched models on the two thirds of the test set with novel viewpoints. Results in Tab. 2 show that compared with the baseline CNN capsules with matched performance on familiar viewpoints reduce the test error rate on novel viewpoints by about 30% for both novel azimuths and novel elevations.”, thus, the actual output of the capsule network is compared to the desired output associated with the training data and a baseline model); and 
(j) reconstructing, by the processor, the first transformation matrix using the first factor matrix and the second factor matrix (Roy, Par. [0030], “In one embodiment, an M by K user matrix and a K by N item matrix are generated, such that when multiplied together, produce an M by N matrix of relevance scores. In this way, the M by K user matrix and the K by N item matrix are latent factor representations of the M by N user-item relevance matrix.”, thus, the transformation matrix may be reconstructed by multiplying the first and second factor matrices together).
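
For illustration only, the dimensional relationship among the matrices recited in steps (d) and (j) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Hinton, Rendle, or Roy: the sizes m = 8, j = 8, and c = 3 and the helper names matmul and random_factors are hypothetical, and the factor entries are simply randomly seeded (compare Roy, Par. [0033]) rather than derived from an actual capsule transformation matrix. The sketch shows only that an m x c matrix multiplied by a c x j matrix yields an m x j matrix, and how many entries each representation stores.

```python
import random

def matmul(a, b):
    """Multiply an (m x c) list-of-lists by a (c x j) list-of-lists."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must agree"
    cols = len(b[0])
    return [[sum(a[r][k] * b[k][q] for k in range(inner)) for q in range(cols)]
            for r in range(len(a))]

def random_factors(m, j, c, seed=0):
    """Return randomly seeded factor matrices of shapes (m x c) and (c x j)."""
    rng = random.Random(seed)
    first = [[rng.uniform(-1.0, 1.0) for _ in range(c)] for _ in range(m)]
    second = [[rng.uniform(-1.0, 1.0) for _ in range(j)] for _ in range(c)]
    return first, second

# Hypothetical sizes chosen for illustration; none come from the references.
m, j, c = 8, 8, 3
first, second = random_factors(m, j, c)

# Analogue of step (j): the product of the two factors has dimensions m x j.
reconstructed = matmul(first, second)

stored_full = m * j            # entries in the m x j transformation matrix
stored_factored = c * (m + j)  # entries in the two factor matrices combined
print(len(reconstructed), len(reconstructed[0]), stored_full, stored_factored)
```

With these assumed sizes the two factor matrices together hold 48 entries against 64 for the full matrix; the comparison depends entirely on the chosen m, j, and c.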

Hinton teaches a capsule network having a plurality of capsules arranged in one or more layers wherein each capsule includes a trainable transformation matrix (Hinton, Page 1, Abstract), receiving training data for the capsule network and a desired output associated with the training data (Hinton, Figs. 1-2), processing the training data to generate actual output (Hinton, Pg. 7, Section 5.1), comparing the actual output of the capsule network with the desired output associated with the training data (Hinton, Fig. 2), and generating a system of equations associated with the first factor matrix and the second factor matrix based, at least in part, on differences determined by the comparison of the actual output with the desired output (Hinton, Pg. 2, Section 2). Hinton does not explicitly disclose processing the system of equations via a factorization machine to determine updated values for entries in the first factor matrix and the second factor matrix. However, Rendle teaches a factorization machine to determine updated values for entries in the first factor matrix and the second factor matrix (Rendle, Page 995, Abstract, "In this paper, we introduce Factorization Machines (FM) which are a new model class that combines the advantages of Support Vector Machines (SVM) with factorization models. Like SVMs, FMs are a general predictor working with any real valued feature vector. In contrast to SVMs, FMs model all interactions between variables using factorized parameters." & Page 999, Section A (Matrix and Tensor Factorization), "Matrix factorization (MF) is one of the most studied factorization models (e.g. [7], [8], [2]). It factorizes a relationship between two categorical variables (e.g. U and I)." Therefore, a factorization machine may take the inputs of a first factor matrix and a second factor matrix to determine updated values.). 
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the capsule network as disclosed by Hinton to include the factorization machine of Rendle. One of ordinary skill in the art would have been motivated to make this modification to produce a capsule network that can be trained more quickly and is more storage/memory efficient than the conventional capsule network (Rendle, Page 995, Section 1 (Introduction), "We show that the model equation of FMs can be computed in linear time and that it depends only on a linear number of parameters. This allows direct optimization and storage of model parameters without the need of storing any training data (e.g. support vectors) for prediction”).
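
The linear-time property quoted from Rendle above can be illustrated with a short sketch. It assumes the degree-2 factorization machine form, in which the pairwise term is the sum over i < j of <v_i, v_j> x_i x_j, and compares a direct double loop with the equivalent sum-of-squares reformulation. The function names and the sample values of x and v are hypothetical and appear in none of the cited references.

```python
def fm_pairwise_naive(x, v):
    """Sum over i < j of <v_i, v_j> * x_i * x_j, computed with a double loop."""
    n, k = len(x), len(v[0])
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(v[i][f] * v[j][f] for f in range(k))
            total += dot * x[i] * x[j]
    return total

def fm_pairwise_linear(x, v):
    """Same quantity in O(k * n): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(v[0])
    total = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        total += 0.5 * (s * s - s_sq)
    return total

def fm_predict(x, w0, w, v):
    """Degree-2 FM prediction: global bias + linear term + factorized pairwise term."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return w0 + linear + fm_pairwise_linear(x, v)

# Hypothetical feature vector and factor rows (n = 3 variables, k = 2 factors).
x = [1.0, 0.0, 2.0]
v = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
naive_value = fm_pairwise_naive(x, v)
linear_value = fm_pairwise_linear(x, v)
```

Both functions return the same pairwise value for any input; only the second avoids the nested loop over pairs of variables.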
Further, Hinton as modified by Rendle teaches the limitations of Claim 1 as cited above. Hinton as modified by Rendle, does not explicitly disclose deriving, by the processor, from a first transformation matrix of a first capsule of the plurality of capsules, a first factor matrix and a second factor matrix for the first capsule, the first transformation matrix having dimensions m x j, the derived first factor matrix having dimensions m x c, and the derived second factor matrix having dimensions c x j, repeating, by the device, steps (d)-(h) with the updated values for entries in the first factor matrix and the second factor matrix until the actual output of the capsule network corresponds to the desired output associated with the training data, and reconstructing, by the processor, the first transformation matrix using the first factor matrix and the second factor matrix. However, Roy teaches deriving, by the processor, from a first transformation matrix of a first capsule of the plurality of capsules, a first factor matrix and a second factor matrix for the first capsule, the first transformation matrix having dimensions m x j, the derived first factor matrix having dimensions m x c, and the derived second factor matrix having dimensions c x j (Roy, Par. [0030-0032]), repeating, by the device, steps (d)-(h) with the updated values for entries in the first factor matrix and the second factor matrix until the actual output of the capsule network corresponds to the desired output associated with the training data (Roy, Par. [0033]), and reconstructing, by the processor, the first transformation matrix using the first factor matrix and the second factor matrix (Roy, Par. [0030]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the capsule network paired with a factorization machine as disclosed by Hinton in view of Rendle, with the first and second factor matrices derived from a transformation matrix and iterative updates of the first and second factor matrices, as disclosed by Roy. One of ordinary skill in the art would have been motivated to make this modification to produce a system which can improve performance and efficiency of the matrix factorization process (Roy, Par. [0001]).

Regarding Claim 3, Hinton in view of Rendle further in view of Roy teaches the method of claim 1, wherein the value of the factor matrix inner dimension is greater than or equal to three (3) and less than or equal to six (6) (Rendle, Page 996, Section III (Factorization Machines (FM)), “A row vi within V describes the i-th variable with k factors. k ∈ ℕ₀⁺ is a hyperparameter that defines the dimensionality of the factorization.” In machine learning, it is known that a hyperparameter is a parameter whose value is used to control the learning process – this is further supported by evidence from Pedregosa (“Hyperparameter optimization with approximate gradient”, Introduction) which recites “that many models in machine learning contain at least one hyperparameter to control for model complexity” & Hinton, Page 2, Section 2 (How Capsules Work), “The set of capsules in layer L is denoted as ΩL. Each capsule has a 4x4 pose matrix, M, and an activation probability, a. These are like the activities in a standard neural net: they depend on the current input and are not stored. In between each capsule i in layer L and each capsule j in layer L + 1 is a 4x4 trainable transformation matrix, Wij. These Wij’s (and two learned biases per capsule) are the only stored parameters and they are learned discriminatively. The pose matrix of capsule i is transformed by Wij to cast a vote Vij = MiWij for the pose matrix of capsule j.” Thus, the factor matrix inner dimension would ideally be 4, as per standard in Hinton’s capsule network – a dimension greater than or equal to three and less than or equal to six).
	The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 4, Hinton in view of Rendle further in view of Roy teaches the method of claim 1, wherein step (i) further comprises: 
(i-1) determining, by the processor, that the actual output capsule network is not converging to the desired output associated with the training data; and (i-2) increasing, by the processor, the value c of the factor matrix inner dimension to a second value c' responsive to the determination that the capsule network is not converging (Rendle, Page 996, Section III (Factorization Machines (FM)), “A row vi within V describes the i-th variable with k factors. k ∈ ℕ₀⁺ is a hyperparameter that defines the dimensionality of the factorization.” In machine learning, it is known that a hyperparameter is a parameter whose value is used to control the learning process and is usually adjusted to control learning, according to network convergence/divergence. This is further supported by evidence from Pedregosa (“Hyperparameter optimization with approximate gradient”, Abstract) which recites “An advantage of this method is that hyperparameters can be updated before model parameters have fully converged.”, thus, upon determination that the capsule network is not converging to the desired output, the model parameters/factor matrix inner dimension may be increased to a second value c’ as it is considered a hyperparameter and is adjusted to control learning); and 
wherein steps (d)-(h) are repeated with the derived first factor matrix having dimensions m x c', and the derived second factor matrix having dimensions c' x j (Roy, Par. [0033], “In one embodiment, the M by K user matrix and the N by N transformation matrix are generated by random seeding and iterative updating based on an objective function, such as function 206 of FIG. 2. The objective function 206 is defined in terms of user matrix U (the M by N user matrix) and transformation matrix T (the N by N transformation matrix). The M by K user matrix and the N by N transformation matrix are, in one embodiment, updated by adding the result of the previous iteration to the partial derivative of the objective function. Specifically, the M by K user matrix U is updated by adding the partial derivative of the objective function with respect to U (function 208 of FIG. 2) to the previous iteration of U. Similarly, the N by N transformation matrix T is updated by adding the partial derivative of the objective function with respect to T (function 210 of FIG. 2) to the previous iteration of T. A more detailed description of this process is described above with regard to FIG. 3.”, thus, the entries in the first and second factor matrices are updated iteratively).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
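
The control flow of steps (i-1) and (i-2) of Claim 4 can be sketched as follows, under loudly stated assumptions. Plain gradient descent on a squared-error objective stands in for the learner; it is not a factorization machine and is not the method of any cited reference. The target matrix, tolerance, step count, learning rate, and the names fit_factors and train_with_growing_c are all hypothetical, and the factors are re-seeded at each new inner dimension rather than derived from a capsule transformation matrix.

```python
import random

def matmul(a, b):
    """Multiply an (m x c) list-of-lists by a (c x j) list-of-lists."""
    return [[sum(a[r][k] * b[k][q] for k in range(len(b)))
             for q in range(len(b[0]))] for r in range(len(a))]

def squared_error(a, b, target):
    """Sum of squared differences between the product a*b and the target."""
    prod = matmul(a, b)
    return sum((prod[r][q] - target[r][q]) * (prod[r][q] - target[r][q])
               for r in range(len(target)) for q in range(len(target[0])))

def fit_factors(target, c, steps=2000, lr=0.05, seed=0):
    """Gradient descent on 0.5 * ||A B - target||^2 (a stand-in learner)."""
    rng = random.Random(seed)
    m, j = len(target), len(target[0])
    a = [[rng.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(m)]
    b = [[rng.uniform(-0.5, 0.5) for _ in range(j)] for _ in range(c)]
    for _ in range(steps):
        prod = matmul(a, b)
        err = [[prod[r][q] - target[r][q] for q in range(j)] for r in range(m)]
        grad_a = [[sum(err[r][q] * b[k][q] for q in range(j)) for k in range(c)]
                  for r in range(m)]
        grad_b = [[sum(a[r][k] * err[r][q] for r in range(m)) for q in range(j)]
                  for k in range(c)]
        a = [[a[r][k] - lr * grad_a[r][k] for k in range(c)] for r in range(m)]
        b = [[b[k][q] - lr * grad_b[k][q] for q in range(j)] for k in range(c)]
    return a, b

def train_with_growing_c(target, c, c_max, tol=1e-3):
    """Refit with a larger inner dimension whenever the error stays above tol."""
    while True:
        a, b = fit_factors(target, c)
        # Analogue of (i-1): test whether the fit reached the tolerance.
        if squared_error(a, b, target) < tol or c >= c_max:
            return c, a, b
        c += 1  # Analogue of (i-2): increase c to a second value c'.

# Hypothetical rank-2 target: a single factor column (c = 1) cannot match it.
target = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
c_final, a, b = train_with_growing_c(target, c=1, c_max=3)
```

Because the assumed target has rank 2, no pair of factors with c = 1 can bring the error below the tolerance, so the loop necessarily moves to a larger inner dimension before returning.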

Regarding Claim 5, Hinton in view of Rendle further in view of Roy teaches the method of claim 1, further comprising configuring the factorization machine to utilize stochastic gradient descent as a learning mode (Rendle, Page 997, Section C (Learning Factorization Machines), “Thus, the model parameters (w0, w and V) of FMs can be learned efficiently by gradient descent methods – e.g. stochastic gradient descent (SGD) – for a variety of losses, among them are square, logit or hinge loss.” Thus, the factorization machine may be configured to utilize stochastic gradient descent as a learning mode).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
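
The stochastic gradient descent learning mode cited from Rendle for Claim 5 can be sketched for a single training example. The sketch assumes a degree-2 factorization machine with parameters (w0, w, V) and the square loss, and uses the standard per-parameter gradients of that model: 1 for w0, x_i for w_i, and x_i times (sum over j of v_jf x_j) minus v_if x_i squared for v_if. The learning rate, the sample data, and the function names are hypothetical.

```python
def fm_predict(x, w0, w, v):
    """Degree-2 FM prediction using the O(k * n) form of the pairwise term."""
    n, k = len(x), len(v[0])
    pair = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pair += 0.5 * (s * s - s_sq)
    return w0 + sum(w[i] * x[i] for i in range(n)) + pair

def sgd_step(x, y, w0, w, v, lr=0.01):
    """One SGD update on the square loss (y_hat - y)^2 for a single example."""
    n, k = len(x), len(v[0])
    # Derivative of the square loss with respect to the prediction.
    residual = 2.0 * (fm_predict(x, w0, w, v) - y)
    # Per-factor sums reused by every v_if gradient.
    sums = [sum(v[i][f] * x[i] for i in range(n)) for f in range(k)]
    new_w0 = w0 - lr * residual
    new_w = [w[i] - lr * residual * x[i] for i in range(n)]
    new_v = [[v[i][f] - lr * residual * (x[i] * sums[f] - v[i][f] * x[i] * x[i])
              for f in range(k)] for i in range(n)]
    return new_w0, new_w, new_v

# Hypothetical example: n = 3 variables, k = 2 factors, target value y = 1.0.
x, y = [1.0, 0.0, 2.0], 1.0
w0, w = 0.0, [0.0, 0.0, 0.0]
v = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

before = fm_predict(x, w0, w, v)
w0_new, w_new, v_new = sgd_step(x, y, w0, w, v)
after = fm_predict(x, w0_new, w_new, v_new)
```

For this assumed example a single small step moves the prediction toward the target; repeated steps over many examples would constitute the learning mode.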

Regarding Claim 16, Hinton in view of Rendle further in view of Roy teaches one or more non-transitory machine-readable media comprising program code (Roy, Claim 20, “A non-transitory computer-readable storage medium […]”, thus, a non-transitory machine-readable medium that includes instructions for execution is disclosed) for training a capsule network that, when executed by a processor of a device, causes the device to: 
(a) instantiate a capsule network having a plurality of capsules arranged in one or more layers, wherein each capsule includes a trainable transformation matrix (See claim 1 – recites substantially the same limitations as Claim 1 in the form of one or more non-transitory machine-readable media, therefore it is rejected under the same rationale); 
(b) receive training data for the capsule network and a desired output associated with the training data (See claim 1 – recites substantially the same limitations as Claim 1 in the form of one or more non-transitory machine-readable media, therefore it is rejected under the same rationale); 
(c) receive a value c for a factor matrix inner dimension (See claim 1 – recites substantially the same limitations as Claim 1 in the form of one or more non-transitory machine-readable media, therefore it is rejected under the same rationale); 
(d) derive, from a first transformation matrix of a first capsule of the plurality of capsules, a first factor matrix and a second factor matrix for the first capsule, the first transformation matrix having dimensions m x j, the derived first factor matrix having dimensions m x c, and the derived second factor matrix having dimensions c x j (See claim 1 – recites substantially the same limitations as Claim 1 in the form of one or more non-transitory machine-readable media, therefore it is rejected under the same rationale);
(e) process, using the capsule network with the derived first factor matrix and the derived second factor matrix replacing the first transformation matrix, the training data to generate actual output (See claim 1 – recites substantially the same limitations as Claim 1 in the form of one or more non-transitory machine-readable media, therefore it is rejected under the same rationale);
(f) compare the actual output of the capsule network with the desired output associated with the training data (See claim 1 – recites substantially the same limitations as Claim 1 in the form of one or more non-transitory machine-readable media, therefore it is rejected under the same rationale);
(g) generate a system of equations associated with the first factor matrix and the second factor matrix based, at least in part, on differences determined by the comparison of the actual output with the desired output (See claim 1 – recites substantially the same limitations as Claim 1 in the form of one or more non-transitory machine-readable media, therefore it is rejected under the same rationale); and 
(h) process the system of equations via a factorization machine to determine updated values for entries in the first factor matrix and the second factor matrix (See claim 1 – recites substantially the same limitations as Claim 1 in the form of one or more non-transitory machine-readable media, therefore it is rejected under the same rationale);
(i) repeat steps (d)-(h) with the updated values for entries in the first factor matrix and the second factor matrix until the actual output of the capsule network corresponds to the desired output associated with the training data (See claim 1 – recites substantially the same limitations as Claim 1 in the form of one or more non-transitory machine-readable media, therefore it is rejected under the same rationale); and 
(j) reconstruct the first transformation matrix using the first factor matrix and the second factor matrix (See claim 1 – recites substantially the same limitations as Claim 1 in the form of one or more non-transitory machine-readable media, therefore it is rejected under the same rationale).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Claim 17 recites substantially the same limitations as Claim 3 in the form of one or more non-transitory machine-readable media, therefore it is rejected under the same rationale.

Claim 18 recites substantially the same limitations as Claim 4 in the form of one or more non-transitory machine-readable media, therefore it is rejected under the same rationale.

Claim 19 recites substantially the same limitations as Claim 5 in the form of one or more non-transitory machine-readable media, therefore it is rejected under the same rationale.

Regarding Claim 20, Hinton in view of Rendle further in view of Roy teaches an apparatus comprising: 
at least one processor (Roy, claim 14, “A computing apparatus for ranking items, the computing apparatus comprising: a processor”, thus, at least one processor is disclosed); and 
a memory storing instructions that, when executed by the processor, cause the processor to (Roy, claim 14, “A computing apparatus for ranking items, the computing apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, configures the apparatus to:”, thus, a memory storing instructions that are executed by the processor is disclosed): 
(a) instantiate a capsule network having a plurality of capsules arranged in one or more layers, wherein each capsule includes a trainable transformation matrix (See claim 1 – recites substantially the same limitations as Claim 1 in the form of an apparatus, therefore it is rejected under the same rationale); 
(b) receive training data for the capsule network and a desired output associated with the training data (See claim 1 – recites substantially the same limitations as Claim 1 in the form of an apparatus, therefore it is rejected under the same rationale); 
(c) receive a value c for a factor matrix inner dimension (See claim 1 – recites substantially the same limitations as Claim 1 in the form of an apparatus, therefore it is rejected under the same rationale); 
(d) derive, from a first transformation matrix of a first capsule of the plurality of capsules, a first factor matrix and a second factor matrix for the first capsule, the first transformation matrix having dimensions m x j, the derived first factor matrix having dimensions m x c, and the derived second factor matrix having dimensions c x j (See claim 1 – recites substantially the same limitations as Claim 1 in the form of an apparatus, therefore it is rejected under the same rationale); 
(e) process, using the capsule network with the derived first factor matrix and the derived second factor matrix replacing the first transformation matrix, the training data to generate actual output (See claim 1 – recites substantially the same limitations as Claim 1 in the form of an apparatus, therefore it is rejected under the same rationale); 
(f) compare the actual output of the capsule network with the desired output associated with the training data (See claim 1 – recites substantially the same limitations as Claim 1 in the form of an apparatus, therefore it is rejected under the same rationale); 
(g) generate a system of equations associated with the first factor matrix and the second factor matrix based, at least in part, on differences determined by the comparison of the actual output with the desired output (See claim 1 – recites substantially the same limitations as Claim 1 in the form of an apparatus, therefore it is rejected under the same rationale); and 
(h) process the system of equations via a factorization machine to determine updated values for entries in the first factor matrix and the second factor matrix (See claim 1 – recites substantially the same limitations as Claim 1 in the form of an apparatus, therefore it is rejected under the same rationale);
(i) repeat steps (d)-(h) with the updated values for entries in the first factor matrix and the second factor matrix until the actual output of the capsule network corresponds to the desired output associated with the training data (See claim 1 – recites substantially the same limitations as Claim 1 in the form of an apparatus, therefore it is rejected under the same rationale); and 
(j) reconstruct the first transformation matrix using the first factor matrix and the second factor matrix (See claim 1 – recites substantially the same limitations as Claim 1 in the form of an apparatus, therefore it is rejected under the same rationale).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

13.	Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Hinton et al. (hereinafter Hinton) (“Matrix Capsules with EM Routing”), in view of Roy (US PG-PUB 20180075353).
Regarding Claim 21, Hinton in view of Roy teaches a method for training a capsule network, comprising: 
deriving, by a processor of a device, from a first transformation matrix of a capsule of a capsule network (Hinton, Page 2, Section 2 (How Capsules Work), "The set of capsules in layer L is denoted as ΩL. Each capsule has a 4x4 pose matrix, M, and an activation probability, a. These are like the activities in a standard neural net: they depend on the current input and are not stored. In between each capsule i in layer L and each capsule j in layer L + 1 is a 4x4 trainable transformation matrix, Wij. These Wij’s (and two learned biases per capsule) are the only stored parameters and they are learned discriminatively. The pose matrix of capsule i is transformed by Wij to cast a vote Vij = MiWij for the pose matrix of capsule j." Therefore, the capsule network contains a trainable transformation matrix for each capsule), a first factor matrix and a second factor matrix for the first capsule, the first transformation matrix having dimensions m x j, the derived first factor matrix having dimensions m x c, and the derived second factor matrix having dimensions c x j (Roy, Par. [0030-0032], “At block 406, an M by N matrix of relevance scores is generated based on the N dimensional vector of estimated emotional responses and the received M by N matrix of implicit feedback. In one embodiment, an M by K user matrix and a K by N item matrix are generated, such that when multiplied together, produce an M by N matrix of relevance scores. In this way, the M by K user matrix and the K by N item matrix are latent factor representations of the M by N user-item relevance matrix” & “In one embodiment, the K by N item matrix is generated indirectly by generating an N by N transformation matrix, that, when multiplied by the N dimensional vector of estimated emotional responses, generates the K by N item matrix. One example of this is equation 204 of FIG. 
2”, thus, the transformation matrix may be broken down into a first factor matrix having dimensions M x K and a second factor matrix having dimensions K x N, wherein K may represent the factor matrix inner dimension c); 
iteratively processing a set of training data to generate a set of output data (Hinton, Pg. 7, Section 5.1 Generalization to novel viewpoints, “A more severe test of generalization is to use a limited range of viewpoints for training and to test on a much wider range. We trained both our convolutional baseline and our capsule model on one-third of the training data containing azimuths of (300, 320, 340, 0, 20, 40) and tested on the two-thirds of the test data that contained azimuths from 60 to 280. In a separate experiment, we trained on the 3 smaller elevations and tested on the 6 larger elevations.”, thus, the training data is processed by the capsule network to generate actual output), by the processor using the capsule network [[with the derived first factor matrix and the derived second factor matrix replacing the first transformation matrix]] (the derived first factor matrix and second factor matrix substituting the transformation matrix is taught above), until the generated output data corresponds to predetermined output data for the set of training data after a final iteration (Roy, Par. [0025-0026], “Line 4 begins the block by executing in a loop, for i equal to 1 to M, the block from lines 5-7. In this block from lines 5-7, line 5 depicts updating the user matrix U by adding the previous value to equation 208 multiplied by a learning rate γ, where equation 208 includes the partial derivative of objective function 206 with respect to user vector U.sub.i. The purpose of the learning rate γ is to ensure convergence. Line 6 depicts an inner loop, from j equals 1 to N, executing line 7, which updates transformation matrix T by adding the result of equation 210 multiplied by the learning rate γ, where equation 210 represents a partial derivative of the objective function 206 with respect to the transformation matrix T. 
At line 10, the final latent factor matrices U and T are returned as output.”, thus, multiple iterations are performed until convergence is ensured, hence output data would correspond to predetermined output data), with each iteration using modified values for one or both of the derived first factor matrix and the derived second factor matrix (Roy, Par. [0033], “In one embodiment, the M by K user matrix and the N by N transformation matrix are generated by random seeding and iterative updating based on an objective function, such as function 206 of FIG. 2. The objective function 206 is defined in terms of user matrix U (the M by N user matrix) and transformation matrix T (the N by N transformation matrix). The M by K user matrix and the N by N transformation matrix are, in one embodiment, updated by adding the result of the previous iteration to the partial derivative of the objective function. Specifically, the M by K user matrix U is updated by adding the partial derivative of the objective function with respect to U (function 208 of FIG. 2) to the previous iteration of U. Similarly, the N by N transformation matrix T is updated by adding the partial derivative of the objective function with respect to T (function 210 of FIG. 2) to the previous iteration of T. A more detailed description of this process is described above with regard to FIG. 3.”, thus, the entries in the first and second factor matrices are modified and updated iteratively); 
responsive to the generated output data corresponding to the predetermined output data (Roy, Par. [0025], “Line 4 begins the block by executing in a loop, for i equal to 1 to M, the block from lines 5-7. In this block from lines 5-7, line 5 depicts updating the user matrix U by adding the previous value to equation 208 multiplied by a learning rate γ, where equation 208 includes the partial derivative of objective function 206 with respect to user vector U.sub.i. The purpose of the learning rate γ is to ensure convergence.”, thus, multiple iterations are performed until convergence is ensured, hence output data would correspond to predetermined output data), generating new values for the first transformation matrix, by the processor, using values of the derived first factor matrix and the derived second factor matrix from the final iteration (Roy, Par. [0030], “In one embodiment, an M by K user matrix and a K by N item matrix are generated, such that when multiplied together, produce an M by N matrix of relevance scores. In this way, the M by K user matrix and the K by N item matrix are latent factor representations of the M by N user-item relevance matrix.”, thus, the transformation matrix may be generated by multiplying the first and second factor matrices together); and 
replacing, by the processor, the values of the first transformation matrix with the generated new values (Roy, Par. [0025-0026], “Line 6 depicts an inner loop, from j equals 1 to N, executing line 7, which updates transformation matrix T by adding the result of equation 210 multiplied by the learning rate γ, where equation 210 represents a partial derivative of the objective function 206 with respect to the transformation matrix T. At line 10, the final latent factor matrices U and T are returned as output.”, thus, the values of the transformation matrix are replaced with newly generated values).

Hinton teaches a capsule network having a plurality of capsules arranged in one or more layers wherein each capsule includes a trainable transformation matrix (Hinton, Page 1, Abstract) and processing training data to generate actual output (Hinton, Pg. 7, Section 5.1). Hinton does not explicitly disclose deriving, by the processor, from a first transformation matrix of a first capsule of the plurality of capsules, a first factor matrix and a second factor matrix for the first capsule, the first transformation matrix having dimensions m x j, the derived first factor matrix having dimensions m x c, and the derived second factor matrix having dimensions c x j, iteratively processing a set of training data to generate a set of output data, by the processor using the capsule network with the derived first factor matrix and the derived second factor matrix replacing the first transformation matrix, until the generated output data corresponds to predetermined output data for the set of training data after a final iteration, with each iteration using modified values for one or both of the derived first factor matrix and the derived second factor matrix, responsive to the generated output data corresponding to the predetermined output data, generating new values for the first transformation matrix, by the processor, using values of the derived first factor matrix and the derived second factor matrix from the final iteration, and replacing, by the processor, the values of the first transformation matrix with the generated new values. However, Roy teaches deriving, by the processor, from a first transformation matrix of a first capsule of the plurality of capsules, a first factor matrix and a second factor matrix for the first capsule, the first transformation matrix having dimensions m x j, the derived first factor matrix having dimensions m x c, and the derived second factor matrix having dimensions c x j (Roy, Par. [0030-0032]), iteratively processing a set of training data to generate a set of output data, by the processor using the capsule network with the derived first factor matrix and the derived second factor matrix replacing the first transformation matrix, until the generated output data corresponds to predetermined output data for the set of training data after a final iteration (Roy, Par. [0025]), with each iteration using modified values for one or both of the derived first factor matrix and the derived second factor matrix (Roy, Par. [0033]), responsive to the generated output data corresponding to the predetermined output data (Roy, Par. [0025]), generating new values for the first transformation matrix, by the processor, using values of the derived first factor matrix and the derived second factor matrix from the final iteration (Roy, Par. [0030]), and replacing, by the processor, the values of the first transformation matrix with the generated new values (Roy, Par. [0025-0026]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the capsule network as disclosed by Hinton with the first and second factor matrices derived from a transformation matrix and the iterative updates of the first and second factor matrices as disclosed by Roy. One of ordinary skill in the art would have been motivated to make this modification to produce a capsule network with improved performance and efficiency through the implementation of a matrix factorization process (Roy, Par. [0001]).
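For illustration only, the general pattern discussed in the mapping above (an m x j matrix represented by an m x c factor and a c x j factor, iterative updates of both factors scaled by a learning rate, and regeneration of the m x j matrix from the final factors) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of Roy, Hinton, or the instant claims: the function names, the squared-error objective, the learning rate, the stopping tolerance, and the example matrix are all hypothetical, and no capsule network is modeled.

```python
import random


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def squared_error(w, approx):
    """Sum of squared element-wise differences between two matrices."""
    return sum((w[i][j] - approx[i][j]) ** 2
               for i in range(len(w)) for j in range(len(w[0])))


def factorize(w, c, lr=0.01, max_iters=5000, tol=1e-8, seed=0):
    """Fit an m x c factor `a` and a c x j factor `b` so that a @ b ~ w.

    Hypothetical helper: the factors are randomly seeded and then updated
    by adding lr times the negative gradient of 0.5 * squared error.
    """
    rng = random.Random(seed)
    m, j = len(w), len(w[0])
    a = [[rng.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(m)]
    b = [[rng.uniform(-0.5, 0.5) for _ in range(j)] for _ in range(c)]
    for _ in range(max_iters):
        approx = matmul(a, b)
        if squared_error(w, approx) < tol:
            break  # output matches the target closely enough
        err = [[w[r][s] - approx[r][s] for s in range(j)] for r in range(m)]
        # Both steps are computed from the current factors before either changes.
        step_a = [[sum(err[r][s] * b[k][s] for s in range(j))
                   for k in range(c)] for r in range(m)]
        step_b = [[sum(a[r][k] * err[r][s] for r in range(m))
                   for s in range(j)] for k in range(c)]
        a = [[a[r][k] + lr * step_a[r][k] for k in range(c)] for r in range(m)]
        b = [[b[k][s] + lr * step_b[k][s] for s in range(j)] for k in range(c)]
    return a, b


# Assumed example: a 3 x 4 matrix of rank 2, factored with c = 2.
w = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.0],
     [1.0, 0.0, 1.0, 0.0]]
a, b = factorize(w, c=2)
w_new = matmul(a, b)  # new m x j values generated from the final factors
```

In this sketch the two factors hold m*c + c*j values in place of the m*j values of the original matrix, which is the sense in which a factorization with small c may reduce the number of stored parameters.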

Conclusion
14.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Conlon et al. (US PG-PUB 20100156890) disclosed a method and system for decomposing a transformation matrix into a plurality of individual operation matrices.
Garimella et al. (US Patent 9,400,955) disclosed a neural network weight matrix which may be approximated as two low-rank matrices using a decomposition technique.

15.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

16.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Devika S Maharaj whose telephone number is 571-272-0829. The examiner can normally be reached Monday - Thursday 7:30am - 4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/D.S.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123