DETAILED ACTION
1.	This office action is in response to Application No. 16927018, filed on 07/13/2020. Claims 1-20 are pending and are presented for examination.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

3.	Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
	Step 1
	Independent claim 1 is directed to a method, and falls into one of the four statutory categories.
	Step 2A, Prong 1
	Claim 1 recites the following abstract ideas:
	determining an attention matrix based on a product of (i) the plurality of feature vectors transformed by a key function and (ii) the plurality of slot vectors transformed by a query function, (mathematical concept directed to mathematical calculations of a dot product)
	wherein each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the attention matrix is normalized with respect to the plurality of values along the respective dimension; (mathematical concept directed to mathematical calculations using a softmax function to normalize the dimensions of the attention matrix)
	determining an update matrix based on (i) the plurality of feature vectors transformed by a value function and (ii) the attention matrix; (mathematical concept directed to mathematical calculations of a matrix based on a value function and an attention matrix) and
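For illustration only, the three calculations identified above can be written as a short plain-Python sketch; it is not drawn from the application. Matrices are stored as lists of rows, the helper names (`matmul`, `transpose`, `softmax_rows`, `slot_attention_step`) are hypothetical, and normalizing each row of the N x K attention matrix over the K slots is an assumed reading (the one suggested by claim 4). Each step reduces to matrix products and an exponential normalization.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Normalize each row so that its values are positive and sum to 1."""
    result = []
    for row in m:
        peak = max(row)  # subtracting the row maximum avoids overflow in exp()
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def slot_attention_step(features, slots, w_key, w_query, w_value):
    """One attention/update step for N feature vectors and K slot vectors."""
    keys = matmul(features, w_key)               # N x D
    queries = matmul(slots, w_query)             # K x D
    values = matmul(features, w_value)           # N x D
    logits = matmul(keys, transpose(queries))    # N x K product of keys and queries
    attn = softmax_rows(logits)                  # each row normalized over the K slots
    updates = matmul(transpose(attn), values)    # K x D update matrix
    return attn, updates


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # N = 3 feature vectors
slots = [[1.0, 0.0], [0.0, 1.0]]                 # K = 2 slot vectors
identity = [[1.0, 0.0], [0.0, 1.0]]              # placeholder key/query/value weights
attn, updates = slot_attention_step(features, slots, identity, identity, identity)
```

With these placeholder weights, each row of `attn` sums to 1 and `updates` has one row per slot vector.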
	Step 2A, Prong 2
	Claim 1 recites the following additional elements:
	receiving a perceptual representation comprising a plurality of feature vectors; (this limitation is directed to data transmission of feature vectors and does not integrate the abstract idea into a practical application)
	initializing a plurality of slot vectors represented by a neural network memory unit, (this is directed to data storage by generic computer equipment and does not integrate the abstract idea into a practical application)
	wherein each respective slot vector of the plurality of slot vectors is configured to represent a corresponding entity contained in the perceptual representation; (this is directed to data description of the input data and does not integrate the abstract idea into a practical application)
	by way of the neural network memory unit (this is directed to a highly generic computer component used for storing data and does not integrate the abstract idea into a practical application)
	updating the plurality of slot vectors based on the update matrix (this is directed to writing data to memory, which is well-understood and routine, and does not integrate the abstract idea into a practical application)

	Step 2B
	Claim 1 recites the following additional elements:
	receiving a perceptual representation comprising a plurality of feature vectors; (this limitation is directed to data transmission of feature vectors and does not amount to significantly more, see MPEP 2106.05(d)(II)(i))
	initializing a plurality of slot vectors represented by a neural network memory unit, (this is directed to data storage by generic computer equipment and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))
	wherein each respective slot vector of the plurality of slot vectors is configured to represent a corresponding entity contained in the perceptual representation; (this is directed to data description of the input data and does not amount to significantly more, see MPEP 2106.05(d)(II))
	updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit (this is directed to writing data to memory, which is a well-understood, routine, conventional activity, and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))
Dependent claim 2 is directed to a method, and falls into one of the four statutory categories.  
	Claim 2 recites the following abstract ideas:
	determining a second attention matrix based on a product of (i) the plurality of feature vectors transformed by the key function and (ii) the plurality of updated slot vectors transformed by the query function, (mathematical concept directed to mathematical calculations of a dot product)
	wherein each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the second attention matrix is normalized with respect to the plurality of values along the respective dimension of the second attention matrix; (mathematical concept directed to mathematical calculations using a softmax function to normalize the dimensions of the attention matrix)
	determining a second update matrix based on (i) the plurality of feature vectors transformed by the value function and (ii) the second attention matrix; (mathematical concept directed to mathematical calculations of a matrix based on a value function and an attention matrix) and
	Claim 2 recites the following additional elements:
	further updating the plurality of updated slot vectors based on the second update matrix by way of the neural network memory unit (this is directed to writing data to memory, which is well-understood and routine, and does not integrate the abstract idea into a practical application)
	Under Step 2B, claim 2 recites the following additional elements:
	further updating the plurality of updated slot vectors based on the second update matrix by way of the neural network memory unit (this is directed to writing data to memory, which is a well-understood, routine, conventional activity, and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))
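Claim 2 repeats the calculations of claim 1 with the updated slot vectors supplied to the query function. The iteration can be sketched as follows, again assuming list-of-rows matrices and normalization over the K slots; as a further simplifying assumption, the update matrix is adopted directly as the new slot vectors in place of the claimed neural network memory unit. Function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Normalize each row over its entries (here, over the K slots)."""
    result = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def iterate_slots(features, slots, w_key, w_query, w_value, iterations):
    """Recompute the attention matrix from the latest slots on every iteration."""
    keys = matmul(features, w_key)       # fixed across iterations
    values = matmul(features, w_value)   # fixed across iterations
    attention_history = []
    for _ in range(iterations):
        queries = matmul(slots, w_query)                       # uses the updated slots
        attn = softmax_rows(matmul(keys, transpose(queries)))  # second, third, ... attention matrix
        attention_history.append(attn)
        # Simplifying assumption: the update matrix directly becomes the new slots,
        # standing in for the claimed neural network memory unit.
        slots = matmul(transpose(attn), values)
    return slots, attention_history


identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
final_slots, history = iterate_slots(features, identity, identity, identity, identity, iterations=2)
```

Because the queries change after the first update, the second attention matrix generally differs from the first.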

Dependent claim 3 is directed to a method, and falls into one of the four statutory categories.  
	Claim 3 recites the following abstract ideas:
	wherein each respective slot vector represents a semantic embedding of the corresponding entity, (mental process directed to the mapping of a vector to a corresponding entity)
	binds the respective slot vector to the corresponding entity independently of a classification of the corresponding entity. (mental process directed to the mapping of a vector to a corresponding entity)
	Claim 3 recites the following additional elements:
	wherein updating the respective slot vector iteratively refines the semantic embedding of the corresponding entity (this is directed to a well-understood, routine, conventional activity and does not integrate the abstract idea into a practical application) and
	Under Step 2B, claim 3 recites the following additional elements:
	wherein updating the respective slot vector iteratively refines the semantic embedding of the corresponding entity (this is directed to a well-understood, routine, conventional activity and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))

Dependent claim 4 is directed to a method, and falls into one of the four statutory categories.  
	Claim 4 recites the following abstract ideas:
	normalizing each respective value of the K values along each respective dimension of the plurality of dimensions of the attention matrix with respect to the K values along the respective dimension by way of a softmax function by dividing (i) an exponent of the respective value of the K values along the respective dimension by (ii) a sum of exponents of the K values along the respective dimension (mathematical concept directed to mathematical calculations using a softmax function to normalize the dimensions of the attention matrix)
	Claim 4 recites the following additional elements:
	wherein the plurality of slot vectors comprises K slot vectors, wherein each respective dimension of the plurality of dimensions comprises K values, (this is directed to data description of the slot vectors and does not integrate the abstract idea into a practical application)
	Under Step 2B, claim 4 recites the following additional elements:
	wherein the plurality of slot vectors comprises K slot vectors, wherein each respective dimension of the plurality of dimensions comprises K values, (this is directed to data description of the slot vectors and does not amount to significantly more, see MPEP 2106.05(d)(II))
Dependent claim 5 is directed to a method, and falls into one of the four statutory categories.  
	Claim 5 recites the following abstract ideas:
	wherein normalizing each respective value of the K values along each respective dimension of the plurality of dimensions of the attention matrix with respect to the K values of the respective dimension by way of the softmax function causes the plurality of slot vectors to compete with one another for representing entities contained in the perceptual representation (mental process directed to the result from the normalization of the attention matrix)
	Claim 5 does not recite any additional elements.
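The softmax of claim 4 and the competition effect of claim 5 can be illustrated together. In this minimal sketch (the function name and the example scores are hypothetical), the K weights for a single feature vector always sum to 1, so raising one slot's score necessarily lowers the weights assigned to the other slots.

```python
import math


def softmax(scores):
    """Return exp(s_k) / sum_j exp(s_j) for each of the K scores."""
    peak = max(scores)  # shifting by the maximum leaves the result unchanged
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# One feature vector's scores against K = 3 slots, before and after
# slot 0's score increases.
before = softmax([1.0, 1.0, 1.0])
after = softmax([3.0, 1.0, 1.0])
```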

Dependent claim 6 is directed to a method, and falls into one of the four statutory categories.  
	Claim 6 recites the following abstract ideas:
	determining a product of (i) the plurality of feature vectors transformed by the value function and (ii) a transpose of the attention matrix (mathematical concept directed to mathematical calculations of a dot product)
	Claim 6 does not recite any additional elements.

Dependent claim 7 is directed to a method, and falls into one of the four statutory categories.  
	Claim 7 recites the following abstract ideas:
	wherein the plurality of feature vectors comprises N feature vectors, wherein the plurality of dimensions is a first plurality of dimensions comprising N dimensions, (mental process of the description of the rows and columns of a matrix)
	wherein determining the update matrix (mathematical concept directed to mathematical calculations of an update value of a matrix) comprises: 
	determining an attention weight matrix by dividing (i) each respective value of N values along each respective dimension of a second plurality of dimensions of the attention matrix by (ii) a sum of the N values along the respective dimension of the second plurality of dimensions; (mathematical concept directed to mathematical calculations of an attention weight matrix) and
	determining a product of (i) the plurality of feature vectors transformed by the value function and (ii) a transpose of the attention weight matrix. (mathematical concept directed to mathematical calculations of a dot product)
	Claim 7 does not recite any additional elements.
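Claim 6 describes the update matrix as a product of the transformed values with a transpose of the attention matrix, and claim 7 first divides each group of N values by its sum to form an attention weight matrix. The sketch below assumes the attention matrix is stored with N rows (features) and K columns (slots), reads the "second plurality of dimensions" of claim 7 as those K columns, and orders the product so that the result is K x D; these readings and the function names are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def weighted_mean_updates(attn, values):
    """attn: N x K matrix already normalized over K; values: N x D matrix."""
    # Sum the N values in each slot's column of the attention matrix.
    column_sums = [sum(column) for column in zip(*attn)]
    # Divide each of those N values by its column sum (attention weight matrix).
    weights = [[a / s for a, s in zip(row, column_sums)] for row in attn]
    # Product of the transposed weights and the transformed values: K x D.
    return matmul(transpose(weights), values)


attn = [[0.5, 0.5], [1.0, 0.0]]   # N = 2 features, K = 2 slots
values = [[3.0], [6.0]]           # D = 1
updates = weighted_mean_updates(attn, values)
```

Here slot 0 receives the weighted mean (1/3)(3) + (2/3)(6) = 5, and slot 1 receives 3.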

Dependent claim 8 is directed to a method, and falls into one of the four statutory categories.  
	Claim 8 recites the following abstract ideas:
	wherein the plurality of feature vectors are represented by an input matrix comprising: (i) N rows corresponding to a number of the plurality of feature vectors and (ii) I columns corresponding to a number of dimensions of each of the plurality of feature vectors, (mental process of the description of the rows and columns of a matrix)
	wherein the plurality of slot vectors are represented by a slot matrix comprising: (i) K rows corresponding to a number of the plurality of slot vectors and (ii) S columns corresponding to a number of dimensions of each of the plurality of slot vectors, (mental process of the description of the rows and columns of a matrix)
	wherein the key function comprises a linear transformation represented by a key weight matrix comprising I rows and D columns, (mathematical concept directed to mathematical calculations of a function using a matrix with rows and columns)
	wherein the query function comprises a linear transformation represented by a query weight matrix comprising S rows and D columns, (mathematical concept directed to mathematical calculations of a function using a matrix with rows and columns)
	wherein the value function comprises a linear transformation represented by a value weight matrix comprising I rows and D columns, (mathematical concept directed to mathematical calculations of a function using a matrix with rows and columns) and 
	wherein one or more of the key weight matrix, the query weight matrix, or the value weight matrix are learned during training (mathematical concepts directed to results from calculations)
	Claim 8 does not recite any additional elements.

Dependent claim 9 is directed to a method, and falls into one of the four statutory categories.  
	Claim 9 recites the following abstract ideas:
	wherein determining the attention matrix based on the product comprises: determining a dot product of (i) the plurality of feature vectors transformed by the key function and (ii) a transpose of the plurality of slot vectors transformed by the query function; and dividing the dot product by a square root of D. (mathematical concept directed to mathematical calculations of a dot product)
	Claim 9 does not recite any additional elements.
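The matrix sizes recited in claim 8 and the scaling recited in claim 9 can be checked with the following sketch. Random values stand in for the weight matrices that the claims describe as learned during training, and the function names are hypothetical; the value weight matrix (also I x D) is omitted because it does not enter the attention logits.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def random_matrix(rows, cols, rng):
    """Placeholder for a learned weight matrix."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def scaled_attention_logits(features, slots, w_key, w_query):
    """Return (features W_k)(slots W_q)^T / sqrt(D), an N x K matrix."""
    keys = matmul(features, w_key)        # (N x I)(I x D) -> N x D
    queries = matmul(slots, w_query)      # (K x S)(S x D) -> K x D
    d = len(w_key[0])
    return [[v / math.sqrt(d) for v in row] for row in matmul(keys, transpose(queries))]


n, i, k, s, d = 5, 3, 2, 4, 8
rng = random.Random(0)
features = random_matrix(n, i, rng)   # input matrix: N rows, I columns
slots = random_matrix(k, s, rng)      # slot matrix: K rows, S columns
w_key = random_matrix(i, d, rng)      # key weights: I rows, D columns
w_query = random_matrix(s, d, rng)    # query weights: S rows, D columns
logits = scaled_attention_logits(features, slots, w_key, w_query)
```

For example, a single key (2, 2, 2, 2) and query (3, 3, 3, 3) have a dot product of 24, which is divided by the square root of D = 4 to give 12.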

Dependent claim 10 is directed to a method, and falls into one of the four statutory categories.  
	Claim 10 recites the following abstract ideas:
	wherein the plurality of slot vectors are permutation equivariant with respect to one another such that, for multiple different initializations of the plurality of slot vectors with respect to a given perceptual representation, a set of values of the plurality of slot vectors is approximately constant (mental process directed to the values of the vectors remaining constant) and 
	an order of the plurality of slot vectors is variable, (mental process directed to the order of the vectors being variable) and 
	wherein the plurality of slot vectors are permutation invariant with respect to the plurality of feature vectors such that, for multiple different permutations of the plurality of feature vector, the set of values of the plurality of slot vectors is approximately constant. (mental process directed to the values of the vectors remaining constant)
	Claim 10 does not recite any additional elements.
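The two permutation properties of claim 10 can be demonstrated with a simplified sketch that assumes identity key, query, and value functions and the column normalization of claim 7; function names and inputs are hypothetical. Reordering the feature vectors leaves the slot outputs unchanged, while reordering the slots reorders the outputs in the same way. The sketch does not model behavior across different random initializations.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    result = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def slot_update(features, slots):
    """One attention step, assuming identity key, query, and value functions."""
    attn = softmax_rows(matmul(features, transpose(slots)))   # N x K, normalized over slots
    column_sums = [sum(column) for column in zip(*attn)]
    weights = [[a / s for a, s in zip(row, column_sums)] for row in attn]
    return matmul(transpose(weights), features)               # K x D


def close(a, b, tol=1e-9):
    """Element-wise comparison of two matrices within a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
slots = [[1.0, 0.0], [0.0, 1.0]]
base = slot_update(features, slots)
features_permuted = slot_update(features[::-1], slots)   # reorder the feature vectors
slots_swapped = slot_update(features, slots[::-1])       # reorder the slot vectors
```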

Dependent claim 11 is directed to a method, and falls into one of the four statutory categories.  
	Claim 11 does not recite any abstract ideas.
	Claim 11 recites the following additional elements:
	wherein the perceptual representation comprises one or more of: two-dimensional image data, depth image data, point cloud data, time series data, audio data, or text data, (this is directed to data description of the input data and does not integrate the abstract idea into a practical application)
	wherein the perceptual representation is processed by way of one or more machine learning models to generate the plurality of feature vectors, (this is directed to data processing of the input data and does not integrate the abstract idea into a practical application) and 
	wherein the corresponding entity represented by the respective slot vector comprises one or more of: an object, a surface, a background, a waveform pattern, or one or more words (this is directed to data description of the input data and does not integrate the abstract idea into a practical application)
	Under Step 2B, claim 11 recites the following additional elements:
	wherein the perceptual representation comprises one or more of: two-dimensional image data, depth image data, point cloud data, time series data, audio data, or text data, (this is directed to data description of the input data and does not amount to significantly more, see MPEP 2106.05(d)(II))
	wherein the perceptual representation is processed by way of one or more machine learning models to generate the plurality of feature vectors, (this is directed to a high-level recitation of generic computer software and does not amount to significantly more than the judicial exception, see MPEP 2106.05(f)) and 
	wherein the corresponding entity represented by the respective slot vector comprises one or more of: an object, a surface, a background, a waveform pattern, or one or more words (this is directed to data description of the input data and does not amount to significantly more, see MPEP 2106.05(d)(II))

Dependent claim 12 is directed to a method, and falls into one of the four statutory categories.  
	Claim 12 recites the following abstract ideas:
	wherein each respective feature vector of the plurality of feature vectors comprises a position embedding that indicates a portion of the perceptual representation represented by the respective feature vector. (mental process directed to the position of a value)
	Claim 12 does not recite any additional elements.
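One common way to give each feature vector a position embedding is to append the normalized grid coordinates of the portion of the input it describes. The sketch below assumes a two-dimensional grid of features and simple concatenation; both the scheme and the function name are illustrative assumptions, not the applicant's disclosed embedding.

```python
def add_position_embedding(feature_grid):
    """Flatten an H x W grid of feature vectors, appending normalized (row, column) coordinates."""
    height = len(feature_grid)
    width = len(feature_grid[0])
    flattened = []
    for r, row in enumerate(feature_grid):
        for c, feature in enumerate(row):
            y = r / (height - 1) if height > 1 else 0.0
            x = c / (width - 1) if width > 1 else 0.0
            flattened.append(list(feature) + [y, x])
    return flattened


grid = [[[0.5], [0.7]], [[0.1], [0.9]]]   # 2 x 2 grid of one-dimensional features
features = add_position_embedding(grid)
```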

Dependent claim 13 is directed to a method, and falls into one of the four statutory categories.  
	Claim 13 does not recite any abstract ideas.
	Claim 13 recites the following additional elements:
	wherein the neural network memory unit comprises at least one of: (i) a gated recurrent unit (GRU) or (ii) a long-short term memory neural network (LSTM), (this limitation is directed to a conventional component used to process data and does not integrate the abstract idea into a practical application) and
	wherein one or more weights of the neural network memory unit are learned during training. (this limitation is directed to a description of data stored in a conventional component and does not integrate the abstract idea into a practical application)
	Under Step 2B, claim 13 recites the following additional elements:
	wherein the neural network memory unit comprises at least one of: (i) a gated recurrent unit (GRU) or (ii) a long-short term memory neural network (LSTM), (this limitation is directed to a high-level recitation of generic computer software and does not amount to significantly more than the judicial exception, see MPEP 2106.05(f)) and
	wherein one or more weights of the neural network memory unit are learned during training. (this is directed to data storage by generic computer equipment and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))

Dependent claim 14 is directed to a method, and falls into one of the four statutory categories.  
	Claim 14 does not recite any abstract ideas.
	Claim 14 recites the following additional elements:
	wherein updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit (this is directed to writing data to memory, which is well-understood and routine, and does not integrate the abstract idea into a practical application) comprises:
	processing the update matrix by way of the neural network memory unit; (this is directed to a high-level recitation of generic computer software executed by a generic computer component and does not integrate the abstract idea into a practical application) and
	updating the plurality of slot vectors by way of a feed-forward artificial neural network connected to an output of the neural network memory unit (this is directed to writing data to memory, which is well-understood and routine, and does not integrate the abstract idea into a practical application)
	Under Step 2B, claim 14 recites the following additional elements:
	wherein updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit (this is directed to writing data to memory, which is a well-understood, routine, conventional activity, and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv)) comprises:
	processing the update matrix by way of the neural network memory unit; (this is directed to a high-level recitation of generic computer software executed by a generic computer component and does not amount to significantly more than the judicial exception, see MPEP 2106.05(f)) and 
	updating the plurality of slot vectors by way of a feed-forward artificial neural network connected to an output of the neural network memory unit. (this is directed to writing data to memory, which is a well-understood, routine, conventional activity, and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))
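Claims 13 and 14 recite processing the update matrix with a GRU or LSTM and then a feed-forward network connected to its output. One possible arrangement is sketched below, assuming a GRU with PyTorch-style gate equations and no bias terms, and a two-layer ReLU network added residually to the GRU output. The class and function names, the random placeholder weights, and the residual connection are assumptions rather than details recited in the claims.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply an (out x in) matrix by a vector of length `in`."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows, cols, rng):
    """Placeholder for weights that the claims say are learned during training."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Minimal GRU cell using PyTorch-style gate equations, without biases."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.w = {g: random_matrix(hidden_dim, input_dim, rng) for g in "rzn"}
        self.u = {g: random_matrix(hidden_dim, hidden_dim, rng) for g in "rzn"}

    def __call__(self, x, h):
        wx = {g: matvec(self.w[g], x) for g in "rzn"}
        uh = {g: matvec(self.u[g], h) for g in "rzn"}
        r = [sigmoid(a + b) for a, b in zip(wx["r"], uh["r"])]   # reset gate
        z = [sigmoid(a + b) for a, b in zip(wx["z"], uh["z"])]   # update gate
        n = [math.tanh(a + ri * b) for a, ri, b in zip(wx["n"], r, uh["n"])]
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


class FeedForward:
    """Two-layer ReLU network added residually to the GRU output (an assumption)."""

    def __init__(self, dim, hidden_dim, rng):
        self.w1 = random_matrix(hidden_dim, dim, rng)
        self.w2 = random_matrix(dim, hidden_dim, rng)

    def __call__(self, h):
        hidden = [max(0.0, v) for v in matvec(self.w1, h)]
        return [hi + oi for hi, oi in zip(h, matvec(self.w2, hidden))]


def update_slots(slots, updates, gru, mlp):
    """Feed row k of the update matrix and slot k through the GRU, then the network."""
    return [mlp(gru(u, s)) for s, u in zip(slots, updates)]


rng = random.Random(0)
dim = 4
gru = GRUCell(dim, dim, rng)
mlp = FeedForward(dim, 8, rng)
slots = [[0.0] * dim for _ in range(2)]   # K = 2 slots
updates = [[1.0] * dim, [-1.0] * dim]     # K x D update matrix
new_slots = update_slots(slots, updates, gru, mlp)
```

In this arrangement the update matrix serves as the GRU input and the current slot vectors serve as its hidden state.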

Dependent claim 15 is directed to a method, and falls into one of the four statutory categories.  
	Claim 15 does not recite any abstract ideas.
	Claim 15 recites the following additional elements:
	performing, by one or more machine learning models, a supervised learning task based on the updated plurality of slot vectors, (this is directed to a high-level recitation of generic computer software and does not integrate the abstract idea into a practical application)
	wherein the one or more machine learning models are jointly trained with one or more of the key function, the query function, the value function, or the neural network memory unit to perform the supervised learning task (this is directed to a high-level recitation of generic computer software and does not integrate the abstract idea into a practical application)
	Under Step 2B, claim 15 recites the following additional elements:
	performing, by one or more machine learning models, a supervised learning task based on the updated plurality of slot vectors, (this is directed to a high-level recitation of generic computer software and does not amount to significantly more than the judicial exception, see MPEP 2106.05(f))
	wherein the one or more machine learning models are jointly trained with one or more of the key function, the query function, the value function, or the neural network memory unit to perform the supervised learning task (this is directed to a high-level recitation of generic computer software and does not amount to significantly more than the judicial exception, see MPEP 2106.05(f))
Dependent claim 16 is directed to a method, and falls into one of the four statutory categories.  
	Claim 16 does not recite any abstract ideas.
	Claim 16 recites the following additional elements:
	performing, by one or more machine learning models, an unsupervised learning task based on the updated plurality of slot vectors, (this is directed to a high-level recitation of generic computer software and does not integrate the abstract idea into a practical application)
	wherein the one or more machine learning models are jointly trained with one or more of the key function, the query function, the value function, or the neural network memory unit to perform the unsupervised learning task (this is directed to a high-level recitation of generic computer software and does not integrate the abstract idea into a practical application)
	Under Step 2B, claim 16 recites the following additional elements:
	performing, by one or more machine learning models, an unsupervised learning task based on the updated plurality of slot vectors, (this is directed to a high-level recitation of generic computer software and does not amount to significantly more than the judicial exception, see MPEP 2106.05(f))
	wherein the one or more machine learning models are jointly trained with one or more of the key function, the query function, the value function, or the neural network memory unit to perform the unsupervised learning task (this is directed to a high-level recitation of generic computer software and does not amount to significantly more than the judicial exception, see MPEP 2106.05(f))
Dependent claim 17 is directed to a method, and falls into one of the four statutory categories.  
	Claim 17 does not recite any abstract ideas.
	Claim 17 recites the following additional elements:
	wherein initializing the plurality of slot vectors comprises: initializing the plurality of slot vectors based on one or more of: (i) values selected from a normal distribution or (ii) values of one or more preceding slot vectors determined for a preceding perceptual representation processed before the perceptual representation, (this is directed to selecting particular data to be manipulated and does not integrate the abstract idea into a practical application)
	wherein initializing the plurality of slot vectors based on the values of the one or more preceding slot vectors causes the plurality of slot vectors to track entities across successive perceptual representations (this is directed to data storage in memory, which is well-understood and routine, and does not integrate the abstract idea into a practical application)
	Under Step 2B, claim 17 recites the following additional elements:
	wherein initializing the plurality of slot vectors comprises: initializing the plurality of slot vectors based on one or more of: (i) values selected from a normal distribution or (ii) values of one or more preceding slot vectors determined for a preceding perceptual representation processed before the perceptual representation, (this is directed to selecting particular data to be manipulated and does not amount to significantly more, see MPEP 2106.05(g))
	wherein initializing the plurality of slot vectors based on the values of the one or more preceding slot vectors causes the plurality of slot vectors to track entities across successive perceptual representations (this is directed to data storage in memory, which is a well-understood, routine, conventional activity, and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))
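The two initialization options of claim 17 can be sketched as follows. Drawing values from a standard normal distribution (mean 0, standard deviation 1) is an assumption, since the claim recites only "a normal distribution"; the function name is hypothetical, and the attention steps that would refine each frame's slots are omitted.

```python
import random


def init_slots(num_slots, dim, rng, previous=None):
    """Initialize K slot vectors for the current perceptual representation.

    If slots from a preceding representation are given, copy them so that each
    slot can keep tracking the same entity; otherwise draw each value from a
    normal distribution (mean 0 and standard deviation 1 are assumed here).
    """
    if previous is not None:
        return [list(slot) for slot in previous]
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_slots)]


rng = random.Random(0)
frame_1_slots = init_slots(3, 4, rng)                            # fresh random slots
frame_2_slots = init_slots(3, 4, rng, previous=frame_1_slots)    # carried over from frame 1
```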

Dependent claim 18 is directed to a method, and falls into one of the four statutory categories.  
	Claim 18 recites the following abstract ideas:
	wherein, when a number of slot vectors in the plurality of slot vectors exceeds a number of entities contained in the perceptual representation, (mental process directed to comparing values)
	values of one or more slot vector of the plurality of slot vectors are configured to indicate that the one or more slot vectors are unused, (mental process of assigning a value to a vector) and
	wherein when the number of entities contained in the perceptual representation exceeds the number of slot vectors in the plurality of slot vectors, at least one slot vector of the plurality of slot vectors is configured to represent multiple corresponding entities contained in the perceptual representation. (mental process directed to comparing values)
	Claim 18 does not recite any additional elements.
	 
Independent claim 19 is directed to a system, and falls into one of the four statutory categories.  
	Claim 19 recites the following abstract ideas:
	determining an attention matrix based on a product of (i) the plurality of feature vectors transformed by a key function and (ii) the plurality of slot vectors transformed by a query function, (mathematical concept directed to mathematical calculations of a dot product)
	wherein each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the attention matrix is normalized with respect to the plurality of values along the respective dimension; (mathematical concept directed to mathematical calculations using a softmax function to normalize the dimensions of the attention matrix)
	determining an update matrix based on (i) the plurality of feature vectors transformed by a value function and (ii) the attention matrix; (mathematical concept directed to mathematical calculations of a matrix based on a value function and an attention matrix) and
	Claim 19 recites the following additional elements:
	a processor; (this is directed to a high-level recitation of a generic computer component and does not integrate the abstract idea into a practical application) and
	receiving a perceptual representation comprising a plurality of feature vectors; (this limitation is directed to data transmission of feature vectors and does not integrate the abstract idea into a practical application)
	initializing a plurality of slot vectors represented by a neural network memory unit, (this is directed to data storage by generic computer equipment and does not integrate the abstract idea into a practical application)
	wherein each respective slot vector of the plurality of slot vectors is configured to represent a corresponding entity contained in the perceptual representation; (this is directed to data description of the input data and does not integrate the abstract idea into a practical application)
	updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit (this is directed to writing data to memory, which is well-understood and routine, and does not integrate the abstract idea into a practical application)
	Under Step 2B, claim 19 recites the following additional elements:
	a processor; (this is directed to a high-level recitation of a generic computer component and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))
	receiving a perceptual representation comprising a plurality of feature vectors; (this limitation is directed to data transmission of feature vectors and does not amount to significantly more, see MPEP 2106.05(d)(II)(i))
	initializing a plurality of slot vectors represented by a neural network memory unit, (this is directed to data storage by generic computer equipment and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))
	wherein each respective slot vector of the plurality of slot vectors is configured to represent a corresponding entity contained in the perceptual representation; (this is directed to data description of the input data and does not amount to significantly more, see MPEP 2106.05(d)(II))
	updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit (this is directed to writing data to memory, which is a well-understood, routine, conventional activity, and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))

 Independent claim 20 is directed to a system, and falls into one of the four statutory categories.  
	Claim 20 recites the following abstract ideas:
	determining an attention matrix based on a product of (i) the plurality of feature vectors transformed by a key function and (ii) the plurality of slot vectors transformed by a query function, (mathematical concept directed to mathematical calculations of a dot product)
	 wherein each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the attention matrix is normalized with respect to the plurality of values along the respective dimension; (mathematical concept directed to mathematical calculations using a softmax function to normalize the dimensions of the attention matrix)
	determining an update matrix based on (i) the plurality of feature vectors transformed by a value function and (ii) the attention matrix; (mathematical concept directed to mathematical calculations of matrix based on a value function and an attention matrix) and
	Claim 20 recites the following additional elements:
	receiving a perceptual representation comprising a plurality of feature vectors; (this limitation is directed to data transmission of feature vectors. This is directed to data transmission and does not integrate the abstract idea into a practical application)
	initializing a plurality of slot vectors represented by a neural network memory unit, (this is directed to data storage by a generic computer equipment and does not integrate the abstract idea into a practical application)
	wherein each respective slot vector of the plurality of slot vectors is configured to represent a corresponding entity contained in the perceptual representation; (this is directed to data description of the input data and does not integrate the abstract idea into a practical application)
	updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit. (this is directed writing data to memory which is a well understood routine and does not integrate the abstract idea into a practical application)
	Claim 20 recites the following additional elements:
	receiving a perceptual representation comprising a plurality of feature vectors; (this limitation is directed to data transmission of feature vectors. This is directed to data transmission and does not amount to significantly more, see MPEP 2106.05(d)(II)(i))
	initializing a plurality of slot vectors represented by a neural network memory unit, (this is directed to data storage by a generic computer equipment and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))
	wherein each respective slot vector of the plurality of slot vectors is configured to represent a corresponding entity contained in the perceptual representation; (this is directed to data description of the input data and does not amount to significantly more, see MPEP 2106.05(d)(II))
	updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit (this is directed to writing data to memory, which is a well-understood, routine, and conventional function, and does not amount to significantly more, see MPEP 2106.05(d)(II)(iv))
	
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


23.	Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
	The following limitation recited in claims 1, 19, and 20 is unclear: “each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the attention matrix …”. A person of ordinary skill in the art would understand a dimension of a matrix to be synonymous with its size, i.e., the number of rows and the number of columns of the matrix. It is therefore unclear what this limitation means, and in particular whether the Applicant is referring to the number of rows or the number of columns of the matrix.
	For the purpose of examination, the Office interprets “each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the attention matrix” as either the respective values along the rows or the respective values along the columns.
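	For illustration only, the two readings in the Office's interpretation (normalization along the rows or along the columns) can be contrasted with the minimal Python sketch below. It assumes a softmax normalization, consistent with the Step 2A analysis above; the matrix values and function names are hypothetical and are not taken from the claims or the specification.

```python
import math


def softmax(values):
    """Normalize a list of scores so that they are positive and sum to 1."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def normalize_rows(matrix):
    """Reading 1: each value is normalized with respect to the values in its row."""
    return [softmax(row) for row in matrix]


def normalize_columns(matrix):
    """Reading 2: each value is normalized with respect to the values in its column."""
    columns = [softmax(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


# Hypothetical 2 x 3 matrix of raw attention scores.
scores = [[1.0, 2.0, 3.0],
          [0.5, 0.5, 4.0]]

by_rows = normalize_rows(scores)
by_cols = normalize_columns(scores)

print([round(sum(row), 6) for row in by_rows])        # each row sums to 1
print([round(sum(col), 6) for col in zip(*by_cols)])  # each column sums to 1
```

	The two results differ from each other, which is why the reading matters: under the row reading the values in each row sum to 1, while under the column reading the values in each column sum to 1.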
	The following limitation recited in claim 2 is unclear: “… each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the second attention matrix …”. A person of ordinary skill in the art would understand a dimension of a matrix to be synonymous with its size, i.e., the number of rows and the number of columns of the matrix. It is therefore unclear what this limitation means, and in particular whether the Applicant is referring to the number of rows or the number of columns of the matrix.
	For the purpose of examination, the Office interprets “each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the second attention matrix” as either the respective values along the rows or the respective values along the columns.
	The following limitation recited in claims 4 and 5 is unclear: “… each respective value of the K values along each respective dimension of the plurality of dimensions of the attention matrix with respect to the K values along the respective dimension …”. A person of ordinary skill in the art would understand a dimension of a matrix to be synonymous with its size, i.e., the number of rows and the number of columns of the matrix. It is therefore unclear what this limitation means, and in particular whether the Applicant is referring to the number of rows or the number of columns of the matrix.
	For the purpose of examination, the Office interprets “each respective value of the K values along each respective dimension of the plurality of dimensions of the attention matrix with respect to the K values along the respective dimension” as either the respective values along the rows or the respective values along the columns.
	The following limitation recited in claim 7 is unclear: “… each respective value of N values along each respective dimension of a second plurality of dimensions of the attention matrix …”. A person of ordinary skill in the art would understand a dimension of a matrix to be synonymous with its size, i.e., the number of rows and the number of columns of the matrix. It is therefore unclear what this limitation means, and in particular whether the Applicant is referring to the number of rows or the number of columns of the matrix.
	For the purpose of examination, the Office interprets “each respective value of N values along each respective dimension of a second plurality of dimensions of the attention matrix” as either the respective values along the rows or the respective values along the columns.
	The following limitation recited in claim 5 is unclear: “normalizing … causes the plurality of slot vectors to compete with one another for representing entities contained in the perceptual representation”. Paragraphs [0023], [0065], [0105], and [0109] of the published instant specification (US20210383199) mention slot vectors competing with one another but do not clarify what that competition means.
	The following limitation recited in claim 15 is unclear: “… wherein the one or more machine learning models are jointly trained with one or more of the key function, the query function, the value function, or the neural network memory unit to perform the supervised learning task”. A person of ordinary skill in the art would understand that joint training requires two or more machine learning models to be jointly trained with the functions to perform the supervised learning task. It is unclear how a single machine learning model is jointly trained with the functions to perform the supervised learning task.
	The following limitation recited in claim 16 is unclear: “… wherein the one or more machine learning models are jointly trained with one or more of the key function, the query function, the value function, or the neural network memory unit to perform the unsupervised learning task”. A person of ordinary skill in the art would understand that joint training requires two or more machine learning models to be jointly trained with the functions to perform the unsupervised learning task. It is unclear how a single machine learning model is jointly trained with the functions to perform the unsupervised learning task.
	Claims not specifically mentioned above are rejected due to their dependency on a rejected claim.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



24.	Claims 1, 2, 11-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Herdade et al. (US20210201044, filed 12/30/2019) in view of Byeon et al. (US20210089867, filed 9/24/2019), and further in view of Nashida et al. (US20220043972, filed 12/17/2019).

	Regarding claim 1, Herdade teaches a computer-implemented method comprising: receiving a perceptual representation comprising a plurality of feature vectors; (receiving appearance features 618 (e.g., comprising an n-dimensional appearance feature vector [0106], Fig. 7); input 800 can comprise feature vectors, Fig. 8 [0114]; Reference is made to FIG. 5, which provides an example of digital image 502 input [0100]. Examiner notes that the image is a perceptual representation)
	initializing a plurality of slot vectors (appearance features 618 are input to projection layer 702 … Projection layer 702 receives appearance features 618 and generates a set of tokens 704 [0109] Fig. 7; embedded appearance feature vectors which are also referred to herein as input tokens, e.g., N input tokens corresponding to N objects detected in image 602 [0112]; the ORT's encoder comprises multiple layers comprising an initial layer (referred to herein as a projection layer of the encoder) [0054]; Examiner notes that the generated tokens are slot vectors, similar to the slot vectors generated by the slot attention module)
	represented by a neural network memory unit, (the ORT's encoder comprises multiple layers comprising an initial layer (referred to herein as a projection layer 702 of the encoder) [0054] Fig. 7. Examiner notes that the encoder of Fig. 7 (as the memory unit) is synonymous with the slot attention module including a memory unit) 
	wherein each respective slot vector of the plurality of slot vectors is configured to represent a corresponding entity contained in the perceptual representation; (…, the tokens are a set of word embeddings summed with their respective positional encodings, where the embeddings correspond to the words of the partial image caption generated so far by decoder 614 [0146]. Examiner notes that the token word embeddings are slot vectors and that the corresponding entities represented by the respective slot vectors are words contained in the image)
	determining an attention matrix (appearance-based attention weight 902, Fig. 9; In accordance with some embodiments, encoder-decoder attention layer 1110 makes use of an attention weight matrix, keys and values from the final output of the encoder, and the queries from the decoder [0144]; In addition, decoder layer 1104 comprises an encoder-decoder attention layer 1110, which assists the decoder to focus on relevant objects using the attention weight matrix (e.g., matrix 902 of FIG. 9) [0143]; Each attention head (of the self-attention layer 804) determines a set of vectors comprising a query Q, key K and value V for each of the N tokens [0116])
	based on a product of (i) the plurality of feature vectors transformed by a key function and (ii) the plurality of slot vectors transformed by a query function, (By way of a non-limiting example, an appearance-based attention weight ωA mn for an ordered pair of objects (e.g., the mth and nth objects) can be determined by determining a query vector and a key vector for each object of the ordered pair, determining a score for the ordered pair of objects (e.g., the mth and nth objects) by combining (e.g., combining using a dot product) the mth object's query vector and the nth object's key vector, and then dividing the result by a constant, …  the query vector determination for an object can comprise combining (e.g., via matrix multiplication) the object's n-dimensional appearance feature vector and the learned query matrix WQ. In accordance with one or more embodiments, the key vector determination for an object can comprise combining (e.g., via matrix multiplication) the object's n-dimensional appearance feature vector and the learned key matrix WK [0121]) 
	wherein each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the attention matrix (each respective value among the multiple cell values of matrix 902, along each respective row of the plurality of rows of the appearance-based attention weight matrix 902, Fig. 9)
	is normalized with respect to the plurality of values along the respective dimension; (In accordance with some embodiments, encoder-decoder attention layer 1110 makes use of an attention weight matrix, keys and values from the final output of the encoder, and the queries from the decoder… following a standard procedure of doing inner products of queries by keys and normalizing, with the difference that the queries come from the decoder, and the keys from the encoder [0144]; In addition, decoder layer 1104 comprises an encoder-decoder attention layer 1110, which assists the decoder to focus on relevant objects using the attention weight matrix (e.g., matrix 902 of FIG. 9) [0143]; This implies the attention weight matrix used by encoder-decoder attention layer 1110 is normalized) 
	Herdade does not explicitly teach: determining an update matrix based on (i) the plurality of feature vectors transformed by a value function and (ii) the attention matrix; updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit; or initializing a plurality of slot vectors.
	Byeon teaches receiving a perceptual representation comprising a plurality of feature vectors; (The input sequence may be any sequence of data, such as a sequence of video frames [0019]; In the context of video frames, computing attention using the hidden states (instead of input frames) takes into account the spatio-temporal context of each frame in addition to the pixel-level information in the frame itself, which is more suitable for computing frame-level attention for videos [0023]. Examiner notes that a frame, which is a single image of a video, is the perceptual representation, and that the pixels of the image are feature vectors)
	neural network memory unit (The history recurrent neural network 202 is a LSTM networks [0029]; one being a History LSTM (H-LSTM) [0035]; the H-LSTM may be formulated to include an attention mechanism (HistAtt) as shown in Equation 2 below [0037] that outputs HistAtt(Hk-1, Hk-m:k-2), Fig. 3, which is updated by Update recurrent neural network 204, Fig. 2)
	determining an attention matrix (This attention mechanism can be formulated as Att(Q, K, V)=softmax(WQQ·WQK)·WQV. It consists of queries (Q), keys (K) and values (V) [0041] Fig. 3)
	based on a product of (i) the plurality of feature vectors transformed by a key function and (ii) the plurality of slot vectors transformed by a query function, (It computes the dot products of the queries and the keys … The queries, keys, and values can be optionally transformed by the WQ, WK, and WV [0041] Fig. 3. Examiner notes WK (as key weight matrix) is an example of a key function and WQ (as query weight matrix) is an example of a query function)
	wherein each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the attention matrix is normalized with respect to the plurality of values along the respective dimension; (Finally, the values (V) are weighted by the outputs of the softmax function. The queries, keys, and values can be optionally transformed by the WQ, WK, and WV matrices [0041] Fig. 3; Examiner notes that the attention matrix is normalized by the softmax function)
	determining an update matrix (Going back to the dual recurrent neural network architecture system 200, U-LSTM updates the states Hk and Ck for the time step k, given the input Xk, previous cell state Ck-1, and the output of the H-LSTM Hk-1′, by replacing Hk-1 with Hk-1′ in Equation 1 [0044] Fig. 3; … a loss function may be utilized to train the history recurrent neural network and/or the update recurrent neural network [0053])
	 based on (i) the plurality of feature vectors transformed by a value function (Finally, the values (V) are weighted by the outputs of the softmax function. The values can be optionally transformed by the WV matrix [0041]. Examiner notes Wv (as value weight matrix) is an example of a value function) and 
	(ii) the attention matrix; (determine an update matrix by computing a second dot product of the attention matrix (output of dot product of the queries and the keys) and the value matrix, Fig. 3) and 
	updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit. (The history recurrent neural network 202 is a LSTM networks [0029]; one being a History LSTM (H-LSTM) [0035]; the H-LSTM may be formulated to include an attention mechanism (HistAtt) as shown in Equation 2 below [0037] that outputs HistAtt(Hk-1, Hk-m:k-2), Fig. 3, which is updated by Update recurrent neural network 204, Fig. 2)
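	For reference, the dot-product attention described in Byeon [0041] (dot products of the transformed queries and keys, a softmax, and a weighting of the transformed values by the softmax outputs) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it uses row vectors, applies the softmax along each row of the score matrix, omits the division by a constant described in Herdade [0121], and uses hypothetical helper names, weight matrices, and input values that do not come from either reference.

```python
import math


def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (lists of lists)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Apply a softmax to each row so that every row sums to 1."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(queries, keys, values, w_q, w_k, w_v):
    """Return (attention matrix, update matrix), row-vector form of softmax(Q Wq (K Wk)^T) V Wv."""
    q = matmul(queries, w_q)  # queries transformed by the (hypothetical) query weights
    k = matmul(keys, w_k)     # keys transformed by the (hypothetical) key weights
    v = matmul(values, w_v)   # values transformed by the (hypothetical) value weights
    attn = softmax_rows(matmul(q, transpose(k)))  # one row per query
    update = matmul(attn, v)  # second product: attention-weighted sum of the values
    return attn, update


# Hypothetical data: 2 query vectors and 3 key/value vectors, all 2-dimensional.
queries = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

attn, update = attention(queries, features, features, identity, identity, identity)
print(len(attn), len(attn[0]))      # 2 3
print(len(update), len(update[0]))  # 2 2
```

	Each row of `attn` sums to 1, and each row of `update` is one attention-weighted combination of the transformed values, i.e., the second product of the attention matrix and the value matrix relied on from Fig. 3 of Byeon.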
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Herdade to incorporate the method of Byeon for the benefit of using recurrent neural networks for future state prediction (Byeon [0001]) and for modeling long-term dependencies in sequential data (Byeon [0017]).
	Nashida teaches initializing a plurality of slot vectors (vectors were initialized using a normal distribution [0214])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Herdade to incorporate the method of Nashida for the benefit of using a neural network to calculate an attention matrix from the vector sequences (Nashida [0069]).

	Regarding claim 2, Modified Herdade teaches the computer-implemented method of claim 1, Herdade teaches further comprising: determining a second attention matrix (geometry-based attention weight matrix 904, Fig. 9)
	 based on a product of (i) the plurality of feature vectors transformed by the key function and (ii) the plurality of updated slot vectors transformed by the query function, (By way of a non-limiting example, an appearance-based attention weight ωA mn for an ordered pair of objects (e.g., the mth and nth objects) can be determined by determining a query vector and a key vector for each object of the ordered pair, determining a score for the ordered pair of objects (e.g., the mth and nth objects) by combining (e.g., combining using a dot product) the mth object's query vector and the nth object's key vector, and then dividing the result by a constant, …  the query vector determination for an object can comprise combining (e.g., via matrix multiplication) the object's n-dimensional appearance feature vector and the learned query matrix WQ. In accordance with one or more embodiments, the key vector determination for an object can comprise combining (e.g., via matrix multiplication) the object's n-dimensional appearance feature vector and the learned key matrix WK [0121]) 
	wherein each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the second attention matrix (each respective value among the multiple cell values of matrix 904, along each respective row of the plurality of rows of the geometry-based attention weight matrix 904, Fig. 9)
	is normalized with respect to the plurality of values along the respective dimension of the second attention matrix; (In accordance with some embodiments, encoder-decoder attention layer 1110 makes use of an attention weight matrix, keys and values from the final output of the encoder, and the queries from the decoder… following a standard procedure of doing inner products of queries by keys and normalizing, with the difference that the queries come from the decoder, and the keys from the encoder [0144])
	Byeon teaches determining a second update matrix (Going back to the dual recurrent neural network architecture system 200, U-LSTM updates the states Hk and Ck for the time step k, given the input Xk, previous cell state Ck-1, and the output of the H-LSTM Hk-1′, by replacing Hk-1 with Hk-1′ in Equation 1 [0044] Fig. 3; … a loss function may be utilized to train the history recurrent neural network and/or the update recurrent neural network [0053])
	based on (i) the plurality of feature vectors transformed by the value function (Finally, the values (V) are weighted by the outputs of the softmax function. The values can be optionally transformed by the WV matrix [0041]. Examiner notes Wv (as value weight matrix) is an example of a value function) and
	(ii) the second attention matrix; (determine an update matrix by computing a second dot product of the attention matrix (output of dot product of the queries and the keys) and the value matrix, Fig. 3) and
	 further updating the plurality of updated slot vectors based on the second update matrix by way of the neural network memory unit (Going back to the dual recurrent neural network architecture system 200, U-LSTM updates the states Hk and Ck for the time step k, given the input Xk, previous cell state Ck-1, and the output of the H-LSTM Hk-1′, by replacing Hk-1 with Hk-1′ in Equation 1 [0044] Fig. 3; … a loss function may be utilized to train the history recurrent neural network and/or the update recurrent neural network [0053])
	The same motivation to combine independent claim 1 applies here.

	Regarding claim 8, Modified Herdade teaches the computer-implemented method of claim 1, Herdade teaches wherein the plurality of feature vectors are represented by an input matrix comprising: (i) N rows corresponding to a number of the plurality of feature vectors and (ii) I columns corresponding to a number of dimensions of each of the plurality of feature vectors, (Each attention head (of the self-attention layer 804) determines a set of vectors comprising a query Q, key K and value V for each of the N tokens. An exemplary expression follows: Q=XWQ, K=XWK, V=XWv, where X comprises the N input vectors (e.g., x1 . . . xn) stacked into a matrix [0116-0117]. Examiner notes that the N input vectors (e.g., x1 . . . xn) stacked into a matrix comprise rows and columns)
	wherein the plurality of slot vectors (embedded appearance feature vectors which are also referred to herein as input tokens, e.g., N input tokens corresponding to N objects detected in image 602 [0112])
	are represented by a slot matrix (matrix 908, Fig. 9) comprising: 
	(i) K rows corresponding to a number of the plurality of slot vectors (matrix 908 has 4 rows, Fig. 9) and 
	(ii) S columns corresponding to a number of dimensions of each of the plurality of slot vectors, (matrix 908 has 4 columns, Fig. 9) 
	wherein the key function comprises a linear transformation represented by a key weight matrix comprising I rows and D columns, wherein the query function comprises a linear transformation represented by a query weight matrix comprising S rows and D columns, wherein the value function comprises a linear transformation represented by a value weight matrix comprising I rows and D columns, (key matrix WK [0121] is a key weight matrix WKey, which is synonymous with the key function; query matrix WQ [0121] is a query weight matrix WQUERY, which is synonymous with the query function; and value matrix Wv [0122] is a value weight matrix Wvalue, which is synonymous with the value function) and 
	wherein one or more of the key weight matrix, the query weight matrix, or the value weight matrix are learned during training (WQ, WK, Wv are learned projection matrices [0117])
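	The matrix dimensions recited in claim 8 can be checked with a short Python sketch. Assuming a row-vector convention, an N x I input matrix times an I x D key weight matrix yields N x D keys, a K x S slot matrix times an S x D query weight matrix yields K x D queries, and the keys times the transposed queries give an N x K attention matrix. The sizes, the zero-filled matrices, the helper names, and the orientation of the update matrix (transposed attention matrix times the values, giving one row per slot) are hypothetical choices for illustration; normalization is omitted because only the shapes are of interest.

```python
def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def shape(m):
    return (len(m), len(m[0]))


def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix, checking the inner dimensions."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must agree")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


# Hypothetical sizes: N feature vectors of dimension I, K slots of dimension S, projection size D.
N, I, K, S, D = 4, 3, 2, 5, 6

inputs = zeros(N, I)   # input matrix: N rows x I columns
slots = zeros(K, S)    # slot matrix: K rows x S columns
w_key = zeros(I, D)    # key weight matrix: I rows x D columns
w_query = zeros(S, D)  # query weight matrix: S rows x D columns
w_value = zeros(I, D)  # value weight matrix: I rows x D columns

keys = matmul(inputs, w_key)              # N x D
queries = matmul(slots, w_query)          # K x D
values = matmul(inputs, w_value)          # N x D
attn = matmul(keys, transpose(queries))   # N x K attention matrix
update = matmul(transpose(attn), values)  # K x D, one row per slot (assumed orientation)

print(shape(attn), shape(update))  # (4, 2) (2, 6)
```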

	Regarding claim 11, Modified Herdade teaches the computer-implemented method of claim 1, Herdade teaches wherein the perceptual representation comprises one or more of: two-dimensional image data, depth image data, point cloud data, time series data, audio data, or text data, wherein the perceptual representation is processed by way of one or more machine learning models to generate the plurality of feature vectors, (input 800 can comprise vectors [0114] Fig. 8; FIG. 5, which provides an example of digital image 502 input [0100]; input 800 can comprise feature vectors … generated by a previous encoder layer [0114]) and 
	wherein the corresponding entity represented by the respective slot vector comprises one or more of: an object, a surface, a background, a waveform pattern, or one or more words (…, the tokens are a set of word embeddings summed with their respective positional encodings, where the embeddings correspond to the words of the partial image caption generated so far by decoder 614 [0146]. Examiner notes that the token word embeddings are slot vectors and that the corresponding entities represented by the respective slot vectors are words contained in the image.) 

	Regarding claim 12, Modified Herdade teaches the computer-implemented method of claim 1, Herdade teaches wherein each respective feature vector of the plurality of feature vectors comprises a position embedding that indicates a portion of the perceptual representation represented by the respective feature vector (Examples of object spatial relationship information include relative position and relative size of objects depicted in digital image content [0005]; … , a learned positional embedding can be used [0130]; In accordance with one or more embodiments, positional encoding functions can be used with high-dimensional embedding in the decoder 614. [0141])

	Regarding claim 13, Modified Herdade teaches the computer-implemented method of claim 1, Byeon teaches wherein the neural network memory unit comprises at least one of: (i) a gated recurrent unit (GRU) or (ii) a long-short term memory neural network (LSTM), (In various embodiments, the history recurrent neural network may be a long short-term memory (LSTM) network, a convolutional long short-term memory (ConvLSTM) network, a gated recurrent unit (GRU) network [0021]) and
	 wherein one or more weights of the neural network memory unit are learned during training. (By using LSTM connectivities across the temporal and spatial dimensions, each processing layer covers the entire context in a video or other input sequence. Weighted blending, directional weight sharing (DWS), and two skip connections between the layers 1-3 and 2-4 may also be included [0048]; The weighted blending layer learns the relative importance of each direction during training [0049]) 
	The same motivation to combine independent claim 1 applies here.

	Regarding claim 14, Modified Herdade teaches the computer-implemented method of claim 1, Herdade teaches wherein updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit comprises: processing the update matrix by way of the neural network memory unit; (output of the last encoder layer (e.g., encoder layer 6 corresponding to element 708 of Fig. 7) of encoder 612 [0140]); and 
	updating the plurality of slot vectors by way of a feed-forward artificial neural network connected to an output of the neural network memory unit (the output of the encoder is also used as input when generating each word [0141]; output of last encoder layer 6 is received by decoder, Fig. 10, which comprises feedforward neural layer 1112, Fig. 11; Examiner notes that the encoder of Fig. 7 (as the memory unit) is synonymous with the slot attention module including a memory unit)
	Byeon teaches neural network memory unit (The history recurrent neural network 202 is a LSTM networks [0029]; one being a History LSTM (H-LSTM) [0035]; the H-LSTM may be formulated to include an attention mechanism (HistAtt) as shown in Equation 2 below [0037] that outputs HistAtt(Hk-1, Hk-m:k-2), Fig. 3, which is updated by Update recurrent neural network 204, Fig. 2)
	The same motivation to combine independent claim 1 applies here.

	Regarding claim 15, Modified Herdade teaches the computer-implemented method of claim 1, further comprising: Byeon teaches performing, by one or more machine learning models, a supervised learning task based on the updated plurality of slot vectors, (In at least one embodiment, training may be performed in either a supervised manner [0068])
	wherein the one or more machine learning models are jointly trained with one or more of the key function, the query function, the value function, or the neural network memory unit to perform the supervised learning task. (The present disclosure provides a recurrent neural network architecture involving two recurrent neural networks that operate in combination to model long-term dependencies in sequential data [0017]; The HistAtt unit uses a dot-product-based self-attention mechanism. This attention mechanism can be formulated as Att(Q, K, V)=softmax(WQQ·WQK)·WQV. It consists of queries (Q), keys (K) and values (V) [0041])
	The same motivation to combine independent claim 1 applies here.

	Regarding claim 16, Modified Herdade teaches the computer-implemented method of claim 1, further comprising: Byeon teaches performing, by one or more machine learning models, an unsupervised learning task based on the updated plurality of slot vectors, (In at least one embodiment, training may be performed in a unsupervised manner [0068]) 
	wherein the one or more machine learning models are jointly trained with one or more of the key function, the query function, the value function, or the neural network memory unit to perform the unsupervised learning task.  (The present disclosure provides a recurrent neural network architecture involving two recurrent neural networks that operate in combination to model long-term dependencies in sequential data [0017]; The HistAtt unit uses a dot-product-based self-attention mechanism. This attention mechanism can be formulated as Att(Q, K, V)=softmax(WQQ·WQK)·WQV. It consists of queries (Q), keys (K) and values (V) [0041])
	The same motivation to combine independent claim 1 applies here.

	Regarding claim 17, Modified Herdade teaches the computer-implemented method of claim 1, Herdade teaches wherein initializing the plurality of slot vectors based on the values of the one or more preceding slot vectors causes the plurality of slot vectors to track entities across successive perceptual representations (the decoder 614 generates one word of the output caption (e.g., caption 616) at a time, successively from left to right. When the ORT 610 is generating a word at a given position, all of the words that have been generated so far (from previous positions) are fed as input [0141]; user profiles specific to a user may be generated to model user behavior, for example, by tracking a user's path through a web site or network of sites [0073])  
	Nashida teaches wherein initializing the plurality of slot vectors comprises: initializing the plurality of slot vectors based on one or more of: (i) values selected from a normal distribution or (ii) values of one or more preceding slot vectors determined for a preceding perceptual representation processed before the perceptual representation, (vectors were initialized using a normal distribution [0214]. Examiner notes the values are selected from normal distribution)
	The same motivation to combine independent claim 1 applies here.
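	The two initialization options recited in claim 17 can be sketched in Python as follows: slot values drawn from a normal distribution (as in Nashida [0214]), or values carried over from the slot vectors determined for the preceding perceptual representation. This is a minimal sketch; the function name, the mean, the standard deviation, and the seed are hypothetical choices for illustration.

```python
import random


def initialize_slots(num_slots, slot_dim, previous_slots=None, mean=0.0, std=1.0, seed=None):
    """Return num_slots slot vectors of length slot_dim.

    If slot vectors from a preceding perceptual representation are supplied,
    copies of them are returned (option (ii) of claim 17); otherwise every value
    is drawn from a normal distribution (option (i) of claim 17).
    """
    if previous_slots is not None:
        return [list(slot) for slot in previous_slots]
    rng = random.Random(seed)
    return [[rng.gauss(mean, std) for _ in range(slot_dim)] for _ in range(num_slots)]


frame_1_slots = initialize_slots(3, 4, seed=0)                        # sampled from N(0, 1)
frame_2_slots = initialize_slots(3, 4, previous_slots=frame_1_slots)  # carried over
print(frame_2_slots == frame_1_slots)  # True
```

	Carrying the slot values over from one perceptual representation to the next is the mechanism the claim ties to tracking entities across successive perceptual representations; the sketch only shows the copy, not any subsequent update.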

	Regarding claim 18, Modified Herdade teaches the computer-implemented method of claim 1, Herdade teaches wherein, when a number of slot vectors in the plurality of slot vectors exceeds a number of entities contained in the perceptual representation, (image content from one or more regions outside an object's bounding box can be used in determining intermediate features that lie within the bounding box [0053])
	values of one or more slot vector of the plurality of slot vectors are configured to indicate that the one or more slot vectors are unused, (a class prediction probability associated with each bounding box can be used to determine whether to retain or discard a proposed bounding box [0102]) and 
	wherein when the number of entities contained in the perceptual representation exceeds the number of slot vectors in the plurality of slot vectors, (image content from one or more regions outside an object's bounding box can be used in determining intermediate features that lie within the bounding box [0053])
	at least one slot vector of the plurality of slot vectors is configured to represent multiple corresponding entities contained in the perceptual representation (In addition, the process involves using the appearance and geometry features to generate encoded output which is then decoded in order to automatically generate a caption (comprising a sequence of words) for the digital content item [0096]; “translate” a set of objects detected in a digital content item to generate a caption comprising a sequence of words [0148]. Examiner notes that the words (entities) are contained in the digital content item, which is an image (the perceptual representation))

	Regarding claim 19, Herdade teaches a system comprising: a processor; and a non-transitory computer-readable storage medium having stored thereon instruction that, when executed by the processor, cause the processor to perform operations (In accordance with one or more embodiments, program code (or program logic) executed by a processor(s) of a computing device to implement functionality in accordance with one or more such embodiments is embodied in, by and/or on a non-transitory computer-readable medium [0015]) comprising: 
	receiving a perceptual representation comprising a plurality of feature vectors; (receiving appearance features 618 (e.g., comprising an n-dimensional appearance feature vector [0106], Fig. 7); input 800 can comprise feature vectors, Fig. 8 [0114]; Reference is made to FIG. 5, which provides an example of digital image 502 input [0100]. Examiner notes that the image is a perceptual representation)
	initializing a plurality of slot vectors (appearance features 618 are input to projection layer 702 … Projection layer 702 receives appearance features 618 and generates a set of tokens 704 [0109] Fig. 7. Examiner notes that the generated tokens are slot vectors, which are similar to the slot vectors generated by the slot attention module)
	represented by a neural network memory unit, (the ORT's encoder comprises multiple layers comprising an initial layer (referred to herein as a projection layer 702 of the encoder) [0054] Fig. 7. Examiner notes that the encoder of Fig. 7 (as the memory unit) is synonymous with the slot attention module including a memory unit)
	wherein each respective slot vector of the plurality of slot vectors is configured to represent a corresponding entity contained in the perceptual representation; (…, the tokens are a set of word embeddings summed with their respective positional encodings, where the embeddings correspond to the words of the partial image caption generated so far by decoder 614 [0146]. Examiner notes that the token word embeddings are slot vectors and that the corresponding entities represented by the respective slot vectors are words contained in the image)
	determining an attention matrix (appearance-based attention weight 902, Fig. 9; In accordance with some embodiments, encoder-decoder attention layer 1110 makes use of an attention weight matrix, keys and values from the final output of the encoder, and the queries from the decoder [0144]; In addition, decoder layer 1104 comprises an encoder-decoder attention layer 1110, which assists the decoder to focus on relevant objects using the attention weight matrix (e.g., matrix 902 of FIG. 9) [0143]; Each attention head (of the self-attention layer 804) determines a set of vectors comprising a query Q, key K and value V for each of the N tokens [0116])
	based on a product of (i) the plurality of feature vectors transformed by a key function and (ii) the plurality of slot vectors transformed by a query function, (By way of a non-limiting example, an appearance-based attention weight $\omega^{A}_{mn}$ for an ordered pair of objects (e.g., the mth and nth objects) can be determined by determining a query vector and a key vector for each object of the ordered pair, determining a score for the ordered pair of objects (e.g., the mth and nth objects) by combining (e.g., combining using a dot product) the mth object's query vector and the nth object's key vector, and then dividing the result by a constant, … the query vector determination for an object can comprise combining (e.g., via matrix multiplication) the object's n-dimensional appearance feature vector and the learned query matrix WQ. In accordance with one or more embodiments, the key vector determination for an object can comprise combining (e.g., via matrix multiplication) the object's n-dimensional appearance feature vector and the learned key matrix WK [0121])
	wherein each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the attention matrix (each respective cell value among the plurality of values along each respective row of the appearance-based attention weight matrix 902, Fig. 9)
	is normalized with respect to the plurality of values along the respective dimension; (In accordance with some embodiments, encoder-decoder attention layer 1110 makes use of an attention weight matrix, keys and values from the final output of the encoder, and the queries from the decoder… following a standard procedure of doing inner products of queries by keys and normalizing, with the difference that the queries come from the decoder, and the keys from the encoder [0144]; In addition, decoder layer 1104 comprises an encoder-decoder attention layer 1110, which assists the decoder to focus on relevant objects using the attention weight matrix (e.g., matrix 902 of FIG. 9) [0143]; This implies that the attention weight matrix used by encoder-decoder attention layer 1110 is normalized)
	Herdade does not explicitly teach determining an update matrix based on (i) the plurality of feature vectors transformed by a value function and (ii) the attention matrix; updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit; and initializing a plurality of slot vectors.
	Byeon teaches receiving a perceptual representation comprising a plurality of feature vectors; (The input sequence may be any sequence of data, such as a sequence of video frames [0019]; In the context of video frames, computing attention using the hidden states (instead of input frames) takes into account the spatio-temporal context of each frame in addition to the pixel-level information in the frame itself, which is more suitable for computing frame-level attention for videos [0023]. Examiner notes that a frame, which is a single image of a video, is the perceptual representation and that the pixels of the image are the feature vectors)
	neural network memory unit (The history recurrent neural network 202 is a LSTM networks [0029]; one being a History LSTM (H-LSTM) [0035]; the H-LSTM may be formulated to include an attention mechanism (HistAtt) as shown in Equation 2 below [0037]. Examiner notes that the H-LSTM, which outputs HistAtt($H_{k-1}$, $H_{k-m:k-2}$) (Fig. 3), is updated by update recurrent neural network 204 (Fig. 2))
	determining an attention matrix (This attention mechanism can be formulated as $\mathrm{Att}(Q, K, V) = \mathrm{softmax}(W_Q Q \cdot W_K K) \cdot W_V V$. It consists of queries (Q), keys (K) and values (V) [0041] Fig. 3)
	based on a product of (i) the plurality of feature vectors transformed by a key function and (ii) the plurality of slot vectors transformed by a query function, (It computes the dot products of the queries and the keys … The queries, keys, and values can be optionally transformed by the WQ, WK, and WV [0041] Fig. 3. Examiner notes that WK (the key weight matrix) is an example of a key function and that WQ (the query weight matrix) is an example of a query function)
	wherein each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the attention matrix is normalized with respect to the plurality of values along the respective dimension; (Finally, the values (V) are weighted by the outputs of the softmax function. The queries, keys, and values can be optionally transformed by the WQ, WK, and WV matrices [0041] Fig. 3; Examiner notes that the attention matrix is normalized by the softmax function)
	determining an update matrix (Going back to the dual recurrent neural network architecture system 200, U-LSTM updates the states $H_k$ and $C_k$ for the time step k, given the input $X_k$, previous cell state $C_{k-1}$, and the output of the H-LSTM $H'_{k-1}$, by replacing $H_{k-1}$ with $H'_{k-1}$ in Equation 1 [0044] Fig. 3; … a loss function may be utilized to train the history recurrent neural network and/or the update recurrent neural network [0053])
	based on (i) the plurality of feature vectors transformed by a value function (Finally, the values (V) are weighted by the outputs of the softmax function. The values can be optionally transformed by the WV matrix [0041]. Examiner notes that WV (the value weight matrix) is an example of a value function) and
	(ii) the attention matrix; (the update matrix is determined by computing a second dot product of the attention matrix (the output of the dot product of the queries and the keys) and the value matrix, Fig. 3) and
	updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit. (The history recurrent neural network 202 is a LSTM networks [0029]; one being a History LSTM (H-LSTM) [0035]; the H-LSTM may be formulated to include an attention mechanism (HistAtt) as shown in Equation 2 below [0037]. Examiner notes that the H-LSTM, which outputs HistAtt($H_{k-1}$, $H_{k-m:k-2}$) (Fig. 3), is updated by update recurrent neural network 204 (Fig. 2))
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Herdade to incorporate the method of Byeon for the benefit of recurrent neural networks used for future state prediction (Byeon [0001]) and modelling long-term dependencies in sequential data (Byeon [0017]).
	Nashida teaches initializing a plurality of slot vectors (vectors were initialized using a normal distribution [0214])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Herdade to incorporate the method of Nashida for the benefit of using a neural network to calculate an attention matrix from the vector sequences (Nashida [0069]).
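	For illustration only, the operations recited in claim 19 can be traced in a short plain-Python sketch. It assumes row-major lists of vectors, models the key, query, and value functions as weight matrices (consistent with the WQ, WK, and WV matrices discussed above), arranges the attention matrix with one row per feature vector and one column per slot, and applies a softmax over the slots for each feature vector. All function names are hypothetical, and the neural network memory unit is replaced by a fixed convex blend (memory_unit) purely as a placeholder; the sketch is not the applicant's implementation or that of any cited reference.

```python
import math


def matmul(a, b):
    """Product of an (m x n) matrix and an (n x p) matrix, each a list of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_step(features, slots, w_k, w_q, w_v, eps=1e-8):
    """One hypothetical iteration over N feature vectors and K slot vectors.

    features: N x D_in, slots: K x D, w_k and w_v: D_in x D, w_q: D x D.
    Returns the attention matrix (N x K) and the update matrix (K x D).
    """
    keys = matmul(features, w_k)      # key function applied to the feature vectors
    queries = matmul(slots, w_q)      # query function applied to the slot vectors
    values = matmul(features, w_v)    # value function applied to the feature vectors
    scale = 1.0 / math.sqrt(len(queries[0]))
    logits = [[v * scale for v in row] for row in matmul(keys, transpose(queries))]
    attn = softmax_rows(logits)       # each of the N rows sums to 1 over the K slots
    # Divide each slot's column by its sum over the N feature vectors (weighted mean).
    col_sums = [sum(col) for col in zip(*attn)]
    weights = [[a / (s + eps) for a, s in zip(row, col_sums)] for row in attn]
    update = matmul(transpose(weights), values)  # K x D
    return attn, update


def memory_unit(slots, update, gate=0.5):
    """Placeholder for the memory unit: a fixed blend of old slot values and the update."""
    return [[(1.0 - gate) * s + gate * u for s, u in zip(s_row, u_row)]
            for s_row, u_row in zip(slots, update)]


def run(features, slots, w_k, w_q, w_v, iterations=3):
    """Repeat the attention step and slot update a fixed number of times."""
    for _ in range(iterations):
        _, update = attention_step(features, slots, w_k, w_q, w_v)
        slots = memory_unit(slots, update)
    return slots


# Usage with identity key, query and value functions and hypothetical data.
eye = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
initial_slots = [[1.0, 0.0], [0.0, 1.0]]
attn, update = attention_step(features, initial_slots, eye, eye, eye)
refined_slots = run(features, initial_slots, eye, eye, eye)
```

	In this example, the first slot draws its update mainly from the first two feature vectors and the second slot from the last two, because the softmax over slots assigns each feature vector mostly to the slot whose query it matches best.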

	Regarding claim 20, Herdade teaches a non-transitory computer-readable storage medium having stored thereon instruction that, when executed by a computing system, cause the computing system to perform operations (In accordance with one or more embodiments, program code (or program logic) executed by a processor(s) of a computing device to implement functionality in accordance with one or more such embodiments is embodied in, by and/or on a non-transitory computer-readable medium [0015]) comprising:
	receiving a perceptual representation comprising a plurality of feature vectors; (receiving appearance features 618 (e.g., comprising an n-dimensional appearance feature vector [0106], Fig. 7); input 800 can comprise feature vectors, Fig. 8 [0114]; Reference is made to FIG. 5, which provides an example of digital image 502 input [0100]. Examiner notes that the image is a perceptual representation)
	initializing a plurality of slot vectors (appearance features 618 are input to projection layer 702 … Projection layer 702 receives appearance features 618 and generates a set of tokens 704 [0109] Fig. 7. Examiner notes that the generated tokens are slot vectors, which are similar to the slot vectors generated by the slot attention module)
	represented by a neural network memory unit, (the ORT's encoder comprises multiple layers comprising an initial layer (referred to herein as a projection layer 702 of the encoder) [0054] Fig. 7. Examiner notes that the encoder of Fig. 7 (as the memory unit) is synonymous with the slot attention module including a memory unit)
	wherein each respective slot vector of the plurality of slot vectors is configured to represent a corresponding entity contained in the perceptual representation; (…, the tokens are a set of word embeddings summed with their respective positional encodings, where the embeddings correspond to the words of the partial image caption generated so far by decoder 614 [0146]. Examiner notes that the token word embeddings are slot vectors and that the corresponding entities represented by the respective slot vectors are words contained in the image)
	determining an attention matrix (appearance-based attention weight 902, Fig. 9; In accordance with some embodiments, encoder-decoder attention layer 1110 makes use of an attention weight matrix, keys and values from the final output of the encoder, and the queries from the decoder [0144]; In addition, decoder layer 1104 comprises an encoder-decoder attention layer 1110, which assists the decoder to focus on relevant objects using the attention weight matrix (e.g., matrix 902 of FIG. 9) [0143]; Each attention head (of the self-attention layer 804) determines a set of vectors comprising a query Q, key K and value V for each of the N tokens [0116])
	based on a product of (i) the plurality of feature vectors transformed by a key function and (ii) the plurality of slot vectors transformed by a query function, (By way of a non-limiting example, an appearance-based attention weight $\omega^{A}_{mn}$ for an ordered pair of objects (e.g., the mth and nth objects) can be determined by determining a query vector and a key vector for each object of the ordered pair, determining a score for the ordered pair of objects (e.g., the mth and nth objects) by combining (e.g., combining using a dot product) the mth object's query vector and the nth object's key vector, and then dividing the result by a constant, … the query vector determination for an object can comprise combining (e.g., via matrix multiplication) the object's n-dimensional appearance feature vector and the learned query matrix WQ. In accordance with one or more embodiments, the key vector determination for an object can comprise combining (e.g., via matrix multiplication) the object's n-dimensional appearance feature vector and the learned key matrix WK [0121])
	wherein each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the attention matrix (each respective cell value among the plurality of values along each respective row of the appearance-based attention weight matrix 902, Fig. 9)
	is normalized with respect to the plurality of values along the respective dimension; (In accordance with some embodiments, encoder-decoder attention layer 1110 makes use of an attention weight matrix, keys and values from the final output of the encoder, and the queries from the decoder… following a standard procedure of doing inner products of queries by keys and normalizing, with the difference that the queries come from the decoder, and the keys from the encoder [0144]; In addition, decoder layer 1104 comprises an encoder-decoder attention layer 1110, which assists the decoder to focus on relevant objects using the attention weight matrix (e.g., matrix 902 of FIG. 9) [0143]; This implies that the attention weight matrix used by encoder-decoder attention layer 1110 is normalized)
	Herdade does not explicitly teach determining an update matrix based on (i) the plurality of feature vectors transformed by a value function and (ii) the attention matrix; updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit; and initializing a plurality of slot vectors.
	Byeon teaches receiving a perceptual representation comprising a plurality of feature vectors; (The input sequence may be any sequence of data, such as a sequence of video frames [0019]; In the context of video frames, computing attention using the hidden states (instead of input frames) takes into account the spatio-temporal context of each frame in addition to the pixel-level information in the frame itself, which is more suitable for computing frame-level attention for videos [0023]. Examiner notes that a frame, which is a single image of a video, is the perceptual representation and that the pixels of the image are the feature vectors)
	neural network memory unit (The history recurrent neural network 202 is a LSTM networks [0029]; one being a History LSTM (H-LSTM) [0035]; the H-LSTM may be formulated to include an attention mechanism (HistAtt) as shown in Equation 2 below [0037]. Examiner notes that the H-LSTM, which outputs HistAtt($H_{k-1}$, $H_{k-m:k-2}$) (Fig. 3), is updated by update recurrent neural network 204 (Fig. 2))
	determining an attention matrix (This attention mechanism can be formulated as $\mathrm{Att}(Q, K, V) = \mathrm{softmax}(W_Q Q \cdot W_K K) \cdot W_V V$. It consists of queries (Q), keys (K) and values (V) [0041] Fig. 3)
	based on a product of (i) the plurality of feature vectors transformed by a key function and (ii) the plurality of slot vectors transformed by a query function, (It computes the dot products of the queries and the keys … The queries, keys, and values can be optionally transformed by the WQ, WK, and WV [0041] Fig. 3. Examiner notes that WK (the key weight matrix) is an example of a key function and that WQ (the query weight matrix) is an example of a query function)
	wherein each respective value of a plurality of values along each respective dimension of a plurality of dimensions of the attention matrix is normalized with respect to the plurality of values along the respective dimension; (Finally, the values (V) are weighted by the outputs of the softmax function. The queries, keys, and values can be optionally transformed by the WQ, WK, and WV matrices [0041] Fig. 3; Examiner notes that the attention matrix is normalized by the softmax function)
	determining an update matrix (Going back to the dual recurrent neural network architecture system 200, U-LSTM updates the states $H_k$ and $C_k$ for the time step k, given the input $X_k$, previous cell state $C_{k-1}$, and the output of the H-LSTM $H'_{k-1}$, by replacing $H_{k-1}$ with $H'_{k-1}$ in Equation 1 [0044] Fig. 3; … a loss function may be utilized to train the history recurrent neural network and/or the update recurrent neural network [0053])
	based on (i) the plurality of feature vectors transformed by a value function (Finally, the values (V) are weighted by the outputs of the softmax function. The values can be optionally transformed by the WV matrix [0041]. Examiner notes that WV (the value weight matrix) is an example of a value function) and
	(ii) the attention matrix; (the update matrix is determined by computing a second dot product of the attention matrix (the output of the dot product of the queries and the keys) and the value matrix, Fig. 3) and
	updating the plurality of slot vectors based on the update matrix by way of the neural network memory unit. (The history recurrent neural network 202 is a LSTM networks [0029]; one being a History LSTM (H-LSTM) [0035]; the H-LSTM may be formulated to include an attention mechanism (HistAtt) as shown in Equation 2 below [0037]. Examiner notes that the H-LSTM, which outputs HistAtt($H_{k-1}$, $H_{k-m:k-2}$) (Fig. 3), is updated by update recurrent neural network 204 (Fig. 2))
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Herdade to incorporate the method of Byeon for the benefit of recurrent neural networks used for future state prediction (Byeon [0001]) and modelling long-term dependencies in sequential data (Byeon [0017]).
	Nashida teaches initializing a plurality of slot vectors (vectors were initialized using a normal distribution [0214])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Herdade to incorporate the method of Nashida for the benefit of using a neural network to calculate an attention matrix from the vector sequences (Nashida [0069]).

25.	Claims 3, 7 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Herdade et al. (US20210201044 filed 12/30/2019) in view of Byeon et al. (US20210089867 filed 9/24/2019) in view of Nashida et al. (US20220043972 filed 12/17/2019) and further in view of Rosset et al. (US20210326742 filed 04/16/2020).

	Regarding claim 3, Modified Herdade teaches the computer-implemented method of claim 1. Herdade further teaches wherein each respective slot vector represents a semantic embedding of the corresponding entity, (the tokens are a set of word embeddings summed with their respective positional encodings, where the embeddings correspond to the words of the partial image caption generated so far by decoder 614 [0146]. Examiner notes that the token word embeddings are slot vectors and that the corresponding entities represented by the respective slot vectors are words contained in the image)
	the semantic embedding of the corresponding entity and binds the respective slot vector to the corresponding entity independently of a classification of the corresponding entity (In some embodiments, additional CNN layers can be applied to predict class labels and make bounding box [0101]; By way of a representation, the nth token (corresponding to the nth embedded appearance feature vector of the nth bounding box) can be referred to as $x_n$ in a set of N tokens [0112])
	Modified Herdade does not explicitly teach wherein updating the respective slot vector iteratively refines the semantic embedding of the corresponding entity.
	Rosset teaches wherein updating the respective slot vector iteratively refines (For example, given the first two tokens of the input query, the generative-type SGC 504 will predict the third token. In the next iteration of processing, however, the SGS 502 updates the input information to include the actual third token of the input query [0077]; a linguistic embedding mechanism 802 transforms the tokens in the input information into a set of input embeddings, also referred to herein as input vectors [0078])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Herdade to incorporate the method of Rosset for the benefit of a self-attention mechanism 808 which then linearly projects this matrix X into three matrices Q, K, V, corresponding to a query matrix, key matrix, and value matrix, respectively (Rosset [0082]), and a speech recognition component that converts the input signal into recognized speech information, e.g., using a deep neural network (DNN) of any type (Rosset [0046]).

	Regarding claim 7, Modified Herdade teaches the computer-implemented method of claim 1. Herdade further teaches wherein the plurality of feature vectors comprises N feature vectors, (the encoded output that is input to decoder 614 can comprise a set of feature vectors comprising a feature vector for each detected object [0107])
	wherein the plurality of dimensions is a first plurality of dimensions comprising N dimensions, (the plurality of rows of the appearance-based attention weight matrix 902 is a first plurality of rows comprising N rows, Fig. 9) and
	Byeon teaches wherein determining the update matrix (Going back to the dual recurrent neural network architecture system 200, U-LSTM updates the states $H_k$ and $C_k$ for the time step k, given the input $X_k$, previous cell state $C_{k-1}$, and the output of the H-LSTM $H'_{k-1}$, by replacing $H_{k-1}$ with $H'_{k-1}$ in Equation 1 [0044] Fig. 3; … a loss function may be utilized to train the history recurrent neural network and/or the update recurrent neural network [0053]) comprises:
	Modified Herdade does not explicitly teach determining an attention weight matrix by dividing (i) each respective value of N values along each respective dimension of a second plurality of dimensions of the attention matrix by (ii) a sum of the N values along the respective dimension of the second plurality of dimensions; and determining a product of (i) the plurality of feature vectors transformed by the value function and (ii) a transpose of the attention weight matrix.
	Rosset teaches determining an attention weight matrix by dividing (i) each respective value of N values along each respective dimension of a second plurality of dimensions of the attention matrix by (ii) a sum of the N values along the respective dimension of the second plurality of dimensions; (The add-and-normalize mechanism 810 adds the input to the self-attention mechanism 808 (i.e., the position-modified input embeddings) to the output result of the self-attention mechanism 808, and then performs layer-normalization on that sum [0084]; The self-attention mechanism 808 can determine the above-described cross-term relevance by packing the position-modified embeddings into a single matrix X [0082]. Examiner notes that the columns of matrix X correspond to the second plurality of dimensions) and
	determining a product of (A dot-product mechanism computes attention based on the equation: [0082])
	(i) the plurality of feature vectors transformed by the value function and (ii) a transpose of the attention weight matrix. ($\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$ [0082]. Examiner notes that in the equation, T denotes a transpose within the product of query Q and key K, which is the attention matrix, and that V is the transformation by the value function)
	The same motivation to combine set forth for dependent claim 3 applies here.
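	As a worked illustration of the difference between claim 6 (a product of the value-transformed feature vectors and the transpose of the attention matrix) and claim 7 (the same product after each of the N values along each slot dimension is divided by their sum), consider the hypothetical numbers below. The layout (N = 3 rows for feature vectors, K = 2 columns for slots, one-dimensional values) and the numbers are assumptions chosen so the arithmetic can be checked by hand.

```python
def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Product of an (m x n) matrix and an (n x p) matrix, each a list of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical attention matrix: N = 3 feature vectors (rows) x K = 2 slots (columns);
# each row already sums to 1 over the slots.
attn = [[0.5, 0.5],
        [1.0, 0.0],
        [0.0, 1.0]]
# Hypothetical value-transformed feature vectors, N x D with D = 1.
values = [[2.0], [4.0], [6.0]]

# Claim 6 style: transpose of the attention matrix times the values (a weighted sum).
update_sum = matmul(transpose(attn), values)        # [[5.0], [7.0]]

# Claim 7 style: divide each of the N values in a slot's column by that column's sum,
# then take the same product, giving a weighted mean of the values.
col_sums = [sum(col) for col in zip(*attn)]         # [1.5, 1.5]
weights = [[a / s for a, s in zip(row, col_sums)] for row in attn]
update_mean = matmul(transpose(weights), values)    # about [[3.333], [4.667]]
```

	With the normalization, each slot's update stays within the range of the values it attends to, regardless of how many feature vectors are assigned to it.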

	Regarding claim 9, Modified Herdade teaches the computer-implemented method of claim 8. Byeon teaches wherein determining the attention matrix (This attention mechanism can be formulated as $\mathrm{Att}(Q, K, V) = \mathrm{softmax}(W_Q Q \cdot W_K K) \cdot W_V V$. It consists of queries (Q), keys (K) and values (V) [0041] Fig. 3)
	Modified Herdade does not explicitly teach wherein determining the attention matrix based on the product comprises: determining a dot product of (i) the plurality of feature vectors transformed by the key function and (ii) a transpose of the plurality of slot vectors transformed by the query function; and dividing the dot product by a square root of D.
	Rosset teaches based on the product comprises: determining a dot product of (A dot-product mechanism computes attention based on the equation: [0082])
	(i) the plurality of feature vectors transformed by the key function and (ii) a transpose of the plurality of slot vectors transformed by the query function; and dividing the dot product by a square root of D. ($\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$ [0082]. Examiner notes that in the equation, T denotes a transpose within the product of query Q and key K, which is the attention matrix, that this product is divided by $\sqrt{d_k}$, and that V is the transformation by the value function)
	The same motivation to combine set forth for dependent claim 3 applies here.
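	The scaling recited in claim 9 can be illustrated with hypothetical numbers: the dot product of the key-transformed feature vectors and the transpose of the query-transformed slot vectors is divided by the square root of D, where D is assumed here to be the shared dimension of the keys and queries. The values are chosen so the result can be checked by hand; the sketch is illustrative only.

```python
import math

# Hypothetical key-transformed feature vectors (N = 2) and query-transformed
# slot vectors (K = 1), all of dimension D = 4.
keys = [[1.0, 1.0, 1.0, 1.0],
        [2.0, 0.0, 0.0, 0.0]]
queries = [[1.0, 1.0, 1.0, 1.0]]
D = len(queries[0])

# Keys times the transpose of queries: one row per feature vector, one column per slot.
dots = [[sum(k * q for k, q in zip(k_row, q_row)) for q_row in queries] for k_row in keys]
# Divide every dot product by the square root of D.
scaled = [[v / math.sqrt(D) for v in row] for row in dots]
print(dots)    # [[4.0], [2.0]]
print(scaled)  # [[2.0], [1.0]]
```

	Dividing by the square root of D is commonly used to keep the scores from growing with the vector dimension before a softmax is applied.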

26.	Claims 4-6 are rejected under 35 U.S.C. 103 as being unpatentable over Herdade et al. (US20210201044 filed 12/30/2019) in view of Byeon et al. (US20210089867 filed 9/24/2019) in view of Nashida et al. (US20220043972 filed 12/17/2019) and further in view of Hu et al. (US20210303939 filed 03/25/2020).

	Regarding claim 4, Modified Herdade teaches the computer-implemented method of claim 1. Herdade further teaches wherein the plurality of slot vectors comprises K slot vectors, wherein each respective dimension of the plurality of dimensions comprises K values, (Each attention head (of the self-attention layer 804) determines a set of vectors comprising a query Q, key K and value V for each of the N tokens. An exemplary expression follows: $Q = XW^{Q}$, $K = XW^{K}$, $V = XW^{V}$, where X comprises the N input vectors (e.g., $x_1, \ldots, x_n$) stacked into a matrix [0116-0117]) and
	Modified Herdade does not explicitly teach wherein the method further comprises: normalizing each respective value of the K values along each respective dimension of the plurality of dimensions of the attention matrix with respect to the K values along the respective dimension by way of a softmax function by dividing (i) an exponent of the respective value of the K values along the respective dimension by (ii) a sum of exponents of the K values along the respective dimension.
	Hu teaches wherein the method further comprises: normalizing each respective value of the K values along each respective dimension of the plurality of dimensions of the attention matrix with respect to the K values along the respective dimension by way of a softmax function by dividing (i) an exponent of the respective value of the K values along the respective dimension by (ii) a sum of exponents of the K values along the respective dimension (The self-attention mechanism 1012 then linearly projects this matrix X into three matrices Q, K, V, corresponding to a query matrix, key matrix, and value matrix, respectively, where $d_k$ is the dimension of the queries and keys in Q and K, respectively. A dot-product mechanism computes attention based on the equation: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$ [0086])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Herdade to incorporate the method of Hu for the benefit of identifying one or more candidate regions in the input image using a first neural network to determine one or more target regions encompassing an object-of-interest (Hu [0108]).

	Regarding claim 5, Modified Herdade teaches the computer-implemented method of claim 4. Herdade further teaches causes the plurality of slot vectors to compete with one another for representing entities contained in the perceptual representation. (an alternative to using a greedy left-to-right generation of words, a beam search technique can be used in generating the output caption (e.g., caption 616). Beam searching constructs multiple simultaneous competing alternate caption hypotheses, eventually picking the one with the highest overall score among them (e.g., the score for an alternate caption hypothesis can be derived from probabilities assigned to the words selected for the alternate caption hypothesis) [0142])
	Modified Herdade does not explicitly teach wherein normalizing each respective value of the K values along each respective dimension of the plurality of dimensions of the attention matrix with respect to the K values of the respective dimension by way of the softmax function.
	Hu teaches wherein normalizing each respective value of the K values along each respective dimension of the plurality of dimensions of the attention matrix with respect to the K values of the respective dimension by way of the softmax function (The self-attention mechanism 1012 then linearly projects this matrix X into three matrices Q, K, V, corresponding to a query matrix, key matrix, and value matrix, respectively, where $d_k$ is the dimension of the queries and keys in Q and K, respectively. A dot-product mechanism computes attention based on the equation: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$ [0086])
	The same motivation to combine set forth for dependent claim 4 applies here.
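	The softmax normalization recited in claims 4 and 5 can be illustrated with a short sketch: for one feature vector, the K attention values (one per slot) are exponentiated and each is divided by the sum of the exponents, so the K values sum to 1 and raising one slot's score necessarily lowers the share of every other slot. The function name and scores are hypothetical.

```python
import math


def softmax(scores):
    """Divide the exponent of each score by the sum of the exponents of all scores.

    Subtracting the maximum first leaves the result unchanged but avoids overflow.
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# K = 3 slots competing for one feature vector.
before = softmax([1.0, 1.0, 1.0])   # equal scores: each slot receives 1/3
after = softmax([2.0, 1.0, 1.0])    # slot 0's score rises, so its share grows
# Because the shares must still sum to 1, slots 1 and 2 each receive less than before.
```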

	Regarding claim 6, Modified Herdade teaches the computer-implemented method of claim 1. Byeon teaches wherein determining the update matrix (Going back to the dual recurrent neural network architecture system 200, U-LSTM updates the states $H_k$ and $C_k$ for the time step k, given the input $X_k$, previous cell state $C_{k-1}$, and the output of the H-LSTM $H'_{k-1}$, by replacing $H_{k-1}$ with $H'_{k-1}$ in Equation 1 [0044] Fig. 3; … a loss function may be utilized to train the history recurrent neural network and/or the update recurrent neural network [0053]) comprises:
	Modified Herdade does not explicitly teach determining a product of (i) the plurality of feature vectors transformed by the value function and (ii) a transpose of the attention matrix.  
	Hu teaches determining a product of (A dot-product mechanism computes attention based on the equation: [0086])
	(i) the plurality of feature vectors transformed by the value function and (ii) a transpose of the attention matrix. ($\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$ [0086]. Examiner notes that in the equation, T denotes a transpose within the product of query Q and key K, which is the attention matrix, and that V is the transformation by the value function)
	The same motivation to combine set forth for dependent claim 4 applies here.
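For illustration only, the product recited in claim 6 can be sketched as below. The product is taken between the feature vectors transformed by a value function and the transpose of the attention matrix. This is a hypothetical plain-Python sketch: the linear value function, its weights, the matrix sizes, and the choice of multiplication order (transpose of attention on the left, giving one row per slot) are all assumptions, not taken from Hu or the application.

```python
def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x d) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def value_fn(features, w_v):
    """Hypothetical linear value function: project each feature vector by w_v."""
    return matmul(features, w_v)


# Hypothetical sizes: N = 3 feature vectors of dimension D = 2, K = 2 slots.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_v = [[2.0, 0.0], [0.0, 2.0]]                      # assumed value weights
attention = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # N x K, rows already normalized

values = value_fn(features, w_v)                    # N x D
updates = matmul(transpose(attention), values)      # K x D: one update row per slot
print(updates)  # [[3.0, 1.0], [1.0, 3.0]]
```

Transposing the N x K attention matrix lets each slot's row of weights gather a weighted combination of the N value-transformed feature vectors.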

27.	Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Herdade et al. (US20210201044 filed 12/30/2019) in view of Byeon et al. (US20210089867 filed 9/24/2019) in view of Nashida et al. (US20220043972 filed 12/17/2019) and further in view of Sjölund (US20210020296 filed 07/16/2019).

	Regarding claim 10, Modified Herdade teaches the computer-implemented method of claim 1. Sjölund teaches wherein the plurality of slot vectors (invariant or equivariant to permutations of first and second subsets of the first set of parameters [0013])
	 are permutation equivariant with respect to one another such that, for multiple different initializations of the plurality of slot vectors with respect to a given perceptual representation, a set of values of the plurality of slot vectors is approximately constant and an order of the plurality of slot vectors is variable, (Equivariance to variable permutations means that reordering the optimization variables gives a corresponding reordering of the solution (without affecting the optimal value of the objective function) [0103]) and 
	wherein the plurality of slot vectors are permutation invariant with respect to the plurality of feature vectors such that, for multiple different permutations of the plurality of feature vector, the set of values of the plurality of slot vectors is approximately constant (Invariance to constraint permutations means that if the feasible set is described as a set of equalities and/or inequalities, then reordering them has no effect on the set of solutions (but could change the behavior of the optimization algorithm) [0103]; This implies that a permutation invariant function can be learned by treatment processing logic 120 using this expression, e.g., by learning the transformations ρ and ϕ with a ML algorithm such as a neural network [0104])  
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Modified Herdade to incorporate the method of Sjölund for the benefit of solving optimization problems by applying a machine learning model to the first set of parameters to estimate the second set of parameters, wherein the machine learning model is trained to establish a relationship between the second set of parameters and the first set of parameters (Sjölund, [0007]).
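For illustration only, the permutation-invariant form referenced in Sjölund [0104] can be sketched as below. In that form, a function is built from learned transformations ρ and ϕ applied around a sum. The sketch shows why such a function returns the same value for every ordering of its inputs. It is a hypothetical plain-Python sketch: phi and rho here are fixed toy functions standing in for learned ones, and the inputs are integers so that the sums are exact regardless of order.

```python
from itertools import permutations


def phi(x):
    """Hypothetical per-element transformation (stands in for a learned phi)."""
    return (x, x * x)


def rho(s):
    """Hypothetical transformation of the pooled sum (stands in for a learned rho)."""
    return s[0] + 0.5 * s[1]


def set_function(xs):
    """rho(sum of phi(x)): summation discards input order, so the result is invariant."""
    pooled = [0, 0]
    for x in xs:
        a, b = phi(x)
        pooled[0] += a
        pooled[1] += b
    return rho(pooled)


inputs = [1, 2, 3]
results = {set_function(list(p)) for p in permutations(inputs)}
print(results)  # every ordering gives the same value: {13.0}
```

With general floating-point inputs the summation order can change the last bits of the result, which is one reason invariance in practice is often described as approximate.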

Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.G./Examiner, Art Unit 2121              





/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121