DETAILED ACTION
1.	This office action is in response to the Application No. 16303623 filed on 5/20/2016. Claims 1-20 are presented for examination and are currently pending.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101 
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
 

3.	Claims 1-6 and 12-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
	Step 1
	Independent claim 1 is directed to a system implemented by one or more computers, and falls into one of the four statutory categories.
	Step 2A, Prong 1
	Claim 1 recites the following abstract ideas:
	process the received input to produce as output multiple effects of the relationships between the one or more receiver entities and one or more sender entities; (mental steps directed to predicting effects of the relationship between receiver and sender)

	Step 2A, Prong 2
	Claim 1 recites the following additional elements:
	implemented by one or more computers (This limitation recites a generic computer component for processing, at a high level of generality, and does not integrate the abstract idea into a practical application)
	an interaction component configured to (This limitation is directed to a generic computer component to perform the method of receiving input and processing. It is a mere instruction to apply an exception (MPEP 2106.05(f)), i.e. using a computer to perform the mental step of predicting effects, and therefore it does not integrate the abstract idea into a practical application)
	receive as input (i) states of one or more receiver entities and one or more sender entities, and (ii) attributes of one or more relationships between the one or more receiver entities and one or more sender entities; and (This limitation is directed to mere data gathering to apply an exception, MPEP 2106.05(g), and therefore does not integrate the abstract idea into a practical application)
	a dynamical component configured to (This limitation is directed to a generic computer component to perform the method of receiving input and processing. It is a mere instruction to apply An Exception (MPEP 2106.05(f)), i.e. using a computer to 
	receive as input (i) the states of the one or more receiver entities and one or more sender entities, and (ii) the multiple effects of the relationships between the one or more receiver entities and one or more sender entities; (This limitation is directed to mere data gathering to apply an exception, MPEP 2106.05(g), and therefore does not integrate the abstract idea into a practical application)
	Step 2B
	Claim 1 recites the following additional elements:
	implemented by one or more computers (This limitation recites a generic computer component for processing, at a high level of generality, and does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))
	an interaction component configured to (This limitation is directed to a generic computer component to perform the method of receiving input and processing. It is a mere instruction to apply an exception (MPEP 2106.05(f)), i.e. using a computer to perform the mental step of predicting effects, and therefore it does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))
	receive as input (i) states of one or more receiver entities and one or more sender entities, and (ii) attributes of one or more relationships between the one or more receiver entities and one or more sender entities; and (This limitation is directed to mere data gathering to apply an exception, MPEP 2106.05(g), and therefore does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))

	receive as input (i) the states of the one or more receiver entities and one or more sender entities, and (ii) the multiple effects of the relationships between the one or more receiver entities and one or more sender entities; (This limitation is directed to mere data gathering to apply an exception, MPEP 2106.05(g), and therefore does not amount to significantly more than the judicial exception. See MPEP 2106.05(f))

4. 	Dependent claim 2 is directed to a system, and falls into one of the four statutory categories.  
	Claim 2 recites the following abstract ideas:
	wherein a receiver entity comprises an entity that is affected by one or more sender entities through one or more respective relationships (This limitation is directed to a mathematical concept of describing the elements of a graph used to represent relationships between sender and receiver entities, using graphs to represent a relationship between entities is a mathematical concept)
	Claim 2 does not recite any additional elements.


5.	Dependent claim 3 is directed to a system, and falls into one of the four statutory categories.
	Claim 3 recites the following abstract ideas:
	wherein a relational attribute of a relationship between a receiver entity and a sender entity describes the relationship between the receiver entity and sender entity (This limitation is directed to a mental process of describing the elements of a graph used to represent relationships between sender and receiver entities, using graphs to represent a relationship between entities is a mathematical concept)
	Claim 3 does not recite any additional elements.

6.	Dependent claim 4 is directed to a system, and falls into one of the four statutory categories.
	Claim 4 recites the following abstract ideas:
	wherein the (i) states of the one or more receiver entities and one or more sender entities, and (ii) relational attributes of the one or more relationships between the one or more receiver entities and one or more sender entities are represented as an attributed directed multigraph comprising multiple nodes for each entity and one or more directed edges for each relationship indicating an influence of one entity on another. (This limitation is directed to a mathematical concept of describing the elements of a graph used to represent relationships between sender and receiver entities, using graphs to represent a relationship between entities is a mathematical concept)

	Claim 4 does not recite any additional elements.
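
	As a minimal sketch of the kind of structure the limitation describes, an attributed directed multigraph can be held as a list of node states plus a list of attributed, directed edges. The names `Edge`, `Multigraph`, `add_edge`, and `incoming` are hypothetical and appear in neither the claims nor the cited references; the sketch also assumes one node per entity for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    sender: int        # index of the node exerting the influence
    receiver: int      # index of the node being influenced
    attributes: tuple  # relational attributes carried by this edge


@dataclass
class Multigraph:
    node_states: list                          # one state vector per entity (assumption)
    edges: list = field(default_factory=list)  # parallel edges are allowed

    def add_edge(self, sender, receiver, attributes):
        self.edges.append(Edge(sender, receiver, tuple(attributes)))

    def incoming(self, receiver):
        # All directed edges whose influence lands on `receiver`.
        return [e for e in self.edges if e.receiver == receiver]


g = Multigraph(node_states=[(0.0, 1.0), (2.0, 0.0)])
g.add_edge(0, 1, [0.5])  # first relationship from entity 0 to entity 1
g.add_edge(0, 1, [9.8])  # a second, parallel relationship between the same pair
g.add_edge(1, 0, [0.5])  # the reverse direction is a distinct edge
```

	Keeping edges in a flat list, rather than an adjacency matrix, is what lets two entities share more than one directed relationship.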

7.	Dependent claim 5 is directed to a system, and falls into one of the four statutory categories.
	Claim 5 does not recite any abstract ideas.
	Claim 5 recites the following additional elements:
	wherein each binary interaction is represented by a 3-tuple comprising (i) an index of a respective receiver entity, (ii) an index of a respective sender entity, and (iii) a vector containing relational attributes of the respective receiver entity and respective sender entity. (This limitation is directed to a mental process of describing usage of data type (tuple) to store information that describes the entities and relationships of the graph. This is directed to the use of generic computer component (i.e. data types) to implement the mathematical concept (i.e. graph of the entity relationships). It does not integrate the abstract idea into practical application)
	Claim 5 recites the following additional elements:
	wherein each binary interaction is represented by a 3-tuple comprising (i) an index of a respective receiver entity, (ii) an index of a respective sender entity, and (iii) a vector containing relational attributes of the respective receiver entity and respective sender entity. (This limitation is directed to a mental process of describing usage of data type (tuple) to store information that describes the entities and relationships of the graph. This is directed to the use of generic computer component (i.e. data types) to implement the 

8.	Dependent claim 6 is directed to a system, and falls into one of the four statutory categories.
	Claim 6 does not recite abstract ideas.
	Claim 6 recites the following additional elements:
	and wherein each high-order interaction is represented by a (2m-1)-tuple, where m represents the order of the interaction. (This limitation is directed to a mental process of describing usage of data type (tuple) to store information that describes the entities and relationships of the graph. This is directed to the use of generic computer component (i.e. data types) to implement the mathematical concept (i.e. graph of the entity relationships). It does not integrate the abstract idea into practical application)
	Claim 6 recites the following additional elements:
	and wherein each high-order interaction is represented by a (2m-1)-tuple, where m represents the order of the interaction. (This limitation is directed to a mental process of describing usage of data type (tuple) to store information that describes the entities and relationships of the graph. This is directed to the use of generic computer component (i.e. data types) to implement the mathematical concept (i.e. graph of the entity relationships). It does not amount to significantly more than the judicial exception, see MPEP 2106.05(f))
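
	The tuple representations recited in claims 5 and 6 can be sketched as follows. The type `BinaryInteraction` and the helper `tuple_length` are hypothetical names chosen for illustration. The sketch shows only that a binary interaction (m = 2) yields a 3-tuple under the (2m-1) rule; the internal layout of tuples for m greater than 2 is not described in this action and is not assumed here.

```python
from typing import NamedTuple, Tuple


class BinaryInteraction(NamedTuple):
    receiver_index: int                      # (i) index of the receiver entity
    sender_index: int                        # (ii) index of the sender entity
    relation_attributes: Tuple[float, ...]   # (iii) vector of relational attributes


def tuple_length(order: int) -> int:
    """Length of the tuple for an interaction of the given order: 2m - 1."""
    if order < 2:
        raise ValueError("an interaction involves at least two entities")
    return 2 * order - 1


b = BinaryInteraction(receiver_index=1, sender_index=0, relation_attributes=(0.5,))
```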


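9.	Dependent claim 12 is directed to a system, and falls into one of the four statutory categories.
	Claim 12 recites the following abstract ideas: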
	wherein the number of output multiple effects of the relationships between the one or more receiver entities and one or more sender entities is equal to the number of input relationships between the one or more receiver entities and one or more sender entities (This limitation is directed to a mathematical concept: it sets the number of output effects equal to the number of input relationships)
	Claim 12 does not recite any additional elements.
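
	The one-effect-per-relationship property recited in claim 12 can be sketched as a function that visits each input relationship exactly once. The names `interaction_effects` and `toy_relation_fn` are hypothetical, and the toy relation function (a scaled displacement) is an assumption made only so the sketch runs; it is not taken from the claims or the cited references.

```python
def interaction_effects(states, relations, relation_fn):
    """Return one effect per input relationship, in the same order."""
    effects = []
    for receiver, sender, attrs in relations:
        effects.append(relation_fn(states[receiver], states[sender], attrs))
    return effects


def toy_relation_fn(receiver_state, sender_state, attrs):
    # Assumed placeholder: displacement from receiver to sender,
    # scaled by the first relational attribute.
    k = attrs[0]
    return [k * (s - r) for r, s in zip(receiver_state, sender_state)]


states = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
# Each relation is (receiver index, sender index, relational attributes).
relations = [(1, 0, (0.5,)), (0, 1, (0.5,)), (2, 0, (1.0,))]
effects = interaction_effects(states, relations, toy_relation_fn)
```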

10.	Dependent claim 13 is directed to a system, and falls into one of the four statutory categories.
	Claim 13 recites the following abstract ideas:
	wherein processing the received input to produce as output a respective prediction of a subsequent state of each of the one or more receiver entities and one or more sender entities comprises aggregating the received multiple effects of the relationships between the one or more receiver entities and one or more sender entities using one or more commutative and associative operations, wherein the one or more commutative and associative operations optionally comprise element-wise summations. (This claim is directed to mathematical concept of commutative and associative operations to aggregate effects of the relationship)
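
	The aggregation recited in claim 13 can be sketched with element-wise summation, which is both commutative and associative, so the order in which effects arrive does not change the per-receiver total. The name `aggregate_effects` and the sample values are hypothetical.

```python
def aggregate_effects(num_entities, relations, effects):
    """Sum, element by element, every effect addressed to each receiver."""
    dim = len(effects[0]) if effects else 0
    totals = [[0.0] * dim for _ in range(num_entities)]
    for (receiver, _sender, _attrs), effect in zip(relations, effects):
        for i, value in enumerate(effect):
            totals[receiver][i] += value
    return totals


# Each relation is (receiver index, sender index, relational attributes).
relations = [(1, 0, ()), (1, 2, ()), (0, 1, ())]
effects = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]

totals = aggregate_effects(3, relations, effects)
# Feeding the same pairs in reverse order gives the same totals.
reversed_totals = aggregate_effects(3, relations[::-1], effects[::-1])
```

	Entity 2 receives no effects in this example, so its total stays at zero.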


11.	Dependent claim 14 is directed to a system, and falls into one of the four statutory categories.
	Claim 14 recites the following abstract ideas:
	wherein the produced respective prediction of the subsequent state of each of the one or more receiver entities and one or more sender entities comprises multiple entity states corresponding to subsequent receiver entity states and subsequent sender entity states (This limitation is directed to a mental step of predicting multiple states)
	Claim 14 does not recite any additional elements.

12.	Dependent claim 15 is directed to a system, and falls into one of the four statutory categories.
	Claim 15 recites the following abstract ideas:
	wherein the system is further configured to analyze the produced respective prediction of the subsequent state of each of the one or more receiver entities and one or more sender entities to predict global properties of the one or more receiver entities and one or more sender entities (This limitation is directed to a mental step of predicting global properties)
	Claim 15 does not recite any additional elements.


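13.	Dependent claim 16 is directed to a system, and falls into one of the four statutory categories.
	Claim 16 recites the following abstract ideas: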
	wherein the interaction component utilizes a first function approximator to model relationships between entities, and wherein the dynamical component utilizes a second function approximator to model the state of the environment in which the entities reside (This limitation is directed to a mental step of modeling relationships between entities)
	Claim 16 does not recite any additional elements.

14.	Dependent claim 17 is directed to a system, and falls into one of the four statutory categories.
	Claim 17 does not recite abstract ideas.
	Claim 17 recites the following additional elements:
	wherein the interaction component and dynamical component are trained independently (This limitation recites training (i.e. configuring) components (i.e. generic computer components). The claims are directed to generic computer components (MPEP 2106.05(f)). They do not integrate the judicial exception into a practical application)
	Claim 17 recites the following additional elements:
	wherein the interaction component and dynamical component are trained independently (This limitation is directed to recite training (i.e. configuring) components 

15.	Dependent claim 18 is directed to a system, and falls into one of the four statutory categories.
	Claim 18 does not recite abstract ideas.
	Claim 18 recites the following additional elements:
	wherein the interaction component and dynamical component are trained end to end using a gradient based optimization technique (This limitation recites training (i.e. configuring) components (i.e. generic computer components). The claims are directed to generic computer components (MPEP 2106.05(f)). They do not integrate the judicial exception into a practical application)
	Claim 18 recites the following additional elements:
	wherein the interaction component and dynamical component are trained end to end using a gradient based optimization technique (This limitation recites training (i.e. configuring) components (i.e. generic computer components). The claims are directed to generic computer components (MPEP 2106.05(f)). They do not amount to significantly more than the judicial exception)


Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
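(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 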
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. Also, these limitations use generic placeholders modified by functional language and are not modified by sufficient structure.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
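The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
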
16.	Claims 1-7, 14-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mottaghi et al. (“What happens if...” Learning to Predict the Effect of Forces in Images, arXiv:1603.05600v1 [cs.CV] 17 Mar 2016) in view of Mai et al. (US20160092736).

	Regarding claim 1, Mottaghi teaches a system implemented by one or more computers, the system (The physics engine (computer software that is implemented in a computer) simulates forward the effect of applying the force to the point that corresponds to p in the 3D synthetic scene and generates the velocity profile and locations for the query object, pg. 4, Problem Statement) comprising:
	an interaction component (A Convolutional Neural Network (CNN) to encode scene and object appearance and geometry, (blue colored CNN), Fig. 4 pg. 6, 5.1 Model architecture) configured to: 
	receive as input (i) states of one or more receiver entities and one or more sender entities, and (The physics engine takes a scene and a force as input and simulates the future states of the objects in the scene according to the applied forces, pg. 5, second para.)
	(ii) attributes of one or more relationships between the one or more receiver entities and one or more sender entities; (For a given force vector applied to a specific location in an image, our goal is to predict long-term sequential movements caused by that force. Doing so entails reasoning about scene geometry, objects, their attributes, and the physical rules that govern the movements of objects, abstract) and
	process the received input (The physics engine takes a scene and a force as input and simulates the future states of the objects in the scene according to the applied forces, pg. 5, second para.)
	to produce as output multiple effects of the relationships between the one or more receiver entities and one or more sender entities; (The physics engine simulates forward the effect of applying the force to the point that corresponds to p in the 3D synthetic scene and generates the velocity profile and locations for the query object, pg. 4, fourth para.) 
	a dynamical component (A Recurrent Neural Network (RNN) pg. 6, 5.1 Model architecture) configured to: 
	receive as input (i) the states of the one or more receiver entities and one or more sender entities, (The physics engine takes a scene and a force as input and simulates the future states of the objects in the scene according to the applied forces, pg. 5, second para.) and 
	(ii) the multiple effects of the relationships between the one or more receiver entities and one or more sender entities; (that receives the output of the CNNs and generates the object motion (or equivalently, a sequence of vectors that represent the velocity of the object at each time step), pg. 6, 5.1 Model architecture) and 
(The recurrent part of our network receives I as input and generates a sequence of velocity vectors. The advantage of using a Recurrent Neural Network (RNN) is twofold. First, the velocities at different time steps are dependent on each other, and the RNN can capture these temporal dependencies. Second, RNNs enable us to predict a variable-length sequence of velocities (the objects move different distances depending on the magnitude of the force and the structure of the scene). We show the unfolded RNN in Figure 4, pg. 7, second para.)
	Mottaghi does not explicitly teach one or more receiver entities and one or more sender entities.
	Mai teaches one or more receiver entities and one or more sender entities (wherein information from which the attributes of the object of interest 100 can be determined is received as input [0102]; FIG. 3 shows the belief network 300 for two attributes, a1 represented by a node 360 (sender) and a2 represented by a node 370 (sender),  and the output of their corresponding attribute detectors, d1 represented by a node 365 (receiver) and d2 represented by a node 375 (receiver) [0091])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Mottaghi to incorporate the teachings of Mai for the benefit of increasing the detectability of the  (Mai, abstract)

	Regarding claim 2, Mottaghi modified by Mai teaches the system of claim 1, Mottaghi teaches wherein a receiver entity comprises an entity that is affected by one or more sender entities through one or more respective relationships. (our goal is to predict the future movement of the object (receiver entity) as the result of applying the force to the object. More specifically, for a force f and an impact point p on the object surface in the RGB image, pg. 4, Problem Statement)

	Regarding claim 3, Modified Mottaghi teaches the system of claim 1, Mottaghi teaches wherein a relational attribute of a relationship between a receiver entity and a sender entity describes the relationship between the receiver entity and sender entity. (More specifically, for a force f (sender) and an impact point p on the object (receiver) surface in the RGB image, pg. 4, Problem Statement)
	Mai teaches one or more receiver entities and one or more sender entities (wherein information from which the attributes of the object of interest 100 can be determined is received as input [0102]; FIG. 3 shows the belief network 300 for two attributes, a1 represented by a node 360 (sender) and a2 represented by a node 370 (sender),  and the output of their corresponding attribute detectors, d1 represented by a node 365 (receiver) and d2 represented by a node 375 (receiver) [0091])
	The same motivation to combine as independent claim 1 applies here.

	Regarding claim 4, Modified Mottaghi teaches the system of claim 1, Mottaghi teaches wherein the (i) states of the one or more receiver entities and one or more sender entities, (The physics engine takes a scene and a force as input and simulates the future states of the objects in the scene according to the applied forces, pg. 5, second para.) and 
	(ii) relational attributes of the one or more relationships between the one or more receiver entities and one or more sender entities (More specifically, for a force f (sender) and an impact point p (relational attribute) on the object (receiver) surface in the RGB image, pg. 4, Problem Statement)
	Mai teaches one or more receiver entities and one or more sender entities (wherein information from which the attributes of the object of interest 100 can be determined is received as input [0102]; FIG. 3 shows the belief network 300 for two attributes, a1 represented by a node 360 (sender) and a2 represented by a node 370 (sender),  and the output of their corresponding attribute detectors, d1 represented by a node 365 (receiver) and d2 represented by a node 375 (receiver) [0091])
	are represented as an attributed directed multigraph comprising multiple nodes for each entity and one or more directed edges for each relationship indicating an influence of one entity on another. (The network 300 further captures the notion that the output of a noisy attribute detector depends on both the viewing conditions and the attributes of the object. FIG. 3 shows the belief network 300 for two attributes, a1 represented by a node 360 and a2 represented by a node 370, and the output of their corresponding attribute detectors, d1 represented by a node 365 and d2 represented by a node 375. The generalization of the graph structure from two to N attributes is straightforward for those in the relevant art [0091])
	The same motivation to combine as independent claim 1 applies here.
	
	Regarding claim 5, Modified Mottaghi teaches the system of claim 1, Mottaghi teaches wherein the one or more relationships between the one or more receiver entities and (The physics engine takes a scene and a force as input and simulates the future states of the objects in the scene according to the applied forces, pg. 5, second para.)
	Mai teaches one or more sender entities comprise binary interactions between a receiver entity and a sender entity, (Let x represent the binary decision of whether the candidate object is the object of interest, represented by a node 350. In the following discussion, x=1 represents the decision that the candidate object is the object of interest, and x=0 represents the contrary, [0090]) and
	wherein each binary interaction is represented by a 3-tuple comprising (i) an index of a respective receiver entity, (ii) an index of a respective sender entity, and (iii) a vector containing relational attributes of the respective receiver entity and respective sender entity (FIG. 4. Term p(di|ai,v) (as a 3-tuple) is the probability of an observed attribute detector output di, given the attribute of the object ai and the viewing conditions v. This quantity represents the detectability of attribute ai when observed under viewing conditions v, corresponding to the output 821 of step 820 in FIG. 8. For example, if a1 is a binary attribute class label, then p(d1=1|a1=0, v) is the probability of incorrectly detecting attribute a1 under viewing condition v (also known as a “false positive” detection), while p(d1=1|a1=1, v) is the probability of correctly detecting attribute a1 under viewing condition v (also known as a “true positive” detection). [0094])
	The same motivation to combine as independent claim 1 applies here.

	Regarding claim 6, Modified Mottaghi teaches the system of claim 1, Mai teaches wherein the one or more relationships between the one or more receiver entities and one or more sender entities comprise high-order interactions, and wherein each high-order interaction is represented by a (2m−1)-tuple, where m represents the order of the interaction. (In one implementation of the step 950, stored tuples (φ, I(x; d|v))k 931 (for k=1 . . . K, where K is the number of provisional camera settings) recording the provisional camera setting selected at the step 910 and the corresponding mutual information determined at the step 930 in each iteration of the method 460 (see FIG. 9) are compared. The tuple (φ*, I*(x; d|v)) corresponding to the tuple with the maximum mutual information from amongst the stored tuples (φ, I(x; d|v))k is selected, and the camera setting φ* from the selected tuple is output as the new camera setting 461 at step 450 in FIG. 9 [0143]; In one alternative VIDD arrangement, the direction of motion is estimated from the current frame and two or more previous frames containing the candidate object, based on a second or higher order finite difference approximation to the change in location of the candidate object [0123])
	The same motivation to combine as independent claim 1 applies here.

	Regarding claim 7, Modified Mottaghi teaches the system of claim 1, Mottaghi teaches wherein the interaction component comprises a first neural network (A Convolutional Neural Network (CNN) (blue colored CNN), Fig. 4, pg. 6, 5.1 Model architecture) and
	the dynamical component comprises a second neural network, (A Recurrent Neural Network (RNN) pg. 6, 5.1 Model architecture) 
	optionally wherein the first neural network comprises a first multilayer perceptron (MLP) and the second neural network comprises a second MLP. (The training is performed end-to-end, and each iteration involves a forward pass through the entire network (which implies feedforward neural network or multilayer perceptron) pg. 7, Training)

	Regarding claim 14, Mottaghi teaches the system of claim 1, Mottaghi teaches wherein the produced respective prediction of the subsequent state of each of the one or more receiver entities and one or more sender entities comprises multiple entity states corresponding to subsequent receiver entity states and subsequent sender entity states (The recurrent part of our network receives I as input and generates a sequence of velocity vectors. The advantage of using a Recurrent Neural Network (RNN) is twofold. First, the velocities at different time steps are dependent on each other, and the RNN can capture these temporal dependencies. Second, RNNs enable us to predict a variable-length sequence of velocities (the objects move different distances depending on the magnitude of the force and the structure of the scene). We show the unfolded RNN in Figure 4, pg. 7, second para.)
	Mai teaches one or more receiver entities and one or more sender entities (wherein information from which the attributes of the object of interest 100 can be determined is received as input [0102]; FIG. 3 shows the belief network 300 for two attributes, a1 represented by a node 360 (sender) and a2 represented by a node 370 (sender),  and the output of their corresponding attribute detectors, d1 represented by a node 365 (receiver) and d2 represented by a node 375 (receiver) [0091])
	The same motivation to combine as independent claim 1 applies here.

	Regarding claim 15, Mottaghi teaches the system of claim 1, Mottaghi teaches wherein the system is further configured to analyze the produced respective prediction of the subsequent state of each of the one or more receiver entities and one or more sender entities to predict global properties of the one or more receiver entities and one or more sender entities (The recurrent part of our network receives I as input and generates a sequence of velocity vectors. The advantage of using a Recurrent Neural Network (RNN) is twofold. First, the velocities at different time steps are dependent on each other, and the RNN can capture these temporal dependencies. Second, RNNs enable us to predict a variable-length sequence of velocities (the objects move different distances depending on the magnitude of the force and the structure of the scene). We show the unfolded RNN in Figure 4, pg. 7, second para.)
	Mai teaches one or more receiver entities and one or more sender entities (wherein information from which the attributes of the object of interest 100 can be determined is received as input [0102]; FIG. 3 shows the belief network 300 for two attributes, a1 represented by a node 360 (sender) and a2 represented by a node 370 (sender),  and the output of their corresponding attribute detectors, d1 represented by a node 365 (receiver) and d2 represented by a node 375 (receiver) [0091])
	The same motivation to combine as independent claim 1 applies here.

	Regarding claim 16, Modified Mottaghi teaches the system of claim 1, Mottaghi teaches wherein the interaction component utilizes a first function approximator to model relationships between entities, and wherein the dynamical component utilizes a second function approximator to model the state of the environment in which the entities reside. (The hidden layer of the RNN at time step t is a function of I and the previous hidden unit (ht−1). More formally, ht = f(I, ht−1), where f is a linear function (fully connected layer) followed by a non-linear ReLU (Rectified Linear Unit), pg. 7, second para. (linear function is a function approximator))
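	For illustration only, the recurrence quoted above, ht = f(I, ht−1) with f a fully connected layer followed by a ReLU, can be sketched as follows. The weight names W_i, W_h, the bias b, and all numeric values are hypothetical and are not taken from Mottaghi or from the application; the sketch shows only the general form of such an update.

```python
def relu(values):
    """Element-wise Rectified Linear Unit."""
    return [max(0.0, v) for v in values]


def rnn_step(I, h_prev, W_i, W_h, b):
    """One recurrent update h_t = ReLU(W_i * I + W_h * h_prev + b).

    W_i, W_h and b are hypothetical parameters of the fully connected layer.
    """
    pre_activation = []
    for row_i, row_h, bias in zip(W_i, W_h, b):
        z = sum(w * x for w, x in zip(row_i, I))
        z += sum(w * x for w, x in zip(row_h, h_prev))
        pre_activation.append(z + bias)
    return relu(pre_activation)


# Toy values chosen only to make the arithmetic easy to follow.
I = [1.0, -2.0]                      # fixed input feature vector
h0 = [0.0, 0.0]                      # initial hidden state
W_i = [[1.0, 0.0], [0.0, 1.0]]
W_h = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]

h1 = rnn_step(I, h0, W_i, W_h, b)    # first time step
h2 = rnn_step(I, h1, W_i, W_h, b)    # second time step reuses h1
```

	Each call consumes the same input I and the previous hidden state, which is the temporal dependency described in the quoted passage.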

	Regarding claim 18, Modified Mottaghi teaches the system of claim 1, Mottaghi teaches wherein the interaction component and dynamical component are trained end to end using a gradient based optimization technique (The forward pass and the backward pass are performed for 15,000 iterations when we use AlexNet for the image tower (the loss value does not change after 15K iterations), pg. 10, 6.3 Network and optimization parameters, (back propagation algorithm is based on gradient descent))

	Regarding claim 19, Mottaghi teaches a method (we describe the evaluation of our method and compare our method with a set of baseline approaches. We provide the details of the dataset and explain how we interact with objects in the scenes, pg. 8, 6 Experiments; The physics engine (computer software that is implemented in a computer) simulates forward the effect of applying the force to the point that corresponds to p in the 3D synthetic scene and generates the velocity profile and locations for the query object, pg. 4, Problem Statement) comprising:
	receiving an input (i) states of one or more receiver entities and one or more sender entities, and (The physics engine takes a scene and a force as input and simulates the future states of the objects in the scene according to the applied forces, pg. 5, second para.)
	(ii) attributes of one or more relationships between the one or more receiver entities and one or more sender entities; (For a given force vector applied to a specific location in an image, our goal is to predict long-term sequential movements caused by that force. Doing so entails reasoning about scene geometry, objects, their attributes, and the physical rules that govern the movements of objects, abstract) and
	processing the received input (The physics engine takes a scene and a force as input and simulates the future states of the objects in the scene according to the applied forces, pg. 5, second para.)
	using an interaction component (A Convolutional Neural Network (CNN) to encode scene and object appearance and geometry, (blue colored CNN), Fig. 4 pg. 6, 5.1 Model architecture)
	to produce as output multiple effects of the relationships between the one or more receiver entities and one or more sender entities; (The physics engine simulates forward the effect of applying the force to the point that corresponds to p in the 3D synthetic scene and generates the velocity profile and locations for the query object, pg. 4, fourth para.) and
	processing (i) the states of the one or more receiver entities and one or more sender entities, (ht represents the hidden layer of the RNN at time step t. Also, we use the abbreviation FC for a fully connected layer. The output of our model is a sequence of velocity directions at each time step. We consider 17 directions and an additional ‘stop’ class, which is shown by a red circle. The green ellipses show the chosen direction at each time step. The RNN stops when it generates the ‘stop’ class, pg. 6, Fig. 4. Model) and
	(ii) the multiple effects of the relationships between the one or more receiver entities and one or more sender entities (that receives the output of the CNNs and generates the object motion (or equivalently, a sequence of vectors that represent the velocity of the object at each time step), pg. 6, 5.1 Model architecture)
	using a dynamical component (A Recurrent Neural Network (RNN) pg. 6, 5.1 Model architecture) 
	to produce as output a respective prediction of a subsequent state of each of the one or more receiver entities and one or more sender entities. (The recurrent part of our network receives I as input and generates a sequence of velocity vectors. The advantage of using a Recurrent Neural Network (RNN) is twofold. First, the velocities at different time steps are dependent on each other, and the RNN can capture these temporal dependencies. Second, RNNs enable us to predict a variable-length sequence of velocities (the objects move different distances depending on the magnitude of the force and the structure of the scene). We show the unfolded RNN in Figure 4, pg. 7, second para.)
	Mottaghi does not explicitly teach one or more receiver entities and one or more sender entities.
	Mai teaches one or more receiver entities and one or more sender entities (wherein information from which the attributes of the object of interest 100 can be determined is received as input [0102]; FIG. 3 shows the belief network 300 for two attributes, a1 represented by a node 360 (sender) and a2 represented by a node 370 (sender), and the output of their corresponding attribute detectors, d1 represented by a node 365 (receiver) and d2 represented by a node 375 (receiver) [0091])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Mottaghi to incorporate the teachings of Mai for the benefit of increasing the detectability of the attribute of the object and to determine the confidence that the candidate object is the object of interest. (Mai, abstract)

	Regarding claim 20, Mottaghi teaches receiving an input comprising i) states of one or more receiver entities and one or more sender entities, (The physics engine takes a scene and a force as input and simulates the future states of the objects in the scene according to the applied forces, pg. 5, second para.)
	and (ii) attributes of one or more relationships between the one or more receiver entities and one or more sender entities; (For a given force vector applied to a specific location in an image, our goal is to predict long-term sequential movements caused by that force. Doing so entails reasoning about scene geometry, objects, their attributes, and the physical rules that govern the movements of objects, abstract) 
	processing the received input (The physics engine takes a scene and a force as input and simulates the future states of the objects in the scene according to the applied forces, pg. 5, second para.)
	using an interaction component (A Convolutional Neural Network (CNN) to encode scene and object appearance and geometry, (blue colored CNN), Fig. 4 pg. 6, 5.1 Model architecture)
	to produce as output multiple effects of the relationships between the one or more receiver entities and one or more sender entities; (The physics engine simulates forward the effect of applying the force to the point that corresponds to p in the 3D synthetic scene and generates the velocity profile and locations for the query object, pg. 4, fourth para.) and
	processing (i) the states of the one or more receiver entities and one or more sender entities, (ht represents the hidden layer of the RNN at time step t. Also, we use the abbreviation FC for a fully connected layer. The output of our model is a sequence of velocity directions at each time step. We consider 17 directions and an additional ‘stop’ class, which is shown by a red circle. The green ellipses show the chosen direction at each time step. The RNN stops when it generates the ‘stop’ class, pg. 6, Fig. 4. Model) and
	(ii) the multiple effects of the relationships between the one or more receiver entities and one or more sender entities (that receives the output of the CNNs and generates the object motion (or equivalently, a sequence of vectors that represent the velocity of the object at each time step), pg. 6, 5.1 Model architecture)
	using a dynamical component (A Recurrent Neural Network (RNN) pg. 6, 5.1 Model architecture) 
	to produce as output a respective prediction of a subsequent state of each of the one or more receiver entities and one or more sender entities. (The recurrent part of our network receives I as input and generates a sequence of velocity vectors. The advantage of using a Recurrent Neural Network (RNN) is twofold. First, the velocities at different time steps are dependent on each other, and the RNN can capture these temporal dependencies. Second, RNNs enable us to predict a variable-length sequence of velocities (the objects move different distances depending on the magnitude of the force and the structure of the scene). We show the unfolded RNN in Figure 4, pg. 7, second para.)
	Mottaghi does not explicitly teach one or more receiver entities and one or more sender entities; one or more non-transitory computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform the operations comprising
	Mai teaches one or more receiver entities and one or more sender entities (wherein information from which the attributes of the object of interest 100 can be determined is received as input [0102]; FIG. 3 shows the belief network 300 for two attributes, a1 represented by a node 360 (sender) and a2 represented by a node 370 (sender),  and the output of their corresponding attribute detectors, d1 represented by a node 365 (receiver) and d2 represented by a node 375 (receiver) [0091])
	one or more non-transitory computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform the operations comprising (Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 150 for execution and/or processing [0072])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Mottaghi to incorporate the teachings of Mai for the benefit of increasing the detectability of the attribute of the object and to determine the confidence that the candidate object is the object of interest. (Mai, abstract)
	
17.	Claims 8-11 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Mottaghi et al (“What happens if...” Learning to Predict the Effect of Forces in Images, arXiv:1603.05600v1 [cs.CV] 17 Mar 2016) in view of Mai et al (US20160092736) and further in view of Simard et al (US20070086655).

	Regarding claim 8, Modified Mottaghi teaches the system of claim 7, Mottaghi teaches wherein processing the received input to produce as output multiple effects of the relationships between the one or more receiver entities and one or more sender entities; (The physics engine simulates forward the effect of applying the force to the point that corresponds to p in the 3D synthetic scene and generates the velocity profile and locations for the query object, pg. 4, fourth para.) comprises:
	processing the input using the first MLP (A Convolutional Neural Network (CNN) to encode scene and object appearance and geometry, (blue colored CNN), Fig. 4 pg. 6, 5.1 Model architecture) 
	to produce as output multiple effects of the relationships between the one or more receiver entities and one or more sender entities, (that receives the output of the CNNs and generates the object motion (or equivalently, a sequence of vectors that represent the velocity of the object at each time step), pg. 6, 5.1 Model architecture)
	wherein the produced output comprises an effect matrix whose columns represent the multiple effects of the relationships between the one or more receiver entities and the (Fig. 5. Synthesizing the effect of the force. A force (shown by a yellow arrow) is applied to a point on the surface of the chair. The three pictures on the right show different time steps of the scene simulated in the physics engine. There is a red circle around the object that moves, pg. 10)
	Mai teaches one or more receiver entities and one or more sender entities (wherein information from which the attributes of the object of interest 100 can be determined is received as input [0102]; FIG. 3 shows the belief network 300 for two attributes, a1 represented by a node 360 (sender) and a2 represented by a node 370 (sender),  and the output of their corresponding attribute detectors, d1 represented by a node 365 (receiver) and d2 represented by a node 375 (receiver) [0091])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Mottaghi to incorporate the teachings of Mai for the benefit of increasing the detectability of the attribute of the object and to determine the confidence that the candidate object is the object of interest. (Mai, abstract)
	Modified Mottaghi does not explicitly teach defining (i) a state matrix as a matrix whose i-th column represents a state of entity i, (ii) a receiver matrix as a NO×NR matrix, where NO represents the total number of entities and NR represents the total number of relationships, and wherein each column of the receiver matrix contains zero entries except for the position of an entity that is a receiver of the corresponding relationships, and (iii) a sender matrix as a NO×NR matrix, wherein each column of the sender matrix contains zero entries except for the position of an entity that is a sender of the 
	Simard teaches defining (i) a state matrix as a matrix whose i-th column represents a state of entity i, (ii) a receiver matrix as a NO×NR matrix, where NO represents the total number of entities and NR represents the total number of relationships (each column of matrix 214 contains zero entries, Fig. 2) and 
	(iii) a sender matrix as a NO×NR matrix, wherein each column of the sender matrix contains zero entries except for the position of an entity that is a sender of the corresponding relationships; (columns of matrix 212 contain zero entries, Fig. 2)
	multiplying the defined receiver matrix (matrix 214, Fig. 2) and 
	the defined sender matrix (matrix 212, Fig. 2) 
	by the defined state matrix; (Fig. 2)
	concatenating the multiplied matrices to generate an input for the first MLP; (matrix 214 can be rewritten accordingly by concatenating into columns the kernels associated with respective output features [0033])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Modified Mottaghi to incorporate the teachings of Simard for the benefit of generating an aggregated input matrix which can be convolved with a kernel matrix to produce output feature information for multiple output features concurrently (Simard, abstract)
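	For illustration only, the matrix operations recited in claim 8 (a state matrix whose i-th column is the state of entity i, one-hot receiver and sender matrices of size NO×NR, multiplication by the state matrix, and concatenation) can be sketched as follows. The helper names, the three-entity example, and all numeric values are hypothetical; the sketch is one plausible reading of the claim language and does not characterize Simard, Mottaghi, or the applicant's implementation.

```python
def matmul(A, B):
    """Plain nested-list matrix product A (m x k) times B (k x n)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def one_hot_columns(entity_indices, n_entities):
    """Build an N_O x N_R matrix whose r-th column is zero except at
    the row of the entity named by entity_indices[r]."""
    return [[1 if entity_indices[r] == e else 0
             for r in range(len(entity_indices))]
            for e in range(n_entities)]


# Hypothetical example: N_O = 3 entities with 2-dimensional states.
# Column i of O is the state of entity i.
O = [[1, 2, 3],
     [10, 20, 30]]

# N_R = 2 relationships, written as (sender, receiver) index pairs.
relations = [(0, 1), (2, 1)]
R_s = one_hot_columns([s for s, _ in relations], 3)   # sender matrix
R_r = one_hot_columns([r for _, r in relations], 3)   # receiver matrix

OR_r = matmul(O, R_r)   # column r holds the receiver state of relation r
OR_s = matmul(O, R_s)   # column r holds the sender state of relation r

# Concatenate along the feature axis to form the per-relation input.
B = OR_r + OR_s
```

	Each column of B then pairs the receiver state and the sender state of one relationship, which is the form of input the claim feeds to the first MLP.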
	
(To obtain a large number of observations of forces and objects to train this model, we collect a new dataset using physics engines, pg. 2, second para.) and 
	to generate an input for the second MLP (A Recurrent Neural Network (RNN) that receives the output of the two CNNs and generates the object motion (or equivalently, a sequence of vectors that represent the velocity of the object at each time step), pg. 6, 5.1 Model architecture)
	Simard teaches wherein concatenating the multiplied matrices further comprises concatenating the multiplied matrices and a matrix representing the relationships of different types (a neural network can comprise receiving multiple input features and unfolding such features to generate an unfolded input feature matrix. Additionally, input features can be concatenated to reduce matrix size while preserving essential input feature information [0010])
	The same motivation to combine as dependent claim 8 applies here.

	Regarding claim 10, Modified Mottaghi teaches the system of claim 8, Mottaghi teaches wherein processing the received input to produce as output a respective prediction of the state of each of the one or more receiver entities and one or more sender entities (The recurrent part of our network receives I as input and generates a sequence of velocity vectors. The advantage of using a Recurrent Neural Network (RNN) is twofold. First, the velocities at different time steps are dependent on each other, and the RNN can capture these temporal dependencies. Second, RNNs enable us to predict a variable-length sequence of velocities (the objects move different distances depending on the magnitude of the force and the structure of the scene). We show the unfolded RNN in Figure 4, pg. 7, second para.) comprises:
	generate an input for the second MLP; (A Recurrent Neural Network (RNN) that receives the output of the two CNNs and generates the object motion (or equivalently, a sequence of vectors that represent the velocity of the object at each time step), pg. 6, 5.1 Model architecture)
	processing the input using the second MLP to produce as output a respective prediction of a subsequent state of each of the one or more receiver entities and one or more sender entities. (The recurrent part of our network receives I as input and generates a sequence of velocity vectors. The advantage of using a Recurrent Neural Network (RNN) is twofold. First, the velocities at different time steps are dependent on each other, and the RNN can capture these temporal dependencies. Second, RNNs enable us to predict a variable-length sequence of velocities (the objects move different distances depending on the magnitude of the force and the structure of the scene). We show the unfolded RNN in Figure 4, pg. 7, second para.)
	Mai teaches one or more receiver entities and one or more sender entities (wherein information from which the attributes of the object of interest 100 can be determined is received as input [0102]; FIG. 3 shows the belief network 300 for two attributes, a1 represented by a node 360 (sender) and a2 represented by a node 370 (sender),  and the output of their corresponding attribute detectors, d1 represented by a node 365 (receiver) and d2 represented by a node 375 (receiver) [0091])
	Simard teaches multiplying the effect matrix with a transpose of the defined receiver matrix; (the associated matrix-vector (as opposed to a vector-matrix) solution obtained by taking matrix transposes on both sides of the equation, the associated matrix-vector solution obtained by re-arranging the columns and rows of the kernel matrix [0026])
	concatenating the multiplied effect matrix (a neural network can comprise receiving multiple input features and unfolding such features to generate an unfolded input feature matrix. Additionally, input features can be concatenated to reduce matrix size while preserving essential input feature information [0010])
	with the transpose of the defined receiver matrix with the defined state matrix to; (the associated matrix-vector (as opposed to a vector-matrix) solution obtained by taking matrix transposes on both sides of the equation, the associated matrix-vector solution obtained by re-arranging the columns and rows of the kernel matrix [0026])
	The same motivation to combine as dependent claim 8 applies here.
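	For illustration only, the claim 10 limitations of multiplying the effect matrix by the transpose of the receiver matrix and concatenating the result with the state matrix can be sketched as follows. The effect values, the three-entity example, and the helper names are hypothetical; the sketch reflects one plausible reading of the claim language and is not drawn from any cited reference.

```python
def matmul(A, B):
    """Plain nested-list matrix product A (m x k) times B (k x n)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*A)]


# Hypothetical effect matrix: one 1-dimensional effect per relationship
# (columns are relationships, N_R = 2).
E = [[5, 7]]

# Receiver matrix (N_O = 3 entities, N_R = 2 relationships); both
# relationships have entity 1 as their receiver.
R_r = [[0, 0],
       [1, 1],
       [0, 0]]

# E times R_r transposed sums, per entity, the effects it receives.
aggregated = matmul(E, transpose(R_r))

# State matrix whose i-th column is the state of entity i.
O = [[1, 2, 3],
     [10, 20, 30]]

# Concatenate along the feature axis to form the per-entity input.
C = O + aggregated
```

	Because the product sums the effects arriving at each receiver, the per-entity result does not depend on the order in which the relationships are listed.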

	Regarding claim 11, Mottaghi modified by Simard teaches the system of claim 8, Mottaghi teaches wherein during a system training process the sender and receiver matrices are constant. (‘Identity’ propagates the input to the output with no change, Fig 4. Model)

	Regarding claim 13, Modified Mottaghi teaches the system of claim 1, Mottaghi teaches wherein processing the received input to produce as output a respective prediction of a subsequent state of each of the one or more receiver entities and one or more sender entities (The recurrent part of our network receives I as input and generates a sequence of velocity vectors. The advantage of using a Recurrent Neural Network (RNN) is twofold. First, the velocities at different time steps are dependent on each other, and the RNN can capture these temporal dependencies. Second, RNNs enable us to predict a variable-length sequence of velocities (the objects move different distances depending on the magnitude of the force and the structure of the scene). We show the unfolded RNN in Figure 4, pg. 7, second para.) comprises:
	Mai teaches one or more receiver entities and one or more sender entities (wherein information from which the attributes of the object of interest 100 can be determined is received as input [0102]; FIG. 3 shows the belief network 300 for two attributes, a1 represented by a node 360 (sender) and a2 represented by a node 370 (sender),  and the output of their corresponding attribute detectors, d1 represented by a node 365 (receiver) and d2 represented by a node 375 (receiver) [0091])
	Modified Mottaghi does not explicitly teach aggregating the received multiple effects of the relationships between the one or more receiver entities and one or more sender entities, using one or more commutative and associative operations, wherein the 
	Simard teaches aggregating the received multiple effects of the relationships between the one or more receiver entities and one or more sender entities (concatenating input features in order to aggregate input signal data into a larger matrix [0027])
	using one or more commutative and associative operations, wherein the one or more commutative and associative operations optionally comprise element-wise summations. (A summation sign (Σ) is shown near the inputs … to illustrate that the dot products of the matrices are summed to obtain the output matrices 210 [0032])
	The same motivation to combine as dependent claim 8 applies here.

18.	Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Mottaghi et al (“What happens if...” Learning to Predict the Effect of Forces in Images, arXiv:1603.05600v1 [cs.CV] 17 Mar 2016) in view of Mai et al (US20160092736) in view of Simard et al (US20070086655) and further in view of Weston et al (US20170193390 filed on 12/30/2015)

	Regarding claim 12, Modified Mottaghi teaches the system of claim 1, Mai teaches one or more receiver entities and one or more sender entities (wherein information from which the attributes of the object of interest 100 can be determined is received as input [0102]; FIG. 3 shows the belief network 300 for two attributes, a1 represented by a node 360 (sender) and a2 represented by a node 370 (sender),  and the output of their corresponding attribute detectors, d1 represented by a node 365 (receiver) and d2 represented by a node 375 (receiver) [0091])
	Mottaghi does not explicitly teach wherein the number of output multiple effects of the relationships between the one or more receiver entities and one or more sender entities is equal to the number of input relationships between the one or more receiver entities and one or more sender entities.
	Weston teaches wherein the number of output multiple effects of the relationships between the one or more receiver entities and one or more sender entities is equal to the number of input relationships between the one or more receiver entities and one or more sender entities (Inputs 302, 304, 306, and 308 may be any suitable number of entities. Outputs 312 may be one or more embeddings of entities [0037] which means the number of output entities can be equal to the number of input entities)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Modified Mottaghi to incorporate the teachings of Weston for the benefit of social-networking system 160 used to predict the entities with which the user will positively interact (Weston, [0064])

19.	Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Mottaghi et al (“What happens if...” Learning to Predict the Effect of Forces in Images, arXiv:1603.05600v1 [cs.CV] 17 Mar 2016) in view of Mai et al (US20160092736) and further in view of Agrafiotis et al (US20020099675).

	Regarding claim 17, Modified Mottaghi teaches the system of claim 1, Modified Mottaghi does not explicitly teach wherein the interaction component and dynamical component are trained independently.
	Agrafiotis teaches wherein the interaction component and dynamical component are trained independently (In an embodiment, the projection can be carried out using a multiplicity of neural networks, each of which is trained independently and specializes in the prediction of a subset of the m output features. For example, the system may involve m independent neural networks each of which specializes in the prediction of a single output feature [0089])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Modified Mottaghi to incorporate the teachings of Agrafiotis for the benefit of representing a set of objects in a multidimensional space given a set of pairwise relationships between some of these objects (Agrafiotis [0018])

Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.G./Examiner, Art Unit 2121                                    


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121