Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This is the initial Office action issued in response to patent application 16/913,160, filed on 06/26/2020. Claims 1-30, as originally filed, are currently pending and have been considered below. Claims 1, 7, 13, 19, and 25 are independent claims.

Specification
The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter. See 37 CFR 1.75(d)(1) and MPEP § 608.01(o). Correction of the following is required: claims 19-24 recite a “machine-readable medium,” but the specification does not recite a “machine-readable medium.”


Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/23/2022 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 19-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent-eligible subject matter because they could be considered a signal per se.
Claim 19 recites a “machine-readable medium.” The broadest reasonable interpretation of a claim that recites a “machine-readable medium,” in view of the present specification, typically covers forms of both non-transitory tangible media and transitory propagating signals per se in view of the ordinary and customary meaning of computer-readable media, particularly when the specification is silent. See MPEP 2111.01. When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. § 101 as covering non-statutory subject matter. See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) (transitory embodiments are not directed to statutory subject matter) and Interim Examination Instructions for Evaluating Subject Matter Eligibility Under 35 U.S.C. § 101, Aug. 24, 2009, p. 2; 1351 Off. Gaz. Pat. Off. 212 (2010). Under the broadest reasonable interpretation, the “machine-readable medium” recited in claims 19-24 encompasses a transitory, propagating signal, which is not a process, machine, manufacture, or composition of matter. Nuijten, 500 F.3d at 1357. Therefore, claim 19 “covers material not found in any of the four statutory categories [and thus] falls outside the plainly expressed scope of § 101.” Id. at 1354. A recommended amendment is to recite a “non-transitory computer-readable medium” (emphasis added).
Claims 20-24 are rejected based on the same rationale as discussed above in the rejected claim 19.
Claims 2, 8, 14, and 26 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 2,
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 2 is directed to a processor, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a processor for interaction determination using one or more neural networks, including the following limitation:
perform instance segmentation to identify features for the one or more objects in one or more input images

As drafted, this limitation, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind, including observation, evaluation, judgment, and opinion) but for language amounting to mere instructions to apply. For example, but for that language, the above limitation in the context of claim 2 encompasses performing instance segmentation to identify features for the one or more objects in one or more input images, which corresponds to evaluation and judgment. Accordingly, claim 2 recites an abstract idea.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites only additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements “one or more circuits to use one or more neural networks to generate one or more images indicating one or more interactions between a user and one or more objects in the one or more images” and “wherein the one or more circuits are further to”, as drafted, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. In particular, “one or more circuits to use one or more neural networks to generate one or more images...” amounts to mere instructions to apply because the limitation merely uses a neural network as a tool for generating images. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 8,
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 8 is directed to a system, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a system for interaction determination using one or more neural networks, including the following limitation:
perform instance segmentation to identify features for the one or more objects in one or more input images

As drafted, this limitation, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind, including observation, evaluation, judgment, and opinion) but for language amounting to mere instructions to apply. For example, but for that language, the above limitation in the context of claim 8 encompasses performing instance segmentation to identify features for the one or more objects in one or more input images, which corresponds to evaluation and judgment. Accordingly, claim 8 recites an abstract idea.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites only additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements “one or more processors to use one or more neural networks to generate one or more images indicating one or more interactions between a user and one or more objects in the one or more images” and “wherein the one or more processors are further to”, as drafted, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. In particular, “one or more processors to use one or more neural networks to generate one or more images...” amounts to mere instructions to apply because the limitation merely uses a neural network as a tool for generating images. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.
Regarding claim 14,
Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 14 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for interaction determination using one or more neural networks, including the following limitation:
performing instance segmentation to identify features for the one or more objects in one or more input images

As drafted, this limitation, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind, including observation, evaluation, judgment, and opinion) but for language amounting to mere instructions to apply. For example, but for that language, the above limitation in the context of claim 14 encompasses performing instance segmentation to identify features for the one or more objects in one or more input images, which corresponds to evaluation and judgment. Accordingly, claim 14 recites an abstract idea.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites only additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element “using one or more neural networks to generate one or more images indicating one or more interactions between a user and one or more objects in the one or more images”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. In particular, “using one or more neural networks to generate one or more images...” amounts to mere instructions to apply because the limitation merely uses a neural network as a tool for generating images. Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the additional element does not amount to significantly more. The claim is not patent eligible.
Regarding claim 26,
Claim 26 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 26 is directed to a player training system, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a player training system for interaction determination using one or more neural networks, including the following limitation:
perform instance segmentation to identify features for the one or more objects in one or more input images

As drafted, this limitation, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind, including observation, evaluation, judgment, and opinion) but for language amounting to mere instructions to apply. For example, but for that language, the above limitation in the context of claim 26 encompasses performing instance segmentation to identify features for the one or more objects in one or more input images, which corresponds to evaluation and judgment. Accordingly, claim 26 recites an abstract idea.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites only additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements “one or more processors to use one or more neural networks to generate one or more images indicating one or more interactions between a player and one or more objects in the one or more images” and “wherein the one or more processors are further to”, as drafted, amount to mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. In particular, “one or more processors to use one or more neural networks to generate one or more images...” amounts to mere instructions to apply because the limitation merely uses a neural network as a tool for generating images. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, these additional elements do not amount to significantly more. The claim is not patent eligible.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1, 7, 13, and 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Schwartz et al. (US 11010951 B1).
Regarding Claim 1,
Schwartz et al. teaches a processor, comprising (Schwartz et al., Col. 8 Lines 54-57, “The head-mounted device comprises one or more processors configured to implement the camera 520, the eye computing unit 530, the face computing unit 540, and the avatar rendering unit 550 of the central module 510” teaches the processor). 
one or more circuits to use one or more neural networks to generate one or more images indicating one or more interactions between a user and one or more objects in the one or more images (Schwartz et al., Col. 13 Lines 40-51, “Herein, a computer-readable non-transitory storage medium or media may include one or more semiconductor-based or other integrated circuits (ICs) (such, as for example, field-programmable gate arrays (FPGAs) or application-specific ICs (ASICs)), hard disk drives (HDDs), hybrid hard drives (HHDs), optical discs, optical disc drives (ODDs), magneto-optical discs, magneto-optical drives, floppy diskettes, floppy disk drives (FDDs), magnetic tapes, solid-state drives (SSDs), RAM-drives, SECURE DIGITAL cards or drives, any other suitable computer-readable non-transitory storage media, or any suitable combination of two or more of these, where appropriate” teaches the one or more circuits. FIG. 1 and Col. 4 Lines 14-39, “FIG. 1 illustrates an example diagram of an avatar-rendering system architecture 100, in accordance with certain embodiments. The avatar-rendering system 100 may comprise at least one HMD 112 which utilizes a neural network 120 to render an avatar for users 110 respectively. For example, a HMD 112 a captures one or more images 130 of a user 110 a using its camera and encodes the one or more images 130 into a code 126 via an encoder 122. The code 126 describes a state of the face of the user 110 a. In particular embodiments, the one or more images 130, e.g., inputs, may include geometry information and view-dependent texture information of a subject, e.g., the user 110 a. Furthermore, the one or more image 130 of the user 110 a may be captured from various viewpoints, such that a texture of the avatar can be compensated based on the images from different viewpoints. The decoder 124 then decodes the code 126, which includes the geometry information and the view-dependent texture information of the subject, to render an avatar 140 for the user 110 a, which avatar 140 can be viewed by a user 110 b via his/her HMD 112 b. In particular embodiments, the decoder 124 decodes the code 126 to produce a stereo image of the user 110 a. In particular embodiments, the avatar-rendering process in the avatar-rendering system 100 may be bidirectional. For example, the user 110 b may also render an avatar of the user 110 b to be displayed in the HMD 112 a using his/her HMD 112 b” teaches utilizing a neural network (corresponds to the one or more neural networks) to render an avatar (corresponds to generating one or more images) for the user. Col. 3 Lines 45-55, “For simulating a real eye contact for the user in a display, especially a head-mounted display (HMD), an individual eye model is provided to render the eyeballs of the user. Embodiments described herein provides a method using a neural network to generate the avatar's eyeballs which is separated from the rest of the avatar, such that the face of the avatar is constructed based on (1) a facial mesh and a facial texture and (2) an eyeball mesh and an eyeball texture. Therefore, a gaze of the user described in the present disclosure can be reproduced accurately and vividly in the rendered avatar” teaches interaction of the rendered avatar with the face and eye features (corresponds to one or more objects)).
Regarding Claim 7,
Schwartz et al. teaches a system comprising (Schwartz et al., FIG. 1 and Col. 4 Lines 14-18, “FIG. 1 illustrates an example diagram of an avatar-rendering system architecture 100, in accordance with certain embodiments. The avatar-rendering system 100 may comprise at least one HMD 112 which utilizes a neural network 120 to render an avatar for users 110 respectively” teaches a system).
one or more processors to use one or more neural networks to generate one or more images indicating one or more interactions between a user and one or more objects in the one or more images (Schwartz et al., Col. 8 Lines 54-57, “The head-mounted device comprises one or more processors configured to implement the camera 520, the eye computing unit 530, the face computing unit 540, and the avatar rendering unit 550 of the central module 510” teaches the one or more processors. FIG. 1 and Col. 4 Lines 14-39, “FIG. 1 illustrates an example diagram of an avatar-rendering system architecture 100, in accordance with certain embodiments. The avatar-rendering system 100 may comprise at least one HMD 112 which utilizes a neural network 120 to render an avatar for users 110 respectively. For example, a HMD 112 a captures one or more images 130 of a user 110 a using its camera and encodes the one or more images 130 into a code 126 via an encoder 122. The code 126 describes a state of the face of the user 110 a. In particular embodiments, the one or more images 130, e.g., inputs, may include geometry information and view-dependent texture information of a subject, e.g., the user 110 a. Furthermore, the one or more image 130 of the user 110 a may be captured from various viewpoints, such that a texture of the avatar can be compensated based on the images from different viewpoints. The decoder 124 then decodes the code 126, which includes the geometry information and the view-dependent texture information of the subject, to render an avatar 140 for the user 110 a, which avatar 140 can be viewed by a user 110 b via his/her HMD 112 b. In particular embodiments, the decoder 124 decodes the code 126 to produce a stereo image of the user 110 a. In particular embodiments, the avatar-rendering process in the avatar-rendering system 100 may be bidirectional. For example, the user 110 b may also render an avatar of the user 110 b to be displayed in the HMD 112 a using his/her HMD 112 b” teaches utilizing a neural network (corresponds to the one or more neural networks) to render an avatar (corresponds to generating one or more images) for the user. Col. 3 Lines 45-55, “For simulating a real eye contact for the user in a display, especially a head-mounted display (HMD), an individual eye model is provided to render the eyeballs of the user. Embodiments described herein provides a method using a neural network to generate the avatar's eyeballs which is separated from the rest of the avatar, such that the face of the avatar is constructed based on (1) a facial mesh and a facial texture and (2) an eyeball mesh and an eyeball texture. Therefore, a gaze of the user described in the present disclosure can be reproduced accurately and vividly in the rendered avatar” teaches interaction of the rendered avatar with the face and eye features (corresponds to one or more objects)).
Regarding Claim 13,
Schwartz et al. teaches a method comprising (Schwartz et al., FIG. 6 and Col. 9 Lines 7-9, “FIG. 6 illustrates an example method 600 for rendering an avatar using an individual eyeball model, in accordance with certain embodiments” teaches a method).
using one or more neural networks to generate one or more images indicating one or more interactions between a user and one or more objects in the one or more images (Schwartz et al., FIG. 1 and Col. 4 Lines 14-39, “FIG. 1 illustrates an example diagram of an avatar-rendering system architecture 100, in accordance with certain embodiments. The avatar-rendering system 100 may comprise at least one HMD 112 which utilizes a neural network 120 to render an avatar for users 110 respectively. For example, a HMD 112 a captures one or more images 130 of a user 110 a using its camera and encodes the one or more images 130 into a code 126 via an encoder 122. The code 126 describes a state of the face of the user 110 a. In particular embodiments, the one or more images 130, e.g., inputs, may include geometry information and view-dependent texture information of a subject, e.g., the user 110 a. Furthermore, the one or more image 130 of the user 110 a may be captured from various viewpoints, such that a texture of the avatar can be compensated based on the images from different viewpoints. The decoder 124 then decodes the code 126, which includes the geometry information and the view-dependent texture information of the subject, to render an avatar 140 for the user 110 a, which avatar 140 can be viewed by a user 110 b via his/her HMD 112 b. In particular embodiments, the decoder 124 decodes the code 126 to produce a stereo image of the user 110 a. In particular embodiments, the avatar-rendering process in the avatar-rendering system 100 may be bidirectional. For example, the user 110 b may also render an avatar of the user 110 b to be displayed in the HMD 112 a using his/her HMD 112 b” teaches utilizing a neural network (corresponds to the one or more neural networks) to render an avatar (corresponds to generating one or more images) for the user. Col. 3 Lines 45-55, “For simulating a real eye contact for the user in a display, especially a head-mounted display (HMD), an individual eye model is provided to render the eyeballs of the user. Embodiments described herein provides a method using a neural network to generate the avatar's eyeballs which is separated from the rest of the avatar, such that the face of the avatar is constructed based on (1) a facial mesh and a facial texture and (2) an eyeball mesh and an eyeball texture. Therefore, a gaze of the user described in the present disclosure can be reproduced accurately and vividly in the rendered avatar” teaches interaction of the rendered avatar with the face and eye features (corresponds to one or more objects)).
Regarding Claim 19,
Schwartz et al. teaches a machine-readable medium having stored thereon a set of instructions, which, if performed by one or more processors, cause the one or more processors to at least (Schwartz et al., Col. 13 Lines 40-42, “Herein, a computer-readable non-transitory storage medium or media may include one or more semiconductor-based or other integrated circuits” teaches a computer-readable non-transitory storage medium (corresponds to the machine-readable medium). Col. 11 Lines 43-45, “In particular embodiments, memory 704 includes main memory for storing instructions for processor 702 to execute or data for processor 702 to operate on” teaches a memory (corresponds to the machine-readable medium) that stores instructions executed by a processor (corresponds to the processors). Col. 8 Lines 54-57, “The head-mounted device comprises one or more processors configured to implement the camera 520, the eye computing unit 530, the face computing unit 540, and the avatar rendering unit 550 of the central module 510” teaches the one or more processors).
use one or more neural networks to generate one or more images indicating one or more interactions between a user and one or more objects in the one or more images (Schwartz et al., FIG. 1 and Col. 4 Lines 14-39, “FIG. 1 illustrates an example diagram of an avatar-rendering system architecture 100, in accordance with certain embodiments. The avatar-rendering system 100 may comprise at least one HMD 112 which utilizes a neural network 120 to render an avatar for users 110 respectively. For example, a HMD 112 a captures one or more images 130 of a user 110 a using its camera and encodes the one or more images 130 into a code 126 via an encoder 122. The code 126 describes a state of the face of the user 110 a. In particular embodiments, the one or more images 130, e.g., inputs, may include geometry information and view-dependent texture information of a subject, e.g., the user 110 a. Furthermore, the one or more image 130 of the user 110 a may be captured from various viewpoints, such that a texture of the avatar can be compensated based on the images from different viewpoints. The decoder 124 then decodes the code 126, which includes the geometry information and the view-dependent texture information of the subject, to render an avatar 140 for the user 110 a, which avatar 140 can be viewed by a user 110 b via his/her HMD 112 b. In particular embodiments, the decoder 124 decodes the code 126 to produce a stereo image of the user 110 a. In particular embodiments, the avatar-rendering process in the avatar-rendering system 100 may be bidirectional. For example, the user 110 b may also render an avatar of the user 110 b to be displayed in the HMD 112 a using his/her HMD 112 b” teaches utilizing a neural network (corresponds to the one or more neural networks) to render an avatar (corresponds to generating one or more images) for the user. Col. 3 Lines 45-55, “For simulating a real eye contact for the user in a display, especially a head-mounted display (HMD), an individual eye model is provided to render the eyeballs of the user. Embodiments described herein provides a method using a neural network to generate the avatar's eyeballs which is separated from the rest of the avatar, such that the face of the avatar is constructed based on (1) a facial mesh and a facial texture and (2) an eyeball mesh and an eyeball texture. Therefore, a gaze of the user described in the present disclosure can be reproduced accurately and vividly in the rendered avatar” teaches interaction of the rendered avatar with the face and eye features (corresponds to one or more objects)).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 8, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Schwartz et al. in view of Feng et al. (“Computer vision algorithms and hardware implementations: A survey”).
Regarding Claim 2,
Schwartz et al. teaches the processor of claim 1.
Schwartz et al. does not appear to explicitly teach wherein the one or more circuits are further to perform instance segmentation to identify features for the one or more objects in one or more input images.
However, Feng et al. teaches wherein the one or more circuits are further to perform instance segmentation to identify features for the one or more objects in one or more input images (Feng et al., Section 1, Pg. 310, “summarize the notable hardware units including GPUs, field-programmable gate arrays (FPGAs) and other advanced mobile hardware platforms that are adapted or designed to accelerate DNN-based computer vision algorithms” teaches one or more circuits. Fig. 2(d) and Section 2.3, Pg. 312, “instance segmentation, which predicts different labels for different object instances as a further improvement to semantic segmentation, as shown in Fig. 2 (d)” teaches instance segmentation that predicts different labels for different object instances (corresponding to identifying features for the one or more objects) in the input images).
Schwartz et al. and Feng et al. are analogous art because they are from the same field of endeavor and the same problem-solving area, namely, neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. with Feng et al. such that the one or more circuits are further to perform instance segmentation to identify features for the one or more objects in one or more input images. The motivation is provided by Feng et al.: “In this paper, we conduct a comprehensive survey on computer vision techniques. Specially, we have highlighted the recent accomplishments in both the algorithms for a variety of computer vision tasks such as image classification, object detection and image segmentation, and the promising hardware platforms to implement DNNs efficiently for practical applications, such as GPUs, FPGAs and other new generation of hardware accelerators” (Feng et al., Conclusion). The proposed combination is beneficial in that it facilitates real-time and/or energy-efficient operation.
Regarding Claim 8,
Schwartz et al. teaches the system of claim 7,
Schwartz et al. does not appear to explicitly teach wherein the one or more processors are further to perform instance segmentation to identify features for the one or more objects in one or more input images.
However, Feng et al. teaches wherein the one or more processors are further to perform instance segmentation to identify features for the one or more objects in one or more input images (Feng et al., Col. 16 Lines 25-29, “The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers” teaches the one or more processors. Section 1 Pg. 310, “summarize the notable hardware units including GPUs, field-programmable gate arrays (FPGAs) and other advanced mobile hardware platforms that are adapted or designed to accelerate DNN-based computer vision algorithms” teaches one or more circuits. Fig. 2(d) and Section 2.3 Pg. 312, “instance segmentation, which predicts different labels for different object instances as a further improvement to semantic segmentation, as shown in Fig. 2 (d)” teaches instance segmentation that predicts different labels for different object instances (corresponds to identifying features for the one or more objects) in the input images).
Schwartz et al. in view of Feng et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. with Feng et al., with motivation wherein the one or more processors are further to perform instance segmentation to identify features for the one or more objects in one or more input images. “In this paper, we conduct a comprehensive survey on computer vision techniques. Specially, we have highlighted the recent accomplishments in both the algorithms for a variety of computer vision tasks such as image classification, object detection and image segmentation, and the promising hardware platforms to implement DNNs efficiently for practical applications, such as GPUs, FPGAs and other new generation of hardware accelerators” (Feng et al., Conclusion). The proposed teaching is beneficial in that it facilitates real-time and/or energy-efficient operations.
Regarding Claim 14,
Schwartz et al. teaches the method of claim 13, further comprising:
Schwartz et al. does not appear to explicitly teach performing instance segmentation to identify features for the one or more objects in one or more input images.
However, Feng et al. teaches performing instance segmentation to identify features for the one or more objects in one or more input images (Feng et al., Fig. 2(d) and Section 2.3 Pg. 312, “instance segmentation, which predicts different labels for different object instances as a further improvement to semantic segmentation, as shown in Fig. 2 (d)” teaches instance segmentation that predicts different labels for different object instances (corresponds to identifying features for the one or more objects) in the input images).
Schwartz et al. in view of Feng et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. with Feng et al., with motivation of performing instance segmentation to identify features for the one or more objects in one or more input images. “In this paper, we conduct a comprehensive survey on computer vision techniques. Specially, we have highlighted the recent accomplishments in both the algorithms for a variety of computer vision tasks such as image classification, object detection and image segmentation, and the promising hardware platforms to implement DNNs efficiently for practical applications, such as GPUs, FPGAs and other new generation of hardware accelerators” (Feng et al., Conclusion). The proposed teaching is beneficial in that it facilitates real-time and/or energy-efficient operations.
Regarding Claim 20,
Schwartz et al. teaches the machine-readable medium of claim 19, wherein the instructions if performed further cause the one or more processors to:
Schwartz et al. does not appear to explicitly teach perform instance segmentation to identify features for the one or more objects in one or more input images.
However, Feng et al. teaches perform instance segmentation to identify features for the one or more objects in one or more input images (Feng et al., Section 1 Pg. 310, “summarize the notable hardware units including GPUs, field-programmable gate arrays (FPGAs) and other advanced mobile hardware platforms that are adapted or designed to accelerate DNN-based computer vision algorithms” teaches one or more circuits. Fig. 2(d) and Section 2.3 Pg. 312, “instance segmentation, which predicts different labels for different object instances as a further improvement to semantic segmentation, as shown in Fig. 2 (d)” teaches instance segmentation that predicts different labels for different object instances (corresponds to identifying features for the one or more objects) in the input images).
Schwartz et al. in view of Feng et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. with Feng et al., with motivation to perform instance segmentation to identify features for the one or more objects in one or more input images. “In this paper, we conduct a comprehensive survey on computer vision techniques. Specially, we have highlighted the recent accomplishments in both the algorithms for a variety of computer vision tasks such as image classification, object detection and image segmentation, and the promising hardware platforms to implement DNNs efficiently for practical applications, such as GPUs, FPGAs and other new generation of hardware accelerators” (Feng et al., Conclusion). The proposed teaching is beneficial in that it facilitates real-time and/or energy-efficient operations.
Claims 3, 5, 9, 11, 15, 17, 21, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Schwartz et al. in view of Feng et al. in view of Dunning et al. (US 10872293 B2).
Regarding Claim 3,
Schwartz et al. in view of Feng et al. teaches the processor of claim 2, 
Schwartz et al. in view of Feng et al. does not appear to explicitly teach wherein the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space, the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects.
However, Dunning et al. teaches wherein the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space (Dunning et al., Col. 13 Lines 33-43, “The system can then train a variational autoencoder, in particular a variational sequence autoencoder, with the captured temporal sequences. The variational sequence autoencoder may include a recurrent neural network encoder to encode an input data sequence as a set of latent variables, coupled to a recurrent neural network decoder to decode the set of latent variables to produce an output data sequence, i.e., to reproduce the output data sequence from the input data sequence. During training the latent variables are constrained to approximate a defined distribution, for example a Gaussian distribution” teaches a variational autoencoder to encode the input data sequence (corresponds to the features of the one or more objects) into a set of latent variables (corresponds to a latent space)).
the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects (Dunning et al., Col. 14 Lines 35-48, “Thus a set of latent variables of the variational sequence autoencoder may provide a latent representation of a sequence of observations in terms of the high level features. A sequence of sets of latent variables derived from the further observations may be processed using a mixture model to identify clusters. For example each of the sets of latent variables may be mapped to one of K components or clusters using the mixture model, which may be a Gaussian mixture model. Each component or cluster may correspond to a different temporally extending behavior derived from features representing high-level characteristics of the agent-environment system. These may be considered as prototypical behaviors for the agent, each extending over the timeframe of the sequences used to train the autoencoder” teaches the variational sequence autoencoder deriving latent representations that are mapped to interactions with the agent-environment system, with observations of images and object position data (corresponds to the one or more objects)).
Schwartz et al. in view of Feng et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. and Feng et al. with Dunning et al., with motivation wherein the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space, the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects. “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1-2 Lines 66-67 and Lines 1-5). The proposed teaching is beneficial in that it improves the agent’s performance.
Regarding Claim 5,
Schwartz et al. in view of Feng et al. in view of Dunning et al. teaches the processor of claim 3,
Dunning et al. further teaches wherein the VAE is trained using unsupervised learning to determine the one or more interactions for one or more potential states (Dunning et al., Col. 13 Lines 44-53, “In some implementations the system also captures a further sequence of observations of the environment as the agent interacts with the environment. The system can then process the further sequence of observations using the trained variational sequence autoencoder to determine a sequence of sets of latent variables. Optionally, the system can process the sequence of sets of latent variables to identify a sequence of clusters in a space of the latent variables, each of the clusters representing a temporally extending behaviour pattern of the agent” teaches a variational autoencoder trained with a sequence of clusters (corresponds to unsupervised learning) to determine behavior patterns (corresponds to the one or more interactions for one or more potential states)).
Schwartz et al. in view of Feng et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. and Feng et al. with Dunning et al., with motivation wherein the VAE is trained using unsupervised learning to determine the one or more interactions for one or more potential states. “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1-2 Lines 66-67 and Lines 1-5). The proposed teaching is beneficial in that it improves the agent’s performance.
Regarding Claim 9,
Schwartz et al. in view of Feng et al. teaches the system of claim 8,
Schwartz et al. in view of Feng et al. does not appear to explicitly teach wherein the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space, the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects.
However, Dunning et al. teaches wherein the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space (Dunning et al., Col. 13 Lines 33-43, “The system can then train a variational autoencoder, in particular a variational sequence autoencoder, with the captured temporal sequences. The variational sequence autoencoder may include a recurrent neural network encoder to encode an input data sequence as a set of latent variables, coupled to a recurrent neural network decoder to decode the set of latent variables to produce an output data sequence, i.e., to reproduce the output data sequence from the input data sequence. During training the latent variables are constrained to approximate a defined distribution, for example a Gaussian distribution” teaches a variational autoencoder to encode the input data sequence (corresponds to the features of the one or more objects) into a set of latent variables (corresponds to a latent space)).
the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects (Dunning et al., Col. 14 Lines 35-48, “Thus a set of latent variables of the variational sequence autoencoder may provide a latent representation of a sequence of observations in terms of the high level features. A sequence of sets of latent variables derived from the further observations may be processed using a mixture model to identify clusters. For example each of the sets of latent variables may be mapped to one of K components or clusters using the mixture model, which may be a Gaussian mixture model. Each component or cluster may correspond to a different temporally extending behavior derived from features representing high-level characteristics of the agent-environment system. These may be considered as prototypical behaviors for the agent, each extending over the timeframe of the sequences used to train the autoencoder” teaches the variational sequence autoencoder deriving latent representations that are mapped to interactions with the agent-environment system, with observations of images and object position data (corresponds to the one or more objects)).
Schwartz et al. in view of Feng et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. and Feng et al. with Dunning et al., with motivation wherein the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space, the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects. “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1-2 Lines 66-67 and Lines 1-5). The proposed teaching is beneficial in that it improves the agent’s performance.
Regarding Claim 11,
Schwartz et al. in view of Feng et al. in view of Dunning et al. teaches the system of claim 9,
Dunning et al. further teaches wherein the VAE is trained using unsupervised learning to determine the one or more interactions for one or more potential states (Dunning et al., Col. 13 Lines 44-53, “In some implementations the system also captures a further sequence of observations of the environment as the agent interacts with the environment. The system can then process the further sequence of observations using the trained variational sequence autoencoder to determine a sequence of sets of latent variables. Optionally, the system can process the sequence of sets of latent variables to identify a sequence of clusters in a space of the latent variables, each of the clusters representing a temporally extending behaviour pattern of the agent” teaches a variational autoencoder trained with a sequence of clusters (corresponds to unsupervised learning) to determine behavior patterns (corresponds to the one or more interactions for one or more potential states)).
Schwartz et al. in view of Feng et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. and Feng et al. with Dunning et al., with motivation wherein the VAE is trained using unsupervised learning to determine the one or more interactions for one or more potential states. “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1-2 Lines 66-67 and Lines 1-5). The proposed teaching is beneficial in that it improves the agent’s performance.
Regarding Claim 15,
Schwartz et al. in view of Feng et al. teaches the method of claim 14,
Schwartz et al. in view of Feng et al. does not appear to explicitly teach wherein the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space, the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects.
However, Dunning et al. teaches wherein the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space (Dunning et al., Col. 13 Lines 33-43, “The system can then train a variational autoencoder, in particular a variational sequence autoencoder, with the captured temporal sequences. The variational sequence autoencoder may include a recurrent neural network encoder to encode an input data sequence as a set of latent variables, coupled to a recurrent neural network decoder to decode the set of latent variables to produce an output data sequence, i.e., to reproduce the output data sequence from the input data sequence. During training the latent variables are constrained to approximate a defined distribution, for example a Gaussian distribution” teaches a variational autoencoder to encode the input data sequence (corresponds to the features of the one or more objects) into a set of latent variables (corresponds to a latent space)).
the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects (Dunning et al., Col. 14 Lines 35-48, “Thus a set of latent variables of the variational sequence autoencoder may provide a latent representation of a sequence of observations in terms of the high level features. A sequence of sets of latent variables derived from the further observations may be processed using a mixture model to identify clusters. For example each of the sets of latent variables may be mapped to one of K components or clusters using the mixture model, which may be a Gaussian mixture model. Each component or cluster may correspond to a different temporally extending behavior derived from features representing high-level characteristics of the agent-environment system. These may be considered as prototypical behaviors for the agent, each extending over the timeframe of the sequences used to train the autoencoder” teaches the variational sequence autoencoder deriving latent representations that are mapped to interactions with the agent-environment system, with observations of images and object position data (corresponds to the one or more objects)).
Schwartz et al. in view of Feng et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. and Feng et al. with Dunning et al., with motivation wherein the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space, the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects. “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1-2 Lines 66-67 and Lines 1-5). The proposed teaching is beneficial in that it improves the agent’s performance.
Regarding Claim 17,
Schwartz et al. in view of Feng et al. in view of Dunning et al. teaches the method of claim 15,
Dunning et al. further teaches wherein the VAE is trained using unsupervised learning to determine the one or more interactions for one or more potential states (Dunning et al., Col. 13 Lines 44-53, “In some implementations the system also captures a further sequence of observations of the environment as the agent interacts with the environment. The system can then process the further sequence of observations using the trained variational sequence autoencoder to determine a sequence of sets of latent variables. Optionally, the system can process the sequence of sets of latent variables to identify a sequence of clusters in a space of the latent variables, each of the clusters representing a temporally extending behaviour pattern of the agent” teaches a variational autoencoder trained with a sequence of clusters (corresponds to unsupervised learning) to determine behavior patterns (corresponds to the one or more interactions for one or more potential states)).
Schwartz et al. in view of Feng et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. and Feng et al. with Dunning et al., with motivation wherein the VAE is trained using unsupervised learning to determine the one or more interactions for one or more potential states. “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1-2 Lines 66-67 and Lines 1-5). The proposed teaching is beneficial in that it improves the agent’s performance.
Regarding Claim 21,
Schwartz et al. in view of Feng et al. teaches the machine-readable medium of claim 20,
Schwartz et al. in view of Feng et al. does not appear to explicitly teach wherein the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space, the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects.
However, Dunning et al. teaches wherein the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space (Dunning et al., Col. 13 Lines 33-43, “The system can then train a variational autoencoder, in particular a variational sequence autoencoder, with the captured temporal sequences. The variational sequence autoencoder may include a recurrent neural network encoder to encode an input data sequence as a set of latent variables, coupled to a recurrent neural network decoder to decode the set of latent variables to produce an output data sequence, i.e., to reproduce the output data sequence from the input data sequence. During training the latent variables are constrained to approximate a defined distribution, for example a Gaussian distribution” teaches a variational autoencoder to encode the input data sequence (corresponds to the features of the one or more objects) into a set of latent variables (corresponds to a latent space)).
the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects (Dunning et al., Col. 14 Lines 35-48, “Thus a set of latent variables of the variational sequence autoencoder may provide a latent representation of a sequence of observations in terms of the high level features. A sequence of sets of latent variables derived from the further observations may be processed using a mixture model to identify clusters. For example each of the sets of latent variables may be mapped to one of K components or clusters using the mixture model, which may be a Gaussian mixture model. Each component or cluster may correspond to a different temporally extending behavior derived from features representing high-level characteristics of the agent-environment system. These may be considered as prototypical behaviors for the agent, each extending over the timeframe of the sequences used to train the autoencoder” teaches the variational sequence autoencoder deriving latent representations that are mapped to interactions with the agent-environment system, with observations of images and object position data (corresponds to the one or more objects)).
Schwartz et al. in view of Feng et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. and Feng et al. with Dunning et al., with motivation wherein the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space, the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects. “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1-2 Lines 66-67 and Lines 1-5). The proposed teaching is beneficial in that it improves the agent’s performance.
Regarding Claim 23,
Schwartz et al. in view of Feng et al. in view of Dunning et al. teaches the machine-readable medium of claim 21,
Dunning et al. further teaches wherein the VAE is trained using unsupervised learning to determine the one or more interactions for one or more potential states (Dunning et al., Col. 13 Lines 44-53, “In some implementations the system also captures a further sequence of observations of the environment as the agent interacts with the environment. The system can then process the further sequence of observations using the trained variational sequence autoencoder to determine a sequence of sets of latent variables. Optionally, the system can process the sequence of sets of latent variables to identify a sequence of clusters in a space of the latent variables, each of the clusters representing a temporally extending behaviour pattern of the agent” teaches a variational autoencoder trained with a sequence of clusters (corresponds to unsupervised learning) to determine behavior patterns (corresponds to the one or more interactions for one or more potential states)).
Schwartz et al. in view of Feng et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. and Feng et al. with Dunning et al., with motivation wherein the VAE is trained using unsupervised learning to determine the one or more interactions for one or more potential states. “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1-2 Lines 66-67 and Lines 1-5). The proposed teaching is beneficial in that it improves the agent’s performance.
Claims 4, 10, 16, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Schwartz et al. in view of Feng et al. in view of Dunning et al. in further view of Ma et al. (“Background Augmentation Generative Adversarial Networks (BAGANs): Effective Data Generation Based on GAN-Augmented 3D Synthesizing”).
Regarding Claim 4,
Schwartz et al. in view of Feng et al. in view of Dunning et al. teaches the processor of claim 3, 
Schwartz et al. further teaches the generative network accepting as input at least the latent space and the mappings (Schwartz et al., Col. 4 Lines 49-51, “In particular embodiments, the ML model may be an autoencoder, a generative adversarial network, or any other suitable ML architecture” teaches the machine-learning model being a generative adversarial network.  FIG. 1 and Col. 4 Lines 19-29, “For example, a HMD 112 a captures one or more images 130 of a user 110 a using its camera and encodes the one or more images 130 into a code 126 via an encoder 122. The code 126 describes a state of the face of the user 110 a. In particular embodiments, the one or more images 130, e.g., inputs, may include geometry information and view-dependent texture information of a subject, e.g., the user 110 a. Furthermore, the one or more image 130 of the user 110 a may be captured from various viewpoints, such that a texture of the avatar can be compensated based on the images from different viewpoints” teaches a neural network (corresponds to generative network) that receives interaction (corresponds to mapping) of the face and eye features of the user. Col. 5 Lines 24-29, “In order to output an image for an assigned/specific region, e.g., the left eye of the user, the eye model fixes/preserves a latent segmented face code 260 corresponding to a region 270 on the face, randomly mixes all other segmented face codes 262, 264 in the code 210, and decodes outputs” teaches utilizing a latent segment code (corresponds to latent space) as an input to generate an image for an assigned/specific region).
Schwartz et al. in view of Feng et al. in view of Dunning et al. does not appear to explicitly teach wherein the one or more neural networks include a generative network for generating the one or more images indicating the one or more interactions.
However, Ma et al. teaches wherein the one or more neural networks include a generative network for generating the one or more images indicating the one or more interactions (Ma et al., Section 3.2.2 Pg. 8, “If the integrity of the background image xback is considered too great, the BAGAN degenerate to the network sample generated by the ACGAN” teaches one or more neural networks. Figure 10 and Section 3.1 Pg. 6, “Background augmentation generative adversarial networks (BAGANs) do not directly generate foreground target information associated with the visual intelligence algorithm. This ensures any picture of our synthesis data has guaranteed foreground appearance. The BAGAN is responsible for generating related backgrounds only according to category information and pose annotations for foreground objects” teaches the BAGAN’s generator (corresponds to the generative network), which generates synthetic images showing the interaction of the background and the foreground objects). 
Schwartz et al. in view of Feng et al. in view of Dunning et al. in view of Ma et al. are analogous art because they are from the same field of endeavor and are from the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al., Feng et al., and Dunning et al. with Ma et al. such that the one or more neural networks include a generative network for generating the one or more images indicating the one or more interactions. The motivation to combine is provided by Ma et al.: “Our approach has been validated to have better performance than other methods through image recognition tasks with respect to the natural image database ObjectNet3D. This study can shorten the algorithm development time of AR and expand its application scope, which is of great significance for immersive interactive systems” (Ma et al., Abstract). The proposed teaching is beneficial in that it yields better performance than other methods, shortens the algorithm development time of AR, and expands its application scope.
Regarding Claim 10,
Schwartz et al. in view of Feng et al. in view of Dunning et al. teaches the system of claim 9, 
Schwartz et al. further teaches the generative network accepting as input at least the latent space and the mappings (Schwartz et al., Col. 4 Lines 49-51, “In particular embodiments, the ML model may be an autoencoder, a generative adversarial network, or any other suitable ML architecture” teaches the machine-learning model being a generative adversarial network.  FIG. 1 and Col. 4 Lines 19-29, “For example, a HMD 112 a captures one or more images 130 of a user 110 a using its camera and encodes the one or more images 130 into a code 126 via an encoder 122. The code 126 describes a state of the face of the user 110 a. In particular embodiments, the one or more images 130, e.g., inputs, may include geometry information and view-dependent texture information of a subject, e.g., the user 110 a. Furthermore, the one or more image 130 of the user 110 a may be captured from various viewpoints, such that a texture of the avatar can be compensated based on the images from different viewpoints” teaches a neural network (corresponds to generative network) that receives interaction (corresponds to mapping) of the face and eye features of the user. Col. 5 Lines 24-29, “In order to output an image for an assigned/specific region, e.g., the left eye of the user, the eye model fixes/preserves a latent segmented face code 260 corresponding to a region 270 on the face, randomly mixes all other segmented face codes 262, 264 in the code 210, and decodes outputs” teaches utilizing a latent segment code (corresponds to latent space) as an input to generate an image for an assigned/specific region).
Schwartz et al. in view of Feng et al. in view of Dunning et al. does not appear to explicitly teach wherein the one or more neural networks include a generative network for generating the one or more images indicating the one or more interactions.
However, Ma et al. teaches wherein the one or more neural networks include a generative network for generating the one or more images indicating the one or more interactions (Ma et al., Section 3.2.2 Pg. 8, “If the integrity of the background image xback is considered too great, the BAGAN degenerate to the network sample generated by the ACGAN” teaches one or more neural networks. Figure 4 and Section 3.1 Pg. 6, “Background augmentation generative adversarial networks (BAGANs) do not directly generate foreground target information associated with the visual intelligence algorithm. This ensures any picture of our synthesis data has guaranteed foreground appearance. The BAGAN is responsible for generating related backgrounds only according to category information and pose annotations for foreground objects” teaches the BAGAN’s generator (corresponds to the generative network), which generates one or more final images. Section 5 Pg. 16, “Moreover, for industrial development, high-quality data are easier to obtain from real images than from real photos. Therefore, our approach is to promote immersive HCI with good exploration” teaches human-computer interaction with real images).
Schwartz et al. in view of Feng et al. in view of Dunning et al. in view of Ma et al. are analogous art because they are from the same field of endeavor and are from the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al., Feng et al., and Dunning et al. with Ma et al. such that the one or more neural networks include a generative network for generating the one or more images indicating the one or more interactions. The motivation to combine is provided by Ma et al.: “Our approach has been validated to have better performance than other methods through image recognition tasks with respect to the natural image database ObjectNet3D. This study can shorten the algorithm development time of AR and expand its application scope, which is of great significance for immersive interactive systems” (Ma et al., Abstract). The proposed teaching is beneficial in that it yields better performance than other methods, shortens the algorithm development time of AR, and expands its application scope.
Regarding Claim 16,
Schwartz et al. in view of Feng et al. in view of Dunning et al. teaches the method of claim 15, 
Schwartz et al. further teaches the generative network accepting as input at least the latent space and the mappings (Schwartz et al., Col. 4 Lines 49-51, “In particular embodiments, the ML model may be an autoencoder, a generative adversarial network, or any other suitable ML architecture” teaches the machine-learning model being a generative adversarial network.  FIG. 1 and Col. 4 Lines 19-29, “For example, a HMD 112 a captures one or more images 130 of a user 110 a using its camera and encodes the one or more images 130 into a code 126 via an encoder 122. The code 126 describes a state of the face of the user 110 a. In particular embodiments, the one or more images 130, e.g., inputs, may include geometry information and view-dependent texture information of a subject, e.g., the user 110 a. Furthermore, the one or more image 130 of the user 110 a may be captured from various viewpoints, such that a texture of the avatar can be compensated based on the images from different viewpoints” teaches a neural network (corresponds to generative network) that receives interaction (corresponds to mapping) of the face and eye features of the user. Col. 5 Lines 24-29, “In order to output an image for an assigned/specific region, e.g., the left eye of the user, the eye model fixes/preserves a latent segmented face code 260 corresponding to a region 270 on the face, randomly mixes all other segmented face codes 262, 264 in the code 210, and decodes outputs” teaches utilizing a latent segment code (corresponds to latent space) as an input to generate an image for an assigned/specific region).
Schwartz et al. in view of Feng et al. in view of Dunning et al. does not appear to explicitly teach wherein the one or more neural networks include a generative network for generating the one or more images indicating the one or more interactions.
However, Ma et al. teaches wherein the one or more neural networks include a generative network for generating the one or more images indicating the one or more interactions (Ma et al., Section 3.2.2 Pg. 8, “If the integrity of the background image xback is considered too great, the BAGAN degenerate to the network sample generated by the ACGAN” teaches one or more neural networks. Figure 4 and Section 3.1 Pg. 6, “Background augmentation generative adversarial networks (BAGANs) do not directly generate foreground target information associated with the visual intelligence algorithm. This ensures any picture of our synthesis data has guaranteed foreground appearance. The BAGAN is responsible for generating related backgrounds only according to category information and pose annotations for foreground objects” teaches the BAGAN’s generator (corresponds to the generative network), which generates one or more final images. Section 5 Pg. 16, “Moreover, for industrial development, high-quality data are easier to obtain from real images than from real photos. Therefore, our approach is to promote immersive HCI with good exploration” teaches human-computer interaction with real images).
Schwartz et al. in view of Feng et al. in view of Dunning et al. in view of Ma et al. are analogous art because they are from the same field of endeavor and are from the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al., Feng et al., and Dunning et al. with Ma et al. such that the one or more neural networks include a generative network for generating the one or more images indicating the one or more interactions. The motivation to combine is provided by Ma et al.: “Our approach has been validated to have better performance than other methods through image recognition tasks with respect to the natural image database ObjectNet3D. This study can shorten the algorithm development time of AR and expand its application scope, which is of great significance for immersive interactive systems” (Ma et al., Abstract). The proposed teaching is beneficial in that it yields better performance than other methods, shortens the algorithm development time of AR, and expands its application scope.
Regarding Claim 22,
Schwartz et al. in view of Feng et al. in view of Dunning et al. teaches the machine-readable medium of claim 21, 
Schwartz et al. further teaches the generative network accepting as input at least the latent space and the mappings (Schwartz et al., Col. 4 Lines 49-51, “In particular embodiments, the ML model may be an autoencoder, a generative adversarial network, or any other suitable ML architecture” teaches the machine-learning model being a generative adversarial network.  FIG. 1 and Col. 4 Lines 19-29, “For example, a HMD 112 a captures one or more images 130 of a user 110 a using its camera and encodes the one or more images 130 into a code 126 via an encoder 122. The code 126 describes a state of the face of the user 110 a. In particular embodiments, the one or more images 130, e.g., inputs, may include geometry information and view-dependent texture information of a subject, e.g., the user 110 a. Furthermore, the one or more image 130 of the user 110 a may be captured from various viewpoints, such that a texture of the avatar can be compensated based on the images from different viewpoints” teaches a neural network (corresponds to generative network) that receives interaction (corresponds to mapping) of the face and eye features of the user. Col. 5 Lines 24-29, “In order to output an image for an assigned/specific region, e.g., the left eye of the user, the eye model fixes/preserves a latent segmented face code 260 corresponding to a region 270 on the face, randomly mixes all other segmented face codes 262, 264 in the code 210, and decodes outputs” teaches utilizing a latent segment code (corresponds to latent space) as an input to generate an image for an assigned/specific region).
Schwartz et al. in view of Feng et al. in view of Dunning et al. does not appear to explicitly teach wherein the one or more neural networks include a generative network for generating the one or more images indicating the one or more interactions.
However, Ma et al. teaches wherein the one or more neural networks include a generative network for generating the one or more images indicating the one or more interactions (Ma et al., Section 3.2.2 Pg. 8, “If the integrity of the background image xback is considered too great, the BAGAN degenerate to the network sample generated by the ACGAN” teaches one or more neural networks. Figure 4 and Section 3.1 Pg. 6, “Background augmentation generative adversarial networks (BAGANs) do not directly generate foreground target information associated with the visual intelligence algorithm. This ensures any picture of our synthesis data has guaranteed foreground appearance. The BAGAN is responsible for generating related backgrounds only according to category information and pose annotations for foreground objects” teaches the BAGAN’s generator (corresponds to the generative network), which generates one or more final images. Section 5 Pg. 16, “Moreover, for industrial development, high-quality data are easier to obtain from real images than from real photos. Therefore, our approach is to promote immersive HCI with good exploration” teaches human-computer interaction with real images). 
Schwartz et al. in view of Feng et al. in view of Dunning et al. in view of Ma et al. are analogous art because they are from the same field of endeavor and are from the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al., Feng et al., and Dunning et al. with Ma et al. such that the one or more neural networks include a generative network for generating the one or more images indicating the one or more interactions. The motivation to combine is provided by Ma et al.: “Our approach has been validated to have better performance than other methods through image recognition tasks with respect to the natural image database ObjectNet3D. This study can shorten the algorithm development time of AR and expand its application scope, which is of great significance for immersive interactive systems” (Ma et al., Abstract). The proposed teaching is beneficial in that it yields better performance than other methods, shortens the algorithm development time of AR, and expands its application scope.
Claims 6, 12, 18, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Schwartz et al. in view of Dunning et al. 
Regarding Claim 6,
Schwartz et al. teaches the processor of claim 1, 
Schwartz et al. does not appear to explicitly teach wherein the one or more images are frames of video content, and wherein the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions.
However, Dunning et al. teaches wherein the one or more images are frames of video content, and wherein the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions (Dunning et al., Col. 3 Lines 44-49, “The observations may also include, for example, sensed electronic signals such as motor current or a temperature signal; and/or image or video data for example from a camera or a LIDAR sensor, e.g., data from sensors of the agent or data from sensors that are located separately from the agent in the environment” teaches the observation being image or video data (corresponds to video content) in the environment. Col. 3 Lines 26-30, “the observations may include, e.g., one or more of: images, object position data, and sensor data to capture observations as the agent interacts with the environment, for example sensor data from an image, distance, or position sensor or from an actuator” teaches observations (corresponds to the video content with one or more segments) of the agent interactions with the environment. Col. 6 Lines 49-54, “Additionally, the system 100 can generate and provide for presentation to users data identifying temporally extended behavior patterns of the agent interacting with an environment. That is, the system 100 can generate user interface data and provide the user interface data to users for presentation on user devices” teaches identifying temporally extended behavior patterns of the agent interacting with the environment. Col. 14 Lines 49-52, “Processing the further sequence of observations thus ultimately results in determining which of a set of prototypical behaviors the agent is involved in at each of a succession of times” teaches determining the resulting behaviors based on the sequence of observations (corresponds to the one or more segments)).
Schwartz et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and are from the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. with Dunning et al. such that the one or more images are frames of video content, wherein the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions. The motivation to combine is provided by Dunning et al.: “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1-2 Lines 66-67 and Lines 1-5). The proposed teaching is beneficial in that it improves the agent’s performance. 
Regarding Claim 12,
Schwartz et al. teaches the system of claim 7, 
Schwartz et al. does not appear to explicitly teach wherein the one or more images are frames of video content, and wherein the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions.
However, Dunning et al. teaches wherein the one or more images are frames of video content, and wherein the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions (Dunning et al., Col. 3 Lines 44-49, “The observations may also include, for example, sensed electronic signals such as motor current or a temperature signal; and/or image or video data for example from a camera or a LIDAR sensor, e.g., data from sensors of the agent or data from sensors that are located separately from the agent in the environment” teaches the observation being image or video data (corresponds to video content) in the environment. Col. 3 Lines 26-30, “the observations may include, e.g., one or more of: images, object position data, and sensor data to capture observations as the agent interacts with the environment, for example sensor data from an image, distance, or position sensor or from an actuator” teaches observations (corresponds to the video content with one or more segments) of the agent interactions with the environment. Col. 6 Lines 49-54, “Additionally, the system 100 can generate and provide for presentation to users data identifying temporally extended behavior patterns of the agent interacting with an environment. That is, the system 100 can generate user interface data and provide the user interface data to users for presentation on user devices” teaches identifying temporally extended behavior patterns of the agent interacting with the environment. Col. 14 Lines 49-52, “Processing the further sequence of observations thus ultimately results in determining which of a set of prototypical behaviors the agent is involved in at each of a succession of times” teaches determining the resulting behaviors based on the sequence of observations (corresponds to the one or more segments)).  
Schwartz et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and are from the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. with Dunning et al. such that the one or more images are frames of video content, wherein the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions. The motivation to combine is provided by Dunning et al.: “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1-2 Lines 66-67 and Lines 1-5). The proposed teaching is beneficial in that it improves the agent’s performance. 
Regarding Claim 18,
Schwartz et al. teaches the method of claim 13, 
Schwartz et al. does not appear to explicitly teach wherein the one or more images are frames of video content, and wherein the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions.
However, Dunning et al. teaches wherein the one or more images are frames of video content, and wherein the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions (Dunning et al., Col. 3 Lines 44-49, “The observations may also include, for example, sensed electronic signals such as motor current or a temperature signal; and/or image or video data for example from a camera or a LIDAR sensor, e.g., data from sensors of the agent or data from sensors that are located separately from the agent in the environment” teaches the observation being image or video data (corresponds to video content) in the environment. Col. 3 Lines 26-30, “the observations may include, e.g., one or more of: images, object position data, and sensor data to capture observations as the agent interacts with the environment, for example sensor data from an image, distance, or position sensor or from an actuator” teaches observations (corresponds to the video content with one or more segments) of the agent interactions with the environment. Col. 6 Lines 49-54, “Additionally, the system 100 can generate and provide for presentation to users data identifying temporally extended behavior patterns of the agent interacting with an environment. That is, the system 100 can generate user interface data and provide the user interface data to users for presentation on user devices” teaches identifying temporally extended behavior patterns of the agent interacting with the environment. Col. 14 Lines 49-52, “Processing the further sequence of observations thus ultimately results in determining which of a set of prototypical behaviors the agent is involved in at each of a succession of times” teaches determining the resulting behaviors based on the sequence of observations (corresponds to the one or more segments)).  
Schwartz et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and are from the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. with Dunning et al. such that the one or more images are frames of video content, wherein the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions. The motivation to combine is provided by Dunning et al.: “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1-2 Lines 66-67 and Lines 1-5). The proposed teaching is beneficial in that it improves the agent’s performance. 
Regarding Claim 24,
Schwartz et al. teaches the machine-readable medium of claim 19, 
Schwartz et al. does not appear to explicitly teach wherein the one or more images are frames of video content, and wherein the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions.
However, Dunning et al. teaches wherein the one or more images are frames of video content, and wherein the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions (Dunning et al., Col. 3 Lines 44-49, “The observations may also include, for example, sensed electronic signals such as motor current or a temperature signal; and/or image or video data for example from a camera or a LIDAR sensor, e.g., data from sensors of the agent or data from sensors that are located separately from the agent in the environment” teaches the observation being image or video data (corresponds to video content) in the environment. Col. 3 Lines 26-30, “the observations may include, e.g., one or more of: images, object position data, and sensor data to capture observations as the agent interacts with the environment, for example sensor data from an image, distance, or position sensor or from an actuator” teaches observations (corresponds to the video content with one or more segments) of the agent interactions with the environment. Col. 6 Lines 49-54, “Additionally, the system 100 can generate and provide for presentation to users data identifying temporally extended behavior patterns of the agent interacting with an environment. That is, the system 100 can generate user interface data and provide the user interface data to users for presentation on user devices” teaches identifying temporally extended behavior patterns of the agent interacting with the environment. Col. 14 Lines 49-52, “Processing the further sequence of observations thus ultimately results in determining which of a set of prototypical behaviors the agent is involved in at each of a succession of times” teaches determining the resulting behaviors based on the sequence of observations (corresponds to the one or more segments)). 
Schwartz et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and are from the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Schwartz et al. with Dunning et al. such that the one or more images are frames of video content, wherein the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions. The motivation to combine is provided by Dunning et al.: “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1-2 Lines 66-67 and Lines 1-5). The proposed teaching is beneficial in that it improves the agent’s performance.
Claims 25 and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Gatica-Rojas et al. (“Virtual reality interface devices in the reorganization of neural networks in the brain of patients with neurological diseases”) in view of Schwartz et al. in view of Dunning et al.
 Regarding Claim 25,
Gatica-Rojas et al. teaches a player training system, comprising (Gatica-Rojas et al., “Virtual reality systems offer a simulated environment that allows the user to have a real-time interaction through a computer” teaches a virtual reality system. “Many virtual reality devices have been used in rehabilitation, e.g., in patients with attention deficit hyperactivity disorder and improving the function of the hand or upper limb in hemiplegic patients (Figure 2A, C) and balance training in subjects with neurological disorders (Parkinson’s disease, post-stroke, cerebral palsy) or older adults with loss of balance and normal individuals (Figure 2B, D)” teaches balance training for the patient (corresponds to the player)).
Gatica-Rojas et al. does not appear to explicitly teach one or more processors to use one or more neural networks to generate one or more images indicating one or more interactions between a player and one or more objects in the one or more images.
However, Schwartz et al. teaches one or more processors to use one or more neural networks to generate one or more images indicating one or more interactions between a player and one or more objects in the one or more images (Schwartz et al., Col. 8 Lines 54-57, “The head-mounted device comprises one or more processors configured to implement the camera 520, the eye computing unit 530, the face computing unit 540, and the avatar rendering unit 550 of the central module 510” teaches the one or more processors. FIG. 1 and Col. 4 Lines 14-39, “FIG. 1 illustrates an example diagram of an avatar-rendering system architecture 100, in accordance with certain embodiments. The avatar-rendering system 100 may comprise at least one HMD 112 which utilizes a neural network 120 to render an avatar for users 110 respectively. For example, a HMD 112 a captures one or more images 130 of a user 110 a using its camera and encodes the one or more images 130 into a code 126 via an encoder 122. The code 126 describes a state of the face of the user 110 a. In particular embodiments, the one or more images 130, e.g., inputs, may include geometry information and view-dependent texture information of a subject, e.g., the user 110 a. Furthermore, the one or more image 130 of the user 110 a may be captured from various viewpoints, such that a texture of the avatar can be compensated based on the images from different viewpoints. The decoder 124 then decodes the code 126, which includes the geometry information and the view-dependent texture information of the subject, to render an avatar 140 for the user 110 a, which avatar 140 can be viewed by a user 110 b via his/her HMD 112 b. In particular embodiments, the decoder 124 decodes the code 126 to produce a stereo image of the user 110 a. In particular embodiments, the avatar-rendering process in the avatar-rendering system 100 may be bidirectional. For example, the user 110 b may also render an avatar of the user 110 b to be displayed in the HMD 112 a using his/her HMD 112 b” teaches utilizing a neural network (corresponds to the one or more neural networks) to render an avatar (corresponds to generating one or more images) for the user. Col. 3 Lines 45-55, “For simulating a real eye contact for the user in a display, especially a head-mounted display (HMD), an individual eye model is provided to render the eyeballs of the user. Embodiments described herein provides a method using a neural network to generate the avatar's eyeballs which is separated from the rest of the avatar, such that the face of the avatar is constructed based on (1) a facial mesh and a facial texture and (2) an eyeball mesh and an eyeball texture. Therefore, a gaze of the user described in the present disclosure can be reproduced accurately and vividly in the rendered avatar” teaches interaction of the rendered avatar with the face and eye features (corresponds to one or more objects)).
Gatica-Rojas et al. in view of Schwartz et al. are analogous art because they are from the same field of endeavor and the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Gatica-Rojas et al. with Schwartz et al. to include one or more processors to use one or more neural networks to generate one or more images indicating one or more interactions between a player and one or more objects in the one or more images. The motivation is provided by Schwartz et al.: “According to various embodiments, an advantage of features herein is that an individual eye model for an avatar can render an accurate, precise, real eye contact for a display” (Schwartz et al., Col. 14 Lines 23-25). The proposed teaching is beneficial in that it renders accurate, precise, real eye contact for a display.
Gatica-Rojas et al. in view of Schwartz et al. does not appear to explicitly teach memory for storing network parameters for the one or more neural networks.
However, Dunning et al. teaches memory for storing network parameters for the one or more neural networks (Dunning et al., Col. 8 Lines 3-10, “Additionally, in some implementations, the fast updating RNN and the slow updating RNN are augmented with a shared external memory, i.e., both of the RNNs read to and write from the same, shared external memory as part of updating the corresponding hidden state. An example architecture for augmenting an RNN with an external memory that can be used by the system is the Differentiable Neural Computer (DNC) memory architecture” teaches a memory that stores the fast updating RNN (corresponds to the one or more neural networks). Col. 7 Lines 53-56, “The system then processes a fast updating input that includes the observation 120, the slow updating hidden state, and the parameters of the prior distribution 222 using the fast updating RNN to update the fast updating hidden state” teaches the parameters being part of the fast updating RNN).
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Gatica-Rojas et al. and Schwartz et al. with Dunning et al. to include a memory for storing network parameters for the one or more neural networks. The motivation is provided by Dunning et al.: “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1 Line 66 through Col. 2 Line 5). The proposed teaching is beneficial in that it improves the agent’s performance.
Regarding Claim 30,
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. teaches the player training system of claim 25.
Dunning et al. further teaches wherein the one or more images are frames of video content, and wherein the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions (Dunning et al., Col. 3 Lines 44-49, “The observations may also include, for example, sensed electronic signals such as motor current or a temperature signal; and/or image or video data for example from a camera or a LIDAR sensor, e.g., data from sensors of the agent or data from sensors that are located separately from the agent in the environment” teaches the observations being image or video data (corresponds to video content) in the environment. Col. 3 Lines 26-30, “the observations may include, e.g., one or more of: images, object position data, and sensor data to capture observations as the agent interacts with the environment, for example sensor data from an image, distance, or position sensor or from an actuator” teaches observations (corresponds to the video content with one or more segments) of the agent's interactions with the environment. Col. 6 Lines 49-54, “Additionally, the system 100 can generate and provide for presentation to users data identifying temporally extended behavior patterns of the agent interacting with an environment. That is, the system 100 can generate user interface data and provide the user interface data to users for presentation on user devices” teaches identifying temporally extended behavior patterns of the agent interacting with the environment. Col. 14 Lines 49-52, “Processing the further sequence of observations thus ultimately results in determining which of a set of prototypical behaviors the agent is involved in at each of a succession of times” teaches determining the resulting behaviors based on the sequence of observations (corresponds to the one or more segments)).
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. are analogous art because they are from the same field of endeavor and the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Gatica-Rojas et al. and Schwartz et al. with Dunning et al. such that the one or more images are frames of video content, and the video content includes one or more segments representing the one or more interactions, the one or more segments further representing resulting behaviors for the one or more interactions. The motivation is provided by Dunning et al.: “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1 Line 66 through Col. 2 Line 5). The proposed teaching is beneficial in that it improves the agent’s performance.
Claims 26-28 are rejected under 35 U.S.C. 103 as being unpatentable over Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. in view of Feng et al.
Regarding Claim 26,
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. teaches the player training system of claim 25.
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. does not appear to explicitly teach wherein the one or more processors are further to perform instance segmentation to identify features for the one or more objects in one or more input images.
However, Feng et al. teaches wherein the one or more processors are further to perform instance segmentation to identify features for the one or more objects in one or more input images (Feng et al., Col. 16 Lines 25-29, “The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers” teaches the one or more processors. Section 1 Pg. 310, “summarize the notable hardware units including GPUs, field-programmable gate arrays (FPGAs) and other advanced mobile hardware platforms that are adapted or designed to accelerate DNN-based computer vision algorithms” teaches one or more circuits. Fig. 2(d) and Section 2.3 Pg. 312, “instance segmentation, which predicts different labels for different object instances as a further improvement to semantic segmentation, as shown in Fig. 2 (d)” teaches instance segmentation that predicts different labels for different object instances (corresponds to identifying features for the one or more objects) in the input images).
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. in view of Feng et al. are analogous art because they are from the same field of endeavor and the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Gatica-Rojas et al., Schwartz et al., and Dunning et al. with Feng et al. such that the one or more processors are further to perform instance segmentation to identify features for the one or more objects in one or more input images. The motivation is provided by Feng et al.: “In this paper, we conduct a comprehensive survey on computer vision techniques. Specially, we have highlighted the recent accomplishments in both the algorithms for a variety of computer vision tasks such as image classification, object detection and image segmentation, and the promising hardware platforms to implement DNNs efficiently for practical applications, such as GPUs, FPGAs and other new generation of hardware accelerators” (Feng et al., Conclusion). The proposed teaching is beneficial in that it facilitates real-time and/or energy-efficient operation.
Regarding Claim 27,
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. in view of Feng et al. teaches the player training system of claim 26.
Dunning et al. further teaches wherein the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space (Dunning et al., Col. 13 Lines 33-43, “The system can then train a variational autoencoder, in particular a variational sequence autoencoder, with the captured temporal sequences. The variational sequence autoencoder may include a recurrent neural network encoder to encode an input data sequence as a set of latent variables, coupled to a recurrent neural network decoder to decode the set of latent variables to produce an output data sequence, i.e., to reproduce the output data sequence from the input data sequence. During training the latent variables are constrained to approximate a defined distribution, for example a Gaussian distribution” teaches a variational autoencoder to encode the input data sequence (corresponds to the features of the one or more objects) into a set of latent variables (corresponds to a latent space)),
the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects (Dunning et al., Col. 14 Lines 35-48, “Thus a set of latent variables of the variational sequence autoencoder may provide a latent representation of a sequence of observations in terms of the high level features. A sequence of sets of latent variables derived from the further observations may be processed using a mixture model to identify clusters. For example each of the sets of latent variables may be mapped to one of K components or clusters using the mixture model, which may be a Gaussian mixture model. Each component or cluster may correspond to a different temporally extending behavior derived from features representing high-level characteristics of the agent-environment system. These may be considered as prototypical behaviors for the agent, each extending over the timeframe of the sequences used to train the autoencoder” teaches the variational sequence autoencoder deriving latent representations that are mapped to interactions within the agent-environment system, where the observations include images and object position data (corresponds to one or more objects)).
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. in view of Feng et al. are analogous art because they are from the same field of endeavor and the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Gatica-Rojas et al., Schwartz et al., and Dunning et al. with Feng et al. such that the one or more neural networks include a variational autoencoder (VAE) to encode the features of the one or more objects into a latent space, the VAE further maintaining one or more mappings between the one or more interactions and the one or more objects. The motivation is provided by Dunning et al.: “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1 Line 66 through Col. 2 Line 5). The proposed teaching is beneficial in that it improves the agent’s performance.
Regarding Claim 28,
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. in view of Feng et al. teaches the player training system of claim 26.
Dunning et al. further teaches wherein the VAE is trained using unsupervised learning to determine the one or more interactions for one or more potential states (Dunning et al., Col. 13 Lines 44-53, “In some implementations the system also captures a further sequence of observations of the environment as the agent interacts with the environment. The system can then process the further sequence of observations using the trained variational sequence autoencoder to determine a sequence of sets of latent variables. Optionally, the system can process the sequence of sets of latent variables to identify a sequence of clusters in a space of the latent variables, each of the clusters representing a temporally extending behaviour pattern of the agent” teaches a variational autoencoder trained with a sequence of clusters (corresponds to unsupervised learning) to determine behavior patterns (corresponds to the one or more interactions for one or more potential states)).
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. in view of Feng et al. are analogous art because they are from the same field of endeavor and the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Gatica-Rojas et al., Schwartz et al., and Dunning et al. with Feng et al. such that the VAE is trained using unsupervised learning to determine the one or more interactions for one or more potential states. The motivation is provided by Dunning et al.: “Certain described aspects relate to using a temporally hierarchical neural network system to select actions. These aspects allow actions that are selected at each time step to be consistent with long-term plans for the agent. The agent can then achieve improved performance on tasks that require that actions be selected that are dependent on data received in observations at time steps that are a large number of time steps before the current time step” (Dunning et al., Col. 1 Line 66 through Col. 2 Line 5). The proposed teaching is beneficial in that it improves the agent’s performance.
Claim 29 is rejected under 35 U.S.C. 103 as being unpatentable over Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. in view of Feng et al. in further view of Livne et al. (“Deep Context-Aware Recommender System Utilizing Sequential Latent Context”).
Regarding Claim 29,
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. in view of Feng et al. teaches the player training system of claim 27.
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. in view of Feng et al. does not appear to explicitly teach wherein the one or more neural networks include a generative adversarial network (GAN) to accept the latent space as input.
Schwartz et al. further teaches wherein the one or more neural networks include a generative adversarial network (GAN) to accept the latent space as input (Schwartz et al., Col. 4 Lines 49-51, “In particular embodiments, the ML model may be an autoencoder, a generative adversarial network, or any other suitable ML architecture” teaches the machine-learning model being a generative adversarial network. Col. 5 Lines 24-29, “In order to output an image for an assigned/specific region, e.g., the left eye of the user, the eye model fixes/preserves a latent segmented face code 260 corresponding to a region 270 on the face, randomly mixes all other segmented face codes 262, 264 in the code 210, and decodes outputs” teaches utilizing a latent segmented face code (corresponds to the latent space) as an input to generate an image for an assigned/specific region).
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. in view of Feng et al. are analogous art because they are from the same field of endeavor and the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Gatica-Rojas et al., Schwartz et al., and Dunning et al. with Feng et al. such that the one or more neural networks include a generative adversarial network (GAN) to accept the latent space as input. The motivation is provided by Schwartz et al.: “According to various embodiments, an advantage of features herein is that an individual eye model for an avatar can render an accurate, precise, real eye contact for a display” (Schwartz et al., Col. 14 Lines 23-25). The proposed teaching is beneficial in that it renders accurate, precise, real eye contact for a display.
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. in view of Feng et al. does not appear to explicitly teach generating the one or more recommendations based at least in part upon the one or more cumulative changes of state determined from the latent space.
However, Livne et al. teaches generating the one or more recommendations based at least in part upon the one or more cumulative changes of state determined from the latent space (Livne et al., “In context-aware recommender systems (CARSs), contextual factors are taken into account when modeling user profiles and generating recommendations” teaches context-aware recommender systems generating recommendations based on the state of the user profile. “Matrix factorization (MF), which projects users and items into a shared latent space, is a common approach for latent factor model based recommendation. Most research efforts in CARSs have been devoted to the enhancement of MF” teaches that CARSs enhance matrix factorization, which projects users and items into a shared latent space).
Gatica-Rojas et al. in view of Schwartz et al. in view of Dunning et al. in view of Feng et al. in view of Livne et al. are analogous art because they are from the same field of endeavor and the same problem-solving area. Namely, they pertain to the field of “neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Gatica-Rojas et al., Schwartz et al., Dunning et al., and Feng et al. with Livne et al. to generate the one or more recommendations based at least in part upon the one or more cumulative changes of state determined from the latent space. The motivation is provided by Livne et al.: “We deployed our approach using two context-aware datasets with different context dimensions. Empirical analysis of our results validates that our proposed sequential latent context-aware model (SLCM), surpasses state of the art CARS models” (Livne et al., Abstract). The proposed teaching is beneficial in that it provides an empirically validated model that surpasses state-of-the-art CARS models.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Henry T Nguyen whose telephone number is (571)272-8860. The examiner can normally be reached Monday-Friday 8:00am-4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
  
/HENRY TRONG NGUYEN/ Examiner, Art Unit 2125
/KAMRAN AFSHAR/ Supervisory Patent Examiner, Art Unit 2125