DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendment filed on 4/25/2022 has been entered. Claims 1-30 are pending in the application. With respect to the drawings, Applicant has amended Para. 0048 to designate reference character 232 as only the image editor, has amended Fig. 12B to correct minor informalities regarding Cache(s) 1262C, has amended Fig. 21 to include reference characters from the specification that were missing from the figure, and has amended the specification to include the missing reference characters in Figs. 2, 3, 4, 9, and 17B. However, reference character 1244 has not been added to the specification; see the drawing objection below. With respect to the specification, Applicant has amended the specification to correct minor informalities; therefore, the objections to the specification have been withdrawn. With respect to the claim rejections under 35 U.S.C. 112(b), Applicant has amended claims 5, 11, 17, 23, and 29 to correct the lack of antecedent basis; therefore, the 112(b) rejections have been withdrawn. With respect to the claim rejections under 35 U.S.C. 101, Applicant has amended claims 19-24 to recite a “non-transitory computer-readable medium”; therefore, the 101 rejections have been withdrawn.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 4/25/2022 is being considered by the examiner.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: fetch 1244 in Figs. 12B-12C.  Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3, 7-9, 13-15, 19-21, and 25-27 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “Deep Feature Consistent Variational Autoencoder” by Hou et al.
Regarding claim 1, Hou et al. teaches a processor, comprising: one or more circuits to use one or more neural networks to generate one or more time-lapsed images of a second object based, at least in part, on one or more images of a first object (As shown in Pg. 1, “1. Introduction”, deep convolutional neural networks (CNNs) are used in many computer vision tasks for supervised and unsupervised learning. A computer inherently has a processor and circuitry to train the CNN for image processing tasks; As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Fig. 4, output images of faces are generated from input images of faces with different VAE models; As shown in Pg. 5, first paragraph of left-hand column, VAE-123 generates faces of different ages (i.e. time-lapsed image of an object)).
Regarding claim 2, Hou et al. teaches the processor of claim 1 (see claim 1 above), wherein the one or more neural networks include a convolutional neural network (CNN) to extract features of the first object from the one or more images of the first object, wherein the features are transformed into one or more feature vectors adhering to a schema (As shown in the abstract, hidden features of a deep convolutional neural network (CNN) are used for VAE training; As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Pg. 1, right-hand column and Pg. 3, right-hand column, hidden representations (i.e. features) are extracted from the deep CNN; As shown in Pgs. 5-6, “4.3.2 Facial Attribute Manipulation”, two facial attributes (i.e. smiling and wearing eyeglasses) are fed into the encoder network to compute latent vectors of faces; As shown in Fig. 6, there are attribute-specific vectors (i.e. features transformed into vectors that have a certain schema) that are added or subtracted so that faces can be generated from latent vectors).
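For illustration only, the attribute-vector arithmetic relied upon above (adding or subtracting an attribute-specific vector to a latent vector before decoding) can be sketched in Python with NumPy. This is a minimal sketch, not Hou et al.'s code: it assumes that the attribute-specific vector is the difference between the mean latent vector of faces having the attribute and the mean latent vector of faces lacking it, and the function names `attribute_vector` and `shift_latent` are hypothetical.

```python
import numpy as np


def attribute_vector(z_with: np.ndarray, z_without: np.ndarray) -> np.ndarray:
    """Attribute-specific direction in latent space.

    Assumption for illustration: the direction is the mean latent vector of
    faces that have the attribute minus the mean of faces that lack it.
    Each input has shape (num_faces, latent_dim).
    """
    return z_with.mean(axis=0) - z_without.mean(axis=0)


def shift_latent(z: np.ndarray, attr_vec: np.ndarray, alpha: float) -> np.ndarray:
    """Add (alpha > 0) or subtract (alpha < 0) the attribute vector."""
    return z + alpha * attr_vec


# Toy latent vectors (not data from Hou et al.).
z_smiling = np.array([[1.0, 2.0], [3.0, 4.0]])
z_neutral = np.array([[0.0, 0.0], [2.0, 2.0]])
smile = attribute_vector(z_smiling, z_neutral)          # [1.0, 2.0]
less_smiling = shift_latent(np.zeros(2), smile, -1.0)   # [-1.0, -2.0]
```

In this sketch the shifted latent vector would then be passed to a decoder to generate the manipulated face; the decoder is omitted.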
Regarding claim 3, Hou et al. teaches the processor of claim 2 (see claim 2 above), wherein the one or more neural networks include one or more variational autoencoders (VAEs) to encode the features for the first object to a latent space to act as a constraint in generating the one or more time-lapsed images (As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Pg. 4, left-hand column, the latent space is investigated, and the semantic relationship between different latent representations is studied and applied to facial attribute prediction; As shown in Pg. 5, left-hand column, the relationship between different learned latent vectors is investigated, and the experiments are based on a trained VAE-123 model; As shown in Pg. 6, left-hand column, images with facial attributes (i.e. features) are fed into the encoder network to compute latent vectors; As shown in Fig. 6, images are generated from latent vectors, in which the input image is reconstructed (i.e. time-lapsed image)).
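As a non-authoritative illustration of how a VAE encodes features into a latent space that constrains generation, the sketch below uses a toy linear encoder in place of Hou et al.'s deep CNN encoder (an assumption made for brevity). It shows the standard VAE steps: predicting a mean and log-variance, sampling a latent vector with the reparameterization trick, and computing the KL-divergence term that pulls the latent distribution toward a standard normal prior. The weight matrices `W_mu` and `W_logvar` are hypothetical.

```python
import numpy as np


def encode(x, W_mu, W_logvar):
    """Toy linear encoder: map input features to latent mean and log-variance."""
    return x @ W_mu, x @ W_logvar


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))               # 4 toy feature vectors of length 8
W_mu = 0.1 * rng.standard_normal((8, 3))      # hypothetical encoder weights
W_logvar = 0.1 * rng.standard_normal((8, 3))
mu, logvar = encode(x, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)           # latent vectors, shape (4, 3)
kl = kl_to_standard_normal(mu, logvar)        # one KL value per input, shape (4,)
```

The KL term is what makes the latent space act as a constraint in this sketch: it penalizes encodings that drift far from the prior, so vectors sampled or shifted in that space remain decodable.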
Regarding claim 7, Hou et al. teaches a system comprising: one or more processors to use one or more neural networks to generate one or more time-lapsed images of a second object based, at least in part, on one or more images of a first object (As shown in Pg. 1, “1. Introduction”, deep convolutional neural networks (CNNs) are used in many computer vision tasks for supervised and unsupervised learning. A computer is a system and inherently has a processor to train the CNN for image processing tasks; As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Fig. 4, output images of faces are generated from input images of faces with different VAE models; As shown in Pg. 5, first paragraph of left-hand column, VAE-123 generates faces of different ages (i.e. time-lapsed image of an object)).
Regarding claim 8, Hou et al. teaches the system of claim 7 (see claim 7 above), wherein the one or more neural networks include a convolutional neural network (CNN) to extract features of the first object from the one or more images of the first object, wherein the features are transformed into one or more feature vectors adhering to a schema (As shown in the abstract, hidden features of a deep convolutional neural network (CNN) are used for VAE training; As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Pg. 1, right-hand column and Pg. 3, right-hand column, hidden representations (i.e. features) are extracted from the deep CNN; As shown in Pgs. 5-6, “4.3.2 Facial Attribute Manipulation”, two facial attributes (i.e. smiling and wearing eyeglasses) are fed into the encoder network to compute latent vectors of faces; As shown in Fig. 6, there are attribute-specific vectors (i.e. features transformed into vectors that have a certain schema) that are added or subtracted so that faces can be generated from latent vectors).
Regarding claim 9, Hou et al. teaches the system of claim 8 (see claim 8 above), wherein the one or more neural networks include one or more variational autoencoders (VAEs) to encode the features for the first object to a latent space to act as a constraint in generating the one or more time-lapsed images (As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Pg. 4, left-hand column, the latent space is investigated, and the semantic relationship between different latent representations is studied and applied to facial attribute prediction; As shown in Pg. 5, left-hand column, the relationship between different learned latent vectors is investigated, and the experiments are based on a trained VAE-123 model; As shown in Pg. 6, left-hand column, images with facial attributes (i.e. features) are fed into the encoder network to compute latent vectors; As shown in Fig. 6, images are generated from latent vectors, in which the input image is reconstructed (i.e. time-lapsed image)).
Regarding claim 13, Hou et al. teaches a method comprising: using one or more neural networks to generate one or more time-lapsed images of a second object based, at least in part, on one or more images of a first object (As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Pg. 4, “4. Experiments”, experiments are performed on face images to test a method in which VAE models are trained; As shown in Fig. 4, output images of faces are generated from input images of faces with different VAE models; As shown in Pg. 5, first paragraph of left-hand column, VAE-123 generates faces of different ages (i.e. time-lapsed image of an object)).
Regarding claim 14, Hou et al. teaches the method of claim 13 (see claim 13 above), wherein the one or more neural networks include a convolutional neural network (CNN) to extract features of the first object from the one or more images of the first object, wherein the features are transformed into one or more feature vectors adhering to a schema (As shown in the abstract, hidden features of a deep convolutional neural network (CNN) are used for VAE training; As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Pg. 1, right-hand column and Pg. 3, right-hand column, hidden representations (i.e. features) are extracted from the deep CNN; As shown in Pgs. 5-6, “4.3.2 Facial Attribute Manipulation”, two facial attributes (i.e. smiling and wearing eyeglasses) are fed into the encoder network to compute latent vectors of faces; As shown in Fig. 6, there are attribute-specific vectors (i.e. features transformed into vectors that have a certain schema) that are added or subtracted so that faces can be generated from latent vectors).
Regarding claim 15, Hou et al. teaches the method of claim 14 (see claim 14 above), wherein the one or more neural networks include one or more variational autoencoders (VAEs) to encode the features for the first object to a latent space to act as a constraint in generating the one or more time-lapsed images (As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Pg. 4, left-hand column, the latent space is investigated, and the semantic relationship between different latent representations is studied and applied to facial attribute prediction; As shown in Pg. 5, left-hand column, the relationship between different learned latent vectors is investigated, and the experiments are based on a trained VAE-123 model; As shown in Pg. 6, left-hand column, images with facial attributes (i.e. features) are fed into the encoder network to compute latent vectors; As shown in Fig. 6, images are generated from latent vectors, in which the input image is reconstructed (i.e. time-lapsed image)).
Regarding claim 19, Hou et al. teaches a non-transitory computer-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least: use one or more neural networks to generate one or more time-lapsed images of a second object based, at least in part, on one or more images of a first object (As shown in Pg. 1, “1. Introduction”, deep convolutional neural networks (CNNs) are used in many computer vision tasks for supervised and unsupervised learning. A computer inherently has a processor and a memory that stores instructions for the processor to execute to train the CNN for image processing tasks; As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Fig. 4, output images of faces are generated from input images of faces with different VAE models; As shown in Pg. 5, first paragraph of left-hand column, VAE-123 generates faces of different ages (i.e. time-lapsed image of an object)).
Regarding claim 20, Hou et al. teaches the non-transitory computer-readable medium of claim 19 (see claim 19 above), wherein the one or more neural networks include a convolutional neural network (CNN) to extract features of the first object from the one or more images of the first object, wherein the features are transformed into one or more feature vectors adhering to a schema (As shown in the abstract, hidden features of a deep convolutional neural network (CNN) are used for VAE training; As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Pg. 1, right-hand column and Pg. 3, right-hand column, hidden representations (i.e. features) are extracted from the deep CNN; As shown in Pgs. 5-6, “4.3.2 Facial Attribute Manipulation”, two facial attributes (i.e. smiling and wearing eyeglasses) are fed into the encoder network to compute latent vectors of faces; As shown in Fig. 6, there are attribute-specific vectors (i.e. features transformed into vectors that have a certain schema) that are added or subtracted so that faces can be generated from latent vectors).
Regarding claim 21, Hou et al. teaches the non-transitory computer-readable medium of claim 20 (see claim 20 above), wherein the one or more neural networks include one or more variational autoencoders (VAEs) to encode the features for the first object to a latent space to act as a constraint in generating the one or more time-lapsed images (As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Pg. 4, left-hand column, the latent space is investigated, and the semantic relationship between different latent representations is studied and applied to facial attribute prediction; As shown in Pg. 5, left-hand column, the relationship between different learned latent vectors is investigated, and the experiments are based on a trained VAE-123 model; As shown in Pg. 6, left-hand column, images with facial attributes (i.e. features) are fed into the encoder network to compute latent vectors; As shown in Fig. 6, images are generated from latent vectors, in which the input image is reconstructed (i.e. time-lapsed image)).
Regarding claim 25, Hou et al. teaches an image generation system, comprising: one or more processors to use one or more neural networks to generate one or more time-lapsed images of a second object based, at least in part, on one or more images of a first object (As shown in Pg. 1, “1. Introduction”, deep convolutional neural networks (CNNs) are used in many computer vision tasks for supervised and unsupervised learning. A computer inherently has a processor to train the CNN for image processing tasks; As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Fig. 4, output images of faces are generated from input images of faces with different VAE models (i.e. image generation system); As shown in Pg. 5, first paragraph of left-hand column, VAE-123 generates faces of different ages (i.e. time-lapsed image of an object));
and memory for storing network parameters for the one or more neural networks (As shown in Pg. 1, “1. Introduction”, deep convolutional neural networks (CNNs) are used in many computer vision tasks for supervised and unsupervised learning. A computer inherently has a processor and a memory that stores instructions for training the CNN for image processing tasks; As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Pg. 3, left-hand column, parameters are updated for the encoder and the decoder based on the perceptual loss; As shown in Pg. 4, “4.1. Training Details”, loss weighting parameters are set for the neural network).
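The perceptual loss referenced above compares hidden-layer activations of a fixed network for the input image and its reconstruction, and the training objective weights that loss against the KL term. A minimal NumPy sketch follows. It assumes fixed random ReLU layers as a stand-in for the pretrained feature network used by Hou et al., and the weights `alpha` and `beta` (with their default values) are hypothetical placeholders, not the settings reported in “4.1. Training Details”.

```python
import numpy as np


def hidden_features(x, layers):
    """Activations after each layer of a fixed (non-trained) ReLU network."""
    acts, h = [], x
    for W in layers:
        h = np.maximum(h @ W, 0.0)
        acts.append(h)
    return acts


def perceptual_loss(x, x_rec, layers):
    """Sum over layers of the mean squared difference between hidden features."""
    return sum(
        np.mean((a - b) ** 2)
        for a, b in zip(hidden_features(x, layers), hidden_features(x_rec, layers))
    )


def total_loss(kl, perceptual, alpha=1.0, beta=0.5):
    """Weighted objective; alpha and beta are hypothetical loss weights."""
    return alpha * kl + beta * perceptual


rng = np.random.default_rng(1)
layers = [rng.standard_normal((16, 16)) for _ in range(3)]  # stand-in feature net
x = rng.standard_normal((2, 16))                  # toy "images"
x_rec = x + 0.1 * rng.standard_normal(x.shape)    # toy reconstructions
loss = total_loss(kl=0.2, perceptual=perceptual_loss(x, x_rec, layers))
```

In a trained system, the encoder and decoder parameters (the network parameters held in memory) would be updated by gradient descent on a loss of this form; the gradient step is not shown here.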
Regarding claim 26, Hou et al. teaches the image generation system of claim 25 (see claim 25 above), wherein the one or more neural networks include a convolutional neural network (CNN) to extract features of the first object from the one or more images of the first object, wherein the features are transformed into one or more feature vectors adhering to a schema (As shown in the abstract, hidden features of a deep convolutional neural network (CNN) are used for VAE training; As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Pg. 1, right-hand column and Pg. 3, right-hand column, hidden representations (i.e. features) are extracted from the deep CNN; As shown in Pgs. 5-6, “4.3.2 Facial Attribute Manipulation”, two facial attributes (i.e. smiling and wearing eyeglasses) are fed into the encoder network to compute latent vectors of faces; As shown in Fig. 6, there are attribute-specific vectors (i.e. features transformed into vectors that have a certain schema) that are added or subtracted so that faces can be generated from latent vectors).
Regarding claim 27, Hou et al. teaches the image generation system of claim 26 (see claim 26 above), wherein the one or more neural networks include one or more variational autoencoders (VAEs) to encode the features for the first object to a latent space to act as a constraint in generating the one or more time-lapsed images (As shown in Fig. 1 and Pg. 3, “3.1 Variational Autoencoder Network Architecture”, there is a deep CNN-based variational autoencoder (i.e. neural network); As shown in Pg. 4, left-hand column, the latent space is investigated, and the semantic relationship between different latent representations is studied and applied to facial attribute prediction; As shown in Pg. 5, left-hand column, the relationship between different learned latent vectors is investigated, and the experiments are based on a trained VAE-123 model; As shown in Pg. 6, left-hand column, images with facial attributes (i.e. features) are fed into the encoder network to compute latent vectors; As shown in Fig. 6, images are generated from latent vectors, in which the input image is reconstructed (i.e. time-lapsed image)).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6, 12, 18, 24, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over “Deep Feature Consistent Variational Autoencoder” by Hou et al. in view of “The Pose Knows: Video Forecasting by Generating Pose Futures” by Walker et al.
Regarding claim 6, Hou et al. teaches the limitations as explained above in claim 1.
Hou et al. does not expressly disclose the following limitation: wherein one or more times or time periods corresponding to the one or more time-lapsed images are determined using one or more time vectors, each time vector including a magnitude for a time-shift and a direction forward or backward in time.
However, Walker et al. teaches wherein one or more times or time periods corresponding to the one or more time-lapsed images are determined using one or more time vectors, each time vector including a magnitude for a time-shift and a direction forward or backward in time (As shown in Pg. 3, “3.1. Pose-VAE”, at time t, given a series of past poses P_{1..t} and the last frame of input video X_t, the future poses are predicted up to timestep T, P_{t+1..T}...predict a series of pose velocities Y_{t+1..T}. In this case, the magnitude of the timestep is T, and the future poses indicate that the pose is predicted at a time forward in time. The pose velocities further indicate that a magnitude and a direction of time are used to determine the future pose; Pg. 5, left-hand column: every timestep t represents 0.2 second…conditioned the past on 2 timesteps and predict for 5 timesteps. The 0.2 second timestep is a magnitude, and predicting 5 timesteps into the future corresponds to the forward direction of the time vector; Fig. 1 shows generation of future video (i.e. time-lapsed images) using a VAE and a GAN).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the determination of the times of the time-lapsed images using time vectors, as taught by Walker et al., into the image generation of Hou et al. in order to improve prediction of future frames of video (Walker et al., Abstract).
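To make the time-vector mapping above concrete, the sketch below represents a time vector as a magnitude (the size of one time shift, here 0.2 second per timestep, following the Walker et al. passage quoted above) and a direction (+1 forward, -1 backward), and rolls predicted per-step pose velocities forward from the last observed pose. The `TimeVector` class, the helper names, and the treatment of velocities as per-step displacements that are summed cumulatively are assumptions for illustration, not Walker et al.'s implementation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TimeVector:
    magnitude: float  # size of one time shift, in seconds
    direction: int    # +1 = forward in time, -1 = backward in time

    def offset(self) -> float:
        """Signed time shift for one step."""
        return self.direction * self.magnitude


def target_times(t0: float, tv: TimeVector, n_steps: int) -> np.ndarray:
    """Times of the generated frames: t0 shifted by k steps, k = 1..n_steps."""
    return t0 + tv.offset() * np.arange(1, n_steps + 1)


def rollout_poses(last_pose: np.ndarray, velocities: np.ndarray) -> np.ndarray:
    """Future poses obtained by accumulating per-step pose velocities."""
    return last_pose + np.cumsum(velocities, axis=0)


forward = TimeVector(magnitude=0.2, direction=+1)
times = target_times(0.0, forward, n_steps=5)         # approx. [0.2, 0.4, 0.6, 0.8, 1.0]
poses = rollout_poses(np.zeros(2), np.ones((5, 2)))   # toy 2-D pose, constant velocity
```

Setting `direction=-1` in this sketch yields times before `t0`, which corresponds to the "backward in time" alternative recited in the claim.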
Regarding claim 12, Hou et al. teaches the limitations as explained above in claim 7.
Hou et al. does not expressly disclose the following limitation: wherein one or more times or time periods corresponding to the one or more time-lapsed images are determined using one or more time vectors, each time vector including a magnitude for a time-shift and a direction forward or backward in time.
However, Walker et al. teaches wherein one or more times or time periods corresponding to the one or more time-lapsed images are determined using one or more time vectors, each time vector including a magnitude for a time-shift and a direction forward or backward in time (As shown in Pg. 3, “3.1. Pose-VAE”, at time t, given a series of past poses P_{1..t} and the last frame of input video X_t, the future poses are predicted up to timestep T, P_{t+1..T}...predict a series of pose velocities Y_{t+1..T}. In this case, the magnitude of the timestep is T, and the future poses indicate that the pose is predicted at a time forward in time. The pose velocities further indicate that a magnitude and a direction of time are used to determine the future pose; Pg. 5, left-hand column: every timestep t represents 0.2 second…conditioned the past on 2 timesteps and predict for 5 timesteps. The 0.2 second timestep is a magnitude, and predicting 5 timesteps into the future corresponds to the forward direction of the time vector; Fig. 1 shows generation of future video (i.e. time-lapsed images) using a VAE and a GAN).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the determination of the times of the time-lapsed images using time vectors, as taught by Walker et al., into the image generation of Hou et al. in order to improve prediction of future frames of video (Walker et al., Abstract).
Regarding claim 18, Hou et al. teaches the limitations as explained above in claim 13.
Hou et al. does not expressly disclose the following limitation: wherein one or more times or time periods corresponding to the one or more time-lapsed images are determined using one or more time vectors, each time vector including a magnitude for a time-shift and a direction forward or backward in time.
However, Walker et al. teaches wherein one or more times or time periods corresponding to the one or more time-lapsed images are determined using one or more time vectors, each time vector including a magnitude for a time-shift and a direction forward or backward in time (As shown in Pg. 3, “3.1. Pose-VAE”, at time t, given a series of past poses P_{1..t} and the last frame of input video X_t, the future poses are predicted up to timestep T, P_{t+1..T}...predict a series of pose velocities Y_{t+1..T}. In this case, the magnitude of the timestep is T, and the future poses indicate that the pose is predicted at a time forward in time. The pose velocities further indicate that a magnitude and a direction of time are used to determine the future pose; Pg. 5, left-hand column: every timestep t represents 0.2 second…conditioned the past on 2 timesteps and predict for 5 timesteps. The 0.2 second timestep is a magnitude, and predicting 5 timesteps into the future corresponds to the forward direction of the time vector; Fig. 1 shows generation of future video (i.e. time-lapsed images) using a VAE and a GAN).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the determination of the times of the time-lapsed images using time vectors, as taught by Walker et al., into the image generation of Hou et al. in order to improve prediction of future frames of video (Walker et al., Abstract).
Regarding claim 24, Hou et al. teaches the limitations as explained above in claim 19.
Hou et al. does not expressly disclose the following limitation: wherein one or more times or time periods corresponding to the one or more time-lapsed images are determined using one or more time vectors, each time vector including a magnitude for a time-shift and a direction forward or backward in time.
However, Walker et al. teaches wherein one or more times or time periods corresponding to the one or more time-lapsed images are determined using one or more time vectors, each time vector including a magnitude for a time-shift and a direction forward or backward in time (As shown in Pg. 3, “3.1. Pose-VAE”, at time t, given a series of past poses P_{1..t} and the last frame of input video X_t, the future poses are predicted up to timestep T, P_{t+1..T}...predict a series of pose velocities Y_{t+1..T}. In this case, the magnitude of the timestep is T, and the future poses indicate that the pose is predicted at a time forward in time. The pose velocities further indicate that a magnitude and a direction of time are used to determine the future pose; Pg. 5, left-hand column: every timestep t represents 0.2 second…conditioned the past on 2 timesteps and predict for 5 timesteps. The 0.2 second timestep is a magnitude, and predicting 5 timesteps into the future corresponds to the forward direction of the time vector; Fig. 1 shows generation of future video (i.e. time-lapsed images) using a VAE and a GAN).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the determination of the times of the time-lapsed images using time vectors, as taught by Walker et al., into the image generation of Hou et al. in order to improve prediction of future frames of video (Walker et al., Abstract).
Regarding claim 30, Hou et al. teaches the limitations as explained above in claim 25.
Hou et al. does not expressly disclose the following limitation: wherein one or more times or time periods corresponding to the one or more time-lapsed images are determined using one or more time vectors, each time vector including a magnitude for a time-shift and a direction forward or backward in time.
However, Walker et al. teaches wherein one or more times or time periods corresponding to the one or more time-lapsed images are determined using one or more time vectors, each time vector including a magnitude for a time-shift and a direction forward or backward in time (As shown in Pg. 3, “3.1. Pose-VAE”, at time t, given a series of past poses P_{1..t} and the last frame of input video X_t, the future poses are predicted up to timestep T, P_{t+1..T}...predict a series of pose velocities Y_{t+1..T}. In this case, the magnitude of the timestep is T, and the future poses indicate that the pose is predicted at a time forward in time. The pose velocities further indicate that a magnitude and a direction of time are used to determine the future pose; Pg. 5, left-hand column: every timestep t represents 0.2 second…conditioned the past on 2 timesteps and predict for 5 timesteps. The 0.2 second timestep is a magnitude, and predicting 5 timesteps into the future corresponds to the forward direction of the time vector; Fig. 1 shows generation of future video (i.e. time-lapsed images) using a VAE and a GAN).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the determination of the times of the time-lapsed images using time vectors, as taught by Walker et al., into the image generation of Hou et al. in order to improve prediction of future frames of video (Walker et al., Abstract).

Claims 5, 11, 17, 23, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over “Deep Feature Consistent Variational Autoencoder” by Hou et al. in view of “Face Aging with Contextual Generative Adversarial Nets” by Liu et al.
Regarding claim 5, Hou et al. teaches the limitations as explained above in claim 3.
Hou et al. does not expressly disclose the following limitation: wherein the one or more neural networks include a generative network to generate the one or more time-lapsed images of the second object, a second class of object belonging to a same object class as the first object, the one or more time-lapsed images of the second object having an appearance appropriate for one or more points or periods in time corresponding to the one or more time-lapsed images.
However, Liu et al. teaches wherein the one or more neural networks include a generative network to generate the one or more time-lapsed images of the second object, a second class of object belonging to a same object class as the first object, the one or more time-lapsed images of the second object having an appearance appropriate for one or more points or periods in time corresponding to the one or more time-lapsed images (As shown in Fig. 1, there is a proposed C-GAN algorithm for face aging in which the input image is transformed to any specific age group; As shown in Fig. 6, the C-GAN is applied to the original image (i.e. input image) to generate faces for 7 different age groups. The images generated for each age group are the time-lapsed images, in which the face for each age group (i.e. second object) is based on the original face (i.e. first object in the input image). The objects in the original image and the generated images are both faces (i.e. same class). The generated images have an appearance appropriate for the person’s face at a specific age (i.e. point in time); see Pg. 6, “4.2 Implementation details”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention was made to include a generative network to generate time-lapsed images of a second object belonging to the same class as the first object in which the second object has an appearance appropriate for the point in time as taught by Liu et al. into the image generation of Hou et al. in order to improve cross-age face verification (Liu et al., Abstract).
Regarding claim 11, Hou et al. teaches the limitations as explained above in claim 9.
Hou et al. does not expressly disclose the following limitation: wherein the one or more neural networks include a generative network to generate the one or more time-lapsed images of the second object, a second class of object belonging to a same object class as the first object, the one or more time-lapsed images of the second object having an appearance appropriate for one or more points or periods in time corresponding to the one or more time-lapsed images.
However, Liu et al. teaches wherein the one or more neural networks include a generative network to generate the one or more time-lapsed images of the second object, a second class of object belonging to a same object class as the first object, the one or more time-lapsed images of the second object having an appearance appropriate for one or more points or periods in time corresponding to the one or more time-lapsed images (As shown in Fig. 1, a C-GAN algorithm is proposed for face aging in which the input image is transformed to any specific age group; as shown in Fig. 6, the C-GAN is applied to the original image (i.e. input image) to generate faces for 7 different age groups. The images generated for the age groups are the time-lapsed images, in which the face at each age group (i.e. second object) is based on the original face (i.e. first object in the input image). The object in the original image and the object in each generated image are both faces (i.e. same class). Each generated image has an appearance appropriate for the person’s face at a specific age (i.e. point in time); see Pg. 6, “4.2 Implementation details”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a generative network that generates time-lapsed images of a second object belonging to the same class as the first object, in which the second object has an appearance appropriate for the point in time, as taught by Liu et al., into the image generation of Hou et al. in order to improve cross-age face verification (Liu et al., Abstract).
Regarding claim 17, Hou et al. teaches the limitations as explained above in claim 15.
Hou et al. does not expressly disclose the following limitation: wherein the one or more neural networks include a generative network to generate the one or more time-lapsed images of the second object, a second class of object belonging to a same object class as the first object, the one or more time-lapsed images of the second object having an appearance appropriate for one or more points or periods in time corresponding to the one or more time-lapsed images.
However, Liu et al. teaches wherein the one or more neural networks include a generative network to generate the one or more time-lapsed images of the second object, a second class of object belonging to a same object class as the first object, the one or more time-lapsed images of the second object having an appearance appropriate for one or more points or periods in time corresponding to the one or more time-lapsed images (As shown in Fig. 1, a C-GAN algorithm is proposed for face aging in which the input image is transformed to any specific age group; as shown in Fig. 6, the C-GAN is applied to the original image (i.e. input image) to generate faces for 7 different age groups. The images generated for the age groups are the time-lapsed images, in which the face at each age group (i.e. second object) is based on the original face (i.e. first object in the input image). The object in the original image and the object in each generated image are both faces (i.e. same class). Each generated image has an appearance appropriate for the person’s face at a specific age (i.e. point in time); see Pg. 6, “4.2 Implementation details”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a generative network that generates time-lapsed images of a second object belonging to the same class as the first object, in which the second object has an appearance appropriate for the point in time, as taught by Liu et al., into the image generation of Hou et al. in order to improve cross-age face verification (Liu et al., Abstract).
Regarding claim 23, Hou et al. teaches the limitations as explained above in claim 21.
Hou et al. does not expressly disclose the following limitation: wherein the one or more neural networks include a generative network to generate the one or more time-lapsed images of the second object, the second class of object belonging to a same object class as the first object, the one or more time-lapsed images of the second object having an appearance appropriate for one or more points or periods in time corresponding to the one or more time-lapsed images.
However, Liu et al. teaches wherein the one or more neural networks include a generative network to generate the one or more time-lapsed images of the second object, the second class of object belonging to a same object class as the first object, the one or more time-lapsed images of the second object having an appearance appropriate for one or more points or periods in time corresponding to the one or more time-lapsed images (As shown in Fig. 1, a C-GAN algorithm is proposed for face aging in which the input image is transformed to any specific age group; as shown in Fig. 6, the C-GAN is applied to the original image (i.e. input image) to generate faces for 7 different age groups. The images generated for the age groups are the time-lapsed images, in which the face at each age group (i.e. second object) is based on the original face (i.e. first object in the input image). The object in the original image and the object in each generated image are both faces (i.e. same class). Each generated image has an appearance appropriate for the person’s face at a specific age (i.e. point in time); see Pg. 6, “4.2 Implementation details”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a generative network that generates time-lapsed images of a second object belonging to the same class as the first object, in which the second object has an appearance appropriate for the point in time, as taught by Liu et al., into the image generation of Hou et al. in order to improve cross-age face verification (Liu et al., Abstract).
Regarding claim 29, Hou et al. teaches the limitations as explained above in claim 27.
Hou et al. does not expressly disclose the following limitation: wherein the one or more neural networks include a generative network to generate the one or more time-lapsed images of the second object, a second class of object belonging to a same object class as the first object, the one or more time-lapsed images of the second object having an appearance appropriate for one or more points or periods in time corresponding to the one or more time-lapsed images.
However, Liu et al. teaches wherein the one or more neural networks include a generative network to generate the one or more time-lapsed images of the second object, a second class of object belonging to a same object class as the first object, the one or more time-lapsed images of the second object having an appearance appropriate for one or more points or periods in time corresponding to the one or more time-lapsed images (As shown in Fig. 1, a C-GAN algorithm is proposed for face aging in which the input image is transformed to any specific age group; as shown in Fig. 6, the C-GAN is applied to the original image (i.e. input image) to generate faces for 7 different age groups. The images generated for the age groups are the time-lapsed images, in which the face at each age group (i.e. second object) is based on the original face (i.e. first object in the input image). The object in the original image and the object in each generated image are both faces (i.e. same class). Each generated image has an appearance appropriate for the person’s face at a specific age (i.e. point in time); see Pg. 6, “4.2 Implementation details”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a generative network that generates time-lapsed images of a second object belonging to the same class as the first object, in which the second object has an appearance appropriate for the point in time, as taught by Liu et al., into the image generation of Hou et al. in order to improve cross-age face verification (Liu et al., Abstract).

Claims 4, 10, 16, 22, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over “Deep Feature Consistent Variational Autoencoder” by Hou et al. in view of “Mixture-of-Experts Variational Autoencoder for Clustering and Generating from Similarity-based Representations” by Kopf et al. and further in view of “Unsupervised feature extraction with autoencoder trees” by Irsoy et al.
Regarding claim 4, Hou et al. teaches the limitations as explained above in claim 3.
Hou et al. does not expressly disclose the following limitation: wherein the one or more neural networks include a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object, the gating network to select the one or more VAEs using a hierarchical mixture-of-experts approach.
However, Kopf et al. teaches wherein the one or more neural networks include a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object, the gating network to select the one or more VAEs (As shown in the Abstract, a mixture-of-experts similarity variational autoencoder (MoE-Sim-VAE) is introduced; as shown on Pg. 3, the data is mapped via the latent representation into K clusters, with each cluster corresponding to one of the K generator experts, and the clustering network (i.e. gating network) is used for MoE models. This shows that the gating network selects the expert (i.e. VAE) for each cluster (i.e. object class); as shown on Pg. 7, section 4.3, the MoE-Sim-VAE is used for classification; Fig. 1 shows the proposed MoE-Sim-VAE model).
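For illustration only, the gating behavior mapped above (a gating network choosing one expert per cluster) can be sketched as follows. This is a minimal sketch under stated assumptions, not Kopf et al.'s architecture or the claimed method: the linear gate, its weights, and the toy expert functions are hypothetical stand-ins for a clustering network and K trained VAE decoders.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for one trained VAE decoder (expert): latent code -> output.
Expert = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns gate logits into cluster probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_gate(weights: Sequence[Sequence[float]], biases: Sequence[float],
                z: Sequence[float]) -> List[float]:
    """Hypothetical linear gating network: one logit per expert/cluster."""
    return [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(weights, biases)]


def gate_and_select(weights: Sequence[Sequence[float]], biases: Sequence[float],
                    experts: Sequence[Expert],
                    z: Sequence[float]) -> Tuple[int, List[float], List[float]]:
    """Route latent code z to the most probable expert (hard selection).

    Returns the selected expert index, the gate probabilities over all experts,
    and the selected expert's output for z.
    """
    probs = softmax(linear_gate(weights, biases, z))
    k = max(range(len(probs)), key=probs.__getitem__)
    return k, probs, experts[k](z)
```

In this sketch the gate assigns a probability to every expert and the argmax picks one, which is one way to read "selects the expert (i.e. VAE) for each cluster"; a soft mixture of expert outputs would be an alternative design.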
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a gating network that selects VAEs, as taught by Kopf et al., into the image generation of Hou et al. in order to improve classification (Kopf et al., Pg. 7).
The combination of Hou et al. and Kopf et al. does not expressly disclose the following limitation: using a hierarchical mixture-of-experts approach.
However, Irsoy et al. teaches using a hierarchical mixture-of-experts approach (As shown on Pg. 64, Section 3, “Autoencoder trees,” “a soft decision node redirects instances to all its children but with different probabilities, as given by a gating function…this architecture is equivalent to that of the hierarchical mixture of experts”; Pg. 71, Section 5, “Conclusions,” “the autoencoder tree implements soft hierarchical clustering”; Pg. 64, left-hand column, “We use the soft decision tree model whose internal nodes implement a soft multivariate split as defined by a gating function”; as shown in Fig. 1, a soft selection is made among the leaf responses).
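For illustration only, the soft decision tree quoted above can be sketched to show why it behaves as a hierarchical mixture of experts: each internal node's gating function sends an input to both children with complementary probabilities, so the output is a weighted mixture of all leaf responses, with weights equal to the product of gate values along each path. This is a minimal sketch under stated assumptions, not Irsoy et al.'s implementation or training procedure; the sigmoid gate, fixed leaf vectors, and the names `Leaf`, `Node`, `leaf_weights`, and `predict` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


@dataclass
class Leaf:
    response: List[float]  # hypothetical fixed leaf output (stand-in for a leaf expert)


@dataclass
class Node:
    w: List[float]   # gating weights for the soft multivariate split
    b: float         # gating bias
    left: "Tree"     # child reached with probability g(x)
    right: "Tree"    # child reached with probability 1 - g(x)


Tree = Union[Leaf, Node]


def leaf_weights(tree: Tree, x: Sequence[float],
                 path_prob: float = 1.0) -> List[Tuple[Leaf, float]]:
    """Return every leaf with the probability of reaching it (soft routing)."""
    if isinstance(tree, Leaf):
        return [(tree, path_prob)]
    g = sigmoid(sum(wi * xi for wi, xi in zip(tree.w, x)) + tree.b)
    return (leaf_weights(tree.left, x, path_prob * g)
            + leaf_weights(tree.right, x, path_prob * (1.0 - g)))


def predict(tree: Tree, x: Sequence[float]) -> List[float]:
    """Output = sum over leaves of (path probability * leaf response)."""
    weighted = leaf_weights(tree, x)
    dim = len(weighted[0][0].response)
    return [sum(p * leaf.response[i] for leaf, p in weighted) for i in range(dim)]
```

For example, with a root gate and one nested gate, both evaluating to 0.5 at `x = [0.0]`, the leaf weights are 0.5, 0.25, and 0.25 and sum to 1, which is the "soft selection among the leaf responses" described above.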
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a gating network that selects autoencoders using a hierarchical mixture-of-experts approach, as taught by Irsoy et al., into the combined image generation of Hou et al. and Kopf et al. in order to improve prediction accuracy (Irsoy et al., Pg. 71).
Regarding claim 10, Hou et al. teaches the limitations as explained above in claim 9.
Hou et al. does not expressly disclose the following limitation: wherein the one or more neural networks include a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object, the gating network to select the one or more VAEs using a hierarchical mixture-of-experts approach.
However, Kopf et al. teaches wherein the one or more neural networks include a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object, the gating network to select the one or more VAEs (As shown in the Abstract, a mixture-of-experts similarity variational autoencoder (MoE-Sim-VAE) is introduced; as shown on Pg. 3, the data is mapped via the latent representation into K clusters, with each cluster corresponding to one of the K generator experts, and the clustering network (i.e. gating network) is used for MoE models. This shows that the gating network selects the expert (i.e. VAE) for each cluster (i.e. object class); as shown on Pg. 7, section 4.3, the MoE-Sim-VAE is used for classification; Fig. 1 shows the proposed MoE-Sim-VAE model).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a gating network that selects VAEs, as taught by Kopf et al., into the image generation of Hou et al. in order to improve classification (Kopf et al., Pg. 7).
The combination of Hou et al. and Kopf et al. does not expressly disclose the following limitation: using a hierarchical mixture-of-experts approach.
However, Irsoy et al. teaches using a hierarchical mixture-of-experts approach (As shown on Pg. 64, Section 3, “Autoencoder trees,” “a soft decision node redirects instances to all its children but with different probabilities, as given by a gating function…this architecture is equivalent to that of the hierarchical mixture of experts”; Pg. 71, Section 5, “Conclusions,” “the autoencoder tree implements soft hierarchical clustering”; Pg. 64, left-hand column, “We use the soft decision tree model whose internal nodes implement a soft multivariate split as defined by a gating function”; as shown in Fig. 1, a soft selection is made among the leaf responses).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a gating network that selects autoencoders using a hierarchical mixture-of-experts approach, as taught by Irsoy et al., into the combined image generation of Hou et al. and Kopf et al. in order to improve prediction accuracy (Irsoy et al., Pg. 71).
Regarding claim 16, Hou et al. teaches the limitations as explained above in claim 15.
Hou et al. does not expressly disclose the following limitation: wherein the one or more neural networks include a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object, the gating network to select the one or more VAEs using a hierarchical mixture-of-experts approach.
However, Kopf et al. teaches wherein the one or more neural networks include a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object, the gating network to select the one or more VAEs (As shown in the Abstract, a mixture-of-experts similarity variational autoencoder (MoE-Sim-VAE) is introduced; as shown on Pg. 3, the data is mapped via the latent representation into K clusters, with each cluster corresponding to one of the K generator experts, and the clustering network (i.e. gating network) is used for MoE models. This shows that the gating network selects the expert (i.e. VAE) for each cluster (i.e. object class); as shown on Pg. 7, section 4.3, the MoE-Sim-VAE is used for classification; Fig. 1 shows the proposed MoE-Sim-VAE model).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a gating network that selects VAEs, as taught by Kopf et al., into the image generation of Hou et al. in order to improve classification (Kopf et al., Pg. 7).
The combination of Hou et al. and Kopf et al. does not expressly disclose the following limitation: using a hierarchical mixture-of-experts approach.
However, Irsoy et al. teaches using a hierarchical mixture-of-experts approach (As shown on Pg. 64, Section 3, “Autoencoder trees,” “a soft decision node redirects instances to all its children but with different probabilities, as given by a gating function…this architecture is equivalent to that of the hierarchical mixture of experts”; Pg. 71, Section 5, “Conclusions,” “the autoencoder tree implements soft hierarchical clustering”; Pg. 64, left-hand column, “We use the soft decision tree model whose internal nodes implement a soft multivariate split as defined by a gating function”; as shown in Fig. 1, a soft selection is made among the leaf responses).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a gating network that selects autoencoders using a hierarchical mixture-of-experts approach, as taught by Irsoy et al., into the combined image generation of Hou et al. and Kopf et al. in order to improve prediction accuracy (Irsoy et al., Pg. 71).
Regarding claim 22, Hou et al. teaches the limitations as explained above in claim 21.
Hou et al. does not expressly disclose the following limitation: wherein the one or more neural networks include a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object, the gating network to select the one or more VAEs using a hierarchical mixture-of-experts approach.
However, Kopf et al. teaches wherein the one or more neural networks include a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object, the gating network to select the one or more VAEs (As shown in the Abstract, a mixture-of-experts similarity variational autoencoder (MoE-Sim-VAE) is introduced; as shown on Pg. 3, the data is mapped via the latent representation into K clusters, with each cluster corresponding to one of the K generator experts, and the clustering network (i.e. gating network) is used for MoE models. This shows that the gating network selects the expert (i.e. VAE) for each cluster (i.e. object class); as shown on Pg. 7, section 4.3, the MoE-Sim-VAE is used for classification; Fig. 1 shows the proposed MoE-Sim-VAE model).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a gating network that selects VAEs, as taught by Kopf et al., into the image generation of Hou et al. in order to improve classification (Kopf et al., Pg. 7).
The combination of Hou et al. and Kopf et al. does not expressly disclose the following limitation: using a hierarchical mixture-of-experts approach.
However, Irsoy et al. teaches using a hierarchical mixture-of-experts approach (As shown on Pg. 64, Section 3, “Autoencoder trees,” “a soft decision node redirects instances to all its children but with different probabilities, as given by a gating function…this architecture is equivalent to that of the hierarchical mixture of experts”; Pg. 71, Section 5, “Conclusions,” “the autoencoder tree implements soft hierarchical clustering”; Pg. 64, left-hand column, “We use the soft decision tree model whose internal nodes implement a soft multivariate split as defined by a gating function”; as shown in Fig. 1, a soft selection is made among the leaf responses).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a gating network that selects autoencoders using a hierarchical mixture-of-experts approach, as taught by Irsoy et al., into the combined image generation of Hou et al. and Kopf et al. in order to improve prediction accuracy (Irsoy et al., Pg. 71).
Regarding claim 28, Hou et al. teaches the limitations as explained above in claim 27.
Hou et al. does not expressly disclose the following limitation: wherein the one or more neural networks include a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object, the gating network to select the one or more VAEs using a hierarchical mixture-of-experts approach.
However, Kopf et al. teaches wherein the one or more neural networks include a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object, the gating network to select the one or more VAEs (As shown in the Abstract, a mixture-of-experts similarity variational autoencoder (MoE-Sim-VAE) is introduced; as shown on Pg. 3, the data is mapped via the latent representation into K clusters, with each cluster corresponding to one of the K generator experts. The clustering network (i.e. gating network) is used for MoE models. This shows that the gating network selects the expert (i.e. VAE) for each cluster (i.e. object class); as shown on Pg. 7, section 4.3, the MoE-Sim-VAE is used for classification; Fig. 1 shows the proposed MoE-Sim-VAE model).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a gating network that selects VAEs, as taught by Kopf et al., into the image generation of Hou et al. in order to improve classification (Kopf et al., Pg. 7).
The combination of Hou et al. and Kopf et al. does not expressly disclose the following limitation: using a hierarchical mixture-of-experts approach.
However, Irsoy et al. teaches using a hierarchical mixture-of-experts approach (As shown on Pg. 64, Section 3, “Autoencoder trees,” “a soft decision node redirects instances to all its children but with different probabilities, as given by a gating function…this architecture is equivalent to that of the hierarchical mixture of experts”; Pg. 71, Section 5, “Conclusions,” “the autoencoder tree implements soft hierarchical clustering”; Pg. 64, left-hand column, “We use the soft decision tree model whose internal nodes implement a soft multivariate split as defined by a gating function”; as shown in Fig. 1, a soft selection is made among the leaf responses).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a gating network that selects autoencoders using a hierarchical mixture-of-experts approach, as taught by Irsoy et al., into the combined image generation of Hou et al. and Kopf et al. in order to improve prediction accuracy (Irsoy et al., Pg. 71).

Response to Arguments
Applicant's arguments filed 4/25/2022 have been fully considered but they are not persuasive. 
Applicant, in Pgs. 12-14 of the remarks, argues Hou et al. does not expressly disclose the following limitation in each of the independent claims 1, 7, 13, 19, and 25: “…use one or more neural networks to generate one or more time-lapsed images of a second object based, at least in part, on one or more images of a first object.”
However, the Examiner respectfully disagrees. In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., “generate different images of the interior of a house…chair in the 1960s” and “replacing a first object with a second object having a similar type from a different period of time” in Pg. 3 of the remarks) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
All remaining arguments are reliant on the aforementioned and addressed arguments and thus are considered to be wholly addressed herein. Please see the above claim rejections.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Daniella M. DiGuglielmo whose telephone number is (571)272-2682. The examiner can normally be reached Monday - Friday 7:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung, can be reached at 571-272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Daniella M. DiGuglielmo/Examiner, Art Unit 2664                                                                                                                                                                                                        
/PING Y HSIEH/Primary Examiner, Art Unit 2664