DETAILED ACTION
1.	This office action is in response to Application No. 16926407 filed on 07/10/2020. Claims 1-20 are presented for examination and are currently pending.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
3.	Claim 18 is objected to because of the following informalities:
	Claim 18 recites “The apparatus of Claim 6”. It should be “The apparatus of Claim 16”.
	Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


4.	Claims 5 and 16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
	Claim 5 recites “second model weights …”. The term “second model weights” lacks antecedent basis. It is not clear to which of the two models the weights belong, or whether they belong to a model different from the two models.
	Claim 16 recites “second model weights …”. The term “second model weights” lacks antecedent basis. It is not clear to which of the two models the weights belong, or whether they belong to a model different from the two models.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



5.	Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ryan et al. (US20200387797, filed 08/14/2019) in view of Li et al. ("Convergent learning: Do different neural networks learn the same representations?." arXiv:1511.07543v3 [cs.LG] 28 Feb 2016).

	Regarding claim 1, Ryan teaches a method comprising: obtaining, with at least one hardware processor, (FIG. 32 is a block diagram of a server 500 which may be used to implement the systems and methods described herein. The server 500 can implement the various processes associated with the systems and methods described herein. The server 500 may be a digital computer that, in terms of hardware architecture, generally includes a processor 502 [0175], Fig. 32; The processor 502 is a hardware device for executing software instructions [0176]) 
	data specifying: (the processor 502 is configured to execute software stored within the memory 510 to communicate data to and from the memory 510 [0176])
	two trained neural network models; (Thus, a trained GAN discriminator (e.g., discriminator 602) may be used to determine the probability that a never-before-seen sample x has come from the same probability distribution as the training data, while a trained encoder (e.g., encoder 616) can be used to find the probability of observing x in the training data [0202], Fig. 33 and Fig. 34; The GAN network architecture 600 in this embodiment may have two major components, including a discriminator sub-network 602 and a generator sub-network 604, … The generator 604 and the discriminator 602 may be trained jointly [0200]) and
	alignment data; (Multiple data transformations (blocks 174-1 through 174-4) can be combined into a single transformed data. Each component data transformation changes the dimensions of the input data, i.e., final data is aligned to the same dimension matrix [0118], Fig. 18)
	with said at least one hardware processor, (The processor 502 is a hardware device for executing software instructions [0176])
	with said at least one hardware processor, (The processor 502 is a hardware device for executing software instructions [0176])
	selecting a new model along said minimal loss curve that maximizes accuracy (The machine learning algorithm coupled with a data transformation becomes a new enhanced machine learning algorithm, … the best performing model is chosen, … The best model is selected based on a key performance indicator (KPI) relevant to how the model is going to be used for prediction/classification (e.g. highest true positive rate [0126]; The pattern detection approaches described above may use a CNN-based network (e.g., masked R-CNN network) and can obtain an outlier detection accuracy or True Positive Rate (TPR) of about 95%. With respect to the unsupervised approach described with respect to FIGS. 33-41, the present outlier detection techniques may use a Generalized Adversarial Network (GAN) to obtain a TPR of about 92% [0192]) 
	on adversarially perturbed data. (Noise may be introduced into the inputs to the black boxes, … The black boxes may be described as machine learning 222 and meta learning 224 processes for providing models and selecting the best performing models [0139]; Anomalies are deviations from regular patterns of data profiles. Unexpected bursts in time-series data might indicate, … an intrusion activity or cyber-attack in network traffic data [0129]; One way that this can be done is by creating images from time-series data, as described above, and then passing the image data to a Generalized Adversarial Network (GAN), which is a Deep Neural Network that enables learning of a distribution of the data from the time-series [0182]. The Examiner notes that noise and cyber-attack in network traffic data are adversarial attacks on data.)
	Ryan does not explicitly teach carrying out neuron alignment on said two trained neural network models using said alignment data to obtain two aligned models; training a minimal loss curve between said two aligned models.
	Li teaches carrying out neuron alignment on said two trained neural network models using said alignment data to obtain two aligned models; (we also performed one-to-one alignments of neurons by measuring the mutual information between them, pg. 6, section 3.2, first para.; We begin research into this question by introducing three techniques to approximately align different neural networks on a feature or subspace level, abstract)
	training a minimal loss curve between said two aligned models; (This layer is then trained to minimize the sum of squared prediction errors plus an L1 penalty, the strength of which is varied [footnote 9], pg. 7, second para.; Second, … making the initial cost about the same on all layers and allowing the same learning rate and SGD momentum hyperparameters to be used for all layers, pg. 7, footnote 9. The Examiner notes that SGD, also known as stochastic gradient descent, is an optimization method that minimizes the loss function.)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Ryan to incorporate the method of Li for the benefit of improvements that are possible via training multiple models and then using model compilation techniques to realize the resulting ensemble in a single model. (Li, pg. 2, first para.)
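	For illustration only, the selecting limitation of claim 1 (choosing the model along a curve that maximizes accuracy on adversarially perturbed data) can be sketched as below. The sketch is not drawn from Ryan, Li, or the instant application. It assumes a toy linear classifier, a sign-based perturbation of size eps that lowers each sample's margin, and a grid search over the curve parameter t. All function names (perturb, robust_accuracy, select_along_curve) are hypothetical.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def perturb(x, y, w, eps):
    """Shift each coordinate of x by eps in the direction that lowers the margin y * (w . x).
    Assumption: a linear model, for which this is the worst-case max-norm perturbation."""
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]

def robust_accuracy(w, data, eps):
    """Fraction of (x, y) samples still classified correctly after perturbation."""
    correct = 0
    for x, y in data:
        x_adv = perturb(x, y, w, eps)
        score = sum(wi * xi for wi, xi in zip(w, x_adv))
        if y * score > 0:
            correct += 1
    return correct / len(data)

def select_along_curve(curve, data, eps, steps=11):
    """Evaluate evenly spaced points t in [0, 1] on the curve and return the t
    whose model has the highest accuracy on the perturbed data."""
    ts = [k / (steps - 1) for k in range(steps)]
    return max(ts, key=lambda t: robust_accuracy(curve(t), data, eps))
```

	Here curve is any callable mapping t to a weight vector, for example a straight segment or a trained curve between two models. With ties, the first grid point reaching the maximum is returned.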

	Regarding claim 2, Modified Ryan teaches the method of claim 1, Ryan teaches wherein said alignment data (Multiple data transformations (blocks 174-1 through 174-4) can be combined into a single transformed data. Each component data transformation changes the dimensions of the input data, i.e., final data is aligned to the same dimension matrix [0118], Fig. 18)
	includes training data (The algorithm is trained with the multiple transformed data [0126])

	Regarding claim 3, Modified Ryan teaches the method of claim 2, Ryan teaches further comprising implementing said new model on a computer in an artificial intelligence application (Meta Learning 224 selects the best performing models which is sent as input according to the arrow into machine learning 222 (which is considered as artificial intelligence application) Fig. 15, diagram on the right of last shaded section; meta learning 224 processes for providing models and selecting the best performing models [0139])

	Regarding claim 4, Modified Ryan teaches the method of claim 3, Ryan teaches wherein said artificial intelligence application comprises computer vision, further comprising controlling (…. control operations of the server 500 pursuant to the software instructions [0176])
	at least one of a vehicle and a tool (forecasting traffic congestion on streets by detecting patterns in a time-series from video cameras on streets, cars [0061]. The Examiner notes that video cameras are tools for perceiving scenes in computer vision)
	with said new model based at least in part on adversarial input (One way that this can be done is by creating images from time-series data, as described above, and then passing the image data to a Generalized Adversarial Network (GAN), which is a Deep Neural Network that enables learning of a distribution of the data from the time-series [0182])

	Regarding claim 5, Modified Ryan teaches the method of claim 3, Ryan teaches with said at least one hardware processor, (The processor 502 is a hardware device for executing software instructions [0176])
	Li teaches wherein said carrying out of said neuron alignment (we also performed one-to-one alignments of neurons by measuring the mutual information between them, pg. 6, section 3.2, first para.; We begin research into this question by introducing three techniques to approximately align different neural networks on a feature or subspace level, abstract) comprises:
	computing correlations between hidden states of said two trained neural network models; (Figure 1: Correlation matrices for the conv1 layer, displayed as images with minimum value at black and maximum at white. (a, b) Within-net correlation matrices for Net1 and Net2, respectively, pg. 3, Fig. 1; The within-net correlation values for each layer can be considered as a symmetric square matrix with side length equal to the number of units in that layer (e.g. a 96 × 96 matrix for conv1 as in Figure 1a,b), pg. 3, last para.) and
	with said at least one hardware processor, permuting second model weights to maximize correlation between corresponding hidden states (Due to symmetries in the architecture and weight initialization procedures, for any given parameter vector that is found, one could create many equivalent solutions simply by permuting the unit orders within a layer (and permuting the outgoing weights accordingly), pg. 4, second para.; (d) Between-net correlation for Net1 vs. a version of Net2 that has been permuted to approximate Net1’s feature order. The partially white diagonal of this final matrix shows the extent to which the alignment is successful pg. 3, Fig. 1; We find matching units between a pair of networks — here Net1 and Net2 — in two ways. In the first approach, for each unit in Net1, we find the unit in Net2 with maximum correlation to it, which is the max along each row of Figure 1c, pg. 4, section 3.1, first para.)
	The same motivation to combine set forth for independent claim 1 applies here.
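	For illustration only, the correlation-based alignment discussed for claim 5 (computing correlations between hidden units of two networks, then permuting the second network's units and their outgoing weights) can be sketched as below. The sketch is an assumption-laden toy and is not code from Li or from the instant application. It uses a greedy one-to-one matching, whereas the quoted passage of Li describes taking the maximum along each row. The names pearson, align_units, and permute_layer are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def align_units(acts1, acts2):
    """acts1[i] and acts2[j] hold the activations of unit i (net 1) and unit j (net 2)
    over the same inputs. Returns perm, where perm[i] is the net-2 unit matched to
    net-1 unit i. Greedy one-to-one matching: highest-correlation pairs are taken first."""
    n = len(acts1)
    pairs = sorted(
        ((pearson(acts1[i], acts2[j]), i, j) for i in range(n) for j in range(n)),
        reverse=True,
    )
    perm = [None] * n
    used = set()
    for _, i, j in pairs:
        if perm[i] is None and j not in used:
            perm[i] = j
            used.add(j)
    return perm

def permute_layer(w_in, w_out, perm):
    """Reorder the hidden units of net 2 to follow net 1's order: rows of the incoming
    weights w_in (hidden x inputs) and columns of the outgoing weights w_out
    (outputs x hidden) are permuted together, so the network function is unchanged."""
    new_in = [w_in[j] for j in perm]
    new_out = [[row[j] for j in perm] for row in w_out]
    return new_in, new_out
```

	A greedy matching does not guarantee the maximum total correlation. An optimal assignment solver could be substituted without changing the permutation step.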

	Regarding claim 6, Modified Ryan teaches the method of claim 2, Ryan teaches further comprising: with said at least one hardware processor, substituting said new model for one of said two trained neural network models; (Models have to be extensively replaced by other algorithms and optimized to avoid under-fitting when the input evolves to a more complex and heterogeneous data [0138]) and
	with said at least one hardware processor, (The processor 502 is a hardware device for executing software instructions [0176])
	and selecting steps to obtain a further refined new model. (… and selecting the best algorithm among the optimized algorithms, given the current network context [0095]; The preparation step is selected during the training of the machine learning algorithm [0119]; The procedure 110 includes selecting hyper-parameters (step 112) [0114])
	Li teaches iteratively repeating said neuron alignment, training (We would like to investigate the similarities and differences between multiple training runs of same network architecture, pg. 4, second para.; Both methods reveal that the average correlation for one-to-one alignments varies from layer to layer (Figure 4), with the highest matches in the conv1 and conv5 layers, pg. 5, second para. The Examiner notes that the multiple training runs constitute the iteration.)
	The same motivation to combine set forth for independent claim 1 applies here.

	Regarding claim 7, Modified Ryan teaches the method of claim 6, Ryan teaches further comprising implementing said further refined new model on a computer in an artificial intelligence application. (… and selecting the best algorithm among the optimized algorithms, given the current network context [0095]; Meta Learning 224 selects the best performing models which is sent as input according to the arrow into machine learning 222 (which is considered as artificial intelligence application) Fig. 15, diagram on the right of last shaded section)

	Regarding claim 8, Modified Ryan teaches the method of claim 7, Ryan teaches wherein said artificial intelligence application comprises computer vision, further comprising controlling (…. control operations of the server 500 pursuant to the software instructions [0176])
	at least one of a vehicle and a tool (forecasting traffic congestion on streets by detecting patterns in a time-series from video cameras on streets, cars [0061]. The Examiner notes that video cameras are tools for perceiving scenes in computer vision)
	with said further refined new model based at least in part on adversarial input (… and selecting the best algorithm among the optimized algorithms, given the current network context [0095]; Noise may be introduced into the inputs to the black boxes, … The black boxes may be described as machine learning 222 and meta learning 224 processes for providing models and selecting the best performing models [0139]; Anomalies are deviations from regular patterns of data profiles. Unexpected bursts in time-series data might indicate, … an intrusion activity or cyber-attack in network traffic data [0129]; One way that this can be done is by creating images from time-series data, as described above, and then passing the image data to a Generalized Adversarial Network (GAN), which is a Deep Neural Network that enables learning of a distribution of the data from the time-series [0182]. The Examiner notes that noise and cyber-attack in network traffic data are adversarial attacks on data.)

	Regarding claim 9, Modified Ryan teaches the method of claim 2, Li teaches wherein training said minimal loss curve comprises applying stochastic gradient descent (This layer is then trained to minimize the sum of squared prediction errors plus an L1 penalty, the strength of which is varied [footnote 9], pg. 7, second para.; Second, … making the initial cost about the same on all layers and allowing the same learning rate and SGD momentum hyperparameters to be used for all layers, pg. 7, footnote 9. The Examiner notes that SGD, also known as stochastic gradient descent, is an optimization technique that minimizes the loss function.)
	The same motivation to combine set forth for independent claim 1 applies here.
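	For illustration only, the limitation of claim 9 (training a curve between two models by stochastic gradient descent) can be sketched as below. The sketch is not drawn from Ryan, Li, or the instant application. It assumes a two-parameter toy loss whose minima lie on the unit circle, a quadratic Bezier parameterization of the curve with a single trainable control point theta, and a uniformly sampled curve position t at each step. All names and hyperparameters (train_curve, lr, steps) are hypothetical.

```python
import random

def loss(w):
    """Toy loss, minimal on the unit circle (an assumed stand-in for a network loss)."""
    r2 = w[0] ** 2 + w[1] ** 2
    return (r2 - 1.0) ** 2

def grad(w):
    """Gradient of the toy loss with respect to w."""
    r2 = w[0] ** 2 + w[1] ** 2
    return [4.0 * (r2 - 1.0) * w[0], 4.0 * (r2 - 1.0) * w[1]]

def bezier(w1, theta, w2, t):
    """Point at position t on the quadratic Bezier curve from w1 to w2 with control point theta."""
    a, b, c = (1 - t) ** 2, 2 * t * (1 - t), t ** 2
    return [a * p + b * q + c * r for p, q, r in zip(w1, theta, w2)]

def train_curve(w1, w2, steps=2000, lr=0.05, seed=0):
    """Fit the control point by SGD: sample t, then step down the loss at that curve point."""
    rng = random.Random(seed)
    theta = [(p + r) / 2 for p, r in zip(w1, w2)]  # start on the straight segment
    for _ in range(steps):
        t = rng.random()
        g = grad(bezier(w1, theta, w2, t))
        scale = 2 * t * (1 - t)  # derivative of the curve point with respect to theta
        theta = [th - lr * scale * gi for th, gi in zip(theta, g)]
    return theta

def mean_loss(w1, theta, w2, n=21):
    """Average loss over n evenly spaced points on the curve, endpoints included."""
    return sum(loss(bezier(w1, theta, w2, k / (n - 1))) for k in range(n)) / n
```

	With the control point at the midpoint of w1 and w2, the Bezier curve reduces to the straight segment, so mean_loss at that setting gives a baseline against which the trained curve can be compared.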

	Regarding claim 10, Ryan teaches a non-transitory computer readable medium comprising computer executable instructions which when executed by a hardware processor cause said hardware processor to perform a method of: (When stored in the non-transitory computer-readable medium, software can include instructions executable by a processor or device (e.g., any type of programmable circuitry or logic) that, in response to such execution, cause a processor or the device to perform a set of operations, steps, methods [0180]; The processor 502 is a hardware device for executing software instructions [0176])
	obtaining data specifying: (the processor 502 is configured to execute software stored within the memory 510 to communicate data to and from the memory 510 [0176])
	two trained neural network models; (Thus, a trained GAN discriminator (e.g., discriminator 602) may be used to determine the probability that a never-before-seen sample x has come from the same probability distribution as the training data, while a trained encoder (e.g., encoder 616) can be used to find the probability of observing x in the training data [0202], Fig. 33 and Fig. 34; The GAN network architecture 600 in this embodiment may have two major components, including a discriminator sub-network 602 and a generator sub-network 604, … The generator 604 and the discriminator 602 may be trained jointly [0200]) and
	alignment data; (Multiple data transformations (blocks 174-1 through 174-4) can be combined into a single transformed data. Each component data transformation changes the dimensions of the input data, i.e., final data is aligned to the same dimension matrix [0118], Fig. 18)
	selecting a new model along said minimal loss curve that maximizes accuracy (The machine learning algorithm coupled with a data transformation becomes a new enhanced machine learning algorithm, … the best performing model is chosen, … The best model is selected based on a key performance indicator (KPI) relevant to how the model is going to be used for prediction/classification (e.g. highest true positive rate [0126]; The pattern detection approaches described above may use a CNN-based network (e.g., masked R-CNN network) and can obtain an outlier detection accuracy or True Positive Rate (TPR) of about 95%. With respect to the unsupervised approach described with respect to FIGS. 33-41, the present outlier detection techniques may use a Generalized Adversarial Network (GAN) to obtain a TPR of about 92% [0192]) 
	on adversarially perturbed data. (Noise may be introduced into the inputs to the black boxes, … The black boxes may be described as machine learning 222 and meta learning 224 processes for providing models and selecting the best performing models [0139]; Anomalies are deviations from regular patterns of data profiles. Unexpected bursts in time-series data might indicate, … an intrusion activity or cyber-attack in network traffic data [0129]; One way that this can be done is by creating images from time-series data, as described above, and then passing the image data to a Generalized Adversarial Network (GAN), which is a Deep Neural Network that enables learning of a distribution of the data from the time-series [0182]. The Examiner notes that noise and cyber-attack in network traffic data are adversarial attacks on data.)
	Ryan does not explicitly teach carrying out neuron alignment on said two trained neural network models using said alignment data to obtain two aligned models; training a minimal loss curve between said two aligned models.
	Li teaches carrying out neuron alignment on said two trained neural network models using said alignment data to obtain two aligned models; (we also performed one-to-one alignments of neurons by measuring the mutual information between them, pg. 6, section 3.2, first para.; We begin research into this question by introducing three techniques to approximately align different neural networks on a feature or subspace level, abstract)
	training a minimal loss curve between said two aligned models; (This layer is then trained to minimize the sum of squared prediction errors plus an L1 penalty, the strength of which is varied [footnote 9], pg. 7, second para.; Second, … making the initial cost about the same on all layers and allowing the same learning rate and SGD momentum hyperparameters to be used for all layers, pg. 7, footnote 9. The Examiner notes that SGD, also known as stochastic gradient descent, is an optimization method that minimizes the loss function.)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Ryan to incorporate the method of Li for the benefit of improvements that are possible via training multiple models and then using model compilation techniques to realize the resulting ensemble in a single model. (Li, pg. 2, first para.)

	Regarding claim 11, Modified Ryan teaches the non-transitory computer readable medium of claim 10, Ryan teaches wherein said alignment data (Multiple data transformations (blocks 174-1 through 174-4) can be combined into a single transformed data. Each component data transformation changes the dimensions of the input data, i.e., final data is aligned to the same dimension matrix [0118], Fig. 18)
	includes training data. (The algorithm is trained with the multiple transformed data [0126])

	Regarding claim 12, Ryan teaches an apparatus comprising: a memory;
a non-transitory computer readable medium comprising computer executable instructions; and at least one processor, coupled to said memory and said non-transitory computer readable medium, and operative to execute said instructions to be operative to: (The server 500 may be a digital computer that, in terms of hardware architecture, generally includes a processor 502 [0175], Fig. 32; The processor 502 is a hardware device for executing software instructions, … When the server 500 is in operation, the processor 502 is configured to execute software stored within the memory 510, to communicate data to and from the memory 510 [0176]; When stored in the non-transitory computer-readable medium, software can include instructions executable by a processor or device (e.g., any type of programmable circuitry or logic) that, in response to such execution, cause a processor or the device to perform a set of operations, steps, methods [0180])
	obtain data specifying: (the processor 502 is configured to execute software stored within the memory 510 to communicate data to and from the memory 510 [0176])
	two trained neural network models; (Thus, a trained GAN discriminator (e.g., discriminator 602) may be used to determine the probability that a never-before-seen sample x has come from the same probability distribution as the training data, while a trained encoder (e.g., encoder 616) can be used to find the probability of observing x in the training data [0202], Fig. 33 and Fig. 34; The GAN network architecture 600 in this embodiment may have two major components, including a discriminator sub-network 602 and a generator sub-network 604, … The generator 604 and the discriminator 602 may be trained jointly [0200]) and
	alignment data; (Multiple data transformations (blocks 174-1 through 174-4) can be combined into a single transformed data. Each component data transformation changes the dimensions of the input data, i.e., final data is aligned to the same dimension matrix [0118], Fig. 18)
	select a new model along said minimal loss curve that maximizes accuracy (The machine learning algorithm coupled with a data transformation becomes a new enhanced machine learning algorithm, … the best performing model is chosen, … The best model is selected based on a key performance indicator (KPI) relevant to how the model is going to be used for prediction/classification (e.g. highest true positive rate [0126]; The pattern detection approaches described above may use a CNN-based network (e.g., masked R-CNN network) and can obtain an outlier detection accuracy or True Positive Rate (TPR) of about 95%. With respect to the unsupervised approach described with respect to FIGS. 33-41, the present outlier detection techniques may use a Generalized Adversarial Network (GAN) to obtain a TPR of about 92% [0192]) 
	on adversarially perturbed data. (Noise may be introduced into the inputs to the black boxes, … The black boxes may be described as machine learning 222 and meta learning 224 processes for providing models and selecting the best performing models [0139]; Anomalies are deviations from regular patterns of data profiles. Unexpected bursts in time-series data might indicate, … an intrusion activity or cyber-attack in network traffic data [0129]; One way that this can be done is by creating images from time-series data, as described above, and then passing the image data to a Generalized Adversarial Network (GAN), which is a Deep Neural Network that enables learning of a distribution of the data from the time-series [0182]. The Examiner notes that noise and cyber-attack in network traffic data are adversarial attacks on data.)
	Ryan does not explicitly teach carry out neuron alignment on said two trained neural network models using said alignment data to obtain two aligned models; train a minimal loss curve between said two aligned models.
	Li teaches carry out neuron alignment on said two trained neural network models using said alignment data to obtain two aligned models; (we also performed one-to-one alignments of neurons by measuring the mutual information between them, pg. 6, section 3.2, first para.; We begin research into this question by introducing three techniques to approximately align different neural networks on a feature or subspace level, abstract)
	train a minimal loss curve between said two aligned models; (This layer is then trained to minimize the sum of squared prediction errors plus an L1 penalty, the strength of which is varied [footnote 9], pg. 7, second para.; Second, … making the initial cost about the same on all layers and allowing the same learning rate and SGD momentum hyperparameters to be used for all layers, pg. 7, footnote 9. The Examiner notes that SGD, also known as stochastic gradient descent, is an optimization method that minimizes the loss function.)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Ryan to incorporate the method of Li for the benefit of improvements that are possible via training multiple models and then using model compilation techniques to realize the resulting ensemble in a single model. (Li, pg. 2, first para.)

	Regarding claim 13, Modified Ryan teaches the apparatus of claim 12, Ryan teaches wherein said alignment data (Multiple data transformations (blocks 174-1 through 174-4) can be combined into a single transformed data. Each component data transformation changes the dimensions of the input data, i.e., final data is aligned to the same dimension matrix [0118], Fig. 18)
	includes training data. (The algorithm is trained with the multiple transformed data [0126])

	Regarding claim 14, Modified Ryan teaches the apparatus of claim 13, Ryan teaches wherein said at least one processor is further operative to implement said new model in an artificial intelligence application. (Meta Learning 224 selects the best performing models which is sent as input according to the arrow into machine learning 222 (which is considered as artificial intelligence application) Fig. 15, diagram on the right of last shaded section; meta learning 224 processes for providing models and selecting the best performing models [0139])

	Regarding claim 15, Modified Ryan teaches the apparatus of claim 14, Ryan teaches wherein said artificial intelligence application comprises computer vision, and wherein said at least one processor is further operative to control (the processor 502 is configured to execute software stored within the memory 510, to communicate data to and from the memory 510, and to generally control operations of the server 500 pursuant to the software instructions [0176])
	at least one of a vehicle and a tool (forecasting traffic congestion on streets by detecting patterns in a time-series from video cameras on streets, cars [0061]. The Examiner notes that video cameras are tools for perceiving scenes in computer vision)
	with said new model based at least in part on adversarial input. (One way that this can be done is by creating images from time-series data, as described above, and then passing the image data to a Generalized Adversarial Network (GAN), which is a Deep Neural Network that enables learning of a distribution of the data from the time-series [0182])

	Regarding claim 16, Modified Ryan teaches the apparatus of claim 14, Ryan teaches with said at least one hardware processor, (The processor 502 is a hardware device for executing software instructions [0176])
	Li teaches wherein said carrying out of said neuron alignment (we also performed one-to-one alignments of neurons by measuring the mutual information between them, pg. 6, section 3.2, first para.; We begin research into this question by introducing three techniques to approximately align different neural networks on a feature or subspace level, abstract) comprises:
	computing correlations between hidden states of said two trained neural network models; (Figure 1: Correlation matrices for the conv1 layer, displayed as images with minimum value at black and maximum at white. (a, b) Within-net correlation matrices for Net1 and Net2, respectively, pg. 3, Fig. 1; The within-net correlation values for each layer can be considered as a symmetric square matrix with side length equal to the number of units in that layer (e.g. a 96 × 96 matrix for conv1 as in Figure 1a,b), pg. 3, last para.) and
	with said at least one processor, permuting second model weights to maximize correlation between corresponding hidden states. (Due to symmetries in the architecture and weight initialization procedures, for any given parameter vector that is found, one could create many equivalent solutions simply by permuting the unit orders within a layer (and permuting the outgoing weights accordingly), pg. 4, second para.; (d) Between-net correlation for Net1 vs. a version of Net2 that has been permuted to approximate Net1’s feature order. The partially white diagonal of this final matrix shows the extent to which the alignment is successful pg. 3, Fig. 1; We find matching units between a pair of networks — here Net1 and Net2 — in two ways. In the first approach, for each unit in Net1, we find the unit in Net2 with maximum correlation to it, which is the max along each row of Figure 1c, pg. 4, section 3.1, first para.)
	The same motivation to combine set forth for independent claim 12 applies here.

	Regarding claim 17, Modified Ryan teaches the apparatus of claim 13, Ryan teaches wherein said at least one processor is further operative to: (The processor 502 is a hardware device for executing software instructions [0176])
	substitute said new model for one of said two trained neural network models; (Models have to be extensively replaced by other algorithms and optimized to avoid under-fitting when the input evolves to a more complex and heterogeneous data [0138]) and
	and selecting to obtain a further refined new model. (… and selecting the best algorithm among the optimized algorithms, given the current network context [0095]; The preparation step is selected during the training of the machine learning algorithm [0119]; The procedure 110 includes selecting hyper-parameters (step 112) [0114])
	Li teaches iteratively repeat said neuron alignment, training (We would like to investigate the similarities and differences between multiple training runs of same network architecture, pg. 4, second para.; Both methods reveal that the average correlation for one-to-one alignments varies from layer to layer (Figure 4), with the highest matches in the conv1 and conv5 layers, pg. 5, second para. The Examiner notes that the multiple training runs constitute the iteration.)
	The same motivation to combine set forth for independent claim 12 applies here.

	Regarding claim 18, Modified Ryan teaches the apparatus of claim 6, and Ryan teaches wherein said at least one processor is further operative to implement said further refined new model in an artificial intelligence application. (Meta Learning 224 selects the best performing models, which are sent as input, according to the arrow, into machine learning 222 (which is considered an artificial intelligence application), Fig. 15, diagram on the right of last shaded section)

	Regarding claim 19, Modified Ryan teaches the apparatus of claim 18, and Ryan teaches wherein said artificial intelligence application comprises computer vision, and wherein said at least one processor is further operative to control (… control operations of the server 500 pursuant to the software instructions [0176])
	at least one of a vehicle and a tool (forecasting traffic congestion on streets by detecting patterns in a time-series from video cameras on streets, cars [0061]. The Examiner notes that video cameras are tools for perceiving scenes in computer vision.)
	with said further refined new model based at least in part on adversarial input. (… and selecting the best algorithm among the optimized algorithms, given the current network context [0095]; Noise may be introduced into the inputs to the black boxes, … The black boxes may be described as machine learning 222 and meta learning 224 processes for providing models and selecting the best performing models [0139]; Anomalies are deviations from regular patterns of data profiles. Unexpected bursts in time-series data might indicate, … an intrusion activity or cyber-attack in network traffic data [0129]; One way that this can be done is by creating images from time-series data, as described above, and then passing the image data to a Generalized Adversarial Network (GAN), which is a Deep Neural Network that enables learning of a distribution of the data from the time-series [0182]. The Examiner notes that noise and cyber-attack in network traffic data are adversarial attacks on data.)

	Regarding claim 20, Modified Ryan teaches the apparatus of claim 13, and Li teaches wherein training said minimal loss curve comprises applying stochastic gradient descent. (This layer is then trained to minimize the sum of squared prediction errors plus an L1 penalty, the strength of which is varied.⁹, pg. 7, second para.; Second, … making the initial cost about the same on all layers and allowing the same learning rate and SGD momentum hyperparameters to be used for all layers, pg. 7, footnote 9. The Examiner notes that SGD, also known as stochastic gradient descent, is an optimization technique that minimizes the loss function.)
	The same motivation to combine as stated for independent claim 12 applies here.
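	For illustration only, stochastic gradient descent on a sum of squared prediction errors plus an L1 penalty, as referenced in the cited Li passage, can be sketched as follows. This is a minimal sketch under stated assumptions: the one-parameter linear model, the function names (sgd_fit, mean_squared_error), the data, and the hyperparameter values are hypothetical and are not taken from Li, Ryan, or the application, and plain SGD without momentum is used.

```python
import random

def sgd_fit(xs, ys, lr=0.05, l1=0.0, epochs=200, seed=0):
    """Fit y ~ w*x + b by SGD on squared error plus an L1 penalty on w."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)                  # random sample order: the "stochastic" part
        for i in idx:
            err = (w * xs[i] + b) - ys[i]
            sign = (w > 0) - (w < 0)      # subgradient of |w|
            w -= lr * (2 * err * xs[i] + l1 * sign)
            b -= lr * 2 * err
    return w, b

def mean_squared_error(xs, ys, w, b):
    """Average squared prediction error of the fitted line."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data generated by y = 2x + 1.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]

loss_before = mean_squared_error(xs, ys, 0.0, 0.0)
w, b = sgd_fit(xs, ys)
loss_after = mean_squared_error(xs, ys, w, b)
print(loss_before, loss_after)
```

	Each update uses the gradient of the loss on a single sample, so the loss on the full data set is reduced step by step rather than in one closed-form solve; setting l1 above zero adds the penalty term to each update.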

Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.G./Examiner, Art Unit 2121  

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121