DETAILED ACTION
This action is in response to the communications filed on 09/29/2022, in which claims 1, 2, 5, 9, 11, 12, 15, 19 and 21 are amended and claims 3, 6, 13 and 16 are canceled. Claims 1-2, 4, 7-12, 14 and 17-21 are therefore pending, and claims 5 and 15 are objected to.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 4, 5, 14 and 15 are objected to because of the following informalities: 
In claims 4 and 14, “The electronic device of claim 3 / The method of claim 13” refers to claims 3 and 13, which have been canceled. For examination purposes, the claims are read as “The electronic device of claim 2 / The method of claim 12.”
In claims 5 and 15 line 4, “where s(k) is the average of the sign function on the kth hidden unit…” should be “where s(k) is an average of the sign function on the kth hidden unit.”
(“the sign function on the kth hidden unit” is appropriate because it is how it’s described in mathematics.)
In claims 5 and 15 line 5, “avg… denotes the average taken over all possible pairs…” should be “avg… denotes an average taken over all possible pairs”
In claims 5 and 15 line 6, “d is the number of hidden units in a given layer” should be “d is a total number of hidden units in the given layer.” “a given layer” is first seen in line 2.


Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-2, 4, 7-12, 14 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Salimans ("Improved Techniques for Training GANs") in view of Ioffe ("Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift") in further view of Raghu ("On the Expressive Power of Deep Neural Networks").


In regard to claims 1 and 11, Salimans teaches: An electronic device for improved neural network training comprising: a processor; a non-transitory computer-readable medium storing data representative of a generative adversarial network (GAN) to learn from unlabeled data by engaging a generator and a discriminator; and (Salimans, 6 Experiments, 6.4 ImageNet "We extensively modified a publicly available implementation of DCGANs using TensorFlow [28] to achieve high performance, using a multi-GPU implementation."; Salimans indicates that they implement their method using TensorFlow on a computer, where a processor / GPU, a non-transitory computer-readable medium / memory, and programs / instructions are inherent; Section 2 "One of the primary goals of this work is to improve the effectiveness of generative adversarial networks for semi-supervised learning (improving the performance of a supervised task, in this case, classification, by learning on additional unlabeled examples) [learn from unlabeled data].")
one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: (see above)
receiving a plurality of training cases; (Salimans, 3.2 Minibatch discrimination, "... The task of the discriminator is thus effectively still to classify single examples as real data or generated data …6 Experiments, "We performed semi-supervised experiments on MNIST, CIFAR-10 and SVHN, and sample generation experiments on MNIST, CIFAR-10, SVHN and ImageNet."; examples from the generator (generated data) and real data, or images from MNIST, CIFAR-10, SVHN are also examples of training cases.)
training the generative adversarial network, based on the plurality of training cases, to classify the training cases as real or fake; and (Salimans, section 1 "The training signal for G is provided by a discriminator network D(x) that is trained to distinguish samples from the generator distribution pmodel(x) from real data"; 3.2 Minibatch discrimination, "The task of the discriminator is thus effectively still to classify single examples as real data [real data] or generated data [fake data]…")
executing a regularizer to configure the discriminator to allocate a model capacity evenly; (Salimans, 3.2 Minibatch discrimination, "One of the main failure modes for GAN is for the generator to collapse to a parameter setting where it always emits the same point. When collapse to a single mode is imminent, the gradient of the discriminator may point in similar directions for many similar points... An obvious strategy to avoid this type of failure is to allow the discriminator to look at multiple data examples in combination, and perform what we call minibatch discrimination [regularization]... The concept of minibatch discrimination is quite general: any discriminator model that looks at multiple examples in combination, rather than in isolation... Let f(x_i) ∈ R^A denote a vector of features for input x_i, produced by some intermediate layer in the discriminator... The output o(x_i) for this minibatch layer for a sample x_i is then defined as the sum of the c_b(x_i, x_j)’s to all other samples... "; batch discrimination is a regularizer that allocates the discriminator capacity evenly. Model capacity depends on the parameter setting, and batch discrimination avoids collapse to a same-point parameter setting, i.e., it helps allocate model capacity evenly; see related references Berthelot ("a heuristic regularizer") and Arora ("discriminator capacity").)
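For illustration only, the minibatch layer output o(x_i) quoted above can be sketched as follows. The passage quoted in this action does not define c_b; the negative-exponential L1 similarity used here is an assumption based on the cited paper as best recalled, and the function name `minibatch_features` and the input layout `M[i][b]` are hypothetical.

```python
import math

def minibatch_features(M):
    """Sketch of o(x_i)_b = sum over j != i of c_b(x_i, x_j).

    M[i][b] is a list of floats: row b of a per-sample matrix M_i derived
    from the feature vector f(x_i). The similarity
    c_b(x_i, x_j) = exp(-L1 distance between M[i][b] and M[j][b])
    is an assumption; it is not quoted in the action.
    """
    n = len(M)
    num_rows = len(M[0])
    o = [[0.0] * num_rows for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # sum runs over all OTHER samples
            for b in range(num_rows):
                l1 = sum(abs(p - q) for p, q in zip(M[i][b], M[j][b]))
                o[i][b] += math.exp(-l1)
    return o

# Two identical samples and one distant sample (made-up values).
M = [[[0.0, 0.0]], [[0.0, 0.0]], [[5.0, 5.0]]]
o = minibatch_features(M)
# The identical samples receive a larger closeness score than the outlier.
```

Under these assumptions, samples that sit close together in feature space receive large values of o, which is the signal the discriminator can use to detect a collapsing generator.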

Salimans does not teach, but Ioffe teaches: wherein the discriminator is a rectifier network having an activation function defined as: f(x) = x+ = max(0,x), where x is input to a neuron of the rectifier network; (Ioffe, Section 1 "In practice, the saturation problem and the resulting vanishing gradients are usually addressed by using Rectified Linear Units... ReLU(x) = max(x, 0)..."; section 3.2 "Batch Normalization can be applied to any set of activations in the network. Here, we focus on transforms that consist of an affine transformation followed by an elementwise nonlinearity: z = g(Wu + b) where W and b are learned parameters of the model, and g(.) is the nonlinearity such as sigmoid or ReLU.")

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the discriminator of Salimans to include ReLU of Ioffe in the model. Doing so would solve the saturation problem and the resulting vanishing gradients. (Ioffe, Section 1 "In practice, the saturation problem and the resulting vanishing gradients are usually addressed by using Rectified Linear Units... ReLU(x) = max(x, 0)")
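For illustration only, the rectifier f(x) = x+ = max(0, x) recited in the claim and quoted from Ioffe can be written as below. The helper names `relu` and `relu_grad` are hypothetical, and the slope chosen at exactly x = 0 is a convention assumed here.

```python
def relu(x: float) -> float:
    """Rectifier activation f(x) = x+ = max(0, x)."""
    return max(0.0, x)

def relu_grad(x: float) -> float:
    """Slope of the rectifier: 0 on the inactive piece, 1 on the active piece.

    The value returned at x == 0 is a convention (assumed to be 0 here).
    """
    return 1.0 if x > 0 else 0.0

inputs = (-2.0, -0.5, 0.0, 0.5, 2.0)
values = [relu(x) for x in inputs]       # [0.0, 0.0, 0.0, 0.5, 2.0]
slopes = [relu_grad(x) for x in inputs]  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

Because the slope is a nonzero constant for positive inputs, the unit does not saturate there, which is the property Ioffe relies on for the vanishing-gradient problem.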

Salimans and Ioffe do not teach, but Raghu teaches: wherein the regularizer is configured to encourage each piecewise linear region of the discriminator to contain as few data points as possible so that data points represented by x lie in different regions and that ∇xD(x) is diverse, where D(x) is represented by x+=max(0, x). (Raghu, p. 2 “… the notion of a ‘linear region’ is introduced. Given a neural network with piecewise linear activations (such as ReLU [max(0, x)] or hard tanh), the function it computes is also piecewise linear, a consequence of the fact that composing piecewise linear functions results in a piecewise linear function. So one way to measure the ‘expressive power’ of different architectures A is to count the number of linear pieces (regions), which determines how nonlinear the function is.”; p. 4 "Figure 1. Deep networks with piecewise linear activations subdivide input space into convex polytopes. We take a three hidden layer ReLU network, with input x... , and four units in each layer... This final set of convex polytopes corresponds to all activation patterns for this network (with its current set of weights) over the unit square, with each polytope representing a different linear function."; The different local linear regions are closely related to the different activation patterns. In other words, two input points that produce different activation patterns on all layers of a model are guaranteed to lie in different regions; therefore each linear region will contain as few data points as possible, so that the data is diverse, lying in different regions, and so is the gradient.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the discriminator of Salimans to include ReLU of Ioffe and Raghu in the model. Doing so would yield different linear regions because of the different activation patterns on all layers. (Raghu, p. 2 “… the notion of a ‘linear region’ is introduced. Given a neural network with piecewise linear activations (such as ReLU or hard tanh), the function it computes is also piecewise linear, a consequence of the fact that composing piecewise linear functions results in a piecewise linear function. So one way to measure the ‘expressive power’ of different architectures A is to count the number of linear pieces (regions), which determines how nonlinear the function is.”)
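For illustration only, the relationship between activation patterns and linear regions described in the Raghu passages can be sketched as follows. The network shape (two inputs, three hidden layers of four units) mirrors the quoted Figure 1 caption; the random weights, the seed, the sample count and the helper names `activation_pattern` and `random_layer` are assumptions made for this sketch.

```python
import random

def activation_pattern(x, layers):
    """Return the on/off state of every ReLU unit for input x.

    layers is a list of (W, b) pairs, W being a list of weight rows.
    Inputs with different patterns lie in different linear regions.
    """
    pattern = []
    h = x
    for W, b in layers:
        pre = [sum(w * v for w, v in zip(row, h)) + bias
               for row, bias in zip(W, b)]
        pattern.extend(1 if p > 0 else 0 for p in pre)
        h = [max(0.0, p) for p in pre]  # ReLU feeds the next layer
    return tuple(pattern)

rng = random.Random(0)  # fixed seed, chosen arbitrarily

def random_layer(n_in, n_out):
    """Hypothetical helper: Gaussian weights and biases."""
    W = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.gauss(0, 1) for _ in range(n_out)]
    return W, b

layers = [random_layer(2, 4), random_layer(4, 4), random_layer(4, 4)]
points = [[rng.random(), rng.random()] for _ in range(200)]  # unit square
patterns = [activation_pattern(p, layers) for p in points]

# Number of distinct linear regions that contain at least one sample.
num_regions_hit = len(set(patterns))
```

A larger `num_regions_hit` for a fixed number of samples means fewer data points share a region, which is the sense in which the data, and the gradient, are described as diverse.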

Claim 11 recites substantially the same limitations as claim 1; therefore the rejection applied to claim 1 also applies to claim 11.

In regard to claims 2 and 12, reference is made to the rejection of claims 1 and 11 respectively, Salimans teaches: wherein at least one of the generator and the discriminator is a neural network. (Salimans, section 1 "The goal of GANs is to train a generator network G(z; θ(G)) that produces samples from the data distribution, pdata(x), by transforming vectors of noise z as x = G(z; θ(G)). The training signal for G is provided by a discriminator network D(x)…"; in practice generator and discriminator are often neural network models, also see related reference Arora (“where Gu is a function — which is often a neural network in practice)... Suppose the generator and discriminator are both k-layer neural networks”))

In regard to claims 4 and 14, reference is made to the rejection of claims 2 and 12 respectively, and further, Salimans does not teach, but Ioffe teaches: wherein the discriminator is configured to compute a piecewise linear function. (Ioffe, Section 1 "In practice, the saturation problem and the resulting vanishing gradients are usually addressed by using Rectified Linear Units... ReLU(x) = max(x, 0)..."; ReLU and Maxout are examples of piecewise linear functions, e.g., ReLU consists of the two linear pieces y = 0 and y = x.)

The rationale for combining the teachings of Salimans and Ioffe is the same as set forth in the rejection of claims 1 and 11 respectively.

In regard to claims 7 and 17, reference is made to the rejection of claims 1 and 11 respectively, Salimans teaches: wherein the plurality of training cases transmitted to the discriminator comprise real data and fake data. (Salimans, section 1 "The training signal for G is provided by a discriminator network D(x) that is trained to distinguish samples from the generator distribution pmodel(x) from real data"; 3.2 Minibatch discrimination, "The task of the discriminator is thus effectively still to classify single examples as real data [real data] or generated data [fake data]…")

In regard to claims 8 and 18, reference is made to the rejection of claims 7 and 17 respectively, Salimans teaches: wherein the plurality of training cases transmitted to the discriminator comprise interloped real and fake data. (Salimans, section 5 Semi-supervised learning "Assuming half of our data set consists of real data and half of it is generated (this is arbitrary), our loss function for training the classifier then becomes…")

In regard to claims 9 and 19, reference is made to the rejection of claims 8 and 18 respectively, and further, Salimans does not teach, but Ioffe teaches: wherein the regularizer is applied to immediate pre-nonlinearity activities on one or more layers of the discriminator model. (Ioffe, section 3.2 "Batch Normalization can be applied to any set of activations in the network. Here, we focus on transforms that consist of an affine transformation followed by an elementwise nonlinearity: z = g(Wu + b) where W and b are learned parameters of the model, and g(.) is the nonlinearity such as sigmoid or ReLU. This formulation covers both fully-connected and convolutional layers. We add the BN transform immediately before the nonlinearity, by normalizing x = Wu + b."; batch normalization / regularizer is right before ReLU.)
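For illustration only, the placement quoted from Ioffe, where the BN transform is added immediately before the nonlinearity, can be sketched as below. The function name `batch_norm_then_relu`, the default `eps`, and the fixed scale and shift values are assumptions; in Ioffe the scale and shift are learned parameters.

```python
import math

def batch_norm_then_relu(pre, eps=1e-5, gamma=1.0, beta=0.0):
    """Normalize pre-nonlinearity activities per unit, then apply ReLU.

    pre[n][k] is the pre-nonlinearity activity x = Wu + b of hidden unit k
    for mini-batch sample n. gamma and beta stand in for the learned scale
    and shift (held fixed here as a simplifying assumption).
    """
    n = len(pre)
    d = len(pre[0])
    out = [[0.0] * d for _ in range(n)]
    for k in range(d):
        col = [row[k] for row in pre]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n  # biased batch variance
        for i in range(n):
            x_hat = (col[i] - mean) / math.sqrt(var + eps)
            out[i][k] = max(0.0, gamma * x_hat + beta)  # ReLU after BN
    return out

pre_activations = [[1.0, -2.0], [3.0, 2.0]]  # made-up mini-batch of 2 samples
z = batch_norm_then_relu(pre_activations)
```

The point of the sketch is only the ordering: the transform operates on x = Wu + b, and the nonlinearity g is applied afterwards.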

The rationale for combining the teachings of Salimans and Ioffe is the same as set forth in the rejection of claims 1 and 11 respectively.

In regard to claims 10 and 20, Salimans teaches: wherein the regularizer is applied on generated data and random interpolation inbetween real and generated fake data. (Salimans, section 3.2 "The concept of minibatch discrimination… we have restricted our experiments to models that explicitly aim to identify generator samples that are particularly close together… The output o(x_i) for this minibatch layer for a sample x_i is then defined as the sum of the c_b(x_i, x_j)’s to all other samples... We compute these minibatch features separately for samples from the generator and from the training data."; section 5 Semi-supervised learning "Assuming half of our data set consists of real data and half of it is generated (this is arbitrary)..."; Because, in GAN training, a mix of real data and fake data [arbitrary / random] is provided to the discriminator, the minibatch discrimination [regularizer] is applied on the fake data and on random interpolation in between real and fake data.)
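For illustration only, one common reading of "random interpolation" between real and generated data is a random convex combination of a real sample and a fake sample. Neither the claim language quoted in this action nor the cited passages define the operation, so the formula, the uniform distribution of the mixing weight, and the name `random_interpolates` are all assumptions.

```python
import random

def random_interpolates(real, fake, rng):
    """Return x_hat = eps * x_real + (1 - eps) * x_fake for each pair.

    eps is drawn uniformly from [0, 1) per pair (assumed distribution).
    real and fake are equally long lists of equally sized float vectors.
    """
    out = []
    for r, f in zip(real, fake):
        eps = rng.random()
        out.append([eps * a + (1.0 - eps) * b for a, b in zip(r, f)])
    return out

rng_interp = random.Random(0)    # arbitrary seed
real_batch = [[0.0, 0.0]]        # made-up real sample
fake_batch = [[1.0, 2.0]]        # made-up generated sample
mixed = random_interpolates(real_batch, fake_batch, rng_interp)
# Each interpolate lies on the segment between its real and fake endpoints.
```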

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Salimans in view of Ioffe in view of Raghu in view of Zhang ("StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks").

In regard to claim 21, Salimans teaches: An electronic device comprising: one or more processors; memory having stored thereon one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: (Salimans, 6 Experiments, 6.4 ImageNet "We extensively modified a publicly available implementation of DCGANs using TensorFlow [28] to achieve high performance, using a multi-GPU implementation."; Salimans indicates that they implement their method using TensorFlow on a computer, where processors, memory and programs are inherent.)
… trained using a regularizer to configure a discriminator to evenly use its model capacity; and (Salimans, 3.2 Minibatch discrimination, "One of the main failure modes for GAN is for the generator to collapse to a parameter setting where it always emits the same point. When collapse to a single mode is imminent, the gradient of the discriminator may point in similar directions for many similar points... An obvious strategy to avoid this type of failure is to allow the discriminator to look at multiple data examples in combination, and perform what we call minibatch discrimination [regularization]... The concept of minibatch discrimination is quite general: any discriminator model that looks at multiple examples in combination, rather than in isolation... Let f(x_i) ∈ R^A denote a vector of features for input x_i, produced by some intermediate layer in the discriminator... The output o(x_i) for this minibatch layer for a sample x_i is then defined as the sum of the c_b(x_i, x_j)’s to all other samples... "; batch discrimination is a regularizer that allocates the discriminator capacity evenly. Model capacity depends on the parameter setting, and batch discrimination avoids collapse to a same-point parameter setting, i.e., it helps allocate model capacity evenly; see related references Berthelot ("a heuristic regularizer") and Arora ("discriminator capacity").)

Salimans does not teach, but Zhang teaches: receiving a text string; (Zhang, abstract "In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate 256 x 256 photo-realistic images conditioned on text descriptions."; section 3 see Figure 2 "Figure 2. The architecture of the proposed StackGAN. The Stage-I generator draws a low-resolution image by sketching rough shape and basic colors of the object from the given text [receiving a text string] and painting the background from a random noise vector. Conditioned on Stage-I results, the Stage-II generator corrects defects and adds compelling details into Stage-I results, yielding a more realistic high-resolution image.")
processing the text string using a generative adversarial network... (Zhang, section 3 see Figure 2 "Figure 2. The architecture of the proposed StackGAN [using a GAN]. The Stage-I generator draws a low-resolution image by sketching rough shape and basic colors of the object from the given text [text string] and painting the background from a random noise vector.")
generating an image based on the processed text string. (Zhang, abstract "In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate 256 x 256 photo-realistic images conditioned on text descriptions."; section 3 see Figure 2 "Figure 2... the Stage-II generator corrects defects and adds compelling details into Stage-I results, yielding a more realistic high-resolution image.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have applied the GAN model of Salimans to the applications of Zhang. Doing so would allow the model to generate photo-realistic images from text descriptions. (Zhang, abstract "Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications... Extensive experiments and comparisons with state-of-the-arts on benchmark datasets demonstrate that the proposed method achieves significant improvements on generating photo-realistic images conditioned on text descriptions.")

Salimans and Zhang do not teach, but Ioffe teaches: wherein the discriminator is a rectifier network having an activation function defined as: f(x) = x+ = max(0,x), where x is input to a neuron of the rectifier network; (Ioffe, Section 1 "In practice, the saturation problem and the resulting vanishing gradients are usually addressed by using Rectified Linear Units... ReLU(x) = max(x, 0)..."; section 3.2 "Batch Normalization can be applied to any set of activations in the network. Here, we focus on transforms that consist of an affine transformation followed by an elementwise nonlinearity: z = g(Wu + b) where W and b are learned parameters of the model, and g(.) is the nonlinearity such as sigmoid or ReLU.")

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the discriminator of Salimans and Zhang to include ReLU of Ioffe in the model. Doing so would solve the saturation problem and the resulting vanishing gradients. (Ioffe, Section 1 "In practice, the saturation problem and the resulting vanishing gradients are usually addressed by using Rectified Linear Units... ReLU(x) = max(x, 0)")

Salimans, Zhang and Ioffe do not teach, but Raghu teaches: wherein the regularizer is configured to encourage each piecewise linear region of the discriminator to contain as few data points as possible so that data points represented by x lie in different regions and that ∇xD(x) is diverse, where D(x) is represented by x+=max(0, x). (Raghu, p. 2 “… the notion of a ‘linear region’ is introduced. Given a neural network with piecewise linear activations (such as ReLU [max(0, x)] or hard tanh), the function it computes is also piecewise linear, a consequence of the fact that composing piecewise linear functions results in a piecewise linear function. So one way to measure the ‘expressive power’ of different architectures A is to count the number of linear pieces (regions), which determines how nonlinear the function is.”; p. 4 "Figure 1. Deep networks with piecewise linear activations subdivide input space into convex polytopes. We take a three hidden layer ReLU network, with input x... , and four units in each layer... This final set of convex polytopes corresponds to all activation patterns for this network (with its current set of weights) over the unit square, with each polytope representing a different linear function."; The different local linear regions are closely related to the different activation patterns. In other words, two input points that produce different activation patterns on all layers of a model are guaranteed to lie in different regions; therefore each linear region will contain as few data points as possible, so that the data is diverse, lying in different regions, and so is the gradient.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the discriminator of Salimans and Zhang to include ReLU of Ioffe and Raghu in the model. Doing so would yield different linear regions because of the different activation patterns on all layers. (Raghu, p. 2 “… the notion of a ‘linear region’ is introduced. Given a neural network with piecewise linear activations (such as ReLU or hard tanh), the function it computes is also piecewise linear, a consequence of the fact that composing piecewise linear functions results in a piecewise linear function. So one way to measure the ‘expressive power’ of different architectures A is to count the number of linear pieces (regions), which determines how nonlinear the function is.”)

Allowable Subject Matter
Claims 5 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The closest prior art references for claims 5 and 15 are Courbariaux ("Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1") and Zhao ("Energy-based Generative Adversarial Network"). Courbariaux teaches batch normalization and activations constrained to ±1, but does not teach the concept of an average, over the hidden units, of the square of the activation functions across the mini-batch. Zhao teaches a repelling regularizer that may be related to the second term, but does not teach the number of hidden units and |s_i^T s_j|.
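For illustration only, the quantities named in this action for claims 5 and 15 (the average of the sign function on the kth hidden unit, an average over all pairs, the number of hidden units d, and |s_i^T s_j|) could be combined as sketched below. The exact formula of the claims is not reproduced in this action, so the way the two terms are formed and normalized, the treatment of a zero activation as +1, and the names `sign` and `regularizer_terms` are all assumptions.

```python
def sign(v: float) -> float:
    """Sign function; mapping 0 to +1 is an assumed convention."""
    return 1.0 if v >= 0 else -1.0

def regularizer_terms(acts):
    """Return (first_term, second_term) for one layer and one mini-batch.

    acts[i][k] is the pre-nonlinearity activity of hidden unit k for
    sample i. d is the number of hidden units in the layer.
    first_term:  average over units of the squared mini-batch mean sign.
    second_term: average over all sample pairs of |s_i^T s_j| / d.
    Both definitions are an assumed reading, not the claimed formula.
    """
    n = len(acts)
    d = len(acts[0])
    s = [[sign(v) for v in row] for row in acts]
    s_bar = [sum(s[i][k] for i in range(n)) / n for k in range(d)]
    first_term = sum(m * m for m in s_bar) / d
    pair_vals = [abs(sum(a * b for a, b in zip(s[i], s[j]))) / d
                 for i in range(n) for j in range(i + 1, n)]
    second_term = sum(pair_vals) / len(pair_vals)
    return first_term, second_term

# Made-up activations: two samples, two hidden units.
first, second = regularizer_terms([[1.0, -1.0], [1.0, 1.0]])
# first == 0.5 (unit 0 is always on), second == 0.0 (orthogonal patterns)
```

Under this reading, both terms are small when each unit is active on about half of the mini-batch and the sign patterns of different samples are close to orthogonal.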

Response to Arguments
Applicant's amendments with respect to rejection of claims under 35 U.S.C. 112(b) have been fully considered and are sufficient to overcome the rejection. The rejection to the claims under 35 U.S.C. 112(b) has been withdrawn.

Applicant's arguments have been fully considered but they are not persuasive:
Applicant argues: (see p. 10 bottom): “Claim 1 has been amended to incorporate the subject matter of claim 3 and allowable claim 6. Claims 3 and 6 have been cancelled. Claim 11 has been amended to incorporate the subject matter of claim 13 and allowable claim 16. Claims 13 and 16 have been cancelled. Claim 21 has been amended to incorporate the allowable subject matter of claim 16.” 

Examiner answers: claims 5 and 15 are allowable subject matter, and claims 6 and 16 are also put under allowable subject matter because they depend on claims 5 and 15 respectively. The features in canceled claims 6 and 16 are not allowable.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Berthelot ("BEGAN: Boundary Equilibrium Generative Adversarial Networks") teaches that a heuristic regularizer can be batch discrimination or a repelling regularizer.
Arora ("Generalization and Equilibrium in Generative Adversarial Nets (GANs)") teaches the concept of model capacity.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG whose telephone number is (408)918-7519.  The examiner can normally be reached on Monday - Thursday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571)272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/S.C./Examiner, Art Unit 2122                 

/BRIAN M SMITH/Primary Examiner, Art Unit 2122