DETAILED ACTION
This Office Action is in response to Applicant's Communication received on 04/29/2020 for application number 16/861,273.  
Claims 1-7 are presented for examination.  Claims 1, 6, and 7 are independent claims.   

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d).  A certified copy has been received for foreign priority Application No. JP2019-095468, filed on 05/21/2019. 

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 04/29/2020 and 11/05/2020 have been considered by the Examiner.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: Information processing apparatus, non-transitory computer readable storage medium for storing information processing program, and information processing method for learning using deep neural networks.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3, and 5-7 are rejected under 35 U.S.C. 103 as being unpatentable over Roychowdhury et al. (US 2017/0206457 A1 hereinafter Roychowdhury) in view of Hu (US 2022/0036548 A1).

Regarding Claim 1, Roychowdhury teaches an information processing apparatus ([0053] computing device 602 that is representative of one or more computing systems and/or devices that may implement the various techniques) comprising: 
a memory ([0054] computing device includes one or more computer-readable media;  [0056] the computer-readable storage media including memory/storage 612); and  
a processor coupled to the memory ([0054] computing device includes a processing system, one or more computer-readable media; [0056] the computer-readable storage media including memory/storage 612.  See fig. 6), the processor being configured to execute a generation processing that includes generating a first mini-batch and processing to generate a second mini-batch ([0055] the processing system performs one or more operations using hardware; [0038] stochastic gradient descent (SGD) works by randomly selecting a “batch"/subset of data to work with in any given iteration, and performs optimization of the model parameters using that subset of data; [0039] in the following iterations, this technique selects other subsets and repeats the process; [0043] the iterative machine learning system starts by selecting the first mini-batch of this SGD sub-dataset using the batch selection module; in successive iterations, the batch selection module successively selects batches (i.e., a second mini-batch) from the SGD sub-dataset), and  
execute a learning processing by using a neural network, the learning processing being configured to perform first learning by using the first mini-batch, and then perform second learning by using the second mini-batch ([0043] the iterative machine learning system starts by selecting the first mini-batch of this SGD sub-dataset using the batch selection module; in successive iterations, the batch selection module successively selects batches from the SGD sub-dataset; [0050]-[0051] model is trained using machine learning, such as through stochastic gradient descent; iterative selections are made of a batch 222 from the sampled training data by a batch selection module; the iteratively selected batches are iteratively processed using machine learning to train the model, such as through use of a neural network - thus, performing first learning by using the first mini-batch and second learning by using the second mini-batch).  
However, Roychowdhury fails to expressly teach generating the first mini-batch by performing data extension processing on learning data and generating the second mini-batch without performing the data extension processing on the learning data.
In the same field of endeavor, Hu teaches generating a first mini-batch by performing data extension processing on learning data and generating a second mini-batch without performing the data extension processing on the learning data ([0063] an augmentation technique is provided to capture differences in colors and/or color saturation between different subsets of training data and to augment one or more subsets of training data with the captured differences to obtain an augmented subset of training data; the deep learning neural network is trained using the original subsets of training data (i.e., a mini-batch generated without performing the data extension processing on the learning data) and the augmented subset of training data (i.e., a mini-batch generated by performing data extension processing on the learning data)).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Hu's teaching of generating the first mini-batch by performing data extension processing on the learning data and generating the second mini-batch without performing the data extension processing on the learning data into Roychowdhury.  Doing so would be desirable because such data augmentation can improve performance of the model for images input for inference from various sources or studies (Hu [0066]) and increase the accuracy of the deep neural network architecture (Hu [0042]). 

As to dependent Claim 3, Roychowdhury and Hu teach all the limitations of Claim 1.  Hu further teaches that the learning processing is configured to perform the second learning when the first learning reaches a learning rate determined in advance ([0096] model training was performed with a step size of 3000 for 30 epochs using a batch size of 1; the Adam optimizer was used (learning rate = 1×10⁻⁵) for optimization - thus, the second learning is performed when the first learning reaches a learning rate of 1×10⁻⁵).  

As to dependent Claim 5, Roychowdhury and Hu teach all the limitations of Claim 1.  Hu further teaches executing an inference processing that includes performing inference processing by inputting inference data into a learning result obtained in the learning processing ([0097] the trained model was applied to the cropped ovary images in Study 2, 3, 4 and 5 (i.e., learning result obtained in the learning processing); during the inference, the cropped ovary images were first resized using the same methods as described above and then fed to the trained model (i.e., inputting inference data into the learning result)).  Roychowdhury further teaches executing an update processing when the inference data is inputted, the update processing being configured to calculate an inference parameter by using the inference data and to set the inference parameter into the inference processing ([0038]-[0039] stochastic gradient descent (SGD) works by randomly selecting a “batch”/subset of data to work with in any given iteration, and performs optimization of the model parameters using that subset of data (i.e., calculate an inference parameter); the stochastic gradient is then used to update parameters of each of the layers in the model in such a way that the error is minimized (i.e., execute update processing)).  

Claim 6 is a medium claim that corresponds to apparatus claim 1 above and is therefore rejected for the same reasons.  Roychowdhury further teaches a non-transitory computer-readable storage medium for storing a program which causes a processor to perform processing ([0054]-[0055] computing device includes one or more computer-readable media; the processing system performs one or more operations using hardware; [0059] the described modules and techniques may be stored on some form of computer-readable media).

Claim 7 is a method claim that corresponds to apparatus claim 1 above and is therefore rejected for the same reasons.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Roychowdhury in view of Hu, further in view of Chen (US 2011/0055131 A1).

As to dependent Claim 2, Roychowdhury and Hu teach all the limitations of Claim 1.  Roychowdhury and Hu fail to expressly teach that the learning processing is configured to perform the second learning when the first learning reaches an epoch number determined in advance.  However, Roychowdhury teaches that the batch-size for the mini-batch can be specified as a predefined parameter; in successive iterations of the second phase, assuming the batch-size is less than the size of the subsampled data, the batch selection module successively selects batches from the SGD sub-dataset to send to the classifier module; if the batch-size of the batch is set to the same size as the sample, then there is a single iteration in the second phase in which each of the data points in the current SGD sub-dataset is used in computation of the stochastic gradient; once an end of the sample (i.e., the SGD sub-dataset) is reached, the iterative machine learning system reverts to the first phase and draws another sample from the full set of training data, again using the specified subsampling (see [0043]-[0044]).  This implies that if the batch size is the same as the sample size, the second learning is performed after one iteration/epoch of the first learning.  Further, Hu teaches that model training was performed with a step size of 3000 for 30 epochs using a batch size of 1 (see [0096]).  
Alternatively, Chen teaches performing the second learning when the first learning reaches an epoch number determined in advance ([0051] partitioning the whole data; nine subsets are generated; nine predetermined groups include subsets for training; one group is trained, and training shifts to the next group after a predetermined number of iterations).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Chen's teaching of performing the second learning when the first learning reaches an epoch number determined in advance into Roychowdhury and Hu.  Doing so would be desirable because it would reduce the time needed for training (Chen [0051]), thereby improving efficiency. 

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Roychowdhury in view of Hu, further in view of Li (US 2021/0174066 A1).

As to dependent Claim 4, Roychowdhury and Hu teach all the limitations of Claim 1.  However, Roychowdhury and Hu fail to expressly teach that the learning processing is configured to perform the first learning and then perform the second learning in a state where a weight parameter of each network layer included in the learning processing is fixed.  
In the same field of endeavor, Li teaches that the learning processing is configured to perform the first learning and then perform the second learning in a state where a weight parameter of each network layer included in the learning processing is fixed ([0042] the gender model and the age model share at least two convolution layers, and the gender model is trained to be converged (i.e., first learning); and the age model is trained (i.e., second learning), where the weights of the at least two convolution layers described above remain unchanged in the age model training process; [0070] after the gender model converges, the weights of conv1, conv2, conv3 and conv4 in the gender model converged after the first training may be fixed; then, the above face image samples labeled with the age information are input to the age model to train the conv5, FC3, FC4 and softmax2 in the age model to converge the age model).  
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Li's teaching of performing the first learning and then performing the second learning in a state where a weight parameter of each network layer included in the learning processing is fixed into Roychowdhury and Hu.  Doing so would be desirable because it would improve the accuracy of the estimation model (Li [0042]). 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Applicant is required under 37 CFR § 1.111(c) to consider these references fully when responding to this action.  
Hara et al. (US 2018/0012124 A1) teaches that the processing unit may train the neural network 150 by the training module 140 based on the original and additional training data that is obtained for the target set of the chemical compounds; parameters of the neural network 150, which may include weights between units and biases of each unit, are optimized by an appropriate training algorithm; the one or more additional training data are augmented by the augmenting module 122 for each chemical compound in the target set; and the training data stored in the training data store 130 may include original training data merely converted by the converting module 120 and the one or more additional training data augmented by the augmenting module 122 for each chemical compound in the target set (see [0063], [0033]). 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to REJI KARTHOLY whose telephone number is (571)272-3432.  The examiner can normally be reached on Monday - Thursday 7:30 am - 3:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch can be reached on 571-272-7212.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/R.K./Examiner, Art Unit 2143      

/JENNIFER N WELCH/           Supervisory Patent Examiner, Art Unit 2143