DETAILED ACTION
This Office Action is with regard to the most recent papers filed 9/19/2022.

Response to Arguments
Applicant’s arguments filed 9/19/2022 have been fully considered, but are generally deemed not persuasive.
First, with regard to the rejection under 35 USC 112, Applicant has provided “circuitry configured to implement” the various units, providing structure in the form of circuitry in the claim for the units, thus rendering the instant rejection moot.
With regard to the rejection of the instant claims under 35 USC 103, Applicant argues that Singh is silent with regard to each of the instant steps, and that Bosworth fails to cure the deficiencies of Singh.  It is noted that Applicant’s remarks do not specifically address the rationale of the rejection, including the specific mappings and the explanation provided for combining Bosworth with Singh.  Rather, Applicant appears to be broadly asserting that Singh, in isolation from Bosworth, fails to teach all of the claim steps, and that Bosworth is not the same as the language of the subsampling unit in its entirety.  It is noted that one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
As currently presented, the instant claims appear to provide for two buffering units (first and second), where information that is to continue to be processed (learning data determined to be used) is placed in the first buffer, and information that is not to be further processed (learning data determined to not be used) is placed in the second buffer.  Without details of the buffers, any processing of information in the second buffer, etc., the buffer arrangement appears to be equivalent to a hierarchical memory arrangement, such as provided in Bosworth, but with a specific application in learning by gradient boosting, as in Singh.  Specifically, in Bosworth, most active data is placed in a fastest memory (first buffer) and less active data is placed in a slower memory (second buffer), thus allowing the data to be all provided in memory, with the higher speed but lower capacity memory providing its advantage with regard to active processing while not being wasted on data that is not being processed, while higher capacity but lower speed memory expands the memory space for the data while not slowing down the processing of the data.  Applicant’s broad assertions do not appear to specifically address the application of a hierarchy with the tasks of learning by gradient boosting, as in the combination of Singh and Bosworth.
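For illustration only, the two-buffer arrangement as characterized above can be sketched in Python as follows.  This is a minimal sketch under stated assumptions and is not taken from the instant application, Singh, or Bosworth: the function name route_samples, the parameter names, the random selection rule, and the use of Python lists as stand-ins for buffers and memory regions are all hypothetical.

```python
import random


def route_samples(samples, subsampling_rate, block_size, seed=0):
    """Route each sample to a 'used' or 'not used' buffer (hypothetical sketch).

    Each buffer is written out as one block once it holds block_size entries.
    The two region lists are stand-ins for areas of a data memory.
    """
    rng = random.Random(seed)
    used_buffer, unused_buffer = [], []
    used_region, unused_region = [], []

    def flush(buffer, region):
        # One block write: copy the buffered entries out, then empty the buffer.
        region.append(list(buffer))
        buffer.clear()

    for sample in samples:
        # Assumed selection rule: keep a sample with probability subsampling_rate.
        if rng.random() < subsampling_rate:
            used_buffer.append(sample)
            if len(used_buffer) == block_size:
                flush(used_buffer, used_region)
        else:
            unused_buffer.append(sample)
            if len(unused_buffer) == block_size:
                flush(unused_buffer, unused_region)

    # Write out any partially filled blocks left at the end.
    if used_buffer:
        flush(used_buffer, used_region)
    if unused_buffer:
        flush(unused_buffer, unused_region)
    return used_region, unused_region


# Example: ten (learning data, gradient) pairs, half kept on average.
pairs = [(i, 0.1 * i) for i in range(10)]
used, unused = route_samples(pairs, subsampling_rate=0.5, block_size=4)
```

In this sketch nothing further is done with the contents of the second buffer, which mirrors the observation above that the claims do not recite any processing of that information.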
Thus, after careful consideration of Applicant’s remarks, the rejection of the instant claims has been maintained.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Harsheep Singh in “Understanding Gradient Boosting,” posted at <https://towardsdatascience.com/understanding-gradient-boosting-machines-9be756fe76ab> on 11/3/2018 (Singh) in view of “The Memory Hierarchy,” as posted at <http://www.edwardbosworth.com/CPSC2105/MyTextbook2105_HTM/MyText2105_Ch12_V06.htm> on 2/26/2018 (Bosworth).
With regard to claim 1, Singh discloses a learning device configured to perform learning by gradient boosting, the learning device comprising: 
a data memory configured to store learning data for learning a model by the gradient boosting and gradient information corresponding to the learning data (Singh: Pages 2-3 (Note, the explanation for how Singh teaches the cited language for these steps below)); and
circuitry configured to implement 
a learning unit configured to learn the model using the learning data stored in the data memory (Singh: Pages 2-3); 
an update unit configured to update gradient information of each learning data based on the model learned by the learning unit (Singh: Pages 2-3); 
a subsampling unit configured to determine whether to use the learning data corresponding to the gradient information updated by the update unit, for learning of a next model after learning of one model based on a subsampling rate as a ratio of learning data used for learning of the model (Singh: Pages 2-3); and
a volume determined in advance (Singh: Page 3.  The instant claim appears to provide for a typical Gradient Boost algorithm with a set limit (volume in advance), where Singh provides a Gradient Boost algorithm with a maximum number of iterations.  The instant claim proceeds to recite first and second buffers and transfers between them, which appears to provide for memory management functions, which is addressed below with the use of Bosworth.).
Singh fails to disclose, but Bosworth teaches:
that the subsampling unit is to divide the learning data into learning data determined to be used for learning of the next model and learning data determined not to be used for learning of the next model (Bosworth: Page 17.  With the different tiers, data would be placed in a tier in accordance with the actively processed data (used in the next model) in a fastest tier and data that is not actively processed (not used in a next model) in a slower tier.);
a first buffer unit configured to buffer the learning data determined to be used for learning of the next model by the subsampling unit and gradient information corresponding to the learning data up to a volume determined in advance (Bosworth: Page 17, Virtual Memory in Practice and corresponding Figure.  In virtual memory systems, different tiers are provided where a second memory (e.g. hard disk drive), main memory (e.g. DRAM), and Cache (e.g. SRAM) are all coordinated to have the most actively processed data in the cache (First unit), the next data provided for in the main memory (Second unit), and the least used data in the second memory.); 
a second buffer unit configured to buffer learning data determined not to be used for learning of the next model by the subsampling unit and gradient information corresponding to the learning data up to a volume determined in advance (Bosworth: Page 17, Virtual Memory in Practice and corresponding Figure.  In virtual memory systems, different tiers are provided where a second memory (e.g. hard disk drive), main memory (e.g. DRAM), and Cache (e.g. SRAM) are all coordinated to have the most actively processed data in the cache (First unit), the next data provided for in the main memory (Second unit), and the least used data in the second memory.), wherein
the subsampling unit is configured to output the learning data determined to be used for learning of the next model to the first buffer unit and, in parallel, to output the learning data determined not to be used for learning of the next model to the second buffer unit (Bosworth: Page 17.  The data is stored in the different tiers at the same time (in parallel).), and
the first buffer unit and the second buffer unit are configured to write the learning data and the gradient information into the data memory for each predetermined block when buffering the learning data and the gradient information up to the volume determined in advance (Bosworth: Page 17.  The data is transferred between the tiers according to the current processing needs of the system.).
Accordingly, it would have been obvious to one of ordinary skill in the art at the time of filing to utilize memory tiers for the system of Singh to utilize standard computer practice for the Gradient Boosting, where different types of volatile memory and non-volatile memory are used in sequence to realize the benefits of the different types of memory without as many disadvantages.  For example, non-volatile memories (e.g. hard disk drives) tend to be much larger than the other memories in relation to cost, but are also much slower.  The next tier, DRAM, is faster but more expensive than a hard drive, but meanwhile is slower and cheaper than SRAM.  Finally, SRAM is the fastest, but most expensive for the space.  By reading and writing data from the different tiers according to the current processing needs of the system, such as having data required for the current tasks in the SRAM while having data that is required for the overall processing, but not for the current task in a lower tier, the system can benefit from the faster access of the SRAM, while having the additional space provided at least by the DRAM (and, if it is needed, the virtual memory provided from the hard disk drive), thus providing for some of the advantages of the faster memory types while mitigating the high cost of such while providing enough memory space for the complete processing to be performed.  
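For illustration only, the tiering rationale above can be sketched in Python as a small, fast tier backed by a larger, slower tier.  This is a simplified sketch under stated assumptions and is not a reproduction of Bosworth: the class name TwoTierStore, the least-recently-used replacement rule, and the choice to hold each item in exactly one tier at a time are hypothetical simplifications.

```python
from collections import OrderedDict


class TwoTierStore:
    """Hypothetical two-tier store: a small fast tier over a large slow tier."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # stand-in for the faster, smaller memory
        self.slow = {}             # stand-in for the slower, larger memory

    def write(self, key, value):
        # Newly written data is treated as active and placed in the fast tier.
        self.slow.pop(key, None)
        self.fast[key] = value
        self.fast.move_to_end(key)
        self._evict()

    def read(self, key):
        if key in self.fast:
            # Hit in the fast tier: mark the item as most recently used.
            self.fast.move_to_end(key)
            return self.fast[key]
        # Miss: promote the item from the slow tier (KeyError if absent).
        value = self.slow.pop(key)
        self.fast[key] = value
        self._evict()
        return value

    def _evict(self):
        # Assumed policy: demote the least recently used items when over capacity.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value


store = TwoTierStore(fast_capacity=2)
for name in ("a", "b", "c"):
    store.write(name, name.upper())
# "a" was least recently used, so it now resides in the slow tier.
value = store.read("a")  # promotes "a" and demotes "b"
```

The sketch shows only the general behavior relied upon in the rationale, namely that data needed for the current task resides in the faster tier while the remaining data stays available in the slower tier.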

With regard to claim 2, as best understood, Singh in view of Bosworth teaches that the update unit is configured to read out the learning data for each predetermined block, when reading out the learning data from the data memory to update the gradient information (Bosworth: Page 17.  Virtual memory and the different tiers function by moving blocks of data (e.g. pages) between the different tiers of memory.).

With regard to claim 3, Singh in view of Bosworth teaches that the data memory comprises: a first bank region for storing the learning data and the gradient information corresponding to the learning data buffered in the first buffer unit; and a second bank region for storing the learning data and the gradient information corresponding to the learning data buffered in the second buffer unit (Bosworth: Page 17.  As provided in the rejection of claim 1, above, active processing data, such as the more complex tree, would be in the higher tier of memory, while the lower tier would have the easier tree.).

With regard to claim 4, Singh in view of Bosworth teaches that the data memory includes two data memories, and one of the two data memories includes the first bank region, and the other one of the two data memories includes the second bank region (Bosworth: Page 17). 

With regard to claim 5, Singh in view of Bosworth teaches wherein the data memory comprises a dynamic random access memory (DRAM) (Bosworth: Page 4.  The system includes SRAM and DRAM, where the highest tier (cache) is provided by SRAM and the main memory would be provided using DRAM.).

With regard to claim 6, Singh in view of Bosworth teaches a second memory configured to store the learning data, the second memory comprising a memory outside the processing chip (Singh: Page 4).  Singh in view of Bosworth fails to teach expressly, but Official Notice is taken that it was well-known in the art at the time of filing to have the data memory comprise: a first memory configured to store the gradient information, the first memory comprising a memory inside a processing chip in which at least the learning unit and the update unit are configured (more specifically, the inclusion of SRAM in the processing chip was well-known in the art, where in the teachings of Bosworth, such SRAM would include the active processing tasks data).  Accordingly, it would have been well-known in the art at the time of filing to have the SRAM included on the processing chip to improve the access time of such SRAM, such that less distance would need to be traversed to access the SRAM versus having the SRAM on a separate chip while maintaining a more direct pathway between the processor and the SRAM that would not need to leave the processing chip.

With regard to claim 7, Singh in view of Bosworth teaches that the first memory comprises a static random access memory (SRAM), and the second memory comprises a DRAM (Bosworth: Page 4).

With regard to claim 8, Singh in view of Bosworth teaches that the volume determined in advance is determined based on a burst length of the data memory (Bosworth: Page 7.  As a note, the instant language uses the term “such that,” which does not appear to require that any determinations are made with regard to the burst length, or even that the burst length is known, by the system (for example, the instant claim can refer to a design choice made by the builder of the system).).

With regard to claim 9, Singh in view of Bosworth teaches that an operation of reading-out and writing is divided to be performed on a time-series basis in the data memory (Bosworth: Page 17).

With regard to claim 10, Singh in view of Bosworth teaches that the operation of reading-out and writing is divided to be performed on the time-series basis such that transfer efficiency of the operation of reading-out and writing is equal to or larger than a predetermined value in the data memory (Bosworth: Page 2.  It is noted that the instant language uses the term “such that,” which does not appear to require that any determinations are made with regard to the transfer efficiency, or even that the transfer efficiency is known, by the system.  Further, the “predetermined value” does not have any requirements, where such can merely provide that the different tiers would be faster than a previous tier (with the approximate speeds being predetermined).).

With regard to claim 11, Singh in view of Bosworth teaches that an operating frequency of processing performed by the learning unit and the update unit is different from an operating frequency of processing performed by the subsampling unit, the first buffer unit, and the second buffer unit (Bosworth: Page 2.  The different components operate at different speeds.).

With regard to claim 12, Singh in view of Bosworth fails to teach, but Official Notice is taken that it would have been obvious to one of ordinary skill in the art at the time of filing to have the learning unit include a plurality of learning units and the update unit include a plurality of update units (more specifically, multithreading and multitasking were well-known in the art, where such would provide for multiple concurrent tasks being performed).  Accordingly, it would have been obvious to one of ordinary skill in the art at the time of filing to have multiple processes performed simultaneously (learning units and update units) to provide for faster processing by performing such tasks in parallel, such as by using a multiprocessor or multithreading system.

With regard to claim 13, Singh in view of Bosworth teaches a distributing unit configured to read out the learning data from the data memory for each predetermined block, and distribute learning data included in each block to the learning units in order (Bosworth: Page 17.  The system reads and writes data for different pages/blocks as needed (“in order” does not denote if there is a sequence number or if this refers to the order that is required for processing).).

With regard to claim 14, Singh in view of Bosworth teaches that the learning unit is configured to learn a decision tree as the model by a gradient boosting decision tree as the gradient boosting (Singh: pages 2-3).
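For illustration only, a generic gradient boosting loop using one-split decision trees can be sketched in Python as follows.  This is a minimal sketch under stated assumptions and is not a reproduction of the algorithm as presented in Singh: the squared-error loss, the single-feature one-split trees, the learning rate, and all function and parameter names are hypothetical choices made for brevity.

```python
def fit_stump(xs, residuals):
    """Fit a one-split tree on a single feature by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right) if right else 0.0
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def gradient_boost(xs, ys, max_iterations=20, learning_rate=0.5):
    """Boost one-split trees for a fixed maximum number of iterations."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(max_iterations):
        # For squared-error loss, the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Update the running predictions, and so the next round's gradients.
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


model = gradient_boost([1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0])
```

In this sketch the gradient information is recomputed after each tree is learned and the loop stops at a preset iteration limit; subsampling and buffering are intentionally omitted.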

With regard to claim 15, Singh in view of Bosworth teaches that the learning unit is configured to determine whether to cause the learning data stored in the data memory to branch to one node or to the other node of lower nodes of a node of the decision tree based on a branch condition for the node, the first buffer unit is configured to buffer learning data determined to branch to the one node by the learning unit up to a volume determined in advance, the second buffer unit is configured to buffer learning data determined to branch to the other node by the learning unit up to the volume determined in advance, and the first buffer unit and the second buffer unit are configured to write the learning data into continuous addresses of the data memory for each predetermined block when buffering the learning data up to the volume determined in advance (Singh: Pages 2-3.  The tree branch that is more complicated has further processing performed while the tree that is easier is maintained.  When providing multiple tiers of memory, the tree that is to be processed more would be placed in the faster buffer for processing.).

With regard to claim 16, the instant claim is similar to claim 1 and is rejected for similar reasons.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SCOTT B CHRISTENSEN whose telephone number is (571)270-1144. The examiner can normally be reached Monday through Friday, 6AM to 2PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Follansbee can be reached on (571) 272-3964. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SCOTT B. CHRISTENSEN
Examiner
Art Unit 2444



/SCOTT B CHRISTENSEN/Primary Examiner, Art Unit 2444