DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
All information disclosure statements were submitted prior to the first action and are in compliance with the provisions of 37 C.F.R. § 1.97.  Accordingly, they have been considered.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 10 and 12-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claims 10 and 17 substantially recite: “selecting a top number of memory pages based on the prediction errors for the grouping of the memory pages into the number of patterns.”  It is not clear what parameter must be the “top number” or how “top” (or bottom) is being determined.  Specifically, it is not clear whether the recited “top” pages must be the ones with the most prediction errors or whether they could be the most correctly predicted pages, since the language only recites “selecting a top number . . . based on the prediction errors.”
Claim 12 recites: “training each of the LSTM RNNs with a top P number of the sorted patterns across the last E number of epochs”.  It is not clear how “top” (or bottom) would limit a “sorted pattern”.  This may, in the context of the inventive concept, refer to a high access count.  But the claims do not require that a “top” number of sorted patterns be the patterns with the highest counts.  Without any specific metric for determining “top,” the term is subjective.  See MPEP § 2173.05(b).
All dependent claims are rejected as containing the limitations of the claims from which they depend.  
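The ambiguity noted above for claims 10 and 17 can be illustrated with a minimal sketch.  The page identifiers, the error values, and the function name select_top are hypothetical and are not taken from the claims or the specification; the sketch only shows that sorting by prediction error in either direction appears to satisfy the literal wording while selecting different pages.

```python
def select_top(prediction_errors, n, largest_error_first):
    """Return n page ids after sorting pages by prediction error.

    prediction_errors: dict mapping a hypothetical page id to its error.
    largest_error_first: which end of the sorted order counts as "top";
    the claim language does not say, so the caller has to choose.
    """
    ranked = sorted(prediction_errors, key=prediction_errors.get,
                    reverse=largest_error_first)
    return ranked[:n]


# Hypothetical per-page prediction errors (illustrative values only).
errors = {"page_a": 0.90, "page_b": 0.10, "page_c": 0.50, "page_d": 0.05}

# Reading 1: "top" means the pages with the largest prediction errors.
worst_predicted = select_top(errors, 2, largest_error_first=True)

# Reading 2: "top" means the most accurately predicted pages.
best_predicted = select_top(errors, 2, largest_error_first=False)
```

Under these assumed values the two readings select disjoint sets of pages, which is why a stated metric for “top” would be needed to fix the scope of the claim.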



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8 and 10-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hasbun (2019/0347202, filed Aug 2018, different assignee) and Martin (2020/0133489, filed Oct 2018, different assignee).
1. A computer processing system comprising: 
a first memory having resident therein a first set of memory pages; a second memory having a characteristic different from a characteristic of the first memory, the second memory coupled to the first memory; (Hasbun teaches: “As discussed above and in some examples, certain memory pages stored in the non-volatile memory array may be preemptively moved to the volatile memory array to avoid future access delays before the data otherwise located in the memory pages is requested (i.e., the data may be prefetched).”  Hasbun column 3 lines 60-65.) a long short-term memory (LSTM) instance module having a resource tracker and one or more LSTM recurrent neural network (RNN) instances; and a predictor configured to: determine one or more memory page access patterns from the one or more LSTM RNN instances; (With respect to claim interpretation, note that the recited “access patterns” are explained in paragraph 0016 of the specification as number of accesses. Hasbun teaches: “In some examples, access patterns for data stored in the non-volatile memory array are monitored to dynamically determine which memory pages to move to the volatile memory array—e.g., to determine which memory pages contain data that is likely to be accessed within a certain period of time. For example, a counter (which may be referred to as a “saturation counter”) may be used to monitor the number of times data in a certain memory page is accessed within a certain period of time or cycle, and the memory page may be moved from the non-volatile memory array to the volatile memory array based on a value of the counter.” Hasbun column 3, line 66 to column 4 line 10.
Hasbun does not teach using a recurrent neural network as a predictor.  
Martin teaches: “data from the sequential sectors may be prefetched into the acquired cache slots in anticipation of the data being requested from these sectors (e.g., as part of a read or write operation). However, such storage systems may not be able to reliably predict I/O requests that are not sequential.”  Martin paragraph 0021.  “Described herein is a system, and related techniques, for predicting I/O requests that are not necessarily directed to sequential sectors of a physical storage device. In some embodiments, I/O patterns that do not involve sequential-sector access, and that may be relatively long-term patterns, may be recognized. More generally, long-term I/O patterns may be recognized, and long-term I/O behavior may be predicted.” Martin paragraph 0022.  “To recognize such patterns, deep machine-learning techniques may be used, for example, neural networks. Such neural networks may be a recurrent neural network (RNN) such as, for example, a long short-term memory RNN (LSTM-RNN), which may be referred to herein as an “LSTM.” Other types of neural networks may be used, albeit they may not be as effective in recognizing long-term patterns as LSTMs. An I/O workstream for a storage device may be sampled for specific I/O features to produce a time series of I/O feature values of a workstream, and this time series of data may be fed to a prediction engine, e.g., an LSTM, to predict one or more future I/O features values, and I/O actions may be taken based on these predicted feature values.” Martin paragraph 0023.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Martin with that of Hasbun because using an LSTM to determine which data to prefetch is an effective way of recognizing usage patterns, especially non-sequential long-term usage patterns, and therefore makes prefetching more efficient (by caching data more likely to be accessed).)
predict, by operation of the one or more LSTM RNN instances, a number of page accesses for each determined one or more memory page access patterns; and based on the predicted number of page accesses, select a second set of memory pages for moving from the first memory to the second memory.  (“a counter (which may be referred to as a “saturation counter”) may be used to monitor the number of times data in a certain memory page is accessed within a certain period of time or cycle, and the memory page may be moved from the non-volatile memory array to the volatile memory array based on a value of the counter.”  Hasbun column 4 lines 5-10.  “Or, if the counter indicates that the number of access attempts by SoC/processor 250 during the time interval is equal to or larger than the pre-determined threshold value, then interface controller 230 may, upon removal of the data from virtual memory bank 235, store the data in buffer 240, as the interface controller 230 may anticipate that SoC/processor 250 is likely to access the data soon.” Hasbun column 12, lines 15-20.  With respect to using an LSTM for this task, see Martin cited above.)
2. The computer processing system of claim 1, wherein the predictor is further configured to: 
identify a plurality of memory pages from the first set of memory pages in the first memory for the one or more LSTM RNN instances; group, by an aggregator of the predictor, the memory pages of the identified plurality of memory pages into a number of memory page access patterns based on a number of memory accesses per time of the respective identified memory pages; determine at least one of a P number of memory page access patterns for the one or more LSTM RNN instances; (“if the counter indicates that the number of access attempts by SoC/processor 250 during the time interval is equal to or larger than the pre-determined threshold value, then interface controller 230 may, upon removal of the data from virtual memory bank 235, store the data in buffer 240, as the interface controller 230 may anticipate that SoC/processor 250 is likely to access the data soon.” Hasbun column 12, lines 15-20.) and move each of the second set of memory pages of the second set from the first memory to the second memory. (“if the counter indicates that the number of access attempts by SoC/processor 250 during the time interval is equal to or larger than the pre-determined threshold value, then interface controller 230 may, upon removal of the data from virtual memory bank 235, store the data in buffer 240, as the interface controller 230 may anticipate that SoC/processor 250 is likely to access the data soon.” Hasbun column 12, lines 15-20.)
3. The computer processing system of claim 2, wherein: 
the LSTM instance module includes P number of LSTM RNN instances; (“a controller may include a prefetch counter for each memory page, for each memory subpage in the memory array, or some combination.” Hasbun column 13, lines 10-15.) and the predictor provides the P number of memory page access patterns to the P number of LSTM RNN instances, one memory page access pattern per LSTM RNN instance. (With respect to claim interpretation, paragraph 0016 describes the recited “access patterns” as access counts within a time period.   Hasbun teaches: “During a prefetch operation, the memory system may move, from the non-volatile memory array to the volatile memory array, one or more pages of data (or a “page” or “memory page”) that are likely to be accessed.” Hasbun column 2 lines 25-30.  “if the counter indicates that the number of access attempts by SoC/processor 250 during the time interval is equal to or larger than the pre-determined threshold value, then interface controller 230 may, upon removal of the data from virtual memory bank 235, store the data in buffer 240, as the interface controller 230 may anticipate that SoC/processor 250 is likely to access the data soon.” Hasbun column 12, lines 15-20.)
4. The computer processing system of claim 2, wherein: 
the number of memory accesses per time is a total of page accesses over a last E number of recent epochs for the respective memory pages; and grouping the memory pages into the number of memory page access patterns is based on the totals of page accesses over the last E number of recent epochs. (“For example, a counter (which may be referred to as a “saturation counter”) may be used to monitor the number of times data in a certain memory page is accessed within a certain period of time or cycle, and the memory page may be moved from the non-volatile memory array to the volatile memory array based on a value of the counter.” Hasbun column 3, line 66 to column 4 line 10.)
5. The computer processing system of claim 1, further comprising: 
a pattern sorter, wherein the pattern sorter sorts the one or more memory page access patterns based on a total number of memory accesses by one or more processing cores across all pages of the respective memory page access pattern in a last E number of epochs; (See rejection of claim 1.  Note that Hasbun teaches the system configured with a single processor: “For example, a processor at the device may initiate a read operation at the volatile memory array, and the volatile memory array may provide the requested data to the processor.”  Hasbun column 2 lines 65-67.) and wherein the computer processing system trains each of the LSTM RNN instances based on at least one of the sorted memory page access patterns across the last E number of epochs.  (“Other parameters may include: training sample size; look-back window size; and gradient descent optimizer. The training sample size may be expressed as a unit of time (e.g., 30 seconds, 60 seconds, 5 minutes) or as a number of samples (e.g., several hundred thousand, several million or another number). The number of samples resulting from the predefined time or the specified number of samples represents the number of I/O requests that will be used to train the prediction engine (“training samples”). For example, a 60-second sample size may result in up to several million I/O requests or even more (e.g., for a single storage device or multiple storage devices), and thus resulting in several million samples or even more. . . . [0065] The look-back window size may define how far back—i.e., how many samples back—each block of the LSTM should look when generating output values, which may define the size of an input vector for the LSTM; i.e., the size of each input vector to each block of the LSTM.” Martin paragraphs 0064-0065.)
6. The computer processing system of claim 5, wherein 
the memory pages are grouped based on a similarity measure as a function of a distance between memory accesses over the last E number of epochs.  (Hasbun teaches: “In some cases, interface controller 230 may include a counter that records a number (e.g., quantity or frequency) of access attempts by SoC/processor 250 to the contents of virtual memory bank 235 during a certain time interval.”  Hasbun column 12, lines 1-4.)
7. The computer processing system of claim 1, wherein 
the second set of memory pages includes the grouped memory pages of the identified plurality of memory pages.  (See rejection of claim 1 showing pages associated with a count below a threshold being sent to a different area of memory.)
8. The computer processing system of claim 1, wherein the characteristic of the first and second memories is one from a group of characteristics including: 
an energy efficiency, a memory access time, and an amount of the first memory relative to an amount of the second memory having the different characteristic.  (“The non-volatile memory array . . . may have a larger storage capacity than the volatile memory array.” Hasbun column 3, lines 25-30.)
10. The computer processing system of claim 1, wherein identifying the plurality of memory pages by the predictor includes: 
determining a number of memory page accesses in one or more E number of recent epochs for each memory page of the first set of memory pages in the first memory; (See rejection of claim 1.) determining a prediction error for each memory page of the first set of memory pages; sorting the memory pages of the first set of memory pages by the respective prediction errors; (Martin teaches: “The LSTM may be configured to perform any of a variety of gradient descent optimization algorithms (optimizers) for minimizing prediction error. In some embodiments, the LSTM may be configured to apply an Adam optimization algorithm. A first phase of training the LSTM may result in error measurements resulting from a training (e.g., an initial training) of the LSTM that may be applied to later trainings. The specified optimizer of the LSTM may be applied along with the error measurements to the LSTM, and the LSTM may learn and adjust its parameters (e.g., one or more parameters defined in Equations 1-6 above) to produce a fastest decline in error, as the optimizer naturally gravitates toward as global minimum.”  Martin paragraph 0066. “Setting parameters in step 702 also may include establishing one or more prediction thresholds including, for example, one or more prediction accuracy thresholds, and a prediction probability threshold. The one or more prediction accuracy thresholds may include a threshold percentage of correct predictions or ratio of correct-to-incorrect predictions, or a threshold average prediction error (e.g., average difference between the predicted LBA change and the actual LBA change). The prediction probability threshold may be a threshold for a calculated probability of a next I/O feature value that needs to be satisfied in order to take any action based on the prediction, as described in more detail elsewhere herein.” Martin paragraph 0067. 
Note that actions are only taken when a prediction threshold is met, sorting the data (pages) into at least two groups (those using the prediction and those not using the prediction).) and selecting a top number of memory pages based on the prediction errors for the grouping of the memory pages into the number of patterns. (See rejection of claim 1.  “sorting the memory pages . . . by the respective prediction errors” and “selecting a top number of pages based on prediction errors” read on sorting the pages based on an ANN using errors to make the prediction.)
11. A method for memory page placement in a computer processing system, the method comprising: 
identifying a plurality of memory pages from a first set of memory pages in a first memory for long short-term memory (LSTM) recurrent neural network (RNN) prediction by a set of P number of LSTM RNN instances; (See rejection of claim 2.) determining one or more memory page access patterns from the set of P number of LSTM RNN instances; (See rejection of claim 2.) predicting a number of page accesses for each of a P number of patterns of memory pages by the LSTM RNN instances; (See rejection of claim 1.) selecting a second set of memory pages for moving from the first memory based on the predicted number of page accesses; and moving each of the second set of memory pages from the first memory to a second memory. (See rejection of claim 1.)
12. The method of claim 11, further comprising: 
grouping the memory pages of the identified plurality of memory pages into a number of patterns based on a number of memory accesses per time of the respective identified memory pages; (See rejection of claim 2.) selecting the P number of patterns for the P number of LSTM RNNs based on the grouping of the memory pages; sorting the P number of patterns based on a total number of memory accesses across all pages of the respective pattern in a last E number of epochs; (See rejection of claim 1.  Note that selecting a pattern “based on the grouping of the memory pages” and “sorting the . . . patterns based on a total number of memory accesses” reads on selecting the “pattern” with a given count.) and training each of the LSTM RNNs with a top P number of the sorted patterns across the last E number of epochs. (“The look-back window size may define how far back—i.e., how many samples back—each block of the LSTM should look when generating output values, which may define the size of an input vector for the LSTM; i.e., the size of each input vector to each block of the LSTM.” Martin paragraph 0065.   “In step 708, it may be determined whether the highest probability of predicted future values satisfies a probability threshold, for example, the probability threshold described in relation to step 702. It may be desirable to require that such a threshold be met to avoid taking one or more I/O actions if there is insufficient confidence in the accuracy of the predicted value causing the action to be taken. That is, even though the determined highest probability is the highest probability from among values occurring in the prediction vector, it still may not be high enough to warrant taking one or more I/O actions.” Martin paragraph 0081. “In step 710, the predicted value of the prediction vector having the highest probability is selected as the predicted value of a next I/O feature. 
That is, for a next x I/O operations (or for any I/O operation over a predefined period of time; e.g., for a certain storage device), actions may be taken, based at least in part, on the prediction that the I/O operation will exhibit the selected I/O feature value. It should be appreciated that this prediction likely will not be accurate 100% of time, and in fact it likely was not predicted that the predicted value will be correct 100% time, but it was determined by performance of step 708 that the prediction should be correct often enough to justify taking certain actions based on the prediction.”  Martin paragraph 0082. See also Martin figure 7.)
13. The method of claim 12, wherein 
the second set of memory pages includes at least each of the identified plurality of memory pages used for grouping the memory pages into the number of patterns. (See rejection of claim 1. Note again that the description of “patterns” in the specification at paragraph 0016 includes pages with access counts above/below a threshold.)
14. The method of claim 12, wherein 
the memory pages are grouped based on a similarity measure of a respective memory page access pattern of the memory pages within an E number of recent epochs.  (See rejection of claim 1. Note again that the description of “patterns” in the specification at paragraph 0016 includes pages with access counts above/below a threshold.)
15. The method of claim 12, wherein 
the memory pages are grouped based on a similarity measure of a respective memory access pattern of the memory pages within an E number of recent epochs. (See rejection of claim 1. Note again that the description of “patterns” in the specification at paragraph 0016 includes pages with access counts above/below a threshold.)
16. The method of claim 12, wherein: 
the number of memory accesses per time is a total of page accesses over a last E number of recent epochs for the respective memory pages; and selecting the P number of patterns is based on the totals of page accesses over the last E number of recent epochs.  (See rejection of claim 1. Note again that the description of “patterns” in the specification at paragraph 0016 includes pages with access counts above/below a threshold.)
17. The method of claim 12, wherein identifying the plurality of memory pages includes: 
determining a number of memory page accesses in one or more E number of recent epochs for each memory page of the first set of memory pages in the first memory; (See rejection of claim 10.) determining a prediction error for each memory page of the first set of memory pages; sorting the memory pages of the first set of memory pages by the respective prediction errors; (See rejection of claim 10.) and selecting a top number of memory pages based on the prediction errors for the grouping of the memory pages into the number of patterns. (See rejection of claim 10.)
18. The method of claim 11, further comprising: identifying one or more resource constraints for operation of the set of P number of LSTM RNN instances; and adjusting a resource requirement of the set of P number of LSTM RNN instances, wherein the resource requirement is at least one of: 
a number of processing cores designated for the set of P number of LSTM RNN instances; an amount of memory designated for the set of P number of LSTM RNN instances; and a number instances of the LSTM RNN instances in the set of P number of LSTM RNN instances operative in the computer processing system in response to a change in at least one of the number of processing cores and the amount of memory designated for the set of P number of LSTM RNN instances.  (“Although only four LSTM modules are illustrated, the number of instances (or number of active modules in a hardware/firmware embodiment) at any given time may vary depending on how the LSTM is implemented, including how much parallel processing is desired and/or possible given the constraints of the implementation (e.g., prediction module 28 of system 12). . . . [0059] The learning capacity of an LSTM may scale relative to the number of LSTM modules (i.e., neurons) implemented and the number of layers within the modules, which means that it's learning capacity may be limited to a certain number of permutations or interactions based on the amount of memory and compute resources that are dedicated to it. Thus, various parameters of the LSTM may be configured and adjusted to balance the learning capacity of the LSTM and resource consumption.” Martin paragraph 0058-0059.)
19. The method of claim 18, wherein: 
identifying the one or more resource constraints for operation of the set of P number of LSTM RNN instances is performed by a resource tracker associated with a respective LSTM RNN instance; (See rejection of claim 18.  Note that denoting the module for carrying out configuration of the LSTM based on resources a “resource tracker” does not require additional steps to be performed or limit to a particular structure.  See MPEP §§ 2103 and 2111.04.) and the set of P number of LSTM RNN instances are controlled by a LSTM microcontroller.  (See rejection of claim 18.  Martin teaches: “In some embodiments, prediction module 28 may include one or more CPUs, graphical processing units (GPUs) or other types of processors or controllers, configured to execute software to perform one or more aspects of I/O behavior prediction described herein.”  Martin paragraph 0039.  Note that denoting the structures/software used to change the number of LSTM instances an “LSTM microcontroller” does not limit to a particular structure (noting that a processor is taught as carrying out the operations in Martin). See MPEP §§ 2103 and 2111.04.)
20. The method of claim 11, wherein 
identifying the plurality of memory pages from the first set of memory pages includes selecting memory pages having a highest access rate from the memory pages of the first set of memory pages. (Hasbun teaches: “In some cases, interface controller 230 may include a counter that records a number (e.g., quantity or frequency) of access attempts by SoC/processor 250 to the contents of virtual memory bank 235 during a certain time interval.”  Hasbun column 12, lines 1-4.)
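For illustration only, the counter-based selection described in the quoted Hasbun passages can be sketched as follows.  This is not code from Hasbun; the class name AccessCounter, the threshold value of 3, and the page identifiers are hypothetical, and the sketch assumes a simple per-interval count rather than any particular hardware counter.

```python
from collections import Counter


class AccessCounter:
    """Counts accesses per page within one time interval (hypothetical)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, page_id):
        # One access attempt to the given page during the interval.
        self.counts[page_id] += 1

    def pages_at_or_above_threshold(self):
        # Pages whose count is equal to or larger than the threshold
        # are treated as likely to be accessed again soon.
        return sorted(p for p, c in self.counts.items() if c >= self.threshold)

    def highest_rate_pages(self, n):
        # The n pages with the highest access counts in the interval.
        return [p for p, _ in self.counts.most_common(n)]


counter = AccessCounter(threshold=3)
for page in ["p1", "p1", "p2", "p3", "p1", "p3", "p1", "p3"]:
    counter.record(page)

candidates = counter.pages_at_or_above_threshold()
hottest = counter.highest_rate_pages(1)
```

With the assumed access sequence, pages p1 and p3 meet the threshold and p1 has the highest access count, so a threshold test and a highest-rate test both reduce to comparisons on the same per-page count.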
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Hasbun, Martin, and Mandal (US 2018/0350684).
9. The computer processing system of claim 1, wherein 
the first memory is a set of physical memory modules on separate integrated circuit (IC) dies and coupled by a memory interconnect. (The previously cited art does not teach memory on separate dies coupled by a memory interconnect. 
Mandal teaches: “Three dimensional (3D) integrated circuit (IC) technology provides many benefits, such as a small form factor. The 3D integrated circuit requires a stacked configuration consisting of a die to die (D2D) bonding. In D2D bonding techniques, the interconnects of each of the separate integrated circuits (dies) within the stack must be aligned and in electrical connection for the stack to be operable.”  Mandal paragraph 0002.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Mandal with the previously cited art as an instance of applying a known technique to a known device (method, or product) ready for improvement to yield predictable results.  The prior art contained a “base” device (method, or product) upon which the claimed invention can be seen as an “improvement” (using a plurality of dies with interconnects is part of 3D IC technology, which improves form factor (makes it smaller)).  The prior art contained a known technique that is applicable to the base device (method, or product) (the use of the type of memory taught in Mandal is applicable to the memory devices in the base device).  One of ordinary skill in the art would have recognized that applying the known technique would have yielded predictable results and resulted in an improved system (one of ordinary skill in the art would have recognized that details of 3D memory, which is more dense, would have resulted in an improved system).  See MPEP § 2143(I)(D).)
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Title: METHOD AND SYSTEM FOR PROFILING VIRTUAL APPLICATION RESOURCE UTILIZATION PATTERNS
Document I.D.: US 20120005674 A1
Reason Included: “Then, only the pages accessed within a predetermined amount of time from the first transition data structure may be selected for inclusion in the prefetch xsequence file(s). Alternatively, a predetermined number of pages accessed within a least amount of time from the first transition data structure may be selected for inclusion in the prefetch xsequence file(s). By way of another non-limiting example, a predetermined percentage of pages including those that were accessed within the least amount of time from the first transition data structure may be selected for inclusion in the prefetch xsequence file(s).” paragraph 0202.

Title: DYNAMICALLY DETERMINING TRACKS TO PRESTAGE FROM STORAGE TO CACHE BY TRAINING A MACHINE LEARNING MODULE
Document I.D.: US 20190391920 A1
Reason Included: In an embodiment of blocks 608 and 610, the increasing or decreasing by the time margin of error may comprise multiplying the trigger track and prestage amount by one plus the margin of error to increase or multiplying by one minus the margin of error to decrease. In alternative embodiments, the time margin of error may be used in other functions to increase or decrease the trigger track and destage amount. In alternative embodiments, the time margin of error may be calculated in additional ways using the current time and a time the demoted track was prefetched into the cache 208.



Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL M KNIGHT whose telephone number is (571) 272-8646.  The examiner can normally be reached on Monday - Friday 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Reginald Bragdon, can be reached at (571) 272-4204.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


PAUL M. KNIGHT
Examiner
Art Unit 2139



/PAUL M KNIGHT/Examiner, Art Unit 2139