DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event a determination of the status of the application as subject to AIA 35 U.S.C. 102, 103, and 112 (or as subject to pre-AIA 35 U.S.C. 102, 103, and 112) is incorrect, any correction of the statutory basis for a rejection will not be considered a new ground of rejection if the prior art relied upon and/or the rationale supporting the rejection, would be the same under either status.

Notice of Claim Interpretation
Claims in this application are not interpreted under 35 U.S.C. 112(f) unless otherwise noted in an office action.

Duty of Disclosure
Applicant is reminded of 37 C.F.R. 1.56(a-b) which states:
(a) A patent by its very nature is affected with a public interest. The public interest is best served, and the most effective patent examination occurs when, at the time an application is being examined, the Office is aware of and evaluates the teachings of all information material to patentability. Each individual associated with the filing and prosecution of a patent application has a duty of candor and good faith in dealing with the Office, which includes a duty to disclose to the Office all information known to that individual to be material to patentability as defined in this section. The duty to disclose information exists with respect to each pending claim until the claim is cancelled or withdrawn from consideration, or the application becomes abandoned. Information material to the patentability of a claim that is cancelled or withdrawn from consideration need not be submitted if the information is not material to the patentability of any claim remaining under consideration in the application. There is no duty to submit information which is not material to the patentability of any existing claim. The duty to disclose all information known to be material to patentability is deemed to be satisfied if all information known to be material to patentability of any claim issued in a patent was cited by the Office or submitted to the Office in the manner prescribed by §§ 1.97 (b)-(d) and 1.98. However, no patent will be granted on an application in connection with which fraud on the Office was practiced or attempted or the duty of disclosure was violated through bad faith or intentional misconduct. The Office encourages applicants to carefully examine:
(1) Prior art cited in search reports of a foreign patent office in a counterpart application, and
(2) The closest information over which individuals associated with the filing or prosecution of a patent application believe any pending claim patentably defines, to make sure that any material information contained therein is disclosed to the Office.

(b) Under this section, information is material to patentability when it is not cumulative to information already of record or being made of record in the application, and
(1) It establishes, by itself or in combination with other information, a prima facie case of unpatentability of a claim; or
(2) It refutes, or is inconsistent with, a position the applicant takes in:
(i) Opposing an argument of unpatentability relied on by the Office, or
(ii) Asserting an argument of patentability.
(3) A prima facie case of unpatentability is established when the information compels a conclusion that a claim is unpatentable under the preponderance of evidence, burden-of-proof standard, giving each term in the claim its broadest reasonable construction consistent with the specification, and before any consideration is given to evidence which may be submitted in an attempt to establish a contrary conclusion of patentability.  


Information Disclosure Statement
The information disclosure statement (IDS) submitted on 29 June 2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Drawings
Figures 2 and 3 should be designated by a legend such as --Prior Art-- because only that which is old is illustrated.  See MPEP § 608.02(g).  Corrected drawings in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. The replacement sheet(s) should be labeled “Replacement Sheet” in the page header (as per 37 CFR 1.84(c)) so as not to obstruct any portion of the drawing figures. If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: 100, 200.  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: 140.  Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to because both 1010 and 1020 in figure 10 are described by the specification as caches, but do not both appear to be caches based on figure 10.  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 21-23 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claim 21 is directed to a coprocessor comprising a series of elements, but fails to specify a conjunction to link those elements.  It is unclear what conjunction Applicant intended.  Claims 22 and 23 are rejected based on their dependence on claim 21.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 10, and 11 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Jung et al. (US 2017/0285968).
In regards to claim 1, Jung teaches a coprocessor comprising:
a processor that corresponds to a core of the coprocessor and generates a memory request (processor 131, figure 1; “A graphic processing unit (GPU) or many integrated core (MIC) device is one example of the accelerator 300.”, paragraph 0044);
a cache used as a buffer of the processor (“L1 cache of a worker LWP 313”, paragraph 0088);
an interconnect network (network 350, figure 2);
a flash network (network 360, figure 2);
a flash memory (flash package 321, figure 2); and
a flash controller that is connected to the processor and the cache through the interconnect network, is connected to the flash memory through the flash network, and reads or writes target data from or to the flash memory (“If the host requests an I/O service, the master LWP 311 signals to a flash LWP 312 via a flash execution interface for the I/O service, and the flash LWP 312 executes data read/write on the flash backbone 320.”, paragraph 0075).
In regards to claim 10, Jung teaches a coprocessor comprising:
a processor that corresponds to a core of the coprocessor (processor 131, figure 1; “A graphic processing unit (GPU) or many integrated core (MIC) device is one example of the accelerator 300.”, paragraph 0044);
a cache used as a read buffer of the processor (“L1 cache of a worker LWP 313”, paragraph 0088);
a flash memory including an internal register used as a write buffer of the processor and a memory space for storing data (“As each flash package in practice has its own I/O control logic and a set of data registers, all the low-level transactions for flash may be handled from outside via ready/busy (R/B) and chip enable (CE) pins.”, paragraph 0065); and
a flash controller (memory controller 333, figure 3) that when a read request from the processor misses in the cache (“Referring to FIG. 8, when a memory access requested by an NDP kernel execution is missed from an L1 cache of a worker LWP 313, the memory access goes directly to a memory controller 333 (S810).”, paragraph 0088), reads read data of the read request from the flash memory (“As the data exist in the flash data space 331b, the memory controller 333 updates the present bit (P) flag (S870) and serves the data to the L1 cache of the worker LWP 313 (S880).”, paragraph 0091), and first stores write data of a write request from the processor to the write buffer before writing the write data to the memory space of the flash memory (“The buffer subsystem 330 may operate as a buffer memory between the flash backbone 320 for reading and writing data in pages and the host or LWP 310 for reading and writing data in words or bytes.”, paragraph 0055).
In regards to claim 11, Jung further teaches an interconnect network that connects the processor, the cache, and the flash controller (network 350, figure 2); and
a flash network that connects the flash memory and the flash controller (network 360, figure 2).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Jung et al. (US 2017/0285968) in view of Caulfield et al. (“Moneta: A High-performance Storage Array Architecture for Next-generation, Non-volatile Memories”).
In regards to claim 2, Jung teaches claim 1.  Jung fails to teach that the flash controller includes a plurality of flash controllers, and
wherein memory requests are interleaved over the flash controllers.
Caulfield teaches that the flash controller includes a plurality of flash controllers (“Moneta’s architecture provides low-latency access to a large amount of non-volatile memory spread across four memory controllers.”, section II(B), paragraph 1), and
wherein memory requests are interleaved over the flash controllers (“The Moneta scheduler stripes internal storage addresses across the memory controllers to extract parallelism from large requests.”, section II(B), paragraph 2)
“to extract parallelism from large requests” (id.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jung with Caulfield such that the flash controller includes a plurality of flash controllers, and
wherein memory requests are interleaved over the flash controllers
“to extract parallelism from large requests” (id.).
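For illustration only, the interleaving relied upon above can be pictured as striping addresses round-robin across the controllers at a fixed granularity. The following is a minimal sketch and is not taken from Jung or Caulfield: the stripe size and the names `controller_for` and `split_request` are assumptions, and the controller count of four merely echoes the "four memory controllers" in the Caulfield quotation.

```python
NUM_CONTROLLERS = 4   # echoes the "four memory controllers" quotation; otherwise arbitrary
STRIPE_BYTES = 4096   # assumed stripe granularity (hypothetical)


def controller_for(address: int) -> int:
    """Return the index of the controller that owns the stripe containing address."""
    return (address // STRIPE_BYTES) % NUM_CONTROLLERS


def split_request(address: int, length: int):
    """Split one request into (controller, start, size) pieces at stripe boundaries."""
    pieces = []
    end = address + length
    while address < end:
        stripe_end = (address // STRIPE_BYTES + 1) * STRIPE_BYTES
        size = min(end, stripe_end) - address
        pieces.append((controller_for(address), address, size))
        address += size
    return pieces
```

Under these assumptions, a request spanning four stripes is divided into four pieces, one per controller, which is the sense in which a large request exposes parallelism.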

Claims 3, 5-8, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Jung et al. (US 2017/0285968) in view of Wang et al. (“A Real-Time Flash Translation Layer for NAND Flash Memory Storage Systems”).
In regards to claim 3, Jung further teaches a memory management unit including a table that stores a plurality of physical addresses mapped to a plurality of addresses respectively, and is connected to the interconnect network (“The mapping table 332c is provided on a flash translation layer (FTL) and maps the virtual address (i.e., a logical address) used by the host to a physical address exposed by the flash.”, paragraph 0086).
Jung fails to teach that each of the physical addresses includes a physical log block number and a physical data block number, 
wherein an address of the memory request is translated into a target physical address that is mapped to the address of the memory request among the physical addresses, and
wherein the target physical address includes a target physical log block number and a target physical data block number.
Wang teaches that each of the physical addresses includes a physical log block number and a physical data block number (“A block mapping table is used to map a logical block with three physical blocks: the primary block, the replacement block and the buffer block as shown in Fig. 3.”, section 3.2.1, paragraph 1), 
wherein an address of the memory request is translated into a target physical address that is mapped to the address of the memory request among the physical addresses (“A block mapping table is used to map a logical block with three physical blocks: the primary block, the replacement block and the buffer block as shown in Fig. 3.”, section 3.2.1, paragraph 1), and
wherein the target physical address includes a target physical log block number and a target physical data block number (“A block mapping table is used to map a logical block with three physical blocks: the primary block, the replacement block and the buffer block as shown in Fig. 3.”, section 3.2.1, paragraph 1)
in order “to provide guaranteed space for write operations.” (section 3.2.1, paragraph 1)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jung with Wang such that each of the physical addresses includes a physical log block number and a physical data block number, 
wherein an address of the memory request is translated into a target physical address that is mapped to the address of the memory request among the physical addresses, and
wherein the target physical address includes a target physical log block number and a target physical data block number
in order “to provide guaranteed space for write operations.” (id.)
In regards to claim 5, Wang further teaches that the flash memory includes a plurality of physical log blocks (“A block mapping table is used to map a logical block with three physical blocks: the primary block, the replacement block and the buffer block as shown in Fig. 3.”, section 3.2.1, paragraph 1; “Otherwise, if the requests want to access different logical blocks, the garbage collection operations are correspondingly distributed to different logical blocks.”, section 3.2.2, paragraph 6), and
wherein each of the physical log blocks stores page mapping information between a page index and a physical page number (“In order to reduce the RAM cost, the page-level mapping table is divided into N small tables, and each small table is stored in the OOB area of the newly allocated page.”, section 3.2.1, paragraph 2).
In regards to claim 6, Wang further teaches that the address of the memory request is split into at least a logical block number and a target page index (“In block-level FTL schemes [4], a logical page number (LPN) is made up of a LBN and a block offset (BO).”, section 2.2, paragraph 2), and
wherein when the memory request is a read request and the target page index hits in the page mapping information of a target physical log block indicated by the target physical log block number, the target physical log block reads the target data based on the page mapping information (“When a read request is scheduled, the LPN is first translated to an LBN and a BO. The corresponding LBN will be searched in the block mapping table in RAM. Then, the page mapping sub-table for the requested BO can be obtained using the page table index in RAM. From the sub-table, we can get the physical page that stores the requested data.”, section 3.2.3, paragraph 5).
In regards to claim 7, Wang further teaches that the address of the memory request is split into at least a logical block number and a target page index (“In block-level FTL schemes [4], a logical page number (LPN) is made up of a LBN and a block offset (BO).”, section 2.2, paragraph 2), and
wherein when the memory request is a read request and the target page index does not hit in the page mapping information of a target physical log block indicated by the target physical log block number, a physical data block indicated by the target physical data block number reads the target data based on the target page index (“When a read request is scheduled, the LPN is first translated to an LBN and a BO. The corresponding LBN will be searched in the block mapping table in RAM. Then, the page mapping sub-table for the requested BO can be obtained using the page table index in RAM. From the sub-table, we can get the physical page that stores the requested data.”, section 3.2.3, paragraph 5).
In regards to claim 8, Wang further teaches that the address of the memory request is split into at least a logical block number and a target page index (“In block-level FTL schemes [4], a logical page number (LPN) is made up of a LBN and a block offset (BO).”, section 2.2, paragraph 2), and
wherein when the memory request is a write request, a target physical log block indicated by the target physical log block number writes the target data to a free page in the target physical log block (“The primary block is first used to serve the write requests, and the buffer block will serve the pending write requests when the primary block is full, while the replacement block provides a space to reclaim the primary block.”, section 3.2.1, paragraph 1), and stores mapping between the target page index and a physical page number of the free page to the page mapping information (“In order to reduce the RAM cost, the page-level mapping table is divided into N small tables, and each small table is stored in the OOB area of the newly allocated page.”, section 3.2.1, paragraph 2).
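For illustration only, the read and write paths recited in claims 3 and 6-8 can be sketched as follows. This is a simplified model of the claim language as summarized above, not Wang's scheme and not Applicant's implementation. The page-granular address, the block size, the identity table, and all class and method names are assumptions, and merging or garbage collection of a full log block is not modeled.

```python
PAGES_PER_BLOCK = 4  # assumed; kept small for illustration


class LogBlock:
    """Physical log block holding data pages plus its own page mapping information."""

    def __init__(self):
        self.pages = []     # list index is the physical page number
        self.page_map = {}  # page index -> physical page number

    def write(self, page_index, data):
        if len(self.pages) >= PAGES_PER_BLOCK:
            raise RuntimeError("log block full; merge is not modeled in this sketch")
        self.pages.append(data)  # take the next free page
        self.page_map[page_index] = len(self.pages) - 1


class Flash:
    def __init__(self, num_logical_blocks):
        n = num_logical_blocks
        self.data_blocks = [[None] * PAGES_PER_BLOCK for _ in range(n)]
        self.log_blocks = [LogBlock() for _ in range(n)]
        # Table: logical block number -> (physical log block number, physical data block number)
        self.table = {lbn: (lbn, lbn) for lbn in range(n)}

    @staticmethod
    def split(address):
        """Split a page-granular address into (logical block number, page index)."""
        return divmod(address, PAGES_PER_BLOCK)

    def read(self, address):
        lbn, page_index = self.split(address)
        log_no, data_no = self.table[lbn]
        log = self.log_blocks[log_no]
        if page_index in log.page_map:                # hit in the page mapping information
            return log.pages[log.page_map[page_index]]
        return self.data_blocks[data_no][page_index]  # miss: read from the data block

    def write(self, address, data):
        lbn, page_index = self.split(address)
        log_no, _ = self.table[lbn]
        self.log_blocks[log_no].write(page_index, data)
```

In this model a write never overwrites the data block in place; a later read of the same address hits in the log block's page mapping information and returns the newer data.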
In regards to claim 21, Jung teaches a coprocessor comprising:
a processor that corresponds to a core of the coprocessor (processor 131, figure 1; “A graphic processing unit (GPU) or many integrated core (MIC) device is one example of the accelerator 300.”, paragraph 0044);
a memory management unit including a table that stores a plurality of physical addresses mapped to a plurality of addresses, respectively (“The mapping table 332c is provided on a flash translation layer (FTL) and maps the virtual address (i.e., a logical address) used by the host to a physical address exposed by the flash.”, paragraph 0086), 
a flash memory that includes a plurality of physical data blocks (flash package 321, figure 2), 
a flash controller that reads data of a read request generated by the processor from the flash memory (“If the host requests an I/O service, the master LWP 311 signals to a flash LWP 312 via a flash execution interface for the I/O service, and the flash LWP 312 executes data read/write on the flash backbone 320.”, paragraph 0075).
Jung fails to teach that each of the physical addresses includes a physical log block number and a physical data block number,
that the flash memory includes a plurality of physical log blocks, wherein each of the physical log blocks stores page mapping information between page indexes and physical page numbers, and
that the flash controller reads data based on a physical log block number or target physical data block number that is mapped to an address of the read request among the physical addresses, the page mapping information of a target physical log block indicated by the physical log block number mapped to the address of the read request, and a page index split from the address of the read request.
Wang teaches that each of the physical addresses includes a physical log block number and a physical data block number (“A block mapping table is used to map a logical block with three physical blocks: the primary block, the replacement block and the buffer block as shown in Fig. 3.”, section 3.2.1, paragraph 1),
that the flash memory includes a plurality of physical log blocks (“A block mapping table is used to map a logical block with three physical blocks: the primary block, the replacement block and the buffer block as shown in Fig. 3.”, section 3.2.1, paragraph 1; “Otherwise, if the requests want to access different logical blocks, the garbage collection operations are correspondingly distributed to different logical blocks.”, section 3.2.2, paragraph 6), wherein each of the physical log blocks stores page mapping information between page indexes and physical page numbers (“In order to reduce the RAM cost, the page-level mapping table is divided into N small tables, and each small table is stored in the OOB area of the newly allocated page.”, section 3.2.1, paragraph 2), and
that the flash controller reads data based on a physical log block number or target physical data block number that is mapped to an address of the read request among the physical addresses, the page mapping information of a target physical log block indicated by the physical log block number mapped to the address of the read request, and a page index split from the address of the read request (“When a read request is scheduled, the LPN is first translated to an LBN and a BO. The corresponding LBN will be searched in the block mapping table in RAM. Then, the page mapping sub-table for the requested BO can be obtained using the page table index in RAM. From the sub-table, we can get the physical page that stores the requested data.”, section 3.2.3, paragraph 5)
in order “to provide guaranteed space for write operations.” (section 3.2.1, paragraph 1)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jung with Wang such that each of the physical addresses includes a physical log block number and a physical data block number,
that the flash memory includes a plurality of physical log blocks, wherein each of the physical log blocks stores page mapping information between page indexes and physical page numbers, and
that the flash controller reads data based on a physical log block number or target physical data block number that is mapped to an address of the read request among the physical addresses, the page mapping information of a target physical log block indicated by the physical log block number mapped to the address of the read request, and a page index split from the address of the read request
in order “to provide guaranteed space for write operations.” (id.)
In regards to claim 22, Wang further teaches that the flash controller writes data of a write request generated by the processor to a physical log block indicated by a physical log block number that is mapped to an address of the write request among the physical addresses (“The primary block is first used to serve the write requests, and the buffer block will serve the pending write requests when the primary block is full, while the replacement block provides a space to reclaim the primary block.”, section 3.2.1, paragraph 1).
In regards to claim 23, Wang further teaches that mapping between a physical page number indicating a page of the physical log block to which the data of the write request is written and a page index split from the address of the write request is stored in the page mapping information of the physical log block indicated by the physical log block number mapped to the address of the write request (“In order to reduce the RAM cost, the page-level mapping table is divided into N small tables, and each small table is stored in the OOB area of the newly allocated page.”, section 3.2.1, paragraph 2).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Jung et al. (US 2017/0285968) in view of Wang et al. (“A Real-Time Flash Translation Layer for NAND Flash Memory Storage Systems”) and Pichai et al. (“Architectural Support for Address Translation on GPUs”).
In regards to claim 4, Jung in view of Wang teaches claim 3.  Jung in view of Wang fails to teach that a part of the table is buffered to a translation lookaside buffer (TLB) of the processor, and
wherein the TLB or the memory management unit translates the address of the memory request into the target physical address.
Pichai teaches that a part of the table is buffered to a translation lookaside buffer (TLB) of the processor (“Our goal is to provide to the GPU the same programmability benefits enjoyed by the CPU. This implies that GPU address translation must support physically-addressed caches. Therefore, we study GPU MMUs where TLBs are accessed in parallel with the L1 cache.”, section 3, paragraph 1), and
wherein the TLB or the memory management unit translates the address of the memory request into the target physical address (“Our goal is to provide to the GPU the same programmability benefits enjoyed by the CPU. This implies that GPU address translation must support physically-addressed caches. Therefore, we study GPU MMUs where TLBs are accessed in parallel with the L1 cache.”, section 3, paragraph 1)
in order to “make data structures and pointers globally visible among compute units, obviating the need for expensive memory copies between CPUs and accelerators” (section 1, paragraph 2).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jung with Wang and Pichai such that a part of the table is buffered to a translation lookaside buffer (TLB) of the processor, and
wherein the TLB or the memory management unit translates the address of the memory request into the target physical address
in order to “make data structures and pointers globally visible among compute units, obviating the need for expensive memory copies between CPUs and accelerators” (id.).
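For illustration only, buffering a part of the table in a TLB can be sketched as a small cache placed in front of the full mapping table. The least-recently-used replacement policy, the capacity, and the names `TLB` and `translate` are assumptions and are not drawn from Pichai.

```python
from collections import OrderedDict


class TLB:
    """Small LRU cache over a larger address mapping table (hypothetical model)."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # full table, as held by the memory management unit
        self.entries = OrderedDict()  # buffered subset of the table
        self.hits = 0
        self.misses = 0

    def translate(self, address):
        if address in self.entries:            # TLB hit: translation served locally
            self.entries.move_to_end(address)
            self.hits += 1
            return self.entries[address]
        self.misses += 1                       # TLB miss: fall back to the full table
        physical = self.page_table[address]
        self.entries[address] = physical
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return physical
```

Either path yields the same physical address; the TLB only changes where the translation is served from.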

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Jung et al. (US 2017/0285968) in view of Wang et al. (“A Real-Time Flash Translation Layer for NAND Flash Memory Storage Systems”) and Eggleston (US 2009/0013148).
In regards to claim 9, Jung in view of Wang teaches claim 5.  Jung in view of Wang fails to teach that each of the physical log blocks includes a row decoder, and
wherein the row decoder includes a programmable decoder for storing the page mapping information.
Eggleston teaches that each of the physical log blocks includes a row decoder (“In one example, row decoders for a memory device are not shared among the arrays. This permits independent row addressing of blocks of memory from the arrays. Such programming of a row decoder or an address translator to skip over bad blocks in an array and retain a contiguous mapping of good blocks can be readily accommodated at production test.”, paragraph 0034), and
wherein the row decoder includes a programmable decoder for storing the page mapping information (“In another embodiment, the particular mappings for an array are stored by programming a row decoder for the array.”, paragraph 0061)
in order “to skip over bad blocks in an array” (paragraph 0034).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jung with Wang and Eggleston such that each of the physical log blocks includes a row decoder, and
wherein the row decoder includes a programmable decoder for storing the page mapping information
in order “to skip over bad blocks in an array” (id.).

Claims 12-16 are rejected under 35 U.S.C. 103 as being unpatentable over Jung et al. (US 2017/0285968) in view of Oh et al. (“WASP: Selective Data Prefetching with Monitoring Runtime Warp Progress on GPUs”).
In regards to claim 12, Jung teaches claim 10.  Jung fails to teach a cache control logic that records an access history of a plurality of read requests, and predicts spatial locality of an access pattern of the read requests to determine a data block to be prefetched.  Oh teaches a cache control logic that records an access history of a plurality of read requests, and predicts spatial locality of an access pattern of the read requests to determine a data block to be prefetched (“The stride calculator stores the inter-warp stride for each static load. This module determines whether the currently calculated stride is regular by comparing it with the previously calculated stride. If the stride is regular, the prefetch request generator calculates the prefetching addresses for the target warps and generates the prefetch requests for them.”, section 4.2) in order to improve performance “by efficiently hiding the memory latency” (section 1, paragraph 3).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jung with Oh to include a cache control logic that records an access history of a plurality of read requests, and predicts spatial locality of an access pattern of the read requests to determine a data block to be prefetched in order to improve performance “by efficiently hiding the memory latency” (id.).
In regards to claim 13, Oh further teaches that the cache control logic predicts the spatial locality based on program counter addresses of the read requests (“To support prefetching for prefetching target warps, WASP employs the inter-warp stride prefetching scheme. WASP calculates the per-warp and per-PC stride, which is the memory address delta between the accesses of two neighbor warps executing the same load.”, section 4.1, paragraph 6).
In regards to claim 14, Oh further teaches that the cache control logic includes a predictor table including a plurality of entries indexed by program counter addresses (See figure 7b),
wherein each of the entries includes a plurality of fields that record information on pages accessed by a plurality of warps, respectively (“If a global load is executed, the stride calculator updates the warp ID and requested memory address in the stride table entry that has the same PC address. If there is no stride table entry with the PC address of the current load, this information is updated in a new entry.”, section 4.3.2, paragraph 1), and a counter field that records a counter corresponding to a number of times the pages recorded in the fields are accessed (“When a warp executes Nth loads, the comparator for each warp calculates load execution count-(N+k). k in Fig. 7a is a variable that is used to adjust the prefetching distance (details will be provided later).”, section 4.3.1, paragraph 2), and
wherein in a case where a cache miss occurs, when the counter of an entry indexed by a program counter address of a read request corresponding to the cache miss is greater than a threshold, the cache control logic prefetches a data block corresponding to the page recorded in the entry indexed by the program counter address (“On the other hand, if an early eviction of a prefetched cache line is detected, k is decreased by one. Each cache line has a bit field to detect early evictions. … When a regular memory request misses the L1 cache, an early eviction is detected if the bit of the currently selected victim line is true.”, section 4.3.1, paragraph 3).
In regards to claim 15, Oh further teaches that the counter increases when an incoming read request accesses a same page as the page recorded in the fields of a corresponding entry (“If a redundant prefetch is detected, k is incremented by one to increase the prefetching distance. When a prefetch request hits the L1 cache or merges in an MSHR, it is considered to be a redundant prefetch.”, section 4.3.1, paragraph 3), and decreases when an incoming read request accesses a different page from the page recorded in the fields of the corresponding entry (“On the other hand, if an early eviction of a prefetched cache line is detected, k is decreased by one. Each cache line has a bit field to detect early evictions. If a cache line has recently been accessed by a regular memory request, the bit in the corresponding line is set to false. If a cache line is fetched by a prefetch request, the bit for that line is set to true. When a regular memory request misses the L1 cache, an early eviction is detected if the bit of the currently selected victim line is true.”, section 4.3.1, paragraph 3).
In regards to claim 16, Oh further teaches that the cache control logic tracks data access status in the cache and dynamically adjusts a granularity of prefetch based on the data access status (“By dynamically adjusting k, the prefetching distance can be controlled. At the beginning of a GPU kernel, this value is initially set to the minimum value (1) and dynamically increased or decreased according to the results of prefetches.”, section 4.3.1, paragraph 3).
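For illustration only, the adjustment of the prefetching distance k described in the passages of Oh quoted above (section 4.3.1, paragraph 3) can be sketched as follows. This is a minimal sketch and not Oh's implementation: the class and method names are hypothetical, and the upper bound `k_max` is an assumption, since the quoted passages state only that k starts at the minimum value (1).

```python
class PrefetchDistanceController:
    """Hypothetical model of the dynamic adjustment of k."""

    def __init__(self, k_min: int = 1, k_max: int = 8) -> None:
        self.k_min = k_min
        self.k_max = k_max  # assumed cap; not stated in the quoted text
        self.k = k_min      # k starts at the minimum value for each kernel

    def on_prefetch(self, hit_l1: bool, merged_in_mshr: bool) -> None:
        # A prefetch that hits the L1 cache or merges in an MSHR is
        # redundant, so the prefetching distance is increased by one.
        if hit_l1 or merged_in_mshr:
            self.k = min(self.k + 1, self.k_max)

    def on_demand_miss(self, victim_prefetch_bit: bool) -> None:
        # On a regular (demand) miss, a victim line whose bit is still
        # true was prefetched but never accessed: an early eviction,
        # so the prefetching distance is decreased by one.
        if victim_prefetch_bit:
            self.k = max(self.k - 1, self.k_min)


def line_bit_after_access(is_prefetch: bool) -> bool:
    # Per-line bit: true when filled by a prefetch request, false once
    # the line is accessed by a regular memory request.
    return is_prefetch
```

In this sketch the distance never drops below its starting value, which matches the quoted statement that k is initially set to the minimum value.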

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Jung et al. (US 2017/0285968) in view of Tavakkol et al. (“Network-on-SSD: A Scalable and High-Performance Communication Design Paradigm for SSDs”).
In regards to claim 19, Jung teaches claim 10.  Jung fails to teach that the flash memory includes a plurality of flash planes, 
wherein the internal register includes a plurality of flash registers included in the flash planes, and
wherein a flash register group including the flash registers operates as the write buffer.
Tavakkol teaches that the flash memory includes a plurality of flash planes (“Each NAND flash package consists of multiple dies and each die corresponds to multiple planes containing the actual physical memory pages inside (e.g. 2, 4, or 8KB).”, section 2, paragraph 1), 
wherein the internal register includes a plurality of flash registers included in the flash planes (“To support parallelism inside a die, each plane includes an embedded data register to hold data when a read or write request is issued.”, section 2, paragraph 1), and
wherein a flash register group including the flash registers operates as the write buffer (“To support parallelism inside a die, each plane includes an embedded data register to hold data when a read or write request is issued.”, section 2, paragraph 1)
“[t]o support parallelism inside a die” (id.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jung with Tavakkol such that the flash memory includes a plurality of flash planes, 
wherein the internal register includes a plurality of flash registers included in the flash planes, and
wherein a flash register group including the flash registers operates as the write buffer
“[t]o support parallelism inside a die” (id.).

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Jung et al. (US 2017/0285968) in view of Tavakkol et al. (“Network-on-SSD: A Scalable and High-Performance Communication Design Paradigm for SSDs”) and Jeong et al. (“A Technique to Improve Garbage Collection Performance for NAND Flash-based Storage Systems”).
In regards to claim 20, Jung teaches claim 10.  Jung fails to teach that the flash memory includes a plurality of flash planes including a first flash plane and a second flash plane, 
wherein each of the flash planes includes a plurality of flash registers, 
wherein at least one flash register among the flash registers included in each of flash planes is assigned as a data register, 
wherein the write data is stored in a target flash register among the flash registers of the first flash plane, and
wherein when the write data stored in the target flash register is written to a data block of the second flash plane, the write data moves from the target flash register to the data register of the second flash plane, and is written from the data register of the second flash plane to the second flash plane.
Tavakkol teaches that the flash memory includes a plurality of flash planes including a first flash plane and a second flash plane (“Each NAND flash package consists of multiple dies and each die corresponds to multiple planes containing the actual physical memory pages inside (e.g. 2, 4, or 8KB).”, section 2, paragraph 1), 
wherein each of the flash planes includes a plurality of flash registers (“To support parallelism inside a die, each plane includes an embedded data register to hold data when a read or write request is issued.”, section 2, paragraph 1), 
wherein at least one flash register among the flash registers included in each of flash planes is assigned as a data register (“To support parallelism inside a die, each plane includes an embedded data register to hold data when a read or write request is issued.”, section 2, paragraph 1), and
wherein the write data is stored in a target flash register among the flash registers of the first flash plane (“To support parallelism inside a die, each plane includes an embedded data register to hold data when a read or write request is issued.”, section 2, paragraph 1)
“[t]o support parallelism inside a die” (id.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jung with Tavakkol such that the flash memory includes a plurality of flash planes including a first flash plane and a second flash plane, 
wherein each of the flash planes includes a plurality of flash registers, 
wherein at least one flash register among the flash registers included in each of flash planes is assigned as a data register, and
wherein the write data is stored in a target flash register among the flash registers of the first flash plane
“[t]o support parallelism inside a die” (id.).
Jung in view of Tavakkol fails to teach that when the write data stored in the target flash register is written to a data block of the second flash plane, the write data moves from the target flash register to the data register of the second flash plane, and is written from the data register of the second flash plane to the second flash plane.  Jeong teaches that when the write data stored in the target flash register is written to a data block of the second flash plane, the write data moves from the target flash register to the data register of the second flash plane, and is written from the data register of the second flash plane to the second flash plane (“LCB is a copy-back between different planes in the same device. Unlike ICB, it requires that the page data be loaded into a data register for programming.”, section III(A), paragraph 2).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jung with Tavakkol and Jeong such that when the write data stored in the target flash register is written to a data block of the second flash plane, the write data moves from the target flash register to the data register of the second flash plane, and is written from the data register of the second flash plane to the second flash plane in order to reduce I/O bus traffic.

Allowable Subject Matter
Claims 17 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  The prior art fails to teach that “the cache control logic increases an evict counter when each cache line is evicted, determines whether to increase an unused counter based on values of the first and second bits corresponding to each cache line, and adjusts the granularity of prefetch based on the evict counter and the unused counter” in conjunction with the other claim limitations, nor would it have been obvious.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Lim (US 7,484,047) teaches a coprocessor that controls access to memory.  Boyd (US 2014/0325098) teaches staging data from disk in an accelerator.  Rogers (US 2018/0359318) teaches an SSD with a GPU and DSP.  Cho (US 2020/0159584) teaches a storage device with an accelerator.  Qureshi (US 2021/0177333) teaches direct access from a GPU to non-volatile memory.  Zhuang et al. ("A Hardware-based Cache Pollution Filtering Mechanism for Aggressive Prefetches") teaches prefetch filtering based on a prefetch indicator bit and a reference indicator bit.  Lee et al. ("Many-Thread Aware Prefetching Mechanisms for GPGPU Applications") teaches throttling based on prefetch ratios.  Cho et al. ("XSD: Accelerating MapReduce by Harnessing the GPU inside an SSD") teaches embedding a GPU in a flash controller.  Zhang et al. ("FlashGPU: Placing New Flash Next to GPU Cores") teaches flash integrated into a GPU.
The other art made of record and not relied upon is considered pertinent to applicant's disclosure.  Jung (US 10,824,341) and Jung (US 10,831,376) teach related inventions.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NATHAN SADLER whose telephone number is (571)270-7699. The examiner can normally be reached Monday - Friday 9am - 6pm.
Examiner interviews are available via telephone, in person, and by video conference using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Reginald Bragdon can be reached on (571)272-4204. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Nathan Sadler/
Primary Examiner, Art Unit 2139
7 November 2022