Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 4/19/2021 has been entered.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 2, 11, and 18 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Objections
Claim 1 is objected to because of the following informalities:
In claim 1: “the plurality contiguous addresses” should be “the plurality of contiguous addresses”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 9, 11, 13, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Godard US 2015/0106567 in view of Schmidt’s “A Case for Hardware-Supported Sub-Cache Line Accesses”.
[CLM 1]
1. (Currently Amended) A computing device, comprising:
a processor configured to access data using memory addresses in an address space;
a first memory configured to store a block of data at a block of contiguous addresses in the space of memory address; and
a second memory configured to cache a first portion of the block of data identified by an item selection vector, wherein the item selection vector has a sequence of bits corresponding to a plurality of contiguous addresses for the block of data, and a count of bits in the item selection vector corresponds to a count of the plurality [of] contiguous addresses for the block of data;
wherein the computing device is configured to communicate the first portion of the block of data from the first memory to the second memory according to the item selection vector, in response to a request to cache the block of data stored in the first memory;
wherein the computing device communicates the first portion from the first memory to the second memory without communicating a second portion of the block of data in response to the request;
wherein the second memory is configured to store tag information identifying the block of contiguous addresses among a plurality of blocks of contiguous addresses.

Godard US 2015/0106567 teaches:
a processor configured to access data using memory addresses in an address space;
CPU 102 [Fig. 1] uses addresses to access data
Use of addresses in an address space [0080-0081; 0084]
a first memory configured to store a block of data at a block of contiguous addresses in the space of memory address; and
A main memory storing cache lines, e.g. memory 101 [Fig. 1]
Cache lines or blocks as a unit of accessible memory [0040], larger than typical program access sizes [0041]
a second memory configured to cache a first portion of the block of data identified by an item selection vector, wherein the item selection vector has a sequence of bits corresponding to a plurality of contiguous addresses for the block of data, and a count of bits in the item selection vector corresponds to a count of the plurality [of] contiguous addresses for the block of data;
Cache may read or write at granularities smaller than a cache line; e.g., byte-sized portions of cache lines from memory 101, specified by a base address and a byte mask vector [read request processing at 0042; write request processing at 0045].
The byte mask comprises a sequence of bits each corresponding to a portion of the cache line identified by the base address, each bit identifying for data at addresses within that cache line whether the corresponding byte of data is to be accessed. The number of bits in the mask corresponds to the number of bytes in a cache line [0042].
wherein the computing device is configured to communicate the first portion of the block of data from the first memory to the second memory according to the item selection vector, in response to a request to cache the block of data stored in the first memory;
Only the bytes in the cache line which are indicated to be read by the mask are transmitted between levels in the memory hierarchy, based on the mask (“load request is processed by the top level cache of the hierarchical memory system, looking for one or more valid bytes of the requested cache line corresponding to the target address of the load request. The valid byte(s) of the cache line corresponding to the byte mask as stored in cache can be identified by reading out the valid bit(s) and data byte(s) stored by the cache for putative matching cache lines for those data bytes that are specified by the byte mask of the load request, while ignoring the valid bit(s) and data byte(s) for such putative matching cache lines for those data bytes that are not specified by the byte mask of the load request.” [0042]).
When the request misses any bytes in the current level, the byte mask is updated to reflect only those missing bytes, and those bytes are loaded from a lower level memory, such as the main memory (“If any bits remain set in the byte mask, this indicates that one or more bytes desired by the load request have not yet been satisfied. In this case, a load request is issued to the next lower level cache of the memory hierarchy employing the updated byte mask. The next lower level cache…can repeat these operations to check for storage of the remaining bytes as specified by the byte mask of the load request…In the event that requested bytes remain unsatisfied after checking all cache levels, then the request cache line can be read the line from main memory (a cache fill) in order to satisfy the remaining desired bytes from the cache line. Thus a single load request may be satisfied by bytes obtained from several different caches and/or main memory.” [0044])
“If not found in the next lower level, the load request is lowered further down the memory hierarchy until satisfied.” [0094]
wherein the computing device communicates the first portion from the first memory to the second memory without communicating a second portion of the block of data in response to the request;
	Transmitting only the remaining bytes between levels of the memory hierarchy [0044], e.g. from main memory [0047].
	Some accesses to main memory may also be at sub-cache line granularity [0050].
wherein the second memory is configured to store tag information identifying the block of contiguous addresses among a plurality of blocks of contiguous addresses.
	Caches contain tags for distinguishing each cache line from other cache lines [0070].

	Hence, Godard discloses a system and technique for transferring only missing data elements (e.g., bytes) between different levels of a memory hierarchy by use of a mask.
	Where a first memory is construed as a lower level memory in the memory hierarchy, Godard appears to read on the instant claims.
	

	However, similar practices for accessing main memory at granularities smaller than a cache line were known. See Schmidt, Fig. 1 and [P2, C2], for reducing memory bandwidth usage by transmitting data in units smaller than cache lines between main memory and the caches. Hence, Schmidt suggests that techniques for transferring data in units smaller than a cache line, such as the byte mask of Godard, between caches and main memory would improve the memory bandwidth usage.
	Hence, it would have been obvious to the skilled artisan before the effective filing date of the claimed invention to employ known sub-cache line access techniques such as the byte mask employed by Godard to transfers between caches and main memory as suggested by Schmidt, for the purpose of reducing wasted memory bandwidth associated with traditional loads, thereby improving memory system performance.
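For illustration only, the masked load processing cited from Godard [0042; 0044] may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from either reference: the names `Level`, `load`, and `LINE_SIZE`, the 8-byte line, and the choice to fill only the top level cache are all hypothetical.

```python
LINE_SIZE = 8  # assumed bytes per cache line, kept small for readability


class Level:
    """One hypothetical cache level that may hold partially valid lines."""

    def __init__(self, name):
        self.name = name
        self.lines = {}  # base address -> {byte offset: byte value}

    def lookup(self, base, mask):
        """Return the requested bytes held here and the mask of bytes still missing."""
        stored = self.lines.get(base, {})
        found = {}
        remaining = 0
        for offset in range(LINE_SIZE):
            bit = 1 << offset
            if not mask & bit:
                continue  # byte not selected by the mask, so it is ignored
            if offset in stored:
                found[offset] = stored[offset]
            else:
                remaining |= bit
        return found, remaining


def load(levels, main_memory, base, mask):
    """Satisfy a masked load from the top level down to main memory."""
    result = {}
    transferred = []  # (source name, number of bytes supplied)
    for level in levels:
        if mask == 0:
            break
        found, mask = level.lookup(base, mask)  # mask now covers missing bytes only
        result.update(found)
        if found:
            transferred.append((level.name, len(found)))
    if mask:
        # Only the bytes still selected by the updated mask are read from memory.
        count = 0
        for offset in range(LINE_SIZE):
            if mask & (1 << offset):
                result[offset] = main_memory[base + offset]
                count += 1
        transferred.append(("memory", count))
    if levels:
        # Assumed fill policy: cache the gathered bytes in the top level only.
        levels[0].lines.setdefault(base, {}).update(result)
    return result, transferred
```

In this sketch a single request may be satisfied by bytes from several levels, and the unselected bytes of the line are never moved.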

[CLM 9]
9. (Original) The computing device of claim 1, wherein the item selection vector has a list of indices identifying the portion of the first portion of the block of data.
	The combination teaches claim 1, wherein the item selection vector has a list of indices identifying the portion of the first portion of the block of data (“index list containing the offsets to the actual values” [Schmidt, P2, C2]).
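For illustration only, the relationship between a bit-per-byte mask and a list of indices may be sketched as below. The helper names `mask_to_indices` and `indices_to_mask` are hypothetical and are not drawn from Godard or Schmidt; the sketch assumes bit i of the mask selects byte i of the line.

```python
def mask_to_indices(mask, line_size):
    """List the byte offsets whose bits are set in the mask (bit i selects byte i)."""
    return [i for i in range(line_size) if mask & (1 << i)]


def indices_to_mask(indices):
    """Rebuild the bit mask from a list of byte offsets."""
    mask = 0
    for i in indices:
        mask |= 1 << i
    return mask
```

Under this assumption the two forms carry the same selection information, differing only in encoding.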

[CLM 11]
Claim 11 is rejected on similar grounds as claim 1, as it is the method performed by the apparatus of claim 1.

[CLM 13]
The combination teaches claim 11, wherein
the plurality of blocks of contiguous memory addresses have a same size; and
Cache lines may have a set size [Godard, 0012-0013; 0040]
the method further comprises:
storing in the second memory tag information identifying the block of contiguous memory addresses among a plurality of blocks of contiguous memory addresses; and
	Caches contain tags for distinguishing each cache line from other cache lines [Godard, 0070; Fig. 5A].
caching different blocks in the plurality of blocks in different cache blocks in the second memory.
	Caches store cache lines in different cache line slots in the cache, see e.g. plurality of cache lines stored in each way of the set-associative cache [Godard, Fig. 5A].
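For illustration only, the use of tags to distinguish same-sized blocks stored in different slots of a set-associative cache may be sketched as follows. The line size, set count, way count, oldest-first replacement, and the names `split_address` and `SetAssociativeCache` are assumptions for this sketch and do not describe Godard's Fig. 5A in detail.

```python
LINE_SIZE = 64   # assumed bytes per cache line
NUM_SETS = 128   # assumed number of sets


def split_address(addr):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr % LINE_SIZE
    line_number = addr // LINE_SIZE
    set_index = line_number % NUM_SETS
    tag = line_number // NUM_SETS
    return tag, set_index, offset


class SetAssociativeCache:
    """Hypothetical cache in which each set holds up to `ways` (tag, data) entries."""

    def __init__(self, ways=2):
        self.ways = ways
        self.sets = [[] for _ in range(NUM_SETS)]

    def insert(self, addr, data):
        tag, set_index, _ = split_address(addr)
        entries = self.sets[set_index]
        entries[:] = [e for e in entries if e[0] != tag]  # replace any stale copy
        if len(entries) >= self.ways:
            entries.pop(0)  # assumed policy: evict the oldest entry
        entries.append((tag, data))

    def lookup(self, addr):
        tag, set_index, _ = split_address(addr)
        for stored_tag, data in self.sets[set_index]:
            if stored_tag == tag:  # the tag identifies which block occupies the slot
                return data
        return None
```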


[CLM 18]
18. The method of claim 11, wherein the communicating of the first portion of the block of data from the first memory to the second memory comprises:
transmitting the item selection vector to a controller of the first memory;
retrieving the first portion of the block of data from the first memory according to the item selection vector; and
transmitting the first portion of the block of data in a batch to the second memory.  
wherein the communicating of the first portion of the block of data from the first memory to the second memory comprises:
transmitting the item selection vector to a controller of the first memory;
	Load request to cache.
	“a processor is configured with execution logic that includes a load unit that executes load operations…The execution of a given load operation involves the generation of a load request this communicated to the hierarchical memory system. The load request includes an address specifying a requested cache line as well as a mask (referred to herein as a ‘byte mask’) that includes a number of bits each corresponding to a different byte of the requested cache line…The load request is processed by the top level cache of the hierarchical memory system” [Godard, 0042]
	Load request to subsequent caches and main memory after missing the top level cache.
	“If any bits remain set in the byte mask, this indicates that one or more bytes desired by the load request have not yet been satisfied. In this case, a load request is issued to the next lower level cache of the memory hierarchy employing the updated byte mask. The next lower level cache of the memory hierarchy can repeat these operations to check for storage of the remaining bytes as specified by the byte mask of the load request. In the event that requested bytes remain unsatisfied after checking all cache levels, then the request cache line can be read the line from main memory (a cache fill) in order to satisfy the remaining desired bytes from the cache line.” [Godard, 0044]

retrieving the first portion of the block of data from the first memory according to the item selection vector; and
	Reading a number of bytes from the main memory according to the byte mask [Godard, 0044], as suggested by Schmidt [P2, C2; Fig. 1].

transmitting the first portion of the block of data in a batch to the second memory.
	Reading the portion from main memory in a cache line-sized batch [Schmidt, P2, C2; Fig. 1].

	It would have been obvious to the skilled artisan before the effective filing date of the claimed invention to read and transmit the portion of the cache line in a batch including other data as a cache line, as disclosed by Schmidt, in order to avoid partial line insertions from main memory, thereby obviating the need for more complex circuitry in lower level caches, e.g. the L3 [Schmidt, P2, C2].
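For illustration only, forming a cache line-sized batch from several sub-cache line reads before insertion may be sketched as below. This is a sketch under stated assumptions rather than Schmidt's implementation: the 16-byte line, zero padding, the function name `gather_batch`, and the choice to record offsets as positions within the batch are all hypothetical.

```python
LINE_SIZE = 16  # assumed bytes per cache line


def gather_batch(memory, requests):
    """Pack several sub-line reads into one line-sized batch.

    `requests` is a list of (address, length) pairs. Returns the padded batch
    and the offset of each gathered piece within the batch.
    """
    batch = bytearray()
    offsets = []
    for addr, length in requests:
        if len(batch) + length > LINE_SIZE:
            raise ValueError("requests exceed one line-sized batch")
        offsets.append(len(batch))
        batch += memory[addr:addr + length]
    # Pad to a full line so the cache only ever receives complete lines.
    batch += bytes(LINE_SIZE - len(batch))
    return bytes(batch), offsets
```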
 

[CLM 19]
19. A non-transitory computer storage medium storing instructions which when executed on in a computing system, cause the computing system to perform a method, the method comprising: 
storing, in a first memory of the computing system, a block of data at a block of contiguous memory addresses in an address space;
accessing, by a processor of the computing system, data using memory addresses in the address space; and
in response to a request to cache the block of data stored in the first memory, communicating a first portion of the block of data from the first memory to a second memory of the computing system according to an item selection vector without accessing a second portion of the block of data; and
caching, in the second memory of the computing system, the first portion of the block of data identified by the item selection vector.  
	Claim 19 is rejected on similar grounds as claim 1, as it is the medium embodying the method performed by the apparatus of claim 1. It is considered that various computer storage media were well-known, e.g. optical disks or CDs.


Claims 4-5 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over the combination as applied to claim 1 above, and further in view of Busaba US 2015/0089159.
[CLM 4]
4. (Previously Presented) The computing device of claim 1, wherein the different cache blocks in the second memory have different sizes. 
	The combination teaches claim 1, but is silent to wherein the different cache blocks in the second memory have different sizes.
Where the combination is silent, Busaba teaches wherein the different cache blocks in the second memory have different sizes (“Cache lines in a multi-processor computing environment are configurable with a coherency mode. Cache lines in full-line coherency mode are operated or managed with full-line granularity. Cache lines in sub-line coherency mode are operated or managed as sub-cache line portions of a full cache line. Each cache is associated with a directory having a number of directory entries and with a side table having a smaller number of entries. The directory entry for a cache line associates the cache line with a tag and a set of full-line descriptive bits. Creating a side table entry for the cache line places the cache line in sub-line coherency mode. The side table entry associates each of the sub-cache line portions of the cache line with a set of sub-line descriptive bits. Removing the side table entry may return the cache line to full-line coherency mode.” [Abstract]).
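	Specifically, Busaba discloses devices and methods to manage a cache with variable line-sizes [0016]. Having only one fixed cache line size is known to have advantages when the spatial locality of accesses is high; however, in other cases it would be beneficial to support accesses which use smaller cache lines [0222-0223], e.g. by using a configurable cache line size whereby cache lines of different sizes may coexist in the cache [0256-0257], so the unit of transfer between the cache and the memory may be smaller and hence bus or network traffic for invalidations of shared data may be reduced [0225].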
	Hence, it would have been obvious to the skilled artisan before the effective filing date to incorporate Busaba’s techniques for supporting cache accesses at variable granularities to the caches and logic of the combination for the purpose of reducing the need for invalidations of shared data.

[CLM 5]
5. (Previously Presented) The computing device of claim 1, wherein the different cache blocks in the second memory have a same size but have different sizes of cached portions of data from the different blocks in the first memory.
	The combination teaches claim 1, wherein the different cache blocks in the second memory have a same size but have different sizes of cached portions of data from the different blocks in the first memory.
Under a broadest reasonable interpretation, having a plurality of cache lines with the same cache line size where the portions of data in different cache lines may be sourced from a different page in DRAM reads as claimed, at least because “different sizes of cached portions” may refer to the size of the group of cached portions from each respective source block in the main memory. Under such an interpretation, if one cache line contains one portion of data from one page, and another contains two portions of data from one page, the cache line would read on the claims because there is one cache line that has gathered more data from one page than the other cache line. The combination permits such cache lines comprising different amounts of data from different cache lines [Schmidt, Fig. 1].
(“Scalar sub-cache line memory accesses lead to partially valid lines in caches complicating the cache design. However, SCG forms full cache lines via several sub-cache line accesses before inserting the data which eliminates the partial line issue” [Schmidt, P2, C2]).

[CLM 14]
	Claim 14 is rejected on similar grounds as claim 4, as it is the method performed by the apparatus of claim 4.
[CLM 15]
	Claim 15 is rejected on similar grounds as claim 5, as it is the method performed by the apparatus of claim 5.

Claims 6-7, 10 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over the combination as applied to claim 1 above, in further view of Alexander US 8,176,252.
[CLM 6]
6. (Previously Presented) The computing device of claim 1, wherein each of the different cache blocks stores a separate item selection vector.  
The combination teaches claim 1.
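Where the combination is silent, Alexander discloses storing an item selection vector in each cache block [Fig. 9]. Specifically, a cache line includes at least a tag and a number of SGL element addresses indicating the addresses of data elements to be targeted. An SGL is a list of elements which together define a block of memory to be transferred, where each chunk may be in various locations and of varying sizes in the memory [C1, L60-67]. By including the targeting information in each cache line, it is possible to allow a DMA controller to independently obtain information for and perform scatter and gather operations [C1, L23-42] to variable-sized portions of data in memory (“Arbitrary fragment size and alignment…SG cache maintains and stores the context and the memory space location of the fragments involved in an SG list” [C12, L59-67]), as well as to perform prefetching [C12, L15-22].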
	Accordingly, it would have been obvious to the skilled artisan before the effective filing date to apply Alexander’s practice of including SG list data in each cache line in order to support the transfer and organization of data portions of arbitrary size, and further to improve DMA performance by providing the memory controller additional means of identifying data to prefetch [C13, L15-22]. 

[CLM 7]
7. (Original) The computing device of claim 6, wherein item selection vectors of the different cache blocks have different sizes.
The combination teaches claim 6, wherein item selection vectors of the different cache blocks have different sizes (“every cache line 202 holds information on up to four SG elements in the SG list” [C9, L29-36]).

[CLM 10]
10. (Original) The computing device of claim 1, wherein the item selection vector has a list of index pairs, each identifying a range of the block of contiguous addresses in the space of memory address.
	The combination teaches claim 1. The combination further teaches that data units smaller than a cache line may be identified using indices [Schmidt, P2, C2]. The skilled artisan would recognize that a data element may be demarcated by index information comprising a starting address and element size 
	However, alternative methods of identifying a data element, e.g. a start and end address constituting an index pair, were known in the art. Other prior art evidences alternative methods of specifying a particular data element’s boundaries.
Where the combination is silent to an item selection vector specifically using index pairs to identify a memory block, Alexander teaches cache lines containing an item selection vector, where each portion of a cache line is identified using a start and end index (Start Offset and End Offset) [Fig. 6], from which values such as a length of the data fragment may be computed [C10, L6-25]. By providing complete information to a DMA controller for identifying the boundaries of a data element, support for independent operation of the DMAC for accessing the variable size chunks is provided [Alexander, C1, L60-67].
	It would have been obvious to the skilled artisan before the effective filing date to substitute other known structures, e.g. the pair of starting and ending indices for identifying the bounds of a memory portion as disclosed by Alexander, to identify the bounds of portions of the cache line in the combination for the purpose of supporting offloading of memory access processing to a DMA controller. Further, the results would have been predictable because the alternative methods are each performing the same function as designed – denoting the bounds of a data element.
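For illustration only, the equivalence between a bit mask and a list of start and end index pairs, and the computation of a length from such a pair, may be sketched as below. The names `mask_to_ranges` and `range_length` are hypothetical, the end index is assumed to be inclusive, and the sketch is not a description of Alexander's Start Offset and End Offset fields beyond what is cited above.

```python
def mask_to_ranges(mask, line_size):
    """Convert a bit-per-byte mask into (start, end) pairs with inclusive ends."""
    ranges = []
    start = None
    for i in range(line_size):
        selected = bool(mask & (1 << i))
        if selected and start is None:
            start = i  # a new run of selected bytes begins here
        elif not selected and start is not None:
            ranges.append((start, i - 1))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, line_size - 1))  # run extends to the end of the line
    return ranges


def range_length(pair):
    """Length in bytes of an inclusive (start, end) pair."""
    start, end = pair
    return end - start + 1
```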

[CLM 16]
16. The method of claim 13, further comprising:
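storing a separate item selection vector for each of the different cache blocks.
	Claim 16 is rejected on similar grounds as claim 6, as it is the method performed by the apparatus of claim 6.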


[CLM 17]
17. The method of claim 16, wherein item selection vectors of the different cache blocks have different sizes.  
	Claim 17 is rejected on similar grounds as claim 7, as it is the method performed by the apparatus of claim 7.


Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over the combination as applied to claim 19 above, and further in view of Alexander and Busaba.
[CLM 20]
20. The non-transitory computer storage medium of claim 19, wherein the method further comprises: 
caching data from different blocks of the first memory of a same size in different cache blocks of different sizes in the second memory;
storing tag information for the different cache blocks to identify the different blocks in the first memory respectively; and
storing different item selection vectors for the different cache blocks respectively.
	
The combination teaches claim 19, and further teaches:
storing tag information for the different cache blocks to identify the different blocks in the first memory respectively; and
Tag information stored with each cache line [Godard, Fig. 5A].
storing different item selection vectors for the different cache blocks respectively, 

Where the combination is silent, Alexander discloses storing different item selection vectors for the different cache blocks respectively (storing an item selection vector in each cache block for the respective cache line [Fig. 9]; “every cache line 202 holds information on up to four SG elements in the SG list” [C9, L29-36]). Hence, a cache line includes at least a tag and a number of SGL element addresses indicating the addresses of data elements to be targeted. An SGL is a list of elements which together define a block of memory to be transferred, where each chunk may be in various locations and of varying sizes in the memory [C1, L60-67]. By including the targeting information in each cache line, it is possible to allow a DMA controller to independently obtain information for and perform scatter and gather operations [C1, L23-42] to variable-sized portions of data in memory (“Arbitrary fragment size and alignment…SG cache maintains and stores the context and the memory space location of the fragments involved in an SG list” [C12, L59-67]), as well as to perform prefetching [C12, L15-22].
	Accordingly, it would have been obvious to the skilled artisan before the effective filing date to apply Alexander’s practice of including SG list data in each cache line in order to support the transfer and organization of data portions of arbitrary size, and further to improve DMA performance by providing the memory controller additional means of identifying data to prefetch [C13, L15-22].

Where the combination is silent to caching data from different blocks of the first memory of a same size in different cache blocks of different sizes in the second memory,
Busaba teaches wherein the different cache blocks in the second memory have different sizes (“Cache lines in a multi-processor computing environment are configurable with a coherency mode. Cache lines in full-line coherency mode are operated or managed with full-line granularity. Cache lines in sub-line coherency mode are operated or managed as sub-cache line portions of a full cache line. Each cache is associated with a directory having a number of directory entries and with a side table having a smaller number of entries. The directory entry for a cache line associates the cache line with a tag and a set of full-line descriptive bits. Creating a side table entry for the cache line places the cache line in sub-line coherency mode. The side table entry associates each of the sub-cache line portions of the cache line with a set of sub-line descriptive bits. Removing the side table entry may return the cache line to full-line coherency mode.” [Abstract]).
	Specifically, Busaba discloses devices and methods to manage a cache with variable line-sizes [0016]. Having only one fixed cache line size is known to have advantages when the spatial locality of accesses is high, however in other cases it would be beneficial to support accesses which use smaller cache lines [0222-0223], e.g. by using a configurable cache line size whereby cache lines of different sizes may coexist in the cache [0256-0257], so the unit of transfer between the cache and the memory may be smaller and hence bus or network traffic for invalidations of shared data may be reduced [0225].

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HEWY H LI whose telephone number is (571)272-8714.  The examiner can normally be reached on Mon-Fri 10-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones, can be reached on (571)272-4085.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/HEWY H LI/Examiner, Art Unit 2136        

/CHARLES RONES/Supervisory Patent Examiner, Art Unit 2136