DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 
Claims 1, 4-8, 11, 13-14, 16-20, and 22 are pending in this office action and presented for examination. Claims 1, 4-8, 11, 13-14, 17-19, and 22 are newly amended, and claims 2, 10, and 12 are cancelled, by the response received October 27, 2022.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 11, 13-14, 16-20, and 22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 11 recites the limitation “search for buffered data that comprises each of the plurality of first parts of data based on an address interval of each of the plurality of first parts of data and an address interval of the buffered data” in lines 10-12. However, it is indefinite as to whether data in a buffer is searched to determine whether the data comprises each of (i.e., all of) the plurality of first parts of data, or whether another interpretation is intended, such as one in which each search is for a different first part of data, but not for all first parts of data.
Claims 13-14 and 16-20 are rejected for failing to alleviate the rejection of claim 11 above.

Claim 22 recites the limitation “search for buffered data that comprises each of the plurality of first parts of data based on an address interval of each of the plurality of first parts of data and an address interval of the buffered data” in lines 13-15. However, it is indefinite as to whether data in a buffer is searched to determine whether the data comprises each of (i.e., all of) the plurality of first parts of data, or whether another interpretation is intended, such as one in which each search is for a different first part of data, but not for all first parts of data.
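
For illustration only, the two readings identified above for claims 11 and 22 can be contrasted with a short sketch. The half-open address-interval representation, the function names, and the sample intervals are all hypothetical and are not taken from the claims or the specification; the sketch merely shows that the two readings can give different results for the same buffer contents.

```python
from typing import List, Tuple

# Hypothetical representation: a half-open byte-address interval [start, end).
Interval = Tuple[int, int]


def contains(outer: Interval, inner: Interval) -> bool:
    """True when the interval `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def single_entry_holds_all(buffered: List[Interval], parts: List[Interval]) -> bool:
    # Reading A: one item of buffered data must comprise all of the first parts.
    return any(all(contains(b, p) for p in parts) for b in buffered)


def each_part_found_separately(buffered: List[Interval], parts: List[Interval]) -> bool:
    # Reading B: each first part is searched for on its own, and different
    # first parts may be found in different items of buffered data.
    return all(any(contains(b, p) for b in buffered) for p in parts)


# Made-up buffer contents and first parts.
buffered = [(0x00, 0x08), (0x10, 0x18)]
parts = [(0x02, 0x08), (0x10, 0x12)]

print(single_entry_holds_all(buffered, parts))      # False
print(each_part_found_separately(buffered, parts))  # True
```

With these sample intervals the search fails under the first reading and succeeds under the second, which is why the scope of the limitation depends on the reading adopted.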

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-8, 11, 13-14, 16-20, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Sheaffer (US 20120047311 A1) in view of Batley (US 20170109165 A1).
Consider claim 1, Sheaffer discloses an instruction executing method, comprising: receiving an address-unaligned data load instruction, the address-unaligned data load instruction instructing to read target data from a memory ([0038], lines 5-8, instruction 510 that indicates a load operation to access arrays that are contiguous in the virtual address space but are misaligned with the boundary of a cache memory line and/or a page memory); acquiring a first part of data of the target data from a buffer ([0025], lines 5-7, the stored data of the preceding cache memory line of the particular cache memory line in the stored data array 320), comprising: for the first part of data of the target data, searching for buffered data that comprises the first part of data based on an address interval of the first part of data and an address interval of the buffered data (Sheaffer, [0024], lines 6-9, in one embodiment of the invention, the respective addresses stored in the tag array 325 are the addresses of the cache memory lines that are stored in the stored data array 320); acquiring a second part of data of the target data from the memory ([0025], lines 3-5, the incoming data of a particular cache memory line 310 from the L1 data cache memory 250), comprising: for the second part of data of the target data, accessing the memory based on an address of the second part of data and a bit width of the memory to obtain the second part of data, wherein the second part of data is located in a bit width of the memory (FIG. 4A, [0029], lines 6-8, each cache memory line of the L1 data cache memory 250 is assumed to have a data width of 64 bytes (as an example); [0030], lines 1-7, the cache memory line split access logic 235 is assumed to receive an instruction or request that requires 48 bytes of data from the cache memory line n-1 402 and 16 bytes of data from the cache memory line n 404. 
The stored data array 320 is assumed to store the data of the cache memory line n-1 402 during a prior misaligned cache memory access); and merging the first part of data and the second part of data to obtain the target data ([0025], lines 1-9, when the cache memory line split access logic 235 receives a non-aligned cache memory access request, the merge logic 330 combines or merges the incoming data of a particular cache memory line 310 from the L1 data cache memory 250 with the stored data of the preceding cache memory line of the particular cache memory line in the stored data array 320. The output 340 of the combination by the merge logic 330 fulfills the non-aligned cache memory access request).
However, Sheaffer does not disclose that the aforementioned acquiring from a buffer entails a plurality of first parts, with the aforementioned searching and merging entailing each first part, and that the aforementioned acquiring from the memory entails a plurality of second parts, with the aforementioned accessing and merging entailing each second part, and with each second part of data being located in a separate bit width of the memory.
On the other hand, Batley, in the same field of address-unaligned data loading, discloses acquiring a plurality of first parts of data of target data from a buffer and acquiring a plurality of second parts of data of the target data from the memory, with each second part of data being located in a separate bit width of a memory, and merging each first part of data and each second part of data to obtain the target data ([0038], lines 1-10, however, sometimes the apparatus may require access to an unaligned block of data which is unaligned with respect to data word boundaries of the data store. For example the unaligned block of data may start part-way through one data word. In this case, handling the load instruction can be more complex because it may require an initial load operation to load an initial portion of the unaligned block of data from one data word, and then a number of subsequent load operations for loading subsequent portions of the unaligned block of data; [0072], lines 1-18, as shown in FIG. 3, by using the stream buffer 58, a single instruction which accesses a relatively large amount of data starting from an unaligned address only needs to incur this performance penalty on the first beat of the overall series of load operations. The first load operation would load an initial portion of the unaligned block to be loaded in response to the overall instruction and place it in the stream buffer 58. This would then be followed by one or more subsequent load operations which load subsequent portions of the unaligned block. If there are three or more load operations in the series then any middle operations (i.e. operations other than the first or last operations) would each load a full data word from the data store, stash some bytes of the data word in the stream buffer 58 and pull other bytes out of the stream buffer from the data written in response to a previous operation of the series, to obtain a full N bytes of unaligned data required for the register write. This is shown schematically in FIG. 3; [0074], lines 3-13, the data from addresses 0x02-0x07 from the stream buffer 58 is combined with data from addresses 0x08-0x09 loaded by that transaction, to form an 8 byte block of unaligned data (corresponding to addresses 0x02-0x09) which is written to the register Q0. Meanwhile, the remaining part of the loaded data word corresponding to addresses 0x0A-0x0F is placed in the stream buffer 58 for the subsequent transaction to use. The transactions at cycles 2 and 3 are handled in a similar way to the transaction in cycle 1).
Batley’s teaching of loading a plurality of first parts of data and second parts of data increases system performance relative to only loading a first part and second part. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Batley with the invention of Sheaffer in order to increase system performance. Additionally, this modification merely entails the use of a known technique (Batley’s teaching cited above) to improve similar devices (methods, or products) (the invention of Sheaffer, which is also directed to address-unaligned data load instructions) in the same way (Batley’s teaching cited above, when applied to the invention of Sheaffer, results in Sheaffer being improved in the same way by likewise supporting acquiring and merging a plurality of first parts of data and a plurality of second parts of data), which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143. Note that Batley’s teaching as cited above, when applied to the invention of Sheaffer, results in the overall claim language of ‘acquiring a “plurality of” first “parts” of data of the target data from a buffer, comprising: for “each” first part of data of the target data, searching for buffered data that comprises “each of” the “plurality of” first “parts” of data based on an address interval of “each of” the “plurality of” first “parts” of data and an address interval of the buffered data; acquiring a “plurality of” second “parts” of data of the target data from the memory, comprising: for “each” second part of data of the target data, accessing the memory based on an address of “each” second part of data and a bit width of the memory to obtain “each” second part of data, wherein “each” second part of data is located in a “separate” bit width of the memory; and merging “each” first part of data and “each” second part of data to obtain the target data.’
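
For illustration only, the beat-by-beat behavior quoted from Batley at [0072] and [0074] can be pictured with the following sketch. It is a simplified software model, not Batley's circuit: the function name `load_unaligned`, the byte-array memory, and the 8-byte word size (inferred from the quoted address ranges 0x02-0x09 and 0x0A-0x0F) are assumptions made for the example.

```python
from typing import List

WORD = 8  # assumed bytes per data word, inferred from the quoted address ranges


def load_unaligned(memory: bytes, start: int, n_words: int) -> List[bytes]:
    """Return n_words unaligned WORD-byte blocks beginning at address `start`."""
    offset = start % WORD
    base = start - offset
    if offset == 0:
        # Aligned case: each block is exactly one data word.
        return [memory[base + i * WORD: base + (i + 1) * WORD] for i in range(n_words)]

    # First beat: load the word containing `start` and stash its tail bytes.
    stream_buffer = memory[base: base + WORD][offset:]

    blocks = []
    for i in range(1, n_words + 1):
        # Later beats: one aligned word load from memory per beat.
        word = memory[base + i * WORD: base + (i + 1) * WORD]
        # Merge buffered bytes (a first part) with freshly loaded bytes (a second part).
        blocks.append(stream_buffer + word[:offset])
        # Stash the rest of this word for the next beat.
        stream_buffer = word[offset:]
    return blocks


# Made-up memory image in which each byte's value equals its address.
memory = bytes(range(0x40))
regs = load_unaligned(memory, 0x02, 3)
print(regs[0].hex())  # 0203040506070809
```

In this model the first block combines addresses 0x02-0x07 from the buffer with 0x08-0x09 from the newly loaded word, matching the quoted example, and each later block repeats the same merge with a different buffered part and a different aligned word.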

Consider claim 4, the overall combination entails the instruction executing method of claim 1 (see above), wherein accessing the memory based on an address of each second part of data and the bit width of the memory comprises: specifying, based on the bit width of the memory, a data length of data to be acquired; and specifying, based on the address of each second part of data, an address of the data in the memory to be acquired, the address in the memory being aligned to the data length (Sheaffer, FIG. 4A, [0029], lines 6-8, each cache memory line of the L1 data cache memory 250 is assumed to have a data width of 64 bytes (as an example); [0030], lines 1-7, the cache memory line split access logic 235 is assumed to receive an instruction or request that requires 48 bytes of data from the cache memory line n-1 402 and 16 bytes of data from the cache memory line n 404. The stored data array 320 is assumed to store the data of the cache memory line n-1 402 during a prior misaligned cache memory access; Batley, [0038], lines 1-10, however, sometimes the apparatus may require access to an unaligned block of data which is unaligned with respect to data word boundaries of the data store. For example the unaligned block of data may start part-way through one data word. In this case, handling the load instruction can be more complex because it may require an initial load operation to load an initial portion of the unaligned block of data from one data word, and then a number of subsequent load operations for loading subsequent portions of the unaligned block of data; [0072], lines 5-10, the first load operation would load an initial portion of the unaligned block to be loaded in response to the overall instruction and place it in the stream buffer 58. This would then be followed by one or more subsequent load operations which load subsequent portions of the unaligned block).
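
For illustration only, the two specifying steps recited in claim 4 can be sketched as follows, using the 64-byte cache memory line width that Sheaffer gives as an example. The function name `aligned_access` and the sample address are hypothetical.

```python
from typing import Tuple


def aligned_access(part_address: int, memory_width_bytes: int) -> Tuple[int, int]:
    """Return (aligned_address, data_length) for one memory access."""
    # Data length of the data to be acquired, specified from the memory width.
    data_length = memory_width_bytes
    # Address rounded down so that it is aligned to that data length.
    aligned_address = part_address - (part_address % data_length)
    return aligned_address, data_length


# Made-up address 0x70 falls inside the 64-byte line that starts at 0x40.
print(aligned_access(0x70, 64))  # (64, 64)
```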

Consider claim 5, the overall combination entails the instruction executing method of claim 1 (see above), wherein after accessing the memory, the instruction executing method further comprises: storing at least one part of the second parts of data into the buffer as buffered data (Sheaffer, [0032], lines 1-8, after the merge logic 330 receives the stored data of the cache memory line n-1 402, the cache memory line split access logic 235 replaces the stored data of the cache memory line n-1 402 in the stored data array 320 with the data of the cache memory line n 404 in one embodiment of the invention. This facilitates contiguous cache memory line split accesses to achieve full throughput operation within a single machine or clock cycle).

Consider claim 6, the overall combination entails the instruction executing method of claim 1 (see above), further comprising: determining, based on an address interval of the target data and the bit width of the memory, at least one of the plurality of first parts of data or the plurality of second parts of data of the target data (Sheaffer, FIG. 4A, [0029], lines 6-8, each cache memory line of the L1 data cache memory 250 is assumed to have a data width of 64 bytes (as an example); [0030], lines 1-7, the cache memory line split access logic 235 is assumed to receive an instruction or request that requires 48 bytes of data from the cache memory line n-1 402 and 16 bytes of data from the cache memory line n 404. The stored data array 320 is assumed to store the data of the cache memory line n-1 402 during a prior misaligned cache memory access; Batley, [0038], lines 1-10, however, sometimes the apparatus may require access to an unaligned block of data which is unaligned with respect to data word boundaries of the data store. For example the unaligned block of data may start part-way through one data word. In this case, handling the load instruction can be more complex because it may require an initial load operation to load an initial portion of the unaligned block of data from one data word, and then a number of subsequent load operations for loading subsequent portions of the unaligned block of data; [0072], lines 5-10, the first load operation would load an initial portion of the unaligned block to be loaded in response to the overall instruction and place it in the stream buffer 58. This would then be followed by one or more subsequent load operations which load subsequent portions of the unaligned block).

Consider claim 7, the overall combination entails the instruction executing method of claim 6 (see above), wherein determining, based on the address interval of the target data and the bit width of the memory, at least one of the plurality of first parts of data or the plurality of second parts of data of the target data comprises: determining, based on the address interval of the target data and the bit width of the memory, a bit width boundary spanned by the target data; and dividing, based on the spanned bit width boundary, the target data into the plurality of first parts of data and the plurality of second parts of data (Sheaffer, FIG. 4A, [0029], lines 6-8, each cache memory line of the L1 data cache memory 250 is assumed to have a data width of 64 bytes (as an example); [0030], lines 1-7, the cache memory line split access logic 235 is assumed to receive an instruction or request that requires 48 bytes of data from the cache memory line n-1 402 and 16 bytes of data from the cache memory line n 404. The stored data array 320 is assumed to store the data of the cache memory line n-1 402 during a prior misaligned cache memory access; Batley, [0038], lines 1-10, however, sometimes the apparatus may require access to an unaligned block of data which is unaligned with respect to data word boundaries of the data store. For example the unaligned block of data may start part-way through one data word. In this case, handling the load instruction can be more complex because it may require an initial load operation to load an initial portion of the unaligned block of data from one data word, and then a number of subsequent load operations for loading subsequent portions of the unaligned block of data; [0072], lines 5-10, the first load operation would load an initial portion of the unaligned block to be loaded in response to the overall instruction and place it in the stream buffer 58. 
This would then be followed by one or more subsequent load operations which load subsequent portions of the unaligned block).
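
For illustration only, the boundary determination and division recited in claim 7 can be sketched as follows. The function name `split_at_boundaries` and the half-open interval representation are hypothetical; the sample access (64 bytes starting 16 bytes into a 64-byte line) is chosen so that the result matches the 48-byte and 16-byte portions in the cited Sheaffer passage.

```python
from typing import List, Tuple


def split_at_boundaries(start: int, length: int, width: int) -> List[Tuple[int, int]]:
    """Divide [start, start + length) at every multiple of `width` that it spans."""
    end = start + length
    parts = []
    cur = start
    while cur < end:
        # Next width boundary strictly above the current address.
        boundary = (cur // width + 1) * width
        nxt = min(boundary, end)
        parts.append((cur, nxt))
        cur = nxt
    return parts


parts = split_at_boundaries(16, 64, 64)
print(parts)                        # [(16, 64), (64, 80)]
print([e - s for s, e in parts])    # [48, 16]
```

Which of the resulting parts are treated as first parts (served from the buffer) and which as second parts (served from the memory) would depend on what the buffer currently holds; the sketch covers only the division step.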

Consider claim 8, the overall combination entails the instruction executing method of claim 1 (see above), wherein an address of the target data in the address-unaligned data load instruction is not equal to an integer multiple of a data length of the target data (Sheaffer, [0004], lines 3-11, a cache memory line split access of 4 bytes 130 occurs when the access is shifted 4 bytes from the aligned cache memory access 120, i.e., the required data is the data A2 to A16 from the 64-byte cache memory line n 110 and the data Z1 from the 64-byte cache memory line n+1 115. The cache memory line split access of 8 bytes 140 and the cache memory line split access of 12 bytes 150 illustrate two other examples of non-aligned cache memory accesses).
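
For illustration only, the condition recited in claim 8 and the split accesses in the cited Sheaffer passage can be checked numerically. The function names are hypothetical, and the sketch assumes that the labels A1 to A16 in Sheaffer each denote a 4-byte unit of a 64-byte cache memory line, as the quoted 4-byte shift suggests.

```python
from typing import Tuple

LINE_BYTES = 64  # cache memory line width used in the Sheaffer example


def is_address_unaligned(address: int, data_length: int) -> bool:
    """True when the address is not an integer multiple of the data length."""
    return address % data_length != 0


def bytes_per_line(address: int, data_length: int) -> Tuple[int, int]:
    """Bytes taken from the line holding `address` and from the following line."""
    offset = address % LINE_BYTES
    from_first = min(data_length, LINE_BYTES - offset)
    return from_first, data_length - from_first


print(is_address_unaligned(4, 64))  # True
print(bytes_per_line(4, 64))        # (60, 4)
```

Under the stated assumption, the 60 bytes from the first line correspond to A2 through A16 and the 4 bytes from the next line correspond to Z1.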

Consider claim 11, Sheaffer discloses a processing apparatus communicatively coupled to a memory ([0025], lines 4-5, L1 data cache memory 250), the processing apparatus comprising: a buffer configured to store buffered data ([0024], line 6, stored data array 320); an instruction executing circuit ([0021], line 1, execution unit 230) configured to execute an address-unaligned data load instruction, wherein the address-unaligned data load instruction is used to read target data from the memory ([0038], lines 5-8, instruction 510 that indicates a load operation to access arrays that are contiguous in the virtual address space but are misaligned with the boundary of a cache memory line and/or a page memory) and the instruction executing circuit is coupled to the buffer and the memory (Figure 2); a data acquisition circuit configured to: acquire a first part of data of the target data from the buffer ([0025], lines 5-7, the stored data of the preceding cache memory line of the particular cache memory line in the stored data array 320); search for buffered data that comprises the first part of data based on an address interval of the first part of data and an address interval of the buffered data (Sheaffer, [0024], lines 6-9, in one embodiment of the invention, the respective addresses stored in the tag array 325 are the addresses of the cache memory lines that are stored in the stored data array 320); acquire a second part of data of the target data ([0025], lines 3-5, the incoming data of a particular cache memory line 310 from the L1 data cache memory 250); and access the memory for the second part of data of the target data based on an address of the second part of data and a bit width of the memory to acquire the second part of data, wherein the second part of data is located in a bit width of the memory (FIG. 4A, [0029], lines 6-8, each cache memory line of the L1 data cache memory 250 is assumed to have a data width of 64 bytes (as an example); [0030], lines 1-7, the cache memory line split access logic 235 is assumed to receive an instruction or request that requires 48 bytes of data from the cache memory line n-1 402 and 16 bytes of data from the cache memory line n 404. The stored data array 320 is assumed to store the data of the cache memory line n-1 402 during a prior misaligned cache memory access); and a data processing circuit configured to merge the first part of data and the second part of data to obtain the target data ([0025], lines 1-9, when the cache memory line split access logic 235 receives a non-aligned cache memory access request, the merge logic 330 combines or merges the incoming data of a particular cache memory line 310 from the L1 data cache memory 250 with the stored data of the preceding cache memory line of the particular cache memory line in the stored data array 320. The output 340 of the combination by the merge logic 330 fulfills the non-aligned cache memory access request).
However, Sheaffer does not disclose that the aforementioned acquiring from the buffer entails a plurality of first parts, with the aforementioned searching and merging entailing each first part, and that the aforementioned acquiring (in claim 11, line 13) entails a plurality of second parts, with the aforementioned accessing and merging entailing each second part, and with each second part of data being located in a separate bit width of the memory.
On the other hand, Batley, in the same field of address-unaligned data loading, discloses acquiring a plurality of first parts of data of target data from a buffer and acquiring a plurality of second parts of data of the target data from the memory, with each second part of data being located in a separate bit width of a memory, and merging each first part of data and each second part of data to obtain the target data ([0038], lines 1-10, however, sometimes the apparatus may require access to an unaligned block of data which is unaligned with respect to data word boundaries of the data store. For example the unaligned block of data may start part-way through one data word. In this case, handling the load instruction can be more complex because it may require an initial load operation to load an initial portion of the unaligned block of data from one data word, and then a number of subsequent load operations for loading subsequent portions of the unaligned block of data; [0072], lines 1-18, as shown in FIG. 3, by using the stream buffer 58, a single instruction which accesses a relatively large amount of data starting from an unaligned address only needs to incur this performance penalty on the first beat of the overall series of load operations. The first load operation would load an initial portion of the unaligned block to be loaded in response to the overall instruction and place it in the stream buffer 58. This would then be followed by one or more subsequent load operations which load subsequent portions of the unaligned block. If there are three or more load operations in the series then any middle operations (i.e. operations other than the first or last operations) would each load a full data word from the data store, stash some bytes of the data word in the stream buffer 58 and pull other bytes out of the stream buffer from the data written in response to a previous operation of the series, to obtain a full N bytes of unaligned data required for the register write. This is shown schematically in FIG. 3; [0074], lines 3-13, the data from addresses 0x02-0x07 from the stream buffer 58 is combined with data from addresses 0x08-0x09 loaded by that transaction, to form an 8 byte block of unaligned data (corresponding to addresses 0x02-0x09) which is written to the register Q0. Meanwhile, the remaining part of the loaded data word corresponding to addresses 0x0A-0x0F is placed in the stream buffer 58 for the subsequent transaction to use. The transactions at cycles 2 and 3 are handled in a similar way to the transaction in cycle 1).
Batley’s teaching of loading a plurality of first parts of data and second parts of data increases system performance relative to only loading a first part and second part.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Batley with the invention of Sheaffer in order to increase system performance. Additionally, this modification merely entails the use of a known technique (Batley’s teaching cited above) to improve similar devices (methods, or products) (the invention of Sheaffer, which is also directed to address-unaligned data load instructions) in the same way (Batley’s teaching cited above, when applied to the invention of Sheaffer, results in Sheaffer being improved in the same way by likewise supporting acquiring and merging a plurality of first parts of data and a plurality of second parts of data), which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143. Note that Batley’s teaching as cited above, when applied to the invention of Sheaffer, results in the overall claim language of ‘acquire a “plurality of” first “parts” of data of the target data from the buffer; search for buffered data that comprises “each of” the “plurality of” first “parts” of data based on an address interval of “each of” the “plurality of” first “parts” of data and an address interval of the buffered data; acquire a “plurality of” second “parts” of data of the target data; and access the memory for “each” second part of data of the target data based on an address of “each” second part of data and a bit width of the memory to acquire “each” second part of data, wherein “each” second part of data is located in a “separate” bit width of the memory; and a data processing circuit configured to merge “each” first part of data and “each” second part of data to obtain the target data.’

Consider claim 13, the overall combination entails the processing apparatus of claim 11 (see above), wherein the data acquisition circuit is configured to: acquire the plurality of first parts of data from the memory in response to a determination that the plurality of first parts of data have not been found in the buffer (Sheaffer, [0050], lines 4-7, If no match is found in step 750, the flow ends. If a match is found in step 750, the cache memory cache memory line split access logic 235 merges the data retrieved; [0024], lines 1-4, the stored data array 320 holds or stores one or more cache memory lines of the L1 data cache memory 250 that are previously accessed through a misaligned access of the L1 data cache memory 250; Figure 1; [0027], lines 4-5; in other words, when no match is found in the buffer, for example when there were no relevant previous accesses, no merging using the stored data array 320 occurs, and the data is acquired from the cache memory instead in a manner which uses double the bandwidth of the L1 data cache memory).

Consider claim 14, the overall combination entails the processing apparatus of claim 13 (see above), wherein the data acquisition circuit is configured to: access the memory based on an address of the plurality of first parts of data and the bit width of the memory to acquire the plurality of first parts of data (Sheaffer, [0050], lines 4-7, If no match is found in step 750, the flow ends. If a match is found in step 750, the cache memory cache memory line split access logic 235 merges the data retrieved; [0024], lines 1-4, the stored data array 320 holds or stores one or more cache memory lines of the L1 data cache memory 250 that are previously accessed through a misaligned access of the L1 data cache memory 250; Figure 1; [0027], lines 4-5; in other words, when no match is found in the buffer, for example when there were no relevant previous accesses, no merging using the stored data array 320 occurs, and the data is acquired from the cache memory instead in a manner which uses double the bandwidth of the L1 data cache memory).

Consider claim 16, the overall combination entails the processing apparatus of claim 11 (see above), wherein the data acquisition circuit is configured to: specify, based on the bit width of the memory, a data length of data to be acquired; and specify, based on the address of each second part of data, an address of the data in the memory to be acquired, the address in the memory being aligned to the data length (Sheaffer, FIG. 4A, [0029], lines 6-8, each cache memory line of the L1 data cache memory 250 is assumed to have a data width of 64 bytes (as an example); [0030], lines 1-7, the cache memory line split access logic 235 is assumed to receive an instruction or request that requires 48 bytes of data from the cache memory line n-1 402 and 16 bytes of data from the cache memory line n 404. The stored data array 320 is assumed to store the data of the cache memory line n-1 402 during a prior misaligned cache memory access; Batley, [0038], lines 1-10, however, sometimes the apparatus may require access to an unaligned block of data which is unaligned with respect to data word boundaries of the data store. For example the unaligned block of data may start part-way through one data word. In this case, handling the load instruction can be more complex because it may require an initial load operation to load an initial portion of the unaligned block of data from one data word, and then a number of subsequent load operations for loading subsequent portions of the unaligned block of data; [0072], lines 5-10, the first load operation would load an initial portion of the unaligned block to be loaded in response to the overall instruction and place it in the stream buffer 58. This would then be followed by one or more subsequent load operations which load subsequent portions of the unaligned block).

Consider claim 17, the overall combination entails the processing apparatus of claim 11 (see above), wherein the data acquisition circuit is configured to: after accessing the memory, store at least one of the second parts of data into the buffer as buffered data (Sheaffer, [0032], lines 1-8, after the merge logic 330 receives the stored data of the cache memory line n-1 402, the cache memory line split access logic 235 replaces the stored data of the cache memory line n-1 402 in the stored data array 320 with the data of the cache memory line n 404 in one embodiment of the invention. This facilitates contiguous cache memory line split accesses to achieve full throughput operation within a single machine or clock cycle).

Consider claim 18, the overall combination entails the processing apparatus of claim 11 (see above), wherein the data acquisition circuit is configured to: determine, based on an address interval of the target data and the bit width of the memory, at least one of the plurality of first parts of data or the plurality of second parts of data of the target data (Sheaffer, FIG. 4A, [0029], lines 6-8, each cache memory line of the L1 data cache memory 250 is assumed to have a data width of 64 bytes (as an example); [0030], lines 1-7, the cache memory line split access logic 235 is assumed to receive an instruction or request that requires 48 bytes of data from the cache memory line n-1 402 and 16 bytes of data from the cache memory line n 404. The stored data array 320 is assumed to store the data of the cache memory line n-1 402 during a prior misaligned cache memory access; Batley, [0038], lines 1-10, however, sometimes the apparatus may require access to an unaligned block of data which is unaligned with respect to data word boundaries of the data store. For example the unaligned block of data may start part-way through one data word. In this case, handling the load instruction can be more complex because it may require an initial load operation to load an initial portion of the unaligned block of data from one data word, and then a number of subsequent load operations for loading subsequent portions of the unaligned block of data; [0072], lines 5-10, the first load operation would load an initial portion of the unaligned block to be loaded in response to the overall instruction and place it in the stream buffer 58. This would then be followed by one or more subsequent load operations which load subsequent portions of the unaligned block).

Consider claim 19, the overall combination entails the processing apparatus of claim 18 (see above), wherein the data acquisition circuit is configured to: determine, based on the address interval of the target data and the bit width of the memory, a bit width boundary spanned by the target data; and divide, based on the spanned bit width boundary, the target data into the plurality of first parts of data and the plurality of second parts of data (Sheaffer, FIG. 4A, [0029], lines 6-8, each cache memory line of the L1 data cache memory 250 is assumed to have a data width of 64 bytes (as an example); [0030], lines 1-7, the cache memory line split access logic 235 is assumed to receive an instruction or request that requires 48 bytes of data from the cache memory line n-1 402 and 16 bytes of data from the cache memory line n 404. The stored data array 320 is assumed to store the data of the cache memory line n-1 402 during a prior misaligned cache memory access; Batley, [0038], lines 1-10, however, sometimes the apparatus may require access to an unaligned block of data which is unaligned with respect to data word boundaries of the data store. For example the unaligned block of data may start part-way through one data word. In this case, handling the load instruction can be more complex because it may require an initial load operation to load an initial portion of the unaligned block of data from one data word, and then a number of subsequent load operations for loading subsequent portions of the unaligned block of data; [0072], lines 5-10, the first load operation would load an initial portion of the unaligned block to be loaded in response to the overall instruction and place it in the stream buffer 58. This would then be followed by one or more subsequent load operations which load subsequent portions of the unaligned block).
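As a rough illustration of the determination discussed for claims 18 and 19, the sketch below divides an address interval at each width boundary it spans. It is an assumption-laden example rather than anything taken from Sheaffer, Batley, or the application: the function name `split_at_boundaries`, the use of half-open byte intervals, and the byte-granular `width` parameter are all hypothetical.

```python
def split_at_boundaries(start: int, length: int, width: int) -> list[tuple[int, int]]:
    """Split [start, start + length) into pieces that each lie in one width-aligned block.

    Intervals are half-open (begin, end) byte ranges; `width` is the assumed
    memory access width in bytes.
    """
    if length <= 0 or width <= 0:
        raise ValueError("length and width must be positive")
    parts = []
    addr = start
    end = start + length
    while addr < end:
        # Next width boundary strictly above the current address.
        boundary = (addr // width + 1) * width
        stop = min(boundary, end)
        parts.append((addr, stop))
        addr = stop
    return parts
```

With 64-byte lines, a 64-byte request starting 16 bytes into a line splits into a 48-byte piece and a 16-byte piece, which matches the proportions in the Sheaffer [0030] example.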

Consider claim 20, the overall combination entails the processing apparatus of claim 11 (see above), wherein an address of the target data in the address-unaligned data load instruction is not equal to an integer multiple of a data length of the target data (Sheaffer, [0004], lines 3-11, a cache memory line split access of 4 bytes 130 occurs when the access is shifted 4 bytes from the aligned cache memory access 120, i.e., the required data is the data A2 to A16 from the 64-byte cache memory line n 110 and the data Z1 from the 64-byte cache memory line n+1 115. The cache memory line split access of 8 bytes 140 and the cache memory line split access of 12 bytes 150 illustrate two other examples of non-aligned cache memory accesses).
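The alignment condition recited in claim 20 can be expressed as a one-line check. The helper name `is_unaligned` is hypothetical and the example values are chosen only to mirror the 4-byte shift from a 64-byte aligned access described in Sheaffer [0004].

```python
def is_unaligned(address: int, data_length: int) -> bool:
    """True when `address` is not an integer multiple of `data_length` (both in bytes)."""
    if data_length <= 0:
        raise ValueError("data_length must be positive")
    return address % data_length != 0
```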

Consider claim 22, Sheaffer discloses a System on Chip ([0053], lines 9-10, a system on a chip (SOC) system), comprising: a memory ([0025], lines 4-5, L1 data cache memory 250); and a processing apparatus communicatively coupled to the memory, the processing apparatus comprising: a buffer configured to store buffered data ([0024], line 6, stored data array 320); an instruction executing circuit ([0021], line 1, execution unit 230) configured to execute an address-unaligned data load instruction, wherein the address-unaligned data load instruction is used to read target data from the memory ([0038], lines 5-8, instruction 510 that indicates a load operation to access arrays that are contiguous in the virtual address space but are misaligned with the boundary of a cache memory line and/or a page memory) and the instruction executing circuit is coupled to the buffer and the memory (Figure 2); a data acquisition circuit configured to: acquire a first part of data of the target data from the buffer ([0025], lines 5-7, the stored data of the preceding cache memory line of the particular cache memory line in the stored data array 320); search for buffered data that comprises the first part of data based on an address interval of the first part of data and an address interval of the buffered data (Sheaffer, [0024], lines 6-9, in one embodiment of the invention, the respective addresses stored in the tag array 325 are the addresses of the cache memory lines that are stored in the stored data array 320); acquire a second part of data of the target data ([0025], lines 3-5, the incoming data of a particular cache memory line 310 from the L1 data cache memory 250); and access the memory for the second part of data of the target data based on an address of the second part of data and a bit width of the memory to acquire the second part of data, wherein the second part of data is located in a bit width of the memory (FIG. 4A, [0029], lines 6-8, each cache memory line of the L1 data cache memory 250 is assumed to have a data width of 64 bytes (as an example); [0030], lines 1-7, the cache memory line split access logic 235 is assumed to receive an instruction or request that requires 48 bytes of data from the cache memory line n-1 402 and 16 bytes of data from the cache memory line n 404. The stored data array 320 is assumed to store the data of the cache memory line n-1 402 during a prior misaligned cache memory access); and a data processing circuit configured to merge the first part of data and the second part of data to obtain the target data ([0025], lines 1-9, when the cache memory line split access logic 235 receives a non-aligned cache memory access request, the merge logic 330 combines or merges the incoming data of a particular cache memory line 310 from the L1 data cache memory 250 with the stored data of the preceding cache memory line of the particular cache memory line in the stored data array 320. The output 340 of the combination by the merge logic 330 fulfills the non-aligned cache memory access request).
However, Sheaffer does not disclose that the aforementioned acquiring from the buffer entails a plurality of first parts, with the aforementioned searching and merging entailing each first part, and that the aforementioned acquiring (in claim 11, line 13) entails a plurality of second parts, with the aforementioned accessing and merging entailing each second part, and with each second part of data being located in a separate bit width of the memory.
On the other hand, Batley, in the same field of address-unaligned data loading, discloses acquiring a plurality of first parts of data of target data from a buffer and acquiring a plurality of second parts of data of the target data from the memory, with each second part of data being located in a separate bit width of a memory, and merging each first part of data and each second part of data to obtain the target data ([0038], lines 1-10, however, sometimes the apparatus may require access to an unaligned block of data which is unaligned with respect to data word boundaries of the data store. For example the unaligned block of data may start part-way through one data word. In this case, handling the load instruction can be more complex because it may require an initial load operation to load an initial portion of the unaligned block of data from one data word, and then a number of subsequent load operations for loading subsequent portions of the unaligned block of data; [0072], lines 1-18, as shown in FIG. 3, by using the stream buffer 58, a single instruction which accesses a relatively large amount of data starting from an unaligned address only needs to incur this performance penalty on the first beat of the overall series of load operations. The first load operation would load an initial portion of the unaligned block to be loaded in response to the overall instruction and place it in the stream buffer 58. This would then be followed by one or more subsequent load operations which load subsequent portions of the unaligned block. If there are three or more load operations in the series then any middle operations (i.e. operations other than the first or last operations) would each load a full data word from the data store, stash some bytes of the data word in the stream buffer 58 and pull other bytes out of the stream buffer from the data written in response to a previous operation of the series, to obtain a full N bytes of unaligned data required for the register write. This is shown schematically in FIG. 3; [0074], lines 3-13, the data from addresses 0x02-0x07 from the stream buffer 58 is combined with data from addresses 0x08-0X09 loaded by that transaction, to form an 8 byte block of unaligned data (corresponding to addresses 0x02-0x09) which is written to the register Q0. Meanwhile, the remaining part of the loaded data word corresponding to addresses 0x0A-0x0F is placed in the stream buffer 58 for the subsequent transaction to use. The transactions at cycles 2 and 3 are handled in a similar way to the transaction in cycle 1).
Batley’s teaching of loading a plurality of first parts of data and second parts of data increases system performance relative to only loading a first part and second part.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Batley with the invention of Sheaffer in order to increase system performance. Additionally, this modification merely entails the use of a known technique (Batley’s teaching cited above) to improve similar devices (methods, or products) (the invention of Sheaffer, which is also directed to address-unaligned data load instructions) in the same way (Batley’s teaching cited above, when applied to the invention of Sheaffer, results in Sheaffer being improved in the same way by likewise supporting acquiring and merging a plurality of first parts of data and a plurality of second parts of data), which is an example of a rationale that may support a conclusion of obviousness, as per MPEP 2143. Note that Batley’s teaching as cited above, when applied to the invention of Sheaffer, results in the overall claim language of ‘acquire a “plurality of” first “parts” of data of the target data from the buffer; search for buffered data that comprises “each of” the “plurality of” first “parts” of data based on an address interval of “each of” the “plurality of” first “parts” of data and an address interval of the buffered data; acquire a “plurality of” second “parts” of data of the target data; and access the memory for “each” second part of data of the target data based on an address of “each” second part of data and a bit width of the memory to acquire “each” second part of data, wherein “each” second part of data is located in a “separate” bit width of the memory; and a data processing circuit configured to merge “each” first part of data and “each” second part of data to obtain the target data.’
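For illustration only, the series of load operations cited from Batley [0072] and [0074] (an initial load that fills the stream buffer, followed by loads that each combine buffered bytes with newly loaded bytes and stash the remainder) can be sketched as follows. This is a loose model and not Batley's implementation: the function name `stream_load`, the 8-byte word width, and the simplification that every operation after the first loads a full word are assumptions made for the example.

```python
WORD = 8  # assumed data word width in bytes, matching the Batley [0074] example


def stream_load(memory: bytes, start: int, blocks: int) -> list[bytes]:
    """Load `blocks` consecutive WORD-byte unaligned blocks beginning at `start`."""
    offset = start % WORD
    word_addr = start - offset
    # Initial load: keep only the bytes at or after `start` in the stream buffer.
    stream_buffer = memory[start:word_addr + WORD]
    out = []
    for _ in range(blocks):
        # Subsequent load: read the next full data word from the data store.
        word_addr += WORD
        word = memory[word_addr:word_addr + WORD]
        # Combine the buffered bytes with the leading bytes of the new word.
        out.append(stream_buffer + word[:offset])
        # Stash the remaining bytes for the following operation.
        stream_buffer = word[offset:]
    return out
```

Starting at address 0x02, the first block produced covers addresses 0x02-0x09 and the bytes for 0x0A-0x0F are left in the buffer, mirroring the transaction described in [0074].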

Response to Arguments
Applicant on page 10 argues: “Figures 3A and 5B of the drawings are objected to. Applicant is submitting herewith copies of the drawings dated May 24, 2022, as noted by the Office in the Office Action, page 25, item 47. Applicant respectfully requests that the objection to the drawings be withdrawn.”
In view of the aforementioned replacement drawings, the previously presented objections to the drawings are withdrawn.

Applicant on page 10 argues: “Claims 2, 4-8, 10-14, 16-20, and 22 are objected to because of various informalities. Applicant has amended the claims to address the informalities identified in the Office Action, pages 3-5, items 6-16. Applicant respectfully requests that the objections to the claims be withdrawn.”
In view of the aforementioned amendments, the previously presented objections to the claims are withdrawn.

Applicant on page 10 argues: “Claims 2, 5, 10, and 17 stand rejected under 35 U.S.C. §112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor regards as the invention. Applicant respectfully traverses the rejection. Applicant has canceled claims 2 and 10, rendering those objections moot. Applicant has amended claims 5 and 17 to address the rejections identified in the Office Action, page 5, item 20 and page 6, item 25. Applicant respectfully requests that the 35 U.S.C. §112(b) rejection of the claims be withdrawn.”
In view of the aforementioned amendments, the previously presented rejections under 35 U.S.C. §112(b) to the claims are withdrawn.

Applicant on page 12 argues: “Sheaffer teaches that the stored data array 320 is capable of storing one or more cache lines. When a non-aligned cache memory access request is received, Sheaffer only takes one piece of data from the stored data array, which is the cache line or portion of the cache line that immediately precedes the cache line retrieved from memory. This is in contrast to claim 1, which recites "acquiring a plurality of first parts of data of the target data from a buffer" (emphasis added).”
Examiner notes that Sheaffer at least implicitly teaches acquiring a plurality of first parts of data of target data from a buffer (see [0036]: “After the merge logic 330 receives the stored data of the cache memory line n+1 406, the cache memory line split access logic 235 replaces the stored data of the cache memory line n 404 in the stored data array 320 with the data in the cache memory line n+1 406 in one embodiment of the invention. This facilitates contiguous cache memory line split accesses to achieve full throughput operation within a single machine or clock cycle.”). Nevertheless, for the purposes of compact prosecution, Batley is relied upon to more explicitly teach the claimed limitation.

Applicant on page 12 argues: ‘Furthermore, the Office relies on these same portions of Batley to teach "acquiring from memory entails a plurality of second parts." Office Action, pages 8-9, item 28 (emphasis added). Batley does not teach or suggest "acquiring a plurality of first parts of data of the target data from a buffer" as recited in claim 1 (emphasis added).’
However, Examiner submits that Batley also teaches the aforementioned amended limitation — see the citations set forth in the Claim Rejections - 35 USC § 103 section above.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEITH E VICARY whose telephone number is (571)270-1314. The examiner can normally be reached Monday to Friday, 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached on (571)270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KEITH E VICARY/Primary Examiner, Art Unit 2182