DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	This action is responsive to the application filed on May 10, 2022 and the Preliminary Amendment filed on May 10, 2022.
Thus, claims 1-20 are pending for examination.
Examiner Notes
3.	Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.
4.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Specification
5.	The specification is objected to for information provided on page 1, paragraph 0001, where the current status of the Cross Reference to Related Applications should be updated. See MPEP 608.01[R-5] and 37 CFR 1.78.
Claim Objections
6.	Claims 14-20 are objected to because of the following informalities:
	As to claim 14, line 9, the limitation “replace the portion of the compiler-generated code the tensor operation” should be changed to, for example, -- replace the portion of the compiler-generated code with the tensor operation --. Appropriate correction is required.
	Claims 15-20 are also objected to because they depend from objected-to base claim 14.
Double Patenting
7.	A rejection based on double patenting of the “same invention” type finds its support in the language of 35 U.S.C. 101 which states that “whoever invents or discovers any new and useful process... may obtain a patent therefor...” (Emphasis added). Thus, the term “same invention,” in this context, means an invention drawn to identical subject matter. See Miller v. Eagle Mfg. Co., 151 U.S. 186 (1894); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Ockert, 245 F.2d 467, 114 USPQ 330 (CCPA 1957).

A statutory type (35 U.S.C. 101) double patenting rejection can be overcome by canceling or amending the claims that are directed to the same invention so they are no longer coextensive in scope. The filing of a terminal disclaimer cannot overcome a double patenting rejection based upon 35 U.S.C. 101.

8.	Claims 14 and 18 are rejected under 35 U.S.C. 101 as claiming the same invention as that of claim 14 of prior U.S. Patent No. 11,347,486 B2. This is a statutory double patenting rejection.
	The conflicting claims are the same, based on the comparison listed in the following table:

Current Application, claim 18 (including its base claim 14):
    14. A processing system comprising:
    a special-purpose hardware accelerator; and
    a processor configured to:
    in response to receiving an indication that compiler-generated code executing at the processing system is tileable, determine whether a portion of the compiler-generated code for a tile comprises a sequence of instructions that can be replaced with a tensor operation; and
    replace the portion of the compiler-generated code the tensor operation in response to determining that the portion can be replaced with the tensor operation.
    18. The processing system of claim 14, wherein the processor is further to:
    receive a hint in the compiler-generated code indicating that an inner loop of a tile is replaceable with a type of tensor operation.

U.S. Patent No. US 11,347,486 B2, claim 14:
    14. A processing system comprising:
    a special-purpose hardware accelerator; and
    a processor configured to:
    in response to receiving an indication that compiler-generated code executing at the processing system is tileable, determine whether a portion of the compiler-generated code for a tile comprises a sequence of instructions that can be replaced with a tensor operation;
    receive a hint in the compiler-generated code indicating that an inner loop of a tile is replaceable with a type of tensor operation; and
    replace the portion of the compiler-generated code with the tensor operation in response to determining that the portion can be replaced with the tensor operation.

U.S. Patent No. US 11,347,486 B2, claim 14 (excerpt shown opposite claim 18 of the current application):
    14. A processing system comprising:…
    a special-purpose hardware accelerator; and
    a processor configured to:…
    receive a hint in the compiler-generated code indicating that an inner loop of a tile is replaceable with a type of tensor operation;
    …
    be replaced with the tensor operation.


The comparison in the above table, which highlights the differences by underlining words, indicates that claims 14 and 18 of the current application and claim 14 of U.S. Patent No. ‘486 are drawn to identical subject matter.
Hence, claims 14 and 18 of the current application claim the same invention as claim 14 of prior U.S. Patent No. ‘486 and as such are unpatentable under statutory-type (35 U.S.C. 101) double patenting.

9.	The non-statutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A non-statutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
10.	Claims 1-20 are rejected on the ground of non-statutory double patenting as being unpatentable over claims 1-19 of U.S. Patent No. US 11,347,486 B2.
	Although the claims at issue are not identical, they are not patentably distinct from each other.

Current Application (claims 1-20, listed first)
U.S. Patent No. US 11,347,486 B2 (claims 1-19, listed second)
 1. A method comprising:
     in response to receiving an indication that source code to be compiled at a processing system is tileable, 


    determining at a compiler of the processing system whether compiler-generated code comprising a plurality of tiles comprises a sequence of instructions for a tile that can be replaced with a tensor operation executable at a special-purpose hardware accelerator of the processing system; and 
   generating code that replaces the sequence of instructions of the compiler- generated code with the tensor operation in response to determining that the sequence of instructions can be replaced with the tensor operation.  

2. The method of claim 1, further comprising: generating a wrapper to invoke execution of the tensor operation at a special- purpose hardware accelerator of the processing system.  
3. The method of claim 1, wherein generating code comprises: replacing an inner loop of the tile in response to dimensions and data types of a sequence of instructions of the inner loop matching dimensions and data types of a tensor operation executable by the special-purpose hardware accelerator.  

4. The method of claim 1, further comprising: receiving a hint in the source code indicating that an inner loop of a tile is replaceable with a type of tensor operation; and in response to receiving the hint and in response to dimensions and data types of the sequence of instructions of the inner loop matching dimensions and a data type of a tensor operation of the type indicated by the hint, generating code replacing the inner loop with the type of tensor operation indicated by the hint.  

5. The method of claim 1, further comprising: identifying as an imperfect tile a sequence of instructions of an inner loop of a tile that does not match dimensions and data types of a tensor operation executable by the special-purpose hardware accelerator.  

6. The method of claim 5, further comprising: generating code to invoke a general-purpose processor of the processing system to execute the imperfect tile.  

7. The method of claim 1, wherein the tensor operation is an aggregate instruction comprising a general matrix to matrix multiplication.  

8. A method comprising:
      receiving, at a compiler of a processing system, an indication that source code comprises a tile;


     comparing an inner loop of the tile to tensor operations executable by a special- purpose hardware accelerator of the processing system; and
       in response to the inner loop of the tile matching a tensor operation executable by the special-purpose hardware accelerator, generating enhanced code that replaces the inner loop of the tile with the tensor operation to invoke the special-purpose hardware accelerator.  

9. The method of claim 8, wherein the inner loop comprises a first sequence of instructions that matches dimensions and data types of the tensor operation.  

10. The method of claim 9, wherein the tile further comprises a second sequence of instructions that does not match dimensions and data types of the tensor operation.  

11. The method of claim 10, further comprising generating code to invoke a general- purpose processor of the processing system to execute the second sequence of instructions.  

12. The method of claim 9, further comprising: receiving a hint in the source code indicating that an inner loop of a tile is replaceable with a type of tensor operation; and generating code replacing the inner loop with the type of tensor operation indicated by the hint.  
13. The method of claim 12, wherein comparing comprises: in response to receiving the hint, comparing dimensions and a data type of the first sequence of instructions with a subset of tensor operations executable by a special-purpose hardware accelerator of the processing system specified by the type of tensor operation indicated by the hint.  

14. A processing system comprising: 
a special-purpose hardware accelerator; and
 a processor configured to: 
    in response to receiving an indication that compiler-generated code executing at the processing system is tileable, determine whether a portion of the compiler-generated code for a tile comprises a sequence of instructions that can be replaced with a tensor operation; and 




       replace the portion of the compiler-generated code the tensor operation in response to determining that the portion can be replaced with the tensor operation.  

15. The processing system of claim 14, wherein the special-purpose hardware accelerator is configured to execute one or more tensor operations.  

16. The processing system of claim 15, wherein the processor is further to: 
    compare dimensions and a data type of the sequence of instructions with tensor operations executable by the special-purpose hardware accelerator; and 
    replace at least one inner loop in response to the dimensions and data types of the sequence of instructions of the inner loop that match dimensions and data types of a tensor operation executable by the special-purpose hardware accelerator.

17. The processing system of claim 16, wherein the processor is further to: execute sequences of instructions of the inner loop that do not match dimensions and data types of tensor operations executable by the special-purpose hardware accelerator.

18. The processing system of claim 14, wherein the processor is further to: 


    receive a hint in the compiler-generated code indicating that an inner loop of a tile is replaceable with a type of tensor operation.  



19. The processing system of claim 18, wherein the processor is further configured to: in response to receiving the hint, compare dimensions and a data type of the sequence of instructions with a subset of tensor operations executable by a special-purpose hardware accelerator of the processing system specified by the type of tensor operation indicated by the hint.  

20. The processing system of claim 14, wherein the tensor operation is an aggregate instruction comprising a general matrix to matrix multiplication.

U.S. Patent No. US 11,347,486 B2:
1. A method comprising:
   in response to receiving an indication that source code to be compiled at a processing system is tileable such that a tile representing at least one function of the source code performs a memory access to a block of data,
    determining at a compiler of the processing system whether compiler-generated code comprising a plurality of tiles comprises a sequence of instructions for a tile that can be replaced with a tensor operation executable at a special-purpose hardware accelerator of the processing system; and 
     generating code that replaces the sequence of instructions of the compiler-generated code with the tensor operation in response to determining that the sequence of instructions can be replaced with the tensor operation.

2. The method of claim 1, further comprising: generating a wrapper to invoke execution of the tensor operation at a special-purpose hardware accelerator of the processing system.
3. The method of claim 1, wherein generating code comprises: replacing an inner loop of the tile in response to dimensions and data types of a sequence of instructions of the inner loop matching dimensions and data types of a tensor operation executable by the special-purpose hardware accelerator.

4. The method of claim 1, further comprising: receiving a hint in the source code indicating that an inner loop of a tile is replaceable with a type of tensor operation; and in response to receiving the hint and in response to dimensions and data types of the sequence of instructions of the inner loop matching dimensions and a data type of a tensor operation of the type indicated by the hint, generating code replacing the inner loop with the type of tensor operation indicated by the hint.

5. The method of claim 1, further comprising: identifying as an imperfect tile a sequence of instructions of an inner loop of a tile that does not match dimensions and data types of a tensor operation executable by the special-purpose hardware accelerator.

6. The method of claim 5, further comprising: generating code to invoke a general-purpose processor of the processing system to execute the imperfect tile.

7. The method of claim 1, wherein the tensor operation is an aggregate instruction comprising a general matrix to matrix multiplication.

8. A method comprising: 
      responsive to receiving, at a compiler of a processing system, an indication that source code comprises a tile representing one or more functions of the source code that perform a memory access to a block of data,
     comparing an inner loop of the tile to tensor operations executable by a special-purpose hardware accelerator of the processing system; and
     in response to the inner loop of the tile matching a tensor operation executable by the special-purpose hardware accelerator, generating enhanced code that replaces the inner loop of the tile with the tensor operation to invoke the special-purpose hardware accelerator.

9. The method of claim 8, wherein the inner loop comprises a first sequence of instructions that matches dimensions and data types of the tensor operation.

10. The method of claim 9, wherein the tile further comprises a second sequence of instructions that does not match dimensions and data types of the tensor operation.

11. The method of claim 10, further comprising generating code to invoke a general-purpose processor of the processing system to execute the second sequence of instructions.

12. The method of claim 9, further comprising: receiving a hint in the source code indicating that an inner loop of a tile is replaceable with a type of tensor operation; and generating code replacing the inner loop with the type of tensor operation indicated by the hint.
13. The method of claim 12, wherein comparing comprises: in response to receiving the hint, comparing dimensions and a data type of the first sequence of instructions with a subset of tensor operations executable by a special-purpose hardware accelerator of the processing system specified by the type of tensor operation indicated by the hint.

14. A processing system comprising:
a special-purpose hardware accelerator; and 
a processor configured to:
   in response to receiving an indication that compiler-generated code executing at the processing system is tileable, determine whether a portion of the compiler-generated code for a tile comprises a sequence of instructions that can be replaced with a tensor operation;
     receive a hint in the compiler-generated code indicating that an inner loop of a tile is replaceable with a type of tensor operation; and 
      replace the portion of the compiler-generated code with the tensor operation in response to determining that the portion can be replaced with the tensor operation.

15. The processing system of claim 14, wherein the special-purpose hardware accelerator is configured to execute one or more tensor operations.

16. The processing system of claim 15, wherein the processor is further to:
  compare dimensions and a data type of the sequence of instructions with tensor operations executable by the special-purpose hardware accelerator; and 
    replace at least one inner loop in response to the dimensions and data types of the sequence of instructions of the at least one inner loop that match dimensions and data types of a tensor operation executable by the special-purpose hardware accelerator.

17. The processing system of claim 16, wherein the processor is further to: execute sequences of instructions of the at least one inner loop that do not match dimensions and data types of tensor operations executable by the special-purpose hardware accelerator.

(Excerpt of claim 14, shown opposite claim 18 of the current application:)
14. A processing system comprising:…
a special-purpose hardware accelerator; and 
a processor configured to:…

     receive a hint in the compiler-generated code indicating that an inner loop of a tile is replaceable with a type of tensor operation; 
…
 be replaced with the tensor operation.

18. The processing system of claim 14, wherein the processor is further configured to: in response to receiving the hint, compare dimensions and a data type of the sequence of instructions with a subset of tensor operations executable by a special-purpose hardware accelerator of the processing system specified by the type of tensor operation indicated by the hint.

19. The processing system of claim 14, wherein the tensor operation is an aggregate instruction comprising a general matrix to matrix multiplication.



The comparison in the above table indicates that, although claims 1-19 of U.S. Patent No. ‘486 are not identical to claims 1-20 of the current examined application, they are not patentably distinct from them, since claims 1-19 of U.S. Patent No. ‘486 read on claims 1-20 of the current examined application.
Accordingly, claims 1-20 of the current examined application are not patentably distinct from claims 1-19 of U.S. Patent No. ‘486 and as such are unpatentable for anticipation-type double patenting.
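For illustration only, and not as a characterization of the claim scope or of any reference, the dimension and data-type matching recited in the compared claims (for example, claims 3, 5, 6, and 13 of the current application) might be sketched as follows. Every name here (TensorOp, InnerLoop, ACCELERATOR_OPS, lower_inner_loop) and every shape and data type is a hypothetical assumption introduced for this sketch; none comes from the application, the ‘486 patent, or the cited art.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class TensorOp:
    """A tensor operation the (hypothetical) accelerator can execute."""
    kind: str                    # type of tensor operation, e.g. "gemm"
    dims: Tuple[int, int, int]   # assumed (M, N, K) shape of the operation
    dtype: str                   # assumed element data type

@dataclass(frozen=True)
class InnerLoop:
    """Summary of a tile's inner loop as a compiler might record it."""
    dims: Tuple[int, int, int]
    dtype: str
    hint: Optional[str] = None   # optional hint naming a type of tensor operation

# Hypothetical table of operations supported by the accelerator.
ACCELERATOR_OPS = (
    TensorOp("gemm", (16, 16, 16), "float16"),
    TensorOp("gemm", (8, 8, 4), "float64"),
    TensorOp("conv", (16, 16, 16), "float16"),
)

def lower_inner_loop(loop, ops=ACCELERATOR_OPS):
    """Return ("accelerator", op) on a match, else ("general_purpose", None)."""
    # With a hint, compare only against the subset of operations of that type.
    candidates = [op for op in ops if loop.hint is None or op.kind == loop.hint]
    for op in candidates:
        # Replace the inner loop only when dimensions and data type both match.
        if op.dims == loop.dims and op.dtype == loop.dtype:
            return ("accelerator", op)
    # No match: treat as an imperfect tile and leave it to the general-purpose processor.
    return ("general_purpose", None)
```

Under these assumptions, a loop with dimensions (16, 16, 16) and data type float16 maps to the first matching table entry, a hint of "conv" narrows the comparison to that type, and a loop with dimensions (7, 16, 16) falls back to the general-purpose path.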
Claim Rejections - 35 USC § 103
11.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

12.	Claims 1, 2, 7, 8, 14, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bruestle et al. (US 20180107456 A1, hereinafter Bruestle) in view of Gan et al., "Tile Percolation: an OpenMP Tile Aware Parallelization Technique for the Cyclops-64 Multicore Processor," European Conference on Parallel Processing, Lecture Notes in Computer Science, vol. 5704. Springer, Berlin, Heidelberg, pp 839-850, University of Delaware, Newark, Delaware 19716, U.S.A., 2009, hereinafter Gan.
As to claim 1, Bruestle discloses a method comprising: 
in response to receiving source code, to be compiled at a processing system, that is tileable, determining at a compiler of the processing system whether compiler-generated code comprising a plurality of tiles comprises a sequence of instructions for a tile that can be replaced with a tensor operation executable at a special-purpose hardware accelerator of the processing system (e.g., the compiler 106 comprises a receiving software component 108 configured to receive source code 216 for analysis of constraints on TILE operations, the constraints including that the index variables access a valid entry within a tensor, and to transform the TILE by rewriting the TILE operation; see at least 0018, 0023, 0026, 0032, 0033, 0038-0042, 0093, Fig. 1, and Fig. 2); and
generating code that replaces the sequence of instructions of the compiler-generated code with the tensor operation in response to determining that the sequence of instructions can be replaced with the tensor operation (e.g., the code generation software component 112 generates computer-executable instructions for the rewritten TILE, within a valid tensor, for execution by a target hardware platform 102; see at least 0017, 0030).
It is noted that Bruestle discloses, in response to receiving source code, to be compiled at a processing system, that is tileable, determining at a compiler of the processing system whether compiler-generated code comprising a plurality of tiles comprises a sequence of instructions for a tile that can be replaced with a tensor operation executable at a special-purpose hardware accelerator of the processing system (e.g., the compiler 106 comprises a receiving software component 108 configured to receive source code 216 for analysis of constraints on TILE operations, the constraints including that the index variables access a valid entry within a tensor, and to transform the TILE by rewriting the TILE operation; see at least 0018, 0023, 0026, 0032, 0033, 0038-0042, 0093, Fig. 1, and Fig. 2), but Bruestle does not explicitly disclose receiving an indication within the source code.
However, Gan, in an analogous art, discloses a compiler receiving an indication within source code for generating code for a hardware platform, as follows:
– (e.g., The format of the tile descriptor is similar to the declaration of a multi-dimensional array variable, except that each of the tile descriptor’s dimension specifier is a 3-tuple, not a singleton. The tile descriptor tells the compiler how the data tile is carved out from the multi-dimensional data array that hosts it. To make the paper easy to follow, we call the multi-dimensional data array that hosts the current data tile as its host array. The tile descriptor contains the complete information of the host array. Therefore, the number of dimension specifiers in the tile descriptor is the same as the dimension of the host array. It is not necessarily the same as the dimension of the data tile… The tile descriptor functions like a template and the associated for loops instantiate this template. To make the code generation easy, currently, a writable tile descriptor (specified in the rw or the wo clause) can only has one instantiation. The read-only tiles (specified in the ro clause) can have multiple instantiations. Example is given in Figure 5. To put it in a simple way, roughly, the percolate directive and the tile directive tell the compiler where the data tiles will be percolated and the tile descriptors tell the compiler how the data tiles are percolated… In this paper, we have proposed a semi-automatic approach to data movement code generation. This novel approach is termed as tile percolation. It provides the programmers with a set of OpenMP-like directives. The programmers can annotate the programs with these directives to tell the compiler where and how data movement should be performed. Accordingly, the compiler will generate the optimized data movement code and the correct computation code based on the information provided in the tile percolation directives. 
That way, the programmers can save themselves from writing tedious and error-prone data movement code.” – See Gan, at least page 7, paragraphs 1-3, Figures 5-6, page 7, paragraphs 2-3 to page 8, paragraph 1, and page 12, paragraph 3).

	Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Gan’s teaching of annotation with the tile descriptor directive into the compiler of Bruestle, to further allow the compiler to generate optimized code for the target hardware platform and to avoid tedious and error-prone code, as seen in Gan (e.g., page 12, paragraph 3).
As to claim 2, Bruestle as modified by Gan discloses the method further comprising: generating a wrapper to invoke execution of the tensor operation at a special-purpose hardware accelerator of the processing system (e.g., incorporating Gan’s teaching of annotation with the tile descriptor directive, including tile size (see Gan, at least page 7, paragraphs 1-3, Figures 5-6, page 7, paragraphs 2-3 to page 8, paragraph 1, and page 12, paragraph 3), into the compiler of Bruestle, to further allow the compiler to generate optimized code for the target hardware platform and to avoid tedious and error-prone code, as seen in Gan (e.g., page 12, paragraph 3)).
As to claim 7, Bruestle as modified by Gan discloses wherein the tensor operation is an aggregate instruction comprising a general matrix to matrix multiplication (see Bruestle, at least 0032-0033).
As to claim 8, Bruestle discloses a method comprising: 
receiving, at a compiler of a processing system, source code that comprises a tile; comparing an inner loop of the tile to tensor operations executable by a special-purpose hardware accelerator of the processing system (e.g., the compiler 106 comprises a receiving software component 108 configured to receive source code 216 for analysis of constraints on TILE operations, the constraints including that the index variables access a valid entry within a tensor, and to transform the TILE by rewriting the TILE operation; see at least 0018, 0023, 0026, 0032, 0033, 0038-0042, 0093, Fig. 1, and Fig. 2); and
in response to the inner loop of the tile matching a tensor operation executable by the special-purpose hardware accelerator, generating enhanced code that replaces the inner loop of the tile with the tensor operation to invoke the special-purpose hardware accelerator (e.g., the code generation software component 112 generates computer-executable instructions for the rewritten TILE, within a valid tensor, for execution by a target hardware platform 102; see at least 0017, 0030).
It is noted that Bruestle does not explicitly disclose receiving an indication within the source code. However, Gan, in an analogous art, discloses receiving, at a compiler of a processing system, an indication within source code, in response to the inner loop of the tile matching a tensor operation executable by a hardware platform, as follows:
	(e.g., The format of the tile descriptor is similar to the declaration of a multi-dimensional array variable, except that each of the tile descriptor’s dimension specifier is a 3-tuple, not a singleton. The tile descriptor tells the compiler how the data tile is carved out from the multi-dimensional data array that hosts it. To make the paper easy to follow, we call the multi-dimensional data array that hosts the current data tile as its host array. The tile descriptor contains the complete information of the host array. Therefore, the number of dimension specifiers in the tile descriptor is the same as the dimension of the host array. It is not necessarily the same as the dimension of the data tile… The tile descriptor functions like a template and the associated for loops instantiate this template. To make the code generation easy, currently, a writable tile descriptor (specified in the rw or the wo clause) can only has one instantiation. The read-only tiles (specified in the ro clause) can have multiple instantiations. Example is given in Figure 5. To put it in a simple way, roughly, the percolate directive and the tile directive tell the compiler where the data tiles will be percolated and the tile descriptors tell the compiler how the data tiles are percolated… In this paper, we have proposed a semi-automatic approach to data movement code generation. This novel approach is termed as tile percolation. It provides the programmers with a set of OpenMP-like directives. The programmers can annotate the programs with these directives to tell the compiler where and how data movement should be performed. Accordingly, the compiler will generate the optimized data movement code and the correct computation code based on the information provided in the tile percolation directives. 
That way, the programmers can save themselves from writing tedious and error-prone data movement code.” – See Gan, at least page 7, paragraphs 1-3, Figures 5-6, page 7, paragraphs 2-3 to page 8, paragraph 1, and page 12, paragraph 3).

	Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Gan’s teaching of annotation with the tile descriptor directive, as a template including tile size within a loop, into the compiler of Bruestle, to further allow the compiler to generate optimized code for the target hardware platform and to avoid tedious and error-prone code, as seen in Gan (e.g., page 12, paragraph 3).
As to claim 14, Bruestle discloses a processing system (server 206, Fig. 2) comprising: a special-purpose hardware accelerator (accelerator 216); and a processor (processor 206a – see at least 0024, Figure 2, and associated text) configured to:
in response to that compiler-generated code executing at the processing system is tileable, determine whether a portion of the compiler-generated code for a tile comprises a sequence of instructions that can be replaced with a tensor operation -- (e.g., the compiler 106 comprises a receiving software component 108 configured to receive source code 216 for analysis of constraints on TILE operations, the constraints including that the index variables access a valid entry within a tensor, and to transform the TILE operation by rewriting it – see at least 0018, 0023, 0026, 0032, 0033, 0038-0042, 0093, Fig. 1, and Fig. 2); and
replace the portion of the compiler-generated code the tensor operation in response to determining that the portion can be replaced with the tensor operation -- (e.g., the code generation software component 112 generates computer-executable instructions for the rewritten TILE operation within the valid tensor for execution – see at least 0017, 0030).
It is noted that, while Bruestle does not explicitly disclose the indication, Gan, in an analogous art, discloses receiving an indication by a compiler for generating code on a hardware platform device, as follows:
– (e.g., The format of the tile descriptor is similar to the declaration of a multi-dimensional array variable, except that each of the tile descriptor’s dimension specifier is a 3-tuple, not a singleton. The tile descriptor tells the compiler how the data tile is carved out from the multi-dimensional data array that hosts it. To make the paper easy to follow, we call the multi-dimensional data array that hosts the current data tile as its host array. The tile descriptor contains the complete information of the host array. Therefore, the number of dimension specifiers in the tile descriptor is the same as the dimension of the host array. It is not necessarily the same as the dimension of the data tile… The tile descriptor functions like a template and the associated for loops instantiate this template. To make the code generation easy, currently, a writable tile descriptor (specified in the rw or the wo clause) can only has one instantiation. The read-only tiles (specified in the ro clause) can have multiple instantiations. Example is given in Figure 5. To put it in a simple way, roughly, the percolate directive and the tile directive tell the compiler where the data tiles will be percolated and the tile descriptors tell the compiler how the data tiles are percolated… In this paper, we have proposed a semi-automatic approach to data movement code generation. This novel approach is termed as tile percolation. It provides the programmers with a set of OpenMP-like directives. The programmers can annotate the programs with these directives to tell the compiler where and how data movement should be performed. Accordingly, the compiler will generate the optimized data movement code and the correct computation code based on the information provided in the tile percolation directives. 
That way, the programmers can save themselves from writing tedious and error-prone data movement code.” – See Gan, at least page 7, paragraphs 1-3, Figures 5-6, page 7, paragraphs 2-3 to page 8, paragraph 1, and page 12, paragraph 3).

	Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Gan’s teaching of annotating programs with tile descriptor directives into the compiler of Bruestle, thereby allowing the compiler to generate optimized code for the target hardware platform and avoiding tedious and error-prone code, as seen in Gan (e.g., page 12, paragraph 3).
As to claim 15, Bruestle as modified by Gan discloses wherein the special-purpose hardware accelerator is configured to execute one or more tensor operations (see Bruestle, at least 0017, 0030).
	As to claim 20, Bruestle as modified by Gan discloses wherein the tensor operation is an aggregate instruction comprising a general matrix to matrix multiplication (see Bruestle, at least 0032-0033).
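By way of illustration only, the kind of replacement recited in claims 14 and 20 can be sketched as follows: a scalar loop nest over one tile (a sequence of instructions) is exchanged for a single aggregate matrix-to-matrix multiplication. The sketch is a simplified assumption and is taken from neither Bruestle nor Gan; the names `tile_matmul_loops`, `gemm`, and `select_implementation` are hypothetical, and `gemm` merely emulates in software what an accelerator would execute as one tensor operation.

```python
# Illustrative sketch only. All names are hypothetical and appear in
# neither Bruestle nor Gan.

def tile_matmul_loops(a, b):
    """Scalar loop nest over one tile: the 'sequence of instructions'."""
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                c[i][j] += a[i][k] * b[k][j]
    return c


def gemm(a, b):
    """Stand-in for one aggregate tensor operation (matrix-matrix multiply).

    On real hardware this would be a single accelerator call; here it is
    emulated in plain Python so the sketch stays self-contained.
    """
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def select_implementation(is_matmul_pattern):
    """Use the aggregate operation only when the tile's loop nest has been
    recognized as a matrix multiplication; otherwise keep the loops."""
    return gemm if is_matmul_pattern else tile_matmul_loops


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
loops_result = tile_matmul_loops(a, b)
gemm_result = select_implementation(True)(a, b)
```

Both paths produce the same tile result, which is the property that makes such a replacement safe; whether a given loop nest qualifies is the determination step recited in claim 14.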
Allowable Subject Matter 
13.	Claims 3-6, 9-13, 16-17, and 19 are objected to as being dependent upon rejected base claims 1, 8, and 14, respectively, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and further rewritten or amended to overcome the objection (i.e., to claim 14) and the double patenting rejection set forth in this Office action.
Conclusion
14.	The prior art made of record and not relied upon (cited on the 892 form) is considered pertinent to applicant’s disclosure.
	Schmidt (US 20140237460 A1) discloses an optimizing compiler that includes a vectorization mechanism that optimizes a computer program.

15.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARINA LEE whose telephone number is (571)270-1648.  The examiner can normally be reached on Monday to Friday (8 am to 4:30 pm).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hyung S. Sough, can be reached at (571)-272-6799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MARINA LEE/Primary Examiner, Art Unit 2192