Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

				DETAILED ACTION
1.	This initial office action is based on the application filed on September 29, 2017, in which claims 1-20 have been presented for examination.

Status of Claim
2.	Claims 1-20 are pending in the application and have been examined below, of which, claims 1, 11 and 16 are presented in independent form.

Priority
3.	No priority document has been filed.

Information Disclosure Statement
4.	The information disclosure statement (IDS) was submitted on 9/29/2017.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Examiner Notes


 Claim Objections
6.	Claim 14 recites the limitation “the hardware design” in line 2.  There is insufficient antecedent basis for this limitation in the claim.
Claim 19 recites the limitation “the profiling blocks” in lines 3-4.  There is insufficient antecedent basis for this limitation in the claim.
Claim 19 recites the limitation “the information” in lines 3-4.  There is insufficient antecedent basis for this limitation in the claim.
Claim 19 does not further limit the subject matter of claim 18.
Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



Claims 17-20 are also rejected under 35 U.S.C. 101, since they depend on claim 16.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

8.	Claims 1-2 and 5-10 are rejected under 35 U.S.C. 103 as being unpatentable over Schumacher et al. (US Patent No. 10,380,313 B1 – hereinafter Schumacher) in view of Skyler Windh (High-Level Language Tools for Reconfigurable Computing, 2015 – hereinafter Windh).

Regarding claim 1. 
Schumacher discloses 
A method of using circuit design tools running on computing equipment to implement an integrated circuit (Fig. 1, design tools and col. 1, lines 37-44), the method comprising:
receiving a source code (compilation 120 and 125 receiving host source code 110 and kernel source code 115); 
compiling the source code to generate a hardware description for the integrated circuit (compilation 125 compiles the kernel source code 115. Compiled kernels 135 may be hardware accelerated as kernel circuits implemented within an IC operating as a device of the host processor – See Fig. 1, col. 4, lines 28-54), wherein the hardware description includes profiling blocks; 
simulating the hardware description so that the profiling blocks can gather profiling data (compiled design 108 is run by simulating compiled design 108 using a data processing system.  For example, compiled host 130 may be executed by a host processor or a data processing system such as a simulator adapted to simulate design 108 as if implemented in the HC platform.  In that case, block 125 is capable of generating compiled kernels 135 as executable program code, e.g., object code.  The compiled kernels 135 are adapted to model behavior of the kernel(s) as if hardware accelerated as kernel circuits – See Fig. 1, blocks 145 and 150; Col. 4, lines 54-67 and Col. 5, lines 1-7); 
Schumacher does not disclose
analyzing the profiling data to identify opportunities for hardware optimization;
updating the source code based on the identified opportunities for hardware optimization to generate a smaller hardware design for the integrated circuit.
Windh discloses
analyzing the profiling data to identify opportunities for hardware optimization (LegUp provides a built in profiler to help identify computation intensive code regions that are strong candidates for hardware acceleration – See page 398, right column. Simple change to access pattern and order of calculations makes a significant difference for the CPU optimization opportunities – See page 400 and Fig. 11); and 
updating the source code based on the identified opportunities for hardware optimization to generate a smaller hardware design for the integrated circuit (Simple change to access pattern and order of calculations makes a significant difference for the CPU optimization opportunities. (a) Dilation code in C. Same code was used for the ROCCC compilation; (b) optimized dilation code in C. These optimization, along with data reuse, are applied by the ROCCC compiler reducing the number of memory reads (Table 5). – See page 400 and Fig. 11).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Windh’s teaching into Schumacher’s invention because doing so would enhance Schumacher by guiding optimizations that can be defined directly in the source code, as suggested by Windh (page 394, right column).

Regarding claim 2, the method of claim 1, 
Schumacher discloses
wherein the profiling blocks are configured to monitor data path usage (kernel write utilization of 0.024% on PCIE is low, improve kernel data path or memory read efficiency – See Fig. 4).

Regarding claim 5, the method of claim 1, 
Windh discloses
wherein the profiling blocks are configured to monitor communications channel usage (Design choices like the softcore processor and the communication bus are tightly coupled to these specific boards – See page 397, right column and page 398, left column).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Windh’s teaching into Schumacher’s invention because doing so would enhance Schumacher by enabling optimization of memory channels to better utilize the available bandwidth, as suggested by Windh (page 403, right column).

Regarding claim 6, the method of claim 1, 
Schumacher discloses
wherein the profiling blocks are configured to compute branch probabilities (kernel write utilization of 0.042% on PCIE is low – See Fig. 4).

Regarding claim 7, the method of claim 1, 
Schumacher discloses
wherein the profiling blocks are non-intrusively inserted for simulation purposes only (block 125 may instrument the kernels with diagnostic program code that executes as part of compiled kernel(s) 135 to generate profile data 150.  Data collected by execution of design 108 using a simulator, for example, may be stored as profile data 150 in memory – See Col. 5, lines 8-19).

Regarding claim 8, the method of claim 1,
Schumacher discloses
wherein simulating the hardware description comprises simulating the hardware description using a representative set of inputs (compiled design 108 is run by simulating compiled design 108 using a data processing system.  For example, compiled host 130 may be executed by a host processor or a data processing system such as a simulator adapted to simulate design 108 as if implemented in the HC platform – See Fig. 1, Col. 4, lines 61-67 and Col. 5, lines 1-6).

Regarding claim 9, the method of claim 1, 
Windh discloses
further comprising:
using heuristic algorithms to identify useful information required for additional hardware optimizations (This last phase attempts to solve an NP-complete problem using heuristics, such as simulated annealing, and may take hours or days to complete depending on the size of the circuit relative to the device size as well as the timing constraints imposed by the user --page 390 and 391. The impact of various 
gathering the useful information using the profiling blocks (LegUp provides a built in profiler to help identify computation intensive code regions that are strong candidates for hardware acceleration – See page 398, right column).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Windh’s teaching into Schumacher’s invention because doing so would enable Schumacher to use a matching heuristic to handle the binding problem, as suggested by Windh (page 398, right column).

Regarding claim 10, the method of claim 1, further comprising:
Schumacher discloses
presenting a user with an opportunity to approve the updating of the source code (the "Not Recommended" portion and the "Recommended" portions of source code may be provided as general examples that are not specific to the user's actual design.  The examples of recommended and not recommended source code illustrate that vector processing should be used.  As noted, in one aspect, the system may provide the guidance option of FIG. 7 and/or the guidance options of FIGS. 4-6 based upon compliance of the design with the profile rules, e.g., which of the profile rules are not being met or complied with… – See Col. 12, line 36-67 and Col. 13, lines 1-30).

9.	Claims 3-4 are rejected under 35 U.S.C. 103 as being unpatentable over Schumacher and Windh as applied to claim 1 above, and further in view of Sadooghi-Alvandi et al. (US Patent No. 9,529,950 B1 – IDS filed on 9/29/2017 – hereinafter Sadooghi).

Regarding claim 3, the method of claim 1, 
Sadooghi discloses 
wherein the profiling blocks are configured to identify memory loop dependencies (if desired, engine 112 may perform range analysis operations on the emulated device to perform aggressive memory dependence removal (e.g., removal of assumed/conservative code dependencies) – See Col. 11, lines 41-51).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Sadooghi’s teaching into Schumacher’s and Windh’s inventions because doing so would enable Schumacher and Windh to adjust the logic design for the device, as suggested by Sadooghi (Col. 11, lines 41-51).

Regarding claim 4, the method of claim 1, 
Sadooghi discloses
wherein the profiling blocks are configured to monitor memory interface behavior (Engine 72 may monitor the emulated performance of the logic design and may generate emulated profile data 82 that includes performance metric data 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Sadooghi’s teaching into Schumacher’s and Windh’s inventions because doing so would enable Schumacher and Windh to generate performance metric data characterizing the performance/behavior of the emulated logic design, as suggested by Sadooghi (Col. 9, lines 1-23).

10.	Claims 11 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Schumacher et al. (US Patent No. 10,380,313 B1 – hereinafter Schumacher) in view of Koh et al. (US Patent No. 9,442,696 B1 – hereinafter Koh).

Regarding claim 11. 
Schumacher discloses 
An integrated circuit (an integrated circuit (IC) – See paragraph [0004]), comprising:
circuitry having a given hardware design (A circuit design specifying the kernel circuit, for example, may be loaded into a programmable IC thereby implementing the kernel as a kernel circuit in hardware. – See Col. 1, lines 37-52); and 
a profiling block that is coupled to the circuitry (Block 125 is also capable of including monitor circuitry within compiled kernels 135.  The monitor circuitry collects and/or processes data from operation of compiled design 108 to generate profile data 150.  For example, the monitor circuitry may capture start and stop times of kernel circuit operation, information about data transfers to and from kernel circuits, bus transactions, and so forth – See Col. 4, lines 54-60) and that is used to gather data on the circuitry (run compiled design 145 and profile data 155), 
Schumacher does not disclose
wherein the given hardware design of the circuitry is updated based on the data gathered by the profiling block.
Koh discloses
wherein the given hardware design of the circuitry is updated based on the data gathered by the profiling block (Based on the profiling results, the user may change the co-simulation static or dynamic allocation scheme to improve execution efficiency of the code.  For example, improved execution speed may include, but is not limited to, increasing execution speed, minimizing memory usage, minimizing power consumption, improving load distribution across computational devices, minimizing power consumption, minimizing communication among the computational devices, minimizing latency at various computing devices, etc. For example, a user may change an allocation scheme to achieve faster execution speeds for an application and/or to better meet application design constraints. – See Col. 9, lines 35-58 and col. 11, lines 39-67 and col. 12, lines 1-5).


Regarding claim 13, the integrated circuit of claim 11, 
Koh discloses
wherein the updated hardware design is faster than the given hardware design (a user may change an allocation scheme to achieve faster execution speeds for an application and/or to better meet application design constraints – See Col. 9, lines 35-58).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Koh’s teaching into Schumacher’s invention because doing so would enable Schumacher to change the co-simulation static or dynamic allocation scheme to improve execution efficiency of the code, as suggested by Koh (Col. 9, lines 35-58).

Regarding claim 14, the integrated circuit of claim 11, 
Schumacher discloses
wherein the hardware design is verified by feeding a representative set of input values to the circuitry (the IC used for hardware acceleration may provide set infrastructure to which the hardware accelerated kernel may couple.  This infrastructure 

Regarding claim 15, the integrated circuit of claim 11, 
Koh discloses 
further comprising:
an in-hardware verification (the user may change the co-simulation static or dynamic allocation scheme to improve execution efficiency of the code – See Col. 9, lines 35-58) and profile-guided hardware optimization circuit that is configured to optimize the given hardware design to generate the updated hardware design (a user may change an allocation scheme to achieve faster execution speeds for an application and/or to better meet application design constraints – See Col. 9, lines 35-58).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Koh’s teaching into Schumacher’s invention because doing so would enable Schumacher to improve execution efficiency of the code through the dynamic allocation scheme, as suggested by Koh (Col. 9, lines 35-58).

Regarding claim 16.
Schumacher discloses
A non-transitory computer-readable storage medium comprising instructions for:
receiving a software code for implementing an integrated circuit (receive kernel source code – See Fig. 1, block 115); 
compiling the software code to output a hardware code (compilation 125 compiles kernel source code 115 – See Fig. 1; compiled host 130 may be executed by the host processor of the HC platform.  Compiled kernels 135 may be hardware accelerated as kernel circuits implemented within an IC operating as a device of the host processor – See col. 4, lines 40-54); 
simulating the hardware code to obtain simulation results (compiled design 108 is run by simulating compiled design 108 using a data processing system.  For example, compiled host 130 may be executed by a host processor or a data processing system such as a simulator adapted to simulate design 108 as if implemented in the HC platform.  In that case, block 125 is capable of generating compiled kernels 135 as executable program code, e.g., object code.  The compiled kernels 135 are adapted to model behavior of the kernel(s) as if hardware accelerated as kernel circuits – See Fig. 1, block 108, 145 and col. 4, lines 61-67 and col. 5, lines 1-7.  Block 120 may instrument compiled host 130 in order to generate profile data 150 relating to operation of compiled host 130.  Thus, profile data 150 may include data relating to operation of the kernel portion of design 105 and/or operation of the host portion of design 105 – Col. 5, lines 8-19); and 
annotating the software code based on the simulation results (the system is capable of listing the portion, or portions, of the source code found to violate a particular profile rule and highlight the portion that is found to violate the profile rule – See Col. 11, lines 63-67 and Col. 12, lines 1-6.  Data collected by execution of design 108 using a simulator, for example, may be stored as profile data 150 in memory – See Col. 5, lines 7-19).
Schumacher does not disclose
wherein the annotated software code implements a faster and smaller design for the integrated circuit.
Koh discloses
wherein the annotated software code implements a faster and smaller design for the integrated circuit (The execution efficiency of the code may include, but is not limited to, increasing execution speed, minimizing memory usage, minimizing power consumption, improving load distribution across computational devices, minimizing power consumption, minimizing communication among the computational devices, etc. For example, the user may change the allocation scheme of application components 752, 754, 756 to computational devices 706, 708, 710, 712 using user interface 724 via input device 722 for faster execution or to better meet application design constraints … execution-time statistics may be provided in the co-simulation design environment and back-annotated to corresponding application components – See Col. 15, lines 64-67 and 1-12).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Koh’s teaching into Schumacher’s invention .

11.	Claims 12 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Schumacher and Koh as applied to claims 11 and 16 respectively above, and further in view of Skyler Windh (High-Level Language Tools for Reconfigurable Computing, 2015 – hereinafter Windh).

Regarding claim 12, the integrated circuit of claim 11, 
Windh discloses
wherein the updated hardware design uses fewer logic resources than the given hardware design (on a CPU…the ability to configure local customized storage on the FPGA makes it possible to reduce the number of memory accesses, mostly reads, by reusing already fetched data resulting in a more efficient use of the memory bandwidth and lower energy consumption per task– See page 392, left column).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Windh’s teaching into Schumacher’s and Koh’s inventions because doing so would enable Schumacher and Koh to reduce the number of memory accesses, as suggested by Windh (page 392, left column).

Regarding claim 17, the non-transitory computer-readable storage medium of claim 16, 
Windh discloses
further comprising instructions for:
using heuristic algorithms to identify potentially useful information required for more aggressive hardware optimizations (Similar to several other tools, LegUp is based on the LLVM compiler framework. The impact of various LLVM optimizations on the performance of the generated hardware structures is explored in [41]. Extra passes are added to LLVM for HLS and work in three phases: allocation, scheduling, and binding. The allocation stage determines the available hardware based on the target architecture and manages the application’s constraints like clock speed and power consumption…matching heuristic is used to handle the binding problem – See page 398, right column).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Windh’s teaching into Schumacher’s and Koh’s inventions because doing so would enable Schumacher and Koh to use a matching heuristic to handle the binding problem, as suggested by Windh (page 398, right column).

Regarding claim 18, the non-transitory computer-readable storage medium of claim 17, 
Schumacher discloses
inserting non-intrusive profiling hooks into the software code to gather the potentially useful information (block 125 may instrument the kernels with diagnostic program code that executes as part of compiled kernel(s) 135 to generate profile data 150.  Data collected by execution of design 108 using a simulator, for example, may be stored as profile data 150 in memory – See Col. 5, lines 8-19).

Regarding claim 19, the non-transitory computer-readable storage medium of claim 18, further comprising instructions for:
Koh discloses
analyzing the information gathered by the profiling blocks to identify additional opportunities for hardware optimization (allow profiling results, i.e. run-time statistics or execution-time statistics of various static and dynamic allocation schemes to be considered and used.  For example, relevant run-time or execution-time statistics, such as computational load, observed latency, memory usage, power consumption, etc., may be streamed back to the co-simulation design environment from an HTE in real time, (i.e. while the code for the application components is executing on the HTE). – See col. 9, lines 26-57).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Koh’s teaching into Schumacher’s invention because doing so would enable Schumacher to change the dynamic allocation scheme to improve execution efficiency of the code based on profiling results, as suggested by Koh (col. 9, lines 26-57).

Regarding claim 20, the non-transitory computer-readable storage medium of claim 19, 
Koh discloses
further comprising instructions for:
allowing a user to approve a suggested hardware optimization selected from the additional opportunities for hardware optimization (the user may change dynamic allocation scheme …improving load distribution across computing devices…better meet application design constraints– col. 9, lines 26-57).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Koh’s teaching into Schumacher’s invention because doing so would enable Schumacher to let the user change the dynamic allocation scheme, as suggested by Koh (col. 9, lines 26-57).

Conclusion
12.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Kachmar et al. (US Patent No. 9,569,179 B1) discloses the method may include determining, by the device and based on the profiling information, that the performance criteria is not satisfied.  The method may include applying, by the device and based on determining that the performance criteria is not satisfied, a modification technique to a portion of the model to create a modified model.  The modification technique may be applied to the portion of the model to cause program code, generated based on the modified model, to satisfy the performance criteria – See Abstract.
Kadiyala et al. (US Pub. No. 2012/0185820 A1) discloses automatically generating a target profiler using a profiler generator; iteratively generating a new processor architecture by changing one or more parameters of the processor architecture until all user constraints or requirements are met using the generated target compiler, assembler, linker, simulator, and profiler; for each new processor architecture regenerating the target compiler, assembler, linker, simulator, profiler for the new processor architecture; and synthesizing an optimal generated processor architecture into a computer readable description of the custom integrated circuit for semiconductor fabrication – See Abstract.
Van Eijndhoven et al. (US Pub. No. 2012/0144376 A1) discloses a design point 3002 in the design space view 3001 exists.  During the transform process 2000, in particular as a result of the execution of optimization process 2300, new design alternatives 2399 are being generated.  Each of these alternatives is assigned a new design point 3002 and is added to the design space view 3001 – See paragraph [0251].
Aubury (US Pub. No. 2003/0140337 A1) discloses a source program is compiled to a platform-independent bytecode.  The program is executed.  Note that the program passes data implicitly using pointers.  Accesses to memory are traced for generating a trace.  The trace is analyzed.  Memory use profile data is generated based on the trace – See Abstract.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MONGBAO NGUYEN whose telephone number is (571)270-7180.  The examiner can normally be reached on Monday-Friday 8am-5pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hyung S. Sough, can be reached on 571-272-6799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MONGBAO NGUYEN/           Examiner, Art Unit 2192