Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

				DETAILED ACTION
Status of Claims
1.	Applicant’s amendment dated August 16th, 2021 responds to the Office Action dated April 14th, 2021, which provided the rejection of claims 1-20.
2.	Claims 16-20 are amended.
3.	Claims 1-20 are pending in the application, of which claims 1, 11 and 16 are in independent form and which have been fully considered by the examiner.
4.	Claim 20 is objected to.

Response to Amendments
5.	(A) Claim objections: The claim objections raised in the previous Office Action are withdrawn in view of Applicants’ amendments.
(B) Regarding the 101 rejection: The 101 rejection raised in the previous Office Action is withdrawn in view of Applicants’ amendments.
(C) Regarding the art rejection: With regard to claims 1-20, Applicants’ arguments are not persuasive; therefore, the rejections of claims 1-20 are maintained as set forth below.

6.	Answers to Applicants’ Arguments:

Applicants’ arguments have been fully considered but they are not persuasive.
a.	Argument:  Applicant argues:
The cited references, taken alone or in hypothetical combination, do not teach or suggest using profiling blocks that gather and analyze profiling data, and using the profiling data to identify opportunities for hardware optimization (e.g., generating a smaller hardware design for an integrated circuit). Instead, the cited references merely appear to discuss hardware acceleration (e.g., offloading software tasks for processing by hardware) or software optimization (e.g., making software more efficient) – Remarks, page 7.

Answer:  Examiner respectfully disagrees because:
1.	The teaching of Schumacher:
Schumacher discloses a circuit design specifying the kernel circuit, for example, may be loaded into a programmable IC thereby implementing the kernel as a kernel circuit in hardware – See Col. 1, lines 37-44 and profile blocks and profile data, profile rule checking and the profile rule can specify a design requirement for a hardware accelerated implementation of the kernel – See Fig. 1 and Col. 1, lines 56-65 and col. 2, lines 1-15.  Examiner respectfully notes that Schumacher discloses using profiling blocks that gather and analyze profiling data (The profile rule can specify a design requirement for a hardware accelerated implementation of the kernel – please see col. 2, lines 3-15; profile data that may be collected can include kernel read size (for data), 

2.	The teaching of Windh:
Windh discloses that there are several compiler optimizations that can be applied to OpenCL code: kernel vectorization, static memory coalescing, generating multiple compute units, and loop unrolling – please see page 395.  Windh further discloses that optimizations are applied in two locations: For hardware specific optimizations (e.g., pipelining) the user needs to create a .tcl script. Common software optimizations (e.g., loop-unrolling) can be passed directly to the LLVM compiler through the makefile.  LegUp then generates the necessary accelerators in Verilog and the application is recompiled to insert the necessary hardware calls.  LegUp provides a built in profiler to help identify computation intensive code regions that are strong candidates for hardware acceleration – See pages 398-399.  Therefore, Windh discloses using profiling blocks that gather and analyze profiling data, and using the profiling data to identify opportunities for hardware optimization.

3.	The teaching of Sadooghi-Alvandi:
Sadooghi-Alvandi discloses the first compiler may process the emulated profile data to identify optimizations to perform on the logic circuit and may compile an 

4.	The teaching of Koh:
Koh discloses the allocation of a component to a specific computational device may be changed multiple times while the application executes (i.e., the allocation may change over time) – See col. 4, lines 15-36. Koh further discloses a portion of the run-time or execution-time statistics may be provided to a user in various graphical and/or textual formats, if desired.  Based on the profiling results, the user may change the co-simulation static or dynamic allocation scheme to improve execution efficiency of the code.  For example, improved execution speed may include, but is not limited to, increasing execution speed, minimizing memory usage, minimizing power consumption, improving load distribution across computational devices, minimizing power consumption, minimizing communication among the computational devices, minimizing latency at various computing devices, etc. – See col. 9, lines 35-58.

A.	Regarding the hardware optimization – Remarks, page 8.  Examiner respectfully notes that independent claim 1 recites hardware optimization and Windh discloses hardware optimization (Optimizations are applied in two locations: For hardware specific optimizations and Common software optimizations – See page 398).  Independent claim 11 recites that the given hardware design of circuitry is updated and 

b.	Applicants argued:  
Schumacher and Windh do not teach or suggest simulating a hardware description using profiling blocks that gather and analyze profiling data, and using the profiling data to identify opportunities for hardware optimization, as generally recited in independent claim 1 – Remarks, page 9.

Answer:  Examiner respectfully disagrees because:
Schumacher discloses the profile data accessed from the memory with the profile rule.  The profile rule specifies a design requirement for a hardware accelerated implementation of the kernel – See col. 1, lines 56-67 and col. 2, lines 1-32 and col. 4, lines 54-67 and col. 5, lines 1-19.
Windh discloses AOC provides an OpenCL emulator to simulate the behaviour of the OpenCL kernel program. The emulated kernel is used as a dynamically linked C++ library that can be called from a host program – please see page 396.  Windh further discloses a built in profiler to help identify computation intensive code regions that are strong candidates for hardware acceleration… Optimizations are applied in two locations: For hardware specific optimizations and common software optimizations – See page 398.
simulating a hardware description using profiling blocks that gather (Schumacher teaches the compiled kernels 135 are adapted to model behavior of the kernel(s) as if hardware accelerated as kernel circuits.  Accordingly, the compiled kernels 135 are executable program code that may be executed by the data processing system as part of a simulation.  The simulator is capable of monitoring operation of the simulated elements of design 105 – See col. 4, lines 54-67 and col. 5, lines 1-19) and analyze profiling data (in evaluating the profile rule and comparing the profile rule to the profile data, may determine that the host processor transfers 1 MB of data to global memory in 1 kB chunks so that the kernel may access the data – col. 9, lines 4-47.  Profile data in a memory, wherein the profile data is generated from running the design for the HC platform and wherein the design includes a kernel adapted for hardware acceleration – See col. 16, lines 50-61 and col. 18, lines 3-19).  Schumacher discloses guidance options 165 provide instruction on optimizing design 105 to improve performance.  Guidance options 165 may be correlated with particular ones of profile rules 160, which may include the source code analysis rules – See col. 5, lines 63-67 and col. 6, lines 1-6.  Examiner respectfully notes that Schumacher discloses simulating a hardware description using profiling blocks that gather and analyze profiling data and using the profiling data to identify the optimizing design to improve performance.  Windh discloses we survey five High-Level Language tools for the development of FPGA programs: Xilinx Vivado, Altera OpenCL, BluespecBSV, ROCCC, and LegUp to provide an overview of their tool flow, the optimizations they provide, and a qualitative analysis of their hardware implementations of high level code – See abstract, page 390.  
Windh further discloses using the profiling data to identify opportunities for hardware optimization (a built in profiler to help identify the strong candidates for hardware acceleration…the entire design can be simulated to verify correctness/optimization of software and hardware).  Examiner further notes that 
c. KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 420, 82 USPQ2d 1385, 1397 (2007). This does not require that the reference be from the same field of endeavor as the claimed invention, in light of the Supreme Court's instruction that "[w]hen a work is available in one field of endeavor, design incentives and other market forces can prompt variations of it, either in the same field or a different one." Id. at 417, 82 USPQ2d 1396. Rather, a reference is analogous art to the claimed invention if: (1) the reference is from the same field of endeavor as the claimed invention (even if it addresses a different problem); or (2) the reference is reasonably pertinent to the problem faced by the inventor (even if it is not in the same field of endeavor as the claimed invention). See Bigio, 381 F.3d at 1325, 72 USPQ2d at 1212. See MPEP 2141.01(a).
Schumacher and Windh both disclose a built in profiler/profiling rule/data used to verify correctness of and optimize the design.  Schumacher discloses using profiling data to identify software optimization and Windh discloses using profiling data to identify opportunities for software and hardware optimization.  Therefore, as set forth in previous Office 
c.	Applicants argued:  
Schumacher and Koh do not teach or suggest updating a hardware design of a circuitry based on data gathered by a profile block coupled to the circuitry, as generally recited in independent claim 11 – Remarks, page 11.

Answer:  Examiner respectfully disagrees because:
Schumacher discloses the design includes a kernel adapted for hardware acceleration – See col. 2, lines 3-15.  The kernels are compiled into circuitry that is implemented within an IC.  A kernel circuit, referring to a circuit implementation of a kernel, is functionally equivalent to an executable version of the kernel.  A circuit design specifying the kernel circuit, for example, may be loaded into a programmable IC thereby implementing the kernel as a kernel circuit in hardware – See col. 1, lines 37-52 and col. 2, lines 3-32.  Block 125 is also capable of including monitor circuitry within compiled kernels 135.  The monitor circuitry collects and/or processes data from operation of compiled design 108 to generate profile data 150 – See col. 4, lines 54-60 and col. 5, lines 1-7.  
Koh discloses profiling results for DSP 708 indicate that the code coverage is 87% and the power consumption associated with FPGA 712 is 50 mW.  While modified allocation scheme 760 alters which application components 752, 754, 756 are executed on which computational devices 706, 708, 710, 712, modified allocation scheme 760 does not stop execution of application components 752, 754, 756.  Modifying mapping of application components 752, 754, 756 from allocation scheme 720 to allocation a profiling block that is coupled to the circuitry and that is used to gather data on the circuitry (Schumacher discloses the design includes a kernel adapted for hardware acceleration – See col. 1, lines 62-67 and Schumacher further discloses a memory adapted to store profile data generated from running the design for the HC platform, wherein the design includes a kernel adapted for hardware acceleration…profile rule –col. 2, lines 16-32). Koh discloses 
wherein the given hardware design of the circuitry is updated based on the data gathered by the profiling block (An instance mapper 156 and an architecture mapper 157 can create a simulation of the target application with simulator & validator 158 that can operate to measure the impact the patch will have on each of the affected software applications and hardware – See col. 10, lines 54-65. The allocation of a component to a specific computational device may be changed multiple times while the application executes (i.e., the allocation may change over time) – See col. 4, lines 15-35.  The allocation scheme of allocator 518 may be varied during the execution of application 508 – See Col. 12, lines 5-6.  Modifying an allocation scheme as illustrated in FIGS. 7A-7F provides improvements, such as but not limited to, improved load balancing on the HTE, faster execution of the application, improved processing 

d.	Applicant argued: 
Schumacher and Koh do not teach or suggest simulating hardware code to obtain simulation results and annotating the software code based on the simulation results to implement a faster and smaller design for an integrated circuit, as generally recited in independent claim 16 – Remarks, page 7.

Answer:
Examiner respectfully disagrees:
Schumacher discloses the simulator is capable of monitoring operation of the simulated elements of design 105 – See col. 4, lines 61-67 and col. 5, lines 1-5.  Data collected by execution of design 108 using a simulator, for example, may be stored as profile data 150 in memory – See col. 5, lines 8-19.  Schumacher further discloses the guidance option may provide a specific example or correction of source code of the design.  For example, the system is capable of listing the portion, or portions, of the source code found to violate a particular profile rule and highlight the portion that is found to violate the profile rule – See Col. 11, lines 63-67 and col. 12, lines 1-6.  Therefore, Schumacher discloses simulating hardware code to obtain simulation results (Data collected by execution of design 108 using a simulator) and annotating the software code based on the simulation results (source code found to violate a particular profile rule and highlight the portion that is found to violate the profile rule).  
Koh discloses the execution efficiency of the code may include, but is not limited to, increasing execution speed, minimizing memory usage, minimizing power consumption, improving load distribution across computational devices, minimizing power consumption, minimizing communication among the computational devices, etc. …execution-time statistics may be provided in the co-simulation design environment and back-annotated to corresponding application components — See Col. 15, lines 64-67 and 1-12.  Therefore, Koh discloses wherein the annotated software code implements a faster and smaller design for the integrated circuit (execution 
Schumacher discloses simulating hardware code to obtain simulation results, in that data is collected by execution of design 108 using a simulator, and annotating the software code based on the simulation results, in that the system lists the source code found to violate a particular profile rule and highlights the portion that is found to violate the profile rule.  Koh discloses simulating hardware code to obtain simulation results, in that Koh uses simulators of computational devices, including a hardware description language (HDL) simulator, SystemC simulator or distributed simulation – See col. 2, lines 59-67 and col. 3, lines 1-2; and annotation to improve the efficiency of the code, including increasing execution speed and minimizing memory usage.  Therefore, as set forth in the previous Office Action, the combination of Schumacher and Koh properly teaches the limitations of independent claim 16.

Examiner Notes
6.	Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

7.	Claims 1-2 and 5-10 are rejected under 35 U.S.C. 103 as being unpatentable over Schumacher et al. (US Patent No. 10,380,313 B1 – art of record – hereinafter Schumacher) in view of Skyler Windh (High-Level Language Tools for Reconfigurable Computing, 2015 – art of record – hereinafter Windh).

Regarding claim 1. 
Schumacher discloses 
A method of using circuit design tools running on computing equipment to implement an integrated circuit (Fig. 1, design tools and col. 1, lines 37-44), the method comprising:
receiving a source code (compilation 120 and 125 receiving host source code 110 and kernel source code 115); 
compiling the source code to generate a hardware description for the integrated circuit (compilation 125 compiles the kernel source code 115. Compiled kernels 135 may be hardware accelerated as kernel circuits implemented within an IC 
simulating the hardware description so that the profiling blocks can gather profiling data (compiled design 108 is run by simulating compiled design 108 using a data processing system.  For example, compiled host 130 may be executed by a host processor or a data processing system such as a simulator adapted to simulate design 108 as if implemented in the HC platform.  In that case, block 125 is capable of generating compiled kernels 135 as executable program code, e.g., object code.  The compiled kernels 135 are adapted to model behavior of the kernel(s) as if hardware accelerated as kernel circuits – See Fig. 1, blocks 145 and 150; Col. 4, lines 54-67 and Col. 5, lines 1-7); 
Schumacher does not disclose
analyzing the profiling data to identify opportunities for hardware optimization;
updating the source code based on the identified opportunities for hardware optimization to generate a smaller hardware design for the integrated circuit.
Windh discloses
analyzing the profiling data to identify opportunities for hardware optimization (we survey five High-Level Language tools for the development of FPGA programs: Xilinx Vivado, Altera OpenCL, BluespecBSV, ROCCC, and LegUp to provide an overview of their tool flow, the optimizations they provide, and a qualitative analysis of their hardware implementations of high level code – See abstract, page 390.  A built in profiler to help identify computation intensive code regions that are strong candidates for hardware acceleration – See page 398, right column.  The allocation stage 
updating the source code based on the identified opportunities for hardware optimization to generate a smaller hardware design for the integrated circuit (Simple change to access pattern and order of calculations makes a significant difference for the CPU optimization opportunities. (a) Dilation code in C. Same code was used for the ROCCC compilation; (b) optimized dilation code in C. These optimization, along with data reuse, are applied by the ROCCC compiler reducing the number of memory reads (Table 5). – See page 400 and Fig. 11.  Optimizations are applied in two locations: For hardware specific optimizations and software optimizations – See page 398).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Windh’s teaching into Schumacher’s invention because incorporating Windh’s teaching would enable Schumacher to guide optimizations that can be defined directly in the source code and to apply hardware specific optimizations and software optimizations as suggested by Windh (pages 394 and 398, right column).

Regarding claim 2, the method of claim 1, 
Schumacher discloses
wherein the profiling blocks are configured to monitor data path usage (kernel write utilization of 0.024% on PCIE is low, improve kernel data path or memory read efficiency – See Fig. 4).

Regarding claim 5, the method of claim 1, 
Windh discloses
wherein the profiling blocks are configured to monitor communications channel usage (Design choices like the softcore processor and the communication bus are tightly coupled to these specific boards – See page 397, right column and page 398, left column).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Windh’s teaching into Schumacher’s invention because incorporating Windh’s teaching would enable Schumacher to optimize memory channels to better utilize the available bandwidth as suggested by Windh (page 403, right column).

Regarding claim 6, the method of claim 1, 
Schumacher discloses
wherein the profiling blocks are configured to compute branch probabilities (kernel write utilization of 0.042% on PCIE is low – See Fig. 4).

Regarding claim 7, the method of claim 1, 
Schumacher discloses
wherein the profiling blocks are non-intrusively inserted for simulation purposes only (block 125 may instrument the kernels with diagnostic program code that executes as part of compiled kernel(s) 135 to generate profile data 150.  Data collected by execution of design 108 using a simulator, for example, may be stored as profile data 150 in memory – See Col. 5, lines 8-19).

Regarding claim 8, the method of claim 1,
Schumacher discloses
wherein simulating the hardware description comprises simulating the hardware description using a representative set of inputs (compiled design 108 is run by simulating compiled design 108 using a data processing system.  For example, compiled host 130 may be executed by a host processor or a data processing system such as a simulator adapted to simulate design 108 as if implemented in the HC platform – See Fig. 1, Col. 4, lines 61-67 and Col. 5, lines 1-6).

Regarding claim 9, the method of claim 1, 
Windh discloses
further comprising:
using heuristic algorithms to identify useful information required for additional hardware optimizations (This last phase attempts to solve an NP-complete problem using heuristics, such as simulated annealing, and may take hours or days to complete depending on the size of the circuit relative to the device size as well as the timing constraints imposed by the user --page 390 and 391. The impact of various 
gathering the useful information using the profiling blocks (LegUp provides a built in profiler to help identify computation intensive code regions that are strong candidates for hardware acceleration – See page 398, right column).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Windh’s teaching into Schumacher’s invention because incorporating Windh’s teaching would enable Schumacher to use a matching heuristic to handle the binding problem as suggested by Windh (page 398, right column).

Regarding claim 10, the method of claim 1, further comprising:
Schumacher discloses
presenting a user with an opportunity to approve the updating of the source code (the "Not Recommended" portion and the "Recommended" portions of source code may be provided as general examples that are not specific to the user's actual design.  The examples of recommended and not recommended source code illustrate that vector processing should be used.  As noted, in one aspect, the system may provide the guidance option of FIG. 7 and/or the guidance options of FIGS. 4-6 based upon compliance of the design with the profile rules, e.g., which of the profile rules are not being met or complied with… – See Col. 12, line 36-67 and Col. 13, lines 1-30).

8.	Claims 3-4 are rejected under 35 U.S.C. 103 as being unpatentable over Schumacher and Windh as applied to claim 1 above, and further in view of Sadooghi-Alvandi et al. (US Patent No. 9,529,950 B1 – IDS filed on 9/29/2017 – hereinafter Sadooghi).

Regarding claim 3, the method of claim 1, 
Sadooghi discloses 
wherein the profiling blocks are configured to identify memory loop dependencies (if desired, engine 112 may perform range analysis operations on the emulated device to perform aggressive memory dependence removal (e.g., removal of assumed/conservative code dependencies) – See Col. 11, lines 41-51).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Sadooghi’s teaching into Schumacher’s and Windh’s inventions because incorporating Sadooghi’s teaching would enable Schumacher and Windh to adjust the logic design for the device as suggested by Sadooghi (Col. 11, lines 41-51).

Regarding claim 4, the method of claim 1, 
Sadooghi discloses
wherein the profiling blocks are configured to monitor memory interface behavior (Engine 72 may monitor the emulated performance of the logic design and may generate emulated profile data 82 that includes performance metric data 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Sadooghi’s teaching into Schumacher’s and Windh’s inventions because incorporating Sadooghi’s teaching would enable Schumacher and Windh to generate performance metric data characterizing the performance/behavior of the emulated logic design as suggested by Sadooghi (Col. 9, lines 1-23).

9.	Claims 11 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Schumacher et al. (US Patent No. 10,380,313 B1 – hereinafter Schumacher) in view of Koh et al. (US Patent No. 9,442,696 B1 – hereinafter Koh).

Regarding claim 11. 
Schumacher discloses 
An integrated circuit (an integrated circuit (IC) – See paragraph [0004]), comprising:
circuitry having a given hardware design (A circuit design specifying the kernel circuit, for example, may be loaded into a programmable IC thereby implementing the kernel as a kernel circuit in hardware. – See Col. 1, lines 37-52); and 
a profiling block that is coupled to the circuitry (Block 125 is also capable of including monitor circuitry within compiled kernels 135.  The monitor circuitry collects and/or processes data from operation of compiled design 108 to generate profile data 150.  For example, the monitor circuitry may capture start and stop times of kernel circuit operation, information about data transfers to and from kernel circuits, bus transactions, and so forth – See Col. 4, lines 54-60) and that is used to gather data on the circuitry (run compiled design 145 and profile data 155), 
Schumacher does not disclose
wherein the given hardware design of the circuitry is updated based on the data gathered by the profiling block.
Koh discloses
wherein the given hardware design of the circuitry is updated based on the data gathered by the profiling block (Based on the profiling results, the user may change the co-simulation static or dynamic allocation scheme to improve execution efficiency of the code.  For example, improved execution speed may include, but is not limited to, increasing execution speed, minimizing memory usage, minimizing power consumption, improving load distribution across computational devices, minimizing power consumption, minimizing communication among the computational devices, minimizing latency at various computing devices, etc. For example, a user may change an allocation scheme to achieve faster execution speeds for an application and/or to better meet application design constraints. – See Col. 9, lines 35-58 and col. 11, lines 39-67 and col. 12, lines 1-5.  An instance mapper 156 and an architecture mapper 157 can create a simulation of the target application with simulator & validator 158 that can 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Koh’s teaching into Schumacher’s invention because incorporating Koh’s teaching would enable Schumacher to change the allocation scheme so as to update or change a specific computational device or GPU as 

Regarding claim 13, the integrated circuit of claim 11, 
Koh discloses
wherein the updated hardware design is faster than the given hardware design (a user may change an allocation scheme to achieve faster execution speeds for an application and/or to better meet application design constraints – See Col. 9, lines 35-58).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Koh’s teaching into Schumacher’s invention because incorporating Koh’s teaching would enable Schumacher to change the co-simulation static or dynamic allocation scheme to improve execution efficiency of the code as suggested by Koh (Col. 9, lines 35-58).

Regarding claim 14, the integrated circuit of claim 11, 
Schumacher discloses
wherein the hardware design is verified by feeding a representative set of input values to the circuitry (the IC used for hardware acceleration may provide set infrastructure to which the hardware accelerated kernel may couple.  This infrastructure may include a predetermined or fixed data bus, I/O interfaces, memory interfaces (e.g., memory controllers), etc…the source code analysis rule may check whether the data transfer function uses vectors so that rather than requesting a single 32-bit word of data 

Regarding claim 15, the integrated circuit of claim 11, 
Koh discloses 
further comprising:
an in-hardware verification (the user may change the co-simulation static or dynamic allocation scheme to improve execution efficiency of the code – See Col. 9, lines 35-58) and profile-guided hardware optimization circuit that is configured to optimize the given hardware design to generate the updated hardware design (a user may change an allocation scheme to achieve faster execution speeds for an application and/or to better meet application design constraints – See Col. 9, lines 35-58).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Koh’s teaching into Schumacher’s invention because incorporating Koh’s teaching would enable Schumacher to improve execution efficiency of the code by changing the dynamic allocation scheme as suggested by Koh (Col. 9, lines 35-58).

Regarding claim 16. 
Schumacher discloses
A non-transitory computer-readable storage medium comprising computer-executable instructions that, when executed, cause one or more processors to:
receive a software code for implementing an integrated circuit (receive kernel source code – See Fig. 1, block 115); 
compile the software code to output a hardware code (compilation 125 compiles kernel source code 115 – See Fig. 1; compiled host 130 may be executed by the host processor of the HC platform.  Compiled kernels 135 may be hardware accelerated as kernel circuits implemented within an IC operating as a device of the host processor – See col. 4, lines 40-54); 
simulate the hardware code to obtain simulation results (compiled design 108 is run by simulating compiled design 108 using a data processing system.  For example, compiled host 130 may be executed by a host processor or a data processing system such as a simulator adapted to simulate design 108 as if implemented in the HC platform.  In that case, block 125 is capable of generating compiled kernels 135 as executable program code, e.g., object code.  The compiled kernels 135 are adapted to model behavior of the kernel(s) as if hardware accelerated as kernel circuits – See Fig. 1, block 108, 145 and col. 4, lines 61-67 and col. 5, lines 1-7.  Block 120 may instrument compiled host 130 in order to generate profile data 150 relating to operation of compiled host 130.  Thus, profile data 150 may include data relating to operation of the kernel portion of design 105 and/or operation of the host portion of design 105 – Col. 5, lines 8-19); and 
annotate the software code based on the simulation results (the system is capable of listing the portion, or portions, of the source code found to violate a particular profile rule and highlight the portion that is found to violate the profile rule – See Col. 11, lines 63-67 and Col. 12, lines 1-6.  Data collected by execution of design 108 using a 
Schumacher does not disclose
wherein the annotated software code implements a faster and smaller design for the integrated circuit.
Koh discloses
wherein the annotated software code implements a faster and smaller design for the integrated circuit (The execution efficiency of the code may include, but is not limited to, increasing execution speed, minimizing memory usage, minimizing power consumption, improving load distribution across computational devices, minimizing power consumption, minimizing communication among the computational devices, etc. For example, the user may change the allocation scheme of application components 752, 754, 756 to computational devices 706, 708, 710, 712 using user interface 724 via input device 722 for faster execution or to better meet application design constraints … execution-time statistics may be provided in the co-simulation design environment and back-annotated to corresponding application components – See Col. 15, lines 64-67 and 1-12).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Koh’s teaching into Schumacher’s invention because doing so would enable Schumacher to display statistics by annotating, or notifying, the corresponding application components in the co-simulation or simulation design environment, as suggested by Koh (Col. 18, lines 40-57).

10.	Claims 12 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Schumacher and Koh as applied to claims 11 and 16, respectively, above, and further in view of Skyler Windh (High-Level Language Tools for Reconfigurable Computing, 2015 – hereinafter Windh).

Regarding claim 12, the integrated circuit of claim 11, 
Windh discloses
wherein the updated hardware design uses fewer logic resources than the given hardware design (on a CPU…the ability to configure local customized storage on the FPGA makes it possible to reduce the number of memory accesses, mostly reads, by reusing already fetched data resulting in a more efficient use of the memory bandwidth and lower energy consumption per task – See page 392, left column).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Windh’s teaching into Schumacher’s and Koh’s inventions because doing so would enable Schumacher and Koh to reduce the number of memory accesses, as suggested by Windh (page 392, left column).

Regarding claim 17, the non-transitory computer-readable storage medium of claim 16, further comprising computer-executable instructions that, when executed, cause the one or more processors to:
Windh discloses

using heuristic algorithms to identify potentially useful information required for more aggressive hardware optimizations (Similar to several other tools, LegUp is based on the LLVM compiler framework. The impact of various LLVM optimizations on the performance of the generated hardware structures is explored in [41]. Extra passes are added to LLVM for HLS and work in three phases: allocation, scheduling, and binding. The allocation stage determines the available hardware based on the target architecture and manages the application’s constraints like clock speed and power consumption…matching heuristic is used to handle the binding problem – See page 398, right column).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Windh’s teaching into Schumacher’s and Koh’s inventions because doing so would enable Schumacher and Koh to use a matching heuristic to handle the binding problem, as suggested by Windh (page 398, right column).

Regarding claim 18, the non-transitory computer-readable storage medium of claim 17, further comprising computer-executable instructions that, when executed, cause the one or more processors to:
Schumacher discloses
inserting non-intrusive profiling hooks into the software code to gather the potentially useful information (block 125 may instrument the kernels with diagnostic program code that executes as part of compiled kernel(s) 135 to generate profile data 

Regarding claim 19, the non-transitory computer-readable storage medium of claim 18, further comprising computer-executable instructions that, when executed, cause the one or more processors to:
Koh discloses
analyzing the information gathered by the profiling blocks to identify additional opportunities for hardware optimization (allow profiling results, i.e. run-time statistics or execution-time statistics of various static and dynamic allocation schemes to be considered and used.  For example, relevant run-time or execution-time statistics, such as computational load, observed latency, memory usage, power consumption, etc., may be streamed back to the co-simulation design environment from an HTE in real time, (i.e. while the code for the application components is executing on the HTE). – See col. 9, lines 26-57).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Koh’s teaching into Schumacher’s invention because doing so would enable Schumacher to change the dynamic allocation scheme, based on profiling results, to improve execution efficiency of the code, as suggested by Koh (See col. 9, lines 26-57).

Allowable Subject Matter
11.	Claim 20 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and the intervening claims.

Conclusion
12.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Denisenko et al. (US Patent No. 10,558,437 B1) discloses that designing a system on a target device includes performing a high-level compilation of a computer program language description of the system to generate a hardware description language (HDL) of the system.  The high-level compilation performs optimizations in response to profile data obtained from an earlier compilation of the system – See Abstract.  Adjusting global and local memory architectures in response to profile data on a corresponding LSU may be classified as an unsafe optimization – See col. 7, lines 9-67 and col. 8, lines 1-7.
Master et al. (US Pub. No. 2004/0093601 A1) discloses enabling further IC optimization for speed, size, utilization factors, and power consumption, with additional emphasis on enabling concurrent or parallel computing across multiple computational units 200 and computational elements 250 – See paragraphs [0024], [0033], [0043], and [0046]-[0047].
Ravi et al. (US Pub. No. 2004/0019859 A1) discloses that fast synthesis refers to the process of performing a limited synthesis of the RTL description that is much faster than the actual logic synthesis and technology mapping process – See paragraph [0072].
Sundararajan (US Patent No. 7,813,912 B1) discloses detecting such conditions by profiling the circuit design through emulation and/or simulation.  Profiling can indicate bugs or errors which may be corrected by adjusting FIFO depth, adjusting the number of memory requests served per clock cycle, etc. – See Col. 5, lines 45-59.
Ebcioglu et al. (US Pub. No. 2013/0125097 A1) discloses several compiler optimizations that existing approaches to automatic parallelization do not have.  By targeting application-specific hardware, high efficiency and low overhead implementations of these optimizations and mechanisms are realized – See paragraph [0108].
Beardslee et al. (US Patent No. 7,072,818 B1) discloses that the ability to debug hardware designs at the HDL level facilitates correction or adjustment of the HDL description of the hardware designs – See Abstract and Fig. 2.  To increase simulation performance, some functional simulators additionally make use of special purpose hardware which acts as a co-processor and accelerates the simulation – See col. 2, lines 14-49.
Poznanovic et al. (US Pub. No. 2004/0088666 A1) discloses a control-dataflow graph to hardware definition language converter to convert the reconfigurable hardware portion of the control-dataflow graph representations to a hardware definition language file, and a hardware definition language to bitstream converter to convert the hardware definition language file to a bitstream file – See paragraphs [0011] and [0013].
13.	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MONGBAO NGUYEN whose telephone number is (571)270-7180.  The examiner can normally be reached on Monday-Friday 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hyung S. Sough, can be reached on 571-272-6799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  






/MONGBAO NGUYEN/           Examiner, Art Unit 2192