Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims 
This Office Action is in response to Applicant’s amendments filed on September 9, 2022.
Claims 9 – 14 and 16 – 22 are pending and/or currently amended.
Claims 1 – 8 and 15 are cancelled.
Claims 9 – 14 and 16 – 22 are rejected. This rejection is final.

Response to Amendment/Remarks 
Applicant's amendments have cured the deficiencies set forth in the previous Office Action. The corresponding rejections and objections, including the 112(f) interpretation, are therefore withdrawn, except for the rejections presented in this Office Action.
New rejections are presented in this Office Action based on Applicant's amendment/remarks.

Response to Arguments

Regarding Applicant's arguments about the rejections of claims 9 – 14 and 16 – 22 under 35 U.S.C. § 102 and 35 U.S.C. § 103, the arguments have been fully considered but are moot in view of the new ground(s) of rejection necessitated by Applicant’s amendments.

Regarding claims 9 and 16, Applicant argued in substance that (1) the amended and newly added limitations “calculate, indicated by the circuit synthesis information,” “determine and” “output the determined optimum combination of the loop unrolling number and the circuit parallel number that obtains the maximum estimation processing performance” are not taught by the prior art of record.

Examiner fully considered this argument but the argument is rendered moot due to new ground(s) of rejection necessitated by Applicant’s amendment. 

As per point (1), as set forth below in this Office Action, Schumacher teaches, at column 7, line 48 to column 8, line 10, calculating, for each piece of the circuit synthesis information, an estimation processing performance related to the synthesis circuit indicated by the circuit synthesis information. Smith teaches, at column 6, lines 5 to 52, determining an optimum combination of the loop unrolling number and the circuit parallel number based on the circuit synthesis information for which a maximum estimation processing performance is obtained. Smith also teaches, at column 6, lines 5 to 52, outputting the determined optimum combination of the loop unrolling number and the circuit parallel number that obtains the maximum estimation processing performance. Therefore, Schumacher in combination with Smith teaches the amended and newly added limitations.

Applicant's arguments for other claims, which depend on the argued patentability of claims 9 and 16, are also respectfully traversed by Examiner based on the reasons recited above.

Therefore, the rejections, based on new ground(s) necessitated by Applicant's amendment, are presented. 

NEW REJECTIONS DUE TO AMENDMENT: 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 9 – 12, 16 – 19, & 22 are rejected under 35 U.S.C. 103 as being unpatentable over Schumacher (US Pat. 10,380,313), in view of Smith (US Pat. 5,367,651).

Regarding claim 9, Schumacher teaches:  
a parameter optimization apparatus (Abstract) comprising:
 a processor (Fig 2, module 205; see also col 6, ln 18 - 41); 
a non-transitory storage medium coupled to the processor, the storage medium storing instructions that, when executed by the processor, cause the processor to (Fig 2, module 210; see also col 6, ln 18 - 41): 
set, (Fig. 1, modules 160 & 165, col 5, ln 22 – 24; col 5, ln 29 – 40: “each profile rule may specify a design requirement”; col 5, ln 63 – col 6, ln 6; col 8, ln 37 – 48: “specify a compute unit utilization specifying a desired number of compute units that should be used (or a range) and/or amount or range that each compute unit should be used for efficient operation”; col 11, ln 1 – 27: “amount of loop unrolling”; see also col 10, ln 29 – 35 & col 4, ln 28 – 50 & col 7, ln 48 – col 8, ln 10);
calculate, (col 7, ln 48 – col 8, ln 10),
determined instructions for modifying a design that are used as design parameters for a circuit design, the circuit design is performed by the high-level synthesis processing for a processing system that executes a target processing on a plurality of processing circuits by loop unrolling (col 10, ln 36 – col 11, ln 19; see also col 4, ln 28 – 50 & col 7, ln 48 – col 8, ln 10),

Schumacher specifically teaches (underlines and red boxes are added by the Examiner for emphasis):


[Image: media_image1.png]

[Image: media_image2.png]


Profile rules 160 may be stored in a data storage device. In one example implementation, profile rules 160 may be stored within a database. It should be appreciated, however, that profile rules 160 may be stored in any of a variety of different data structures and/or files within the data storage device, e.g., one or more text files, eXtensible Markup Language (XML) files, etc.

As an illustrative example, each profile rule may specify a design requirement for an implementation of design 105. A design requirement refers to an operating and/or performance requirement. Examples of performance requirements can include, but are not limited to, data transfer rate, latency, etc. Block 155 is capable of determining whether design 105 complies with design rules 160 by comparing profile rules 160 with profile data 150. Block 155, for example, may perform the comparison of profile data 150 with profile rules 160 and output results 170. Results 170 can indicate compliance of design 105, or an implementation thereof, with one or more of profile rules 160.


Guidance options 165 provide instruction on optimizing design 105 to improve performance. Guidance options 165 may be correlated with particular ones of profile rules 160, which may include the source code analysis rules. In one aspect, each guidance option 165 is correlated with one or more profile rules 160. For example, in response to determining that design 105 does not comply with a selected profile rule, block 155 may retrieve one or more guidance options 165 that are correlated, or associated, with the selected profile rule and output those particular guidance option(s) 165 as selected guidance options 175.
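
As an illustration only (this sketch is not part of Schumacher's disclosure), the comparison of profile data with profile rules and the selection of correlated guidance options described in the passages above may be sketched in Python as follows. The rule names, metric keys, thresholds, and guidance strings are hypothetical and are chosen only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProfileRule:
    name: str                         # hypothetical rule identifier
    metric: str                       # key looked up in the profile data
    is_met: Callable[[float], bool]   # design requirement expressed as a predicate
    guidance: List[str]               # guidance options correlated with this rule


def check_compliance(rules: List[ProfileRule], profile_data: Dict[str, float]) -> Dict[str, bool]:
    """Compare the profile data with each profile rule, one result per rule."""
    return {r.name: r.is_met(profile_data[r.metric]) for r in rules}


def select_guidance(rules: List[ProfileRule], results: Dict[str, bool]) -> List[str]:
    """Collect the guidance options correlated with every rule that is not met."""
    selected: List[str] = []
    for r in rules:
        if not results[r.name]:
            selected.extend(r.guidance)
    return selected


# Hypothetical rules and measurements, for illustration only.
rules = [
    ProfileRule("min_transfer_rate", "transfer_rate_mbps", lambda v: v >= 800.0,
                ["increase burst size"]),
    ProfileRule("max_latency", "latency_us", lambda v: v <= 50.0,
                ["add a loop-unrolling pragma", "add a pipelining pragma"]),
]
profile_data = {"transfer_rate_mbps": 950.0, "latency_us": 72.0}

results = check_compliance(rules, profile_data)
selected = select_guidance(rules, results)
print(results)
print(selected)
```

In this sketch only the latency rule fails, so only the guidance options correlated with that rule are selected.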

Other examples of profiles rules may specify that each compute unit is utilized at least one time for a run of the design. Another example profile rule may specify an upper threshold for calling compute units to ensure that the compute units are not called too often as the HC system incurs overhead or a setup penalty in calling the compute units. A profile rule may specify a compute unit utilization specifying a desired number of compute units that should be used (or a range) and/or amount or range that each compute unit should be used for efficient operation. Another example profile rule may specify that each device of the HC platform is used at least one time.

Another example of a guidance option is to suggest the use and/or inclusion of optimization pragmas for the kernel(s). Optimization pragmas are compiler directives that provide implementation guidance to the compilation/synthesis tools for compiling the kernel source code. The optimization pragmas indicate particular structural implementations to be implemented in the resulting circuit design and/or circuitry that is generated for the kernel(s).



In one aspect, optimization pragmas may be included or added within the source code of the kernel(s). In another aspect, optimization pragmas may be added to a data structure other than the source code of the kernel(s). The other data structure may be associated with the kernel(s) and/or read by the system during compilation of the kernel(s). For example, the optimization pragmas may be added to a database, specified as a script (e.g., a Tcl script, or the like), as metadata, a project setting for the EDA tools, etc. In either case, the system reads the optimization pragmas and implements the optimization pragmas during compilation.

For example, one optimization pragma may specify that loop unrolling should be performed in compiling the kernel source code. The optimization pragma may specify an amount of loop unrolling to be performed. Another optimization program may specify that pipelining should be performed. Another optimization pragma may specify using dataflow mode where processing starts as soon as data is available.
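
As an illustration only, the effect of a loop-unrolling directive with a specified amount of unrolling can be shown by hand-unrolling a simple accumulation loop in Python. The unroll factor of 4 and the function names are assumptions made for this sketch. In a synthesis flow the tool, not the programmer, would perform this transformation in response to the pragma.

```python
def sum_rolled(values):
    """Reference loop: one addition per iteration."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Same computation with the loop body replicated four times per iteration."""
    total = 0
    n = len(values)
    limit = n - (n % 4)          # largest multiple of 4 not exceeding n
    i = 0
    while i < limit:
        # Four independent element reads per trip; hardware could evaluate them in parallel.
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    for j in range(limit, n):    # remainder loop for the leftover elements
        total += values[j]
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))
```

Both functions return the same sum; the unrolled version makes fewer loop trips at the cost of a larger loop body.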


In block 315, the design is run. The design may be run using the system that compiles the design or another system in the case of simulation. The design may be run in an HC platform that includes a device adapted to implement the kernel(s) in hardware. In any case, running the design generates profile data. The profile data, in general, includes various quantities that indicate the operating performance of the host processor, kernels, or both.

Examples of profile data that may be collected include, but are not limited to, data relating to data transfers between the host processor and the kernel, runtime of the kernel to complete a processing task, kernel and compute unit utilization, host and kernel bandwidth utilization, functional tests, etc. Regarding data transfers, example profile data that may be collected can include kernel read size (for data), kernel write size (for data), kernel read utilization, kernel write utilization, amount or size of total kernel data read, host read transfers from off-chip global memory, host write transfers to off-chip global memory, compute unit utilization (number of compute units of a kernel used and/or amount of usage of particular compute units of the kernel), kernel utilization of work groups, available device usage, etc.

In block 335, the system is capable of indicating compliance with the profile rules. For example, the system is capable of indicating which profile rules are met and/or not met. The system is capable of outputting results, e.g., indications, of compliance with the profile rules. In one aspect, compliance is indicated on a per rule basis. As an illustrative example, the system is capable of generating a report and providing the information via a user interface that may be displayed using a display device.

 In block 340, the system is capable of selecting one or more guidance options based upon compliance with the profile rules. For example, the system may select guidance options for those design rules that are not met. The system may include a database of guidance options. Each guidance option can specify instructions for modifying the design, e.g., the source code of the host portion of the design or the source code of the kernel portion of the design, in order to improve operating performance of the resulting system as implemented in an HC platform.

but Schumacher may not explicitly disclose:
a plurality of combinations of a loop unrolling number and a circuit parallel number to generate circuit synthesis information indicating a synthesis circuit obtained by a high-level synthesis processing for each of the plurality of combinations;
determine and; 
output the determined optimum combination of the loop unrolling number and the circuit parallel number that obtains the maximum estimation processing performance, wherein 
the determined optimum combination of the loop unrolling number and the circuit parallel number are used; 

However, Smith teaches:
a plurality of combinations of a loop unrolling number and a circuit parallel number to generate circuit synthesis information indicating a synthesis circuit obtained by a high-level synthesis processing for each of the plurality of combinations (Smith: col 6, ln 5 - 52);
determine (Smith: col 6, ln 5 - 52);
output the determined optimum combination of the loop unrolling number and the circuit parallel number that obtains the maximum estimation processing performance, wherein (Smith: col 6, ln 5 - 52)
the determined optimum combination of the loop unrolling number and the circuit parallel number are used (Smith: col 6, ln 5 - 52); 

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Schumacher to incorporate the teachings of Smith for determining and using the optimum combination of the loop unrolling number and the circuit parallel number for which a maximum estimation processing performance is obtained. One of ordinary skill in the art would have been motivated to do so to achieve a trade-off between unrolling loops too few times (and not achieving the maximum speed-up potential) and unrolling loops too many times (and pessimizing the circuit program because of a possible increase in code size). This would enable one of ordinary skill to determine an optimal number of times for unrolling loops that is consistent not only with the resources available (i.e., parallel circuits) but also with the goal of achieving maximum speed-up potential (i.e., processing performance) (Smith: col 6, ln 5 - 52).
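
As an illustration only, the trade-off discussed above can be sketched as an exhaustive search over combinations of a loop unrolling number and a circuit parallel number that keeps the combination with the highest estimated performance within a resource budget. The sketch does not reproduce the method of Schumacher, Smith, or the claims. The `synthesize` and `estimate_performance` functions, the cost formulas, the loop total, and the resource budget are all hypothetical stand-ins.

```python
from itertools import product

LOOP_TOTAL = 16         # hypothetical total number of loop iterations
RESOURCE_BUDGET = 400   # hypothetical resource constraint (arbitrary units)


def synthesize(unroll, parallel):
    """Toy stand-in for high-level synthesis; returns toy circuit synthesis information."""
    resources = parallel * (20 + 10 * unroll)   # area grows with both parameters
    cycles = LOOP_TOTAL // unroll + unroll      # fewer trips, but a longer body per trip
    return {"unroll": unroll, "parallel": parallel,
            "resources": resources, "cycles": cycles}


def estimate_performance(info):
    """Toy estimate: items completed per cycle across all parallel circuits."""
    return info["parallel"] / info["cycles"]


def find_optimum(unroll_candidates, parallel_candidates):
    """Evaluate every combination and keep the feasible one with the highest estimate."""
    best = None
    for u, p in product(unroll_candidates, parallel_candidates):
        info = synthesize(u, p)
        if info["resources"] > RESOURCE_BUDGET:
            continue                            # combination does not fit the budget
        perf = estimate_performance(info)
        if best is None or perf > best["performance"]:
            best = dict(info, performance=perf)
    return best


best = find_optimum([1, 2, 4, 8, 16], [1, 2, 4, 8])
print(best["unroll"], best["parallel"], best["performance"])
```

Under these toy formulas the search selects an unrolling number of 2 with 8 parallel circuits rather than the largest unrolling number, which mirrors the point that unrolling too many times can reduce overall performance once resources are taken into account.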

Smith specifically teaches (underlines and red boxes are added by the Examiner for emphasis): 

[Image: media_image3.png]


The improved loop unroller 50 has two modes of operations, a "preliminary" mode and a "final" mode. When invoked in "preliminary" mode by the improved register allocator 48, the improved loop unroller 50, in cooperation with the improved scheduler 52, determines and provides the improved register allocator with the optimal number of times the various loops in the program being compiled should be unrolled. The improved loop unroller 50 determines the optimal number of times the various loops should be unrolled by repeatedly invoking the improved scheduler 52 to determine the amount of parallelism that can be achieved for different number of times the various loops are unrolled. Unrolling too few times does not achieve the maximum speed up potential. On the other hand, unrolling too many times can pessimize the program because of possible increase in spilling and code size. When invoked in "final" mode by the improved register allocator 48, the loop unroller 50 receives the optimized intermediate representations with allocated global registers and associated information as inputs. In response, it restructures the instructions being generated, unrolling loops in the instructions being generated for the determined optimal number of times consistent with the resources available in the exemplary computer system of FIG. 1.

The improved scheduler 52 also has two modes of operation, a "preliminary" mode and a "final" mode. When invoked in "preliminary mode" by either the improved register allocator 48 or the improved loop unroller 50, the improved scheduler 52 determines and provides a preliminary instruction schedule for its invoker, thereby providing the invoker with an estimate on the amount of parallelism that can be achieved. The improved scheduler 52 determines the preliminary instruction schedule, allocating local registers, using all registers of the target machine. When invoked in "final mode", the improved scheduler 52 receives the unrolled intermediate representations and associated information as inputs. In response, it determines the final instruction schedule, allocating the local registers. In both cases, all global register candidates have been either assigned or spilled. The improved scheduler 52 is concerned with only the local register candidates.

Lastly, the assembly code generator 54 receives the optimized, register allocated, and restructured intermediate representations and associated information as inputs. In response, it generates the object code for the program being compiled.

Regarding claim 10, modified Schumacher teaches all of the limitations of claim 9. 
Modified Schumacher further teaches and Smith also teaches: 
instructions that, when executed by the processor, further cause the processor to set the loop unrolling number based on a loop total number indicating a total number of loops to be unrolled in the loop unrolling when a combination is set (Smith: col 6, ln 5 - 52).  

Regarding claim 11, modified Schumacher teaches all of the limitations of claim 9. 
Modified Schumacher further teaches and Smith also teaches: 
instructions that, when executed by the processor, further cause the processor to set the circuit parallel number based on a resource constraint indicating resources that are usable in the processing system when a combination is set (Smith: col 6, ln 5 - 52).  

Regarding claim 12, modified Schumacher teaches all of the limitations of claim 9. 
Modified Schumacher further teaches and Schumacher also teaches: 
the instructions that, when executed by the processor, further cause the processor to calculate the estimation processing performance based on the circuit synthesis information, a delay constraint indicating a processing delay allowable in the target processing, and a number of simultaneous inputs of data to be input in parallel to the target processing (Schumacher: col 5, ln 31 – 35 & col 8, ln 37 – 48).


Regarding claims 16 – 19, & 22, modified Schumacher teaches the parameter optimization apparatus. Therefore, modified Schumacher also teaches the corresponding parameter optimization method and computer program product.


Claims 13 – 14, & 20 – 21 are rejected under 35 U.S.C. 103 as being unpatentable over Schumacher (US Pat. 10,380,313), in view of Smith (US Pat. 5,367,651), and in further view of Vassiliev (US Pub. 2017/0262567).

Regarding claim 13, modified Schumacher teaches all of the limitations of claim 9, 
but modified Schumacher may not explicitly disclose wherein the processing system includes: 
the plurality of processing circuits that execute part of the target processing in which loop unrolling is preliminarily performed on distributed packets; 
a distributor that distributes a plurality of flows of packets to be simultaneously input to the plurality of processing circuits; and 
an aggregator that aggregates and outputs processing results obtained by the plurality of processing circuits.

However, Vassiliev teaches: 
the plurality of processing circuits that execute part of the target processing in which loop unrolling is preliminarily performed on distributed packets (Vassiliev: para [0058] & [0146] & [0191]); 
a distributor that distributes a plurality of flows of packets to be simultaneously input to the plurality of processing circuits (Vassiliev: para [0193]); and 
an aggregator that aggregates and outputs processing results obtained by the plurality of processing circuits (Vassiliev: para [0177]).
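
As an illustration only, the arrangement of a distributor, a plurality of processing circuits, and an aggregator recited above can be sketched in Python. The sketch is not taken from Vassiliev or from the application. The flow identifiers, the modulo assignment of flows to circuits, and the per-flow byte sum used as the processing step are all hypothetical.

```python
from collections import defaultdict


def distribute(packets, num_circuits):
    """Distributor: route each packet by flow id so that one flow stays on one circuit."""
    queues = defaultdict(list)
    for flow_id, payload in packets:
        queues[flow_id % num_circuits].append((flow_id, payload))
    return queues


def process(queue):
    """Stand-in for one processing circuit: sums the payload bytes of each flow."""
    totals = defaultdict(int)
    for flow_id, payload in queue:
        totals[flow_id] += sum(payload)   # per-flow state selected by the packet's flow
    return dict(totals)


def aggregate(partial_results):
    """Aggregator: merge the per-circuit results into a single output."""
    merged = {}
    for part in partial_results:
        merged.update(part)               # flows never overlap across circuits
    return merged


# Hypothetical packets as (flow id, payload) pairs.
packets = [(0, b"\x01\x02"), (1, b"\x03"), (2, b"\x04"), (0, b"\x05"), (3, b"\x06\x07")]
queues = distribute(packets, num_circuits=2)
output = aggregate(process(q) for q in queues.values())
print(output)
```

Keeping each flow on a single circuit lets that circuit hold the flow's state locally, so the aggregator only needs to merge disjoint results.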

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified Schumacher to incorporate the teachings of Vassiliev for targeting loop unrolling performed on distributed packets. One of ordinary skill in the art would have been motivated to do so to maximize routing priority based on a quality of service (QoS) agreement, thereby guaranteeing input and output bandwidth reservations for the processing circuits, or some combination thereof (Vassiliev: para [0193]).

Regarding claim 14, modified Schumacher teaches all of the limitations of claim 13. 
	Modified Schumacher teaches and Vassiliev also teaches wherein: 
the plurality of processing circuits switch a state for processing the packets depending on a flow of packets distributed from the distributor (Vassiliev: para [0140]: “switch interface”).

Regarding claims 20 – 21, modified Schumacher teaches the parameter optimization apparatus. Therefore, modified Schumacher also teaches the corresponding parameter optimization method and computer program product.

Conclusion

The prior art made of record and not relied upon is considered pertinent to the applicant’s disclosure.

Cheng (US Pub. 2018/0011957) teaches improving hardware execution efficiency of high-level synthesis tools.


Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMEIR MYERS whose telephone number is (571)272-8160.  The examiner can normally be reached on 8:30 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, THOMAS LEE, can be reached on (571) 272‐3667.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.







/A.M./ Examiner, Art Unit 2115, 12/12/2022



/THOMAS C LEE/Supervisory Patent Examiner, Art Unit 2115