DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	Applicant’s amendment and response dated August 30, 2022, filed in response to the Office Action of March 31, 2022, which rejected all previously pending claims 1-9 and 12-15, is acknowledged.
	Claims 1, 4, 9, 13, and 14 have been amended.
No claims have been canceled or newly added.
Thus, claims 1-9 and 12-15 are pending for examination.
Examiner Notes
3.	Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.
4.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
Claim Objections
5. 	Claims 6, 8, and 12 are objected to because of the following informalities: 
As to claim 6, line 2, the limitation “the dataflow graph” should be changed to, for example, --the unfolded dataflow graph--.  Appropriate correction is required.
As to claim 8, lines 1, 3, and 4, each recitation of the limitation “the dataflow graph” should be changed to, for example, --the unfolded dataflow graph--.  Appropriate correction is required.
As to claim 12, lines 1-2, the limitation “wherein the optimizing of the dataflow graph comprises folding the identified repeating patterns into the dataflow graph” should be changed to, for example, --wherein the optimizing of the unfolded dataflow graph comprises folding the identified repeating patterns into the unfolded dataflow graph--.  Appropriate correction is required.
6.	Claim 9 is objected to under 37 CFR 1.75 as being a substantial duplicate of claim 1.
	When two claims in an application are duplicates or else are so close in content that they both cover the same thing, despite a slight difference in wording, it is proper after allowing one claim to object to the other as being a substantial duplicate of the allowed claim.  See MPEP § 706.03(k).
Claim Rejections - 35 USC § 112
7.	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


8.	Claim 14 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
	As to claim 14, line 2, the limitation “comprising identification of individual dataflows for the output values in an array which are output, and separating common operations on the output from unique operations unique to every one of the outputs, thereby providing a graph with the common operations and a list of graphs with the unique operations” is unclear as to which “individual dataflows” are referred to. However, since claim 14 is now amended to depend upon claim 4, in the interest of compact prosecution, the examiner treats the “individual dataflows” as “individual dataflows of the unfolded dataflow graph”.
Claim Rejections - 35 USC § 102
9.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

10.	Claims 1, 9, and 15 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Lotfi et al., “REHLS: Resource-aware Program Transformation Workflow for High-level Synthesis”, 2017 IEEE 35th International Conference on Computer Design, 5 November 2017, hereinafter Lotfi. 
As to claim 1, Lotfi discloses a method for generation of a configuration of a hardware accelerator -- (e.g., Given a C program with inherent computational patterns, REHLS explores the opportunities for design optimization through resource sharing, and automatically transforms the program so that the HLS tool generates a more efficient design after synthesis – see at least page 534, left column, paragraph 1), comprising:
 inputting a program with a plurality of lines of code describing an algorithm to be implemented on the hardware accelerator to a front end of an apparatus  -- (e.g., Given an input C program – see at least page 534, left column, paragraph 2,  C program of Fig. 1 and associated text); 
 executing the program and generating in the front end an unfolded dataflow graph in memory of the apparatus from the inputted program – (e.g., program transformation for a pattern detection within LLVM Compiler within i nodes (iteration) “ REHLS finds all patterns that are repeated across different basic blocks in each function in the program. To do that, the data flow graph (DFG) of different basic blocks are extracted from the program’s LLVM IR. Given a pair of DFGs, G1 = (V1, E1) and G2 = (V2, E2), the goal is to enumerate all the subgraphs of G1 that are isomorphic to subgraphs of G2, where functionally, type, and bitwidth of corresponding nodes of V1 and V2 are the same. Our algorithm detects patterns with a breadth-first search approach. Our subgraph enumeration process is incremental, meaning that size i+1 subgraphs are enumerated when all the size i subgraphs are enumerated. If a size i subgraph is not frequent enough, it is removed and no further considered for creating new subgraphs of size i+1. In each iteration i, our algorithm finds all patterns of size i (with i nodes) across different DFG…This process is repeated until either no more new pattern is found or we enumerate the subgraph with maximum possible size” – see Lotfi, at least page 534, left column: pattern detection to paragraph 1, right column, Fig. 1, and associated text);
passing the generated unfolded dataflow graph to a back end of the apparatus – (e.g., path in Fig.1 to the program transformation step as in Fig. 1, and associated text) ; 
 optimizing the unfolded dataflow graph in the back end – (e.g., Different system-level optimizations and code restructuring can be applied on a given application specification, where each transformation impacts differently on resource utilization and performance of the design after synthesis – see at least page 533, left column paragraph 3, program transformation of Fig 1, and associated text) ;  and 
outputting an output program representative of the configuration of the hardware accelerator and being generated from the unfolded optimized dataflow graph – (e.g., the modified C program used by the HLS tool targeting Xilinx FPGAs, as in Fig. 1 – see at least page 535, left column, paragraph 4, page 536, right column, paragraph 3, Fig. 1,  and associated text). 
As per claims 9 and 15, Lotfi discloses a method of the configuration of a hardware accelerator comprising:  
generation of a configuration of the hardware accelerator in the form of an output program representative of the configuration of the hardware accelerator, wherein the generation --(e.g., Given a C program with inherent computational patterns, REHLS explores the opportunities for design optimization through resource sharing, and automatically transforms the program so that the HLS tool generates a more efficient design after synthesis – see at least page 534, left column, paragraph 1) comprises: 
inputting a program with a plurality of lines of code describing an algorithm to be implemented on the hardware accelerator to a front end of an apparatus-- (e.g., Given an input C program – see at least page 534, left column, paragraph 2,  C program of Fig. 1 and associated text),
executing the program and generating in the front end an unfolded dataflow graph in memory from the inputted program – (e.g., program transformation for a pattern detection within LLVM Compiler within i nodes (iteration) “ REHLS finds all patterns that are repeated across different basic blocks in each function in the program. To do that, the data flow graph (DFG) of different basic blocks are extracted from the program’s LLVM IR. Given a pair of DFGs, G1 = (V1, E1) and G2 = (V2, E2), the goal is to enumerate all the subgraphs of G1 that are isomorphic to subgraphs of G2, where functionally, type, and bitwidth of corresponding nodes of V1 and V2 are the same. Our algorithm detects patterns with a breadth-first search approach. Our subgraph enumeration process is incremental, meaning that size i+1 subgraphs are enumerated when all the size i subgraphs are enumerated. If a size i subgraph is not frequent enough, it is removed and no further considered for creating new subgraphs of size i+1. In each iteration i, our algorithm finds all patterns of size i (with i nodes) across different DFG…This process is repeated until either no more new pattern is found or we enumerate the subgraph with maximum possible size” – see Lotfi, at least page 534, left column: pattern detection to paragraph 1, right column, Fig. 1, and associated text);
passing the generated unfolded dataflow graph to a back end of the apparatus – (e.g., path in Fig. 1 to the program transformation step as in Fig. 1, and associated text);
optimizing the unfolded dataflow graph in the back end--(e.g., Different system-level optimizations and code restructuring can be applied on a given application specification, where each transformation impacts differently on resource utilization and performance of the design after synthesis – see at least page 533, left column paragraph 3, program transformation of Fig 1, and associated text), and 
outputting an output program representative of the configuration of the hardware accelerator and being generated from the optimized dataflow graph;  and providing the output program to the hardware accelerator, thereby enabling configuration of the hardware accelerator – (e.g., the modified C program used by the HLS tool targeting Xilinx FPGAs, as in Fig. 1 – see at least page 535, left column, paragraph 4, page 536, right column, paragraph 3, Fig. 1,  and associated text).
Further regarding claim 15, Lotfi discloses a hardware accelerator (e.g., Xilinx FPGA – see at least page 533, left column, paragraph 1) for implementing the method steps of claim 9 above. 
Claim Rejections - 35 USC § 103
11.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

12.	Claims 2-3 are rejected under 35 U.S.C. 103 as being unpatentable over Lotfi in view of J. P. Pinilla and S. J. E. Wilton, "Enhanced source-level instrumentation for FPGA in-system debug of High-Level Synthesis designs," 2016 International Conference on Field-Programmable Technology (FPT), 2016, pp. 109-116, doi: 10.1109/FPT.2016.7929514, hereinafter Pinilla.
As to claim 2, it is to note that Lotfi does not explicitly disclose, but Pinilla, in an analogous art, discloses, further comprising injecting instrumentation code prior to the executing of the inputted program – (e.g., In this paper, we describe a methodology using source-level instrumentation for C-based HLS tools, to create memories and related circuitry to gather trace data that provides visibility into the operation of the circuit…In this paper, we present three important contributions related to source-level instrumentation… We then outline the role of instrumentation and the advantages of inserting this instrumentation in the original source code … III. SOURCE-LEVEL DEBUG FRAMEWORK Figure 2 shows our overall in-system debug framework. Starting at the top-left, the original user C code is parsed into an Abstract Syntax Tree (AST) and instrumentation is automatically inserted using a custom tool we built using the ROSE source-to-source compiler infrastructure API [21] – see Pinilla, at least Abstract, paragraph 2,  page 109, right column, paragraph 3, page 110, left column, paragraph 3, page 111, left column, paragraph 2, Figure 2, and associated text). 
 Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate using source-level instrumentation for C-based HLS tools as taught in Pinilla on HLS tool of Lotfi for further optimizing the source level debugging by reducing the overhead and speeding up performance execution as seen in Pinilla (e.g., page 110, left column, paragraphs 1-2). 
As to claim 3, modified Lotfi with Pinilla discloses wherein a plurality of instrumentation code is injected before one or more of the lines of code –(e.g., incorporate using source-level instrumentation for C-based HLS tools as taught in Pinilla see Pinilla, at least Abstract, paragraph 2,  page 109, right column, paragraph 3, page 110, left column, paragraph 3, page 111, left column, paragraph 2, Figure 2, and associated text) on HLS tool of Lotfi  for further optimizing the source debugging by reducing the overhead and speeding up performance execution as seen in Pinilla (e.g., page 110, left column, paragraphs 1-2). 
13.	Claims 4, 5, and 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over Lotfi in view of Gutson et al. (US 20190042760 A1, hereinafter Gutson).
As to claim 4, it is to note that Lotfi does not explicitly disclose, but Gutson, in an analogous art, discloses, wherein the optimizing of the unfolded dataflow graph comprises one or more of identification of repeating sequences in the dataflow graph and pruning of unnecessary nodes –(e.g., During processing, back end 140 can include analyzer component 142 to analyze the control and data flow graph, and optimizer component 144 to perform optimizations (e.g., vectorization, basic block ordering, etc.) on the analyzed control and data flow graph. …Code can be vectorized during compilation (e.g., by optimizer component 144) if the code applies the same operation to multiple values.  In this scenario, special instructions may be used to all operations on multiple values at the same time – identifying repeating sequence --.  An optimizer component (e.g., 144) may also be identified to receive a feedback signal based on the location of a vulnerable value in an instruction and the type of instruction.  For example, it may be possible to reorder a basic block in order to remove (pruning) a vulnerable value from a jump instruction, a call instruction, or jump if condition is met instruction.  In some instances, the instruction itself may be transformed to no longer require branching.– see Gutson, at least 0030, 0061-0063, Fig. 1, and associated text).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate performance functions of the optimizer component on the dataflow graph as taught in Gutson into Lotfi’s teaching  for further optimizing backend debugging by reducing the latency and improving memory efficiency. 
As to claim 5, it is to note that Lotfi does not explicitly disclose, but Gutson, in an analogous art, discloses further comprising at least one of reducing the number of steps in arithmetic operations or increasing the number of concurrent arithmetic operations -- (e.g., An optimizer component (e.g., 144) may also be identified to receive a feedback signal based on the location of a vulnerable value in an instruction and the type of instruction… While division and multiplication instructions with 0xC3 or 0xC2 in their operands may be transformed into more than two instructions to eliminate the occurrence of 0xC3 or 0xC2, other instructions (e.g., arithmetic, subtraction, etc.) having 0xC3 or 0xC2 in their operands may be transformed into two instructions to eliminate the occurrence of 0xC3 or 0xC2. – see Gutson, at least 0030, 0061-0063, 0112, Fig. 1, and associated text).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Gutson’s teaching into Lotfi’s teaching for further optimizing backend debugging by reducing the latency and improving memory efficiency. 
As to claim 12, modified Lotfi with Gutson discloses wherein the optimizing of the dataflow graph comprises folding the identified repeating patterns into the dataflow graph – (e.g., incorporate performance functions of the optimizer component on the dataflow graph as taught in Gutson -- see Gutson, at least 0030, 0061-0063, Fig. 1, and associated text, into Lotfi’s teaching  for further optimizing backend debugging by reducing the latency and improving memory efficiency).
As to claim 13, modified Lotfi with Gutson discloses, wherein the optimizing of the unfolded dataflow graph further comprises converting local arrays to scalar variables – (e.g., incorporate performance functions of the optimizer component on the dataflow graph as taught in Gutson -- see Gutson, at least 0030, 0061-0063, Fig. 1, and associated text, into Lotfi’s teaching  for further optimizing backend debugging by reducing the latency and improving memory efficiency).
As to claim 14, modified Lotfi with Gutson discloses further comprising identification of individual dataflows for the output values in an array which are output, and separating common operations on the output from unique operations unique to every one of the outputs, thereby providing a graph with the common operations and a list of graphs with the unique operations (e.g., the modified C program used by the HLS tool targeting Xilinx FPGAs, as in Fig. 1 – see Lotfi, at least page 535, left column).
14.	Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Lotfi in view of Cheng et al., "Architectural synthesis of computational pipelines with decoupled memory access," 2014 International Conference on Field-Programmable Technology (FPT), 2014, pp. 83-90, doi: 10.1109/FPT.2014.7082758., hereinafter Cheng. 
As to claim 6, it is to note that Lotfi does not explicitly disclose, but Cheng, in an analogous art, discloses, further comprising identifying pipelining in the dataflow graph – (e.g., In this paper, we try to narrow the gap between software and hardware execution mechanisms by automatically transforming application kernels into pipelines of processing stages, complemented by load/store primitives capable of pipelined data accesses. The main contributions of this paper are: • a novel tool flow for converting software loop nests to pipelines of decoupled processing stages, where: ◦ the effects of long latency operations are localized, ◦ memory load/store operations are converted to data access modules which use memory bandwidth efficiently, and ◦ customization of memory access mechanisms based on the data access patterns of the accelerated loop nests… An essential aspect of modern high-level synthesis flow lies in the exploitation of parallelism between loop iterations. Due to the presence of loop carried dependencies in many applications, the HLS tools use software pipelining [14] to initiate new iterations before previous ones are completed-- See Cheng , at least page 83, right column, paragraph 2-last paragraph, page 84, left column, last paragraph, Figs. 2 and 5,and associated text).
 Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Cheng’s teaching into Lotfi’s teaching  for further optimizing backend debugging by reducing the latency and improving memory efficiency .
As to claim 7, it is to note that Lotfi does not explicitly disclose, but Cheng, in an analogous art, discloses, further comprising identification of memory accesses and at least one of removing redundant memory accesses, providing storage of reused values in buffers, or adapting memory accesses to resources available in the hardware accelerator – (e.g., In this paper, we present an automatic flow to refactor and restructure processor centric software implementations, making them better suited for FPGA platforms. The methodology generates pipelines that decouple memory operations and data access from computation. The resulting pipelines have much better throughput due to their efficient use of the memory bandwidth and improved tolerance to data access latency… in our flow, partitioning of the memory space has provided an opportunity to create better hardware for memory access on the reconfigurable fabric. Each independent data access interface, corresponding to one memory partition, can be supported differently according to the nature of the address stream it generates. In particular, for streaming type accesses, there is no reuse of data, and thus our flow does not allocate an on-FPGA buffer. Rather, the send req module described in section IV-A is modified to send burst requests, concatenating multiple load/store in the original program execution. On the other hand, if there is a cycle of dependency through memory, an on-FPGA buffer would be beneficial. Our flow currently adds a general-purpose cache in this case, but if the particular address stream is analyzable and the reuse distance can be determined statically, structures like smart buffers [24] can be incorporated. 
Even in the case when the memory accesses are random and a general-purpose cache is the only plausible solution, its size and associativity can be adjusted according to a runtime profile --See Cheng , at least page 83, right column, paragraph 2-last paragraph, page 84, left column, last paragraph, page 87,left column, last paragraph, Figs. 2 and 5,and associated text).
 Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Cheng’s teaching into Lotfi’s teaching  for further optimizing backend debugging by reducing the latency and improving memory efficiency .
15.	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Lotfi in view of Ahmed et al. (US 10025566 B1, hereinafter Ahmed).
As to claim 8, it is to note that Lotfi does not explicitly disclose, but Ahmed, in an analogous art, discloses wherein the dataflow graph comprises a plurality of loops and the method further comprises unfolding one or more of the plurality of loops in the dataflow graph and then repeating the optimization of the dataflow graph – (e.g., Scheduling techniques transform dataflow graphs (DFGs), for example, of digital signal processing (DSP) arrangements of filters, into efficient schedules for concurrent execution on processing resources coupled to a memory.  A DSP arrangement may be represented by an executable model having interconnected filters represented by model elements.  The techniques may apply scheduling transforms according to a classification of the model elements based on a lifetime of their internal states (e.g., finite or infinite).  Exemplary scheduling transforms may include unfolding, coordinated loop scheduling and pipelining to parallelize a DFG and enhance overall performance, i.e., reduce average sample execution time of the DSP arrangement.  Notably, the scheduling transforms may aggregate (i.e., merge) multiple finite state model elements for concurrent execution and repeat execution of infinite state model elements to achieve the overall improved performance – see Ahmed, at least abstract, col. 1: 1-59, col. 6: 37-58). 
 Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ahmed’s teaching into Lotfi’s teaching  for further optimizing backend debugging by reducing the latency and speeding up performance execution.
Prior Art's Arguments – Rejections
16.	Applicant's arguments filed August 30, 2022 in response to the Office Action of March 31, 2022 (see Remarks, pages 6-8), especially with respect to the amendment of independent claim 1, have been fully considered and are not persuasive and/or are moot, as follows:
	As to the amendment of claim 1, applicant alleges that Lotfi does not disclose an unfolded dataflow graph (see Remarks, page 7, paragraph 1), with which the examiner respectfully disagrees.
Lotfi still teaches executing the program and generating in the front end an unfolded dataflow graph in memory of the apparatus from the inputted program, as follows:
In light of Tomas’s teaching (published in 2006), the unfolding technique generally refers to:
“ 2.3 Unfolding The unfolding technique, or loop unrolling as designated in compiler theory, consists on transforming a DFG to allow the execution of multiple iterations of the algorithm in just one loop cycle. For this matter, unfolding factor is defined as the number of times a loop is unrolled, i.e. the number of iterations to perform per cycle. However this transformation generates a number of nodes/edges in the final DFG proportional to the unfolding factor. Therefore, it increases the complexity of the DFG and requires more computational resources to execute a single iteration of the unfolded algorithm. The advantage of this technique is that it decreases the time required to perform one iteration down to a minimum value equal to the iteration bound.”
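
For illustration only, the unfolding transformation described in the quoted passage may be sketched as follows. The sketch is not taken from Tomas, Lotfi, or the application; it uses the commonly taught construction in which each edge carrying w iteration delays is replicated once per copy of the graph. The function name unfold_dfg and the (source, destination, delay) edge representation are assumptions made for this example.

```python
def unfold_dfg(nodes, edges, factor):
    """Unfold a dataflow graph by `factor` (hypothetical helper).

    nodes: list of node names, one per operation in a single iteration.
    edges: list of (src, dst, delay) tuples, where `delay` is the number
           of iteration delays carried by the edge.
    Returns (unfolded_nodes, unfolded_edges).
    """
    if factor < 1:
        raise ValueError("unfolding factor must be at least 1")

    # Each original node appears once per unfolded iteration.
    unfolded_nodes = [f"{n}_{i}" for n in nodes for i in range(factor)]

    unfolded_edges = []
    for src, dst, delay in edges:
        for i in range(factor):
            # Copy i of the source feeds copy (i + delay) mod factor of
            # the destination, carrying floor((i + delay) / factor) delays.
            j = (i + delay) % factor
            new_delay = (i + delay) // factor
            unfolded_edges.append((f"{src}_{i}", f"{dst}_{j}", new_delay))
    return unfolded_nodes, unfolded_edges


if __name__ == "__main__":
    # A two-node loop: A feeds B directly, B feeds A one iteration later.
    n, e = unfold_dfg(["A", "B"], [("A", "B", 0), ("B", "A", 1)], 2)
    print(len(n), "nodes,", len(e), "edges")
    for edge in e:
        print(edge)
```

Under these assumptions, the node and edge counts grow in proportion to the unfolding factor, consistent with the quoted statement, while the total number of delays in the graph is unchanged.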

As can be seen, the unfolding technique also occurs in the data flow graph of Lotfi, especially in the program transformation for pattern detection within the LLVM compiler over i nodes (iteration) cycles, as follows:
 “ REHLS finds all patterns that are repeated across different basic blocks in each function in the program. To do that, the data flow graph (DFG) of different basic blocks are extracted from the program’s LLVM IR. Given a pair of DFGs, G1 = (V1, E1) and G2 = (V2, E2), the goal is to enumerate all the subgraphs of G1 that are isomorphic to subgraphs of G2, where functionally, type, and bitwidth of corresponding nodes of V1 and V2 are the same. Our algorithm detects patterns with a breadth-first search approach. Our subgraph enumeration process is incremental, meaning that size i+1 subgraphs are enumerated when all the size i subgraphs are enumerated. If a size i subgraph is not frequent enough, it is removed and no further considered for creating new subgraphs of size i+1. In each iteration i, our algorithm finds all patterns of size i (with i nodes) across different DFG…This process is repeated until either no more new pattern is found or we enumerate the subgraph with maximum possible size” – see Lotfi, at least page 534, left column: pattern detection to paragraph 1, right column, Fig. 1, and associated text. 

Therefore, the DFGs of Lotfi are unfolded. Accordingly, Lotfi does teach the recitation of claim 1. 
The same reasoning also applies to claim 15. 
As to claims 2-8, 12, and 13, which depend upon claim 1, the examiner notes that Applicant relies on arguments similar to those for claim 1 above (see Remarks, page 8, paragraph 2), which are found to be not persuasive or moot as noted above.
Conclusion

17.	Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP §706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
18.	The prior art made of record and not relied upon (cited on 892 form) is considered pertinent to applicant's disclosure. 
Tomas, “Algorithms and tools for automatic generation of DSP hardware structures”, https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.125.6088&rep=rep1&type=pdf, 2006, discloses automatic generation of efficient hardware structures, with algorithms and techniques for i) balancing the paths in a graph, ii) scheduling of operations to functional units, iii) allocating registers and iv) generating the VHDL code, and describes the background of the folding and unfolding techniques. 

Liang-Fang Chao and E. Hsing-Mean Sha, "Scheduling data-flow graphs via retiming and unfolding," in IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 12, pp. 1259-1267, Dec. 1997, doi: 10.1109/71.640018, discloses loop scheduling in data flow graphs via retiming and unfolding.

K. K. Parhi and D. G. Messerschmitt, "Static rate-optimal scheduling of iterative data-flow programs via optimum unfolding," in IEEE Transactions on Computers, vol. 40, no. 2, pp. 178-195, Feb. 1991, doi: 10.1109/12.73588, discloses static rate-optimal scheduling of iterative data-flow programs via optimum unfolding.

19.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARINA LEE whose telephone number is (571)270-1648.  The examiner can normally be reached on Monday to Friday (8 am to 4:30 pm).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hyung S. Sough can be reached on (571)-272-6799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MARINA LEE/Primary Examiner, Art Unit 2192