Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Reasons for Allowance
The following is an examiner’s statement of reasons for allowance:
Claim 1 requires, among other things: “A device comprising: a template register configured to store a stream definition template… receive an instruction from the processor that specifies at least one of: a number of bits in the template register allocated to each of the respective counts of the plurality of nested loops or a number of bits in the template register allocated to specify a distance between pointer positions of each loop of the plurality of nested loops; based on the instruction, retrieve a set of data elements from the memory by: generating a set of addresses for the set of data elements by traversing the plurality of nested loops according to the respective counts in the stream definition template; and retrieving the set of data elements from the memory according to the set of addresses; and provide the set of data elements to the processor.”

The closest prior art includes Anderson (patent application publication No. 2005/0019840) and Nuzman (patent application publication No. 2011/0029962).
Anderson taught a device comprising: a template register (stream template register 2300) (e.g., see fig. 23) configured to store a stream definition template that includes a respective count for each loop of a plurality of nested loops (e.g., see paragraph 0147); and a memory unit (stream buffer) (1500) coupled to the template register and configured to couple to a memory (1510) and a processor (1520) (e.g., see fig. 15 and paragraphs 0107-0109), wherein the memory unit is configured to: receive an instruction from the processor (e.g., see paragraphs 0067, 0089 and fig. 11); based on the instruction, retrieve a set of data elements from the memory (e.g., see paragraph 0106) by: generating a set of addresses for the set of data elements by traversing the plurality of nested loops according to the respective counts in the stream definition template (e.g., see paragraphs 0107-0108, 0147); and retrieving the set of data elements from the memory according to the set of addresses; and provide the set of data elements to the processor (e.g., see paragraphs 0100-0102 and 0075-0079). Nuzman, however, taught wherein each of the respective counts is independent of a remainder of the respective counts (e.g., see paragraphs 0027-0029 and fig. 3A) [Nuzman taught performing iterations of loop L1 up to an aligned memory address, then performing the loop count of iterations of loop L2, and then performing the remaining iterations of loop L1. Note that the counts of L1 up to the aligned memory address and the iteration counts of L2 did not depend on the remaining iterations of loop L1].
Anderson taught that the instructions executed were vector instructions (e.g., see paragraph 0064 and figs. 15, 16) and taught transporting data to/from a vector register file (e.g., see fig. 5); therefore, the paths for data provide vector data path(s). Anderson also taught streaming engine(s) coupled to the CPU and memory (e.g., see paragraphs 0011, 0050) and a template register (e.g., see paragraph 0147). As to the instruction(s) being a stream start instruction, Anderson taught “the stream instructions separately specify starting address” (e.g., see paragraph 0147).

Anderson taught wherein the stream definition template includes a format field that specifies a number of bits in the template register allocated for each of the respective counts (e.g., see paragraphs 0147-0148 including Table 9 and Table 10) [the size bits for the fields for iterations provide this limitation, and Anderson taught “The template... fully specifies the type of elements, length and dimension of the stream”].

Anderson taught wherein: a first value of the format field specifies that a first number of bits is allocated for a first count of the respective counts and a second number of bits is allocated for a second count of the respective counts; and the first number of bits is different from the second number of bits (e.g., see paragraphs 0147-0148 including Table 9 and Table 10) [the size bits field indicates the number of bits for the iteration count(s), and the size of the size bits field for loop 0 is 32 while the size bits field for loop 3 is 8; therefore the numbers of bits indicated by the size bits fields are different].
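
For illustration only, the idea of a format value selecting how many register bits each loop count occupies can be sketched as follows. This is a minimal sketch under stated assumptions, not the layout of Anderson's Table 9 or of the claimed template register: the names `COUNT_WIDTHS`, `pack_counts`, and `unpack_counts` are hypothetical, the 32 and 8 in the first layout merely echo the loop 0 and loop 3 widths noted above, and the remaining widths and the second layout are invented for the example.

```python
# Hypothetical layouts: format value -> bits allocated to each loop count,
# loop 0 (innermost) first. All widths are illustrative assumptions.
COUNT_WIDTHS = {
    0: (32, 16, 16, 8),   # unequal widths per loop
    1: (16, 16, 16, 16),  # equal widths per loop
}


def pack_counts(format_value, counts):
    """Pack per-loop counts into one integer, loop 0 in the lowest bits."""
    packed = 0
    shift = 0
    for width, count in zip(COUNT_WIDTHS[format_value], counts):
        if not 0 <= count < (1 << width):
            raise ValueError(f"count {count} does not fit in {width} bits")
        packed |= count << shift
        shift += width
    return packed


def unpack_counts(format_value, packed):
    """Recover per-loop counts using the widths selected by format_value."""
    counts = []
    for width in COUNT_WIDTHS[format_value]:
        counts.append(packed & ((1 << width) - 1))
        packed >>= width
    return counts
```

Under these assumptions, the same packed value decodes to different counts depending on the format value, which is the sense in which the format field determines the bits allocated to each count.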

Anderson taught wherein the format field further specifies a count of loops within the plurality of nested loops (e.g., see paragraph 0147) [note Anderson taught “the streaming engine defines a four level loop nest for addressing elements” in lines 8-9 of paragraph 0147, and table 9 shows the definition fields for the format, including fields for iteration counts for plural loops (which, as understood, are for the nested loops)].

Anderson taught wherein the stream definition template includes a direction bit that specifies whether a first loop of the plurality of nested loops increases or decreases in address as the memory unit traverses the plurality of nested loops (e.g., see paragraph 0148 including table 10) [the stream direction field “DIR” (0 forward direction, 1 reverse direction) in table 10 provides this limitation] [also note in paragraph 0134 Anderson taught “Each streaming engine includes a streaming address generator ... address generators output... addresses ... in a sequence defined by stream parameters”].

Anderson taught wherein each of the respective counts has a sign that indicates whether the respective loop of the plurality of nested loops increases or decreases in address as the memory unit traverses the plurality of nested loops (e.g., see paragraph 0148 including table 10) [the stream direction field “DIR” (0 forward direction, 1 reverse direction) in table 10 provides this limitation] [the 0 or 1 bit of the DIR field provides a value equivalent to a sign indicating whether the loops increase or decrease].



Anderson taught wherein: the stream definition template further includes a respective loop dimension for each loop of the plurality of nested loops, wherein each of the respective loop dimensions is independent of a remainder of the respective loop dimensions; and the generating of the set of addresses by the memory unit further traverses the plurality of nested loops according to the respective loop dimensions in the stream definition template (e.g., see paragraph 0147) [note Anderson taught “The template above fully specifies the type of elements, the length and dimensions of the stream”].
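
For illustration only, generating addresses by traversing nested loops according to per-loop counts and per-loop dimensions can be sketched as follows. This is a minimal sketch under stated assumptions, not Anderson's address generator or the claimed memory unit: the function name `generate_addresses` is hypothetical, each dimension is assumed to be a signed byte distance between successive pointer positions of its loop, and loop 0 is assumed to be the innermost loop.

```python
def generate_addresses(base, counts, dims):
    """Return the addresses visited by a loop nest.

    counts[i] is the iteration count of loop i and dims[i] is the signed
    distance (in bytes) between successive pointer positions of loop i.
    Loop 0 is innermost. Both conventions are assumptions of this sketch.
    """
    if len(counts) != len(dims):
        raise ValueError("each loop needs one count and one dimension")

    addresses = []

    def walk(level, pointer):
        # Below the innermost loop: emit one element address.
        if level < 0:
            addresses.append(pointer)
            return
        # Step this loop's pointer by its own dimension on each iteration.
        for i in range(counts[level]):
            walk(level - 1, pointer + i * dims[level])

    walk(len(counts) - 1, base)
    return addresses


# Two loops: 3 elements 4 bytes apart, repeated for 2 rows 64 bytes apart.
print([hex(a) for a in generate_addresses(0x100, [3, 2], [4, 64])])
# prints ['0x100', '0x104', '0x108', '0x140', '0x144', '0x148']
```

Each loop's count and dimension enter the computation separately, so changing one loop's dimension (including making it negative to walk toward lower addresses) leaves the other loops' contributions unchanged.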

Anderson taught wherein the stream definition template includes a format field that specifies a number of bits in the template register allocated for each of the respective loop dimensions (e.g., see paragraph 0147) [note the Signed dimension for loop 1 size bits, Signed dimension for loop 2 size bits and Signed dimension for loop 3 size bits in table 9 (including the stream field template) provide this limitation].
Anderson taught wherein the memory is a level-two (L2) cache memory, and the memory unit is configured to bypass a level-one (L1) cache memory (e.g., see paragraphs 0011 and 0089 and fig. 28) [the streaming engine (SE) to L2 interface provides the means to bypass the L1 cache].
Anderson taught further comprising the memory (1510) (e.g., see fig. 15 and paragraphs 0107-0109).
However, Anderson and Nuzman did not disclose, among other things:
 “A device comprising: a template register configured to store a stream definition template … receive an instruction from the processor that specifies at least one of: a number of bits in the template register allocated to each of the respective counts of the plurality of nested loops or a number of bits in the template register allocated to specify a distance between pointer positions of each loop of the plurality of nested loops; based on the instruction, retrieve a set of data elements from the memory by: generating a set of addresses for the set of data elements by traversing the plurality of nested loops according to the respective counts in the stream definition template; and retrieving the set of data elements from the memory according to the set of addresses; and provide the set of data elements to the processor.”
Claim 13 recites similar limitations.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC COLEMAN whose telephone number is (571)272-4163. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached on 0-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ERIC COLEMAN
Primary Examiner
Art Unit 2183



EC
/ERIC COLEMAN/           Primary Examiner, Art Unit 2183