Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 4, 7-10, 15, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Mejdrich et al. (US 2009/0150647 A1, hereinafter Mejdrich) in view of Wildman (US 2008/0209164 A1) and Clery, III (US 6079008, hereinafter Clery).
	Regarding claim 1, Mejdrich teaches:
A single instruction multiple data (SIMD) processing unit configured to process a plurality of tasks…, wherein the work items of a task are arranged for executing a common sequence of instructions on respective data items, (Mejdrich, page 1 [0004], “Another popular technique for increasing performance is to use a single instruction multiple data (SIMD) architecture, which is also referred to as `vectorizing` the data. In this manner, operations are performed on multiple data elements at the same time, and in response to the same SIMD instruction. A vector execution unit typically includes multiple processing lanes that handle different datapoints in a vector and perform similar operations on all of the datapoints at the same time. For example, for an architecture that relies on quad(4)word vectors, a vector execution unit may include four processing lanes that perform the identical operations on the four words in each vector.” A work item corresponds to a processing lane that handles multiple data scheduled on that lane. A task corresponds to a vector execution unit.)
wherein blocks of work items within a task relate to respective blocks of data items, each block of data items being a pixel quad, ([0004], “For example, for an architecture that relies on quad(4)word vectors, a vector execution unit may include four processing lanes that perform the identical operations on the four words in each vector.” A work item corresponds to a processing lane that handles multiple data scheduled on that lane. A task corresponds to a vector execution unit.)
the SIMD processing unit comprising: 
a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles, wherein each of the processing lanes of the group is configured to execute instructions of a respective block of work items over a plurality of consecutive processing cycles; (Mejdrich, page 1 [0010], “processing lanes can be selectively grouped together to operate as different types of vector execution units”, [0011], “a vectorizable execution unit may be provided with multiple processing lanes that in one mode, are grouped together into the same logical execution unit such that the processing lanes operate collectively as a single vector or SIMD execution unit”. Page 2 [0023], “A logical execution unit, in this regard, constitutes one or more physical processing lanes defined in an execution unit, where a physical processing lane typically incorporates execution logic configured to perform one or more data processing operations, in one or more stages, responsive to an instruction provided thereto. A logical execution unit, in addition, typically is capable of receiving up to one instruction (typically a vector or scalar instruction) per cycle, although if the processing lanes incorporated in a logical execution unit are pipelined, multiple instructions may be at different stages of execution in a logical execution unit at any given time. Where a given mode of a vectorizable execution unit organizes the processing lanes into multiple logical execution units, those units are typically capable of being operated independently and in parallel with one another”) and 
However, Mejdrich does not teach:
a plurality of tasks which each include up to a predetermined maximum number of work items,
logic coupled to the group of processing lanes configured to cause the group of processing lanes to skip a particular processing cycle if there are no valid work items scheduled for execution over the group of processing lanes in the particular processing cycle.
On the other hand, Wildman teaches:
a plurality of tasks which each include up to a predetermined maximum number of work items, (Wildman, page 2 [0040], “FIG. 4 illustrates a register file 42 for use in a processing element which includes an execution unit which operates on data stored in the register file, and which is able to process multiple instruction threads. The register file can also be used in the serial processor 10. Such a register file embodies another aspect of the invention. The parallel processor 15 can process a predetermined maximum number of instruction streams (threads). The register file 42 is provided with a set of registers for each such thread.”)
Mejdrich teaches processing a plurality of tasks, each of which executes a sequence of instructions on data items. Wildman teaches that each task can have a maximum number of work items.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the SIMD task of Mejdrich with the predetermined maximum number of work items of Wildman, so as to limit the number of work items in a vector execution unit to no more than the predetermined maximum number. The motivation is to account for and accommodate hardware limitations when implementing the SIMD processing unit.
However, Mejdrich in view of Wildman does not teach:
logic coupled to the group of processing lanes configured to cause the group of processing lanes to skip a particular processing cycle if there are no valid work items scheduled for execution over the group of processing lanes in the particular processing cycle.
On the other hand, Clery teaches:
In order to improve performance, logic coupled to the group of processing lanes configured to cause the group of processing lanes to skip a particular processing cycle if there are no valid work items scheduled for execution over the group of processing lanes in the particular processing cycle. (Clery, column 23, 1st para, “As a thread repeatedly cycles, it determines whether or not the direct memory access (DMA) operation has provided any new data. If no sampled data is available in a processing unit 14 for that cycle of the thread, the processing unit skips execution on that cycle. ")
Mejdrich teaches that, in order to improve performance, quad words are loaded into a vector execution unit, which includes four processing lanes that perform the identical operations on the four words in each vector. Clery teaches that, in order to improve performance, a processing lane can skip a particular processing cycle if there are no work items scheduled in that processing lane in that particular processing cycle.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the SIMD processing unit of Mejdrich in view of Wildman with the cycle-skipping method of Clery, so that a logic module coupled to the groups of processing lanes is configured to cause a particular group of processing lanes to skip a particular processing cycle if there are no work items scheduled for execution in any of the processing lanes of the particular group in the particular processing cycle. The motivation is to improve the overall performance of the SIMD processing unit.
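For illustration only, the skip condition recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from Mejdrich, Wildman, or Clery: the function name `cycles_to_run` and the `valid[lane][cycle]` layout are hypothetical, and each lane is assumed to work through its block of work items one item per consecutive cycle.

```python
from typing import List


def cycles_to_run(valid: List[List[bool]]) -> List[int]:
    """Return the indices of processing cycles that are NOT skipped.

    valid[lane][cycle] is True when a valid work item is scheduled on
    that lane in that cycle (hypothetical layout, assumed for this sketch).
    A cycle is skipped only when no lane in the group has a valid work item.
    """
    if not valid:
        return []
    num_cycles = len(valid[0])
    return [
        cycle
        for cycle in range(num_cycles)
        if any(lane[cycle] for lane in valid)
    ]


# Two lanes, four cycles: cycle 1 has no valid work item on either lane.
group = [
    [True, False, False, True],
    [False, False, True, True],
]
executed = cycles_to_run(group)
```

In this example cycle 1 is the only cycle in which neither lane holds a valid work item, so it is the only cycle dropped from the returned list.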

Regarding claim 4, Mejdrich in view of Wildman and Clery teaches:
The SIMD processing unit of claim 1, wherein the logic is configured to assemble the work items into the tasks such that work items of a block of work items are grouped together into the same task.( Mejdrich [0064], “However, it will be appreciated that this architecture is flexible enough to account for any number of processing lanes, any number of modes, and any grouping of processing lanes into logical execution units. As but one example, an execution unit with four (N=4) processing lanes could support various modes, e.g., where all processing lanes are functionally combined into a single logical execution unit to process 4 word vectors, where three processing lanes are functionally combined into a first logical execution unit to process 3 word vectors and the fourth processing lane operates in parallel as a second logical execution unit configured as an independent scalar unit, where the four processing lanes are combined to form two independent logical execution units configured as vector units each capable of processing 2 word vectors in parallel, where two processing lanes are combined to form a 2 word vector logical execution unit and the remaining two processing units operate in parallel as independent scalar logical execution units, where the four processing lanes are configured to operate in parallel as independent scalar logical execution units, etc. In addition, while it is desirable in the illustrated embodiments to incorporate at least one mode where scalar instruction execution is supported, it will be appreciated that a vectorizable execution unit consistent with the invention may support vector-only modes exclusively, where processing lanes are combined in different manners in different modes to form different vector-based logical execution units (e.g., a first mode with two 2 word vector logical execution units and a second mode with one 4 word vector logical execution unit).”)

Regarding claim 7, Mejdrich in view of Wildman and Clery teaches:
The SIMD processing unit of claim 1, wherein there are no valid work items scheduled for execution over the group of processing lanes in the particular processing cycle if all of the work items which are scheduled for execution over the group of processing lanes in the particular processing cycle are invalid work items. (Clery, column 23, 1st para, “As a thread repeatedly cycles, it determines whether or not the direct memory access (DMA) operation has provided any new data. If no sampled data is available in a processing unit 14 for that cycle of the thread, the processing unit skips execution on that cycle. ". The work items execution is skipped. The combination of claim 1 is incorporated here.)

Regarding claim 8, Mejdrich in view of Wildman and Clery teaches:
The SIMD processing unit of claim 1, wherein there is not a valid work item scheduled for execution in a processing lane in a particular processing cycle if there is not a work item which is scheduled for execution in the processing lane in the particular processing cycle. (Clery, column 23, 1st para, “As a thread repeatedly cycles, it determines whether or not the direct memory access (DMA) operation has provided any new data. If no sampled data is available in a processing unit 14 for that cycle of the thread, the processing unit skips execution on that cycle. ". The work items execution is skipped. The combination of claim 1 is incorporated here.)

Regarding claim 9, Mejdrich in view of Wildman and Clery teaches:
The SIMD processing unit of claim 8, wherein work items which are not ready for execution when the task is due to be sent to the group of parallel processing lanes are not scheduled for execution. (Clery, column 23, 1st para, “As a thread repeatedly cycles, it determines whether or not the direct memory access (DMA) operation has provided any new data. If no sampled data is available in a processing unit 14 for that cycle of the thread, the processing unit skips execution on that cycle. ". When there is no data available yet, the work items are not ready, so the execution is skipped. Mejdrich teaches the group of parallel processing lanes. The combination of claim 1 is incorporated here.)

Regarding claim 10, Mejdrich in view of Wildman and Clery teaches:
The SIMD processing unit of claim 1, wherein some of the tasks comprise fewer than the predetermined maximum number of work items, and wherein the SIMD processing unit comprises a plurality of parallel groups of processing lanes, each group being configured to execute instructions of work items of a respective task over a plurality of processing cycles. (Mejdrich, page 1 [0010], “processing lanes can be selectively grouped together to operate as different types of vector execution units”, [0011], “a vectorizable execution unit may be provided with multiple processing lanes that in one mode, are grouped together into the same logical execution unit such that the processing lanes operate collectively as a single vector or SIMD execution unit”. Page 2 [0023], “A logical execution unit, in this regard, constitutes one or more physical processing lanes defined in an execution unit, where a physical processing lane typically incorporates execution logic configured to perform one or more data processing operations, in one or more stages, responsive to an instruction provided thereto. A logical execution unit, in addition, typically is capable of receiving up to one instruction (typically a vector or scalar instruction) per cycle, although if the processing lanes incorporated in a logical execution unit are pipelined, multiple instructions may be at different stages of execution in a logical execution unit at any given time. Where a given mode of a vectorizable execution unit organizes the processing lanes into multiple logical execution units, those units are typically capable of being operated independently and in parallel with one another”. Wildman teaches a maximum number of work items. By combining the teachings of Mejdrich with the specific teachings of Wildman, the task can include up to the maximum number of work items. The combination rationale of claim 1 is incorporated here.)

	Claim 15 recites limitations similar to those of claim 1 and is therefore rejected under the same rationale.

Claim 18 recites limitations similar to those of claim 4 and is therefore rejected under the same rationale.

Regarding claim 20, Mejdrich in view of Wildman and Clery teaches:
A non-transitory computer readable storage medium having stored thereon an integrated circuit dataset description that when inputted causes an integrated circuit manufacturing system to generate a single instruction multiple data (SIMD) processing unit configured to (Mejdrich, Claim 14. “A program product comprising a computer readable medium and logic definition program code resident on the computer readable medium and defining the circuit arrangement of claim 1.”) The rest of claim 20 recites the limitations of claim 1 and is therefore rejected under the same rationale.

Claims 2-3, 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Mejdrich in view of Wildman and Clery and further in view of Teruyama et al. (US 2007/0182750 A1, Teruyama).
Regarding claim 2, Mejdrich in view of Wildman and Clery teaches:
The SIMD processing unit of claim 1, wherein the logic is configured
However, Mejdrich in view of Wildman and Clery does not, but Teruyama teaches:
 to set indicators to indicate how the work items have been assembled into the tasks. ([0117], “For example, in FIG. 6, a stamp with STID=15 is inside the triangle. Accordingly, all the pixels contained in this stamp need to be drawn. However, for example, for a stamp with STID=7, pixels with PIXIDs=0 to 8, 12, 13, and 15 are outside the triangle and need not be drawn. Only the pixels with PIXIDs=9 to 11 and 14 need to be drawn. Thus, pixels that need to be drawn are hereinafter referred to as "valid" pixels, whereas pixels that need not be drawn are hereinafter referred to as "invalid" pixels.”[0138]-[0139], “The quad merge operation is such a process as described below with reference to FIG. 13. FIG. 13 is a conceptual drawing of a quad merge operation. The quad merge operation involves merging two temporally successive stamps with the same XY coordinates into one stamp. By the quad merge, valid quads in two stamps can be compounded into one stamp and can be processed at a time. Thus, the amount of data to be subjected to the rendering process can be compressed. As shown in FIG. 13, the four quads contained in one stamp are hereinafter referred to as quads Q0 to Q3. It is assumed that first, the stamp 1 in which the quads Q0 and Q2 are valid, whereas the quads Q1 and Q3 are invalid is input to the instruction control unit and that the stamp 2 in which the quads Q1 and Q2 are valid, whereas the quads Q0 and Q3 are invalid is subsequently input to the instruction control unit. In this case, the two stamps 1 and 2 are merged to generate a new stamp containing the quads Q0 and Q2 of the stamp 1 and the quads Q1 and Q2 of the stamp 2. This process is the quad merge operation. The newly generated stamp is hereinafter referred to as a thread so as to be distinguished from the stamp not subjected to the quad merge operation.” Mejdrich in view of Wildman and Clery teaches assembling work items together into a task. 
Teruyama teaches using an indicator to indicate how the work items are assembled into a task. Using an indicator provides an easy and straightforward way for the system to identify valid and invalid work items. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Mejdrich in view of Wildman and Clery with the specific teachings of Teruyama to easily identify how work items are assembled.)
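For illustration only, the quad merge operation quoted from Teruyama [0138]-[0139] can be sketched as follows. This is a hedged sketch rather than Teruyama's implementation: the function name `quad_merge`, the `(stamp, quad)` tuples standing in for indicators, the ordering of the merged quads, and the four-slot capacity check are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def quad_merge(
    stamp1_valid: List[bool],
    stamp2_valid: List[bool],
    slots: int = 4,
) -> Optional[List[Tuple[int, int]]]:
    """Merge the valid quads of two stamps into one thread.

    Each entry of the result is a (stamp number, quad index) pair, which
    plays the role of an indicator recording where each quad came from.
    Returns None when the valid quads do not fit in `slots` positions
    (an assumed capacity rule, not stated in the quoted passage).
    """
    merged = [(1, q) for q, ok in enumerate(stamp1_valid) if ok]
    merged += [(2, q) for q, ok in enumerate(stamp2_valid) if ok]
    if len(merged) > slots:
        return None
    return merged


# The example from the quoted passage: Q0 and Q2 valid in stamp 1,
# Q1 and Q2 valid in stamp 2.
thread = quad_merge(
    [True, False, True, False],
    [False, True, True, False],
)
```

Here the resulting thread holds quads Q0 and Q2 of stamp 1 followed by quads Q1 and Q2 of stamp 2, matching the merged stamp described in the quoted passage.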

Regarding claim 3, Mejdrich in view of Wildman and Clery and Teruyama teaches:
The SIMD processing unit of claim 2, further comprising: a store configured to store the processed data items output from the group of processing lanes; (Mejdrich [0065], “When processing lanes are functionally combined to operate as a vector unit, the partitions associated with those processing lanes are likewise functionally combined to store vectorized data.” [0063]) and storing logic configured to determine addresses for storing the processed data items in the store based on the indicators. (Teruyama [0127], “The first state machine 50 then generates a first data write enable signal and a stamp data write enable signal on the basis of the first start signal. The first data write enable signal enables a write operation on the first data holding unit 42. The stamp data write enable signal enables a write operation on the stamp holding unit 44. A first data write address signal is also generated on the basis of a stamp number STN sent by the stamp holding unit 44. The stamp number STN is an identification number uniquely provided for each stamp. The first data write address signal indicates an address in the first data holding unit 42 at which the first data is written.” Mejdrich teaches saving processed data. Teruyama further teaches determining the data storage address based on an indicator. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Mejdrich in view of Wildman and Clery with the specific teachings of Teruyama to easily identify the data storage address.)

Claims 16-17 recite limitations similar to those of claims 2-3, respectively, and are therefore rejected under the same respective rationale.

Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Mejdrich in view of Wildman and Clery and further in view of Park (US 2012/0212573 A1).
Regarding claim 5, Mejdrich in view of Wildman and Clery teaches:
The SIMD processing unit of claim 1, wherein the work items are assembled into blocks of work items such that each work item within a block of work items can be used to perform different processing on the block of work items before it is passed to the group of processing lanes (Mejdrich, page 1 [0004], “Another popular technique for increasing performance is to use a single instruction multiple data (SIMD) architecture, which is also referred to as `vectorizing` the data. In this manner, operations are performed on multiple data elements at the same time, and in response to the same SIMD instruction. A vector execution unit typically includes multiple processing lanes that handle different datapoints in a vector and perform similar operations on all of the datapoints at the same time. For example, for an architecture that relies on quad(4)word vectors, a vector execution unit may include four processing lanes that perform the identical operations on the four words in each vector.” The output of the operation can be passed to the group of processing lanes for other processing purpose.) 
Mejdrich in view of Wildman and Clery does not, but Park teaches:
A pre-processing operation can be performed ([0030], “First, the pre-processing part 120 in accordance with one example embodiment of the present invention may calculate the gradient vector components representing the changes in intensity or color with respect to respective pixels in the two-dimensional adjusted image. Herein, directions of the gradient vector components may be determined in the directions of maximum changes in intensity or color and magnitudes of the gradient vector components may be decided to be the rate of change in the directions of the maximum changes in intensity or color.”)
Mejdrich in view of Wildman and Clery teaches a group of processing lanes used to perform different processing on a pixel quad. The output of one processing operation can be fed into the next processing operation. Park teaches that one specific processing operation is a gradient operation performed on image pixel data before the image pixel data is used for other operations.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the SIMD processing structure of Mejdrich in view of Wildman and Clery with the specific teachings of Park to use the SIMD processing structure to perform the image processing taught by Park. The SIMD processing structure is a very efficient, high-performance structure. The benefit of this combination would be to perform image processing efficiently.

Regarding claim 6, Mejdrich in view of Wildman and Clery and Park teaches:
The SIMD processing unit of claim 5, wherein the pre-processing operation is a gradient operation configured to determine the rate of change of a varying quantity between different pixels in a pixel quad. (Park, [0030], “First, the pre-processing part 120 in accordance with one example embodiment of the present invention may calculate the gradient vector components representing the changes in intensity or color with respect to respective pixels in the two-dimensional adjusted image. Herein, directions of the gradient vector components may be determined in the directions of maximum changes in intensity or color and magnitudes of the gradient vector components may be decided to be the rate of change in the directions of the maximum changes in intensity or color.” Mejdrich teaches a pixel quad. The combination rationale of claim 5 is incorporated here.)
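For illustration only, a gradient operation over a 2x2 pixel quad can be sketched with simple finite differences. This is an assumption made for the example: neither Park nor the claim language specifies this formula, and the function name `quad_gradients` and the row-major quad layout are hypothetical.

```python
from typing import Sequence, Tuple


def quad_gradients(
    quad: Sequence[Sequence[float]],
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Finite-difference rates of change across a 2x2 pixel quad.

    quad is [[top_left, top_right], [bottom_left, bottom_right]]
    (assumed layout). Returns (ddx, ddy), where ddx holds the horizontal
    difference for the top and bottom rows and ddy holds the vertical
    difference for the left and right columns.
    """
    (top_left, top_right), (bottom_left, bottom_right) = quad
    ddx = (top_right - top_left, bottom_right - bottom_left)
    ddy = (bottom_left - top_left, bottom_right - top_right)
    return ddx, ddy


# A varying quantity (for example, intensity) sampled at the four pixels.
ddx, ddy = quad_gradients([[1.0, 3.0], [2.0, 6.0]])
```

Each difference relies on a neighboring pixel in the same quad, which is why all four pixels of a quad are kept together in this sketch.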

Allowable Subject Matter
Claims 11-14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: none of the references, alone or in combination, teaches the limitations of “wherein the logic coupled to the groups of processing lanes is further configured to cause a particular group of processing lanes to skip a particular processing cycle, independently of the other groups of processing lanes, if there are no valid work items scheduled for execution in any of the processing lanes of the particular group in the particular processing cycle,” as recited in claim 11 and similarly recited in claim 19.

None of the references, alone or in combination, teaches the limitations of “wherein there are three levels of validity for pixels of a pixel quad, a first level of validity being full validity, a second level of validity being partial invalidity and a third level of validity being full invalidity, and wherein the logic is configured to: skip a first particular processing cycle comprising work items corresponding to pixels of the third level of validity when instructions are to be executed on pixels of the first and second levels of validity, but instructions are not to be executed on pixels of the third level of validity; and skip a second particular processing cycle comprising work items corresponding to pixels of the second level of validity when instructions are to be executed on pixels of the first level of validity, but instructions are not to be executed on pixels of the second level of validity,” as recited in claim 13.
Claims 12 and 14 are objected to as being dependent on claims 11 and 13, respectively.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YANNA WU whose telephone number is (571)270-0725. The examiner can normally be reached Monday-Thursday 8:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached on 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YANNA WU/Primary Examiner, Art Unit 2611