DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 10, 11 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
As to claim 10, the meaning of the claim language “…are initialized to a different.” (last line) is unclear. It is not clear what is different from the reason counts: does the phrase refer to different values of the reason counts, or to other, different reason counts? For examination purposes, it is assumed that each reason count has a different binary value. Applicant’s clarification and correction are advised in the next response.
As to claim 11, claim 11 depends from claim 10 and is rejected for the same reason due to its dependency from claim 10.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Rao et al. (20160180486), published Jun. 23, 2016.
As to claim 1, Rao teaches a graphics processor (see fig.11 graphic processing unit (GPU) 1114 for the block diagram that includes drive logic 1116 and dynamic pipelined workload execution mechanism 1110. See also fig.12 that shows more details and the specific functional blocks of the dynamic pipelined workload execution mechanism 1110), comprising: 
a grouping of processing resources [resources] (see resources in para [0129] and resource threads in [0130]; see also the execution resources include an array of graphics execution units in [0046] for introductory teaching); and 
control logic [drive logic 1116: dynamic pipelined workload execution mechanism 1110] (see 1110 may be part of 1116 in [0125]) that is associated with the grouping of processing resources [resources], the control logic is configured to sample a state of at least one of stalls and reason counts for stalling activity (e.g. the wait due to the dependency), instruction types, pipeline utilization, thread utilization (e.g. kernel), or shader activity. (See [0130]: when the dependency logic 1205, which is a functional block of the driver logic [drive logic 1116: dynamic pipelined workload execution mechanism 1110], detects that the kernel k3 may not be processed without having processed kernel k2, the entire command buffer might wait until k2 is processed).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is/are rejected under 35 U.S.C. 103 as being unpatentable over Rao et al. (20160180486) in view of GONZALEZ-ALBERQUILLA et al. (20160092235).
As to claim 2, Rao does not but GONZALEZ-ALBERQUILLA teaches a cache unit that is associated with the grouping of processing resources, the cache unit (see fig.9, the thread selection 870, the pipeline stages and the functionally associated DID 860: History Table 900 as a cache structure, see para [0068], the thread selection logic 870 selects instructions to be processed at the various pipeline stages 802, 810, 830, 835, 840, and 850 in accordance with cache/TLB hit/miss history data maintained for each instruction in the history table 900 of the DID 860; see also a new entry can be reserved if no existing entry and the least recently used eviction policy in the history table 900 in [0069], which are the characteristic features of a cache) to receive an instruction pointer address [IP]  and the activity data [History/MP] including a stall reason (e.g. Miss/Miss Pending) for each state of processing resources [instruction/cache/TLB] that are associated with the cache unit [DID 860: History Table 900], as claimed. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include a cache unit that is associated with the grouping of processing resources, the cache unit to receive an instruction pointer address and the activity data including a stall reason for each state of processing resources that are associated with the cache unit, as claimed, (see details of the claim mapping above), because one of ordinary skill in the art should be able to recognize the application of a known technique, such as the history table 900 as taught by  GONZALEZ-ALBERQUILLA, to a known device/method, such as the graphic processor of Rao as set forth above, for the purpose of   indicating historical hits and .
Claim 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Rao et al. (20160180486) in view of GONZALEZ-ALBERQUILLA et al. (20160092235), as applied to claim 2 above, and in further view of Lee  et al. (20130219405).
As to claim 3, neither Rao nor GONZALEZ-ALBERQUILLA but Lee teaches wherein each sampling (e.g. gathering) of a state is scheduled for a chosen clock cycle (i.e. periodically) and is minimally intrusive. (see para [0062], the local monitoring units 423, 433 and 443 periodically gather information about the allocation of resources to tasks being executed and information about the use status of the allocated resources and data stream processing performance information, and send them to the QoS monitoring unit 413 of the apparatus for managing a data stream distributed parallel processing service 411. Note: periodically gathering the status of the allocated resources is minimally intrusive because the gathering occurs only at selected periods, rather than continuously).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include each sampling of a state is scheduled for a chosen clock cycle and is minimally intrusive, as claimed, (see details of the claim mapping above), because one of ordinary skill in the art should be able to recognize the application of a known technique, such as the periodically gathering the status of the resources as taught by  Lee, to a known device/method, such as the graphic processor of Rao as set forth above, for the purpose of   gathering the use status of the allocated resources and data stream processing .
Claims 4, 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Rao et al. (20160180486) in view of Lee et al. (20130219405).
As to claim 4, the limitations of parent claim 1 have been discussed in claim 1 above. Rao does not but Lee teaches wherein the control logic [scheduler 114][task management units 422, 432 and 442][413] is configured to store a state when threads are allocated on a processing resource with no instruction being executed (e.g. the redundant resource being allocated/selected by the scheduler 114) for a chosen cycle (i.e. periodically) that is sampled (e.g. the information about the load of the tasks and the nodes gathered periodically, see para [0059], the scheduling unit 414 may select a node having redundant resources based on information about the resources of nodes that are managed by the QoS monitoring unit 413, that is, information about the load of the nodes, and then allocate one or more tasks to relevant task execution devices; this information is gathered periodically by the QoS monitoring unit 413 as taught in para [0057]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include control logic configured to store a state when threads are allocated on a processing resource with no instruction being executed for a chosen cycle (i.e. periodically) that is sampled, as claimed (see details of the claim mapping above), because one of ordinary skill in the art should be able to recognize the application of a known technique, such as the periodically gathering the status of the resources and the allocation of the tasks, as taught by  Lee, to a known device/method, such as the graphic processor of Rao as set forth 
As to claim 6, Rao does not but Lee teaches wherein the control logic [scheduler 114][task management units 422, 432 and 442][413] is configured to interleave samplings (e.g. periodically) of states of processing resources among the grouping of processing resources and other groupings of processing resources [tasks and the nodes], to resolve the states into one of a number of supported stall reasons (see I/O overload at step 805 and CPU overload at step 804 in fig.8), and to prioritize the supported stall reasons [S804][S805] based on a priority level [does ratio > set value?] of the stall reasons [S804][S805]. (see para [0059], the scheduling unit 414 may select a node having redundant resources based on information about the resources of nodes that are managed by the QoS monitoring unit 413, that is, information about the load of the nodes, and then allocate one or more tasks to relevant task execution devices; see para [0057], this information is gathered periodically by the QoS monitoring unit 413. See also fig.8, para [0095], which further describes the criteria at step 807 for deciding whether to exclude the overload tasks or generate task rearrangement).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the control logic configured to interleave samplings of states of processing resources among the grouping of processing resources and other groupings of processing resources, to resolve the states into one of a number of supported stall reasons, and to prioritize the supported stall reasons based on a priority level of the stall reasons, as claimed (see details of the claim mapping above), because one of ordinary skill in the art should be able .
Claim(s) 8, 9, 10, 11, 14, 15, 16, 17, 18, 21 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by GONZALEZ-ALBERQUILLA et al. (20160092235), published Mar. 31, 2016.
As to claim 8,  GONZALEZ-ALBERQUILLA (“GA” hereafter) teaches a cache structure, comprising (see fig.9, the thread selection 870, the pipeline stages and the functionally associated DID 860: History Table 900 as a cache structure, see para [0068], the thread selection logic 870 selects instructions to be processed at the various pipeline stages 802, 810, 830, 835, 840, and 850 in accordance with cache/TLB hit/miss history data maintained for each instruction in the history table 900 of the DID 860. Note: see also a new entry can be reserved if no existing entry and the least recently used eviction policy in the history table 900 in [0069], which are the characteristic features of a cache): 
logic [thread selection logic 870] [pipeline stages Execute 840, Retire 850] to perform operations of the cache structure (see the DID 860 interfaces with the reorder buffer (ROB) 851 at Retire Stage “841” (850), the MSHRs 841 at the execution stage 840, and with the thread selection logic 870 in [0069]); and 
memory [900] coupled to the logic [thread selection logic 870][pipeline stages Execute 840, Retire 850], the memory [DID 860: History Table 900] to store instruction pointer 
wherein the logic is configured to receive an instruction pointer address [IP] and activity data for a state [miss/hit/MP] of processing resources [instruction/cache/TLB] that are associated with the cache structure (see the historical hits and misses are indicated in the history table 900 in [0068]; see also the miss pending status is also recorded in history table 900 in [0069]).
As to claim 9, GA teaches wherein the logic is configured to perform an instruction pointer address lookup (indexed by pointer) within the cache structure. (See the history table 900 is indexed by the instruction address pointer IP in para [0068]; Note: see also a new entry can be reserved if no existing entry and the least recently used eviction policy in the history table 900 in [0069], which are the characteristic features of a cache).
As to claim 10, GA teaches wherein the logic is configured to build an entry [new entry] for a new cache line when the instruction pointer lookup misses [non-existing entry], to store the instruction pointer address and the activity data in the new cache line (See a new entry can be reserved if no existing entry in [0069]; see fig.9 each entry includes at least IP and the activity data, such as history), to initialize the identified activity including a stall reason to a count while all other reason counts are initialized to a different. (See the history field in the History Table 900 is initialized with the activity, such as miss pending field MP that is being updated to Yes (binary 1) or No (binary 0) in fig.9, [0069]).
As to claim 11, GA teaches wherein the logic is configured to determine if all available lines of the cache structure are occupied (e.g. no more space) and to perform a capacity-
As to claim 14, GONZALEZ-ALBERQUILLA (“GA” hereafter) teaches a method for minimally intrusive profiling of a graphics processing unit (GPU), comprising (see fig.8, core 0 includes at least DID 860; see the core may be a graphic core as introduced in [0022])
receiving, with a cache unit [DID 860] (see fig.9 shows more details of DID 860 that includes at least a History Table 900, para [0068]), 
an instruction pointer address [IP] and activity data [miss/hit/MP]  for each state of processing resources [instruction/cache/TLB] that are associated with the cache unit [DID 860]  ; and 
performing an instruction pointer address lookup (e.g. by the pointer) within the cache unit for the received instruction pointer address [IP] and associated activity data [Hit/miss/MP]. (See [0068], each entry in the history table 900 is indexed with an instruction pointer 901 (i.e., a pointer identifying the location of the instruction in the memory subsystem) and includes a plurality of history bits 902 to indicate historical hits and misses to the L1 cache, L2 cache, and/or the TLB).
As to claim 15, GA teaches further comprising: building an entry for a new cache line when the instruction pointer lookup misses. (See fig.9 History Table 900, the misses indexed by the instruction pointer IP, [0069]).
As to claim 16, GA teaches further comprising: storing the instruction pointer address [IP] and the activity data [history] [MP] in the new cache line (See a new entry can be reserved 
As to claim 17, GA teaches further comprising: determining if all available lines of the cache structure are occupied (e.g. no more space) and to perform a capacity-eviction to evict an existing line if all available lines of the cache structure are occupied. (See the least recently used eviction policy in which an entry is removed to make space for the new entry in [0069]).
As to claim 18, GA teaches further comprising: determining a hit for instruction pointer address lookup. (See fig.9 900 Hit entry of the corresponding IP)
As to claim 21, GA teaches wherein the activity data includes at least one of stalls and reason counts for stalling activity  (see the misses to the L1 cache, L2 cache, and/or the TLB as stalls of the L1 cache, L2 cache, and/or the TLB in [0068]; see also the lookup process of the history data and miss pending data in [0070]),  instruction types, pipeline utilization (see the lookup process of the history data and miss pending data of the pipeline stage in [0070]), thread utilization (see the selection of a new thread in [0070]), or shader activity.
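
For illustration only, the cache-like behavior attributed above to the History Table 900 (an entry indexed by instruction pointer, history bits for hits and misses, a miss pending field, reservation of a new entry when none exists, and least recently used eviction) can be sketched as follows. This is a minimal sketch and is not taken from GONZALEZ-ALBERQUILLA: the names HistoryTable, HistoryEntry, lookup and record, the capacity, and the number of history bits are all hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class HistoryEntry:
    # Hypothetical entry layout: recent hit/miss outcomes (True = hit)
    # plus a miss-pending flag, loosely following the fields cited in fig.9.
    history: list = field(default_factory=list)
    miss_pending: bool = False


class HistoryTable:
    """Illustrative IP-indexed table with least recently used eviction."""

    def __init__(self, capacity, history_bits=4):
        self.capacity = capacity          # assumed number of entries
        self.history_bits = history_bits  # assumed history depth (> 0)
        self.entries = OrderedDict()      # ip -> HistoryEntry, oldest first

    def lookup(self, ip):
        """Return the entry for ip (marking it recently used), or None."""
        entry = self.entries.get(ip)
        if entry is not None:
            self.entries.move_to_end(ip)
        return entry

    def record(self, ip, hit, miss_pending=False):
        """Record one hit/miss outcome, reserving a new entry if needed."""
        entry = self.lookup(ip)
        if entry is None:
            if len(self.entries) >= self.capacity:
                # No free entry: remove the least recently used one.
                self.entries.popitem(last=False)
            entry = HistoryEntry()
            self.entries[ip] = entry
        entry.history.append(hit)
        # Keep only the most recent history_bits outcomes.
        entry.history = entry.history[-self.history_bits:]
        entry.miss_pending = miss_pending
        return entry
```

In this sketch a lookup that returns None corresponds to "no existing entry", and the eviction in record corresponds to removing an entry to make space for the new one.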

Allowable Subject Matter
Claims 5,7,12,13,19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

b) The supported stalls and reason counts for stalling activity comprise a synch stall field for a stall or delay between threads to reach a common point, an instruction fetch field for an instruction fetch from memory that is stalled, a scoreboard field for a stall based on a data dependency, a send stall field for a send bus bandwidth limit for a processing resource, a pipe stall field for a stall within a pipeline, and an internal stall field for a stall caused from a memory bank collision. (Claim 7)
c) The determination of a hit for instruction pointer address lookup, to perform a read operation of a cache line for the instruction pointer address, to perform a modify operation to increment a count of the identified activity, and to perform a write operation for the cache line. (Claim 12)
d) The maximum value eviction when a given cache line has an activity count that reaches a maximum representable value and performs the maximum value eviction by evicting the instruction pointer address and its corresponding data to a circular buffer in main memory. (Claim 13)
e) The read operation of a cache line for the instruction pointer address; performing a modify operation to increment a count of the identified activity for the instruction pointer address; and performing a write operation for the cache line. (Claim 19)
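
For illustration only, the operations recited in items b) through e) above (per-reason stall counts, a read, modify and write sequence on a lookup hit, and a maximum value eviction to a circular buffer) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the applicant's implementation: the class and method names, the 8-bit maximum count, the use of a deque as a stand-in for the circular buffer in main memory, the least recently used choice for capacity eviction, the routing of capacity evictions to the same buffer, and the initial values of 1 for the identified reason and 0 for the others are all hypothetical.

```python
from collections import OrderedDict, deque

# Stall reason fields named in item b); the short identifiers are hypothetical.
STALL_REASONS = ("sync", "inst_fetch", "scoreboard", "send", "pipe", "internal")


class ProfilingCache:
    """Illustrative cache of per-instruction-pointer stall reason counts."""

    def __init__(self, num_lines, max_count=255, buffer_size=1024):
        self.num_lines = num_lines    # assumed number of cache lines
        self.max_count = max_count    # assumed maximum representable count
        self.lines = OrderedDict()    # ip -> {reason: count}, oldest first
        # Stand-in for the circular buffer in main memory (assumption).
        self.ring = deque(maxlen=buffer_size)

    def _evict(self, ip):
        # Move the instruction pointer and its counts out to the buffer.
        self.ring.append((ip, self.lines.pop(ip)))

    def sample(self, ip, reason):
        """Account one sampled stall of the given reason to ip."""
        if reason not in STALL_REASONS:
            raise ValueError(f"unsupported stall reason: {reason}")
        if ip in self.lines:                  # lookup hit
            counts = self.lines[ip]           # read the cache line
            counts[reason] += 1               # modify: increment the count
            self.lines[ip] = counts           # write the cache line back
            self.lines.move_to_end(ip)
            if counts[reason] >= self.max_count:
                self._evict(ip)               # maximum value eviction
            return
        # Lookup miss: make room if every line is occupied (assumed LRU).
        if len(self.lines) >= self.num_lines:
            self._evict(next(iter(self.lines)))
        # Build the new line: identified reason at 1, all others at 0
        # (assumed initial values).
        counts = dict.fromkeys(STALL_REASONS, 0)
        counts[reason] = 1
        self.lines[ip] = counts
```

Evicting a line as soon as any count reaches the maximum keeps the narrow counters from overflowing while preserving the accumulated data in the buffer.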

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  

b) Clarberg et al.  (20150070355) is cited for the teaching of a graphics processor has multiple physical shader cores, each running a large number of logical threads (contexts), in order to hide latencies due to memory stalls etc. (see para [0058]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL H PAN whose telephone number is (571)272-4172.  The examiner can normally be reached on M-F 8:30 am -5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on 571 272 4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access 


DANIEL H. PAN
Examiner
Art Unit 2182



/DANIEL H PAN/             Primary Examiner, Art Unit 2182