Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action

1.	The Examiner acknowledges the applicant’s amendment filed September 7, 2021.  At this point claims 1-20 are pending in the instant application and ready for examination by the Examiner.

Claim Rejections - 35 USC § 103
2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claims 1-2, 4, 6-9, 11, 13-16, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Cox in view of Babu, further in view of Marchezi, Cheng, Trobough and Thibeault. (U.S. Patent Publication 20140152848, referred to as Cox; U.S. Patent Publication 20160092115, referred to as Babu; U.S. Patent Publication 20180276507, referred to as Marchezi; U.S. Patent Publication 20120050260, referred to as Cheng; U.S. Patent Publication 20150127983, referred to as Trobough; U.S. Patent 9349092, referred to as Thibeault)

Claim 1
Cox discloses an apparatus comprising: a graphics processor to detect one or more components on to which a neural network is deployed, wherein the one or more components comprise memory and the graphics processor coupled to the memory, (Cox, 0039, 0048, 0018, 0020; ‘In one embodiment, the parallel processing subsystem 112 incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry, and constitutes a graphics processing unit (GPU).’ [0018] ‘As shown, parallel processing subsystem 112 includes one or more parallel processing units (PPUs) 202, each of which is coupled to a local parallel processing (PP) memory 204. In general, a parallel processing subsystem includes a number U of PPUs, where U ≥ 1.’ [0020] ‘At step 406, digital camera 302 configures a processing unit based on configuration file 324. The processing unit could be, e.g., an ISP, CPU 304, and/or PPU 306. ….. At step 408, digital camera 302 renders trial images 330 by processing raw images 320 with the configured processing unit. At step 410, training engine 328 updates the weight values within machine learning engine 322 based on the difference between the trial images 330 generated at step 410 and target images 326. Target images 326 represent "ideal" images that would, ideally be produced by digital camera 302 by processing raw images 320.’ [0048] ‘Machine learning engine 322 includes a set of weight values and is configured to transform a given set of raw images 320 and corresponding pixel values into a set of parameter values comprising configuration file 324 using those weight values, in the fashion consistent with machine learning techniques. In one embodiment, machine learning engine 322 comprises an artificial neural network (ANN).’ [0039] EC: [0018] links a graphics processor to a parallel processing subsystem. [0020] links the parallel processing subsystem to a PPU. [0048] links a PPU to a learning engine. [0039] links a learning engine to a neural network.); and ….
form a high bandwidth memory (HBM) system coupled to one or more compute clusters further coupled with the memory through one or more compute elements, wherein the HBM system is facilitated through one or more HBM channels. (Cox, 0028; Render targets, such as frame buffers or texture maps may be stored across DRAMs 220, allowing partition units 215 to write portions of each render target in parallel to efficiently use the available bandwidth of parallel processing memory 204. EC: High bandwidth (HBW) maps to parallel processing. The applicant's clusters map to the destinations of each of the parallel writes. An HBW channel maps to each specific unit of the parallel processing.)
Cox does not disclose expressly determine a first amount of memory and a second amount of hardware of the graphics processor.
Babu discloses determine a first amount of memory and a second amount of hardware of the graphics processor. (Babu, 0049; Instructions 612 may determine memory demand. For example, instructions 612 may determine, during boot time, how much memory will be used by processes, applications, and/or hardware (e.g., hard drive, CPU) that will be running or utilized at the beginning of runtime, or may receive data from an OS during runtime regarding how much memory is needed by processes/applications that are running, as discussed above with respect to FIG. 1.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox and Babu before him before the effective filing date of the claimed invention, to modify Cox to incorporate values of required memory and processing output of Babu. Given the advantage of allocating the proper resources to accomplish a task, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Cox and Babu do not disclose expressly that is to be used for performing, with precision, neural network tasks of the neural network, allocate, based on the first amount and the second amount, a storage portion of the memory and a hardware portion of the hardware of the graphics processor to a machine learning training set of the neural network, wherein the storage and hardware portions are precise for implementation and processing of the machine learning training set.
Marchezi discloses that is to be used for performing, with precision, neural network tasks of the neural network (Marchezi, 0044; Referring now to the flow diagram of FIG. 12, an example method 1200 of implementing a machine learning classifier begins at block 1202, with allocating in persistent memory, a training structure comprising an array of categories, a category data structure for each category in the array, and a global data structure. The method 1200 continues at block 1204 with Marchezi, 0034, 0046, 0042; ‘A number of categories can be read from a set of training data 113, and that number can be used to allocate a category array 118 in the persistent memory.’ And ‘Incrementing a category word counter can include searching for a corresponding word structure in the word search tree of the category (block 1314), and if a corresponding word structure is found, incrementing the category word counter within corresponding word structure, as shown at block 1316. However, as shown at block 1318, if a corresponding word structure is not found, incrementing the category word counter can include allocating a new word structure in persistent memory,…’ with ‘In some examples, implementing the operations of methods 1200, 1300, and 1500 can be achieved using an ASIC and/or other hardware components (not shown) alone or in combination with programming 
Cox, Babu and Marchezi do not disclose expressly generate a single unified memory system having at least the storage portion of the memory and at least the hardware portion of the graphics processor:
Cheng discloses generate a single unified memory system having at least the storage portion of the memory and at least the hardware portion of the graphics processor: (Cheng, 0029; ‘In one example, the first processor 112 is an integrated graphic processor that drives only the first physical display 102 through the first display connector 126, and the second processor 116 is a discrete graphic processor that drives the second, third, and fourth physical displays 104-108 through the second, third, and fourth display connectors 128-132, respectively. ….. Although the first frame buffer 114, the second frame buffer 118, and the system memory 122 are shown in FIG. 1 as discrete memory devices, it is understood that a unified memory architecture that can accommodate all the processors may also be employed.’ of Cheng. EC: Here, Cheng's connected graphic processors can be seen as a unified memory architecture. Graphic processors and buffers are hardware.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi and Cheng before him before the effective filing date of the claimed invention, to modify Cox, Babu and 
Cox, Babu, Marchezi and Cheng do not disclose expressly introduce cache coherency within the single unified memory system the cache coherency to provide a page level coherency across multiple graphics processors comprising at least the graphics processor, wherein the page level coherency utilizes a page table to enable an ability to exchange ownership between the multiple graphics processors at one or more levels of the single unified memory system and allows the graphics processor to snoop the multiple graphics processors using a modified exclusive shared invalid (MESI) protocol for pages that the graphics processor does not own.
Trobough discloses introduce cache coherency within the single unified memory system the cache coherency to provide a page level coherency across multiple graphics processors comprising at least the graphics processor, wherein the page level coherency utilizes a page table to enable an ability to exchange ownership between the multiple graphics processors at one or more levels of the single unified memory system and allows the graphics processor to snoop the multiple graphics processors using a modified exclusive shared invalid (MESI) protocol for pages that the graphics processor does not own.
(Trobough, 0047, 0085, 0010, fig 4; ‘The data cache is to store recently used/operated on elements, such as data operands, which are potentially held in memory coherency 
Cox, Babu, Marchezi, Cheng and Trobough do not disclose expressly process the neural network to identify data dependencies and to group dependent data of the neural network together corresponding to the data dependencies, and allocate the dependent data in a same HBM channel of the one or more HBM channels.
Thibeault discloses process the neural network to identify data dependencies and to group dependent data of the neural network together corresponding to the data dependencies, and allocate the dependent data in a same HBM channel of the one or more HBM channels. (Thibeault, c2:13-37; In another embodiment disclosed herein, a neural network for reinforcement-learning and for action-selection comprises a plurality of channels, a population of input neurons in each of the channels, a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels, a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to output neurons in a channel that the reward neuron is part of, and a population of inhibition neurons in each of the channels, wherein each population of inhibition neurons receive an input from a population of output neurons in a same channel that the population of inhibition neurons is part of, and wherein a population of inhibition neurons in a channel has an output to output neurons in every other channel except the channel of which the inhibition neurons are part of, wherein if the environmental input to a population of reward neurons for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced, and wherein if the environmental input to a population of reward neurons for a channel is negative, the corresponding channel of a population of output neurons are punished and have their responses attenuated. EC: Data dependencies are merely mentioned in 0017, 0168 and 0186, with no formal definition of how ‘dependency’ is or is not achieved. A neural network is a classifier by design based on the distinct outputs. The examiner views these individual outputs as ‘independent’ from one another, while a single output and its corresponding input are dependent.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi, Cheng, Trobough and Thibeault before him before the effective filing date of the claimed invention, to modify Cox, Babu, Marchezi, Cheng and Trobough to incorporate high band memory of Thibeault. Given the advantage of forwarding information of the same label using high bandwidth for improved efficiency, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 2
Cox discloses wherein the graphics processor is further to analyze the one or more components to determine the storage portion and the hardware portion of the memory and the graphics processor, respectively, wherein the one or more components further comprise one or more of one or more compilers, one or more drivers, schedulers, compute clusters, compute elements, or caches. (Cox, 0017, 0018, 0025; FIG. 1 is a block diagram illustrating a computer system 100 configured to implement one or more aspects of the present invention. Computer system 100 includes a central processing unit (CPU) 102 and a system memory 104 that includes a device driver 103. [0017] In one embodiment, the parallel processing subsystem 112 incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry, and constitutes a graphics processing unit (GPU) …. In yet another embodiment, the parallel processing subsystem 112 may be integrated with one or more other system elements, such as the memory bridge 105, CPU 102, and I/O bridge 107 to form a system on chip (SoC). [0018] Each PPU 202 advantageously 

Claim 4
Cox, Babu and Marchezi do not disclose expressly wherein the single unified memory system further comprises caches coupled to the graphics processor and one or more other graphics processors to form a communication network for transmission of data between multiple graphics processors including the graphics processor and the one or more other graphics processors.
Cheng discloses wherein the single unified memory system further comprises caches coupled to the graphics processor and one or more other graphics processors to form a communication network for transmission of data between multiple graphics processors including the graphics processor and the one or more other graphics processors. (Cheng, 0029; ‘In one example, the first processor 112 is an integrated graphic processor that drives only the first physical display 102 through the first display connector 126, and the second processor 116 is a discrete graphic processor that drives the second, third, and fourth physical displays 104-108 through the second, third, and fourth display connectors 128-132, respectively. The first display connector 126 may be internal to the system 100 and the first physical display 102 may form a part of system 100--e.g., a display forming part of a laptop computer or mobile device such as, for example, a mobile phone. Nevertheless, it is understood that the number of the physical displays that each processor drives may be varied, and that the type of graphic processor may be varied. The third processor 120 may be a host central processing unit (CPU) bi-directionally connected to the system memory 122 and bi-directionally connected to other components of the system 100 through the system bus 134 as known in the art, or any other suitable processor. It is understood that, the first, second, and third processors 112, 116, and 120 may be integrated as a general processor (e.g., APU, accelerated processing unit; GPGPU, general-purpose computing on GPU); or the third processor (e.g., CPU) 120 may be integrated with the first processor 112 or with the second processor 116 to form a general processor. Although the first frame buffer 114, the second frame buffer 118, and the system memory 122 are shown in FIG. 1 as discrete memory devices, it is understood that a unified memory architecture that can accommodate all the processors may also be employed.’ of Cheng.)
It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi and Cheng before him before the effective filing date of the claimed invention, to modify Cox, Babu and Marchezi to incorporate either a processor integrated with a graphic processor or multiple processors that can use a multiple memory system of Cheng. Given the advantage of efficient use of a plurality and singular memory to reduce processing time and computational costs, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 6
Cox discloses wherein the graphics processor is further to form a high bandwidth memory (HBM) system employing the graphics processor coupled to one or more compute clusters further coupled with the memory through one or more compute elements (Cox, 0028; Render targets, such as frame buffers or texture maps may be stored across DRAMs 220, allowing partition units 215 to write portions of each render target in parallel to efficiently use the available bandwidth of parallel processing memory 204.), wherein the HBM system is facilitated through one or more HBM channels, wherein a scheduler to schedule tasks or threads relating to the graphics processor based on the HBM system. (Cox, 0021; Referring again to FIG. 1, in some embodiments, some or all of PPUs 202 in parallel processing subsystem 112 are graphics processors with rendering pipelines that can be configured to perform various tasks related to generating pixel data from graphics data supplied by CPU 102 and/or system memory 104 via memory bridge 105 and bus 113, interacting with local parallel processing memory 204 (which can be used as graphics memory including, e.g., a conventional frame buffer) to store and update pixel data, delivering pixel data to display device 110, and the like.)

Claim 7
Cox, Babu and Marchezi do not disclose expressly wherein the graphics processor is co-located with an application processor on a common semiconductor package.
Cheng discloses wherein the graphics processor is co-located with an application processor on a common semiconductor package. (Cheng, 0029; ‘In one example, the first processor 112 is an integrated graphic processor that drives only the first physical display 102 through the first display connector 126,….’ of Cheng.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu 

Claim 8
Cox discloses a method comprising: detecting one or more components on to which a neural network is deployed, wherein the one or more components comprise memory and a graphics processor coupled to the memory, (Cox, 0039, 0048, 0018, 0020; ‘In one embodiment, the parallel processing subsystem 112 incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry, and constitutes a graphics processing unit (GPU).’ [0018] ‘As shown, parallel processing subsystem 112 includes one or more parallel processing units (PPUs) 202, each of which is coupled to a local parallel processing (PP) memory 204. In general, a parallel processing subsystem includes a number U of PPUs, where U ≥ 1.’ [0020] ‘At step 406, digital camera 302 configures a processing unit based on configuration file 324. The processing unit could be, e.g., an ISP, CPU 304, and/or PPU 306. ….. At step 408, digital camera 302 renders trial images 330 by processing raw images 320 with the configured processing unit. At step 410, training engine 328 updates the weight values within machine learning engine 322 based on the difference between the trial images 330 generated at step 410 and target images 326.’ ‘Machine learning engine 322 includes a set of weight values and is configured to transform a given set of raw images 320 and corresponding pixel values into a set of parameter values comprising configuration file 324 using those weight values, in the fashion consistent with machine learning techniques. In one embodiment, machine learning engine 322 comprises an artificial neural network (ANN).’ [0039] EC: [0018] links a graphics processor to a parallel processing subsystem. [0020] links the parallel processing subsystem to a PPU. [0048] links a PPU to a learning engine. [0039] links a learning engine to a neural network.); and ….
forming a high bandwidth memory (HBM) system coupled to one or more compute clusters further coupled with the memory through one or more compute elements, wherein the HBM system is facilitated through one or more HBM channels. (Cox, 0028; Render targets, such as frame buffers or texture maps may be stored across DRAMs 220, allowing partition units 215 to write portions of each render target in parallel to efficiently use the available bandwidth of parallel processing memory 204. EC: High bandwidth (HBW) maps to parallel processing. The applicant's clusters map to the destinations of each of the parallel writes. An HBW channel maps to each specific unit of the parallel processing.)
Cox does not disclose expressly determining a first amount of the memory and a second amount of hardware of the graphics processor that is to be used.
Babu discloses determining a first amount of the memory and a second amount of hardware of the graphics processor that is to be used. (Babu, 0049; Instructions 612 may determine memory demand. For example, instructions 612 may determine, during boot time, how much memory will be used by processes, applications, and/or hardware (e.g., hard drive, CPU) that will be running or utilized at the beginning of runtime, or may receive data from an OS during runtime regarding how much memory is needed by processes/applications that are running, as discussed above with respect to FIG. 1.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox and Babu before him before the effective filing date of the claimed invention, to modify Cox to incorporate values of required memory and processing output of Babu. Given the advantage of allocating the proper resources to accomplish a task, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Cox and Babu do not disclose expressly for performing with precision neural network tasks of the neural network; and allocating based on the first amount and the second amount, a storage portion of the memory and a hardware portion of the hardware of the graphics processor to a machine learning training set of the neural network, wherein the storage and hardware portions are precise for implementation and processing of the machine learning training set.
Marchezi discloses for performing with precision neural network tasks of the neural network (Marchezi, 0044; Referring now to the flow diagram of FIG. 12, an example method 1200 of implementing a machine learning classifier begins at block 1202, with allocating in persistent memory, a training structure comprising an array of categories, a category data structure for each category in the array, and a global data structure. The method 1200 continues at block 1204 with reading the categories of the array from training data. At block 1206, for each category, the method includes reading training statements from the training data (block 1208), splitting each training statement Marchezi, 0034, 0046, 0042; ‘A number of categories can be read from a set of training data 113, and that number can be used to allocate a category array 118 in the persistent memory.’ And ‘Incrementing a category word counter can include searching for a corresponding word structure in the word search tree of the category (block 1314), and if a corresponding word structure is found, incrementing the category word counter within corresponding word structure, as shown at block 1316. However, as shown at block 1318, if a corresponding word structure is not found, incrementing the category word counter can include allocating a new word structure in persistent memory,…’ with ‘In some examples, implementing the operations of methods 1200, 1300, and 1500 can be achieved using an ASIC and/or other hardware components (not shown) alone or in combination with programming instructions executable by a processor 102.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu and Marchezi before him before the effective filing date of the claimed invention, 
Cox, Babu and Marchezi do not disclose expressly generating a single unified memory system having at least the storage portion of the memory and at least the hardware portion of the graphics processor.
Cheng discloses generating a single unified memory system having at least the storage portion of the memory and at least the hardware portion of the graphics processor. (Cheng, 0029; ‘In one example, the first processor 112 is an integrated graphic processor that drives only the first physical display 102 through the first display connector 126, and the second processor 116 is a discrete graphic processor that drives the second, third, and fourth physical displays 104-108 through the second, third, and fourth display connectors 128-132, respectively. ….. Although the first frame buffer 114, the second frame buffer 118, and the system memory 122 are shown in FIG. 1 as discrete memory devices, it is understood that a unified memory architecture that can accommodate all the processors may also be employed.’ of Cheng. EC: Here, Cheng's connected graphic processors can be seen as a unified memory architecture. Graphic processors and buffers are hardware.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi and Cheng before him before the effective filing date of the claimed invention, to modify Cox, Babu and Marchezi to incorporate either a processor integrated with a graphic processor or multiple processors that can use a multiple memory system of Cheng. Given 
Cox, Babu, Marchezi and Cheng do not disclose expressly introducing cache coherency within the single unified memory system the cache coherency to provide a page level coherency across multiple graphics processors comprising at least the graphics processor, wherein the page level coherency utilizes a page table to enable an ability to exchange ownership between the multiple graphics processors at one or more levels of the single unified memory system and allows the graphics processor to snoop the multiple graphics processors using a modified exclusive shared invalid (MESI) protocol for pages that the graphics processor does not own.
Trobough discloses introducing cache coherency within the single unified memory system the cache coherency to provide a page level coherency across multiple graphics processors comprising at least the graphics processor, wherein the page level coherency utilizes a page table to enable an ability to exchange ownership between the multiple graphics processors at one or more levels of the single unified memory system and allows the graphics processor to snoop the multiple graphics processors using a modified exclusive shared invalid (MESI) protocol for pages that the graphics processor does not own; (Trobough, 0047, 0085, 0010, fig 4; ‘The data cache is to store recently used/operated on elements, such as data operands, which are potentially held in memory coherency states. The D-TLB is to store recent virtual/linear to physical address translations. As a specific example, a processor may include a page table structure to break physical memory into a plurality of virtual pages.’ [0047] ‘In one 
Cox, Babu, Marchezi, Cheng and Trobough do not disclose expressly process the neural network to identify data dependencies, group dependent data of the neural network together corresponding to the data dependencies and allocating the dependent data in a same HBM channel of the one or more HBM channels.
Thibeault discloses process the neural network to identify data dependencies, group dependent data of the neural network together corresponding to the data dependencies and allocating the dependent data in a same HBM channel of the one or more HBM channels. (Thibeault, c2:13-37; In another embodiment disclosed herein, a neural network for reinforcement-learning and for action-selection comprises a plurality of channels, a population of input neurons in each of the channels, a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels, a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to output neurons in a channel that the reward neuron is part of, and a population of inhibition neurons in each of the channels, wherein each population of inhibition neurons receive an input from a population of output neurons in a same channel that the population of inhibition neurons is part of, and wherein a population of inhibition neurons in a channel has an output to output neurons in every other channel except the channel of which the inhibition neurons are part of, wherein if the environmental input to a population of reward neurons for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced, and wherein if the environmental input to a population of reward neurons for a channel is negative, the corresponding channel of a population of output neurons are punished and have their responses attenuated. EC: Data dependencies are merely mentioned in 0017, 0168 and 0186, with no formal definition of how ‘dependency’ is or is not achieved. A neural network is a classifier by design based on the distinct outputs. The examiner views these individual outputs as ‘independent’ from one another, while a single output and its corresponding input are dependent.)
It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi, Cheng, Trobough and 

Claim 9
Cox discloses comprising analyzing the one or more components to determine the storage portion and the hardware portion of the memory and the graphics processor, respectively, wherein the one or more components further include one or more of one or more compilers, one or more drivers, schedulers, compute clusters, compute elements, and caches. (Cox, 0017, 0018, 0025; FIG. 1 is a block diagram illustrating a computer system 100 configured to implement one or more aspects of the present invention. Computer system 100 includes a central processing unit (CPU) 102 and a system memory 104 that includes a device driver 103. [0017] In one embodiment, the parallel processing subsystem 112 incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry, and constitutes a graphics processing unit (GPU) …. In yet another embodiment, the parallel processing subsystem 112 may be integrated with one or more other system elements, such as the memory bridge 105, CPU 102, and I/O bridge 107 to form a system on chip (SoC). [0018] Each PPU 202 advantageously implements a highly parallel processing architecture. As shown in detail, PPU 202(0) 

Claim 11
Cox, Babu and Marchezi do not disclose expressly wherein single unified memory system further comprises caches coupled to the graphics processor and one or more other graphics processors to form a communication network for transmission of data between multiple graphics processors including the graphics processor and the one or more other graphics processors.
Cheng discloses wherein single unified memory system further comprises caches coupled to the graphics processor and one or more other graphics processors to form a communication network for transmission of data between multiple graphics processors including the graphics processor and the one or more other graphics processors. (Cheng, 0029; ‘In one example, the first processor 112 is an integrated graphic processor that drives only the first physical display 102 through the first display connector 126, and the second processor 116 is a discrete graphic processor that drives the second, third, and fourth physical displays 104-108 through the second, third, and fourth display connectors 128-132, respectively. The first display connector 126 may be internal to the system 100 and the first physical display 102 may form a part of system 100--e.g., a display forming part of a laptop computer or mobile device such as, for example, a mobile phone. Nevertheless, it is understood that the number of the physical displays that each processor drives may be varied, and that the type of graphic processor may be varied. The third processor 120 may be a host central processing unit (CPU) bi-directionally connected to the system memory 122 and bi-directionally connected to other components of the system 100 through the system bus 134 as known in the art, or any other suitable processor. It is understood that, the first, second, and third processors 112, 116, and 120 may be integrated as a general processor (e.g., APU, accelerated processing unit; GPGPU, general-purpose computing on GPU); or the third processor (e.g., CPU) 120 may be integrated with the first processor 112 or with the second processor 116 to form a general processor. Although the first frame buffer 114, the second frame buffer 118, and the system memory 122 are shown in FIG. 1 as discrete memory devices, it is understood that a unified memory architecture that can accommodate all the processors may also be employed.’ of Cheng.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi and Cheng before him before the effective filing date of the claimed invention, to modify Cox, Babu and Marchezi to incorporate either a processor integrated with a graphic processor or multiple processors that can use a multiple memory system of Cheng. Given the advantage of the efficient use of either multiple memories or a single memory to reduce processing time and computational costs, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 13
Cox discloses wherein a scheduler is to schedule tasks or threads relating to the graphics processor based on the HBM system. (Cox, 0021; Referring again to FIG. 1, in some embodiments, some or all of PPUs 202 in parallel processing subsystem 112 are graphics processors with rendering pipelines that can be configured to perform various tasks related to generating pixel data from graphics data supplied by CPU 102 and/or system memory 104 via memory bridge 105 and bus 113, interacting with local parallel processing memory 204 (which can be used as graphics memory including, e.g., a conventional frame buffer) to store and update pixel data, delivering pixel data to display device 110, and the like.)

Claim 14
Cox, Babu and Marchezi do not disclose expressly wherein the graphics processor is co-located with an application processor on a common semiconductor package.
Cheng discloses wherein the graphics processor is co-located with an application processor on a common semiconductor package. (Cheng, 0029; ‘In one example, the first processor 112 is an integrated graphic processor that drives only the first physical display 102 through the first display connector 126,….’ of Cheng.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi and Cheng before him before the effective filing date of the claimed invention, to modify Cox, Babu and Marchezi to incorporate either a processor integrated with a graphic processor or multiple processors that can use a multiple memory system of Cheng. Given the advantage of the efficient use of either multiple memories or a single memory to reduce processing time and computational costs, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 15
(Cox, 0039, 0048, 0018, 0020; In one embodiment, the parallel processing subsystem 112 incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry, and constitutes a graphics processing unit (GPU). [0018] As shown, parallel processing subsystem 112 includes one or more parallel processing units (PPUs) 202, each of which is coupled to a local parallel processing (PP) memory 204. In general, a parallel processing subsystem includes a number U of PPUs, where U ≥ 1. [0020] ‘At step 406, digital camera 302 configures a processing unit based on configuration file 324. The processing unit could be, e.g., an ISP, CPU 304, and/or PPU 306. ….. At step 408, digital camera 302 renders trial images 330 by processing raw images 320 with the configured processing unit. At step 410, training engine 328 updates the weight values within machine learning engine 322 based on the difference between the trial images 330 generated at step 410 and target images 326. Target images 326 represent "ideal" images that would, ideally be produced by digital camera 302 by processing raw images 320.’ [0048] Machine learning engine 322 includes a set of weight values and is configured to transform a given set of raw images 320 and corresponding pixel values into a set of parameter values comprising configuration file 324 using those weight values, in the fashion consistent with machine learning techniques. In one embodiment, machine learning engine 322 comprises an artificial neural network (ANN). [0039] EC: Cox, 0028; Render targets, such as frame buffers or texture maps may be stored across DRAMs 220, allowing partition units 215 to write portions of each render target in parallel to efficiently use the available bandwidth of parallel processing memory 204. EC: HBW maps to parallel processing. The applicant’s clusters map to the destinations of each of the parallel bandwidth. An HBW channel maps to each specific unit of the parallel processing.)
Cox does not disclose expressly determining a first amount of the memory and a second amount of hardware of the graphics processor that is to be used.
Babu discloses determining a first amount of the memory and a second amount of hardware of the graphics processor that is to be used. (Babu, 0049; Instructions 612 may determine memory demand. For example, instructions 612 may determine, during boot time, how much memory will be used by processes, applications, and/or hardware (e.g., hard drive, CPU) that will be running or utilized at the beginning of runtime, or may receive data from an OS during runtime regarding how much memory is needed by processes/applications that are running, as discussed above with respect to FIG. 1.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox and Babu before him before the effective filing date of the claimed invention, to modify Cox to incorporate values of required memory and processing output of Babu. 
Cox and Babu do not disclose expressly for performing, with precision, neural network tasks of the neural network; allocating, based on the first amount and the second amount, a storage portion of the memory and a hardware portion of the hardware of the graphics processor to a machine learning training set of the neural network, wherein the storage and hardware portions are precise for implementation and processing of the machine learning training set.
Marchezi discloses for performing, with precision, neural network tasks of the neural network (Marchezi, 0044; Referring now to the flow diagram of FIG. 12, an example method 1200 of implementing a machine learning classifier begins at block 1202, with allocating in persistent memory, a training structure comprising an array of categories, a category data structure for each category in the array, and a global data structure. The method 1200 continues at block 1204 with reading the categories of the array from training data. At block 1206, for each category, the method includes reading training statements from the training data (block 1208), splitting each training statement into an array of words (block 1210), incrementing a category word counter for each word (block 1212), calculating a category statement probability and storing it in the category data structure (block 1214), and calculating a category word probability for each word and storing it in the category data structure (block 1216). Then at block 1218, the method includes calculating a global word probability for each word and storing it in the global data structure. EC: Precision is viewed as a trained neural network.) (Marchezi, 0034, 0046, 0042; ‘A number of categories can be read from a set of training data 113, and that number can be used to allocate a category array 118 in the persistent memory.’ and ‘Incrementing a category word counter can include searching for a corresponding word structure in the word search tree of the category (block 1314), and if a corresponding word structure is found, incrementing the category word counter within corresponding word structure, as shown at block 1316. However, as shown at block 1318, if a corresponding word structure is not found, incrementing the category word counter can include allocating a new word structure in persistent memory,…’ with ‘In some examples, implementing the operations of methods 1200, 1300, and 1500 can be achieved using an ASIC and/or other hardware components (not shown) alone or in combination with programming instructions executable by a processor 102.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu and Marchezi before him before the effective filing date of the claimed invention, to modify Cox and Babu to incorporate allocating memory for segmented training data of Marchezi. Given the advantage of not wasting power on additional memory that is not being used, one having ordinary skill in the art would have been motivated to make this obvious modification.
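EC: For illustration only, the training flow quoted from Marchezi above (read categories, split each training statement into words, increment a per-category word counter, then compute category and global probabilities) can be sketched as follows. This is a minimal sketch and not Marchezi's implementation: the function name `train`, the dictionary layout, and the use of simple relative frequencies for each probability are assumptions made here, and ordinary in-memory dictionaries stand in for the persistent-memory structures.

```python
from collections import Counter


def train(training_data):
    """Hypothetical sketch of the quoted training flow.

    training_data maps a category name to its list of training statements.
    Returns (categories, global_word_probability).
    """
    total_statements = sum(len(stmts) for stmts in training_data.values())
    global_counts = Counter()
    categories = {}

    for name, statements in training_data.items():
        # Per-category word counter (assumed stand-in for the word structures).
        word_counts = Counter()
        for statement in statements:
            # Split each training statement into an array of words.
            for word in statement.lower().split():
                word_counts[word] += 1

        total_words = sum(word_counts.values())
        categories[name] = {
            # Assumed definition: share of all statements in this category.
            "statement_probability": (
                len(statements) / total_statements if total_statements else 0.0
            ),
            # Assumed definition: relative frequency of a word in the category.
            "word_probability": {
                w: c / total_words for w, c in word_counts.items()
            },
        }
        global_counts.update(word_counts)

    total = sum(global_counts.values())
    # Assumed definition: relative frequency of a word over all categories.
    global_word_probability = {w: c / total for w, c in global_counts.items()}
    return categories, global_word_probability


if __name__ == "__main__":
    cats, glob = train({"greeting": ["hello there", "hello"], "farewell": ["bye"]})
    print(cats["greeting"]["statement_probability"], glob["bye"])
```

In this sketch memory grows only as new words are encountered, which loosely mirrors the quoted step of allocating a new word structure only when a corresponding one is not found.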

Cheng discloses generating a single unified memory system having at least the storage portion of the memory and at least the hardware portion of the graphics processor. (Cheng, 0029; ‘In one example, the first processor 112 is an integrated graphic processor that drives only the first physical display 102 through the first display connector 126, and the second processor 116 is a discrete graphic processor that drives the second, third, and fourth physical displays 104-108 through the second, third, and fourth display connectors 128-132, respectively. ….. Although the first frame buffer 114, the second frame buffer 118, and the system memory 122 are shown in FIG. 1 as discrete memory devices, it is understood that a unified memory architecture that can accommodate all the processors may also be employed.’ of Cheng. EC: Here, the connected graphic processors of Cheng can be seen as a unified memory architecture. Graphic processors and buffers are hardware.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi and Cheng before him before the effective filing date of the claimed invention, to modify Cox, Babu and Marchezi to incorporate either a processor integrated with a graphic processor or multiple processors that can use a multiple memory system of Cheng. Given the advantage of the efficient use of either multiple memories or a single memory to reduce processing time and computational costs, one having ordinary skill in the art would have been motivated to make this obvious modification.

Trobough discloses introducing cache coherency within the single unified memory system  the cache coherency to provide a page level coherency across multiple graphics processors comprising at least the graphics processor, wherein the page level coherency utilizes a page table to enable an ability to exchange ownership between the multiple graphics processors at one or more levels of the single unified memory system and allows the graphics processor to snoop the multiple graphics processors using a modified exclusive shared invalid (MESI) protocol for pages that the graphics processor does not own (Trobough, 0047, 0085, 0010, fig 4; ‘The data cache is to store recently used/operated on elements, such as data operands, which are potentially held in memory coherency states. The D-TLB is to store recent virtual/linear to physical address translations. As a specific example, a processor may include a page table structure to break physical memory into a plurality of virtual pages.’ [0047] ‘In one embodiment, a QPI based interconnect includes a Modified Exclusive Shared Invalid Forward (MESIF) protocol, which provides a protocol similar to a snoop protocol without the potential limitations of a single, serializing bus. Like a snooping cache protocol, 
Cox, Babu, Marchezi, Cheng and Trobough do not disclose expressly processing the neural network to identify data dependencies and to group dependent data of the neural network together corresponding to the data dependencies and allocating the dependent data in a same HBM channel of the one or more HBM channels.
Thibeault discloses processing the neural network to identify data dependencies and to group dependent data of the neural network together corresponding to the data dependencies and allocating the dependent data in a same HBM channel of the one or more HBM channels. (Thibeault, c2:13-37; In another embodiment disclosed herein, a neural network for reinforcement-learning and for action-selection comprises a plurality of channels, a population of input neurons in each of the channels, a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels, a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to output neurons in a channel that the reward neuron is part of, and a population of inhibition neurons in each of the channels, wherein each population of inhibition neurons receive an input from a population of output neurons in a same channel that the population of inhibition neurons is part of, and wherein a population of inhibition neurons in a channel has an output to output neurons in every other channel except the channel of which the inhibition neurons are part of, wherein if the environmental input to a population of reward neurons for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced, and wherein if the environmental input to a population of reward neurons for a channel is negative, the corresponding channel of a population of output neurons are punished and have their responses attenuated. EC: Data dependencies are merely mentioned in 0017, 0168 and 0186 and have no formal definition of how ‘dependency’ is or is not achieved. A neural network is a classifier by design based on the distinct outputs. The examiner views these individual outputs as ‘independent’ from one another and views a single output and its corresponding inputs as dependent.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi, Cheng, Trobough and Thibeault before him before the effective filing date of the claimed invention, to modify Cox, Babu, Marchezi, Cheng and Trobough to incorporate high bandwidth memory of Thibeault. Given the advantage of forwarding information of the same label using high 

Claim 16
Cox discloses wherein the operations further comprise analyzing the one or more components to determine the storage portion and the hardware portion of the memory and the graphics processor, respectively, wherein the one or more components further include one or more of one or more compilers, one or more drivers, schedulers, compute clusters, compute elements, and caches. (Cox, 0017, 0018, 0025; FIG. 1 is a block diagram illustrating a computer system 100 configured to implement one or more aspects of the present invention. Computer system 100 includes a central processing unit (CPU) 102 and a system memory 104 that includes a device driver 103. [0017] In one embodiment, the parallel processing subsystem 112 incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry, and constitutes a graphics processing unit (GPU)…. In yet another embodiment, the parallel processing subsystem 112 may be integrated with one or more other system elements, such as the memory bridge 105, CPU 102, and I/O bridge 107 to form a system on chip (SoC). [0018] Each PPU 202 advantageously implements a highly parallel processing architecture. As shown in detail, PPU 202(0) includes a processing cluster array 230 that includes a number C of general processing clusters (GPCs) 208, where C ≥ 1. [0025])

Claim 18

Cheng discloses wherein single unified memory system comprises caches coupled to the graphics processor and one or more other graphics processors to form a communication network for transmission of data between multiple graphics processors including the graphics processor and the one or more other graphics processors. (Cheng, 0029; ‘In one example, the first processor 112 is an integrated graphic processor that drives only the first physical display 102 through the first display connector 126, and the second processor 116 is a discrete graphic processor that drives the second, third, and fourth physical displays 104-108 through the second, third, and fourth display connectors 128-132, respectively. The first display connector 126 may be internal to the system 100 and the first physical display 102 may form a part of system 100--e.g., a display forming part of a laptop computer or mobile device such as, for example, a mobile phone. Nevertheless, it is understood that the number of the physical displays that each processor drives may be varied, and that the type of graphic processor may be varied. The third processor 120 may be a host central processing unit (CPU) bi-directionally connected to the system memory 122 and bi-directionally connected to other components of the system 100 through the system bus 134 as known in the art, or any other suitable processor. It is understood that, the first, second, and third processors 112, 116, and 120 may be integrated as a general processor (e.g., APU, accelerated processing unit; GPGPU, general-purpose computing on GPU); or the third processor (e.g., CPU) 120 may be integrated with the first processor 112 or with the second processor 116 to form a general processor. Although the first frame buffer 114, the second frame buffer 118, and the system memory 122 are shown in FIG. 1 as discrete memory devices, it is understood that a unified memory architecture that can accommodate all the processors may also be employed.’ of Cheng.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi and Cheng before him before the effective filing date of the claimed invention, to modify Cox, Babu and Marchezi to incorporate either a processor integrated with a graphic processor or multiple processors that can use a multiple memory system of Cheng. Given the advantage of the efficient use of either multiple memories or a single memory to reduce processing time and computational costs, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 20
Cox discloses wherein a scheduler is to schedule tasks or threads relating to the graphics processor based on the HBM system, wherein the graphics processor is co-located with an application processor on a common semiconductor package. (Cox, 0021; Referring again to FIG. 1, in some embodiments, some or all of PPUs 202 in parallel processing subsystem 112 are graphics processors with rendering pipelines that can be configured to perform various tasks related to generating pixel data from graphics data supplied by CPU 102 and/or system memory 104 via memory bridge 105 and bus 113, interacting with local parallel processing memory 204 (which can be used 

Claims 3, 10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Cox, Babu, Marchezi, Cheng, Trobough and Thibeault as applied to claims 1-2, 4, 6-9, 11, 13-16, 18 and 20 above, and further in view of Fairweather. (U. S. Patent Publication 20070112714, referred to as Fairweather)

Claim 3
Cox, Babu, Marchezi, Cheng, Trobough and Thibeault do not disclose expressly wherein a compiler of one or more compilers to detect the storage and hardware portions, and wherein a driver of the one or more drivers to configure the storage and hardware portions to allow for precision in the implementation and processing of the training set.
Fairweather discloses wherein a compiler of one or more compilers to detect the storage and hardware portions, and wherein a driver of the one or more drivers to configure the storage and hardware portions to allow for precision in the implementation and processing of the training set. (Fairweather, 0359, 1541, 1453; Components maps to ‘In most modern computer environments, such as programming languages, and applications, the programming language compiler itself performs the job of defining data structures and the types and the fields that make them up. That type information is compile-time determined. This approach has the advantage of allowing the compiler itself to detect many common programmer errors in accessing compound data 

Claim 10
Cox, Babu, Marchezi, Cheng, Trobough and Thibeault do not disclose expressly wherein a compiler of one or more compilers to detect the storage and hardware portions, and wherein a driver of the one or more drivers to configure the storage and hardware portions to allow for precision in the implementation and processing of the training set.
Fairweather discloses wherein a compiler of one or more compilers to detect the storage and hardware portions, and wherein a driver of the one or more drivers to configure the storage and hardware portions to allow for precision in the implementation and processing of the training set. (Fairweather, 0359, 1541, 1453; Components maps to ‘In most modern computer environments, such as programming languages, and applications, the programming language compiler itself performs the job of defining data structures and the types and the fields that make them up. That type information is compile-time determined. This approach has the advantage of allowing the compiler itself to detect many common programmer errors in accessing compound data structures rather than allowing such errors to occur at run-time where they are much harder to find.’ (compiler), by the example of ‘In the preferred embodiment, the function CL_PurgeCache( ) is called regularly by the environment. If it discovers that the disk(s) containing the cache folder is becoming full, it purges old files from the cache until disk utilization falls below acceptable limits.’ (cache) and ‘In the preferred embodiment, in order to logically refer to the various possible locations that media may be moved to/from, the logical MSS layer assigns the following numeric values to various locations which must be translated by the driver into the corresponding locations in the physical robot:’ (driver) of Fairweather.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi, Cheng, Trobough, Thibeault and Fairweather before him before the effective filing date of the claimed invention, to modify Cox, Babu, Marchezi, Cheng, Trobough and Thibeault to incorporate the function of drivers and compilers of Fairweather. Given the advantage of disclosing 

Claim 17
Cox, Babu, Marchezi, Cheng, Trobough and Thibeault do not disclose expressly wherein a compiler of one or more compilers to detect the storage and hardware portions, and wherein a driver of one or more drivers to configure the storage and hardware portions to allow for precision in the implementation and processing of the training set
Fairweather discloses wherein a compiler of one or more compilers to detect the storage and hardware portions, and wherein a driver of one or more drivers to configure the storage and hardware portions to allow for precision in the implementation and processing of the training set. (Fairweather, 0359, 1541, 1453; Components maps to ‘In most modern computer environments, such as programming languages, and applications, the programming language compiler itself performs the job of defining data structures and the types and the fields that make them up. That type information is compile-time determined. This approach has the advantage of allowing the compiler itself to detect many common programmer errors in accessing compound data structures rather than allowing such errors to occur at run-time where they are much harder to find.’ (compiler), by the example of ‘ In the preferred embodiment, the function CL_PurgeCache( ) is called regularly by the environment. If it discovers that the disk(s) containing the cache folder is becoming full, it purges old files from the cache until disk utilization falls below acceptable limits.’ (cache) and ‘In the preferred embodiment, in .

Claims 5, 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Cox, Babu, Marchezi, Cheng, Trobough and Thibeault as applied to claims 1-2, 4, 6-9, 11, 13-16, 18 and 20 above, and further in view of Kissell. (U. S. Patent 7017025, referred to as Kissell)

Claim 5
Cox, Babu, Marchezi, Cheng, Trobough and Thibeault do not disclose expressly wherein each of the one or more levels includes at least two graphics processors and at least two caches associated with the two graphics processors.
Kissell discloses wherein each of the one or more levels includes at least two graphics processors and at least two caches associated with the two graphics processors. (Kissell, c4:25-63; ‘In multi-chip, multiprocessing environments, sophisticated cache coherency mechanisms have been designed to insure data coherency across multiple caches.’, ‘The cache 305 is coupled to a unified memory 310. Additional processing elements (PE) 306 are also provided, which are coupled to one or more associated proxy caches 308, which in turn are coupled to both the proxy processor 304 via command lines 311, and the unified memory 310.’ and ‘More complex systems often contain multiple microprocessors (e.g., digital signal processors, graphics processors, etc.), each with their own memory systems, and often of different architecture (i.e., executing different instruction sets). Referring to FIG. 2 a block diagram 200 is shown having N number of processing elements 202, 204 and 206, each with their own memory system 208, 210, and 212, respectively.’ of Kissell. EC: Kissell discloses cache coherency being associated with unified memory. Multiple caches imply multiple processors.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi, Cheng, Trobough, Thibeault and Kissell before him before the effective filing date of the claimed invention, to modify Cox, Babu, Marchezi, Cheng, Trobough and Thibeault to incorporate cache coherency of Kissell. Given the advantage of overcoming the traditional design of a separate CPU cache and GPU cache by combining these caches into one for faster processing and lower computational costs, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 12

Kissell discloses wherein each of the one or more levels includes at least two graphics processors and at least two caches associated with the two graphics processors. (Kissell, c4:25-63; ‘In multi-chip, multiprocessing environments, sophisticated cache coherency mechanisms have been designed to insure data coherency across multiple caches.’, ‘The cache 305 is coupled to a unified memory 310. Additional processing elements (PE) 306 are also provided, which are coupled to one or more associated proxy caches 308, which in turn are coupled to both the proxy processor 304 via command lines 311, and the unified memory 310.’ and ‘More complex systems often contain multiple microprocessors (e.g., digital signal processors, graphics processors, etc.), each with their own memory systems, and often of different architecture (i.e., executing different instruction sets). Referring to FIG. 2 a block diagram 200 is shown having N number of processing elements 202, 204 and 206, each with their own memory system 208, 210, and 212, respectively.’ of Kissell.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi, Cheng, Trobough, Thibeault and Kissell before him before the effective filing date of the claimed invention, to modify Cox, Babu, Marchezi, Cheng, Trobough and Thibeault to incorporate cache coherency of Kissell. Given the advantage of overcoming a traditional design of a CPU cache and a GPU cache and in a concept of combining these caches into one for faster processing and lower computational costs, 

Claim 19
Cox, Babu, Marchezi, Cheng, Trobough and Thibeault do not disclose expressly wherein each of the one or more levels includes at least two graphics processors and at least two caches associated with the two graphics processors.
Kissell discloses wherein each of the one or more levels includes at least two graphics processors and at least two caches associated with the two graphics processors. (Kissell, c4:25-63; ‘In multi-chip, multiprocessing environments, sophisticated cache coherency mechanisms have been designed to insure data coherency across multiple caches.’, ‘The cache 305 is coupled to a unified memory 310. Additional processing elements (PE) 306 are also provided, which are coupled to one or more associated proxy caches 308, which in turn are coupled to both the proxy processor 304 via command lines 311, and the unified memory 310.’ and ‘More complex systems often contain multiple microprocessors (e.g., digital signal processors, graphics processors, etc.), each with their own memory systems, and often of different architecture (i.e., executing different instruction sets). Referring to FIG. 2 a block diagram 200 is shown having N number of processing elements 202, 204 and 206, each with their own memory system 208, 210, and 212, respectively.’ of Kissell.) It would have been obvious to one having ordinary skill in the art, having the teachings of Cox, Babu, Marchezi, Cheng, Trobough, Thibeault and Kissell before him before the effective filing date of the claimed invention, to modify Cox, Babu, Marchezi, Cheng, Trobough .

Response to Arguments
3.	Applicant’s arguments filed on 9/7/2021 for claims 1-20 have been fully considered but are not persuasive.
Applicant’s argument:

Claim Rejections - 35 U.S.C. § 103

Claims 1-20 are rejected under 35 U.S.C. § 103 as being unpatentable over Cox, U.S. Publication No. 2014/0152848 (“Cox”) in view of Babu, et al., U.S. Publication No. 2016/0092115 (“Babu”) in view of Marchezi, et al., U.S. Publication No. 2018/0276507 (“Marchezi”) in view of Cheng, et al., U.S. Publication No. 2012/0050260 (“Cheng”) in view of Kissell, U.S. Patent No. 7,017,025 (“Kissell”) in view of Fairweather, U.S. Publication No. 2007/0112714 (“Fairweather”) in view of Thibeault, et al., U.S. Patent No. 9,349,092 (“Thibeault”).

With respect to the §103 rejections above, the following remarks are provided. Independent claim 1 has been amended herein. Without limiting the scope, only in an effort to impart precision to the claims (e.g., by more particularly pointing out features, rather than to avoid references), and merely to expedite the prosecution of the present application, independent claim 1 has been amended to in part clarify characteristics of the subject matter. As amended, independent claim 1 is directed to:


….introduce cache coherency within the single unified memory system, the cache coherency to provide a page level coherency across multiple graphics processors comprising at least the graphics processor, wherein the page level coherency utilizes a page table to enable an ability to exchange ownership between the multiple graphics processors at one or more levels of the single unified memory system and allows the graphics processor to snoop the multiple graphics processors using a modified exclusive shared invalid (MESI) protocol for pages that the graphics processor does not own;….

The cited references, alone or in combination, neither disclose (nor even suggest) an arrangement in which a graphics processor is to: introduce cache coherency within the single unified memory system, the cache coherency to provide a page level coherency across multiple graphics processors comprising at least the graphics processor, wherein the page level coherency utilizes a page table to enable an ability to exchange ownership between the multiple graphics processors at one or more levels of the single unified memory system and allows the graphics processor to snoop the multiple graphics processors using a modified exclusive shared invalid (MESI) protocol for pages that the graphics processor does not own, as recited in claim 1 as amended herein. Therefore, the cited references, alone or in combination, cannot render obvious claim 1.

Independent claims 8 and 15 have been amended to recite elements generally similar to those recited in independent claim 1. Accordingly, independent claims 8 and 15 are allowable for at least similar arguments applied to independent claim 1. The remaining dependent claims depend ultimately from one of claims 1, 8, or 15 and are allowable at least by virtue of the dependency on claims 1, 8, or 15 for the claim elements recited separately therein.

Examiner’s answer:
This portion of the amended claim is addressed by the newly cited reference, Trobough.

4.	Claims 1-20 are rejected.

Conclusion – Final
5.	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the 


Correspondence Information
6.	Any inquiry concerning this information or related to the subject disclosure should be directed to the Examiner Mr. Peter Coughlan, whose telephone number is (571) 272-5990 (Fax 571-273-5990).  The Examiner can be reached on Monday through Friday from 7:15 a.m. to 3:45 p.m.
	If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor Mr. Li Zhen can be reached at (571) 272-3768.  Any response to this office action should be mailed to:
	Commissioner of Patents and Trademarks, 
	Washington, D. C. 20231;
Hand delivered to:
	Receptionist, 
	Customer Service Window, 
	Randolph Building, 
	401 Dulany Street,
	Alexandria, Virginia 22313,
	(located on the first floor of the south side of the Randolph Building);

	(571) 272-3150 (for formal communications intended for entry.)
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121