Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Objections

Claim 8 is objected to because of the following informality: the expression “wherein the array of multiprocessors are to connect to the system memory device via the IO interface” should read “wherein the array of multiprocessors is to connect to the system memory device via the IO interface.” Appropriate correction is required.


Claim Rejections - 35 USC § 112

1.	The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention. 


2.	Claims 5-8, 19, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

3.	Claim 5 (and similarly claim 19) recites the limitation: “memory management circuitry to allocate physical memory of the 3D memory stack to as system memory.” The limitation is unclear. While this is likely a drafting error, it renders the scope of the claim indefinite, because one of ordinary skill in the art would not be apprised of what “3D memory stack to as system memory” refers to.

4.	Claims 6-8 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being dependent on a rejected claim.

Claim Rejections - 35 USC § 103

5.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

6.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


7.	Claims 1, 9-11, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Falcon et al., US 2016/0026912 A1, in view of Hansen et al., US 2004/0015533 A1, and further in view of Siu et al., US 2006/0101244 A1.


8. 	As per claim 1, Falcon discloses:  An apparatus comprising: 
an interconnect fabric comprising one or more fabric switches; (Falcon, ¶84, “Processing device 1000 may be implemented in part by, for example, the elements illustrated in FIGS. 1-8. In the example of FIG. 10, processing device 1000 may include a processor block 1002, a calculation accelerator 1004, and a bus/fabric/interconnect system 1006.” Note: the interconnect fabric comprises at least one switch.)
a memory controller coupled to the interconnect fabric; (Falcon, Figure 7, blocks 306 and 702)
an input/output (IO) interface coupled to the interconnect fabric; (Falcon, ¶42, “System 100 may use a proprietary hub interface bus 122 to couple MCH 116 to I/O controller hub (ICH) 130. In one embodiment, ICH 130 may provide direct connections to some I/O devices via a local I/O bus. The local I/O bus may include a high-speed I/O bus for connecting peripherals to memory 120, chipset, and processor 102.”)
an array of multiprocessors coupled to the interconnect fabric to process mixed-precision instructions, (Falcon, ¶66, “FIG. 5 illustrates a block diagram of a second system 500, in accordance with embodiments of the present disclosure. As shown in FIG. 5, multiprocessor system 500 may include a point-to-point interconnect system, and may include a first processor 570 and a second processor 580 coupled via a point-to-point interconnect 550. Each of processors 570 and 580 may be some version of processor 300 as one or more of processors 410, 615.”, and ¶53, “FIG. 2 is a block diagram of the micro-architecture for a processor 200 that may include logic circuits to perform instructions, in accordance with embodiments of the present disclosure. In some embodiments, an instruction in accordance with one embodiment may be implemented to operate on data elements having sizes of byte, word, doubleword, quadword, etc., as well as datatypes, such as single and double precision integer and floating point datatypes.”) at least one multiprocessor comprising:
 
9.	Falcon does not expressly disclose:
a plurality of packed data registers to store packed floating-point source values at a first precision using a first number of bits, and to store at least one packed floating-point value at a second precision using a second number of bits equal to at least twice the first number of bits; and
mixed-precision execution circuitry to execute one or more of the mixed-precision instructions to perform a mixed-precision dot-product operation comprising to perform a set of multiply and accumulate operations including an operation D = A * B + C, wherein A, B, C, and D are matrix elements, A and B are floating-point values at the first precision, and C is a floating-point value at the second precision.  
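
For illustration only, the recited operation D = A * B + C can be sketched in software. The sketch below assumes a first precision of 16 bits (IEEE 754 binary16) and a second precision of 32 bits (binary32); the claim requires only that the second number of bits be at least twice the first, so these widths are an assumption, as are the helper names `to_half`, `to_single`, `mixed_precision_mac`, and `mixed_precision_dot`. It is not drawn from the application or from any cited reference.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (assumed first precision)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value (assumed second precision)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_precision_mac(a: float, b: float, c: float) -> float:
    """Compute D = A * B + C with A and B at 16 bits and C and D at 32 bits."""
    a16, b16 = to_half(a), to_half(b)  # sources at the first precision
    c32 = to_single(c)                 # addend at the second precision
    # The product of two binary16 values is exact in a Python float (binary64),
    # so the only rounding here is the final conversion of D to binary32.
    return to_single(a16 * b16 + c32)


def mixed_precision_dot(row, col, c=0.0):
    """Dot product of row and col as a chain of multiply-accumulate steps."""
    acc = to_single(c)
    for a, b in zip(row, col):
        acc = mixed_precision_mac(a, b, acc)
    return acc


if __name__ == "__main__":
    # 1*3 + 2*4 + 0.5, accumulated at the wider precision
    print(mixed_precision_dot([1.0, 2.0], [3.0, 4.0], 0.5))  # prints 11.5
```

Accumulating at the wider precision keeps the rounding error of the narrow sources from compounding across the sum, which is the usual reason for pairing narrow multiplicands with a wider accumulator.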

10.	Hansen discloses: 
a plurality of packed data registers to store packed floating-point source values at a first precision using a first number of bits (Hansen, ¶66, “Two values are taken from the contents of registers or register pairs specified by ra and rb.”, ¶41, “In fact, for all sizes of symbols from 1-16 bits, the result is no larger than 64-bits, which in some architecture designs is the width of a single register. For symbols of 32 bits, the 4 products are 64 bits each, so a 128-bit result is used, which cannot overflow on the sum operation.”), and to store at least one packed floating-point value at a second precision using a second number of bits equal to at least twice the first number of bits (Hansen, ¶41, “For symbols of 64 bits, the 2 products are 128 bits each and nearly all values can be added without overflow. The fact that this instruction takes 128-bit groups rather than 64-bit group means that twice as many multiplies are performed by this instruction, as compared to the instructions illustrated in FIGS. 1 and 2.”);

11.	Falcon is analogous art with respect to Hansen because they are from the same field of endeavor, namely image processing. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to incorporate a plurality of packed data registers to store packed floating-point source values at a first precision using a first number of bits, and to store at least one packed floating-point value at a second precision using a second number of bits equal to at least twice the first number of bits, as taught by Hansen, into the teaching of Falcon. The motivation for doing so would have been to optimize both system performance and overall power efficiency. Therefore, it would have been obvious to combine Falcon with Hansen.

12.	Falcon in view of Hansen doesn’t expressly disclose: mixed-precision execution circuitry to execute one or more of the mixed-precision instructions to perform a mixed-precision dot-product operation comprising to perform a set of multiply and accumulate operations including an operation D = A * B + C, wherein A, B, C, and D are matrix elements, A and B are floating-point values at the first precision, and C is a floating-point value at the second precision.  

13.	Siu discloses: mixed-precision execution circuitry to execute one or more of the mixed-precision instructions to perform a mixed-precision dot-product operation comprising to perform a set of multiply and accumulate operations including an operation D = A * B + C, (Siu, [0047], “In one embodiment, MMAD unit 220 implements a multiply-add (MAD) pipeline for computing A*B+C for integer or floating-point operands, and various circuits within this pipeline are leveraged to perform numerous other integer and floating-point operations.”) wherein A, B, C, and D are matrix elements, A and B are floating-point values at the first precision, and C is a floating-point value at the second precision. (Siu, Abstract, [0156], and [0158])

14.	Siu is analogous art with respect to Falcon in view of Hansen because they are from the same field of endeavor, namely image processing. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to incorporate mixed-precision execution circuitry to execute one or more of the mixed-precision instructions to perform a mixed-precision dot-product operation comprising to perform a set of multiply and accumulate operations including an operation D = A * B + C, wherein A, B, C, and D are matrix elements, A and B are floating-point values at the first precision, and C is a floating-point value at the second precision, as taught by Siu, into the teaching of Falcon in view of Hansen. The motivation for doing so would have been to provide functional units that require reduced chip area and that can be used more efficiently. Therefore, it would have been obvious to combine Siu with Falcon in view of Hansen.

15.	As per claim 9, Falcon in view of Hansen, and in view of Siu discloses: The apparatus of claim 1 wherein the mixed-precision instructions are primitives of a machine learning framework. (Falcon, [0023], “The following description describes weight-shifting mechanism for reconfigurable processing units within or in association with a processor, virtual processor, package, computer system, or other processing apparatus. In one embodiment, such a weight-shifting mechanism may be used in convolution neural networks (CNN). In another embodiment, such CNNs may include low-precision CNNs.”, and [0093], “Weights 1204 may be calculated during, for example, a learning process of the functions for the CNN.” Note: the CNN is based on machine learning.)

16. 	As per claim 10, Falcon in view of Hansen, and in view of Siu discloses: The apparatus of claim 9 wherein the matrix elements are elements of matrices associated with a convolutional layer. (Falcon, [0023], “The following description describes weight-shifting mechanism for reconfigurable processing units within or in association with a processor, virtual processor, package, computer system, or other processing apparatus. In one embodiment, such a weight-shifting mechanism may be used in convolution neural networks (CNN). In another embodiment, such CNNs may include low-precision CNNs.”)

17. 	As per claim 11, Falcon in view of Hansen, and in view of Siu discloses: The apparatus of claim 10, wherein the matrices associated with the convolutional layer comprise a first matrix and a second matrix and the set of multiply and accumulate operations include a multiplication of a packed data element from the first matrix and a packed data element from the second matrix. (Hansen, Figure 5A, and ¶42, “More specifically, referring to FIG. 5A, this instruction takes two 128-bit operands specified by ra and rb and multiplies the corresponding groups of the specified size, producing a series of results of twice the specified size.”)
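
The packed multiplication described in the quoted Hansen passage can be pictured with the short sketch below. It is an illustration under stated assumptions, not Hansen's implementation: the function name `group_multiply` is hypothetical, operands are modeled as Python integers, and the groups are treated as unsigned.

```python
def group_multiply(ra: int, rb: int, size: int, width: int = 128) -> int:
    """Multiply corresponding `size`-bit groups of two `width`-bit operands.

    Each product occupies 2 * size bits, so the packed result is
    2 * width bits wide. Groups are treated as unsigned (an assumption).
    """
    if width % size != 0:
        raise ValueError("width must be a multiple of size")
    mask = (1 << size) - 1
    result = 0
    for i in range(width // size):
        a = (ra >> (i * size)) & mask       # i-th group of the first operand
        b = (rb >> (i * size)) & mask       # i-th group of the second operand
        result |= (a * b) << (i * 2 * size)  # product stored at double width
    return result


if __name__ == "__main__":
    # Two 16-bit groups per 32-bit operand: (3, 2) times (5, 4) gives (15, 8).
    packed = group_multiply(0x0003_0002, 0x0005_0004, size=16, width=32)
    print(hex(packed))  # prints 0xf00000008
```

Storing each product at twice the source width means no product can overflow its slot, which is the property the quoted passage relies on.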

18. 	As per claim 14, Falcon in view of Hansen, and in view of Siu discloses: The apparatus of claim 1, wherein the mixed-precision execution circuitry is to generate an output matrix. (Hansen, Figure 5A, and ¶42, “More specifically, referring to FIG. 5A, this instruction takes two 128-bit operands specified by ra and rb and multiplies the corresponding groups of the specified size, producing a series of results of twice the specified size.”)

19.	Claims 2, 4, 16 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Falcon et al., US 2016/0026912 A1, in view of Hansen et al., US 2004/0015533 A1, and in view of Siu et al., US 2006/0101244 A1, and further in view of Cordero et al., US 2012/0256653 A1.

20. 	As per claim 2, Falcon in view of Hansen, and in view of Siu discloses: The apparatus of claim 1 further comprising: 
and array of multiprocessors mounted on the semiconductor substrate; (Falcon, ¶27, “In cases wherein some semiconductor fabrication techniques are used, the data representing the hardware model may be the data specifying the presence or absence of various features on different mask layers for masks used to produce the integrated circuit.”, and ¶63, “GMCH 420 may be a chipset, or a portion of a chipset. GMCH 420 may communicate with processors 410, 415 and control interaction between processors 410, 415 and memory 440. GMCH 420 may also act as an accelerated bus interface between the processors 410, 415 and other elements of system 400. In one embodiment, GMCH 420 communicates with processors 410, 415 via a multi-drop bus, such as a frontside bus (FSB) 495.”)

21.	Falcon in view of Hansen, and in view of Siu does not expressly disclose:
a semiconductor substrate; 
a parallel processor die comprising the interconnect fabric, memory controller, input/output (IO) interface, and array of multiprocessors mounted on the semiconductor substrate; 
a 3D memory stack comprising a plurality of stacked memory dies mounted on the semiconductor substrate; and 
a local memory interconnect to couple the memory controller to the 3D memory stack, the local memory interconnect comprising independent groups of memory interfaces, the independent groups of memory interfaces associated with respective memory dies of the plurality of stacked memory dies.  

22.	Cordero discloses: 
a semiconductor substrate; (Cordero, ¶35, “the substrate may be a semiconductor or another integrated circuit.”)
a parallel processor die comprising the interconnect fabric (Cordero, ¶32, “An FPGA or CPLD device has an extensive interconnection fabric that includes a matrix of signal routing conductors arranged on the die along with periodic configurable couplings that permit the user logic outputs to be interconnected to user logic inputs. A TSV stacked die employs some of these configurable couplings to be able to couple signal routing conductors to TSV's that can reach adjacent die to extend the interconnection fabric between die.”), memory controller, input/output (IO) interface, (Cordero, ¶58, “As shown in FIG. 9, the configuration memory management circuits include: a system input 902 connected to an electrode on one side of the die 908 for receiving a configuration memory data stream; a circuit 901 to control the path of the configuration memory data stream (e.g., to a loader circuit on the die, such as central CRAM loader circuit 103 or to another die); an interface 910 to a loader circuit on the die for sending configuration memory data to configure the CRAM on the die; an interface 903 to send configuration memory data to a multiplexer 906, a select signal 905 to allow data to flow through the multiplexer 906;”)
a 3D memory stack comprising a plurality of stacked memory dies mounted on the semiconductor substrate; (Cordero, ¶54, “FIG. 6 is a block diagram of a FPGA with ROM in a 3D stack in accordance with an embodiment. As shown in FIG. 6, a die 602 (or substrate) containing ROM memory elements is attached, using TSV, to a die 604 (or substrate) containing logic elements.”) and
a local memory interconnect to couple the memory controller to the 3D memory stack, (Cordero, ¶35, “As used herein, the terms "configurable integrated circuit die" or "die" refers to a block of semiconducting material on which a given functional circuit (e.g., a configuration memory management circuit, a programmable logic circuit) is fabricated on a substrate. Different die are located on different substrates.”) the local memory interconnect comprising independent groups of memory interfaces, the independent groups of memory interfaces associated with respective memory dies of the plurality of stacked memory dies. (Cordero, ¶06, “The first configurable integrated circuit die includes a first array and a first configuration memory management circuit that includes an interface to the first array. The first array includes a first logic element and a first configuration memory. The configurable die stack arrangement also includes a second configurable integrated circuit die located on a second substrate. The second substrate is different than the first substrate. The second configurable integrated circuit die includes a second array and a second configuration memory management circuit that includes an interface to the second array.”)

23.	Cordero is analogous art with respect to Falcon in view of Hansen, and in view of Siu because they are from the same field of endeavor, namely image processing. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to incorporate a semiconductor substrate; a parallel processor die comprising the interconnect fabric, memory controller, input/output (IO) interface, and array of multiprocessors mounted on the semiconductor substrate; a 3D memory stack comprising a plurality of stacked memory dies mounted on the semiconductor substrate; and a local memory interconnect to couple the memory controller to the 3D memory stack, the local memory interconnect comprising independent groups of memory interfaces, the independent groups of memory interfaces associated with respective memory dies of the plurality of stacked memory dies, as taught by Cordero, into the teaching of Falcon in view of Hansen, and in view of Siu. The motivation for doing so would have been to provide an arrangement that takes on the characteristics of a monolithic die, which has benefits for manufacturability, reliability, and extendable function. Therefore, it would have been obvious to combine Cordero with Falcon in view of Hansen, and in view of Siu.

24. 	As per claim 4, Falcon in view of Hansen, and in view of Siu, and further in view of Cordero discloses: The apparatus of claim 2, further comprising: a cache hierarchy to store data for the array of multiprocessors, the cache hierarchy including an L1 cache and an L2 cache to be shared between the plurality of multiprocessors. (Falcon, ¶32, “In one embodiment, processor 102 may include a Level 1 (L1) internal cache memory 104. Depending on the architecture, the processor 102 may have a single internal cache or multiple levels of internal cache. In another embodiment, the cache memory may reside external to processor 102.”)
 

25.	Claim 16 is similar in scope to claims 1 and 2 and is thus rejected under the same rationale.

26.	Claim 18 is similar in scope to claim 4 and is thus rejected under the same rationale.

27.	Claims 3 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Falcon et al., US 2016/0026912 A1, in view of Hansen et al., US 2004/0205324 A1, in view of Siu et al., US 2006/0101244 A1, in view of Cordero et al., US 2012/0256653 A1, and further in view of Hodges et al., US 2014/0281783 A1.

28. 	As per claim 3, Falcon in view of Hansen, and in view of Siu, and further in view of Cordero discloses:  The apparatus of claim 2 (See rejection of claim 2 above.)

29.	 Falcon in view of Hansen, and in view of Siu, and further in view of Cordero doesn’t expressly disclose:  a memory interface comprises a memory channel and wherein an independent group of memory interfaces comprises at least one physical memory channel and one or more virtual memory channels between a corresponding memory die and a multiprocessor.  

30.	Hodges discloses: a memory interface comprises a memory channel and wherein an independent group of memory interfaces comprises at least one physical memory channel and one or more virtual memory channels between a corresponding memory die and a multiprocessor. (Hodges, Figure 9, ¶02, “Contemporary high performance computing main memory systems are generally composed of one or more memory devices, which are connected to one or more memory controllers and/or processors via one or more memory interface elements such as buffers, hubs, bus-to-bus converters, etc.”, ¶35, “Various stacking architectures (for example, 3 die stacking, or 3DS) may also be implemented, which may include master ranks and slave ranks in the packaging architecture.”, and ¶24, “The MCU 106 and the MCS 108 may include one or more processing circuits, or processing may be performed by or in conjunction with the processor 104. In the example of FIG. 1, there are five channels 110 that can support parallel memory accesses as a virtual channel 111. In an embodiment, the memory system 100 is a five-channel redundant array of independent memory (RAIM) system, where four of the channels 110 provide access to columns of data and check-bit memory, and a fifth channel provides access to RAIM parity bits in the memory subsystem 112.”, and ¶41, “When implemented as a RAIM system, the memory buffer chips 202a-202n can be configured in the synchronous mode of operation. In a RAIM configuration, memory data is striped across multiple physical memory channels 110, e.g., five channels 110, which can act as the single virtual channel 111 of FIG. 1 in order to provide error-correcting code (ECC) protection for continuous operation, even when an entire channel 110 fails.”)
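
The five-channel arrangement in the quoted Hodges passages (four data channels and one parity channel presented as a single virtual channel) can be pictured with the sketch below. It is an illustration only: byte-wise XOR parity is assumed here because the quoted text does not state the parity scheme, and the names `stripe`, `recover`, and `xor_blocks` are hypothetical.

```python
from functools import reduce

DATA_CHANNELS = 4  # four data channels plus one parity channel, per the quoted text


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def stripe(data: bytes):
    """Split data across four data channels and append an XOR parity channel."""
    if len(data) % DATA_CHANNELS != 0:
        raise ValueError("data length must be a multiple of the channel count")
    chunk = len(data) // DATA_CHANNELS
    channels = [data[i * chunk:(i + 1) * chunk] for i in range(DATA_CHANNELS)]
    channels.append(xor_blocks(channels))  # fifth channel holds parity (assumed XOR)
    return channels


def recover(channels, failed: int) -> bytes:
    """Rebuild the contents of one failed channel from the four survivors."""
    survivors = [c for i, c in enumerate(channels) if i != failed]
    return xor_blocks(survivors)


if __name__ == "__main__":
    channels = stripe(b"ABCDEFGH")
    print(recover(channels, failed=2))  # prints b'EF'
```

Because the XOR of all five channels is zero at every byte position, any single channel equals the XOR of the other four, which is how operation can continue when an entire channel fails.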

31.	Hodges is analogous art with respect to Falcon in view of Hansen, and in view of Siu, and further in view of Cordero because they are from the same field of endeavor, namely image processing. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to incorporate a memory interface comprising a memory channel, wherein an independent group of memory interfaces comprises at least one physical memory channel and one or more virtual memory channels between a corresponding memory die and a multiprocessor, as taught by Hodges, into the teaching of Falcon in view of Hansen, and in view of Siu, and further in view of Cordero. The motivation for doing so would have been to improve system reliability. Therefore, it would have been obvious to combine Hodges with Falcon in view of Hansen, and in view of Siu, and further in view of Cordero.

32.	Claim 17 is similar in scope to claim 3 and is thus rejected under the same rationale.


33.	Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Falcon et al., US 2016/0026912 A1, in view of Hansen et al., US 2004/0205324 A1, in view of Siu et al., US 2006/0101244 A1, in view of Cordero et al., US 2012/0256653 A1, and further in view of Smith et al., US 2018/0246816 A1.

34. 	As per claim 6, Falcon in view of Hansen, and in view of Siu, and in view of Cordero, discloses: The apparatus of claim 5 further comprising: an input/output memory management unit (IOMMU) coupled to the interconnect fabric (Falcon, ¶41, “A system logic chip 116 may be coupled to processor bus 110 and memory 120. System logic chip 116 may include a memory controller hub (MCH).”)

35.	Falcon in view of Hansen, and in view of Siu, and in view of Cordero doesn’t expressly disclose: the IOMMU comprising a translation buffer to store virtual-to-physical address translations to access the system memory, including the 3D memory stack.  

36.	Smith discloses: the IOMMU comprising a translation buffer to store virtual-to-physical address translations to access the system memory, including the 3D memory stack. (Smith, claim 10, and ¶18, “A memory management controller 120, coupled to the processor 102 and to other units, assists with accessing memory via address translation streams. More specifically, in response to receiving memory access requests, the processor 102 performs virtual-to-physical address translations and accesses memory based on the translated physical addresses.”, ¶14, “The memory 104 is located on the same die as the processor 102, or may be located separately from the processor 102.”, and claim 1, “method for accessing data stored in memory, the method comprising: initializing a translation lookaside buffer ("TLB") pre-fetch stream for a client, wherein the initializing includes performing a pre-fetch operation to fetch virtual-to-physical memory address translations into a TLB; receiving, from the client, a memory access request to access data stored at virtual addresses for which translations are stored in the TLB; translating the virtual addresses to physical addresses based on the translations; and accessing memory based on the memory access request and the physical addresses.”)
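
The sequence recited in the quoted Smith claim 1 (pre-fetch translations into a TLB, receive a request, translate, access memory) can be pictured with the sketch below. It is a minimal software model under stated assumptions, not Smith's design: the 4096-byte page size, the dictionary-based page table, and the names `TranslationBuffer`, `prefetch`, and `translate` are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size; the quoted text does not specify one


class TranslationBuffer:
    """Toy translation buffer caching virtual-to-physical page mappings."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical page number
        self.entries = {}             # cached translations
        self.hits = 0
        self.misses = 0

    def prefetch(self, virtual_pages):
        """Load translations ahead of use, as in the recited pre-fetch step."""
        for vpn in virtual_pages:
            self.entries[vpn] = self.page_table[vpn]

    def translate(self, vaddr: int) -> int:
        """Translate a virtual address, filling the buffer on a miss."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]  # page-table walk on a miss
        return self.entries[vpn] * PAGE_SIZE + offset


if __name__ == "__main__":
    tlb = TranslationBuffer({0: 7, 1: 3})
    tlb.prefetch([0])
    print(hex(tlb.translate(0x10)))  # prints 0x7010
    print(tlb.hits, tlb.misses)      # prints 1 0
```

A pre-fetched entry turns the first access to a page into a hit, which is the latency benefit the rejection attributes to Smith.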

37.	Smith is analogous art with respect to Falcon in view of Hansen, and in view of Siu, and in view of Cordero because they are from the same field of endeavor, namely image processing. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to incorporate the IOMMU comprising a translation buffer to store virtual-to-physical address translations to access the system memory, including the 3D memory stack, as taught by Smith, into the teaching of Falcon in view of Hansen, and in view of Siu, and in view of Cordero. The motivation for doing so would have been to avoid memory access latency that can result in unacceptable performance for real-time applications. Therefore, it would have been obvious to combine Smith with Falcon in view of Hansen, and in view of Siu, and in view of Cordero.

38.	Claims 7, 8, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Falcon et al., US 2016/0026912 A1, and in view of Hansen et al., US 2004/0205324 A1, and in view of Siu et al., US 2006/0101244 A1, and in view of Cordero et al., US 2012/0256653 A1, and further in view of Smith et al., US 2018/0246816 A1, and further in view of Olson et al., US 2014/0143497 A1.

39. 	As per claim 7, Falcon in view of Hansen, and in view of Siu et al., US 2006/0101244 A1, and in view of Cordero, and further in view of Smith discloses: The apparatus of claim 6 (See rejection of claim 6 above.)

40.	Olson discloses:  a first one or more virtual-to-physical address translations are to identify regions in the 3D memory stack (Olson, ¶24, “The stack TLBs 112 are utilized to store virtual-to-physical address translations for stack data, which may be virtually indexed and virtually tagged in certain implementations. If an access request for designated stack data cannot be found in the identified stack cache 108, then the corresponding stack TLB 112 will be searched in an attempt to locate the virtual address of the designated stack data.”) 

41.	Olson is analogous art with respect to Falcon in view of Hansen, and in view of Siu, and in view of Cordero, and further in view of Smith because they are from the same field of endeavor, namely image processing. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to provide that a first one or more virtual-to-physical address translations are to identify regions in the 3D memory stack, as taught by Olson, in the teaching of Falcon in view of Hansen, and in view of Siu, and in view of Cordero, and in view of Smith. The motivation for doing so would have been to locate the requested stack data in the cache hierarchy. Therefore, it would have been obvious to combine Olson with Falcon in view of Hansen, and in view of Siu, and in view of Cordero, and in view of Smith.

42.	Falcon in view of Hansen, and in view of Siu, and in view of Cordero, in view of Smith and in view of Olson doesn’t disclose: a second one or more virtual-to-physical address translations are to identify regions in a system memory device.

43.	Felch discloses: a second one or more virtual-to-physical address translations are to identify regions in a system memory device. (Felch, ¶144, “FIG. 21 depicts an embodiment of a Virtual-to-physical address translator 2100 with three entries 2121, 2122, 2123 in a "Virtual address part to match" table 2120, and three entries 2141, 2142, and 2143 in a Corresponding physical addresses table 2140, which create the virtual-to-physical mapping of the contiguous memory region dedicated to Data Structure 9 (2048, 2058, 2064) to the three physically discontiguous memory regions 2090, 2092, and 2094.”)

44.	Felch is analogous art with respect to Falcon in view of Hansen, and in view of Siu, and in view of Cordero, and further in view of Smith and in view of Olson because they are from the same field of endeavor, namely image processing. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to provide that a second one or more virtual-to-physical address translations are to identify regions in a system memory device, as taught by Felch, in the teaching of Falcon in view of Hansen, and in view of Siu, and in view of Cordero, and in view of Smith and in view of Olson. The motivation for doing so would have been to satisfy a memory request in a system utilizing the integrated processor core. Therefore, it would have been obvious to combine Felch with Falcon in view of Hansen, and in view of Siu, and in view of Cordero, and in view of Smith and in view of Olson.

45. 	As per claim 8, Falcon in view of Hansen, and in view of Siu, in view of Cordero, in view of Smith, and in view of Olson, and in view of Felch discloses: The apparatus of claim 14 wherein the array of multiprocessors are to connect to the system memory device via the IO interface. (Falcon, ¶63, “GMCH 420 may be a chipset, or a portion of a chipset. GMCH 420 may communicate with processors 410, 415 and control interaction between processors 410, 415 and memory 440. GMCH 420 may also act as an accelerated bus interface between the processors 410, 415 and other elements of system 400. In one embodiment, GMCH 420 communicates with processors 410, 415 via a multi-drop bus, such as a frontside bus (FSB) 495.”)

46.	Claim 20 is similar in scope to claims 6 and 7 and is thus rejected under the same rationale.

47.	Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Falcon et al., US 2016/0026912 A1, in view of Hansen et al., US 2004/0015533 A1, in view of Siu et al., US 2006/0101244 A1, and further in view of Nakajima et al., US 2016/0048464 A1.

48. 	As per claim 12, Falcon in view of Hansen and in view of Siu discloses:  The apparatus of claim 1 (See rejection of claim 1 above.)

49.	Falcon in view of Hansen and in view of Siu does not expressly disclose: A virtualization circuitry to share the array of multiprocessors with a plurality of virtual machines.

50.	Nakajima discloses: A virtualization circuitry to share the array of multiprocessors with a plurality of virtual machines. (Nakajima, ¶19, “The computing device 100 may be embodied as any type of device capable of performing inter-virtual-machine shared memory communication and otherwise performing the functions described herein. For example, the computing device 100 may be embodied as, without limitation, a workstation, a server computer, a distributed computing system, a multiprocessor system, a laptop computer, a notebook computer, a tablet computer, a smartphone, a mobile computing device, a wearable computing device, a computer, a desktop computer, a consumer electronic device, a smart appliance, and/or any other computing device capable of inter-virtual-machine shared memory communication”, and Abstract, “Technologies for secure inter-virtual-machine shared memory communication include a computing device with hardware virtualization support.” Note: the hardware virtualization support is in this case the virtualization circuitry.)

51.	Nakajima is analogous art with respect to Falcon in view of Hansen and in view of Siu because they are from the same field of endeavor, namely image processing. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to include a virtualization circuitry to share the array of multiprocessors with a plurality of virtual machines, as taught by Nakajima, into the teaching of Falcon in view of Hansen and in view of Siu. The suggestion for doing so would be to improve shared memory performance. Therefore, it would have been obvious to combine Nakajima with Falcon in view of Hansen and in view of Siu.

52.	As per claim 13, Falcon in view of Hansen, and in view of Nakajima discloses: The apparatus of claim 12 wherein the virtualization circuitry comprises multiple sets of control registers to be associated with multiple corresponding virtual machines, a group of control registers to store one or more address pointers to identify a region of memory associated with a corresponding virtual machine. (Nakajima, ¶93, “to register the second shared memory segment comprises to register the shared buffer at a next pointer of the secure view control structure; to process the shared buffer comprises to increment a processed pointer of the secure view control structure in response to processing of the shared buffer; and to generate the shared buffer comprises to determine, by the source virtual machine, whether a capacity of the secure view is exceeded, and in response to a determination that the capacity of the source virtual machine is exceeded, to wait, by the source virtual machine, for the target virtual machine to complete processing the shared buffer; remove, by the source virtual machine, the shared buffer from the grant table in response to completion of processing of the shared buffer by the target virtual machine.”, and ¶56)


53.	Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Falcon et al., US 2016/0026912 A1, in view of Hansen et al., US 2004/0015533 A1, in view of Siu et al., US 2006/0101244 A1, and further in view of Kuharenko et al., US 2018/0032796 A1.

54.	 As per claim 15, Falcon in view of Hansen, and in view of Cordero discloses: The apparatus of claim 14 (see rejection of claim 14 above.)

55.	Falcon in view of Hansen and in view of Siu doesn’t expressly disclose: wherein the mixed-precision execution circuitry is to evaluate an activation function based on the output matrix.

56.	Kuharenko discloses: the mixed-precision execution circuitry is to evaluate an activation function based on the output matrix. (Kuharenko, ¶56-58, and ¶53, “FIG. 6 is a diagram illustrating an example architecture of CNN 120 or CNN 170 in its respective training mode. FIG. 7 is a diagram illustrating an example architecture of CNN 120 or CNN 170 in its respective feature-vector-generation mode. In the discussion that follows, the CNN architectures may include convolutional layers, non-linear (e.g., activation function) layers, pooling layers, fully-connected layers, a regularization layer, and a loss layer.”)

57.	Kuharenko is analogous art with respect to Falcon in view of Hansen and in view of Siu because they are from the same field of endeavor, namely image processing. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to include mixed-precision execution circuitry that is to evaluate an activation function based on the output matrix, as taught by Kuharenko, into the teaching of Falcon in view of Hansen and in view of Siu. The suggestion for doing so would be to learn a better representation for the input in order to generalize well. Therefore, it would have been obvious to combine Kuharenko with Falcon in view of Hansen and in view of Siu.


Conclusion 

68.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDERRAHIM MEROUAN whose telephone number is (571)270-5254.  The examiner can normally be reached on Monday to Friday 8 AM-5 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor Kent Chang can be reached on 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ABDERRAHIM MEROUAN/Primary Examiner, Art Unit 2619