DETAILED ACTION
Status of Claims
This action is in response to the applicant's amendment filed on 7/16/2021. Claims 1 – 22 are pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on March 16, 2013 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
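The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.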

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 7 – 12 and 18 – 22 are rejected under 35 U.S.C. 103 as being unpatentable over Goyal et al., US20170316312, System and Method for Deep Learning Processor, 2017, in view of Reinhardt et al., US20160352598A1, Message Aggregation Combining and Compression for Efficient Data Communication in GPU based clusters, Dec. 2016, and Das et al., US10776699, Optimized Computer Hardware for Machine Learning Operation, filed Jan. 2018.

Regarding Claim 1, Goyal discloses: 
A deep neural network (DNN) system (See at least Goyal, Abs., ln. 1 – 6, where a deep learning processor is based on a neural network), comprising: 
a plurality of processing elements (See at least Goyal, Fig. 1, showing a plurality of tensor engines (TE) 104), 

wherein multiple inputs, each input being associated with a respective input of the plurality of DNN layers (See at least Goyal, Fig. 2, where the inputs processed by the TEs are part of the input data of the plurality of DNN layers), are processed in parallel by the plurality of processing elements, each … associated processing element being configured to process each input of the multiple inputs (See at least Goyal, para. 0024, ln. 12 – 13, where each of the plurality of TEs retrieves and processes input data from OSM 106 and external memory; ln. 9 – 10, where each TE performs a portion/sub-task of the neural network processing in parallel).
Goyal does not explicitly disclose: 
a plurality of queues 
wherein each queue of the plurality of queues is associated with at least one of the plurality of processing elements 
wherein each queue of the plurality of queues is mapped to one of the plurality of DNN layers,
input processed by the plurality of queues
each queue and processing element being configured to process each input of the multiple inputs based on a DNN processing profile determined from a queue packet associated with the input
Reinhardt discloses: 
a plurality of queues (See at least Reinhardt, Fig. 2, showing multiple queues 60)

wherein each queue of the plurality of queues is mapped to one of the plurality of DNN layers (See at least Reinhardt, Fig. 2 & para. 0043, where queues 60 are used in a pipelined manner and each maps to one of the processing steps [plurality of layers]),
input processed by the plurality of queues (See at least Reinhardt, Fig. 2, where tasks [input] are queued [processed] at each of processors 52 to 56)
each queue and processing element being configured to process each input of the multiple inputs based on a DNN processing profile determined from a queue packet associated with the input (See at least Reinhardt, Fig. 2 & para. 0042, ln. 7 – 9, where messages 40 [inputs] are conveyed to queues 60 [processed by queue], where they are stored; para. 0041, ln. 3 – 4, based on the information in the packet header [processing profile from queue packet])
Goyal and Reinhardt both teach multi-processor parallel computing in a pipelined manner and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Goyal’s teaching of the hardware-based DNN with Reinhardt’s teaching of queueing for each stage of the processing pipeline to achieve a deep learning processor with a queue between layers. One of ordinary skill in the art would have been motivated to make this modification in order to allow for parallel execution of threads (See at least Reinhardt, para. 0090, ln. 1 – 6).
Goyal in view of Reinhardt does not explicitly disclose:
based on a DNN processing profile determined from a queue packet associated with each input
Das explicitly discloses: 
based on a DNN processing profile determined from a queue packet associated with each input (See at least Das, Table 2, where the task registry initialized by the OS [information in queue packet] includes a WD; col. 7, ln. 15 – 17, where the WD can be a job request; col. 6, ln. 49 – 56, where the request includes how the associated data [data in queue packet] is to be processed [DNN processing profile])
Goyal (in view of Reinhardt) and Das both teach multi-processor parallel computing systems and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Goyal (in view of Reinhardt) of the deep neural network processor with queues with Das’s teaching of the details of the elements in the process instruction to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order for the processor to efficiently process the instructions (See at least Das, col. 13, ln. 18 – 20).
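For illustration only, the arrangement addressed in the rejection of Claim 1 (one queue per DNN layer, a processing element associated with each queue, and a processing profile read from each queue packet) can be sketched as below. This is a minimal sketch under stated assumptions, not the implementation of the application or of any cited reference: the names QueuePacket, PROFILES and run_pipeline, and the toy "scale" and "relu" profiles, are hypothetical, and the queues are drained sequentially here rather than in parallel.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class QueuePacket:
    """Hypothetical queue packet; field names are illustrative assumptions."""
    layer_id: int   # DNN layer (and therefore queue) this input belongs to
    profile: str    # name of the processing profile to apply to the input
    data: list      # input payload


# Hypothetical processing profiles standing in for real layer computations.
PROFILES = {
    "scale": lambda xs: [2.0 * x for x in xs],
    "relu": lambda xs: [max(0.0, x) for x in xs],
}


def run_pipeline(packets, num_layers):
    """Write each packet to the queue mapped to its layer, then let the
    processing element for each queue handle its packets according to the
    profile named in the packet."""
    queues = [deque() for _ in range(num_layers)]  # one queue per layer
    for pkt in packets:
        queues[pkt.layer_id].append(pkt)

    results = []
    # Sequential stand-in for processing elements that would run in parallel.
    for layer_id, queue in enumerate(queues):
        while queue:
            pkt = queue.popleft()
            process = PROFILES[pkt.profile]  # profile determined from the packet
            results.append((layer_id, process(pkt.data)))
    return results
```

In this sketch the processing element does not hard-code what it computes; it looks up the operation from the packet it dequeues, which is the role the rejection assigns to the processing profile.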

Regarding Claim 7, depending on Claim 1, Goyal further discloses: wherein certain of the plurality of queues and associated processing elements receive queue packets through remote direct memory access (See at least Goyal, Fig. 1, where DLC 108 directly accesses external memory resources; Das, col. 17, ln. 42 – 43, where the WD may be stored in registers).



Regarding Claim 9, depending on Claim 1, Goyal further discloses: wherein each of the multiple inputs is processed at a different DNN layer type (See at least Goyal, para. 0024, ln. 12 – 13, where each TE retrieves and processes input from the OSM or a memory source; ln. 9 – 10, where each TE performs a portion/sub-task [different layer type] of the neural network in parallel).

Regarding Claim 10, depending on Claim 1, Goyal further discloses: wherein an associated processing element for a queue processes with respect to a specific DNN layer (See at least Goyal, para. 0029, ln. 4 – 5, where the TE in the example is configured for a convolution layer).

Regarding Claim 11, depending on Claim 10, Goyal further discloses: wherein the specific DNN layer is supported by different DNN networks to enable multiple use of the specific DNN layer (See at least Goyal, para. 0022, ln. 1 – 3, where the DLP is configured to implement one or more neural networks).

Regarding Claim 12, Goyal teaches: a method for deep neural network (DNN) processing (See at least Goyal, Abs., ln. 1 – 6, where a deep learning processor is based on a neural network), the method comprising: 

Goyal does not explicitly disclose: 
writing a queue packet, associated with each input of the multiple inputs, to a queue, 
wherein each queue is mapped to one of the plurality of DNN layers in an inference pipeline; and
processing, by a processing element associated with each queue, each input of the multiple inputs based on a DNN processing profile determined from the queue packet.
Reinhardt discloses:
writing a queue packet, associated with each input of the multiple inputs to a queue (See at least Reinhardt, Fig. 2, where the processors generate [write] multiple messages [queue packet]), 
wherein each queue is mapped to one of a plurality of DNN layers in an inference pipeline (See at least Reinhardt, Fig. 2 & para. 0043, where queues 60 are used in a pipelined manner); 
processing, by a processing element associated with each queue, each input of the multiple inputs based on a DNN processing profile determined from the queue packet (See at 
Goyal and Reinhardt both teach multi-processor parallel computing in a pipelined manner and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Goyal’s teaching of the hardware-based DNN with Reinhardt’s teaching of queueing for each stage of the processing pipeline to achieve a deep learning processor with a queue between layers. One of ordinary skill in the art would have been motivated to make this modification in order to allow for parallel execution of threads (See at least Reinhardt, para. 0090, ln. 1 – 6).
Goyal in view of Reinhardt does not explicitly disclose:
processing, by a processing element associated with each queue, each input of the multiple inputs based on a DNN processing profile determined from the queue packet.
Das discloses:
processing, by a processing element associated with each queue, each input of the multiple inputs based on a DNN processing profile determined from the queue packet (See at least Das, Table 2, where the task registry initialized by the OS [information in queue packet] includes a WD; col. 7, ln. 15 – 17, where the WD can be a job request; col. 6, ln. 49 – 56, where the request includes how the data is to be processed [DNN processing profile]).
Goyal (in view of Reinhardt) and Das both teach multi-processor parallel computing systems and are analogous art. It would have been prima facie obvious to one of ordinary skill in the 

Regarding Claim 18, depending on Claim 12, Reinhardt further discloses: wherein writing another queue packet to another queue based on the processed queue packet (See at least Reinhardt, Fig. 5 & para. 0082, ln. 1 – 4, where 404 generates [writes] another message [queue packet] during processing, 406 determines whether the message [queue packet] qualifies for further processing, and, after processing, the new messages are stored in their respective queues).

Regarding Claims 19 – 22, these claims are the method claims corresponding to Claims 8 – 11 and are rejected for the same reasons as Claims 8 – 11.

Claims 2 – 6 and 13 – 17 are rejected under 35 U.S.C. 103 as being unpatentable over Goyal et al., US20170316312, System and Method for Deep Learning Processor, 2017, in view of Reinhardt et al., US20160352598A1, Message Aggregation Combining and Compression for Efficient Data Communication in GPU based clusters, Dec. 2016, and Das et al., US10776699, Optimized Computer Hardware for Machine Learning Operation, filed Jan. 2018, further in view of Li et al., CA3013680, Data Flow Processing Method and Apparatus and System, Aug. 2017.

Regarding Claim 2, Goyal in view of Reinhardt and Das teaches the system of Claim 1. Goyal in view of Reinhardt and Das further teaches:
wherein the queue packet identifies at least, a pointer to buffer for data, and previous/next DNN layer identifiers (See at least Das, Table 2, showing an address pointer [pointer to buffer for data]; col. 17, ln. 15 – 18, where the WD may contain a pointer to a queue of jobs; here the jobs refer to the stages [DNN layers] of the pipeline in Goyal’s teaching).
Goyal in view of Reinhardt and Das does not explicitly teach:
a DNN network identifier, a DNN layer identifier
Li teaches: 
a DNN network identifier, a DNN layer identifier (See at least Li, para. 0112, ln. 1, where each pipeline uses a unique queue ID [network identifier]; ln. 4, where a label identifies a to-be-performed processing action [layer identifier])
Reinhardt and Li both teach data processing pipeline systems and methods and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Goyal in view of Reinhardt and Das of the hardware-based deep learning processor system with Li’s teaching of pipeline and packet management to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order for the processor to identify and perform the corresponding action (See at least Li, para. 0112, ln. 5 – 7).
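For illustration only, the queue packet contents recited in Claim 2 (a DNN network identifier, a DNN layer identifier, a pointer to a buffer for data, and previous/next DNN layer identifiers) can be sketched as a simple record, together with one way a processed packet could give rise to the packet for the connected next layer. This is a hedged sketch, not the packet format of the application or of Das or Li: the class LayerQueuePacket, its field names, the integer buffer_ref standing in for a pointer, and the helper next_packet are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LayerQueuePacket:
    """Hypothetical packet layout; every field name is an assumption."""
    network_id: int                # which DNN network the input belongs to
    layer_id: int                  # layer (queue) that should process it
    buffer_ref: int                # stand-in for a pointer to the data buffer
    prev_layer_id: Optional[int]   # None for the first layer
    next_layer_id: Optional[int]   # None for the last layer


def next_packet(pkt, out_buffer_ref, following_layer_id=None):
    """Build the packet to write to the next layer's queue after `pkt` has
    been processed, or return None if `pkt` was for the last layer."""
    if pkt.next_layer_id is None:
        return None
    return LayerQueuePacket(
        network_id=pkt.network_id,         # same network, so same workload
        layer_id=pkt.next_layer_id,        # route to the connected layer
        buffer_ref=out_buffer_ref,         # output of this layer is the next input
        prev_layer_id=pkt.layer_id,
        next_layer_id=following_layer_id,
    )
```

Carrying the network identifier unchanged from packet to packet is what would let queues shared by several networks keep their workloads apart in this sketch.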



Regarding Claim 4, depending on Claim 2, Li further discloses: wherein the DNN network identifier enables processing of multiple DNN workloads by designating which network to use (See at least Li, para. 0112, ln. 1 – 2, where a unique queue ID [network identifier] is assigned to each of multiple pipeline queues [DNN workload]).

Regarding Claim 5, depending on Claim 2, Das further discloses: wherein the previous/next DNN layer identifiers identify connected DNN layers (See at least Das, col. 17, ln. 15 – 18, where the WD may contain a pointer to a queue of jobs; here the jobs refer to the stages [DNN layers] of the pipeline in Goyal’s teaching).

Regarding Claim 6, depending on Claim 2, Das further discloses: wherein the queue packets include at least instructions on how to launch threads, provide a size of private memory allocation, provide a size of group memory allocation, provide a handle for an object in memory that includes an executable ISA image for a computation kernel, and control and synchronization information. (See at least Das, Table 2, where Authority mask [memory allocations], Context save/restore pointer [control, synchronization information], work 

Regarding Claims 13 – 17, these claims are the method claims corresponding to Claims 2 – 6 and are rejected for the same reasons as Claims 2 – 6.

Response to Amendment
Applicant's remarks filed on 7/16/2021 have been fully considered but they are not persuasive. 
Applicant states that Goyal does not teach “each input being associated with a respective input of the plurality of DNN layers”. Examiner respectfully disagrees. The neural network data processed by the tensor engines (TE) are part of the input data of a plurality of layers of a neural network, as shown in Figure 2. In addition, paragraph 0024 points out that “each tensor engine 104 is configured to perform a portion/sub-task of the neural network processing task in parallel”. Examiner interprets each portion/sub-task as referring to the processing of a layer, or a portion of a layer, in a neural network. Considering that “for a non-limiting example, a large size image can be broken into a plurality of smaller image portions, wherein the size of each of the image portions matches with the input data width of one tensor engine 104 and is handled by each tensor engine 104”, if the image size is larger than the capacity of a tensor engine, the system can perform the task of a neural network layer in more than one TE. If the image size is not larger than the capacity of a tensor engine, such a breakdown of a processing layer may not be necessary. 

Lastly, Applicant states that Das does not disclose: “wherein multiple inputs, each input being associated with a respective input of the plurality of DNN layer … based on a DNN processing profile determined from a queue packet associated with each input”. Examiner respectfully disagrees. Das discloses processing multiple processes of DNN layers (Das, Fig. 14) based on register entries that contain detailed instructions (Das, Table 2). This is analogous to the combined teaching of Goyal and Reinhardt, which discloses processing multiple processes of DNN layers based on the packet header of the queued information, which contains instructions on how to process the information. One of ordinary skill in the art would combine the teaching of Goyal in view of Reinhardt with Das’s teaching of the details in the instructions of the multi-process system in order to efficiently process the instructions (Das, col. 13, ln. 18 – 20). Goyal in view of Reinhardt and further in view of Das therefore discloses the above limitation. 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU whose telephone number is (571)272-9354.  The examiner can normally be reached on Monday- Friday 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CHAKI KAKALI, can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/S.C./Examiner, Art Unit 2122


/ERIC NILSSON/Primary Examiner, Art Unit 2122