DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 3/18/2022 has been entered.
 
Status of Claims
This action is in response to the applicant's amendment filed on 3/18/2022. Claims 1 – 22 are pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on March 16, 2013 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments
Applicant's remarks filed on 3/18/2022 have been fully considered but they are not persuasive.
Regarding the claim rejection under 35 U.S.C. 103, applicant states that Goyal does not teach that a tensor engine processes data from an input of a DNN layer. Examiner respectfully disagrees. Goyal discloses that each MatrixMul engine in a tensor engine performs vector/matrix multiplication and summation (Goyal, figure 4 & paragraph 0026, ln. 1 – 6 & ln. 15 – 20). As an embodiment, Goyal discloses a vector-matrix multiplication of input and weight (Goyal, figure 5A & paragraph 0026, ln. 9 – 11). As demonstrated in figure 2 of Goyal, the multiplication of input and weight happens in each layer of the deep neural network. One skilled in the art would appreciate that each tensor engine of Goyal processes data from an input of a DNN layer and that “each input is associated with a respective input of the plurality of DNN layers … each associated processing element being configured to process each input of the multiple inputs”.
Applicant further states that Goyal provides no basis for the interpretation that “each portion/sub-task refers to the processing of a layer or a portion of a layer in a neural network”. Examiner respectfully disagrees. Goyal demonstrates in figure 3 that the operation of a layer of a convolutional neural network is the multiplication and summation of the input matrix and the kernel matrix of the layer. For a non-limiting example, a large size image can be broken into a plurality of smaller image portions, wherein the size of each of the image portions matches the input data width of one tensor engine and each portion is handled by a respective tensor engine (Goyal, paragraph 0024, ln. 24 – 28). It is clear in this context that each tensor engine is processing data of either a portion of a layer (when the image size is bigger than the input width of the tensor engine) or a layer (when the image size is not bigger than the input width of the tensor engine). Further, Goyal describes that the data processing among neural network layers is performed in a pipelined manner (Goyal, paragraph 0021, line 8 – 14, where the information/data processed progresses from one layer to the next in sequence along a processing pipeline) and that the DLP runs a complete pipeline of deep learning processing/operations (Goyal, paragraph 0018, line 12 – 14). Since each of the tensor engines processes data of a layer, it is clear that in order to run a complete pipeline of DNN processing, some of these tensor engines are processing data of the first layer, some the second layer and some the last layer. In other words, the tensor engines are processing data from different layers of the neural network.
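For illustration only, the partitioning discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Goyal: the function names split_into_portions and assign_to_engines, the square tile shape, and the round-robin assignment are all hypothetical choices made for the example.

```python
def split_into_portions(image, engine_width):
    """Split a 2-D image (a list of equal-length rows) into tiles.

    Each tile is at most engine_width x engine_width, standing in for
    "image portions" sized to the input data width of one engine.
    The square tile shape is an assumption made for this sketch.
    """
    if engine_width <= 0:
        raise ValueError("engine_width must be positive")
    rows = len(image)
    cols = len(image[0]) if rows else 0
    portions = []
    for r in range(0, rows, engine_width):
        for c in range(0, cols, engine_width):
            tile = [row[c:c + engine_width] for row in image[r:r + engine_width]]
            portions.append(tile)
    return portions


def assign_to_engines(portions, num_engines):
    """Assign portions to engines round-robin (a hypothetical policy)."""
    assignment = {engine: [] for engine in range(num_engines)}
    for index, portion in enumerate(portions):
        assignment[index % num_engines].append(portion)
    return assignment


# A 4x4 "image" wider than a 2-wide engine is split into four portions,
# so each engine handles a portion of the layer input.
large_image = [[4 * r + c for c in range(4)] for r in range(4)]
large_portions = split_into_portions(large_image, 2)

# A 2x2 "image" that fits the engine width yields a single portion,
# so one engine handles the whole layer input.
small_image = [[1, 2], [3, 4]]
small_portions = split_into_portions(small_image, 2)
```

When the image is not larger than the engine width, the sketch returns one portion, which corresponds to the case of one engine processing a whole layer input rather than a portion of it.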
Applicant further states that Goyal does not disclose “a plurality of queues”, that Reinhardt does not disclose “each queue of the plurality of queues is mapped to one of the plurality of DNN layers”, and disagrees that the queues map to processing steps which map to the layers of a neural network. In this case, applicant’s attention is directed to the multiprocessor in the processing pipeline. Goyal discloses the use of tensor engines each performing data processing of a layer in a neural network, with the information/data processed progressing from one layer to the next in sequence along a processing pipeline (Goyal, paragraph 0021, line 8 – 14). Reinhardt discloses a sequence of data processing operations performed by separate circuits/logic (Reinhardt figure 2, item 52 – 56), where the processing steps are connected by queues that store the processing data of the steps. Goyal and Reinhardt are analogous, and one skilled in the art would be motivated to implement the queues of Reinhardt in the DNN processing pipeline of Goyal in order to allow parallel execution of processing steps (See at least Reinhardt, para. 0090, ln. 1 – 6).
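For illustration only, a pipeline in which each processing step is fed by its own queue can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Goyal or Reinhardt: the names stage_worker and run_pipeline, the use of Python threads as stand-ins for processing elements, and the None sentinel are all hypothetical.

```python
import queue
import threading

SENTINEL = None  # assumed end-of-stream marker; real inputs must not be None


def stage_worker(layer_fn, in_queue, out_queue):
    """Drain one queue, apply one layer function, and feed the next queue."""
    while True:
        item = in_queue.get()
        if item is SENTINEL:
            out_queue.put(SENTINEL)  # propagate shutdown to the next stage
            break
        out_queue.put(layer_fn(item))


def run_pipeline(layer_fns, inputs):
    """Run inputs through the layers, with one queue mapped to each layer."""
    # queues[i] feeds layer i; the final queue collects the outputs.
    queues = [queue.Queue() for _ in range(len(layer_fns) + 1)]
    workers = [
        threading.Thread(target=stage_worker, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(layer_fns)
    ]
    for worker in workers:
        worker.start()
    for value in inputs:
        queues[0].put(value)
    queues[0].put(SENTINEL)

    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for worker in workers:
        worker.join()
    return results


# Two hypothetical "layers": while the second stage works on the first
# input, the first stage can already work on the second input.
outputs = run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])
```

Because each stage has a single worker reading from a first-in, first-out queue, the outputs arrive in input order while different stages operate on different inputs at the same time.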
Applicant further states that Goyal and Reinhardt do not disclose processing “based on DNN processing profile determined from a queue packet associated with the input” and that Das does not disclose “multiple inputs, each input is associated with a respective input of the plurality of DNN layers” that are each processed “based on a DNN processing profile determined from a queue packet associated with each input.” In this case, applicant’s attention is directed to the data processing profile. Goyal in view of Reinhardt discloses a DNN layer-wise data processing pipeline implemented by tensor engines, in which the layers are connected by queues. Reinhardt further discloses the use of a queue packet header and/or other additional data elements to store processing instructions/status for the packet (Reinhardt paragraph 0041 line 2 – 7). Das, on the other hand, discloses layer-wise data processing of a DNN by multiple GPUs/graphics processing engines (Das, figure 14). For the shared model, the application is required to make an operating system call with … a work descriptor (WD) (Das, column 18, line 60 – 64), which may contain a pointer to a queue of jobs (Das, column 17, line 16 – 18). For the application of the machine learning model demonstrated in figure 14 of Das, data is processed through layers in an arranged sequence. It is clear that the queue of jobs in this application refers to the queue of layers through which the data is arranged to be processed and thus includes the prev/next layer identifiers.
Lastly, applicant states that Li is not analogous art and does not disclose a “DNN network identifier” and a “DNN layer identifier”. Examiner respectfully disagrees. Li discloses pipelines of multiple processing actions that are in a sequential order, and multiple processor cores perform the multiple processing actions according to the sequence of the multiple processing actions (Li paragraph 0009). In this aspect, Li is analogous to Goyal in view of Reinhardt’s disclosure of the use of multiple tensor engines to perform the sequence/pipeline of DNN layer operations. The “to be performed processed action” of Li is an identifier of the action within the multiple processing actions and is analogous to the identifier of a layer in the sequence of DNN layers of Goyal. The “unique queue ID” is one of the IDs in the operation. Since the operation is analogous to the DNN network process, the unique queue ID is analogous to a DNN network identifier.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 7 – 12 and 18 – 22 are rejected under 35 U.S.C. 103 as being unpatentable over Goyal et al., US20170316312, System and Method for Deep Learning Processor, 2017 in view of Reinhardt et al., US20160352598A1, Message Aggregation Combining and Compression for Efficient Data Communication in GPU based clusters, Dec, 2016, and Das et al., US10776699, Optimized Computer Hardware for Machine Learning Operation, filed on Jan, 2018.

Regarding Claim 1, Goyal discloses:
A deep neural network (DNN) system (See at least Goyal Abs, ln. 1 – 6, where a deep learning processor is based on a neural network), comprising:
a plurality of processing elements (See at least Goyal, Fig. 1, where plurality of TE 104), 
an inference pipeline including a plurality of DNN layers (See at least Goyal, para. 0022, ln. 15 – 16, 3 stages in the processing pipeline [inference pipeline] for each layer [plurality of layers]),  
wherein multiple inputs, each input being associated with a respective input of the plurality of DNN layers (See at least Goyal, fig. 2, where the inputs being processed by the TEs are a part of the input data of the plurality of DNN layers), are processed in parallel by the plurality of processing elements, each … associated processing element being configured to process each input of the multiple inputs (See at least Goyal, para. 0024, ln. 12 – 13, where each of the plurality of TEs retrieves and processes input data from OSM 106 and external memory; ln. 9 – 10, where each TE performs a portion or sub-task of the neural network in parallel).
Goyal does not explicitly disclose:
a plurality of queues 
wherein each queue of the plurality of queues is associated with at least one of the plurality of processing elements 
wherein each queue of the plurality of queues is mapped to one of the plurality of DNN layers,
input processed by the plurality of queues
each queue and processing element being configured to process each input of the multiple inputs based on a DNN processing profile determined from a queue packet associated with the input
Reinhardt discloses: 
a plurality of queues (See at least Reinhardt, Fig. 2, where multiple queues 60 are shown)
wherein each queue of the plurality of queues is associated with at least one of the plurality of processing elements (See at least Reinhardt, Fig. 2, & para. 0043, where queues 60 are used in a pipelined manner)
wherein each queue of the plurality of queues is mapped to one of the plurality of DNN layers (See at least Reinhardt, Fig. 2, & para. 0043, where queues 60 are used in a pipelined manner and each maps to one of the processing steps [plurality of layers]),
input processed by the plurality of queues (See at least Reinhardt, Fig. 2, where tasks [input] are queued [processed] at each of processors 52 to 56)
each queue and processing element being configured to process each input of the multiple inputs based on a DNN processing profile determined from a queue packet associated with the input (See at least Reinhardt fig. 2 & para. 0042, ln. 7 – 9, where messages 40 [inputs] are conveyed to queues 60 [processed by queue] where they are stored; para. 0041, ln. 3 – 4, where data is processed based on the information in the packet header [processing profile from queue packet])
Goyal and Reinhardt both teach multi-processor, parallel computing in a pipelined manner and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Goyal’s teaching of the hardware based DNN with Reinhardt’s teaching of queueing for each stage of the processing pipeline to achieve a deep learning processor with a queue between each pair of layers. One of ordinary skill in the art would have been motivated to make this modification in order to allow for parallel execution of threads (See at least Reinhardt, para. 0090, ln. 1 – 6).
Goyal in view of Reinhardt does not explicitly disclose:
based on a DNN processing profile determined from a queue packet associated with each input
Das explicitly discloses: 
based on a DNN processing profile determined from a queue packet associated with each input (See at least Das, Table 2, where the task registry initialized by the OS in Table 2 [information in queue packet] includes a WD; col. 7, ln. 15 – 17, where the WD can be a job request; col. 6, ln. 49 – 56, where the request includes how the associated data [data in queue packet] is to be processed [DNN processing profile])
Goyal (in view of Reinhardt) and Das both teach multi-processor, parallel computing systems and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Goyal (in view of Reinhardt)’s teaching of the deep neural network processor with queues and Das’s teaching of the details of the elements in the process instruction to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order for the processor to efficiently process the instructions (See at least Das, col. 13, ln. 18 – 20).

Regarding Claim 7, depending on Claim 1, Goyal further discloses: wherein certain of the plurality of queues and associated processing elements receive queue packets through remote direct memory access (See at least Goyal, Fig. 1, where DLC 108 directly accesses an external memory resource; Das, col. 17, ln. 42 – 43, where the WD may be stored in registers).

Regarding Claim 8, depending on Claim 1, Goyal further discloses: wherein the plurality of DNN layers are different DNN layer types (See at least Goyal, para. 0023, ln. 3 – 12, where the neural network layers are of different types).

Regarding Claim 9, depending on Claim 1, Goyal further discloses: wherein each of the multiple inputs is processed at a different DNN layer type (See at least Goyal, para. 0024, ln. 12 – 13, where each TE retrieves and processes input from the OSM or a memory source; ln. 9 – 10, where each TE performs a portion/sub-task [different layer type] of the neural network in parallel).

Regarding Claim 10, depending on Claim 1, Goyal further discloses: wherein an associated processing element for a queue processes with respect to a specific DNN layer (See at least Goyal, para. 0029, ln. 4 – 5, where the TE in the example is configured for a convolution layer).

Regarding Claim 11, depending on Claim 10, Goyal further discloses: wherein the specific DNN layer is supported by different DNN networks to enable multiple use of the specific DNN layer (See at least Goyal, para. 0022, ln. 1 – 3, where the DLP is configured to implement one or more neural networks).

Regarding Claim 12, Goyal teaches: a method for deep neural network (DNN) processing (See at least Goyal, Abs. ln. 1 – 6, a deep learning processor based on a neural network), the method comprising:
processing in parallel for multiple inputs (See at least Goyal, para. 0024, ln. 12 – 13, where each of the plurality of TEs retrieves and processes input data from OSM 106 and external memory; ln. 9 – 10, where each TE performs a portion or sub-task of the neural network in parallel) where each input being associated with a respective input of the plurality of DNN layers in an inference pipeline (See at least Goyal, fig. 2 & para. 0020, ln. 14 – 16, where the input data processed by the TEs are a part of the input data of the plurality of DNN layers in the processing pipeline):
Goyal does not explicitly disclose:
writing a queue packet, associated with each input of the multiple inputs, to a queue, 
wherein each queue is mapped to one of the plurality of DNN layers in an inference pipeline; and
processing, by a processing element associated with each queue, each input of the multiple inputs based on a DNN processing profile determined from the queue packet.
Reinhardt discloses:
writing a queue packet, associated with each input of the multiple inputs to a queue (See at least Reinhardt, Fig. 2, where the processors generate [write] multiple messages [queue packets]),
wherein each queue is mapped to one of a plurality of DNN layers in an inference pipeline (See at least Reinhardt, Fig. 2, & para. 0043, where queues 60 are used in a pipelined manner);
processing, by a processing element associated with each queue, each input of the multiple inputs based on a DNN processing profile determined from the queue packet (See at least Reinhardt fig. 2 & para. 0041, ln. 3 – 4, where the system performs three data operations, by processing elements 52, 54 and 56 associated with each of the queues 60, on each item of queued information 40, based on the information in the packet header [processing profile from queue packet])
Goyal and Reinhardt both teach multi-processor, parallel computing in a pipelined manner and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Goyal’s teaching of the hardware based DNN with Reinhardt’s teaching of queueing for each stage of the processing pipeline to achieve a deep learning processor with a queue between each pair of layers. One of ordinary skill in the art would have been motivated to make this modification in order to allow for parallel execution of threads (See at least Reinhardt, para. 0090, ln. 1 – 6).
Goyal in view of Reinhardt does not explicitly disclose:
processing, by a processing element associated with each queue, each input of the multiple inputs based on a DNN processing profile determined from the queue packet.
Das discloses:
processing, by a processing element associated with each queue, each input of the multiple inputs based on a DNN processing profile determined from the queue packet (See at least Das, Table 2, where the task registry initialized by the OS in Table 2 [information in queue packet] includes a WD; col. 7, ln. 15 – 17, where the WD can be a job request; col. 6, ln. 49 – 56, where the request includes how the data is to be processed [DNN processing profile]).
Goyal (in view of Reinhardt) and Das both teach multi-processor, parallel computing systems and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Goyal (in view of Reinhardt)’s teaching of the deep neural network processor with queues and Das’s teaching of the details of the elements in the process instruction to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order for the processor to efficiently process the instructions (See at least Das, col. 13, ln. 18 – 20).

Regarding Claim 18, depending on Claim 12, Reinhardt further discloses: wherein writing another queue packet to another queue based on the processed queue packet (See at least Reinhardt, Fig. 5 & para. 0082, ln. 1 – 4, where step 404 generates [writes] another message [queue packet] during processing, step 406 determines whether the message [queue packet] qualifies for further processing, and after processing the new messages are stored in their respective queues).

Regarding Claims 19 – 22, Claims 19 – 22 are the method claims corresponding to Claims 8 – 11. Claims 19 – 22 are rejected for the same reasons as Claims 8 – 11.

Claims 2 – 6 and 13 – 17 are rejected under 35 U.S.C. 103 as being unpatentable over Goyal et al., US20170316312, System and Method for Deep Learning Processor, 2017 in view of Reinhardt et al., US20160352598A1, Message Aggregation Combining and Compression for Efficient Data Communication in GPU based clusters, Dec, 2016, and Das et al., US10776699, Optimized Computer Hardware for Machine Learning Operation, filed on Jan, 2018, further in view of Li et al., CA3013680, Data Flow Processing Method and Apparatus and System, Aug 2017.

Regarding Claim 2, Goyal in view of Reinhardt and Das teaches the system of Claim 1. Goyal in view of Reinhardt and Das further teaches:
wherein the queue packet identifies at least, a pointer to buffer for data, and previous/next DNN layer identifiers (See at least Das Table 2, where an address pointer [pointer to buffer for data] is listed; col. 17, ln. 15 – 18, where the WD may contain a pointer to a queue of jobs, where the jobs refer to the stages [DNN layers] of the pipeline in Goyal’s teaching).
Goyal in view of Reinhardt and Das does not explicitly teach:
a DNN network identifier, a DNN layer identifier
Li teaches:
a DNN network identifier, a DNN layer identifier (See at least Li, para. 0112, ln. 1, where each pipeline uses a unique queue ID [network identifier]; ln. 4, where a label identifies a to be performed processing action [layer identifier])
Reinhardt and Li both teach data processing pipeline systems and methods and are analogous. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Goyal in view of Reinhardt and Das’s teaching of the hardware based deep learning processor system with Li’s teaching of pipeline and packet management to achieve the claimed teaching. One of ordinary skill in the art would have been motivated to make this modification in order for the processor to identify and perform the corresponding action (See at least Li, para. 0112, ln. 5 – 7).
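For illustration only, a queue packet carrying the fields recited in Claim 2 can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce any data structure from Das, Li, or the instant application: the class name QueuePacket, the field names, the use of an integer as a stand-in for a buffer pointer, and the helper build_packets are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QueuePacket:
    """Hypothetical packet holding the identifiers recited in Claim 2."""
    network_id: int               # which DNN network the packet belongs to
    layer_id: int                 # which DNN layer should process the data
    buffer_ptr: int               # integer stand-in for a pointer to a data buffer
    prev_layer_id: Optional[int]  # None for the first layer
    next_layer_id: Optional[int]  # None for the last layer


def build_packets(network_id: int, layer_ids: List[int],
                  buffer_ptrs: List[int]) -> List[QueuePacket]:
    """Build one packet per layer, linking each to its neighbouring layers."""
    if len(layer_ids) != len(buffer_ptrs):
        raise ValueError("need exactly one buffer pointer per layer")
    packets = []
    for i, (layer_id, ptr) in enumerate(zip(layer_ids, buffer_ptrs)):
        prev_id = layer_ids[i - 1] if i > 0 else None
        next_id = layer_ids[i + 1] if i + 1 < len(layer_ids) else None
        packets.append(QueuePacket(network_id, layer_id, ptr, prev_id, next_id))
    return packets


# Three layers of a hypothetical network 7, each with its own buffer address.
packets = build_packets(7, [0, 1, 2], [0x100, 0x200, 0x300])
```

In this sketch the previous/next layer identifiers are what connect one layer's packet to the neighbouring layers, while the network identifier distinguishes packets belonging to different networks.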

Regarding Claim 3, depending on Claim 2, Li further discloses: wherein the DNN layer identifier identifies a DNN layer type, which is used to determine a nature of computation to be performed and what kernels to launch (See at least Li, para. 0112, ln. 5 – 7, where the common processor performs a corresponding processing action according to the label).
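For illustration only, selecting a computation from a layer identifier can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Li or the instant application: the two lookup tables, the layer type names, and the functions relu, max_pool_pairs and select_kernel are all hypothetical.

```python
def relu(values):
    """Hypothetical kernel: clamp negative values to zero."""
    return [max(0.0, v) for v in values]


def max_pool_pairs(values):
    """Hypothetical kernel: keep the maximum of each consecutive pair."""
    return [max(values[i:i + 2]) for i in range(0, len(values), 2)]


# Assumed mappings: layer identifier -> layer type -> kernel to launch.
LAYER_TYPE_BY_ID = {0: "relu", 1: "max_pool"}
KERNEL_BY_TYPE = {"relu": relu, "max_pool": max_pool_pairs}


def select_kernel(layer_id):
    """Resolve a layer identifier to the kernel for that layer's type."""
    layer_type = LAYER_TYPE_BY_ID[layer_id]
    return KERNEL_BY_TYPE[layer_type]


# The identifier alone determines which computation is performed.
relu_out = select_kernel(0)([-1.0, 2.0])
pool_out = select_kernel(1)([1.0, 3.0, 2.0, 0.0])
```

The two-step lookup mirrors the claim language: the identifier first resolves to a layer type, and the layer type then determines the nature of the computation.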

Regarding Claim 4, depending on Claim 2, Li further discloses: wherein the DNN network identifier enables processing of multiple DNN workloads by designating which network to use (See at least Li, para. 0112, ln. 1 – 2, where a unique queue ID [network identifier] is assigned to each of multiple pipeline queues [DNN workloads]).

Regarding Claim 5, depending on Claim 2, Das further discloses: wherein the previous/next DNN layer identifiers identify connected DNN layers (See at least Das, col. 17, ln. 15 – 18, where the WD may contain a pointer to a queue of jobs, where the jobs refer to the stages [DNN layers] of the pipeline in Goyal’s teaching).

Regarding Claim 6, depending on Claim 2, Das further discloses: wherein the queue packets include at least instructions on how to launch threads, provide a size of private memory allocation, provide a size of group memory allocation, provide a handle for an object in memory that includes an executable ISA image for a computation kernel, and control and synchronization information (See at least Das, Table 2, where Authority mask [memory allocations], Context save/restore pointer [control, synchronization information], and work descriptor as a job request are listed; col. 6, ln. 49 – 56, where the request includes the program to be executed [launch thread, executable for computation kernel]).

Regarding Claims 13 – 17, Claims 13 – 17 are the method claims corresponding to Claims 2 – 6. Claims 13 – 17 are rejected for the same reasons as Claims 2 – 6.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure: Lin et al., US20160275123A1, Pipeline Execution of Multiple Map-reduce Jobs. Lin discloses a pipeline of tasks performed by data node computing devices. The workflow configuration tables include job ID and data node IP information that is analogous to the network ID and layer ID of the instant application.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU whose telephone number is (571)272-9354.  The examiner can normally be reached on Monday - Friday 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CHAKI KAKALI, can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/S.C./Examiner, Art Unit 2122
/BRIAN M SMITH/Primary Examiner, Art Unit 2122