United States Patent and Trademark Office

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov











BEFORE THE PATENT TRIAL AND APPEAL BOARD


Application Number: 16/585,521
Filing Date: 27 Sep 2019
Appellant(s): AKIN et al.



__________________
Ted A. Crawford (Reg. No. 50,610)
For Appellant


EXAMINER’S ANSWER





This is in response to the appeal brief filed October 14, 2021.
(1) Grounds of Rejection to be Reviewed on Appeal
Every ground of rejection set forth in the Office action dated May 18, 2021 from which the appeal is taken is being maintained by the examiner except for the grounds of rejection (if any) listed under the subheading “WITHDRAWN REJECTIONS.”  New grounds of rejection (if any) are provided under the subheading “NEW GROUNDS OF REJECTION.”

The following ground(s) of rejection are applicable to the appealed claims.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 3-10, 12-13, 15, 16, 18-20, and 22 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Scott (patent No. 7,437,521).

Scott taught the invention as claimed, including (as to claim 1) an apparatus comprising: a first circuitry of a near-memory processor (vector processing unit) to: receive data access instructions to access a system memory (the P circuit in fig. 1L, or the vector load/store unit, receives the data access instructions) (e.g., see col. 24, lines 31-63 and col. 20, lines 32-46); the data access instructions having corresponding compute 


exchange synchronization information (MSync marker, Lsync marker) with the second circuitry to store one or more data chunks from the accessed system memory to a local memory at the near-memory processor for the second circuitry to use for one or more compute iterations (e.g., see col. 37, line 43-col. 39, line 60 and col. 3, lines 11-23); and map data access operations for the first circuitry to access the system memory to obtain the one or more data chunks based on the data access instructions and the exchanged synchronization information (e.g., see col. 27, line 28-col. 28, line 26, col. 30, lines 1-17, and col. 23, line 50-col. 24, line 25).

As to claims 1, 13, 20 (not in claim 10), Scott also taught the first circuitry to map the data access operations such that a memory access bandwidth to obtain and store the one or more data chunks to the local memory substantially matches a computing throughput for the second circuitry to compute results in the one or more compute iterations using the one or more data chunks stored to the local memory (e.g., see col. 3, lines 12-23, col. 6, lines 56-67 and col. 23, line 64-col. 24, line 9) [note: by synchronizing the data access with the compute operation using mapped flag(s) stored in queues, Scott obtains a memory access bandwidth that matches the throughput of the compute operation].


As to claim 3, Scott taught the apparatus of claim 2, the exchanged synchronization information comprises a barrier synchronization primitive to indicate to the first circuitry a number of subsequent data chunks to access from the system memory while the second circuitry computes results using at least a portion of the one or more data chunks, the number of subsequent data chunks determined based on substantially matching the memory access bandwidth to the computing throughput (e.g., see abstract and col. 10, line 46-col. 11, line 16, col. 47, line 15-col. 48, line 8, and col. 50, lines 3-64).

As to claims 4 and 12, Scott taught the apparatus of claim 1, further comprising the first circuitry to: access the system memory through an interface (ecache interface unit “EIU 116” in fig. 1H and memory interface of E circuit 101 in fig. 2) coupled with the near-memory processor via one or more memory channels to obtain the one or more data chunks via the one or more memory channels based on the mapped data access operations; and store the one or more data chunks to the local memory based on the mapped data access operations (e.g., see figs. 1H, 1L, 2 and col. 13, lines 33-47). As to the further limitation of storing the results to the system memory based on the data access instructions, Scott taught this limitation (e.g., see col. 12, lines 7-37 and col. 31, lines 32-46).

As to claim 5, Scott taught the apparatus of claim 4, further comprising the first circuitry to: obtain results of the one or more compute iterations based on the exchanged synchronization information; and cause the results to be stored in the system memory based on the data access instructions (e.g., see figs. 2, 5 and col. 4, lines 16-30 and col. 14, line 39-col. 15, line 11).

As to claim 6, Scott taught the apparatus of claim 1, comprising the data access instructions included in instructions received from an application hosted by a computing platform that also hosts the near-memory processor (e.g., see col. 3, lines 1-42).

As to claims 7 and 22, Scott taught the apparatus of claim 1, comprising: the first circuitry to include one or more access processors (VEU 20); and the second circuitry (V/LS 22) to include one or more execute processors and one or more vector functional units, respective one or more execute processors to control respective one or more vector functional units for the respective one or more vector functional units to compute the results in the one or more compute iterations (e.g., see col. 21, line 28-col. 22, line 67).

As to claim 8, Scott taught the apparatus of claim 7, comprising the local memory arranged in a centralized configuration via which the one or more execute processors or vector functional units separately have access to the local memory (e.g., see figs. 1B, 1H, 2, 3, 5).



As to claim 13, Scott taught an apparatus comprising: a first circuitry of a near-memory processor to: receive compute instructions to use one or more data chunks accessed from a system memory in one or more compute iterations (e.g., see figs. 1H, 1L) [vector unit receives the compute instructions] (e.g., see col. 24, lines 31-63, col. 20, lines 32-46, and col. 21, lines 28-55), the one or more data chunks accessed and stored to a local memory by a second circuitry of the near-memory processor based on corresponding data access instructions (e.g., see col. 4, lines 1-10 and col. 23, line 14-col. 24, line 19); exchange synchronization information with the second circuitry to access the one or more data chunks stored to the local memory for use in the one or more compute iterations (e.g., see col. 37, line 43-col. 39, line 60 and col. 3, lines 11-23); and map compute operations for the first circuitry to use the one or more data chunks based on the received compute instructions and the exchanged synchronization information (e.g., see col. 27, line 28-col. 28, line 26, col. 30, lines 1-17, and col. 23, line 50-col. 24, line 25).

As to claim 14 Scott taught The apparatus of claim 13, comprising the first circuitry to map the compute operations such that a computing throughput for the first circuitry to 

As to claim 15, Scott taught the apparatus of claim 13, further comprising the first circuitry to access the local memory through an interface (ecache interface unit) to the local memory to obtain the one or more data chunks based on the mapped compute operations; and store results of the one or more compute iterations to the local memory through the interface based on the exchanged synchronization information (e.g., see figs. 1H, 1L, 2 and col. 13, lines 33-47).

As to claim 16, Scott taught the apparatus of claim 13, comprising: the first circuitry including one or more execute processors and one or more vector functional units (VEU 20), respective one or more execute processors (V/LS 22) to control respective one or more vector functional units for the respective one or more vector functional units to compute the results in the one or more compute iterations (e.g., see col. 21, line 28-col. 22, line 67); and



As to claim 18, Scott taught the apparatus of claim 16, comprising the local memory arranged in a centralized configuration via which the one or more vector functional units separately have access to the local memory (e.g., see figs. 2, 1H and col. 12, lines 14-28).

As to claim 19, Scott taught the apparatus of claim 16, comprising the local memory arranged in a distributed configuration via which the one or more vector functional units (FUG1, FUG2, FUG3, FUG3) have access to allocated portions of the local memory (e.g., see figs. 3, 5 and col. 21, lines 28-63 and col. 30, line 62-col. 31, line 10).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Scott (patent No. 7,437,521).
As to claim 17 Scott taught The apparatus of claim 16, and Scott taught vector functional units (e.g., see fig. 3 and col. 21, lines 27-63) but did not expressly detail that .
(2) Response to Argument

Appellant argues on pages 21-24 of the Brief that the first and second circuitry of claims 1, 10, 13, 20 must be interpreted narrowly in accordance with an embodiment in the specification, and presents two arguments alleging that the rejection did not clearly explain how the Scott reference could or does include the claimed first and second circuitry.
Appellant's construction of first circuitry… and second circuitry… attempts to improperly limit these terms to a particular disclosed embodiment. However, the terms first circuitry… and second circuitry… do not preclude the first and second circuitry from being part of one larger apparatus (or portion of an apparatus). The claim language encompasses first and second circuitry that are on the same apparatus (or portion of the same apparatus) and that are independent and decoupled to operate and transfer data therebetween, and the vector unit and load/store unit (and/or scalar unit) of Scott operate in an independent and decoupled manner using queues. Note, as to the argument that the scalar unit and vector unit of Scott are dependent, Scott taught the scalar address and operand computation are in one embodiment operated in parallel to 

Appellant argues on pages 24-25 of the Brief that the bandwidth matching throughput limitation (claimed in claims 1, 10, 13, 20) must be interpreted narrowly with respect to an embodiment in the specification. Appellant argues that Scott's teaching of decoupling means that load/store operations execute independently of execute operations, and that Scott's teaching of synchronization is limited to the ordering of instructions, so that Scott does not meet the claimed memory access bandwidth matching the throughput of the compute operation. The Examiner disagrees and contends that the Scott teaching (e.g., at col. 3, lines 52-67) of queues to store operands between the vector and scalar units, and the use of a queue in the load/store unit, provide the equalization of the bandwidth and throughput (see figs. 1L, 2), and the Scott teaching of equalization of memory bandwidth and throughput (e.g., see col. 6, lines 54-67) also provides the claimed bandwidth limitation.
Appellant argues that the arguments discussed above apply to each of the dependent claims 3-9, 12, 15-16, 17, 18-19 and 22. The Examiner contends that Scott teaches the limitations of each dependent claim as addressed in the final rejection and in the arguments discussed above with respect to the independent claims.


For the above reasons, it is believed that the rejections should be sustained.
Respectfully submitted,

/ERIC COLEMAN/Primary Examiner, Art Unit 2183

Conferees:
/IDRISS N ALROBAYE/Supervisory Patent Examiner, Art Unit 2181    

/KEVIN L ELLIS/Primary Examiner

Requirement to pay appeal forwarding fee.  In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.