DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Drawings.  In view of amendment, the objections to the drawings are withdrawn.
Claim objections.  In view of amendment, the objection to claim 10 is withdrawn.

Applicant’s arguments with respect to the rejection of claims 11-12, 14, and 18 under 35 USC 112(a) have been fully considered but are not persuasive.
Applicant asserts that the “data reorganization unit” and “data supply unit” are disclosed in a manner that complies with the written description requirement because fig 2 and 4a provide details of the data reorganization unit and data supply unit and the specification including [0035] further mentions that electronic circuitry can be employed in implementations (Remarks p. 9-10). 
Examiner respectfully disagrees.  With respect to the written description requirement for limitations interpreted under 35 USC 112(f), it is not enough that one of ordinary skill in the art would understand the specification to disclose a structure, but whether that person would be capable of implementing that structure.  See MPEP 2181.II.A section discussing Aristocrat Techs. Australia Pty Ltd. v. Int’l Game Tech., 521 F.3d 1328 (Fed. Cir. 2008).  With respect to the “data reorganization unit”, figure 2 merely describes the data reorganization unit as a black box, and the structure described in the specification as “electronic circuitry” does not describe “that structure” of a “data the entire claim function.

Applicant’s arguments with respect to the rejection of claims 11-12, 14, and 18 under 35 USC 112(b) have been fully considered but are not persuasive.
For the same reasons asserted with respect to the rejection under 35 USC 112(a) above, Applicant asserts that the skilled artisan would understand the scope of the claims with reasonable certainty (Remarks p. 10-11).
Examiner respectfully disagrees.  Stating that known techniques or methods can be used does not disclose structure in the context of a means plus function limitation.  See MPEP 2181.II.A section discussing Biomedino, LLC v. Waters Technology Corp., 490 F.3d 946 (Fed. Cir. 2007).  For the reasons discussed above with respect to the written description, neither the specification nor the drawings disclose sufficient structure, material or acts to perform the entire claimed function of either the “data reorganization unit” or the “data supply unit”.

Applicant’s arguments with respect to the rejection of claims 1-12 and 14-18 under 35 USC 103, wherein claim 1 now incorporates previous claim 13, have been fully considered but are not persuasive.

With respect to assertions made as to fig 4B, Examiner points out that Fig 4C-436 Accelerator Integration is what is being pointed to, and that in this embodiment the Accelerator Integration is located within the processor (host unit).  Furthermore, Das discloses the accelerator integration circuit performing more than address translation.  Das discloses that the accelerator integration circuit provides cache management, memory access, context management, and interrupt management services on behalf of a plurality of GPUs of the graphics acceleration module ([0096] lines 1-5).  Examiner interprets the language “provides … on behalf of” to comprise “to configure”, and the memory access is interpreted as being for a subsequent data transfer between the main memory and the local memory.  See also [0097] lines 11-12, which discloses a cache within the accelerator integration circuit that stores commands and data for efficient access by the GPUs, and [0100] lines 6-8, which further describes the accelerator integration circuit as managing virtualization of the graphics processing engines, interrupts, and memory management.  For these reasons, Examiner maintains that Judd in view of 

Applicant’s arguments with respect to the rejection of claims 4 and 21, now rejected under 35 USC 103, have been fully considered but are not persuasive.
Applicant asserts that Judd does not disclose bit-serial multipliers configurable by software wherein the software comprises an application program carrying out linear algebraic computations.  Applicant asserts that Judd describes hardware control by a transposer 520 comprising input registers, not software control (Remarks p. 13-14).
Although Judd discloses an embodiment wherein a hardware transposer performs the function of configuring bit-serial multipliers to perform the linear algebraic operations in variable precision, Judd further discloses that while figures describe the bit serial tile as hardware, the tile may be emulated in software on a processor ([0050]).  Furthermore, the claim requires that the bit-serial multipliers are configurable by software.  As disclosed by paragraph [0050], although an embodiment is disclosed wherein the bit-serial multipliers are configured by hardware, they are also clearly configurable by software as disclosed in [0050].
Although Judd discloses the bit-serial multipliers are configurable by software to perform the linear algebraic operations, Judd does not explicitly disclose the software comprising an application program carrying out linear algebraic computations. However, in the same field of endeavor, Das discloses various graphics processing application program interfaces (APIs) supporting primitives supported by the GPU to include linear algebraic operations such as GEMM matrix operations ([0003], [0081], [0133], [0179], [0304]).

Specification
The specification is objected to because claim limitations “data reorganization unit” and “data supply unit” invoke 35 USC 112(f).  The broadest reasonable interpretation of these claim elements is the structure, material or acts described in the specification as performing the entire claimed function and equivalents.  See MPEP 2181.  The specification fails to provide an adequate description of these limitations in the form of the structure, material, or acts to perform the claimed function. See rejection under 35 USC 112(b) below for further details as to the description requirement.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

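The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.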

Claims 11-12 and 18 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement.  The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention.

The claim limitations “data reorganization unit”, recited in claims 11 and 18, and “data supply unit”, recited in claim 12, invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.  See the rejection under 35 USC 112(b) below for the specific reasons these elements lack structure, material, or acts for performing the entire claimed function, which results in this associated rejection for lack of written description of these required elements.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


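The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.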



Claims 11-12, 14, 18, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.

The claim limitations “data reorganization unit”, recited in claims 11 and 18, and “data supply unit”, recited in claim 12, invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the specification and drawings fail to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.
With respect to each of these limitations, the specification merely describes the limitation in purely functional terms, that is, in terms of what the limitation does rather than what it is.
With respect to the “data reorganization unit”, the specification describes the unit with no further structural or algorithmic detail than is claimed functionally.  See, e.g., [0066-0068] and fig 9.   
With respect to the “data supply unit”, the specification similarly describes the unit as comprising a memory stream unit and a scratchpad ([0049] and fig 4).  With respect to the scratchpad, the structure for the scratchpad is known to one of ordinary skill in the art as an embedded computer cache.  See, e.g., Hennessy et al., Computer Architecture: A Quantitative Approach, Elsevier Science and Technology, (2014), p. 480.  entire claimed function is not provided.
Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9, 11, 14-18, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over US 20170357891 A1 Judd et al., (hereinafter “Judd”)  in view of US 20180322390 A1 Das et al., (hereinafter “Das”).

Regarding claim 1, Judd teaches the following:
a co-processing module comprising a co-processing unit, the co-processing unit comprising a parallel array of bit-serial processing units, the bit-serial processing units being adapted to perform the linear algebraic operations with variable precision, (Fig 2, [0046-0059], device 200 for co-processing module, device 270 as further detailed in fig 3 for bit-serial processing unit, wherein devices 270 are arranged in a 2D parallel array and [0153] lines 4-7 for the array being configured as a systolic array, with last 6 lines of [0048] and 
Judd is silent with respect to the computing system further comprising a host unit.  However, in the same field of endeavor, Das discloses:
a host unit (Fig 1-101, [0044] lines 1-8) comprising 
a main memory (Fig 1-104);   
a central processing unit (Fig 1-102); and 
an offload engine adapted to configure the co-processing module for a subsequent data transfer between the main memory and the local memory (Fig 4C-436, [0104] Acceleration Integration unit for offload engine, with [0096-0098] describing configuration by the Acceleration Integration unit of data transfer between main memory (system memory-4410 and GPU comprising GFX  433 – M memories).
It would have been obvious to one of ordinary skill in the art before the effective filing date to include Judd’s co-processing module within a computing system that includes the host unit disclosed by Das, for the benefit of using Das’ host to allocate work to Judd’s co-processing module and/or multiple co-processing modules operating in parallel ([0044]).  This would merely be applying a known technique to a known device ready for improvement to yield predictable results. MPEP 2143 I.(D).


wherein the co-processing module comprises a local memory, the local memory comprising a bit-level memory layout (Fig 2 NBin, NBout wherein NBin is previously described in [0041] as a buffer; [0050] lines 1-5, [0052] lines 6-8 for describing a bit-level memory layout).

Regarding claim 3, in addition to the teachings addressed in the claim 1 analysis, Judd teaches the following:
wherein each of the bit-serial processing units comprises a bit-serial multiplier and a bit-serial adder (fig 3, [0059]).

Regarding claim 4, in addition to the teachings addressed in the claim 1 analysis, Judd teaches the following:
wherein the bit-serial multipliers are configurable by software to perform the linear algebraic operations in variable precisions from 1-bit to k-bit, wherein k is the maximum precision of the bit-serial multiplier, the software application comprising an application program carrying out linear algebraic computations ([0082], in this paragraph Judd discloses hardware configuring bit-serial multipliers of the bit serial tile, however Judd also discloses the tile may be emulated in software [0050], therefore the bit-serial multipliers are configurable by software).


Regarding claim 5, in addition to the teachings addressed in the claim 1 analysis, Judd teaches the following:
wherein the bit-serial processing units are bit-serial multiply-accumulate units configured to perform a multiply-accumulate operation (Fig 3, [0059] serial inner product circuit for bit serial multiply-accumulate unit performing a multiply accumulate by multiplication and accumulating the resulting sum).
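For illustration only, the general technique of a bit-serial multiply-accumulate operation with variable precision can be sketched as follows. This sketch is not taken from Judd, Das, or the application; the function name `bit_serial_mac`, its parameters, and the assumption of unsigned activations are hypothetical and chosen only to show how one activation bit is consumed per step and how the precision parameter controls the number of steps.

```python
def bit_serial_mac(activations, weights, precision):
    """Accumulate sum(a * w), consuming one activation bit per step.

    Assumption: activations are unsigned integers; only their low
    `precision` bits are used. Weights are ordinary (parallel) integers.
    """
    acc = 0
    for bit in range(precision):  # one step per activation bit
        partial = 0
        for a, w in zip(activations, weights):
            if (a >> bit) & 1:  # AND the serial bit with the weight
                partial += w
        acc += partial << bit  # scale the partial sum by the bit position
    return acc


full = bit_serial_mac([3, 5, 2], [4, -1, 7], precision=3)
reduced = bit_serial_mac([3, 5, 2], [4, -1, 7], precision=2)
```

Under these assumptions, lowering `precision` shortens the loop, which is the sense in which execution time scales with the selected bit width; the reduced-precision result uses only the low-order activation bits.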

Regarding claim 6, in addition to the teachings addressed in the claim 1 analysis, Judd teaches the following:


Regarding claim 7, in addition to the teachings addressed in the claim 1 analysis, Judd teaches the following:
wherein each of the bit-serial processing units comprises a bit-serial multiplier and a bit-serial adder, (fig 3, [0059]) and 
wherein the computing system comprises a bypass logic configured to support input data of variable precision ([0076], [0082] lines 6-7 and 16-17, fig 20 [0135-0137] OR gates for logic).
Judd discloses bypass logic configured to support input data of variable precision, but does not explicitly disclose the bit-serial multiplier comprising bypass logic at the multiplier.  However, in the same field of endeavor, Das discloses optimized compute hardware for machine learning operations that include multiply and accumulate operations (title, abstract, [0155], fig 7B, 7E).  Das further discloses bypass logic provided at each functional unit, wherein configuration of the functional unit includes a multiplier in support of a multiplication operation (Fig 6F multiplexer, [0158-0159]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to include bypass logic multiplexers at the multipliers as disclosed by Das at the same location in the apparatus of Judd, controlled by the bypass logic of Judd.  This would achieve the benefit that each functional unit within the compute unit can be configured to be bypassed ([0158] lines 6-8).  This would merely be applying a known 

Regarding claim 8, in addition to the teachings addressed in the claim 7 analysis, Judd teaches the following:
wherein the bypass logic is configured to use power gating or clock gating to deactivate unused stages of the bit-serial multiplier ([0076] lines 5-8).

Regarding claim 9, in addition to the teachings addressed in the claim 1 analysis, Judd teaches the following:
wherein the parallel array of bit-serial processing units is a 2-dimensional systolic array (Fig 2 devices 270 arranged in a 2D parallel array, and [0153] lines 4-7 for the array being configured as a systolic array).

Regarding claim 11, in addition to the teachings addressed in the claim 1 analysis, Judd teaches the following:
wherein the co-processing module comprises a local memory, the local memory comprising a bit-level memory layout (Fig 2 NBin, NBout wherein NBin is previously described in [0041] as a buffer; [0050] lines 1-5, [0052] lines 6-8 for describing a bit-level memory layout), further comprising 
a data reorganization unit (fig 6 Dispatcher) configured to 
receive input data in a brick-wise format (Fig 6 from Neuron Memory to Shuffler 510, [0061] lines 1-3, brick for byte-wise format) 
transform the input data into a bit-serial format (Fig 5 transposer 520); and 
store the input data in the bit-serial format in the local memory (Fig 6, from Transposer 520 to NBin 230, storing in 230, [0065] lines 9-10).
Judd teaches receiving the input data in brick-wise format, i.e., 2 bytes, but does not explicitly disclose receiving input data in a single byte format.  However, in another part of Judd’s disclosure, Judd discloses decomposable processing units, describing the common approach of using decomposable multipliers and adders, for example by configuring a 16-bit adder as two 8-bit adders.  It would have been obvious to one of ordinary skill in the art before the effective filing date to receive input data in a byte-wise format instead of a brick-wise format.  This would merely be the use of a known technique to improve a similar device in the same way. MPEP 2143 I.(C).  By modifying Judd to receive input data in a byte-wise format, Judd would achieve the benefit of supporting decomposable processing units that may be configured as 16-bit adders or 8-bit adders with minimal overhead, and of increasing computational throughput ([0087]).
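For illustration only, the general idea of transforming word-oriented input data into a bit-serial layout can be sketched as a bit-matrix transposition. This sketch is not taken from Judd or from the application; the function names `to_bit_serial` and `from_bit_serial`, and the choice to pack bit b of value i at position i of row b, are hypothetical assumptions made only to show the transformation.

```python
def to_bit_serial(values, precision):
    """Transpose unsigned integers into `precision` bit-plane rows.

    Assumption: row b holds bit b of every value, with value i stored
    at bit position i of the row.
    """
    rows = []
    for b in range(precision):
        row = 0
        for i, v in enumerate(values):
            row |= ((v >> b) & 1) << i
        rows.append(row)
    return rows


def from_bit_serial(rows, count):
    """Inverse of to_bit_serial for `count` values."""
    values = [0] * count
    for b, row in enumerate(rows):
        for i in range(count):
            values[i] |= ((row >> i) & 1) << b
    return values


planes = to_bit_serial([5, 3, 6], precision=3)
restored = from_bit_serial(planes, count=3)
```

In this layout each row can be streamed to a group of bit-serial units one bit position at a time, and the number of rows equals the selected precision.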

Regarding claim 14, in addition to the teachings addressed in the claim 1 analysis, Judd teaches the following:
wherein the linear algebraic operations are those of a deep neural network application (title, abstract, [0002]).



Regarding claim 16, in addition to the teachings addressed in the claim 15 analysis, Judd teaches the following:
performing, by the bit-serial processing units, multiply-accumulate operations with variable precision (Fig 3, [0059] serial inner product circuit for bit serial multiply-accumulate unit performing a multiply accumulate by multiplication and accumulating the resulting sum, last 6 lines of [0048] and [0079-0082] for variable precision).

Claim 17 is directed to a method that would be practiced by the apparatus of claim 7.  All steps recited in the method of claim 17 are contained within the apparatus of claim 7.  The rejection with respect to claim 7 applies equally to claim 17.

Claim 18 is directed to a method that would be practiced by the apparatus of claim 11.  All steps recited in the method of claim 18 are contained within the apparatus of claim 11.  The rejection with respect to claim 11 applies equally to claim 18.

Regarding claim 21, in addition to the teachings addressed in the claim 15 analysis, Judd teaches the following:
configurable by software).
Although Judd discloses the bit-serial multipliers are configurable by software to perform the linear algebraic operations, Judd does not explicitly disclose the software comprising an application program carrying out linear algebraic computations. However, in the same field of endeavor, Das discloses various graphics processing application program interfaces (APIs) supporting primitives supported by the GPU to include linear algebraic operations such as GEMM matrix operations ([0003], [0081], [0133], [0179], [0304]).  It would have been obvious to one of ordinary skill in the art before the effective filing date to implement the software disclosed by Judd to comprise an application program as disclosed by Das to implement the linear algebraic operations disclosed by Judd.  Judd discloses that the embodiment of fig 2 may be emulated in software on a processor such as a GPU ([0050]), and Das discloses application program interfaces (APIs) for this emulation.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Judd in view of Das in view of US 20200050918 A1 Chen et al., (hereinafter “Chen-2020”).

Regarding claim 10, in addition to the teachings addressed in the claim 1 analysis, Judd discloses dividing the dimensions of the matrix into H and W dimensions in number of partitions, but does not explicitly disclose that the parallel array of bit-serial processing units is a 1-dimensional array. However, in the same field of endeavor, Chen-2020 discloses a dynamically configurable processing device that includes an array of multipliers and accumulators with application in neural networks ([0005], Fig 2, Fig 4).  Among the configurations supported is vector multiplication (claim 14).  Chen-2020 further discloses configuration of the apparatus for operations in a single dimension ([0090-0091]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to include Chen-2020’s control of the dimensionality of the multiplication and accumulation array, including configurability for one dimension, to achieve the benefit of increased configurability of the array.  To do so would be merely to apply a known technique to a known device ready for improvement to yield predictable results. MPEP 2143 I.(D).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Judd in view of Das in view of T. Chen et al., DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning, ASPLOS ’14, 2014 (hereinafter “Chen-2014”).


the co-processing module comprises a local memory, the local memory comprising a bit-level memory layout (Fig 2 NBin, NBout, wherein NBin is previously described in [0041] as a buffer; [0050] lines 1-5, [0052] lines 6-8 for describing a bit-level memory layout), further comprising 
a data supply unit (Fig 6 dispatcher) adapted to 
configure the co-processing unit for a selected bit width ([0079-0082]); 
Judd discloses use of Neural Functional Units (NFUs) with local memory NBin, NBout, data flow at the bit level to and from local memory, and control of operation of the components by a controller (Fig 2, [0172]), but does not explicitly disclose the control mechanism for the data write and read operations to and from the local memory.
However, in the same field of endeavor, Chen-2014 discloses a similar tile-based accelerator for neural network applications comprising NFUs and local memory NBin, NBout (Fig 11, section 5-5.2). Chen-2014 further discloses control of data write operations from the local memory to the co-processing unit (Fig 11 Control Processor (CP) with control of DMA to NBin, and control of NFU; section 5.3.1, the CP drives the execution of DMAs of the three buffers and NFU; section 5.2.1, Width, first sentence describing write); and control of data read operations from the co-processing unit to the local memory (section 5.3.1, the CP drives the execution of DMAs of the three buffers and NFU; section 5.2.1, Width, first sentence describing read).  It would have been obvious to one of ordinary skill in the art before the effective filing date to use the control mechanism .

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EMILY E LAROCQUE whose telephone number is (469)295-9289.  The examiner can normally be reached on 7:30 am - 5:00 pm, CST, every other Friday off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor Aimee Li can be reached on 571-272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  


/EMILY E LAROCQUE/Examiner, Art Unit 2182