Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 9/22/2022 have been fully considered but they are not persuasive.
[P8]
	Applicant submits that the other recited features narrow the scope of the claims to avoid an inadequate disclosure. The Examiner disagrees, and reiterates that the specification does not support the broad scope of the claims.
The claims are directed to a storage device comprising an operator capable of converting data of any first format to any second format as the genus. The storage device reads stored data, converts the data using the operator, and stores the converted data.
In contrast, the specification discloses a storage device which converts data of a high precision format to a low precision format [SPEC, 0056]. The conversion allows bandwidth to be increased by reducing the data precision, and hence data size, for particular operations which are tolerant of the reduced precision in the input data.
The specification does not appear to support the genus as broadly claimed. Based on a review of the specification, the disclosed genus involves conversions from higher precision data to lower precision data or, at best, more general conversions which reduce the size of the data and provide a similar result (e.g., converting to smaller data).
	However, the specification does not appear to disclose embodiments where a storage device comprising an operator performs conversions from lower precision data to higher precision data, or from smaller data to larger data. Such species are clearly within the scope of the claimed genus. Further, the specification does not explain how the unsupported species in the claimed genus would achieve the same effects.
	While the MPEP does not require every detail of the claims to be explicitly disclosed, the MPEP also indicates that where a broad genus is claimed and insufficient species are disclosed, the disclosure may be inadequate. See MPEP 2163.II.A.3(a).ii. As above, the claimed genus does not appear commensurate in scope with the disclosed genus. Nor do the disclosed storage devices including precision or size reducing circuitry appear representative of storage devices comprising other circuitry for converting data.
	Accordingly, the rejection of claims 1-12, 14-19, 21-23, and 25 is maintained.

[P10]
	Applicant submits, with respect to “for an operation to be performed on the input data of the second format”, the operation is not optional, or otherwise must be performed. The Examiner disagrees.
	The Examiner notes that “an operation to be performed” is not the same as “performing an operation”. The limitation merely states that there is an intent to perform the operation, and not actual performance thereof. For example, the stated operation itself could later be cancelled at the accelerator, yet the operation would still be considered “to be performed” from the perspective of the storage device at the time of transmission. This is a clear counterexample to Applicant’s assertion that the operation “must” be performed, because transmitting data for an operation to be performed does not necessarily result in the operation being performed.
Under MPEP 2111.04, limitations which are not required are not to be given patentable weight. Accordingly, the Examiner maintains that the language merely conveys an intended use or step for processing the data, rather than an actual step which must be taken as part of the method itself.

[P9-13]
	Applicant submits that none of the cited prior art of record appears to teach or suggest the conversion is performed by an operator within a storage device.
The Examiner disagrees, and reiterates that Chai discloses a buffer chip comprising a data conversion apparatus for changing the precision of data. Adding the buffer chip to a memory of the computer allows for receiving and transmitting data which has been processed using data conversion. See Chai P2-5.
Chai further discloses details of the buffer chip having the conversion apparatus, specifically that the buffer chip is disposed on a memory [P6] and that the buffer chip, by use of the conversion logic, may convert data to a different precision [P8-9].
Including such a buffer chip in the host memory would allow the host memory to transmit data to device memory using a low precision format, thereby reducing the bandwidth consumed on the host to device bus. In the context of McKennon and Suda, who point out the problem of data transfer bandwidth between a host memory and a device memory and the available hardware solution disclosed by Chai, it would have been obvious to the skilled artisan to add the buffer chip of Chai to the host memory to decrease the size of data before transmitting it to the device memory in order to address the known bandwidth problem between the host memory and the device memory.
	Accordingly, Applicant’s arguments are unpersuasive.

[P13]
	Applicant submits that Chai does not disclose “converting, using an operator included in the storage device, the input data received from the host processor into a second format for an operation to be performed on the input data of the second format by an accelerator connected to the storage device”.
The Examiner notes that this argument is drawn to the amended claims, which further include “input data received from the host processor”. While the Examiner considers that data resident in host memory may be from the host processor, at least Chai does not expressly require this feature. As this element is new, it is further addressed below with reference to Bordawekar US 2011/0202745.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: “Memory Device Comprising Operator For Reducing Precision of Input Data”.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-12, 14-19, 21-23, and 25 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
	Specifically, the claims broadly recite storing data in Format1, converting the data from Format1 to Format2 by use of an operator, and storing the data in Format2. However, the claims are so broad as to encompass conversion of input data in any format to output data in any other format. Such claims are too broad in view of the specification’s limited disclosure of converting high precision data to low precision data for the purpose of reducing bandwidth usage. Based on Applicant’s limited disclosure, the skilled artisan would not have understood Applicant to have been in possession of the invention as broadly claimed.

[Image omitted: media_image1.png]

[MPEP 2163.II.A.3(a).ii]	
At best, Applicant’s disclosure describes converting high precision data to low precision data, which is not commensurate in scope with the claims. Neither does the limited disclosure reflect the full variety or scope of the claimed genus, which includes any possible conversion from any input data format to any output data format. Accordingly, Applicant’s disclosure does not appear to contain adequate support for possession of the full scope of the claim.
	Examiner suggests amending the claim to more closely correspond in scope with the embodiments detailed in Applicant’s disclosure. For example, claims 13, 20, and 24 reflect the concept that the conversion is from higher precision data (of a first size) to lower precision data (of a second size). Applicant is encouraged to amend the claims to specify wherein the first format is a higher precision format and the second format is a lower precision format.
	Claims 2-3, 5-12, 14-17, 19, 21-23, and 25 maintain similar breadth regarding the conversion operation, and are rejected on similar grounds.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1, 3, 5-7, 12-14, 22, 24-25 is/are rejected under 35 U.S.C. 103 as being unpatentable over McKennon’s “CUDA Host/Device Transfers and Data Movement” in view of Suda’s “Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks”, Chai WO 2016/169032 and Bordawekar US 2011/0202745.

[CLM 1]
1. A method of operating a storage device, the method comprising:
storing input data of a first format received from a host processor;
converting, using an operator included in the storage device, the input data received from the host processor into a second format for an operation to be performed on the input data of the second format by an accelerator connected to the storage device; and
re-storing the input data of the second format.
	It is noted that the claim limitations do not require specific storage or re-storage in the same part of the “storage device”. A computer may be broadly construed as a storage device, as a computer is understood to be a device which processes and stores data.
	Regarding the phrase “for an operation to be performed on the input data of the second format”, as per MPEP 2111.04, this feature does not require any specific step to be performed, and hence is not given patentable weight. However, for purposes of expediting prosecution, details regarding how the prior art may be applied are included.

	McKennon’s “CUDA Host/Device Transfers and Data Movement” teaches a method of operating a storage device, the method comprising:
	Method of operating a desktop computer [P2] to store and process data. 
storing received input data of a first format {received from a host processor};
“Allocate memory on the device…Copy data from host to device…Perform some calculations…Copy Data from Device to Host” [P1]
Specifically, McKennon discloses a typical host-accelerator data exchange. Host memory is separate from accelerator memory, and to provide the accelerator with input data stored in host memory, the input data is copied from host memory to accelerator memory. McKennon further draws attention to the requirements that such data transfers impose in terms of bandwidth between the host memory and the GPU memory.
re-storing the input data {of the second format}.
Data copied from the host memory is written to the accelerator memory [P1].

Further, regarding the input data, McKennon is clear as to the purpose of the data transfer as being for providing data from the host to an accelerator (GPU) in order to offload “calculations” from the CPU to the accelerator (GPU) [P1, 4].
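
For illustration only, the four steps quoted from McKennon (allocate, copy host to device, calculate, copy device to host) can be sketched as follows. This is a minimal sketch in plain Python rather than CUDA; the names host_memory, device_memory, allocate_device, copy, and kernel are hypothetical stand-ins and do not appear in McKennon, and the doubling calculation is an arbitrary placeholder.

```python
# Hypothetical stand-ins for host and device memory; these are not CUDA API calls.
host_memory = [1.0, 2.0, 3.0, 4.0]


def allocate_device(num_elements):
    """Step 1: allocate memory on the device."""
    return [0.0] * num_elements


def copy(dst, src):
    """Steps 2 and 4: copy the contents of src into dst; return the element count moved."""
    dst[:] = src
    return len(src)


def kernel(buf):
    """Step 3: a placeholder calculation (doubles each element in place)."""
    for i, value in enumerate(buf):
        buf[i] = 2.0 * value


device_memory = allocate_device(len(host_memory))
moved_to_device = copy(device_memory, host_memory)  # host -> device
kernel(device_memory)                               # calculation on the device
result = [0.0] * len(host_memory)
moved_to_host = copy(result, device_memory)         # device -> host
```

Every element crosses the host to device link once in each direction, which is the traffic to which McKennon's bandwidth discussion relates.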

	Where McKennon is silent, Suda teaches:
converting, {using an operator included in the storage device}, the input data {received from the host processor} into a second format for an operation to be performed on the input data of the second format by an accelerator connected to the storage device; and
Converting the input data from high precision to low precision before transmitting the input data to accelerator device memory, the accelerator to perform an operation on the received input data:
“While CNNs are proven indispensable in many computer vision applications, they consume significant amount of storage, external memory bandwidth, and computational resources…The performance limitation due to the external memory bandwidth can be alleviated by using reduced precision model weights.” [Suda, P18, S3.1]
“Traditionally CNN models are trained in CPU/GPU environments using 32-bit floating point data. Such high precision is not necessarily required in the testing or classification phase, owing to the redundancy in the over-parameterized CNN models [19]. Reducing data precision of the weights/data without any impact on the accuracy directly reduces the storage requirement as well as the energy for memory transfers.” [Suda, P18, S3.2]
re-storing the input data of the second format.
	Storing the input data at low precision in the device memory of the accelerator.
“Reducing data precision of the weights/data without any impact on the accuracy directly reduces the storage requirement as well as the energy for memory transfers.” [Suda, P18, S3.2]	
	
Specifically, both McKennon and Suda indicate that data transfers between a host memory (e.g. a main memory) and an accelerator memory are bottlenecked by the (external) memory bandwidth. Suda teaches that the bandwidth problem may be addressed, at least in part, by decreasing the precision of the input data being transmitted to the accelerator. In at least some contexts, the precision of the data stored in main memory is greater than what is needed at the accelerator, and the decreased precision can provide for vastly improved memory bandwidth. Hence, it is also not necessary to restore the precision of the input data even after the transmission.
	Accordingly, it would have been obvious to the skilled artisan before the effective filing date of the claimed invention to employ Suda’s technique of reducing the precision of input data before transmitting the input data, in low precision form, with McKennon’s methods for managing transfers between host and accelerator memory for the purpose of improving memory bandwidth between a host and an accelerator.

	Neither McKennon nor Suda specifically discusses a particular operator (e.g. a logic element, controller or processor) responsible for converting the input data from the higher precision form to a lower precision form. Hence, neither specifically addresses using an operator included in the storage device to effect data conversion. In addition, McKennon and Suda do not specifically state that the input data in the main memory was received from a host processor.

	Where the combination is silent, Chai WO 2016/169032 discloses a buffer chip comprising a data format conversion unit [P4-5], the conversion unit capable of converting between high precision and low precision data, and vice versa. The disclosed application of the buffer chip is to enable transmission between a main memory of the computer and a device memory of the computer using low precision, which consumes less bandwidth [P2]. Hence, Chai appears to teach a specific device (buffer chip) for performing a conversion function, the details of which were not specifically discussed in McKennon and Suda.
Hence, Chai teaches converting the input data into a second format for an operation to be performed on the input data of the second format using an operator included in the storage device; and
“an embodiment of the present invention provides a buffer chip, including: a bus interface, an address buffer unit, a control cache unit, a data buffer unit, and the apparatus according to any of the first aspects.” [P3]
	“The bus interface of the buffer chip in the memory receives the data copy command sent by the CPU of the central processor, and caches the data to be converted obtained by the copy to the data buffer unit according to the data copy command, and caches the storage address of the data to be converted to An address buffering unit, configured to cache the format conversion type of the data to be converted to a control cache module; and send the data copy command to a control module of the data format conversion device;” [P3-4]
	“The control module is configured to send, according to the received data copy command, a control instruction to a conversion module of the data format conversion device, where the data copy command includes the to-be-converted data, a format conversion type and information of an address of the data to be converted; the control instruction is used to instruct the conversion module to perform data format conversion and storage address mapping on the data to be converted;
The conversion module completes the data format conversion and the storage address mapping of the data to be converted according to the received control instruction, and sends the data to be converted after the data format conversion to the acceleration calculation unit.” [P4]

	Hence, Chai discloses a buffer chip which provides the functionality of converting high precision data, e.g. present in the main memory of the combination of McKennon and Suda, to low precision data for transmission over a bus. The buffer chip implements the general function of data conversion disclosed by Suda for decreasing the size of the data before it is transmitted from the host memory to the device memory, and Chai’s buffer chip is taught as being incorporated with a computer memory.
Therefore, it would have been obvious to the skilled artisan before the effective filing date of the claimed invention to incorporate known elements for performing a necessary task, i.e. to incorporate the buffer chip of Chai into the computer of the combination, e.g. as part of the main memory, in order to convert data stored at high precision into a lower precision form for more compact transmission across the memory bus, thereby improving the effective memory bandwidth.

With regard to storing input data of a first format received from a host processor, McKennon, Suda, and Chai do not specifically discuss the source of the input data received by the host memory. 
Where the combination is silent, Bordawekar US 2011/0202745 discloses further details of a process by which CPUs offload computations onto a GPU, such as that implemented by APIs such as CUDA or OpenCL [0019]. Such processes are relevant to the combination, as McKennon discusses CUDA and Suda discusses OpenCL, each of which is regarded as an interface for communication between a CPU and an accelerator to enable offloading computations to the accelerator.
Bordawekar discloses a system comprising a host CPU and a GPU (accelerator). Bordawekar discloses a process where the host CPU reads a file, performs computations on the read data, writes the data to host memory, and instructs the GPU to transfer the data from the host memory to the device memory in order to perform computations [0015, 0022]. For example, the CPU may read an input graph and write it to host memory for the GPU to access and perform further computations on [0018; 0022]. The preprocessing may be used to improve data access locality for the accelerator to improve efficiency [0019]. Alternatively, other computations may be performed by the CPU to generally prepare data for input to an accelerator [0023-0026].
	It would have been obvious to the skilled artisan before the effective filing date of the claimed invention to apply known processes for using APIs to enable a CPU to write preprocessed data to host memory for the purpose of offloading computations associated with the preprocessed data to a GPU, as disclosed by Bordawekar, to the system of the combination in order to implement a necessary step in such APIs, e.g. CUDA or OpenCL, for offloading computational tasks from the CPU to the accelerator and for balancing computations between the CPU and GPU [0014].
Alternatively, such modification may be motivated by improved efficiency when the CPU is enabled to preprocess and write data in the host memory for more efficient processing by the accelerator, e.g. via improved access locality [0014-0019].

[CLM 3]
3. The method of claim 1, wherein the second format has a lower memory bandwidth than the first format.
	The combination teaches claim 1, wherein the second format has a lower memory bandwidth than the first format. The use of a lower precision format taught by Suda is expressly for the property that it consumes less bandwidth per data element than the higher precision format. Specifically, the exemplary higher precision 32b floating point format requires more bits than the lower precision 16b floating point format [Suda, P18, S3.2]. Other lower precision formats, such as 8-bit precision, are also contemplated.
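
As a worked illustration of the per-element size difference noted above, the following sketch computes the bytes needed to transfer the same number of elements in each format. It is an assumption-laden example, not taken from Suda: the helper name transfer_bytes and the element count of one million are hypothetical, and the sizes come from Python's standard struct format codes ("f" for 32-bit float, "e" for 16-bit float, "b" for 8-bit integer).

```python
import struct


def transfer_bytes(num_elements, fmt):
    """Bytes needed to move num_elements values encoded with the given struct format."""
    return num_elements * struct.calcsize(fmt)


n = 1_000_000  # hypothetical number of weights/data elements

fp32_bytes = transfer_bytes(n, "<f")  # 32-bit floating point
fp16_bytes = transfer_bytes(n, "<e")  # 16-bit floating point
int8_bytes = transfer_bytes(n, "<b")  # 8-bit integer
```

Under these assumptions the FP16 transfer moves half as many bytes as the FP32 transfer, and the 8-bit transfer one quarter as many.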

[CLM 5]
5. The method of claim 1, wherein the operation to be performed on the input data in the second format by the operator or the accelerator receiving the input data of the second format from the storage device.
	The combination teaches claim 1, wherein the operation [is] to be performed on the input data in the second format by the operator or an accelerator receiving the input data of the second format from the storage device (receiving data transmitted from host memory at low precision at the accelerator [McKennon, P1-2]; [Suda, P18, S3.1], and subsequently performing the operation, e.g. training a CNN using the low precision FP16 as input data [Suda, P18, S3.1-3.2]).

[CLM 6]
6. The method of claim 1, wherein the operation to be performed on the input data is one of operations that are performed by a neural network configured to infer the input data.
	The combination teaches claim 1, wherein the operation to be performed on the input data is one of operations that are performed by a neural network configured to infer the input data (training the neural network to perform inference operations on the input data, such as optical recognition, classification, etc: “CNNs, which are primarily employed in computer vision applications such as character recognition [1], image classification [2] [9] [16] [17], video classification [3], face detection [4], gesture recognition [5], etc., are also being used in a wide range of fields including speech recognition [6], natural language processing [7] and text classification [8].” [Suda, P16, C2]).

[CLM 7]
7. The method of claim 1, further comprising:
converting result data of the operation performed on the input data into the first format; and outputting the result data of the first format.
	The combination teaches claim 1, further comprising converting result data of the operation performed on the input data into the first format; and
outputting the result data of the first format.
The disclosed buffer chip converts from high precision to low precision in one direction, and converts back from low precision to high precision in the other direction:
“The data format conversion module completes the conversion of the data format according to the instruction of the control module, and the specific The working mode can be divided into three types:
Transfer data from low precision to high precision: increase the bit width of the data format, fill the zeros in the upper part, and keep the original data in the lower part;
Transfer data from high precision to low precision: the bit width of the less data format, directly truncating the low part;
No data format conversion;
The selection of the working mode of the module, the specific data format conversion mode, and the converted data type are all implemented according to the control word in the control module instruction.” [Chai, P10]
	Therefore, the conversion mechanism of the combination includes means for converting low precision data received from the accelerator back to high precision data when received at the main memory. The combination does not prescribe whether to store the output of the accelerator in a particular precision. However, it would have been obvious to the skilled artisan before the effective filing date of the claimed invention to try one of the three available functions of the buffer chip to store the received low precision output data into the main memory, and the results would have been predictable because each data format has its own advantages (high precision data takes up more space in the memory but enables more accurate computations, while low precision data takes up less space but limits the accuracy of computations).
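
For illustration only, the three working modes quoted from Chai can be sketched on raw bit patterns as follows. This is one possible reading of the translated passage, not Chai's implementation: the function names widen, narrow, and convert and the mode strings are hypothetical, and "truncating the low part" is assumed here to mean discarding the low-order bits.

```python
def widen(value, src_bits, dst_bits):
    """Low to high precision: upper bits are zero, original data stays in the lower bits."""
    assert dst_bits >= src_bits
    return value & ((1 << src_bits) - 1)


def narrow(value, src_bits, dst_bits):
    """High to low precision: discard the low-order bits (assumed reading of the quote)."""
    assert src_bits >= dst_bits
    return (value & ((1 << src_bits) - 1)) >> (src_bits - dst_bits)


def convert(value, src_bits, dst_bits, mode):
    """Select a working mode, standing in for the control word in the quoted passage."""
    if mode == "widen":
        return widen(value, src_bits, dst_bits)
    if mode == "narrow":
        return narrow(value, src_bits, dst_bits)
    if mode == "none":
        return value
    raise ValueError(f"unknown mode: {mode}")


widened = convert(0xABCD, 16, 32, "widen")
narrowed = convert(0x12345678, 32, 16, "narrow")
unchanged = convert(0x00FF, 16, 16, "none")
```

Under this reading, the widening mode preserves every input bit while the narrowing mode is lossy, consistent with the trade-off between storage size and accuracy discussed above.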

[CLM 12]
12. The method of claim 1, wherein the storage device is included in a user terminal into which data to be inferred through a neural network that performs the operation are input or a server that receives the data to be inferred from the user terminal.
	The combination teaches claim 1, wherein the storage device is included in a user terminal into which data to be inferred through a neural network that performs the operation are input or a server that receives the data to be inferred from the user terminal. A user terminal is construed as any input device, such as a computer, to which the input data may be provided to feed the neural network. The combination teaches a computer, e.g. the host of McKennon, which provides data to device memory coupled to a neural network (e.g., a CNN implemented on FPGAs [Suda, Abstract and P18]).

[CLM 13]
13. The method of claim 1, wherein the first format is a 32-bit floating point (FP32) format and the second format is a 16-bit floating point (FP16) format or an 8-bit integer (INT8) format.
	The combination teaches claim 1, wherein the first format is a 32-bit floating point (FP32) format and the second format is a 16-bit floating point (FP16) format or an 8-bit integer (INT8) format (Suda discloses a NN having FP32 data and a neural network accepting FP16 [P18, C2]).

[CLM 14]
14. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, configure the one or more processors to perform the method of claim 1.
	Claim 14 is rejected on similar grounds as claim 1, as it is the CRSM embodying the method of claim 1.

[CLM 22]
22. An electronic device, comprising:
a storage device configured to store input data of a first format received from a host processor, convert, using an internal operator of the storage device, the input data of the first format received from the host processor into a second format, and re-store the input data of the second format; and
an accelerator configured to perform the operation on the input data of the second format received from the storage device.
	Claim 22 is rejected on similar grounds as claim 1, as it is the computer comprising the device of claim 1, and further including the accelerator to perform the operation.
The combination further teaches an accelerator (FPGAs and/or GPUs). The accelerator (FPGA) is configured to implement a neural network to perform operations such as recognizing images using lower precision data [Suda, P16, C2; P18, C1-2].
Alternatively, the accelerator (GPU) is configured to perform image processing operations on behalf of the host CPU [McKennon, P1-4][Chai, P2-5][Bordawekar, 0002-0005; 0014-0022].

[CLM 24]
24. The electronic device of claim 23, wherein the first format is a 32-bit floating point (FP32) format and the second format is a 16-bit floating point (FP16) format or an 8-bit integer (INT8) format.
	Claim 24 is rejected on similar grounds as claim 13, as it is the computer performing the method of claim 13.

[CLM 25]
25. The electronic device of claim 23, wherein the accelerator is configured to perform an inference operation on the input data of the second format received from the storage device.
	Claim 25 is rejected on similar grounds as claim 6, as it is the computer performing the method of claim 6.

Claim 2 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination as applied to claim 1 above, and further in view of Lin US 10,373,050.
Claim 23 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination as applied to claim 22 above, and further in view of Lin US 10,373,050.
[CLM 2]
2. The method of claim 1, wherein the converting comprises converting the input data of the first format into the second format by applying any one or any combination of any two or more of type converting, quantization, dequantization, padding, packing, and unpacking to the input data of the first format.
	The combination teaches claim 1, wherein the converting comprises converting the input data of the first format into the second format by applying any one or any combination of any two or more of type converting, quantization, dequantization, padding, packing, and unpacking to the input data of the first format.
	Specifically, the combination teaches converting input data at a high precision to a low precision, such as from FP32 to FP16. This constitutes quantization.
	Regarding the definition of “quantization”, as per Lin US 10,373,050, “quantization” is any function which maps a space of input values into a smaller space of output values. This broadly covers any function or algorithm which entails conversion from higher precision data to lower precision data. 
Accordingly, the combination is understood to teach a conversion which constitutes quantization of the input data, because the combination teaches a conversion between higher precision data, e.g. FP32, and lower precision data, e.g. FP16.
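
As a brief illustration of the mapping described above, the following sketch shows two distinct FP32 values that collapse to a single FP16 value. It is a hedged example using Python's standard struct module; the helper names to_fp32 and to_fp16 and the sample values are hypothetical and are not drawn from Lin or from the combination.

```python
import struct


def to_fp32(x):
    """Round a Python float to the nearest 32-bit floating point value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x):
    """Round a Python float to the nearest 16-bit floating point value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


a = to_fp32(1.0)
b = to_fp32(1.0001)  # distinct from 1.0 at FP32 precision

a16 = to_fp16(a)
b16 = to_fp16(b)     # FP16 spacing near 1.0 is about 0.000977, so this rounds to 1.0
```

Because both inputs land on the same FP16 output, the conversion maps a larger space of input values into a smaller space of output values.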

[CLM 23]
23. The electronic device of claim 22, wherein the storage device comprises the internal operator configured to convert the input data of the first format into the second format by applying any one or any combination of any two or more of type converting, quantization, dequantization, padding, packing, and unpacking to the input data of the first format.
	Claim 23 is rejected on similar grounds as claim 2, as it recites the electronic device which performs the method of claim 2.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over the combination as applied to claim 1 above, and further in view of Ginsburg US 2017/0372202.
[CLM 4]
4. The method of claim 1, wherein the operation to be performed on the input data is a low precision operation in the second format and has a lower precision than a high precision operation in the first format.
	Claim 4 does not require any additional steps to be performed – it merely discusses an intended use for the data produced by the method. See MPEP 2111.04. Hence, the limitations are not given patentable weight.
	For purposes of expediting prosecution, further consideration is given:
Suda further discloses that CNN operations may still be performed in 32-bit floating point precision despite the use of rounded-off weights. The loss of accuracy in that context was observed to be insignificant until the precision fell below 8 bits [P18, S3.2].
Where the combination is silent, Ginsburg discloses wherein the operation to be performed on the input data is a low precision operation in the second format and has a lower precision than a high precision operation in the first format. 
	Ginsburg US 2017/0372202 discloses that the use of FP16 data and operations for training neural networks offers certain advantages over FP32 data and operations. Specifically, FP16 operations are faster to perform than single-precision (FP32) operations, and FP16 data is smaller and hence transmits faster and requires less bandwidth [0004-0005]. A potential disadvantage exists [0006], but the use of FP16 is particularly useful if the range of the input data fits within the range covered by FP16 [0004], and the disadvantage may be reduced by range extension techniques disclosed by Ginsburg [0022].
	Hence, it would have been obvious to the skilled artisan before the effective filing date of the claimed invention to perform FP16 operations as opposed to FP32 operations, as disclosed by Ginsburg, in training the neural network of the combination, in order to improve the performance of the neural network [Ginsburg, 0005].
	Alternatively, it would have been obvious to the skilled artisan before the effective filing date of the claimed invention to try substituting FP16 operations for the FP32 operations of the combination, in order to improve the performance of the neural network [Ginsburg, 0005]. The results of the substitution would have been predictable, as the advantages and disadvantages of the two data formats and operations were known, and the field of art has generally realized the potential benefits of performing the substitution [0004-0006].
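For illustration only, the size and range trade-off discussed above can be sketched in Python. The helper names `packed_size` and `fits_fp16` are hypothetical, the sketch assumes IEEE 754 binary32 and binary16 encodings, and it does not reproduce the range extension techniques of Ginsburg.

```python
import struct

def packed_size(values, fmt):
    """Number of bytes needed to store values in the given struct format
    ('f' for FP32, 'e' for FP16). Hypothetical helper."""
    return len(struct.pack('<%d%s' % (len(values), fmt), *values))

def fits_fp16(x: float) -> bool:
    """True if x can be encoded as FP16 without overflowing (hypothetical helper)."""
    try:
        struct.pack('<e', x)
        return True
    except OverflowError:
        return False

values = [0.5, 1.5, -3.0, 100.0]
size_fp32 = packed_size(values, 'f')   # 4 bytes per value
size_fp16 = packed_size(values, 'e')   # 2 bytes per value
print(size_fp32, size_fp16)

# 65504 is the largest finite FP16 value; larger magnitudes do not fit.
print(fits_fp16(65504.0), fits_fp16(1.0e5))
```

The same values occupy half as many bytes in FP16 as in FP32, which corresponds to the reduced bandwidth noted above, while inputs outside the FP16 range cannot be represented without some further measure.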

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over the combination as applied to claim 1 above, and further in view of Son “Reducing Memory Access Latency with Asymmetric DRAM Bank Organizations” and Oyadoma US 2005/0010691.
[CLM 8]
8. The method of claim 1, wherein the operator is disposed adjacent to a bank configured to store data in the storage device.
	The combination teaches claim 1, and further teaches placing the operator at the output of the main memory (or, at least on the main memory side of the host to device transmission link). However, the combination does not specifically disclose wherein the operator is disposed adjacent to a bank configured to store data in the storage device.
Regarding whether main memory comprises a bank, host memory is commonly implemented using DRAM which comprises a bank structure. See Son:
“DRAM has been a de facto standard for main memory for decades thanks to its high density and performance. DRAM has more than ten times higher storage density than SRAM and is orders of magnitude faster than NAND flash devices. With continued technology scaling, DRAM devices have evolved to exploit these smaller and faster transistors to increase mainly their capacity and bandwidth under tight cost constraints.” [P380, C2].
“DRAM devices have adopted prefetching and multi-bank architectures [24] to improve the sequential and random access bandwidth. Instead of increasing the internal operating frequency through deep pipelining (i.e., reducing tCCD), the DRAM mats transfer more bits in parallel through a wide datapath to keep up with ever-surging bandwidth demands.” [P381, “Modern DRAM Device Organization”].
Hence, it would have been obvious to the skilled artisan before the effective filing date of the claimed invention to employ DRAM comprising a multibank structure for the main memory of the combination, and the results would have been predictable (using the de facto standard for main memory as the host main memory).

Further, it is generally recognized that physical distance entails greater propagation times and hence increases latency. See Oyadoma US 2005/0010691 [0014]. Therefore, the skilled artisan would have been motivated to dispose the buffer chip as close as possible to the memory bank that is storing the input data.
Accordingly, it would have been obvious to the skilled artisan before the effective filing date of the claimed invention to place the buffer chip of Chai at a location as close as possible, e.g. adjacent, to a memory bank of the main memory of the combination in order to shorten data propagation time between the memory and the buffer, thereby providing lower latency.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over the combination as applied to claim 1 above, and further in view of Rusterholz US 4,873,630.
[CLM 9]
9. The method of claim 1, wherein the operator comprises an arithmetic logic unit (ALU) configured to perform a predetermined operation.
	The combination teaches claim 1. Chai further discloses a conversion module, but does not specifically disclose an ALU. Where the combination is silent, Rusterholz US 4,873,630 teaches a conversion process from floating point to floating point by use of an ALU, and hence teaches wherein the operator comprises an arithmetic logic unit (ALU) configured to perform a predetermined operation [C98, p2]:
“The final conversion type is conversion from floating to floating point. This is shown in FIG. 92. The operands enter at the bottom via the characteristic and mantissa augend and addend registers. Just the exponent is processed and then everything is staged on up to the shift unit where the rest of the convert instruction occurs. First, the operand is made positive and the mantissa is separated. The mantissa is then shifted with a zero fill by three or not shifted at all, depending on the conversion type. The exponent and the mantissa are then remerged. Characteristic overflow or underflow is detected which can only occur if the conversion is from double to single precision. Then the ALU-out register is zeroed, if the mantissa was zero.”
Hence, Rusterholz indicates that FP-FP conversion is performed by use of an ALU.
Accordingly, it would have been obvious to the skilled artisan before the effective filing date of the claimed invention to implement the conversion module of the combination using mathematical operations performed by an ALU, as disclosed by Rusterholz. The results would have been predictable because FP32 to FP16 conversion processes were known, as was a process for performing floating point to floating point conversion using an ALU.
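For illustration only, a floating point to floating point conversion built from ALU-style integer operations (masking, shifting, subtracting, merging) can be sketched in Python. This is not the circuit of Rusterholz or of any other cited reference. The function name `fp32_to_fp16_bits` is hypothetical, and the sketch assumes IEEE 754 binary32 and binary16 layouts, truncates the mantissa rather than rounding, and does not handle subnormals, infinities, or NaN.

```python
import struct

def fp32_to_fp16_bits(x: float) -> int:
    """Convert x (taken as FP32) to a 16-bit FP16 pattern using only
    integer shift, mask, and add/subtract steps. Hypothetical sketch."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]

    # Separate sign, exponent (characteristic), and mantissa.
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF

    if exponent == 0 and mantissa == 0:
        return sign << 15                      # signed zero
    if exponent == 0xFF:
        raise ValueError("infinity/NaN not handled in this sketch")

    # Re-bias the exponent from FP32 (bias 127) to FP16 (bias 15).
    new_exponent = exponent - 127 + 15
    if new_exponent >= 31:
        raise OverflowError("characteristic overflow")
    if new_exponent <= 0:
        raise ArithmeticError("characteristic underflow")

    # Shift the 23-bit mantissa down to 10 bits (truncation).
    new_mantissa = mantissa >> 13

    # Remerge sign, exponent, and mantissa.
    return (sign << 15) | (new_exponent << 10) | new_mantissa

print(hex(fp32_to_fp16_bits(1.0)), hex(fp32_to_fp16_bits(-2.5)))
```

For values that are exactly representable in FP16, the bit pattern produced by this sketch matches the pattern produced by the `struct` module's native half-precision encoding; overflow and underflow of the exponent arise only when narrowing the format, consistent with the quoted passage.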

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over the combination as applied to claim 1 above, and alternatively further in view of Amir’s “3-D Stacked Image Sensor With Deep Neural Network Computation”.
[CLM 10]
10. The method of claim 1, wherein the input data comprise at least one of: image data of the first format captured by an image sensor; and data of the first format processed by the host processor configured to control either one or both of the storage device and the accelerator connected to the storage device.
	The combination teaches claim 1, wherein the input data comprise at least one of: image data of the first format captured by an image sensor; and data of the first format processed by a host processor configured to control either one or both of the storage device and an accelerator connected to the storage device (input data comprises an image [Suda, P17, S2]; input data further comprises data, e.g. graphs, preprocessed by the CPU for input to the accelerator [Bordawekar, 0018-0022]).
	In particular, Suda recites: “Convolutional Neural Networks (CNNs), inspired by visual cortex of the brain, are a category of feed-forward artificial neural networks. CNNs, which are primarily employed in computer vision applications such as character recognition [1], image classification [2] [9] [16] [17], video classification [3], face detection [4], gesture recognition [5], etc., are also being used in a wide range of fields including speech recognition [6], natural language processing [7] and text classification [8]. Over the past decade, the accuracy and performance of CNN-based algorithms improved significantly, mainly due to the enhanced network structures enabled by massive training datasets and increased raw computational power aided by CMOS scaling to train the models in a reasonable amount of time.” [Suda, P16, S1].
	For purposes of examination, an image sensor is construed as any element which converts visual information into electrical signals.
	The class of problems referenced by Suda would have been broadly recognized by the skilled artisan as the processing of real-world image data which is commonly obtained by an image sensor, e.g. a camera. Hence, Suda is considered to implicitly teach the use of an image sensor to capture the disclosed characters, images, videos, faces, and gestures.

	Alternatively, while Suda does not specifically discuss an image capture device or image sensor, the use of image sensors for the acquisition of input data for a neural network was known. See Amir [P1]: “an image sensor captures and sends data to a distant processing engine to perform DNN operations”.
Accordingly, it would have been obvious to the skilled artisan before the effective filing date of the claimed invention to obtain the input data representing a character, image, video, face, or gesture disclosed by the combination by use of image sensors, and the results would have been predictable (generating digital data for training the neural network).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over the combination as applied to claim 1 above, and further in view of Mellempudi US 2018/0322382.
[CLM 11]
11. The method of claim 1, wherein the storage device is a dynamic random-access memory (DRAM) located outside an accelerator that performs the operation.
	The combination teaches claim 1, wherein the storage device is a dynamic random-access memory (DRAM) located outside an accelerator that performs the operation.
	In the combination, the buffer chip was combined with host main memory disclosed by McKennon.
While the combination is silent, it is well known that main memory is commonly implemented using DRAM. See e.g. Mellempudi US 2018/0322382: “By way of example, and not limitation, the processor memories 401-402 and GPU memories 420-423 may be volatile memories such as dynamic random-access memories (DRAMs) (including stacked DRAMs), Graphics DDR SDRAM (GDDR) (e.g., GDDR5, GDDR6), or High Bandwidth Memory (HBM) and/or may be non-volatile memories such as 3D XPoint or Nano-Ram. In one embodiment, some portion of the memories may be volatile memory and another portion may be non-volatile memory (e.g., using a two-level memory (2LM) hierarchy).” [0084]
	Accordingly, it would have been obvious to the skilled artisan before the effective filing date of the claimed invention to substitute the generic, unspecified host memory disclosed by McKennon with the DRAM disclosed by Mellempudi, and the results would have been predictable (providing a memory device for data storage purposes).

Allowable Subject Matter
Claim 20 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

As allowable subject matter has been indicated, applicant's reply must either comply with all formal requirements or specifically traverse each requirement not complied with.  See 37 CFR 1.111(b) and MPEP § 707.07(a).

The following is a statement of reasons for the indication of allowable subject matter:
	None of the cited prior art of record appears to teach or suggest the combination of the following features:
A storage device, comprising:
a bank configured to store received input data of a first format; and
an operator disposed adjacent to the bank and configured to convert the input data into a second format for an operation to be performed on the input data of the second format,
wherein the input data of the second format are re-stored in the bank, and
	wherein the first format is a 32-bit floating point (FP32) format and the second format is a 16-bit floating point (FP16) format or an 8-bit integer (INT8) format.
	The cited prior art of record teaches:
A host memory which stores data to be copied to a device memory [McKennon, P1-2].
	A host memory being implementable in DRAM [Son, P380-382].
	DRAM comprising banks [Son, P381-382].
The host memory storing data at a high precision [Suda, P18, C1].
The accelerator device accepting data at a low precision [Suda, P18, C2].
A buffer chip comprising conversion means to reduce a precision of input data [Chai, P1-6, 9].
	Motivation for the skilled artisan to reduce a precision of the data prior to transmission for the purpose of reducing the size of the input data to be transmitted, thereby reducing the bandwidth consumed when transmitting the data to the device from the host [Suda, P18, C1-2][Chai, P2].

	However, none of the cited prior art of record appears to specifically teach or suggest re-storing the input data of the second format in the same bank of memory as the input data of the first format. Nor does the cited prior art of record appear to teach or suggest why such a change should be made.
	As a general matter, it is recognized that storing the output of a data processing operation, such as the conversion process, to a storage device allows later accesses to the data to be served at a lower latency. Hence, the skilled artisan could have considered storing the converted data back into main memory for the purpose of providing a readily available copy of the low precision data, thereby avoiding a need to reprocess the high precision data to again obtain the low precision data by use of the conversion module.
	However, none of the cited prior art of record appears to teach or suggest adopting such a line of reasoning, at least in combination with the context provided by the other recited features. Moreover, such a line of reasoning does not specifically address storing the output into the same bank of memory as the input data stored at the higher precision, as opposed to any available memory.
	Accordingly, claim 20 appears to contain allowable subject matter.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HEWY H LI whose telephone number is (571)272-8714. The examiner can normally be reached Mon-Fri 10-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones can be reached on (571)272-4085. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HEWY H LI/Examiner, Art Unit 2136
/CHARLES RONES/Supervisory Patent Examiner, Art Unit 2136