Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


This is a supplemental non-final action because the non-final Office action dated May 26, 2022 was incomplete.


DETAILED ACTION

Claim Rejections - 35 USC § 103

1.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
	
2.	Claims 1, 4-6, 9, 10-13, and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Henry et al (THIRD-PARTY SUBMISSION UNDER CFR 129, Pub. No. US 20150227429).
	As per independent claim 1, Henry et al disclose the invention substantially as claimed, including: 
Claim 1. A method for optimizing matrix conversions, comprising (para. [0077], “[a]t block 802, a programmer develops a program, such as graphics software, with a conventional programming language, such as the C language, and invokes an approximation-aware compiler with an approximation directive.”): 
receiving a matrix of real numbers, wherein each real number is represented by a mantissa and an exponent; identifying a mathematical operation to be performed on the matrix (para. [0019], “[t]he processor 100 comprises a programmable data processor that performs stored instructions, such as a central processing unit (CPU) or a graphics processing unit (GPU).”; and para. [0027] “[e]xamples of functional units include, but are not limited to, execution units, such as an integer unit, a single issue multiple data (SIMD) unit, a multimedia unit, and a floating point unit, such as a floating point multiplier, floating point divider and floating point adder. Advantageously, the approximating functional units 106 consume less power when performing approximate computations than when performing normal computations.”); 
determining, based on the matrix and the mathematical operation, a computing resource requirement (para. [0002], “[a]pproximate computing attempts to perform computations in a manner that reduces power consumption in exchange for potentially reduced accuracy.” and paras. [0071]-[0072], “[s]till further, the program may detect a change in the power source, such as plugging into or unplugging from a wall outlet. Flow proceeds to block 502. 
At block 502, the program determines the approximation policy based on the system configuration, as described above”);
determining that the required computing resource requirement exceeds a threshold (para. [0026], “[i]n one embodiment, the approximation control register 132 holds information that specifies the approximation policy 176 for the processor 100 that is provided to the approximating functional units 106. Preferably, the approximation control register 132 includes an approximation flag, an approximation amount, and an error bound (or error threshold). The approximation flag indicates whether computations performed by the approximating functional units 106 should be full accuracy computations or approximate computations, i.e., in full accuracy mode or approximate computation mode (or approximating mode)”); 
(para. [0029], “…[f]or example, if the approximation mode is full accuracy, the power control 206 causes power to be provided to the transistors of the least significant bit multiplication gates 204; whereas, if the approximation mode is less than the full accuracy, the power control 206 causes power not to be provided to the transistors of the least significant bit multiplication gates 204. In one embodiment, the least significant bit multiplication gates 204 are grouped such that the power control 206 powers off the gates associated with the multiplication of lesser or fewer of the least significant bits based on the approximation amount indicated in the approximation policy 176.”);
wherein the converted matrix minimizes error between a sum of the matrix and a sum of the converted matrix (para. [0024], “…if the accumulated error of a result of an approximate computation exceeds an error bound, the processor 100 may restore its state from the snapshot 134 and re-perform the computations without approximation”);
generating a result based on the mathematical operation and the converted matrix and providing the result (para [0023] “[e]ach time an approximating functional unit 106 generates a result 164 (which is written to an architectural register 108), the approximating functional unit 106 also generates an indication of the amount error 168 associated with the result 164 that has accumulated due to approximating computations”).
It is noted that Henry et al do not specifically detail the claimed “converting the matrix to a converted matrix”; however, the approximating functional units 106 have three embodiments (see Fig. 2 and para. [0028]). Therefore, the feature is equivalent to the claimed “converting”. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to design the claimed invention according to Henry et al’s teaching because the reference discloses a device having the optimizing feature as claimed.
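For illustration only, the sequence of steps recited in claim 1 can be sketched as follows. The names `resource_requirement`, `convert_matrix`, `run`, and `THRESHOLD_BITS`, the use of total storage bits as the resource measure, and the error-carrying rounding scheme are all assumptions made for this sketch; they are not taken from the claims or from Henry et al.

```python
import math

THRESHOLD_BITS = 256  # hypothetical storage budget, in bits


def resource_requirement(matrix, bits_per_entry):
    """Storage needed for the matrix in bits (an assumed resource measure)."""
    return sum(len(row) for row in matrix) * bits_per_entry


def convert_matrix(matrix, frac_bits):
    """Round each entry to a multiple of 2**-frac_bits.

    The rounding error of each entry is carried into the next entry, so the
    sum of the converted matrix stays within about half a step of the
    original sum.
    """
    step = 2.0 ** -frac_bits
    carry = 0.0
    converted = []
    for row in matrix:
        new_row = []
        for x in row:
            target = x + carry
            q = round(target / step) * step
            carry = target - q
            new_row.append(q)
        converted.append(new_row)
    return converted


def matrix_sum(matrix):
    """Sum of all entries, using compensated summation."""
    return math.fsum(x for row in matrix for x in row)


def run(matrix, frac_bits=4):
    """Convert the matrix only when its requirement exceeds the threshold."""
    if resource_requirement(matrix, 64) > THRESHOLD_BITS:
        matrix_used = convert_matrix(matrix, frac_bits)
    else:
        matrix_used = matrix
    return matrix_used, matrix_sum(matrix_used)
```

In this sketch the conversion is skipped entirely when the requirement is within the budget, which mirrors the conditional structure of the claim rather than any particular hardware mechanism in the reference.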
As per claim 4, wherein converting the matrix to a converted matrix includes: clipping one or more of the coefficients to reduce a number of storage bits for the one or more coefficients, and wherein the clipped storage bits are utilized to determine the error (para [0029], “…[i]n one embodiment, the least significant bit multiplication gates 204 are grouped such that the power control 206 powers off the gates associated with the multiplication of lesser or fewer of the least significant bits based on the approximation amount indicated in the approximation policy 176”; and para [0023], “…[e]ach time an approximating functional unit 106 generates a result 164 (which is written to an architectural register 108), the approximating functional unit 106 also generates an indication of the amount error 168 associated with the result 164 that has accumulated due to approximating computations”).
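As a software analogue of the clipping recited in claim 4, the following minimal sketch drops the low-order bits of integer coefficients and uses the dropped bits as the error measure. The function names `clip_coefficient` and `clip_all` and the representation of coefficients as integers are assumptions for illustration; Henry et al describe a hardware mechanism, not this code.

```python
def clip_coefficient(coeff: int, k: int):
    """Drop the k least significant bits; return (clipped value, dropped bits)."""
    dropped = coeff & ((1 << k) - 1)
    clipped = coeff - dropped
    return clipped, dropped


def clip_all(coeffs, k):
    """Clip every coefficient and accumulate the dropped bits as total error."""
    clipped, total_error = [], 0
    for c in coeffs:
        kept, dropped = clip_coefficient(c, k)
        clipped.append(kept)
        total_error += dropped
    return clipped, total_error
```

For example, `clip_all([45, 18, 7], 3)` keeps `[40, 16, 0]` and reports an accumulated error of 14, since the dropped portions are 5, 2, and 7.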
	As per claim 5, wherein clipping the one or more coefficients includes reducing a number of storage bits for the mantissa of one or more of the coefficients (para [0030], “…[p]referably, the multiplier excludes M of the least significant bits of the factors 166 when performing the multiplication, where M is less than N. For example, assume the mantissas of the factors 166 are each 53 bits, then the transistors of the gates 204 of the approximating multiplier that would normally be used in the multiplication of the lower N bits of the 53 bits of the factors 166 are turned off such that the lower M bits of the factors 166 are not included in the approximate multiply, where the number of bits M is specified in the approximation policy, e.g., in the approximation control register 132.”).
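The effect of excluding the lower M mantissa bits can be emulated in software as in the minimal sketch below. It assumes IEEE 754 double precision, where 52 of the 53 mantissa bits are stored explicitly; the function name `truncate_mantissa` is hypothetical, and the sketch masks stored bits rather than powering off multiplier gates as the reference describes.

```python
import struct


def truncate_mantissa(x: float, m: int) -> float:
    """Zero the m least significant stored mantissa bits of a float64."""
    if not 0 <= m <= 52:
        raise ValueError("m must be between 0 and 52")
    # Reinterpret the double as a 64-bit unsigned integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    # Clear the low m bits; sign and exponent fields are untouched.
    mask = ~((1 << m) - 1) & 0xFFFFFFFFFFFFFFFF
    (y,) = struct.unpack("<d", struct.pack("<Q", bits & mask))
    return y
```

For normal (non-subnormal) inputs, the relative error introduced by truncating m bits is below 2**(m - 52), so the choice of m directly bounds the accuracy loss.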
As per claim 6, further comprising: determining, based on the converted matrix and the mathematical operation, a second computing resource requirement; and determining that the second required computing resource requirement does not exceed the threshold (para [0062], “…[t]he program determines the approximation policy based, at least in part, on the current system configuration. For example, the program may detect whether the computer system is operating from battery power or from an effectively limitless source, such as A/C wall power.  Additionally, the program may detect the hardware configuration of the computer system, such as the display size and speaker quality. The program may consider such factors in determining the desirability and/or acceptability of performing certain computations approximately rather than with full accuracy, such as audio/video-related computations. Flow proceeds to block 504”).
As per claim 9, further comprising: receiving a second matrix of real numbers; determining that the mathematical operation is to be performed on the matrix and the second matrix; and converting the second matrix to a second converted matrix, wherein the second converted matrix minimizes error between a sum of the second matrix and a sum of the second converted matrix, wherein generating the result is further based on the second converted matrix (para [0019], “…[r]eferring now to FIG. 1, a block diagram illustrating an embodiment of a processor 100 is shown. The processor 100 comprises a programmable data processor that performs stored instructions, such as a central processing unit (CPU) or a graphics processing unit (GPU)”; and “…[t]he processor 100 includes an instruction cache 102; an instruction translator 104 coupled to the instruction cache 102; one or more approximating functional units 106 coupled to receive micro instructions from the instruction translator 104; architectural registers 108 coupled to provide instruction operands 166 to the approximating functional units 106; an approximation control register 132 coupled to the approximating functional units 106; a data cache memory 138 coupled to the approximating functional units 106; and a snapshot storage 134 coupled to the approximating functional units 106”).
Due to the similarity of independent claim 10 to claim 1, it is rejected under a similar rationale.
Due to the similarity of claim 11 to claim 6, it is rejected under a similar rationale.
Due to the similarity of claim 12 to claim 4, it is rejected under a similar rationale.
Due to the similarity of claim 13 to claim 5, it is rejected under a similar rationale.
Due to the similarity of independent claim 16 to claim 1, it is rejected under a similar rationale.
Due to the similarity of claim 17 to claim 6, it is rejected under a similar rationale.
Due to the similarity of claim 18 to claim 4, it is rejected under a similar rationale.
Due to the similarity of claim 19 to claim 5, it is rejected under a similar rationale.

3.	Claims 1, 8, 10 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Rahimi et al (THIRD-PARTY SUBMISSION UNDER CFR 129, “A Variability …”).
	As per independent claim 1, Rahimi et al disclose the invention substantially as claimed, including: 
Claim 1. A method for optimizing matrix conversions, comprising: (Abstract, “[w]e propose a tightly-coupled, multi-core cluster architecture with shared, variation-tolerant, and accuracy-reconfigurable floating point units (FPUs). [...] We use a profiling technique to identify tolerable error significance and error rate thresholds in error-tolerant image processing applications. This information guides an application driven hardware FPU synthesis and optimization design flow to generate efficient FPUs.”); 
receiving a matrix of real numbers, wherein each real number is represented by a mantissa and an exponent; identifying a mathematical operation to be performed on the matrix (Pg. 2 left Col, “[f]or general purpose error-intolerant application, our approach reduces the recovery cycles that yield an average energy saving of 22% (and up to 28%), compared to the worst-case design. For error-tolerant image processing applications with annotated approximate directives, 36% energy saving is achieved while maintaining acceptable quality degradation”); 
determining, based on the matrix and the mathematical operation, a computing resource requirement (Pg 2 Right Col, “[d]isciplined approximated programming allows programmers to identify parts of a program for approximate computation”);
determining that the required computing resource requirement exceeds a threshold (Pg 2 Left Col, “[a]t design-time, code regions are profiled to identify acceptable error significance and error rate. This information drives synthesis of an application driven hardware FPU. At runtime, as different sequences of OpenMP directives are dynamically encountered during program execution, the scheduler promotes FPUs to accurate mode, or demotes them to approximate mode depending upon the code region requirements.”; and Pg 3, Left Col, “[p]rograms with elastic outputs have application-dependent fidelity metrics, such as peak signal to noise ratio (PSNR), associated with them to characterize the quality of the computational result. The degradation of output quality for such applications is acceptable if the fidelity metrics satisfy a certain threshold. For example, in multimedia applications the quality of the output can be degraded but acceptable within the constraints of PSN… 30dB”); 
 wherein the converted matrix minimizes error between a sum of the matrix and a sum of the converted matrix; generating a result based on the mathematical operation and the converted matrix and providing the result (Pg 4, Left Col, “[i]n the approximate mode, the pipeline simply disables the EDS circuit sensors on the less significant N bits of the fraction where N is reprogrammable through a memory-mapped register. The sign and the exponent bits are always protected by EDS. This allows the pipeline to ignore any timing error below the less significant N bits of the fraction and save on the recovery cost”.)
It is noted that Rahimi et al do not specifically detail the claimed “converting the matrix to a converted matrix”; however, the “approximate mode” (Pg 4, Left Col) modifies the fraction (disables the EDS circuit sensors on the less significant N bits of the fraction). Therefore, the feature is equivalent to the claimed “converting”. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to design the claimed invention according to Rahimi et al’s teaching because the reference discloses a device having the optimizing feature as claimed.
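The fidelity threshold discussed in the passage quoted from Rahimi et al (Pg 3, Left Col) can be illustrated with the standard PSNR formula, 10·log10(MAX² / MSE). In the sketch below, the function names, the 8-bit peak value of 255, and the direction of the comparison against 30 dB are assumptions for illustration only; the quoted passage is elided at that point and does not state them.

```python
import math


def psnr(reference, degraded, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences."""
    if len(reference) != len(degraded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


def quality_acceptable(reference, degraded, threshold_db=30.0):
    """Assumed acceptance test: PSNR at or above the threshold."""
    return psnr(reference, degraded) >= threshold_db
```

A per-pixel error of one gray level on 8-bit data gives a PSNR of roughly 48 dB, which passes the assumed 30 dB test, while an error of 50 gray levels gives roughly 14 dB, which does not.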
	As per claim 8, wherein determining that the required computing resource requirement exceeds a threshold is based on at least one of the size of the matrix, maximum storage limit for the matrix, and a maximum computing time for performing the operation on the matrix (Pg 2, Left Col, “[w]e propose a set of accuracy-reconfigurable FPUs that are resistant to variation-induced timing errors and shared among tightly-coupled processors in a cluster. This resilient shared-FPUs architecture supports online timing error detection, correction, and characterization. We introduce the notion of FP pipeline vulnerability (FPV), captured as meta data, to expose variability and its effects to a software scheduler for reducing the cost of error correction. A runtime ranking scheduler utilizes the FPV metadata to identify the most suitable FPUs for the required computation accuracy for the minimum timing error rate”).
Due to the similarity of independent claims 10 and 16 to claim 1, they are rejected under a similar rationale.
4.	Claims 1, 7-8, 10, 14, 16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ho et al (THIRD-PARTY SUBMISSION UNDER CFR 129, “Efficient floating point signal…”).
As per independent claim 1, Ho et al disclose the invention substantially as claimed, including: 
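Claim 1. A method for optimizing matrix conversions, comprising: 
receiving a matrix of real numbers, wherein each real number is represented by a mantissa and an exponent; identifying a mathematical operation to be performed on the matrix (Abstract, “[w]e successfully used the tool to transform floating point signal processing programs to their arbitrary precision fixed-point equivalent, obtaining about 82% and 66% average reduction in resources when compared to the double precision and single precision versions, respectively”); 
determining, based on the matrix and the mathematical operation, a computing resource requirement;
determining that the required computing resource requirement exceeds a threshold; 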
converting the matrix to a converted matrix, wherein the converted matrix minimizes error between a sum of the matrix and a sum of the converted matrix (Pg 2, “A. Problem formulation”.); 
generating a result based on the mathematical operation and the converted matrix and providing the result (Pg 5, Right Col, “[w]e presented an algorithm for finding the minimum mantissa precision in floating point code assuming bounds on the output error are given. The algorithm’s novelty lies in the use of program’s high-level structure information to guide the black box search in such a way that is both scalable and yet produces high quality results. The proposed search algorithm is not only fast and parallelizable, but also produces results comparable to that obtained by fine-grain word length optimization methods”).
It is noted that Ho et al do not specifically detail the claimed “matrix”; however, the toolchain can be used in a complex computing system (Abstract, “[o]ur toolchain uses a distributed algorithm that can analyze thousands of variables”). This implies that the system would involve matrix computation. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to design the claimed invention according to Ho et al’s teaching because the reference discloses a device having the optimizing feature as claimed.
As per claim 7, wherein the converted matrix includes entries that are represented as fixed point numbers (Abstract, “[t]his paper presents an automatic tool-chain that efficiently computes the precision of floating point variables down to the bit level of the mantissa. Our tool chain uses a distributed algorithm that can analyze thousands of variables. We successfully used the tool to transform floating point signal processing programs to their arbitrary precision fixed-point equivalent, obtaining about 82% and 66% average reduction in resources when compared to the double precision and single precision versions, respectively”).
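A floating-point to fixed-point conversion of the kind referred to in claim 7 can be sketched as below. The function names `to_fixed` and `from_fixed`, the signed two's-complement range, and the saturating behavior are assumptions chosen for illustration; they do not reproduce the tool-chain of Ho et al.

```python
def to_fixed(x: float, frac_bits: int, total_bits: int) -> int:
    """Quantize x to a signed fixed-point integer, saturating on overflow."""
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def from_fixed(q: int, frac_bits: int) -> float:
    """Recover the real value represented by a fixed-point integer."""
    return q / (1 << frac_bits)
```

With 8 fractional bits in a 16-bit word, the value 0.5 is stored as the integer 128, in-range values round-trip with an error of at most 2**-9, and out-of-range inputs saturate at the largest representable value.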
	As per claim 8, wherein determining that the required computing resource requirement exceeds a threshold is based on at least one of the size of the matrix, maximum storage limit for the matrix, and a maximum computing time for performing the operation on the matrix (Pg 2, “A. Problem formulation”).
Due to the similarity of independent claims 10 and 16 to claim 1, they are rejected under a similar rationale.
Due to the similarity of claims 14 and 20 to claim 7, they are rejected under a similar rationale.

5.	Claims 1, 2-3, 8, 10, 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Jerez et al (THIRD-PARTY SUBMISSION UNDER CFR 129, “A low complexity scaling…”).
	As per independent claim 1, Jerez et al disclose the invention substantially as claimed, including: 
Claim 1. A method for optimizing matrix conversions, comprising (Abstract, “[w]e consider the problem of enabling fixed-point implementation of linear algebra kernels on low-cost embedded systems, as well as motivating more efficient computational architectures for scientific applications.  Fixed-point arithmetic presents additional design challenges compared to floating-point arithmetic, such as having to bound peak values of variables and control their dynamic ranges. Algorithms for solving linear equations or finding eigenvalues are typically nonlinear and iterative, making solving these design challenges a non trivial task.”): 
receiving a matrix of real numbers, wherein each real number is represented by a mantissa and an exponent; identifying a mathematical operation to be performed on the matrix (Pg. 303, “[p]orting floating-point algorithm implementations to fixed-point arithmetic is an effective way to address these limitations. Because fixed-point numbers do not require mantissa alignment, the circuitry is significantly simpler and faster. The smaller delay in arithmetic operations leads to lower latency computation and shorter pipelines.”); 
determining, based on the matrix and the mathematical operation, a computing resource requirement; determining that the required computing resource requirement exceeds a threshold (Pg. 303, “[i]n embedded computing, cost, power consumption, computation time and size constraints often limit the complexity of the algorithms that can be implemented, thereby limiting the capabilities of the embedded solution.”);
wherein the converted matrix minimizes error between a sum of the matrix and a sum of the converted matrix; generating a result based on the mathematical operation and the converted matrix and providing the result (Pg 306, “[n]otice that a different Lanczos problem has to be solved at each iteration of the optimization solver.  Since the range of Lanczos problems that have to be solved on the same hardware is so diverse, without using the scaling matrix (2) it is not possible to decide on a fixed data format that can represent…”; Pg 304, “[s]ection 4 presents numerical results showing that the numerical quality of the linear equation solution does not suffer by moving to fixed point arithmetic”; and Pg 304, “[i]n section 6 this tool is used to evaluate the potential relative performance improvement between fixed-point and floating point FPGA implementations and perform an absolute performance comparison against the peak performance of a high-end GPGPU”).
It is noted that Jerez et al do not specifically detail the claimed “converting the matrix to a converted matrix”; however, Jerez et al do show a “scaling matrix” (Pg 306, Section 3.1). Therefore, the feature is equivalent to the claimed “converting”. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to design the claimed invention according to Jerez et al’s teaching because the reference discloses a device having the optimizing feature as claimed.
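To illustrate how a scaling matrix can bound the values of a symmetric matrix before fixed-point processing, the sketch below applies a diagonal scaling M with entries 1/sqrt(sum of absolute values in the row), forming M·A·M. This particular choice of M and the name `scale_symmetric` are assumptions for illustration; the passages quoted above do not establish that this is the scaling matrix (2) of Jerez et al.

```python
import math


def scale_symmetric(a):
    """Return (m, scaled), where m is the diagonal of M and scaled = M A M.

    For a symmetric input, every entry of the scaled matrix lies in [-1, 1].
    """
    n = len(a)
    row_sums = [sum(abs(v) for v in row) for row in a]
    if any(s == 0 for s in row_sums):
        raise ValueError("matrix has an all-zero row")
    m = [1.0 / math.sqrt(s) for s in row_sums]
    scaled = [[m[i] * a[i][j] * m[j] for j in range(n)] for i in range(n)]
    return m, scaled
```

Because each entry of a symmetric matrix is no larger in magnitude than the absolute row sums of both its row and its column, the scaled entries cannot exceed 1 in magnitude, which is what makes a single fixed data format workable in this sketch.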
As per claim 2, wherein the matrix is a filter kernel (Pg 304, “[w]e propose a novel scaling procedure first introduced by us in [11] to tackle the fixed-point bounding problem for the nonlinear and recursive Lanczos kernel”).
As per claim 3, wherein the matrix has at least one of horizontal, vertical, or diagonal symmetry, and wherein converting the matrix is at least partially based on the symmetry of the matrix (Pg 303, “[t]he Lanczos iteration [1] is the key building block in modern iterative numerical methods for computing eigenvalues or solving systems of linear equations involving symmetric matrices”; and Pg 305, “[t]he Lanczos Kernel” section).
	As per claim 8, wherein determining that the required computing resource requirement exceeds a threshold is based on at least one of the size of the matrix, maximum storage limit for the matrix, and a maximum computing time for performing the operation on the matrix (Pg. 303, “[p]orting floating-point algorithm implementations to fixed-point arithmetic is an effective way to address these limitations. Because fixed-point numbers do not require mantissa alignment, the circuitry is significantly simpler and faster. The smaller delay in arithmetic operations leads to lower latency computation and shorter pipelines.”).
Due to the similarity of independent claims 10 and 16 to claim 1, they are rejected under a similar rationale.
Due to the similarity of claim 15 to claim 2, it is rejected under a similar rationale.

6.	Claims 2-3 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Henry et al (THIRD-PARTY SUBMISSION UNDER CFR 129, Pub. No. US 20150227429) in view of Jerez et al (THIRD-PARTY SUBMISSION UNDER CFR 129, “A low complexity scaling…”).
	Henry et al have been discussed in paragraph No. 2 above.
	Jerez et al have been discussed in paragraph No. 5 above.
Claims 2-3 and 8 add detailed features. Jerez et al show such detailed features.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jerez et al’s features with Henry et al’s teaching because the reference discloses a device having the optimizing feature as claimed.

7.	Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Henry et al (THIRD-PARTY SUBMISSION UNDER CFR 129, Pub. No. US 20150227429) in view of Ho et al (THIRD-PARTY SUBMISSION UNDER CFR 129, “Efficient floating point signal…”).
	Henry et al have been discussed in paragraph No. 2 above.
	Ho et al have been discussed in paragraph No. 4 above.
Claims 7-8 add detailed features. Ho et al show such detailed features.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Ho et al’s features with Henry et al’s teaching because the reference discloses a device having the optimizing feature as claimed.

8.	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Henry et al (THIRD-PARTY SUBMISSION UNDER CFR 129, Pub. No. US 20150227429) in view of Rahimi et al (THIRD-PARTY SUBMISSION UNDER CFR 129, “A Variability …”).
	Henry et al have been discussed in paragraph No. 2 above.
	Rahimi et al have been discussed in paragraph No. 3 above.
Claim 8 adds detailed features. Rahimi et al show such detailed features.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Rahimi et al’s features with Henry et al’s teaching because the reference discloses a device having the optimizing feature as claimed.
	
9.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Tan V. Mai whose telephone number is (571) 272-3726.  The examiner can normally be reached on Mon, Wed and Fri from 9:30am to 2:30pm.
	If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mehta Jyoti, can be reached on (571) 270-3995. The fax phone number for the organization where this application or proceeding is assigned is:
			Official	 	(571) 273-8300. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Tan V Mai/ 		Primary Examiner, Art Unit 2182