DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims 1-14 are pending in this office action and presented for examination. 

Specification
The disclosure is objected to because of the following informalities.  Appropriate correction is required.
Paragraph [39] discloses “The AI chip 103 may include processor cores 1031, 1032, and 1033, a wire 1034, and a computational accelerator 1035.” However, FIG. 1 shows AI chip 104 including the above entities.
Paragraph [40] discloses “The AI chip 104 may include processor cores 1041, 1042, and 1043, a wire 1044, and a computational accelerator 1045.” However, FIG. 1 shows AI chip 103 including the above entities.
In one location in paragraph [41] and two locations in paragraph [42], the reference character 102 is associated with an AI chip. However, reference character 102 was previously associated with a bus. Reference character 104 may have been intended to be used. 
In paragraph [78], “step 402” should be “step 302”.
In paragraph [88], the use of a hyphen between “random” and “access” is inconsistent.
In paragraph [90], reference character 304’ appears to be incorrectly applied to “the complex computational result queue”. Reference character 308’ may have been intended. 
In paragraph [108], the use of a hyphen between “random” and “access” is inconsistent.

The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. Examiner recommends a title which conveys that an AI accelerator is shared between processor cores. 

Drawings
The drawings are objected to because:
The drawing sheet numbering must be clear and larger than the numbers used as reference characters to avoid confusion. However, the drawings do not meet this requirement.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures.

Claim Objections
Claims 1-11 are objected to because of the following informalities.  Appropriate correction is required.
In claim 1, line 4, “the method” should be “the computing method” for antecedent basis clarity. Note that this limitation is also recited in claim 2, line 1; claim 2, line 3; claim 3, line 1; claim 4, line 1; claim 4, line 6; claim 5, line 1; claim 6, line 1; claim 6, line 5; claim 7, line 1; claim 8, line 1; claim 9, line 1; claim 10, line 1; and claim 11, line 1. 
Claims 2-11 are objected to for failing to alleviate the objection of claim 1 above. 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

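Claims 2-7 and 11-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.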
Claim 2 recites the limitation “wherein before decoding, by a target processor core among the at least one processor core, a to-be-executed instruction, the method further comprises: selecting, in response to receiving the to-be-executed instruction, a processor core executing the to-be-executed instruction from the at least one processor core for use as the target processor core” in lines 1-7. However, the metes and bounds of this limitation are indefinite. For example, it is indefinite as to how it can be the case that the recited processor core is [in the process of] executing the to-be-executed instruction before the recited processor core is even selected for executing the to-be-executed instruction.
Claims 3-6 are rejected for failing to alleviate the rejection of claim 2 above. 

Claim 4 recites the limitation “selecting, by the target processor core, the complex computational result from the complex computational result queue corresponding to the target processor core into at least one of: a result register in the target processor core, or a memory of the artificial intelligence chip”. However, the metes and bounds of this limitation are indefinite. For example, it is indefinite as to what it means to select an entity “into” another entity. Note that claim 6 explicitly recites a writing step in conjunction with a selecting step.

Claim 6 recites the limitation “the result register in the target processor core” in lines 10-11. However, there is insufficient antecedent basis for this limitation in the claims.
Claim 6 recites the limitation “the memory of the artificial intelligence chip” in lines 11-12. However, there is insufficient antecedent basis for this limitation in the claims. 

Claim 7 recites the limitation “the computational accelerator comprises at least one of following items: an application specific integrated circuit chip, or a field programmable gate array” in lines 1-4. However, a Markush grouping is a closed group of alternatives. If a Markush grouping requires an element selected from an open list of alternatives (e.g., selected from the group “comprising” or “consisting essentially of” the recited alternatives), the claim is indefinite because it is unclear what other alternatives are intended to be encompassed by the claim. See MPEP 2173.05(h). Examiner recommends reciting “the computational accelerator comprises at least one item selected from the group consisting of an application specific integrated circuit chip and a field programmable gate array”. 

Claim 11 recites the limitation “the preset complex computational identifier comprises at least one of following items: an exponentiation identifier, a square root extraction identifier, or a trigonometric function computation identifier” in lines 1-4. However, a Markush grouping is a closed group of alternatives. If a Markush grouping requires an element selected from an open list of alternatives (e.g., selected from the group “comprising” or “consisting essentially of” the recited alternatives), the claim is indefinite because it is unclear what other alternatives are intended to be encompassed by the claim. See MPEP 2173.05(h). Examiner recommends reciting “the preset complex computational identifier comprises at least one item selected from the group consisting of an exponentiation identifier, a square root extraction identifier, and a trigonometric function computation identifier”.

Claim 12 recites the limitation “the artificial intelligence chip” in lines 6-7. However, there is insufficient antecedent basis for this limitation in the claims. Note that this limitation is 
Claim 14 is rejected for failing to alleviate the rejection of claim 12 above. 

Claim 13 recites the limitation “the computational accelerator” in line 16. However, there is insufficient antecedent basis for this limitation in the claims. Note that this limitation is also recited in claim 13, line 19, and claim 13, line 24. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 7-8, and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Dao et al. (Dao) (US 6148395) in view of Lee et al. (Lee) (US 20140344194 A1).
Consider claim 1, Dao discloses a computing method applied to a chip (col. 2, line 60, single-chip multiprocessor), the chip comprising at least one processor core (col. 2, lines 64-65, two or more microprocessor central processing units, or CPUs) and a computational accelerator connected to each of the at least one processor core (col. 4, lines 19-21, multiple CPUs on the same integrated circuit chip should therefore be able to share a single high-performance FPU), the method comprising: decoding (col. 5, lines 27-35, the first stage of floating-point pipeline 40 is performed simultaneously for floating-point instructions detected by instruction decoders 
	However, Dao does not disclose that the chip is an artificial intelligence chip.
	On the other hand, Lee discloses a chip being an artificial intelligence chip ([0024], lines 1-2, machine-learning accelerator (MLA) block; [0021], lines 1-3, for example, application specific integrated circuits (ASICs) are typically used to realize algorithms implemented in hardware). In addition, to any extent to which Dao does not disclose the computation being “complex”, Lee also discloses “complex” computation (claim 16, calculating exponential values, square root values; [0028], line 19, discrete cosine transform).
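Lee’s teaching supports a range of computations required in various machine-learning frameworks while employing a specialized architecture that can exploit algorithmic structure in order to achieve low energy (Lee, [0007], lines 3-8).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Lee with the invention of Dao in order to support a range of computations required in various machine-learning frameworks while employing a specialized architecture that can exploit algorithmic structure in order to achieve low energy.
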
Consider claim 7, the overall combination entails the computational accelerator comprises at least one of following items: an application specific integrated circuit chip, or a field programmable gate array (Dao, col. 2, line 60, single-chip multiprocessor; Lee, [0022], line 2, machine-learning accelerator (MLA) integrated circuit).

Consider claim 8, the overall combination entails the complex computational instruction queue and the complex computational result queue are first-in-first-out queues (Dao, col. 5, line 41, FIFO order; col. 6, lines 26-28, as noted above relative to the description of queue stages 41 in floating-point pipeline 40, instruction buffers 50 are preferably arranged in a FIFO manner; col. 6, lines 46-48, results from completion unit 75 are output to result register 76 which, in turn, drives writeback bus WB at its output; col. 9, lines 49-52, as shown in FIG. 3, writeback bus WB is coupled to output buffer 78, through which communication of the result of the floating-point operation to memory, via internal bus IBUS, may be effected).
Consider claim 10, the overall combination entails the computational accelerator comprises at least one computing unit; and the executing, by the computational accelerator, a complex computation indicated by the complex computational identifier in the selected complex 

Consider claim 11, the overall combination entails the preset complex computational identifier comprises at least one of following items: an exponentiation identifier, a square root extraction identifier, or a trigonometric function computation identifier (Lee, claim 16, calculating exponential values, square root values; [0028], line 19, discrete cosine transform).

Claims 2 and 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over Dao and Lee (in the case of claim 2, as applied to claim 1 above), and further in view of Wu et al. (Wu) (US 20120233477 A1).
Consider claim 2, the combination thus far does not entail before decoding, by a target processor core among the at least one processor core, a to-be-executed instruction, the method further comprises: selecting, in response to receiving the to-be-executed instruction, a processor core executing the to-be-executed instruction from the at least one processor core for use as the target processor core.
On the other hand, Wu discloses before decoding, by a target processor core among at least one processor core, a to-be-executed instruction, a method further comprises: selecting, in response to receiving the to-be-executed instruction, a processor core executing the to-be-executed instruction from the at least one processor core for use as the target processor core ([0029], lines 1-6, code is distributed between core 101 and 102 based on maximizing performance and power. For example, code regions are identified to perform better on one of the two cores 101, 102. As a result, when one of such code regions is encountered/detected, that code section is distributed to the appropriate core).
Wu’s teaching optimizes power and performance efficiency (Wu, [0029], lines 1-3; [0001], lines 1-3).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Wu with the combination of Dao and Lee in order to optimize power and performance efficiency.

Consider claim 12, Dao discloses a chip (col. 2, line 60, single-chip multiprocessor) comprising: at least one processor core (col. 2, lines 64-65, two or more microprocessor central processing units, or CPUs); a computational accelerator connected to each of the at least one processor core (col. 4, lines 19-21, multiple CPUs on the same integrated circuit chip should therefore be able to share a single high-performance FPU), the chip to implement operations, the operations comprising: decoding (col. 5, lines 27-35, the first stage of floating-point pipeline 40 is performed simultaneously for floating-point instructions detected by instruction decoders 16.sub.0, 16.sub.1, in their respective integer predecode stages 34.sub.0, 34.sub.1. As such, instruction queue stage 41.sub.0 receives a series of instruction codes for floating-point instructions detected in predecode stage 34.sub.0, while instruction queue stage 41.sub.1 receives a series of instruction codes for floating-point instructions detected in predecode stage 34.sub.1), by a target processor core (col. 2, lines 64-65, two or more microprocessor central processing units, or CPUs; in other words, the recited target processor core corresponds to any of two or more microprocessor central processing units) among the at least one processor core (col. 2, lines 64-65, two or more microprocessor central processing units, or CPUs), a to-be-executed instruction to obtain a computational identifier and at least one operand (col. 5, lines 27-35, the first stage of floating-point pipeline 40 is performed simultaneously for floating-point instructions detected by instruction decoders 16.sub.0, 16.sub.1, in their respective integer predecode stages 34.sub.0, 34.sub.1. As such, instruction queue stage 41.sub.0 receives a series of instruction codes for floating-point instructions detected in predecode stage 34.sub.0, while instruction queue stage 41.sub.1 receives a series of instruction codes for floating-point instructions detected in predecode stage 34.sub.1; col. 6, lines 35-40, each stage of instruction buffers 50 is preferably able to store an entire instruction code for a floating-point instruction 
	However, Dao does not disclose that the chip is an artificial intelligence chip. Dao also does not disclose a storage apparatus, storing at least one program thereon, wherein the at least one program, when executed by the artificial intelligence chip, causes the artificial intelligence chip to implement the aforementioned operations. 
On the other hand, Lee discloses a chip being an artificial intelligence chip ([0024], lines 1-2, machine-learning accelerator (MLA) block; [0021], lines 1-3, for example, application specific integrated circuits (ASICs) are typically used to realize algorithms implemented in hardware). In addition, to any extent to which Dao does not disclose the computation being “complex”, Lee also discloses “complex” computation (claim 16, calculating exponential values, square root values; [0028], line 19, discrete cosine transform).

Lee’s teaching supports a range of computations required in various machine-learning frameworks while employing a specialized architecture that can exploit algorithmic structure in order to achieve low energy (Lee, [0007], lines 3-8).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Lee with the invention of Dao in order to support a range of computations required in various machine-learning frameworks while employing a specialized architecture that can exploit algorithmic structure in order to achieve low energy.
However, the combination thus far does not entail a storage apparatus, storing at least one program thereon, wherein the at least one program, when executed by the artificial intelligence chip, causes the artificial intelligence chip to implement the aforementioned operations.
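On the other hand, Wu discloses a storage apparatus, storing at least one program thereon, wherein the at least one program, when executed by a chip, causes the chip to implement operations ([0098], lines 1-12, the embodiments of methods, hardware, software, firmware or code set forth above may be implemented via instructions or code stored on a machine readable medium which are executable by a processing element. A machine-readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine, such as a computer or electronic system. For example, a machine-readable medium includes random-access memory (RAM), such as static RAM (SRAM) or dynamic RAM (DRAM); ROM; magnetic or optical storage medium; flash memory devices; electrical storage devices; optical storage devices; etc.).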
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Wu with the combination of Dao and Lee, as this modification merely entails a combination of prior art elements according to known methods to yield predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143. Note that Wu’s teaching, when applied to the combination of Dao and Lee which entails the cited artificial intelligence chip in particular and the cited operations, results in the overall claim limitation. 

Consider claim 13, Dao discloses a chip (col. 2, line 60, single-chip multiprocessor) implementing operations, the operations comprising: decoding (col. 5, lines 27-35, the first stage of floating-point pipeline 40 is performed simultaneously for floating-point instructions detected by instruction decoders 16.sub.0, 16.sub.1, in their respective integer predecode stages 34.sub.0, 34.sub.1. As such, instruction queue stage 41.sub.0 receives a series of instruction codes for floating-point instructions detected in predecode stage 34.sub.0, while instruction queue stage 41.sub.1 receives a series of instruction codes for floating-point instructions detected in predecode stage 34.sub.1), by a target processor core (col. 2, lines 64-65, two or more microprocessor central processing units, or CPUs; in other words, the recited target processor core corresponds to any of two or more microprocessor central processing units) among at least one processor core (col. 2, lines 64-65, two or more microprocessor central processing units, or 
	However, Dao does not disclose that the chip is an artificial intelligence chip. Dao also does not disclose a non-transitory computer readable medium, storing a computer program thereon, wherein the program, when executed by the artificial intelligence chip, implements the aforementioned operations. 
	On the other hand, Lee discloses a chip being an artificial intelligence chip ([0024], lines 1-2, machine-learning accelerator (MLA) block; [0021], lines 1-3, for example, application specific integrated circuits (ASICs) are typically used to realize algorithms implemented in hardware). In addition, to any extent to which Dao does not disclose the computation being “complex”, Lee also discloses “complex” computation (claim 16, calculating exponential values, square root values; [0028], line 19, discrete cosine transform).
Lee’s teaching supports a range of computations required in various machine-learning frameworks while employing a specialized architecture that can exploit algorithmic structure in order to achieve low energy (Lee, [0007], lines 3-8).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Lee with the invention of Dao in order to support a range of computations required in various machine-learning frameworks while employing a specialized architecture that can exploit algorithmic structure in order to achieve low energy.
However, the combination thus far does not entail a non-transitory computer readable medium, storing a computer program thereon, wherein the program, when executed by the artificial intelligence chip, implements the aforementioned operations.
On the other hand, Wu discloses a non-transitory computer readable medium, storing a computer program thereon, wherein the program, when executed by a chip, implements operations ([0098], lines 1-12, the embodiments of methods, hardware, software, firmware or code set forth above may be implemented via instructions or code stored on a machine readable medium which are executable by a processing element. A machine-readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine, such as a computer or electronic system. For example, a machine-readable medium includes random-access memory (RAM), such as static RAM (SRAM) or dynamic RAM (DRAM); ROM; magnetic or optical storage medium; flash memory devices; electrical storage devices; optical storage devices; etc.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Wu with the combination of Dao and Lee, as this modification merely entails a combination of prior art elements according to known methods to yield predictable results, which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143. Note that Wu’s teaching, when applied to the combination of Dao and Lee which entails the cited artificial intelligence chip in particular and the cited operations, results in the overall claim limitation.

Consider claim 14, the overall combination entails an electronic device, comprising: a processor (Lee, [0022], line 3, central processing unit), a storage apparatus (Lee, [0022], lines 11-13, the CPU core 12 is interfaced with a program memory 16 and a data memory 18), and at least one artificial intelligence chip according to claim 12 (Lee, [0024], lines 1-2, machine-learning accelerator (MLA) block; [0021], lines 1-3, for example, application specific integrated circuits (ASICs) are typically used to realize algorithms implemented in hardware).

Claims 3-4 are rejected under 35 U.S.C. 103 as being unpatentable over Dao, Lee, and Wu as applied to claim 2 above, and further in view of Koehler et al. (Koehler) (US 20090113212 A1).
Consider claim 3, the combination thus far entails the complex computational instruction queue comprises a complex computational instruction queue corresponding to each of the at least one processor core (Dao, col. 5, lines 27-35, the first stage of floating-point pipeline 40 is performed simultaneously for floating-point instructions detected by instruction decoders 16.sub.0, 16.sub.1, in their respective integer predecode stages 34.sub.0, 34.sub.1. As such, instruction queue stage 41.sub.0 receives a series of instruction codes for floating-point instructions detected in predecode stage 34.sub.0, while instruction queue stage 41.sub.1 receives a series of instruction codes for floating-point instructions detected in predecode stage 34.sub.1); and the adding, by the target processor core, the generated complex computational instruction to a complex computational instruction queue comprises: adding, by the target processor core, the generated complex computational instruction to a complex computational instruction queue corresponding to the target processor core (Dao, col. 5, lines 27-35, the first stage of floating-point pipeline 40 is performed simultaneously for floating-point instructions detected by 
	However, the combination thus far does not entail the complex computational result queue comprises a complex computational result queue corresponding to each of the at least one processor core, and the writing, by the computational accelerator, the obtained computational result as a complex computational result into a complex computational result queue comprises: writing, by the computational accelerator, the obtained computational result as the complex computational result into a complex computational result queue corresponding to a processor core corresponding to the complex computational instruction queue of the selected complex computational instruction.
	On the other hand, Koehler discloses a complex computational result queue comprises a complex computational result queue corresponding to each of at least one processor core, and a writing, by a computational accelerator, an obtained computational result as a complex computational result into a complex computational result queue comprises: writing, by the computational accelerator, the obtained computational result as the complex computational result 
Koehler’s teaching enables independent and concurrent operation of engines within an accelerator (Koehler, [0034], lines 8-11).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Koehler with the combination of Dao, Lee, and Wu in order to enable independent and concurrent operation of engines within an accelerator. Alternatively, this modification merely entails simple substitution of one known element (a result queue) for another (per-core result queues) to obtain predictable results (the combination of Dao, Lee, and Wu, entailing per-core result queues rather than a result queue), which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143.
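
For illustration only, the difference between a single shared result queue and per-core result queues can be modeled as below. This is a minimal Python sketch under stated assumptions, not code from Dao, Lee, Wu, or Koehler: the names `instruction_queues`, `result_queues`, `submit`, and `accelerator_step`, the two-core count, and the two sample operations are all hypothetical.

```python
from collections import deque

NUM_CORES = 2  # assumed core count, for illustration only

# One FIFO instruction queue and one FIFO result queue per core (hypothetical layout).
instruction_queues = [deque() for _ in range(NUM_CORES)]
result_queues = [deque() for _ in range(NUM_CORES)]


def submit(core_id: int, op: str, operand: float) -> None:
    # A core appends an instruction to its own instruction queue.
    instruction_queues[core_id].append((op, operand))


def accelerator_step() -> bool:
    # The shared accelerator services the first non-empty instruction queue and
    # writes the result into the result queue of the same core.
    # Returns False when every instruction queue is empty.
    for core_id, queue in enumerate(instruction_queues):
        if queue:
            op, operand = queue.popleft()
            value = operand ** 0.5 if op == "sqrt" else operand ** 2
            result_queues[core_id].append(value)
            return True
    return False


submit(0, "sqrt", 25.0)
submit(1, "square", 3.0)
submit(0, "square", 2.0)
while accelerator_step():
    pass
```

In this model each core only ever reads its own result queue, so no per-result owner identifier is needed to direct a result back to the core that issued the instruction.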

Consider claim 4, the overall combination entails after writing, by the computational accelerator, the obtained computational result as the complex computational result into a complex computational result queue corresponding to a processor core corresponding to the complex computational instruction queue of the selected complex computational instruction, the method further comprises: selecting, by the target processor core, the complex computational result from the complex computational result queue corresponding to the target processor core into at least one of: a result register in the target processor core, or a memory of the artificial intelligence chip (Dao, col. 9, lines 49-52, output buffer 78, through which communication of the .

Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Dao, Lee, and Wu as applied to claim 2 above, and further in view of Kahle et al. (Kahle) (US 6725354 B1).
Consider claim 5, the combination thus far entails the generating, by the target processor core, a complex computational instruction using the computational identifier and the at least one operand obtained by the decoding in response to determining that the computational identifier obtained by decoding is a preset complex computational identifier comprises: generating, by the target processor core, the complex computational instruction using the computational identifier, the at least one operand obtained by the decoding, in response to determining that the computational identifier obtained by the decoding is the preset complex computational identifier (Dao, col. 5, lines 27-35, the first stage of floating-point pipeline 40 is performed simultaneously for floating-point instructions detected by instruction decoders 16.sub.0, 16.sub.1, in their respective integer predecode stages 34.sub.0, 34.sub.1. As such, instruction queue stage 41.sub.0 receives a series of instruction codes for floating-point instructions detected in predecode stage 34.sub.0, while instruction queue stage 41.sub.1 receives a series of instruction codes for floating-point instructions detected in predecode stage 34.sub.1; col. 6, lines 35-40, each stage of instruction buffers 50 is preferably able to store an entire instruction code for a floating-point instruction which, in this case of x86-architecture CPUs 10, may include up to eight bytes (one for the tag, and up to seven bytes for the instruction code with identifiers for source and 
However, the combination thus far does not entail that the aforementioned generating also uses an identifier of the target processor core, or that the aforementioned writing also writes a processor core identifier in the selected complex computational instruction.
On the other hand, Kahle discloses generation entailing an identifier of a target processor core (col. 6, lines 7-10, tag 264 of each entry 261 identifies either first processor core 201a or second processor core 201b as the source of entry's corresponding instruction 266), and writing entailing a processor core identifier in a selected complex computational instruction (col. 6, lines 46-55, when a floating point instruction 266 completes execution in one of the pipelines 230, the depicted embodiment of shared floating point unit 231 routes instruction 266 to first processor core 201a and second processor core 201b. Each processor core 201 then examines the floating point instruction's tag 264 to determine the instruction's "owner." The processor core 201 that owns the floating point instruction will store the instructions result in an appropriate rename register while the processor core 201 that does not own the instruction will discard or ignore the instruction's results).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Kahle with the combination of Dao, Lee, and Wu in order to enable the simultaneous processing of distinct execution streams or "threads" in a single shared resource. Alternatively, this modification merely entails simple substitution of one known element (the manner by which the combination of Dao, Lee, and Wu directs a result to the corresponding processor core) for another (Kahle’s method of directing a result to the corresponding processor core) to obtain predictable results (the combination of Dao, Lee, and Wu, entailing Kahle’s identifier of a target processor core to direct a result to the corresponding processor core), which is an exemplary rationale that may support a conclusion of obviousness, as per MPEP 2143.
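
For illustration only, the tag-based steering of results described in the cited passage of Kahle (col. 6, lines 46-55) can be modeled as below. This is a minimal Python sketch, not code from Kahle or any other cited reference: the names `TaggedInstruction`, `Core`, and `shared_fpu_execute` are hypothetical, and the square-root operation merely stands in for whatever computation the shared unit performs.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TaggedInstruction:
    owner: str      # tag naming the core that issued the instruction (hypothetical field)
    operand: float


@dataclass
class Core:
    name: str
    results: list = field(default_factory=list)

    def observe(self, tag: str, value: float) -> None:
        # Each core examines the tag; only the owning core keeps the result,
        # and every other core ignores it.
        if tag == self.name:
            self.results.append(value)


def shared_fpu_execute(instr: TaggedInstruction, cores: list) -> None:
    # The shared unit computes once, then routes the tagged result to every core.
    value = math.sqrt(instr.operand)
    for core in cores:
        core.observe(instr.owner, value)


core_a, core_b = Core("201a"), Core("201b")
shared_fpu_execute(TaggedInstruction(owner="201a", operand=9.0), [core_a, core_b])
shared_fpu_execute(TaggedInstruction(owner="201b", operand=16.0), [core_a, core_b])
```

Here the owner tag travels with the instruction and its result, so a single shared result path suffices and each core filters for its own results.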

Consider claim 6, the overall combination entails after writing, by the computational accelerator, the obtained computational result and a processor core identifier in the selected complex computational instruction as the complex computational result into the complex computational result queue, the method further comprises: selecting, by the target processor core, a computational result in the complex computational result with the processor core identifier being the identifier of the target processor core from the complex computational result queue, and writing the computational result into at least one of: the result register in the target processor core, or the memory of the artificial intelligence chip (Kahle, col. 6, lines 7-10, tag 264 of each entry 261 identifies either first processor core 201a or second processor core 201b as the source of entry's corresponding instruction 266), and writing entails a processor core identifier in a .

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Dao and Lee as applied to claim 1 above, and further in view of Shah et al. (Shah) (US 20150269074 A1).
Consider claim 9, the combination thus far does not entail the complex computational instruction queue and the complex computational result queue are stored in a cache.
On the other hand, Shah discloses a complex computational instruction queue and a complex computational result queue are stored in a cache ([0027], lines 15-16, instruction have been written to a queue in the shared cache; [0019], lines 1-2, after the accelerator performs operations on the data, it writes output data to the shared cache).
Shah’s teaching increases efficiency (Shah, [0003], lines 1-5; title).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Shah with the combination of Dao and Lee in order to increase efficiency. Alternatively, this modification merely entails a combination of prior art elements (the complex computational instruction queue .

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Meltzer (US 5987587) discloses a shared FPU with results steered to cache (see FIG. 5). As such, this reference is relevant to the claimed computational accelerator that stores results to cache. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEITH E VICARY whose telephone number is (571)270-1314.  The examiner can normally be reached on Monday to Friday, 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/KEITH E VICARY/
Primary Examiner, Art Unit 2182