DETAILED ACTION
Claims 1-10 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Referring to claim 1, the amendment in lines 4-6 is grammatically incorrect such that the scope of this limitation is unclear.  Specifically, the language “so that a number of other threads being executed while a thread refers to a…memory…of the architecture” is unclear.  Should “being” be replaced with --are--?
Further referring to claim 1, in the 3rd
The claims recite the following limitations for which there is a lack of antecedent basis:
In claim 1, line 6, “the ESM architecture”.  There are multiple ESM architectures in line 2.
Claims 2-10 are rejected due to their dependence on an indefinite claim.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Forsell, “MTAC - A Multithreaded VLIW Architecture for PRAM Simulation”, 1997, pp. 1037-1055, in view of Ohkami, U.S. Patent No. 5,600,810.
Referring to claim 1, Forsell has taught a processor architecture arrangement for emulated shared memory (ESM) architectures (see the title, abstract, page 1037, last paragraph, and page 1038, 2nd full paragraph, and note a distributed memory architecture is used to emulate a shared memory architecture, e.g. PRAM), the processor architecture arrangement comprising:
a) a number of multi-threaded processors (see page 1038, 2nd full paragraph, 1st sentence, which says “…using multithreading to increase the utilization of processors”) each provided with an interleaved inter-thread pipeline (see the abstract and page 1038, 2nd full paragraph, which set forth thread interleaving, i.e., switching to a next thread while a first thread is waiting, and super-pipelines), multiple threads being executed in a cyclic, interleaved manner by the pipeline so that a number of other threads being executed while a thread refers to a common physically distributed and logically shared data memory of the ESM architecture (see the last paragraph on page 1051, and the last paragraph of section 6.  A thread switch occurs every clock cycle (this is temporal multithreading known as fine-grained or interleaved multithreading), and there may be up to 500 threads.  As such, a first thread will access a memory, but the latency of that access will be hidden by switching in second, third, fourth, etc. threads in as many cycles.  Also, note that the memory accessed is a common physically distributed and logically shared memory (see FIG.2)), and a plurality of functional units for carrying out arithmetic and logical operations on data (see FIG.4 on page 1043, and note multiple ALUs, which are functional units that perform arithmetic and logical operations),
b) wherein the interleaved inter-thread pipeline includes a first pipeline branch having a first sub-group of said plurality of functional units which are arranged for carrying out integer operations (see FIG.4 and note the pipeline comprises multiple functional units (either multiple ALUs chained together, and/or multiple processing elements within a single ALU chained together).  These ALUs would operate on integer data in response to the “max” program of Table 3),
c) Forsell has not taught wherein the interleaved inter-thread pipeline includes at least two operatively parallel pipeline branches, the first pipeline and a second pipeline branch having a second, non-overlapping, sub-group of said plurality of functional units arranged for carrying out floating point operations.  However, Ohkami has taught floating-point arithmetic processing in parallel with integer processing (see FIG.2 and note the IUs and FPUs).  Floating-point operations allow for operations on fractions, as opposed to only integers, which has clear advantages.  FIG.4 of Forsell shows generic arithmetic and logic processing, and one of ordinary skill in the art would have found it obvious before the effective filing date of the claimed invention to modify Forsell such that the interleaved inter-thread pipeline includes at least two operatively parallel pipeline branches, the first pipeline and a second pipeline branch having a second, non-overlapping, sub-group of said plurality of functional units arranged for carrying out floating point operations.
d) Forsell, as modified, has taught wherein the first and the second pipeline branches each comprise a plurality of segments, the segments being connected in series, and wherein each segment comprises one or more of said functional units (see FIG.4.  For ALU0 operation of A0, there are at least three illustrated segments shown in series (each having an ALU slice (functional unit) and at least one latch).  The floating-point branch would be substantially similar, with floating-point data being processed by arithmetic slices and results being stored in latches),
e) Forsell, as modified, has taught wherein at least one of the functional units of the second sub-group which is arranged for floating point operations is located operatively in parallel with a memory access segment of the interleaved inter-thread pipeline (from FIG.4, the entire column of memory units is parallel with all other entire columns.  The entire memory unit column (or any subset thereof) may be considered a memory access segment, and thus, up to all (but at least one of the) ALUs are located in parallel with the memory access segment),
f) Forsell, as modified, has not explicitly taught associated floating-point operations are executed simultaneously with pipeline stages of a memory unit taking care of pending accesses of the shared memory.  However, one of skill in the art recognizes that any given stage in all of the pipeline branches may be processing an instruction at any given time.  Thus, associated floating-point operations are executed simultaneously with pipeline stages of a memory unit taking care of pending accesses of the shared memory.  As a basic example, assume, for simplicity, that each stage in FIG.4 corresponds to 1 clock cycle, which means the memory send/receive stages are 5-6 cycles after the ALU0 stage.  So, if an ALU0 instruction is input into the pipeline 5-6 cycles after a memory access operation, these two would execute at the same time as they enter their respective execution stages (assuming no stalling).  Regardless of stage timing, this would be the case when instructions are input at times such that different types of 
g) Forsell, as modified, has also taught while pipeline stages of a first segment of floating point functional units are operated in parallel with corresponding stages of a first segment of arithmetic and logic functional units and pipeline stages of a second segment of floating point functional units are operating in parallel with corresponding stages of a second segment of arithmetic and logic functional units.  Again, for similar reasons as set forth above, with full pipeline branches, all stages will be operated simultaneously.  For instance, if the ALUs in FIG.4 are integer ALUs, and the floating-point units are similarly implemented, then floating-point operations and integer operations would occur in parallel.  Further, an earlier load/store instruction will have already made its way further down the pipeline into the memory stages.  Thus, that instruction is occurring in parallel with the floating-point and integer operations.
h) Forsell, as modified, has taught wherein no overlapping exists between the functional units included in the first sub-group and the functional units included in the second sub-group (with the first sub-group being integer units, and the second sub-group being floating-point units, there is no overlap between the two.  They are completely separate hardware execution units), and
i) Forsell, as modified, has further taught a number of floating-point functional units being executed temporally in series is selected to be smaller than a number of integer units for a corresponding time period with respect to said smaller number of floating point functional units (see FIG.2 of Ohkami, which shows an example system with three integer units (IUs) and two floating-point units (FPUs).  As such, when the system is being fully utilized, the number of floating point units executing thread temporally in series is smaller than the number of integer 
Referring to claim 2, Forsell, as modified, has taught the processor architecture arrangement according to claim 1, wherein at least one of the functional units of the first sub-group is located operatively in parallel with the memory access segment of the interleaved inter-thread pipeline (again from FIG.4, at least one ALU is in parallel with the memory access segment, which may be considered the entire memory unit column (or any subset thereof)). 
Referring to claim 3, Forsell, as modified, has taught the processor architecture arrangement according to claim 1, wherein at least two or more of the functional units of the second sub-group in the second branch are chained together in a chain, and wherein a chained functional unit may pass an operation result to a subsequent functional unit in the chain as an operand (again, see FIG.4, along with section 3.4, where ALUs, and processing elements therein, are chained together.  In Forsell, as modified, integer units would be chained and floating point units would be chained). 
Referring to claim 4, Forsell, as modified, has taught the processor architecture arrangement according to claim 1, wherein a number of functional units in said first and/or second branch are functionally positioned before a memory, where some functional units are in parallel, and some functional units are after the memory access segment (from FIG.4, the memory and memory access segment can be said to be located at the “Memory Request Send” and “Memory Request Receive” rows since this is where memory is accessed.  ALUs are shown before and after this segment).
Referring to claim 7, Forsell, as modified, has taught the processor architecture arrangement according to claim 1, wherein at least one functional unit is controllable through a number of operation selection fields of instruction words (this is inherent, as all ALUs are controlled by instruction words, particularly the opcode portion (operation selection fields) of the instruction words).
Referring to claim 8, Forsell, as modified, has taught the processor architecture arrangement according to claim 1, wherein a number of operands for a functional unit can be determined in an operand select stage of the interleaved inter-thread pipeline in accordance with a number of operand selection fields given in an instruction word (this is inherent as opcodes indicate the number of operands.  For instance, an opcode indicating a basic add, would inform the processor to obtain two operands.  However, an opcode indicating a logical NOT, would inform the processor to obtain one operand, etc.). 
Referring to claim 9, Forsell, as modified, has taught the processor architecture arrangement according to claim 1, wherein the second sub-group of functional units in said second branch includes at least one functional unit configured to execute at least one floating-point operation selected from the group consisting of: addition, subtraction, multiplication, division, comparison, transformation from integer to floating point, transformation from floating point to integer, square root, logarithm, and exponentiation.
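As an illustrative aside on the interleaved execution relied upon for claim 1 (a thread switch every clock cycle, with other threads executing while one thread refers to memory), the following Python sketch models round-robin thread switching.  It is a simplified model only: the function name, the assumption that every instruction is a memory reference, and the latency values are hypothetical and are not taken from Forsell or Ohkami.

```python
def pipeline_utilization(num_threads, memory_latency, cycles):
    """Fraction of cycles in which an instruction issues.

    Assumed model (not from the references): one thread is selected per
    cycle in round-robin order, every instruction is a memory reference,
    and a thread cannot issue again until its reference has completed.
    """
    ready_at = [0] * num_threads  # cycle at which each thread may issue again
    issued = 0
    for cycle in range(cycles):
        thread = cycle % num_threads  # cyclic, interleaved thread selection
        if ready_at[thread] <= cycle:
            issued += 1
            ready_at[thread] = cycle + memory_latency
        # otherwise the slot is a bubble: the thread is still waiting on memory
    return issued / cycles


# Too few threads: the latency is exposed as bubbles.
print(pipeline_utilization(num_threads=4, memory_latency=8, cycles=16))  # 0.5
# Enough threads to cover the latency: every cycle issues an instruction.
print(pipeline_utilization(num_threads=8, memory_latency=8, cycles=16))  # 1.0
```

Under these assumptions, the latency is fully hidden whenever the number of interleaved threads is at least the memory latency in cycles.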

Claims 5-6 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Forsell in view of Ohkami and the examiner’s taking of Official Notice.
Referring to claim 5, Forsell, as modified, has taught the processor architecture arrangement according to claim 1, but has not taught wherein at least two functional units of the second sub-group are mutually of different complexity in terms of operation execution latency.  However, the examiner asserts that a floating-point multiply-add operation is a known operation in the art.  With such an operation, it is known to chain a multiplier to an adder to realize this function, which is useful in at least digital signal processing.  A multiplier is known to be more complex and have a higher latency than an adder.  As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Forsell such that at least two functional units of the second sub-group are mutually of different complexity in terms of operation execution latency.  
Referring to claim 6, Forsell, as modified, has taught the processor architecture arrangement of claim 5 but has not taught wherein a functional unit associated with longer latency is logically located in parallel with an end portion of the memory access segment.  However, where a multiplier is located simply amounts to a rearrangement of parts, which is not a patentable distinction (see e.g. In re Japikse, 181 F.2d 1019, 86 USPQ 70 (CCPA 1950)).  From FIG.4, ALUs appear throughout, some earlier (higher up in FIG.4), and some later (lower in FIG.4).  A multiplier may appear anywhere, including where a later ALU is shown, which also corresponds to an end portion of the memory access segment, which again may be the entire memory unit column.  As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Forsell such that a functional unit associated with longer latency is logically located in parallel with an end portion of the memory access segment.
Referring to claim 10, Forsell, as modified, has taught the processor architecture arrangement according to claim 1, but has not taught wherein a first functional unit of said second sub-group is configured to execute a plurality of floating point operations and a second functional unit of said second sub-group is configured to execute one or more other floating point operations.  However, a floating-point multiply-add operation is a known operation in the art.  With such an operation, it is known to chain a multiplier to an adder to realize this function, which is useful in at least digital signal processing.  As a result, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Forsell such that a first functional unit of said second sub-group is configured to execute a plurality of floating point operations (i.e., numerous multiplications in response to instructions) and a second functional unit of said second sub-group is configured to execute one or more other floating point operations (i.e., additions, to finish the multiply-accumulate).
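For illustration, the chained multiply-add arrangement discussed for claims 5 and 10 can be sketched in Python as two functional units of different latency, where the multiplier passes its result to the adder as an operand.  The class name, helper name, and cycle counts below are hypothetical and chosen only to show the chaining; they do not come from Forsell or Ohkami.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FunctionalUnit:
    name: str
    latency: int  # assumed latency in cycles, illustrative only
    op: Callable[[float, float], float]


def run_chain(units, first, operands):
    """Pass each unit's result to the next unit in the chain as an operand."""
    value, total_latency = first, 0
    for unit, operand in zip(units, operands):
        value = unit.op(value, operand)
        total_latency += unit.latency
    return value, total_latency


# A multiplier is modeled as the longer-latency unit, an adder as the shorter.
multiplier = FunctionalUnit("fmul", latency=4, op=lambda a, b: a * b)
adder = FunctionalUnit("fadd", latency=2, op=lambda a, b: a + b)

# Computes (2.0 * 3.0) + 1.0 through the chain.
print(run_chain([multiplier, adder], 2.0, [3.0, 1.0]))  # (7.0, 6)
```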

Response to Arguments
On page 7 of applicant’s response, it is not clear how the arguments relate to the claimed language and the mappings to the references.  Applicant argues that Forsell has taught chaining, but this does not seem to have any relevance to the claim.  Forsell and Ohkami have taught all language bolded by applicant at the top of page 7, as set forth in the rejection.  That is, Forsell 

On page 8 of the response, applicant argues that there are no ALUs in parallel with the memory segment.
The examiner assumes applicant is referring to the memory request send and receive portions of the pipeline of FIG.4 when talking about the “memory segment”.  While the examiner agrees that no ALU is shown in parallel with this specific segment, the memory segment is not so limited.  The entire “Memory Units” column of FIG.4 may be said to comprise one or more memory access segments.  Therefore, any segment that is in parallel with an ALU of a floating-point or integer pipeline is the claimed memory access segment.

On page 8 of the response, applicant argues that a thread goes through the pipeline once and sub-instructions are executed one by one.
The examiner does not understand how this is relevant to the claims and rejection.  The bolded language preceding this argument is broad.  As can be seen from Forsell’s FIG.4, the memory access pipeline is the same length as any other pipeline, and, thus, has stages that overlap every other pipeline.  As such, where one pipeline may be handling a floating point operation, at least one stage in the memory access pipeline would be taking care of a pending access for a corresponding memory access instruction.
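The stage overlap described here, and the one-cycle-per-stage example given in the rejection of claim 1, can be sketched as a short Python calculation.  The stage depths are assumptions made only for illustration (the memory send stage is placed 5 cycles after the ALU0 stage, one end of the 5-6 cycle range mentioned in the rejection), and the names are hypothetical.

```python
# Assumed stage depths in cycles after an instruction enters the pipeline.
# These values are illustrative and are not read from FIG.4 of Forsell.
ALU0_DEPTH = 2
MEM_SEND_DEPTH = ALU0_DEPTH + 5


def cycle_at_stage(issue_cycle, depth):
    """Cycle in which an instruction occupies a stage, assuming no stalls."""
    return issue_cycle + depth


def executes_simultaneously(mem_issue_cycle, alu_issue_cycle):
    """True if the memory access and the ALU0 operation share a clock cycle."""
    return (cycle_at_stage(mem_issue_cycle, MEM_SEND_DEPTH)
            == cycle_at_stage(alu_issue_cycle, ALU0_DEPTH))


# An ALU0 instruction entered 5 cycles after the memory access overlaps with it.
print(executes_simultaneously(mem_issue_cycle=0, alu_issue_cycle=5))  # True
# Instructions entered in the same cycle reach these two stages at different times.
print(executes_simultaneously(mem_issue_cycle=0, alu_issue_cycle=0))  # False
```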

On page 8, 2nd to last paragraph, applicant argues that Ohkami has not taught fewer numbers of FPUs being executed temporally in series.
The examiner asserts that the combination of references does teach this limitation for reasons set forth in the rejection above. 

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
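As a rough aid to reading the paragraph above, the following Python sketch computes the end of the reply period from the three rules stated there.  It is a simplified model under stated assumptions: the function names are hypothetical, month arithmetic clamps to the last day of shorter months, and weekends, holidays, and extension fees are not modeled.  It is not a substitute for the rules themselves.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Same day of the month N months later, clamped to the month's last day."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_period_end(mailed, first_reply=None, advisory_mailed=None):
    """End of the period for reply to a final action (simplified model)."""
    two_months = add_months(mailed, 2)
    three_months = add_months(mailed, 3)
    six_months = add_months(mailed, 6)

    end = three_months  # default shortened statutory period
    if (first_reply is not None and advisory_mailed is not None
            and first_reply <= two_months and advisory_mailed > three_months):
        end = advisory_mailed  # period runs to the advisory action mailing date
    return min(end, six_months)  # never later than six months from mailing


# Hypothetical dates, for illustration only.
mailed = date(2020, 1, 15)
print(reply_period_end(mailed))  # 2020-04-15
print(reply_period_end(mailed, date(2020, 3, 10), date(2020, 4, 28)))  # 2020-04-28
```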
Any inquiry concerning this communication or earlier communications from the examiner should be directed to David J. Huisman whose telephone number is 571-272-4168.  The examiner can normally be reached on Monday-Friday, 9:00 am-5:30 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

/David J. Huisman/Primary Examiner, Art Unit 2183