DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the communication received on 08/10/2020. The applicant has submitted 20 claims for examination; all claims are currently pending.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of copending Application No. 16/898,829 in view of Bertilsson et al., US 2018/0314780. Although the claims at issue are not identical, they are not patentably distinct from each other because both sets of claims recite the same elements of a system for solving differential equations comprising multiple execution units and respective systolic arrays. Although the claims of the instant application further recite an interface computer, such a feature would have been obvious when considered in combination with the system of Bertilsson, which teaches a modeling computer as an interface to submit and solve differential equations (see table).
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

16/989,821
1. A system comprising: an interface computer configured to receive a problem to be solved, the problem comprising a differential equation and a domain, and to store the received problem in a problem queue; a dispatch computer configured to receive the problem from the problem queue, and to select a solver of a plurality of solvers, based upon availability of each of the plurality of solvers; wherein each solver comprises: a coordinator; a plurality of differential equation accelerator (DEA) units, wherein each DEA unit comprises a plurality of systolic arrays, each systolic array having a hardware configuration for solving a corresponding type of differential equation; wherein the coordinator, in response to receiving the problem from the dispatch computer, partitions the domain into a plurality of sub-domains, and assigns each of the plurality of sub-domains to a DEA unit of the plurality of DEA units; and wherein each of the plurality of DEA units having an assigned sub-domain is configured to process sub-domain data of its assigned sub-domain over a plurality of time-steps.

16/989,829
1. A system configured for discretized solving of partial differential equations, comprising:
Bertilsson (modeling computer, ¶89)
a plurality of systolic arrays, each corresponding to a respective partial differential equation type; and a controller configured to select a systolic array of the plurality of systolic arrays, based upon a type of the partial differential equation, the systolic array having a plurality of sub-arrays and the systolic array configured to receive at least a portion of the plurality of nodes from the first memory, and process each received node in parallel using a respective sub-array of the plurality of sub-arrays, the respective sub-array comprising: a plurality of branches, each corresponding to a respective term of the discretized form of the partial differential equation and comprising a respective set of circuit elements to receive a value of the node to generate the respective term;

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Bertilsson, US 2018/0314780, in view of Tang, US 2022/0148130.
Regarding claims 1, 9 and 17, Bertilsson teaches a system, method and circuit comprising: an interface computer configured to receive a problem to be solved,
["It is contemplated that the GUI module 220 may display GUI windows in connection with obtaining data for use in performing modeling, simulation, and/or other problem solving for one or more processes and/or physics phenomena under consideration by a system user. The one or more processes and/or phenomena may be assembled and solved by the Modeling and Simulation module 222. That is, user data may be gathered or received by the system using modules, such as the GUI module 220, and subsequently used by the Modeling and Simulation module 222. Thereafter, the data may be transferred or forwarded to the Data Storage and Retrieval module 224 where the user-entered data may be stored in a separate data structure (e.g., User Data Files 228). It is contemplated that other data and information may also be stored and retrieved from a separate data structure, such as Libraries 226, which may be used by the Modeling and Simulation module 222 or in connection with the GUI module 220.", ¶89]
[" It is contemplated that the systems and methods described herein may be used for combining physics interfaces that model different physical phenomena or processes. The combination of a plurality of physics interfaces can be referred to as a multiphysics model. Properties of the physics interfaces can be represented by PDEs that may be automatically combined to form PDEs describing physical quantities in a coupled system or representation. The coupled PDEs may be displayed, for example, in an “Equation view” that allows for the coupled PDEs to be modified and used as input into a solver. It is also contemplated that the PDEs may be provided to the solver either independently as one PDE or a system of PDEs, describing a single phenomenon or process, or as one or several systems of PDEs describing several phenomena or processes.", ¶91]

the problem comprising a differential equation and a domain, and to store the received problem in a problem queue (problem models for different problems to be solved, such as acoustics, are maintained in the computer modeling system, ¶s79, 106);
[" It is contemplated that in certain embodiments a product or process may be in the development or feasibility stage where it is being designed or analyzed. The product or process being developed or analyzed may need to be assessed for use in complex environment(s) involving several physical properties and quantities. It can be desirable to solve complex multiphysics problems by systematically varying parametric and geometric features in a computer-based design system. Other desirable features may include, for example, having a computer-based system for solving complex multiphysics problems in which the settings for the physical properties and boundary conditions, located in a memory and used to form multiphysics models and/or solve multiphysics problems, can be accessed directly from the design system.", ¶79]
["The GUI 439 also includes an exemplary list of physics interfaces 440 (e.g., AC/DC, Acoustics, Chemical Species Transport, Electrochemistry, Fluid Flow, Heat Transfer, Plasma, Radio Frequency, Structural Mechanics) from which a user may select in accordance with a user's choice of space dimensions. To add physics interfaces to a multiphysics model, the user selects physics interfaces from the list and may specify that these physics interfaces are to be included in a multiphysics model. For example, the user may right-click and then select context menu item “Add selected” 442 to add a physics interface (e.g., Heat Transfer in Fluids) to a multiphysics model. After selection, this physics interface is added to the list of “Selected physics” 444 below the physics list in the GUI 439. Physics interfaces may also be removed from the list by selecting a “Remove selected” button 446.", ¶106]


wherein each solver comprises a plurality of differential equation accelerator (DEA) units, wherein each DEA unit comprises a plurality of arrays, each array having a hardware configuration for solving a corresponding type of differential equation (solvers for solving a PDE are selected based on the type of equation (i.e., the particular modeling being requested), and processing units execute the computation, which can include a mesh/array configuration, ¶s110, 362, 394)
[" It is contemplated that in certain aspect of the present disclosure, a study can determine the type of analysis that may be done on a multiphysics model, such as stationary, time-dependent, eigenvalue, and eigenfrequency. The study may control the type of equation formulation used in a multiphysics model, the type of mesh (e.g., selected from a list of possible meshes), and/or the type of solvers that may be used to solve the different studies or study steps in a multiphysics model. In one exemplary aspect, a study may comprise a stationary study step followed by a transient study step. The study then formulates the equations, meshes, and solvers for the stationary and time-dependent study steps. A user may select a study from the studies list 550 and then finish the model wizard steps by clicking the “Finish” button 554.", ¶110]
[" According to another aspect of the present disclosure, an apparatus for generating an application data structure includes a physical computing system comprising one or more processing units, one or more user input devices, a display device, and one or more memory devices. At least one of the one or more memory devices includes executable instructions for generating an application data structure. The executable instructions cause at least one of the one or more processing units to perform, upon execution, the acts of embedding a multiphysics model data structure for a physical system in an application data structure. The embedded multiphysics model data structure includes at least one modeling operation for the physical system. One or more geometry subroutines are added to the embedded multiphysics model data structure via at least one of the one or more input devices. At least one of the one or more geometry subroutines includes parameter definitions associated with the physical system. One or more call features are added to the embedded multiphysics data structure via at least one of the one or more input devices. The call features allow implementation of the geometry subroutines. One or more application features are determined, via at least one of said one or more processing units, to add to the application data structure. The one or more application features are associated with a model of the physical system. First data is added, via at least one of the one or more input devices, representing at least one form feature for at least one of the one or more application features for the model of the physical system. Second data is added, via at least one of the one or more input devices, representing at least one action feature for at least one of the one or more application features for the model of the physical system. 
The second data representing the at least one action feature is associated with the least one modeling operation for the physical system to define a sequence of operations for modeling the physical system.", ¶362]
[" It would further be desirable to create customized applications with an application builder system that assigns read data to a target for input and that associates a source for output of written data. A target for input can include, for example, one or more variables or parameters. In some aspects, a variable can be an array including a specific dimensionality with specific data types assigned to array members to at least partially define the one or more variables or parameters.", ¶394]

partitions the domain into a plurality of sub-domains, and assigns each of the plurality of sub-domains to a DEA unit of the plurality of DEA units
["It is contemplated that in certain aspects of the present disclosure physical properties can be used to model physical quantities for component(s) and/or process(es) being examined using the modeling system, and the physical properties can be defined using a GUI that allow the physical properties to be described as numerical values. In certain aspects, physical properties can also be defined as mathematical expressions that include one or more numerical values, space coordinates, time coordinates, and/or the actual physical quantities. In certain aspects, the physical properties may apply to some parts of a geometrical domain, and the physical quantity itself may be undefined in the other parts of the geometrical domain. A geometrical domain or “domain” may be partitioned into disjoint subdomains. The mathematical union of these subdomains forms the geometrical domain or “domain”. The complete boundary of a domain may also be divided into sections referred to as “boundaries”. Adjacent subdomains may have common boundaries referred to as “borders”. The complete boundary is the mathematical union of all the boundaries including, for example, subdomain borders. For example, in certain aspects, a geometrical domain may be one-dimensional, two-dimensional, or three-dimensional in a GUI. However, as described in more detail elsewhere herein, the solvers may be able to handle any space dimension. It is contemplated that through the use of GUIs in one implementation, physical properties on a boundary of a domain may be specified and used to derive the boundary conditions of the PDEs."¶94]

Bertilsson teaches a system for solving differential equations using arrays but does not teach the use of systolic arrays, nor specifically a dispatcher and coordinator as claimed. Thus, Bertilsson does not teach wherein each solver comprises: a coordinator; a plurality of differential equation accelerator (DEA) units, wherein each DEA unit comprises a plurality of systolic arrays, each systolic array having a hardware configuration for solving a corresponding type of differential equation;
wherein the coordinator, in response to receiving the problem from the dispatch computer, partitions the domain into a plurality of sub-domains, and assigns each of the plurality of sub-domains to a DEA unit of the plurality of DEA units; and
wherein each of the plurality of DEA units having an assigned sub-domain is configured to process sub-domain data of its assigned sub-domain over a plurality of time-steps.
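For illustration only, and not as part of the record of either reference, the partition-and-time-step limitation recited above can be sketched in software as follows. The sketch assumes a one-dimensional heat equation with fixed boundary values and an explicit finite-difference update; the names partition_domain, step_subdomain and solve are hypothetical and do not appear in the application, in Bertilsson, or in Tang. Each call to step_subdomain stands in for the work a single DEA unit would perform on its assigned sub-domain during one time-step.

```python
# Illustrative sketch only. All function names are hypothetical; the PDE,
# the explicit scheme and the boundary handling are assumptions.

def partition_domain(n_nodes, n_units):
    """Split node indices 0..n_nodes-1 into contiguous, disjoint sub-domains."""
    base, extra = divmod(n_nodes, n_units)
    bounds, start = [], 0
    for unit in range(n_units):
        size = base + (1 if unit < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def step_subdomain(u, lo, hi, alpha):
    """Advance nodes [lo, hi) by one explicit time-step of u_t = u_xx.

    The previous global state u is read-only here, so values just outside
    [lo, hi) play the role of halo data from neighbouring sub-domains.
    """
    out = []
    for i in range(lo, hi):
        if i == 0 or i == len(u) - 1:
            out.append(u[i])  # fixed (Dirichlet) boundary value
        else:
            out.append(u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1]))
    return out


def solve(u0, n_units, n_steps, alpha=0.25):
    """Coordinator loop: partition once, then advance every sub-domain at
    each time-step and reassemble the global state."""
    bounds = partition_domain(len(u0), n_units)
    u = list(u0)
    for _ in range(n_steps):
        pieces = [step_subdomain(u, lo, hi, alpha) for lo, hi in bounds]
        u = [value for piece in pieces for value in piece]
    return u
```

Because every node is updated with the same arithmetic regardless of which sub-domain owns it, the partitioned result in this sketch matches the unpartitioned one.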
Tang, in the analogous field of parallel processing and modeling, teaches a dispatch computer (thread dispatcher/inter-thread communicator) configured to receive the problem from the problem queue (geometry pipeline, etc.), and to select a solver (compute unit/compute core) of a plurality of solvers, based upon availability of each of the plurality of solvers (data to be processed, such as geometry data, is maintained in a pipeline that feeds the thread dispatcher, which allocates work to available execution units, also called compute units or compute cores, ¶s111, 157)
["In some embodiments, 3D/Media subsystem 315 includes logic for executing threads spawned by 3D pipeline 312 and media pipeline 316. In one embodiment, the pipelines send thread execution requests to 3D/Media subsystem 315, which includes thread dispatch logic for arbitrating and dispatching the various requests to available thread execution resources. The execution resources include an array of graphics execution units to process the 3D and media threads. In some embodiments, 3D/Media subsystem 315 includes one or more internal caches for thread instructions and data. In some embodiments, the subsystem also includes shared memory, including registers and addressable memory, to share data between threads and to store output data.", ¶111]
["In some embodiments, graphics processor 800 includes a geometry pipeline 820, a media pipeline 830, a display engine 840, thread execution logic 850, and a render output pipeline 870. In some embodiments, graphics processor 800 is a graphics processor within a multi-core processing system that includes one or more general-purpose processing cores. The graphics processor is controlled by register writes to one or more control registers (not shown) or via commands issued to graphics processor 800 via a ring interconnect 802. In some embodiments, ring interconnect 802 couples graphics processor 800 to other processing components, such as other graphics processors or general-purpose processors. Commands from ring interconnect 802 are interpreted by a command streamer 803, which supplies instructions to individual components of the geometry pipeline 820 or the media pipeline 830.", ¶157]
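For illustration only, the availability-based selection recited in the dispatch limitation can be sketched as below. This is a minimal software analogy under stated assumptions, not Tang's thread dispatch logic and not the applicant's implementation: the Solver class, its busy flag, and the dispatch function are hypothetical names, and first-in-first-out ordering of the problem queue is an assumption.

```python
# Illustrative sketch only. Solver, busy and dispatch are hypothetical names.
from collections import deque


class Solver:
    """Stand-in for one solver; 'busy' models its availability."""

    def __init__(self, name):
        self.name = name
        self.busy = False


def dispatch(problem_queue, solvers):
    """Assign queued problems to available solvers in FIFO order.

    Returns a list of (problem, solver_name) pairs. Problems for which no
    solver is free remain in the queue.
    """
    assignments = []
    while problem_queue:
        free = next((s for s in solvers if not s.busy), None)
        if free is None:
            break  # every solver is occupied; leave the rest queued
        problem = problem_queue.popleft()
        free.busy = True
        assignments.append((problem, free.name))
    return assignments
```

With two solvers and three queued problems, the first two problems are assigned and the third waits until a solver is marked free again.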

wherein each solver comprises a coordinator (thread arbiter); a plurality of differential equation accelerator (DEA) units (execution units), wherein each DEA unit comprises a plurality of systolic arrays (EU arrays, ¶73, which are systolic arrays, ¶144), each systolic array having a hardware configuration for solving a corresponding type of differential equation; wherein the coordinator, in response to receiving the problem from the dispatch computer (solvers selected based on type to solve the PDEs (partial differential equations) of Bertilsson, with such solvers executed in the hardware configuration of Tang, Tang ¶s73, 77, 102, 144)
[“The graphics microcontroller 233 can be configured to perform various scheduling and management tasks for the graphics processor core 219. In one embodiment the graphics microcontroller 233 can perform graphics and/or compute workload scheduling on the various graphics parallel engines within execution unit (EU) arrays 222A-222F, 224A-224F within the sub-cores 221A-221F. In this scheduling model, host software executing on a CPU core of an SoC including the graphics processor core 219 can submit workloads one of multiple graphic processor doorbells, which invokes a scheduling operation on the appropriate graphics engine. Scheduling operations include determining which workload to run next, submitting a workload to a command streamer, pre-empting existing workloads running on an engine, monitoring progress of a workload, and notifying host software when a workload is complete. In one embodiment the graphics microcontroller 233 can also facilitate low-power or idle states for the graphics processor core 219, providing the graphics processor core 219 with the ability to save and restore registers within the graphics processor core 219 across low-power state transitions independently from the operating system and/or graphics driver software on the system”, ¶73].
[“Within each graphics sub-core 221A-221F includes a set of execution resources that may be used to perform graphics, media, and compute operations in response to requests by graphics pipeline, media pipeline, or shader programs. The graphics sub-cores 221A-221F include multiple EU arrays 222A-222F, 224A-224F, thread dispatch and inter-thread communication (TD/IC) logic 223A-223F, a 3D (e.g., texture) sampler 225A-225F, a media sampler 206A-206F, a shader processor 227A-227F, and shared local memory (SLM) 228A-228F. The EU arrays 222A-222F, 224A-224F each include multiple execution units, which are general-purpose graphics processing units capable of performing floating-point and integer/fixed-point logic operations in service of a graphics, media, or compute operation, including graphics, media, or compute shader programs. The TD/IC logic 223A-223F performs local thread dispatch and thread control operations for the execution units within a sub-core and facilitate communication between threads executing on the execution units of the sub-core. The 3D sampler 225A-225F can read texture or other 3D graphics related data into memory. The 3D sampler can read texture data differently based on a configured sample state and the texture format associated with a given texture. The media sampler 206A-206F can perform similar read operations based on the type and format associated with media data. In one embodiment, each graphics sub-core 221A-221F can alternately include a unified 3D and media sampler. Threads executing on the execution units within each of the sub-cores 221A-221F can make use of shared local memory 228A-228F within each sub-core, to enable threads executing within a thread group to execute using a common pool of on-chip memory.”, ¶77]
[“FIG. 2D is a block diagram of general purpose graphics processing unit (GPGPU) 270 that can be configured as a graphics processor and/or compute accelerator, according to embodiments described herein. The GPGPU 270 can interconnect with host processors (e.g., one or more CPU(s) 246) and memory 271, 272 via one or more system and/or memory busses. In one embodiment the memory 271 is system memory that may be shared with the one or more CPU(s) 246, while memory 272 is device memory that is dedicated to the GPGPU 270.”, ¶102]
[“The execution unit 600 also includes a compute unit 610 that includes multiple different types of functional units. In one embodiment the compute unit 610 includes an ALU unit 611 that includes an array of arithmetic logic units. The ALU unit 611 can be configured to perform 64-bit, 32-bit, and 16-bit integer and floating point operations. Integer and floating point operations may be performed simultaneously. The compute unit 610 can also include a systolic array 612, and a math unit 613. The systolic array 612 includes a W wide and D deep network of data processing units that can be used to perform vector or other data-parallel operations in a systolic manner. In one embodiment the systolic array 612 can be configured to perform matrix operations, such as matrix dot product operations. In one embodiment the systolic array 612 support 16-bit floating point operations, as well as 8-bit and 4-bit integer operations. In one embodiment the systolic array 612 can be configured to accelerate machine learning operations. In such embodiments, the systolic array 612 can be configured with support for the bfloat 16-bit floating point format. In one embodiment, a math unit 613 can be included to perform a specific subset of mathematical operations in an efficient and lower-power manner than then ALU unit 611. The math unit 613 can include a variant of math logic that may be found in shared function logic of a graphics processing engine provided by other embodiments (e.g., math logic 422 of the shared function logic 420 of FIG. 4). In one embodiment the math unit 613 can be configured to perform 32-bit and 64-bit floating point operations.”, ¶144]
partitions the domain into a plurality of sub-domains, and assigns each of the plurality of sub-domains to a DEA unit of the plurality of DEA units (each core can be further partitioned into sub-cores/sub-slices with respective sets of systolic arrays),
[" FIGS. 5A-5B illustrate thread execution logic 500 including an array of processing elements employed in a graphics processor core according to embodiments described herein. Elements of FIGS. 5A-5B having the same reference numbers (or names) as the elements of any other figure herein can operate or function in any manner similar to that described elsewhere herein, but are not limited to such. FIG. 5A-5B illustrates an overview of thread execution logic 500, which may be representative of hardware logic illustrated with each sub-core 221A-221F of FIG. 2B. FIG. 5A is representative of an execution unit within a general-purpose graphics processor, while FIG. 5B is representative of an execution unit that may be used within a compute accelerator.", ¶126]
["In one embodiment, arrays of multiple instances of the graphics execution unit 508 can be instantiated in a graphics sub-core grouping (e.g., a sub-slice). For scalability, product architects can choose the exact number of execution units per sub-core grouping. In one embodiment the execution unit 508 can execute instructions across a plurality of execution channels. In a further embodiment, each thread executed on the graphics execution unit 508 is executed on a different channel.", ¶142]
["The execution unit 600 also includes a compute unit 610 that includes multiple different types of functional units. In one embodiment the compute unit 610 includes an ALU unit 611 that includes an array of arithmetic logic units. The ALU unit 611 can be configured to perform 64-bit, 32-bit, and 16-bit integer and floating point operations. Integer and floating point operations may be performed simultaneously. The compute unit 610 can also include a systolic array 612, and a math unit 613. The systolic array 612 includes a W wide and D deep network of data processing units that can be used to perform vector or other data-parallel operations in a systolic manner. In one embodiment the systolic array 612 can be configured to perform matrix operations, such as matrix dot product operations. In one embodiment the systolic array 612 support 16-bit floating point operations, as well as 8-bit and 4-bit integer operations. In one embodiment the systolic array 612 can be configured to accelerate machine learning operations. In such embodiments, the systolic array 612 can be configured with support for the bfloat 16-bit floating point format. In one embodiment, a math unit 613 can be included to perform a specific subset of mathematical operations in an efficient and lower-power manner than then ALU unit 611. The math unit 613 can include a variant of math logic that may be found in shared function logic of a graphics processing engine provided by other embodiments (e.g., math logic 422 of the shared function logic 420 of FIG. 4). In one embodiment the math unit 613 can be configured to perform 32-bit and 64-bit floating point operations.", ¶144]
and wherein each of the plurality of DEA units having an assigned sub-domain is configured to process sub-domain data of its assigned sub-domain over a plurality of time-steps (sub-arrays of Tang, ¶144, 234, executing the problem sub-domains of Bertilsson, ¶94).
["The execution unit 600 also includes a compute unit 610 that includes multiple different types of functional units. In one embodiment the compute unit 610 includes an ALU unit 611 that includes an array of arithmetic logic units. The ALU unit 611 can be configured to perform 64-bit, 32-bit, and 16-bit integer and floating point operations. Integer and floating point operations may be performed simultaneously. The compute unit 610 can also include a systolic array 612, and a math unit 613. The systolic array 612 includes a W wide and D deep network of data processing units that can be used to perform vector or other data-parallel operations in a systolic manner. In one embodiment the systolic array 612 can be configured to perform matrix operations, such as matrix dot product operations. In one embodiment the systolic array 612 support 16-bit floating point operations, as well as 8-bit and 4-bit integer operations. In one embodiment the systolic array 612 can be configured to accelerate machine learning operations. In such embodiments, the systolic array 612 can be configured with support for the bfloat 16-bit floating point format. In one embodiment, a math unit 613 can be included to perform a specific subset of mathematical operations in an efficient and lower-power manner than then ALU unit 611. The math unit 613 can include a variant of math logic that may be found in shared function logic of a graphics processing engine provided by other embodiments (e.g., math logic 422 of the shared function logic 420 of FIG. 4). In one embodiment the math unit 613 can be configured to perform 32-bit and 64-bit floating point operations.", ¶144]
["FIG. 16 illustrates an exemplary recurrent neural network. In a recurrent neural network (RNN), the previous state of the network influences the output of the current state of the network. RNNs can be built in a variety of ways using a variety of functions. The use of RNNs generally revolves around using mathematical models to predict the future based on a prior sequence of inputs. For example, an RNN may be used to perform statistical language modeling to predict an upcoming word given a previous sequence of words. The illustrated RNN 1600 can be described as having an input layer 1602 that receives an input vector, hidden layers 1604 to implement a recurrent function, a feedback mechanism 1605 to enable a ‘memory’ of previous states, and an output layer 1606 to output a result. The RNN 1600 operates based on time-steps. The state of the RNN at a given time step is influenced based on the previous time step via the feedback mechanism 1605. For a given time step, the state of the hidden layers 1604 is defined by the previous state and the input at the current time step. An initial input (x.sub.1) at a first time step can be processed by the hidden layer 1604. A second input (x.sub.2) can be processed by the hidden layer 1604 using state information that is determined during the processing of the initial input (x.sub.1). A given state can be computed as s.sub.t=ƒ(Ux.sub.t+Ws.sub.t−1), where U and W are parameter matrices. The function ƒ is generally a nonlinearity, such as the hyperbolic tangent function (Tan h) or a variant of the rectifier function ƒ(x)=max(0,x). However, the specific mathematical function used in the hidden layers 1604 can vary depending on the specific implementation details of the RNN 1600.", ¶234]
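For illustration only, the time-step recurrence quoted from Tang ¶234, s_t = f(U x_t + W s_{t-1}), can be written out as the short sketch below. The hyperbolic tangent is used for f, as the quoted passage names it as one option; the function names matvec and rnn_states and the list-of-lists matrix representation are assumptions made for this example.

```python
# Illustrative sketch only. matvec and rnn_states are hypothetical names.
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_states(inputs, U, W, s0):
    """Return the hidden state after each time-step.

    Each state depends on the current input and on the previous state,
    following s_t = tanh(U x_t + W s_{t-1}).
    """
    states, s = [], s0
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, s))]
        s = [math.tanh(p) for p in pre]
        states.append(s)
    return states
```

The loop makes the dependence on the previous time-step explicit: the state computed for one input is fed back in when the next input is processed.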


It would have been obvious to a person of ordinary skill in the art at the time of the effective filing of the instant application to modify Bertilsson with the architecture for compute units as taught by Tang. The reason for this modification would be to apply the well-known use of systolic array architectures to provide efficient processing of large computational tasks.
Regarding claims 2, 10 and 18, Tang/Bertilsson teach wherein a DEA unit of the plurality of DEA units is configured to process sub-domain data over the plurality of time-steps by: storing the sub-domain data in a first memory portion (each compute unit comprises multiple caches to hold data for systolic arrays, ¶s 61, 133, 159, 163);
selecting a systolic array of the plurality of systolic arrays for processing the sub-domain data, based upon a type of differential equation of the problem (sub-arrays of Tang, ¶144, 234, executing the problem sub-domains of Bertilsson, ¶94);
processing the sub-domain data using the selected systolic array to generate processed sub-domain data (sub-arrays of Tang, ¶144, 234, executing the problem sub-domains of Bertilsson, ¶94);
storing the processed sub-domain data in a second memory portion separate from the first memory portion (each compute unit comprises multiple caches to hold data for systolic arrays, ¶s 61, 133, 159, 163).
[Tang," Each of processor cores 202A-202N includes one or more internal cache units 204A-204N. ", ¶61]
[" One or more internal instruction caches (e.g., 506) are included in the thread execution logic 500 to cache thread instructions for the execution units. In some embodiments, one or more data caches (e.g., 512) are included to cache thread data during thread execution. Threads executing on the execution logic 500 can also store explicitly managed data in the shared local memory 511. In some embodiments, a sampler 510 is included to provide texture sampling for 3D operations and media sampling for media operations. In some embodiments, sampler 510 includes specialized texture or media sampling functionality to process texture or media data during the sampling process before providing the sampled data to an execution unit.", ¶133]

[Tang," In some embodiments, execution units 852A-852B are an array of vector processors having an instruction set for performing graphics and media operations. In some embodiments, execution units 852A-852B have an attached L1 cache 851 that is specific for each array or shared between the arrays. The cache can be configured as a data cache, an instruction cache, or a single cache that is partitioned to contain data and instructions in different partitions.", ¶159]

[Tang," The graphics processor 800 has an interconnect bus, interconnect fabric, or some other interconnect mechanism that allows data and message passing amongst the major components of the processor. In some embodiments, execution units 852A-852B and associated logic units (e.g., L1 cache 851, sampler 854, texture cache 858, etc.) interconnect via a data port 856 to perform memory access and communicate with render output pipeline components of the processor. In some embodiments, sampler 854, caches 851, 858 and execution units 852A-852B each have separate memory access paths. In one embodiment the texture cache 858 can also be configured as a sampler cache.", ¶163]

[Tang,"The execution unit 600 also includes a compute unit 610 that includes multiple different types of functional units. In one embodiment the compute unit 610 includes an ALU unit 611 that includes an array of arithmetic logic units. The ALU unit 611 can be configured to perform 64-bit, 32-bit, and 16-bit integer and floating point operations. Integer and floating point operations may be performed simultaneously. The compute unit 610 can also include a systolic array 612, and a math unit 613. The systolic array 612 includes a W wide and D deep network of data processing units that can be used to perform vector or other data-parallel operations in a systolic manner. In one embodiment the systolic array 612 can be configured to perform matrix operations, such as matrix dot product operations. In one embodiment the systolic array 612 support 16-bit floating point operations, as well as 8-bit and 4-bit integer operations. In one embodiment the systolic array 612 can be configured to accelerate machine learning operations. In such embodiments, the systolic array 612 can be configured with support for the bfloat 16-bit floating point format. In one embodiment, a math unit 613 can be included to perform a specific subset of mathematical operations in an efficient and lower-power manner than then ALU unit 611. The math unit 613 can include a variant of math logic that may be found in shared function logic of a graphics processing engine provided by other embodiments (e.g., math logic 422 of the shared function logic 420 of FIG. 4). In one embodiment the math unit 613 can be configured to perform 32-bit and 64-bit floating point operations.", ¶144]
[Tang,"FIG. 16 illustrates an exemplary recurrent neural network. In a recurrent neural network (RNN), the previous state of the network influences the output of the current state of the network. RNNs can be built in a variety of ways using a variety of functions. The use of RNNs generally revolves around using mathematical models to predict the future based on a prior sequence of inputs. For example, an RNN may be used to perform statistical language modeling to predict an upcoming word given a previous sequence of words. The illustrated RNN 1600 can be described as having an input layer 1602 that receives an input vector, hidden layers 1604 to implement a recurrent function, a feedback mechanism 1605 to enable a ‘memory’ of previous states, and an output layer 1606 to output a result. The RNN 1600 operates based on time-steps. The state of the RNN at a given time step is influenced based on the previous time step via the feedback mechanism 1605. For a given time step, the state of the hidden layers 1604 is defined by the previous state and the input at the current time step. An initial input (x.sub.1) at a first time step can be processed by the hidden layer 1604. A second input (x.sub.2) can be processed by the hidden layer 1604 using state information that is determined during the processing of the initial input (x.sub.1). A given state can be computed as s.sub.t=ƒ(Ux.sub.t+Ws.sub.t−1), where U and W are parameter matrices. The function ƒ is generally a nonlinearity, such as the hyperbolic tangent function (Tan h) or a variant of the rectifier function ƒ(x)=max(0,x). However, the specific mathematical function used in the hidden layers 1604 can vary depending on the specific implementation details of the RNN 1600.", ¶234]
[Bertilsson, "It is contemplated that in certain aspects of the present disclosure physical properties can be used to model physical quantities for component(s) and/or process(es) being examined using the modeling system, and the physical properties can be defined using a GUI that allow the physical properties to be described as numerical values. In certain aspects, physical properties can also be defined as mathematical expressions that include one or more numerical values, space coordinates, time coordinates, and/or the actual physical quantities. In certain aspects, the physical properties may apply to some parts of a geometrical domain, and the physical quantity itself may be undefined in the other parts of the geometrical domain. A geometrical domain or “domain” may be partitioned into disjoint subdomains. The mathematical union of these subdomains forms the geometrical domain or “domain”. The complete boundary of a domain may also be divided into sections referred to as “boundaries”. Adjacent subdomains may have common boundaries referred to as “borders”. The complete boundary is the mathematical union of all the boundaries including, for example, subdomain borders. For example, in certain aspects, a geometrical domain may be one-dimensional, two-dimensional, or three-dimensional in a GUI. However, as described in more detail elsewhere herein, the solvers may be able to handle any space dimension. It is contemplated that through the use of GUIs in one implementation, physical properties on a boundary of a domain may be specified and used to derive the boundary conditions of the PDEs."¶94]
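The four steps recited in claims 2, 10 and 18 (store in a first memory portion, select a processing array by equation type, process, store in a second memory portion) can be pictured with the following minimal sketch. Everything in it is an assumption made for illustration: the two kernel functions merely stand in for systolic arrays, the equation-type labels "heat" and "advection" are hypothetical, and neither Tang nor Bertilsson discloses this code.

```python
def heat_kernel(u, i, c):
    """Explicit diffusion update at point i (stands in for one array)."""
    return u[i] + c * (u[i - 1] - 2.0 * u[i] + u[i + 1])

def advection_kernel(u, i, c):
    """First-order upwind update at point i (stands in for another array)."""
    return u[i] - c * (u[i] - u[i - 1])

# Hypothetical mapping from differential equation type to processing array.
KERNELS = {"heat": heat_kernel, "advection": advection_kernel}

def process_subdomain(data, equation_type, c, steps):
    kernel = KERNELS[equation_type]   # select an array by equation type
    first = list(data)                # first memory portion (read side)
    second = list(data)               # second memory portion (write side)
    for _ in range(steps):
        # Read from the first portion, write results to the second portion.
        for i in range(1, len(first) - 1):
            second[i] = kernel(first, i, c)
        # Swap roles so the next time step reads the newest results.
        first, second = second, first
    return first                      # end points are held fixed throughout

heat_result = process_subdomain([0.0, 0.0, 1.0, 0.0, 0.0], "heat", 0.25, 1)
advection_result = process_subdomain([0.0, 1.0, 0.0, 0.0], "advection", 1.0, 1)
```

Keeping the read and write buffers separate means no point is updated from a value that was already overwritten during the same time step.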


Regarding claims 3 and 11, Tang teaches wherein the DEA unit is further configured to, during a time step of the plurality of time steps: store the processed sub-domain data in a third memory portion separate from the first and second memory portions; and transmit the processed sub-domain data from the third memory portion to the coordinator (data computed from each array and stored in one of multiple caches in a core can be passed to the thread dispatcher/inter-thread communicator to share data between cores, ¶74).
["The graphics processor core 219 may have greater than or fewer than the illustrated sub-cores 221A-221F, up to N modular sub-cores. For each set of N sub-cores, the graphics processor core 219 can also include shared function logic 235, shared and/or cache memory 236, a geometry/fixed function pipeline 237, as well as additional fixed function logic 238 to accelerate various graphics and compute processing operations. The shared function logic 235 can include logic units associated with the shared function logic 420 of FIG. 4 (e.g., sampler, math, and/or inter-thread communication logic) that can be shared by each N sub-cores within the graphics processor core 219. The shared and/or cache memory 236 can be a last-level cache for the set of N sub-cores 221A-221F within the graphics processor core 219, and can also serve as shared memory that is accessible by multiple sub-cores. The geometry/fixed function pipeline 237 can be included instead of the geometry/fixed function pipeline 231 within the fixed function block 230 and can include the same or similar logic units.", ¶74]

Regarding claims 4 and 12, Tang teaches wherein the DEA unit further comprises a parameter storage separate from the first and second memory portions configured to store one or more constants associated with the problem.
["The compute units 260A-260N can couple with a constant cache 267, which can be used to store constant data, which is data that will not change during the run of kernel or shader program that executes on the GPGPU 270. In one embodiment the constant cache 267 is a scalar data cache and cached data can be fetched directly into the scalar registers 262.", ¶103]

Regarding claims 5, 13 and 19, wherein a first DEA unit of the plurality of DEA units having an assigned sub-domain is configured to, for each time-step, receive external domain data corresponding to a portion of sub-domain data assigned to a second DEA unit of the plurality of DEA units, the external domain data corresponding to a portion of the domain adjacent to the sub-domain assigned to the first DEA unit (external cache used to hold input data from the modeling system (i.e. data set in modeling parameters), such data being fed as inputs to the cores/arrays executing a portion of the computation, ¶s 106, 234).
["FIG. 16 illustrates an exemplary recurrent neural network. In a recurrent neural network (RNN), the previous state of the network influences the output of the current state of the network. RNNs can be built in a variety of ways using a variety of functions. The use of RNNs generally revolves around using mathematical models to predict the future based on a prior sequence of inputs. For example, an RNN may be used to perform statistical language modeling to predict an upcoming word given a previous sequence of words. The illustrated RNN 1600 can be described as having an input layer 1602 that receives an input vector, hidden layers 1604 to implement a recurrent function, a feedback mechanism 1605 to enable a ‘memory’ of previous states, and an output layer 1606 to output a result. The RNN 1600 operates based on time-steps. The state of the RNN at a given time step is influenced based on the previous time step via the feedback mechanism 1605. For a given time step, the state of the hidden layers 1604 is defined by the previous state and the input at the current time step. An initial input (x.sub.1) at a first time step can be processed by the hidden layer 1604. A second input (x.sub.2) can be processed by the hidden layer 1604 using state information that is determined during the processing of the initial input (x.sub.1). A given state can be computed as s.sub.t=ƒ(Ux.sub.t+Ws.sub.t−1), where U and W are parameter matrices. The function ƒ is generally a nonlinearity, such as the hyperbolic tangent function (Tan h) or a variant of the rectifier function ƒ(x)=max(0,x). However, the specific mathematical function used in the hidden layers 1604 can vary depending on the specific implementation details of the RNN 1600.", ¶234]
[“FIG. 3A is a block diagram of a graphics processor 300, which may be a discrete graphics processing unit, or may be a graphics processor integrated with a plurality of processing cores, or other semiconductor devices such as, but not limited to, memory devices or network interfaces. In some embodiments, the graphics processor communicates via a memory mapped I/O interface to registers on the graphics processor and with commands placed into the processor memory. In some embodiments, graphics processor 300 includes a memory interface 314 to access memory. Memory interface 314 can be an interface to local memory, one or more internal caches, one or more shared external caches, and/or to system memory.", ¶106]
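The limitation of claims 5, 13 and 19, in which one unit receives at each time-step the data of the adjacent portion of a neighboring sub-domain, corresponds to what is commonly called a halo (ghost cell) exchange. The sketch below is a hypothetical illustration only: it assumes a one-dimensional domain split into two sub-domains and an explicit diffusion update, none of which is taken from the cited references.

```python
def step(u, left_ghost, right_ghost, c):
    """Advance one time step using one external value on each side."""
    padded = [left_ghost] + u + [right_ghost]
    return [padded[i] + c * (padded[i - 1] - 2.0 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)]

def solve_split(domain, c, steps, boundary=0.0):
    """Two units, each owning half the domain, exchange edges every step."""
    mid = len(domain) // 2
    a, b = domain[:mid], domain[mid:]
    for _ in range(steps):
        ghost_for_a = b[0]     # external domain data received by unit A
        ghost_for_b = a[-1]    # external domain data received by unit B
        a, b = (step(a, boundary, ghost_for_a, c),
                step(b, ghost_for_b, boundary, c))
    return a + b

def solve_whole(domain, c, steps, boundary=0.0):
    """Reference: the same update applied to the undivided domain."""
    u = list(domain)
    for _ in range(steps):
        u = step(u, boundary, boundary, c)
    return u

initial = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
split_result = solve_split(initial, 0.25, 3)
whole_result = solve_whole(initial, 0.25, 3)
```

With the edge values exchanged before every step, the partitioned computation reproduces the result obtained on the undivided domain.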

Regarding claims 6 and 14, Tang teaches wherein the first DEA unit is configured to, during each time-step, process data of its assigned sub-domain in an order such that a portion of the sub-domain farther from the external domain data is processed before a portion of the sub-domain nearer the external domain data (sub-cores can implement a neural net where input data from one node feeds the next node until final data is output, ¶226,226; final data is written to external cache, ¶106; such an arrangement implements pipelined processing in which the processors that perform the first stage of a processing task are farther from external memory as compared to later pipeline stages).
[“FIG. 3A is a block diagram of a graphics processor 300, which may be a discrete graphics processing unit, or may be a graphics processor integrated with a plurality of processing cores, or other semiconductor devices such as, but not limited to, memory devices or network interfaces. In some embodiments, the graphics processor communicates via a memory mapped I/O interface to registers on the graphics processor and with commands placed into the processor memory. In some embodiments, graphics processor 300 includes a memory interface 314 to access memory. Memory interface 314 can be an interface to local memory, one or more internal caches, one or more shared external caches, and/or to system memory.", ¶106]
["Recurrent neural networks (RNNs) are a family of feedforward neural networks that include feedback connections between layers. RNNs enable modeling of sequential data by sharing parameter data across different parts of the neural network. The architecture for a RNN includes cycles. The cycles represent the influence of a present value of a variable on its own value at a future time, as at least a portion of the output data from the RNN is used as feedback for processing subsequent input in a sequence. This feature makes RNNs particularly useful for language processing due to the variable nature in which language data can be composed.", ¶222]
["FIG. 16 illustrates an exemplary recurrent neural network. In a recurrent neural network (RNN), the previous state of the network influences the output of the current state of the network. RNNs can be built in a variety of ways using a variety of functions. The use of RNNs generally revolves around using mathematical models to predict the future based on a prior sequence of inputs. For example, an RNN may be used to perform statistical language modeling to predict an upcoming word given a previous sequence of words. The illustrated RNN 1600 can be described as having an input layer 1602 that receives an input vector, hidden layers 1604 to implement a recurrent function, a feedback mechanism 1605 to enable a ‘memory’ of previous states, and an output layer 1606 to output a result. The RNN 1600 operates based on time-steps. The state of the RNN at a given time step is influenced based on the previous time step via the feedback mechanism 1605. For a given time step, the state of the hidden layers 1604 is defined by the previous state and the input at the current time step. An initial input (x.sub.1) at a first time step can be processed by the hidden layer 1604. A second input (x.sub.2) can be processed by the hidden layer 1604 using state information that is determined during the processing of the initial input (x.sub.1). A given state can be computed as s.sub.t=ƒ(Ux.sub.t+Ws.sub.t−1), where U and W are parameter matrices. The function ƒ is generally a nonlinearity, such as the hyperbolic tangent function (Tan h) or a variant of the rectifier function ƒ(x)=max(0,x). However, the specific mathematical function used in the hidden layers 1604 can vary depending on the specific implementation details of the RNN 1600.", ¶234]

Regarding claims 7 and 15, Bertilsson/Tang teach wherein the plurality of DEA units of the solver are connected in an arrangement based upon a partitioning scheme used by the coordinator to partition the domain into the plurality of sub-domains (the partitioning of a domain into subdomains (Bertilsson, ¶94) would be applied to the partitioning of a systolic array into sub-arrays/sub-slices (Tang, ¶142)).
[Bertilsson, "It is contemplated that in certain aspects of the present disclosure physical properties can be used to model physical quantities for component(s) and/or process(es) being examined using the modeling system, and the physical properties can be defined using a GUI that allow the physical properties to be described as numerical values. In certain aspects, physical properties can also be defined as mathematical expressions that include one or more numerical values, space coordinates, time coordinates, and/or the actual physical quantities. In certain aspects, the physical properties may apply to some parts of a geometrical domain, and the physical quantity itself may be undefined in the other parts of the geometrical domain. A geometrical domain or “domain” may be partitioned into disjoint subdomains. The mathematical union of these subdomains forms the geometrical domain or “domain”. The complete boundary of a domain may also be divided into sections referred to as “boundaries”. Adjacent subdomains may have common boundaries referred to as “borders”. The complete boundary is the mathematical union of all the boundaries including, for example, subdomain borders. For example, in certain aspects, a geometrical domain may be one-dimensional, two-dimensional, or three-dimensional in a GUI. However, as described in more detail elsewhere herein, the solvers may be able to handle any space dimension. It is contemplated that through the use of GUIs in one implementation, physical properties on a boundary of a domain may be specified and used to derive the boundary conditions of the PDEs."¶94]
["In one embodiment, arrays of multiple instances of the graphics execution unit 508 can be instantiated in a graphics sub-core grouping (e.g., a sub-slice). For scalability, product architects can choose the exact number of execution units per sub-core grouping. In one embodiment the execution unit 508 can execute instructions across a plurality of execution channels. In a further embodiment, each thread executed on the graphics execution unit 508 is executed on a different channel.", ¶142]

Regarding claims 8, 16 and 20, Bertilsson teaches wherein the coordinator is further configured to: receive, from each of the plurality of DEA units having an assigned sub-domain, processed sub-domain data corresponding to the sub-domain processed over the plurality of time-steps; generate a solution package corresponding to the problem by assembling the received processed sub-domain data from each of the plurality of DEA units; transmit the solution package to a solution queue of the interface computer (results from various core/execution units can be assembled into a solution at another server and sent to the modeling computer for display, or stored at the modeling system for display via another server, ¶78).

["It is contemplated that computer systems on which modeling systems operate, such as the modeling systems described herein, can include networked computers or processors. In certain embodiments, processors may be operating directly on the modeling system user's computer, and in other embodiments, a processor may be operating remotely. For example, a user may provide various input parameters at one computer or terminal located at a certain location. Those parameters may be processed locally on the one computer or they may be transferred over a local area network or a wide area network, to another processor, located elsewhere on the network that is configured to process the input parameters. The second processor may be associated with a server connected to the Internet (or other network) or the second processor can be several processors connected to the Internet (or other network), each handling select function(s) for developing and solving a problem on the modeling system. It is further contemplated that the results of the processing by the one or more processors can then be assembled at yet another server or processor. It is also contemplated that the results may be assembled back at the terminal or computer where the user is situated. The terminal or computer where the user is situated can then display the solution of the multiphysics modeling system to the user via a display (e.g., a transient display) or in hard copy form (e.g., via a printer). Alternatively or in addition, the solution may be stored in a memory associated with the terminal or computer, or the solution may be stored on another server that the user may access to obtain the solution from the modeling system.", ¶78]
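The receive, assemble, and transmit sequence recited in claims 8, 16 and 20 can be sketched as follows. This is an illustration under stated assumptions, not a description of Bertilsson: the dictionary keyed by sub-domain index, the `problem_id` field, and the use of Python's standard `queue.Queue` as the solution queue are all hypothetical.

```python
from queue import Queue

def assemble_solution(results):
    """Concatenate per-unit results in sub-domain order.

    `results` maps a sub-domain index to that unit's processed data;
    units may report in any order, so the indices are sorted first.
    """
    solution = []
    for index in sorted(results):
        solution.extend(results[index])
    return solution

# Hypothetical solution queue of the interface computer.
solution_queue = Queue()

# Hypothetical processed sub-domain data, received out of order.
results = {1: [3.0, 4.0], 0: [1.0, 2.0], 2: [5.0]}

package = {"problem_id": "demo", "solution": assemble_solution(results)}
solution_queue.put(package)
```

Sorting by sub-domain index restores the spatial ordering of the domain regardless of the order in which the units finish.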

Relevant Art Cited By The Examiner

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 2020/0301876 - Digital Processing Connectivity
US 2018/0307980 - SPECIALIZED FIXED FUNCTION HARDWARE FOR EFFICIENT CONVOLUTION 
US 6,041,398 - Massively Parallel Multiple-folded Clustered Processor Mesh Array  
US 2021/0019591 - SYSTEM AND METHOD FOR PERFORMING SMALL CHANNEL COUNT CONVOLUTIONS IN ENERGY-EFFICIENT INPUT OPERAND STATIONARY ACCELERATOR



Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TOM Y. CHANG whose telephone number is (571)270-5938. The examiner can normally be reached on Monday - Thursday from 9am to 5pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, William Trost, can be reached on (571)272-7872. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/TOM Y CHANG/
Primary Examiner, Art Unit 2456