DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on May 6, 2021 were filed after the mailing date of the application on May 6, 2021.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
Double Patenting
3.	The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
4.	Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-19 of U.S. Patent No. 11,023,560, as shown in the tables below. Claims 1-11 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-11 respectively of U.S. Patent No. 10,380,222.  Although the claims at issue are not identical, they are not patentably distinct from each other because the instant application claims are broader in every aspect than the patent claims and are therefore an obvious variant thereof.
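17/313,324 | Claim 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
11,023,560 | Claim 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
10,380,222 | Claim 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | - | -

17/313,324 | 14 | 15 | 16 | 17 | 18 | 19 | 20
11,023,560 | 13 | 14 | 15 | 16 | 17 | 18 | 19
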
17/313,324 (Claim 1) | 11,023,560 (Claim 1)
A system comprising:  a first graphics processing unit that processes a first data block of a data matrix associated with a matrix factorization system to generate first information for the matrix factorization system; and | A system comprising:  a first graphics processing unit that processes a first data block of a data matrix associated with a matrix factorization system to generate first information for the matrix factorization system; and
a second graphics processing unit that processes a first portion of a second data block of the data matrix separate from a second portion of the second data block to generate second information for the matrix factorization system, | a second graphics processing unit that processes a first portion of a second data block of the data matrix separate from a second portion of the second data block to generate second information for the matrix factorization system,
wherein the second data block comprises distinct features compared to the first data block. | wherein the second data block comprises disjoint, unrelated features compared to the first data block.


17/313,324 (Claim 1) | 10,380,222 (Claim 1)
A system comprising:  a first graphics processing unit that processes a first data block of a data matrix associated with a matrix factorization system to generate first information for the matrix factorization system; and | A system comprising:  a first graphics processing unit that processes a first data block of a data matrix associated with a matrix factorization system to generate first information for the matrix factorization system;
a second graphics processing unit that processes a first portion of a second data block of the data matrix separate from a second portion of the second data block to generate second information for the matrix factorization system, | a second graphics processing unit that processes a first portion of a second data block of the data matrix separate from a second portion of the second data block to generate second information for the matrix factorization system,
wherein the second data block comprises distinct features compared to the first data block. | wherein the second data block comprises disjoint, unrelated features compared to the first data block


Claim Rejections - 35 USC § 103
5.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
6.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

7.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

8.	Claims 1, 6, 7, and 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Zou (US 20160321776A1).
9.	As per Claim 1, Zou teaches a system comprising:  a first graphics processing unit that processes a first data block of a data matrix associated with a matrix factorization system to generate first information for the matrix factorization system; and a second graphics processing unit that processes a first portion of a second data block of the data matrix separate from a second portion of the second data block to generate second information for the matrix factorization system, wherein the second data block comprises distinct features compared to the first data block (a storage model parameter matrix and a storage gradient matrix are equally divided into partitions spatially, the number of partitions depends on the number of data parallel groups (i.e., the total number of worker groups) and is half thereof, this is in consideration of making full use of communication efficiency between GPUs, and because GPU communication is peer to peer and two GPUs participate into one communication, the number of the groups is the total number of worker groups (i.e., the total number of the GPUs for a single-GPU worker group)/2, [0091], each GPU is bound to one worker, and two adjacent GPUs make up one worker group, that is, parallel training configuration where four channels of data are parallel and two GPU models in the group are parallel is formed, one GPU in each worker group is responsible for one part of the training model, while the other GPU is responsible for the other part of the training model, and each GPU corresponds to one worker, [0117]).  “Distinct” means that there is a difference between them.  Thus, two things can be related but still have a difference between them.  
Since Zou teaches dividing the data matrix into partitions, and each worker group processes a different partition of the data matrix [0091] and there are two GPUs in each worker group [0117], then a GPU in a first worker group processes a first data block of the data matrix, and a GPU in a second worker group processes a second data block of the data matrix.  It would have been obvious to one of ordinary skill in the art that the GPU in the first worker group processes the first data block that is different and thus distinct from the second data block that the GPU in the second worker group processes, because if one GPU is already processing a data block, then there would be no point in having another GPU process that same data block.
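For illustration only, the partitioning reasoning above can be sketched as follows. This is not Zou's implementation: the function names, the matrix size, and the one-partition-per-worker-group mapping are assumptions chosen to show why blocks handled by different worker groups differ from each other.

```python
# Illustrative sketch only; names and sizes are hypothetical, not from Zou.

def partition_rows(n_rows, n_parts):
    """Split row indices 0..n_rows-1 into n_parts contiguous, near-equal partitions."""
    base, extra = divmod(n_rows, n_parts)
    parts, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        parts.append(list(range(start, start + size)))
        start += size
    return parts


def assign_to_groups(parts):
    """Map each worker group id to one partition (one data block per group)."""
    return {group_id: rows for group_id, rows in enumerate(parts)}


# Eight rows of a data matrix divided among four worker groups (assumed sizes).
blocks = assign_to_groups(partition_rows(8, 4))
first_block, second_block = blocks[0], blocks[1]
# The two blocks share no rows, so each differs from the other.
blocks_are_disjoint = set(first_block).isdisjoint(second_block)
```

Under these assumptions, a GPU in worker group 0 and a GPU in worker group 1 never receive the same rows, which is the sense in which the two data blocks differ.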
10.	As per Claim 6, Zou teaches further comprising a central processing unit [0008], wherein  the first graphics processing unit processes the first data block to generate the first information [0091, 0117].  The GPU takes out the mini-batch data each time, to perform mini-batch training, 
11.	As per Claim 7, Zou teaches further comprising a central processing unit [0008], wherein the second graphics processing unit processes the second data block to generate the second information [0091, 0117].  The GPU takes out the mini-batch data each time, to perform mini-batch training, a gradient is obtained according to a result of the mini-batch training, a model parameter is updated according to the gradient, the gradient is synchronized to models in other GPUs, gradients synchronized from the other GPUs are received at the same time, and the model parameter is updated once again; in this way, the plurality of GPUs in the parallel training all have the latest model parameter [0078].  After parameter updating is completed, whether the training data has been completely processed is judged, and if not, next mini-batch training data is continuously acquired for training.  Otherwise, the learning rate is updated according to the model parameter [0079].  The adaptive learning rate updating module 37 is in the CPU, as discussed in the rejection for Claim 6.  Thus, if the training data has been completely processed, then the adaptive learning rate updating module 37 in the CPU updates the learning rate according to the model parameter [0079] which was generated by the graphics processing unit [0078].  Thus, the graphics processing unit transmits the generated model parameter to the adaptive learning rate updating module 37 in the CPU in response to a determination that the training data has been completely processed.  Thus, Zou teaches wherein the second graphics processing unit transmits the second information to the central processing unit in response to a determination that a criterion associated with the second data block is satisfied [0091, 0117, 0078, 0079].
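As a rough illustration of the training flow described above, the following sketch simulates two workers that each take a mini-batch, exchange gradients, and hand off to a host-side learning rate update once all training data has been processed. It is a simplification under stated assumptions, not Zou's method: the one-parameter model, the gradient averaging step, and the decay rule in update_learning_rate are all hypothetical.

```python
# Illustrative sketch only: two simulated workers stand in for GPUs, and a
# plain function stands in for the CPU-side learning rate update.

def gradient(w, batch):
    """Gradient of mean squared error for the model y = w * x on one mini-batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def update_learning_rate(learning_rate):
    """Hypothetical decay rule; the cited passages do not specify one."""
    return learning_rate * 0.9


def train(data, batch_size, learning_rate, epochs):
    replicas = [0.0, 0.0]  # one copy of the model parameter per worker
    for _ in range(epochs):
        for start in range(0, len(data), 2 * batch_size):
            # Each worker takes its own mini-batch.
            batches = [data[start:start + batch_size],
                       data[start + batch_size:start + 2 * batch_size]]
            grads = [gradient(w, b) for w, b in zip(replicas, batches) if b]
            # Synchronize: every worker applies the average of all gradients.
            synced = sum(grads) / len(grads)
            replicas = [w - learning_rate * synced for w in replicas]
        # All training data processed: the host updates the learning rate.
        learning_rate = update_learning_rate(learning_rate)
    return replicas, learning_rate


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # samples of y = 3x
replicas, final_rate = train(data, batch_size=2, learning_rate=0.01, epochs=50)
```

Because every worker applies the same synchronized gradient, both parameter copies stay identical throughout, and the learning rate changes only after a full pass over the data.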
the peripheral interface 108, the processor 106, and the memory controller 104 may be implemented in a single chip, [0068]).  Thus, the machine learning model is further processed by an integrated circuit based on at least the first information provided by the first graphics processing unit and the second information provided by the second graphics processing unit.
13.	As per Claim 10, Zou teaches wherein the system further comprises:  a third graphics processing unit that processes a third data block of the data matrix to generate third information for the matrix factorization system; and a fourth graphics processing unit that processes a third portion of a fourth data block of the data matrix based on a fourth portion of the fourth data block to generate fourth information for the matrix factorization system [0091, 0117].
14.	As per Claim 11, Zou teaches GPU sockets 0, 1, 2 and 3 are installed to a CPU, and GPU sockets 4, 5, 6 and 7 are installed to another CPU [0070].  The number of GPUs is not limited to that shown in Fig. 3, which can include fewer GPUs [0069].  Thus, the first graphics processing unit and the second graphics processing unit are installed to a first central processing unit, and the third graphics processing unit and the fourth graphics processing unit are installed to a second central processing unit.  Since Zou teaches the central processing unit that processes a portion of the machine learning model for the matrix factorization system based on at least the first information provided by the first graphics processing unit and the second information provided by the second graphics processing unit, as discussed in the rejection for Claims 6-7, this .
15.	Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Zou (US 20160321776A1) in view of Dijkman (US 20160239706A1).
	Zou is relied upon for the teachings as discussed above relative to Claim 1.  Zou teaches that the second graphics processing unit processes the first portion of the second data block separate from the second portion of the second data block [0091, 0117].
	However, Zou does not teach wherein the second graphics processing unit divides the second data block into at least the first portion of the second data block and the second portion of the second data block.  Dijkman teaches wherein the graphics processing unit divides the data block into at least the first portion of the data block and the second portion of the data block (GPU may tile over the result matrix to divide the work among processing units and then block on the input matrices into small enough blocks to fit in cache, [0083]).  Since Zou teaches that the second graphics processing unit processes the first portion of the second data block separate from the second portion of the second data block [0091, 0117], this teaching from Dijkman can be implemented into the device of Zou so that the second graphics processing unit divides the second data block into at least the first portion of the second data block and the second portion of the second data block, and can then process the first portion of the second data block separate from the second portion of the second data block.
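The tiling idea cited from Dijkman can be pictured with a small blocked matrix multiplication. This is a generic sketch, not code from Dijkman: the tile size of 2 and the helper names are assumptions, and the loops run sequentially rather than on separate processing units.

```python
# Illustrative sketch only; the tile size and helper names are hypothetical.

def tiles(n, tile):
    """Yield (start, stop) index ranges that cover 0..n in steps of `tile`."""
    for start in range(0, n, tile):
        yield start, min(start + tile, n)


def blocked_matmul(a, b, tile=2):
    """Multiply matrices a (m x k) and b (k x n) one result tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0, i1 in tiles(m, tile):          # tile over the result rows
        for j0, j1 in tiles(n, tile):      # tile over the result columns
            for k0, k1 in tiles(k, tile):  # block the input matrices
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        c[i][j] += sum(a[i][p] * b[p][j] for p in range(k0, k1))
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
product = blocked_matmul(a, b)
```

Each result tile depends only on the matching row block of a and column block of b, so the portions of a data block can be processed separately from one another.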
16.	Claims 3 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Zou (US 20160321776A1) in view of Vij (US008373710B1).
17.	As per Claim 3, Zou is relied upon for the teachings as discussed above relative to Claim 1.
	However, Zou does not expressly teach wherein the second graphics processing unit processes the first portion of the second data block based on a set of policies for processing the second data block.  However, Vij teaches chunking operations are determined based on the determined allocation of atomic operations (col. 10, lines 12-14).  Atomic operations represent operations that are performed by one or more threads of GPUs 155 in a path-independent manner--i.e., without waiting for certain conditions to occur or without waiting for data inputs from other operations to be determined (col. 8, lines 16-20).  Chunking the data is dividing up the data on which the calculation is to be performed so as to increase the degree to which each GPU core is provided with a steady stream of data for performing its constituent operations (col. 3, lines 5-9).  Thus, one policy for processing is performing chunking operations and another policy for processing is processing in a path-independent manner.  Thus, Vij teaches wherein the second graphics processing unit processes the first portion of the second data block based on a set of policies for processing the second data block (col. 10, lines 12-14; col. 8, lines 16-20; col. 3, lines 5-9).

18.	As per Claim 5, Zou does not teach wherein the second graphics processing unit processes the first portion of the second data block based on performance of an atomic operation associated with the second graphics processing unit.  However, Vij teaches wherein the second graphics processing unit processes the first portion of the second data block based on performance of an atomic operation associated with the second graphics processing unit (once atomic operations have been determined, allocate various atomic operations among GPUs, determine which GPUs should receive which atomic operations based on the characteristics of each GPU, including memory capacity and bandwidth, GPU core- and thread-count, and GPU processing speed, col. 9, lines 4-20; chunking operations may also be determined based on the determined allocation of atomic operations during the operation localization stage 520, col. 10, lines 12-14; chunking the data—i.e., dividing up the data on which the calculation is to be performed so as to increase the degree to which each GPU core is provided with a steady stream of data for performing its constituent operations, col. 3, lines 5-9).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zou so that the second graphics processing unit processes the first portion of the second data block based on performance of an atomic operation .
19.	Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Zou (US 20160321776A1) in view of Dijkman (US 20160239706A1) and Seide (US009477925B2).
	Zou is relied upon for the teachings as discussed above relative to Claim 1.
	However, Zou does not teach wherein the second graphics processing unit processes the first portion of the second data block based on an amount of data included in the second data block.  However, the combination of Zou and Dijkman teaches wherein the second graphics processing unit processes the second data block by dividing the second data block, as discussed in the rejection for Claim 2.
	However, Zou and Dijkman do not expressly teach wherein the second graphics processing unit processes the first portion of the second data block based on an amount of data included in the second data block.  However, Seide teaches dividing the data based on the amount of data (partition the training data into batches 128 according to the batch size, in which each batch is designed to optimize the tradeoff between computation accuracy and execution efficiency, col. 13, lines 14-17), and each of the graphics processing units processes a batch to perform machine learning (processing of a reasonably sized batch of sample frames with respect to the DNNs 112 may translate into the gathering and redistribution of 400 megabyte (MB) worth of gradients and another 400 MB of model parameters by each of the multi-core processors 108(1)-108(N), col. 8, lines 23-28; parallelize the training of the DNNs across multiple multi-core processors, such as multiple general-purpose graphics processing units, col. 2, lines 51-54).  Since the combination of Zou and Dijkman teaches wherein the second graphics processing unit processes the second data block by dividing the second data block, this teaching 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zou and Dijkman so that the second graphics processing unit processes the first portion of the second data block based on an amount of data included in the second data block because Seide suggests that this optimizes the tradeoff between computation accuracy and execution efficiency (col. 13, lines 14-17).
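The batch partitioning cited from Seide can be pictured with a minimal sketch in which the number and size of the portions follow from the amount of data and a chosen batch size. The function name and sample values are hypothetical and are not taken from Seide.

```python
# Illustrative sketch only; the helper name and values are hypothetical.

def make_batches(samples, batch_size):
    """Partition samples into consecutive batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


small_block = make_batches(list(range(4)), batch_size=4)    # one portion
large_block = make_batches(list(range(10)), batch_size=4)   # three portions
```

With the batch size fixed, a block holding more data yields more portions, so how any one portion is processed depends on the amount of data in the block.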
20.	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Zou (US 20160321776A1) in view of Buck (US 20050125369A1).
	Zou is relied upon for the teachings as discussed above relative to Claim 1.  Zou teaches that the central processing unit processes the machine learning model based on the first information provided by the first graphics processing unit and the second information provided by the second graphics processing unit, as discussed in the rejection for Claim 9.
	However, Zou does not teach wherein the machine learning model is associated with a stochastic gradient descent process, and wherein the first information provided by the first graphics processing unit and the second information provided by the second graphics processing unit facilitate improved processing performance for the central processing unit during the stochastic gradient descent process.  However, Buck teaches that the GPU runs the various shaders needed to process the machine learning technique [0063].  By way of example, when training two-layer neural networks, the forward and backpropagation correspond to about twenty different shaders.  For stochastic gradient descent, the learning parameters are updated after processing each pattern in the group [0064].  Testing requires very fast response time.  In 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zou so that the machine learning model is associated with a stochastic gradient descent process, and wherein the first information provided by the first graphics processing unit and the second information provided by the second graphics processing unit facilitate improved processing performance for the central processing unit during the stochastic gradient descent process as suggested by Buck.  It is well-known in the art that stochastic gradient descent has the advantage of economizing on the computational cost at every iteration by sampling a subset of summand functions at every step.
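The computational point about stochastic gradient descent can be illustrated with a toy objective. The sketch below is a generic textbook-style example, not drawn from Buck or Zou; the objective, the step size schedule, and all names are assumptions. Each step evaluates only a sampled subset of the summands rather than all of them.

```python
import random

# Illustrative sketch only; objective, schedule, and names are hypothetical.

def sgd_mean(values, batch_size, steps, seed=0):
    """Minimize f(w) = (1/n) * sum_i (w - values[i])**2 by stochastic gradient descent."""
    rng = random.Random(seed)
    w = 0.0
    for t in range(steps):
        batch = rng.sample(values, batch_size)  # subset of the summand functions
        grad = sum(2 * (w - v) for v in batch) / batch_size
        w -= grad / (2 * (t + 1))               # decaying step size 1 / (2(t + 1))
    return w


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
estimate = sgd_mean(values, batch_size=2, steps=2000)
```

Here each iteration touches two of the eight summands, and the estimate still approaches the minimizer of the full sum, which is the mean of the values (4.5).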
Allowable Subject Matter
21.	Claims 12-20 are rejected under double patenting, but would be allowable if terminal disclaimers are filed.
The following is a statement of reasons for the indication of allowable subject matter:  
22.	The claims contain allowable subject matter for reasons similar to those for which the claims of U.S. Patent No. 11,023,560 were allowed.
23.	The Examiner makes note that Applicant’s disclosure describes “A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se” ([0097], p. 40).  Thus, Claims 17-20 are directed to statutory subject matter.
24.	One prior art reference (Clinchant US 20130204885A1) teaches generating a first term-document matrix for the documents in the first natural language and generating a second term-document matrix for the documents in the second natural language, applying a co-clustering algorithm to both first and second term-document matrices to generate K-dimensional word vectors in the same K-dimensional space corresponding to the text words in both the first and second term-document matrices, and the learning of a probabilistic topic model comprises learning a probabilistic model using sets or sequences of word vectors representing a set of documents of the first language but not the second language, wherein the learned probabilistic topic model operates to assign probabilities for topics of the probabilistic topic model to an input set or sequence of K-dimensional embedded word vectors generated by applying the set of word embedding transforms to a document in the first language, or the second language, or a combination of the first and second languages (Claim 7 of Clinchant).  However, Clinchant does not teach minimizing error, by the system, associated with missing data of the data matrix based on collaboratively processing, by the first graphics processing unit, the second graphics processing 
25.	Another prior art reference (Dalal US 20120041769A1) teaches wherein the data matrix corresponds to a defined rating matrix [0106].  However, Dalal does not teach minimizing error, by the system, associated with missing data of the data matrix based on collaboratively processing, by the first graphics processing unit, the second graphics processing unit and a central processing unit, to minimize error of a cost function associated with a machine learning model for the matrix factorization system.
26.	Another prior art (Jacobs US006009437A) teaches when some points are not visible in some images, Q will contain missing entries for these points.  We can find the affine structure and motion that minimizes error by finding the Q matrix that is closest to Q where we not compare only those elements actually present in Q to the corresponding elements of Q.  The problem of finding structure from motion in the presence of missing data, then, depends of finding the nearest rank three matrix to a data matrix with missing elements (col. 4, lines 17-31).  However, Jacobs does not teach minimizing error, by the system, associated with missing data of the data matrix based on collaboratively processing, by the first graphics processing unit, the second graphics processing unit and a central processing, to minimize error of a cost function associated with a machine learning model for the matrix factorization system.
27.	Another prior art (Pilaszy US 20120030159A1) teaches using collaborative filtering to obtain information from users to create a user profile for each user, one of the widely used techniques is matrix factorization, which is an efficient tool to make personalized recommendations of items, such as products, services or media content [0012].  However, Pilaszy does not teach facilitation of a synchronization stage matrix factorization, assign a first .
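For readers less familiar with the terminology in the preceding paragraphs, the notion of an error measured only over the entries that are present in a data matrix can be sketched as below. This is a generic illustration under stated assumptions, not the claimed method and not code from any cited reference: None marks a missing entry, and the factor matrices and values are hypothetical.

```python
# Illustrative sketch only; data layout, names, and values are hypothetical.

def masked_squared_error(ratings, row_factors, col_factors):
    """Sum of squared errors over observed entries; None marks a missing entry."""
    error = 0.0
    for i, row in enumerate(ratings):
        for j, value in enumerate(row):
            if value is None:
                continue  # missing entries contribute nothing to the cost
            predicted = sum(x * y for x, y in zip(row_factors[i], col_factors[j]))
            error += (value - predicted) ** 2
    return error


ratings = [[5.0, None], [None, 2.0]]   # 2 x 2 data matrix with two missing entries
row_factors = [[1.0], [2.0]]           # rank-1 factors for the rows
col_factors = [[4.0], [1.0]]           # rank-1 factors for the columns
error = masked_squared_error(ratings, row_factors, col_factors)
```

A matrix factorization approach would adjust the two factor matrices to reduce this quantity; only the two observed entries enter the sum.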
Prior Art of Record
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
1.	Clinchant (US 20130204885A1) teaches learning a probabilistic model using sets or sequences of word vectors representing a set of documents (Claim 7 of Clinchant).
2.	Dalal (US 20120041769A1) teaches wherein the data matrix corresponds to a defined rating matrix [0106].  
3.	Jacobs (US006009437A) teaches when some points are not visible in some images, Q will contain missing entries for these points.  We can find the affine structure and motion that minimizes error by finding the Q matrix that is closest to Q where we not compare only those elements actually present in Q to the corresponding elements of Q.  The problem of finding 
4.	Pilaszy (US 20120030159A1) teaches using collaborative filtering to obtain information from users to create a user profile for each user, one of the widely used techniques is matrix factorization, which is an efficient tool to make personalized recommendations of items, such as products, services or media content [0012].  
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONI HSU whose telephone number is (571)272-7785. The examiner can normally be reached M-F 10am-6:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached on (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) 





JH
/JONI HSU/Primary Examiner, Art Unit 2611