DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is responsive to the Amendment filed on 10/05/2021. Claims 1-20 are pending in the case. Claims 1 and 12 are independent claims.

Response to Arguments
Applicant's prior art arguments have been fully considered but are not persuasive. Specifically:
Applicant argues that the personal information included in the collected information about a context of an electronic apparatus is not taught by the cited references. (Page 12, paragraph 3 of the filed Amendment). This is incorrect. McMahan specifically discusses that the data derived through user interaction on the user device may have a private nature (see paragraph 31 of McMahan). Indeed, as in the instant application, the privacy aspect of the data drives the nature of the distributed learning (see paragraph 14 of McMahan, “thereby… maintaining user privacy”).
Applicant further argues that refining the local model based on a second gradient received from an external apparatus is not taught by the combined references. (Page 12, paragraph 3 of the filed Amendment). This is incorrect. Applicant seems to be taking a piecemeal approach to the references. McMahan teaches receiving a second gradient from an external apparatus, used to refine a second model stored on the external apparatus (see McMahan at figure 2, receive gradient 112. Paragraph 40, determining, by the user device, a local update based at least in part on the received gradient update). While in McMahan this gradient is received from the server 100, Feng contemplates a more hierarchical federated learning technique where the gradient is received from a node lower in the tree of communicative nodes in figure 1(b) on page 2 
Applicant lastly argues that refining the local model based on information about a global model received from a server having the global model is not taught by the cited references. (Page 12, paragraph 3 of the filed Amendment). This is incorrect. Both McMahan and Feng clearly teach a local model being updated based on information about a global model received from a server having the global model (see for example McMahan at figure 2, boxes 112 and 114).
Therefore, Examiner respectfully asserts that the cited art sufficiently teaches the limitations recited in the amended claims.

Claim Objections
Claim 15 is objected to because it recites “the electronic apparatus of claim 11,” but claim 11 is directed towards a method. It is believed that this is a typographical error and that the dependent reference was intended to be to claim 12. Additionally, the “obtain” element of the claim ends with the word “for” (e.g., “obtain at least one representative value regarding corresponding information between information related to a loss function included in a gradient for the refined global model and information related to a loss function included in the first gradient for”), which appears to be grammatically incomplete. Appropriate correction is required.

Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory 

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 C.F.R. § 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.

Claims 1-10 and 12-20 are rejected under 35 U.S.C. § 103 as being unpatentable over McMahan et al. (U.S. Pat. App. Pub. No. 2017/0109322, hereinafter McMahan) in view of Feng et al. (Feng, Shaohan, Dusit Niyato, Ping Wang, Dong In Kim, and Ying-Chang Liang. "Joint Service Pricing and Cooperative Relay Communication for Federated Learning." arXiv e-prints (2018): arXiv-1811, hereinafter Feng).

As to independent claim 1, McMahan teaches:
A method, performed by an electronic apparatus, of refining an artificial intelligence (AI) model, the method comprising (Title):
collecting information about a context of an electronic apparatus including personal information of a user of the electronic apparatus (Paragraph 13, the data examples may be generated, for instance, through interaction of a user with the user device. In this manner, the local update can correspond to a model using data generated through use of the user device by the user. For instance, the data examples may include, without limitation, image files, video files, inputted text, or data indicative of other interactions by the user with the user device. Paragraph 33, training data 308 may be privacy sensitive. Paragraph 31, training data can be any data derived through a user interaction with a user device 302);
refining a first local model stored in the electronic apparatus based on the collected information about the context (Paragraph 13, determine one or more local updates to the model using data stored on the respective computing devices);
determining a first gradient based on a difference between the refined first local model and the first local model before being refined (Paragraph 37, determining, by a user device, a local gradient based on one or more local data examples. In particular, the local gradient can be determined for a loss function using the one or more data examples. Figure 2, box 102. Paragraph 41, the local update can be determined based at least in part using one or more stochastic updates or iterations);…
refining the refined first local model based on [another received]… gradient (Paragraph 40, determining, by the user device, a local update based at least in part on the received gradient update);
transmitting at least one of the first gradient and the second gradient to a server having a global model (Figure 2, provide 104 local gradient to server. Paragraph 14, global model can be updated based at least in part on the received local updates. See also Feng at Page 2, figure 1(b));
receiving, from the server, information about the global model refined based on at least one of the first and the second gradients having a global model (Figure 2, determine 108 global gradient, and receive 112 global gradient. Paragraph 38, at (108), method (100) can include determining, by the server, a global gradient based at least in part on the received local gradient. Paragraph 39, at (112), method (100) can include receiving the global gradient. Figure 3, client devices 230 to server 210 via network 240); and
further refining the refined first local model based on the received information about the model (Figure 2, box 114 et seq. Paragraph 40, at (114), method (100) can include determining, by the user device, a local update. In a particular implementation, the local update can be determined based at least in part on the global update).
While McMahan teaches receiving a second gradient from an external apparatus, used to refine a second model stored on the external apparatus (figure 2, receive gradient 112), McMahan does not appear to expressly teach receiving a second gradient from an external apparatus, used to refine a second local model stored on the external apparatus; and the second gradient.
Feng teaches receiving a second gradient from an external apparatus, used to refine a second local model stored on the external apparatus (Page 2, figure 1(b). Page 3, right column, paragraph 1, received model update from another mobile device. Page 1, right column, paragraph 1, the mobile devices perform computation of model training locally on their training data); and the second gradient (Page 2, figure 1(b). Page 3, right column, paragraph 1, for each received model update from another 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the distributed machine learning of McMahan to include the relay federated learning techniques of Feng in order to allow indirect communication between nodes during learning and to increase energy efficiency and mobile wireless availability (see Feng at abstract).
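
For illustration only, the sequence of steps mapped above for claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm of McMahan or Feng: the linear model, the squared-error loss, the learning rate, the unweighted server mean, and every function name are hypothetical, and the "gradient" is modeled as the difference between the refined model and the model before refinement, following the claim language.

```python
from typing import List, Tuple

Vector = List[float]
Example = Tuple[Vector, float]  # (features, observed target); hypothetical data shape


def sgd_refine(weights: Vector, data: List[Example], lr: float = 0.1) -> Vector:
    """Refine a linear model with one SGD pass over a squared-error loss (assumed)."""
    w = list(weights)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def difference(new: Vector, old: Vector) -> Vector:
    """Update in the claim's sense: refined model minus model before refinement."""
    return [n - o for n, o in zip(new, old)]


def apply_update(weights: Vector, update: Vector) -> Vector:
    """Add an update vector to a model."""
    return [w + u for w, u in zip(weights, update)]


def server_aggregate(global_w: Vector, updates: List[Vector]) -> Vector:
    """Hypothetical server step: add the unweighted mean of the received updates."""
    mean = [sum(col) / len(updates) for col in zip(*updates)]
    return apply_update(global_w, mean)


start = [0.0]
# Electronic apparatus: refine the first local model on locally collected data.
local_a = sgd_refine(start, [([1.0], 2.0)])
first_update = difference(local_a, start)
# External apparatus: its own refinement yields the second update.
second_update = difference(sgd_refine(start, [([1.0], 4.0)]), start)
# Refine the local model again using the update received from the external apparatus.
local_a = apply_update(local_a, second_update)
# Server refines the global model from the transmitted updates.
global_w = server_aggregate(start, [first_update, second_update])
# Further refinement; here the local model simply adopts the global model (one option).
local_a = list(global_w)
```

Replacing the local model with the global model in the last step is only one possible reading of "further refining"; a blended update would fit the same outline.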

As to dependent claim 2, McMahan teaches:
the determining of the first gradient comprises: obtaining prediction information outputtable by the first local model and observation information indicating an answer to the prediction information based on the changed information about the context (Paragraph 13, a model using data generated through use of the user device by the user. For instance, the data examples may include, without limitation, image files, video files, inputted text, or data indicative of other interactions by the user with the user device. The model may be implemented in solving one or more problems, such as predictive typing, predictive image sharing, image classification, voice recognition, next-word-prediction, and/or various other suitable problems relating to use of the user device. Paragraph 37, a local gradient based on one or more local data examples);
obtaining a loss function indicating a difference between the observation information and the prediction information (Paragraph 37, the local gradient can be determined for a loss function using the one or more data examples); and
determining the first gradient including information related to a point where a value of the loss function is lowest (Paragraph 41, stochastic gradient descent techniques).
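
As a rough illustration of the relationship among the prediction information, the observation information, the loss function, and the gradient discussed for claim 2, the following sketch assumes a one-parameter model and a squared-error loss. The function names, learning rate, and data values are hypothetical and are not taken from McMahan.

```python
def loss(w: float, x: float, observed: float) -> float:
    """Squared difference between the prediction (w * x) and the observation."""
    return (w * x - observed) ** 2


def loss_gradient(w: float, x: float, observed: float) -> float:
    """Derivative of the squared-error loss with respect to w."""
    return 2.0 * (w * x - observed) * x


# Repeated gradient steps move w toward the point where the loss is lowest.
w = 0.0
for _ in range(50):
    w -= 0.1 * loss_gradient(w, x=1.0, observed=3.0)
```

With these assumed values the loss is minimized at w = 3, and the loop approaches that point.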

As to dependent claim 3, McMahan teaches the information about the global model comprises a gradient including information related to a point where a value of a loss function for the global model used to refine the global model is lowest (Paragraph 41, stochastic gradient descent techniques).

As to dependent claim 4, McMahan teaches:
the further refining of the local model, based on the received information, comprises: obtaining at least one representative value regarding corresponding information between information related to a loss function included in a gradient for the refined global model and information related to a loss function included in the first gradient (Paragraph 14, the local update can be a gradient vector. Paragraph 32, the local updates can be a gradient vector associated with the model. For instance, user devices 302 can determine a gradient (e.g. an average gradient) associated with the model based at least in part on training data 308 respectively stored on user devices 302); and
refining the refined first local model based on the at least one representative value (Figure 2, box 114).

As to dependent claim 5, McMahan teaches at least one value of the at least one representative value includes an arithmetic average value or weighted average value regarding the corresponding information (Paragraph 14, determining a weighted average of the received local updates).

As to dependent claim 6, McMahan teaches:
the weighted average value is obtained based on at least one weight value being applied to the corresponding information (Paragraph 29, the server can then aggregate the data, for instance, by determining a weighted average), and
a weight value applied to the information related to the loss function included in the first gradient, among the at least one weight value, is determined based on the information about the context of the electronic apparatus (Paragraph 29, for instance, the user devices may determine an updated version of the model (e.g. using one or more stochastic gradient descent techniques) using local data. The server can then determine a weighted average of the resulting models to determine a global update to the model).
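
The weighted average relied upon for claims 5 and 6 can be illustrated with a short sketch. The weights here are hypothetical (for example, they might reflect the number of local examples on each device, which is an assumption and not a statement of how McMahan selects them), and the function name is illustrative.

```python
from typing import List, Sequence


def weighted_average(updates: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Element-wise weighted average of equally sized update vectors."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    size = len(updates[0])
    return [
        sum(w * u[i] for u, w in zip(updates, weights)) / total
        for i in range(size)
    ]


# Two updates; the second is weighted three times as heavily as the first.
avg = weighted_average([[1.0, 0.0], [3.0, 2.0]], [1, 3])
```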

As to dependent claim 7, McMahan teaches the global model is refined based on at least one gradient for each local model for at least one external apparatus based on each local model being refined, and further refined based on a gradient of the electronic apparatus based on the gradient being received from the electronic apparatus (Paragraph 29, the user devices can be configured to provide the determined gradients to the server, as part of the local updates. The server can then aggregate the gradients to determine a global model update).

As to dependent claim 8, McMahan teaches the first gradient is transmitted to the server based on being subjected to at least one operation among an operation of adding noise and an operation of performing encoding (Paragraph 14, the local update can be a gradient vector. Paragraph 32, the local updates can be a gradient vector associated with the model. For instance, user devices 302 can determine a gradient (e.g. an average gradient) associated with the model based at least in part on training data 308 respectively stored on user devices 302).

As to dependent claim 9, McMahan teaches:
identifying a relay apparatus for transmitting the first gradient of the first local model from the electronic apparatus (Paragraph 52, the client device 230 can also include a network interface 
transmitting the first gradient of the first local model to the server via the relay apparatus (Figure 3, network 240. Paragraph 30, server 304 can be configured to communicate with user devices 302 over one or more networks, such as network 240 of FIG. 3. Figure 2, boxes 104 to 106),
wherein the relay apparatus is configured to receive at least one gradient from at least one electronic apparatus and to transmit the received at least one gradient to the server (Paragraph 49, the server 210 can exchange data with one or more client devices 230 over the network 240).

As to dependent claim 10, McMahan teaches:
receiving a gradient for a current global model from the server in response to presence of a difference between the gradient for the current global model and the gradient for the global model previously transmitted from the server to the electronic apparatus, the difference being equal to or greater than a reference value (Paragraph 21, this can be repeated for one or more iterations, for instance, until the loss function reaches a threshold (e.g. converges). The threshold can be determined based at least in part on a desired accuracy of the global model); and
refining the refined first local model based on the received gradient (Figure 2, box 114).
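
The reference-value condition recited in claim 10 can be sketched as a simple comparison. This is only an illustration of the claim language: the use of Euclidean distance as the measure of "difference" and the function name are assumptions, and the sketch is not drawn from the convergence threshold of McMahan paragraph 21.

```python
import math
from typing import Sequence


def should_send(current: Sequence[float],
                previous: Sequence[float],
                reference: float) -> bool:
    """True when the current gradient differs from the previously sent one
    by at least the reference value (Euclidean distance assumed)."""
    return math.dist(current, previous) >= reference


send_large_change = should_send([1.0, 0.0], [0.0, 0.0], reference=0.5)
send_small_change = should_send([0.1, 0.0], [0.0, 0.0], reference=0.5)
```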

As to independent claim 12, McMahan teaches:
An electronic apparatus configured to refine an artificial intelligence (AI) model, the electronic apparatus comprising (Title):
a memory storing a local model (Figure 3, memory 234);
at least one processor configured to: collect information about a context of the electronic apparatus including personal information of a user of the electronic apparatus, refine a first local model stored in the electronic apparatus based on the collected information about the context, and determine a first gradient based on difference between the refined first local model and the first local model before being refined (Figure 3, processor 232. Paragraph 13, the data examples may be generated, for instance, through interaction of a user with the user device. In this manner, the local update can correspond to a model using data generated through use of the user device by the user. For instance, the data examples may include, without limitation, image files, video files, inputted text, or data indicative of other interactions by the user with the user device. Paragraph 33, training data 308 may be privacy sensitive. Paragraph 31, training data can be any data derived through a user interaction with a user device 302. Paragraph 13, determine one or more local updates to the model using data stored on the respective computing devices. Paragraph 37, determining, by a user device, a local gradient based on one or more local data examples. In particular, the local gradient can be determined for a loss function using the one or more data examples. Figure 2, box 102. Paragraph 41, the local update can be determined based at least in part using one or more stochastic updates or iterations); and
a communicator comprising communication circuitry configured to:… transmit at least one of the first gradient and the second gradient to a server having a global model, and 
wherein the at least one processor is further configured to refine the refined first local model based on [another received]… gradient and refine the refined first local model based on the received information about the refined global model (Paragraph 40, determining, by the user device, a local update based at least in part on the received gradient update. Figure 2, box 114 et seq. Paragraph 40, at (114), method (100) can include determining, by the user device, a local update. In a particular implementation, the local update can be determined based at least in part on the global update).
While McMahan teaches receiving a second gradient from an external apparatus, used to refine a second model stored on the external apparatus (figure 2, receive gradient 112), McMahan does not appear to expressly teach receive a second gradient from an external apparatus used to refine a second local model stored on the external apparatus; and the second gradient.
Feng teaches receive a second gradient from an external apparatus used to refine a second local model stored on the external apparatus (Page 2, figure 1(b). Page 3, right column, paragraph 1, received model update from another mobile device. Page 1, right column, paragraph 1, the mobile devices perform computation of model training locally on their training data); and the second gradient (Page 2, figure 1(b). Page 3, right column, paragraph 1, for each received model update from another mobile 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the distributed machine learning of McMahan to include the relay federated learning techniques of Feng in order to allow indirect communication between nodes during learning and to increase energy efficiency and mobile wireless availability (see Feng at abstract).

As to dependent claim 13, McMahan teaches obtain prediction information outputtable by the first local model and observation information indicating an answer to the prediction information based on the changed information about the context, obtain a loss function indicating a difference between the observation information and the prediction information, and determine the first gradient including information related to a point where a value of the loss function is lowest (Paragraph 13, a model using data generated through use of the user device by the user. For instance, the data examples may include, without limitation, image files, video files, inputted text, or data indicative of other interactions by the user with the user device. The model may be implemented in solving one or more problems, such as predictive typing, predictive image sharing, image classification, voice recognition, next-word-prediction, and/or various other suitable problems relating to use of the user device. Paragraph 37, a local gradient based on one or more local data examples).

As to dependent claim 14, McMahan teaches the information about the global model comprises a gradient including information related to a point where a value of a loss function for the global model used to refine the global model is lowest (Paragraph 41, stochastic gradient descent techniques).

As to dependent claim 15, McMahan teaches:
obtain at least one representative value regarding corresponding information between information related to a loss function included in a gradient for the refined global model and information related to a loss function included in the first gradient (Paragraph 14, the local update can be a gradient vector. Paragraph 32, the local updates can be a gradient vector associated with the model. For instance, user devices 302 can determine a gradient (e.g. an average gradient) associated with the model based at least in part on training data 308 respectively stored on user devices 302); and
refine the refined first local model based on the at least one representative value (Figure 2, box 114).

As to dependent claim 16, McMahan teaches:
at least one value of the at least one representative value includes a weighted average value regarding the corresponding information (Paragraph 14, determining a weighted average of the received local updates),
the weighted average value being obtained based on at least one weight value being applied to the corresponding information (Paragraph 29, the server can then aggregate the data, for instance, by determining a weighted average), and
a weight value applied to the information related to the loss function included in the first gradient, among the at least one weight value, is determined based on the information about the context of the electronic apparatus (Paragraph 29, for instance, the user devices may determine an updated version of the model (e.g. using one or more stochastic gradient descent techniques) using local data. The server can then determine a weighted average of the resulting models to determine a global update to the model).

As to dependent claim 17, McMahan teaches the global model is refined based on at least one gradient for each local model for a plurality of external apparatus based on each local model being refined, and further refined based on a gradient of the electronic apparatus based on the gradient being received from the electronic apparatus (Paragraph 29, the user devices can be configured to provide the determined gradients to the server, as part of the local updates. The server can then aggregate the gradients to determine a global model update).

As to dependent claim 18, McMahan teaches:
identify a relay apparatus configured to transmit the first gradient of the first local model from the electronic apparatus, and control the communicator to transmit the first gradient of the first local model to the server via the relay apparatus (Paragraph 52, the client device 230 can also include a network interface used to communicate with one or more remote computing devices (e.g. server 210) over the network 240. The network interface can include any suitable components for interfacing with one or more networks, including for example, transmitters, receivers, ports, controllers, antennas, or other suitable components. Figure 3, network 240. Paragraph 30, server 304 can be configured to communicate with user devices 302 over one or more networks, such as network 240 of FIG. 3. Figure 2, boxes 104 to 106), and
wherein the relay apparatus is configured to receive at least one gradient from at least one electronic apparatus and to transmit the received at least one gradient to the server (Paragraph 49, the server 210 can exchange data with one or more client devices 230 over the network 240).

As to dependent claim 19, McMahan teaches control the communicator to receive a gradient for a current global model from the server in response to a presence of a difference between the gradient for the current global model and the gradient for the global model previously transmitted from the server to the electronic apparatus, the difference being equal to or greater than a reference value, and refine the refined first local model based on the received gradient (Paragraph 21, this can be repeated for one or more iterations, for instance, until the loss function reaches a threshold (e.g. converges). The threshold can be determined based at least in part on a desired accuracy of the global model).

As to dependent claim 20, McMahan teaches a computer program product comprising a non-transitory computer-readable recording medium having recorded thereon a program for implementing the method of claim 1 (Paragraphs 5, 47, and 50).

Claim 11 is rejected under 35 U.S.C. § 103 as being unpatentable over McMahan in view of Feng and Lin et al. (U.S. Pat. App. Pub. No. 2015/0135186, hereinafter Lin).

As to dependent claim 11, the rejection of claim 1 is incorporated.
McMahan teaches based on detecting that the collected information about the context of the electronic apparatus is changed, refining of the first local model comprises refining the first local model (Paragraph 13, the data examples may be generated, for instance, through interaction of a user with the user device. In this manner, the local update can correspond to a model using data generated through use of the user device by the user. For instance, the data examples may include, without limitation, image files, video files, inputted text, or data indicative of other interactions by the user with the user device. Paragraph 37, determining, by a user device, a local gradient based on one or more local data 
McMahan as modified by Feng does not appear to expressly teach based on at least one of whether the electronic apparatus is currently in an idle state, whether a memory space of the electronic apparatus for refining the local model is sufficient, whether a battery of the electronic apparatus is currently being charged, or whether a current time is midnight.
Lin teaches based on at least one of whether the electronic apparatus is currently in an idle state, whether a memory space of the electronic apparatus for refining the local model is sufficient, whether a battery of the electronic apparatus is currently being charged, or whether a current time is midnight (Paragraph 51, scheduling a task for an idle period. In more detail, the task scheduling module 106 sets a plurality of idle time intervals of the computing devices 112a-112c according to the first processing schedule and the loading data. The task scheduling module 106 then compares a time length of one of the idle time intervals with processing time lengths of the tasks 560, 562 and 564 so as to search for a target task whose processing time length is smaller than the time length of the one of the idle time intervals. In addition, the processing time length of the target task is the closest to the time length of the one of the idle time intervals).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the distributed machine learning of McMahan as modified by Feng to include the task scheduling of Lin such that processing resources are used efficiently (see Lin at paragraph 8).
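
The idle-interval task selection described in Lin at paragraph 51, as summarized above, can be sketched as a best-fit search. The task labels 560, 562 and 564 come from the cited passage; the time lengths and the function name are hypothetical and are used only to make the example concrete.

```python
from typing import Dict, Optional


def pick_target_task(idle_length: float,
                     task_lengths: Dict[str, float]) -> Optional[str]:
    """Return the task whose processing time is smaller than the idle interval
    and closest to it, or None when no task fits."""
    fitting = {name: t for name, t in task_lengths.items() if t < idle_length}
    if not fitting:
        return None
    # Among the tasks that fit, the longest one is closest to the interval length.
    return max(fitting, key=fitting.get)


# Hypothetical processing time lengths for the three tasks named in Lin.
target = pick_target_task(10.0, {"560": 4.0, "562": 9.0, "564": 12.0})
```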

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CRG/Examiner, Art Unit 2123

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123