DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is responsive to the Request for Continued Examination filed on 03/10/2021, which refers to the Amendment filed on 02/22/2021. Claims 1-20 are pending in the case. Claims 1 and 12 are independent claims.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/22/2021 has been entered.
 
Response to Arguments
Applicant's amendments to claims 1 and 12 and arguments regarding the double patenting rejections have been fully considered and are persuasive. Accordingly, these rejections are hereby withdrawn.
Applicant's prior art arguments have been fully considered but are moot in view of the new grounds of rejection presented below.

Claim Rejections - 35 U.S.C. § 103
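In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.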

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 C.F.R. § 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.

Claims 1-10 and 12-20 are rejected under 35 U.S.C. § 103 as being unpatentable over McMahan et al. (U.S. Pat. App. Pub. No. 2017/0109322, hereinafter McMahan) in view of Feng et al. (Feng, Shaohan, Dusit Niyato, Ping Wang, Dong In Kim, and Ying-Chang Liang. "Joint Service Pricing and Cooperative Relay Communication for Federated Learning." arXiv e-prints (2018): arXiv-1811, hereinafter Feng).

As to independent claim 1, McMahan teaches:
A method, performed by an electronic apparatus, of refining an artificial intelligence (AI) model, the method comprising (Title):
detecting information about a context of an electronic apparatus used to refine a local model stored in the electronic apparatus being changed (Paragraph 13, the data examples may be generated, for instance, through interaction of a user with the user device. In this manner, the local update can correspond to a model using data generated through use of the user device by the user. For instance, the data examples may include, without limitation, image files, video files, inputted text, or data indicative of other interactions by the user with the user device);
determining a first gradient for refining the local model based on the changed information about the context (Paragraph 37, determining, by a user device, a local gradient based on one or more local data examples. In particular, the local gradient can be determined for a loss function using the one or more data examples. Figure 2, box 102);…
receiving, from the server, information about a global model refined based on the first and the second gradients (Figure 2, box 112. Figure 3, client devices 230 to server 210 via network 240. Where the first and second gradients are local gradients from different client devices 230);…
further refining the local model based on the received information about the model (Figure 2, box 114 et seq.).
McMahan does not appear to expressly teach receiving a second gradient from an external apparatus determined for refining a local model stored on the external apparatus; refining the local model based on the determined first gradient and the second gradient; transmitting the first gradient and the second gradient to a server; and transmitting the received information about the global model to the external apparatus.
Feng teaches receiving a second gradient from an external apparatus determined for refining a local model stored on the external apparatus (Page 2, figure 1(b). Page 3, right column, paragraph 1, received model update from another mobile device); refining the local model based on the determined first gradient and the second gradient (Page 2, figure 1(b). Page 3, right column, paragraph 1, for each received model update from another mobile device, mobile device i needs to spend the time of Tai for combining the received model update with its own model update by using the average operator); transmitting the first gradient and the second gradient to a server (Page 2, figure 1(b)); and transmitting the received information about the global model to the external apparatus (Page 2, figure 1(b)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the distributed machine learning of McMahan to include the relay federated learning techniques of Feng in order to allow indirect communication between nodes during learning and to improve energy efficiency and mobile wireless availability (see Feng at abstract).
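For illustration only, the relay arrangement addressed in the rejection of claim 1 can be sketched as follows. This is a minimal sketch and is not taken from McMahan or Feng: the function names (average, apply_update, relay_round), the learning rate, and the use of a plain element-wise mean as the combining operator are all assumptions made for the example.

```python
from typing import List, Tuple

Vector = List[float]


def average(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors (hypothetical helper)."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]


def apply_update(weights: Vector, gradient: Vector, lr: float) -> Vector:
    """One gradient-descent step: w <- w - lr * g."""
    return [w - lr * g for w, g in zip(weights, gradient)]


def relay_round(
    local_weights: Vector,
    first_gradient: Vector,
    second_gradient: Vector,
    lr: float = 0.1,
) -> Tuple[Vector, List[Vector]]:
    """Refine the local model with both gradients and build the server payload.

    first_gradient is the relaying device's own gradient; second_gradient is
    assumed to have been received from an external apparatus.
    """
    combined = average([first_gradient, second_gradient])
    refined = apply_update(local_weights, combined, lr)
    # Both gradients are forwarded to the server unchanged.
    to_server = [first_gradient, second_gradient]
    return refined, to_server


if __name__ == "__main__":
    refined, payload = relay_round([1.0, 1.0], [2.0, 0.0], [0.0, 2.0], lr=0.5)
    print(refined, payload)
```

In this sketch the server would aggregate the forwarded gradients (for example with the same average helper) and return the result to the relaying device, which would pass it on to the external apparatus.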

As to dependent claim 2, McMahan teaches:
the determining of the first gradient comprises: obtaining prediction information outputtable by the local model and observation information indicating an answer to the prediction information based on the changed information about the context (Paragraph 37, a local gradient based on one or more local data examples);
obtaining a loss function indicating a difference between the observation information and the prediction information (Paragraph 37, the local gradient can be determined for a loss function using the one or more data examples); and
determining the first gradient including information related to a point where a value of the loss function is lowest (Paragraph 41, stochastic gradient descent techniques).
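For illustration only, the relationship among prediction information, observation information, a loss function, and a gradient recited in claim 2 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from McMahan: the one-parameter linear model, the mean squared error loss, and the names predict, loss, and gradient are all hypothetical.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (input x, observed answer y)


def predict(w: float, b: float, x: float) -> float:
    """Prediction of an assumed linear model."""
    return w * x + b


def loss(w: float, b: float, data: List[Example]) -> float:
    """Mean squared difference between predictions and observations."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)


def gradient(w: float, b: float, data: List[Example]) -> Tuple[float, float]:
    """Partial derivatives of the loss with respect to w and b."""
    n = len(data)
    dw = sum(2 * (predict(w, b, x) - y) * x for x, y in data) / n
    db = sum(2 * (predict(w, b, x) - y) for x, y in data) / n
    return dw, db


if __name__ == "__main__":
    examples = [(1.0, 2.0), (2.0, 4.0)]
    print(loss(0.0, 0.0, examples), gradient(0.0, 0.0, examples))
```

Stepping the parameters against this gradient moves them toward a point where the loss value is lower, which is the sense in which stochastic gradient descent relates the gradient to a minimum of the loss.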

As to dependent claim 3, McMahan teaches the information about the global model comprises a gradient including information related to a point where a value of a loss function for the global model used to refine the global model is lowest (Paragraph 41, stochastic gradient descent techniques).

As to dependent claim 4, McMahan teaches:
the refining of the local model, based on the received information, comprises: obtaining at least one representative value regarding corresponding information between information related to a loss function included in a gradient for the refined global model and information related to a loss function included in the gradient for the refined local model (Paragraph 14, the local update can be a gradient vector. Paragraph 32, the local updates can be a gradient vector associated with the model. For instance, user devices 302 can determine a gradient (e.g. an average gradient) associated with the model based at least in part on training data 308 respectively stored on user devices 302); and
refining the local model based on the at least one representative value (Figure 2, box 114).

As to dependent claim 5, McMahan teaches at least one value of the at least one representative value includes an arithmetic average value or weighted average value regarding the corresponding information (Paragraph 14, determining a weighted average of the received local updates).

As to dependent claim 6, McMahan teaches:
the weighted average value is obtained based on at least one weight value being applied to the corresponding information (Paragraph 29, the server can then aggregate the data, for instance, by determining a weighted average), and
a weight value applied to the information related to the loss function included in the gradient for the refined local model, among the at least one weight value, is determined based on the information about the context of the electronic apparatus (Paragraph 29, for instance, the user devices may determine an updated version of the model (e.g. using one or more stochastic gradient descent techniques) using local data. The server can then determine a weighted average of the resulting models to determine a global update to the model).
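For illustration only, a weighted average of local updates of the kind discussed for claims 5 and 6 can be sketched as follows. This is a minimal sketch and is not taken from McMahan: the function name weighted_average and the choice of per-device weights (here imagined as local example counts) are assumptions.

```python
from typing import List, Sequence


def weighted_average(
    updates: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Element-wise weighted mean of equally sized update vectors."""
    if len(updates) != len(weights) or not updates:
        raise ValueError("need one weight per update")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(updates[0])
    return [
        sum(w * u[i] for u, w in zip(updates, weights)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical devices; the second holds three times as many examples.
    print(weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

With equal weights this reduces to the arithmetic average, so the same helper covers both alternatives recited in claim 5.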

As to dependent claim 7, McMahan teaches the global model is refined based on at least one gradient for each local model for at least one external apparatus based on each local model being refined, and further refined based on a gradient of the electronic apparatus based on the gradient being received from the electronic apparatus (Paragraph 29, the user devices can be configured to provide the determined gradients to the server, as part of the local updates. The server can then aggregate the gradients to determine a global model update).

As to dependent claim 8, McMahan teaches the first gradient is transmitted to the server based on being subjected to at least one operation among an operation of adding noise and an operation of performing encoding (Paragraph 14, the local update can be a gradient vector. Paragraph 32, the local updates can be a gradient vector associated with the model. For instance, user devices 302 can determine a gradient (e.g. an average gradient) associated with the model based at least in part on training data 308 respectively stored on user devices 302).

As to dependent claim 9, McMahan teaches:
identifying a relay apparatus for transmitting the first gradient of the local model from the electronic apparatus (Paragraph 52, the client device 230 can also include a network interface used to communicate with one or more remote computing devices (e.g. server 210) over the network 240. The network interface can include any suitable components for interfacing with one more networks, including for example, transmitters, receivers, ports, controllers, antennas, or other suitable components); and
transmitting the first gradient of the local model to the server via the relay apparatus (Figure 3, network 240. Paragraph 30, server 304 can be configured to communicate with user devices 302 over one or more networks, such as network 240 of FIG. 3. Figure 2, boxes 104 to 106),
wherein the relay apparatus is configured to receive at least one gradient from at least one electronic apparatus and to transmit the received at least one gradient to the server (Paragraph 49, the server 210 can exchange data with one or more client devices 230 over the network 240).

As to dependent claim 10, McMahan teaches:
receiving a gradient for a current global model from the server in response to presence of a difference between the gradient for the current global model and the gradient for the global model previously transmitted from the server to the electronic apparatus, the difference being equal to or greater than a reference value (Paragraph 21, this can be repeated for one or more iterations, for instance, until the loss function reaches a threshold (e.g. converges). The threshold can be determined based at least in part on a desired accuracy of the global model); and
refining the local model based on the received gradient (Figure 2, box 114).

As to independent claim 12, McMahan teaches:
An electronic apparatus configured to refine an artificial intelligence (AI) model, the electronic apparatus comprising (Title):
a memory storing a local model (Figure 3, memory 234);
at least one processor configured to (Figure 3, processor 232):
detect information about a context of the electronic apparatus used to refine the local model being changed, determine a first gradient for refining the local model based on the changed information about the context, and refining the local model based on the determined first gradient (Paragraph 13, the data examples may be generated, for instance, through interaction of a user with the user device. In this manner, the local update can correspond to a model using data generated through use of the user device by the user. For instance, the data examples may include, without limitation, image files, video files, inputted text, or data indicative of other interactions by the user with the user device. Paragraph 37, determining, by a user device, a local gradient based on one or more local data examples. In particular, the local gradient can be determined for a loss function using the one or more data examples. Figure 2, box 102. Paragraph 32, the local update can include an updated version of model); and
a communicator comprising communication circuitry configured to:… receive, from the server, information about a global model refined based on the first and second gradients (Figure 3, network 240. Paragraph 46, the network interface can include any suitable components for interfacing with one more networks, 
McMahan does not appear to expressly teach receive a second gradient from an external apparatus determined for refining a local model stored on the external apparatus, transmit the first gradient and the second gradient to a server,… and transmit the received information about the global model to the external apparatus; and wherein the at least one processor is further configured to refine the local model based on the determined first gradient and the second gradient and refine the local model based on the received information about the refined global model.
Feng teaches receive a second gradient from an external apparatus determined for refining a local model stored on the external apparatus, transmit the first gradient and the second gradient to a server,… and transmit the received information about the global model to the external apparatus (Page 2, figure 1(b). Page 3, right column, paragraph 1, for each received model update from another mobile device, mobile device i needs to spend the time of Tai for combining the received model update with its own model update by using the average operator); and wherein the at least one processor is further configured to refine the local model based on the determined first gradient and the second gradient and refine the local model based on the received information about the refined global model (Page 2, figure 1(b)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the distributed machine learning of McMahan to include the relay federated learning techniques of Feng in order to allow indirect communication between nodes during learning and to improve energy efficiency and mobile wireless availability (see Feng at abstract).

As to dependent claim 13, McMahan teaches obtain prediction information outputtable by the local model and observation information indicating an answer to the prediction information based on the changed information about the context, obtain a loss function indicating a difference between the observation information and the prediction information, and determine the first gradient including information related to a point where a value of the loss function is lowest (Paragraph 37, a local gradient based on one or more local data examples. The local gradient can be determined for a loss function using the one or more data examples. Paragraph 41, stochastic gradient descent techniques).

As to dependent claim 14, McMahan teaches the information about the global model comprises a gradient including information related to a point where a value of a loss function for the global model used to refine the global model is lowest (Paragraph 41, stochastic gradient descent techniques).

As to dependent claim 15, McMahan teaches:
obtain at least one representative value regarding corresponding information between information related to a loss function included in a gradient for the refined global model and information related to a loss function included in the gradient for the refined local model (Paragraph 14, the local update can be a gradient vector. Paragraph 32, the local updates can be a gradient vector associated with the model. For instance, user devices 302 can determine a gradient (e.g. an average gradient) associated with the model based at least in part on training data 308 respectively stored on user devices 302); and
refine the local model based on the at least one representative value (Figure 2, box 114).

As to dependent claim 16, McMahan teaches:
at least one value of the at least one representative value includes a weighted average value regarding the corresponding information (Paragraph 14, determining a weighted average of the received local updates),
the weighted average value being obtained based on at least one weight value being applied to the corresponding information (Paragraph 29, the server can then aggregate the data, for instance, by determining a weighted average), and
a weight value applied to the information related to the loss function included in the gradient for the refined local model, among the at least one weight value, is determined based on the information about the context of the electronic apparatus (Paragraph 29, for instance, the user devices may determine an updated version of the model (e.g. using one or more stochastic gradient descent techniques) using local data. The server can then determine a weighted average of the resulting models to determine a global update to the model).

As to dependent claim 17, McMahan teaches the global model is refined based on at least one gradient for each local model for a plurality of external apparatus based on each local model being refined, and further refined based on a gradient of the electronic apparatus based on the gradient being received from the electronic apparatus (Paragraph 29, the user devices can be configured to provide the determined gradients to the server, as part of the local updates. The server can then aggregate the gradients to determine a global model update).

As to dependent claim 18, McMahan teaches:
identify a relay apparatus configured to transmit the first gradient of the local model from the electronic apparatus, and control the communicator to transmit the first gradient of the local 
wherein the relay apparatus is configured to receive at least one gradient from at least one electronic apparatus and to transmit the received at least one gradient to the server (Paragraph 49, the server 210 can exchange data with one or more client devices 230 over the network 240).

As to dependent claim 19, McMahan teaches control the communicator to receive a gradient for a current global model from the server in response to a presence of a difference between the gradient for the current global model and the gradient for the global model previously transmitted from the server to the electronic apparatus, the difference being equal to or greater than a reference value, and refine the local model based on the received gradient (Paragraph 21, this can be repeated for one or more iterations, for instance, until the loss function reaches a threshold (e.g. converges). The threshold can be determined based at least in part on a desired accuracy of the global model).

As to dependent claim 20, McMahan teaches a computer program product comprising a non-transitory computer-readable recording medium having recorded thereon a program for implementing the method of claim 1 (Paragraphs 5, 47, and 50).

Claim 11 is rejected under 35 U.S.C. § 103 as being unpatentable over McMahan in view of Feng and Lin et al. (U.S. Pat. App. Pub. No. 2015/0135186, hereinafter Lin).

As to dependent claim 11, the rejection of claim 1 is incorporated.
McMahan teaches based on detecting that the information about the context of the electronic apparatus is changed, refining the local model (Paragraph 13, the data examples may be generated, for instance, through interaction of a user with the user device. In this manner, the local update can correspond to a model using data generated through use of the user device by the user. For instance, the data examples may include, without limitation, image files, video files, inputted text, or data indicative of other interactions by the user with the user device. Paragraph 37, determining, by a user device, a local gradient based on one or more local data examples. In particular, the local gradient can be determined for a loss function using the one or more data examples. Figure 2, box 102. Paragraph 32, the local update can include an updated version of model).
McMahan as modified by Feng does not appear to expressly teach based on at least one of whether the electronic apparatus is currently in an idle state, whether a memory space of the electronic apparatus for refining the local model is sufficient, whether a battery of the electronic apparatus is currently being charged, or whether a current time is midnight.
Lin teaches based on at least one of whether the electronic apparatus is currently in an idle state, whether a memory space of the electronic apparatus for refining the local model is sufficient, whether a battery of the electronic apparatus is currently being charged, or whether a current time is midnight (Paragraph 51, scheduling a task for an idle period. In more details, the task scheduling module 106 sets a plurality of idle time intervals of the computing devices 112a-112c according to the first processing schedule and the loading data. And the task scheduling module 106 compares a time length of one of the idle time intervals with processing time lengths of the tasks 560, 562 and 564 so as to 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the distributed machine learning of McMahan as modified by Feng to include the task scheduling of Lin so that processing resources are used efficiently (see Lin at paragraph 8).

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure. Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action.
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 U.S.P.Q. 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 U.S.P.Q. 275, 277 (C.C.P.A. 1968)).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Casey R. Garner whose telephone number is 571-272-2467. The examiner can normally be reached on Monday to Friday, 8am to 5pm, Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/Casey R. Garner/
Examiner, Art Unit 2123