DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Status of Claims
The following claims are pending in this office action: 1-19
The following claims are amended: 1-3, 6-8, and 11-13
The following claims are new: 16-19
The following claims are cancelled: None
The following claims are rejected: 1-19
Response to Arguments
Applicant filed amendments on 09/08/2021 to address the 35 U.S.C. 112(b) rejection. Notwithstanding Applicant’s amendments, the 35 U.S.C. 112(b) rejection still stands. The disclosure makes no mention of a “rating value” or a “highest rating value”; thus it is not clear what these two phrases precisely refer to. Applicant’s amendment attempts to further define the rating value; however, because the specification mentions neither term, it remains unclear what the “rating value” and the “highest rating value” are. Applicant is respectfully requested to make of record the paragraphs of the disclosure that refer to the “rating value” and the “highest rating value”.
Applicant’s arguments with respect to claims 1-19 regarding the 35 U.S.C. 102 and 103 rejections have been considered but are moot. Additionally, Applicant’s arguments are high 
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 3-5, 8-10, and 13-16 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. The specification does not have a written description of what the “rating value” and “the highest rating value” are. Applicant’s 
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 3-5, 8-10, and 13-16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 3 recites “a rating value” and “a highest rating value”. It is not clear what “a rating value” and “a highest rating value” refer to exactly. The specification does not make any reference to “a rating value” or “a highest rating value”. For the purpose of prior art examination, Examiner is interpreting “a rating value” and “a highest rating value” to be a Q-value. Claims 4-5 and 16 do not cure the deficiencies of claim 3; therefore claims 4-5 and 16 are also rejected under 35 U.S.C. 112(b). Claims 8 and 13 are rejected for the same reason under 35 U.S.C. 112(b), and the claims that depend from claims 8 and 13 are also rejected under 35 U.S.C. 112(b) by virtue of their dependency.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5 and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Q-Learning Based Power Control Algorithm for D2D Communication to Nie, et al. (hereinafter, “Nie”), in view of Flags to Uky (hereinafter, “Uky”)
As per claim 1, Nie teaches a method for operating a first electronic device, the method comprising: 
generating a decision-making data structure using a machine learning data structure (Nie, Page 4, Col. 1, 3rd Para. discloses “As reward function defined by (11) needs all agents’ information, team-Q is implemented in a centralized method. Thus we assume a centralized controller is available to maintain the Q-table.” And Page 4, Col. 1, Distributed Q-Learning section discloses “Distributed Q-learning can decompose the large Q-value table in team-Q to multiple small ones” (The machine learning data structure being the Q-table. Generating a decision making data structure from the Q-table))
transmitting, to a second electronic device, the decision-making data structure; (Nie, Page 3, Col. 2, 3rd Para. discloses  “…there is only one reward function and thus only one Q-table needs to be maintained such that all agents can learn the common Q-function in parallel.” (Agents using the Q-table for learning which means it is transmitted to the agents))
receiving, from the second electronic device, result data regarding a result of performing a selected action selected from the decision-making data structure; (Nie, Page 3, Col. 2, 7th Para. discloses “The action of each agent consists of a set of transmitting power levels” and Equation 11 describing the reward function (Agent performs an action which produces a result))
and updating the machine learning data structure using the result data. (Nie, Page 4, Col. 1 3rd Para. discloses “In the learning process, D2D users who share the same RB select transmitting power levels from A simultaneously, then the controller gets a reward and learns Q-function with (12)” (Q-table gets updated. Additionally, Algorithm 1 discloses updating the table entry in Distributed Q-Learning))
wherein the machine learning data structure includes data regarding a plurality of states and data regarding a plurality of possible actions for each of the plurality of states (Nie, Page 3, Col. 2., Team-Q Learning Based Power Control section discloses “In team-Q learning, the agents, states, actions and reward function are defined as follows: Agent: The agents are D2D transmitters. A D2D transmitter is denoted by D2D i, 1 ≤ i ≤ n, State: The state of D2D user i on RB k at time t is defined as… Action: The action of each agent consists of a set of transmitting power levels” (Q-Table consists of actions for the plurality of states within the table))
wherein the decision-making data structure includes [[additional information]] (Nie, Page 3, Col. 2., Team-Q Learning Based Power Control section discloses “In team-Q learning, the agents, states, actions and reward function are defined as follows: Agent: The agents are D2D transmitters. A D2D transmitter is denoted by D2D i, 1 ≤ i ≤ n, State: The state of D2D user i on RB k at time t is defined as… Action: The action of each agent consists of a set of transmitting power levels” (Q-Table consists of actions for the plurality of states within the table))
[[and wherein the additional information indicates]] whether data regarding each of the plurality of states is complete or whether more data about the updating is needed (Nie, see Algorithm 1 describing the distributed Q-learning algorithm wherein the Q-table entry gets updated accordingly within the loop)
Nie fails to explicitly teach:
additional information
and wherein the additional information indicates [[whether data regarding each of the plurality of states is complete or whether more data about the updating is needed]]
However, Uky teaches:
additional information (Uky, 1st Para. discloses “A flag is a logical concept, not a special type of variable. The concept is that a variable records the occurrence of an event. That it is set "one way" if the event happened, and set "the other way" if the event did not happen.” And 4th Para. discloses “Any data type can be used as a flag. An integer could have the value 1 if something happened and 0 if it did not.” (The additional information indicating a bit or Boolean value))
and wherein the additional information indicates [[whether data regarding each of the plurality of states is complete or whether more data about the updating is needed]] (Uky, 1st Para. discloses “A flag is a logical concept, not a special type of variable. The concept is that a variable records the occurrence of an event. That it is set "one way" if the event happened, and set "the other way" if the event did not happen.” And 4th Para. discloses “Any data type can be used as a flag. An integer could have the value 1 if something happened and 0 if it did not.” (The additional information indicating a bit or Boolean value))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify distributed Q-learning as disclosed by Nie to use Boolean flags as disclosed by Uky. The combination would have been obvious because a person of ordinary skill in the art would be motivated to continuously refine and update the Q-table, wherein Boolean flags (i.e. bit values) can further help determine whether or not a state-action pair is in an optimal state. If it is not, the Boolean flag indicates that further updating is needed so that the state-action pair eventually reaches an optimized state.
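For illustration only, the proposed combination may be sketched as below. This is a minimal sketch and not an implementation taken from Nie or Uky: the names StateEntry, build_decision_table, visit_counts and min_visits are hypothetical, and the rule that a state is "complete" after a fixed number of updates is an assumption made solely so that the flag has something to record.

```python
from dataclasses import dataclass


@dataclass
class StateEntry:
    """One row of a hypothetical decision-making data structure."""
    action: int             # action currently mapped to this state
    complete: bool = False  # Uky-style flag: True once no more data is needed


def build_decision_table(q_table, visit_counts, min_visits=3):
    """Derive one flagged entry per state from a Q-table.

    q_table: dict mapping state -> list of Q-values, one per action.
    visit_counts: dict mapping state -> number of updates seen so far.
    min_visits: assumed threshold for marking a state as complete.
    """
    table = {}
    for state, q_values in q_table.items():
        # Index of the largest Q-value in this row (first one on ties).
        best_action = max(range(len(q_values)), key=q_values.__getitem__)
        table[state] = StateEntry(
            action=best_action,
            complete=visit_counts.get(state, 0) >= min_visits,
        )
    return table


q_table = {"s0": [0.1, 0.7, 0.2], "s1": [0.0, 0.0, 0.4]}
visits = {"s0": 5, "s1": 1}
decision_table = build_decision_table(q_table, visits)
```

In this sketch the flag is an ordinary Boolean stored next to the selected action, which corresponds to Uky's statement that any data type can serve as a flag.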

As per claim 2, the combination of Nie and Uky as shown above teaches the method of claim 1, Nie further teaches:
wherein the decision-making data structure includes data regarding the plurality of states, and data regarding one action for each of the plurality of states. (Nie, Page 3, Col. 2., Team-Q Learning Based Power Control section discloses “In team-Q learning, the agents, states, actions and reward function are defined as follows: Agent: The agents are D2D transmitters. A D2D transmitter is denoted by D2D i, 1 ≤ i ≤ n, State: The state of D2D user i on RB k at time t is defined as… Action: The action of each agent consists of a set of transmitting power levels” And Page 4, Col. 1, Distributed Q-Learning section discloses “Distributed Q-learning can decompose the large Q-value table in team-Q to multiple small ones…  the dimension of Q-table in distributed-Q turns to be n × |S| × |A|” (Q-Table consists of actions for the plurality of states within the table. The action of an agent consists of a single action))

As per claim 3, the combination of Nie and Uky as shown above teaches the method of claim 2, Nie further teaches:
wherein the machine learning data structure further comprises a rating value associated with each of the plurality of possible actions for each of the plurality of states, (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use ε-greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy))
wherein the one action for each of the plurality of states in the decision-making data structure corresponds to an action with a highest rating value among the plurality of possible actions for each of the plurality of states in the machine learning data structure. (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy. Additionally Algorithm 1 discloses the Q-table having a maximum Q-value))
(Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use ε-greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy))

As per claim 4, the combination of Nie and Uky as shown above teaches the method of claim 3, Nie further teaches:
wherein the machine learning data structure further comprises a Q-table regarding Q-value of each of the plurality of possible actions for each of the plurality of states, (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy))
wherein the one action for each of the plurality of states in the decision-making data structure corresponds to an action with a highest Q-value for each of the plurality of states in the Q-table. (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy))

As per claim 5, the combination of Nie and Uky as shown above teaches the method of claim 4, Nie further teaches:
wherein the decision-making data structure further comprises a look-up table regarding each of the plurality of states and the action with the highest Q-value. (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use greedy strategy…” (The Q-Table is a lookup table that contains the highest Q-value))
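As an illustrative sketch of the interpretation applied to claims 3-5, a look-up table holding one action per state can be derived from a Q-table by taking the highest Q-value in each row. The function name greedy_lookup_table and the sample values are hypothetical and are not drawn from Nie.

```python
def greedy_lookup_table(q_table):
    """Map each state to the index of its highest Q-value.

    q_table: dict mapping state -> list of Q-values, one per action.
    Ties resolve to the lowest action index.
    """
    return {
        state: max(range(len(q_values)), key=lambda a: q_values[a])
        for state, q_values in q_table.items()
    }


# Two states with three candidate actions each (made-up values).
example_q_table = {0: [1.0, 3.0, 2.0], 1: [5.0, 4.0, 0.0]}
lookup = greedy_lookup_table(example_q_table)
```

Under this reading, the full table of Q-values corresponds to the machine learning data structure, and the reduced state-to-action mapping corresponds to the decision-making data structure.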

As per claim 16, the combination of Nie and Uky as shown above teaches the method of claim 4, Nie further teaches:
[[wherein the additional information includes a first bit indicating whether]] a randomly-selected action based on a first specified probability level is to be taken instead of an action that is mapped to a corresponding state of the plurality of states. (Nie, Page 3, Col. 2., Team-Q Learning Based Power Control section discloses “In team-Q learning, the agents, states, actions and reward function are defined as follows: Agent: The agents are D2D transmitters. A D2D transmitter is denoted by D2D i, 1 ≤ i ≤ n, State: The state of D2D user i on RB k at time t is defined as… Action: The action of each agent consists of a set of transmitting power levels” and “There are many ways to choose actions based on the current Q-value estimation, in this paper we use ε-greedy strategy, which is described as follows: choose action randomly with probability…” (Q-Table consists of actions for the plurality of states within the table wherein actions are subsequently taken based off transitional probabilities))
Uky further teaches:
wherein the additional information includes a first bit indicating whether [[a randomly-selected action based on a first specified probability level is to be taken instead of an action that is mapped to a corresponding state of the plurality of states]] (Uky, 1st Para. discloses “A flag is a logical concept, not a special type of variable. The concept is that a variable records the occurrence of an event. That it is set "one way" if the event happened, and set "the other way" if the event did not happen.” And 4th Para. discloses “Any data type can be used as a flag. An integer could have the value 1 if something happened and 0 if it did not.” (The additional information indicating a bit or Boolean value))
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Nie with the teachings of Uky for at least the same reasons as discussed above in claim 1
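For illustration only, the claim 16 mapping can be sketched as a single bit that gates random selection at a specified probability level. The names select_action, explore_bit and epsilon are hypothetical; neither Nie nor Uky discloses this exact interface, and the sketch merely joins a flag of the kind Uky describes with random selection at a given probability as described in Nie.

```python
import random


def select_action(state, lookup_table, num_actions, explore_bit, epsilon,
                  rng=random):
    """Return an action for `state`.

    explore_bit: flag; when False the mapped action is always used.
    epsilon: probability of choosing a random action when the bit is set.
    rng: any object providing random() and randrange(); defaults to the
         standard library module so callers can inject a seeded generator.
    """
    if explore_bit and rng.random() < epsilon:
        return rng.randrange(num_actions)  # randomly selected action
    return lookup_table[state]             # action mapped to the state


mapped = {"s0": 2}
chosen_without_bit = select_action("s0", mapped, 4, explore_bit=False, epsilon=1.0)
chosen_with_bit = select_action("s0", mapped, 4, explore_bit=True, epsilon=1.0)
```

With the bit cleared, the mapped action is returned regardless of the probability level; with the bit set, the mapped action is replaced by a random one with probability epsilon.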

As per claim 17, the combination of Nie and Uky as shown above teaches the method of claim 1, Nie further teaches:
[[wherein the additional information includes a first bit indicating]] whether the result data is to be transmitted from the second electronic device, (Nie, Page 3, Col. 2, 7th Para. discloses “The action of each agent consists of a set of transmitting power levels” and Equation 11 describing the reward function and Page 4, Col. 1 Distributed Q-learning section discloses “The Q-values in each Q-table will be updated only when the next Q-value is greater than current Q-value. The update rule can be expressed by formula (14). The agents, states and actions used for the distributed-Q algorithm are the same as that in team-Q” (Agent performs an action which produces a result. Distributed Q-algorithm involving agents communicating with one another))
and wherein the receiving the result data is performed based on a value of [[a second bit indicating that]] the result data is to be transmitted from the second electronic device. (Nie, Page 3, Col. 2, 7th Para. discloses “The action of each agent consists of a set of transmitting power levels” and Equation 11 describing the reward function and Page 4, Col. 1 Distributed Q-learning section discloses “The Q-values in each Q-table will be updated only when the next Q-value is greater than current Q-value. The update rule can be expressed by formula (14). The agents, states and actions used for the distributed-Q algorithm are the same as that in team-Q” (Agent performs an action which produces a result. Distributed Q-algorithm involving agents communicating with one another))
Uky further teaches:
wherein the additional information includes a first bit indicating [[whether the result data is to be transmitted from the second electronic device]] (Uky, 1st Para. discloses “A flag is a logical concept, not a special type of variable. The concept is that a variable records the occurrence of an event. That it is set "one way" if the event happened, and set "the other way" if the event did not happen.” And 4th Para. discloses “Any data type can be used as a flag. An integer could have the value 1 if something happened and 0 if it did not.” (The additional information indicating a bit or Boolean value))
[[and wherein the receiving the result data is performed based on a value of]] a second bit indicating that [[the result data is to be transmitted from the second electronic device]] (Uky, 1st Para. discloses “A flag is a logical concept, not a special type of variable. The concept is that a variable records the occurrence of an event. That it is set "one way" if the event happened, and set "the other way" if the event did not happen.” And 4th Para. discloses “Any data type can be used as a flag. An integer could have the value 1 if something happened and 0 if it did not.” (The additional information indicating a bit or Boolean value))
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Nie with the teachings of Uky for at least the same reasons as discussed above in claim 1

As per claim 18, the combination of Nie and Uky as shown above teaches the method of claim 1, Nie further teaches:
wherein the receiving the result data is performed based on whether there is a functional network connection over a communication network between the first electronic device and the second electronic device. (Nie, Fig. 1 discloses D2D communication within a cellular network and Page 3, Col. 1 Single-agent Q-Learning Algorithm discloses “In our system, D2D transmitter in each D2D pair is treated as an agent. Since the whole network includes many agents, it can be regarded as a multi-agents system where the optimal policy of an agent depends not only on the environment but also on the policies of the other agents [20]. Thus the aforementioned Q-learning algorithm needs to be extended from single-agent to multi-agents.” (Communication among different agents involving distributed Q-learning))

As per claim 19, the combination of Nie and Uky as shown above teaches the method of claim 1, Nie further teaches:
(Nie, Page 4, Col. 1, Distributed Q-learning section discloses “The Q-values in each Q-table will be updated only when the next Q-value is greater than current Q-value. The update rule can be expressed by formula (14).”)
and wherein the predetermined event includes an update request from the first electronic device. (Nie, Page 4, Col. 1, Distributed Q-learning section discloses “The Q-values in each Q-table will be updated only when the next Q-value is greater than current Q-value. The update rule can be expressed by formula (14).” (The next Q-value being greater than what is stored in the agents’ Q-tables is a predetermined event, upon which the Q-tables of the agents are updated accordingly))
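As a hedged sketch of the cited passage, the conditional update can be written as below. Formula (14) of Nie is not reproduced in this action, so the candidate value used here (reward plus discounted maximum next-state Q-value) is an assumption based on the commonly used form of distributed Q-learning; the function name distributed_q_update and the default discount factor are likewise hypothetical.

```python
def distributed_q_update(q_table, state, action, reward, next_state,
                         gamma=0.9):
    """Update one Q-table entry only if the new estimate is greater.

    q_table: dict mapping state -> list of Q-values, one per action.
    gamma: assumed discount factor.
    Returns True when the entry was overwritten, False otherwise.
    """
    candidate = reward + gamma * max(q_table[next_state])
    if candidate > q_table[state][action]:
        q_table[state][action] = candidate
        return True
    return False


q = {"a": [0.0, 0.0], "b": [1.0, 2.0]}
first_update = distributed_q_update(q, "a", 0, reward=1.0, next_state="b", gamma=0.5)
second_update = distributed_q_update(q, "a", 0, reward=0.0, next_state="b", gamma=0.5)
```

The returned Boolean marks whether the condition for updating was met, which is the event the rejection relies on.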

Claims 6-10 are rejected under 35 U.S.C. 103 as being unpatentable over Nie, in view of Uky and further in view of U.S. Pub. No. US 20150365871 A1 to Hu, et al. (hereinafter, “Hu”)
	As per claim 6, Nie teaches an apparatus of a first electronic device, the apparatus comprising:
generate a decision-making data structure using a machine learning data structure (Nie, Page 4, Col. 1, 3rd Para. discloses “As reward function defined by (11) needs all agents’ information, team-Q is implemented in a centralized method. Thus we assume a centralized controller is available to maintain the Q-table.” And Page 4, Col. 1, Distributed Q-Learning section discloses “Distributed Q-learning can decompose the large Q-value table in team-Q to multiple small ones” (The machine learning data structure being the Q-table. Generating a decision making data structure from the Q-table))
[[the transceiver]] to transmit, to a second electronic device, the decision-making data structure; (Nie, Page 3, Col. 2, 3rd Para. discloses  “…there is only one reward function and thus only one Q-table needs to be maintained such that all agents can learn the common Q-function in parallel.” (Agents using the Q-table for learning which means it is transmitted to the agents))
control [[the transceiver]] to receive, from the electronic device, result data regarding a result of performing a selected action selected from the decision-making data structure; (Nie, Page 3, Col. 2, 7th Para. discloses “The action of each agent consists of a set of transmitting power levels” (Agent performs an action which produces a result))
and update the machine learning data structure using the result data. (Nie, Page 4, Col. 1 3rd Para. discloses “In the learning process, D2D users who share the same RB select transmitting power levels from A simultaneously, then the controller gets a reward and learns Q-function with (12)” (Q-table gets updated. Additionally, Algorithm 1 discloses updating the table entry in Distributed Q-Learning))
wherein the machine learning data structure includes data regarding a plurality of states and data regarding a plurality of possible actions for each of the plurality of states (Nie, Page 3, Col. 2., Team-Q Learning Based Power Control section discloses “In team-Q learning, the agents, states, actions and reward function are defined as follows: Agent: The agents are D2D transmitters. A D2D transmitter is denoted by D2D i, 1 ≤ i ≤ n, State: The state of D2D user i on RB k at time t is defined as… Action: The action of each agent consists of a set of transmitting power levels” (Q-Table consists of actions for the plurality of states within the table))
[[additional information]] (Nie, Page 3, Col. 2., Team-Q Learning Based Power Control section discloses “In team-Q learning, the agents, states, actions and reward function are defined as follows: Agent: The agents are D2D transmitters. A D2D transmitter is denoted by D2D i, 1 ≤ i ≤ n, State: The state of D2D user i on RB k at time t is defined as… Action: The action of each agent consists of a set of transmitting power levels” (Q-Table consists of actions for the plurality of states within the table))
[[and wherein the additional information indicates]] whether data regarding each of the plurality of states is complete or whether more data about the updating is needed (Nie, see Algorithm 1 describing the distributed Q-learning algorithm wherein the Q-table entry gets updated accordingly within the loop)
	Nie fails to explicitly teach:
	a memory storing a [[machine learning data structure]]
	a transceiver
	at least one processor, wherein the at least one processor is configured to
	However, Hu (Hu addresses managing wireless frequencies using Q-learning) teaches:
a memory storing a [[machine learning data structure]] (Hu, Para. [0056] discloses “These devices may include… processor(s), memory…”)
	a transceiver (Hu, Para. [0056] discloses “Primary and secondary user devices may be equipped with, for example, one or two transceivers…”)
	at least one processor, wherein the at least one processor is configured to (Hu, Para. [0056] discloses “These devices may include… processor(s), memory…”)

Nie fails to explicitly teach:
additional information
and wherein the additional information indicates [[whether data regarding each of the plurality of states is complete or whether more data about the updating is needed]]
However, Uky teaches:
additional information (Uky, 1st Para. discloses “A flag is a logical concept, not a special type of variable. The concept is that a variable records the occurrence of an event. That it is set "one way" if the event happened, and set "the other way" if the event did not happen.” And 4th Para. discloses “Any data type can be used as a flag. An integer could have the value 1 if something happened and 0 if it did not.” (The additional information indicating a bit or Boolean value))
and wherein the additional information indicates [[whether data regarding each of the plurality of states is complete or whether more data about the updating is needed]] (Uky, 1st Para. discloses “A flag is a logical concept, not a special type of variable. The concept is that a variable records the occurrence of an event. That it is set "one way" if the event happened, and set "the other way" if the event did not happen.” And 4th Para. discloses “Any data type can be used as a flag. An integer could have the value 1 if something happened and 0 if it did not.” (The additional information indicating a bit or Boolean value))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify distributed Q-learning as disclosed by Nie to use Boolean flags as disclosed by Uky. The combination would have been obvious because a person of ordinary skill in the art would be motivated to continuously refine and update the Q-table, wherein Boolean flags (i.e. bit values) can further help determine whether or not a state-action pair is in an optimal state. If it is not, the Boolean flag indicates that further updating is needed so that the state-action pair eventually reaches an optimized state.

	As per claim 7, the combination of Nie, Uky and Hu as shown above teaches the apparatus of claim 6, Nie further teaches:
wherein the decision-making data structure comprises data regarding the plurality of states, and data regarding one action for each of the plurality of states. (Nie, Page 3, Col. 2., Team-Q Learning Based Power Control section discloses “In team-Q learning, the agents, states, actions and reward function are defined as follows: Agent: The agents are D2D transmitters. A D2D transmitter is denoted by D2D i, 1 ≤ i ≤ n, State: The state of D2D user i on RB k at time t is defined as… Action: The action of each agent consists of a set of transmitting power levels” And Page 4, Col. 1, Distributed Q-Learning section discloses “Distributed Q-learning can decompose the large Q-value table in team-Q to multiple small ones…  the dimension of Q-table in distributed-Q turns to be n × |S| × |A|” (Q-Table consists of actions for the plurality of states within the table. The action of an agent consists of a single action))

	As per claim 8, the combination of Nie, Uky and Hu as shown above teaches the apparatus of claim 7, Nie further teaches:
wherein the machine learning data structure further comprises a rating value associated with each of the plurality of possible actions for each of the plurality of states, (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use ε-greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy))
and wherein the one action for each of the plurality of states in the decision-making data structure corresponds to an action with a highest rating value among the plurality of possible actions for each of the plurality of states in the machine learning data structure. (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy. Additionally Algorithm 1 discloses the Q-table having a maximum Q-value))
and wherein the rating value is determined based on a reward value determined for each of the plurality of states (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use ε-greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy))

	As per claim 9, the combination of Nie, Uky and Hu as shown above teaches the apparatus of claim 8, Nie further teaches:
wherein the machine learning data structure further comprises a Q-table regarding Q-value of each of the plurality of possible actions for each of the plurality of states, (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy))
wherein the one action for each of the plurality of states in the decision-making data structure corresponds to an action with a highest Q-value for each of the plurality of states in the Q-table. (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy))

	As per claim 10, the combination of Nie, Uky and Hu as shown above teaches the apparatus of claim 9, Nie further teaches:
wherein the decision-making data structure further comprises a look-up table regarding each of the plurality of states and the action with the highest Q-value. (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use greedy strategy…” (The Q-Table is a lookup table that contains the highest Q-value))
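For illustration only, the relationship recited in claims 8-10 between a Q-table and a look-up table holding one action per state can be sketched as follows. This is a minimal sketch, not an implementation taken from Nie or from the application: the function name `build_lookup_table`, the dictionary layout, and the example states and power levels are all assumptions chosen for readability.

```python
from typing import Dict, Hashable

# Hypothetical layout: the Q-table maps each state to {action: Q-value}.
QTable = Dict[Hashable, Dict[Hashable, float]]


def build_lookup_table(q_table: QTable) -> Dict[Hashable, Hashable]:
    """Keep, for each state, only the action with the highest Q-value."""
    return {
        state: max(actions, key=actions.get)
        for state, actions in q_table.items()
        if actions  # skip states that have no recorded actions
    }


# Example values are invented for illustration.
q_table = {
    "low_interference": {"power_10dBm": 0.2, "power_20dBm": 0.7},
    "high_interference": {"power_10dBm": 0.5, "power_20dBm": 0.1},
}
lookup = build_lookup_table(q_table)
```

In this sketch the look-up table is strictly smaller than the Q-table, since it discards every Q-value and every non-maximal action.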

Claims 11-15 are rejected under 35 U.S.C. 103 as being unpatentable over Nie, in view of Uky, further in view of Hu, further in view of U.S. Pub. No. US 20130210480 A1 to Pollington, et al. (hereinafter, “Pollington”).
	As per claim 11, Nie teaches an apparatus of a second electronic device, the apparatus comprising:
control [[the transceiver]] to receive, from a first electronic device, a decision making data structure  (Nie, Page 3, Col. 2, 3rd Para. discloses  “…there is only one reward function and thus only one Q-table needs to be maintained such that all agents can learn the common Q-function in parallel.” (Agents using the Q-table for learning which means it is transmitted to the agents))
[[the state information]] (Nie, Algorithm 1 discloses selecting an action from the Q-Table)
	perform the selected action (Nie, Algorithm 1 discloses executing the selected action)
control [[the transceiver]] to transmit, to the first electronic device, result data regarding a result of performing the selected action (Nie, Page 3, Col. 2, 7th Para. discloses “The action of each agent consists of a set of transmitting power levels” (Agent performs an action which produces a result))
	wherein the decision-making data structure is generated using a machine learning data structure (Nie, Page 4, Col. 1, 3rd Para. discloses “As reward function defined by (11) needs all agents’ information, team-Q is implemented in a centralized method. Thus we assume a centralized controller is available to maintain the Q-table.” And Page 4, Col. 1, Distributed Q-Learning section discloses “Distributed Q-learning can decompose the large Q-value table in team-Q to multiple small ones” (The machine learning data structure being the Q-table. Generating a decision making data structure from the Q-table))
wherein the machine learning data structure includes data regarding a plurality of states and data regarding a plurality of possible actions for each of the plurality of states (Nie, Page 3, Col. 2., Team-Q Learning Based Power Control section discloses “In team-Q learning, the agents, states, actions and reward function are defined as follows: Agent: The agents are D2D transmitters. A D2D transmitter is denoted by D2D i, 1 ≤ i ≤ n, State: The state of D2D user i on RB k at time t is defined as… Action: The action of each agent consists of a set of transmitting power levels” (Q-Table consists of actions for the plurality of states within the table))
[[additional information]] (Nie, Page 3, Col. 2., Team-Q Learning Based Power Control section discloses “In team-Q learning, the agents, states, actions and reward function are defined as follows: Agent: The agents are D2D transmitters. A D2D transmitter is denoted by D2D i, 1 ≤ i ≤ n, State: The state of D2D user i on RB k at time t is defined as… Action: The action of each agent consists of a set of transmitting power levels” (Q-Table consists of actions for the plurality of states within the table))
[[and wherein the additional information indicates]] whether data regarding each of the plurality of states is complete or whether more data about the updating is needed (Nie, see Algorithm 1 describing the distributed Q-learning algorithm wherein the Q-table entry gets updated accordingly within the loop)
	Nie fails to explicitly teach:
	a memory
	a/the transceiver
	at least one processor, wherein the at least one processor is configured to
	However, Hu teaches:
	a memory storing a [[machine learning data structure]] (Hu, Para. [0056] discloses “These devices may include… processor(s), memory…”)
	a/the transceiver (Hu, Para. [0056] discloses “Primary and secondary user devices may be equipped with, for example, one or two transceivers…”)
	at least one processor, wherein the at least one processor is configured to (Hu, Para. [0056] discloses “These devices may include… processor(s), memory…”)

	Nie fails to explicitly teach:
	at least one sensor
	control the at least one sensor to obtain state information on a current state of the second electronic device
	state information
	However, Pollington (Pollington addresses the issue of detecting states of an electronic device) teaches:
at least one sensor (Pollington, Abstract discloses “Method of determining a state of a mobile device, the mobile device having a one or more sensors, the method comprising the steps of: obtaining sensor data from the one or more sensors of the mobile device in an initial state.”)
	control the at least one sensor to obtain state information on a current state of the second electronic device (Pollington, Abstract discloses “Method of determining a state of a mobile device, the mobile device having a one or more sensors, the method comprising the steps of: obtaining sensor data from the one or more sensors of the mobile device in an initial state.”)
	state information (Pollington, Abstract discloses “Method of determining a state of a mobile device, the mobile device having a one or more sensors, the method comprising the steps of: obtaining sensor data from the one or more sensors of the mobile device in an initial state.”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the Q-learning as disclosed by Nie to use device state information as disclosed by Pollington. The combination would have been obvious because a person of ordinary skill in the art would be motivated to select the maximal action based on the electronic device’s sensor data in order to produce the optimum result for the electronic device. Selecting the maximal action allows the device to perform at its maximum capacity and allows it to reap the most reward for its action. Additionally, using sensor data allows for more “…robust and accurate state detection” (Pollington, Para. [0008]).
Nie fails to explicitly teach:
additional information
[[whether data regarding each of the plurality of states is complete or whether more data about the updating is needed]]
However, Uky teaches:
additional information (Uky, 1st Para. discloses “A flag is a logical concept, not a special type of variable. The concept is that a variable records the occurrence of an event. That it is set "one way" if the event happened, and set "the other way" if the event did not happen.” And 4th Para. discloses “Any data type can be used as a flag. An integer could have the value 1 if something happened and 0 if it did not.” (The additional information indicating a bit or Boolean value))
and wherein the additional information indicates [[whether data regarding each of the plurality of states is complete or whether more data about the updating is needed]] (Uky, 1st Para. discloses “A flag is a logical concept, not a special type of variable. The concept is that a variable records the occurrence of an event. That it is set "one way" if the event happened, and set "the other way" if the event did not happen.” And 4th Para. discloses “Any data type can be used as a flag. An integer could have the value 1 if something happened and 0 if it did not.” (The additional information indicating a bit or Boolean value))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify distributed Q-learning as disclosed by Nie to use Boolean flags as disclosed by Uky. The combination would have been obvious because a person of ordinary skill in the art would be motivated to continuously refine and update the Q-table accordingly wherein using Boolean flags (i.e. bit values) can further help 
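For illustration only, the proposed combination of a Q-table entry that is updated in a loop (Nie, Algorithm 1) with a Boolean flag (Uky) can be sketched as follows. This is a hedged sketch under stated assumptions: the class `StateEntry`, the function `update`, the learning parameters, and in particular the visit-count threshold used to set the flag are hypothetical. Neither reference, as quoted in this action, specifies how completeness of the data for a state would be decided.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StateEntry:
    """Q-values for one state plus a Boolean flag (hypothetical layout)."""
    q_values: Dict[str, float] = field(default_factory=dict)
    visits: int = 0
    complete: bool = False  # flag: True once enough data has been gathered


def update(entry: StateEntry, action: str, reward: float, next_max_q: float,
           alpha: float = 0.5, gamma: float = 0.9,
           min_visits: int = 3) -> StateEntry:
    """Apply one standard tabular Q-learning update and refresh the flag."""
    old = entry.q_values.get(action, 0.0)
    entry.q_values[action] = old + alpha * (reward + gamma * next_max_q - old)
    entry.visits += 1
    # Assumption: a state counts as complete after min_visits updates.
    entry.complete = entry.visits >= min_visits
    return entry


entry = StateEntry()
update(entry, "power_10dBm", reward=1.0, next_max_q=0.0)
flag_after_one_update = entry.complete
```

Any other completeness criterion, such as convergence of the Q-values, could be substituted for the visit count without changing the structure of the sketch.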

	As per claim 12, the combination of Nie, Uky, Hu and Pollington as shown above teaches the apparatus of claim 11, Nie further teaches:
wherein the decision-making data structure comprises data regarding the plurality of states, and data regarding one action for each of the plurality of states. (Nie, Page 3, Col. 2., Team-Q Learning Based Power Control section discloses “In team-Q learning, the agents, states, actions and reward function are defined as follows: Agent: The agents are D2D transmitters. A D2D transmitter is denoted by D2D i, 1 ≤ i ≤ n, State: The state of D2D user i on RB k at time t is defined as… Action: The action of each agent consists of a set of transmitting power levels” And Page 4, Col. 1, Distributed Q-Learning section discloses “Distributed Q-learning can decompose the large Q-value table in team-Q to multiple small ones… the dimension of Q-table in distributed-Q turns to be n × |S| × |A|” (Q-Table consists of actions for the plurality of states within the table. The action of an agent consists of a single action))
	
As per claim 13, the combination of Nie, Uky, Hu and Pollington as shown above teaches the apparatus of claim 12, Nie further teaches:
wherein the machine learning data structure further comprises a rating value associated with each of the plurality of possible actions for each of the plurality of states, (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use ε-greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy))
wherein the one action for each of the plurality of states in the decision-making data structure corresponds to an action with a highest rating value among the plurality of possible actions for each of the plurality of states in the machine learning data structure. (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy. Additionally Algorithm 1 discloses the Q-table having a maximum Q-value))
and wherein the rating value is determined based on a reward value determined for each of the plurality of states (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use ε-greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy))

As per claim 14, the combination of Nie, Uky, Hu and Pollington as shown above teaches the apparatus of claim 13, Nie further teaches:
wherein the machine learning data structure further comprises a Q-table regarding Q-value of each of the plurality of possible actions for each of the plurality of states, (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy))
wherein the one action for each of the plurality of states in the decision-making data structure corresponds to an action with a highest Q-value for each of the plurality of states in the Q-table. (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use greedy strategy…” (Q-value being the highest rating. Q-table consists of Q-values where a row in the Q-table has a highest Q-value that is chosen using a greedy strategy))

As per claim 15, the combination of Nie, Uky, Hu and Pollington as shown above teaches the apparatus of claim 14, Nie further teaches:
wherein the decision-making data structure further comprises a look-up table regarding each of the plurality of states and the action with the highest Q-value. (Nie, Page 3, Col. 2, 7th Para. discloses “There are many ways to choose actions based on the current Q-value estimation, in this paper we use greedy strategy…” (The Q-Table is a lookup table that contains the highest Q-value))
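For illustration only, the sequence recited for the second electronic device in claim 11 (obtain state information from a sensor, select the action for that state from the received decision-making data structure, perform it, and report the result) can be sketched as follows. This is a minimal sketch under stated assumptions: `act_once`, `read_sensor`, and `perform` are hypothetical names, and the transceiver is reduced to a returned dictionary rather than an actual transmission.

```python
from typing import Callable, Dict


def act_once(lookup_table: Dict[str, str],
             read_sensor: Callable[[], str],
             perform: Callable[[str], float]) -> Dict[str, object]:
    """Run one sense-select-act-report cycle on the second device."""
    state = read_sensor()           # state information from the sensor
    action = lookup_table[state]    # raises KeyError for an unknown state
    result = perform(action)        # perform the selected action
    # The returned record stands in for the result data sent back
    # to the first electronic device.
    return {"state": state, "action": action, "result": result}


# Example values are invented for illustration.
table = {"idle": "sleep", "busy": "boost"}
report = act_once(table,
                  lambda: "busy",
                  lambda action: 1.0 if action == "boost" else 0.0)
```

The second device in this sketch never consults Q-values; it needs only the one-action-per-state table.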
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Lauer, et al. (An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems) discloses distributed Q-learning where a large Q-table is compressed
Miozzo, et al. (Distributed Q-Learning for Energy Harvesting Heterogeneous Networks) discloses distributed Q-learning within a heterogeneous network
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAMZA RAZZAQ MUGHAL whose telephone number is 571-272-8833. The examiner can normally be reached on M-TR from 7:30 to 5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at telephone number 571-270-3428. The fax 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

/H.R.M./Examiner, Art Unit 2123
/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145