Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Notice to Applicant
Claims 1-10 and 21-24 have been examined in this application. This communication is the first action on the merits. The Information Disclosure Statements (IDS) filed on 1/22/2020, 4/24/2020, and 9/4/2020 have been acknowledged.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-10 and 21-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claims 1-10 and 21-24 are directed to a method for object search.
Claims 1 and 8 each recite a method for object searching. Claim 1 includes: selecting a state from a set of states of a target object-searching scene as a first state; obtaining a target optimal object-searching strategy whose initial state is the first state for searching for a target object; performing strategy learning by taking the target optimal object-searching strategy as a learning target to obtain an object-searching strategy by which the robot searches for the target object in the target object-searching scene, and adding the obtained object-searching strategy into an object-searching strategy pool; determining whether the obtained object-searching strategy is consistent with the target optimal object-searching strategy by comparing the obtained object-searching strategy and the target optimal object-searching strategy; when the obtained object-searching strategy is consistent with the target optimal object-searching strategy, determining that the strategy learning in which the first state is taken as the initial state of the object-searching strategy is completed; and when the obtained object-searching strategy is not consistent with the target optimal object-searching strategy, returning to the step of selecting a state from a set of states of a target object-searching scene.
Claim 8 includes: receiving an object-searching instruction for searching for a target object in a target object-searching scene; obtaining a current state of the robot; determining an action performed by the robot in transitioning from the current state to a next state, according to an object-searching strategy, including the current state, for searching for the target object in an object-searching strategy pool, wherein an object-searching strategy in the object-searching strategy pool is a strategy by which the robot searches for the target object in the target object-searching scene and which is obtained by performing strategy learning by taking an optimal object-searching strategy for searching for the target object as a learning target; and the object-searching strategy includes: states successively experienced by the robot from an initial state of the object-searching strategy to a state that the target object is found, and an action performed by the robot in transitioning from each state to a next state; performing the determined action to realize a state transition, and determining whether the target object is found; when the target object is not found, returning to the step of obtaining a current state of the robot until the target object is found. As drafted, this is, under its broadest reasonable interpretation, within the abstract idea grouping of “Methods of Organizing Human Activity” (business relations). Other than the recitation of a “robot,” nothing in the claim elements precludes the steps from falling within the “Methods of Organizing Human Activity” (business relations) grouping. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. The claims primarily recite the additional element of using computer components to perform each step. The “robot” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using a computer component. See MPEP 2106.05(f). Additionally, claim 1, claim 8 and claim 16 recite using one or more machine learning techniques. The general use of machine learning techniques does not provide a meaningful limitation to transform the abstract idea into a practical application. Therefore, currently, the machine learning is solely used as a tool to perform the instructions of the abstract idea. When the machine learning model has been trained and is simply used to make a decision, then it is just a complex mathematical exercise and is likely not statutory. If the results of the decision are fed back to the model to make it smarter and allow it to make better decisions, then it is likely statutory. As stated in the claims and specification, the machine learning model is simply applied to return a result. Neither the result nor the rules (machine learning model) provide a practical application or significantly more than the identified abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claims also fail to recite any improvements to another technology or technical field, improvements to the functioning of the computer itself, use of a particular machine, effecting a transformation or reduction of a particular article to a different state or thing, and/or an additional element that applies or uses the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception. See 84 Fed. Reg. 55. In particular, there is a lack of improvement to a computer or technical field in contextual analysis.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements, when considered both individually and as an ordered combination, do not amount to significantly more than the abstract idea. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of a “robot” is insufficient to amount to significantly more. (See MPEP 2106.05(f) – Mere Instructions to Apply an Exception – “Thus, for example, claims that amount to nothing more than an instruction to apply the abstract idea using a generic computer do not render an abstract idea eligible.” Alice Corp., 134 S. Ct. at 235). Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
The claim fails to recite any improvements to another technology or technical field, improvements to the functioning of the computer itself, use of a particular machine, effecting a transformation or reduction of a particular article to a different state or thing, adding unconventional steps that confine the claim to a particular useful application, and/or meaningful limitations beyond generally linking the use of an abstract idea to a particular environment. See 84 Fed. Reg. 55. Viewed individually or as a whole, these additional claim element(s) do not provide meaningful limitation(s) to transform the abstract idea into a patent eligible application of the abstract idea such that the claim(s) amounts to significantly more than the abstract idea itself. With regard to receiving data under Step 2B, see MPEP 2106.05(d): receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information).
Examiner concludes that the additional elements in combination fail to amount to significantly more than the abstract idea based on findings that each element merely performs the same function(s) in combination as each element performs separately. The claim is not patent eligible. Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually.
Dependent Claims 2-7, 9-10, and 21-24 recite the additional elements determining a reward function in a reinforcement learning algorithm for strategy learning through a target type of object-searching strategy by taking the target optimal object-searching strategy as a learning target, wherein the target type of object-searching strategy is an object- searching strategy for searching for the target object in the target object-searching pool; and performing the strategy learning based on the reward function, to obtain an object- searching strategy that maximizes an output value of a value function in the reinforcement learning algorithm as an object-searching strategy by which the robot searches for the target object in the target object-searching scene; determining a reward function R that maximizes a value as the reward function in the reinforcement learning algorithm for strategy learning; obtaining, through learning, object-searching strategies whose initial states are the first state and whose end states are the second state in a preset state transition manner; calculating, according to the following expression, an output value of the value function of the reinforcement learning algorithm in each of the obtained object-searching strategies; determining, according to probabilities of transitioning from a pre-transition state to other states pre-obtained in statistics, a post-transition state and an action performed by the robot in transitioning from the pre-transition state to the post-transition state, wherein the action belongs to a set of actions of the target object-searching scene, and the set of actions is performed by the robot in performing state transitions in the target object-searching scene; determining, according to probabilities of transitioning from a pre-transition state to other states pre-obtained in statistics, a post-transition state and an action performed by the robot in transitioning from the pre-transition state to the post-transition 
state, wherein the action belongs to a set of actions of the target object-searching scene, and the set of actions is performed by the robot in performing state transitions in the target object-searching scene; collecting an information sequence of the target object-searching scene, wherein the information sequence is composed of information elements comprising video frames and/or audio frames; determining whether the number of information elements that have not been selected in the information sequence is greater than a preset number; when the number of information elements that have not been selected in the information sequence is greater than the preset number, selecting the preset number of information elements from the information elements that have not been selected in the information sequence to generate one state of the robot in the target object-searching scene as a third state; determining whether the third state exists in the set of states; when the third state does not exist in the set of states, adding the third state into the set of states, and returning to the step of determining whether the number of information elements that have not been selected in the information sequence is greater than a preset number; and when the third state exists in the set of states, directly returning to the step of determining whether the number of information elements that have not been selected in the information sequence is greater than a preset number; collecting an information sequence of the target object-searching scene, wherein the information sequence is composed of information elements comprising video frames and/or audio frames; selecting a preset number of information elements from the information sequence; determining whether a state matching the selected information elements exists in a pre- obtained set of states of the target object-searching scene, wherein the set of states is a set of states of the robot in the target object-searching scene; when a 
state matching the selected information elements exists in the pre-obtained set of states, determining the state matching the selected information elements in the set of states as the current state of the robot. These limitations further narrow the abstract idea. The limitations recited in the dependent claims are mere instructions for applying the abstract idea on a computerized system, such that they do not amount to significantly more than the above-identified judicial exceptions in Claims 1 and 8. Furthermore, claims 21-24 recite the additional elements of a processor, memory, computer program, and computer readable storage medium; see MPEP 2106.05(d): receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information). Claims 2-4 and 9 recite using one or more machine learning techniques. The specification discloses the machine learning at a high level of generality, providing examples of different techniques that may be applied. The general use of machine learning does not provide a meaningful limitation to transform the abstract idea into a practical application. Therefore, currently, the machine learning is solely used as a tool to perform the instructions of the abstract idea.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5-8, 10 and 21-24 are rejected under 35 U.S.C. 103 as being unpatentable over Pecka et al., Safe Exploration Techniques for Reinforcement Learning – An Overview. In: Hodicky, J. (eds) Modelling and Simulation for Autonomous Systems. MESAS 2014. [hereinafter Pecka], in view of Lu et al., Efficient deep network for vision-based object detection in robotic applications, Neurocomputing, Volume 245, July 5, 2017 [hereinafter Lu].
Regarding Claim 1, 
Pecka teaches
A machine learning method, which is applied to a robot, comprising: selecting a state from a set of states of a target object-searching scene as a first state, wherein the set of states is a set of states of the robot in the target object-searching scene (Pecka Pg. 358-“Reinforcement learning proved to be extremely useful in the case of state-space exploration – the long-term reward corresponds to the value of each state [17]. From such values, we can compose a policy which tells the agent to always take the action leading to the state with the highest value. As an addition, state values are easily interpretable for humans. Since the early years, a lot of advanced methods were devised in the area of reinforcement learning. To name one, Q-learning [25] is often used in connection with safe exploration. Instead of computing the values of states, it computes the values of state–action pairs, which has some simplifying consequences. …What do all of these methods have in common, is the need for rather large training data sets. For simulated environments it is usually not a problem. But with real robotic hardware, the collection of training samples is not only lengthy, but also dangerous (be it mechanical wear or other effects). Another common feature of RL algorithms is the need to enter unknown states, which is inherently unsafe.”; Pg. 363, Algorithm 3; Pg. 362, Sec. 2.4- first state);
obtaining a target optimal object-searching strategy whose initial state is the first state for searching for a target object, wherein the object-searching strategy includes: states successively experienced by the robot from the initial state of the object-searching strategy to a state that the target object is found, and an action performed by the robot in transitioning from each state to a next state (Pecka Pg. 363, Algorithm 3; Pg. 362, Section 2.4 Policy Iteration-“Policy iteration is a completely different approach to computing the optimal policy. Instead of deriving the policy from the Value or Q function, Policy iteration works directly with policies. In the first step, a random policy is chosen. Then a loop consisting of policy evaluation and policy improvement repeats as long as the policy can be improved [17] (refer to Algorithm 3 for details). Since in every step the policy gets better, and there is a finite number of different policies, it is apparent that the algorithm converges [23]. Policy iteration can be initialized by a known, but suboptimal policy. Such policy can be obtained e.g. by a human operator driving the UGV. If the initial policy is good, Policy iteration has to search much smaller subspace and thus should converge more quickly than with a random initial policy [11].”);
performing strategy learning by taking the target optimal object-searching strategy as a learning target to obtain an object-searching strategy by which the robot searches for the target object in the target object-searching scene, …wherein the obtained object-searching strategy is an object-searching strategy whose initial state is the first state and whose end state is a second state, wherein the second state is a state of the robot corresponding to a position of the target object in the target object-searching scene (Pecka Pg. 363, Algorithm 3; Section 2.4 Policy Iteration- discloses that policy learning is performed, which is regarded as strategy learning. π’ is regarded as the optimal object-searching strategy used to obtain π, which is regarded as an object-searching strategy. It is implicit that the searching strategy has the initial state as a first state and an end state, which is regarded as the second state, and that this second state corresponds to a position of the target object in the target object-searching scene.);
determining whether the obtained object-searching strategy is consistent with the target optimal object-searching strategy by comparing the obtained object-searching strategy and the target optimal object-searching strategy (Pecka Pg. 363, Algorithm 3 - discloses the loop of the algorithm. The criterion of the loop is to check whether π=π’, and this criterion is regarded as the comparison to check whether the obtained object-searching strategy is consistent with the target optimal object-searching strategy.);
when the obtained object-searching strategy is consistent with the target optimal object-searching strategy, determining that the strategy learning in which the first state is taken as the initial state of the object-searching strategy is completed (Pecka Pg. 363, Algorithm 3, discloses the loop of the algorithm. The loop ends when consistency is reached, which is regarded as determining that the strategy learning in which the first state is taken as the initial state of the object-searching strategy is completed.);
and when the obtained object-searching strategy is not consistent with the target optimal object-searching strategy, returning to the step of selecting a state from a set of states of a target object-searching scene (Pecka Pg. 363, Algorithm 3, discloses the loop of the algorithm. The loop repeats when consistency is not reached, which is regarded as returning to the step of selecting a state from a set of states of a target object-searching scene.).
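For illustration only, the evaluate/improve loop and the π=π’ stopping test relied upon above can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Pecka or from the application: the five-state corridor, the reward of 1 on reaching the goal state, the discount factor, and all function names are hypothetical.

```python
# Illustrative toy problem: a 1-D corridor with the target object at the last state.
# All constants and names below are assumptions made for this sketch.
N_STATES = 5            # states 0..4
ACTIONS = (-1, +1)      # move left / move right
GAMMA = 0.9             # discount factor (assumed)
GOAL = N_STATES - 1     # state in which the target object is found


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    if state == GOAL:
        return state, 0.0                       # the goal state is absorbing
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def evaluate(policy, sweeps=200):
    """Policy evaluation: repeated in-place value backups for a fixed policy."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES):
            nxt, reward = step(s, policy[s])
            values[s] = reward + GAMMA * values[nxt]
    return values


def action_value(state, action, values):
    """One-step lookahead value of taking `action` in `state`."""
    nxt, reward = step(state, action)
    return reward + GAMMA * values[nxt]


def improve(values):
    """Policy improvement: act greedily with respect to the current values."""
    return [max(ACTIONS, key=lambda a: action_value(s, a, values))
            for s in range(N_STATES)]


def policy_iteration():
    policy = [-1] * N_STATES                    # arbitrary initial policy
    iterations = 0
    while True:
        new_policy = improve(evaluate(policy))
        iterations += 1
        if new_policy == policy:                # the pi == pi' consistency check
            return policy, iterations
        policy = new_policy


final_policy, n_iterations = policy_iteration()
```

In this sketch the loop ends only when an improvement step leaves the policy unchanged, which is the comparison the mapping above treats as the consistency determination.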

Pecka teaches object search strategy and the feature is expounded upon by the teaching of Lu:

and adding the obtained object-searching strategy into an object-searching strategy pool, (Lu Introduction-“Another class of approaches for object detection is based on deep learning, and this approach has achieved tremendous success in recent years. Among them, convolutional neural networks (CNNs) [15] are a subtype of deep artificial neural networks which are inspired by the animal visual cortex organization [16]. CNNs have achieved state-of-the-art performance using local receptive fields, weight sharing and spatial pooling [17]. With the development of CNN-based models, they are increasingly prevalent in robotic applications. Alex et al. proposed AlexNet [15], that consists of several convolutional layers, max-pooling layers and fully-connected layers. This method achieved best performance in the ImageNet ILSVRC-2012 competition and shows significant gains over state-of-the-art hand-craft methods [10], [12], [14]. Thereafter, Google and Baidu have improved their image search engines according to this deep learning architecture which increases the searching accuracy markedly [16]. Nevertheless, deeper convolutional neural networks are often more difficult to train [18], and this causes a bottleneck in the development of CNNs. To overcome this limitation, He et al. introduced the deep residual network [18] that enables deeper CNN training through residual learning. With 152 layers, this deep residual network [18] won first prize in several tasks in the ImageNet ILSVRC 2015 and COCO 2015 competitions.; Sec 3-“In order to control the model complexity, we first construct a basic CNN framework, which will be completed by MPGA. As shown in Fig. 11, the basic CNN framework consists of an input layer, 5 convolution layers, 3 pooling layers and an output layer (softmax). The input layer is the RGB color image patch of 32 × 32 pixels, which is resized from object proposals.”)

Pecka and Lu are both directed to the use of deep learning in robotics. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the analysis of Pecka, as taught by Lu, with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Pecka in order to improve object detection performance in the context of robotic applications (Lu Introduction).

Regarding Claim 2,
The method of claim 1, wherein performing strategy learning by taking the target optimal object-searching strategy as a learning target to obtain an object-searching strategy by which the robot searches for the target object in the target object-searching scene comprises: determining a reward function in a reinforcement learning algorithm for strategy learning through a target type of object-searching strategy by taking the target optimal object-searching strategy as a learning target, wherein the target type of object-searching strategy is an object-searching strategy for searching for the target object in the target object-searching pool (Pecka Pg. 360-“Once the model is set up, everything is ready for utilizing an MDP. “The agent’s job is to find a policy π mapping states to actions, that maximizes some long-run measure of reinforcement” [17]. The “long-run” may have different meanings, but there are two favorite optimality models: the first one is the finite horizon model, where the term J = Σ_{t=0}^{h} r_t is maximized (h is a predefined time horizon and r_t is the reward obtained in time instant t while executing policy π). The dependency of r_t on the policy is no longer obvious from this notation, but this is the convention used in literature when it is clear which policy is used. This model represents the behavior of the robot which only depends on a predefined number of future states and actions.”; Pg. 363, Algorithm 3; Pg. 362, Section 2.4 Policy iteration- determining a reward function in a reinforcement learning algorithm for strategy learning through a target type of object learning strategy.);
and performing the strategy learning based on the reward function, to obtain an object-searching strategy that maximizes an output value of a value function in the reinforcement learning algorithm as an object-searching strategy by which the robot searches for the target object in the target object-searching scene (Pecka Pg. 363-364, Algorithm 3; Pg. 362, Section 2.4 Policy iteration- determining a reward function in a reinforcement learning algorithm for strategy learning through a target type of object learning strategy.).
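For illustration only, the finite horizon model quoted from Pecka, in which the sum of the rewards r_0 through r_h is maximized, can be sketched as follows. The reward sequences and strategy names are hypothetical values chosen for this sketch and do not come from Pecka or the application.

```python
def finite_horizon_return(rewards, horizon):
    """J = r_0 + r_1 + ... + r_h, using the rewards at time instants 0..horizon."""
    return sum(rewards[: horizon + 1])


# Hypothetical per-step rewards obtained while executing two candidate strategies.
candidate_rewards = {
    "strategy_a": [0.0, 0.0, 1.0, 0.0],
    "strategy_b": [0.0, 1.0, 0.0, 0.0],
}

# Select the candidate that maximizes J for an assumed horizon of h = 1.
best = max(
    candidate_rewards,
    key=lambda name: finite_horizon_return(candidate_rewards[name], horizon=1),
)
```

With a horizon of 1 only the first two rewards count, so a strategy that is rewarded at t = 2 scores lower than one rewarded at t = 1.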
Regarding Claim 5,
The method of claim 1, wherein the next state of each state in the object-searching strategy and an action performed by the robot in transitioning from each state to the next state are determined by: determining, according to probabilities of transitioning from a pre-transition state to other states pre-obtained in statistics, a post-transition state and an action performed by the robot in transitioning from the pre-transition state to the post-transition state, wherein the action belongs to a set of actions of the target object-searching scene, and the set of actions is performed by the robot in performing state transitions in the target object-searching scene (Pecka Pg. 363, Algorithm 3; Pg. 359-360, Section 2.1-“Markov Decision Processes (MDPs) are the standard model for deliberating about reinforcement learning problems. They provide a lot of simplifications, but are sufficiently robust to describe a large set of real-world problems.” [using probabilities of transitioning from one state to another state, an action is used for the transitioning.]).
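For illustration only, one reading of determining a post-transition state and action from statistically pre-obtained transition probabilities can be sketched as follows. This is a sketch under stated assumptions: the observation triples, the state and action names, and the choice of the most probable transition are hypothetical and are not taken from Pecka or the application.

```python
from collections import Counter


def transition_probabilities(observations):
    """Estimate P(action, next_state | state) from (state, action, next_state) triples."""
    counts = {}
    for state, action, nxt in observations:
        counts.setdefault(state, Counter())[(action, nxt)] += 1
    return {
        state: {pair: n / sum(counter.values()) for pair, n in counter.items()}
        for state, counter in counts.items()
    }


def most_likely_transition(probabilities, state):
    """Return the (action, next_state) pair with the highest estimated probability."""
    pairs = probabilities[state]
    return max(pairs, key=pairs.get)


# Hypothetical statistics gathered in an object-searching scene.
observations = [
    ("hall", "forward", "kitchen"),
    ("hall", "forward", "kitchen"),
    ("hall", "left", "bedroom"),
]
probabilities = transition_probabilities(observations)
action, next_state = most_likely_transition(probabilities, "hall")
```

A sampling rule (drawing a transition in proportion to its probability) would be an equally plausible reading; the most-probable rule is used here only to keep the sketch deterministic.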
Regarding Claim 6 and Claim 10, Pecka in view of Lu teach The method of claim 5,… and The method of claim 8,…
Pecka fails to teach the following feature taught by Lu:
wherein the states in the set of states are obtained by: collecting an information sequence of the target object-searching scene, wherein the information sequence is composed of information elements comprising video frames and/or audio frames (Lu Pg. 32-“For object detection tasks, Girshick et al. proposed a R-CNN [19] framework that first generates a set of proposal bounding boxes that are likely to contain objects, using region proposal methods such as Selective Search [20] and Edge Boxes [21].”; Pg. 39-“The camera mounted on a robot can provide image sequence which contains rich temporal information. The goal of multi-frame fusion is to utilize this temporal information to improve performance in object detection. As we know, the probability of a nonobject region giving a false positive in several consecutive frames is much lower than that in a single frame. Similarly, if an image region is detected to be positive in several consecutive frames, then there is a high probability that the image region is an object region. In other words, multi-frame fusion can effectively improve the true positive rate and yet reduce the false positive rate. Moreover, detection in different frames can be smoothed by multi-frame fusion.”);
determining whether the number of information elements that have not been selected in the information sequence is greater than a preset number (Lu Pg. 41-“ In Table 1, Recall@2000 denotes the average recall using proposals at the number of 2000. N@90% represents the average number of proposals needed to achieve 90% recall per frame. All the recall rates in Table 1 are computed using the IoU threshold of 0.7”);
 when the number of information elements that have not been selected in the information sequence is greater than the preset number, selecting the preset number of information elements from the information elements that have not been selected in the information sequence to generate one state of the robot in the target object-searching scene as a third state (Lu Pg. 33-“ Proposal layer: This structure is proposed to efficiently generate potential object bounding-boxes, which could efficiently avoid exhaustive object searching across an image and yet improve object detection performance by reducing the complexity of the classification task for the detection layer. The proposal layer consists of multiple hyperplanes optimization, data-driven kernel size selection, multi-scale feature extraction and multi-kernel convolution. As multiple hyperplanes and data-driven kernel size selection do not need to run on-line, the proposal layer is able to work efficiently and effectively. (2) Detection layer: This component runs on the proposal bounding-boxes generated by the proposal layer. We first propose a multiple population genetic algorithm-based convolutional neural network (MPGA-based CNN) module that is able to determine the number of feature maps at each layer and makes a trade-off between performance and computational complexity.”); 
determining whether the third state exists in the set of states (Lu-“Then, a TLD-based multi-frame fusion strategy is proposed to utilize temporal information and improve the detection performance. Furthermore, we perform several experiments to validate each component of our proposed object detection approach and compare the results with some recently published state-of-the-art object detection algorithms on widely used datasets.”);
when the third state does not exist in the set of states, adding the third state into the set of states, and returning to the step of determining whether the number of information elements that have not been selected in the information sequence is greater than a preset number; and when the third state exists in the set of states, directly returning to the step of determining whether the number of information elements that have not been selected in the information sequence is greater than a preset number. (Lu – Fig. 11; Pg. 38-“detection layer, which works on the potential bounding-boxes generated by the proposal layer. In the detection layer, we first propose an MPGA-based convolutional neural network (CNN) module to balance the performance and computational complexity of deep convolutional neural networks. Then, we design a multi-frame fusion strategy to utilize temporal information and improve detection performance.”)
Pecka and Lu are both directed to the use of deep learning in robotics. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the analysis of Pecka, as taught by Lu, with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Pecka in order to improve object detection performance in the context of robotic applications (Lu Introduction).
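For illustration only, the state-set construction recited in claims 6 and 10 (checking the number of unselected information elements against a preset number, forming a candidate third state, and adding it only if absent) can be sketched as follows. This is a sketch of the claim language as read here, not an implementation from the application or from Lu; the function name, the use of a tuple of elements as a "state", and the frame labels are hypothetical.

```python
def build_state_set(information_sequence, preset_number):
    """Collect distinct states from consecutive groups of information elements."""
    states = []
    position = 0  # index of the first element that has not been selected yet

    # Continue only while MORE than `preset_number` elements remain unselected,
    # following the "greater than a preset number" wording of the claim.
    while len(information_sequence) - position > preset_number:
        candidate = tuple(information_sequence[position:position + preset_number])
        position += preset_number
        if candidate not in states:   # the "third state" is not yet in the set
            states.append(candidate)
        # otherwise return directly to the remaining-elements check
    return states


# Hypothetical frame identifiers standing in for video and/or audio frames.
frames = ["f1", "f2", "f1", "f2", "f3", "f4", "f5"]
state_set = build_state_set(frames, preset_number=2)
```

The same loop shape applies to the action-set construction of claim 7 if action elements are substituted for information elements.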

Regarding Claim 7, Pecka in view of Lu teach The method of claim 6,…
wherein the actions in the set of actions are obtained by: obtaining an action sequence corresponding to the information sequence, wherein the action sequence is composed of action elements, and the action elements in the action sequence correspond to the information elements in the information sequence one to one; determining whether the number of action elements that have not been selected in the action sequence is greater than the preset number (Lu Pg. 31-“ Object detection plays a critical role in a wide range of robotic applications such as service robot interaction [1], autonomous driving [2], and collision avoidance [3], which need to detect the presence of both stationary and moving objects in a specific area of interest around the host robots in order to perform corresponding actions such as interaction, braking and evading.”); 
when the number of action elements that have not been selected in the action sequence is greater than the preset number, selecting the preset number of action elements from the action elements that have not been selected in the action sequence, to generate one action of the robot in the target object-searching scene as a first action; determining whether the first action exists in the set of actions (Lu Pg. 36-“The corresponding convolution kernel sizes can be obtained by using Eq. (9). Note that in Fig. 8, the sizes of some resulting bounding-boxes are larger than 150 × 150. In this situation, the dimension of the corresponding feature vector will be larger than (150/4)² × 4 = 5625, which is a relatively high dimension for the classification problem in this work where features are simple and classifiers are liner. Consequently, it may increase the complexity in classification or even cause the problem of over-fitting, which is so-called curse of dimensionality [28]. In order to avoid these problems, we downsample the original image to t sizes and assign each resulting bounding-box to a certain resized image (see Fig. 2).”); 
when the first action does not exist in the set of actions, adding the first action into the set of actions, and returning to the step of determining whether the number of action elements that have not been selected in the action sequence is greater than the preset number (Lu – Fig. 11; Pg. 38-“ detection layer, which works on the potential bounding-boxes generated by the proposal layer. In the detection layer, we first propose an MPGA-based convolutional neural network (CNN) module to balance the performance and computational complexity of deep convolutional neural networks. Then, we design a multi-frame fusion strategy to utilize temporal information and improve detection performance.”); 
and when the first action exists in the set of actions, directly returning to perform the step of determining whether the number of action elements that have not been selected in the action sequence is greater than the preset number (Lu Pg. 35-36-“The element Es(x, y) located at (x, y) of the importance map represents the importance of selecting the bounding-box of width x and height y. Supposing that q types of bounding-box need to be selected, we design q iterative procedures to address the problem. At each iteration, a bounding-box of size xm × ym is selected at the position (xm, ym), which has the maximum importance value in the importance map Es (see Eq. (13)).”).
Pecka and Lu are both directed to robotics applications of deep learning. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the analysis of Pecka, as taught by Lu, with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Pecka in order to improve object detection performance in the context of robotic applications (Lu, Introduction).
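For illustration only, the loop recited in Claim 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Pecka, or Lu: the name build_action_set is hypothetical, and the claim does not say how the preset number of elements is chosen, so the sketch assumes they are taken consecutively in sequence order.

```python
from typing import Hashable, List, Sequence, Tuple


def build_action_set(
    action_sequence: Sequence[Hashable], preset_number: int
) -> List[Tuple[Hashable, ...]]:
    """Hypothetical sketch of the Claim 7 loop (consecutive selection is an assumption)."""
    if preset_number <= 0:
        raise ValueError("preset_number must be positive")

    actions: List[Tuple[Hashable, ...]] = []  # the set of actions, kept in insertion order
    seen = set()                              # fast membership test for the same set
    cursor = 0                                # index of the first not-yet-selected element

    # Continue while the number of unselected elements is GREATER than the preset number.
    while len(action_sequence) - cursor > preset_number:
        # Select the preset number of unselected elements to generate one action.
        first_action = tuple(action_sequence[cursor:cursor + preset_number])
        cursor += preset_number
        # Add the action only when it does not already exist in the set.
        if first_action not in seen:
            seen.add(first_action)
            actions.append(first_action)
        # In either case, return to the counting step at the top of the loop.
    return actions


example = build_action_set(
    ["fwd", "left", "fwd", "left", "fwd", "right", "stop"], 2
)
print(example)
```

Because the recited condition is "greater than" the preset number, the sketch stops as soon as the unselected remainder is equal to or smaller than the preset number, so a trailing remainder is never turned into an action.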
Regarding Claim 8, 
Pecka teaches
An object-searching method, which is applied to a robot, comprising: receiving an object-searching instruction for searching for a target object in a target object-searching scene; (Pecka Pg. 362 Section 2.4-“Policy iteration is a completely different approach to computing the optimal policy. Instead of deriving the policy from the Value or Q function, Policy iteration works directly with policies. In the first step, a random policy is chosen. Then a loop consisting of policy evaluation and policy improvement repeats as long as the policy can be improved [17] (refer to Algorithm 3 for details). Since in every step the policy gets better, and there is a finite number of different policies, it is apparent that the algorithm converges [23]. Policy iteration can be initialized by a known, but suboptimal policy. Such policy can be obtained e.g. by a human operator driving the UGV. If the initial policy is good, Policy iteration has to search much smaller subspace and thus should converge more quickly than with a random initial policy [11].”); 
obtaining a current state of the robot; (Pecka Pg. 358-“Reinforcement learning proved to be extremely useful in the case of state-space exploration – the long-term reward corresponds to the value of each state [17]. From such values, we can compose a policy which tells the agent to always take the action leading to the state with the highest value. As an addition, state values are easily interpretable for humans. Since the early years, a lot of advanced methods were devised in the area of reinforcement learning. To name one, Q-learning [25] is often used in connection with safe exploration. Instead of computing the values of states, it computes the values of state–action pairs, which has some simplifying consequences. …What do all of these methods have in common, is the need for rather large training data sets. For simulated environments it is usually not a problem. But with real robotic hardware, the collection of training samples is not only lengthy, but also dangerous (be it mechanical wear or other effects). Another common feature of RL algorithms is the need to enter unknown states, which is inherently unsafe.”; Pg. 363, Algorithm 3; Pg. 362, Sec. 2.4 - first state); 
determining an action performed by the robot in transitioning from the current state to a next state, …wherein an object-searching strategy in the object-searching strategy pool is a strategy by which the robot searches for the target object in the target object-searching scene; (Pecka Pg. 363, Algorithm 3; Pg. 362, Section 2.4 Policy Iteration-“Policy iteration is a completely different approach to computing the optimal policy. Instead of deriving the policy from the Value or Q function, Policy iteration works directly with policies. In the first step, a random policy is chosen. Then a loop consisting of policy evaluation and policy improvement repeats as long as the policy can be improved [17] (refer to Algorithm 3 for details). Since in every step the policy gets better, and there is a finite number of different policies, it is apparent that the algorithm converges [23]. Policy iteration can be initialized by a known, but suboptimal policy. Such policy can be obtained e.g. by a human operator driving the UGV. If the initial policy is good, Policy iteration has to search much smaller subspace and thus should converge more quickly than with a random initial policy [11].” Section 2.4, Policy Iteration, discloses that policy learning is performed, which is regarded as strategy learning: π’ is regarded as the optimal object-searching strategy, used to obtain π, which is regarded as an object-searching strategy. It is implicit that the searching strategy has an initial state, regarded as the first state, and an end state, regarded as the second state, and that this second state corresponds to a position of the target object in the target object-searching scene.); 
performing the determined action to realize a state transition, and determining whether the target object is found; (Pecka Pg. 363, Algorithm 3, discloses the loop of the algorithm. The loop ends when consistency is reached, which is regarded as determining that the strategy learning in which the first state is taken as the initial state of the object-searching strategy is completed.); 
when the target object is not found, returning to the step of obtaining a current state of the robot until the target object is found. (Pecka Pg. 363, Algorithm 3, discloses the loop of the algorithm. The loop repeats when consistency is not reached, which is regarded as returning to the step of selecting a state from a set of states of a target object-searching scene.). 
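For reference, the policy iteration loop described in the quoted passage of Pecka (evaluate the policy, improve it, and stop once it can no longer be improved) can be sketched in Python as below. This is a textbook-style sketch, not a reproduction of Pecka's Algorithm 3 or of the claimed method; the three-state corridor, the reward values, the discount factor and all identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

# (state, action) -> (next_state, reward); a deterministic toy model (assumption).
Transitions = Dict[Tuple[str, str], Tuple[str, float]]


def policy_iteration(
    states: List[str], actions: List[str], transitions: Transitions,
    gamma: float = 0.9, tol: float = 1e-8,
) -> Tuple[Dict[str, str], Dict[str, float]]:
    def q_value(values: Dict[str, float], s: str, a: str) -> float:
        next_state, reward = transitions[(s, a)]
        return reward + gamma * values[next_state]

    policy = {s: actions[0] for s in states}  # arbitrary initial policy
    while True:
        # Policy evaluation: sweep until the value estimates stop changing.
        values = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                new_value = q_value(values, s, policy[s])
                delta = max(delta, abs(new_value - values[s]))
                values[s] = new_value
            if delta < tol:
                break
        # Policy improvement: switch only on a strict improvement.
        stable = True
        for s in states:
            best = max(actions, key=lambda a: q_value(values, s, a))
            if q_value(values, s, best) > q_value(values, s, policy[s]) + 1e-9:
                policy[s] = best
                stable = False
        if stable:  # the policy can no longer be improved, so the loop ends
            return policy, values


corridor: Transitions = {
    ("s0", "left"): ("s0", 0.0), ("s0", "right"): ("s1", 0.0),
    ("s1", "left"): ("s0", 0.0), ("s1", "right"): ("goal", 1.0),
    ("goal", "left"): ("goal", 0.0), ("goal", "right"): ("goal", 0.0),
}
policy, values = policy_iteration(["s0", "s1", "goal"], ["left", "right"], corridor)
print(policy, values)
```

The termination test on the unchanged policy is the part of the loop that the mapping above treats as the consistency check.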

Pecka teaches object search strategy and the feature is expounded upon by the teaching of Lu:

according to an object-searching strategy, including the current state, for searching for the target object in an object-searching strategy pool, (Lu Introduction-“Another class of approaches for object detection is based on deep learning, and this approach has achieved tremendous success in recent years. Among them, convolutional neural networks (CNNs) [15] are a subtype of deep artificial neural networks which are inspired by the animal visual cortex organization [16]. CNNs have achieved state-of-the-art performance using local receptive fields, weight sharing and spatial pooling [17]. With the development of CNN-based models, they are increasingly prevalent in robotic applications. Alex et al. proposed AlexNet [15], that consists of several convolutional layers, max-pooling layers and fully-connected layers. This method achieved best performance in the ImageNet ILSVRC-2012 competition and shows significant gains over state-of-the-art hand-craft methods [10], [12], [14]. Thereafter, Google and Baidu have improved their image search engines according to this deep learning architecture which increases the searching accuracy markedly [16]. Nevertheless, deeper convolutional neural networks are often more difficult to train [18], and this causes a bottleneck in the development of CNNs. To overcome this limitation, He et al. introduced the deep residual network [18] that enables deeper CNN training through residual learning. With 152 layers, this deep residual network [18] won first prize in several tasks in the ImageNet ILSVRC 2015 and COCO 2015 competitions.”; Sec. 3-“In order to control the model complexity, we first construct a basic CNN framework, which will be completed by MPGA. As shown in Fig. 11, the basic CNN framework consists of an input layer, 5 convolution layers, 3 pooling layers and an output layer (softmax). The input layer is the RGB color image patch of 32 × 32 pixels, which is resized from object proposals.”)

Pecka and Lu are both directed to robotics applications of deep learning. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the analysis of Pecka, as taught by Lu, with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Pecka in order to improve object detection performance in the context of robotic applications (Lu, Introduction).
Regarding Claim 21 and Claim 22, Pecka in view of Lu teaches the method of claim 1 and claim 8…
A robot, comprising a processor and a memory, wherein the memory stores a computer program; and the processor, when executing the program stored on the memory, performs …(Pecka Pg. 358-“What do all of these methods have in common, is the need for rather large training data sets. For simulated environments it is usually not a problem. But with real robotic hardware, the collection of training samples is not only lengthy, but also dangerous (be it mechanical wear or other effects). Another common feature of RL algorithms is the need to enter unknown states, which is inherently unsafe.”; Pg. 369-“To better illustrate some of the practical details, we use the UGV (Unmanned Ground Vehicle) robotic platform from EU FP7 project NIFTi [6] (see Figure 1) as a reference agent. It may happen that in these practical details we assume some advantages of UGVs over UAVs (Unmanned Aerial Vehicles), like the ability to stand still without much effort, but it is mostly easy to convert these assumptions to UAVs,”; Pg. 370-“Since the labels error/non-error are only for final states, the risk function here is extended by a so called Case-based memory, which is in short a constant-sized memory for storing the historical (s, a, V(s)) samples and is able to find nearest neighbors for a given query (using e.g. the Euclidean distance).”)
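As a reading aid, the steps recited in Claim 8 can be sketched as a simple control loop in Python. This is only a sketch under stated assumptions: the strategy pool is modeled as a list of state-to-action dictionaries, and the names search_for_object, step and is_found, the max_steps safety limit and the room layout are all hypothetical and do not come from the application or the cited references.

```python
from typing import Callable, Dict, List

Strategy = Dict[str, str]  # maps a state to the action taken in that state (assumption)


def search_for_object(
    start_state: str,
    strategy_pool: List[Strategy],
    step: Callable[[str, str], str],
    is_found: Callable[[str], bool],
    max_steps: int = 50,
) -> List[str]:
    state = start_state  # obtaining a current state of the robot
    visited = [state]
    for _ in range(max_steps):
        # Determine the action from a pooled strategy that includes the current state.
        strategy = next((s for s in strategy_pool if state in s), None)
        if strategy is None:
            raise LookupError(f"no strategy in the pool includes state {state!r}")
        # Perform the determined action to realize a state transition.
        state = step(state, strategy[state])
        visited.append(state)
        # Determine whether the target object is found; otherwise loop again.
        if is_found(state):
            return visited
    raise RuntimeError("target object not found within max_steps")


# Hypothetical scene: the target object is on the table.
moves = {("hall", "go_kitchen"): "kitchen", ("kitchen", "go_table"): "table"}
pool = [{"hall": "go_kitchen"}, {"kitchen": "go_table"}]
route = search_for_object(
    "hall", pool, lambda s, a: moves[(s, a)], lambda s: s == "table"
)
print(route)
```

The step limit is an addition for safety in the sketch; the claim itself recites repeating until the target object is found.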

Regarding Claim 23 and Claim 24, Pecka in view of Lu teaches the method of claim 1 and claim 8…
A non-transitory computer readable storage medium, which is arranged in the robot, wherein a computer program is stored in the computer readable storage medium, and the computer program, when executed by a processor, so as to cause the processor to perform …(Pecka Pg. 358-“What do all of these methods have in common, is the need for rather large training data sets. For simulated environments it is usually not a problem. But with real robotic hardware, the collection of training samples is not only lengthy, but also dangerous (be it mechanical wear or other effects). Another common feature of RL algorithms is the need to enter unknown states, which is inherently unsafe.”; Pg. 369-“To better illustrate some of the practical details, we use the UGV (Unmanned Ground Vehicle) robotic platform from EU FP7 project NIFTi [6] (see Figure 1) as a reference agent. It may happen that in these practical details we assume some advantages of UGVs over UAVs (Unmanned Aerial Vehicles), like the ability to stand still without much effort, but it is mostly easy to convert these assumptions to UAVs,”; Pg. 370-“Since the labels error/non-error are only for final states, the risk function here is extended by a so called Case-based memory, which is in short a constant-sized memory for storing the historical (s, a, V(s)) samples and is able to find nearest neighbors for a given query (using e.g. the Euclidean distance).”)

Reasons Claims are Patentably Distinguishable from the Prior Art
Examiner analyzed Claims 3-4 in view of the prior art of record and finds that not all claim limitations are explicitly taught, nor would one of ordinary skill in the art find it obvious to combine these references with a reasonable expectation of success, as discussed below. 

In regards to Claim 3 (similarly Claim 9), the prior art does not teach or fairly suggest: 
 “… determining a reward function R that maximizes a value of the following expression as the reward function in the reinforcement learning algorithm for strategy learning:”


[Equation image: media_image1.png]
k represents the number of object-searching strategies for searching for the target object included in the object-searching strategy pool, i represents an identifier of each object-searching strategy for searching for the target object in the object-searching strategy pool, πi represents an object-searching strategy for searching for the target object, identified by i, in the object-searching strategy pool, πd represents the target optimal object-searching strategy, S0 represents the first state, Vπ represents an output value of the value function of the reinforcement learning algorithm in the object-searching strategy π, M represents the number of states included in the object-searching strategy π, m represents an identifier of each of the states in the object-searching strategy π, t represents the number of state transitions in the object-searching strategy π, π(Sm) represents an action performed by the robot in transitioning from a state Sm to a next state in the object-searching strategy π, γ is a preset coefficient, 0<γ<1, and maximize() represents a function that returns the maximum value. 


Examiner finds that Pecka et al. (Safe Exploration Techniques for Reinforcement Learning – An Overview. In: Hodicky, J. (eds) Modelling and Simulation for Autonomous Systems. MESAS 2014) discloses different approaches to safety in (semi)autonomous robotics. In particular, Pecka focuses on how to achieve safe behavior of a robot that is requested to explore unknown states. The presented methods are studied from the viewpoint of reinforcement learning, a partially-supervised machine learning method. To collect training data for this algorithm, the robot is required to freely explore the state space, which can lead to possibly dangerous situations. The role of safe exploration is to provide a framework allowing exploration while preserving safety. The examined methods range from simple algorithms to sophisticated methods based on previous experience or state prediction.  
Lu et al. (Efficient deep network for vision-based object detection in robotic applications, Neurocomputing, Volume 245, July 5, 2017) teaches vision-based object detection for robotic applications, specifically disclosing: “Vision-based object detection is essential for a multitude of robotic applications. However, it is also a challenging job due to the diversity of the environments in which such applications are required to operate, and the strict constraints that apply to many robot systems in terms of run-time, power and space. To meet these special requirements of robotic applications, we propose an efficient deep network for vision-based object detection. More specifically, for a given image captured by a robot mount camera, we first introduce a novel proposal layer to efficiently generate potential object bounding-boxes. The proposal layer consists of efficient on-line convolutions and effective off-line optimization. Afterwards, we construct a robust detection layer which contains a multiple population genetic algorithm-based convolutional neural network (MPGA-based CNN) module and a TLD-based multi-frame fusion procedure. Unlike most deep learning based approaches, which rely on GPU, all of the on-line processes in our system are able to run efficiently without GPU support. We perform several experiments to validate each component of our proposed object detection approach and compare the approach with some recently published state-of-the-art object detection algorithms on widely used datasets. The experimental results demonstrate that the proposed network exhibits high efficiency and robustness in object detection tasks.” (see Abstract). In particular, Lu discloses pooling strategic analysis (see Introduction).  

Tai et al. ("A robot exploration strategy based on Q-learning network," 2016 IEEE International Conference on Real-time Computing and Robotics (RCAR), 2016) introduces a reinforcement learning method for exploring a corridor environment with the depth information from an RGB-D sensor only. The robot controller achieves obstacle avoidance ability by pre-training of feature maps using the depth information. The system is based on the recent Deep Q-Network (DQN) framework where a convolution neural network structure was adopted in the Q-value estimation of the Q-learning method.
Zhuang et al. ("Robot path planning in complex environment based on delayed-optimization reinforcement learning," Proceedings. International Conference on Machine Learning and Cybernetics, 2002) discloses delayed-optimization reinforcement learning (DORL), which is proposed and applied to mobile robot control in a complex environment with multiple obstacles. The delayed optimization of the sub-optimal solutions is incorporated into the reinforcement-learning agent, which enables learning from globally optimized control experience. In the experiments, the global optimal control strategy can be learned by DORL. Compared with the traditional reinforcement learning method, the DORL algorithm shows much better learning performance.

However, Pecka, Lu, Tai and Zhuang, individually and in combination, fail to teach determining a reward function R that maximizes a value of the recited expression as the reward function in the reinforcement learning algorithm for strategy learning. Therefore, for at least these reasons, Claim 3 (similarly Claim 9) is patentably distinguishable over the prior art. 
Dependent claim 4 is patentably distinguishable under 35 U.S.C. 102 and 35 U.S.C. 103 because it depends on claim 3, which is determined to be patentably distinguishable.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: US Patent Publication No. 20140121833 A1 to Lee et al. - A method and an apparatus for planning path of robot in correspondence to environment changes in real time, and a recording medium storing the program for performing the said method. The method includes operating the robot according to a first path; generating a second path if an obstacle is discovered around the robot while the robot is being operated according to the first path, and data of the first path exist in a first space within a first distance from a current location of the robot; and operating the robot according to at least the second path.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Chesiree Walton, whose telephone number is (571) 272-5219.  The examiner can normally be reached from Monday to Friday between 8 AM and 5 PM.  If any attempt to reach the examiner by telephone is unsuccessful, the examiner’s supervisor, Patricia Munson, can be reached at (571) 270-5396.  The fax telephone numbers for this group are either (571) 273-8300 or (703) 872-9326 (for official communications including After Final communications labeled “Box AF”).
	Another resource that is available to applicants is the Patent Application Information Retrieval (PAIR) system. Information regarding the status of an application can be obtained from the PAIR system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, please feel free to contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
	Applicants are invited to contact the Office to schedule an in-person interview to discuss and resolve the issues set forth in this Office Action.  Although an interview is not required, the Office believes that an interview can be of use to resolve any issues related to a patent application in an efficient and prompt manner.
Sincerely,
/CHESIREE A WALTON/Examiner, Art Unit 3624