DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Pursuant to communications filed on 11/30/2020, this is a First Action Non-Final Rejection on the Merits. Claims 1-18 are currently pending in the instant application.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/30/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the Examiner.
Priority
Receipt is acknowledged of certified copies of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.
Examiner's Note
Examiner has cited particular paragraphs, column/line numbers, or figures in the reference(s) as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. Applicant is respectfully requested, in preparing the responses, to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner. Applicant is reminded that the Examiner is entitled to give the broadest reasonable interpretation to the language of the claims.
Claim Objections
Claim 16 is objected to because the claim does not follow the USPTO guidelines. The claim is written as a single paragraph, with no preamble and no transitional phrase such as "comprising", "consisting essentially of" or "consisting of", to mention a few, to define the scope of the claim. In other words, the claim language lacks the transitional phrase that links the preamble of a patent claim to the specific elements set forth in the claim, which define what the invention itself actually is.
Accordingly, appropriate correction and/or clarification are earnestly solicited.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 16 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter (i.e., process, machine, manufacture, or composition of matter) because the claim, as drafted, is directed to “a learned model that is acquired by updating a neural network according to a reward generated in a case where a plurality of virtual images photographed while changing an environmental condition of a virtual environment generated by virtualizing an environment including a robot and a state of a virtual robot are input to the neural network, and a policy of the virtual robot, which is output from the neural network, satisfies a predetermined condition”. Hence, the claim can be interpreted as a program itself (e.g. model, neural network), not a process occurring as a result of executing the program, a machine programmed to operate in accordance with the program, nor a manufacture structurally and functionally interconnected with the program in a manner which enables the program to act as a computer component and realize its functionality. It is also clearly not directed to a composition of matter. Therefore, it is non-statutory under 35 U.S.C. 101. Accordingly, appropriate correction and/or clarification are earnestly solicited.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
1. an acquisition unit configured to … (in claim 1).
2. a driving unit configured to … (in claim 1).
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. 
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-11 and 13-18 are rejected under 35 U.S.C. 103 as being unpatentable over Tobin et al (NPL “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World” – from IDS), hereinafter “Tobin”, in view of Jaderberg et al (NPL “Reinforcement Learning with Unsupervised Auxiliary Tasks” – from IDS), hereinafter “Jaderberg”.
Regarding claims 1, 16, 17, and 18, Tobin discloses a robot controller / the associated learned model / the associated method / the associated CRM that controls a robot (e.g. via Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World being implemented in a Fetch Robot as an object detector that can be used to perform grasping in a cluttered environment), including at least one processor or circuit (e.g. via robotic control) configured to perform the operations of the following units: 
an acquisition unit (e.g. via robotic control) configured to acquire an image from an image capturing apparatus (e.g. via a monocular camera and/or webcam) that photographs an environment (e.g. real-world data) including the robot (Fetch Robot/object detector) (see at least fig. 1 showing rendered images of environment with multiple objects; see fig. 2 showing the input of an image from a webcam; see also page 25, section III-A disclosing the use of a monocular camera and disclosing “Since we use a single monocular camera image from an uncalibrated camera to estimate object positions, we fix the height of the table in simulation, effectively creating a 2D pose estimation task. Random textures are chosen among the following: (a) A random RGB value (b) A gradient between two random RGB values”; see also section IV-A disclosing “training object detectors for each of eight geometric objects. We constructed mesh representations for each object to render in the simulator. Each training sample consists of (a) a rendered image of the object and one or more distractors (also from among the geometric object set) on a simulated tabletop and (b) a label corresponding to the Cartesian coordinates of the center of mass of the object in the world frame.”; further, see also sections IV-B and IV-C); and 
a driving unit (not shown; e.g. inherently performed by the robotic control / motion planner to execute motion when grasping objects) configured to drive the robot based on an output result obtained by inputting the image to a neural network (see fig. 8 showing the execution and sequence of motions when grasping objects; see abstract disclosing “perform grasping in a cluttered environment”; see page 25, section III disclosing the use of “deep neural network”; see section IV-G disclosing “use of our object detection networks for localizing an object in clutter and performing a prescribed grasp. For two of our most consistently accurate detectors, we evaluated the ability to pick up the detected object in 20 increasingly cluttered scenes using the positions estimated by the detector and off-the-shelf motion planning software”; and see section V disclosing “an object detector trained only in simulation can achieve high enough accuracy in the real world to perform grasping in clutter.”), 


Tobin teaches the claimed invention, but does not expressly teach wherein the neural network is updated according to a reward generated in a case where a plurality of virtual images photographed while changing an environmental condition of a virtual environment generated by virtualizing the environment and a state of a virtual robot are input to the neural network, and a policy of the virtual robot, which is output from the neural network, satisfies a predetermined condition. 
However, in the same field of endeavour or analogous art, Jaderberg teaches the claimed features implemented in deep reinforcement learning agents, which achieve state-of-the-art results by directly maximising cumulative reward. Jaderberg further teaches “learning generally about the dynamics of the environment, an agent must learn to maximise the global reward stream. To learn a policy to maximise rewards, an agent requires features that recognise states that lead to high reward and value. An agent with a good representation of rewarding states, will allow the learning of good value functions, and in turn should allow the easy learning of a policy. However, in many interesting environments reward is encountered very sparsely, meaning that it can take a long time to train feature extractors adept at recognising states which signify the onset of reward. We want to remove the perceptual sparsity of rewards and rewarding states to aid the training of an agent, but to do so in a way which does not introduce bias to the agent’s policy. To do this, we introduce the auxiliary task of reward prediction – that of predicting the onset of immediate reward given some historical context. This task consists of processing a sequence of consecutive observations, and requiring the agent to predict the reward picked up in the subsequent unseen frame. This is similar to value learning focused on immediate reward (γ= 0).” (see at least pages 4-5, section 3.2). 

Therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Tobin to include rewards and policies, as taught by Jaderberg, for the benefit of augmenting a deep reinforcement learning agent with auxiliary control and reward prediction tasks, which can drastically improve both data efficiency and robustness to hyperparameter settings.
Regarding claim 2, Tobin in view of Jaderberg discloses as discussed above in claim 1. Tobin further teaches wherein the neural network includes a convolution neural network and a recursive neural network (see page 26, section III-B disclosing the use of convolutional neural network / convolutional layers).  
Regarding claim 3, Tobin in view of Jaderberg discloses as discussed above in claim 1. Jaderberg teaches wherein the policy is a set of a plurality of actions of the virtual robot and respective selection probabilities of the plurality of actions (see fig. 1; see page 1, Abstract disclosing the use of reinforcement learning to approximate both the optimal policy and optimal value function for many different pseudo-rewards; see also page 5 disclosing the use of probabilities).
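For illustration only, and not as part of either cited reference, the claim 3 language (a policy as a set of actions paired with selection probabilities) can be sketched as a softmax over network outputs. The action names and the logit values below are hypothetical placeholders chosen for this sketch.

```python
import math
import random

def softmax(logits):
    """Convert raw network outputs into selection probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical action set for a virtual arm; the names are illustrative only.
ACTIONS = ["move_left", "move_right", "lower_gripper", "close_gripper"]

def make_policy(logits):
    """Pair each action with its selection probability."""
    return dict(zip(ACTIONS, softmax(logits)))

def sample_action(policy, rng):
    """Draw one action according to the selection probabilities."""
    names = list(policy)
    return rng.choices(names, weights=[policy[n] for n in names], k=1)[0]

# Hypothetical logits standing in for a neural network output.
policy = make_policy([0.5, 0.1, 1.2, -0.3])
action = sample_action(policy, random.Random(0))
```

Here `policy` maps every action to a probability, and `sample_action` selects an action in proportion to those probabilities.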


Regarding claim 4, Tobin in view of Jaderberg discloses as discussed above in claim 3. Jaderberg teaches wherein the neural network is updated such that a selection probability of an action with which the reward has been obtained is increased (see page 6, section 3.3 disclosing that experience replay is also used to increase the efficiency and stability of the auxiliary control tasks, and that Q-learning updates are applied to sampled experiences that are drawn from the replay buffer, allowing features to be developed extremely efficiently).
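For illustration only, the claim 4 language (raising the selection probability of an action for which a reward was obtained) can be sketched with a generic REINFORCE-style policy-gradient step on softmax logits. This is an assumption made for the sketch: it is not the Q-learning replay update described in the cited passage of Jaderberg, and the learning rate and logit values are hypothetical.

```python
import math

def softmax(logits):
    """Convert logits into selection probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, action_index, reward, learning_rate=0.1):
    """Apply one REINFORCE-style update to softmax logits.

    The gradient of log pi(a) with respect to logit i is
    (1 if i == a else 0) - pi(i), so a positive reward pushes
    probability mass toward the action that was taken.
    """
    probs = softmax(logits)
    return [
        logit + learning_rate * reward * ((1.0 if i == action_index else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]

logits = [0.0, 0.0, 0.0]  # hypothetical starting point: uniform policy
before = softmax(logits)[1]
after = softmax(reinforce_step(logits, action_index=1, reward=1.0))[1]
unchanged = reinforce_step(logits, action_index=1, reward=0.0)
```

With a positive reward, `after` exceeds `before`; with zero reward the logits are left as they were.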
Regarding claim 5, Tobin in view of Jaderberg discloses as discussed above in claim 1. Jaderberg teaches wherein the reward is different according to the predetermined condition (see page 1, Abstract disclosing reinforcement learning to approximate both the optimal policy and optimal value function for many different pseudo-rewards).  
Regarding claim 6, Tobin in view of Jaderberg discloses as discussed above in claim 1. Tobin further teaches wherein noise is applied to the virtual image (see page 25, section III-A disclosing type and amount of random noise added to images; see also page 28, section IV-F).  
Regarding claim 7, Tobin in view of Jaderberg discloses as discussed above in claim 6. Tobin further teaches wherein the noise is randomly changed on an episode-by-episode basis (see page 28, section IV-F disclosing the use of random noise).    
Regarding claim 8, Tobin in view of Jaderberg discloses as discussed above in claim 1. Tobin further teaches wherein the environmental condition includes one or both of a brightness and a color tone of virtual illumination light in the virtual environment (see page 26, section III-A (c) disclosing the color; see also page 29, section IV-G).  
Regarding claim 9, Tobin in view of Jaderberg discloses as discussed above in claim 8. Tobin further teaches wherein the brightness or the color tone is randomly changed on an episode-by-episode basis (see page 26, section III-A (c) disclosing the color; see also page 29, section IV-G).    
Regarding claim 10, Tobin in view of Jaderberg discloses as discussed above in claim 1. Tobin further teaches wherein the environmental condition includes textures of a plurality of objects included in the virtual environment (see page 25, section III-A disclosing Position and texture of all objects on the table, Textures of the table, floor, skybox, and robot; Random textures are chosen).  
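For illustration only, the environmental conditions discussed for claims 6 through 10 (image noise, brightness and color tone of the illumination, and object textures, each re-drawn per episode) can be sketched as a per-episode sampler. The object names, value ranges, and the `EpisodeConditions` structure are hypothetical and are not taken from Tobin.

```python
import random
from dataclasses import dataclass

@dataclass
class EpisodeConditions:
    brightness: float   # relative intensity of the virtual light
    color_tone: tuple   # RGB tint of the virtual light
    textures: dict      # object name -> flat RGB texture color
    noise_std: float    # standard deviation of image noise

# Hypothetical object names for the virtual scene.
OBJECTS = ["table", "floor", "robot", "work"]

def random_rgb(rng):
    """Draw a random RGB triple."""
    return tuple(rng.randint(0, 255) for _ in range(3))

def sample_conditions(rng):
    """Draw a fresh set of environmental conditions for one episode."""
    return EpisodeConditions(
        brightness=rng.uniform(0.2, 1.0),
        color_tone=random_rgb(rng),
        textures={name: random_rgb(rng) for name in OBJECTS},
        noise_std=rng.uniform(0.0, 0.05),
    )

def add_noise(pixels, noise_std, rng):
    """Add Gaussian noise to pixel intensities in [0, 1], clamping the result."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, noise_std))) for p in pixels]

rng = random.Random(42)
episodes = [sample_conditions(rng) for _ in range(3)]
noisy = add_noise([0.0, 0.5, 1.0], episodes[0].noise_std, rng)
```

Each call to `sample_conditions` corresponds to one episode, so the lighting, textures, and noise level change on an episode-by-episode basis.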
Regarding claim 11, Tobin in view of Jaderberg discloses as discussed above in claim 1. Tobin further teaches wherein the robot has an arm that holds a work, and the virtual robot has a virtual arm that holds a virtual work (see fig. 1 and 8 showing the virtual training robot and a test robot that have arm to grasp the objects).  
Regarding claim 13, Tobin in view of Jaderberg discloses as discussed above in claim 11. Tobin further teaches wherein the virtual robot is capable of lifting up the virtual work and placing the virtual work in a predetermined area in the virtual environment (see fig. 1 and 8 showing the virtual training robot and a test robot that have arm to grasp the objects).  
Regarding claim 14, Tobin in view of Jaderberg discloses as discussed above in claim 11. Tobin further teaches wherein a position and a posture of the virtual work are randomly changed on an episode-by-episode basis (see fig. 1 and 8 showing the virtual training robot and a test robot that have an arm to grasp the objects under the technique of randomization and random exploration; see section I, Introduction, and see section III disclosing the random values).
Regarding claim 15, Tobin in view of Jaderberg discloses as discussed above in claim 11. Tobin/Jaderberg is silent as to wherein the work is cloth or liquid.
Nevertheless, although a cloth or liquid element is not specified, it was common knowledge that cloth or liquid is a popular choice among a finite number of options for selecting the working object to interact with the robot, and it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Tobin/Jaderberg by using cloth or liquid as the work/object, since a person of ordinary skill has good reason to pursue the known options within his or her technical grasp.  If this leads to the anticipated success, it is likely the product not of innovation but of ordinary skill and common sense.
Allowable Subject Matter
Claim 12 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
1.- Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research – Plappert et al – which introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place with a Fetch robotic arm as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Multi-Goal Reinforcement Learning (RL) framework in which an agent is told what to do using an additional input. The second part of the paper presents a set of concrete research ideas for improving RL algorithms, most of which are related to Multi-Goal RL and Hindsight Experience Replay.
2.- Automatic Goal Generation for Reinforcement Learning Agents – Florensa et al – which is directed to a method for automatic curriculum generation that considerably improves the sample efficiency of learning to reach all feasible goals in the environment. Learning to reach multiple goals is useful for multi-task settings such as navigation or manipulation, in which we want the agent to perform a wide range of tasks. Our method also naturally handles sparse reward functions, without needing to manually modify the reward function for every task, based on prior task knowledge.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jaime Figueroa whose telephone number is (571)270-7620.  The examiner can normally be reached on Monday-Friday 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jeffrey A. Burke, can be reached at 469-295-9067.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JAIME FIGUEROA/ Primary Examiner, AU 3664