DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

EXAMINER’S AMENDMENT
Authorization for this examiner’s amendment was given in an interview with John Treilhard on 07/21/2021.
 	Claims are amended as follows:
1. (Currently Amended)	A method implemented by a data processing apparatus, the method comprising:
	obtaining a first image depicting a physical environment, wherein the environment comprises a given physical object;
	determining whether a position of the given object in the environment can be inferred based on a template representation library by applying template matching techniques to the first image, comprising, for each of a plurality of template representations from the template representation library:
		determining whether a similarity measure between the template representation and a region of the first image exceeds a threshold,
	wherein the template representation library comprises a plurality of template representations of respective objects;
	in response to determining that the position of the given object in the environment cannot be inferred based on the template representation library using template matching techniques:
		obtaining a plurality of images of the environment;
		generating a three-dimensional reconstruction of the environment from the plurality of images of the environment, wherein the three-dimensional reconstruction of the environment characterizes a geometry of the environment;
2. (Canceled)
3. (Previously Presented)	The method of claim 1, wherein generating a three-dimensional reconstruction of the environment from the plurality of images of the environment comprises:
	applying stereo reconstruction techniques to the plurality of images of the environment.
4. (Previously Presented)	The method of claim 1, wherein the three-dimensional reconstruction of the environment comprises a plurality of coordinates defining the three-dimensional reconstruction of the environment. 
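5. (Previously Presented)	The method of claim 1, wherein determining the estimated position of the given object using the three-dimensional reconstruction of the environment comprises:
	determining a segmentation of the environment into a plurality of segmented regions based on the three-dimensional reconstruction of the environment;
	identifying a segmented region as the given object; and
	determining the estimated position of the given object based on the segmented region identified as the given object.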
6. (Previously Presented)	The method of claim 5, wherein determining a segmentation of the environment into a plurality of segmented regions based on the three-dimensional reconstruction of the environment comprises:
	determining a watershed transformation of the three-dimensional reconstruction of the environment.
7. (Previously Presented)	The method of claim 1, wherein generating the new template representation of the given object from the identified image region that is predicted to depict the given object comprises:
	cropping the identified image region that is predicted to depict the given object.
8. (Previously Presented)	The method of claim 1, further comprising:
	physically interacting with the environment based on the estimated position of the given object that is determined from the three-dimensional reconstruction of the environment; and
	determining whether the interaction satisfies an interaction success condition; and
	refraining from generating the new template representation of the given object using the estimated position of the given object if the interaction does not satisfy the interaction success condition.
9. (Previously Presented)	The method of claim 1, wherein physically interacting with the environment based on the estimated position of the given object that is determined from the three-dimensional reconstruction of the environment comprises:
	attempting to manipulate the given object using a robotic actuator based on the estimated position of the given object that is determined from the three-dimensional reconstruction of the environment.
10. (Original)	The method of claim 1, wherein each template representation comprises an image of a respective object. 
11. (Currently Amended)	A system comprising:
	a memory storing instructions that are executable; and
	one or more computers to execute the instructions to perform operations comprising:
	obtaining a first image depicting a physical environment, wherein the environment comprises a given physical object;
	determining whether a position of the given object in the environment can be inferred based on a template representation library by applying template matching techniques to the first image, comprising, for each of a plurality of template representations from the template representation library:
		determining whether a similarity measure between the template representation and a region of the first image exceeds a threshold,
	wherein the template representation library comprises a plurality of template representations of respective objects;
	in response to determining that the position of the given object in the environment cannot be inferred based on the template representation library using template matching techniques:
		obtaining a plurality of images of the environment;
		generating a three-dimensional reconstruction of the environment from the plurality of images of the environment, wherein the three-dimensional reconstruction of the environment characterizes a geometry of the environment;
		determining an estimated position of the given object using the three-dimensional reconstruction of the environment;
		generating a new template representation of the given object using the estimated position of the given object that is determined from the three-dimensional reconstruction of the environment, comprising:
12. (Canceled)
13. (Previously Presented)	The system of claim 11, wherein generating a three-dimensional reconstruction of the environment from the plurality of images of the environment comprises:
	applying stereo reconstruction techniques to the plurality of images of the environment.
14. (Previously Presented)	The system of claim 11, wherein the three-dimensional reconstruction of the environment comprises a plurality of coordinates defining the three-dimensional reconstruction of the environment.
15. (Previously Presented)	The system of claim 11, wherein determining the estimated position of the given object using the three-dimensional reconstruction of the environment comprises:
	determining a segmentation of the environment into a plurality of segmented regions based on the three-dimensional reconstruction of the environment;
	identifying a segmented region as the given object; and
	determining the estimated position of the given object based on the segmented region identified as the given object.

17. (Currently Amended)	One or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
	obtaining a first image depicting a physical environment, wherein the environment comprises a given physical object;
	determining whether a position of the given object in the environment can be inferred based on a template representation library by applying template matching techniques to the first image, comprising, for each of a plurality of template representations from the template representation library:
		determining whether a similarity measure between the template representation and a region of the first image exceeds a threshold,
	wherein the template representation library comprises a plurality of template representations of respective objects;
	in response to determining that the position of the given object in the environment cannot be inferred based on the template representation library using template matching techniques:
		obtaining a plurality of images of the environment;
		generating a three-dimensional reconstruction of the environment from the plurality of images of the environment, wherein the three-dimensional reconstruction of the environment characterizes a geometry of the environment;
		determining an estimated position of the given object using the three-dimensional reconstruction of the environment;
		generating a new template representation of the given object using the estimated position of the given object that is determined from the three-dimensional reconstruction of the environment, comprising:
18. (Canceled)
19. (Previously Presented)	The non-transitory computer storage media of claim 17, wherein generating a three-dimensional reconstruction of the environment from the plurality of images of the environment comprises:
	applying stereo reconstruction techniques to the plurality of images of the environment.

20. (Previously Presented)	The non-transitory computer storage media of claim 17, wherein generating the new template representation of the given object from the identified image region that is predicted to depict the given object comprises: 
cropping the identified image region that is predicted to depict the given object.
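The template-matching test recited in amended claims 1, 11 and 17 (comparing, for each library template, a similarity measure between the template and an image region against a threshold) can be illustrated with a short sketch. It assumes grayscale images held as NumPy arrays and uses zero-mean normalized cross-correlation as the similarity measure; the claims do not require any particular measure, and the function names `normalized_cross_correlation` and `infer_position` and the default threshold of 0.9 are hypothetical.

```python
import numpy as np


def normalized_cross_correlation(region: np.ndarray, template: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equal-shape arrays, in [-1, 1].

    Assumed similarity measure for illustration; the claims leave the measure open.
    """
    r = region.astype(float) - region.mean()
    t = template.astype(float) - template.mean()
    denom = np.sqrt((r * r).sum() * (t * t).sum())
    if denom == 0.0:
        return 0.0  # flat region or flat template: treat as "no match"
    return float((r * t).sum() / denom)


def infer_position(image: np.ndarray, library: dict, threshold: float = 0.9):
    """Return (name, (row, col), score) of the best match whose score exceeds
    the threshold, or None when no template matches (the claimed fallback case).

    Hypothetical helper: exhaustive sliding-window search over every template.
    """
    best = None
    ih, iw = image.shape
    for name, template in library.items():
        th, tw = template.shape
        # Templates larger than the image yield empty ranges and are skipped.
        for row in range(ih - th + 1):
            for col in range(iw - tw + 1):
                window = image[row:row + th, col:col + tw]
                score = normalized_cross_correlation(window, template)
                if score > threshold and (best is None or score > best[2]):
                    best = (name, (row, col), score)
    return best
```

The exhaustive window search is chosen for readability; a practical system would more likely use an optimized routine such as OpenCV's `cv2.matchTemplate`. A `None` result corresponds to the condition under which the claims fall back to three-dimensional reconstruction.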

Allowable Subject Matter
Claims 1, 3-11, 13-17, 19 and 20 are allowed.

			 Statement of Reasons for Allowance
The following is an Examiner’s statement of reasons for allowance:
          
With respect to the allowed independent claim 1:
Morin et al. (US 20140118536, hereinafter “Morin”) teaches:
“A method implemented by a data processing apparatus, the method comprising: obtaining a first image (one or more of the cameras may be installed on machines or vehicles 106, Para. [0015]) depicting a physical environment (working environment 104 in Fig. 1), wherein the environment comprises a given physical object (A given object may be, for example, an edge 1051 or a corner 1052 of a light switch 103 or other item on a wall 107, or may be a side or corner of a window 109), determining whether a position of the given object in the environment can be inferred based on a template representation library using template matching techniques (The processing subsystem 100 enters the database 102 using the information that identifies unique objects in the image and utilizes a known searching technique, such as SURF, to determine if the database contains position information for one or more of the identified unique objects, Para. [0022]), wherein the template representation library comprises a plurality of template representations of respective objects (the data base contains and/or is updated to contain object identification information and associated position coordinates, and as appropriate, orientation information, for uniquely identifiable objects in a working environment, Para. [0014]), in response to determining that the position of the given object in the environment cannot be inferred based on the template representation library using template matching techniques (If the system identifies fewer than N known objects in the image, the system determines if the INS subsystem 110 is initialized (steps 404, 412). If so, the system uses the inertial position and attitude to calculate the position and orientation information for the unknown unique identifiable objects in the image (step 414), Para. [0033]), generating a new template representation of the given object and augmenting the template representation library with the new template representation (In step 420, the database is updated by including, for each new object, information relating to the associated pixel patterns for one or more edges, contours, and so forth, that define the unique object as well as attributes such as color, outline, shape, texture and so forth. In addition, the position and orientation coordinates are included in the database also as an attribute of the object, Para. [0034]).”
Kwant et al. (US 20180165831, hereinafter “Kwant”) teaches:
“obtaining a plurality of images of the environment (Fig. 6 and steps 206 and 210, where first and second images are captured), generating a three-dimensional reconstruction of the environment from the plurality of images of the environment, wherein the three-dimensional reconstruction of the environment characterizes a geometry of the environment (three dimensional information/data may be determined and/or generated for one or more static features identified in both the first and second captured images. For example, the three dimensional information/data may comprise three dimensional shape information/data for each of static features identified in both the first and second captured images, Fig. 6; 214 and Para. [0060]); determining an estimated position of the given object using the three-dimensional reconstruction of the environment (a correction to the observed position and/or pose of the vehicle 5 and/or the vehicle apparatus 20 may be determined. For example, the vehicle apparatus 20 may determine a correction to the observed position and/or pose, Fig. 6; 216-218); generating a new template representation of the given object using the estimated position of the given object that is determined from the three-dimensional reconstruction of the environment (a corrected position may be determined based on the correction to the observed position and the observed position and an updated reference image may be generated based on the corrected position. The correction to the observed pose may then be determined by comparing (or implicitly comparing) the captured image to the updated reference image, Para. [0049]).”
However, Morin and Kwant, whether taken alone or in combination, do not teach or suggest the following novel features:
“A method implemented by a data processing apparatus, the method comprising, for each of a plurality of template representations from the template representation library: determining whether a similarity measure between the template representation and a region of the first image exceeds a threshold, identifying, from an image of the environment, a region of the image that is predicted to depict the given object based on the estimated position of the given object that is determined from the three-dimensional reconstruction of the environment and generating the new template representation of the given object from the identified image region that is predicted to depict the given object”, in combination with all the recited limitations of claim 1.
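The distinguishing limitation identified above (identifying an image region predicted to depict the object from the position estimated in the three-dimensional reconstruction, then generating the new template from that region) can be sketched as follows. Beyond what the claims recite, the sketch assumes that the estimated position is a 3-D point in the camera frame, that the camera follows an ideal pinhole model with known intrinsics (fx, fy, cx, cy), and that the predicted region is a fixed-size square; `project_point`, `crop_new_template` and `augment_library` are hypothetical names. The cropping step mirrors dependent claims 7 and 20.

```python
import numpy as np


def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3-D point (camera frame, Z pointing forward) to pixel (row, col).

    Assumes an ideal pinhole camera with no lens distortion.
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    col = fx * x / z + cx
    row = fy * y / z + cy
    return int(round(row)), int(round(col))


def crop_new_template(image, point_cam, intrinsics, half_size):
    """Crop the square region predicted to depict the object.

    Returns None when the predicted region extends past the image border
    (an assumed policy; padding or clipping would be alternatives).
    """
    row, col = project_point(point_cam, *intrinsics)
    h, w = image.shape[:2]
    top, left = row - half_size, col - half_size
    bottom, right = row + half_size + 1, col + half_size + 1
    if top < 0 or left < 0 or bottom > h or right > w:
        return None
    return image[top:bottom, left:right].copy()


def augment_library(library, name, image, point_cam, intrinsics, half_size=8):
    """Add the cropped region to the template library; report whether it was added."""
    template = crop_new_template(image, point_cam, intrinsics, half_size)
    if template is not None:
        library[name] = template
    return template is not None
```

A fixed square keeps the example short; an implementation could instead size the crop from the extent of the segmented region in the reconstruction.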

With respect to the allowed independent claim 11:
Morin et al. (US 20140118536, hereinafter “Morin”) teaches:
“a memory storing instructions that are executable (i.e., an inherent feature of processing subsystem 100) and one or more computers to execute the instructions to perform operations comprising: obtaining a first image (one or more of the cameras may be installed on machines or vehicles 106, Para. [0015]) depicting a physical environment (working environment 104 in Fig. 1), wherein the environment comprises a given physical object (A given object may be, for example, an edge 1051 or a corner 1052 of a light switch 103 or other item on a wall 107, or may be a side or corner of a window 109), determining whether a position of the given object in the environment can be inferred based on a template representation library by applying template matching techniques to the first image (The processing subsystem 100 enters the database 102 using the information that identifies unique objects in the image and utilizes a known searching technique, such as SURF, to determine if the database contains position information for one or more of the identified unique objects, Para. [0022]), wherein the template representation library comprises a plurality of template representations of respective objects (the data base contains and/or is updated to contain object identification information and associated position coordinates, and as appropriate, orientation information, for uniquely identifiable objects in a working environment, Para. [0014]), in response to determining that the position of the given object in the environment cannot be inferred based on the template representation library using template matching techniques (If the system identifies fewer than N known objects in the image, the system determines if the INS subsystem 110 is initialized (steps 404, 412). If so, the system uses the inertial position and attitude to calculate the position and orientation information for the unknown unique identifiable objects in the image (step 414), Para. [0033]), generating a new template representation of the given object and augmenting the template representation library with the new template representation (In step 420, the database is updated by including, for each new object, information relating to the associated pixel patterns for one or more edges, contours, and so forth, that define the unique object as well as attributes such as color, outline, shape, texture and so forth. In addition, the position and orientation coordinates are included in the database also as an attribute of the object, Para. [0034]).”
Kwant et al. (US 20180165831, hereinafter “Kwant”) teaches:
“obtaining a plurality of images of the environment (Fig. 6 and steps 206 and 210, where first and second images are captured), generating a three-dimensional reconstruction of the environment from the plurality of images of the environment, wherein the three-dimensional reconstruction of the environment characterizes a geometry of the environment (three dimensional information/data may be determined and/or generated for one or more static features identified in both the first and second captured images. For example, the three dimensional information/data may comprise three dimensional shape information/data for each of static features identified in both the first and second captured images, Fig. 6; 214 and Para. [0060]); determining an estimated position of the given object using the three-dimensional reconstruction of the environment (a correction to the observed position and/or pose of the vehicle 5 and/or the vehicle apparatus 20 may be determined. For example, the vehicle apparatus 20 may determine a correction to the observed position and/or pose, Fig. 6; 216-218); generating a new template representation of the given object using the estimated position of the given object that is determined from the three-dimensional reconstruction of the environment (a corrected position may be determined based on the correction to the observed position and the observed position and an updated reference image may be generated based on the corrected position. The correction to the observed pose may then be determined by comparing (or implicitly comparing) the captured image to the updated reference image, Para. [0049]).”
However, Morin and Kwant, whether taken alone or in combination, do not teach or suggest the following novel features:
“A system comprising, for each of a plurality of template representations from the template representation library: determining whether a similarity measure between the template representation and a region of the first image exceeds a threshold, identifying, from an image of the environment, a region of the image that is predicted to depict the given object based on the estimated position of the given object that is determined from the three-dimensional reconstruction of the environment and generating the new template representation of the given object from the identified image region that is predicted to depict the given object”, in combination with all the recited limitations of claim 11.

With respect to the allowed independent claim 17:
Morin et al. (US 20140118536, hereinafter “Morin”) teaches:
“One or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: obtaining a first image (one or more of the cameras may be installed on machines or vehicles 106, Para. [0015]) depicting a physical environment (working environment 104 in Fig. 1), wherein the environment comprises a given physical object (A given object may be, for example, an edge 1051 or a corner 1052 of a light switch 103 or other item on a wall 107, or may be a side or corner of a window 109), determining whether a position of the given object in the environment can be inferred based on a template representation library using template matching techniques (The processing subsystem 100 enters the database 102 using the information that identifies unique objects in the image and utilizes a known searching technique, such as SURF, to determine if the database contains position information for one or more of the identified unique objects, Para. [0022]), wherein the template representation library comprises a plurality of template representations of respective objects (the data base contains and/or is updated to contain object identification information and associated position coordinates, and as appropriate, orientation information, for uniquely identifiable objects in a working environment, Para. [0014]), in response to determining that the position of the given object in the environment cannot be inferred based on the template representation library using template matching techniques (If the system identifies fewer than N known objects in the image, the system determines if the INS subsystem 110 is initialized (steps 404, 412). If so, the system uses the inertial position and attitude to calculate the position and orientation information for the unknown unique identifiable objects in the image (step 414), Para. [0033]), generating a new template representation of the given object and augmenting the template representation library with the new template representation (In step 420, the database is updated by including, for each new object, information relating to the associated pixel patterns for one or more edges, contours, and so forth, that define the unique object as well as attributes such as color, outline, shape, texture and so forth. In addition, the position and orientation coordinates are included in the database also as an attribute of the object, Para. [0034]).”
Kwant et al. (US 20180165831, hereinafter “Kwant”) teaches:
“obtaining a plurality of images of the environment (Fig. 6 and steps 206 and 210, where first and second images are captured), generating a three-dimensional reconstruction of the environment from the plurality of images of the environment, wherein the three-dimensional reconstruction of the environment characterizes a geometry of the environment (three dimensional information/data may be determined and/or generated for one or more static features identified in both the first and second captured images. For example, the three dimensional information/data may comprise three dimensional shape information/data for each of static features identified in both the first and second captured images, Fig. 6; 214 and Para. [0060]); determining an estimated position of the given object using the three-dimensional reconstruction of the environment (a correction to the observed position and/or pose of the vehicle 5 and/or the vehicle apparatus 20 may be determined. For example, the vehicle apparatus 20 may determine a correction to the observed position and/or pose, Fig. 6; 216-218); generating a new template representation of the given object using the estimated position of the given object that is determined from the three-dimensional reconstruction of the environment (a corrected position may be determined based on the correction to the observed position and the observed position and an updated reference image may be generated based on the corrected position. The correction to the observed pose may then be determined by comparing (or implicitly comparing) the captured image to the updated reference image, Para. [0049]).”
However, Morin and Kwant, whether taken alone or in combination, do not teach or suggest the following novel features:
“One or more non-transitory computer storage media storing instructions for operations comprising, for each of a plurality of template representations from the template representation library: determining whether a similarity measure between the template representation and a region of the first image exceeds a threshold, identifying, from an image of the environment, a region of the image that is predicted to depict the given object based on the estimated position of the given object that is determined from the three-dimensional reconstruction of the environment and generating the new template representation of the given object from the identified image region that is predicted to depict the given object”, in combination with all the recited limitations of claim 17.
Any comments considered necessary by Applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GOLAM SOROWAR whose telephone number is (571)270-3761.  The examiner can normally be reached on Mon-Fri: 8:30AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Appiah can be reached on (571) 272-7904.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.