DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on January 15, 2020 and February 19, 2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Objections
Claims 8 and 14 are objected to because of the following informalities:
Claim 8 recites “change a mesh portion around the mesh portion,” and the intent of this phrase is unclear to the examiner. The examiner believes the claim should read “change a mesh portion.”
Claim 14 reads “observation space information and” and should read “observation space information.”
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
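A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
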
Claims 1, 5, 10-11, and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over a machine translation of JP2006250917A (hereinafter Iwane).
Regarding independent claim 1, Iwane discloses an image processing system (Paragraph 0025, “According to another aspect of the present invention, there is provided a CV system three-dimensional map generation apparatus,”), comprising at least one processor configured to:
Acquire taken images that have been taken by a camera, which is movable in a real space (Page 1, “along with obtaining the video image of the periphery of the mobile unit”); 
Acquire, based on changes in position of a feature point cloud in the taken images, observation space information including three-dimensional coordinates of the feature point cloud in an observation space (Paragraph 0001, “In particular, the present invention provides a high-precision CV computing device that automatically obtains CV (camera vector) data that indicates a camera position and a camera rotation angle with high accuracy from a plurality of frame images of a video image (moving image)”);
Integrate the observation space information and the additional information (Paragraph 0158, “Specifically, the PRM technology prepares all the shapes and attributes of the object to be predicted in advance as parts (operator parts), compares these parts with actual live-action images, and selects matching parts”).
Iwane teaches acquiring feature points of an object in captured images (Paragraph 0014, “plurality of feature points are extracted from each frame image of the all-around video image recorded in the image recording unit and the all-around video image recording unit, and the feature points are traced to a plurality of adjacent frames of the all-around video image.”), but is silent on using machine learning as a basis for acquiring additional information on a feature of a photographed object shown in the taken images. Nevertheless, machine learning is common knowledge and widely practiced in the art, and it would have been an appropriate and obvious alternative to use in this endeavor for additional automation and to obtain the desirable higher-precision position information noted in Iwane (abstract).
Regarding dependent claim 5, the rejection of claim 1 is incorporated herein. Additionally, Iwane further discloses wherein the additional information includes information on a three-dimensional shape of the photographed object (Paragraph 0011, “A parallax camera coordinate three-dimensionalization unit for generating a three-dimensional shape in the camera coordinate system from a plurality of images with parallax of the recorded video image, and a three-dimensional feature among the three-dimensional shapes in the camera coordinate system”).
As noted regarding claim 1, Iwane is silent on the use of a machine learning algorithm. However, because the purpose of Iwane is to obtain high-precision position information, and machine learning is common knowledge in the relevant art, it would have been obvious to one of ordinary skill in the art at the time of the claimed invention to incorporate a machine learning algorithm as an alternative in this endeavor for the added benefit of automation and precision, as explained above.
Regarding dependent claim 10, the rejection of claim 5 is incorporated herein. Additionally, Iwane further discloses wherein the additional information includes information on a normal of the photographed object (Paragraph 0194, “On the other hand, specific measurement points that are particularly important in the survey target range are manually specified and registered. The specific measurement point specifies a part derived from an artificial structure such as a vertical part, a horizontal part, or a right-angled part in the image, thereby improving the accuracy of the subsequent three-dimensional map;” the right-angled part in the image is read as a normal, because a normal is perpendicular (at 90 degrees) to a surface).
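For illustration only, and not as part of the evidence relied upon from Iwane, the following minimal Python sketch shows the geometric sense in which a surface normal is "at 90 degrees" to a surface: the normal of a planar patch through three non-collinear points is the normalized cross product of two edge vectors, and its angle to any in-plane edge is 90 degrees. The helper names (sub, cross, dot, surface_normal, angle_deg) and the sample points are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product a x b, which is perpendicular to both a and b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product; zero when the vectors are perpendicular."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def surface_normal(p0: Vec3, p1: Vec3, p2: Vec3) -> Vec3:
    """Unit normal of the planar patch through three non-collinear points."""
    n = cross(sub(p1, p0), sub(p2, p0))
    length = math.sqrt(dot(n, n))
    if length == 0.0:
        raise ValueError("points are collinear; no unique normal")
    return (n[0] / length, n[1] / length, n[2] / length)


def angle_deg(a: Vec3, b: Vec3) -> float:
    """Angle between two nonzero vectors, in degrees."""
    cos_theta = dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
```

For a horizontal patch through (0, 0, 0), (1, 0, 0), and (0, 1, 0), surface_normal returns (0.0, 0.0, 1.0), and angle_deg between that normal and the edge (1.0, 0.0, 0.0) is 90 degrees.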
Regarding dependent claim 11, the rejection of claim 5 is incorporated herein. Additionally, Iwane further discloses wherein the additional information includes information on a classification of the photographed object (Paragraph 0179, “FIG. 55 shows the classification and classification of feature points, boundary points, and area points for all blocks;” classifying information on features within the image is read as classifying the photographed object).
(Paragraph 0024, “Further, according to the CV method three-dimensional map generation device of the present invention”).
Regarding independent claim 14, the references and analysis of claim 1 apply directly. Additionally, Iwane further discloses a non-transitory computer-readable information storage medium for storing a program (Paragraph 0032, “Here, the high-precision CV arithmetic device, CV system three-dimensional map generation device, and CV system navigation device of the present invention described below are realized by processing, means, and functions executed by a computer in accordance with instructions of a program (software).”).

Claims 2-4 and 6-9 are rejected under 35 U.S.C. 103 as being unpatentable over Iwane, and further in view of Christoph Fehn, "Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV," Proc. SPIE 5291, Stereoscopic Displays and Virtual Reality Systems XI, (21 May 2004); https://doi.org/10.1117/12.524762 (hereinafter Fehn).
Regarding dependent claim 2, the rejection of claim 1 is incorporated herein. Additionally, Iwane discloses wherein the additional information includes two-dimensional feature amount information in which a position of the photographed object in the taken images and a feature amount regarding the photographed object are (Paragraph 0070, “In addition, since the image is two-dimensional and the shape changes during tracking, there is a certain limit in tracking accuracy. Therefore, the camera vector obtained by the feature point tracking is regarded as an approximate value, and the three-dimensional information (three-dimensional shape) obtained in the subsequent process is traced on each frame image, and a high-precision camera vector is obtained from the trajectory”), wherein the at least one processor is configured to:
Estimate a position of the camera based on the changes in position of the feature point cloud, and set an observation viewpoint in the observation space based on a result of the estimation (Paragraph 0199, “Therefore, by adopting the configuration as in this embodiment, the camera position is known by obtaining the CV value of the three-dimensional coordinates of the camera itself,”).
Iwane fails to explicitly disclose the following further recited limitation; however, Fehn discloses: execute processing based on a result of comparison between two-dimensional observation information, which shows a view of the observation space as seen from the observation viewpoint, and the two-dimensional feature amount information (Page 94, “convert already existing 2D video material into 3D using so-called “structure from motion” algorithms;” Page 96, “2D-to-3D conversion techniques based on “structure from motion” approaches can be used to generate the required depth information for already recorded monoscopic video material;” Page 97, equations 2 and 3; Page 97, “Depth-image-based rendering (DIBR) is the process of synthesizing “virtual” views of a scene from still- or moving color images and associated per-pixel depth information”).
It would have been obvious to a person having ordinary skill in the art at the time of the claimed invention to incorporate the teaching of Fehn into the system of Iwane in order to create a virtual view of a real-world scene in real time (Fehn, abstract).
Regarding dependent claim 3, the rejection of claim 2 is incorporated herein. Additionally, Fehn in the combination further discloses wherein the feature amount includes a depth of the photographed object estimated, wherein, in the two-dimensional observation information, a position of the feature point cloud in a two-dimensional space, and a depth of the feature point cloud in the observation space are associated with each other (Page 94, “On principle, such (offline or online) methods process one or more monoscopic color video sequences to: (a) establish a dense set of image point correspondences from which information about the recording camera as well as the 3D structure of the scene can be derived, or (b) infer approximate depths information from the relative movements of automatically tracked image segment;” Page 97, “Depth-image-based rendering (DIBR) is the process of synthesizing “virtual” views of a scene from still- or moving color images and associated per-pixel depth information”… “At first, the original image points are reprojected into the 3D world, utilizing the respective depth data. Thereafter, these 3D space points are projected into the image plane of a “virtual” camera, which is located at the required viewing position. The concatenation of reprojection (2D-to-3D) and subsequent projection (3D-to-2D) is usually called 3D image warping in the Computer Graphics (CG) literature and will be derived mathematically in the following paragraph”), and
wherein the at least one processor is configured to set a mesh of the photographed object in the observation space based on the two-dimensional feature amount information, and change a scale of the mesh based on the result of the comparison between the two-dimensional observation information and the two-dimensional feature amount information (Page 97, “This disparity equation can also be considered as a 3D image warping formalism, which can be used to generate an arbitrary novel view from a known reference image. This only requires the definition of the position and orientation of a “virtual” camera relative to the reference camera as well as the declaration of the “virtual” camera’s intrinsic parameters. Then, if the depth values of the corresponding 3D space points are known for every pixel of the original image, the “virtual” view can be synthesized by applying Eq. (5) to all original image points.”).
Fehn in the combination further teaches “2D-to-3D conversion techniques based on “structure from motion” approaches can be used to generate the required depth information for already recorded monoscopic video material” (Page 96, top paragraph). Iwane and Fehn, taken as a whole, are silent on using machine learning data to ascertain depth information as claimed. Nevertheless, because machine learning is common knowledge in the relevant art, it would have been obvious to one of ordinary skill in the art at the time of the claimed invention to incorporate a machine learning algorithm as an alternative in this endeavor for the added benefit of automation and precision, as explained regarding the earlier claims.
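For illustration only, the following minimal Python sketch traces, for a single pixel, the “reprojection (2D-to-3D) and subsequent projection (3D-to-2D)” sequence quoted from Fehn at page 97. It is a sketch under stated assumptions, not a reproduction of Fehn's Eq. (5) and not part of the evidence relied upon: it assumes an ideal pinhole camera with hypothetical intrinsics (f, cx, cy) in pixels, a “virtual” camera that shares the reference camera's orientation and differs only by a translation, and hypothetical function names.

```python
from typing import Tuple

# Hypothetical pinhole intrinsics (f, cx, cy): focal length and principal point, in pixels.
Intrinsics = Tuple[float, float, float]
Vec3 = Tuple[float, float, float]


def reproject(u: float, v: float, depth: float, k: Intrinsics) -> Vec3:
    """2D-to-3D: lift pixel (u, v) with its depth into reference-camera coordinates."""
    f, cx, cy = k
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)


def project(p: Vec3, k: Intrinsics) -> Tuple[float, float]:
    """3D-to-2D: perspective projection of a camera-space point onto the image plane."""
    f, cx, cy = k
    x, y, z = p
    if z <= 0.0:
        raise ValueError("point is not in front of the virtual camera")
    return (f * x / z + cx, f * y / z + cy)


def warp_pixel(u: float, v: float, depth: float,
               k_ref: Intrinsics, k_virt: Intrinsics,
               center_virt: Vec3) -> Tuple[float, float]:
    """3D image warping of one pixel: reproject, shift into the virtual camera, project.

    Assumes the virtual camera has the same orientation as the reference camera and
    differs only by its center, center_virt, given in reference-camera coordinates.
    """
    x, y, z = reproject(u, v, depth, k_ref)
    tx, ty, tz = center_virt
    # A point at X in the reference frame is at X - center_virt relative to the virtual camera.
    return project((x - tx, y - ty, z - tz), k_virt)
```

With K = (500.0, 320.0, 240.0) for both cameras, warp_pixel(320.0, 240.0, 2.0, K, K, (0.1, 0.0, 0.0)) returns (295.0, 240.0): a 25-pixel horizontal shift, equal to f·b/Z = 500 × 0.1 / 2 for baseline b and depth Z. Applying the same mapping to every pixel yields a synthesized “virtual” view.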
Regarding dependent claim 4, the rejection of claim 3 is incorporated herein. Additionally, Fehn in the combination further discloses wherein the at least one processor is configured to partially change the mesh after changing the scale of the mesh based on the result of the comparison between the two-dimensional observation information (Page 97, “M symbolize the two 2D image points,”) and the two-dimensional feature amount information (Page 97, “Depth-image-based rendering (DIBR) is the process of synthesizing “virtual” views of a scene from still- or moving color images and associated per-pixel depth information;” the scale is read as depth information).
Regarding dependent claim 6, the rejection of claim 5 is incorporated herein. Additionally, Fehn in the combination further discloses wherein the additional information includes information on a mesh of the photographed object (Page 97, “This disparity equation can also be considered as a 3D image warping formalism, which can be used to generate an arbitrary novel view from a known reference image;” forming a view based on multiple images is read as a mesh because the images have to be pieced together to formulate the novel view (i.e., a view from which no image was taken)).
Regarding dependent claim 7, the rejection of claim 6 is incorporated herein. Additionally, Fehn in the combination further discloses wherein the at least one processor is configured to set the mesh in the observation space based on the (Page 97, “This disparity equation can also be considered as a 3D image warping formalism, which can be used to generate an arbitrary novel view from a known reference image. This only requires the definition of the position and orientation of a “virtual” camera relative to the reference camera as well as the declaration of the “virtual” camera’s intrinsic parameters;” as the defined position and orientation change (these are read as the additional information), the generated view changes, and that view is formed by the altered mesh).
Regarding dependent claim 8, the rejection of claim 7 is incorporated herein. Additionally, Fehn in the combination further discloses wherein the at least one processor is configured to change a mesh portion of the mesh that corresponds to the three-dimensional coordinates of the feature point cloud indicated by the observation space information, and then change a mesh portion around the mesh portion (Page 97, “This disparity equation can also be considered as a 3D image warping formalism, which can be used to generate an arbitrary novel view from a known reference image. This only requires the definition of the position and orientation of a “virtual” camera relative to the reference camera as well as the declaration of the “virtual” camera’s intrinsic parameters. Then, if the depth values of the corresponding 3D space points [i.e., point cloud] are known for every pixel of the original image, the “virtual” view can be synthesized by applying Eq. (5) to all original image points;” as noted above, as the observation space information (view information) changes, the mesh would be updated to incorporate the relevant changes).
Regarding dependent claim 9, the rejection of claim 7 is incorporated herein. Additionally, Iwane in the combination further discloses wherein the at least one processor is configured to: 
Estimate a position of the camera based on the changes in position of the feature point cloud (Paragraph 0199, “Therefore, by adopting the configuration as in this embodiment, the camera position is known by obtaining the CV value of the three-dimensional coordinates of the camera itself”).
Iwane fails to explicitly disclose the following further recited limitations; however, Fehn in the combination discloses: set an observation viewpoint in the observation space based on a result of the estimation (Page 97, “Depth-image-based rendering (DIBR) is the process of synthesizing “virtual” views of a scene from still- or moving color images and associated per-pixel depth information;” virtual views are read as viewpoints in the observation space), and change each mesh portion based on an orientation of each mesh portion with respect to the observation viewpoint (Page 97, “This disparity equation can also be considered as a 3D image warping formalism, which can be used to generate an arbitrary novel view from a known reference image. This only requires the definition of the position and orientation of a “virtual” camera relative to the reference camera as well as the declaration of the “virtual” camera’s intrinsic parameters. Then, if the depth values of the corresponding 3D space points are known for every pixel of the original image, the “virtual” view can be synthesized by applying Eq. (5) to all original image points;” the mesh is read as forming the virtual view (i.e., piecing together the images to form a view), which is updated as the position and orientation are updated).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Iwane as applied to claim 1 above, and further in view of a machine translation of JP 2015079490 to Leung (hereinafter Leung). 
Regarding dependent claim 12, the rejection of claim 1 is incorporated herein. Iwane is silent on wherein the camera is configured to take images of the real space based on a predetermined frame rate, and wherein the at least one processor is configured to execute processing based on one of the taken images that is taken in the same frame.
However, Leung discloses wherein the camera is configured to take images of the real space based on a predetermined frame rate (Page 8, “the moving camera 120 is a digital video camera that continuously captures a frame group (or image group) representing the scene 110 in a three-dimensional (3D) space at a predetermined frame rate”), and wherein the at least one processor is configured to execute processing based on one of the taken images that is taken in the same frame (Page 12, “The model with the lowest score is the best fit. In the homography model, the first key frame and the second key frame are assumed to be images of the same plane, and when the displacement of the moving camera 120 is small, the homography model is fitted by a set of corresponding image features;” for one image to have features corresponding to another image (especially for moving images), both images would likely be from similar frames; otherwise, there would be large discrepancies).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
JP 2015524969 A discloses a method for automatically constructing, merging, and scaling feature points from images taken from multiple unknown camera positions.

Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Courtney J. Nelson whose telephone number is (571)272-3956. The examiner can normally be reached Monday - Friday 8:00 - 4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/COURTNEY JOAN NELSON/Examiner, Art Unit 2668
/VU LE/Supervisory Patent Examiner, Art Unit 2668