DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “a capture portion”, “an image acquisition portion”, “a viewpoint conversion map generation portion”, “a feature extraction process portion”, “an output process portion”, “an image display portion”, and “movement control portion” in claim 1 or claims 5-9; and “a memory device”, “a calculation circuit”, “a teacher data setting portion”, “a learning portion”, “an object identification apparatus”, “an image acquisition portion”, and “a calculation device” in claim 3 or claims 15-16.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 10-11 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Schulter et al (arXiv:1803.10870v1 2018).
-Regarding claim 10, Schulter discloses an object identification method comprising (Abstract; Figs. 1-9): inputting data of a capture image of an object captured from a capture viewpoint to a convolutional neural network (Abstract, “perspective view”; FIG. 1); applying the data of the capture image to convolution calculation; extracting a feature map in a first coordinate system based on the capture viewpoint (FIGS. 1-2; Page 5, 1st and 2nd paragraphs, section 3.1; Page 13, 2nd paragraph); applying a warp function to the feature map (FIGS. 1, 3-5; Table 3; Page 9, Table 3; 1st paragraph), the warp function relating a position in a second coordinate system based on a different viewpoint from the capture viewpoint to a position in the first coordinate system (Abstract, “perspective view … top view”; FIGS. 1, 3-5; Table 3; Page 9, Table 3; 1st paragraph; Page 4, 1st paragraph; Page 6, section 3.2); and obtaining a viewpoint conversion map in which the data of the capture image is converted from the capture viewpoint to the different viewpoint based on the feature map to which the warp function is applied and the object is identified (Abstract; FIGS. 1, 3-5; Table 3; Page 7, section 3.2, 1st paragraph; Supplemental Material: page 4, section 4, Fig. 2).
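For illustration only, the sequence of steps recited in claim 10 (convolution calculation, feature map extraction, and application of a warp function that relates a position in the second coordinate system to a position in the first coordinate system) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Schulter or from the application: the function names, the single averaging kernel, the nearest-neighbor sampling, and the 3x3 matrix `H_OUT_TO_IN` are all hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 2-D cross-correlation without padding, standing in for one CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def warp_feature_map(feature, h_out_to_in, out_shape):
    """Inverse warp: each cell of the output (second coordinate system) looks up
    its source position in the input feature map (first coordinate system)."""
    out_h, out_w = out_shape
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # homogeneous (x, y, 1)
    src = h_out_to_in @ pts
    sx = np.rint(src[0] / src[2]).astype(int)  # nearest-neighbor sampling (assumption)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < feature.shape[1]) & (sy >= 0) & (sy < feature.shape[0])
    out = np.zeros(out_h * out_w)  # cells with no source position stay at zero
    out[valid] = feature[sy[valid], sx[valid]]
    return out.reshape(out_shape)

# Hypothetical inputs: a 6x6 "capture image" and a 2x2 averaging kernel.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.full((2, 2), 0.25)
feature = conv2d_valid(image, kernel)  # 5x5 feature map, first coordinate system

# Hypothetical warp: output cell (x, y) reads input position (x + 1, y + 2).
H_OUT_TO_IN = np.array([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 2.0],
                        [0.0, 0.0, 1.0]])
top_view = warp_feature_map(feature, H_OUT_TO_IN, (4, 4))
```

The warp is written in the inverse direction (output position to input position) because that is the direction the claim language recites; a real perspective-to-top-view mapping would use a full homography rather than the pure translation assumed here.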
-Regarding claim 11, Schulter discloses an object model learning method comprising (Abstract; Figs. 1-9): in an object identification model forming a convolutional neural network and a warp structure warping a feature map extracted in the convolutional neural network to a different coordinate system (Abstract; Figs. 1, 3-5; Table 3; Page 7, section 3.2, 1st paragraph), preparing, in the warp structure, a warp parameter for relating a position in the different coordinate system to a position in a coordinate system before warp (Fig. 5; Page 9, 1st – 2nd paragraphs; Page 8, 1st paragraph); and learning the warp parameter to input a capture image in which an object is captured to the object identification model (Page 3, 1st paragraph; Page 9, 1st – 2nd paragraphs, “parameterized by θ … hyperparameters”) and output a viewpoint conversion map in which the object is identified in the different coordinate system (Abstract; FIGS. 1, 3-5; Table 3; Page 7, section 3.2, 1st paragraph; Supplemental Material: page 4, section 4, Fig. 2).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4, 6-9 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Nagata et al (US PG-PUB No. 20190039614 A1) in view of Hayakawa et al (US PG-PUB No. 20140146176 A1), and further in view of Schulter et al (arXiv:1803.10870v1 2018).
	-Regarding claim 1, Nagata discloses an object identification apparatus (FIG. 1) that is communicably connected to a capture portion (FIG. 1, camera sensor) mounted on a moving body ([0003], “mounted in the vehicle”) and is configured to identify an object in an outside of the moving body (Abstract; [0045]; [0047]-[0048]), the object identification apparatus comprising (Abstract; FIGS. 1-10): an image acquisition portion configured to acquire an image of the outside captured by the capture portion from a predetermined capture viewpoint (FIGS. 1-5; [0047]; FIG.6, S4, S5; FIG. 8).
	Nagata does disclose an object recognition unit (FIG. 1, unit 12) recognizing peripheral objects by using information of the camera sensor ([0047]) and calculating the position coordinates of the recognized objects in a reference coordinate system ([0047]-[0048]), and a display unit displaying image information for a driver (FIG. 1 HMI 8; [0043]). Nagata does not disclose a viewpoint conversion map generation portion that forms a convolutional neural network configured to receive data of the image acquired by the image acquisition portion and is configured to output a viewpoint conversion map obtained by converting the image into a different viewpoint from the capture viewpoint.
	In the same field of endeavor, Hayakawa teaches a moving body detecting method and device (Hayakawa: Abstract; FIGS. 1-18). Hayakawa further teaches a viewpoint conversion unit performing viewpoint conversion of the captured images into bird's-eye view images (Hayakawa: Abstract; [0040]; FIGS. 3-4, unit 31).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nagata with the teaching of Hayakawa by using a viewpoint conversion unit in order to perform viewpoint conversion to bird's-eye image data and align object positions associated with images taken from different times or viewpoints.
Nagata in view of Hayakawa does not disclose a viewpoint conversion unit that forms a convolutional neural network configured to receive image data.
However, Schulter is an analogous art pertinent to the problem to be solved in this application and teaches a convolutional neural network that learns to predict occluded portions of the scene layout by looking around foreground objects like cars or pedestrians (Schulter: Abstract; Figs. 1-9).
Schulter further teaches a viewpoint conversion map generation portion that forms a convolutional neural network configured to receive data of the image acquired by the image acquisition portion (Schulter: Fig. 1, “CNN”; FIGS. 2, 4) and is configured to output a viewpoint conversion map obtained by converting the image into a different viewpoint from the capture viewpoint via the convolutional neural network (Schulter: Fig. 1, “bird’s eye view”, footnote, “top-view”; Page 4, 1st paragraph, “BEV representation”; Page 6, Section 3.2), wherein: the viewpoint conversion map generation portion includes a feature extraction process portion configured to apply convolution calculation by the convolutional neural network to the data of the image and extract a feature map of the object in a first coordinate system based on the capture viewpoint (Schulter: Fig. 1; Page 6, Section 3.2; Figs. 3-5) and an output process portion configured to apply a warp function to the feature map extracted by the feature extraction process portion, the warp function relating a position in a second coordinate system based on the different viewpoint to a position in the first coordinate system and output the viewpoint conversion map in which the object in an area of the second coordinate system is identified (Schulter: Figs. 1, 3-5; Table 3; Page 3, 1st paragraph, “aligns … a variant of spatial transformer network”; Page 9, 1st paragraph; Page 14, 2nd paragraph). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Nagata in view of Hayakawa with the teaching of Schulter by using a convolutional neural network and a warp function in order to improve the performance of alignment and be able to predict the occluded portions of the scene layout.
-Regarding claim 2, the combination further discloses the second coordinate system is a coordinate system of a two-dimensional space having a movable direction of the moving body (Schulter: Page 6, section 3.2; Fig. 3).
-Regarding claim 4, the combination further discloses wherein: the output process portion includes a warp application portion configured to apply the warp function to the feature map extracted by the feature extraction process portion and an identification process portion configured to concatenate the feature map to which the warp function is applied with the convolutional neural network and output the viewpoint conversion map in which the object in the area of the second coordinate system is identified (Schulter: Abstract; Figs. 1-5).
-Regarding claim 6, the combination further discloses wherein: the feature extraction process portion is configured to output the feature map in which the object in an area of the first coordinate system is identified; and the output process portion is configured to apply the warp function to the feature map in which the object in the area of the first coordinate system is identified and output the feature map as the viewpoint conversion map (Schulter: Abstract; Figs. 1-5).
-Regarding claim 17, the combination further discloses wherein: the capture portion corresponds to a camera; and the image acquisition portion and the viewpoint conversion map generation portion correspond to a processor (Nagata: FIG. 1).
-Regarding claim 7, Nagata discloses a moving body system for a moving body, the moving body system comprising (Abstract; FIGS. 1-10): a capture portion that is mounted on the moving body (FIG. 1, camera sensor, [0003], “mounted in the vehicle”) and is configured to capture an outside of the moving body from a predetermined capture viewpoint (FIGS. 1-5; [0047]; FIG. 6, S4, S5; FIG. 8) and generate an image ([0047]); and an object identification apparatus that is communicably connected to the capture portion and is configured to identify an object in the outside of the moving body (Abstract; FIGS. 1, 8, 10; [0045]; [0047]-[0048]).
	Nagata does disclose an object recognition unit (FIG. 1, unit 12) recognizing peripheral objects by using information of the camera sensor ([0047]) and calculating the position coordinates of the recognized objects in a reference coordinate system ([0047]-[0048]), and a display unit displaying image information for a driver (FIG. 1 HMI 8; [0043]). Nagata does not disclose the object identification apparatus including a viewpoint conversion map generation portion that forms a convolutional neural network configured to receive data of the image acquired by the image acquisition portion and is configured to output a viewpoint conversion map obtained by converting the image into a different viewpoint from the capture viewpoint.
	In the same field of endeavor, Hayakawa teaches a moving body detecting method and device (Hayakawa: Abstract; FIGS. 1-18). Hayakawa further teaches a viewpoint conversion unit performing viewpoint conversion of the captured images into bird's-eye view images (Hayakawa: Abstract; [0040]; FIGS. 3-4, unit 31).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nagata with the teaching of Hayakawa by using a viewpoint conversion unit in order to perform viewpoint conversion to bird's-eye image data and align object positions associated with images taken from different times or viewpoints.
Nagata in view of Hayakawa does not disclose a viewpoint conversion unit that forms a convolutional neural network configured to receive image data.
However, Schulter is an analogous art pertinent to the problem to be solved in this application and teaches a convolutional neural network that learns to predict occluded portions of the scene layout by looking around foreground objects like cars or pedestrians (Schulter: Abstract; Figs. 1-9).
Schulter further teaches a viewpoint conversion map generation portion that forms a convolutional neural network configured to receive data of the image acquired by the image acquisition portion (Schulter: Fig. 1, “CNN”; FIGS. 2, 4) and is configured to output a viewpoint conversion map obtained by converting the image into a different viewpoint from the capture viewpoint via the convolutional neural network (Schulter: Fig. 1, “bird’s eye view”, footnote, “top-view”; Page 4, 1st paragraph, “BEV representation”; Page 6, Section 3.2), wherein: the viewpoint conversion map generation portion includes a feature extraction process portion configured to apply convolution calculation by the convolutional neural network to the data of the image and extract a feature map of the object in a first coordinate system based on the capture viewpoint (Schulter: Fig. 1; Page 6, Section 3.2; Figs. 3-5) and an output process portion configured to apply a warp function to the feature map extracted by the feature extraction process portion, the warp function relating a position in a second coordinate system based on the different viewpoint to a position in the first coordinate system and output the viewpoint conversion map in which the object in an area of the second coordinate system is identified (Schulter: Figs. 1, 3-5; Table 3; Page 3, 1st paragraph, “aligns … a variant of spatial transformer network”; Page 9, 1st paragraph; Page 14, 2nd paragraph). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Nagata in view of Hayakawa with the teaching of Schulter by using a convolutional neural network and a warp function in order to improve the performance of alignment and be able to predict the occluded portions of the scene layout.
-Regarding claim 8, the combination further discloses an image display portion configured to display an image obtained by visualizing the viewpoint conversion map (Nagata: FIG. 1).
-Regarding claim 9, the combination further discloses a movement control portion configured to control movement of the moving body by using the viewpoint conversion map (Nagata: FIG. 1).
-Regarding claim 18, the combination further discloses wherein: the capture portion corresponds to a camera; and the image acquisition portion and the viewpoint conversion map generation portion correspond to a processor (Nagata: FIG. 1).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Schulter et al (arXiv:1803.10870v1 2018) in view of Jaderberg et al (NIPS 2015).
-Regarding claim 12, Schulter discloses before the learning, preparing a set of data of the capture image in which the object is captured and data that is correct answer data corresponding to the data of the capture image and is data of the viewpoint conversion map in which the object is identified (Tables 1-3; Page 6, 3rd paragraph, “ground truth”; Page 7, section 3.3; Page 10, 1st paragraph; Page 9, section 4), wherein: in the learning, when the data of the capture image is input to the object identification model, the warp parameter is learned so that data closer to the correct answer data is output (Figs. 1, 4; Page 3, 1st paragraph; Page 9, 1st – 2nd paragraphs, “parameterized by θ … hyperparameters”).
Schulter does not disclose a kernel parameter for the convolutional neural network. Schulter does not disclose that the kernel parameter and the warp parameter are simultaneously learned.
In the same field of endeavor, Jaderberg teaches a learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within a convolutional neural network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps (Jaderberg: Abstract; Figure 3; Page 3, section 3). Jaderberg further teaches kernel parameters for the convolutional neural network, and that the kernel parameter and the warp parameter are simultaneously learned (Jaderberg: Page 3, sections 3.1-3.2; Page 4, section 3.3; equations (1)-(7); Page 5, section 3.4, 1st – 4th paragraphs).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Schulter with the teaching of Jaderberg by training the kernel parameter and the warp parameter simultaneously in order to provide the convolutional neural network the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimization process.
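For illustration only, simultaneous learning of a kernel parameter and a warp parameter against correct answer data can be sketched on a one-dimensional signal. This is a minimal sketch under stated assumptions and does not reproduce the training procedure of Schulter or Jaderberg: the 3-tap kernel, the single scalar `shift` used as the warp parameter, the linear-interpolation sampler, the finite-difference gradient (substituted here for backpropagation), and the learning rate are all hypothetical.

```python
import numpy as np

def forward(x, kernel, shift):
    """Convolve x with `kernel`, then resample the feature map at positions u + shift."""
    feat = np.convolve(x, kernel, mode="same")
    grid = np.arange(len(x))
    return np.interp(grid + shift, grid, feat)  # linear-interpolation warp (assumption)

def loss(params, x, target):
    """Mean squared error between the model output and the correct answer data."""
    kernel, shift = params[:3], params[3]
    return np.mean((forward(x, kernel, shift) - target) ** 2)

def numeric_grad(f, p, eps=1e-5):
    """Central finite differences, a stand-in for backpropagation."""
    g = np.zeros_like(p)
    for i in range(len(p)):
        d = np.zeros_like(p)
        d[i] = eps
        g[i] = (f(p + d) - f(p - d)) / (2 * eps)
    return g

# Hypothetical training pair: the target is produced by a known kernel and shift.
x = np.sin(np.linspace(0.0, 6.0 * np.pi, 64))
true_params = np.array([0.25, 0.5, 0.25, 2.0])
target = forward(x, true_params[:3], true_params[3])

# params[:3] is the kernel parameter, params[3] is the warp parameter.
params = np.array([0.0, 1.0, 0.0, 0.0])
objective = lambda p: loss(p, x, target)
initial_loss = objective(params)
for _ in range(200):
    # One update step changes the kernel and the warp parameter together.
    params = params - 0.1 * numeric_grad(objective, params)
final_loss = objective(params)
```

Both parameter groups sit in one vector and receive one gradient step per iteration, which is the sense of "simultaneously learned" assumed in this sketch; a practical system would use automatic differentiation and a two-dimensional sampling grid.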
Claims 3, 15-16 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Nagata et al (US PG-PUB No. 20190039614 A1) in view of Hayakawa et al (US PG-PUB No. 20140146176 A1), and further in view of Schulter et al (arXiv:1803.10870v1 2018), in view of Jaderberg et al (NIPS 2015).
-Regarding claim 3, Nagata in view of Hayakawa, and further in view of Schulter discloses the apparatus of claim 1. 
Nagata in view of Hayakawa, and further in view of Schulter discloses an object identification apparatus comprising (Nagata: Abstract; FIGS. 1-10): a calculation circuit configured to execute a calculation process of the object identification model (Nagata: FIG. 1, device 10, unit 12), a control parameter setting unit (Nagata: FIG. 1, unit 15), and a memory (ROM) that stores various data including various programs and maps (Nagata: [0044]-[0045]).
Nagata in view of Hayakawa, and further in view of Schulter does not disclose a kernel parameter for the convolutional neural network. Nagata in view of Hayakawa, and further in view of Schulter does not disclose that the kernel parameter and the warp parameter are simultaneously learned.
However, Jaderberg is an analogous art pertinent to the problem to be solved in this application and teaches a learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within a convolutional neural network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps (Jaderberg: Abstract; Figure 3; Page 3, section 3). Jaderberg further teaches kernel parameters for the convolutional neural network, and that the kernel parameter and the warp parameter are simultaneously learned (Jaderberg: Page 3, sections 3.1-3.2; Page 4, section 3.3; equations (1)-(7); Page 5, section 3.4, 1st – 4th paragraphs).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Nagata in view of Hayakawa, and further in view of Schulter with the teaching of Jaderberg by training the kernel parameter and the warp parameter simultaneously in order to provide the convolutional neural network the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimization process. As a common practice, the modification also provides processors, memories, and other devices to implement a convolutional neural network-based object detection model, a warp function, and a learning algorithm, to store the kernel parameter and warp parameters, and to process the convolution calculation using the kernel parameter and the calculation that uses the warp parameter and applies the warp function.
-Regarding claim 15, Nagata discloses an object identification apparatus comprising (Abstract; FIGS. 1-10): a calculation circuit configured to execute a calculation process of the object identification model (FIG. 1, device 10, unit 12), a control parameter setting unit (FIG. 1, unit 15), a memory (ROM) that stores various data including various programs and maps ([0044]-[0045]), a capture portion that is mounted on the moving body (FIG. 1, camera sensor, [0003], “mounted in the vehicle”) and is configured to capture an outside of the moving body from a predetermined capture viewpoint (FIGS. 1-5; [0047]; FIG. 6, S4, S5; FIG. 8) and generate an image ([0047]); and an object identification apparatus that is communicably connected to the capture portion and is configured to identify an object in the outside of the moving body (Abstract; FIGS. 1, 8, 10; [0045]; [0047]-[0048]).
Nagata does disclose an object recognition unit (FIG. 1, unit 12) recognizing peripheral objects by using information of the camera sensor ([0047]) and calculating the position coordinates of the recognized objects in a reference coordinate system ([0047]-[0048]), and a display unit displaying image information for a driver (FIG. 1 HMI 8; [0043]).
Nagata does not disclose the object identification apparatus including a viewpoint conversion map generation portion that forms a convolutional neural network configured to receive data of the image acquired by the image acquisition portion and is configured to output a viewpoint conversion map obtained by converting the image into a different viewpoint from the capture viewpoint. Nagata does not disclose a learning portion configured to learn a kernel parameter for a kernel of the convolutional neural network and a warp parameter for the warp structure to output data closer to the correct data when the capture image is input to the object identification model.
In the same field of endeavor, Hayakawa teaches a moving body detecting method and device (Hayakawa: Abstract; FIGS. 1-18). Hayakawa further teaches a viewpoint conversion unit performing viewpoint conversion of the captured images into bird's-eye view images (Hayakawa: Abstract; [0040]; FIGS. 3-4, unit 31).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nagata with the teaching of Hayakawa by using a viewpoint conversion unit in order to perform viewpoint conversion to bird's-eye image data and align object positions associated with images taken from different times or viewpoints.
Nagata in view of Hayakawa does not disclose a viewpoint conversion unit that forms a convolutional neural network configured to receive image data.
However, Schulter is an analogous art pertinent to the problem to be solved in this application and teaches a convolutional neural network that learns to predict occluded portions of the scene layout by looking around foreground objects like cars or pedestrians (Schulter: Abstract; Figs. 1-9).
Schulter further teaches a viewpoint conversion map generation portion that forms a convolutional neural network configured to receive data of the image acquired by the image acquisition portion (Schulter: Fig. 1, “CNN”; FIGS. 2, 4) and is configured to output a viewpoint conversion map obtained by converting the image into a different viewpoint from the capture viewpoint via the convolutional neural network (Schulter: Fig. 1, “bird’s eye view”, footnote, “top-view”; Page 4, 1st paragraph, “BEV representation”; Page 6, Section 3.2); Schulter teaches a warp structure that warps a feature map extracted in the convolutional neural network to a different coordinate system (Schulter: Figs. 1, 3-5); Schulter teaches data sets of a capture image of an object captured from a capture viewpoint and an output map, as correct answer data, in which the object is identified in a coordinate system based on a different viewpoint from the capture viewpoint (Schulter: Tables 1-3; Page 6, 3rd paragraph, “ground truth”; Page 7, section 3.3; Page 10, 1st paragraph; Page 9, section 4); Schulter teaches a learning method to learn a warp parameter for the warp structure to output data closer to the correct data when the capture image is input to the object identification model (Schulter: Figs. 1, 4; Page 3, 1st paragraph; Page 9, 1st – 2nd paragraphs, “parameterized by θ … hyperparameters”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Nagata in view of Hayakawa with the teaching of Schulter by using a convolutional neural network and a warp function in order to improve the performance of alignment and be able to predict the occluded portions of the scene layout.
Nagata in view of Hayakawa, and further in view of Schulter does not disclose a kernel parameter for the convolutional neural network. Nagata in view of Hayakawa, and further in view of Schulter does not disclose that the kernel parameter and the warp parameter are simultaneously learned.
However, Jaderberg is an analogous art pertinent to the problem to be solved in this application and teaches a learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within a convolutional neural network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps (Jaderberg: Abstract; Figure 3; Page 3, section 3). Jaderberg further teaches kernel parameters for the convolutional neural network, and that the kernel parameter and the warp parameter are simultaneously learned (Jaderberg: Page 3, sections 3.1-3.2; Page 4, section 3.3; equations (1)-(7); Page 5, section 3.4, 1st – 4th paragraphs).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Nagata in view of Hayakawa, and further in view of Schulter with the teaching of Jaderberg by training the kernel parameter and the warp parameter simultaneously in order to provide the convolutional neural network the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimization process. As a common practice, the modification also provides processors, memories, and other devices to implement a convolutional neural network-based object detection model, a warp function, and a learning algorithm, and to store the kernel parameter and warp parameters.
-Regarding claim 16, Nagata discloses an object identification apparatus that is communicably connected to a camera and is configured to identify an object in the outside of the moving body, the object identification apparatus comprising (Abstract; FIGS. 1-10; [0045]; [0047]-[0048]): an image acquisition portion that is connected to the camera and is configured to acquire an image of the outside captured by the camera (FIG. 1, camera sensor, [0003], “mounted in the vehicle”; [0047]; FIGS. 4-6, 8); a calculation circuit configured to execute a calculation process of the object identification model (FIG. 1, device 10, unit 12); a control parameter setting unit (FIG. 1, unit 15); and a memory (ROM) that stores various data including various programs and maps ([0044]-[0045]).
Nagata does disclose an object recognition unit (FIG. 1, unit 12) that recognizes peripheral objects by using information from the camera sensor ([0047]) and calculates the position coordinates of the recognized objects in a reference coordinate system ([0047]-[0048]), and a display unit that displays image information for a driver (FIG. 1, HMI 8; [0043]).
Nagata does not disclose that the object identification apparatus forms a convolutional neural network including an encoder that includes a plurality of feature amount extraction units and a decoder. Nagata does not disclose a warp function, a warp parameter, or a kernel parameter. Nagata does not disclose generating a viewpoint conversion map whose viewpoint is converted into a viewpoint different from the viewpoint captured by the camera.
In the same field of endeavor, Hayakawa teaches a moving body detecting method and device (Hayakawa: Abstract; FIGS. 1-18). Hayakawa further teaches a viewpoint conversion unit performing viewpoint conversion of the captured images into bird's-eye view images (Hayakawa: Abstract; [0040]; FIGS. 3-4, unit 31).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nagata with the teaching of Hayakawa by using a viewpoint conversion unit in order to perform viewpoint conversion to bird's-eye image data and align object positions associated with images taken at different times or from different viewpoints.
Nagata in view of Hayakawa does not disclose a viewpoint conversion unit that forms a convolutional neural network configured to receive image data.
However, Schulter is analogous art pertinent to the problem to be solved in this application and teaches a convolutional neural network that learns to predict occluded portions of the scene layout by looking around foreground objects like cars or pedestrians (Schulter: Abstract; Figs. 1-9).
Schulter further teaches a viewpoint conversion map generation portion that forms a convolutional neural network configured to receive data of the image acquired by the image acquisition portion (Schulter: Fig. 1, “CNN”; Figs. 2, 4) and is configured to output a viewpoint conversion map obtained by converting the image into a different viewpoint from the capture viewpoint via the convolutional neural network (Schulter: Fig. 1, “bird’s eye view”, footnote, “top-view”; Page 4, 1st paragraph, “BEV representation”; Page 6, Section 3.2).
Schulter teaches a convolutional neural network including an encoder portion and a decoder portion, and causing the encoder portion to extract a feature map of a feature amount of the object from data of the image (Schulter: Figs. 1-2, 4; Page 7, section 3.3, 1st paragraph). Schulter also teaches generating a plurality of warp functions and applying the plurality of warp functions to the feature map (Schulter: Figs. 1, 3-5; Table 3; Page 3, 1st paragraph, “aligns … a variant of spatial transformer network”; Page 9, 1st paragraph; Page 14, 2nd paragraph).
Schulter further teaches data sets of a capture image of an object captured from a capture viewpoint and an output map, as correct answer data, in which the object is identified in a coordinate system based on a different viewpoint from the capture viewpoint (Schulter: Tables 1-3; Page 6, 3rd paragraph, “ground truth”; Page 7, section 3.3; Page 10, 1st paragraph; Page 9, section 4); Schulter teaches a learning method to learn a warp parameter for the warp structure to output data closer to the correct data when the capture image is input to the object identification model (Schulter: Figs. 1, 4; Page 3, 1st paragraph; Page 9, 1st – 2nd paragraphs, “parameterized by θ … hyperparameters”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Nagata in view of Hayakawa with the teaching of Schulter by using a convolutional neural network and a warp function in order to improve alignment performance and to predict the occluded portions of the scene layout.
Nagata in view of Hayakawa, and further in view of Schulter, does not disclose a kernel parameter for the convolutional neural network, and does not disclose that the kernel parameter and the warp parameter are simultaneously learned.
However, Jaderberg is analogous art pertinent to the problem to be solved in this application and teaches a learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within a convolutional neural network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps (Jaderberg: Abstract; Figure 3; Page 3, section 3). Jaderberg further teaches kernel parameters for the convolutional neural network, and that the kernel parameter and the warp parameter are simultaneously learned (Jaderberg: Page 3, sections 3.1-3.2; Page 4, section 3.3; equations (1)-(7); Page 5, section 3.4, 1st – 4th paragraphs).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Nagata in view of Hayakawa, and further in view of Schulter, with the teaching of Jaderberg by training the kernel parameter and the warp parameter simultaneously in order to provide the convolutional neural network the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimization process. As a common practice, the modification also provides processors, memories, and other devices to implement a convolutional neural network-based object detection model, a warp function, and a learning algorithm, and to store the kernel parameter, the warp parameters, and other learning parameters. A person of ordinary skill in the art would understand that the parameters must be read from the memory during the learning process or object detection.
-Regarding claim 19, Nagata in view of Hayakawa, and further in view of Schulter and Jaderberg, discloses the apparatus of claim 15.
The modification further discloses wherein: the teacher data setting portion and the learning portion correspond to a processor (Nagata: FIG. 1).
-Regarding claim 20, Nagata in view of Hayakawa, and further in view of Schulter and Jaderberg, discloses the apparatus of claim 16.
The modification further discloses wherein: the calculation device, the image acquisition portion, the encoder portion, the plurality of feature amount extraction units, the decoder portion, and the plurality of identification units correspond to a processor (Nagata: FIG. 1).
Allowable Subject Matter
Claims 5 and 13-14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAO LIU, whose telephone number is (571) 272-4539. The examiner can normally be reached Monday-Thursday and alternate Fridays, 8:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung, can be reached at (571) 272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/XIAO LIU/ Examiner, Art Unit 2664
/NANCY BITAR/ Primary Examiner, Art Unit 2664