DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 02 December 2020 is being considered by the examiner.


Claim Objections
Claims 2, 5, 11, and 20 are objected to because of the following informalities:  
In claim 2, line 3, “data generator includes” should read “data generator further includes”
In claim 5, lines 18-19, “by which to rotate the person region, using the mask of the learning data, and extracting a person region” should read “by which to rotate the person region, using the mask of the learning data, extracting a person region”
In claim 11, line 3, “a joint position acquirer configured to that acquire a joint position” should read “a joint position acquirer configured to acquire a joint position”
In claim 20, line 2, “wherein the method further comprising” should read “wherein the method further comprises”
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
“composited learning data generator” in claims 1-2, 5, and 7
“learner” in claims 1, 4, 7-9, 12-15, and 17-19
“compositing parameter generator” in claims 1 and 7
“compositing mask generator” in claims 1-3, 5-7, 10-11, and 16
“composited image generator” in claims 2, 6, and 10
“joint position acquirer” in claims 3 and 11
“person region generator” in claims 3 and 11
“output generator” in claims 3 and 11
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 7 and 10-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claims do not fall within at least one of the four categories of patent eligible subject matter because they recite a “program” which reads as a functionally claimed program, thus constituting a program per se. See MPEP § 2106 non-limiting example vi: a computer program per se, (citing Gottschalk v. Benson, 409 U.S. 63, 72 (1972)). Computer programs per se are not in one of the statutory categories of invention because a computer program is merely a set of instructions capable of being executed by a computer; the computer program itself is not a process.
The Examiner suggests that Applicant amend the claims to read “A non-transitory computer readable recording medium storing a program …” to overcome this rejection. 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claims 1, 5, and 7 recite the limitation "(hereinafter referred to as compositing mask)". It is unclear what "(hereinafter referred to as compositing mask)" is referring to – whether it is “composited learning data”, “a set of a composited image and a mask indicating a person region in the composited image”, “a mask indicating a person region in the composite image” or a redefinition of “composited image”. In light of paragraph [0040] and figures 9 and 10 in the specification, applicant appears to refer to a mask for the composite image that indicates the person region and will be examined as best understood. Dependent claims 2-4, 6, and 8-20 are also rejected for the same reason. 
Claims 1-2, 5, 7, and 10 recite the limitation “(hereinafter referred to as compositing person region)”. It is unclear what “(hereinafter referred to as compositing person region)” is referring to – whether it is “a person region”, “a person region from an image in the learning data”, or “learning data”. In light of paragraph [0022] in the specification, applicant appears to refer to the person region of the composited image, and the limitation will be examined as best understood. Dependent claims 2-4, 6, and 8-20 are also rejected for the same reason. 
Claims 4, 8-9, 12-14, and 17-18 recite the formula “$L_W(p) = a\,(l + M(p))\,L(P)$”. It is unclear what $l$ is referring to – whether it is another predetermined positive value, a predetermined negative value, or another expression. In light of a lack of information in the specification, it is understood that $l$ is any possible value, including 0, or function, and will be examined as best understood. Dependent claims 2-4, 6, 8-20 are also rejected for the same reason. 
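The effect of leaving the term l unspecified can be sketched numerically. The following is a minimal illustration only, assuming that M(p) and L(P) each evaluate to a plain number for a given input; the function name `weighted_loss` and its parameter names are hypothetical and appear neither in the claims nor in the specification.

```python
def weighted_loss(a, l, m_p, base_loss):
    """Evaluate a * (l + M(p)) * L(P) for scalar inputs.

    a         -- the coefficient written as "a" in the recited formula
    l         -- the unspecified term at issue in the rejection
    m_p       -- assumed numeric value of M(p)
    base_loss -- assumed numeric value of L(P)
    """
    return a * (l + m_p) * base_loss


# With l = 0 the expression reduces to a * M(p) * L(P).
loss_l_zero = weighted_loss(a=2.0, l=0.0, m_p=0.5, base_loss=4.0)

# With l = 1 the same inputs give a different value.
loss_l_one = weighted_loss(a=2.0, l=1.0, m_p=0.5, base_loss=4.0)
```

Because the two calls differ only in l and return different results, the scope of the recited formula changes with the value chosen for l, which is consistent with the interpretation stated above that l may be any value, including 0.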
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA  35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claims 15 and 19 are rejected under 35 U.S.C. 112(d) or pre-AIA  35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. Claim 15 depends on claim 12 and repeats the exact same language used in said claim. Therefore, claim 15 is improper as it does not further limit the subject matter of claim 12.  Claim 19 depends on claim 17 and repeats the exact same language used in said claim. Therefore, claim 19 is improper as it does not further limit the subject matter of claim 17. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 5-7, and 10 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Kawai (US 2021/0117731 A1).
Regarding claim 1, Kawai teaches a region extraction model learning device,
comprising:
	a composited learning data generator configured to generate, from already-existing learning data that is a set of (Kawai, [0054], “[0054] The learning generator apparatus 100 according to the present example embodiment executes a process in the same manner as that of the first example embodiment, but the processing content is specified. This will be described below.”):
an image including a person region and a mask indicating the person region (Kawai, [0066]: “[0066] In the object continuous image DB, one or the plurality of object continuous images, person position information indicating a position or an area of a person in each of a plurality of object still images included in each object continuous image, information on a frame rate of each object continuous image are stored. The person position information is, for example, a silhouette image described in the first example embodiment, but is not limited to this.”; Figure 6), 
and a background image to serve as a background of a composited image (Kawai, [0057], "[0057] The background image acquisition unit 111 acquires a background image. The background camera posture information acquisition unit 112 acquires posture information of a background camera which generates a background image acquired by the background image acquisition unit 111 when the background image is generated."; Kawai, [0059], "[0059] In this example, the learning image generation apparatus 100 has a background image database (hereinafter, referred to as "background image DB") which stores one or a plurality of background images. The background image stored in the background image DB may include any image, and a publicly available image DB may be used. Further, the background image DB stores posture information of the background camera which generates each background image when each background image is generated.”), 
composited learning data that is a set of a composited image and a mask indicating a person region in the composited image (hereinafter referred to as compositing mask) (Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.”); 
and a learner configured to learn model parameters using the composited learning data (Kawai, [0052], "[0052] The learning image generation apparatus 100 according to the present example embodiment described above makes it possible to generate a learning image which is a learning image used for machine learning and is configured with a continuous image. Further, the learning image generation apparatus 100 according to the present example embodiment, which determines the synthesis position of the object based on the posture information of the background camera which generates the background image, enables a movement distance of the object shown in the synthesis continuous image and the like to be natural."),
wherein the composited learning data generator includes:
a compositing parameter generator configured to generate compositing parameters that are a set of: 
an enlargement factor whereby a person region is enlarged/reduced (Kawai, [0091], "[0091] Note that, the image synthesis unit 131 may adjust (enlarge/reduce) a size of the image indicating the person cut out from the object still image, and then synthesize the image on the background image. The image synthesis unit 131 can adjust a size of the cut image based on the synthesis position, the height of the person, and the posture information of the background camera so that the person on the background image has a natural size. For example, the size of the cut image can be adjusted so that the height of the person acquired by converting coordinates of a vertex of a head and coordinates of a foot of the person on the background image into coordinates of the real space becomes a predetermined height. The predetermined height may be a height of each person indicated by the height information described above, or may be an average height of any group or the like."; Kawai, [0100], "[0100] The object camera posture information acquisition unit 114 acquires posture information of an object camera which generates an object continuous image when the image is generated. For example, with camera calibration, internal parameters (a focal length, image center coordinates, a distortion coefficient, and the like) or external parameters (a rotation matrix, a translation vector, and the like) of the object camera when the object continuous image is generated are computed. The object camera posture information acquisition unit 114 acquires the internal parameters or the external parameters as posture information. Note that, the posture information of the object camera which generates each object continuous image when the image is generated may be registered in the object continuous image DB described in the second example embodiment. The object camera posture information acquisition unit 114 may acquire the posture information of the object camera from the object continuous image DB.”),
a degree of translation by which to translate the person region (Kawai, [0100], "[0100] The object camera posture information acquisition unit 114 acquires posture information of an object camera which generates an object continuous image when the image is generated. For example, with camera calibration, internal parameters (a focal length, image center coordinates, a distortion coefficient, and the like) or external parameters (a rotation matrix, a translation vector, and the like) of the object camera when the object continuous image is generated are computed. The object camera posture information acquisition unit 114 acquires the internal parameters or the external parameters as posture information. Note that, the posture information of the object camera which generates each object continuous image when the image is generated may be registered in the object continuous image DB described in the second example embodiment. The object camera posture information acquisition unit 114 may acquire the posture information of the object camera from the object continuous image DB.”),
and a degree of rotation by which to rotate the person region, using the mask of the learning data (Kawai, [0100], "[0100] The object camera posture information acquisition unit 114 acquires posture information of an object camera which generates an object continuous image when the image is generated. For example, with camera calibration, internal parameters (a focal length, image center coordinates, a distortion coefficient, and the like) or external parameters (a rotation matrix, a translation vector, and the like) of the object camera when the object continuous image is generated are computed. The object camera posture information acquisition unit 114 acquires the internal parameters or the external parameters as posture information. Note that, the posture information of the object camera which generates each object continuous image when the image is generated may be registered in the object continuous image DB described in the second example embodiment. The object camera posture information acquisition unit 114 may acquire the posture information of the object camera from the object continuous image DB.”),
and a composited image and compositing mask generator configured to:
extract a person region from an image in the learning data (hereinafter referred to as compositing person region) using a mask of the learning data (Kawai, [0063], “The object continuous image acquisition unit 113 acquires an object continuous image including an object. Hereinafter, the object is a person, but is not limited to this.”; Kawai, [0066]: “The person position information is, for example, a silhouette image described in [0045]”; Kawai, figure 6),
generate the composited image from the background image and the compositing person region, using the compositing parameters (Kawai, [0057], "[0057] The background image acquisition unit 111 acquires a background image. The background camera posture information acquisition unit 112 acquires posture information of a background camera which generates a background image acquired by the background image acquisition unit 111 when the background image is generated."; Kawai, [0059], "[0059] In this example, the learning image generation apparatus 100 has a background image database (hereinafter, referred to as "background image DB") which stores one or a plurality of background images. The background image stored in the background image DB may include any image, and a publicly available image DB may be used. Further, the background image DB stores posture information of the background camera which generates each background image when each background image is generated.”; Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.”),
generate the compositing mask from a mask generating image and the compositing person region that are the same size as the composited image, using the compositing parameters (Kawai, figure 11; Kawai, [0063], “The object continuous image acquisition unit 113 acquires an object continuous image including an object. Hereinafter, the object is a person, but is not limited to this.”; Kawai, [0066], "[0066] The person position information is, for example, a silhouette image described in [0045]"; Kawai, [0090] "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image."; The synthesis continuous image, equivalent to applicant’s composited image, is the same size as the silhouette, equivalent to the applicant’s mask generating image, and the object continuous image that includes the object of a person, which is equivalent to the applicant’s composited person region, as the synthesis image is composed of these pieces),
and generate the composited learning data (Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.”).
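
For reference, the generating steps recited in claim 1 can be sketched as follows. This is a hypothetical illustration of the claim language as interpreted above, not an implementation taken from the specification or from Kawai. The function name `composite`, the nearest-neighbour inverse mapping, rotation about the image origin, and the representation of images as nested lists of numbers are all assumptions of this sketch.

```python
import math


def composite(image, mask, background, scale, shift, angle_deg):
    """Paste the masked person region of `image` onto `background`.

    scale, shift (dx, dy) and angle_deg stand in for the claimed
    enlargement factor, degree of translation and degree of rotation.
    Returns (composited_image, compositing_mask), both sized like
    `background`. Mask entries are 0 or 1.
    """
    h_out, w_out = len(background), len(background[0])
    h_in, w_in = len(image), len(image[0])
    out_img = [row[:] for row in background]
    # Assumed "mask generating image": all zeros, same size as the output.
    out_mask = [[0] * w_out for _ in range(h_out)]

    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    dx, dy = shift

    for y in range(h_out):
        for x in range(w_out):
            # Inverse map an output pixel to the source image:
            # undo the translation, then the rotation, then the scaling.
            xt, yt = x - dx, y - dy
            sx = round((cos_t * xt + sin_t * yt) / scale)
            sy = round((-sin_t * xt + cos_t * yt) / scale)
            inside = 0 <= sx < w_in and 0 <= sy < h_in
            if inside and mask[sy][sx]:
                out_img[y][x] = image[sy][sx]  # person pixel over background
                out_mask[y][x] = 1             # same pixel marked in the mask
    return out_img, out_mask


# Usage: a single person pixel moved by one pixel in x and y.
person_image = [[9, 9], [9, 9]]
person_mask = [[1, 0], [0, 0]]
blank_background = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
new_image, new_mask = composite(
    person_image, person_mask, blank_background,
    scale=1.0, shift=(1, 1), angle_deg=0.0,
)
```

In this sketch the same three parameters drive both outputs, so the returned mask marks exactly the pixels that the person region occupies in the returned image, and the pair corresponds to one item of composited learning data as that term is read above.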

	Regarding claim 2, Kawai teaches the region extraction model learning device of claim 1, wherein the composited learning data generator includes (Kawai, [0054], “[0054] The learning generator apparatus 100 according to the present example embodiment executes a process in the same manner as that of the first example embodiment, but the processing content is specified. This will be described below.”):
		a composited image generator configured to:
extract the person region from an image in the learning data (hereinafter referred to as compositing person region) using the mask of the learning data (Kawai, [0063], “[0063] The object continuous image acquisition unit 113 acquires an object continuous image including an object. Hereinafter, the object is a person, but is not limited to this.”; Kawai, [0066]: “[0066] The person position information is, for example, a silhouette image described in [0045]”; Figure 6),
and generate the composited image from the background image and the compositing person region, using the compositing parameters (Kawai, [0057], "[0057] The background image acquisition unit 111 acquires a background image. The background camera posture information acquisition unit 112 acquires posture information of a background camera which generates a background image acquired by the background image acquisition unit 111 when the background image is generated."; Kawai, [0059], "[0059] In this example, the learning image generation apparatus 100 has a background image database (hereinafter, referred to as "background image DB") which stores one or a plurality of background images. The background image stored in the background image DB may include any image, and a publicly available image DB may be used. Further, the background image DB stores posture information of the background camera which generates each background image when each background image is generated.”; Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.”), 
		and a compositing mask generator configured to:
generate the compositing mask from a mask generating image that is the same size as the composited image, using the composited image (Kawai, figure 11; Kawai, [0063], “The object continuous image acquisition unit 113 acquires an object continuous image including an object. Hereinafter, the object is a person, but is not limited to this.”; Kawai, [0066], "[0066] The person position information is, for example, a silhouette image described in [0045]"; Kawai, [0090] "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image."; The synthesis continuous image, equivalent to applicant’s composited image, is the same size as the silhouette, equivalent to the applicant’s mask generating image, and the object continuous image that includes the object of a person, which is equivalent to the applicant’s composited person region, as the synthesis image is composed of these pieces), 
and generate the composited learning data (Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.”).

	Regarding claim 5, Kawai teaches a computer-implemented method for region extraction model learning, the method comprising (Kawai, [0012]: “[0012] In addition, according to the present invention, there is provided a learning image generation method executed by a computer, the method including a background image acquisition step of acquiring a background image; a background camera posture information acquisition step of acquiring posture information of a background camera which generates the background image; an object continuous image acquisition step of acquiring an object continuous image including an object; a synthesis position determination step of determining a synthesis position on the background image of the object included in each of a plurality of object still images included in the object continuous image based on the posture information of the background camera; and an image synthesis step of synthesizing the object included in each of the plurality of object still images with the background image to generate a synthesis continuous image, based on the background image, the object continuous image, and the synthesis position determined by the synthesis position determination unit.”):
generating by a composited learning data generator from already-existing learning data that is a set of (Kawai, [0054], “[0054] The learning generator apparatus 100 according to the present example embodiment executes a process in the same manner as that of the first example embodiment, but the processing content is specified. This will be described below.”):
an image including a person region and a mask indicating the person region (Kawai, [0063], “The object continuous image acquisition unit 113 acquires an object continuous image including an object. Hereinafter, the object is a person, but is not limited to this.”; Kawai, [0066]: “The person position information is, for example, a silhouette image described in [0045]”; Kawai, figure 6), 
and a background image to serve as a background of a composited image (Kawai, [0057], "[0057] The background image acquisition unit 111 acquires a background image. The background camera posture information acquisition unit 112 acquires posture information of a background camera which generates a background image acquired by the background image acquisition unit 111 when the background image is generated."; Kawai, [0059], "[0059] In this example, the learning image generation apparatus 100 has a background image database (hereinafter, referred to as "background image DB") which stores one or a plurality of background images. The background image stored in the background image DB may include any image, and a publicly available image DB may be used. Further, the background image DB stores posture information of the background camera which generates each background image when each background image is generated.”; Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.”),
composited learning data that is a set of a composited image and a mask indicating a person region in the composited image (hereinafter referred to as compositing mask) (Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.”); 
and learning, by a learning device, model parameters using the composited learning data (Kawai, [0052], "[0052] The learning image generation apparatus 100 according to the present example embodiment described above makes it possible to generate a learning image which is a learning image used for machine learning and is configured with a continuous image. Further, the learning image generation apparatus 100 according to the present example embodiment, which determines the synthesis position of the object based on the posture information of the background camera which generates the background image, enables a movement distance of the object shown in the synthesis continuous image and the like to be natural."), wherein generating the composited learning data includes:
generating compositing parameters that are a set of an enlargement factor whereby a person region is enlarged/reduced (Kawai, [0091], " [0091] Note that, the image synthesis unit 131 may adjust (enlarge/reduce) a size of the image indicating the person cut out from the object still image, and then synthesize the image on the background image. The image synthesis unit 131 can adjust a size of the cut image based on the synthesis position, the height of the person, and the posture information of the background camera so that the person on the background image has a natural size. For example, the size of the cut image can be adjusted so that the height of the person acquired by converting coordinates of a vertex of a head and coordinates of a foot of the person on the background image into coordinates of the real space becomes a predetermined height. The predetermined height may be a height of each person indicated by the height information described above, or may be an average height of any group or the like."; Kawai, [0100], "[0100] The object camera posture information acquisition unit 114 acquires posture information of an object camera which generates an object continuous image when the image is generated. For example, with camera calibration, internal parameters (a focal length, image center coordinates, a distortion coefficient, and the like) or external parameters (a rotation matrix, a translation vector, and the like) of the object camera when the object continuous image is generated are computed. The object camera posture information acquisition unit 114 acquires the internal parameters or the external parameters as posture information. Note that, the posture information of the object camera which generates each object continuous image when the image is generated may be registered in the object continuous image DB described in the second example embodiment. 
The object camera posture information acquisition unit 114 may acquire the posture information of the object camera from the object continuous image DB.”), a degree of translation by which to translate the person region (Kawai, [0100], "[0100] The object camera posture information acquisition unit 114 acquires posture information of an object camera which generates an object continuous image when the image is generated. For example, with camera calibration, internal parameters (a focal length, image center coordinates, a distortion coefficient, and the like) or external parameters (a rotation matrix, a translation vector, and the like) of the object camera when the object continuous image is generated are computed. The object camera posture information acquisition unit 114 acquires the internal parameters or the external parameters as posture information. Note that, the posture information of the object camera which generates each object continuous image when the image is generated may be registered in the object continuous image DB described in the second example embodiment. The object camera posture information acquisition unit 114 may acquire the posture information of the object camera from the object continuous image DB."), and a degree of rotation by which to rotate the person region, using the mask of the learning data (Kawai, [0100], "[0100] The object camera posture information acquisition unit 114 acquires posture information of an object camera which generates an object continuous image when the image is generated. For example, with camera calibration, internal parameters (a focal length, image center coordinates, a distortion coefficient, and the like) or external parameters (a rotation matrix, a translation vector, and the like) of the object camera when the object continuous image is generated are computed. The object camera posture information acquisition unit 114 acquires the internal parameters or the external parameters as posture information. 
Note that, the posture information of the object camera which generates each object continuous image when the image is generated may be registered in the object continuous image DB described in the second example embodiment. The object camera posture information acquisition unit 114 may acquire the posture information of the object camera from the object continuous image DB.”),
and extracting a person region from an image in the learning data (hereinafter referred to as compositing person region) using a mask of the learning data (Kawai, [0063], “The object continuous image acquisition unit 113 acquires an object continuous image including an object. Hereinafter, the object is a person, but is not limited to this.”; Kawai, [0066]: “The person position information is, for example, a silhouette image described in [0045]”; Kawai, figure 6),
generating, by a composite image and compositing mask generator, the composited image from the background image and the compositing person region, using the compositing parameters (Kawai, [0057], "[0057] The background image acquisition unit 111 acquires a background image. The background camera posture information acquisition unit 112 acquires posture information of a background camera which generates a background image acquired by the background image acquisition unit 111 when the background image is generated."; Kawai, [0059], "[0059] In this example, the learning image generation apparatus 100 has a background image database (hereinafter, referred to as "background image DB") which stores one or a plurality of background images. The background image stored in the background image DB may include any image, and a publicly available image DB may be used. Further, the background image DB stores posture information of the background camera which generates each background image when each background image is generated.”; Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.”),
generating, by the composite image and compositing mask generator, the compositing mask from a mask generating image and the compositing person region that are the same size as the composited image, using the compositing parameters (Kawai, figure 11; Kawai, [0063], “The object continuous image acquisition unit 113 acquires an object continuous image including an object. Hereinafter, the object is a person, but is not limited to this.”; Kawai, [0066], "[0066] The person position information is, for example, a silhouette image described in [0045]"; Kawai, [0090] "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image."; The synthesis continuous image, equivalent to applicant’s composited image, is the same size as the silhouette, equivalent to the applicant’s mask generating image, and the object continuous image that includes the object of a person, which is equivalent to the applicant’s composited person region, as the synthesis image is composed of these pieces), 
and generating the composited learning data (Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.").

	Regarding claim 6, Kawai teaches the region extraction model learning method of claim 5, wherein the generating composited learning data includes:
generating, by a composited image generator, the composited image from the background image and the compositing person region using the compositing parameters (Kawai, [0057], "[0057] The background image acquisition unit 111 acquires a background image. The background camera posture information acquisition unit 112 acquires posture information of a background camera which generates a background image acquired by the background image acquisition unit 111 when the background image is generated."; Kawai, [0059], "[0059] In this example, the learning image generation apparatus 100 has a background image database (hereinafter, referred to as "background image DB") which stores one or a plurality of background images. The background image stored in the background image DB may include any image, and a publicly available image DB may be used. Further, the background image DB stores posture information of the background camera which generates each background image when each background image is generated.”; Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.”),
generating, by a compositing mask generator, the compositing mask from a mask generating image that is the same size as the composited image, using the composited image (Kawai, figure 11; Kawai, [0063], “The object continuous image acquisition unit 113 acquires an object continuous image including an object. Hereinafter, the object is a person, but is not limited to this.”; Kawai, [0066], "[0066] The person position information is, for example, a silhouette image described in [0045]"; Kawai, [0090] "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image."; The synthesis continuous image, equivalent to applicant’s composited image, is the same size as the silhouette, equivalent to the applicant’s mask generating image, and the object continuous image that includes the object of a person, which is equivalent to the applicant’s composited person region, as the synthesis image is composed of these pieces), 
and generating the composited learning data (Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.”).

	Regarding claim 7, Kawai teaches a program for causing a computer to function as the region extraction model learning device, the device comprising (Kawai, [0013], “[0013] In addition, according to the present invention, there is provided a program causing a computer to function as: a background image acquisition unit that acquires a background image; a background camera posture information acquisition unit that acquires posture information of a background camera which generates the background image; an object continuous image acquisition unit that acquires an object continuous image including an object; a synthesis position determination unit that determines a synthesis position on the background image of the object included in each of a plurality of object still images included in the object continuous image based on the posture information of the background camera; and an image synthesis unit that synthesizes the object included in each of the plurality of object still images with the background image to generate a synthesis continuous image, based on the background image, the object continuous image, and the synthesis position determined by the synthesis position determination unit. In addition, according to the present invention, there is provided a program causing a computer to function”):
a composited learning data generator configured to generate, from already-existing learning data that is a set of (Kawai, [0054], “[0054] The learning generator apparatus 100 according to the present example embodiment executes a process in the same manner as that of the first example embodiment, but the processing content is specified. This will be described below.”):
an image including a person region and a mask indicating the person region (Kawai, [0066]: “[0066] In the object continuous image DB, one or the plurality of object continuous images, person position information indicating a position or an area of a person in each of a plurality of object still images included in each object continuous image, information on a frame rate of each object continuous image are stored. The person position information is, for example, a silhouette image described in the first example embodiment, but is not limited to this.”; Figure 6), 
and a background image to serve as a background of a composited image (Kawai, [0057], "[0057] The background image acquisition unit 111 acquires a background image. The background camera posture information acquisition unit 112 acquires posture information of a background camera which generates a background image acquired by the background image acquisition unit 111 when the background image is generated."; Kawai, [0059], "[0059] In this example, the learning image generation apparatus 100 has a background image database (hereinafter, referred to as “background image DB") which stores one or a plurality of background images. The background image stored in the background image DB may include any image, and a publicly available image DB may be used. Further, the background image DB stores posture information of the background camera which generates each background image when each background image is generated."),
composited learning data that is a set of a composited image and a mask indicating a person region in the composited image (hereinafter referred to as compositing mask) (Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image."); 
and a learner configured to learn model parameters using the composited learning data (Kawai, [0052] "[0052] The learning image generation apparatus 100 according to the present example embodiment described above makes it possible to generate a learning image which is a learning image used for machine learning and is configured with a continuous image. Further, the learning image generation apparatus 100 according to the present example embodiment, which determines the synthesis position of the object based on the posture information of the background camera which generates the background image, enables a movement distance of the object shown in the synthesis continuous image and the like to be natural."), wherein the composited learning data generator includes:
a compositing parameter generator configured to generate compositing parameters that are a set of:
an enlargement factor whereby a person region is enlarged/reduced (Kawai, [0091], " [0091] Note that, the image synthesis unit 131 may adjust (enlarge/reduce) a size of the image indicating the person cut out from the object still image, and then synthesize the image on the background image. The image synthesis unit 131 can adjust a size of the cut image based on the synthesis position, the height of the person, and the posture information of the background camera so that the person on the
background image has a natural size. For example, the size of the cut image can be adjusted so that the height of the person acquired by converting coordinates of a vertex of a head and coordinates of a foot of the person on the background image into coordinates of the real space becomes a predetermined height. The predetermined height may be a height of each person indicated by the height information described above, or may be an average height of any group or the like."; Kawai, [0100], "[0100] The object camera posture information acquisition unit 114 acquires posture information of an object camera which generates an object continuous image when the image is generated. For example, with camera calibration, internal parameters (a focal length, image center coordinates, a distortion coefficient, and the like) or external parameters (a rotation matrix, a translation vector, and the like) of the object camera when the object continuous image is generated are computed. The object camera posture information acquisition unit 114 acquires the internal parameters or the external parameters as posture information. Note that, the posture information of the object camera which generates each object continuous image when the image is generated may be registered in the object continuous image DB described in the second example embodiment. The object camera posture information acquisition unit 114 may acquire the posture information of the object camera from the object continuous image DB.”),
a degree of translation by which to translate the person region 
(Kawai, [0100], "[0100] The object camera posture information acquisition unit 114 acquires posture information of an object camera which generates an object continuous image when the image is generated. For example, with camera calibration, internal parameters (a focal length, image center coordinates, a distortion coefficient, and the like) or external parameters (a rotation matrix, a translation vector, and the like) of the object camera when the object continuous image is generated are computed. The object camera posture information acquisition unit 114 acquires the internal parameters or the external parameters as posture information. Note that, the posture information of the object camera which generates each object continuous image when the image is generated may be registered in the object continuous image DB described in the second example embodiment. The object camera posture information acquisition unit 114 may acquire the posture information of the object camera from the object continuous image DB."),
and a degree of rotation by which to rotate the person region,
using the mask of the learning data (Kawai, [0100], "[0100] The object camera posture information acquisition unit 114 acquires posture information of an object camera which generates an object continuous image when the image is generated. For example, with camera calibration, internal parameters (a focal length, image center coordinates, a distortion coefficient, and the like) or external parameters (a rotation matrix, a translation vector, and the like) of the object camera when the object continuous image is generated are computed. The object camera posture information acquisition unit 114 acquires the internal parameters or the external parameters as posture information. Note that, the posture information of the object camera which generates each object continuous image when the image is generated may be registered in the object continuous image DB described in the second example embodiment. The object camera posture information acquisition unit 114 may acquire the posture information of the object camera from the object continuous image DB.”),
and a composited image and compositing mask generator configured to:
extract a person region from an image in the learning data (hereinafter referred to as compositing person region) using a mask of the learning data (Kawai, [0063], “The object continuous image acquisition unit 113 acquires an object continuous image including an object. Hereinafter, the object is a person, but is not limited to this.”; Kawai, [0066]: “The person position information is, for example, a silhouette image described in [0045]”; Kawai, figure 6),
generate the composited image from the background image and the compositing person region, using the compositing parameters (Kawai, [0057], "[0057] The background image acquisition unit 111 acquires a background image. The background camera posture information acquisition unit 112 acquires posture information of a background camera which generates a background image acquired by the background image acquisition unit 111 when the background image is generated."; Kawai, [0059], "[0059] In this example, the learning image generation apparatus 100 has a background image database (hereinafter, referred to as "background image DB") which stores one or a plurality of background images. The background image stored in the background image DB may include any image, and a publicly available image DB may be used. Further, the background image DB stores posture information of the background camera which generates each background image when each background image is generated.”; Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.”),
generate the compositing mask from a mask generating image and the compositing person region that are the same size as the composited image, using the compositing parameters (Kawai, figure 11; Kawai, [0063], “The object continuous image acquisition unit 113 acquires an object continuous image including an object. Hereinafter, the object is a person, but is not limited to this.”; Kawai, [0066], "[0066] The person position information is, for example, a silhouette image described in [0045]"; Kawai, [0090] "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image."; The synthesis continuous image, equivalent to applicant’s composited image, is the same size as the silhouette, equivalent to the applicant’s mask generating image, and the object continuous image that includes the object of a person, which is equivalent to the applicant’s composited person region, as the synthesis image is composed of these pieces), 
and generate the composited learning data (Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.”).
	
	Regarding claim 10, Kawai teaches the program according to claim 7, wherein the composited learning data generating unit includes: 
		a composited image generator configured to:
extract the person region from an image in the learning data (hereinafter referred to as compositing person region) using the mask of the learning data (Kawai, [0063], “The object continuous image acquisition unit 113 acquires an object continuous image including an object. Hereinafter, the object is a person, but is not limited to this.”; Kawai, [0066]: “The person position information is, for example, a silhouette image described in [0045]”; Kawai, figure 6), 
and generate the composited image from the background image and the compositing person region, using the compositing parameters (Kawai, [0057], "[0057] The background image acquisition unit 111 acquires a background image. The background camera posture information acquisition unit 112 acquires posture information of a background camera which generates a background image acquired by the background image acquisition unit 111 when the background image is generated."; Kawai, [0059], "[0059] In this example, the learning image generation apparatus 100 has a background image database (hereinafter, referred to as "background image DB") which stores one or a plurality of background images. The background image stored in the background image DB may include any image, and a publicly available image DB may be used. Further, the background image DB stores posture information of the background camera which generates each background image when each background image is generated.”; Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.”), 
and a compositing mask generator configured to:
generate the compositing mask from a mask generating image that is the same size as the composited image, using the composited image (Kawai, figure 11; Kawai, [0063], “The object continuous image acquisition unit 113 acquires an object continuous image including an object. Hereinafter, the object is a person, but is not limited to this.”; Kawai, [0066], "[0066] The person position information is, for example, a silhouette image described in [0045]"; Kawai, [0090] "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image."; The synthesis continuous image, equivalent to applicant’s composited image, is the same size as the silhouette, equivalent to the applicant’s mask generating image, and the object continuous image that includes the object of a person, which is equivalent to the applicant’s composited person region, as the synthesis image is composed of these pieces), 
and generate the composited learning data (Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.").

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3, 11, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kawai (US 2021/0117731 A1), in view of Balakrishnan et al., (Synthesizing Images of Humans in Unseen Poses) hereinafter Balakrishnan.

Regarding claim 3, Kawai teaches the region extraction model learning device according to claim 2 (Kawai, [0054], “[0054] The learning generator apparatus 100 according to the present example embodiment executes a process in the same manner as that of the first example embodiment, but the processing content is specified. This will be described below.”).
Kawai fails to disclose “wherein the compositing mask generator includes:
a joint position acquirer configured to acquire a joint position and a joint label of a person in a person region included in the composited image, 
a person region generator configured to generate the compositing mask from the mask generating image, using the joint position and the joint label, 
and an output generator configured to generate the composited learning data from the composited image and the compositing mask.”
However, Balakrishnan teaches:
a joint position acquirer configured to acquire a joint position and a joint label of a person in a person region included in the composited image (Balakrishnan, page 3, section 3.2, paragraph 1: "To handle such movement, we first segment $I_S$ into L foreground layers and one background layer. The L layers correspond to L predefined body parts."; Balakrishnan, section 3.2, paragraph 2: "specifying the rough location of each body part in $I_S$"), 
a person region generator configured to generate the compositing mask from the mask generating image, using the joint position and the joint label (Balakrishnan, page 3, section 3.2, paragraph 2: "consists of a 2D Gaussian mask over the approximate spatial region of each body part"), 
and an output generator configured to generate the composited learning data from the composited image and the compositing mask (Balakrishnan, pages 2-3, section 3, paragraph 3: "We design our network such that these modules are learned jointly and trained using only the target image as a label.").
Kawai and Balakrishnan are both considered to be analogous to the claimed invention because they are in the same field of image processing and bootstrapping image data. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kawai to incorporate the teachings of Balakrishnan and incorporate a joint position acquirer configured to acquire a joint position and a joint label of a person in a person region included in the composited image, a person region generator configured to generate the compositing mask from the mask generating image, using the joint position and the joint label, and an output generator configured to generate the composited learning data from the composited image and the compositing mask, as doing so would reduce unnatural movement of the human figure in the image as “when a person moves, each body part may move differently from one another” (Balakrishnan, page 3, section 3.2, paragraph 1).
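For illustration only, generating a mask from labeled joint positions using 2D Gaussians can be sketched as follows. The sketch is not code from Balakrishnan or from the application; the function names, the `sigma` and `threshold` parameters, and the step of taking a thresholded union of the per-joint masks are all assumptions made for this example.

```python
import math


def gaussian_joint_masks(joints, height, width, sigma=2.0):
    """Return one 2D Gaussian mask per labeled joint.

    `joints` maps a joint label (e.g. "wrist") to its (x, y) pixel position.
    Each mask peaks at 1.0 on the joint and decays with distance.
    """
    masks = {}
    for label, (jx, jy) in joints.items():
        masks[label] = [
            [math.exp(-((x - jx) ** 2 + (y - jy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
    return masks


def person_mask(masks, threshold=0.5):
    """Binarize the per-pixel maximum over all joint masks (assumed union step)."""
    first = next(iter(masks.values()))
    height, width = len(first), len(first[0])
    return [
        [1 if max(m[y][x] for m in masks.values()) >= threshold else 0
         for x in range(width)]
        for y in range(height)
    ]


joints = {"wrist": (1, 1), "elbow": (4, 3)}
masks = gaussian_joint_masks(joints, height=5, width=6)
binary = person_mask(masks)
```

In this sketch the joint label selects which mask a position contributes to, and the binarized union stands in for a person-region mask.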

Regarding claim 11, Kawai teaches the program according to claim 7 (Kawai, [0013], “[0013] In addition, according to the present invention, there is provided a program causing a computer to function as: a background image acquisition unit that acquires a background image; a background camera posture information acquisition unit that acquires posture information of a background camera which generates the background image; an object continuous image acquisition unit that acquires an object continuous image including an object; a synthesis position determination unit that determines a synthesis position on the background image of the object included in each of a plurality of object still images included in the object continuous image based on the posture information of the background camera; and an image synthesis unit that synthesizes the object included in each of the plurality of object still images with the background image to generate a synthesis continuous image, based on the background image, the object continuous image, and the synthesis position determined by the synthesis position determination unit. In addition, according to the present invention, there is provided a program causing a computer to function”). 
Kawai fails to teach “wherein the compositing mask generator includes:
a joint position acquirer configured to that acquire a joint position and a joint label of a person in a person region included in the composited image,
a person region generator configured to generate the compositing mask from the mask generating image, using the joint position and the joint label,
and an output generator configured to generate the composited learning data from the composited image and the compositing mask.”
However, Balakrishnan teaches:
a joint position acquirer configured to that acquire a joint position and a joint label of a person in a person region included in the composited image (Balakrishnan, page 3, section 3.2, paragraph 1: "To handle such movement, we first segment $I_S$ into L foreground layers and one background layer. The L layers correspond to L predefined body parts."; Balakrishnan, section 3.2, paragraph 2: "specifying the rough location of each body part in $I_S$"),
a person region generator configured to generate the compositing mask from the mask generating image, using the joint position and the joint label (Balakrishnan, page 3, section 3.2, paragraph 2: "consists of a 2D Gaussian mask over the approximate spatial region of each body part"), 
and an output generator configured to generate the composited learning data from the composited image and the compositing mask (Balakrishnan, pages 2-3, section 3, paragraph 3: "We design our network such that these modules are learned jointly and trained using only the target image as a label.").
Kawai and Balakrishnan are both considered to be analogous to the claimed invention because they are in the same field of image processing and bootstrapping image data. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kawai to incorporate the teachings of Balakrishnan and incorporate a joint position acquirer configured to acquire a joint position and a joint label of a person in a person region included in the composited image, a person region generator configured to generate the compositing mask from the mask generating image, using the joint position and the joint label, and an output generator configured to generate the composited learning data from the composited image and the compositing mask, as doing so would reduce unnatural movement of the human figure in the image as “when a person moves, each body part may move differently from one another” (Balakrishnan, page 3, section 3.2, paragraph 1).

	Regarding claim 16, Kawai teaches the computer-implemented method of claim 6 (Kawai, [0012]: “[0012] In addition, according to the present invention, there is provided a learning image generation method executed by a computer, the method including a background image acquisition step of acquiring a background image; a background camera posture information acquisition step of acquiring posture information of a background camera which generates the background image; an object continuous image acquisition step of acquiring an object continuous image including an object; a synthesis position determination step of determining a synthesis position on the background image of the object included in each of a plurality of object still images included in the object continuous image based on the posture information of the background camera; and an image synthesis step of synthesizing the object included in each of the plurality of object still images with the background image to generate a synthesis continuous image, based on the background image, the object continuous image, and the synthesis position determined by the synthesis position determination unit.”) and providing the composited learning data (Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image.").
Kawai fails to teach “the method further comprising, by the compositing mask generator: 
acquiring a joint position and a joint label of a person in a person region included in the composited image, 
generating the compositing mask from the mask generating image, using the joint position and the joint label,
generating the composited learning data from the composited image and the compositing mask;”
However, Balakrishnan teaches:
acquiring a joint position and a joint label of a person in a person region included in the composited image (Balakrishnan, page 3, section 3.2, paragraph 1: "To handle such movement, we first segment $I_S$ into L foreground layers and one background layer. The L layers correspond to L predefined body parts."; Balakrishnan, section 3.2, paragraph 2: "specifying the rough location of each body part in $I_S$"), 
generating the compositing mask from the mask generating image, using the joint position and the joint label (Balakrishnan, page 3, section 3.2, paragraph 2: "consists of a 2D Gaussian mask over the approximate spatial region of each body part"), 
generating the composited learning data from the composited image and the compositing mask (Balakrishnan, pages 2-3, section 3, paragraph 3: "We design our network such that these modules are learned jointly and trained using only the target image as a label.").
Kawai and Balakrishnan are both considered to be analogous to the claimed invention because they are in the same field of image processing and bootstrapping image data. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kawai to incorporate the teachings of Balakrishnan and incorporate a joint position acquirer configured to acquire a joint position and a joint label of a person in a person region included in the composited image, a person region generator configured to generate the compositing mask from the mask generating image, using the joint position and the joint label, and an output generator configured to generate the composited learning data from the composited image and the compositing mask, as doing so would reduce unnatural movement of the human figure in the image as “when a person moves, each body part may move differently from one another” (Balakrishnan, page 3, section 3.2, paragraph 1).

	Regarding claim 20, Kawai teaches the computer-implemented method of claim 5 (Kawai, [0012]: “[0012] In addition, according to the present invention, there is provided a learning image generation method executed by a computer, the method including a background image acquisition step of acquiring a background image; a background camera posture information acquisition step of acquiring posture information of a background camera which generates the background image; an object continuous image acquisition step of acquiring an object continuous image including an object; a synthesis position determination step of determining a synthesis position on the background image of the object included in each of a plurality of object still images included in the object continuous image based on the posture information of the background camera; and an image synthesis step of synthesizing the object included in each of the plurality of object still images with the background image to generate a synthesis continuous image, based on the background image, the object continuous image, and the synthesis position determined by the synthesis position determination unit.”), wherein the method further comprising:
	receiving input images, the input images including:
		an image including a person region and a mask indicating the person region (Kawai, [0063], “The object continuous image acquisition unit 113 acquires an object continuous image including an object. Hereinafter, the object is a person, but is not limited to this.”; Kawai, [0066]: “The person position information is, for example, a silhouette image described in [0045]”; Kawai, figure 6), 
and a background image to serve as a background (Kawai, [0057], "[0057] The background image acquisition unit 111 acquires a background image. The background camera posture information acquisition unit 112 acquires posture information of a background camera which generates a background image acquired by the background image acquisition unit 111 when the background image is generated."; Kawai, [0059], "[0059] In this example, the learning image generation apparatus 100 has a background image database (hereinafter, referred to as "background image DB") which stores one or a plurality of background images. The background image stored in the background image DB may include any image, and a publicly available image DB may be used. Further, the background image DB stores posture information of the background camera which generates each background image when each background image is generated.”; Kawai, [0090], "[0090] Based on the background image acquired by the background image acquisition unit 111, the object continuous image acquired by the object continuous image acquisition unit 113, and the synthesis position determined by the synthesis position determination unit 121, the image synthesis unit 131 synthesizes the person included in each of the plurality of object still images with the background image and generates a plurality of synthesis still images to generate a synthesis continuous image."), 
Kawai fails to teach “wherein the background image is distinct from the person region, wherein the person region includes one or more joints of a person for acquiring one more joint labels.”
However, Balakrishnan teaches: 
wherein the background image is distinct from the person region, wherein the person region includes one or more joints of a person for acquiring one more joint labels (Balakrishnan, page 3, section 3.2, paragraph 1: "To handle such movement, we first segment $I_S$ into L foreground layers and one background layer. The L layers correspond to L predefined body parts."; Balakrishnan, section 3.2, paragraph 2: "specifying the rough location of each body part in $I_S$") and wherein the one or more joint labels include a wrist and an elbow (Balakrishnan, page 4, figure 4).
Kawai and Balakrishnan are both considered to be analogous to the claimed invention because they are in the same field of image processing and bootstrapping image data. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kawai to incorporate the teachings of Balakrishnan and incorporate a joint position acquirer configured to acquire a joint position and a joint label of a person in a person region included in the composited image, a person region generator configured to generate the compositing mask from the mask generating image, using the joint position and the joint label, and an output generator configured to generate the composited learning data from the composited image and the compositing mask, as doing so would reduce unnatural movement of the human figure in the image as “when a person moves, each body part may move differently from one another” (Balakrishnan, page 3, section 3.2, paragraph 1).

Claims 4, 8, 12-13, 15, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Kawai (US 2021/0117731 A1), in view of Ozdemir et al., (US 2019/0122073 A1), hereinafter Ozdemir.

Regarding claim 4, Kawai teaches the region extraction model learning device according to claim 1 (Kawai, [0054], “[0054] The learning generator apparatus 100 according to the present example embodiment executes a process in the same manner as that of the first example embodiment, but the processing content is specified. This will be described below.”).
Kawai fails to disclose “wherein the learner learns the model parameters using a weighted loss function $L_W(p)$
	$L_W(p) = a(1 + M(p))L(P)$
	where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned, $M(p)$ is a function that is 1 in a person region and is 0 at all other regions, and $a$ is a predetermined positive value.” 
However, Ozdemir teaches wherein the learner learns the model parameters using a weighted loss function $L_W(p)$ (Ozdemir, [0044]: “training cost function…our cost function per-batch is a weighted cross-entropy”)
	$L_W(p) = a(1 + M(p))L(P)$ (Ozdemir, [0044]: “$L_B(\theta) = \sum_{B} \sum_{x} y(x) w(x) \log \hat{y}(x;\theta) + (1 - y(x))(1 - w(x)) \log(1 - \hat{y}(x;\theta))$”)
	where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned (Ozdemir, [0044]: “$\theta$ is the vector of network weights that are learned via training (i.e. by minimizing the loss function $L(\theta)$)”), $M(p)$ is a function that is 1 in a person region and is 0 at all other regions (Ozdemir, [0044]: “$\hat{y}(x;\theta) \in [0,1]$ is the output of the network for pixel x denoting the probability that pixel x is a nodule”), and $a$ is a predetermined positive value (Ozdemir, [0044]: “$w(x) \in [0,1]$ is the weight representing the contribution of the cross-entropy loss associated with pixel x”). 
Kawai and Ozdemir are both considered to be analogous to the claimed invention because they are in the same field of image processing and machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the learning image apparatus of Kawai to incorporate the teachings of Ozdemir as loss functions are known in the art and the “approach de-biases the network from learning only the background pixels that have significantly higher occurrence frequency than that of nodule pixels” (Ozdemir, page 4, [0044]). 
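For illustration only, the effect of the recited weighting can be sketched as follows. The sketch is not code from Ozdemir or from the application and rests on several assumptions: the term preceding M(p) is read as the constant 1, the underlying loss is taken to be a per-pixel binary cross-entropy, and the per-pixel terms are summed over the image. The function name `weighted_loss` and the `eps` clamp are likewise hypothetical.

```python
import math


def weighted_loss(pred, mask, a=1.0, eps=1e-7):
    """Sum per-pixel losses weighted by a * (1 + M(p)).

    `pred` holds predicted person probabilities and `mask` holds M(p):
    1 inside the person region, 0 elsewhere. Person pixels therefore
    receive weight 2a and all other pixels receive weight a.
    """
    total = 0.0
    for pred_row, mask_row in zip(pred, mask):
        for q, m in zip(pred_row, mask_row):
            # Clamp to avoid log(0); eps is an illustrative choice.
            q = min(max(q, eps), 1.0 - eps)
            # Assumed per-pixel loss: binary cross-entropy against the mask.
            bce = -(m * math.log(q) + (1 - m) * math.log(1.0 - q))
            total += a * (1 + m) * bce
    return total


pred = [[0.5, 0.5]]
mask = [[1, 0]]
loss = weighted_loss(pred, mask, a=1.0)
```

With both predictions at 0.5, the person pixel contributes twice the loss of the background pixel, which is the emphasis on the person region that the weighting is directed to.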
	
Regarding claim 8, Kawai teaches the region extraction model learning device according to claim 2 (Kawai, [0054], “[0054] The learning generator apparatus 100 according to the present example embodiment executes a process in the same manner as that of the first example embodiment, but the processing content is specified. This will be described below.”).
Kawai fails to disclose “wherein the learner learns the model parameters using a weighted loss function $L_W(p)$
	$L_W(p) = a(1 + M(p))L(P)$
	where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned, $M(p)$ is a function that is 1 in a person region and is 0 at all other regions, and $a$ is a predetermined positive value.” 
However, Ozdemir teaches wherein the learner learns the model parameters using a weighted loss function $L_W(p)$ (Ozdemir, page 4, [0044]: “training cost function…our cost function per-batch is a weighted cross-entropy”)
	$L_W(p) = a\,(l + M(p))\,L(P)$ (Ozdemir, page 4, [0044]: “$L_B(\theta) = \sum_{B} \sum_{x} y(x)\,w(x)\log \hat{y}(x;\theta) + (1 - y(x))(1 - w(x))\log(1 - \hat{y}(x;\theta))$”)
	where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned (Ozdemir, page 4, [0044]: “$\theta$ is the vector of network weights that are learned via training (i.e. by minimizing the loss function $L(\theta)$)”), $M(p)$ is a function that is 1 in a person region and is 0 at all other regions (Ozdemir, page 4, [0044]: “$\hat{y}(x;\theta) \in [0,1]$ is the output of the network for pixel x denoting the probability that pixel x is a nodule”), and $a$ is a predetermined positive value (Ozdemir, page 4, [0044]: “$w(x) \in [0,1]$ is the weight representing the contribution of the cross-entropy loss associated with pixel x”). 
Kawai and Ozdemir are both considered to be analogous to the claimed invention because they are in the same field of image processing and machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the learning image apparatus of Kawai to incorporate the teachings of Ozdemir, as loss functions are known in the art and the “approach de-biases the network from learning only the background pixels that have significantly higher occurrence frequency than that of nodule pixels” (Ozdemir, page 4, [0044]).
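
The two formulas compared above can be evaluated side by side on toy values. The following is a minimal Python sketch, assuming NumPy is available and assuming that the symbol printed as "l" in the claimed formula denotes the constant 1; the function names and inputs are hypothetical and appear in neither Kawai nor Ozdemir.

```python
import numpy as np


def claimed_weighted_loss(per_pixel_loss, person_mask, a=1.0):
    """Evaluate a * (1 + M(p)) * L(p) per pixel.

    Assumption: the 'l' in the claimed formula is read as the constant 1.
    person_mask plays the role of M(p): 1 in a person region, 0 elsewhere.
    """
    per_pixel_loss = np.asarray(per_pixel_loss, dtype=float)
    person_mask = np.asarray(person_mask, dtype=float)
    return a * (1.0 + person_mask) * per_pixel_loss


def ozdemir_batch_cost(y, y_hat, w, eps=1e-12):
    """Evaluate the quoted sum: y*w*log(y_hat) + (1-y)*(1-w)*log(1-y_hat).

    The expression is computed exactly as quoted, with no sign change.
    eps only guards against log(0) and is not part of the quoted formula.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1.0 - eps)
    terms = y * w * np.log(y_hat) + (1.0 - y) * (1.0 - w) * np.log(1.0 - y_hat)
    return float(np.sum(terms))


# Two pixels: the first inside the person (or nodule) region, the second outside.
claimed = claimed_weighted_loss([0.5, 0.5], [1, 0], a=2.0)
quoted = ozdemir_batch_cost(y=[1, 0], y_hat=[0.5, 0.5], w=[0.8, 0.8])
```

Under these assumptions, the claimed formula doubles the loss of pixels inside the mask relative to pixels outside it, while the quoted cost weights foreground terms by w(x) and background terms by 1 - w(x).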

Regarding claim 12, Kawai teaches the program according to claim 7 (Kawai, [0013], “[0013] In addition, according to the present invention, there is provided a program causing a computer to function as: a background image acquisition unit that acquires a background image; a background camera posture information acquisition unit that acquires posture information of a background camera which generates the background image; an object continuous image acquisition unit that acquires an object continuous image including an object; a synthesis position determination unit that determines a synthesis position on the background image of the object included in each of a plurality of object still images included in the object continuous image based on the posture information of the background camera; and an image synthesis unit that synthesizes the object included in each of the plurality of object still images with the background image to generate a synthesis continuous image, based on the background image, the object continuous image, and the synthesis position determined by the synthesis position determination unit. In addition, according to the present invention, there is provided a program causing a computer to function ”), 
Kawai fails to disclose “wherein the learner learns the model parameters using a weighted loss function $L_W(p)$
	$L_W(p) = a\,(l + M(p))\,L(P)$
	where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned, $M(p)$ is a function that is 1 in a person region and is 0 at all other regions, and $a$ is a predetermined positive value.”
However, Ozdemir teaches wherein the learner learns the model parameters using a weighted loss function $L_W(p)$ (Ozdemir, page 4, [0044]: “training cost function…our cost function per-batch is a weighted cross-entropy”)
	$L_W(p) = a\,(l + M(p))\,L(P)$ (Ozdemir, page 4, [0044]: “$L_B(\theta) = \sum_{B} \sum_{x} y(x)\,w(x)\log \hat{y}(x;\theta) + (1 - y(x))(1 - w(x))\log(1 - \hat{y}(x;\theta))$”)
	where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned (Ozdemir, page 4, [0044]: “$\theta$ is the vector of network weights that are learned via training (i.e. by minimizing the loss function $L(\theta)$)”), $M(p)$ is a function that is 1 in a person region and is 0 at all other regions (Ozdemir, page 4, [0044]: “$\hat{y}(x;\theta) \in [0,1]$ is the output of the network for pixel x denoting the probability that pixel x is a nodule”), and $a$ is a predetermined positive value (Ozdemir, page 4, [0044]: “$w(x) \in [0,1]$ is the weight representing the contribution of the cross-entropy loss associated with pixel x”).
Kawai and Ozdemir are both considered to be analogous to the claimed invention because they are in the same field of image processing and machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the learning image apparatus of Kawai to incorporate the teachings of Ozdemir, as loss functions are known in the art and the “approach de-biases the network from learning only the background pixels that have significantly higher occurrence frequency than that of nodule pixels” (Ozdemir, page 4, [0044]).
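
The per-pixel term of the quoted per-batch cost can be unpacked by cases. The following is a sketch under the assumption, not stated in the passage quoted above, that $y(x)$ is a binary ground-truth label equal to 1 for a nodule pixel and 0 for a background pixel:

```latex
\[
y(x)\,w(x)\log \hat{y}(x;\theta) + (1 - y(x))(1 - w(x))\log(1 - \hat{y}(x;\theta)) =
\begin{cases}
w(x)\log \hat{y}(x;\theta), & y(x) = 1,\\
(1 - w(x))\log(1 - \hat{y}(x;\theta)), & y(x) = 0.
\end{cases}
\]
```

Under that assumption, $w(x)$ scales the contribution of nodule pixels and $1 - w(x)$ scales the contribution of background pixels.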

Regarding claim 13, Kawai teaches the program according to claim 10 (Kawai, [0013], “[0013] In addition, according to the present invention, there is provided a program causing a computer to function as: a background image acquisition unit that acquires a background image; a background camera posture information acquisition unit that acquires posture information of a background camera which generates the background image; an object continuous image acquisition unit that acquires an object continuous image including an object; a synthesis position determination unit that determines a synthesis position on the background image of the object included in each of a plurality of object still images included in the object continuous image based on the posture information of the background camera; and an image synthesis unit that synthesizes the object included in each of the plurality of object still images with the background image to generate a synthesis continuous image, based on the background image, the object continuous image, and the synthesis position determined by the synthesis position determination unit. In addition, according to the present invention, there is provided a program causing a computer to function”), 
Kawai fails to disclose “wherein the learner learns the model parameters using a weighted loss function $L_W(p)$
	$L_W(p) = a\,(l + M(p))\,L(P)$
	where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned, $M(p)$ is a function that is 1 in a person region and is 0 at all other regions, and $a$ is a predetermined positive value.”
However, Ozdemir teaches wherein the learner learns the model parameters using a weighted loss function $L_W(p)$ (Ozdemir, page 4, [0044]: “training cost function…our cost function per-batch is a weighted cross-entropy”)
	$L_W(p) = a\,(l + M(p))\,L(P)$ (Ozdemir, page 4, [0044]: 
“                        
                            
                                
                                    L
                                
                                
                                    B
                                
                            
                            
                                
                                    θ
                                
                            
                            =
                            
                                
                                    ∑
                                    
                                        B
                                    
                                
                                
                                    
                                        
                                            ∑
                                            
                                                x
                                            
                                        
                                        
                                            y
                                            
                                                
                                                    x
                                                
                                            
                                            w
                                            
                                                
                                                    x
                                                
                                            
                                            l
                                            o
                                            g
                                            
                                                
                                                    y
                                                
                                                ^
                                            
                                            
                                                
                                                    x
                                                    ;
                                                    θ
                                                
                                            
                                            +
                                            
                                                
                                                    1
                                                    -
                                                    y
                                                    
                                                        
                                                            x
                                                        
                                                    
                                                
                                            
                                            
                                                
                                                    1
                                                    -
                                                    w
                                                    
                                                        
                                                            x
                                                        
                                                    
                                                
                                            
                                            l
                                            o
                                            g
                                            (
                                            1
                                            -
                                            
                                                
                                                    y
                                                
                                                ^
                                            
                                            
                                                
                                                    x
                                                    ;
                                                    θ
                                                
                                            
                                            )
                                        
                                    
                                
                            
                        
                    ”)
where                         
                            L
                            (
                            P
                            )
                        
                     is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned (Ozdemir, page 4, [0044]: “                        
                            θ
                        
                     is the vector of network weights that are learned via training (i.e. by minimizing the loss function                         
                            L
                            
                                
                                    θ
                                
                            
                        
                    )),                         
                            M
                            
                                
                                    p
                                
                            
                        
                     is a function that is 1 in a person region and is 0 at all other regions (Ozdemir, page 4, [0044]: “                        
                            
                                
                                    y
                                
                                ^
                            
                            
                                
                                    x
                                    ;
                                    θ
                                
                            
                            ∈
                            
                                
                                    0,1
                                
                            
                        
                     is the output of the network for pixel x denoting the probability that pixel x is a nodule”), and                         
                            a
                        
                     is a predetermined positive value (Ozdemir, page 4, [0044]: “                        
                            w
                            
                                
                                    x
                                
                            
                            ∈
                            [
                            0,1
                            ]
                        
                     is the weight representing the contribution of the cross-entropy loss associated with pixel x”).
Kawai and Ozdemir are both considered to be analogous to the claimed invention because they are in the same field of image processing and machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the learning image apparatus of Kawai to incorporate the teachings of Ozdemir, as loss functions are known in the art and the “approach de-biases the network from learning only the background pixels that have significantly higher occurrence frequency than that of nodule pixels” (Ozdemir, page 4, [0044]).
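
For illustration only, the weighting recited in the claim can be sketched in Python as follows. The sketch assumes that the term printed as "l" in the claimed expression denotes the constant 1, that L(P) is available as one loss value per pixel, and that the weighted values are averaged over the image. The function name, argument names, and sample values are hypothetical and are not taken from the application, Kawai, or Ozdemir.

```python
def weighted_loss(per_pixel_loss, person_mask, a=1.0):
    """Average of a * (1 + M(p)) * L(p) over all pixels.

    per_pixel_loss: list of non-negative loss values, one per pixel (assumed form of L(P)).
    person_mask:    list of 0/1 values, 1 inside the person region (M(p)).
    a:              predetermined positive value.
    """
    if a <= 0:
        raise ValueError("a must be a positive value")
    if not per_pixel_loss or len(per_pixel_loss) != len(person_mask):
        raise ValueError("loss and mask must be non-empty and of equal length")
    total = 0.0
    for loss, m in zip(per_pixel_loss, person_mask):
        # Assumption: "l" in the claimed expression is read as the constant 1.
        total += a * (1 + m) * loss
    return total / len(per_pixel_loss)


# Hypothetical four-pixel image: the two middle pixels belong to the person region.
losses = [0.2, 0.4, 0.1, 0.3]
mask = [0, 1, 1, 0]
print(weighted_loss(losses, mask, a=0.5))
```

Under these assumptions, a pixel inside the person region is weighted by 2a and every other pixel by a, so errors in the person region contribute twice as much to the total.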

Regarding claim 15, it is rejected under similar reasoning as claim 12 above.

Regarding claim 17, Kawai teaches the computer-implemented method of claim 5 (Kawai, [0012]: “[0012] In addition, according to the present invention, there is provided a learning image generation method executed by a computer, the method including a background image acquisition step of acquiring a background image; a background camera posture information acquisition step of acquiring posture information of a background camera which generates the background image; an object continuous image acquisition step of acquiring an object continuous image including an object; a synthesis position determination step of determining a synthesis position on the background image of the object included in each of a plurality of object still images included in the object continuous image based on the posture information of the background camera; and an image synthesis step of synthesizing the object included in each of the plurality of object still images with the background image to generate a synthesis continuous image, based on the background image, the object continuous image, and the synthesis position determined by the synthesis position determination unit.”).
Kawai fails to disclose “wherein the learner learns the model parameters using a weighted loss function $L_W(p)$
	$L_W(p) = a\,(l + M(p))\,L(P)$
	where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned, $M(p)$ is a function that is 1 in a person region and is 0 at all other regions, and $a$ is a predetermined positive value.”
However, Ozdemir teaches wherein the learner learns the model parameters using a weighted loss function $L_W(p)$ (Ozdemir, page 4, [0044]: “training cost function…our cost function per-batch is a weighted cross-entropy”)
	$L_W(p) = a\,(l + M(p))\,L(P)$ (Ozdemir, page 4, [0044]: “$L_B(\theta) = \sum_{B} \sum_{x} y(x)\,w(x)\log \hat{y}(x;\theta) + (1 - y(x))(1 - w(x))\log(1 - \hat{y}(x;\theta))$”)
where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned (Ozdemir, page 4, [0044]: “$\theta$ is the vector of network weights that are learned via training (i.e. by minimizing the loss function $L(\theta)$)”), $M(p)$ is a function that is 1 in a person region and is 0 at all other regions (Ozdemir, page 4, [0044]: “$\hat{y}(x;\theta) \in (0,1)$ is the output of the network for pixel x denoting the probability that pixel x is a nodule”), and $a$ is a predetermined positive value (Ozdemir, page 4, [0044]: “$w(x) \in [0,1]$ is the weight representing the contribution of the cross-entropy loss associated with pixel x”).
Kawai and Ozdemir are both considered to be analogous to the claimed invention because they are in the same field of image processing and machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the learning image apparatus of Kawai to incorporate the teachings of Ozdemir, as loss functions are known in the art and the “approach de-biases the network from learning only the background pixels that have significantly higher occurrence frequency than that of nodule pixels” (Ozdemir, page 4, [0044]).
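
For illustration only, the per-batch weighted cross-entropy quoted from Ozdemir at [0044] can be sketched in Python as follows. The sketch evaluates the expression exactly as quoted, without a leading minus sign, so the returned value is non-positive; any sign convention applied elsewhere in the reference is not modeled. It also assumes that each network output lies strictly between 0 and 1 so that both logarithms are defined. The function name and the data layout (a batch as a list of images, each image a list of per-pixel tuples) are hypothetical.

```python
import math


def batch_weighted_cross_entropy(batch):
    """Sum over the batch and over pixels x of
    y(x) * w(x) * log(y_hat) + (1 - y(x)) * (1 - w(x)) * log(1 - y_hat).

    batch: list of images; each image is a list of (y, w, y_hat) tuples, where
           y is the 0/1 label, w is a weight in [0, 1], and y_hat is the
           network output for that pixel.
    """
    total = 0.0
    for image in batch:
        for y, w, y_hat in image:
            # Assumption: outputs are strictly inside (0, 1) so log() is defined.
            if not 0.0 < y_hat < 1.0:
                raise ValueError("y_hat must lie strictly between 0 and 1")
            total += y * w * math.log(y_hat) + (1 - y) * (1 - w) * math.log(1 - y_hat)
    return total


# Hypothetical batch: one image with one nodule pixel and one background pixel.
example_batch = [[(1, 0.9, 0.5), (0, 0.9, 0.5)]]
print(batch_weighted_cross_entropy(example_batch))
```

With a weight w(x) close to 1, the term for a nodule pixel dominates and the term for a background pixel is scaled by 1 - w(x), which matches the de-biasing purpose described in the quoted passage.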

Regarding claim 18, Kawai teaches the computer-implemented method of claim 6 (Kawai, [0012]: “[0012] In addition, according to the present invention, there is provided a learning image generation method executed by a computer, the method including a background image acquisition step of acquiring a background image; a background camera posture information acquisition step of acquiring posture information of a background camera which generates the background image; an object continuous image acquisition step of acquiring an object continuous image including an object; a synthesis position determination step of determining a synthesis position on the background image of the object included in each of a plurality of object still images included in the object continuous image based on the posture information of the background camera; and an image synthesis step of synthesizing the object included in each of the plurality of object still images with the background image to generate a synthesis continuous image, based on the background image, the object continuous image, and the synthesis position determined by the synthesis position determination unit.”).
Kawai fails to disclose “wherein the learner learns the model parameters using a weighted loss function $L_W(p)$
	$L_W(p) = a\,(l + M(p))\,L(P)$
	where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned, $M(p)$ is a function that is 1 in a person region and is 0 at all other regions, and $a$ is a predetermined positive value.”
However, Ozdemir teaches wherein the learner learns the model parameters using a weighted loss function $L_W(p)$ (Ozdemir, page 4, [0044]: “training cost function…our cost function per-batch is a weighted cross-entropy”)
	$L_W(p) = a\,(l + M(p))\,L(P)$ (Ozdemir, page 4, [0044]: “$L_B(\theta) = \sum_{B} \sum_{x} y(x)\,w(x)\log \hat{y}(x;\theta) + (1 - y(x))(1 - w(x))\log(1 - \hat{y}(x;\theta))$”)
where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned (Ozdemir, page 4, [0044]: “$\theta$ is the vector of network weights that are learned via training (i.e. by minimizing the loss function $L(\theta)$)”), $M(p)$ is a function that is 1 in a person region and is 0 at all other regions (Ozdemir, page 4, [0044]: “$\hat{y}(x;\theta) \in (0,1)$ is the output of the network for pixel x denoting the probability that pixel x is a nodule”), and $a$ is a predetermined positive value (Ozdemir, page 4, [0044]: “$w(x) \in [0,1]$ is the weight representing the contribution of the cross-entropy loss associated with pixel x”).
Kawai and Ozdemir are both considered to be analogous to the claimed invention because they are in the same field of image processing and machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the learning image apparatus of Kawai to incorporate the teachings of Ozdemir, as loss functions are known in the art and the “approach de-biases the network from learning only the background pixels that have significantly higher occurrence frequency than that of nodule pixels” (Ozdemir, page 4, [0044]).

Regarding claim 19, it is rejected under similar reasoning as claim 17 above. 

Claims 9 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Kawai (US 2021/0117731 A1), in view of Balakrishnan et al. (Synthesizing Images of Humans in Unseen Poses), hereinafter Balakrishnan, as applied to claims 3, 11, 16, and 20 above, and further in view of Ozdemir et al. (US 2019/0122073 A1), hereinafter Ozdemir.

Regarding claim 9, Kawai in view of Balakrishnan teaches the region extraction model learning device according to claim 3 (Kawai, [0054], “[0054] The learning generator apparatus 100 according to the present example embodiment executes a process in the same manner as that of the first example embodiment, but the processing content is specified. This will be described below.”).
Kawai in view of Balakrishnan fails to teach “wherein the learner learns the model parameters using a weighted loss function $L_W(p)$
	$$L_W(p) = a\,(l + M(p))\,L(P)$$
	where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned, $M(p)$ is a function that is 1 in a person region and is 0 at all other regions, and $a$ is a predetermined positive value.”
However, Ozdemir teaches wherein the learner learns the model parameters using a weighted loss function $L_W(p)$ (Ozdemir, page 4, [0044]: “training cost function…our cost function per-batch is a weighted cross-entropy”)
	$$L_W(p) = a\,(l + M(p))\,L(P)$$
(Ozdemir, page 4, [0044]: “$L_B(\theta) = \sum_{B} \sum_{x} \left[ y(x)\,w(x)\log \hat{y}(x;\theta) + (1 - y(x))(1 - w(x))\log(1 - \hat{y}(x;\theta)) \right]$”)
where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned (Ozdemir, page 4, [0044]: “$\theta$ is the vector of network weights that are learned via training (i.e. by minimizing the loss function $L(\theta)$)”), $M(p)$ is a function that is 1 in a person region and is 0 at all other regions (Ozdemir, page 4, [0044]: “$\hat{y}(x;\theta) \in (0,1)$ is the output of the network for pixel x denoting the probability that pixel x is a nodule”), and $a$ is a predetermined positive value (Ozdemir, page 4, [0044]: “$w(x) \in [0,1]$ is the weight representing the contribution of the cross-entropy loss associated with pixel x”).
Kawai, Balakrishnan, and Ozdemir are all considered to be analogous to the claimed invention because they are in the same field of image processing and machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the learning image apparatus of Kawai to incorporate the teachings of Ozdemir, as loss functions are known in the art and the “approach de-biases the network from learning only the background pixels that have significantly higher occurrence frequency than that of nodule pixels” (Ozdemir, page 4, [0044]).
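For illustration only, the weighting recited in claim 9 can be sketched in Python. This is a minimal sketch under stated assumptions, not the applicant's or Ozdemir's implementation: the symbol printed as l in the recited formula is read as the constant 1, L(P) is treated as a per-pixel loss map, the weighted values are summed over all pixels, and the names weighted_loss, per_pixel_loss, and person_mask are hypothetical.

```python
def weighted_loss(per_pixel_loss, person_mask, a=0.5):
    """Sum a * (1 + M(p)) * L(p) over all pixels p.

    Assumptions (not stated in the source): the printed symbol "l" is the
    constant 1, L is available per pixel, and the result is a plain sum.
    per_pixel_loss and person_mask are equally sized 2D lists; the mask
    holds 1 inside the person region and 0 everywhere else.
    """
    if a <= 0:
        raise ValueError("a must be a positive value")
    total = 0.0
    for loss_row, mask_row in zip(per_pixel_loss, person_mask):
        for loss, m in zip(loss_row, mask_row):
            total += a * (1 + m) * loss
    return total


# One background pixel (mask 0) and one person pixel (mask 1).
example = weighted_loss([[1.0, 2.0]], [[0, 1]], a=0.5)
```

Under these assumptions a pixel inside the person region contributes 2a times its loss, while a pixel elsewhere contributes a times its loss.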

Regarding claim 14, Kawai in view of Balakrishnan teaches the program according to claim 11 (Kawai, [0013], “[0013] In addition, according to the present invention, there is provided a program causing a computer to function as: a background image acquisition unit that acquires a background image; a background camera posture information acquisition unit that acquires posture information of a background camera which generates the background image; an object continuous image acquisition unit that acquires an object continuous image including an object; a synthesis position determination unit that determines a synthesis position on the background image of the object included in each of a plurality of object still images included in the object continuous image based on the posture information of the background camera; and an image synthesis unit that synthesizes the object included in each of the plurality of object still images with the background image to generate a synthesis continuous image, based on the background image, the object continuous image, and the synthesis position determined by the synthesis position determination unit. In addition, according to the present invention, there is provided a program causing a computer to function”).
Kawai in view of Balakrishnan fails to teach “wherein the learner learns the model parameters using a weighted loss function $L_W(p)$
	$$L_W(p) = a\,(l + M(p))\,L(P)$$
	where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned, $M(p)$ is a function that is 1 in a person region and is 0 at all other regions, and $a$ is a predetermined positive value.”
However, Ozdemir teaches wherein the learner learns the model parameters using a weighted loss function $L_W(p)$ (Ozdemir, page 4, [0044]: “training cost function…our cost function per-batch is a weighted cross-entropy”)
	$$L_W(p) = a\,(l + M(p))\,L(P)$$
(Ozdemir, page 4, [0044]: “$L_B(\theta) = \sum_{B} \sum_{x} \left[ y(x)\,w(x)\log \hat{y}(x;\theta) + (1 - y(x))(1 - w(x))\log(1 - \hat{y}(x;\theta)) \right]$”)
where $L(P)$ is a loss function defined on the basis of error between a person region extracted from a composited image and a person region in a compositing mask using model parameters being learned (Ozdemir, page 4, [0044]: “$\theta$ is the vector of network weights that are learned via training (i.e. by minimizing the loss function $L(\theta)$)”), $M(p)$ is a function that is 1 in a person region and is 0 at all other regions (Ozdemir, page 4, [0044]: “$\hat{y}(x;\theta) \in (0,1)$ is the output of the network for pixel x denoting the probability that pixel x is a nodule”), and $a$ is a predetermined positive value (Ozdemir, page 4, [0044]: “$w(x) \in [0,1]$ is the weight representing the contribution of the cross-entropy loss associated with pixel x”).
Kawai, Balakrishnan, and Ozdemir are all considered to be analogous to the claimed invention because they are in the same field of image processing and machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the learning image apparatus of Kawai to incorporate the teachings of Ozdemir, as loss functions are known in the art and the “approach de-biases the network from learning only the background pixels that have significantly higher occurrence frequency than that of nodule pixels” (Ozdemir, page 4, [0044]).
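For illustration only, the per-batch cost quoted from Ozdemir at [0044] can be sketched in Python. The sketch evaluates the quoted expression literally and is not Ozdemir's implementation: the function name weighted_cross_entropy and the (y, w, y_hat) tuple layout are hypothetical, and no leading sign or normalization is added because none appears in the quotation as reproduced in this action.

```python
import math


def weighted_cross_entropy(batch):
    """Evaluate sum_B sum_x [ y*w*log(y_hat) + (1-y)*(1-w)*log(1-y_hat) ].

    batch is a list of samples; each sample is a list of (y, w, y_hat)
    tuples, one per pixel x, where y is the ground-truth label (0 or 1),
    w is the per-pixel weight in [0, 1], and y_hat is the network output.
    The tuple layout is an assumption made for this sketch.
    """
    total = 0.0
    for sample in batch:
        for y, w, y_hat in sample:
            if not 0.0 < y_hat < 1.0:
                # log() is undefined at the endpoints, so reject them here.
                raise ValueError("y_hat must lie strictly between 0 and 1")
            total += y * w * math.log(y_hat)
            total += (1 - y) * (1 - w) * math.log(1 - y_hat)
    return total


# One sample with a positive pixel and a negative pixel, both with w = 0.9.
cost = weighted_cross_entropy([[(1, 0.9, 0.5), (0, 0.9, 0.5)]])
```

With w(x) close to 1, the term for pixels labeled 1 dominates the sum and the term for pixels labeled 0 is scaled down by 1 - w(x), which is consistent with the de-biasing rationale quoted above.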

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORGAN RUTH BERASLEY whose telephone number is (571)272-8071. The examiner can normally be reached M-F 8:30-5:00 PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Leonard Chang, can be reached at 571-270-3691. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MORGAN RUTH BERASLEY/ Examiner, Art Unit 2662
/GANDHI THIRUGNANAM/Primary Examiner, Art Unit 2662