Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Claims 1 – 12 are pending in this application. Claims 1, 3 and 11 are independent.

CLAIM INTERPRETATION
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 


Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) use(s) a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Said placeholder is: "…discriminator…" in at least claims 1 – 12.


A review of the specification to determine whether the corresponding structure, material, or acts that perform the claimed function are disclosed shows that the written description fails to link or associate the disclosed structure, material, or acts to the claimed function(s); there is no disclosure (or insufficient disclosure) of structure, material, or acts for performing the claimed function(s).

If Inventor(s) (or (pre-AIA) Applicant(s)) does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, Inventor(s) (or (pre-AIA) Applicant(s)) may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. § 112 (b):
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1 – 12 are rejected under 35 U.S.C. 112(b), as being indefinite for failing to particularly point out and distinctly claim the subject matter which Inventor(s) (or (pre-AIA) Applicant(s)) regards as the invention.
Specifically, regarding claims 1 – 12, the use of said non-structural generic placeholder (i.e., "…discriminator…") coupled with the claimed corresponding functional language invokes 35 U.S.C. 112(f). However, when reviewed from the point of view of one skilled in the relevant art, the written description fails to clearly link or associate the corresponding structure, material, or acts to the claimed functions. Telcordia Techs., Inc. v. Cisco Systems, Inc., 612 F.3d 1365, 1376, 95 USPQ2d 1673, 1682 (Fed. Cir. 2010). In other words, there is no disclosure or there is insufficient disclosure of structure, material, or acts for performing the claimed functions. Donaldson, 16 F.3d at 1195, 29 USPQ2d at 1850. Therefore, claims 1 – 12 are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph, as being indefinite.
Appropriate action is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 2 and 5 – 12 are rejected under 35 U.S.C. 103 as being unpatentable over IKEDA, Hiroo (US-20160132755-A1, hereinafter simply referred to as Ikeda) in view of Yano, Kotaro (US-20190333241-A1, hereinafter simply referred to as Yano).

Regarding independent claims 1 and 11, Ikeda teaches:
A learning method (See at least Ikeda, ¶ [0112], FIG. 12) for learning a discriminator for recognizing a crowd state made up of a plurality of persons from a recognition object image (See at least Ikeda, ¶ [0048, 0050], FIGS. 1 – 6 and 8 – 16, "…The learning local image information storage means 22 stores a size of a crowd patch (local image of a crowd state used for machine learning), and a size of the reference site of a person for the crowd patch…", "…storage means 23 stores designation information on person states for people (which will be denoted as people state control designation below) when synthesizing a plurality of person images in a crowd patch…"), the learning method comprising learning, with use of training data including a crowd state image that is a captured image of the crowd state (See at least Ikeda, ¶ [0048, 0050], FIGS. 1 – 6 and 8 – 16, as quoted above, and further "…The people state control designation is defined per item such as item "arrangement of person" for a people arrangement relationship such as overlapped persons or positional deviation when synthesizing a plurality of person images, item "direction of person" on orientations of persons, or item "number of persons" for the number of persons or density…"), a crowd state label that is a label indicating the crowd state of the image (See at least Ikeda, ¶ [0048, 0050], FIGS. 1 – 6 and 8 – 16, as quoted above), and a crowd position label that is a label indicating information enabling positions of the plurality of persons included in the crowd state image to be specified (See at least Ikeda, ¶ [0048, 0050], FIGS. 1 – 6 and 8 – 16, as quoted above).
Ikeda teaches the subject matter of the claims as mapped above.
However, Ikeda does not expressly disclose the discriminator having the crowd state image as input and the crowd state label and the crowd position label as output.
Nevertheless, Yano teaches the concept of the discriminator having the crowd state image as input and the crowd state label and the crowd position label as output (See at least Yano, ¶ [0107], FIG. 15, "…In FIG. 15, among outputs from a neural network, dense portions by shading represent the positions of head portions of persons in the distribution of the density of persons. For movement vectors, the denser by shading the portion, the greater the amount of travel…The image recognition unit 211 integrates the distribution of the density of persons from the neural network to acquire the number of persons in each block. In addition, an average movement vector of the block is acquired by averaging movement vectors from the neural network…").
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the known and similar device of Ikeda with the known technique disclosed in the device of Yano, namely the discriminator having the crowd state image as input and the crowd state label and the crowd position label as output.

Regarding dependent claims 2 and 12, Ikeda modified by Yano above teaches:
wherein the discriminator is formed by a neural network (See at least Yano, ¶ [0107], FIG. 15, #neural network), and has a common network (See at least Yano, ¶ [0107], FIG. 15, #neural network) common to the crowd state and the crowd position on an input side of the neural network (See at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15, "…The arithmetic processing device 11 controls an operation of the people flow analysis apparatus 10, and, for example, executes a program stored in the storage device 12…", "…features of histograms of oriented gradients are extracted from an image…using a model obtained by learning…", "…In FIG. 15, among outputs from a neural network, dense portions by shading represent the positions of head portions of persons in the distribution of the density of persons. For movement vectors, the denser by shading the portion, the greater the amount of travel…The image recognition unit 211 integrates the distribution of the density of persons from the neural network to acquire the number of persons in each block. In addition, an average movement vector of the block is acquired by averaging movement vectors from the neural network…") and independent networks (See at least Yano, ¶ [0107], FIG. 15, #density of persons, #movement vector) independently provided respectively for the crowd state and the crowd position on an output side of the neural network (See at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15, as quoted above).
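
For illustration only, the recited arrangement of a common network on the input side feeding independent networks on the output side can be sketched as below. This is a minimal sketch under stated assumptions: the layer sizes, the fixed weights, and the names dense, relu, softmax and discriminate are hypothetical and are not taken from the application, Ikeda, or Yano.

```python
import math

def dense(x, w, b):
    """Fully connected layer: y[j] = sum_i x[i] * w[i][j] + b[j]."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

# Hypothetical fixed weights, chosen only so the example runs.
COMMON_W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]  # 4 inputs -> 2 shared features
COMMON_B = [0.0, 0.0]
STATE_W = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]                # 2 features -> 3 crowd-state classes
STATE_B = [0.0, 0.0, 0.0]
POS_W = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]          # 2 features -> 4 position cells
POS_B = [0.0, 0.0, 0.0, 0.0]

def discriminate(pixels):
    """Run one shared trunk, then two independent output heads."""
    features = relu(dense(pixels, COMMON_W, COMMON_B))   # common network (input side)
    state = softmax(dense(features, STATE_W, STATE_B))   # independent network: crowd state
    position = dense(features, POS_W, POS_B)             # independent network: crowd position
    return state, position

state, position = discriminate([0.1, 0.2, 0.3, 0.4])
```

In this sketch both heads read the same shared features, so the crowd state output and the crowd position output are computed from one pass through the common network.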

Claim(s) 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Yano, Kotaro (US-20190333241-A1, hereinafter simply referred to as Yano).

Regarding independent claim 3, Yano teaches:
A crowd state recognition device (See at least Yano, ¶ [0028, 0107], FIGS. 1 – 5, 7 – 13 and 15, "…people flow analysis apparatus 10…", "…In FIG. 15, among outputs from a neural network, dense portions by shading represent the positions of head portions of persons in the distribution of the density of persons. For movement vectors, the denser by shading the portion, the greater the amount of travel…The image recognition unit 211 integrates the distribution of the density of persons from the neural network to acquire the number of persons in each block. In addition, an average movement vector of the block is acquired by averaging movement vectors from the neural network…") comprising: a dictionary storage unit (i.e., storage merely "…implemented, for example, by a magnetic disk or the like." according to para. [0071] of Applicant's PG Pub) (See at least Yano, ¶ [0029, 0107], FIGS. 1 – 5, 7 – 13 and 15, "…storage device 12…", and ¶ [0107] as quoted above) which stores a dictionary (i.e., merely "…a dictionary obtained as a result of learning..." according to para. [0082] of Applicant's PG Pub, which is seen to correspond to "…a model obtained by learning…" as taught in Yano, ¶ [0053]) represented by a network structure forming a neural network (See at least Yano, ¶ [0107], FIG. 15, #neural network) and a weight (See at least Yano, ¶ [0107], FIG. 15, #density of persons) and a bias (See at least Yano, ¶ [0107], FIG. 15, #movement vector) of a network (See at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15, "…The arithmetic processing device 11 controls an operation of the people flow analysis apparatus 10, and, for example, executes a program stored in the storage device 12…", "…features of histograms of oriented gradients are extracted from an image…using a model obtained by learning…", and ¶ [0107] as quoted above), as a dictionary used in a discriminator for recognizing (See at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15, as quoted above); wherein the hardware processor is configured to execute the software code to recognize a crowd state (See at least Yano, ¶ [0107], FIG. 15, #density of persons) and a crowd position (See at least Yano, ¶ [0107], FIG. 15, #movement vector) from a recognition object image (See at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15, as quoted above), using a discriminator that is based on the dictionary stored in the dictionary storage unit (See at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15, as quoted above), and has a common network common to the crowd state and the crowd position on an input side of the neural network (See at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15, as quoted above) and independent networks (See at least Yano, ¶ [0107], FIG. 15, #density of persons, #movement vector) independently provided respectively for the crowd state and the crowd position on an output side of the neural network (See at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15, as quoted above).
Yano teaches the subject matter of the claim as mapped above. However, the teachings appear in separate embodiments.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the separate embodiments of Yano for the desirable and advantageous purpose of providing a people flow analysis apparatus and a people flow analysis system that can appropriately count the number of target objects present in a certain region even when images captured by a plurality of image capturing devices are used, as discussed in Yano (See ¶ [0005]), thereby helping to improve overall system robustness.

Regarding dependent claim 4, Yano teaches:
a network selection information storage unit (See at least Yano, ¶ [0029, 0107], FIGS. 1 – 5, 7 – 13 and 15, "…storage device 12…") which stores network selection information (See at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15, "…The arithmetic processing device 11 controls an operation of the people flow analysis apparatus 10, and, for example, executes a program stored in the storage device 12…", "…features of histograms of oriented gradients are extracted from an image…using a model obtained by learning…", "…In FIG. 15, among outputs from a neural network, dense portions by shading represent the positions of head portions of persons in the distribution of the density of persons. For movement vectors, the denser by shading the portion, the greater the amount of travel…The image recognition unit 211 integrates the distribution of the density of persons from the neural network to acquire the number of persons in each block. In addition, an average movement vector of the block is acquired by averaging movement vectors from the neural network…"), wherein the hardware processor is configured to execute the software code to select a common network (See at least Yano, ¶ [0107], FIG. 15, #neural network) and an independent network (See at least Yano, ¶ [0107], FIG. 15, #density of persons, #movement vector) used in the discriminator based on the information stored in the network selection information storage unit (See at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15, as quoted above), and selectively recognize only the crowd state or both the crowd state and the crowd position (See at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15, as quoted above).

Regarding dependent claim 5, Ikeda modified by Yano above teaches:
wherein the discriminator has a plurality of types of crowd positions as output (See at least Ikeda, ¶ [0048, 0050], FIGS. 1 – 6 and 8 – 16, "…The learning local image information storage means 22 stores a size of a crowd patch (local image of a crowd state used for machine learning), and a size of the reference site of a person for the crowd patch…", "…storage means 23 stores designation information on person states for people (which will be denoted as people state control designation below) when synthesizing a plurality of person images in a crowd patch…The people state control designation is defined per item such as item "arrangement of person" for a people arrangement relationship such as overlapped persons or positional deviation when synthesizing a plurality of person images, item "direction of person" on orientations of persons, or item "number of persons" for the number of persons or density…". Also, see at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15), and has independent networks (See at least Yano, ¶ [0107], FIG. 15, #density of persons, #movement vector) independently provided respectively for the plurality of types of crowd positions (See at least Yano, ¶ [0107], FIG. 15, #movement vector (horizontal direction), #movement vector (vertical direction)) to be output (See at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15, "…The arithmetic processing device 11 controls an operation of the people flow analysis apparatus 10, and, for example, executes a program stored in the storage device 12…", "…features of histograms of oriented gradients are extracted from an image…using a model obtained by learning…", "…In FIG. 15, among outputs from a neural network, dense portions by shading represent the positions of head portions of persons in the distribution of the density of persons. For movement vectors, the denser by shading the portion, the greater the amount of travel…The image recognition unit 211 integrates the distribution of the density of persons from the neural network to acquire the number of persons in each block. In addition, an average movement vector of the block is acquired by averaging movement vectors from the neural network…").

Regarding dependent claim 6, Ikeda modified by Yano above teaches:
wherein the hardware processor is configured to execute the software code to generate, based on a generation instruction to generate the crowd state image (See at least Ikeda, ¶ [0011, 0048, 0050], FIGS. 1 – 6 and 8 – 16, "…there is assumed a method using a discriminator with a learned dictionary in order to recognize a crowd state in an image. The dictionary is learned by training data such as images indicating crowd states…", "…The learning local image information storage means 22 stores a size of a crowd patch (local image of a crowd state used for machine learning), and a size of the reference site of a person for the crowd patch…", "…storage means 23 stores designation information on person states for people (which will be denoted as people state control designation below) when synthesizing a plurality of person images in a crowd patch…The people state control designation is defined per item such as item "arrangement of person" for a people arrangement relationship such as overlapped persons or positional deviation when synthesizing a plurality of person images, item "direction of person" on orientations of persons, or item "number of persons" for the number of persons or density…". Also, see at least Yano, ¶ [0029, 0053, 0107], FIGS. 1 – 5, 7 – 13 and 15), the crowd state image, the crowd state label, and the crowd position label (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057], FIGS. 1 – 6 and 8 – 16, as quoted above, and further "…the training data generating device 10 creates a plurality of pairs of local image of a crowd state and training label corresponding to the local image…", "…means 23 stores therein the people state control designation and the presence of a designated training label defined by the operator for at least the items "arrangement of person," "direction of person," and "number of persons…". Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0107], FIGS. 1 – 5, 7 – 13 and 15), from information of background images, information of person images, and information of person region images of regions in which respective persons are captured in the person images (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076], FIGS. 1 – 6 and 8 – 16, as quoted above, and further "…The background extraction means 11 selects a background image from the group of background images stored in the background image storage means 21…calculates an aspect ratio of the crowd patch size stored in the learning local image information storage means 22…temporarily extracts a background at a proper position and a proper size to meet the aspect ratio from the selected background image…enlarges or downsizes the temporarily-extracted background to match with the crowd patch size stored in the learning local image information storage means 22…". Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0107], FIGS. 1 – 5, 7 – 13 and 15).

Regarding dependent claim 7, Ikeda modified by Yano above teaches:
wherein the hardware processor is configured to execute the software code to generate the crowd state label based on the generation instruction (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076], FIGS. 1 – 6 and 8 – 16, "…there is assumed a method using a discriminator with a learned dictionary in order to recognize a crowd state in an image. The dictionary is learned by training data such as images indicating crowd states…", "…the training data generating device 10 creates a plurality of pairs of local image of a crowd state and training label corresponding to the local image…", "…The learning local image information storage means 22 stores a size of a crowd patch (local image of a crowd state used for machine learning), and a size of the reference site of a person for the crowd patch…", "…storage means 23 stores designation information on person states for people (which will be denoted as people state control designation below) when synthesizing a plurality of person images in a crowd patch…The people state control designation is defined per item such as item "arrangement of person" for a people arrangement relationship such as overlapped persons or positional deviation when synthesizing a plurality of person images, item "direction of person" on orientations of persons, or item "number of persons" for the number of persons or density…", "…means 23 stores therein the people state control designation and the presence of a designated training label defined by the operator for at least the items "arrangement of person," "direction of person," and "number of persons…", "…The background extraction means 11 selects a background image from the group of background images stored in the background image storage means 21…calculates an aspect ratio of the crowd patch size stored in the learning local image information storage means 22…temporarily extracts a background at a proper position and a proper size to meet the aspect ratio from the selected background 
image…enlarges or downsizes the temporarily-extracted background to match with the crowd patch size stored in the learning local image information storage means 22…" Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0107], FIGS. 1 – 5, 7 – 13 and 15), and generate the crowd position label (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076], FIGS. 1 – 6 and 8 – 16, "…there is assumed a method using a discriminator with a learned dictionary in order to recognize a crowd state in an image. The dictionary is learned by training data such as images indicating crowd states…", "…the training data generating device 10 creates a plurality of pairs of local image of a crowd state and training label corresponding to the local image…", "…The learning local image information storage means 22 stores a size of a crowd patch (local image of a crowd state used for machine learning), and a size of the reference site of a person for the crowd patch…", "…storage means 23 stores designation information on person states for people (which will be denoted as people state control designation below) when synthesizing a plurality of person images in a crowd patch…The people state control designation is defined per item such as item "arrangement of person" for a people arrangement relationship such as overlapped persons or positional deviation when synthesizing a plurality of person images, item "direction of person" on orientations of persons, or item "number of persons" for the number of persons or density…", "…means 23 stores therein the people state control designation and the presence of a designated training label defined by the operator for at least the items "arrangement of person," "direction of person," and "number of persons…", "…The background extraction means 11 selects a background image from the group of background images stored in the background image storage means 21…calculates an aspect ratio of the crowd patch size stored in the learning local image 
information storage means 22…temporarily extracts a background at a proper position and a proper size to meet the aspect ratio from the selected background image…enlarges or downsizes the temporarily-extracted background to match with the crowd patch size stored in the learning local image information storage means 22…", "…The search window storage means 51 stores a group of rectangular regions indicating portions to be recognized for a crowd state on an image…The group of rectangular regions may be set by defining a changed size of a crowd patch depending on a position on an image based on the camera parameters indicating position, posture, focal distance and lens distortion of the image acquisition device 3 and the size of the reference site corresponding to the crowd patch size (the size of the reference site stored in the learning local image information storage means 22)…The group of rectangular regions may be set to cover the positions on the image…may be set to be overlapped." Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0107], FIGS. 1 – 5, 7 – 13 and 15).

Regarding dependent claim 8, Ikeda modified by Yano above teaches:
wherein information of each person region image includes the person region image and position information of a person in the person region image (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076, 0313], FIGS. 1 – 6 and 8 – 16, "…there is assumed a method using a discriminator with a learned dictionary in order to recognize a crowd state in an image. The dictionary is learned by training data such as images indicating crowd states…", "…the training data generating device 10 creates a plurality of pairs of local image of a crowd state and training label corresponding to the local image…", "…The learning local image information storage means 22 stores a size of a crowd patch (local image of a crowd state used for machine learning), and a size of the reference site of a person for the crowd patch…", "…storage means 23 stores designation information on person states for people (which will be denoted as people state control designation below) when synthesizing a plurality of person images in a crowd patch…The people state control designation is defined per item such as item "arrangement of person" for a people arrangement relationship such as overlapped persons or positional deviation when synthesizing a plurality of person images, item "direction of person" on orientations of persons, or item "number of persons" for the number of persons or density…", "…means 23 stores therein the people state control designation and the presence of a designated training label defined by the operator for at least the items "arrangement of person," "direction of person," and "number of persons…", "…The background extraction means 11 selects a background image from the group of background images stored in the background image storage means 21…calculates an aspect ratio of the crowd patch size stored in the learning local image information storage means 22…temporarily extracts a background at a proper position and a proper size to meet the aspect ratio from the selected background 
image…enlarges or downsizes the temporarily-extracted background to match with the crowd patch size stored in the learning local image information storage means 22…", "…The search window storage means 51 stores a group of rectangular regions indicating portions to be recognized for a crowd state on an image…The group of rectangular regions may be set by defining a changed size of a crowd patch depending on a position on an image based on the camera parameters indicating position, posture, focal distance and lens distortion of the image acquisition device 3 and the size of the reference site corresponding to the crowd patch size (the size of the reference site stored in the learning local image information storage means 22)…The group of rectangular regions may be set to cover the positions on the image…may be set to be overlapped.", "…the present invention may be used for outputting a recognition result of a crowd state in an image together with a position (2D position or 3D position) of the crowd to other system. Furthermore, the present invention can be used for acquiring a recognition result of a crowd state in an image and a position (2D position or 3D position) of the crowd and making video search with the acquisition as a trigger…" Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0107], FIGS. 1 – 5, 7 – 13 and 15), and the position information of the person includes at least one of a center position of the person (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076, 0313], FIGS. 1 – 6 and 8 – 16, "…there is assumed a method using a discriminator with a learned dictionary in order to recognize a crowd state in an image. 
The dictionary is learned by training data such as images indicating crowd states…", "…the training data generating device 10 creates a plurality of pairs of local image of a crowd state and training label corresponding to the local image…", "…The learning local image information storage means 22 stores a size of a crowd patch (local image of a crowd state used for machine learning), and a size of the reference site of a person for the crowd patch…", "…storage means 23 stores designation information on person states for people (which will be denoted as people state control designation below) when synthesizing a plurality of person images in a crowd patch…The people state control designation is defined per item such as item "arrangement of person" for a people arrangement relationship such as overlapped persons or positional deviation when synthesizing a plurality of person images, item "direction of person" on orientations of persons, or item "number of persons" for the number of persons or density…", "…means 23 stores therein the people state control designation and the presence of a designated training label defined by the operator for at least the items "arrangement of person," "direction of person," and "number of persons…", "…The background extraction means 11 selects a background image from the group of background images stored in the background image storage means 21…calculates an aspect ratio of the crowd patch size stored in the learning local image information storage means 22…temporarily extracts a background at a proper position and a proper size to meet the aspect ratio from the selected background image…enlarges or downsizes the temporarily-extracted background to match with the crowd patch size stored in the learning local image information storage means 22…", "…The search window storage means 51 stores a group of rectangular regions indicating portions to be recognized for a crowd state on an image…The group of rectangular regions may be set by 
defining a changed size of a crowd patch depending on a position on an image based on the camera parameters indicating position, posture, focal distance and lens distortion of the image acquisition device 3 and the size of the reference site corresponding to the crowd patch size (the size of the reference site stored in the learning local image information storage means 22)…The group of rectangular regions may be set to cover the positions on the image…may be set to be overlapped.", "…the present invention may be used for outputting a recognition result of a crowd state in an image together with a position (2D position or 3D position) of the crowd to other system. Furthermore, the present invention can be used for acquiring a recognition result of a crowd state in an image and a position (2D position or 3D position) of the crowd and making video search with the acquisition as a trigger…" Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0107], FIGS. 1 – 5, 7 – 13 and 15), a rectangle enclosing the person (See at least Ikeda, ¶ [0102], FIGS. 8 – 11), a center position of a head of the person (See at least Ikeda, ¶ [0054, 0102], FIGS. 8 – 11), and a rectangle enclosing the head (See at least Ikeda, ¶ [0102], FIGS. 8 – 11) (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076, 0313], FIGS. 1 – 6 and 8 – 16, "…there is assumed a method using a discriminator with a learned dictionary in order to recognize a crowd state in an image. 
The dictionary is learned by training data such as images indicating crowd states…", "…the training data generating device 10 creates a plurality of pairs of local image of a crowd state and training label corresponding to the local image…", "…The learning local image information storage means 22 stores a size of a crowd patch (local image of a crowd state used for machine learning), and a size of the reference site of a person for the crowd patch…", "…storage means 23 stores designation information on person states for people (which will be denoted as people state control designation below) when synthesizing a plurality of person images in a crowd patch…The people state control designation is defined per item such as item "arrangement of person" for a people arrangement relationship such as overlapped persons or positional deviation when synthesizing a plurality of person images, item "direction of person" on orientations of persons, or item "number of persons" for the number of persons or density…", "…means 23 stores therein the people state control designation and the presence of a designated training label defined by the operator for at least the items "arrangement of person," "direction of person," and "number of persons…", "…The background extraction means 11 selects a background image from the group of background images stored in the background image storage means 21…calculates an aspect ratio of the crowd patch size stored in the learning local image information storage means 22…temporarily extracts a background at a proper position and a proper size to meet the aspect ratio from the selected background image…enlarges or downsizes the temporarily-extracted background to match with the crowd patch size stored in the learning local image information storage means 22…", "…The search window storage means 51 stores a group of rectangular regions indicating portions to be recognized for a crowd state on an image…The group of rectangular regions may be set by 
defining a changed size of a crowd patch depending on a position on an image based on the camera parameters indicating position, posture, focal distance and lens distortion of the image acquisition device 3 and the size of the reference site corresponding to the crowd patch size (the size of the reference site stored in the learning local image information storage means 22)…The group of rectangular regions may be set to cover the positions on the image…may be set to be overlapped.", "…the present invention may be used for outputting a recognition result of a crowd state in an image together with a position (2D position or 3D position) of the crowd to other system. Furthermore, the present invention can be used for acquiring a recognition result of a crowd state in an image and a position (2D position or 3D position) of the crowd and making video search with the acquisition as a trigger…" Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0107], FIGS. 1 – 5, 7 – 13 and 15).

Regarding dependent claim 9, Ikeda modified by Yano above teaches:
wherein the hardware processor is configured to execute the software code to generate, based on the generation instruction and the person region images, an image of a crowd region made up of person regions of a plurality of persons for the crowd state image (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076, 0313], FIGS. 1 – 6 and 8 – 16, "…there is assumed a method using a discriminator with a learned dictionary in order to recognize a crowd state in an image. The dictionary is learned by training data such as images indicating crowd states…", "…the training data generating device 10 creates a plurality of pairs of local image of a crowd state and training label corresponding to the local image…", "…The learning local image information storage means 22 stores a size of a crowd patch (local image of a crowd state used for machine learning), and a size of the reference site of a person for the crowd patch…", "…storage means 23 stores designation information on person states for people (which will be denoted as people state control designation below) when synthesizing a plurality of person images in a crowd patch…The people state control designation is defined per item such as item "arrangement of person" for a people arrangement relationship such as overlapped persons or positional deviation when synthesizing a plurality of person images, item "direction of person" on orientations of persons, or item "number of persons" for the number of persons or density…", "…means 23 stores therein the people state control designation and the presence of a designated training label defined by the operator for at least the items "arrangement of person," "direction of person," and "number of persons…", "…The background extraction means 11 selects a background image from the group of background images stored in the background image storage means 21…calculates an aspect ratio of the crowd patch size stored in the learning local image information storage means 22…temporarily 
extracts a background at a proper position and a proper size to meet the aspect ratio from the selected background image…enlarges or downsizes the temporarily-extracted background to match with the crowd patch size stored in the learning local image information storage means 22…", "…The search window storage means 51 stores a group of rectangular regions indicating portions to be recognized for a crowd state on an image…The group of rectangular regions may be set by defining a changed size of a crowd patch depending on a position on an image based on the camera parameters indicating position, posture, focal distance and lens distortion of the image acquisition device 3 and the size of the reference site corresponding to the crowd patch size (the size of the reference site stored in the learning local image information storage means 22)…The group of rectangular regions may be set to cover the positions on the image…may be set to be overlapped.", "…the present invention may be used for outputting a recognition result of a crowd state in an image together with a position (2D position or 3D position) of the crowd to other system. Furthermore, the present invention can be used for acquiring a recognition result of a crowd state in an image and a position (2D position or 3D position) of the crowd and making video search with the acquisition as a trigger…" Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0107], FIGS. 1 – 5, 7 – 13 and 15), divide the generated image of the crowd region into predetermined N×N equal regions (See at least Ikeda, ¶ [0094], FIGS. 1 – 6 and 8 – 16. Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0072, 0073, 0107], FIGS. 1 – 5, 7 – 13 and 15), calculate average luminance for each divided region (See at least Ikeda, ¶ [0118], FIGS. 1 – 6 and 8 – 16), and set calculated luminance values of N×N dimensions as the crowd position label (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076, 0094, 0313], FIGS. 
1 – 6 and 8 – 16, "…there is assumed a method using a discriminator with a learned dictionary in order to recognize a crowd state in an image. The dictionary is learned by training data such as images indicating crowd states…", "…the training data generating device 10 creates a plurality of pairs of local image of a crowd state and training label corresponding to the local image…", "…The learning local image information storage means 22 stores a size of a crowd patch (local image of a crowd state used for machine learning), and a size of the reference site of a person for the crowd patch…", "…storage means 23 stores designation information on person states for people (which will be denoted as people state control designation below) when synthesizing a plurality of person images in a crowd patch…The people state control designation is defined per item such as item "arrangement of person" for a people arrangement relationship such as overlapped persons or positional deviation when synthesizing a plurality of person images, item "direction of person" on orientations of persons, or item "number of persons" for the number of persons or density…", "…means 23 stores therein the people state control designation and the presence of a designated training label defined by the operator for at least the items "arrangement of person," "direction of person," and "number of persons…", "…The background extraction means 11 selects a background image from the group of background images stored in the background image storage means 21…calculates an aspect ratio of the crowd patch size stored in the learning local image information storage means 22…temporarily extracts a background at a proper position and a proper size to meet the aspect ratio from the selected background image…enlarges or downsizes the temporarily-extracted background to match with the crowd patch size stored in the learning local image information storage means 22…", "…The search window storage means 51 stores a 
group of rectangular regions indicating portions to be recognized for a crowd state on an image…The group of rectangular regions may be set by defining a changed size of a crowd patch depending on a position on an image based on the camera parameters indicating position, posture, focal distance and lens distortion of the image acquisition device 3 and the size of the reference site corresponding to the crowd patch size (the size of the reference site stored in the learning local image information storage means 22)…The group of rectangular regions may be set to cover the positions on the image…may be set to be overlapped.", "…means 14 may divide a person image read from the person image storage means 25 into the region of a person and the region of other than the person based on a person region image corresponding to the person image, may weight the region of the person and the region of other than the person, and may blend and synthesize the person image based on the weights…", "…the present invention may be used for outputting a recognition result of a crowd state in an image together with a position (2D position or 3D position) of the crowd to other system. Furthermore, the present invention can be used for acquiring a recognition result of a crowd state in an image and a position (2D position or 3D position) of the crowd and making video search with the acquisition as a trigger…" Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0072, 0073, 0107], FIGS. 1 – 5, 7 – 13 and 15).
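
For illustration only, the labeling operation recited in claim 9 (dividing the crowd region image into equal regions on an N-by-N grid, calculating the average luminance of each divided region, and using the resulting values as the crowd position label) can be sketched as follows. This is a minimal sketch and is not taken from the application, Ikeda, or Yano; the function name `crowd_position_label`, the grayscale list-of-rows input, the row-major ordering of the label, and the requirement that the image height and width be divisible by N are assumptions made for the example.

```python
from typing import List


def crowd_position_label(region: List[List[float]], n: int) -> List[float]:
    """Return an n*n-dimensional label of per-cell average luminance.

    Hypothetical helper: `region` is a grayscale crowd-region image given as
    rows of luminance values; cells are visited in row-major order.
    """
    height = len(region)
    width = len(region[0]) if height else 0
    if n <= 0 or height == 0 or width == 0 or height % n or width % n:
        raise ValueError("image must be non-empty and divisible into n x n equal cells")

    cell_h, cell_w = height // n, width // n
    label: List[float] = []
    for row in range(n):
        for col in range(n):
            # Sum the luminance over one of the n x n equal regions.
            total = 0.0
            for y in range(row * cell_h, (row + 1) * cell_h):
                for x in range(col * cell_w, (col + 1) * cell_w):
                    total += region[y][x]
            # The average luminance of the cell is one element of the label.
            label.append(total / (cell_h * cell_w))
    return label
```

Under these assumptions, a 4-by-4 image whose upper-left quadrant is fully bright (255) and whose remaining pixels are 0 yields, for N = 2, a four-element label in which only the first element is nonzero.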

Regarding dependent claim 10, Ikeda modified by Yano above teaches:
wherein the hardware processor is configured to execute the software code to generate, based on the generation instruction, the person region images (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076, 0094, 0313], FIGS. 1 – 6 and 8 – 16, "…there is assumed a method using a discriminator with a learned dictionary in order to recognize a crowd state in an image. The dictionary is learned by training data such as images indicating crowd states…", "…the training data generating device 10 creates a plurality of pairs of local image of a crowd state and training label corresponding to the local image…", "…The learning local image information storage means 22 stores a size of a crowd patch (local image of a crowd state used for machine learning), and a size of the reference site of a person for the crowd patch…", "…storage means 23 stores designation information on person states for people (which will be denoted as people state control designation below) when synthesizing a plurality of person images in a crowd patch…The people state control designation is defined per item such as item "arrangement of person" for a people arrangement relationship such as overlapped persons or positional deviation when synthesizing a plurality of person images, item "direction of person" on orientations of persons, or item "number of persons" for the number of persons or density…", "…means 23 stores therein the people state control designation and the presence of a designated training label defined by the operator for at least the items "arrangement of person," "direction of person," and "number of persons…", "…The background extraction means 11 selects a background image from the group of background images stored in the background image storage means 21…calculates an aspect ratio of the crowd patch size stored in the learning local image information storage means 22…temporarily extracts a background at a proper position and a proper size to meet the aspect ratio from the selected 
background image…enlarges or downsizes the temporarily-extracted background to match with the crowd patch size stored in the learning local image information storage means 22…", "…The search window storage means 51 stores a group of rectangular regions indicating portions to be recognized for a crowd state on an image…The group of rectangular regions may be set by defining a changed size of a crowd patch depending on a position on an image based on the camera parameters indicating position, posture, focal distance and lens distortion of the image acquisition device 3 and the size of the reference site corresponding to the crowd patch size (the size of the reference site stored in the learning local image information storage means 22)…The group of rectangular regions may be set to cover the positions on the image…may be set to be overlapped.", "…means 14 may divide a person image read from the person image storage means 25 into the region of a person and the region of other than the person based on a person region image corresponding to the person image, may weight the region of the person and the region of other than the person, and may blend and synthesize the person image based on the weights…", "…the present invention may be used for outputting a recognition result of a crowd state in an image together with a position (2D position or 3D position) of the crowd to other system. Furthermore, the present invention can be used for acquiring a recognition result of a crowd state in an image and a position (2D position or 3D position) of the crowd and making video search with the acquisition as a trigger…" Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0072, 0073, 0107], FIGS. 1 – 5, 7 – 13 and 15), and information indicating a head rectangle added to each of the person region images (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076, 0094, 0313], FIGS. 
1 – 6 and 8 – 16, "…there is assumed a method using a discriminator with a learned dictionary in order to recognize a crowd state in an image. The dictionary is learned by training data such as images indicating crowd states…", "…the training data generating device 10 creates a plurality of pairs of local image of a crowd state and training label corresponding to the local image…", "…The learning local image information storage means 22 stores a size of a crowd patch (local image of a crowd state used for machine learning), and a size of the reference site of a person for the crowd patch…", "…storage means 23 stores designation information on person states for people (which will be denoted as people state control designation below) when synthesizing a plurality of person images in a crowd patch…The people state control designation is defined per item such as item "arrangement of person" for a people arrangement relationship such as overlapped persons or positional deviation when synthesizing a plurality of person images, item "direction of person" on orientations of persons, or item "number of persons" for the number of persons or density…", "…means 23 stores therein the people state control designation and the presence of a designated training label defined by the operator for at least the items "arrangement of person," "direction of person," and "number of persons…", "…The background extraction means 11 selects a background image from the group of background images stored in the background image storage means 21…calculates an aspect ratio of the crowd patch size stored in the learning local image information storage means 22…temporarily extracts a background at a proper position and a proper size to meet the aspect ratio from the selected background image…enlarges or downsizes the temporarily-extracted background to match with the crowd patch size stored in the learning local image information storage means 22…", "…The search window storage means 51 stores a 
group of rectangular regions indicating portions to be recognized for a crowd state on an image…The group of rectangular regions may be set by defining a changed size of a crowd patch depending on a position on an image based on the camera parameters indicating position, posture, focal distance and lens distortion of the image acquisition device 3 and the size of the reference site corresponding to the crowd patch size (the size of the reference site stored in the learning local image information storage means 22)…The group of rectangular regions may be set to cover the positions on the image…may be set to be overlapped.", "…means 14 may divide a person image read from the person image storage means 25 into the region of a person and the region of other than the person based on a person region image corresponding to the person image, may weight the region of the person and the region of other than the person, and may blend and synthesize the person image based on the weights…", "…the present invention may be used for outputting a recognition result of a crowd state in an image together with a position (2D position or 3D position) of the crowd to other system. Furthermore, the present invention can be used for acquiring a recognition result of a crowd state in an image and a position (2D position or 3D position) of the crowd and making video search with the acquisition as a trigger…" Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0072, 0073, 0107], FIGS. 1 – 5, 7 – 13 and 15), an image of a crowd region made up of head regions of a plurality of persons for the crowd state image (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076, 0094, 0313], FIGS. 1 – 6 and 8 – 16, at the passages quoted above. Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0072, 0073, 0107], FIGS. 1 – 5, 7 – 13 and 15), divide the generated image of the crowd region into predetermined N × N equal regions (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076, 0094, 0313], FIGS. 1 – 6 and 8 – 16, at the passages quoted above. Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0072, 0073, 0107], FIGS. 1 – 5, 7 – 13 and 15), calculate average luminance for each divided region (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076, 0094, 0313], FIGS. 1 – 6 and 8 – 16, at the passages quoted above. Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0072, 0073, 0107], FIGS. 1 – 5, 7 – 13 and 15), and sets calculated luminance values of N × N dimensions as the crowd position label (See at least Ikeda, ¶ [0011, 0044, 0048, 0050, 0057, 0076, 0094, 0313], FIGS. 1 – 6 and 8 – 16, at the passages quoted above. Also, see at least Yano, ¶ [0006, 0029, 0053, 0054, 0071, 0072, 0073, 0107], FIGS. 1 – 5, 7 – 13 and 15).
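For illustration only, the last three recited limitations (dividing the crowd region image into equal regions, calculating average luminance for each divided region, and setting the resulting values as the crowd position label) may be sketched as follows. The sketch is not taken from the application, Ikeda, or Yano. The function name crowd_position_label, the grayscale list-of-rows image representation, the row-by-row ordering of the label values, and the requirement that the image dimensions be exact multiples of N are all assumptions made for the example.

```python
from typing import List, Sequence


def crowd_position_label(region: Sequence[Sequence[float]], n: int) -> List[float]:
    """Return n*n average-luminance values for a grayscale crowd-region image.

    Hypothetical helper: `region` is a list of pixel rows whose height and
    width are assumed to be exact multiples of `n`.
    """
    height, width = len(region), len(region[0])
    if height % n or width % n:
        raise ValueError("image height and width must be divisible by n")
    cell_h, cell_w = height // n, width // n

    label: List[float] = []
    # Visit the n x n equal regions row by row (ordering is an assumption).
    for cell_row in range(n):
        for cell_col in range(n):
            total = 0.0
            for y in range(cell_row * cell_h, (cell_row + 1) * cell_h):
                for x in range(cell_col * cell_w, (cell_col + 1) * cell_w):
                    total += region[y][x]
            # Average luminance of this divided region.
            label.append(total / (cell_h * cell_w))
    return label


# Toy 4x4 crowd-region image: 255 marks head-region pixels, 0 marks the rest.
toy_region = [
    [255, 255, 0, 0],
    [255, 255, 0, 0],
    [0, 0, 0, 255],
    [0, 0, 0, 0],
]
toy_label = crowd_position_label(toy_region, 2)
# toy_label == [255.0, 0.0, 0.0, 63.75]
```

In this toy case N is 2, so the label has 2 × 2 = 4 values; a region fully covered by head pixels yields 255.0 and a region with one head pixel out of four yields 63.75.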

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure. See the Notice of References Cited (PTO–892).
Any inquiry concerning this communication or earlier communications from the Examiner should be directed to IDOWU O OSIFADE, whose telephone number is (571) 272-0864. The Examiner can normally be reached Monday through Friday, 8:00 am to 5:00 pm EST.
Examiner interviews are available via telephone, in person, and by video conference using a USPTO-supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s Supervisor, Kim Vu, can be reached at (571) 272-3859. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. 
Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/IDOWU O OSIFADE/
Primary Examiner, Art Unit 2666