Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-16 are pending.
Drawings as filed are accepted.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2-6 and 10-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
As to claims 2 and 10:

Claims 2 and 10 recite “performing semi-supervised background subtraction to remove areas not capturing areas of occupancy from the captured image that is inputted to the trained detection machine learning model”.

It is unclear what the language “remove areas not capturing areas of occupancy” specifically indicates, owing to its confusing grammatical construction. It is unclear whether language was intended but omitted between “remove areas” and “not capturing areas,” or whether Applicant intended to recite removing areas that do not capture areas of occupancy.

The claims are thus indefinite because the grammatical construction renders the scope of the language unknown.

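Dependent claims 3-6 and 11-14 are likewise rejected because they depend, directly or indirectly, from claims 2 and 10, respectively, and therefore incorporate the same indefinite language.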

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4 and 9-12 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US 9,251,410) in view of Sundaresan et al. (US 2019/0114804).
As to claim 1:

Lin discloses a computer-implemented method for occupancy monitoring of a facility using at least one image captured of the facility (See Abstract, a method for counting people in an area; Fig. 3, for example, the area being the interior of an office/building), the method comprising:

receiving the at least one captured image (See Fig. 3; col. 3, lines 40-67, receiving at least one input image captured by one or more cameras);
receiving an input signal comprising a detected number of occupants in the facility (Col. 5, lines 32-55; also col. 5, line 60 through col. 6, line 10: the system obtains a count of the people inside the premises);
and outputting the detected number of occupants (See Abstract, the count of people inside the premises is generated).

As set forth above, Lin discloses a model configured for counting people in a facility that outputs a detection of each occupant in the facility (See col. 5, lines 12-30, recognizing and identifying the face of each detected person). However, Lin does not explicitly disclose:

the number of occupants captured in the at least one captured image determined using a trained detection machine learning model, the detection machine learning model taking as input the at least one captured image with an associated feature map, and the detection machine learning model trained using training images each comprising a respective label for each occupant in the training image; 

The examiner asserts that the above limitations merely describe typical training of a crowd-counting AI model using training images and labels.

Indeed, Sundaresan, in a similar field of endeavor, discloses a model for detecting and counting “objects” (¶0049, a person can be an “object” to be detected and counted). Per ¶0052, the model is a trained detection neural network (i.e., a “trained detection machine learning model”), trained using training images that include corresponding labels classifying the respective persons in the training images. Per ¶0124, an image provided to the input layer is associated with a mapping from the input layer, referred to as a feature map.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to build and train the people detection/counting model of Lin in the manner disclosed by Sundaresan to arrive at the claimed invention. Sundaresan demonstrates that, before the effective filing date, it was common knowledge that the limitation omitted from Lin is merely typical training of a crowd-counting AI model using training images and labels, which advantageously helps the model learn what features and characteristics to expect from a prospective target, thereby reducing training time and margin of error.

As to claim 9:
Lin discloses a system for occupancy monitoring of a facility using at least one image captured of the facility, the system comprising one or more processors and a data storage (Abstract; col. 3, lines 5-10), the one or more processors configured to execute: an input module to receive the at least one captured image from the one or more cameras (See Fig. 3; col. 3, lines 40-67, receiving at least one input image captured by one or more cameras); an occupant detection module to receive an input signal comprising a detected number of occupants in the facility (Col. 5, lines 32-55; also col. 5, line 60 through col. 6, line 10: the system obtains a count of the people inside the premises);
and an output module to output the detected number of occupants (See Abstract, the count of people inside the premises is generated).

As set forth above, Lin discloses a model configured for counting people in a facility that outputs a detection of each occupant in the facility (See col. 5, lines 12-30, recognizing and identifying the face of each detected person). However, Lin does not explicitly disclose:

the number of occupants captured in the at least one captured image determined using a trained detection machine learning model, the detection machine learning model taking as input the at least one captured image with an associated feature map, and the detection machine learning model trained using training images each comprising a respective label for each occupant in the training image; 

The examiner asserts that the above limitations merely describe typical training of a crowd-counting AI model using training images and labels.

Indeed, Sundaresan, in a similar field of endeavor, discloses a model for detecting and counting “objects” (¶0049, a person can be an “object” to be detected and counted). Per ¶0052, the model is a trained detection neural network (i.e., a “trained detection machine learning model”), trained using training images that include corresponding labels classifying the respective persons in the training images. Per ¶0124, an image provided to the input layer is associated with a mapping from the input layer, referred to as a feature map.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to build and train the people detection/counting model of Lin in the manner disclosed by Sundaresan to arrive at the claimed invention. Sundaresan demonstrates that, before the effective filing date, it was common knowledge that the limitation omitted from Lin is merely typical training of a crowd-counting AI model using training images and labels, which advantageously helps the model learn what features and characteristics to expect from a prospective target, thereby reducing training time and margin of error.


As to claims 2 and 10:

Lin in view of Sundaresan discloses all limitations of claims 1 and 9, respectively, further comprising the occupant detection module performing semi-supervised background subtraction to remove areas not capturing areas of occupancy from the captured image that is inputted to the trained detection machine learning model (See Lin, col. 4, lines 28-52: background subtraction guided by the head recognition module, for example removing the background from the successive counting zone images and separating the moving blocks from areas that are not moving blocks, i.e., background areas).

 
As to claims 3 and 11:
Lin in view of Sundaresan discloses all limitations of claims 2 and 10, respectively, wherein the background subtraction comprises separating occupants as foreground elements from the background by generating a foreground mask (Lin, col. 4, lines 40-52: the “segmentation method” intrinsically generates a mask; Lin also “determines the head location by segmenting moving blocks from the successive counting zone images and detecting the moving blocks”; the moving blocks are the mask that results from separating out the background).
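As a generic illustration only, and not as a characterization of Lin's actual implementation or of Applicant's disclosure, background subtraction commonly yields a binary foreground mask by thresholding the per-pixel difference between the current frame and a background model. The sketch below assumes grayscale frames stored as nested lists of intensities; the function name `foreground_mask` and the default threshold of 25 are hypothetical.

```python
def foreground_mask(current, background, threshold=25):
    """Return a binary mask: 1 where |current - background| > threshold.

    Assumes `current` and `background` are equally sized 2-D lists of
    grayscale intensities; `threshold` is a hypothetical tuning value.
    """
    if len(current) != len(background):
        raise ValueError("frames must have the same height")
    mask = []
    for cur_row, bg_row in zip(current, background):
        if len(cur_row) != len(bg_row):
            raise ValueError("frames must have the same width")
        # Pixels that differ strongly from the background model are foreground.
        mask.append([1 if abs(c - b) > threshold else 0
                     for c, b in zip(cur_row, bg_row)])
    return mask


background = [[10, 10, 10], [10, 10, 10]]
current = [[10, 200, 10], [12, 180, 10]]
print(foreground_mask(current, background))  # [[0, 1, 0], [0, 1, 0]]
```

The small change at the lower-left pixel (12 versus 10) stays below the threshold and is treated as background, which is the purpose of thresholding rather than flagging every nonzero difference.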

As to claims 4 and 12:
Lin in view of Sundaresan discloses all limitations of claims 3 and 11, respectively, wherein the foreground elements are determined by detecting dynamically moving objects (Lin, col. 4, lines 40-52: Lin “determines the head location by segmenting moving blocks from the successive counting zone images and detecting the moving blocks”; the moving blocks are the mask that results from separating out the background).
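For context, one common way to detect dynamically moving objects across successive images is frame differencing: a pixel is flagged as moving when its intensity changes between consecutive frames by more than a threshold. The sketch below is a generic example under stated assumptions (grayscale nested lists, a hypothetical threshold of 25, and a hypothetical function name `moving_pixel_masks`); it is not asserted to be the method of Lin.

```python
def moving_pixel_masks(frames, threshold=25):
    """Flag pixels whose intensity changes between consecutive frames.

    `frames` is a list of equally sized 2-D lists of grayscale values;
    returns len(frames) - 1 binary masks, one per consecutive pair.
    `threshold` is a hypothetical tuning value.
    """
    masks = []
    for prev, curr in zip(frames, frames[1:]):
        # A pixel counts as "moving" if it changed by more than the threshold.
        masks.append([[1 if abs(c - p) > threshold else 0
                       for p, c in zip(prev_row, curr_row)]
                      for prev_row, curr_row in zip(prev, curr)])
    return masks


frames = [[[0, 0, 0]], [[0, 100, 0]], [[0, 0, 100]]]
print(moving_pixel_masks(frames))  # [[[0, 1, 0]], [[0, 1, 1]]]
```

In the second mask both the pixel the bright object left and the pixel it entered are flagged, which is why frame differencing tends to outline a moving object rather than fill it.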


Claims 5, 6, 13, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US 9,251,410) in view of Sundaresan et al. (US 2019/0114804) and further in view of Heikman et al. (US 2018/0225834).


As to claims 5 and 13:
Lin in view of Sundaresan discloses all limitations of claims 4 and 12, respectively. Lin discloses wherein receiving the at least one captured image comprises receiving multiple successive captured images (Lin, col. 4, lines 40-52: successive captured images are obtained and dynamically moving objects are detected). However, the combination is silent on wherein detecting dynamically moving objects comprises determining a running average as a function over the successive captured images.

Heikman, however, in a related field of endeavor, discloses detecting movements/changes in a frame by determining a running average over the total number of frames over a period of time (See ¶0035).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to perform the background determination of Lin/Sundaresan using a running average, as taught by Heikman. A moving average is a well-known method for representing changes (i.e., movements) as a smooth trend that is less prone to false positives and false negatives.
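As a generic illustration of the running-average technique relied on above, the sketch below maintains a per-pixel cumulative mean of successive grayscale frames using the incremental update BG_n = BG_(n-1) + (CF_n - BG_(n-1)) / n. This particular update rule and the function name `running_average_background` are assumptions chosen for illustration; they are not asserted to be the specific formula of Heikman ¶0035 or of the claimed equation.

```python
def running_average_background(frames):
    """Incrementally average successive grayscale frames into a background model.

    Uses the cumulative-mean update BG_n = BG_(n-1) + (CF_n - BG_(n-1)) / n,
    so the result equals the plain per-pixel mean of all frames seen so far.
    `frames` is a non-empty list of equally sized 2-D lists of intensities.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    # Start the model from the first frame, as floats.
    background = [[float(v) for v in row] for row in frames[0]]
    for n, frame in enumerate(frames[1:], start=2):
        for i, row in enumerate(frame):
            for j, value in enumerate(row):
                background[i][j] += (value - background[i][j]) / n
    return background


frames = [[[0, 10]], [[10, 10]], [[20, 10]]]
print(running_average_background(frames))  # [[10.0, 10.0]]
```

Because a passing occupant contributes to a given pixel for only a few frames, static scenery dominates the averaged model; an exponentially weighted update is a common alternative when older frames should be forgotten over time.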


As to claims 6 and 14:
Lin in view of Sundaresan and Heikman discloses all limitations of claims 5 and 13, respectively, wherein the running average is determined using:
[Equation as recited in claims 6 and 14; image not reproduced]
wherein FG are coordinates of foreground elements, CF are coordinates in the current frame, and BG are coordinates in a background model (See Heikman, ¶0036: each pixel value in the background frame is subtracted from the corresponding pixel value of the current frame).

Claims 7, 8, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US 9,251,410) in view of Sundaresan et al. (US 2019/0114804) and further in view of Hall et al. (US 2022/0198657).

As to claims 7 and 15:
Lin in view of Sundaresan discloses all limitations of claims 1 and 9, respectively, but is silent on wherein the detection machine learning model comprises a region proposal network.
Hall, however, in a related field of endeavor, discloses an image analysis system that employs ResNet-50 (i.e., an RPN) to segment a region of interest from an image (See ¶0109).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to perform the segmentation of a counting zone image in Lin using a ResNet-50, as suggested by Hall. Because output quality generally improves with network depth, a ResNet-50, with 50 layers of processing, offers a high expected accuracy.


As to claims 8 and 16:
Lin in view of Sundaresan and Hall discloses all limitations of claims 7 and 15, respectively, wherein the region proposal network comprises a ResNet-50 architecture to extract features of occupants and a fully connected network to localize and classify the occupants using the features (See Hall, ¶0109, noting that ResNet-50 is used for segmentation of target objects, and ¶0147, the top layer typically being a fully connected neural network used for object classification, where the objects, in the context of Lin, are people).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Wang et al. (US 2014/0277757) - Embodiments of methods and apparatus disclosed herein may employ depth, visual, or motion sensors to enable three-dimensional people counting and data mining to enable an energy saving heating, ventilation, and air conditioning (HVAC) control system. Head detection methods based on depth information may assist people counting in order to enable an accurate determination of room occupancy. A pattern of activities of room occupancy may be learned to predict the activity level of a building or its rooms, reducing energy usage and thereby providing a cost savings.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUAN M HUA whose telephone number is (571)270-7232. The examiner can normally be reached 10:30-6:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Anthony Addy can be reached on 571-272-7795. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/QUAN M HUA/Primary Examiner, Art Unit 2645