DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Statement of Positive Recitation

Claim 34 recites, “A machine readable medium…”; however, paragraph 034 indicates that such medium is non-transitory: “The code, in an embodiment, is stored on a computer-readable storage medium in the form of a computer program comprising a plurality of computer-readable instructions executable by one or more processors. The computer-readable storage medium, in an embodiment, is a non-transitory computer-readable medium”.










Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 4, 11-12, 22 and 26 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claims 4, 12 and 22, it is not clear what is meant by “ground truth”.
Regarding claim 11, it is not clear what is meant by “wherein the one or more images is one image”.
Regarding claim 26, it is not clear what is meant by “wherein a first segmentation of a first image corresponds to pixels of the first image that are of an object and a second segmentation of a second image corresponds to pixels of the second image that are of the object”.







Claim Rejections - 35 USC § 102


In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1, 5-7, 13-15, 19-21, 23-25, 27, 33-34 and 40 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Shi (USPN 10,990,645).

Regarding claim 1, Shi recites detecting one or more segments of one or more objects within one or more images based, at least in part, on a neural network (Please note column 8, lines 44-54. As indicated, for visual layout, embodiments may utilize a Neural Network such as a CNN (Convolutional Neural Network) to classify or identify the page intent (or goal, purpose, or function). The visual model may further perform object detection and segmentation of the current page, using convolutional Neural Network and non-max suppression, to identify key areas of the page and juxtaposition among key areas) trained in an unsupervised manner to infer the one or more segments (Please note column 10, lines 33-34. As indicated, unsupervised learning is utilized to group elements into “clusters” of potential target types).
Regarding claim 5, Shi recites wherein the neural network is trained in an unsupervised manner on images that are from any of at least two categories of images (Please note column 10, lines 33-36. As indicated, unsupervised learning is utilized to group elements into “clusters” of potential target types. Instead of manually defining and labeling target types, this enables a system to automatically identify clusters of similar elements, and treat each cluster as a potential target type).
Regarding claim 6, Shi recites wherein how many segments to detect in an image is a parameter specified by a user to train the neural network (Please note column 8, lines 9-19. As indicated, in training a model, certain characteristics of the model are determined by the training data; for example, the training data may determine or calculate one or more model parameters, such as the coefficients of the features. Typically, by increasing the amount of training data available, the error is reduced, and the model's accuracy is improved. Thus, in embodiments of the data aggregation process described herein, newly extracted features are added to the training data repository and used to update and improve the model(s)).
Regarding claim 7, an analysis similar to that presented for claim 1 is applicable.
Regarding claim 13, Shi recites wherein the neural network is a fully convolutional neural network (Please note column 3, lines 47-50. As indicated, Figure 9 is a diagram illustrating the architecture (FIG. 9(a)) and operation or processing flow (FIG. 9(b)) of a convolutional neural network that may be used as part of the processing of a page in an implementation of the system and methods for automated data aggregation described herein).
Regarding claim 14, the claim recites the same limitations as claim 1 above with the addition of hardware sensors (Please note column 27, lines 64-67. As indicated, the CPU or a device in which the CPU is incorporated may be coupled, connected, and/or in communication with one or more peripheral devices, such as display. In another example implementation, the processing element or processor may be incorporated into a mobile computing device, such as a smartphone or tablet computer). As such, the above-mentioned devices possess sensors.
Regarding claim 15, Shi recites wherein the one or more hardware sensors includes a video capture device and at least a portion of the one or more images is from a video captured by the video capture device (Please note column 27, lines 64-67. As indicated, the CPU or a device in which the CPU is incorporated may be coupled, connected, and/or in communication with one or more peripheral devices, such as display. In another example implementation, the processing element or processor may be incorporated into a mobile computing device, such as a smartphone or tablet computer). In this regard, smartphones or tablet computers capture video.
Regarding claim 19, Shi recites wherein the image recognition system comprises a first computer system comprising the one or more hardware sensors to communicate, via a network, with a second computer system comprising the one or more processors (Please note column 27, lines 64-67. As indicated, the CPU or a device in which the CPU is incorporated may be coupled, connected, and/or in communication with one or more peripheral devices, such as display. In another example implementation, the processing element or processor may be incorporated into a mobile computing device, such as a smartphone or tablet computer).
Regarding claim 20, Shi recites wherein the one or more segments comprises a background segment (Please note column 8, lines 44-54. As indicated, for visual layout, embodiments may utilize a Neural Network such as a CNN (Convolutional Neural Network) to classify or identify the page intent (or goal, purpose, or function). The visual model may further perform object detection and segmentation of the current page, using convolutional Neural Network and non-max suppression, to identify key areas of the page and juxtaposition among key areas). In this regard, key areas of the page do not exclude background.
Regarding claim 21, an analysis similar to that presented for claim 1 is applicable.

Regarding claim 23, Shi recites wherein the one or more neural networks are neural networks to be trained based at least in part on one or more rules that constrain segmentation generation based on one or more properties, wherein the one or more properties includes at least one of: geometric density; invariance to spatial transformation; and semantic consistency (Please note column 8, lines 25-28. As indicated, as recognized by the inventor, this enables embodiments to implement image classification, as well as object detection and segmentation of the page image to model the page layout semantically).
Regarding claim 24, Shi recites wherein neural network is to be trained on a set of images of a category and to predict segments in another image of the category (Please note Figure 2).
Regarding claim 25, an analysis similar to that presented for claim 14 is applicable.
Regarding claim 27, an analysis similar to that presented for claim 1 is applicable.
Regarding claim 33, Shi recites wherein the one or more processors are to further determine one or more boundaries from the one or more segments (Please note column 8, lines 44-54. As indicated, for visual layout, embodiments may utilize a Neural Network such as a CNN (Convolutional Neural Network) to classify or identify the page intent (or goal, purpose, or function). The visual model may further perform object detection and segmentation of the current page, using convolutional Neural Network and non-max suppression, to identify key areas of the page and juxtaposition among key areas). In this regard, key areas of the page do not exclude boundaries.
Regarding claim 34, an analysis similar to that presented for claim 1 is applicable.
Regarding claim 40, Shi recites wherein the parameters associated with the one or more neural networks include one or more weights determined as part of training the one or more neural networks that are used to detect the one or more segments (Please note column 27, lines 25-40. As indicated, in general terms, a neural network may be viewed as a system of interconnected artificial “neurons” that exchange messages between each other. The connections have numeric weights that are “tuned” during a training process, so that a properly trained network will respond correctly when presented with an image or pattern to recognize (for example). In this characterization, the network consists of multiple layers of feature-detecting “neurons”; each layer has neurons that respond to different combinations of inputs from the previous layers. Training of a network is performed using a “labeled” dataset of inputs in a wide assortment of representative input patterns that are associated with their intended output response. Training uses general-purpose methods to iteratively determine the weights for intermediate and final feature neurons. In terms of a computational model, each neuron calculates the dot product of inputs and weights, adds the bias, and applies a non-linear trigger or activation function (for example, using a sigmoid response function)).


















Allowable Subject Matter


Claims 2-3, 8-10, 16-18, 28-32 and 35-39 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: The closest applied prior art of record fails to disclose or reasonably suggest wherein the neural network is trained in an unsupervised manner based at least in part on one or more loss functions that encode rules that constrain how segments are determined and wherein the one or more loss functions encode one or more constraints on how segments are generated, including at least one of: a geometric concentration constraint; a spatial invariance constraint; and a semantic consistency constraint.













Examiner’s Note

The examiner cites particular figures, paragraphs, columns and line numbers in the references as applied to the claims for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well.
It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.












Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMIR ALAVI whose telephone number is (571)272-7386. The examiner can normally be reached on M-F from 8:00-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at (571)272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.









/AMIR ALAVI/
Primary Examiner, Art Unit 2668
Tuesday, May 31, 2022