DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/29/2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by He et al. (U.S. 10,713,794 B1; "He").
Regarding claim 1, He discloses a non-transitory computer-readable data storage medium (Fig. 17 – memory 1704) storing program code executable by a processor (Fig. 17 – processor 1702) to perform processing (Col 41 – line 35-37: “memory 1704 includes main memory for storing instructions for processor 1702 to execute or data for processor 1702 to operate on.”) comprising:
applying a point extraction machine learning model (Fig. 5 – the first convolutional neural network 510) to a captured image of one or multiple documents (Fig. 5 – images 410) to identify the documents within the captured image and to identify a plurality of boundary points (Fig. 12B; Fig. 14A – step 1452: “generate bounding box”) for each document; (Fig. 5 and Col 23 – line 15-20: “the system may have a feature-extraction convolutional neural network (e.g., the first convolutional neural network 510) that may take as inputs patches of images 410 and output features 520 of the patch/image. The features 520 may be represented by a feature map that encodes various features of the image 410.”) and
for each document identified within the captured image, applying an instance segmentation machine learning model (Fig. 5 – second convolutional neural network 530) to the boundary points for the document and to the captured image to extract a segmentation mask for the document. (Fig. 5; Col 23 – line 36-41: “a large part of the system 500 is shared at the feature-extraction convolutional neural network stage. The layers of the second 530 and third 540 convolutional neural networks may be specialized for separately outputting an object proposal 430 and an object-score prediction 440, respectively.”; Fig. 14A – step 1453: “generate an instance segmentation mask for each predetermined class”; Fig. 14B)

Regarding claim 2, He discloses the processing further comprises:
for each document identified within the captured image, applying the segmentation mask for the document to the captured image to extract an image of the document from the captured image. (Fig. 5 – outputting an object 430; Fig. 14B – step 1460-1470; Col 36 – line 59-61: “At step 1460, the system may train the neural network branches used for generating the classification prediction, bounding box prediction, and masks prediction.”; Col 37 – line 31-34: “At step 1470, the system may determine whether there are additional RoIs to process for the given training image. If so, then the system may repeat steps, starting from step 1430, to process the next RoI.”)
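For illustration only, the claimed step of applying a segmentation mask to a captured image to extract a document image can be sketched as below. This is a minimal sketch under stated assumptions, not the method of He or of the application: the function name `extract_document`, the nested-list image representation, and the choice to zero out pixels outside the mask are all hypothetical.

```python
def extract_document(image, mask):
    """Crop `image` to the bounding rectangle of the nonzero mask region.

    `image` and `mask` are same-sized lists of rows (hypothetical layout).
    Pixels inside the rectangle but outside the mask are set to 0.
    """
    # Rows and columns that contain at least one mask pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows or not cols:
        return []  # empty mask: nothing to extract

    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [
        [image[r][c] if mask[r][c] else 0 for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]


image = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
]
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
document = extract_document(image, mask)
```

With several documents in one captured image, the same function would be called once per mask, each call yielding one extracted document image.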

Regarding claims 3 and 13, He discloses the processing further comprises: for each document identified within the captured image, performing an action on the image of the document extracted from the captured image. (Fig. 14B – step 1494-1496; Col 38 – line 18-22: “at step 1494, the system may generate instance segmentation masks for the subset of RoIs selected, rather than the full set of RoIs. At step 1496, the system may select, for each of the selected RoIs, the associated instance segmentation mask that corresponds to the predicted class.”)

Regarding claims 4 and 14, He discloses the processing further comprises:
prior to applying the instance segmentation machine learning model, displaying the boundary points for each document overlaid against the captured image; (Fig. 3B; Fig. 5; Fig. 12B; Fig. 14A – step 1452: “generate bounding box”; Col 11 – line 56-57: “A sponsored story may be generated from stories in users' news feeds and promoted to specific areas within displays … ”) and
permitting a user to modify the boundary points for each document overlaid against the captured image. (Fig. 3B illustrates example object proposals overlaying an image; Col 20 – line 11-16: “one or more servers 162 may be authorization/privacy servers for enforcing privacy settings. In response to a request from a user (or other entity) for a particular object stored in a data store 164, social-networking system 160 may send a request to the data store 164 for the object.”; Col 21 – line 57-59: “particular embodiments of FIGS. 3A and 3B may be implemented by social-networking system 160, third-party system 170, or any other suitable system.”)

Regarding claims 5 and 15, He discloses the processing further comprises:
after applying the instance segmentation machine learning model, displaying the segmentation mask for each document overlaid against the captured image; (Fig. 3B; Fig. 5; Fig. 12B; Fig. 14A – step 1452: “generate bounding box”; Col 11 – line 56-57: “A sponsored story may be generated from stories in users' news feeds and promoted to specific areas within displays … ”)
in response to user disapproval of the segmentation mask for any document, displaying the boundary points for each document overlaid against the captured image; (Col 21 – line 37-39: Fig. 3B shows object proposals 301-306 represented as shapes overlaying objects in the image 300A; the shapes overlaying the objects are read as the claimed “boundary points”)
permitting the user to modify the boundary points for each document overlaid against the captured image; (Fig. 3B illustrates example object proposals overlaying an image; Col 20 – line 11-16: “one or more servers 162 may be authorization/privacy servers for enforcing privacy settings. In response to a request from a user (or other entity) for a particular object stored in a data store 164, social-networking system 160 may send a request to the data store 164 for the object.”; Col 21 – line 57-59: “particular embodiments of FIGS. 3A and 3B may be implemented by social-networking system 160, third-party system 170, or any other suitable system.”) and
for each document identified within the captured image, reapplying the instance segmentation model to the boundary points for the document and to the captured image to reextract the segmentation mask for the document. (Fig. 5 – outputting an object 430; Fig. 14B – step 1460-1470; Col 36 – line 59-61: “At step 1460, the system may train the neural network branches used for generating the classification prediction, bounding box prediction, and masks prediction.”; Col 37 – line 31-34: “At step 1470, the system may determine whether there are additional RoIs to process for the given training image. If so, then the system may repeat steps, starting from step 1430, to process the next RoI.”)

Regarding claim 6, He discloses the segmentation mask for each document is reextracted using the captured image from which the segmentation mask was first extracted, such that the segmentation mask is reextracted without having to capture a new image of the documents. (Fig. 14A and Col 37 – line 31-34: “At step 1470, the system may determine whether there are additional RoIs to process for the given training image. If so, then the system may repeat steps, starting from step 1430, to process the next RoI.”)

Regarding claim 7, He discloses the point extraction machine learning model outputs a plurality of center points corresponding to the documents within the captured image in order to identify the documents within the captured image, and wherein the point extraction machine model outputs the boundary points for each document in relation to the center point corresponding to the document. (Fig. 12A-12B; Col 34 – line 17-23: “FIG. 12B shows four such sampling locations per bin (e.g., sampling points 1241-1244). RoIAlign computes the value of each sampling point based on their defined locations. For example, for each sampling point, RoIAlign may compute the value for that sampling point by bilinear interpolation from the nearby grid points on the feature map”; Col 34 – line 39-42: “the system may interpolate only a single value at each bin center (without pooling), which experimentally has been shown to be nearly as effective”)
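For illustration only, the bilinear interpolation referred to in the passage quoted from Col 34 of He (computing a sampling point's value from the nearby grid points on the feature map) can be sketched as follows. This is a generic textbook formulation under stated assumptions, not code from He: the function name `bilinear_sample`, the nested-list feature map, and the clamping of coordinates at the map border are hypothetical choices.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Interpolate `feature_map` (list of rows) at fractional position (x, y).

    x indexes columns and y indexes rows; out-of-range coordinates are
    clamped to the map border (an assumption made for this sketch).
    """
    height, width = len(feature_map), len(feature_map[0])
    x = min(max(x, 0.0), width - 1)
    y = min(max(y, 0.0), height - 1)

    # The four grid points surrounding (x, y).
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    dx, dy = x - x0, y - y0

    # Interpolate along x on the two rows, then along y between them.
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


feature_map = [
    [0.0, 10.0],
    [20.0, 30.0],
]
value = bilinear_sample(feature_map, 0.5, 0.5)  # midpoint of the four cells
```

A sampling point that lands exactly on a grid point returns that grid value unchanged, while a point midway between four grid points returns their average.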

Regarding claim 8, He discloses the center points are output by the point extraction machine learning model within a heatmap of the center points. (Fig. 12A-12B; Col 34 – line 17-20: “FIG. 12B shows four such sampling locations per bin (e.g., sampling points 1241-1244). RoIAlign computes the value of each sampling point based on their defined locations.”; this shows that the sampling points lie within the feature map, e.g., sampling point 1241 within feature map 1200)
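For illustration only, one common way center points can be read out of a heatmap is to keep cells that exceed a score threshold and are strict local maxima. The sketch below illustrates only the claim term "heatmap of the center points"; it is an assumption-laden example and is not taken from He or from the application. The function name `heatmap_peaks`, the 0.5 threshold, and the 3x3 neighbourhood are hypothetical.

```python
def heatmap_peaks(heatmap, threshold=0.5):
    """Return (row, col) cells that are strict 3x3 local maxima >= threshold."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(height):
        for c in range(width):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Scores of the up-to-eight neighbours that exist inside the map.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, height))
                for cc in range(max(c - 1, 0), min(c + 2, width))
                if (rr, cc) != (r, c)
            ]
            if all(score > n for n in neighbours):
                peaks.append((r, c))
    return peaks


heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.6],
    [0.0, 0.1, 0.3, 0.2],
]
centers = heatmap_peaks(heatmap)  # one candidate center per detected peak
```

Each returned cell would stand for one candidate document center in this toy setting.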

Regarding claim 9, He discloses the point extraction machine learning model comprises:
a backbone convolutional neural network that extracts image features from the captured image; (Col 29 – line 59-62: “A deep network, such as ResNet, Feature Pyramid Network (FPN), or any other suitable convolutional backbone, may process the input image 910 and generate a feature map 920.”; Fig. 13B; Col 34 – line 51-52: “(i) the convolutional backbone architecture used for feature extraction over an entire image,”) and
a feature pyramid network head module to the backbone convolutional neural network that identifies the documents and the boundary points for each document from the extracted image features. (Col 33 – line 27-29: “FIG. 11A illustrates an RoI 1110, as proposed by the RPN, overlaid over a feature map 1100, which is represented as a dashed grid (as determined by the feature map's stride)”; Col 35 – line 1-6: “Faster R-CNN with an FPN backbone extracts RoI features from different levels of the feature pyramid according to their scale, but otherwise the rest of the approach is similar to vanilla ResNet. Using a ResNet-FPN backbone for feature extraction with Mask R-CNN gives excellent gains in both accuracy and speed.”)

Regarding claim 10, He discloses the instance segmentation machine learning model comprises:
a backbone convolutional neural network that extracts image features from the captured image based on the boundary points for each document identified within the captured image; (Col 35 – line 53-56: “At step 1420, the system may generate a feature map for the training image. As described in further detail elsewhere herein, the feature map may be generated using a neural network, such as ResNet or FPN backbone”) and
a pyramid scene parsing head module to the backbone convolutional neural network that extracts the segmentation mask for each document identified within the captured image from the extracted image features. (Col 35 – line 1-6: “Faster R-CNN with an FPN backbone extracts RoI features from different levels of the feature pyramid according to their scale, but otherwise the rest of the approach is similar to vanilla ResNet. Using a ResNet-FPN backbone for feature extraction with Mask R-CNN gives excellent gains in both accuracy and speed.”)

Regarding claim 11, He discloses the point extraction machine learning model and the instance segmentation machine learning model each comprises a backbone convolutional neural network that extracts image features from the captured image, (Fig. 5; Col 29 – line 59-65: “A deep network, such as ResNet, Feature Pyramid Network (FPN), or any other suitable convolutional backbone, may process the input image 910 and generate a feature map 920. In particular embodiments, the output of the backbone may also be used by a region proposal network (RPN) to identify any number of RoIs (e.g., 930) that map to regions in the feature map 920.”)
wherein the backbone convolutional neural network of the point extraction machine learning model is of a same or different type of neural network than the backbone convolutional neural network of the instance segmentation machine learning model. (Fig. 5; Col 36 – line 1-5: “RPN and Mask R-CNN may have the same backbones and so they are shareable. In particular embodiments, each of the RoIs output by the RPN may be individually processed to predict a class, a bounding box, and a segmentation mask.”)

Regarding claim 12, He discloses a computing device (Col 3 – line 31: “FIG. 17 illustrates an example computer system.”) comprising:
an image capturing sensor (Col 42 – line 35-40: “As an example and not by way of limitation, an I/O device may include a keyboard, keypad, microphone, monitor, mouse, printer, scanner, speaker, still camera, stylus, tablet, touch screen, trackball, video camera, another suitable I/O device or a combination of two or more of these.”) to capture an image of one or multiple documents (Fig. 5 – images 410);
a processor (Fig. 17 – processor 1702); and a memory (Fig. 17 – memory 1704) storing instructions executable by the processor (Col 41 – line 35-37: “memory 1704 includes main memory for storing instructions for processor 1702 to execute or data for processor 1702 to operate on.”) to:
apply a point extraction machine learning model (Fig. 5 – the first convolutional neural network 510) to a captured image of one or multiple documents (Fig. 5 – images 410) to identify the documents within the captured image and to identify a plurality of boundary points (Fig. 12B; Fig. 14A – step 1452: “generate bounding box”) for each document; (Fig. 5 and Col 23 – line 15-20: “the system may have a feature-extraction convolutional neural network (e.g., the first convolutional neural network 510) that may take as inputs patches of images 410 and output features 520 of the patch/image. The features 520 may be represented by a feature map that encodes various features of the image 410.”)
for each document identified within the captured image, apply an instance segmentation machine learning model (Fig. 5 – second convolutional neural network 530) to the boundary points for the document and to the captured image to extract a segmentation mask for the document; (Fig. 5; Col 23 – line 36-41: “a large part of the system 500 is shared at the feature-extraction convolutional neural network stage. The layers of the second 530 and third 540 convolutional neural networks may be specialized for separately outputting an object proposal 430 and an object-score prediction 440, respectively.”; Fig. 14A – step 1453: “generate an instance segmentation mask for each predetermined class”; Fig. 14B) and
for each document identified within the captured image, apply the segmentation mask for the document to the captured image to extract an image of the document from the captured image. (Fig. 5 – outputting an object 430; Fig. 14B – step 1460-1470; Col 36 – line 59-61: “At step 1460, the system may train the neural network branches used for generating the classification prediction, bounding box prediction, and masks prediction.”; Col 37 – line 31-34: “At step 1470, the system may determine whether there are additional RoIs to process for the given training image. If so, then the system may repeat steps, starting from step 1430, to process the next RoI.”)

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Sakar et al. (U.S. 20210049357 A1), “Electronic document segmentation using deep learning”, teaches electronic document segmentation and, more particularly, using deep learning to identify elements within an electronic document and a hierarchy that relates the identified elements.
Zhang et al. (U.S. 20190171871 A1), “Systems And Methods For Optimizing Pose Estimation”, teaches machine-learning models and various optimization techniques that enable computing devices with limited system resources (e.g., mobile devices such as smartphones, tablets, and laptops) to recognize objects and features of objects captured in images or videos.
Palaniyanppan et al. (U.S. 20180293731 A1), “METHODS AND SYSTEMS FOR SEGMENTING MULTIPLE DOCUMENTS FROM A SINGLE INPUT IMAGE”, teaches methods and systems for segmenting multiple documents from a single electronic image in a single-pass scanning.
Hoehne et al. (U.S. 20200082218 A1), “OPTICAL CHARACTER RECOGNITION USING END-TO-END DEEP LEARNING”, teaches an optical character recognition (OCR) system that may utilize a neural network architecture. This neural network architecture may allow the conversion of images of text into characters with a single model and a single computational step. The neural network may receive an image as an input and may output the set of characters found in the image, the position of the characters on the image, and/or bounding boxes for characters, words, or lines. Using these outputs or a subset of these outputs may allow the OCR system to generate a document with optically recognized text.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Duy A Tran whose telephone number is (571)272-4887. The examiner can normally be reached Monday-Friday 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Edward F Urban, can be reached at (571) 272-7899. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DUY TRAN/Examiner, Art Unit 2665                          

/BOBBAK SAFAIPOUR/Primary Examiner, Art Unit 2665