DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 10, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Wen et al US 2019/0065589 in view of Tao et al US 2020/0104288.
Regarding claim 1, Wen et al teaches a method for automatically searching, isolating and extracting information-of-interest from single or multi-page, unknown layout, unstructured text data sources (webpage content module 116 can include a feature extraction module 202 that extracts text and/or one or more images from the webpage, i.e., searching and isolating by looking for text or images in a webpage and extracting them; the webpage is a single or multi-page, unknown layout, unstructured text data source, see also paragraph 4 of applicant’s specification) (paragraph 0050), comprising the steps of: providing a first computer processing system, comprising: 
a computer processor unit (processing unit 114 (paragraph 0044)); and 
a non-transitory computer readable medium operably associated with the computer processor unit (paragraph 0012 and 0099), the non-transitory computer readable medium storing instructions executable by the computer processor unit to perform the steps of: 
inputting a first list of the single or multi-page, unknown layout, unstructured text data sources (webpages, paragraph 50);
inputting a second list of information-of-interest to be detected (positive samples of the classifier and training data, paragraph 54), isolated and extracted from the single or multi-page, unknown layout, unstructured text data sources (the feature extraction module 202 can use a pruning operation to identify candidate elements (second list of information of interest) on a webpage that may include information of interest (e.g., a title or a breadcrumb); a set of features can be extracted from the candidate elements, and the features can be input into a trained classifier to obtain a final determination of the webpage elements that include the information of interest (paragraph 0051)), wherein the second list includes a list of labels (e.g., label 1 and label 0, paragraph 54); and
processing the first list of single or multi-page, unknown layout, unstructured text data sources using the list of labels (paragraph 54, resulting in C trained classifier; paragraph 51, features can be input into the trained classifier to obtain a final determination of the webpage elements).
Wen et al fails to teach outputting a report containing all detected, isolated and extracted information-of-interest as defined by the list of labels.
Tao et al teaches outputting a report containing all detected, isolated and extracted information-of-interest as defined by the list of labels (extraction module 121 can also obtain a log of selections of interest information contents and disinterest information contents, as well as recommendation information, and determine the information items associated with the information contents or included in the recommendation information (paragraph 0048)).
Therefore, it would have been obvious to a person with ordinary skill in the art to have modified Wen et al to include: outputting a report containing all detected, isolated and extracted information-of-interest as defined by the list of labels. The reason for doing so would be to allow a user to view and utilize the information and to verify all of the information.
Note: the limitations following "optional" in claim 1 are not required by claim 1, as they are only optional, and are therefore not mapped to the prior art.
Regarding claim 10, Wen et al teaches wherein the single or multi-page, unknown layout, unstructured text data sources include an unknown number of pages (feature extraction module 202 uses a tag-based approach for feature extraction on product pages and other webpages (paragraph 0050)). Note: the webpages in paragraph 50 do not have a particular number of pages associated with them, and it would have been obvious to a person with ordinary skill in the art that the webpages in paragraph 50 refer to any number of pages of a webpage and, therefore, to an unknown number of pages.
Regarding claim 11, Wen et al teaches wherein the single or multi-page, unknown layout, unstructured text data sources include an unknown sequence of pages (feature extraction module 202 uses a tag-based approach for feature extraction on product pages and other webpages (paragraph 0050)). Note: because the webpages in paragraph 50 can be in either a known sequence or an unknown sequence, it would have been obvious to a person with ordinary skill in the art to apply the technique disclosed by Wen et al to both known and unknown sequences of webpages.
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Wen et al US 2019/0065589 in view of Tao et al US 2020/0104288 and further in view of Carus US 2010/0293451.

Regarding claim 12, Wen et al does not teach wherein the processing step further comprises searching, isolating and ignoring intervening page headers and footers to extract contiguous information-of-interest from consecutive pages of the single or multi-page, unknown layout, unstructured text data sources.
Carus teaches searching, isolating and ignoring intervening page headers and footers to extract contiguous information-of-interest from consecutive pages of the single or multi-page data source (paragraph 26).
Therefore, it would have been obvious to a person with ordinary skill in the art to have modified Wen et al to include: searching, isolating and ignoring intervening page headers and footers to extract contiguous information-of-interest from consecutive pages of the single or multi-page, unknown layout, unstructured text data sources.
The reason for doing so would have been to allow the system to filter out unwanted information from webpages.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Wen et al US 2019/0065589 in view of Tao et al US 2020/0104288 and further in view of Nakajima US 2001/0021279.
Regarding claim 13, Wen et al in view of Tao et al teaches all of the limitations of claim 1.
Wen et al teaches the single or multi-page, unknown layout, unstructured text data sources (see rejection of claim 1).
Wen et al in view of Tao et al fails to teach wherein the processing step further comprises searching, isolating and extracting an unknown number, lengths and indentations of sections or paragraphs.
Nakajima teaches wherein the processing step further comprises searching, isolating and extracting an unknown number, lengths (the method of detecting paragraphs in Nakajima obviously can be applied to any number of paragraphs and to paragraphs of any length) and indentations (no indentation, paragraph 0112; to a machine, no indentation detected is the same as unknown indentation) of sections or paragraphs (extracting paragraphs, paragraphs 40 and 41).
Therefore, it would have been obvious to a person with ordinary skill in the art to have modified Wen et al in view of Tao et al to include: wherein the processing step further comprises searching, isolating and extracting an unknown number, lengths and indentations of sections or paragraphs. The reason for doing so would be to select specific data that a user may want for processing, or a whole paragraph of interest for the user to view.

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Wen et al US 2019/0065589 in view of Tao et al US 2020/0104288 and further in view of Amit et al US 2013/0275399.
Regarding claim 14, Wen et al in view of Tao et al teaches all of the limitations of claim 1.
Wen et al teaches the single or multi-page, unknown layout, unstructured text data sources (see rejection of claim 1).
Wen et al in view of Tao et al fails to teach wherein the processing step further comprises searching, isolating and extracting information-of-interest from tabular data with an unknown number of rows and column sequences.
Amit et al teaches wherein the processing step further comprises searching, isolating and extracting information-of-interest from tabular data with an unknown number of rows and column sequences (most data reported in tables may be extracted (paragraph 0016)). Note: the invention of Amit et al is for searching for data in a table, which obviously can have any number of rows and columns (an unknown number of rows and columns).
Therefore, it would have been obvious to a person with ordinary skill in the art to have modified Wen et al in view of Tao et al to include: wherein the processing step further comprises searching, isolating and extracting information-of-interest from tabular data with an unknown number of rows and column sequences. The reason for doing so would be to select specific data that a user may want for processing, or to quickly identify important table information for the user to view.


Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Wen et al US 2019/0065589 in view of Tao et al US 2020/0104288 and further in view of Milo et al US 2004/0148278.
Regarding claim 15, Wen et al in view of Tao et al teaches all of the limitations of claim 1.
Wen et al teaches the single or multi-page, unknown layout, unstructured text data sources (see rejection of claim 1).
Wen et al in view of Tao et al fails to teach wherein the processing step further comprises searching, isolating and extracting information-of-interest from multiple text paragraphs, with the capability to detect and ignore intervening information that is not of interest to a user.
Milo et al teaches wherein the processing step further comprises searching, isolating and extracting information-of-interest from multiple text paragraphs, with the capability to detect and ignore intervening information that is not of interest to a user (citation paragraphs are extracted (paragraph 0197)). Note: only citations containing certain words are identified and extracted. Therefore, other information not pertaining to the citations to be extracted is not identified as information of interest and is ignored.
Therefore, it would have been obvious to a person with ordinary skill in the art to have modified Wen et al in view of Tao et al to include: wherein the processing step further comprises searching, isolating and extracting information-of-interest from multiple text paragraphs, with the capability to detect and ignore intervening information that is not of interest to a user. The reason for doing so would be to help the user quickly identify important information in large webpages and save time.


Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Wen et al US 2019/0065589 in view of Tao et al US 2020/0104288 and further in view of Nagao et al US 2008/0001953.

Regarding claim 24, Wen et al teaches a method for automatically searching, isolating and extracting information-of-interest from single or multi-page, unknown layout, unstructured text data sources (webpage content module 116 can include a feature extraction module 202 that extracts text and/or one or more images from the webpage, i.e., searching and isolating by looking for text or images in a webpage and extracting them; the webpage is a single or multi-page, unknown layout, unstructured text data source, see also paragraph 4 of applicant’s specification) (paragraph 0050), comprising the steps of: providing a first computer processing system, comprising: 
a computer processor unit (processing unit 114 (paragraph 0044)); and 
a non-transitory computer readable medium operably associated with the computer processor unit (paragraph 0012 and 0099), the non-transitory computer readable medium storing instructions executable by the computer processor unit to perform the steps of: 
inputting a first list of the single or multi-page, unknown layout, unstructured text data sources (webpages, paragraph 50);
inputting a second list of information-of-interest to be detected (positive samples of the classifier and training data, paragraph 54), isolated and extracted from the single or multi-page, unknown layout, unstructured text data sources (the feature extraction module 202 can use a pruning operation to identify candidate elements (second list of information of interest) on a webpage that may include information of interest (e.g., a title or a breadcrumb); a set of features can be extracted from the candidate elements, and the features can be input into a trained classifier to obtain a final determination of the webpage elements that include the information of interest (paragraph 0051)), wherein the second list includes a list of labels (e.g., label 1 and label 0, paragraph 54); and
processing the first list of single or multi-page, unknown layout, unstructured text data sources using the list of labels (paragraph 54, resulting in C trained classifier; paragraph 51, features can be input into the trained classifier to obtain a final determination of the webpage elements).
Wen et al fails to teach outputting a report containing all detected, isolated and extracted information-of-interest as defined by the list of labels.
Tao et al teaches outputting a report containing all detected, isolated and extracted information-of-interest as defined by the list of labels (extraction module 121 can also obtain a log of selections of interest information contents and disinterest information contents, as well as recommendation information, and determine the information items associated with the information contents or included in the recommendation information (paragraph 0048)).
Therefore, it would have been obvious to a person with ordinary skill in the art to have modified Wen et al to include: outputting a report containing all detected, isolated and extracted information-of-interest as defined by the list of labels. The reason for doing so would be to allow a user to view and utilize the information and to verify all of the information.
Wen et al in view of Tao et al fails to teach providing a parallel computer processing system operable to simultaneously perform the method.
Nagao et al teaches providing a parallel computer processing system operable to simultaneously perform (a parallel processing system; all the image processing modules 38 perform image processing in parallel (paragraph 0040)).
Therefore, it would have been obvious to a person with ordinary skill in the art to have modified Wen et al in view of Tao et al to include: providing a parallel computer processing system operable to simultaneously perform the method as disclosed in Wen et al and Tao et al. The reason for doing so would be to allow the system to process large amounts of information and to reduce processing time.
Note: the limitations following "optional" in claim 1 are not required by claim 1, as they are only optional, and are therefore not mapped to the prior art.


Allowable Subject Matter
Claims 2-9 and 25 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 16-23 are allowed.

Conclusion

Any inquiry concerning this communication should be directed to Michael Burleson, whose telephone number is (571) 272-7460 and fax number is (571) 273-7460. The examiner can normally be reached Monday through Friday from 8:00 a.m. to 4:30 p.m. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Poon, can be reached at (571) 272-7440.
Michael Burleson
Patent Examiner
Art Unit 2673

Michael Burleson
September 23, 2022
/MICHAEL BURLESON/
/KING Y POON/Supervisory Patent Examiner, Art Unit 2675