Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This action is responsive to communications regarding the applicant’s amendments and arguments, filed on 03/21/2022.
Claims 2-3 and 14 are canceled.
Claims 1, 4-13 and 15-20 are pending.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 03/21/2022 has been entered.
Notice of Pre-AIA or AIA Status
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Response to Amendments and Arguments
Applicant's arguments filed on 03/21/2022 with respect to claims 1, 4-13 and 15-20 have been considered but are moot in view of the new ground(s) of rejection, except as addressed below.
Applicant's main argument is that Narurkar does not teach “each text window having a number of tokens that is within a range for an expected number of tokens for an address for a particular location”.

In response to Applicant’s above argument, it is noted that Narurkar clearly teaches a “token” as an expected word within windows of expected length in col. 10 ln. 35-64, col. 17 ln. 27-col. 19 ln. 27. For example, Narurkar teaches extracting and probability matching address information based on probability weight factors of words or tokens, i.e. “The matching probability table 350 is stored in memory unit 37 (FIG. 2) of the client computer system. The matching probability table 350 includes a plurality of rows 352, one for each identifiable text string of the text lines of the plain text data. Each row includes columns for storing information identifying a corresponding text string, and a plurality of probability weight factors, each indicating a probability that the corresponding text string represents a particular portion of address information. Specifically, for each of the rows 352, the matching probability table 350 includes: a first column 354 for storing a line number of the plain text data at which a corresponding text string is located; a column 356 for storing a starting position value, and an ending position value indicating the starting and ending positions of the corresponding text string in the corresponding line number indicated in column 354; and a plurality of probability weight columns 358, each of which provides storage for a corresponding probability weight factor indicating the probability that the corresponding text string, identified by the contents of columns 354 and 356, represents one of a plurality of types of address information including a name, a title, a company, a "street address", a city, a state, a zip code, a telephone number, a fax number, an e-mail address, and a web address”…” The databases include: a negative name matching database 372 which includes a list of words which have a very low probability of occurring in names (e.g., sales, marketing, world, help, orange, etc.) 
and which are used to determine negative name matches which substantially decrease the probability that a text string matching an entry in this data base is a name; a positive name pattern matching database 374 including name entries for which a match with a text string suggests, with some predetermined probability weight factor, that the matching text sting is a name; a country name pattern matching data base 375 including country name a state name pattern matching database 380 including entries for which a match suggests that the matching text sting is a state name, the state entries including all full state names and state abbreviations (e.g., California and CA); a city name pattern matching database 382 including entries for which a match suggests that the matching text sting is a city name; an address name pattern matching database 384 including entries for which a match suggests that the matching text sting is a "street address"; a zip code pattern matching database 386 including entries for which a match suggests that the matching text sting is a zip code; an e-mail pattern matching database 384 including entries for which a match suggests that the matching text sting is an e-mail address; a phone number pattern matching database 390 including entries for which a match suggests that the matching text sting is a phone number; a facsimile number pattern matching database 392 including entries for which a match suggests that the matching text sting is fax number; a web address pattern matching database 394 including entries for which a match suggests that the matching text sting is a web address; an amount pattern matching database 396 including entries for which a match suggests that the matching text sting is an amount; and a date pattern matching database 398 including entries for which a positive match suggests that the matching text sting is a date. 
In one embodiment, a different set of pattern matching databases is used for each of a plurality of countries or geographical regions.”

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 6, 7, 11-13, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. 6339795  to Narurkar et al. (hereinafter “Narurkar”), and further in view of U.S. Patent Application Publication No. 20180285773 to Hsiao et al. (hereinafter “Hsiao”).
As to claim 1, Narurkar teaches a method, comprising (computer implemented method in a system comprising processor and non-transitory computer readable storage medium, col. 6 ln. 10-col. 7 ln. 24):
extracting a plurality of text windows from text in one or more content items associated with an entity, each text window having a number of tokens that is within a range for an expected numbers of 
for each text window in the plurality of text windows, applying, by a computer system, a machine learning model to features for the text window to produce for the text window a score representing a likelihood that the text window contains an address (col. 10 ln. 35-64, col. 17 ln. 27-col. 
identifying, by the computer system, based on the scores of the corresponding text windows and a set of validation rules applied to the text windows, one of the text windows as an address for the entity (col. 10 ln. 35-64, col. 17 ln. 27-col. 18 ln. 47, validation rules such as eleven pattern matching databases); and
storing the selected one of the text windows as the address for the entity (Abstract, Fig. 10A, col. 18 ln., transfer data from source to target and stored data based on received information, i.e. “The matching probability table 350 is stored in memory unit 37 (FIG. 2) of the client computer system. The matching probability table 350 includes a plurality of rows 352, one for each identifiable text string of the text lines of the plain text data. Each row includes columns for storing information identifying a corresponding text string, and a plurality of probability weight factors, each indicating a probability that the corresponding text string represents a particular portion of address information.”).
While Narurkar inherently teaches a machine learning model, Narurkar does not explicitly teach the features for the text window derived from the tokens of the text window, by: generating hash values from tokens in the text window; and creating a feature vector for the text window by updating elements of the feature vector based on indexes represented by the hash values; identifying, by the computer system, based on the scores of the corresponding text windows as produced by the machine learning model as claimed.
Hsiao teaches the features for the text window derived from the tokens of the text window, by: generating hash values from tokens in the text window; and creating a feature vector for the text window by updating elements of the feature vector based on indexes represented by the hash values; identifying, by the computer system, based on the scores of the corresponding text windows as produced by the machine learning model (Fig. 4, 5, par. 0022-0023, 0030-0036, 0059, 0068, generating 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Narurkar with the teaching of Hsiao because they are in the same field of endeavor. One of ordinary skill in the art at the time of the invention would have been motivated to do so because the teaching of Hsiao would allow Narurkar to “enable computing systems to develop improved functionality without explicitly being programmed. Given a set of training data, a machine-learning model can generate and refine a function that predicts a target attribute for an instance based on other attributes of the instance” (Hsiao, par. 0001-0004).
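For illustration only, the hashing-based feature generation recited in claim 1 (generating hash values from tokens and updating elements of a feature vector at the indexes those hash values represent) can be sketched as follows. This is a minimal sketch under stated assumptions and is not asserted to be the implementation of the application or of Hsiao: the function name hash_features, the vector dimension of 16, the use of MD5, and the count-based update are all hypothetical choices made for this example.

```python
import hashlib


def hash_features(tokens, dim=16):
    """Hypothetical helper: map tokens to a fixed-length count vector."""
    vector = [0] * dim
    for token in tokens:
        # Hash the lowercased token so the same token always maps to the same value.
        digest = hashlib.md5(token.lower().encode("utf-8")).digest()
        # Treat the first four bytes of the hash as an index into the vector.
        index = int.from_bytes(digest[:4], "big") % dim
        # Update the element at that index (here, by counting occurrences).
        vector[index] += 1
    return vector


# Example text window (illustrative data, not taken from any cited reference).
vector = hash_features("123 Main St Springfield IL 62701".split())
```

Under these assumptions, the vector length is fixed regardless of vocabulary size, and distinct tokens may collide at the same index.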
As to claim 6, the rejection of claim 1 is hereby incorporated by reference. The combination of Narurkar and Hsiao teaches the method of claim 1, wherein extracting the plurality of text windows from the text in the one or more content items associated with the entity comprises: identifying in the text a token representing an address ending; and generating a subset of the plurality of text windows to end at the token representing the address ending (col. 4 ln. 50-67, identifying tokens such as spaces, tabs, and punctuation marks to identify each line as an address ending).
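For illustration only, the window extraction recited in claims 1 and 6 (text windows whose token counts fall within an expected range and that end at a token representing an address ending) can be sketched as follows. This is a sketch under stated assumptions, not a characterization of Narurkar or of the application: the names is_address_ending and extract_windows, the use of a five-digit token as the address ending, and the range of 3 to 8 tokens are hypothetical.

```python
import re


def is_address_ending(token):
    """Hypothetical rule: treat a five-digit token (a zip code) as an address ending."""
    return re.fullmatch(r"\d{5}", token) is not None


def extract_windows(tokens, min_len=3, max_len=8):
    """Return every window of min_len..max_len tokens that ends at an address-ending token."""
    windows = []
    for end, token in enumerate(tokens, start=1):
        if not is_address_ending(token):
            continue
        for length in range(min_len, max_len + 1):
            start = end - length
            if start < 0:
                break  # not enough preceding tokens for a longer window
            windows.append(tokens[start:end])
    return windows


# Illustrative input text, not taken from any cited reference.
tokens = "Visit us at 123 Main St Springfield IL 62701 today".split()
windows = extract_windows(tokens)
```

In this example, each candidate window ends at the token "62701", and the windows differ only in how many preceding tokens they include.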
As to claim 7, the rejection of claim 1 is hereby incorporated by reference. The combination of Narurkar and Hsiao teaches the method of claim 1, wherein extracting the plurality of text windows from the text in the one or more content items associated with the entity comprises: generating the text windows to have varying lengths and to contain varying numbers of tokens associated with valid addresses (col. 10 ln. 35-64, col. 17 ln. 27-col. 18 ln. 32, extracting and probability matching address information based on probability weight factors, i.e. “The matching probability table 350 is stored in 
As to claim 11, the rejection of claim 1 is hereby incorporated by reference. The combination of Narurkar and Hsiao teaches the method of claim 1 further comprising: cleaning the text in the one or more content items prior to extracting the plurality of text windows from the text (col. 19 ln. 38-col. 20 ln. 22, preprocessing plain text data, including removing spacing and punctuation type characters, and those lines of the plain text data which do not include a predetermined threshold number of text characters).
As to claim 12, the rejection of claim 11 is hereby incorporated by reference. The combination of Narurkar and Hsiao teaches the method of claim 11, wherein cleaning the text in the content item comprises at least one of: removing phone numbers and email addresses from the text; removing 
Claims 13 and 17 are essentially the same as claims 1 and 7, respectively, except that they set forth the claimed invention as a system rather than a method, and are rejected for the same reasons as applied hereinabove.
Claim 20 is essentially the same as claim 1, except that it sets forth the claimed invention as a non-transitory computer-readable storage medium rather than a method, and is rejected for the same reasons as applied hereinabove.
Claims 4, 5, 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Narurkar, Hsiao, and further in view of U.S. Patent Application Publication No. 20190171755 to Yanez et al. (hereinafter “Yanez”).
As to claim 4, the combination of Narurkar and Hsiao teaches the method of claim 1. The combination of Narurkar and Hsiao does not explicitly teach further comprising: generating the features for the text window from the tokens of the text window by: generating binary features indicating the presence or absence of known address components in the text window as claimed.
Yanez teaches further comprising: generating the features for the text window from the tokens of the text window by: generating binary features indicating the presence or absence of known address components in the text window (par. 0026, features generated including whether information is missing or whether a portion of an address provided by the place record is missing, such as a street name, zip code, street number, or locality name.)

As to claim 5, the rejection of claim 4 is hereby incorporated by reference. The combination of Narurkar, Hsiao and Yanez teaches the method of claim 4, wherein the address components comprise at least one of: a zip code; a state abbreviation; a state; a compass direction; a post office box; a street type; and a number at a start of a text window (Narurkar, col. 10 ln. 35-64, col. 17 ln. 27-col. 18 ln. 32. Further, in Yanez, par. 0026).
Claims 15 and 16 are essentially the same as claims 4 and 5, respectively, except that they set forth the claimed invention as a system rather than a method, and are rejected for the same reasons as applied hereinabove.
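For illustration only, binary features of the kind recited in claims 4 and 5 (each indicating the presence or absence of a known address component in a text window) can be sketched as follows. This is a sketch under stated assumptions and does not represent the disclosure of Yanez or Narurkar: the function name binary_address_features, the feature names, and the small lookup sets are hypothetical and cover only a subset of the components listed in claim 5.

```python
import re

# Illustrative lookup sets (assumptions for this example, deliberately incomplete).
STATE_ABBREVIATIONS = {"CA", "IL", "NY", "TX"}
STREET_TYPES = {"st", "street", "ave", "avenue", "rd", "road", "blvd"}
COMPASS_DIRECTIONS = {"n", "s", "e", "w", "north", "south", "east", "west"}


def binary_address_features(tokens):
    """Return 0/1 flags for the presence of selected address components."""
    stripped = [t.rstrip(".,") for t in tokens]
    lowered = [t.lower() for t in stripped]
    return {
        "has_zip_code": int(any(re.fullmatch(r"\d{5}(-\d{4})?", t) for t in stripped)),
        "has_state_abbreviation": int(any(t in STATE_ABBREVIATIONS for t in stripped)),
        "has_street_type": int(any(t in STREET_TYPES for t in lowered)),
        "has_compass_direction": int(any(t in COMPASS_DIRECTIONS for t in lowered)),
        "starts_with_number": int(bool(stripped) and stripped[0].isdigit()),
    }


# Illustrative text window, not taken from any cited reference.
features = binary_address_features("123 Main St Springfield IL 62701".split())
```

Under these assumptions, every flag other than has_compass_direction is set to 1 for the example window.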
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Narurkar, Hsiao, and further in view of U.S. Patent No. 8285656 to Chang et al. (hereinafter “Chang”).
As to claim 8, the combination of Narurkar and Hsiao teaches the method of claim 1. The combination of Narurkar and Hsiao does not explicitly teach wherein identifying one of the text windows as the address for the entity comprises: identifying, based on the scores, a subset of the plurality of text windows with highest scores produced by the machine learning model; applying the validation rules to address labels associated with tokens in the subset of the plurality of text windows to generate adjusted scores for the subset of the plurality of text windows; and selecting one of the text windows with a highest adjusted score as the address for the entity as claimed.
Chang teaches wherein identifying one of the text windows as the address for the entity comprises: identifying, based on the scores, a subset of the plurality of text windows with highest scores 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of the combination of Narurkar and Hsiao with the teaching of Chang because they are in the same field of endeavor. One of ordinary skill in the art at the time of the invention would have been motivated to do so because the teaching of Chang would allow the combination of Narurkar and Hsiao to enable “improved method and system for crawling, mapping and extracting information from web pages where the extracted information can be mapped to a specific business” (Chang, col. 1 ln. 20-64).
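For illustration only, the selection recited in claim 8 (identifying the highest-scoring subset of text windows, applying validation rules to generate adjusted scores, and selecting the window with the highest adjusted score) can be sketched as follows. This is a simplified sketch under stated assumptions and does not represent the disclosure of Chang or the implementation of the application: the names select_address and starts_with_number_rule, the subset size of 3, the 0.1 adjustment, and the application of rules directly to tokens rather than to address labels are all hypothetical.

```python
def starts_with_number_rule(window):
    """Hypothetical validation rule: reward windows that begin with a number."""
    return 0.1 if window and window[0].isdigit() else -0.1


def select_address(scored_windows, rules, top_k=3):
    """Pick the window with the highest rule-adjusted score among the top_k by model score."""
    # Step 1: keep the subset of windows with the highest model scores.
    top = sorted(scored_windows, key=lambda pair: pair[1], reverse=True)[:top_k]
    # Step 2: adjust each score by the sum of the validation-rule adjustments.
    adjusted = [(window, score + sum(rule(window) for rule in rules)) for window, score in top]
    # Step 3: select the window with the highest adjusted score.
    return max(adjusted, key=lambda pair: pair[1])[0]


# Illustrative (window, model score) pairs; the scores are made up for this example.
candidates = [
    (["at", "123", "Main", "St"], 0.80),
    (["123", "Main", "St", "Springfield"], 0.75),
    (["us", "at", "123"], 0.40),
    (["Visit", "us", "at"], 0.10),
]
best = select_address(candidates, [starts_with_number_rule])
```

In this example the window with the highest model score does not begin with a number, so after adjustment the second-ranked window is selected instead.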
Claims 9, 10 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Narurkar, Hsiao, Chang, and further in view of U.S. Patent Application Publication No. 20090119268 to Bandaru et al. (hereinafter “Bandaru”).
As to claim 9, the rejection of claim 8 is hereby incorporated by reference. The combination of Narurkar, Hsiao and Chang teaches the method of claim 8. The combination of Narurkar, Hsiao and Chang does not explicitly teach wherein identifying one of the text windows as the address for the entity further comprises: updating the subset of the plurality of text windows based on Uniform Resource Locators (URLs) of the one or more content items from which the text windows were extracted as claimed.
Bandaru teaches wherein identifying one of the text windows as the address for the entity further comprises: updating the subset of the plurality of text windows based on Uniform Resource 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of the combination of Narurkar, Hsiao and Chang with the teaching of Bandaru because they are in the same field of endeavor. One of ordinary skill in the art at the time of the invention would have been motivated to do so because the teaching of Bandaru would allow the combination of Narurkar, Hsiao and Chang to enable “improved method and system for crawling, mapping and extracting information from web pages where the extracted information can be mapped to a specific business” (Bandaru, par. 0020).
As to claim 10, the rejection of claim 8 is hereby incorporated by reference. The combination of Narurkar, Hsiao and Chang teaches the method of claim 8. The combination of Narurkar, Hsiao and Chang does not explicitly teach wherein applying the validation rules to address labels associated with tokens in the subset of the plurality of text windows comprises at least one of: validating a start of an address at a beginning token of a text window; validating a number of tokens associated with an address label; and validating a road label associated with the text window as claimed.
Bandaru teaches wherein applying the validation rules to address labels associated with tokens in the subset of the plurality of text windows comprises at least one of: validating a start of an address at a beginning token of a text window; validating a number of tokens associated with an address label; and validating a road label associated with the text window (Fig. 7A-8, par. 0052, 0102-0112, validating business attributes, including address labels.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of the combination of Narurkar, Hsiao and Chang with the teaching of Bandaru because they are in the same field of endeavor. One of ordinary skill in the art at the time of the invention would have been motivated to do so because the teaching of Bandaru 
Claim 18 is essentially the same as claims 8 and 9, except that it sets forth the claimed invention as a system rather than a method, and is rejected for the same reasons as applied hereinabove.

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANHTAI V TRAN whose telephone number is (571)270-5129.  The examiner can normally be reached on Monday through Thursday from 8:00 AM to 4:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya, can be reached at (571)272-4034.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.