DETAILED ACTION
This action is responsive to the Application filed on 09/04/2020. Claims 1-20 are pending in the case. Claims 1, 12, and 18 are the independent claims.
This action is non-final.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, or 365(c) is acknowledged.
Acknowledgement of References Cited By Applicant
As required by MPEP 609 (c), the Applicants’ submission of the Information Disclosure Statement(s) is/are acknowledged by the examiner and the cited references have been considered in the examination of the claims now pending. 
As required by MPEP 609 (c)(2), a copy of each PTOL-1449, initialed and dated by the Examiner, is attached to the instant office action.
Claim Interpretation
The following disclosure citations provide context with respect to how certain claim terms and operations have been interpreted in view of the disclosure as originally filed:
The obtaining, normalizing, and generating steps of claim 1 which result in the chemical patent corpus are described in the instant application as a combination of manual operations (see [0089] which explains there are two phases: first phase is building corpus, and second phase is assign relevancy annotations to entities identified in first phase; [0090] …annotators were asked to annotate chemical compounds [0091] in three categories [0092] and six classes [0097] ten (10) chemistry graduates were selected as annotators for annotation [0098] eleven (11) snippets were also annotated by two subject-matter experts (SMEs) who defined the guidelines [00100] same training corpus of 11 snippets was also annotated for relevant compounds by the annotators and the SME).
The providing step of claim 1 appears to be the manual step of providing a training set (the chemical patent corpus) to some system which is presumed to learn the entities and annotations which were manually generated, so that the system may generate an intended result when some other document (a corresponding normalized patent document) is provided to the system. It is noted that none of claims 1-10 are directed to any details of the training of the system, merely the data or analysis of the data which is being provided to the system. It is further noted that the disclosure admits [0086] the system (chemical entity recognition system) consists of two different named-entity extraction systems, Chemical Entity Recognizer (CER) 532 (Elsevier, Frankfurt DE) and a mining program 534 such as, for example, OCMiner (OntoChem, Halle DE) which are previously-known products.
“relevancy annotation” is described as a manual annotation provided with the training documents according to the annotation guidelines briefly explained in [0093]. Note that there is no description of how the annotations are made (e.g. what kind of labeling is used, values assigned, etc.), merely a listing of what is or is not considered relevant. Accordingly, a “relevancy annotation” in view of the disclosure as originally filed could include whether the identified chemical compound entity is present in the title, the abstract, or a claim of the patent document, as it is improper to import limitations from the disclosure into the claims.

Claim Rejections - 35 USC § 103
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art. 
2. Ascertaining the differences between the prior art and the claims at issue. 
3. Resolving the level of ordinary skill in the pertinent art. 
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-2, 4-10, 12, 14-16, 18, 20 are rejected under 35 U.S.C. 103 as being unpatentable over AKHONDI et al. ((2014) Annotated Chemical Patent Corpus: A Gold Standard for Text Mining. PLoS ONE 9(9): e107477. doi:10.1371/journal.pone.0107477. 14 pages) in view of ZHANG et al. (Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning. Database (2016) Vol. 2016: article ID baw049; doi:10.1093/database/baw049. 10 pages), further in view of LAWSON et al. (US 2005/0246316 A1).
Regarding claim 1, AKHONDI teaches the method of training a chemical entity recognition system to extract one or more chemical compounds from a patent document and determine a relevance of the one or more chemical compounds to the patent document (intended use of the method), the method comprising:
“corpus development strategy” (starting page 3 col 1)
obtaining, by a processing device (generic computer; arguably inherent in the implementation of the system), a plurality of patent documents from one or more patent databases ((page 3 col 1) Patent Corpus selection (page 3 col 2) patents were downloaded from the sources (EPO, USPTO, and WIPO)…);
normalizing, by the processing device, each patent document of the plurality of patent documents into a unified format to achieve a plurality of unified patent documents ((page 3 col 2) patents were downloaded from the sources (EPO, USPTO, and WIPO) in XML format. Whenever multiple consecutive line breaks were encountered, they were replaced with a single line break. Images were also removed for all patents);
generating, by the processing device, a chemical patent corpus from the plurality of unified patent documents, the chemical patent corpus comprising one or more chemical entities from the plurality of unified patent documents ((page 3 col 2) Annotated entities: We annotated all compounds, diseases, protein targets, and modes of actions (MOA) mentioned in the patents using (page 4 col 1) “Annotation guidelines” and according to a (page 4 col 2) “Annotation process”; generating, after any disambiguation (page 7 col 2) “The gold standard patent corpus”), 
 “use for training a recognition system”
providing, by the processing device, the chemical patent corpus to the chemical entity recognition system (intended use of the chemical patent corpus, e.g. to test a chemical entity recognition system; see (page 2 col 1) To validate the performance of named entity recognition techniques, the availability of a manually annotated patent corpus is essential…(page 3 col 1) we present a gold standard annotated corpus of 200 full patents for benchmarking text mining performance),
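For illustration only, the line-break normalization quoted from AKHONDI above (replacing multiple consecutive line breaks with a single line break) may be sketched as follows. This is a minimal sketch and not a characterization of the reference's actual implementation; the function name `normalize_line_breaks` is hypothetical, and the handling of carriage returns is an assumption not stated in the reference. Image removal is not shown.

```python
import re


def normalize_line_breaks(text: str) -> str:
    """Collapse every run of consecutive line breaks into one line break.

    Assumption: Windows (CRLF) and bare CR line endings are first mapped
    to LF so that mixed-format sources are treated uniformly.
    """
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    # Two or more consecutive LF characters become a single LF.
    return re.sub(r"\n{2,}", "\n", unified)


if __name__ == "__main__":
    sample = "Title\n\n\nAbstract text\nClaim 1"
    print(normalize_line_breaks(sample))
```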

As noted above, AKHONDI may be relied upon to teach (as described in the instant application) the first phase of generating the chemical patent corpus, and its potential use for testing (validating) a chemical entity recognition system. However, AKHONDI may not be relied upon to expressly disclose the second phase (annotating the chemical entities such that each of the one or more chemical entities comprising one or more relevancy annotations, the one or more relevancy annotations indicating a relevance to the patent document from which the chemical entity is extracted), nor may AKHONDI be relied upon to teach the intended use of the chemical entity recognition system (after being trained, i.e. in response to receiving the chemical patent corpus), where the chemical entity recognition system  tags the one or more chemical entities in a corresponding normalized patent document of the plurality of unified patent documents, extracts one or more additional chemical entities from the plurality of unified patent documents, assigns a confidence score to each of the one or more additional chemical entities, and labels each of the one or more additional chemical entities as relevant or irrelevant to an associated patent document based on information contained in the chemical patent corpus.
ZHANG describes (see abstract) solutions used for two of the three subtasks of the Chemical and Drug Named Entity Recognition from patent text challenge: (i) Chemical Entity Mention Recognition in Patents (CEMP) and (ii) Chemical Passage Detection (CPD), such that the system output for the CPD task was yielded based on the patent titles and abstracts with chemicals recognized in the CEMP task.
ZHANG broadly explains how an NER (named entity recognition system) was trained (see § Introduction, page 2, end of col 2), including (page 2 col 1) additional sets of features were then employed to adapt the baseline NER system to patent text: (i) domain knowledge features such as chemical/drug dictionaries, chemical structural patterns and semantic type information present in the context of a candidate chemical and (ii) word representation features generated from large unlabeled corpora by unsupervised learning algorithms. The NER system was used to generate the outputs for the CPD task, by leveraging chemical entities recognized in the patent titles and abstracts by the CEMP task. 
ZHANG clearly teaches training a chemical entity recognition system to extract one or more chemical compounds from a patent document (system to perform the CEMP task) and suggests determining a relevance of the one or more chemical compounds to the patent document (system to perform the CPD task, interpreting the identification of chemical entities in patent titles and abstracts as sufficient information to indicate a relevance to the patent document from which the chemical entity is extracted; as supported by the instant application).
ZHANG further explains how the trained system was tested for completion of both the CEMP task and the CPD task (see (§ Results page 6 col 2) and the performance tables on page 7) including various scores for accuracy and precision. Thus, ZHANG teaches the intended use of the chemical entity recognition system (after being trained, i.e. in response to receiving the chemical patent corpus), where the chemical entity recognition system tags the one or more chemical entities in a corresponding normalized patent document of the plurality of unified patent documents, extracts one or more additional chemical entities from the plurality of unified patent documents, assigns a confidence score to each of the one or more additional chemical entities, and labels each of the one or more additional chemical entities as relevant or irrelevant to an associated patent document based on information contained in the chemical patent corpus. When there is only one chemical entity, the accuracy score or precision score of recognition can be used as the confidence score for the recognition.
Accordingly, it would have been obvious to one having ordinary skill at the time the invention was effectively filed to have marked each chemical entity in the “gold standard database” created by AKHONDI with at least its location in the patent document (e.g. title, abstract) under the assumption that the location of the chemical entity in the patent document indicates its relevance to the patent document, thus teaching the second phase of creating the chemical patent corpus described in the instant application, that is annotating the chemical entities such that each of the one or more chemical entities comprising one or more relevancy annotations, the one or more relevancy annotations indicating a relevance to the patent document from which the chemical entity is extracted, so that the manually-annotated chemical patent corpus may be used for training a chemical entity recognition system to extract one or more chemical compounds from a patent document and determine a relevance of the one or more chemical compounds to the patent document (e.g. to perform the CEMP and CPD tasks as taught by ZHANG), the combination motivated by the suggestion in AKHONDI for using the annotated corpus to validate an entity recognition system, the teaching in ZHANG of using a manually-annotated corpus of medicinal chemistry patents provided by the challenge organizers (see (page 2, col 2, near bottom)), and the goal to (see ZHANG (page 2 col 2)) promote the development of NER systems for medicinal chemistry patents.
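For illustration only, the location-based relevancy marking relied upon in the rationale above (an entity is treated as relevant when it appears in the title or abstract) may be sketched as follows. This is a minimal sketch under stated assumptions: the names `EntityMention`, `RELEVANT_SECTIONS`, and `annotate_relevance` are hypothetical, and neither AKHONDI nor ZHANG is represented as disclosing this code.

```python
from dataclasses import dataclass

# Assumption: sections treated as indicating relevance (title, abstract).
RELEVANT_SECTIONS = frozenset({"title", "abstract"})


@dataclass(frozen=True)
class EntityMention:
    text: str      # chemical entity as it appears in the document
    section: str   # section of the patent document containing the mention


def annotate_relevance(mentions, relevant_sections=RELEVANT_SECTIONS):
    """Map each distinct entity to True if any mention is in a relevant section."""
    relevance = {}
    for mention in mentions:
        name = mention.text.lower()
        in_relevant = mention.section.lower() in relevant_sections
        relevance[name] = relevance.get(name, False) or in_relevant
    return relevance


if __name__ == "__main__":
    mentions = [
        EntityMention("aspirin", "title"),
        EntityMention("aspirin", "description"),
        EntityMention("ethanol", "description"),
    ]
    print(annotate_relevance(mentions))
```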
It is acknowledged that AKHONDI does not explicitly state the operations are performed by a computing device. Further, the reference does not explicitly state the chemical entities are extracted.
Arguably, all computer-implemented inventions inherently are performed by a computing device. However, in the interests of providing a complete rejection, LAWSON may be relied upon to explicitly describe [0023] software developed to automatically extract chemical data from documents. This preferred embodiment is focused but not limited to identification and extraction of chemical structures, reactions, and some common physical values from patents [0029] Preferred method embodiments comprise: (a) identifying and tagging one or more chemical compounds within a text document; (b) identifying and tagging physical properties related to one or more of those compounds; (c) translating one or more of those compounds into a chemical structure; (d) identifying and tagging one or more chemical reaction descriptions within the text document; and (e) extracting at least some of the tagged information and storing it in a database. Note the example result provided in FIG 4 which identifies (408) the patent document(s) and locations within those documents where the chemical entity was referenced (e.g. Frontpage/Claim: 8). Note also the discussion of “plausibility checks” [0088-0091] which is with respect to scoring mappings and comparing with a threshold to determine acceptability.
The software is clearly executed by a computer having memory (see e.g. [0098] technical details regarding operating systems and embedded products; see also [0137]), thus providing evidence that software-implemented methods to develop a chemical entity recognition system are known to be performed by a computing device and teaching it was known to extract tagged information about chemical entities from patents to be stored in a database.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of AKHONDI in view of ZHANG, and LAWSON before them, to have combined the teachings and arrived at the claimed invention with a reasonable expectation of success, the combination motivated by LAWSON [0024] get better results by combining chemical knowledge, text mining methods, and linguistic knowledge with intelligent pre- and post-processing, including, in at least some embodiments, plausibility checkers; [0026] extract chemical information from documents and store this information in a database, thus automatically creating an index to the underlying documents; [0027] keep the quality of the data as high as possible, and to keep the error rate at a level comparable to that created by a human indexer.
Regarding claim 12, AKHONDI in view of ZHANG, further in view of LAWSON, combined for the reasons discussed above, similarly teaches the system configured for training a chemical entity recognition system to extract one or more chemical compounds from a patent document and determine a relevance of the one or more chemical compounds to the patent document, the system comprising:
one or more hardware processors; and a non-transitory, processor-readable storage medium comprising one or more programming instructions thereon that, when executed, (collectively, the computing device which is configured by software to perform operations; inherent in any computer-implemented invention as evidenced by LAWSON) cause the one or more hardware processors to:
perform the operations analogous to those of the method claim 1, thus rejected under similar rationale by the combination of AKHONDI in view of ZHANG further in view of LAWSON.
Regarding claim 18, AKHONDI in view of ZHANG, further in view of LAWSON, combined for the reasons discussed above, similarly teaches the non-transitory storage medium having executable instructions (e.g. memory with software) embodied thereon for causing a processing device to (as evidenced by LAWSON) perform the operations analogous to those of the method claim 1, thus rejected under similar rationale by the combination of AKHONDI in view of ZHANG further in view of LAWSON.
Regarding dependent claim 2, incorporating the rejection of claim 1, AKHONDI further teaches wherein obtaining the plurality of patent documents from the one or more patent databases comprises obtaining patent documents that are classified as chemistry related patent documents (see e.g. (§ Patent corpus selection, starting page 3, col 1-2) Based on these selection criteria we were left with 8,016 patents grouped in 11 target classes. To make sure that a collection of well-known patents are included in the corpus, 50 drug patents from Sayle et al. [24] were added).
Regarding dependent claim 4 (14, 20), incorporating the rejection of claim 1 (12, 18), AKHONDI in view of ZHANG further in view of LAWSON, combined at least for the reasons discussed above further teaches: wherein generating the chemical patent corpus comprises:
identifying a chemical compound within text contained in each patent document of the plurality of unified patent documents (AKHONDI (§ Annotated entities, page 3 col 2) We annotated all compounds, diseases, protein targets, and modes of actions (MOA) mentioned in the patents…using (§ Annotation guidelines page 4 col 1) and following (§ Annotation process, page 4 col 2) Each patent was automatically pre-annotated using LeadMine (NextMove Software, UK) [34]. LeadMine can identify chemicals, protein targets, genes, species, company names…; note also LAWSON [0029] (a) identifying and tagging one or more chemical compounds within a text document);
accessing a physical properties database and obtaining one or more physical properties of the identified chemical compound (LAWSON: [0029] (b) identifying and tagging physical properties related to one or more of those compounds; [0049] a database containing a list of chemical fragments can serve as the basis for a data parser [0050] Chemical data includes chemical structures, chemical fragments, molecular formulas, and “atomistic properties” such as [0052] physical values); and
generating a chemical structure corresponding to the chemical compound based on the one or more physical properties (LAWSON: [0029] (c) translating one or more of those compounds into a chemical structure [0062] explains at least three different translation services, where [0063-0064] the algorithms are bundled together so the best source algorithm is used [0065] alternatively also providing other metadata such as sum formula; note result obtained in FIG 4).
Regarding dependent claim 5, incorporating the rejection of claim 4, AKHONDI in view of ZHANG further in view of LAWSON, combined at least for the reasons discussed above further teaches: wherein identifying the chemical compound comprises utilizing one or more of a dictionary-based approach and a morphology-based approach to identify the chemical compound (only one must be shown in the art when recited in the alternative; AKHONDI uses LeadMine which is reference (34) on page 8  and is a grammar and dictionary driven approach to chemical entity recognition; note LAWSON uses a dictionary with fragments), wherein the morphology-based approach comprises identifying one or more elements within the chemical compound and combining the one or more elements to create the chemical compound if the chemical compound is validated based on a structural chemistry of the chemical compound (not required in the claim when utilizing dictionary approach; LAWSON [0010-0011] explains how the use of a dictionary of fragments which may be combined to form compound names using connection tables and structural diagrams was previously known and used (thus teaching the “morphology-based” approach)).
Regarding dependent claim 6, incorporating the rejection of claim 1, AKHONDI further teaches wherein generating the chemical patent corpus from the plurality of unified patent documents comprises annotating each of the plurality of unified patent documents with one or more of (only one must be shown in the art when recited in the alternative) a chemical compound, a compound class (see (§ Annotated entities page 3 col 2)), a suffix of a chemical compound, and a prefix of a chemical compound (see (§ Annotation guidelines page 4 col 1) Prefixes should be included within annotations, for example ‘‘1,4-’’ in ‘‘1,4-butanediol’’).
Regarding dependent claim 7, incorporating the rejection of claim 6, AKHONDI further teaches wherein the chemical compound is selected from (only one must be shown in the art when recited in the alternative) a mono-component compound, a compound mixture part, (see (§ Annotated entities page 3 col 2)) or a prophetic compound.
For completeness, note that the example of LAWSON in FIG 4 identifies a compound referred to in claim 8 which may be considered to be a “prophetic compound” commensurate with the special definition in the disclosure as originally filed.
Regarding dependent claim 8, incorporating the rejection of claim 6, AKHONDI further teaches wherein the compound class is selected from (only one must be shown in the art when recited in the alternative) a chemical class (e.g. generic salts, acids), a biomolecule (e.g. specific drugs), a polymer, a mixture class (e.g. annotated: IUPAC names), a mixture part class (e.g. having a prefix; see claim 6), or a Markush class (see (§ Annotated entities page 3 col 2); note Table 1 “target class distribution of the patents” and Figure 1, example annotation).
Regarding dependent claim 9 (15), incorporating the rejection of claim 1 (12), AKHONDI in view of ZHANG further in view of LAWSON, combined at least for the reasons discussed above, further suggests wherein the one or more relevancy annotations comprise: a relevant compound indicated for a prophetic compound or a Markush class (only one must be shown when recited in the alternative; LAWSON identifies a potential prophetic compound in the claim as discussed in claim 7 above; it would be considered relevant if the prophetic compound is also in the title or abstract); and an irrelevant compound indicated for a compound mixture part, a mixture part class, a mixture class, a polymer, or a biomolecule (AKHONDI identifies these various compound types; they would be deemed irrelevant if they did not appear in the abstract or title).
Regarding dependent claim 10 (16), incorporating the rejection of claim 1 (12), AKHONDI in view of ZHANG further in view of LAWSON, combined at least for the reasons discussed above further teaches wherein the one or more relevancy annotations for a mono-component compound or a chemical class are assigned based on a context of the corresponding unified patent document (as explained in the rejection of claim 1, relevancy is based on the chemical entity being mentioned in the abstract or title, this would be regardless of “type” of the chemical compound, where as explained in the rejection of claims 7 and 8, AKHONDI may be relied upon to at least identify and annotate a mono-component compound or a chemical class).

Claims 11 and 17 are rejected under 35 USC 103 as unpatentable over AKHONDI in view of ZHANG further in view of LAWSON, further in view of Akhondi et al. Recognition of chemical entities: combining dictionary-based and grammar-based approaches Journal of Cheminformatics 2015, 7(Suppl 1):S10 [http://www.jcheminf.com/content/7/S1/S10]. 11 pages, hereinafter AKHONDI’2015.
Regarding dependent claim 11 (17), incorporating the rejection of claim 1 (12), AKHONDI in view of ZHANG further in view of LAWSON, combined at least for the reasons discussed above further teaches wherein the confidence score is calculated based on one or more of (only one must be shown in the art when recited in the alternative) a frequency of a compound in a patent document, an occurrence of a compound within predefined sections of a patent document, a length of a term, an occurrence of a compound within special characters, an occurrence of a single compound within a section of a patent document, a compound not containing solvents or laboratory chemicals, and a presence of a compound in one or more predefined groups representing a frequency of compounds in a large set of chemistry patent documents.
AKHONDI’2015 is broadly directed to describing methods for (abstract) The BioCreative CHEMDNER challenge invites the development of systems for the automatic recognition of chemicals in text (CEM task) and for ranking the recognized compounds at the document level (CDI task). Of particular interest, AKHONDI’2015 explains (page 5 col 1 § Ranking) To perform the CDI subtask, we needed a sorted list of unique mentions of the chemical terms in each document. The terms should be ranked according to an estimated confidence of recognition. We therefore determined a “confidence score” for each chemical term … We assumed that chemical terms would be present more frequently in chemical abstracts than in non-chemical abstracts. For each term, the ratio of the tf*idf (term frequency times inverse document frequency) scores for both abstract sets was computed and transformed into a confidence score between zero and one: if ratio < 1 then score = ratio * 0.5 else score = 1 -0.5/ratio. A term with high confidence is found more frequently in chemical abstracts than in non-chemical abstracts and therefore is likely to be a chemical term.
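For illustration only, the ratio-to-score transformation quoted above from AKHONDI’2015 (if ratio < 1 then score = ratio * 0.5 else score = 1 - 0.5/ratio) may be sketched as follows. The function name `confidence_from_ratio` is hypothetical, and rejecting negative ratios is an assumption; computing the tf*idf ratio itself is not shown.

```python
def confidence_from_ratio(ratio: float) -> float:
    """Transform a tf*idf ratio into a confidence score between zero and one.

    Implements the piecewise rule quoted from the reference:
      ratio < 1   ->  score = ratio * 0.5
      otherwise   ->  score = 1 - 0.5 / ratio
    Both branches give 0.5 at ratio == 1, so the mapping is continuous.
    """
    if ratio < 0:
        # Assumption: tf*idf scores are non-negative, so a negative ratio is invalid.
        raise ValueError("ratio must be non-negative")
    if ratio < 1:
        return ratio * 0.5
    return 1 - 0.5 / ratio


if __name__ == "__main__":
    for r in (0.0, 0.5, 1.0, 2.0, 10.0):
        print(r, confidence_from_ratio(r))
```

Under this rule, a term found equally often in both abstract sets (ratio of 1) scores 0.5, and the score approaches 1 as the term becomes relatively more frequent in the chemical abstracts.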
Thus, AKHONDI’2015 teaches a confidence score of a chemical entity is calculated based on an occurrence of a compound within predefined sections of a … document. Applying this to the patent corpus as explained in claim 1, the confidence score not only may be calculated based on an occurrence of the compound within the title and/or abstract (examples of predefined sections) of the patent document, the confidence score may also be used to determine some measure of the relevancy of the compound to the patent document. While AKHONDI’2015 does not explicitly describe training their system to do the entity recognition (relying instead on pre-trained recognizers), nonetheless AKHONDI’2015 (see page 9, col 2) describes a number of advantages with their system, as well as noting how training could improve their results.
Accordingly, it would have been obvious to one having ordinary skill in the art at the time the invention was effectively filed, having the teachings of AKHONDI in view of ZHANG further in view of LAWSON (training a chemical entity recognition system for extracting chemical entities from patent documents, along with some measure of relevance and some measure of confidence of the recognized entity with respect to the input document) and AKHONDI’2015 (having a specific confidence score for a recognized chemical entity which is based on its location within a document) before them, to have combined the teachings and arrived at the claimed invention, the combination motivated by the advantages described in AKHONDI’2015 (see page 9, col 2).
Claims 3, 13, 19 are rejected under 35 USC 103 as unpatentable over AKHONDI in view of ZHANG further in view of LAWSON, further in view of JESSOP, David M. Information Extraction from Chemical Patents. Dissertation for Fitzwilliam College. Published 15 March 2011. Retrieved via Semantic Scholar from [https://www.repository.cam.ac.uk/handle/1810/238302] on [05/05/2022]. 243 pages.
Regarding dependent claim 3 (13, 19), incorporating the rejection of claim 1 (12, 18), AKHONDI in view of ZHANG further in view of LAWSON, combined at least for the reasons discussed above further teaches: wherein normalizing each patent document of the plurality of patent documents comprises converting the plurality of patent documents into a unified XML representation format (see e.g. AKHONDI (page 3 col 2) patents were downloaded from the sources (EPO, USPTO, and WIPO) in XML format) utilizing one or more predefined XML tags corresponding to heuristic information within the plurality of patent documents (LAWSON [0069-0070] Where a source document is to be converted, each document type preferably has a document type definition (DTD) file that lists the conversion method. For example, where a document has a formal structure (i.e., a particular document type), a DTD preferably specifies how structures (e.g., tagged structures) in the source document are to be converted).
LAWSON’s normalization does not explicitly state storing one-to-one mapping between each character in an original text of each patent document and a corresponding character in a normalized patent document, (per instant application [0088], this process is only to provide some bookkeeping in the event the original document needs to be reviewed or [00122] the corpus needs to be changed and is not otherwise used).
JESSOP is similarly directed to the extraction of chemical information from patent documents. Pages 133-138 describe a process for converting an EPO-provided patent into a more formalized document to be analyzed.
On pages 139-140, JESSOP explains the software which is used to annotate reports of spectral data within the patent documents (OSCAR3) does not directly allow for the annotation of arbitrary XML documents. Instead, the patent documents must first be converted to SciXML… but the process of conversion has destroyed valuable markup that is included in the patent XML files. To work around this problem, software was written to identify the sections of text in the original document that correspond to the sections annotated by OSCAR3 in its SciXML documents in order to allow the annotations to be added to the original document. It is hoped that this process will be made redundant by the addition of the capability to annotate arbitrary XML documents in a future version of the OSCAR software.
JESSOP then describes the use of a DataAligner class which locates the equivalent, unannotated, sections in the XML source of the input paragraph.
Thus, JESSOP identifies a need to have a mapping between a character string in an original text of each patent document and a corresponding character string in a normalized (e.g. converted to SciXML) patent document in order to properly annotate the original patent document. In JESSOP, this need is met by using a string-matching algorithm (DataAligner) which attempts to determine the mapping of character strings when needed. A clear improvement would be storing the mapping, so that it may be referred to if ever needed, rather than determining it each time, the benefit of which would be a faster identification of original sections in the original patent document, with the cost of some data storage.
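For illustration only, storing a character-level mapping at normalization time, rather than recomputing an alignment by string matching, may be sketched as follows. This is a minimal sketch under stated assumptions: the function name `normalize_with_mapping` is hypothetical, the normalization shown is limited to collapsing consecutive line breaks, and it does not represent the method of JESSOP, LAWSON, or the instant application.

```python
def normalize_with_mapping(original: str):
    """Collapse consecutive line breaks and record where each kept character came from.

    Returns (normalized, offsets), where offsets[i] is the index in the
    original text of the character at position i of the normalized text.
    """
    kept_chars = []
    offsets = []
    previous_was_newline = False
    for index, char in enumerate(original):
        if char == "\n" and previous_was_newline:
            # Drop repeated line breaks; nothing is recorded for them.
            continue
        kept_chars.append(char)
        offsets.append(index)
        previous_was_newline = char == "\n"
    return "".join(kept_chars), offsets


if __name__ == "__main__":
    normalized, offsets = normalize_with_mapping("ab\n\n\ncd")
    print(repr(normalized), offsets)
```

With the stored offsets, an annotation spanning positions i through j of the normalized text can be located in the original text at offsets[i] through offsets[j] by direct lookup, at the cost of storing one integer per retained character.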
Accordingly, it would have been obvious to one having ordinary skill in the art at the time the invention was effectively filed, having the teachings of AKHONDI in view of ZHANG further in view of LAWSON (extraction of chemical entities from patent documents) and JESSOP (extraction of chemical entities from patent documents with a recognized need to store mappings between original patent documents and normalized patent documents) before them, to have included storing one-to-one mapping between each character in an original text of each patent document and a corresponding character in a normalized patent document, in order to correctly identify the placement of annotation text in an original patent document, as well as in the normalized version of the patent document, with a reasonable expectation of success.
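For illustration only, the stored-mapping alternative discussed above can be sketched as follows. This sketch is not taken from JESSOP, LAWSON, or the instant application. It assumes, hypothetically, that normalization merely collapses runs of whitespace, and the function names normalize_with_mapping and span_in_original are invented for the example. It shows how an offset table recorded once during normalization allows an annotation span in the normalized text to be located in the original text by lookup, rather than by re-running a string-matching alignment.

```python
from typing import List, Tuple


def normalize_with_mapping(original: str) -> Tuple[str, List[int]]:
    """Collapse each whitespace run to a single space (assumed normalization).

    Returns the normalized text and a list `offsets` where offsets[i] is the
    index in `original` of the character that produced normalized character i.
    """
    normalized_chars: List[str] = []
    offsets: List[int] = []
    previous_was_space = False
    for index, char in enumerate(original):
        if char.isspace():
            if previous_was_space:
                continue  # dropped character: nothing is emitted or recorded
            normalized_chars.append(" ")
            previous_was_space = True
        else:
            normalized_chars.append(char)
            previous_was_space = False
        offsets.append(index)
    return "".join(normalized_chars), offsets


def span_in_original(offsets: List[int], start: int, end: int) -> Tuple[int, int]:
    """Map a half-open span [start, end) in the normalized text to the original."""
    return offsets[start], offsets[end - 1] + 1


original = "2-chloro \n\t pyridine  was added"
normalized, offsets = normalize_with_mapping(original)

# An annotation made on the normalized text...
start = normalized.index("pyridine")
end = start + len("pyridine")

# ...is placed in the original text by table lookup, with no string matching.
orig_start, orig_end = span_in_original(offsets, start, end)
print(original[orig_start:orig_end])
```

The cost is one stored integer per normalized character, which corresponds to the data-storage trade-off noted above.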
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).


CONCLUSION
The prior art made of record is considered pertinent to applicant’s disclosure and is recorded on Form PTO-892. Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action.
OH et al. (US 10,572,545 B1) systems and methods for searching and indexing documents comprising chemical information
WEBER et al. (US 2011/0055233 A1) creating a tree-based dictionary from text, specific for chemical structures (see e.g. FIGs 1-2, method in FIGs 7-8)
ARAVAMUDAN et al. (US 2018/0082197 A1) generating semantic information between identified entities (note subject matter relates to medical information, as in FIG 49; explains using a confidence score for entity recognition at [0179]).
OLSON et al. (US 2014/0372448 A1) systems and methods for searching chemical structures
TOIVANEN et al. (US 2019/0213407 A1) analysis of scientific, technological, business information from pre-processed documents (e.g. chemical-related patent document in FIG 6; topic analysis result of multiple patents in FIG 7)
ALDERUCCI et al. (WO 2008/130397 A1) maintaining notes associated with documents in a patent database
BOBACH et al. Automated compound classification using a chemical ontology. Journal of Cheminformatics 2012, 4:40 [http://www.jcheminf.com/content/4/1/40]. 12 pages.
IRMER et al. (2015) OCMiner for patents. extracting chemical information from patent texts. In: Proceedings of the Fifth BioCreative Challenge Evaluation Workshop, pp. 119–123. Retrieved from [https://biocreative.bioinformatics.udel.ed] on [04/21/2022]. 5 pages.
TSAI et al. NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition. Database (2016) Vol. 2016: article ID baw135; doi:10.1093/database/baw135. 8 pages.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMY M LEVY whose telephone number is (571)270-3771. The examiner can normally be reached Mon-Fri 8am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KIEU VU, can be reached on (571) 272-4057. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Amy M Levy/
Primary Examiner, Art Unit 2173
1 Evidence that these are known products may be found in Akhondi,S., Rey,H., Schwörer,M. et al. Automatic identification of relevant chemical compounds from patents. Database (2019) Vol. 2019: article ID baz001; doi:10.1093/database/baz001. 14 pages. The content of this article is nearly identical to the patent application filing. See specifically page 7, section titled “Chemical entity recognizers,” which is provided with citations (40) and (41), and page 14, which identifies these citations as: 40. Lawson,A., Roller,S., Grotz,H. et al. (2011) Method and software for extracting chemical data. United States Patent Office (USPTO). US7933763; and 41. Irmer,M., Weber,L., Böhme,T. et al. (2015) OCMiner for patents. extracting chemical information from patent texts. In: Proceedings of the Fifth BioCreative Challenge Evaluation Workshop, pp. 119–123.