DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of the Claims
Claims 1-2, 7-10, 12, 15-18, 22-23, 25-26 and 28 are pending, of which claims 1, 9 and 17 are in independent form.  Claims 1-2, 7-10, 12, 15-18, 22-23, 25-26 and 28 are rejected under 35 U.S.C. 103.  

Response to Claim Amendments and Arguments
Applicant’s claim amendments and arguments filed on 10/13/2021 as they apply to the 35 U.S.C. 103 rejections of the claims have been fully considered.  On pages 11-13 of the remarks, Applicant’s representative appears to argue three aspects of the newly amended and rolled up independent claim limitations.

Applicant’s Argument:
The Vaithyanathan reference does not disclose the newly amended claim limitation reciting, the plurality of dimensions includes a dimension for each word of the dictionary as illustrated by the reduced vector disclosed in Vaithyanathan.
 

Examiner’s Response:
Vaithyanathan in the Abstract discloses an initial matrix and an initial vector, and a reduced vector.  Examiner is of the position that the initial vector disclosed in Vaithyanathan has a number of dimensions equal to that of the number of terms in a domain, M.  Additionally, Vaithyanathan at Column 5, Lines 14-19 discloses, “Initially each document in a given domain stored in memory 13 is represented as an M dimensional vector 21, where M represents the number of terms or words in a predefined domain of documents as shown in FIG. 2.”  Examiner is of the position that the M dimensional vector cited in Vaithyanathan reads on a dimension for each word of the dictionary.

Applicant’s Argument:
While the Sun reference, cited against now cancelled claim 21, teaches extending terms of a dictionary, Sun does not disclose creating a dictionary or, as recited in the newly amended claim language, ...construct a dictionary from words found in a plurality of reports stored in a database.

Examiner’s Response:
Examiner was relying on the Sun reference to teach supplementing a dictionary as recited in now cancelled claim 21, not constructing a dictionary as now recited in the amended independent claims.  Vaithyanathan at Column 5, Lines 14-19 discloses, “Initially each document in a given domain stored in memory 13 is represented as an M dimensional vector 21, where M represents the number of terms or words in a predefined domain of documents as shown in FIG. 2. Typically, the domain comprises thousands of words following a standardized set of format rules and compiled in a series of pre-processing steps.”  Examiner is of the position that the compiling of the domain of words from documents as disclosed in the cited portion of Vaithyanathan above reads on construct a dictionary from words found in a plurality of reports.
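For illustration only, compiling a domain of words from a set of documents can be sketched as follows. The sketch is not taken from Vaithyanathan; the function name build_dictionary, the tokenization rule, and the sample report text are assumptions chosen for the example.

```python
import re

def build_dictionary(reports):
    """Collect the distinct words found across all reports, in sorted order.

    Assumption: a "word" is any run of lowercase letters or digits.
    """
    words = set()
    for text in reports:
        words.update(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted(words)

# Hypothetical report texts, used only to exercise the function.
reports = [
    "normal sinus rhythm",
    "sinus tachycardia noted",
]
dictionary = build_dictionary(reports)
M = len(dictionary)  # number of terms in the compiled domain
```

Under these assumptions, the length of the compiled list plays the role of M, the number of terms or words in the domain.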

Applicant’s Argument:
The Vaithyanathan reference teaches identifying words that have the greatest impact on a document and thus teaches away from the binary matrix of Guha.

Examiner’s Response:
In the Summary of Invention section of the Vaithyanathan reference, Column 2, Lines 28-37, Vaithyanathan discloses one aspect of the invention is representing each document by an M dimensional vector where M represents the number of terms or words from a predetermined domain of documents.  Further, Vaithyanathan discloses reducing the dimensionality of the vector by, for example, as illustrated in Vaithyanathan at Column 2, Lines 43-47 conducting PCA on the initial matrix.  Examiner is of the position that dimensionality reduction is a stated goal of one of the aspects of the invention and therefore the binary matrix would not teach away from the disclosure of Vaithyanathan.
Applicant argues that another aspect of the invention disclosed in Vaithyanathan at Column 2, Lines 37-43, namely identifying for each cluster the terms having the greatest impact on the documents in that cluster, teaches away from Vaithyanathan being modified by Guha in the manner Examiner suggests.  However, the cited portion of Vaithyanathan is just one embodiment or possible aspect of the invention and not a necessary or limiting feature.  Therefore, Examiner is relying on Vaithyanathan as modified by Guha to teach the argued claim limitation reciting, …the plurality of dimensions comprises only a single bit for each word of the plurality of words.  Guha at paragraph [0050] teaches in part, “…the binary term-document matrix M(D, T) is built, wherein the matrix M corresponds to the terms T and documents D: M(D, T)=1 if and only if term T occurs in document D; otherwise M(D, T)=0.”  Examiner is of the position that a binary term-document matrix reads on the plurality of dimensions comprises only a single bit for each word of the plurality of words.
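For illustration only, the rule quoted from Guha at paragraph [0050], M(D, T)=1 if and only if term T occurs in document D, can be sketched as follows. The sketch is not Guha's implementation; the function name, the whitespace tokenization, and the sample terms and documents are assumptions.

```python
def binary_term_document_matrix(documents, terms):
    """Return one row per document and one column per term.

    An entry is 1 if and only if the term occurs in the document,
    otherwise 0, so each entry carries a single bit of information.
    """
    matrix = []
    for doc in documents:
        present = set(doc.lower().split())  # assumption: whitespace tokens
        matrix.append([1 if term in present else 0 for term in terms])
    return matrix

# Hypothetical terms and documents.
terms = ["normal", "rhythm", "sinus", "tachycardia"]
documents = [
    "normal sinus rhythm",
    "sinus tachycardia",
    "sinus sinus rhythm",  # a repeated word still yields a single 1
]
M = binary_term_document_matrix(documents, terms)
```

In this sketch, a word that appears more than once in a document does not change the entry, which is what distinguishes a binary matrix from a term-count matrix.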

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 7, 9-10, 12, 15, 17-18, 22, 25 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Vaithyanathan et al. U.S. Patent No. 5,857,179 (hereinafter “Vaithyanathan”) in view of Brinker et al. U.S. Pub. No. 2008/0183665 (hereinafter “Brinker”) in view of Shah U.S. Pub. No. 2017/0039358 (hereinafter “Shah”) in further view of Guha et al. U.S. Pub. No. 2007/0203924 (hereinafter “Guha”).
Regarding independent claim 1, Vaithyanathan discloses:
construct a dictionary from words found in a plurality of reports stored in a database (Vaithyanathan at Column 5, Lines 14-19 discloses, “Initially each document in a given domain stored in memory 13 is represented as an M dimensional vector 21, where M represents the number of terms or words in a predefined domain of documents as shown in FIG. 2. Typically, the domain comprises thousands of words following a standardized set of format rules and compiled in a series of pre-processing steps.”  Examiner is of the position that the compiling of a domain of words from a plurality of documents as disclosed in the cited portion of Vaithyanathan above reads on construct a dictionary from words found in a plurality of reports.)
 
encode a first report of the plurality of reports into a feature vector based upon content of the first report; report identification circuitry configured to: identify a closest prototype to the feature vector representative of the first report from among a plurality of prototypes, each prototype representative of a cluster of feature vectors of respective reports of the plurality of reports stored in the database (Examiner is interpreting the claim term prototype as a cluster center.  Vaithyanathan in the Abstract discloses in part, “…each document being represented by a respective M dimensional vector, where M represents the number of terms or words in a predetermined domain of documents…The resultant vectors are then clustered such that correlated documents are grouped into respective clusters.”  Additionally, Vaithyanathan at Column 7, Line 65 – Column 8, Line 5 discloses clustering vectors using a K-means clustering method utilizing a Euclidean distance metric.  Examiner is interpreting K-means clustering as assigning vectors to a cluster with the nearest mean.  Lastly, Vaithyanathan at Column 10, Lines 36-43 discloses a vector distance to a cluster centroid.)
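For illustration only, the interpretation of K-means assignment as selecting the cluster with the nearest mean can be sketched as a nearest-centroid lookup using a Euclidean distance. The sketch is not taken from Vaithyanathan; the function names and the sample coordinates are assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_prototype(vector, prototypes):
    """Return the index of the prototype (cluster center) nearest to vector."""
    return min(range(len(prototypes)),
               key=lambda i: euclidean(vector, prototypes[i]))

# Hypothetical cluster centers and query vector.
prototypes = [[0.0, 0.0], [4.0, 4.0], [10.0, 0.0]]
index = closest_prototype([3.0, 5.0], prototypes)
```

Here the query vector is compared only against the cluster centers, not against every stored vector.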

Vaithyanathan does not disclose:
wherein the first report and the respective reports stored in the database have metadata associated therewith, and wherein the report identification circuitry is further configured to filter the plurality of prototypes based upon the metadata associated with the first report such that the plurality of prototypes from which the closest prototype is to be identified are each representative of a cluster of feature vectors of respective reports having metadata associated therewith which corresponds to the metadata associated with the first report; wherein the metadata associated with the first report is comprised of medical information and the metadata associated with the respective reports is comprised of medical information; wherein one or more prototypes representative of a cluster of feature vectors of respective reports having metadata associated therewith which does not correspond to the metadata associated with the first report are eliminated from the plurality of prototypes from which the closest prototype is to be identified prior to identification of the closest prototype; 
In other words, while Vaithyanathan in the Abstract discloses generating a multidimensional vector of a document based upon the words and terms included in the document, and therefore Examiner is of the position that determining which cluster to assign a vector to, would necessarily depend on the elements used to create the vector, Vaithyanathan does not disclose metadata used in the generation of a vector, and thus filtering and eliminating of prototypes (i.e., cluster centers) based on metadata.  
However, Brinker at paragraph [0010] teaches the following:
The present invention provides improved methods and apparatus for document clustering. In accordance with an embodiment of the invention, a method of clustering a plurality of documents from a data stream includes identifying metadata in one or more of the plurality of documents, emphasizing one or more words corresponding to the metadata, generating a single feature vector for each of the documents based on the emphasized words, and clustering the documents based on the feature vectors.

Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the generation of feature vectors based on words or terms in a document and clustering those vectors based on a nearest mean disclosed in Vaithyanathan with the incorporation of metadata in the generation of feature vectors, and thus necessarily in the assignment of vectors to clusters, taught in Brinker to facilitate in the accuracy and flexibility of the clustering of feature vectors.
With respect to the claim limitation specifying the metadata is comprised of medical information, Examiner is of the position that such a limitation amounts to nonfunctional descriptive material (See MPEP 2111.05) because whether the reports are related to medical information and thus the underlying report metadata is related to medical information is a particular use case and would be handled in the same way if the reports were related to global climates or artificial intelligence and therefore will not be given patentable weight.  In other words, medical information does not relate to or add any functionality to the claim.
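For illustration only, the filtering limitation addressed above, in which prototypes whose metadata does not correspond to that of the first report are eliminated before the closest prototype is identified, can be sketched as follows. The sketch is not taken from Brinker or Vaithyanathan; the record layout, the field names (id, metadata, center), the category key, and the use of exact equality to decide correspondence are all assumptions.

```python
def filter_prototypes(prototypes, query_metadata):
    """Keep only the prototypes whose metadata matches the query metadata.

    Assumption: metadata "corresponds" when the two dictionaries are equal.
    """
    return [p for p in prototypes if p["metadata"] == query_metadata]

# Hypothetical prototypes; the metadata values are deliberately generic.
prototypes = [
    {"id": "A", "metadata": {"category": "x"}, "center": [0, 1]},
    {"id": "B", "metadata": {"category": "y"}, "center": [1, 1]},
    {"id": "C", "metadata": {"category": "x"}, "center": [1, 0]},
]
candidates = filter_prototypes(prototypes, {"category": "x"})
candidate_ids = [p["id"] for p in candidates]
```

The filter operates the same way regardless of what subject matter the metadata values describe.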

wherein the feature vectors comprise multi-dimensional feature vectors, a respective multi-dimensional feature vector having: (i) a plurality of dimensions that each represents only a presence or absence of a respective word from the dictionary in the report represented by the respective multi-dimensional feature vector and (ii) one or more dimensions based upon the metadata associated with the report represented by the respective multi-dimensional feature vector, wherein the plurality of dimensions includes a dimension for each word of the dictionary; (Vaithyanathan in the Abstract discloses in part, “…each document being represented by a respective M dimensional vector, where M represents the number of terms or words in a predetermined domain [i.e. dictionary] of documents.”  Vaithyanathan at Column 5, Lines 38-48 discloses a list of words for a domain.  Additionally, Vaithyanathan at Column 2, Lines 28-37 discloses the following:
In one aspect of the present invention, each document is represented by a respective M dimensional vector, where M represents the number of terms or words from a predetermined domain of documents. The vectors are stored in an initial matrix of documents and terms. The dimensionality of the initial matrix is reduced to form resultant vectors of the documents. The resultant vectors are then clustered such that different respective documents are grouped into different clusters.

Examiner is of the position, consistent with the cited portion of Vaithyanathan provided above, the term matrix illustrated in Figure 6B would result in Document 4 having an initial 12 dimensional representative vector including all 12 terms in the domain, reduced to a one dimensional vector corresponding to term “NSR”.  Similarly, Document 12 would have an initial 12 dimensional representative vector reduced to a five dimensional vector, such that each dimension represents only the presence or absence of a representative word.  Lastly, Vaithyanathan at Column 5, Lines 14-19 discloses, “Initially each document in a given domain stored in memory 13 is represented as an M dimensional vector 21, where M represents the number of terms or words in a predefined domain of documents as shown in FIG. 2.”  Examiner is of the position that the M dimensional vector cited in Vaithyanathan reads on a dimension for each word of the dictionary.)
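For illustration only, a feature vector having (i) one presence-or-absence dimension for each word of the dictionary and (ii) one or more dimensions based upon metadata can be sketched as a concatenation. The sketch is not taken from any cited reference; the function name, the whitespace tokenization, the numeric metadata encoding, and the age_group key are assumptions.

```python
def encode_report(text, dictionary, metadata, metadata_keys):
    """Encode a report as word-presence bits followed by metadata values.

    Assumptions: words are whitespace tokens, and each metadata value is
    already numeric (a missing key is encoded as 0).
    """
    present = set(text.lower().split())
    word_dims = [1 if word in present else 0 for word in dictionary]
    meta_dims = [metadata.get(key, 0) for key in metadata_keys]
    return word_dims + meta_dims

# Hypothetical dictionary, report text, and metadata.
dictionary = ["normal", "rhythm", "sinus", "tachycardia"]
vector = encode_report(
    "sinus tachycardia",
    dictionary,
    metadata={"age_group": 3},
    metadata_keys=["age_group"],
)
```

Under these assumptions the vector has len(dictionary) binary dimensions followed by one dimension per metadata key.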

Vaithyanathan does not disclose:
for only the feature vectors in the cluster represented by the closest prototype, determine the distance between each of the feature vectors and the feature vector representative of the first report; identify one or more of the feature vectors of the respective reports included within the cluster represented by the closest prototype that are separated from the feature vector representative of the first report by the shortest distance…
In other words, while Vaithyanathan at Column 1, Lines 33 – 37 discloses comparing a user query represented as a vector to a list of documents represented as vectors and returning vectors most closely matching the query vector based on distance, Vaithyanathan does not disclose comparing vectors for only the feature vectors in the cluster represented by the closest prototype.
However, Shah at paragraph [0006] teaches in part, “A second processing phase is conducted in a near real-time manner in regard to conflating received entity records with the existing entity records of the corpus of entity records. This second processing phase first matches received entity records to a location cluster, and then matches a received entity record to an entity record within the location cluster.”  Examiner is of the position that Shah teaches first identifying a cluster, then identifying a record within a cluster.  Further, Shah at paragraph [0029] teaches how records are matched within a cluster.
Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the matching of a user query vector to clustered document vectors as disclosed in Vaithyanathan with matching of a record by first determining a cluster, then determining a matching record within a cluster taught in Shah to facilitate in identifying relevant documents.
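For illustration only, the combination described above, first identifying a cluster and then identifying a record within only that cluster, can be sketched as a two-stage search. The sketch is not Shah's or Vaithyanathan's implementation; the cluster layout, the field names prototype and members, the report identifiers, and the use of Euclidean distance in both stages are assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def two_stage_search(query, clusters):
    """Stage 1: pick the cluster whose prototype is nearest to the query.
    Stage 2: compare the query only against that cluster's members and
    return the identifiers of the member(s) at the shortest distance."""
    nearest = min(clusters, key=lambda c: euclidean(query, c["prototype"]))
    distances = {rid: euclidean(query, vec)
                 for rid, vec in nearest["members"].items()}
    best = min(distances.values())
    return sorted(rid for rid, d in distances.items() if d == best)

# Hypothetical clusters of report vectors.
clusters = [
    {"prototype": [0.0, 0.0], "members": {"r1": [0.0, 1.0], "r2": [1.0, 0.0]}},
    {"prototype": [5.0, 5.0], "members": {"r3": [5.0, 4.0], "r4": [6.0, 6.0]}},
]
matches = two_stage_search([4.0, 4.0], clusters)
```

In this sketch the members of the cluster that was not selected are never compared against the query.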

Vaithyanathan does not disclose:
provide an indication of the respective report(s) represented by the one or more feature vectors identified to be closest to the feature vector representative of the first report, wherein the plurality of dimensions comprises only a single bit for each word of the plurality of words.
While Vaithyanathan at Column 1, Lines 33 – 37 discloses returning vectors most closely matching the query vector and Vaithyanathan at Figure 6B discloses a document term matrix, Vaithyanathan does not disclose a binary document term matrix.
However, Guha at paragraph [0050] teaches in part, “…the binary term-document matrix M(D, T) is built, wherein the matrix M corresponds to the terms T and documents D: M(D, T)=1 if and only if term T occurs in document D; otherwise M(D, T)=0.”  Additionally, Guha at paragraphs [0051]-[0052] teaches generating a relevancy graph based on common terms.
Both the Vaithyanathan reference and the Guha reference, in the sections cited by the Examiner, are in the field of endeavor of searching and organizing a large collection of data.  Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the document term matrix and vector clustering of documents disclosed in Vaithyanathan with the binary document matrix and relevancy graph matching common terms taught in Guha to facilitate in the organization and searching of a large collection of data (See Guha at paragraph [0031]).

Regarding dependent claim 2, all of the particulars of claim 1 have been addressed above.  While Vaithyanathan discloses generating a feature vector representation of a document using the words contained within the document, Vaithyanathan does not disclose metadata and does not disclose:
wherein the report encoding circuitry is configured to encode the first report by encoding the first report into the feature vector based upon the content of the first report and the metadata associated with the first report.
However, Brinker at paragraph [0010] teaches the following:
The present invention provides improved methods and apparatus for document clustering. In accordance with an embodiment of the invention, a method of clustering a plurality of documents from a data stream includes identifying metadata in one or more of the plurality of documents, emphasizing one or more words corresponding to the metadata, generating a single feature vector for each of the documents based on the emphasized words, and clustering the documents based on the feature vectors.


Regarding dependent claim 7, all of the particulars of claim 1 have been addressed above.  Additionally, Vaithyanathan discloses:
wherein the report identification circuitry is configured to identify the closest prototype by identifying the prototype that has a shortest Euclidean distance to the feature vector representative of the first report as the closest prototype (Vaithyanathan at Column 7, Line 66 – Column 8, Line 4 discloses, “At step 27 in FIG. 2, a Euclidean distance metric is preferably employed to cluster the vectors of the matrix of reduced dimensionality 60. In a preferred embodiment, a K-means clustering algorithm is employed, such as that described by A. C. Jain and R. C. Dubes, "Algorithms for Clustering Data,"”).

Regarding independent claim 9, while independent claim 9, a method claim, and independent claim 1, a system claim, are directed to different statutory classes, they are similar in scope and therefore claim 9 is rejected under the same rationale as claim 1.

Regarding dependent claim 10, all of the particulars of claim 9 have been addressed above.  Additionally, claim 10 is rejected under the same rationale as claim 2. 

Regarding dependent claim 12, all of the particulars of claim 9 have been addressed above.  Additionally, Vaithyanathan discloses:
wherein each prototype represents a center point of the feature vectors of the respective cluster (Vaithyanathan at Column 10, Lines 36-38 discloses, “The distance indicator 108A-111A represents the vector distance from the centroid of that particular cluster centroid.”)

Regarding dependent claim 15, all of the particulars of claim 9 have been addressed above.  Additionally, claim 15 is rejected under the same rationale as claim 7.

Regarding independent claim 17, while independent claim 17, a computer program product claim, and independent claim 1, a system claim, are directed to different statutory classes, they are similar in scope and therefore claim 17 is rejected under the same rationale as claim 1.

Regarding dependent claim 18, all of the particulars of claim 17 have been addressed above.  Additionally, claim 18 is rejected under the same rationale as claim 2. 

Regarding dependent claim 22, all of the particulars of claim 1 have been addressed above.  Additionally, Vaithyanathan discloses:
wherein the report identification circuitry is configured to utilize a different metric in order to identify the closest prototype than the metric used to identify the one or more of the feature vectors of the respective reports included within the cluster represented by the closest prototype that are separated from the feature vector representative of the first report by the shortest distance (Vaithyanathan at Column 8, Lines 12-14 discloses applying other clustering techniques such as a cosine similarity measurement.)
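For illustration only, the difference between a cosine similarity measurement and a Euclidean distance metric can be sketched by applying both to the same vectors. The sketch is not taken from Vaithyanathan; the function names and the sample coordinates are assumptions selected so that the two metrics choose different prototypes.

```python
import math

def euclidean(a, b):
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: larger means closer.
    Returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical query and prototypes.
query = [1.0, 1.0]
prototypes = [[10.0, 10.0], [1.0, 0.0]]

by_cosine = max(range(len(prototypes)),
                key=lambda i: cosine_similarity(query, prototypes[i]))
by_euclidean = min(range(len(prototypes)),
                   key=lambda i: euclidean(query, prototypes[i]))
```

With these sample values the cosine measurement selects the first prototype, which points in the same direction as the query, while the Euclidean metric selects the second, which lies nearer in space.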

Regarding dependent claim 25, all of the particulars of claim 9 have been addressed above.  Additionally, claim 25 is rejected under the same rationale as claim 22.

Regarding dependent claim 28, all of the particulars of claim 17 have been addressed above.  Additionally, claim 28 is rejected under the same rationale as claim 22.

Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Vaithyanathan in view of Brinker in view of Shah in view of Guha in further view of Nakagawa U.S. Pub. No. 2003/0059115 (hereinafter “Nakagawa”).
Regarding dependent claim 8, all of the particulars of claim 1 have been addressed above.  Vaithyanathan does not disclose:
wherein the report identification circuitry is configured to identify one or more of the feature vectors that are closest to the feature vector representative of the first report by identifying the one or more feature vectors from among the respective reports of the closest prototype that have a shortest Euclidean distance to the feature vector representative of the first report as the closest feature vector(s).
In other words, while Vaithyanathan at Column 1, Lines 33 – 37 discloses comparing a user query represented as a vector to a list of documents represented as vectors and returning vectors most closely matching the query vector, and Vaithyanathan further discloses clustering vectors using the Euclidean distance, Vaithyanathan does not explicitly disclose identifying a closest vector using the Euclidean distance.
However, Nakagawa at paragraph [0005] teaches in part, “…a vector extracted from a pattern of an input character and reference vectors extracted from patterns of previously stored target characters. In this system, the Euclidean distance between the vector of the input pattern and the reference vector is calculated and is compared with a predetermined threshold to perform character recognition.”  Examiner is of the position that the predetermined threshold taught in Nakagawa can include the shortest Euclidean distance. 
Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the generation of a set of feature vectors corresponding to a set of documents based on words included in the documents, clustering said feature vectors, and finding the feature vector most closely matching a query vector disclosed in Vaithyanathan with using the Euclidean distance to find the shortest distance between the vectors taught in Nakagawa to facilitate in identifying the closest vector to a query vector.

Regarding dependent claim 16, all of the particulars of claim 9 have been addressed above.  Additionally, claim 16 is rejected under the same rationale as claim 8. 

Claims 23 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Vaithyanathan in view of Brinker in view of Shah in view of Guha in further view of Senior et al. U.S. Pub. No. 2015/0269931 (hereinafter “Senior”).
Regarding dependent claim 23, all of the particulars of claim 1 have been addressed above.  Vaithyanathan does not disclose:
wherein two or more clusters partially overlap such that one or more of the feature vectors are included in the two or more of the clusters.
In other words, while Vaithyanathan discloses representing documents as vectors, and clustering said vectors, Vaithyanathan does not disclose overlapping clusters.
However, Senior at paragraph [0080] teaches, “In some implementations, each cluster may include a distinct plurality of vectors, where each cluster is associated with a distinct speech model. In some other implementations, one or more clusters may include overlapping vectors.”
Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the clustering of vectors disclosed in Vaithyanathan with the clusters containing overlapping vectors taught in Senior, to facilitate in accurate vectorization and clustering.
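For illustration only, clusters that partially overlap, such that a vector is included in two or more clusters, can be sketched with a simple membership structure. The sketch is not taken from Senior; the cluster names, the vector identifiers, and the helper function are assumptions.

```python
def clusters_containing(vector_id, clusters):
    """Return the names of every cluster that includes the given vector."""
    return sorted(name for name, members in clusters.items()
                  if vector_id in members)

# Hypothetical cluster memberships; "v2" belongs to both clusters.
clusters = {
    "c1": {"v1", "v2"},
    "c2": {"v2", "v3"},
}
shared = clusters["c1"] & clusters["c2"]
owners_of_v2 = clusters_containing("v2", clusters)
```

In this sketch the clusters overlap only partially, since each also contains a vector that the other does not.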

Regarding dependent claim 26, all of the particulars of claim 9 have been addressed above.  Additionally, claim 26 is rejected under the same rationale as claim 23.

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
2010/0223276
Paragraph [0046] as it relates to massive datasets for medical applications.  Paragraph [0087] as it relates to creating feature vectors for each document in a dataset.  Paragraph [0188] as it relates to centroids of feature vector clusters.
2013/0031100
Claim 16 as it relates to generating similarity vectors to an object's metadata and the metadata of a centroid and clustering based on the similarity vectors.
2015/0066711
Paragraph [0296] teaches in part, “…each review document can be represented into the vector form, where each vector dimension indicates the presence or absence of a corresponding feature and its associated opinion (i.e. sentiment term) identified from the review document.”
2014/0280088
Paragraph [0036] as it relates to the number of dimensions and terms in a set of documents.
2014/0067847
Paragraph [0053] as it relates to a document being represented as a vector in a multi-dimensional space, where each dimension corresponds to a word in a given corpus or dictionary.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANTHONY G GEMIGNANI whose telephone number is (571)272-1018. The examiner can normally be reached M-F 8-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain T Alam can be reached on 571-272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/A.G.G./Examiner, Art Unit 2154
/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2154