DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is in response to the application filed 03/23/2020.

Status of the claims
Claims 1-20 are currently pending for examination.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/23/2020 is being considered by the examiner.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-4 and 6-20 of copending Application No. 16/776126. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1-20 of the instant application recite similar limitations. Therefore, they are rejected on the ground of nonstatutory double patenting.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
The claims correspond as follows:
Instant Application 16/826562        Copending Application 16/776126
Claims 1, 2, 17, 18, 19, 20          Claims 1, 15, 18
Claims 4-6                           Claims 2, 3, 4
Claims 7, 8                          Claim 6
Claim 9                              Claim 7
Claim 10                             Claim 8
Claim 11                             Claim 9
Claim 12                             Claim 10
Claim 17                             Claim 15

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-9, 11-13, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Griffith (US 20190095472) in view of Wang et al., “Toward Scalable Hierarchical clustering and co-clustering methods: application to the cluster Hypothesis in information retrieval”, Sep 20 2019 (hereafter Wang), in view of Koral et al. (US 20200112574, hereafter Koral).

Regarding claim 1, Griffith discloses: An apparatus comprising: at least one processing device comprising a processor coupled to a memory (Griffith [0090]);
the at least one processing device (Griffith [0090]) being configured to perform steps of:
obtaining two or more sets of data records, each of the data records in the two or more sets of data records comprising at least one string associated with at least one attribute (Griffith [0020; 0030] discloses: different datasets for ingestion; the dataset ingestion controller includes a dataset joinability analyzer 160, which may be configured to determine data attributes (e.g., similarity attributes) with which to determine degrees of similarity among subsets of data);
generating at least one similarity matrix, wherein entries of the at least one similarity matrix comprise values characterizing similarity between respective pairs of the strings associated with the at least one attribute, each pair of strings comprising a first string from one of the data records in a first one of the two or more sets of data records and a second string from one of the data records in a second one of the two or more sets of data records (Griffith [0020; 0021] discloses: similarity matrix for dataset joinability; [0023] discloses: similarity matrices describe data attributes for data relevant to similar or equivalent entities, class, types/string);
constructing at least one graph network based at least in part on the at least one similarity matrix, the at least one graph network comprising edges connecting pairs of the data records in the two or more sets of data records based at least in part on values of entries in the at least one similarity matrix, at least one of the edges connecting a first data record in the first set of data records with a second data record in the second set of data records (Griffith [0039] discloses: links may be formed among the ingested columnar data, which is converted into a graph-based data format, and a subset of a graph-based data arrangement. At 212, links among a column of data (e.g., an ingested subset of a tabular data arrangement) may be formed with a subset of the other datasets associated with a subset of relevant similarity matrices);
Griffith does not disclose, but Wang discloses: performing at least one clustering operation on the at least one graph network to identify one or more clusters for the at least one attribute, a given one of the one or more clusters comprising at least one data record from the first set of data records and at least one data record from the second set of data records (Wang [page 31] discloses: co-clustering, which operates in both data and feature spaces by simultaneously partitioning data instances and their attributes into subgroups, which are called co-clusters).
Griffith and Wang are analogous art because they are in the same field of endeavor, grouping or clustering datasets based on similarity. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Griffith to include the teaching of Wang in order to cluster data based on similarity. The suggestion/motivation to combine is to group or cluster data objects and their features at the same time.
Griffith does not disclose, but Koral discloses: initiating at least one remedial action responsive to identifying the given cluster (Koral [0046] discloses: depending upon the cluster to which the compressed vector representation is assigned, the processing system 104 may implement a type of remedial action that is specific to the type of anomaly. For instance, if the cluster is labeled as a DDoS on a DNS authoritative server, sources originating DNS requests associated with the cluster may then be blocked).
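For illustration only, the similarity matrix, graph network, and clustering steps recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not taken from the instant application, Griffith, Wang, or Koral: `difflib.SequenceMatcher` stands in for the unspecified string similarity measure, connected components stand in for the unspecified clustering operation, and the names `similarity_matrix`, `build_edges`, `find_clusters`, and the 0.8 cutoff are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_matrix(first, second):
    """Rows index strings from the first set, columns index strings from the second set."""
    return [
        [SequenceMatcher(None, a.lower(), b.lower()).ratio() for b in second]
        for a in first
    ]


def build_edges(matrix, cutoff):
    """Return bipartite edges (i, j) where the similarity entry meets the assumed cutoff."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value >= cutoff
    ]


def find_clusters(edges):
    """Group connected records with union-find; nodes are ("A", i) and ("B", j)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    for i, j in edges:
        root_a, root_b = find(("A", i)), find(("B", j))
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical records: one attribute (a company name) from two independent sources.
first_set = ["Acme Corp", "Globex"]
second_set = ["ACME Corp.", "Initech"]
matrix = similarity_matrix(first_set, second_set)
clusters = find_clusters(build_edges(matrix, cutoff=0.8))
```

A cluster containing at least one ("A", i) node and at least one ("B", j) node corresponds to a cluster spanning both sets of data records; the remedial action itself is outside the scope of this sketch.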

Regarding claim 2, Griffith as modified discloses: The apparatus of claim 1 wherein the first set of data records is independent of the second set of data records (Griffith [0027] discloses: collaborative dataset consolidation system 110 (e.g., data is linked from external sources)).

Regarding claim 3, Griffith as modified discloses: The apparatus of claim 1 wherein the first set of data records is obtained from a first data source in an information processing system and the second set of data records is obtained from a second data source in the information processing system (Griffith [0027] discloses: collaborative dataset consolidation system 110 (e.g., data is linked from external sources)).
 
Regarding claim 4, Griffith as modified discloses: The apparatus of claim 1 wherein generating the at least one similarity matrix comprises performing string similarity calculations for the pairs of the strings (Griffith [0028] discloses: similarity matrix).

Regarding claim 5, Griffith as modified discloses: The apparatus of claim 4 wherein the string similarity calculations comprise one or more edit distance calculations (Griffith [0056]).
Regarding claim 7, Griffith as modified discloses: The apparatus of claim 1 further comprising applying a thresholding filter to values in the entries of the at least one similarity matrix to create at least one biadjacency matrix, and wherein constructing the at least one graph network is based at least in part on the at least one biadjacency matrix (Wang [page 38] discloses: constructing a similarity graph and weighted adjacency matrix).
(Wang [page 80] discloses: clustering based on the threshold values).
Regarding claim 9, Griffith as modified discloses: The apparatus of claim 8 wherein constructing the at least one graph network comprises connecting pairs of the data records in the two or more sets of data records having entries in the at least one biadjacency matrix with the second value, and refraining from connecting pairs of the data records in the two or more sets of data records having entries in the at least one biadjacency matrix with the first value (Wang [page 38] discloses: weighted adjacency matrix for generating a similarity graph).
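For illustration only, the thresholding filter of claim 7 and the edge rule of claim 9 can be sketched in Python as follows. This is a sketch under stated assumptions rather than the applicant's or Wang's implementation: the first value is assumed to be 0, the second value is assumed to be 1, entries at or above the threshold are assumed to receive the second value, and the function names `biadjacency` and `edges_from_biadjacency` are hypothetical.

```python
def biadjacency(similarity, threshold, first_value=0, second_value=1):
    """Map each similarity entry to second_value if it meets the threshold, else first_value."""
    return [
        [second_value if value >= threshold else first_value for value in row]
        for row in similarity
    ]


def edges_from_biadjacency(matrix, second_value=1):
    """Connect record i (first set) to record j (second set) only where the entry is second_value."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value == second_value
    ]


# Hypothetical 2 x 2 similarity matrix between two sets of data records.
similarity = [[0.95, 0.10], [0.20, 0.85]]
b = biadjacency(similarity, threshold=0.8)
edges = edges_from_biadjacency(b)
```

Pairs whose entry holds the first value produce no edge, which corresponds to refraining from connecting those data records.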
Regarding claim 11, Griffith as modified discloses: The apparatus of claim 1 wherein the two or more sets of data records are associated with a plurality of assets of an information technology infrastructure, the plurality of assets comprising at least one of physical and virtual computing resources in the information technology infrastructure, and wherein initiating the at least one remedial action comprises at least one of: applying one or more security hardening procedures to one or more of the plurality of assets associated with the data records in the given cluster (Koral [0047]); and
modifying a configuration of one or more of the plurality of assets associated with the data records in the given cluster (Koral [0046] discloses: processing system 104 may then apply remedial actions which are specific to certain types of attacks or other anomalous traffic. For instance, the encoder-decoder neural network may be applied to an input aggregate vector for new network traffic data, the compressed vector representation may be generated and clustered, and then depending upon the cluster to which the compressed vector representation is assigned, the processing system 104 may implement a type of remedial action that is specific to the type of anomaly).
(Koral [0077] discloses: The at least one additional remedial action may involve blocking DNS traffic from one or more clients associated with the additional DNS traffic records from which the additional input aggregate vector is derived, directing queries from DNS resolver(s) associated with the additional DNS traffic records from which the additional input aggregate vector is derived to a different DNS authoritative server, blocking, …); and
monitoring subsequent access, by one or more of the plurality of users associated with the data records in the given cluster, to one or more of the plurality of assets (Koral [0014] discloses: controlling user access to the website).
Regarding claim 13, Griffith as modified discloses: The apparatus of claim 1 wherein each of the data records in the two or more sets of data records comprises a first string associated with a first attribute and a second string associated with a second attribute (Griffith [0036] discloses: attributes such as location zip code associated with location and name).
Regarding claim 15, Griffith as modified discloses: The apparatus of claim 13 wherein: the at least one graph network comprises a first graph network for the first attribute and a second graph network for the second attribute;
performing the at least one clustering operation on the at least one graph network comprises identifying a first set of one or more clusters of the data records in the two or more sets of data records in the first graph network for the first attribute and identifying a second set of one or more clusters of the (Wang [page 38] discloses: constructing a similarity graph and weighted adjacency matrix); and
initiating the at least one remedial action is responsive to identifying a given data record in one of the first set of data records and the second set of data records that is in a first cluster with a first subset of the data records in the two or more sets of data records for the first attribute and is in a second cluster with a second subset of the data records in the two or more sets of data records for the second attribute (Koral [0046] discloses: depending upon the cluster to which the compressed vector representation is assigned, the processing system 104 may implement a type of remedial action that is specific to the type of anomaly. For instance, if the cluster is labeled as a DDoS on a DNS authoritative server, sources originating DNS requests associated with the cluster may then be blocked).
Regarding claim 16, Griffith as modified discloses: The apparatus of claim 15 wherein the first attribute comprises a mailing address and the second attribute comprises a name (Griffith [0036] discloses: attributes such as location zip code associated with location and name).
Regarding claim 17, Griffith discloses: A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes the at least one processing device to perform steps of:
obtaining two or more sets of data records, each of the data records in the two or more sets of data records comprising at least one string associated with at least one attribute (Griffith [0020; 0030] discloses: different datasets for ingestion; the dataset ingestion controller includes a dataset joinability analyzer 160, which may be configured to determine data attributes (e.g., similarity attributes) with which to determine degrees of similarity among subsets of data);
generating at least one similarity matrix, wherein entries of the at least one similarity matrix comprise values characterizing similarity between respective pairs of the strings associated with the at least one attribute, each pair of strings comprising a first string from one of the data records in a first one of the two or more sets of data records and a second string from one of the data records in a second one of the two or more sets of data records (Griffith [0020; 0021] discloses: similarity matrix for dataset joinability; [0023] discloses: similarity matrices describe data attributes for data relevant to similar or equivalent entities, class, types/string);
constructing at least one graph network based at least in part on the at least one similarity matrix, the at least one graph network comprising edges connecting pairs of the data records in the two or more sets of data records based at least in part on values of entries in the at least one similarity matrix, at least one of the edges connecting a first data record in the first set of data records with a second data record in the second set of data records (Griffith [0039] discloses: links may be formed among the ingested columnar data, which is converted into a graph-based data format, and a subset of a graph-based data arrangement. At 212, links among a column of data (e.g., an ingested subset of a tabular data arrangement) may be formed with a subset of the other datasets associated with a subset of relevant similarity matrices);
Griffith does not disclose, but Wang discloses: performing at least one clustering operation on the at least one graph network to identify one or more clusters for the at least one attribute, a given one of the one or more clusters comprising at least one data record from the first set of data records and at least one data record from the second set of data records (Wang [page 31] discloses: co-clustering, which operates in both data and feature spaces by simultaneously partitioning data instances and their attributes into subgroups, which are called co-clusters).
Griffith and Wang are analogous art because they are in the same field of endeavor, grouping or clustering datasets based on similarity. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Griffith to include the teaching of Wang in order to cluster data based on similarity. The suggestion/motivation to combine is to group or cluster data objects and their features at the same time.

Griffith does not disclose, but Koral discloses: initiating at least one remedial action responsive to identifying the given cluster (Koral [0046] discloses: depending upon the cluster to which the compressed vector representation is assigned, the processing system 104 may implement a type of remedial action that is specific to the type of anomaly. For instance, if the cluster is labeled as a DDoS on a DNS authoritative server, sources originating DNS requests associated with the cluster may then be blocked).
Griffith as modified and Koral are analogous art because they are in the same field of endeavor, grouping or clustering datasets based on similarity. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Griffith to include the teaching of Koral in order to cluster data record events. The suggestion/motivation to combine is to identify network security events for detecting anomalous domain name system traffic records via an encoder-decoder neural network.
Regarding claim 18, Griffith as modified discloses: The computer program product of claim 17 wherein the first set of data records is independent of the second set of data records (Griffith [0027] discloses: collaborative dataset consolidation system 110 (e.g., data is linked from external sources)).
Regarding claim 19, Griffith discloses: A method comprising steps of:
obtaining two or more sets of data records, each of the data records in the two or more sets of data records comprising at least one string associated with at least one attribute (Griffith [0020; 0030] discloses: different datasets for ingestion; the dataset ingestion controller includes a dataset joinability analyzer 160, which may be configured to determine data attributes (e.g., similarity attributes) with which to determine degrees of similarity among subsets of data);
generating at least one similarity matrix, wherein entries of the at least one similarity matrix comprise values characterizing similarity between respective pairs of the strings associated with the at least one attribute, each pair of strings comprising a first string from one of the data records in a first one of the two or more sets of data records and a second string from one of the data records in a second one of the two or more sets of data records (Griffith [0020; 0021] discloses: similarity matrix for dataset joinability; [0023] discloses: similarity matrices describe data attributes for data relevant to similar or equivalent entities, class, types/string);
constructing at least one graph network based at least in part on the at least one similarity matrix, the at least one graph network comprising edges connecting pairs of the data records in the two or more sets of data records based at least in part on values of entries in the at least one similarity matrix, at least one of the edges connecting a first data record in the first set of data records with a second data record in the second set of data records (Griffith [0039] discloses: links may be formed among the ingested columnar data, which is converted into a graph-based data format, and a subset of a graph-based data arrangement. At 212, links among a column of data (e.g., an ingested subset of a tabular data arrangement) may be formed with a subset of the other datasets associated with a subset of relevant similarity matrices);
wherein the method is performed by at least one processing device comprising a processor coupled to a memory (Griffith [0090]). 
Griffith does not disclose, but Wang discloses: performing at least one clustering operation on the at least one graph network to identify one or more clusters for the at least one attribute, a given one of the one or more clusters comprising at least one data record from the first set of data records and at least one data record from the second set of data records (Wang [page 31] discloses: co-clustering, which operates in both data and feature spaces by simultaneously partitioning data instances and their attributes into subgroups, which are called co-clusters).
Griffith and Wang are analogous art because they are in the same field of endeavor, grouping or clustering datasets based on similarity. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Griffith to include the teaching of Wang in order to cluster data based on similarity. The suggestion/motivation to combine is to group or cluster data objects and their features at the same time.
Griffith does not disclose, but Koral discloses: initiating at least one remedial action responsive to identifying the given cluster (Koral [0046] discloses: depending upon the cluster to which the compressed vector representation is assigned, the processing system 104 may implement a type of remedial action that is specific to the type of anomaly. For instance, if the cluster is labeled as a DDoS on a DNS authoritative server, sources originating DNS requests associated with the cluster may then be blocked).
Griffith as modified and Koral are analogous art because they are in the same field of endeavor, grouping or clustering datasets based on similarity. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Griffith to include the teaching of Koral in order to cluster data record events. The suggestion/motivation to combine is to identify network security events for detecting anomalous domain name system traffic records via an encoder-decoder neural network.

Regarding claim 20, Griffith as modified discloses: The method of claim 19 wherein the first set of data records is independent of the second set of data records (Griffith [0027] discloses: collaborative dataset consolidation system 110 (e.g., data is linked from external sources)).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Griffith (US 20190095472) in view of Wang in view of Koral et al. (US 20200112574, hereafter Koral) and further in view of Ishakian et al. (US 20170111245, hereafter Ishakian).

Regarding claim 6, Griffith as modified does not disclose, but Ishakian discloses: The apparatus of claim 5 wherein the one or more edit distance calculations comprises at least one of a Levenshtein edit distance calculation and a Jaro-Winkler edit distance calculation (Ishakian [0067] discloses: Levenshtein edit distance).
Griffith as modified and Ishakian are analogous art because they are in the same field of endeavor, generating a similarity graph. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Griffith to include the teaching of Ishakian in order to generate a similarity graph. The suggestion/motivation to combine is to provide a method for calculating the similarity information.
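For illustration only, a Levenshtein edit distance calculation of the kind recited in claim 6 can be sketched in Python as follows. This is the standard dynamic-programming formulation, not code taken from Ishakian or the instant application; the function names `levenshtein` and `normalized_similarity`, and the choice to normalize by the longer string length, are assumptions made for this sketch.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + (char_a != char_b),  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def normalized_similarity(a, b):
    """Assumed normalization: 1.0 for identical strings, 0.0 for entirely different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

A value produced by a function such as `normalized_similarity` could serve as one entry of a similarity matrix, with one call per pair of strings.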

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Griffith (US 20190095472) in view of Wang in view of Koral et al. (US 20200112574, hereafter Koral) and further in view of Jaya et al., “NodeTrix-Community hierarchy : techniques for finding hierarchical communities for visual analytics of small-world networks”, Copyright 2017 by SCITEPRESS (hereafter Jaya).

Regarding claim 10, Griffith as modified does not disclose, but Jaya discloses: The apparatus of claim 1 wherein performing the at least one clustering operation comprises performing community detection on the at least one graph network to identify the one or more clusters for the at least one attribute, the (Jaya [Section 3 “hierarchical communities”, page 143] discloses: community detection using modularity-based methods (e.g., Louvain)).
Griffith as modified and Jaya are analogous art because they are in the same field of endeavor, generating a similarity matrix. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Griffith to include the teaching of Jaya in order to manage a community network. The suggestion/motivation to combine is to detect communities on both the network and the similarity graph by using specific community detection at a specific hierarchical level.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CINDY NGUYEN whose telephone number is (571)272-4025. The examiner can normally be reached M-F 8:00-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Apu Mofiz, can be reached at 571-272-4080. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CINDY NGUYEN/Examiner, Art Unit 2161