Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-5 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Han (U.S. Pub. 2020/0097601 A1).
Claim 1
Han discloses a non-transitory processor-readable medium storing code representing instructions to be executed by a processor, the code comprising code to cause the processor to:
extract a plurality of data records from at least one data source ([0013], lines 1-9, “... the entity analysis platform may include and/or receive the unstructured data... identify entities in the structured data and obtain structured data that includes a plurality of known entities...” <examiner note: entities/data records in structured and unstructured data are identified/extracted>);
combine, using a first set of predefined rules having parameters trained using a first Bayesian Program Learning model, a set of data records from the plurality of data records into a single data record based on a likelihood that each data record from the set of data records is associated with a common entity ([0013], lines 27-28, “... The entity analysis platform may automatically determine a probability that an entity of the unstructured data corresponds to an entity of the structured data...” [0020], lines 7-16, “... the entity analysis platform may train a model using information that includes a plurality of identifiers of entities (e.g., formal names, nicknames, brands, logos, and/or the like), a plurality of characteristics associated with the entities, and/or the like, to identify whether a representation of an entity is associated with a particular entity... the entity analysis platform may determine that past identifications of representations of entities, are associated with a threshold probability of being associated with the particular entity...” [0013], lines 30-36, “... Furthermore, the entity analysis platform may update the structured data to include information, associated with the entity, from the unstructured data. Accordingly, as a specific example, if the entity from the online news article or social media post is receiving an award (or positive sentiment), a profile of the entity in the directory can be updated to indicate that award...” [0028], “... the entity analysis platform may use any number of artificial intelligence techniques, machine learning techniques, deep learning techniques, and/or the like to identify a representation of entity (e.g., in unstructured data), analyze the representation of the entity.
In this case, the entity analysis platform may determine that a relatively high score (e.g., as being likely to be identified) is to be assigned to representations of entities that are determined to be the same or similar as previously identified representations of the particular entity (or more frequently identified than past identified representations)...” <examiner note: a first model is trained using information such as identifiers of entities and characteristics of entities to resolve whether representations of entities are the same as the particular entity. The system combines information of entities extracted from the unstructured data into the particular entity in the structured data. The first model may be any AI and/or machine learning model, a genus that includes BPL models. In fig. 2b, for example, Robert J. Doe may correspond to Bob Doe>);
define a link between the single data record and at least one other data record from the plurality of data records using a second set of predefined rules having parameters trained using a second Bayesian Program Learning model ([0041], “... generate the entity relation model... that enables identification of a relationship between at least two entities based on one or more characteristics of the entities, such as a common identifier... a common organization... a common location... a common event... and/or the like. The entity relation model may be generated, as described above with regard to the model of entity representation analysis....” [0028], “... the entity analysis platform may use any number of artificial intelligence techniques, machine learning techniques, deep learning techniques, and/or the like...” <examiner note: a trained entity relation model (i.e., a second AI model, which may be any AI and/or machine learning model, a genus that includes BPL models) uses rules based on, e.g., a common identifier or a common location to identify and determine relationships between entities. Regarding fig. 1, for instance, the single data record is entity X, which has relationships or links to other entities/data records>);
generate a knowledge graph data structure that represents a set of links including the link ([0040], lines 1-3, “... an entity relation model to identify the relationships and/or characteristics of the entities and/or generate the knowledge graphs accordingly...” <examiner note: in fig. 1, a knowledge graph is presented with links/relationships between entity X and other entities>); and
detect, based on the set of links, a set of attributes associated with the common entity ([0019], lines 19-20, “... the entity analysis platform may identify characteristics associated with recognizing entities...” <examiner note: figs. 2a and 2b show that the entity analysis platform uses a set of patterns of entities that are linked to characteristics 210 to determine characteristics/attributes associated with the common entity>).
Claim 2
Claim 1 is included; Han further discloses wherein the plurality of data records from the at least one data source includes at least one of image data, video data, audio data, textual data (unstructured data), or time series data (structured data).
Claim 3
Claim 1 is included; Han further discloses wherein the plurality of data records from the at least one data source includes at least one of structured data, semi-structured data, or unstructured data ([0013], lines 1-9, “... the entity analysis platform may include and/or receive the unstructured data... identify entities in the structured data and obtain structured data that includes a plurality of known entities...”).
Claim 4
Claim 1 is included; Han further discloses wherein the at least one data source includes at least one of a database, a file system, or an application ([0011], “... The data may be structured data (e.g., that is organized according to one or more parameters), such as data in a database, a table, an index, a task graph, and/or the like...”).
Claim 5
Claim 1 is included; Han further discloses wherein the code to cause the processor to extract is performed by at least one of an artificial neural network (ANN) ([0034], “... entity recognition model to identify the entities...”), an isomap, a kernel principal component analysis (kernel PCA), a thresholding, or a connected-component labeling (fig. 2, 210).

	

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Han (U.S. Pub. 2020/0097601 A1), as applied to claim 1 above, in view of Chen (“Variational Knowledge Graph Reasoning,” Wenhu Chen, Oct. 23, 2018).
Claim 6
Claim 1 is included. However, Han does not disclose the code further comprising code to cause the processor to: improve the knowledge graph data structure using at least one of a Markov Chain Monte Carlo (MCMC) algorithm or a variational inference algorithm.
	Chen discloses improving the knowledge graph data structure using at least one of a Markov Chain Monte Carlo (MCMC) algorithm or a variational inference algorithm (pg. 1, abstract, “... Inferring missing links in knowledge graphs (KG) has attracted a lot of attention from the research community. In this paper, we tackle a practical query answering task involving predicting the relation of a given entity pair. We frame this prediction problem as an inference problem in a probabilistic graphical model and aim at resolving it from a variational inference perspective...”).
	Han discloses generating a knowledge graph using identified entities and their identified relationships and characteristics. However, Han does not disclose how to deal with missing links in the knowledge graph. Chen resolves this problem by applying variational inference. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate variational inference techniques as disclosed by Chen into Han so that missing links between entities in the knowledge graph can be discovered and the corresponding entities linked.
	
Claims 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over Devi Reddy (U.S. Pub. 2017/0118240 A1) (hereinafter “Reddy”) in view of Elkington (U.S. Pub. 2016/0357790 A1).
Claim 14
Reddy discloses an apparatus (fig. 11), comprising: 
a memory (fig. 11, main memory, static memory); and
a processor operatively coupled to the memory (fig. 11, processor), 
the processor configured to receive a plurality of data records, the plurality of data records being heterogeneous data from at least one data source ([0038], “... a client device 100 executes an application or process (not shown) to collect raw data from the client device 100 about entities within the local network 110, and to send this raw data to the security analytics system 140... The raw data can contain information from a wide variety of sources and from all layers of the technology stack...” [0056], lines 1-2, “... The security analytics system 140 receives the raw data...” <examiner note: the system 140 receives raw data from multiple devices>),
the processor configured to prepare the plurality of data records using feature extraction and normalization to generate a plurality of prepared data records ([0056], lines 6-8, “... the security analytics system 140 may pre-process the raw data, such as by aggregating the raw data fields or reformatting the raw data...” [0062], lines 1-3, “... The data normalization module 210 can filter the raw data based on the relevance of information within the raw data...” [0029], lines 1-3, “... the raw data is filtered... to extract data fields from the raw data that are relevant ...” [0109], “... identifies 420 a subset of the data fields based on the relevance of the data fields...The security analytics system generates 430 filtered data containing the subset of data fields and generates 440 structured data based on the filtered data...”),
the processor configured to define each entity record from a plurality of entity records from the plurality of prepared data records ([0090], lines 1-16, “... The machine-learned analysis module 240 applies machine-learned models to features generated by the feature extraction module 230 to detect security threats... a machine-learned model is generated to describe a type of entity...” [0091], lines 1-3, “... Machine-learned models are used to determine behavior of entities within the local network 110 that is representative of malicious behavior...” <examiner note: using normalized extracted features and a machine learning model, entities’ behaviors are determined/defined>),
the processor configured to associate each entity record of an entity record pair from a plurality of entity record pairs from the plurality of entity records with a remaining entity record from that entity record pair to generate a plurality of relationships, each entity record pair from the plurality of entity record pairs having an indication of relation type and an indication of relation likelihood ([0030], lines 1-3, “... use the structured data to determine whether an entity in the local network is exhibiting malicious behavior...” [0092], lines 1-10, “... module 240 determines the likelihood that a security threat... be represented using a numeric threat score... a threat score is determined for each entity... wherein the threat score represents the likelihood that the entity is performing malicious behavior...” [0121], lines 8-11, “... an edge 710 between entity 700 and entity 705. In this example, entity 700 may use entity 705 as a workstation, so the security analytics system can attribute actions performed by entity 700 to entity 705...” [0124], lines 3-8, “... if entity 705 is accessing a file that entity 700 should not be accessing and an edge has been established between entity 700 and entity 705 that associated the actions of entity 705 with entity 700, then it may be determined that entity 700 is exhibiting malicious behavior and is a security threat...” <examiner note: fig. 7 shows that multiple entity pairs are established. Each entity has a threat score that represents the likelihood that the entity is performing malicious behavior. Further, edge 710 represents an action of accessing data from entity 700 to entity 705>), and
the processor configured to generate a knowledge graph data structure based on the plurality of entity records and the plurality of relationships (fig. 7 shows a knowledge graph data structure).

However, Reddy does not explicitly disclose defining each entity record from a plurality of entity records by merging a first prepared data record from the plurality of prepared data records with a second prepared data record from the plurality of prepared data records.
Elkington discloses defining each entity record from a plurality of entity records by merging a first prepared data record from the plurality of prepared data records with a second prepared data record from the plurality of prepared data records ([0042], “... Once ML model(s) is/are trained 207, they are ready for use in generating predictions. Input is received 201, including N duplicate records representing the same entity. Feature vectors are built 202 for each of the N duplicate records...” [0043], “... Once feature vectors have been built 202, the feature vectors are fed 203 into ML model(s) 112, which generate 204 one or more resolved records. In at least one embodiment, a confidence score is associated with each generated resolved record. The record with the highest confidence score is selected 205 and output 206...”).
Reddy discloses that raw data is prepared by data normalization and feature extraction; however, Reddy does not explicitly disclose identifying duplicate/near-duplicate records and resolving them. Elkington uses extracted features and a machine learning model to resolve/merge multiple records that represent the same entity into a resolved record. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate entity record resolution as disclosed by Elkington into Reddy because duplicate records waste storage. By applying entity record resolution as disclosed by Elkington, storage space can be saved.
Claim 15
Claim 14 is included; Reddy discloses wherein the heterogeneous data from the at least one data source includes at least one of structured data, semi-structured data, or unstructured data ([0038], “... The raw data can contain information from a wide variety of sources and from all layers of the technology stack, including, but not limited to: [0039] hardware events (e.g., interrupts) [0040] virtualization layer logs [0041] system state information [0042] file system information [0043] operating system event logs [0044] network device logs (e.g. logs for Dynamic Host Configuration Protocol (DHCP), Domain Name System (DNS), routers, and switches) [0045] security monitoring tool logs (e.g. logs from intrusion detection systems (IDS), intrusion prevention systems (IPS), proxies, and firewalls) [0046] network traffic monitoring logs (e.g. deep packet inspection metadata, NetFlow®) [0047] authentication logs (e.g. Microsoft® Active Directory or LDAP systems) [0048] application firewall logs [0049] database logs [0050] file sharing logs [0051] web server access logs [0052] email logs/content/metadata [0053] content management system logs [0054] physical access logs, and [0055] business policies and audit rules...”).
Claim 16
Claim 14 is included; Reddy further discloses wherein the at least one data source includes at least one of a database, a file system, or an application ([0038], “... The raw data can contain information from a wide variety of sources and from all layers of the technology stack, including, but not limited to: [0039] hardware events (e.g., interrupts) [0040] virtualization layer logs [0041] system state information [0042] file system information [0043] operating system event logs [0044] network device logs (e.g. logs for Dynamic Host Configuration Protocol (DHCP), Domain Name System (DNS), routers, and switches) [0045] security monitoring tool logs (e.g. logs from intrusion detection systems (IDS), intrusion prevention systems (IPS), proxies, and firewalls) [0046] network traffic monitoring logs (e.g. deep packet inspection metadata, NetFlow®) [0047] authentication logs (e.g. Microsoft® Active Directory or LDAP systems) [0048] application firewall logs [0049] database logs [0050] file sharing logs [0051] web server access logs [0052] email logs/content/metadata [0053] content management system logs [0054] physical access logs, and [0055] business policies and audit rules...”).
Claim 17
Claim 14 is included; Elkington further discloses wherein the feature extraction is performed by at least one of an artificial neural network (ANN), an isomap, a kernel principal component analysis (kernel PCA), a thresholding, or a connected-component labeling ([0045], “... FIG. 2, feature vectors are built for each of the N duplicate records. For example, for record s_i, Feat(s_i)=(Feat(i,1), . . . Feat_(i,k)) represents the feature vector to be built (which has K features)...”).
Claim 18
Claim 14 is included; Reddy discloses wherein the processor is configured to perform an automated process on the knowledge graph data structure including: identifying at least one new relationship; updating the plurality of relationships to include the at least one new relationship to define a plurality of updated relationships; and regenerating the knowledge graph data structure based on the plurality of updated relationships ([0129], “... The icons 905 are positioned with time periods 915, which represent the threat score of the entities associated with the icons 905 during the time period 915. In some embodiments, each time period represents a separate hour-long period within a day. In the embodiment illustrated in FIG. 9, the user can also select a date 920 for which the user would like to see a day-long timeline 900 of entity threat scores. For example, the user is presently seeing the threat scores for entities on Saturday, the 22nd of the present month. In some embodiments, the user can select an icon 905 in the timeline, which highlights the icons 905 representing the same entity in the other time periods 910 being displayed. This allows the user to see how the threat score of an entity has changed over time...” <examiner note: the raw data is collected at intervals, the relationships between entities are regenerated, threat scores are recalculated, and the graph in figure 8 is updated>).

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Devi Reddy (U.S. Pub. 2017/0118240 A1) in view of Elkington (U.S. Pub. 2016/0357790 A1), as applied to claim 14 above, and further in view of Chen (“Variational Knowledge Graph Reasoning,” Wenhu Chen, Oct. 23, 2018).
Claim 19
Claim 14 is included. However, Reddy and Elkington do not explicitly disclose wherein the processor is configured to improve the knowledge graph data structure using at least one of a Markov Chain Monte Carlo (MCMC) algorithm or a variational inference algorithm.
As discussed above with respect to claim 6, Chen discloses improving a knowledge graph data structure using a variational inference algorithm (pg. 1, abstract).
Reddy discloses generating a knowledge graph using identified entities and their identified relationships and properties/characteristics. However, Reddy does not disclose how to deal with missing links in the knowledge graph. Chen resolves this problem by applying variational inference. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate variational inference techniques as disclosed by Chen into Reddy so that missing links between entities in the knowledge graph can be discovered and the corresponding entities linked.
Allowable Subject Matter
Claim 20 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. 
The following is a statement of reasons for the indication of allowable subject matter: Claim 7 includes “... defining a plurality of entity records, each entity record from the plurality of entity records by sampling data from an empirical distribution of the plurality of prepared data records and merging the first prepared data record from the plurality of prepared data records with the second prepared data record from the plurality of prepared data records based on comparing the sampled data with a set of predefined quality criteria...”, which is not disclosed by Han, Chen, Reddy, or Elkington.
Conclusion


Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAU HAI HOANG whose telephone number is (571)270-5894. The examiner can normally be reached 1st biwk: Mon-Thurs, 7:00 AM-5:00 PM; 2nd biwk: Mon-Thurs, 7:00 AM-5:00 PM, and Fri, 7:00 AM-4:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Robert Beausoliel, can be reached at 571-262-3645. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

HAU HAI HOANG
Primary Examiner
Art Unit 2167



/HAU H HOANG/           Primary Examiner, Art Unit 2167