Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Priority
Examiner acknowledges applicants’ claim of priority to the following application:
Foreign priority to 201910926347.1, filed 09/27/2019

Information Disclosure Statement
The IDS filed 01/19/2022 has been considered as noted on the attached PTO-1449.
Claims 1, 5-8, 12-15 and 19-20 have been examined.
This action is made FINAL.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-8, 12-15 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. [CN 109597855 A, November 29, 2018] in view of Kleiman et al. [US 20180349511 A1, June 6, 2018], and further in view of Goel et al. [US 20180349817 A1, June 1, 2018] and Dai et al. [US 20200265218 A1, February 20, 2019].

With respect to claims 1, 8 and 15, Zhou teaches the claimed method, apparatus storing executable instructions, and computer-executable instructions, comprising the steps of:
acquiring source data [e.g. crawler of the encyclopedia website] related to preset keywords [e.g. constructed domain synonym database] according to the preset keywords (Zhou, page 12, lines 17-20, in addition to completing the collection of raw data in the process, an accurate and rich synonym database can be constructed through the synonymous entity expansion method based on the crawler of the encyclopedia website to realize the entity link in the knowledge fusion;
page 11, lines 15-20, the domain knowledge map construction method further includes: after the preset time length, crawling the data source and acquiring the second data information (e.g. preset keywords from first acquiring keywords); determining, according to the second data information, whether the first data information is changed; when the data information is changed, the change data is acquired, and the change data is converted into a graph database model and incorporated into the knowledge map);
cleaning the source data according to a preset data dictionary and an error information table (Zhou, page 11, lines 34-39, after the original data acquisition is completed into Hadoop Distributed File System (HDFS), there are many problems, such as missing attribute consistency, and data cleansing.
page 12, lines 3-6, make uniformization according to the different naming representations of the same entity in the thesaurus, to clean up and integrate, and to obtain a unified and clear data representation. This step corresponds to the entity link in the knowledge map construction process);
extracting entities, attribute information of the entities and relationship information among the entities from the cleaned source data according to the preset data dictionary and a preset entity relationship (Zhou, page 3, lines 44-48, the performing knowledge fusion on the association information between the entities includes: extracting information features according to association information between the entities, to eliminate conceptual ambiguity and stripping redundancy and the concept of error; physical linking of the information features to obtain relational data);
fusing the entities (Zhou, page 15, lines 13-19, the storage module 300 is configured to perform knowledge fusion on association information between entities and establish a relational database. The building module 400 is configured to convert the relational database into a graph database model to construct a knowledge map), the attribute information of the entities and the relationship information among the entities to obtain data triples as the knowledge graph (Zhou, page 5, lines 43-49, the formalization of the knowledge map is defined as: logically, the knowledge map can be divided into two levels: the data layer and the pattern layer. In the data layer of the knowledge map, the "entity-relationship-entity" or "entity-attribute-attribute value" triplet is used as the basic expression of the fact and stored in the graph database. The vast network of entity relationships formed by all facts forms a knowledge map. The pattern layer is above the data layer and is the core of the knowledge map); and
storing the knowledge graph into a preset graph database (Zhou, page 5, line 49-page 6, line 3, the model layer stores refined knowledge. The ontology library is usually used to manage the pattern layer of the knowledge map. The ontology library supports the axioms, rules and constraints to regulate the types and attributes of entities, relationships, and entities. The connection between ...).
Zhou does not teach:
wherein the keywords are keywords in a field of arts; the preset data dictionary is a data dictionary of arts, the error information table is an error information table related to the field of arts; and the preset entity relationship is a preset entity relationship among a painter, a painting and a museum.
Kleiman teaches wherein the keywords are keywords in a field of arts; the preset data dictionary is a data dictionary of arts, the error information table is an error information table related to the field of arts; and the preset entity relationship is a preset entity relationship among a painter, a painting and a museum (Kleiman [0028] the content sources 108 may include various types of information and data including without limitation textual information (e.g., published or unpublished information such as books, journals, periodicals, magazines, newspapers, treatises, reports, legal documents, reporters, dictionaries, encyclopedias, blogs, wikis, and so forth), graphical information (e.g., charts, graphs, tables, and so forth), images or other visual data (e.g., photographs, drawings, paintings, plans, renderings, models, sketches, diagrams, computer-aided designs, and so forth), audio data, numerical data, geographic data, scientific data (e.g., chemical composition, scientific formulas, and so forth), mathematical data, and so forth).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Zhou with the data dictionary of arts of Kleiman. Such a modification would provide a useful medium for sharing knowledge with other users and collaborating on investigation of a topic (Kleiman [0005]).
Zhou as modified by Kleiman further teaches wherein the source data comprises semi-structured source data and structured source data (Zhou, page 5, lines 36-48, the data source comprises structured data, semi-structured data, and unstructured data. Further, in an embodiment of the present invention, the extracting information from the data source includes: extracting entity, relationship, and entity attribute structured information from the semi-structured and unstructured data by the data source, to obtain the associated information); the acquiring source data (Kleiman [0028] the content sources 108 may include various types of information and data including without limitation… images or other visual data (e.g., photographs, drawings, paintings, plans, renderings, models, sketches, diagrams, computer-aided designs, and so forth)…) related to preset keywords according to the preset keywords in the field of arts comprises steps of: crawling the semi-structured source data on a preset target website related to the field of arts by using a scrapy application framework according to the keywords (Zhou, page 12, lines 11-15, the embodiment of the present invention can implement crawling based on the Python Scrapy framework, obtain network data, package the import event into a MapReduce task through Sqoop, submit it to the Hadoop distributed environment, and obtain data in parallel from the data source, and finally generate HDFS format results file); and/or retrieving the structured source data in a preset database related to the field of arts according to the keywords (Kleiman [0028] the content sources 108 may include various types of information and data including without limitation… images or other visual data (e.g., photographs, drawings, paintings, plans, renderings, models, sketches, diagrams, computer-aided designs, and so forth)…).
Zhou as modified by Kleiman does not teach the steps of:
dividing the semi-structured source data into a plurality of groups according to preset attribute information;
obtaining, based on a Word2vec algorithm, similarity vectors corresponding to data in the semi-structured source data in the plurality of groups;
obtaining a similarity between any two data in a same group based on the similarity vectors;
comparing the similarity with a preset similarity threshold; and
in response to the similarity exceeding the preset similarity threshold, fusing the two data into a piece of source data.
Goel teaches dividing the semi-structured source data [e.g. dividing semi-structured data sources data layer] into a plurality of groups [e.g. RFIs, Change Orders, Quality Issues, Safety Issues, Building Standards, Submittals, Contracts] according to preset attribute information; and obtaining, based on a Word2vec algorithm [e.g. based on construction language topics of a set word2vec topic models], similarity vectors corresponding to data in the semi-structured source data in the plurality of groups [e.g. score degree of groups similarity] (Goel [0039] the data layer 102 includes data sources (structured transactions, semi-structured text, images, and models/designs/docs), data conversation/normalization (text conversions and common schema alignment), and data quality assessment scores (cleanliness, construction context fitness, and standards conformance)… embodiments of the invention utilize a set of topic modeling engines that can determine how close the language used in the particular text is close to construction language topics (e.g. a set word2vec topic models based on RFIs, Change Orders, Quality Issues, Safety Issues, Building Standards, Submittals, Contracts).
[0134] transaction text similarity looks at whether different types of descriptions are effectively items of the same class. The score would denote a metric of the distributions of these groups of similarity. For example, a high degree of similarity means that potentially not much variance exists which in turn makes risk scores (e.g. Sub Contractor's Scores) less effective as they cannot differentiate between subcontractors).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Zhou as modified by Kleiman with the division of semi-structured source data into a plurality of groups and the Word2vec-based similarity of Goel. Such a modification would provide extensive quality measures for more types of data being ingested (Goel [0105]).
Dai teaches obtaining a similarity between any two data in a same group based on the similarity vectors (Dai [0058] clustering module 108 applies a Chinese Whispers clustering algorithm to identify groups of similar feature vectors within the N feature vectors (FV.sub.1, . . . , FV.sub.N)); comparing the similarity with a preset similarity threshold ([0058] during operation, clustering module 108 (e.g. fusing) will determine clusters based on a specified similarity threshold (ST));
in response to the similarity exceeding the preset similarity threshold, fusing the two data into a piece of source data ([0058] the similarity threshold (ST) may set a minimum weight threshold for establishing a connecting edge between nodes. An example of the operation of clustering unit 104 is illustrated in the flow chart of FIG. 2. The Object 1 FVS 150(1), which includes N feature vectors (FV.sub.1, . . . , FV.sub.N), is provided to clustering module 108. Clustering module 108 is configured to generate a cluster module output 223 by classifying the N feature vectors (FV.sub.1, . . . , FV.sub.N) into clusters of similar feature vectors based on a specified similarity threshold ST (Block 222). Particularly, in the illustrated embodiment, clustering module 108 applies a Chinese Whispers clustering algorithm to identify groups of similar feature vectors within the N feature vectors (FV.sub.1, . . . , FV.sub.N) and classify the respective groups into respective clusters 250, 252(1) to 252(J). Any feature vectors from FVS 150(1) that are not sufficiently similar to any other feature vectors to be classified into one of the clusters 250, 252(1) to 252(J) are classified as outlier feature vectors FVs 255. Accordingly, the cluster module output 223 includes clusters 250, 252(1) to 252(J) and any outlier feature vectors FV 155. The cluster that includes the ...
[0060] once the clustering module 108 generates a cluster module output 223 that meets the size threshold of block 224, the cluster module output 223 is passed to cluster extraction module 110 which is configured to extract any outlier feature vectors FV 255 from the cluster module output 223 and add them to an outlier feature vector (FV) set 170(1).
[0026] perform multiple iterations of a clustering operation to classify feature vectors that each represent a respective data object in an unstructured dataset, each iteration of the clustering operation including classifying groups of similar feature vectors into respective clusters and classifying feature vectors not included in one of the respective clusters as outlier feature vectors, wherein during one or more of the multiple iterations after a first iteration, outlier feature vectors from a previous iteration are excluded from the clustering operation).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Zhou as modified by Kleiman and Goel with the comparison of vector similarity against a threshold of Dai. Such a modification would provide a dataset assembly module configured to assemble a dataset that includes at least the feature vectors from the key clusters of the feature vector sets (e.g. fusion processing) (Dai [0014]).

Zhou as modified by Kleiman, Goel and Dai further teaches, for the fused semi-structured source data, extracting corresponding data from the source data to form the structured source data (Zhou, page 15, lines 5-12, converting non-semi-structured data into structured data. Unstructured data refers to a large amount of plain text content in network data. It has the widest knowledge coverage, but it is also the most difficult to extract. It usually needs to be processed by natural language processing technology. Only by completing the transformation of unstructured data into structured data can the transformation from the relational database to the schema database schema be completed and the map constructed).

With respect to dependent claim 5, Zhou as modified by Kleiman, Goel and Dai further teaches wherein the step of cleaning the source data according to a preset data dictionary of arts and an error information table related to the field of arts comprises steps of: processing a single-valued attribute in the source data by using the error information table to replace an error value in the single-valued attribute with a correct value (Zhou, page 10, line 45-page 11, line 7, (1) Each node tag is represented by the table name of the entity table, that is, the table name of the entity table is used as the node tag name. For example, if the data table is named "Enterprise", then the node type labeled "Enterprise" is created.
(2) Each row in the entity table corresponds to a node, and each row in the relational data table can completely describe an entity and its attribute values, and at the same time determine the global unique identifier of the node.
(3) The columns on the relational table become node attributes. In a row of data, except for the unique label, all the other fields are supplemented and explained to the nodes, so they are all used as node attributes);
inquiring entity attribute information and relationship information corresponding to the source data from the preset data dictionary of arts and a relationship table according to the single-valued attribute; looking through the source data in the error information table; and in response to the error information table not containing the source data, of which the single-valued attribute is required to be replaced, outputting the entity attribute information and the relationship information corresponding to the source data (Zhou, page 14, line 16-page 15, line 3, Data cleansing: After the first two steps are completed, the framework and specifications of the mining database have been determined. The following will be the specific processing of the data, the main problems are: vacancy values, erroneous data, noise data and isolated points…).

With respect to dependent claim 6, Zhou as modified by Kleiman, Goel and Dai further teaches applying the knowledge graph to a preset scene (Zhou, page 4, lines 8-13, the knowledge graph is defined as: the knowledge graph G consists of a pattern graph G/S, a data graph G/D, and a relationship R between the pattern graph G/S, the data graph G/D and the data graph G/D, and can be expressed as a formula (1)…).

With respect to dependent claim 7, Zhou as modified by Kleiman, Goel and Dai further teaches wherein the preset scene comprises at least one of scenes of: encyclopedia cards, searching, recommending, asking and answering, explaining, assisting in decision making (Zhou, page 4, lines 8-13, build a knowledge map based on the hierarchy and path length. Related art discloses a ...).

Regarding claims 12-14, 19 and 20, the instant claims recite substantially the same limitations as the above-rejected claims 5-7 and are therefore rejected under the same prior-art teachings.

Response to Amendment
In response to the 09/10/2021 Office action, claims 1, 5, 6, 8, 12, 13, 15, 19 and 20 have been amended, no new claims have been added, and claims 2-4, 9-11 and 16-18 have been cancelled. Claims 1, 5-8, 12-15 and 19-20 are currently pending and stand rejected.

Response to Arguments
Applicant’s arguments filed on 12/09/2021 have been considered. 
Applicant argues (page 14) Goel fails to teach the feature “dividing the semi-structured source data into a plurality of groups according to preset attribute information; obtaining, based on a Word2vec algorithm, similarity vectors corresponding to data in the semi-structured source data in the plurality of groups”.

Examiner’s response:
Goel, in paragraphs [0039] and [0134], teaches dividing the semi-structured source data [e.g. dividing semi-structured data sources data layer] into a plurality of groups [e.g. RFIs, Change Orders, Quality Issues, Safety Issues, Building Standards, Submittals, Contracts] according to preset attribute information; and obtaining, based on a Word2vec algorithm [e.g. based on construction language topics of a set word2vec topic models], similarity vectors corresponding to data in the semi-structured source data in the plurality of groups [e.g. score degree of groups similarity].

Applicant argues (page 14) Dai does not disclose “in response to the similarity exceeding the preset similarity threshold, fusing the two data into a piece of semi-structured source data,” in order that “preprocessing the semi-structured source data may remove redundancy and enrich the source data.”
Examiner’s response:
Dai, in paragraphs [0058] and [0060], teaches obtaining a similarity between any two data in a same group based on the similarity vectors; comparing the similarity with a preset similarity threshold [e.g. during operation, clustering module 108 (e.g. fusing) will determine clusters based on a specified similarity threshold (ST)]; and, in response to the similarity exceeding the preset similarity threshold, fusing the two data into a piece of source data [e.g. any feature vectors from FVS 150(1) that are not sufficiently similar to any other feature vectors to be classified into one of the clusters 250, 252(1) to 252(J) are classified as outlier feature vectors FVs 255].
Regarding the stated purpose that “preprocessing the semi-structured source data may remove redundancy and enrich the source data,” Dai ([0026]) teaches to “perform multiple iterations of a clustering operation to classify feature vectors that each represent a respective data object in an unstructured dataset, each iteration of the clustering operation including classifying groups of similar feature vectors into respective clusters and classifying feature vectors not included in one of the respective clusters as outlier feature vectors, wherein during one or more of the multiple iterations after a first iteration, outlier feature vectors from a previous iteration are excluded from the clustering operation.”
As shown above, feature vectors are grouped into clusters based on the similarity threshold, and any feature vector that is not sufficiently similar to the other feature vectors is classified as an outlier feature vector and excluded from subsequent iterations of the clustering operation.
Note: the argued limitation “preprocessing the semi-structured source data may remove redundancy and enrich the source data” is not recited in the claims.

Therefore, Zhou as modified by Kleiman, Goel and Dai teaches the method as claimed.
The dependent claims are rejected over the combination of references for the same reasons given above with respect to the independent claims. Therefore, in view of the response set forth above, the rejections of the claims are sustained.



Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOHEILA G DAVANLOU whose telephone number is (571)270-5155. The examiner can normally be reached Monday - Friday, 9:00am - 6:00pm Eastern Time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alford Kindred can be reached on (571)272-4037. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


SOHEILA G DAVANLOU
Examiner
Art Unit 2153



/KRIS E MACKES/Primary Examiner, Art Unit 2153