DETAILED ACTION
This action concerns application 14/954664, HASSANZADEH, OKTIE et al., filed 11/30/2015, without priority claims. The assignee is INTERNATIONAL BUSINESS MACHINES CORPORATION.

Reply to Remarks / Miscellaneous

	Applicant writes:
The Examiner argues that Alqadah: figure 5; figure 6; paragraph 56, disclosing the tokenization of unstructured text as part of the candidate transformation process to classify text into related structures; paragraphs 26, 33, & 36-37, related to the expansion of a word based on a semantic.
 
	The prior Office action contained a typographical error. While Figures 5 and 6 and paragraph 56 are from Alqadah, paragraphs 26, 33, and 36-37 are actually from the primary reference, Aarskog. The examiner cited these teachings from the primary reference to show the teaching and motivation for the combination of references, as well as a partial teaching of the substance of the claim. The examiner has corrected this oversight in the rejections below to clarify the mapping of the prior art.
	Applicant additionally argues that the prior art fails to teach expansion based on semantics. The examiner respectfully refers applicant to the prior Office action, in which the examiner cited Aarskog, paragraphs 26, 33, 36-37, and 51, and noted how the concept was described in alternative language in the prior art.
	
	Applicant writes:


actually a transformation where words are removed. 

	
The examiner respectfully disagrees. The score calculation procedure explicitly takes stop lists into consideration, which Aarskog notes depend on language (Aarskog: paragraph 449). This appears to contemplate stop words (Aarskog: paragraph 539). The nature of the score calculation process allows each of the elements, including the stop lists, to be independently weighted. A weight of zero, which falls within the teaching of Aarskog, effectively removes the words from the calculation. Further, even if the examiner has misinterpreted the prior art, the examiner respectfully notes that stop words are well known in the art, and their incorporation here does not modify the general understanding of stop words in the art. In the alternative to the specifically cited teaching, the examiner would therefore hold this limitation to be obvious in light of the general knowledge of stop words.

	Applicant writes:
Additionally, the Examiner argues that based on paragraph [0108] of Aarskog, that "mapping the source terms to target terms via the classification model" is a cluster. However, this appears to be a one to one correlation of the source term to target terms and not a grouping as would be to generate a cluster. 

On page 2 of the Office Action, the Examiner replies that "The claim language as drafted recites "each of the clusters including a plurality of source terms and a plurality of target terms. Each cluster (as least as recited) is arbitrary as to what terms it contains and what terms it maps. The prior art teaches the one-to-one correlation of source into target, but this one-to-one correlation appears to be required by each of the terms in any particular cluster. To perform the transformation of each of the elements inside of a cluster is functionally the same as transformation of the entire cluster." 

However, respectfully, the claim also recites that "splitting the input data into a plurality of clusters according to semantic relationship", and so the clusters are not necessarily arbitrary as the Examiner argues.



	The examiner respectfully disagrees. The mapping of source terms to target terms as recited in Aarskog is based on the terms themselves, which is a semantic consideration. The clusters themselves appear to have no contemplated dividing line other than the semantic relationships that formed the basis for the mapping between the source terms and the target terms. The examiner believes the broadest reasonable interpretation of the claim language does not contemplate a relationship between clusters. Aarskog, paragraphs 26, 33, 36-37, and 51, reads on the concept of sorting according to semantic relationships.

	Applicant writes:
Furthermore, Alguagah recites that "[0056] Process 300 receives the UPT as a string input in step 302 and normalizes the string input in step 303. String normalization may include converting the UPT to a standard encoding (e.g. Unicode, ASCII, etc.), removing non-pertinent punctuation, removing excess whitespace, removing all capitalization, and removing non-pertinent symbols or characters. These steps produce a plurality of tokens. Every token of the UPT is scanned and checked in step 304 for matching a rule in TED database 304."  (emphasis added by Applicant). 
Therefore, the string input is converted to a standard encoding to produce tokens. However, the string input UPT is not transformed for each cluster as presently claimed. Moreover, the combination is not for providing the search result. 

The examiner respectfully disagrees. Because the relationship between clusters is unspecified, as noted in the examiner’s mappings and response to remarks, the normalization of the strings within the clusters is functionally equivalent to normalization of the cluster, which necessarily contains the strings that are normalized, as best understood by the examiner based on the present claim language. The examiner notes that the clustering is based on semantics as disclosed by Aarskog, paragraphs 26, 33, 36-37, and 51.


	The examiner therefore respectfully maintains the rejection based on the combination of art previously cited.
	Because the prior Office action lacked sufficient clarity, the examiner makes this action non-final.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as obvious over Aarskog, US Publication 2005/0108001 (May 19, 2005), in further view of Alqadah, US Publication 2015/0331936 (Nov. 19, 2015). All dependent claims incorporate the rejections of the claims on which they depend.

As to claim 1:
Aarskog discloses:
A method for identifying semantically related records in a machine including a computer, comprising:
receiving input data from an input device; (Aarskog: paragraph 88, “a set of interconnected apparatuses (or modules) that operates on an integrated set of database partitions populated with data transmitted by the devices associated to each apparatus”)
splitting the input data into a plurality of clusters according to semantic relationship, each of the clusters including a plurality of source terms and a plurality of target terms; (Aarskog: paragraph 108, “The present invention incorporates a documental classification scheme specifically designed in order to organize and structure the display of text zones in conformity with document classes reflecting the documents' situational context (see also the definitions of the text's inner and outer context).”; paragraph 108, “each text or portion of text being selected by the user can automatically be displayed in a plane in which the coordinates are specified with reference to the documental classification scheme (document model)”, mapping the source terms to target terms via the classification model, which is a cluster)

for each of the plurality of clusters that are transformed, finding relatedness scores of a plurality of semantic relatedness measures with the plurality of target terms; (Aarskog: paragraph 62, “the textual context may reveal deeper semantic variations.  Such simple linguistic variations may be captured in indexing devices with synonymy relations derived from thesauri.”) 
building a vector of similarity scores for each of the plurality of target terms; and (Aarskog: paragraph 135, “The notion of relevance is taken as a technical term referring to degree of similarity between a document vector (the representation of document content) and a query vector.  A query vector is supposed to represent the user's information need.”)
for each of the plurality of source terms, selecting a predetermined number of the plurality of target terms for each of the plurality of search terms according to the similarity scores to provide a search result. (Aarskog: paragraph 13; paragraph 21, “establishing relations between word occurrences, and with respect to the words' inner context”; paragraph 72, “domain specific thesauri in a target word selection procedure that regulates the assignment of relations in accordance with how the words appear in their inner context”; paragraph 458; paragraph 539, “The device that generates word lists operates on DBP Information Word and outputs various types of word lists that are utilized by other devices for filtering purposes.”)

Aarskog does not explicitly disclose, but Alqadah in an analogous art discloses:
transforming each of the plurality of clusters based on a transformation which includes tokenization of the plurality of clusters, and removal of words, the tokenization of the plurality of clusters comprising semantically extending  at least a word (Alqadah: figure 5; figure 6; paragraph 56, disclosing the tokenization of unstructured text as part of the candidate transformation process to classify text into related structures)

Before the effective filing date of the claimed invention, a person of ordinary skill in the art would combine the textual exploration discovery method of Aarskog with the textual classification method of Alqadah because Aarskog refers to tokenization (see Aarskog: paragraph 252) but does not describe the process in detail, and a person of ordinary skill would consult and combine references in the related domain of art that provided more explicit teaching related to the explicitly contemplated process.


As to claim 2:
The method according to claim 1, wherein the splitting includes splitting the input data into the clusters that include terms that are more likely to semantically related with each other than other terms, wherein the search result is an output, and wherein the semantic extension includes extending the word into a plurality of words that are semantically related. (Aarskog: paragraph 24, “the system calculates/presents the items most likely to satisfy the user's need”; paragraphs 26, 33, & 36-37, related to the expansion of a word based on a semantic) 

As to claim 3:
The method according to claim 1, wherein the transformation includes the tokenization, conversion to lowercase, and removal of stop words, wherein smaller parts or individual words are semantically extended to a set of words that are related to an initial word from the plurality of clusters, and wherein when each word is extended in two sources, a whole set of extendable semantic relationship metrics is used to compute semantic distance between words that appear in two records of the two sources. (Aarskog: paragraph 449, disclosing stop words; Alqadah: paragraph 12, disclosing lowercase normalization) 

As to claim 4:
The method according to claim 3, wherein tokenization further comprises extending a first set of tokens to create an extended first set of tokens and a second set of tokens to create an extended second set of tokens based on a received semantic knowledge. (Aarskog: paragraph 612, “If the user transmits a request with the term `company` a system for automatic query expansion can include the interlinked terms, either as a category search or iteratively.”, wherein the iterative process uses early steps of semantic knowledge to inform later steps)


As to claim 5:
The method according to claim 4, further comprising receiving a set of weight values related to the extended first and second sets of extended tokens and calculating a similarity score for the extended first and second sets of extended tokens based on received weight values. (Aarskog: paragraph 125, “A Zone Traversal Path Adjusted is a modification of the default path based on pre-calculated weights that are adjusted according to information captured from the user request.”, wherein the zone applies to each set of compartmentalized tokens) 


As to claim 6:

The method according to claim 1, wherein the transformations includes taking an initial input and breaking into smaller parts of the initial input, and taking smaller parts of terms and semantically extended to a set of words that are related to the initial input. (Aarskog: paragraph 250, “The Word Information is processed in a device for word frequency calculation and a device that produces combined collocations.  The Word Information for each sentence is used when calculating the similarity between all pairs of sentences in a text.”)

As to claim 7:

The method according to claim 6, wherein the smaller parts are semantically extended by dictionary or ontology. (Aarskog: paragraph 314, " a Target Word Selection procedure (dictionary lookup), the application designer, or preferably the system can assign one or several semantic codes to the words (or other textual units, preferably semantic codes at various abstraction levels).  These semantic codes may preferably be assigned to zones and sentences (or other derived object types such as chains).  Further, semantic codes at a lower abstraction level are associated to smaller textual units such as a text zones")

As to claim 8:

The method according to claim 1 being an executable program stored in a computer readable medium and executed by a processor (Alqadah: figure 1, with particular attention to elements 104, 114, 120, 144, 146, and 148), wherein the relatedness scores of various semantic relatedness measures with all the terms in another set to build the vector of similarity scores for each term, and wherein for each record in source, the processor selects top-k (wherein K is an integer greater than zero) similar records in the target according to the similarity score derived and return as output. (Alqadah: paragraphs 23-24, disclosing similarity scores that can be used consistent with the system)

As to claim 9:

The method according to claim 1 being an executable program stored in a computer readable medium on cloud-based virtual server. (Alqadah: figure 1, with particular attention to element 136; Aarskog: paragraph 98, contemplating virtual layers)


As to claim 10:

Aarskog discloses:

A method of identifying semantically related records in a machine including a computer, comprising:
receiving input data from an input device, the input data including a plurality of source terms and a plurality of target terms; (Aarskog: paragraph 88, “a set of interconnected apparatuses (or modules) that operates on an integrated set of database partitions populated with data transmitted by the devices associated to each apparatus”)

for each of the transformed set of terms, finding relatedness scores of a plurality of semantic relatedness measures with the plurality of target terms; (Aarskog: paragraph 62, “the textual context may reveal deeper semantic variations.  Such simple linguistic variations may be captured in indexing devices with synonymy relations derived from thesauri.”)
building a vector of similarity scores for each of the plurality of target terms; and (Aarskog: paragraph 135, “The notion of relevance is taken as a technical term referring to degree of similarity between a document vector (the representation of document content) and a query vector.  A query vector is supposed to represent the user's information need.”)
for each of the plurality of source terms, selecting a predetermined number of the plurality of target terms according to the similarity scores for an output to provide a search result. (Aarskog: paragraph 21, “establishing relations between word occurrences, and with respect to the words' inner context”; paragraph 72, “domain specific thesauri in a target word selection procedure that regulates the assignment of relations in accordance with how the words appear in their inner context”; paragraph 458; paragraph 539, “The device that generates word lists operates on DBP Information Word and outputs various types of word lists that are utilized by other devices for filtering purposes.”)

Aarskog does not explicitly disclose, but Alqadah in an analogous art discloses:
transforming each of the plurality of clusters based on a transformation which includes tokenization of the plurality of clusters, and removal of words, the tokenization of the plurality of clusters comprising semantic extension of at least a word (Alqadah: figure 5; figure 6; paragraph 56, disclosing the tokenization of unstructured text as part of the candidate transformation process to classify text into related structures; see also Aarskog paragraphs 26, 33, & 36-37, related to the expansion of a word based on a semantic)


Before the effective filing date of the claimed invention, a person of ordinary skill in the art would combine the textual exploration discovery method of Aarskog with the textual classification method of Alqadah because Aarskog refers to transformation (see Aarskog: paragraph 252; see also paragraph 108) but does not describe the process in detail, and a person of ordinary skill would consult and combine references in the related domain of art that provided more explicit teaching related to the explicitly contemplated process.


As to claim 11:

The method according to claim 10, further comprising of splitting the input data into a plurality of clusters that are more likely to be semantically related, the tokenization of the plurality of clusters comprising semantically extending at least a word (Aarskog: paragraph 24, “the system calculates/presents the items most likely to satisfy the user's need”; paragraphs 26, 33, & 36-37, related to the expansion of a word based on a semantic)


As to claim 12:

The method according to claim 10, wherein the transformation method includes tokenization, conversion to lowercase, and removal of stop words. (Aarskog: paragraph 449, disclosing stop words; Alqadah: paragraph 12, disclosing lowercase normalization)


As to claim 13:

The method according to claim 12, wherein tokenization further comprises extending a first set of tokens to create an extended first set of tokens and a second set of tokens to create an extended second set of tokens based on a received semantic knowledge. (Aarskog: paragraph 612, “If the user transmits a request with the term `company` a system for automatic query expansion can include the interlinked terms, either as a category search or iteratively.”, wherein the iterative process uses early steps of semantic knowledge to inform later steps)

As to claim 14:

The method according to claim 13, further comprising receiving a set of weight values related to the extended first and second sets of extended tokens and calculating a similarity score for the extended first and second sets of extended tokens based on the received weight values. (Aarskog: paragraph 125, “A Zone Traversal Path Adjusted is a modification of the default path based on pre-calculated weights that are adjusted according to information captured from the user request.”, wherein the zone applies to each set of compartmentalized tokens)


As to claim 15:

The method according to claim 10, wherein the transformation includes taking an initial input and breaking into smaller parts, and
taking the smaller parts and semantically extended to a set of words that are related to the initial input. (Aarskog: paragraph 250, “The Word Information is processed in a device for word frequency calculation and a device that produces combined collocations.  The Word Information for each sentence is used when calculating the similarity between all pairs of sentences in a text.”)

As to claim 16:

The method according to claim 15, wherein the smaller parts are semantically extended by dictionary or ontology. (Aarskog: paragraph 314, " a Target Word Selection procedure (dictionary lookup), the application designer, or preferably the system can assign one or several semantic codes to the words (or other textual units, preferably semantic codes at various abstraction levels).  These semantic codes may preferably be assigned to zones and sentences (or other derived object types such as chains).  Further, semantic codes at a lower abstraction level are associated to smaller textual units such as a text zones")

As to claim 17:

The method according to claim 10 being an executable program stored in a computer readable medium and executed by a processor. (Lobez Comeras: figure 1, 7, & 8; paragraph 92, disclosing a cloud computing environment)

As to claim 18:

The method according to claim 10 being an executable program stored in a computer readable medium on cloud-based virtual server. (Lobez Comeras: figure 1, 7, & 8; paragraph 92, disclosing a cloud computing environment)

As to claim 19:

Aarskog discloses:
An apparatus for identifying semantically related records, comprising: 
A machine including:
a computer readable medium storing a program; and a processor executing the program, (Lobez Comeras: figure 1, 7, & 8; paragraph 92, disclosing a cloud computing environment)
wherein the processor receives input data from an input device, (Aarskog: paragraph 88, “a set of interconnected apparatuses (or modules) that operates on an integrated set of database partitions populated with data transmitted by the devices associated to each apparatus”)
wherein the processor splits the input data into a plurality of clusters according to semantic relationship, each of the clusters including a plurality of source terms and a plurality of target terms, (Aarskog: paragraph 108, “The present invention incorporates a documental classification scheme specifically designed in order to organize and structure the display of text zones in conformity with document classes reflecting the documents' situational context (see also the definitions of the text's inner and outer context).”; paragraph 108, “each text or portion of text being selected by the user can automatically be displayed in a plane in which the coordinates are specified with reference to the documental classification scheme (document model)”, mapping the source terms to target terms via the classification model, which is a cluster)

wherein, for each of the plurality of clusters that are transformed, the processor finds relatedness scores of a plurality of semantic relatedness measures with the plurality of target terms, (Aarskog: paragraph 62, “the textual context may reveal deeper semantic variations.  Such simple linguistic variations may be captured in indexing devices with synonymy relations derived from thesauri.”)
wherein the processor builds a vector of similarity scores for each of the plurality of target terms, and (Aarskog: paragraph 135, “The notion of relevance is taken as a technical term referring to degree of similarity between a document vector (the representation of document content) and a query vector.  A query vector is supposed to represent the user's information need.”)
wherein, for each of the plurality of source terms, the processor selects a predetermined number of the plurality of target terms according to the similarity scores to provide a search result. (Aarskog: paragraph 21, “establishing relations between word occurrences, and with respect to the words' inner context”; paragraph 72, “domain specific thesauri in a target word selection procedure that regulates the assignment of relations in accordance with how the words appear in their inner context”; paragraph 458; paragraph 539, “The device that generates word lists operates on DBP Information Word and outputs various types of word lists that are utilized by other devices for filtering purposes.”)

Aarskog does not explicitly disclose, but Alqadah in an analogous art discloses:
wherein the processor transforms each of the plurality of clusters based on a transformation which includes tokenization of the plurality of clusters, and removal of words, the tokenization of the plurality of clusters comprises semantic extension of at least a word (Alqadah: figure 5; figure 6; paragraph 56, disclosing the tokenization of unstructured text as part of the candidate transformation process to classify text into related structures; see also Aarskog paragraphs 26, 33, & 36-37, related to the expansion of a word based on a semantic)

Before the effective filing date of the claimed invention, a person of ordinary skill in the art would combine the textual exploration discovery method of Aarskog with the textual classification method of Alqadah because Aarskog refers to tokenization (see Aarskog: paragraph 252) but does not describe the process in detail, and a person of ordinary skill would consult and combine references in the related domain of art that provided more explicit teaching related to the explicitly contemplated process.

As to claim 20:

The apparatus according to claim 19, wherein the transformation includes the tokenization, conversion to lowercase, and removal of stop words. (Aarskog: paragraph 449, disclosing stop words; Alqadah: paragraph 12, disclosing lowercase normalization)

Conclusion


Applicant is cautioned to avoid entry of any new matter in any amendment(s) to the claim(s), drawing(s), or specification. Any amendment or correction which enters new matter may trigger a rejection under 35 USC 112 ¶ 1 and/or 35 USC 132(a). See also MPEP 706.03(o). 
The examiner respectfully requests that any amendments to the claims be accompanied by written remarks which show pinpoint support from the specification for each new limitation. Such a showing will assist in expediting prosecution. 
For additional art discovered by the examiner and deemed to be relevant, please see form 892. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Kurt A Mueller, whose telephone number is (571)270-3889.  The examiner can normally be reached during standard business hours, but prefers interviews to be conducted Tuesday through Thursday. The examiner can also be reached by personal fax at (571) 270-4889. Please use this fax number for any written interview requests. Include a written interview agenda with the interview request (form PTOL-413A; see MPEP 713.01), and if required, authorization to act in a representative capacity (form PTO/SB/84; see MPEP 405). The examiner strongly suggests the submission of an agenda 
Please note that any document submitted by applicant in connection with this or any other matter must be made part of the official record as required by the Federal Records Act, 44 U.S.C. 3101 et seq. Any instruction contained in any submission requesting the examiner not to enter a document into the record will be disregarded.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Trujillo can be reached at 571-272-3677.  The central fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Thursday, March 25, 2021
/K. A. M./
Examiner, Art Unit 2157

/James Trujillo/            Supervisory Patent Examiner, Art Unit 2157