Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Priority
Examiner acknowledges applicants’ claim of priority to the following application:
Certified copy of foreign application serial no. 201811194956.4, filed 10/15/2018.

Claims 1-20 have been examined.
This action is made FINAL.

Claim Rejections – 35 USC § 101

In light of the claim amendments, the § 101 rejections of claims 1-20 have been withdrawn.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 9-14 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wheeler [US 2020/0026772 A1, July 23, 2018], in view of Zhu et al. [US 20190205384 A1, October 11, 2018].

With respect to claim 1, Wheeler teaches a computer implemented method, comprising:
upon receiving a search request for an object, obtaining a plurality of webpages crawled by a web crawler in a search engine that are associated with the object ([0003] a search engine typically receives a search query (e.g., query input including one or more terms, such as keywords, by a user of the search engine). Search engines generally index website content, such as web pages of crawled websites, and then identify relevant content (e.g., URLs for matching web pages) based on matches to keywords received in a user query that includes one or more terms or keywords);
clustering [e.g. cards] the plurality of webpages based on the plurality of similarity scores [e.g. a web document that is included in content feed 804 may be determined based on a sim hash associated with an entity corresponding to the web document and a cosine similarity] ([0120] content feed 804 can include one or more web documents that are similar to an entity included in a query. A user may initiate a query by selecting a query icon 836 included in the application. The query may be comprised of one or more words, a phrase, a question, or a sentence. The query may correspond to an entity. A web document that is included in content feed 804 may be determined based on a sim hash associated with an entity corresponding to the web document and a cosine similarity between a feature vector associated with the entity 
[0173] FIG. 14, some of the interests in the 100 dimensional vector space are clustered together after performing the collaborative filtering technique. For example a cluster 1402 includes an interest in photography and an interest in Flickr®. Cluster 1404 includes an interest in Yelp®, San Francisco, Silicon Valley, TechCrunch®, virtual reality, and Engadget®. The interests comprise a cluster in the event the distance between each 100 dimensional space vector of a plurality of interests is less than or equal to a document similarity threshold….), into a plurality of groups of webpages representing a plurality of first events [e.g. Fig. 8A, 806, 810 and 814 cards] ([0112] FIG. 8A, the content feed is comprised of one or more cards that include web documents (e.g., or excerpts of web documents that can be selected to view the entire web document) and/or synthesized content and is based on a user model, such as user model 316, which is tailored to a user account, such as user account 402. For example, a web document can be an article, sponsored content, an advertisement, a social media post, online video content (e.g., embedded video file), online audio content (e.g., embedded audio file), etc.);
selecting [e.g. content feeds], from the plurality of groups of webpages [e.g. story groups] representing the plurality of first events [e.g. cards], one or more groups of webpages representing one or more second events ([0313] the orchestrator is configured to generate story groups in a content feed. For example, a user may indicate a preference for such story groupings rather than the described interleaving of cards in the user's content feed (e.g., such can be implemented as a configurable parameter or measured as a user feedback based on generated content feeds that use interleaving and other content feeds that use story group approaches). In such cases, rather than interleaving cards for different interests in the user's content feed, the orchestrator can automatically reshuffle the cards in the feed (e.g., irrespective of the relative document scores) so that cards related to the same interest are contiguous in the content feed. For example, if the content feed update includes three new cards related to the interest of computer security for mobile devices, then the orchestrator can group those three new cards together within the content feed), wherein each of the one or more selected groups comprises a quantity of webpages greater than a threshold [e.g. sim hash] ([0121] a web document may be returned for a query in the event the entity included in the query has the same sim hash as the entity corresponding to the web document. In some embodiments, a web document may be returned for a query in the event the entity included in the query has the same sim hash as the entity corresponding to the web document and has a cosine similarity score that is greater than or equal to a cosine similarity threshold);
determining one or more representative webpages [e.g. generating labels] respectively from the one or more selected groups of webpages representing the one or more second events ([0240-0242] FIG. 21 is a flow diagram illustrating a process performed by the classifier for generating labels for websites to facilitate categorizing of documents in accordance with some embodiments. In some embodiments, the process 2100 for generating labels for websites to facilitate categorizing of documents is performed using the disclosed system/service (e.g., including classifier 1740 of search and feed system 1700 of FIG. 17).); and
returning the one or more representative webpages as a search result of the search request ([0353-0354] ranking the set of documents based on a document score and a user signal is performed. In an example implementation, the orchestrator can rank the set of documents based on the document score and the user signal, such as similarly with respect to FIG. 26.
At 2710, generating a content feed that includes at least a subset of the set of documents based on the ranking is performed. In an example implementation, the orchestrator can generate the content feed (e.g., for the app) that includes at least a subset of the set of documents based on the ranking, such as similarly described above (e.g., as similarly described with respect to FIG. 26 and an example content feed is shown in FIGS. 8A-8B). For example, the content feed for the user can include content from one or more web documents related to one or more of the user's interests).

Wheeler does not expressly teach:
parsing the plurality of webpages by the web crawler;
determining a plurality of similarity scores among the plurality of webpages based on the parsing result.
Zhu teaches:
parsing the plurality of webpages by the web crawler [e.g. webpages, may be further segmented to generate a set containing terms, and each term is assigned to a certain weight]; and determining a plurality of similarity scores [a score is calculated according to the weights] among the plurality of webpages based on the parsing result ([0022] the query may be segmented first, to generate a set containing multiple keywords, and each keyword is assigned to a certain weight. Then content to be retrieved, such as Internet webpages, may be further segmented to generate a set containing terms, and each term is assigned to a certain weight. A degree of word coincidence, i.e. a similarity, between the set containing multiple keywords and the set containing terms is calculated, and a score is calculated according to the weights, then a sorting result of pure text relevance is obtained).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Wheeler with the similarity score determination of Zhu. Such a modification would provide a search result that the user is more willing to click on or stay with (Zhu [0003]).
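
For illustration only, the sequence of steps recited in claim 1 (similarity scores from a parsing result, clustering into groups, and selection of groups whose page count exceeds a threshold) can be sketched as follows. This sketch is not taken from the application, Wheeler, or Zhu. The Jaccard measure, the union-find grouping, the function names, and the threshold values are all assumptions chosen to keep the example short.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Similarity of two parsed term sets (Jaccard is an assumed metric)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_pages(term_sets: dict, sim_threshold: float) -> list:
    """Group pages linked by a pairwise similarity above sim_threshold."""
    parent = {page: page for page in term_sets}

    def find(page):
        # Follow parent links to the group root, compressing the path.
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    for p, q in combinations(term_sets, 2):
        if jaccard(term_sets[p], term_sets[q]) > sim_threshold:
            parent[find(p)] = find(q)

    groups = {}
    for page in term_sets:
        groups.setdefault(find(page), []).append(page)
    return list(groups.values())


def select_groups(groups: list, min_pages: int) -> list:
    """Keep only groups containing more than min_pages pages."""
    return [g for g in groups if len(g) > min_pages]
```

Under these assumptions, three pages sharing most of their terms fall into one group and an unrelated page forms its own group, which is then dropped by the page-count threshold.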

With respect to dependent claim 2, Wheeler as modified by Zhu further teaches wherein the obtaining a plurality of webpages associated with the object comprises: sorting webpages crawled by the web crawler in the search engine based on a number of occurrences of the object in a title and a body text of each of the webpages, to obtain a sorted list of the webpages crawled by the web crawler in the search engine; and determining a plurality of webpages associated with the object based on the sorted list (Wheeler [0200] processing the row for each document can include processing text or other content in a title field of a web page document, processing text or other content in a body of a web page document, processing text or other content in tweets, or other anchors (e.g., Reddit posts, etc.). Processing of text can include identifying terms of interest in the document (e.g., using term frequency-inverse document frequency (TF-IDF) and/or other techniques). In cases of (re)tweets, Reddit posts, or other user associations with the document, the indexer can also determine a credibility associated with the user (e.g., a user/entity can be given a credibility ranking/score based on a threshold value associated with the number of followers for the user's verified user account on a given social network or other objective metrics can be utilized)).
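
As a reading aid, the sorting step recited in claim 2 can be sketched as below. The page representation (a dict with `title` and `body` keys), the case-insensitive substring count, and the cutoff `k` are assumptions made for this example and do not come from the application or from Wheeler.

```python
def occurrence_count(page: dict, obj: str) -> int:
    """Count occurrences of the object in the title and body text."""
    needle = obj.lower()
    return page["title"].lower().count(needle) + page["body"].lower().count(needle)


def top_pages_for_object(pages: list, obj: str, k: int) -> list:
    """Sort pages by occurrence count (descending) and keep the top k
    pages that mention the object at least once."""
    ranked = sorted(pages, key=lambda p: occurrence_count(p, obj), reverse=True)
    return [p for p in ranked[:k] if occurrence_count(p, obj) > 0]
```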

With respect to dependent claim 3, Wheeler as modified by Zhu further teaches wherein the clustering the plurality of webpages comprises: determining, for every two webpages in the plurality of webpages, a similarity between the two webpages (Wheeler [0051] each web document included in the corpus of web documents has an associated feature vector. The feature vector represents a location of a web document in the feature space associated with the corpus of web documents. …determining the similarity between D.sub.n and any other web document using matrix X can be computationally intensive and time consuming. To reduce the amount of …); and determining, in response to the similarity between the two webpages being greater than a preset similarity threshold [e.g. sim hash], that the two webpages are associated with a first event (Wheeler [0121] a web document may be returned for a query in the event the entity included in the query has the same sim hash as the entity corresponding to the web document. In some embodiments, a web document may be returned for a query in the event the entity included in the query has the same sim hash as the entity corresponding to the web document and has a cosine similarity score that is greater than or equal to a cosine similarity threshold).

With respect to dependent claim 9, Wheeler as modified by Zhu further teaches wherein the selecting one or more groups of webpages representing the one or more first events comprises, for one of the one or more groups representing a first event of the one or more first events [e.g. Fig. 8A, 806, 810 and 814 cards] (Wheeler [0112] FIG. 8A, the content feed is comprised of one or more cards that include web documents (e.g., or excerpts of web documents that can be selected to view the entire web document) and/or synthesized content and is based on a user model, such as user model 316, which is tailored to a user account, such as user account 402. For example, a web document can be an article, sponsored content, an advertisement, a social media post, online video content (e.g., embedded video file), online audio content (e.g., embedded audio file), etc.);
determining, based on a number of webpages associated with the first event, a popularity of the first event (Wheeler [0313] the orchestrator is configured to generate story groups in a content feed. For example, a user may indicate a preference for such story groupings rather than the described interleaving of cards in the user's content feed (e.g., such can be implemented as a configurable parameter or measured as a user feedback based on generated content feeds that use interleaving and other content feeds that use story group approaches). In such cases, rather than interleaving cards for different interests in the user's content feed, the orchestrator can automatically reshuffle the cards in the feed (e.g., irrespective of the relative document scores) so that cards related to the same interest are contiguous in the content feed. For example, if the content feed update includes three new cards related to the interest of computer security for mobile devices, then the orchestrator can group those three new cards together within the content feed); and in response to the popularity of the first event being greater than a preset popularity threshold, selecting the one group as a group representing one of the one or more second events [e.g. sim hash] (Wheeler [0121] a web document may be returned for a query in the event the entity included in the query has the same sim hash as the entity corresponding to the web document. In some embodiments, a web document may be returned for a query in the event the entity included in the query has the same sim hash as the entity corresponding to the web document and has a cosine similarity score that is greater than or equal to a cosine similarity threshold).

wherein the determining one or more representative webpages respectively from the one or more selected groups of webpages representing the one or more second events comprises: for each second event of the one or more second events, determining webpage from the corresponding selected group of event based on a number of occurrences of the object in a title and a body text of the one or more webpages; and determining the webpage as a representative webpage of the second event (Wheeler [0240-0242] FIG. 21 is a flow diagram illustrating a process performed by the classifier for generating labels for websites to facilitate categorizing of documents in accordance with some embodiments. In some embodiments, the process 2100 for generating labels for websites to facilitate categorizing of documents is performed using the disclosed system/service (e.g., including classifier 1740 of search and feed system 1700 of FIG. 17).
Referring to FIG. 21 at 2102, processing web pages for a plurality of different websites is performed to identify topics for the web pages of each of the websites using the classifier (e.g., the classifier that was previously trained using training data sets as similarly described above). For example, the classifier can determine that all pages with a URL of "http://example-web-site-1.com/sports" are likely about sports and that all pages with a URL of "http://example-web-site-1.com/technology" are likely about technology and that all pages with a URL of "http://example-web-site-2.com" are likely about astronomy and that all pages with a URL of "http://example-web-site-32.com" are likely about chemistry…).

wherein the returning the one or more representative webpages comprises: for each second event of the one or more second events, determining a release time of the representative webpage as the occurrence time of the second event; and based on the occurrence times of the second events, determining an order to return the one or more representative webpages representing the one or more second events (Wheeler [0267] if a user tweets about a new posted article (e.g., web page on a website, as publishers generally post a tweet or other online announcement that indicates that a new article is being released or posted on their site at about the same time as it is being released/posted on their site, so such can provide a timely notification to add to the time series/crawl list for crawling and indexing to timely update the RDI as similarly described herein), then the delay to the serving stack can be as little as one minute or less during which the new web page is crawled, indexed, and available as a newly added document in the RDI provided by the serving stack (e.g., the serving structure as shown at 1734 of FIG. 17)).
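
The limitations addressed above (popularity based on the number of webpages, a representative webpage chosen by occurrences of the object in title and body text, and ordering by release time) can be sketched together as follows. This is an illustrative sketch only. The `released` field, the oldest-first ordering, and all function names are assumptions, since the claim language does not fix a data layout or a sort direction.

```python
from datetime import date


def occurrences(page: dict, obj: str) -> int:
    """Occurrences of the object in the page title and body text."""
    needle = obj.lower()
    return page["title"].lower().count(needle) + page["body"].lower().count(needle)


def representatives_in_time_order(groups: list, obj: str, popularity_threshold: int) -> list:
    """Pick one representative page per sufficiently popular group and
    order the representatives by release time (oldest first, assumed)."""
    # Popularity is taken to be the number of pages in the group.
    selected = [g for g in groups if len(g) > popularity_threshold]
    # The representative is the page mentioning the object most often.
    reps = [max(g, key=lambda p: occurrences(p, obj)) for g in selected]
    # The release time of the representative stands in for the event time.
    return sorted(reps, key=lambda p: p["released"])
```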

Regarding claims 12-14 and 18-20, the instant claims recite substantially the same limitations as the above-rejected claims 1-3 and 9-11 and are therefore rejected under the same prior-art teachings.

Claims 4-8 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Wheeler in view of Zhu, as applied to claims 1 and 12, further in view of Milazzo et al. [US 20200073902 A1, 08/13/2018].

With respect to dependent claim 4, Wheeler as modified by Zhu does not expressly teach wherein the determining a similarity between the two webpages comprises:
determining a first similarity between body texts of the two webpages, a second similarity between objects included in the body texts of the two webpages, a third similarity between titles of the two webpages, and a fourth similarity between objects included in the titles of the two webpages; and
determining the similarity between the two webpages based on the first similarity, the second similarity, the third similarity, and the fourth similarity.
Milazzo teaches wherein the determining a similarity between the two webpages comprises:
determining a first similarity between body texts of the two webpages, a second similarity between objects included in the body texts of the two webpages, a third similarity between titles of the two webpages, and a fourth similarity between objects included in the titles of the two webpages; and determining the similarity between the two webpages based on the first similarity, the second similarity, the third similarity, and the fourth similarity ([0085] the title topic indicator may indicate a relationship between keywords and/or phrases in the title and keywords and/or phrases in the body. In various embodiments, the text assessment module 204, utilizing NLP, identifies keywords and phrases in the title of the article and the body of the article. The indicator module 206 may determine which keywords and/or phrases are important in the body of the article and determine similarity with words and/or phrases of the title of the article. the indicator module 206 may utilize a term frequency-inverse document frequency (TF-IDF) to determine …).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Wheeler as modified by Zhu with the similarity determination of Milazzo, which includes objects and body texts of the two files. Such a modification would provide the user with a previously determined credibility score, bias score, and/or sentiment score for that article (Milazzo [0154]).
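
For illustration, one way to determine an overall similarity "based on" the four component similarities recited in claim 4 is a weighted average. The claim language quoted above does not specify how the four values are combined, so the weighted average, the equal default weights, and the function name are assumptions made only for this sketch.

```python
def combined_similarity(body_sim: float, body_obj_sim: float,
                        title_sim: float, title_obj_sim: float,
                        weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Combine the first to fourth similarities into one score.

    The weighted average is an assumed combination rule; the weights
    must be four values summing to 1.
    """
    sims = (body_sim, body_obj_sim, title_sim, title_obj_sim)
    if len(weights) != 4 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be four values summing to 1")
    return sum(w * s for w, s in zip(weights, sims))
```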

With respect to dependent claim 5, Wheeler as modified by Zhu and Milazzo further teaches wherein the determining a first similarity between body texts of the two webpages comprises: generating a first character vector and a first word vector of the body text of a first webpage of the two webpages (Wheeler [0172] a document similarity between two entities can be determined by computing a dot product between two vectors); generating a second character vector and a second word vector of the body text of a second webpage of the two webpages; determining a fifth similarity between the first character vector and the second character vector, and a sixth similarity between the first word vector and the second word vector; and based on the fifth similarity and the sixth similarity, determining the first similarity between the body texts of the two webpages (Milazzo [0138] indicator module 206 may generate lightweight scores. For example, the text assessment module 204 and/or the indicator module 206 may generate word count vectors of text in the article using TF-IDF. In various embodiments, the indicator module 206 applies dimensionality reduction to reduce the number of data features (e.g., the number of dimensions) in the data set calculated using TF-IDF to identify principal components. LSA attempts to find similarities in the domains of words. In the representation used by LSA, a document is seen as an unordered collection of words, and the matrix of words versus documents may be analyzed with SVD, so that information may be sorted into implicit categories. SVD allows an exact representation of any matrix, and also may eliminate the less important parts of that representation to produce an approximate representation with any desired number of dimensions).
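
The character-vector and word-vector comparison recited in claim 5 can be illustrated with simple count vectors and cosine similarity. This sketch assumes raw character and word counts (rather than TF-IDF or learned embeddings), whitespace tokenization, and an equal-weight blend of the two similarities. None of these choices is stated in the claim or in the cited references.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def body_similarity(text1: str, text2: str, char_weight: float = 0.5) -> float:
    """Blend a character-level and a word-level similarity (assumed rule)."""
    t1, t2 = text1.lower(), text2.lower()
    # Character vectors: counts of non-space characters.
    char_sim = cosine(Counter(t1.replace(" ", "")), Counter(t2.replace(" ", "")))
    # Word vectors: counts of whitespace-separated tokens.
    word_sim = cosine(Counter(t1.split()), Counter(t2.split()))
    return char_weight * char_sim + (1 - char_weight) * word_sim
```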

With respect to dependent claim 6, Wheeler as modified by Zhu and Milazzo further teaches wherein the determining a second similarity between the objects included in the body texts of the two webpages comprises: generating a first vector of the object included in the body text of a first webpage of the two webpages; generating a second vector of the object included in the body text of a second webpage of the two webpages; and based on the first vector and the second vector, determining the second similarity between the objects included in body texts of the two webpages (Milazzo [0138] indicator module 206 may generate lightweight scores. For example, the text assessment module 204 and/or the indicator module 206 may generate word count vectors of text in the article using TF-IDF.
Milazzo [0181] indicator module 206 may determine a title topic indicator to indicate a relationship between key words and/or phrases in the title and keywords and/or phrases in the body of the content of the webpage. In one example, the text assessment module 204 utilizes NLP to identify keywords and phrases in the title of the content of the webpage as well as the body of the content of the webpage. As …).

With respect to dependent claim 7, Wheeler as modified by Zhu and Milazzo further teaches wherein the determining a third similarity between titles of the two webpages comprises: generating a first character vector and a first word vector of the title content of a first webpage of the two webpages; generating a second character vector and a second word vector of the title content of a second webpage of the two webpages; determining a seventh similarity between the first character vector and the second character vector, and an eighth similarity between the first word vector and the second word vector; and based on the seventh similarity and the eighth similarity, determining the third similarity between the titles of the two webpages (Milazzo [0138] indicator module 206 may generate lightweight scores. For example, the text assessment module 204 and/or the indicator module 206 may generate word count vectors of text in the article using TF-IDF. In various embodiments, the indicator module 206 applies dimensionality reduction to reduce the number of data features (e.g., the number of dimensions) in the data set calculated using TF-IDF to identify principal components. LSA attempts to find similarities in the domains of words. In the representation used by LSA, a document is seen as an unordered collection of words, and the matrix of words versus documents may be analyzed with SVD, so that information may be sorted into implicit categories. SVD allows an exact representation of any matrix, and also may eliminate the less important parts of that representation to produce an approximate representation with any desired number of dimensions).

With respect to dependent claim 8, Wheeler as modified by Zhu and Milazzo further teaches wherein the determining a fourth similarity between objects included in the titles of the two webpages comprises: generating a first vector of the object included in the title of a first webpage of the two webpages; generating a second vector of the object included in the title of a second webpage of the two webpages; and based on the first vector and the second vector, determining the fourth similarity between the objects included in titles of the two webpages (Wheeler [0208] the RDI includes a vector-based model (e.g., a vector model) for each document in the index. In an example implementation, the vector model is built using unsupervised machine learning techniques. For example, the unsupervised machine learning can learn a representation of a word, a sequence of words, parts of a document such as title, and finally, a representation for the entire document itself. In this example implementation, the document is annotated with vectors that represent the whole document, vectors for some selected portions of the document such as the title, and vectors for each of the annotations.
Wheeler [0144] a distance between two 100 dimensional space vectors can be determined to facilitate various embedded based comparison, similarity, and retrieval techniques described herein. In some embodiments, a Euclidean distance between the 100 dimensional space vectors is determined. For example, in the event the distance …).

Regarding claims 15-17, the instant claims recite substantially the same limitations as the above-rejected claims 5-8 and are therefore rejected under the same prior-art teachings.

Response to Amendment
In response to the 03/22/2021 office action, claims 1-20 have been amended, no new claim has been added, and no claim has been cancelled. Claims 1-20 are currently pending and stand rejected.

Response to Arguments
Applicant’s arguments filed on 06/18/2021 have been considered. 
Applicant argues (pages 12-13) that Wheeler fails to teach at least the following features of amended claim 1:
determining a plurality of similarity scores among the plurality of webpages based on the parsing result;
clustering the plurality of webpages based on the plurality of similarity scores into a plurality of groups of webpages representing a plurality of first events;
selecting, from the plurality of groups of webpages representing the plurality of first events, one or more groups of webpages representing one or more second events, wherein each of the one or more selected groups comprises a quantity of webpages greater than a threshold. 

The new reference Zhu et al. [US 20190205384 A1, October 11, 2018] in paragraph [0022] teaches parsing the plurality of webpages by the web crawler [e.g. webpages, may be further segmented to generate a set containing terms, and each term is assigned to a certain weight] and determining a plurality of similarity scores [a score is calculated according to the weights] among the plurality of webpages based on the parsing result.
Wheeler in paragraphs [0112, 0120] teaches clustering [e.g. cards] the plurality of webpages based on the plurality of similarity scores [e.g. a web document that is included in content feed 804 may be determined based on a sim hash associated with an entity corresponding to the web document and a cosine similarity] into a plurality of groups of webpages representing a plurality of first events [e.g. Fig. 8A, 806, 810 and 814 cards].
Wheeler in paragraphs [0121, 0240-0242, 0313] teaches selecting [e.g. content feeds], from the plurality of groups of webpages [e.g. story groups] representing the plurality of first events [e.g. cards], one or more groups of webpages representing one or more second events [e.g. generating labels] wherein each of the one or more selected groups comprises a quantity of webpages greater than a threshold [e.g. a web document may be returned for a query in the event the entity included in the query has the same sim hash as the entity corresponding to the web document and has a cosine similarity score that is greater than or equal to a cosine similarity threshold].
As shown above, Wheeler as modified by Zhu teaches the method as claimed.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOHEILA G DAVANLOU whose telephone number is (571)270-5155.  The examiner can normally be reached on Monday - Friday, 9:00am - 6:00 Eastern Time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alford Kindred, can be reached on (571)272-4037.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


SOHEILA G DAVANLOU
Examiner
Art Unit 2153



/ALFORD W KINDRED/Supervisory Patent Examiner, Art Unit 2153