DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the Amendment filed on July 7, 2022.  Claims 3-20, 22, 28, and 35 are cancelled.  Claim 41 is new.  Claims 1, 26, and 32 are amended.  Claims 1, 2, 21, 23-27, 29-34, and 36-41 are pending in the case.  Claims 1, 26, and 32 are the independent claims.  
This action is final.


Applicant’s Response
In Applicant’s response filed on July 7, 2022, Applicant provided arguments in response to the rejections of the claims under 35 USC 103 in the previous office action.

Response to Argument/Amendment
Applicant’s interview request (on page 7 of Applicant’s response filed July 7, 2022) is acknowledged.  On September 23, 2022, Examiner attempted to contact, and left a voice mail for, Applicant’s representative Lance Wimmer at (281) 970-4545 (as directed on page 13 of Applicant’s response filed July 7, 2022) in order to schedule an interview as requested in Applicant’s response.  Applicant’s representative has not, to Examiner’s knowledge, responded to this communication as of the writing of this office action.  Should Applicant continue to desire an interview regarding this application, Examiner respectfully invites Applicant’s representatives to contact him at their convenience.  Examiner may be reached via telephone at (469) 295-9105, or via email at jeremy.stanley@uspto.gov.
Applicant’s arguments in response to the rejection of the claims under 35 USC 103 in the previous office action are acknowledged, and have been fully considered.  Applicant’s arguments are persuasive.  
Applicant argues that Yee, Lee, and Lavergne do not teach all of the limitations recited in the amended claims, i.e. “detecting an input to insert an object into a slide of a presentation file…classification label is based at least in part on an association between the object and one or more additional objects in the slide….”  For example, while Applicant appears to admit (on page 9 of Applicant’s remarks filed July 7, 2022) that Yee teaches a system including image classification to identify text for labels for images and associate one or more labels corresponding to the relevant text with the images, such as text appearing with images on a webpage, Applicant argues that Yee does not teach an input to insert an object into a slide of a presentation file, and is instead silent regarding a slide and a presentation file, “as well as a classification label that is based on an inserted object on a slide and one or more additional objects in the slide when the object is inserted into the slide.”  Applicant additionally argues that Lee teaches generically classifying objects, but does not consider an association of the text object and one or more additional objects on the slide for classification, and that Lavergne merely teaches parsing contents of text to assign classification labels, but is silent regarding association of text content and one or more additional objects to assign appropriate classification labels.
Examiner notes that Lee does teach input to insert an object into a portion of an application file (as cited in the previous office action and below).  Examiner additionally notes that Yee does teach generating a classification label for a first object in a file (an image object) based on an additional object of a different type in the file (associated text) (as cited in the previous office action and below).  However, Examiner agrees that Lee, Lavergne, and Yee do not explicitly disclose that the portion of the application file is a slide, and that the application file is a presentation file (as recited in the amended independent claims).  
Therefore, Applicant’s argument is persuasive, and the previous rejections are withdrawn.  However, new grounds of rejection are provided below.

Claim Rejections – 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims under pre-AIA 35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA 35 U.S.C. 103(c) and potential pre-AIA 35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA 35 U.S.C. 103(a).

Claims 1, 21, 24, 26, 27, 30, 32, 34, 37, and 39 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (US 20160147434 A1) in view of Lavergne (US 10515125 B1), further in view of Yee et al. (US 8478052 B1), further in view of Kikin-Gil et al. (US 20190114047 A1).
With respect to claims 1, 26, and 32, Lee teaches:
an electronic device, comprising: a graphical user interface (GUI) configured to display an application file (e.g. paragraph 0004, memo application; Fig. 1, showing UI for entering handwritten content; paragraph 0114, page/document including object; paragraph 0376, UI module providing UI/GUI operated according to applications; i.e. a document/memo application interface is displayed including a document into which content is to be entered); a processor configured to execute instructions to perform a method (e.g. paragraph 0378, processors operating in accordance with stored instructions);
a tangible, non-transitory computer-readable medium configured to store a program for determining object names based on object content, the program comprising instructions to perform the method (e.g. paragraph 0377, non-transitory computer readable recording medium, functional programs, code, code segments for accomplishing described invention; paragraph 0378, instructions stored on non-transitory processor readable mediums); and
the method, comprising: 
detecting an input to insert an object into a portion of an application file via an application graphical user interface (GUI), wherein the object includes content (e.g. paragraph 0004, memo application enabling user to record and store information using electronic pen; paragraph 0093-0094, display device receives handwritten input and displays handwritten content; handwritten content includes image, text, table, list, formula, etc.; paragraph 0097, user selecting area from handwritten content; paragraph 0100-0101, describing Fig. 2, acquiring handwritten content generated in real time or already generated; handwritten input from user; handwritten content stored in memory; paragraph 0104, handwriting strokes segmented into objects; paragraph 0114, document including object; paragraph 0376, specialized UI/GUI operated according to applications; i.e. as shown in Fig. 1, a user enters (analogous to inserting), in a GUI, a plurality of handwritten/drawn objects of various types of content in a document (analogous to an application file)); 
in response to detecting the input, providing the object to a content classifier (e.g. paragraph 0100, handwritten content generated in real time; paragraph 0103-0104, handwritten content grouped and segmented into plurality of objects as further shown in Fig. 3; paragraph 0112, processing objects which are segmented while handwritten content is received from user in form of selectable object in real time; paragraph 0138, describing S350 of Fig. 3, classifying objects included in handwritten content; paragraph 0141, indicating that these operations are implemented through a machine learning algorithm; i.e. in response to receiving handwriting in real time, operations are performed via a machine learning algorithm, including grouping the handwriting into objects, determining characteristic information, and classifying the objects into types; that is, the received handwritten content is processed by a machine learning algorithm which determines classifications for objects within the content, and is therefore analogous to a content classifier machine learning module); 
receiving a classification label for the object from the content classifier (e.g. paragraph 0105, grouping strokes according to segmentation level; paragraph 0108, segmenting content into figure, text, list, table, formula, etc. objects according to type; paragraph 0120, displaying labels indicating object types; paragraph 0136, extracting characteristic information based on corresponding segmentation levels; paragraph 0137, extracting multi-scale characteristic information by combining characteristic information; paragraph 0138, describing S350 of Fig. 3, classifying objects included in handwritten content using multi-scale characteristic information, and analyzing types of objects included in the content; paragraph 0139, classifying groups as paragraph; paragraph 0140, classifying group as list object; classifying group as figure object; paragraph 0141, indicating that these operations are implemented through a machine learning algorithm; paragraph 0145, describing Fig. 4, segmenting handwritten content into figure object 410, text object 420, and list object 430; paragraph 0148, displaying labels indicating object types of plurality of objects 410-430 near respective content; i.e. once the machine learning algorithm determines classifications/types of different objects within the handwritten content, labels are provided and displayed on the screen in association with the objects, as shown in Fig. 4); 
updating in the application GUI, a metadata name of the object based upon the classification label (e.g. paragraph 0113, generating thumbnail information which briefly describes objects including label, characteristics of object, index, etc.; paragraph 0114, generating additional information corresponding to objects including type information, generation time, page, document name, size information, relative position, tag, etc.; paragraph 0115, matching thumbnail information and additional information for object, storing matching information, e.g. matching table; paragraph 0247, thumbnail information of plurality of objects is information which describes objects including the label, thumbnail image, index, etc.; paragraph 0276, additional information of objects includes type information, generation time, page, document name, size, relative position, tag, etc.; i.e. where the thumbnail information and additional information generated and stored for each identified object are analogous to metadata for each object, and where this metadata includes at least a label, which is based on (such as identical to) the classification/type label generated by the machine learning algorithm, the label in the metadata is analogous to a metadata name of the object based on the classification label; it is noted that other components of the thumbnail and additional information for each object also appear to be analogous to a metadata name, such as tag information, index information, etc.); and 
displaying the metadata name of the object in an object list of the application file, wherein the object list enumerates one or more objects of the portion of the application file (e.g. paragraph 0120, displaying labels indicating object types; paragraph 0125, displaying handwritten content including plurality of objects in first area of screen, displaying list of thumbnail information corresponding to the plurality of objects in a second area of the screen; paragraph 0251, display handwritten content including plurality of objects and list of thumbnail information, as shown in Fig. 22; paragraph 0261, displaying thumbnail information of objects included in the first handwritten content in 2210 in the thumbnail information list 2220; paragraph 0263-0267, as shown in Fig. 22, thumbnail information 2221-2225 for each object may include label indicating characteristics of respective object, number of lines, summary, first three words, thumbnail image etc.; paragraph 0277, displaying additional information of objects corresponding to thumbnail information near thumbnail information list, as shown in Fig. 24, and further described in paragraphs 0280-0281; additional information displayed includes, e.g., type information, generation time information, page including the corresponding object, name of document including the object, size information, relative position information, tags, etc.).
Although Lee additionally teaches that the object may comprise an image and/or text (e.g. paragraph 0145, defining handwriting content as figure object 410, text object 420, list object 430, as shown in Fig. 4), Lee does not explicitly disclose wherein when the object comprises a text, the classification label comprises a description of subject matter of the text based upon an evaluation of a meaning of one or more words of the text.  However, Lavergne teaches wherein when the object comprises a text, the classification label comprises a description of subject matter of the text based upon an evaluation of a meaning of one or more words of the text (e.g. col. 9 lines 43-58, parsing contents of text segment to develop semantic understanding; analyzing associated content, i.e. metadata, determining classification labels, automatically classifying without human intervention; col. 10 lines 18-35, classifying text segment; system label “WRITING” identifying topic of the text segment, i.e. focus of the text segment is on writing ability; col. 11 lines 17-26, processing and analyzing contents of text segment such as terms included, sentence structures; generating text analysis data using NLP techniques; semantic scores to represent linguistic or syntactic attributes; col. 12 lines 2-11, assigning classification labels “INTERACTION” and “ENTHUSIASTIC” based on presence of dialog and presence of positive connotation associated with included terms; col. 17 lines 10-22, individual terms within submitted text segment analyzed by natural language processor and compared against other terms included within repository; providing classification labels as recommended classification labels; i.e. the system applies processing techniques to develop a semantic understanding of a given text segment (analogous to an evaluation of a meaning of one or more words) and generates/recommends labels which provide a topical/subject matter description of the text based on this processing).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee and Lavergne in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), to incorporate the teachings of Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments) to include the capability to generate, as the classification label for a text object (i.e. as taught by Lee), a label which provides an identification of a topic/subject matter of the text, based on processing and analysis of the text to develop a semantic understanding of the text.  One of ordinary skill would have been motivated to make such a modification in order to improve storage and retrieval of text segments, eliminate the necessity to use otherwise computationally-intensive processing, and reduce the storage space necessary to store sufficient information about a text segment to identify granular information about the text segment or retrieve content associated with the text segment as described in Lavergne (abstract; col. 2 line 66-col. 3 line 27;  col. 4 lines 17-20).
Lee and Lavergne do not explicitly disclose wherein the classification label is based at least in part on an association between the object and one or more additional objects in the application file, wherein (with respect to claim 1) the one or more additional objects comprise a second object with a different object type than the object, (with respect to claim 26) independent of a user input classifying the one or more additional objects, and (with respect to claim 32) the application file association indicating that the one or more additional objects provide context to the object.  However, Yee teaches wherein the classification label is based at least in part on an association between the object and one or more additional objects in the application file, wherein (with respect to claim 1) the one or more additional objects comprise a second object with a different object type than the object, (with respect to claim 26) independent of a user input classifying the one or more additional objects, and (with respect to claim 32) the application file association indicating that the one or more additional objects provide context to the object (e.g. col. 5 lines 10-22, relevant text identified for an image using image classification model that generates relevance score for image to text; single word or string of words identified as textual unit that is relevant to the image; col. 7 lines 16-51, image classification subsystem identifies text 166 that appears with the image 164 on the web page 162 as being associated with the image 164, and parses identified text into n-grams; obtaining image classification models that have been trained for the set of n-grams, providing feature vector for image 164 as input to each model, receiving scores indicating whether the image is positive or negative for each of the n-grams; n-grams for which the image is positive are identified as high confidence labels for the image, which are associated with the image 164, such as by storing or indexing the high confidence labels at memory locations corresponding to the image; col. 14 lines 25-55, describing Fig. 3, image classified relative to n-grams defining candidate labels that are identified from text that is associated with the image; text associated with the image can be text that appears on web page with the image; text associated with the image is parsed into one or more unique candidate labels; classification models obtained for each label, image classified according to each of the image classification models and labels corresponding to the candidate labels; text associated with the image is obtained as shown in step 302; text is text that appears on a web page with the image, such as text that appears within a threshold number of pixels of the image, which can be identified as potentially relevant to the image and obtained; col. 15 lines 3-6, the text is parsed into candidate labels; col. 15 lines 57-62, classification scores computed for the image based on models; classification score for the image relative to a candidate label; col. 16 lines 3-27, image classified based on classification scores; image associated with labels corresponding to candidate labels for which image is classified as positive image; label is text that matches candidate label and is associated with the image by storing the text at a memory location corresponding to the image; or data flag indicating image is associated with topic; i.e. where the image is analogous to a first object and text is analogous to a second object having a different type, and classification labels are automatically generated from the content of the text and are associated with the image based on the text being related to the image (such as being within a threshold distance of the image in the file/webpage or some other association), this is analogous to the classification label (for the image) being based at least in part on an association between the object (the image) and one or more additional objects (text) in the application file (the webpage), wherein the one or more additional objects comprise a second object with a different object type than the object (text as opposed to image/picture), independent of a user input classifying the one or more additional objects (where the classification is performed automatically without user input), and the application file association indicating that the one or more additional objects provide context to the object (such as text being within a threshold number of pixels of the image within the webpage being considered potentially relevant, and therefore providing context, to the image)).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Lavergne, and Yee in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning) and Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments), to incorporate the teachings of Yee (directed to image classification) to include the capability to generate classification labels for a first object in an application file (such as an image in a webpage as taught by Yee) based on other objects in the application file which are associated with the first object (such as text which is relevant to the image in the webpage), including where the first object and other objects have different types (such as an image/photograph and text), performed automatically and therefore independent of user input classifying the other objects, and where the classification labels are based on an application file association between the objects in the application file (such as the text and image being within a threshold number of pixels of one another within the file, as taught by Yee).  One of ordinary skill would have been motivated to make such a modification in order to provide the capability to compute relevance scores for each individual word associated with an image, such that images are associated with high confidence labels, boosting the relevance scores to a search query for images with labels matching the query (i.e. allowing for more relevant image search results to be provided in response to a query) as described in Yee (col. 2 lines 49-57).
Lee, Lavergne, and Yee do not explicitly disclose that the portion of the application file is a slide, and the application file is a presentation file.  However, Kikin-Gil teaches that the portion of the application file is a slide, and the application file is a presentation file (e.g. paragraph 0014, operations executed by neural networks or machine-learning processing; paragraph 0017, users creating or editing digital presentation documents including slide-based presentations; paragraph 0019, detecting completion of drag and drop action that comprises user placing first data object onto a second data object; paragraph 0021, drag and drop action analyzed by drag and drop evaluation model which may be a machine-learning model; paragraph 0022, drag and drop model generating inferences, analyzing attributes of data objects, and evaluating relationships between the data objects of the drag and drop action; paragraph 0023, dragging name text object on photo object containing multiple people and moving cursor above specific individual prompting an offer to the user to use that text as a tag or a title; dragging text over group of images and moving text over one of the images might associate that text with that specific image and offer to use as a caption for that specific image; paragraph 0024, drag and drop evaluation model configured to identify types of data objects involved in drag and drop action, inference rules set for working with types of content/objects; paragraph 0025, salient region detection processing recognizing placement options; inference made to add text data object as a caption of another data object/image; paragraph 0028, image data objects arranged into presentation format, i.e. slideshow presentation; paragraph 0031, inference rules to evaluate characteristics of objects; proposing to use initial item in meaningful way such as label/caption/associated text for second item; paragraph 0034, recognizing unified theme related to grouped content, and proposing a label, etc. for the group of objects; paragraph 0046, slideshow of image objects; paragraph 0051, slide presentation application may be used in accordance with described invention; i.e. where a user, while editing a slide in a presentation application, drags an object onto another object, the system detects this operation and, using a machine learning model, analyzes attributes of the objects, evaluates relationships of the objects, and makes inferences about the objects, including determining types of the objects, proposing a label for the objects, proposing that a text object may be a caption for an image object, etc., analogous to detecting an input inserting an object into a slide of a presentation file and utilizing a content classifier to provide a classification label (i.e. determination of relevant type, and/or actual proposed label) for the object(s)).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Lavergne, Yee, and Kikin-Gil in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments), and Yee (directed to image classification), to incorporate the teachings of Kikin-Gil (directed to utilizing machine-learning to analyze, evaluate, and make inferences regarding data objects and their relationships, such as a data object input by a user with respect to other data objects, such as in a slide of a presentation application) to include the capability to implement the system of Lee (i.e. including detecting user input of an object in an application file, generating a classification label for the object, etc.) in the context of a slide in a presentation application, such that when the user inputs the objects to a slide within the presentation application file, machine learning models are applied to evaluate the objects (i.e. as taught by both Lee and Kikin-Gil), including to generate classification labels for the objects (i.e. as taught by Lee, Lavergne, Yee, and Kikin-Gil).  
One of ordinary skill would have been motivated to make such a modification in order to provide a plurality of technical advantages including generation of composite data objects, content mobility, improved processing efficiency for computing devices utilized for generating and managing content for drag and drop operations, generation and utilization of an exemplary drag and drop evaluation model, generation and application of inference rules for generating inferences about a drag and drop action, and improved user interaction and productivity as described in Kikin-Gil (paragraph 0013).
With respect to claims 21, 27, and 34, Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil teaches all of the limitations of claims 1, 26, and 32 as previously discussed, and Lavergne further teaches wherein the classification label is based upon metadata of the object, the metadata comprising a name of the object, a caption associated with the object, or both (e.g. col. 9 lines 43-58, parsing contents of text segment to develop semantic understanding; analyzing associated content, i.e. metadata, and using predicted attributes for determining classification labels, automatically classifying without human intervention; col. 9 line 60-col. 10 line 17, analyzing content associated with text segment such as document or other resource that includes the text segment; accessing stored metadata to obtain metadata relevant to online resource of text segment; metadata identifies author of quote included within text segment and title of article that includes the quote, or in other cases URL for page including text segment or product associated with text segment; i.e. the system determines metadata associated with the text segment, such as the title of a work which includes the text segment or a name of an author of the text segment, where the title of the work including the text or name of the author of the segment are analogous to a name of the text segment, under the broadest reasonable interpretation).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Yee, Kikin-Gil, and Lavergne in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), Kikin-Gil (directed to utilizing machine-learning to analyze, evaluate, and make inferences regarding data objects and their relationships, such as a data object input by a user with respect to other data objects, such as in a slide of a presentation application), and Yee (directed to image classification), to incorporate the teachings of Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments) to include the capability to generate, as the classification label for a text object (i.e. as taught by Lee), a label which is based on metadata associated with the object such as a title of a work that the object is contained within, a name of a person who created the object such as an author, etc.  One of ordinary skill would have been motivated to make such a modification in order to improve storage and retrieval of text segments, eliminate the necessity to use otherwise computationally-intensive processing, and reduce the storage space necessary to store sufficient information about a text segment to identify granular information about the text segment or retrieve content associated with the text segment as described in Lavergne (abstract; col. 2 line 66-col. 3 line 27;  col. 4 lines 17-20).
With respect to claims 24, 30, and 37, Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil teaches all of the limitations of claims 1, 26, and 32 as previously discussed, and Lavergne further teaches wherein the evaluation of the meaning of the one or more words of the text comprises an identification and analysis of nouns, verbs, or both, of the one or more words (e.g. col. 9 lines 43-58, parsing contents of text segment to develop semantic understanding; col. 11 lines 17-26, processing and analyzing contents of text segment such as terms included, sentence structures; generating text analysis data using NLP techniques; semantic scores to represent linguistic or syntactic attributes; col. 11 lines 49-56, scores representing number of different parts of speech, adjectives, nouns, adverbs, verbs, etc.; col. 14 lines 4-14, classification labels assigned based on linguistic or syntactic attributes of text segment, including terms included, parts of speech present, linguistic complexity, sentence structure and arrangement, etc.; col. 17 lines 10-22, individual terms within submitted text segment analyzed by natural language processor and compared against other terms included within repository; providing classification labels as recommended classification labels).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Yee, Kikin-Gil, and Lavergne in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), Kikin-Gil (directed to utilizing machine-learning to analyze, evaluate, and make inferences regarding data objects and their relationships, such as a data object input by a user with respect to other data objects, such as in a slide of a presentation application), and Yee (directed to image classification), to incorporate the teachings of Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments) to include the capability to generate, as the classification label for a text object (i.e. as taught by Lee), a label which provides an identification of a topic/subject matter of the text, based on processing and analysis of the text to develop a semantic understanding of the text, such as by evaluating the meaning of various portions of the text including nouns, verbs, and other parts of the text.  One of ordinary skill would have been motivated to make such a modification in order to improve storage and retrieval of text segments, eliminate the necessity to use otherwise computationally-intensive processing, and reduce the storage space necessary to store sufficient information about a text segment to identify granular information about the text segment or retrieve content associated with the text segment as described in Lavergne (abstract; col. 2 line 66-col. 3 line 27;  col. 4 lines 17-20).
With respect to claim 39, Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil teaches all of the limitations of claim 32 as previously discussed, and Yee further teaches wherein the application file association comprises a placement of the object and the one or more additional objects indicating that the one or more additional objects provide a context to the object (e.g. col. 14 lines 49-55, obtaining text associated with the image; text is text that appears on webpage with the image; text that appears within a threshold number of pixels of the image identified as potentially relevant to the image and obtained).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Lavergne, Kikin-Gil, and Yee in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), Kikin-Gil (directed to utilizing machine-learning to analyze, evaluate, and make inferences regarding data objects and their relationships, such as a data object input by a user with respect to other data objects, such as in a slide of a presentation application), and Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments), to incorporate the teachings of Yee (directed to image classification) to include the capability to determine that objects are associated with one another within an application file (such as a webpage) based on relative positions/placements of the objects in the application file (such as determining the objects as being relevant to one another based on the objects being within a threshold number of pixels of one another).  One of ordinary skill would have been motivated to make such a modification in order to provide the capability to compute relevance scores for each individual word associated with an image, such that images are associated with high confidence labels, boosting the relevance scores to a search query for images with labels matching the query (i.e. allowing for more relevant image search results to be provided in response to a query) as described in Yee (col. 2 lines 49-57).
Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil, further in view of Corrado et al. (US 20150178383 A1).
With respect to claim 25, Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil teaches all of the limitations of claim 1 as previously discussed. Although Lee additionally teaches that the object may comprise an image and/or text (e.g. paragraph 0145, defining handwriting content as figure object 410, text object 420, list object 430, as shown in Fig. 4), Lee and Lavergne do not explicitly disclose wherein when the object comprises an image, the classification label is based on subject matter contained in the image.
However, Corrado teaches wherein when the object comprises an image, the classification label is based on subject matter contained in the image (e.g. paragraphs 0022-0023, receiving input data object, generating scores for categories representing likelihood that input data object belongs to category; if data objects are images, generating respective score for each of predetermined set of object categories, representing likelihood that input image includes image of object that belongs to the object category; categories may be generic or specific, such as “horses,” “George Washington,” generic numbers category, specific categories for each digit from zero to nine, etc.; each object category is associated with a label; paragraph 0025, receiving classification scores for data objects and generating label data for the data object).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Lavergne, Yee, Kikin-Gil, and Corrado in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments), Kikin-Gil (directed to utilizing machine-learning to analyze, evaluate, and make inferences regarding data objects and their relationships, such as a data object input by a user with respect to other data objects, such as in a slide of a presentation application), and Yee (directed to image classification), to incorporate the teachings of Corrado (directed to classifying data objects) to include the capability to generate, as the classification label for an image object (i.e. as taught by Lee), a label which is based on the subject matter of the image, such as by identifying one or more categories for the image based on the content/subject matter in the image and generating a label for the image based on identified categories.  One of ordinary skill would have been motivated to make such a modification in order to provide the capability to accurately predict category labels for data objects, including the accuracy of zero-shot predictions, to syntactically or semantically relate inaccurately predicted labels to a correct label, to easily predict labels that are specific, generic, or both for a given data object, and to improve the accuracy of initial data object classification systems without training the system further and without any significant increase in computing resources used by the initial classification system as described in Corrado (paragraph 0007).
Claims 31, 38, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil, further in view of Stanton et al. (US 20160364419 A1).
With respect to claim 40, Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil teaches all of the limitations of claim 1 as previously discussed.  Lee additionally teaches that the various objects in the page may comprise an image and/or text (e.g. paragraph 0145, defining handwriting content as figure object 410, text object 420, list object 430, as shown in Fig. 4).  In addition, Yee further teaches wherein the different object type comprises an image object type (e.g. col. 5 lines 10-22, relevant text identified for an image using image classification model that generates relevance score for image to text; single word or string of words identified as textual unit that is relevant to the image; col. 7 lines 16-51, image classification subsystem identifies text 166 that appears with the image 164 on the web page 162 as being associated with the image 164, and parses identified text into n-grams; obtaining image classification models that have been trained for the set of n-grams, providing feature vector for image 164 as input to each model, receiving scores indicating whether the image is positive or negative for each of the n-grams; n-grams for which the image is positive are identified as high confidence labels for the image, which are associated with the image 164, such as by storing or indexing the high confidence labels at memory locations corresponding to the image; i.e. where text is displayed as an image on the screen and, therefore, under the broadest reasonable interpretation, is a type of image object).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Lavergne, Kikin-Gil, and Yee in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), Kikin-Gil (directed to utilizing machine-learning to analyze, evaluate, and make inferences regarding data objects and their relationships, such as a data object input by a user with respect to other data objects, such as in a slide of a presentation application), and Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments), to incorporate the teachings of Yee (directed to image classification) to include the capability to determine that objects are associated with one another within an application file (such as a webpage) based on relative positions/placements of the objects in the application file (such as determining the objects as being relevant to one another based on the objects being within a threshold number of pixels of one another), where the objects may be of different types, including different types of images (i.e. a textual image object and another type of image object such as a photograph, figure, drawing, illustration, etc.).  One of ordinary skill would have been motivated to make such a modification in order to provide the capability to compute relevance scores for each individual word associated with an image, such that images are associated with high confidence labels, boosting the relevance scores to a search query for images with labels matching the query (i.e. allowing for more relevant image search results to be provided in response to a query) as described in Yee (col. 2 lines 49-57).
Assuming arguendo that Lee, Lavergne, and Yee do not explicitly disclose wherein the different object type comprises an image object type, Stanton teaches wherein the different object type comprises an image object type (e.g. paragraph 0014, text and image data of data set retrieved; dataset corresponds to product for sale and includes information about corresponding product; tags are generated by inputting image and text data into classifiers; rather than merely indexing text information specified in datasets, additional information generated by analyzing image of product to identify tags/attributes about contents of the image; dataset describes clothing item with single overall color in text data record of dataset, image recognition on photograph of the item identifies additional colors/patterns; paragraph 0016, extracting additional information about dataset using images; new information output as tags to be associated with the dataset; paragraph 0022, dataset includes information about specific product such as color, price, text description, title, image, etc.; each dataset is a document; paragraph 0023, extracting text content from dataset; paragraph 0024, images included in datasets and are extracted for analysis; paragraph 0025, automatically generating tags for dataset, where tag is descriptive of subject of corresponding dataset and is generated by recognizing content of an image of the subject of the dataset; paragraph 0029, tags of data set identified using classifier; classifier using as inputs text data and image of the dataset; product category determined using both image of product and product title included in dataset; paragraph 0030, tags identified with sufficient confidence associated with dataset; paragraph 0031, generating additional tags using subclassifiers, such as color of specific object component included in an image, property of color, quality of color, darkness/lightness of color, feature/neckline/length/sleeve property of clothing, etc.; paragraph 0032, additional tags with sufficient confidence associated with dataset; paragraphs 0036-0037, selecting text data in data set as input, recognizing properties of text data, identifying tags for dataset; paragraphs 0038-0040, images of dataset selected for input, such as image included in dataset and depicting subject of dataset, selected to generate additional tags for dataset; processing images to generate output using neural network, where output provided to identify tags for dataset; result of image processing utilized as input to identify tags of dataset; by utilizing both image and text data, accuracy of identified tags is improved; paragraphs 0044-0045, determining whether content of dataset is inconsistent with received tags; text description in dataset may be incorrect such as describing incorrect color of product as red when the actual product is blue in color as evidenced in photographs of the product; verifying each tag is consistent with text attribute in dataset; resolving inconsistency, such as by modifying content of dataset to be consistent with the inconsistent tag, removing inconsistent portion of dataset, etc.; i.e. where the dataset is provided as a document including both text content and image content (i.e. objects of different types), the dataset as a whole (and therefore including the text content) may be assigned classification labels/tags based on an analysis of the content of the image object in the document (i.e. where the image object is an additional object of a different type than the text content, which is assigned classification labels/tags based on the image object by virtue of the assignment of the label/tag to the document/data set as a whole, and also possibly as part of resolving an inconsistency between the tag derived from the image object and the textual description)).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Lavergne, Yee, Kikin-Gil, and Stanton in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments), Kikin-Gil (directed to utilizing machine-learning to analyze, evaluate, and make inferences regarding data objects and their relationships, such as a data object input by a user with respect to other data objects, such as in a slide of a presentation application), and Yee (directed to image classification), to incorporate the teachings of Stanton (directed to image and text data hierarchical classifiers) to include the capability to provide classification labels/tags for content objects of one type in a document (i.e. the data set of Stanton), such as text (and/or for the content as a whole, including the text content), based on a content object of a different type in the same document, such as an image (i.e. determining relevant tags for the image included in the document and assigning those tags to the document as a whole, including to the text included within the document, as taught by Stanton).  One of ordinary skill would have been motivated to make such a modification in order to jointly utilize text data and image data of a data set in order to improve the accuracy and detection of automatically generated descriptive tags of the dataset as described in Stanton (paragraph 0015, 0040).
With respect to claims 31 and 38, Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil teaches all of the limitations of claims 26 and 32 as previously discussed. Lee additionally teaches that the various objects in the page may comprise an image and/or text (e.g. paragraph 0145, defining handwriting content as figure object 410, text object 420, list object 430, as shown in Fig. 4).  In addition, Yee further teaches wherein when the one or more additional objects comprise an image, the classification label is based on subject matter contained in the image (e.g. col. 5 lines 10-22, relevant text identified for an image using image classification model that generates relevance score for image to text; single word or string of words identified as textual unit that is relevant to the image; col. 7 lines 16-51, image classification subsystem identifies text 166 that appears with the image 164 on the web page 162 as being associated with the image 164, and parses identified text into n-grams; obtaining image classification models that have been trained for the set of n-grams, providing feature vector for image 164 as input to each model, receiving scores indicating whether the image is positive or negative for each of the n-grams; n-grams for which the image is positive are identified as high confidence labels for the image, which are associated with the image 164, such as by storing or indexing the high confidence labels at memory locations corresponding to the image; i.e. where text is displayed as an image on the screen and, therefore, under the broadest reasonable interpretation, is a type of image object, and the classification label generated for the other image object (i.e. a non-textual image object) is generated based on the subject matter contained in the text, such as words in the text).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Lavergne, Kikin-Gil, and Yee in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), Kikin-Gil (directed to utilizing machine-learning to analyze, evaluate, and make inferences regarding data objects and their relationships, such as a data object input by a user with respect to other data objects, such as in a slide of a presentation application), and Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments), to incorporate the teachings of Yee (directed to image classification) to include the capability to determine that objects are associated with one another within an application file (such as a webpage) based on relative positions/placements of the objects in the application file (such as determining the objects as being relevant to one another based on the objects being within a threshold number of pixels of one another), where the objects may be of different types, including different types of images (i.e. a textual image object and another type of image object such as a photograph, figure, drawing, illustration, etc.).  One of ordinary skill would have been motivated to make such a modification in order to provide the capability to compute relevance scores for each individual word associated with an image, such that images are associated with high confidence labels, boosting the relevance scores to a search query for images with labels matching the query (i.e. allowing for more relevant image search results to be provided in response to a query) as described in Yee (col. 2 lines 49-57).
Assuming arguendo that Lee, Lavergne, and Yee do not explicitly disclose wherein when the one or more additional objects comprise an image, the classification label is based on subject matter contained in the image, Stanton teaches wherein when the one or more additional objects comprise an image, the classification label is based on subject matter contained in the image (e.g. paragraph 0014, text and image data of data set retrieved; dataset corresponds to product for sale and includes information about corresponding product; tags are generated by inputting image and text data into classifiers; rather than merely indexing text information specified in datasets, additional information generated by analyzing image of product to identify tags/attributes about contents of the image; dataset describes clothing item with single overall color in text data record of dataset, image recognition on photograph of the item identifies additional colors/patterns; paragraph 0016, extracting additional information about dataset using images; new information output as tags to be associated with the dataset; paragraph 0022, dataset includes information about specific product such as color, price, text description, title, image, etc.; each dataset is a document; paragraph 0023, extracting text content from dataset; paragraph 0024, images included in datasets and are extracted for analysis; paragraph 0025, automatically generating tags for dataset, where tag is descriptive of subject of corresponding dataset and is generated by recognizing content of an image of the subject of the dataset; paragraph 0029, tags of data set identified using classifier; classifier using as inputs text data and image of the dataset; product category determined using both image of product and product title included in dataset; paragraph 0030, tags identified with sufficient confidence associated with dataset; paragraph 0031, generating additional tags using subclassifiers, such as color of specific object component included in an image, property of color, quality of color, darkness/lightness of color, feature/neckline/length/sleeve property of clothing, etc.; paragraph 0032, additional tags with sufficient confidence associated with dataset; paragraphs 0036-0037, selecting text data in data set as input, recognizing properties of text data, identifying tags for dataset; paragraphs 0038-0040, images of dataset selected for input, such as image included in dataset and depicting subject of dataset, selected to generate additional tags for dataset; processing images to generate output using neural network, where output provided to identify tags for dataset; result of image processing utilized as input to identify tags of dataset; by utilizing both image and text data, accuracy of identified tags is improved; paragraphs 0044-0045, determining whether content of dataset is inconsistent with received tags; text description in dataset may be incorrect such as describing incorrect color of product as red when the actual product is blue in color as evidenced in photographs of the product; verifying each tag is consistent with text attribute in dataset; resolving inconsistency, such as by modifying content of dataset to be consistent with the inconsistent tag, removing inconsistent portion of dataset, etc.; i.e. where the dataset is provided as a document including both text content and image content (i.e. an object and one or more additional objects), the dataset as a whole (and therefore including the text content) may be assigned classification labels/tags based on an analysis of the content/subject matter of the image object in the document (i.e. where the image object is an additional object of a different type than the text content, which is assigned classification labels/tags based on the image object by virtue of the assignment of the label/tag to the document/data set as a whole, and also possibly as part of resolving an inconsistency between the tag derived from the image object and the textual description, such as identifying a tag based on the content/subject matter of the image which describes various attributes of the product which is also described by the corresponding text)).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Lavergne, Yee, Kikin-Gil, and Stanton in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments), Kikin-Gil (directed to utilizing machine-learning to analyze, evaluate, and make inferences regarding data objects and their relationships, such as a data object input by a user with respect to other data objects, such as in a slide of a presentation application), and Yee (directed to image classification), to incorporate the teachings of Stanton (directed to image and text data hierarchical classifiers) to include the capability to provide classification labels/tags for content objects of one type in a document (i.e. the data set of Stanton), such as text (and/or for the content as a whole, including the text content), based on subject matter contained in an additional content object of a different type in the same document, such as an image (i.e. determining relevant tags for the image included in the document based on the subject matter contained in the image and assigning those tags to the document as a whole, including to the text included within the document, as taught by Stanton).  One of ordinary skill would have been motivated to make such a modification in order to jointly utilize text data and image data of a data set in order to improve the accuracy and detection of automatically generated descriptive tags of the dataset as described in Stanton (paragraph 0015, 0040).
Claims 2 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil, further in view of Johnson et al. (US 20050027664 A1).
With respect to claims 2 and 33, Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil teaches all of the limitations of claims 1 and 32 as previously discussed, and Lee further teaches filtering the classification label according to one or more predefined filters, one or more user-defined filters, or a combination thereof (e.g. paragraph 0105, grouping handwriting according to segmentation level, which may be “document,” “text/non-text,” “text/figure/list/table,” “paragraph,” “text-line,” “word,” “character,” “stroke,” etc.; paragraph 0106, when certain segmentation level is “text/non-text” strokes are grouped based on having text characteristics and non-text characteristics, and defined as text objects and non-text objects; paragraph 0107, when segmentation level is “text line” content is grouped into objects for first line, second line, third line, etc.; paragraph 0109, adjusting segmentation level according to user input; i.e. the selected segmentation level determines how objects are grouped and, therefore, which labels/classifications are applied to the objects as described in paragraphs 0139-0141; that is, when a “text/non-text” segmentation level is applied, labels for objects may be “text” or “non-text”; where a “text/figure/list/table” segmentation level is applied, labels for objects may be “text,” “figure,” “list,” “table,” as shown in Figs. 4 and 22; therefore, the segmentation level, which is user-selectable, acts as a filter for how the objects are classified, and ultimately which labels are applied to them; paragraph 0292, Fig. 26, filter 2600 used for selecting retrieval condition, including retrieval category including object type; paragraph 0293, selecting object type 2610 in filter, displaying list of types, selecting type such as “Figure,” etc., retrieving objects having the selected type).
Lee does not explicitly disclose wherein the one or more predefined filters, the one or more user-defined filters, or a combination thereof are for filtering out derogatory language, metadata names previously overridden, or any combination thereof.  However, Johnson teaches wherein the one or more predefined filters, the one or more user-defined filters, or a combination thereof are for filtering out derogatory language, metadata names previously overridden, or any combination thereof (e.g. paragraph 0002, identifying, demarcating and labeling (i.e. annotating) information in textual data; paragraph 0030, user labels data, learner learns in the background concurrently; paragraph 0033, labeling named entity instances; paragraph 0035, system annotating data based on training data; presenting learned annotations to user for evaluation and correction if needed; paragraph 0036, maintaining a log of user corrections such that if a person removes an annotation instance or alters the class name of annotation instances, and the system later attempts to re-annotate incorrectly, system overrides the learning algorithm’s assignments; i.e. where the system applies the log of user corrections as a filter in order to filter out previously-corrected annotations/labels which would otherwise be suggested/applied to the content).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Lavergne, Yee, Kikin-Gil, and Johnson in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), Yee (directed to image classification), Kikin-Gil (directed to utilizing machine-learning to analyze, evaluate, and make inferences regarding data objects and their relationships, such as a data object input by a user with respect to other data objects, such as in a slide of a presentation application), and Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments), to incorporate the teachings of Johnson (directed to an interactive machine learning system for automated annotation of information in text) to include the capability to maintain a log of user corrections to, modifications of, and removals of, system-assigned labels/annotations (i.e. produced via machine learning algorithm and applied to inserted content as taught by Lee), and to further apply the log of user corrections as a filter in order to filter out previously-corrected annotations/labels (i.e. by the user) which would otherwise be suggested/applied to the content (i.e. by the machine learning algorithm of Lee).  One of ordinary skill would have been motivated to perform such a modification in order to reduce the amount of manual labor and level of expertise required to train annotators of text documents, etc. as described in Johnson (paragraph 0026).
Claims 23, 29, and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil, further in view of Stewart (US 20110145327 A1).
With respect to claims 23, 29, and 36, Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil teaches all of the limitations of claims 1, 26, and 32 as previously discussed.  Lee and Lavergne do not explicitly disclose wherein updating the metadata name of the object comprises replacing the metadata name correctly describing the object with a more specific name correctly describing the object with relatively more detail.
However, Stewart teaches wherein updating the metadata name of the object comprises replacing the metadata name correctly describing the object with a more specific name correctly describing the object with relatively more detail (e.g. paragraphs 0025-0026, describing Figs. 22-23, tag created for local library; tag created which includes richer information and can be used to replace or flesh out the tag; paragraph 0066, creating tag data structures that describe people, places, groups/organizations, activities, interests, groups of interests, organization types, other complex entities; required and optional attributes; name and type may be required attributes; topic represented by tag data structure; tag type indicating type of concept the tag represents; paragraph 0067, associating tag data structure to item of media; paragraph 0068, interconnectedness of tag data structures allowing variety of different kinds of relevant information to be returned as contextual information relating to media items associated with a tag; paragraph 0069, local tag created within application instance prior to time when person identified by tag is aware or otherwise has provided data to be used in creation or maintenance of tag; at a later time, information provided by that person can supplement or replace the tag first created locally in the application instance; paragraph 0080, associations between content items and persons indicating relevance between them; using associations to display web of context for given media item, concept, or entity; paragraph 0088, relationships between tags; paragraph 0096, users assigning free form text tags to photos; tags existing in taxonomy have more meaning; paragraph 0097, creating tag taxonomy and using structure in association with media items; paragraph 0104, synchronization of changes between tags; paragraphs 0120-0121, describing Figs. 22 and 23, tag created by person who does not know the subject of the tag very well, which exists in local application instance of tag creator and used to tag media items; more complete tag structure including complete information shared and its information can be propagated into local application instance where less detailed tag currently resides; paragraphs 0133-0134, selecting labels/tags to be displayed based on subjective point of view of viewer, such as labeling a picture of a person as “Dad” instead of “Bill” or “Grandma Stewart” instead of “Vickie” when the viewer is the pictured person’s child or grandchild, using contextual information specific to the viewpoint of the present viewer, as shown in Figs. 12-13; paragraphs 0135-0137, tagging media in a more sophisticated way, based on defined relationships between tags and context; i.e. where replacing a less detailed tag created at a time before complete information was available with a more detailed tag created after complete information becomes available, or changing displayed labels for media items based on the subjective point of view of the viewer are both analogous to replacing a metadata name correctly describing the object with a more specific name correctly describing the object with relatively more detail).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Lavergne, Yee, Kikin-Gil, and Stewart in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments), Kikin-Gil (directed to utilizing machine-learning to analyze, evaluate, and make inferences regarding data objects and their relationships, such as a data object input by a user with respect to other data objects, such as in a slide of a presentation application), and Yee (directed to image classification), to incorporate the teachings of Stewart (directed to contextualizing and linking media items using tagging to allow description of a concept from multiple points of view and contexts) to include the capability to store, as a first metadata name for a given content item, a tag including relatively less specific/detailed/complete information at a time when complete information is not available and to subsequently, upon the complete information becoming available (i.e. such as after the content of the content item has been classified and assigned labels as taught by Lavergne and Ball), replace the less specific/detailed/complete tag information with the more complete/specific/detailed tag information, and to further include the capability to actively substitute displayed names for content items based on contextual and relationship information associated with the tag and the viewpoint of the current viewer (i.e. thereby replacing a more generic name for a displayed entity, such as a person’s name, with a more specific name for that entity, such as a name which also indicates the relationship of that entity to the viewer, as taught by Stewart).  One of ordinary skill would have been motivated to perform such a modification in order to enhance contextualization of media items and ease of use as described in Stewart (abstract).
Claim 41 is rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil, further in view of Kannan et al. (US 20120314941 A1).
With respect to claim 41, Lee in view of Lavergne, further in view of Yee, further in view of Kikin-Gil teaches all of the limitations of claim 1 as previously discussed.  Lee additionally teaches that the various objects in the page may comprise an image and/or text (e.g. paragraph 0145, defining handwriting content as figure object 410, text object 420, list object 430, as shown in Fig. 4).  In addition, Yee further teaches wherein the different object type comprises an image object type (e.g. col. 5 lines 10-22, relevant text identified for an image using image classification model that generates relevance score for image to text; single word or string of words identified as textual unit that is relevant to the image; col. 7 lines 16-51, image classification subsystem identifies text 166 that appears with the image 164 on the web page 162 as being associated with the image 164, and parses identified text into n-grams; obtaining image classification models that have been trained for the set of n-grams, providing feature vector for image 164 as input to each model, receiving scores indicating whether the image is positive or negative for each of the n-grams; n-grams for which the image is positive are identified as high confidence labels for the image, which are associated with the image 164, such as by storing or indexing the high confidence labels at memory locations corresponding to the image; i.e. where text is displayed as an image on the screen and, therefore, under the broadest reasonable interpretation, is a type of image object).  Further, Kikin-Gil teaches that the object may be a text object, the second object may be an image object, and a label may be generated based on both of the image and text objects on the slide (e.g. paragraph 0014, operations executed by neural networks or machine-learning processing; paragraph 0017, users creating or editing digital presentation documents including slide-based presentations; paragraph 0023, dragging name text object on photo object containing multiple people and moving cursor above specific individual prompting an offer to the user to use that text as a tag or a title; dragging text over group of images and moving text over one of the images might associate that text with that specific image and offer to use it as a caption for that specific image; paragraph 0024, drag and drop evaluation model configured to identify types of data objects involved in drag and drop action, inference rules set for working with types of content/objects; paragraph 0025, salient region detection processing recognizing placement options; inference made to add text data object as a caption of another data object/image; paragraph 0034, recognizing unified theme related to grouped content, and proposing a label, etc. for the group of objects; i.e. a type/label for the dragged text object may be determined (such as a determination that the text is a caption relevant to an image) and/or the text and image may be determined to be grouped content and a label for both the text and the image may be determined).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Lavergne, Yee, and Kikin-Gil in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments), and Yee (directed to image classification), to incorporate the teachings of Kikin-Gil (directed to utilizing machine-learning to analyze, evaluate, and make inferences regarding data objects and their relationships, such as a data object input by a user with respect to other data objects, such as in a slide of a presentation application) to include the capability to implement the system of Lee (i.e. including detecting user input of an object in an application file, generating a classification label for the object, etc.) in the context of a slide in a presentation application, such that when the user inputs the objects, such as a text object, to a slide within the presentation application file, which contains other objects such as image objects, machine learning models are applied to evaluate the objects (i.e. as taught by both Lee and Kikin-Gil), including to generate classification labels for the objects (i.e. as taught by Lee, Lavergne, Yee, and Kikin-Gil).  
One of ordinary skill would have been motivated to make such a modification in order to provide a plurality of technical advantages including generation of composite data objects, content mobility, improved processing efficiency for computing devices utilized for generating and managing content for drag and drop operations, generation and utilization of an exemplary drag and drop evaluation model, generation and application of inference rules for generating inferences about a drag and drop action, and improved user interaction and productivity as described in Kikin-Gil (paragraph 0013).
Assuming arguendo that Lee, Lavergne, Yee, and Kikin-Gil do not explicitly disclose wherein the object comprises a text object and the classification label is based at least in part upon an image object, Kannan teaches wherein the object comprises a text object and the classification label is based at least in part upon an image object (e.g. paragraph 0004, product images used in conjunction with textual descriptions to improve classifications of product offerings; using image signals to complement text classifiers and improve overall classification; paragraph 0017, Fig. 2, product offering 202 presented in both text 204 and image 206; product offering 212 presented in both text 214 and image 216; paragraph 0021, almost all products have associated image in addition to text; images can be used to provide additional clues that, when used in conjunction with available text, are able to improve classification; even if textual descriptions are uninformative, associated images contain discernable clues that can be utilized to form better classifiers; paragraph 0022, combining classifiers (text and image) to improve classification; paragraph 0024, text classifier trained on text features; paragraph 0025, image classifier trained on image features; paragraph 0026, text and image probability distributions concatenated to create MDFS with portions that effectively capture uncertainty in category prediction for text and image classifiers, and another classifier is learned using this MDFS, constituting a single large multi-way classifier that learns to predict labels using both text and image features; paragraph 0031, text and image training sets used to improve automated classifiers including text-only classifiers; modifying steps 324 and 424 of Figs. 3 and 4 to, after concatenating text and image probability distributions to the MDFS, learn a new classifier on the MDFS for predicting labels on text features only; i.e. where the portion of the file includes both text and images, the images provided with the text can be utilized in conjunction with/to complement the text to arrive at an improved overall classification and generate a predicted label using both text and image features, such as a classification label to be applied to the text, either individually or as a part of the overall offering, analogous to generating a classification label for a text object based at least in part upon an image object included within a same file portion).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Lee, Lavergne, Yee, Kikin-Gil, and Kannan in front of him to have modified the teachings of Lee (directed to a device and method of providing handwritten content, including labeling/naming content by type using machine learning), Lavergne (directed to using NLP, machine learning, and classification techniques, including classification labeling, for storage and retrieval of text segments), Kikin-Gil (directed to utilizing machine-learning to analyze, evaluate, and make inferences regarding data objects and their relationships, such as a data object input by a user with respect to other data objects, such as in a slide of a presentation application), and Yee (directed to image classification), to incorporate the teachings of Kannan  (directed to accurate text classification through selective use of image data) to include the capability to, when the input object is a text object (i.e. as taught by Lee and Kikin-Gil, where Kikin-Gil further teaches that the object may be input into a slide of a presentation program) and a classification label is generated for the input object (i.e. as taught by at least Lee), further determine the classification label based on an associated image object, such as by utilizing an image provided with the text in conjunction with the text or to complement the text to provide an improved overall classification and generate a predicted label using both text and image features (i.e. as taught by Kannan).  One of ordinary skill would have been motivated to perform such a modification in order to provide more robust classifications, and provide for a combined text and image classifier for those instances where a text classifier is inadequate, allowing images to play a beneficial role when a text-based classifier becomes “confused,” as described in Kannan (paragraphs 0023 and 0026).
	
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JEREMY STANLEY whose telephone number is (469) 295-9105. The examiner can normally be reached on Mon-Thurs 8:00-5:00 CST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Renee Chavez, can be reached at (571) 270-1104. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR.
Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JEREMY L STANLEY/
Examiner, Art Unit 2179