Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
	This action is in response to Applicant's patent application filed 02/28/2020, in light of the telephone interview on 07/07/2022. Claim(s) 1-20 are pending. Claim(s) 1, 11 and 20 are independent.
	Applicant is invited to contact the undersigned for any reason related to the advancement of this case. Applicant's attorney(s) agreed to the following: amending claim(s) 1-3, 7-9, 11-13, 17-18 and 20 to further clarify Applicant's claimed invention (i.e., ...generating figure metadata from the figure image data, the figure metadata including numerical and textual values describing the set of components of the figure; for each of a set of caption types, each of the set of caption types indicating a respective component of the set of components of the figure, computing a corresponding caption type vector representing the caption type and an embedding of a slot value word associated with the caption type; for each of the set of caption types, generating a caption unit of a set of caption units based on the figure image data, the figure metadata, the corresponding caption type vector and the embedding of the slot value word, each of the set of caption units corresponding to a respective component of the set of components and including a sequence of words describing that component; and combining the set of caption units to form a caption associated with the figure...), as supported by page 1, paragraph 16 of the specification of the current patent application. Claim(s) 4-6, 10, 14-16 and 19 are as originally filed. The claims have been amended in accordance with the substance of the telephone interview. Favorable consideration of the pending claims and passing them to allowance were agreed upon [see Interview Summary for details].

		Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee. 
 
EXAMINER’S AMENDMENT
The application has been amended as follows: 
 In the Claims:
	1. (Currently amended) A method, comprising:
receiving figure image data, the figure image data representing a figure, the figure having a set of components; 
generating figure metadata from the figure image data, the figure metadata including numerical and textual values describing the set of components of the figure;
for each of a set of caption types, each of the set of caption types indicating a respective component of the set of components of the figure, computing a corresponding caption type vector representing the caption type and an embedding of a slot value word associated with the caption type;
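for each of the set of caption types, generating a caption unit of a set of caption units based on the figure image data, the figure metadata, the corresponding caption type vector and the embedding of the slot value word, each of the set of caption units corresponding to a respective component of the set of components and including a sequence of words describing that component; and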
combining the set of caption units to form a caption associated with the figure. 

2. (Currently amended) The method as in claim 1, wherein generating the caption unit includes:
performing an encoding operation on the figure image data, the figure metadata, and the caption type to produce a decoder initialization array, the decoder initialization array including [[a]]the corresponding caption type vector representing the caption type and [[an]]the embedding of [[a]]the slot value word; and
initializing a decoder with the decoder initialization array, the decoder being configured to predict the sequence of words of the caption unit. 

3. (Currently amended) The method as in claim 2, wherein performing the encoding operation includes:
obtaining a set of features of the figure image based on the figure image data, the obtaining being performed via a neural network, each of the set of features having a corresponding bounding box of a set of bounding boxes, the figure metadata including bounding box coordinates of the set of bounding boxes;
generating an encoded input structure based on the bounding box coordinates, the encoded input structure including an array of input sequences, each element of the array of input sequences corresponding to a bounding box of the set of bounding boxes;
generating, from the encoded input structure and the caption type, a set of attention weights, each of the set of attention weights corresponding to a bounding box of the set of bounding boxes and representing a likelihood of the bounding box being associated with [[a]]the slot value word; and
obtaining a slot value classification result based on the set of attention weights. 

4. (Original) The method as in claim 3, wherein a caption type of the set of caption types indicates that the caption type is a label name describing a name of a text label corresponding to a bounding box of the set of bounding boxes and its position relative to a fixed location, and
wherein generating the set of attention weights includes:
forming a query vector based on the position of the text labels corresponding to each of the set of bounding boxes;
for each of the set of bounding boxes, generating a raw weight based on the array of input sequences corresponding to that bounding box and the query vector to produce a set of raw weights; and
performing a normalization operation on the set of raw weights to produce the set of attention weights, a sum of the attention weights of the set of attention weights being unity. 

5. (Original) The method as in claim 4, wherein generating the raw weight for each of the set of bounding boxes includes: 
multiplying the array of input sequences corresponding to that bounding box by a first fixed array to produce a first vector;
multiplying the query vector by a second fixed array to produce a second vector; and
applying a sigmoidal function to a sum of the first vector and the second vector. 

6. (Original) The method as in claim 3, wherein the figure metadata further includes a set of text labels, each of the set of text labels corresponding to a respective bounding box, and
wherein each array of input sequences of the encoded input structure corresponding to a respective bounding box of the set of bounding boxes includes the bounding box coordinates of the bounding box, an index indicative of the text label corresponding to the bounding box, and a binary value indicating whether the text label corresponding to the bounding box has digits only. 

7. (Currently amended) The method as in claim 6, wherein a caption type of the set of caption types indicates that the caption type describes an element of the figure that has a minimum or maximum value, and 
wherein generating the set of attention weights includes:
forming a query vector based on [[the]]a position of the text labels corresponding to each of the set of bounding boxes;
for each of the set of bounding boxes: 
appending the array of input sequences corresponding to that bounding box with features of the figure that are in the same row and column as that indicated by the coordinates of that bounding box to produce an augmented array of input sequences;
generating a raw weight based on the augmented array of input sequences corresponding to that bounding box and the query vector to produce a set of raw weights; and 
performing a normalization operation on the set of raw weights to produce the set of attention weights, a sum of the attention weights of the set of attention weights being unity. 

8. (Currently amended) The method as in claim 3, wherein a caption type of the set of caption types indicates that the caption type describes a comparison between a first element of the figure and a second element of the figure, and 
wherein generating the set of attention weights includes:
forming a query vector based on the first element and the second element; and
performing a relation classification operation on a difference between the array of input sequences corresponding to that bounding box for the first element and the array of input sequences corresponding to that bounding box for the second element to produce an attention weight. 

9. (Currently amended) The method as in claim 3, wherein obtaining the slot value classification result based on the set of attention weights includes: 
performing a classification operation on the set of attention weights and the set of features of the figure image to produce the slot value word, the classification operation being configured to predict, as the slot value word, a dictionary word among a static dictionary and a dynamic dictionary, the slot value word being used to initialize the decoder. 

10. (Original) The method as in claim 1, further comprising:
generating rules for a post-editing operation configured to produce a natural language caption unit from the caption unit; and
performing the post-editing operation on the caption associated with the figure by applying the generated rules to produce the natural language caption unit. 

11. (Currently amended) A computer program product including a non-transitory computer-readable storage medium and storing executable code that, when executed by at least one data processing apparatus, is configured to cause the at least one data processing apparatus to perform a method, the method comprising: 
receiving figure image data, the figure image data representing a figure, the figure having a set of components; 
generating figure metadata from the figure image data, the figure metadata including numerical and textual values describing the set of components of the figure;
for each of a set of caption types, each of the set of caption types indicating a respective component of the set of components of the figure, computing a corresponding caption type vector representing the caption type and an embedding of a slot value word associated with the caption type;
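for each of the set of caption types, generating a caption unit of a set of caption units based on the figure image data, the figure metadata, the corresponding caption type vector and the embedding of the slot value word, each of the set of caption units corresponding to a respective component of the set of components and including a sequence of words describing that component; and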
combining the set of caption units to form a caption associated with the figure. 

12. (Currently amended) The computer program product as in claim 11, wherein generating the caption unit includes: 
performing an encoding operation on the figure image data, the figure metadata, and the caption type to produce a decoder initialization array, the decoder initialization array including [[a]]the corresponding caption type vector representing the caption type and [[an]]the embedding of [[a]]the slot value word; and 
initializing a decoder with the decoder initialization array, the decoder being configured to predict the sequence of words of the caption unit. 

13. (Currently amended) The computer program product as in claim 12, wherein performing the encoding operation includes: 
obtaining a set of features of the figure image based on the figure image data, the obtaining being performed via a neural network, each of the set of features having a corresponding bounding box of a set of bounding boxes, the figure metadata including bounding box coordinates of the set of bounding boxes; 
generating an encoded input structure based on the bounding box coordinates, the encoded input structure including an array of input sequences, each element of the array of input sequences corresponding to a bounding box of the set of bounding boxes; 
generating, from the encoded input structure and the caption type, a set of attention weights, each of the set of attention weights corresponding to a bounding box of the set of bounding boxes and representing a likelihood of the bounding box being associated with [[a]]the slot value word; and
obtaining a slot value classification result based on the set of attention weights. 

14. (Original) The computer program product as in claim 13, wherein a caption type of the set of caption types indicates that the caption type is a label name describing a name of a text label corresponding to a bounding box of the set of bounding boxes and its position relative to a fixed location, and 
wherein generating the set of attention weights includes: 
forming a query vector based on the position of the text labels corresponding to each of the set of bounding boxes;
for each of the set of bounding boxes, generating a raw weight based on the array of input sequences corresponding to that bounding box and the query vector to produce a set of raw weights; and 
performing a normalization operation on the set of raw weights to produce the set of attention weights, a sum of the attention weights of the set of attention weights being unity. 

15. (Original) The computer program product as in claim 13, wherein the metadata further includes a set of text labels, each of the set of text labels corresponding to a respective bounding box, and 
wherein each array of input sequences of the encoded input structure corresponding to a respective bounding box of the set of bounding boxes includes the bounding box coordinates of the bounding box, an index indicative of the text label corresponding to the bounding box, and a binary value indicating whether the text label corresponding to the bounding box has digits only. 

16. (Original) The computer program product as in claim 15, wherein a caption type of the set of caption types indicates that the caption type describes an element of the figure that has a minimum or maximum value, and 
wherein generating the set of attention weights includes:
forming a query vector based on the position of the text labels corresponding to each of the set of bounding boxes;
for each of the set of bounding boxes:
appending the array of input sequences corresponding to that bounding box with features of the figure that are in the same row and column as that indicated by the coordinates of that bounding box to produce an augmented array of input sequences; 
generating a raw weight based on the augmented array of input sequences corresponding to that bounding box and the query vector to produce a set of raw weights; and 
performing a normalization operation on the set of raw weights to produce the set of attention weights, a sum of the attention weights of the set of attention weights being unity. 

17. (Currently amended) The computer program product as in claim 13, wherein a caption type of the set of caption types indicates that the caption type describes a comparison between a first element of the figure and a second element of the figure, and 
wherein generating the set of attention weights includes: 
forming a query vector based on the first element and the second element; and
performing a relation classification operation on a difference between the array of input sequences corresponding to that bounding box for the first element and the array of input sequences corresponding to that bounding box for the second element to produce an attention weight. 

18. (Currently amended) The computer program product as in claim 13, wherein obtaining the slot value classification result based on the set of attention weights includes: 
performing a classification operation on the set of attention weights and the figure image features to produce [[a]]the slot value word, the classification operation being configured to predict, as the slot value word, a dictionary word among a static dictionary and a dynamic dictionary, the slot value word being used to initialize the decoder. 

19. (Original) The computer program product as in claim 11, further comprising: 
generating rules for a post-editing operation configured to produce a natural language caption unit from the caption unit; and 
performing the post-editing operation on the caption associated with the figure by applying the generated rules to produce the natural language caption unit. 

20. (Currently amended) An apparatus comprising:
at least one memory including instructions; and
at least one processor that is operably coupled to the at least one memory and that is arranged and configured to execute instructions that, when executed, cause the at least one processor to:
receive figure image data, the figure image data representing a figure, the figure having a set of components;
generate figure metadata from the figure image data, the figure metadata including numerical and textual values describing the set of components of the figure; 
for each of a set of caption types, each of the set of caption types indicating a respective component of the set of components of the figure, compute a corresponding caption type vector representing the caption type and an embedding of a slot value word associated with the caption type;
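for each of the set of caption types, generate a caption unit of a set of caption units based on the figure image data, the figure metadata, the corresponding caption type vector and the embedding of the slot value word, each of the set of caption units corresponding to a respective component of the set of components and including a sequence of words describing that component; and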
combine the set of caption units to form a caption associated with the figure. 
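
For illustration only, the per-bounding-box encoding of claims 6 and 15 and the attention-weight computation of claims 4-5 and 14 can be sketched in Python. This is a hypothetical sketch under stated assumptions, not Applicant's implementation: the matrices w1 and w2 stand in for the recited "first fixed array" and "second fixed array", the projection vector v that reduces sigmoid(W1 h + W2 q) to a scalar raw weight is an assumption not recited in the claims, and softmax is assumed as the normalization operation that makes the attention weights sum to unity.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def encode_box(coords: Sequence[float], label_index: int, label_text: str) -> Vector:
    """Per-box input array (claim 6): bounding box coordinates, an index for the
    box's text label, and a binary flag that is 1.0 when the label is digits only."""
    digits_only = 1.0 if label_text.isdigit() else 0.0
    return [float(c) for c in coords] + [float(label_index), digits_only]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    # Multiply a (rows x cols) matrix by a length-cols vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def raw_weight(h: Vector, q: Vector, w1: Matrix, w2: Matrix, v: Vector) -> float:
    """Claim 5: apply a sigmoid to (w1 @ h) + (w2 @ q).
    Assumption: the hypothetical vector v collapses the result to one scalar."""
    first = matvec(w1, h)   # "first vector"
    second = matvec(w2, q)  # "second vector"
    activated = [sigmoid(a + b) for a, b in zip(first, second)]
    return sum(vi * ai for vi, ai in zip(v, activated))


def normalize(raw: Sequence[float]) -> Vector:
    """Claim 4: normalize raw weights so they sum to one (softmax assumed)."""
    peak = max(raw)  # subtract the max for numerical stability
    exps = [math.exp(r - peak) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(boxes: Sequence[Vector], q: Vector,
                      w1: Matrix, w2: Matrix, v: Vector) -> Vector:
    # One raw weight per bounding box, then normalization over all boxes.
    return normalize([raw_weight(h, q, w1, w2, v) for h in boxes])


# Usage with made-up values: two boxes, each encoded as a length-6 array.
boxes = [
    encode_box([0.1, 0.2, 0.3, 0.4], 0, "Sales"),
    encode_box([0.5, 0.6, 0.7, 0.8], 1, "2019"),
]
q = [1.0, 0.0]                                  # hypothetical query vector
w1 = [[1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0]]   # 2 x 6 "first fixed array"
w2 = [[0.5, 0.0], [0.0, 0.5]]                   # 2 x 2 "second fixed array"
v = [1.0, 1.0]                                  # hypothetical projection
weights = attention_weights(boxes, q, w1, w2, v)
print(weights)
```

Because the normalization runs over all bounding boxes, each weight can be read as the relative likelihood that its box is associated with the slot value word, as recited in claims 3 and 13.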



Examiner Comment(s)
Claim(s) 1-20 are allowed. 

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUOC A TRAN whose telephone number is (571)272-8664. The examiner can normally be reached Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Cesar Paula, can be reached at 571-272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/QUOC A TRAN/Primary Examiner, Art Unit 2177