






Internet Engineering Task Force                            A. Ovcharenko
Internet-Draft                                              25 July 2023
Intended status: Informational                                          
Expires: 26 January 2024


            Improving Data Quality through Special Text Tags
                  draft-improving-data-quality-tags-00

Abstract

   This document proposes the use of special text tags to enhance data
   quality and improve the understanding of user queries in
   conversational AI models.  By incorporating these tags, models can
   benefit from additional context and structure during training and
   inference, leading to more accurate and relevant responses.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 26 January 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.




Ovcharenko               Expires 26 January 2024                [Page 1]

Internet-Draft  Improving Data Quality through Special T       July 2023


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Motivation  . . . . . . . . . . . . . . . . . . . . . . . . .   2
   3.  Specification . . . . . . . . . . . . . . . . . . . . . . . .   3
     3.1.  Intent Tagging  . . . . . . . . . . . . . . . . . . . . .   3
     3.2.  Entity Tagging  . . . . . . . . . . . . . . . . . . . . .   3
     3.3.  Contextual Tags . . . . . . . . . . . . . . . . . . . . .   3
     3.4.  Quality Assessment Tags . . . . . . . . . . . . . . . . .   4
     3.5.  Emotion or Tone Markers . . . . . . . . . . . . . . . . .   4
   4.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   4
   5.  Security Considerations . . . . . . . . . . . . . . . . . . .   5
   6.  Interoperability  . . . . . . . . . . . . . . . . . . . . . .   5
   7.  Implementation and Deployment . . . . . . . . . . . . . . . .   5
   8.  Conclusion  . . . . . . . . . . . . . . . . . . . . . . . . .   5
   9.  Informative References  . . . . . . . . . . . . . . . . . . .   5
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .   6

1.  Introduction

   Conversational AI models often face challenges in data collection and
   text parsing, impacting their performance and reliability.  This
   proposal aims to address these challenges by introducing special text
   tags.  This approach draws inspiration from related works in natural
   language processing, information retrieval, and conversational AI.

2.  Motivation

   The motivation behind this proposal is to improve the quality of
   training data and enhance the understanding of user queries by
   incorporating special text tags.  The idea is influenced by research
   on intent recognition, entity extraction, and context modeling in
   natural language understanding.  Notable works include:

   *  Previous studies on intent recognition in dialogue systems have
      explored the use of intent tags to improve the accuracy of
      responses [intent-recognition].

   *  Named Entity Recognition (NER) techniques have been widely studied
      and applied in information extraction tasks.  These approaches
      inspire the entity tagging component proposed in this document
      [gibbs-sampling].

   *  Research on dialogue modeling has emphasized the importance of
      context and sequential information in generating coherent
      responses.  The contextual tags introduced in this proposal draw
      inspiration from these studies [contextual-understanding].






3.  Specification

3.1.  Intent Tagging

   Intent tags are used to label the intent or purpose of user queries,
   providing guidance to the model in generating more contextually
   appropriate responses.

   *  [intent-def]: For queries seeking definitions of terms.

   *  [intent-comp]: For queries comparing two or more entities.

   *  [intent-ex]: For queries requesting examples or instances.

   *  [intent-steps]: For queries seeking step-by-step instructions.

   *  [intent-adv-disadv]: For queries exploring the pros and cons of a
      topic.
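
   This section names the intent tags but does not fix where a tag
   sits relative to the query text.  As a minimal sketch, assuming
   tags are written as a prefix in front of the query, the following
   Python splits a tagged query into its intent tags and the remaining
   text.  The function name split_intent_tags and the prefix placement
   are assumptions made for illustration and are not part of this
   specification.

```python
import re

# Intent tags listed in Section 3.1.
INTENT_TAGS = {
    "intent-def",
    "intent-comp",
    "intent-ex",
    "intent-steps",
    "intent-adv-disadv",
}

# One bracketed token, optionally preceded by whitespace.
_LEADING_TAG = re.compile(r"\s*\[([a-z][a-z-]*)\]")


def split_intent_tags(query: str) -> tuple[list[str], str]:
    """Return (intent tags, remaining text) for a prefix-tagged query.

    Assumption: tags appear only as a prefix. Scanning stops at the
    first bracketed token that is not a known intent tag.
    """
    tags = []
    pos = 0
    while True:
        match = _LEADING_TAG.match(query, pos)
        if match is None or match.group(1) not in INTENT_TAGS:
            break
        tags.append(match.group(1))
        pos = match.end()
    return tags, query[pos:].strip()
```

   Under these assumptions, a query may carry several intent tags, and
   an untagged query is returned unchanged with an empty tag list.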

3.2.  Entity Tagging

   Entity tags are used to identify and label specific entities within
   the text, improving the model's understanding of user queries related
   to those entities.

   *  [entity-person]: For queries related to people or individuals.

   *  [entity-organization]: For queries related to organizations or
      companies.

   *  [entity-location]: For queries related to specific locations.

   *  [entity-date]: For queries related to dates or time.

   *  [entity-product]: For queries related to products or items.
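
   This section does not say how an entity tag is attached to the text
   it labels.  One possible reading, sketched below, places the tag
   directly before each mention.  The mentions mapping and the
   function tag_entities are hypothetical; a real system would obtain
   mentions from an NER component or from human annotators.

```python
import re


def tag_entities(text: str, mentions: dict[str, str]) -> str:
    """Insert "[entity-<type>] " in front of each known mention.

    `mentions` maps a surface string to an entity type from
    Section 3.2 (person, organization, location, date, product).
    Matching is plain substring matching with no word-boundary
    handling; longer mentions are tried first so that
    "New York City" wins over "New York".
    """
    if not mentions:
        return text
    ordered = sorted(mentions, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in ordered))
    return pattern.sub(
        lambda m: f"[entity-{mentions[m.group(0)]}] {m.group(0)}", text
    )
```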

3.3.  Contextual Tags

   Contextual tags mark contextual information, providing cues for
   maintaining a coherent and context-aware conversation.

   *  [context-background]: For providing background information or
      context.

   *  [context-constraints]: For indicating limitations or constraints.

   *  [context-previous-query]: For referring to a previous user query
      or conversation context.

   *  [context-next-steps]: For suggesting the next steps in a process
      or task.

   *  [context-clarification]: For seeking clarification or additional
      details.
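
   To illustrate how contextual tags might carry earlier turns into a
   new query, the sketch below assembles a tagged input from optional
   context segments.  The one-segment-per-line layout and the function
   build_tagged_query are assumptions; this section defines only the
   tag names.

```python
from typing import Optional


def build_tagged_query(
    query: str,
    background: Optional[str] = None,
    constraints: Optional[str] = None,
    previous_query: Optional[str] = None,
) -> str:
    """Prefix a query with the contextual segments that are present.

    Each segment is written on its own line as "[tag] text"; empty or
    missing segments are skipped. The untagged query comes last.
    """
    segments = [
        ("context-background", background),
        ("context-constraints", constraints),
        ("context-previous-query", previous_query),
    ]
    lines = [f"[{tag}] {text}" for tag, text in segments if text]
    lines.append(query)
    return "\n".join(lines)
```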

3.4.  Quality Assessment Tags

   Quality assessment tags help identify the quality or reliability of
   information, enabling the model to generate more cautious and
   reliable responses.

   *  [qa-biased]: Indicating biased information.

   *  [qa-unverified]: Denoting information that is not verified or
      lacks credibility.

   *  [qa-misleading]: Highlighting information that may be misleading
      or deceptive.

   *  [qa-outdated]: Identifying information that is outdated or no
      longer accurate.

   *  [qa-fact-check-needed]: Flagging information that requires fact-
      checking.
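
   One way a data pipeline could act on quality assessment tags is to
   separate flagged records from the rest before training.  The sketch
   below assumes records are plain strings with tags embedded in
   brackets; what happens to flagged records afterwards (dropping,
   down-weighting, or human review) is left open here, and the
   function name partition_by_quality is hypothetical.

```python
import re

# Quality assessment tags listed in Section 3.4.
QA_TAGS = {
    "qa-biased",
    "qa-unverified",
    "qa-misleading",
    "qa-outdated",
    "qa-fact-check-needed",
}

_TAG = re.compile(r"\[([a-z][a-z-]*)\]")


def partition_by_quality(records):
    """Split records into (clean, flagged), preserving input order.

    A record is flagged when it contains at least one quality
    assessment tag anywhere in its text.
    """
    clean, flagged = [], []
    for record in records:
        found = set(_TAG.findall(record)) & QA_TAGS
        (flagged if found else clean).append(record)
    return clean, flagged
```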

3.5.  Emotion or Tone Markers

   Emotion or tone markers indicate the emotional or tonal aspects of
   the text, enabling the model to generate more appropriate and
   empathetic responses.

   *  [tone-positive]: Denoting a positive emotional tone.

   *  [tone-negative]: Indicating a negative emotional tone.

   *  [tone-neutral]: Denoting a neutral or unbiased tone.

   *  [tone-joy]: Indicating a joyful or happy emotion.

   *  [tone-sadness]: Denoting a sad or sorrowful emotion.
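
   This section does not state whether tone markers may be combined.
   If an implementation chooses to treat the three polarity markers
   (positive, negative, neutral) as mutually exclusive, which is an
   assumption of this sketch and not a rule of this document, an
   annotation check could look as follows.  The helper name
   polarity_conflicts is hypothetical.

```python
# Polarity markers from Section 3.5; treating them as mutually
# exclusive is an assumption of this sketch.
POLARITY_TAGS = {"tone-positive", "tone-negative", "tone-neutral"}


def polarity_conflicts(tags):
    """Return the sorted polarity tags if more than one is present.

    Returns an empty list when the tag set carries zero or one
    polarity marker. Emotion markers such as tone-joy are ignored.
    """
    present = sorted(POLARITY_TAGS.intersection(tags))
    return present if len(present) > 1 else []
```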

4.  IANA Considerations

   This memo includes no request to IANA.


5.  Security Considerations

   Implementing special text tags does not, in itself, introduce new
   security risks.  However, the tagging process and the underlying
   data collection need to follow secure and privacy-conscious
   practices and adhere to existing guidelines [usage-policies].

6.  Interoperability

   Interoperability is crucial for the widespread adoption of special
   text tags.  Standardization is needed to ensure that tags are used
   and interpreted consistently across different conversational AI
   models and platforms.  Implementers are encouraged to collaborate
   with standardization bodies and to build on existing efforts in the
   field, such as [caml-dialogue-systems].
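
   Consistent interpretation across systems presupposes a shared
   vocabulary.  As a rough sketch, the tags from Section 3 can be held
   in a single table and incoming text checked against it.  The
   grouping by prefix and the function unknown_tags are illustrative
   choices; this document does not define a registry.

```python
import re

# Tag vocabulary from Section 3, grouped by prefix.
VOCABULARY = {
    "intent": {"def", "comp", "ex", "steps", "adv-disadv"},
    "entity": {"person", "organization", "location", "date", "product"},
    "context": {
        "background",
        "constraints",
        "previous-query",
        "next-steps",
        "clarification",
    },
    "qa": {
        "biased",
        "unverified",
        "misleading",
        "outdated",
        "fact-check-needed",
    },
    "tone": {"positive", "negative", "neutral", "joy", "sadness"},
}

KNOWN_TAGS = {
    f"{prefix}-{name}"
    for prefix, names in VOCABULARY.items()
    for name in names
}

_TAG = re.compile(r"\[([a-z][a-z-]*)\]")


def unknown_tags(text):
    """Return bracketed tokens not in the vocabulary, in order.

    Any lowercase bracketed token counts as a candidate tag, so
    ordinary bracketed words in the text are reported as well.
    """
    return [t for t in _TAG.findall(text) if t not in KNOWN_TAGS]
```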

7.  Implementation and Deployment

   Integrating special text tags involves three practical steps:
   having human annotators or domain experts tag the training data
   accurately, modifying training processes to take the tags into
   account, and updating inference systems to interpret and respond to
   tagged user queries effectively.
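
   The annotation step could produce records such as the following.
   This is a sketch under stated assumptions: the AnnotatedExample
   schema, the "prompt" and "completion" field names, and the choice
   of one JSON object per line are all hypothetical and are not
   prescribed by this document.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AnnotatedExample:
    """One training example as an annotator might record it."""

    query: str
    response: str
    tags: list[str] = field(default_factory=list)


def to_training_line(example: AnnotatedExample) -> str:
    """Serialize an example as one JSON line.

    Tags are written in brackets in front of the query, in the order
    the annotator supplied them.
    """
    prefix = "".join(f"[{tag}] " for tag in example.tags)
    return json.dumps(
        {"prompt": prefix + example.query, "completion": example.response}
    )
```

   Keeping the tags as a separate list until serialization lets the
   training process change the on-the-wire placement without asking
   annotators to redo their work.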

8.  Conclusion

   The proposed special text tags offer a structured approach to enrich
   the training data of conversational AI models.  Incorporating these
   tags can improve data quality, enhance understanding of user
   queries, and help models generate more accurate and contextually
   relevant responses.  Further research and experimentation are
   encouraged to evaluate these potential benefits.

9.  Informative References

   [intent-recognition]
              Chen, M., Xu, Z., Weinberger, K., and O. Chapelle,
              "Marginalized Denoising Autoencoders for Domain
              Adaptation", 2012,
              <https://www.cs.cornell.edu/~kilian/papers/
              msdadomain.pdf>.

   [gibbs-sampling]
              Finkel, J. R., Grenager, T., and C. Manning,
              "Incorporating Non-local Information into Information
              Extraction Systems by Gibbs Sampling", 2005,
              <https://www.aclweb.org/anthology/P/P05/P05-1045.pdf>.

   [contextual-understanding]
              Ritter, A., Cherry, C., and B. Dolan, "Data-driven
              Response Generation in Social Media", 2011,
              <https://www.aclweb.org/anthology/D/D11/D11-1145.pdf>.

   [usage-policies]
              OpenAI, "Usage policies", 2021,
              <https://openai.com/policies/usage-policies>.

   [caml-dialogue-systems]
              Kovasznai, G., Kotropoulos, C., and I. Pitas, "CAML - A
              Universal Configuration Language for Dialogue Systems",
              <https://citeseerx.ist.psu.edu/doc/10.1.1.1086.4050>.

Author's Address

   Aleksey Ovcharenko
   Email: aleksey.ovcharenko@gmail.com


























