LLMs in Education: Novel Perspectives, Challenges, and Opportunities

by Bashar Alhafni, Sowmya Vajjala, Stefano Bannò, Kaushal Kumar Maurya, Ekaterina Kochmar. https://arxiv.org/html/2409.11917v1

Contents

LLMs in Education

Abstract

Overview:

  • Role of LLMs in education is an increasing area of interest
  • Offers new opportunities for teaching, learning, and assessment
  • This tutorial provides an overview of educational applications of NLP
    • Impact of recent advances in LLMs on this field

Discussion Topics:

  1. Key challenges and opportunities presented by LLMs in education
  2. Four major educational applications:
    • Reading skills
    • Writing skills
    • Speaking skills
    • Intelligent tutoring systems (ITS)

Audience:

  • Researchers and practitioners interested in the educational applications of NLP
  • The first tutorial to address this timely topic at COLING 2025.

1 Introduction

Large Language Models (LLMs)

  • GPT-3.5 (ChatGPT): remarkable capabilities across various tasks Wei et al. (2022); Minaee et al. (2024)
  • Rapid adoption by EdTech companies: Duolingo Naismith et al. (2023), Grammarly Raheja et al. (2023)
  • Impact on educational applications research and development: enabling new opportunities in writing assistance, personalization, and interactive teaching and learning, among others

Ethical Considerations in Integrating LLMs into Educational Settings

  • Paradigm shift in educational applications
  • Present novel challenges regarding ethical considerations Bommasani et al. (2021)

Key Topics Covered

  • Examining the challenges and opportunities presented by LLMs for educational applications through four key tasks:
    • Reading skills
    • Writing skills
    • Speaking skills
    • Intelligent tutoring systems (ITS)

2 Outline

  • Impact of LLMs on:
    • Writing assistance
    • Reading assistance
    • Spoken language learning and assessment
    • Development of Intelligent Tutoring Systems (ITS)

2.2 LLMs for Writing Assistance

Grammatical Error Correction (GEC) Tutorial

Overview:

  • Focuses on GEC for writing assistance
  • Automatically detects and corrects errors in text
  • Provides pedagogical benefits to L1 and L2 language teachers and learners: instant feedback, personalized learning
  • Covers history, popular datasets (Yannakoudakis et al., 2011; Napoles et al., 2017; Náplava et al., 2022), evaluation methods (Bryant et al., 2023), and techniques: rule-based to sequence-to-sequence models
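The evaluation methods above typically compare system output to references at the level of individual edits. As a simplified illustration of the idea (not the ERRANT toolkit itself, which also classifies error types), token-level edits can be extracted by aligning the source and corrected sentences:

```python
from difflib import SequenceMatcher

def extract_edits(source: str, corrected: str):
    """Align source and corrected token sequences and return
    (start, end, replacement) edits over source token spans.
    A simplified stand-in for alignment tools such as ERRANT."""
    src, cor = source.split(), corrected.split()
    edits = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, src, cor).get_opcodes():
        if tag != "equal":  # covers insertions, deletions, and replacements
            edits.append((i1, i2, " ".join(cor[j1:j2])))
    return edits

# One substitution: source tokens 1..2 ("go") replaced by "went"
edits = extract_edits("She go to school yesterday", "She went to school yesterday")
```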

LLMs for GEC:

  • Thorough overview of using Large Language Models (LLMs) for GEC
  • Evaluates their performance in terms of fluency, coherence, and fixing various error types across popular benchmarks (Fang et al., 2023; Raheja et al., 2024; Katinskaia and Yangarber, 2024)
  • Discusses prompting techniques and strategies for GEC, evaluating their effectiveness and limitations (Loem et al., 2023)
  • Compares LLMs to supervised GEC approaches, examining strengths and weaknesses (Omelianchuk et al., 2024)
  • Discusses using LLMs to evaluate GEC systems (Kobayashi et al., 2024)
  • Provides insights into future directions: evaluation, usability, interpretability from a user-centric perspective.
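When comparing LLMs to supervised GEC systems on the benchmarks above, scores are conventionally reported as edit-level F0.5, which weights precision twice as heavily as recall so that unnecessary corrections are penalized more than missed ones. A minimal sketch of the metric (edits represented as hashable tuples is an illustrative choice):

```python
def f_beta(hyp_edits: set, gold_edits: set, beta: float = 0.5) -> float:
    """Edit-level F_beta; beta=0.5 (the GEC standard) favors precision."""
    tp = len(hyp_edits & gold_edits)  # edits the system got exactly right
    if tp == 0:
        return 0.0
    precision = tp / len(hyp_edits)
    recall = tp / len(gold_edits)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

hyp = {(1, 2, "went"), (3, 4, "the")}   # one correct edit, one spurious
gold = {(1, 2, "went")}
score = f_beta(hyp, gold)  # P=0.5, R=1.0 -> F0.5 = 0.625/1.125 ≈ 0.556
```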

2.3 LLMs for Reading Assistance

  • Reading is a core component of literacy; the ability to read and comprehend text plays a major role in critical thinking and effective communication
  • Over two decades of NLP research on:
    • Readability assessment
    • Text simplification
  • Focused on improving accessibility to written content

Approaches:

  • A range of methods explored over time, from traditional readability formulas and feature-based classifiers to neural and transformer-based models
  • Key challenges include maintaining context and preserving meaning while making adjustments

Domain Specific Issues and Multilingual Approaches:

  • Research addressing domain-specific issues Garimella et al. (2021)
    • Medical terminology, technical jargon
  • Advancements in multilingual text simplification Saggion et al. (2022)

LLMs for Reading Support:

  • Arrival of LLMs led to new advances in readability assessment and text simplification Kew et al. (2023)
  • Zero-shot usage, prompt tuning, fine-tuning Lee and Lee (2023); Tran et al. (2024)
    • Improving accuracy and relevance to individual users

Current Limitations:

  • Limitations of current approaches Štajner (2021)
    • Difficulty in preserving context and meaning when making adjustments
  • Focus on user-based evaluation Vajjala and Lučić (2019); Säuberli et al. (2024); Agrawal and Carpuat (2024)
    • Ensuring that simplifications are meaningful to the intended audience

Future Directions:

  • Increased focus on user-based evaluation
  • Supporting more languages Shardlow et al. (2024)
  • Exploring new techniques for preserving context and meaning while simplifying text.
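The traditional readability formulas that predate the NLP approaches covered here combine simple surface statistics. As a concrete example, the Flesch-Kincaid grade level can be sketched as follows; the syllable counter is a crude heuristic of my own, not a component of any cited system:

```python
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")

def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    groups = len(VOWEL_GROUPS.findall(word))
    if word.endswith("e") and groups > 1:
        groups -= 1
    return max(1, groups)

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentence) + 11.8 * (syllables / word) - 15.59"""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59
```

Such formulas ignore meaning entirely, which is precisely the gap that feature-based and LLM-based readability models aim to close.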

2.4 LLMs for Spoken Language Learning and Assessment

Speaking as a Crucial Language Skill

  • Speaking is a core language skill in language education curricula (Fulcher, 2000)
  • Increasing interest in automated spoken language proficiency assessment

Overview of Automated Spoken Language Assessment and Writing Assessment

  • History: overview of approaches to speaking assessment and their written counterpart, automatic essay scoring (Burstein, 2002; Rudner et al., 2006; Landauer et al., 2002)
  • Breakthroughs: first commercial systems for speech assessment (Townshend et al., 1998; Xi et al., 2008) and deep neural network approaches for writing (Alikaniotis et al., 2016) and speaking (Malinin et al., 2017) assessment
  • Applications: analytic assessment, grammatical error detection (GED), and grammatical error correction (GEC)

Focus on LLMs for Spoken Language Assessment and Feedback

  • Use of text-based foundation models like BERT (Devlin et al., 2019) for holistic assessment (Craighead et al., 2020; Raina et al., 2020; Wang et al., 2021) and analytic assessment approaches (Bannò et al., 2024b)
  • Use of speech foundation models like wav2vec 2.0 (Baevski et al., 2020), HuBERT (Hsu et al., 2021), and Whisper (Radford et al., 2022) for mispronunciation detection, pronunciation assessment (Kim et al., 2022), and holistic assessment (Bannò and Matassoni, 2023; Bannò et al., 2023)
  • Addressing interpretability issues using analytic assessment approaches for writing
  • Exploring opportunities for multimodal models like SALMONN (Tang et al., 2024) and Qwen Audio (Chu et al., 2023), as well as text-to-speech models like Bark (Schumacher et al., 2023) and Voicecraft (Peng et al., 2024) for assessment and learning.
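Speech foundation models such as wav2vec 2.0 emit a sequence of frame-level embeddings, and a common recipe for holistic proficiency scoring is to pool those frames over time and regress a score from the pooled vector. The pattern can be sketched in plain Python; the dimensions, weights, and score scale below are purely illustrative, not taken from any cited system:

```python
def mean_pool(frames):
    """Average frame-level embeddings (T x D nested lists) into one
    utterance-level vector of length D."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def predict_score(vec, weights, bias):
    """Linear regression head mapping the pooled vector to a holistic score.
    In practice this head is trained on human-rated proficiency data."""
    return sum(x * w for x, w in zip(vec, weights)) + bias

frames = [[1.0, 2.0], [3.0, 4.0]]          # 2 frames, embedding dim 2
pooled = mean_pool(frames)                  # [2.0, 3.0]
rating = predict_score(pooled, weights=[0.5, 0.5], bias=1.0)
```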

2.5 LLMs in Intelligent Tutoring Systems (ITS)

Overview:

  • Computerized learning environments that provide personalized feedback based on learning progress
  • Capable of providing one-on-one tutoring for equitable and effective learning experience
  • Leads to substantial learning gains

Lack of Individualized Tutoring:

  • The absence of one-on-one tutoring in large classrooms leads to less effective learning and student dissatisfaction

Key Principles of Learning Sciences:

  • Goals for ITS development
  • Outline: pre-LLM intelligent tutoring systems, including those tailored for misconception identification, model-tracing tutors, constraint-based models, Bayesian network models, and systems designed for specific knowledge areas

LLMs in Intelligent Tutoring Systems Development:

  • Question solving
  • Error correction
  • Confusion resolution
  • Question generation
  • Content creation
  • Simulating student interactions for teaching assistants and teacher training
  • Assisting in creating datasets for fine-tuning LLMs
  • Developing prompt-based techniques or modularized prompting for ITS development
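The modularized-prompting idea mentioned above can be sketched as assembling a tutor prompt from separate pedagogical modules, each owning one concern; the module names and fields below are illustrative, not from the tutorial:

```python
def build_tutor_prompt(persona: str, pedagogy: str,
                       student_state: dict, question: str) -> str:
    """Modularized prompting sketch: each pedagogical concern lives in its
    own module, then all are concatenated into one LLM instruction."""
    modules = [
        f"Role: {persona}",
        f"Strategy: {pedagogy}",
        f"Student progress: mastered={student_state['mastered']}, "
        f"struggling_with={student_state['struggling_with']}",
        f"Student question: {question}",
        "Respond with a hint, not the full solution.",
    ]
    return "\n".join(modules)

prompt = build_tutor_prompt(
    persona="patient math tutor",
    pedagogy="Socratic questioning",
    student_state={"mastered": ["fractions"], "struggling_with": ["ratios"]},
    question="How do I convert 3/4 to a ratio?",
)
```

Keeping each concern in its own module makes it easy to swap the pedagogy or update the student model without rewriting the whole prompt.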

Future Directions:

  1. Development of standardized evaluation benchmarks to assess progress in ITS
  2. Collection and creation of large public educational datasets for LLM training and fine-tuning
  3. Development of specialized foundational LLMs for educational purposes
  4. Investigations into long-term impact on students and teachers
  5. Examining ethical considerations, potential biases, and pedagogical value in dialogue-based ITS.

3 Recommended Reading List

  • Relevant papers cited in this proposal
  • Available on the tutorial website

4 Target Audience

  • Graduate students, researchers, and practitioners attending COLING 2025
  • Background in Computational Linguistics (CL), NLP, or Machine Learning (ML)
  • Interested in educational applications of NLP and generative AI
  • Basic knowledge of educational technologies not required

5 Tutorial Description

  • Self-contained and accessible to a wide audience
  • Focuses on advanced technologies for various educational applications
  • Covers recent advances brought by generative AI and LLMs
  • Addresses opportunities, challenges, and risks in the field
  • First tutorial on this topic at COLING or any other CL conference

6 Diversity Considerations

  • Tutorial covers a wide range of applications
  • Highlights opportunities to reach underrepresented groups
  • Addresses fairness and accessibility challenges
  • Instructors are diverse in gender, nationality, affiliation, and seniority
  • Includes open Q&A sessions for participant engagement and discussion.

Tutorial Reading List

COLING 2025 Tutorial Additional Reading List

Datasets

  1. The CoNLL-2014 Shared Task on Grammatical Error Correction
  2. The First QALB Shared Task on Automatic Text Correction for Arabic
  3. The Second QALB Shared Task on Automatic Text Correction for Arabic
  4. The BEA-2019 Shared Task on Grammatical Error Correction
  5. Grammar Error Correction in Morphologically Rich Languages: The Case of Russian
  6. Construction of an Evaluation Corpus for Grammatical Error Correction for Learners of Japanese as a Second Language

Evaluation methods and their reliability

  1. Ground Truth for Grammatical Error Correction Metrics
  2. There's No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction
  3. Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction
  4. Reference-based Metrics can be Replaced with Reference-less Metrics in Evaluating Grammatical Error Correction Systems
  5. Classifying Syntactic Errors in Learner Language
  6. IMPARA: Impact-Based Metric for GEC Using Parallel Data
  7. Reassessing the Goals of Grammatical Error Correction: Fluency Instead of Grammaticality
  8. Inherent Biases in Reference-based Evaluation for Grammatical Error Correction

Methods: Statistical and rule-based

  1. Detection of Grammatical Errors Involving Prepositions
  2. The Ups and Downs of Preposition Error Detection in ESL Writing
  3. Grammatical Error Correction with Alternating Structure Optimization
  4. Joint Learning and Inference for Grammatical Error Correction
  5. Generalized Character-Level Spelling Error Correction
  6. Grammatical error correction using hybrid systems and type filtering
  7. The AMU System in the CoNLL-2014 Shared Task: Grammatical Error Correction by Data-Intensive and Feature-Rich Statistical Machine Translation
  8. Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction

Methods: sequence-to-sequence

  1. Grammatical error correction using neural machine translation
  2. Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task
  3. Utilizing Character and Word Embeddings for Text Normalization with Sequence-to-Sequence Models
  4. Neural and FST-based approaches to grammatical error correction
  5. Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data
  6. Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data
  7. Stronger Baselines for Grammatical Error Correction Using a Pretrained Encoder-Decoder Model
  8. Document-level grammatical error correction

Methods: text-editing neural models

  1. Parallel Iterative Edit Models for Local Sequence Transduction
  2. Encode, Tag, Realize: High-Precision Text Editing
  3. Seq2Edits: Sequence Transduction Using Span-level Edit Operations
  4. FELIX: Flexible Text Editing Through Tagging and Insertion
  5. Character Transformations for Non-Autoregressive GEC Tagging
  6. EdiT5: Semi-Autoregressive Text Editing with T5 Warm-Start
  7. An Extended Sequence Tagging Vocabulary for Grammatical Error Correction

LLMs for GEC

  1. Is ChatGPT a Highly Fluent Grammatical Error Correction System? A Comprehensive Evaluation
  2. Analyzing the Performance of GPT-3.5 and GPT-4 in Grammatical Error Correction
  3. ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark
  4. Prompting open-source and commercial language models for grammatical error correction of English learner text
  5. MEDIT: Multilingual Text Editing via Instruction Tuning
  6. GPT-3.5 for Grammatical Error Correction
  7. Exploring Effectiveness of GPT-3 in Grammatical Error Correction: A Study on Performance and Controllability in Prompt-Based Methods
  8. Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models

LLMs as GEC Evaluators / Explainers

  1. Large Language Models Are State-of-the-Art Evaluator for Grammatical Error Correction
  2. GMEG-EXP: A Dataset of Human- and LLM-Generated Explanations of Grammatical and Fluency Edits
  3. Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction

Recent papers

  1. Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond
  2. Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision
  3. Improving Iterative Text Revision by Learning Where to Edit from Other Revision Tasks
  4. Understanding Iterative Revision from Human-Written Text

Ethical considerations

  1. Unraveling Downstream Gender Bias from Large Language Models: A Study on AI Educational Writing Assistance

Readability and Simplification

Surveys

  1. Computational assessment of text readability: A survey of current and future research
  2. Trends, limitations and open challenges in automatic readability assessment research
  3. A survey of research on text simplification
  4. Data-Driven Sentence Simplification: Survey and Benchmark

Shared Tasks

Methods

  1. The Principles of Readability
  2. Do NLP and machine learning improve traditional readability formulas?
  3. Multiattentive Recurrent Neural Network Architecture for Multilingual Readability Assessment
  4. Text readability assessment for second language learners
  5. Exploring hybrid approaches to readability: experiments on the complementarity between linguistic features and transformers
  6. Pushing on Text Readability Assessment: A Transformer Meets Handcrafted Linguistic Features
  7. Automatic induction of rules for text simplification
  8. Learning to simplify sentences with quasi-synchronous grammar and integer programming
  9. Optimizing statistical machine translation for text simplification
  10. Learning to Paraphrase Sentences to Different Complexity Levels
  11. Elaborative Simplification for German-language Texts
  12. Supervised and Unsupervised Neural Approaches to Text Readability
  13. All Mixed Up? Finding the Optimal Feature Set for General Readability Prediction and Its Application to English and Dutch

LLMs

  1. Prompt-based Learning for Text Readability Assessment
  2. FPT: Feature Prompt Tuning for Few-shot Readability Assessment
  3. Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts
  4. BLESS: Benchmarking Large Language Models on Sentence Simplification
  5. An LLM-Enhanced Adversarial Editing System for Lexical Simplification
  6. On Simplification of Discharge Summaries in Serbian: Facing the Challenges

Evaluation

  1. Towards grounding computational linguistic approaches to readability: Modeling reader-text interaction for easy and difficult texts
  2. Are Cohesive Features Relevant for Text Readability Evaluation?
  3. On understanding the relation between expert annotations of text readability and target reader comprehension
  4. Linguistic Corpus Annotation for Automatic Text Simplification Evaluation
  5. The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification
  6. Investigating Text Simplification Evaluation
  7. Do Text Simplification Systems Preserve Meaning? A Human Evaluation via Reading Comprehension

Explainability:

  1. Explainable AI in Language Learning: Linking Empirical Evidence and Theoretical Concepts in Proficiency and Readability Modeling of Portuguese
  2. “Geen makkie”: Interpretable Classification and Simplification of Dutch Text Complexity

Broader impact/Ethical issues

  1. Automatic text simplification for social good: Progress and challenges
  2. When readability meets computational linguistics: a new paradigm in readability
  3. Problems in Current Text Simplification Research: New Data Can Help

Spoken Language Learning and Assessment

Automated Speaking Assessment

  1. The use of DBN-HMMs for mispronunciation detection and diagnosis in L2 English to support computer-aided pronunciation training
  2. Improvements to an Automated Content Scoring System for Spoken CALL Responses: the ETS Submission to the Second Spoken CALL Shared Task
  3. Automated Speaking Assessment: Using Language Technologies to Score Spontaneous Speech
  4. Incorporating uncertainty into deep learning for spoken language assessment
  5. Automated scoring of spontaneous speech from young learners of English using transformers
  6. Automatic pronunciation assessment using self-supervised speech representation learning
  7. View-Specific Assessment of L2 Spoken English
  8. Proficiency assessment of L2 spoken English using wav2vec 2.0
  9. Can GPT-4 do L2 analytic assessment?

Spoken GED and GEC

  1. Automatic error detection in the Japanese learners’ English spoken data
  2. Impact of ASR Performance on Spoken Grammatical Error Detection
  3. Automatic Grammatical Error Detection of Non-Native Spoken Learner English
  4. On Assessing and Developing Spoken ‘Grammatical Error Correction’ Systems
  5. Towards End-to-End Spoken Grammatical Error Correction

LLMs for Speaking Assessment and Feedback

  1. Transformer Based End-to-End Mispronunciation Detection and Diagnosis
  2. Explore wav2vec 2.0 for Mispronunciation Detection
  3. Automatic Assessment of Conversational Speaking Tests

Validity and reliability

  1. Assessing L2 English speaking using automated scoring technology: examining automarker reliability

LLM in STEM Education and ITS

People have valued and thought deeply about education for a long time - John Dewey, 1923

Overview: Opportunities and Challenges

  1. Opportunities and Challenges in Neural Dialog Tutoring
  2. Are We There Yet? - A Systematic Literature Review on Chatbots in Education
  3. A Systematic Literature Review of Intelligent Tutoring Systems With Dialogue in Natural Language

Pre-LLM ITS

  1. AutoTutor 3-D Simulations: Analyzing Users' Actions and Learning Trends
  2. Gaze tutor: A gaze-reactive intelligent tutoring system
  3. Interactive Conceptual Tutoring in Atlas-Andes
  4. Individualizing self-explanation support for ill-defined tasks in constraint-based tutors
  5. Jacob-An animated instruction agent in virtual reality
  6. Data mining in education

LLM in STEM Education and ITS

  1. Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors
  2. Backtracing: Retrieving the Cause of the Query
  3. MATHDIAL: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems
  4. Improving Teachers’ Questioning Quality through Automated Feedback: A Mixed-Methods Randomized Controlled Trial in Brick-and-Mortar Classrooms
  5. Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes
  6. GPTeach: Interactive TA Training with GPT-based Students
  7. NAISTeacher: A Prompt and Rerank Approach to Generating Teacher Utterances in Educational Dialogues
  8. Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction
  9. Demographic predictors of students’ science participation over the age of 16: An Australian case study

Evaluation of ITS

  1. The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
  2. Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions
  3. Evaluation Methodologies for Intelligent Tutoring Systems

Ethical Considerations

  1. What if the devil is my guardian angel: ChatGPT as a case study of using chatbots in education