- The National Assembly of the Republic of Korea is making various efforts to detect and respond to public opinion on major social issues through opinion polls and the media.
- However, these methods are limited in how objectively they can predict public opinion, and considerable time and cost are incurred in moving from drafting a solution to actual legislation. For example, in the 20th National Assembly, only 13.2% of all bills were approved, and the average processing time was 577.2 days.
- Therefore, in this project we propose a natural language processing based artificial intelligence model that can efficiently predict online public opinion, and we expect it to be used as a policy decision-support tool in various fields.
- The Korean National Assembly provided news articles, tweets, and online community data related to major legislation in Korea.
- Online comment and review data were additionally collected for fine-tuning the language model.
To summarize the entire process, the 『Online Public Opinion Prediction Model』 we designed consists of the following four steps.
STEP 1) Sentiment analysis corpus preparation: Twitter posts and comments labeled as positive or negative.
STEP 2) Text embedding classifier fine-tuning: train a BERT-based language model that builds text embeddings (vectors) and classifies them, using three embedding strategies.
STEP 3) Time series data conversion: convert the language model's positive/negative predictions into a time series table.
STEP 4) Transformer-based time series prediction: train our Transformer-based forecasting model on the time series data and predict the future trend of positive/negative public opinion.
A total of 530k texts, combining the legislative news articles and tweets provided by the Korean National Assembly with the online comments collected for sentiment analysis, were merged and pre-processed.
This text data was labeled according to positive or negative public opinion (negative: 0, positive: 1); an illustrative sketch of the resulting format follows.
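As a rough illustration of that format (a minimal sketch; the column names, example sentences, and layout are assumptions, not the actual corpus):

```python
# Minimal sketch of the labeled corpus format (all values are invented;
# "date", "text", and "label" are assumed column names).
import pandas as pd

corpus = pd.DataFrame(
    {
        "date": ["2020-07-30", "2020-07-30", "2020-07-31"],
        "text": [
            "This law will finally protect tenants.",      # positive opinion
            "Rents will only rise because of this bill.",  # negative opinion
            "A sensible step toward housing stability.",   # positive opinion
        ],
        "label": [1, 0, 1],  # negative: 0, positive: 1
    }
)
print(corpus.head())
```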
Using a pre-trained BERT-based language model (PLM), we obtain a fixed-size contextual vector for each token, i.e., a token embedding.
We use the [CLS] token or apply a pooling technique to obtain a sentence-level embedding instead of token-level embeddings, in one of three ways (see the sketch after this list):
- [CLS] Token : a single token-level vector trained to capture the meaning of the entire sentence.
- Mean Pooling : a sentence-level vector averaging the semantic representations of all tokens.
- Max Pooling : a sentence-level vector keeping, per dimension, the strongest representation, emphasizing the most salient tokens.
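The following is a minimal sketch of the three strategies with the Hugging Face transformers library; the checkpoint name klue/bert-base is an assumption, since the exact PLM is not specified here:

```python
# Sketch of [CLS], mean-pooling, and max-pooling sentence embeddings.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("klue/bert-base")  # assumed PLM
model = AutoModel.from_pretrained("klue/bert-base")

texts = ["임대차 3법이 세입자를 보호한다.", "이 법은 전세난을 키울 뿐이다."]
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**enc).last_hidden_state      # (batch, seq_len, hidden)
mask = enc["attention_mask"].unsqueeze(-1)       # 1 for real tokens, 0 for padding

# 1) [CLS] token: the vector at the first position.
cls_emb = hidden[:, 0, :]
# 2) Mean pooling: average over non-padding tokens.
mean_emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
# 3) Max pooling: element-wise max over non-padding tokens.
max_emb = hidden.masked_fill(mask == 0, float("-inf")).max(dim=1).values

print(cls_emb.shape, mean_emb.shape, max_emb.shape)  # each (2, 768)
```

Any of these fixed-size vectors can then be fed to a classification head for the positive/negative prediction.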
As a result, the text embedding classifier achieves a classification accuracy of roughly 91–92% or higher with all three methods.
- Through the 'Text Embedding Classifier Model', articles and tweets related to the 'Lease 3 Law' are classified as positive or negative and then converted into time series data through the 'Hash Table Function' (sketched below).
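A minimal sketch of how this conversion could look; the per-item prediction tuples and the daily positive-ratio aggregation are our assumptions about the intermediate format:

```python
# Aggregate per-text predictions into a daily time series via a dict
# (hash table). All dates and labels below are invented for illustration.
from collections import defaultdict

predictions = [            # (date, predicted label) from the classifier
    ("2020-07-30", 1), ("2020-07-30", 0), ("2020-07-30", 1),
    ("2020-07-31", 0), ("2020-07-31", 0),
]

counts = defaultdict(lambda: {"pos": 0, "neg": 0})
for date, label in predictions:
    counts[date]["pos" if label == 1 else "neg"] += 1

# Daily positive ratio: one row of the time series table per day.
series = {d: c["pos"] / (c["pos"] + c["neg"]) for d, c in sorted(counts.items())}
print(series)  # {'2020-07-30': 0.666..., '2020-07-31': 0.0}
```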
- The Transformer model solves the problems of existing RNN-based models by applying the attention mechanism, and its parallel computation greatly improves calculation speed.
- In particular, attention is the core concept of the Transformer: it enables the network to understand contextual information by focusing on words relevant to the current token during both training and inference.
- Inspired by this model, our time series prediction model is a Seq2Seq model whose encoder stacks three Transformer encoders and whose decoder is a single linear regression layer (a minimal sketch follows).
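A minimal PyTorch sketch of such an architecture; the dimensions, window length, and forecast horizon are illustrative (the actual hyperparameters are not given here), and positional encoding is omitted for brevity:

```python
# Sketch: three stacked Transformer encoder layers + one linear "decoder".
import torch
import torch.nn as nn

class OpinionForecaster(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4,
                 window: int = 30, horizon: int = 7):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)   # scalar series -> d_model
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        # "Decoder": one linear regression layer mapping the encoded
        # window to the next `horizon` steps.
        self.decoder = nn.Linear(window * d_model, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window, 1) past daily positive-ratio values
        h = self.encoder(self.input_proj(x))          # (batch, window, d_model)
        return self.decoder(h.flatten(start_dim=1))   # (batch, horizon)

model = OpinionForecaster()
past = torch.rand(8, 30, 1)   # 8 windows of 30 daily values
print(model(past).shape)      # torch.Size([8, 7])
```

Flattening the encoded window and regressing it onto the next steps with a single linear layer mirrors the "one linear regression decoder" described above.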
Our model attends strongly (i.e., assigns high weights) to the major inflection points in the input data, such as sharp rebounds or drops, and learns fine-grained time series patterns.
Figure. Blue line: actual values; red line: model predictions; grey line: residuals (true - prediction).
- Seoul National University NLP Labs
- Navy Lee