saranya-venkatraman/gpt-who


This repository provides code to calculate the four UID-based features and the UID minimum and maximum span features described in the GPT-who paper for efficient and accurate machine-text detection.

Installation

Use the package manager pip to install all requirements.

$ pip install -r requirements.txt

This repository contains 2 scripts:

  1. get_uid_features.py: This script loads texts and author labels from a CSV file or other data source, calculates all UID features needed for GPT-who, and writes them to a new CSV file. The generated CSV file is the input to gpt-who.py.

Arguments

--input_path: Path to the CSV file or data source containing text and corresponding labels (default: None).
--cache_path: Path to the cache directory for the GPT-2 XL model (default: "./.cache/models/gpt2-xl").
--output_path: Path to the CSV file where UID features will be saved (default: "./scores/uid_features.csv").
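
The three arguments above could be wired up with Python's standard `argparse` module. This is only a sketch: the function name `build_parser` is hypothetical and the actual script may define its parser differently. Only the flag names and default values are taken from the list above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Compute UID features for GPT-who (illustrative sketch)."
    )
    # Flag names and defaults follow the argument list in this README.
    parser.add_argument("--input_path", default=None,
                        help="CSV file or data source with texts and labels")
    parser.add_argument("--cache_path", default="./.cache/models/gpt2-xl",
                        help="cache directory for the GPT-2 XL model")
    parser.add_argument("--output_path", default="./scores/uid_features.csv",
                        help="CSV file where UID features are written")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--input_path", "./data/text_labels.csv"])
print(args.input_path, args.cache_path, args.output_path)
```

Any argument left out of the command line falls back to its documented default.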

Example Usage

python gptwho_uid_features.py --input_path ./data/text_labels.csv --cache_path ./model_cache/gpt2-xl --output_path ./scores/uid_features.csv

  2. gpt-who.py: This script takes as input two CSV files containing UID features for the train and test splits of the dataset, calculates the UID span features and concatenates them with the other four features (uid_var, uid_diff, uid_diff2, and mean), runs logistic regression, predicts labels, and reports machine-text detection performance.

Arguments

--train_file: Path to the CSV file containing UID features for the training split (default: "./scores/train_uid_scores.csv").
--test_file: Path to the CSV file containing UID features for the test split (default: "./scores/test_uid_scores.csv").

Example Usage

python uid_span_features_logreg.py --train_file ./data/train_uid_features.csv --test_file ./data/test_uid_features.csv
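
To make the feature names concrete, here is a minimal sketch of how the four UID features and the span features might be computed from a list of per-token surprisals. It assumes the surprisals (negative log-probabilities from a language model such as GPT-2 XL) have already been obtained. The exact formulas, the function names `uid_features` and `min_max_variance_spans`, and the `span_len` parameter are assumptions for illustration and are not taken from the repository's code.

```python
from statistics import fmean


def uid_features(surprisals):
    """Return mean, uid_var, uid_diff and uid_diff2 for one text (assumed formulas)."""
    if len(surprisals) < 2:
        raise ValueError("need at least two token surprisals")
    mean = fmean(surprisals)
    # Assumed: uid_var is the population variance of surprisal around its mean.
    uid_var = fmean([(s - mean) ** 2 for s in surprisals])
    # Differences between each token's surprisal and the previous token's.
    diffs = [b - a for a, b in zip(surprisals, surprisals[1:])]
    # Assumed: uid_diff averages absolute differences, uid_diff2 squared ones.
    uid_diff = fmean([abs(d) for d in diffs])
    uid_diff2 = fmean([d * d for d in diffs])
    return {"mean": mean, "uid_var": uid_var,
            "uid_diff": uid_diff, "uid_diff2": uid_diff2}


def min_max_variance_spans(surprisals, span_len):
    """Return the fixed-length windows with the lowest and highest variance."""
    if not 0 < span_len <= len(surprisals):
        raise ValueError("span_len must be between 1 and the sequence length")

    def variance(window):
        m = fmean(window)
        return fmean([(x - m) ** 2 for x in window])

    windows = [surprisals[i:i + span_len]
               for i in range(len(surprisals) - span_len + 1)]
    return min(windows, key=variance), max(windows, key=variance)


# Toy surprisal values, not real model output.
toy = [2.0, 4.0, 4.0, 6.0]
print(uid_features(toy))
print(min_max_variance_spans(toy, span_len=2))
```

In this reading, the surprisal values of the two selected spans would be appended to the four scalar features before classification; the span length used by the actual script should be checked in its source.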

Scores data files: We also provide UID feature train and test files for the ArguGPT dataset as an example dataset to run this code. However, our method can be applied to any custom dataset with "text" and "label" fields corresponding to the textual content and author labels.
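
For a custom dataset, a quick check that the input has the expected "text" and "label" columns can catch formatting problems before the model is loaded. The sketch below uses only the standard library; the helper name `read_text_label_rows` and the label values shown are hypothetical, and the repository's scripts may read the data differently.

```python
import csv
import io

REQUIRED_FIELDS = ("text", "label")


def read_text_label_rows(handle):
    """Read (text, label) pairs from a CSV handle, checking the header first."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_FIELDS if name not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    return [(row["text"], row["label"]) for row in reader]


# In-memory CSV standing in for a real file; the label values are made up.
sample = io.StringIO(
    "text,label\n"
    '"An essay written by a student.",human\n'
    '"An essay produced by a language model.",machine\n'
)
rows = read_text_label_rows(sample)
print(rows)
```

With a real file, the same function can be called on `open(path, newline="", encoding="utf-8")`.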

About

Implementation of GPT-who: A Machine-Text Detector
