Implementation of "GPT-who: An Information Density-based Machine-Generated Text Detector"

This repository provides code to calculate the 4 UID-based features and UID minimum and maximum span features described in the paper for efficient and accurate machine text detection.

Installation

Use the package manager pip to install all requirements.

$ pip install -r requirements.txt

This repository contains 2 scripts:

get_uid_features.py: This script loads texts and author labels from a CSV file (or another data source), calculates all UID features needed for GPT-who, and writes them to a new CSV file. The generated CSV file is the input to gpt-who.py.

Arguments

--input_path: Path to the CSV file or data source containing text and corresponding labels (default: None).
--cache_path: Path to the cache directory for the GPT-2 XL model (default: "./.cache/models/gpt2-xl").
--output_path: Path to the CSV file where UID features will be saved (default: "./scores/uid_features.csv").

Example Usage

python get_uid_features.py --input_path ./data/text_labels.csv --cache_path ./model_cache/gpt2-xl --output_path ./scores/uid_features.csv
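To make the four per-text features concrete, here is a minimal sketch of how they could be derived from a list of per-token surprisals. It assumes the surprisals (negative log-probabilities from a language model such as GPT-2 XL) have already been computed. It also assumes the usual UID formulations: variance of surprisal, and the mean absolute and mean squared difference between consecutive tokens. The function name `uid_features` is hypothetical, and get_uid_features.py may define or normalize these quantities differently.

```python
from statistics import fmean


def uid_features(surprisals):
    """Return mean surprisal and three UID measures for one text.

    `surprisals` is a list of per-token surprisal values (floats).
    The formulas below are assumed definitions, not copied from the repository.
    """
    if len(surprisals) < 2:
        raise ValueError("need at least two token surprisals")

    mean = fmean(surprisals)

    # Global measure: how far each token's surprisal is from the text mean.
    uid_var = fmean((s - mean) ** 2 for s in surprisals)

    # Local measures: change in surprisal between consecutive tokens.
    deltas = [b - a for a, b in zip(surprisals, surprisals[1:])]
    uid_diff = fmean(abs(d) for d in deltas)
    uid_diff2 = fmean(d * d for d in deltas)

    return {
        "mean": mean,
        "uid_var": uid_var,
        "uid_diff": uid_diff,
        "uid_diff2": uid_diff2,
    }


features = uid_features([1.0, 3.0, 2.0, 6.0])
```

A perfectly uniform text (identical surprisal at every token) would score 0 on all three UID measures under these definitions.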

gpt-who.py: This script takes two CSV files of UID features as input, one for the train split and one for the test split of the dataset. It calculates the UID span features, concatenates them with the other 4 features (uid_var, uid_diff, uid_diff2, and mean), runs logistic regression, predicts labels, and reports machine text detection performance.

Arguments

--train_file: Path to the CSV file containing UID features for the training split (default: "./scores/train_uid_scores.csv").
--test_file: Path to the CSV file containing UID features for the test split (default: "./scores/test_uid_scores.csv").

Example Usage

python gpt-who.py --train_file ./data/train_uid_features.csv --test_file ./data/test_uid_features.csv
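As a rough illustration of the minimum and maximum span features, the sketch below slides a fixed-length window over a text's token surprisals, picks the windows with the lowest and highest surprisal variance, and returns their surprisal values as one feature vector. The function name `uid_span_features`, the default window of 20 tokens, and the choice to reject texts shorter than the window are all assumptions made for this example; check gpt-who.py for the behavior it actually implements.

```python
from statistics import pvariance


def uid_span_features(surprisals, window=20):
    """Return surprisals of the min-variance span followed by the max-variance span.

    `window` is the span length in tokens (an assumed default, not taken
    from the repository).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(surprisals) < window:
        raise ValueError("text is shorter than the span window")

    # Every contiguous span of `window` tokens.
    spans = [
        surprisals[i:i + window]
        for i in range(len(surprisals) - window + 1)
    ]

    # Ties resolve to the earliest span, since min/max keep the first match.
    min_span = min(spans, key=pvariance)
    max_span = max(spans, key=pvariance)

    return list(min_span) + list(max_span)


span_vector = uid_span_features(
    [1.0, 1.0, 1.0, 5.0, 0.0, 5.0, 1.0, 1.0], window=3
)
```

The resulting vector has `2 * window` entries per text, which can then be concatenated with the four scalar features before fitting a classifier.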

Scores data files: We also provide UID feature train and test files for the ArguGPT dataset as an example dataset to run this code. However, our method can be applied to any custom dataset with "text" and "label" fields corresponding to the textual content and author labels.
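For a custom dataset, one way to produce a CSV with the "text" and "label" fields is Python's standard csv module. This is only a sketch: the helper name `write_text_label_csv` and the label values "human" and "machine" are placeholders, and the label encoding the scripts expect should be checked against the provided ArguGPT score files.

```python
import csv
import io


def write_text_label_csv(rows, handle):
    """Write (text, label) pairs to an open text file handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=["text", "label"])
    writer.writeheader()
    for text, label in rows:
        # csv handles quoting, so commas and quotes inside texts are safe.
        writer.writerow({"text": text, "label": label})


# Hypothetical example rows; replace with your own data.
rows = [
    ("An essay written by a student, with a comma.", "human"),
    ("An essay produced by a language model.", "machine"),
]

buffer = io.StringIO()
write_text_label_csv(rows, buffer)
```

To write to disk instead of an in-memory buffer, open the target with `open(path, "w", newline="", encoding="utf-8")` and pass that handle.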

saranya-venkatraman/gpt-who
