pytorch-textregression: Chinese text regression in practice, with multi-value output

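pytorch-textregression is a lightweight natural language processing tool built on pytorch and transformers. It focuses on Chinese text regression and supports multi-value regression, among other features.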

Contents

Data

Usage

Paper

Reference

Project URL

pytorch-textregression: https://github.com/yongzhuo/Pytorch-NLU/tree/main/pytorch_nlu/pytorch_textregression

Data format

1. Text regression (txt format, one JSON object per line):

1.1 Single-score format:

{"text": "你安静!", "label": [1]}

{"text": "斗牛场是多么欢乐阿!", "label": [1]}

{"text": "今天你不必做作业。", "label": [0]}

{"text": "他醒来时,几乎无法说话。", "label": [0]}

{"text": "在那天边隐约闪亮的不就是黄河?", "label": [1]}

1.2 Multi-score format:

{"text": "你安静!", "label": [1,0]}

{"text": "斗牛场是多么欢乐阿!", "label": [1,0]}

{"text": "今天你不必做作业。", "label": [0,0]}

{"text": "他醒来时,几乎无法说话。", "label": [0,0]}

{"text": "在那天边隐约闪亮的不就是黄河?", "label": [1,0]}
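The format above can be checked before training with a short loader. The following is a minimal sketch, not part of the project: the function name parse_regression_lines is hypothetical, and it assumes that every record in a file carries the same number of scores, which the samples above suggest but the project does not state.

```python
import json
from typing import Dict, List


def parse_regression_lines(lines: List[str]) -> List[Dict]:
    """Parse records of the form {"text": str, "label": [numbers]}, one per line."""
    records = []
    width = None  # number of scores per sample, fixed by the first record (assumption)
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        text, label = record["text"], record["label"]
        if not isinstance(text, str) or not isinstance(label, list) or not label:
            raise ValueError(f"line {line_no}: need a string 'text' and a non-empty list 'label'")
        for value in label:
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"line {line_no}: non-numeric score {value!r}")
        if width is None:
            width = len(label)
        elif len(label) != width:
            raise ValueError(f"line {line_no}: expected {width} scores, got {len(label)}")
        records.append({"text": text, "label": [float(v) for v in label]})
    return records


sample = [
    '{"text": "你安静!", "label": [1, 0]}',
    '',
    '{"text": "今天你不必做作业。", "label": [0, 0]}',
]
records = parse_regression_lines(sample)
print(len(records), records[0]["label"])
```

To check a corpus file, pass the lines of the file opened with encoding="utf-8" to the function; a mixed single-score and multi-score file then fails early with the offending line number.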

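Usage

More samples are available in the test/tr directory.

Training: python tet_tr_base_train.py

Prediction: python tet_tr_base_predict.py

Set the pretrained model directory first, i.e. the variables pretrained_model_dir, pretrained_model_name_or_path, idx, and so on.

Point the configuration at your own corpus, i.e. the dictionary entries model_config["path_train"] and model_config["path_dev"].

Then cd into the script directory and run it from the command line, for example: python trRun.py, python trPredict.py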
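The two corpus paths must point at files in the one-JSON-object-per-line format described earlier. As a rough sketch of how such files could be produced from in-memory samples, the helpers below (write_jsonl and split_corpus, both hypothetical names that do not come from the project) shuffle the data with a fixed seed and write a train and a dev file. The 20% dev ratio and the temporary output directory are illustrative assumptions.

```python
import json
import os
import random
import tempfile


def write_jsonl(path, samples):
    """Write (text, scores) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for text, scores in samples:
            # ensure_ascii=False keeps the Chinese text readable in the file
            f.write(json.dumps({"text": text, "label": list(scores)}, ensure_ascii=False) + "\n")


def split_corpus(samples, dev_ratio=0.2, seed=42):
    """Shuffle deterministically and return (train, dev); dev gets at least one sample."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_ratio))
    return shuffled[n_dev:], shuffled[:n_dev]


samples = [
    ("你安静!", [1]),
    ("斗牛场是多么欢乐阿!", [1]),
    ("今天你不必做作业。", [0]),
    ("他醒来时,几乎无法说话。", [0]),
    ("在那天边隐约闪亮的不就是黄河?", [1]),
]
train, dev = split_corpus(samples)

out_dir = tempfile.mkdtemp()  # illustrative location; use your own corpus directory
path_train = os.path.join(out_dir, "train.json")
path_dev = os.path.join(out_dir, "dev.json")
write_jsonl(path_train, train)
write_jsonl(path_dev, dev)
print(len(train), len(dev))
```

The resulting path_train and path_dev values are what would be assigned to model_config["path_train"] and model_config["path_dev"].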

Text regression (TR)

# Linux compatibility: put the package directory on sys.path

import platform

import json

import sys

import os

path_root = os.path.abspath(os.path.join(os.path.dirname(__file__), "../.."))

path_sys = os.path.join(path_root, "pytorch_nlu", "pytorch_textregression")

sys.path.append(path_sys)

print(path_root)

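# Imports from the text-regression package (pytorch_textregression)

from trConfig import model_config

from trTools import get_current_time

from trRun import TextRegression  # assumption: the class lives in trRun.py; adjust if your copy differs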

# Training and validation corpus paths; supplying only the training path is allowed

path_corpus = path_root + "/corpus/text_regression/negative_sentence"

path_train = os.path.join(path_corpus, "train.json")

path_dev = os.path.join(path_corpus, "dev.json")

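evaluate_steps = 320  # evaluate every N steps (placeholder value, tune for your corpus)

save_steps = 320  # save a checkpoint every N steps (placeholder value)

model_config["evaluate_steps"] = evaluate_steps

model_config["save_steps"] = save_steps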

model_config["path_train"] = path_train

model_config["path_dev"] = path_dev

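# Model classes the pretrained checkpoints map to; "ELECTRA" has no entry in pretrained_model_name_or_path, so add one before selecting it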

model_type = ["BERT", "ERNIE", "BERT_WWM", "ALBERT", "ROBERTA", "XLNET", "ELECTRA"]

pretrained_model_name_or_path = {

"BERT_WWM": "hfl/chinese-bert-wwm-ext",

"ROBERTA": "hfl/chinese-roberta-wwm-ext",

"ALBERT": "uer/albert-base-chinese-cluecorpussmall",

"XLNET": "hfl/chinese-xlnet-mid",

"ERNIE": "nghuyong/ernie-1.0-base-zh",

# "ERNIE": "nghuyong/ernie-3.0-base-zh",

"BERT": "bert-base-chinese",

# "BERT": "hfl/chinese-macbert-base",

}

idx = 1 # index into model_type of the pretrained model to use; 1 selects "ERNIE"

model_config["pretrained_model_name_or_path"] = pretrained_model_name_or_path[model_type[idx]]

model_config["model_save_path"] = "../output/text_regression/model_{}".format(model_type[idx])

model_config["model_type"] = model_type[idx]

# os.environ["CUDA_VISIBLE_DEVICES"] = str(model_config["CUDA_VISIBLE_DEVICES"])

# main

lc = TextRegression(model_config)

lc.process()

lc.train()
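In the script above, idx indexes into model_type and the result is used as a key into pretrained_model_name_or_path. Because "ELECTRA" appears in the list but not in the dictionary, an unlucky idx fails with a bare KeyError. The sketch below shows one way to make that lookup fail with a clearer message; resolve_pretrained is a hypothetical helper, not a function of the project.

```python
def resolve_pretrained(model_type, name_by_type, idx):
    """Return (type, checkpoint name) for model_type[idx], with explicit error messages."""
    if not 0 <= idx < len(model_type):
        raise IndexError(f"idx must be in [0, {len(model_type) - 1}], got {idx}")
    chosen = model_type[idx]
    if chosen not in name_by_type:
        # the list and the mapping can drift apart, e.g. "ELECTRA" in the script above
        raise KeyError(f"no pretrained model configured for {chosen!r}")
    return chosen, name_by_type[chosen]


# Values copied from the training script above
model_type = ["BERT", "ERNIE", "BERT_WWM", "ALBERT", "ROBERTA", "XLNET", "ELECTRA"]
pretrained_model_name_or_path = {
    "BERT_WWM": "hfl/chinese-bert-wwm-ext",
    "ROBERTA": "hfl/chinese-roberta-wwm-ext",
    "ALBERT": "uer/albert-base-chinese-cluecorpussmall",
    "XLNET": "hfl/chinese-xlnet-mid",
    "ERNIE": "nghuyong/ernie-1.0-base-zh",
    "BERT": "bert-base-chinese",
}

chosen_type, chosen_name = resolve_pretrained(model_type, pretrained_model_name_or_path, 1)
print(chosen_type, chosen_name)
```

The returned pair corresponds to the values the script stores in model_config["model_type"] and model_config["pretrained_model_name_or_path"].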

Reference

For citing this work, you can refer to the present GitHub project. For example, with BibTeX:

@software{Pytorch-NLU,

url = {https://github.com/yongzhuo/Pytorch-NLU},

author = {Yongzhuo Mo},

title = {Pytorch-NLU},

year = {2021}

}

Hope this helps!
