fastchat.data.split_long_conversation

Split long conversations based on certain max length.

Usage: python3 -m fastchat.data.split_long_conversation –in sharegpt_clean.json –out sharegpt_split.json –model-name-or-path $<model-name>

Module Contents

Functions

split_all(content, begin, end, tokenizer_, max_length_)

Keep the maximum round of conversations within the max token length constraint

fastchat.data.split_long_conversation.split_all(content, begin, end, tokenizer_, max_length_)[source]

Keep the maximum round of conversations within the max token length constraint