fastchat.data.clean_sharegpt

  • Convert html to markdown with basic data cleaning.

  • Deduplication.

Usage: python3 -m fastchat.data.clean_sharegpt –in sharegpt_html.json –out sharegpt_clean.json

Module Contents

Functions

clean_html_all(content, begin, end)

Clean the source html files.

fastchat.data.clean_sharegpt.clean_html_all(content, begin, end)[source]

Clean the source html files.