fastchat.serve.monkey_patch_non_inplace

Monkey-patches the LLaMA implementation in the Hugging Face transformers library. The patch avoids bugs in the PyTorch MPS backend by replacing in-place operations with out-of-place ones.

Module Contents

Functions

rotate_half(x)

Rotates half the hidden dims of the input.

replace_llama_attn_with_non_inplace_operations()

Avoid bugs in mps backend by not using in-place operations.

fastchat.serve.monkey_patch_non_inplace.rotate_half(x)

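Rotates half the hidden dims of the input. The last dimension of x is split into two halves, x1 and x2, and the function returns the concatenation of -x2 and x1 along that dimension. The result is a new tensor; the input is not modified.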
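The rotation described above can be illustrated without PyTorch. The following is only a sketch on a flat Python list; the name rotate_half_sketch is hypothetical and is not part of FastChat or transformers. The module's own function operates on the last dimension of a tensor.

```python
def rotate_half_sketch(x):
    """Hypothetical pure-Python illustration of the half rotation."""
    half = len(x) // 2
    x1 = x[:half]  # first half (slicing copies, so x is never modified)
    x2 = x[half:]  # second half
    # Negate the second half and place it in front of the first half.
    return [-v for v in x2] + x1


original = [1.0, 2.0, 3.0, 4.0]
rotated = rotate_half_sketch(original)
print(rotated)   # [-3.0, -4.0, 1.0, 2.0]
print(original)  # [1.0, 2.0, 3.0, 4.0]
```

Because the output is built from copies, the caller's data is left untouched, which is the property the module relies on to stay clear of in-place updates.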

fastchat.serve.monkey_patch_non_inplace.replace_llama_attn_with_non_inplace_operations()

Replaces the forward method of the transformers LLaMA attention class with a version that does not use in-place operations, avoiding bugs in the MPS backend.
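The general monkey-patching pattern can be sketched as follows. This is a minimal illustration under stated assumptions: StandInAttention, non_inplace_forward, and replace_forward_with_non_inplace are hypothetical names invented for the example, and plain lists stand in for tensors. It does not reproduce the transformers attention code.

```python
class StandInAttention:
    """Hypothetical stand-in for an attention class; not the transformers API."""

    def forward(self, values):
        # In-place style: mutates the caller's list.
        for i in range(len(values)):
            values[i] *= 2
        return values


def non_inplace_forward(self, values):
    # Out-of-place style: builds a new list and leaves the input untouched.
    return [v * 2 for v in values]


def replace_forward_with_non_inplace():
    """Swap the method on the class so every instance uses the new version."""
    StandInAttention.forward = non_inplace_forward


layer = StandInAttention()  # created before the patch is applied
replace_forward_with_non_inplace()

data = [1, 2, 3]
result = layer.forward(data)
print(result)  # [2, 4, 6]
print(data)    # [1, 2, 3]
```

Assigning to the class attribute affects instances that already exist, since method lookup goes through the class. With the real module, the equivalent step would presumably be a single call to replace_llama_attn_with_non_inplace_operations() before running a LLaMA model on an MPS device.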