LangChain pipeline VRAM usage when loading a model
I'm trying to load a LLaMA-based 7B, 4-bit, 128g GPTQ model from a local file (the model itself is just an example; I tested others and hit similar problems). The pipeline completely eats up my 8 GB of VRAM:
My code:
from langchain.llms import HuggingFacePipeline
from langchain import PromptTemplate, LLMChain
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM, LlamaConfig, pipeline

torch.cuda.set_device(torch.device("cuda:0"))

PATH = './models/wizardLM-7B-GPTQ-4bit-128g'

# Build the model architecture from config.json, then cast it to fp16
config = LlamaConfig.from_json_file(f'{PATH}/config.json')
base_model = LlamaForCausalLM(config=config).half()

torch.cuda.empty_cache()

# Load the tokenizer from the local model directory
tokenizer = LlamaTokenizer.from_pretrained(
    pretrained_model_name_or_path=PATH,
    low_cpu_mem_usage=True,
    local_files_only=True
)

torch.cuda.empty_cache()

# Create the text-generation pipeline on GPU 0
pipe = pipeline(
    "text-generation",
    model=base_model,
    tokenizer=tokenizer,
    batch_size=1,
    device=0,
    max_length=100,
    temperature=0.6,
    top_p=0.95,
    repetition_penalty=1.2
)
How can I make the pipeline initialization consume less VRAM?
GPU: AMD® Radeon RX 6600 (8 GB VRAM, ROCm 5.4.2 and torch)
I managed to load the same model in other frameworks such as "KoboldAI" and "text-generation-webui", so I know it should be possible.
My goal is to load the model "wizardLM-7B-GPTQ-4bit-128g", downloaded from Hugging Face, and run it with LangChain in Python.
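For context, here is a rough back-of-the-envelope estimate of how much memory the weights alone would need at different precisions. It is only a sketch under stated assumptions: it uses a nominal 7e9 parameters (taken from the "7B" in the model name, not from the config), it treats the card's 8 GB as 8 GiB, and it ignores activations, the KV cache, and the per-group scales that GPTQ stores. `weight_gib` is a hypothetical helper written for this illustration, not part of any library.

```python
def weight_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 7e9  # assumption: nominal "7B"; the real parameter count differs slightly
VRAM_GIB = 8    # assumption: nominal card capacity, treated as GiB for simplicity

for bits in (16, 8, 4):
    size = weight_gib(N_PARAMS, bits)
    verdict = "fits" if size < VRAM_GIB else "does not fit"
    print(f"{bits:>2}-bit weights: {size:5.2f} GiB ({verdict} in {VRAM_GIB} GiB)")
```

Under these assumptions, 16-bit weights come to roughly 13 GiB, which is more than the card has, while 4-bit weights come to a bit over 3 GiB. As far as I can tell, the code above builds the model from config.json and calls .half(), so it would hold 16-bit weights no matter how the files on disk are quantized, which might explain the difference from the other frameworks.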
pip list output:
Package Version
------------------------ ----------------
accelerate 0.19.0
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
altair 5.0.0
anyio 3.6.2
argilla 1.7.0
async-timeout 4.0.2
attrs 23.1.0
backoff 2.2.1
beautifulsoup4 4.12.2
bitsandbytes 0.39.0
certifi 2022.12.7
cffi 1.15.1
chardet 5.1.0
charset-normalizer 2.1.1
chromadb 0.3.23
click 8.1.3
clickhouse-connect 0.5.24
cmake 3.25.0
colorclass 2.2.2
commonmark 0.9.1
compressed-rtf 1.0.6
contourpy 1.0.7
cryptography 40.0.2
cycler 0.11.0
dataclasses-json 0.5.7
datasets 2.12.0
Deprecated 1.2.13
dill 0.3.6
duckdb 0.8.0
easygui 0.98.3
ebcdic 1.1.1
et-xmlfile 1.1.0
extract-msg 0.41.1
fastapi 0.95.2
ffmpy 0.3.0
filelock 3.9.0
fonttools 4.39.4
frozenlist 1.3.3
fsspec 2023.5.0
gradio 3.28.3
gradio_client 0.2.5
greenlet 2.0.2
h11 0.14.0
hnswlib 0.7.0
httpcore 0.16.3
httptools 0.5.0
httpx 0.23.3
huggingface-hub 0.14.1
idna 3.4
IMAPClient 2.3.1
Jinja2 3.1.2
joblib 1.2.0
jsonschema 4.17.3
kiwisolver 1.4.4
langchain 0.0.171
lark-parser 0.12.0
linkify-it-py 2.0.2
lit 15.0.7
llama-cpp-python 0.1.50
loralib 0.1.1
lxml 4.9.2
lz4 4.3.2
Markdown 3.4.3
markdown-it-py 2.2.0
MarkupSafe 2.1.2
marshmallow 3.19.0
marshmallow-enum 1.5.1
matplotlib 3.7.1
mdit-py-plugins 0.3.3
mdurl 0.1.2
monotonic 1.6
mpmath 1.2.1
msg-parser 1.2.0
msoffcrypto-tool 5.0.1
multidict 6.0.4
multiprocess 0.70.14
mypy-extensions 1.0.0
networkx 3.0
nltk 3.8.1
numexpr 2.8.4
numpy 1.24.1
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
olefile 0.46
oletools 0.60.1
openai 0.27.7
openapi-schema-pydantic 1.2.4
openpyxl 3.1.2
orjson 3.8.12
packaging 23.1
pandas 1.5.3
pandoc 2.3
pcodedmp 1.2.6
pdfminer.six 20221105
Pillow 9.3.0
pip 23.0.1
plumbum 1.8.1
ply 3.11
posthog 3.0.1
psutil 5.9.5
pyarrow 12.0.0
pycparser 2.21
pydantic 1.10.7
pydub 0.25.1
Pygments 2.15.1
pygpt4all 1.1.0
pygptj 2.0.3
pyllamacpp 2.3.0
pypandoc 1.11
pyparsing 2.4.7
pyrsistent 0.19.3
python-dateutil 2.8.2
python-docx 0.8.11
python-dotenv 1.0.0
python-magic 0.4.27
python-multipart 0.0.6
python-pptx 0.6.21
pytorch-triton-rocm 2.0.1
pytz 2023.3
pytz-deprecation-shim 0.1.0.post0
PyYAML 6.0
red-black-tree-mod 1.20
regex 2023.5.5
requests 2.28.1
responses 0.18.0
rfc3986 1.5.0
rich 13.0.1
RTFDE 0.0.2
scikit-learn 1.2.2
scipy 1.10.1
semantic-version 2.10.0
sentence-transformers 2.2.2
sentencepiece 0.1.99
setuptools 66.0.0
six 1.16.0
sniffio 1.3.0
soupsieve 2.4.1
SQLAlchemy 2.0.15
starlette 0.27.0
sympy 1.11.1
tabulate 0.9.0
tenacity 8.2.2
threadpoolctl 3.1.0
tokenizers 0.13.3
toolz 0.12.0
torch 2.0.1+rocm5.4.2
torchaudio 2.0.2+rocm5.4.2
torchvision 0.15.2+rocm5.4.2
tqdm 4.65.0
transformers 4.30.0.dev0
triton 2.0.0
typer 0.9.0
typing_extensions 4.4.0
typing-inspect 0.8.0
tzdata 2023.3
tzlocal 4.2
uc-micro-py 1.0.2
unstructured 0.6.6
urllib3 1.26.13
uvicorn 0.22.0
uvloop 0.17.0
watchfiles 0.19.0
websockets 11.0.3
wheel 0.38.4
wikipedia 1.4.0
wrapt 1.14.1
XlsxWriter 3.1.0
xxhash 3.2.0
yarl 1.9.2
zstandard 0.21.0