在本地运行Google推出的Gemma模型

最近我发现自己调用OpenAI API的限额快满了，所以我就想着通过Ollama搭建的本地大模型，使用Langchain去访问它，帮我节省一下费用。刚好谷歌前段时间发布了Gemma。

那个版本的话看大家自己的需求和内存，2B只有1.4GB，而7B的话达到了4.8GB。所以我在这里选择了2B，大家有其他的需求的话参阅下面这个图：

安装Gemma

运行

ollama run gemma:2b

测试

安装Langchain

运行

pip install langchain

验证是否成功

运行以下命令以验证安装是否成功：

python -c "import langchain; print('LangChain version:', langchain.version)"

这里可以看到LangChain版本：0.1.11

Langchain调用Gemma

测试调用代码：

from langchain_community.llms import Ollama
import logging

# Configure basic logging
logging.basicConfig(level=logging.INFO)

try:
    llm = Ollama(model="gemma:2b")

    # It's good practice to ensure prompts are well-defined. Adjust based on the model's capabilities.
    prompt = ("Who are you? "
              "Are you better than Mistral? "
              "Can you share a detailed comparison?")

    response = llm.invoke(prompt)
    print(response)
except ImportError:
    logging.error("Failed to import Ollama from langchain_community. Is the package installed?")
except Exception as e:
    logging.error(f"An unexpected error occurred: {e}")

运行结果

调用Restful API

除了上述的测试代码进行调用之外也可以调用Restful API

curl http://localhost:11434/api/chat -d '{
  "model": "gemma:2b",
  "messages": [
    { "role": "user", "content": "hi, who are you?" }
  ]
}'

接收回复需要2-3秒钟，就目前来看Gemma使用起来相当不错。现在就可以将更复杂的LangChain作品移植到我们的本地大型语言模型（LLM），同时节省资金。

希望我的这个测试demo可以抛砖引玉给大家提供一些思路和想法。

Was this helpful?

0 / 0