✨ Today I'll show you how to use spaCy on a 恒源云 (Hengyuan Cloud) GPU server.

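As many of you probably know, spaCy is a natural language processing library. It provides tokenization, part-of-speech tagging, lemmatization, named entity recognition, noun phrase extraction and more.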
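Here is a minimal sketch of the first feature, tokenization. It uses `spacy.blank("en")`, which builds an English pipeline that contains only the rule-based tokenizer, so no model download is needed. Tagging, lemmatization, named entity recognition and noun phrase extraction require a trained pipeline such as `en_core_web_sm`, which is installed below. The sample sentence is an arbitrary illustration.

```python
import spacy

# A blank English pipeline: only the rule-based tokenizer, no tagger,
# parser or NER components, and no trained model to download.
nlp = spacy.blank("en")

doc = nlp("Hello, world! spaCy splits punctuation into separate tokens.")
tokens = [token.text for token in doc]
print(tokens)
```

The blank pipeline is handy for checking that spaCy itself is installed correctly before you download any models.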


```shell
# Install spaCy 3 for CUDA 11.2; change the version inside [] to match your image's CUDA version
pip install "spacy[cuda112]==3.0.6"

# Install spaCy 2 for CUDA 11.2; change the version inside [] to match your image's CUDA version
pip install "spacy[cuda112]==2.3.5"

# Downloading the model through the spacy module may fail behind the Great Firewall; if it does, install it with pip as shown below
python -m spacy download en_core_web_sm

# Install en_core_web_sm 3.0.0 (for spaCy 3.0.x)
pip install https://mirror.ghproxy.com/https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0-py3-none-any.whl --no-cache-dir

# Install en_core_web_sm 2.3.1 (for spaCy 2.3.x)
pip install https://mirror.ghproxy.com/https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz --no-cache-dir
```
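To work out which extra goes inside `[]`, one option is to read the CUDA toolkit version reported by `nvcc --version` and turn, for example, release 11.2 into `cuda112`. The helpers below are a hypothetical sketch: `cuda_extra_from_nvcc` and `spacy_requirement` are names made up for this example, and they assume the `cudaXYZ` naming seen in `spacy[cuda112]`. Not every spaCy release defines an extra for every CUDA version, so check the extras available for your spaCy version. Also note that `nvcc` reports the toolkit version, which can differ from the driver's CUDA version shown by `nvidia-smi`.

```python
import re
import shutil
import subprocess
from typing import Optional


def cuda_extra_from_nvcc(nvcc_output: str) -> Optional[str]:
    """Turn `nvcc --version` output into an extra name such as "cuda112".

    Hypothetical helper: assumes spaCy names its CUDA extras "cuda" + major + minor.
    """
    # nvcc prints a line like: "Cuda compilation tools, release 11.2, V11.2.152"
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    major, minor = match.groups()
    return f"cuda{major}{minor}"


def spacy_requirement(extra: str, spacy_version: str) -> str:
    """Build a pip requirement string such as "spacy[cuda112]==3.0.6"."""
    return f"spacy[{extra}]=={spacy_version}"


if __name__ == "__main__":
    # Only query nvcc if it is on PATH; otherwise just report that it is missing.
    if shutil.which("nvcc") is None:
        print("nvcc not found; check the image's CUDA version another way")
    else:
        out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
        extra = cuda_extra_from_nvcc(out)
        if extra is None:
            print("Could not parse the CUDA version from nvcc output")
        else:
            print(f'pip install "{spacy_requirement(extra, "3.0.6")}"')
```

Quoting the requirement string keeps shells such as zsh from treating `[...]` as a filename pattern.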


```python
import spacy

# Load English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_sm")

# Process whole documents
text = ("When Sebastian Thrun started working on self-driving cars at "
        "Google in 2007, few people outside of the company took him "
        "seriously. “I can tell you very senior CEOs of major American "
        "car companies would shake my hand and turn away because I wasn’t "
        "worth talking to,” said Thrun, in an interview with Recode earlier "
        "this week.")
doc = nlp(text)

# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])

# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)
```
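The example above runs on the CPU unless spaCy is told to use the GPU. Here is a minimal sketch of the GPU-side steps. It assumes spaCy was installed with a CUDA extra so that CuPy is available. It calls `spacy.prefer_gpu()` before any `spacy.load()`, then processes many texts in batches with `nlp.pipe`. The helper names `enable_gpu_if_available` and `entities_in_batches`, and the default `batch_size` of 64, are illustrative choices, not part of the original article.

```python
from typing import Iterable, List, Tuple

import spacy
from spacy.language import Language


def enable_gpu_if_available() -> bool:
    """Ask spaCy to use the GPU. Call this before spacy.load()."""
    # prefer_gpu() returns True if it activated a GPU and False if spaCy
    # stays on the CPU (for example, no GPU or CuPy is not installed).
    return bool(spacy.prefer_gpu())


def entities_in_batches(
    nlp: Language, texts: Iterable[str], batch_size: int = 64
) -> List[List[Tuple[str, str]]]:
    """Run the pipeline over many texts and collect (text, label) entity pairs."""
    results = []
    # nlp.pipe streams the texts through the pipeline in batches, which is
    # usually faster than calling nlp(text) once per text.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        results.append([(ent.text, ent.label_) for ent in doc.ents])
    return results
```

A typical sequence is `enable_gpu_if_available()`, then `nlp = spacy.load("en_core_web_sm")`, then `entities_in_batches(nlp, texts)`. `en_core_web_sm` is a small pipeline designed for the CPU, so the GPU speedup may be modest. If you would rather get an error than fall back to the CPU without notice, use `spacy.require_gpu()` instead of `prefer_gpu()`.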

