Stanford CoreNLP: the 23 phrase categories of the Penn Chinese Treebank

Phrase labels (17)

Label   Description
ADJP    Adjective phrase, projected from JJ
ADVP    Adverbial phrase headed by AD (adverb); functions as an adverbial
CLP     Classifier phrase
CP      Clause headed by C (complementizer): complement clauses and relative clauses
DNP     Phrase formed by "XP + DEG"
DP      Determiner phrase
DVP     Phrase formed by "XP + DEV"
FRAG    Fragment
IP      Inflectional phrase: simple clause headed by I (INFL or another inflectional element)
LCP     Phrase formed by "XP + LC", headed by a localizer
LST     List marker: phrase used to number items in explanatory lists
NP      Noun phrase
PP      Preposition phrase
PRN     Parenthetical
QP      Quantifier phrase: numeral phrase built from numerals and measure words
UCP     Unidentical coordination phrase (coordination of unlike categories)
VP      Verb phrase



Verb compound labels (6)

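VCD: coordinated verb compound, e.g. (VCD (VV 投資) (VV 辦廠))
VCP: verb compound of the form VV + VC, i.e. verb + 是
VNV: A-not-A (A不A) and A-one-A (A一A) forms, e.g. (VNV (VV 能) (AD 不) (VV 能))
VPT: V-得-R or V-不-R forms, e.g. (VPT (VV 得) (AD 不) (VV 到))
VRD: verb-resultative compound, in which the second element is the result of the first, e.g. (VRD (VV 呈現) (VV 出)); (VP (VRD (VV 聯合) (VV 起來)))
VSB: modifier + head compound; the first element is an intransitive verb, and no adjunct or aspect marker stands between the two elements, e.g. (VSB (VV 加速) (VV 建設)); (VP (VSB (VV 仰頭) (VV 望去)))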
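
To make the bracketed notation used in these examples concrete, here is a minimal sketch that pulls the labels out of a bracketed string and keeps the verb-compound ones. The class and method names (`BracketLabels`, `labels`, `verbCompounds`) are made up for illustration and are not part of CoreNLP; the input tokens are romanized so the file compiles under any source encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not a CoreNLP class.
final class BracketLabels {
    // The six verb-compound labels listed above.
    static final Set<String> VERB_COMPOUNDS =
            Set.of("VCD", "VCP", "VNV", "VPT", "VRD", "VSB");

    /** Returns every label that directly follows an opening parenthesis, in order. */
    static List<String> labels(String bracketed) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < bracketed.length(); i++) {
            if (bracketed.charAt(i) != '(') {
                continue;
            }
            int j = i + 1;
            // A label runs until whitespace or the next parenthesis.
            while (j < bracketed.length()
                    && !Character.isWhitespace(bracketed.charAt(j))
                    && bracketed.charAt(j) != '('
                    && bracketed.charAt(j) != ')') {
                j++;
            }
            if (j > i + 1) {
                out.add(bracketed.substring(i + 1, j));
            }
        }
        return out;
    }

    /** Keeps only the labels that are verb-compound categories. */
    static List<String> verbCompounds(String bracketed) {
        List<String> out = new ArrayList<>();
        for (String label : labels(bracketed)) {
            if (VERB_COMPOUNDS.contains(label)) {
                out.add(label);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Romanized version of the VRD example above.
        String tree = "(VP (VRD (VV lianhe) (VV qilai)))";
        System.out.println(labels(tree));
        System.out.println(verbCompounds(tree));
    }
}
```

This only scans for labels; it does not build a tree or check that the parentheses balance.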

NP

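A phrase whose head is a noun. Grammatically it can be read in two ways: (1) a phrase defined by its syntactic role, as when the chunk serves as the subject or object of a sentence, in which case a function tag can be added (NP-SBJ, NP-OBJ); (2) an entity or attribute in a knowledge base, in which case the chunk is called a baseNP.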

VP

A semantic chunk centered on a verb, together with its modifying, restricting, and coordinated constituents.
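
Labels that carry a function tag after a hyphen, such as NP-SBJ, usually need to be reduced to the bare category before they can be matched against the tables above. A minimal sketch of that split follows; `LabelParts` and `split` are hypothetical names for illustration, and the rule "everything after the first hyphen is the tag" is an assumption of this sketch.

```java
// Hypothetical helper, not a CoreNLP class.
final class LabelParts {
    final String category;     // bare phrase category, e.g. "NP"
    final String functionTag;  // function tag without the hyphen, or "" if none

    private LabelParts(String category, String functionTag) {
        this.category = category;
        this.functionTag = functionTag;
    }

    /** Splits a label at its first hyphen. */
    static LabelParts split(String label) {
        int dash = label.indexOf('-');
        // A label with no hyphen, or one that starts with a hyphen, is kept whole.
        if (dash <= 0) {
            return new LabelParts(label, "");
        }
        return new LabelParts(label.substring(0, dash), label.substring(dash + 1));
    }

    public static void main(String[] args) {
        for (String label : new String[]{"NP-SBJ", "NP-OBJ", "VP"}) {
            LabelParts p = split(label);
            System.out.println(label + " -> category=" + p.category + ", tag=" + p.functionTag);
        }
    }
}
```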

 

Source code in CoreNLP

nonTerminalInfo.put("ROOT",new String[][]{{left, "IP"}});
nonTerminalInfo.put("PAIR",new String[][]{{left, "IP"}});

// Major syntactic categories
nonTerminalInfo.put("ADJP",new String[][]{{left, "JJ","ADJP"}}); // there is one ADJP unary rewrite to AD but otherwise all have JJ or ADJP
nonTerminalInfo.put("ADVP",new String[][]{{left, "AD","CS", "ADVP","JJ"}}); // CS is a subordinating conjunctor, and there are a couple of ADVP->JJ unary rewrites
nonTerminalInfo.put("CLP",new String[][]{{right, "M","CLP"}});
//nonTerminalInfo.put("CP", new String[][] {{left,"WHNP","IP","CP","VP"}}); // this is complicated; see bracketing guide p. 34. Actually, all WHNP are empty. IP/CP seems to be the best semantic head; syntax would dictate DEC/ADVP. Using IP/CP/VP/M is INCREDIBLY bad for Dep parser - lose 3% absolute.
nonTerminalInfo.put("CP",new String[][]{{right, "DEC","WHNP", "WHPP"},rightExceptPunct}); // the (syntax-oriented) right-first head rule
// nonTerminalInfo.put("CP", new String[][]{{right, "DEC","ADVP", "CP", "IP", "VP","M"}}); // the (syntax-oriented) right-first head rule
nonTerminalInfo.put("DNP",new String[][]{{right, "DEG","DEC"}, rightExceptPunct});//according to tgrep2, first preparation, all DNPs have a DEG daughter
nonTerminalInfo.put("DP",new String[][]{{left, "DT","DP"}}); // there's one instance of DP adjunction
nonTerminalInfo.put("DVP",new String[][]{{right, "DEV","DEC"}}); // DVP always has DEV under it
nonTerminalInfo.put("FRAG",new String[][]{{right, "VV","NN"}, rightExceptPunct}); // FRAG seems only to be used for bits at the beginnings of articles: "Xinwenshe<DATE>" and "(wan)"
nonTerminalInfo.put("INTJ",new String[][]{{right, "INTJ","IJ", "SP"}});
nonTerminalInfo.put("IP",new String[][]{{left, "VP","IP"}, rightExceptPunct}); // CDM July 2010 following email from Pi-Chuan changed preference to VP over IP: IP can be -SBJ, -OBJ, or -ADV, and shouldn't be head
nonTerminalInfo.put("LCP",new String[][]{{right, "LC","LCP"}}); // there's a bit of LCP adjunction
nonTerminalInfo.put("LST",new String[][]{{right, "CD","PU"}}); // covers all examples
nonTerminalInfo.put("NP",new String[][]{{right, "NN","NR", "NT","NP", "PN","CP"}}); // Basic heads are NN/NR/NT/NP; PN is pronoun.  Some NPs are nominalized relative clauses without overt nominal material; these are NP->CP unary rewrites.  Finally, note that this doesn't give any special treatment of coordination.
nonTerminalInfo.put("PP",new String[][]{{left, "P","PP"}}); // in the manual there's an example of VV heading PP but I couldn't find such an example with tgrep2
// cdm 2006: PRN changed to not choose punctuation.  Helped parsing (if not significantly)
// nonTerminalInfo.put("PRN", new String[][]{{left,"PU"}}); //presumably left/right doesn't matter
nonTerminalInfo.put("PRN",new String[][]{{left, "NP","VP", "IP","QP", "PP","ADJP", "CLP","LCP"}, {rightdis, "NN","NR", "NT","FW"}});
// cdm 2006: QP: add OD -- occurs some; occasionally NP, NT, M; parsing performance no-op
nonTerminalInfo.put("QP",new String[][]{{right, "QP","CLP", "CD","OD", "NP","NT", "M"}});//there's some QP adjunction
// add OD?
nonTerminalInfo.put("UCP",new String[][]{{left, }}); // an alternative would be "PU","CC"
nonTerminalInfo.put("VP",new String[][]{{left, "VP","VCD", "VPT","VV", "VCP","VA", "VC","VE", "IP","VSB", "VCP","VRD", "VNV"},leftExceptPunct}); // note that ba and long bei introduce IP-OBJ small clauses; short bei introduces VP
// add BA, LB, as needed

// verb compounds
nonTerminalInfo.put("VCD",new String[][]{{left, "VCD","VV", "VA","VC", "VE"}});//could easily be right instead
nonTerminalInfo.put("VCP",new String[][]{{left, "VCD","VV", "VA","VC", "VE"}}); // not much info from documentation
nonTerminalInfo.put("VRD",new String[][]{{left, "VCD","VRD", "VV","VA", "VC","VE"}}); // definitely left
nonTerminalInfo.put("VSB",new String[][]{{right, "VCD","VSB", "VV","VA", "VC","VE"}}); // definitely right, though some examples look questionably classified (na2lai2 zhi1fu4)
nonTerminalInfo.put("VNV",new String[][]{{left, "VV","VA", "VC","VE"}}); // left/right doesn't matter
nonTerminalInfo.put("VPT",new String[][]{{left, "VV","VA", "VC","VE"}}); // activity verb is to the left

// some POS tags apparently sit where phrases are supposed to be
nonTerminalInfo.put("CD",new String[][]{{right, "CD"}});
nonTerminalInfo.put("NN",new String[][]{{right, "NN"}});
nonTerminalInfo.put("NR",new String[][]{{right, "NR"}});

// I'm adding these POS tags to do primitive morphology for character-level
// parsing.  It shouldn't affect anything else because heads of preterminals are not
// generally queried - GMA
nonTerminalInfo.put("VV",new String[][]{{left}});
nonTerminalInfo.put("VA",new String[][]{{left}});
nonTerminalInfo.put("VC",new String[][]{{left}});
nonTerminalInfo.put("VE",new String[][]{{left}});

// new for ctb6.
nonTerminalInfo.put("FLR",new String[][]{rightExceptPunct});

// new for CTB9
nonTerminalInfo.put("DFL",new String[][]{rightExceptPunct});
nonTerminalInfo.put("EMO",new String[][]{leftExceptPunct});//left/right doesn't matter
nonTerminalInfo.put("INC",new String[][]{leftExceptPunct});
nonTerminalInfo.put("INTJ",new String[][]{leftExceptPunct});
nonTerminalInfo.put("OTH",new String[][]{leftExceptPunct});
nonTerminalInfo.put("SKIP",new String[][]{leftExceptPunct}); 
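
Each entry in the listing above pairs a parent category with one or more rules of the form {direction, category1, category2, ...}. The sketch below shows one plausible reading of how such a rule picks the head child: the categories are tried in priority order, and for each one the children are scanned from the named side. It is a simplified illustration, not the CoreNLP implementation. Only plain left and right rules are modelled (rightdis, leftExceptPunct, and rightExceptPunct are left out), the IP entry drops its rightExceptPunct fallback, and the final fallback to the edge child is an assumption of this sketch. `HeadRuleSketch` and `headIndex` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified head finder, not a CoreNLP class.
final class HeadRuleSketch {
    static final String LEFT = "left";
    static final String RIGHT = "right";

    // A few rules taken from the listing above, restricted to plain left/right rules.
    static final Map<String, String[][]> RULES = new HashMap<>();
    static {
        RULES.put("IP", new String[][]{{LEFT, "VP", "IP"}});
        RULES.put("NP", new String[][]{{RIGHT, "NN", "NR", "NT", "NP", "PN", "CP"}});
        RULES.put("PP", new String[][]{{LEFT, "P", "PP"}});
        RULES.put("VSB", new String[][]{{RIGHT, "VCD", "VSB", "VV", "VA", "VC", "VE"}});
    }

    /** Returns the index of the head child, or -1 if there are no children. */
    static int headIndex(String parent, List<String> children) {
        if (children.isEmpty()) {
            return -1;
        }
        boolean fromLeft = true;
        String[][] rules = RULES.get(parent);
        if (rules != null) {
            for (String[] rule : rules) {
                fromLeft = rule[0].equals(LEFT);
                // Categories are tried in priority order; each one scans all children.
                for (int c = 1; c < rule.length; c++) {
                    int idx = fromLeft
                            ? children.indexOf(rule[c])
                            : children.lastIndexOf(rule[c]);
                    if (idx >= 0) {
                        return idx;
                    }
                }
            }
        }
        // Fallback (assumption of this sketch): the edge child on the scanning side.
        return fromLeft ? 0 : children.size() - 1;
    }

    public static void main(String[] args) {
        // In an IP with children NP, ADVP, VP, the VP is chosen as head.
        System.out.println(headIndex("IP", List.of("NP", "ADVP", "VP")));
        // In an NP with children NR, NN, PU, the NN wins over NR by priority.
        System.out.println(headIndex("NP", List.of("NR", "NN", "PU")));
    }
}
```

Note that priority comes before position: in the NP example the scan runs from the right, yet NN is chosen over a category that appears later in the rule even if that category sits further right.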

