Phrase tags (17)
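| Tag | English description | Chinese gloss (translated) |
|---|---|---|
| ADJP | Adjective phrase | Adjective phrase, projected from JJ |
| ADVP | Adverbial phrase headed by AD | Adverbial phrase beginning with an adverb; adverbial modifier |
| CLP | Classifier phrase | Classifier (measure word) phrase |
| CP | Clause headed by C (complementizer) | Clause introduced by a complementizer: complement clauses and relative clauses |
| DNP | Phrase formed by "XP+DEG" | Phrase built from the XP+DEG structure |
| DP | Determiner phrase | Determiner phrase |
| DVP | Phrase formed by "XP+DEV" | Phrase built from the XP+DEV structure |
| FRAG | Fragment | Fragment |
| IP | Inflection phrase | Simple clause headed by I (INFL or another inflectional element) |
| LCP | Phrase formed by "XP+LC" | Phrase headed by a localizer |
| LST | List marker | List-marker phrase used in explanatory enumerations |
| NP | Noun phrase | Noun phrase |
| PP | Preposition phrase | Prepositional phrase |
| PRN | Parenthetical | Parenthetical |
| QP | Quantifier phrase | Numeral phrase; a phrase built from a numeral and a classifier |
| UCP | Unidentical coordination phrase | Coordination of non-identical categories |
| VP | Verb phrase | Verb phrase |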
Verb compound tags (6)
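VCD: coordinated verb compound, e.g. (VCD (VV 投資) (VV 辦廠))
VCP: compound of the form VV+VC, i.e. verb + 是
VNV: A-not-A (A不A) and A-one-A (A一A) forms, e.g. (VNV (VV 能) (AD 不) (VV 能))
VPT: potential forms V-得-R or V-不-R, e.g. (VPT (VV 得) (AD 不) (VV 到))
VRD: verb-resultative compound, where the second element is the result of the first, e.g. (VRD (VV 呈現) (VV 出)); (VP (VRD (VV 聯合) (VV 起來)))
VSB: modifier + head compound; the first element is an intransitive verb, and no adjunct or aspect marker appears between the two elements, e.g. (VSB (VV 加速) (VV 建設)); (VP (VSB (VV 仰頭) (VV 望去)))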
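The bracketed examples above, such as (VP(VRD(VV 聯合) (VV 起來))), can be read mechanically. The following is a minimal sketch in plain Java of one way to do that. It assumes well-formed input with no empty categories or escaped brackets, and the names `BracketReader`, `Node`, and `collect` are hypothetical; they are not CoreNLP classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal reader for Penn-style bracketed trees (illustrative sketch only). */
final class BracketReader {

    /** A node is either phrasal (children, word == null) or a preterminal (word set). */
    static final class Node {
        final String label;
        final String word;
        final List<Node> children = new ArrayList<>();
        Node(String label, String word) { this.label = label; this.word = word; }
    }

    private final String text;
    private int pos = 0;

    private BracketReader(String text) { this.text = text; }

    /** Parses exactly one tree and rejects trailing input. */
    static Node parse(String text) {
        BracketReader r = new BracketReader(text);
        Node root = r.readNode();
        r.skipSpaces();
        if (r.pos != text.length()) throw new IllegalArgumentException("trailing input at " + r.pos);
        return root;
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }

    private void expect(char c) {
        if (pos >= text.length() || text.charAt(pos) != c)
            throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        pos++;
    }

    /** Reads a label or word: everything up to a bracket or whitespace. */
    private String readToken() {
        skipSpaces();
        int start = pos;
        while (pos < text.length()) {
            char c = text.charAt(pos);
            if (c == '(' || c == ')' || Character.isWhitespace(c)) break;
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("expected token at " + pos);
        return text.substring(start, pos);
    }

    private Node readNode() {
        skipSpaces();
        expect('(');
        String label = readToken();
        skipSpaces();
        if (pos < text.length() && text.charAt(pos) != '(') {
            // Preterminal such as (VV 聯合): a POS tag followed by one word.
            String word = readToken();
            skipSpaces();
            expect(')');
            return new Node(label, word);
        }
        // Phrasal node: read daughters until the closing bracket.
        Node node = new Node(label, null);
        skipSpaces();
        while (pos < text.length() && text.charAt(pos) == '(') {
            node.children.add(readNode());
            skipSpaces();
        }
        expect(')');
        return node;
    }

    /** Collects phrasal labels in pre-order and concatenates the words. */
    static void collect(Node n, List<String> phraseLabels, StringBuilder words) {
        if (n.word != null) { words.append(n.word); return; }
        phraseLabels.add(n.label);
        for (Node c : n.children) collect(c, phraseLabels, words);
    }

    public static void main(String[] args) {
        Node tree = parse("(VP(VRD(VV 聯合) (VV 起來)))");
        List<String> labels = new ArrayList<>();
        StringBuilder words = new StringBuilder();
        collect(tree, labels, words);
        System.out.println(labels + " " + words);
    }
}
```

For the VRD example, the phrasal labels collected are VP and VRD, and the two VV preterminals contribute the words.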
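NP
A phrase whose head is a noun. Grammatically, the term is used in two senses: (1) a phrase defined by its syntactic role, e.g. a chunk that serves as the subject or object of a sentence, in which case an auxiliary label such as NP-Sbj or NP-Obj can be added; (2) an entity or attribute in a knowledge base; a chunk of this kind is called a baseNP.
VP
A semantic chunk centered on a verb, together with the elements that modify, restrict, or are coordinated with it.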
Source code in CoreNLP (head-finding rules for these tags)
nonTerminalInfo.put("ROOT", new String[][]{{left, "IP"}});
nonTerminalInfo.put("PAIR", new String[][]{{left, "IP"}});
// Major syntactic categories
nonTerminalInfo.put("ADJP", new String[][]{{left, "JJ", "ADJP"}}); // there is one ADJP unary rewrite to AD but otherwise all have JJ or ADJP
nonTerminalInfo.put("ADVP", new String[][]{{left, "AD", "CS", "ADVP", "JJ"}}); // CS is a subordinating conjunctor, and there are a couple of ADVP->JJ unary rewrites
nonTerminalInfo.put("CLP", new String[][]{{right, "M", "CLP"}});
// nonTerminalInfo.put("CP", new String[][]{{left, "WHNP", "IP", "CP", "VP"}}); // this is complicated; see bracketing guide p. 34. Actually, all WHNP are empty. IP/CP seems to be the best semantic head; syntax would dictate DEC/ADVP. Using IP/CP/VP/M is INCREDIBLY bad for Dep parser - lose 3% absolute.
nonTerminalInfo.put("CP", new String[][]{{right, "DEC", "WHNP", "WHPP"}, rightExceptPunct}); // the (syntax-oriented) right-first head rule
// nonTerminalInfo.put("CP", new String[][]{{right, "DEC", "ADVP", "CP", "IP", "VP", "M"}}); // the (syntax-oriented) right-first head rule
nonTerminalInfo.put("DNP", new String[][]{{right, "DEG", "DEC"}, rightExceptPunct}); // according to tgrep2, first preparation, all DNPs have a DEG daughter
nonTerminalInfo.put("DP", new String[][]{{left, "DT", "DP"}}); // there's one instance of DP adjunction
nonTerminalInfo.put("DVP", new String[][]{{right, "DEV", "DEC"}}); // DVP always has DEV under it
nonTerminalInfo.put("FRAG", new String[][]{{right, "VV", "NN"}, rightExceptPunct}); // FRAG seems only to be used for bits at the beginnings of articles: "Xinwenshe<DATE>" and "(wan)"
nonTerminalInfo.put("INTJ", new String[][]{{right, "INTJ", "IJ", "SP"}});
nonTerminalInfo.put("IP", new String[][]{{left, "VP", "IP"}, rightExceptPunct}); // CDM July 2010 following email from Pi-Chuan changed preference to VP over IP: IP can be -SBJ, -OBJ, or -ADV, and shouldn't be head
nonTerminalInfo.put("LCP", new String[][]{{right, "LC", "LCP"}}); // there's a bit of LCP adjunction
nonTerminalInfo.put("LST", new String[][]{{right, "CD", "PU"}}); // covers all examples
nonTerminalInfo.put("NP", new String[][]{{right, "NN", "NR", "NT", "NP", "PN", "CP"}}); // Basic heads are NN/NR/NT/NP; PN is pronoun. Some NPs are nominalized relative clauses without overt nominal material; these are NP->CP unary rewrites. Finally, note that this doesn't give any special treatment of coordination.
nonTerminalInfo.put("PP", new String[][]{{left, "P", "PP"}}); // in the manual there's an example of VV heading PP but I couldn't find such an example with tgrep2
// cdm 2006: PRN changed to not choose punctuation. Helped parsing (if not significantly)
// nonTerminalInfo.put("PRN", new String[][]{{left, "PU"}}); // presumably left/right doesn't matter
nonTerminalInfo.put("PRN", new String[][]{{left, "NP", "VP", "IP", "QP", "PP", "ADJP", "CLP", "LCP"}, {rightdis, "NN", "NR", "NT", "FW"}});
// cdm 2006: QP: add OD -- occurs some; occasionally NP, NT, M; parsing performance no-op
nonTerminalInfo.put("QP", new String[][]{{right, "QP", "CLP", "CD", "OD", "NP", "NT", "M"}}); // there's some QP adjunction
// add OD?
nonTerminalInfo.put("UCP", new String[][]{{left}}); // an alternative would be "PU", "CC"
nonTerminalInfo.put("VP", new String[][]{{left, "VP", "VCD", "VPT", "VV", "VCP", "VA", "VC", "VE", "IP", "VSB", "VCP", "VRD", "VNV"}, leftExceptPunct}); // note that ba and long bei introduce IP-OBJ small clauses; short bei introduces VP
// add BA, LB, as needed
// verb compounds
nonTerminalInfo.put("VCD", new String[][]{{left, "VCD", "VV", "VA", "VC", "VE"}}); // could easily be right instead
nonTerminalInfo.put("VCP", new String[][]{{left, "VCD", "VV", "VA", "VC", "VE"}}); // not much info from documentation
nonTerminalInfo.put("VRD", new String[][]{{left, "VCD", "VRD", "VV", "VA", "VC", "VE"}}); // definitely left
nonTerminalInfo.put("VSB", new String[][]{{right, "VCD", "VSB", "VV", "VA", "VC", "VE"}}); // definitely right, though some examples look questionably classified (na2lai2 zhi1fu4)
nonTerminalInfo.put("VNV", new String[][]{{left, "VV", "VA", "VC", "VE"}}); // left/right doesn't matter
nonTerminalInfo.put("VPT", new String[][]{{left, "VV", "VA", "VC", "VE"}}); // activity verb is to the left
// some POS tags apparently sit where phrases are supposed to be
nonTerminalInfo.put("CD", new String[][]{{right, "CD"}});
nonTerminalInfo.put("NN", new String[][]{{right, "NN"}});
nonTerminalInfo.put("NR", new String[][]{{right, "NR"}});
// I'm adding these POS tags to do primitive morphology for character-level
// parsing. It shouldn't affect anything else because heads of preterminals are not
// generally queried - GMA
nonTerminalInfo.put("VV", new String[][]{{left}});
nonTerminalInfo.put("VA", new String[][]{{left}});
nonTerminalInfo.put("VC", new String[][]{{left}});
nonTerminalInfo.put("VE", new String[][]{{left}});
// new for ctb6.
nonTerminalInfo.put("FLR", new String[][]{rightExceptPunct});
// new for CTB9
nonTerminalInfo.put("DFL", new String[][]{rightExceptPunct});
nonTerminalInfo.put("EMO", new String[][]{leftExceptPunct}); // left/right doesn't matter
nonTerminalInfo.put("INC", new String[][]{leftExceptPunct});
nonTerminalInfo.put("INTJ", new String[][]{leftExceptPunct});
nonTerminalInfo.put("OTH", new String[][]{leftExceptPunct});
nonTerminalInfo.put("SKIP", new String[][]{leftExceptPunct});
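
Each entry above maps a parent category to one or more rules of the form {direction, category, category, ...}, tried in order. To make the mechanics concrete, here is a simplified sketch in plain Java of how Collins-style rules of this shape are commonly applied. It is not CoreNLP's code. It assumes that `left` and `right` hold the strings "left" and "right", that `rightdis` is "rightdis", and that `rightExceptPunct` and `leftExceptPunct` are {"rightexcept", "PU"} and {"leftexcept", "PU"}; none of those definitions appear in the excerpt. The class and method names (`HeadRuleSketch`, `applyRule`, `findHead`) are hypothetical, and the sketch does no stripping of functional tags such as -SBJ.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified Collins-style head selection over daughter labels (illustrative sketch only). */
final class HeadRuleSketch {

    /**
     * Applies one rule. rule[0] is the direction, the rest are categories.
     * Returns the index of the chosen daughter, or -1 if the rule does not match
     * and is not the last rule for this parent.
     */
    static int applyRule(String[] rule, List<String> daughters, boolean lastRule) {
        String how = rule[0];
        List<String> cats = Arrays.asList(rule).subList(1, rule.length);
        boolean fromLeft = how.startsWith("left");
        int n = daughters.size();
        switch (how) {
            case "left":
            case "right":
                // Category-major: try categories in priority order, scanning from one end.
                for (String cat : cats) {
                    for (int k = 0; k < n; k++) {
                        int i = fromLeft ? k : n - 1 - k;
                        if (daughters.get(i).equals(cat)) return i;
                    }
                }
                break;
            case "leftdis":
            case "rightdis":
                // Position-major: first daughter from that end matching any listed category.
                for (int k = 0; k < n; k++) {
                    int i = fromLeft ? k : n - 1 - k;
                    if (cats.contains(daughters.get(i))) return i;
                }
                break;
            case "leftexcept":
            case "rightexcept":
                // First daughter from that end that is NOT one of the listed categories.
                for (int k = 0; k < n; k++) {
                    int i = fromLeft ? k : n - 1 - k;
                    if (!cats.contains(daughters.get(i))) return i;
                }
                break;
            default:
                throw new IllegalArgumentException("unknown direction: " + how);
        }
        // Fallback for the last rule: leftmost or rightmost daughter.
        return lastRule ? (fromLeft ? 0 : n - 1) : -1;
    }

    /** Tries the rules for one parent category in order and returns the head index. */
    static int findHead(String[][] rules, List<String> daughters) {
        if (daughters.isEmpty()) throw new IllegalArgumentException("no daughters");
        for (int r = 0; r < rules.length; r++) {
            int head = applyRule(rules[r], daughters, r == rules.length - 1);
            if (head >= 0) return head;
        }
        throw new IllegalStateException("no rule selected a head");
    }

    public static void main(String[] args) {
        String[][] cp = {{"right", "DEC", "WHNP", "WHPP"}, {"rightexcept", "PU"}};
        String[][] np = {{"right", "NN", "NR", "NT", "NP", "PN", "CP"}};
        System.out.println(findHead(cp, Arrays.asList("IP", "DEC")));
        System.out.println(findHead(cp, Arrays.asList("IP", "PU")));
        System.out.println(findHead(np, Arrays.asList("NN", "CC", "NR")));
    }
}
```

Under these assumptions, the CP rule picks DEC when it is present and otherwise falls through to the rightmost non-punctuation daughter. The NP call shows that a plain "right" rule is category-major: NN outranks NR in the list, so the NN at index 0 is chosen even though NR sits further right.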