summary2[3.11-3.17]

Original

Ensheng Shi

2020-06-07 21:47

This week's main work was reading Code Completion with Neural Attention and Pointer Networks and trying to implement it.

summary

Ⅰ Implementing Code Completion with Neural Attention and Pointer Networks

Ⅱ Some thoughts on code auto-completion

Ⅰ Implementing Code Completion with Neural Attention and Pointer Networks

There is no source code for this paper on GitHub, so I have to implement it myself. The dataset is public.

1.DataSet

The dataset consists of the ASTs of 150k JavaScript and Python code snippets, stored in JSON files.
Example (Python):

2. Data processing

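The paper flattens each AST before processing it.

After reading the JSON file in Python, we can flatten the AST into a list with a depth-first (pre-order) traversal, following the paper. This is also the order in which the nodes are already stored in the JSON file.
Take the snippet from the figure as an example: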

x = 7
print x+1

The AST of this code snippet is:

[ {"type":"Module","children":[1,4]},
    {"type":"Assign","children":[2,3]},
      {"type":"NameStore","value":"x"},
      {"type":"Num","value":"7"},
    {"type":"Print","children":[5]},
      {"type":"BinOpAdd","children":[6,7]},
        {"type":"NameLoad","value":"x"},
        {"type":"Num","value":"1"} ]

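Following the paper, we traverse the tree depth-first and give every non-leaf node the value empty (a breadth-first traversal would instead put Print before NameStore). After processing we get: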

[{'type': 'Module', 'value': 'empty'}, 
{'type': 'Assign', 'value': 'empty'},
{'type': 'NameStore', 'value': 'x'},
{'type': 'Num', 'value': '7'},
{'type': 'Print', 'value': 'empty'},
{'type': 'BinOpAdd', 'value': 'empty'},
{'type': 'NameLoad', 'value': 'x'}, 
{'type': 'Num', 'value': '1'}]
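
The flattening step above can be sketched in a few lines of Python. This is only a minimal sketch of my own, not the paper's code: the function names `flatten_dfs` and `flatten_bfs` are made up here, and I assume the root is always node 0 and that `children` holds indices into the same list. The breadth-first variant is included only to show that it produces a different order from the list above.

```python
import json
from collections import deque

# The AST from the example above, exactly as stored in the JSON file.
AST_JSON = """
[ {"type":"Module","children":[1,4]},
  {"type":"Assign","children":[2,3]},
  {"type":"NameStore","value":"x"},
  {"type":"Num","value":"7"},
  {"type":"Print","children":[5]},
  {"type":"BinOpAdd","children":[6,7]},
  {"type":"NameLoad","value":"x"},
  {"type":"Num","value":"1"} ]
"""


def to_pair(node):
    """Keep only type and value; nodes without a value get 'empty'."""
    return {"type": node["type"], "value": node.get("value", "empty")}


def flatten_dfs(nodes, root=0):
    """Depth-first (pre-order) flattening, children visited left to right."""
    out, stack = [], [root]
    while stack:
        node = nodes[stack.pop()]
        out.append(to_pair(node))
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(node.get("children", [])))
    return out


def flatten_bfs(nodes, root=0):
    """Breadth-first flattening, shown only for comparison."""
    out, queue = [], deque([root])
    while queue:
        node = nodes[queue.popleft()]
        out.append(to_pair(node))
        queue.extend(node.get("children", []))
    return out


nodes = json.loads(AST_JSON)
dfs_types = [n["type"] for n in flatten_dfs(nodes)]
bfs_types = [n["type"] for n in flatten_bfs(nodes)]
print(dfs_types)
print(bfs_types)
```

The depth-first order matches the processed list above; the breadth-first order visits Print right after Assign.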

3. confused

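1. I don't quite understand how the paper generates its queries.

2. This is the attentional LSTM. Its input is the concatenation of the green (type) vector and the yellow (value) vector. If a yellow (value) token in the test set is a rare word that is not in the vocabulary, it has no corresponding vector, so how is it fed into the network?

3. There are also some other implementation-detail problems. I am trying to solve them and will keep a record.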
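
On question 2 above, one common convention is to reserve a special `<UNK>` entry in the value vocabulary and map every out-of-vocabulary value to it. The sketch below shows only that convention. I still need to check it against the paper, and the vector sizes, the tiny vocabularies, and the helper names `make_table` and `node_input` are all made up for illustration.

```python
import random

random.seed(0)
TYPE_DIM, VALUE_DIM = 4, 6  # illustrative sizes, not taken from the paper


def make_table(vocab, dim):
    """Random embedding table: one vector per vocabulary entry."""
    return {tok: [random.uniform(-1, 1) for _ in range(dim)] for tok in vocab}


type_emb = make_table(
    ["Module", "Assign", "NameStore", "Num", "Print", "BinOpAdd", "NameLoad"],
    TYPE_DIM,
)
# '<UNK>' stands in for every value that is not in the vocabulary.
value_emb = make_table(["<UNK>", "empty", "x", "7", "1"], VALUE_DIM)


def node_input(node):
    """Concatenate the type vector and the value vector of one AST node."""
    type_vec = type_emb[node["type"]]
    value_vec = value_emb.get(node["value"], value_emb["<UNK>"])
    return type_vec + value_vec  # list concatenation


seen = node_input({"type": "NameLoad", "value": "x"})
rare = node_input({"type": "NameLoad", "value": "my_rare_identifier"})
print(len(seen), len(rare))
```

With this convention a rare value still yields an input of the right size, but all rare values look identical to the network.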

4. Next work

First implement the LSTM, the attention network, and the pointer network on other data.
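
Before building the real models, the two small pieces I need to get right can be tried on toy numbers. The sketch below is a simplification of my own, not the paper's formulation: it uses plain dot-product attention (the paper's scoring function may differ), a fixed `gate` value where a trained model would learn one, and made-up vectors. The mixing step concatenates a scaled vocabulary distribution with a scaled distribution over context positions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, memory):
    """Dot-product attention: weights over memory and the context vector."""
    scores = [sum(q * h for q, h in zip(query, hidden)) for hidden in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    context = [sum(w * h[d] for w, h in zip(weights, memory)) for d in range(dim)]
    return weights, context


def pointer_mixture(vocab_probs, pointer_probs, gate):
    """Mix two distributions: gate * vocabulary part, (1 - gate) * pointer part."""
    return [gate * p for p in vocab_probs] + [(1 - gate) * p for p in pointer_probs]


# Toy data: three previous hidden states and the current hidden state.
memory = [[0.1, 0.3], [0.5, -0.2], [0.4, 0.9]]
query = [0.2, 0.7]
weights, context = attention(query, memory)

vocab_probs = softmax([1.0, 0.5, -0.3, 0.0])  # distribution over 4 vocabulary words
mixed = pointer_mixture(vocab_probs, weights, gate=0.7)
print(mixed)
```

The first four entries of `mixed` score vocabulary words and the last three score positions in the context, so a token that is not in the vocabulary can still be predicted by pointing at where it occurred.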

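Ⅱ Some thoughts on code auto-completion

1. The lexeme-based papers I read earlier do code completion, but that does not count as code auto-completion

These are the papers on lexical code completion that I discussed with you last year: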

On the Naturalness of Software ICSE2012
Mining Source Code Repositories at Massive Scale using Language Modeling MSR 2013
On the Localness of Software FSE2014
CACHECA: A Cache Language Model Based Code Suggestion Tool ICSE2015

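I now think they are not really code auto-completion. They only treat code completion as a task: they run their model over code that has already been written, predict one token, and compare it with the original token. That should not be called auto-completion.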
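
The evaluation protocol I mean looks roughly like the sketch below. It is not any of those papers' actual models: I use a toy bigram model, whitespace tokenization, and two made-up one-line "corpora" purely to show the predict-then-compare loop over finished code.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for every token, which tokens follow it in the training code."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(model, prev):
    """Most frequent follower of prev, or None if prev was never seen."""
    if prev not in model:
        return None
    return model[prev].most_common(1)[0][0]


def next_token_accuracy(model, tokens):
    """Walk over already-written code and compare each prediction with the real token."""
    hits = total = 0
    for prev, actual in zip(tokens, tokens[1:]):
        total += 1
        hits += predict_next(model, prev) == actual
    return hits / total if total else 0.0


train_tokens = "for ( int i = 0 ; i < n ; i ++ )".split()
test_tokens = "for ( int j = 0 ; j < m ; j ++ )".split()

model = train_bigram(train_tokens)
accuracy = next_token_accuracy(model, test_tokens)
print(accuracy)
```

Nothing here is ever typed by a developer: the "completion" is scored one token at a time against code that already exists, and every position that involves the unseen names `j` and `m` is a miss.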

2. My understanding of code auto-completion itself seems to be off

First, an intuitive scenario for auto-completion. I used to think of it like this: while a developer is writing code and has typed

for(int i =0;

the tool completes it to

for(int i =0; i< n ;i++)

Another case is in

#include<stdio.h>
int max(int a, int b)
{
	return a>b?a:b;
}
int main()
{
	int a = 1;
	int b = 2;
	int c;
	c = max(

}

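When I have typed c = max( it can be completed to c = max(a, b). These are the cases I consider auto-completion. If auto-completion is what I understand it to be, then no vocabulary is needed, because the tokens come only from the context.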
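
The claim that the tokens come only from the context can be checked directly on the max example. The sketch below is a toy check of my own: the regex tokenizer and the name `covered_by_context` are made up, and it only tests whether every token of the desired completion already occurs in the code typed so far. It says nothing about how to choose among them.

```python
import re

# Identifiers, integers, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    return TOKEN_RE.findall(code)


def covered_by_context(context, completion):
    """True if every token of the completion already appears in the context."""
    return set(tokenize(completion)) <= set(tokenize(context))


context = """
#include<stdio.h>
int max(int a, int b)
{
    return a>b?a:b;
}
int main()
{
    int a = 1;
    int b = 2;
    int c;
    c = max(
"""

completion = "a, b)"
print(covered_by_context(context, completion))
```

For this example every completion token is already present in the context, so a model that only copies from the context could, in principle, produce it.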

3 next work

I plan to reread A Survey of Machine Learning for Big Code and Naturalness, because I used it to find my thesis topic and most of the papers I have read.

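Ⅲ Compiler Principles

I haven't read much of it these past few days, but one thing still puzzles me: in the parse phase, the derivation process builds a parse tree, so how is the AST obtained?
I found a figure in Chapter 5 of the Dragon Book. As I understand that figure, the interior nodes of an AST are no longer the non-terminal nodes defined by the CFG, as they are in a parse tree.

Seen this way, the tree that ANTLR produced earlier (figure below) is a parse tree rather than an AST: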

for(i = 0; i<5; i++)
{
	sum = sum + i;
}

The tree in the figure above for this for loop can be derived by following the Java 8 grammar, so it should be a parse tree.

I still need to study ASTs properly.
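
One quick way to look at a real AST is Python's built-in `ast` module. The sketch below is only an illustration of the difference described above, using the earlier two-line snippet rewritten in Python 3 syntax (so `print` becomes a function call). The exact node names depend on the Python version, so I only rely on `Module`, `Assign`, and `BinOp`.

```python
import ast

source = "x = 7\nprint(x + 1)"
tree = ast.parse(source)

# ast.walk yields the root first, then every descendant node.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)

# Full tree with fields, for reading by eye.
print(ast.dump(tree))
```

The interior nodes are named after program constructs such as `Assign` and `BinOp`. Grammar-level symbols and punctuation (parentheses, newlines) do not appear as nodes, which is what distinguishes this tree from a parse tree.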

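Ⅳ Lexical analyzer and parser (7 Mar; 8 Mar)

I worked through a simple lexical analyzer and parser implementation that I found online. Once I understood it, I felt I had wasted my time: the lexer is just a switch…case skeleton plus the longest-match rule, and the simple grammar is parsed with a very basic recursive-descent algorithm.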
https://blog.csdn.net/qq_36097393/article/details/88403887
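
For my own reference, the two techniques fit in a short sketch. This is my own reduction, not the code from the linked post: Python has no switch…case, so an if/elif chain plays that role, and the grammar is a made-up arithmetic one (expr, term, factor) with integer division. Longest match shows up in two places: identifiers and numbers are extended as far as possible, and two-character operators are tried before one-character ones.

```python
TWO_CHAR_OPS = {"<=", ">=", "==", "!=", "++", "--"}
ONE_CHAR_OPS = set("+-*/<>=();{}")


def tokenize(src):
    """Hand-written lexer using the longest-match rule."""
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(("NUM", src[i:j]))
            i = j
        elif ch.isalpha() or ch == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(("ID", src[i:j]))
            i = j
        elif src[i:i + 2] in TWO_CHAR_OPS:  # try the longer operator first
            tokens.append(("OP", src[i:i + 2]))
            i += 2
        elif ch in ONE_CHAR_OPS:
            tokens.append(("OP", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


class Parser:
    """Recursive descent: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):  # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.take()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):  # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.take()[1]
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def factor(self):  # factor := NUM | '(' expr ')'
        kind, text = self.take()
        if kind == "NUM":
            return int(text)
        if (kind, text) == ("OP", "("):
            value = self.expr()
            if self.take() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {text!r}")


def evaluate(src):
    parser = Parser(tokenize(src))
    value = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return value


print(tokenize("i<=10"))
print(evaluate("2+3*(4-1)"))
```

Here the parser evaluates the expression directly instead of building a tree, which keeps the sketch short. Returning a node object from each method instead of a number would give an AST.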
