Daily Learning Log

May 23, 2020
How to use Evaluate Expression in PyCharm's debug mode
May 9, 2020
Stack Overflow notes on graph edit distance (GED):
How to normalize GED with networkx
How to compute the GED of multigraphs (graphs with multiple edges between the same pair of nodes) with networkx
The rdkit library documentation
How to draw with rdkit
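As a quick starting point for the GED notes above, here is a minimal sketch using networkx's built-in `graph_edit_distance` (the normalization shown is one common convention, not something networkx provides itself):

```python
# Minimal sketch: graph edit distance with networkx (assumes networkx >= 2.x).
import networkx as nx

G1 = nx.path_graph(2)   # nodes 0-1, one edge
G2 = nx.path_graph(3)   # nodes 0-1-2, two edges

# Default costs: each node/edge insertion or deletion costs 1,
# substitutions between unlabeled nodes/edges cost 0.
ged = nx.graph_edit_distance(G1, G2)
print(ged)  # one node and one edge must be inserted

# One common normalization (not built into networkx): divide by the
# average number of nodes of the two graphs.
norm_ged = ged / ((G1.number_of_nodes() + G2.number_of_nodes()) / 2)
```

For multigraphs, build `nx.MultiGraph` objects instead; check the networkx documentation for which cost callbacks apply in that case.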
May 4, 2020
1. Write the thesis proposal
2. Study the Hungarian algorithm
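A convenient way to experiment with the Hungarian algorithm is SciPy's implementation; the cost matrix below is made up for illustration:

```python
# Minimum-cost assignment via SciPy's implementation of the
# Hungarian (Kuhn-Munkres) algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[4, 1, 3],
                 [2, 0, 5],
                 [3, 2, 2]])

row_ind, col_ind = linear_sum_assignment(cost)  # minimizes total cost
total = cost[row_ind, col_ind].sum()
print(list(zip(row_ind, col_ind)), total)  # worker i gets task col_ind[i]
```

Here the optimal assignment is (0→1, 1→0, 2→2) with total cost 1 + 2 + 2 = 5.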

April 29, 2020
How do you set the elements of a DataFrame column to lists?
link
How do you handle missing values in a DataFrame?
reference
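A small sketch covering both questions above (the column names and values are made up; `fillna`, `dropna`, and `isna` are the standard pandas calls):

```python
# Handling missing values and list-valued columns in pandas (sketch).
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

filled = df.fillna({"a": 0.0, "b": "missing"})  # replace NaN/None per column
dropped = df.dropna()                           # drop rows with any missing value
mask = df["a"].isna()                           # boolean mask of missing entries

# Setting a column whose elements are lists: assign a list of lists
# whose length matches the number of rows.
df2 = df.copy()
df2["c"] = [[1], [2, 3], [4]]
```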
April 21, 2020
These debiasing algorithms are very helpful for reducing bias, but are not perfect and do not eliminate all traces of bias. For example, one weakness of this implementation was that the bias direction $g$ was defined using only the pair of words woman and man. As discussed earlier, if $g$ were defined by computing $g_1 = e_{woman} - e_{man}$; $g_2 = e_{mother} - e_{father}$; $g_3 = e_{girl} - e_{boy}$; and so on and averaging over them, you would obtain a better estimate of the "gender" dimension in the 50 dimensional word embedding space. Feel free to play with such variants as well.

The derivation of the linear algebra to do this is a bit more complex. (See Bolukbasi et al., 2016 for details.) But the key equations are:

$$\mu = \frac{e_{w1} + e_{w2}}{2}\tag{4}$$

$$\mu_{B} = \frac{\mu \cdot \text{bias\_axis}}{||\text{bias\_axis}||_2^2} * \text{bias\_axis}\tag{5}$$

$$\mu_{\perp} = \mu - \mu_{B} \tag{6}$$

$$e_{w1B} = \sqrt{|1 - ||\mu_{\perp}||_2^2|} * \frac{(e_{w1} - \mu_{\perp}) - \mu_B}{||(e_{w1} - \mu_{\perp}) - \mu_B||_2} \tag{7}$$

$$e_{w2B} = \sqrt{|1 - ||\mu_{\perp}||_2^2|} * \frac{(e_{w2} - \mu_{\perp}) - \mu_B}{||(e_{w2} - \mu_{\perp}) - \mu_B||_2} \tag{8}$$

$$e_1 = e_{w1B} + \mu_{\perp} \tag{9}$$
$$e_2 = e_{w2B} + \mu_{\perp} \tag{10}$$

Exercise: Implement the function below. Use the equations above to get the final equalized version of the pair of words. Good luck!
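A direct NumPy transcription of equations (4)-(10) might look like the sketch below. This is not the official assignment solution; `word_to_vec_map` and the 2-d toy vectors are made up just to check the symmetry property:

```python
import numpy as np

def equalize(pair, bias_axis, word_to_vec_map):
    """Equalize a word pair per equations (4)-(10); an illustrative sketch."""
    w1, w2 = pair
    e_w1, e_w2 = word_to_vec_map[w1], word_to_vec_map[w2]

    mu = (e_w1 + e_w2) / 2                                              # (4)
    mu_B = np.dot(mu, bias_axis) / np.sum(bias_axis ** 2) * bias_axis   # (5)
    mu_orth = mu - mu_B                                                 # (6)

    scale = np.sqrt(np.abs(1 - np.sum(mu_orth ** 2)))
    e_w1B = scale * (e_w1 - mu_orth - mu_B) / np.linalg.norm(e_w1 - mu_orth - mu_B)  # (7)
    e_w2B = scale * (e_w2 - mu_orth - mu_B) / np.linalg.norm(e_w2 - mu_orth - mu_B)  # (8)

    return e_w1B + mu_orth, e_w2B + mu_orth                             # (9), (10)

# Toy 2-d example with made-up vectors:
vecs = {"man": np.array([1.0, 0.2]), "woman": np.array([-0.8, 0.3])}
g = np.array([1.0, 0.0])
e1, e2 = equalize(("man", "woman"), g, vecs)
# After equalization the two words have equal and opposite components along g.
```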

$$e^{bias\_component} = \frac{e \cdot g}{||g||_2^2} * g\tag{2}$$
$$e^{debiased} = e - e^{bias\_component}\tag{3}$$
If you are an expert in linear algebra, you may recognize $e^{bias\_component}$ as the projection of $e$ onto the direction $g$. If you're not an expert in linear algebra, don't worry about this.
The figure below should help you visualize what neutralizing does. If you're using a 50-dimensional word embedding, the 50 dimensional space can be split into two parts: the bias direction $g$, and the remaining 49 dimensions, which we'll call $g_{\perp}$. In linear algebra, we say that the 49 dimensional $g_{\perp}$ is perpendicular (or "orthogonal") to $g$, meaning it is at 90 degrees to $g$. The neutralization step takes a vector such as $e_{receptionist}$ and zeros out the component in the direction of $g$, giving us $e_{receptionist}^{debiased}$.

Even though $g_{\perp}$ is 49 dimensional, given the limitations of what we can draw on a screen, we illustrate it using a 1 dimensional axis below.
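The neutralize step in equations (2)-(3) can be sketched in NumPy as follows (the toy vectors are made up):

```python
import numpy as np

def neutralize(e, g):
    """Zero out e's component along the bias direction g (equations 2-3)."""
    e_bias = (np.dot(e, g) / np.sum(g ** 2)) * g  # projection of e onto g
    return e - e_bias

# Toy check with made-up vectors:
g = np.array([1.0, 0.0, 0.0])
e = np.array([0.7, 0.2, -0.4])
e_debiased = neutralize(e, g)
# e_debiased now has zero component along g.
```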
April 20, 2020

Now, you will consider the cosine similarity of different words with $g$. Consider what a positive value of similarity means vs a negative cosine similarity.

Let's first see how the GloVe word embeddings relate to gender. You will first compute a vector $g = e_{woman} - e_{man}$, where $e_{woman}$ represents the word vector corresponding to the word woman, and $e_{man}$ corresponds to the word vector corresponding to the word man. The resulting vector $g$ roughly encodes the concept of "gender". (You might get a more accurate representation if you compute $g_1 = e_{mother} - e_{father}$, $g_2 = e_{girl} - e_{boy}$, etc. and average over them. But just using $e_{woman} - e_{man}$ will give good enough results for now.)

In the word analogy task, we complete the sentence "a is to b as c is to ____". An example is 'man is to woman as king is to queen'. In detail, we are trying to find a word d, such that the associated word vectors $e_a, e_b, e_c, e_d$ are related in the following manner: $e_b - e_a \approx e_d - e_c$. We will measure the similarity between $e_b - e_a$ and $e_d - e_c$ using cosine similarity.
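As a toy illustration of this analogy search (the 2-d embeddings below are made up and chosen so the analogy works out; `cosine` is defined inline):

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def complete_analogy(a, b, c, word_to_vec_map):
    """Find the word d maximizing cos(e_b - e_a, e_d - e_c); a sketch on toy data."""
    e_a, e_b, e_c = (word_to_vec_map[w] for w in (a, b, c))
    best_word, best_sim = None, -2.0
    for w, e_w in word_to_vec_map.items():
        if w in (a, b, c):          # skip the input words themselves
            continue
        sim = cosine(e_b - e_a, e_w - e_c)
        if sim > best_sim:
            best_word, best_sim = w, sim
    return best_word

# Made-up embeddings:
vecs = {"man": np.array([1.0, 0.0]), "woman": np.array([0.0, 1.0]),
        "king": np.array([2.0, 0.0]), "queen": np.array([1.0, 1.0]),
        "apple": np.array([3.0, 3.0])}
print(complete_analogy("man", "woman", "king", vecs))  # "queen"
```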

To measure how similar two words are, we need a way to measure the degree of similarity between two embedding vectors for the two words. Given two vectors $u$ and $v$, cosine similarity is defined as follows:

$$\text{CosineSimilarity}(u, v) = \frac{u \cdot v}{||u||_2\,||v||_2} = \cos(\theta) \tag{1}$$

where $u \cdot v$ is the dot product (or inner product) of two vectors, $||u||_2$ is the norm (or length) of the vector $u$, and $\theta$ is the angle between $u$ and $v$. This similarity depends on the angle between $u$ and $v$. If $u$ and $v$ are very similar, their cosine similarity will be close to 1; if they are dissimilar, the cosine similarity will take a smaller value.
The norm of $u$ is defined as $||u||_2 = \sqrt{\sum_{i=1}^{n} u_i^2}$.
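Equation (1) is a one-liner in NumPy:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity per equation (1): dot product over the product of norms."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_similarity(np.array([1.0, 1.0]), np.array([2.0, 2.0])))   # ~ 1.0 (same direction)
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # ~ 0.0 (orthogonal)
print(cosine_similarity(np.array([1.0, 0.0]), np.array([-1.0, 0.0])))  # ~ -1.0 (opposite)
```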

Performance analysis of hash table search:

What does the average search length for unsuccessful searches (ASLu) mean?

Let the hash function be h(key) = key mod TableSize, with linear probing as the collision-resolution method, i.e. d_{i} = .
Suppose TableSize = 11 and the keys 11, 30, 47 have been inserted. Then, searching for 33 with linear probing takes 3 comparisons, so this unsuccessful search costs 3 comparisons, as shown in the figure:
(figure omitted)
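The count of comparisons depends on convention (whether the final probe of an empty slot is counted, and exactly which keys the figure inserts). To make the counting concrete, here is a small simulation that counts every slot inspected, including the empty slot that ends the search; treat its numbers as illustrative of the mechanism rather than as matching the figure:

```python
# Simulate an unsuccessful search under linear probing, h(key) = key % table_size.
def probes_for_unsuccessful_search(key, keys, table_size):
    table = [None] * table_size
    for k in keys:                      # build the table with linear probing
        pos = k % table_size
        while table[pos] is not None:
            pos = (pos + 1) % table_size
        table[pos] = k
    # Search: count every slot inspected until an empty slot (or the key) stops us.
    pos, count = key % table_size, 0
    while True:
        count += 1
        if table[pos] is None or table[pos] == key:
            return count
        pos = (pos + 1) % table_size

# Table from the note: TableSize = 11, keys 11, 30, 47; search for 33.
print(probes_for_unsuccessful_search(33, [11, 30, 47], 11))
```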

A pitfall that triggered a compiler error:
#define MAXSIZETABLE 10000;
Do not put a semicolon here!

Below is the code I wrote today. It took a long time, but with pointers from senior classmate lyh it finally compiled and ran. One thing to remember: fix errors one at a time!

//Functions: compute a prime, initialize the hash table, search it, insert into it,
//and compute the hash of a string
//References: MOOC lecture slides; textbook p. 258
#define MAXSIZETABLE 10000
//Note: CLOCKS_PER_SEC is already defined in <time.h>; do not redefine it here
#include<stdio.h>
#include<time.h>
#include<stdlib.h>
#include<string.h>
#include<math.h>

typedef int ElementType;
typedef int Index; //type of a hash address
typedef struct LNode *PtrToLNode;
struct LNode { //list node: the element plus a pointer to the next node
	ElementType Data;
	PtrToLNode Next;
};
typedef PtrToLNode Position; //Position is a pointer to struct LNode
typedef PtrToLNode List;
typedef struct TblNode *HashTable;
struct TblNode { //the hash table: its length plus an array of list head nodes
	int TableSize;
	List Heads; //array of list head nodes
};

//Return the smallest prime >= N that does not exceed the maximum table length
//Input: an integer
//Output: a prime
int nextPrime(int N){

	int p = (N % 2) ? (N + 2) : (N + 1); //start probing from the smallest odd number greater than N
	int i;

	while (p < MAXSIZETABLE) //stop once MAXSIZETABLE is exceeded
	{
		for (i = (int)sqrt(p); i > 2; i--) //test whether p is prime
		{
			if (!(p % i)) break; //p has a divisor other than 1 and itself, so it is not prime
		}
		if (i == 2) break; //the for loop ran to completion, so p is prime
		else p += 2;

	}
	return p;
}

//Hash function: the hash value is Key's address in the table
//Input: Key, TableSize
//Output: an integer
int hash(int Key, int TableSize)
{
	return Key % TableSize;
}

//Initialize the hash table
HashTable createTable(int TableSize){
	HashTable H;
	int i;

	//allocate the table header with malloc (a bare sizeof is not an allocation)
	H = (HashTable)malloc(sizeof(struct TblNode));
	//make sure the table length is prime
	H->TableSize = nextPrime(TableSize);
	//allocate the array of list head nodes
	H->Heads = (List)malloc(H->TableSize * sizeof(struct LNode));
	//initialize the head nodes
	for (i = 0; i < H->TableSize; i++)
	{
		H->Heads[i].Data = 0;
		H->Heads[i].Next = NULL;
	}
	return H;
}

//Search for Key in hash table H
Position Find(ElementType Key, HashTable H)
{
	Position p;
	Index Pos;

	//compute Key's hash value
	Pos = hash(Key, H->TableSize);

	p = H->Heads[Pos].Next; //start from the first node of the chain

	//walk the chain at that address; stop at the matching node or at NULL
	//(Data is an int, so compare it directly rather than with strcmp)
	while (p && p->Data != Key)
	{
		p = p->Next;
	}
	return p; //p now points to the node that was found, or is NULL
}

//Insert Key into the hash table
void InsertKey(HashTable H, ElementType Key){
	Position NewCell;
	Index Pos;

	//first look for Key by calling Find(Key, H)
	Position P = Find(Key, H);

	//not found, so it can be inserted
	if (!P)
	{
		//allocate memory for the new node, sizeof(struct LNode)
		NewCell = (Position)malloc(sizeof(struct LNode));
		NewCell->Data = Key; //store the key in the new node

		//the hash function gives Key's address, i.e. which chain to insert into
		Pos = hash(Key, H->TableSize);

		//insert the new node (key plus next pointer) at the head of that chain
		NewCell->Next = H->Heads[Pos].Next;
		H->Heads[Pos].Next = NewCell;
	}

	//found, so it cannot be inserted again
	else{
		printf("Key already exists and cannot be inserted");
	}
}

//Destroy the HashTable
void DestoryTable(HashTable H){
	int i;
	Position p, Temp;

	//to delete a chain: point p at a node, save p->Next in Temp, free(p), then p = Temp
	for (i = 0; i < H->TableSize; i++)
	{
		p = H->Heads[i].Next; //p is the chain hanging off head node i
		while (p)
		{
			Temp = p->Next;
			free(p);
			p = Temp;
		}

	}
	free(H->Heads);
	free(H);
}

//Hash a string
//Input: a string, e.g. abcdd
//Output: the string's hash address
int HashAdd(const char *key)
{
	//start timing
	clock_t start_t, end_t;
	double total_t;

	//string hash function
	int TableSize = 110;
	unsigned int h = 0; //the hash value, initialized to 0
	int add;
	start_t = clock();

	while (*key != '\0') //multiply the running value by 32 (shift left by 5) and add each character
	{
		h = (h << 5) + *key++;
	}
	add = h % TableSize;
	end_t = clock();
	printf("%d\n", add);

	//stop timing; clock() returns ticks, so divide by CLOCKS_PER_SEC to get seconds
	total_t = (double)(end_t - start_t) / CLOCKS_PER_SEC;
	printf("CPU time used: %f\n", total_t);

	return add;
}


//main
int main(void)
{
	printf("hello!\n");
	return 0;
}

April 8, 2020
Research on edit distance functions:
1. A forum thread
2. The official Python site
3. A networkx forum thread that includes a discussion of graph isomorphism
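For string edit distance specifically, the classic Levenshtein dynamic program is short enough to write from scratch (a sketch with unit costs for insert, delete, and substitute):

```python
def edit_distance(s, t):
    """Levenshtein distance via dynamic programming (unit costs)."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):          # deleting all of s
        dp[i][0] = i
    for j in range(n + 1):          # inserting all of t
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete
                           dp[i][j - 1] + 1,         # insert
                           dp[i - 1][j - 1] + cost)  # substitute / match
    return dp[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```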

April 10, 2020
How to fix the error: AttributeError: module 'tensorflow_core.compat.v1' has no attribute 'contrib'
answer
April 13, 2020
Python packages for GED
