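Question: Briefly describe the main collection classes in C#, and explain how the implementations of the major ones differ.
Thoughts:
In any language that calls itself a high-level programming language, the basic collection classes are an indispensable part. In real-world programming, developers constantly have to work with large amounts of data or large numbers of objects, and each collection class brings the particular strengths it was designed with to a particular scenario. C# likewise has many powerful collection classes for handling large amounts of data. They are convenient in everyday use, but there is rarely a chance to go down into the source and see how they are actually implemented. I want to take this opportunity to dig deeper, as an exercise in my own learning.
The previous article covered the most basic types in C# that are used directly in business code, such as the IList interface and the List class, together with their generic counterparts, and briefly compared them at the source level. This article continues with the collection interfaces that build on those base interfaces and their implementations. As before, the source is taken from .NET 4.5.
On C# Collections_Interfaces_IDictionary & Dictionary
//Besides the List-related interfaces, C# also has dictionary classes, which are used just as often in real code.
//First, let's look at what the IDictionary interface contains.
////The source below is from .NET Framework 4.5
The IDictionary interface and its generic form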
// ==++==
//
// Copyright (c) Microsoft Corporation. All rights reserved.
//
// ==--==
/*============================================================
**
** Interface: IDictionary
**
** <OWNER>[....]</OWNER>
**
**
** Purpose: Base interface for all dictionaries.
**
**
===========================================================*/
namespace System.Collections {
using System;
using System.Diagnostics.Contracts;
// An IDictionary is a possibly unordered set of key-value pairs.
// Keys can be any non-null object. Values can be any object.
// You can look up a value in an IDictionary via the default indexed
// property, Items.
#if CONTRACTS_FULL
[ContractClass(typeof(IDictionaryContract))]
#endif // CONTRACTS_FULL
[System.Runtime.InteropServices.ComVisible(true)]
public interface IDictionary : ICollection
{
// Interfaces are not serializable
// The Item property provides methods to read and edit entries
// in the Dictionary.
Object this[Object key] {
get;
set;
}
// Returns a collections of the keys in this dictionary.
ICollection Keys {
get;
}
// Returns a collections of the values in this dictionary.
ICollection Values {
get;
}
// Returns whether this dictionary contains a particular key.
//
bool Contains(Object key);
// Adds a key-value pair to the dictionary.
//
void Add(Object key, Object value);
// Removes all pairs from the dictionary.
void Clear();
bool IsReadOnly
{ get; }
bool IsFixedSize
{ get; }
// Returns an IDictionaryEnumerator for this dictionary.
new IDictionaryEnumerator GetEnumerator();
// Removes a particular key from the dictionary.
//
void Remove(Object key);
}
/***
Above is the behavior defined by the IDictionary interface; below is its equivalent generic form.
***/
// ==++==
//
// Copyright (c) Microsoft Corporation. All rights reserved.
//
// ==--==
/*============================================================
**
** Interface: IDictionary
**
** <OWNER>[....]</OWNER>
**
**
** Purpose: Base interface for all generic dictionaries.
**
**
===========================================================*/
namespace System.Collections.Generic {
using System;
using System.Diagnostics.Contracts;
// An IDictionary is a possibly unordered set of key-value pairs.
// Keys can be any non-null object. Values can be any object.
// You can look up a value in an IDictionary via the default indexed
// property, Items.
#if CONTRACTS_FULL
[ContractClass(typeof(IDictionaryContract<,>))]
#endif // CONTRACTS_FULL
public interface IDictionary<TKey, TValue> : ICollection<KeyValuePair<TKey, TValue>>
{
// Interfaces are not serializable
// The Item property provides methods to read and edit entries
// in the Dictionary.
TValue this[TKey key] {
get;
set;
}
// Returns a collections of the keys in this dictionary.
ICollection<TKey> Keys {
get;
}
// Returns a collections of the values in this dictionary.
ICollection<TValue> Values {
get;
}
// Returns whether this dictionary contains a particular key.
//
bool ContainsKey(TKey key);
// Adds a key-value pair to the dictionary.
//
void Add(TKey key, TValue value);
// Removes a particular key from the dictionary.
//
bool Remove(TKey key);
bool TryGetValue(TKey key, out TValue value);
}
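In the interface and its generic form above, the first thing defined is an indexer that takes a key as the index and gets or sets the corresponding value. The source also shows that the Keys and Values properties each return an ICollection (ICollection<TKey> and ICollection<TValue> in the generic form). The usual dictionary operations are declared as well: adding, membership testing, removal, and so on. There are small differences between the two: the non-generic interface declares Contains and a void Remove, while the generic one declares ContainsKey, a Remove that returns bool, and an additional TryGetValue. These two interfaces are the base interfaces of all dictionary data structures.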
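To make the members above concrete, here is a minimal sketch that works only through the generic interface. The class name DictionaryInterfaceDemo and the helper CountWords are hypothetical names made up for this illustration; Dictionary<TKey, TValue> is used as the backing implementation.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryInterfaceDemo
{
    // Hypothetical helper: counts word occurrences using only IDictionary<TKey, TValue> members.
    public static IDictionary<string, int> CountWords(IEnumerable<string> words)
    {
        IDictionary<string, int> counts = new Dictionary<string, int>();
        foreach (string word in words)
        {
            int current;
            // TryGetValue avoids the KeyNotFoundException that the indexer getter throws.
            if (counts.TryGetValue(word, out current))
                counts[word] = current + 1;   // indexer setter overwrites the existing value
            else
                counts.Add(word, 1);          // Add throws ArgumentException on a duplicate key
        }
        return counts;
    }

    public static void Main()
    {
        IDictionary<string, int> counts = CountWords(new[] { "a", "b", "a" });
        Console.WriteLine(counts["a"]);               // 2
        Console.WriteLine(counts.ContainsKey("c"));   // False
        Console.WriteLine(counts.Remove("b"));        // True
        Console.WriteLine(counts.Count);              // 1
    }
}
```

Coding against the interface rather than the concrete class means the backing type could later be swapped, for example for a SortedDictionary, without touching the calling code.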
The generic interface also inherits ICollection<KeyValuePair<TKey, TValue>>, so the elements it exposes are KeyValuePair structs:
// ==++==
//
// Copyright (c) Microsoft Corporation. All rights reserved.
//
// ==--==
/*============================================================
**
** Interface: KeyValuePair
**
** <OWNER>[....]</OWNER>
**
**
** Purpose: Generic key-value pair for dictionary enumerators.
**
**
===========================================================*/
namespace System.Collections.Generic {
using System;
using System.Text;
// A KeyValuePair holds a key and a value from a dictionary.
// It is used by the IEnumerable<T> implementation for both IDictionary<TKey, TValue>
// and IReadOnlyDictionary<TKey, TValue>.
[Serializable]
public struct KeyValuePair<TKey, TValue> {
private TKey key;
private TValue value;
public KeyValuePair(TKey key, TValue value) {
this.key = key;
this.value = value;
}
public TKey Key {
get { return key; }
}
public TValue Value {
get { return value; }
}
public override string ToString() {
StringBuilder s = StringBuilderCache.Acquire();
s.Append('[');
if( Key != null) {
s.Append(Key.ToString());
}
s.Append(", ");
if( Value != null) {
s.Append(Value.ToString());
}
s.Append(']');
return StringBuilderCache.GetStringAndRelease(s);
}
}
}
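As a small illustration of how this struct shows up in practice, the sketch below enumerates a dictionary pair by pair and prints one pair through the ToString override shown above. KeyValuePairDemo and Describe are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;

public static class KeyValuePairDemo
{
    // Hypothetical helper: turns each entry into a "key -> value" line.
    public static List<string> Describe(IDictionary<string, int> dictionary)
    {
        var lines = new List<string>();
        // Enumerating a generic dictionary yields KeyValuePair<TKey, TValue> values.
        foreach (KeyValuePair<string, int> pair in dictionary)
        {
            // Key and Value are read-only: the struct has getters but no setters.
            lines.Add(pair.Key + " -> " + pair.Value);
        }
        return lines;
    }

    public static void Main()
    {
        var pair = new KeyValuePair<string, int>("one", 1);
        Console.WriteLine(pair);   // [one, 1]

        var dictionary = new Dictionary<string, int> { { "one", 1 } };
        foreach (string line in Describe(dictionary))
            Console.WriteLine(line);   // one -> 1
    }
}
```

Because KeyValuePair is a struct, each iteration works on a copy; changing a value in the dictionary has to go through the dictionary's indexer, not through the pair.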
Next, a rough look at the concrete implementation of Dictionary:
namespace System.Collections.Generic {
using System;
using System.Collections;
using System.Diagnostics;
using System.Diagnostics.Contracts;
using System.Runtime.Serialization;
using System.Security.Permissions;
[DebuggerTypeProxy(typeof(Mscorlib_DictionaryDebugView<,>))]
[DebuggerDisplay("Count = {Count}")]
[Serializable]
[System.Runtime.InteropServices.ComVisible(false)]
public class Dictionary<TKey,TValue>: IDictionary<TKey,TValue>, IDictionary, IReadOnlyDictionary<TKey, TValue>, ISerializable, IDeserializationCallback {
private struct Entry {
public int hashCode; // Lower 31 bits of hash code, -1 if unused
public int next; // Index of next entry, -1 if last
public TKey key; // Key of entry
public TValue value; // Value of entry
}
private int[] buckets;
private Entry[] entries;
private int count;
private int version;
private int freeList;
private int freeCount;
private IEqualityComparer<TKey> comparer;
private KeyCollection keys;
private ValueCollection values;
private Object _syncRoot;
// constants for serialization
private const String VersionName = "Version";
private const String HashSizeName = "HashSize"; // Must save buckets.Length
private const String KeyValuePairsName = "KeyValuePairs";
private const String ComparerName = "Comparer";
public Dictionary(): this(0, null) {}
public Dictionary(int capacity): this(capacity, null) {}
public Dictionary(IEqualityComparer<TKey> comparer): this(0, comparer) {}
public Dictionary(int capacity, IEqualityComparer<TKey> comparer) {
if (capacity < 0) ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.capacity);
if (capacity > 0) Initialize(capacity);
this.comparer = comparer ?? EqualityComparer<TKey>.Default;
}
public Dictionary(IDictionary<TKey,TValue> dictionary): this(dictionary, null) {}
public Dictionary(IDictionary<TKey,TValue> dictionary, IEqualityComparer<TKey> comparer):
this(dictionary != null? dictionary.Count: 0, comparer) {
if( dictionary == null) {
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.dictionary);
}
foreach (KeyValuePair<TKey,TValue> pair in dictionary) {
Add(pair.Key, pair.Value);
}
}
... // omitted
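The buckets and entries fields above suggest a chained hash table stored in two arrays. The following is a simplified sketch of that idea, not the framework code: TinyDictionary is a hypothetical class, it has no removal and no free list, it does not check for null keys, and it doubles its size on growth instead of picking a prime.

```csharp
using System;
using System.Collections.Generic;

// Simplified sketch only: no Remove, no free list, no null-key check, no prime sizing.
public class TinyDictionary<TKey, TValue>
{
    private struct Entry
    {
        public int hashCode;   // lower 31 bits of the key's hash code
        public int next;       // index of the next entry in the same bucket, -1 if last
        public TKey key;
        public TValue value;
    }

    private int[] buckets;     // buckets[b] = index of the first entry in chain b, -1 if empty
    private Entry[] entries;   // entries are stored densely, in insertion order
    private int count;
    private readonly IEqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;

    public TinyDictionary(int capacity = 4)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        buckets = NewBuckets(capacity);
        entries = new Entry[capacity];
    }

    public int Count { get { return count; } }

    private static int[] NewBuckets(int size)
    {
        int[] result = new int[size];
        for (int i = 0; i < size; i++) result[i] = -1;
        return result;
    }

    // Walks the chain of the key's bucket; returns the entry index or -1.
    private int FindEntry(TKey key)
    {
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next)
        {
            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
        }
        return -1;
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        int i = FindEntry(key);
        value = i >= 0 ? entries[i].value : default(TValue);
        return i >= 0;
    }

    // Inserts a new pair or overwrites an existing one, like an indexer setter.
    public void Set(TKey key, TValue value)
    {
        int existing = FindEntry(key);
        if (existing >= 0) { entries[existing].value = value; return; }
        if (count == entries.Length) Resize();
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        int bucket = hashCode % buckets.Length;
        entries[count].hashCode = hashCode;
        entries[count].next = buckets[bucket];   // the new entry becomes the head of its chain
        entries[count].key = key;
        entries[count].value = value;
        buckets[bucket] = count;
        count++;
    }

    // Doubles both arrays and relinks every entry into its new bucket.
    private void Resize()
    {
        int newSize = entries.Length * 2;
        int[] newBuckets = NewBuckets(newSize);
        Entry[] newEntries = new Entry[newSize];
        Array.Copy(entries, newEntries, count);
        for (int i = 0; i < count; i++)
        {
            int bucket = newEntries[i].hashCode % newSize;
            newEntries[i].next = newBuckets[bucket];
            newBuckets[bucket] = i;
        }
        buckets = newBuckets;
        entries = newEntries;
    }
}
```

Storing the chain links as array indexes instead of object references keeps all entries in one contiguous array, which is the design choice the Entry struct with its next field points to.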
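This is the core dictionary class. Besides it there is also SortedDictionary; the two live in different assemblies, mscorlib and System. SortedDictionary is not a wrapper around Dictionary: it implements the same dictionary interfaces but stores its data in a binary tree (a red-black tree), held in the _set field of type TreeSet<KeyValuePair<TKey, TValue>>. The full implementation also involves the tree structure and the comparer; to go deeper, refer directly to the source. Its indexer looks like this: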
public TValue this[TKey key] {
get {
if ( key == null) {
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}
TreeSet<KeyValuePair<TKey, TValue>>.Node node = _set.FindNode(new KeyValuePair<TKey, TValue>(key, default(TValue)));
if ( node == null) {
ThrowHelper.ThrowKeyNotFoundException();
}
return node.Item.Value;
}
set {
if( key == null) {
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}
TreeSet<KeyValuePair<TKey, TValue>>.Node node = _set.FindNode(new KeyValuePair<TKey, TValue>(key, default(TValue)));
if ( node == null) {
_set.Add(new KeyValuePair<TKey, TValue>(key, value));
} else {
node.Item = new KeyValuePair<TKey, TValue>( node.Item.Key, value);
_set.UpdateVersion();
}
}
}
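/***
The part above is the implementation of the indexer; below is the internal method that finds a tree node. It uses a comparer to decide whether to descend to the left or the right child.
***/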
internal virtual Node FindNode(T item) {
Node current = root;
while (current != null) {
int order = comparer.Compare(item, current.Item);
if (order == 0) {
return current;
} else {
current = (order < 0) ? current.Left : current.Right;
}
}
return null;
}
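Since FindNode relies entirely on the comparer, the comparer also decides the order in which a SortedDictionary enumerates its keys. The sketch below shows this with two of the framework's StringComparer instances; SortedDictionaryDemo and KeysInOrder are hypothetical names chosen for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SortedDictionaryDemo
{
    // Hypothetical helper: inserts the same keys and returns them in enumeration order.
    public static List<string> KeysInOrder(IComparer<string> comparer)
    {
        var sorted = new SortedDictionary<string, int>(comparer);
        sorted["pear"] = 3;     // indexer setter: adds the key when it is not in the tree
        sorted["apple"] = 1;
        sorted["Banana"] = 2;
        return new List<string>(sorted.Keys);
    }

    public static void Main()
    {
        // Ordinal comparison puts uppercase letters before lowercase ones.
        Console.WriteLine(string.Join(", ", KeysInOrder(StringComparer.Ordinal)));
        // Banana, apple, pear

        Console.WriteLine(string.Join(", ", KeysInOrder(StringComparer.OrdinalIgnoreCase)));
        // apple, Banana, pear
    }
}
```

A plain Dictionary makes no promise about enumeration order, so when sorted output matters the tree-based type is the one to reach for, at the cost of logarithmic rather than constant-time lookups.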
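On C# Collections_Sets_ISet and HashSet
//The sorted dictionary above relies on a TreeSet data structure, which leads naturally to the Set types.
//TreeSet derives from SortedSet, and SortedSet implements ISet, ICollection, and so on, so let's look at ISet first.
////The source below is from .NET Framework 4.5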
// ==++==
//
// Copyright (c) Microsoft Corporation. All rights reserved.
//
// ==--==
/*============================================================
**
** Interface: ISet
**
** <OWNER>[....]</OWNER>
**
**
** Purpose: Base interface for all generic sets.
**
**
===========================================================*/
namespace System.Collections.Generic {
using System;
using System.Runtime.CompilerServices;
/// <summary>
/// Generic collection that guarantees the uniqueness of its elements, as defined
/// by some comparer. It also supports basic set operations such as Union, Intersection,
/// Complement and Exclusive Complement.
/// </summary>
public interface ISet<T> : ICollection<T> {
//Add ITEM to the set, return true if added, false if duplicate
new bool Add(T item);
//Transform this set into its union with the IEnumerable<T> other
void UnionWith(IEnumerable<T> other);
//Transform this set into its intersection with the IEnumberable<T> other
void IntersectWith(IEnumerable<T> other);
//Transform this set so it contains no elements that are also in other
void ExceptWith(IEnumerable<T> other);
//Transform this set so it contains elements initially in this or in other, but not both
void SymmetricExceptWith(IEnumerable<T> other);
//Check if this set is a subset of other
bool IsSubsetOf(IEnumerable<T> other);
//Check if this set is a superset of other
bool IsSupersetOf(IEnumerable<T> other);
//Check if this set is a subset of other, but not the same as it
bool IsProperSupersetOf(IEnumerable<T> other);
//Check if this set is a superset of other, but not the same as it
bool IsProperSubsetOf(IEnumerable<T> other);
//Check if this set has any elements in common with other
bool Overlaps(IEnumerable<T> other);
//Check if this set contains the same and only the same elements as other
bool SetEquals(IEnumerable<T> other);
}
}
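From the interface source above, ISet is the base interface for all generic sets. It directly inherits ICollection<T> and adds its own behavior. A set holds elements without duplicates; the interface itself says nothing about ordering, which is left to each implementation. Note that in the quoted source the comments above IsProperSupersetOf and IsProperSubsetOf appear to be swapped.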
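The in-place set operations declared above can be traced with a short sketch. SetOperationsDemo and Run are hypothetical names; HashSet<int> is used here simply as one available ISet<T> implementation.

```csharp
using System;
using System.Collections.Generic;

public static class SetOperationsDemo
{
    // Hypothetical walk-through: applies each mutating operation in turn and returns the final set.
    public static ISet<int> Run()
    {
        ISet<int> set = new HashSet<int>(new[] { 1, 2, 3 });

        bool added = set.Add(3);                     // false: 3 is already present
        Console.WriteLine(added);

        set.UnionWith(new[] { 3, 4 });               // now {1, 2, 3, 4}
        set.IntersectWith(new[] { 2, 3, 4, 5 });     // now {2, 3, 4}
        set.ExceptWith(new[] { 4 });                 // now {2, 3}
        set.SymmetricExceptWith(new[] { 3, 9 });     // now {2, 9}
        return set;
    }

    public static void Main()
    {
        ISet<int> set = Run();
        // The arguments are plain IEnumerable<T>; duplicates and order in them do not matter.
        Console.WriteLine(set.SetEquals(new[] { 9, 2, 2 }));          // True
        Console.WriteLine(set.IsProperSubsetOf(new[] { 2, 9, 10 }));  // True
        Console.WriteLine(set.IsProperSupersetOf(new[] { 2 }));       // True
        Console.WriteLine(set.Overlaps(new[] { 7, 8 }));              // False
    }
}
```

The "With" methods mutate the set they are called on and return void, while the "Is" methods, Overlaps, and SetEquals only query it.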
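SortedSet is far more involved than ISet, since it contains the actual data storage. Reading the source, you can see that it uses a binary tree (a red-black tree) to keep its elements in sorted order, and that a Stack is used when walking the tree, for example during enumeration.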
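The effect of the tree-based storage is easy to observe from the outside. The following sketch uses only public SortedSet<T> members; SortedSetDemo and Sorted are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;

public static class SortedSetDemo
{
    // Hypothetical helper: builds a SortedSet from unsorted input that contains a duplicate.
    public static SortedSet<int> Sorted()
    {
        return new SortedSet<int> { 5, 1, 3, 1 };
    }

    public static void Main()
    {
        SortedSet<int> set = Sorted();
        Console.WriteLine(string.Join(",", set));                      // 1,3,5
        Console.WriteLine(set.Min);                                    // 1
        Console.WriteLine(set.Max);                                    // 5
        // GetViewBetween returns the elements within the inclusive range.
        Console.WriteLine(string.Join(",", set.GetViewBetween(2, 5))); // 3,5
    }
}
```

Enumeration always yields ascending order according to the set's comparer, regardless of the order in which elements were added.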
Next, let's look at a concrete implementation of the set interface:
public class HashSet<T> : ICollection<T>, ISerializable, IDeserializationCallback, ISet<T>
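Without going into detail, the declaration shows directly that HashSet implements the generic ISet interface. HashSet is an unordered collection that guarantees uniqueness. We can also loosely think of a HashSet as a Dictionary that stores only keys. For comparison, here is the declaration of the non-generic Hashtable: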
public class Hashtable : IDictionary, ISerializable, IDeserializationCallback, ICloneable {
// omitted
}
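From its declaration we can see that Hashtable implements the non-generic dictionary interface IDictionary, along with ISerializable, IDeserializationCallback, and ICloneable. Hashtable has no generic form, which sets it apart from HashSet and Dictionary. As for HashMap, so common in Java: although C# borrowed heavily from Java in its early design, HashMap seems to have been dropped outright. So C# has no tired "Hashtable vs. HashMap" question, but it does have the even more tired "HashSet vs. Hashtable vs. Dictionary" question.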
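A few of the practical differences between the three types can be shown in a short sketch. HashCollectionsDemo and its three methods are hypothetical names; each method isolates one observable behavior rather than giving a complete comparison.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class HashCollectionsDemo
{
    // Hashtable is non-generic: keys and values are object, and a missing key yields null.
    public static bool HashtableMissingKeyIsNull()
    {
        var table = new Hashtable();
        table["one"] = 1;                 // the int is boxed to object
        int one = (int)table["one"];      // reading it back needs a cast
        return one == 1 && table["two"] == null;
    }

    // Dictionary<TKey, TValue> is strongly typed, and its indexer throws for a missing key.
    public static bool DictionaryMissingKeyThrows()
    {
        var dictionary = new Dictionary<string, int> { { "one", 1 } };
        try
        {
            int missing = dictionary["two"];
            return false;
        }
        catch (KeyNotFoundException)
        {
            return true;
        }
    }

    // HashSet<T> stores only elements (no values) and reports duplicates through Add's result.
    public static bool HashSetRejectsDuplicate()
    {
        var set = new HashSet<string> { "one" };
        return !set.Add("one");
    }

    public static void Main()
    {
        Console.WriteLine(HashtableMissingKeyIsNull());    // True
        Console.WriteLine(DictionaryMissingKeyThrows());   // True
        Console.WriteLine(HashSetRejectsDuplicate());      // True
    }
}
```

In new code the generic types are usually the natural choice, since they avoid the casts and boxing that the object-based Hashtable requires.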
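Closing notes:
1. A recent exam made me keenly aware that, beyond this kind of basic engineering knowledge, my grasp of low-level fundamentals and theory is weak. While preparing for the exam, I decided to adjust my study plan for the next two years.
2. Given how much papers matter when applying for a degree, I will spend part of my time in the coming months on introductory and survey reading in NLP and on following papers at the front of the field.
3. I will also try, in future theoretical study, to use English-only materials for reading, writing, and related work.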