Replacing the Content of group(n) with a Regex

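Replacing specific content with a regex ought to be easy, but Java has no ready-made method for replacing only the content of a given capture group, so we have to implement it ourselves.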

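Let's start with a requirement: in the string below, replace the first 01 with 1234 and the second 01 with 2345. In practice there may be more occurrences of 01, or other strings entirely: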

		String hex = "00 00 00 01 00 01";
		String regex = "[0-9a-zA-Z\\s]{6}[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})\\s[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})";

The parentheses in the regex are capture groups, and the goal is to replace what those groups captured with other content.

Exploring the API

Java's String class can do regex replacement with replaceAll/replaceFirst, but both operate on whole matches across the entire string, not on individual groups;

//String.java
    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }
    
    public String replaceFirst(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceFirst(replacement);
    }
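
To make the limitation concrete, here is a minimal sketch (the class and method names are made up for illustration) showing that replaceAll gives every match the same replacement, while replaceFirst touches only the first one. Neither lets us give each occurrence its own value:

```java
final class GlobalReplaceSketch {
    // Every match of the regex receives the same replacement text.
    static String replaceEvery(String input) {
        return input.replaceAll("01", "1234");
    }

    // Only the first match is replaced; later matches are left untouched.
    static String replaceOnlyFirst(String input) {
        return input.replaceFirst("01", "1234");
    }

    public static void main(String[] args) {
        String hex = "00 00 00 01 00 01";
        System.out.println(replaceEvery(hex));     // 00 00 00 1234 00 1234
        System.out.println(replaceOnlyFirst(hex)); // 00 00 00 1234 00 01
    }
}
```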

Matcher also has appendReplacement/appendTail (the two String methods above are in fact implemented by Matcher as well), but used naively they don't help either;

    public Matcher appendReplacement(StringBuffer sb, String replacement)
    public StringBuffer appendTail(StringBuffer sb)

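The former, appendReplacement, works well when the regex matches only the text you want to replace. If it also matches other content, you end up with something like this: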

		String hex = "00 00 00 01 00 01";
		String regex1 = "[0-9a-zA-Z]{2}";
		Pattern pattern = Pattern.compile(regex1);
		Matcher matcher = pattern.matcher(hex);

		StringBuffer sb = new StringBuffer();
		while (matcher.find()){
			matcher.appendReplacement(sb, "1234");
		}
		System.out.println(sb.toString());

Output:

1234 1234 1234 1234 1234 1234

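Every matching token was replaced, which is clearly not what we want here;
the latter, appendTail, only appends the part of the input that follows the last match to the StringBuffer, so it does not help with picking out groups either.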
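
As a small illustration of what appendTail actually contributes, the sketch below (names are hypothetical, and the input is a slightly different string with trailing text after the last match) runs the same loop with and without the final appendTail call:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AppendTailSketch {
    // Replaces each "01" but never calls appendTail, so any text after
    // the last match is missing from the result.
    static String withoutTail(String input) {
        Matcher m = Pattern.compile("01").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "1234");
        }
        return sb.toString();
    }

    // Same loop, followed by appendTail to copy the remaining input.
    static String withTail(String input) {
        Matcher m = Pattern.compile("01").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "1234");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "00 01 00 01 ff";
        System.out.println(withoutTail(input)); // 00 1234 00 1234
        System.out.println(withTail(input));    // 00 1234 00 1234 ff
    }
}
```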

So I found no method in the API that does exactly this, and had to implement it myself.

Getting the Indices

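To replace content, we first need the position of the original text to be replaced. Where do those indices come from? How does Matcher extract the substring for group(n)?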

		String hex = "00 00 00 01 00 01";
		String regex = "[0-9a-zA-Z\\s]{6}[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})\\s[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})";

		Pattern pattern = Pattern.compile(regex);
		Matcher matcher = pattern.matcher(hex);
		if (matcher.matches()) {
			int count = matcher.groupCount();
			for (int i = 1; i <= count; i++) {
				System.out.println(matcher.group(i));
			}
		}

Output:

01
01

Something interesting must be going on inside group(n), so let's look at its source in Matcher;

    public String group(int group) {
        if (first < 0)
            throw new IllegalStateException("No match found");
        if (group < 0 || group > groupCount())
            throw new IndexOutOfBoundsException("No group " + group);
        if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
            return null;
        return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
    }

    CharSequence getSubSequence(int beginIndex, int endIndex) {
        return text.subSequence(beginIndex, endIndex);
    }

Internally, group is just taking a substring as well. But what is the groups array, and why does indexing it with group * 2 give the right bounds?

public final class Matcher implements MatchResult {

    /**
     * The storage used by groups. They may contain invalid values if
     * a group was skipped during the matching.
     */
    int[] groups;

	... // omitted
}

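This field is not public and has no getter. Matcher does expose the same values through its public start(int group) and end(int group) methods, but to look at the raw array itself, let's try reflection. Be aware that on JDK 16 and later, calling setAccessible on a JDK-internal field like this throws InaccessibleObjectException unless the JVM is started with --add-opens java.base/java.util.regex=ALL-UNNAMED;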

	/**
	 * Reads the group offsets via reflection.
	 *
	 * @param clazz           the Matcher class
	 * @param matcherInstance the Matcher instance
	 * @return the offsets array, or null if the field could not be read
	 */
	public static int[] getOffsets(Class<Matcher> clazz, Object matcherInstance) {
		try {
			Field field = clazz.getDeclaredField("groups");
			field.setAccessible(true);

			return (int[]) field.get(matcherInstance);
		} catch (NoSuchFieldException | IllegalAccessException e) {
			e.printStackTrace();
		}
		return null;
	}

Let's test it:

		Pattern pattern = Pattern.compile(regex);
		Matcher matcher = pattern.matcher(hex);
		matcher.matches();

		System.out.println(Arrays.toString(getOffsets(Matcher.class,matcher)));

Output:

[0, 17, 9, 11, 15, 17, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1]

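From what we know about regex in general, the first pair "0,17" is group(0), the full match: its start index and its (exclusive) end index;
"9,11" is then the start and end index of group(1);
and so on. The trailing -1 entries are slots that this pattern does not use.

This explains why groups[group * 2] and groups[group * 2 + 1] can be used to take the substring.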
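
For comparison, the same pairs can be read without reflection through Matcher's public start(int) and end(int) methods. The sketch below is only an illustration (the class and method names are invented here); it rebuilds the meaningful prefix of that array for the example string:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupOffsetsSketch {
    // Returns {start0, end0, start1, end1, ...} for groups 0..groupCount(),
    // or an empty array if the regex does not match the whole input.
    static int[] offsets(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new int[0];
        }
        int[] result = new int[(m.groupCount() + 1) * 2];
        for (int g = 0; g <= m.groupCount(); g++) {
            result[g * 2] = m.start(g);   // -1 if the group did not participate
            result[g * 2 + 1] = m.end(g);
        }
        return result;
    }

    public static void main(String[] args) {
        String hex = "00 00 00 01 00 01";
        String regex = "[0-9a-zA-Z\\s]{6}[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})\\s[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})";
        System.out.println(Arrays.toString(offsets(hex, regex))); // [0, 17, 9, 11, 15, 17]
    }
}
```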

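Clearly, once the match completes, the boundary indices of every group have already been recorded in the groups array.
So couldn't we just...?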

Implementing the Replacement

With the indices in hand, everything is in place; the only missing piece is slicing and rejoining the string ourselves.
So we get:

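	/**
	 * Replaces the content of the given group(n) positions.
	 *
	 * @param origin      the original string
	 * @param regex       a regex matching the whole string, with the parts to replace wrapped in parentheses
	 * @param groupIndice indices of the groups to replace (1-based)
	 * @param content     the replacement content, one entry per group index
	 * @return the resulting string
	 */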
	public static String replaceMatcherContent(String origin, String regex, int[] groupIndice, String... content) {
		if (groupIndice.length != content.length) {
			return origin;
		}
		Pattern pattern = Pattern.compile(regex);
		Matcher matcher = pattern.matcher(origin);
		if (matcher.matches()) {
			int count = matcher.groupCount();
			String[] resSubArray = new String[count * 2 + 1];
			int[] offsets = getOffsets(Matcher.class, matcher);
			if (offsets == null) {
				return origin;
			}
			// Split the string into the captured parts and the text between them
			int lastIndex = 0;
			for (int i = 1; i <= count; i++) {
				int startIndex = offsets[i * 2];
				int endIndex = offsets[i * 2 + 1];
				resSubArray[i * 2 - 2] = origin.substring(lastIndex, startIndex);
				resSubArray[i * 2 - 1] = origin.substring(startIndex, endIndex);
				lastIndex = endIndex;
			}
			resSubArray[count * 2] = origin.substring(lastIndex);

			// Replace the content at the requested group positions
			for (int i = 0; i < groupIndice.length; i++) {
				resSubArray[groupIndice[i] * 2 - 1] = content[i];
			}

			// Join the pieces back together
			StringBuilder sb = new StringBuilder();
			for (String sub : resSubArray) {
				sb.append(sub);
			}
			return sb.toString();
		}

		return origin;
	}

Finally, put this into a utility class and test again:

	public static void main(String[] args) {
		String hex = "00 00 00 01 00 01";
		String regex = "[0-9a-zA-Z\\s]{6}[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})\\s[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})";

		System.out.println(TextUtil.replaceMatcherContent(hex, regex, new int[]{1, 2}, new String[]{"1234", "2345"}));
	}

Output:

00 00 00 1234 00 2345

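It works.
There may still be small issues I haven't thought of (for example, nested groups or groups that did not take part in the match would break the slicing above), but this is the basic idea for now.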
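
As one possible way to tidy up those loose ends, here is a hedged sketch of a variant that avoids reflection by using the public start(int)/end(int) methods. It is not the utility class from this post: the class and method names are hypothetical, and it still assumes the targeted groups do not overlap or nest. It applies the replacements from right to left so that earlier indices stay valid, and skips groups that did not participate in the match:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupReplaceSketch {
    // Replaces the text captured by each listed group with the matching entry
    // of content. Assumes the listed groups do not overlap or nest.
    static String replaceGroups(String origin, String regex, int[] groupIndices, String... content) {
        if (groupIndices.length != content.length) {
            return origin;
        }
        Matcher m = Pattern.compile(regex).matcher(origin);
        if (!m.matches()) {
            return origin;
        }
        // Positions into groupIndices, ordered by group start, rightmost first.
        Integer[] order = new Integer[groupIndices.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingInt((Integer i) -> m.start(groupIndices[i])).reversed());

        StringBuilder sb = new StringBuilder(origin);
        for (int i : order) {
            int start = m.start(groupIndices[i]);
            if (start < 0) {
                continue; // group did not participate in the match
            }
            sb.replace(start, m.end(groupIndices[i]), content[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hex = "00 00 00 01 00 01";
        String regex = "[0-9a-zA-Z\\s]{6}[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})\\s[0-9a-zA-Z]{2}\\s([0-9a-zA-Z]{2})";
        System.out.println(replaceGroups(hex, regex, new int[]{1, 2}, "1234", "2345"));
        // 00 00 00 1234 00 2345
    }
}
```

Working from the rightmost group backwards means a replacement of a different length never shifts the offsets of the groups still to be processed.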

The code is available via the link here.
