App Privacy and Controls – The Citizen Lab

What WeChat Knows: Pervasive First-Party Tracking in a Billion-User Super-App Ecosystem

Alyson Bruce — Thu, 14 Aug 2025 19:35:41 +0000

In this paper, researchers examine the analytics and first-party tracking ecosystem of WeChat Mini Programs. They find that WeChat comprehensively tracks user activity at an unprecedented scale, with no way for users or developers to opt out.

Co-authored by Mona Wang (Princeton University / The Citizen Lab), Pellaeon Lin (The Citizen Lab), Jeffrey Knockel (The Citizen Lab / Bowdoin College), Will Greenberg (Electronic Frontier Foundation), Jonathan Mayer (Princeton University), Prateek Mittal (Princeton University)

Read What WeChat Knows: Pervasive First-Party Tracking in a Billion-User Super-App Ecosystem in the Privacy Enhancing Technologies Symposium (PETS) 2025 conference proceedings.

WireWatch: Measuring the Security of Proprietary Network Encryption in the Global Android Ecosystem

Alyson Bruce — Mon, 12 May 2025 15:44:29 +0000

The Citizen Lab’s Mona Wang, Jeffrey Knockel, and Zoë Reichert, in collaboration with Princeton researchers Prateek Mittal and Jonathan Mayer, co-authored a new paper on the network security of Android apps. Their research found that a large portion of popular Chinese apps use broken proprietary network protocols instead of TLS.

The paper, titled “WireWatch: Measuring the security of proprietary network encryption in the global Android ecosystem,” will be presented by Wang at the 2025 IEEE Symposium on May 14, 2025.

Network Security Issues in RedNote

Mona Wang — Wed, 12 Feb 2025 15:00:21 +0000

Key Findings

We analyzed RedNote on Android and iOS for network security issues and found that all versions of RedNote fetch viewed images and videos over HTTP, which enables network eavesdroppers to learn exactly what content users are browsing.
Some versions of RedNote contain a vulnerability that enables network attackers to learn the contents of any files that RedNote has permission to read on users’ devices. This issue was introduced by an upstream software development kit (SDK) used by RedNote, NEXTDATA, but it is present neither in Android versions downloaded from the Google Play Store nor in the iOS version.
All versions of RedNote that we analyzed also transmitted insufficiently encrypted device metadata, sometimes over TLS without certificate validation, enabling network attackers to learn device and network metadata, such as device screen size and the mobile network carrier. This issue was introduced by an upstream SDK, MobTech.
We responsibly disclosed the relevant issues to NEXTDATA on November 13, 2024, to MobTech on November 26, 2024, and to RedNote on January 16, 2025. At the time of publication, no party had responded to our disclosures.
All the issues we discovered could be mitigated through the use of TLS. Yet again, this work highlights the importance of using well-supported encryption implementations.

Introduction

RedNote, or XiaoHongShu, is a popular Chinese social media application with upwards of 300 million active users. RedNote is notable not just due to its popularity within China, but especially due to its popularity with Chinese tourists travelling internationally, the Chinese diaspora, and recently, Americans. In January 2025, the application gained global attention as approximately three million US users joined RedNote in the wake of the US government’s decision to ban TikTok.

We analyzed RedNote as a part of our ongoing work reviewing popular applications for network security issues. Other researchers and media outlets have also taken interest in the application’s security, censorship, and privacy properties after its surge in popularity among US users. After we disclosed the vulnerabilities to all relevant vendors, including RedNote, NEXTDATA, and MobTech, another security researcher published his findings which included some, but not all, of the issues we disclose in this report.

First, we found that all versions of RedNote we analyzed fetch viewed images and videos without any encryption, which enables network eavesdroppers to learn exactly what content users are browsing. Second, we found a vulnerability on some versions of RedNote for Android that enables network attackers to learn the contents of files on users’ devices. This issue was introduced by an upstream software development kit (SDK) used by RedNote, NEXTDATA (also known as 数美 or Shumei), for “fraud prevention.” Finally, we found that all versions of RedNote we analyzed transmit insufficiently encrypted device metadata, sometimes over TLS without certificate validation. This final issue was also introduced by an upstream analytics SDK, MobTech.

This work is primarily a network security analysis, which evaluates attacks that would allow Internet service providers (ISPs), virtual private networks (VPNs), or other network attackers to surveil RedNote users on their network. We did not perform any other security or privacy analysis and did not evaluate other forms of data collection (e.g., third-party tracking or first-party data collection by RedNote). In other words, our analysis does not concern whether certain user data is collected by RedNote and does not concern whether that data could be made available to the Chinese government via data access requests. Our analysis threat model primarily concerns whether any ISP, government, or network operator can surveil or attack RedNote users on their network. We focus on this particular threat model because RedNote is especially popular with Chinese tourists travelling abroad and the Chinese diaspora. The issues that we found make these users especially vulnerable to surveillance by non-Chinese governments, which might not already have methods to obtain data about those individuals.

We also did not perform a full security audit of RedNote and did not make any attempt to exhaustively find every security vulnerability in the software. This report outlines the issues we have discovered. However, the absence of our reporting of other vulnerabilities should not be considered evidence of their absence.

We discovered the NEXTDATA and MobTech issues in November 2024. We responsibly disclosed the issue to NEXTDATA on November 13, 2024, and to MobTech on November 26, 2024. After we did not receive any response, we then disclosed all relevant issues to RedNote on January 16, 2025. As of February 12, 2025, we have not received any responses. As neither RedNote, MobTech, nor NEXTDATA responded to our security disclosures, we have decided to make our findings public in accordance with our disclosure policy.

Methods

We originally analyzed and found issues with RedNote version 8.41.0, downloaded from the Xiaomi Mi Store, as tested on Android 14, in October 2024. We later re-confirmed the issues in February 2025 on RedNote version 8.69.3 as available on the RedNote website and the Xiaomi Mi Store and on RedNote version 8.59.2 from the Google Play Store. We also confirmed similar issues in RedNote version 8.69 as available on the Apple App Store.

Platform	Package name / Bundle ID	Downloaded from	Version analyzed
Android	com.xingin.xhs	Google Play Store	8.59.2
Android	com.xingin.xhs	RedNote website	8.69.3
Android	com.xingin.xhs	Xiaomi Mi Store	8.69.3
iOS	com.xingin.discover	Apple App Store	8.69

Table 1: The versions of RedNote analyzed for our research.

We examined these applications using both static and dynamic analysis methods. We used jadx to statically analyze and decompile Dalvik bytecode and IDA Pro to statically analyze and decompile native machine code. We used Frida to dynamically analyze the application on Android. Finally, we used Wireshark to capture and analyze network traffic and mitmproxy to manipulate network traffic in our proof-of-concept exploits.

Findings

In this section we detail three major issues we discovered relating to RedNote’s network security (see Table 2 for a summary). We close the section by discussing other miscellaneous issues we found in RedNote’s network security.

Issue	Google Play Store	RedNote website	Xiaomi Mi Store	iOS
1. Users’ browsing behaviour observable to network eavesdroppers	YES	YES	YES	YES
2. Users’ file contents readable by network attackers	NO	YES	YES	NO
3. Users’ device metadata available to network attackers	YES	YES*	YES*	YES*

Table 2: Summary of which vulnerabilities affect which versions of the applications we studied.
*These versions of the attack require a TLS MITM.

1. Users’ browsing behaviour observable to network eavesdroppers

Network eavesdroppers can observe users’ browsing behaviour on all versions of RedNote we analyzed.

All image and video data is loaded from RedNote content delivery network (CDN) servers over HTTP. Since RedNote is primarily a multimedia social media application, this means that network eavesdroppers can easily determine what content users are browsing on the application.

Figure 1: A screenshot of a network capture from RedNote in Wireshark; highlighted are matches for “/w/540”, each of which corresponds to a request for the preview thumbnail of a video made entirely unencrypted and in the clear.

As a demonstration, we provide a screenshot of a sample network capture in Figure 1. The resource requests containing “/w/540” correspond with the video preview thumbnails displayed on the homepage. The final four requests in Figure 1 correspond precisely to videos and image posts that the user viewed, including an image that was loaded in a comment response to a post.

2. Users’ file contents readable by network attackers

Network attackers can read users’ file contents on the Android versions of the application available for download on RedNote’s website and on the Mi Store, but not in the version downloaded from the Google Play Store or the iOS version. Network attackers can learn the contents of any files that RedNote has permission to read on the user’s device.

When an affected version of RedNote fetches a “configuration file,” it does so without proper encryption or authentication. A network adversary can alter the contents of this configuration file. RedNote then proceeds to run the commands specified in this configuration file and sends the response over the network. Although the response cannot be easily decrypted, an attacker could use the size of the returned payload as a side-channel to infer information about the result of the command run on the user’s device. We designed a proof-of-concept to demonstrate this attack.

This attack is enabled by an SDK called NEXTDATA or Shumei (数美), which is used by RedNote to identify the use of rooted or emulated devices. We disclosed this same vulnerability to Shumei via email on November 13, 2024, but, as of February 12, 2025, received no response.

NEXTDATA cloud configuration mechanism

The attack is enabled by the mechanism through which RedNote fetches and acts on a “cloud configuration file.” NEXTDATA uses the contents of this file to identify the use of emulated devices or the presence of rooting tools such as Magisk. In this section, we describe the mechanism through which NEXTDATA downloads this configuration file.

We found that the payloads of responses received from http://fp-it.fengkongcloud.com/v3/cloudconf are encrypted using DES-ECB. Specifically, the “data” entry of the returned JSON object is encrypted using DES-ECB with the hard-coded key b'zaq1mko0'. In other words, it can be decrypted as DES_ECB_decrypt(data, b'zaq1mko0'). The JSON payload decrypted in this manner contains a field, risk_files, which contains a base64-encoded payload:

"risk_files":"Y/DYqXJpoNsnXZnUIGXpNV1pECq0lshwbrHG1ek+g0T …

The risk_files payload is itself encrypted using AES-CBC. The key is effectively hardcoded — b'51996e9be805c9284e69bc7684800a26' — although it is computed as the first 32 ASCII bytes of the MD5 hex digest of b'smsdkshumeiorganizationflag', i.e., it is computed deterministically from a hardcoded value. The IV is also hardcoded: b'0102030405060708'. The risk_files entry, when base64-decoded, can then be decrypted using the hardcoded key and IV as AES_CBC_decrypt(risk_files, key, IV).

The following is an example of a payload decrypted in this manner:

[
  {
    "path": "file:///proc/self/maps",
    "words": [
      "/data/.+\\.so",
      ".*titan.so",
      ".*titan2.so",
      ".* rwxp .*"
    ],
    "type": "file",
    "key": "maps2",
    "option": "regex"
  },
...
]
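The key derivation described above for the risk_files layer can be reproduced with the standard library alone. The following is a minimal sketch, assuming only what the text states (the seed string, the MD5 hex digest, and the fixed IV); the function name `derive_risk_files_key` is hypothetical, and the AES-CBC decryption step itself is omitted because it needs a third-party cryptography library.

```python
import hashlib

# Values quoted in the text above.
SEED = b"smsdkshumeiorganizationflag"
IV = b"0102030405060708"

def derive_risk_files_key(seed: bytes = SEED) -> bytes:
    """Return the first 32 ASCII bytes of the MD5 hex digest of `seed`.

    An MD5 hex digest is exactly 32 hex characters long, so the slice
    keeps the whole digest; it is written out only to mirror the text.
    """
    return hashlib.md5(seed).hexdigest()[:32].encode("ascii")

key = derive_risk_files_key()
# The text reports this key as b'51996e9be805c9284e69bc7684800a26';
# printing it lets a reader compare the derived value against that claim.
print(key, len(key), len(IV))
```

Because the key is 32 ASCII bytes, it would be used as a 256-bit AES key, and the 16-byte IV matches the AES block size.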

This risk_files data is subsequently used to perform various checks from the application, including searching for the presence of files or running regexes on files. The results of all of these checks are included in an encrypted payload in a subsequent request to http://fp-it.fengkongcloud.com/v3/profile/android. This request is encrypted with yet another custom encryption method, which uses an RSA-bootstrapped AES key. This encryption method has issues, but is not trivially decryptable. We describe this method in full in the Appendix.
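To make the rule format concrete, the sketch below evaluates one risk_files entry against the text of a file. It is an illustration under stated assumptions, not NEXTDATA's implementation: the function `evaluate_rule`, the line-by-line matching, the use of an unanchored search, and the sample /proc/self/maps excerpt are all hypothetical.

```python
import re
from typing import Dict, List

def evaluate_rule(rule: Dict, file_text: str) -> List[str]:
    """Return the lines of file_text matched by any pattern in rule['words'].

    Assumption: patterns are applied per line with an unanchored search.
    The report does not say how the SDK applies them.
    """
    if rule.get("option") != "regex":
        return []
    patterns = [re.compile(word) for word in rule["words"]]
    return [
        line
        for line in file_text.splitlines()
        if any(p.search(line) for p in patterns)
    ]

# The rule shown in the decrypted payload above.
rule = {
    "path": "file:///proc/self/maps",
    "words": ["/data/.+\\.so", ".*titan.so", ".*titan2.so", ".* rwxp .*"],
    "type": "file",
    "key": "maps2",
    "option": "regex",
}

# Hypothetical excerpt standing in for the contents of /proc/self/maps.
sample_maps = "\n".join([
    "7f000000-7f001000 r-xp 00000000 fd:00 1234 /system/lib64/libc.so",
    "7f100000-7f101000 r-xp 00000000 fd:00 5678 /data/app/example/lib/libhook.so",
    "7f200000-7f201000 rwxp 00000000 00:00 0 ",
])

hits = evaluate_rule(rule, sample_maps)
print(len(hits))
```

In this sample, the second line matches the /data/ shared-object pattern and the third matches the writable-and-executable mapping pattern, so two lines are reported.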

Proof-of-concept exploits

Since NEXTDATA uses static keys to encrypt the above payloads, any network attacker could derive the keys and other parameters required to decrypt and re-encrypt the payloads from within the app by reverse engineering it. We designed proof-of-concept attacks to inject and alter data into the risk_files field:

1. Modifying received rules to induce false positives and false negatives in the detection of “risky” files on users’ phones.
2. Performing a denial-of-service attack.
3. Reading contents of files on the user’s phone.

We accomplished (1) by modifying the rules that the user received so that they trigger under different conditions than originally intended. We accomplished (2) by modifying the rules that the user received to exploit denial-of-service vulnerabilities in the app’s regex engine. For example, we found that the patterns (((([a-f0-9]{1,100}){1,100}){1,100}){1,100}){1,100} and (((([a-f0-9]+)+)+)+)+ pegged victims’ CPUs when applied to files such as /proc/self/maps or /proc/cpuinfo. We expand on (3), which enables remote file access and is the most significant of the three attacks, in the section below.

Remote file access

In this section, we describe our final proof-of-concept remote attack, which uses the risk_files vulnerability to read the contents of files on a user’s phone. Since this field enables an attacker to inject any regex to run on an arbitrary set of files, an attacker could also inject risk_files data in order to detect the presence of any file that the application has permission to read. Even without decrypting the /v3/profile/android network request, the attacker can observe the size of the payload as a side-channel to determine the presence of a file or whether a regex was run successfully. As a simple example, injecting the following into the risk_files field reveals to the network attacker whether a given file is present on the user’s phone:

  {
    "path": "file://",
    "words": [".*"],
    "type": "file",
    "key": "xxx",
    "option": "regex"
  }

If the file exists on the phone, this regex generates a large number of matches, significantly inflating the encrypted payload size of the subsequent v3/profile/android request.

An attacker can use this payload-size side-channel not only to identify the presence of files on the phone but also to read the contents of a file. We designed a proof-of-concept attack that allows a network attacker to read the BogoMIPS value from /proc/cpuinfo on the victim’s device. Although we cannot decrypt the client’s resulting transmission to the server, the attack works by transmitting 65536 regex patterns carefully constructed so that if the user’s BogoMIPS value is n, then n of these patterns will match. Since the app reports each match to the server, coercing the user’s app into executing these patterns and reporting which of them match effectively leaks the BogoMIPS value through the length of the payload, which grows in proportion to the magnitude of the BogoMIPS value. In our testing, we were able to exfiltrate the BogoMIPS value to the nearest whole BogoMIP, supporting any value in the range of 1 to 65536. The attack could also be adapted to read different parts of any file that is accessible by the RedNote app. When we tested our attack on two Android devices, the payload lengths transmitted by the app varied between 3.3 and 3.4 MB.
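One way such a pattern family could be built is sketched below. This is an assumption for illustration only: the report does not describe how its 65536 patterns were constructed, and the helper names `at_least_pattern` and `count_matches`, as well as the sample cpuinfo text, are hypothetical. Pattern k matches when the integer part of the BogoMIPS value is at least k, so a value of n matches exactly n of the patterns 1..limit.

```python
import re

def at_least_pattern(k: int) -> str:
    """Build a regex that matches a BogoMIPS line whose integer part is >= k."""
    s = str(k)
    m = len(s)
    # Any integer with more digits than k is larger than k.
    alts = [r"[1-9]\d{%d,}" % m]
    for i, ch in enumerate(s):
        d = int(ch)
        if d < 9:
            # Same leading digits as k, a strictly larger digit at position i,
            # then any digits in the remaining positions.
            alts.append("%s[%d-9]\\d{%d}" % (s[:i], d + 1, m - i - 1))
    alts.append(s)  # exactly k
    # (?!\d) stops an alternative from matching only a prefix of the number.
    return r"BogoMIPS\s*:\s*(?:%s)(?!\d)" % "|".join(alts)

def count_matches(cpuinfo: str, limit: int) -> int:
    """Count how many of the patterns for k = 1..limit match the text."""
    return sum(
        1 for k in range(1, limit + 1)
        if re.search(at_least_pattern(k), cpuinfo)
    )

# Hypothetical /proc/cpuinfo excerpt.
sample_cpuinfo = "processor\t: 0\nBogoMIPS\t: 38.40\nFeatures\t: fp asimd\n"
matched = count_matches(sample_cpuinfo, 100)
print(matched)
```

If each reported match adds a roughly constant number of bytes to the encrypted request, an observer who knows the baseline request size could estimate the match count, and therefore the value, from the payload length alone.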

In summary, any network attacker with a MITM position can read the contents of any file that RedNote has permissions to read on the user’s device.

3. Users’ device metadata available to network attackers

Both the Android and iOS versions of RedNote send device metadata using insufficient encryption. The payloads of these requests are encrypted with an insecure encryption algorithm. On the version of the application downloaded from the Google Play Store, these requests are delivered over HTTP and not HTTPS. On the remaining Android versions that we analyzed and on the iOS version, these requests are delivered over TLS, but the RedNote application does not validate TLS certificates, enabling any network attacker with an active machine-in-the-middle (MITM) position to decrypt the underlying data.

HTTPS POST requests for https://devc.zztfly.com/dinfo contain URL-encoded, base64-encoded payloads of encrypted data in the POST request body. This data is encrypted using standard AES-ECB with PKCS5 padding with the key b'sdk.commonap.sdk'. Here are snippets from a sample payload, captured from RedNote on iOS:

{"breaked":false,"dataStorage":127870980096,"mac":"02:00:00:00:00:00",
"datatime":1737492754724,"plat":2,"ram":3840327680,"factory":"APPLE",
"screensize":"1170x2532", ...,
"model":"iPhone13,2","carrier":"-1"}

We believe these issues are introduced by an upstream analytics SDK developed by MobTech. We sent emails to MobTech disclosing these issues on November 26, 2024, but, as of February 12, 2025, we have not received any reply. When we re-analyzed RedNote in January 2025, we noticed that in all versions of RedNote other than the one downloaded from the Google Play Store, MobTech SDK requests had been upgraded to use TLS, albeit without any certificate validation.

Other potential MobTech issues

In this section, we discuss other potential issues with the cryptography used by other requests sent by MobTech. These requests also contain significant amounts of device metadata, and contain weaknesses in their encryption.

We found that many other HTTPS POST requests to zztfly.com endpoints, including upc.zztfly.com/v5/gcl and log-auth.zztfly.com/api/log, likewise do not validate the TLS server certificate and use yet another custom encryption method for many of their messages. These requests use RSA without OAEP padding to encrypt AES keys, which are subsequently used in AES-ECB mode with PKCS7 padding to encrypt the message contents. In this section, we describe multiple weaknesses with this cryptographic construction.

During RSA encryption, instead of OAEP, MobSDK uses this custom padding scheme:

[01 00 00 00 10 00 00 00 (repeated zeros) (AES KEY)]
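The layout above can be written out as a short sketch. The header bytes are copied from the layout; everything else is an assumption made for illustration: the function name `mob_style_pad` is hypothetical, and the 128-byte block length (a 1024-bit RSA modulus) is a guess, since the report does not state the key size.

```python
# Header bytes as shown in the layout above; their meaning is not stated
# in the report and is not interpreted here.
HEADER = bytes.fromhex("0100000010000000")

def mob_style_pad(aes_key: bytes, block_len: int = 128) -> bytes:
    """Lay out HEADER, zero fill, then the AES key in a block of block_len bytes.

    block_len = 128 is an assumed modulus size, not a value from the report.
    """
    if len(HEADER) + len(aes_key) > block_len:
        raise ValueError("key does not fit in the padded block")
    zeros = bytes(block_len - len(HEADER) - len(aes_key))
    return HEADER + zeros + aes_key

example_key = bytes(range(16))  # hypothetical 16-byte AES key
padded = mob_style_pad(example_key)
print(len(padded), padded[:8].hex(), padded[-16:].hex())
```

The point of the sketch is that the padding contains no randomness: the same AES key always produces the same padded block, whereas OAEP mixes fresh random bytes into every encryption.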

This padding scheme is similar to PKCS#1-1.5, which has historically been susceptible to numerous attacks with widespread applicability.

In general, RSA-bootstrapped key exchange is notoriously difficult to implement correctly, provides no forward secrecy, has a long history of widespread issues, and provides no cryptographic authenticity or integrity. TLS 1.3 no longer supports RSA key exchange for these reasons.

Finally, AES in ECB mode is generally weak due to its determinism. That is, within a ciphertext, an attacker can observe whether two blocks encrypt the same plaintext block. In addition, ECB-encrypted ciphertexts are highly malleable. Since the MobSDK encryption scheme provides no integrity or authenticity, an attacker can arbitrarily insert, remove, alter, or swap ciphertext blocks without being noticed by the recipient.
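Both ECB properties can be demonstrated without any third-party library. The sketch below deliberately uses a toy 16-byte block cipher (a four-round Feistel network built on SHA-256) as a stand-in for AES, since the Python standard library does not include AES; the cipher, key, and plaintext are hypothetical and must not be used for real encryption. Only the ECB behaviour is the point.

```python
import hashlib

BLOCK = 16  # block size in bytes, matching AES

def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Toy round function: NOT AES, only a keyed pseudorandom function.
    return hashlib.sha256(key + bytes([i]) + half).digest()[: BLOCK // 2]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def ecb(block_fn, key: bytes, data: bytes) -> bytes:
    """Apply block_fn to each block independently, which is all ECB does."""
    if len(data) % BLOCK:
        raise ValueError("data must be a multiple of the block size")
    return b"".join(
        block_fn(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)
    )

key = b"hypothetical-key"
plaintext = b"A" * 16 + b"B" * 16 + b"A" * 16
ct = ecb(toy_encrypt_block, key, plaintext)
blocks = [ct[i:i + BLOCK] for i in range(0, len(ct), BLOCK)]

# Determinism: equal plaintext blocks produce equal ciphertext blocks.
same = blocks[0] == blocks[2]

# Malleability: swapping ciphertext blocks swaps plaintext blocks silently.
tampered = blocks[1] + blocks[0] + blocks[2]
recovered = ecb(toy_decrypt_block, key, tampered)
print(same, recovered)
```

The tampered ciphertext decrypts without any error, which is why a scheme like this needs an integrity check that ECB alone does not provide.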

Other insecure requests

We observed some other data leakages and the use of insufficient encryption on RedNote. We summarize the results below.

Insufficiently encrypted configuration file

We observed RedNote Android making an HTTP POST request to the endpoint fe.xiaohongshu.com/api/feresource/v1/web, which fetches a JSON configuration file. Since this request is unauthenticated and seems to contain no method for ensuring cryptographic integrity, it can be modified by network attackers without detection. Since the configuration file seems to contain various matching regexes, this configuration file could enable a similar attack vector to the NEXTDATA vulnerability.

Insufficiently encrypted DNS requests

Both RedNote iOS and Android make HTTP GET requests similar to the following:

http://119.29.29.98/d?dn=0f94e1082a116d20242a7cdceba900f7eed0eac4b1c3e7b3&clientip=1&ttl=1&query=1&id=8713&type=addrs&alg=des

We recognize these as Tencent HTTPDNS requests. These requests use a static DES key for encryption and can thus be easily decrypted by anyone who reverse-engineers the key from the RedNote application.
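The structure of the example request can be inspected with the standard library. This is a small sketch that only parses the URL shown above; it does not decrypt the dn value, and the interpretation of the parameters is limited to what is visible in the URL itself.

```python
from urllib.parse import urlsplit, parse_qs

URL = (
    "http://119.29.29.98/d?dn=0f94e1082a116d20242a7cdceba900f7eed0eac4b1c3e7b3"
    "&clientip=1&ttl=1&query=1&id=8713&type=addrs&alg=des"
)

parts = urlsplit(URL)
params = {name: values[0] for name, values in parse_qs(parts.query).items()}

# The dn parameter is hex-encoded; decode it to raw bytes.
dn_bytes = bytes.fromhex(params["dn"])

print(parts.scheme, parts.hostname, params["alg"], len(dn_bytes))
```

The request travels over plain HTTP, names DES in its alg parameter, and carries a 24-byte dn value, which is a whole number of 8-byte DES blocks.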

Discussion

We reiterate that the network security vulnerabilities outlined in this report could enable surveillance by any government or ISP, and not just the Chinese government. For instance, these issues make the new wave of American RedNote users more vulnerable to surveillance by their own government and ISPs. As the Chinese government might already have mechanisms to lawfully obtain detailed data from RedNote about their users, the issues that we found also make Chinese users especially vulnerable to surveillance by non-Chinese governments. This is particularly relevant due to RedNote’s popularity with Chinese travellers abroad and Chinese diaspora communities. Thanks to the Snowden revelations, we know similar vulnerabilities in Chinese applications have previously been exploited by the Five Eyes intelligence alliance to surveil Chinese users globally.

The security issues we discovered in RedNote are not unique to this app. TikTok, which is RedNote’s closest competitor outside of the Chinese market, has a history of similar issues. In 2019, analysts at the US-based law firm Glancy, Prongay, and Murray found that TikTok used non-standard encryption to protect sensitive user data. The findings informed a class action lawsuit on TikTok’s privacy violations that, in 2021, after consolidation with a similar suit, was ultimately settled for $92 million USD. In our 2021 report, we found that all requests by then were encrypted with HTTPS, although some requests also continued to use non-standard encryption such as ttEncrypt.

Applications that are popular in China often use no encryption, proprietary encryption protocols, or use TLS without certificate validation to encrypt sensitive data. We discussed possible reasons for this systemic issue, and potential paths for remediation in the discussion section of our previous report analyzing Chinese keyboard apps. Our findings on the RedNote application reaffirm the importance of using well-understood encryption libraries correctly and of refraining from developing custom encryption algorithms.

Suggestions for users

RedNote users who are concerned about local ISP surveillance may choose to use a trusted VPN. However, we note that this solution merely shifts control to the VPN provider, enabling the VPN provider, rather than the ISP or other local network attackers, to surveil or attack end-users. We suggest that users who are highly concerned about network surveillance from any party refrain from using RedNote until these security issues are resolved.

As we mentioned earlier, we did not review RedNote’s privacy policy, data collection, or data sharing policies. As far as we know, the risks in this regard would be similar to those for any other application based in China. Therefore, users who are concerned about RedNote’s possession of their data, or about their data being collected in a Chinese legal jurisdiction, may also want to refrain from using the application.

Disclosure timeline

Following our responsible disclosure policy, we disclosed the issues that we discovered according to the timeline below.

November 2024

We discovered that RedNote contains two vulnerabilities introduced by upstream SDKs, MobTech and NEXTDATA.
On November 13, 2024, we disclosed to NEXTDATA at info@nextdata.ai.
On November 26, 2024, we disclosed to MobTech at notice@mob.com.

January 2025

We re-analyzed RedNote, finding that the NEXTDATA issue is still present. We observed that the MobTech SDK’s requests have been upgraded to use TLS in RedNote.
On January 16, 2025, we disclosed to RedNote at shuduizhang@xiaohongshu.com and app_feedback@xiaohongshu.com. The second email address bounces.
We discovered that the MobTech SDK’s requests, despite having been upgraded to TLS after our disclosure, are not subject to certificate validation.
On January 22, 2025, we followed up with MobTech and RedNote about this issue.

As of February 12, 2025, we have not received any responses from any of these parties.

Acknowledgements

We thank Pellaeon Lin, Adam Senft, and Siena Anstis for their review and feedback on this work. Research for this project was supervised by Jonathan Mayer, Prateek Mittal, and Ron Deibert.

Appendix

NEXTDATA RSA encryption

Some requests sent by NEXTDATA are encrypted using this method. The request relevant to our attack is an HTTP POST request to http://fp-it.fengkongcloud.com/v3/profile/android, which contains the following JSON payload in the request body:

{"data":
  {"pri": "<seed encrypted with RSA pubkey>",
   "tn": "",
   "fingerprint": "<plaintext encrypted with AES key derived from seed>",
   "sessionId": "",
   "fpEncode":11},
"encrypt":1,
"organization": "", ... }

This is RSA-bootstrapped AES encryption. First, NEXTDATA generates a seed by randomly selecting 16 letters from a to z. pri is this seed encrypted with a pinned RSA public key. tn is the MD5 checksum of sessionId, the underlying plaintext, seed, organization, and sm_tn, all encrypted with the same pinned RSA key. Finally, fingerprint is the underlying plaintext, encrypted with AES-CBC, as follows:

fingerprint = AES_CBC_encrypt(plaintext, key = MD5(seed).hexdigest(), iv = b'0102030405060708')

Though we are not able to trivially decrypt this payload, this method of encryption still has issues. For instance, due to the method of generating the AES key seed, the entropy of the AES key is approximately 75 bits, even though the full AES key is 256 bits long. We do not recommend this method of encryption and strongly recommend the use of well-understood network encryption libraries.
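The entropy figure follows directly from the seed format: 16 independent choices from 26 letters give 16 × log2(26) ≈ 75.2 bits. The sketch below reproduces that arithmetic and the key derivation described above. The helper names `generate_seed` and `derive_key` are hypothetical, and the use of Python's `secrets` module is an assumption; the report does not say which random number generator NEXTDATA uses.

```python
import hashlib
import math
import secrets
import string

def generate_seed(length: int = 16) -> str:
    """Pick `length` lowercase letters uniformly at random."""
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))

def derive_key(seed: str) -> bytes:
    """Use the 32-character MD5 hex digest of the seed as the AES key bytes."""
    return hashlib.md5(seed.encode("ascii")).hexdigest().encode("ascii")

seed = generate_seed()
aes_key = derive_key(seed)

# Entropy of the seed, and therefore an upper bound on that of the key.
entropy_bits = 16 * math.log2(26)
print(len(aes_key) * 8, round(entropy_bits, 1))
```

The key is 256 bits long on the wire, but because every key is determined by one of 26^16 possible seeds, guessing the seed is far cheaper than guessing a uniformly random 256-bit key.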

Should We Chat, Too? Frequently Asked Questions

Mona Wang — Tue, 15 Oct 2024 18:59:49 +0000

Read the full report: Should We Chat, Too? Security Analysis of WeChat’s MMTLS Encryption Protocol

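How does this research deepen our understanding of WeChat?

WeChat is an application with many features. Previously, we studied the privacy issues of its Mini Programs, as well as its surveillance and its censorship of text and image messages. This research focuses on WeChat’s network encryption protocol and its security.

When information security researchers like us analyze the security of an application, we perform network traffic analysis to study what the application sends and how it sends it. Through this analysis, we can learn what data the application collects and with whom it shares that data.

Performing this kind of analysis on WeChat was not straightforward at first. Most applications today use the industry-standard Transport Layer Security (TLS) protocol to encrypt the contents of their network traffic, which generally prevents eavesdroppers from reading the underlying data. Common tools already exist for decrypting such content when researchers wish to analyze traffic sent by their own applications. However, these tools do not work on WeChat, because it uses a proprietary network encryption protocol, different from TLS, called “MMTLS.” Before this research, little was known about MMTLS, and no existing tools could inspect content encrypted with MMTLS.

We reverse-engineered the inner workings of WeChat’s network encryption and found some minor issues with its security. We found that MMTLS, described above, is only the outer layer of encryption used by WeChat. Inside MMTLS, we found a second layer of encryption, entirely independent of MMTLS, called “Business-layer encryption.” The two encryption systems are nested inside one another like Russian dolls: the plaintext content is first encrypted with Business-layer encryption, and the resulting Business-layer ciphertext is then used as the input to MMTLS encryption, producing the MMTLS ciphertext that is sent over the network.

We found several issues with Business-layer encryption, the most serious of which is a metadata leak that leaves user account IDs and some other information unencrypted at this layer. We briefly examined an older version of WeChat and found that it contained only Business-layer encryption. These findings suggest that Business-layer encryption predates MMTLS, and that MMTLS was likely designed to compensate for the shortcomings of Business-layer encryption. Because MMTLS wraps Business-layer encryption, an attempt to exploit weaknesses in Business-layer encryption must generally first defeat the protection provided by the MMTLS layer.

So far, we have not found any serious security issues in MMTLS. Therefore, although Business-layer encryption has vulnerabilities, these issues cannot be exploited by attackers and do not affect the overall security of the application’s network encryption.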

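Are my communications on WeChat secure?

Everyone has a different threat model, and therefore a different definition of “secure.” If you are concerned about the contents of your communications with other WeChat users being seen by network eavesdroppers, our research shows that, although WeChat’s encryption protocol offers weaker protection against network eavesdropping than industry-standard encryption protocols, it is not vulnerable to any currently known attack technique.

WeChat uses a custom encryption protocol rather than the industry-standard Transport Layer Security (TLS) protocol. Information security experts generally advise against custom-designed encryption protocols, because a well-tested encryption protocol typically requires years of combined effort from many researchers. A single company cannot invest a comparable level of effort. We also found some minor issues in WeChat’s encryption protocol that do not exist in TLS.

In summary, although we did not find any major weaknesses in WeChat’s encryption protocol, we did find some minor issues. These issues do not compromise the confidentiality of user communications. However, the same issues do not exist in industry-standard TLS.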

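Can the Chinese government read my WeChat messages?

At the regulatory level, because Tencent is headquartered in China, it must comply with local laws and respond to the Chinese government’s requests for user data.

At the technical level, WeChat’s encryption protocol protects communications between a user’s device and WeChat’s servers. It is not an end-to-end encryption system and does not encrypt data sent between two users’ devices. WeChat’s servers can decrypt and read every transmitted message. In the past, we found that WeChat uses a keyword-based detection system to censor private messages sent and received by users in China. The application also uses files sent by non-Chinese users to train the database it uses to censor Chinese users’ files.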

What types of user data does WeChat collect?

Please see our previous report, which answers this question.

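I have a phone that contains sensitive data. If I install WeChat, can it steal that data?

Modern phone operating systems (OSes) generally restrict applications’ access to sensitive user data (such as contacts and photos), to system resources (such as location services), and to data stored by other applications (such as chat histories in chat applications). Technically, therefore, it is difficult for an application to access sensitive data without the user’s authorization. However, a malicious application can exploit vulnerabilities in the operating system’s protections to bypass these access restrictions. A malicious application may also trick users into granting permission to access data.

Testing whether WeChat exhibits these malicious behaviours was beyond the scope of our research. However, we did not observe any of the above malicious behaviours during our research.

To protect yourself against this kind of attack, we recommend that you:

Make sure your phone’s operating system version is currently supported by the vendor
Keep your phone’s operating system up to date
Install applications from official sources (the built-in app store) rather than unofficial sources
Check an application’s reputation before installing it
Use a separate, dedicated device to handle highly sensitive information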

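Can WeChat still harm my phone’s security and privacy after it is uninstalled?

As noted above, modern phone operating systems control applications’ access to sensitive data by enforcing system protections. Technically, it is difficult for an application to leave an implant on the system after being uninstalled. However, some malicious applications may try to exploit system vulnerabilities to do so.

Testing whether WeChat exhibits these malicious behaviours was beyond the scope of our research. However, we did not observe such malicious behaviour at any point in our research.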

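What if I have other questions?

Read our full report, and see the frequently asked questions in our previous report analyzing WeChat’s privacy.

Translation note: This is an informal translation of the original English report. This informal translation may contain inaccuracies. It is intended only to provide a basic understanding of our research. In the event of a discrepancy or ambiguity, the English version of this report prevails.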

Should We Chat, Too? Summary (translated from Traditional Chinese)

Mona Wang — Tue, 15 Oct 2024 18:59:49 +0000

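Key Findings

WeChat has over one billion monthly active users. We analyzed the security and privacy properties of MMTLS, the main network protocol used by WeChat, and published the first public research report on it.
We found that MMTLS is in fact a modified version of the TLS 1.3 protocol. WeChat’s developers modified its cryptographic mechanisms, and some of those modifications introduced weaknesses.
Further analysis found that earlier versions of WeChat used a different, less secure custom protocol called “Business-layer encryption.” Business-layer encryption contains multiple security vulnerabilities, and in the newer versions of WeChat that we tested it is used alongside MMTLS.
Although we did not develop a method to fully break WeChat’s network encryption, that encryption still has some weaknesses, such as the use of deterministic initialization vectors (IVs) and the lack of forward secrecy. For an application with over one billion users, its level of security still needs improvement.
Other recent research has shown that applications in the Chinese market often do not follow the best practices recognized by the cryptography community and instead develop encryption systems of their own design. As a consequence of departing from best practices, these encryption systems often contain vulnerabilities of varying severity. This research provides stronger evidence of that phenomenon.
The tools and technical documentation we developed and used in this research are released in our GitHub repository. These tools and documents will help other researchers further explore the inner workings of WeChat.

Note: This summary does not translate all of the research findings; please read our English report for the details. This summary translates only the “Discussion” and “Recommendations” sections of the full report.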

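Discussion

Why does “Business-layer encryption” matter?

Since we have already noted in the report that Business-layer encryption is wrapped in a layer of reasonably secure MMTLS encryption, what practical impact does it have even if Business-layer encryption is insecure? Tencent’s reply to our email mainly addressed the various issues with Business-layer encryption, and also implied that they are slowly replacing the problematic AES-CBC encryption in Business-layer encryption with AES-GCM. This reply shows that Tencent is concerned about the issues with Business-layer encryption.

First, we studied an older version of WeChat (v6.3.16, released in 2016), in which Business-layer encryption was the only layer of encryption protecting the network data WeChat transmitted. Second, because Business-layer encryption leaves the internal request URI exposed and unencrypted, we speculate that WeChat’s server-side architecture may use different internal server endpoints to handle different types of network requests (different types of requests carry different “requestType” values and different “cgi-bin” URLs). For example, WeChat’s server-side architecture might have an outermost server endpoint strip the MMTLS encryption and then forward the inner request, according to its request type, to the responsible internal server endpoint (without re-encrypting it during forwarding, relying only on the security provided by Business-layer encryption). Under this hypothetical architecture, a network eavesdropper inside WeChat’s internal network could directly attack the Business-layer encryption protecting these forwarded requests.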

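Why not use TLS directly?

According to technical documentation that Tencent itself has published, and as verified by our research, MMTLS (the “outer layer” of encryption used by WeChat) is based primarily on TLS 1.3. The documentation shows that the designers of MMTLS have a good understanding of asymmetric cryptography.

The documentation explains the reasons for not using TLS: because most of WeChat’s network data transfers complete within a single request-response cycle (known in Chinese industry jargon as “short connections”), the 0-RTT property of the underlying protocol is especially needed. MMTLS needs only the one round trip of underlying TCP packets that establishes the TCP connection before it can begin transmitting data, whereas TLS 1.2 requires an additional TLS handshake round trip after the TCP connection is established before data can be transmitted.

Fortunately, the TLS 1.3 draft standard proposes a 0-RTT method (adding no extra network latency) for establishing a secure connection, and the TLS protocol itself provides good extensibility through its version number, CipherSuite, and Extension mechanisms. However, the TLS 1.3 draft standard is still being formulated, implementations based on the standard are nowhere in sight, and TLS 1.3 is a general-purpose protocol designed for all software; taking WeChat’s own characteristics into account, there is still a lot of room for optimization. We therefore ultimately chose to design and implement our own secure communication protocol, mmtls, based on the TLS 1.3 draft standard.

However, even in 2016, when the document was written, TLS 1.2 already provided session resumption. Moreover, even though TLS 1.3 was still a draft in the IETF standardization process at the time, if WeChat needed session resumption, deploying a then-experimental TLS 1.3 implementation would not have been difficult, because WeChat has full control over both the server-side and client-side code.

Despite the considerable effort of the MMTLS designers, the security protocol used by WeChat is, on the whole, inferior to TLS 1.3 in both security and performance. In general, designing a transport protocol that is both secure and efficient is not easy.

The latency caused by the extra round trips that transport protocols spend on handshakes has long been a headache for application developers. The TCP and TLS handshakes each require one round trip, which means that the protocol must wait for two round trips before transmitting any new data, causing delay. Today, protocols such as TLS-over-QUIC combine the transport-layer and encryption-layer handshakes, so that the handshake completes and data transmission can begin after only one round trip. QUIC offers the best of both worlds: it provides strong, forward-secret encryption while halving the round trips that earlier protocols needed to establish a connection. We recommend that WeChat switch to the standard QUIC protocol.

Finally, beyond network performance, the performance of the client application is also an important consideration. In WeChat’s protocol design, every network request must be encrypted twice, which means that the WeChat client application spends nearly double the computing resources and time compared with an industry-standard protocol that encrypts only once.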

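The trend of home-rolled cryptography in Chinese applications

The findings of this research, together with those of our earlier research, point to a trend: Chinese applications widely use home-rolled cryptography. In general, choosing not to use TLS, the industry benchmark, and developing non-standard home-rolled cryptography instead runs counter to the best practices that the industry recognizes as secure. Although there were many legitimate reasons to distrust TLS in 2011, in the early days of its widespread adoption, such as the concerns raised by EFF and Access Now about the certificate authority ecosystem, TLS has since largely stabilized, and its mechanisms have become more transparent and auditable.

Like MMTLS, all of the home-rolled protocols we have studied in the past contain some security weaknesses relative to TLS, and in some cases their encryption could even be easily broken by a network attacker. The global Internet is steadily adopting the international standards QUIC and TLS to protect data in transit; the contrary trend in the Chinese industry is unique to China and is concerning.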

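Anti-DNS-hijacking mechanisms

Similar to the way Tencent designed its own encryption system, we found that in Mars (the network communication module used by WeChat) they also wrote a custom domain lookup system. This system is part of the Mars STN submodule and can look up the IP address for a domain name over HTTP. In Mars, this feature is called “NewDNS.” Based on our dynamic analysis, this feature is used frequently in WeChat. At first glance, NewDNS duplicates functionality already provided by DNS (the Domain Name System), which is built into nearly every Internet-connected device.

WeChat is not the only application in China that uses such a system. Major Chinese cloud computing providers, such as Alibaba Cloud and Tencent Cloud, offer their own “DNS over HTTP” services. The Tencent Cloud DNS over HTTP service endpoint is located at the IP address 119.29.29.98. We searched VirusTotal (an online program behaviour analysis tool and sample repository) for applications that attempt to contact this address and obtained 3,865 unique results.

One possible reason for adopting such a system is that Chinese ISPs (Internet service providers) frequently carry out DNS hijacking against their users in order to insert advertisements and to redirect web traffic for ad fraud. The problem was serious enough that six Chinese Internet giants issued a joint statement in 2015 urging ISPs to improve. According to this news report, approximately 1-2% of the traffic of Meituan (an online shopping site) was subject to DNS hijacking. Ad fraud by Chinese ISPs appears to have remained widespread in recent years.

Like their MMTLS encryption system, Tencent’s NewDNS domain lookup system is also an accommodation to China’s particular network environment. Years of experience have shown that DNS inherently has a variety of security and privacy problems. In this research, we found that WeChat’s MMTLS has more flaws than TLS. However, whether NewDNS also has more flaws than DNS remains an open question. We look forward to future research on this question.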

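Other applications that use WeChat's networking module “Mars STN”

We speculate that many applications besides WeChat also use “Mars”, the networking module developed by WeChat (STN being a part of Mars). This speculation is based on the following observations:

Users of Mars (other developers) have reported numerous issues in the open source Mars repository on GitHub.
Many technical articles outline how to build instant messaging systems using Mars.
A white-label instant messaging system product based on Mars already exists on the market.

Developers adopting Mars outside WeChat is concerning because Mars provides no transport encryption by default. As we noted in the “Three Parts of Mars” section of the main report, the MMTLS encryption used in WeChat comes from mars-wechat (one of the three parts of Mars), and mars-wechat is not open source, meaning that only Tencent can use it. In addition, the Mars developers have no plans to support TLS; in their view, other developers who use Mars should design and implement encryption themselves in the upper layers. Worse still, implementing TLS within the Mars framework appears to require substantial architectural changes. Although Tencent's decision to keep MMTLS closed source is understandable, MMTLS remains the primary encryption system that the Mars framework is built to work with. Keeping it closed and unavailable to outside developers means that other developers using Mars must either invest significant resources in integrating a different encryption system or leave all data transmission unencrypted.

Mars also lacks technical documentation. The official wiki contains only a few old articles on how to integrate Mars into an application. Because of this lack of documentation, many developers who use Mars raise questions on GitHub, and developers are also more prone to technical mistakes that reduce overall security.

Further research is needed in this area to analyze the security of other applications that use Tencent's Mars library.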

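The dynamic code loading module “Tinker”

In this section, we tentatively refer to the APK downloaded from the Google Play Store as the “WeChat APK” and the APK downloaded from WeChat's official website as the “Weixin APK”. The distinction between WeChat and Weixin seems blurry. The WeChat APK and Weixin APK contain slightly different code, as we discuss later in this section. However, when either APK is installed on an Android emulator with the system language set to English (the two cannot be installed at the same time, so each must be installed separately), the application name is shown as “WeChat”. Both also share the application ID “com.tencent.mm” (the name used by the Android system and the Google Play Store to identify an application). We were also able to log in to our U.S.-number account using either APK.

Unlike the WeChat APK, we found that the Weixin APK contains a library called “Tinker”, which according to Tinker's technical documentation is a “hot-fix solution”. Tinker lets developers install app updates without invoking the Android system APK installer, using a technique called “dynamic code loading”. In earlier research we found a similar distinction between TikTok and Douyin: Douyin has a similar dynamic code loading feature, whereas TikTok does not. Dynamic code loading raises three common concerns:

If the process for downloading and loading dynamic code fails to sufficiently authenticate the downloaded code (e.g., that it carries a valid digital signature that verifies under the correct public key, that it has not expired, and that it is the intended module rather than arbitrary code that merely has a valid, unexpired signature), an attacker might exploit the process to run malicious code on the device (e.g., by injecting arbitrary code, performing a downgrade attack, or injecting a newer but vulnerable version). As early as 2016, we found such cases in other Chinese applications.
Even if the code download and loading mechanism contains no weaknesses, dynamic code loading still allows an application to load code without notifying the user, bypassing the user's consent; users should have the right to decide which programs run on their devices. For example, a developer may push an unwanted update, and users have no option to stay on the old version. A developer may also selectively target particular users with an update that compromises their security or privacy. In 2016, a Chinese security analyst accused Alibaba of pushing dynamically loaded code to Alipay that surreptitiously took photos and recorded audio on his device.
Dynamically loaded code can be installed on a user's device without going through the app store, so the app store cannot review all of the application's behavior to ensure that it is safe. For this reason, the Google Play developer policy does not permit applications to use dynamically loaded code.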
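The authentication checks named in the first concern above (valid signature, not expired, correct module, no downgrade) can be sketched as follows. This is not Tinker's update logic, which we did not analyze. All names here (`UpdateManifest`, `sign`, `verify_update`) are hypothetical, and an HMAC from the Python standard library stands in for a public-key signature only so that the sketch runs without third-party packages.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass
class UpdateManifest:
    module_name: str   # which module this patch claims to be
    version: int       # patch version, expected to increase over time
    expires_at: int    # Unix timestamp after which the patch is stale
    payload: bytes     # the code that would be loaded
    tag: bytes         # authentication tag over all fields above


def _message(m: UpdateManifest) -> bytes:
    # Bind every field into the authenticated message so none can be swapped.
    fields = [m.module_name, m.version, m.expires_at,
              hashlib.sha256(m.payload).hexdigest()]
    return json.dumps(fields).encode()


def sign(key: bytes, m: UpdateManifest) -> bytes:
    # Stand-in for a public-key signature (assumption, see lead-in).
    return hmac.new(key, _message(m), hashlib.sha256).digest()


def verify_update(key: bytes, m: UpdateManifest, expected_module: str,
                  installed_version: int, now: int) -> bool:
    if not hmac.compare_digest(sign(key, m), m.tag):
        return False                      # rejects injected or tampered code
    if m.module_name != expected_module:
        return False                      # validly signed, but the wrong module
    if m.version <= installed_version:
        return False                      # rejects downgrade and replay
    if now >= m.expires_at:
        return False                      # rejects expired patches
    return True
```

A real deployment would verify an asymmetric signature so that the device holds no signing secret. Note that none of these checks address the second and third concerns, which are about consent and review rather than authentication.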

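When analyzing the WeChat APK, we found that it still retains some components of Tinker. The WeChat APK keeps the component that handles downloading app updates, but the core of Tinker, the component that dynamically loads and executes code, has been replaced with “no-op” functions that perform no actions. We did not analyze WeChat packages from other third-party app stores.

Further research is needed to analyze the security of Tinker's app update process, whether WeChat APKs from other sources contain dynamic code loading, and any other differences between the WeChat APK and the Weixin APK.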

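Recommendations

In this section, we make recommendations to different audiences based on our findings.

To application developers

Compared with using widely vetted standard cryptography, designing and implementing home-rolled cryptography is not only more costly and less performant but also less secure. Applications frequently transmit sensitive data, so we recommend that application developers use battle-tested cipher suites and protocols and avoid developing their own cryptographic protocols. Over nearly three decades, SSL/TLS has undergone repeated public and academic scrutiny and has incorporated a wide range of improvements. TLS is now easier to configure and deploy than ever before, and the newer QUIC-based TLS will also greatly improve performance.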

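To Tencent and WeChat developers

Below are the recommendations we provided to WeChat and Tencent in our security vulnerability disclosure letter:

In this 2016 article, WeChat developers note that they wished to upgrade their encryption, but that the extra round trip added by the TLS 1.2 handshake would significantly degrade WeChat's network performance, because WeChat communicates using a large number of short messages. At the time, TLS 1.3 was not yet an RFC (though TLS 1.2 already had session resumption extensions), so they opted to “roll their own”, incorporating TLS 1.3's session resumption model into MMTLS.

This problem of extra handshake round trips has long plagued all application developers. The TCP and TLS handshakes each require one round trip, meaning that each newly sent data packet requires two round trips. Today, TLS-over-QUIC combines the transport-layer and encryption-layer handshakes, requiring only a single handshake. QUIC was developed for this express purpose and provides strong, forward-secret encryption while halving the number of round trips needed for secure communication. We also note that WeChat already appears to use QUIC for some large file downloads. Our recommendation is for WeChat to migrate entirely to a standard TLS or QUIC+TLS implementation.

Beyond network performance, there is also the issue of client performance. Since WeChat's encryption scheme performs two layers of encryption per request, the client does twice the work to encrypt data compared with a standardized cryptosystem that encrypts only once.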

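To operating systems

On the web, client browsers display security warnings when HTTPS is not used, and search engines use HTTPS support as a ranking signal; both practices have driven the widespread adoption of TLS. We can loosely apply the same practices to mobile operating systems and app stores to drive TLS adoption in the mobile app ecosystem.

Is there any platform- or OS-level permission mechanism that could indicate that an application uses standard network encryption? In our earlier research on home-rolled cryptography in Chinese input methods, we suggested that OS designers consider adding a warning that is shown when an application bypasses the standard encrypted networking APIs and instead makes low-level network system calls directly.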

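To high-risk users with privacy concerns

Many Weixin / WeChat users use the app out of necessity rather than choice. For users with privacy concerns who use WeChat out of necessity, the recommendations from our previous report still hold:

Avoid features delineated as “Weixin” services where possible (try to use only the features offered by the international version, WeChat). We note that many core “Weixin” services outlined in the privacy policy (such as Search, Channels, and Mini Programs) perform more tracking than core “WeChat” services.
When possible, prefer web or dedicated applications over Mini Programs or other such embedded functionality.
Apply stricter device permission settings, and regularly update your software and operating system for the latest security features.

For users in China, we recommend registering for WeChat with a phone number that is not a Chinese (+86) number where possible. We understand, however, that essential features available to Chinese accounts may not be available to non-Chinese accounts. In that case, consider using two phones to keep applications that pose privacy and security risks separate from those that pose fewer.

In addition, because of the risks introduced by dynamic code loading in the WeChat version downloaded from the official website, we recommend that users download WeChat from the Google Play Store whenever possible. For users who have already installed WeChat from the official website, removing it and installing the Google Play Store version also mitigates this risk.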

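To security and privacy researchers

Since WeChat has over one billion users, we posit that the number of global MMTLS users is on the same order of magnitude as the number of global TLS users. Despite this, there has been very little public analysis or scrutiny of MMTLS, in contrast to TLS. At this scale of impact, MMTLS deserves a similar level of public scrutiny as TLS. We urge future security and privacy researchers to build on this work and continue studying the MMTLS protocol, as our correspondence indicates that Tencent insists on continuing to use and develop MMTLS in WeChat.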

Should We Chat, Too? FAQ

Mona Wang — Tue, 15 Oct 2024 18:59:25 +0000

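Read the full report: Should We Chat, Too? Security Analysis of WeChat's MMTLS Encryption Protocol

How does this research add to what we know about WeChat?

WeChat is an app with many features. Previously, we studied the privacy issues surrounding Mini Programs, as well as WeChat's surveillance and censorship of text and image messages. In this research, we focus on WeChat's network encryption protocol and its security.

When information security researchers like us analyze the security of an app, we perform network traffic analysis to examine what the app sends and how. This analysis can tell us what data the app collects and with whom it shares that data.

Performing such an analysis on WeChat was initially not straightforward. Most apps today use the industry-standard Transport Layer Security (TLS) protocol to encrypt the contents of their network traffic, which normally prevents eavesdroppers from reading the underlying data. When researchers wish to analyze the traffic sent by their own apps, common tools already exist for decrypting such content. However, these tools do not work on WeChat, because it uses a proprietary network encryption protocol different from TLS, called “MMTLS”. Prior to this research, little was known about MMTLS, and there were no existing tools to inspect content encrypted with it.

We reverse engineered the inner workings of WeChat's network encryption and found minor issues with its security. We found that the aforementioned MMTLS is only the outer layer of encryption used by WeChat. Within MMTLS, we found a second layer of encryption that operates entirely independently of MMTLS, called “Business-layer Encryption”. The two encryption systems are nested like a Russian doll: the plaintext is first encrypted with Business-layer Encryption, and the resulting Business-layer ciphertext is used as input to MMTLS encryption, producing MMTLS ciphertext that is finally sent over the network.

We found several issues with Business-layer Encryption, the most serious being a metadata leak that leaves the user account ID and some other information unencrypted at this layer. We briefly studied an earlier WeChat version and found that it contained only Business-layer Encryption. These findings suggest that Business-layer Encryption predates MMTLS and that MMTLS was likely intended to remedy its shortcomings. Because MMTLS wraps around Business-layer Encryption, an attacker seeking to exploit the weaknesses of Business-layer Encryption would typically first have to defeat the protection provided by the MMTLS layer.

So far, we have not found serious security issues with MMTLS. Therefore, despite the vulnerabilities in Business-layer Encryption, these issues cannot be exploited by an attacker and do not affect the overall security of the app's network encryption.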

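Is my communication on WeChat safe?

Everyone has a different threat model and therefore a different definition of “safe”. If you are concerned about the contents of your communication with other WeChat users being read by a network eavesdropper, our research shows that, although WeChat's encryption protocol offers weaker protection against network eavesdropping than industry-standard encryption protocols, it should not be vulnerable to attack techniques known today.

WeChat uses a custom encryption protocol instead of the industry-standard Transport Layer Security (TLS). Information security experts generally advise against custom encryption designs, because a well-tested encryption protocol usually takes many years of joint effort by many researchers. A single company is unlikely to invest the same level of effort. We also found minor issues in WeChat's encryption protocol that are not present in TLS.

In conclusion, although we did not find any significant flaws in WeChat's encryption protocol, we did find some minor issues. These issues do not compromise the confidentiality of user communications. However, the same issues are not present in industry-standard TLS.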

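Can the Chinese government read my WeChat messages?

On a regulatory level, because Tencent is headquartered in China, it must comply with local law and respond to the Chinese government's requests for user data.

On a technical level, WeChat's encryption protocol protects the communication between user devices and WeChat's servers. However, it is not an end-to-end encryption system and does not encrypt data end to end between two user devices. WeChat's servers can and do decrypt and read every transmitted message. Previously, we found that WeChat uses a keyword-based detection system to censor private messages sent and received by Chinese users. The app also uses attachments sent by non-Chinese users to train its censorship database for Chinese users.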

What user data does WeChat collect?

Our previous report answers this question.

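I have sensitive data on my phone. If I install WeChat, will it steal that data?

Modern phone operating systems (OS) typically restrict apps from accessing sensitive user data (such as contacts and photos), system resources (such as location services), and data stored in other apps (such as chat history). It is therefore difficult for an app to access sensitive data without the user's authorization. However, a malicious app could exploit vulnerabilities in OS protections to circumvent these access restrictions. A malicious app might also trick the user into granting permission to access data.

Testing whether WeChat exhibits these malicious behaviors is outside the scope of our research. During our research, however, we did not observe the malicious behaviors described above. (That is, our research did not specifically look for such behaviors, but we did not find any within the portion of WeChat's code that we examined. Whether such behaviors exist in code that our research did not explore remains unknown.)

To protect yourself from this kind of attack, we suggest:

Make sure your phone's OS version is still supported by the manufacturer
Keep your phone's OS up to date
Install apps from official sources (built-in app stores) rather than unofficial ones
Check an app's reputation before installing it
Use a dedicated device to handle highly sensitive data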

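Can WeChat still harm my phone's security and privacy after it is uninstalled?

As mentioned above, modern phone operating systems implement system protections to control apps' access to sensitive data. Technically, it is difficult for an app to maintain an implant on a system after being uninstalled. However, some malicious apps might attempt to do so by exploiting system vulnerabilities.

Testing whether WeChat exhibits these malicious behaviors is outside the scope of our research. However, we did not observe such malicious behavior in our research.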

What if I have other questions?

Read our full report and the FAQ in our previous WeChat privacy analysis report.

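Translation note: The Chinese-language version of this FAQ is an unofficial translation of the original English report and may contain inaccuracies. It is intended only to provide a basic understanding of our research. In the event of a discrepancy or ambiguity, the English version of the report prevails.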

Should We Chat, Too? FAQ

Mona Wang — Tue, 15 Oct 2024 18:59:13 +0000

Read the full report: Should We Chat, Too? Security Analysis of WeChat’s MMTLS Encryption Protocol

What does this research contribute to what we already know about WeChat?

WeChat is an app with many features. Previously, we studied the privacy issues surrounding its Mini Programs, as well as its surveillance and censorship of text and image messages. In this research, we focus on WeChat’s network encryption protocol and its security.

When information security researchers like us analyze the security of apps, one type of analysis we perform is network traffic analysis, in which we examine what apps send and how. This analysis can tell us what data the app collects and with whom the data is shared.

Performing such an analysis on WeChat was initially not straightforward. Most apps today use industry-standard Transport Layer Security (TLS) to encrypt the content of their network traffic, which normally prevents eavesdroppers from reading the underlying data. When researchers wish to analyze the traffic that their own apps are sending, common tools already exist to decrypt such content. However, such tools were inapplicable to WeChat because it uses a proprietary network encryption protocol different from TLS, called “MMTLS”. Prior to this research, little was known about MMTLS and there were no pre-existing tools to inspect content encrypted with MMTLS.

We reverse engineered the inner workings of WeChat’s network encryption and found minor issues with its security. We found that what we previously referred to as MMTLS was only the outer layer of encryption used by WeChat. Within MMTLS, we found a second layer of encryption that works entirely separately from MMTLS, called “Business-layer Encryption”. The two encryption systems are “wrapped” around each other like a Russian doll, i.e., the plaintext content would first be encrypted with the Business-layer Encryption, and the resulting Business-layer ciphertext would be used as input to MMTLS encryption, producing MMTLS ciphertext that would eventually be sent over the network.

We found several issues with the Business-layer Encryption, the most serious being a metadata leak, which leaves the user account ID and some other information unencrypted at this layer. We briefly studied an older WeChat version and found that it only contained the Business-layer Encryption. These findings suggest that Business-layer Encryption is older than MMTLS and that MMTLS was likely designed to remedy the shortcomings of Business-layer Encryption. Since MMTLS Encryption wraps around Business-layer Encryption, attempts to exploit the weaknesses of Business-layer Encryption would typically have to first defeat the protection provided by the MMTLS layer.

So far, we have not found serious security issues with MMTLS. Therefore, despite the vulnerabilities in Business-layer Encryption, these issues cannot be exploited by an attacker and do not affect the overall security of the app’s network encryption.

Is my communication on WeChat safe?

Everyone has a different threat model and thus a different definition of “safe.” If you are concerned about the contents of your communication with another WeChat user being visible to a network eavesdropper, our research has shown that, despite having weaker protection against network eavesdropping compared to industry-standard encryption protocols, WeChat’s encryption protocol is not vulnerable to any attack techniques known today.

WeChat uses a custom encryption protocol instead of the industry-standard Transport Layer Security (TLS). Using a custom design for encryption is generally not recommended by information security experts, because a well-tested encryption protocol usually takes a multi-year joint effort by many researchers. It is unlikely that a single company is able to invest the same level of effort. We have also found minor issues in WeChat’s encryption protocol that are not present in TLS.

In conclusion, although we did not find any significant weaknesses in WeChat’s encryption protocol, we did find minor issues with it. These issues do not compromise the confidentiality of user communications. However, the same issues are not present within industry-standard TLS.

Can the Chinese government read my WeChat text messages?

On a regulatory level, because Tencent is headquartered in China, it must comply with local law and respond to the Chinese government’s user data requests.

On a technical level, WeChat’s encryption protocol protects the communication between user devices and WeChat’s servers. It is not an end-to-end encryption system, which would encrypt data sent between two user devices. WeChat’s servers can and do decrypt and read each transmitted message. In the past, we found that WeChat uses a keyword-based detection system to censor private messages to and from Chinese users. The app also uses files sent by non-Chinese users to train their database of censored files for Chinese users.

What kind of user data does WeChat collect?

Please refer to our previous report which answers this question.

I have a phone with sensitive data. If I install WeChat, would it be able to steal that data?

Modern phone operating systems (OS) typically restrict apps from accessing sensitive user data (such as contacts and photos) and system resources (such as a geo-location service), as well as data stored in other apps (such as chat history in a chat app). Therefore, it is technically difficult for apps to access sensitive data without the user granting them permission. However, malicious apps could exploit vulnerabilities in OS protections and circumvent these access restrictions. Malicious apps might also trick the user into granting permission to access data.

It is outside our research scope to test whether WeChat exhibits these malicious behaviors. During our research we have not observed the malicious behaviors mentioned above.

To protect yourself from this kind of attack, we suggest:

Make sure your phone’s OS version is currently supported by the vendor
Keep your phone’s OS up-to-date
Install apps from official sources (built-in app stores) instead of unofficial ones
Check the app’s reputation before installing
For highly sensitive information, use a separate dedicated device to handle it

Can WeChat harm my phone security and privacy even after uninstallation?

As mentioned above, modern phone operating systems implement system protection to control apps’ access to sensitive data. It is technically difficult for apps to maintain implants on a system after the app’s uninstallation. However, some malicious apps might try to do so by exploiting system vulnerabilities.

It is outside our research scope to test whether WeChat exhibits these malicious behaviors. However, throughout our research, we have not observed such malicious behavior.

What if I have other questions?

Read our full report and check out the FAQ from our previous WeChat report analyzing the app’s privacy.

Should We Chat, Too? Security Analysis of WeChat’s MMTLS Encryption Protocol

Mona Wang — Tue, 15 Oct 2024 18:59:12 +0000

This report’s key findings are translated into Chinese. The report has an accompanying FAQ, which is translated into Simplified and Traditional Chinese.

Key contributions

We performed the first public analysis of the security and privacy properties of MMTLS, the main network protocol used by WeChat, an app with over one billion monthly active users.
We found that MMTLS is a modified version of TLS 1.3, with many of the modifications that WeChat developers made to the cryptography introducing weaknesses.
Further analysis revealed that earlier versions of WeChat used a less secure, custom-designed protocol that contains multiple vulnerabilities, which we describe as “Business-layer encryption”. This layer of encryption is still being used in addition to MMTLS in modern WeChat versions.
Although we were unable to develop an attack to completely defeat WeChat’s encryption, the implementation falls short of the level of cryptography you would expect in an app used by a billion users: for example, it uses deterministic IVs and lacks forward secrecy.
These findings contribute to a larger body of work that suggests that apps in the Chinese ecosystem fail to adopt cryptographic best practices, opting instead to invent their own, often problematic systems.
We are releasing technical tools and further documentation of our technical methodologies in an accompanying GitHub repository. These tools and documents, along with this main report, will help future researchers study WeChat’s inner workings.

Introduction

WeChat, with over 1.2 billion monthly active users, stands as the most popular messaging and social media platform in China and third globally. As indicated by market research, WeChat’s network traffic accounted for 34% of Chinese mobile traffic in 2018. WeChat’s dominance has monopolized messaging in China, making it increasingly unavoidable for those in China to use. With an ever-expanding array of features, WeChat has also grown beyond its original purpose as a messaging app.

Despite the universality and importance of WeChat, there has been little study of the proprietary network encryption protocol, MMTLS, used by the WeChat application. This knowledge gap is a barrier for researchers because it hampers further security and privacy study of such a critical application. In addition, home-rolled cryptography is unfortunately common in many incredibly popular Chinese applications, and there have historically been issues with cryptosystems developed independently of well-tested standards such as TLS.

This work is a deep dive into the mechanisms behind MMTLS and the core workings of the WeChat program. We compare the security and performance of MMTLS to TLS 1.3 and discuss our overall findings. We also provide public documentation and tooling to decrypt WeChat network traffic. These tools and documents, along with our report, will assist future researchers to study WeChat’s privacy and security properties, as well as its other inner workings.

This report consists of a technical description of how WeChat launches a network request and its encryption protocols, followed by a summary of weaknesses in WeChat’s protocol, and finally a high-level discussion of WeChat’s design choices and their impact. The report is intended for privacy, security, or other technical researchers interested in furthering the privacy and security study of WeChat. For non-technical audiences, we have summarized our findings in this FAQ.

Prior work on MMTLS and WeChat transport security

Code internal to the WeChat mobile app refers to its proprietary TLS stack as MMTLS (MM is short for MicroMessenger, which is a direct translation of 微信, the Chinese name for WeChat) and uses it to encrypt the bulk of its traffic.

There is limited public documentation of the MMTLS protocol. This technical document from WeChat developers describes how MMTLS is similar to and different from TLS 1.3, and attempts to justify various decisions they made to either simplify or change how the protocol is used. The document identifies several key differences between MMTLS and TLS 1.3, which help us understand the various modes of usage of MMTLS.

Wan et al. conducted the most comprehensive study of WeChat transport security in 2015 using standard security analysis techniques. However, this analysis was performed before the deployment of MMTLS, WeChat’s upgraded security protocol. In 2019, Chen et al. studied the login process of WeChat and specifically studied packets that are encrypted with TLS and not MMTLS.

As for MMTLS itself, in 2016 WeChat developers published a document that describes the design of the protocol at a high level and compares it with TLS 1.3. Other MMTLS publications focus on website fingerprinting-type attacks, but none specifically perform a security evaluation. A few GitHub repositories and blog posts look briefly into the wire format of MMTLS, though none are comprehensive. Though there has been little work studying MMTLS specifically, previous Citizen Lab reports have discovered security flaws in other cryptographic protocols designed and implemented by Tencent.

Methodology

We analyzed two versions of the WeChat Android app:

Version 8.0.23 (APK “versionCode” 2160) released on May 26, 2022, downloaded from the WeChat website.
Version 8.0.21 (APK “versionCode” 2103) released on April 7, 2022, downloaded from Google Play Store.

All findings in this report apply to both of these versions.

We used an account registered to a U.S. phone number for the analysis, which changes the behavior of the application compared to a mainland Chinese number. Our setup may not be representative of all WeChat users, and the full limitations are discussed further below.

For dynamic analysis, we analyzed the application installed on a rooted Google Pixel 4 phone and an emulated Android OS. We used Frida to hook the app’s functions and to manipulate and export application memory. We also performed network analysis of WeChat’s network traffic using Wireshark. However, due to WeChat’s use of nonstandard cryptographic libraries like MMTLS, standard network traffic analysis tools that might work with HTTPS/TLS do not work for all of WeChat’s network activity. Our use of Frida was therefore essential for capturing the data and information flows we detail in this report. Our Frida scripts are designed to intercept WeChat’s request data immediately before WeChat sends it to its MMTLS encryption module. The Frida scripts we used are published in our GitHub repository.

For static analysis, we used Jadx, a popular Android decompiler, to decompile WeChat’s Android Dex files into Java code. We also used Ghidra and IDA Pro to decompile the native libraries (written in C++) bundled with WeChat.

Notation

In this report, we reference a lot of code from the WeChat app. When we reference any code (including file names and paths), we will style the text using monospace fonts to indicate it is code. If a function is referenced, we will add empty parentheses after the function name, like this: somefunction(). The names of variables and functions that we show may come from one of the three following:

The original decompiled name.
In cases where the name cannot be decompiled into a meaningful string (e.g., the symbol name was not compiled into the code), we rename it according to how the nearby internal log messages reference it.
In cases where there is not enough information for us to tell the original name, we name it according to our understanding of the code. In such cases, we will note that these names are given by us.

In the cases where the decompiled name and log message name of functions are available, they are generally consistent. Bolded or italicized terms can refer to higher-level concepts or parameters we have named.

Utilization of open source components

We also identified open source components being used by the project, the two largest being OpenSSL and Tencent Mars. Based on our analysis of decompiled WeChat code, large parts of its code are identical to Mars. Mars is an “infrastructure component” for mobile applications, providing common features and abstractions that are needed by mobile applications, such as networking and logging.

By compiling these libraries separately with debug symbols, we were able to import function and class definitions into Ghidra for further analysis. This helped our understanding of other non-open-source code in WeChat tremendously. For instance, when we were analyzing the network functions decompiled from WeChat, we found many of them to be highly similar to the open source Mars, so we could simply read the source code and comments to understand what a function was doing. The encryption-related functions are not included in open source Mars, so we still needed to read decompiled code, but even in these cases we were aided by various functions and structures that we already knew from the open source Mars.

Matching decompiled code to its source

In the internal logging messages of WeChat, which contain source file paths, we noticed three top level directories, which we have highlighted below:

/home/android/devopsAgent/workspace/p-e118ef4209d745e1b9ea0b1daa0137ab/src/mars/
/home/android/devopsAgent/workspace/p-e118ef4209d745e1b9ea0b1daa0137ab/src/mars-wechat/
/home/android/devopsAgent/workspace/p-e118ef4209d745e1b9ea0b1daa0137ab/src/mars-private/

The source files under “mars” can all be found in the open source Mars repository as well, while source files in the other two top level directories cannot be found in the open source repository. To illustrate, below is a small section of decompiled code from libwechatnetwork.so:

    XLogger::XLogger((XLogger *)&local_2c8,5,"mars::stn",
                "/home/android/devopsAgent/workspace/p-e118ef4209d745e1b9ea0b1daa0137ab/src/mars/mars/stn/src/longlink.cc"
                ,"Send",0xb2,false,(FuncDef0 *)0x0);
    XLogger::Assert((XLogger *)&local_2c8,"tracker_.get()");
    XLogger::~XLogger((XLogger *)&local_2c8);

Given the similarity, it is highly likely that this section of code was compiled from the following line in the Send() function, defined in the longlink.cc file in the open source repository:

xassert2(tracker_.get());

Reusing this observation, whenever our decompiler is unable to determine the name of a function, we can use logging messages within the compiled code to determine its name. Moreover, if the source file is from open source Mars, we can read its source code as well.
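As a rough illustration of this name-recovery step, the sketch below pulls the log tag, source path, function name, and line number out of a decompiled XLogger constructor call like the one shown above. The regular expression and the helper name `parse_xlogger_calls` are our own assumptions about how such a call appears in decompiler output. They are not part of Ghidra, Mars, or the tooling we released.

```python
import re

# Argument layout assumed from the decompiled snippet above:
# (this, level, tag, source path, function name, line number, ...)
XLOGGER_CALL = re.compile(
    r'XLogger::XLogger\([^;]*?'        # constructor call, up to the first string literal
    r'"(?P<tag>[^"]*)"\s*,\s*'         # log tag, e.g. mars::stn
    r'"(?P<path>[^"]*)"\s*,\s*'        # source file path
    r'"(?P<func>[^"]*)"\s*,\s*'        # function name
    r'(?P<line>0x[0-9a-fA-F]+|\d+)'    # source line number (hex or decimal)
)

SAMPLE = r'''
    XLogger::XLogger((XLogger *)&local_2c8,5,"mars::stn",
                "/home/android/devopsAgent/workspace/p-e118ef4209d745e1b9ea0b1daa0137ab/src/mars/mars/stn/src/longlink.cc"
                ,"Send",0xb2,false,(FuncDef0 *)0x0);
    XLogger::Assert((XLogger *)&local_2c8,"tracker_.get()");
    XLogger::~XLogger((XLogger *)&local_2c8);
'''


def parse_xlogger_calls(decompiled: str) -> list:
    """Return one dict per XLogger constructor call found in the text."""
    results = []
    for match in XLOGGER_CALL.finditer(decompiled):
        info = match.groupdict()
        info["line"] = int(info["line"], 0)  # base 0 accepts both 0xb2 and 178
        results.append(info)
    return results


if __name__ == "__main__":
    for call in parse_xlogger_calls(SAMPLE):
        print(f'{call["func"]}() at {call["path"]}:{call["line"]}')
```

For the sample, the recovered function name is Send and the hex constant 0xb2 decodes to source line 178 of longlink.cc.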

Three parts of Mars

In a few articles on the Mars wiki, Tencent developers provided the following motivations to develop Mars:

The need for a cross-platform networking library, to reduce the development and maintenance costs of two separate network libraries on Android and iOS.
The need to customize parameters of the TCP handshake process in order to establish connections faster.

According to its developers, Mars and its STN module are comparable to networking libraries such as AFNetworking and OkHttp, which are widely used in other mobile apps.

One of the technical articles released by the WeChat development team describes the process of open-sourcing Mars. According to the article, they had to separate WeChat-specific code, which was kept private, from the general use code, which was open sourced. In the end, three parts were separated from each other:

mars-open: to be open sourced, independent repository.
mars-private: potentially open sourced, depends on mars-open.
mars-wechat: WeChat business logic code, depends on mars-open and mars-private.

These three names match the top level directories we found earlier if we take “mars-open” to be in the “mars” top-level directory. Using this knowledge, when reading decompiled WeChat code, we could easily know whether it was WeChat-specific or not. From our reading of the code, mars-open contains basic and generic structures and functions, for instance, buffer structures, config stores, thread management and, most importantly, the module named “STN” responsible for network transmission. (We were unable to determine what STN stands for.) On the other hand, mars-wechat contains the MMTLS implementation, and mars-private is not closely related to the features within our research scope.
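The mapping from source paths to the three parts can be sketched as a small lookup. The function name `classify_source_path` and the assumption that the part is always the first directory after the first “/src/” are ours, and the mars-wechat file name in the usage lines is hypothetical.

```python
from typing import Optional

# Top-level directory under .../src/ mapped to the part named in the article.
MARS_PARTS = {
    "mars": "mars-open",
    "mars-private": "mars-private",
    "mars-wechat": "mars-wechat",
}


def classify_source_path(path: str) -> Optional[str]:
    """Return which part of Mars a logged source path belongs to, if any."""
    marker = "/src/"
    index = path.find(marker)
    if index == -1:
        return None
    top_level = path[index + len(marker):].split("/", 1)[0]
    return MARS_PARTS.get(top_level)


if __name__ == "__main__":
    root = "/home/android/devopsAgent/workspace/p-e118ef4209d745e1b9ea0b1daa0137ab/src"
    print(classify_source_path(root + "/mars/mars/stn/src/longlink.cc"))
    print(classify_source_path(root + "/mars-wechat/example.cc"))  # hypothetical file
```

Paths that classify as mars-open can be read directly in the open source repository, while the other two parts have to be understood from decompiled code.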

As a technical side note, the open source Mars compiles to just one object file named “libmarsstn.so”. However, in WeChat, multiple shared object files reference code within the open source Mars, including the following:

libwechatxlog.so
libwechatbase.so
libwechataccessory.so
libwechathttp.so
libandromeda.so
libwechatmm.so
libwechatnetwork.so

Our research focuses on the transport protocol and encryption of WeChat, which is implemented mainly in libwechatmm.so and libwechatnetwork.so. In addition, we inspected libMMProtocalJni.so, which is not part of Mars but contains functions for cryptographic calculations. We did not inspect the other shared object files.

Matching Mars versions

Despite being able to find open source code to parts of WeChat, in the beginning of our research, we were unable to pinpoint the specific version of the source code of mars-open that was used to build WeChat. Later, we found version strings contained in libwechatnetwork.so. For WeChat 8.0.21, searching for the string “MARS_” yielded the following:

MARS_BRANCH: HEAD
MARS_COMMITID: d92f1a94604402cf03939dc1e5d3af475692b551
MARS_PRIVATE_BRANCH: HEAD
MARS_PRIVATE_COMMITID: 193e2fb710d2bb42448358c98471cd773bbd0b16
MARS_URL:
MARS_PATH: HEAD
MARS_REVISION: d92f1a9
MARS_BUILD_TIME: 2022-03-28 21:52:49
MARS_BUILD_JOB: rb/2022-MAR-p-e118ef4209d745e1b9ea0b1daa0137ab-22.3_1040

The specific MARS_COMMITID (d92f1a…) exists in the open source Mars repository. This version of the source code also matches the decompiled code.
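A search like this can be scripted. The sketch below scans raw bytes for “MARS_” fields and returns them as a dictionary. It assumes each field is stored as printable ASCII in the form `NAME: value`, terminated by a newline or other non-printable byte, which matches the listing above but is otherwise our assumption about the binary layout. The function name is hypothetical.

```python
import re
from typing import Dict

# A field name such as MARS_COMMITID, a colon, an optional space, then
# printable ASCII up to the first non-printable byte (newline, NUL, ...).
MARS_FIELD = re.compile(rb"(MARS_[A-Z_]+): ?([\x20-\x7e]*)")


def extract_mars_build_info(blob: bytes) -> Dict[str, str]:
    """Collect MARS_* build fields embedded in a binary blob."""
    return {
        name.decode("ascii"): value.decode("ascii").strip()
        for name, value in MARS_FIELD.findall(blob)
    }


if __name__ == "__main__":
    # Stand-in for the contents of libwechatnetwork.so, built from the listing above.
    blob = (
        b"\x00\x01binary\x00"
        b"MARS_BRANCH: HEAD\n"
        b"MARS_COMMITID: d92f1a94604402cf03939dc1e5d3af475692b551\n"
        b"MARS_URL:\n"
        b"MARS_REVISION: d92f1a9\n"
        b"\x00more binary"
    )
    info = extract_mars_build_info(blob)
    print(info["MARS_COMMITID"])
```

The extracted commit ID can then be checked out in the open source Mars repository to obtain matching source code and data structure definitions.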

Pinpointing the specific source code version helped us tremendously with Ghidra’s decompilation. Since many of the core data structures used in WeChat are from Mars, by importing the known data structures, we can observe the non-open-sourced code accessing structure fields and infer its purpose.

Limitations

This investigation only looks at client behavior and is therefore subject to other common limitations in privacy research that can only perform client analysis. Much of the data that the client transmits to WeChat servers may be required for functionality of the application. For instance, WeChat servers can certainly see chat messages since WeChat can censor them according to their content. We cannot always measure what Tencent is doing with the data that they collect, but we can make inferences about what is possible. Previous work has made certain limited inferences about data sharing, such as that messages sent by non-mainland-Chinese users are used to train censorship algorithms for mainland Chinese users. In this report, we focus on the version of WeChat for non-mainland-Chinese users.

Our investigation was also limited due to legal and ethical constraints. It has become increasingly difficult to obtain Chinese phone numbers for investigation due to the strict phone number and associated government ID requirements. Therefore, we did not test with accounts registered to Chinese phone numbers, for which WeChat behaves differently. In addition, without a mainland Chinese account, the types of interaction with certain features and Mini Programs were limited. For instance, we did not perform financial transactions on the application.

Our primary analysis was limited to analyzing only two versions of WeChat Android (8.0.21 and 8.0.23). However, we also re-confirmed our tooling works on WeChat 8.0.49 for Android (released April 2024) and that the MMTLS network format matches that used by WeChat 8.0.49 for iOS. Testing different versions of WeChat, the backwards-compatibility of the servers with older versions of the application, and testing on a variety of Android operating systems with variations in API version, are great avenues for future work.

Within the WeChat Android app, we focused on its networking components. Usually, within a mobile application (and in most other programs as well), all other components will defer the work of communicating over the network to the networking components. Our research is not a complete security and privacy audit of the WeChat app, as even if the network communication is properly protected, other parts of the app still need to be secure and private. For instance, an app would not be secure if the server accepts any password to an account login, even if the password is confidentially transmitted.

Tooling for studying WeChat and MMTLS

In the GitHub repository, we have released tooling that can log keys using Frida and decrypt network traffic that is captured during the same period of time, as well as samples of decrypted payloads. We have also provided additional documentation and our reverse-engineering notes from studying the protocol. We hope that these tools and documentation will further aid researchers in the study of WeChat.

Launching a WeChat network request

Like any other app, WeChat is composed of various components. Components within WeChat can invoke the networking components to send or receive network transmissions. In this section, we provide a highly simplified description of the process and components surrounding sending a network request in WeChat. The actual process is much more complex, which we explain in more detail in a separate document. The specifics of data encryption are discussed in the next section, “WeChat network request encryption”.

In the WeChat source code, each API is referred to as a different “Scene”. For instance, during the registration process, there is one API that submits all new account information provided by the user, called NetSceneReg. We refer to NetSceneReg as a “Scene class”. Other components can start a network request towards an API by calling the particular Scene class. In the case of NetSceneReg, it is usually invoked by the click event of a button UI component.

Upon invocation, the Scene class prepares the request data. The structure of the request data (as well as the response) is defined in “RR classes”. (We dub them RR classes because they tend to have “ReqResp” in their names.) Usually, one Scene class corresponds to one RR class. In the case of NetSceneReg, it corresponds to the RR class MMReqRespReg2, which contains fields like the desired username and phone number. For each API, its RR class also defines a unique internal URI (usually starting with “/cgi-bin”) and a “request type” number (an approximately 2–4 digit integer). The internal URI and request type number are often used throughout the code to identify different APIs. Once the data is prepared by the Scene class, it is sent to MMNativeNetTaskAdapter.

MMNativeNetTaskAdapter is a task queue manager that manages and monitors the progress of each network connection and API request. When a Scene class calls MMNativeNetTaskAdapter, it places the new request (a task) onto the task queue and calls the req2Buf() function. req2Buf() serializes the request Protobuf object that was prepared by the Scene class into bytes, then encrypts the bytes using Business-layer Encryption.

Finally, the resultant ciphertext from Business-layer encryption is sent to the “STN” module, which is part of Mars. STN then encrypts the data again using MMTLS Encryption. Then, STN establishes the network transport connection, and sends the MMTLS Encryption ciphertext over it. In STN, there are two types of transport connections: Shortlink and Longlink. Shortlink refers to an HTTP connection that carries MMTLS ciphertext. Shortlink connections are closed after one request-response cycle. Longlink refers to a long-lived TCP connection. A Longlink connection can carry multiple MMTLS encrypted requests and responses without being closed.

WeChat network request encryption

WeChat network requests are encrypted twice, with different sets of keys. Serialized request data is first encrypted using what we call Business-layer Encryption, a name we chose because this blog post refers to the internal encryption as occurring at the “Business-layer”. Business-layer Encryption has two modes: Symmetric Mode and Asymmetric Mode. The resultant Business-layer-encrypted ciphertext is appended to metadata about the Business-layer request. Then, the Business-layer requests (i.e., request metadata and inner ciphertext) are additionally encrypted, using MMTLS Encryption. The final resulting ciphertext is then serialized as an MMTLS Request and sent over the wire.

WeChat’s network encryption system is disjointed and seems to still be a combination of at least three different cryptosystems. The encryption process described in the Tencent documentation mostly matches our findings about MMTLS Encryption, but the document does not seem to describe in detail the Business-layer Encryption, whose operation differs when logged-in and when logged-out. Logged-in clients use Symmetric Mode while logged-out clients use Asymmetric Mode. We also observed WeChat utilizing HTTP, HTTPS, and QUIC to transmit large, static resources such as translation strings or transmitted files. The endpoint hosts for these communications are different from MMTLS server hosts. Their domain names also suggest that they belong to CDNs. However, the endpoints that are interesting to us are those that download dynamically generated, often confidential resources (i.e., generated by the server on every request) or endpoints where users transmit, often confidential, data to WeChat’s servers. These types of transmissions are made using MMTLS.

As a final implementation note, WeChat, across all these cryptosystems, uses internal OpenSSL bindings that are compiled into the program. In particular, the libwechatmm.so library seems to have been compiled with OpenSSL version 1.1.1l, though the other libraries that use OpenSSL bindings, namely libMMProtocalJni.so and libwechatnetwork.so were not compiled with the OpenSSL version strings. We note that OpenSSL internal APIs can be confusing and are often misused by well-intentioned developers. Our full notes about each of the OpenSSL APIs that are used can be found in the Github repository.

In Table 1, we have summarized each of the relevant cryptosystems, how their keys are derived, how encryption and authentication are achieved, and which libraries contain the relevant encryption and authentication functions. We will discuss each cryptosystem’s details in the coming sections.

Cryptosystem	Key derivation	Encryption	Authentication	Library	Functions that perform the symmetric encryption
MMTLS, Longlink	Diffie-Hellman (DH)	AES-GCM	AES-GCM tag	`libwechatnetwork.so`	`Crypt()`
MMTLS, Shortlink	DH with session resumption	AES-GCM	AES-GCM tag	`libwechatnetwork.so`	`Crypt()`
Business-layer, Asymmetric Mode	Static DH with fresh client keys	AES-GCM	AES-GCM tag	`libwechatmm.so`	`HybridEcdhEncrypt()`, `AesGcmEncryptWithCompress()`
Business-layer, Symmetric Mode	Fixed key from server	AES-CBC	Checksum + MD5	`libMMProtocalJNI.so`	`pack()`, `EncryptPack()`, `genSignature()`

Table 1: Overview of different cryptosystems for WeChat network request encryption, how keys are derived, how encryption and authentication are performed, and which libraries perform them.

1. MMTLS Wire Format

Since MMTLS can go over various transports, we refer to an MMTLS packet as a unit of correspondence within MMTLS. Over Longlink, MMTLS packets can be split across multiple TCP packets. Over Shortlink, MMTLS packets are generally contained within an HTTP POST request or response body.¹

Each MMTLS packet contains one or more MMTLS records (which are similar in structure and purpose to TLS records). Records are units of messages that carry handshake data, application data, or alert/error message data within each MMTLS packet.

1A. MMTLS Records

Records can be identified by different record headers, a fixed 3-byte sequence preceding the record contents. In particular, we observed 4 different record types, with the corresponding record headers:

Handshake-Resumption Record	`19 f1 04`
Handshake Record	`16 f1 04`
Data Record	`17 f1 04`
Alert Record	`15 f1 04`

Handshake records contain metadata and the key establishment material needed for the other party to derive the same shared session key using Diffie-Hellman. Handshake-Resumption records contain sufficient metadata for “resuming” a previously established session by re-using previously established key material. Data records can contain encrypted ciphertext that carries meaningful WeChat request data. Some Data records simply contain an encrypted no-op heartbeat. Alert records signify errors or signify that one party intends to end a connection. In MMTLS, all non-handshake records are encrypted, but the key material used differs based on which stage of the handshake has been completed.

Here is an annotated MMTLS packet from the server containing a Handshake record:

Here is an example of a Data record sent from the client to the server:

To give an example of how these records interact, generally the client and server will exchange Handshake records until the Diffie-Hellman handshake is complete and they have established shared key material. Afterwards, they will exchange Data records, encrypted using the shared key material. When either side wants to close the connection, they will send an Alert record. More illustrations of each record type’s usage will be made in the following section.

1B. MMTLS Extensions

As MMTLS’ wire protocol is heavily modeled after TLS, we note that it has also borrowed the wire format of “TLS Extensions” to exchange relevant encryption data during the handshake. Specifically, MMTLS uses the same format as TLS Extensions for the Client to communicate their key share (i.e. the client’s public key) for Diffie-Hellman, similar to TLS 1.3’s key_share extension, and to communicate session data for session resumption (similar to TLS 1.3’s pre_shared_key extension). In addition, MMTLS has support for Encrypted Extensions, similar to TLS, but they are currently not used in MMTLS (i.e., the Encrypted Extensions section is always empty).

2. MMTLS Encryption

This section describes the outer layer of encryption, that is, what keys and encryption functions are used to encrypt and decrypt the ciphertexts found in the “MMTLS Wire Format” section, and how the encryption keys are derived.

The encryption and decryption at this layer occur in the STN module, in a separate spawned “com.tencent.mm:push”² process on Android. The spawned process ultimately transmits and receives data over the network. The code for all of the MMTLS Encryption and MMTLS serialization was analyzed from the library libwechatnetwork.so. In particular, we studied the Crypt() function, a central function used for all encryption and decryption whose name we derived from debug logging code. We also hooked all calls to HKDF_Extract() and HKDF_Expand(), the OpenSSL functions for HKDF, in order to understand how keys are derived.

When the “:push” process is spawned, it starts an event loop in HandshakeLoop(), which processes all outgoing and incoming MMTLS Records. We hooked all functions called by this event loop to understand how each MMTLS Record is processed. The code for this study, as well as the internal function addresses identified for the particular version of WeChat we studied, can be found in the Github repository.

Figure 1: Network requests: MMTLS encryption connection over longlink and over shortlink. Each box is an MMTLS Record, and each arrow represents an “MMTLS packet” sent over either Longlink (i.e., a single TCP packet) or shortlink (i.e., in the body of HTTP POST). Once both sides have received the DH keyshare, all further records are encrypted.

2A. Handshake and key establishment

In order for Business-layer Encryption to start sending messages and establish keys, it has to use the MMTLS Encryption tunnel. Since the key material for the MMTLS Encryption has to be established first, the handshakes in this section happen before any data can be sent or encrypted via Business-layer Encryption. The end goal of the MMTLS Encryption handshake discussed in this section is to establish a common secret value that is known only to the client and server.

On a fresh startup of WeChat, it tries to complete one MMTLS handshake over Shortlink, and one MMTLS handshake over Longlink, resulting in two MMTLS encryption tunnels, each using different sets of encryption keys. For Longlink, after the handshake completes, the same Longlink (TCP) connection is kept open to transport future encrypted data. For Shortlink, the MMTLS handshake is completed in the first HTTP request-response cycle, then the first HTTP connection closes. The established keys are stored by the client and server, and when data needs to be sent over Shortlink, those established keys are used for encryption, then sent over a newly established Shortlink connection. In the remainder of this section, we describe details of the handshakes.

ClientHello

First, the client generates keypairs on the SECP256R1 elliptic curve. Note that these elliptic curve keys are entirely separate pairs from those generated in the Business-layer Encryption section. The client also reads some Resumption Ticket data from a file stored on local storage named psk.key, if it exists. The psk.key file is written to after the first ServerHello is received, so, on a fresh install of WeChat, the resumption ticket is omitted from the ClientHello.

The client first simultaneously sends a ClientHello message (contained in a Handshake record) over both the Shortlink and Longlink. The first of these two handshakes that completes successfully is the one that the initial Business-layer Encryption handshake occurs over (details of Business-layer Encryption are discussed in Section 4). Both Shortlink and Longlink connections are used afterwards for sending other data.

In both the initial Shortlink and Longlink handshake, each ClientHello packet contains the following data items:

ClientRandom (32 bytes of randomness)
Resumption Ticket data read from psk.key, if available
Client public key

An abbreviated version of the MMTLS ClientHello is shown below.

16 f1 04 (Handshake Record header) . . .
01 04 f1 (ClientHello) . . .
08 cd 1a 18 f9 1c . . . (ClientRandom) . . .
00 0c c2 78 00 e3 . . . (Resumption Ticket from psk.key) . . .
04 0f 1a 52 7b 55 . . . (Client public key) . . .

Note that the client generates a separate keypair for the Shortlink ClientHello and the Longlink ClientHello. The Resumption Ticket sent by the client is the same on both ClientHello packets because it is always read from the same psk.key file. On a fresh install of WeChat, the Resumption Ticket is omitted since there is no psk.key file.

ServerHello

The client receives a ServerHello packet in response to each ClientHello packet. Each contains:

A record containing ServerRandom and Server public key
Records containing encrypted server certificate, new resumption ticket, and a ServerFinished message.

An abbreviated version of the MMTLS ServerHello is shown below; a full packet sample with labels can be found in the annotated network capture.

16 f1 04 (Handshake Record header) . . .
02 04 f1 (ServerHello) . . .
2b a6 88 7e 61 5e 27 eb . . . (ServerRandom) . . .
04 fa e3 dc 03 4a 21 d9 . . . (Server public key) . . .
16 f1 04 (Handshake Record header) . . .
b8 79 a1 60 be 6c . . . (ENCRYPTED server certificate) . . .
16 f1 04 (Handshake Record header) . . .
1a 6d c9 dd 6e f1 . . . (ENCRYPTED NEW resumption ticket) . . .
16 f1 04 (Handshake Record header) . . .
b8 79 a1 60 be 6c . . . (ENCRYPTED ServerFinished) . . .

On receiving the server public key, the client generates

secret = ecdh(client_private_key, server_public_key).

Note that since each MMTLS encrypted tunnel uses a different pair of client keys, the shared secret, and any derived keys and IVs will be different between MMTLS tunnels. This also means Longlink handshake and Shortlink handshake each compute a different shared secret.

Then, the shared secret is used to derive several sets of cryptographic parameters via HKDF, a mathematically secure way to transform a short secret value into a long secret value. In this section, we will focus on the handshake parameters. Alongside each set of keys, initialization vectors (IVs) are also generated. The IV is a value that is needed to initialize the AES-GCM encryption algorithm. IVs do not need to be kept secret. However, they need to be random and not reused.

The handshake parameters are generated using HKDF (“handshake key expansion” is a constant string in the program, as are the other double-quoted monospace strings in this section):

key_enc, key_dec, iv_enc, iv_dec = HKDF(secret, 56, “handshake key expansion”)

Using key_dec and iv_dec, the client can decrypt the remainder of the ServerHello records. Once decrypted, the client validates the server certificate. Then, the client also saves the new Resumption Ticket to the file psk.key.
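The derivation above can be sketched with the Python standard library. This is a minimal illustration, not WeChat's implementation: it assumes HKDF is instantiated with SHA-256 and an empty salt, and that the 56 output bytes split into two 16-byte keys followed by two 12-byte IVs in the order listed. The function names hkdf_sha256 and split_handshake_params are hypothetical, and the secret is a placeholder.

```python
import hashlib
import hmac


def hkdf_sha256(secret: bytes, length: int, info: bytes, salt: bytes = b"") -> bytes:
    """RFC 5869 HKDF (extract-then-expand) instantiated with SHA-256."""
    hash_len = hashlib.sha256().digest_size
    # Extract: an empty salt is replaced by hash_len zero bytes, per RFC 5869.
    prk = hmac.new(salt or b"\x00" * hash_len, secret, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output has been produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def split_handshake_params(secret: bytes):
    """Split 56 bytes of HKDF output into two keys and two IVs.

    The 16/16/12/12 layout and the output order are assumptions.
    """
    okm = hkdf_sha256(secret, 56, b"handshake key expansion")
    key_enc, key_dec = okm[0:16], okm[16:32]
    iv_enc, iv_dec = okm[32:44], okm[44:56]
    return key_enc, key_dec, iv_enc, iv_dec


# Placeholder secret for illustration only; a real one comes from ECDH.
key_enc, key_dec, iv_enc, iv_dec = split_handshake_params(b"\x01" * 32)
print(len(key_enc), len(key_dec), len(iv_enc), len(iv_dec))
```

The 56-byte output length is at least consistent with two AES-128 keys and two 96-bit GCM IVs (16 + 16 + 12 + 12).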

At this point, since the shared secret has been established, the MMTLS Encryption Handshake is considered completed. To start encrypting and sending data, the client derives other sets of parameters via HKDF from the shared secret. The details of which keys are derived and used for which connections are fully specified in these notes where we annotate the keys and connections created on WeChat startup.

2B. Data encryption

After the handshake, MMTLS uses AES-GCM with a particular key and IV, which are tied to the particular MMTLS tunnel, to encrypt data. The IV is incremented by the number of records previously encrypted with this key. This is important because re-using an IV with the same key destroys the confidentiality provided in AES-GCM, as it can lead to a key recovery attack using the known tag.

ciphertext, tag = AES-GCM(input, key, iv+n)
ciphertext = ciphertext | tag

The 16-byte tag is appended to the end of the ciphertext. This tag is authentication data computed by AES-GCM; it functions as a MAC in that when verified properly, this data provides authentication and integrity. In many cases, if this is a Data record being encrypted, input contains metadata and ciphertext that has already been encrypted as described in the Business-layer Encryption section.
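The per-record IV handling can be sketched as follows. This is only an illustration under assumptions: it treats the IV as a big-endian integer and adds the record counter n to it, which is one plausible reading of "iv+n", and the helper names record_nonce and split_tag are hypothetical. The AES-GCM call itself is left out because it is not part of the Python standard library.

```python
def record_nonce(base_iv: bytes, n: int) -> bytes:
    """Return base_iv + n, treating the IV as a big-endian integer (an assumption)."""
    if n < 0:
        raise ValueError("record counter must be non-negative")
    size = len(base_iv)
    value = (int.from_bytes(base_iv, "big") + n) % (1 << (8 * size))
    return value.to_bytes(size, "big")


def split_tag(record_body: bytes, tag_len: int = 16):
    """Split 'ciphertext | tag' back into its two parts."""
    if len(record_body) < tag_len:
        raise ValueError("record body shorter than the tag")
    return record_body[:-tag_len], record_body[-tag_len:]


base_iv = bytes.fromhex("000102030405060708090aff")  # made-up example IV
nonces = [record_nonce(base_iv, n) for n in range(3)]
print([nonce.hex() for nonce in nonces])
```

As long as the counter never repeats under one key, each record gets a distinct IV, which is the property the scheme relies on.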

We separately discuss data encryption in Longlink and Shortlink in the following subsections.

2B1. Longlink

Client-side Encryption for Longlink packets is done using AES-GCM with key_enc and iv_enc derived earlier in the handshake. Client-side Decryption uses key_dec and iv_dec. Below is a sample Longlink (TCP) packet containing a single data record containing an encrypted heartbeat message from the server³:

17 f1 04     RECORD HEADER (of type “DATA”)
00 20                                           RECORD LENGTH
e6 55 7a d6 82 1d a7 f4 2b 83 d4 b7 78 56 18 f3         ENCRYPTED DATA
1b 94 27 e1 1e c3 01 a6 f6 23 6a bc 94 eb 47 39             TAG (MAC)

Within a long-lived Longlink connection, the IV is incremented for each record encrypted. If a new Longlink connection is created, the handshake is restarted and new key material is generated.
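The sample record above can be picked apart with a short parser. This sketch assumes that every record uses the layout shown in the sample (3-byte header, 2-byte big-endian length, then the body) and that the length covers the ciphertext plus the 16-byte tag, which matches the 0x20 value in the sample. The names parse_records and HEADER_SUFFIX are hypothetical.

```python
import struct

RECORD_TYPES = {
    0x15: "Alert",
    0x16: "Handshake",
    0x17: "Data",
    0x19: "Handshake-Resumption",
}
HEADER_SUFFIX = b"\xf1\x04"  # second and third byte of every record header


def parse_records(packet: bytes):
    """Yield (record_type, body) for each record in an MMTLS packet."""
    pos = 0
    while pos < len(packet):
        if len(packet) - pos < 5:
            raise ValueError("truncated record header")
        type_byte = packet[pos]
        if packet[pos + 1:pos + 3] != HEADER_SUFFIX or type_byte not in RECORD_TYPES:
            raise ValueError("unknown record header")
        (length,) = struct.unpack_from(">H", packet, pos + 3)
        body = packet[pos + 5:pos + 5 + length]
        if len(body) != length:
            raise ValueError("truncated record body")
        yield RECORD_TYPES[type_byte], body
        pos += 5 + length


# The Longlink heartbeat record shown above.
sample = bytes.fromhex(
    "17f104" "0020"
    "e6557ad6821da7f42b83d4b7785618f3"
    "1b9427e11ec301a6f6236abc94eb4739"
)
for record_type, body in parse_records(sample):
    ciphertext, tag = body[:-16], body[-16:]
    print(record_type, len(ciphertext), len(tag))
```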

2B2. Shortlink

Shortlink connections can only contain a single MMTLS packet request and a single MMTLS packet response (via HTTP POST request and response, respectively). After the initial Shortlink ClientHello sent on startup, WeChat will send ClientHello with Handshake Resumption packets. These records have the header 19 f1 04 instead of the 16 f1 04 on the regular ClientHello/ServerHello handshake packets.

An abbreviated sample of a Shortlink request packet containing Handshake Resumption is shown below.

19 f1 04 (Handshake Resumption Record header) . . .
01 04 f1 (ClientHello) . . .
9b c5 3c 42 7a 5b 1a 3b . . . (ClientRandom) . . .
71 ae ce ff d8 3f 29 48 . . . (NEW Resumption Ticket) . . .
19 f1 04 (Handshake Resumption Record header) . . .
47 4c 34 03 71 9e . . . (ENCRYPTED Extensions) . . .
17 f1 04 (Data Record header) . . .
98 cd 6e a0 7c 6b . . . (ENCRYPTED EarlyData) . . .
15 f1 04 (Alert Record header) . . .
8a d1 c3 42 9a 30 . . . (ENCRYPTED Alert (ClientFinished)) . . .

Note that, based on our understanding of the MMTLS protocol, the ClientRandom sent in this packet is not used at all by the server, because there is no need to re-run Diffie-Hellman in a resumed session. The Resumption Ticket is used by the server to identify which prior-established shared secret should be used to decrypt the following packet content.

Encryption for Shortlink packets is done using AES-GCM with the handshake parameters key_enc and iv_enc. (Note that, despite their identical names, key_enc and iv_enc here are different from those of the Longlink, since Shortlink and Longlink each complete their own handshake using a different elliptic curve client keypair.) The iv_enc is incremented for each record encrypted. Usually, EarlyData records sent over Shortlink contain ciphertext that has been encrypted with Business-layer Encryption as well as associated metadata. This metadata and ciphertext will then be additionally encrypted at this layer.

The reason this is referred to as EarlyData internally in WeChat is likely due to it being borrowed from TLS; typically, it refers to the data that is encrypted with a key derived from a pre-shared key, before the establishment of a regular session key via Diffie-Hellman. However, in this case, when using Shortlink, there is no data sent “after the establishment of a regular session key”, so almost all Shortlink data is encrypted and sent in this EarlyData section.

Finally, ClientFinished indicates that the client has finished its side of the handshake. It is an encrypted Alert record with a fixed message that always follows the EarlyData Record. From our reverse-engineering, we found that the handlers for this message referred to it as ClientFinished.

3. Business-layer Request

MMTLS Data Records carry either a “Business-layer request” or a heartbeat message. In other words, if one decrypts the payload from an MMTLS Data Record, the result will often be one of the messages described below.

This Business-layer request contains several metadata parameters that describe the purpose of the request, including the internal URI and the request type number, which we briefly described in the “Launching a WeChat network request” section.

When logged-in, the format of a Business-layer request looks like the following:

00 00 00 7b                 (total data length)
00 24                       (URI length)
/cgi-bin/micromsg-bin/...   (URI)
00 12                       (hostname length)
sgshort.wechat.com          (hostname)
00 00 00 3D                 (length of rest of data)
BF B6 5F                    (request flags)
41 41 41 41                 (user ID)
42 42 42 42                 (device ID)
FC 03 48 02 00 00 00 00     (cookie)
1F 9C 4C 24 76 0E 00        (cookie)
D1 05 varint                (request_type)
0E 0E 00 02                 (4 more varints)
BD 95 80 BF 0D varint       (signature)
FE                          (flag)
80 D2 89 91
04 00 00                    (marks start of data)
08 A6 29 D1 A4 2A CA F1 ... (ciphertext)

Responses are formatted very similarly:

bf b6 5f                    (flags)
41 41 41 41                 (user ID)
42 42 42 42                 (device ID)
fc 03 48 02 00 00 00 00     (cookie)
1f 9c 4c 24 76 0e 00        (cookie)
fb 02 varint                (request_type)
35 35 00 02 varints
a9 ad 88 e3 08 varint       (signature)
fe
ba da e0 93
04 00 00                    (marks start of data)
b6 f8 e9 99 a1 f4 d1 20 . . . ciphertext

This request then contains another encrypted ciphertext, which is encrypted by what we refer to as Business-layer Encryption. Business-layer Encryption is separate from the system we described in the MMTLS Encryption section. The signature mentioned above is the output of genSignature(), which is discussed in the “Integrity check” section. Pseudocode for the serialization schemes and more samples of WeChat’s encrypted request header can be found in our Github repository.
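The length-prefixed fields and varints in the samples above can be decoded with a few helpers. This is a sketch under assumptions: it treats the varints as Protobuf-style base-128 integers and the fixed-width lengths as big-endian, and it only parses the leading envelope of a request. The length fields in the request sample are consistent with the total length covering everything after its own four bytes (2 + 0x24 + 2 + 0x12 + 4 + 0x3D = 0x7B). The function names and the URI in the synthetic buffer are hypothetical.

```python
import struct


def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint (low-order group first); return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def read_prefixed(buf: bytes, pos: int):
    """Read a 2-byte big-endian length followed by that many bytes."""
    (length,) = struct.unpack_from(">H", buf, pos)
    start = pos + 2
    return buf[start:start + length], start + length


def parse_envelope(buf: bytes):
    """Parse total length, URI, hostname, and the length of the rest."""
    (total_len,) = struct.unpack_from(">I", buf, 0)
    uri, pos = read_prefixed(buf, 4)
    host, pos = read_prefixed(buf, pos)
    (rest_len,) = struct.unpack_from(">I", buf, pos)
    return total_len, uri.decode(), host.decode(), rest_len


# Synthetic envelope; the URI and the zero-filled body are hypothetical placeholders.
uri = b"/cgi-bin/micromsg-bin/example"
host = b"sgshort.wechat.com"
rest = bytes(61)
body = (struct.pack(">H", len(uri)) + uri + struct.pack(">H", len(host)) + host
        + struct.pack(">I", len(rest)) + rest)
envelope = struct.pack(">I", len(body)) + body
print(parse_envelope(envelope))

# request_type varints from the request and response samples above.
print(read_varint(bytes.fromhex("d105"), 0)[0], read_varint(bytes.fromhex("fb02"), 0)[0])
```

Under this reading, the request_type bytes D1 05 decode to 721 and FB 02 to 379, both within the 2–4 digit range described earlier.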

4. Business-layer Encryption

WeChat Crypto diagrams (inner layer)

This section describes how the Business-layer requests described in Section 3 are encrypted and decrypted, and how the keys are derived. We note that the set of keys and encryption processes introduced in this section are completely separate from those referred to in the MMTLS Encryption section. Generally, for Business-layer Encryption, much of the protocol logic is handled in the Java code, and the Java code calls out to the C++ libraries for encryption and decryption calculations. For MMTLS Encryption, by contrast, everything is handled in C++ libraries and occurs in a different process entirely. There is very little interplay between these two layers of encryption.

The Business-layer Encryption has two modes using different cryptographic processes: Asymmetric Mode and Symmetric Mode. To transition into Symmetric Mode, WeChat needs to perform an Autoauth request. Upon startup, WeChat typically goes through the following three stages:

Before the user logs in to their account, Business-layer Encryption first uses asymmetric cryptography to derive a shared secret via static Diffie-Hellman (static DH), then uses the shared secret as a key to AES-GCM encrypt the data. We name this Asymmetric Mode. In Asymmetric Mode, the client derives a new shared secret for each request.
Using Asymmetric Mode, WeChat can send an Autoauth request, to which the server would return an Autoauth response, which contains a session_key.
After the client obtains session_key, Business-layer Encryption uses it to AES-CBC encrypt the data. We name this Symmetric Mode since it only uses symmetric cryptography. Under Symmetric Mode, the same session_key can be used for multiple requests.

For Asymmetric Mode, we performed dynamic and static analysis of C++ functions in libwechatmm.so; in particular the HybridEcdhEncrypt() and HybridEcdhDecrypt() functions, which call AesGcmEncryptWithCompress() / AesGcmDecryptWithUncompress(), respectively.

For Symmetric Mode, the requests are handled in pack(), unpack(), and genSignature() functions in libMMProtocalJNI.so. Generally, pack() handles outgoing requests, and unpack() handles incoming responses to those requests. They also perform encryption/decryption. Finally, genSignature() computes a checksum over the full request. In the Github repository, we’ve uploaded pseudocode for pack, AES-CBC encryption, and the genSignature routine.

The Business-layer Encryption is also tightly integrated with WeChat’s user authentication system. The user needs to log in to their account before the client is able to send an Autoauth request. Clients that have not logged in exclusively use Asymmetric Mode. For clients that have already logged in, the first Business-layer packet is most often an Autoauth request encrypted using Asymmetric Mode; the second and subsequent Business-layer packets are encrypted using Symmetric Mode.

Figure 2: Business-layer encryption, logged-out, logging-in, and logged-in: Swimlane diagrams showing at a high-level what Business-layer Encryption requests look like, including which secrets are used to generate the key material used for encryption. 🔑secret is generated via DH(static server public key, client private key), and 🔑new_secret is DH(server public key, client private key). 🔑session is decrypted from the first response when logged-in. Though it isn’t shown above, 🔑new_secret is also used in genSignature() when logged-in; this signature is sent with request and response metadata.

4A. Business-layer Encryption, Asymmetric Mode

Before the user logs in to their WeChat account, the Business-layer Encryption process uses a static server public key and generates a new client keypair to agree on a static Diffie-Hellman shared secret for every WeChat network request. The shared secret is run through the HKDF function, and any data is encrypted with AES-GCM and sent alongside the generated client public key so the server can calculate the shared secret.

For each request, the client generates a public, private keypair for use with ECDH. We also note that the client has a static server public key pinned in the application. The client then calculates an initial secret.

secret = ECDH(static_server_pub, client_priv)
hash = sha256(client_pub)
client_random = <32 randomly generated bytes>
derived_key = HKDF(secret)

derived_key is then used to AES-GCM encrypt the data, which we describe in detail in the next section.

4B. Business-layer Encryption, obtaining session_key

If the client is logged-in (i.e., the user has logged in to a WeChat account on a previous app run), the first request will be a very large data packet authenticating the client to the server (referred to as Autoauth in WeChat internals) which also contains key material. We refer to this request as the Autoauth request. In addition, the client pulls a locally-stored key autoauth_key, which we did not trace the provenance of, since it does not seem to be used other than in this instance. The key for encrypting this initial request (authrequest_data) is derived_key, calculated in the same way as in Section 4A. The encryption described in the following is the Asymmetric Mode encryption, albeit a special case where the data is the authrequest_data.

Below is an abbreviated version of a serialized and encrypted Autoauth request:

    08 01 12 . . . [Header metadata]
    04 46 40 96 4d 3e 3e 7e [client_publickey] . . .
    fa 5a 7d a7 78 e1 ce 10 . . . [ClientRandom encrypted w secret]
    a1 fb 0c da . . .               [IV]
    9e bc 92 8a 5b 81 . . .         [tag]
    db 10 d3 0f f8 e9 a6 40 . . . [ClientRandom encrypted w autoauth_key]
    75 b4 55 30 . . .               [IV]
    d7 be 7e 33 a3 45 . . .         [tag]
    c1 98 87 13 eb 6f f3 20 . . . [authrequest_data encrypted w derived_key]
    4c ca 86 03 . .                 [IV]
    3c bc 27 4f 0e 7b . . .         [tag]

A full sample of the Autoauth request and response at each layer of encryption can be found in the Github repository. Finally, we note that the autoauth_key above does not seem to be actively used outside of encrypting in this particular request. We suspect this is vestigial from a legacy encryption protocol used by WeChat.

The client encrypts here using AES-GCM with a randomly generated IV, and uses a SHA256 hash of the preceding message contents as AAD. At this stage, the messages (including the ClientRandom messages) are always ZLib compressed before encryption.

iv = <12 random bytes>

compressed = zlib_compress(plaintext)

ciphertext, tag = AESGCM_encrypt(compressed, aad = hash(previous), derived_key, iv)

In the above, previous is the header of the request (i.e. all header bytes preceding the 04 00 00 marker of data start). The client appends the 12-byte IV, then the 16-byte tag, onto the ciphertext. This tag can be used by the server to verify the integrity of the ciphertext, and essentially functions as a MAC.
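The steps around the AES-GCM call can be sketched with the Python standard library. This illustration prepares the inputs (Zlib-compressed plaintext, SHA-256 of the header as AAD, a random 12-byte IV) and splits a received blob laid out as ciphertext, then IV, then tag. The AES-GCM operation itself is omitted because the standard library does not provide it, and the function names and the header bytes are hypothetical.

```python
import hashlib
import os
import zlib

IV_LEN, TAG_LEN = 12, 16


def prepare_gcm_inputs(plaintext: bytes, header: bytes):
    """Return (compressed, aad, iv): the values handed to AES-GCM."""
    compressed = zlib.compress(plaintext)
    aad = hashlib.sha256(header).digest()  # hash of the preceding header bytes
    iv = os.urandom(IV_LEN)
    return compressed, aad, iv


def split_blob(blob: bytes):
    """Split 'ciphertext | IV | tag' into its three parts."""
    if len(blob) < IV_LEN + TAG_LEN:
        raise ValueError("blob too short")
    ciphertext = blob[:-(IV_LEN + TAG_LEN)]
    iv = blob[-(IV_LEN + TAG_LEN):-TAG_LEN]
    tag = blob[-TAG_LEN:]
    return ciphertext, iv, tag


# Placeholder values for illustration only.
compressed, aad, iv = prepare_gcm_inputs(b"example plaintext " * 4, b"\x08\x01\x12")
fake_blob = b"C" * 40 + iv + b"T" * TAG_LEN  # stands in for real AES-GCM output
print(len(compressed), len(aad), [len(part) for part in split_blob(fake_blob)])
```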

4B1. Obtaining session_key: Autoauth Response

The response to autoauth is serialized similarly to the request:

08 01 12 . . . [Header metadata]
04 46 40 96 4d 3e 3e 7e [new_server_pub] . . .
c1 98 87 13 eb 6f f3 20 . . . [authresponse_data encrypted w new_secret]
4c ca 86 03 . . [IV]
3c bc 27 4f 0e 7b . . . [tag]

With the newly received server public key (new_server_pub), which is different from the static_server_pub hardcoded in the app, the client then derives a new secret (new_secret). new_secret is then used as the key to AES-GCM decrypt authresponse_data. The client can also verify authresponse_data with the given tag.

new_secret = ECDH(new_server_pub, client_privatekey)

authresponse_data = AESGCM_decrypt(aad = hash(authrequest_data), new_secret, iv)

authresponse_data is a serialized Protobuf containing a lot of important data for WeChat to start, starting with a helpful “Everything is ok” status message. A full sample of this Protobuf can be found in the Github repository. Most importantly, authresponse_data contains session_key, which is the key used for future AES-CBC encryption under Symmetric Mode. From here on out, new_secret is only used in genSignature(), which is discussed below in Section 4C1, “Integrity check”.

We measured the entropy of the session_key provided by the server, as it is used for future encryption. This key exclusively uses printable ASCII characters, and is thus limited to roughly 100 bits of entropy.
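As a rough back-of-the-envelope check, the upper bound on the entropy of a key restricted to printable ASCII can be computed directly. The sketch below assumes a 16-byte key (the source does not state the length) drawn uniformly from the 95 printable ASCII characters; both the length and the function name are assumptions.

```python
import math


def max_entropy_bits(key_len: int, alphabet_size: int) -> float:
    """Upper bound on entropy for key_len symbols drawn uniformly from an alphabet."""
    return key_len * math.log2(alphabet_size)


PRINTABLE_ASCII = 95   # space (0x20) through tilde (0x7e)
ASSUMED_KEY_LEN = 16   # assumption: an AES-128-sized key

restricted = max_entropy_bits(ASSUMED_KEY_LEN, PRINTABLE_ASCII)
unrestricted = max_entropy_bits(ASSUMED_KEY_LEN, 256)
print(round(restricted, 1), unrestricted)
```

Under these assumptions the bound comes to about 105 bits, compared with 128 bits for 16 unrestricted bytes.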

The WeChat code refers to three different keys: client_session, server_session, and single_session. Generally, client_session refers to the client_publickey, server_session refers to the shared secret key generated using ECDH i.e. new_secret, and single_session refers to the session_key provided by the server.

4C. Business-layer Encryption, Symmetric Mode

After the client receives session_key from the server, future data is encrypted using Symmetric Mode. Symmetric Mode encryption is mostly done using AES-CBC instead of AES-GCM, with the exception of some large files being encrypted with AesGcmEncryptWithCompress(). As AesGcmEncryptWithCompress() requests are the exception, we focus on the more common use of AES-CBC.

Specifically, the Symmetric Mode uses AES-CBC with PKCS-7 padding, with the session_key as a symmetric key:

ciphertext = AES-CBC(PKCS7_pad(plaintext), session_key, iv = session_key)

This session_key is doubly used as the IV for encryption.
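The padding step in the formula above is standard PKCS#7 and can be sketched without any cryptographic library. The helper names are hypothetical, and the AES-CBC call itself is omitted because the Python standard library does not include AES.

```python
BLOCK = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block: int = BLOCK) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of block."""
    pad_len = block - len(data) % block
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block: int = BLOCK) -> bytes:
    """Validate and strip PKCS#7 padding."""
    if not padded or len(padded) % block:
        raise ValueError("input is not a whole number of blocks")
    pad_len = padded[-1]
    if not 1 <= pad_len <= block or padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("bad padding")
    return padded[:-pad_len]


padded = pkcs7_pad(b"serialized protobuf")  # placeholder plaintext
print(len(padded), pkcs7_unpad(padded))
# In the scheme described above, the padded bytes would then be AES-CBC
# encrypted with session_key serving as both the key and the IV.
```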

4C1. Integrity check

In Symmetric Mode, a function called genSignature() calculates a pseudo-integrity code on the plaintext. This function first calculates the MD5 hash of WeChat’s assigned user ID for the logged-in user (uin), new_secret, and the plaintext length. Then, genSignature() uses Adler32, a checksumming function, on the MD5 hash concatenated with the plaintext.

signature = adler32(md5(uin | new_secret | plaintext_len) |
            plaintext)

The result from Adler32 is concatenated to the ciphertext as metadata (see Section 3 for how it is included in the request and response headers), and is referred to as a signature in WeChat’s codebase. We note that though it is referred to as a signature, it does not provide any cryptographic properties; details can be found in the Security Issues section. The full pseudocode for this function can also be found in the Github repository.
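The signature formula can be sketched with hashlib and zlib. The byte encodings are assumptions, since the formula above does not specify them: here uin and the plaintext length are each packed as 4-byte big-endian integers. The function name gen_signature and all input values are placeholders.

```python
import hashlib
import struct
import zlib


def gen_signature(uin: int, new_secret: bytes, plaintext: bytes) -> int:
    """adler32(md5(uin | new_secret | plaintext_len) | plaintext)."""
    # Assumed encoding: uin and the length as 4-byte big-endian integers.
    prefix = hashlib.md5(
        struct.pack(">I", uin) + new_secret + struct.pack(">I", len(plaintext))
    ).digest()
    # Adler32 is a 32-bit checksum, not a keyed MAC.
    return zlib.adler32(prefix + plaintext) & 0xFFFFFFFF


# Placeholder inputs for illustration only.
signature = gen_signature(0x41414141, b"\x02" * 32, b"example plaintext")
print(f"{signature:08x}")
```

The output is only 32 bits wide, which fits in the five-byte varint shown for the signature field in the request header samples.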

5. Protobuf data payload

The input to Business-layer Encryption is generally a serialized Protobuf, optionally compressed with Zlib. When logged-in, many of the Protobufs sent to the server contain the following header data:

"1": {
    "1": "\u0000",
    "2": "1111111111", # User ID (assigned by WeChat)
    "3": "AAAAAAAAAAAAAAA\u0000", # Device ID (assigned by WeChat)
    "4": "671094583", # Client Version
    "5": "android-34", # Android Version
    "6": "0"
    },

The Protobuf structure is defined in each API’s corresponding RR class, as we previously mentioned in the “Launching a WeChat network request” section.

6. Putting it all together

In the below diagram, we demonstrate the network flow for the most common case of opening the WeChat application. We note that in order to prevent further complicating the diagram, HKDF derivations are not shown; for instance, when “🔑mmtls” is used, HKDF is used to derive a key from “🔑mmtls”, and the derived key is used for encryption. The specifics of how keys are derived, and which derived keys are used to encrypt which data, can be found in these notes.

Figure 3: Swimlane diagram demonstrating the encryption setup and network flow of the most common case (user is logged in, opens WeChat application).

We note that other configurations are possible. For instance, we have observed that if the Longlink MMTLS handshake completes first, the Business-layer “Logging-in” request and response can occur over the Longlink connection instead of over several shortlink connections. In addition, if the user is logged out, Business-layer requests are simply encrypted with 🔑secret (resembling Shortlink 2 requests).

Security issues

In this section, we outline potential security issues and privacy weaknesses we identified with the construction of the MMTLS encryption and Business-layer encryption layers. There could be other issues as well.

Issues with MMTLS encryption

Below we detail the issues we found with WeChat’s MMTLS encryption.

Deterministic IV

The MMTLS encryption process generates a single IV once per connection and then increments the IV for each subsequent record encrypted in that connection. Generally, NIST recommends not using a wholly deterministic derivation for IVs in AES-GCM since it is easy to accidentally re-use IVs. In the case of AES-GCM, reuse of the (key, IV) tuple is catastrophic as it allows key recovery from the AES-GCM authentication tags. Since these tags are appended to AES-GCM ciphertexts for authentication, this enables plaintext recovery from as few as 2 ciphertexts encrypted with the same key and IV pair.
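A minimal sketch of a counter-style IV schedule like the one described above is shown below. The details are assumptions made for illustration: we treat the IV as a 12-byte big-endian integer and add the record index to it, and the function name is ours. MMTLS may combine the base IV and the counter differently.

```python
def record_iv(base_iv: bytes, record_index: int) -> bytes:
    """Return the IV for the record_index-th record of a connection.

    Assumption: the IV is a big-endian integer that is incremented
    once per record and wraps around at its maximum value.
    """
    size = len(base_iv)
    value = (int.from_bytes(base_iv, "big") + record_index) % (1 << (8 * size))
    return value.to_bytes(size, "big")
```

Within a single connection every record gets a distinct IV until the counter wraps. The schedule is fully determined by the base IV, however, so any two uses of the same key that begin from the same base IV would repeat every (key, IV) pair.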

In addition, Bellare and Tackmann have shown that the use of a deterministic IV can make it possible for a powerful adversary to brute-force a particular (key, IV) combination. This type of attack is available to powerful adversaries when the cryptosystem is deployed at a scale large enough (i.e., the size of the Internet) that a very large pool of (key, IV) combinations is in use. Since WeChat has over a billion users, its scale puts this attack within the realm of feasibility.

Lack of forward secrecy

Forward secrecy is generally expected of modern communications protocols to reduce the importance of session keys. Generally, TLS itself is forward-secret by design, except in the case of the first packet of a “resumed” session. This first packet is encrypted with a “pre-shared key”, or PSK established during a previous handshake.

MMTLS makes heavy use of PSKs by design. Since the Shortlink transport format only supports a single round-trip of communication (via a single HTTP POST request and response), any encrypted data sent via the transport format is encrypted with a pre-shared key. Since leaking the shared PSK_ACCESS secret would enable a third party to decrypt any EarlyData sent across multiple MMTLS connections, data encrypted with the pre-shared key is not forward-secret. The vast majority of records encrypted via MMTLS are sent via the Shortlink transport, which means that the majority of network data sent by WeChat is not forward-secret between connections.

In addition, when opening the application, WeChat creates a single long-lived Longlink connection. This connection stays open for as long as the WeChat application is running, and any encrypted data that needs to be sent is sent over the same connection. Since most WeChat requests are encrypted using either (A) a session-resuming PSK or (B) the application data key of the long-lived Longlink connection, WeChat’s network traffic often does not retain forward secrecy between network requests.

Issues with Business-layer encryption

On its own, the Business-layer encryption construction, and in particular the Symmetric Mode AES-CBC construction, has many severe issues. Since the requests made by WeChat are double-encrypted, and these concerns only affect the inner, Business layer of encryption, we did not find an immediate way to exploit them. However, in older versions of WeChat which exclusively used Business-layer encryption, these issues would be exploitable.

Metadata leak

Business-layer encryption does not encrypt metadata such as the user ID and request URI, as shown in the “Business-layer request” section. This issue is also acknowledged by the WeChat developers themselves to be one of the motivations to develop MMTLS encryption.

Forgeable genSignature integrity check

While the purpose of the genSignature code is not entirely clear, if it is being used for authentication (since the ecdh_key is included in the MD5) or integrity, it fails on both counts. A valid forgery can be calculated with any known plaintext without knowledge of the ecdh_key. If the client generates the following for some known plaintext message plaintext:

sig = adler32(md5(uin | ecdh_key | plaintext_len) | plaintext)

We can do the following to forge the signature evil_sig for some evil_plaintext with length plaintext_len:

evil_sig = sig - adler32(plaintext) + adler32(evil_plaintext)

Subtracting from and adding to adler32 checksums is achievable by solving a system of equations when the message is short. Code for subtracting from and adding to an adler32 checksum, thereby forging this integrity check, can be found in adler.py in our GitHub repository.
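The following Python sketch shows one way to carry out this arithmetic for two plaintexts of equal length. It is our own illustration, not the adler.py code from the repository. It relies on the fact that Adler-32 consists of two 16-bit sums taken modulo 65521, and that the contribution of an unknown fixed-length prefix (here, the MD5 digest) to each sum is the same for both messages, so it cancels out when the two halves are adjusted separately.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16, used by Adler-32


def forge_adler32(sig: int, plaintext: bytes, evil_plaintext: bytes) -> int:
    """Given sig = adler32(prefix | plaintext) for an unknown prefix,
    return adler32(prefix | evil_plaintext) without learning the prefix.

    Both plaintexts must have the same length, so that the prefix
    (which covers the plaintext length) is unchanged.
    """
    if len(plaintext) != len(evil_plaintext):
        raise ValueError("plaintexts must have equal length")
    known = zlib.adler32(plaintext)
    evil = zlib.adler32(evil_plaintext)
    # Adjust the low half (sum A) and the high half (sum B) independently.
    a = ((sig & 0xFFFF) - (known & 0xFFFF) + (evil & 0xFFFF)) % MOD_ADLER
    b = ((sig >> 16) - (known >> 16) + (evil >> 16)) % MOD_ADLER
    return (b << 16) | a
```

A usage example, with a stand-in secret prefix that the forger never sees:

```python
import hashlib

prefix = hashlib.md5(b"secret inputs unknown to the attacker").digest()
sig = zlib.adler32(prefix + b"transfer 001 yuan")
forged = forge_adler32(sig, b"transfer 001 yuan", b"transfer 999 yuan")
assert forged == zlib.adler32(prefix + b"transfer 999 yuan")
```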

Possible AES-CBC padding oracle

Since AES-CBC is used alongside PKCS7 padding, it is possible that the use of this encryption on its own would be susceptible to an AES-CBC padding oracle, which can lead to recovery of the encrypted plaintext. Earlier this year, we found that another custom cryptography scheme developed by a Tencent company was susceptible to this exact attack.

Key, IV re-use in block cipher mode

Re-using the key as the IV for AES-CBC, as well as re-using the same key for all encryption in a given session (i.e., the length of time that the user has the application opened), introduces some privacy issues for encrypted plaintexts. For instance, since the key and the IV provide all the randomness, re-using both means that if two plaintexts are identical, they will encrypt to the same ciphertext. In addition, due to the use of CBC mode in particular, two plaintexts that share an identical prefix of N blocks will encrypt to the same first N ciphertext blocks.
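The prefix leak can be demonstrated with a short sketch of CBC chaining. To keep the example free of third-party dependencies, a keyed hash (truncated HMAC-SHA256) stands in for the AES block cipher. This stand-in is an assumption made purely for illustration: it is not AES and cannot be decrypted, but the chaining structure, and therefore the leak, is the same as in CBC.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES: a keyed pseudorandom function (not invertible)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def cbc_encrypt_blocks(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """CBC chaining: each block is XORed with the previous ciphertext block."""
    if len(plaintext) % BLOCK != 0:
        raise ValueError("plaintext must be a whole number of blocks")
    previous = iv
    output = []
    for offset in range(0, len(plaintext), BLOCK):
        block = plaintext[offset:offset + BLOCK]
        mixed = bytes(x ^ y for x, y in zip(block, previous))
        previous = toy_block_encrypt(key, mixed)
        output.append(previous)
    return b"".join(output)


session_key = b"0123456789abcdef"
first = cbc_encrypt_blocks(session_key, session_key, b"A" * 32 + b"B" * 16)
second = cbc_encrypt_blocks(session_key, session_key, b"A" * 32 + b"C" * 16)
# The two messages share a 2-block prefix, and so do their ciphertexts.
assert first[:32] == second[:32]
assert first[32:] != second[32:]
```

With a fresh random IV per message, the first ciphertext blocks would already differ and the shared prefix would not be visible.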

Encryption key issues

It is highly unconventional for the server to choose the encryption key used by the client. In fact, we note that the encryption key generated by the server (the “session key”) exclusively uses printable ASCII characters. Thus, even though the key is 128 bits long, the entropy of this key is at most 106 bits.
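The entropy bound can be checked with a short calculation. The sketch below assumes the standard set of 95 printable ASCII characters (0x20 through 0x7E) and a 16-byte key; the function name is ours.

```python
import math


def max_entropy_bits(key_bytes: int, alphabet_size: int) -> float:
    """Upper bound on entropy when each byte is drawn from a fixed alphabet."""
    return key_bytes * math.log2(alphabet_size)


printable_bits = max_entropy_bits(16, 95)   # printable ASCII only
full_bits = max_entropy_bits(16, 256)       # any byte value
```

Here printable_bits comes out to roughly 105.1, consistent with the bound of at most 106 bits, while full_bits is exactly 128. Restricting the alphabet therefore costs about 23 bits of key strength.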

No forward secrecy

As mentioned in the previous section, forward secrecy is a standard property for modern network communication encryption. When the user is logged in, all communication with WeChat at this encryption layer is done with the exact same key. The client does not receive a new key until the user closes and restarts WeChat.

Other versions of WeChat

To confirm our findings, we also tested our decryption code on WeChat 8.0.49 for Android (released April 2024) and found that the MMTLS network format matches that used by WeChat 8.0.49 for iOS.

Previous versions of WeChat network encryption

To understand how WeChat’s complex cryptosystems are tied together, we also briefly reverse-engineered an older version of WeChat that did not utilize MMTLS. The newest version of WeChat that did not utilize MMTLS was v6.3.16, released in 2016. Our full notes on this reverse-engineering can be found here.

While logged-out, requests were largely using the Business-layer Encryption cryptosystem, using RSA public-key encryption rather than static Diffie-Hellman plus symmetric encryption via AES-GCM. We observed requests to the internal URIs cgi-bin/micromsg-bin/encryptcheckresupdate and cgi-bin/micromsg-bin/getkvidkeystrategyrsa.

There was also another encryption mode used, DES with a static key. This mode was used for sending crash logs and memory stacks; POST requests to the URI /cgi-bin/mmsupport-bin/stackreport were encrypted using DES.

We were not able to log in to this version for dynamic analysis, but from our static analysis, we determined that the encryption behaves the same as Business-layer Encryption when logged in (i.e., using a session_key provided by the server for AES-CBC encryption).

Discussion

Why does Business-layer encryption matter?

Since Business-layer encryption is wrapped in MMTLS, why should it matter whether or not it is secure? First, from our study of previous versions of WeChat, Business-layer encryption was the sole layer of encryption for WeChat network requests until 2016. Second, given that Business-layer encryption exposes internal request URIs unencrypted, one possible architecture for WeChat would be to host different internal servers to handle different types of network requests (corresponding to different “requestType” values and different cgi-bin request URLs). It could be the case, for instance, that after MMTLS is terminated at the front WeChat servers (which handle MMTLS decryption), the inner WeChat request that is forwarded to the corresponding internal WeChat server is not re-encrypted, and is therefore solely encrypted using Business-layer encryption. A network eavesdropper, or network tap, placed within WeChat’s intranet could then attack the Business-layer encryption on these forwarded requests. However, this scenario is purely conjectural. Tencent’s response to our disclosure addresses issues in Business-layer encryption and implies that they are slowly migrating from the more problematic AES-CBC to AES-GCM, which suggests that Tencent is also concerned about this layer.

Why not use TLS?

According to public documentation and confirmed by our own findings, MMTLS (the “Outer layer” of encryption) is based heavily on TLS 1.3. In fact, the document demonstrates that the architects of MMTLS have a decent understanding of asymmetric cryptography in general.

The document contains reasoning for not using TLS. It explains that the way WeChat uses network requests necessitates something like 0-RTT session resumption, because the majority of WeChat data transmission needs only one request-response cycle (i.e., Shortlink). MMTLS requires only the one round-trip handshake that establishes the underlying TCP connection before any application data can be sent; according to this document, introducing another round-trip for the TLS 1.2 handshake was a non-starter.

Fortunately, TLS1.3 proposes a 0-RTT (no additional network delay) method for the protocol handshake. In addition, the protocol itself provides extensibility through the version number, CipherSuite, and Extension mechanisms. However, TLS1.3 is still in draft phases, and its implementation may still be far away. TLS1.3 is also a general-purpose protocol for all apps, given the characteristics of WeChat, there is great room for optimization. Therefore, at the end, we chose to design and implement our own secure transport protocol, MMTLS, based on the TLS1.3 draft standard. [originally written in Chinese]

However, even at the time of writing in 2016, TLS 1.2 did provide an option for session resumption. In addition, since WeChat controls both the servers and the clients, it doesn’t seem unreasonable to deploy the fully-fledged TLS 1.3 implementations that were being tested at the time, even if the IETF draft was incomplete.

Despite the best efforts of the architects of MMTLS, the security protocols used by WeChat seem both less performant and less secure than TLS 1.3. Designing a secure and performant transport protocol is no easy feat.

The issue of performing an extra round-trip for a handshake has been a perennial issue for application developers. The TCP and TLS handshakes each require a single round-trip, meaning that data sent over each new connection must first wait for two round-trips. Today, TLS-over-QUIC combines the transport-layer and encryption-layer handshakes, requiring only a single handshake. QUIC provides the best of both worlds: strong, forward-secret encryption and half the number of round-trips needed for secure communication. Our recommendation would be for WeChat to migrate to a standard QUIC implementation.

Finally, there is also the issue of client-side performance, in addition to network performance. Since WeChat’s encryption scheme performs two layers of encryption per request, the client performs double the work to encrypt data compared with using a single standardized cryptosystem.

The trend of home-rolled cryptography in Chinese applications

The findings here add to our prior research suggesting that home-grown cryptography is popular in Chinese applications. In general, the avoidance of TLS and the preference for proprietary and non-standard cryptography is a departure from cryptographic best practices. While there may have been many legitimate reasons to distrust TLS in 2011 (like EFF and Access Now’s concerns over the certificate authority ecosystem), the TLS ecosystem has largely stabilized since then, and is more auditable and transparent. Like MMTLS, all the proprietary protocols we have researched in the past contain weaknesses relative to TLS, and, in some cases, could even be trivially decrypted by a network adversary. This is a growing, concerning trend unique to the Chinese security landscape as the global Internet progresses towards technologies like QUIC or TLS to protect data in transit.

Anti-DNS-hijacking mechanisms

Similar to how Tencent wrote their own cryptographic system, we found that in Mars they also wrote a proprietary domain lookup system. This system is part of STN and has the ability to support domain name to IP address lookups over HTTP. This feature is referred to as “NewDNS” in Mars. Based on our dynamic analysis, this feature is regularly used in WeChat. At first glance, NewDNS duplicates the same functions already provided by DNS (Domain Name System), which is already built into nearly all internet-connected devices.

WeChat is not the only app in China that utilizes such a system. Major cloud computing providers in China such as Alibaba Cloud and Tencent Cloud both offer their own DNS over HTTP service. A VirusTotal search for apps that try to contact Tencent Cloud’s DNS over HTTP service endpoint (119.29.29.98) yielded 3,865 unique results.

One likely reason for adopting such a system is that ISPs in China often implement DNS hijacking to insert ads and redirect web traffic to perform ad fraud. The problem was so serious that six Chinese internet giants issued a joint statement in 2015 urging ISPs to improve. According to the news article, about 1–2% of traffic to Meituan (an online shopping site) suffers from DNS hijacking. Ad fraud by Chinese ISPs seems to remain a widespread problem in recent years.

Similar to their MMTLS cryptographic system, Tencent’s NewDNS domain lookup system was motivated by trying to meet the needs of the Chinese networking environment. DNS proper has proven over the years to have multiple security and privacy issues. Compared to TLS, we found that WeChat’s MMTLS has additional deficiencies. However, whether NewDNS is more or less problematic than DNS proper remains an open question. We leave this question for future work.

Use of Mars STN outside WeChat

We speculate that there is a widespread adoption of Mars (mars-open) outside of WeChat, based on the following observations:

There are numerous issues opened on the Mars GitHub repository.
There are plenty of technical articles outlining building instant messaging systems using Mars.
There is already a white-label instant messaging system product that is based on Mars.

The adoption of Mars outside of WeChat is concerning because Mars by default does not provide any transport encryption. As we have mentioned in the “Three Parts of Mars” section, the MMTLS encryption used in WeChat is part of mars-wechat, which is not open source. The Mars developers also have no plans to add support for TLS, and expect other developers using Mars to implement their own encryption in the upper layers. To make matters worse, implementing TLS within Mars seems to require a fair bit of architectural changes. Although it would not be unfair for Tencent to keep MMTLS proprietary, MMTLS is still the main encryption system that Mars was designed for. Leaving MMTLS proprietary means that other developers using Mars must either devote significant resources to integrating a different encryption system with Mars or leave everything unencrypted.

Mars is also lacking in documentation. The official wiki only contains a few old articles on how to integrate with Mars. Developers using Mars often resort to asking questions on GitHub. The lack of documentation means that developers are more prone to making mistakes, which ultimately reduces security.

Further research is needed in this area to analyze the security of apps that use Tencent’s Mars library.

“Tinker”, a dynamic code-loading module

In this section, we tentatively refer to the APK downloaded from the Google Play Store as “WeChat APK”, and the APK downloaded from WeChat’s official website as “Weixin APK”. The distinction between WeChat and Weixin seems blurry. The WeChat APK and Weixin APK contain partially different code, as we will later discuss in this section. However, when installing both of these APKs to an English-locale Android Emulator, they both show their app names as “WeChat”. Their application IDs, which are used by the Android system and Google Play Store to identify apps, are also both “com.tencent.mm”. We were also able to log in to our US-number accounts using both APKs.

Unlike the WeChat APK, we found that the Weixin APK contains Tinker, “a hot-fix solution library”. Tinker allows the developer to update the app itself without calling Android’s system APK installer by using a technique called “dynamic code loading”. In an earlier report we found a similar distinction between TikTok and Douyin, where we found Douyin to have a similar dynamic code-loading feature that was not present in TikTok. This feature raises three concerns:

If the process for downloading and loading the dynamic code does not sufficiently authenticate the downloaded code (e.g., that it is cryptographically signed with the correct public key, that it is not out of date, and that it is the code intended to be downloaded and not other cryptographically signed and up-to-date code), an attacker might be able to exploit this process to run malicious code on the device (e.g., by injecting arbitrary code, by performing a downgrade attack, or by performing a sidegrade attack). Back in 2016, we found such instances in other Chinese apps.
Even if the code downloading and loading mechanism contains no weaknesses, the dynamic code loading feature still allows the application to load code without notifying the user, bypassing users’ consent to decide what program could run on their device. For example, the developer may push out an unwanted update, and the users do not have a choice to keep using the old version. Furthermore, a developer may selectively target a user with an update that compromises their security or privacy. In 2016, a Chinese security analyst accused Alibaba of pushing dynamically loaded code to Alipay to surreptitiously take photos and record audio on his device.
Dynamically loading code prevents app store reviewers from reviewing all relevant behavior of an app’s execution. As such, the Google Play Developer Program Policy does not permit apps to use dynamic code loading.
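The authentication checks named in the first concern (authenticity, freshness, and intended target) can be sketched as follows. This is a hypothetical illustration and does not describe Tinker’s actual update format or API: the PatchManifest structure and all function names are invented for the example, and an HMAC stands in for the public-key signature that a real update mechanism would verify.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchManifest:
    app_id: str          # which app the patch is intended for
    patch_version: int   # monotonically increasing patch number
    payload_sha256: str  # hex digest of the patch payload
    tag: bytes           # stand-in for a real public-key signature


def _signed_fields(app_id: str, patch_version: int, payload_sha256: str) -> bytes:
    return f"{app_id}|{patch_version}|{payload_sha256}".encode()


def make_manifest(key: bytes, app_id: str, patch_version: int, payload: bytes) -> PatchManifest:
    digest = hashlib.sha256(payload).hexdigest()
    tag = hmac.new(key, _signed_fields(app_id, patch_version, digest), hashlib.sha256).digest()
    return PatchManifest(app_id, patch_version, digest, tag)


def should_load(key: bytes, manifest: PatchManifest, payload: bytes,
                expected_app_id: str, installed_version: int) -> bool:
    fields = _signed_fields(manifest.app_id, manifest.patch_version, manifest.payload_sha256)
    expected_tag = hmac.new(key, fields, hashlib.sha256).digest()
    if not hmac.compare_digest(expected_tag, manifest.tag):
        return False  # manifest is not authentic (arbitrary code injection)
    if hashlib.sha256(payload).hexdigest() != manifest.payload_sha256:
        return False  # payload does not match the authenticated manifest
    if manifest.patch_version <= installed_version:
        return False  # out-of-date patch (downgrade attack)
    if manifest.app_id != expected_app_id:
        return False  # validly signed patch for another app (sidegrade attack)
    return True
```

Each rejected case corresponds to one of the attacks listed above; omitting any single check leaves the corresponding attack open even when the others are enforced.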

When analyzing the WeChat APK, we found that it retains some components of Tinker. The component that seems to handle the downloading of app updates is present; however, the core part of Tinker that handles loading and executing the downloaded app updates has been replaced with “no-op” functions, which perform no actions. We did not analyze the WeChat binaries available from other third-party app stores.

Further research is needed to analyze the security of Tinker’s app update process, whether WeChat APKs from other sources contain the dynamic code loading feature, as well as any further differences between the WeChat APK and Weixin APK.

Recommendations

In this section, we make recommendations based on our findings to relevant audiences.

To application developers

Implementing proprietary encryption is more expensive, less performant, and less secure than using well-scrutinized standard encryption suites. Given the sensitive nature of data that can be sent by applications, we encourage application developers to use tried-and-true encryption suites and protocols and to avoid rolling their own crypto. SSL/TLS has seen almost three decades of various improvements as a result of rigorous public and academic scrutiny. TLS configuration is now easier than ever before, and the advent of QUIC-based TLS has dramatically improved performance.

To Tencent and WeChat developers

Below is a copy of the recommendations we sent to WeChat and Tencent in our disclosure. The full disclosure correspondence can be found in the Appendix.

In this post from 2016, WeChat developers note that they wished to upgrade their encryption, but the addition of another round-trip for the TLS 1.2 handshake would significantly degrade WeChat network performance, as the application relies on many short bursts of communication. At that time, TLS 1.3 was not yet an RFC (though session resumption extensions were available for TLS 1.2), so they opted to “roll their own” and incorporate TLS 1.3’s session resumption model into MMTLS.

This issue of performing an extra round-trip for a handshake has been a perennial issue for application developers around the world. The TCP and TLS handshakes each require a single round-trip, meaning that data sent over each new connection must first wait for two round-trips. Today, TLS-over-QUIC combines the transport-layer and encryption-layer handshakes, requiring only a single handshake. QUIC was developed for this express purpose, and can provide strong, forward-secret encryption while halving the number of round-trips needed for secure communication. We also note that WeChat seems to already use QUIC for some large file downloads. Our recommendation would be for WeChat to migrate entirely to a standard TLS or QUIC+TLS implementation.

There is also the issue of client-side performance, in addition to network performance. Since WeChat’s encryption scheme performs two layers of encryption per request, the client is performing double the work to encrypt data than if WeChat used a single standardized cryptosystem.

To operating systems

On the web, client-side browser security warnings and the use of HTTPS as a ranking factor in search engines contributed to widespread TLS adoption. We can draw loose analogies to the mobile ecosystem’s operating systems and application stores.

Is there any platform or OS-level permission model that can indicate regular usage of standard encrypted network communications? As we mentioned in our prior work studying proprietary cryptography in Chinese IME keyboards, OS developers could consider device permission models that surface whether applications use lower-level system calls for network access.

To high-risk users with privacy concerns

Many WeChat users use it out of necessity rather than choice. For users with privacy concerns who are using WeChat out of necessity, our recommendations from the previous report still hold:

Avoid features delineated as “Weixin” services if possible. We note that many core “Weixin” services (such as Search, Channels, Mini Programs) as delineated by the Privacy Policy perform more tracking than core “WeChat” services.
When possible, prefer web or applications over Mini Programs or other such embedded functionality.
Use stricter device permissions and update your software and OS regularly for security features.

In addition, due to the risks introduced by dynamic code loading in WeChat downloaded from the official website, we recommend that users instead download WeChat from the Google Play Store whenever possible. For users who have already installed WeChat from the official website, removing it and installing the Google Play Store version would also mitigate the risk.

To security and privacy researchers

As WeChat has over one billion users, we posit that the number of global MMTLS users is on a similar order of magnitude as the number of global TLS users. Despite this, MMTLS has received little to no third-party analysis or scrutiny, unlike TLS. At this scale of influence, MMTLS deserves scrutiny similar to that of TLS. We implore future security and privacy researchers to build on this work to continue the study of the MMTLS protocol, as from our correspondence, Tencent insists on continuing to use and develop MMTLS for WeChat connections.

Acknowledgments

We would like to thank Jedidiah Crandall, Jakub Dalek, Prateek Mittal, and Jonathan Mayer for their guidance and feedback on this report. Research for this project was supervised by Ron Deibert.

Appendix

In this appendix, we detail our disclosure to Tencent concerning our findings and their response.

April 24, 2024 — Our disclosure

To Whom It May Concern:

The Citizen Lab is an academic research group based at the Munk School of Global Affairs & Public Policy at the University of Toronto in Toronto, Canada.

We analyzed WeChat v8.0.23 on Android and iOS as part of our ongoing work analyzing popular mobile and desktop apps for security and privacy issues. We found that WeChat’s proprietary network encryption protocol, MMTLS, contains weaknesses compared to modern network encryption protocols, such as TLS or QUIC+TLS. For instance, the protocol is not forward-secret and may be susceptible to replay attacks. We plan on publishing a documentation of the MMTLS network encryption protocol and strongly suggest that WeChat, which is responsible for the network security of over 1 billion users, switch to a strong and performant encryption protocol like TLS or QUIC+TLS.

For further details, please see the attached document.

Timeline to Public Disclosure

The Citizen Lab is committed to research transparency and will publish details regarding the security vulnerabilities it discovers in the context of its research activities, absent exceptional circumstances, on its website: https://citizenlab.ca/.

The Citizen Lab will publish the details of our analysis no sooner than 45 calendar days from the date of this communication.

Should you have any questions about our findings please let us know. We can be reached at this email address: disclosure@citlab.utoronto.ca.

Sincerely,

The Citizen Lab

May 17, 2024 — Tencent’s response

Thank you for your report.Since receiving your report on April 25th, 2024, we have conducted a careful evaluation.The core of WeChat’s security protocol is outer layer mmtls encryption, currently ensuring that outer layer mmtls encryption is secure. On the other hand, the encryption issues in the inner layer are handled as follows: the core data traffic has been switched to AES-GCM encryption, while other traffic is gradually switching from AES-CBC to AES-GCM.If you have any other questions, please let us know.thanks.

The terms “shortlink” and “longlink” do not seem to be specific to WeChat, since they are also mentioned in other technical blogs.
On Android, the main process is named after the app ID, “com.tencent.mm”. (The process name can be seen using the ps command in adb shell.) When an app starts a new process, it assigns a name. The assigned name will be added to the app ID to form the full name of the new process. So the “:push” process’s full name is “com.tencent.mm:push”.
This server heartbeat is a reply to a prior client-sent heartbeat.

Tapping Away: A Series of Cloud-Based Keyboard Vulnerabilities Allows Network Attackers to Monitor Typed Content (Summary)

Jeffrey Knockel — Tue, 23 Apr 2024 11:59:54 +0000

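Important: We recommend that all users immediately update their keyboard (input method) software and operating system. We also recommend that high-risk users stop using any cloud-based suggestion features offered by keyboards and switch to a fully offline keyboard to avoid data leaks.
This article is a translated summary of the full report.

Key Findings

We analyzed the security of popular cloud-based Pinyin keyboards from nine vendors, including Baidu, Honor, Huawei, iFlyTek, OPPO, Samsung, and Tencent, and examined whether the way they transmit users’ keystrokes to the cloud contains security flaws.
Our analysis found that keyboard apps from eight of the nine vendors contain critical vulnerabilities that allowed us to completely break the encryption the vendors designed to protect users’ keystrokes. Some vendors did not use any encryption at all to protect users’ keystrokes.
Combining this study with the Sogou keyboard vulnerabilities found in our earlier research, we estimate that up to one billion users are affected by these vulnerabilities. For the following reasons, we believe that users’ keystrokes may already have been collected at mass scale:
- These vulnerabilities affect a large number of users
- The information users type into keyboards is extremely sensitive
- Discovering these vulnerabilities does not require sophisticated techniques
- The Five Eyes have previously exploited similar vulnerabilities in Chinese apps for surveillance
We reported these vulnerabilities to the nine affected developers. Most developers took the issues seriously, responded to us, and fixed the vulnerabilities, but a small number of keyboards remain unpatched.
At the end of the report, we offer general recommendations to the parties affected by these vulnerabilities, which we hope will reduce the harm caused by similar vulnerabilities in the future.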

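Summary of Vulnerabilities

Among the apps from the nine vendors we tested, only Huawei’s products showed no security issues related to the transmission of users’ keystrokes. Every other vendor had at least one app with a vulnerability that allows a passive network attacker to monitor the full content of users’ keystrokes.

Note: Active network eavesdropping means that the eavesdropper must actively send signals, such as tampering with a few bits of data in transit, in order to decrypt. Active network eavesdropping can potentially be detected. Passive network eavesdropping means that decryption is possible by simply reading data in transit, without sending any signal. Passive network eavesdropping is difficult to detect.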

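Legend
✘✘	Active and passive network eavesdroppers can decrypt encrypted keystrokes, and we successfully demonstrated this attack
✘	Active network eavesdroppers can decrypt encrypted keystrokes, and we successfully demonstrated this attack
!	Weaknesses exist in the encryption implementation
✔	No issues found
N/A	The product is not offered or not present on the devices we tested

Keyboard developer	Android	iOS	Windows
Tencent^†	✘	N/A	✘
Baidu	!	!	✘✘
iFlyTek	✘✘	✔	✔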

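Pre-installed keyboard developers

Device manufacturer	Own	Sogou	Baidu	iFlyTek	iOS	Windows
Samsung	✘✘	✔*	✘✘	N/A	N/A	N/A
Huawei	✔*	✔	N/A	N/A	N/A	N/A
Xiaomi	N/A	✘*	✘✘	✘✘	N/A	N/A
OPPO	N/A	✘	✘✘*	N/A	N/A	N/A
Vivo	✔*	✘	N/A	N/A	N/A	N/A
Honor	N/A	N/A	✘✘*	N/A	N/A	N/A

* This is the default keyboard on our test device
^† Both QQ Pinyin IME (QQ 输入法) and Sogou IME (搜狗输入法) are developed by Tencent. In this study we analyzed QQ Pinyin IME and found that it contains the same vulnerability we previously found in Sogou IME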

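Summary of Fixes

We reported the vulnerabilities we found to each vendor in accordance with our vulnerability disclosure policy. All vendors replied to us except Baidu, Vivo, and Xiaomi. Shortly after our report, Baidu fixed the most serious of the vulnerabilities but did not patch the rest. Several phone manufacturers pre-installed vulnerable keyboard apps; apart from the pre-installed Baidu IME, the phone manufacturers have now patched these vulnerabilities. As for the pre-installed Baidu IME, Honor did not patch any vulnerabilities at all, and the remaining manufacturers patched only some of the most serious ones. Regarding QQ Pinyin IME, Tencent previously stated (translated here from the Chinese rendering in this summary): “Excluding products that are no longer maintained, we plan to upgrade all active products that use EncryptWall (an encryption scheme) to HTTPS by the first quarter of [2024].” As of April 1, 2024, we have not found any fix released by Tencent for QQ Pinyin IME. Although QQ Pinyin IME is still available for download, Tencent has not updated it since 2020 and may already consider the product to be no longer maintained. For the content and timing of our communications with vendors, as well as other details, see our full report.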

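Legend
✘✘	Active and passive network eavesdroppers can decrypt encrypted keystrokes, and we successfully demonstrated this attack
✘	Active network eavesdroppers can decrypt encrypted keystrokes, and we successfully demonstrated this attack
!	Weaknesses exist in the encryption implementation
✔	No issues found
N/A	The product is not offered or not present on the devices we tested

Keyboard developer	Android	iOS	Windows
Tencent^†	✘	N/A	✘
Baidu	!	!	!
iFlyTek	✔	✔	✔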

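Pre-installed keyboard developers

Device manufacturer	Own	Sogou	Baidu	iFlyTek	iOS	Windows
Samsung	✔	✔*	!	N/A	N/A	N/A
Huawei	✔*	✔	N/A	N/A	N/A	N/A
Xiaomi	N/A	✔*	!	✔	N/A	N/A
OPPO	N/A	✔	!*	N/A	N/A	N/A
Vivo	✔*	✔	N/A	N/A	N/A	N/A
Honor	N/A	N/A	✘✘*	N/A	N/A	N/A

In summary, with the exception of Honor, the decryption attacks we found no longer work after the vendors’ fixes. In Baidu IME on phones from manufacturers other than Honor, encryption weaknesses persist, but we have not yet found a way to exploit them to decrypt users’ keystrokes in transit.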

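List of Affected Software

We recommend that all users keep their operating system and apps (including keyboards) up to date. If you use any of the software below, we strongly recommend that you check for and install the latest updates for the software and your operating system. As of April 1, 2024, updates are available for the following software that fix the security vulnerabilities we found.

Third-party keyboards not pre-installed with the operating system (manually installed):

Sogou IME / 搜狗输入法 for Android and Windows
Baidu IME / 百度输入法 for Android and Windows (this developer has not fully fixed the vulnerabilities we found; see below)
iFlyTek IME / 讯飞输入法 for Android

Pre-installed on Samsung’s China-edition operating system:

Samsung Keyboard
Baidu IME / 百度输入法

Pre-installed on Xiaomi’s China-edition operating system:

Sogou IME Xiaomi Version / 搜狗输入法小米版
iFlyTek IME Xiaomi Version / 讯飞输入法小米版

Pre-installed on OPPO’s China-edition operating system:

Sogou IME Custom Version / 搜狗输入法定制版

Pre-installed on Vivo’s China-edition operating system:

Sogou IME Custom Version / 搜狗输入法定制版

The following software still does not use TLS and may therefore still be vulnerable:

Third-party keyboards not pre-installed with the operating system (manually installed):

Baidu IME / 百度输入法 for Android, Windows, and iOS

Pre-installed on Xiaomi’s China-edition operating system:

Baidu IME Xiaomi Version / 百度输入法小米版

Pre-installed on OPPO’s China-edition operating system:

Baidu IME Custom Version / 百度输入法定制版

The following software contains unpatched vulnerabilities that attackers can easily exploit. We recommend that users switch to a different keyboard:

Third-party keyboards not pre-installed with the operating system (manually installed):

QQ Pinyin IME / QQ 输入法 for Android and Windows

Pre-installed on Honor’s China-edition operating system:

Baidu IME Honor Version / 百度输入法荣耀版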

綜合建議

給資安研究人員

資安研究人員應該多加研究東亞及其他熱門區域的手機應用程式生態系，即使這些區域並非研究人員原生的區域。
資安研究人員應發展更佳的動態及靜態分析方法，以利大規模尋找我們發現的此類型漏洞。
資安研究人員通報漏洞時應以開發者所在地區的常見語言寫出簡短摘要及郵件標題。

給應用程式商店

應用程式商店不應要求需註冊帳號才能下載安全性更新。
應用程式商店不應該根據地理位置阻擋安全性更新。
如同 Google Play
商店，其他應用程式商店應該提供方式讓開發者標示隱私和安全資訊，包含網路資料傳輸是否加密。
當開發者在應用程式商店中標示應用程式會加密所有傳輸資料時，應用程式商店應予顯示，當開發者並未如此標示時，應用程式商店亦應警告使用者。
應用程式商店應針對特定機敏類型的應用程式（例如輸入法）要求開發者保證所有傳輸資料均經加密，或保證不傳輸任何資料。

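For Keyboard Developers

Use well-tested, standard encrypted protocols such as TLS and QUIC.
Wherever possible, design features to work offline, without transmitting any sensitive data to cloud servers.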
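
As a minimal sketch of the first recommendation, the Python snippet below sends a request over TLS using only the standard library, with certificate and hostname verification left on. The function names, the host, and the path are hypothetical and chosen for illustration; they are not taken from any keyboard app or from the report.

```python
import http.client
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate and hostname checks on."""
    # create_default_context() requires a valid certificate, checks the
    # hostname, and loads the system's trusted CA certificates.
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def send_suggestion_request(host: str, path: str, body: bytes) -> bytes:
    """POST typed text to a (hypothetical) cloud suggestion endpoint over HTTPS."""
    conn = http.client.HTTPSConnection(host, context=make_tls_context(), timeout=10)
    try:
        conn.request(
            "POST",
            path,
            body=body,
            headers={"Content-Type": "application/octet-stream"},
        )
        return conn.getresponse().read()
    finally:
        conn.close()
```

A call would look like `send_suggestion_request("ime.example.com", "/suggest", b"nihao")`, where the host and path are placeholders. The design choice worth noting is that nothing here implements encryption by hand: the standard protocol and the platform's certificate store do the work.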

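For Mobile Operating System Developers

Like iOS, Android should sandbox keyboard apps to restrict their network transmission and other risky behavior, blocking it until the user explicitly allows it.
Android and iOS developers should design a better "network access" permission that makes it obvious to users whether an app transmits any data over the network.

For Phone Manufacturers

Audit the security of keyboard apps before integrating and preinstalling them in the operating system.

For Users

Users of Sogou, QQ, Baidu, and iFlytek keyboards, whether installed manually from an app store or preinstalled with the operating system, should make sure that both the keyboard and the operating system are kept up to date.
Privacy-conscious users should disable any cloud-based features in their keyboards.
Privacy-conscious iOS users should not enable "Allow Full Access" for their keyboards.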

Tap, Tap: A Series of Cloud-Based Keyboard Vulnerabilities Lets Network Attackers Monitor What Users Type (Summary)

Jeffrey Knockel — Tue, 23 Apr 2024 11:59:41 +0000

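Important: We recommend that all users immediately update their keyboard software and operating system. We also recommend that high-risk users stop using any cloud-based suggestion features offered by keyboards and switch to a fully offline keyboard to prevent data leakage.
This article is a translated summary of the full report.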

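Key Findings

We analyzed the security of popular cloud-based pinyin keyboard apps from nine vendors, including Baidu, Honor, Huawei, iFlytek, OPPO, Samsung, and Tencent, and examined whether the way they send users' typed input to the cloud contains security flaws.
Our analysis found that keyboard apps from eight of the nine vendors contain critical vulnerabilities that allowed us to fully break the encryption the vendors designed to protect users' typed input. Some vendors did not use any encryption at all to protect that input.
Combining this study with the Sogou IME vulnerabilities found in our earlier research, we estimate that up to one billion users are affected by these vulnerabilities. For the following reasons, we believe that users' typed input may already have been collected at scale:
- These vulnerabilities affect a broad user base
- The information users type on their keyboards is extremely sensitive
- Discovering these vulnerabilities does not require sophisticated techniques
- The Five Eyes have previously exploited similar vulnerabilities in Chinese apps to conduct surveillance
We reported these vulnerabilities to all nine vendors. Most took the issues seriously, responded, and patched the vulnerabilities, but a few keyboards remain unpatched.
At the end of the report, we offer recommendations to the parties affected by these vulnerabilities, in the hope of reducing the harm caused by similar vulnerabilities in the future.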

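Vulnerability Summary

Among the apps we tested from the nine vendors, only Huawei's products showed no security issues related to uploading users' typed input to the cloud. Every other vendor had at least one app with a vulnerability that allows a passive network attacker to monitor the full content of what users type.

Note: An active network eavesdropping attack is one in which the eavesdropper must actively send signals, for example by tampering with a few bits of data in transit, in order to break the encryption. Active eavesdropping is relatively easy to detect. A passive network eavesdropping attack requires sending no signals at all: simply reading the data in transit is enough to decrypt it. Compared with active attacks, passive eavesdropping is difficult to detect.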
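
The distinction can be illustrated with a toy example. The Python sketch below is not one of the attacks from the report and does not model any vendor's protocol; it assumes a made-up stream cipher built from SHA-256 purely to show what "tampering with a few bits in transit" means. A passive eavesdropper would only copy the ciphertext, while the active attacker in the hypothetical `tamper` function alters it, and the change goes unnoticed unless the receiver checks an integrity tag.

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. For illustration only."""
    blocks = []
    counter = 0
    while sum(len(b) for b in blocks) < length:
        blocks.append(hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest())
        counter += 1
    return b"".join(blocks)[:length]


def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing with the keystream (same operation both ways)."""
    stream = keystream(key, nonce, len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


def tamper(ciphertext: bytes, index: int, mask: int) -> bytes:
    """Active attacker: flip bits of one ciphertext byte while it is in transit."""
    modified = bytearray(ciphertext)
    modified[index] ^= mask
    return bytes(modified)


def make_tag(mac_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Integrity tag (HMAC-SHA256) that lets the receiver detect tampering."""
    return hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()


# Hypothetical keys and message, fixed so the example is reproducible.
enc_key, mac_key, nonce = b"k" * 32, b"m" * 32, b"n" * 12
ciphertext = xor_cipher(enc_key, nonce, b"ni hao")
tag = make_tag(mac_key, nonce, ciphertext)

# Flipping the lowest bit of the first byte changes "n" (0x6e) to "o" (0x6f).
forged = tamper(ciphertext, 0, 0x01)
received = xor_cipher(enc_key, nonce, forged)

# Without an integrity check the receiver accepts the altered text;
# comparing tags exposes the modification.
tag_matches = hmac.compare_digest(tag, make_tag(mac_key, nonce, forged))
print(received, tag_matches)  # b'oi hao' False
```

The point is only that an active attack requires changing traffic, which is what makes it comparatively easy to detect, whereas a passive attack leaves the traffic untouched.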

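Legend
✘✘	Both active and passive network eavesdroppers can decrypt encrypted user input, and we successfully demonstrated this
✘	Active network eavesdroppers can decrypt encrypted user input, and we successfully demonstrated this
!	Weaknesses exist in the implementation of the encryption
✔	No issues found
N/A	The product is not offered or not present on the devices we tested

Keyboard developer	Android	iOS	Windows
Tencent^†	✘	N/A	✘
Baidu	!	!	✘✘
iFlytek	✘✘	✔	✔

Preinstalled keyboard developer

Device manufacturer	Own	Sogou	Baidu	iFlytek	iOS	Windows
Samsung	✘✘	✔*	✘✘	N/A	N/A	N/A
Huawei	✔*	✔	N/A	N/A	N/A	N/A
Xiaomi	N/A	✘*	✘✘	✘✘	N/A	N/A
OPPO	N/A	✘	✘✘*	N/A	N/A	N/A
Vivo	✔*	✘	N/A	N/A	N/A	N/A
Honor	N/A	N/A	✘✘*	N/A	N/A	N/A

* Default keyboard on our test device
^† Both QQ Pinyin IME and Sogou IME are developed by Tencent. In this study we analyzed QQ Pinyin IME and found that it contains the same vulnerabilities we previously found in Sogou IME.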

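Patch Summary

We reported the vulnerabilities we found to each vendor in accordance with our vulnerability disclosure policy. All vendors replied to us except Baidu, Vivo, and Xiaomi. Shortly after we reported the vulnerabilities, Baidu fixed the most serious of them but left the rest unpatched. Several phone manufacturers preinstalled these vulnerable keyboard apps in their operating systems; apart from Baidu IME, the manufacturers have now patched these vulnerabilities in their preinstalled keyboards. As for preinstalled Baidu IME, Honor patched none of the vulnerabilities, and the other manufacturers patched only some of the most serious ones. For the content and timeline of our correspondence with vendors and other details, see our full report.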

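Legend
✘✘	Both active and passive network eavesdroppers can decrypt encrypted user input, and we successfully demonstrated this
✘	Active network eavesdroppers can decrypt encrypted user input, and we successfully demonstrated this
!	Weaknesses exist in the implementation of the encryption
✔	No issues found
N/A	The product is not offered or not present on the devices we tested

Keyboard developer	Android	iOS	Windows
Tencent^†	✘	N/A	✘
Baidu	!	!	!
iFlytek	✔	✔	✔

Preinstalled keyboard developer

Device manufacturer	Own	Sogou	Baidu	iFlytek	iOS	Windows
Samsung	✔	✔*	!	N/A	N/A	N/A
Huawei	✔*	✔	N/A	N/A	N/A	N/A
Xiaomi	N/A	✔*	!	✔	N/A	N/A
OPPO	N/A	✔	!*	N/A	N/A	N/A
Vivo	✔*	✔	N/A	N/A	N/A	N/A
Honor	N/A	N/A	✘✘*	N/A	N/A	N/A

* Default keyboard on our test device
^† Both QQ Pinyin IME and Sogou IME are developed by Tencent. In this study we analyzed QQ Pinyin IME and found that it contains the same vulnerabilities we previously found in Sogou IME.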

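In summary, the decryption attacks we found no longer work after the vendors' patches, except on Honor devices. Encryption weaknesses persist in Baidu IME on devices other than Honor's, but we have not yet found a way to exploit them to decrypt user input in transit.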

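Affected Software

We recommend that all users upgrade their operating system and apps (including keyboards) to the latest version. If you use any of the software below, we strongly recommend checking for and installing the latest patches for both the software and the operating system. As of April 1, 2024, patches are available for the software below that fix the vulnerabilities we found.

Third-party keyboards not preinstalled with the operating system (installed manually):

Sogou IME / 搜狗输入法 for Android and Windows
Baidu IME / 百度输入法 for Android and Windows (this developer has not fully patched the vulnerabilities we found; see below)
iFlyTek IME / 讯飞输入法 for Android

Keyboards preinstalled on Samsung's China-edition operating system:

Samsung Keyboard
Baidu IME / 百度输入法

Keyboards preinstalled on Xiaomi's China-edition operating system:

Sogou IME Xiaomi Version / 搜狗输入法小米版
iFlyTek IME Xiaomi Version / 讯飞输入法小米版

Keyboards preinstalled on OPPO's China-edition operating system:

Sogou IME Custom Version / 搜狗输入法定制版

Keyboards preinstalled on Vivo's China-edition operating system:

Sogou IME Custom Version / 搜狗输入法定制版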

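The following software still does not use the TLS encryption protocol and may therefore remain vulnerable:

Third-party keyboards not preinstalled with the operating system (installed manually):

Baidu IME / 百度输入法 for Android, Windows, and iOS

Keyboards preinstalled on Xiaomi's China-edition operating system:

Baidu IME Xiaomi Version / 百度输入法小米版

Keyboards preinstalled on OPPO's China-edition operating system:

Baidu IME Custom Version / 百度输入法定制版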

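The following software contains unpatched vulnerabilities that attackers can easily exploit. We recommend that users switch to a different keyboard:

Third-party keyboards not preinstalled with the operating system (installed manually):

QQ Pinyin IME / QQ 输入法 for Android and Windows

Keyboards preinstalled on Honor's China-edition operating system:

Baidu IME Honor Version / 百度输入法荣耀版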

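Recommendations

For Security Researchers

Security researchers should devote more study to the mobile app ecosystems of East Asia and other popular regions, even when those regions are not the researchers' own.
Security researchers should further develop dynamic and static analysis methods for finding, at scale, the class of vulnerability found in this study.
When reporting vulnerabilities, security researchers should write a short summary and the email subject line in a language commonly used in the developer's region.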

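For App Stores

App stores should not require account registration in order to download security patches.
App stores should not block security patches based on a device's geographic location.
Like the Google Play Store, other app stores should give developers a way to declare privacy and security information, including whether data transmitted over the network is encrypted.
When a developer declares in the app store that an app encrypts all transmitted data, the app store should display this; when a developer makes no such declaration, the app store should warn users.
For particularly sensitive categories of apps, such as keyboards, app stores should require developers to attest that all transmitted data is encrypted or that no data is uploaded at all.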

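For Keyboard Developers

Use well-tested, standard encrypted communication protocols such as TLS and QUIC.
Wherever possible, design features to work offline, without uploading any sensitive data to cloud servers.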
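
To make the offline recommendation concrete, here is a minimal Python sketch of a suggestion lookup that never touches the network. The tiny dictionary and the `suggest_offline` function are hypothetical and exist only for illustration; a real keyboard would ship a far larger on-device dictionary and language model.

```python
from typing import Dict, List

# Hypothetical on-device dictionary mapping pinyin to candidate words.
LOCAL_DICTIONARY: Dict[str, List[str]] = {
    "nihao": ["你好"],
    "shuru": ["输入"],
    "anquan": ["安全"],
}


def suggest_offline(
    pinyin: str,
    dictionary: Dict[str, List[str]] = LOCAL_DICTIONARY,
    limit: int = 5,
) -> List[str]:
    """Return candidates from local data only; no typed text leaves the device."""
    # Normalize the typed pinyin: trim, lowercase, and drop spaces.
    key = pinyin.strip().lower().replace(" ", "")
    # Unknown input yields an empty list instead of falling back to a server.
    return list(dictionary.get(key, []))[:limit]


print(suggest_offline("Ni Hao"))  # ['你好']
```

Because the lookup has no network fallback, an unknown input simply returns no candidates, so typed text cannot leak even when the local dictionary falls short.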

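For Mobile Operating System Developers

Like iOS, Android should use sandboxing to restrict keyboard apps' network transmission and other risky behavior, blocking it until the user explicitly allows it.
Android and iOS developers should design a better "network access" permission that makes it obvious to users whether an app transmits any data over the network.

For Smartphone Manufacturers

Audit the security of keyboard apps before integrating and preinstalling them in the operating system.

For Users

Users of Sogou, QQ, Baidu, and iFlytek keyboards, whether installed manually from an app store or preinstalled with the operating system, should make sure that both the keyboard and the operating system are kept up to date.
Privacy-conscious users should disable any cloud-based features in their keyboards.
Privacy-conscious iOS users should not enable "Allow Full Access" for their keyboards.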