An NLP Analysis of the Sinocism Newsletter

Last updated on Jul 3, 2019 8 min read

Two and a half years ago I subscribed to Bill Bishop’s Sinocism Newsletter to aggregate and streamline my news consumption on China and Sino-US relations. I don’t regret it. Through Bill’s newsletter, I came upon another newsletter, ChinaAI, which is the work of Jeffrey Ding from the Future of Humanity Institute at Oxford. Both are great supplements, but have come at odds with each other recently.

In response to the increasingly conflictual trend in US-China relations, Bill’s editorializing has taken on a harder, more cynical (the newsletter is not called “Sinocism” for nothing) edge while Jeff has been a voice for moderation. In particular, he has called on the China watching community to do a better job lifting up diverse voices into gatekeeping positions. In that issue, Jeff also made a general call asking someone perform an text analysis of Sinocism to explore trends in general sentiment over time, different memes—New Cold War, AI, tech war, etc.—content, and so on.

This post is a response to that call. As a longtime reader of Sinocism, I felt I was in a good position to combine the surface-level insights from a text analysis with my personal “domain” knowledge. Also, it was a good opportunity to get my hands dirty with some different NLP libraries. Below I provide a quick overview of my methods and discuss the results. In future posts, I will dig into the R and Python side of things.

Approach

To get the emails, I requested them from Google’s Takeout service. For the uninitiated, it’s a great way to archive data from your Google account(s). Google bundles the requested data and sends it to you in an email. Upon receipt, I converted them from .mbox format to HTML and began parsing away. I broke the text up into three categories:

Introductory Remarks
In-text Commentary
Entire Newsletter

Introductory remarks refers to Bill’s headline commentary. In-text commentary refers to his commentary on specific articles. Entire Newsletter is all of the paragraphs including summaries of the news articles and his commentary.

I chose this breakdown because Bill has altered the newsletter’s formatting multiple times since its inception. Certain aspects are fixtures like the ‘Essential Eight’, a set of 8 articles he’s rank as most important for the day, and the content headers. Others have changed a fair amount.

Of those features that have changed, the most relevant for an NLP analysis is his technique for commenting on the news articles. At first, he explicitly set-off his commentary with // Comment: or Comment: and most of this was in the body of the newsletter. Over time he has slowly shifted his commentary to the head of the newsletter and away from the body.

Now when he makes in-text comments, they are rarely explicitly set off apart from the use of bold typeface. The result? It is less easily traceable with regex or HTML tags. While this may be more aesthetically streamlined, it makes my job harder. I did my best to isolate the in-text commentary, but there is indeed a fair amount of spillover from the article summaries, especially in more recent issues.

In my unscientific estimate, in-text commentary picks up about 70% Bill’s thoughts and 30% news content. The introductory remarks are exclusively Bill’s opinions. The entire newsletter content is a decent proxy for trends in China as well as Sino-US relations, a thermometer so to speak, but should not be taken as unbiased. Much of Bill’s editorializing is via selection. What articles and topics he chooses to include, how much attention he gives them, and where they are placed all are subtle, yet important forms of editorializing.

A quick and dirty bag-of-words text analysis as done here won’t pick up on these subtleties, but it does provide food for thought.

Findings

The analysis proceeds from general to specific. I start by exploring the commentary length over time. Then I dig into the most common unigram and bigrams. Afterwards, I briefly look at two memes that are increasingly prevalent in news about China: artificial intelligence and New Cold War. I conclude with a temporal sentiment analysis using Lexicoder Sentiment Dictionary (Young and Soroka 2012), a canned dictionary that comes with quanteda.

Commentary Length

I first provide a sense of Bill’s verbosity over time. To do so, I tokenized the texts and took a count, grouping by month and each type of commentary. The figure below plots the results. As you can see, the introductory remarks feature appears in November 2017 and slowly grows in magnitude until May 2018, when it surpasses the in-text commentary.

From my daily reading, I can tell you that the in-text commentary continues to grow sparser and Bill more laconic therein. Some of the residual token counts in this category are due to spillover from my roughshod classification approach combined with the fluid nature of his formatting.

Another thing you will notice is that both types of commentary tend to trend together. Some months Bill is simply less verbose, and the newsletter acts more like a simple news aggregator. The correlation between counts is also due to Bill taking a hiatus for different personal reasons.

Top Words and Phrases

Next, I wanted to get a sense of whether and how closely the content of each type of commentary aligns with the entire text. I did this in two ways. The first way is by comparing a simple count of the top 50 terms by frequency faceted by the three categories. The results are below.

Unsurprisingly, terms like China, US, Trump, Xi, Beijing, PRC dominate the top 5 slots across all three categories. A quick scan of the in-text commentary suggests that it has mostly focused on insider knowledge/rumours on the trade talks and the Party Congress/other party meetings. The introductory commentary is almost exclusively dominated by the trade war, but only because this has so heavily colored bilateral relations.

Only one chinese character, 的 (de) a possesive particle, makes it on the list. If I had utilized a Mandarin dictionary, it probably would have been removed as a stopword. On a side note, an interesting future analysis would be to focus explicitly which Chinese articles Bill includes, their content, and the top news outlets he taps apart from the People’s Daily.

While the single tokens might make for a nice plot, they are not that informative. So the second way I explore the congruency between commentary and content is to use spaCy’s pretrained part-of-speech (POS) tagger and extract the top 50 noun phrases, dropping collateral damage caused by the Mandarin. The figure below presents the findings.

The top 5-10 phrases closely mirror the results from the unigram approach, but slightly more proper names and places making the cut, e.g. Wang Qishan, Guo Wengui, Liu He, etc. Moving down the list, things become more interesting. Even though overall counts tend to drop off, we can start to see a divergence in what issues Bill pays attention to vis-a-vis the overall content.

Looking at the entire newsletter, we see the phrases are well-distributed across multiple topics. For instance, local_government is probably a result of the PRC’s focus on addressing local government debt issues. This topic was especially pronounced throughout 2017, but has since taken a backseat to the trade war. Lower on the list are phrases like national_security, a political staple; artificial_intelligence which is increasingly in the spotlight; foreign_investment, a sticking point in the trade talks; peking_university which has undergone a significant change in leadership and been engrossd in a few scandals; new_zealand, an epicenter for news on China’s United Front influence operations; discipline_inspection in relation to Xi’s corruption campaign; and environmental_protection, a nagging problem in China.

Moving to the in-text comentary, we see Bill’s focus centers more on high-level Chinese party officials and other notable figures, the Trump administration and its members, and “insider knowledge,” so to speak. These comments aim to provide deeper context and Bill’s personal read on the private dealings of both governments. This is not evident from the phrase counts alone, but my personal experience tells me so. Phrases such as ‘another_reminder’ and ‘another_sign’ usually precede clauses like ‘of Xi’s increasing consolidation of power’ and ‘that neither side is likely to make any concessions soon.’ The tailored nature of this commentary is intuitive given that it is often in response to one or more news specific news articles.

Finishing with the introductory remarks, three topics dominate apart from administrativia: the trade war, US-DPRK nuclear negotiations, and Sino-US technology competition. The first and third topics could be nested under a broader topic of Sino-US competition. Bill and many in the Beltway have framed these two issues as part and parcel of a ‘New Cold War’ (Elsewhere, I dubbed this meme ‘Cold War 2’ or CW2 for short). However, I think they are worthy of a distinction. Bill usually conveys his personal opinion on trends in each of these areas and occasionally offers predictions or things to look for on the horizon.

Memes

News articles often bundle coverage of tech advances—especially in areas like machine and deep learning, cloud computing, UAVs, and the hardware necessary to power these algorithm—under the umbrella of artificial intelligence. Typical coverage of the tech industry in China is no exception. While I believe this oversimplication gives the misleading impression of AI as an monolith and further obfuscates important distinctions, it is the norm among non-specialist journalists. Thus, despite my personal convictions, I employ just the term “artificial intelligence” to search for instances of the AI meme in the newsletter. CW2 allusions tend to be slightly more nuanced, sometimes not referencing the term “Cold War”" at all. I therefore represent this meme with the phrases “cold war,” “containment,” “tech war,” and “arms race.” I then used glob wildcard matching to trace the presence of these memes in each issue of the newsletter.

Juxtaposing the two types of commentary with the entire newsletter, one immediate result stands out. Although I have felt Bill increasingly employs Cold War-esque frames in his commentary, the data does not support this sentiment. The last time the topic appears is in September 2018. One conjecture is that the CW2 meme has become more prevalent in the news articles Bill chooses to feature but not his commentary. Another, more plausible, conjecture is that my hasty parsing of the text coupled with an under-defined set of terms caused me to miss more recent appearances of the meme in his liner notes.

The second result is the rapid growth in the CW2 meme within the general media since the end of 2017. While AI has been a fixture in coverage of China, CW2 is a relatively newer beast. I consider Campbell and Ratner’s “The China Reckoning” piece in Foreign Affairs to be one of the defining events that touched off the so-called “China Consensus” in DC. The plot confers some descriptive legitimacy to my hypothesis. In the two months prior to this piece CW2 appears 10 times in the newsletter, and 17 times in the two months after.

I am not implying a causal relationship here. Their piece could easily be considered a symptom of the shift rather than a harbinger. I merely believe this publication is a useful heuristic to mark the structural shift in Sino-Us relations.

Emotional Sentiment

Ideally, one should tailor a sentiment dictionary to corpus under study, but I haven’t the time for that. As an alternative, I use quanteda’s canned sentiment dictionary. While I am generally skeptical of the validity of a sentiment analysis, especially when using a canned dictionary, it can provide some exploratory utility. Below I plot the Positive-Negative ratio (valence) of the newsletter over time. The difference is normalized by the total number of tokens in each category per issue. I again demarcate the Campbell and Ratner piece’s publication date.

Based on this dictionary, the entire newsletter usually sits in the 2-5 percent negative valence range, with occasional positive spikes. The in-text commenatary is even more negative, with some issues containing over 20% negative tokens. However, the in-text commentary does appear to have more positive content than either the entire newsletter or the introductory remarks. I explore this more below. The intro remarks display a similar negative lean.

Does this tell us anything we don’t already know about Bill’s editorializing? Not really. Bill named his newsletter Sinocism as a play on cynicism for a reason. That alone should have sufficiently conditioned our expectations.

What about in regards to the entire newsletter which I believe is a useful barometer for bilateral relations. Again, the sentiment analysis turns up few surprises. As an interlocutor in the DoD aptly stated the other day, “… we went from cooperating for cooperation’s sake to where we are now, competing for competition’s sake.” I’d be surprised if the relative valences looked any different.

The one finding that stands out is how much the in-text valence vacillates in recent months and, in particular, the sheer number of positive issues. To look into this further, I randomly sample four issues from after December 2018, two with negative valences and two positive. Let’s take a look.

Positive
2019-05-06	2019-06-19
It takes a while for the propaganda department to get its orders, especially after a holiday and a weekend www.pbc.gov.cn/goutongjiaoliu/113456/113469/3820253/index.html ).“All the funds will be loaned to private and small companies,” the statement said…	And we have the annual liquidity pressures around June 30 coming up as well.
The market has become increasingly concerned with potential policy tightening since mid-April, after a central bank monetary policy committee meeting and a gathering of the Politburo, the group of the ruling Communist Party’s 25 top leaders, both indicated a “neutralization” of the monetary policy stance, economists with investment bank China International Capital Corp. said in a note.	Could be a very stressful 12 days in the banking system Zhao Kezhi, minister of public security wrote a long op-ed on People’s Daily, urging the police troops to stay “absolute loyal” to Xi Jinping. "
However, “the RRR cut this morning has eased these concerns.	On this fundamental issue relevant to our flag, our direction, and our path, our mind must be very clear, our position must be very firm, and our attitude must be very clear-cut."
It is clear that the overall monetary policy stance is unlikely to revert to a ‘de facto’ tightening mode we experience in 1H2018," they wrote.	He also talked about how the police must work hard to safeguard the CCP regime, and “severely crackdown all kinds of infiltration and subversion activities, violent and terrorist activities and radical religious activities”.
In a note this morning Capital Economics wrote that “The PBOC is requiring that all lending funded by the cut go to small and private firms.	郭声琨表示，在以习近平同志为核心的党中央坚强领导下，在习近平新时代中国特色社会主义思想指引下，中国创造了经济持续健康发展、社会持续安全稳定“两大奇迹”。中国愿与世界各国分享安全治理的经验，也愿与各方携手应对各种问题和挑战。我们应更加严厉地打击恐怖主义，坚决维护人类社会的安全底线；更加有效地维护网络安全，共同守护清朗健康的网络空间；更加深入地打击跨国犯罪，努力保护各国人民的安全和福祉。
Our calculations suggest that the net RMB280bn liquidity injection can be used to expand bank lending by RMB3500bn, enough to boost the stock of outstanding loans to small firms by 10% and overall bank loans by 5%."	I thought CCTV anchors were banned from making private appearances?
Airbnb also in the news in China because of the discovery of a hidden camera in a Qingdao listing.
Airbnb has apologized - Airbnb致歉青岛民宿装摄像头：永久撤出涉事房源 Abridged translation of latest Caixin cover story 海南去地产化风暴新建海景房被推倒 By the US embassy in the new embassy district.
Negative
This deployed all over Beijing or just around embassies?
It must be much harder for foreign intel services in operate in China now, and I doubt Chen Guangcheng could have been brought into the US embassy now
2019-03-04	2019-03-26
Xiang Songzuo is the economist who claimed in December that 2018 GDP growth was actually 67% or lower There may not be much new in Xi’s message	And this may be the ultimate irony of the pressure on China: If it agrees to everything the US wants it may ultimately strengthen the PRC economy and make it an even richer and more formidable competitor to the US A sign of the times.
but it is a useful reiteration of his views toward the cultural and social science sectors, and another sign that there is little reason to expect any near-term loosening in information controls, and many reasons to expect them to tighten…	Anyone know who is funding it?
Obviously no coincidence with the timing of this official report, just days after Canada rules the Meng Wanzhou extradition process will proceed.	More conformation to Xi that he can believe nothing anyone in the US government other than Trump tells him
I know Xi is comfortable with contradictions but the circumstances around detention of these two makes it hard to trust the promises of better treatment for foreign investors and foreigners…
Blatant

These samples lend further credence to my skepticism of canned dictionaries. The sentences from the relatively positive samples do not evince positivity. If anything, there seems to be a fair amount of cynicism throughout the May 6, 2019 newsletter. What’s more, Chinese propaganda is peppered about the June 19, 2019 newsletter which is probably driving the relative positive valence. A sample size of 2 is not nearly large enough to be representative, but given my hasty data cleaning, I’m confident the sentiment analysis is mainly noise.

Conclusion

The text analysis did not uncover anything counter-intuitive to long time readers of Sinocism. For those who do not watch China closely, the results provide a good window into Sino-Us relations over the past couple years and flesh out Bill’s editorial priorities.

The most intriguing finding, in my view, is that mentions of the CW2 and AI memes are extremely scant in the commentary, despite their consistence prevalence in the news. This does not conform to my intuition gained by reading the newsletter daily. Either my perception of the newsletter is biased or the bulk of Bill’s editorializing is structural, as opposed to verbal, in nature. The reality is likely some combination of both these factors.

It is also interesting to note that the AI meme predates CW2 and only more recently has the latter started to engulf the former. These two memes are already exceptionally reductionist and need not be further merged into a “New AI-fueled, Tech Cold War” meme.

In coming weeks, I will walk through my code for this post step-by-step, breaking it into multiple installments. If anyone has specific requests for some type of NLP or otherwise they would like to see done on this data, let me know. I’ll do my best to work it in.

I normally adhere to an open data policy, but the newsletter is behind a paywall. I don’t think Bill would be happy about me sharing 2.5 years of archived issues. If anyone is interested in doing their own analysis, contact me and we’ll see if we can work out a data format that protects Bill’s intellectual property.

Thanks for reading.