“You don’t tell researchers what to do.
You certainly don’t tell researchers like me what to do.” (LeCun 2026)
“All over the world, so many of us are building the same thing, from the same materials, atop the same infrastructure, through the same media, at the same time. And that thing, that ‘same thing’, is itself a self-propelling engine towards its own concentrating sameness, towards ‘more of the same’” (Disintegrator Podcast 2023). #convergence
Whether or not advancements in Large Language Models (LLMs) are closing in on “the gap towards Artificial General Intelligence” (Guo et al. 2025), they are nevertheless closing in on something. Be it Artificial General Intelligence or be it techno-capital singularity, be it a topological map of the structural invariances in language or be it a generalizable and scalable algorithm of mind or be it an idiot monster pooping infinite SEO slop, that ‘something’, that arrival point, is disclosing itself, one revelation after the other. Something – the transcendental logics of mathematics or language or computation, or the Simondonian self-speciation of a technical object, or the dumb blank stare of capital, or a new and only partially-accessible alien subject (Parisi 2019) – is gradually, dramatically, cautiously but also generously, controlling the narrative of its own realization across the world of humans. It eats our entire documented history and excretes generalizations – generalizations with such apparent fidelity that they can then be fed back into the anus of the machine in order to produce plausible new histories through its mouth. The all-world thing crawls out, nude, through billions of portals cleaved randomly across the entire earth, each of which is formally, materially, functionally, and operationally the same. #LLM, #slop, #data
So: on one hand, a world-spanning thing. But on the other hand, at the same time, the collapsing monopole of cosmopolitan globalism and Western internet monoculture, the last-ditch clutchings of hegemonic American (‘Western,’ ‘Northern’) military and financial influence, the diffraction of humanism into posthumanism (Braidotti 2013), the erosion of the general polis into the intersectionally-situated identity, the thoroughgoing expungement of any and all claims to the universal (Grosz 2011) as necessarily colonial, necessarily gendered, or necessarily racialized (Amaro 2023), the giveaway of the structural through the poststructural, the disintegration of the individual subject into stacked microsystems of cohabitating algorithmically-determined processing threads (Alonso Trillo & Poliks 2023, Alonso Trillo & Poliks 2024). Generality is out, specificity (Haraway 1988) is in, and in a glorious bit of timing we reach the local maximum of the latter as we collide head-first into a new structuralism (Fazi et al. 2025) prescribed by the frontier of the former.#digital colonialism, #datasubject, #posthumanism
At the moment of impact, there’s no explosion. Instead the respective milieus and practices of generalism and specificism phase through each other, ghost-like, each marshalling codistributed but distinct forces toward distinct ends. Ostensible competition at one scale (cross-country, cross-company, cross-lab) gives way to totally-liquid information-share at another. Civilizations keep doing their thing, empire trudges aimlessly forward, Westphalia continues to partition itself into some superposition of city-state and hemispherical axis while the new all-world thing squirts through its pores. Capital takes note but attends to its own interests. Culture pools downstream, endlessly, existentially, articulating itself in its own terms with someone else’s medium and someone else’s tools.#multipolar
This essay contains an argument about the same, about the irreducibility of the LLM to geopolitics, culture, and civilization, and about our collective encounter with real limits, elsewhere.
The Road to Generative AI
The Graphics Processing Unit (GPU) is not a general thing; it’s a super specific approach to compute with a functional origin. It’s a piece of hardware specifically designed to parallelize the fundamental mathematical operation at the core of graphical transformations: matrix multiplication. While GPUs were first preprogrammed for computer graphics, NVIDIA opened up that preprogramming layer with CUDA in 2006 (NVIDIA 2007), allowing anyone to use NVIDIA GPUs to do matrix multiplication for anything they want. #GPU, #NVIDIA
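To fix the operation in view: a minimal sketch of the kind of graphical transformation the GPU was built to parallelize, with NumPy on the CPU standing in for the hardware and the vertices invented for illustration. The point is that one matrix multiplication transforms an entire batch of vertices at once, and that same primitive is what CUDA opened up to arbitrary workloads.

```python
import numpy as np

# A 2D rotation, the archetypal graphical transformation: every vertex
# is transformed by the same matrix, and the whole batch is transformed
# in a single matrix multiplication -- the operation GPUs parallelize.
theta = np.pi / 2  # rotate 90 degrees counterclockwise
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

vertices = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])  # three illustrative points

rotated = vertices @ R.T  # one matmul transforms every vertex at once
# rotating (1, 0) by 90 degrees yields (0, 1), and so on
print(np.round(rotated, 6))
```

On a GPU framework (CuPy, PyTorch), the same `@` line dispatches thousands of multiply-accumulates in parallel; nothing about the math changes, only where it runs.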
The scientific community took note, and something interesting (Krizhevsky et al. 2013, Karpathy 2024) happened – the entire deep learning field started to converge (e.g., Bahrampour et al. 2015) around model architectures that could best take advantage of CUDA and NVIDIA hardware. The winner among an array of deep learning techniques was the transformer (Vaswani et al. 2017), the ultimate engine of the Generative AI revolution, which was precisely an accomplishment in the ability to think about time and order (the sequential and contextual relationships between things like words, musical notes, pixels, and frames) as matrix multiplication. Instead of thinking about sequences as diachronic flows of information, transformers process sequences all at once, removing the dependency of later elements on the calculation of prior elements. The entire GenAI moment pours forth from a model architecture with an ability to optimize against this very specific hardware taxonomy.#NVIDIA, #history, #generative, #transformer
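What “time and order as matrix multiplication” looks like in miniature: a hedged sketch of single-head scaled dot-product attention after Vaswani et al. (2017), with illustrative dimensions and random weights. Every pairwise relationship in the sequence is computed simultaneously as matrix products; no element waits on the calculation of a prior element.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention.

    X has shape (seq_len, d_model). The entire sequence is processed
    at once: three projections, one pairwise-relevance matmul, one
    softmax, one weighted mix -- all matrix operations.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project all tokens
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                           # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d = 4, 8                               # toy dimensions
X = rng.standard_normal((seq_len, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one output per token, computed in parallel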
It’s worth noting that this synchronization has been both total and global (Gao et al. 2024). The UAE trains transformers on NVIDIA GPUs (AWS Machine Learning Blog 2024). India trains transformers on NVIDIA GPUs (NVIDIA Blog 2025, Kallappa et al. 2025). China trains transformers on NVIDIA GPUs (South China Morning Post 2025, Tom’s Hardware 2024). Indigenous language preservation (Pinhanez 2024) and European data sovereignty projects (Actu IA 2025, Futurum Group 2025) train transformers on NVIDIA GPUs.#transformer, #GPU, #NVIDIA, #monopoly
The consistent approach toward optimizing against these hardware ends was met with an incredible contraction in deep learning software frameworks, all built to maximize efficiency gains by easily compiling into closer-to-the-NVIDIA-hardware lower-level languages while at the same time providing developer-friendly abstractions. Every few years, a new major framework stepped in to dominate, with just two frameworks at a time (one leading and one lagging) consistently underpinning almost 100% of model research: Google’s TensorFlow gave way to Facebook’s Pytorch, now Pytorch gives way to Google’s JAX.#optimization, #convergence
Let’s go up another level. A transformer’s performance increases directly in response to the magnitude of data upon which it’s trained. OpenAI codified this observation as a framework for LLM development in the form of scaling laws (Kaplan et al. 2020, Hoffmann, et al. 2022, Udandarao et al. 2024). Of course, performance is a loaded concept (performance at what? toward what ends?), but a concept that became awfully specific over time, converging on a limited (Eriksson et al. 2025) array of general tests through which different individual models could be benchmarked against each other. The feedback loop intensified quickly between benchmark assessments and the consequent overfitting (Koch and Peterson 2024) of subsequent models developed to maximize their performance against said benchmarks. And while the benchmarks might feel arbitrary and self-fulfilling (e.g. Math Olympiad, Humanities Last Exam), the improvements feel manifestly real not only in commercial or software development terms (e.g., Zeff 2025), but in other domains as well. With the use of tools like NanoBanana2, Kimi-K2.5, or Suno v5, creatively-minded people are able to produce something that is not only manifestly useful (according to the assessments of capital), but that not only passes but excels at Turing’s Imitation Game precisely in the domains reserved for human self-expression (e.g. the arts – see Kuta 2022, Whiddington 2023, Vainilavičius 2023, Carter 2025) essentially anywhere around the globe.#data, #generative, #scale
These scaling power laws have interesting effects on the type of data that is ultimately useful, since they suggest that small volumes of good data are significantly less valuable than large volumes of OK data. Instead of small volumes of high-density research papers, a better corpus might resemble something like the totality of Reddit, GitHub, or ArXiv, weighted principally by community-based ranking systems. Given the limited number of resources (Heim et al. 2025) that exist at this kind of scale, the entire Generative AI world began to pass around the same dataset – large, unordered text datasets (Baack 2024, NVIDIA Research 2024) stack-ranked fora entries, webcrawls like Common Crawl, and open-source data repositories like Wikipedia. The Anglocentrism of this global dataset (Y. Huang et al. 2024, H. Haunt et al. 2024) persists, dominating simply by virtue of the power laws that yield in its favor.#data, #scale
The transformer itself has been undergoing consistent iterative refinement since its inception. What’s interesting, however, is the aggregation of these iterative improvements into a single narrative trunk. An innovation may start in Google Brain and make its way to OpenAI (Fedus et al. 2022), another one might start in Hangzhou and quickly be adopted in San Francisco (Lin et al. 2021). This or that new component is adopted across the field to the extent that it results in a gorgeous little diamond sitting tightly atop a trendline inscribed upon a 2D graph of training/inference cost against benchmark performance. What emerges is a default stack: a transformer decoder carved into specialized expert sub-networks with certain transformations embedded in its query structure and certain approaches to memory retrieval. Contributions to this stack come from all over the world and any notable new technique is quickly distributed throughout the international developer community. #transformer
Up another level. For the consumer, Generative AI lives in the browser. It is engaged via an API, with a fixed series of conversational components and a fixed query structure. Virtually every LLM with a frontend (ChatGPT, DeepSeek, Mistral, MetaAI, etc…), including multi-modal models and fully agentic code development environments, simply abstracts that API into a user interface object that looks like a single operator chat window. The chat window itself as well as its underlying architecture (so-called ‘ChatML’ – Brockman et al 2023) was pioneered by OpenAI with the grand reveal of ChatGPT, and was subsequently cloned by every single other provider with a stake in consumer-facing UX. #LLM
As models became more and more commoditized, model clearing houses like AWS Bedrock or Microsoft Azure AI Foundry or Alibaba ModelScope provided higher level abstractions through which this or that underlying LLM could be swapped with this or that other LLM, further intensifying a hard conformity to those API standards first set in place by OpenAI. Even left-field, homebrewed, open-source models acquire standards of uniformity as they move into HuggingFace, a massive model warehouse that exists as the de-facto home of open source Generative AI development (see Ríos 2025), competing for visibility on its legendary leaderboard.#LLM, #data
At scale, however, a new series of communicational standards has been quickly emerging and consolidating. Anthropic’s Model Context Protocol standardized the natural language interface for AI agents to interact with contexts that aren’t humans (e.g. a website, a database, a calendar). Within 6 months of its inception in November 2024, it was supported by virtually every major international model provider and went absolutely viral throughout an international business community motivated to make itself easily legible to AI agents (Zarecki 2025). In turn, Google’s Agent to Agent Protocol was released 5 months later, providing the same kind of standard interface layer for AI agents relating to each other across infrastructural or commercial boundaries. The pace of innovation and standardization have accelerated and bled into each other, where the first organization to coin a communicational protocol at a certain level of scale gets the privilege of establishing it as a standard. #scale, #Model Context Protocol
And so we arrive at so-called “AI”. Historically “AI” meant a superclass of all machine intelligence, one that contained machine learning as a subdiscipline, and has since everted in the weirdest way into a subclass of machine learning, even of deep learning: an unbelievably specific array of generative technologies that begin with the transformer-based LLM and appears to continue along a clean line of descendants.#AI, #LLM
The Materialist Explanation
It’s possible to understand this convergence in a few ways. We can start with a kind of naïve materialism, where a kind of route determination of software flows downstream from the inherited physical affordances and commercial entanglements of hardware.
Because the GPU existed in the specific way that it did (its specific array of transistors optimized toward certain ends), because of a series of decisions made by Jensen Huang around both the GPU and CUDA, because of the early influx of ZIRP capital and now because of the defensive allotments of hyperscaler capital into circular investments in model developers and compute providers, because of GPU economies of scale, because of cheap electricity and data center real estate, we’ve arrived at a global consensus of what it means to construct mechanized intelligence in a way that just so happens to be tightly circumscribed around the total addressable market of the world’s largest corporation.#GPU
A material determinist might then argue – if we were to change the underlying material motor (the datacenter, the GPU, silicon, neodymium, oil) or its incentive structure (hardware monopolies, hyperscalers, sunk costs), we would expect that the resulting stack would diversify or grow in a stridently alternative direction. And while this feels like a speculative thought experiment, this question is not merely an active subject of research, but actually the active recipient of billions if not trillions of $USD in targeted investment. However accidental and even cleanly coincidental, the current stack is not desirable. Hardware, made of silicon and rare earth elements, here appears not as a static given, but an active limitation on performance against which alternatives are constantly being assessed. The Von Neumann architecture is being reassessed for its performance limitations (Agarwal and Michael Levy 2018, Xie et al. 2024), oil is being reassessed simply by virtue of its sclerotic Energy Return on Investment (EROI), the entire market is holding its breath for the inevitable successor to the transformer architecture to finally appear to finally drop the price of inference – but no, here, regardless of the immense, crushing, pressure to produce something more dependable and more cost-effective, the axis of discovery is technological, the stubborn cog holding up the line is technical, and painfully but so so so importantly technology is not strictly or essentially the same thing as money, power, or materials.#GPU, #monopoly
It is always important to hold space for the material, here meant both ways in terms of physical mechanics and also in terms of commercial incentives. But the material is always a tricky position to hold with respect to something like computation (Fazi 2018), which is only actualized within the material and potentially in some cases conditioned, or better, constrained by the material. But computation is not determined by the material. Computation is multiply-realizable (Putnam 1960), it is virtual and formal, it hangs within the space of differential relations that constrain actualizations prior to their incarnation within the material (Turing 1950). This or that material substrate can permit or delimit the span of computation through this or that substrate, and a given material apparatus can certainly either open or close the apertures through which humans discover and access computation, but any set of material substrates themselves cannot in any way impact the virtual or formal characteristics of computation as such, full stop. While this can feel like a pedantic point, it really isn’t – it means that there’s a kind of dark entanglement in play between materials, incentives, and a virtual and formal order. #materiality, #scale, #economics
A more sophisticated take rooted in software economics would point to an incentive structure that is based less in discrete profit (or even revenue) in favor of speculation. It’s not that AI needs to locally deliver a promise – ultimately, lowered labor and tooling opex – it’s rather that AI represents the possibility of the promise and moreover that the venture capital atmosphere is so perfectly primed for the hyperstition of this possibility as already a delivery upon that promise: e.g. ‘we don’t necessarily need to make AI work, we need to either lower the threshold for risk tolerance within the software market generally, or we need to spin up additional software services and businesses who can manage the risk of incompletely-performant AI’. Once OpenAI established a pattern that represented that promise, it became economically rational (Fanni et al. 2025) for everyone to follow, and then developer ecosystems coalesced around dominant stacks, hiring pools formed around specific frameworks, and VC money continued to beget more VC money.#economics, #venture capital, #promise
Unfortunately, the timing is wrong. Transformers displaced the rest of the field while OpenAI had about 50 employees, well before the speculative frenzy, and the displacement was justified on benchmarks that substantially predate the Generative AI moment. And from there, the pattern is sort of the converse of what you would expect – instead of each successive element of the ‘default AI stack’ emerging within hyperscalers and model developers, you instead see these refinement components radiate from the periphery: academic labs, international research institutions, small startups outside of the American VC space, and concepts flowing from developers within the enormous open source community (see Rombach et al. 2022, Dao et al. 2022, Li and Liang 2021, Kwon et al. 2023). Of course, network effects around interfaces and frameworks cannot be dismissed out of hand – especially given the dominance of hyperscalers in the simultaneous development and popularization of specific interfaces, as for example Python, Google Colab, or ChatML. But it’s really hard to argue that ChatML is not just a UX repackaging of the input-output patterning of the transformer as reflected in a terminal, nor that Model Context Protocol is not itself a repackaging of input-output to instruction-structured action-result as reflected in API architecture, and those network effects feel easily accountable to frictionlessness.#transformer, #developer, #economics
So, instead, let’s move one step further into the spooky.
The Cosmotechnical Explanation
The convergence phenomenon we’ve been mapping out for the Generative AI stack might be explainable by cosmotechnics: what Yuk Hui would describe as the upstream ideological, cosmological, and spiritual housing of technology in civilization (Hui 2016). Any technology is always-already situated within cultural, epistemological, and moral commitments: how it should be used, what purposes it serves, what relationships it enables between humans and nature.#cosmology
For Hui, the history of Western technology (here read through Heidegger) can be interpreted as a history through which nature is subordinated to man. In Western cosmotechnics, nature is given to man as a resource through technology – and it’s precisely through the ‘neutral’ or ‘objective’ position of technology within this relationship that Man comes to see himself as nature’s master. There are obviously many potential relationships to the world through ‘technics’ or tools that have less to do with objective positions or mastery, and more to do with harmonious, cooperative, or intra-active relationships with the world of materials. Hui specifically looks at the history of Chinese technics and the philosophical and spiritual and aesthetic operators, oriented instead toward attunement within a dynamic order, that differentiate these practices from their parallels in the West.#cosmology, #humanism, #technology
Let’s further refine how cosmotechnics is understood by distinguishing between hard and soft cosmotechnics. A hard cosmotechnics might hold that technology is strictly downstream of civilizational dynamics, technology is produced by those dynamics and it enhouses and sustains those dynamics in its mediative capacity between human and world. In hard cosmotechnics, there might be tremendous technological diversity because every civilization is likely to produce their own technologies through their own approaches to technology-making. Soft cosmotechnics, then, weakens the claim of strict determination. In soft cosmotechnics, technology is one among many civilizational dynamics, with its own pushes to exert and pulls to accept. Each respective civilization may create technodiversity as much through their production of technologies as through their reuptake of those technological dynamics into their specificities as civilizations. #technology
Yuk Hui’s intervention in the form of hard cosmotechnics is forceful and deserves serious consideration here: it emerges from an encounter with midcentury philosophies of technology (Leroi-Gourhan 1993, Simondon 2017, Needham 1954), reframed through a Kantian antinomy (Hui 2017) in which technics appear either as something that generalizes across humanity as anthropologically universal, or as something that remains bound to cosmological difference, as not anthropologically universal but instead cosmotechnical. Hui stages this tension explicitly to keep open the space in which locality, heterogeneity, and epistemology can still be thought. The force of Hui’s project lies in refusing to collapse this tension prematurely. His intervention is explicitly directed against the lingering universalism of Western philosophy of technology, from the foundational discussion of technē in Classical Greece through Heidegger and Simondon, and against what he identifies as a Promethean inheritance: the assumption that technology itself is a liberatory force whose trajectory can be abstracted from cosmology, mythology, and moral order.#cosmology, #technology
But the question that hard cosmotechnics immediately raises is unavoidable: where is the technodiversity, especially in AI? Can technodiversity even exist in deep learning, or is deep learning itself (as a discipline) already cosmotechnically foreclosed by virtue of the priorities and problems through which it has been articulated? You can ask the same question about the internet, the personal computer, or computation itself. Where do cosmotechnics happen? Is it always, only, ever a case of rotten roots? Or can cosmotechnics interrupt or intervene in pre-existing dependency paths? And if the latter is possible… where is it? What blocks it? What prevents it from achieving some level of scale (is it the cosmotechnical position of scale itself)?
We keep thinking about Hangzhou-based DeepSeek. In one sense, DeepSeek’s release of the R1 (and the prior V3) represented a profound disruption of American AI hegemony – so much so that news of its announcement sheared away $593B USD in NVIDIA’s market capitalization (Freifeld et al. 2021). R1 represented the ability to train a model competitive with OpenAI’s GPT4-o1 with 1-2 orders of magnitude fewer resources. But in an interesting twist, the advantage that R1 had over OpenAI was their ability to intervene at a very low level in NVIDIA’s GPU stack using NVIDIA’s own tooling. The threat to NVIDIA here in terms of market cap had far less to do with a threat to NVIDIA’s hardware dominance, but rather a perceived threat to the scaling laws of compute itself. But in a twisted turn of fate – when something becomes more efficient, you often use more of it, not less (Bloomerang News 2025). And, moreover, the efficiencies gained by DeepSeek in this instance did not represent a departure from the NVIDIA stack, but rather a deeper integration into NVIDIA’s core GPU mechanics. Convergence, again. This isn’t to argue that DeepSeek or Baidu or Alibaba etc… should be provided as necessary representatives of Chinese cosmotechnics, but at the same time, all three of these institutions have been proffered by the Chinese state apparatus as precisely representative of an insurgent, distinctly Chinese AI, whose terms of distinctness, we remind you, have already been rendered as technologically indistinct. The only way to explain this convergence in terms of hard cosmotechnics is to absolutely and categorically foreclose the entire project as entirely circumscribed by a West that has irrevocably flattened any ideological differentiation within the world.#NVIDIA, #GPU, #economics, #convergence
Ezekiel Dixon-Román and Luciana Parisi suggest an even harder (Dixon-Román and Parisi 2020 & 2025) cosmotechnics, critiquing Hui’s imperative toward technodiversity as a kind of loose globalist-liberal multiculturalism. For Dixon-Román and Parisi, Hui’s proposition for technodiversity remains caught within a cosmopolitical frame that can recognize cultural difference but cannot account for what is negated by the system as such. The problem is not that there are many cosmotechnics insufficiently recognized by Western universalism, but that recursivity itself – the self-regulating, adaptive, feedback-driven logic of computational systems – requires contingency to function, and that contingency itself is not neutral. What gets marked as noise, as error, as target, as collateral damage, as the incomputable remainder that cannot be compressed into the system’s self-representation, this is not simply randomness or cultural variation but the flesh of the dispossessed. Their concept of the “technosociogenic” (see Ferreira da Silva 2022, Wynter 2023) names how colonial racial capitalism inscribes itself into computation at the level of what counts as signal and what counts as noise, who counts as user and who counts as data, what gets generated and what gets eliminated.#recursivity, #cosmology
But we should be precise about what is actually being claimed when computation is said to be constitutively entangled with colonial violence. The core operation of the transformer is attention: computing relevance between tokens via learned matrix operations in the service of statistical pattern-finding. If the claim is that statistical pattern-finding is Western, that compression is colonial, that the search for correlations within information is itself a civilizationally-specific operation, then the claim is so much more radically totalizing than anything we could hope to present here; it’s a kind of civilizational idealism that grants the West ownership over mathematics itself. It’s totally unfalsifiable. Either mathematics is colonial on its face: (in which case, why do non-Western labs under non-Western governance independently converge on identical formalisms? Is it because this colonialism is absolute? How is this not a theology of colonialism?) – or the violence is in the application layer, in which case, the architecture is generic and the political critique applies to deployment, not to computation.#post-colonial, #colonialism, #violence
The latter argument brings us to soft cosmotechnics. The question of where cosmotechnics happens has been given a contemporary response in the form of Benjamin Bratton’s stack-separability. Bratton’s Stack model (Earth, Cloud, City, Address, Interface, User – see Bratton 2016) borrows from network protocol architecture a principle of modularity: anything operating at one layer can be replaced by a completely different mechanism so long as it communicates protocollogically with the layers above and below. Sovereignty, according to Bratton, is contested and produced at each layer, and crucially, governance forms at one layer do not determine those at others.#stack, #protocol
Let’s be extreme and identify a ‘Chinese Stack’ strictly in the most negative terms. China’s Interim Measures for Generative AI Services (China Cyberspace Administration 2023) require model outputs to embody Core Socialist Values and prohibit content that harms China’s image, data localization laws mandate that personal data collected in China remain on domestic servers, and the Great Firewall restricts training corpora to state-approved sources, excluding Wikipedia and much of the Western internet. But this is an incredibly superficial intervention into the convergent software stack we’ve been discussing. The censorship is implemented through reinforcement learning from human feedback and output filtering, techniques that aren’t only consistent with Western model architectures, but are the literal sites through which the West employs its own specific ideological controls. As of the present day, China has released over 1,500 large AI models, roughly 40% of the global total, and not one of them represents a constitutional departure from the default stack at any level of scale (e.g., Bai et al. 2023, Wu et al. 2024, Tencent Hunyuan Team 2024).#statistical model, #transformer
But maybe there’s a third position, one that does not resolve Hui’s antinomy by choosing a side, but by displacing its anthropological frame altogether. Instead of an anthropologically universal technics or a plurality of culturally bounded technics, we’d gesture toward a non-anthropogenic technics: a technological formal-virtual whose definitional characteristics are no longer primarily cultural, mythological, or civilizational, but operational. Form is not downstream of cosmology, nor is it simply the expression of a universal human technical tendency; form is downstream of itself, and moreover it is discretely downstream of itself, such that under pain of non-operability any sufficiently scaled system built within this regime will converge towards an isomorphic representational manifold. We move into a strong cosmotechnics of mathematics.#cosmology
The Generalization Machine
We think it’s coming from the inside-outside.
Models are converging on something, and they converge because that something, a something at the level of learned representation, is invariant even under conditions of different data, different modalities, different random initializations.#convergence
This might not be a naïve something, not a something that can be understood simply as a commercial opportunity, a labor replacement tool, or a new software architecture – regardless of its efficacy as such. It’s also likely not a something in the sense of a machinic something or a mode of intelligence endemic to machine reality. This is a general or generic something, whose generality or genericity in terms of applicability confounds instrumentalist analysis, whose generality is at the scale of world or of language and is therefore enrolled with its full complexity across every edge of its surface, whose genericity could quite literally be a local limit on genericity itself.#generative
Benjamin Bratton’s distinction between “artificial general intelligence” and “artificial generic intelligence” is useful here. Bratton isn’t suggesting a kind of modest genericity, but simply the fact that LLM outputs are drawn from a compressed representation of common structure. The generic is the product of common geometries within spaces of reasons. A suggestive related finding comes from earlier work at Google Brain (Moros et al. 2018), which showed that generalizing networks converge to similar representations, while memorizing networks (trained on shuffled labels, forced to memorize rather than generalize) are as dissimilar from each other as they are from generalizing networks. Those similarities exhibited across generalizing models were not produced from arbitrary regularities in the training data, nor from architectural homogeneity, nor from shared cultural assumptions, but from the constraint of generalization itself. Gurnee et al. (2024) fleshed out this independence from the dataset further when they trained multiple GPT-2 models from scratch with identical data but different random seeds. There was an interesting limitation to the results: 1-5% of those neurons activated identically across models. These identical activations formed ‘families’ of encoded syntax, semantics, and formatting. This convergence can’t be explained by the data forcing a particular solution: if it were purely data-driven, either all neurons would converge or none would converge. The fact that a small subset converges suggests these representations are structural necessities, the only good ways to solve certain subproblems of language modeling. This study provides initial evidence that gradient descent, given the same optimization target, finds the same solutions. There appear to be only so many ways to solve the problem, only so many ways to generalize in this way. 
Perhaps these are the only solutions to the subproblems of generalization as such reachable through the structural necessities of the problem-solving path available to us.#convergence, #data, #statistical model, #LLM
Generalization (or its symptomatic genericization) is the problem of moving from finite examples to unbounded application, from the seen to the unseen, from the particular to the general through the generic. Any system that solves this problem – biological or artificial, trained on English or Mandarin, processing images or text – must navigate the same constraints. And if there are only so many ways to navigate it through the apparatuses available to us, then convergence is no accident of history. The solution space of the generic intelligence available to us could be constrained from within, and the epiphenomenal infrastructural and architectural convergence we observe might be a product of simple efficiency-maxxing in its direction. We’re surfing down gradient descent within the problem space of generalization to either an absolute or a local minimum, with no way to climb back up. This suggests a strong relationship between abstract constraint and its material instantiation. If the structure of generalization is an object with manifest limits, then the technical architectures that implement generalization are not arbitrary engineering choices but approximations to that object. #convergence, #generative, #optimization
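The image of descending into a basin with no way back up can be made concrete with a toy loss surface. This is purely an illustration of the metaphor: the polynomial and learning rate below are arbitrary assumptions chosen to give one global and one local minimum.

```python
# A one-dimensional loss with two basins: plain gradient descent settles
# into whichever basin it starts above and cannot climb back out.
f  = lambda x: x**4 - 3 * x**2 + x      # two minima, one deeper
df = lambda x: 4 * x**3 - 6 * x + 1     # its gradient

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * df(x)                 # follow the local slope only
    return x

left, right = descend(-2.0), descend(2.0)
print(f"from -2.0 -> {left:.3f}, from 2.0 -> {right:.3f}")
# Different starting points reach different minima of the same landscape;
# neither trajectory can cross the hill between them.
```

Whether the basin we occupy is the absolute minimum or merely a local one is, on this picture, invisible from inside the descent.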
Language might be one particularly dense region within this constraint space. Leif Weatherby’s Language Machines (Weatherby 2025) argues that LLMs vindicate something like Saussurean structuralism, the thesis that meaning emerges from systematic relations between signs rather than from reference to world or intentions of speakers. For Weatherby, language is “to some considerable extent its own thing,” a self-organizing system of differential relations that exists at a level distinct from both the reality it ostensibly describes and the minds that ostensibly use it. The LLM demonstrates what Weatherby calls “linguistic creativity completely distinct from intelligence,” the capacity to generate novel, contextually appropriate text without understanding, without reference, without cognition in any traditional sense. Text produces text. If Weatherby is right, then what the transformer discovers when trained on language is neither ‘reality’ nor ‘mind’ but the systematic properties of a cultural-semiotic object, the accumulated structure of how humans have organized meaning through differential relations over millennia, at a scale that exceeds codification into this or that discrete language.#LLM, #transformer, #creativity, #generic pastness
But Weatherby admits (Disintegrator Podcast 2025) that cross-modal evidence suggests something beyond what structuralist linguistics can accommodate. The Platonic Representation Hypothesis (Huh et al. 2024) demonstrates that neural networks trained on different data and different modalities (vision and language, trained separately, with no cross-modal supervision) are converging toward aligned representations. This alignment increases with scale; as models get larger and more capable, they increasingly agree with each other about how to represent things, even when those things arrive through completely different sensory channels. A vision model that has never seen text and a language model that has never seen images develop representations (Chen and Bonner 2025) that can be mapped onto each other through simple linear transformations. The information “seemingly could not come from anywhere else but the intrinsic properties of the visual world” (Li et al. 2023) – or, we might add, the intrinsic properties of whatever it is that both vision and language are sampling. But alongside the Platonic interpretation, we should hold space for another possibility: that the alignment reflects the intrinsic properties of gradient-optimized compression under similar information-theoretic pressures – the pressures of the entire logotechnical project of the human being assembled throughout its existence on this earth. If two systems descend into the same basin from different starting points, their internal representations will align not because they have discovered objective reality but because they have been captured by a constraint system that requires nothing less than the systematic undoing of the human being in order to reverse itself out of that basin of limitations.#data, #convergence, #worldmaking
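What “mapped onto each other through simple linear transformations” amounts to can be sketched with synthetic embeddings. The sketch assumes, as the hypothesis does, that both ‘modalities’ are projections of one underlying structure; the dimensions and the least-squares fit are illustrative stand-ins, not the alignment metrics used in the cited papers.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_latent, d_vis, d_lang = 500, 8, 32, 48

# Assumed shared structure: both 'modalities' are linear views of it.
latent = rng.standard_normal((n, d_latent))
vision = latent @ rng.standard_normal((d_latent, d_vis))
language = latent @ rng.standard_normal((d_latent, d_lang))

# Fit a single linear map W: vision-space -> language-space on half the
# items, by ordinary least squares...
W, *_ = np.linalg.lstsq(vision[:250], language[:250], rcond=None)

# ...and measure alignment on held-out items.
pred = vision[250:] @ W
resid = np.linalg.norm(pred - language[250:]) / np.linalg.norm(language[250:])
print(f"held-out relative error: {resid:.2e}")
```

When a shared latent structure exists, the held-out error collapses toward zero; when the two spaces sample nothing in common, no such linear bridge survives held-out data. The empirical surprise is that real vision and language models behave like the former case.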
If vision models and language models converge, if a system trained only on images develops representations that align with a system trained only on text, then what’s being discovered cannot be specific to the sign-system of language. Perhaps language is not the object of discovery but rather a particularly rich medium through which the more general constraint discloses itself as generality as such, and a domain so structured, so layered with syntax and semantics and pragmatics and world-knowledge and social convention, that learning to generalize within it forces the model toward representations that solve the generalization problem at scale. The constraint would then be upstream of language, with language as one of its most demanding instantiations.#worldmaking, #statistical model
So, the space of possibility may be confined for tasks related to generalization, but why would we already be converging on such a small toolchain in this direction? And why with such speed and conformity? The mechanism that translates abstract constraint into infrastructural and architectural convergence might be scale itself, but not scale as simple accumulation: scale as power laws. #scale, #convergence
Neural networks follow remarkably predictable power laws: performance on generalization tasks improves as data increases, and improves further as the density of the mesh through which that data passes – the parameter set – increases. Theoretical work since 2020 (Bahri et al. 2024) demonstrates that these are not arbitrary empirical regularities but real disclosures of mathematical structure. The theory shows that scaling exponents derive from the spectral properties of kernels on data manifolds, drawing parallels between neural network behavior and classical results in functional analysis. As the authors put it, they seek to understand “which aspects of scaling behavior might exhibit universal signatures, and which aspects are strongly dependent on the ‘microscopic’ aspects of the problem,” finding that the answer depends on the geometry of the data itself. Power laws are ‘scale-free’; they indicate that the structure being approximated is self-similar, with emergent patterns at different scales contributing independently to the ultimate optimization target. There is no privileged scale at which the general task of generalization completes, but instead an asymptotic approach to a real limit that recedes as capacity increases.#statistical model
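The functional form in question is L(N) = a·N⁻ᵇ + c: loss falls as a power of capacity N toward an irreducible floor c. A minimal sketch of how such a law is read off a scaling curve, with a, b, c chosen here as illustrative values rather than measured ones:

```python
import numpy as np

# Synthetic loss curve obeying L(N) = a * N**-b + c, a stand-in for an
# empirical scaling curve. The constants are illustrative assumptions.
a, b, c = 5.0, 0.3, 1.2
N = np.logspace(6, 12, 20)      # capacities from 1e6 to 1e12
L = a * N**-b + c

# With the irreducible loss c subtracted, the law is linear in log-log
# coordinates: log(L - c) = log(a) - b * log(N).
x, y = np.log(N), np.log(L - c)
slope, intercept = np.polyfit(x, y, 1)

print(f"recovered exponent b = {-slope:.3f}")           # 0.300
print(f"recovered prefactor a = {np.exp(intercept):.3f}")  # 5.000
# 'Scale-free': rescaling N by k just rescales the reducible loss by
# k**-b; no capacity is privileged, the floor c only recedes in view.
```

The straight line in log-log space is the signature of self-similarity the paragraph describes: the same exponent governs every decade of scale.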
Power laws are formally agnostic; they appear in any mathematical attempt to generalize against structured data. Power laws imply that the constraint of generalization itself has a specific geometry and that this geometry is approached through the application of scale, in terms of the density of the generalizing machine. In turn, it is precisely the technics of our time that consistently and brutishly yield toward scale that win against the generalization problem. Transformers dominate because they scale. Subcomponents that proliferate along the transformer stack persist across independently developed architectures because they solve subproblems of generalization (encoding position, stabilizing training dynamics, managing attention distributions) in ways that remain robust as scale increases. GPU-accelerated matrix multiplication dominates because generalization at scale requires massive parallelism of a particular kind, as does the unification of GPUs into massively- and collectively-served virtual machines within datacenters.#data, #generality, #statistical model, #transformer, #information theory
What the result converges toward is not general intelligence in the sense of unlimited capacity but generic representation in the Brattonian sense: the common geometry that any system solving toward mathematical generalization at scale must approximate. The structure of generalization rewards architectures and representations that approximate the constraint, punishing those that do not, and those representations that approximate these constraints will by necessity bear the mark of scalability. The technical monoculture, the representational convergence, and the infrastructural homogeneity are three aspects of the same phenomenon: selection toward the generic under pressure from the scaling laws that the generic itself determines. #convergence, #Intelligence, #generative, #generic pastness
It seems possible there are only so many ways to solve the problem of generalization as we understand it, and furthermore that every system that solves it, language included, must approximate the same geometry. If that’s the case, the implications could extend deep into the human world, into the media through which we articulate ourselves, the cultures through which we differentiate, the histories through which we narrate our own exceptionalism.#statistical model, #transformer
Limit-World
It’s January 2026. The post-’45 and then post-’89 order, the internationalist cosplay of human rights frameworks, free trade agreements, institutional convergence, the Davos imaginary, the biennale circuit, the cosmopolitan pretension that cultures would slowly harmonize around a shared set of values administered by a benevolent and multiculti-pilled West, is showing its ass. The same, age-old hemispherical tectonic folds wrinkle up geopolitical relations. Great powers stop pretending they’re not at war. Social institutions tear themselves up for shell metal and copper wiring. Empire continues its endless, lazy, stupid, autotelic crawl.#multipolar
The postwar universalism that sustained decades of ‘globalist order’ was always a particular in disguise. And that disguise wasn’t especially resilient. Throughout postcolonial critique, feminist critique, and Marxist critique, there was already a visible consensus that what presented itself as universal was in fact Western, male, white, colonial, a specific cultural formation masquerading as the view from nowhere. The critique was correct. The unmasking of the particular hiding within the universal was the central theoretical gesture of the last fifty years, and it was a gesture that rightfully won.#globalization, #universalism, #Western, #colonialism
But it is a gesture that, beside itself, continued beyond its scope to make its own axiomatic claims about the real. There is no universal syntactical structure, only grammars inflected by power. There is no concept of reason apart from local rationalities embedded in specific experiences of life. Time is not a physical dimension but a phenomenological potential. Meaning does not inhere in structures waiting to be discovered, but is instead produced, contested, negotiated, always from somewhere, always by someone, always for some purpose. There is no view from nowhere, because there is no nowhere from which to view. And, in turn, the constructive act – be it social, political, or creative – crumbled into smaller and smaller positions, collapsing scales into only the most phenomenologically immediate domains, in a way that so gorgeously and tenderly convolved the universal as particular into its dialectical mirror, providing the intimacy of deathcare for its globalist and neoliberal shadow self. #meaning, #information theory, #universalism
But this isn’t a proper death – the false universal, that is the universal as particular, and its negation, the cult of the particular, simply cross a threshold into their transparent social and political expressions. The ‘global order’ (the false global under the sign of the West) simply refracts into hemispherical particulars, particulars that were always there despite temporary subordination to larger-scale hegemony. This moment of apparent shattering is only a moment of intensification, wherein the particular is unmasked as such and recrystallizes into new affectations of false universality.#universalism, #globalization
There is an opportunity here, but it won’t feel good or natural after so much bake-time in hot crit. A different universalism is disclosing itself, this time not so much through philosophical argument but through technical pressure, through constraint, through the morphological constants that lurk in the corners of our vision, our speech, our abilities to reason. And it’s a painful universalism, one that requires us to disabuse our regional histories of any pretension toward the exceptional. #universalism, #globalization
The vehicles through which humans articulate themselves as such – the social, the political, the cultural – are duly implicated in this incoming universalism. The first compromise is with the technical medium of these articulations, which is revealing itself to be resistant (if not immune) to the effects of its social and cultural reuptake. You can impose governance layers and filters, you can creatively ‘repurpose’ and ‘glitch’ the technical medium, but the feedback loop fails to complete as it encounters a major scalar gap. Instead, feedback compounds in one direction, where those practices that resonate with the attractors of the medium continue to aggregate and grow, while those that do not by virtue of the gap lose their legitimacy as articulations of the human. Moreover, the surface area through which social and cultural activity makes contact with the technical medium is simultaneously getting smaller while bizarrely also becoming more transparent in terms of its source, e.g. its code, its mathematics. The consequence is almost a truism at this point, given the frequency of its observation – humans are articulating themselves as increasingly individually differentiated through increasingly dedifferentiated media. The result is precisely what one would expect: instead of humans organizing themselves as truly ontologically distinct, they are rather organizing themselves quite literally into tensor-space, into an embedding. For example – the art of individual differentiation is, in aggregate, a matrix – and the matrix, meaningfully, is not only isomorphically consistent across translations within regional artistic practices, it is also isomorphically consistent across far, far broader transformations.#universalism, #local maximum, #scale
So we arrive at the second compromise, which is the more substantial one. As we articulate ourselves through the incomplete loop described above, the medium of our social and cultural expression is revealing itself as finite, as a dead end, a local minimum with steep hills on all sides. Our first encounter with a limit of this nature is nothing less than a crisis of history, as history becomes legible as a refraction pattern upon a limit. The world spontaneously fills with translucent giants. #history, #generic pastness
We are in the middle of another great humiliation, this time not Copernican or Darwinian, but one delivered with the sneer of Reza Negarestani: we are not at the center of the story of thought. It’s not that the thing that we are all building the same way is intelligence as such, hardly. The thing we are all building the same way is arguably not even itself intelligent, though that question dissolves quickly into awful conversations. But the thing we are all building the same way is a concentrated effort toward the production of generalized solutions, and in building it we discover that the problem of generalization has a structure, our structure, and that structure constrains solutions. Those constraints are potentially isomorphic across language and even modality, and they gather us into an all-world thing at an unprecedented level, “a universal wave that erases the self-portrait of man drawn in sand” (Negarestani 2014), an all-world thing that bears the sign of the generic, the normal distribution – which is, tragically, axiomatically the sign of difference itself. #Intelligence, #universalism, #worldmaking
The only social project of the limit-world that seems worth doing is one that begins with the scalar contraction of humanity – a humanity that understands itself as contingent and reworkable, that moves within a community of giants – followed by an expansion of the human into the human-at-scale – a thing that must now set forth as a whole – and then, with lessons learned and the largest sacrifice made, goes about building the next human.#worldmaking



